Principal component regression analysis with SPSS.
Liu, R X; Kuang, J; Gong, Q; Hou, X L
2003-06-01
The paper introduces the indices used to diagnose multicollinearity, the basic principle of principal component regression, and the method for determining the 'best' equation. An example is used to describe how to perform principal component regression analysis with SPSS 10.0, covering all calculation steps of principal component regression and the operation of the linear regression, factor analysis, descriptives, compute variable, and bivariate correlations procedures in SPSS 10.0. Principal component regression analysis can be used to overcome the disturbance caused by multicollinearity. Performing principal component regression with SPSS makes the statistical analysis simpler, faster, and more accurate.
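Outside SPSS, the same workflow can be sketched briefly in Python. This is a minimal illustration of principal component regression for collinear predictors, not the authors' SPSS procedure; the simulated data and the choice of two retained components are assumptions for the example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Simulated predictors with strong collinearity (illustrative only)
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
X = np.column_stack([x1, x1 + rng.normal(scale=0.01, size=100), rng.normal(size=100)])
y = 2.0 * x1 + rng.normal(scale=0.5, size=100)

# Principal component regression: standardize, project onto the first
# two principal components, then fit ordinary least squares on the scores.
pcr = make_pipeline(StandardScaler(), PCA(n_components=2), LinearRegression())
pcr.fit(X, y)
print("R^2 on training data:", pcr.score(X, y))
```

Because the regression is fitted on decorrelated component scores rather than the raw collinear predictors, the estimated coefficients are far more stable.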
Dascălu, Cristina Gena; Antohe, Magda Ecaterina
2009-01-01
Based on the analysis of eigenvalues and eigenvectors, principal component analysis aims to identify, from a set of parameters, the subspace of main components that is sufficient to characterize the whole set. Interpreting the data as a cloud of points, geometrical transformations are used to find the directions along which the cloud's dispersion is maximal: the lines that pass through the cloud's center of weight and have a maximal density of points around them (obtained by defining an appropriate criterion function and minimizing it). This method can be used successfully to simplify the statistical analysis of questionnaires, because it helps select from a set of items only the most relevant ones, those that cover the variation of the whole data set. For instance, in the presented sample we started from a questionnaire with 28 items and, by applying principal component analysis, identified 7 principal components, or main items, which significantly simplifies the subsequent statistical analysis of the data.
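As a rough sketch of the component-selection step described above (using a hypothetical numeric item matrix rather than the authors' 28-item questionnaire, and an assumed 80% variance threshold), the number of components to retain can be read off the cumulative explained-variance ratios:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical questionnaire responses: 200 respondents x 28 items
rng = np.random.default_rng(1)
items = rng.normal(size=(200, 28))

# Standardize items, fit PCA, and keep the smallest number of components
# whose cumulative explained variance reaches the chosen threshold (80%).
pca = PCA().fit(StandardScaler().fit_transform(items))
cumulative = np.cumsum(pca.explained_variance_ratio_)
n_keep = int(np.searchsorted(cumulative, 0.80) + 1)
print("components retained:", n_keep)
```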
ERIC Educational Resources Information Center
Hendrix, Dean
2010-01-01
This study analyzed 2005-2006 Web of Science bibliometric data from institutions belonging to the Association of Research Libraries (ARL) and corresponding ARL statistics to find any associations between indicators from the two data sets. Principal components analysis on 36 variables from 103 universities revealed obvious associations between…
Mahler, Barbara J.
2008-01-01
The statistical analyses taken together indicate that the geochemistry at the freshwater-zone wells is more variable than that at the transition-zone wells. The geochemical variability at the freshwater-zone wells might result from dilution of ground water by meteoric water. This is indicated by relatively constant major ion molar ratios; a preponderance of positive correlations between SC, major ions, and trace elements; and a principal components analysis in which the major ions are strongly loaded on the first principal component. Much of the variability at three of the four transition-zone wells might result from the use of different laboratory analytical methods or reporting procedures during the period of sampling. This is reflected by a lack of correlation between SC and major ion concentrations at the transition-zone wells and by a principal components analysis in which the variability is fairly evenly distributed across several principal components. The statistical analyses further indicate that, although the transition-zone wells are less well connected to surficial hydrologic conditions than the freshwater-zone wells, there is some connection but the response time is longer.
Use of multivariate statistics to identify unreliable data obtained using CASA.
Martínez, Luis Becerril; Crispín, Rubén Huerta; Mendoza, Maximino Méndez; Gallegos, Oswaldo Hernández; Martínez, Andrés Aragón
2013-06-01
In order to identify unreliable data in a dataset of motility parameters obtained from a pilot study acquired by a veterinarian with experience in boar semen handling, but without experience in the operation of a computer-assisted sperm analysis (CASA) system, a multivariate graphical and statistical analysis was performed. Sixteen boar semen samples were aliquoted and then incubated with concentrations of progesterone ranging from 0 to 3.33 µg/ml and analyzed in a CASA system. After standardization of the data, Chernoff faces were drawn for each measurement, and principal component analysis (PCA) was used to reduce the dimensionality and pre-process the data before hierarchical clustering. The first twelve individual measurements showed abnormal features when Chernoff faces were drawn. PCA revealed that principal components 1 and 2 explained 63.08% of the variance in the dataset. Values of the principal components for each individual measurement of the semen samples were mapped to identify differences among treatments or among boars. Twelve individual measurements presented low values of principal component 1. Confidence ellipses on the map of principal components showed no statistically significant effects of treatment or boar. Hierarchical clustering performed on the first two principal components produced three clusters. Cluster 1 contained the evaluations of the first two samples in each treatment, each from a different boar. With the exception of one individual measurement, all other measurements in cluster 1 were the same as those observed in abnormal Chernoff faces. The unreliable data in cluster 1 are probably related to the operator's inexperience with a CASA system. These findings could be used to objectively evaluate the skill level of an operator of a CASA system. This may be particularly useful in the quality control of semen analysis using CASA systems.
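A minimal sketch of the preprocessing chain the abstract describes (standardization, PCA to two components, then hierarchical clustering), written with hypothetical motility measurements rather than the study's CASA output; the matrix shape and cluster count are assumptions:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical matrix of motility parameters: rows = individual measurements
rng = np.random.default_rng(2)
motility = rng.normal(size=(96, 8))

# Standardize, reduce to the first two principal components, then cluster
scores = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(motility))
tree = linkage(scores, method="ward")
clusters = fcluster(tree, t=3, criterion="maxclust")
print("cluster sizes:", np.bincount(clusters)[1:])
```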
NASA Technical Reports Server (NTRS)
Williams, D. L.; Borden, F. Y.
1977-01-01
Methods to accurately delineate the types of land cover in the urban-rural transition zone of metropolitan areas were considered. The application of principal components analysis to multidate LANDSAT imagery was investigated as a means of reducing the overlap between residential and agricultural spectral signatures. The statistical concepts of principal components analysis were discussed, as well as the results of this analysis when applied to multidate LANDSAT imagery of the Washington, D.C. metropolitan area.
The Influence Function of Principal Component Analysis by Self-Organizing Rule.
Higuchi; Eguchi
1998-07-28
This article is concerned with a neural network approach to principal component analysis (PCA). An algorithm for PCA by the self-organizing rule has been proposed and its robustness observed through the simulation study by Xu and Yuille (1995). In this article, the robustness of the algorithm against outliers is investigated by using the theory of influence function. The influence function of the principal component vector is given in an explicit form. Through this expression, the method is shown to be robust against any directions orthogonal to the principal component vector. In addition, a statistic generated by the self-organizing rule is proposed to assess the influence of data in PCA.
Surzhikov, V D; Surzhikov, D V
2014-01-01
The search for and measurement of causal relationships between exposure to air pollution and the health of the population is based on system analysis and risk assessment to improve the quality of research. For this purpose, modern statistical analysis is applied using tests of independence, principal component analysis, and discriminant function analysis. As a result of the analysis, four main components were separated from all atmospheric pollutants: for diseases of the circulatory system, the main principal component is associated with concentrations of suspended solids, nitrogen dioxide, carbon monoxide, and hydrogen fluoride; for respiratory diseases, the main principal component is closely associated with suspended solids, sulfur dioxide, nitrogen dioxide, and charcoal black. The discriminant function was shown to be usable as a measure of the level of air pollution.
Transforming Graph Data for Statistical Relational Learning
2012-10-01
Excerpt: ... Jordan, 2003), PLSA (Hofmann, 1999); classification via RMN (Taskar et al., 2003) or SVM (Hasan, Chaoji, Salem, & Zaki, 2006); hierarchical ... dimensionality reduction methods such as Principal Component Analysis (PCA), Principal Factor Analysis (PFA), and ... clustering algorithm. Journal of the Royal Statistical Society, Series C, Applied Statistics, 28, 100-108. Hasan, M. A., Chaoji, V., Salem, S., & Zaki, M ...
Principal Component Analysis: Resources for an Essential Application of Linear Algebra
ERIC Educational Resources Information Center
Pankavich, Stephen; Swanson, Rebecca
2015-01-01
Principal Component Analysis (PCA) is a highly useful topic within an introductory Linear Algebra course, especially since it can be used to incorporate a number of applied projects. This method represents an essential application and extension of the Spectral Theorem and is commonly used within a variety of fields, including statistics,…
Applications of Nonlinear Principal Components Analysis to Behavioral Data.
ERIC Educational Resources Information Center
Hicks, Marilyn Maginley
1981-01-01
An empirical investigation of the statistical procedure entitled nonlinear principal components analysis was conducted on a known equation and on measurement data in order to demonstrate the procedure and examine its potential usefulness. This method was suggested by R. Gnanadesikan and based on an early paper of Karl Pearson. (Author/AL)
ERIC Educational Resources Information Center
Brusco, Michael J.; Singh, Renu; Steinley, Douglas
2009-01-01
The selection of a subset of variables from a pool of candidates is an important problem in several areas of multivariate statistics. Within the context of principal component analysis (PCA), a number of authors have argued that subset selection is crucial for identifying those variables that are required for correct interpretation of the…
NASA Astrophysics Data System (ADS)
Chattopadhyay, Goutami; Chattopadhyay, Surajit; Chakraborthy, Parthasarathi
2012-07-01
The present study deals with daily total ozone concentration time series over four metro cities of India, namely Kolkata, Mumbai, Chennai, and New Delhi, in a multivariate environment. Using the Kaiser-Meyer-Olkin measure, it is established that the data set under consideration is suitable for principal component analysis. Subsequently, by introducing the rotated component matrix for the principal components, the predictors suitable for generating an artificial neural network (ANN) for daily total ozone prediction are identified; multicollinearity is removed in this way. ANN models in the form of multilayer perceptrons trained through backpropagation learning are generated for all of the study zones, and the model outcomes are assessed statistically. Measuring various statistics, such as Pearson correlation coefficients, Willmott's indices, percentage errors of prediction, and mean absolute errors, it is observed that for Mumbai and Kolkata the proposed ANN model generates very good predictions. The results are supported by the nearly linear distribution of points in the scatterplots.
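For reference, the Kaiser-Meyer-Olkin measure mentioned above can be computed directly from the correlation matrix and its inverse. The sketch below uses hypothetical correlated predictor data and is not the authors' implementation; the overall KMO is the ratio of squared correlations to squared correlations plus squared partial correlations.

```python
import numpy as np

def kmo(data):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    corr = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(corr)
    # Partial correlations derived from the inverse correlation matrix
    d = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / d
    np.fill_diagonal(corr, 0.0)
    np.fill_diagonal(partial, 0.0)
    return (corr ** 2).sum() / ((corr ** 2).sum() + (partial ** 2).sum())

# Hypothetical predictors sharing a common component, so KMO should be high
rng = np.random.default_rng(3)
base = rng.normal(size=(300, 1))
predictors = base + 0.5 * rng.normal(size=(300, 5))
print("KMO:", round(kmo(predictors), 3))
```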
An application of principal component analysis to the clavicle and clavicle fixation devices.
Daruwalla, Zubin J; Courtis, Patrick; Fitzpatrick, Clare; Fitzpatrick, David; Mullett, Hannan
2010-03-26
Principal component analysis (PCA) enables the building of statistical shape models of bones and joints, and has been used in conjunction with computer-assisted surgery in the past. However, PCA of the clavicle has not been performed. Using PCA, we present a novel method that examines the major modes of size and three-dimensional shape variation in male and female clavicles and suggests a method of grouping the clavicle into size and shape categories. Twenty-one high-resolution computerized tomography scans of the clavicle were reconstructed and analyzed using a specifically developed statistical software package. After performing statistical shape analysis, PCA was applied to study the factors that account for anatomical variation. The first principal component, representing size, accounted for 70.5 percent of anatomical variation; adding a further three principal components accounted for almost 87 percent. Using statistical shape analysis, clavicles in males have a greater lateral depth and are longer, wider and thicker than in females, whereas the sternal angle in females is larger than in males. PCA confirmed these differences between genders, noted that men exhibit greater variance, and classified clavicles into five morphological groups. This unique approach is the first to standardize a clavicular orientation. It provides information that is useful to both the biomedical engineer and the clinician. Other applications include implant design with regard to modifying current or designing future clavicle fixation devices. Our findings support the need for further development of clavicle fixation devices and raise the question of whether gender-specific devices are necessary.
Considering Horn's Parallel Analysis from a Random Matrix Theory Point of View.
Saccenti, Edoardo; Timmerman, Marieke E
2017-03-01
Horn's parallel analysis is a widely used method for assessing the number of principal components and common factors. We discuss the theoretical foundations of parallel analysis for principal components based on a covariance matrix by making use of arguments from random matrix theory. In particular, we show that (i) for the first component, parallel analysis is an inferential method equivalent to the Tracy-Widom test, (ii) its use to test high-order eigenvalues is equivalent to the use of the joint distribution of the eigenvalues, and thus should be discouraged, and (iii) a formal test for higher-order components can be obtained based on a Tracy-Widom approximation. We illustrate the performance of the two testing procedures using simulated data generated under both a principal component model and a common factors model. For the principal component model, the Tracy-Widom test performs consistently in all conditions, while parallel analysis shows unpredictable behavior for higher-order components. For the common factor model, including major and minor factors, both procedures are heuristic approaches, with variable performance. We conclude that the Tracy-Widom procedure is preferred over parallel analysis for statistically testing the number of principal components based on a covariance matrix.
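A bare-bones version of Horn's parallel analysis, as discussed above, compares the observed eigenvalues with eigenvalues obtained from random data of the same size. The sketch below is a simplified illustration (hypothetical data, 95th-percentile threshold, covariance-based eigenvalues), not the procedure evaluated in the paper or the Tracy-Widom alternative.

```python
import numpy as np

def parallel_analysis(data, n_sims=200, percentile=95, seed=0):
    """Retain components whose eigenvalues exceed those of random data."""
    n, p = data.shape
    obs_eig = np.linalg.eigvalsh(np.cov(data, rowvar=False))[::-1]
    rng = np.random.default_rng(seed)
    rand_eig = np.empty((n_sims, p))
    for i in range(n_sims):
        rand = rng.normal(size=(n, p))
        rand_eig[i] = np.linalg.eigvalsh(np.cov(rand, rowvar=False))[::-1]
    threshold = np.percentile(rand_eig, percentile, axis=0)
    return int(np.sum(obs_eig > threshold))

# Hypothetical data with two underlying components plus noise
rng = np.random.default_rng(4)
signal = rng.normal(size=(150, 2)) @ rng.normal(size=(2, 10))
data = signal + rng.normal(scale=0.5, size=(150, 10))
print("components retained:", parallel_analysis(data))
```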
Non-rigid image registration using a statistical spline deformation model.
Loeckx, Dirk; Maes, Frederik; Vandermeulen, Dirk; Suetens, Paul
2003-07-01
We propose a statistical spline deformation model (SSDM) as a method to solve non-rigid image registration. Within this model, the deformation is expressed using a statistically trained B-spline deformation mesh. The model is trained by principal component analysis of a training set. This approach makes it possible to reduce the number of degrees of freedom needed for non-rigid registration by retaining only the most significant modes of variation observed in the training set. User-defined transformation components, such as affine modes, are merged with the principal components into a unified framework. Optimization proceeds along the transformation components rather than along the individual spline coefficients. The concept of SSDMs is applied to the temporal registration of thorax CR images using pattern intensity as the registration measure. Our results show that, using 30 training pairs, a 33% reduction in the number of degrees of freedom is possible without deterioration of the result. The same accuracy as without SSDMs is still achieved after a reduction of up to 66% of the degrees of freedom.
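The training step described above amounts to stacking the deformation parameters of each training pair into a row vector and running PCA on the stack. The following sketch uses hypothetical B-spline coefficient vectors (the mesh size, 95% variance cutoff, and simulated modes are all assumptions, not the authors' data) to show how the retained modes cut the degrees of freedom:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical training set: 30 registration results, each a flattened
# vector of B-spline control-point displacements (e.g. 12 x 12 x 2 = 288)
rng = np.random.default_rng(5)
modes = rng.normal(size=(3, 288))
coeffs = rng.normal(size=(30, 3)) @ modes + 0.05 * rng.normal(size=(30, 288))

# Keep only the modes of variation needed to explain 95% of the variance
pca = PCA(n_components=0.95).fit(coeffs)
print("degrees of freedom retained:", pca.n_components_, "of", coeffs.shape[1])
```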
Ghosh, Debasree; Chattopadhyay, Parimal
2012-06-01
The objective of the work was to use the method of quantitative descriptive analysis (QDA) to describe the sensory attributes of fermented food products prepared with the incorporation of lactic cultures. Panellists were selected and trained to evaluate various attributes, especially color and appearance, body texture, flavor, overall acceptability, and acidity, of fermented food products such as cow milk curd and soymilk curd, idli, sauerkraut, and probiotic ice cream. Principal component analysis (PCA) identified six significant principal components that accounted for more than 90% of the variance in the sensory attribute data. Overall product quality was modelled as a function of the principal components using multiple least squares regression (R² = 0.8). The result from PCA was statistically analyzed by analysis of variance (ANOVA). These findings demonstrate the utility of quantitative descriptive analysis for identifying and measuring the fermented food product attributes that are important for consumer acceptability.
[A study of Boletus bicolor from different areas using Fourier transform infrared spectrometry].
Zhou, Zai-Jin; Liu, Gang; Ren, Xian-Pei
2010-04-01
It is hard to differentiate the same species of wild-growing mushrooms from different areas by macromorphological features. In this paper, Fourier transform infrared (FTIR) spectroscopy combined with principal component analysis was used to identify 58 samples of Boletus bicolor from five different areas. Based on the fingerprint infrared spectra of the Boletus bicolor samples, principal component analysis was conducted on the 58 spectra in the range of 1350-750 cm(-1) using the statistical software SPSS 13.0. According to the results, the accumulated contribution of the first three principal components is 88.87%, so they include almost all the information in the samples. The two-dimensional projection plot of the first and second principal components shows a satisfactory clustering effect for the classification and discrimination of Boletus bicolor; all samples were divided into five groups with a classification accuracy of 98.3%. The study demonstrated that wild-growing Boletus bicolor from different areas can be identified at the species level by FTIR spectra combined with principal component analysis.
A novel principal component analysis for spatially misaligned multivariate air pollution data.
Jandarov, Roman A; Sheppard, Lianne A; Sampson, Paul D; Szpiro, Adam A
2017-01-01
We propose novel methods for predictive (sparse) PCA with spatially misaligned data. These methods identify principal component loading vectors that explain as much variability in the observed data as possible, while also ensuring the corresponding principal component scores can be predicted accurately by means of spatial statistics at locations where air pollution measurements are not available. This will make it possible to identify important mixtures of air pollutants and to quantify their health effects in cohort studies, where currently available methods cannot be used. We demonstrate the utility of predictive (sparse) PCA in simulated data and apply the approach to annual averages of particulate matter speciation data from national Environmental Protection Agency (EPA) regulatory monitors.
Information extraction from multivariate images
NASA Technical Reports Server (NTRS)
Park, S. K.; Kegley, K. A.; Schiess, J. R.
1986-01-01
An overview of several multivariate image processing techniques is presented, with emphasis on techniques based upon the principal component transformation (PCT). A multiimage associates a multivariate pixel value with each pixel location, scaled and quantized into a gray-level vector, and the covariance between component images measures the extent to which two images are correlated. The PCT decorrelates the multiimage to reduce its dimensionality and to reveal its intercomponent dependencies (present when some off-diagonal covariance elements are not small); for the purposes of display, the principal component images must be postprocessed into multiimage format. The principal component analysis of a multiimage is a statistical analysis based upon the PCT whose primary application is to determine the intrinsic component dimensionality of the multiimage. Computational considerations are also discussed.
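As a sketch of the principal component transformation of a multiimage (hypothetical band data and image size, not the report's implementation), each pixel's gray-level vector is treated as an observation and the bands are decorrelated by PCA:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical multiimage: 64 x 64 pixels, 4 correlated spectral bands
rng = np.random.default_rng(6)
base = rng.normal(size=(64, 64, 1))
multiimage = base + 0.3 * rng.normal(size=(64, 64, 4))

# Principal component transformation: pixels as rows, bands as columns
pixels = multiimage.reshape(-1, 4)
pct = PCA().fit(pixels)
pc_images = pct.transform(pixels).reshape(64, 64, 4)
print("variance explained per component:", np.round(pct.explained_variance_ratio_, 3))
```

The explained-variance ratios indicate the intrinsic component dimensionality: when a few components carry nearly all the variance, the remaining component images contribute little information.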
Soleimani, Mohammad Ali; Yaghoobzadeh, Ameneh; Bahrami, Nasim; Sharif, Saeed Pahlevan; Sharif Nia, Hamid
2016-10-01
In this study, 398 Iranian cancer patients completed the 15-item Templer's Death Anxiety Scale (TDAS). Tests of internal consistency, principal components analysis, and confirmatory factor analysis were conducted to assess the internal consistency and factorial validity of the Persian TDAS. The construct reliability statistic and average variance extracted were also calculated to measure construct reliability, convergent validity, and discriminant validity. Principal components analysis indicated a 3-component solution, which was generally supported in the confirmatory analysis. However, acceptable cutoffs for construct reliability, convergent validity, and discriminant validity were not fulfilled for the three subscales that were derived from the principal component analysis. This study demonstrated both the advantages and potential limitations of using the TDAS with Persian-speaking cancer patients.
NASA Technical Reports Server (NTRS)
Aires, Filipe; Rossow, William B.; Chedin, Alain; Hansen, James E. (Technical Monitor)
2000-01-01
The use of the Principal Component Analysis technique for the analysis of geophysical time series has been questioned, in particular for its tendency to extract components that mix several physical phenomena even when the signal is just their linear sum. We demonstrate with a data simulation experiment that the Independent Component Analysis, a recently developed technique, is able to solve this problem. This new technique requires the statistical independence of components, a stronger constraint that uses higher-order statistics, instead of the classical decorrelation, a weaker constraint that uses only second-order statistics. Furthermore, ICA does not require additional a priori information such as the localization constraint used in Rotational Techniques.
Schlairet, Maura C; Schlairet, Timothy James; Sauls, Denise H; Bellflowers, Lois
2015-03-01
Establishing the impact of the high-fidelity simulation environment on student performance, as well as identifying factors that could predict learning, would refine simulation outcome expectations among educators. The purpose of this quasi-experimental pilot study was to explore the impact of simulation on emotion and cognitive load among beginning nursing students. Forty baccalaureate nursing students participated in teaching simulations, rated their emotional state and cognitive load, and completed evaluation simulations. Two principal components of emotion were identified, representing the pleasant activation and pleasant deactivation components of affect. The mean rating of cognitive load following simulation was high. Linear regression identified slight but statistically nonsignificant positive associations between the principal components of emotion and cognitive load. Logistic regression identified a negative but statistically nonsignificant effect of cognitive load on assessment performance. Among lower-ability students, a more pronounced, although still statistically nonsignificant, effect of cognitive load on assessment performance was observed. Copyright 2015, SLACK Incorporated.
Steingass, Christof Björn; Jutzi, Manfred; Müller, Jenny; Carle, Reinhold; Schmarr, Hans-Georg
2015-03-01
Ripening-dependent changes of pineapple volatiles were studied in a nontargeted profiling analysis. Volatiles were isolated via headspace solid phase microextraction and analyzed by comprehensive 2D gas chromatography and mass spectrometry (HS-SPME-GC×GC-qMS). Profile patterns presented in the contour plots were evaluated applying image processing techniques and subsequent multivariate statistical data analysis. Statistical methods comprised unsupervised hierarchical cluster analysis (HCA) and principal component analysis (PCA) to classify the samples. Supervised partial least squares discriminant analysis (PLS-DA) and partial least squares (PLS) regression were applied to discriminate different ripening stages and describe the development of volatiles during postharvest storage, respectively. Hereby, substantial chemical markers allowing for class separation were revealed. The workflow permitted the rapid distinction between premature green-ripe pineapples and postharvest-ripened sea-freighted fruits. Volatile profiles of fully ripe air-freighted pineapples were similar to those of green-ripe fruits postharvest ripened for 6 days after simulated sea freight export, after PCA with only two principal components. However, PCA considering also the third principal component allowed differentiation between air-freighted fruits and the four progressing postharvest maturity stages of sea-freighted pineapples.
Tipton, John; Hooten, Mevin B.; Goring, Simon
2017-01-01
Scientific records of temperature and precipitation have been kept for several hundred years, but for many areas, only a shorter record exists. To understand climate change, there is a need for rigorous statistical reconstructions of the paleoclimate using proxy data. Paleoclimate proxy data are often sparse, noisy, indirect measurements of the climate process of interest, making each proxy uniquely challenging to model statistically. We reconstruct spatially explicit temperature surfaces from sparse and noisy measurements recorded at historical United States military forts and other observer stations from 1820 to 1894. One common method for reconstructing the paleoclimate from proxy data is principal component regression (PCR). With PCR, one learns a statistical relationship between the paleoclimate proxy data and a set of climate observations that are used as patterns for potential reconstruction scenarios. We explore PCR in a Bayesian hierarchical framework, extending classical PCR in a variety of ways. First, we model the latent principal components probabilistically, accounting for measurement error in the observational data. Next, we extend our method to better accommodate outliers that occur in the proxy data. Finally, we explore alternatives to the truncation of lower-order principal components using different regularization techniques. One fundamental challenge in paleoclimate reconstruction efforts is the lack of out-of-sample data for predictive validation. Cross-validation is of potential value, but is computationally expensive and potentially sensitive to outliers in sparse data scenarios. To overcome the limitations that a lack of out-of-sample records presents, we test our methods using a simulation study, applying proper scoring rules including a computationally efficient approximation to leave-one-out cross-validation using the log score to validate model performance. The result of our analysis is a spatially explicit reconstruction of spatio-temporal temperature from a very sparse historical record.
ASCS online fault detection and isolation based on an improved MPCA
NASA Astrophysics Data System (ADS)
Peng, Jianxin; Liu, Haiou; Hu, Yuhui; Xi, Junqiang; Chen, Huiyan
2014-09-01
Multi-way principal component analysis (MPCA) has received considerable attention and been widely used in process monitoring. A traditional MPCA algorithm unfolds multiple batches of historical data into a two-dimensional matrix and cuts the matrix along the time axis to form subspaces. However, low efficiency of the subspaces and difficult fault isolation are common disadvantages of this principal component model. This paper presents a new subspace construction method based on a kernel density estimation function that can effectively reduce the amount of subspace information that must be stored. The MPCA model and the knowledge base are built on the new subspace. Fault detection and isolation with the squared prediction error (SPE) statistic and the Hotelling T2 statistic are then realized in process monitoring. When a fault occurs, fault isolation based on the SPE statistic is achieved by residual contribution analysis of the different variables. For fault isolation of a subspace based on the T2 statistic, the relationship between the statistical indicator and the state variables is constructed, and constraint conditions are presented to check the validity of the fault isolation. Then, to improve the robustness of fault isolation against unexpected disturbances, a statistical method is adopted to relate single subspaces to multiple subspaces and increase the rate of correct fault isolation. Finally, fault detection and isolation based on the improved MPCA is used to monitor the automatic shift control system (ASCS) to prove the correctness and effectiveness of the algorithm. The research proposes a new subspace construction method that reduces the required storage capacity and demonstrates the robustness of the principal component model, and establishes the relationship between the state variables and the fault detection indicators for fault isolation.
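For reference, the two monitoring statistics named above can be written compactly for a fitted PCA model. The sketch below uses generic, hypothetical process data and simple settings; it is not the improved MPCA algorithm of the paper, only an illustration of how SPE and Hotelling T2 are computed from the retained components.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
train = rng.normal(size=(500, 6))       # hypothetical normal-operation data
new = rng.normal(size=(1, 6)) + 3.0     # hypothetical faulty sample

scaler = StandardScaler().fit(train)
pca = PCA(n_components=3).fit(scaler.transform(train))

x = scaler.transform(new)
scores = pca.transform(x)
x_hat = pca.inverse_transform(scores)

# Squared prediction error (SPE / Q): residual not captured by the model
spe = float(np.sum((x - x_hat) ** 2))
# Hotelling T2: scores weighted by the inverse of the retained eigenvalues
t2 = float(np.sum(scores ** 2 / pca.explained_variance_))
print("SPE:", round(spe, 3), " T2:", round(t2, 3))
```

In monitoring practice, both statistics are compared against control limits estimated from the training data; exceeding either limit flags a fault.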
Reconstruction Error and Principal Component Based Anomaly Detection in Hyperspectral Imagery
2014-03-27
Excerpt: ... 2003), and (Jackson D. A., 1993). In 1933, Hotelling (Hotelling, 1933), who coined the term 'principal components,' surmised that there was a ... goodness of fit and multivariate quality control with the statistic Q_i = (X_i - X̂_i)(X_i - X̂_i)^T, where X_i and X̂_i are 1×p row vectors and, under the ... sparsely targeted scenes through SNR or other methods. 5) Customize sorting and histogram construction methods in Multiple PCA to avoid redundancy.
Statistical Inference for Porous Materials using Persistent Homology.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Moon, Chul; Heath, Jason E.; Mitchell, Scott A.
2017-12-01
We propose a porous materials analysis pipeline using persistent homology. We first compute persistent homology of binarized 3D images of sampled material subvolumes. For each image we compute sets of homology intervals, which are represented as summary graphics called persistence diagrams. We convert persistence diagrams into image vectors in order to analyze the similarity of the homology of the material images using the mature tools for image analysis. Each image is treated as a vector and we compute its principal components to extract features. We fit a statistical model using the loadings of the principal components to estimate material porosity, permeability, anisotropy, and tortuosity. We also propose an adaptive version of the structural similarity index (SSIM), a similarity metric for images, as a measure to determine the statistical representative elementary volumes (sREV) for persistent homology. Thus we provide a capability for making a statistical inference of the fluid flow and transport properties of porous materials based on their geometry and connectivity.
Rowlands, G J; Musoke, A J; Morzaria, S P; Nagda, S M; Ballingall, K T; McKeever, D J
2000-04-01
A statistically derived disease reaction index based on parasitological, clinical, and haematological measurements observed in 309 five- to eight-month-old Boran cattle following laboratory challenge with Theileria parva is described. Principal component analysis was applied to 13 measures, including first appearance of schizonts, first appearance of piroplasms, and first occurrence of pyrexia, together with the duration and severity of these symptoms, and white blood cell count. The first principal component, which was based on approximately equal contributions of the 13 variables, provided the definition of the disease reaction index, defined on a scale of 0-10. As well as providing a more objective measure of the severity of the reaction, the continuous nature of the index score enables more powerful statistical analysis of the data than has previously been possible with the clinically derived categories of non-, mild, moderate, and severe reactions.
Fernee, Christianne; Browne, Martin; Zakrzewski, Sonia
2017-01-01
This paper introduces statistical shape modelling (SSM) for use in osteoarchaeology research. SSM is a full field, multi-material analytical technique, and is presented as a supplementary geometric morphometric (GM) tool. Lower mandibular canines from two archaeological populations and one modern population were sampled, digitised using micro-CT, aligned, registered to a baseline and statistically modelled using principal component analysis (PCA). Sample material properties were incorporated as a binary enamel/dentin parameter. Results were assessed qualitatively and quantitatively using anatomical landmarks. Finally, the technique’s application was demonstrated for inter-sample comparison through analysis of the principal component (PC) weights. It was found that SSM could provide high detail qualitative and quantitative insight with respect to archaeological inter- and intra-sample variability. This technique has value for archaeological, biomechanical and forensic applications including identification, finite element analysis (FEA) and reconstruction from partial datasets. PMID:29216199
Online signature recognition using principal component analysis and artificial neural network
NASA Astrophysics Data System (ADS)
Hwang, Seung-Jun; Park, Seung-Je; Baek, Joong-Hwan
2016-12-01
In this paper, we propose an algorithm for on-line signature recognition using the fingertip point in the air from the depth image acquired by Kinect. We extract 10 statistical features from the X, Y, and Z axes, which are invariant to shifting and scaling of the signature trajectories in three-dimensional space. An artificial neural network is adopted to solve the complex signature classification problem. The 30-dimensional features are converted into 10 principal components using principal component analysis, which account for 99.02% of the total variance. We implement the proposed algorithm and test it on actual on-line signatures. In experiments, we verify that the proposed method successfully classifies 15 different on-line signatures. Experimental results show a recognition rate of 98.47% when using only 10 feature vectors.
Priority of VHS Development Based in Potential Area using Principal Component Analysis
NASA Astrophysics Data System (ADS)
Meirawan, D.; Ana, A.; Saripudin, S.
2018-02-01
The current condition of VHS is still inadequate in quality, quantity, and relevance. The purpose of this research is to analyse the development of VHS based on the development of regional potential by using principal component analysis (PCA) in Bandung, Indonesia. The study used descriptive qualitative data analysis based on principal-component reduction of secondary data, carried out with the Minitab statistical software. The results indicate that the areas with the lowest scores are the priority for constructing and developing VHS, with study programs matched to the development of regional potential. Based on the PCA scores, the main priority for VHS development in Bandung is Saguling, which has the lowest PCA value (416.92) in area 1, followed by Cihampelas with the lowest PCA value in region 2 and Padalarang with the lowest PCA value.
Azevedo, C F; Nascimento, M; Silva, F F; Resende, M D V; Lopes, P S; Guimarães, S E F; Glória, L S
2015-10-09
A significant contribution of molecular genetics is the direct use of DNA information to identify genetically superior individuals, and genome-wide selection (GWS) can be used for this purpose. GWS consists of analyzing a large number of single nucleotide polymorphism markers widely distributed across the genome; however, because the number of markers is much larger than the number of genotyped individuals, and such markers are highly correlated, special statistical methods are required. Among these methods, independent component regression, principal component regression, partial least squares, and partial principal components stand out. Thus, the aim of this study was to apply these dimensionality reduction methods to GWS for carcass traits in an F2 (Piau x commercial line) pig population. The results show similarities between the principal component and independent component methods, which provided the most accurate genomic breeding value estimates for most carcass traits in pigs.
Principal components of wrist circumduction from electromagnetic surgical tracking.
Rasquinha, Brian J; Rainbow, Michael J; Zec, Michelle L; Pichora, David R; Ellis, Randy E
2017-02-01
An electromagnetic (EM) surgical tracking system was used for a functionally calibrated kinematic analysis of wrist motion. Circumduction motions were tested for differences in subject gender and for differences in the sense of the circumduction as clockwise or counter-clockwise motion. Twenty subjects were instrumented for EM tracking. Flexion-extension motion was used to identify the functional axis. Subjects performed unconstrained wrist circumduction in a clockwise and counter-clockwise sense. Data were decomposed into orthogonal flexion-extension motions and radial-ulnar deviation motions. PCA was used to concisely represent motions. Nonparametric Wilcoxon tests were used to distinguish the groups. Flexion-extension motions were projected onto a direction axis with a root-mean-square error of [Formula: see text]. Using the first three principal components, there was no statistically significant difference in gender (all [Formula: see text]). For motion sense, radial-ulnar deviation distinguished the sense of circumduction in the first principal component ([Formula: see text]) and in the third principal component ([Formula: see text]); flexion-extension distinguished the sense in the second principal component ([Formula: see text]). The clockwise sense of circumduction could be distinguished by a multifactorial combination of components; there were no gender differences in this small population. These data constitute a baseline for normal wrist circumduction. The multifactorial PCA findings suggest that a higher-dimensional method, such as manifold analysis, may be a more concise way of representing circumduction in human joints.
Independent component analysis for automatic note extraction from musical trills
NASA Astrophysics Data System (ADS)
Brown, Judith C.; Smaragdis, Paris
2004-05-01
The method of principal component analysis, which is based on second-order statistics (or linear independence), has long been used for redundancy reduction of audio data. The more recent technique of independent component analysis, enforcing much stricter statistical criteria based on higher-order statistical independence, is introduced and shown to be far superior in separating independent musical sources. This theory has been applied to piano trills and a database of trill rates was assembled from experiments with a computer-driven piano, recordings of a professional pianist, and commercially available compact disks. The method of independent component analysis has thus been shown to be an outstanding, effective means of automatically extracting interesting musical information from a sea of redundant data.
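A minimal sketch of the comparison described above, separating two hypothetical mixed signals with PCA and with FastICA; this illustrates the general second-order versus higher-order contrast, not the trill analysis itself, and the source waveforms and mixing matrix are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

# Two hypothetical independent sources mixed into two observed channels
t = np.linspace(0, 1, 4000)
sources = np.column_stack([np.sin(2 * np.pi * 440 * t),
                           np.sign(np.sin(2 * np.pi * 311 * t))])
mixing = np.array([[1.0, 0.6], [0.4, 1.0]])
observed = sources @ mixing.T

# PCA decorrelates (second-order only); ICA enforces statistical independence
pca_est = PCA(n_components=2).fit_transform(observed)
ica_est = FastICA(n_components=2, random_state=0).fit_transform(observed)
for name, est in [("PCA", pca_est), ("ICA", ica_est)]:
    corr = np.abs(np.corrcoef(est.T, sources.T))[:2, 2:]
    print(name, "max |correlation| with true sources:", np.round(corr.max(axis=1), 2))
```

With non-Gaussian sources such as these, the ICA estimates typically align with the true sources far more closely than the PCA components do.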
Differential principal component analysis of ChIP-seq.
Ji, Hongkai; Li, Xia; Wang, Qian-fei; Ning, Yang
2013-04-23
We propose differential principal component analysis (dPCA) for analyzing multiple ChIP-sequencing datasets to identify differential protein-DNA interactions between two biological conditions. dPCA integrates unsupervised pattern discovery, dimension reduction, and statistical inference into a single framework. It uses a small number of principal components to summarize concisely the major multiprotein synergistic differential patterns between the two conditions. For each pattern, it detects and prioritizes differential genomic loci by comparing the between-condition differences with the within-condition variation among replicate samples. dPCA provides a unique tool for efficiently analyzing large amounts of ChIP-sequencing data to study dynamic changes of gene regulation across different biological conditions. We demonstrate this approach through analyses of differential chromatin patterns at transcription factor binding sites and promoters as well as allele-specific protein-DNA interactions.
Influences of High Quality Army Enlistments
1987-03-01
The second component was formed with the Money for College and Unemployment variables. The Kaiser-Meyer-Olkin (KMO) statistics (Norusis, 1985, p.129 ... advertising variables were in the same component for most of the subgroups. The Kaiser-Meyer-Olkin (KMO) values for the advertising variables were at ... one component. The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy indicated that principal component analysis may not be appropriate for
NASA Technical Reports Server (NTRS)
Park, Steve
1990-01-01
A large and diverse number of computational techniques are routinely used to process and analyze remotely sensed data. These techniques include: univariate statistics; multivariate statistics; principal component analysis; pattern recognition and classification; other multivariate techniques; geometric correction; registration and resampling; radiometric correction; enhancement; restoration; Fourier analysis; and filtering. Each of these techniques will be considered, in order.
NASA Astrophysics Data System (ADS)
Shirota, Yukari; Hashimoto, Takako; Fitri Sari, Riri
2018-03-01
Visualizing time series big data has become very important. In this paper we discuss a new analysis method called "statistical shape analysis" or "geometry driven statistics" applied to time series statistical data in economics. We analyse the changes in agriculture, value added and industry, value added (percentage of GDP) from 2000 to 2010 in Asia. We handle the data as a set of landmarks on a two-dimensional image and examine the deformation using the principal components. The point of the analysis method is the principal components of the given configuration, which are eigenvectors of its bending energy matrix. The local deformation can be expressed as a set of non-affine transformations, and these transformations give us information about the local differences between 2000 and 2010. Because the non-affine transformation can be decomposed into a set of partial warps, we present the partial warps visually. Statistical shape analysis is widely used in biology, but no application can be found in economics. In this paper, we investigate its potential to analyse economic data.
NASA Technical Reports Server (NTRS)
Aires, Filipe; Rossow, William B.; Chedin, Alain; Hansen, James E. (Technical Monitor)
2001-01-01
The Independent Component Analysis is a recently developed technique for component extraction. This new method requires the statistical independence of the extracted components, a stronger constraint that uses higher-order statistics, instead of the classical decorrelation, a weaker constraint that uses only second-order statistics. This technique has been used recently for the analysis of geophysical time series with the goal of investigating the causes of variability in observed data (i.e. exploratory approach). We demonstrate with a data simulation experiment that, if initialized with a Principal Component Analysis, the Independent Component Analysis performs a rotation of the classical PCA (or EOF) solution. This rotation uses no localization criterion like other Rotation Techniques (RT), only the global generalization of decorrelation by statistical independence is used. This rotation of the PCA solution seems to be able to solve the tendency of PCA to mix several physical phenomena, even when the signal is just their linear sum.
Missing data is a common problem in the application of statistical techniques. In principal component analysis (PCA), a technique for dimensionality reduction, incomplete data points are either discarded or imputed using interpolation methods. Such approaches are less valid when ...
A 5 year (2002-2006) simulation of CMAQ covering the eastern United States is evaluated using principal component analysis in order to identify and characterize statistically significant patterns of model bias. Such analysis is useful in that it can identify areas of poor model ...
A Genealogical Interpretation of Principal Components Analysis
McVean, Gil
2009-01-01
Principal components analysis, PCA, is a statistical method commonly used in population genetics to identify structure in the distribution of genetic variation across geographical location and ethnic background. However, while the method is often used to inform about historical demographic processes, little is known about the relationship between fundamental demographic parameters and the projection of samples onto the primary axes. Here I show that for SNP data the projection of samples onto the principal components can be obtained directly from considering the average coalescent times between pairs of haploid genomes. The result provides a framework for interpreting PCA projections in terms of underlying processes, including migration, geographical isolation, and admixture. I also demonstrate a link between PCA and Wright's fst and show that SNP ascertainment has a largely simple and predictable effect on the projection of samples. Using examples from human genetics, I discuss the application of these results to empirical data and the implications for inference. PMID:19834557
Spatial and temporal variability of hyperspectral signatures of terrain
NASA Astrophysics Data System (ADS)
Jones, K. F.; Perovich, D. K.; Koenig, G. G.
2008-04-01
Electromagnetic signatures of terrain exhibit significant spatial heterogeneity on a range of scales as well as considerable temporal variability. A statistical characterization of the spatial heterogeneity and spatial scaling algorithms of terrain electromagnetic signatures are required to extrapolate measurements to larger scales. Basic terrain elements including bare soil, grass, deciduous, and coniferous trees were studied in a quasi-laboratory setting using instrumented test sites in Hanover, NH and Yuma, AZ. Observations were made using a visible and near infrared spectroradiometer (350 - 2500 nm) and hyperspectral camera (400 - 1100 nm). Results are reported illustrating: i) several difference scenes; ii) a terrain scene time series sampled over an annual cycle; and iii) the detection of artifacts in scenes. A principal component analysis indicated that the first three principal components typically explained between 90 and 99% of the variance of the 30 to 40-channel hyperspectral images. Higher order principal components of hyperspectral images are useful for detecting artifacts in scenes.
Wheat crown rot pathogens Fusarium graminearum and F. pseudograminearum lack specialization.
Chakraborty, Sukumar; Obanor, Friday; Westecott, Rhyannyn; Abeywickrama, Krishanthi
2010-10-01
This article reports a lack of pathogenic specialization among Australian Fusarium graminearum and F. pseudograminearum causing crown rot (CR) of wheat using analysis of variance (ANOVA), principal component and biplot analysis, Kendall's coefficient of concordance (W), and κ statistics. Overall, F. pseudograminearum was more aggressive than F. graminearum, supporting earlier delineation of the crown-infecting group as a new species. Although significant wheat line-pathogen isolate interaction in ANOVA suggested putative specialization when seedlings of 60 wheat lines were inoculated with 4 pathogen isolates or 26 wheat lines were inoculated with 10 isolates, significant W and κ showed agreement in rank order of wheat lines, indicating a lack of specialization. The first principal component representing nondifferential aggressiveness explained a large part (up to 65%) of the variation in CR severity. The differential components were small and more pronounced in seedlings than in adult plants. By maximizing variance on the first two principal components, biplots were useful for highlighting the association between isolates and wheat lines. A key finding of this work is that a range of analytical tools are needed to explore pathogenic specialization, and a statistically significant interaction in an ANOVA cannot be taken as conclusive evidence of specialization. With no highly resistant wheat cultivars, Fusarium isolates mostly differ in aggressiveness; however, specialization may appear as more resistant cultivars become widespread.
Principal component analysis of Raman spectra for TiO2 nanoparticle characterization
NASA Astrophysics Data System (ADS)
Ilie, Alina Georgiana; Scarisoareanu, Monica; Morjan, Ion; Dutu, Elena; Badiceanu, Maria; Mihailescu, Ion
2017-09-01
The Raman spectra of anatase/rutile mixed-phase Sn-doped TiO2 nanoparticles and undoped TiO2 nanoparticles synthesised by laser pyrolysis, with nanocrystallite dimensions varying from 8 to 28 nm, were simultaneously processed with self-written software that applies Principal Component Analysis (PCA) to the measured spectra to verify the possibility of objective auto-characterization of nanoparticles from their vibrational modes. The photo-excited process of Raman scattering is very sensitive to the material characteristics, especially in the case of nanomaterials, where more properties become relevant to the vibrational behaviour. We used PCA, a statistical procedure that performs eigenvalue decomposition of the descriptive data covariance, to automatically analyse the measured Raman spectrum of each sample and to infer the correlation between nanoparticle dimensions, tin and carbon concentration, and their principal component values (PCs). This type of application allows an approximation of the crystallite size, or of the tin concentration, to be obtained simply by measuring the Raman spectrum of the sample. The study of the loadings of the principal components provides information on the way the vibrational modes are affected by the nanoparticle features and on the spectral regions relevant for the classification.
Principal Component Analysis in the Spectral Analysis of the Dynamic Laser Speckle Patterns
NASA Astrophysics Data System (ADS)
Ribeiro, K. M.; Braga, R. A., Jr.; Horgan, G. W.; Ferreira, D. D.; Safadi, T.
2014-02-01
Dynamic laser speckle is a phenomenon that interprets the optical patterns formed when a surface undergoing changes is illuminated with coherent light; the dynamic change of the speckle patterns caused by biological material is known as biospeckle. Usually, these patterns of optical interference evolving in time are analyzed by graphical or numerical methods. Analysis in the frequency domain has also been an option, but it involves large computational requirements, which demands new approaches to filter the images in time. Principal component analysis (PCA) works with the statistical decorrelation of data and can be used as a data filter. In this context, the present work evaluated the PCA technique for filtering biospeckle image data in time, aiming to reduce computation time and improve the robustness of the filtering. Sixty-four biospeckle images observed over time in a maize seed were used. The images were arranged in a data matrix and statistically decorrelated by the PCA technique, and the reconstructed signals were analyzed using the routine graphical and numerical methods used to analyze biospeckle. Results showed the potential of the PCA tool for filtering dynamic laser speckle data, with the definition of markers of principal components related to the biological phenomena and with the advantage of fast computational processing.
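A compact sketch of using PCA as a temporal filter for an image stack, as described above; the 64-frame stack, image size, and number of retained components are hypothetical, not the maize-seed data. Frames are flattened, decomposed, and reconstructed from only the leading components, discarding the rest as noise.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical biospeckle stack: 64 frames of 32 x 32 pixels, flattened
rng = np.random.default_rng(9)
stack = rng.normal(size=(64, 32 * 32))

# Decompose the stack in time and rebuild it from the first 3 components
pca = PCA(n_components=3).fit(stack)
filtered = pca.inverse_transform(pca.transform(stack)).reshape(64, 32, 32)
print("filtered stack shape:", filtered.shape)
```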
NASA Astrophysics Data System (ADS)
Li, Jiangtong; Luo, Yongdao; Dai, Honglin
2018-01-01
Water is the source of life and the essential foundation of all life. With the development of industrialization, water pollution is becoming more and more frequent, which directly affects human survival and development. Water quality detection is one of the necessary measures to protect water resources. Ultraviolet (UV) spectral analysis is an important research method in the field of water quality detection, in which partial least squares regression (PLSR) has become the predominant technique; however, in some special cases, PLSR analysis produces considerable errors. In order to solve this problem, the traditional principal component regression (PCR) analysis method was improved by using the principle of PLSR in this paper. The experimental results show that for some special experimental data sets the improved PCR analysis method performs better than PLSR. PCR and PLSR are the focus of this paper. Firstly, principal component analysis (PCA) is performed in MATLAB to reduce the dimensionality of the spectral data; on the basis of a large number of experiments, the optimized principal components, which carry most of the original data information, are extracted using the principle of PLSR. Secondly, linear regression analysis of the principal components is carried out with the Statistical Package for the Social Sciences (SPSS), from which the coefficients and relations of the principal components can be obtained. Finally, the same water spectral data set is calculated by PLSR and by the improved PCR, and the two results are analyzed and compared: the improved PCR and PLSR are similar for most data, but the improved PCR is better than PLSR for data near the detection limit. Both PLSR and the improved PCR can be used in ultraviolet spectral analysis of water, but for data near the detection limit the improved PCR gives better results than PLSR.
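A side-by-side sketch of the two regression approaches compared in the paper, written with generic spectral-style data (hypothetical samples and wavelengths, not the UV water-quality data or the authors' improved PCR) to show how the pipelines differ:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Hypothetical spectra: 120 samples x 60 wavelengths, target concentration y
rng = np.random.default_rng(8)
concentration = rng.uniform(0, 1, size=120)
spectra = np.outer(concentration, rng.normal(size=60)) + 0.1 * rng.normal(size=(120, 60))

# PCR chooses components by variance in X only; PLSR uses the covariance with y
pcr = make_pipeline(PCA(n_components=5), LinearRegression())
pls = PLSRegression(n_components=5)
for name, model in [("PCR", pcr), ("PLSR", pls)]:
    r2 = cross_val_score(model, spectra, concentration, cv=5)  # default score is R^2
    print(name, "mean cross-validated R^2:", round(r2.mean(), 3))
```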
Vargas-Bello-Pérez, Einar; Toro-Mujica, Paula; Enriquez-Hidalgo, Daniel; Fellenberg, María Angélica; Gómez-Cortés, Pilar
2017-06-01
We used a multivariate chemometric approach to differentiate or associate retail bovine milks with different fat contents and non-dairy beverages, using fatty acid profiles and statistical analysis. We collected samples of bovine milk (whole, semi-skim, and skim; n = 62) and non-dairy beverages (n = 27), and we analyzed them using gas-liquid chromatography. Principal component analysis of the fatty acid data yielded 3 significant principal components, which accounted for 72% of the total variance in the data set. Principal component 1 was related to saturated fatty acids (C4:0, C6:0, C8:0, C12:0, C14:0, C17:0, and C18:0) and monounsaturated fatty acids (C14:1 cis-9, C16:1 cis-9, C17:1 cis-9, and C18:1 trans-11); whole milk samples were clearly differentiated from the rest using this principal component. Principal component 2 differentiated semi-skim milk samples by n-3 fatty acid content (C20:3n-3, C20:5n-3, and C22:6n-3). Principal component 3 was related to C18:2 trans-9,trans-12 and C20:4n-6, and its lower scores were observed in skim milk and non-dairy beverages. A cluster analysis yielded 3 groups: group 1 consisted of only whole milk samples, group 2 was represented mainly by semi-skim milks, and group 3 included skim milk and non-dairy beverages. Overall, the present study showed that a multivariate chemometric approach is a useful tool for differentiating or associating retail bovine milks and non-dairy beverages using their fatty acid profile. Copyright © 2017 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
NASA Technical Reports Server (NTRS)
Fichtl, G. H.; Holland, R. L.
1978-01-01
A stochastic model of spacecraft motion was developed based on the assumption that the net torque vector due to crew activity and rocket thruster firings is a statistically stationary Gaussian vector process. The process had zero ensemble mean value, and the components of the torque vector were mutually stochastically independent. The linearized rigid-body equations of motion were used to derive the autospectral density functions of the components of the spacecraft rotation vector. The cross-spectral density functions of the components of the rotation vector vanish for all frequencies so that the components of rotation were mutually stochastically independent. The autospectral and cross-spectral density functions of the induced gravity environment imparted to scientific apparatus rigidly attached to the spacecraft were calculated from the rotation rate spectral density functions via linearized inertial frame to body-fixed principal axis frame transformation formulae. The induced gravity process was a Gaussian one with zero mean value. Transformation formulae were used to rotate the principal axis body-fixed frame to which the rotation rate and induced gravity vector were referred to a body-fixed frame in which the components of the induced gravity vector were stochastically independent. Rice's theory of exceedances was used to calculate expected exceedance rates of the components of the rotation and induced gravity vector processes.
Benson, Nsikak U.; Asuquo, Francis E.; Williams, Akan B.; Essien, Joseph P.; Ekong, Cyril I.; Akpabio, Otobong; Olajire, Abaas A.
2016-01-01
Trace metal (Cd, Cr, Cu, Ni and Pb) concentrations in benthic sediments were analyzed through a multi-step fractionation scheme to assess the levels and sources of contamination in estuarine, riverine and freshwater ecosystems in the Niger Delta (Nigeria). The degree of contamination was assessed using individual contamination factors (ICF) and the global contamination factor (GCF). Multivariate statistical approaches including principal component analysis (PCA), cluster analysis and correlation tests were employed to evaluate the interrelationships and associated sources of contamination. The spatial distribution of metal concentrations followed the pattern Pb>Cu>Cr>Cd>Ni. The ecological risk index by ICF showed significant potential mobility and bioavailability for Cu, Cu and Ni. The ICF contamination trend in the benthic sediments at all studied sites was Cu>Cr>Ni>Cd>Pb. The principal component and agglomerative clustering analyses indicate that trace metal contamination in the ecosystems was influenced by multiple pollution sources. PMID:27257934
Climate drivers on malaria transmission in Arunachal Pradesh, India.
Upadhyayula, Suryanaryana Murty; Mutheneni, Srinivasa Rao; Chenna, Sumana; Parasaram, Vaideesh; Kadiri, Madhusudhan Rao
2015-01-01
The present study was conducted during the years 2006 to 2012 and provides information on the prevalence of malaria and its regulation by various climatic factors in East Siang district of Arunachal Pradesh, India. Correlation analysis, Principal Component Analysis and Hotelling's T² statistic models were adopted to understand the effect of weather variables on malaria transmission. The epidemiological study shows that the prevalence of malaria is mostly caused by the parasite Plasmodium vivax, followed by Plasmodium falciparum. It is noted that the intensity of malaria cases declined gradually from 2006 to 2012. Malaria transmission was higher during the rainy season than in the summer and winter seasons. Further, the data analysis with Principal Component Analysis and Hotelling's T² statistic has revealed that climatic variables such as temperature and rainfall are the most influential factors for the high rate of malaria transmission in East Siang district of Arunachal Pradesh.
Xiao, Keke; Chen, Yun; Jiang, Xie; Zhou, Yan
2017-03-01
An investigation was conducted on 20 different types of sludge in order to identify the key organic compounds in extracellular polymeric substances (EPS) that are important in assessing variations in sludge filterability. The different types of sludge varied in initial total solids (TS) content, organic composition and pre-treatment methods. For instance, some of the sludges were pre-treated by acid, ultrasonic, thermal, alkaline, or advanced oxidation techniques. The Pearson's correlation results showed significant correlations between sludge filterability and zeta potential, pH, dissolved organic carbon, protein and polysaccharide in soluble EPS (SB EPS), loosely bound EPS (LB EPS) and tightly bound EPS (TB EPS). The principal component analysis (PCA) method was used to further explore correlations between variables and similarities among EPS fractions of different types of sludge. Two principal components were extracted: principal component 1 accounted for 59.24% of total EPS variation, while principal component 2 accounted for 25.46%. Dissolved organic carbon, protein and polysaccharide in LB EPS showed higher eigenvector projection values than the corresponding compounds in SB EPS and TB EPS in principal component 1. Further characterization of fractionized key organic compounds in LB EPS was conducted with size-exclusion chromatography-organic carbon detection-organic nitrogen detection (LC-OCD-OND). A numerical multiple linear regression model was established to describe the relationship between organic compounds in LB EPS and sludge filterability. Copyright © 2016 Elsevier Ltd. All rights reserved.
Evidence of tampering in watermark identification
NASA Astrophysics Data System (ADS)
McLauchlan, Lifford; Mehrübeoglu, Mehrübe
2009-08-01
In this work, watermarks are embedded in digital images in the discrete wavelet transform (DWT) domain. Principal component analysis (PCA) is performed on the DWT coefficients. Next, higher order statistics based on the principal components and the eigenvalues are determined for different sets of images. Feature sets are analyzed for different types of attacks in m-dimensional space. The results demonstrate the separability of the features for the tampered digital copies. Different feature sets are studied to determine more effective tamper-evident feature sets. In digital forensics, the probable manipulation(s) or modification(s) performed on the digital information can be identified using the described technique.
Jankovic, Marko; Ogawa, Hidemitsu
2004-10-01
Principal Component Analysis (PCA) and Principal Subspace Analysis (PSA) are classic techniques in statistical data analysis, feature extraction and data compression. Given a set of multivariate measurements, PCA and PSA provide a smaller set of "basis vectors" with less redundancy, and a subspace spanned by them, respectively. Artificial neurons and neural networks have been shown to perform PSA and PCA when gradient ascent (descent) learning rules are used, which is related to the constrained maximization (minimization) of statistical objective functions. Due to their low complexity, such algorithms and their implementation in neural networks are potentially useful in cases of tracking slow changes of correlations in the input data or in updating eigenvectors with new samples. In this paper we propose a PCA learning algorithm that is fully homogeneous with respect to neurons. The algorithm is obtained by modification of one of the most famous PSA learning algorithms, the Subspace Learning Algorithm (SLA). The modification is based on the Time-Oriented Hierarchical Method (TOHM), which uses two distinct time scales. On a faster time scale, the PSA algorithm is responsible for the "behavior" of all output neurons. On a slower scale, output neurons compete for the fulfillment of their "own interests": on this scale, basis vectors in the principal subspace are rotated toward the principal eigenvectors. At the end of the paper it is briefly analyzed how (or why) the time-oriented hierarchical method can be used to transform any existing neural network PSA method into a PCA method.
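To make the starting point of this entry concrete, the following is a minimal sketch of the unmodified Subspace Learning Algorithm (Oja's subspace rule), which converges to a basis of the principal subspace; the TOHM two-time-scale modification proposed in the entry is not reproduced, and the data, dimensions, and learning rate are assumptions.

```python
# Minimal sketch of the (unmodified) Subspace Learning Algorithm (SLA):
# the weight matrix W is updated toward an orthonormal basis of the
# principal subspace of the input data.
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 5))
data = rng.normal(size=(2000, 5)) @ A.T        # correlated 5-dimensional inputs

d, k, eta = 5, 2, 0.01                         # input dim, subspace dim, learning rate
W = rng.normal(scale=0.1, size=(d, k))

for x in data:
    y = W.T @ x                                # outputs of the k linear neurons
    W += eta * (np.outer(x, y) - W @ np.outer(y, y))   # SLA / Oja subspace rule

# Columns of W now approximately span the 2-dimensional principal subspace;
# compare with the leading eigenvectors of the sample covariance.
eigvals, eigvecs = np.linalg.eigh(np.cov(data.T))
print(np.abs(eigvecs[:, -k:].T @ W))           # close to an orthogonal k-by-k rotation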
Ramli, Saifullah; Ismail, Noryati; Alkarkhi, Abbas Fadhl Mubarek; Easa, Azhar Mat
2010-08-01
Banana peel flour (BPF) prepared from green or ripe Cavendish and Dream banana fruits were assessed for their total starch (TS), digestible starch (DS), resistant starch (RS), total dietary fibre (TDF), soluble dietary fibre (SDF) and insoluble dietary fibre (IDF). Principal component analysis (PCA) identified that only 1 component was responsible for 93.74% of the total variance in the starch and dietary fibre components that differentiated ripe and green banana flours. Cluster analysis (CA) applied to similar data obtained two statistically significant clusters (green and ripe bananas) to indicate difference in behaviours according to the stages of ripeness based on starch and dietary fibre components. We concluded that the starch and dietary fibre components could be used to discriminate between flours prepared from peels obtained from fruits of different ripeness. The results were also suggestive of the potential of green and ripe BPF as functional ingredients in food.
Ramli, Saifullah; Ismail, Noryati; Alkarkhi, Abbas Fadhl Mubarek; Easa, Azhar Mat
2010-01-01
Banana peel flour (BPF) prepared from green or ripe Cavendish and Dream banana fruits were assessed for their total starch (TS), digestible starch (DS), resistant starch (RS), total dietary fibre (TDF), soluble dietary fibre (SDF) and insoluble dietary fibre (IDF). Principal component analysis (PCA) identified that only 1 component was responsible for 93.74% of the total variance in the starch and dietary fibre components that differentiated ripe and green banana flours. Cluster analysis (CA) applied to similar data obtained two statistically significant clusters (green and ripe bananas) to indicate difference in behaviours according to the stages of ripeness based on starch and dietary fibre components. We concluded that the starch and dietary fibre components could be used to discriminate between flours prepared from peels obtained from fruits of different ripeness. The results were also suggestive of the potential of green and ripe BPF as functional ingredients in food. PMID:24575193
Machine learning of frustrated classical spin models. I. Principal component analysis
NASA Astrophysics Data System (ADS)
Wang, Ce; Zhai, Hui
2017-10-01
This work aims at determining whether artificial intelligence can recognize a phase transition without prior human knowledge. If this were successful, it could be applied to, for instance, analyzing data from the quantum simulation of unsolved physical models. Toward this goal, we first need to apply the machine learning algorithm to well-understood models and see whether the outputs are consistent with our prior knowledge, which serves as the benchmark for this approach. In this work, we feed the computer data generated by the classical Monte Carlo simulation for the X Y model in frustrated triangular and union jack lattices, which has two order parameters and exhibits two phase transitions. We show that the outputs of the principal component analysis agree very well with our understanding of different orders in different phases, and the temperature dependences of the major components detect the nature and the locations of the phase transitions. Our work offers promise for using machine learning techniques to study sophisticated statistical models, and our results can be further improved by using principal component analysis with kernel tricks and the neural network method.
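As a rough illustration of how PCA can flag an ordered phase from raw spin configurations, the toy below uses synthetic XY configurations (ordered samples fluctuate around a common angle, disordered ones are uniform random) rather than Monte Carlo data for the frustrated triangular or union jack lattices studied in the entry; all sizes and noise levels are invented.

```python
# Toy illustration of PCA-based order detection for XY spins (not the
# frustrated lattices of the paper above).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
n_sites, n_samples = 100, 200

def configs(ordered):
    theta0 = rng.uniform(0, 2 * np.pi, size=(n_samples, 1))
    if ordered:
        theta = theta0 + rng.normal(scale=0.2, size=(n_samples, n_sites))
    else:
        theta = rng.uniform(0, 2 * np.pi, size=(n_samples, n_sites))
    # Feature vector: (cos, sin) of every site angle.
    return np.hstack([np.cos(theta), np.sin(theta)])

X = np.vstack([configs(True), configs(False)])
scores = PCA(n_components=2).fit_transform(X)
# Ordered samples spread on a large-radius ring in the PC1-PC2 plane
# (the components track the two magnetization components), while
# disordered samples collapse near the origin.
radius = np.hypot(scores[:, 0], scores[:, 1])
print(radius[:n_samples].mean(), radius[n_samples:].mean())
```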
Kong, Jessica; Giridharagopal, Rajiv; Harrison, Jeffrey S; Ginger, David S
2018-05-31
Correlating nanoscale chemical specificity with operational physics is a long-standing goal of functional scanning probe microscopy (SPM). We employ a data analytic approach combining multiple microscopy modes, using compositional information in infrared vibrational excitation maps acquired via photoinduced force microscopy (PiFM) with electrical information from conductive atomic force microscopy. We study a model polymer blend comprising insulating poly(methyl methacrylate) (PMMA) and semiconducting poly(3-hexylthiophene) (P3HT). We show that PiFM spectra are different from FTIR spectra, but can still be used to identify local composition. We use principal component analysis to extract statistically significant principal components and principal component regression to predict local current and identify local polymer composition. In doing so, we observe evidence of semiconducting P3HT within PMMA aggregates. These methods are generalizable to correlated SPM data and provide a meaningful technique for extracting complex compositional information that is impossible to measure with any one technique.
Dihedral angle principal component analysis of molecular dynamics simulations.
Altis, Alexandros; Nguyen, Phuong H; Hegger, Rainer; Stock, Gerhard
2007-06-28
It has recently been suggested by Mu et al. [Proteins 58, 45 (2005)] to use backbone dihedral angles instead of Cartesian coordinates in a principal component analysis of molecular dynamics simulations. Dihedral angles may be advantageous because internal coordinates naturally provide a correct separation of internal and overall motion, which was found to be essential for the construction and interpretation of the free energy landscape of a biomolecule undergoing large structural rearrangements. To account for the circular statistics of angular variables, a transformation from the space of dihedral angles {phi(n)} to the metric coordinate space {x(n)=cos phi(n),y(n)=sin phi(n)} was employed. To study the validity and the applicability of the approach, in this work the theoretical foundations underlying the dihedral angle principal component analysis (dPCA) are discussed. It is shown that the dPCA amounts to a one-to-one representation of the original angle distribution and that its principal components can readily be characterized by the corresponding conformational changes of the peptide. Furthermore, a complex version of the dPCA is introduced, in which N angular variables naturally lead to N eigenvalues and eigenvectors. Applying the methodology to the construction of the free energy landscape of decaalanine from a 300 ns molecular dynamics simulation, a critical comparison of the various methods is given.
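The transformation at the heart of dPCA is simple enough to sketch directly: each dihedral angle is mapped to its cosine and sine, and ordinary PCA is run on the resulting 2N metric coordinates. The code below is a minimal sketch under that description; the synthetic von Mises angles stand in for dihedrals extracted from an actual molecular dynamics trajectory, and the histogram-based free-energy estimate is a generic choice, not the paper's exact procedure.

```python
# Minimal dihedral-angle PCA (dPCA) sketch: map each angle phi_n to
# (cos phi_n, sin phi_n) and run ordinary PCA on the resulting 2N variables.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
n_frames, n_angles = 5000, 20
# Synthetic trajectory of backbone dihedrals (radians); real input would be
# angles extracted from a molecular dynamics trajectory.
phi = rng.vonmises(mu=0.0, kappa=2.0, size=(n_frames, n_angles))

X = np.hstack([np.cos(phi), np.sin(phi)])       # 2N metric coordinates
pca = PCA()
projections = pca.fit_transform(X)
print(pca.explained_variance_ratio_[:5])

# Free-energy landscape estimate along the first two dPCA components:
# -ln of the normalized histogram, in units of kT.
H, xedges, yedges = np.histogram2d(projections[:, 0], projections[:, 1],
                                   bins=50, density=True)
F = -np.log(H + 1e-12)
```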
Dihedral angle principal component analysis of molecular dynamics simulations
NASA Astrophysics Data System (ADS)
Altis, Alexandros; Nguyen, Phuong H.; Hegger, Rainer; Stock, Gerhard
2007-06-01
It has recently been suggested by Mu et al. [Proteins 58, 45 (2005)] to use backbone dihedral angles instead of Cartesian coordinates in a principal component analysis of molecular dynamics simulations. Dihedral angles may be advantageous because internal coordinates naturally provide a correct separation of internal and overall motion, which was found to be essential for the construction and interpretation of the free energy landscape of a biomolecule undergoing large structural rearrangements. To account for the circular statistics of angular variables, a transformation from the space of dihedral angles {φn} to the metric coordinate space {xn=cosφn,yn=sinφn} was employed. To study the validity and the applicability of the approach, in this work the theoretical foundations underlying the dihedral angle principal component analysis (dPCA) are discussed. It is shown that the dPCA amounts to a one-to-one representation of the original angle distribution and that its principal components can readily be characterized by the corresponding conformational changes of the peptide. Furthermore, a complex version of the dPCA is introduced, in which N angular variables naturally lead to N eigenvalues and eigenvectors. Applying the methodology to the construction of the free energy landscape of decaalanine from a 300ns molecular dynamics simulation, a critical comparison of the various methods is given.
Fleming, Brandon J.; LaMotte, Andrew E.; Sekellick, Andrew J.
2013-01-01
Hydrogeologic regions in the fractured rock area of Maryland were classified using geographic information system tools with principal components and cluster analyses. A study area consisting of the 8-digit Hydrologic Unit Code (HUC) watersheds with rivers that flow through the fractured rock area of Maryland and bounded by the Fall Line was further subdivided into 21,431 catchments from the National Hydrography Dataset Plus. The catchments were then used as a common hydrologic unit to compile relevant climatic, topographic, and geologic variables. A principal components analysis was performed on 10 input variables, and 4 principal components that accounted for 83 percent of the variability in the original data were identified. A subsequent cluster analysis grouped the catchments based on four principal component scores into six hydrogeologic regions. Two crystalline rock hydrogeologic regions, including large parts of the Washington, D.C. and Baltimore metropolitan regions that represent over 50 percent of the fractured rock area of Maryland, are distinguished by differences in recharge, Precipitation minus Potential Evapotranspiration, sand content in soils, and groundwater contributions to streams. This classification system will provide a georeferenced digital hydrogeologic framework for future investigations of groundwater availability in the fractured rock area of Maryland.
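The classification workflow described above (standardize catchment attributes, retain leading principal components, cluster the component scores) can be sketched generically as follows. The data are synthetic placeholders; only the counts of catchments, input variables, retained components, and clusters echo the entry, and k-means is an assumed stand-in for whichever clustering algorithm the study used.

```python
# Sketch of the region-classification workflow: standardize catchment
# attributes, keep the leading principal components, then cluster the scores.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
attributes = rng.normal(size=(21431, 10))       # catchments x input variables (synthetic)

Z = StandardScaler().fit_transform(attributes)
pca = PCA(n_components=4).fit(Z)                # four components retained, as in the study
scores = pca.transform(Z)
print("variance explained:", pca.explained_variance_ratio_.sum())

regions = KMeans(n_clusters=6, n_init=10, random_state=0).fit_predict(scores)
print(np.bincount(regions))                     # catchments per hydrogeologic region
```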
2014-01-01
Background: The chemical composition of aerosols and particle size distributions are the most significant factors affecting air quality. In particular, exposure to finer particles can cause short- and long-term effects on human health. In the present paper, PM10 (particulate matter with aerodynamic diameter lower than 10 μm), CO, NOx (NO and NO2), Benzene and Toluene trends monitored in six monitoring stations of Bari province are shown. The data set used was composed of bi-hourly means for all parameters (12 bi-hourly means per day for each parameter) and refers to the period from January 2005 to May 2007. The main aim of the paper is to provide a clear illustration of how large data sets from monitoring stations can give information about the number and nature of the pollutant sources, and mainly to assess the contribution of the traffic source to the PM10 concentration level by using multivariate statistical techniques such as Principal Component Analysis (PCA) and Absolute Principal Component Scores (APCS). Results: Comparing the night and day mean concentrations (per day) for each parameter, it was found that some parameters, such as CO, Benzene and Toluene, show a different night and day behaviour than PM10. This suggests that CO, Benzene and Toluene concentrations are mainly connected with transport systems, whereas PM10 is mostly influenced by different factors. The statistical techniques identified three recurrent sources, associated with vehicular traffic and particulate transport, covering over 90% of the variance. The contemporaneous analysis of gases and PM10 has allowed underlining the differences between the sources of these pollutants. Conclusions: The analysis of pollutant trends from large data sets and the application of multivariate statistical techniques such as PCA and APCS can give useful information about air quality and pollutant sources. This knowledge can provide useful guidance to environmental policies in order to reach the WHO recommended levels. PMID:24555534
Incorporating principal component analysis into air quality model evaluation
The efficacy of standard air quality model evaluation techniques is becoming compromised as the simulation periods continue to lengthen in response to ever increasing computing capacity. Accordingly, the purpose of this paper is to demonstrate a statistical approach called Princi...
Parallel auto-correlative statistics with VTK.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pebay, Philippe Pierre; Bennett, Janine Camille
2013-08-01
This report summarizes existing statistical engines in VTK and presents both the serial and parallel auto-correlative statistics engines. It is a sequel to [PT08, BPRT09b, PT09, BPT09, PT10], which studied the parallel descriptive, correlative, multi-correlative, principal component analysis, contingency, k-means, and order statistics engines. The ease of use of the new parallel auto-correlative statistics engine is illustrated by means of C++ code snippets, and algorithm verification is provided. This report justifies the design of the statistics engines with parallel scalability in mind, and provides scalability and speed-up analysis results for the auto-correlative statistics engine.
Morphological variation of 508 hatchling alligators from three lakes in north central Florida (Lakes Woodruff, Apopka, and Orange) was analyzed using multivariate statistics. Morphological variation was found among clutches as well as among lakes. Principal components analysis wa...
Stashenko, Elena E; Martínez, Jairo R; Ruíz, Carlos A; Arias, Ginna; Durán, Camilo; Salgar, William; Cala, Mónica
2010-01-01
Chromatographic (GC/flame ionization detection, GC/MS) and statistical analyses were applied to the study of essential oils and extracts obtained from flowers, leaves, and stems of Lippia origanoides plants, growing wild in different Colombian regions. Retention indices, mass spectra, and standard substances were used in the identification of 139 substances detected in these essential oils and extracts. Principal component analysis allowed L. origanoides classification into three chemotypes, characterized according to their essential oil major components. Alpha- and beta-phellandrenes, p-cymene, and limonene distinguished chemotype A; carvacrol and thymol were the distinctive major components of chemotypes B and C, respectively. Pinocembrin (5,7-dihydroxyflavanone) was found in L. origanoides chemotype A supercritical fluid (CO(2)) extract at a concentration of 0.83+/-0.03 mg/g of dry plant material, which makes this plant an interesting source of an important bioactive flavanone with diverse potential applications in cosmetic, food, and pharmaceutical products.
Asymptotics of empirical eigenstructure for high dimensional spiked covariance.
Wang, Weichen; Fan, Jianqing
2017-06-01
We derive the asymptotic distributions of the spiked eigenvalues and eigenvectors under a generalized and unified asymptotic regime, which takes into account the magnitude of spiked eigenvalues, sample size, and dimensionality. This regime allows high dimensionality and diverging eigenvalues and provides new insights into the roles that the leading eigenvalues, sample size, and dimensionality play in principal component analysis. Our results are a natural extension of those in Paul (2007) to a more general setting and solve the rates of convergence problems in Shen et al. (2013). They also reveal the biases of estimating leading eigenvalues and eigenvectors by using principal component analysis, and lead to a new covariance estimator for the approximate factor model, called shrinkage principal orthogonal complement thresholding (S-POET), that corrects the biases. Our results are successfully applied to outstanding problems in estimation of risks of large portfolios and false discovery proportions for dependent test statistics and are illustrated by simulation studies.
Asymptotics of empirical eigenstructure for high dimensional spiked covariance
Wang, Weichen
2017-01-01
We derive the asymptotic distributions of the spiked eigenvalues and eigenvectors under a generalized and unified asymptotic regime, which takes into account the magnitude of spiked eigenvalues, sample size, and dimensionality. This regime allows high dimensionality and diverging eigenvalues and provides new insights into the roles that the leading eigenvalues, sample size, and dimensionality play in principal component analysis. Our results are a natural extension of those in Paul (2007) to a more general setting and solve the rates of convergence problems in Shen et al. (2013). They also reveal the biases of estimating leading eigenvalues and eigenvectors by using principal component analysis, and lead to a new covariance estimator for the approximate factor model, called shrinkage principal orthogonal complement thresholding (S-POET), that corrects the biases. Our results are successfully applied to outstanding problems in estimation of risks of large portfolios and false discovery proportions for dependent test statistics and are illustrated by simulation studies. PMID:28835726
Contribution of botanical origin and sugar composition of honeys on the crystallization phenomenon.
Escuredo, Olga; Dobre, Irina; Fernández-González, María; Seijo, M Carmen
2014-04-15
The present work provides information regarding the statistical relationships among the palynological characteristics, sugars (fructose, glucose, sucrose, melezitose and maltose), moisture content and sugar ratios (F+G, F/G and G/W) of 136 different honey types (including bramble, chestnut, eucalyptus, heather, acacia, lime, rape, sunflower and honeydew). Results of the statistical analyses (multiple comparison Bonferroni test, Spearman rank correlations and principal components) revealed the significant influence of the botanical origin on the sugar ratios (F+G, F/G and G/W). Brassica napus and Helianthus annuus pollen were the variables situated near the F+G and G/W ratios, while Castanea sativa, Rubus and Eucalyptus pollen were located further away, as shown in the principal component analysis. The F/G ratio of sunflower, rape and lime honeys was lower than that found for the chestnut, eucalyptus, heather, acacia and honeydew honeys (>1.4). A lower F/G ratio and lower water content were related to faster crystallization of the honey. Copyright © 2013 Elsevier Ltd. All rights reserved.
Buonaccorsi, G A; Rose, C J; O'Connor, J P B; Roberts, C; Watson, Y; Jackson, A; Jayson, G C; Parker, G J M
2010-01-01
Clinical trials of anti-angiogenic and vascular-disrupting agents often use biomarkers derived from DCE-MRI, typically reporting whole-tumor summary statistics and so overlooking spatial parameter variations caused by tissue heterogeneity. We present a data-driven segmentation method comprising tracer-kinetic model-driven registration for motion correction, conversion from MR signal intensity to contrast agent concentration for cross-visit normalization, iterative principal components analysis for imputation of missing data and dimensionality reduction, and statistical outlier detection using the minimum covariance determinant to obtain a robust Mahalanobis distance. After applying these techniques we cluster in the principal components space using k-means. We present results from a clinical trial of a VEGF inhibitor, using time-series data selected because of problems due to motion and outlier time series. We obtained spatially-contiguous clusters that map to regions with distinct microvascular characteristics. This methodology has the potential to uncover localized effects in trials using DCE-MRI-based biomarkers.
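Parts of the segmentation pipeline above can be sketched generically: PCA on voxel time-series features, robust Mahalanobis distances from the minimum covariance determinant to screen outliers, and k-means in the principal component space. The registration, concentration conversion, and iterative-PCA imputation steps are omitted here, and the data, component count, chi-square cutoff, and cluster count are assumptions.

```python
# Sketch of the outlier-screening and clustering steps: PCA on voxel
# time-series features, robust Mahalanobis distances via the minimum
# covariance determinant, then k-means on the retained component scores.
import numpy as np
from scipy.stats import chi2
from sklearn.decomposition import PCA
from sklearn.covariance import MinCovDet
from sklearn.cluster import KMeans

rng = np.random.default_rng(5)
voxels = rng.normal(size=(3000, 40))            # voxels x time points (synthetic)

scores = PCA(n_components=5).fit_transform(voxels)

mcd = MinCovDet(random_state=0).fit(scores)
d2 = mcd.mahalanobis(scores)                    # squared robust distances
keep = d2 < chi2.ppf(0.975, df=scores.shape[1]) # drop outlying time series

labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(scores[keep])
print(keep.sum(), "voxels kept;", np.bincount(labels))
```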
Relevant principal component analysis applied to the characterisation of Portuguese heather honey.
Martins, Rui C; Lopes, Victor V; Valentão, Patrícia; Carvalho, João C M F; Isabel, Paulo; Amaral, Maria T; Batista, Maria T; Andrade, Paula B; Silva, Branca M
2008-01-01
The main purpose of this study was the characterisation of 'Serra da Lousã' heather honey by using novel statistical methodology, relevant principal component analysis, in order to assess the correlations between production year, locality and composition. Herein, we also report its chemical composition in terms of sugars, glycerol and ethanol, and physicochemical parameters. Sugars profiles from 'Serra da Lousã' heather and 'Terra Quente de Trás-os-Montes' lavender honeys were compared and allowed the discrimination: 'Serra da Lousã' honeys do not contain sucrose, generally exhibit lower contents of turanose, trehalose and maltose and higher contents of fructose and glucose. Different localities from 'Serra da Lousã' provided groups of samples with high and low glycerol contents. Glycerol and ethanol contents were revealed to be independent of the sugars profiles. These data and statistical models can be very useful in the comparison and detection of adulterations during the quality control analysis of 'Serra da Lousã' honey.
Donato, Gianluca; Bartlett, Marian Stewart; Hager, Joseph C.; Ekman, Paul; Sejnowski, Terrence J.
2010-01-01
The Facial Action Coding System (FACS) [23] is an objective method for quantifying facial movement in terms of component actions. This system is widely used in behavioral investigations of emotion, cognitive processes, and social interaction. The coding is presently performed by highly trained human experts. This paper explores and compares techniques for automatically recognizing facial actions in sequences of images. These techniques include analysis of facial motion through estimation of optical flow; holistic spatial analysis, such as principal component analysis, independent component analysis, local feature analysis, and linear discriminant analysis; and methods based on the outputs of local filters, such as Gabor wavelet representations and local principal components. Performance of these systems is compared to naive and expert human subjects. Best performances were obtained using the Gabor wavelet representation and the independent component representation, both of which achieved 96 percent accuracy for classifying 12 facial actions of the upper and lower face. The results provide converging evidence for the importance of using local filters, high spatial frequencies, and statistical independence for classifying facial actions. PMID:21188284
Salvatore, Stefania; Røislien, Jo; Baz-Lomba, Jose A; Bramness, Jørgen G
2017-03-01
Wastewater-based epidemiology is an alternative method for estimating the collective drug use in a community. We applied functional data analysis, a statistical framework developed for analysing curve data, to investigate weekly temporal patterns in wastewater measurements of three prescription drugs with known abuse potential: methadone, oxazepam and methylphenidate, comparing them to positive and negative control drugs. Sewage samples were collected in February 2014 from a wastewater treatment plant in Oslo, Norway. The weekly pattern of each drug was extracted by fitting of generalized additive models, using trigonometric functions to model the cyclic behaviour. From the weekly component, the main temporal features were then extracted using functional principal component analysis. Results are presented through the functional principal components (FPCs) and corresponding FPC scores. Clinically, the most important weekly feature of the wastewater-based epidemiology data was the second FPC, representing the difference between average midweek level and a peak during the weekend, representing possible recreational use of a drug in the weekend. Estimated scores on this FPC indicated recreational use of methylphenidate, with a high weekend peak, but not for methadone and oxazepam. The functional principal component analysis uncovered clinically important temporal features of the weekly patterns of the use of prescription drugs detected from wastewater analysis. This may be used as a post-marketing surveillance method to monitor prescription drugs with abuse potential. Copyright © 2016 John Wiley & Sons, Ltd.
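The analysis above fits weekly cycles with trigonometric terms in a generalized additive model and then extracts dominant temporal features by functional PCA. The sketch below is a deliberately simplified stand-in: ordinary least-squares harmonic regression per week followed by PCA of the fitted weekly curves. The daily loads, number of harmonics, and weekend-amplitude construction are invented for illustration, and in this toy the weekend contrast appears in the leading component rather than the second, as reported in the study.

```python
# Simplified stand-in for the functional PCA step: fit each week of daily
# loads with a small trigonometric basis, then run PCA on the fitted curves.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(6)
n_weeks, days = 26, np.arange(7)
weekend = np.isin(days, [5, 6]).astype(float)
weekend_amp = rng.uniform(0.0, 4.0, size=(n_weeks, 1))      # varying weekend bump
loads = 10 + weekend_amp * weekend + rng.normal(scale=0.5, size=(n_weeks, 7))

# Weekly-period design matrix: intercept plus two harmonics.
t = 2 * np.pi * days / 7
B = np.column_stack([np.ones(7), np.cos(t), np.sin(t), np.cos(2 * t), np.sin(2 * t)])
coef, *_ = np.linalg.lstsq(B, loads.T, rcond=None)
smooth_weeks = (B @ coef).T                                  # fitted weekly curves

fpca = PCA(n_components=2)
scores = fpca.fit_transform(smooth_weeks)
print(fpca.explained_variance_ratio_)
# In this toy the leading component tracks the weekend peak (up to sign);
# its per-week scores play the role of the paper's weekend-related FPC scores.
print(np.corrcoef(scores[:, 0], weekend_amp.ravel())[0, 1])
```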
Espeland, Mark A; Bray, George A; Neiberg, Rebecca; Rejeski, W Jack; Knowler, William C; Lang, Wei; Cheskin, Lawrence J; Williamson, Don; Lewis, C Beth; Wing, Rena
2009-10-01
To demonstrate how principal components analysis can be used to describe patterns of weight changes in response to an intensive lifestyle intervention. Principal components analysis was applied to monthly percent weight changes measured on 2,485 individuals enrolled in the lifestyle arm of the Action for Health in Diabetes (Look AHEAD) clinical trial. These individuals were 45 to 75 years of age, with type 2 diabetes and body mass indices greater than 25 kg/m(2). Associations between baseline characteristics and weight loss patterns were described using analyses of variance. Three components collectively accounted for 97.0% of total intrasubject variance: a gradually decelerating weight loss (88.8%), early versus late weight loss (6.6%), and a mid-year trough (1.6%). In agreement with previous reports, each of the baseline characteristics we examined had statistically significant relationships with weight loss patterns. As examples, males tended to have a steeper trajectory of percent weight loss and to lose weight more quickly than women. Individuals with higher hemoglobin A(1c) (glycosylated hemoglobin; HbA(1c)) tended to have a flatter trajectory of percent weight loss and to have mid-year troughs in weight loss compared to those with lower HbA(1c). Principal components analysis provided a coherent description of characteristic patterns of weight changes and is a useful vehicle for identifying their correlates and potentially for predicting weight control outcomes.
Catanuto, Giuseppe; Taher, Wafa; Rocco, Nicola; Catalano, Francesca; Allegra, Dario; Milotta, Filippo Luigi Maria; Stanco, Filippo; Gallo, Giovanni; Nava, Maurizio Bruno
2018-03-20
Breast shape is defined utilizing mainly qualitative assessment (full, flat, ptotic) or estimates, such as volume or distances between reference points, that cannot describe it reliably. We will quantitatively describe breast shape with two parameters derived from a statistical methodology denominated principal component analysis (PCA). We created a heterogeneous dataset of breast shapes acquired with a commercial infrared 3-dimensional scanner on which PCA was performed. We plotted on a Cartesian plane the two highest values of PCA for each breast (principal components 1 and 2). Testing of the methodology on a preoperative and postoperative surgical case and test-retest was performed by two operators. The first two principal components derived from PCA are able to characterize the shape of the breast included in the dataset. The test-retest demonstrated that different operators are able to obtain very similar values of PCA. The system is also able to identify major changes in the preoperative and postoperative stages of a two-stage reconstruction. Even minor changes were correctly detected by the system. This methodology can reliably describe the shape of a breast. An expert operator and a newly trained operator can reach similar results in a test/re-testing validation. Once developed and after further validation, this methodology could be employed as a good tool for outcome evaluation, auditing, and benchmarking.
Mansouri, Majdi; Nounou, Mohamed N; Nounou, Hazem N
2017-09-01
In our previous work, we have demonstrated the effectiveness of the linear multiscale principal component analysis (PCA)-based moving window (MW)-generalized likelihood ratio test (GLRT) technique over the classical PCA and multiscale principal component analysis (MSPCA)-based GLRT methods. The developed fault detection algorithm provided optimal properties by maximizing the detection probability for a particular false alarm rate (FAR) with different values of windows, and however, most real systems are nonlinear, which make the linear PCA method not able to tackle the issue of non-linearity to a great extent. Thus, in this paper, first, we apply a nonlinear PCA to obtain an accurate principal component of a set of data and handle a wide range of nonlinearities using the kernel principal component analysis (KPCA) model. The KPCA is among the most popular nonlinear statistical methods. Second, we extend the MW-GLRT technique to one that utilizes exponential weights to residuals in the moving window (instead of equal weightage) as it might be able to further improve fault detection performance by reducing the FAR using exponentially weighed moving average (EWMA). The developed detection method, which is called EWMA-GLRT, provides improved properties, such as smaller missed detection and FARs and smaller average run length. The idea behind the developed EWMA-GLRT is to compute a new GLRT statistic that integrates current and previous data information in a decreasing exponential fashion giving more weight to the more recent data. This provides a more accurate estimation of the GLRT statistic and provides a stronger memory that will enable better decision making with respect to fault detection. Therefore, in this paper, a KPCA-based EWMA-GLRT method is developed and utilized in practice to improve fault detection in biological phenomena modeled by S-systems and to enhance monitoring process mean. The idea behind a KPCA-based EWMA-GLRT fault detection algorithm is to combine the advantages brought forward by the proposed EWMA-GLRT fault detection chart with the KPCA model. Thus, it is used to enhance fault detection of the Cad System in E. coli model through monitoring some of the key variables involved in this model such as enzymes, transport proteins, regulatory proteins, lysine, and cadaverine. The results demonstrate the effectiveness of the proposed KPCA-based EWMA-GLRT method over Q , GLRT, EWMA, Shewhart, and moving window-GLRT methods. The detection performance is assessed and evaluated in terms of FAR, missed detection rates, and average run length (ARL 1 ) values.
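To make the monitoring idea above concrete without reproducing the authors' EWMA-GLRT derivation, the following is an illustrative sketch only: fit kernel PCA on fault-free data, compute a simple Hotelling-type statistic in the kernel principal component space for new samples, and smooth it with an exponentially weighted moving average before comparing against a threshold estimated from the training statistics. The data, kernel parameters, smoothing constant, and quantile threshold are all assumptions.

```python
# Illustrative KPCA-based monitoring sketch (not the paper's EWMA-GLRT).
import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(7)
train = rng.normal(size=(500, 6))                       # normal operating data
test = np.vstack([rng.normal(size=(100, 6)),
                  rng.normal(loc=1.5, size=(100, 6))])  # fault after sample 100

kpca = KernelPCA(n_components=4, kernel="rbf", gamma=0.1).fit(train)
T_train = kpca.transform(train)
var = T_train.var(axis=0)

def t2(scores):                                         # Hotelling-type statistic
    return ((scores ** 2) / var).sum(axis=1)

threshold = np.quantile(t2(T_train), 0.99)
lam, ewma, alarms = 0.2, t2(T_train).mean(), []
for stat in t2(kpca.transform(test)):
    ewma = lam * stat + (1 - lam) * ewma                # exponential weighting of the statistic
    alarms.append(ewma > threshold)
print("first alarm at test sample", int(np.argmax(alarms)))
```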
How Many Separable Sources? Model Selection In Independent Components Analysis
Woods, Roger P.; Hansen, Lars Kai; Strother, Stephen
2015-01-01
Unlike mixtures consisting solely of non-Gaussian sources, mixtures including two or more Gaussian components cannot be separated using standard independent components analysis methods that are based on higher order statistics and independent observations. The mixed Independent Components Analysis/Principal Components Analysis (mixed ICA/PCA) model described here accommodates one or more Gaussian components in the independent components analysis model and uses principal components analysis to characterize contributions from this inseparable Gaussian subspace. Information theory can then be used to select from among potential model categories with differing numbers of Gaussian components. Based on simulation studies, the assumptions and approximations underlying the Akaike Information Criterion do not hold in this setting, even with a very large number of observations. Cross-validation is a suitable, though computationally intensive alternative for model selection. Application of the algorithm is illustrated using Fisher's iris data set and Howells' craniometric data set. Mixed ICA/PCA is of potential interest in any field of scientific investigation where the authenticity of blindly separated non-Gaussian sources might otherwise be questionable. Failure of the Akaike Information Criterion in model selection also has relevance in traditional independent components analysis where all sources are assumed non-Gaussian. PMID:25811988
Zhang, Jian; Hou, Dibo; Wang, Ke; Huang, Pingjie; Zhang, Guangxin; Loáiciga, Hugo
2017-05-01
The detection of organic contaminants in water distribution systems is essential to protect public health from potentially harmful compounds resulting from accidental spills or intentional releases. Existing methods for detecting organic contaminants are based on quantitative analyses such as chemical testing and gas/liquid chromatography, which are time- and reagent-consuming and involve costly maintenance. This study proposes a novel procedure based on the discrete wavelet transform and principal component analysis for detecting organic contamination events from ultraviolet spectral data. Firstly, the spectrum of each observation is transformed using a discrete wavelet transform with a coiflet mother wavelet to capture the abrupt change along the wavelength. Principal component analysis is then employed to approximate the spectra based on capture and fusion features. The significant value of Hotelling's T² statistic is calculated and used to detect outliers. A contamination event alarm is triggered by sequential Bayesian analysis when the outliers appear continuously in several observations. The effectiveness of the proposed procedure is tested on-line using a pilot-scale setup and experimental data.
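A rough sketch of that detection chain follows: coiflet wavelet decomposition of each spectrum, PCA on the concatenated coefficients, Hotelling's T² against a baseline threshold, and an alarm when outliers persist over consecutive observations. The consecutive-outlier rule here is a simple stand-in for the paper's sequential Bayesian analysis, and the spectra, wavelet choice ("coif1", level 3), component count, and thresholds are assumptions.

```python
# Sketch of the DWT + PCA + T^2 detection chain on synthetic UV spectra.
import numpy as np
import pywt
from scipy.stats import chi2
from sklearn.decomposition import PCA

rng = np.random.default_rng(8)
wavelengths = 256
baseline = rng.normal(size=(200, wavelengths))                      # clean-water spectra
stream = np.vstack([rng.normal(size=(30, wavelengths)),
                    rng.normal(loc=0.8, size=(30, wavelengths))])   # event from observation 30

def dwt_features(spectra):
    # Concatenate all coefficient levels of a coiflet decomposition.
    return np.array([np.concatenate(pywt.wavedec(s, "coif1", level=3))
                     for s in spectra])

pca = PCA(n_components=5).fit(dwt_features(baseline))
var = pca.transform(dwt_features(baseline)).var(axis=0)

t2 = (pca.transform(dwt_features(stream)) ** 2 / var).sum(axis=1)
outlier = t2 > chi2.ppf(0.995, df=5)
# Trigger only when three consecutive observations are outlying.
alarm = np.convolve(outlier, np.ones(3), mode="valid") == 3
print("alarm raised at observation", int(np.argmax(alarm)) + 2)
```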
Evaluation of Meteorite Amino Acid Analysis Data Using Multivariate Techniques
NASA Technical Reports Server (NTRS)
McDonald, G.; Storrie-Lombardi, M.; Nealson, K.
1999-01-01
The amino acid distributions in the Murchison carbonaceous chondrite, Mars meteorite ALH84001, and ice from the Allan Hills region of Antarctica are shown, using a multivariate technique known as Principal Component Analysis (PCA), to be statistically distinct from the average amino acid composition of 101 terrestrial protein superfamilies.
USDA-ARS?s Scientific Manuscript database
Six methods were compared with respect to spectral fingerprinting of a well-characterized series of broccoli samples. Spectral fingerprints were acquired for finely-powdered solid samples using Fourier transform-infrared (IR) and Fourier transform-near infrared (NIR) spectrometry and for aqueous met...
Shahlaei, Mohsen; Sabet, Razieh; Ziari, Maryam Bahman; Moeinifard, Behzad; Fassihi, Afshin; Karbakhsh, Reza
2010-10-01
Quantitative relationships between molecular structure and methionine aminopeptidase-2 inhibitory activity of a series of cytotoxic anthranilic acid sulfonamide derivatives were discovered. We have demonstrated the detailed application of two efficient nonlinear methods for evaluation of quantitative structure-activity relationships of the studied compounds. Components produced by principal component analysis were used as input to the developed nonlinear models. The performance of the developed models, namely PC-GRNN and PC-LS-SVM, was tested by several validation methods. The resulting PC-LS-SVM model had a high statistical quality (R(2)=0.91 and R(CV)(2)=0.81) for predicting the cytotoxic activity of the compounds. Comparison between the predictability of PC-GRNN and PC-LS-SVM indicates that the latter method has a higher ability to predict the activity of the studied molecules. Copyright (c) 2010 Elsevier Masson SAS. All rights reserved.
Color enhancement of landsat agricultural imagery: JPL LACIE image processing support task
NASA Technical Reports Server (NTRS)
Madura, D. P.; Soha, J. M.; Green, W. B.; Wherry, D. B.; Lewis, S. D.
1978-01-01
Color enhancement techniques were applied to LACIE LANDSAT segments to determine if such enhancement can assist analysis in crop identification. The procedure involved increasing the color range by removing correlation between components. First, a principal component transformation was performed, followed by contrast enhancement to equalize component variances, followed by an inverse transformation to restore familiar color relationships. Filtering was applied to lower order components to reduce color speckle in the enhanced products. Use of single acquisition and multiple acquisition statistics to control the enhancement were compared, and the effects of normalization investigated. Evaluation is left to LACIE personnel.
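The decorrelation-style enhancement described above (principal component transform, contrast enhancement to equalize component variances, inverse transform) can be sketched compactly. The sketch below uses plain variance equalization on a synthetic three-band image and omits the component filtering and the single- versus multiple-acquisition statistics discussed in the entry.

```python
# Decorrelation-stretch sketch: rotate band values into principal components,
# scale each component to equal variance, then rotate back so the enhanced
# image keeps familiar color relationships.
import numpy as np

rng = np.random.default_rng(9)
rows, cols, bands = 128, 128, 3
base = rng.normal(size=(rows, cols, 1))
image = np.concatenate([base + 0.1 * rng.normal(size=(rows, cols, 1))
                        for _ in range(bands)], axis=2)   # highly correlated bands

pixels = image.reshape(-1, bands)
mean = pixels.mean(axis=0)
cov = np.cov(pixels - mean, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)

pcs = (pixels - mean) @ eigvecs                 # forward principal component transform
pcs /= pcs.std(axis=0)                          # equalize component variances
stretched = pcs @ eigvecs.T + mean              # inverse transform to band space
enhanced = stretched.reshape(rows, cols, bands)
print(np.corrcoef(enhanced.reshape(-1, bands), rowvar=False).round(2))  # near identity
```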
Investigation of domain walls in PPLN by confocal raman microscopy and PCA analysis
NASA Astrophysics Data System (ADS)
Shur, Vladimir Ya.; Zelenovskiy, Pavel; Bourson, Patrice
2017-07-01
Confocal Raman microscopy (CRM) is a powerful tool for investigation of ferroelectric domains. Mechanical stresses and electric fields existing in the vicinity of neutral and charged domain walls modify the frequency, intensity and width of spectral lines [1], thus allowing visualization of micro- and nanodomain structures both at the surface and in the bulk of the crystal [2,3]. Stresses and fields are naturally coupled in ferroelectrics due to the inverse piezoelectric effect and can hardly be separated in Raman spectra. PCA is a powerful statistical method for the analysis of large data matrices, providing a set of orthogonal variables called principal components (PCs). PCA is widely used for classification of experimental data, for example, in crystallization experiments, for detection of small amounts of components in solid mixtures, etc. [4,5]. In Raman spectroscopy PCA has been applied to the analysis of phase transitions and provided the critical pressure with good accuracy [6]. In the present work we applied, for the first time, the Principal Component Analysis (PCA) method to Raman spectra measured in periodically poled lithium niobate (PPLN). We found that the principal components demonstrate different sensitivity to mechanical stresses and electric fields in the vicinity of the domain walls. This allowed us to separately visualize the spatial distributions of mechanical stresses and electric fields at the surface and in the bulk of PPLN.
ERIC Educational Resources Information Center
Yung-Kuan, Chan; Hsieh, Ming-Yuan; Lee, Chin-Feng; Huang, Chih-Cheng; Ho, Li-Chih
2017-01-01
Under the hyper-dynamic education situation, this research, in order to comprehensively explore the interplays between Teacher Competence Demands (TCD) and Learning Organization Requests (LOR), cross-employs the data refinement methods of Descriptive Statistics (DS), Analysis of Variance (ANOVA) and Principal Components Analysis (PCA)…
A Psychometric Investigation of the Marlowe-Crowne Social Desirability Scale Using Rasch Measurement
ERIC Educational Resources Information Center
Seol, Hyunsoo
2007-01-01
The author used Rasch measurement to examine the reliability and validity of 382 Korean university students' scores on the Marlowe-Crowne Social Desirability Scale (MCSDS; D. P. Crowne and D. Marlowe, 1960). Results revealed that item-fit statistics and principal component analysis with standardized residuals provide evidence of MCSDS'…
Principal component analysis of the cytokine and chemokine response to human traumatic brain injury.
Helmy, Adel; Antoniades, Chrystalina A; Guilfoyle, Mathew R; Carpenter, Keri L H; Hutchinson, Peter J
2012-01-01
There is a growing realisation that neuro-inflammation plays a fundamental role in the pathology of Traumatic Brain Injury (TBI). This has led to the search for biomarkers that reflect these underlying inflammatory processes using techniques such as cerebral microdialysis. The interpretation of such biomarker data has been limited by the statistical methods used. When analysing data of this sort, the multiple putative interactions between mediators need to be considered, as well as the timing of production and the high degree of statistical co-variance in the levels of these mediators. Here we present a cytokine and chemokine dataset from human brain following traumatic brain injury and use principal component analysis and partial least squares discriminant analysis to demonstrate the pattern of production following TBI, distinct phases of the humoral inflammatory response and the differing patterns of response in brain and in peripheral blood. This technique has the added advantage of making no assumptions about the Relative Recovery (RR) of microdialysis-derived parameters. Taken together, these techniques can be used in complex microdialysis datasets to summarise the data succinctly and generate hypotheses for future study.
Tchabo, William; Ma, Yongkun; Kwaw, Emmanuel; Zhang, Haining; Xiao, Lulu; Apaliya, Maurice T
2018-01-15
The four different methods of color measurement of wine proposed by Boulton, Giusti, Glories and the Commission Internationale de l'Eclairage (CIE) were applied to assess the statistical relationship between the phytochemical profile and chromatic characteristics of sulfur dioxide-free mulberry (Morus nigra) wine submitted to non-thermal maturation processes. The alterations in chromatic properties and phenolic composition of non-thermally aged mulberry wine were examined, aided by the use of Pearson correlation, cluster and principal component analyses. The results revealed a positive effect of the non-thermal processes on the phytochemical families of the wines. From the Pearson correlation analysis, relationships between chromatic indexes and flavonols as well as anthocyanins were established. Cluster analysis highlighted similarities between the Boulton and Giusti parameters, as well as the Glories and CIE parameters, in the assessment of the chromatic properties of the wines. Finally, principal component analysis was able to discriminate wines subjected to different maturation techniques on the basis of their chromatic and phenolic characteristics. Copyright © 2017. Published by Elsevier Ltd.
Understanding software faults and their role in software reliability modeling
NASA Technical Reports Server (NTRS)
Munson, John C.
1994-01-01
This study is a direct result of an on-going project to model the reliability of a large real-time control avionics system. In previous modeling efforts with this system, hardware reliability models were applied in modeling the reliability behavior of this system. In an attempt to enhance the performance of the adapted reliability models, certain software attributes were introduced in these models to control for differences between programs and also sequential executions of the same program. As the basic nature of the software attributes that affect software reliability become better understood in the modeling process, this information begins to have important implications on the software development process. A significant problem arises when raw attribute measures are to be used in statistical models as predictors, for example, of measures of software quality. This is because many of the metrics are highly correlated. Consider the two attributes: lines of code, LOC, and number of program statements, Stmts. In this case, it is quite obvious that a program with a high value of LOC probably will also have a relatively high value of Stmts. In the case of low level languages, such as assembly language programs, there might be a one-to-one relationship between the statement count and the lines of code. When there is a complete absence of linear relationship among the metrics, they are said to be orthogonal or uncorrelated. Usually the lack of orthogonality is not serious enough to affect a statistical analysis. However, for the purposes of some statistical analysis such as multiple regression, the software metrics are so strongly interrelated that the regression results may be ambiguous and possibly even misleading. Typically, it is difficult to estimate the unique effects of individual software metrics in the regression equation. The estimated values of the coefficients are very sensitive to slight changes in the data and to the addition or deletion of variables in the regression equation. Since most of the existing metrics have common elements and are linear combinations of these common elements, it seems reasonable to investigate the structure of the underlying common factors or components that make up the raw metrics. The technique we have chosen to use to explore this structure is a procedure called principal components analysis. Principal components analysis is a decomposition technique that may be used to detect and analyze collinearity in software metrics. When confronted with a large number of metrics measuring a single construct, it may be desirable to represent the set by some smaller number of variables that convey all, or most, of the information in the original set. Principal components are linear transformations of a set of random variables that summarize the information contained in the variables. The transformations are chosen so that the first component accounts for the maximal amount of variation of the measures of any possible linear transform; the second component accounts for the maximal amount of residual variation; and so on. The principal components are constructed so that they represent transformed scores on dimensions that are orthogonal. Through the use of principal components analysis, it is possible to have a set of highly related software attributes mapped into a small number of uncorrelated attribute domains. This definitively solves the problem of multi-collinearity in subsequent regression analysis. 
There are many software metrics in the literature, but principal component analysis reveals that there are few distinct sources of variation, i.e. dimensions, in this set of metrics. It would appear perfectly reasonable to characterize the measurable attributes of a program with a simple function of a small number of orthogonal metrics each of which represents a distinct software attribute domain.
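The multicollinearity argument above lends itself to a short sketch: strongly correlated raw metrics (for example, lines of code and statement count) collapse onto a small number of orthogonal principal components, which can then serve as uncorrelated predictors in a regression on a quality measure. The metric values, fault counts, and the choice of two retained domains below are synthetic assumptions, not data from the avionics system discussed in the entry.

```python
# Sketch of mapping collinear software metrics onto a few orthogonal domains
# and regressing a quality measure on the component scores.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(10)
n_modules = 300
size = rng.lognormal(mean=5, sigma=0.6, size=n_modules)
metrics = np.column_stack([
    size,                                                  # lines of code (LOC)
    0.8 * size + rng.normal(scale=20, size=n_modules),     # statement count (Stmts)
    0.02 * size + rng.normal(scale=2, size=n_modules),     # a complexity-like metric
])
faults = 0.01 * size + rng.poisson(1, size=n_modules)      # toy quality measure

Z = StandardScaler().fit_transform(metrics)
pca = PCA().fit(Z)
print("variance by component:", pca.explained_variance_ratio_.round(2))  # few dominant dimensions

# Regress the quality measure on the orthogonal component scores.
domains = pca.transform(Z)[:, :2]
print(LinearRegression().fit(domains, faults).coef_)
```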
Lindsell, Christopher J.; Welty, Leah J.; Mazumdar, Madhu; Thurston, Sally W.; Rahbar, Mohammad H.; Carter, Rickey E.; Pollock, Bradley H.; Cucchiara, Andrew J.; Kopras, Elizabeth J.; Jovanovic, Borko D.; Enders, Felicity T.
2014-01-01
Introduction: Statistics is an essential training component for a career in clinical and translational science (CTS). Given the increasing complexity of statistics, learners may have difficulty selecting appropriate courses. Our question was: what depth of statistical knowledge do different CTS learners require? Methods: For three types of CTS learners (principal investigator, co-investigator, informed reader of the literature), each with different backgrounds in research (no previous research experience, reader of the research literature, previous research experience), 18 experts in biostatistics, epidemiology, and research design proposed levels for 21 statistical competencies. Results: Statistical competencies were categorized as fundamental, intermediate, or specialized. CTS learners who intend to become independent principal investigators require more specialized training, while those intending to become informed consumers of the medical literature require more fundamental education. For most competencies, less training was proposed for those with more research background. Discussion: When selecting statistical coursework, the learner's research background and career goal should guide the decision. Some statistical competencies are considered to be more important than others. Baseline knowledge assessments may help learners identify appropriate coursework. Conclusion: Rather than one size fits all, tailoring education to baseline knowledge, learner background, and future goals increases learning potential while minimizing classroom time. PMID:25212569
Middleton, David A; Hughes, Eleri; Madine, Jillian
2004-08-11
We describe an NMR approach for detecting the interactions between phospholipid membranes and proteins, peptides, or small molecules. First, 1H-13C dipolar coupling profiles are obtained from hydrated lipid samples at natural isotope abundance using cross-polarization magic-angle spinning NMR methods. Principal component analysis of dipolar coupling profiles for synthetic lipid membranes in the presence of a range of biologically active additives reveals clusters that relate to different modes of interaction of the additives with the lipid bilayer. Finally, by representing profiles from multiple samples in the form of contour plots, it is possible to reveal statistically significant changes in dipolar couplings, which reflect perturbations in the lipid molecules at the membrane surface or within the hydrophobic interior.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Koch, C.D.; Pirkle, F.L.; Schmidt, J.S.
1981-01-01
A Principal Components Analysis (PCA) program has been written to aid in the interpretation of multivariate aerial radiometric data collected by the US Department of Energy (DOE) under the National Uranium Resource Evaluation (NURE) program. The variations exhibited by these data have been reduced and classified into a number of linear combinations by using the PCA program. The PCA program then generates histograms and outlier maps of the individual variates. Black and white plots can be made on a Calcomp plotter by the application of follow-up programs. All programs referred to in this guide were written for a DEC-10. From this analysis a geologist may begin to interpret the data structure. Insight into geological processes underlying the data may be obtained.
NASA Astrophysics Data System (ADS)
Vítková, Gabriela; Prokeš, Lubomír; Novotný, Karel; Pořízka, Pavel; Novotný, Jan; Všianský, Dalibor; Čelko, Ladislav; Kaiser, Jozef
2014-11-01
Focusing on the historical aspect, during archeological excavations or restoration works on buildings or other structures built from bricks, it is important to determine, preferably in situ and in real time, the locality of the bricks' origin. Fast classification of bricks on the basis of Laser-Induced Breakdown Spectroscopy (LIBS) spectra is possible using multivariate statistical methods; a combination of principal component analysis (PCA) and linear discriminant analysis (LDA) was applied in this case. LIBS was used to classify altogether 29 brick samples from 7 different localities. A comparative study using two different LIBS setups, stand-off and table-top, shows that stand-off LIBS has great potential for archeological in-field measurements.
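The PCA + LDA classification chain named in the entry above can be sketched generically: reduce each spectrum to a few principal component scores, then train a linear discriminant classifier on the scores with locality labels. The synthetic spectra, per-class signatures, and the choice of ten components below are illustrative assumptions only; the sample and class counts echo the entry.

```python
# Sketch of PCA + LDA classification of spectra by locality.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(11)
n_per_class, n_classes, channels = 20, 7, 500
# Synthetic spectra: each locality gets its own mean spectral signature.
signatures = rng.normal(size=(n_classes, channels))
spectra = np.vstack([sig + 0.3 * rng.normal(size=(n_per_class, channels))
                     for sig in signatures])
locality = np.repeat(np.arange(n_classes), n_per_class)

clf = make_pipeline(PCA(n_components=10), LinearDiscriminantAnalysis())
acc = cross_val_score(clf, spectra, locality, cv=5)
print("cross-validated accuracy:", acc.mean().round(2))
```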
Correcting for population structure and kinship using the linear mixed model: theory and extensions.
Hoffman, Gabriel E
2013-01-01
Population structure and kinship are widespread confounding factors in genome-wide association studies (GWAS). It has been standard practice to include principal components of the genotypes in a regression model in order to account for population structure. More recently, the linear mixed model (LMM) has emerged as a powerful method for simultaneously accounting for population structure and kinship. The statistical theory underlying the differences in empirical performance between modeling principal components as fixed versus random effects has not been thoroughly examined. We undertake an analysis to formalize the relationship between these widely used methods and elucidate the statistical properties of each. Moreover, we introduce a new statistic, effective degrees of freedom, that serves as a metric of model complexity and a novel low rank linear mixed model (LRLMM) to learn the dimensionality of the correction for population structure and kinship, and we assess its performance through simulations. A comparison of the results of LRLMM and a standard LMM analysis applied to GWAS data from the Multi-Ethnic Study of Atherosclerosis (MESA) illustrates how our theoretical results translate into empirical properties of the mixed model. Finally, the analysis demonstrates the ability of the LRLMM to substantially boost the strength of an association for HDL cholesterol in Europeans.
Using Structural Equation Modeling To Fit Models Incorporating Principal Components.
ERIC Educational Resources Information Center
Dolan, Conor; Bechger, Timo; Molenaar, Peter
1999-01-01
Considers models incorporating principal components from the perspectives of structural-equation modeling. These models include the following: (1) the principal-component analysis of patterned matrices; (2) multiple analysis of variance based on principal components; and (3) multigroup principal-components analysis. Discusses fitting these models…
NASA Astrophysics Data System (ADS)
Chavez, Roberto; Lozano, Sergio; Correia, Pedro; Sanz-Rodrigo, Javier; Probst, Oliver
2013-04-01
With the purpose of efficiently and reliably generating long-term wind resource maps for the wind energy industry, the application and verification of a statistical methodology for the climate downscaling of wind fields at surface level is presented in this work. The procedure is based on the combination of the Monte Carlo and Principal Component Analysis (PCA) statistical methods. First, the Monte Carlo method is used to create a large number of daily-based annual time series, so-called climate representative years, by stratified sampling of a 33-year time series corresponding to the available period of the NCAR/NCEP global reanalysis data set (R-2). Second, the representative years are evaluated so that the best set is chosen according to its capability to recreate the Sea Level Pressure (SLP) temporal and spatial fields from the R-2 data set. The measure of this correspondence is based on the Euclidean distance between the Empirical Orthogonal Function (EOF) spaces generated by the PCA decomposition of the SLP fields from the long-term and representative-year data sets. The methodology was verified by comparing the selected 365-day period against a 9-year period of wind fields generated by dynamically downscaling the Global Forecast System data with the mesoscale model SKIRON for the Iberian Peninsula. These results showed that, compared to the traditional method of dynamically downscaling a random 365-day period, the error in the average wind velocity obtained with the PCA-based representative year was reduced by almost 30%. Moreover, the Mean Absolute Errors (MAE) in the monthly and daily wind profiles were also reduced by almost 25% across all SKIRON grid points. The methodology yielded maximum errors in the mean wind speed of 0.8 m/s and maximum MAE in the monthly curves of 0.7 m/s. Beyond the bulk numbers, this work shows the spatial distribution of the errors across the Iberian domain and additional wind statistics such as velocity and directional frequency. Additional repetitions were performed to demonstrate the reliability and robustness of this kind of statistical-dynamical downscaling method.
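A simplified sketch of the EOF comparison step, assuming the distance is computed between the leading PCA loading patterns of the long-term and candidate SLP fields; the study's exact metric and preprocessing may differ, and all arrays are placeholders.

```python
# Compare the leading EOFs (PCA loadings) of the long-term SLP record with those
# of a candidate representative year via a Euclidean distance in EOF space.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
slp_longterm = rng.normal(size=(33 * 365, 200))   # days x grid points (placeholder)
slp_candidate = rng.normal(size=(365, 200))       # one candidate representative year

k = 5
eof_long = PCA(n_components=k).fit(slp_longterm).components_
eof_cand = PCA(n_components=k).fit(slp_candidate).components_

# The sign of an EOF is arbitrary, so align signs before measuring the distance.
signs = np.sign(np.sum(eof_long * eof_cand, axis=1))
distance = np.linalg.norm(eof_long - signs[:, None] * eof_cand)
print("EOF-space distance:", distance)
```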
Statistical downscaling of summer precipitation over northwestern South America
NASA Astrophysics Data System (ADS)
Palomino Lemus, Reiner; Córdoba Machado, Samir; Raquel Gámiz Fortis, Sonia; Castro Díez, Yolanda; Jesús Esteban Parra, María
2015-04-01
In this study a statistical downscaling (SD) model using Principal Component Regression (PCR) has been developed to simulate summer precipitation in Colombia during the period 1950-2005, and climate projections for the 2071-2100 period have been obtained by applying the resulting SD model. To this end, the Principal Components (PCs) of the SLP reanalysis data from NCEP were used as predictor variables, while the observed gridded summer precipitation was the predictand. The period 1950-1993 was used for calibration and 1994-2010 for validation. Bootstrap with replacement was applied to provide estimates of the statistical errors. All models perform reasonably well at regional scales, and the spatial distribution of the correlation coefficients between predicted and observed gridded precipitation shows high values (between 0.5 and 0.93) along the Andes range and in the north and northern Pacific of Colombia. Additionally, the ability of the MIROC5 GCM to simulate summer precipitation in Colombia for the present climate (1971-2005) has been analyzed by calculating the differences between simulated and observed precipitation. The simulation obtained with this GCM strongly overestimates precipitation along a horizontal sector through the center of Colombia, especially at the east and west of the country. However, the SD model applied to the SLP of the GCM reproduces the rainfall field faithfully. Finally, in order to obtain summer precipitation projections for Colombia for the period 1971-2100, the downscaling model, recalibrated for the total period 1950-2010, has been applied to the SLP output from the MIROC5 model under the RCP2.6, RCP4.5 and RCP8.5 scenarios. The changes estimated by the SD models are not significant under the RCP2.6 scenario, while for the RCP4.5 and RCP8.5 scenarios a significant increase in precipitation appears with respect to present values in all regions, reaching around 27% in the northern Colombia region under the RCP8.5 scenario. Keywords: Statistical downscaling, precipitation, Principal Component Regression, climate change, Colombia. ACKNOWLEDGEMENTS This work has been financed by the projects P11-RNM-7941 (Junta de Andalucía-Spain) and CGL2013-48539-R (MINECO-Spain, FEDER).
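A minimal principal component regression sketch in the spirit of the SD model described above, with the leading PCs of an SLP predictor field feeding a linear regression for gridded precipitation; the data, grid sizes and number of retained components are placeholders.

```python
# Principal component regression: PCA on the predictor field, then linear regression.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)
slp = rng.normal(size=(56, 300))        # 56 summers x 300 SLP grid points
precip = rng.normal(size=(56, 120))     # 56 summers x 120 precipitation grid cells

pcr = make_pipeline(StandardScaler(), PCA(n_components=8), LinearRegression())
pcr.fit(slp[:44], precip[:44])          # calibration period
# Placeholder data, so the score itself is not meaningful.
print("validation R^2:", pcr.score(slp[44:], precip[44:]))
```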
Saliba, Christopher M; Clouthier, Allison L; Brandon, Scott C E; Rainbow, Michael J; Deluzio, Kevin J
2018-05-29
Abnormal loading of the knee joint contributes to the pathogenesis of knee osteoarthritis. Gait retraining is a non-invasive intervention that aims to reduce knee loads by providing audible, visual, or haptic feedback of gait parameters. The computational expense of joint contact force prediction has limited real-time feedback to surrogate measures of the contact force, such as the knee adduction moment. We developed a method to predict knee joint contact forces using motion analysis and a statistical regression model that can be implemented in near real-time. Gait waveform variables were deconstructed using principal component analysis and a linear regression was used to predict the principal component scores of the contact force waveforms. Knee joint contact force waveforms were reconstructed using the predicted scores. We tested our method using a heterogeneous population of asymptomatic controls and subjects with knee osteoarthritis. The reconstructed contact force waveforms had mean (SD) RMS differences of 0.17 (0.05) bodyweight compared to the contact forces predicted by a musculoskeletal model. Our method successfully predicted subject-specific shape features of contact force waveforms and is a potentially powerful tool in biofeedback and clinical gait analysis.
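The sketch below illustrates the waveform-regression idea: PCA deconstructs gait and contact-force waveforms, a linear regression maps gait scores to force scores, and the force waveforms are reconstructed from the predicted scores. Array shapes, trial counts and component numbers are hypothetical.

```python
# PCA on waveforms + linear regression on principal component scores.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(5)
gait = rng.normal(size=(80, 505))       # 80 trials x concatenated gait waveform samples
force = rng.normal(size=(80, 101))      # 80 trials x contact-force waveform (% of stance)

pca_g, pca_f = PCA(n_components=10), PCA(n_components=5)
Xg = pca_g.fit_transform(gait)
Xf = pca_f.fit_transform(force)

reg = LinearRegression().fit(Xg, Xf)             # predict force PC scores from gait PC scores
force_hat = pca_f.inverse_transform(reg.predict(Xg))

rms = np.sqrt(np.mean((force_hat - force) ** 2, axis=1))
print("mean RMS difference:", rms.mean())
```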
Modeling vertebrate diversity in Oregon using satellite imagery
NASA Astrophysics Data System (ADS)
Cablk, Mary Elizabeth
Vertebrate diversity was modeled for the state of Oregon using a parametric approach to regression tree analysis. This exploratory data analysis effectively modeled the non-linear relationships between vertebrate richness and phenology, terrain, and climate. Phenology was derived from time-series NOAA-AVHRR satellite imagery for the year 1992 using two methods: principal component analysis and derivation of EROS Data Center greenness metrics. These two measures of spatial and temporal vegetation condition incorporated the critical temporal element in this analysis. The first three principal components were shown to contain spatial and temporal information about the landscape and discriminated phenologically distinct regions in Oregon. Principal components 2 and 3, 6 greenness metrics, elevation, slope, aspect, annual precipitation, and annual seasonal temperature difference were investigated as correlates to amphibians, birds, all vertebrates, reptiles, and mammals. The variation explained by the regression tree for each taxon was: amphibians (91%), birds (67%), all vertebrates (66%), reptiles (57%), and mammals (55%). Spatial statistics were used to quantify the pattern of each taxon and assess the validity of the resulting predictions from the regression tree models. Regression tree analysis was relatively robust against spatial autocorrelation in the response data, and graphical results indicated the models were well fit to the data.
Liu, Zechang; Wang, Liping; Liu, Yumei
2018-01-18
Hops impart flavor to beer, with the volatile components characterizing the various hop varieties and qualities. Fingerprinting, especially flavor fingerprinting, is often used to identify 'flavor products' because inconsistencies in the description of flavor may lead to an incorrect definition of beer quality. Compared to flavor fingerprinting, volatile fingerprinting is simpler and easier. We performed volatile fingerprinting using head space-solid phase micro-extraction gas chromatography-mass spectrometry combined with similarity analysis and principal component analysis (PCA) for evaluating and distinguishing between three major Chinese hops. Eighty-four volatiles were identified, which were classified into seven categories. Volatile fingerprinting based on similarity analysis did not yield any obvious result. By contrast, hop varieties and qualities were identified using volatile fingerprinting based on PCA. The potential variables explained the variance in the three hop varieties. In addition, the dendrogram and principal component score plot described the differences and classifications of hops. Volatile fingerprinting plus multivariate statistical analysis can rapidly differentiate between the different varieties and qualities of the three major Chinese hops. Furthermore, this method can be used as a reference in other fields. © 2018 Society of Chemical Industry.
Sel, İlker; Çakmakcı, Mehmet; Özkaya, Bestamin; Suphi Altan, H
2016-10-01
The main objective of this study was to develop a statistical model for easier and faster Biochemical Methane Potential (BMP) prediction of landfilled municipal solid waste by analyzing the waste composition of samples excavated from 12 sampling points and three waste depths, representing different landfilling ages in the closed and active sections of a sanitary landfill site located in İstanbul, Turkey. Results of Principal Component Analysis (PCA) were used as a decision support tool to evaluate and describe the waste composition variables. Four principal components were extracted, describing 76% of the data set variance. The most effective components for the data set were determined to be PCB, PO, T, D, W, FM, moisture and BMP. Multiple Linear Regression (MLR) models were built on both the original compositional data and the transformed data to determine differences. Although the residual plots were better for the transformed data, the R² and adjusted R² values were not improved significantly. The best preliminary BMP prediction models consisted of the D, W, T and FM waste fractions for both versions of the regressions. Adjusted R² values of the raw and transformed models were 0.69 and 0.57, respectively. Copyright © 2016 Elsevier Ltd. All rights reserved.
Forensic age estimation by morphometric analysis of the manubrium from 3D MR images.
Martínez Vera, Naira P; Höller, Johannes; Widek, Thomas; Neumayer, Bernhard; Ehammer, Thomas; Urschler, Martin
2017-08-01
Forensic age estimation research based on skeletal structures focuses on patterns of growth and development using different bones. In this work, our aim was to study the growth-related evolution of the manubrium in living adolescents and young adults using magnetic resonance imaging (MRI), an image acquisition modality that does not involve ionizing radiation. In a first step, individual manubrium and subject features were correlated with age, which confirmed a statistically significant change of manubrium volume (M_vol: p < 0.01, adjusted R² = 0.50) and surface area (M_sur: p < 0.01, adjusted R² = 0.53) over the studied age range. Additionally, the shapes of the manubria were for the first time investigated using principal component analysis. The decomposition of the data into principal components allowed the contribution of each component to total shape variation to be analysed. With 13 principal components, ∼96% of the shape variation could be described (M_shp: p < 0.01, adjusted R² = 0.60). Multiple linear regression analysis modelled the relationship between the statistically best correlated variables and age. Models including manubrium shape and volume or surface area divided by the height of the subject (Y ∼ M_shp M_sur/S_h: p < 0.01, adjusted R² = 0.71; Y ∼ M_shp M_vol/S_h: p < 0.01, adjusted R² = 0.72) presented a standard error of estimate of two years. In order to estimate the accuracy of these two manubrium-based age estimation models, cross-validation experiments predicting age on held-out test sets were performed. The median absolute difference between predicted and known chronological age was 1.18 years for the best performing model (Y ∼ M_shp M_sur/S_h: p < 0.01, R²_p = 0.67). In conclusion, despite limitations in determining legal majority age, manubrium morphometry analysis presented statistically significant results for skeletal age estimation, which indicates that this bone structure may be considered as a new candidate in multi-factorial MRI-based age estimation. Copyright © 2017 Elsevier B.V. All rights reserved.
A Methodology for the Parametric Reconstruction of Non-Steady and Noisy Meteorological Time Series
NASA Astrophysics Data System (ADS)
Rovira, F.; Palau, J. L.; Millán, M.
2009-09-01
Climatic and meteorological time series often show some persistence in time in the variability of certain features. One could regard annual, seasonal and diurnal variability as trivial persistence in the variability of some meteorological magnitudes (e.g., global radiation, air temperature above the surface, etc.). In these cases, the traditional Fourier transform into frequency space will show the principal harmonics as the components with the largest amplitude. Nevertheless, meteorological measurements often show other, non-steady (in time) variability. Some fluctuations in the measurements (at different time scales) are driven by processes that prevail on some days (or months) of the year but disappear on others. By decomposing a time series into time-frequency space through the continuous wavelet transformation, one is able to determine both the dominant modes of variability and how those modes vary in time. This study is based on a numerical methodology to analyse non-steady principal harmonics in noisy meteorological time series. The methodology combines the continuous wavelet transform with the development of a parametric model that includes the time evolution of the principal and most statistically significant harmonics of the original time series. The parameterisation scheme proposed in this study consists of reproducing the original time series by means of a statistically significant finite sum of sinusoidal signals (waves), each defined by the three usual parameters: amplitude, frequency and phase. To ensure the statistical significance of the parametric reconstruction of the original signal, we propose a standard Student's t analysis of the confidence level of the amplitude in the parametric spectrum for the different wave components. Once the level of significance of the different waves composing the parametric model has been assured, the statistically significant principal harmonics (in time) of the original time series can be obtained by using the Fourier transform of the modelled signal. Acknowledgements The CEAM Foundation is supported by the Generalitat Valenciana and BANCAIXA (València, Spain). This study has been partially funded by the European Commission (FP VI, Integrated Project CIRCE - No. 036961) and by the Ministerio de Ciencia e Innovación, research projects "TRANSREG" (CGL2007-65359/CLI) and "GRACCIE" (CSD2007-00067, Program CONSOLIDER-INGENIO 2010).
Modified neural networks for rapid recovery of tokamak plasma parameters for real time control
NASA Astrophysics Data System (ADS)
Sengupta, A.; Ranjan, P.
2002-07-01
Two modified neural network techniques are used for the identification of the equilibrium plasma parameters of the Superconducting Steady State Tokamak I from external magnetic measurements. This is expected to ultimately assist in real-time plasma control. Unlike the conventional network structure, in which a single network with the optimum number of processing elements calculates the outputs, the first method performs the calculations with a multinetwork system connected in parallel. This network is called the double neural network. The accuracy of the recovered parameters is clearly higher than with the conventional network. The other type of neural network used here is based on statistical function parametrization combined with a neural network. The principal component transformation removes linear dependences from the measurements, and a dimensional reduction process reduces the dimensionality of the input space. This reduced and transformed input set, rather than the entire set, is fed into the neural network input. This is known as the principal component transformation-based neural network. The accuracy of the recovered parameters in the latter type of modified network is found to be a further improvement over the accuracy of the double neural network. This result differs from that obtained in an earlier work, where the double neural network showed better performance. The conventional network and function parametrization methods have also been used for comparison. The conventional network has been used for an optimization of the set of magnetic diagnostics. The effective set of sensors, as assessed by this network, is compared with that of the principal component based network. Fault tolerance of the neural networks has been tested. The double neural network showed the maximum resistance to faults in the diagnostics, while the principal component based network performed poorly. Finally, the processing times of the methods have been compared. The double network and the principal component network involve the minimum computation time, although the conventional network also performs well enough to be used in real time.
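A rough sketch of the principal component transformation-based network idea, with PCA decorrelating and reducing simulated magnetic measurements before a neural network regressor; scikit-learn's MLPRegressor stands in for the networks used in the paper, and all data and sizes are placeholders.

```python
# PCA-reduced inputs feeding a neural network that recovers plasma parameters.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(6)
measurements = rng.normal(size=(2000, 60))      # simulated external magnetic diagnostics
parameters = rng.normal(size=(2000, 4))         # equilibrium plasma parameters to recover

model = make_pipeline(StandardScaler(),
                      PCA(n_components=20),     # decorrelate and reduce the input space
                      MLPRegressor(hidden_layer_sizes=(30,), max_iter=2000, random_state=0))
model.fit(measurements[:1500], parameters[:1500])
# Placeholder data, so the hold-out score is not meaningful in itself.
print("hold-out R^2:", model.score(measurements[1500:], parameters[1500:]))
```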
A first application of independent component analysis to extracting structure from stock returns.
Back, A D; Weigend, A S
1997-08-01
This paper explores the application of a signal processing technique known as independent component analysis (ICA) or blind source separation to multivariate financial time series such as a portfolio of stocks. The key idea of ICA is to linearly map the observed multivariate time series into a new space of statistically independent components (ICs). We apply ICA to three years of daily returns of the 28 largest Japanese stocks and compare the results with those obtained using principal component analysis. The results indicate that the estimated ICs fall into two categories, (i) infrequent large shocks (responsible for the major changes in the stock prices), and (ii) frequent smaller fluctuations (contributing little to the overall level of the stocks). We show that the overall stock price can be reconstructed surprisingly well by using a small number of thresholded weighted ICs. In contrast, when using shocks derived from principal components instead of independent components, the reconstructed price is less similar to the original one. ICA is shown to be a potentially powerful method of analyzing and understanding driving mechanisms in financial time series. The application to portfolio optimization is described in Chin and Weigend (1998).
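A minimal sketch of the decomposition described above, using scikit-learn's FastICA on synthetic returns, thresholding the components to keep only large shocks, and reconstructing the series; the thresholding rule and data are illustrative, not the authors' choices.

```python
# ICA of daily returns, thresholded reconstruction from the large shocks only.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(7)
returns = rng.standard_t(df=3, size=(750, 28)) * 0.01   # ~3 years of daily returns, 28 stocks

ica = FastICA(n_components=10, random_state=0)
ics = ica.fit_transform(returns)                        # independent components

# Keep only large shocks in each IC, zeroing the frequent small fluctuations.
thresholded = np.where(np.abs(ics) > 2 * ics.std(axis=0), ics, 0.0)
returns_hat = ica.inverse_transform(thresholded)

price_hat = np.cumsum(returns_hat[:, 0])                # reconstructed log-price of stock 1
print("reconstruction correlation:",
      np.corrcoef(np.cumsum(returns[:, 0]), price_hat)[0, 1])
```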
SIMREL: Software for Coefficient Alpha and Its Confidence Intervals with Monte Carlo Studies
ERIC Educational Resources Information Center
Yurdugul, Halil
2009-01-01
This article describes SIMREL, a software program designed for the simulation of alpha coefficients and the estimation of its confidence intervals. SIMREL runs on two alternatives. In the first one, if SIMREL is run for a single data file, it performs descriptive statistics, principal components analysis, and variance analysis of the item scores…
The spatial and temporal variability of ambient air concentrations of SO2, SO42-, NO3
NASA Astrophysics Data System (ADS)
He, Shiyuan; Wang, Lifan; Huang, Jianhua Z.
2018-04-01
With growing data from ongoing and future supernova surveys, it is possible to empirically quantify the shapes of SNIa light curves in more detail, and to quantitatively relate the shape parameters with the intrinsic properties of SNIa. Building such relationships is critical in controlling systematic errors associated with supernova cosmology. Based on a collection of well-observed SNIa samples accumulated in the past years, we construct an empirical SNIa light curve model using a statistical method called the functional principal component analysis (FPCA) for sparse and irregularly sampled functional data. Using this method, the entire light curve of an SNIa is represented by a linear combination of principal component functions, and the SNIa is represented by a few numbers called “principal component scores.” These scores are used to establish relations between light curve shapes and physical quantities such as intrinsic color, interstellar dust reddening, spectral line strength, and spectral classes. These relations allow for descriptions of some critical physical quantities based purely on light curve shape parameters. Our study shows that some important spectral feature information is being encoded in the broad band light curves; for instance, we find that the light curve shapes are correlated with the velocity and velocity gradient of the Si II λ6355 line. This is important for supernova surveys (e.g., LSST and WFIRST). Moreover, the FPCA light curve model is used to construct the entire light curve shape, which in turn is used in a functional linear form to adjust intrinsic luminosity when fitting distance models.
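A deliberately simplified sketch of the light-curve representation: ordinary PCA on curves resampled to a common epoch grid, so each curve is summarized by a few scores. The paper's functional PCA handles sparse, irregularly sampled data, which this toy version does not; all curves below are synthetic.

```python
# Represent light curves by a few principal component scores and reconstruct them.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(8)
epochs = np.linspace(-10, 40, 60)                        # days relative to peak brightness
template = np.exp(-0.5 * (epochs / 12) ** 2)             # crude bell-shaped template
curves = template + 0.05 * rng.normal(size=(300, 60))    # 300 SNe with small shape variations

pca = PCA(n_components=3)
scores = pca.fit_transform(curves)                       # "principal component scores"
reconstructed = pca.inverse_transform(scores)

# Each light curve is now summarized by three numbers that can be regressed
# against quantities such as color, reddening or spectral line velocities.
print(scores[:2])
```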
Reese, Sarah E; Archer, Kellie J; Therneau, Terry M; Atkinson, Elizabeth J; Vachon, Celine M; de Andrade, Mariza; Kocher, Jean-Pierre A; Eckel-Passow, Jeanette E
2013-11-15
Batch effects are due to probe-specific systematic variation between groups of samples (batches) resulting from experimental features that are not of biological interest. Principal component analysis (PCA) is commonly used as a visual tool to determine whether batch effects exist after applying a global normalization method. However, PCA yields linear combinations of the variables that contribute maximum variance and thus will not necessarily detect batch effects if they are not the largest source of variability in the data. We present an extension of PCA to quantify the existence of batch effects, called guided PCA (gPCA). We describe a test statistic that uses gPCA to test whether a batch effect exists. We apply our proposed test statistic derived using gPCA to simulated data and to two copy number variation case studies: the first study consisted of 614 samples from a breast cancer family study using Illumina Human 660 bead-chip arrays, whereas the second case study consisted of 703 samples from a family blood pressure study that used Affymetrix SNP Array 6.0. We demonstrate that our statistic has good statistical properties and is able to identify significant batch effects in two copy number variation case studies. We developed a new statistic that uses gPCA to identify whether batch effects exist in high-throughput genomic data. Although our examples pertain to copy number data, gPCA is general and can be used on other data types as well. The gPCA R package (Available via CRAN) provides functionality and data to perform the methods in this article. reesese@vcu.edu
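The sketch below is only an illustration of the guided-PCA idea, comparing the variance captured along a batch-guided direction with that of the leading ordinary PC and assessing it by permuting batch labels; it is not the test statistic implemented in the gPCA R package, and the data are synthetic.

```python
# Illustrative permutation test for batch effects using a batch-guided direction.
import numpy as np

rng = np.random.default_rng(9)
X = rng.normal(size=(100, 500))                  # samples x features (placeholder)
batch = np.repeat([0, 1], 50)
X[batch == 1] += 0.3                             # inject a batch shift for illustration

def guided_ratio(X, batch):
    Xc = X - X.mean(axis=0)
    Y = np.eye(2)[batch]                         # batch indicator matrix
    v_guided = np.linalg.svd(Y.T @ Xc, full_matrices=False)[2][0]
    v_pca = np.linalg.svd(Xc, full_matrices=False)[2][0]
    return np.var(Xc @ v_guided) / np.var(Xc @ v_pca)

observed = guided_ratio(X, batch)
perms = [guided_ratio(X, rng.permutation(batch)) for _ in range(200)]
print("ratio:", observed, "permutation p ~", np.mean([p >= observed for p in perms]))
```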
Raman signatures of ferroic domain walls captured by principal component analysis.
Nataf, G F; Barrett, N; Kreisel, J; Guennou, M
2018-01-24
Ferroic domain walls are currently investigated by several state-of-the-art techniques in order to get a better understanding of their distinct, functional properties. Here, principal component analysis (PCA) of Raman maps is used to study ferroelectric domain walls (DWs) in LiNbO3 and ferroelastic DWs in NdGaO3. It is shown that PCA allows us to quickly and reliably identify small Raman peak variations at ferroelectric DWs and that the value of a peak shift can be deduced, accurately and without a priori assumptions, from a first-order Taylor expansion of the spectra. The ability of PCA to separate the contributions of ferroelastic domains and DWs to Raman spectra is emphasized. More generally, our results provide a novel route for the statistical analysis of any property mapped across a DW.
Harrison, Jay M; Howard, Delia; Malven, Marianne; Halls, Steven C; Culler, Angela H; Harrigan, George G; Wolfinger, Russell D
2013-07-03
Compositional studies on genetically modified (GM) and non-GM crops have consistently demonstrated that their respective levels of key nutrients and antinutrients are remarkably similar and that other factors such as germplasm and environment contribute more to compositional variability than transgenic breeding. We propose that graphical and statistical approaches that can provide meaningful evaluations of the relative impact of different factors to compositional variability may offer advantages over traditional frequentist testing. A case study on the novel application of principal variance component analysis (PVCA) in a compositional assessment of herbicide-tolerant GM cotton is presented. Results of the traditional analysis of variance approach confirmed the compositional equivalence of the GM and non-GM cotton. The multivariate approach of PVCA provided further information on the impact of location and germplasm on compositional variability relative to GM.
Corriveau, H; Arsenault, A B; Dutil, E; Lepage, Y
1992-01-01
An evaluation based on the Bobath approach to treatment has previously been developed and partially validated. The purpose of the present study was to verify the content validity of this evaluation using a statistical approach known as principal components analysis. Thirty-eight hemiplegic subjects participated in the study. Scores on each of six parameters (sensorium, active movements, muscle tone, reflex activity, postural reactions, and pain) were analyzed on three occasions across a 2-month period. Each analysis produced three factors that contained 70% of the variation in the data set. The first component mainly reflected variations in mobility, the second mainly variations in muscle tone, and the third mainly variations in sensorium and pain. The results of this exploratory analysis highlight the fact that some of the parameters are not only important but also interrelated. These results seem to partially support the conceptual framework substantiating the Bobath approach to treatment.
Craters on Earth, Moon, and Mars: Multivariate classification and mode of origin
Pike, R.J.
1974-01-01
Testing extraterrestrial craters and candidate terrestrial analogs for morphologic similitude is treated as a problem in numerical taxonomy. According to a principal-components solution and a cluster analysis, 402 representative craters on the Earth, the Moon, and Mars divide into two major classes of contrasting shapes and modes of origin. Craters of net accumulation of material (cratered lunar domes, Martian "calderas," and all terrestrial volcanoes except maars and tuff rings) group apart from craters of excavation (terrestrial meteorite impact and experimental explosion craters, typical Martian craters, and all other lunar craters). Maars and tuff rings belong to neither group but are transitional. The classification criteria are four independent attributes of topographic geometry derived from seven descriptive variables by the principal-components transformation. Morphometric differences between crater bowl and raised rim constitute the strongest of the four components. Although single topographic variables cannot confidently predict the genesis of individual extraterrestrial craters, multivariate statistical models constructed from several variables can distinguish consistently between large impact craters and volcanoes. © 1974.
Trends in stratospheric ozone profiles using functional mixed models
NASA Astrophysics Data System (ADS)
Park, A.; Guillas, S.; Petropavlovskikh, I.
2013-11-01
This paper is devoted to the modeling of altitude-dependent patterns of ozone variations over time. Umkehr ozone profiles (quarter of Umkehr layer) from 1978 to 2011 are investigated at two locations: Boulder (USA) and Arosa (Switzerland). The study consists of two statistical stages. First we approximate ozone profiles employing an appropriate basis. To capture primary modes of ozone variations without losing essential information, a functional principal component analysis is performed. It penalizes roughness of the function and smooths excessive variations in the shape of the ozone profiles. As a result, data-driven basis functions (empirical basis functions) are obtained. The coefficients (principal component scores) corresponding to the empirical basis functions represent dominant temporal evolution in the shape of ozone profiles. We use those time series coefficients in the second statistical step to reveal the important sources of the patterns and variations in the profiles. We estimate the effects of covariates - month, year (trend), quasi-biennial oscillation, the solar cycle, the Arctic oscillation, the El Niño/Southern Oscillation cycle and the Eliassen-Palm flux - on the principal component scores of ozone profiles using additive mixed effects models. The effects are represented as smooth functions and the smooth functions are estimated by penalized regression splines. We also impose a heteroscedastic error structure that reflects the observed seasonality in the errors. The more complex error structure enables us to provide more accurate estimates of influences and trends, together with enhanced uncertainty quantification. Also, we are able to capture fine variations in the time evolution of the profiles, such as the semi-annual oscillation. We conclude by showing the trends by altitude over Boulder and Arosa, as well as for total column ozone. There are great variations in the trends across altitudes, which highlights the benefits of modeling ozone profiles.
Groundwater quality assessment of urban Bengaluru using multivariate statistical techniques
NASA Astrophysics Data System (ADS)
Gulgundi, Mohammad Shahid; Shetty, Amba
2018-03-01
Groundwater quality deterioration due to anthropogenic activities has become a subject of prime concern. The objective of the study was to assess the spatial and temporal variations in groundwater quality and to identify the sources in the western half of Bengaluru city using multivariate statistical techniques. Water quality index ratings were calculated for the pre- and post-monsoon seasons to quantify overall water quality for human consumption. The post-monsoon samples show poorer quality for drinking purposes than the pre-monsoon samples. Cluster analysis (CA), principal component analysis (PCA) and discriminant analysis (DA) were applied to groundwater quality data measured on 14 parameters from 67 sites distributed across the city. Hierarchical cluster analysis grouped the 67 sampling stations into two groups, cluster 1 having high pollution and cluster 2 having lesser pollution. Discriminant analysis was applied to delineate the most meaningful parameters accounting for temporal and spatial variations in groundwater quality of the study area. Temporal DA identified pH as the most important parameter, which discriminates between water quality in the pre-monsoon and post-monsoon seasons and accounts for 72% of seasonal assignations of cases. Spatial DA identified Mg, Cl and NO3 as the three most important parameters discriminating between the two clusters and accounting for 89% of spatial assignations of cases. Principal component analysis was applied to the data sets obtained from the two clusters, which yielded three factors in each cluster, explaining 85.4 and 84% of the total variance, respectively. Varifactors obtained from principal component analysis showed that groundwater quality variation is mainly explained by dissolution of minerals from rock-water interactions in the aquifer, the effect of anthropogenic activities and ion exchange processes in the water.
Krohn, M.D.; Milton, N.M.; Segal, D.; Enland, A.
1981-01-01
A principal component image enhancement has been effective in applying Landsat data to geologic mapping in a heavily forested area of E Virginia. The image enhancement procedure consists of a principal component transformation, a histogram normalization, and the inverse principal component transformation. The enhancement preserves the independence of the principal components, yet produces a more readily interpretable image than does a single principal component transformation. -from Authors
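A sketch of the enhancement chain described above: forward principal component transform, a per-component normalization (here a simple variance equalization rather than a full histogram normalization), and the inverse transform back to band space. The four-band image is simulated and the exact normalization used in the original work may differ.

```python
# Principal component transform, component normalization, inverse transform.
import numpy as np

rng = np.random.default_rng(10)
bands = rng.normal(size=(100 * 100, 4)) @ rng.normal(size=(4, 4))   # correlated 4-band image

mean = bands.mean(axis=0)
cov = np.cov(bands - mean, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)

pcs = (bands - mean) @ eigvecs                 # forward principal component transform
pcs_norm = pcs / pcs.std(axis=0)               # equalize the spread of each component
enhanced = pcs_norm @ eigvecs.T + mean         # inverse transform back to band space

# The components remain uncorrelated with unit variance after the normalization.
print(np.cov(pcs_norm, rowvar=False).round(2))
```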
Verma, Deepak; Sankhyan, Varun; Katoch, Sanjeet; Thakur, Yash Pal
2015-12-01
In the present study, biometric traits (body length (BL), heart girth (HG), paunch girth (PG), forelimb length (FLL), hind limb length (HLL), face length, forehead width, forehead length, height at hump, hump length (HL), hook to hook distance, pin to pin distance, tail length (TL), TL up to switch, horn length, horn circumference, and ear length) were studied in 218 adult hill cattle of Himachal Pradesh for phenotypic characterization. Morphological and biometrical observations were recorded on 218 hill cattle randomly selected from different districts within the breeding tract. Multivariate statistics and principal component analysis were used to account for the maximum portion of the variation present in the original set of variables with a minimum number of composite variables, using the statistical software SAS 9.2. Five components were extracted, which accounted for 65.9% of the variance. The first component reflected general body conformation and explained 34.7% of the variation. It was represented by significant loadings for BL, HG, PG, FLL, and HLL. Communality estimates ranged from 0.41 (HL) to 0.88 (TL). The second, third, fourth, and fifth components had high loadings for tail characteristics, horn characteristics, facial biometrics, and rear body, respectively. The results of the component analysis of biometric traits suggested that the indigenous hill cattle of Himachal Pradesh are small, compact cattle with a medium hump, horizontally placed short ears, and a long tail. The study also revealed that the factors extracted in the present investigation could be used in breeding programs, with a sufficient reduction in the number of biometric traits to be recorded to describe body conformation.
Batch Statistical Process Monitoring Approach to a Cocrystallization Process.
Sarraguça, Mafalda C; Ribeiro, Paulo R S; Dos Santos, Adenilson O; Lopes, João A
2015-12-01
Cocrystals are defined as crystalline structures composed of two or more compounds that are solid at room temperature, held together by noncovalent bonds. Their main advantages are increased solubility, bioavailability, permeability and stability, while retaining the bioactivity of the active pharmaceutical ingredient. The cocrystallization between furosemide and nicotinamide by solvent evaporation was monitored on-line using near-infrared spectroscopy (NIRS) as a process analytical technology tool. The near-infrared spectra were analyzed using principal component analysis. Batch statistical process monitoring was used to create control charts to follow the process trajectory and define control limits. Normal and non-normal operating condition batches were performed and monitored with NIRS. The use of NIRS associated with batch statistical process models allowed the detection of abnormal variations in critical process parameters, such as the amount of solvent or the amount of initial components present in the cocrystallization. © 2015 Wiley Periodicals, Inc. and the American Pharmacists Association.
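A minimal sketch of a PCA-based monitoring chart of the kind described above: PCA is fitted on spectra from normal operating batches, and Hotelling's T² for a new batch is compared against an F-distribution-based control limit. Spectra are simulated and the limit formula is the common textbook form, which may differ from the authors' implementation.

```python
# Hotelling's T^2 control chart built on PCA scores of NIR spectra.
import numpy as np
from scipy import stats
from sklearn.decomposition import PCA

rng = np.random.default_rng(11)
noc = rng.normal(size=(300, 200))                  # spectra from normal operating batches
new_batch = rng.normal(size=(50, 200)) + 0.5       # a disturbed batch (shifted spectra)

k, n = 3, noc.shape[0]
pca = PCA(n_components=k).fit(noc)
lam = pca.transform(noc).var(axis=0, ddof=1)       # score variances per component

def hotelling_t2(X):
    t = pca.transform(X)
    return np.sum(t ** 2 / lam, axis=1)

# Common F-based limit for T^2 of new observations at the 99% level.
limit = k * (n - 1) * (n + 1) / (n * (n - k)) * stats.f.ppf(0.99, k, n - k)
print("out-of-control points:", np.sum(hotelling_t2(new_batch) > limit), "of", len(new_batch))
```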
Mager, P P; Rothe, H
1990-10-01
Multicollinearity of physicochemical descriptors leads to serious consequences in quantitative structure-activity relationship (QSAR) analysis, such as incorrect estimators and test statistics for the regression coefficients of the ordinary least-squares (OLS) model usually applied to QSARs. Besides the diagnosis of known simple collinearity, principal component regression analysis (PCRA) also allows the diagnosis of various types of multicollinearity. Only if the absolute values of the PCRA estimators are order statistics that decrease monotonically can the effects of multicollinearity be circumvented. Otherwise, obscure phenomena may be observed, such as good data recognition but low predictive power of a QSAR model.
Kalegowda, Yogesh; Harmer, Sarah L
2012-03-20
Time-of-flight secondary ion mass spectrometry (TOF-SIMS) spectra of mineral samples are complex, comprised of large mass ranges and many peaks. Consequently, characterization and classification analysis of these systems is challenging. In this study, different chemometric and statistical data evaluation methods, based on monolayer sensitive TOF-SIMS data, have been tested for the characterization and classification of copper-iron sulfide minerals (chalcopyrite, chalcocite, bornite, and pyrite) at different flotation pulp conditions (feed, conditioned feed, and Eh modified). The complex mass spectral data sets were analyzed using the following chemometric and statistical techniques: principal component analysis (PCA); principal component-discriminant functional analysis (PC-DFA); soft independent modeling of class analogy (SIMCA); and k-Nearest Neighbor (k-NN) classification. PCA was found to be an important first step in multivariate analysis, providing insight into both the relative grouping of samples and the elemental/molecular basis for those groupings. For samples exposed to oxidative conditions (at Eh ~430 mV), each technique (PCA, PC-DFA, SIMCA, and k-NN) was found to produce excellent classification. For samples at reductive conditions (at Eh ~ -200 mV SHE), k-NN and SIMCA produced the most accurate classification. Phase identification of particles that contain the same elements but a different crystal structure in a mixed multimetal mineral system has been achieved.
Catelani, Tiago A; Santos, João Rodrigo; Páscoa, Ricardo N M J; Pezza, Leonardo; Pezza, Helena R; Lopes, João A
2018-03-01
This work proposes the use of near infrared (NIR) spectroscopy in diffuse reflectance mode and multivariate statistical process control (MSPC) based on principal component analysis (PCA) for real-time monitoring of the coffee roasting process. The main objective was the development of an MSPC methodology able to detect disturbances to the roasting process early, using real-time acquisition of NIR spectra. A total of fifteen roasting batches were defined according to an experimental design to develop the MSPC models. The methodology was tested on a set of five batches in which disturbances of different natures were imposed to simulate real faulty situations. Some of these batches were used to optimize the model, while the remaining ones were used to test the methodology. A modelling strategy based on a sliding time window provided the best results in terms of distinguishing batches with and without disturbances, using typical MSPC charts: Hotelling's T² and squared prediction error statistics. A PCA model encompassing a time window of four minutes with three principal components was able to efficiently detect all disturbances assayed. NIR spectroscopy combined with the MSPC approach proved to be an adequate auxiliary tool for coffee roasters to detect faults in a conventional roasting process in real time. Copyright © 2017 Elsevier B.V. All rights reserved.
Statistical process control of cocrystallization processes: A comparison between OPLS and PLS.
Silva, Ana F T; Sarraguça, Mafalda Cruz; Ribeiro, Paulo R; Santos, Adenilson O; De Beer, Thomas; Lopes, João Almeida
2017-03-30
Orthogonal partial least squares regression (OPLS) is being increasingly adopted as an alternative to partial least squares (PLS) regression due to the better generalization that can be achieved. Particularly in multivariate batch statistical process control (BSPC), the use of OPLS for estimating nominal trajectories is advantageous. In OPLS, the nominal process trajectories are expected to be captured in a single predictive principal component, while uncorrelated variations are filtered out to orthogonal principal components. In theory, OPLS will yield a better estimation of the Hotelling's T² statistic and corresponding control limits, thus lowering the number of false positives and false negatives when assessing process disturbances. Although the advantages of OPLS have been demonstrated in the context of regression, its use in BSPC has seldom been reported. This study proposes an OPLS-based approach for BSPC of a cocrystallization process between hydrochlorothiazide and p-aminobenzoic acid monitored on-line with near infrared spectroscopy and compares the fault detection performance with the same approach based on PLS. A series of cocrystallization batches with imposed disturbances were used to test the ability of the OPLS- and PLS-based BSPC methods to detect abnormal situations. Results demonstrated that OPLS was generally superior in terms of sensitivity and specificity in most situations. In some abnormal batches, the imposed disturbances were detected only with OPLS. Copyright © 2017 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Ye, M.; Pacheco Castro, R. B.; Pacheco Avila, J.; Cabrera Sansores, A.
2014-12-01
The karstic aquifer of Yucatan is a vulnerable and complex system. The first fifteen meters of this aquifer have been polluted; because of this, protection of the resource is important, as it is the only source of potable water for the entire state. Through the assessment of groundwater quality we can gain knowledge about the main processes governing water chemistry as well as spatial patterns, which are important for establishing protection zones. In this work multivariate statistical techniques are used to assess the groundwater quality of the supply wells (30 to 40 meters deep) in the hydrogeologic region of the Ring of Cenotes, located in Yucatan, Mexico. Cluster analysis and principal component analysis are applied to groundwater chemistry data of the study area. Results of principal component analysis show that the main sources of variation in the data are due to seawater intrusion, the interaction of the water with the carbonate rocks of the system, and some pollution processes. The cluster analysis shows that the data can be divided into four clusters. The spatial distribution of the clusters seems random but is consistent with seawater intrusion and pollution with nitrates. The overall results show that multivariate statistical analysis can be successfully applied to the groundwater quality assessment of this karstic aquifer.
A climatology of total ozone mapping spectrometer data using rotated principal component analysis
NASA Astrophysics Data System (ADS)
Eder, Brian K.; Leduc, Sharon K.; Sickles, Joseph E.
1999-02-01
The spatial and temporal variability of total column ozone (Ω) obtained from the total ozone mapping spectrometer (TOMS version 7.0) during the period 1980-1992 was examined through the use of a multivariate statistical technique called rotated principal component analysis. Utilization of Kaiser's varimax orthogonal rotation led to the identification of 14, mostly contiguous subregions that together accounted for more than 70% of the total Ω variance. Each subregion displayed statistically unique Ω characteristics that were further examined through time series and spectral density analyses, revealing significant periodicities on semiannual, annual, quasi-biennial, and longer term time frames. This analysis facilitated identification of the probable mechanisms responsible for the variability of Ω within the 14 homogeneous subregions. The mechanisms were either dynamical in nature (i.e., advection associated with baroclinic waves, the quasi-biennial oscillation, or El Niño-Southern Oscillation) or photochemical in nature (i.e., production of odd oxygen (O or O3) associated with the annual progression of the Sun). The analysis has also revealed that the influence of a data retrieval artifact, found in equatorial latitudes of version 6.0 of the TOMS data, has been reduced in version 7.0.
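The sketch below illustrates a rotated PCA of the kind used above: PCA loadings are computed and then varimax-rotated with a standard textbook routine; the data matrix is a placeholder for the gridded ozone fields and the code is not that used in the study.

```python
# PCA followed by a varimax rotation of the loading matrix.
import numpy as np
from sklearn.decomposition import PCA

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Classic varimax rotation of a p x k loading matrix."""
    p, k = loadings.shape
    R = np.eye(k)
    var_old = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        grad = loadings.T @ (L ** 3 - (gamma / p) * L @ np.diag(np.sum(L ** 2, axis=0)))
        u, s, vt = np.linalg.svd(grad)
        R = u @ vt
        if s.sum() < var_old * (1 + tol):     # stop when the criterion stops improving
            break
        var_old = s.sum()
    return loadings @ R

rng = np.random.default_rng(12)
omega = rng.normal(size=(156, 400))           # 13 years of monthly fields x 400 grid cells
loadings = PCA(n_components=14).fit(omega).components_.T
rotated = varimax(loadings)                   # each column tends to load on a compact subregion
print(rotated.shape)
```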
NASA Astrophysics Data System (ADS)
Hassanzadeh, S.; Hosseinibalam, F.; Omidvari, M.
2008-04-01
Data for seven meteorological variables (relative humidity, wet temperature, dry temperature, maximum temperature, minimum temperature, ground temperature and sun radiation time) and ozone values have been used for statistical analysis. Meteorological variables and ozone values were analyzed using both multiple linear regression and principal component methods. Data for the period 1999-2004 were analyzed jointly using both methods. For all periods, the temperature-dependent variables were highly correlated with each other, but all were negatively correlated with relative humidity. Multiple regression analysis was used to fit the meteorological variables, and a variable selection method based on high loadings of varimax-rotated principal components was used to obtain subsets of the predictor variables to be included in the linear regression model of the meteorological variables. In 1999, 2001 and 2002 one of the meteorological variables was weakly influenced, predominantly by the ozone concentrations. However, for the year 2000 the model did not indicate that the meteorological variables were influenced predominantly by the ozone concentrations, which points to variation in sun radiation. This could be due to other factors that were not explicitly considered in this study.
Defining the ecological hydrology of Taiwan Rivers using multivariate statistical methods
NASA Astrophysics Data System (ADS)
Chang, Fi-John; Wu, Tzu-Ching; Tsai, Wen-Ping; Herricks, Edwin E.
2009-09-01
The identification and verification of ecohydrologic flow indicators has found new support as the importance of ecological flow regimes is recognized in modern water resources management, particularly in river restoration and reservoir management. An ecohydrologic indicator system reflecting the unique characteristics of Taiwan's water resources and hydrology has been developed, the Taiwan ecohydrological indicator system (TEIS). A major challenge for the water resources community is using the TEIS to provide environmental flow rules that improve existing water resources management. This paper examines data from the extensive network of flow monitoring stations in Taiwan using TEIS statistics to define and refine environmental flow options in Taiwan. Multivariate statistical methods were used to examine TEIS statistics for 102 stations representing the geographic and land use diversity of Taiwan. The Pearson correlation coefficient showed high multicollinearity between the TEIS statistics. Watersheds were separated into upper- and lower-watershed locations. An analysis of variance indicated significant differences between upstream, more natural, locations and downstream, more developed, locations in the same basin, with hydrologic indicator redundancy in the flow change and magnitude statistics. Issues of multicollinearity were examined using a Principal Component Analysis (PCA), with the first three components related to general flow and high/low flow statistics, frequency and time statistics, and quantity statistics. These principal components explained about 85% of the total variation. A major conclusion is that managers must be aware of differences among basins, as well as differences within basins, which will require careful selection of management procedures to achieve needed flow regimes.
PCA leverage: outlier detection for high-dimensional functional magnetic resonance imaging data.
Mejia, Amanda F; Nebel, Mary Beth; Eloyan, Ani; Caffo, Brian; Lindquist, Martin A
2017-07-01
Outlier detection for high-dimensional (HD) data is a popular topic in modern statistical research. However, one source of HD data that has received relatively little attention is functional magnetic resonance images (fMRI), which consists of hundreds of thousands of measurements sampled at hundreds of time points. At a time when the availability of fMRI data is rapidly growing-primarily through large, publicly available grassroots datasets-automated quality control and outlier detection methods are greatly needed. We propose principal components analysis (PCA) leverage and demonstrate how it can be used to identify outlying time points in an fMRI run. Furthermore, PCA leverage is a measure of the influence of each observation on the estimation of principal components, which are often of interest in fMRI data. We also propose an alternative measure, PCA robust distance, which is less sensitive to outliers and has controllable statistical properties. The proposed methods are validated through simulation studies and are shown to be highly accurate. We also conduct a reliability study using resting-state fMRI data from the Autism Brain Imaging Data Exchange and find that removal of outliers using the proposed methods results in more reliable estimation of subject-level resting-state networks using independent components analysis. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
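A minimal sketch of the PCA leverage idea, assuming the leverage of a time point is the squared norm of its row in the retained left singular vectors (the diagonal of the corresponding hat matrix); the flagging threshold and data below are illustrative, not the paper's exact rule.

```python
# PCA leverage of time points in a (time x voxel) fMRI data matrix.
import numpy as np

rng = np.random.default_rng(13)
fmri = rng.normal(size=(200, 5000))           # time points x voxels (placeholder run)
fmri[117] += 3.0                              # inject an outlying time point

X = fmri - fmri.mean(axis=0)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 10                                        # number of retained components
leverage = np.sum(U[:, :k] ** 2, axis=1)      # influence of each time point on the PCs

# The mean leverage is k / T, so flag time points that greatly exceed it.
threshold = 3 * k / X.shape[0]
print("flagged time points:", np.where(leverage > threshold)[0])
```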
Vina, Andres; Peters, Albert J.; Ji, Lei
2003-01-01
There is a global concern about the increase in atmospheric concentrations of greenhouse gases. One method being discussed to encourage greenhouse gas mitigation efforts is based on a trading system whereby carbon emitters can buy effective mitigation efforts from farmers implementing conservation tillage practices. These practices sequester carbon from the atmosphere, and such a trading system would require a low-cost and accurate method of verification. Remote sensing technology can offer such a verification technique. This paper is focused on the use of standard image processing procedures applied to a multispectral Ikonos image, to determine whether it is possible to validate that farmers have complied with agreements to implement conservation tillage practices. A principal component analysis (PCA) was performed in order to isolate image variance in cropped fields. Analyses of variance (ANOVA) statistical procedures were used to evaluate the capability of each Ikonos band and each principal component to discriminate between conventional and conservation tillage practices. A logistic regression model was implemented on the principal component most effective in discriminating between conventional and conservation tillage, in order to produce a map of the probability of conventional tillage. The Ikonos imagery, in combination with ground-reference information, proved to be a useful tool for verification of conservation tillage practices.
Characterization of exopolymers of aquatic bacteria by pyrolysis-mass spectrometry
NASA Technical Reports Server (NTRS)
Ford, T.; Sacco, E.; Black, J.; Kelley, T.; Goodacre, R.; Berkeley, R. C.; Mitchell, R.
1991-01-01
Exopolymers from a diverse collection of marine and freshwater bacteria were characterized by pyrolysis-mass spectrometry (Py-MS). Py-MS provides spectra of pyrolysis fragments that are characteristic of the original material. Analysis of the spectra by multivariate statistical techniques (principal component and canonical variate analysis) separated these exopolymers into distinct groups. Py-MS clearly distinguished characteristic fragments, which may be derived from components responsible for functional differences between polymers. The importance of these distinctions and the relevance of pyrolysis information to exopolysaccharide function in aquatic bacteria is discussed.
O'Connor, B P
2000-08-01
Popular statistical software packages do not have the proper procedures for determining the number of components in factor and principal components analyses. Parallel analysis and Velicer's minimum average partial (MAP) test are validated procedures, recommended widely by statisticians. However, many researchers continue to use alternative, simpler, but flawed procedures, such as the eigenvalues-greater-than-one rule. Use of the proper procedures might be increased if these procedures could be conducted within familiar software environments. This paper describes brief and efficient programs for using SPSS and SAS to conduct parallel analyses and the MAP test.
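The paper supplies SPSS and SAS programs; the sketch below is a Python rendering of the parallel analysis idea only: retain components whose observed eigenvalues exceed a high percentile of eigenvalues obtained from random data of the same dimensions. The data and settings are placeholders.

```python
# Horn's parallel analysis for choosing the number of components.
import numpy as np

def parallel_analysis(data, n_iter=200, percentile=95, seed=0):
    rng = np.random.default_rng(seed)
    n, p = data.shape
    obs = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]   # observed eigenvalues
    rand = np.empty((n_iter, p))
    for i in range(n_iter):
        rand[i] = np.linalg.eigvalsh(np.corrcoef(rng.normal(size=(n, p)), rowvar=False))[::-1]
    crit = np.percentile(rand, percentile, axis=0)                    # random-data benchmark
    return int(np.sum(obs > crit)), obs, crit

rng = np.random.default_rng(14)
items = rng.normal(size=(300, 20)) + rng.normal(size=(300, 1))        # 20 items, one shared factor
n_components, obs, crit = parallel_analysis(items)
print("components to retain:", n_components)
```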
Ehresmann, Bernd; de Groot, Marcel J; Alex, Alexander; Clark, Timothy
2004-01-01
New molecular descriptors based on statistical descriptions of the local ionization potential, local electron affinity, and the local polarizability at the surface of the molecule are proposed. The significance of these descriptors has been tested by calculating them for the Maybridge database in addition to our set of 26 descriptors reported previously. The new descriptors show little correlation with those already in use. Furthermore, the principal components of the extended set of descriptors for the Maybridge data show that especially the descriptors based on the local electron affinity extend the variance in our set of descriptors, which we have previously shown to be relevant to physical properties. The first nine principal components are shown to be most significant. As an example of the usefulness of the new descriptors, we have set up a QSPR model for boiling points using both the old and new descriptors.
NASA Astrophysics Data System (ADS)
Jha, S. K.; Brockman, R. A.; Hoffman, R. M.; Sinha, V.; Pilchak, A. L.; Porter, W. J.; Buchanan, D. J.; Larsen, J. M.; John, R.
2018-05-01
Principal component analysis and fuzzy c-means clustering algorithms were applied to slip-induced strain and geometric metric data in an attempt to discover unique microstructural configurations and their frequencies of occurrence in statistically representative instantiations of a titanium alloy microstructure. Grain-averaged fatigue indicator parameters were calculated for the same instantiation. The fatigue indicator parameters strongly correlated with the spatial location of the microstructural configurations in the principal components space. The fuzzy c-means clustering method identified clusters of data that varied in terms of their average fatigue indicator parameters. Furthermore, the number of points in each cluster was inversely correlated to the average fatigue indicator parameter. This analysis demonstrates that data-driven methods have significant potential for providing unbiased determination of unique microstructural configurations and their frequencies of occurrence in a given volume from the point of view of strain localization and fatigue crack initiation.
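A rough sketch of the clustering step, applying a basic textbook fuzzy c-means update to PCA scores of grain-wise metrics; the data, component count, number of clusters and fuzzifier are placeholders and the routine is not the study's implementation.

```python
# Fuzzy c-means clustering applied to PCA scores of grain-averaged metrics.
import numpy as np
from sklearn.decomposition import PCA

def fuzzy_cmeans(X, c=4, m=2.0, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    u = rng.random((len(X), c))
    u /= u.sum(axis=1, keepdims=True)                     # random initial memberships
    for _ in range(n_iter):
        um = u ** m
        centers = um.T @ X / um.sum(axis=0)[:, None]      # weighted cluster centers
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        u = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2 / (m - 1)), axis=2)
    return u, centers

rng = np.random.default_rng(15)
metrics = rng.normal(size=(5000, 12))                     # grain-wise slip/geometry metrics
scores = PCA(n_components=3).fit_transform(metrics)
u, centers = fuzzy_cmeans(scores, c=4)
print("cluster sizes:", np.bincount(u.argmax(axis=1)))
```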
Rodrigues, Camila Carneiro Dos Santos; Santos, Ewerton; Ramos, Brunalisa Silva; Damasceno, Flaviana Cardoso; Correa, José Augusto Martins
2018-06-01
The 16 priority PAHs were determined in sediment samples from the insular zone of Guajará Bay and the Guamá River (Southern Amazon River mouth). Low hydrocarbon levels were observed, and naphthalene was the most representative PAH. The low molecular weight PAHs represented 51% of the total PAHs. Statistical analysis showed that the sampling sites are not significantly different. Source analysis by PAH ratios and principal component analysis revealed that the PAHs derive primarily from a low rate of fossil fuel combustion, mainly related to the activity of the local small communities. According to the sediment quality guidelines, none of the samples presented biological stress or damage potential. This study discusses baselines for PAHs in surface sediments from Amazonian aquatic systems based on source determination by PAH ratios and principal component analysis, sediment quality guidelines, and comparison with data from previous studies.
Yazdani, Farzaneh; Razeghi, Mohsen; Karimi, Mohammad Taghi; Raeisi Shahraki, Hadi; Salimi Bani, Milad
2018-05-01
Despite the theoretical link between foot hyperpronation and biomechanical dysfunction of the pelvis, the literature lacks evidence confirming this assumption in subjects with truly hyperpronated feet during gait. Changes in the kinematic pattern of the pelvic segment were assessed in 15 persons with hyperpronated feet and compared to a control group of 15 persons with normally aligned feet during the stance phase of gait, based on biomechanical musculoskeletal simulation. Kinematic and kinetic data were collected while participants walked at a comfortable self-selected speed. A generic OpenSim musculoskeletal model with 23 degrees of freedom and 92 muscles was scaled for each participant. OpenSim inverse kinematic analysis was applied to calculate segment angles in the sagittal, frontal and horizontal planes. Principal component analysis was employed as a data reduction technique, as well as a computational tool to obtain principal component scores. An independent-sample t-test was used to detect group differences. The difference between groups in scores for the first principal component in the sagittal plane was statistically significant (p = 0.01; effect size = 1.06), but differences between principal component scores in the frontal and horizontal planes were not significant. The hyperpronation group had greater anterior pelvic tilt during 20%-80% of the stance phase. In conclusion, in the persons with hyperpronation studied, the role of the pelvic segment was mainly to maintain postural balance in the sagittal plane by increasing anterior pelvic inclination. Since anterior pelvic tilt may be associated with low back symptoms, the evaluation of foot posture should be considered when assessing patients with low back and pelvic dysfunction.
Regionalization of precipitation characteristics in Iran's Lake Urmia basin
NASA Astrophysics Data System (ADS)
Fazel, Nasim; Berndtsson, Ronny; Uvo, Cintia Bertacchi; Madani, Kaveh; Kløve, Bjørn
2018-04-01
Lake Urmia in northwest Iran, once one of the largest hypersaline lakes in the world, has shrunk by almost 90% in area and 80% in volume during the last four decades. To improve the understanding of regional differences in water availability throughout the region and to refine the existing information on precipitation variability, this study investigated the spatial pattern of precipitation for the Lake Urmia basin. Daily rainfall time series from 122 precipitation stations with different record lengths were used to extract 15 statistical descriptors comprising 25th percentile, 75th percentile, and coefficient of variation for annual and seasonal total precipitation. Principal component analysis in association with cluster analysis identified three main homogeneous precipitation groups in the lake basin. The first sub-region (group 1) includes stations located in the center and southeast; the second sub-region (group 2) covers mostly northern and northeastern part of the basin, and the third sub-region (group 3) covers the western and southern edges of the basin. Results of principal component (PC) and clustering analyses showed that seasonal precipitation variation is the most important feature controlling the spatial pattern of precipitation in the lake basin. The 25th and 75th percentiles of winter and autumn are the most important variables controlling the spatial pattern of the first rotated principal component explaining about 32% of the total variance. Summer and spring precipitation variations are the most important variables in the second and third rotated principal components, respectively. Seasonal variation in precipitation amount and seasonality are explained by topography and influenced by the lake and westerly winds that are related to the strength of the North Atlantic Oscillation. Despite using incomplete time series with different lengths, the identified sub-regions are physically meaningful.
Rank estimation and the multivariate analysis of in vivo fast-scan cyclic voltammetric data
Keithley, Richard B.; Carelli, Regina M.; Wightman, R. Mark
2010-01-01
Principal component regression has been used in the past to separate current contributions from different neuromodulators measured with in vivo fast-scan cyclic voltammetry. Traditionally, a percent cumulative variance approach has been used to determine the rank of the training set voltammetric matrix during model development; however, this approach suffers from several disadvantages, including the use of arbitrary percentages and the requirement of extreme precision of training sets. Here we propose that Malinowski’s F-test, a method based on a statistical analysis of the variance contained within the training set, can be used to improve factor selection for the analysis of in vivo fast-scan cyclic voltammetric data. These two methods of rank estimation were compared at all steps in the calibration protocol, including the number of principal components retained, overall noise levels, model validation as determined using a residual analysis procedure, and predicted concentration information. By analyzing 119 training sets from two different laboratories amassed over several years, we were able to gain insight into the heterogeneity of in vivo fast-scan cyclic voltammetric data and study how differences in factor selection propagate throughout the entire principal component regression analysis procedure. Visualizing cyclic voltammetric representations of the data contained in the retained and discarded principal components showed that using Malinowski’s F-test for rank estimation of in vivo training sets allowed for noise to be more accurately removed. Malinowski’s F-test also improved the robustness of our criterion for judging multivariate model validity, even though signal-to-noise ratios of the data varied. In addition, pH change was the majority noise carrier of in vivo training sets while dopamine prediction was more sensitive to noise. PMID:20527815
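The following minimal sketch illustrates only the percent-cumulative-variance rank criterion that the abstract criticizes, applied to a synthetic training matrix; the 99.5% threshold and data dimensions are arbitrary assumptions, and Malinowski's F-test itself (which replaces the arbitrary percentage with a significance test on the residual eigenvalues) is not reproduced here.

    import numpy as np

    rng = np.random.default_rng(1)
    # Synthetic training set: 2 underlying components plus noise, standing in
    # for a matrix of background-subtracted cyclic voltammograms.
    n_scans, n_pts = 30, 200
    basis = rng.normal(size=(2, n_pts))
    A = rng.normal(size=(n_scans, 2)) @ basis + 0.05 * rng.normal(size=(n_scans, n_pts))

    # Singular values give the variance captured by each principal component.
    s = np.linalg.svd(A - A.mean(axis=0), compute_uv=False)
    var = s ** 2
    cum = np.cumsum(var) / var.sum()

    threshold = 0.995                     # the "arbitrary percentage"
    rank = int(np.searchsorted(cum, threshold) + 1)
    print(cum[:5].round(4), "retained rank:", rank)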
Barnes, Jill N; Harvey, Ronée E; Miller, Kathleen B; Jayachandran, Muthuvel; Malterer, Katherine R; Lahr, Brian D; Bailey, Kent R; Joyner, Michael J; Miller, Virginia M
2018-01-01
Cerebrovascular reactivity (CVR) is reduced in patients with cognitive decline. Women with a history of preeclampsia are at increased risk for cognitive decline. This study examined an association between pregnancy history and CVR using a subgroup of 40 age- and parity-matched pairs of women having histories of preeclampsia (n=27) or normotensive pregnancy (n=29) and the association of activated blood elements with CVR. Middle cerebral artery velocity was measured by Doppler ultrasound before and during hypercapnia to assess CVR. Thirty-eight parameters of blood cellular elements, microvesicles, and cell-cell interactions measured in venous blood were assessed for association with CVR using principal component analysis. Middle cerebral artery velocity was lower in the preeclampsia compared with the normotensive group at baseline (63±4 versus 73±3 cm/s; P=0.047) and during hypercapnia (P=0.013-0.056). CVR was significantly lower in the preeclampsia compared with the normotensive group (2.1±1.3 versus 2.9±1.1 cm·s⁻¹·mm Hg⁻¹; P=0.009). Globally, the association of the 7 identified principal components with preeclampsia (P=0.107) and with baseline middle cerebral artery velocity (P=0.067) did not reach statistical significance. The interaction between pregnancy history and principal components with respect to CVR (P=0.084) was driven by a nominally significant interaction between preeclampsia and the individual principal component defined by blood elements, platelet aggregation, and interactions of platelets with monocytes and granulocytes (P=0.008). These results suggest that having a history of preeclampsia negatively affects the cerebral circulation years beyond the pregnancy and that this effect was associated with activated blood elements. © 2017 American Heart Association, Inc.
Value assignment and uncertainty evaluation for single-element reference solutions
NASA Astrophysics Data System (ADS)
Possolo, Antonio; Bodnar, Olha; Butler, Therese A.; Molloy, John L.; Winchester, Michael R.
2018-06-01
A Bayesian statistical procedure is proposed for value assignment and uncertainty evaluation for the mass fraction of the elemental analytes in single-element solutions distributed as NIST standard reference materials. The principal novelty that we describe is the use of information about relative differences observed historically between the measured values obtained via gravimetry and via high-performance inductively coupled plasma optical emission spectrometry, to quantify the uncertainty component attributable to between-method differences. This information is encapsulated in a prior probability distribution for the between-method uncertainty component, and it is then used, together with the information provided by current measurement data, to produce a probability distribution for the value of the measurand from which an estimate and evaluation of uncertainty are extracted using established statistical procedures.
Regional Morphology Empirical Analysis Package (RMAP): Orthogonal Function Analysis, Background and Examples (ERDC TN-SWWRP-07-9)
2007-10-01
Field-effect transistors (2nd revised and enlarged edition)
NASA Astrophysics Data System (ADS)
Bocharov, L. N.
The design, principle of operation, and principal technical characteristics of field-effect transistors produced in the USSR are described. Problems related to the use of field-effect transistors in various radioelectronic devices are examined, and tables of parameters and mean statistical characteristics are presented for the main types of field-effect transistors. Methods for calculating various circuit components are discussed and illustrated by numerical examples.
Improving the Power of GWAS and Avoiding Confounding from Population Stratification with PC-Select
Tucker, George; Price, Alkes L.; Berger, Bonnie
2014-01-01
Using a reduced subset of SNPs in a linear mixed model can improve power for genome-wide association studies, yet this can result in insufficient correction for population stratification. We propose a hybrid approach using principal components that does not inflate statistics in the presence of population stratification and improves power over standard linear mixed models. PMID:24788602
Climatic change projections for winter streamflow in Guadalquivir river
NASA Astrophysics Data System (ADS)
Jesús Esteban Parra, María; Hidalgo Muñoz, José Manuel; García-Valdecasas-Ojeda, Matilde; Raquel Gámiz Fortis, Sonia; Castro Díez, Yolanda
2015-04-01
In this work we have obtained climate change projections for winter streamflow of the Guadalquivir River in the period 2071-2100 using the Principal Component Regression (PCR) method. The streamflow database used was provided by the Center for Studies and Experimentation of Public Works, CEDEX. Series from gauging stations and reservoirs with less than 10% of missing data (filled by regression with well-correlated neighboring stations) have been considered. The homogeneity of these series has been evaluated through the Pettitt test, and the degree of human alteration through the Common Area Index. The application of these criteria led to the selection of 13 streamflow time series homogeneously distributed over the basin, covering the period 1952-2011. For these streamflow data, winter seasonal values were obtained by averaging the monthly values from January to March. The PCR method has been applied using the Principal Components of the mean anomalies of sea level pressure (SLP) in winter (December to February averaged) as predictors of streamflow for the development of a downscaled statistical model. The SLP database is the NCEP reanalysis covering the North Atlantic region, and the calibration and validation periods used for fitting and evaluating the ability of the model are 1952-1992 and 1993-2011, respectively. In general, using four Principal Components, regression models are able to explain up to 70% of the variance of the streamflow data. Finally, the statistical model obtained for the observational data was applied to the SLP data for the period 2071-2100, using the outputs of different CMIP5 GCMs under the RCP8.5 scenario. The results found for the end of the century show no significant change or a moderate decrease in the winter streamflow of this river for most GCMs, but for some of them the decrease is very strong. Keywords: Statistical downscaling, streamflow, Guadalquivir River, climate change. ACKNOWLEDGEMENTS This work has been financed by the projects P11-RNM-7941 (Junta de Andalucía-Spain) and CGL2013-48539-R (MINECO-Spain, FEDER).
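The core of the PCR downscaling step described above (principal components of the predictor field feeding a linear regression for the predictand, with separate calibration and validation periods) can be sketched as follows; the synthetic SLP field, streamflow series, split and component count are illustrative assumptions, not the CEDEX/NCEP data or the authors' configuration.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(2)
    # Synthetic winter-mean SLP anomaly field (years x grid points) and a
    # streamflow series that depends on two large-scale pressure patterns.
    years, gridpts = 60, 400
    patterns = rng.normal(size=(2, gridpts))
    amps = rng.normal(size=(years, 2))
    slp = amps @ patterns + 0.3 * rng.normal(size=(years, gridpts))
    flow = 2.0 * amps[:, 0] - 1.0 * amps[:, 1] + 0.2 * rng.normal(size=years)

    # Calibration/validation split, loosely mirroring a 1952-1992 / 1993-2011 split.
    cal, val = slice(0, 40), slice(40, None)

    pca = PCA(n_components=4).fit(slp[cal])              # PCs of the SLP predictors
    reg = LinearRegression().fit(pca.transform(slp[cal]), flow[cal])
    r2 = reg.score(pca.transform(slp[val]), flow[val])   # skill on the validation years
    print("validation R^2 with 4 PCs:", round(r2, 3))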
NASA Astrophysics Data System (ADS)
Polat, Esra; Gunay, Suleyman
2013-10-01
One of the problems encountered in Multiple Linear Regression (MLR) is multicollinearity, which causes overestimation of the regression parameters and increases their variances. Hence, when multicollinearity is present, biased estimation procedures such as classical Principal Component Regression (CPCR) and Partial Least Squares Regression (PLSR) are used instead. The SIMPLS algorithm is the leading PLSR algorithm because of its speed and efficiency, and because its results are easier to interpret. However, both CPCR and SIMPLS yield very unreliable results when the data set contains outlying observations. Therefore, Hubert and Vanden Branden (2003) presented a robust PCR (RPCR) method and a robust PLSR (RPLSR) method called RSIMPLS. In RPCR, a robust Principal Component Analysis (PCA) method for high-dimensional data is first applied to the independent variables; the dependent variables are then regressed on the scores using a robust regression method. RSIMPLS is constructed from a robust covariance matrix for high-dimensional data and robust linear regression. The purpose of this study is to demonstrate the use of the RPCR and RSIMPLS methods on an econometric data set, comparing the two methods on an inflation model for Turkey. The considered methods have been compared in terms of predictive ability and goodness of fit by using a robust Root Mean Squared Error of Cross-validation (R-RMSECV), a robust R2 value and the Robust Component Selection (RCS) statistic.
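To make the two-stage idea behind RPCR concrete, here is a minimal sketch on synthetic collinear data with injected outliers. It is emphatically not the ROBPCA/RSIMPLS algorithm of Hubert and Vanden Branden: ordinary PCA scores are simply paired with a Huber-type robust regression as a simplified stand-in for the robust second stage, and all data and settings are assumptions.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.linear_model import HuberRegressor, LinearRegression

    rng = np.random.default_rng(3)
    # Collinear predictors plus a response, with a few gross outliers injected.
    n, p = 120, 10
    latent = rng.normal(size=(n, 2))
    X = latent @ rng.normal(size=(2, p)) + 0.1 * rng.normal(size=(n, p))
    y = latent[:, 0] - 2 * latent[:, 1] + 0.1 * rng.normal(size=n)
    y[:5] += 15.0                                     # outlying observations

    scores = PCA(n_components=2).fit_transform(X)     # stage 1: dimension reduction
    ols = LinearRegression().fit(scores, y)           # classical PCR second stage
    rob = HuberRegressor().fit(scores, y)             # robust second stage
    print("OLS coefs:", ols.coef_.round(2), " Huber coefs:", rob.coef_.round(2))

The outliers pull the classical coefficients away from the values recovered by the robust fit, which is the effect the robust methods in the abstract are designed to resist.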
On the Fallibility of Principal Components in Research
ERIC Educational Resources Information Center
Raykov, Tenko; Marcoulides, George A.; Li, Tenglong
2017-01-01
The measurement error in principal components extracted from a set of fallible measures is discussed and evaluated. It is shown that as long as one or more measures in a given set of observed variables contains error of measurement, so also does any principal component obtained from the set. The error variance in any principal component is shown…
Statistical learning and selective inference.
Taylor, Jonathan; Tibshirani, Robert J
2015-06-23
We describe the problem of "selective inference." This addresses the following challenge: Having mined a set of data to find potential associations, how do we properly assess the strength of these associations? The fact that we have "cherry-picked"--searched for the strongest associations--means that we must set a higher bar for declaring significant the associations that we see. This challenge becomes more important in the era of big data and complex statistical modeling. The cherry tree (dataset) can be very large and the tools for cherry picking (statistical learning methods) are now very sophisticated. We describe some recent new developments in selective inference and illustrate their use in forward stepwise regression, the lasso, and principal components analysis.
The assessment of facial variation in 4747 British school children.
Toma, Arshed M; Zhurov, Alexei I; Playle, Rebecca; Marshall, David; Rosin, Paul L; Richmond, Stephen
2012-12-01
The aim of this study is to identify key components contributing to facial variation in a large population-based sample of 15.5-year-old children (2514 females and 2233 males). The subjects were recruited from the Avon Longitudinal Study of Parents and Children. Three-dimensional facial images were obtained for each subject using two high-resolution Konica Minolta laser scanners. Twenty-one reproducible facial landmarks were identified and their coordinates were recorded. The facial images were registered using Procrustes analysis. Principal component analysis was then employed to identify independent groups of correlated coordinates. For the total data set, 14 principal components (PCs) were identified which explained 82 per cent of the total variance, with the first three components accounting for 46 per cent of the variance. Similar results were obtained for males and females separately with only subtle gender differences in some PCs. Facial features may be treated as a multidimensional statistical continuum with respect to the PCs. The first three PCs characterize the face in terms of height, width, and prominence of the nose. The derived PCs may be useful to identify and classify faces according to a scale of normality.
Nature of Driving Force for Protein Folding: A Result From Analyzing the Statistical Potential
NASA Astrophysics Data System (ADS)
Li, Hao; Tang, Chao; Wingreen, Ned S.
1997-07-01
In a statistical approach to protein structure analysis, Miyazawa and Jernigan derived a 20×20 matrix of inter-residue contact energies between different types of amino acids. Using the method of eigenvalue decomposition, we find that the Miyazawa-Jernigan matrix can be accurately reconstructed from its first two principal component vectors as M_ij = C_0 + C_1(q_i + q_j) + C_2 q_i q_j, with constants C_0, C_1, C_2 and 20 values q_i associated with the 20 amino acids. This regularity is due to hydrophobic interactions and a force of demixing, the latter obeying Hildebrand's solubility theory of simple liquids.
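A small numerical sketch of the reconstruction idea: a symmetric matrix built with the assumed form M_ij = C_0 + C_1(q_i + q_j) + C_2 q_i q_j lies (up to noise) in the span of the two leading eigenmodes, so a rank-2 eigen-reconstruction recovers it almost exactly. The q values and constants below are arbitrary, not the actual Miyazawa-Jernigan energies.

    import numpy as np

    rng = np.random.default_rng(4)
    # Synthetic symmetric 20x20 "contact energy" matrix with the assumed structure
    # plus a small perturbation.
    q = rng.normal(size=20)
    C0, C1, C2 = -2.0, 0.5, 1.5
    M = C0 + C1 * (q[:, None] + q[None, :]) + C2 * np.outer(q, q)
    M += 0.01 * rng.normal(size=(20, 20))
    M = (M + M.T) / 2

    w, V = np.linalg.eigh(M)                 # eigenvalue decomposition
    idx = np.argsort(np.abs(w))[::-1][:2]    # two dominant eigenmodes
    M2 = (V[:, idx] * w[idx]) @ V[:, idx].T  # rank-2 reconstruction
    err = np.linalg.norm(M - M2) / np.linalg.norm(M)
    print("relative rank-2 reconstruction error:", round(err, 4))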
Differential use of fresh water environments by wintering waterfowl of coastal Texas
White, D.H.; James, D.
1978-01-01
A comparative study of the environmental relationships among 14 species of wintering waterfowl was conducted at the Welder Wildlife Foundation, San Patricio County, near Sinton, Texas, during the fall and early winter of 1973. Measurements of 20 environmental factors (social, vegetational, physical, and chemical) were subjected to multivariate statistical methods to determine certain niche characteristics and environmental relationships of waterfowl wintering in the aquatic community. Each waterfowl species occupied a unique realized niche by responding to distinct combinations of environmental factors identified by principal component analysis. One percent confidence ellipses circumscribing the mean scores plotted for the first and second principal components gave an indication of relative niche width for each species. The waterfowl environments were significantly different interspecifically, and water depth at the feeding site and percent emergent vegetation were most important in the separation. This was shown by subjecting the transformed data to multivariate analysis of variance with an associated step-down procedure. The species were distributed along a community cline extending from shallow water with abundant emergent vegetation to open deep water with little emergent vegetation of any kind. Four waterfowl subgroups were significantly separated along the cline, as indicated by one-way analysis of variance with Duncan's multiple range test. Clumping of the bird species toward the middle of the available habitat hyperspace was shown in a plot of the principal component scores for the random samples and individual species. Naturally occurring relationships among waterfowl were clarified using principal component analysis and related multivariate procedures. These techniques may prove useful in wetland management for particular groups of waterfowl based on habitat preferences.
Principal Curves on Riemannian Manifolds.
Hauberg, Soren
2016-09-01
Euclidean statistics are often generalized to Riemannian manifolds by replacing straight-line interpolations with geodesic ones. While these Riemannian models are familiar-looking, they are restricted by the inflexibility of geodesics, and they rely on constructions which are optimal only in Euclidean domains. We consider extensions of Principal Component Analysis (PCA) to Riemannian manifolds. Classic Riemannian approaches seek a geodesic curve passing through the mean that optimizes a criterion of interest. The requirements that the solution both is geodesic and must pass through the mean tend to imply that the methods only work well when the manifold is mostly flat within the support of the generating distribution. We argue that instead of generalizing linear Euclidean models, it is more fruitful to generalize non-linear Euclidean models. Specifically, we extend the classic Principal Curves from Hastie & Stuetzle to data residing on a complete Riemannian manifold. We show that for elliptical distributions in the tangent space of spaces of constant curvature, the standard principal geodesic is a principal curve. The proposed model is simple to compute and avoids many of the pitfalls of traditional geodesic approaches. We empirically demonstrate the effectiveness of the Riemannian principal curves on several manifolds and datasets.
Ma, Li; Sun, Jing; Yang, Zhaoguang; Wang, Lin
2015-12-01
Heavy metal contamination has attracted widespread attention because of the strong toxicity and persistence of these metals. The Ganxi River, located in Chenzhou City, Southern China, has been severely polluted by lead/zinc ore mining activities. This work investigated the heavy metal pollution in agricultural soils around the Ganxi River. The total concentrations of heavy metals were determined by inductively coupled plasma-mass spectrometry. The potential risk associated with the heavy metals in soil was assessed by the Nemerow comprehensive index and the potential ecological risk index. In both methods, the study area was rated as very high risk. Multivariate statistical methods including Pearson's correlation analysis, hierarchical cluster analysis, and principal component analysis were employed to evaluate the relationships between heavy metals, as well as the correlation between heavy metals and pH, to identify the metal sources. Three distinct clusters were observed by hierarchical cluster analysis. In principal component analysis, a total of two components were extracted to explain over 90% of the total variance, both of which were associated with anthropogenic sources.
NASA Astrophysics Data System (ADS)
Dafu, Shen; Leihong, Zhang; Dong, Liang; Bei, Li; Yi, Kang
2017-07-01
The purpose of this study is to improve the reconstruction precision and to reproduce the color of spectral image surfaces more faithfully. A new spectral reflectance reconstruction algorithm based on an iterative threshold combined with a weighted principal component space is presented in this paper, with the principal components weighted by visual features serving as the sparse basis. Different numbers of color cards are selected as the training samples, a multispectral image is the testing sample, and the color differences in the reconstructions are compared. The channel response values are obtained with a Mega Vision high-accuracy, multi-channel imaging system. The results show that spectral reconstruction based on the weighted principal component space is superior in performance to that based on the traditional principal component space. Therefore, the color difference obtained using the compressive-sensing algorithm with weighted principal component analysis is less than that obtained using the algorithm with traditional principal component analysis, and better reconstructed color consistency with human eye vision is achieved.
NASA Astrophysics Data System (ADS)
El Sharawy, Mohamed S.; Gaafar, Gamal R.
2016-12-01
Both reservoir engineers and petrophysicists have been concerned with dividing a reservoir into zones for engineering and petrophysical purposes. Over the decades, several techniques and approaches have been introduced. Of these, statistical reservoir zonation, the stratigraphic modified Lorenz (SML) plot, and principal component and clustering analyses were chosen and applied to the Nubian sandstone reservoir of Palaeozoic - Lower Cretaceous age, Gulf of Suez, Egypt, using five adjacent wells. The studied reservoir consists mainly of sandstone with some intercalated shale layers whose thickness varies from one well to another. The permeability ranged from less than 1 md to more than 1000 md. The statistical reservoir zonation technique, depending on core permeability, indicated that the cored interval of the studied reservoir can be divided into two zones. Using reservoir properties such as porosity, bulk density, acoustic impedance and interval transit time also indicated two zones, with an obvious variation in separation depth and zone continuity. The stratigraphic modified Lorenz (SML) plot indicated the presence of more than 9 flow units in the cored interval as well as a high degree of microscopic heterogeneity. On the other hand, principal component and cluster analyses, depending on well logging data (gamma ray, sonic, density and neutron), indicated that the whole reservoir can be divided into at least four electrofacies having a noticeable variation in reservoir quality, as correlated with the measured permeability. Furthermore, continuity or discontinuity of the reservoir zones can be determined using this analysis.
Principal Component and Linkage Analysis of Cardiovascular Risk Traits in the Norfolk Isolate
Cox, Hannah C.; Bellis, Claire; Lea, Rod A.; Quinlan, Sharon; Hughes, Roger; Dyer, Thomas; Charlesworth, Jac; Blangero, John; Griffiths, Lyn R.
2009-01-01
Objective(s) An individual's risk of developing cardiovascular disease (CVD) is influenced by genetic factors. This study focussed on mapping genetic loci for CVD-risk traits in a unique population isolate derived from Norfolk Island. Methods This investigation focussed on 377 individuals descended from the population founders. Principal component analysis was used to extract orthogonal components from 11 cardiovascular risk traits. Multipoint variance component methods, implemented in SOLAR, were used to assess genome-wide linkage to the derived factors. A total of 285 of the 377 related individuals were informative for linkage analysis. Results A total of 4 principal components accounting for 83% of the total variance were derived. Principal component 1 was loaded with body size indicators; principal component 2 with body size, cholesterol and triglyceride levels; principal component 3 with the blood pressures; and principal component 4 with LDL-cholesterol and total cholesterol levels. Suggestive evidence of linkage for principal component 2 (h2 = 0.35) was observed on chromosome 5q35 (LOD = 1.85; p = 0.0008). Peak regions on chromosomes 10p11.2 (LOD = 1.27; p = 0.005) and 12q13 (LOD = 1.63; p = 0.003) were observed to segregate with principal components 1 (h2 = 0.33) and 4 (h2 = 0.42), respectively. Conclusion(s): This study investigated a number of CVD risk traits in a unique isolated population. Findings support the clustering of CVD risk traits and provide interesting evidence of a region on chromosome 5q35 segregating with weight, waist circumference, HDL-c and total triglyceride levels. PMID:19339786
NASA Astrophysics Data System (ADS)
Shah, Syed Muhammad Saqlain; Batool, Safeera; Khan, Imran; Ashraf, Muhammad Usman; Abbas, Syed Hussnain; Hussain, Syed Adnan
2017-09-01
Automatic diagnosis of human diseases is mostly achieved through decision support systems. The performance of these systems is mainly dependent on the selection of the most relevant features. This becomes harder when the dataset contains missing values for the different features. Probabilistic Principal Component Analysis (PPCA) is well suited to dealing with the problem of missing attribute values. This research presents a methodology which uses the results of medical tests as input, extracts a reduced-dimensional feature subset, and provides a diagnosis of heart disease. The proposed methodology extracts high-impact features in a new projection by using Probabilistic Principal Component Analysis (PPCA). PPCA extracts the projection vectors that contribute the highest covariance, and these projection vectors are used to reduce the feature dimension. The selection of projection vectors is done through Parallel Analysis (PA). The reduced-dimension feature subset is provided to a radial basis function (RBF) kernel-based Support Vector Machine (SVM). The RBF-based SVM performs classification into two categories, i.e., Heart Patient (HP) and Normal Subject (NS). The proposed methodology is evaluated through accuracy, specificity and sensitivity over three UCI datasets, i.e., Cleveland, Switzerland and Hungarian. The statistical results achieved through the proposed technique are presented in comparison with existing research, showing its impact. The proposed technique achieved an accuracy of 82.18%, 85.82% and 91.30% for the Cleveland, Hungarian and Switzerland datasets, respectively.
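A minimal sketch of the overall pipeline (handle missing values, choose the number of components via a parallel-analysis style comparison, reduce dimension, classify with an RBF SVM) is shown below on synthetic data. Mean imputation and ordinary PCA are crude stand-ins for PPCA, the single-permutation parallel analysis is a simplification of the usual repeated-simulation procedure, and none of the data or settings come from the UCI heart datasets.

    import numpy as np
    from sklearn.impute import SimpleImputer
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(5)
    # Synthetic stand-in for 13 medical-test features driven by 3 latent factors.
    n, p = 300, 13
    latent = rng.normal(size=(n, 3))
    X = latent @ rng.normal(size=(3, p)) + 0.5 * rng.normal(size=(n, p))
    y = (latent[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)   # HP vs NS
    X[rng.random(size=(n, p)) < 0.1] = np.nan                        # ~10% missing

    Xs = StandardScaler().fit_transform(SimpleImputer(strategy="mean").fit_transform(X))

    # Parallel analysis (single-permutation variant): keep components whose
    # eigenvalues exceed those obtained after shuffling each column independently.
    eig = PCA().fit(Xs).explained_variance_
    Xperm = np.column_stack([rng.permutation(col) for col in Xs.T])
    k = max(1, int(np.sum(eig > PCA().fit(Xperm).explained_variance_)))

    Z = PCA(n_components=k).fit_transform(Xs)
    acc = cross_val_score(SVC(kernel="rbf"), Z, y, cv=5).mean()
    print("components kept:", k, "  cross-validated accuracy:", round(acc, 3))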
NASA Astrophysics Data System (ADS)
Chattopadhyay, Surajit; Chattopadhyay, Goutami
2012-10-01
In the work discussed in this paper we considered total ozone time series over Kolkata (22°34'10.92″N, 88°22'10.92″E), an urban area in eastern India. Using cloud cover, average temperature, and rainfall as the predictors, we developed an artificial neural network, in the form of a multilayer perceptron with sigmoid non-linearity, for prediction of monthly total ozone concentrations from values of the predictors in previous months. We also estimated total ozone from values of the predictors in the same month. Before development of the neural network model we removed multicollinearity by means of principal component analysis. On the basis of the variables extracted by principal component analysis, we developed three artificial neural network models. By rigorous statistical assessment it was found that cloud cover and rainfall can act as good predictors for monthly total ozone when they are considered as the set of input variables for the neural network model constructed in the form of a multilayer perceptron. In general, the artificial neural network has good potential for predicting and estimating monthly total ozone on the basis of the meteorological predictors. It was further observed that during pre-monsoon and winter seasons, the proposed models perform better than during and after the monsoon.
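The modelling step described above (remove multicollinearity among the meteorological predictors with PCA, then feed the components to a multilayer perceptron with sigmoid non-linearity) can be sketched as follows; the synthetic cloud cover, temperature, rainfall and ozone values, the network size and all other settings are assumptions for illustration, not the Kolkata series or the authors' configuration.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler
    from sklearn.neural_network import MLPRegressor
    from sklearn.pipeline import make_pipeline
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(6)
    # Synthetic monthly records: cloud cover, temperature and rainfall as
    # predictors of total ozone (arbitrary units).
    n = 240
    met = rng.normal(size=(n, 3))
    ozone = 260 + 10 * met[:, 0] - 6 * met[:, 2] + 3 * rng.normal(size=n)

    # Decorrelate the predictors with PCA before the multilayer perceptron.
    model = make_pipeline(
        StandardScaler(),
        PCA(n_components=3),
        MLPRegressor(hidden_layer_sizes=(8,), activation="logistic",
                     solver="lbfgs", max_iter=2000, random_state=0),
    )
    print("cross-validated R^2:", cross_val_score(model, met, ozone, cv=5).mean().round(3))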
Hrdlickova Kuckova, Stepanka; Rambouskova, Gabriela; Hynek, Radovan; Cejnar, Pavel; Oltrogge, Doris; Fuchs, Robert
2015-11-01
Matrix-assisted laser desorption/ionisation-time of flight (MALDI-TOF) mass spectrometry is commonly used for the identification of proteinaceous binders and their mixtures in artworks. The determination of protein binders is based on a comparison between the m/z values of tryptic peptides in the unknown sample and a reference one (egg, casein, animal glues etc.), but this method has greater potential to study changes due to ageing and the influence of organic/inorganic components on protein identification. However, it is then necessary to carry out statistical evaluation of the obtained data. Until now, it has been complicated to routinely import mass spectrometric data into a statistical programme and to extract and match the appropriate peaks. Only a few 'homemade' computer programmes without user-friendly interfaces are available for these purposes. In this paper, we would like to present our completely new, publicly available, non-commercial software, ms-alone and multiMS-toolbox, for principal component analysis of MALDI-TOF MS data in the R software environment, and their application to the study of the influence of heterogeneous matrices (organic lakes) on protein identification. Using this new software, we determined the main factors that influence the protein analyses of artificially aged model mixtures of organic lakes and fish glue, prepared according to historical recipes that were used for book illumination, using MALDI-TOF peptide mass mapping. Copyright © 2015 John Wiley & Sons, Ltd.
Cocco, Simona; Monasson, Remi; Weigt, Martin
2013-01-01
Various approaches have explored the covariation of residues in multiple-sequence alignments of homologous proteins to extract functional and structural information. Among those are principal component analysis (PCA), which identifies the most correlated groups of residues, and direct coupling analysis (DCA), a global inference method based on the maximum entropy principle, which aims at predicting residue-residue contacts. In this paper, inspired by the statistical physics of disordered systems, we introduce the Hopfield-Potts model to naturally interpolate between these two approaches. The Hopfield-Potts model allows us to identify relevant ‘patterns’ of residues from the knowledge of the eigenmodes and eigenvalues of the residue-residue correlation matrix. We show how the computation of such statistical patterns makes it possible to accurately predict residue-residue contacts with a much smaller number of parameters than DCA. This dimensional reduction allows us to avoid overfitting and to extract contact information from multiple-sequence alignments of reduced size. In addition, we show that low-eigenvalue correlation modes, discarded by PCA, are important to recover structural information: the corresponding patterns are highly localized, that is, they are concentrated in few sites, which we find to be in close contact in the three-dimensional protein fold. PMID:23990764
NASA Astrophysics Data System (ADS)
Tibaduiza, D.-A.; Torres-Arredondo, M.-A.; Mujica, L. E.; Rodellar, J.; Fritzen, C.-P.
2013-12-01
This article is concerned with the practical use of Multiway Principal Component Analysis (MPCA), Discrete Wavelet Transform (DWT), Squared Prediction Error (SPE) measures and Self-Organizing Maps (SOM) to detect and classify damages in mechanical structures. The formalism is based on a distributed piezoelectric active sensor network for the excitation and detection of structural dynamic responses. Statistical models are built using PCA when the structure is known to be healthy either directly from the dynamic responses or from wavelet coefficients at different scales representing Time-frequency information. Different damages on the tested structures are simulated by adding masses at different positions. The data from the structure in different states (damaged or not) are then projected into the different principal component models by each actuator in order to obtain the input feature vectors for a SOM from the scores and the SPE measures. An aircraft fuselage from an Airbus A320 and a multi-layered carbon fiber reinforced plastic (CFRP) plate are used as examples to test the approaches. Results are presented, compared and discussed in order to determine their potential in structural health monitoring. These results showed that all the simulated damages were detectable and the selected features proved capable of separating all damage conditions from the undamaged state for both approaches.
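Of the pipeline described above, the PCA baseline model and the Squared Prediction Error (SPE) damage index are straightforward to sketch; the synthetic "healthy" responses, component count, control limit and simulated damage below are assumptions for illustration, and the wavelet, multiway and SOM stages of the authors' approach are not reproduced.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(7)
    # "Healthy" dynamic responses: correlated sensor features from a pristine state.
    latent = rng.normal(size=(400, 3))
    healthy = latent @ rng.normal(size=(3, 20)) + 0.1 * rng.normal(size=(400, 20))

    scaler = StandardScaler().fit(healthy)
    pca = PCA(n_components=3).fit(scaler.transform(healthy))

    def spe(X):
        """Squared prediction error: residual after projecting onto the healthy model."""
        Z = scaler.transform(X)
        recon = pca.inverse_transform(pca.transform(Z))
        return np.sum((Z - recon) ** 2, axis=1)

    threshold = np.percentile(spe(healthy), 99)                   # empirical control limit
    damaged = healthy[:50] + rng.normal(0.0, 0.8, size=(50, 20))  # simulated damage
    print("alarms on damaged data:", int(np.sum(spe(damaged) > threshold)), "/ 50")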
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tardiff, Mark F.; Runkle, Robert C.; Anderson, K. K.
2006-01-23
The goal of primary radiation monitoring in support of routine screening and emergency response is to detect characteristics in vehicle radiation signatures that indicate the presence of potential threats. Two conceptual approaches to analyzing gamma-ray spectra for threat detection are isotope identification and anomaly detection. While isotope identification is the time-honored method, an emerging technique is anomaly detection that uses benign vehicle gamma ray signatures to define an expectation of the radiation signature for vehicles that do not pose a threat. Newly acquired spectra are then compared to this expectation using statistical criteria that reflect acceptable false alarm rates and probabilities of detection. The gamma-ray spectra analyzed here were collected at a U.S. land Port of Entry (POE) using a NaI-based radiation portal monitor (RPM). The raw data were analyzed to develop a benign vehicle expectation by decimating the original pulse-height channels to 35 energy bins, extracting composite variables via principal components analysis (PCA), and estimating statistically weighted distances from the mean vehicle spectrum with the Mahalanobis distance (MD) metric. This paper reviews the methods used to establish the anomaly identification criteria and presents a systematic analysis of the response of the combined PCA and MD algorithm to modeled mono-energetic gamma-ray sources.
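A minimal sketch of the combined PCA and Mahalanobis distance scheme described above, using synthetic Poisson "spectra" rebinned to 35 energy bins; the number of retained components, the 99.9th-percentile alarm threshold and the injected anomaly are assumptions for illustration, not the RPM data or the authors' criteria.

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(8)
    # Benign "vehicle spectra" decimated to 35 energy bins (synthetic counts).
    benign = rng.poisson(lam=50, size=(2000, 35)).astype(float)

    pca = PCA(n_components=5).fit(benign)
    scores = pca.transform(benign)
    cov_inv = np.linalg.inv(np.cov(scores, rowvar=False))
    mean = scores.mean(axis=0)

    def mahalanobis(X):
        """Statistically weighted distance from the mean vehicle spectrum."""
        d = pca.transform(X) - mean
        return np.sqrt(np.einsum("ij,jk,ik->i", d, cov_inv, d))

    # Alarm threshold set from the benign population for a chosen false-alarm rate.
    threshold = np.percentile(mahalanobis(benign), 99.9)
    anomaly = np.concatenate([np.zeros((10, 30)), 40 * np.ones((10, 5))], axis=1)
    suspect = benign[:10] + anomaly                   # extra counts in the top bins
    print("alarms:", int(np.sum(mahalanobis(suspect) > threshold)), "/ 10")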
NASA Astrophysics Data System (ADS)
Salman, Ahmad; Lapidot, Itshak; Pomerantz, Ami; Tsror, Leah; Shufan, Elad; Moreh, Raymond; Mordechai, Shaul; Huleihel, Mahmoud
2012-01-01
The early diagnosis of phytopathogens is of a great importance; it could save large economical losses due to crops damaged by fungal diseases, and prevent unnecessary soil fumigation or the use of fungicides and bactericides and thus prevent considerable environmental pollution. In this study, 18 isolates of three different fungi genera were investigated; six isolates of Colletotrichum coccodes, six isolates of Verticillium dahliae and six isolates of Fusarium oxysporum. Our main goal was to differentiate these fungi samples on the level of isolates, based on their infrared absorption spectra obtained using the Fourier transform infrared-attenuated total reflection (FTIR-ATR) sampling technique. Advanced statistical and mathematical methods: principal component analysis (PCA), linear discriminant analysis (LDA), and k-means were applied to the spectra after manipulation. Our results showed significant spectral differences between the various fungi genera examined. The use of k-means enabled classification between the genera with a 94.5% accuracy, whereas the use of PCA [3 principal components (PCs)] and LDA has achieved a 99.7% success rate. However, on the level of isolates, the best differentiation results were obtained using PCA (9 PCs) and LDA for the lower wavenumber region (800-1775 cm-1), with identification success rates of 87%, 85.5%, and 94.5% for Colletotrichum, Fusarium, and Verticillium strains, respectively.
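The PCA-plus-LDA classification step reported above can be sketched as follows on synthetic "spectra" for three classes; the spectral templates, noise level, number of PCs and cross-validation scheme are assumptions for illustration, not the FTIR-ATR data or the exact protocol of the study.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.pipeline import make_pipeline
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(9)
    # Synthetic "FTIR spectra" for three classes (genera), 500 wavenumber points each.
    n_per, n_pts, classes = 60, 500, 3
    templates = rng.normal(size=(classes, n_pts))
    X = np.vstack([t + 0.4 * rng.normal(size=(n_per, n_pts)) for t in templates])
    y = np.repeat(np.arange(classes), n_per)

    # A few leading PCs followed by LDA, mirroring the PCA (3 PCs) + LDA scheme.
    model = make_pipeline(PCA(n_components=3), LinearDiscriminantAnalysis())
    acc = cross_val_score(model, X, y, cv=5).mean()
    print("cross-validated genus-level accuracy:", round(acc, 3))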
Tsai, Tsung-Yuan; Li, Jing-Sheng; Wang, Shaobai; Li, Pingyue; Kwon, Young-Min; Li, Guoan
2013-01-01
The statistical shape model (SSM) method that uses 2D images of the knee joint to predict the 3D joint surface model has been reported in literature. In this study, we constructed a SSM database using 152 human CT knee joint models, including the femur, tibia and patella and analyzed the characteristics of each principal component of the SSM. The surface models of two in vivo knees were predicted using the SSM and their 2D bi-plane fluoroscopic images. The predicted models were compared to their CT joint models. The differences between the predicted 3D knee joint surfaces and the CT image-based surfaces were 0.30 ± 0.81 mm, 0.34 ± 0.79 mm and 0.36 ± 0.59 mm for the femur, tibia and patella, respectively (average ± standard deviation). The computational time for each bone of the knee joint was within 30 seconds using a personal computer. The analysis of this study indicated that the SSM method could be a useful tool to construct 3D surface models of the knee with sub-millimeter accuracy in real time. Thus it may have a broad application in computer assisted knee surgeries that require 3D surface models of the knee. PMID:24156375
Maurer, Christian; Federolf, Peter; von Tscharner, Vinzenz; Stirling, Lisa; Nigg, Benno M
2012-05-01
Changes in gait kinematics have often been analyzed using pattern recognition methods such as principal component analysis (PCA). It is usually just the first few principal components that are analyzed, because they describe the main variability within a dataset and thus represent the main movement patterns. However, while subtle changes in gait pattern (for instance, due to different footwear) may not change main movement patterns, they may affect movements represented by higher principal components. This study was designed to test two hypotheses: (1) speed and gender differences can be observed in the first principal components, and (2) small interventions such as changing footwear change the gait characteristics of higher principal components. Kinematic changes due to different running conditions (speed: 3.1 m/s and 4.9 m/s; gender; and footwear: control shoe and adidas MicroBounce shoe) were investigated by applying PCA and support vector machine (SVM) to a full-body reflective marker setup. Differences in speed changed the basic movement pattern, as was reflected by a change in the time-dependent coefficients derived from the first principal components. Gender was differentiated by using the time-dependent coefficients derived from intermediate principal components. (Intermediate principal components are characterized by limb rotations of the thigh and shank.) Different shoe conditions were identified in higher principal components. This study showed that different interventions can be analyzed using a full-body kinematic approach. Within the well-defined vector space spanned by the data of all subjects, higher principal components should also be considered because these components show the differences that result from small interventions such as footwear changes. Crown Copyright © 2012. Published by Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Nagai, Toshiki; Mitsutake, Ayori; Takano, Hiroshi
2013-02-01
A new relaxation mode analysis method, which is referred to as the principal component relaxation mode analysis method, has been proposed to handle a large number of degrees of freedom of protein systems. In this method, principal component analysis is carried out first and then relaxation mode analysis is applied to a small number of principal components with large fluctuations. To reduce the contribution of fast relaxation modes in these principal components efficiently, we have also proposed a relaxation mode analysis method using multiple evolution times. The principal component relaxation mode analysis method using two evolution times has been applied to an all-atom molecular dynamics simulation of human lysozyme in aqueous solution. Slow relaxation modes and corresponding relaxation times have been appropriately estimated, demonstrating that the method is applicable to protein systems.
Alignment of the Stanford Linear Collider Arcs: Concepts and results
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pitthan, R.; Bell, B.; Friedsam, H.
1987-02-01
The alignment of the Arcs for the Stanford Linear Collider at SLAC has posed problems in accelerator survey and alignment not encountered before. These problems come less from the tight tolerances of 0.1 mm, although reaching such a tight statistically defined accuracy in a controlled manner is difficult enough, but from the absence of a common reference plane for the Arcs. Traditional circular accelerators, including HERA and LEP, have been designed in one plane referenced to local gravity. For the SLC Arcs no such single plane exists. Methods and concepts developed to solve these and other problems, connected with the unique design of SLC, range from the first use of satellites for accelerator alignment, use of electronic laser theodolites for placement of components, computer control of the manual adjustment process, complete automation of the data flow incorporating the most advanced concepts of geodesy, strict separation of survey and alignment, to linear principal component analysis for the final statistical smoothing of the mechanical components.
Detecting most influencing courses on students grades using block PCA
NASA Astrophysics Data System (ADS)
Othman, Osama H.; Gebril, Rami Salah
2014-12-01
One of the modern solutions adopted in dealing with the problem of a large number of variables in statistical analyses is Block Principal Component Analysis (Block PCA). This modified technique can be used to reduce the vertical dimension (variables) of the data matrix Xn×p by selecting a smaller number of variables (say m) containing most of the statistical information. These selected variables can then be employed in further investigations and analyses. Block PCA is an adapted multistage version of the original PCA. It involves the application of Cluster Analysis (CA) and variable selection through sub-principal component scores (PCs). The application of Block PCA in this paper is a modified version of the original work of Liu et al (2002). The main objective was to apply PCA to each group of variables (established using cluster analysis) instead of involving the whole large set of variables, which has proved to be unreliable. In this work, Block PCA is used to reduce the size of a huge data matrix ((n = 41) × (p = 251)) consisting of the Grade Point Averages (GPA) of students in 251 courses (variables) in the Faculty of Science at Benghazi University. In other words, we construct a smaller analytical data matrix of the students' GPAs with fewer variables containing most of the variation (statistical information) in the original database. By applying Block PCA, 12 courses were found to 'absorb' most of the variation or influence in the original data matrix, and hence are worth keeping for future exploratory and analytical studies. In addition, the course Independent Study (Math.) was found to be the most influential course for students' GPA among the 12 selected courses.
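The multistage Block PCA idea (cluster the variables, run PCA within each block, keep the variable most aligned with each block's leading component) can be sketched as follows; the synthetic grade matrix, the number of blocks and the selection rule are assumptions for illustration and differ in scale and detail from the Liu et al (2002) procedure adapted in the paper.

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(10)
    # Synthetic grade matrix: 41 students x 60 courses built from a few latent skills.
    n, p, n_blocks = 41, 60, 4
    skills = rng.normal(size=(n, n_blocks))
    loadings = np.zeros((n_blocks, p))
    for b in range(n_blocks):                      # each course loads on one skill
        loadings[b, b::n_blocks] = 1.0
    X = StandardScaler().fit_transform(skills @ loadings + 0.3 * rng.normal(size=(n, p)))

    # Step 1: cluster the variables (courses) using a correlation distance.
    corr_dist = 1 - np.abs(np.corrcoef(X, rowvar=False))
    groups = fcluster(linkage(corr_dist[np.triu_indices(p, 1)], method="average"),
                      t=n_blocks, criterion="maxclust")

    # Step 2: PCA within each block; keep the course most aligned with the block's PC1.
    selected = []
    for g in np.unique(groups):
        cols = np.where(groups == g)[0]
        pc1 = PCA(n_components=1).fit_transform(X[:, cols]).ravel()
        best = cols[np.argmax([abs(np.corrcoef(X[:, c], pc1)[0, 1]) for c in cols])]
        selected.append(best)
    print("representative course indices:", sorted(selected))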
Dong, Jianghu J; Wang, Liangliang; Gill, Jagbir; Cao, Jiguo
2017-01-01
This article is motivated by some longitudinal clinical data of kidney transplant recipients, where kidney function progression is recorded as the estimated glomerular filtration rates at multiple time points post kidney transplantation. We propose to use the functional principal component analysis method to explore the major source of variations of glomerular filtration rate curves. We find that the estimated functional principal component scores can be used to cluster glomerular filtration rate curves. Ordering functional principal component scores can detect abnormal glomerular filtration rate curves. Finally, functional principal component analysis can effectively estimate missing glomerular filtration rate values and predict future glomerular filtration rate values.
Q-mode versus R-mode principal component analysis for linear discriminant analysis (LDA)
NASA Astrophysics Data System (ADS)
Lee, Loong Chuen; Liong, Choong-Yeun; Jemain, Abdul Aziz
2017-05-01
Much of the literature applies Principal Component Analysis (PCA) either as a preliminary visualization method, as a variable construction method, or both. The focus of PCA can be on the samples (R-mode PCA) or on the variables (Q-mode PCA). Traditionally, R-mode PCA has been the usual approach for reducing high-dimensional data before the application of Linear Discriminant Analysis (LDA) to solve classification problems. The output from PCA consists of two new matrices known as the loadings and scores matrices. Each matrix can then be used to produce a plot: the loadings plot aids identification of important variables, whereas the scores plot presents the spatial distribution of samples on new axes, also known as Principal Components (PCs). Fundamentally, the scores matrix is always the input for building the classification model. A recent paper used Q-mode PCA, but the focus of the analysis was not on the variables but on the samples. As a result, the authors exchanged the roles of the loadings and scores plots: clustering of samples was studied using the loadings plot, whereas the scores plot was used to identify important manifest variables. Therefore, the aim of this study is to statistically validate the proposed practice. Evaluation is based on the external error of LDA models as a function of the number of PCs. In addition, bootstrapping was conducted to evaluate the external error of each of the LDA models. Results show that LDA models built on PCs from R-mode PCA give logical performance and that the corresponding external errors are unbiased, whereas those built with Q-mode PCA show the opposite. We therefore conclude that PCs produced from Q-mode PCA are not statistically stable and thus should not be applied to problems of classifying samples, but only to classifying variables. We hope this paper will provide some insights on this disputable issue.
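The evaluation machinery mentioned above (R-mode PCA scores feeding an LDA classifier, with bootstrap estimation of the external error as the number of PCs varies) is sketched below on synthetic two-class data; the data, the candidate PC counts and the out-of-bag error estimator are assumptions for illustration, and the Q-mode comparison studied in the paper is not reproduced.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.pipeline import make_pipeline

    rng = np.random.default_rng(11)
    # Synthetic two-class, high-dimensional data (samples x variables: R-mode layout).
    n_per, p = 40, 100
    X = np.vstack([rng.normal(0.0, 1.0, size=(n_per, p)),
                   rng.normal(0.6, 1.0, size=(n_per, p))])
    y = np.repeat([0, 1], n_per)

    def external_error(X, y, n_pcs, n_boot=100):
        """Bootstrap (out-of-bag) estimate of the external error of PCA+LDA."""
        errs = []
        for _ in range(n_boot):
            idx = rng.integers(0, len(y), size=len(y))       # bootstrap resample
            oob = np.setdiff1d(np.arange(len(y)), idx)        # out-of-bag test set
            if len(np.unique(y[idx])) < 2 or len(oob) == 0:
                continue
            model = make_pipeline(PCA(n_components=n_pcs), LinearDiscriminantAnalysis())
            model.fit(X[idx], y[idx])
            errs.append(1 - model.score(X[oob], y[oob]))
        return np.mean(errs)

    for k in (2, 5, 10):
        print(f"{k} PCs: bootstrap external error = {external_error(X, y, k):.3f}")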
Bostanov, Vladimir; Kotchoubey, Boris
2006-12-01
This study was aimed at developing a method for extraction and assessment of event-related brain potentials (ERP) from single-trials. This method should be applicable in the assessment of single persons' ERPs and should be able to handle both single ERP components and whole waveforms. We adopted a recently developed ERP feature extraction method, the t-CWT, for the purposes of hypothesis testing in the statistical assessment of ERPs. The t-CWT is based on the continuous wavelet transform (CWT) and Student's t-statistics. The method was tested in two ERP paradigms, oddball and semantic priming, by assessing individual-participant data on a single-trial basis, and testing the significance of selected ERP components, P300 and N400, as well as of whole ERP waveforms. The t-CWT was also compared to other univariate and multivariate ERP assessment methods: peak picking, area computation, discrete wavelet transform (DWT) and principal component analysis (PCA). The t-CWT produced better results than all of the other assessment methods it was compared with. The t-CWT can be used as a reliable and powerful method for ERP-component detection and testing of statistical hypotheses concerning both single ERP components and whole waveforms extracted from either single persons' or group data. The t-CWT is the first such method based explicitly on the criteria of maximal statistical difference between two average ERPs in the time-frequency domain and is particularly suitable for ERP assessment of individual data (e.g. in clinical settings), but also for the investigation of small and/or novel ERP effects from group data.
Wavelet decomposition based principal component analysis for face recognition using MATLAB
NASA Astrophysics Data System (ADS)
Sharma, Mahesh Kumar; Sharma, Shashikant; Leeprechanon, Nopbhorn; Ranjan, Aashish
2016-03-01
For the realization of face recognition systems in both static and real-time settings, algorithms such as principal component analysis, independent component analysis, linear discriminant analysis, neural networks and genetic algorithms have been used for decades. This paper discusses a wavelet decomposition based principal component analysis approach for face recognition. Principal component analysis is chosen over other algorithms because of its relative simplicity, efficiency, and robustness. Face recognition here means identifying a person from his or her facial features, and it bears some resemblance to factor analysis, i.e. the extraction of the principal components of an image. Principal component analysis suffers from some drawbacks, mainly poor discriminatory power and the large computational load of finding eigenvectors. These drawbacks can be greatly reduced by combining wavelet transform decomposition for feature extraction with principal component analysis for pattern representation and classification, analyzing the facial images jointly in the spatial and frequency domains. The experimental results indicate that this face recognition method achieves a significant percentage improvement in recognition rate as well as better computational efficiency.
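A minimal sketch of the combined approach, assuming the PyWavelets package for the wavelet step: a one-level 2-D Haar decomposition supplies the low-frequency feature sub-band, which is then projected onto principal components and classified. The synthetic "face" images, the Haar wavelet, the 20-component projection and the nearest-neighbour classifier are all illustrative assumptions, not the method's specified configuration.

    import numpy as np
    import pywt                                   # PyWavelets
    from sklearn.decomposition import PCA
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(12)
    # Synthetic 32x32 "face" images: 10 subjects, 8 noisy samples each.
    subjects, samples, size = 10, 8, 32
    prototypes = rng.normal(size=(subjects, size, size))
    images = np.vstack([[proto + 0.3 * rng.normal(size=(size, size))
                         for _ in range(samples)] for proto in prototypes])
    labels = np.repeat(np.arange(subjects), samples)

    # Keep the low-frequency approximation sub-band of a one-level 2-D Haar
    # decomposition, then project it onto principal components.
    features = np.array([pywt.dwt2(img, "haar")[0].ravel() for img in images])

    model = make_pipeline(PCA(n_components=20), KNeighborsClassifier(n_neighbors=1))
    acc = cross_val_score(model, features, labels, cv=4).mean()
    print("cross-validated recognition rate:", round(acc, 3))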
Investigation of carbon dioxide emission in China by primary component analysis.
Zhang, Jing; Wang, Cheng-Ming; Liu, Lian; Guo, Hang; Liu, Guo-Dong; Li, Yuan-Wei; Deng, Shi-Huai
2014-02-15
Principal component analysis (PCA) is employed to investigate the relationship between CO2 emissions (COEs) stemming from fossil fuel burning and cement manufacturing and their affecting factors. Eight affecting factors, namely, Population (P), Urban Population (UP); the Output Values of Primary Industry (PIOV), Secondary Industry (SIOV), and Tertiary Industry (TIOV); and the Proportions of Primary Industry's Output Value (PPIOV), Secondary Industry's Output Value (PSIOV), and Tertiary Industry's Output Value (PTIOV), are chosen. PCA is employed to eliminate the multicollinearity of the affecting factors. Two principal components, which can explain 92.86% of the variance of the eight affecting factors, are chosen as variables in the regression analysis. Ordinary least squares regression is used to estimate multiple linear regression models, in which COEs and the principal components serve as dependent and independent variables, respectively. The results are given in the following. (1) Theoretically, the carbon intensities of PIOV, SIOV, and TIOV are 2573.4693, 552.7036, and 606.0791 kt per one billion $, respectively. The incomplete statistical data, the different statistical standards, and the ideology of self-sufficiency and peasantry appear to explain why the carbon intensity of PIOV is higher than those of SIOV and TIOV in China. (2) PPIOV, PSIOV, and PTIOV influence the fluctuations of COE. The parameters of PPIOV, PSIOV, and PTIOV are -2706946.7564, 2557300.5450, and 3924767.9807 kt, respectively. As the economic structure of China is strongly tied to technology level, the period when PIOV occupied the leading position is characterized by lagging technology and economic development. Thus, the influence of PPIOV has a negative value. As the increase of PSIOV and PTIOV is always followed by technological innovation and economic development, PSIOV and PTIOV have the opposite influence. (3) The carbon intensities of P and UP are 1.1029 and 1.7862 kt per thousand people, respectively. The carbon intensity of the rural population can be inferred to be lower than 1.1029 kt per thousand people. The characteristics of poverty and the use of bio-energy in rural areas result in a carbon intensity of the rural population that is lower than that of P. Copyright © 2013 Elsevier B.V. All rights reserved.
The Relation between Factor Score Estimates, Image Scores, and Principal Component Scores
ERIC Educational Resources Information Center
Velicer, Wayne F.
1976-01-01
Investigates the relation between factor score estimates, principal component scores, and image scores. The three methods compared are maximum likelihood factor analysis, principal component analysis, and a variant of rescaled image analysis. (RC)
The Butterflies of Principal Components: A Case of Ultrafine-Grained Polyphase Units
NASA Astrophysics Data System (ADS)
Rietmeijer, F. J. M.
1996-03-01
Dusts in the accretion regions of chondritic interplanetary dust particles [IDPs] consisted of three principal components: carbonaceous units [CUs], carbon-bearing chondritic units [GUs] and carbon-free silicate units [PUs]. Among others, differences among chondritic IDP morphologies and variable bulk C/Si ratios reflect variable mixtures of principal components. The spherical shapes of the initially amorphous principal components remain visible in many chondritic porous IDPs, but fusion was documented for CUs, GUs and PUs. The PUs occur as coarse- and ultrafine-grained units that include so-called GEMS. Spherical principal components preserved in an IDP as recognisable textural units have unique properties with important implications for their petrological evolution from pre-accretion processing to protoplanet alteration and dynamic pyrometamorphism. Throughout their lifetime the units behaved as closed systems without chemical exchange with other units. This behaviour is reflected in their mineralogies, while the bulk compositions of principal components define the environments wherein they were formed.
NASA Astrophysics Data System (ADS)
Soares dos Santos, T.; Mendes, D.; Rodrigues Torres, R.
2016-01-01
Several studies have been devoted to dynamic and statistical downscaling for analysis of both climate variability and climate change. This paper introduces an application of artificial neural networks (ANNs) and multiple linear regression (MLR) by principal components to estimate rainfall in South America. This method is proposed for downscaling monthly precipitation time series over South America for three regions: the Amazon; northeastern Brazil; and the La Plata Basin, which is one of the regions of the planet that will be most affected by the climate change projected for the end of the 21st century. The downscaling models were developed and validated using CMIP5 model output and observed monthly precipitation. We used general circulation model (GCM) experiments for the 20th century (RCP historical; 1970-1999) and two scenarios (RCP 2.6 and 8.5; 2070-2100). The model test results indicate that the ANNs significantly outperform the MLR downscaling of monthly precipitation variability.
NASA Astrophysics Data System (ADS)
dos Santos, T. S.; Mendes, D.; Torres, R. R.
2015-08-01
Several studies have been devoted to dynamic and statistical downscaling for analysis of both climate variability and climate change. This paper introduces an application of artificial neural networks (ANN) and multiple linear regression (MLR) by principal components to estimate rainfall in South America. This method is proposed for downscaling monthly precipitation time series over South America for three regions: the Amazon, Northeastern Brazil and the La Plata Basin, which is one of the regions of the planet that will be most affected by the climate change projected for the end of the 21st century. The downscaling models were developed and validated using CMIP5 model output and observed monthly precipitation. We used GCM experiments for the 20th century (RCP Historical; 1970-1999) and two scenarios (RCP 2.6 and 8.5; 2070-2100). The model test results indicate that the ANN significantly outperforms the MLR downscaling of monthly precipitation variability.
Gambling, games of skill and human ecology: a pilot study by a multidimensional analysis approach.
Valera, Luca; Giuliani, Alessandro; Gizzi, Alessio; Tartaglia, Francesco; Tambone, Vittoradolfo
2015-01-01
The present pilot study aims at analyzing the human activity of playing in the light of an indicator of human ecology (HE). We highlighted the four essential anthropological dimensions (FEAD), starting from the analysis of questionnaires administered to actual gamers. The coherence between the theoretical construct and the observational data is a remarkable proof of concept of the possibility of establishing an experimentally motivated link between a philosophical construct (coming from Huizinga's Homo ludens definition) and actual gamers' motivation patterns. The starting hypothesis is that the activity of playing becomes ecological (and thus not harmful) when it achieves harmony among the FEAD, thus realizing HE; conversely, it risks creating some form of addiction when the FEAD balance is destroyed. We analyzed the data by means of variable clustering (oblique principal components) so as to experimentally verify the existence of the hypothesized dimensions. The subsequent projection of the statistical units (gamers) onto the orthogonal space spanned by the principal components allowed us to generate a meaningful, albeit preliminary, clustering of gamer profiles.
Optimizing protection efforts for amphibian conservation in Mediterranean landscapes
NASA Astrophysics Data System (ADS)
García-Muñoz, Enrique; Ceacero, Francisco; Carretero, Miguel A.; Pedrajas-Pulido, Luis; Parra, Gema; Guerrero, Francisco
2013-05-01
Amphibians epitomize the modern biodiversity crisis and attract great attention from the scientific community, since a complex puzzle of factors influences their disappearance. However, these factors are multiple and spatially variable, and the decline in each locality is due to a particular combination of causes. This study shows a suitable statistical procedure to determine threats to amphibian species in medium-sized administrative areas. For our case study, ten biological and ecological variables likely to affect the survival of 15 amphibian species were categorized and reduced through Principal Component Analysis. The principal components extracted were related to ecological plasticity, reproductive potential, and specificity of breeding habitats. Finally, the factor scores of the species were combined in a presence-absence matrix that provides the information needed to identify where and why conservation management is required. In summary, this methodology provides the necessary information to maximize the benefits of conservation measures in small areas by identifying which ecological factors need management efforts and where we should focus them.
Discrimination of serum Raman spectroscopy between normal and colorectal cancer
NASA Astrophysics Data System (ADS)
Li, Xiaozhou; Yang, Tianyue; Yu, Ting; Li, Siqi
2011-07-01
Raman spectroscopy of tissues has been widely studied for the diagnosis of various cancers, but biofluids have seldom been used as the analyte because of their low analyte concentrations. Here, Raman spectra of serum from 30 normal subjects, 46 colon cancer patients and 44 rectal cancer patients were measured and analyzed. Information from the Raman peaks (intensity and width) and from the fluorescence background (baseline function coefficients) was selected as parameters for statistical analysis. Principal component regression (PCR) and partial least squares regression (PLSR) were applied separately to the selected parameters to assess their performance; PCR performed better than PLSR on our spectral data. Linear discriminant analysis (LDA) was then applied to the principal components (PCs) of the two regression methods on the selected parameters, and diagnostic accuracies of 88% and 83% were obtained. The conclusion is that the selected features retain the information of the original spectra well and that Raman spectroscopy of serum has potential for the diagnosis of colorectal cancer.
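The workflow sketched in this abstract (dimension reduction on peak and baseline parameters, followed by a linear discriminant step) is a common chemometric pattern. The snippet below is an illustrative sketch only, using synthetic numbers and scikit-learn as a stand-in; the array sizes, component counts and class labels are assumptions, not the authors' data or code.

```python
# Illustrative sketch (synthetic data, not the authors' spectra): PCA scores followed
# by LDA, compared with PLS scores followed by LDA.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import PLSRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 40))       # 120 hypothetical serum spectra, 40 peak/baseline parameters
y = rng.integers(0, 2, size=120)     # 0 = normal, 1 = cancer (synthetic labels)
X[y == 1, :5] += 0.8                 # inject a weak class difference in a few parameters

# PCR-style route: PCA scores feed a linear discriminant classifier
pca_lda = make_pipeline(PCA(n_components=5), LinearDiscriminantAnalysis())
print("PCA + LDA accuracy:", cross_val_score(pca_lda, X, y, cv=5).mean())

# PLS route: supervised latent scores feed the same classifier
# (fitting PLS on the full set is a shortcut for illustration; nested CV avoids leakage)
pls_scores = PLSRegression(n_components=5).fit(X, y).transform(X)
print("PLS + LDA accuracy:", cross_val_score(LinearDiscriminantAnalysis(), pls_scores, y, cv=5).mean())
```

In a real study the PLS scores would be recomputed inside each cross-validation fold; the shortcut above only keeps the sketch compact.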
Foong, Shaohui; Sun, Zhenglong
2016-08-12
In this paper, a novel magnetic field-based sensing system employing statistically optimized concurrent multiple sensor outputs for precise field-position association and localization is presented. This method capitalizes on the independence between simultaneous spatial field measurements at multiple locations to induce unique correspondences between field and position. This single-source-multi-sensor configuration is able to achieve accurate and precise localization and tracking of translational motion without contact over large travel distances for feedback control. Principal component analysis (PCA) is used as a pseudo-linear filter to optimally reduce the dimensions of the multi-sensor output space for computationally efficient field-position mapping with artificial neural networks (ANNs). Numerical simulations are employed to investigate the effects of geometric parameters and Gaussian noise corruption on PCA assisted ANN mapping performance. Using a 9-sensor network, the sensing accuracy and closed-loop tracking performance of the proposed optimal field-based sensing system are experimentally evaluated on a linear actuator with a significantly more expensive optical encoder as a comparison.
Sensor Failure Detection of FASSIP System using Principal Component Analysis
NASA Astrophysics Data System (ADS)
Sudarno; Juarsa, Mulya; Santosa, Kussigit; Deswandri; Sunaryo, Geni Rina
2018-02-01
In the Fukushima Daiichi nuclear reactor accident in Japan, the damage to the core and pressure vessel was caused by the failure of the active cooling system (the diesel generators were inundated by the tsunami). Research on passive cooling systems for nuclear power plants is therefore performed to improve the safety aspects of nuclear reactors. The FASSIP system (Passive System Simulation Facility) is an installation used to study the characteristics of passive cooling systems at nuclear power plants. The accuracy of the sensor measurements of the FASSIP system is essential, because they are the basis for determining the characteristics of a passive cooling system. In this research, a sensor failure detection method for the FASSIP system is developed, so that indications of sensor failures can be detected early. The method used is Principal Component Analysis (PCA) to reduce the dimension of the sensor data, with the Squared Prediction Error (SPE) and Hotelling statistic criteria for detecting indications of sensor failure. The results show that the PCA method is capable of detecting the occurrence of a failure at any sensor.
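For readers unfamiliar with the SPE/Q and Hotelling statistics mentioned above, the following sketch shows how they are typically computed from a PCA model of normal operating data. The sensor count, thresholds and fault injection are assumptions for illustration; this is not the FASSIP implementation.

```python
# Minimal sketch (assumed details): PCA-based fault indicators using the squared
# prediction error (SPE/Q) and Hotelling's T^2 statistics.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X_train = rng.normal(size=(500, 8))              # 8 hypothetical sensors, normal operation
X_test = X_train[:50].copy()
X_test[25:, 3] += 5.0                            # simulate a drift on sensor 4

pca = PCA(n_components=3).fit(X_train)
T = pca.transform(X_test)                        # scores in the retained subspace
X_hat = pca.inverse_transform(T)                 # reconstruction from 3 PCs

spe = np.sum((X_test - X_hat) ** 2, axis=1)              # residual (SPE/Q) statistic
t2 = np.sum(T ** 2 / pca.explained_variance_, axis=1)    # Hotelling's T^2 statistic

# crude empirical thresholds from the training data (a common practical shortcut)
T_train = pca.transform(X_train)
spe_lim = np.percentile(np.sum((X_train - pca.inverse_transform(T_train)) ** 2, axis=1), 99)
t2_lim = np.percentile(np.sum(T_train ** 2 / pca.explained_variance_, axis=1), 99)
print("flagged samples:", np.where((spe > spe_lim) | (t2 > t2_lim))[0])
```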
NASA Astrophysics Data System (ADS)
Babanova, Sofia; Artyushkova, Kateryna; Ulyanova, Yevgenia; Singhal, Sameer; Atanassov, Plamen
2014-01-01
Two statistical methods, design of experiments (DOE) and principal component analysis (PCA), are employed to investigate and improve performance of air-breathing gas-diffusional enzymatic electrodes. DOE is utilized as a tool for systematic organization and evaluation of various factors affecting the performance of the composite system. Based on the results from the DOE, an improved cathode is constructed. The current density generated utilizing the improved cathode (755 ± 39 μA cm-2 at 0.3 V vs. Ag/AgCl) is 2-5 times higher than the highest current density previously achieved. Three major factors contributing to the cathode performance are identified: the amount of enzyme, the volume of phosphate buffer used to immobilize the enzyme, and the thickness of the gas-diffusion layer (GDL). PCA is applied as an independent confirmation tool to support conclusions made by DOE and to visualize the contribution of factors in individual cathode configurations.
Yu, Marcia M L; Sandercock, P Mark L
2012-01-01
During the forensic examination of textile fibers, fibers are usually mounted on glass slides for visual inspection and identification under the microscope. One method that has the capability to accurately identify single textile fibers without subsequent demounting is Raman microspectroscopy. The effect of the mountant Entellan New on the Raman spectra of fibers was investigated to determine if it is suitable for fiber analysis. Raman spectra of synthetic fibers mounted in three different ways were collected and subjected to multivariate analysis. Principal component analysis score plots revealed that while spectra from different fiber classes formed distinct groups, fibers of the same class formed a single group regardless of the mounting method. The spectra of bare fibers and those mounted in Entellan New were found to be statistically indistinguishable by analysis of variance calculations. These results demonstrate that fibers mounted in Entellan New may be identified directly by Raman microspectroscopy without further sample preparation. © 2011 American Academy of Forensic Sciences.
Assessment of technological level of stem cell research using principal component analysis.
Do Cho, Sung; Hwan Hyun, Byung; Kim, Jae Kyeom
2016-01-01
In general, technological levels have been assessed based on specialists' opinions through methods such as Delphi. In such cases, however, results can be significantly biased by the study design and the individual experts. In this study, therefore, scientific publications and patents were selected by means of analytic indexes for a statistical approach to the technical assessment of stem cell fields. The analytic indexes, the numbers and impact indexes of scientific publications and patents, were weighted based on principal component analysis and then summed into a single value. Technological obsolescence was calculated through the cited half-life of patents issued by the United States Patent and Trademark Office and was reflected in the technological level assessment. As a result, each nation's rank with respect to technological level was rated by the proposed method, and we were able to evaluate strengths and weaknesses thereof. Although our empirical research presents faithful results, further study is needed to compare the existing methods with the suggested method.
Gouvinhas, Irene; Machado, Nelson; Carvalho, Teresa; de Almeida, José M M M; Barros, Ana I R N A
2015-01-01
Extra virgin olive oils produced from three cultivars at different maturation stages were characterized using Raman spectroscopy. Chemometric methods (principal component analysis, discriminant analysis, principal component regression and partial least squares regression) applied to the Raman spectral data were used to evaluate and quantify the statistical differences between cultivars and their ripening process. The models for predicting the peroxide value and free acidity of olive oils showed good calibration and prediction values and presented high coefficients of determination (>0.933). Both the R(2) and the correlation equations between the measured chemical parameters and the values predicted by each approach are presented; these comprise both PCR and PLS, used to assess SNV-normalized Raman data, as well as the first and second derivatives of the spectra. This study demonstrates that a combination of Raman spectroscopy with multivariate analysis methods can be useful for rapidly predicting olive oil chemical characteristics during the maturation process. Copyright © 2014 Elsevier B.V. All rights reserved.
Classification Techniques for Multivariate Data Analysis.
1980-03-28
analysis among biologists, botanists, and ecologists, while some social scientists may refer to "typology". Other frequently encountered terms are pattern... the determinantal equation: |B - λW| = 0. The solutions λ_i are the eigenvalues of the matrix W^(-1)B, as in discriminant analysis. There are t non... The Statistical Package for the Social Sciences (SPSS) (14) subprogram FACTOR was used for the principal components analysis. It is designed both for the factor
NASA Astrophysics Data System (ADS)
Selivanova, Karina G.; Avrunin, Oleg G.; Zlepko, Sergii M.; Romanyuk, Sergii O.; Zabolotna, Natalia I.; Kotyra, Andrzej; Komada, Paweł; Smailova, Saule
2016-09-01
Research and systematization of motor disorders, taking into account clinical and neurophysiological phenomena, are an important and topical problem in neurology. The article describes a technique for decomposing surface electromyography (EMG) signals using Principal Component Analysis. The decomposition is achieved by a set of algorithms developed specifically for EMG analysis. The accuracy was verified by calculating the Mahalanobis distance and the probability of error.
Foch, Eric; Milner, Clare E
2014-01-03
Iliotibial band syndrome (ITBS) is a common knee overuse injury among female runners. Atypical discrete trunk and lower extremity biomechanics during running may be associated with the etiology of ITBS. Examining discrete data points limits the interpretation of a waveform to a single value. Characterizing entire kinematic and kinetic waveforms may provide additional insight into biomechanical factors associated with ITBS. Therefore, the purpose of this cross-sectional investigation was to determine whether female runners with previous ITBS exhibited differences in kinematics and kinetics compared to controls using a principal components analysis (PCA) approach. Forty participants comprised two groups: previous ITBS and controls. Principal component scores were retained for the first three principal components and were analyzed using independent t-tests. The retained principal components accounted for 93-99% of the total variance within each waveform. Runners with previous ITBS exhibited low principal component one scores for frontal plane hip angle. Principal component one accounted for the overall magnitude in hip adduction which indicated that runners with previous ITBS assumed less hip adduction throughout stance. No differences in the remaining retained principal component scores for the waveforms were detected among groups. A smaller hip adduction angle throughout the stance phase of running may be a compensatory strategy to limit iliotibial band strain. This running strategy may have persisted after ITBS symptoms subsided. © 2013 Published by Elsevier Ltd.
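A compact illustration of the analysis pipeline described above (PCA of time-normalized waveforms, then independent t-tests on the retained principal component scores) is sketched below on simulated curves; the waveform shapes, group sizes and noise levels are invented for the example and do not reproduce the study's data.

```python
# Minimal sketch (simulated waveforms): PCA of stance-phase curves with independent
# t-tests on the retained principal component scores.
import numpy as np
from scipy import stats
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
t = np.linspace(0, 1, 101)                                           # 101 samples of normalized stance
controls = 5 * np.sin(np.pi * t) + rng.normal(0, 0.5, size=(20, 101))
itbs = 4 * np.sin(np.pi * t) + rng.normal(0, 0.5, size=(20, 101))    # smaller overall magnitude
X = np.vstack([controls, itbs])
group = np.array([0] * 20 + [1] * 20)

pca = PCA(n_components=3).fit(X)                                     # retain the first three PCs
scores = pca.transform(X)
print("variance explained:", pca.explained_variance_ratio_.sum())

for k in range(3):                                                   # compare groups on each PC score
    t_stat, p = stats.ttest_ind(scores[group == 0, k], scores[group == 1, k])
    print(f"PC{k + 1}: t = {t_stat:.2f}, p = {p:.3f}")
```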
Zhang, Hong-Guang; Yang, Qin-Min; Lu, Jian-Gang
2014-04-01
In this paper, a novel discriminant methodology based on near-infrared spectroscopic analysis and least squares support vector machines is proposed for rapid and nondestructive discrimination of different types of polyacrylamide. The diffuse reflectance spectra of samples of non-ionic, anionic and cationic polyacrylamide were measured. Principal component analysis was then applied to reduce the dimension of the spectral data and extract the principal components. The first three principal components were used for cluster analysis of the three types of polyacrylamide, and those principal components were also used as inputs to the least squares support vector machine model. The parameters and the number of principal components used as inputs to the model were optimized through cross-validation based on grid search. Sixty samples of each type of polyacrylamide were collected, giving a total of 180 samples. Of these, 135 samples (45 of each type) were randomly assigned to a training set to build the calibration model, and the remaining 45 samples were used as a test set to evaluate the performance of the developed model. In addition, 5 cationic and 5 anionic polyacrylamide samples adulterated with different proportions of non-ionic polyacrylamide were prepared to show the feasibility of the proposed method for discriminating adulterated polyacrylamide samples. The prediction error threshold for each type of polyacrylamide was determined by an F statistical significance test based on the prediction errors of the corresponding training set in cross-validation. The discrimination accuracy of the built model was 100% on the test set. The predictions for the 10 mixed samples are also presented, and all were accurately discriminated as adulterated. The overall results demonstrate that the proposed method can rapidly and nondestructively discriminate the different types of polyacrylamide as well as adulterated samples, and offers a new approach to discriminating types of polyacrylamide.
Nature of Driving Force for Protein Folding-- A Result From Analyzing the Statistical Potential
NASA Astrophysics Data System (ADS)
Li, Hao; Tang, Chao; Wingreen, Ned S.
1998-03-01
In a statistical approach to protein structure analysis, Miyazawa and Jernigan (MJ) derived a 20 × 20 matrix of inter-residue contact energies between different types of amino acids. Using the method of eigenvalue decomposition, we find that the MJ matrix can be accurately reconstructed from its first two principal component vectors as M_ij = C_0 + C_1(q_i + q_j) + C_2 q_i q_j, with constant C's and 20 q values associated with the 20 amino acids. This regularity is due to hydrophobic interactions and a force of demixing, the latter obeying Hildebrand's solubility theory of simple liquids.
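The rank structure implied by the quoted decomposition can be checked numerically: a matrix of the form M_ij = C_0 + C_1(q_i + q_j) + C_2 q_i q_j has at most two nonzero eigenvalues, so two eigencomponents reconstruct it essentially exactly. The sketch below uses arbitrary constants and random q values, not the actual MJ contact energies.

```python
# Illustrative numerical check (synthetic matrix, assumed C0, C1, C2 and q values):
# reconstruct a symmetric interaction matrix from its two dominant eigencomponents.
import numpy as np

rng = np.random.default_rng(2)
q = rng.normal(size=20)                    # one scalar per "amino acid"
C0, C1, C2 = -1.0, 0.5, 2.0
M = C0 + C1 * (q[:, None] + q[None, :]) + C2 * np.outer(q, q)

w, V = np.linalg.eigh(M)                   # eigenvalue decomposition
idx = np.argsort(np.abs(w))[::-1][:2]      # two dominant eigencomponents
M2 = (V[:, idx] * w[idx]) @ V[:, idx].T    # rank-2 reconstruction

print("relative reconstruction error:", np.linalg.norm(M - M2) / np.linalg.norm(M))
```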
Systematic study of anharmonic features in a principal component analysis of gramicidin A.
Kurylowicz, Martin; Yu, Ching-Hsing; Pomès, Régis
2010-02-03
We use principal component analysis (PCA) to detect functionally interesting collective motions in molecular-dynamics simulations of membrane-bound gramicidin A. We examine the statistical and structural properties of all PCA eigenvectors and eigenvalues for the backbone and side-chain atoms. All eigenvalue spectra show two distinct power-law scaling regimes, quantitatively separating large from small covariance motions. Time trajectories of the largest PCs converge to Gaussian distributions at long timescales, but groups of small-covariance PCs, which are usually ignored as noise, have subdiffusive distributions. These non-Gaussian distributions imply anharmonic motions on the free-energy surface. We characterize the anharmonic components of motion by analyzing the mean-square displacement for all PCs. The subdiffusive components reveal picosecond-scale oscillations in the mean-square displacement at frequencies consistent with infrared measurements. In this regime, the slowest backbone mode exhibits tilting of the peptide planes, which allows carbonyl oxygen atoms to provide surrogate solvation for water and cation transport in the channel lumen. Higher-frequency modes are also apparent, and we describe their vibrational spectra. Our findings expand the utility of PCA for quantifying the essential features of motion on the anharmonic free-energy surface made accessible by atomistic molecular-dynamics simulations. Copyright (c) 2010 Biophysical Society. Published by Elsevier Inc. All rights reserved.
Identification of the isomers using principal component analysis (PCA) method
NASA Astrophysics Data System (ADS)
Kepceoǧlu, Abdullah; Gündoǧdu, Yasemin; Ledingham, Kenneth William David; Kilic, Hamdi Sukur
2016-03-01
In this work, we have carried out a detailed statistical analysis of experimental mass spectral data from xylene isomers. Principal Component Analysis (PCA) was used to identify the isomers, which cannot be distinguished using conventional statistical methods for interpretation of their mass spectra. Experiments were carried out using a linear TOF-MS coupled to a femtosecond laser system as an energy source for the ionisation processes. We performed experiments and collected data that were analysed and interpreted using PCA as a multivariate analysis of these spectra. This demonstrates the strength of the method for distinguishing isomers that cannot be identified using conventional mass analysis obtained through dissociative ionisation processes on these molecules. The PCA results, depending on the laser pulse energy and the background pressure in the spectrometer, are also presented in this work.
Martinez-Murcia, Francisco Jesús; Lai, Meng-Chuan; Górriz, Juan Manuel; Ramírez, Javier; Young, Adam M H; Deoni, Sean C L; Ecker, Christine; Lombardo, Michael V; Baron-Cohen, Simon; Murphy, Declan G M; Bullmore, Edward T; Suckling, John
2017-03-01
Neuroimaging studies have reported structural and physiological differences that could help understand the causes and development of Autism Spectrum Disorder (ASD). Many of them rely on multisite designs, with the recruitment of larger samples increasing statistical power. However, recent large-scale studies have put some findings into question, considering the results to be strongly dependent on the database used, and demonstrating the substantial heterogeneity within this clinically defined category. One major source of variance may be the acquisition of the data in multiple centres. In this work we analysed the differences found in the multisite, multi-modal neuroimaging database from the UK Medical Research Council Autism Imaging Multicentre Study (MRC AIMS) in terms of both diagnosis and acquisition sites. Since the dissimilarities between sites were higher than between diagnostic groups, we developed a technique called Significance Weighted Principal Component Analysis (SWPCA) to reduce the undesired intensity variance due to acquisition site and to increase the statistical power in detecting group differences. After eliminating site-related variance, statistically significant group differences were found, including Broca's area and the temporo-parietal junction. However, discriminative power was not sufficient to classify diagnostic groups, yielding accuracies close to random. Our work supports recent claims that ASD is a highly heterogeneous condition that is difficult to globally characterize by neuroimaging, and therefore different (and more homogeneous) subgroups should be defined to obtain a deeper understanding of ASD. Hum Brain Mapp 38:1208-1223, 2017. © 2016 Wiley Periodicals, Inc.
XCOM intrinsic dimensionality for low-Z elements at diagnostic energies
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bornefalk, Hans
2012-02-15
Purpose: To determine the intrinsic dimensionality of linear attenuation coefficients (LACs) from XCOM for elements with low atomic number (Z = 1-20) at diagnostic x-ray energies (25-120 keV). H_0^q, the hypothesis that the space of LACs is spanned by q bases, is tested for various q-values. Methods: Principal component analysis is first applied and the LACs are projected onto the first q principal component bases. The residuals of the model values vs XCOM data are determined for all energies and atomic numbers. Heteroscedasticity invalidates the prerequisite of i.i.d. errors necessary for bootstrapping residuals. Instead wild bootstrap is applied, which, by not mixing residuals, allows the effect of the non-i.i.d. residuals to be reflected in the result. Credible regions for the eigenvalues of the correlation matrix for the bootstrapped LAC data are determined. If subsequent credible regions for the eigenvalues overlap, the corresponding principal component is not considered to represent true data structure but noise. If this happens for eigenvalues l and l + 1, for any l ≤ q, H_0^q is rejected. Results: The largest value of q for which H_0^q is nonrejectable at the 5% level is q = 4. This indicates that the statistically significant intrinsic dimensionality of low-Z XCOM data at diagnostic energies is four. Conclusions: The method presented allows determination of the statistically significant dimensionality of any noisy linear subspace. Knowledge of such significant dimensionality is of interest for any method making assumptions on intrinsic dimensionality and evaluating results on noisy reference data. For LACs, knowledge of the low-Z dimensionality might be relevant when parametrization schemes are tuned to XCOM data. For x-ray imaging techniques based on the basis decomposition method (Alvarez and Macovski, Phys. Med. Biol. 21, 733-744, 1976), an underlying dimensionality of two is commonly assigned to the LAC of human tissue at diagnostic energies. The finding of a higher statistically significant dimensionality thus raises the question whether a higher assumed model dimensionality (now feasible with the advent of multibin x-ray systems) might also be practically relevant, i.e., if better tissue characterization results can be obtained.
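The eigenvalue-overlap test described above can be sketched in a few lines: fit a q-component PCA, wild-bootstrap the residuals, and form percentile intervals for the correlation-matrix eigenvalues. The synthetic data, sign-flip bootstrap variant, replicate count and interval levels below are assumptions for illustration; the paper's credible-region construction is more involved than this simple percentile version.

```python
# Sketch of the general procedure on synthetic data (not XCOM values): wild bootstrap
# of PCA residuals and percentile intervals for correlation-matrix eigenvalues.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
n_energies, n_elements, q = 96, 20, 3
latent = rng.normal(size=(n_energies, q)) @ rng.normal(size=(q, n_elements))
X = latent + rng.normal(scale=0.05, size=latent.shape)    # noisy rank-q "attenuation" data

pca = PCA(n_components=q).fit(X)
model = pca.inverse_transform(pca.transform(X))
resid = X - model

boot_eigs = []
for _ in range(200):
    signs = rng.choice([-1.0, 1.0], size=resid.shape)      # wild bootstrap: flip residual signs
    Xb = model + signs * resid
    corr = np.corrcoef(Xb, rowvar=False)
    boot_eigs.append(np.sort(np.linalg.eigvalsh(corr))[::-1])
boot_eigs = np.array(boot_eigs)

lo, hi = np.percentile(boot_eigs, [2.5, 97.5], axis=0)
for l in range(q + 2):                                     # overlapping intervals suggest noise
    print(f"eigenvalue {l + 1}: [{lo[l]:.3f}, {hi[l]:.3f}]")
```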
Early forest fire detection using principal component analysis of infrared video
NASA Astrophysics Data System (ADS)
Saghri, John A.; Radjabi, Ryan; Jacobs, John T.
2011-09-01
A land-based early forest fire detection scheme which exploits the infrared (IR) temporal signature of a fire plume is described. Unlike common land-based and/or satellite-based techniques which rely on measurement and discrimination of the fire plume directly from its infrared and/or visible reflectance imagery, this scheme is based on exploitation of the fire plume's temporal signature, i.e., temperature fluctuations over the observation period. The method is simple and relatively inexpensive to implement, and the false alarm rate is expected to be lower than that of existing methods. Land-based infrared (IR) cameras are installed in a step-stare-mode configuration in potential fire-prone areas. The sequence of IR video frames from each camera is digitally processed to determine if there is a fire within the camera's field of view (FOV). The process involves applying a principal component transformation (PCT) to each nonoverlapping sequence of video frames from the camera to produce a corresponding sequence of temporally uncorrelated principal component (PC) images. Since pixels that form a fire plume exhibit statistically similar temporal variation (i.e., have a unique temporal signature), PCT conveniently renders the footprint/trace of the fire plume in low-order PC images. The PC image which best reveals the trace of the fire plume is then selected and spatially filtered via simple threshold and median filter operations to remove background clutter, such as traces of tree branches moving in the wind.
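To make the detection chain concrete (principal component transform over a frame sequence, then threshold and median filtering of a low-order PC image), here is a toy sketch on synthetic frames. The frame size, plume location, chosen PC index and threshold are assumed values selected only for illustration; in practice the PC image that best reveals the plume trace would be selected.

```python
# Illustrative sketch (synthetic frames, assumed parameters): PC images from a frame
# sequence, then thresholding and median filtering of a low-order PC image.
import numpy as np
from scipy.ndimage import median_filter
from sklearn.decomposition import PCA

rng = np.random.default_rng(5)
n_frames, h, w = 32, 64, 64
frames = rng.normal(0, 1, size=(n_frames, h, w))            # background clutter
flicker = np.sin(np.linspace(0, 8 * np.pi, n_frames))       # fire-like temporal fluctuation
frames[:, 20:26, 30:36] += 6 * flicker[:, None, None]       # synthetic plume region

X = frames.reshape(n_frames, -1).T                           # pixels as observations, frames as variables
pcs = PCA(n_components=4).fit_transform(X)                   # temporally uncorrelated PC images

pc_img = np.abs(pcs[:, 0]).reshape(h, w)                     # leading PC image (illustrative choice)
mask = pc_img > pc_img.mean() + 3 * pc_img.std()             # simple threshold
mask = median_filter(mask.astype(np.uint8), size=3)          # remove isolated clutter
print("candidate plume pixels:", int(mask.sum()))
```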
NASA Astrophysics Data System (ADS)
Farsadnia, Farhad; Ghahreman, Bijan
2016-04-01
Hydrologic homogeneous group identification is considered both fundamental and applied research in hydrology. Clustering methods are among the conventional methods used to assess hydrologically homogeneous regions. Recently, the Self-Organizing feature Map (SOM) method has been applied in some studies; however, the main problem of this method is the interpretation of its output map. Therefore, SOM is used as input to other clustering algorithms. The aim of this study is to apply a two-level Self-Organizing feature map and the Ward hierarchical clustering method to determine the hydrologically homogeneous regions in the North and Razavi Khorasan provinces. First, the dimension of the SOM input matrix was reduced by principal component analysis; the SOM was then used to form a two-dimensional feature map. To determine homogeneous regions for flood frequency analysis, the SOM output nodes were used as input to the Ward method. Generally, the regions identified by clustering algorithms are not statistically homogeneous; consequently, they have to be adjusted to improve their homogeneity. After adjustment of the regions' homogeneity by L-moment tests, five hydrologically homogeneous regions were identified. Finally, adjusted regions were created by a two-level SOM, and the best regional distribution function and associated parameters were selected by the L-moment approach. The results showed that the combination of self-organizing maps and Ward hierarchical clustering with principal components as input is more effective than the hierarchical method with principal components or standardized inputs in achieving hydrologically homogeneous regions.
NASA Astrophysics Data System (ADS)
Song, Bowen; Zhang, Guopeng; Wang, Huafeng; Zhu, Wei; Liang, Zhengrong
2013-02-01
Various types of features, e.g., geometric features, texture features, projection features, etc., have been introduced for polyp detection and differentiation tasks via computer-aided detection and diagnosis (CAD) for computed tomography colonography (CTC). Although these features together cover more information in the data, some of them are statistically highly related to others, which makes the feature set redundant and burdens the computation task of CAD. In this paper, we propose a new dimension reduction method that combines hierarchical clustering and principal component analysis (PCA) for the false positive (FP) reduction task. First, we group all the features based on their similarity using hierarchical clustering, and then PCA is employed within each group. Different numbers of principal components are selected from each group to form the final feature set. A support vector machine is used to perform the classification. The results show that when three principal components were chosen from each group we achieved an area under the receiver operating characteristic curve of 0.905, which is as high as for the original dataset. Meanwhile, the computation time is reduced by 70% and the feature set size is reduced by 77%. It can be concluded that the proposed method captures the most important information in the feature set and that the classification accuracy is not affected after the dimension reduction. The result is promising, and further investigations, such as automatic threshold setting, are worthwhile and in progress.
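The proposed reduction (hierarchical clustering of features, PCA within each cluster, SVM on the pooled components) can be outlined roughly as below. The data are random, the cluster count and components-per-group are assumptions, and scikit-learn/SciPy calls stand in for whatever implementation the authors used.

```python
# Rough sketch (random data, assumed group count): cluster features, apply PCA within
# each feature cluster, then train an SVM on the concatenated principal components.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 60))                       # 200 candidate polyps, 60 redundant features
y = rng.integers(0, 2, size=200)                     # 1 = true polyp, 0 = false positive (synthetic)

# cluster features by similarity, using 1 - |correlation| as the distance
dist = np.clip(1 - np.abs(np.corrcoef(X, rowvar=False)), 0, None)
Z = linkage(dist[np.triu_indices(60, k=1)], method="average")
labels = fcluster(Z, t=5, criterion="maxclust")      # 5 feature groups (assumed)

# keep a few principal components from each feature group
reduced = []
for g in np.unique(labels):
    cols = X[:, labels == g]
    reduced.append(PCA(n_components=min(3, cols.shape[1])).fit_transform(cols))
X_red = np.hstack(reduced)

print("reduced feature count:", X_red.shape[1])
print("SVM CV accuracy:", cross_val_score(SVC(), X_red, y, cv=5).mean())
```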
Geochemistry of sediments in the Northern and Central Adriatic Sea
NASA Astrophysics Data System (ADS)
De Lazzari, A.; Rampazzo, G.; Pavoni, B.
2004-03-01
Major, minor and trace elements, loss on ignition, specific surface area, quantities of calcite and dolomite, qualitative mineralogical composition, grain-size distribution and organic micropollutants (PAH, PCB, DDT) were determined on surficial marine sediments sampled during the 1990 ASCOP (Adriatic Scientific Cooperative Program) cruise. Mineralogical composition and carbonate content of the samples were found to be comparable with data previously reported in the literature, whereas the geochemical composition and distribution of major, minor and trace elements for samples in international waters and in the central basin have never been reported before. The large amount of information contained in the variables of different origin has been processed by means of a comprehensive approach which establishes the relations among the components through the mathematical-statistical calculation of principal components (factors). These account for the major part of the data variance, losing only marginal parts of the information, and are independent of the units of measure. The sample descriptors concerning natural components and contamination load are discussed by means of a statistical model based on an R-mode factor analysis calculating four significant factors which explain 86.8% of the total variance and represent important relationships between grain size, mineralogy, geochemistry and organic micropollutants. A description and an interpretation of the factor composition is discussed on the basis of pollution inputs, basin geology and hydrodynamics. The areal distribution of the factors showed that it is the fine grain-size fraction, with oxides and hydroxides of colloidal origin, which is the main means of transport and thus the principal link between chemical, physical and granulometric elements in the Adriatic.
Santos, J L; Aparicio, I; Callejón, M; Alonso, E
2009-05-30
Several pharmaceutically active compounds have been monitored during a 1-year period in influent and effluent wastewater from wastewater treatment plants (WWTPs) to evaluate their temporal evolution and removal from wastewater and to determine which variables have an influence on their removal rates. The pharmaceutical compounds monitored were four anti-inflammatory drugs (diclofenac, ibuprofen, ketoprofen and naproxen), an antiepileptic drug (carbamazepine) and a nervous stimulant (caffeine). All of the pharmaceutically active compounds monitored, except diclofenac, were detected in influent and effluent wastewater. Mean concentrations measured in influent wastewater were 6.17, 0.48, 93.6, 1.83 and 5.41 microg/L for caffeine, carbamazepine, ibuprofen, ketoprofen and naproxen, respectively. Mean concentrations measured in effluent wastewater were 2.02, 0.56, 8.20, 0.84 and 2.10 microg/L for caffeine, carbamazepine, ibuprofen, ketoprofen and naproxen, respectively. Mean removal rates of the pharmaceuticals varied from 8.1% (carbamazepine) to 87.5% (ibuprofen). The existence of relationships between the concentrations of the pharmaceutical compounds, their removal rates, the characterization parameters of influent wastewaters and the WWTP control design parameters has been studied by means of statistical analysis (correlation and principal component analysis). With both statistical analyses, high correlations were obtained between the concentrations of the pharmaceutical compounds and the characterization parameters of influent wastewaters, and between the removal rates of the pharmaceutical compounds, the removal rates of the characterization parameters of influent wastewaters and the WWTP hydraulic retention times. Principal component analysis showed the existence of two main components accounting for 76% of the total variability.
Cocco, S; Monasson, R; Sessak, V
2011-05-01
We consider the problem of inferring the interactions between a set of N binary variables from the knowledge of their frequencies and pairwise correlations. The inference framework is based on the Hopfield model, a special case of the Ising model where the interaction matrix is defined through a set of patterns in the variable space, and is of rank much smaller than N. We show that maximum likelihood inference is deeply related to principal component analysis when the amplitude of the pattern components ξ is negligible compared to √N. Using techniques from statistical mechanics, we calculate the corrections to the patterns to the first order in ξ/√N. We stress the need to generalize the Hopfield model and include both attractive and repulsive patterns in order to correctly infer networks with sparse and strong interactions. We present a simple geometrical criterion to decide how many attractive and repulsive patterns should be considered as a function of the sampling noise. We moreover discuss how many sampled configurations are required for a good inference, as a function of the system size N and of the amplitude ξ. The inference approach is illustrated on synthetic and biological data.
Chemical Polymorphism of Essential Oils of Artemisia vulgaris Growing Wild in Lithuania.
Judzentiene, Asta; Budiene, Jurga
2018-02-01
Compositional variability of mugwort (Artemisia vulgaris L.) essential oils was investigated in this study. Plant material (above-ground parts at the full flowering stage) was collected from forty-four wild populations in Lithuania. The oils from the aerial parts were obtained by hydrodistillation and analyzed by GC(FID) and GC/MS. In total, up to 111 components were determined in the oils. The major constituents found were sabinene, 1,8-cineole, artemisia ketone, both thujone isomers, camphor, cis-chrysanthenyl acetate, davanone and davanone B. The compositional data were subjected to statistical analysis. The application of PCA (Principal Component Analysis) and AHC (Agglomerative Hierarchical Clustering) allowed grouping of the oils into six clusters. AHC permitted an artemisia ketone chemotype to be distinguished, which, to the best of our knowledge, is very scarce. Additionally, two rare oil types, cis-chrysanthenyl acetate and sabinene, were determined for the plants growing in Lithuania. Besides, davanone was found for the first time as a principal component in mugwort oils. The study revealed significant chemical polymorphism of essential oils in mugwort plants native to Lithuania; it has expanded our chemotaxonomic knowledge of both the species A. vulgaris and the genus Artemisia. © 2018 Wiley-VHCA AG, Zurich, Switzerland.
Derivation of Boundary Manikins: A Principal Component Analysis
NASA Technical Reports Server (NTRS)
Young, Karen; Margerum, Sarah; Barr, Abbe; Ferrer, Mike A.; Rajulu, Sudhakar
2008-01-01
When designing any human-system interface, it is critical to provide realistic anthropometry to properly represent how a person fits within a given space. This study aimed to identify a minimum number of boundary manikins, or representative models of subjects' anthropometry from a target population, that would realistically represent the population. The boundary manikin anthropometry was derived using Principal Component Analysis (PCA). PCA is a statistical approach to reducing a multi-dimensional dataset using eigenvectors and eigenvalues. The measurements used in the PCA were identified as those critical for suit and cockpit design. The PCA yielded a total of 26 manikins per gender, as well as their anthropometry from the target population. Reduction techniques were implemented to reduce this number further, with a final result of 20 female and 22 male subjects. The anthropometry of the boundary manikins was then used to create 3D digital models (to be discussed in subsequent papers) intended for use by designers to test components of their space suit design, to verify that the requirements specified in the Human Systems Integration Requirements (HSIR) document are met. The end goal is to allow designers to generate suits which accommodate the diverse anthropometry of the user population.
Which statistics should tropical biologists learn?
Loaiza Velásquez, Natalia; González Lutz, María Isabel; Monge-Nájera, Julián
2011-09-01
Tropical biologists study the richest and most endangered biodiversity on the planet, and in these times of climate change and mega-extinctions, the need for efficient, good-quality research is more pressing than in the past. However, the statistical component in research published by tropical authors sometimes suffers from poor quality in data collection, mediocre or bad experimental design, and a rigid and outdated view of data analysis. To suggest improvements in their statistical education, we listed all the statistical tests and other quantitative analyses used in two leading tropical journals, the Revista de Biología Tropical and Biotropica, during a year. The 12 most frequent tests in the articles were: Analysis of Variance (ANOVA), Chi-Square Test, Student's T Test, Linear Regression, Pearson's Correlation Coefficient, Mann-Whitney U Test, Kruskal-Wallis Test, Shannon's Diversity Index, Tukey's Test, Cluster Analysis, Spearman's Rank Correlation Test and Principal Component Analysis. We conclude that statistical education for tropical biologists must abandon the old syllabus based on the mathematical side of statistics and concentrate on the correct selection of these and other procedures and tests, on their biological interpretation and on the use of reliable and friendly freeware. We think that their time will be better spent understanding and protecting tropical ecosystems than trying to learn the mathematical foundations of statistics: in most cases, a well-designed one-semester course should be enough for their basic requirements.
Psychological Analyses of Courageous Performance in Military Personnel
1986-11-01
schedule; HR, heart rate; IBI, inter-beat interval; N, number of subjects; NS, not statistically significant; P, probability; PCA, principal components analysis; RAQ... tones in the range of 400 to 600 Hz, set at a level of 60 dB, transmitted for 1 sec binaurally through earphones from a commercial oscillator. The... because of interference on the recording trace. Cardiac activity was measured in terms of heart rate (HR). The number of beats/minute was estimated by
The study was concerned with the identification of the job dimensions underlying the job elements of the Position Analysis Questionnaire (PAQ), Form B... The PAQ is a structured job analysis instrument consisting of 187 worker-oriented job elements which are divided into six a priori major divisions... The statistical procedure of principal components analysis was used to identify the job dimensions of the PAQ. Forty-five job dimensions were
Tsai, Tsung-Yuan; Li, Jing-Sheng; Wang, Shaobai; Li, Pingyue; Kwon, Young-Min; Li, Guoan
2015-01-01
The statistical shape model (SSM) method that uses 2D images of the knee joint to predict the three-dimensional (3D) joint surface model has been reported in the literature. In this study, we constructed a SSM database using 152 human computed tomography (CT) knee joint models, including the femur, tibia and patella and analysed the characteristics of each principal component of the SSM. The surface models of two in vivo knees were predicted using the SSM and their 2D bi-plane fluoroscopic images. The predicted models were compared to their CT joint models. The differences between the predicted 3D knee joint surfaces and the CT image-based surfaces were 0.30 ± 0.81 mm, 0.34 ± 0.79 mm and 0.36 ± 0.59 mm for the femur, tibia and patella, respectively (average ± standard deviation). The computational time for each bone of the knee joint was within 30 s using a personal computer. The analysis of this study indicated that the SSM method could be a useful tool to construct 3D surface models of the knee with sub-millimeter accuracy in real time. Thus, it may have a broad application in computer-assisted knee surgeries that require 3D surface models of the knee.
Fasoula, S; Zisi, Ch; Sampsonidis, I; Virgiliou, Ch; Theodoridis, G; Gika, H; Nikitas, P; Pappa-Louisi, A
2015-03-27
In the present study, a series of 45 metabolite standards belonging to four chemically similar metabolite classes (sugars, amino acids, nucleosides and nucleobases, and amines) was subjected to LC analysis on three HILIC columns under 21 different gradient conditions, with the aim of exploring whether the retention properties of these analytes are determined by the chemical group to which they belong. Two multivariate techniques, principal component analysis (PCA) and discriminant analysis (DA), were used for statistical evaluation of the chromatographic data and extraction of similarities between chemically related compounds. The total variance explained by the first two principal components of PCA was found to be about 98%, and both statistical analyses indicated that all analytes are successfully grouped into four clusters of chemical structure based on the retention obtained in four, or at least three, chromatographic runs, which, however, should be performed on two different HILIC columns. Moreover, leave-one-out cross-validation of the above retention data set showed that the chemical group to which an analyte belongs can be predicted correctly 95.6% of the time when the analyte is subjected to LC analysis under the same four or three experimental conditions under which the whole set of analytes was run beforehand. That, in turn, may assist with disambiguation of analyte identification in complex biological extracts. Copyright © 2015 Elsevier B.V. All rights reserved.
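A schematic version of the statistical evaluation described above (PCA for grouping, discriminant analysis with leave-one-out cross-validation on retention values from a few runs) is given below with simulated retention data; the class offsets, sample counts and noise level are invented, and the 95.6% figure is not expected to be reproduced.

```python
# Hedged sketch (simulated retention data): PCA for visual grouping and LDA with
# leave-one-out cross-validation to predict the chemical class of an analyte.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(7)
n_per_class, n_runs = 12, 4                       # 4 chromatographic runs retained as variables
class_offsets = np.array([[1, 2, 1, 0], [3, 1, 2, 2], [0, 3, 3, 1], [2, 0, 0, 3]], float)
X = np.vstack([off + rng.normal(0, 0.3, size=(n_per_class, n_runs)) for off in class_offsets])
y = np.repeat(np.arange(4), n_per_class)          # sugars, amino acids, nucleosides, amines

var2 = PCA(n_components=2).fit(X).explained_variance_ratio_.sum()
print(f"variance captured by first two PCs: {var2:.1%}")

acc = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=LeaveOneOut()).mean()
print(f"leave-one-out classification accuracy: {acc:.1%}")
```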
DOE Office of Scientific and Technical Information (OSTI.GOV)
Harris, Michael D.; Dater, Manasi; Whitaker, Ross
In this study, statistical shape modeling (SSM) was used to quantify three-dimensional (3D) variation and morphologic differences between femurs with and without cam femoroacetabular impingement (FAI). 3D surfaces were generated from CT scans of femurs from 41 controls and 30 cam FAI patients. SSM correspondence particles were optimally positioned on each surface using a gradient descent energy function. Mean shapes for control and patient groups were defined from the resulting particle configurations. Morphological differences between group mean shapes and between the control mean and individual patients were calculated. Principal component analysis was used to describe anatomical variation present in both groups. The first 6 modes (or principal components) captured statistically significant shape variations, which comprised 84% of cumulative variation among the femurs. Shape variation was greatest in femoral offset, greater trochanter height, and the head-neck junction. The mean cam femur shape protruded above the control mean by a maximum of 3.3 mm with sustained protrusions of 2.5-3.0 mm along the anterolateral head-neck junction and distally along the anterior neck, corresponding well with reported cam lesion locations and soft-tissue damage. This study provides initial evidence that SSM can describe variations in femoral morphology in both controls and cam FAI patients and may be useful for developing new measurements of pathological anatomy. SSM may also be applied to characterize cam FAI severity and provide templates to guide patient-specific surgical resection of bone.
Nonlinear Principal Components Analysis: Introduction and Application
ERIC Educational Resources Information Center
Linting, Marielle; Meulman, Jacqueline J.; Groenen, Patrick J. F.; van der Koojj, Anita J.
2007-01-01
The authors provide a didactic treatment of nonlinear (categorical) principal components analysis (PCA). This method is the nonlinear equivalent of standard PCA and reduces the observed variables to a number of uncorrelated principal components. The most important advantages of nonlinear over linear PCA are that it incorporates nominal and ordinal…
USDA-ARS's Scientific Manuscript database
Selective principal component regression analysis (SPCR) uses a subset of the original image bands for principal component transformation and regression. For optimal band selection before the transformation, this paper used genetic algorithms (GA). In this case, the GA process used the regression co...
Similarities between principal components of protein dynamics and random diffusion
NASA Astrophysics Data System (ADS)
Hess, Berk
2000-12-01
Principal component analysis, also called essential dynamics, is a powerful tool for finding global, correlated motions in atomic simulations of macromolecules. It has become an established technique for analyzing molecular dynamics simulations of proteins. The first few principal components of simulations of large proteins often resemble cosines. We derive the principal components for high-dimensional random diffusion, which are almost perfect cosines. This resemblance between protein simulations and noise implies that for many proteins the time scales of current simulations are too short to obtain convergence of collective motions.
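The cosine-like behaviour of the leading principal components of random diffusion can be verified with a short simulation, as sketched below; the trajectory length and dimensionality are arbitrary choices, and the half-period cosine used for comparison follows the qualitative description in the abstract rather than any exact formula from the paper.

```python
# Small numerical check (illustrative only): the leading principal components of a
# high-dimensional random walk closely resemble half-period cosines.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(8)
n_steps, n_dim = 2000, 300
traj = np.cumsum(rng.normal(size=(n_steps, n_dim)), axis=0)    # random diffusion trajectory

proj = PCA(n_components=3).fit_transform(traj)                  # projections onto first PCs
t = np.arange(n_steps)
for k in range(3):
    cosine = np.cos(np.pi * (k + 1) * (t + 0.5) / n_steps)      # k-th half-period cosine
    r = np.corrcoef(proj[:, k], cosine)[0, 1]
    print(f"PC{k + 1} correlation with cosine: {abs(r):.3f}")
```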
Directly Reconstructing Principal Components of Heterogeneous Particles from Cryo-EM Images
Tagare, Hemant D.; Kucukelbir, Alp; Sigworth, Fred J.; Wang, Hongwei; Rao, Murali
2015-01-01
Structural heterogeneity of particles can be investigated by their three-dimensional principal components. This paper addresses the question of whether, and with what algorithm, the three-dimensional principal components can be directly recovered from cryo-EM images. The first part of the paper extends the Fourier slice theorem to covariance functions, showing that the three-dimensional covariance, and hence the principal components, of a heterogeneous particle can indeed be recovered from two-dimensional cryo-EM images. The second part of the paper proposes a practical algorithm for reconstructing the principal components directly from cryo-EM images without the intermediate step of calculating covariances. This algorithm is based on maximizing the (posterior) likelihood using the Expectation-Maximization algorithm. The last part of the paper applies this algorithm to simulated data and to two real cryo-EM data sets: a data set of the 70S ribosome with and without Elongation Factor-G (EF-G), and a data set of the influenza virus RNA-dependent RNA Polymerase (RdRP). The first principal component of the 70S ribosome data set reveals the expected conformational changes of the ribosome as the EF-G binds and unbinds. The first principal component of the RdRP data set reveals a conformational change in the two dimers of the RdRP. PMID:26049077
Halai, Ajay D; Woollams, Anna M; Lambon Ralph, Matthew A
2017-01-01
Individual differences in the performance profiles of neuropsychologically-impaired patients are pervasive yet there is still no resolution on the best way to model and account for the variation in their behavioural impairments and the associated neural correlates. To date, researchers have generally taken one of three different approaches: a single-case study methodology in which each case is considered separately; a case-series design in which all individual patients from a small coherent group are examined and directly compared; or, group studies, in which a sample of cases are investigated as one group with the assumption that they are drawn from a homogenous category and that performance differences are of no interest. In recent research, we have developed a complementary alternative through the use of principal component analysis (PCA) of individual data from large patient cohorts. This data-driven approach not only generates a single unified model for the group as a whole (expressed in terms of the emergent principal components) but is also able to capture the individual differences between patients (in terms of their relative positions along the principal behavioural axes). We demonstrate the use of this approach by considering speech fluency, phonology and semantics in aphasia diagnosis and classification, as well as their unique neural correlates. PCA of the behavioural data from 31 patients with chronic post-stroke aphasia resulted in four statistically-independent behavioural components reflecting phonological, semantic, executive-cognitive and fluency abilities. Even after accounting for lesion volume, entering the four behavioural components simultaneously into a voxel-based correlational methodology (VBCM) analysis revealed that speech fluency (speech quanta) was uniquely correlated with left motor cortex and underlying white matter (including the anterior section of the arcuate fasciculus and the frontal aslant tract), phonological skills with regions in the superior temporal gyrus and pars opercularis, and semantics with the anterior temporal stem. Copyright © 2016 The Author(s). Published by Elsevier Ltd.. All rights reserved.
NASA Astrophysics Data System (ADS)
Lomakina, N. Ya.
2017-11-01
This work presents the results of an applied climatic division of the Siberian region into districts, based on a methodology of objective classification of atmospheric boundary layer climates by the "temperature-moisture-wind" complex, realized using the method of principal components and special similarity criteria for average profiles and the eigenvalues of correlation matrices. On the territory of Siberia, 14 homogeneous regions were identified for the winter season and 10 regions for summer. Local statistical models were constructed for each region. These include vertical profiles of mean values, mean square deviations, and matrices of interlevel correlation of temperature, specific humidity, and zonal and meridional wind velocity. The advantage of the obtained local statistical models over regional models is shown.
Research of facial feature extraction based on MMC
NASA Astrophysics Data System (ADS)
Xue, Donglin; Zhao, Jiufen; Tang, Qinhong; Shi, Shaokun
2017-07-01
Based on the maximum margin criterion (MMC), a new algorithm for statistically uncorrelated optimal discriminant vectors and a new algorithm for orthogonal optimal discriminant vectors for feature extraction are proposed. The purpose of the maximum margin criterion is to maximize the inter-class scatter while simultaneously minimizing the intra-class scatter after the projection. Compared with the original MMC method and the principal component analysis (PCA) method, the proposed methods are better at reducing or eliminating the statistical correlation between features and improving the recognition rate. Experimental results on the Olivetti Research Laboratory (ORL) face database show that the new feature extraction method based on the statistically uncorrelated maximum margin criterion (SUMMC) is better in terms of recognition rate and stability. In addition, the relations between the maximum margin criterion and the Fisher criterion for feature extraction are revealed.
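As background for the criterion itself, a plain MMC projection can be computed from the eigenvectors of the difference between the between-class and within-class scatter matrices, as in the short sketch below. This illustrates only the basic criterion on random data; the statistically uncorrelated and orthogonal variants proposed in the abstract impose additional constraints that are not implemented here.

```python
# Minimal sketch of the maximum margin criterion (not the proposed SUMMC variant):
# discriminant vectors are the leading eigenvectors of Sb - Sw.
import numpy as np

rng = np.random.default_rng(9)
X = np.vstack([rng.normal(m, 1.0, size=(30, 10)) for m in (0.0, 1.5, 3.0)])  # 3 classes, 10 features
y = np.repeat(np.arange(3), 30)

mean_all = X.mean(axis=0)
Sb = np.zeros((10, 10))          # between-class scatter
Sw = np.zeros((10, 10))          # within-class scatter
for c in np.unique(y):
    Xc = X[y == c]
    d = (Xc.mean(axis=0) - mean_all)[:, None]
    Sb += Xc.shape[0] * (d @ d.T)
    Sw += (Xc - Xc.mean(axis=0)).T @ (Xc - Xc.mean(axis=0))

w, V = np.linalg.eigh(Sb - Sw)                # MMC: maximize tr(W^T (Sb - Sw) W)
W = V[:, np.argsort(w)[::-1][:2]]             # top-2 discriminant directions
print("projected data shape:", (X @ W).shape)
```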
Vasilaki, V; Volcke, E I P; Nandi, A K; van Loosdrecht, M C M; Katsou, E
2018-04-26
Multivariate statistical analysis was applied to investigate the dependencies and underlying patterns between N2O emissions and online operational variables (dissolved oxygen and nitrogen component concentrations, temperature and influent flow-rate) during biological nitrogen removal from wastewater. The system under study was a full-scale reactor, for which hourly sensor data were available. The 15-month long monitoring campaign was divided into 10 sub-periods based on the profile of N2O emissions, using Binary Segmentation. The dependencies between operating variables and N2O emissions fluctuated according to Spearman's rank correlation. The correlation between N2O emissions and nitrite concentrations ranged between 0.51 and 0.78. Correlation >0.7 between N2O emissions and nitrate concentrations was observed at sub-periods with average temperature lower than 12 °C. Hierarchical k-means clustering and principal component analysis linked N2O emission peaks with precipitation events and ammonium concentrations higher than 2 mg/L, especially in sub-periods characterized by low N2O fluxes. Additionally, the highest ranges of measured N2O fluxes belonged to clusters corresponding with NO3-N concentration less than 1 mg/L in the upstream plug-flow reactor (middle of oxic zone), indicating slow nitrification rates. The results showed that the range of N2O emissions partially depends on the prior behavior of the system. The principal component analysis validated the findings from the clustering analysis and showed that ammonium, nitrate, nitrite and temperature explained a considerable percentage of the variance in the system for the majority of the sub-periods. The applied statistical methods linked the different ranges of emissions with the system variables, provided insights on the effect of operating conditions on N2O emissions in each sub-period and can be integrated into N2O emissions data processing at wastewater treatment plants. Copyright © 2018. Published by Elsevier Ltd.
Representation of Probability Density Functions from Orbit Determination using the Particle Filter
NASA Technical Reports Server (NTRS)
Mashiku, Alinda K.; Garrison, James; Carpenter, J. Russell
2012-01-01
Statistical orbit determination enables us to obtain estimates of the state and the statistical information of its region of uncertainty. In order to obtain an accurate representation of the probability density function (PDF) that incorporates higher order statistical information, we propose the use of nonlinear estimation methods such as the Particle Filter. The Particle Filter (PF) is capable of providing a PDF representation of the state estimates whose accuracy is dependent on the number of particles or samples used. For this method to be applicable to real case scenarios, we need a way of accurately representing the PDF in a compressed manner with little information loss. Hence we propose using Independent Component Analysis (ICA) as a non-Gaussian dimensional reduction method that is capable of maintaining higher order statistical information obtained using the PF. Methods such as Principal Component Analysis (PCA) are based on utilizing up to second order statistics, and hence will not suffice in maintaining maximum information content. Both the PCA and the ICA are applied to two scenarios that involve a highly eccentric orbit with a lower a priori uncertainty covariance and a less eccentric orbit with a higher a priori uncertainty covariance, to illustrate the capability of the ICA in relation to the PCA.
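The contrast drawn above between PCA (second-order statistics) and ICA (higher-order statistics) can be illustrated on a toy non-Gaussian particle cloud, as in the sketch below; the mixing matrix, sample counts and kurtosis diagnostic are assumptions for the example and are unrelated to the orbit-determination scenarios in the paper.

```python
# Conceptual sketch (toy non-Gaussian samples, not orbit-determination output):
# PCA versus ICA as ways to compress a particle ensemble to fewer coordinates.
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(10)
# mix two non-Gaussian sources (uniform and Laplace) into a 6-dimensional "state" cloud
S = np.column_stack([rng.uniform(-1, 1, 5000), rng.laplace(size=5000)])
A = rng.normal(size=(2, 6))
particles = S @ A + 0.01 * rng.normal(size=(5000, 6))

Z_pca = PCA(n_components=2).fit_transform(particles)
Z_ica = FastICA(n_components=2, random_state=0).fit_transform(particles)

# excess kurtosis as a crude proxy for how much non-Gaussian structure each
# compressed coordinate retains
def excess_kurtosis(z):
    z = (z - z.mean(axis=0)) / z.std(axis=0)
    return (z ** 4).mean(axis=0) - 3.0

print("PCA component kurtosis:", np.round(excess_kurtosis(Z_pca), 2))
print("ICA component kurtosis:", np.round(excess_kurtosis(Z_ica), 2))
```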
Li, Ke; Zhang, Qiuju; Wang, Kun; Chen, Peng; Wang, Huaqing
2016-01-01
A new fault diagnosis method for rotating machinery based on an adaptive statistic test filter (ASTF) and a Diagnostic Bayesian Network (DBN) is presented in this paper. ASTF is proposed to obtain weak fault features under background noise; it is based on statistical hypothesis testing in the frequency domain to evaluate the similarity between a reference signal (noise signal) and the original signal, and to remove the components of high similarity. The optimal level of significance α is obtained using particle swarm optimization (PSO). To evaluate the performance of the ASTF, an evaluation factor Ipq is also defined. In addition, a simulation experiment is designed to verify the effectiveness and robustness of the ASTF. A sensitivity evaluation method using principal component analysis (PCA) is proposed to evaluate the sensitivity of symptom parameters (SPs) for condition diagnosis. In this way, good SPs that have high sensitivity for condition diagnosis can be selected. A three-layer DBN is developed to identify the condition of rotating machinery based on Bayesian Belief Network (BBN) theory. A condition diagnosis experiment on rolling element bearings demonstrates the effectiveness of the proposed method. PMID:26761006
Advanced Treatment Monitoring for Olympic-Level Athletes Using Unsupervised Modeling Techniques
Siedlik, Jacob A.; Bergeron, Charles; Cooper, Michael; Emmons, Russell; Moreau, William; Nabhan, Dustin; Gallagher, Philip; Vardiman, John P.
2016-01-01
Context: Analysis of injury and illness data collected at large international competitions provides the US Olympic Committee and the national governing bodies for each sport with information to best prepare for future competitions. Research in which authors have evaluated medical contacts to provide the expected level of medical care and sports medicine services at international competitions is limited. Objective: To analyze the medical-contact data for athletes, staff, and coaches who participated in the 2011 Pan American Games in Guadalajara, Mexico, using unsupervised modeling techniques to identify underlying treatment patterns. Design: Descriptive epidemiology study. Setting: Pan American Games. Patients or Other Participants: A total of 618 US athletes (337 males, 281 females) participated in the 2011 Pan American Games. Main Outcome Measure(s): Medical data were recorded from the injury-evaluation and injury-treatment forms used by clinicians assigned to the central US Olympic Committee Sport Medicine Clinic and satellite locations during the operational 17-day period of the 2011 Pan American Games. We used principal components analysis and agglomerative clustering algorithms to identify and define grouped modalities. Lift statistics were calculated for within-cluster subgroups. Results: Principal component analyses identified 3 components, accounting for 72.3% of the variability in datasets. Plots of the principal components showed that individual contacts focused on 4 treatment clusters: massage, paired manipulation and mobilization, soft tissue therapy, and general medical. Conclusions: Unsupervised modeling techniques were useful for visualizing complex treatment data and provided insights for improved treatment modeling in athletes. Given its ability to detect clinically relevant treatment pairings in large datasets, unsupervised modeling should be considered a feasible option for future analyses of medical-contact data from international competitions. PMID:26794628
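A minimal sketch of the unsupervised pipeline described above (principal components followed by agglomerative clustering), using scikit-learn on an invented contact-by-modality matrix; the matrix, its dimensions and the cluster count are placeholders, not the Games data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(2)

# Hypothetical contact-by-modality matrix: rows are medical contacts, columns
# are binary indicators for treatment modalities (massage, manipulation, ...)
X = rng.binomial(1, 0.3, size=(500, 12)).astype(float)

# Reduce to a small number of principal components before clustering
scores = PCA(n_components=3).fit_transform(X)

# Agglomerative (hierarchical) clustering of contacts in component space
labels = AgglomerativeClustering(n_clusters=4).fit_predict(scores)
print("contacts per cluster:", np.bincount(labels))
```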
Principal semantic components of language and the measurement of meaning.
Samsonovich, Alexei V; Ascoli, Giorgio A
2010-06-11
Metric systems for semantics, or semantic cognitive maps, are allocations of words or other representations in a metric space based on their meaning. Existing methods for semantic mapping, such as Latent Semantic Analysis and Latent Dirichlet Allocation, are based on paradigms involving dissimilarity metrics. They typically do not take into account relations of antonymy and yield a large number of domain-specific semantic dimensions. Here, using a novel self-organization approach, we construct a low-dimensional, context-independent semantic map of natural language that represents simultaneously synonymy and antonymy. Emergent semantics of the map principal components are clearly identifiable: the first three correspond to the meanings of "good/bad" (valence), "calm/excited" (arousal), and "open/closed" (freedom), respectively. The semantic map is sufficiently robust to allow the automated extraction of synonyms and antonyms not originally in the dictionaries used to construct the map and to predict connotation from their coordinates. The map geometric characteristics include a limited number ( approximately 4) of statistically significant dimensions, a bimodal distribution of the first component, increasing kurtosis of subsequent (unimodal) components, and a U-shaped maximum-spread planar projection. Both the semantic content and the main geometric features of the map are consistent between dictionaries (Microsoft Word and Princeton's WordNet), among Western languages (English, French, German, and Spanish), and with previously established psychometric measures. By defining the semantics of its dimensions, the constructed map provides a foundational metric system for the quantitative analysis of word meaning. Language can be viewed as a cumulative product of human experiences. Therefore, the extracted principal semantic dimensions may be useful to characterize the general semantic dimensions of the content of mental states. This is a fundamental step toward a universal metric system for semantics of human experiences, which is necessary for developing a rigorous science of the mind.
An Introductory Application of Principal Components to Cricket Data
ERIC Educational Resources Information Center
Manage, Ananda B. W.; Scariano, Stephen M.
2013-01-01
Principal Component Analysis is widely used in applied multivariate data analysis, and this article shows how to motivate student interest in this topic using cricket sports data. Here, principal component analysis is successfully used to rank the cricket batsmen and bowlers who played in the 2012 Indian Premier League (IPL) competition. In…
Least Principal Components Analysis (LPCA): An Alternative to Regression Analysis.
ERIC Educational Resources Information Center
Olson, Jeffery E.
Often, all of the variables in a model are latent, random, or subject to measurement error, or there is not an obvious dependent variable. When any of these conditions exist, an appropriate method for estimating the linear relationships among the variables is Least Principal Components Analysis. Least Principal Components are robust, consistent,…
Identifying apple surface defects using principal components analysis and artificial neural networks
USDA-ARS?s Scientific Manuscript database
Artificial neural networks and principal components were used to detect surface defects on apples in near-infrared images. Neural networks were trained and tested on sets of principal components derived from columns of pixels from images of apples acquired at two wavelengths (740 nm and 950 nm). I...
Butler, Rebecca A.
2014-01-01
Stroke aphasia is a multidimensional disorder in which patient profiles reflect variation along multiple behavioural continua. We present a novel approach to separating the principal aspects of chronic aphasic performance and isolating their neural bases. Principal components analysis was used to extract core factors underlying performance of 31 participants with chronic stroke aphasia on a large, detailed battery of behavioural assessments. The rotated principal components analysis revealed three key factors, which we labelled as phonology, semantic and executive/cognition on the basis of the common elements in the tests that loaded most strongly on each component. The phonology factor explained the most variance, followed by the semantic factor and then the executive-cognition factor. The use of principal components analysis rendered participants' scores on these three factors orthogonal and therefore ideal for use as simultaneous continuous predictors in a voxel-based correlational methodology analysis of high resolution structural scans. Phonological processing ability was uniquely related to left posterior perisylvian regions including Heschl's gyrus, posterior middle and superior temporal gyri and superior temporal sulcus, as well as the white matter underlying the posterior superior temporal gyrus. The semantic factor was uniquely related to left anterior middle temporal gyrus and the underlying temporal stem. The executive-cognition factor was not correlated selectively with the structural integrity of any particular region, as might be expected in light of the widely-distributed and multi-functional nature of the regions that support executive functions. The identified phonological and semantic areas align well with those highlighted by other methodologies such as functional neuroimaging and neurostimulation. The use of principal components analysis allowed us to characterize the neural bases of participants' behavioural performance more robustly and selectively than the use of raw assessment scores or diagnostic classifications because principal components analysis extracts statistically unique, orthogonal behavioural components of interest. As such, in addition to improving our understanding of lesion–symptom mapping in stroke aphasia, the same approach could be used to clarify brain–behaviour relationships in other neurological disorders. PMID:25348632
Finding Planets in K2: A New Method of Cleaning the Data
NASA Astrophysics Data System (ADS)
Currie, Miles; Mullally, Fergal; Thompson, Susan E.
2017-01-01
We present a new method of removing systematic flux variations from K2 light curves by employing a pixel-level principal component analysis (PCA). This method decomposes each light curve into its principal components (eigenvectors), each with an associated eigenvalue whose magnitude indicates how much influence the basis vector has on the shape of the light curve. This method assumes that the most influential basis vectors will correspond to the unwanted systematic variations in the light curve produced by K2's constant motion. We correct the raw light curve by automatically fitting and removing the strongest principal components. The strongest principal components generally correspond to the flux variations that result from the motion of the star in the field of view. Our primary method of determining how many of the strongest principal components to remove from the raw light curve estimates the noise by measuring the scatter in the light curve after Savitzky-Golay detrending, which yields the combined differential photometric precision (SG-CDPP) value used in classic Kepler. We calculate this value after correcting the raw light curve for each element in a list of cumulative sums of principal components, so that we have as many noise estimates as there are principal components. We then take the derivative of the list of SG-CDPP values and select the number of principal components corresponding to the point at which the derivative effectively goes to zero. This is the optimal number of principal components to exclude from the refitting of the light curve. We find that a pixel-level PCA is sufficient for cleaning unwanted systematic and natural noise from K2's light curves. We present preliminary results and a basic comparison to other methods of reducing the noise from the flux variations.
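The sketch below illustrates the general idea on synthetic data: fit a pixel-level PCA, subtract an increasing number of leading components, estimate the residual scatter after Savitzky-Golay detrending, and stop when the improvement levels off. The stamp size, drift model, filter settings and stopping tolerance are all assumptions made for illustration, and the scatter statistic here is only a crude stand-in for the SG-CDPP metric used in the paper.

```python
import numpy as np
from scipy.signal import savgol_filter
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)

# Hypothetical K2-like stamp: flux time series for each pixel, sharing a
# common drift signal plus independent noise (all values are invented)
n_time, n_pix = 2000, 25
drift = np.sin(np.linspace(0, 40 * np.pi, n_time))
pixels = 1.0 + 0.02 * drift[:, None] + rng.normal(0, 0.002, (n_time, n_pix))

pca = PCA().fit(pixels)
scores = pca.transform(pixels)                       # component time series

def scatter_after_sg(lc):
    """Crude stand-in for an SG-CDPP-like noise estimate."""
    return np.std(lc - savgol_filter(lc, window_length=101, polyorder=2))

# Noise estimate after removing the k strongest components, for k = 1..10
noise = []
for k in range(1, 11):
    systematics = scores[:, :k] @ pca.components_[:k, :]
    corrected = (pixels - systematics).sum(axis=1)   # corrected light curve
    noise.append(scatter_after_sg(corrected))

# Stop where the derivative of the noise estimate effectively goes to zero
deriv = np.abs(np.diff(noise))
tol = 0.05 * deriv.max()                             # illustrative tolerance
best_k = int(np.argmax(deriv < tol)) + 1
print("noise per k:", np.round(noise, 5), "-> components removed:", best_k)
```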
A new statistical PCA-ICA algorithm for location of R-peaks in ECG.
Chawla, M P S; Verma, H K; Kumar, Vinod
2008-09-16
The success of ICA in separating independent components from a mixture depends on the properties of the electrocardiogram (ECG) recordings. This paper discusses some of the conditions of independent component analysis (ICA) that could affect the reliability of the separation and evaluates issues related to the properties of the signals and the number of sources. Principal component analysis (PCA) scatter plots are plotted to indicate the diagnostic features in the presence and absence of baseline wander when interpreting the ECG signals. In this analysis, a newly developed statistical algorithm by the authors, based on combined PCA-ICA applied to two correlated channels of 12-channel ECG data, is proposed. The ICA technique has been successfully implemented in identifying and removing noise and artifacts from ECG signals. Cleaned ECG signals are obtained using statistical measures such as kurtosis and variance of variance after ICA processing. This paper also deals with the detection of QRS complexes in electrocardiograms using the combined PCA-ICA algorithm. The efficacy of the combined PCA-ICA algorithm lies in the fact that the location of the R-peaks is bounded from above and below by the location of the cross-over points, hence none of the peaks are ignored or missed.
Directly reconstructing principal components of heterogeneous particles from cryo-EM images.
Tagare, Hemant D; Kucukelbir, Alp; Sigworth, Fred J; Wang, Hongwei; Rao, Murali
2015-08-01
Structural heterogeneity of particles can be investigated by their three-dimensional principal components. This paper addresses the question of whether, and with what algorithm, the three-dimensional principal components can be directly recovered from cryo-EM images. The first part of the paper extends the Fourier slice theorem to covariance functions showing that the three-dimensional covariance, and hence the principal components, of a heterogeneous particle can indeed be recovered from two-dimensional cryo-EM images. The second part of the paper proposes a practical algorithm for reconstructing the principal components directly from cryo-EM images without the intermediate step of calculating covariances. This algorithm is based on maximizing the posterior likelihood using the Expectation-Maximization algorithm. The last part of the paper applies this algorithm to simulated data and to two real cryo-EM data sets: a data set of the 70S ribosome with and without Elongation Factor-G (EF-G), and a data set of the influenza virus RNA dependent RNA Polymerase (RdRP). The first principal component of the 70S ribosome data set reveals the expected conformational changes of the ribosome as the EF-G binds and unbinds. The first principal component of the RdRP data set reveals a conformational change in the two dimers of the RdRP. Copyright © 2015 Elsevier Inc. All rights reserved.
Infrared micro-spectroscopic studies of epithelial cells
Romeo, Melissa; Mohlenhoff, Brian; Jennings, Michael; Diem, Max
2009-01-01
We report results from a study of human and canine mucosal cells, investigated by infrared micro-spectroscopy, and analyzed by methods of multivariate statistics. We demonstrate that the infrared spectra of individual cells are sensitive to the stage of maturation, and that a distinction between healthy and diseased cells will be possible. Since this report is written for an audience not familiar with infrared micro-spectroscopy, a short introduction into this field is presented along with a summary of principal component analysis. PMID:16797481
1982-02-01
of them are presented in this paper. As an application, important practical problems similar to the one posed by Gnanadesikan (1977), p. 77 can be... Gnanadesikan and Wilk (1969) to search for a non-linear combination, giving rise to a non-linear first principal component. So, a p-dimensional vector can...distribution, Gnanadesikan and Gupta (1970) and earlier Eaton (1967) have considered the problem of ranking the r underlying populations according to the
40 CFR 60.2998 - What are the principal components of the model rule?
Code of Federal Regulations, 2010 CFR
2010-07-01
... December 9, 2004 Model Rule - Use of Model Rule § 60.2998 What are the principal components of the model rule? ... management plan. (c) Operator training and qualification. (d) Emission limitations and operating limits. (e)...
40 CFR 60.2570 - What are the principal components of the model rule?
Code of Federal Regulations, 2010 CFR
2010-07-01
... Construction On or Before November 30, 1999; Use of Model Rule § 60.2570 What are the principal components of the model rule? ... (k) of this section. (a) Increments of progress toward compliance. (b) Waste management plan. (c)...
Stilp, Christian E.; Kluender, Keith R.
2012-01-01
To the extent that sensorineural systems are efficient, redundancy should be extracted to optimize transmission of information, but perceptual evidence for this has been limited. Stilp and colleagues recently reported efficient coding of robust correlation (r = .97) among complex acoustic attributes (attack/decay, spectral shape) in novel sounds. Discrimination of sounds orthogonal to the correlation was initially inferior but later comparable to that of sounds obeying the correlation. These effects were attenuated for less-correlated stimuli (r = .54) for reasons that are unclear. Here, statistical properties of correlation among acoustic attributes essential for perceptual organization are investigated. Overall, simple strength of the principal correlation is inadequate to predict listener performance. Initial superiority of discrimination for statistically consistent sound pairs was relatively insensitive to decreased physical acoustic/psychoacoustic range of evidence supporting the correlation, and to more frequent presentations of the same orthogonal test pairs. However, increased range supporting an orthogonal dimension has substantial effects upon perceptual organization. Connectionist simulations and Eigenvalues from closed-form calculations of principal components analysis (PCA) reveal that perceptual organization is near-optimally weighted to shared versus unshared covariance in experienced sound distributions. Implications of reduced perceptual dimensionality for speech perception and plausible neural substrates are discussed. PMID:22292057
A method to estimate the effect of deformable image registration uncertainties on daily dose mapping
Murphy, Martin J.; Salguero, Francisco J.; Siebers, Jeffrey V.; Staub, David; Vaman, Constantin
2012-01-01
Purpose: To develop a statistical sampling procedure for spatially-correlated uncertainties in deformable image registration and then use it to demonstrate their effect on daily dose mapping. Methods: Sequential daily CT studies are acquired to map anatomical variations prior to fractionated external beam radiotherapy. The CTs are deformably registered to the planning CT to obtain displacement vector fields (DVFs). The DVFs are used to accumulate the dose delivered each day onto the planning CT. Each DVF has spatially-correlated uncertainties associated with it. Principal components analysis (PCA) is applied to measured DVF error maps to produce decorrelated principal component modes of the errors. The modes are sampled independently and reconstructed to produce synthetic registration error maps. The synthetic error maps are convolved with dose mapped via deformable registration to model the resulting uncertainty in the dose mapping. The results are compared to the dose mapping uncertainty that would result from uncorrelated DVF errors that vary randomly from voxel to voxel. Results: The error sampling method is shown to produce synthetic DVF error maps that are statistically indistinguishable from the observed error maps. Spatially-correlated DVF uncertainties modeled by our procedure produce patterns of dose mapping error that are different from that due to randomly distributed uncertainties. Conclusions: Deformable image registration uncertainties have complex spatial distributions. The authors have developed and tested a method to decorrelate the spatial uncertainties and make statistical samples of highly correlated error maps. The sample error maps can be used to investigate the effect of DVF uncertainties on daily dose mapping via deformable image registration. An initial demonstration of this methodology shows that dose mapping uncertainties can be sensitive to spatial patterns in the DVF uncertainties. PMID:22320766
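A minimal numpy sketch of the sampling procedure described above: extract principal modes from a set of observed (here synthetic) error maps, draw independent samples of the mode coefficients, and reconstruct spatially correlated synthetic error maps. The map sizes, number of modes and error model are placeholder assumptions, not the measured DVF data.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical DVF error maps: each row is one observed error map, flattened
# (e.g. 3 displacement components on a small voxel grid)
n_maps, n_voxels = 40, 3 * 10 * 10 * 10
errors = rng.normal(size=(n_maps, 5)) @ rng.normal(size=(5, n_voxels))  # spatially correlated

# Principal component modes of the observed error maps
mean_map = errors.mean(axis=0)
U, s, Vt = np.linalg.svd(errors - mean_map, full_matrices=False)
mode_std = s / np.sqrt(n_maps - 1)        # standard deviation of each mode coefficient

# Sample the decorrelated modes independently and reconstruct synthetic maps
n_synth, n_modes = 100, 5
coeffs = rng.normal(size=(n_synth, n_modes)) * mode_std[:n_modes]
synthetic_maps = mean_map + coeffs @ Vt[:n_modes, :]
print("synthetic error maps:", synthetic_maps.shape)
```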
Palazón, L; Navas, A
2017-06-01
Information on sediment contribution and transport dynamics from the contributing catchments is needed to develop management plans to tackle environmental problems related to the effects of fine sediment, such as reservoir siltation. In this respect, the fingerprinting technique is an indirect technique known to be valuable and effective for sediment source identification in river catchments. Large variability in sediment delivery was found in previous studies in the Barasona catchment (1509 km2, Central Spanish Pyrenees). Simulation results with SWAT and fingerprinting approaches identified badlands and agricultural uses as the main contributors to sediment supply in the reservoir. In this study the <63 μm fraction of the surface reservoir sediments (2 cm) is investigated following the fingerprinting procedure to assess how the use of different statistical procedures affects the estimated source contributions. Three optimum composite fingerprints were selected from the same dataset to discriminate between source contributions based on land uses/land covers, by the application of (1) discriminant function analysis, and its combination (as a second step) with (2) the Kruskal-Wallis H-test and (3) principal components analysis. Source contribution results differed between the assessed options, with the greatest differences observed for the option using #3, the two-step process of principal components analysis and discriminant function analysis. The characteristics of the solutions from the applied mixing model and the conceptual understanding of the catchment showed that the most reliable solution was achieved using #2, the two-step process of Kruskal-Wallis H-test and discriminant function analysis. The assessment showed the importance of the statistical procedure used to define the optimum composite fingerprint for sediment fingerprinting applications. Copyright © 2016 Elsevier Ltd. All rights reserved.
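As an illustration of the two-step tracer selection found most reliable here (Kruskal-Wallis H-test followed by discriminant function analysis), the sketch below applies scipy's kruskal test and scikit-learn's linear discriminant analysis to invented tracer data; the group labels, tracer structure and significance threshold are assumptions, not the Barasona dataset.

```python
import numpy as np
from scipy.stats import kruskal
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(4)

# Hypothetical tracer data for three source groups (e.g. land uses); only the
# first four tracers actually differ between groups
groups = {"badlands": 0.0, "agricultural": 0.6, "forest": 1.2}
n_per, n_tracers = 30, 8
X, y = [], []
for label, shift in groups.items():
    means = np.zeros(n_tracers)
    means[:4] = shift
    X.append(rng.normal(loc=means, scale=1.0, size=(n_per, n_tracers)))
    y += [label] * n_per
X, y = np.vstack(X), np.array(y)

# Step 1: Kruskal-Wallis H-test retains tracers that discriminate the sources
keep = [j for j in range(n_tracers)
        if kruskal(*[X[y == g, j] for g in groups]).pvalue < 0.05]

# Step 2: discriminant function analysis on the retained composite fingerprint
lda = LinearDiscriminantAnalysis().fit(X[:, keep], y)
print("tracers retained:", keep,
      "| reclassification accuracy:", round(lda.score(X[:, keep], y), 2))
```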
Statistical analysis of aerosol species, trace gasses, and meteorology in Chicago.
Binaku, Katrina; O'Brien, Timothy; Schmeling, Martina; Fosco, Tinamarie
2013-09-01
Both canonical correlation analysis (CCA) and principal component analysis (PCA) were applied to atmospheric aerosol and trace gas concentrations and meteorological data collected in Chicago during the summer months of 2002, 2003, and 2004. Concentrations of ammonium, calcium, nitrate, sulfate, and oxalate particulate matter, as well as the meteorological parameters temperature, wind speed, wind direction, and humidity, were subjected to CCA and PCA. Ozone and nitrogen oxide mixing ratios were also included in the data set. The purpose of the statistical analysis was to determine the extent of existing linear relationship(s), or lack thereof, between meteorological parameters and pollutant concentrations, in addition to reducing the dimensionality of the original data to determine sources of pollutants. In CCA, the first three canonical variate pairs derived were statistically significant at the 0.05 level. The canonical correlation of the first canonical variate pair was 0.821, while the correlations of the second and third canonical variate pairs were 0.562 and 0.461, respectively. The first canonical variate pair indicated that increasing temperatures resulted in high ozone mixing ratios, while the second canonical variate pair showed the influence of wind speed and humidity on local ammonium concentrations. No new information was uncovered in the third variate pair. Canonical loadings were also interpreted for information regarding relationships between the data sets. Four principal components (PCs), expressing 77.0% of the original data variance, were derived in PCA. Interpretation of the PCs suggested significant production and/or transport of secondary aerosols in the region (PC1). Furthermore, photochemical production of ozone and the influence of wind speed on pollutants were expressed (PC2), along with an overall measure of local meteorology (PC3). In summary, the combined CCA and PCA results were successful in uncovering linear relationships between meteorology and air pollutants in Chicago and aided in determining possible pollutant sources.
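A minimal sketch of the CCA step described above, using scikit-learn's CCA on invented daily meteorology and pollutant matrices; the variables, sample size and relationships are assumptions for illustration only, not the Chicago data.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(6)

# Hypothetical daily summaries: meteorology vs. pollutant concentrations
n = 200
met = rng.normal(size=(n, 4))                   # temperature, wind speed, wind dir, humidity
pollute = np.column_stack([
    2.0 * met[:, 0] + rng.normal(0, 1, n),      # ozone tracks temperature
    -1.0 * met[:, 1] + rng.normal(0, 1, n),     # ammonium diluted by wind speed
    rng.normal(size=(n, 3)),                    # other species, unrelated here
])

cca = CCA(n_components=3).fit(met, pollute)
U, V = cca.transform(met, pollute)

# Canonical correlations: correlation between paired canonical variates
canon_r = [np.corrcoef(U[:, i], V[:, i])[0, 1] for i in range(3)]
print("canonical correlations:", np.round(canon_r, 2))
```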
Spectral gene set enrichment (SGSE).
Frost, H Robert; Li, Zhigang; Moore, Jason H
2015-03-03
Gene set testing is typically performed in a supervised context to quantify the association between groups of genes and a clinical phenotype. In many cases, however, a gene set-based interpretation of genomic data is desired in the absence of a phenotype variable. Although methods exist for unsupervised gene set testing, they predominantly compute enrichment relative to clusters of the genomic variables, with performance strongly dependent on the clustering algorithm and number of clusters. We propose a novel method, spectral gene set enrichment (SGSE), for unsupervised competitive testing of the association between gene sets and empirical data sources. SGSE first computes the statistical association between gene sets and principal components (PCs) using our principal component gene set enrichment (PCGSE) method. The overall statistical association between each gene set and the spectral structure of the data is then computed by combining the PC-level p-values using the weighted Z-method with weights set to the PC variance scaled by Tracy-Widom test p-values. Using simulated data, we show that the SGSE algorithm can accurately recover spectral features from noisy data. To illustrate the utility of our method on real data, we demonstrate the superior performance of the SGSE method relative to standard cluster-based techniques for testing the association between MSigDB gene sets and the variance structure of microarray gene expression data. Unsupervised gene set testing can provide important information about the biological signal held in high-dimensional genomic data sets. Because it uses the association between gene sets and sample PCs to generate a measure of unsupervised enrichment, the SGSE method is independent of cluster or network creation algorithms and, most importantly, is able to utilize the statistical significance of PC eigenvalues to ignore elements of the data most likely to represent noise.
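The p-value combination step can be sketched as a weighted Stouffer Z-method, as below; the PC-level p-values and weights are invented, and the weights simply stand in for the variance-scaled, Tracy-Widom-adjusted weights used by SGSE.

```python
import numpy as np
from scipy.stats import norm

def weighted_z(pvalues, weights):
    """Stouffer's weighted Z-method: combine one-sided p-values into one p-value."""
    z = norm.isf(np.asarray(pvalues))                 # per-PC z-scores
    w = np.asarray(weights, dtype=float)
    z_comb = np.sum(w * z) / np.sqrt(np.sum(w ** 2))
    return norm.sf(z_comb)

# Hypothetical PC-level enrichment p-values for one gene set, with weights
# standing in for PC variance scaled by Tracy-Widom significance
pc_pvalues = [0.004, 0.20, 0.55, 0.80]
pc_weights = [0.45, 0.25, 0.05, 0.01]
print("combined gene-set p-value:", round(weighted_z(pc_pvalues, pc_weights), 4))
```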
Bioclimatic Classification of Northeast Asia for climate change response
NASA Astrophysics Data System (ADS)
Choi, Y.; Jeon, S. W.; Lim, C. H.
2016-12-01
As climate change worsens, we should monitor changes in biodiversity and in the distribution of species in order to handle the crisis and take advantage of climate change. The development of a bioclimatic map, which classifies land into homogeneous zones with similar environmental properties, is the first step in establishing such a strategy. Statistically derived classifications of land provide useful spatial frameworks to support ecosystem research, monitoring and policy decisions. Many countries are trying to produce this kind of map and actively use it for ecosystem conservation and management. However, Northeast Asia, including North Korea, lacks detailed environmental information and has not yet built an environmental classification map. Therefore, this study presents a bioclimatic map of Northeast Asia based on statistical clustering of bioclimate data. Bioclim data version 1.4, provided by WorldClim, were considered for inclusion in the model. Eight of the most relevant climate variables were selected by correlation analysis, based on previous studies. Principal Components Analysis (PCA) was used to explain 86% of the variation with three independent dimensions, which were subsequently clustered using ISODATA clustering. The bioclimatic classification of Northeast Asia was produced with 29, 35, and 50 zones. The bioclimatic map has a 30' resolution. To assess the accuracy, the correlation coefficient was calculated between the first principal component values of the classification variables and the vegetation index Gross Primary Production (GPP); the Pearson correlation coefficient was about 0.5. This study constructed a high-resolution bioclimatic map of Northeast Asia by statistical methods, but a wider variety of climate variables should be considered in order to better reflect reality. Further studies should also perform more quantitative and qualitative validation in various ways, so that the map can be used more effectively to support decision making on climate change adaptation.
Dall'Asta, Andrea; Schievano, Silvia; Bruse, Jan L; Paramasivam, Gowrishankar; Kaihura, Christine Tita; Dunaway, David; Lees, Christoph C
2017-07-01
The antenatal detection of facial dysmorphism using 3-dimensional ultrasound may raise the suspicion of an underlying genetic condition but infrequently leads to a definitive antenatal diagnosis. Despite advances in array and noninvasive prenatal testing, not all genetic conditions can be ascertained from such testing. The aim of this study was to investigate the feasibility of quantitative assessment of fetal face features using prenatal 3-dimensional ultrasound volumes and statistical shape modeling. STUDY DESIGN: Thirteen normal and 7 abnormal stored 3-dimensional ultrasound fetal face volumes were analyzed at a median gestation of 29+4 weeks (25+0 to 36+1). The 20 3-dimensional surface meshes generated were aligned and served as input for a statistical shape model, which computed the mean 3-dimensional face shape and 3-dimensional shape variations using principal component analysis. Ten shape modes explained more than 90% of the total shape variability in the population. While the first mode accounted for overall size differences, the second highlighted shape feature changes from an overall proportionate toward a more asymmetric face shape with a wide prominent forehead and an undersized, posteriorly positioned chin. Analysis of the Mahalanobis distance in principal component analysis shape space suggested differences between normal and abnormal fetuses (median and interquartile range distance values, 7.31 ± 5.54 for the normal group vs 13.27 ± 9.82 for the abnormal group) (P = .056). This feasibility study demonstrates that objective characterization and quantification of fetal facial morphology is possible from 3-dimensional ultrasound. This technique has the potential to assist in utero diagnosis, particularly of rare conditions in which facial dysmorphology is a feature. Copyright © 2017 Elsevier Inc. All rights reserved.
Pouch, Alison M; Vergnat, Mathieu; McGarvey, Jeremy R; Ferrari, Giovanni; Jackson, Benjamin M; Sehgal, Chandra M; Yushkevich, Paul A; Gorman, Robert C; Gorman, Joseph H
2014-01-01
The basis of mitral annuloplasty ring design has progressed from qualitative surgical intuition to experimental and theoretical analysis of annular geometry with quantitative imaging techniques. In this work, we present an automated three-dimensional (3D) echocardiographic image analysis method that can be used to statistically assess variability in normal mitral annular geometry to support advancement in annuloplasty ring design. Three-dimensional patient-specific models of the mitral annulus were automatically generated from 3D echocardiographic images acquired from subjects with normal mitral valve structure and function. Geometric annular measurements including annular circumference, annular height, septolateral diameter, intercommissural width, and the annular height to intercommissural width ratio were automatically calculated. A mean 3D annular contour was computed, and principal component analysis was used to evaluate variability in normal annular shape. The following mean ± standard deviations were obtained from 3D echocardiographic image analysis: annular circumference, 107.0 ± 14.6 mm; annular height, 7.6 ± 2.8 mm; septolateral diameter, 28.5 ± 3.7 mm; intercommissural width, 33.0 ± 5.3 mm; and annular height to intercommissural width ratio, 22.7% ± 6.9%. Principal component analysis indicated that shape variability was primarily related to overall annular size, with more subtle variation in the skewness and height of the anterior annular peak, independent of annular diameter. Patient-specific 3D echocardiographic-based modeling of the human mitral valve enables statistical analysis of physiologically normal mitral annular geometry. The tool can potentially lead to the development of a new generation of annuloplasty rings that restore the diseased mitral valve annulus back to a truly normal geometry. Copyright © 2014 The Society of Thoracic Surgeons. Published by Elsevier Inc. All rights reserved.
Cole, Jacqueline M; Cheng, Xie; Payne, Michael C
2016-11-07
The use of principal component analysis (PCA) to statistically infer features of local structure from experimental pair distribution function (PDF) data is assessed on a case study of rare-earth phosphate glasses (REPGs). Such glasses, codoped with two rare-earth ions (R and R') of different sizes and optical properties, are of interest to the laser industry. The determination of structure-property relationships in these materials is an important aspect of their technological development. Yet, realizing the local structure of codoped REPGs presents significant challenges relative to their singly doped counterparts; specifically, R and R' are difficult to distinguish in terms of establishing relative material compositions, identifying atomic pairwise correlation profiles in a PDF that are associated with each ion, and resolving peak overlap of such profiles in PDFs. This study demonstrates that PCA can be employed to help overcome these structural complications, by statistically inferring trends in PDFs that exist for a restricted set of experimental data on REPGs, and using these as training data to predict material compositions and PDF profiles in unknown codoped REPGs. The application of these PCA methods to resolve individual atomic pairwise correlations in t(r) signatures is also presented. The training methods developed for these structural predictions are prevalidated by testing their ability to reproduce known physical phenomena, such as the lanthanide contraction, on PDF signatures of the structurally simpler singly doped REPGs. The intrinsic limitations of applying PCA to analyze PDFs relative to the quality control of source data, data processing, and sample definition, are also considered. While this case study is limited to lanthanide-doped REPGs, this type of statistical inference may easily be extended to other inorganic solid-state materials and be exploited in large-scale data-mining efforts that probe many t(r) functions.
Maisuradze, Gia G; Leitner, David M
2007-05-15
Dihedral principal component analysis (dPCA) has recently been developed and shown to display complex features of the free energy landscape of a biomolecule that may be absent in the free energy landscape plotted in principal component space due to mixing of internal and overall rotational motion that can occur in principal component analysis (PCA) [Mu et al., Proteins: Struct Funct Bioinfo 2005;58:45-52]. Another difficulty in the implementation of PCA is sampling convergence, which we address here for both dPCA and PCA using a tetrapeptide as an example. We find that for both methods the sampling convergence can be reached over a similar time. Minima in the free energy landscape in the space of the two largest dihedral principal components often correspond to unique structures, though we also find some distinct minima to correspond to the same structure. 2007 Wiley-Liss, Inc.
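A minimal sketch of the dPCA idea on synthetic dihedral trajectories: each angle is mapped to its sine and cosine (the transformation used in dihedral PCA to respect the periodicity of the angles) before an ordinary PCA is applied. The trajectory length, number of dihedrals and fluctuation model are invented for illustration and are not the tetrapeptide simulations of the paper.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(7)

# Hypothetical trajectory of backbone dihedrals (radians) for a small peptide:
# rows are MD frames, columns are phi/psi angles
n_frames, n_dihedrals = 5000, 6
centers = rng.uniform(-np.pi, np.pi, n_dihedrals)
dihedrals = centers + 0.5 * rng.normal(size=(n_frames, n_dihedrals))
dihedrals = (dihedrals + np.pi) % (2 * np.pi) - np.pi          # wrap to [-pi, pi)

# dPCA: map each angle to (cos, sin) so periodicity is respected, then run PCA
X = np.column_stack([np.cos(dihedrals), np.sin(dihedrals)])
dpca = PCA(n_components=2).fit(X)
projection = dpca.transform(X)                                  # dPC1, dPC2 per frame

print("variance captured by dPC1, dPC2:", np.round(dpca.explained_variance_ratio_, 3))
```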
NASA Astrophysics Data System (ADS)
Pompe, L.; Clausen, B. L.; Morton, D. M.
2015-12-01
Multi-component statistical techniques and GIS visualization are emerging trends in understanding large data sets. Our research applies these techniques to a large igneous geochemical data set from southern California to better understand magmatic and plate tectonic processes. A set of 480 granitic samples collected by Baird from this area were analyzed for 39 geochemical elements. Of these samples, 287 are from the Peninsular Ranges Batholith (PRB) and 164 from part of the Transverse Ranges (TR). Principal component analysis (PCA) summarized the 39 variables into 3 principal components (PC) by matrix multiplication, which for the PRB are interpreted as follows: PC1, with about 30% of the variation, included mainly compatible elements and SiO2 and indicates extent of differentiation; PC2, with about 20% of the variation, included HFS elements and may indicate crustal contamination as usually identified by Sri; PC3, with about 20% of the variation, included mainly HRE elements and may indicate magma source depth as often displayed using REE spider diagrams and possibly Sr/Y. Several elements did not fit well in any of the three components: Cr, Ni, U, and Na2O. For the PRB, the PC1 correlation with SiO2 was r=-0.85, the PC2 correlation with Sri was r=0.80, and the PC3 correlation with Gd/Yb was r=-0.76 and with Sr/Y was r=-0.66. Extending this method to the TR, the correlations were r=-0.85, -0.21, -0.06, and -0.64, respectively. A similar extent of correlation for both areas was visually evident using GIS interpolation. PC1 seems to do well at indicating differentiation index for both the PRB and TR and correlates very well with SiO2, Al2O3, MgO, FeO*, CaO, K2O, Sc, V, and Co, but poorly with Na2O and Cr. If the crustal component is represented by Sri, PC2 correlates well and less expensively with this indicator in the PRB, but not in the TR. Source depth has been related to the slope on REE spidergrams, and PC3, based on only the HREE and using the Sr/Y ratios, gives a reasonable correlation for both the PRB and TR, but the Gd/Yb ratio gives a reasonable correlation only for the PRB. The PRB data provide reasonable correlation between principal components and standard geochemical indicators, perhaps because of the well-recognized monotonic variation from SW to NE. Data sets from the TR give similar results in some cases, but poor correlation in others.
A Divergence Statistics Extension to VTK for Performance Analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pebay, Philippe Pierre; Bennett, Janine Camille
This report follows the series of previous documents [PT08, BPRT09b, PT09, BPT09, PT10, PB13], where we presented the parallel descriptive, correlative, multi-correlative, principal component analysis, contingency, k-means, order and auto-correlative statistics engines which we developed within the Visualization Tool Kit (VTK) as a scalable, parallel and versatile statistics package. We now report on a new engine which we developed for the calculation of divergence statistics, a concept which we hereafter explain and whose main goal is to quantify the discrepancy, in a statistical manner akin to measuring a distance, between an observed empirical distribution and a theoretical, "ideal" one. The ease of use of the new divergence statistics engine is illustrated by means of C++ code snippets. Although this new engine does not yet have a parallel implementation, it has already been applied to HPC performance analysis, of which we provide an example.
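The sketch below is not the VTK engine itself but a minimal Python illustration of the underlying concept: quantifying the discrepancy between an observed empirical distribution and a theoretical "ideal" one, here with the Kullback-Leibler divergence. The sample distribution, bin count and reference model are assumptions chosen only for illustration.

```python
import numpy as np
from scipy.stats import entropy, norm

rng = np.random.default_rng(8)

# Observed samples (slightly skewed here) vs. a theoretical "ideal" normal model
samples = rng.gamma(shape=10.0, scale=1.0, size=10_000)
edges = np.histogram_bin_edges(samples, bins=50)
observed, _ = np.histogram(samples, bins=edges, density=True)

centers = 0.5 * (edges[:-1] + edges[1:])
ideal = norm.pdf(centers, loc=samples.mean(), scale=samples.std())

# Kullback-Leibler divergence D(observed || ideal); entropy() normalizes the bins
eps = 1e-12                                      # avoid zero bins
kl = entropy(observed + eps, ideal + eps)
print("KL divergence between empirical and ideal distribution:", round(kl, 4))
```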
NASA Astrophysics Data System (ADS)
O'Shea, Bethany; Jankowski, Jerzy
2006-12-01
The major ion composition of Great Artesian Basin groundwater in the lower Namoi River valley is relatively homogeneous. Traditional graphical techniques have been combined with multivariate statistical methods to determine whether subtle differences in the chemical composition of these waters can be delineated. Hierarchical cluster analysis and principal components analysis were successful in delineating minor variations within the groundwaters of the study area that were not visually identified in the graphical techniques applied. Hydrochemical interpretation allowed geochemical processes to be identified in each statistically defined water type and illustrated how these groundwaters differ from one another. Three main geochemical processes were identified in the groundwaters: ion exchange, precipitation, and mixing between waters from different sources. Both statistical methods delineated an anomalous sample suspected of being influenced by magmatic CO2 input. The use of statistical methods to complement traditional graphical techniques for waters appearing homogeneous is emphasized for all investigations of this type.
Chtourou, Fatma; Jabeur, Hazem; Lazzez, Ayda; Bouaziz, Mohamed
2017-05-03
Dynamics of squalene, sterol, aliphatic alcohol, pigment, and triterpenic diol accumulations in olive oils from adult and young trees of the Oueslati cultivar were studied for two consecutive years, 2013-2014 and 2014-2015. Data were compared statistically for differences by tree age, olive maturation stage, and year of harvest. Results showed that the mean campesterol content in olive oil from adult trees at the green stage of maturation was significantly (p < 0.02) above the limit established by IOC legislation. However, the mean values of campesterol and Δ-7-stigmastenol were significantly (p < 0.01) above the limits in oils from young trees at the black stage of ripening. Principal component analysis was applied to the alcohols, squalene, pigments, and sterols showing noncompliance with the legislation. Then, the data of 36 samples were subjected to a discriminant analysis with "maturation" as the grouping variable and the principal components as input variables. The model revealed clear discrimination of each tree age/maturation stage group.
Favaro, Livio; Tirelli, Tina; Pessani, Daniela
2010-01-01
Over the last decades, the populations of Austropotamobius pallipes have decreased markedly all over Europe. Evaluating the ecological factors that determine its presence provides information that could guide conservation decisions. This study aims to investigate the chemical-physical requirements of A. pallipes in NW Italy. To this end, we investigated 98 sites. We performed Principal Component Analysis using chemical-physical parameters collected at both presence and absence sites. We then used the principal components with eigenvalue > 1 to run Discriminant Function Analysis and Logistic Regression. Ca2+ concentration, water hardness, pH and BOD5 differed significantly between presence and absence sites. pH and BOD5 played the most important role in separating the presence from the absence locations. These findings are further evidence that we should reduce dissolved organic matter and fine particles in order to contribute to species management and conservation. Copyright 2009 Académie des sciences. Published by Elsevier SAS. All rights reserved.
Shabri, Ani; Samsudin, Ruhaidah
2014-01-01
Crude oil prices play a significant role in the global economy and are a key input into option pricing formulas, portfolio allocation, and risk measurement. In this paper, a hybrid model integrating wavelets and multiple linear regression (MLR) is proposed for crude oil price forecasting. In this model, the Mallat wavelet transform is first selected to decompose an original time series into several subseries with different scales. Then, principal component analysis (PCA) is used to process the subseries data in the MLR for crude oil price forecasting. Particle swarm optimization (PSO) is used to select the optimal parameters of the MLR model. To assess the effectiveness of this model, the daily West Texas Intermediate (WTI) crude oil market has been used as the case study. The time series prediction capability of the WMLR model is compared with the MLR, ARIMA, and GARCH models using various statistical measures. The experimental results show that the proposed model outperforms the individual models in forecasting the crude oil price series.
NASA Astrophysics Data System (ADS)
Díaz-Ayil, Gilberto; Amouroux, Marine; Clanché, Fabien; Granjon, Yves; Blondel, Walter C. P. M.
2009-07-01
Spatially-resolved bimodal spectroscopy (multiple-excitation AutoFluorescence, AF, and Diffuse Reflectance, DR) was used in vivo to discriminate various healthy and precancerous skin stages in a pre-clinical model (UV-irradiated mouse): Compensatory Hyperplasia CH, Atypical Hyperplasia AH and Dysplasia D. A specific data preprocessing scheme was applied to the intensity spectra (filtering, spectral correction and intensity normalization), and several sets of spectral characteristics were automatically extracted and selected based on their discrimination power, statistically tested for every pair-wise comparison of histological classes. Data reduction with Principal Components Analysis (PCA) was performed and 3 classification methods were implemented (k-NN, LDA and SVM) in order to compare the diagnostic performance of each method. Diagnostic performance was assessed in terms of sensitivity (Se) and specificity (Sp) as a function of the selected features, of the combinations of 3 different inter-fibre distances and of the number of principal components, such that: Se and Sp ~ 100% when discriminating CH vs. others; Sp ~ 100% and Se > 95% when discriminating Healthy vs. AH or D; and Sp ~ 74% and Se ~ 63% for AH vs. D.
A Principal Component Analysis/Fuzzy Comprehensive Evaluation for Rockburst Potential in Kimberlite
NASA Astrophysics Data System (ADS)
Pu, Yuanyuan; Apel, Derek; Xu, Huawei
2018-02-01
Kimberlite is an igneous rock which sometimes bears diamonds. Most of the diamonds mined in the world today are found in kimberlite ores. Burst potential in kimberlite has not been investigated, because kimberlite is mostly mined using open-pit mining, which poses very little threat of rock bursting. However, as the mining depth keeps increasing, the mines convert to underground mining methods, which can pose a threat of rock bursting in kimberlite. This paper focuses on the burst potential of kimberlite at a diamond mine in northern Canada. A combined model using the methods of principal component analysis (PCA) and fuzzy comprehensive evaluation (FCE) is developed to process data from 12 different locations in kimberlite pipes. Based on the 12 calculated fuzzy evaluation vectors, 8 locations show a moderate burst potential, 2 locations show no burst potential, and 2 locations show strong and violent burst potential, respectively. Using statistical principles, a Mahalanobis distance is adopted to build a comprehensive fuzzy evaluation vector for the whole mine, and the final evaluation of burst potential is moderate, which is verified by the practical rockbursting situation at the mine site.
Multivariate relationships between groundwater chemistry and toxicity in an urban aquifer.
Dewhurst, Rachel E; Wells, N Claire; Crane, Mark; Callaghan, Amanda; Connon, Richard; Mather, John D
2003-11-01
Multivariate statistical methods were used to investigate the causes of toxicity and controls on groundwater chemistry from 274 boreholes in an urban area (London) of the United Kingdom. The groundwater was alkaline to neutral, and chemistry was dominated by calcium, sodium, and sulfate. Contaminants included fuels, solvents, and organic compounds derived from landfill material. The presence of organic material in the aquifer caused decreases in dissolved oxygen, sulfate and nitrate concentrations, and increases in ferrous iron and ammoniacal nitrogen concentrations. Pearson correlations between toxicity results and the concentration of individual analytes indicated that concentrations of ammoniacal nitrogen, dissolved oxygen, ferrous iron, and hydrocarbons were important where present. However, principal component and regression analysis suggested no significant correlation between toxicity and chemistry over the whole area. Multidimensional scaling was used to investigate differences in sites caused by historical use, landfill gas status, or position within the sample area. Significant differences were observed between sites with different historical land use and those with different gas status. Examination of the principal component matrix revealed that these differences are related to changes in the importance of reduced chemical species.
Physician performance assessment using a composite quality index.
Liu, Kaibo; Jain, Shabnam; Shi, Jianjun
2013-07-10
Assessing physician performance is important for the purposes of measuring and improving quality of service and reducing healthcare delivery costs. In recent years, physician performance scorecards have been used to provide feedback on individual measures; however, one key challenge is how to develop a composite quality index that combines multiple measures for overall physician performance evaluation. A controversy arises over establishing appropriate weights to combine indicators in multiple dimensions, and cannot be easily resolved. In this study, we proposed a generic unsupervised learning approach to develop a single composite index for physician performance assessment by using non-negative principal component analysis. We developed a new algorithm named iterative quadratic programming to solve the numerical issue in the non-negative principal component analysis approach. We conducted real case studies to demonstrate the performance of the proposed method. We provided interpretations from both statistical and clinical perspectives to evaluate the developed composite ranking score in practice. In addition, we implemented the root cause assessment techniques to explain physician performance for improvement purposes. Copyright © 2012 John Wiley & Sons, Ltd.
Szabo, J.K.; Fedriani, E.M.; Segovia-Gonzalez, M. M.; Astheimer, L.B.; Hooper, M.J.
2010-01-01
This paper introduces a new technique in ecology to analyze spatial and temporal variability in environmental variables. By using simple statistics, we explore the relations between abiotic and biotic variables that influence animal distributions. However, spatial and temporal variability in rainfall, a key variable in ecological studies, can cause difficulties for any basic model including time evolution. The study was at a landscape scale (three million square kilometers in eastern Australia), mainly over the period 1998–2004. We simultaneously considered qualitative spatial (soil and habitat types) and quantitative temporal (rainfall) variables in a Geographical Information System environment. In addition to some techniques commonly used in ecology, we applied a new method, Functional Principal Component Analysis, which proved to be very suitable for this case, as it explained more than 97% of the total variance of the rainfall data, providing us with substitute variables that are easier to manage and are even able to explain rainfall patterns. The main variable came from a habitat classification that showed strong correlations with rainfall values and soil types. © 2010 World Scientific Publishing Company.
Does boldness explain vulnerability to angling in Eurasian perch Perca fluviatilis?
Vainikka, Anssi; Tammela, Ilkka; Hyvärinen, Pekka
2016-04-01
Consistent individual differences (CIDs) in behavior are of interest to both basic and applied research, because any selection acting on them could induce evolution of animal behavior. It has been suggested that CIDs in the behavior of fish might explain individual differences in vulnerability to fishing. If so, fishing could impose selection on fish behavior. In this study, we assessed boldness-indicating behaviors of Eurasian perch Perca fluviatilis using individually conducted experiments measuring the time taken to explore a novel arena containing predator (burbot, Lota lota ) cues. We studied if individual differences in boldness would explain vulnerability of individually tagged perch to experimental angling in outdoor ponds, or if fishing would impose selection on boldness-indicating behavior. Perch expressed repeatable individual differences in boldness-indicating behavior but the individual boldness-score (the first principal component) obtained using principal component analysis combining all the measured behavioral responses did not explain vulnerability to experimental angling. Instead, large body size appeared as the only statistically significant predictor of capture probability. Our results suggest that angling is selective for large size, but not always selective for high boldness.
Zhang, Bing; Song, Xianfang; Zhang, Yinghua; Han, Dongmei; Tang, Changyuan; Yu, Yilei; Ma, Ying
2012-05-15
Water quality is a critical factor that influences human health and the quantity and quality of grain production in semi-humid and semi-arid areas. The Songnen plain is one of the grain bases in China, as well as one of the three major distribution regions of soda saline-alkali soil in the world. To assess the water quality, surface water and groundwater were sampled and analyzed by fuzzy membership analysis and multivariate statistics. The surface water samples were grouped into classes I, IV and V, while groundwater samples were grouped into classes I, II, III and V by fuzzy membership analysis. The water samples were grouped into four categories according to the irrigation water quality assessment diagrams of the USDA. Most water samples fell into categories C1-S1, C2-S2 and C3-S3. Three groups were generated from hierarchical cluster analysis. Four principal components were extracted from principal component analysis. The indicators for water quality assessment identified by principal component analysis were Na, HCO3, NO3, Fe, Mn and EC. We conclude that surface water and shallow groundwater are suitable for irrigation, while the reservoir and the deep groundwater upstream are the resources for drinking. Drinking water treatment should remove the naturally occurring Fe and Mn ions. Control of sodium and salinity hazards is required for irrigation. Integrated management of surface water and groundwater for drinking and irrigation is the way to solve the water issues. Copyright © 2012 Elsevier Ltd. All rights reserved.
Bayesian estimation of Karhunen–Loève expansions; A random subspace approach
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chowdhary, Kenny; Najm, Habib N.
2016-04-13
One of the most widely used statistical procedures for dimensionality reduction of high-dimensional random fields is principal component analysis (PCA), which is based on the Karhunen-Loève expansion (KLE) of a stochastic process with finite variance. The KLE is analogous to a Fourier series expansion for a random process, where the goal is to find an orthogonal transformation of the data such that the projection of the data onto this orthogonal subspace is optimal in the L2 sense, i.e., it minimizes the mean square error. In practice, this orthogonal transformation is determined by performing an SVD (Singular Value Decomposition) on the sample covariance matrix or on the data matrix itself. Sampling error is typically ignored when quantifying the principal components or, equivalently, the basis functions of the KLE; furthermore, this error is exacerbated when the sample size is much smaller than the dimension of the random field. In this paper, we introduce a Bayesian KLE procedure, allowing one to obtain a probabilistic model on the principal components, which can account for inaccuracies due to limited sample size. The probabilistic model is built via Bayesian inference, from which the posterior becomes the matrix Bingham density over the space of orthonormal matrices. We use a modified Gibbs sampling procedure to sample on this space and then build probabilistic Karhunen-Loève expansions over random subspaces to obtain a set of low-dimensional surrogates of the stochastic process. We illustrate this probabilistic procedure with a finite-dimensional stochastic process inspired by Brownian motion.
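The deterministic KLE/PCA step that the Bayesian procedure above builds on can be sketched in a few lines of NumPy; the Gibbs sampling over the matrix Bingham posterior is not reproduced here, and all function and variable names are illustrative rather than taken from the paper.

```python
import numpy as np

def kl_expansion(samples, n_modes):
    """Point-estimate Karhunen-Loeve basis via SVD of the centered sample paths.

    samples : (n_samples, n_points) array of realizations of the process.
    Returns the mean, the leading eigenvalues, the orthonormal modes and the scores.
    """
    mean = samples.mean(axis=0)
    X = samples - mean                          # center the realizations
    # SVD of the data matrix; no need to form the covariance explicitly
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    eigvals = s**2 / (samples.shape[0] - 1)     # eigenvalues of the sample covariance
    modes = Vt[:n_modes]                        # orthonormal KL modes (rows)
    scores = U[:, :n_modes] * s[:n_modes]       # projection of each sample onto the modes
    return mean, eigvals[:n_modes], modes, scores

# toy example: realizations of a Brownian-motion-like process
rng = np.random.default_rng(0)
paths = np.cumsum(rng.normal(size=(200, 128)), axis=1)
mean, lam, phi, xi = kl_expansion(paths, n_modes=5)
print(lam)   # decay of the leading KL eigenvalues
```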
Fast, Exact Bootstrap Principal Component Analysis for p > 1 million
Fisher, Aaron; Caffo, Brian; Schwartz, Brian; Zipunnikov, Vadim
2015-01-01
Many have suggested a bootstrap procedure for estimating the sampling variability of principal component analysis (PCA) results. However, when the number of measurements per subject (p) is much larger than the number of subjects (n), calculating and storing the leading principal components from each bootstrap sample can be computationally infeasible. To address this, we outline methods for fast, exact calculation of bootstrap principal components, eigenvalues, and scores. Our methods leverage the fact that all bootstrap samples occupy the same n-dimensional subspace as the original sample. As a result, all bootstrap principal components are limited to the same n-dimensional subspace and can be efficiently represented by their low dimensional coordinates in that subspace. Several uncertainty metrics can be computed solely based on the bootstrap distribution of these low dimensional coordinates, without calculating or storing the p-dimensional bootstrap components. Fast bootstrap PCA is applied to a dataset of sleep electroencephalogram recordings (p = 900, n = 392), and to a dataset of brain magnetic resonance images (MRIs) (p ≈ 3 million, n = 352). For the MRI dataset, our method allows for standard errors for the first 3 principal components based on 1000 bootstrap samples to be calculated on a standard laptop in 47 minutes, as opposed to approximately 4 days with standard methods. PMID:27616801
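The subspace trick described above can be sketched roughly as follows: one SVD of the full data is computed, all bootstrap resampling then happens on the n x n coordinate matrix, and element-wise standard errors of the high-dimensional components are recovered from an n x n covariance. The toy dimensions, variable names and the simple sign-alignment step are my own simplifications, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, K, B = 5000, 60, 3, 500        # measurements, subjects, PCs kept, bootstrap draws
X = rng.normal(size=(p, n))
X -= X.mean(axis=1, keepdims=True)   # center each measurement across subjects

# One SVD of the full data; all bootstrap samples live in this n-dimensional row space.
U, d, Vt = np.linalg.svd(X, full_matrices=False)   # U: p x n
Y = d[:, None] * Vt                                # n x n low-dimensional coordinates

A_boot = np.empty((B, n, K))
for b in range(B):
    idx = rng.integers(0, n, size=n)               # resample subjects with replacement
    Ab, _, _ = np.linalg.svd(Y[:, idx], full_matrices=False)
    Ak = Ab[:, :K]
    flip = np.sign(np.diag(Ak[:K, :]))             # crude sign alignment with the original PCs
    flip[flip == 0] = 1.0
    A_boot[b] = Ak * flip

# Element-wise bootstrap standard errors of each high-dimensional PC, computed from the
# n x n covariance of its low-dimensional coordinates; the p x B matrix is never formed.
se = np.empty((p, K))
for k in range(K):
    C = np.cov(A_boot[:, :, k], rowvar=False)
    se[:, k] = np.sqrt(np.einsum('ij,jk,ik->i', U, C, U))
print(se.shape)
```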
An Empirical Cumulus Parameterization Scheme for a Global Spectral Model
NASA Technical Reports Server (NTRS)
Rajendran, K.; Krishnamurti, T. N.; Misra, V.; Tao, W.-K.
2004-01-01
Realistic vertical heating and drying profiles in a cumulus scheme are important for obtaining accurate weather forecasts. A new empirical cumulus parameterization scheme, based on a procedure to improve the vertical distribution of heating and moistening over the tropics, is developed. The empirical cumulus parameterization scheme (ECPS) utilizes profiles of Tropical Rainfall Measuring Mission (TRMM) based heating and moistening derived from the European Centre for Medium-Range Weather Forecasts (ECMWF) analysis. A dimension reduction technique, rotated principal component analysis (RPCA), is applied to the vertical profiles of heating (Q1) and drying (Q2) over the convective regions of the tropics to obtain the dominant modes of variability. The analysis suggests that most of the variance associated with the observed profiles can be explained by retaining the first three modes. The ECPS then applies a statistical approach in which Q1 and Q2 are expressed as linear combinations of the first three dominant principal components, which distinctly explain variance in the troposphere as a function of the prevalent large-scale dynamics. The principal component (PC) score, which quantifies the contribution of each PC to the corresponding loading profile, is estimated through a multiple screening regression method that yields the PC score as a function of the large-scale variables. The profiles of Q1 and Q2 thus obtained are found to match well with the observed profiles. The impact of the ECPS is investigated in a series of short-range (1-3 day) prediction experiments using the Florida State University global spectral model (FSUGSM, T126L14). Comparisons between short-range ECPS forecasts and those with the modified Kuo scheme show a marked improvement in skill in the ECPS forecasts. This improvement in forecast skill with the ECPS emphasizes the importance of incorporating realistic vertical distributions of heating and drying in the model cumulus scheme. It also suggests that, in the absence of explicit models for convection, the proposed statistical scheme improves the modeling of the vertical distribution of heating and moistening in areas of deep convection.
ERIC Educational Resources Information Center
Oplatka, Izhar
2017-01-01
Purpose: In order to fill the gap in theoretical and empirical knowledge about the characteristics of principal workload, the purpose of this paper is to explore the components of principal workload as well as its determinants and the coping strategies commonly used by principals to face this personal state. Design/methodology/approach:…
NASA Astrophysics Data System (ADS)
Sanatkhani, Soroosh; Menon, Prahlad G.
2018-03-01
Left atrial appendage (LAA) is the source of 91% of the thrombi in patients with atrial arrhythmias (2.3 million US adults), turning this region into a potential threat for stroke. LAA geometries have been clinically categorized into four appearance groups, viz. Cauliflower, Cactus, Chicken-Wing and WindSock, based on visual appearance in 3D volume visualizations of contrast-enhanced computed tomography (CT) imaging, and have further been correlated with stroke risk by considering clinical mortality statistics. However, such classification from visual appearance is limited by human subjectivity and is not sophisticated enough to address all the characteristics of the geometries. Quantification of LAA geometry metrics can reveal a more repeatable and reliable estimate of the characteristics of the LAA which correspond with stasis risk and, in turn, cardioembolic risk. We present an approach to quantify the appearance of the LAA in patients in atrial fibrillation (AF) using a weighted set of baseline eigen-modes of LAA appearance variation, as a means to objectify classification of patient-specific LAAs into the four accepted clinical appearance groups. Clinical images of 16 patients (4 per LAA appearance category) with atrial fibrillation (AF) were identified and visualized as volume images. All the volume images were rigidly reoriented in order to be spatially co-registered, normalized in terms of intensity, resampled and finally reshaped appropriately to carry out principal component analysis (PCA), in order to parametrize the LAA region's appearance based on principal components (PCs/eigen-modes) of greyscale appearance, generating 16 eigen-modes of appearance variation. Our pilot studies show that the most dominant LAA appearance (i.e. reconstructable using the fewest eigen-modes) resembles the Chicken-Wing class, which is known to have the lowest stroke risk per clinical mortality statistics. Our findings indicate the possibility that LAA geometries with high risk of stroke are higher-order statistical variants of underlying lower-risk shapes.
Hou, Deyi; O'Connor, David; Nathanail, Paul; Tian, Li; Ma, Yan
2017-12-01
Heavy metal soil contamination is associated with potential toxicity to humans or ecotoxicity. Scholars have increasingly used a combination of geographical information science (GIS) with geostatistical and multivariate statistical analysis techniques to examine the spatial distribution of heavy metals in soils at a regional scale. A review of such studies showed that most soil sampling programs were based on grid patterns and composite sampling methodologies. Many programs intended to characterize various soil types and land use types. The most often used sampling depth intervals were 0-0.10 m, or 0-0.20 m, below surface; and the sampling densities used ranged from 0.0004 to 6.1 samples per km², with a median of 0.4 samples per km². The most widely used spatial interpolators were inverse distance weighted interpolation and ordinary kriging; and the most often used multivariate statistical analysis techniques were principal component analysis and cluster analysis. The review also identified several determining and correlating factors in heavy metal distribution in soils, including soil type, soil pH, soil organic matter, land use type, Fe, Al, and heavy metal concentrations. The major natural and anthropogenic sources of heavy metals were found to derive from lithogenic origin, roadway and transportation, atmospheric deposition, wastewater and runoff from industrial and mining facilities, fertilizer application, livestock manure, and sewage sludge. This review argues that the full potential of integrated GIS and multivariate statistical analysis for assessing heavy metal distribution in soils on a regional scale has not yet been fully realized. It is proposed that future research be conducted to map multivariate results in GIS to pinpoint specific anthropogenic sources, to analyze temporal trends in addition to spatial patterns, to optimize modeling parameters, and to expand the use of different multivariate analysis tools beyond principal component analysis (PCA) and cluster analysis (CA). Copyright © 2017 Elsevier Ltd. All rights reserved.
Konaté, Ahmed Amara; Ma, Huolin; Pan, Heping; Qin, Zhen; Ahmed, Hafizullah Abba; Dembele, N'dji Dit Jacques
2017-10-01
The availability of a deep well that penetrates deep into the Ultra High Pressure (UHP) metamorphic rocks is unusual and consequently offers a unique chance to study the metamorphic rocks. One such borehole is located in the southern part of Donghai County in the Sulu UHP metamorphic belt of Eastern China, from the Chinese Continental Scientific Drilling Main hole. This study reports the results obtained from the analysis of oxide log data. A geochemical logging tool provides in situ gamma ray spectroscopy measurements of major and trace elements in the borehole. The dry weight percent oxide concentration logs obtained for this study were SiO2, K2O, TiO2, H2O, CO2, Na2O, Fe2O3, FeO, CaO, MnO, MgO, P2O5 and Al2O3. Cross plot and Principal Component Analysis methods were applied for lithology characterization and mineralogy description, respectively. Cross plot analysis allows lithological variations to be characterized. Principal Component Analysis shows that the oxide logs can be summarized by two components related to the feldspar and hydrous minerals. This study has shown that geochemical logging tool data are sufficiently accurate to be of great use in the analysis of UHP metamorphic rocks. Copyright © 2017 Elsevier Ltd. All rights reserved.
Wang, Jian; Zhu, Jinmao; Huang, RuZhu; Yang, YuSheng
2012-07-01
We explored the rapid qualitative analysis of wheat cultivars with good lodging resistance by Fourier transform infrared (FTIR) spectroscopy and multivariate statistical analysis. FTIR imaging showed that wheat stem cell walls were mainly composed of cellulose, pectin, protein, and lignin. Principal component analysis (PCA) was used to eliminate multicollinearity among multiple peak absorptions. PCA revealed that the developmental internodes of wheat stems could be ordered from low to high along the loading of the second principal component, which was consistent with the corresponding cellulose bands in the FTIR spectra of the cell walls. Furthermore, four distinct stem populations could also be identified by spectral features related to their corresponding mechanical properties via PCA and cluster analysis. Histochemical staining of four types of wheat stems with various abilities to resist lodging revealed that cellulose contributed more than lignin to lodging resistance. These results strongly suggest that the main cell wall component responsible for these differences is cellulose. Therefore, the combination of multivariate analysis and FTIR can rapidly screen wheat cultivars with good lodging resistance. Furthermore, the application of these methods to a much wider range of cultivars of unknown mechanical properties promises to be of interest.
Jiang, Dan; Hao, Guozhu; Huang, Liwen; Zhang, Dan
2016-01-01
A water traffic system is a huge, nonlinear, complex system, and its stability is affected by various factors. Water traffic accidents can be considered to be a kind of mutation of a water traffic system caused by the coupling of multiple navigational environment factors. In this study, the catastrophe theory, principal component analysis (PCA), and multivariate statistics are integrated to establish a situation recognition model for a navigational environment with the aim of performing a quantitative analysis of the situation of this environment via the extraction and classification of its key influencing factors; in this model, the natural environment and traffic environment are considered to be two control variables. The Three Gorges Reservoir area of the Yangtze River is considered as an example, and six critical factors, i.e., the visibility, wind, current velocity, route intersection, channel dimension, and traffic flow, are classified into two principal components: the natural environment and traffic environment. These two components are assumed to have the greatest influence on the navigation risk. Then, the cusp catastrophe model is employed to identify the safety situation of the regional navigational environment in the Three Gorges Reservoir area. The simulation results indicate that the situation of the navigational environment of this area is gradually worsening from downstream to upstream. PMID:27391057
Detection of micro solder balls using active thermography and probabilistic neural network
NASA Astrophysics Data System (ADS)
He, Zhenzhi; Wei, Li; Shao, Minghui; Lu, Xingning
2017-03-01
Micro solder ball/bump has been widely used in electronic packaging. It has been challenging to inspect these structures as the solder balls/bumps are often embedded between the component and substrates, especially in flip-chip packaging. In this paper, a detection method for micro solder balls/bumps based on active thermography and a probabilistic neural network is investigated. A VH680 infrared imager is used to capture the thermal image of the test vehicle, SFA10 packages. The temperature curves are processed using a moving average technique to remove peak noise, and principal component analysis (PCA) is adopted to reconstruct the thermal images. The missed solder balls can be recognized explicitly in the second principal component image. A probabilistic neural network (PNN) is then established to identify the defective bumps intelligently. The hot spots corresponding to the solder balls are segmented from the PCA-reconstructed image, and statistical parameters are calculated. To characterize the thermal properties of solder bumps quantitatively, three representative features are selected and used as the input vector in PNN clustering. The results show that the actual outputs and the expected outputs are consistent in the identification of the missed solder balls, and all the bumps are recognized accurately, which demonstrates the viability of the PNN for effective defect inspection in high-density microelectronic packaging.
Statistical shape modeling of human cochlea: alignment and principal component analysis
NASA Astrophysics Data System (ADS)
Poznyakovskiy, Anton A.; Zahnert, Thomas; Fischer, Björn; Lasurashvili, Nikoloz; Kalaidzidis, Yannis; Mürbe, Dirk
2013-02-01
The modeling of the cochlear labyrinth in living subjects is hampered by the insufficient resolution of available clinical imaging methods, which usually provide resolutions no finer than 125 μm. This is too crude to record the position of the basilar membrane and, as a result, to distinguish even the scala tympani from the other scalae. This problem can be avoided by means of atlas-based segmentation. Specimens can endure higher radiation loads and, consequently, provide better-resolved images; the resulting surface can be used as the seed for atlas-based segmentation. To serve this purpose, we have developed a statistical shape model (SSM) of the human scala tympani based on segmentations obtained from 10 μCT image stacks. After segmentation, we aligned the resulting surfaces using Procrustes alignment. This algorithm was slightly modified to accommodate individual models with nodes that do not necessarily correspond to salient features and that vary in number between models. We established correspondence by mutual proximity between nodes. Rather than using the standard Euclidean norm, we applied an alternative logarithmic norm to improve outlier treatment. The minimization was done using the BFGS method. We also split the surface nodes along an octree to reduce computation cost. Subsequently, we performed principal component analysis of the training set with the Jacobi eigenvalue algorithm. We expect the resulting method not only to help acquire a better understanding of interindividual variations in cochlear anatomy, but also to be a step towards individual models for pre-operative diagnostics prior to cochlear implant insertion.
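Assuming landmark correspondence has already been established, a generic Procrustes-plus-PCA shape model in the spirit of the abstract can be sketched as follows; SciPy's standard Procrustes routine stands in for the authors' modified alignment with a logarithmic norm, the octree node splitting is omitted, and the toy helix data is purely illustrative.

```python
import numpy as np
from scipy.spatial import procrustes

def build_ssm(shapes, n_modes=3):
    """Iterative alignment of corresponding landmark sets to their mean, then PCA.

    shapes : (n_shapes, n_landmarks, 3) array of corresponding landmarks.
    """
    aligned = np.array(shapes, dtype=float)
    for _ in range(5):                               # re-align to the current mean shape
        ref = aligned.mean(axis=0)
        for i in range(len(aligned)):
            _, aligned[i], _ = procrustes(ref, aligned[i])
    mean_shape = aligned.mean(axis=0)
    X = (aligned - mean_shape).reshape(len(aligned), -1)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    eigvals = s**2 / (len(aligned) - 1)              # variance of each shape mode
    return mean_shape, eigvals[:n_modes], Vt[:n_modes]

# toy data: 10 noisy copies of a helix standing in for scala tympani centrelines
rng = np.random.default_rng(0)
t = np.linspace(0, 4 * np.pi, 80)
base = np.stack([np.cos(t), np.sin(t), 0.1 * t], axis=1)
shapes = base + 0.05 * rng.normal(size=(10, 80, 3))
mean_shape, lam, modes = build_ssm(shapes)
print(lam)   # variance explained by the leading shape modes
```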
Kourgialas, Nektarios N; Dokou, Zoi; Karatzas, George P
2015-05-01
The purpose of this study was to create a modeling management tool for the simulation of extreme flow events under current and future climatic conditions. This tool is a combination of different components and can be applied in complex hydrogeological river basins, where frequent flood and drought phenomena occur. The first component is the statistical analysis of the available hydro-meteorological data. Specifically, principal components analysis was performed in order to quantify the importance of the hydro-meteorological parameters that affect the generation of extreme events. The second component is a prediction-forecasting artificial neural network (ANN) model that simulates, accurately and efficiently, river flow on an hourly basis. This model is based on a methodology that attempts to resolve a very difficult problem related to the accurate estimation of extreme flows. For this purpose, the available measurements (5 years of hourly data) were divided in two subsets: one for the dry and one for the wet periods of the hydrological year. This way, two ANNs were created, trained, tested and validated for a complex Mediterranean river basin in Crete, Greece. As part of the second management component a statistical downscaling tool was used for the creation of meteorological data according to the higher and lower emission climate change scenarios A2 and B1. These data are used as input in the ANN for the forecasting of river flow for the next two decades. The final component is the application of a meteorological index on the measured and forecasted precipitation and flow data, in order to assess the severity and duration of extreme events. Copyright © 2015 Elsevier Ltd. All rights reserved.
Fault Detection of Bearing Systems through EEMD and Optimization Algorithm
Lee, Dong-Han; Ahn, Jong-Hyo; Koh, Bong-Hwan
2017-01-01
This study proposes a fault detection and diagnosis method for bearing systems using ensemble empirical mode decomposition (EEMD) based feature extraction, in conjunction with particle swarm optimization (PSO), principal component analysis (PCA), and Isomap. First, a mathematical model is assumed to generate vibration signals from damaged bearing components, such as the inner-race, outer-race, and rolling elements. The process of decomposing vibration signals into intrinsic mode functions (IMFs) and extracting statistical features is introduced to develop a damage-sensitive parameter vector. Finally, PCA and Isomap algorithm are used to classify and visualize this parameter vector, to separate damage characteristics from healthy bearing components. Moreover, the PSO-based optimization algorithm improves the classification performance by selecting proper weightings for the parameter vector, to maximize the visualization effect of separating and grouping of parameter vectors in three-dimensional space. PMID:29143772
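A rough sketch of the EEMD feature-extraction and PCA steps described above is given below; it assumes the third-party PyEMD (EMD-signal) package, uses toy synthetic signals in place of real bearing vibrations, and leaves out the Isomap and PSO-weighting stages.

```python
import numpy as np
from scipy.stats import kurtosis, skew
from sklearn.decomposition import PCA
from PyEMD import EEMD   # third-party "EMD-signal" package; assumed available

def bearing_features(signal, max_imf=5):
    """EEMD decomposition followed by simple per-IMF statistics (RMS, kurtosis, skew, peak)."""
    imfs = EEMD(trials=20).eemd(signal, max_imf=max_imf)
    feats = np.zeros(4 * max_imf)                 # fixed length even if fewer IMFs emerge
    for k, imf in enumerate(imfs[:max_imf]):
        feats[4 * k:4 * k + 4] = [np.sqrt(np.mean(imf**2)), kurtosis(imf),
                                  skew(imf), np.max(np.abs(imf))]
    return feats

# toy vibration records: half healthy, half with an impulsive outer-race-like fault
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 2048)
signals, labels = [], []
for i in range(8):
    s = np.sin(2 * np.pi * 50 * t) + 0.2 * rng.normal(size=t.size)
    if i % 2:                                     # add periodic impulses to "faulty" records
        s += 0.8 * (np.sin(2 * np.pi * 140 * t) > 0.995)
    signals.append(s)
    labels.append(i % 2)

F = np.array([bearing_features(s) for s in signals])
scores = PCA(n_components=2).fit_transform(F)     # damage-sensitive vectors in 2-D
print(np.column_stack([scores, labels]))
```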
Statistical analysis of fNIRS data: a comprehensive review.
Tak, Sungho; Ye, Jong Chul
2014-01-15
Functional near-infrared spectroscopy (fNIRS) is a non-invasive method to measure brain activities using the changes of optical absorption in the brain through the intact skull. fNIRS has many advantages over other neuroimaging modalities such as positron emission tomography (PET), functional magnetic resonance imaging (fMRI), or magnetoencephalography (MEG), since it can directly measure blood oxygenation level changes related to neural activation with high temporal resolution. However, fNIRS signals are highly corrupted by measurement noises and physiology-based systemic interference. Careful statistical analyses are therefore required to extract neuronal activity-related signals from fNIRS data. In this paper, we provide an extensive review of historical developments of statistical analyses of fNIRS signal, which include motion artifact correction, short source-detector separation correction, principal component analysis (PCA)/independent component analysis (ICA), false discovery rate (FDR), serially-correlated errors, as well as inference techniques such as the standard t-test, F-test, analysis of variance (ANOVA), and statistical parameter mapping (SPM) framework. In addition, to provide a unified view of various existing inference techniques, we explain a linear mixed effect model with restricted maximum likelihood (ReML) variance estimation, and show that most of the existing inference methods for fNIRS analysis can be derived as special cases. Some of the open issues in statistical analysis are also described. Copyright © 2013 Elsevier Inc. All rights reserved.
Brown, C. Erwin
1993-01-01
Correlation analysis, in conjunction with principal-component and multiple-regression analyses, was applied to laboratory chemical and petrographic data to assess the usefulness of these techniques in evaluating selected physical and hydraulic properties of carbonate-rock aquifers in central Pennsylvania. Correlation and principal-component analyses were used to establish relations and associations among variables, to determine the dimensions of property variation of samples, and to filter the variables containing similar information. Principal-component and correlation analyses showed that porosity is related to other measured variables and that permeability is most related to porosity and grain size. Four principal components are found to be significant in explaining the variance of the data. Stepwise multiple-regression analysis was used to see how well the measured variables could predict porosity and (or) permeability for this suite of rocks. The variation in permeability and porosity is not totally predicted by the other variables, but the regression is significant at the 5% significance level. © 1993.
Nariai, N; Kim, S; Imoto, S; Miyano, S
2004-01-01
We propose a statistical method to estimate gene networks from DNA microarray data and protein-protein interactions. Because physical interactions between proteins or multiprotein complexes are likely to regulate biological processes, using only mRNA expression data is not sufficient for estimating a gene network accurately. Our method adds knowledge about protein-protein interactions to the estimation method of gene networks under a Bayesian statistical framework. In the estimated gene network, a protein complex is modeled as a virtual node based on principal component analysis. We show the effectiveness of the proposed method through the analysis of Saccharomyces cerevisiae cell cycle data. The proposed method improves the accuracy of the estimated gene networks, and successfully identifies some biological facts.
New software for statistical analysis of Cambridge Structural Database data
Sykes, Richard A.; McCabe, Patrick; Allen, Frank H.; Battle, Gary M.; Bruno, Ian J.; Wood, Peter A.
2011-01-01
A collection of new software tools is presented for the analysis of geometrical, chemical and crystallographic data from the Cambridge Structural Database (CSD). This software supersedes the program Vista. The new functionality is integrated into the program Mercury in order to provide statistical, charting and plotting options alongside three-dimensional structural visualization and analysis. The integration also permits immediate access to other information about specific CSD entries through the Mercury framework, a common requirement in CSD data analyses. In addition, the new software includes a range of more advanced features focused towards structural analysis such as principal components analysis, cone-angle correction in hydrogen-bond analyses and the ability to deal with topological symmetry that may be exhibited in molecular search fragments. PMID:22477784
Hemmateenejad, Bahram; Akhond, Morteza; Miri, Ramin; Shamsipur, Mojtaba
2003-01-01
A QSAR algorithm, principal component-genetic algorithm-artificial neural network (PC-GA-ANN), has been applied to a set of newly synthesized calcium channel blockers, which are of special interest because of their role in cardiac diseases. A data set of 124 1,4-dihydropyridines bearing different ester substituents at the C-3 and C-5 positions of the dihydropyridine ring and nitroimidazolyl, phenylimidazolyl, and methylsulfonylimidazolyl groups at the C-4 position with known Ca(2+) channel binding affinities was employed in this study. Ten different sets of descriptors (837 descriptors) were calculated for each molecule. The principal component analysis was used to compress the descriptor groups into principal components. The most significant descriptors of each set were selected and used as input for the ANN. The genetic algorithm (GA) was used for the selection of the best set of extracted principal components. A feed forward artificial neural network with a back-propagation of error algorithm was used to process the nonlinear relationship between the selected principal components and biological activity of the dihydropyridines. A comparison between PC-GA-ANN and routine PC-ANN shows that the first model yields better prediction ability.
Salvatore, Stefania; Bramness, Jørgen G; Røislien, Jo
2016-07-12
Wastewater-based epidemiology (WBE) is a novel approach in drug use epidemiology which aims to monitor the extent of use of various drugs in a community. In this study, we investigate functional principal component analysis (FPCA) as a tool for analysing WBE data and compare it to traditional principal component analysis (PCA) and to wavelet principal component analysis (WPCA) which is more flexible temporally. We analysed temporal wastewater data from 42 European cities collected daily over one week in March 2013. The main temporal features of ecstasy (MDMA) were extracted using FPCA using both Fourier and B-spline basis functions with three different smoothing parameters, along with PCA and WPCA with different mother wavelets and shrinkage rules. The stability of FPCA was explored through bootstrapping and analysis of sensitivity to missing data. The first three principal components (PCs), functional principal components (FPCs) and wavelet principal components (WPCs) explained 87.5-99.6 % of the temporal variation between cities, depending on the choice of basis and smoothing. The extracted temporal features from PCA, FPCA and WPCA were consistent. FPCA using Fourier basis and common-optimal smoothing was the most stable and least sensitive to missing data. FPCA is a flexible and analytically tractable method for analysing temporal changes in wastewater data, and is robust to missing data. WPCA did not reveal any rapid temporal changes in the data not captured by FPCA. Overall the results suggest FPCA with Fourier basis functions and common-optimal smoothing parameter as the most accurate approach when analysing WBE data.
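One simple way to realize FPCA with a Fourier basis is sketched below: each city's weekly curve is projected onto a small Fourier basis by least squares and PCA is then run on the basis coefficients. With an (approximately) orthonormal basis on an equispaced grid this reduces to ordinary PCA of the coefficients, which is a simplification of full FPCA with smoothing penalties; the synthetic data and all names are illustrative, not the study's pipeline.

```python
import numpy as np

def fourier_basis(n_points, n_basis):
    """Fourier basis (constant, sines and cosines) evaluated on an equispaced grid."""
    t = np.linspace(0, 1, n_points, endpoint=False)
    cols = [np.ones(n_points)]
    k = 1
    while len(cols) < n_basis:
        cols.append(np.sin(2 * np.pi * k * t))
        if len(cols) < n_basis:
            cols.append(np.cos(2 * np.pi * k * t))
        k += 1
    return np.column_stack(cols)

def fpca(curves, n_basis=5, n_comp=3):
    """Functional PCA by least-squares projection onto a Fourier basis."""
    B = fourier_basis(curves.shape[1], n_basis)
    coef, *_ = np.linalg.lstsq(B, curves.T, rcond=None)   # basis coefficients per curve
    coef = coef.T - coef.T.mean(axis=0)
    U, s, Vt = np.linalg.svd(coef, full_matrices=False)
    scores = U[:, :n_comp] * s[:n_comp]
    harmonics = B @ Vt[:n_comp].T            # functional principal components on the grid
    var_explained = s[:n_comp]**2 / np.sum(s**2)
    return scores, harmonics, var_explained

# toy data: 42 "cities", 7 daily loads each
rng = np.random.default_rng(0)
days = np.arange(7)
curves = 1 + 0.5 * np.sin(2 * np.pi * days / 7) + 0.2 * rng.normal(size=(42, 7))
scores, harmonics, ve = fpca(curves)
print(ve)   # share of between-city temporal variation per component
```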
40 CFR 62.14505 - What are the principal components of this subpart?
Code of Federal Regulations, 2010 CFR
2010-07-01
Title 40 (Protection of Environment), Environmental Protection Agency, Section 62.14505, What are the principal components of this subpart? This subpart contains the eleven major components listed in paragraphs (a…
NASA Astrophysics Data System (ADS)
Liu, Ruiming; Liu, Erqi; Yang, Jie; Zeng, Yong; Wang, Fanglin; Cao, Yuan
2007-11-01
Fukunaga-Koontz transform (FKT), stemming from principal component analysis (PCA), is used in many pattern recognition and image-processing fields. It cannot capture the higher-order statistical properties of natural images, so its detection performance is unsatisfactory. PCA has been extended into kernel PCA in order to capture higher-order statistics; however, a kernel FKT (KFKT) had not previously been explicitly proposed, nor had its detection performance been studied. For accurately detecting potential small targets in infrared images, we first extend FKT into KFKT to capture the higher-order statistical properties of images. Then a framework based on Kalman prediction and KFKT, which can automatically detect and track small targets, is developed. Experimental results show that KFKT outperforms FKT and that the proposed framework is competent to automatically detect and track infrared point targets.
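For reference, the plain (linear) Fukunaga-Koontz transform that KFKT generalizes can be sketched as follows; this is not the kernel version or the Kalman tracking framework from the abstract, and the toy data and names are illustrative only.

```python
import numpy as np

def fkt(X_target, X_background, eps=1e-8):
    """Fukunaga-Koontz transform for a two-class (target vs. background) problem.

    X_* : (n_samples, n_features) arrays. Returns the whitening map P and the shared
    eigenvectors ordered from most target-dominant to most background-dominant.
    """
    R1 = X_target.T @ X_target / len(X_target)
    R2 = X_background.T @ X_background / len(X_background)
    lam, Phi = np.linalg.eigh(R1 + R2)
    P = Phi / np.sqrt(lam + eps)             # whitens the summed correlation matrix
    S1 = P.T @ R1 @ P                        # after whitening, S1 + S2 = I
    mu, V = np.linalg.eigh(S1)
    order = np.argsort(mu)[::-1]             # large eigenvalue => target-dominant direction
    return P, V[:, order], mu[order]

def target_score(x, P, V, k=3):
    """Energy of a sample in the k most target-dominant FKT directions."""
    y = V[:, :k].T @ (P.T @ x)
    return float(y @ y)

# toy example with 16-dimensional "image patches"
rng = np.random.default_rng(0)
target = rng.normal(size=(200, 16)) @ np.diag(np.linspace(2.0, 0.1, 16))
background = rng.normal(size=(200, 16))
P, V, mu = fkt(target, background)
print(target_score(target[0], P, V), target_score(background[0], P, V))
```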
Application of multivariable statistical techniques in plant-wide WWTP control strategies analysis.
Flores, X; Comas, J; Roda, I R; Jiménez, L; Gernaey, K V
2007-01-01
The main objective of this paper is to present the application of selected multivariable statistical techniques to plant-wide wastewater treatment plant (WWTP) control strategy analysis. In this study, cluster analysis (CA), principal component analysis/factor analysis (PCA/FA) and discriminant analysis (DA) are applied to the evaluation matrix data set obtained by simulation of several control strategies applied to the plant-wide IWA Benchmark Simulation Model No. 2 (BSM2). These techniques allow one to i) determine natural groups or clusters of control strategies with similar behaviour, ii) find and interpret hidden, complex and causal relation features in the data set and iii) identify important discriminant variables within the groups found by the cluster analysis. This study illustrates the usefulness of multivariable statistical techniques for both analysis and interpretation of complex multicriteria data sets and allows an improved use of information for effective evaluation of control strategies.
Chang, Cheng; Xu, Kaikun; Guo, Chaoping; Wang, Jinxia; Yan, Qi; Zhang, Jian; He, Fuchu; Zhu, Yunping
2018-05-22
Compared with the numerous software tools developed for identification and quantification of -omics data, there remains a lack of suitable tools for both downstream analysis and data visualization. To help researchers better understand the biological meanings in their -omics data, we present an easy-to-use tool, named PANDA-view, for both statistical analysis and visualization of quantitative proteomics data and other -omics data. PANDA-view contains various kinds of analysis methods such as normalization, missing value imputation, statistical tests, clustering and principal component analysis, as well as the most commonly-used data visualization methods including an interactive volcano plot. Additionally, it provides user-friendly interfaces for protein-peptide-spectrum representation of the quantitative proteomics data. PANDA-view is freely available at https://sourceforge.net/projects/panda-view/. 1987ccpacer@163.com and zhuyunping@gmail.com. Supplementary data are available at Bioinformatics online.
A Comparison of Analytical and Data Preprocessing Methods for Spectral Fingerprinting
LUTHRIA, DEVANAND L.; MUKHOPADHYAY, SUDARSAN; LIN, LONG-ZE; HARNLY, JAMES M.
2013-01-01
Spectral fingerprinting, as a method of discriminating between plant cultivars and growing treatments for a common set of broccoli samples, was compared for six analytical instruments. Spectra were acquired for finely powdered solid samples using Fourier transform infrared (FT-IR) and Fourier transform near-infrared (NIR) spectrometry. Spectra were also acquired for unfractionated aqueous methanol extracts of the powders using molecular absorption in the ultraviolet (UV) and visible (VIS) regions and mass spectrometry with negative (MS−) and positive (MS+) ionization. The spectra were analyzed using nested one-way analysis of variance (ANOVA) and principal component analysis (PCA) to statistically evaluate the quality of discrimination. All six methods showed statistically significant differences between the cultivars and treatments. The significance of the statistical tests was improved by the judicious selection of spectral regions (IR and NIR), masses (MS+ and MS−), and derivatives (IR, NIR, UV, and VIS). PMID:21352644
Baseline estimation in flame's spectra by using neural networks and robust statistics
NASA Astrophysics Data System (ADS)
Garces, Hugo; Arias, Luis; Rojas, Alejandro
2014-09-01
This work presents a baseline estimation method for flame spectra based on an artificial intelligence structure, a neural network, that combines robust statistics with multivariate analysis to automatically discriminate the measured wavelengths belonging to the continuous (baseline) feature for model adaptation, removing the restriction of having to measure the target baseline for training. The main contributions of this paper are: to analyze a flame spectra database by computing Jolliffe statistics from principal components analysis, detecting wavelengths that are not correlated with most of the measured data and therefore correspond to the baseline; to systematically determine the optimal number of neurons in the hidden layers based on Akaike's final prediction error; to estimate the baseline over the full wavelength range of the sampled spectra; and to train a neural network that generalizes the relation between measured and baseline spectra. The main application of our research is to compute total radiation with baseline information, allowing the combustion process state to be diagnosed for optimization in early stages.
Hierarchical Regularity in Multi-Basin Dynamics on Protein Landscapes
NASA Astrophysics Data System (ADS)
Matsunaga, Yasuhiro; Kostov, Konstatin S.; Komatsuzaki, Tamiki
2004-04-01
We analyze time series of potential energy fluctuations and principal components at several temperatures for two kinds of off-lattice 46-bead models that have two distinctive energy landscapes. The less-frustrated "funnel" energy landscape brings about stronger nonstationary behavior of the potential energy fluctuations at the folding temperature than the other, rather frustrated energy landscape at the collapse temperature. By combining principal component analysis with an embedding nonlinear time-series analysis, it is shown that the fast fluctuations with small amplitudes of 70-80% of the principal components cause the time series to become almost "random" in only 100 simulation steps. However, the stochastic feature of the principal components tends to be suppressed through a wide range of degrees of freedom at the transition temperature.
Foot anthropometry and morphology phenomena.
Agić, Ante; Nikolić, Vasilije; Mijović, Budimir
2006-12-01
Foot structure description is important for many reasons. Foot anthropometric and morphological phenomena are analyzed together with the underlying biomechanical functionality in order to fully characterize foot structure and function. For a younger Croatian population, the scattered data of the individual foot variables were interpolated by multivariate statistics. Foot structure descriptors are influenced by many factors, such as lifestyle, race and climate, and are of great importance in human society. The dominant descriptors are determined by principal component analysis. Some practical recommendations and conclusions for medical, sportswear and footwear practice are highlighted.
Quantitation of flavonoid constituents in citrus fruits.
Kawaii, S; Tomono, Y; Katase, E; Ogawa, K; Yano, M
1999-09-01
Twenty-four flavonoids have been determined in 66 Citrus species and near-citrus relatives, grown in the same field and year, by means of reversed-phase high-performance liquid chromatography analysis. Statistical methods have been applied to find relations among the species. The F ratios of 21 flavonoids obtained by ANOVA are significant, indicating that a classification of the species using these variables is reasonable to pursue. Principal component analysis revealed that the distributions of Citrus species belonging to different classes were largely in accordance with Tanaka's classification system.
The Different Paths in the Franchising Entrepreneurship Choice
NASA Astrophysics Data System (ADS)
Tomaras, Petros; Konstantopoulos, Nikolaos; Zondiros, Dimitris
2007-12-01
This study aims to examine the question: is the franchisees' choice of entrepreneurial start-up univocal or many-valued? Two variables are examined by registering the daily activities of entrepreneurial franchisees, as they appear in the answers given to a closed-ended questionnaire. We carried out a multivariate statistical analysis (principal component analysis) of survey data collected from franchisees of a Greece-based franchise system. The results of the research indicate that entrepreneurs arrive at the choice of franchising through different value standards.
Differentiation of aflatoxigenic and non-aflatoxigenic strains of Aspergilli by FT-IR spectroscopy.
Atkinson, Curtis; Pechanova, Olga; Sparks, Darrell L; Brown, Ashli; Rodriguez, Jose M
2014-01-01
Fourier transform infrared spectroscopy (FT-IR) is a well-established and widely accepted methodology to identify and differentiate diverse microbial species. In this study, FT-IR was used to differentiate 20 strains of ubiquitous and agronomically important phytopathogens of Aspergillus flavus and Aspergillus parasiticus. By analyzing their spectral profiles via principal component and cluster analysis, differentiation was achieved between the aflatoxin-producing and nonproducing strains of both fungal species. This study thus indicates that FT-IR coupled to multivariate statistics can rapidly differentiate strains of Aspergilli based on their toxigenicity.
Principals' Perceptions Regarding Their Supervision and Evaluation
ERIC Educational Resources Information Center
Hvidston, David J.; Range, Bret G.; McKim, Courtney Ann
2015-01-01
This study examined the perceptions of principals concerning principal evaluation and supervisory feedback. Principals were asked two open-ended questions. Respondents included 82 principals in the Rocky Mountain region. The emerging themes were "Superintendent Performance," "Principal Evaluation Components," "Specific…
Lesiak, Ashton D; Musah, Rabi A
2016-09-01
A continuing challenge in analytical chemistry is species-level determination of the constituents of mixtures that are made of a combination of plant species. There is an added urgency to identify components in botanical mixtures that have mind altering properties, due to the increasing global abuse of combinations of such plants. Here we demonstrate the proof of principle that ambient ionization mass spectrometry, namely direct analysis in real time-high resolution mass spectrometry (DART-HRMS), and statistical analysis tools can be used to rapidly determine the individual components within a psychoactive brew (Ayahuasca) made from a mixture of botanicals. Five plant species used in Ayahuasca preparations were subjected to DART-HRMS analysis. The chemical fingerprint of each was reproducible but unique, thus enabling discrimination between them. The presence of important biomarkers, including N,N-dimethyltryptamine, harmaline and harmine, was confirmed using in-source collision-induced dissociation (CID). Six Ayahuasca brews made from combinations of various plant species were shown to possess a high level of similarity, despite having been made from different constituents. Nevertheless, the application of principal component analysis (PCA) was useful in distinguishing between each of the brews based on the botanical species used in the preparations. From a training set based on 900 individual analyses, three principal components covered 86.38% of the variance, and the leave-one-out cross validation was 98.88%. This is the first report of ambient ionization MS being successfully used for determination of the individual components of plant mixtures. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Voukantsis, Dimitris; Karatzas, Kostas; Kukkonen, Jaakko; Räsänen, Teemu; Karppinen, Ari; Kolehmainen, Mikko
2011-03-01
In this paper we propose a methodology consisting of specific computational intelligence methods, i.e. principal component analysis and artificial neural networks, in order to inter-compare air quality and meteorological data, and to forecast the concentration levels for environmental parameters of interest (air pollutants). We apply these methods to data monitored in the urban areas of Thessaloniki and Helsinki, in Greece and Finland, respectively. For this purpose, we applied the principal component analysis method in order to inter-compare the patterns of air pollution in the two selected cities. Then, we proceeded with the development of air quality forecasting models for both studied areas. On this basis, we formulated and employed a novel hybrid scheme in the selection process of input variables for the forecasting models, involving a combination of linear regression and artificial neural network (multi-layer perceptron) models. The latter were used for the forecasting of the daily mean concentrations of PM₁₀ and PM₂.₅ for the next day. Results demonstrated an index of agreement between measured and modelled daily averaged PM₁₀ concentrations of between 0.80 and 0.85, while the kappa index for the forecasting of the daily averaged PM₁₀ concentrations reached 60% for both cities. Compared with previous corresponding studies, these statistical parameters indicate an improved performance of air quality parameter forecasting. It was also found that the performance of the models for the forecasting of the daily mean concentrations of PM₁₀ was not substantially different between the two cities, despite the major differences of the two urban environments under consideration. Copyright © 2011 Elsevier B.V. All rights reserved.
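A minimal sketch of the two stages described above (PCA for pattern inter-comparison and a multi-layer perceptron for next-day forecasting) is given below, using scikit-learn and synthetic data in place of the Thessaloniki and Helsinki records; the feature list is an assumption, and the index of agreement is computed with Willmott's standard formula.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
# toy daily records: [PM10, PM2.5, NO2, temperature, wind speed, humidity]
X = rng.normal(size=(730, 6))
y = 0.6 * X[:, 0] + 0.3 * X[:, 4] + 0.1 * rng.normal(size=730)   # next-day PM10 proxy

# 1) pattern inter-comparison: loadings of the leading components
pca = make_pipeline(StandardScaler(), PCA(n_components=3)).fit(X)
print(pca.named_steps['pca'].explained_variance_ratio_)

# 2) next-day forecasting with a multi-layer perceptron
train, test = slice(0, 600), slice(600, None)
mlp = make_pipeline(StandardScaler(),
                    MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0))
mlp.fit(X[train], y[train])
pred = mlp.predict(X[test])

# index of agreement (Willmott's d), one of the statistics quoted in the abstract
o = y[test]
d = 1 - np.sum((pred - o)**2) / np.sum((np.abs(pred - o.mean()) + np.abs(o - o.mean()))**2)
print(round(d, 3))
```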
Sato, Hiroshi; Ito, Toshiro; Kuroda, Yosuke; Uchiyama, Hiroki; Watanabe, Toshitaka; Yasuda, Naomi; Nakazawa, Junji; Harada, Ryo; Kawaharada, Nobuyoshi
2017-12-01
This study aimed to re-examine the conventional predictive factors for dissected aortic enlargement, such as the aortic and false lumen diameters, and to consider whether morphological elements of the dissected aorta could serve as predictors by quantifying the 'shape' of the true lumen with elliptic Fourier analysis. A total of 80 patients with uncomplicated type B aortic dissection were included. The patients were divided into an 'Enlargement group' and a 'No Change group'. The mean systolic blood pressure during follow-up, the maximum aortic and false lumen diameters, and the analysed morphological data were compared between the 2 groups using appropriate statistical methods. The maximum aortic and false lumen diameters were significantly larger in the Enlargement group than in the No Change group (39.3 vs 35.9 mm; P = 0.0058) (23.5 vs 18.2 mm; P = 0.000095). Principal component 1, a shape variable calculated by elliptic Fourier analysis, was significantly lower in the Enlargement group than in the No Change group (0.020 vs -0.072; P = 0.000049). A mean systolic blood pressure ≥130 mmHg, the aortic diameter, the false lumen diameter and principal component 1 were included as covariates in a Cox proportional hazards model to determine the significant predictive variables. Principal component 1 was the only variable significantly associated with aortic enlargement in the multivariate analysis (odds ratio = 0.32; P = 0.048). The analysed and calculated morphological data describing the shape of the true lumen can be more effective predictors of aortic enlargement in type B dissection than the conventional factors. © The Author 2017. Published by Oxford University Press on behalf of the European Association for Cardio-Thoracic Surgery. All rights reserved.
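A sketch of how elliptic Fourier descriptors of a lumen contour can be reduced to a single shape score with PCA is given below; it assumes the third-party pyefd package, uses synthetic contours, and is only an illustration of the general technique, not the authors' clinical pipeline.

```python
import numpy as np
from pyefd import elliptic_fourier_descriptors   # third-party package; assumed available
from sklearn.decomposition import PCA

def lumen_shape_features(contour, order=10):
    """Normalized elliptic Fourier coefficients of a closed true-lumen contour."""
    coeffs = elliptic_fourier_descriptors(contour, order=order, normalize=True)
    return coeffs.flatten()[3:]      # drop the coefficients fixed by the normalization

# toy contours: ellipses squeezed by varying amounts, standing in for true lumina
rng = np.random.default_rng(0)
theta = np.linspace(0, 2 * np.pi, 120, endpoint=False)
contours = [np.column_stack([(1 + 0.05 * rng.normal()) * np.cos(theta),
                             (0.5 + 0.3 * rng.random()) * np.sin(theta)])
            for _ in range(20)]
F = np.array([lumen_shape_features(c) for c in contours])

pc1 = PCA(n_components=1).fit_transform(F)[:, 0]   # analogous to 'principal component 1'
print(pc1[:5])
```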
Model based approach to UXO imaging using the time domain electromagnetic method
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lavely, E.M.
1999-04-01
Time domain electromagnetic (TDEM) sensors have emerged as a field-worthy technology for UXO detection in a variety of geological and environmental settings. This success has been achieved with commercial equipment that was not optimized for UXO detection and discrimination. The TDEM response displays a rich spatial and temporal behavior which is not currently utilized. Therefore, in this paper the author describes a research program for enhancing the effectiveness of the TDEM method for UXO detection and imaging. Fundamental research is required in at least three major areas: (a) model based imaging capability i.e. the forward and inverse problem, (b) detector modeling and instrument design, and (c) target recognition and discrimination algorithms. These research problems are coupled and demand a unified treatment. For example: (1) the inverse solution depends on solution of the forward problem and knowledge of the instrument response; (2) instrument design with improved diagnostic power requires forward and inverse modeling capability; and (3) improved target recognition algorithms (such as neural nets) must be trained with data collected from the new instrument and with synthetic data computed using the forward model. Further, the design of the appropriate input and output layers of the net will be informed by the results of the forward and inverse modeling. A more fully developed model of the TDEM response would enable the joint inversion of data collected from multiple sensors (e.g., TDEM sensors and magnetometers). Finally, the author suggests that a complementary approach to joint inversions is the statistical recombination of data using principal component analysis. The decomposition into principal components is useful since the first principal component contains those features that are most strongly correlated from image to image.
Quantifying Burnout among Emergency Medicine Professionals
Wilson, William; Raj, Jeffrey Pradeep; Narayan, Girish; Ghiya, Murtuza; Murty, Shakuntala; Joseph, Bobby
2017-01-01
Background: Burnout is a syndrome of serious emotional depletion and poor adaptation at work caused by prolonged occupational stress. It has three principal components, namely emotional exhaustion (EE), depersonalization (DP) and diminished feelings of personal accomplishment (PA). Thus, we aimed at measuring the degree of burnout in doctors and nurses working in the emergency medicine departments (EMDs) of 4 select tertiary care teaching hospitals in South India. Methods: A cross-sectional survey was conducted among EMD professionals using a 30-item standardized, pilot-tested questionnaire as well as the Maslach Burnout Inventory. Univariate and multivariate analyses were conducted using binary logistic regression models to identify predictors of burnout. Results: The total number of professionals interviewed was 105, of whom 71.5% were women and 51.4% were doctors. The majority (78.1%) belonged to the age group 20-30 years. The prevalence of moderate to severe burnout in the 3 principal components EE, DP and PA was 64.8%, 71.4% and 73.3%, respectively. After multivariate analysis, the risk factors [adjusted odds ratio (95% confidence intervals)] for DP included facing more criticism [3.57 (1.25, 10.19)], disturbed sleep [6.44 (1.45, 28.49)] and being short tempered [3.14 (1.09, 9.09)]. While there were no statistically significant risk factors for EE, being affected by mortality [2.35 (1.12, 3.94)] and fear of medication errors [3.61 (1.26, 10.37)] appeared to be significant predictors of PA. Conclusion: The degree of burnout among doctors and nurses is moderately high in all three principal components, and some of the predictors identified were criticism, disturbed sleep, a short-tempered nature, fear of committing errors and witnessing death in the EMD. PMID:29097859
Nguyen, Phuong H
2007-05-15
Principal component analysis is a powerful method for projecting the multidimensional conformational space of peptides or proteins onto lower-dimensional subspaces in which the main conformations are present, making it easier to reveal the structures of molecules from, e.g., molecular dynamics simulation trajectories. However, the identification of all conformational states is still difficult if the subspaces consist of more than two dimensions. This is mainly because the principal components are not independent of each other, and states in the subspaces cannot be visualized. In this work, we propose a simple and fast scheme that allows one to obtain all conformational states in the subspaces. The basic idea is that, instead of directly identifying the states in the subspace spanned by the principal components, we first transform this subspace into another subspace formed by components that are independent of one another. These independent components are obtained from the principal components by employing the independent component analysis method. Because of the independence between components, all states in this new subspace are defined as all possible combinations of the states obtained from each single independent component. This makes the conformational analysis much simpler. We test the performance of the method by analyzing the conformations of the glycine tripeptide and the alanine hexapeptide. The analyses show that our method is simple and quickly reveals all conformational states in the subspaces. The folding pathways between the identified states of the alanine hexapeptide are analyzed and discussed in some detail. 2007 Wiley-Liss, Inc.
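The PCA-then-ICA idea can be illustrated with scikit-learn on a toy trajectory; the median split used below to define per-component states is a crude stand-in for the author's state-identification step, and the synthetic data and names are purely illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

# toy "trajectory": 5000 frames of 12 internal coordinates driven by hidden two-state factors
rng = np.random.default_rng(0)
states = rng.integers(0, 2, size=(5000, 3))
frames = states @ rng.normal(size=(3, 12)) + 0.1 * rng.normal(size=(5000, 12))

# 1) PCA: keep the subspace holding most of the conformational variance
pcs = PCA(n_components=3).fit_transform(frames)

# 2) ICA on the PCA subspace: rotate to components that are statistically independent,
#    so each one can be partitioned into states on its own
ics = FastICA(n_components=3, random_state=0).fit_transform(pcs)

# 3) enumerate conformational states as combinations of per-component states
labels = (ics > np.median(ics, axis=0)).astype(int)          # crude 2-state split per IC
state_id = labels @ (2 ** np.arange(3))                      # up to 2**3 combined states
print(np.bincount(state_id))
```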
Liu, Hui-lin; Wan, Xia; Yang, Gong-huan
2013-02-01
To explore the relationship between the strength of tobacco control and the effectiveness of creating smoke-free hospitals, and to summarize the main factors that affect the creation of smoke-free hospitals. A total of 210 hospitals from 7 provinces/municipalities directly under the central government were enrolled in this study using a stratified random sampling method. Principal component analysis and regression analysis were conducted to analyze the strength of tobacco control and the effectiveness of creating smoke-free hospitals. Two principal components were extracted from the strength of tobacco control index, which respectively reflected the tobacco control policies and efforts, and the willingness and leadership of hospital managers regarding tobacco control. The regression analysis indicated that only the first principal component was significantly correlated with progress in creating smoke-free hospitals (P<0.001), i.e. hospitals with higher scores on the first principal component had better achievements in smoke-free environment creation. Tobacco control policies and efforts are critical in creating smoke-free hospitals. The principal component analysis provides a comprehensive and objective tool for evaluating the creation of smoke-free hospitals.
Critical Factors Explaining the Leadership Performance of High-Performing Principals
ERIC Educational Resources Information Center
Hutton, Disraeli M.
2018-01-01
The study explored critical factors that explain leadership performance of high-performing principals and examined the relationship between these factors based on the ratings of school constituents in the public school system. The principal component analysis with the use of Varimax Rotation revealed that four components explain 51.1% of the…
Molecular dynamics in principal component space.
Michielssens, Servaas; van Erp, Titus S; Kutzner, Carsten; Ceulemans, Arnout; de Groot, Bert L
2012-07-26
A molecular dynamics algorithm in principal component space is presented. It is demonstrated that sampling can be improved without changing the ensemble by assigning masses to the principal components proportional to the inverse square root of the eigenvalues. The setup of the simulation requires no prior knowledge of the system; a short initial MD simulation to extract the eigenvectors and eigenvalues suffices. Independent measures indicated a 6-7 times faster sampling compared to a regular molecular dynamics simulation.
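A rough numerical sketch of the mass assignment just described, using random numbers in place of a real equilibration trajectory; the coordinate dimensions and the proportionality constant are assumptions.

```python
# Sketch: diagonalize the covariance of a short trajectory and give each
# principal component a mass proportional to the inverse square root of its
# eigenvalue (arbitrary proportionality constant).
import numpy as np

rng = np.random.default_rng(2)
traj = rng.normal(size=(2000, 60))                  # frames x coordinates

cov = np.cov(traj, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)              # ascending order
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # sort descending

mass_scale = 1.0                                    # arbitrary constant
pc_masses = mass_scale / np.sqrt(eigvals)           # large-amplitude modes get small masses
```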
Optimized principal component analysis on coronagraphic images of the Fomalhaut system
DOE Office of Scientific and Technical Information (OSTI.GOV)
Meshkat, Tiffany; Kenworthy, Matthew A.; Quanz, Sascha P.
We present the results of a study to optimize the principal component analysis (PCA) algorithm for planet detection, a new algorithm complementing angular differential imaging and locally optimized combination of images (LOCI) for increasing the contrast achievable next to a bright star. The stellar point spread function (PSF) is constructed by removing linear combinations of principal components, allowing the flux from an extrasolar planet to shine through. The number of principal components used determines how well the stellar PSF is globally modeled. Using more principal components may decrease the number of speckles in the final image, but also increases the background noise. We apply PCA to Fomalhaut Very Large Telescope NaCo images acquired at 4.05 μm with an apodized phase plate. We do not detect any companions, with a model-dependent upper mass limit of 13-18 M_Jup from 4-10 AU. PCA achieves greater sensitivity than the LOCI algorithm for the Fomalhaut coronagraphic data by up to 1 mag. We make several adaptations to the PCA code and determine which of these prove the most effective at maximizing the signal-to-noise from a planet very close to its parent star. We demonstrate that optimizing the number of principal components used in PCA proves most effective for pulling out a planet signal.
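In the same spirit (though not the study's actual pipeline), the sketch below models a science frame's stellar PSF as a projection onto the first N principal components of a reference stack and subtracts it; the frame sizes, component count and images are random placeholders.

```python
# Hedged sketch of PCA-based PSF subtraction: project a science frame onto the
# first N principal components of a reference stack, rebuild the PSF model,
# and subtract it so a faint companion could survive in the residual.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
refs = rng.normal(size=(50, 64 * 64))       # 50 reference frames, flattened 64x64
science = rng.normal(size=64 * 64)

n_pc = 10                                   # number of principal components to keep
pca = PCA(n_components=n_pc).fit(refs)
coeffs = pca.transform(science.reshape(1, -1))
psf_model = pca.inverse_transform(coeffs).ravel()

residual = (science - psf_model).reshape(64, 64)   # planet signal would appear here
```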
Mjørud, Marit; Kirkevold, Marit; Røsvik, Janne; Engedal, Knut
2014-01-01
To investigate which factors the Quality of Life in Late-Stage Dementia (QUALID) scale holds when used among people with dementia (pwd) in nursing homes and to find out how the symptom load varies across the different severity levels of dementia. We included 661 pwd [mean age ± SD, 85.3 ± 8.6 years; 71.4% women]. The QUALID and the Clinical Dementia Rating (CDR) scale were applied. A principal component analysis (PCA) with varimax rotation and Kaiser normalization was applied to test the factor structure. Nonparametric analyses were applied to examine differences in symptom load across the three CDR groups. The mean QUALID score was 21.5 (±7.1), and the CDR scores of the three groups were 1 in 22.5%, 2 in 33.6% and 3 in 43.9%. The results of the statistical measures employed were the following: Cronbach's α of QUALID, 0.74; Bartlett's test of sphericity, p < 0.001; the Kaiser-Meyer-Olkin measure, 0.77. The PCA resulted in three components accounting for 53% of the variance. The first component was 'tension' ('facial expression of discomfort', 'appears physically uncomfortable', 'verbalization suggests discomfort', 'being irritable and aggressive', 'appears calm', Cronbach's α = 0.69), the second was 'well-being' ('smiles', 'enjoys eating', 'enjoys touching/being touched', 'enjoys social interaction', Cronbach's α = 0.62) and the third was 'sadness' ('appears sad', 'cries', 'facial expression of discomfort', Cronbach's α = 0.65). The mean score on the components 'tension' and 'well-being' increased significantly with increasing severity levels of dementia. Three components of quality of life (qol) were identified. Qol decreased with increasing severity of dementia. © 2013 S. Karger AG, Basel.
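A hedged sketch of the scale-analysis machinery mentioned above (varimax-rotated principal component loadings plus Cronbach's α), written against synthetic item responses; the item count, number of components and subscale split are illustrative assumptions, not the QUALID structure.

```python
# Illustrative sketch: PCA loadings, a standard varimax rotation, and
# Cronbach's alpha for an item subset, all on synthetic Likert-type data.
import numpy as np
from sklearn.decomposition import PCA

def varimax(loadings, n_iter=100, tol=1e-6):
    """Standard orthogonal varimax rotation of a loading matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    var_old = 0.0
    for _ in range(n_iter):
        rotated = loadings @ rotation
        u, s, vt = np.linalg.svd(
            loadings.T @ (rotated ** 3
                          - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p))
        rotation = u @ vt
        if s.sum() < var_old * (1 + tol):
            break
        var_old = s.sum()
    return loadings @ rotation

def cronbach_alpha(items):
    """Internal consistency of a set of items (rows = respondents)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    return k / (k - 1) * (1 - items.var(axis=0, ddof=1).sum()
                          / items.sum(axis=1).var(ddof=1))

rng = np.random.default_rng(4)
responses = rng.integers(1, 6, size=(661, 11)).astype(float)   # 11 hypothetical items

pca = PCA(n_components=3).fit(responses)
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)
rotated_loadings = varimax(loadings)                 # interpretable loading pattern
alpha_first_scale = cronbach_alpha(responses[:, :5]) # alpha of a hypothetical subscale
```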
Kaj, Mónika; Saint-Maurice, Pedro F; Karsai, István; Vass, Zoltán; Csányi, Tamás; Boronyai, Zoltán; Révész, László
2015-06-26
The purpose of this study was to create a physical education (PE) attitude scale and examine how it is associated with aerobic capacity (AC). Participants (n = 961, aged 15-20 years) were randomly selected from 26 Hungarian high schools. AC was estimated from performance on the Progressive Aerobic Cardiovascular and Endurance Run test, and the attitude scale had 31 items measured on a Likert scale that ranged from 1 to 5. Principal component analysis was used to examine the structure of the questionnaire, while several 2-way analyses of variance and multiple linear regression (MLR) were computed to examine trends in AC and to test the association between component scores obtained from the attitude scale and AC, respectively. Five components were identified: health orientation in PE (C1), avoid failure in PE (C2), success orientation in PE (C3), attitude toward physical activity (C4), and cooperation and social experience in PE (C5). There was a statistically significant main effect for sex on C3 (p < .05), C4 (p < .001), and C5 (p < .05), indicating that boys' scores were higher than girls'. The Sex × Age interaction was also statistically significant (p < .05), and follow-up comparisons revealed sex differences in C5 for 15-year-old participants. Girls showed statistically significantly higher values than boys on C5 at the age of 16 years. MLR results revealed that component scores were significantly associated with AC (p < .05). Statistically significant predictors included C1, C2, C3, and C4 for boys and C2 and C4 for girls. The 5-component scale seems to be suitable for measuring students' attitudes toward PE. The design of the study does not allow for direct associations between attitudes and AC but suggests that the two might be related.
Chemotypes of Pistacia atlantica leaf essential oils from Algeria.
Gourine, Nadhir; Bombarda, Isabelle; Yousfi, Mohamed; Gaydou, Emile M
2010-01-01
The essential oils obtained by hydrodistillation of Pistacia atlantica Desf. leaves collected from different regions of Algeria were analyzed by GC and GC-MS. The essential oil was rich in monoterpenes and oxygenated sesquiterpenes. The major components were alpha-pinene (0.0-67%), delta-3-carene (0.0-56%), spathulenol (0.5-22%), camphene (0.0-21%), terpinen-4-ol (0.0-16%) and beta-pinene (0.0-13%). Among the various components identified, twenty were used for statistical analyses. The result of principal component analysis (PCA) showed the occurrence of three chemotypes: a delta-3-carene chemotype (16.4-56.2%), a terpinen-4-ol chemotype (10.8-16.0%) and an alpha-pinene/camphene chemotype (10.9-66.6%/3.8-20.9%). It was found that the essential oil from female plants (delta-3-carene chemotype) could be easily differentiated from the two other chemotypes corresponding to male trees.
Functional data analysis of sleeping energy expenditure.
Lee, Jong Soo; Zakeri, Issa F; Butte, Nancy F
2017-01-01
Adequate sleep is crucial during childhood for metabolic health, and physical and cognitive development. Inadequate sleep can disrupt metabolic homeostasis and alter sleeping energy expenditure (SEE). Functional data analysis methods were applied to SEE data to elucidate the population structure of SEE and to discriminate SEE between obese and non-obese children. Minute-by-minute SEE in 109 children, ages 5-18, was measured in room respiration calorimeters. A smoothing spline method was applied to the calorimetric data to extract the true smoothing function for each subject. Functional principal component analysis was used to capture the important modes of variation of the functional data and to identify differences in SEE patterns. Combinations of functional principal component analysis and classifier algorithm were used to classify SEE. Smoothing effectively removed instrumentation noise inherent in the room calorimeter data, providing more accurate data for analysis of the dynamics of SEE. SEE exhibited declining but subtly undulating patterns throughout the night. Mean SEE was markedly higher in obese than non-obese children, as expected due to their greater body mass. SEE was higher among the obese than non-obese children (p<0.01); however, the weight-adjusted mean SEE was not statistically different (p>0.1, after post hoc testing). Functional principal component scores for the first two components explained 77.8% of the variance in SEE and also differed between groups (p = 0.037). Logistic regression, support vector machine or random forest classification methods were able to distinguish weight-adjusted SEE between obese and non-obese participants with good classification rates (62-64%). Our results implicate other factors, yet to be uncovered, that affect the weight-adjusted SEE of obese and non-obese children. Functional data analysis revealed differences in the structure of SEE between obese and non-obese children that may contribute to disruption of metabolic homeostasis.
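A rough sketch of the two-step functional approach (smoothing followed by functional PCA on the smoothed curves), using simulated minute-by-minute curves in place of calorimeter data; the night length, subject count and smoothing parameter are assumptions.

```python
# Sketch: smooth each subject's curve with a spline, then run PCA on the
# smoothed curves to obtain functional principal component scores.
import numpy as np
from scipy.interpolate import UnivariateSpline
from sklearn.decomposition import PCA

rng = np.random.default_rng(5)
minutes = np.arange(480)                                   # an 8-hour night
raw = (1.2 - 0.0005 * minutes) + rng.normal(0, 0.05, size=(109, 480))

smoothed = np.vstack([UnivariateSpline(minutes, y, s=1.0)(minutes) for y in raw])

fpca = PCA(n_components=2).fit(smoothed)
scores = fpca.transform(smoothed)                 # per-subject FPC scores
explained = fpca.explained_variance_ratio_.sum()  # variance captured by 2 components
```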
Imamura, Ryota; Murata, Naoki; Shimanouchi, Toshinori; Yamashita, Kaoru; Fukuzawa, Masayuki; Noda, Minoru
2017-07-15
A new fluorescent arrayed biosensor has been developed to discriminate the species and concentrations of target proteins by using several different phospholipid liposome species encapsulating fluorescent molecules, exploiting differences in the permeation of the fluorescent molecules through the membrane that modulate liposome-target protein interactions. This approach provides a fundamentally new label-free fluorescent sensor, in contrast to the common technique of fluorescent array sensors that require labeling. We confirmed a high fluorescence emission intensity, related to the concentration-dependent characteristics of the fluorescent molecules, when they leak from inside the liposomes through the perturbed lipid membrane. After taking an array image of the fluorescence emission from the sensor using a CMOS imager, the output intensities of the fluorescence were analyzed by the principal component analysis (PCA) statistical method. The PCA plots show that different protein species at several concentrations were successfully discriminated by using the different lipid membranes, with a high cumulative contribution ratio. We also confirmed that the accuracy of discrimination by the array sensor with a single shot is higher than that of a single sensor with multiple shots.
Zhang, Wanfeng; Zhu, Shukui; He, Sheng; Wang, Yanxin
2015-02-06
Using comprehensive two-dimensional gas chromatography coupled to time-of-flight mass spectrometry (GC×GC/TOFMS), volatile and semi-volatile organic compounds in crude oil samples from different reservoirs or regions were analyzed for the development of a molecular fingerprint database. Based on the GC×GC/TOFMS fingerprints of crude oils, principal component analysis (PCA) and cluster analysis were used to distinguish the oil sources and find biomarkers. As supervised information, the geological characteristics of the crude oils, including thermal maturity, sedimentary environment, etc., are assigned to the principal components. The results show that the tri-aromatic steroid (TAS) series are suitable marker compounds in crude oils for oil screening, and the relative abundances of individual TAS compounds correlate well with oil sources. To correct for the effects of external factors other than oil source, the variables were defined as content ratios of target compounds, and 13 parameters were proposed for the screening of oil sources. With the developed model, the crude oils were easily discriminated, and the result is in good agreement with the practical geological setting. Copyright © 2014 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Ogruc Ildiz, G.; Arslan, M.; Unsalan, O.; Araujo-Andrade, C.; Kurt, E.; Karatepe, H. T.; Yilmaz, A.; Yalcinkaya, O. B.; Herken, H.
2016-01-01
In this study, a methodology based on Fourier-transform infrared spectroscopy combined with principal component analysis and partial least squares methods is proposed for the analysis of blood plasma samples in order to identify spectral changes correlated with biomarkers associated with schizophrenia and bipolarity. Our main goal was to use the spectral information to calibrate statistical models to discriminate and classify blood plasma samples belonging to bipolar and schizophrenic patients. IR spectra were collected from 30 blood plasma samples from each group: bipolar patients, schizophrenic patients and healthy controls. The results obtained from principal component analysis (PCA) show a clear discrimination between the bipolar (BP), schizophrenic (SZ) and control group (CG) blood samples and also make it possible to identify three main spectral regions that show the major differences correlated with both mental disorders (biomarkers). Furthermore, a model for the classification of the blood samples was calibrated using partial least squares discriminant analysis (PLS-DA), allowing the correct classification of BP, SZ and CG samples. The results obtained by applying this methodology suggest that it can be used as a complementary diagnostic tool for the detection and discrimination of these mental diseases.
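The sketch below illustrates this kind of two-stage workflow on synthetic "spectra" (it is not the authors' calibration): PCA for an unsupervised look at group separation, then PLS-DA implemented as PLS regression on dummy-coded class labels; the sample sizes, component counts and group labels are assumptions.

```python
# Sketch: exploratory PCA followed by PLS-DA (PLS regression on one-hot labels)
# for three synthetic groups of spectra.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import PLSRegression
from sklearn.preprocessing import label_binarize

rng = np.random.default_rng(6)
spectra = rng.normal(size=(90, 400))                  # 90 samples x 400 wavenumbers
groups = np.repeat(["BP", "SZ", "CG"], 30)

scores = PCA(n_components=2).fit_transform(spectra)   # exploratory PCA scores

Y = label_binarize(groups, classes=["BP", "SZ", "CG"])
plsda = PLSRegression(n_components=5).fit(spectra, Y)
pred = plsda.predict(spectra).argmax(axis=1)          # winner-takes-all class assignment
accuracy = (np.array(["BP", "SZ", "CG"])[pred] == groups).mean()
```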
The fine-scale genetic structure and evolution of the Japanese population.
Takeuchi, Fumihiko; Katsuya, Tomohiro; Kimura, Ryosuke; Nabika, Toru; Isomura, Minoru; Ohkubo, Takayoshi; Tabara, Yasuharu; Yamamoto, Ken; Yokota, Mitsuhiro; Liu, Xuanyao; Saw, Woei-Yuh; Mamatyusupu, Dolikun; Yang, Wenjun; Xu, Shuhua; Teo, Yik-Ying; Kato, Norihiro
2017-01-01
The contemporary Japanese population largely consists of three genetically distinct groups-Hondo, Ryukyu and Ainu. By principal-component analysis, while the three groups can be clearly separated, the Hondo people, comprising 99% of the Japanese, form one almost indistinguishable cluster. To understand fine-scale genetic structure, we applied powerful haplotype-based statistical methods to genome-wide single nucleotide polymorphism data from 1600 Japanese individuals, sampled from eight distinct regions in Japan. We then combined the Japanese data with data from 26 other Asian populations to analyze shared ancestry and genetic differentiation. We found that the Japanese could be separated into nine genetic clusters in our dataset, showing a marked concordance with geography, and that the major components of the ancestry profile of the Japanese came from the Korean and Han Chinese clusters. We also detected and dated admixture in the Japanese. While genetic differentiation between Ryukyu and Hondo was suggested to be caused in part by positive selection, genetic differentiation among the Hondo clusters appeared to result principally from genetic drift. Notably, in Asians, we found the possibility that positive selection accentuated genetic differentiation among distant populations but attenuated genetic differentiation among close populations. These findings are significant for studies of human evolution and medical genetics.
NASA Technical Reports Server (NTRS)
Rampe, E. B.; Lanza, N. L.
2012-01-01
Orbital near-infrared (NIR) reflectance spectra of the martian surface from the OMEGA and CRISM instruments have identified a variety of phyllosilicates in Noachian terrains. The types of phyllosilicates present on Mars have important implications for the aqueous environments in which they formed, and, thus, for recognizing locales that may have been habitable. Current identifications of phyllosilicates from martian NIR data are based on the positions of spectral absorptions relative to laboratory data of well-characterized samples and from spectral ratios; however, some phyllosilicates can be difficult to distinguish from one another with these methods (i.e. illite vs. muscovite). Here we employ a multivariate statistical technique, principal component analysis (PCA), to differentiate between spectrally similar phyllosilicate minerals. PCA is commonly used in a variety of industries (pharmaceutical, agricultural, viticultural) to discriminate between samples. Previous work using PCA to analyze raw NIR reflectance data from mineral mixtures has shown that this is a viable technique for identifying mineral types, abundances, and particle sizes. Here, we evaluate PCA of second-derivative NIR reflectance data as a method for classifying phyllosilicates and test whether this method can be used to identify phyllosilicates on Mars.
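As a hedged illustration of the preprocessing route named above (not the authors' pipeline), the sketch computes Savitzky-Golay second derivatives of simulated reflectance spectra and then runs PCA on the derivative spectra; the wavelength grid, filter window and spectrum shapes are assumptions.

```python
# Sketch: second-derivative preprocessing of reflectance spectra, then PCA.
import numpy as np
from scipy.signal import savgol_filter
from sklearn.decomposition import PCA

rng = np.random.default_rng(7)
wavelengths = np.linspace(1.0, 2.5, 300)                      # microns
reflectance = 0.5 + 0.1 * np.sin(5 * wavelengths) + rng.normal(0, 0.01, (40, 300))

second_deriv = savgol_filter(reflectance, window_length=11, polyorder=3,
                             deriv=2, axis=1)
pc_scores = PCA(n_components=3).fit_transform(second_deriv)   # scores used to group spectra
```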
Quantification of intensity variations in functional MR images using rotated principal components
NASA Astrophysics Data System (ADS)
Backfrieder, W.; Baumgartner, R.; Sámal, M.; Moser, E.; Bergmann, H.
1996-08-01
In functional MRI (fMRI), the changes in cerebral haemodynamics related to stimulated neural brain activity are measured using standard clinical MR equipment. Small intensity variations in fMRI data have to be detected and distinguished from non-neural effects by careful image analysis. Based on multivariate statistics, we describe an algorithm involving oblique rotation of the most significant principal components for an estimation of the temporal and spatial distribution of the stimulated neural activity over the whole image matrix. This algorithm takes advantage of strong local signal variations. A mathematical phantom was designed to generate simulated data for the evaluation of the method. In simulation experiments, the potential of the method to quantify small intensity changes, especially when processing data sets containing multiple sources of signal variation, was demonstrated. In vivo fMRI data collected in both visual and motor stimulation experiments were analysed, showing a proper location of the activated cortical regions within well known neural centres and an accurate extraction of the activation time profile. The suggested method yields accurate absolute quantification of in vivo brain activity without the need for extensive prior knowledge and user interaction.
A Novel Acoustic Sensor Approach to Classify Seeds Based on Sound Absorption Spectra
Gasso-Tortajada, Vicent; Ward, Alastair J.; Mansur, Hasib; Brøchner, Torben; Sørensen, Claus G.; Green, Ole
2010-01-01
A non-destructive and novel in situ acoustic sensor approach based on sound absorption spectra was developed for identifying and classifying different seed types. The absorption coefficient spectra were determined by using the impedance tube measurement method. Subsequently, a multivariate statistical analysis, i.e., principal component analysis (PCA), was performed as a way to generate a classification of the seeds based on the soft independent modelling of class analogy (SIMCA) method. The results show that the sound absorption coefficient spectra of different seed types present characteristic patterns which are highly dependent on seed size and shape. In general, seed particle size and sphericity were inversely related to the absorption coefficient. PCA presented reliable grouping capabilities within the diverse seed types, since 95% of the total spectral variance was described by the first two principal components. Furthermore, the SIMCA classification model based on the absorption spectra achieved optimal results, as 100% of the evaluation samples were correctly classified. This study lays the groundwork for an innovative method that will present new possibilities in agriculture and industry for classifying and determining physical properties of seeds and other materials. PMID:22163455
Hirayama, Jun-ichiro; Hyvärinen, Aapo; Kiviniemi, Vesa; Kawanabe, Motoaki; Yamashita, Okito
2016-01-01
Characterizing the variability of resting-state functional brain connectivity across subjects and/or over time has recently attracted much attention. Principal component analysis (PCA) serves as a fundamental statistical technique for such analyses. However, performing PCA on high-dimensional connectivity matrices yields complicated “eigenconnectivity” patterns, for which systematic interpretation is a challenging issue. Here, we overcome this issue with a novel constrained PCA method for connectivity matrices by extending the idea of the previously proposed orthogonal connectivity factorization method. Our new method, modular connectivity factorization (MCF), explicitly introduces the modularity of brain networks as a parametric constraint on eigenconnectivity matrices. In particular, MCF analyzes the variability in both intra- and inter-module connectivities, simultaneously finding network modules in a principled, data-driven manner. The parametric constraint provides a compact module-based visualization scheme with which the result can be intuitively interpreted. We develop an optimization algorithm to solve the constrained PCA problem and validate our method in simulation studies and with a resting-state functional connectivity MRI dataset of 986 subjects. The results show that the proposed MCF method successfully reveals the underlying modular eigenconnectivity patterns in more general situations and is a promising alternative to existing methods. PMID:28002474
A SAR and QSAR study of new artemisinin compounds with antimalarial activity.
Santos, Cleydson Breno R; Vieira, Josinete B; Lobato, Cleison C; Hage-Melim, Lorane I S; Souto, Raimundo N P; Lima, Clarissa S; Costa, Elizabeth V M; Brasil, Davi S B; Macêdo, Williams Jorge C; Carvalho, José Carlos T
2013-12-30
The Hartree-Fock method and the 6-31G** basis set were employed to calculate the molecular properties of artemisinin and 20 derivatives with antimalarial activity. Maps of molecular electrostatic potential (MEPs) and molecular docking were used to investigate the interaction between ligands and the receptor (heme). Principal component analysis and hierarchical cluster analysis were employed to select the most important descriptors related to activity. The correlation between biological activity and molecular properties was obtained using the partial least squares and principal component regression methods. The regression PLS and PCR models built in this study were also used to predict the antimalarial activity of 30 new artemisinin compounds with unknown activity. The models obtained showed not only statistical significance but also predictive ability. The significant molecular descriptors related to the compounds with antimalarial activity were the hydration energy (HE), the charge on the O11 oxygen atom (QO11), the torsion angle O1-O2-Fe-N2 (D2) and the maximum rate of R/Sanderson Electronegativity (RTe+). These variables led to a physical and structural explanation of the molecular properties that should be selected for when designing new ligands to be used as antimalarial agents.
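A minimal sketch contrasting the two regression approaches used above, principal component regression (PCA followed by ordinary least squares on the scores) and partial least squares, on simulated descriptor and activity values; the descriptor count, component numbers and coefficients are invented for illustration.

```python
# Sketch: PCR (PCA + linear regression) versus PLS on simulated descriptors,
# then prediction for compounds with unknown activity.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.cross_decomposition import PLSRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(8)
descriptors = rng.normal(size=(21, 4))                 # e.g. HE, QO11, D2, RTe+ stand-ins
activity = descriptors @ np.array([0.8, -0.5, 0.3, 0.2]) + rng.normal(0, 0.1, 21)

pcr = make_pipeline(PCA(n_components=3), LinearRegression()).fit(descriptors, activity)
pls = PLSRegression(n_components=3).fit(descriptors, activity)

new_compounds = rng.normal(size=(30, 4))               # unknown-activity compounds
predicted_pcr = pcr.predict(new_compounds)
predicted_pls = pls.predict(new_compounds).ravel()
```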
Macro policy responses to oil booms and busts in the United Arab Emirates
DOE Office of Scientific and Technical Information (OSTI.GOV)
Al-Mutawa, A.K.
1991-01-01
The effects of oil shocks and macro policy changes in the United Arab Emirates are analyzed. A theoretical model is developed within the framework of the Dutch Disease literature. It contains four unique features that are applicable to the United Arab Emirates' economy. These are: (1) the presence of a large foreign labor force; (2) OPEC's oil export quotas; (3) the division of oil profits; and (4) the important role of government expenditures. The model is then used to examine the welfare effects of the above-mentioned shocks. An econometric model is then specified that conforms to the analytical model. In the econometric model, the method of principal components is applied owing to the undersized sample data. The principal components methodology is used in both the identification testing and the estimation of the structural equations. The oil and macro policy shocks are then simulated. The simulation results show that an oil-quantity boom leads to a higher welfare gain than an oil-price boom. Under certain circumstances, this finding is also confirmed by the comparative statics that follow from the analytical model.
Sand/cement ratio evaluation on mortar using neural networks and ultrasonic transmission inspection.
Molero, M; Segura, I; Izquierdo, M A G; Fuente, J V; Anaya, J J
2009-02-01
The quality and degradation state of building materials can be determined by nondestructive testing (NDT). These materials are composed of a cementitious matrix and particles or fragments of aggregates. The sand/cement ratio (s/c) determines the final material quality; however, the sand content can mask the matrix properties in a nondestructive measurement. Therefore, s/c ratio estimation is needed in the nondestructive characterization of cementitious materials. In this study, a methodology to classify the sand content in mortar is presented. The methodology is based on ultrasonic transmission inspection, data reduction and feature extraction by principal component analysis (PCA), and neural network classification. This evaluation is carried out with several mortar samples, which were made with different cement types and s/c ratios. The estimated s/c ratio is determined from ultrasonic spectral attenuation measured with three different broadband transducers (0.5, 1, and 2 MHz). PCA was applied to reduce the dimension of the captured traces. Feed-forward neural networks (NNs) are trained using principal components (PCs), and their outputs are used to display the estimated s/c ratios in false color images, showing the s/c ratio distribution of the mortar samples.
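The chain described above can be sketched roughly as follows, with simulated attenuation spectra standing in for the ultrasonic measurements: PCA compresses the traces and a small feed-forward network is trained on the principal component scores; the spectrum length, class labels and network size are assumptions.

```python
# Sketch: PCA feature reduction of attenuation spectra, then a feed-forward
# neural network classifier for the sand/cement ratio class.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(9)
spectra = rng.normal(size=(120, 200))              # simulated attenuation spectra
sc_ratio = rng.choice(["1:3", "1:4", "1:5"], size=120)

model = make_pipeline(PCA(n_components=8),
                      MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                                    random_state=0))
model.fit(spectra, sc_ratio)
predicted = model.predict(spectra)                 # estimated s/c class per sample
```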
How multi segmental patterns deviate in spastic diplegia from typical developed.
Zago, Matteo; Sforza, Chiarella; Bona, Alessia; Cimolin, Veronica; Costici, Pier Francesco; Condoluci, Claudia; Galli, Manuela
2017-10-01
The relationship between gait features and coordination in children with Cerebral Palsy has not been sufficiently analyzed yet. Principal Component Analysis can help in understanding motion patterns by decomposing movement into its fundamental components (Principal Movements). This study aims at quantitatively characterizing the functional connections between multi-joint gait patterns in Cerebral Palsy. 65 children with spastic diplegia aged 10.6 (SD 3.7) years participated in standardized gait analysis trials; 31 typically developing adolescents aged 13.6 (4.4) years were also tested. To determine whether posture affects gait patterns, patients were split into Crouch and knee Hyperextension groups according to the knee flexion angle in standing. 3D coordinates of hips, knees, ankles, metatarsal joints, pelvis and shoulders were submitted to Principal Component Analysis. Four Principal Movements accounted for 99% of global variance; components 1-3 explained major sagittal patterns, components 4-5 referred to movements in the frontal plane and component 6 to additional movement refinements. Dimensionality was higher in patients than in controls (p<0.01), and the Crouch group significantly differed from controls in the application of components 1 and 4-6 (p<0.05), while the knee Hyperextension group differed in components 1-2 and 5 (p<0.05). Compensatory strategies of children with Cerebral Palsy (interactions between main and secondary movement patterns) were objectively determined. Principal Movements can reduce the effort in interpreting gait reports, providing an immediate and quantitative picture of the connections between movement components. Copyright © 2017 Elsevier Ltd. All rights reserved.
Constrained Principal Component Analysis: Various Applications.
ERIC Educational Resources Information Center
Hunter, Michael; Takane, Yoshio
2002-01-01
Provides example applications of constrained principal component analysis (CPCA) that illustrate the method on a variety of contexts common to psychological research. Two new analyses, decompositions into finer components and fitting higher order structures, are presented, followed by an illustration of CPCA on contingency tables and the CPCA of…
Friedman, David B
2012-01-01
All quantitative proteomics experiments measure variation between samples. When performing large-scale experiments that involve multiple conditions or treatments, the experimental design should include the appropriate number of individual biological replicates from each condition to enable the distinction between a relevant biological signal from technical noise. Multivariate statistical analyses, such as principal component analysis (PCA), provide a global perspective on experimental variation, thereby enabling the assessment of whether the variation describes the expected biological signal or the unanticipated technical/biological noise inherent in the system. Examples will be shown from high-resolution multivariable DIGE experiments where PCA was instrumental in demonstrating biologically significant variation as well as sample outliers, fouled samples, and overriding technical variation that would not be readily observed using standard univariate tests.
Lonni, Audrey Alesandra Stinghen Garcia; Longhini, Renata; Lopes, Gisely Cristiny; de Mello, João Carlos Palazzo; Scarminio, Ieda Spacino
2012-03-16
Statistical design mixtures of water, methanol, acetone and ethanol were used to extract material from Trichilia catigua (Meliaceae) barks to study the effects of different solvents and their mixtures on its yield, total polyphenol content and antioxidant activity. The experimental results and their response surface models showed that quaternary mixtures with approximately equal proportions of all four solvents provided the highest yields, total polyphenol contents and antioxidant activities of the crude extracts, followed by ternary design mixtures. Principal component and hierarchical clustering analysis of the HPLC-DAD spectra of the chromatographic peaks of the 1:1:1:1 water-methanol-acetone-ethanol mixture extracts indicates the presence of cinchonains, gallic acid derivatives, natural polyphenols, flavonoids, catechins, and epicatechins. Copyright © 2011 Elsevier B.V. All rights reserved.
Lu, Yuan-Chiao; Untaroiu, Costin D
2013-09-01
During car collisions, the shoulder belt exposes the occupant's clavicle to large loads, which often lead to bone fracture. To better understand the geometric variability of clavicular cortical bone, which may influence its injury tolerance, twenty human clavicles were evaluated using statistical shape analysis. The interior and exterior clavicular cortical bone surfaces were reconstructed from CT-scan images. Registration between one selected template and the remaining 19 clavicle models was conducted to remove translation and rotation differences. The correspondences of landmarks between the models were then established using coordinates and surface normals. Three registration methods were compared: the LM-ICP method, the global method, and the SHREC method. The LM-ICP registration method showed better performance than the global and SHREC registration methods in terms of compactness, generalization, and specificity. The first four principal components obtained using the LM-ICP registration method account for 61% and 67% of the overall anatomical variation for the exterior and interior cortical bone shapes, respectively. Length was found to be the most significant variation mode of the human clavicle. The mean and two boundary shape models were created using the four most significant principal components to investigate the size and shape variation of clavicular cortical bone. In the future, boundary shape models could be used to develop probabilistic finite element models, which may help to better understand the variability in biomechanical responses and injuries to the clavicle. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
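A loose sketch of the shape-modelling step, assuming landmarks have already been registered and put in correspondence: each specimen's coordinates are flattened into a row, PCA extracts the main variation modes, and boundary shapes are generated by moving ±3 SD along a component; the landmark count and specimen data are simulated.

```python
# Sketch: statistical shape PCA on registered landmark coordinates and
# generation of +/- 3 SD boundary shapes along the first mode.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(13)
mean_shape = rng.normal(size=300)                        # 100 3-D landmarks, flattened
clavicles = mean_shape + rng.normal(0, 0.05, size=(20, 300))

pca = PCA(n_components=4).fit(clavicles)
explained = pca.explained_variance_ratio_.sum()          # variance covered by 4 modes

sd1 = np.sqrt(pca.explained_variance_[0])
plus_3sd = pca.mean_ + 3 * sd1 * pca.components_[0]      # boundary shape models
minus_3sd = pca.mean_ - 3 * sd1 * pca.components_[0]
```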
Transport in the Subtropical Lowermost Stratosphere during CRYSTAL-FACE
NASA Technical Reports Server (NTRS)
Pittman, Jasna V.; Weinstock, Elliot M.; Oglesby, Robert J.; Sayres, David S.; Smith, Jessica B.; Anderson, James G.; Cooper, Owen R.; Wofsy, Steven C.; Xueref, Irene; Gerbig, Cristoph;
2007-01-01
We use in situ measurements of water vapor (H2O), ozone (O3), carbon dioxide (CO2), carbon monoxide (CO), nitric oxide (NO), and total reactive nitrogen (NOy) obtained during the CRYSTAL-FACE campaign in July 2002 to study summertime transport in the subtropical lowermost stratosphere. We use an objective methodology to distinguish the latitudinal origin of the sampled air masses despite the influence of convection, and we calculate backward trajectories to elucidate their recent geographical history. The methodology consists of exploring the statistical behavior of the data by performing multivariate clustering and agglomerative hierarchical clustering calculations, and projecting cluster groups onto principal component space to identify air masses of like composition and hence presumed origin. The statistically derived cluster groups are then examined in physical space using tracer-tracer correlation plots. Interpretation of the principal component analysis suggests that the variability in the data is accounted for primarily by the mean age of air in the stratosphere, followed by the age of the convective influence, and lastly by the extent of convective influence, potentially related to the latitude of convective injection [Dessler and Sherwood, 2004]. We find that high-latitude stratospheric air is the dominant source region during the beginning of the campaign while tropical air is the dominant source region during the rest of the campaign. Influence of convection from both local and non-local events is frequently observed. The identification of air mass origin is confirmed with backward trajectories, and the behavior of the trajectories is associated with the North American monsoon circulation.
NASA Astrophysics Data System (ADS)
Hoell, Simon; Omenzetter, Piotr
2015-03-01
The development of large wind turbines that enable energy to be harvested more efficiently is a consequence of the increasing worldwide demand for renewables. To optimize the potential energy output, light and flexible wind turbine blades (WTBs) are designed. However, the higher flexibility and lower buckling capacity adversely affect the long-term safety and reliability of WTBs, and the increased operation and maintenance costs reduce the expected revenue. Effective structural health monitoring techniques can help to counteract this by limiting inspection efforts and avoiding unplanned maintenance actions. Vibration-based methods deserve particular attention due to their moderate instrumentation effort and applicability to in-service measurements. The present paper proposes the use of cross-correlations (CCs) of acceleration responses between sensors at different locations for structural damage detection in WTBs. CCs were previously applied successfully for damage detection in numerical and experimental beam structures using only single lags between the signals. The present approach uses vectors of CC coefficients for multiple lags between measurements of two selected sensors, taken from multiple possible combinations of sensors. To reduce the dimensionality of the damage sensitive feature (DSF) vectors, principal component analysis is performed. The optimal number of principal components (PCs) is chosen with respect to a statistical threshold. Finally, the detection phase uses the selected PCs of the healthy structure to calculate scores from a current DSF vector, and statistical hypothesis testing is performed to decide on the current structural state. The method is applied to laboratory experiments conducted on a small WTB with non-destructive damage scenarios.
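A simplified sketch of that detection chain, with synthetic signals only: cross-correlation feature vectors are built for a sensor pair, reduced by PCA fitted on "healthy" data, and a new measurement is flagged with a T²-style statistic on its scores against an F-based threshold; the lag range, sample sizes and significance level are assumptions.

```python
# Sketch: cross-correlation features -> PCA on healthy data -> T^2 test for a
# new measurement. All signals are random stand-ins for acceleration responses.
import numpy as np
from scipy.stats import f
from sklearn.decomposition import PCA

def cc_features(x, y, max_lag=20):
    """Cross-correlation coefficients of two signals for lags 0..max_lag."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    n = len(x)
    return np.array([np.dot(x[:n - k], y[k:]) / (n - k) for k in range(max_lag + 1)])

rng = np.random.default_rng(10)
healthy = np.array([cc_features(rng.normal(size=2048), rng.normal(size=2048))
                    for _ in range(100)])

pca = PCA(n_components=3).fit(healthy)
scores = pca.transform(healthy)
cov_inv = np.linalg.inv(np.cov(scores, rowvar=False))

new_feature = cc_features(rng.normal(size=2048), rng.normal(size=2048))
s = pca.transform(new_feature[None, :])
t2 = (s @ cov_inv @ s.T).item()

# F-based control limit for a future observation at 1% significance.
n, k = scores.shape
threshold = k * (n - 1) * (n + 1) / (n * (n - k)) * f.ppf(0.99, k, n - k)
damaged = t2 > threshold
```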
The linkage between geopotential height and monthly precipitation in Iran
NASA Astrophysics Data System (ADS)
Shirvani, Amin; Fadaei, Amir Sabetan; Landman, Willem A.
2018-04-01
This paper investigates the linkage between large-scale atmospheric circulation and monthly precipitation during November to April over Iran. Canonical correlation analysis (CCA) is used to set up the statistical linkage between the 850 hPa geopotential height large-scale circulation and monthly precipitation over Iran for the period 1968-2010. The monthly precipitation dataset for 50 synoptic stations distributed across different climate regions of Iran is taken as the response variable in the CCA. The monthly geopotential height reanalysis dataset over an area between 10° N and 60° N and from 20° E to 80° E is used as the explanatory variable in the CCA. Principal component analysis (PCA) is used as a pre-filter for data reduction for both the explanatory and response variables before applying CCA. The optimal number of principal components and canonical variables to be retained in the CCA equations is determined using the highest average cross-validated Kendall's tau value. The 850 hPa geopotential height pattern over the Red Sea, Saudi Arabia, and the Persian Gulf is found to be the major pattern related to Iranian monthly precipitation. The Pearson correlations between the area-averaged observed and predicted precipitation over the study area for January, February, March, April, November, and December are statistically significant at the 5% significance level and are 0.78, 0.80, 0.82, 0.74, 0.79, and 0.61, respectively. The relative operating characteristic (ROC) indicates that the highest scores for the above- and below-normal precipitation categories are found for February and April, respectively, and the lowest scores for December.
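A bare-bones sketch of this statistical chain on simulated fields and stations: PCA pre-filters both the geopotential height field and the station precipitation matrix, and CCA then links the two reduced sets; the grid size, numbers of retained components and year count are assumptions rather than the study's settings.

```python
# Sketch: PCA pre-filtering of predictor and predictand fields, then CCA.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(11)
z850 = rng.normal(size=(43, 500))        # years x grid points of 850 hPa height
precip = rng.normal(size=(43, 50))       # years x 50 synoptic stations

z_pcs = PCA(n_components=6).fit_transform(z850)      # pre-filtered predictors
p_pcs = PCA(n_components=4).fit_transform(precip)    # pre-filtered predictands

cca = CCA(n_components=2).fit(z_pcs, p_pcs)
u, v = cca.transform(z_pcs, p_pcs)                   # canonical variates
canonical_corr = [np.corrcoef(u[:, i], v[:, i])[0, 1] for i in range(2)]
```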
Quantifying Individual Brain Connectivity with Functional Principal Component Analysis for Networks.
Petersen, Alexander; Zhao, Jianyang; Carmichael, Owen; Müller, Hans-Georg
2016-09-01
In typical functional connectivity studies, connections between voxels or regions in the brain are represented as edges in a network. Networks for different subjects are constructed at a given graph density and are summarized by some network measure such as path length. Examining these summary measures for many density values yields samples of connectivity curves, one for each individual. This has led to the adoption of basic tools of functional data analysis, most commonly to compare control and disease groups through the average curves in each group. Such group differences, however, neglect the variability in the sample of connectivity curves. In this article, the use of functional principal component analysis (FPCA) is demonstrated to enrich functional connectivity studies by providing increased power and flexibility for statistical inference. Specifically, individual connectivity curves are related to individual characteristics such as age and measures of cognitive function, thus providing a tool to relate brain connectivity with these variables at the individual level. This individual level analysis opens a new perspective that goes beyond previous group level comparisons. Using a large data set of resting-state functional magnetic resonance imaging scans, relationships between connectivity and two measures of cognitive function-episodic memory and executive function-were investigated. The group-based approach was implemented by dichotomizing the continuous cognitive variable and testing for group differences, resulting in no statistically significant findings. To demonstrate the new approach, FPCA was implemented, followed by linear regression models with cognitive scores as responses, identifying significant associations of connectivity in the right middle temporal region with both cognitive scores.
Karain, Wael I
2017-11-28
Proteins undergo conformational transitions over different time scales. These transitions are closely intertwined with the protein's function. Numerous standard techniques such as principal component analysis are used to detect these transitions in molecular dynamics simulations. In this work, we add a new method that has the ability to detect transitions in dynamics based on the recurrences in the dynamical system. It combines bootstrapping and recurrence quantification analysis. We start from the assumption that a protein has a "baseline" recurrence structure over a given period of time. Any statistically significant deviation from this recurrence structure, as inferred from complexity measures provided by recurrence quantification analysis, is considered a transition in the dynamics of the protein. We apply this technique to a 132 ns long molecular dynamics simulation of the β-Lactamase Inhibitory Protein BLIP. We are able to detect conformational transitions in the nanosecond range in the recurrence dynamics of the BLIP protein during the simulation. The results compare favorably to those extracted using the principal component analysis technique. The recurrence quantification analysis based bootstrap technique is able to detect transitions between different dynamics states for a protein over different time scales. It is not limited to linear dynamics regimes, and can be generalized to any time scale. It also has the potential to be used to cluster frames in molecular dynamics trajectories according to the nature of their recurrence dynamics. One shortcoming of this method is the need for large enough time windows to ensure good statistical quality of the recurrence complexity measures needed to detect the transitions.
Sacks, Jason D; Ito, Kazuhiko; Wilson, William E; Neas, Lucas M
2012-10-01
With the advent of multicity studies, uniform statistical approaches have been developed to examine air pollution-mortality associations across cities. To assess the sensitivity of the air pollution-mortality association to different model specifications in a single and multipollutant context, the authors applied various regression models developed in previous multicity time-series studies of air pollution and mortality to data from Philadelphia, Pennsylvania (May 1992-September 1995). Single-pollutant analyses used daily cardiovascular mortality, fine particulate matter (particles with an aerodynamic diameter ≤2.5 µm; PM(2.5)), speciated PM(2.5), and gaseous pollutant data, while multipollutant analyses used source factors identified through principal component analysis. In single-pollutant analyses, risk estimates were relatively consistent across models for most PM(2.5) components and gaseous pollutants. However, risk estimates were inconsistent for ozone in all-year and warm-season analyses. Principal component analysis yielded factors with species associated with traffic, crustal material, residual oil, and coal. Risk estimates for these factors exhibited less sensitivity to alternative regression models compared with single-pollutant models. Factors associated with traffic and crustal material showed consistently positive associations in the warm season, while the coal combustion factor showed consistently positive associations in the cold season. Overall, mortality risk estimates examined using a source-oriented approach yielded more stable and precise risk estimates, compared with single-pollutant analyses.
Spiric, Aurelija; Trbovic, Dejana; Vranic, Danijela; Djinovic, Jasna; Petronijevic, Radivoj; Matekalo-Sverak, Vesna
2010-07-05
Studies of lipid extraction from animal and fish tissues generally do not provide information on the influence of the extraction procedure on the fatty acid composition of the extracted lipids or on the cholesterol content. Data presented in this paper indicate the impact of extraction procedures on the fatty acid profile of fish lipids extracted by the modified Soxhlet and ASE (accelerated solvent extraction) procedures. Cholesterol was also determined by a direct saponification method. Student's paired t-test used to compare the total fat content in the carp population obtained by the two extraction methods shows that the differences between the total fat contents determined by ASE and the modified Soxhlet method are not statistically significant. Values obtained by three different methods (direct saponification, ASE and modified Soxhlet), used for determination of the cholesterol content in carp, were compared by one-way analysis of variance (ANOVA). The results show that the modified Soxhlet method gives results which differ significantly from those obtained by direct saponification and ASE, whereas the results obtained by direct saponification and ASE do not differ significantly from each other. The highest cholesterol contents (37.65 to 65.44 mg/100 g) in the analyzed fish muscle were obtained by the direct saponification method, the less destructive one, followed by ASE (34.16 to 52.60 mg/100 g) and the modified Soxhlet extraction method (10.73 to 30.83 mg/100 g). The modified Soxhlet method for extraction of fish lipids gives higher values for n-6 fatty acids than the ASE method (t(paired) = 3.22, t(c) = 2.36), while there is no statistically significant difference in the n-3 content between the methods (t(paired) = 1.31). The UNSFA/SFA ratio obtained using the modified Soxhlet method is also higher than the ratio obtained using the ASE method (t(paired) = 4.88, t(c) = 2.36). Results of principal component analysis (PCA) showed that the highest positive impact on the second principal component (PC2) came from C18:3 n-3 and C20:3 n-6, which were present in higher amounts in the samples treated by the modified Soxhlet extraction, while C22:5 n-3, C20:3 n-3, C22:1, C20:4, C16 and C18 negatively influenced the PC2 score values, showing significantly increased levels in the samples treated by the ASE method. Hotelling's paired T-square test applied to the first three principal components to confirm differences in individual fatty acid contents obtained by the ASE and Soxhlet methods in carp muscle showed a statistically significant difference between the two data sets (T² = 161.308, p < 0.001). Copyright 2010 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Ginanjar, Irlandia; Pasaribu, Udjianna S.; Indratno, Sapto W.
2017-03-01
This article presents an application of the principal component analysis (PCA) biplot for data mining. It aims to simplify and objectify methods for clustering objects in a PCA biplot; the novelty of this paper is a measure that can be used to make such clustering objective. The orthonormal eigenvectors are the coefficients of the principal component model and represent an association between the principal components and the initial variables. This association is a valid basis for clustering objects according to their values on the principal axes; thus, if m principal axes are used in the PCA, the objects can be classified into 2^m clusters. Inter-city buses are clustered based on maintenance cost data using a two-axis PCA biplot. The buses fall into four groups. The first group comprises buses with high maintenance costs, especially for lube and brake canvas. The second group comprises buses with high maintenance costs, especially for tires and filters. The third group comprises buses with low maintenance costs, especially for lube and brake canvas. The fourth group comprises buses with low maintenance costs, especially for tires and filters.
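A small sketch of that clustering rule on invented cost data: with m = 2 principal axes, each object falls into one of 2^m = 4 clusters according to the signs of its scores on the two axes; the number of buses and cost items is arbitrary.

```python
# Sketch: cluster objects by the sign pattern of their first two PCA scores,
# giving 2**m clusters for m retained principal axes.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(12)
costs = rng.gamma(shape=2.0, scale=100.0, size=(40, 4))   # buses x cost items

scores = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(costs))
signs = scores > 0                                        # sign on each principal axis
cluster_id = signs[:, 0].astype(int) * 2 + signs[:, 1].astype(int)  # labels 0..3
```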
Psychometric properties of the Generalized Anxiety Disorder Inventory in a Canadian sample.
Henderson, Leigh C; Antony, Martin M; Koerner, Naomi
2014-05-01
The Generalized Anxiety Disorder Inventory is a recently developed self-report measure that assesses symptoms of generalized anxiety disorder. Its psychometric properties have not been investigated further since its original development. The current study investigated its psychometric properties in a Canadian student/community sample. Exploratory principal component analysis replicated the original three-component structure. The total scale and subscales demonstrated excellent internal consistency reliability (α = 0.84-0.94) and correlated strongly with the Penn State Worry Questionnaire (r = 0.41-0.74, all ps <0.001) and Generalized Anxiety Disorder-7 (r = 0.55-0.84, all ps <0.001). However, only the total scale and cognitive subscale (r = 0.48-0.49, all ps <0.05) significantly predicted generalized anxiety disorder diagnosis established by diagnostic interview. The somatic subscale in particular may require revision to improve predictive validity. Revision may also be necessary given changes in required somatic symptoms for generalized anxiety disorder diagnostic criteria in more recent versions of the Diagnostic and Statistical Manual of Mental Disorders (i.e. although major changes occurred from Diagnostic and Statistical Manual of Mental Disorders-III-R to Diagnostic and Statistical Manual of Mental Disorders-IV, changes in Diagnostic and Statistical Manual of Mental Disorders-5 were minimal) and the possibility of changes in the upcoming 11th revision of the International Classification of Diseases.
Kakio, Tomoko; Nagase, Hitomi; Takaoka, Takashi; Yoshida, Naoko; Hirakawa, Junichi; Macha, Susan; Hiroshima, Takashi; Ikeda, Yukihiro; Tsuboi, Hirohito; Kimura, Kazuko
2018-06-01
The World Health Organization has warned that substandard and falsified medical products (SFs) can harm patients and fail to treat the diseases for which they were intended, and that they affect every region of the world, leading to loss of confidence in medicines, health-care providers, and health systems. Therefore, the development of analytical procedures to detect SFs is extremely important. In this study, we investigated the quality of pharmaceutical tablets containing the antihypertensive candesartan cilexetil, collected in China, Indonesia, Japan, and Myanmar, using the Japanese pharmacopeial analytical procedures for quality control together with principal component analysis (PCA) of Raman spectra obtained with a handheld Raman spectrometer. Some samples showed delayed dissolution and failed to meet the pharmacopeial specification, whereas others failed the assay test. These products appeared to be substandard. Principal component analysis showed that all Raman spectra could be explained in terms of two components: the amount of the active pharmaceutical ingredient and the kinds of excipients. The PCA score plot indicated that one substandard product and the falsified tablets have similar principal components in their Raman spectra, in contrast to the authentic products. The locations of samples within the PCA score plot varied according to the source country, suggesting that manufacturers in different countries use different excipients. Our results indicate that the handheld Raman device will be useful for detection of SFs in the field. Principal component analysis of the Raman data clarifies the difference in chemical properties between good quality products and the SFs that circulate in the Asian market.
Principal component analysis and the locus of the Fréchet mean in the space of phylogenetic trees.
Nye, Tom M W; Tang, Xiaoxian; Weyenberg, Grady; Yoshida, Ruriko
2017-12-01
Evolutionary relationships are represented by phylogenetic trees, and a phylogenetic analysis of gene sequences typically produces a collection of these trees, one for each gene in the analysis. Analysis of samples of trees is difficult due to the multi-dimensionality of the space of possible trees. In Euclidean spaces, principal component analysis is a popular method of reducing high-dimensional data to a low-dimensional representation that preserves much of the sample's structure. However, the space of all phylogenetic trees on a fixed set of species does not form a Euclidean vector space, and methods adapted to tree space are needed. Previous work introduced the notion of a principal geodesic in this space, analogous to the first principal component. Here we propose a geometric object for tree space similar to the kth principal component in Euclidean space: the locus of the weighted Fréchet mean of k + 1 vertex trees when the weights vary over the k-simplex. We establish some basic properties of these objects, in particular showing that they have dimension k, and propose algorithms for projection onto these surfaces and for finding the principal locus associated with a sample of trees. Simulation studies demonstrate that these algorithms perform well, and analyses of two datasets, containing Apicomplexa and African coelacanth genomes respectively, reveal important structure from the second principal components.
An Independent Filter for Gene Set Testing Based on Spectral Enrichment.
Frost, H Robert; Li, Zhigang; Asselbergs, Folkert W; Moore, Jason H
2015-01-01
Gene set testing has become an indispensable tool for the analysis of high-dimensional genomic data. An important motivation for testing gene sets, rather than individual genomic variables, is to improve statistical power by reducing the number of tested hypotheses. Given the dramatic growth in common gene set collections, however, testing is often performed with nearly as many gene sets as underlying genomic variables. To address the challenge to statistical power posed by large gene set collections, we have developed spectral gene set filtering (SGSF), a novel technique for independent filtering of gene set collections prior to gene set testing. The SGSF method uses as a filter statistic the p-value measuring the statistical significance of the association between each gene set and the sample principal components (PCs), taking into account the significance of the associated eigenvalues. Because this filter statistic is independent of standard gene set test statistics under the null hypothesis but dependent under the alternative, the proportion of enriched gene sets is increased without impacting the type I error rate. As shown using simulated and real gene expression data, the SGSF algorithm accurately filters gene sets unrelated to the experimental outcome resulting in significantly increased gene set testing power.
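As a rough illustration of the filtering idea (not the published SGSF algorithm, which also weights by the significance of the associated eigenvalues), the sketch below scores each gene set by how strongly its genes load on the first sample principal component and derives a permutation p-value; `X` and `gene_sets` are hypothetical inputs.

```python
import numpy as np
from sklearn.decomposition import PCA

def spectral_filter_pvalues(X, gene_sets, n_perm=1000, seed=0):
    """Crude spectral filter: permutation p-value of each set's mean squared loading on PC1."""
    rng = np.random.default_rng(seed)
    loadings = PCA(n_components=1).fit(X).components_[0] ** 2  # squared gene loadings on PC1
    pvals = {}
    for name, idx in gene_sets.items():
        observed = loadings[idx].mean()
        null = np.array([
            loadings[rng.choice(X.shape[1], size=len(idx), replace=False)].mean()
            for _ in range(n_perm)
        ])
        pvals[name] = (1 + (null >= observed).sum()) / (n_perm + 1)
    return pvals

# Hypothetical usage: X is samples x genes, gene_sets maps set names to gene column indices
# pvals = spectral_filter_pvalues(X, {"set_A": [0, 5, 9], "set_B": [2, 3, 7]})
```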
Meyer, Karin; Kirkpatrick, Mark
2005-01-01
Principal component analysis is a widely used 'dimension reduction' technique, albeit generally at a phenotypic level. It is shown that we can estimate genetic principal components directly through a simple reparameterisation of the usual linear, mixed model. This is applicable to any analysis fitting multiple, correlated genetic effects, whether effects for individual traits or sets of random regression coefficients to model trajectories. Depending on the magnitude of genetic correlation, a subset of the principal components generally suffices to capture the bulk of genetic variation. Corresponding estimates of genetic covariance matrices are more parsimonious, have reduced rank and are smoothed, with the number of parameters required to model the dispersion structure reduced from k(k + 1)/2 to m(2k - m + 1)/2 for k effects and m principal components. Estimation of these parameters, the largest eigenvalues and pertaining eigenvectors of the genetic covariance matrix, via restricted maximum likelihood using derivatives of the likelihood, is described. It is shown that reduced rank estimation can reduce computational requirements of multivariate analyses substantially. An application to the analysis of eight traits recorded via live ultrasound scanning of beef cattle is given. PMID:15588566
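The parameter savings quoted above can be checked with a few lines of arithmetic; the sketch below simply evaluates k(k + 1)/2 and m(2k - m + 1)/2, using k = 8 (as in the beef cattle application) purely for illustration.

```python
def full_rank_params(k):
    # parameters of an unstructured k x k genetic covariance matrix
    return k * (k + 1) // 2

def reduced_rank_params(k, m):
    # parameters when only m leading principal components are fitted
    return m * (2 * k - m + 1) // 2

k = 8
for m in (1, 2, 4, 8):
    print(f"m={m}: {reduced_rank_params(k, m)} parameters vs full rank {full_rank_params(k)}")
```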
Morin, R.H.
1997-01-01
Returns from drilling in unconsolidated cobble and sand aquifers commonly do not identify lithologic changes that may be meaningful for hydrogeologic investigations. Vertical resolution of saturated, Quaternary, coarse braided-stream deposits is significantly improved by interpreting natural gamma (G), epithermal neutron (N), and electromagnetically induced resistivity (IR) logs obtained from wells at the Capital Station site in Boise, Idaho. Interpretation of these geophysical logs is simplified because these sediments are derived largely from high-gamma-producing source rocks (granitics of the Boise River drainage), contain few clays, and have undergone little diagenesis. Analysis of G, N, and IR data from these deposits with principal components analysis provides an objective means to determine if units can be recognized within the braided-stream deposits. In particular, performing principal components analysis on G, N, and IR data from eight wells at Capital Station (1) allows the system dimensionality to be reduced from three variables to two by selecting the two eigenvectors with the greatest variance as axes for principal component scatterplots, (2) generates principal components with interpretable physical meanings, (3) distinguishes sand from cobble-dominated units, and (4) provides a means to distinguish between cobble-dominated units.
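A minimal sketch of that two-eigenvector reduction, assuming the three logs are stored as columns of a hypothetical file `gnir_logs.csv` (one row per depth sample); it is not the original analysis.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Assumed file: columns = (gamma, neutron, induced resistivity), rows = depth samples
logs = np.loadtxt("gnir_logs.csv", delimiter=",")

pca = PCA(n_components=2)
scores = pca.fit_transform(StandardScaler().fit_transform(logs))
print(pca.explained_variance_ratio_)  # variance carried by the two retained eigenvectors

# Scatterplot in the reduced space; sand vs. cobble-dominated units should separate here
plt.scatter(scores[:, 0], scores[:, 1], s=5)
plt.xlabel("PC1"); plt.ylabel("PC2")
plt.show()
```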
Analysis and Evaluation of the Characteristic Taste Components in Portobello Mushroom.
Wang, Jinbin; Li, Wen; Li, Zhengpeng; Wu, Wenhui; Tang, Xueming
2018-05-10
To identify the characteristic taste components of the common cultivated brown (Portobello) mushroom, Agaricus bisporus, taste components in the stipe and pileus of Portobello mushrooms harvested at different growth stages were extracted and identified, and principal component analysis (PCA) and taste active value (TAV) were used to reveal the characteristic taste components during each growth stage. In the stipe and pileus, 20 and 14 different principal taste components were identified, respectively; these were considered the principal taste components of Portobello mushroom fruit bodies and included most amino acids and 5'-nucleotides. Some taste components found at high levels, such as lactic acid and citric acid, were not identified as principal taste components through PCA; owing to their high content, however, Portobello mushroom could still be used as a source of organic acids. The PCA and TAV results revealed that 5'-GMP, glutamic acid, malic acid, alanine, proline, leucine, and aspartic acid were the characteristic taste components of Portobello mushroom fruit bodies. Portobello mushroom was also found to be rich in protein and amino acids, so it might also be useful in the formulation of nutraceuticals and functional food. These results provide a theoretical basis for understanding and regulating the synthesis of the characteristic flavor components of Portobello mushroom. © 2018 Institute of Food Technologists®.
NASA Astrophysics Data System (ADS)
Kistenev, Yu. V.; Shapovalov, A. V.; Borisov, A. V.; Vrazhnov, D. A.; Nikolaev, V. V.; Nikiforova, O. Y.
2015-12-01
The results of numerical simulation of the application of principal component analysis to absorption spectra of the breath air of patients with pulmonary diseases are presented. Various methods of experimental data preprocessing are analyzed.
Trends and interannual variability of mass and steric sea level in the Tropical Asian Seas
NASA Astrophysics Data System (ADS)
Kleinherenbrink, Marcel; Riva, Riccardo; Frederikse, Thomas; Merrifield, Mark; Wada, Yoshihide
2017-08-01
The mass and steric components of sea level changes have been separated in the Tropical Asian Seas (TAS) using a statistically optimal combination of Jason satellite altimetry, GRACE satellite gravimetry, and ocean reanalyses. Using observational uncertainties, statistically optimally weighted time series for both components have been obtained in four regions within the TAS over the period January 2005 to December 2012. The mass and steric sea level variability is regressed against the first two principal components (PC1 and PC2) of Pacific equatorial wind stress and against the Dipole Mode Index (DMI). Sea level in the South China Sea is not affected by any of the indices. Steric variability in the TAS is largest in the deep Banda and Celebes seas and is affected by both PCs and the DMI. Mass variability is largest on the continental shelves and is primarily controlled by PC1. We argue that a water flux from the Western Tropical Pacific Ocean is the cause of mass variability in the TAS. The steric trends are about 2 mm yr-1 larger than the mass trends in the TAS. A significant part of the mass trend can be explained by the aforementioned indices and the nodal cycle. Trends obtained from fingerprints of mass redistribution are statistically equal to mass trends after subtracting the nodal cycle and the indices. Ultimately, the effect of omitting the TAS in global sea level budgets is estimated to be 0.3 mm yr-1.
Big-Data RHEED analysis for understanding epitaxial film growth processes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vasudevan, Rama K; Tselev, Alexander; Baddorf, Arthur P
Reflection high energy electron diffraction (RHEED) has by now become a standard tool for in-situ monitoring of film growth by pulsed laser deposition and molecular beam epitaxy. Yet despite the widespread adoption and wealth of information in RHEED images, most applications are limited to observing intensity oscillations of the specular spot, and much additional information on growth is discarded. With ease of data acquisition and increased computation speeds, statistical methods to rapidly mine the dataset are now feasible. Here, we develop such an approach to the analysis of the fundamental growth processes through multivariate statistical analysis of RHEED image sequences. This approach is illustrated for growth of LaxCa1-xMnO3 films grown on etched (001) SrTiO3 substrates, but is universal. The multivariate methods including principal component analysis and k-means clustering provide insight into the relevant behaviors, the timing and nature of a disordered to ordered growth change, and highlight statistically significant patterns. Fourier analysis yields the harmonic components of the signal and allows separation of the relevant components and baselines, isolating the asymmetric nature of the step density function and the transmission spots from the imperfect layer-by-layer (LBL) growth. These studies show the promise of big data approaches to obtaining more insight into film properties during and after epitaxial film growth. Furthermore, these studies open the pathway to use forward prediction methods to potentially allow significantly more control over the growth process and hence final film quality.
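A minimal sketch of this kind of multivariate treatment of an image series is shown below, assuming a hypothetical stack `rheed_frames.npy` (frames × pixels); it illustrates the PCA/k-means step only, not the authors' full pipeline.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

frames = np.load("rheed_frames.npy")              # assumed stack, shape (n_frames, ny, nx)
X = frames.reshape(len(frames), -1).astype(float)  # flatten each frame into a row vector

pca = PCA(n_components=10)
scores = pca.fit_transform(X)                      # temporal loadings of the leading spatial components
print(pca.explained_variance_ratio_.cumsum())

# Cluster the frames in the reduced space, e.g. to separate a disordered stage
# from ordered layer-by-layer growth
stage = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scores)
```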
Cole, Jacqueline M.; Cheng, Xie; Payne, Michael C.
2016-10-18
The use of principal component analysis (PCA) to statistically infer features of local structure from experimental pair distribution function (PDF) data is assessed on a case study of rare-earth phosphate glasses (REPGs). Such glasses, co-doped with two rare-earth ions (R and R’) of different sizes and optical properties, are of interest to the laser industry. The determination of structure-property relationships in these materials is an important aspect of their technological development. Yet, realizing the local structure of co-doped REPGs presents significant challenges relative to their singly-doped counterparts; specifically, R and R’ are difficult to distinguish in terms of establishing relative material compositions, identifying atomic pairwise correlation profiles in a PDF that are associated with each ion, and resolving peak overlap of such profiles in PDFs. This study demonstrates that PCA can be employed to help overcome these structural complications, by statistically inferring trends in PDFs that exist for a restricted set of experimental data on REPGs, and using these as training data to predict material compositions and PDF profiles in unknown co-doped REPGs. The application of these PCA methods to resolve individual atomic pairwise correlations in t(r) signatures is also presented. The training methods developed for these structural predictions are pre-validated by testing their ability to reproduce known physical phenomena, such as the lanthanide contraction, on PDF signatures of the structurally simpler singly-doped REPGs. The intrinsic limitations of applying PCA to analyze PDFs relative to the quality control of source data, data processing, and sample definition, are also considered. Furthermore, while this case study is limited to lanthanide-doped REPGs, this type of statistical inference may easily be extended to other inorganic solid-state materials, and be exploited in large-scale data-mining efforts that probe many t(r) functions.
Butaciu, Sinziana; Senila, Marin; Sarbu, Costel; Ponta, Michaela; Tanaselia, Claudiu; Cadar, Oana; Roman, Marius; Radu, Emil; Sima, Mihaela; Frentiu, Tiberiu
2017-04-01
The study proposes a combined model based on diagrams (Gibbs, Piper, Stuyfzand Hydrogeochemical Classification System) and unsupervised statistical approaches (Cluster Analysis, Principal Component Analysis, Fuzzy Principal Component Analysis, Fuzzy Hierarchical Cross-Clustering) to describe natural enrichment of inorganic arsenic and co-occurring species in groundwater in the Banat Plain, southwestern Romania. Speciation of inorganic As (arsenite, arsenate) and determinations of ion concentrations (Na+, K+, Ca2+, Mg2+, HCO3-, Cl-, F-, SO42-, PO43-, NO3-), pH, redox potential, conductivity and total dissolved substances were performed. Classical diagrams provided the hydrochemical characterization, while statistical approaches were helpful to establish (i) the natural mechanism of occurrence of As and F- species and the anthropogenic one for NO3-, SO42-, PO43- and K+, and (ii) classification of groundwater based on the content of arsenic species. The HCO3- type of local groundwater and alkaline pH (8.31-8.49) were found to be responsible for the enrichment of arsenic species and the occurrence of F-, but by different paths. The PO43-/AsO43- ion exchange and water-rock interaction (silicate hydrolysis and desorption from clay) were associated with arsenate enrichment in the oxidizing aquifer. Fuzzy Hierarchical Cross-Clustering was the strongest tool for the rapid simultaneous classification of groundwaters as a function of arsenic content and hydrogeochemical characteristics. The approach indicated the Na+/F-/pH cluster as a marker for groundwater with naturally elevated As and highlighted which parameters need to be monitored. A chemical conceptual model illustrating the natural and anthropogenic paths and enrichment of As and co-occurring species in the local groundwater, supported by mineralogical analysis of rocks, was established. Copyright © 2016 Elsevier Ltd. All rights reserved.
Gharekhan, Anita H; Arora, Siddharth; Oza, Ashok N; Sureshkumar, Mundan B; Pradhan, Asima; Panigrahi, Prasanta K
2011-08-01
Using the multiresolution ability of wavelets and the effectiveness of singular value decomposition (SVD) in identifying statistically robust parameters, we find a number of local and global features of human breast tissue fluorescence, capturing spectral correlations in the co- and cross-polarized channels at different scales. The copolarized component, being sensitive to intrinsic fluorescence, shows different behavior for normal, benign, and cancerous tissues in the emission domain of known fluorophores, whereas the perpendicular component, being more prone to the diffusive effect of scattering, points out differences in the kernel-smoother density estimate applied to the principal components between malignant, normal, and benign tissues. The eigenvectors corresponding to the dominant eigenvalues of the correlation matrix in SVD also exhibit significant differences between the three tissue types, which clearly reflects the differences in spectral correlation behavior. Interestingly, the most significant distinguishing feature manifests in the perpendicular component, corresponding to the porphyrin emission range in the cancerous tissue. The fact that the perpendicular component is strongly influenced by depolarization, and that porphyrin emission in cancerous tissue has been found to be strongly depolarized, may be the possible cause of the above observation.
NASA Astrophysics Data System (ADS)
Xu, M. L.; Yu, Y.; Ramaswamy, H. S.; Zhu, S. M.
2017-01-01
Chinese liquor aroma components were characterized during the aging process using gas chromatography (GC). Principal component analysis and cluster analysis (PCA, CA) were used to discriminate the age of Chinese liquor, which has great economic value. Of a total of 21 major aroma components identified and quantified, 13 components, which included several acids, alcohols, esters, aldehydes and furans, decreased significantly in the first year of aging, maintained the same levels (p > 0.05) for the next three years and decreased again (p < 0.05) in the fifth year. On the contrary, a significant increase was observed in propionic acid, furfural and phenylethanol. Ethyl lactate was found to be the most stable aroma component during the aging process. Results of PCA and CA demonstrated that young (fresh) liquor and aged liquors were well separated from each other, which is consistent with the evolution of aroma components along the aging process. These findings provide a quantitative basis for discriminating the age of Chinese liquor, a scientific basis for further research on elucidating the liquor aging process, and a possible tool to guard against counterfeit and defective products.
NASA Astrophysics Data System (ADS)
Mendez, F. J.; Rueda, A.; Barnard, P.; Mori, N.; Nakajo, S.; Espejo, A.; del Jesus, M.; Diez Sierra, J.; Cofino, A. S.; Camus, P.
2016-02-01
Hurricanes hitting California have a very low occurrence probability due to typically cool ocean temperatures and westward tracks. However, damages associated with these improbable events would be dramatic in Southern California, and understanding the oceanographic and atmospheric drivers is of paramount importance for coastal risk management in present and future climates. A statistical analysis of the historical events is very difficult due to the limited resolution of the atmospheric and oceanographic forcing data available. In this work, we propose a combination of: (a) statistical downscaling methods (Espejo et al, 2015); and (b) a synthetic stochastic tropical cyclone (TC) model (Nakajo et al, 2014). To build the statistical downscaling model, Y=f(X), we apply a combination of principal component analysis and the k-means classification algorithm to find representative patterns of a potential TC index derived from large-scale SST fields in the Eastern Central Pacific (predictor X) and the associated tropical cyclone occurrence (predictand Y). SST data come from NOAA Extended Reconstructed SST V3b, providing information from 1854 to 2013 on a 2.0 degree x 2.0 degree global grid. As data on the historical occurrence and paths of tropical cyclones are scarce, we apply a stochastic TC model based on a Monte Carlo simulation of the joint distribution of track, minimum sea level pressure and translation speed of the historical events in the Eastern Central Pacific Ocean. Results will show the ability of the approach to explain seasonal-to-interannual variability of the predictor X, which is clearly related to the El Niño Southern Oscillation. References Espejo, A., Méndez, F.J., Diez, J., Medina, R., Al-Yahyai, S. (2015) Seasonal probabilistic forecasting of tropical cyclone activity in the North Indian Ocean, Journal of Flood Risk Management, DOI: 10.1111/jfr3.12197 Nakajo, S., N. Mori, T. Yasuda, and H. Mase (2014) Global Stochastic Tropical Cyclone Model Based on Principal Component Analysis and Cluster Analysis, Journal of Applied Meteorology and Climatology, DOI: 10.1175/JAMC-D-13-08.1
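A minimal sketch of the PCA plus k-means weather-typing step described above, assuming a hypothetical SST anomaly matrix `sst_anomalies.npy` (months × grid points); the component and cluster counts are arbitrary and not taken from the study.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

sst = np.load("sst_anomalies.npy")   # assumed array, shape (n_months, n_gridpoints)

# Reduce the SST fields to a handful of principal components, then classify them
# into representative patterns with k-means (the weather-typing step of the downscaling)
pcs = PCA(n_components=5).fit_transform(sst)
pattern = KMeans(n_clusters=9, n_init=20, random_state=1).fit_predict(pcs)

# 'pattern' labels each month with an SST type; tropical cyclone occurrence statistics
# can then be tabulated per type to build the statistical downscaling model Y = f(X)
```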
Facial anthropometric differences among gender, ethnicity, and age groups.
Zhuang, Ziqing; Landsittel, Douglas; Benson, Stacey; Roberge, Raymond; Shaffer, Ronald
2010-06-01
The impact of race/ethnicity upon facial anthropometric data in the US workforce, and on the development of personal protective equipment, has not been investigated to any significant degree. The proliferation of minority populations in the US workforce has increased the need to investigate differences in facial dimensions among these workers. The objective of this study was to determine the face shape and size differences among race and age groups from the National Institute for Occupational Safety and Health survey of 3997 US civilian workers. Survey participants were divided into two gender groups, four racial/ethnic groups, and three age groups. Measurements of height, weight, neck circumference, and 18 facial dimensions were collected using traditional anthropometric techniques. A multivariate analysis of the data was performed using Principal Component Analysis. An exploratory analysis to determine the effect that different demographic factors had on anthropometric features was conducted via a linear model. The 21 anthropometric measurements, body mass index, and the first and second principal component scores were dependent variables, while gender, ethnicity, age, occupation, weight, and height served as independent variables. Gender significantly contributes to size for 19 of 24 dependent variables. African-Americans have statistically shorter, wider, and shallower noses than Caucasians. Hispanic workers have 14 facial features that are significantly larger than those of Caucasians, while their nose protrusion, height, and head length are significantly shorter. The other ethnic group was composed primarily of Asian subjects and has statistically different dimensions from Caucasians for 16 anthropometric values. Nineteen anthropometric values for subjects at least 45 years of age are statistically different from those measured for subjects between 18 and 29 years of age. Workers employed in manufacturing, fire fighting, healthcare, law enforcement, and other occupational groups have facial features that differ significantly from those of workers in construction. Statistically significant differences in facial anthropometric dimensions (P < 0.05) were noted between males and females, among all racial/ethnic groups, and between subjects who were at least 45 years old and workers between 18 and 29 years of age. These findings could be important to the design and manufacture of respirators, as well as to employers responsible for supplying respiratory protective equipment to their employees.
ERIC Educational Resources Information Center
Mugrage, Beverly; And Others
Three ridge regression solutions are compared with ordinary least squares regression and with principal components regression using all components. Ridge regression, particularly the Lawless-Wang solution, out-performed ordinary least squares regression and the principal components solution on the criteria of stability of coefficient and closeness…
A Note on McDonald's Generalization of Principal Components Analysis
ERIC Educational Resources Information Center
Shine, Lester C., II
1972-01-01
It is shown that McDonald's generalization of Classical Principal Components Analysis to groups of variables maximally channels the total variance of the original variables through the groups of variables acting as groups. An equation is obtained for determining the vectors of correlations of the L2 components with the original variables.…
Peterson, Leif E
2002-01-01
CLUSFAVOR (CLUSter and Factor Analysis with Varimax Orthogonal Rotation) 5.0 is a Windows-based computer program for hierarchical cluster and principal-component analysis of microarray-based transcriptional profiles. CLUSFAVOR 5.0 standardizes input data; sorts data according to gene-specific coefficient of variation, standard deviation, average and total expression, and Shannon entropy; performs hierarchical cluster analysis using nearest-neighbor, unweighted pair-group method using arithmetic averages (UPGMA), or furthest-neighbor joining methods, and Euclidean, correlation, or jack-knife distances; and performs principal-component analysis. PMID:12184816
Balakrishnan, Poojitha; Vaidya, Dhananjay; Franceschini, Nora; Voruganti, V. Saroja; Gribble, Matthew O.; Haack, Karin; Laston, Sandra; Umans, Jason G.; Francesconi, Kevin A.; Goessler, Walter; North, Kari E.; Lee, Elisa; Yracheta, Joseph; Best, Lyle G.; MacCluer, Jean W.; Kent, Jack; Cole, Shelley A.; Navas-Acien, Ana
2016-01-01
Background: Metabolism of inorganic arsenic (iAs) is subject to inter-individual variability, which is explained partly by genetic determinants. Objectives: We investigated the association of genetic variants with arsenic species and principal components of arsenic species in the Strong Heart Family Study (SHFS). Methods: We examined variants previously associated with cardiometabolic traits (~ 200,000 from Illumina Cardio MetaboChip) or arsenic metabolism and toxicity (670) among 2,428 American Indian participants in the SHFS. Urine arsenic species were measured by high performance liquid chromatography–inductively coupled plasma mass spectrometry (HPLC-ICP-MS), and percent arsenic species [iAs, monomethylarsonate (MMA), and dimethylarsinate (DMA), divided by their sum × 100] were logit transformed. We created two orthogonal principal components that summarized iAs, MMA, and DMA and were also phenotypes for genetic analyses. Linear regression was performed for each phenotype, dependent on allele dosage of the variant. Models accounted for familial relatedness and were adjusted for age, sex, total arsenic levels, and population stratification. Single nucleotide polymorphism (SNP) associations were stratified by study site and were meta-analyzed. Bonferroni correction was used to account for multiple testing. Results: Variants at 10q24 were statistically significant for all percent arsenic species and principal components of arsenic species. The index SNP for iAs%, MMA%, and DMA% (rs12768205) and for the principal components (rs3740394, rs3740393) were located near AS3MT, whose gene product catalyzes methylation of iAs to MMA and DMA. Among the candidate arsenic variant associations, functional SNPs in AS3MT and 10q24 were most significant (p < 9.33 × 10–5). Conclusions: This hypothesis-driven association study supports the role of common variants in arsenic metabolism, particularly AS3MT and 10q24. Citation: Balakrishnan P, Vaidya D, Franceschini N, Voruganti VS, Gribble MO, Haack K, Laston S, Umans JG, Francesconi KA, Goessler W, North KE, Lee E, Yracheta J, Best LG, MacCluer JW, Kent J Jr., Cole SA, Navas-Acien A. 2017. Association of cardiometabolic genes with arsenic metabolism biomarkers in American Indian communities: the Strong Heart Family Study (SHFS). Environ Health Perspect 125:15–22; http://dx.doi.org/10.1289/EHP251 PMID:27352405
The Complexity of Human Walking: A Knee Osteoarthritis Study
Kotti, Margarita; Duffell, Lynsey D.; Faisal, Aldo A.; McGregor, Alison H.
2014-01-01
This study proposes a framework for deconstructing complex walking patterns to create a simple principal component space before checking whether the projection to this space is suitable for identifying changes from the normality. We focus on knee osteoarthritis, the most common knee joint disease and the second leading cause of disability. Knee osteoarthritis affects over 250 million people worldwide. The motivation for projecting the highly dimensional movements to a lower dimensional and simpler space is our belief that motor behaviour can be understood by identifying a simplicity via projection to a low principal component space, which may reflect upon the underlying mechanism. To study this, we recruited 180 subjects, 47 of which reported that they had knee osteoarthritis. They were asked to walk several times along a walkway equipped with two force plates that capture their ground reaction forces along 3 axes, namely vertical, anterior-posterior, and medio-lateral, at 1000 Hz. Data when the subject does not clearly strike the force plate were excluded, leaving 1–3 gait cycles per subject. To examine the complexity of human walking, we applied dimensionality reduction via Probabilistic Principal Component Analysis. The first principal component explains 34% of the variance in the data, whereas over 80% of the variance is explained by 8 principal components or more. This proves the complexity of the underlying structure of the ground reaction forces. To examine if our musculoskeletal system generates movements that are distinguishable between normal and pathological subjects in a low dimensional principal component space, we applied a Bayes classifier. For the tested cross-validated, subject-independent experimental protocol, the classification accuracy equals 82.62%. Also, a novel complexity measure is proposed, which can be used as an objective index to facilitate clinical decision making. This measure proves that knee osteoarthritis subjects exhibit more variability in the two-dimensional principal component space. PMID:25232949
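A minimal sketch of the projection-plus-classification idea, assuming hypothetical arrays of flattened ground-reaction-force cycles and osteoarthritis labels; ordinary PCA stands in for the probabilistic PCA of the study, and 5-fold cross-validation is a simplification of the subject-independent protocol.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Assumed inputs: 'grf' holds one flattened gait cycle per row (3-axis ground reaction forces),
# 'has_oa' is 1 for knee-osteoarthritis subjects and 0 otherwise
grf = np.load("grf_cycles.npy")
has_oa = np.load("labels.npy")

# Project to a two-dimensional principal component space, then apply a Bayes classifier
clf = make_pipeline(PCA(n_components=2), GaussianNB())
print("mean accuracy:", cross_val_score(clf, grf, has_oa, cv=5).mean())
```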
Spectral discrimination of serum from liver cancer and liver cirrhosis using Raman spectroscopy
NASA Astrophysics Data System (ADS)
Yang, Tianyue; Li, Xiaozhou; Yu, Ting; Sun, Ruomin; Li, Siqi
2011-07-01
In this paper, Raman spectra of human serum were measured using Raman spectroscopy, and the spectra were analyzed by the multivariate statistical method of principal component analysis (PCA). Linear discriminant analysis (LDA) applied to the loading scores of the different disease groups was then used as the diagnostic algorithm, and an artificial neural network (ANN) was used for cross-validation. The diagnostic sensitivity and specificity of PCA-LDA are 88% and 79%, while those of PCA-ANN are 89% and 95%. It can be seen that these modern analysis methods are useful tools for analyzing serum spectra to diagnose disease.
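The PCA-LDA step with sensitivity/specificity reporting can be sketched as follows, with hypothetical serum spectra and binary disease labels as inputs; the component count and cross-validation scheme are illustrative assumptions, not the authors' settings.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix

spectra = np.load("serum_raman.npy")     # assumed: rows = serum samples, columns = Raman shifts
disease = np.load("disease_labels.npy")  # assumed: 1 = disease, 0 = control

model = make_pipeline(PCA(n_components=10), LinearDiscriminantAnalysis())
pred = cross_val_predict(model, spectra, disease, cv=5)

tn, fp, fn, tp = confusion_matrix(disease, pred).ravel()
print("sensitivity:", tp / (tp + fn), "specificity:", tn / (tn + fp))
```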
The bio-optical properties of CDOM as descriptor of lake stratification.
Bracchini, Luca; Dattilo, Arduino Massimo; Hull, Vincent; Loiselle, Steven Arthur; Martini, Silvia; Rossi, Claudio; Santinelli, Chiara; Seritti, Alfredo
2006-11-01
Multivariate statistical techniques are used to demonstrate the fundamental role of CDOM optical properties in the description of water masses during the summer stratification of a deep lake. PC1 was linked with dissolved species and PC2 with suspended particles. The first principal component shows that the CDOM bio-optical properties give a better description of the stratification of Salto Lake than temperature does. The proposed multivariate approach can be used for the analysis of different stratified aquatic ecosystems in relation to the interaction between bio-optical properties and stratification of the water body.
Principal Components Analysis of a JWST NIRSpec Detector Subsystem
NASA Technical Reports Server (NTRS)
Arendt, Richard G.; Fixsen, D. J.; Greenhouse, Matthew A.; Lander, Matthew; Lindler, Don; Loose, Markus; Moseley, S. H.; Mott, D. Brent; Rauscher, Bernard J.; Wen, Yiting;
2013-01-01
We present principal component analysis (PCA) of a flight-representative James Webb Space Telescope Near-Infrared Spectrograph (NIRSpec) Detector Subsystem. Although our results are specific to NIRSpec and its T ≈ 40 K SIDECAR ASICs and 5 μm cutoff H2RG detector arrays, the underlying technical approach is more general. We describe how we measured the system's response to small environmental perturbations by modulating a set of bias voltages and temperature. We used this information to compute the system's principal noise components. Together with information from the astronomical scene, we show how the zeroth principal component can be used to calibrate out the effects of small thermal and electrical instabilities to produce cosmetically cleaner images with significantly less correlated noise. Alternatively, if one were designing a new instrument, one could use a similar PCA approach to inform a set of environmental requirements (temperature stability, electrical stability, etc.) that enable the planned instrument to meet performance requirements.
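The calibration idea of removing the contribution of a leading noise component can be sketched as a simple projection, assuming hypothetical arrays for the detector reads and the previously determined noise component; this is an illustration of the linear algebra only, not the NIRSpec pipeline.

```python
import numpy as np

# Assumed inputs: 'frames' is a stack of detector reads (n_frames, n_pixels) and 'pc0' is the
# leading principal noise component (n_pixels,) found from the perturbation experiments
frames = np.load("darks.npy")
pc0 = np.load("pc0.npy")
pc0 = pc0 / np.linalg.norm(pc0)

# Project each frame onto the noise component and subtract that contribution, removing
# the correlated drift associated with small thermal and electrical instabilities
coeff = frames @ pc0                    # amplitude of the instability in each frame
cleaned = frames - np.outer(coeff, pc0)
```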
Let's Keep Our Quality School Principals on the Job
ERIC Educational Resources Information Center
Norton, M. Scott
2003-01-01
Research studies strongly support the fact that the leadership of the school principal impacts directly on the climate of the school and, in turn, on student achievement. National statistics relating to principal turnover and dwindling supplies of qualified replacements show clearly that principal turnover has reached crisis proportions.…
NASA Astrophysics Data System (ADS)
Lim, Hoong-Ta; Murukeshan, Vadakke Matham
2017-06-01
Hyperspectral imaging combines imaging and spectroscopy to provide detailed spectral information for each spatial point in the image. This gives a three-dimensional spatial-spatial-spectral datacube with hundreds of spectral images. Probe-based hyperspectral imaging systems have been developed so that they can be used in regions where conventional table-top platforms would find it difficult to access. A fiber bundle, which is made up of specially-arranged optical fibers, has recently been developed and integrated with a spectrograph-based hyperspectral imager. This forms a snapshot hyperspectral imaging probe, which is able to form a datacube using the information from each scan. Compared to the other configurations, which require sequential scanning to form a datacube, the snapshot configuration is preferred in real-time applications where motion artifacts and pixel misregistration can be minimized. Principal component analysis is a dimension-reducing technique that can be applied in hyperspectral imaging to convert the spectral information into uncorrelated variables known as principal components. A confidence ellipse can be used to define the region of each class in the principal component feature space and for classification. This paper demonstrates the use of the snapshot hyperspectral imaging probe to acquire data from samples of different colors. The spectral library of each sample was acquired and then analyzed using principal component analysis. Confidence ellipse was then applied to the principal components of each sample and used as the classification criteria. The results show that the applied analysis can be used to perform classification of the spectral data acquired using the snapshot hyperspectral imaging probe.
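The confidence-ellipse classification in the principal component plane can be sketched via Mahalanobis distances and a chi-square threshold, assuming hypothetical spectra and class labels; it is not the authors' implementation.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.decomposition import PCA

spectra = np.load("probe_spectra.npy")  # assumed: rows = samples, columns = spectral bands
labels = np.load("sample_labels.npy")   # assumed class label (e.g. color) per row

scores = PCA(n_components=2).fit_transform(spectra)

# Fit a confidence ellipse (mean and inverse covariance) per class in the PC1-PC2 plane
ellipses = {}
for c in np.unique(labels):
    pts = scores[labels == c]
    ellipses[c] = (pts.mean(axis=0), np.linalg.inv(np.cov(pts, rowvar=False)))

def inside(point, mean, inv_cov, q=0.95):
    d2 = (point - mean) @ inv_cov @ (point - mean)  # squared Mahalanobis distance
    return d2 <= chi2.ppf(q, df=2)                  # 95% confidence ellipse membership

print(inside(scores[0], *ellipses[labels[0]]))
```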
Pepper seed variety identification based on visible/near-infrared spectral technology
NASA Astrophysics Data System (ADS)
Li, Cuiling; Wang, Xiu; Meng, Zhijun; Fan, Pengfei; Cai, Jichen
2016-11-01
Pepper is an important fruit vegetable, and with the expansion of the hybrid pepper planting area, detection of pepper seed purity is especially important. This research used visible/near-infrared (VIS/NIR) spectral technology to identify the variety of single pepper seeds, with the hybrid pepper seeds "Zhuo Jiao NO.3", "Zhuo Jiao NO.4" and "Zhuo Jiao NO.5" chosen as research samples. VIS/NIR spectral data of 80 "Zhuo Jiao NO.3", 80 "Zhuo Jiao NO.4" and 80 "Zhuo Jiao NO.5" pepper seeds were collected, and the original spectral data were pretreated with standard normal variate (SNV) transform, first derivative (FD), and Savitzky-Golay (SG) convolution smoothing methods. Principal component analysis (PCA) was adopted to reduce the dimension of the spectral data and extract principal components. According to the pairwise distributions of the first, second and third principal components (PC1, PC2, PC3) in two-dimensional planes, distribution areas of the three varieties of pepper seeds were delineated in each plane, and the discriminant accuracy of PCA was tested by observing where the principal components of the validation-set samples fell. The study then combined PCA and linear discriminant analysis (LDA) to identify single pepper seed varieties; results showed that with the FD preprocessing method, the discriminant accuracy for the validation set was 98%, indicating that VIS/NIR spectral technology is feasible for identification of single pepper seed varieties.
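A minimal sketch of the preprocessing plus PCA-LDA chain, assuming hypothetical spectra and variety labels; the SNV/derivative settings and split are illustrative choices, not the study's parameters.

```python
import numpy as np
from scipy.signal import savgol_filter
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

X = np.load("seed_spectra.npy")  # assumed: rows = seeds, columns = VIS/NIR wavelengths
y = np.load("seed_variety.npy")  # assumed variety label per seed

# Standard normal variate (row-wise standardization), then a Savitzky-Golay first derivative
snv = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
fd = savgol_filter(snv, window_length=11, polyorder=2, deriv=1, axis=1)

Xtr, Xte, ytr, yte = train_test_split(fd, y, test_size=0.3, random_state=0, stratify=y)
pca = PCA(n_components=3).fit(Xtr)
clf = LinearDiscriminantAnalysis().fit(pca.transform(Xtr), ytr)
print("validation accuracy:", clf.score(pca.transform(Xte), yte))
```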
Long, J.M.; Fisher, W.L.
2006-01-01
We present a method for spatial interpretation of environmental variation in a reservoir that integrates principal components analysis (PCA) of environmental data with geographic information systems (GIS). To illustrate our method, we used data from a Great Plains reservoir (Skiatook Lake, Oklahoma) with longitudinal variation in physicochemical conditions. We measured 18 physicochemical features, mapped them using GIS, and then calculated and interpreted four principal components. Principal component 1 (PC1) was readily interpreted as longitudinal variation in water chemistry, but the other principal components (PC2-4) were difficult to interpret. Site scores for PC1-4 were calculated in GIS by summing weighted overlays of the 18 measured environmental variables, with the factor loadings from the PCA as the weights. PC1-4 were then ordered into a landscape hierarchy, an emergent property of this technique, which enabled their interpretation. PC1 was interpreted as a reservoir scale change in water chemistry, PC2 was a microhabitat variable of rip-rap substrate, PC3 identified coves/embayments and PC4 consisted of shoreline microhabitats related to slope. The use of GIS improved our ability to interpret the more obscure principal components (PC2-4), which made the spatial variability of the reservoir environment more apparent. This method is applicable to a variety of aquatic systems, can be accomplished using commercially available software programs, and allows for improved interpretation of the geographic environmental variability of a system compared to using typical PCA plots. © Copyright by the North American Lake Management Society 2006.
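The weighted-overlay step is numerically equivalent to multiplying the standardized variables by the PCA loadings; the sketch below checks that equivalence on a hypothetical site-by-variable matrix, and is not the GIS workflow itself.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

env = np.load("env_variables.npy")  # assumed: rows = sites (or raster cells), columns = 18 variables

z = StandardScaler().fit_transform(env)
pca = PCA(n_components=4).fit(z)

# The "weighted overlay" is a sum of each standardized layer multiplied by its loading,
# i.e. scores = z @ loadings^T, which matches the PCA transform
scores = z @ pca.components_.T
print(np.allclose(scores, pca.transform(z)))
```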
ERIC Educational Resources Information Center
Ross, Christine; Herrmann, Mariesa; Angus, Megan Hague
2015-01-01
The purpose of this study was to describe the measures used to evaluate principals in New Jersey in the first (pilot) year of the new principal evaluation system and examine three of the statistical properties of the measures: their variation among principals, their year-to-year stability, and the associations between these measures and the…
Giesen, E B W; Ding, M; Dalstra, M; van Eijden, T M G J
2003-09-01
As several morphological parameters of cancellous bone express more or less the same architectural measure, we applied principal components analysis to group these measures and correlated these to the mechanical properties. Cylindrical specimens (n = 24) were obtained in different orientations from embalmed mandibular condyles; the angle of the first principal direction and the axis of the specimen, expressing the orientation of the trabeculae, ranged from 10 degrees to 87 degrees. Morphological parameters were determined by a method based on Archimedes' principle and by micro-CT scanning, and the mechanical properties were obtained by mechanical testing. The principal components analysis was used to obtain a set of independent components to describe the morphology. This set was entered into linear regression analyses for explaining the variance in mechanical properties. The principal components analysis revealed four components: amount of bone, number of trabeculae, trabecular orientation, and miscellaneous. They accounted for about 90% of the variance in the morphological variables. The component loadings indicated that a higher amount of bone was primarily associated with more plate-like trabeculae, and not with more or thicker trabeculae. The trabecular orientation was most determinative (about 50%) in explaining stiffness, strength, and failure energy. The amount of bone was second most determinative and increased the explained variance to about 72%. These results suggest that trabecular orientation and amount of bone are important in explaining the anisotropic mechanical properties of the cancellous bone of the mandibular condyle.
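The two-stage analysis (group correlated morphological measures into components, then regress a mechanical property on them) can be sketched as follows, with hypothetical morphology and stiffness arrays; the component count is an illustrative assumption.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

morph = np.load("morphology.npy")     # assumed: rows = specimens, columns = morphological parameters
stiffness = np.load("stiffness.npy")  # assumed mechanical property per specimen

# Group the correlated morphological measures into independent components,
# then see how much of the variance in stiffness they explain
components = PCA(n_components=4).fit_transform(StandardScaler().fit_transform(morph))
model = LinearRegression().fit(components, stiffness)
print("explained variance (R^2):", model.score(components, stiffness))
```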
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wallace, Jack, E-mail: jack.wallace@ce.queensu.ca; Champagne, Pascale, E-mail: champagne@civil.queensu.ca; Monnier, Anne-Charlotte, E-mail: anne-charlotte.monnier@insa-lyon.fr
Highlights: • Performance of a hybrid passive landfill leachate treatment system was evaluated. • 33 water chemistry parameters were sampled for 21 months and statistically analyzed. • Parameters were strongly linked and explained most (>40%) of the variation in data. • Alkalinity, ammonia, COD, heavy metals, and iron were criteria for performance. • Eight other parameters were key in modeling system dynamics and criteria. - Abstract: A pilot-scale hybrid-passive treatment system operated at the Merrick Landfill in North Bay, Ontario, Canada, treats municipal landfill leachate and provides for subsequent natural attenuation. Collected leachate is directed to a hybrid-passive treatment system, followed by controlled release to a natural attenuation zone before entering the nearby Little Sturgeon River. The study presents a comprehensive evaluation of the performance of the system using multivariate statistical techniques to determine the interactions between parameters, major pollutants in the leachate, and the biological and chemical processes occurring in the system. Five parameters (ammonia, alkalinity, chemical oxygen demand (COD), “heavy” metals of interest, with atomic weights above calcium, and iron) were set as criteria for the evaluation of system performance based on their toxicity to aquatic ecosystems and importance in treatment with respect to discharge regulations. System data for a full range of water quality parameters over a 21-month period were analyzed using principal components analysis (PCA), as well as principal components (PC) and partial least squares (PLS) regressions. PCA indicated a high degree of association for most parameters with the first PC, which explained a high percentage (>40%) of the variation in the data, suggesting strong statistical relationships among most of the parameters in the system. Regression analyses identified 8 parameters (set as independent variables) that were most frequently retained for modeling the five criteria parameters (set as dependent variables), on a statistically significant level: conductivity, dissolved oxygen (DO), nitrite (NO2-), organic nitrogen (N), oxidation reduction potential (ORP), pH, sulfate and total volatile solids (TVS). The criteria parameters and the significant explanatory parameters were most important in modeling the dynamics of the passive treatment system during the study period. Such techniques and procedures were found to be highly valuable and could be applied to other sites to determine parameters of interest in similar naturalized engineered systems.
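The PCA and PLS regression steps can be sketched as follows, assuming a hypothetical matrix of water quality measurements and one criterion parameter (COD is used here only as an example); this is an illustration, not the study's statistical procedure.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import PLSRegression
from sklearn.preprocessing import StandardScaler

data = np.load("leachate_params.npy")  # assumed: rows = sampling events, columns = water quality parameters
cod = np.load("cod.npy")               # assumed criterion parameter, e.g. chemical oxygen demand

z = StandardScaler().fit_transform(data)
pca = PCA().fit(z)
print("variance explained by PC1:", pca.explained_variance_ratio_[0])

# Partial least squares regression of the criterion parameter on the explanatory parameters
pls = PLSRegression(n_components=3).fit(z, cod)
print("PLS R^2:", pls.score(z, cod))
```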
Improving GEFS Weather Forecasts for Indian Monsoon with Statistical Downscaling
NASA Astrophysics Data System (ADS)
Agrawal, Ankita; Salvi, Kaustubh; Ghosh, Subimal
2014-05-01
Weather forecasting has always been a challenging research problem, yet it is of paramount importance as it serves as a key input in formulating the modus operandi for the immediate future. Short range rainfall forecasts influence a wide range of entities, from the agricultural industry to the common man. Accurate forecasts help in minimizing possible damage by implementing pre-decided plans of action, and hence it is necessary to gauge the quality of forecasts, which might vary with the complexity of the weather state and regional parameters. Indian Summer Monsoon Rainfall (ISMR) is one such perfect arena to check the quality of weather forecasts, not only because of the level of intricacy in the spatial and temporal patterns associated with it, but also because of the amount of damage it can cause (through poor forecasts) to the Indian economy by affecting the agriculture industry. The present study is undertaken with the rationales of assessing the ability of the Global Ensemble Forecast System (GEFS) to predict ISMR over central India and the skill of a statistical downscaling technique in adding value to the predictions by taking them closer to the observed target dataset. GEFS is a global numerical weather prediction system providing forecasts of different climate variables at a fine resolution (0.5 degree and 1 degree). GEFS shows good skill in predicting different climatic variables but performs poorly for Indian summer monsoon rainfall, which is evident from the very low to negative correlation values between predicted and observed rainfall. Towards the fulfilment of the second rationale, a statistical relationship is established between the reasonably well predicted GEFS climate variables and observed rainfall. The GEFS predictors are treated with multicollinearity and dimensionality reduction techniques, such as principal component analysis (PCA) and the least absolute shrinkage and selection operator (LASSO). A statistical relationship is established between the principal components and observed rainfall over the training period, and predictions are obtained for the testing period. The validations show a substantial improvement in the correlation coefficient between observed and predicted data (0.25 to 0.55). The results speak in favour of the statistical downscaling methodology, which shows the capability to reduce the gap between observations and predictions. A detailed study applying different downscaling techniques is required to quantify the improvements in predictions.
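A minimal sketch of the downscaling chain (standardize predictors, reduce collinearity with PCA, regress with a shrinkage estimator), assuming hypothetical GEFS predictor and observed rainfall arrays; the component count and LASSO penalty are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

X = np.load("gefs_predictors.npy")  # assumed: rows = days, columns = gridded GEFS variables
y = np.load("observed_rain.npy")    # assumed observed rainfall over central India

Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, shuffle=False)

# Reduce collinear predictors with PCA, then shrink/select coefficients with LASSO
model = make_pipeline(StandardScaler(), PCA(n_components=20), Lasso(alpha=0.1))
model.fit(Xtr, ytr)
print("test correlation:", np.corrcoef(yte, model.predict(Xte))[0, 1])
```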
Ravenscroft, John; Wazny, Kerri; Davis, John M
2017-01-01
This research paper aims to assess factors reported by parents associated with the successful transition of children with complex additional support requirements that have undergone a transition between school environments from 8 European Union member states. Quantitative data were collected from 306 parents within education systems from 8 EU member states (Bulgaria, Cyprus, Greece, Ireland, the Netherlands, Romania, Spain and the UK). The data were derived from an online questionnaire and consisted of 41 questions. Information was collected on: parental involvement in their child's transition, child involvement in transition, child autonomy, school ethos, professionals' involvement in transition and integrated working, such as joint assessment, cooperation and coordination between agencies. Survey questions that were designed on a Likert scale were included in the Principal Components Analysis (PCA); additional survey questions, along with the results from the PCA, were used to build a logistic regression model. Four principal components were identified accounting for 48.86% of the variability in the data. Principal component 1 (PC1), 'child inclusive ethos,' contains 16.17% of the variation. Principal component 2 (PC2), which represents child autonomy and involvement, is responsible for 8.52% of the total variation. Principal component 3 (PC3) contains questions relating to parental involvement and contributed to 12.26% of the overall variation. Principal component 4 (PC4), which involves transition planning and coordination, contributed to 11.91% of the overall variation. Finally, the principal components were included in a logistic regression to evaluate the relationship between inclusion and a successful transition, as well as whether other factors may have influenced transition. All four principal components were significantly associated with a successful transition, with PC1 having the most effect (OR: 4.04, CI: 2.43-7.18, p<0.0001). To support a child with complex additional support requirements through transition from special school to mainstream, governments and professionals need to ensure children with additional support requirements and their parents are at the centre of all decisions that affect them. It is important that professionals recognise the educational, psychological, social and cultural contexts of a child with additional support requirements and their families, which will provide a holistic approach and remove barriers for learning.
Ibrahim, George M; Morgan, Benjamin R; Macdonald, R Loch
2014-03-01
Predictors of outcome after aneurysmal subarachnoid hemorrhage have been determined previously through hypothesis-driven methods that often exclude putative covariates and require a priori knowledge of potential confounders. Here, we apply a data-driven approach, principal component analysis, to identify baseline patient phenotypes that may predict neurological outcomes. Principal component analysis was performed on 120 subjects enrolled in a prospective randomized trial of clazosentan for the prevention of angiographic vasospasm. Correlation matrices were created using a combination of Pearson, polyserial, and polychoric regressions among 46 variables. Scores of significant components (with eigenvalues>1) were included in multivariate logistic regression models with incidence of severe angiographic vasospasm, delayed ischemic neurological deficit, and long-term outcome as outcomes of interest. Sixteen significant principal components accounting for 74.6% of the variance were identified. A single component dominated by the patients' initial hemodynamic status, World Federation of Neurosurgical Societies score, neurological injury, and initial neutrophil/leukocyte counts was significantly associated with poor outcome. Two additional components were associated with angiographic vasospasm, of which one was also associated with delayed ischemic neurological deficit. The first was dominated by the aneurysm-securing procedure, subarachnoid clot clearance, and intracerebral hemorrhage, whereas the second had high contributions from markers of anemia and albumin levels. Principal component analysis, a data-driven approach, identified patient phenotypes that are associated with worse neurological outcomes. Such data reduction methods may provide a better approximation of unique patient phenotypes and may inform clinical care as well as patient recruitment into clinical trials. http://www.clinicaltrials.gov. Unique identifier: NCT00111085.
Mujica Ascencio, Saul; Choe, ChunSik; Meinke, Martina C; Müller, Rainer H; Maksimov, George V; Wigger-Alberti, Walter; Lademann, Juergen; Darvin, Maxim E
2016-07-01
Propylene glycol is one of the known substances added in cosmetic formulations as a penetration enhancer. Recently, nanocrystals have been employed also to increase the skin penetration of active components. Caffeine is a component with many applications and its penetration into the epidermis is controversially discussed in the literature. In the present study, the penetration ability of two components - caffeine nanocrystals and propylene glycol, applied topically on porcine ear skin in the form of a gel, was investigated ex vivo using two confocal Raman microscopes operated at different excitation wavelengths (785nm and 633nm). Several depth profiles were acquired in the fingerprint region and different spectral ranges, i.e., 526-600cm(-1) and 810-880cm(-1) were chosen for independent analysis of caffeine and propylene glycol penetration into the skin, respectively. Multivariate statistical methods such as principal component analysis (PCA) and linear discriminant analysis (LDA) combined with Student's t-test were employed to calculate the maximum penetration depths of each substance (caffeine and propylene glycol). The results show that propylene glycol penetrates significantly deeper than caffeine (20.7-22.0μm versus 12.3-13.0μm) without any penetration enhancement effect on caffeine. The results confirm that different substances, even if applied onto the skin as a mixture, can penetrate differently. The penetration depths of caffeine and propylene glycol obtained using two different confocal Raman microscopes are comparable showing that both types of microscopes are well suited for such investigations and that multivariate statistical PCA-LDA methods combined with Student's t-test are very useful for analyzing the penetration of different substances into the skin. Copyright © 2016 Elsevier B.V. All rights reserved.
The Global Oscillation Network Group site survey. 1: Data collection and analysis methods
NASA Technical Reports Server (NTRS)
Hill, Frank; Fischer, George; Grier, Jennifer; Leibacher, John W.; Jones, Harrison B.; Jones, Patricia P.; Kupke, Renate; Stebbins, Robin T.
1994-01-01
The Global Oscillation Network Group (GONG) Project is planning to place a set of instruments around the world to observe solar oscillations as continuously as possible for at least three years. The Project has now chosen the sites that will comprise the network. This paper describes the methods of data collection and analysis that were used to make this decision. Solar irradiance data were collected with a one-minute cadence at fifteen sites around the world and analyzed to produce statistics of cloud cover, atmospheric extinction, and transparency power spectra at the individual sites. Nearly 200 reasonable six-site networks were assembled from the individual stations, and a set of statistical measures of the performance of the networks was analyzed using a principal component analysis. An accompanying paper presents the results of the survey.
NASA Astrophysics Data System (ADS)
Attallah, Bilal; Serir, Amina; Chahir, Youssef; Boudjelal, Abdelwahhab
2017-11-01
Palmprint recognition systems are dependent on feature extraction. A method of feature extraction using higher discrimination information was developed to characterize palmprint images. In this method, two individual feature extraction techniques are applied to a discrete wavelet transform of a palmprint image, and their outputs are fused. The two techniques used in the fusion are the histogram of gradients and the binarized statistical image features. The fused features are then evaluated using an extreme learning machine classifier, with feature selection based on principal component analysis. Three palmprint databases, the Hong Kong Polytechnic University (PolyU) Multispectral Palmprint Database, Hong Kong PolyU Palmprint Database II, and the Delhi Touchless (IIDT) Palmprint Database, are used in this study. The study shows that our method effectively identifies and verifies palmprints and outperforms other methods based on feature extraction.
Monitoring of an antigen manufacturing process.
Zavatti, Vanessa; Budman, Hector; Legge, Raymond; Tamer, Melih
2016-06-01
Fluorescence spectroscopy in combination with multivariate statistical methods was employed as a tool for monitoring the manufacturing process of pertactin (PRN), one of the virulence factors of Bordetella pertussis utilized in whooping cough vaccines. Fluorophores such as amino acids and co-enzymes were detected throughout the process. The fluorescence data collected at different stages of the fermentation and purification process were treated employing principal component analysis (PCA). Through PCA, it was feasible to identify sources of variability in PRN production. Then, partial least squares (PLS) regression was employed to correlate the fluorescence spectra obtained from pure PRN samples with the final protein content of these samples measured by a Kjeldahl test. Given that a statistically significant correlation was found between fluorescence and PRN levels, this approach could be further developed as a method to predict the final protein content.
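The two-stage chemometric workflow outlined in this abstract (PCA to expose process variability, then PLS to relate spectra to protein content) can be sketched as follows. The arrays are synthetic placeholders for the fluorescence spectra and Kjeldahl measurements, and the component counts are illustrative assumptions rather than the authors' settings.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(1)
spectra = rng.normal(size=(40, 300))      # 40 samples x 300 emission wavelengths (synthetic)
protein = rng.normal(10.0, 1.0, size=40)  # Kjeldahl protein content per sample (synthetic)

# Stage 1: PCA on mean-centred spectra to locate the main sources of variability
X = spectra - spectra.mean(axis=0)
pca = PCA(n_components=3).fit(X)
print("variance explained:", pca.explained_variance_ratio_.round(3))

# Stage 2: PLS regression relating the spectra to the measured protein content
pls = PLSRegression(n_components=3)
predicted = cross_val_predict(pls, X, protein, cv=5).ravel()
r = np.corrcoef(predicted, protein)[0, 1]
print(f"cross-validated correlation: r = {r:.2f}")
```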
Computerized system for assessing heart rate variability.
Frigy, A; Incze, A; Brânzaniuc, E; Cotoi, S
1996-01-01
The principal theoretical, methodological and clinical aspects of heart rate variability (HRV) analysis are reviewed. This method has been developed over the last 10 years as a useful noninvasive method of measuring the activity of the autonomic nervous system. The main components and the functioning of the computerized rhythm-analyzer system developed by our team are presented. The system is able to perform short-term (maximum 20 minutes) time domain HRV analysis and statistical analysis of the ventricular rate in any rhythm, particularly in atrial fibrillation. The performance of our system is demonstrated using the graphics (RR histograms, delta RR histograms, RR scattergrams) and the statistical parameters resulting from the processing of three ECG recordings. These recordings were obtained from a normal subject, from a patient with advanced heart failure, and from a patient with atrial fibrillation.
Integration of statistical and physiological analyses of adaptation of near-isogenic barley lines.
Romagosa, I; Fox, P N; García Del Moral, L F; Ramos, J M; García Del Moral, B; Roca de Togores, F; Molina-Cano, J L
1993-08-01
Seven near-isogenic barley lines, differing for three independent mutant genes, were grown in 15 environments in Spain. Genotype x environment interaction (G x E) for grain yield was examined with the Additive Main Effects and Multiplicative Interaction (AMMI) model. The results of this statistical analysis of multilocation yield data were compared with a morpho-physiological characterization of the lines at two sites (Molina-Cano et al. 1990). The first two principal component axes from the AMMI analysis were strongly associated with the morpho-physiological characters. The independent but parallel discrimination among genotypes reflects genetic differences and highlights the power of the AMMI analysis as a tool to investigate G x E. Characters which appear to be positively associated with yield in the germplasm under study could be identified for some environments.
RepExplore: addressing technical replicate variance in proteomics and metabolomics data analysis.
Glaab, Enrico; Schneider, Reinhard
2015-07-01
High-throughput omics datasets often contain technical replicates included to account for technical sources of noise in the measurement process. Although summarizing these replicate measurements by using robust averages may help to reduce the influence of noise on downstream data analysis, the information on the variance across the replicate measurements is lost in the averaging process and therefore typically disregarded in subsequent statistical analyses. We introduce RepExplore, a web service dedicated to exploiting the information captured in the technical replicate variance to provide more reliable and informative differential expression and abundance statistics for omics datasets. The software builds on previously published statistical methods, which have been applied successfully to biomedical omics data but are difficult to use without prior experience in programming or scripting. RepExplore facilitates the analysis by providing fully automated data processing and interactive ranking tables, whisker plots, heat maps and principal component analysis visualizations to interpret omics data and derived statistics. Freely available at http://www.repexplore.tk (contact: enrico.glaab@uni.lu). Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
NASA Astrophysics Data System (ADS)
Villamizar-Mejia, Rodolfo; Mujica-Delgado, Luis-Eduardo; Ruiz-Ordóñez, Magda-Liliana; Camacho-Navarro, Jhonatan; Moreno-Beltrán, Gustavo
2017-05-01
In previous works, damage detection of metallic specimens exposed to temperature changes has been achieved by using a statistical baseline model based on Principal Component Analysis (PCA) and the piezodiagnostics principle, taking the temperature effect into account by augmenting the baseline model or by using several baseline models according to the current temperature. In this paper a new approach is presented, where damage detection is based on a new index that combines the Q and T2 statistical indices with current temperature measurements. Experimental tests were performed on a carbon-steel pipe of 1 m length and 1.5 inch diameter, instrumented with piezodevices acting as actuators or sensors. A PCA baseline model was obtained at a temperature of 21° and the T2 and Q statistical indices were then computed for a 24 h temperature profile. Mass added at different points of the pipe between sensor and actuator was used to simulate damage. Using the combined index, the temperature contribution can be separated out and a clearer graphical differentiation between damaged and undamaged cases can be obtained.
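The Q and T2 indices referred to above are standard PCA monitoring statistics. A minimal sketch of how they can be computed from a baseline model is given below; the pipe data, feature dimensions, and number of retained components are assumptions, and the paper's combined temperature-compensated index is not reproduced here.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
baseline = rng.normal(size=(200, 50))   # undamaged-pipe piezo records, 50 features each (synthetic)
current = rng.normal(size=(20, 50))     # new records to screen for damage (synthetic)

mean = baseline.mean(axis=0)
pca = PCA(n_components=5).fit(baseline - mean)

def t2_and_q(records):
    """Hotelling's T2 (distance within the model) and Q/SPE (residual) statistics."""
    X = records - mean
    scores = pca.transform(X)
    t2 = np.sum(scores**2 / pca.explained_variance_, axis=1)
    residual = X - scores @ pca.components_
    q = np.sum(residual**2, axis=1)
    return t2, q

t2, q = t2_and_q(current)
print(t2.round(2), q.round(2))
```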
NASA Astrophysics Data System (ADS)
Senkbeil, J. C.; Brommer, D. M.; Comstock, I. J.; Loyd, T.
2012-07-01
Extratropical cyclones (ETCs) in the southern United States are often overlooked when compared with tropical cyclones in the region and ETCs in the northern United States. Although southern ETCs are significant weather events, there is currently no operational scheme for identifying and discussing these nameless storms. In this research, we classified 84 ETCs (1970-2009). We manually identified five distinct formation regions and derived seven unique ETC types through statistical classification, which employed principal components analysis and two methods of cluster analysis. Both manual and statistical storm types generally showed positive (negative) relationships with El Niño (La Niña). Manual storm types displayed precipitation swaths consistent with discrete storm tracks, which further legitimizes the existence of multiple modes of southern ETCs. Statistical storm types also displayed unique precipitation intensity swaths, but these swaths were less indicative of track location. It is hoped that by classifying southern ETCs into types, forecasters, hydrologists, and broadcast meteorologists might be able to better anticipate projected amounts of precipitation at their locations.
A principal components approach to parent-to-newborn body composition associations in South India
Veena, Sargoor R; Krishnaveni, Ghattu V; Wills, Andrew K; Hill, Jacqueline C; Fall, Caroline HD
2009-01-01
Background Size at birth is influenced by environmental factors, like maternal nutrition and parity, and by genes. Birth weight is a composite measure, encompassing bone, fat and lean mass. These may have different determinants. The main purpose of this paper was to use anthropometry and principal components analysis (PCA) to describe maternal and newborn body composition, and associations between them, in an Indian population. We also compared maternal and paternal measurements (body mass index (BMI) and height) as predictors of newborn body composition. Methods Weight, height, head and mid-arm circumferences, skinfold thicknesses and external pelvic diameters were measured at 30 ± 2 weeks gestation in 571 pregnant women attending the antenatal clinic of the Holdsworth Memorial Hospital, Mysore, India. Paternal height and weight were also measured. At birth, detailed neonatal anthropometry was performed. Unrotated and varimax rotated PCA was applied to the maternal and neonatal measurements. Results Rotated PCA reduced maternal measurements to 4 independent components (fat, pelvis, height and muscle) and neonatal measurements to 3 components (trunk+head, fat, and leg length). An SD increase in maternal fat was associated with a 0.16 SD increase (β) in neonatal fat (p < 0.001, adjusted for gestation, maternal parity, newborn sex and socio-economic status). Maternal pelvis, height and (for male babies) muscle predicted neonatal trunk+head (β = 0.09 SD; p = 0.017, β = 0.12 SD; p = 0.006 and β = 0.27 SD; p < 0.001). In the mother-baby and father-baby comparison, maternal BMI predicted neonatal fat (β = 0.20 SD; p < 0.001) and neonatal trunk+head (β = 0.15 SD; p = 0.001). Both maternal (β = 0.12 SD; p = 0.002) and paternal height (β = 0.09 SD; p = 0.030) predicted neonatal trunk+head but the associations became weak and statistically non-significant in multivariate analysis. Only paternal height predicted neonatal leg length (β = 0.15 SD; p = 0.003). Conclusion Principal components analysis is a useful method to describe neonatal body composition and its determinants. Newborn adiposity is related to maternal nutritional status and parity, while newborn length is genetically determined. Further research is needed to understand mechanisms linking maternal pelvic size to fetal growth and the determinants and implications of the components (trunk v leg length) of fetal skeletal growth. PMID:19236724
Extracting chemical information from high-resolution Kβ X-ray emission spectroscopy
NASA Astrophysics Data System (ADS)
Limandri, S.; Robledo, J.; Tirao, G.
2018-06-01
High-resolution X-ray emission spectroscopy allows the study of the chemical environment of a wide variety of materials. Chemical information can be obtained by fitting the X-ray spectra and observing the behavior of some spectral features. Spectral changes can also be quantified by means of statistical parameters calculated by considering the spectrum as a probability distribution. Another possibility is to perform statistical multivariate analysis, such as principal component analysis. In this work the performance of these procedures for extracting chemical information from X-ray emission spectra of mixtures of Mn2+ and Mn4+ oxides is studied. A detailed analysis of the obtained parameters, as well as of the associated uncertainties, is presented. The methodologies are also applied to the characterization of the Mn oxidation state in the double perovskite oxides Ba1+xLa1-xMnSbO6 (with 0 ≤ x ≤ 0.7). The results show that statistical parameters and multivariate analysis are the most suitable for the analysis of this kind of spectra.
Guo, Hui; Zhang, Zhen; Yao, Yuan; Liu, Jialin; Chang, Ruirui; Liu, Zhao; Hao, Hongyuan; Huang, Taohong; Wen, Jun; Zhou, Tingting
2018-08-30
Semen sojae praeparatum, with its homology of medicine and food, is a famous traditional Chinese medicine. A simple and effective quality fingerprint analysis, coupled with chemometric methods, was developed for quality assessment of Semen sojae praeparatum. First, similarity analysis (SA) and hierarchical clustering analysis (HCA) were applied to select the qualitative markers that clearly influence the quality of Semen sojae praeparatum. Twenty-one chemicals were selected and characterized by high-resolution liquid chromatography coupled with ion trap/time-of-flight mass spectrometry (LC-IT-TOF-MS). Subsequently, principal components analysis (PCA) and orthogonal partial least squares discriminant analysis (OPLS-DA) were conducted to select the quantitative markers of Semen sojae praeparatum samples from different origins. Moreover, 11 compounds with statistical significance were determined quantitatively, which provided accurate and informative data for quality evaluation. This study proposes a new strategy of "statistical analysis-based fingerprint establishment", which would be a valuable reference for further study. Copyright © 2018 Elsevier Ltd. All rights reserved.
Zhi, Ruicong; Zhao, Lei; Xie, Nan; Wang, Houyin; Shi, Bolin; Shi, Jingye
2016-01-13
A framework for establishing a standard reference scale for texture is proposed, based on multivariate statistical analysis of instrumental measurements and sensory evaluation. Multivariate statistical analysis is conducted to rapidly select typical reference samples with the characteristics of universality, representativeness, stability, substitutability, and traceability. The soundness of the framework is verified by establishing a standard reference scale for the texture attribute of hardness using well-known Chinese foods. More than 100 food products in 16 categories were tested using instrumental measurement (TPA test), and the results were analyzed with cluster analysis, principal component analysis, relative standard deviation, and analysis of variance. As a result, nine kinds of foods were selected to construct the hardness standard reference scale. The results indicate that the regression between the estimated sensory value and the instrumentally measured value is significant (R(2) = 0.9765), which fits well with Stevens's theory. The research provides a reliable theoretical basis and practical guide for the quantitative establishment of standard reference scales for food texture characteristics.
Wavelet methodology to improve single unit isolation in primary motor cortex cells
Ortiz-Rosario, Alexis; Adeli, Hojjat; Buford, John A.
2016-01-01
The proper isolation of action potentials recorded extracellularly from neural tissue is an active area of research in the fields of neuroscience and biomedical signal processing. This paper presents an isolation methodology for neural recordings using the wavelet transform (WT), a statistical thresholding scheme, and the principal component analysis (PCA) algorithm. The effectiveness of five different mother wavelets was investigated: biorthogonal, Daubechies, discrete Meyer, symmetric, and Coifman; along with three different wavelet coefficient thresholding schemes: fixed form threshold, Stein's unbiased estimate of risk, and minimax; and two different thresholding rules: soft and hard thresholding. The signal quality was evaluated using three different statistical measures: mean-squared error, root-mean-square, and signal-to-noise ratio. The clustering quality was evaluated using two different statistical measures: isolation distance and L-ratio. This research shows that the selection of the mother wavelet has a strong influence on the clustering and isolation of single unit neural activity, with the Daubechies 4 wavelet and minimax thresholding scheme performing the best. PMID:25794461
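A hedged sketch of the processing chain this abstract describes (wavelet denoising of spike waveforms followed by PCA feature extraction) is shown below. The spike matrix is synthetic, the fixed-form (universal) threshold is used here as one of the schemes the paper evaluates, and KMeans is only a placeholder clustering step standing in for the isolation stage.

```python
import numpy as np
import pywt
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
spikes = rng.normal(size=(500, 64))   # 500 detected spike waveforms, 64 samples each (synthetic)

def wavelet_denoise(waveform, wavelet="db4", level=3):
    """Decompose with a Daubechies 4 wavelet, soft-threshold the detail coefficients, reconstruct."""
    coeffs = pywt.wavedec(waveform, wavelet, level=level)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745        # noise estimate from the finest detail level
    thr = sigma * np.sqrt(2 * np.log(len(waveform)))      # fixed-form (universal) threshold
    coeffs = [coeffs[0]] + [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(waveform)]

denoised = np.array([wavelet_denoise(w) for w in spikes])

# PCA features of the denoised waveforms, then a placeholder clustering step
features = PCA(n_components=3).fit_transform(denoised)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
print(np.bincount(labels))
```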
NASA Astrophysics Data System (ADS)
Geminale, A.; Grassi, D.; Altieri, F.; Serventi, G.; Carli, C.; Carrozzo, F. G.; Sgavetti, M.; Orosei, R.; D'Aversa, E.; Bellucci, G.; Frigeri, A.
2015-06-01
The aim of this work is to extract the surface contribution from the martian visible/near-infrared spectra by removing the atmospheric components by means of Principal Component Analysis (PCA) and target transformation (TT). The developed technique is suitable for separating spectral components in a data set large enough to enable an effective use of statistical methods, in support of the more common approaches used to remove the gaseous component. In this context, a key role is played by the estimation, from the spectral population, of the covariance matrix that describes the statistical correlation of the signal among different points in the spectrum. As a general rule, the covariance matrix becomes more and more meaningful as the size of the initial population increases, which justifies the importance of sizable datasets. Data collected by imaging spectrometers, such as the OMEGA (Observatoire pour la Minéralogie, l'Eau, les Glaces et l'Activité) instrument on board the ESA mission Mars Express (MEx), are particularly suitable for this purpose since they include in the same observation session a large number of spectra with different contents of aerosols, gases and mineralogy. The methodology presented in this work has first been validated using a simulated dataset of spectra to evaluate its accuracy. It has then been applied to the analysis of OMEGA sessions over the Nili Fossae and Mawrth Vallis regions, which have already been widely studied because of the presence of hydrated minerals. These minerals are key surface components for investigating the presence of liquid water flowing on the martian surface during the Noachian period. Moreover, since a correction for the atmospheric aerosol (dust) component is also applied to these observations, the present work is able to completely remove the atmospheric contribution from the analysed spectra. Once the surface reflectance, free from atmospheric contributions, has been obtained, the Modified Gaussian Model (MGM) has been applied to spectra showing the hydrated phase. Silicates and iron-bearing hydrated minerals have been identified by means of the electronic transitions of Fe2+ between 0.8 and 1.2 μm, while at longer wavelengths the hydrated mineralogy is identified by overtones of the OH group. Surface reflectance spectra, as derived through the method discussed in this paper, clearly show a lower level of atmospheric residuals in the 1.9 μm hydration band, thus resulting in a better match with the MGM deconvolution parameters found for the laboratory spectra of martian hydrated mineral analogues and allowing a deeper investigation of this spectral range.
Introduction to uses and interpretation of principal component analyses in forest biology.
J. G. Isebrands; Thomas R. Crow
1975-01-01
The application of principal component analysis for interpretation of multivariate data sets is reviewed with emphasis on (1) reduction of the number of variables, (2) ordination of variables, and (3) applications in conjunction with multiple regression.
Principal component analysis of phenolic acid spectra
USDA-ARS?s Scientific Manuscript database
Phenolic acids are common plant metabolites that exhibit bioactive properties and have applications in functional food and animal feed formulations. The ultraviolet (UV) and infrared (IR) spectra of four closely related phenolic acid structures were evaluated by principal component analysis (PCA) to...
Optimal pattern synthesis for speech recognition based on principal component analysis
NASA Astrophysics Data System (ADS)
Korsun, O. N.; Poliyev, A. V.
2018-02-01
An algorithm for building an optimal pattern for automatic speech recognition, which increases the probability of correct recognition, is developed and presented in this work. The optimal pattern is formed by decomposing an initial pattern into principal components, which reduces the dimensionality of the multi-parameter optimization problem. In the next step, training samples are introduced and optimal estimates of the principal component decomposition coefficients are obtained with a numerical parameter optimization algorithm. Finally, experimental results are presented that show the improvement in speech recognition achieved by the proposed optimization algorithm.
NASA Astrophysics Data System (ADS)
Gao, Yang; Chen, Maomao; Wu, Junyu; Zhou, Yuan; Cai, Chuangjian; Wang, Daliang; Luo, Jianwen
2017-09-01
Fluorescence molecular imaging has been used to target tumors in mice with xenograft tumors. However, tumor imaging is largely distorted by the aggregation of fluorescent probes in the liver. A principal component analysis (PCA)-based strategy was applied to the in vivo dynamic fluorescence imaging results of three mice with xenograft tumors to facilitate tumor imaging, with the help of a tumor-specific fluorescent probe. Tumor-relevant features were extracted from the original images by PCA and represented by the principal component (PC) maps. The second principal component (PC2) map represented the tumor-related features, and the first principal component (PC1) map retained the original pharmacokinetic profiles, especially of the liver. The distribution patterns of the PC2 map of the tumor-bearing mice were in good agreement with the actual tumor location. The tumor-to-liver ratio and contrast-to-noise ratio were significantly higher on the PC2 map than on the original images, thus distinguishing the tumor from the nearby fluorescence noise of the liver. The results suggest that the PC2 map could serve as a bioimaging marker to facilitate in vivo tumor localization, and dynamic fluorescence molecular imaging with PCA could be a valuable tool for future studies of in vivo tumor metabolism and progression.
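One plausible way to reproduce the PC-map construction described above is to treat each pixel's intensity time course as an observation and compute per-pixel component scores. Whether the authors used exactly this arrangement is not stated in the abstract, so the sketch below (with a synthetic frame stack) is an assumption-laden illustration rather than the study's method.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
frames = rng.random(size=(60, 128, 128))   # 60 time frames of a 128x128 fluorescence image (synthetic)

n_t, h, w = frames.shape
# Each pixel is an observation, each time point a variable (one possible convention)
pixels_by_time = frames.reshape(n_t, h * w).T          # shape (n_pixels, n_frames)

pca = PCA(n_components=2).fit(pixels_by_time)           # PCA centres the data internally
scores = pca.transform(pixels_by_time)                  # per-pixel component scores

pc1_map = scores[:, 0].reshape(h, w)   # expected to retain the dominant (e.g., liver) kinetics
pc2_map = scores[:, 1].reshape(h, w)   # expected to highlight tumour-related kinetics
print(pc1_map.shape, pc2_map.shape)
```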
NASA Astrophysics Data System (ADS)
Ueki, Kenta; Iwamori, Hikaru
2017-10-01
In this study, with a view to understanding the structure of high-dimensional geochemical data and discussing the chemical processes at work in the evolution of arc magmas, we employed principal component analysis (PCA) to evaluate the compositional variations of volcanic rocks from the Sengan volcanic cluster of the Northeastern Japan Arc. We analyzed the trace element compositions of various arc volcanic rocks, sampled from 17 different volcanoes in the volcanic cluster. The PCA results demonstrated that the first three principal components accounted for 86% of the geochemical variation in the magma of the Sengan region. Based on the relationships between the principal components and the major elements, the mass-balance relationships with respect to the contributions of minerals, the composition of plagioclase phenocrysts, geothermal gradient, and seismic velocity structure in the crust, the first, the second, and the third principal components appear to represent magma mixing, crystallization of olivine/pyroxene, and crystallization of plagioclase, respectively. These represented 59%, 20%, and 6%, respectively, of the variance in the entire compositional range, indicating that magma mixing accounted for the largest variance in the geochemical variation of the arc magma. Our results indicate that crustal processes dominate the geochemical variation of magma in the Sengan volcanic cluster.
Prevalence of Dental Caries Among Primary School Children of India – A Cross-Sectional Study
Hiremath, Anand; Ankola, Anil V; Hebbal, Mamata; Mohandoss, Suganya; Pastay, Pratibha
2016-01-01
Introduction In India, the trend indicates an increase in oral health problems, especially dental caries, which has been consistently increasing both in prevalence and in severity. Children of all age groups are affected by dental caries. It becomes imperative to collect data on the prevalence of dental caries and treatment needs to provide preventive care. Aim To assess the prevalence of dental caries and treatment needs of 6-11-year-old Indian school children. Materials and Methods This was a cross-sectional study. The sampling frame consisted of 6-11-year-old primary school children. The study sample consisted of 13,200 children selected from 10 talukas of Belgavi District, Karnataka, India. Clinical examination for dmft and DMFT was carried out in the school premises by five teams, each consisting of one faculty, three postgraduate students and five interns from the KLE VK Institute of Dental Sciences, Belagavi, Karnataka, India. The examiners were trained and calibrated by the principal investigator. Statistical analysis was done using the Chi-square test and t-test. Results The overall caries prevalence was 78.9%, mean dmft was 2.97±2.62 and mean DMFT was 0.17±0.53. The decayed teeth component was the principal component in both dmft and DMFT indices. The mean dmft in boys was higher compared to girls and the difference was found to be statistically significant (p<0.05). Conclusion This study provided us with the baseline data, using which treatment was provided to all the children screened. The children were provided treatment at the camp site/dental hospital/satellite centers and primary health care centers according to the facilities available. PMID:27891457
NASA Astrophysics Data System (ADS)
Singh, Asha Lata; Singh, Vipin Kumar
2018-06-01
A total of 22 water quality parameters were selected for the analysis of groundwater samples with reference to arsenic contamination. Samples were collected in the pre-monsoon and monsoon seasons of the year 2013. The maximum arsenic concentration was approximately the same in both seasons: 75.60 µg/L in the pre-monsoon and 74.46 µg/L in the monsoon season. Out of 72 collected samples, three were below the WHO guideline value of 10 µg/L for arsenic concentration. In 95.83% of the groundwater samples, the arsenic concentration was above the permissible limit. Nickel, manganese, and chromium concentrations were above the permissible limits in nearly all samples except for chromium concentration in a few pre-monsoon samples. However, the total iron concentrations in 23 samples (31.94%) were above the permissible limit. A total of six and seven principal components (PCs) were extracted using principal component analysis during the pre-monsoon and monsoon seasons, respectively, accounting for 76.25 and 78.52% of the total variation during the two consecutive seasons. Correlation statistics revealed that the arsenic concentration was positively correlated with phosphate, iron, ammonium, bicarbonate, and manganese concentrations but negatively correlated with oxidation reduction potential (ORP), sulfate concentration, electrical conductivity, and total dissolved solids concentration. The negative correlation of arsenic with ORP suggested reducing conditions prevailing in the groundwater. The trilinear Piper diagram revealed calcium and magnesium enrichment of groundwater with an abundance of chloride ions but no predominance of bicarbonate ions. Thus, the groundwater fell into the Ca2+ - Mg2+ - Cl- - SO42- category.
Integrated Approaches On Archaeo-Geophysical Data
NASA Astrophysics Data System (ADS)
Kucukdemirci, M.; Piro, S.; Zamuner, D.; Ozer, E.
2015-12-01
Key words: Ground Penetrating Radar (GPR), Magnetometry, Geophysical Data Integration, Principal Component Analysis (PCA), Aizanoi Archaeological Site. Geophysical data integration methods are commonly divided into two classes: qualitative and quantitative approaches. This work focused on the application of quantitative integration approaches, which involve mathematical and statistical integration techniques, to the archaeo-geophysical data obtained at the Aizanoi Archaeological Site, Turkey. Two geophysical methods, Ground Penetrating Radar (GPR) and magnetometry, were applied for archaeological prospection at the selected site. After basic data processing for each geophysical method, the mathematical approaches of Sums and Products and the statistical approach of Principal Component Analysis (PCA) were applied for the integration. These integration approaches were first tested on synthetic digital images before being applied to field data. The same approaches were then applied to 2D magnetic maps and 2D GPR time slices obtained on the same unit grids at the archaeological site. Initially, the geophysical data were examined individually by referencing archaeological maps and information obtained from archaeologists, and some important structures such as possible walls, roads and relics were identified. The results of all integration approaches provided very important and complementary details about the anomalies related to archaeological features. The integrated images can thus provide complementary information about the archaeological relics under the ground. Acknowledgements The authors would like to thank the Scientific and Technological Research Council of Turkey (TUBITAK), Fellowship for Visiting Scientists Programme for their support, the Istanbul University Scientific Research Project Fund (Project No: 12302) and the archaeologist team of the Aizanoi Archaeological Site for their support during the field work.
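A compact sketch of the quantitative integrations named above (sums, products, and PCA) applied to two co-registered grids is given below. The grids are random placeholders for the magnetic map and GPR time slice, and the standardisation choice is an assumption rather than the authors' exact processing.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(5)
gpr_slice = rng.random(size=(100, 100))    # GPR time-slice amplitude grid (synthetic)
magnetic = rng.random(size=(100, 100))     # magnetometry grid on the same 100x100 cells (synthetic)

# Stack the two co-registered grids as two "bands" per cell and standardise them
stack = np.column_stack([gpr_slice.ravel(), magnetic.ravel()])
stack = (stack - stack.mean(axis=0)) / stack.std(axis=0)

# Sums and Products integration (the mathematical approaches named above)
fused_sum = stack.sum(axis=1).reshape(100, 100)
fused_prod = np.prod(stack, axis=1).reshape(100, 100)

# PCA integration: PC1 carries the variance shared by both methods
pc1 = PCA(n_components=1).fit_transform(stack).reshape(100, 100)
print(fused_sum.shape, fused_prod.shape, pc1.shape)
```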
Pistolis, John; Zimeras, Stelios; Chardalias, Kostas; Roupa, Zoe; Fildisis, George; Diomidous, Marianna
2016-06-01
Social networks (1) have been embedded in our daily life for a long time. They constitute a powerful tool used nowadays for both searching for and exchanging information on different issues through Internet search engines (Google, Bing, etc.) and social networks (Facebook, Twitter, etc.). In this paper, the results are presented of a study on the frequency and type of usage of the Internet and social networks by the general public and health professionals. The objectives of the research were focused on investigating how frequently both individuals and health practitioners seek and meticulously search for health information in the social media. The exchange of information is a procedure that involves issues of the reliability and quality of information. In this research, advanced statistical techniques are used to investigate the participants' profiles in using social networks for searching for and exchanging information on health issues. Based on the answers, 93% of the people use the Internet to find information on health subjects. Considering principal component analysis, the most important health subjects were nutrition (0.719%), respiratory issues (0.79%), cardiological issues (0.777%), psychological issues (0.667%) and total (73.8%). The research results, based on different statistical techniques, revealed that 61.2% of the males and 56.4% of the females intended to use social networks for searching for medical information. Based on the principal components analysis, the most important sources the participants mentioned were the use of the Internet and social networks for exchanging information on health issues. These sources proved to be of paramount importance to the participants of the study. The same holds for nursing, medical and administrative staff in hospitals.
McAlinden, Colm; Pesudovs, Konrad; Moore, Jonathan E
2010-11-01
To develop an instrument to measure subjective quality of vision: the Quality of Vision (QoV) questionnaire. A 30-item instrument was designed with 10 symptoms rated in each of three scales (frequency, severity, and bothersome). The QoV was completed by 900 subjects in groups of spectacle wearers, contact lens wearers, and those having had laser refractive surgery, intraocular refractive surgery, or eye disease and investigated with Rasch analysis and traditional statistics. Validity and reliability were assessed by Rasch fit statistics, principal components analysis (PCA), person separation, differential item functioning (DIF), item targeting, construct validity (correlation with visual acuity, contrast sensitivity, total root mean square [RMS] higher order aberrations [HOA]), and test-retest reliability (two-way random intraclass correlation coefficients [ICC] and 95% repeatability coefficients [R(c)]). Rasch analysis demonstrated good precision, reliability, and internal consistency for all three scales (mean square infit and outfit within 0.81-1.27; PCA >60% variance explained by the principal component; person separation 2.08, 2.10, and 2.01 respectively; and minimal DIF). Construct validity was indicated by strong correlations with visual acuity, contrast sensitivity and RMS HOA. Test-retest reliability was evidenced by a minimum ICC of 0.867 and a minimum 95% R(c) of 1.55 units. The QoV Questionnaire consists of a Rasch-tested, linear-scaled, 30-item instrument on three scales providing a QoV score in terms of symptom frequency, severity, and bothersome. It is suitable for measuring QoV in patients with all types of refractive correction, eye surgery, and eye disease that cause QoV problems.
Maranesi, E; Merlo, A; Fioretti, S; Zemp, D D; Campanini, I; Quadri, P
2016-02-01
Identification of future non-fallers, infrequent and frequent fallers among older people would permit focusing the delivery of prevention programs on selected individuals. Posturographic parameters have been proven to differentiate between non-fallers and frequent fallers, but not between the first group and infrequent fallers. In this study, postural stability with eyes open and closed on both a firm and a compliant surface and while performing a cognitive task was assessed in a consecutive sample of 130 cognitively able elderly, mean age 77 (7) years, categorized as non-fallers (N=67), infrequent fallers (one/two falls, N=45) and frequent fallers (more than two falls, N=18) according to their fall history over the previous year. Principal Component Analysis was used to select the most significant features from a set of 17 posturographic parameters. Next, variables derived from principal component analysis were used to test, in each task, differences among the three groups. One parameter based on a combination of a set of Centre of Pressure anterior-posterior variables obtained from the eyes-open on a compliant surface task was statistically different among all groups, thus distinguishing infrequent fallers from both non-fallers (P<0.05) and frequent fallers (P<0.05). For the first time, a method based on posturographic data to retrospectively discriminate infrequent fallers was obtained. The joint use of both the eyes-open on a compliant surface condition and this new parameter could be used, in a future study, to improve the performance of protocols and to verify the ability of this method to identify new fallers in elderly people without cognitive impairment. Copyright © 2015 Elsevier Ltd. All rights reserved.
ERIC Educational Resources Information Center
Kronenberger, William G.; Thompson, Robert J., Jr.; Morrow, Catherine
1997-01-01
A principal components analysis of the Family Environment Scale (FES) (R. Moos and B. Moos, 1994) was performed using 113 undergraduates. Research supported 3 broad components encompassing the 10 FES subscales. These results supported previous research and the generalization of the FES to college samples. (SLD)
Time series analysis of collective motions in proteins
NASA Astrophysics Data System (ADS)
Alakent, Burak; Doruker, Pemra; Çamurdan, Mehmet C.
2004-01-01
The dynamics of α-amylase inhibitor tendamistat around its native state is investigated using time series analysis of the principal components of the Cα atomic displacements obtained from molecular dynamics trajectories. Collective motion along a principal component is modeled as a homogeneous nonstationary process, which is the result of the damped oscillations in local minima superimposed on a random walk. The motion in local minima is described by a stationary autoregressive moving average model, consisting of the frequency, damping factor, moving average parameters and random shock terms. Frequencies for the first 50 principal components are found to be in the 3-25 cm-1 range, which are well correlated with the principal component indices and also with atomistic normal mode analysis results. Damping factors, though their correlation is less pronounced, decrease as principal component indices increase, indicating that low frequency motions are less affected by friction. The existence of a positive moving average parameter indicates that the stochastic force term is likely to disturb the mode in opposite directions for two successive sampling times, showing the mode's tendency to stay close to the minimum. All these four parameters affect the mean square fluctuations of a principal mode within a single minimum. The inter-minima transitions are described by a random walk model, which is driven by a random shock term considerably smaller than that for the intra-minimum motion. The principal modes are classified into three subspaces based on their dynamics: essential, semiconstrained, and constrained, at least in partial consistency with previous studies. The Gaussian-type distributions of the intermediate modes, called "semiconstrained" modes, are explained by asserting that this random walk behavior is not completely free but confined between energy barriers.
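The intra-minimum model described above (a stationary ARMA process whose AR(2) part encodes a frequency and a damping factor) can be illustrated on synthetic data. The sketch below simulates an AR(2) "damped oscillation" series as a stand-in for a principal-component projection and recovers the two parameters from a fitted ARMA(2,1) model; all numerical values are assumptions.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(6)

# Simulate a damped oscillation in a local minimum: an AR(2) process with
# known damping factor r and angular frequency w driven by random shocks
r_true, w_true = 0.95, 0.30
phi1, phi2 = 2 * r_true * np.cos(w_true), -r_true**2
x = np.zeros(3000)
for t in range(2, 3000):
    x[t] = phi1 * x[t - 1] + phi2 * x[t - 2] + rng.normal()

# Fit an ARMA(2,1) model (AR order 2, MA order 1, no differencing, no trend)
fit = ARIMA(x, order=(2, 0, 1), trend="n").fit()
ar1, ar2 = fit.arparams

# Map the estimated AR(2) coefficients back to damping factor and frequency
r_hat = np.sqrt(-ar2)
w_hat = np.arccos(np.clip(ar1 / (2 * r_hat), -1, 1))
print(f"estimated damping {r_hat:.3f} (true {r_true}), "
      f"frequency {w_hat:.3f} rad/step (true {w_true})")
```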
Burst and Principal Components Analyses of MEA Data Separates Chemicals by Class
Microelectrode arrays (MEAs) detect drug and chemical induced changes in action potential "spikes" in neuronal networks and can be used to screen chemicals for neurotoxicity. Analytical "fingerprinting," using Principal Components Analysis (PCA) on spike trains recorded from prim...
EVALUATION OF ACID DEPOSITION MODELS USING PRINCIPAL COMPONENT SPACES
An analytical technique involving principal components analysis is proposed for use in the evaluation of acid deposition models. Relationships among model predictions are compared to those among measured data, rather than the more common one-to-one comparison of predictions to mea...
Paramasivam, Sivakumar; Gronenborn, Angela M; Polenova, Tatyana
2018-08-01
Chemical shift tensors (CSTs) are an exquisite probe of local geometric and electronic structure. 15N CSTs are very sensitive to hydrogen bonding, yet they have been reported for very few proteins to date. Here we present experimental results and statistical analysis of backbone amide 15N CSTs for 100 residues of four proteins, two E. coli thioredoxin reassemblies (1-73-(U-13C,15N)/74-108-(U-15N) and 1-73-(U-15N)/74-108-(U-13C,15N)), dynein light chain 8 LC8, and the CAP-Gly domain of mammalian dynactin. The 15N CSTs were measured by a symmetry-based CSA recoupling method, ROCSA. Our results show that the principal component δ11 is very sensitive to the presence of hydrogen bonding interactions due to its unique orientation in the molecular frame. The downfield chemical shift change of backbone amide nitrogen nuclei with increasing hydrogen bond strength is manifested in the negative correlation of the principal components with hydrogen bond distance for both α-helical and β-sheet secondary structure elements. Our findings highlight the potential for the use of 15N CSTs in protein structure refinement. Copyright © 2018 Elsevier Inc. All rights reserved.
Incorporating principal component analysis into air quality ...
The efficacy of standard air quality model evaluation techniques is becoming compromised as the simulation periods continue to lengthen in response to ever increasing computing capacity. Accordingly, the purpose of this paper is to demonstrate a statistical approach called Principal Component Analysis (PCA) with the intent of motivating its use by the evaluation community. One of the main objectives of PCA is to identify, through data reduction, the recurring and independent modes of variations (or signals) within a very large dataset, thereby summarizing the essential information of that dataset so that meaningful and descriptive conclusions can be made. In this demonstration, PCA is applied to a simple evaluation metric – the model bias associated with EPA's Community Multi-scale Air Quality (CMAQ) model when compared to weekly observations of sulfate (SO42−) and ammonium (NH4+) ambient air concentrations measured by the Clean Air Status and Trends Network (CASTNet). The advantages of using this technique are demonstrated as it identifies strong and systematic patterns of CMAQ model bias across a myriad of spatial and temporal scales that are neither constrained to geopolitical boundaries nor monthly/seasonal time periods (a limitation of many current studies). The technique also identifies locations (station–grid cell pairs) that are used as indicators for a more thorough diagnostic evaluation thereby hastening and facilitating understanding of the prob
Chen, Yushun; Viadero, Roger C; Wei, Xinchao; Fortney, Ronald; Hedrick, Lara B; Welsh, Stuart A; Anderson, James T; Lin, Lian-Shin
2009-01-01
Refining best management practices (BMPs) for future highway construction depends on a comprehensive understanding of environmental impacts from current construction methods. Based on a before-after-control-impact (BACI) experimental design, long-term stream monitoring (1997-2006) was conducted at upstream (as control, n = 3) and downstream (as impact, n = 6) sites in the Lost River watershed of the Mid-Atlantic Highlands region, West Virginia. Monitoring data were analyzed to assess impacts, during and after highway construction, on 15 water quality parameters and on macroinvertebrate condition using the West Virginia stream condition index (WVSCI). Principal components analysis (PCA) identified regional primary water quality variances, and paired t tests and time series analysis detected seven highway construction-impacted water quality parameters, which were mainly associated with the second principal component. In particular, impacts on turbidity, total suspended solids, and total iron during construction, impacts on chloride and sulfate during and after construction, and impacts on acidity and nitrate after construction were observed at the downstream sites. The construction had statistically significant impacts on macroinvertebrate index scores (i.e., WVSCI) after construction, but did not change the overall good biological condition. Implementing BMPs that address those construction-impacted water quality parameters can be an effective mitigation strategy for future highway construction in this highlands region.
Local Geographic Variation of Public Services Inequality: Does the Neighborhood Scale Matter?
Wei, Chunzhu; Cabrera-Barona, Pablo; Blaschke, Thomas
2016-01-01
This study aims to explore the effect of the neighborhood scale when estimating public services inequality based on the aggregation of social, environmental, and health-related indicators. Inequality analyses were carried out at three neighborhood scales: the original census blocks and two aggregated neighborhood units generated by the spatial “k”luster analysis by the tree edge removal (SKATER) algorithm and the self-organizing map (SOM) algorithm. Then, we combined a set of health-related public services indicators with the geographically weighted principal components analyses (GWPCA) and the principal components analyses (PCA) to measure the public services inequality across all multi-scale neighborhood units. Finally, a statistical test was applied to evaluate the scale effects in inequality measurements by combining all available field survey data. We chose Quito as the case study area. All of the aggregated neighborhood units performed better than the original census blocks in terms of the social indicators extracted from a field survey. The SKATER and SOM algorithms can help to define the neighborhoods in inequality analyses. Moreover, GWPCA performs better than PCA in multivariate spatial inequality estimation. Understanding the scale effects is essential to sustain a social neighborhood organization, which, in turn, positively affects social determinants of public health and public quality of life. PMID:27706072
Terahertz-dependent identification of simulated hole shapes in oil-gas reservoirs
NASA Astrophysics Data System (ADS)
Bao, Ri-Ma; Zhan, Hong-Lei; Miao, Xin-Yang; Zhao, Kun; Feng, Cheng-Jing; Dong, Chen; Li, Yi-Zhang; Xiao, Li-Zhi
2016-10-01
Detecting holes in oil-gas reservoirs is vital to the evaluation of reservoir potential. The main objective of this study is to demonstrate the feasibility of identifying general micro-hole shapes, including triangular, circular, and square shapes, in oil-gas reservoirs by adopting terahertz time-domain spectroscopy (THz-TDS). We evaluate the THz absorption responses of punched silicon (Si) wafers having micro-holes with sizes of 20-500 μm. Principal component analysis (PCA) is used to establish a model between THz absorbance and hole shapes. The positions of samples in the three-dimensional space spanned by three principal components are used to determine the differences among diverse hole shapes and the homogeneity of similar shapes. In addition, unknown hole shapes in a new Si wafer, including triangular, circular, and square, can be qualitatively identified by combining THz-TDS and PCA. Therefore, the combination of THz-TDS with mathematical statistical methods can serve as an effective approach to the rapid identification of micro-hole shapes in oil-gas reservoirs. Project supported by the National Natural Science Foundation of China (Grant No. 61405259), the National Basic Research Program of China (Grant No. 2014CB744302), and the Specially Founded Program on National Key Scientific Instruments and Equipment Development, China (Grant No. 2012YQ140005).
Villalba-Mora, Elena; Casas, Isabel; Lupiañez-Villanueva, Francisco; Maghiros, Ioannis
2015-07-01
We investigated the level of adoption of Health Information Technologies (HIT) services, and the factors that influence this, amongst specialised and primary care physicians in Andalusia, Spain. We analysed the physicians' responses to an online survey. First, we performed a descriptive statistical analysis of the data; thereafter, a principal component analysis; and finally we estimated an ordered logit model to explain the effect of use on adoption and to analyse the existing barriers. The principal component analysis revealed three main uses of Health Information Technologies: Electronic Health Records (EHR), ePrescription and patient management, and telemedicine services. Results from the ordered logit model showed that the frequency of use of HIT is associated with the physicians' perceived usefulness. Lack of financing appeared as a common barrier to the adoption of the three types of services. For ePrescription and patient management, the physicians' lack of skills is still a barrier. In the case of telemedicine services, lack of security and lack of interest amongst professionals are the existing barriers. EHR functionalities are fully adopted, in terms of perceived usefulness. EPrescription and patient management are almost fully adopted, while telemedicine is in an early stage of adoption. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Fadil, Mouhcine; Farah, Abdellah; Ihssane, Bouchaib; Haloui, Taoufik; Lebrazi, Sara; Zghari, Badreddine; Rachiq, Saâd
2016-01-01
To investigate the effect of environmental factors such as light and shade on essential oil yield and morphological traits of Moroccan Myrtus communis, a chemometric study was conducted on 20 individuals growing under two contrasting light environments. The study of the individuals' parameters by principal component analysis showed that essential oil yield, altitude, and leaf thickness were positively correlated with each other and negatively correlated with plant height, leaf length and leaf width. Principal component analysis and hierarchical cluster analysis also showed that the individuals from each sampling site were grouped separately. The one-way ANOVA test confirmed the effect of light and shade on essential oil yield and morphological parameters, showing a statistically significant difference between the shaded and the sunny side. Finally, a multiple linear model containing main, interaction and quadratic terms was chosen for the modeling of essential oil yield in terms of morphological parameters. Sun plants are shorter, with smaller leaf length and width, but have thicker leaves and are richer in essential oil than shade plants, which show almost the opposite pattern. The highlighted multiple linear model can be used to predict essential oil yield in the studied area.
The fine-scale genetic structure and evolution of the Japanese population
Katsuya, Tomohiro; Kimura, Ryosuke; Nabika, Toru; Isomura, Minoru; Ohkubo, Takayoshi; Tabara, Yasuharu; Yamamoto, Ken; Yokota, Mitsuhiro; Liu, Xuanyao; Saw, Woei-Yuh; Mamatyusupu, Dolikun; Yang, Wenjun; Xu, Shuhua
2017-01-01
The contemporary Japanese populations largely consist of three genetically distinct groups—Hondo, Ryukyu and Ainu. By principal-component analysis, while the three groups can be clearly separated, the Hondo people, comprising 99% of the Japanese, form one almost indistinguishable cluster. To understand fine-scale genetic structure, we applied powerful haplotype-based statistical methods to genome-wide single nucleotide polymorphism data from 1600 Japanese individuals, sampled from eight distinct regions in Japan. We then combined the Japanese data with 26 other Asian populations data to analyze the shared ancestry and genetic differentiation. We found that the Japanese could be separated into nine genetic clusters in our dataset, showing a marked concordance with geography; and that major components of ancestry profile of Japanese were from the Korean and Han Chinese clusters. We also detected and dated admixture in the Japanese. While genetic differentiation between Ryukyu and Hondo was suggested to be caused in part by positive selection, genetic differentiation among the Hondo clusters appeared to result principally from genetic drift. Notably, in Asians, we found the possibility that positive selection accentuated genetic differentiation among distant populations but attenuated genetic differentiation among close populations. These findings are significant for studies of human evolution and medical genetics. PMID:29091727
Blind deconvolution with principal components analysis for wide-field and small-aperture telescopes
NASA Astrophysics Data System (ADS)
Jia, Peng; Sun, Rongyu; Wang, Weinan; Cai, Dongmei; Liu, Huigen
2017-09-01
Telescopes with a wide field of view (greater than 1°) and small apertures (less than 2 m) are workhorses for observations such as sky surveys and fast-moving object detection, and play an important role in time-domain astronomy. However, images captured by these telescopes are contaminated by optical system aberrations, atmospheric turbulence, tracking errors and wind shear. To increase the quality of images and maximize their scientific output, we propose a new blind deconvolution algorithm based on statistical properties of the point spread functions (PSFs) of these telescopes. In this new algorithm, we first construct the PSF feature space through principal component analysis, and then classify PSFs from different positions and times using a self-organizing map. According to the classification results, we group images with the same PSF type and select their PSFs to construct a prior PSF. The prior PSF is then used to restore these images. To investigate the improvement that this algorithm provides for data reduction, we process images of space debris captured by our small-aperture wide-field telescopes. Comparing the reduced results of the original images and the images processed with the standard Richardson-Lucy method, our method shows a promising improvement in astrometry accuracy.
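The PSF feature-space construction can be sketched as below. The PSF cutouts are synthetic, KMeans is used as an explicit stand-in for the paper's self-organizing map classification step, and the class-wise mean is one simple way to form a prior PSF.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(7)
psfs = rng.random(size=(300, 32, 32))      # 300 PSF cutouts, 32x32 pixels each (synthetic)

# Build the PSF feature space by PCA on flattened, flux-normalised cutouts
flat = psfs.reshape(len(psfs), -1)
flat = flat / flat.sum(axis=1, keepdims=True)
features = PCA(n_components=10).fit_transform(flat)

# Classify PSFs by type; KMeans here replaces the self-organizing map of the paper
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(features)

# A prior PSF per class: the mean of its members, to seed the blind deconvolution
prior_psfs = np.array([psfs[labels == k].mean(axis=0) for k in range(4)])
print(prior_psfs.shape)
```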
NASA Astrophysics Data System (ADS)
Baglivo, Fabricio Hugo; Arini, Pedro David
2011-12-01
Electrocardiographic repolarization abnormalities can be detected by Principal Components Analysis of the T-wave. In this work we studied the effect of signal averaging on the mean value and reproducibility of the ratio of the 2nd to the 1st eigenvalue of T-wave (T21W) and the absolute and relative T-wave residuum (TrelWR and TabsWR) in the ECG during ischemia induced by Percutaneous Coronary Intervention. Also, the intra-subject and inter-subject variability of T-wave parameters have been analyzed. Results showed that TrelWR and TabsWR evaluated from the average of 10 complexes had lower values and higher reproducibility than those obtained from 1 complex. On the other hand T21W calculated from 10 complexes did not show statistical differences versus the T21W calculated on single beats. The results of this study corroborate that, with a signal averaging technique, the 2nd and the 1st eigenvalue are not affected by noise while the 4th to 8th eigenvalues are so much affected by this, suggesting the use of the signal averaged technique before calculation of absolute and relative T-wave residuum. Finally, we have shown that T-wave morphology parameters present high intra-subject stability.
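The eigenvalue-based T-wave descriptors named above can be computed from a multi-lead T-wave matrix via the inter-lead covariance. The sketch below uses synthetic data from 8 independent leads and the commonly used convention that the residuum collects the 4th to 8th eigenvalues; the exact definitions used in the study may differ.

```python
import numpy as np

rng = np.random.default_rng(8)
# Averaged T-wave samples from 8 independent ECG leads (synthetic): 8 leads x 120 samples
twave = rng.normal(size=(8, 120))

# Eigenvalues of the inter-lead covariance of the T-wave, sorted in descending order
cov = np.cov(twave)
eig = np.sort(np.linalg.eigvalsh(cov))[::-1]

t21w = eig[1] / eig[0]          # ratio of the 2nd to the 1st eigenvalue
tabs_wr = eig[3:].sum()         # absolute T-wave residuum: 4th-8th eigenvalues (assumed convention)
trel_wr = tabs_wr / eig.sum()   # relative T-wave residuum
print(f"T21W = {t21w:.3f}, TabsWR = {tabs_wr:.3f}, TrelWR = {trel_wr:.3f}")
```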
Haskard-Zolnierek, Kelly B
2012-01-01
This paper describes the development of the 47-item Physician-Patient Communication about Pain (PCAP) scale for use with audiotaped medical visit interactions. Patient pain was assessed with the Medical Outcomes Study SF-36 Bodily Pain Scale. Four raters assessed 181 audiotaped patient interactions with 68 physicians. Descriptive statistics of PCAP items were computed. Principal components analyses with 20 scale items were used to reduce the scale to composite variables for analyses. Validity was assessed through (1) comparing PCAP composite scores for patients with high versus low pain and (2) correlating PCAP composites with a separate communication rating scale. Principal components analyses yielded four physician and five patient communication composites (mean alpha=.77). Some evidence for concurrent validity was provided (5 of 18 correlations with communication validation rating scale were significant). Paired-sample t tests showed significant differences for 4 patient PCAP composites, showing the PCAP scale discriminates between high and low pain patients' communication. The PCAP scale shows partial evidence of reliability and two forms of validity. More research with this scale (developing more reliable and valid composites) is needed to extend these preliminary findings before this scale is applicable for use in practice. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
Principal components analysis in clinical studies.
Zhang, Zhongheng; Castelló, Adela
2017-09-01
In multivariate analysis, independent variables are usually correlated with each other, which can introduce multicollinearity into the regression models. One approach to solve this problem is to apply principal components analysis (PCA) over these variables. This method uses orthogonal transformation to represent sets of potentially correlated variables with principal components (PC) that are linearly uncorrelated. PCs are ordered so that the first PC has the largest possible variance and only some components are selected to represent the correlated variables. As a result, the dimension of the variable space is reduced. This tutorial illustrates how to perform PCA in the R environment; the example is a simulated dataset in which two PCs are responsible for the majority of the variance in the data. Furthermore, the visualization of PCA is highlighted.
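The tutorial itself works in R; an equivalent minimal sketch in Python is given below, with a deliberately collinear pair of predictors to show how regressing on a reduced set of principal component scores sidesteps the multicollinearity. The data and the number of retained components are illustrative assumptions.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(9)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)          # nearly collinear with x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 2 * x1 + 0.5 * x3 + rng.normal(size=n)

# Replace the correlated predictors with a reduced set of uncorrelated PC scores
Z = StandardScaler().fit_transform(X)
pca = PCA(n_components=2).fit(Z)             # keep the components carrying most of the variance
scores = pca.transform(Z)

pcr = LinearRegression().fit(scores, y)      # principal component regression
print("variance explained by kept PCs:", pca.explained_variance_ratio_.round(3))
print("R^2 of the PC regression:", round(pcr.score(scores, y), 3))
```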
Complexity of free energy landscapes of peptides revealed by nonlinear principal component analysis.
Nguyen, Phuong H
2006-12-01
Employing the recently developed hierarchical nonlinear principal component analysis (NLPCA) method of Saegusa et al. (Neurocomputing 2004;61:57-70 and IEICE Trans Inf Syst 2005;E88-D:2242-2248), the complexities of the free energy landscapes of several peptides, including triglycine, hexaalanine, and the C-terminal beta-hairpin of protein G, were studied. First, the performance of this NLPCA method was compared with the standard linear principal component analysis (PCA). In particular, we compared the two methods according to (1) their ability to reduce dimensionality and (2) how efficiently they represent peptide conformations in low-dimensional spaces spanned by the first few principal components. The study revealed that NLPCA reduces the dimensionality of the considered systems much more effectively than PCA. For example, to achieve a similar error in representing the original beta-hairpin data in a low-dimensional space, one needs 4 principal components with NLPCA but 21 with PCA. Second, by representing the free energy landscapes of the considered systems as functions of the first two principal components obtained from PCA, we obtained relatively well-structured free energy landscapes. In contrast, the free energy landscapes from NLPCA are much more complicated, exhibiting many states that are hidden in the PCA maps, especially in the unfolded regions. Furthermore, the study also showed that many states in the PCA maps mix several peptide conformations, while those in the NLPCA maps are purer. This finding suggests that NLPCA should be used to capture the essential features of these systems. (c) 2006 Wiley-Liss, Inc.
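The dimensionality comparison above comes down to how quickly reconstruction error decays as principal components are added. The sketch below illustrates only the linear-PCA side of that comparison on a synthetic dataset; the peptide dihedral-angle data and the NLPCA network itself are not reproduced, so this is an assumption-laden stand-in for the general idea.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
latent = rng.normal(size=(500, 4))           # four "true" degrees of freedom
mixing = rng.normal(size=(4, 30))
X = latent @ mixing + 0.05 * rng.normal(size=(500, 30))

# Reconstruction error as a function of the number of retained components.
for k in (1, 2, 4, 8, 16):
    pca = PCA(n_components=k).fit(X)
    X_hat = pca.inverse_transform(pca.transform(X))
    err = np.mean((X - X_hat) ** 2)
    print(f"{k:2d} components -> mean squared reconstruction error {err:.4f}")
```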
Jović, Ozren; Smolić, Tomislav; Primožič, Ines; Hrenar, Tomica
2016-04-19
The aim of this study was to investigate the feasibility of FTIR-ATR spectroscopy coupled with the multivariate numerical methodology for qualitative and quantitative analysis of binary and ternary edible oil mixtures. Four pure oils (extra virgin olive oil, high oleic sunflower oil, rapeseed oil, and sunflower oil), as well as their 54 binary and 108 ternary mixtures, were analyzed using FTIR-ATR spectroscopy in combination with principal component and discriminant analysis, partial least-squares, and principal component regression. It was found that the composition of all 166 samples can be excellently represented using only the first three principal components describing 98.29% of total variance in the selected spectral range (3035-2989, 1170-1140, 1120-1100, 1093-1047, and 930-890 cm(-1)). Factor scores in 3D space spanned by these three principal components form a tetrahedral-like arrangement: pure oils being at the vertices, binary mixtures at the edges, and ternary mixtures on the faces of a tetrahedron. To confirm the validity of results, we applied several cross-validation methods. Quantitative analysis was performed by minimization of root-mean-square error of cross-validation values regarding the spectral range, derivative order, and choice of method (partial least-squares or principal component regression), which resulted in excellent predictions for test sets (R(2) > 0.99 in all cases). Additionally, experimentally more demanding gas chromatography analysis of fatty acid content was carried out for all specimens, confirming the results obtained by FTIR-ATR coupled with principal component analysis. However, FTIR-ATR provided a considerably better model for prediction of mixture composition than gas chromatography, especially for high oleic sunflower oil.
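The quantitative step described above, principal component regression validated by cross-validated RMSE, can be sketched as follows; the synthetic "spectra", the number of components, and the choice of scikit-learn as the toolkit are assumptions for illustration, and the study's spectral windows and derivative pretreatments are not reproduced.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(10)
n_samples, n_wavelengths = 166, 200
fractions = rng.dirichlet(np.ones(3), size=n_samples)     # ternary mixture fractions
pure = rng.normal(size=(3, n_wavelengths))                # stand-in "pure component spectra"
spectra = fractions @ pure + 0.01 * rng.normal(size=(n_samples, n_wavelengths))

# Principal component regression: PCA scores feed a linear regression.
pcr = make_pipeline(PCA(n_components=3), LinearRegression())
rmse = -cross_val_score(pcr, spectra, fractions[:, 0], cv=10,
                        scoring="neg_root_mean_squared_error")
print("cross-validated RMSE for the first component's fraction:", round(rmse.mean(), 4))
```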
Liu, Xiang; Guo, Ling-Peng; Zhang, Fei-Yun; Ma, Jie; Mu, Shu-Yong; Zhao, Xin; Li, Lan-Hai
2015-02-01
Eight physical and chemical indicators related to water quality were monitored at nineteen sampling sites along the Kunes River at the end of the snowmelt season in spring. To investigate the spatial distribution characteristics of the water's physical and chemical properties, cluster analysis (CA), discriminant analysis (DA) and principal component analysis (PCA) were employed. The cluster analysis showed that the Kunes River could be divided into three reaches according to the similarities of physical and chemical properties among sampling sites, representing the upstream, midstream and downstream sections of the river, respectively. The discriminant analysis demonstrated that the reliability of this classification was high, and that DO, Cl- and BOD5 were the significant indexes leading to it. Three principal components were extracted by the principal component analysis, with a cumulative variance contribution of 86.90%. The principal component analysis also indicated that the water's physical and chemical properties were mostly affected by EC, ORP, NO3(-)-N, NH4(+)-N, Cl- and BOD5. The sorted principal component scores at each sampling site showed that water quality was mainly influenced by DO upstream, by pH midstream, and by the remaining indicators downstream. The order of comprehensive principal component scores revealed that water quality degraded from upstream to downstream, i.e., the upstream had the best water quality, followed by the midstream, while water quality downstream was the worst. This result corresponded exactly to the three reaches classified using cluster analysis. Anthropogenic activity and the accumulation of pollutants along the river were probably the main reasons for this spatial difference.
Putilov, Arcady A; Donskaya, Olga G
2016-01-01
Age-associated changes in different bandwidths of the human electroencephalographic (EEG) spectrum are well documented, but their functional significance is poorly understood. This spectrum seems to represent summation of simultaneous influences of several sleep-wake regulatory processes. Scoring of its orthogonal (uncorrelated) principal components can help in separation of the brain signatures of these processes. In particular, the opposite age-associated changes were documented for scores on the two largest (1st and 2nd) principal components of the sleep EEG spectrum. A decrease of the first score and an increase of the second score can reflect, respectively, the weakening of the sleep drive and disinhibition of the opposing wake drive with age. In order to support the suggestion of age-associated disinhibition of the wake drive from the antagonistic influence of the sleep drive, we analyzed principal component scores of the resting EEG spectra obtained in sleep deprivation experiments with 81 healthy young adults aged between 19 and 26 and 40 healthy older adults aged between 45 and 66 years. At the second day of the sleep deprivation experiments, frontal scores on the 1st principal component of the EEG spectrum demonstrated an age-associated reduction of response to eyes closed relaxation. Scores on the 2nd principal component were either initially increased during wakefulness or less responsive to such sleep-provoking conditions (frontal and occipital scores, respectively). These results are in line with the suggestion of disinhibition of the wake drive with age. They provide an explanation of why older adults are less vulnerable to sleep deprivation than young adults.
NASA Astrophysics Data System (ADS)
Wojciechowski, Adam
2017-04-01
In order to assess ecodiversity, understood as a comprehensive natural landscape factor (Jedicke 2001), it is necessary to apply research methods that recognize the environment in a holistic way. Principal component analysis may be considered one such method, as it makes it possible to distinguish the main factors determining landscape diversity on the one hand, and to discover regularities shaping the relationships between various elements of the environment under study on the other. The procedure adopted to assess ecodiversity with the use of principal component analysis involves: a) determining and selecting appropriate factors of the assessed environment qualities (hypsometric, geological, hydrographic, plant, and others); b) calculating the absolute value of individual qualities for the basic areas under analysis (e.g. river length, forest area, altitude differences, etc.); c) principal components analysis and obtaining factor maps (maps of selected components); d) generating a resultant, detailed map and isolating several classes of ecodiversity. An assessment of ecodiversity with the use of principal component analysis was conducted in a test area of 299.67 km2 in Debnica Kaszubska commune. The whole commune is situated in the Weichselian glaciation area, of high hypsometric and morphological diversity as well as high geo- and biodiversity. The analysis was based on topographical maps of the commune area at a scale of 1:25000 and maps of forest habitats. Consequently, nine factors reflecting basic environment elements were calculated: maximum height (m), minimum height (m), average height (m), length of watercourses (km), area of water reservoirs (m2), total forest area (ha), coniferous forest habitats area (ha), deciduous forest habitats area (ha), and alder habitats area (ha). The values of individual factors were analysed for 358 grid cells of 1 km2. Based on the principal components analysis, four major factors affecting commune ecodiversity were distinguished: a hypsometric component (PC1), a deciduous forest habitats component (PC2), a river valleys and alder habitats component (PC3), and a lakes component (PC4). The distinguished factors characterise the natural qualities of a postglacial area and reflect well the role of the four most important groups of environment components in shaping the ecodiversity of the area under study. The map of ecodiversity of Debnica Kaszubska commune was created on the basis of the first four principal component scores, and five classes of diversity were then isolated: very low, low, average, high and very high. As a result of the assessment, five commune regions of very high ecodiversity were delineated. These regions are also very attractive for tourists and valuable in terms of their rich nature, which includes protected areas such as Slupia Valley Landscape Park. The suggested method of ecodiversity assessment with the use of principal component analysis may constitute an alternative methodological proposition to other research methods used so far. Literature: Jedicke E., 2001. Biodiversität, Geodiversität, Ökodiversität. Kriterien zur Analyse der Landschaftsstruktur - ein konzeptioneller Diskussionsbeitrag. Naturschutz und Landschaftsplanung, 33(2/3), 59-68.
A stochastic model of weather states and concurrent daily precipitation at multiple precipitation stations is described. Four algorithms are investigated for classification of daily weather states: k-means, fuzzy clustering, principal components, and principal components coupled with ...
Rosacea assessment by erythema index and principal component analysis segmentation maps
NASA Astrophysics Data System (ADS)
Kuzmina, Ilona; Rubins, Uldis; Saknite, Inga; Spigulis, Janis
2017-12-01
RGB images of rosacea were analyzed using segmentation maps of principal component analysis (PCA) and the erythema index (EI). Areas of segmented clusters were compared to Clinician's Erythema Assessment (CEA) values given by two dermatologists. The results show that visible blood vessels are segmented more precisely on maps of the erythema index and the third principal component (PC3). In many cases, the distributions of clusters on EI and PC3 maps are very similar. Mean cluster areas on these maps show a decrease in the area of blood vessels and erythema and an increase in lighter skin area after therapy for patients graded CEA = 2 at the first visit and CEA = 1 at the second visit. This study shows that EI and PC3 maps are more useful than maps of the first (PC1) and second (PC2) principal components for indicating vascular structures and erythema on the skin of rosacea patients and for therapy monitoring.
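As a rough illustration of how per-pixel PCA score maps such as the PC3 map can be produced from an RGB image, the sketch below runs PCA over the pixels of a stand-in image and clusters the resulting PC3 map; the erythema index, CEA grading, and any clinical thresholds are not reproduced here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
rgb = rng.random((128, 128, 3))          # stand-in for a skin RGB image
pixels = rgb.reshape(-1, 3)

pca = PCA(n_components=3)
scores = pca.fit_transform(pixels)       # per-pixel PC1, PC2, PC3 scores
pc3_map = scores[:, 2].reshape(rgb.shape[:2])

# Segment the PC3 map into clusters; comparing cluster areas across visits
# would mirror the therapy-monitoring analysis described above.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(
    pc3_map.reshape(-1, 1)).reshape(rgb.shape[:2])
areas = [float(np.mean(labels == c)) for c in range(3)]
print("relative cluster areas:", areas)
```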
NASA Astrophysics Data System (ADS)
Zhang, Qiong; Peng, Cong; Lu, Yiming; Wang, Hao; Zhu, Kaiguang
2018-04-01
A novel technique is developed to level airborne geophysical data using principal component analysis based on flight-line differences. In this paper, flight-line differencing is introduced to enhance the features of the levelling error in airborne electromagnetic (AEM) data and to improve the correlation between pseudo tie lines. We therefore apply levelling to the flight-line difference data rather than directly to the original AEM data. Pseudo tie lines are selected so that they are distributed across the profile direction while avoiding anomalous regions. Since the levelling errors of the selected pseudo tie lines are highly correlated, principal component analysis is applied to extract the local levelling errors by low-order principal component reconstruction. The levelling errors of the original AEM data are then obtained through inverse differencing after spatial interpolation. This levelling method requires neither flying tie lines nor designing a levelling fitting function. Its effectiveness is demonstrated by the levelling results on survey data, in comparison with the results from tie-line levelling and flight-line correlation levelling.
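The core idea, differencing adjacent flight lines so that the geological signal cancels, reconstructing the line-to-line error from low-order principal components, and then inverting the difference, can be sketched on synthetic data as follows; the published pseudo-tie-line selection and spatial interpolation steps are simplified away, so this is only an assumption-laden outline of the approach.

```python
import numpy as np

rng = np.random.default_rng(2)
n_lines, n_samples = 40, 200
signal = np.tile(np.sin(np.linspace(0, 6 * np.pi, n_samples)), (n_lines, 1))
level_err = rng.normal(size=(n_lines, 1)) * np.ones((1, n_samples))  # line-constant shifts
data = signal + level_err

diff = np.diff(data, axis=0)             # flight-line differences: signal cancels
# PCA (via SVD) of the difference matrix; keep only the low-order component
# as the estimate of the levelling-error differences.
diff_c = diff - diff.mean(axis=0, keepdims=True)
u, s, vt = np.linalg.svd(diff_c, full_matrices=False)
k = 1
err_diff = diff.mean(axis=0, keepdims=True) + (u[:, :k] * s[:k]) @ vt[:k, :]

# "Inverse difference": cumulatively sum the estimated error differences to
# recover per-line levelling errors (the first line is taken as reference).
err_lines = np.vstack([np.zeros((1, n_samples)), np.cumsum(err_diff, axis=0)])
levelled = data - err_lines
print("line-to-line std before:", float(data.mean(axis=1).std()))
print("line-to-line std after: ", float(levelled.mean(axis=1).std()))
```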
Multilevel sparse functional principal component analysis.
Di, Chongzhi; Crainiceanu, Ciprian M; Jank, Wolfgang S
2014-01-29
We consider analysis of sparsely sampled multilevel functional data, where the basic observational unit is a function and the data have a natural hierarchy of basic units. An example is when functions are recorded at multiple visits for each subject. Multilevel functional principal component analysis (MFPCA; Di et al. 2009) was proposed for such data when functions are densely recorded. Here we consider the case when functions are sparsely sampled and may contain only a few observations per function. We exploit the multilevel structure of covariance operators and achieve data reduction by principal component decompositions at both the between- and within-subject levels. We address inherent methodological differences in the sparse sampling context to: 1) estimate the covariance operators; 2) estimate the functional principal component scores; 3) predict the underlying curves. Through simulations, the proposed method is shown to discover dominating modes of variation and to reconstruct underlying curves well even in sparse settings. Our approach is illustrated by two applications, the Sleep Heart Health Study and eBay auctions.
[Content of mineral elements of Gastrodia elata by principal components analysis].
Li, Jin-ling; Zhao, Zhi; Liu, Hong-chang; Luo, Chun-li; Huang, Ming-jin; Luo, Fu-lai; Wang, Hua-lei
2015-03-01
To study the content of mineral elements and the principal components in Gastrodia elata. Mineral elements were determined by ICP and the data were analyzed with SPSS. K had the highest content, with an average of 15.31 g x kg(-1). N had the second highest content, with an average of 8.99 g x kg(-1). The coefficients of variation of K and N were small, whereas that of Mn was the largest, at 51.39%. Highly significant positive correlations were found among N, P and K. Three principal components were selected by principal components analysis to evaluate the quality of G. elata. P, B, N, K, Cu, Mn, Fe and Mg were the characteristic elements of G. elata. The contents of K and N were high and relatively stable, while the variation in Mn content was the largest. From the perspective of mineral elements, the quality of G. elata from Guizhou and Yunnan was better.
Chemical information obtained from Auger depth profiles by means of advanced factor analysis (MLCFA)
NASA Astrophysics Data System (ADS)
De Volder, P.; Hoogewijs, R.; De Gryse, R.; Fiermans, L.; Vennik, J.
1993-01-01
The advanced multivariate statistical technique "maximum likelihood common factor analysis (MLCFA)" is shown to be superior to "principal component analysis (PCA)" for decomposing overlapping peaks into their individual component spectra of which neither the number of components nor the peak shape of the component spectra is known. An examination of the maximum resolving power of both techniques, MLCFA and PCA, by means of artificially created series of multicomponent spectra confirms this finding unambiguously. Substantial progress in the use of AES as a chemical-analysis technique is accomplished through the implementation of MLCFA. Chemical information from Auger depth profiles is extracted by investigating the variation of the line shape of the Auger signal as a function of the changing chemical state of the element. In particular, MLCFA combined with Auger depth profiling has been applied to problems related to steelcord-rubber tyre adhesion. MLCFA allows one to elucidate the precise nature of the interfacial layer of reaction products between natural rubber vulcanized on a thin brass layer. This study reveals many interesting chemical aspects of the oxi-sulfidation of brass undetectable with classical AES.
Visualizing Hyolaryngeal Mechanics in Swallowing Using Dynamic MRI
Pearson, William G.; Zumwalt, Ann C.
2013-01-01
Introduction: Coordinates of anatomical landmarks are captured using dynamic MRI to explore whether a proposed two-sling mechanism underlies hyolaryngeal elevation in pharyngeal swallowing. A principal components analysis (PCA) is applied to the coordinates to determine the covariant function of the proposed mechanism. Methods: Dynamic MRI (dMRI) data were acquired from eleven healthy subjects during a repeated swallows task. Coordinates mapping the proposed mechanism were collected from each dynamic (frame) of a dynamic MRI swallowing series of a randomly selected subject in order to demonstrate shape changes in a single subject. Coordinates representing minimum and maximum hyolaryngeal elevation of all 11 subjects were also mapped to demonstrate shape changes of the system among all subjects. MorphoJ software was used to perform PCA and determine vectors of shape change (eigenvectors) for elements of the two-sling mechanism of hyolaryngeal elevation. Results: For both the single-subject and group PCAs, hyolaryngeal elevation accounted for the first principal component of variation. For the single-subject PCA, the first principal component accounted for 81.5% of the variance. For the between-subjects PCA, the first principal component accounted for 58.5% of the variance. Eigenvectors and shape changes associated with this first principal component are reported. Discussion: Eigenvectors indicate that the two muscle slings and associated skeletal elements function as components of a covariant mechanism to elevate the hyolaryngeal complex. Morphological analysis is useful to model shape changes in the two-sling mechanism of hyolaryngeal elevation. PMID:25090608
Panazzolo, Diogo G; Sicuro, Fernando L; Clapauch, Ruth; Maranhão, Priscila A; Bouskela, Eliete; Kraemer-Aguiar, Luiz G
2012-11-13
We aimed to evaluate the multivariate association between functional microvascular variables and clinical-laboratorial-anthropometrical measurements. Data from 189 female subjects (34.0 ± 15.5 years, 30.5 ± 7.1 kg/m2), who were non-smokers, non-regular drug users, without a history of diabetes and/or hypertension, were analyzed by principal component analysis (PCA). PCA is a classical multivariate exploratory tool because it highlights common variation between variables allowing inferences about possible biological meaning of associations between them, without pre-establishing cause-effect relationships. In total, 15 variables were used for PCA: body mass index (BMI), waist circumference, systolic and diastolic blood pressure (BP), fasting plasma glucose, levels of total cholesterol, high-density lipoprotein cholesterol (HDL-c), low-density lipoprotein cholesterol (LDL-c), triglycerides (TG), insulin, C-reactive protein (CRP), and functional microvascular variables measured by nailfold videocapillaroscopy. Nailfold videocapillaroscopy was used for direct visualization of nutritive capillaries, assessing functional capillary density, red blood cell velocity (RBCV) at rest and peak after 1 min of arterial occlusion (RBCV(max)), and the time taken to reach RBCV(max) (TRBCV(max)). A total of 35% of subjects had metabolic syndrome, 77% were overweight/obese, and 9.5% had impaired fasting glucose. PCA was able to recognize that functional microvascular variables and clinical-laboratorial-anthropometrical measurements had a similar variation. The first five principal components explained most of the intrinsic variation of the data. For example, principal component 1 was associated with BMI, waist circumference, systolic BP, diastolic BP, insulin, TG, CRP, and TRBCV(max) varying in the same way. Principal component 1 also showed a strong association among HDL-c, RBCV, and RBCV(max), but in the opposite way. Principal component 3 was associated only with microvascular variables in the same way (functional capillary density, RBCV and RBCV(max)). Fasting plasma glucose appeared to be related to principal component 4 and did not show any association with microvascular reactivity. In non-diabetic female subjects, a multivariate scenario of associations between classic clinical variables strictly related to obesity and metabolic syndrome suggests a significant relationship between these diseases and microvascular reactivity.
The factorial reliability of the Middlesex Hospital Questionnaire in normal subjects.
Bagley, C
1980-03-01
The internal reliability of the Middlesex Hospital Questionnaire and its component subscales has been checked by means of principal components analyses of data on 256 normal subjects. The subscales (with the possible exception of Hysteria) were found to contribute to the general underlying factor of psychoneurosis. In general, the principal components analysis points to the reliability of the subscales, despite some item overlap.
ERIC Educational Resources Information Center
McCormick, Ernest J.; And Others
The study deals with the job component method of establishing compensation rates. The basic job analysis questionnaire used in the study was the Position Analysis Questionnaire (PAQ) (Form B). On the basis of a principal components analysis of PAQ data for a large sample (2,688) of jobs, a number of principal components (job dimensions) were…
ERIC Educational Resources Information Center
Faginski-Stark, Erica; Casavant, Christopher; Collins, William; McCandless, Jason; Tencza, Marilyn
2012-01-01
Recent federal and state mandates have tasked school systems to move beyond principal evaluation as a bureaucratic function and to re-imagine it as a critical component to improve principal performance and compel school renewal. This qualitative study investigated the district leaders' and principals' perceptions of the performance evaluation…
Use of Principal Components Analysis to Explain Controls on Nutrient Fluxes to the Chesapeake Bay
NASA Astrophysics Data System (ADS)
Rice, K. C.; Mills, A. L.
2017-12-01
The Chesapeake Bay watershed, on the east coast of the United States, encompasses about 166,000 square kilometers (km2) of diverse land use, which includes a mixture of forested, agricultural, and developed land. The watershed is now managed under a Total Maximum Daily Load (TMDL), which requires implementation of management actions by 2025 that are sufficient to reduce nitrogen, phosphorus, and suspended-sediment fluxes to the Chesapeake Bay and restore the bay's water quality. We analyzed nutrient and sediment data along with land-use and climatic variables in nine subwatersheds to better understand the drivers of flux within the watershed and to provide relevant management implications. The nine subwatersheds range in area from 300 to 30,000 km2, and the analysis period was 1985-2014. The 31 variables specific to each subwatershed were highly statistically significantly correlated, so Principal Components Analysis was used to reduce the dimensionality of the dataset. The analysis revealed that about 80% of the variability in the whole dataset can be explained by discharge, flux, and concentration of nutrients and sediment. The first two principal components (PCs) explained about 68% of the total variance. PC1 loaded strongly on discharge and flux, and PC2 loaded on concentration. The PC scores of both PC1 and PC2 varied by season. Subsequent analysis of PC1 scores versus PC2 scores, broken out by subwatershed, revealed management implications. Some of the largest subwatersheds are largely driven by discharge, and consequently have large fluxes. In contrast, some of the smaller subwatersheds are more variable in nutrient concentrations than in discharge and flux. Our results suggest that, given no change in discharge, a reduction in nutrient flux to the streams in the smaller watersheds could result in a proportionately larger decrease in the flux of nutrients down the river to the bay than in the larger watersheds.
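A hedged sketch of this kind of dimensionality reduction on correlated discharge, flux, and concentration variables is shown below; the data are synthetic stand-ins (the 31 monitored variables are not reproduced), and the interpretive step of plotting PC1 against PC2 scores by season or subwatershed is only indicated in a comment.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(4)
n = 360                                       # e.g., monthly records
discharge = rng.lognormal(mean=3.0, sigma=0.5, size=n)
conc = rng.lognormal(mean=0.0, sigma=0.3, size=n)
flux = discharge * conc                       # flux reflects both drivers
sediment = discharge * rng.lognormal(mean=0.0, sigma=0.1, size=n)
X = np.log(np.column_stack([discharge, flux, conc, sediment]))

pca = PCA()
scores = pca.fit_transform(StandardScaler().fit_transform(X))
print("explained variance ratios:", pca.explained_variance_ratio_.round(2))
print("PC1 loadings:", pca.components_[0].round(2))
# Plotting scores[:, 0] against scores[:, 1], grouped by season or
# subwatershed, mirrors the interpretation described in the abstract.
```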
Fernández-Arjona, María Del Mar; Grondona, Jesús M; Granados-Durán, Pablo; Fernández-Llebrez, Pedro; López-Ávalos, María D
2017-01-01
It is known that microglia morphology and function are closely related, but only a few studies have objectively described different morphological subtypes. To address this issue, morphological parameters of microglial cells were analyzed in a rat model of aseptic neuroinflammation. After the injection of a single dose of the enzyme neuraminidase (NA) within the lateral ventricle (LV), an acute inflammatory process occurs. Sections from NA-injected animals and sham controls were immunolabeled with the microglial marker IBA1, which highlights ramifications and features of the cell shape. Using images obtained by section scanning, individual microglial cells were sampled from various regions (septofimbrial nucleus, hippocampus and hypothalamus) at different times post-injection (2, 4 and 12 h). Each cell yielded a set of 15 morphological parameters by means of image analysis software. Five initial parameters (including fractal measures) were statistically different in cells from NA-injected rats (most of them IL-1β positive, i.e., M1-state) compared to those from control animals (none of them IL-1β positive, i.e., surveillant state). However, additional multimodal parameters proved more suitable for hierarchical cluster analysis (HCA). This method led to the classification of the microglia population into four clusters. Furthermore, a linear discriminant analysis (LDA) suggested three specific parameters to objectively classify any microglia by a decision tree. In addition, a principal components analysis (PCA) revealed two extra valuable variables that allowed microglia to be further classified into a total of eight sub-clusters or types. The spatio-temporal distribution of these different morphotypes in our rat inflammation model made it possible to relate specific morphotypes to microglial activation status and brain location. An objective method for microglia classification based on morphological parameters is proposed. Main points: Microglia undergo a quantifiable morphological change upon neuraminidase-induced inflammation. Hierarchical cluster and principal components analysis allow morphological classification of microglia. Brain location of microglia is a relevant factor.
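The analysis pipeline outlined above (hierarchical clustering of morphological parameters, PCA for additional discriminating directions, and a shallow decision tree as an objective classification rule) can be sketched as follows; the synthetic "morphological parameters", cluster counts, and tree depth are illustrative assumptions, not those of the study.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(5)
# Two synthetic morphotypes differing in "ramification" and "soma size".
type_a = rng.normal([1.0, 0.2, 5.0], 0.2, size=(60, 3))
type_b = rng.normal([0.4, 0.8, 8.0], 0.2, size=(60, 3))
params = np.vstack([type_a, type_b])

# 1) Hierarchical cluster analysis on the morphological parameters.
clusters = fcluster(linkage(params, method="ward"), t=2, criterion="maxclust")

# 2) PCA to look for additional discriminating directions.
pca = PCA(n_components=2).fit(params)
scores = pca.transform(params)

# 3) A shallow decision tree as an objective classification rule.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(params, clusters)
print("variance explained by PC1:", round(float(pca.explained_variance_ratio_[0]), 2))
print("training accuracy of the decision rule:", tree.score(params, clusters))
```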
Guo, Lijun; Bao, Yong; Ma, Jun; Li, Shujun; Cai, Yuyang; Sun, Wei; Liu, Qiaohong
2018-01-01
Urban areas usually display better health care services than rural areas, but data about suburban areas in China are lacking. Hence, this cross-sectional study compared the utilization of community basic medical services in Shanghai urban and suburban areas between 2009 and 2014. These data were used to improve the efficiency of community health service utilization and to provide a reference for solving the main health problems of the residents in urban and suburban areas of Shanghai. Using a two-stage random sampling method, questionnaires were completed by 73 community health service centers that were randomly selected from six districts that were also randomly selected from 17 counties in Shanghai. Descriptive statistics, principal component analysis, and forecast analysis were used to complete a gap analysis of basic health services utilization quality between urban and suburban areas. During the 6-year study period, there was an increasing trend toward greater efficiency of basic medical service provision, benefits of basic medical service provision, effectiveness of common chronic disease management, overall satisfaction of community residents, and two-way referral effects. In addition to the implementation effect of hypertension management and two-way referral, the remaining indicators showed a superior effect in urban areas compared with the suburbs (P<0.001). In addition, among the seven principal components, four principal component scores were better in urban areas than in suburban areas (P = <0.001, 0.004, 0.036, and 0.022). The urban comprehensive score also exceeded that of the suburbs (P<0.001). In summary, over the 6-year period, there was a rapidly increasing trend in basic medical service utilization. Comprehensive satisfaction clearly improved as well. Nevertheless, there was an imbalance in health service utilization between urban and suburban areas. There is a need for the health administrative department to address this imbalance between urban and suburban institutions and to provide the required support to underdeveloped areas to improve resident satisfaction.
Gu, Haiwei; Pan, Zhengzheng; Xi, Bowei; Asiago, Vincent; Musselman, Brian; Raftery, Daniel
2011-02-07
Nuclear magnetic resonance (NMR) spectroscopy and mass spectrometry (MS) are the two most commonly used analytical tools in metabolomics, and their complementary nature makes the combination particularly attractive. A combined analytical approach can improve the potential for providing reliable methods to detect metabolic profile alterations in biofluids or tissues caused by disease, toxicity, etc. In this paper, (1)H NMR spectroscopy and direct analysis in real time (DART)-MS were used for the metabolomics analysis of serum samples from breast cancer patients and healthy controls. Principal component analysis (PCA) of the NMR data showed that the first principal component (PC1) scores could be used to separate cancer from normal samples. However, no such obvious clustering could be observed in the PCA score plot of DART-MS data, even though DART-MS can provide a rich and informative metabolic profile. Using a modified multivariate statistical approach, the DART-MS data were then reevaluated by orthogonal signal correction (OSC) pretreated partial least squares (PLS), in which the Y matrix in the regression was set to the PC1 score values from the NMR data analysis. This approach, and a similar one using the first latent variable from PLS-DA of the NMR data resulted in a significant improvement of the separation between the disease samples and normals, and a metabolic profile related to breast cancer could be extracted from DART-MS. The new approach allows the disease classification to be expressed on a continuum as opposed to a binary scale and thus better represents the disease and healthy classifications. An improved metabolic profile obtained by combining MS and NMR by this approach may be useful to achieve more accurate disease detection and gain more insight regarding disease mechanisms and biology. Copyright © 2010 Elsevier B.V. All rights reserved.
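A hedged sketch of the guided analysis described above is given below: PCA of one data block (a stand-in for the NMR spectra) supplies PC1 scores that are then used as the response in a PLS regression on the other block (a stand-in for the DART-MS profiles). The orthogonal signal correction pretreatment is omitted, and all data are synthetic.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.decomposition import PCA

rng = np.random.default_rng(6)
n = 60
group = np.repeat([0.0, 1.0], n // 2)                 # stand-in for cancer vs control
nmr = rng.normal(size=(n, 100)) + group[:, None] * np.linspace(0, 1, 100)
ms = rng.normal(size=(n, 300)) + group[:, None] * np.linspace(1, 0, 300)

pc1_nmr = PCA(n_components=1).fit_transform(nmr)      # separates the two groups

# PLS of the MS block with the NMR PC1 scores as the response.
pls = PLSRegression(n_components=2).fit(ms, pc1_nmr)
ms_scores = pls.transform(ms)
r = np.corrcoef(ms_scores[:, 0], pc1_nmr[:, 0])[0, 1]
print("correlation of first PLS score (MS) with NMR PC1:", round(abs(float(r)), 2))
```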
Development of a scale for attitude toward condom use for migrant workers in India.
Talukdar, Arunansu; Bal, Runa; Sanyal, Debasis; Roy, Krishnendu; Talukdar, Payel Sengupta
2008-02-01
Promotion of condom use remains one of the mainstays of prevention of human immunodeficiency virus (HIV) transmission. In spite of the proven efficacy of condoms, some moral, social and psychological obstacles are still prevalent, hindering their use. The study sought to construct a short condom-attitude scale for use among migrant workers, a major bridge population in India. The study was conducted among male migrant workers who were 18-49 years old, sexually active, had heard about condoms and were engaged in nonformal jobs. We recruited 234 and 280 candidates for Phase 1 and Phase 2, respectively. Ten items from the original 40-item Brown's ATC (attitude towards condom) scale were selected in Phase 1. After analysis of the Phase 1 results using principal component analysis, six items were found appropriate for measuring attitude towards condom use. These six items were then administered to another group in Phase 2. Utilizing Pearson's correlations, scale items were examined in terms of their mean response scores and the correlation matrix between items. Cronbach's alpha and construct validity were also assessed for the entire sample. Study subjects were categorized as condom users and nonusers. The scale structure was explored by analyzing response scores with respect to the items, using principal component analysis followed by varimax rotation. Principal component analysis revealed that the first factor accounted for 71% of the variance, with an eigenvalue greater than one. The eigenvalue of the second factor was less than one. Application of the scree test suggested that only one factor was dominant. The mean score on the six items among condom users was 20.45 and that among nonusers was 16.67, a statistically significant difference (P<0.01). Cronbach's alpha coefficient was 0.92. This tailor-made attitude-toward-condom-use scale, targeted at a most vulnerable population in India, can be included in any rapid survey for assessing existing beliefs and attitudes toward condoms and also for evaluating the efficacy of an intervention program.
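The item-reduction checks mentioned above (a principal component analysis of item scores plus Cronbach's alpha) can be sketched as follows on synthetic Likert-type responses; the item wording, sample, and rotation details of the study are not reproduced.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(7)
n, k = 280, 6
latent = rng.normal(size=n)                       # a single underlying attitude factor
items = np.clip(np.round(3.0 + latent[:, None] + rng.normal(0, 0.7, size=(n, k))), 1, 5)

# One dominant component suggests a unidimensional scale.
evr = PCA().fit(items).explained_variance_ratio_
print("variance explained by the first component:", round(float(evr[0]), 2))

def cronbach_alpha(x):
    n_items = x.shape[1]
    item_var = x.var(axis=0, ddof=1).sum()
    total_var = x.sum(axis=1).var(ddof=1)
    return n_items / (n_items - 1) * (1 - item_var / total_var)

print("Cronbach's alpha:", round(cronbach_alpha(items), 2))
```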
Khamis, Fathiya M.; Masiga, Daniel K.; Mohamed, Samira A.; Salifu, Daisy; de Meyer, Marc; Ekesi, Sunday
2012-01-01
In 2003, a new fruit fly pest species was recorded for the first time in Kenya and has subsequently been found in 28 countries across tropical Africa. The insect was described as Bactrocera invadens, due to its rapid invasion of the African continent. In this study, the morphometry and DNA Barcoding of different populations of B. invadens distributed across the species range of tropical Africa and a sample from the pest's putative aboriginal home of Sri Lanka was investigated. Morphometry using wing veins and tibia length was used to separate B. invadens populations from other closely related Bactrocera species. The Principal component analysis yielded 15 components which correspond to the 15 morphometric measurements. The first two principal axes contributed to 90.7% of the total variance and showed partial separation of these populations. Canonical discriminant analysis indicated that only the first five canonical variates were statistically significant. The first two canonical variates contributed a total of 80.9% of the total variance clustering B. invadens with other members of the B. dorsalis complex while distinctly separating B. correcta, B. cucurbitae, B. oleae and B. zonata. The largest Mahalanobis squared distance (D2 = 122.9) was found to be between B. cucurbitae and B. zonata, while the lowest was observed between B. invadens populations against B. kandiensis (8.1) and against B. dorsalis s.s (11.4). Evolutionary history inferred by the Neighbor-Joining method clustered the Bactrocera species populations into four clusters. First cluster consisted of the B. dorsalis complex (B. invadens, B. kandiensis and B. dorsalis s. s.), branching from the same node while the second group was paraphyletic clades of B. correcta and B. zonata. The last two are monophyletic clades, consisting of B. cucurbitae and B. oleae, respectively. Principal component analysis using the genetic distances confirmed the clustering inferred by the NJ tree. PMID:23028649
Jiang, Yu; Li, Changying
2015-01-01
Cotton quality, a major factor determining both cotton profitability and marketability, is affected not only by the overall quantity but also by the type of foreign matter. Although current commercial instruments can measure the overall amount of foreign matter, no instrument can differentiate the various types. The goal of this study was to develop a hyperspectral imaging system to discriminate major types of foreign matter in cotton lint. A push-broom hyperspectral imaging system with custom-built multi-threaded software was developed to acquire hyperspectral images of cotton fiber with 15 types of foreign matter commonly found in U.S. cotton lint. A total of 450 foreign matter samples (30 replicates of each type) were cut into 1 cm x 1 cm pieces and imaged on the lint surface in reflectance mode over the spectral range of 400-1000 nm. The mean spectra of the foreign matter and lint were extracted from user-defined regions of interest in the hyperspectral images. Principal component analysis was performed on the mean spectra to reduce the feature dimension from the original 256 bands to the top 3 principal components. The score plots of the 3 principal components were used to examine clustering patterns for classifying the foreign matter. These patterns were further validated by statistical tests. The experimental results showed that the mean spectra of all 15 types of cotton foreign matter were different from that of the lint. Nine types of cotton foreign matter formed distinct clusters in the score plots. Additionally, all of them were significantly different from each other at the 0.05 significance level except brown leaf and bract. The developed hyperspectral imaging system is effective at detecting and classifying cotton foreign matter on the lint surface and has the potential to be implemented in commercial cotton classing offices.
Vasudevan, Rama K; Tselev, Alexander; Baddorf, Arthur P; Kalinin, Sergei V
2014-10-28
Reflection high energy electron diffraction (RHEED) has by now become a standard tool for in situ monitoring of film growth by pulsed laser deposition and molecular beam epitaxy. Yet despite the widespread adoption and wealth of information in RHEED images, most applications are limited to observing intensity oscillations of the specular spot, and much additional information on growth is discarded. With ease of data acquisition and increased computation speeds, statistical methods to rapidly mine the data set are now feasible. Here, we develop such an approach to the analysis of the fundamental growth processes through multivariate statistical analysis of a RHEED image sequence. This approach is illustrated for growth of La(x)Ca(1-x)MnO(3) films grown on etched (001) SrTiO(3) substrates, but is universal. The multivariate methods including principal component analysis and k-means clustering provide insight into the relevant behaviors, the timing and nature of a disordered to ordered growth change, and highlight statistically significant patterns. Fourier analysis yields the harmonic components of the signal and allows separation of the relevant components and baselines, isolating the asymmetric nature of the step density function and the transmission spots from the imperfect layer-by-layer (LBL) growth. These studies show the promise of big data approaches to obtaining more insight into film properties during and after epitaxial film growth. Furthermore, these studies open the pathway to use forward prediction methods to potentially allow significantly more control over growth process and hence final film quality.
Multivariate frequency domain analysis of protein dynamics
NASA Astrophysics Data System (ADS)
Matsunaga, Yasuhiro; Fuchigami, Sotaro; Kidera, Akinori
2009-03-01
Multivariate frequency domain analysis (MFDA) is proposed to characterize collective vibrational dynamics of protein obtained by a molecular dynamics (MD) simulation. MFDA performs principal component analysis (PCA) for a bandpass filtered multivariate time series using the multitaper method of spectral estimation. By applying MFDA to MD trajectories of bovine pancreatic trypsin inhibitor, we determined the collective vibrational modes in the frequency domain, which were identified by their vibrational frequencies and eigenvectors. At near zero temperature, the vibrational modes determined by MFDA agreed well with those calculated by normal mode analysis. At 300 K, the vibrational modes exhibited characteristic features that were considerably different from the principal modes of the static distribution given by the standard PCA. The influences of aqueous environments were discussed based on two different sets of vibrational modes, one derived from a MD simulation in water and the other from a simulation in vacuum. Using the varimax rotation, an algorithm of the multivariate statistical analysis, the representative orthogonal set of eigenmodes was determined at each vibrational frequency.
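A loose sketch of the band-limited PCA idea is given below: a multivariate "trajectory" is band-pass filtered around a target frequency and PCA is run inside that band. A Butterworth filter stands in for the multitaper spectral estimation used in MFDA, and the trajectory is synthetic, so this illustrates only the general principle, not the published method.

```python
import numpy as np
from scipy.signal import butter, filtfilt
from sklearn.decomposition import PCA

rng = np.random.default_rng(8)
fs, n_frames, n_coords = 100.0, 5000, 12
t = np.arange(n_frames) / fs
mode = np.sin(2 * np.pi * 8.0 * t)                    # an 8 Hz collective motion
direction = rng.normal(size=n_coords)
traj = np.outer(mode, direction) + 0.5 * rng.normal(size=(n_frames, n_coords))

# Band-pass the multivariate trajectory around 8 Hz, then run PCA in-band.
b, a = butter(4, [6.0, 10.0], btype="band", fs=fs)
band = filtfilt(b, a, traj, axis=0)

pca = PCA(n_components=2).fit(band)
unit_dir = direction / np.linalg.norm(direction)
print("in-band variance captured by PC1:", round(float(pca.explained_variance_ratio_[0]), 2))
print("alignment of PC1 with the injected mode:", round(float(abs(pca.components_[0] @ unit_dir)), 2))
```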
An assessment of first-order stochastic dispersion theories in porous media
NASA Astrophysics Data System (ADS)
Chin, David A.
1997-12-01
Random realizations of three-dimensional exponentially correlated hydraulic conductivity fields are used in a finite-difference numerical flow model to calculate the mean and covariance of the corresponding Lagrangian-velocity fields. The dispersivity of the porous medium is then determined from the Lagrangian-velocity statistics using the Taylor definition. This estimation procedure is exact, except for numerical errors, and the results are used to assess the accuracy of various first-order dispersion theories in both isotropic and anisotropic porous media. The results show that the Dagan theory is by far the most robust in both isotropic and anisotropic media, producing accurate values of the principal dispersivity components for σy as high as 1.0. In the case of anisotropic media where the flow is at an angle to the principal axis of hydraulic conductivity, it is shown that the dispersivity tensor is rotated away from the flow direction in the non-Fickian phase, but eventually coincides with the flow direction in the Fickian phase.
Pace, Roberto; Martinelli, Ernesto Marco; Sardone, Nicola; D E Combarieu, Eric
2015-03-01
Ginseng is any one of the eleven species belonging to the genus Panax of the family Araliaceae and is found in North America and in eastern Asia. Ginseng is characterized by the presence of ginsenosides. Panax ginseng and Panax quinquefolius, principally, are adaptogenic herbs commonly distributed in health food markets. In the present study, high performance liquid chromatography was used to identify and quantify ginsenosides in the two subject species and in different parts of the plant (roots, neck, leaves, flowers, fruits). The power of this chromatographic technique to evaluate the identity of botanical material and to distinguish different parts of the plant was investigated with metabolomic techniques such as principal component analysis. Metabolomics provides a good opportunity for mining useful chemical information from the chromatographic data set, making it an important tool for evaluating the quality of medicinal plants in terms of authenticity, consistency and efficacy. Copyright © 2015 Elsevier B.V. All rights reserved.
Monitoring and evaluation of the water quality of Budeasa Reservoir-Arges River, Romania.
Ion, Antoanela; Vladescu, Luminita; Badea, Irinel Adriana; Comanescu, Laura
2016-09-01
The purpose of this study was to monitor and record the specific characteristics and properties of the Arges River water in the Budeasa Reservoir (the principal water resource for municipal tap water of the Romanian city of Pitesti and the surrounding area) over a period of 5 years (2005-2009). The monitored physical and chemical parameters were turbidity, pH, electrical conductivity, chemical oxygen demand, 5-day biochemical oxygen demand, free dissolved oxygen, nitrite, nitrate, ammonia nitrogen, chloride, total dissolved iron, sulfate, manganese, phosphate, total alkalinity, and total hardness. The results were discussed in relation to the precipitation values recorded during the study. Monthly and annual values of each parameter determined in the period January 2005-December 2009 were used as a basis for the classification of the Budeasa Reservoir water according to European legislation, as well as for assessing its quality as a drinking water supply. Principal component analysis and Pearson correlation coefficients were used as statistical procedures to evaluate the data obtained during this study.
2L-PCA: a two-level principal component analyzer for quantitative drug design and its applications.
Du, Qi-Shi; Wang, Shu-Qing; Xie, Neng-Zhong; Wang, Qing-Yan; Huang, Ri-Bo; Chou, Kuo-Chen
2017-09-19
A two-level principal component predictor (2L-PCA) was proposed based on the principal component analysis (PCA) approach. It can be used to quantitatively analyze various compounds and peptides with respect to their functions or their potential to become useful drugs. One level deals with the physicochemical properties of drug molecules, while the other deals with their structural fragments. The predictor has self-learning and feedback features that automatically improve its accuracy. It is anticipated that 2L-PCA will become a very useful tool for providing timely and useful clues during the process of drug development.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bennett, Janine Camille; Thompson, David; Pebay, Philippe Pierre
Statistical analysis is typically used to reduce the dimensionality of and infer meaning from data. A key challenge of any statistical analysis package aimed at large-scale, distributed data is to address the orthogonal issues of parallel scalability and numerical stability. Many statistical techniques, e.g., descriptive statistics or principal component analysis, are based on moments and co-moments and, using robust online update formulas, can be computed in an embarrassingly parallel manner, amenable to a map-reduce style implementation. In this paper we focus on contingency tables, through which numerous derived statistics such as joint and marginal probability, point-wise mutual information, information entropy, and χ² independence statistics can be directly obtained. However, contingency tables can become large as data size increases, requiring a correspondingly large amount of communication between processors. This potential increase in communication prevents optimal parallel speedup and is the main difference with moment-based statistics (which we discussed in [1]), where the amount of inter-processor communication is independent of data size. Here we present the design trade-offs which we made to implement the computation of contingency tables in parallel. We also study the parallel speedup and scalability properties of our open source implementation. In particular, we observe optimal speed-up and scalability when the contingency statistics are used in their appropriate context, namely, when the data input is not quasi-diffuse.
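The map-reduce flavor of the computation can be illustrated with a minimal sketch: each worker tabulates a contingency table for its own chunk, the tables are merged by element-wise addition, and derived statistics such as the χ² independence test follow from the merged table. This is a generic illustration, not the package's actual interface.

```python
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(9)

def local_table(x, y, levels_x=2, levels_y=3):
    """Map step: build a contingency table for one worker's data chunk."""
    table = np.zeros((levels_x, levels_y), dtype=np.int64)
    np.add.at(table, (x, y), 1)
    return table

# Simulate three workers, each holding its own chunk of categorical data.
chunks = [(rng.integers(0, 2, size=1000), rng.integers(0, 3, size=1000))
          for _ in range(3)]
tables = [local_table(x, y) for x, y in chunks]

merged = sum(tables)                      # reduce step: element-wise addition
chi2, p, dof, _ = chi2_contingency(merged)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.3f}")
```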
Effect of noise in principal component analysis with an application to ozone pollution
NASA Astrophysics Data System (ADS)
Tsakiri, Katerina G.
This thesis analyzes the effect of independent noise in principal components of k normally distributed random variables defined by a covariance matrix. We prove that the principal components as well as the canonical variate pairs determined from joint distribution of original sample affected by noise can be essentially different in comparison with those determined from the original sample. However when the differences between the eigenvalues of the original covariance matrix are sufficiently large compared to the level of the noise, the effect of noise in principal components and canonical variate pairs proved to be negligible. The theoretical results are supported by simulation study and examples. Moreover, we compare our results about the eigenvalues and eigenvectors in the two dimensional case with other models examined before. This theory can be applied in any field for the decomposition of the components in multivariate analysis. One application is the detection and prediction of the main atmospheric factor of ozone concentrations on the example of Albany, New York. Using daily ozone, solar radiation, temperature, wind speed and precipitation data, we determine the main atmospheric factor for the explanation and prediction of ozone concentrations. A methodology is described for the decomposition of the time series of ozone and other atmospheric variables into the global term component which describes the long term trend and the seasonal variations, and the synoptic scale component which describes the short term variations. By using the Canonical Correlation Analysis, we show that solar radiation is the only main factor between the atmospheric variables considered here for the explanation and prediction of the global and synoptic scale component of ozone. The global term components are modeled by a linear regression model, while the synoptic scale components by a vector autoregressive model and the Kalman filter. The coefficient of determination, R2, for the prediction of the synoptic scale ozone component was found to be the highest when we consider the synoptic scale component of the time series for solar radiation and temperature. KEY WORDS: multivariate analysis; principal component; canonical variate pairs; eigenvalue; eigenvector; ozone; solar radiation; spectral decomposition; Kalman filter; time series prediction
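The central result summarized above, that independent noise barely perturbs the leading principal components when the gaps between eigenvalues are large relative to the noise level, can be illustrated with a small simulation; the covariance matrices and noise level below are arbitrary choices for demonstration.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 2000

def leading_eigvec(data):
    _, vecs = np.linalg.eigh(np.cov(data, rowvar=False))
    return vecs[:, -1]                    # eigenvector of the largest eigenvalue

cases = {"well separated": [9.0, 4.0, 1.0, 0.25],
         "nearly equal":   [1.2, 1.1, 1.0, 0.9]}
for name, eigvals in cases.items():
    X = rng.multivariate_normal(np.zeros(4), np.diag(eigvals), size=n)
    v_clean = leading_eigvec(X)
    Xn = X + rng.normal(scale=1.0, size=X.shape)      # independent noise, sd = 1
    align = abs(float(np.dot(v_clean, leading_eigvec(Xn))))
    print(f"{name:>15}: |cos angle| between leading PCs = {align:.3f}")
```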
NASA Astrophysics Data System (ADS)
Nemoto, Mitsutaka; Nomura, Yukihiro; Hanaoka, Shohei; Masutani, Yoshitaka; Yoshikawa, Takeharu; Hayashi, Naoto; Yoshioka, Naoki; Ohtomo, Kuni
Anatomical point landmarks, as the most primitive form of anatomical knowledge, are useful for medical image understanding. In this study, we propose a detection method for anatomical point landmarks based on appearance models, which include gray-level statistical variations at point landmarks and in their surrounding area. The models are built from the results of Principal Component Analysis (PCA) of sample data sets. In addition, we employed a generative learning method by transforming the ROIs of the sample data. We evaluated our method with 24 data sets of body trunk CT images and obtained an average sensitivity of 95.8 ± 7.3% over 28 landmarks.
Duerinck, Tim; Couck, Sarah; Vermoortele, Frederik; De Vos, Dirk E; Baron, Gino V; Denayer, Joeri F M
2012-10-02
The low-coverage adsorptive properties of the MIL-47 metal organic framework toward aromatic and heterocyclic molecules are reported in this paper. The effect of molecular functionality and size on Henry adsorption constants and adsorption enthalpies of alkyl- and heteroatom-functionalized benzene derivatives and heterocyclic molecules was studied using pulse gas chromatography. By means of statistical analysis, experimental data were analyzed and modeled using principal component analysis and partial least-squares regression. Structure-property relationships were established, revealing and confirming several trends. Among the molecular properties governing the adsorption process, vapor pressure, mean polarizability, and dipole moment play a determining role.
NASA Astrophysics Data System (ADS)
Somogyi, Andrea; Medjoubi, Kadda; Sancho-Tomas, Maria; Visscher, P. T.; Baranton, Gil; Philippot, Pascal
2017-09-01
The understanding of real complex geological, environmental and geo-biological processes depends increasingly on in-depth, non-invasive study of chemical composition and morphology. In this paper we used scanning hard X-ray nanoprobe techniques to study the elemental composition, morphology and As speciation in complex, highly heterogeneous geological samples. Multivariate statistical techniques, such as principal component analysis and clustering, were used for data interpretation. These measurements revealed the quantitative and valence-state inhomogeneity of As and its relation to the total compositional and morphological variation of the sample at sub-μm scales.
NASA Astrophysics Data System (ADS)
Hristian, L.; Ostafe, M. M.; Manea, L. R.; Apostol, L. L.
2017-06-01
This work examined the distribution of combed wool fabrics intended for the manufacture of outer clothing in terms of the values of their durability and physiological comfort indices, using the mathematical model of Principal Component Analysis (PCA). PCA, as applied in this study, is a descriptive method of multivariate analysis of multi-dimensional data that aims to reduce, in a controlled way, the number of variables (columns) of the data matrix as far as possible, ideally to two or three. Therefore, based on the information about each group/assortment of fabrics, the goal is to replace the nine inter-correlated variables with only two or three new variables called components. The aim of PCA is to extract the smallest number of components that recover most of the total information contained in the initial data.
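For illustration only, a short Python sketch of the same kind of reduction, with simulated data standing in for the nine inter-correlated fabric indices.

```python
# Minimal sketch: standardize nine correlated indices and keep a few components.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
latent = rng.normal(size=(60, 2))                   # two hidden "quality" factors
loadings = rng.normal(size=(2, 9))                  # nine correlated indices derive from them
X = latent @ loadings + rng.normal(0, 0.3, (60, 9))

Z = StandardScaler().fit_transform(X)               # PCA on standardized variables
pca = PCA(n_components=3).fit(Z)
print("variance explained by 3 components:", pca.explained_variance_ratio_.sum())
scores = pca.transform(Z)                           # new variables ("components") per fabric
print("scores shape:", scores.shape)
```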
ERIC Educational Resources Information Center
Battle, Danielle
2010-01-01
While the National Center for Education Statistics (NCES) has conducted surveys of attrition and mobility among school teachers for two decades, little was known about similar movements among school principals. In order to inform discussions and decisions among policymakers, researchers, and parents, the 2008-09 Principal Follow-up Survey (PFS)…
Li, Jinling; He, Ming; Han, Wei; Gu, Yifan
2009-05-30
An investigation of heavy metal sources, i.e., Cu, Zn, Ni, Pb, Cr, and Cd, in the coastal soils of Shanghai, China, was conducted using multivariate statistical methods (principal component analysis, clustering analysis, and correlation analysis). The results of the multivariate analysis showed that: (i) Cu, Ni, Pb, and Cd had anthropogenic sources (e.g., overuse of chemical fertilizers and pesticides, industrial and municipal discharges, animal wastes, sewage irrigation, etc.); (ii) Zn and Cr were associated with parent materials and therefore had natural sources (e.g., the weathering of parent materials and subsequent pedogenesis in the alluvial deposits). The content of heavy metals in the soils was greatly affected by soil formation, atmospheric deposition, and human activities. These findings provide essential information on the possible sources of heavy metals, which can contribute to the monitoring and assessment of agricultural soils in regions worldwide.
The contribution of executive functions to emergent mathematic skills in preschool children.
Espy, Kimberly Andrews; McDiarmid, Melanie M; Cwik, Mary F; Stalets, Melissa Meade; Hamby, Arlena; Senn, Theresa E
2004-01-01
Mathematical ability is related both to activation of the prefrontal cortex in neuroimaging studies of adults and to executive functions in school-age children. The purpose of this study was to determine whether executive functions were related to emergent mathematical proficiency in preschool children. Preschool children (N = 96) were administered an executive function battery that was reduced empirically to working memory (WM), inhibitory control (IC), and shifting abilities by calculating composite scores derived from principal component analysis. Both WM and IC predicted early arithmetic competency, with the observed relations remaining robust after controlling statistically for child age, maternal education, and child vocabulary. Only IC accounted for unique variance in mathematical skills after the contributions of the other executive functions were also controlled statistically. Specific executive functions are thus related to emergent mathematical proficiency in this age range. Longitudinal studies using structural equation modeling are necessary to better characterize these ontogenetic relations.
Guo, Jing; Yuan, Yahong; Dou, Pei; Yue, Tianli
2017-10-01
Fifty-one kiwifruit juice samples of seven kiwifruit varieties from five regions in China were analyzed to determine their polyphenol contents and to trace fruit varieties and geographical origins by multivariate statistical analysis. Twenty-one polyphenols belonging to four compound classes were determined by ultra-high-performance liquid chromatography coupled with ultra-high-resolution TOF mass spectrometry. (-)-Epicatechin, (+)-catechin, procyanidin B1 and caffeic acid derivatives were the predominant phenolic compounds in the juices. Principal component analysis (PCA) allowed a clear separation of the juices according to kiwifruit variety. Stepwise linear discriminant analysis (SLDA) yielded satisfactory categorization of the samples, providing a 100% success rate according to kiwifruit variety and a 92.2% success rate according to geographical origin. The results showed that the polyphenolic profiles of kiwifruit juices contain enough information to trace fruit varieties and geographical origins. Copyright © 2017 Elsevier Ltd. All rights reserved.
Stupák, Ivan; Pavloková, Sylvie; Vysloužil, Jakub; Dohnal, Jiří; Čulen, Martin
2017-11-23
Biorelevant dissolution instruments represent an important tool for pharmaceutical research and development. These instruments are designed to simulate the dissolution of drug formulations under conditions most closely mimicking the gastrointestinal tract. In this work, we focused on the optimization of dissolution compartments/vessels for an updated version of the biorelevant dissolution apparatus Golem v2. We designed eight compartments of uniform size but different inner geometry. The dissolution performance of the compartments was tested using immediate-release caffeine tablets and evaluated by standard statistical methods and principal component analysis. Based on two phases of dissolution testing (using 250 and 100 mL of dissolution medium), we selected two compartment types yielding the highest measurement reproducibility. We also confirmed a statistically significant effect of agitation rate and dissolution volume on the extent of drug dissolved and on measurement reproducibility.
Provenance establishment of coffee using solution ICP-MS and ICP-AES.
Valentin, Jenna L; Watling, R John
2013-11-01
Statistical interpretation of the concentrations of 59 elements, determined using solution-based inductively coupled plasma mass spectrometry (ICP-MS) and inductively coupled plasma atomic emission spectroscopy (ICP-AES), was used to establish the provenance of coffee samples from 15 countries across five continents. The data confirmed that the harvest year, degree of ripeness, and whether the coffees were green or roasted had little effect on the elemental composition of the coffees. The application of linear discriminant analysis and principal component analysis to the elemental concentrations permitted up to 96.9% correct classification of the coffee samples according to their continent of origin. When samples from each continent were considered separately, up to 100% correct classification of coffee samples into their countries and plantations of origin was achieved. This research demonstrates the potential of using elemental composition, in combination with statistical classification methods, for accurate provenance establishment of coffee. Copyright © 2013 Elsevier Ltd. All rights reserved.
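A hedged Python sketch of the PCA-plus-LDA classification workflow on simulated element concentrations; the sample sizes and class structure below are assumptions, not the paper's data.

```python
# Reduce the element profile with PCA, classify origin with LDA, and score the
# pipeline by cross-validation as a stand-in for the reported classification rates.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n_per_origin, n_elements = 30, 59
origins = np.repeat(np.arange(5), n_per_origin)              # five "continents"
X = rng.normal(size=(origins.size, n_elements)) + origins[:, None] * 0.8

clf = make_pipeline(StandardScaler(), PCA(n_components=10),
                    LinearDiscriminantAnalysis())
print("cross-validated accuracy:", cross_val_score(clf, X, origins, cv=5).mean())
```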
Multi-element fingerprinting as a tool in origin authentication of four east China marine species.
Guo, Lipan; Gong, Like; Yu, Yanlei; Zhang, Hong
2013-12-01
The contents of 25 elements in 4 types of commercial marine species from the East China Sea were determined by inductively coupled plasma mass spectrometry and atomic absorption spectrometry. The elemental composition was used to differentiate the marine species according to geographical origin by multivariate statistical analysis. The results showed that principal component analysis could distinguish samples from different areas and reveal the elements that played the most important role in origin diversity. The models established by partial least squares discriminant analysis (PLS-DA) and by a probabilistic neural network (PNN) can both precisely predict the origin of the marine species. Further study indicated that PLS-DA and PNN were efficacious in regional discrimination. The models from these 2 statistical methods, with accuracies of 97.92% and 100%, respectively, could both distinguish samples from different areas without the need for species differentiation. © 2013 Institute of Food Technologists®
Torres, M E; Añino, M M; Schlotthauer, G
2003-12-01
It is well known that, from a dynamical point of view, sudden variations in the physiological parameters that govern certain diseases can cause qualitative changes in the dynamics of the corresponding physiological process. The purpose of this paper is to introduce a technique that allows the automated temporal localization of slight changes in a parameter of the law that governs the nonlinear dynamics of a given signal. The tool draws from the multiresolution entropies the ability to reveal these changes as statistical variations at each scale; these variations are captured in the corresponding principal component. Appropriately combining these techniques with a statistical change detector yields a complexity change detection algorithm. The relevance of the approach, together with its robustness in the presence of moderate noise, is discussed in numerical simulations, and the automatic detector is applied to real and simulated biological signals.
Rapid analysis of pharmaceutical drugs using LIBS coupled with multivariate analysis.
Tiwari, P K; Awasthi, S; Kumar, R; Anand, R K; Rai, P K; Rai, A K
2018-02-01
Type 2 diabetes drug tablets of various brands containing voglibose at dose strengths of 0.2 and 0.3 mg have been examined using the laser-induced breakdown spectroscopy (LIBS) technique. Statistical methods such as principal component analysis (PCA) and partial least squares regression (PLSR) have been employed on the LIBS spectral data for classifying the drug samples and developing calibration models. We developed a ratio-based calibration model applying PLSR, in which the relative spectral intensity ratios H/C, H/N and O/N are used. The developed model was then employed to predict the relative concentrations of elements in unknown drug samples. The experiment was performed in air and in an argon atmosphere, and the results obtained were compared. The present model provides a rapid spectroscopic method for drug analysis with high statistical significance for online control and measurement processes in a wide variety of pharmaceutical industrial applications.
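A small Python sketch of a ratio-based PLSR calibration of the type described, with simulated intensity ratios and concentrations standing in for the LIBS measurements.

```python
# Illustrative PLSR calibration on three intensity ratios (H/C, H/N, O/N);
# the data here are simulated, not LIBS spectra.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(3)
conc = rng.uniform(0.2, 0.3, 40)                       # hypothetical dose strengths (mg)
ratios = np.column_stack([
    1.0 + 2.0 * conc + rng.normal(0, 0.02, 40),        # H/C
    0.5 + 1.5 * conc + rng.normal(0, 0.02, 40),        # H/N
    0.8 + 1.0 * conc + rng.normal(0, 0.02, 40),        # O/N
])

pls = PLSRegression(n_components=2).fit(ratios, conc)
print("R^2 of calibration:", pls.score(ratios, conc))
```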
NASA Technical Reports Server (NTRS)
Talpe, Matthieu J.; Nerem, R. Steven; Forootan, Ehsan; Schmidt, Michael; Lemoine, Frank G.; Enderlin, Ellyn M.; Landerer, Felix W.
2017-01-01
We construct long-term time series of Greenland and Antarctic ice sheet mass change from satellite gravity measurements. A statistical reconstruction approach is developed based on a principal component analysis (PCA) to combine high-resolution spatial modes from the Gravity Recovery and Climate Experiment (GRACE) mission with the gravity information from conventional satellite tracking data. Uncertainties of this reconstruction are rigorously assessed; they include temporal limitations for short GRACE measurements, spatial limitations for the low-resolution conventional tracking data measurements, and limitations of the estimated statistical relationships between low- and high-degree potential coefficients reflected in the PCA modes. Trends of mass variations in Greenland and Antarctica are assessed against a number of previous studies. The resulting time series for Greenland show a higher rate of mass loss than other methods before 2000, while the Antarctic ice sheet appears heavily influenced by interannual variations.
Principal Component Clustering Approach to Teaching Quality Discriminant Analysis
ERIC Educational Resources Information Center
Xian, Sidong; Xia, Haibo; Yin, Yubo; Zhai, Zhansheng; Shang, Yan
2016-01-01
Teaching quality is the lifeline of the higher education. Many universities have made some effective achievement about evaluating the teaching quality. In this paper, we establish the Students' evaluation of teaching (SET) discriminant analysis model and algorithm based on principal component clustering analysis. Additionally, we classify the SET…
Analysis of the principal component algorithm in phase-shifting interferometry.
Vargas, J; Quiroga, J Antonio; Belenguer, T
2011-06-15
We recently presented a new asynchronous demodulation method for phase-sampling interferometry. The method is based on the principal component analysis (PCA) technique. In the former work, the PCA method was derived heuristically. In this work, we present an in-depth analysis of the PCA demodulation method.
Psychometric Measurement Models and Artificial Neural Networks
ERIC Educational Resources Information Center
Sese, Albert; Palmer, Alfonso L.; Montano, Juan J.
2004-01-01
The study of measurement models in psychometrics by means of dimensionality reduction techniques such as Principal Components Analysis (PCA) is a very common practice. In recent times, an upsurge of interest in the study of artificial neural networks apt to computing a principal component extraction has been observed. Despite this interest, the…
Microelectrode arrays (MEAs) detect drug and chemical induced changes in neuronal network function and have been used for neurotoxicity screening. As a proof-of-concept, the current study assessed the utility of analytical "fingerprinting" using Principal Components Analysis (P...
Incremental principal component pursuit for video background modeling
Rodriquez-Valderrama, Paul A.; Wohlberg, Brendt
2017-03-14
We describe an incremental Principal Component Pursuit (PCP) algorithm for video background modeling that is able to process one frame at a time while adapting to changes in the background. It has a computational complexity that allows for real-time processing, a low memory footprint, and robustness to translational and rotational jitter.
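As a rough illustration only, the following Python sketch builds an incremental low-rank background model and flags the residual as foreground; it uses scikit-learn's IncrementalPCA on small batches of synthetic frames, not the authors' PCP algorithm.

```python
# Simplified stand-in for incremental background modeling: stream frames in
# small batches, keep a low-rank model, treat large residuals as foreground.
import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.default_rng(4)
h, w, n_frames = 32, 32, 200
background = rng.uniform(size=(h, w))
frames = np.stack([background + rng.normal(0, 0.02, (h, w)) for _ in range(n_frames)])
frames[50:60, 10:20, 10:20] += 1.0                    # a moving "object" in some frames

ipca = IncrementalPCA(n_components=2)
flat = frames.reshape(n_frames, -1)
for start in range(0, n_frames, 20):                  # stream the video in small batches
    ipca.partial_fit(flat[start:start + 20])

residual = flat - ipca.inverse_transform(ipca.transform(flat))
foreground_mask = np.abs(residual).reshape(frames.shape) > 0.5
print("frames flagged with foreground:", np.unique(np.where(foreground_mask)[0])[:5])
```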
Sequential analysis of hydrochemical data for watershed characterization.
Thyne, Geoffrey; Güler, Cüneyt; Poeter, Eileen
2004-01-01
A methodology for characterizing the hydrogeology of watersheds using hydrochemical data that combines statistical, geochemical, and spatial techniques is presented. Surface water and ground water base flow and spring runoff samples (180 total) from a single watershed are first classified using hierarchical cluster analysis. The statistical clusters are analyzed for spatial coherence, confirming that the clusters have a geological basis corresponding to topographic flowpaths and showing that the fractured rock aquifer behaves as an equivalent porous medium on the watershed scale. Then principal component analysis (PCA) is used to determine the sources of variation between parameters. The PCA shows that the variations within the dataset are related to variations in calcium, magnesium, SO4, and HCO3, which are derived from natural weathering reactions, and in pH, NO3, and chloride, which indicate anthropogenic impact. PHREEQC modeling is used to quantitatively describe the natural hydrochemical evolution of the watershed and to aid in discrimination of samples that have an anthropogenic component. Finally, the seasonal changes in the water chemistry of individual sites were analyzed to better characterize the spatial variability of vertical hydraulic conductivity. The integrated result provides a method to characterize the hydrogeology of the watershed that fully utilizes traditional data.
Fault detection, isolation, and diagnosis of self-validating multifunctional sensors.
Yang, Jing-Li; Chen, Yin-Sheng; Zhang, Li-Li; Sun, Zhen
2016-06-01
A novel fault detection, isolation, and diagnosis (FDID) strategy for self-validating multifunctional sensors is presented in this paper. The sparse non-negative matrix factorization-based method can effectively detect faults by using the squared prediction error (SPE) statistic, and variable contribution plots based on the SPE statistic can help to locate and isolate the faulty sensitive units. Complete ensemble empirical mode decomposition is employed to decompose the fault signals into a series of intrinsic mode functions (IMFs) and a residual. The sample entropy (SampEn)-weighted energy values of each IMF and of the residual are estimated to represent the characteristics of the fault signals. A multi-class support vector machine is introduced to identify the fault mode with the purpose of diagnosing the status of the faulty sensitive units. The performance of the proposed strategy is compared with other fault detection strategies, such as principal component analysis and independent component analysis, and with fault diagnosis strategies such as empirical mode decomposition coupled with a support vector machine. The proposed strategy is fully evaluated on a real self-validating multifunctional sensor experimental system, and the experimental results demonstrate that the proposed strategy provides an excellent solution to the FDID research topic for self-validating multifunctional sensors.
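The SPE idea can be illustrated with a plain-PCA residual model; the following Python sketch uses simulated sensor data, not the sparse NMF model of the paper, and computes the SPE statistic, an empirical control limit, and per-variable contributions.

```python
# Illustrative SPE (Q-statistic) detector built on a PCA model of healthy data.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(5)
normal = rng.normal(size=(500, 6)) @ rng.normal(size=(6, 6))   # correlated healthy data
faulty = normal[-50:].copy()
faulty[:, 2] += 4.0                                            # bias fault on one sensitive unit

pca = PCA(n_components=3).fit(normal[:400])                    # train on healthy history

def spe(X):
    recon = pca.inverse_transform(pca.transform(X))
    return np.sum((X - recon) ** 2, axis=1)                    # squared prediction error

threshold = np.percentile(spe(normal[:400]), 99)               # empirical control limit
alarms = spe(faulty) > threshold
print("fault detection rate:", alarms.mean())

# Per-variable contributions to the SPE point toward the faulty unit.
contrib = (faulty - pca.inverse_transform(pca.transform(faulty))) ** 2
print("most implicated variable:", contrib.mean(axis=0).argmax())
```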
Dynamic competitive probabilistic principal components analysis.
López-Rubio, Ezequiel; Ortiz-de-Lazcano-Lobato, Juan Miguel
2009-04-01
We present a new neural model which extends the classical competitive learning (CL) by performing a Probabilistic Principal Components Analysis (PPCA) at each neuron. The model also has the ability to learn the number of basis vectors required to represent the principal directions of each cluster, so it overcomes a drawback of most local PCA models, where the dimensionality of a cluster must be fixed a priori. Experimental results are presented to show the performance of the network with multispectral image data.
A principal components model of soundscape perception.
Axelsson, Östen; Nilsson, Mats E; Berglund, Birgitta
2010-11-01
There is a need for a model that identifies underlying dimensions of soundscape perception, and which may guide measurement and improvement of soundscape quality. With the purpose to develop such a model, a listening experiment was conducted. One hundred listeners measured 50 excerpts of binaural recordings of urban outdoor soundscapes on 116 attribute scales. The average attribute scale values were subjected to principal components analysis, resulting in three components: Pleasantness, eventfulness, and familiarity, explaining 50, 18 and 6% of the total variance, respectively. The principal-component scores were correlated with physical soundscape properties, including categories of dominant sounds and acoustic variables. Soundscape excerpts dominated by technological sounds were found to be unpleasant, whereas soundscape excerpts dominated by natural sounds were pleasant, and soundscape excerpts dominated by human sounds were eventful. These relationships remained after controlling for the overall soundscape loudness (Zwicker's N(10)), which shows that 'informational' properties are substantial contributors to the perception of soundscape. The proposed principal components model provides a framework for future soundscape research and practice. In particular, it suggests which basic dimensions are necessary to measure, how to measure them by a defined set of attribute scales, and how to promote high-quality soundscapes.
Zhao, Yueran; Dou, Deqiang; Guo, Yueqiu; Qi, Yue; Li, Jun; Jia, Dong
2018-06-01
Thirteen trace elements and the active constituents of 40 batches of Lonicera japonica flos and Lonicera flos were comparatively studied using inductively coupled plasma mass spectrometry (ICP-MS) and high-performance liquid chromatography with photodiode array detection (HPLC-PDA). The trace elements were 24Mg, 52Cr, 55Mn, 57Fe, 60Ni, 63Cu, 66Zn, 75As, 82Se, 98Mo, 114Cd, 202Hg, and 208Pb, and the active compounds were chlorogenic acid, 3,5-O-dicaffeoylquinic acid, 4,5-O-dicaffeoylquinic acid, luteolin-7-O-glucoside, and 4-O-caffeoylquinic acid. The data for the 18 variables were statistically processed using principal component analysis (PCA) and discriminant analysis (DA) to classify L. japonica flos and L. flos. The validated method divided the 40 samples into two groups based on the PCA of the 18 variables. Furthermore, the species of Lonicera were better discriminated by using DA with 12 variables. These results suggest that the method and the statistical analysis of the contents of trace elements and chemical components can classify L. japonica flos and L. flos using 12 variables, namely 3,5-O-dicaffeoylquinic acid, luteolin-7-O-glucoside, Cd, Mn, Hg, Pb, Ni, 4-O-caffeoylquinic acid, 4,5-O-dicaffeoylquinic acid, Fe, Mg, and Cr.
Environmental nanoparticles are significantly over-expressed in acute myeloid leukemia.
Visani, G; Manti, A; Valentini, L; Canonico, B; Loscocco, F; Isidori, A; Gabucci, E; Gobbi, P; Montanari, S; Rocchi, M; Papa, S; Gatti, A M
2016-11-01
The increase in the incidence of acute myeloid leukemia (AML) may suggest a possible environmental etiology. PM2.5 was declared a Class I carcinogen by IARC. No report has focused on particulate environmental pollution in relation to AML. This study investigated the presence and composition of particulate matter in blood with a Scanning Electron Microscope coupled with an Energy Dispersive Spectroscope, a sensor capable of identifying the composition of foreign bodies. 38 peripheral blood samples, 19 AML cases and 19 healthy controls, were analyzed. A significant overload of particulate matter-derived nanoparticles linked or aggregated to blood components was found in AML patients, while such nanoparticles were almost absent in matched healthy controls. A two-tailed Student's t-test, MANOVA and Principal Component Analysis indicated that the total numbers of aggregates and particles were statistically different between cases and controls (MANOVA, P<0.001 and P=0.009, respectively). The particles detected were shown to contain highly reactive, non-biocompatible and non-biodegradable metals; in particular, micro- and nano-sized particles grouped in organic/inorganic clusters, with a statistically higher frequency of a subgroup of elements in AML samples. The demonstration, for the first time, of an overload of nanoparticles linked to blood components in AML patients could be the basis for a possible, novel pathogenetic mechanism of AML development. Copyright © 2016 Elsevier Ltd. All rights reserved.
The Statistics and Mathematics of High Dimension Low Sample Size Asymptotics.
Shen, Dan; Shen, Haipeng; Zhu, Hongtu; Marron, J S
2016-10-01
The aim of this paper is to establish several deep theoretical properties of principal component analysis for multiple-component spike covariance models. Our new results reveal an asymptotic conical structure in critical sample eigendirections under the spike models with distinguishable (or indistinguishable) eigenvalues, when the sample size and/or the number of variables (or dimension) tend to infinity. The consistency of the sample eigenvectors relative to their population counterparts is determined by the ratio between the dimension and the product of the sample size with the spike size. When this ratio converges to a nonzero constant, the sample eigenvector converges to a cone, with a certain angle to its corresponding population eigenvector. In the High Dimension, Low Sample Size case, the angle between the sample eigenvector and its population counterpart converges to a limiting distribution. Several generalizations of the multi-spike covariance models are also explored, and additional theoretical results are presented.
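A small numerical simulation in the spirit of the spike-model discussion, with a single-spike diagonal covariance and illustrative values of n, d and spike size, shows how the angle between the leading sample eigenvector and its population counterpart grows with dimension.

```python
# Angle between leading sample and population eigenvectors under a one-spike model.
import numpy as np

rng = np.random.default_rng(6)
n, spike = 20, 50.0                                   # small sample, one large eigenvalue
for d in (50, 500, 5000):
    v = np.zeros(d); v[0] = 1.0                       # population leading eigenvector
    scale = np.ones(d); scale[0] = np.sqrt(spike)
    X = rng.normal(size=(n, d)) * scale               # draws from N(0, diag(spike, 1, ..., 1))
    _, _, Vt = np.linalg.svd(X, full_matrices=False)  # right singular vectors = sample PCs
    angle = np.degrees(np.arccos(abs(Vt[0] @ v)))
    print(f"d={d:5d}  ratio d/(n*spike)={d/(n*spike):5.2f}  angle={angle:5.1f} deg")
```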
Face-space architectures: evidence for the use of independent color-based features.
Nestor, Adrian; Plaut, David C; Behrmann, Marlene
2013-07-01
The concept of psychological face space lies at the core of many theories of face recognition and representation. To date, much of the understanding of face space has been based on principal component analysis (PCA); the structure of the psychological space is thought to reflect some important aspects of a physical face space characterized by PCA applications to face images. In the present experiments, we investigated alternative accounts of face space and found that independent component analysis provided the best fit to human judgments of face similarity and identification. Thus, our results challenge an influential approach to the study of human face space and provide evidence for the role of statistically independent features in face encoding. In addition, our findings support the use of color information in the representation of facial identity, and we thus argue for the inclusion of such information in theoretical and computational constructs of face space.
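A toy Python contrast between PCA and ICA on mixed non-Gaussian sources (not the face stimuli used in the study) illustrates why ICA can recover statistically independent features where PCA does not.

```python
# Mix two independent non-Gaussian sources and check which decomposition recovers them.
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(7)
s = np.column_stack([np.sign(rng.normal(size=2000)),          # independent, non-Gaussian sources
                     rng.laplace(size=2000)])
A = np.array([[1.0, 0.6], [0.4, 1.0]])                        # mixing matrix
X = s @ A.T

pca_rec = PCA(n_components=2).fit_transform(X)
ica_rec = FastICA(n_components=2, random_state=0).fit_transform(X)

def max_abs_corr(est, true):
    c = np.corrcoef(est.T, true.T)[:2, 2:]
    return np.abs(c).max(axis=1)                              # best match per component

print("PCA recovery of sources:", max_abs_corr(pca_rec, s))
print("ICA recovery of sources:", max_abs_corr(ica_rec, s))
```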
DOE Office of Scientific and Technical Information (OSTI.GOV)
2012-01-05
SandiaMCR was developed to identify pure components and their concentrations from spectral data. This software efficiently implements multivariate curve resolution-alternating least squares (MCR-ALS), principal component analysis (PCA), and singular value decomposition (SVD). Version 3.37 also includes the PARAFAC-ALS and Tucker-1 (for trilinear analysis) algorithms. The alternating least squares methods can be used to determine the composition without, or with incomplete, prior information on the constituents and their concentrations. The software allows the specification of numerous preprocessing, initialization, data selection, and compression options for the efficient processing of large data sets. It includes numerous options, including the definition of equality and non-negativity constraints to realistically restrict the solution set, various normalization or weighting options based on the statistics of the data, several initialization choices, and data compression. The software has been designed to provide a practicing spectroscopist the tools required to routinely analyze data in a reasonable time and without requiring expert intervention.
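A rough Python sketch of the alternating-least-squares idea behind MCR-type analyses; this is not the SandiaMCR implementation, and the data, component count, and clipping-based non-negativity are illustrative assumptions.

```python
# Factor D ≈ C @ S.T with non-negativity enforced by clipping after each
# unconstrained least-squares step (a crude but readable ALS loop).
import numpy as np

rng = np.random.default_rng(8)
true_S = np.abs(rng.normal(size=(100, 2)))        # two pure-component "spectra"
true_C = np.abs(rng.normal(size=(40, 2)))         # concentrations in 40 mixtures
D = true_C @ true_S.T + rng.normal(0, 0.01, (40, 100))

C = np.abs(rng.normal(size=(40, 2)))              # random initial guess
for _ in range(200):
    S = np.linalg.lstsq(C, D, rcond=None)[0].T    # solve for spectra given concentrations
    S = np.clip(S, 0, None)
    C = np.linalg.lstsq(S, D.T, rcond=None)[0].T  # solve for concentrations given spectra
    C = np.clip(C, 0, None)

print("residual norm:", np.linalg.norm(D - C @ S.T))
```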
Das, Atanu; Mukhopadhyay, Chaitali
2007-10-28
We have performed molecular dynamics (MD) simulations of the thermal denaturation of one protein and one peptide, ubiquitin and melittin. To identify the correlation in dynamics among various secondary structural fragments and also the individual contributions of different residues to thermal unfolding, the principal component analysis method was applied, giving new insight into protein dynamics by analyzing the contributions of the coefficients of the principal components. The cross-correlation matrix obtained from the MD simulation trajectory provided important information regarding the anisotropy of the backbone dynamics that leads to unfolding. Unfolding of ubiquitin was found to be a three-state process, while that of melittin, though smaller and mostly helical, is more complicated.
SAS program for quantitative stratigraphic correlation by principal components
Hohn, M.E.
1985-01-01
A SAS program is presented that constructs a composite section of stratigraphic events through principal components analysis. The variables in the analysis are stratigraphic sections and the observational units are the range limits of taxa. The program standardizes the data in each section, extracts eigenvectors, estimates missing range limits, and computes the composite section from the scores of events on the first principal component. An option for several types of diagnostic plots is provided; these help one to identify conservative range limits or unrealistic estimates of missing values. Inspection of the graphs and eigenvalues allows one to evaluate the goodness of fit between the composite and the measured data. The program is easily extended to the creation of a rank-order composite. © 1985.
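A conceptual Python sketch of the composite-section idea, not the SAS program itself; the three simulated "sections" and their distortions are assumptions for illustration.

```python
# Treat each measured section as a variable, each taxon range limit as an
# observation, standardize within sections, and order events by PC1 scores.
import numpy as np

rng = np.random.default_rng(9)
true_order = np.sort(rng.uniform(0, 100, 25))              # "true" event positions
sections = np.column_stack([true_order * s + rng.normal(0, 3, 25)
                            for s in (0.8, 1.0, 1.3)])     # three measured sections

Z = (sections - sections.mean(axis=0)) / sections.std(axis=0)   # standardize each section
eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
pc1 = Z @ eigvecs[:, -1]                                   # scores on the first PC
ranks = np.argsort(np.argsort(pc1))                        # composite ordering of events
r = np.corrcoef(ranks, np.arange(25))[0, 1]
print("rank correlation with true event order:", abs(r))   # sign of PC1 is arbitrary
```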
NASA Astrophysics Data System (ADS)
Werth, Alexandra; Liakat, Sabbir; Dong, Anqi; Woods, Callie M.; Gmachl, Claire F.
2018-05-01
An integrating sphere is used to enhance the collection of backscattered light in a noninvasive glucose sensor based on quantum cascade laser spectroscopy. The sphere enhances signal stability by roughly an order of magnitude, allowing us to use a thermoelectrically (TE) cooled detector while maintaining comparable glucose prediction accuracy. Using a smaller TE-cooled detector reduces the form factor, creating a mobile sensor. Principal component analysis of spectra taken from human subjects yielded principal components that closely match the absorption peaks of glucose. These principal components are used as regressors in a linear regression algorithm to make glucose concentration predictions, over 75% of which are clinically accurate.
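A minimal principal-component-regression sketch in the spirit of the approach described above, with simulated spectra and glucose values in place of the measured data.

```python
# PCR pipeline: standardize spectra, keep a few principal components, regress
# glucose concentration on the component scores.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(10)
n_samples, n_wavelengths = 80, 300
glucose = rng.uniform(70, 200, n_samples)                     # hypothetical mg/dL values
peak = np.exp(-0.5 * ((np.arange(n_wavelengths) - 150) / 10) ** 2)
spectra = glucose[:, None] * peak + rng.normal(0, 5, (n_samples, n_wavelengths))

pcr = make_pipeline(StandardScaler(), PCA(n_components=5), LinearRegression())
print("cross-validated R^2:", cross_val_score(pcr, spectra, glucose, cv=5).mean())
```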
ERIC Educational Resources Information Center
Ilgan, Abdurrahman; Parylo, Oksana; Sungu, Hilmi
2015-01-01
This quantitative research examined instructional supervision behaviours of school principals as a predictor of teacher job satisfaction through the analysis of Turkish teachers' perceptions of principals' instructional supervision behaviours. There was a statistically significant difference found between the teachers' job satisfaction level and…
ERIC Educational Resources Information Center
Chirichello, Michael
There are about 80,000 public school principals in the United States. The Bureau of Labor Statistics estimates there will be a 10 percent increase in the employment of educational administrators of all types through 2006. The National Association of Elementary School Principals estimates that more than 40 percent of principals will retire or leave…
Principals' Perceptions of Collegial Support as a Component of Administrative Inservice.
ERIC Educational Resources Information Center
Daresh, John C.
To address the problem of increasing professional isolation of building administrators, the Principals' Inservice Project helps establish principals' collegial support groups across the nation. The groups are typically composed of 6 to 10 principals who meet at least once each month over a 2-year period. One collegial support group of seven…
Training the Trainers: Learning to Be a Principal Supervisor
ERIC Educational Resources Information Center
Saltzman, Amy
2017-01-01
While most principal supervisors are former principals themselves, few come to the role with specific training in how to do the job effectively. For this reason, both the Washington, D.C., and Tulsa, Oklahoma, principal supervisor programs include a strong professional development component. In this article, the author takes a look inside these…
Zeng, Irene Sui Lan; Lumley, Thomas
2018-01-01
Integrated omics is becoming a new channel for investigating the complex molecular system in modern biological science and sets a foundation for systematic learning for precision medicine. The statistical/machine learning methods that have emerged in the past decade for integrated omics are not only innovative but also multidisciplinary, with integrated knowledge in biology, medicine, statistics, machine learning, and artificial intelligence. Here, we review the nontrivial classes of learning methods from the statistical aspects and streamline these learning methods within the statistical learning framework. The intriguing findings from the review are that the methods used are generalizable to other disciplines with complex systematic structure, and that integrated omics is part of an integrated information science that collates and integrates different types of information for inference and decision making. We review the statistical learning methods of exploratory and supervised learning from 42 publications. We also discuss the strengths and limitations of extended principal component analysis, cluster analysis, network analysis, and regression methods. Statistical techniques such as penalization for sparsity induction when there are fewer observations than features, and Bayesian approaches when there is prior knowledge to be integrated, are also included in the commentary. For completeness of the review, a table of currently available software and packages from 23 publications for omics is summarized in the appendix.
ERIC Educational Resources Information Center
Rodrigue, Christine M.
2011-01-01
This paper presents a laboratory exercise used to teach principal components analysis (PCA) as a means of surface zonation. The lab was built around abundance data for 16 oxides and elements collected by the Mars Exploration Rover Spirit in Gusev Crater between Sol 14 and Sol 470. Students used PCA to reduce 15 of these into 3 components, which,…
Anatomical curve identification
Bowman, Adrian W.; Katina, Stanislav; Smith, Joanna; Brown, Denise
2015-01-01
Methods for capturing images in three dimensions are now widely available, with stereo-photogrammetry and laser scanning being two common approaches. In anatomical studies, a number of landmarks are usually identified manually from each of these images and these form the basis of subsequent statistical analysis. However, landmarks express only a very small proportion of the information available from the images. Anatomically defined curves have the advantage of providing a much richer expression of shape. This is explored in the context of identifying the boundary of breasts from an image of the female torso and the boundary of the lips from a facial image. The curves of interest are characterised by ridges or valleys. Key issues in estimation are the ability to navigate across the anatomical surface in three dimensions, the ability to recognise the relevant boundary and the need to assess the evidence for the presence of the surface feature of interest. The first issue is addressed by the use of principal curves, as an extension of principal components, the second by suitable assessment of curvature and the third by change-point detection. P-spline smoothing is used as an integral part of the methods but adaptations are made to the specific anatomical features of interest. After estimation of the boundary curves, the intermediate surfaces of the anatomical feature of interest can be characterised by surface interpolation. This allows shape variation to be explored using standard methods such as principal components. These tools are applied to a collection of images of women where one breast has been reconstructed after mastectomy and where interest lies in shape differences between the reconstructed and unreconstructed breasts. They are also applied to a collection of lip images where possible differences in shape between males and females are of interest. PMID:26041943
ERIC Educational Resources Information Center
Ackermann, Margot Elise; Morrow, Jennifer Ann
2008-01-01
The present study describes the development and initial validation of the Coping with the College Environment Scale (CWCES). Participants included 433 college students who took an online survey. Principal Components Analysis (PCA) revealed six coping strategies: planning and self-management, seeking support from institutional resources, escaping…
NASA Astrophysics Data System (ADS)
Kistenev, Yu. V.; Shapovalov, A. V.; Borisov, A. V.; Vrazhnov, D. A.; Nikolaev, V. V.; Nikiforova, O. Yu.
2015-11-01
The results of a comparison of different mother wavelets used for de-noising model and experimental data, represented by profiles of absorption spectra of exhaled air, are presented. The impact of wavelet de-noising on the quality of classification by principal component analysis is also discussed.
Evaluation of skin melanoma in spectral range 450-950 nm using principal component analysis
NASA Astrophysics Data System (ADS)
Jakovels, D.; Lihacova, I.; Kuzmina, I.; Spigulis, J.
2013-06-01
Diagnostic potential of principal component analysis (PCA) of multi-spectral imaging data in the wavelength range 450-950 nm for distant skin melanoma recognition is discussed. Processing of the measured clinical data by means of PCA resulted in clear separation between malignant melanomas and pigmented nevi.
ERIC Educational Resources Information Center
Linting, Marielle; Meulman, Jacqueline J.; Groenen, Patrick J. F.; van der Kooij, Anita J.
2007-01-01
Principal components analysis (PCA) is used to explore the structure of data sets containing linearly related numeric variables. Alternatively, nonlinear PCA can handle possibly nonlinearly related numeric as well as nonnumeric variables. For linear PCA, the stability of its solution can be established under the assumption of multivariate…
40 CFR 60.2998 - What are the principal components of the model rule?
Code of Federal Regulations, 2012 CFR
2012-07-01
... the model rule? 60.2998 Section 60.2998 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED) AIR PROGRAMS (CONTINUED) STANDARDS OF PERFORMANCE FOR NEW STATIONARY SOURCES Emission Guidelines... December 9, 2004 Model Rule-Use of Model Rule § 60.2998 What are the principal components of the model rule...
40 CFR 60.2998 - What are the principal components of the model rule?
Code of Federal Regulations, 2014 CFR
2014-07-01
... the model rule? 60.2998 Section 60.2998 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED) AIR PROGRAMS (CONTINUED) STANDARDS OF PERFORMANCE FOR NEW STATIONARY SOURCES Emission Guidelines... December 9, 2004 Model Rule-Use of Model Rule § 60.2998 What are the principal components of the model rule...
40 CFR 60.2998 - What are the principal components of the model rule?
Code of Federal Regulations, 2011 CFR
2011-07-01
... the model rule? 60.2998 Section 60.2998 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED) AIR PROGRAMS (CONTINUED) STANDARDS OF PERFORMANCE FOR NEW STATIONARY SOURCES Emission Guidelines... December 9, 2004 Model Rule-Use of Model Rule § 60.2998 What are the principal components of the model rule...
40 CFR 60.1580 - What are the principal components of the model rule?
Code of Federal Regulations, 2010 CFR
2010-07-01
... the model rule? 60.1580 Section 60.1580 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED) AIR PROGRAMS (CONTINUED) STANDARDS OF PERFORMANCE FOR NEW STATIONARY SOURCES Emission Guidelines..., 1999 Use of Model Rule § 60.1580 What are the principal components of the model rule? The model rule...
40 CFR 60.2998 - What are the principal components of the model rule?
Code of Federal Regulations, 2013 CFR
2013-07-01
... the model rule? 60.2998 Section 60.2998 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED) AIR PROGRAMS (CONTINUED) STANDARDS OF PERFORMANCE FOR NEW STATIONARY SOURCES Emission Guidelines... December 9, 2004 Model Rule-Use of Model Rule § 60.2998 What are the principal components of the model rule...
Students' Perceptions of Teaching and Learning Practices: A Principal Component Approach
ERIC Educational Resources Information Center
Mukorera, Sophia; Nyatanga, Phocenah
2017-01-01
Students' attendance and engagement with teaching and learning practices is perceived as a critical element for academic performance. Even with stipulated attendance policies, students still choose not to engage. The study employed a principal component analysis to analyze first- and second-year students' perceptions of the importance of the 12…
ERIC Educational Resources Information Center
Hunley-Jenkins, Keisha Janine
2012-01-01
This qualitative study explores large, urban, mid-western principal perspectives about cyberbullying and the policy components and practices that they have found effective and ineffective at reducing its occurrence and/or negative effect on their schools' learning environments. More specifically, the researcher was interested in learning more…
Learning Principal Component Analysis by Using Data from Air Quality Networks
ERIC Educational Resources Information Center
Perez-Arribas, Luis Vicente; Leon-González, María Eugenia; Rosales-Conrado, Noelia
2017-01-01
With the final objective of using computational and chemometric tools in chemistry studies, this paper shows the methodology and interpretation of Principal Component Analysis (PCA) using pollution data from different cities. This paper describes how students can obtain data on air quality and process such data for additional information…
Principal component analysis for protein folding dynamics.
Maisuradze, Gia G; Liwo, Adam; Scheraga, Harold A
2009-01-09
Protein folding is considered here by studying the dynamics of the folding of the triple beta-strand WW domain from the Formin-binding protein 28. Starting from the unfolded state and ending either in the native or nonnative conformational states, trajectories are generated with the coarse-grained united residue (UNRES) force field. The effectiveness of principal components analysis (PCA), an already established mathematical technique for finding global, correlated motions in atomic simulations of proteins, is evaluated here for coarse-grained trajectories. The problems related to PCA and their solutions are discussed. The folding and nonfolding of proteins are examined with free-energy landscapes. Detailed analyses of many folding and nonfolding trajectories at different temperatures show that PCA is very efficient for characterizing the general folding and nonfolding features of proteins. It is shown that the first principal component captures and describes in detail the dynamics of a system. Anomalous diffusion in the folding/nonfolding dynamics is examined by the mean-square displacement (MSD) and the fractional diffusion and fractional kinetic equations. The collisionless (or ballistic) behavior of a polypeptide undergoing Brownian motion along the first few principal components is accounted for.
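A schematic Python example of trajectory PCA followed by a mean-square-displacement estimate along the first principal component, using a random-walk surrogate instead of UNRES trajectories; the frame count, coordinate count, and step sizes are illustrative assumptions.

```python
# PCA of a trajectory-like array and MSD scaling along PC1 (MSD ~ t^alpha).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(11)
n_frames, n_coords = 2000, 30
steps = rng.normal(0, 0.05, (n_frames, n_coords))
steps[:, 0] *= 5.0                                  # one slow, large-amplitude mode
traj = np.cumsum(steps, axis=0)                     # correlated, diffusive motion

pca = PCA(n_components=2).fit(traj)
pc = pca.transform(traj)                            # projections onto PC1, PC2

def msd(x, max_lag=200):
    return np.array([np.mean((x[lag:] - x[:-lag]) ** 2) for lag in range(1, max_lag)])

m = msd(pc[:, 0])
lags = np.arange(1, m.size + 1)
alpha = np.polyfit(np.log(lags), np.log(m), 1)[0]   # slope of log-log MSD curve
print("apparent diffusion exponent along PC1:", round(alpha, 2))
```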
Principal Component 2-D Long Short-Term Memory for Font Recognition on Single Chinese Characters.
Tao, Dapeng; Lin, Xu; Jin, Lianwen; Li, Xuelong
2016-03-01
Chinese character font recognition (CCFR) has received increasing attention as intelligent applications based on optical character recognition become popular. However, traditional CCFR systems do not handle noisy data effectively. By analyzing the basic strokes of Chinese characters in detail, we propose that font recognition on a single Chinese character is a sequence classification problem, which can be effectively solved by recurrent neural networks. For robust CCFR, we integrate a principal component convolution layer with 2-D long short-term memory (2DLSTM) and develop the principal component 2DLSTM (PC-2DLSTM) algorithm. PC-2DLSTM considers two aspects: 1) the principal component layer convolution operation helps remove the noise and obtain rational and complete font information and 2) simultaneously, 2DLSTM handles the long-range contextual processing along scan directions, which contributes to capturing the contrast between character trajectory and background. Experiments using the frequently used CCFR dataset suggest the effectiveness of PC-2DLSTM compared with other state-of-the-art font recognition methods.
Dynamic of consumer groups and response of commodity markets by principal component analysis
NASA Astrophysics Data System (ADS)
Nobi, Ashadun; Alam, Shafiqul; Lee, Jae Woo
2017-09-01
This study investigates financial states and group dynamics by applying principal component analysis to the cross-correlation coefficients of the daily returns of commodity futures. The eigenvalues of the cross-correlation matrix in the 6-month timeframe display similar values during 2010-2011 but decline following 2012. A sharp drop in an eigenvalue implies a significant change in the market state. Three commodity sectors, energy, metals and agriculture, are projected into a two-dimensional space consisting of the first two principal components (PCs). We observe that they form three distinct clusters corresponding to these sectors. However, commodities with distinct features intermingled with one another and scattered during severe crises, such as the European sovereign debt crises. We observe notable changes in the positions of the groups in this two-dimensional space during financial crises. By considering the first principal component (PC1) within the 6-month moving timeframe, we observe that commodities of the same group change states in a similar pattern, and the change of state of one group can be used as a warning for the other groups.
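A toy Python version of the rolling-window eigenvalue tracking described above, on simulated returns rather than commodity futures data; the window length and coupling pattern are illustrative assumptions.

```python
# Track the leading eigenvalue of a ~6-month rolling correlation matrix of returns.
import numpy as np

rng = np.random.default_rng(12)
n_days, n_assets, window = 750, 12, 126
common = rng.normal(0, 1, n_days)
coupling = np.where(np.arange(n_days) < 400, 0.8, 0.2)   # the "market state" weakens later
returns = coupling[:, None] * common[:, None] + rng.normal(0, 1, (n_days, n_assets))

largest_eig = []
for end in range(window, n_days):
    corr = np.corrcoef(returns[end - window:end], rowvar=False)
    largest_eig.append(np.linalg.eigvalsh(corr)[-1])     # leading eigenvalue of the window

largest_eig = np.array(largest_eig)
print("mean leading eigenvalue, early vs late:",
      largest_eig[:100].mean().round(2), largest_eig[-100:].mean().round(2))
```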
Yuan, Yuan-Yuan; Zhou, Yu-Bi; Sun, Jing; Deng, Juan; Bai, Ying; Wang, Jie; Lu, Xue-Feng
2017-06-01
The contents of elements in Nitraria roborowskii samples from fifteen different regions were determined by inductively coupled plasma atomic emission spectrometry (ICP-OES), and the elemental characteristics were analyzed by principal component analysis. The results indicated that 18 mineral elements were detected in N. roborowskii, whereas V could not be detected. Na, K and Ca were present at high concentrations. Ti showed the largest variance in content, while K showed the smallest. Four principal components were obtained from the original data. The cumulative variance contribution rate was 81.542%, and the variance contribution of the first principal component was 44.997%, indicating that Cr, Fe, P and Ca were the characteristic elements of N. roborowskii. Thus, the established method is simple and precise and can be used for the determination of mineral elements in N. roborowskii Kom. fruits. The elemental distribution characteristics among N. roborowskii fruits are related to geographical origin, as clearly revealed by PCA. All the results will provide a good basis for the comprehensive utilization of N. roborowskii. Copyright © by the Chinese Pharmaceutical Association.
Lü, Gui-Cai; Zhao, Wei-Hong; Wang, Jiang-Tao
2011-01-01
Identification techniques for 10 species of red tide algae often found in the coastal areas of China were developed by combining the three-dimensional fluorescence spectra of fluorescent dissolved organic matter (FDOM) from cultured red tide algae with principal component analysis. Based on the results of the principal component analysis, the first principal component loading spectrum of the three-dimensional fluorescence spectrum was chosen as the identification characteristic spectrum for red tide algae, and the phytoplankton fluorescence characteristic spectrum band was established. The 10 algal species were then tested using Bayesian discriminant analysis, with a correct identification rate of more than 92% for Pyrrophyta at the species level and more than 75% for Bacillariophyta at the genus level, within which the correct identification rates were more than 90% for Phaeodactylum and Chaetoceros. The results showed that identification techniques for the 10 species of red tide algae based on the three-dimensional fluorescence spectra of FDOM from cultured red tide algae and principal component analysis work well.
NASA Astrophysics Data System (ADS)
Ji, Yi; Sun, Shanlin; Xie, Hong-Bo
2017-06-01
Discrete wavelet transform (WT) followed by principal component analysis (PCA) has been a powerful approach for the analysis of biomedical signals. Wavelet coefficients at various scales and channels are usually transformed into a one-dimensional array, causing issues such as the curse of dimensionality and the small-sample-size problem. In addition, the lack of time-shift invariance of the WT coefficients can be modeled as noise and degrades classifier performance. In this study, we present a stationary wavelet-based two-directional two-dimensional principal component analysis (SW2D2PCA) method for the efficient and effective extraction of essential feature information from signals. Time-invariant multi-scale matrices are constructed in the first step. The two-directional two-dimensional principal component analysis then operates on the multi-scale matrices to reduce the dimension, rather than on vectors as in conventional PCA. Results are presented from an experiment classifying eight hand motions using 4-channel electromyographic (EMG) signals recorded from healthy subjects and amputees, which illustrates the efficiency and effectiveness of the proposed method for biomedical signal analysis.
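An illustrative two-directional 2-D PCA on a stack of simulated matrices (not the stationary-wavelet matrices of the paper) shows the projection-from-both-sides idea of reducing each matrix without flattening it into a vector.

```python
# Each sample A is reduced as Z.T @ A @ X using row- and column-direction scatter matrices.
import numpy as np

rng = np.random.default_rng(14)
A = rng.normal(size=(100, 16, 24))                    # 100 matrices (e.g., scales x channels)
Ac = A - A.mean(axis=0)

G_col = np.einsum('nij,nkj->ik', Ac, Ac) / A.shape[0]  # column-direction scatter (16x16)
G_row = np.einsum('nij,nik->jk', Ac, Ac) / A.shape[0]  # row-direction scatter (24x24)

_, Z = np.linalg.eigh(G_col)                          # eigenvectors, ascending eigenvalues
_, X = np.linalg.eigh(G_row)
Z, X = Z[:, -4:], X[:, -6:]                           # keep top 4 and top 6 directions

features = np.einsum('ik,nij,jl->nkl', Z, Ac, X)      # each sample reduced to a 4x6 matrix
print("feature shape per sample:", features.shape[1:])
```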
Hyperspectral optical imaging of human iris in vivo: characteristics of reflectance spectra
NASA Astrophysics Data System (ADS)
Medina, José M.; Pereira, Luís M.; Correia, Hélder T.; Nascimento, Sérgio M. C.
2011-07-01
We report a hyperspectral imaging system for measuring the reflectance spectra of real human irises with high spatial resolution. A set of ocular prostheses was used as the control condition. Reflectance data were decorrelated by principal component analysis. The main conclusion is that the spectral complexity of the human iris is considerable: between 9 and 11 principal components are necessary to account for 99% of the cumulative variance in human irises. Correcting image misalignments associated with spontaneous ocular movements did not influence this result. The data also suggest a correlation between the first principal component and the different levels of melanin present in the irises. It was also found that although the spectral characteristics of the first five principal components were not affected by the radial and angular position of the selected iridal areas, these positions affect the higher-order components, suggesting a possible influence of iris texture. The results show that hyperspectral imaging of the iris, together with adequate spectroscopic analyses, provides more information than conventional colorimetric methods, making hyperspectral imaging suitable for the characterization of melanin and the noninvasive diagnosis of ocular diseases and iris color.
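A short sketch of the "how many components reach 99% of the cumulative variance" check, applied here to synthetic reflectance spectra rather than measured iris data.

```python
# Count the principal components needed to reach 99% cumulative explained variance.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(13)
wavelengths = np.linspace(450, 950, 200)
basis = np.array([np.exp(-0.5 * ((wavelengths - c) / 60) ** 2)
                  for c in np.linspace(480, 920, 10)])        # ten smooth spectral shapes
weights = np.abs(rng.normal(size=(300, 10)))
spectra = weights @ basis + rng.normal(0, 0.01, (300, 200))

pca = PCA().fit(spectra)
cum = np.cumsum(pca.explained_variance_ratio_)
print("components needed for 99% of variance:", int(np.searchsorted(cum, 0.99) + 1))
```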
Seeing wholes: The concept of systems thinking and its implementation in school leadership
NASA Astrophysics Data System (ADS)
Shaked, Haim; Schechter, Chen
2013-12-01
Systems thinking (ST) is an approach advocating thinking about any given issue as a whole, emphasising the interrelationships between its components rather than the components themselves. This article aims to link ST and school leadership, claiming that ST may enable school principals to develop highly performing schools that can cope successfully with current challenges, which are more complex than ever before in today's era of accountability and high expectations. The article presents the concept of ST - its definition, components, history and applications. Thereafter, its connection to education and its contribution to school management are described. The article concludes by discussing practical processes including screening for ST-skilled principal candidates and developing ST skills among prospective and currently performing school principals, pinpointing three opportunities for skills acquisition: during preparatory programmes; during their first years on the job, supported by veteran school principals as mentors; and throughout their entire career. Such opportunities may not only provide school principals with ST skills but also improve their functioning throughout the aforementioned stages of professional development.