Sample records for multivariate classification methods

  1. An information-based network approach for protein classification

    PubMed Central

    Wan, Xiaogeng; Zhao, Xin; Yau, Stephen S. T.

    2017-01-01

    Protein classification is one of the critical problems in bioinformatics. Early studies used geometric distances and polygenetic-tree to classify proteins. These methods use binary trees to present protein classification. In this paper, we propose a new protein classification method, whereby theories of information and networks are used to classify the multivariate relationships of proteins. In this study, protein universe is modeled as an undirected network, where proteins are classified according to their connections. Our method is unsupervised, multivariate, and alignment-free. It can be applied to the classification of both protein sequences and structures. Nine examples are used to demonstrate the efficiency of our new method. PMID:28350835

  2. Testing Multivariate Adaptive Regression Splines (MARS) as a Method of Land Cover Classification of TERRA-ASTER Satellite Images.

    PubMed

    Quirós, Elia; Felicísimo, Angel M; Cuartero, Aurora

    2009-01-01

    This work proposes a new method to classify multi-spectral satellite images based on multivariate adaptive regression splines (MARS) and compares this classification system with the more common parallelepiped and maximum likelihood (ML) methods. We apply the classification methods to the land cover classification of a test zone located in southwestern Spain. The basis of the MARS method and its associated procedures are explained in detail, and the area under the ROC curve (AUC) is compared for the three methods. The results show that the MARS method provides better results than the parallelepiped method in all cases, and it provides better results than the maximum likelihood method in 13 cases out of 17. These results demonstrate that the MARS method can be used in isolation or in combination with other methods to improve the accuracy of soil cover classification. The improvement is statistically significant according to the Wilcoxon signed rank test.

  3. Various forms of indexing HDMR for modelling multivariate classification problems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aksu, Çağrı; Tunga, M. Alper

    2014-12-10

    The Indexing HDMR method was recently developed for modelling multivariate interpolation problems. The method uses the Plain HDMR philosophy in partitioning the given multivariate data set into less variate data sets and then constructing an analytical structure through these partitioned data sets to represent the given multidimensional problem. Indexing HDMR makes HDMR be applicable to classification problems having real world data. Mostly, we do not know all possible class values in the domain of the given problem, that is, we have a non-orthogonal data structure. However, Plain HDMR needs an orthogonal data structure in the given problem to be modelled.more » In this sense, the main idea of this work is to offer various forms of Indexing HDMR to successfully model these real life classification problems. To test these different forms, several well-known multivariate classification problems given in UCI Machine Learning Repository were used and it was observed that the accuracy results lie between 80% and 95% which are very satisfactory.« less

  4. Fast-HPLC Fingerprinting to Discriminate Olive Oil from Other Edible Vegetable Oils by Multivariate Classification Methods.

    PubMed

    Jiménez-Carvelo, Ana M; González-Casado, Antonio; Pérez-Castaño, Estefanía; Cuadros-Rodríguez, Luis

    2017-03-01

    A new analytical method for the differentiation of olive oil from other vegetable oils using reversed-phase LC and applying chemometric techniques was developed. A 3 cm short column was used to obtain the chromatographic fingerprint of the methyl-transesterified fraction of each vegetable oil. The chromatographic analysis took only 4 min. The multivariate classification methods used were k-nearest neighbors, partial least-squares (PLS) discriminant analysis, one-class PLS, support vector machine classification, and soft independent modeling of class analogies. The discrimination of olive oil from other vegetable edible oils was evaluated by several classification quality metrics. Several strategies for the classification of the olive oil were used: one input-class, two input-class, and pseudo two input-class.

  5. Multivariate detrending of fMRI signal drifts for real-time multiclass pattern classification.

    PubMed

    Lee, Dongha; Jang, Changwon; Park, Hae-Jeong

    2015-03-01

    Signal drift in functional magnetic resonance imaging (fMRI) is an unavoidable artifact that limits classification performance in multi-voxel pattern analysis of fMRI. As conventional methods to reduce signal drift, global demeaning or proportional scaling disregards regional variations of drift, whereas voxel-wise univariate detrending is too sensitive to noisy fluctuations. To overcome these drawbacks, we propose a multivariate real-time detrending method for multiclass classification that involves spatial demeaning at each scan and the recursive detrending of drifts in the classifier outputs driven by a multiclass linear support vector machine. Experiments using binary and multiclass data showed that the linear trend estimation of the classifier output drift for each class (a weighted sum of drifts in the class-specific voxels) was more robust against voxel-wise artifacts that lead to inconsistent spatial patterns and the effect of online processing than voxel-wise detrending. The classification performance of the proposed method was significantly better, especially for multiclass data, than that of voxel-wise linear detrending, global demeaning, and classifier output detrending without demeaning. We concluded that the multivariate approach using classifier output detrending of fMRI signals with spatial demeaning preserves spatial patterns, is less sensitive than conventional methods to sample size, and increases classification performance, which is a useful feature for real-time fMRI classification. Copyright © 2014 Elsevier Inc. All rights reserved.

  6. An efficient swarm intelligence approach to feature selection based on invasive weed optimization: Application to multivariate calibration and classification using spectroscopic data

    NASA Astrophysics Data System (ADS)

    Sheykhizadeh, Saheleh; Naseri, Abdolhossein

    2018-04-01

    Variable selection plays a key role in classification and multivariate calibration. Variable selection methods are aimed at choosing a set of variables, from a large pool of available predictors, relevant to the analyte concentrations estimation, or to achieve better classification results. Many variable selection techniques have now been introduced among which, those which are based on the methodologies of swarm intelligence optimization have been more respected during a few last decades since they are mainly inspired by nature. In this work, a simple and new variable selection algorithm is proposed according to the invasive weed optimization (IWO) concept. IWO is considered a bio-inspired metaheuristic mimicking the weeds ecological behavior in colonizing as well as finding an appropriate place for growth and reproduction; it has been shown to be very adaptive and powerful to environmental changes. In this paper, the first application of IWO, as a very simple and powerful method, to variable selection is reported using different experimental datasets including FTIR and NIR data, so as to undertake classification and multivariate calibration tasks. Accordingly, invasive weed optimization - linear discrimination analysis (IWO-LDA) and invasive weed optimization- partial least squares (IWO-PLS) are introduced for multivariate classification and calibration, respectively.

  7. An efficient swarm intelligence approach to feature selection based on invasive weed optimization: Application to multivariate calibration and classification using spectroscopic data.

    PubMed

    Sheykhizadeh, Saheleh; Naseri, Abdolhossein

    2018-04-05

    Variable selection plays a key role in classification and multivariate calibration. Variable selection methods are aimed at choosing a set of variables, from a large pool of available predictors, relevant to the analyte concentrations estimation, or to achieve better classification results. Many variable selection techniques have now been introduced among which, those which are based on the methodologies of swarm intelligence optimization have been more respected during a few last decades since they are mainly inspired by nature. In this work, a simple and new variable selection algorithm is proposed according to the invasive weed optimization (IWO) concept. IWO is considered a bio-inspired metaheuristic mimicking the weeds ecological behavior in colonizing as well as finding an appropriate place for growth and reproduction; it has been shown to be very adaptive and powerful to environmental changes. In this paper, the first application of IWO, as a very simple and powerful method, to variable selection is reported using different experimental datasets including FTIR and NIR data, so as to undertake classification and multivariate calibration tasks. Accordingly, invasive weed optimization - linear discrimination analysis (IWO-LDA) and invasive weed optimization- partial least squares (IWO-PLS) are introduced for multivariate classification and calibration, respectively. Copyright © 2018 Elsevier B.V. All rights reserved.

  8. Statistical methods and neural network approaches for classification of data from multiple sources

    NASA Technical Reports Server (NTRS)

    Benediktsson, Jon Atli; Swain, Philip H.

    1990-01-01

    Statistical methods for classification of data from multiple data sources are investigated and compared to neural network models. A problem with using conventional multivariate statistical approaches for classification of data of multiple types is in general that a multivariate distribution cannot be assumed for the classes in the data sources. Another common problem with statistical classification methods is that the data sources are not equally reliable. This means that the data sources need to be weighted according to their reliability but most statistical classification methods do not have a mechanism for this. This research focuses on statistical methods which can overcome these problems: a method of statistical multisource analysis and consensus theory. Reliability measures for weighting the data sources in these methods are suggested and investigated. Secondly, this research focuses on neural network models. The neural networks are distribution free since no prior knowledge of the statistical distribution of the data is needed. This is an obvious advantage over most statistical classification methods. The neural networks also automatically take care of the problem involving how much weight each data source should have. On the other hand, their training process is iterative and can take a very long time. Methods to speed up the training procedure are introduced and investigated. Experimental results of classification using both neural network models and statistical methods are given, and the approaches are compared based on these results.

  9. Use of collateral information to improve LANDSAT classification accuracies

    NASA Technical Reports Server (NTRS)

    Strahler, A. H. (Principal Investigator)

    1981-01-01

    Methods to improve LANDSAT classification accuracies were investigated including: (1) the use of prior probabilities in maximum likelihood classification as a methodology to integrate discrete collateral data with continuously measured image density variables; (2) the use of the logit classifier as an alternative to multivariate normal classification that permits mixing both continuous and categorical variables in a single model and fits empirical distributions of observations more closely than the multivariate normal density function; and (3) the use of collateral data in a geographic information system as exercised to model a desired output information layer as a function of input layers of raster format collateral and image data base layers.

  10. Multivariate classification of the infrared spectra of cell and tissue samples

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Haaland, D.M.; Jones, H.D.; Thomas, E.V.

    1997-03-01

    Infrared microspectroscopy of biopsied canine lymph cells and tissue was performed to investigate the possibility of using IR spectra coupled with multivariate classification methods to classify the samples as normal, hyperplastic, or neoplastic (malignant). IR spectra were obtained in transmission mode through BaF{sub 2} windows and in reflection mode from samples prepared on gold-coated microscope slides. Cytology and histopathology samples were prepared by a variety of methods to identify the optimal methods of sample preparation. Cytospinning procedures that yielded a monolayer of cells on the BaF{sub 2} windows produced a limited set of IR transmission spectra. These transmission spectra weremore » converted to absorbance and formed the basis for a classification rule that yielded 100{percent} correct classification in a cross-validated context. Classifications of normal, hyperplastic, and neoplastic cell sample spectra were achieved by using both partial least-squares (PLS) and principal component regression (PCR) classification methods. Linear discriminant analysis applied to principal components obtained from the spectral data yielded a small number of misclassifications. PLS weight loading vectors yield valuable qualitative insight into the molecular changes that are responsible for the success of the infrared classification. These successful classification results show promise for assisting pathologists in the diagnosis of cell types and offer future potential for {ital in vivo} IR detection of some types of cancer. {copyright} {ital 1997} {ital Society for Applied Spectroscopy}« less

  11. Forensic Discrimination of Latent Fingerprints Using Laser-Induced Breakdown Spectroscopy (LIBS) and Chemometric Approaches.

    PubMed

    Yang, Jun-Ho; Yoh, Jack J

    2018-01-01

    A novel technique is reported for separating overlapping latent fingerprints using chemometric approaches that combine laser-induced breakdown spectroscopy (LIBS) and multivariate analysis. The LIBS technique provides the capability of real time analysis and high frequency scanning as well as the data regarding the chemical composition of overlapping latent fingerprints. These spectra offer valuable information for the classification and reconstruction of overlapping latent fingerprints by implementing appropriate statistical multivariate analysis. The current study employs principal component analysis and partial least square methods for the classification of latent fingerprints from the LIBS spectra. This technique was successfully demonstrated through a classification study of four distinct latent fingerprints using classification methods such as soft independent modeling of class analogy (SIMCA) and partial least squares discriminant analysis (PLS-DA). The novel method yielded an accuracy of more than 85% and was proven to be sufficiently robust. Furthermore, through laser scanning analysis at a spatial interval of 125 µm, the overlapping fingerprints were reconstructed as separate two-dimensional forms.

  12. Combination of laser-induced breakdown spectroscopy and Raman spectroscopy for multivariate classification of bacteria

    NASA Astrophysics Data System (ADS)

    Prochazka, D.; Mazura, M.; Samek, O.; Rebrošová, K.; Pořízka, P.; Klus, J.; Prochazková, P.; Novotný, J.; Novotný, K.; Kaiser, J.

    2018-01-01

    In this work, we investigate the impact of data provided by complementary laser-based spectroscopic methods on multivariate classification accuracy. Discrimination and classification of five Staphylococcus bacterial strains and one strain of Escherichia coli is presented. The technique that we used for measurements is a combination of Raman spectroscopy and Laser-Induced Breakdown Spectroscopy (LIBS). Obtained spectroscopic data were then processed using Multivariate Data Analysis algorithms. Principal Components Analysis (PCA) was selected as the most suitable technique for visualization of bacterial strains data. To classify the bacterial strains, we used Neural Networks, namely a supervised version of Kohonen's self-organizing maps (SOM). We were processing results in three different ways - separately from LIBS measurements, from Raman measurements, and we also merged data from both mentioned methods. The three types of results were then compared. By applying the PCA to Raman spectroscopy data, we observed that two bacterial strains were fully distinguished from the rest of the data set. In the case of LIBS data, three bacterial strains were fully discriminated. Using a combination of data from both methods, we achieved the complete discrimination of all bacterial strains. All the data were classified with a high success rate using SOM algorithm. The most accurate classification was obtained using a combination of data from both techniques. The classification accuracy varied, depending on specific samples and techniques. As for LIBS, the classification accuracy ranged from 45% to 100%, as for Raman Spectroscopy from 50% to 100% and in case of merged data, all samples were classified correctly. Based on the results of the experiments presented in this work, we can assume that the combination of Raman spectroscopy and LIBS significantly enhances discrimination and classification accuracy of bacterial species and strains. The reason is the complementarity in obtained chemical information while using these two methods.

  13. Multivariate Density Estimation and Remote Sensing

    NASA Technical Reports Server (NTRS)

    Scott, D. W.

    1983-01-01

    Current efforts to develop methods and computer algorithms to effectively represent multivariate data commonly encountered in remote sensing applications are described. While this may involve scatter diagrams, multivariate representations of nonparametric probability density estimates are emphasized. The density function provides a useful graphical tool for looking at data and a useful theoretical tool for classification. This approach is called a thunderstorm data analysis.

  14. Particle analysis using laser ablation mass spectroscopy

    DOEpatents

    Parker, Eric P.; Rosenthal, Stephen E.; Trahan, Michael W.; Wagner, John S.

    2003-09-09

    The present invention provides a method of quickly identifying bioaerosols by class, even if the subject bioaerosol has not been previously encountered. The method begins by collecting laser ablation mass spectra from known particles. The spectra are correlated with the known particles, including the species of particle and the classification (e.g., bacteria). The spectra can then be used to train a neural network, for example using genetic algorithm-based training, to recognize each spectra and to recognize characteristics of the classifications. The spectra can also be used in a multivariate patch algorithm. Laser ablation mass specta from unknown particles can be presented as inputs to the trained neural net for identification as to classification. The description below first describes suitable intelligent algorithms and multivariate patch algorithms, then presents an example of the present invention including results.

  15. Estuarial fingerprinting through multidimensional fluorescence and multivariate analysis.

    PubMed

    Hall, Gregory J; Clow, Kerin E; Kenny, Jonathan E

    2005-10-01

    As part of a strategy for preventing the introduction of aquatic nuisance species (ANS) to U.S. estuaries, ballast water exchange (BWE) regulations have been imposed. Enforcing these regulations requires a reliable method for determining the port of origin of water in the ballast tanks of ships entering U.S. waters. This study shows that a three-dimensional fluorescence fingerprinting technique, excitation emission matrix (EEM) spectroscopy, holds great promise as a ballast water analysis tool. In our technique, EEMs are analyzed by multivariate classification and curve resolution methods, such as N-way partial least squares Regression-discriminant analysis (NPLS-DA) and parallel factor analysis (PARAFAC). We demonstrate that classification techniques can be used to discriminate among sampling sites less than 10 miles apart, encompassing Boston Harbor and two tributaries in the Mystic River Watershed. To our knowledge, this work is the first to use multivariate analysis to classify water as to location of origin. Furthermore, it is shown that curve resolution can show seasonal features within the multidimensional fluorescence data sets, which correlate with difficulty in classification.

  16. Microcomputer-based classification of environmental data in municipal areas

    NASA Astrophysics Data System (ADS)

    Thiergärtner, H.

    1995-10-01

    Multivariate data-processing methods used in mineral resource identification can be used to classify urban regions. Using elements of expert systems, geographical information systems, as well as known classification and prognosis systems, it is possible to outline a single model that consists of resistant and of temporary parts of a knowledge base including graphical input and output treatment and of resistant and temporary elements of a bank of methods and algorithms. Whereas decision rules created by experts will be stored in expert systems directly, powerful classification rules in form of resistant but latent (implicit) decision algorithms may be implemented in the suggested model. The latent functions will be transformed into temporary explicit decision rules by learning processes depending on the actual task(s), parameter set(s), pixels selection(s), and expert control(s). This takes place both at supervised and nonsupervised classification of multivariately described pixel sets representing municipal subareas. The model is outlined briefly and illustrated by results obtained in a target area covering a part of the city of Berlin (Germany).

  17. Multivariate spline methods in surface fitting

    NASA Technical Reports Server (NTRS)

    Guseman, L. F., Jr. (Principal Investigator); Schumaker, L. L.

    1984-01-01

    The use of spline functions in the development of classification algorithms is examined. In particular, a method is formulated for producing spline approximations to bivariate density functions where the density function is decribed by a histogram of measurements. The resulting approximations are then incorporated into a Bayesiaan classification procedure for which the Bayes decision regions and the probability of misclassification is readily computed. Some preliminary numerical results are presented to illustrate the method.

  18. Parental Perceptions of Their Adolescent's Weight Status: The ECHO Study

    ERIC Educational Resources Information Center

    Hearst, Mary O.; Sherwood, Nancy E.; Klein, Elizabeth G.; Pasch, Keryn E.; Lytle, Leslie A.

    2011-01-01

    Objectives: To assess the correlates of parental classification of adolescent weight status. Methods: Measured adolescent weight status was compared to parent self-report perception data (n 374 dyads) using multivariate analyses with interactions to identify characteristics associated with inaccurate parent classification of adolescent weight…

  19. Fast classification of hazelnut cultivars through portable infrared spectroscopy and chemometrics

    NASA Astrophysics Data System (ADS)

    Manfredi, Marcello; Robotti, Elisa; Quasso, Fabio; Mazzucco, Eleonora; Calabrese, Giorgio; Marengo, Emilio

    2018-01-01

    The authentication and traceability of hazelnuts is very important for both the consumer and the food industry, to safeguard the protected varieties and the food quality. This study investigates the use of a portable FTIR spectrometer coupled to multivariate statistical analysis for the classification of raw hazelnuts. The method discriminates hazelnuts from different origins/cultivars based on differences of the signal intensities of their IR spectra. The multivariate classification methods, namely principal component analysis (PCA) followed by linear discriminant analysis (LDA) and partial least square discriminant analysis (PLS-DA), with or without variable selection, allowed a very good discrimination among the groups, with PLS-DA coupled to variable selection providing the best results. Due to the fast analysis, high sensitivity, simplicity and no sample preparation, the proposed analytical methodology could be successfully used to verify the cultivar of hazelnuts, and the analysis can be performed quickly and directly on site.

  20. Multivariate qualitative analysis of banned additives in food safety using surface enhanced Raman scattering spectroscopy

    NASA Astrophysics Data System (ADS)

    He, Shixuan; Xie, Wanyi; Zhang, Wei; Zhang, Liqun; Wang, Yunxia; Liu, Xiaoling; Liu, Yulong; Du, Chunlei

    2015-02-01

    A novel strategy which combines iteratively cubic spline fitting baseline correction method with discriminant partial least squares qualitative analysis is employed to analyze the surface enhanced Raman scattering (SERS) spectroscopy of banned food additives, such as Sudan I dye and Rhodamine B in food, Malachite green residues in aquaculture fish. Multivariate qualitative analysis methods, using the combination of spectra preprocessing iteratively cubic spline fitting (ICSF) baseline correction with principal component analysis (PCA) and discriminant partial least squares (DPLS) classification respectively, are applied to investigate the effectiveness of SERS spectroscopy for predicting the class assignments of unknown banned food additives. PCA cannot be used to predict the class assignments of unknown samples. However, the DPLS classification can discriminate the class assignment of unknown banned additives using the information of differences in relative intensities. The results demonstrate that SERS spectroscopy combined with ICSF baseline correction method and exploratory analysis methodology DPLS classification can be potentially used for distinguishing the banned food additives in field of food safety.

  1. Rapid differentiation of Ghana cocoa beans by FT-NIR spectroscopy coupled with multivariate classification

    NASA Astrophysics Data System (ADS)

    Teye, Ernest; Huang, Xingyi; Dai, Huang; Chen, Quansheng

    2013-10-01

    Quick, accurate and reliable technique for discrimination of cocoa beans according to geographical origin is essential for quality control and traceability management. This current study presents the application of Near Infrared Spectroscopy technique and multivariate classification for the differentiation of Ghana cocoa beans. A total of 194 cocoa bean samples from seven cocoa growing regions were used. Principal component analysis (PCA) was used to extract relevant information from the spectral data and this gave visible cluster trends. The performance of four multivariate classification methods: Linear discriminant analysis (LDA), K-nearest neighbors (KNN), Back propagation artificial neural network (BPANN) and Support vector machine (SVM) were compared. The performances of the models were optimized by cross validation. The results revealed that; SVM model was superior to all the mathematical methods with a discrimination rate of 100% in both the training and prediction set after preprocessing with Mean centering (MC). BPANN had a discrimination rate of 99.23% for the training set and 96.88% for prediction set. While LDA model had 96.15% and 90.63% for the training and prediction sets respectively. KNN model had 75.01% for the training set and 72.31% for prediction set. The non-linear classification methods used were superior to the linear ones. Generally, the results revealed that NIR Spectroscopy coupled with SVM model could be used successfully to discriminate cocoa beans according to their geographical origins for effective quality assurance.

  2. A detailed comparison of analysis processes for MCC-IMS data in disease classification—Automated methods can replace manual peak annotations

    PubMed Central

    Horsch, Salome; Kopczynski, Dominik; Kuthe, Elias; Baumbach, Jörg Ingo; Rahmann, Sven

    2017-01-01

    Motivation Disease classification from molecular measurements typically requires an analysis pipeline from raw noisy measurements to final classification results. Multi capillary column—ion mobility spectrometry (MCC-IMS) is a promising technology for the detection of volatile organic compounds in the air of exhaled breath. From raw measurements, the peak regions representing the compounds have to be identified, quantified, and clustered across different experiments. Currently, several steps of this analysis process require manual intervention of human experts. Our goal is to identify a fully automatic pipeline that yields competitive disease classification results compared to an established but subjective and tedious semi-manual process. Method We combine a large number of modern methods for peak detection, peak clustering, and multivariate classification into analysis pipelines for raw MCC-IMS data. We evaluate all combinations on three different real datasets in an unbiased cross-validation setting. We determine which specific algorithmic combinations lead to high AUC values in disease classifications across the different medical application scenarios. Results The best fully automated analysis process achieves even better classification results than the established manual process. The best algorithms for the three analysis steps are (i) SGLTR (Savitzky-Golay Laplace-operator filter thresholding regions) and LM (Local Maxima) for automated peak identification, (ii) EM clustering (Expectation Maximization) and DBSCAN (Density-Based Spatial Clustering of Applications with Noise) for the clustering step and (iii) RF (Random Forest) for multivariate classification. Thus, automated methods can replace the manual steps in the analysis process to enable an unbiased high throughput use of the technology. PMID:28910313

  3. Workshop on Algorithms for Time-Series Analysis

    NASA Astrophysics Data System (ADS)

    Protopapas, Pavlos

    2012-04-01

    abstract-type="normal">SummaryThis Workshop covered the four major subjects listed below in two 90-minute sessions. Each talk or tutorial allowed questions, and concluded with a discussion. Classification: Automatic classification using machine-learning methods is becoming a standard in surveys that generate large datasets. Ashish Mahabal (Caltech) reviewed various methods, and presented examples of several applications. Time-Series Modelling: Suzanne Aigrain (Oxford University) discussed autoregressive models and multivariate approaches such as Gaussian Processes. Meta-classification/mixture of expert models: Karim Pichara (Pontificia Universidad Católica, Chile) described the substantial promise which machine-learning classification methods are now showing in automatic classification, and discussed how the various methods can be combined together. Event Detection: Pavlos Protopapas (Harvard) addressed methods of fast identification of events with low signal-to-noise ratios, enlarging on the characterization and statistical issues of low signal-to-noise ratios and rare events.

  4. Multivariate qualitative analysis of banned additives in food safety using surface enhanced Raman scattering spectroscopy.

    PubMed

    He, Shixuan; Xie, Wanyi; Zhang, Wei; Zhang, Liqun; Wang, Yunxia; Liu, Xiaoling; Liu, Yulong; Du, Chunlei

    2015-02-25

    A novel strategy which combines iteratively cubic spline fitting baseline correction method with discriminant partial least squares qualitative analysis is employed to analyze the surface enhanced Raman scattering (SERS) spectroscopy of banned food additives, such as Sudan I dye and Rhodamine B in food, Malachite green residues in aquaculture fish. Multivariate qualitative analysis methods, using the combination of spectra preprocessing iteratively cubic spline fitting (ICSF) baseline correction with principal component analysis (PCA) and discriminant partial least squares (DPLS) classification respectively, are applied to investigate the effectiveness of SERS spectroscopy for predicting the class assignments of unknown banned food additives. PCA cannot be used to predict the class assignments of unknown samples. However, the DPLS classification can discriminate the class assignment of unknown banned additives using the information of differences in relative intensities. The results demonstrate that SERS spectroscopy combined with ICSF baseline correction method and exploratory analysis methodology DPLS classification can be potentially used for distinguishing the banned food additives in field of food safety. Copyright © 2014 Elsevier B.V. All rights reserved.

  5. Fast discrimination of hydroxypropyl methyl cellulose using portable Raman spectrometer and multivariate methods

    NASA Astrophysics Data System (ADS)

    Song, Biao; Lu, Dan; Peng, Ming; Li, Xia; Zou, Ye; Huang, Meizhen; Lu, Feng

    2017-02-01

    Raman spectroscopy is developed as a fast and non-destructive method for the discrimination and classification of hydroxypropyl methyl cellulose (HPMC) samples. 44 E series and 41 K series of HPMC samples are measured by a self-developed portable Raman spectrometer (Hx-Raman) which is excited by a 785 nm diode laser and the spectrum range is 200-2700 cm-1 with a resolution (FWHM) of 6 cm-1. Multivariate analysis is applied for discrimination of E series from K series. By methods of principal components analysis (PCA) and Fisher discriminant analysis (FDA), a discrimination result with sensitivity of 90.91% and specificity of 95.12% is achieved. The corresponding receiver operating characteristic (ROC) is 0.99, indicting the accuracy of the predictive model. This result demonstrates the prospect of portable Raman spectrometer for rapid, non-destructive classification and discrimination of E series and K series samples of HPMC.

  6. Evaluation of AMOEBA: a spectral-spatial classification method

    USGS Publications Warehouse

    Jenson, Susan K.; Loveland, Thomas R.; Bryant, J.

    1982-01-01

    Muitispectral remotely sensed images have been treated as arbitrary multivariate spectral data for purposes of clustering and classifying. However, the spatial properties of image data can also be exploited. AMOEBA is a clustering and classification method that is based on a spatially derived model for image data. In an evaluation test, Landsat data were classified with both AMOEBA and a widely used spectral classifier. The test showed that irrigated crop types can be classified as accurately with the AMOEBA method as with the generally used spectral method ISOCLS; the AMOEBA method, however, requires less computer time.

  7. Visual classification of very fine-grained sediments: Evaluation through univariate and multivariate statistics

    USGS Publications Warehouse

    Hohn, M. Ed; Nuhfer, E.B.; Vinopal, R.J.; Klanderman, D.S.

    1980-01-01

    Classifying very fine-grained rocks through fabric elements provides information about depositional environments, but is subject to the biases of visual taxonomy. To evaluate the statistical significance of an empirical classification of very fine-grained rocks, samples from Devonian shales in four cored wells in West Virginia and Virginia were measured for 15 variables: quartz, illite, pyrite and expandable clays determined by X-ray diffraction; total sulfur, organic content, inorganic carbon, matrix density, bulk density, porosity, silt, as well as density, sonic travel time, resistivity, and ??-ray response measured from well logs. The four lithologic types comprised: (1) sharply banded shale, (2) thinly laminated shale, (3) lenticularly laminated shale, and (4) nonbanded shale. Univariate and multivariate analyses of variance showed that the lithologic classification reflects significant differences for the variables measured, difference that can be detected independently of stratigraphic effects. Little-known statistical methods found useful in this work included: the multivariate analysis of variance with more than one effect, simultaneous plotting of samples and variables on canonical variates, and the use of parametric ANOVA and MANOVA on ranked data. ?? 1980 Plenum Publishing Corporation.

  8. Large Uptake of Titania and Iron Oxide Nanoparticles in the Nucleus of Lung Epithelial Cells as Measured by Raman Imaging and Multivariate Classification

    PubMed Central

    Ahlinder, Linnea; Ekstrand-Hammarström, Barbro; Geladi, Paul; Österlund, Lars

    2013-01-01

    It is a challenging task to characterize the biodistribution of nanoparticles in cells and tissue on a subcellular level. Conventional methods to study the interaction of nanoparticles with living cells rely on labeling techniques that either selectively stain the particles or selectively tag them with tracer molecules. In this work, Raman imaging, a label-free technique that requires no extensive sample preparation, was combined with multivariate classification to quantify the spatial distribution of oxide nanoparticles inside living lung epithelial cells (A549). Cells were exposed to TiO2 (titania) and/or α-FeO(OH) (goethite) nanoparticles at various incubation times (4 or 48 h). Using multivariate classification of hyperspectral Raman data with partial least-squares discriminant analysis, we show that a surprisingly large fraction of spectra, classified as belonging to the cell nucleus, show Raman bands associated with nanoparticles. Up to 40% of spectra from the cell nucleus show Raman bands associated with nanoparticles. Complementary transmission electron microscopy data for thin cell sections qualitatively support the conclusions. PMID:23870252

  9. Design of neural networks for classification of remotely sensed imagery

    NASA Technical Reports Server (NTRS)

    Chettri, Samir R.; Cromp, Robert F.; Birmingham, Mark

    1992-01-01

    Classification accuracies of a backpropagation neural network are discussed and compared with a maximum likelihood classifier (MLC) with multivariate normal class models. We have found that, because of its nonparametric nature, the neural network outperforms the MLC in this area. In addition, we discuss techniques for constructing optimal neural nets on parallel hardware like the MasPar MP-1 currently at GSFC. Other important discussions are centered around training and classification times of the two methods, and sensitivity to the training data. Finally, we discuss future work in the area of classification and neural nets.

  10. Multivariate classification of infrared spectra of cell and tissue samples

    DOEpatents

    Haaland, David M.; Jones, Howland D. T.; Thomas, Edward V.

    1997-01-01

    Multivariate classification techniques are applied to spectra from cell and tissue samples irradiated with infrared radiation to determine if the samples are normal or abnormal (cancerous). Mid and near infrared radiation can be used for in vivo and in vitro classifications using at least different wavelengths.

  11. Introduction to multivariate discrimination

    NASA Astrophysics Data System (ADS)

    Kégl, Balázs

    2013-07-01

    Multivariate discrimination or classification is one of the best-studied problem in machine learning, with a plethora of well-tested and well-performing algorithms. There are also several good general textbooks [1-9] on the subject written to an average engineering, computer science, or statistics graduate student; most of them are also accessible for an average physics student with some background on computer science and statistics. Hence, instead of writing a generic introduction, we concentrate here on relating the subject to a practitioner experimental physicist. After a short introduction on the basic setup (Section 1) we delve into the practical issues of complexity regularization, model selection, and hyperparameter optimization (Section 2), since it is this step that makes high-complexity non-parametric fitting so different from low-dimensional parametric fitting. To emphasize that this issue is not restricted to classification, we illustrate the concept on a low-dimensional but non-parametric regression example (Section 2.1). Section 3 describes the common algorithmic-statistical formal framework that unifies the main families of multivariate classification algorithms. We explain here the large-margin principle that partly explains why these algorithms work. Section 4 is devoted to the description of the three main (families of) classification algorithms, neural networks, the support vector machine, and AdaBoost. We do not go into the algorithmic details; the goal is to give an overview on the form of the functions these methods learn and on the objective functions they optimize. Besides their technical description, we also make an attempt to put these algorithm into a socio-historical context. We then briefly describe some rather heterogeneous applications to illustrate the pattern recognition pipeline and to show how widespread the use of these methods is (Section 5). We conclude the chapter with three essentially open research problems that are either relevant to or even motivated by certain unorthodox applications of multivariate discrimination in experimental physics.

  12. Spatial patterns of brain atrophy in MCI patients, identified via high-dimensional pattern classification, predict subsequent cognitive decline

    PubMed Central

    Fan, Yong; Batmanghelich, Nematollah; Clark, Chris M.; Davatzikos, Christos

    2010-01-01

    Spatial patterns of brain atrophy in mild cognitive impairment (MCI) and Alzheimer’s disease (AD) were measured via methods of computational neuroanatomy. These patterns were spatially complex and involved many brain regions. In addition to the hippocampus and the medial temporal lobe gray matter, a number of other regions displayed significant atrophy, including orbitofrontal and medial-prefrontal grey matter, cingulate (mainly posterior), insula, uncus, and temporal lobe white matter. Approximately 2/3 of the MCI group presented patterns of atrophy that overlapped with AD, whereas the remaining 1/3 overlapped with cognitively normal individuals, thereby indicating that some, but not all, MCI patients have significant and extensive brain atrophy in this cohort of MCI patients. Importantly, the group with AD-like patterns presented much higher rate of MMSE decline in follow-up visits; conversely, pattern classification provided relatively high classification accuracy (87%) of the individuals that presented relatively higher MMSE decline within a year from baseline. High-dimensional pattern classification, a nonlinear multivariate analysis, provided measures of structural abnormality that can potentially be useful for individual patient classification, as well as for predicting progression and examining multivariate relationships in group analyses. PMID:18053747

  13. Interpreting support vector machine models for multivariate group wise analysis in neuroimaging

    PubMed Central

    Gaonkar, Bilwaj; Shinohara, Russell T; Davatzikos, Christos

    2015-01-01

    Machine learning based classification algorithms like support vector machines (SVMs) have shown great promise for turning a high dimensional neuroimaging data into clinically useful decision criteria. However, tracing imaging based patterns that contribute significantly to classifier decisions remains an open problem. This is an issue of critical importance in imaging studies seeking to determine which anatomical or physiological imaging features contribute to the classifier’s decision, thereby allowing users to critically evaluate the findings of such machine learning methods and to understand disease mechanisms. The majority of published work addresses the question of statistical inference for support vector classification using permutation tests based on SVM weight vectors. Such permutation testing ignores the SVM margin, which is critical in SVM theory. In this work we emphasize the use of a statistic that explicitly accounts for the SVM margin and show that the null distributions associated with this statistic are asymptotically normal. Further, our experiments show that this statistic is a lot less conservative as compared to weight based permutation tests and yet specific enough to tease out multivariate patterns in the data. Thus, we can better understand the multivariate patterns that the SVM uses for neuroimaging based classification. PMID:26210913

  14. Laser-induced breakdown spectroscopy-based investigation and classification of pharmaceutical tablets using multivariate chemometric analysis

    PubMed Central

    Myakalwar, Ashwin Kumar; Sreedhar, S.; Barman, Ishan; Dingari, Narahara Chari; Rao, S. Venugopal; Kiran, P. Prem; Tewari, Surya P.; Kumar, G. Manoj

    2012-01-01

    We report the effectiveness of laser-induced breakdown spectroscopy (LIBS) in probing the content of pharmaceutical tablets and also investigate its feasibility for routine classification. This method is particularly beneficial in applications where its exquisite chemical specificity and suitability for remote and on site characterization significantly improves the speed and accuracy of quality control and assurance process. Our experiments reveal that in addition to the presence of carbon, hydrogen, nitrogen and oxygen, which can be primarily attributed to the active pharmaceutical ingredients, specific inorganic atoms were also present in all the tablets. Initial attempts at classification by a ratiometric approach using oxygen to nitrogen compositional values yielded an optimal value (at 746.83 nm) with the least relative standard deviation but nevertheless failed to provide an acceptable classification. To overcome this bottleneck in the detection process, two chemometric algorithms, i.e. principal component analysis (PCA) and soft independent modeling of class analogy (SIMCA), were implemented to exploit the multivariate nature of the LIBS data demonstrating that LIBS has the potential to differentiate and discriminate among pharmaceutical tablets. We report excellent prospective classification accuracy using supervised classification via the SIMCA algorithm, demonstrating its potential for future applications in process analytical technology, especially for fast on-line process control monitoring applications in the pharmaceutical industry. PMID:22099648

  15. Characterization of Escherichia coli isolates from different fecal sources by means of classification tree analysis of fatty acid methyl ester (FAME) profiles.

    PubMed

    Seurinck, Sylvie; Deschepper, Ellen; Deboch, Bishaw; Verstraete, Willy; Siciliano, Steven

    2006-03-01

    Microbial source tracking (MST) methods need to be rapid, inexpensive and accurate. Unfortunately, many MST methods provide a wealth of information that is difficult to interpret by the regulators who use this information to make decisions. This paper describes the use of classification tree analysis to interpret the results of a MST method based on fatty acid methyl ester (FAME) profiles of Escherichia coli isolates, and to present results in a format readily interpretable by water quality managers. Raw sewage E. coli isolates and animal E. coli isolates from cow, dog, gull, and horse were isolated and their FAME profiles collected. Correct classification rates determined with leaveone-out cross-validation resulted in an overall low correct classification rate of 61%. A higher overall correct classification rate of 85% was obtained when the animal isolates were pooled together and compared to the raw sewage isolates. Bootstrap aggregation or adaptive resampling and combining of the FAME profile data increased correct classification rates substantially. Other MST methods may be better suited to differentiate between different fecal sources but classification tree analysis has enabled us to distinguish raw sewage from animal E. coli isolates, which previously had not been possible with other multivariate methods such as principal component analysis and cluster analysis.

  16. Multivariate statistical analysis software technologies for astrophysical research involving large data bases

    NASA Technical Reports Server (NTRS)

    Djorgovski, George

    1993-01-01

    The existing and forthcoming data bases from NASA missions contain an abundance of information whose complexity cannot be efficiently tapped with simple statistical techniques. Powerful multivariate statistical methods already exist which can be used to harness much of the richness of these data. Automatic classification techniques have been developed to solve the problem of identifying known types of objects in multiparameter data sets, in addition to leading to the discovery of new physical phenomena and classes of objects. We propose an exploratory study and integration of promising techniques in the development of a general and modular classification/analysis system for very large data bases, which would enhance and optimize data management and the use of human research resource.

  17. Multivariate statistical analysis software technologies for astrophysical research involving large data bases

    NASA Technical Reports Server (NTRS)

    Djorgovski, Stanislav

    1992-01-01

    The existing and forthcoming data bases from NASA missions contain an abundance of information whose complexity cannot be efficiently tapped with simple statistical techniques. Powerful multivariate statistical methods already exist which can be used to harness much of the richness of these data. Automatic classification techniques have been developed to solve the problem of identifying known types of objects in multi parameter data sets, in addition to leading to the discovery of new physical phenomena and classes of objects. We propose an exploratory study and integration of promising techniques in the development of a general and modular classification/analysis system for very large data bases, which would enhance and optimize data management and the use of human research resources.

  18. Discrimination of irradiated MOX fuel from UOX fuel by multivariate statistical analysis of simulated activities of gamma-emitting isotopes

    NASA Astrophysics Data System (ADS)

    Åberg Lindell, M.; Andersson, P.; Grape, S.; Hellesen, C.; Håkansson, A.; Thulin, M.

    2018-03-01

    This paper investigates how concentrations of certain fission products and their related gamma-ray emissions can be used to discriminate between uranium oxide (UOX) and mixed oxide (MOX) type fuel. Discrimination of irradiated MOX fuel from irradiated UOX fuel is important in nuclear facilities and for transport of nuclear fuel, for purposes of both criticality safety and nuclear safeguards. Although facility operators keep records on the identity and properties of each fuel, tools for nuclear safeguards inspectors that enable independent verification of the fuel are critical in the recovery of continuity of knowledge, should it be lost. A discrimination methodology for classification of UOX and MOX fuel, based on passive gamma-ray spectroscopy data and multivariate analysis methods, is presented. Nuclear fuels and their gamma-ray emissions were simulated in the Monte Carlo code Serpent, and the resulting data was used as input to train seven different multivariate classification techniques. The trained classifiers were subsequently implemented and evaluated with respect to their capabilities to correctly predict the classes of unknown fuel items. The best results concerning successful discrimination of UOX and MOX-fuel were acquired when using non-linear classification techniques, such as the k nearest neighbors method and the Gaussian kernel support vector machine. For fuel with cooling times up to 20 years, when it is considered that gamma-rays from the isotope 134Cs can still be efficiently measured, success rates of 100% were obtained. A sensitivity analysis indicated that these methods were also robust.

  19. The influence of different classification standards of age groups on prognosis in high-grade hemispheric glioma patients.

    PubMed

    Chen, Jian-Wu; Zhou, Chang-Fu; Lin, Zhi-Xiong

    2015-09-15

    Although age is thought to correlate with the prognosis of glioma patients, the most appropriate age-group classification standard to evaluate prognosis had not been fully studied. This study aimed to investigate the influence of age-group classification standards on the prognosis of patients with high-grade hemispheric glioma (HGG). This retrospective study of 125 HGG patients used three different classification standards of age-groups (≤ 50 and >50 years old, ≤ 60 and >60 years old, ≤ 45 and 45-65 and ≥ 65 years old) to evaluate the impact of age on prognosis. The primary end-point was overall survival (OS). The Kaplan-Meier method was applied for univariate analysis and Cox proportional hazards model for multivariate analysis. Univariate analysis showed a significant correlation between OS and all three classification standards of age-groups as well as between OS and pathological grade, gender, location of glioma, and regular chemotherapy and radiotherapy treatment. Multivariate analysis showed that the only independent predictors of OS were classification standard of age-groups ≤ 50 and > 50 years old, pathological grade and regular chemotherapy. In summary, the most appropriate classification standard of age-groups as an independent prognostic factor was ≤ 50 and > 50 years old. Pathological grade and chemotherapy were also independent predictors of OS in post-operative HGG patients. Copyright © 2015. Published by Elsevier B.V.

  20. Multivariate assessment of event-related potentials with the t-CWT method.

    PubMed

    Bostanov, Vladimir

    2015-11-05

    Event-related brain potentials (ERPs) are usually assessed with univariate statistical tests although they are essentially multivariate objects. Brain-computer interface applications are a notable exception to this practice, because they are based on multivariate classification of single-trial ERPs. Multivariate ERP assessment can be facilitated by feature extraction methods. One such method is t-CWT, a mathematical-statistical algorithm based on the continuous wavelet transform (CWT) and Student's t-test. This article begins with a geometric primer on some basic concepts of multivariate statistics as applied to ERP assessment in general and to the t-CWT method in particular. Further, it presents for the first time a detailed, step-by-step, formal mathematical description of the t-CWT algorithm. A new multivariate outlier rejection procedure based on principal component analysis in the frequency domain is presented as an important pre-processing step. The MATLAB and GNU Octave implementation of t-CWT is also made publicly available for the first time as free and open source code. The method is demonstrated on some example ERP data obtained in a passive oddball paradigm. Finally, some conceptually novel applications of the multivariate approach in general and of the t-CWT method in particular are suggested and discussed. Hopefully, the publication of both the t-CWT source code and its underlying mathematical algorithm along with a didactic geometric introduction to some basic concepts of multivariate statistics would make t-CWT more accessible to both users and developers in the field of neuroscience research.

  1. Multivariate geometry as an approach to algal community analysis

    USGS Publications Warehouse

    Allen, T.F.H.; Skagen, S.

    1973-01-01

    Multivariate analyses are put in the context of more usual approaches to phycological investigations. The intuitive common-sense involved in methods of ordination, classification and discrimination are emphasised by simple geometric accounts which avoid jargon and matrix algebra. Warnings are given that artifacts result from technique abuses by the naive or over-enthusiastic. An analysis of a simple periphyton data set is presented as an example of the approach. Suggestions are made as to situations in phycological investigations, where the techniques could be appropriate. The discipline is reprimanded for its neglect of the multivariate approach.

  2. Multivariate pattern analysis of fMRI data reveals deficits in distributed representations in schizophrenia

    PubMed Central

    Yoon, Jong H.; Tamir, Diana; Minzenberg, Michael J.; Ragland, J. Daniel; Ursu, Stefan; Carter, Cameron S.

    2009-01-01

    Background Multivariate pattern analysis is an alternative method of analyzing fMRI data, which is capable of decoding distributed neural representations. We applied this method to test the hypothesis of the impairment in distributed representations in schizophrenia. We also compared the results of this method with traditional GLM-based univariate analysis. Methods 19 schizophrenia and 15 control subjects viewed two runs of stimuli--exemplars of faces, scenes, objects, and scrambled images. To verify engagement with stimuli, subjects completed a 1-back matching task. A multi-voxel pattern classifier was trained to identify category-specific activity patterns on one run of fMRI data. Classification testing was conducted on the remaining run. Correlation of voxel-wise activity across runs evaluated variance over time in activity patterns. Results Patients performed the task less accurately. This group difference was reflected in the pattern analysis results with diminished classification accuracy in patients compared to controls, 59% and 72% respectively. In contrast, there was no group difference in GLM-based univariate measures. In both groups, classification accuracy was significantly correlated with behavioral measures. Both groups showed highly significant correlation between inter-run correlations and classification accuracy. Conclusions Distributed representations of visual objects are impaired in schizophrenia. This impairment is correlated with diminished task performance, suggesting that decreased integrity of cortical activity patterns is reflected in impaired behavior. Comparisons with univariate results suggest greater sensitivity of pattern analysis in detecting group differences in neural activity and reduced likelihood of non-specific factors driving these results. PMID:18822407

  3. REGIONAL-SCALE WIND FIELD CLASSIFICATION EMPLOYING CLUSTER ANALYSIS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Glascoe, L G; Glaser, R E; Chin, H S

    2004-06-17

    The classification of time-varying multivariate regional-scale wind fields at a specific location can assist event planning as well as consequence and risk analysis. Further, wind field classification involves data transformation and inference techniques that effectively characterize stochastic wind field variation. Such a classification scheme is potentially useful for addressing overall atmospheric transport uncertainty and meteorological parameter sensitivity issues. Different methods to classify wind fields over a location include the principal component analysis of wind data (e.g., Hardy and Walton, 1978) and the use of cluster analysis for wind data (e.g., Green et al., 1992; Kaufmann and Weber, 1996). The goalmore » of this study is to use a clustering method to classify the winds of a gridded data set, i.e, from meteorological simulations generated by a forecast model.« less

  4. Drunk driving detection based on classification of multivariate time series.

    PubMed

    Li, Zhenlong; Jin, Xue; Zhao, Xiaohua

    2015-09-01

    This paper addresses the problem of detecting drunk driving based on classification of multivariate time series. First, driving performance measures were collected from a test in a driving simulator located in the Traffic Research Center, Beijing University of Technology. Lateral position and steering angle were used to detect drunk driving. Second, multivariate time series analysis was performed to extract the features. A piecewise linear representation was used to represent multivariate time series. A bottom-up algorithm was then employed to separate multivariate time series. The slope and time interval of each segment were extracted as the features for classification. Third, a support vector machine classifier was used to classify driver's state into two classes (normal or drunk) according to the extracted features. The proposed approach achieved an accuracy of 80.0%. Drunk driving detection based on the analysis of multivariate time series is feasible and effective. The approach has implications for drunk driving detection. Copyright © 2015 Elsevier Ltd and National Safety Council. All rights reserved.

  5. The Classification of Ground Roasted Decaffeinated Coffee Using UV-VIS Spectroscopy and SIMCA Method

    NASA Astrophysics Data System (ADS)

    Yulia, M.; Asnaning, A. R.; Suhandy, D.

    2018-05-01

    In this work, an investigation on the classification between decaffeinated and non- decaffeinated coffee samples using UV-VIS spectroscopy and SIMCA method was investigated. Total 200 samples of ground roasted coffee were used (100 samples for decaffeinated coffee and 100 samples for non-decaffeinated coffee). After extraction and dilution, the spectra of coffee samples solution were acquired using a UV-VIS spectrometer (Genesys™ 10S UV-VIS, Thermo Scientific, USA) in the range of 190-1100 nm. The multivariate analyses of the spectra were performed using principal component analysis (PCA) and soft independent modeling of class analogy (SIMCA). The SIMCA model showed that the classification between decaffeinated and non-decaffeinated coffee samples was detected with 100% sensitivity and specificity.

  6. Discrimination of soft tissues using laser-induced breakdown spectroscopy in combination with k nearest neighbors (kNN) and support vector machine (SVM) classifiers

    NASA Astrophysics Data System (ADS)

    Li, Xiaohui; Yang, Sibo; Fan, Rongwei; Yu, Xin; Chen, Deying

    2018-06-01

    In this paper, discrimination of soft tissues using laser-induced breakdown spectroscopy (LIBS) in combination with multivariate statistical methods is presented. Fresh pork fat, skin, ham, loin and tenderloin muscle tissues are manually cut into slices and ablated using a 1064 nm pulsed Nd:YAG laser. Discrimination analyses between fat, skin and muscle tissues, and further between highly similar ham, loin and tenderloin muscle tissues, are performed based on the LIBS spectra in combination with multivariate statistical methods, including principal component analysis (PCA), k nearest neighbors (kNN) classification, and support vector machine (SVM) classification. Performances of the discrimination models, including accuracy, sensitivity and specificity, are evaluated using 10-fold cross validation. The classification models are optimized to achieve best discrimination performances. The fat, skin and muscle tissues can be definitely discriminated using both kNN and SVM classifiers, with accuracy of over 99.83%, sensitivity of over 0.995 and specificity of over 0.998. The highly similar ham, loin and tenderloin muscle tissues can also be discriminated with acceptable performances. The best performances are achieved with SVM classifier using Gaussian kernel function, with accuracy of 76.84%, sensitivity of over 0.742 and specificity of over 0.869. The results show that the LIBS technique assisted with multivariate statistical methods could be a powerful tool for online discrimination of soft tissues, even for tissues of high similarity, such as muscles from different parts of the animal body. This technique could be used for discrimination of tissues suffering minor clinical changes, thus may advance the diagnosis of early lesions and abnormalities.

  7. Comprehensive analysis of Polygoni Multiflori Radix of different geographical origins using ultra-high-performance liquid chromatography fingerprints and multivariate chemometric methods.

    PubMed

    Sun, Li-Li; Wang, Meng; Zhang, Hui-Jie; Liu, Ya-Nan; Ren, Xiao-Liang; Deng, Yan-Ru; Qi, Ai-Di

    2018-01-01

    Polygoni Multiflori Radix (PMR) is increasingly being used not just as a traditional herbal medicine but also as a popular functional food. In this study, multivariate chemometric methods and mass spectrometry were combined to analyze the ultra-high-performance liquid chromatograph (UPLC) fingerprints of PMR from six different geographical origins. A chemometric strategy based on multivariate curve resolution-alternating least squares (MCR-ALS) and three classification methods is proposed to analyze the UPLC fingerprints obtained. Common chromatographic problems, including the background contribution, baseline contribution, and peak overlap, were handled by the established MCR-ALS model. A total of 22 components were resolved. Moreover, relative species concentrations were obtained from the MCR-ALS model, which was used for multivariate classification analysis. Principal component analysis (PCA) and Ward's method have been applied to classify 72 PMR samples from six different geographical regions. The PCA score plot showed that the PMR samples fell into four clusters, which related to the geographical location and climate of the source areas. The results were then corroborated by Ward's method. In addition, according to the variance-weighted distance between cluster centers obtained from Ward's method, five components were identified as the most significant variables (chemical markers) for cluster discrimination. A counter-propagation artificial neural network has been applied to confirm and predict the effects of chemical markers on different samples. Finally, the five chemical markers were identified by UPLC-quadrupole time-of-flight mass spectrometer. Components 3, 12, 16, 18, and 19 were identified as 2,3,5,4'-tetrahydroxy-stilbene-2-O-β-d-glucoside, emodin-8-O-β-d-glucopyranoside, emodin-8-O-(6'-O-acetyl)-β-d-glucopyranoside, emodin, and physcion, respectively. In conclusion, the proposed method can be applied for the comprehensive analysis of natural samples. Copyright © 2016. Published by Elsevier B.V.

  8. Identification of Reliable Components in Multivariate Curve Resolution-Alternating Least Squares (MCR-ALS): a Data-Driven Approach across Metabolic Processes.

    PubMed

    Motegi, Hiromi; Tsuboi, Yuuri; Saga, Ayako; Kagami, Tomoko; Inoue, Maki; Toki, Hideaki; Minowa, Osamu; Noda, Tetsuo; Kikuchi, Jun

    2015-11-04

    There is an increasing need to use multivariate statistical methods for understanding biological functions, identifying the mechanisms of diseases, and exploring biomarkers. In addition to classical analyses such as hierarchical cluster analysis, principal component analysis, and partial least squares discriminant analysis, various multivariate strategies, including independent component analysis, non-negative matrix factorization, and multivariate curve resolution, have recently been proposed. However, determining the number of components is problematic. Despite the proposal of several different methods, no satisfactory approach has yet been reported. To resolve this problem, we implemented a new idea: classifying a component as "reliable" or "unreliable" based on the reproducibility of its appearance, regardless of the number of components in the calculation. Using the clustering method for classification, we applied this idea to multivariate curve resolution-alternating least squares (MCR-ALS). Comparisons between conventional and modified methods applied to proton nuclear magnetic resonance ((1)H-NMR) spectral datasets derived from known standard mixtures and biological mixtures (urine and feces of mice) revealed that more plausible results are obtained by the modified method. In particular, clusters containing little information were detected with reliability. This strategy, named "cluster-aided MCR-ALS," will facilitate the attainment of more reliable results in the metabolomics datasets.

  9. Three-Way Analysis of Spectrospatial Electromyography Data: Classification and Interpretation

    PubMed Central

    Kauppi, Jukka-Pekka; Hahne, Janne; Müller, Klaus-Robert; Hyvärinen, Aapo

    2015-01-01

    Classifying multivariate electromyography (EMG) data is an important problem in prosthesis control as well as in neurophysiological studies and diagnosis. With modern high-density EMG sensor technology, it is possible to capture the rich spectrospatial structure of the myoelectric activity. We hypothesize that multi-way machine learning methods can efficiently utilize this structure in classification as well as reveal interesting patterns in it. To this end, we investigate the suitability of existing three-way classification methods to EMG-based hand movement classification in spectrospatial domain, as well as extend these methods by sparsification and regularization. We propose to use Fourier-domain independent component analysis as preprocessing to improve classification and interpretability of the results. In high-density EMG experiments on hand movements across 10 subjects, three-way classification yielded higher average performance compared with state-of-the art classification based on temporal features, suggesting that the three-way analysis approach can efficiently utilize detailed spectrospatial information of high-density EMG. Phase and amplitude patterns of features selected by the classifier in finger-movement data were found to be consistent with known physiology. Thus, our approach can accurately resolve hand and finger movements on the basis of detailed spectrospatial information, and at the same time allows for physiological interpretation of the results. PMID:26039100

  10. Mapping the Diversity among Runaways: A Descriptive Multivariate Analysis of Selected Social Psychological Background Conditions.

    ERIC Educational Resources Information Center

    Brennan, Tim

    1980-01-01

    A review of prior classification systems of runaways is followed by a descriptive taxonomy of runaways developed using cluster-analytic methods. The empirical types illustrate patterns of weakness in bonds between runaways and families, schools, or peer relationships. (Author)

  11. Proceedings of the Third Annual Symposium on Mathematical Pattern Recognition and Image Analysis

    NASA Technical Reports Server (NTRS)

    Guseman, L. F., Jr.

    1985-01-01

    Topics addressed include: multivariate spline method; normal mixture analysis applied to remote sensing; image data analysis; classifications in spatially correlated environments; probability density functions; graphical nonparametric methods; subpixel registration analysis; hypothesis integration in image understanding systems; rectification of satellite scanner imagery; spatial variation in remotely sensed images; smooth multidimensional interpolation; and optimal frequency domain textural edge detection filters.

  12. Supervision of Ethylene Propylene Diene M-Class (EPDM) Rubber Vulcanization and Recovery Processes Using Attenuated Total Reflection Fourier Transform Infrared (ATR FT-IR) Spectroscopy and Multivariate Analysis.

    PubMed

    Riba Ruiz, Jordi-Roger; Canals, Trini; Cantero, Rosa

    2017-01-01

    Ethylene propylene diene monomer (EPDM) rubber is widely used in a diverse type of applications, such as the automotive, industrial and construction sectors among others. Due to its appealing features, the consumption of vulcanized EPDM rubber is growing significantly. However, environmental issues are forcing the application of devulcanization processes to facilitate recovery, which has led rubber manufacturers to implement strict quality controls. Consequently, it is important to develop methods for supervising the vulcanizing and recovery processes of such products. This paper deals with the supervision process of EPDM compounds by means of Fourier transform mid-infrared (FT-IR) spectroscopy and suitable multivariate statistical methods. An expedited and nondestructive classification approach was applied to a sufficient number of EPDM samples with different applied processes, that is, with and without application of vulcanizing agents, vulcanized samples, and microwave treated samples. First the FT-IR spectra of the samples is acquired and next it is processed by applying suitable feature extraction methods, i.e., principal component analysis and canonical variate analysis to obtain the latent variables to be used for classifying test EPDM samples. Finally, the k nearest neighbor algorithm was used in the classification stage. Experimental results prove the accuracy of the proposed method and the potential of FT-IR spectroscopy in this area, since the classification accuracy can be as high as 100%.

  13. Nonlinear multivariate and time series analysis by neural network methods

    NASA Astrophysics Data System (ADS)

    Hsieh, William W.

    2004-03-01

    Methods in multivariate statistical analysis are essential for working with large amounts of geophysical data, data from observational arrays, from satellites, or from numerical model output. In classical multivariate statistical analysis, there is a hierarchy of methods, starting with linear regression at the base, followed by principal component analysis (PCA) and finally canonical correlation analysis (CCA). A multivariate time series method, the singular spectrum analysis (SSA), has been a fruitful extension of the PCA technique. The common drawback of these classical methods is that only linear structures can be correctly extracted from the data. Since the late 1980s, neural network methods have become popular for performing nonlinear regression and classification. More recently, neural network methods have been extended to perform nonlinear PCA (NLPCA), nonlinear CCA (NLCCA), and nonlinear SSA (NLSSA). This paper presents a unified view of the NLPCA, NLCCA, and NLSSA techniques and their applications to various data sets of the atmosphere and the ocean (especially for the El Niño-Southern Oscillation and the stratospheric quasi-biennial oscillation). These data sets reveal that the linear methods are often too simplistic to describe real-world systems, with a tendency to scatter a single oscillatory phenomenon into numerous unphysical modes or higher harmonics, which can be largely alleviated in the new nonlinear paradigm.

  14. Chemometric and multivariate statistical analysis of time-of-flight secondary ion mass spectrometry spectra from complex Cu-Fe sulfides.

    PubMed

    Kalegowda, Yogesh; Harmer, Sarah L

    2012-03-20

    Time-of-flight secondary ion mass spectrometry (TOF-SIMS) spectra of mineral samples are complex, comprised of large mass ranges and many peaks. Consequently, characterization and classification analysis of these systems is challenging. In this study, different chemometric and statistical data evaluation methods, based on monolayer sensitive TOF-SIMS data, have been tested for the characterization and classification of copper-iron sulfide minerals (chalcopyrite, chalcocite, bornite, and pyrite) at different flotation pulp conditions (feed, conditioned feed, and Eh modified). The complex mass spectral data sets were analyzed using the following chemometric and statistical techniques: principal component analysis (PCA); principal component-discriminant functional analysis (PC-DFA); soft independent modeling of class analogy (SIMCA); and k-Nearest Neighbor (k-NN) classification. PCA was found to be an important first step in multivariate analysis, providing insight into both the relative grouping of samples and the elemental/molecular basis for those groupings. For samples exposed to oxidative conditions (at Eh ~430 mV), each technique (PCA, PC-DFA, SIMCA, and k-NN) was found to produce excellent classification. For samples at reductive conditions (at Eh ~ -200 mV SHE), k-NN and SIMCA produced the most accurate classification. Phase identification of particles that contain the same elements but a different crystal structure in a mixed multimetal mineral system has been achieved.

  15. Refined composite multivariate generalized multiscale fuzzy entropy: A tool for complexity analysis of multichannel signals

    NASA Astrophysics Data System (ADS)

    Azami, Hamed; Escudero, Javier

    2017-01-01

    Multiscale entropy (MSE) is an appealing tool to characterize the complexity of time series over multiple temporal scales. Recent developments in the field have tried to extend the MSE technique in different ways. Building on these trends, we propose the so-called refined composite multivariate multiscale fuzzy entropy (RCmvMFE) whose coarse-graining step uses variance (RCmvMFEσ2) or mean (RCmvMFEμ). We investigate the behavior of these multivariate methods on multichannel white Gaussian and 1/ f noise signals, and two publicly available biomedical recordings. Our simulations demonstrate that RCmvMFEσ2 and RCmvMFEμ lead to more stable results and are less sensitive to the signals' length in comparison with the other existing multivariate multiscale entropy-based methods. The classification results also show that using both the variance and mean in the coarse-graining step offers complexity profiles with complementary information for biomedical signal analysis. We also made freely available all the Matlab codes used in this paper.

  16. Predicting trauma patient mortality: ICD [or ICD-10-AM] versus AIS based approaches.

    PubMed

    Willis, Cameron D; Gabbe, Belinda J; Jolley, Damien; Harrison, James E; Cameron, Peter A

    2010-11-01

    The International Classification of Diseases Injury Severity Score (ICISS) has been proposed as an International Classification of Diseases (ICD)-10-based alternative to mortality prediction tools that use Abbreviated Injury Scale (AIS) data, including the Trauma and Injury Severity Score (TRISS). To date, studies have not examined the performance of ICISS using Australian trauma registry data. This study aimed to compare the performance of ICISS with other mortality prediction tools in an Australian trauma registry. This was a retrospective review of prospectively collected data from the Victorian State Trauma Registry. A training dataset was created for model development and a validation dataset for evaluation. The multiplicative ICISS model was compared with a worst injury ICISS approach, Victorian TRISS (V-TRISS, using local coefficients), maximum AIS severity and a multivariable model including ICD-10-AM codes as predictors. Models were investigated for discrimination (C-statistic) and calibration (Hosmer-Lemeshow statistic). The multivariable approach had the highest level of discrimination (C-statistic 0.90) and calibration (H-L 7.65, P= 0.468). Worst injury ICISS, V-TRISS and maximum AIS had similar performance. The multiplicative ICISS produced the lowest level of discrimination (C-statistic 0.80) and poorest calibration (H-L 50.23, P < 0.001). The performance of ICISS may be affected by the data used to develop estimates, the ICD version employed, the methods for deriving estimates and the inclusion of covariates. In this analysis, a multivariable approach using ICD-10-AM codes was the best-performing method. A multivariable ICISS approach may therefore be a useful alternative to AIS-based methods and may have comparable predictive performance to locally derived TRISS models. © 2010 The Authors. ANZ Journal of Surgery © 2010 Royal Australasian College of Surgeons.

  17. False alarm reduction by the And-ing of multiple multivariate Gaussian classifiers

    NASA Astrophysics Data System (ADS)

    Dobeck, Gerald J.; Cobb, J. Tory

    2003-09-01

    The high-resolution sonar is one of the principal sensors used by the Navy to detect and classify sea mines in minehunting operations. For such sonar systems, substantial effort has been devoted to the development of automated detection and classification (D/C) algorithms. These have been spurred by several factors including (1) aids for operators to reduce work overload, (2) more optimal use of all available data, and (3) the introduction of unmanned minehunting systems. The environments where sea mines are typically laid (harbor areas, shipping lanes, and the littorals) give rise to many false alarms caused by natural, biologic, and man-made clutter. The objective of the automated D/C algorithms is to eliminate most of these false alarms while still maintaining a very high probability of mine detection and classification (PdPc). In recent years, the benefits of fusing the outputs of multiple D/C algorithms have been studied. We refer to this as Algorithm Fusion. The results have been remarkable, including reliable robustness to new environments. This paper describes a method for training several multivariate Gaussian classifiers such that their And-ing dramatically reduces false alarms while maintaining a high probability of classification. This training approach is referred to as the Focused- Training method. This work extends our 2001-2002 work where the Focused-Training method was used with three other types of classifiers: the Attractor-based K-Nearest Neighbor Neural Network (a type of radial-basis, probabilistic neural network), the Optimal Discrimination Filter Classifier (based linear discrimination theory), and the Quadratic Penalty Function Support Vector Machine (QPFSVM). Although our experience has been gained in the area of sea mine detection and classification, the principles described herein are general and can be applied to a wide range of pattern recognition and automatic target recognition (ATR) problems.

  18. A Partial Least Squares Based Procedure for Upstream Sequence Classification in Prokaryotes.

    PubMed

    Mehmood, Tahir; Bohlin, Jon; Snipen, Lars

    2015-01-01

    The upstream region of coding genes is important for several reasons, for instance locating transcription factor, binding sites, and start site initiation in genomic DNA. Motivated by a recently conducted study, where multivariate approach was successfully applied to coding sequence modeling, we have introduced a partial least squares (PLS) based procedure for the classification of true upstream prokaryotic sequence from background upstream sequence. The upstream sequences of conserved coding genes over genomes were considered in analysis, where conserved coding genes were found by using pan-genomics concept for each considered prokaryotic species. PLS uses position specific scoring matrix (PSSM) to study the characteristics of upstream region. Results obtained by PLS based method were compared with Gini importance of random forest (RF) and support vector machine (SVM), which is much used method for sequence classification. The upstream sequence classification performance was evaluated by using cross validation, and suggested approach identifies prokaryotic upstream region significantly better to RF (p-value < 0.01) and SVM (p-value < 0.01). Further, the proposed method also produced results that concurred with known biological characteristics of the upstream region.

  19. Classification and regression tree analysis vs. multivariable linear and logistic regression methods as statistical tools for studying haemophilia.

    PubMed

    Henrard, S; Speybroeck, N; Hermans, C

    2015-11-01

    Haemophilia is a rare genetic haemorrhagic disease characterized by partial or complete deficiency of coagulation factor VIII, for haemophilia A, or IX, for haemophilia B. As in any other medical research domain, the field of haemophilia research is increasingly concerned with finding factors associated with binary or continuous outcomes through multivariable models. Traditional models include multiple logistic regressions, for binary outcomes, and multiple linear regressions for continuous outcomes. Yet these regression models are at times difficult to implement, especially for non-statisticians, and can be difficult to interpret. The present paper sought to didactically explain how, why, and when to use classification and regression tree (CART) analysis for haemophilia research. The CART method is non-parametric and non-linear, based on the repeated partitioning of a sample into subgroups based on a certain criterion. Breiman developed this method in 1984. Classification trees (CTs) are used to analyse categorical outcomes and regression trees (RTs) to analyse continuous ones. The CART methodology has become increasingly popular in the medical field, yet only a few examples of studies using this methodology specifically in haemophilia have to date been published. Two examples using CART analysis and previously published in this field are didactically explained in details. There is increasing interest in using CART analysis in the health domain, primarily due to its ease of implementation, use, and interpretation, thus facilitating medical decision-making. This method should be promoted for analysing continuous or categorical outcomes in haemophilia, when applicable. © 2015 John Wiley & Sons Ltd.

  20. A statistical approach to root system classification

    PubMed Central

    Bodner, Gernot; Leitner, Daniel; Nakhforoosh, Alireza; Sobotik, Monika; Moder, Karl; Kaul, Hans-Peter

    2013-01-01

    Plant root systems have a key role in ecology and agronomy. In spite of fast increase in root studies, still there is no classification that allows distinguishing among distinctive characteristics within the diversity of rooting strategies. Our hypothesis is that a multivariate approach for “plant functional type” identification in ecology can be applied to the classification of root systems. The classification method presented is based on a data-defined statistical procedure without a priori decision on the classifiers. The study demonstrates that principal component based rooting types provide efficient and meaningful multi-trait classifiers. The classification method is exemplified with simulated root architectures and morphological field data. Simulated root architectures showed that morphological attributes with spatial distribution parameters capture most distinctive features within root system diversity. While developmental type (tap vs. shoot-borne systems) is a strong, but coarse classifier, topological traits provide the most detailed differentiation among distinctive groups. Adequacy of commonly available morphologic traits for classification is supported by field data. Rooting types emerging from measured data, mainly distinguished by diameter/weight and density dominated types. Similarity of root systems within distinctive groups was the joint result of phylogenetic relation and environmental as well as human selection pressure. We concluded that the data-define classification is appropriate for integration of knowledge obtained with different root measurement methods and at various scales. Currently root morphology is the most promising basis for classification due to widely used common measurement protocols. To capture details of root diversity efforts in architectural measurement techniques are essential. PMID:23914200

  1. A statistical approach to root system classification.

    PubMed

    Bodner, Gernot; Leitner, Daniel; Nakhforoosh, Alireza; Sobotik, Monika; Moder, Karl; Kaul, Hans-Peter

    2013-01-01

    Plant root systems have a key role in ecology and agronomy. In spite of fast increase in root studies, still there is no classification that allows distinguishing among distinctive characteristics within the diversity of rooting strategies. Our hypothesis is that a multivariate approach for "plant functional type" identification in ecology can be applied to the classification of root systems. The classification method presented is based on a data-defined statistical procedure without a priori decision on the classifiers. The study demonstrates that principal component based rooting types provide efficient and meaningful multi-trait classifiers. The classification method is exemplified with simulated root architectures and morphological field data. Simulated root architectures showed that morphological attributes with spatial distribution parameters capture most distinctive features within root system diversity. While developmental type (tap vs. shoot-borne systems) is a strong, but coarse classifier, topological traits provide the most detailed differentiation among distinctive groups. Adequacy of commonly available morphologic traits for classification is supported by field data. Rooting types emerging from measured data, mainly distinguished by diameter/weight and density dominated types. Similarity of root systems within distinctive groups was the joint result of phylogenetic relation and environmental as well as human selection pressure. We concluded that the data-define classification is appropriate for integration of knowledge obtained with different root measurement methods and at various scales. Currently root morphology is the most promising basis for classification due to widely used common measurement protocols. To capture details of root diversity efforts in architectural measurement techniques are essential.

  2. Identifying when tagged fishes have been consumed by piscivorous predators: application of multivariate mixture models to movement parameters of telemetered fishes

    USGS Publications Warehouse

    Romine, Jason G.; Perry, Russell W.; Johnston, Samuel V.; Fitzer, Christopher W.; Pagliughi, Stephen W.; Blake, Aaron R.

    2013-01-01

    Mixture models proved valuable as a means to differentiate between salmonid smolts and predators that consumed salmonid smolts. However, successful application of this method requires that telemetered fishes and their predators exhibit measurable differences in movement behavior. Our approach is flexible, allows inclusion of multiple track statistics and improves upon rule-based manual classification methods.

  3. Models of Marine Fish Biodiversity: Assessing Predictors from Three Habitat Classification Schemes.

    PubMed

    Yates, Katherine L; Mellin, Camille; Caley, M Julian; Radford, Ben T; Meeuwig, Jessica J

    2016-01-01

    Prioritising biodiversity conservation requires knowledge of where biodiversity occurs. Such knowledge, however, is often lacking. New technologies for collecting biological and physical data coupled with advances in modelling techniques could help address these gaps and facilitate improved management outcomes. Here we examined the utility of environmental data, obtained using different methods, for developing models of both uni- and multivariate biodiversity metrics. We tested which biodiversity metrics could be predicted best and evaluated the performance of predictor variables generated from three types of habitat data: acoustic multibeam sonar imagery, predicted habitat classification, and direct observer habitat classification. We used boosted regression trees (BRT) to model metrics of fish species richness, abundance and biomass, and multivariate regression trees (MRT) to model biomass and abundance of fish functional groups. We compared model performance using different sets of predictors and estimated the relative influence of individual predictors. Models of total species richness and total abundance performed best; those developed for endemic species performed worst. Abundance models performed substantially better than corresponding biomass models. In general, BRT and MRTs developed using predicted habitat classifications performed less well than those using multibeam data. The most influential individual predictor was the abiotic categorical variable from direct observer habitat classification and models that incorporated predictors from direct observer habitat classification consistently outperformed those that did not. Our results show that while remotely sensed data can offer considerable utility for predictive modelling, the addition of direct observer habitat classification data can substantially improve model performance. Thus it appears that there are aspects of marine habitats that are important for modelling metrics of fish biodiversity that are not fully captured by remotely sensed data. As such, the use of remotely sensed data to model biodiversity represents a compromise between model performance and data availability.

  4. Models of Marine Fish Biodiversity: Assessing Predictors from Three Habitat Classification Schemes

    PubMed Central

    Yates, Katherine L.; Mellin, Camille; Caley, M. Julian; Radford, Ben T.; Meeuwig, Jessica J.

    2016-01-01

    Prioritising biodiversity conservation requires knowledge of where biodiversity occurs. Such knowledge, however, is often lacking. New technologies for collecting biological and physical data coupled with advances in modelling techniques could help address these gaps and facilitate improved management outcomes. Here we examined the utility of environmental data, obtained using different methods, for developing models of both uni- and multivariate biodiversity metrics. We tested which biodiversity metrics could be predicted best and evaluated the performance of predictor variables generated from three types of habitat data: acoustic multibeam sonar imagery, predicted habitat classification, and direct observer habitat classification. We used boosted regression trees (BRT) to model metrics of fish species richness, abundance and biomass, and multivariate regression trees (MRT) to model biomass and abundance of fish functional groups. We compared model performance using different sets of predictors and estimated the relative influence of individual predictors. Models of total species richness and total abundance performed best; those developed for endemic species performed worst. Abundance models performed substantially better than corresponding biomass models. In general, BRT and MRTs developed using predicted habitat classifications performed less well than those using multibeam data. The most influential individual predictor was the abiotic categorical variable from direct observer habitat classification and models that incorporated predictors from direct observer habitat classification consistently outperformed those that did not. Our results show that while remotely sensed data can offer considerable utility for predictive modelling, the addition of direct observer habitat classification data can substantially improve model performance. Thus it appears that there are aspects of marine habitats that are important for modelling metrics of fish biodiversity that are not fully captured by remotely sensed data. As such, the use of remotely sensed data to model biodiversity represents a compromise between model performance and data availability. PMID:27333202

  5. Control-group feature normalization for multivariate pattern analysis of structural MRI data using the support vector machine.

    PubMed

    Linn, Kristin A; Gaonkar, Bilwaj; Satterthwaite, Theodore D; Doshi, Jimit; Davatzikos, Christos; Shinohara, Russell T

    2016-05-15

    Normalization of feature vector values is a common practice in machine learning. Generally, each feature value is standardized to the unit hypercube or by normalizing to zero mean and unit variance. Classification decisions based on support vector machines (SVMs) or by other methods are sensitive to the specific normalization used on the features. In the context of multivariate pattern analysis using neuroimaging data, standardization effectively up- and down-weights features based on their individual variability. Since the standard approach uses the entire data set to guide the normalization, it utilizes the total variability of these features. This total variation is inevitably dependent on the amount of marginal separation between groups. Thus, such a normalization may attenuate the separability of the data in high dimensional space. In this work we propose an alternate approach that uses an estimate of the control-group standard deviation to normalize features before training. We study our proposed approach in the context of group classification using structural MRI data. We show that control-based normalization leads to better reproducibility of estimated multivariate disease patterns and improves the classifier performance in many cases. Copyright © 2016 Elsevier Inc. All rights reserved.

  6. The classification of secondary colorectal liver cancer in human biopsy samples using angular dispersive x-ray diffraction and multivariate analysis

    NASA Astrophysics Data System (ADS)

    Theodorakou, Chrysoula; Farquharson, Michael J.

    2009-08-01

    The motivation behind this study is to assess whether angular dispersive x-ray diffraction (ADXRD) data, processed using multivariate analysis techniques, can be used for classifying secondary colorectal liver cancer tissue and normal surrounding liver tissue in human liver biopsy samples. The ADXRD profiles from a total of 60 samples of normal liver tissue and colorectal liver metastases were measured using a synchrotron radiation source. The data were analysed for 56 samples using nonlinear peak-fitting software. Four peaks were fitted to all of the ADXRD profiles, and the amplitude, area, amplitude and area ratios for three of the four peaks were calculated and used for the statistical and multivariate analysis. The statistical analysis showed that there are significant differences between all the peak-fitting parameters and ratios between the normal and the diseased tissue groups. The technique of soft independent modelling of class analogy (SIMCA) was used to classify normal liver tissue and colorectal liver metastases resulting in 67% of the normal tissue samples and 60% of the secondary colorectal liver tissue samples being classified correctly. This study has shown that the ADXRD data of normal and secondary colorectal liver cancer are statistically different and x-ray diffraction data analysed using multivariate analysis have the potential to be used as a method of tissue classification.

  7. Optimizing Functional Network Representation of Multivariate Time Series

    NASA Astrophysics Data System (ADS)

    Zanin, Massimiliano; Sousa, Pedro; Papo, David; Bajo, Ricardo; García-Prieto, Juan; Pozo, Francisco Del; Menasalvas, Ernestina; Boccaletti, Stefano

    2012-09-01

    By combining complex network theory and data mining techniques, we provide objective criteria for optimization of the functional network representation of generic multivariate time series. In particular, we propose a method for the principled selection of the threshold value for functional network reconstruction from raw data, and for proper identification of the network's indicators that unveil the most discriminative information on the system for classification purposes. We illustrate our method by analysing networks of functional brain activity of healthy subjects, and patients suffering from Mild Cognitive Impairment, an intermediate stage between the expected cognitive decline of normal aging and the more pronounced decline of dementia. We discuss extensions of the scope of the proposed methodology to network engineering purposes, and to other data mining tasks.

  8. Optimizing Functional Network Representation of Multivariate Time Series

    PubMed Central

    Zanin, Massimiliano; Sousa, Pedro; Papo, David; Bajo, Ricardo; García-Prieto, Juan; Pozo, Francisco del; Menasalvas, Ernestina; Boccaletti, Stefano

    2012-01-01

    By combining complex network theory and data mining techniques, we provide objective criteria for optimization of the functional network representation of generic multivariate time series. In particular, we propose a method for the principled selection of the threshold value for functional network reconstruction from raw data, and for proper identification of the network's indicators that unveil the most discriminative information on the system for classification purposes. We illustrate our method by analysing networks of functional brain activity of healthy subjects, and patients suffering from Mild Cognitive Impairment, an intermediate stage between the expected cognitive decline of normal aging and the more pronounced decline of dementia. We discuss extensions of the scope of the proposed methodology to network engineering purposes, and to other data mining tasks. PMID:22953051

  9. Combination of multivariate curve resolution and multivariate classification techniques for comprehensive high-performance liquid chromatography-diode array absorbance detection fingerprints analysis of Salvia reuterana extracts.

    PubMed

    Hakimzadeh, Neda; Parastar, Hadi; Fattahi, Mohammad

    2014-01-24

    In this study, multivariate curve resolution (MCR) and multivariate classification methods are proposed to develop a new chemometric strategy for comprehensive analysis of high-performance liquid chromatography-diode array absorbance detection (HPLC-DAD) fingerprints of sixty Salvia reuterana samples from five different geographical regions. Different chromatographic problems occurred during HPLC-DAD analysis of S. reuterana samples, such as baseline/background contribution and noise, low signal-to-noise ratio (S/N), asymmetric peaks, elution time shifts, and peak overlap are handled using the proposed strategy. In this way, chromatographic fingerprints of sixty samples are properly segmented to ten common chromatographic regions using local rank analysis and then, the corresponding segments are column-wise augmented for subsequent MCR analysis. Extended multivariate curve resolution-alternating least squares (MCR-ALS) is used to obtain pure component profiles in each segment. In general, thirty-one chemical components were resolved using MCR-ALS in sixty S. reuterana samples and the lack of fit (LOF) values of MCR-ALS models were below 10.0% in all cases. Pure spectral profiles are considered for identification of chemical components by comparing their resolved spectra with the standard ones and twenty-four components out of thirty-one components were identified. Additionally, pure elution profiles are used to obtain relative concentrations of chemical components in different samples for multivariate classification analysis by principal component analysis (PCA) and k-nearest neighbors (kNN). Inspection of the PCA score plot (explaining 76.1% of variance accounted for three PCs) showed that S. reuterana samples belong to four clusters. The degree of class separation (DCS) which quantifies the distance separating clusters in relation to the scatter within each cluster is calculated for four clusters and it was in the range of 1.6-5.8. These results are then confirmed by kNN. In addition, according to the PCA loading plot and kNN dendrogram of thirty-one variables, five chemical constituents of luteolin-7-o-glucoside, salvianolic acid D, rosmarinic acid, lithospermic acid and trijuganone A are identified as the most important variables (i.e., chemical markers) for clusters discrimination. Finally, the effect of different chemical markers on samples differentiation is investigated using counter-propagation artificial neural network (CP-ANN) method. It is concluded that the proposed strategy can be successfully applied for comprehensive analysis of chromatographic fingerprints of complex natural samples. Copyright © 2013 Elsevier B.V. All rights reserved.

  10. Recursive Partitioning Analysis for New Classification of Patients With Esophageal Cancer Treated by Chemoradiotherapy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nomura, Motoo, E-mail: excell@hkg.odn.ne.jp; Department of Clinical Oncology, Aichi Cancer Center Hospital, Nagoya; Department of Radiation Oncology, Aichi Cancer Center Hospital, Nagoya

    2012-11-01

    Background: The 7th edition of the American Joint Committee on Cancer staging system does not include lymph node size in the guidelines for staging patients with esophageal cancer. The objectives of this study were to determine the prognostic impact of the maximum metastatic lymph node diameter (ND) on survival and to develop and validate a new staging system for patients with esophageal squamous cell cancer who were treated with definitive chemoradiotherapy (CRT). Methods: Information on 402 patients with esophageal cancer undergoing CRT at two institutions was reviewed. Univariate and multivariate analyses of data from one institution were used to assessmore » the impact of clinical factors on survival, and recursive partitioning analysis was performed to develop the new staging classification. To assess its clinical utility, the new classification was validated using data from the second institution. Results: By multivariate analysis, gender, T, N, and ND stages were independently and significantly associated with survival (p < 0.05). The resulting new staging classification was based on the T and ND. The four new stages led to good separation of survival curves in both the developmental and validation datasets (p < 0.05). Conclusions: Our results showed that lymph node size is a strong independent prognostic factor and that the new staging system, which incorporated lymph node size, provided good prognostic power, and discriminated effectively for patients with esophageal cancer undergoing CRT.« less

  11. Origin Discrimination of Osmanthus fragrans var. thunbergii Flowers using GC-MS and UPLC-PDA Combined with Multivariable Analysis Methods.

    PubMed

    Zhou, Fei; Zhao, Yajing; Peng, Jiyu; Jiang, Yirong; Li, Maiquan; Jiang, Yuan; Lu, Baiyi

    2017-07-01

    Osmanthus fragrans flowers are used as folk medicine and additives for teas, beverages and foods. The metabolites of O. fragrans flowers from different geographical origins were inconsistent in some extent. Chromatography and mass spectrometry combined with multivariable analysis methods provides an approach for discriminating the origin of O. fragrans flowers. To discriminate the Osmanthus fragrans var. thunbergii flowers from different origins with the identified metabolites. GC-MS and UPLC-PDA were conducted to analyse the metabolites in O. fragrans var. thunbergii flowers (in total 150 samples). Principal component analysis (PCA), soft independent modelling of class analogy analysis (SIMCA) and random forest (RF) analysis were applied to group the GC-MS and UPLC-PDA data. GC-MS identified 32 compounds common to all samples while UPLC-PDA/QTOF-MS identified 16 common compounds. PCA of the UPLC-PDA data generated a better clustering than PCA of the GC-MS data. Ten metabolites (six from GC-MS and four from UPLC-PDA) were selected as effective compounds for discrimination by PCA loadings. SIMCA and RF analysis were used to build classification models, and the RF model, based on the four effective compounds (caffeic acid derivative, acteoside, ligustroside and compound 15), yielded better results with the classification rate of 100% in the calibration set and 97.8% in the prediction set. GC-MS and UPLC-PDA combined with multivariable analysis methods can discriminate the origin of Osmanthus fragrans var. thunbergii flowers. Copyright © 2017 John Wiley & Sons, Ltd. Copyright © 2017 John Wiley & Sons, Ltd.

  12. The Japanese Histologic Classification and T-score in the Oxford Classification system could predict renal outcome in Japanese IgA nephropathy patients.

    PubMed

    Kaihan, Ahmad Baseer; Yasuda, Yoshinari; Katsuno, Takayuki; Kato, Sawako; Imaizumi, Takahiro; Ozeki, Takaya; Hishida, Manabu; Nagata, Takanobu; Ando, Masahiko; Tsuboi, Naotake; Maruyama, Shoichi

    2017-12-01

    The Oxford Classification is utilized globally, but has not been fully validated. In this study, we conducted a comparative analysis between the Oxford Classification and Japanese Histologic Classification (JHC) to predict renal outcome in Japanese patients with IgA nephropathy (IgAN). A retrospective cohort study including 86 adult IgAN patients was conducted. The Oxford Classification and the JHC were evaluated by 7 independent specialists. The JHC, MEST score in the Oxford Classification, and crescents were analyzed in association with renal outcome, defined as a 50% increase in serum creatinine. In multivariate analysis without the JHC, only the T score was significantly associated with renal outcome. While, a significant association was revealed only in the JHC on multivariate analysis with JHC. The JHC and T score in the Oxford Classification were associated with renal outcome among Japanese patients with IgAN. Superiority of the JHC as a predictive index should be validated with larger study population and cohort studies in different ethnicities.

  13. Understanding perception of active noise control system through multichannel EEG analysis.

    PubMed

    Bagha, Sangeeta; Tripathy, R K; Nanda, Pranati; Preetam, C; Das, Debi Prasad

    2018-06-01

    In this Letter, a method is proposed to investigate the effect of noise with and without active noise control (ANC) on multichannel electroencephalogram (EEG) signal. The multichannel EEG signal is recorded during different listening conditions such as silent, music, noise, ANC with background noise and ANC with both background noise and music. The multiscale analysis of EEG signal of each channel is performed using the discrete wavelet transform. The multivariate multiscale matrices are formulated based on the sub-band signals of each EEG channel. The singular value decomposition is applied to the multivariate matrices of multichannel EEG at significant scales. The singular value features at significant scales and the extreme learning machine classifier with three different activation functions are used for classification of multichannel EEG signal. The experimental results demonstrate that, for ANC with noise and ANC with noise and music classes, the proposed method has sensitivity values of 75.831% ( p < 0.001 ) and 99.31% ( p < 0.001 ), respectively. The method has an accuracy value of 83.22% for the classification of EEG signal with music and ANC with music as stimuli. The important finding of this study is that by the introduction of ANC, music can be better perceived by the human brain.

  14. Use of Neuroanatomical Pattern Classification to Identify Subjects in At-Risk Mental States of Psychosis and Predict Disease Transition

    PubMed Central

    Koutsouleris, Nikolaos; Meisenzahl, Eva M.; Davatzikos, Christos; Bottlender, Ronald; Frodl, Thomas; Scheuerecker, Johanna; Schmitt, Gisela; Zetzsche, Thomas; Decker, Petra; Reiser, Maximilian; Möller, Hans-Jürgen; Gaser, Christian

    2014-01-01

    Context Identification of individuals at high risk of developing psychosis has relied on prodromal symptomatology. Recently, machine learning algorithms have been successfully used for magnetic resonance imaging–based diagnostic classification of neuropsychiatric patient populations. Objective To determine whether multivariate neuroanatomical pattern classification facilitates identification of individuals in different at-risk mental states (ARMS) of psychosis and enables the prediction of disease transition at the individual level. Design Multivariate neuroanatomical pattern classification was performed on the structural magnetic resonance imaging data of individuals in early or late ARMS vs healthy controls (HCs). The predictive power of the method was then evaluated by categorizing the baseline imaging data of individuals with transition to psychosis vs those without transition vs HCs after 4 years of clinical follow-up. Classification generalizability was estimated by cross-validation and by categorizing an independent cohort of 45 new HCs. Setting Departments of Psychiatry and Psychotherapy, Ludwig-Maximilians-University, Munich, Germany. Participants The first classification analysis included 20 early and 25 late at-risk individuals and 25 matched HCs. The second analysis consisted of 15 individuals with transition, 18 without transition, and 17 matched HCs. Main Outcome Measures Specificity, sensitivity, and accuracy of classification. Results The 3-group, cross-validated classification accuracies of the first analysis were 86% (HCs vs the rest), 91% (early at-risk individuals vs the rest), and 86% (late at-risk individuals vs the rest). The accuracies in the second analysis were 90% (HCs vs the rest), 88% (individuals with transition vs the rest), and 86% (individuals without transition vs the rest). Independent HCs were correctly classified in 96% (first analysis) and 93% (second analysis) of cases. Conclusions Different ARMSs and their clinical outcomes may be reliably identified on an individual basis by assessing patterns of whole-brain neuroanatomical abnormalities. These patterns may serve as valuable biomarkers for the clinician to guide early detection in the prodromal phase of psychosis. PMID:19581561

  15. Classification of Ilex species based on metabolomic fingerprinting using nuclear magnetic resonance and multivariate data analysis.

    PubMed

    Choi, Young Hae; Sertic, Sarah; Kim, Hye Kyong; Wilson, Erica G; Michopoulos, Filippos; Lefeber, Alfons W M; Erkelens, Cornelis; Prat Kricun, Sergio D; Verpoorte, Robert

    2005-02-23

    The metabolomic analysis of 11 Ilex species, I. argentina, I. brasiliensis, I. brevicuspis, I. dumosavar. dumosa, I. dumosa var. guaranina, I. integerrima, I. microdonta, I. paraguariensis var. paraguariensis, I. pseudobuxus, I. taubertiana, and I. theezans, was carried out by NMR spectroscopy and multivariate data analysis. The analysis using principal component analysis and classification of the (1)H NMR spectra showed a clear discrimination of those samples based on the metabolites present in the organic and aqueous fractions. The major metabolites that contribute to the discrimination are arbutin, caffeine, phenylpropanoids, and theobromine. Among those metabolites, arbutin, which has not been reported yet as a constituent of Ilex species, was found to be a biomarker for I. argentina,I. brasiliensis, I. brevicuspis, I. integerrima, I. microdonta, I. pseudobuxus, I. taubertiana, and I. theezans. This reliable method based on the determination of a large number of metabolites makes the chemotaxonomical analysis of Ilex species possible.

  16. Identification of Enterococcus, Streptococcus, and Staphylococcus by Multivariate Analysis of Proton Magnetic Resonance Spectroscopic Data from Plate Cultures

    PubMed Central

    Bourne, Roger; Himmelreich, Uwe; Sharma, Ansuiya; Mountford, Carolyn; Sorrell, Tania

    2001-01-01

    A new fingerprinting technique with the potential for rapid identification of bacteria was developed by combining proton magnetic resonance spectroscopy (1H MRS) with multivariate statistical analysis. This resulted in an objective identification strategy for common clinical isolates belonging to the bacterial species Staphylococcus aureus, Staphylococcus epidermidis, Enterococcus faecalis, Streptococcus pneumoniae, Streptococcus pyogenes, Streptococcus agalactiae, and the Streptococcus milleri group. Duplicate cultures of 104 different isolates were examined one or more times using 1H MRS. A total of 312 cultures were examined. An optimized classifier was developed using a bootstrapping process and a seven-group linear discriminant analysis to provide objective classification of the spectra. Identification of isolates was based on consistent high-probability classification of spectra from duplicate cultures and achieved 92% agreement with conventional methods of identification. Fewer than 1% of isolates were identified incorrectly. Identification of the remaining 7% of isolates was defined as indeterminate. PMID:11474013

  17. A conceptual weather-type classification procedure for the Philadelphia, Pennsylvania, area

    USGS Publications Warehouse

    McCabe, Gregory J.

    1990-01-01

    A simple method of weather-type classification, based on a conceptual model of pressure systems that pass through the Philadelphia, Pennsylvania, area, has been developed. The only inputs required for the procedure are daily mean wind direction and cloud cover, which are used to index the relative position of pressure systems and fronts to Philadelphia.Daily mean wind-direction and cloud-cover data recorded at Philadelphia, Pennsylvania, from January 1954 through August 1988 were used to categorize daily weather conditions. The conceptual weather types reflect changes in daily air and dew-point temperatures, and changes in monthly mean temperature and monthly and annual precipitation. The weather-type classification produced by using the conceptual model was similar to a classification produced by using a multivariate statistical classification procedure. Even though the conceptual weather types are derived from a small amount of data, they appear to account for the variability of daily weather patterns sufficiently to describe distinct weather conditions for use in environmental analyses of weather-sensitive processes.

  18. Simulation techniques for estimating error in the classification of normal patterns

    NASA Technical Reports Server (NTRS)

    Whitsitt, S. J.; Landgrebe, D. A.

    1974-01-01

    Methods of efficiently generating and classifying samples with specified multivariate normal distributions were discussed. Conservative confidence tables for sample sizes are given for selective sampling. Simulation results are compared with classified training data. Techniques for comparing error and separability measure for two normal patterns are investigated and used to display the relationship between the error and the Chernoff bound.

  19. Determination of fragrance content in perfume by Raman spectroscopy and multivariate calibration

    NASA Astrophysics Data System (ADS)

    Godinho, Robson B.; Santos, Mauricio C.; Poppi, Ronei J.

    2016-03-01

    An alternative methodology is herein proposed for determination of fragrance content in perfumes and their classification according to the guidelines established by fine perfume manufacturers. The methodology is based on Raman spectroscopy associated with multivariate calibration, allowing the determination of fragrance content in a fast, nondestructive, and sustainable manner. The results were considered consistent with the conventional method, whose standard error of prediction values was lower than the 1.0%. This result indicates that the proposed technology is a feasible analytical tool for determination of the fragrance content in a hydro-alcoholic solution for use in manufacturing, quality control and regulatory agencies.

  20. Real-Time Food Authentication Using a Miniature Mass Spectrometer.

    PubMed

    Gerbig, Stefanie; Neese, Stephan; Penner, Alexander; Spengler, Bernhard; Schulz, Sabine

    2017-10-17

    Food adulteration is a threat to public health and the economy. In order to determine food adulteration efficiently, rapid and easy-to-use on-site analytical methods are needed. In this study, a miniaturized mass spectrometer in combination with three ambient ionization methods was used for food authentication. The chemical fingerprints of three milk types, five fish species, and two coffee types were measured using electrospray ionization, desorption electrospray ionization, and low temperature plasma ionization. Minimum sample preparation was needed for the analysis of liquid and solid food samples. Mass spectrometric data was processed using the laboratory-built software MS food classifier, which allows for the definition of specific food profiles from reference data sets using multivariate statistical methods and the subsequent classification of unknown data. Applicability of the obtained mass spectrometric fingerprints for food authentication was evaluated using different data processing methods, leave-10%-out cross-validation, and real-time classification of new data. Classification accuracy of 100% was achieved for the differentiation of milk types and fish species, and a classification accuracy of 96.4% was achieved for coffee types in cross-validation experiments. Measurement of two milk mixtures yielded correct classification of >94%. For real-time classification, the accuracies were comparable. Functionality of the software program and its performance is described. Processing time for a reference data set and a newly acquired spectrum was found to be 12 s and 2 s, respectively. These proof-of-principle experiments show that the combination of a miniaturized mass spectrometer, ambient ionization, and statistical analysis is suitable for on-site real-time food authentication.

  1. Mapping Informative Clusters in a Hierarchial Framework of fMRI Multivariate Analysis

    PubMed Central

    Xu, Rui; Zhen, Zonglei; Liu, Jia

    2010-01-01

    Pattern recognition methods have become increasingly popular in fMRI data analysis, which are powerful in discriminating between multi-voxel patterns of brain activities associated with different mental states. However, when they are used in functional brain mapping, the location of discriminative voxels varies significantly, raising difficulties in interpreting the locus of the effect. Here we proposed a hierarchical framework of multivariate approach that maps informative clusters rather than voxels to achieve reliable functional brain mapping without compromising the discriminative power. In particular, we first searched for local homogeneous clusters that consisted of voxels with similar response profiles. Then, a multi-voxel classifier was built for each cluster to extract discriminative information from the multi-voxel patterns. Finally, through multivariate ranking, outputs from the classifiers were served as a multi-cluster pattern to identify informative clusters by examining interactions among clusters. Results from both simulated and real fMRI data demonstrated that this hierarchical approach showed better performance in the robustness of functional brain mapping than traditional voxel-based multivariate methods. In addition, the mapped clusters were highly overlapped for two perceptually equivalent object categories, further confirming the validity of our approach. In short, the hierarchical framework of multivariate approach is suitable for both pattern classification and brain mapping in fMRI studies. PMID:21152081

  2. Rapid characterization of transgenic and non-transgenic soybean oils by chemometric methods using NIR spectroscopy

    NASA Astrophysics Data System (ADS)

    Luna, Aderval S.; da Silva, Arnaldo P.; Pinho, Jéssica S. A.; Ferré, Joan; Boqué, Ricard

    Near infrared (NIR) spectroscopy and multivariate classification were applied to discriminate soybean oil samples into non-transgenic and transgenic. Principal Component Analysis (PCA) was applied to extract relevant features from the spectral data and to remove the anomalous samples. The best results were obtained when with Support Vectors Machine-Discriminant Analysis (SVM-DA) and Partial Least Squares-Discriminant Analysis (PLS-DA) after mean centering plus multiplicative scatter correction. For SVM-DA the percentage of successful classification was 100% for the training group and 100% and 90% in validation group for non transgenic and transgenic soybean oil samples respectively. For PLS-DA the percentage of successful classification was 95% and 100% in training group for non transgenic and transgenic soybean oil samples respectively and 100% and 80% in validation group for non transgenic and transgenic respectively. The results demonstrate that NIR spectroscopy can provide a rapid, nondestructive and reliable method to distinguish non-transgenic and transgenic soybean oils.

  3. Discrimination of inflammatory bowel disease using Raman spectroscopy and linear discriminant analysis methods

    NASA Astrophysics Data System (ADS)

    Ding, Hao; Cao, Ming; DuPont, Andrew W.; Scott, Larry D.; Guha, Sushovan; Singhal, Shashideep; Younes, Mamoun; Pence, Isaac; Herline, Alan; Schwartz, David; Xu, Hua; Mahadevan-Jansen, Anita; Bi, Xiaohong

    2016-03-01

    Inflammatory bowel disease (IBD) is an idiopathic disease that is typically characterized by chronic inflammation of the gastrointestinal tract. Recently much effort has been devoted to the development of novel diagnostic tools that can assist physicians for fast, accurate, and automated diagnosis of the disease. Previous research based on Raman spectroscopy has shown promising results in differentiating IBD patients from normal screening cases. In the current study, we examined IBD patients in vivo through a colonoscope-coupled Raman system. Optical diagnosis for IBD discrimination was conducted based on full-range spectra using multivariate statistical methods. Further, we incorporated several feature selection methods in machine learning into the classification model. The diagnostic performance for disease differentiation was significantly improved after feature selection. Our results showed that improved IBD diagnosis can be achieved using Raman spectroscopy in combination with multivariate analysis and feature selection.

  4. Multivariate neuroanatomical classification of cognitive subtypes in schizophrenia: A support vector machine learning approach

    PubMed Central

    Gould, Ian C.; Shepherd, Alana M.; Laurens, Kristin R.; Cairns, Murray J.; Carr, Vaughan J.; Green, Melissa J.

    2014-01-01

    Heterogeneity in the structural brain abnormalities associated with schizophrenia has made identification of reliable neuroanatomical markers of the disease difficult. The use of more homogenous clinical phenotypes may improve the accuracy of predicting psychotic disorder/s on the basis of observable brain disturbances. Here we investigate the utility of cognitive subtypes of schizophrenia – ‘cognitive deficit’ and ‘cognitively spared’ – in determining whether multivariate patterns of volumetric brain differences can accurately discriminate these clinical subtypes from healthy controls, and from each other. We applied support vector machine classification to grey- and white-matter volume data from 126 schizophrenia patients previously allocated to the cognitive spared subtype, 74 cognitive deficit schizophrenia patients, and 134 healthy controls. Using this method, cognitive subtypes were distinguished from healthy controls with up to 72% accuracy. Cross-validation analyses between subtypes achieved an accuracy of 71%, suggesting that some common neuroanatomical patterns distinguish both subtypes from healthy controls. Notably, cognitive subtypes were best distinguished from one another when the sample was stratified by sex prior to classification analysis: cognitive subtype classification accuracy was relatively low (<60%) without stratification, and increased to 83% for females with sex stratification. Distinct neuroanatomical patterns predicted cognitive subtype status in each sex: sex-specific multivariate patterns did not predict cognitive subtype status in the other sex above chance, and weight map analyses demonstrated negative correlations between the spatial patterns of weights underlying classification for each sex. These results suggest that in typical mixed-sex samples of schizophrenia patients, the volumetric brain differences between cognitive subtypes are relatively minor in contrast to the large common disease-associated changes. Volumetric differences that distinguish between cognitive subtypes on a case-by-case basis appear to occur in a sex-specific manner that is consistent with previous evidence of disrupted relationships between brain structure and cognition in male, but not female, schizophrenia patients. Consideration of sex-specific differences in brain organization is thus likely to assist future attempts to distinguish subgroups of schizophrenia patients on the basis of neuroanatomical features. PMID:25379435

  5. Recent applications of multivariate data analysis methods in the authentication of rice and the most analyzed parameters: A review.

    PubMed

    Maione, Camila; Barbosa, Rommel Melgaço

    2018-01-24

    Rice is one of the most important staple foods around the world. Authentication of rice is one of the most addressed concerns in the present literature, which includes recognition of its geographical origin and variety, certification of organic rice and many other issues. Good results have been achieved by multivariate data analysis and data mining techniques when combined with specific parameters for ascertaining authenticity and many other useful characteristics of rice, such as quality, yield and others. This paper brings a review of the recent research projects on discrimination and authentication of rice using multivariate data analysis and data mining techniques. We found that data obtained from image processing, molecular and atomic spectroscopy, elemental fingerprinting, genetic markers, molecular content and others are promising sources of information regarding geographical origin, variety and other aspects of rice, being widely used combined with multivariate data analysis techniques. Principal component analysis and linear discriminant analysis are the preferred methods, but several other data classification techniques such as support vector machines, artificial neural networks and others are also frequently present in some studies and show high performance for discrimination of rice.

  6. The Decoding Toolbox (TDT): a versatile software package for multivariate analyses of functional imaging data

    PubMed Central

    Hebart, Martin N.; Görgen, Kai; Haynes, John-Dylan

    2015-01-01

    The multivariate analysis of brain signals has recently sparked a great amount of interest, yet accessible and versatile tools to carry out decoding analyses are scarce. Here we introduce The Decoding Toolbox (TDT) which represents a user-friendly, powerful and flexible package for multivariate analysis of functional brain imaging data. TDT is written in Matlab and equipped with an interface to the widely used brain data analysis package SPM. The toolbox allows running fast whole-brain analyses, region-of-interest analyses and searchlight analyses, using machine learning classifiers, pattern correlation analysis, or representational similarity analysis. It offers automatic creation and visualization of diverse cross-validation schemes, feature scaling, nested parameter selection, a variety of feature selection methods, multiclass capabilities, and pattern reconstruction from classifier weights. While basic users can implement a generic analysis in one line of code, advanced users can extend the toolbox to their needs or exploit the structure to combine it with external high-performance classification toolboxes. The toolbox comes with an example data set which can be used to try out the various analysis methods. Taken together, TDT offers a promising option for researchers who want to employ multivariate analyses of brain activity patterns. PMID:25610393

  7. External Validation of the European Hernia Society Classification for Postoperative Complications after Incisional Hernia Repair: A Cohort Study of 2,191 Patients.

    PubMed

    Kroese, Leonard F; Kleinrensink, Gert-Jan; Lange, Johan F; Gillion, Jean-Francois

    2018-03-01

    Incisional hernia is a frequent complication after midline laparotomy. Surgical hernia repair is associated with complications, but no clear predictive risk factors have been identified. The European Hernia Society (EHS) classification offers a structured framework to describe hernias and to analyze postoperative complications. Because of its structured nature, it might prove to be useful for preoperative patient or treatment classification. The objective of this study was to investigate the EHS classification as a predictor for postoperative complications after incisional hernia surgery. An analysis was performed using a registry-based, large-scale, prospective cohort study, including all patients undergoing incisional hernia surgery between September 1, 2011 and February 29, 2016. Univariate analyses and multivariable logistic regression analysis were performed to identify risk factors for postoperative complications. A total of 2,191 patients were included, of whom 323 (15%) had 1 or more complications. Factors associated with complications in univariate analyses (p < 0.20) and clinically relevant factors were included in the multivariable analysis. In the multivariable analysis, EHS width class, incarceration, open surgery, duration of surgery, Altemeier wound class, and therapeutic antibiotic treatment were independent risk factors for postoperative complications. Third recurrence and emergency surgery were associated with fewer complications. Incisional hernia repair is associated with a 15% complication rate. The EHS width classification is associated with postoperative complications. To identify patients at risk for complications, the EHS classification is useful. Copyright © 2017. Published by Elsevier Inc.

  8. Analysis of spreadable cheese by Raman spectroscopy and chemometric tools.

    PubMed

    Oliveira, Kamila de Sá; Callegaro, Layce de Souza; Stephani, Rodrigo; Almeida, Mariana Ramos; de Oliveira, Luiz Fernando Cappa

    2016-03-01

    In this work, FT-Raman spectroscopy was explored to evaluate spreadable cheese samples. A partial least squares discriminant analysis was employed to identify the spreadable cheese samples containing starch. To build the models, two types of samples were used: commercial samples and samples manufactured in local industries. The method of supervised classification PLS-DA was employed to classify the samples as adulterated or without starch. Multivariate regression was performed using the partial least squares method to quantify the starch in the spreadable cheese. The limit of detection obtained for the model was 0.34% (w/w) and the limit of quantification was 1.14% (w/w). The reliability of the models was evaluated by determining the confidence interval, which was calculated using the bootstrap re-sampling technique. The results show that the classification models can be used to complement classical analysis and as screening methods. Copyright © 2015 Elsevier Ltd. All rights reserved.

  9. Nearest clusters based partial least squares discriminant analysis for the classification of spectral data.

    PubMed

    Song, Weiran; Wang, Hui; Maguire, Paul; Nibouche, Omar

    2018-06-07

    Partial Least Squares Discriminant Analysis (PLS-DA) is one of the most effective multivariate analysis methods for spectral data analysis, which extracts latent variables and uses them to predict responses. In particular, it is an effective method for handling high-dimensional and collinear spectral data. However, PLS-DA does not explicitly address data multimodality, i.e., within-class multimodal distribution of data. In this paper, we present a novel method termed nearest clusters based PLS-DA (NCPLS-DA) for addressing the multimodality and nonlinearity issues explicitly and improving the performance of PLS-DA on spectral data classification. The new method applies hierarchical clustering to divide samples into clusters and calculates the corresponding centre of every cluster. For a given query point, only clusters whose centres are nearest to such a query point are used for PLS-DA. Such a method can provide a simple and effective tool for separating multimodal and nonlinear classes into clusters which are locally linear and unimodal. Experimental results on 17 datasets, including 12 UCI and 5 spectral datasets, show that NCPLS-DA can outperform 4 baseline methods, namely, PLS-DA, kernel PLS-DA, local PLS-DA and k-NN, achieving the highest classification accuracy most of the time. Copyright © 2018 Elsevier B.V. All rights reserved.

  10. Multivariate analysis of the volatile components in tobacco based on infrared-assisted extraction coupled to headspace solid-phase microextraction and gas chromatography-mass spectrometry.

    PubMed

    Yang, Yanqin; Pan, Yuanjiang; Zhou, Guojun; Chu, Guohai; Jiang, Jian; Yuan, Kailong; Xia, Qian; Cheng, Changhe

    2016-11-01

    A novel infrared-assisted extraction coupled to headspace solid-phase microextraction followed by gas chromatography with mass spectrometry method has been developed for the rapid determination of the volatile components in tobacco. The optimal extraction conditions for maximizing the extraction efficiency were as follows: 65 μm polydimethylsiloxane-divinylbenzene fiber, extraction time of 20 min, infrared power of 175 W, and distance between the infrared lamp and the headspace vial of 2 cm. Under the optimum conditions, 50 components were found to exist in all ten tobacco samples from different geographical origins. Compared with conventional water-bath heating and nonheating extraction methods, the extraction efficiency of infrared-assisted extraction was greatly improved. Furthermore, multivariate analysis including principal component analysis, hierarchical cluster analysis, and similarity analysis were performed to evaluate the chemical information of these samples and divided them into three classifications, including rich, moderate, and fresh flavors. The above-mentioned classification results were consistent with the sensory evaluation, which was pivotal and meaningful for tobacco discrimination. As a simple, fast, cost-effective, and highly efficient method, the infrared-assisted extraction coupled to headspace solid-phase microextraction technique is powerful and promising for distinguishing the geographical origins of the tobacco samples coupled to suitable chemometrics. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  11. Incremental Validity of Multidimensional Proficiency Scores from Diagnostic Classification Models: An Illustration for Elementary School Mathematics

    ERIC Educational Resources Information Center

    Kunina-Habenicht, Olga; Rupp, André A.; Wilhelm, Oliver

    2017-01-01

    Diagnostic classification models (DCMs) hold great potential for applications in summative and formative assessment by providing discrete multivariate proficiency scores that yield statistically driven classifications of students. Using data from a newly developed diagnostic arithmetic assessment that was administered to 2032 fourth-grade students…

  12. Unique Characteristics of Diagnostic Classification Models: A Comprehensive Review of the Current State-of-the-Art

    ERIC Educational Resources Information Center

    Rupp, Andre A.; Templin, Jonathan L.

    2008-01-01

    "Diagnostic classification models" (DCM) are frequently promoted by psychometricians as important modelling alternatives for analyzing response data in situations where multivariate classifications of respondents are made on the basis of multiple postulated latent skills. In this review paper, a definitional boundary of the space of DCM…

  13. Determination of fragrance content in perfume by Raman spectroscopy and multivariate calibration.

    PubMed

    Godinho, Robson B; Santos, Mauricio C; Poppi, Ronei J

    2016-03-15

    An alternative methodology is herein proposed for determination of fragrance content in perfumes and their classification according to the guidelines established by fine perfume manufacturers. The methodology is based on Raman spectroscopy associated with multivariate calibration, allowing the determination of fragrance content in a fast, nondestructive, and sustainable manner. The results were considered consistent with the conventional method, whose standard error of prediction values was lower than the 1.0%. This result indicates that the proposed technology is a feasible analytical tool for determination of the fragrance content in a hydro-alcoholic solution for use in manufacturing, quality control and regulatory agencies. Copyright © 2015 Elsevier B.V. All rights reserved.

  14. Geomorphic Classification and Evaluation of Channel Width and Emergent Sandbar Habitat Relations on the Lower Platte River, Nebraska

    USGS Publications Warehouse

    Elliott, Caroline M.

    2011-01-01

    This report presents a summary of geomorphic characteristics extracted from aerial imagery for three broad segments of the Lower Platte River. This report includes a summary of the longitudinal multivariate classification in Elliott and others (2009) and presents a new analysis of total channel width and habitat variables. Three segments on the lower 102.8 miles of the Lower Platte River are addressed in this report: the Loup River to the Elkhorn River (70 miles long), the Elkhorn River to Salt Creek (6.9 miles long), and Salt Creek to the Missouri River (25.9 miles long). The locations of these segments were determined by the locations of tributaries potentially significant to the hydrology or sediment supply of the Lower Platte River. This report summarizes channel characteristics as mapped from July 2006 aerial imagery including river width, valley width, channel curvature, and in-channel habitat features. In-channel habitat measurements were not made under consistent hydrologic conditions and must be considered general estimates of channel condition in late July 2006. Longitudinal patterns in these features are explored and are summarized in the context of the longitudinal multivariate classification in Elliott and others (2009) for the three Lower Platte River segments. Detailed descriptions of data collection and classification methods are described in Elliott and others (2009). Nesting data for the endangered interior least tern (Sternula antillarum) and threatened piping plover (Charadrius melodus) from 2006 through 2009 are examined within the context of the multivariate classification and Lower Platte River segments. The widest reaches of the Lower Platte River are located in the segment downstream from the Loup River to the Elkhorn River. This segment also has the widest valley and highest degree of braiding of the three segments and many large vegetated islands. The short segment of river between the Elkhorn River and Salt Creek has a fairly low valley width and high channel sinuosities at larger scales. The segment from Salt Creek to the Missouri River has narrow valleys and generally low channel sinuosity. Tern and plover nest sites from 2006 through 2009 in the multi-scale multivariate classification indicated relative nesting selection of cluster 2 reaches among the four-cluster classification and reaches containing clusters 2, 3, and 6 from the seven-cluster classification. These classes, with the exception of cluster 6 are common downstream from the Elkhorn River. Trends in total channel width indicated that reaches dominated by dark vegetation (islands) are the widest on the Lower Platte River. Reaches with high percentages of dry sand and dry sand plus light vegetation were the narrowest reaches. This suggests that narrow channel reaches have sufficient transport capacity to maintain sandbars under recent (2006) flow regimes and are likely to be most amenable to maintaining tern and plover habitat in the Lower Platte River. Further investigations into the dynamics of emergent sandbar habitat and the effects of bank stabilization on in-channel habitats will require the collection and analysis of new data, particularly detailed elevation information and an assessment of existing bank stabilization structures.

  15. Comparing statistical and machine learning classifiers: alternatives for predictive modeling in human factors research.

    PubMed

    Carnahan, Brian; Meyer, Gérard; Kuntz, Lois-Ann

    2003-01-01

    Multivariate classification models play an increasingly important role in human factors research. In the past, these models have been based primarily on discriminant analysis and logistic regression. Models developed from machine learning research offer the human factors professional a viable alternative to these traditional statistical classification methods. To illustrate this point, two machine learning approaches--genetic programming and decision tree induction--were used to construct classification models designed to predict whether or not a student truck driver would pass his or her commercial driver license (CDL) examination. The models were developed and validated using the curriculum scores and CDL exam performances of 37 student truck drivers who had completed a 320-hr driver training course. Results indicated that the machine learning classification models were superior to discriminant analysis and logistic regression in terms of predictive accuracy. Actual or potential applications of this research include the creation of models that more accurately predict human performance outcomes.

  16. Nonparametric analysis of Minnesota spruce and aspen tree data and LANDSAT data

    NASA Technical Reports Server (NTRS)

    Scott, D. W.; Jee, R.

    1984-01-01

    The application of nonparametric methods in data-intensive problems faced by NASA is described. The theoretical development of efficient multivariate density estimators and the novel use of color graphics workstations are reviewed. The use of nonparametric density estimates for data representation and for Bayesian classification are described and illustrated. Progress in building a data analysis system in a workstation environment is reviewed and preliminary runs presented.

  17. Comparative study of different approaches for multivariate image analysis in HPTLC fingerprinting of natural products such as plant resin.

    PubMed

    Ristivojević, Petar; Trifković, Jelena; Vovk, Irena; Milojković-Opsenica, Dušanka

    2017-01-01

    Considering the introduction of phytochemical fingerprint analysis, as a method of screening the complex natural products for the presence of most bioactive compounds, use of chemometric classification methods, application of powerful scanning and image capturing and processing devices and algorithms, advancement in development of novel stationary phases as well as various separation modalities, high-performance thin-layer chromatography (HPTLC) fingerprinting is becoming attractive and fruitful field of separation science. Multivariate image analysis is crucial in the light of proper data acquisition. In a current study, different image processing procedures were studied and compared in detail on the example of HPTLC chromatograms of plant resins. In that sense, obtained variables such as gray intensities of pixels along the solvent front, peak area and mean values of peak were used as input data and compared to obtained best classification models. Important steps in image analysis, baseline removal, denoising, target peak alignment and normalization were pointed out. Numerical data set based on mean value of selected bands and intensities of pixels along the solvent front proved to be the most convenient for planar-chromatographic profiling, although required at least the basic knowledge on image processing methodology, and could be proposed for further investigation in HPLTC fingerprinting. Copyright © 2016 Elsevier B.V. All rights reserved.

  18. Rapid discrimination of sea buckthorn berries from different H. rhamnoides subspecies by multi-step IR spectroscopy coupled with multivariate data analysis

    NASA Astrophysics Data System (ADS)

    Liu, Yue; Zhang, Ying; Zhang, Jing; Fan, Gang; Tu, Ya; Sun, Suqin; Shen, Xudong; Li, Qingzhu; Zhang, Yi

    2018-03-01

    As an important ethnic medicine, sea buckthorn was widely used to prevent and treat various diseases due to its nutritional and medicinal properties. According to the Chinese Pharmacopoeia, sea buckthorn was originated from H. rhamnoides, which includes five subspecies distributed in China. Confusion and misidentification usually occurred due to their similar morphology, especially in dried and powdered forms. Additionally, these five subspecies have vital differences in quality and physiological efficacy. This paper focused on the quick classification and identification method of sea buckthorn berry powders from five H. rhamnoides subspecies using multi-step IR spectroscopy coupled with multivariate data analysis. The holistic chemical compositions revealed by the FT-IR spectra demonstrated that flavonoids, fatty acids and sugars were the main chemical components. Further, the differences in FT-IR spectra regarding their peaks, positions and intensities were used to identify H. rhamnoides subspecies samples. The discrimination was achieved using principal component analysis (PCA) and partial least square-discriminant analysis (PLS-DA). The results showed that the combination of multi-step IR spectroscopy and chemometric analysis offered a simple, fast and reliable method for the classification and identification of the sea buckthorn berry powders from different H. rhamnoides subspecies.

  19. Learning a Mahalanobis Distance-Based Dynamic Time Warping Measure for Multivariate Time Series Classification.

    PubMed

    Mei, Jiangyuan; Liu, Meizhu; Wang, Yuan-Fang; Gao, Huijun

    2016-06-01

    Multivariate time series (MTS) datasets broadly exist in numerous fields, including health care, multimedia, finance, and biometrics. How to classify MTS accurately has become a hot research topic since it is an important element in many computer vision and pattern recognition applications. In this paper, we propose a Mahalanobis distance-based dynamic time warping (DTW) measure for MTS classification. The Mahalanobis distance builds an accurate relationship between each variable and its corresponding category. It is utilized to calculate the local distance between vectors in MTS. Then we use DTW to align those MTS which are out of synchronization or with different lengths. After that, how to learn an accurate Mahalanobis distance function becomes another key problem. This paper establishes a LogDet divergence-based metric learning with triplet constraint model which can learn Mahalanobis matrix with high precision and robustness. Furthermore, the proposed method is applied on nine MTS datasets selected from the University of California, Irvine machine learning repository and Robert T. Olszewski's homepage, and the results demonstrate the improved performance of the proposed approach.

  20. Classification of bladder cancer cell lines using Raman spectroscopy: a comparison of excitation wavelength, sample substrate and statistical algorithms

    NASA Astrophysics Data System (ADS)

    Kerr, Laura T.; Adams, Aine; O'Dea, Shirley; Domijan, Katarina; Cullen, Ivor; Hennelly, Bryan M.

    2014-05-01

    Raman microspectroscopy can be applied to the urinary bladder for highly accurate classification and diagnosis of bladder cancer. This technique can be applied in vitro to bladder epithelial cells obtained from urine cytology or in vivo as an optical biopsy" to provide results in real-time with higher sensitivity and specificity than current clinical methods. However, there exists a high degree of variability across experimental parameters which need to be standardised before this technique can be utilized in an everyday clinical environment. In this study, we investigate different laser wavelengths (473 nm and 532 nm), sample substrates (glass, fused silica and calcium fluoride) and multivariate statistical methods in order to gain insight into how these various experimental parameters impact on the sensitivity and specificity of Raman cytology.

  1. Multivariate data analysis and machine learning in Alzheimer's disease with a focus on structural magnetic resonance imaging.

    PubMed

    Falahati, Farshad; Westman, Eric; Simmons, Andrew

    2014-01-01

    Machine learning algorithms and multivariate data analysis methods have been widely utilized in the field of Alzheimer's disease (AD) research in recent years. Advances in medical imaging and medical image analysis have provided a means to generate and extract valuable neuroimaging information. Automatic classification techniques provide tools to analyze this information and observe inherent disease-related patterns in the data. In particular, these classifiers have been used to discriminate AD patients from healthy control subjects and to predict conversion from mild cognitive impairment to AD. In this paper, recent studies are reviewed that have used machine learning and multivariate analysis in the field of AD research. The main focus is on studies that used structural magnetic resonance imaging (MRI), but studies that included positron emission tomography and cerebrospinal fluid biomarkers in addition to MRI are also considered. A wide variety of materials and methods has been employed in different studies, resulting in a range of different outcomes. Influential factors such as classifiers, feature extraction algorithms, feature selection methods, validation approaches, and cohort properties are reviewed, as well as key MRI-based and multi-modal based studies. Current and future trends are discussed.

  2. Development of an ecological classification system for the Wayne National Forest

    Treesearch

    David M. Hix; Andrea M. Chech

    1993-01-01

    In 1991, a collaborative research project was initiated to create an ecological classification system for the Wayne National Forest of southeastern Ohio. The work focuses on the ecological land type (ELT) level of ecosystem classification. The most common ELTs are being identified and described using information from intensive field sampling and multivariate data...

  3. Discrimination of edible oils and fats by combination of multivariate pattern recognition and FT-IR spectroscopy: A comparative study between different modeling methods

    NASA Astrophysics Data System (ADS)

    Javidnia, Katayoun; Parish, Maryam; Karimi, Sadegh; Hemmateenejad, Bahram

    2013-03-01

    By using FT-IR spectroscopy, many researchers from different disciplines enrich the experimental complexity of their research for obtaining more precise information. Moreover chemometrics techniques have boosted the use of IR instruments. In the present study we aimed to emphasize on the power of FT-IR spectroscopy for discrimination between different oil samples (especially fat from vegetable oils). Also our data were used to compare the performance of different classification methods. FT-IR transmittance spectra of oil samples (Corn, Colona, Sunflower, Soya, Olive, and Butter) were measured in the wave-number interval of 450-4000 cm-1. Classification analysis was performed utilizing PLS-DA, interval PLS-DA, extended canonical variate analysis (ECVA) and interval ECVA methods. The effect of data preprocessing by extended multiplicative signal correction was investigated. Whilst all employed method could distinguish butter from vegetable oils, iECVA resulted in the best performances for calibration and external test set with 100% sensitivity and specificity.

  4. Exploration of computational methods for classification of movement intention during human voluntary movement from single trial EEG.

    PubMed

    Bai, Ou; Lin, Peter; Vorbach, Sherry; Li, Jiang; Furlani, Steve; Hallett, Mark

    2007-12-01

    To explore effective combinations of computational methods for the prediction of movement intention preceding the production of self-paced right and left hand movements from single trial scalp electroencephalogram (EEG). Twelve naïve subjects performed self-paced movements consisting of three key strokes with either hand. EEG was recorded from 128 channels. The exploration was performed offline on single trial EEG data. We proposed that a successful computational procedure for classification would consist of spatial filtering, temporal filtering, feature selection, and pattern classification. A systematic investigation was performed with combinations of spatial filtering using principal component analysis (PCA), independent component analysis (ICA), common spatial patterns analysis (CSP), and surface Laplacian derivation (SLD); temporal filtering using power spectral density estimation (PSD) and discrete wavelet transform (DWT); pattern classification using linear Mahalanobis distance classifier (LMD), quadratic Mahalanobis distance classifier (QMD), Bayesian classifier (BSC), multi-layer perceptron neural network (MLP), probabilistic neural network (PNN), and support vector machine (SVM). A robust multivariate feature selection strategy using a genetic algorithm was employed. The combinations of spatial filtering using ICA and SLD, temporal filtering using PSD and DWT, and classification methods using LMD, QMD, BSC and SVM provided higher performance than those of other combinations. Utilizing one of the better combinations of ICA, PSD and SVM, the discrimination accuracy was as high as 75%. Further feature analysis showed that beta band EEG activity of the channels over right sensorimotor cortex was most appropriate for discrimination of right and left hand movement intention. Effective combinations of computational methods provide possible classification of human movement intention from single trial EEG. Such a method could be the basis for a potential brain-computer interface based on human natural movement, which might reduce the requirement of long-term training. Effective combinations of computational methods can classify human movement intention from single trial EEG with reasonable accuracy.

  5. Elimination of RF inhomogeneity effects in segmentation.

    PubMed

    Agus, Onur; Ozkan, Mehmed; Aydin, Kubilay

    2007-01-01

    There are various methods proposed for the segmentation and analysis of MR images. However the efficiency of these techniques is effected by various artifacts that occur in the imaging system. One of the most encountered problems is the intensity variation across an image. To overcome this problem different methods are used. In this paper we propose a method for the elimination of intensity artifacts in segmentation of MRI images. Inter imager variations are also minimized to produce the same tissue segmentation for the same patient. A well-known multivariate classification algorithm, maximum likelihood is employed to illustrate the enhancement in segmentation.

  6. Fatty acid methyl ester analysis to identify sources of soil in surface water.

    PubMed

    Banowetz, Gary M; Whittaker, Gerald W; Dierksen, Karen P; Azevedo, Mark D; Kennedy, Ann C; Griffith, Stephen M; Steiner, Jeffrey J

    2006-01-01

    Efforts to improve land-use practices to prevent contamination of surface waters with soil are limited by an inability to identify the primary sources of soil present in these waters. We evaluated the utility of fatty acid methyl ester (FAME) profiles of dry reference soils for multivariate statistical classification of soils collected from surface waters adjacent to agricultural production fields and a wooded riparian zone. Trials that compared approaches to concentrate soil from surface water showed that aluminum sulfate precipitation provided comparable yields to that obtained by vacuum filtration and was more suitable for handling large numbers of samples. Fatty acid methyl ester profiles were developed from reference soils collected from contrasting land uses in different seasons to determine whether specific fatty acids would consistently serve as variables in multivariate statistical analyses to permit reliable classification of soils. We used a Bayesian method and an independent iterative process to select appropriate fatty acids and found that variable selection was strongly impacted by the season during which soil was collected. The apparent seasonal variation in the occurrence of marker fatty acids in FAME profiles from reference soils prevented preparation of a standardized set of variables. Nevertheless, accurate classification of soil in surface water was achieved utilizing fatty acid variables identified in seasonally matched reference soils. Correlation analysis of entire chromatograms and subsequent discriminant analyses utilizing a restricted number of fatty acid variables showed that FAME profiles of soils exposed to the aquatic environment still had utility for classification at least 1 wk after submersion.

  7. A Temporal Pattern Mining Approach for Classifying Electronic Health Record Data

    PubMed Central

    Batal, Iyad; Valizadegan, Hamed; Cooper, Gregory F.; Hauskrecht, Milos

    2013-01-01

    We study the problem of learning classification models from complex multivariate temporal data encountered in electronic health record systems. The challenge is to define a good set of features that are able to represent well the temporal aspect of the data. Our method relies on temporal abstractions and temporal pattern mining to extract the classification features. Temporal pattern mining usually returns a large number of temporal patterns, most of which may be irrelevant to the classification task. To address this problem, we present the Minimal Predictive Temporal Patterns framework to generate a small set of predictive and non-spurious patterns. We apply our approach to the real-world clinical task of predicting patients who are at risk of developing heparin induced thrombocytopenia. The results demonstrate the benefit of our approach in efficiently learning accurate classifiers, which is a key step for developing intelligent clinical monitoring systems. PMID:25309815

  8. Photoacoustic discrimination of vascular and pigmented lesions using classical and Bayesian methods

    NASA Astrophysics Data System (ADS)

    Swearingen, Jennifer A.; Holan, Scott H.; Feldman, Mary M.; Viator, John A.

    2010-01-01

    Discrimination of pigmented and vascular lesions in skin can be difficult due to factors such as size, subungual location, and the nature of lesions containing both melanin and vascularity. Misdiagnosis may lead to precancerous or cancerous lesions not receiving proper medical care. To aid in the rapid and accurate diagnosis of such pathologies, we develop a photoacoustic system to determine the nature of skin lesions in vivo. By irradiating skin with two laser wavelengths, 422 and 530 nm, we induce photoacoustic responses, and the relative response at these two wavelengths indicates whether the lesion is pigmented or vascular. This response is due to the distinct absorption spectrum of melanin and hemoglobin. In particular, pigmented lesions have ratios of photoacoustic amplitudes of approximately 1.4 to 1 at the two wavelengths, while vascular lesions have ratios of about 4.0 to 1. Furthermore, we consider two statistical methods for conducting classification of lesions: standard multivariate analysis classification techniques and a Bayesian-model-based approach. We study 15 human subjects with eight vascular and seven pigmented lesions. Using the classical method, we achieve a perfect classification rate, while the Bayesian approach has an error rate of 20%.

  9. Multivariate analysis of fMRI time series: classification and regression of brain responses using machine learning.

    PubMed

    Formisano, Elia; De Martino, Federico; Valente, Giancarlo

    2008-09-01

    Machine learning and pattern recognition techniques are being increasingly employed in functional magnetic resonance imaging (fMRI) data analysis. By taking into account the full spatial pattern of brain activity measured simultaneously at many locations, these methods allow detecting subtle, non-strictly localized effects that may remain invisible to the conventional analysis with univariate statistical methods. In typical fMRI applications, pattern recognition algorithms "learn" a functional relationship between brain response patterns and a perceptual, cognitive or behavioral state of a subject expressed in terms of a label, which may assume discrete (classification) or continuous (regression) values. This learned functional relationship is then used to predict the unseen labels from a new data set ("brain reading"). In this article, we describe the mathematical foundations of machine learning applications in fMRI. We focus on two methods, support vector machines and relevance vector machines, which are respectively suited for the classification and regression of fMRI patterns. Furthermore, by means of several examples and applications, we illustrate and discuss the methodological challenges of using machine learning algorithms in the context of fMRI data analysis.

  10. Towards exaggerated emphysema stereotypes

    NASA Astrophysics Data System (ADS)

    Chen, C.; Sørensen, L.; Lauze, F.; Igel, C.; Loog, M.; Feragen, A.; de Bruijne, M.; Nielsen, M.

    2012-03-01

    Classification is widely used in the context of medical image analysis and in order to illustrate the mechanism of a classifier, we introduce the notion of an exaggerated image stereotype based on training data and trained classifier. The stereotype of some image class of interest should emphasize/exaggerate the characteristic patterns in an image class and visualize the information the employed classifier relies on. This is useful for gaining insight into the classification and serves for comparison with the biological models of disease. In this work, we build exaggerated image stereotypes by optimizing an objective function which consists of a discriminative term based on the classification accuracy, and a generative term based on the class distributions. A gradient descent method based on iterated conditional modes (ICM) is employed for optimization. We use this idea with Fisher's linear discriminant rule and assume a multivariate normal distribution for samples within a class. The proposed framework is applied to computed tomography (CT) images of lung tissue with emphysema. The synthesized stereotypes illustrate the exaggerated patterns of lung tissue with emphysema, which is underpinned by three different quantitative evaluation methods.

  11. Portable XRF and principal component analysis for bill characterization in forensic science.

    PubMed

    Appoloni, C R; Melquiades, F L

    2014-02-01

    Several modern techniques have been applied to prevent counterfeiting of money bills. The objective of this study was to demonstrate the potential of Portable X-ray Fluorescence (PXRF) technique and the multivariate analysis method of Principal Component Analysis (PCA) for classification of bills in order to use it in forensic science. Bills of Dollar, Euro and Real (Brazilian currency) were measured directly at different colored regions, without any previous preparation. Spectra interpretation allowed the identification of Ca, Ti, Fe, Cu, Sr, Y, Zr and Pb. PCA analysis separated the bills in three groups and subgroups among Brazilian currency. In conclusion, the samples were classified according to its origin identifying the elements responsible for differentiation and basic pigment composition. PXRF allied to multivariate discriminate methods is a promising technique for rapid and no destructive identification of false bills in forensic science. Copyright © 2013 Elsevier Ltd. All rights reserved.

  12. Improving the analysis of near-spectroscopy data with multivariate classification of hemodynamic patterns: a theoretical formulation and validation.

    PubMed

    Gemignani, Jessica; Middell, Eike; Barbour, Randall L; Graber, Harry L; Blankertz, Benjamin

    2018-04-04

    The statistical analysis of functional near infrared spectroscopy (fNIRS) data based on the general linear model (GLM) is often made difficult by serial correlations, high inter-subject variability of the hemodynamic response, and the presence of motion artifacts. In this work we propose to extract information on the pattern of hemodynamic activations without using any a priori model for the data, by classifying the channels as 'active' or 'not active' with a multivariate classifier based on linear discriminant analysis (LDA). This work is developed in two steps. First we compared the performance of the two analyses, using a synthetic approach in which simulated hemodynamic activations were combined with either simulated or real resting-state fNIRS data. This procedure allowed for exact quantification of the classification accuracies of GLM and LDA. In the case of real resting-state data, the correlations between classification accuracy and demographic characteristics were investigated by means of a Linear Mixed Model. In the second step, to further characterize the reliability of the newly proposed analysis method, we conducted an experiment in which participants had to perform a simple motor task and data were analyzed with the LDA-based classifier as well as with the standard GLM analysis. The results of the simulation study show that the LDA-based method achieves higher classification accuracies than the GLM analysis, and that the LDA results are more uniform across different subjects and, in contrast to the accuracies achieved by the GLM analysis, have no significant correlations with any of the demographic characteristics. Findings from the real-data experiment are consistent with the results of the real-plus-simulation study, in that the GLM-analysis results show greater inter-subject variability than do the corresponding LDA results. The results obtained suggest that the outcome of GLM analysis is highly vulnerable to violations of theoretical assumptions, and that therefore a data-driven approach such as that provided by the proposed LDA-based method is to be favored.

  13. Unsupervised classification of multivariate geostatistical data: Two algorithms

    NASA Astrophysics Data System (ADS)

    Romary, Thomas; Ors, Fabien; Rivoirard, Jacques; Deraisme, Jacques

    2015-12-01

    With the increasing development of remote sensing platforms and the evolution of sampling facilities in mining and oil industry, spatial datasets are becoming increasingly large, inform a growing number of variables and cover wider and wider areas. Therefore, it is often necessary to split the domain of study to account for radically different behaviors of the natural phenomenon over the domain and to simplify the subsequent modeling step. The definition of these areas can be seen as a problem of unsupervised classification, or clustering, where we try to divide the domain into homogeneous domains with respect to the values taken by the variables in hand. The application of classical clustering methods, designed for independent observations, does not ensure the spatial coherence of the resulting classes. Image segmentation methods, based on e.g. Markov random fields, are not adapted to irregularly sampled data. Other existing approaches, based on mixtures of Gaussian random functions estimated via the expectation-maximization algorithm, are limited to reasonable sample sizes and a small number of variables. In this work, we propose two algorithms based on adaptations of classical algorithms to multivariate geostatistical data. Both algorithms are model free and can handle large volumes of multivariate, irregularly spaced data. The first one proceeds by agglomerative hierarchical clustering. The spatial coherence is ensured by a proximity condition imposed for two clusters to merge. This proximity condition relies on a graph organizing the data in the coordinates space. The hierarchical algorithm can then be seen as a graph-partitioning algorithm. Following this interpretation, a spatial version of the spectral clustering algorithm is also proposed. The performances of both algorithms are assessed on toy examples and a mining dataset.

  14. Influence of microclimatic ammonia levels on productive performance of different broilers’ breeds estimated with univariate and multivariate approaches

    PubMed Central

    Soliman, Essam S.; Moawed, Sherif A.; Hassan, Rania A.

    2017-01-01

    Background and Aim: Birds litter contains unutilized nitrogen in the form of uric acid that is converted into ammonia; a fact that does not only affect poultry performance but also has a negative effect on people’s health around the farm and contributes in the environmental degradation. The influence of microclimatic ammonia emissions on Ross and Hubbard broilers reared in different housing systems at two consecutive seasons (fall and winter) was evaluated using a discriminant function analysis to differentiate between Ross and Hubbard breeds. Materials and Methods: A total number of 400 air samples were collected and analyzed for ammonia levels during the experimental period. Data were analyzed using univariate and multivariate statistical methods. Results: Ammonia levels were significantly higher (p< 0.01) in the Ross compared to the Hubbard breed farm, although no significant differences (p>0.05) were found between the two farms in body weight, body weight gain, feed intake, feed conversion ratio, and performance index (PI) of broilers. Body weight; weight gain and PI had increased values (p< 0.01) during fall compared to winter irrespective of broiler breed. Ammonia emissions were positively (although weekly) correlated with the ambient relative humidity (r=0.383; p< 0.01), but not with the ambient temperature (r=−0.045; p>0.05). Test of significance of discriminant function analysis did not show a classification based on the studied traits suggesting that they cannot been used as predictor variables. The percentage of correct classification was 52% and it was improved after deletion of highly correlated traits to 57%. Conclusion: The study revealed that broiler’s growth was negatively affected by increased microclimatic ammonia concentrations and recommended the analysis of broilers’ growth performance parameters data using multivariate discriminant function analysis. PMID:28919677

  15. Classification of the medicinal plants of the genus Atractylodes using high-performance liquid chromatography with diode array and tandem mass spectrometry detection combined with multivariate statistical analysis.

    PubMed

    Cho, Hyun-Deok; Kim, Unyong; Suh, Joon Hyuk; Eom, Han Young; Kim, Junghyun; Lee, Seul Gi; Choi, Yong Seok; Han, Sang Beom

    2016-04-01

    Analytical methods using high-performance liquid chromatography with diode array and tandem mass spectrometry detection were developed for the discrimination of the rhizomes of four Atractylodes medicinal plants: A. japonica, A. macrocephala, A. chinensis, and A. lancea. A quantitative study was performed, selecting five bioactive components, including atractylenolide I, II, III, eudesma-4(14),7(11)-dien-8-one and atractylodin, on twenty-six Atractylodes samples of various origins. Sample extraction was optimized to sonication with 80% methanol for 40 min at room temperature. High-performance liquid chromatography with diode array detection was established using a C18 column with a water/acetonitrile gradient system at a flow rate of 1.0 mL/min, and the detection wavelength was set at 236 nm. Liquid chromatography with tandem mass spectrometry was applied to certify the reliability of the quantitative results. The developed methods were validated by ensuring specificity, linearity, limit of quantification, accuracy, precision, recovery, robustness, and stability. Results showed that cangzhu contained higher amounts of atractylenolide I and atractylodin than baizhu, and especially atractylodin contents showed the greatest variation between baizhu and cangzhu. Multivariate statistical analysis, such as principal component analysis and hierarchical cluster analysis, were also employed for further classification of the Atractylodes plants. The established method was suitable for quality control of the Atractylodes plants. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  16. Clustering of the human skeletal muscle fibers using linear programming and angular Hilbertian metrics.

    PubMed

    Neji, Radhouène; Besbes, Ahmed; Komodakis, Nikos; Deux, Jean-François; Maatouk, Mezri; Rahmouni, Alain; Bassez, Guillaume; Fleury, Gilles; Paragios, Nikos

    2009-01-01

    In this paper, we present a manifold clustering method fo the classification of fibers obtained from diffusion tensor images (DTI) of the human skeletal muscle. Using a linear programming formulation of prototype-based clustering, we propose a novel fiber classification algorithm over manifolds that circumvents the necessity to embed the data in low dimensional spaces and determines automatically the number of clusters. Furthermore, we propose the use of angular Hilbertian metrics between multivariate normal distributions to define a family of distances between tensors that we generalize to fibers. These metrics are used to approximate the geodesic distances over the fiber manifold. We also discuss the case where only geodesic distances to a reduced set of landmark fibers are available. The experimental validation of the method is done using a manually annotated significant dataset of DTI of the calf muscle for healthy and diseased subjects.

  17. Simultaneous Force Regression and Movement Classification of Fingers via Surface EMG within a Unified Bayesian Framework.

    PubMed

    Baldacchino, Tara; Jacobs, William R; Anderson, Sean R; Worden, Keith; Rowson, Jennifer

    2018-01-01

    This contribution presents a novel methodology for myolectric-based control using surface electromyographic (sEMG) signals recorded during finger movements. A multivariate Bayesian mixture of experts (MoE) model is introduced which provides a powerful method for modeling force regression at the fingertips, while also performing finger movement classification as a by-product of the modeling algorithm. Bayesian inference of the model allows uncertainties to be naturally incorporated into the model structure. This method is tested using data from the publicly released NinaPro database which consists of sEMG recordings for 6 degree-of-freedom force activations for 40 intact subjects. The results demonstrate that the MoE model achieves similar performance compared to the benchmark set by the authors of NinaPro for finger force regression. Additionally, inherent to the Bayesian framework is the inclusion of uncertainty in the model parameters, naturally providing confidence bounds on the force regression predictions. Furthermore, the integrated clustering step allows a detailed investigation into classification of the finger movements, without incurring any extra computational effort. Subsequently, a systematic approach to assessing the importance of the number of electrodes needed for accurate control is performed via sensitivity analysis techniques. A slight degradation in regression performance is observed for a reduced number of electrodes, while classification performance is unaffected.

  18. ASTM clustering for improving coal analysis by near-infrared spectroscopy.

    PubMed

    Andrés, J M; Bona, M T

    2006-11-15

    Multivariate analysis techniques have been applied to near-infrared (NIR) spectra coals to investigate the relationship between nine coal properties (moisture (%), ash (%), volatile matter (%), fixed carbon (%), heating value (kcal/kg), carbon (%), hydrogen (%), nitrogen (%) and sulphur (%)) and the corresponding predictor variables. In this work, a whole set of coal samples was grouped into six more homogeneous clusters following the ASTM reference method for classification prior to the application of calibration methods to each coal set. The results obtained showed a considerable improvement of the error determination compared with the calibration for the whole sample set. For some groups, the established calibrations approached the quality required by the ASTM/ISO norms for laboratory analysis. To predict property values for a new coal sample it is necessary the assignation of that sample to its respective group. Thus, the discrimination and classification ability of coal samples by Diffuse Reflectance Infrared Fourier Transform Spectroscopy (DRIFTS) in the NIR range was also studied by applying Soft Independent Modelling of Class Analogy (SIMCA) and Linear Discriminant Analysis (LDA) techniques. Modelling of the groups by SIMCA led to overlapping models that cannot discriminate for unique classification. On the other hand, the application of Linear Discriminant Analysis improved the classification of the samples but not enough to be satisfactory for every group considered.

  19. Real-time Neuroimaging and Cognitive Monitoring Using Wearable Dry EEG

    PubMed Central

    Mullen, Tim R.; Kothe, Christian A.E.; Chi, Mike; Ojeda, Alejandro; Kerth, Trevor; Makeig, Scott; Jung, Tzyy-Ping; Cauwenberghs, Gert

    2015-01-01

    Goal We present and evaluate a wearable high-density dry electrode EEG system and an open-source software framework for online neuroimaging and state classification. Methods The system integrates a 64-channel dry EEG form-factor with wireless data streaming for online analysis. A real-time software framework is applied, including adaptive artifact rejection, cortical source localization, multivariate effective connectivity inference, data visualization, and cognitive state classification from connectivity features using a constrained logistic regression approach (ProxConn). We evaluate the system identification methods on simulated 64-channel EEG data. Then we evaluate system performance, using ProxConn and a benchmark ERP method, in classifying response errors in 9 subjects using the dry EEG system. Results Simulations yielded high accuracy (AUC=0.97±0.021) for real-time cortical connectivity estimation. Response error classification using cortical effective connectivity (sdDTF) was significantly above chance with similar performance (AUC) for cLORETA (0.74±0.09) and LCMV (0.72±0.08) source localization. Cortical ERP-based classification was equivalent to ProxConn for cLORETA (0.74±0.16) but significantly better for LCMV (0.82±0.12). Conclusion We demonstrated the feasibility for real-time cortical connectivity analysis and cognitive state classification from high-density wearable dry EEG. Significance This paper is the first validated application of these methods to 64-channel dry EEG. The work addresses a need for robust real-time measurement and interpretation of complex brain activity in the dynamic environment of the wearable setting. Such advances can have broad impact in research, medicine, and brain-computer interfaces. The pipelines are made freely available in the open-source SIFT and BCILAB toolboxes. PMID:26415149

  20. Investigating the sex-related geometric variation of the human cranium.

    PubMed

    Bertsatos, Andreas; Papageorgopoulou, Christina; Valakos, Efstratios; Chovalopoulou, Maria-Eleni

    2018-01-29

    Accurate sexing methods are of great importance in forensic anthropology since sex assessment is among the principal tasks when examining human skeletal remains. The present study explores a novel approach in assessing the most accurate metric traits of the human cranium for sex estimation based on 80 ectocranial landmarks from 176 modern individuals of known age and sex from the Athens Collection. The purpose of the study is to identify those distance and angle measurements that can be most effectively used in sex assessment. Three-dimensional landmark coordinates were digitized with a Microscribe 3DX and analyzed in GNU Octave. An iterative linear discriminant analysis of all possible combinations of landmarks was performed for each unique set of the 3160 distances and 246,480 angles. Cross-validated correct classification as well as multivariate DFA on top performing variables reported 13 craniometric distances with over 85% classification accuracy, 7 angles over 78%, as well as certain multivariate combinations yielding over 95%. Linear regression of these variables with the centroid size was used to assess their relation to the size of the cranium. In contrast to the use of generalized procrustes analysis (GPA) and principal component analysis (PCA), which constitute the common analytical work flow for such data, our method, although computational intensive, produced easily applicable discriminant functions of high accuracy, while at the same time explored the maximum of cranial variability.

  1. Selected-ion flow-tube mass-spectrometry (SIFT-MS) fingerprinting versus chemical profiling for geographic traceability of Moroccan Argan oils.

    PubMed

    Kharbach, Mourad; Kamal, Rabie; Mansouri, Mohammed Alaoui; Marmouzi, Ilias; Viaene, Johan; Cherrah, Yahia; Alaoui, Katim; Vercammen, Joeri; Bouklouze, Abdelaziz; Vander Heyden, Yvan

    2018-10-15

    This study investigated the effectiveness of SIFT-MS versus chemical profiling, both coupled to multivariate data analysis, to classify 95 Extra Virgin Argan Oils (EVAO), originating from five Moroccan Argan forest locations. The full scan option of SIFT-MS, is suitable to indicate the geographic origin of EVAO based on the fingerprints obtained using the three chemical ionization precursors (H 3 O + , NO + and O 2 + ). The chemical profiling (including acidity, peroxide value, spectrophotometric indices, fatty acids, tocopherols- and sterols composition) was also used for classification. Partial least squares discriminant analysis (PLS-DA), soft independent modeling of class analogy (SIMCA), K-nearest neighbors (KNN), and support vector machines (SVM), were compared. The SIFT-MS data were therefore fed to variable-selection methods to find potential biomarkers for classification. The classification models based either on chemical profiling or SIFT-MS data were able to classify the samples with high accuracy. SIFT-MS was found to be advantageous for rapid geographic classification. Copyright © 2018 Elsevier Ltd. All rights reserved.

  2. Comparative evaluation of spectroscopic models using different multivariate statistical tools in a multicancer scenario

    NASA Astrophysics Data System (ADS)

    Ghanate, A. D.; Kothiwale, S.; Singh, S. P.; Bertrand, Dominique; Krishna, C. Murali

    2011-02-01

    Cancer is now recognized as one of the major causes of morbidity and mortality. Histopathological diagnosis, the gold standard, is shown to be subjective, time consuming, prone to interobserver disagreement, and often fails to predict prognosis. Optical spectroscopic methods are being contemplated as adjuncts or alternatives to conventional cancer diagnostics. The most important aspect of these approaches is their objectivity, and multivariate statistical tools play a major role in realizing it. However, rigorous evaluation of the robustness of spectral models is a prerequisite. The utility of Raman spectroscopy in the diagnosis of cancers has been well established. Until now, the specificity and applicability of spectral models have been evaluated for specific cancer types. In this study, we have evaluated the utility of spectroscopic models representing normal and malignant tissues of the breast, cervix, colon, larynx, and oral cavity in a broader perspective, using different multivariate tests. The limit test, which was used in our earlier study, gave high sensitivity but suffered from poor specificity. The performance of other methods such as factorial discriminant analysis and partial least square discriminant analysis are at par with more complex nonlinear methods such as decision trees, but they provide very little information about the classification model. This comparative study thus demonstrates not just the efficacy of Raman spectroscopic models but also the applicability and limitations of different multivariate tools for discrimination under complex conditions such as the multicancer scenario.

  3. Estimating the Classification Efficiency of a Test Battery.

    ERIC Educational Resources Information Center

    De Corte, Wilfried

    2000-01-01

    Shows how a theorem proven by H. Brogden (1951, 1959) can be used to estimate the allocation average (a predictor based classification of a test battery) assuming that the predictor intercorrelations and validities are known and that the predictor variables have a joint multivariate normal distribution. (SLD)

  4. Multivariate decoding of brain images using ordinal regression.

    PubMed

    Doyle, O M; Ashburner, J; Zelaya, F O; Williams, S C R; Mehta, M A; Marquand, A F

    2013-11-01

    Neuroimaging data are increasingly being used to predict potential outcomes or groupings, such as clinical severity, drug dose response, and transitional illness states. In these examples, the variable (target) we want to predict is ordinal in nature. Conventional classification schemes assume that the targets are nominal and hence ignore their ranked nature, whereas parametric and/or non-parametric regression models enforce a metric notion of distance between classes. Here, we propose a novel, alternative multivariate approach that overcomes these limitations - whole brain probabilistic ordinal regression using a Gaussian process framework. We applied this technique to two data sets of pharmacological neuroimaging data from healthy volunteers. The first study was designed to investigate the effect of ketamine on brain activity and its subsequent modulation with two compounds - lamotrigine and risperidone. The second study investigates the effect of scopolamine on cerebral blood flow and its modulation using donepezil. We compared ordinal regression to multi-class classification schemes and metric regression. Considering the modulation of ketamine with lamotrigine, we found that ordinal regression significantly outperformed multi-class classification and metric regression in terms of accuracy and mean absolute error. However, for risperidone ordinal regression significantly outperformed metric regression but performed similarly to multi-class classification both in terms of accuracy and mean absolute error. For the scopolamine data set, ordinal regression was found to outperform both multi-class and metric regression techniques considering the regional cerebral blood flow in the anterior cingulate cortex. Ordinal regression was thus the only method that performed well in all cases. Our results indicate the potential of an ordinal regression approach for neuroimaging data while providing a fully probabilistic framework with elegant approaches for model selection. Copyright © 2013. Published by Elsevier Inc.

  5. Remodeling characteristics and collagen distribution in synthetic mesh materials explanted from human subjects after abdominal wall reconstruction: an analysis of remodeling characteristics by patient risk factors and surgical site classifications

    PubMed Central

    Cavallo, Jaime A.; Roma, Andres A.; Jasielec, Mateusz S.; Ousley, Jenny; Creamer, Jennifer; Pichert, Matthew D.; Baalman, Sara; Frisella, Margaret M.; Matthews, Brent D.

    2014-01-01

    Background The purpose of this study was to evaluate the associations between patient characteristics or surgical site classifications and the histologic remodeling scores of synthetic meshes biopsied from their abdominal wall repair sites in the first attempt to generate a multivariable risk prediction model of non-constructive remodeling. Methods Biopsies of the synthetic meshes were obtained from the abdominal wall repair sites of 51 patients during a subsequent abdominal re-exploration. Biopsies were stained with hematoxylin and eosin, and evaluated according to a semi-quantitative scoring system for remodeling characteristics (cell infiltration, cell types, extracellular matrix deposition, inflammation, fibrous encapsulation, and neovascularization) and a mean composite score (CR). Biopsies were also stained with Sirius Red and Fast Green, and analyzed to determine the collagen I:III ratio. Based on univariate analyses between subject clinical characteristics or surgical site classification and the histologic remodeling scores, cohort variables were selected for multivariable regression models using a threshold p value of ≤0.200. Results The model selection process for the extracellular matrix score yielded two variables: subject age at time of mesh implantation, and mesh classification (c-statistic = 0.842). For CR score, the model selection process yielded two variables: subject age at time of mesh implantation and mesh classification (r2 = 0.464). The model selection process for the collagen III area yielded a model with two variables: subject body mass index at time of mesh explantation and pack-year history (r2 = 0.244). Conclusion Host characteristics and surgical site assessments may predict degree of remodeling for synthetic meshes used to reinforce abdominal wall repair sites. These preliminary results constitute the first steps in generating a risk prediction model that predicts the patients and clinical circumstances for which non-constructive remodeling of an abdominal wall repair site with synthetic mesh reinforcement is most likely to occur. PMID:24442681

  6. Automated classification of single airborne particles from two-dimensional angle-resolved optical scattering (TAOS) patterns by non-linear filtering

    NASA Astrophysics Data System (ADS)

    Crosta, Giovanni Franco; Pan, Yong-Le; Aptowicz, Kevin B.; Casati, Caterina; Pinnick, Ronald G.; Chang, Richard K.; Videen, Gorden W.

    2013-12-01

    Measurement of two-dimensional angle-resolved optical scattering (TAOS) patterns is an attractive technique for detecting and characterizing micron-sized airborne particles. In general, the interpretation of these patterns and the retrieval of the particle refractive index, shape or size alone, are difficult problems. By reformulating the problem in statistical learning terms, a solution is proposed herewith: rather than identifying airborne particles from their scattering patterns, TAOS patterns themselves are classified through a learning machine, where feature extraction interacts with multivariate statistical analysis. Feature extraction relies on spectrum enhancement, which includes the discrete cosine FOURIER transform and non-linear operations. Multivariate statistical analysis includes computation of the principal components and supervised training, based on the maximization of a suitable figure of merit. All algorithms have been combined together to analyze TAOS patterns, organize feature vectors, design classification experiments, carry out supervised training, assign unknown patterns to classes, and fuse information from different training and recognition experiments. The algorithms have been tested on a data set with more than 3000 TAOS patterns. The parameters that control the algorithms at different stages have been allowed to vary within suitable bounds and are optimized to some extent. Classification has been targeted at discriminating aerosolized Bacillus subtilis particles, a simulant of anthrax, from atmospheric aerosol particles and interfering particles, like diesel soot. By assuming that all training and recognition patterns come from the respective reference materials only, the most satisfactory classification result corresponds to 20% false negatives from B. subtilis particles and <11% false positives from all other aerosol particles. The most effective operations have consisted of thresholding TAOS patterns in order to reject defective ones, and forming training sets from three or four pattern classes. The presented automated classification method may be adapted into a real-time operation technique, capable of detecting and characterizing micron-sized airborne particles.

  7. Periodontal inflamed surface area as a novel numerical variable describing periodontal conditions

    PubMed Central

    2017-01-01

    Purpose A novel index, the periodontal inflamed surface area (PISA), represents the sum of the periodontal pocket depth of bleeding on probing (BOP)-positive sites. In the present study, we evaluated correlations between PISA and periodontal classifications, and examined PISA as an index integrating the discrete conventional periodontal indexes. Methods This study was a cross-sectional subgroup analysis of data from a prospective cohort study investigating the association between chronic periodontitis and the clinical features of ankylosing spondylitis. Data from 84 patients without systemic diseases (the control group in the previous study) were analyzed in the present study. Results PISA values were positively correlated with conventional periodontal classifications (Spearman correlation coefficient=0.52; P<0.01) and with periodontal indexes, such as BOP and the plaque index (PI) (r=0.94; P<0.01 and r=0.60; P<0.01, respectively; Pearson correlation test). Porphyromonas gingivalis (P. gingivalis) expression and the presence of serum P. gingivalis antibodies were significant factors affecting PISA values in a simple linear regression analysis, together with periodontal classification, PI, bleeding index, and smoking, but not in the multivariate analysis. In the multivariate linear regression analysis, PISA values were positively correlated with the quantity of current smoking, PI, and severity of periodontal disease. Conclusions PISA integrates multiple periodontal indexes, such as probing pocket depth, BOP, and PI into a numerical variable. PISA is advantageous for quantifying periodontal inflammation and plaque accumulation. PMID:29093989

  8. Predicting clinical diagnosis in Huntington's disease: An imaging polymarker

    PubMed Central

    Daws, Richard E.; Soreq, Eyal; Johnson, Eileanoir B.; Scahill, Rachael I.; Tabrizi, Sarah J.; Barker, Roger A.; Hampshire, Adam

    2018-01-01

    Objective Huntington's disease (HD) gene carriers can be identified before clinical diagnosis; however, statistical models for predicting when overt motor symptoms will manifest are too imprecise to be useful at the level of the individual. Perfecting this prediction is integral to the search for disease modifying therapies. This study aimed to identify an imaging marker capable of reliably predicting real‐life clinical diagnosis in HD. Method A multivariate machine learning approach was applied to resting‐state and structural magnetic resonance imaging scans from 19 premanifest HD gene carriers (preHD, 8 of whom developed clinical disease in the 5 years postscanning) and 21 healthy controls. A classification model was developed using cross‐group comparisons between preHD and controls, and within the preHD group in relation to “estimated” and “actual” proximity to disease onset. Imaging measures were modeled individually, and combined, and permutation modeling robustly tested classification accuracy. Results Classification performance for preHDs versus controls was greatest when all measures were combined. The resulting polymarker predicted converters with high accuracy, including those who were not expected to manifest in that time scale based on the currently adopted statistical models. Interpretation We propose that a holistic multivariate machine learning treatment of brain abnormalities in the premanifest phase can be used to accurately identify those patients within 5 years of developing motor features of HD, with implications for prognostication and preclinical trials. Ann Neurol 2018;83:532–543 PMID:29405351

  9. Discrimination of edible oils and fats by combination of multivariate pattern recognition and FT-IR spectroscopy: a comparative study between different modeling methods.

    PubMed

    Javidnia, Katayoun; Parish, Maryam; Karimi, Sadegh; Hemmateenejad, Bahram

    2013-03-01

    By using FT-IR spectroscopy, many researchers from different disciplines enrich the experimental complexity of their research for obtaining more precise information. Moreover chemometrics techniques have boosted the use of IR instruments. In the present study we aimed to emphasize on the power of FT-IR spectroscopy for discrimination between different oil samples (especially fat from vegetable oils). Also our data were used to compare the performance of different classification methods. FT-IR transmittance spectra of oil samples (Corn, Colona, Sunflower, Soya, Olive, and Butter) were measured in the wave-number interval of 450-4000 cm(-1). Classification analysis was performed utilizing PLS-DA, interval PLS-DA, extended canonical variate analysis (ECVA) and interval ECVA methods. The effect of data preprocessing by extended multiplicative signal correction was investigated. Whilst all employed method could distinguish butter from vegetable oils, iECVA resulted in the best performances for calibration and external test set with 100% sensitivity and specificity. Copyright © 2012 Elsevier B.V. All rights reserved.

  10. Superiority of artificial neural networks for a genetic classification procedure.

    PubMed

    Sant'Anna, I C; Tomaz, R S; Silva, G N; Nascimento, M; Bhering, L L; Cruz, C D

    2015-08-19

    The correct classification of individuals is extremely important for the preservation of genetic variability and for maximization of yield in breeding programs using phenotypic traits and genetic markers. The Fisher and Anderson discriminant functions are commonly used multivariate statistical techniques for these situations, which allow for the allocation of an initially unknown individual to predefined groups. However, for higher levels of similarity, such as those found in backcrossed populations, these methods have proven to be inefficient. Recently, much research has been devoted to developing a new paradigm of computing known as artificial neural networks (ANNs), which can be used to solve many statistical problems, including classification problems. The aim of this study was to evaluate the feasibility of ANNs as an evaluation technique of genetic diversity by comparing their performance with that of traditional methods. The discriminant functions were equally ineffective in discriminating the populations, with error rates of 23-82%, thereby preventing the correct discrimination of individuals between populations. The ANN was effective in classifying populations with low and high differentiation, such as those derived from a genetic design established from backcrosses, even in cases of low differentiation of the data sets. The ANN appears to be a promising technique to solve classification problems, since the number of individuals classified incorrectly by the ANN was always lower than that of the discriminant functions. We envisage the potential relevant application of this improved procedure in the genomic classification of markers to distinguish between breeds and accessions.

  11. Penalized discriminant analysis for the detection of wild-grown and cultivated Ganoderma lucidum using Fourier transform infrared spectroscopy

    NASA Astrophysics Data System (ADS)

    Zhu, Ying; Tan, Tuck Lee

    2016-04-01

    An effective and simple analytical method using Fourier transform infrared (FTIR) spectroscopy to distinguish wild-grown high-quality Ganoderma lucidum (G. lucidum) from cultivated one is of essential importance for its quality assurance and medicinal value estimation. Commonly used chemical and analytical methods using full spectrum are not so effective for the detection and interpretation due to the complex system of the herbal medicine. In this study, two penalized discriminant analysis models, penalized linear discriminant analysis (PLDA) and elastic net (Elnet),using FTIR spectroscopy have been explored for the purpose of discrimination and interpretation. The classification performances of the two penalized models have been compared with two widely used multivariate methods, principal component discriminant analysis (PCDA) and partial least squares discriminant analysis (PLSDA). The Elnet model involving a combination of L1 and L2 norm penalties enabled an automatic selection of a small number of informative spectral absorption bands and gave an excellent classification accuracy of 99% for discrimination between spectra of wild-grown and cultivated G. lucidum. Its classification performance was superior to that of the PLDA model in a pure L1 setting and outperformed the PCDA and PLSDA models using full wavelength. The well-performed selection of informative spectral features leads to substantial reduction in model complexity and improvement of classification accuracy, and it is particularly helpful for the quantitative interpretations of the major chemical constituents of G. lucidum regarding its anti-cancer effects.

  12. Improved classification and visualization of healthy and pathological hard dental tissues by modeling specular reflections in NIR hyperspectral images

    NASA Astrophysics Data System (ADS)

    Usenik, Peter; Bürmen, Miran; Fidler, Aleš; Pernuš, Franjo; Likar, Boštjan

    2012-03-01

    Despite major improvements in dental healthcare and technology, dental caries remains one of the most prevalent chronic diseases of modern society. The initial stages of dental caries are characterized by demineralization of enamel crystals, commonly known as white spots, which are difficult to diagnose. Near-infrared (NIR) hyperspectral imaging is a new promising technique for early detection of demineralization which can classify healthy and pathological dental tissues. However, due to non-ideal illumination of the tooth surface the hyperspectral images can exhibit specular reflections, in particular around the edges and the ridges of the teeth. These reflections significantly affect the performance of automated classification and visualization methods. Cross polarized imaging setup can effectively remove the specular reflections, however is due to the complexity and other imaging setup limitations not always possible. In this paper, we propose an alternative approach based on modeling the specular reflections of hard dental tissues, which significantly improves the classification accuracy in the presence of specular reflections. The method was evaluated on five extracted human teeth with corresponding gold standard for 6 different healthy and pathological hard dental tissues including enamel, dentin, calculus, dentin caries, enamel caries and demineralized regions. Principal component analysis (PCA) was used for multivariate local modeling of healthy and pathological dental tissues. The classification was performed by employing multiple discriminant analysis. Based on the obtained results we believe the proposed method can be considered as an effective alternative to the complex cross polarized imaging setups.

  13. Authentication of Trappist beers by LC-MS fingerprints and multivariate data analysis.

    PubMed

    Mattarucchi, Elia; Stocchero, Matteo; Moreno-Rojas, José Manuel; Giordano, Giuseppe; Reniero, Fabiano; Guillou, Claude

    2010-12-08

    The aim of this study was to asses the applicability of LC-MS profiling to authenticate a selected Trappist beer as part of a program on traceability funded by the European Commission. A total of 232 beers were fingerprinted and classified through multivariate data analysis. The selected beer was clearly distinguished from beers of different brands, while only 3 samples (3.5% of the test set) were wrongly classified when compared with other types of beer of the same Trappist brewery. The fingerprints were further analyzed to extract the most discriminating variables, which proved to be sufficient for classification, even using a simplified unsupervised model. This reduced fingerprint allowed us to study the influence of batch-to-batch variability on the classification model. Our results can easily be applied to different matrices and they confirmed the effectiveness of LC-MS profiling in combination with multivariate data analysis for the characterization of food products.

  14. Gender, Race, and Survival: A Study in Non-Small-Cell Lung Cancer Brain Metastases Patients Utilizing the Radiation Therapy Oncology Group Recursive Partitioning Analysis Classification

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Videtic, Gregory M.M., E-mail: videtig@ccf.or; Reddy, Chandana A.; Chao, Samuel T.

    Purpose: To explore whether gender and race influence survival in non-small-cell lung cancer (NSCLC) in patients with brain metastases, using our large single-institution brain tumor database and the Radiation Therapy Oncology Group recursive partitioning analysis (RPA) brain metastases classification. Methods and materials: A retrospective review of a single-institution brain metastasis database for the interval January 1982 to September 2004 yielded 835 NSCLC patients with brain metastases for analysis. Patient subsets based on combinations of gender, race, and RPA class were then analyzed for survival differences. Results: Median follow-up was 5.4 months (range, 0-122.9 months). There were 485 male patients (M)more » (58.4%) and 346 female patients (F) (41.6%). Of the 828 evaluable patients (99%), 143 (17%) were black/African American (B) and 685 (83%) were white/Caucasian (W). Median survival time (MST) from time of brain metastasis diagnosis for all patients was 5.8 months. Median survival time by gender (F vs. M) and race (W vs. B) was 6.3 months vs. 5.5 months (p = 0.013) and 6.0 months vs. 5.2 months (p = 0.08), respectively. For patients stratified by RPA class, gender, and race, MST significantly favored BFs over BMs in Class II: 11.2 months vs. 4.6 months (p = 0.021). On multivariable analysis, significant variables were gender (p = 0.041, relative risk [RR] 0.83) and RPA class (p < 0.0001, RR 0.28 for I vs. III; p < 0.0001, RR 0.51 for II vs. III) but not race. Conclusions: Gender significantly influences NSCLC brain metastasis survival. Race trended to significance in overall survival but was not significant on multivariable analysis. Multivariable analysis identified gender and RPA classification as significant variables with respect to survival.« less

  15. Authentication of whisky due to its botanical origin and way of production by instrumental analysis and multivariate classification methods

    NASA Astrophysics Data System (ADS)

    Wiśniewska, Paulina; Boqué, Ricard; Borràs, Eva; Busto, Olga; Wardencki, Waldemar; Namieśnik, Jacek; Dymerski, Tomasz

    2017-02-01

    Headspace mass-spectrometry (HS-MS), mid infrared (MIR) and UV-vis spectroscopy were used to authenticate whisky samples from different origins and ways of production ((Irish, Spanish, Bourbon, Tennessee Whisky and Scotch). The collected spectra were processed with partial least-squares discriminant analysis (PLS-DA) to build the classification models. In all cases the five groups of whiskies were distinguished, but the best results were obtained by HS-MS, which indicates that the biggest differences between different types of whisky are due to their aroma. Differences were also found inside groups, showing that not only raw material is important to discriminate samples but also the way of their production. The methodology is quick, easy and does not require sample preparation.

  16. Comparative study on fast classification of brick samples by combination of principal component analysis and linear discriminant analysis using stand-off and table-top laser-induced breakdown spectroscopy

    NASA Astrophysics Data System (ADS)

    Vítková, Gabriela; Prokeš, Lubomír; Novotný, Karel; Pořízka, Pavel; Novotný, Jan; Všianský, Dalibor; Čelko, Ladislav; Kaiser, Jozef

    2014-11-01

    Focusing on historical aspect, during archeological excavation or restoration works of buildings or different structures built from bricks it is important to determine, preferably in-situ and in real-time, the locality of bricks origin. Fast classification of bricks on the base of Laser-Induced Breakdown Spectroscopy (LIBS) spectra is possible using multivariate statistical methods. Combination of principal component analysis (PCA) and linear discriminant analysis (LDA) was applied in this case. LIBS was used to classify altogether the 29 brick samples from 7 different localities. Realizing comparative study using two different LIBS setups - stand-off and table-top it is shown that stand-off LIBS has a big potential for archeological in-field measurements.

  17. Semi-supervised anomaly detection - towards model-independent searches of new physics

    NASA Astrophysics Data System (ADS)

    Kuusela, Mikael; Vatanen, Tommi; Malmi, Eric; Raiko, Tapani; Aaltonen, Timo; Nagai, Yoshikazu

    2012-06-01

    Most classification algorithms used in high energy physics fall under the category of supervised machine learning. Such methods require a training set containing both signal and background events and are prone to classification errors should this training data be systematically inaccurate for example due to the assumed MC model. To complement such model-dependent searches, we propose an algorithm based on semi-supervised anomaly detection techniques, which does not require a MC training sample for the signal data. We first model the background using a multivariate Gaussian mixture model. We then search for deviations from this model by fitting to the observations a mixture of the background model and a number of additional Gaussians. This allows us to perform pattern recognition of any anomalous excess over the background. We show by a comparison to neural network classifiers that such an approach is a lot more robust against misspecification of the signal MC than supervised classification. In cases where there is an unexpected signal, a neural network might fail to correctly identify it, while anomaly detection does not suffer from such a limitation. On the other hand, when there are no systematic errors in the training data, both methods perform comparably.

  18. Identification and classification of silks using infrared spectroscopy

    PubMed Central

    Boulet-Audet, Maxime; Vollrath, Fritz; Holland, Chris

    2015-01-01

    ABSTRACT Lepidopteran silks number in the thousands and display a vast diversity of structures, properties and industrial potential. To map this remarkable biochemical diversity, we present an identification and screening method based on the infrared spectra of native silk feedstock and cocoons. Multivariate analysis of over 1214 infrared spectra obtained from 35 species allowed us to group silks into distinct hierarchies and a classification that agrees well with current phylogenetic data and taxonomies. This approach also provides information on the relative content of sericin, calcium oxalate, phenolic compounds, poly-alanine and poly(alanine-glycine) β-sheets. It emerged that the domesticated mulberry silkmoth Bombyx mori represents an outlier compared with other silkmoth taxa in terms of spectral properties. Interestingly, Epiphora bauhiniae was found to contain the highest amount of β-sheets reported to date for any wild silkmoth. We conclude that our approach provides a new route to determine cocoon chemical composition and in turn a novel, biological as well as material, classification of silks. PMID:26347557

  19. A computer analysis of ERTS data of the Lake Gregory area of South Australia with particular emphasis on its role in terrain classification for engineering. M.S. Thesis

    NASA Technical Reports Server (NTRS)

    Lodwick, G. D. (Principal Investigator)

    1976-01-01

    A digital computer and multivariate statistical techniques were used to analyze 4-band multispectral data. A representation of the original data for each of the four bands allows a certain degree of terrain interpretation; however, variations in appearance of sites within and between bands, without additional criteria for deciding which representation should be preferred, create difficulties for classification. Investigation of the video data groups produced by principal components analysis and cluster analysis techniques shows that effective correlations with classifications of terrain produced by conventional methods could be carried out. The analyses also highlighted underlying relationships between the various elements. The approach used allows large areas (185 cm by 185 cm) to be classified into fundamental units within a matter of hours and can be applied to those parts of the Earth where facilities for conventional studies are poor or lacking.

  20. Metabolomic analysis applied to chemosystematics and evolution of megadiverse Brazilian Vernonieae (Asteraceae).

    PubMed

    Gallon, Marília Elias; Monge, Marcelo; Casoti, Rosana; Da Costa, Fernando Batista; Semir, João; Gobbo-Neto, Leonardo

    2018-06-01

    Vernonia sensu lato is the largest and most complex genus of the tribe Vernonieae (Asteraceae). The tribe is chemically characterized by the presence of sesquiterpene lactones and flavonoids. Over the years, several taxonomic classifications have been proposed for Vernonia s.l. and for the tribe; however, there has been no consensus among the researches. According to traditional classification, Vernonia s.l. comprises more than 1000 species divided into sections, subsections and series (sensu Bentham). In a more recent classification, these species have been segregated into other genera and some subtribes were proposed, while the genus Vernonia sensu stricto was restricted to 22 species distributed mainly in North America (sensu Robinson). In this study, species from the subtribes Vernoniinae, Lepidaploinae and Rolandrinae were analyzed by UHPLC-UV-HRMS followed by multivariate statistical analysis. Data mining was performed using unsupervised (HCA and PCA) and supervised methods (OPLS-DA). The HCA showed the segregation of the species into four main groups. Comparing the HCA with taxonomical classifications of Vernonieae, we observed that the groups of the dendogram, based on metabolic profiling, were in accordance with the generic classification proposed by Robinson and with previous phylogenetic studies. The species of the genera Stenocephalum, Stilpnopappus, Strophopappus and Rolandra (Group 1) were revealed to be more related to the species of the genus Vernonanthura (Group 2), while the genera Cyrtocymura, Chrysolaena and Echinocoryne (Group 3) were chemically more similar to the genera Lessingianthus and Lepidaploa (Group 4). These findings indicated that the subtribes Vernoniinae and Lepidaploinae are non-chemically homogeneous groups and highlighted the application of untargeted metabolomic tools for taxonomy and as indicators of species evolution. Discriminant compounds for the groups obtained by OPLS-DA were determined. Groups 1 and 2 were characterized by the presence of 3',4'-dimethoxyluteolin, glaucolide A and 8-tigloyloxyglaucolide A. The species of Groups 3 and 4 were characterized by the presence of putative acacetin 7-O-rutinoside and glaucolide B. Therefore, untargeted metabolomic approach combined with multivariate statistical analysis, as proposed herein, allowed the identification of potential chemotaxonomic markers, helping in the taxonomic classifications. Copyright © 2018 Elsevier Ltd. All rights reserved.

  1. Data analysis techniques

    NASA Technical Reports Server (NTRS)

    Park, Steve

    1990-01-01

    A large and diverse number of computational techniques are routinely used to process and analyze remotely sensed data. These techniques include: univariate statistics; multivariate statistics; principal component analysis; pattern recognition and classification; other multivariate techniques; geometric correction; registration and resampling; radiometric correction; enhancement; restoration; Fourier analysis; and filtering. Each of these techniques will be considered, in order.

  2. Texture as a basis for acoustic classification of substrate in the nearshore region

    NASA Astrophysics Data System (ADS)

    Dennison, A.; Wattrus, N. J.

    2016-12-01

    Segmentation and classification of substrate type from two locations in Lake Superior, are predicted using multivariate statistical processing of textural measures derived from shallow-water, high-resolution multibeam bathymetric data. During a multibeam sonar survey, both bathymetric and backscatter data are collected. It is well documented that the statistical characteristic of a sonar backscatter mosaic is dependent on substrate type. While classifying the bottom-type on the basis on backscatter alone can accurately predict and map bottom-type, it lacks the ability to resolve and capture fine textural details, an important factor in many habitat mapping studies. Statistical processing can capture the pertinent details about the bottom-type that are rich in textural information. Further multivariate statistical processing can then isolate characteristic features, and provide the basis for an accurate classification scheme. Preliminary results from an analysis of bathymetric data and ground-truth samples collected from the Amnicon River, Superior, Wisconsin, and the Lester River, Duluth, Minnesota, demonstrate the ability to process and develop a novel classification scheme of the bottom type in two geomorphologically distinct areas.

  3. A Deep Learning Architecture for Temporal Sleep Stage Classification Using Multivariate and Multimodal Time Series.

    PubMed

    Chambon, Stanislas; Galtier, Mathieu N; Arnal, Pierrick J; Wainrib, Gilles; Gramfort, Alexandre

    2018-04-01

    Sleep stage classification constitutes an important preliminary exam in the diagnosis of sleep disorders. It is traditionally performed by a sleep expert who assigns to each 30 s of the signal of a sleep stage, based on the visual inspection of signals such as electroencephalograms (EEGs), electrooculograms (EOGs), electrocardiograms, and electromyograms (EMGs). We introduce here the first deep learning approach for sleep stage classification that learns end-to-end without computing spectrograms or extracting handcrafted features, that exploits all multivariate and multimodal polysomnography (PSG) signals (EEG, EMG, and EOG), and that can exploit the temporal context of each 30-s window of data. For each modality, the first layer learns linear spatial filters that exploit the array of sensors to increase the signal-to-noise ratio, and the last layer feeds the learnt representation to a softmax classifier. Our model is compared to alternative automatic approaches based on convolutional networks or decisions trees. Results obtained on 61 publicly available PSG records with up to 20 EEG channels demonstrate that our network architecture yields the state-of-the-art performance. Our study reveals a number of insights on the spatiotemporal distribution of the signal of interest: a good tradeoff for optimal classification performance measured with balanced accuracy is to use 6 EEG with 2 EOG (left and right) and 3 EMG chin channels. Also exploiting 1 min of data before and after each data segment offers the strongest improvement when a limited number of channels are available. As sleep experts, our system exploits the multivariate and multimodal nature of PSG signals in order to deliver the state-of-the-art classification performance with a small computational cost.

  4. Robust Averaging of Covariances for EEG Recordings Classification in Motor Imagery Brain-Computer Interfaces.

    PubMed

    Uehara, Takashi; Sartori, Matteo; Tanaka, Toshihisa; Fiori, Simone

    2017-06-01

    The estimation of covariance matrices is of prime importance to analyze the distribution of multivariate signals. In motor imagery-based brain-computer interfaces (MI-BCI), covariance matrices play a central role in the extraction of features from recorded electroencephalograms (EEGs); therefore, correctly estimating covariance is crucial for EEG classification. This letter discusses algorithms to average sample covariance matrices (SCMs) for the selection of the reference matrix in tangent space mapping (TSM)-based MI-BCI. Tangent space mapping is a powerful method of feature extraction and strongly depends on the selection of a reference covariance matrix. In general, the observed signals may include outliers; therefore, taking the geometric mean of SCMs as the reference matrix may not be the best choice. In order to deal with the effects of outliers, robust estimators have to be used. In particular, we discuss and test the use of geometric medians and trimmed averages (defined on the basis of several metrics) as robust estimators. The main idea behind trimmed averages is to eliminate data that exhibit the largest distance from the average covariance calculated on the basis of all available data. The results of the experiments show that while the geometric medians show little differences from conventional methods in terms of classification accuracy in the classification of electroencephalographic recordings, the trimmed averages show significant improvement for all subjects.

  5. Identification and classification of failure modes in laminated composites by using a multivariate statistical analysis of wavelet coefficients

    NASA Astrophysics Data System (ADS)

    Baccar, D.; Söffker, D.

    2017-11-01

    Acoustic Emission (AE) is a suitable method to monitor the health of composite structures in real-time. However, AE-based failure mode identification and classification are still complex to apply due to the fact that AE waves are generally released simultaneously from all AE-emitting damage sources. Hence, the use of advanced signal processing techniques in combination with pattern recognition approaches is required. In this paper, AE signals generated from laminated carbon fiber reinforced polymer (CFRP) subjected to indentation test are examined and analyzed. A new pattern recognition approach involving a number of processing steps able to be implemented in real-time is developed. Unlike common classification approaches, here only CWT coefficients are extracted as relevant features. Firstly, Continuous Wavelet Transform (CWT) is applied to the AE signals. Furthermore, dimensionality reduction process using Principal Component Analysis (PCA) is carried out on the coefficient matrices. The PCA-based feature distribution is analyzed using Kernel Density Estimation (KDE) allowing the determination of a specific pattern for each fault-specific AE signal. Moreover, waveform and frequency content of AE signals are in depth examined and compared with fundamental assumptions reported in this field. A correlation between the identified patterns and failure modes is achieved. The introduced method improves the damage classification and can be used as a non-destructive evaluation tool.

  6. Risk factors associated with oroantral perforation during surgical removal of maxillary third molar teeth.

    PubMed

    Hasegawa, Takumi; Tachibana, Akira; Takeda, Daisuke; Iwata, Eiji; Arimoto, Satomi; Sakakibara, Akiko; Akashi, Masaya; Komori, Takahide

    2016-12-01

    The relationship between radiographic findings and the occurrence of oroantral perforation is controversial. Few studies have quantitatively analyzed the risk factors contributing to oroantral perforation, and no study has reported multivariate analysis of the relationship(s) between these various factors. This retrospective study aims to fill this void. Various risk factors for oroantral perforation during maxillary third molar extraction were investigated by univariate and multivariate analysis. The proximity of the roots to the maxillary sinus floor (root-sinus [RS] classification) was assessed using panoramic radiography and classified as types 1-5. The relationship between the maxillary second and third molars was classified according to a modified version of the Archer classification. The relative depth of the maxillary third molar in the bone was classified as class A-C, and its angulation relative to the long axis of the second molar was also recorded. Performance of an incision (OR 5.16), mesioangular tooth angulation (OR 6.05), and type 3 RS classification (i.e., significant superimposition of the roots of all posterior maxillary teeth with the sinus floor; OR 10.18) were all identified as risk factors with significant association to an outcome of oroantral perforation. To our knowledge, this is the first multivariate analysis of the risk factors for oroantral perforation during surgical extraction of the maxillary third molar. This RS classification may offer a new predictive parameter for estimating the risk of oroantral perforation.

  7. Classification of Fusarium-Infected Korean Hulled Barley Using Near-Infrared Reflectance Spectroscopy and Partial Least Squares Discriminant Analysis

    PubMed Central

    Lim, Jongguk; Kim, Giyoung; Mo, Changyeun; Oh, Kyoungmin; Yoo, Hyeonchae; Ham, Hyeonheui; Kim, Moon S.

    2017-01-01

    The purpose of this study is to use near-infrared reflectance (NIR) spectroscopy equipment to nondestructively and rapidly discriminate Fusarium-infected hulled barley. Both normal hulled barley and Fusarium-infected hulled barley were scanned by using a NIR spectrometer with a wavelength range of 1175 to 2170 nm. Multiple mathematical pretreatments were applied to the reflectance spectra obtained for Fusarium discrimination and the multivariate analysis method of partial least squares discriminant analysis (PLS-DA) was used for discriminant prediction. The PLS-DA prediction model developed by applying the second-order derivative pretreatment to the reflectance spectra obtained from the side of hulled barley without crease achieved 100% accuracy in discriminating the normal hulled barley and the Fusarium-infected hulled barley. These results demonstrated the feasibility of rapid discrimination of the Fusarium-infected hulled barley by combining multivariate analysis with the NIR spectroscopic technique, which is utilized as a nondestructive detection method. PMID:28974012

  8. Multi-label classification of chronically ill patients with bag of words and supervised dimensionality reduction algorithms.

    PubMed

    Bromuri, Stefano; Zufferey, Damien; Hennebert, Jean; Schumacher, Michael

    2014-10-01

    This research is motivated by the issue of classifying illnesses of chronically ill patients for decision support in clinical settings. Our main objective is to propose multi-label classification of multivariate time series contained in medical records of chronically ill patients, by means of quantization methods, such as bag of words (BoW), and multi-label classification algorithms. Our second objective is to compare supervised dimensionality reduction techniques to state-of-the-art multi-label classification algorithms. The hypothesis is that kernel methods and locality preserving projections make such algorithms good candidates to study multi-label medical time series. We combine BoW and supervised dimensionality reduction algorithms to perform multi-label classification on health records of chronically ill patients. The considered algorithms are compared with state-of-the-art multi-label classifiers in two real world datasets. Portavita dataset contains 525 diabetes type 2 (DT2) patients, with co-morbidities of DT2 such as hypertension, dyslipidemia, and microvascular or macrovascular issues. MIMIC II dataset contains 2635 patients affected by thyroid disease, diabetes mellitus, lipoid metabolism disease, fluid electrolyte disease, hypertensive disease, thrombosis, hypotension, chronic obstructive pulmonary disease (COPD), liver disease and kidney disease. The algorithms are evaluated using multi-label evaluation metrics such as hamming loss, one error, coverage, ranking loss, and average precision. Non-linear dimensionality reduction approaches behave well on medical time series quantized using the BoW algorithm, with results comparable to state-of-the-art multi-label classification algorithms. Chaining the projected features has a positive impact on the performance of the algorithm with respect to pure binary relevance approaches. The evaluation highlights the feasibility of representing medical health records using the BoW for multi-label classification tasks. The study also highlights that dimensionality reduction algorithms based on kernel methods, locality preserving projections or both are good candidates to deal with multi-label classification tasks in medical time series with many missing values and high label density. Copyright © 2014 Elsevier Inc. All rights reserved.

  9. Analysis of Big Data in Gait Biomechanics: Current Trends and Future Directions.

    PubMed

    Phinyomark, Angkoon; Petri, Giovanni; Ibáñez-Marcelo, Esther; Osis, Sean T; Ferber, Reed

    2018-01-01

    The increasing amount of data in biomechanics research has greatly increased the importance of developing advanced multivariate analysis and machine learning techniques, which are better able to handle "big data". Consequently, advances in data science methods will expand the knowledge for testing new hypotheses about biomechanical risk factors associated with walking and running gait-related musculoskeletal injury. This paper begins with a brief introduction to an automated three-dimensional (3D) biomechanical gait data collection system: 3D GAIT, followed by how the studies in the field of gait biomechanics fit the quantities in the 5 V's definition of big data: volume, velocity, variety, veracity, and value. Next, we provide a review of recent research and development in multivariate and machine learning methods-based gait analysis that can be applied to big data analytics. These modern biomechanical gait analysis methods include several main modules such as initial input features, dimensionality reduction (feature selection and extraction), and learning algorithms (classification and clustering). Finally, a promising big data exploration tool called "topological data analysis" and directions for future research are outlined and discussed.

  10. Discrete Fourier Transform-Based Multivariate Image Analysis: Application to Modeling of Aromatase Inhibitory Activity.

    PubMed

    Barigye, Stephen J; Freitas, Matheus P; Ausina, Priscila; Zancan, Patricia; Sola-Penna, Mauro; Castillo-Garit, Juan A

    2018-02-12

    We recently generalized the formerly alignment-dependent multivariate image analysis applied to quantitative structure-activity relationships (MIA-QSAR) method through the application of the discrete Fourier transform (DFT), allowing for its application to noncongruent and structurally diverse chemical compound data sets. Here we report the first practical application of this method in the screening of molecular entities of therapeutic interest, with human aromatase inhibitory activity as the case study. We developed an ensemble classification model based on the two-dimensional (2D) DFT MIA-QSAR descriptors, with which we screened the NCI Diversity Set V (1593 compounds) and obtained 34 chemical compounds with possible aromatase inhibitory activity. These compounds were docked into the aromatase active site, and the 10 most promising compounds were selected for in vitro experimental validation. Of these compounds, 7419 (nonsteroidal) and 89 201 (steroidal) demonstrated satisfactory antiproliferative and aromatase inhibitory activities. The obtained results suggest that the 2D-DFT MIA-QSAR method may be useful in ligand-based virtual screening of new molecular entities of therapeutic utility.

  11. Postcraniometric sex and ancestry estimation in South Africa: a validation study.

    PubMed

    Liebenberg, Leandi; Krüger, Gabriele C; L'Abbé, Ericka N; Stull, Kyra E

    2018-05-24

    With the acceptance of the Daubert criteria as the standards for best practice in forensic anthropological research, more emphasis is being placed on the validation of published methods. Methods, both traditional and novel, need to be validated, adjusted, and refined for optimal performance within forensic anthropological analyses. Recently, a custom postcranial database of modern South Africans was created for use in Fordisc 3.1. Classification accuracies of up to 85% for ancestry estimation and 98% for sex estimation were achieved using a multivariate approach. To measure the external validity and report more realistic performance statistics, an independent sample was tested. The postcrania from 180 black, white, and colored South Africans were measured and classified using the custom postcranial database. A decrease in accuracy was observed for both ancestry estimation (79%) and sex estimation (95%) of the validation sample. When incorporating both sex and ancestry simultaneously, the method achieved 70% accuracy, and 79% accuracy when sex-specific ancestry analyses were run. Classification matrices revealed that postcrania were more likely to misclassify as a result of ancestry rather than sex. While both sex and ancestry influence the size of an individual, sex differences are more marked in the postcranial skeleton and are therefore easier to identify. The external validity of the postcranial database was verified and therefore shown to be a useful tool for forensic casework in South Africa. While the classification rates were slightly lower than the original method, this is expected when a method is generalized.

  12. Chemometrics Methods for Specificity, Authenticity and Traceability Analysis of Olive Oils: Principles, Classifications and Applications.

    PubMed

    Messai, Habib; Farman, Muhammad; Sarraj-Laabidi, Abir; Hammami-Semmar, Asma; Semmar, Nabil

    2016-11-17

    Olive oils (OOs) show high chemical variability due to several factors of genetic, environmental and anthropic types. Genetic and environmental factors are responsible for natural compositions and polymorphic diversification resulting in different varietal patterns and phenotypes. Anthropic factors, however, are at the origin of different blends' preparation leading to normative, labelled or adulterated commercial products. Control of complex OO samples requires their (i) characterization by specific markers; (ii) authentication by fingerprint patterns; and (iii) monitoring by traceability analysis. These quality control and management aims require the use of several multivariate statistical tools: specificity highlighting requires ordination methods; authentication checking calls for classification and pattern recognition methods; traceability analysis implies the use of network-based approaches able to separate or extract mixed information and memorized signals from complex matrices. This chapter presents a review of different chemometrics methods applied for the control of OO variability from metabolic and physical-chemical measured characteristics. The different chemometrics methods are illustrated by different study cases on monovarietal and blended OO originated from different countries. Chemometrics tools offer multiple ways for quantitative evaluations and qualitative control of complex chemical variability of OO in relation to several intrinsic and extrinsic factors.

  13. Development and validation of a casemix classification to predict costs of specialist palliative care provision across inpatient hospice, hospital and community settings in the UK: a study protocol

    PubMed Central

    Guo, Ping; Dzingina, Mendwas; Firth, Alice M; Davies, Joanna M; Douiri, Abdel; O’Brien, Suzanne M; Pinto, Cathryn; Pask, Sophie; Higginson, Irene J; Eagar, Kathy; Murtagh, Fliss E M

    2018-01-01

    Introduction Provision of palliative care is inequitable with wide variations across conditions and settings in the UK. Lack of a standard way to classify by case complexity is one of the principle obstacles to addressing this. We aim to develop and validate a casemix classification to support the prediction of costs of specialist palliative care provision. Methods and analysis Phase I: A cohort study to determine the variables and potential classes to be included in a casemix classification. Data are collected from clinicians in palliative care services across inpatient hospice, hospital and community settings on: patient demographics, potential complexity/casemix criteria and patient-level resource use. Cost predictors are derived using multivariate regression and then incorporated into a classification using classification and regression trees. Internal validation will be conducted by bootstrapping to quantify any optimism in the predictive performance (calibration and discrimination) of the developed classification. Phase II: A mixed-methods cohort study across settings for external validation of the classification developed in phase I. Patient and family caregiver data will be collected longitudinally on demographics, potential complexity/casemix criteria and patient-level resource use. This will be triangulated with data collected from clinicians on potential complexity/casemix criteria and patient-level resource use, and with qualitative interviews with patients and caregivers about care provision across difference settings. The classification will be refined on the basis of its performance in the validation data set. Ethics and dissemination The study has been approved by the National Health Service Health Research Authority Research Ethics Committee. The results are expected to be disseminated in 2018 through papers for publication in major palliative care journals; policy briefs for clinicians, commissioning leads and policy makers; and lay summaries for patients and public. Trial registration number ISRCTN90752212. PMID:29550781

  14. Profiling and classification of French propolis by combined multivariate data analysis of planar chromatograms and scanning direct analysis in real time mass spectra.

    PubMed

    Chasset, Thibaut; Häbe, Tim T; Ristivojevic, Petar; Morlock, Gertrud E

    2016-09-23

    Quality control of propolis is challenging, as it is a complex natural mixture of compounds, and thus, very difficult to analyze and standardize. Shown on the example of 30 French propolis samples, a strategy for an improved quality control was demonstrated in which high-performance thin-layer chromatography (HPTLC) fingerprints were evaluated in combination with selected mass signals obtained by desorption-based scanning mass spectrometry (MS). The French propolis sample extracts were separated by a newly developed reversed phase (RP)-HPTLC method. The fingerprints obtained by two different detection modes, i.e. after (1) derivatization and fluorescence detection (FLD) at UV 366nm and (2) scanning direct analysis in real time (DART)-MS, were analyzed by multivariate data analysis. Thus, RP-HPTLC-FLD and RP-HPTLC-DART-MS fingerprints were explored and the best classification was obtained using both methods in combination with pattern recognition techniques, such as principal component analysis. All investigated French propolis samples were divided in two types and characteristic patterns were observed. Phenolic compounds such as caffeic acid, p-coumaric acid, chrysin, pinobanksin, pinobanksin-3-acetate, galangin, kaempferol, tectochrysin and pinocembrin were identified as characteristic marker compounds of French propolis samples. This study expanded the research on the European poplar type of propolis and confirmed the presence of two botanically different types of propolis, known as the blue and orange types. Copyright © 2016 Elsevier B.V. All rights reserved.

  15. ATLS Hypovolemic Shock Classification by Prediction of Blood Loss in Rats Using Regression Models.

    PubMed

    Choi, Soo Beom; Choi, Joon Yul; Park, Jee Soo; Kim, Deok Won

    2016-07-01

    In our previous study, our input data set consisted of 78 rats, the blood loss in percent as a dependent variable, and 11 independent variables (heart rate, systolic blood pressure, diastolic blood pressure, mean arterial pressure, pulse pressure, respiration rate, temperature, perfusion index, lactate concentration, shock index, and new index (lactate concentration/perfusion)). The machine learning methods for multicategory classification were applied to a rat model in acute hemorrhage to predict the four Advanced Trauma Life Support (ATLS) hypovolemic shock classes for triage in our previous study. However, multicategory classification is much more difficult and complicated than binary classification. We introduce a simple approach for classifying ATLS hypovolaemic shock class by predicting blood loss in percent using support vector regression and multivariate linear regression (MLR). We also compared the performance of the classification models using absolute and relative vital signs. The accuracies of support vector regression and MLR models with relative values by predicting blood loss in percent were 88.5% and 84.6%, respectively. These were better than the best accuracy of 80.8% of the direct multicategory classification using the support vector machine one-versus-one model in our previous study for the same validation data set. Moreover, the simple MLR models with both absolute and relative values could provide possibility of the future clinical decision support system for ATLS classification. The perfusion index and new index were more appropriate with relative changes than absolute values.

  16. Textural Analysis and Substrate Classification in the Nearshore Region of Lake Superior Using High-Resolution Multibeam Bathymetry

    NASA Astrophysics Data System (ADS)

    Dennison, Andrew G.

    Classification of the seafloor substrate can be done with a variety of methods. These methods include Visual (dives, drop cameras); mechanical (cores, grab samples); acoustic (statistical analysis of echosounder returns). Acoustic methods offer a more powerful and efficient means of collecting useful information about the bottom type. Due to the nature of an acoustic survey, larger areas can be sampled, and by combining the collected data with visual and mechanical survey methods provide greater confidence in the classification of a mapped region. During a multibeam sonar survey, both bathymetric and backscatter data is collected. It is well documented that the statistical characteristic of a sonar backscatter mosaic is dependent on bottom type. While classifying the bottom-type on the basis on backscatter alone can accurately predict and map bottom-type, i.e a muddy area from a rocky area, it lacks the ability to resolve and capture fine textural details, an important factor in many habitat mapping studies. Statistical processing of high-resolution multibeam data can capture the pertinent details about the bottom-type that are rich in textural information. Further multivariate statistical processing can then isolate characteristic features, and provide the basis for an accurate classification scheme. The development of a new classification method is described here. It is based upon the analysis of textural features in conjunction with ground truth sampling. The processing and classification result of two geologically distinct areas in nearshore regions of Lake Superior; off the Lester River,MN and Amnicon River, WI are presented here, using the Minnesota Supercomputer Institute's Mesabi computing cluster for initial processing. Processed data is then calibrated using ground truth samples to conduct an accuracy assessment of the surveyed areas. From analysis of high-resolution bathymetry data collected at both survey sites is was possible to successfully calculate a series of measures that describe textural information about the lake floor. Further processing suggests that the features calculated capture a significant amount of statistical information about the lake floor terrain as well. Two sources of error, an anomalous heave and refraction error significantly deteriorated the quality of the processed data and resulting validate results. Ground truth samples used to validate the classification methods utilized for both survey sites, however, resulted in accuracy values ranging from 5 -30 percent at the Amnicon River, and between 60-70 percent for the Lester River. The final results suggest that this new processing methodology does adequately capture textural information about the lake floor and does provide an acceptable classification in the absence of significant data quality issues.

  17. Lake bed classification using acoustic data

    USGS Publications Warehouse

    Yin, Karen K.; Li, Xing; Bonde, John; Richards, Carl; Cholwek, Gary

    1998-01-01

    As part of our effort to identify the lake bed surficial substrates using remote sensing data, this work designs pattern classifiers by multivariate statistical methods. Probability distribution of the preprocessed acoustic signal is analyzed first. A confidence region approach is then adopted to improve the design of the existing classifier. A technique for further isolation is proposed which minimizes the expected loss from misclassification. The devices constructed are applicable for real-time lake bed categorization. A mimimax approach is suggested to treat more general cases where the a priori probability distribution of the substrate types is unknown. Comparison of the suggested methods with the traditional likelihood ratio tests is discussed.

  18. Unbiased metabolite profiling by liquid chromatography-quadrupole time-of-flight mass spectrometry and multivariate data analysis for herbal authentication: classification of seven Lonicera species flower buds.

    PubMed

    Gao, Wen; Yang, Hua; Qi, Lian-Wen; Liu, E-Hu; Ren, Mei-Ting; Yan, Yu-Ting; Chen, Jun; Li, Ping

    2012-07-06

    Plant-based medicines become increasingly popular over the world. Authentication of herbal raw materials is important to ensure their safety and efficacy. Some herbs belonging to closely related species but differing in medicinal properties are difficult to be identified because of similar morphological and microscopic characteristics. Chromatographic fingerprinting is an alternative method to distinguish them. Existing approaches do not allow a comprehensive analysis for herbal authentication. We have now developed a strategy consisting of (1) full metabolic profiling of herbal medicines by rapid resolution liquid chromatography (RRLC) combined with quadrupole time-of-flight mass spectrometry (QTOF MS), (2) global analysis of non-targeted compounds by molecular feature extraction algorithm, (3) multivariate statistical analysis for classification and prediction, and (4) marker compounds characterization. This approach has provided a fast and unbiased comparative multivariate analysis of the metabolite composition of 33-batch samples covering seven Lonicera species. Individual metabolic profiles are performed at the level of molecular fragments without prior structural assignment. In the entire set, the obtained classifier for seven Lonicera species flower buds showed good prediction performance and a total of 82 statistically different components were rapidly obtained by the strategy. The elemental compositions of discriminative metabolites were characterized by the accurate mass measurement of the pseudomolecular ions and their chemical types were assigned by the MS/MS spectra. The high-resolution, comprehensive and unbiased strategy for metabolite data analysis presented here is powerful and opens the new direction of authentication in herbal analysis. Copyright © 2012 Elsevier B.V. All rights reserved.

  19. Ensemble support vector machine classification of dementia using structural MRI and mini-mental state examination.

    PubMed

    Sørensen, Lauge; Nielsen, Mads

    2018-05-15

    The International Challenge for Automated Prediction of MCI from MRI data offered independent, standardized comparison of machine learning algorithms for multi-class classification of normal control (NC), mild cognitive impairment (MCI), converting MCI (cMCI), and Alzheimer's disease (AD) using brain imaging and general cognition. We proposed to use an ensemble of support vector machines (SVMs) that combined bagging without replacement and feature selection. SVM is the most commonly used algorithm in multivariate classification of dementia, and it was therefore valuable to evaluate the potential benefit of ensembling this type of classifier. The ensemble SVM, using either a linear or a radial basis function (RBF) kernel, achieved multi-class classification accuracies of 55.6% and 55.0% in the challenge test set (60 NC, 60 MCI, 60 cMCI, 60 AD), resulting in a third place in the challenge. Similar feature subset sizes were obtained for both kernels, and the most frequently selected MRI features were the volumes of the two hippocampal subregions left presubiculum and right subiculum. Post-challenge analysis revealed that enforcing a minimum number of selected features and increasing the number of ensemble classifiers improved classification accuracy up to 59.1%. The ensemble SVM outperformed single SVM classifications consistently in the challenge test set. Ensemble methods using bagging and feature selection can improve the performance of the commonly applied SVM classifier in dementia classification. This resulted in competitive classification accuracies in the International Challenge for Automated Prediction of MCI from MRI data. Copyright © 2018 Elsevier B.V. All rights reserved.

  20. Multivariate pattern classification reveals autonomic and experiential representations of discrete emotions.

    PubMed

    Kragel, Philip A; Labar, Kevin S

    2013-08-01

    Defining the structural organization of emotions is a central unresolved question in affective science. In particular, the extent to which autonomic nervous system activity signifies distinct affective states remains controversial. Most prior research on this topic has used univariate statistical approaches in attempts to classify emotions from psychophysiological data. In the present study, electrodermal, cardiac, respiratory, and gastric activity, as well as self-report measures were taken from healthy subjects during the experience of fear, anger, sadness, surprise, contentment, and amusement in response to film and music clips. Information pertaining to affective states present in these response patterns was analyzed using multivariate pattern classification techniques. Overall accuracy for classifying distinct affective states was 58.0% for autonomic measures and 88.2% for self-report measures, both of which were significantly above chance. Further, examining the error distribution of classifiers revealed that the dimensions of valence and arousal selectively contributed to decoding emotional states from self-report, whereas a categorical configuration of affective space was evident in both self-report and autonomic measures. Taken together, these findings extend recent multivariate approaches to study emotion and indicate that pattern classification tools may improve upon univariate approaches to reveal the underlying structure of emotional experience and physiological expression. PsycINFO Database Record (c) 2013 APA, all rights reserved.

  1. Multivariate Pattern Classification Reveals Autonomic and Experiential Representations of Discrete Emotions

    PubMed Central

    Kragel, Philip A.; LaBar, Kevin S.

    2013-01-01

    Defining the structural organization of emotions is a central unresolved question in affective science. In particular, the extent to which autonomic nervous system activity signifies distinct affective states remains controversial. Most prior research on this topic has used univariate statistical approaches in attempts to classify emotions from psychophysiological data. In the present study, electrodermal, cardiac, respiratory, and gastric activity, as well as self-report measures were taken from healthy subjects during the experience of fear, anger, sadness, surprise, contentment, and amusement in response to film and music clips. Information pertaining to affective states present in these response patterns was analyzed using multivariate pattern classification techniques. Overall accuracy for classifying distinct affective states was 58.0% for autonomic measures and 88.2% for self-report measures, both of which were significantly above chance. Further, examining the error distribution of classifiers revealed that the dimensions of valence and arousal selectively contributed to decoding emotional states from self-report, whereas a categorical configuration of affective space was evident in both self-report and autonomic measures. Taken together, these findings extend recent multivariate approaches to study emotion and indicate that pattern classification tools may improve upon univariate approaches to reveal the underlying structure of emotional experience and physiological expression. PMID:23527508

  2. Classifying aerosol type using in situ surface spectral aerosol optical properties

    NASA Astrophysics Data System (ADS)

    Schmeisser, Lauren; Andrews, Elisabeth; Ogren, John A.; Sheridan, Patrick; Jefferson, Anne; Sharma, Sangeeta; Kim, Jeong Eun; Sherman, James P.; Sorribas, Mar; Kalapov, Ivo; Arsov, Todor; Angelov, Christo; Mayol-Bracero, Olga L.; Labuschagne, Casper; Kim, Sang-Woo; Hoffer, András; Lin, Neng-Huei; Chia, Hao-Ping; Bergin, Michael; Sun, Junying; Liu, Peng; Wu, Hao

    2017-10-01

    Knowledge of aerosol size and composition is important for determining radiative forcing effects of aerosols, identifying aerosol sources and improving aerosol satellite retrieval algorithms. The ability to extrapolate aerosol size and composition, or type, from intensive aerosol optical properties can help expand the current knowledge of spatiotemporal variability in aerosol type globally, particularly where chemical composition measurements do not exist concurrently with optical property measurements. This study uses medians of the scattering Ångström exponent (SAE), absorption Ångström exponent (AAE) and single scattering albedo (SSA) from 24 stations within the NOAA/ESRL Federated Aerosol Monitoring Network to infer aerosol type using previously published aerosol classification schemes.Three methods are implemented to obtain a best estimate of dominant aerosol type at each station using aerosol optical properties. The first method plots station medians into an AAE vs. SAE plot space, so that a unique combination of intensive properties corresponds with an aerosol type. The second typing method expands on the first by introducing a multivariate cluster analysis, which aims to group stations with similar optical characteristics and thus similar dominant aerosol type. The third and final classification method pairs 3-day backward air mass trajectories with median aerosol optical properties to explore the relationship between trajectory origin (proxy for likely aerosol type) and aerosol intensive parameters, while allowing for multiple dominant aerosol types at each station.The three aerosol classification methods have some common, and thus robust, results. In general, estimating dominant aerosol type using optical properties is best suited for site locations with a stable and homogenous aerosol population, particularly continental polluted (carbonaceous aerosol), marine polluted (carbonaceous aerosol mixed with sea salt) and continental dust/biomass sites (dust and carbonaceous aerosol); however, current classification schemes perform poorly when predicting dominant aerosol type at remote marine and Arctic sites and at stations with more complex locations and topography where variable aerosol populations are not well represented by median optical properties. Although the aerosol classification methods presented here provide new ways to reduce ambiguity in typing schemes, there is more work needed to find aerosol typing methods that are useful for a larger range of geographic locations and aerosol populations.

  3. R-parametrization and its role in classification of linear multivariable feedback systems

    NASA Technical Reports Server (NTRS)

    Chen, Robert T. N.

    1988-01-01

    A classification of all the compensators that stabilize a given general plant in a linear, time-invariant multi-input, multi-output feedback system is developed. This classification, along with the associated necessary and sufficient conditions for stability of the feedback system, is achieved through the introduction of a new parameterization, referred to as R-Parameterization, which is a dual of the familiar Q-Parameterization. The classification is made to the stability conditions of the compensators and the plant by themselves; and necessary and sufficient conditions are based on the stability of Q and R themselves.

  4. Sex estimation standards for medieval and contemporary Croats

    PubMed Central

    Bašić, Željana; Kružić, Ivana; Jerković, Ivan; Anđelinović, Deny; Anđelinović, Šimun

    2017-01-01

    Aim To develop discriminant functions for sex estimation on medieval Croatian population and test their application on contemporary Croatian population. Methods From a total of 519 skeletons, we chose 84 adult excellently preserved skeletons free of antemortem and postmortem changes and took all standard measurements. Sex was estimated/determined using standard anthropological procedures and ancient DNA (amelogenin analysis) where pelvis was insufficiently preserved or where sex morphological indicators were not consistent. We explored which measurements showed sexual dimorphism and used them for developing univariate and multivariate discriminant functions for sex estimation. We included only those functions that reached accuracy rate ≥80%. We tested the applicability of developed functions on modern Croatian sample (n = 37). Results From 69 standard skeletal measurements used in this study, 56 of them showed statistically significant sexual dimorphism (74.7%). We developed five univariate discriminant functions with classification rate 80.6%-85.2% and seven multivariate discriminant functions with an accuracy rate of 81.8%-93.0%. When tested on the modern population functions showed classification rates 74.1%-100%, and ten of them reached aimed accuracy rate. Females showed higher classification rates in the medieval populations, whereas males were better classified in the modern populations. Conclusion Developed discriminant functions are sufficiently accurate for reliable sex estimation in both medieval Croatian population and modern Croatian samples and may be used in forensic settings. The methodological issues that emerged regarding the importance of considering external factors in development and application of discriminant functions for sex estimation should be further explored. PMID:28613039

  5. Multimodal Feature Integration in the Angular Gyrus during Episodic and Semantic Retrieval

    PubMed Central

    Bonnici, Heidi M.; Richter, Franziska R.; Yazar, Yasemin

    2016-01-01

    Much evidence from distinct lines of investigation indicates the involvement of angular gyrus (AnG) in the retrieval of both episodic and semantic information, but the region's precise function and whether that function differs across episodic and semantic retrieval have yet to be determined. We used univariate and multivariate fMRI analysis methods to examine the role of AnG in multimodal feature integration during episodic and semantic retrieval. Human participants completed episodic and semantic memory tasks involving unimodal (auditory or visual) and multimodal (audio-visual) stimuli. Univariate analyses revealed the recruitment of functionally distinct AnG subregions during the retrieval of episodic and semantic information. Consistent with a role in multimodal feature integration during episodic retrieval, significantly greater AnG activity was observed during retrieval of integrated multimodal episodic memories compared with unimodal episodic memories. Multivariate classification analyses revealed that individual multimodal episodic memories could be differentiated in AnG, with classification accuracy tracking the vividness of participants' reported recollections, whereas distinct unimodal memories were represented in sensory association areas only. In contrast to episodic retrieval, AnG was engaged to a statistically equivalent degree during retrieval of unimodal and multimodal semantic memories, suggesting a distinct role for AnG during semantic retrieval. Modality-specific sensory association areas exhibited corresponding activity during both episodic and semantic retrieval, which mirrored the functional specialization of these regions during perception. The results offer new insights into the integrative processes subserved by AnG and its contribution to our subjective experience of remembering. SIGNIFICANCE STATEMENT Using univariate and multivariate fMRI analyses, we provide evidence that functionally distinct subregions of angular gyrus (AnG) contribute to the retrieval of episodic and semantic memories. Our multivariate pattern classifier could distinguish episodic memory representations in AnG according to whether they were multimodal (audio-visual) or unimodal (auditory or visual) in nature, whereas statistically equivalent AnG activity was observed during retrieval of unimodal and multimodal semantic memories. Classification accuracy during episodic retrieval scaled with the trial-by-trial vividness with which participants experienced their recollections. Therefore, the findings offer new insights into the integrative processes subserved by AnG and how its function may contribute to our subjective experience of remembering. PMID:27194327

  6. Multimodal Feature Integration in the Angular Gyrus during Episodic and Semantic Retrieval.

    PubMed

    Bonnici, Heidi M; Richter, Franziska R; Yazar, Yasemin; Simons, Jon S

    2016-05-18

    Much evidence from distinct lines of investigation indicates the involvement of angular gyrus (AnG) in the retrieval of both episodic and semantic information, but the region's precise function and whether that function differs across episodic and semantic retrieval have yet to be determined. We used univariate and multivariate fMRI analysis methods to examine the role of AnG in multimodal feature integration during episodic and semantic retrieval. Human participants completed episodic and semantic memory tasks involving unimodal (auditory or visual) and multimodal (audio-visual) stimuli. Univariate analyses revealed the recruitment of functionally distinct AnG subregions during the retrieval of episodic and semantic information. Consistent with a role in multimodal feature integration during episodic retrieval, significantly greater AnG activity was observed during retrieval of integrated multimodal episodic memories compared with unimodal episodic memories. Multivariate classification analyses revealed that individual multimodal episodic memories could be differentiated in AnG, with classification accuracy tracking the vividness of participants' reported recollections, whereas distinct unimodal memories were represented in sensory association areas only. In contrast to episodic retrieval, AnG was engaged to a statistically equivalent degree during retrieval of unimodal and multimodal semantic memories, suggesting a distinct role for AnG during semantic retrieval. Modality-specific sensory association areas exhibited corresponding activity during both episodic and semantic retrieval, which mirrored the functional specialization of these regions during perception. The results offer new insights into the integrative processes subserved by AnG and its contribution to our subjective experience of remembering. Using univariate and multivariate fMRI analyses, we provide evidence that functionally distinct subregions of angular gyrus (AnG) contribute to the retrieval of episodic and semantic memories. Our multivariate pattern classifier could distinguish episodic memory representations in AnG according to whether they were multimodal (audio-visual) or unimodal (auditory or visual) in nature, whereas statistically equivalent AnG activity was observed during retrieval of unimodal and multimodal semantic memories. Classification accuracy during episodic retrieval scaled with the trial-by-trial vividness with which participants experienced their recollections. Therefore, the findings offer new insights into the integrative processes subserved by AnG and how its function may contribute to our subjective experience of remembering. Copyright © 2016 Bonnici, Richter, et al.

  7. Chemometric classification of morphologically similar Umbelliferae medicinal herbs by DART-TOF-MS fingerprint.

    PubMed

    Lee, Sang Min; Kim, Hye-Jin; Jang, Young Pyo

    2012-01-01

    It needs many years of special training to gain expertise on the organoleptic classification of botanical raw materials and, even for those experts, discrimination among Umbelliferae medicinal herbs remains an intricate challenge due to their morphological similarity. To develop a new chemometric classification method using a direct analysis in real time-time of flight-mass spectrometry (DART-TOF-MS) fingerprinting for Umbelliferae medicinal herbs and to provide a platform for its application to the discrimination of other herbal medicines. Angelica tenuissima, Angelica gigas, Angelica dahurica and Cnidium officinale were chosen for this study and ten samples of each species were purchased from various Korean markets. DART-TOF-MS was employed on powdered raw materials to obtain a chemical fingerprint of each sample and the orthogonal partial-least squares method in discriminant analysis (OPLS-DA) was used for multivariate analysis. All samples of collected species were successfully discriminated from each other according to their characteristic DART-TOF-MS fingerprint. Decursin (or decursinol angelate) and byakangelicol were identified as marker molecules for Angelica gigas and A. dahurica, respectively. Using the OPLS method for discriminant analysis, Angelica tenuissima and Cnidium officinale were clearly separated into two groups. Angelica tenuissima was characterised by the presence of ligustilide and unidentified molecular ions of m/z 239 and 283, while senkyunolide A together with signals with m/z 387 and 389 were the marker compounds for Cnidium officinale. Elaborating with chemoinformatics, DART-TOF-MS fingerprinting with chemoinformatic tools results in a powerful method for the classification of morphologically similar Umbelliferae medicinal herbs and quality control of medicinal herbal products, including the extracts of these crude drugs. Copyright © 2012 John Wiley & Sons, Ltd.

  8. Penalized discriminant analysis for the detection of wild-grown and cultivated Ganoderma lucidum using Fourier transform infrared spectroscopy.

    PubMed

    Zhu, Ying; Tan, Tuck Lee

    2016-04-15

    An effective and simple analytical method using Fourier transform infrared (FTIR) spectroscopy to distinguish wild-grown high-quality Ganoderma lucidum (G. lucidum) from cultivated one is of essential importance for its quality assurance and medicinal value estimation. Commonly used chemical and analytical methods using full spectrum are not so effective for the detection and interpretation due to the complex system of the herbal medicine. In this study, two penalized discriminant analysis models, penalized linear discriminant analysis (PLDA) and elastic net (Elnet),using FTIR spectroscopy have been explored for the purpose of discrimination and interpretation. The classification performances of the two penalized models have been compared with two widely used multivariate methods, principal component discriminant analysis (PCDA) and partial least squares discriminant analysis (PLSDA). The Elnet model involving a combination of L1 and L2 norm penalties enabled an automatic selection of a small number of informative spectral absorption bands and gave an excellent classification accuracy of 99% for discrimination between spectra of wild-grown and cultivated G. lucidum. Its classification performance was superior to that of the PLDA model in a pure L1 setting and outperformed the PCDA and PLSDA models using full wavelength. The well-performed selection of informative spectral features leads to substantial reduction in model complexity and improvement of classification accuracy, and it is particularly helpful for the quantitative interpretations of the major chemical constituents of G. lucidum regarding its anti-cancer effects. Copyright © 2016 Elsevier B.V. All rights reserved.

  9. FTIR microspectroscopy for rapid screening and monitoring of polyunsaturated fatty acid production in commercially valuable marine yeasts and protists.

    PubMed

    Vongsvivut, Jitraporn; Heraud, Philip; Gupta, Adarsha; Puri, Munish; McNaughton, Don; Barrow, Colin J

    2013-10-21

    The increase in polyunsaturated fatty acid (PUFA) consumption has prompted research into alternative resources other than fish oil. In this study, a new approach based on focal-plane-array Fourier transform infrared (FPA-FTIR) microspectroscopy and multivariate data analysis was developed for the characterisation of some marine microorganisms. Cell and lipid compositions in lipid-rich marine yeasts collected from the Australian coast were characterised in comparison to a commercially available PUFA-producing marine fungoid protist, thraustochytrid. Multivariate classification methods provided good discriminative accuracy evidenced from (i) separation of the yeasts from thraustochytrids and distinct spectral clusters among the yeasts that conformed well to their biological identities, and (ii) correct classification of yeasts from a totally independent set using cross-validation testing. The findings further indicated additional capability of the developed FPA-FTIR methodology, when combined with partial least squares regression (PLSR) analysis, for rapid monitoring of lipid production in one of the yeasts during the growth period, which was achieved at a high accuracy compared to the results obtained from the traditional lipid analysis based on gas chromatography. The developed FTIR-based approach when coupled to programmable withdrawal devices and a cytocentrifugation module would have strong potential as a novel online monitoring technology suited for bioprocessing applications and large-scale production.

  10. Extracting galactic structure parameters from multivariated density estimation

    NASA Technical Reports Server (NTRS)

    Chen, B.; Creze, M.; Robin, A.; Bienayme, O.

    1992-01-01

    Multivariate statistical analysis, including includes cluster analysis (unsupervised classification), discriminant analysis (supervised classification) and principle component analysis (dimensionlity reduction method), and nonparameter density estimation have been successfully used to search for meaningful associations in the 5-dimensional space of observables between observed points and the sets of simulated points generated from a synthetic approach of galaxy modelling. These methodologies can be applied as the new tools to obtain information about hidden structure otherwise unrecognizable, and place important constraints on the space distribution of various stellar populations in the Milky Way. In this paper, we concentrate on illustrating how to use nonparameter density estimation to substitute for the true densities in both of the simulating sample and real sample in the five-dimensional space. In order to fit model predicted densities to reality, we derive a set of equations which include n lines (where n is the total number of observed points) and m (where m: the numbers of predefined groups) unknown parameters. A least-square estimation will allow us to determine the density law of different groups and components in the Galaxy. The output from our software, which can be used in many research fields, will also give out the systematic error between the model and the observation by a Bayes rule.

  11. Authentication of whisky due to its botanical origin and way of production by instrumental analysis and multivariate classification methods.

    PubMed

    Wiśniewska, Paulina; Boqué, Ricard; Borràs, Eva; Busto, Olga; Wardencki, Waldemar; Namieśnik, Jacek; Dymerski, Tomasz

    2017-02-15

    Headspace mass-spectrometry (HS-MS), mid infrared (MIR) and UV-vis spectroscopy were used to authenticate whisky samples from different origins and ways of production ((Irish, Spanish, Bourbon, Tennessee Whisky and Scotch). The collected spectra were processed with partial least-squares discriminant analysis (PLS-DA) to build the classification models. In all cases the five groups of whiskies were distinguished, but the best results were obtained by HS-MS, which indicates that the biggest differences between different types of whisky are due to their aroma. Differences were also found inside groups, showing that not only raw material is important to discriminate samples but also the way of their production. The methodology is quick, easy and does not require sample preparation. Copyright © 2016 Elsevier B.V. All rights reserved.

  12. Implementation of physicochemical and sensory analysis in conjunction with multivariate analysis towards assessing olive oil authentication/adulteration.

    PubMed

    Arvanitoyannis, Ioannis S; Vlachos, Antonios

    2007-01-01

    The authenticity of products labeled as olive oils, and in particular as virgin olive oils, stands for a very important issue both in terms of its health and commercial aspects. In view of the continuously increasing interest in virgin olive oil therapeutic properties, the traditional methods of characterization and physical and sensory analysis were further enriched with more advanced and sophisticated methods such as HPLC-MS, HPLC-GC/C/IRMS, RPLC-GC, DEPT, and CSIA among others. The results of both traditional and "novel" methods were treated both by means of classical multivariate analysis (cluster, principal component, correspondence, canonical, and discriminant) and artificial intelligence methods showing that nowadays the adulteration of virgin olive oil with seed oil is detectable at very low percentages, sometimes even at less than 1%. Furthermore, the detection of geographical origin of olive oil is equally feasible and much more accurate in countries like Italy and Spain where databases of physical/chemical properties exist. However, this geographical origin classification can also be accomplished in the absence of such databases provided that an adequate number of oil samples are used and the parameters studied have "discriminating power."

  13. Nontargeted, Rapid Screening of Extra Virgin Olive Oil Products for Authenticity Using Near-Infrared Spectroscopy in Combination with Conformity Index and Multivariate Statistical Analyses.

    PubMed

    Karunathilaka, Sanjeewa R; Kia, Ali-Reza Fardin; Srigley, Cynthia; Chung, Jin Kyu; Mossoba, Magdi M

    2016-10-01

    A rapid tool for evaluating authenticity was developed and applied to the screening of extra virgin olive oil (EVOO) retail products by using Fourier-transform near infrared (FT-NIR) spectroscopy in combination with univariate and multivariate data analysis methods. Using disposable glass tubes, spectra for 62 reference EVOO, 10 edible oil adulterants, 20 blends consisting of EVOO spiked with adulterants, 88 retail EVOO products and other test samples were rapidly measured in the transmission mode without any sample preparation. The univariate conformity index (CI) and the multivariate supervised soft independent modeling of class analogy (SIMCA) classification tool were used to analyze the various olive oil products which were tested for authenticity against a library of reference EVOO. Better discrimination between the authentic EVOO and some commercial EVOO products was observed with SIMCA than with CI analysis. Approximately 61% of all EVOO commercial products were flagged by SIMCA analysis, suggesting that further analysis be performed to identify quality issues and/or potential adulterants. Due to its simplicity and speed, FT-NIR spectroscopy in combination with multivariate data analysis can be used as a complementary tool to conventional official methods of analysis to rapidly flag EVOO products that may not belong to the class of authentic EVOO. Published 2016. This article is a U.S. Government work and is in the public domain in the USA.

  14. Classification bias in commercial business lists for retail food stores in the U.S.

    PubMed Central

    2012-01-01

    Background Aspects of the food environment such as the availability of different types of food stores have recently emerged as key modifiable factors that may contribute to the increased prevalence of obesity. Given that many of these studies have derived their results based on secondary datasets and the relationship of food stores with individual weight outcomes has been reported to vary by store type, it is important to understand the extent to which often-used secondary data correctly classify food stores. We evaluated the classification bias of food stores in Dun & Bradstreet (D&B) and InfoUSA commercial business lists. Methods We performed a full census in 274 randomly selected census tracts in the Chicago metropolitan area and collected detailed store attributes inside stores for classification. Store attributes were compared by classification match status and store type. Systematic classification bias by census tract characteristics was assessed in multivariate regression. Results D&B had a higher classification match rate than InfoUSA for supermarkets and grocery stores, while InfoUSA was higher for convenience stores. Both lists were more likely to correctly classify large supermarkets, grocery stores, and convenience stores with more cash registers and different types of service counters (supermarkets and grocery stores only). The likelihood of a correct classification match for supermarkets and grocery stores did not vary systemically by tract characteristics whereas convenience stores were more likely to be misclassified in predominately Black tracts. Conclusion Researches can rely on classification of food stores in commercial datasets for supermarkets and grocery stores whereas classifications for convenience and specialty food stores are subject to some systematic bias by neighborhood racial/ethnic composition. PMID:22512874

  15. Multivariate Analysis As a Support for Diagnostic Flowcharts in Allergic Bronchopulmonary Aspergillosis: A Proof-of-Concept Study.

    PubMed

    Vitte, Joana; Ranque, Stéphane; Carsin, Ania; Gomez, Carine; Romain, Thomas; Cassagne, Carole; Gouitaa, Marion; Baravalle-Einaudi, Mélisande; Bel, Nathalie Stremler-Le; Reynaud-Gaubert, Martine; Dubus, Jean-Christophe; Mège, Jean-Louis; Gaudart, Jean

    2017-01-01

    Molecular-based allergy diagnosis yields multiple biomarker datasets. The classical diagnostic score for allergic bronchopulmonary aspergillosis (ABPA), a severe disease usually occurring in asthmatic patients and people with cystic fibrosis, comprises succinct immunological criteria formulated in 1977: total IgE, anti- Aspergillus fumigatus ( Af ) IgE, anti- Af "precipitins," and anti- Af IgG. Progress achieved over the last four decades led to multiple IgE and IgG(4) Af biomarkers available with quantitative, standardized, molecular-level reports. These newly available biomarkers have not been included in the current diagnostic criteria, either individually or in algorithms, despite persistent underdiagnosis of ABPA. Large numbers of individual biomarkers may hinder their use in clinical practice. Conversely, multivariate analysis using new tools may bring about a better chance of less diagnostic mistakes. We report here a proof-of-concept work consisting of a three-step multivariate analysis of Af IgE, IgG, and IgG4 biomarkers through a combination of principal component analysis, hierarchical ascendant classification, and classification and regression tree multivariate analysis. The resulting diagnostic algorithms might show the way for novel criteria and improved diagnostic efficiency in Af -sensitized patients at risk for ABPA.

  16. Multivariate pattern analysis of MEG and EEG: A comparison of representational structure in time and space.

    PubMed

    Cichy, Radoslaw Martin; Pantazis, Dimitrios

    2017-09-01

    Multivariate pattern analysis of magnetoencephalography (MEG) and electroencephalography (EEG) data can reveal the rapid neural dynamics underlying cognition. However, MEG and EEG have systematic differences in sampling neural activity. This poses the question to which degree such measurement differences consistently bias the results of multivariate analysis applied to MEG and EEG activation patterns. To investigate, we conducted a concurrent MEG/EEG study while participants viewed images of everyday objects. We applied multivariate classification analyses to MEG and EEG data, and compared the resulting time courses to each other, and to fMRI data for an independent evaluation in space. We found that both MEG and EEG revealed the millisecond spatio-temporal dynamics of visual processing with largely equivalent results. Beyond yielding convergent results, we found that MEG and EEG also captured partly unique aspects of visual representations. Those unique components emerged earlier in time for MEG than for EEG. Identifying the sources of those unique components with fMRI, we found the locus for both MEG and EEG in high-level visual cortex, and in addition for MEG in low-level visual cortex. Together, our results show that multivariate analyses of MEG and EEG data offer a convergent and complimentary view on neural processing, and motivate the wider adoption of these methods in both MEG and EEG research. Copyright © 2017 Elsevier Inc. All rights reserved.

  17. A neuromorphic network for generic multivariate data classification

    PubMed Central

    Schmuker, Michael; Pfeil, Thomas; Nawrot, Martin Paul

    2014-01-01

    Computational neuroscience has uncovered a number of computational principles used by nervous systems. At the same time, neuromorphic hardware has matured to a state where fast silicon implementations of complex neural networks have become feasible. En route to future technical applications of neuromorphic computing the current challenge lies in the identification and implementation of functional brain algorithms. Taking inspiration from the olfactory system of insects, we constructed a spiking neural network for the classification of multivariate data, a common problem in signal and data analysis. In this model, real-valued multivariate data are converted into spike trains using “virtual receptors” (VRs). Their output is processed by lateral inhibition and drives a winner-take-all circuit that supports supervised learning. VRs are conveniently implemented in software, whereas the lateral inhibition and classification stages run on accelerated neuromorphic hardware. When trained and tested on real-world datasets, we find that the classification performance is on par with a naïve Bayes classifier. An analysis of the network dynamics shows that stable decisions in output neuron populations are reached within less than 100 ms of biological time, matching the time-to-decision reported for the insect nervous system. Through leveraging a population code, the network tolerates the variability of neuronal transfer functions and trial-to-trial variation that is inevitably present on the hardware system. Our work provides a proof of principle for the successful implementation of a functional spiking neural network on a configurable neuromorphic hardware system that can readily be applied to real-world computing problems. PMID:24469794

  18. Seizure classification in EEG signals utilizing Hilbert-Huang transform

    PubMed Central

    2011-01-01

    Background Classification method capable of recognizing abnormal activities of the brain functionality are either brain imaging or brain signal analysis. The abnormal activity of interest in this study is characterized by a disturbance caused by changes in neuronal electrochemical activity that results in abnormal synchronous discharges. The method aims at helping physicians discriminate between healthy and seizure electroencephalographic (EEG) signals. Method Discrimination in this work is achieved by analyzing EEG signals obtained from freely accessible databases. MATLAB has been used to implement and test the proposed classification algorithm. The analysis in question presents a classification of normal and ictal activities using a feature relied on Hilbert-Huang Transform. Through this method, information related to the intrinsic functions contained in the EEG signal has been extracted to track the local amplitude and the frequency of the signal. Based on this local information, weighted frequencies are calculated and a comparison between ictal and seizure-free determinant intrinsic functions is then performed. Methods of comparison used are the t-test and the Euclidean clustering. Results The t-test results in a P-value < 0.02 and the clustering leads to accurate (94%) and specific (96%) results. The proposed method is also contrasted against the Multivariate Empirical Mode Decomposition that reaches 80% accuracy. Comparison results strengthen the contribution of this paper not only from the accuracy point of view but also with respect to its fast response and ease to use. Conclusion An original tool for EEG signal processing giving physicians the possibility to diagnose brain functionality abnormalities is presented in this paper. The proposed system bears the potential of providing several credible benefits such as fast diagnosis, high accuracy, good sensitivity and specificity, time saving and user friendly. Furthermore, the classification of mode mixing can be achieved using the extracted instantaneous information of every IMF, but it would be most likely a hard task if only the average value is used. Extra benefits of this proposed system include low cost, and ease of interface. All of that indicate the usefulness of the tool and its use as an efficient diagnostic tool. PMID:21609459

  19. Chemometrics Methods for Specificity, Authenticity and Traceability Analysis of Olive Oils: Principles, Classifications and Applications

    PubMed Central

    Messai, Habib; Farman, Muhammad; Sarraj-Laabidi, Abir; Hammami-Semmar, Asma; Semmar, Nabil

    2016-01-01

    Background. Olive oils (OOs) show high chemical variability due to several factors of genetic, environmental and anthropic types. Genetic and environmental factors are responsible for natural compositions and polymorphic diversification resulting in different varietal patterns and phenotypes. Anthropic factors, however, are at the origin of different blends’ preparation leading to normative, labelled or adulterated commercial products. Control of complex OO samples requires their (i) characterization by specific markers; (ii) authentication by fingerprint patterns; and (iii) monitoring by traceability analysis. Methods. These quality control and management aims require the use of several multivariate statistical tools: specificity highlighting requires ordination methods; authentication checking calls for classification and pattern recognition methods; traceability analysis implies the use of network-based approaches able to separate or extract mixed information and memorized signals from complex matrices. Results. This chapter presents a review of different chemometrics methods applied for the control of OO variability from metabolic and physical-chemical measured characteristics. The different chemometrics methods are illustrated by different study cases on monovarietal and blended OO originated from different countries. Conclusion. Chemometrics tools offer multiple ways for quantitative evaluations and qualitative control of complex chemical variability of OO in relation to several intrinsic and extrinsic factors. PMID:28231172

  20. Metabolite profiling in retinoblastoma identifies novel clinicopathological subgroups

    PubMed Central

    Kohe, Sarah; Brundler, Marie-Anne; Jenkinson, Helen; Parulekar, Manoj; Wilson, Martin; Peet, Andrew C; McConville, Carmel M

    2015-01-01

    Background: Tumour classification, based on histopathology or molecular pathology, is of value to predict tumour behaviour and to select appropriate treatment. In retinoblastoma, pathology information is not available at diagnosis and only exists for enucleated tumours. Alternative methods of tumour classification, using noninvasive techniques such as magnetic resonance spectroscopy, are urgently required to guide treatment decisions at the time of diagnosis. Methods: High-resolution magic-angle spinning magnetic resonance spectroscopy (HR-MAS MRS) was undertaken on enucleated retinoblastomas. Principal component analysis and cluster analysis of the HR-MAS MRS data was used to identify tumour subgroups. Individual metabolite concentrations were determined and were correlated with histopathological risk factors for each group. Results: Multivariate analysis identified three metabolic subgroups of retinoblastoma, with the most discriminatory metabolites being taurine, hypotaurine, total-choline and creatine. Metabolite concentrations correlated with specific histopathological features: taurine was correlated with differentiation, total-choline and phosphocholine with retrolaminar optic nerve invasion, and total lipids with necrosis. Conclusions: We have demonstrated that a metabolite-based classification of retinoblastoma can be obtained using ex vivo magnetic resonance spectroscopy, and that the subgroups identified correlate with histopathological features. This result justifies future studies to validate the clinical relevance of these subgroups and highlights the potential of in vivo MRS as a noninvasive diagnostic tool for retinoblastoma patient stratification. PMID:26348444

  1. Robust diagnosis of non-Hodgkin lymphoma phenotypes validated on gene expression data from different laboratories.

    PubMed

    Bhanot, Gyan; Alexe, Gabriela; Levine, Arnold J; Stolovitzky, Gustavo

    2005-01-01

    A major challenge in cancer diagnosis from microarray data is the need for robust, accurate, classification models which are independent of the analysis techniques used and can combine data from different laboratories. We propose such a classification scheme originally developed for phenotype identification from mass spectrometry data. The method uses a robust multivariate gene selection procedure and combines the results of several machine learning tools trained on raw and pattern data to produce an accurate meta-classifier. We illustrate and validate our method by applying it to gene expression datasets: the oligonucleotide HuGeneFL microarray dataset of Shipp et al. (www.genome.wi.mit.du/MPR/lymphoma) and the Hu95Av2 Affymetrix dataset (DallaFavera's laboratory, Columbia University). Our pattern-based meta-classification technique achieves higher predictive accuracies than each of the individual classifiers , is robust against data perturbations and provides subsets of related predictive genes. Our techniques predict that combinations of some genes in the p53 pathway are highly predictive of phenotype. In particular, we find that in 80% of DLBCL cases the mRNA level of at least one of the three genes p53, PLK1 and CDK2 is elevated, while in 80% of FL cases, the mRNA level of at most one of them is elevated.

  2. The classification of anxiety and hysterical states. Part I. Historical review and empirical delineation.

    PubMed

    Sheehan, D V; Sheehan, K H

    1982-08-01

    The history of the classification of anxiety, hysterical, and hypochondriacal disorders is reviewed. Problems in the ability of current classification schemes to predict, control, and describe the relationship between the symptoms and other phenomena are outlined. Existing classification schemes failed the first test of a good classification model--that of providing categories that are mutually exclusive. The independence of these diagnostic categories from each other does not appear to hold up on empirical testing. In the absence of inherently mutually exclusive categories, further empirical investigation of these classes is obstructed since statistically valid analysis of the nominal data and any useful multivariate analysis would be difficult if not impossible. It is concluded that the existing classifications are unsatisfactory and require some fundamental reconceptualization.

  3. Serum C-reactive protein level in COPD patients stratified according to GOLD 2011 grading classification

    PubMed Central

    Lin, Yi-Hua; Wang, Wan-Yu; Hu, Su-Xian; Shi, Yong-Hong

    2016-01-01

    Background and Objective: The Global Initiative for Chronic Obstructive Lung Disease (GOLD) 2011 grading classification has been used to evaluate the severity of patients with chronic obstructive pulmonary disease (COPD). However, little is known about the relationship between the systemic inflammation and this classification. We aimed to study the relationship between serum CRP and the components of the GOLD 2011 grading classification. Methods: C-reactive protein (CRP) levels were measured in 391 clinically stable COPD patients and in 50 controls from June 2, 2015 to October 31, 2015 in the First Affiliated Hospital of Xiamen University. The association between CRP levels and the components of the GOLD 2011 grading classification were assessed. Results: Correlation was found with the following variables: GOLD 2011 group (0.240), age (0.227), pack year (0.136), forced expiratory volume in one second % predicted (FEV1%; -0.267), forced vital capacity % predicted (-0.210), number of acute exacerbations in the past year (0.265), number of hospitalized exacerbations in the past year (0.165), British medical Research Council dyspnoea scale (0.121), COPD assessment test score (CAT, 0.233). Using multivariate analysis, FEV1% and CAT score manifested the strongest negative association with CRP levels. Conclusions: CRP levels differ in COPD patients among groups A-D based on GOLD 2011 grading classification. CRP levels are associated with several important clinical variables, of which FEV1% and CAT score manifested the strongest negative correlation. PMID:28083044

  4. Design of Embedded System for Multivariate Classification of Finger and Thumb Movements Using EEG Signals for Control of Upper Limb Prosthesis.

    PubMed

    Rashid, Nasir; Iqbal, Javaid; Javed, Amna; Tiwana, Mohsin I; Khan, Umar Shahbaz

    2018-01-01

    Brain Computer Interface (BCI) determines the intent of the user from a variety of electrophysiological signals. These signals, Slow Cortical Potentials, are recorded from scalp, and cortical neuronal activity is recorded by implanted electrodes. This paper is focused on design of an embedded system that is used to control the finger movements of an upper limb prosthesis using Electroencephalogram (EEG) signals. This is a follow-up of our previous research which explored the best method to classify three movements of fingers (thumb movement, index finger movement, and first movement). Two-stage logistic regression classifier exhibited the highest classification accuracy while Power Spectral Density (PSD) was used as a feature of the filtered signal. The EEG signal data set was recorded using a 14-channel electrode headset (a noninvasive BCI system) from right-handed, neurologically intact volunteers. Mu (commonly known as alpha waves) and Beta Rhythms (8-30 Hz) containing most of the movement data were retained through filtering using "Arduino Uno" microcontroller followed by 2-stage logistic regression to obtain a mean classification accuracy of 70%.

  5. Quality classification of Spanish olive oils by untargeted gas chromatography coupled to hybrid quadrupole-time of flight mass spectrometry with atmospheric pressure chemical ionization and metabolomics-based statistical approach.

    PubMed

    Sales, C; Cervera, M I; Gil, R; Portolés, T; Pitarch, E; Beltran, J

    2017-02-01

    The novel atmospheric pressure chemical ionization (APCI) source has been used in combination with gas chromatography (GC) coupled to hybrid quadrupole time-of-flight (QTOF) mass spectrometry (MS) for determination of volatile components of olive oil, enhancing its potential for classification of olive oil samples according to their quality using a metabolomics-based approach. The full-spectrum acquisition has allowed the detection of volatile organic compounds (VOCs) in olive oil samples, including Extra Virgin, Virgin and Lampante qualities. A dynamic headspace extraction with cartridge solvent elution was applied. The metabolomics strategy consisted of three different steps: a full mass spectral alignment of GC-MS data using MzMine 2.0, a multivariate analysis using Ez-Info and the creation of the statistical model with combinations of responses for molecular fragments. The model was finally validated using blind samples, obtaining an accuracy in oil classification of 70%, taking the official established method, "PANEL TEST", as reference. Copyright © 2016 Elsevier Ltd. All rights reserved.

  6. Application of a Novel S3 Nanowire Gas Sensor Device in Parallel with GC-MS for the Identification of Rind Percentage of Grated Parmigiano Reggiano.

    PubMed

    Abbatangelo, Marco; Núñez-Carmona, Estefanía; Sberveglieri, Veronica; Zappa, Dario; Comini, Elisabetta; Sberveglieri, Giorgio

    2018-05-18

    Parmigiano Reggiano cheese is one of the most appreciated and consumed foods worldwide, especially in Italy, for its high content of nutrients and taste. However, these characteristics make this product subject to counterfeiting in different forms. In this study, a novel method based on an electronic nose has been developed to investigate the potentiality of this tool to distinguish rind percentages in grated Parmigiano Reggiano packages that should be lower than 18%. Different samples, in terms of percentage, seasoning and rind working process, were considered to tackle the problem at 360°. In parallel, GC-MS technique was used to give a name to the compounds that characterize Parmigiano and to relate them to sensors responses. Data analysis consisted of two stages: Multivariate analysis (PLS) and classification made in a hierarchical way with PLS-DA ad ANNs. Results were promising, in terms of correct classification of the samples. The correct classification rate (%) was higher for ANNs than PLS-DA, with correct identification approaching 100 percent.

  7. A Hybrid Semi-supervised Classification Scheme for Mining Multisource Geospatial Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vatsavai, Raju; Bhaduri, Budhendra L

    2011-01-01

    Supervised learning methods such as Maximum Likelihood (ML) are often used in land cover (thematic) classification of remote sensing imagery. ML classifier relies exclusively on spectral characteristics of thematic classes whose statistical distributions (class conditional probability densities) are often overlapping. The spectral response distributions of thematic classes are dependent on many factors including elevation, soil types, and ecological zones. A second problem with statistical classifiers is the requirement of large number of accurate training samples (10 to 30 |dimensions|), which are often costly and time consuming to acquire over large geographic regions. With the increasing availability of geospatial databases, itmore » is possible to exploit the knowledge derived from these ancillary datasets to improve classification accuracies even when the class distributions are highly overlapping. Likewise newer semi-supervised techniques can be adopted to improve the parameter estimates of statistical model by utilizing a large number of easily available unlabeled training samples. Unfortunately there is no convenient multivariate statistical model that can be employed for mulitsource geospatial databases. In this paper we present a hybrid semi-supervised learning algorithm that effectively exploits freely available unlabeled training samples from multispectral remote sensing images and also incorporates ancillary geospatial databases. We have conducted several experiments on real datasets, and our new hybrid approach shows over 25 to 35% improvement in overall classification accuracy over conventional classification schemes.« less

  8. Incidence of cervical lymph node metastasis and its association with outcomes in patients with adenoid cystic carcinoma. An international collaborative study

    PubMed Central

    Amit, Moran; Binenbaum, Yoav; Sharma, Kanika; Ramer, Naomi; Ramer, Ilana; Agbetoba, Abib; Glick, Joelle; Yang, Xinjie; Lei, Delin; Bjørndal, Kristine; Godballe, Christian; Mücke, Thomas; Wolff, Klaus-Dietrich; Fliss, Dan; Eckardt, André M.; Copelli, Chiara; Sesenna, Enrico; Palmer, Frank; Ganly, Ian; Patel, Snehal; Gil, Ziv

    2016-01-01

    Background The patterns of regional metastasis in adenoid cystic carcinoma (ACC) of the head and neck and its association with outcome is not established. Methods We conducted a retrospective multicentered multivariate analysis of 270 patients who underwent neck dissection. Results The incidence rate of neck metastases was 29%. The rate observed in the oral cavity is 37%, and in the major salivary glands is 19% (p = .001). The rate of occult nodal metastases was 17%. Overall 5-year survival rates were 44% in patients undergoing therapeutic neck dissections, and 65% and 73% among those undergoing elective neck dissections, with and without nodal metastases, respectively (p = .017). Multivariate analysis revealed that the primary site, nodal classification, and margin status were independent predictors of survival. Conclusion Our findings support the consideration of elective neck treatment in patients with ACC of the oral cavity. PMID:25060927

  9. Elemental analysis of soils using laser ablation inductively coupled plasma mass spectrometry (LA-ICP-MS) and laser-induced breakdown spectroscopy (LIBS) with multivariate discrimination: tape mounting as an alternative to pellets for small forensic transfer specimens.

    PubMed

    Jantzi, Sarah C; Almirall, José R

    2014-01-01

    Elemental analysis of soil is a useful application of both laser ablation inductively coupled plasma mass spectrometry (LA-ICP-MS) and laser-induced breakdown spectroscopy (LIBS) in geological, agricultural, environmental, archeological, planetary, and forensic sciences. In forensic science, the question to be answered is often whether soil specimens found on objects (e.g., shoes, tires, or tools) originated from the crime scene or other location of interest. Elemental analysis of the soil from the object and the locations of interest results in a characteristic elemental profile of each specimen, consisting of the amount of each element present. Because multiple elements are measured, multivariate statistics can be used to compare the elemental profiles in order to determine whether the specimen from the object is similar to one of the locations of interest. Previous work involved milling and pressing 0.5 g of soil into pellets before analysis using LA-ICP-MS and LIBS. However, forensic examiners prefer techniques that require smaller samples, are less time consuming, and are less destructive, allowing for future analysis by other techniques. An alternative sample introduction method was developed to meet these needs while still providing quantitative results suitable for multivariate comparisons. The tape-mounting method involved deposition of a thin layer of soil onto double-sided adhesive tape. A comparison of tape-mounting and pellet method performance is reported for both LA-ICP-MS and LIBS. Calibration standards and reference materials, prepared using the tape method, were analyzed by LA-ICP-MS and LIBS. As with the pellet method, linear calibration curves were achieved with the tape method, as well as good precision and low bias. Soil specimens from Miami-Dade County were prepared by both the pellet and tape methods and analyzed by LA-ICP-MS and LIBS. Principal components analysis and linear discriminant analysis were applied to the multivariate data. Results from both the tape method and the pellet method were nearly identical, with clear groupings and correct classification rates of >94%.

  10. A multivariate pattern analysis study of the HIV-related white matter anatomical structural connections alterations

    NASA Astrophysics Data System (ADS)

    Tang, Zhenchao; Liu, Zhenyu; Li, Ruili; Cui, Xinwei; Li, Hongjun; Dong, Enqing; Tian, Jie

    2017-03-01

    It's widely known that HIV infection would cause white matter integrity impairments. Nevertheless, it is still unclear that how the white matter anatomical structural connections are affected by HIV infection. In the current study, we employed a multivariate pattern analysis to explore the HIV-related white matter connections alterations. Forty antiretroviraltherapy- naïve HIV patients and thirty healthy controls were enrolled. Firstly, an Automatic Anatomical Label (AAL) atlas based white matter structural network, a 90 × 90 FA-weighted matrix, was constructed for each subject. Then, the white matter connections deprived from the structural network were entered into a lasso-logistic regression model to perform HIV-control group classification. Using leave one out cross validation, a classification accuracy (ACC) of 90% (P=0.002) and areas under the receiver operating characteristic curve (AUC) of 0.96 was obtained by the classification model. This result indicated that the white matter anatomical structural connections contributed greatly to HIV-control group classification, providing solid evidence that the white matter connections were affected by HIV infection. Specially, 11 white matter connections were selected in the classification model, mainly crossing the regions of frontal lobe, Cingulum, Hippocampus, and Thalamus, which were reported to be damaged in previous HIV studies. This might suggest that the white matter connections adjacent to the HIV-related impaired regions were prone to be damaged.

  11. A multivariate analytical method to characterize sediment attributes from high-frequency acoustic backscatter and ground-truthing data (Jade Bay, German North Sea coast)

    NASA Astrophysics Data System (ADS)

    Biondo, Manuela; Bartholomä, Alexander

    2017-04-01

    One of the burning issues on the topic of acoustic seabed classification is the lack of solid, repeatable, statistical procedures that can support the verification of acoustic variability in relation to seabed properties. Acoustic sediment classification schemes often lead to biased and subjective interpretation, as they ultimately aim at an oversimplified categorization of the seabed based on conventionally defined sediment types. However, grain size variability alone cannot be accounted for acoustic diversity, which will be ultimately affected by multiple physical processes, scale of heterogeneity, instrument settings, data quality, image processing and segmentation performances. Understanding and assessing the weight of all of these factors on backscatter is a difficult task, due to the spatially limited and fragmentary knowledge of the seabed from of direct observations (e.g. grab samples, cores, videos). In particular, large-scale mapping requires an enormous availability of ground-truthing data that is often obtained from heterogeneous and multidisciplinary sources, resulting into a further chance of misclassification. Independently from all of these limitations, acoustic segments still contain signals for seabed changes that, if appropriate procedures are established, can be translated into meaningful knowledge. In this study we design a simple, repeatable method, based on multivariate procedures, with the scope to classify a 100 km2, high-frequency (450 kHz) sidescan sonar mosaic acquired in the year 2012 in the shallow upper-mesotidal inlet of the Jade Bay (German North Sea coast). The tool used for the automated classification of the backscatter mosaic is the QTC SWATHVIEWTMsoftware. The ground-truthing database included grab sample data from multiple sources (2009-2011). The method was designed to extrapolate quantitative descriptors for acoustic backscatter and model their spatial changes in relation to grain size distribution and morphology. The modelled relationships were used to: 1) asses the automated segmentation performance, 2) obtain a ranking of most discriminant seabed attributes responsible for acoustic diversity, 3) select the best-fit ground-truthing information to characterize each acoustic class. Using a supervised Linear Discriminant Analysis (LDA), relationships between seabed parameters and acoustic classes discrimination were modelled, and acoustic classes for each data point were predicted. The model predicted a success rate of 63.5%. An unsupervised LDA was used to model relationships between acoustic variables and clustered seabed categories with the scope of identifying misrepresentative ground-truthing data points. The model prediction scored a success rate of 50.8%. Misclassified data points were disregarded for final classification. Analyses led to clearer, more accurate appreciation of relationship patterns and improved understanding of site-specific processes affecting the acoustic signal. Value to the qualitative classification output was added by comparing the latter with a more recent set of acoustic and ground-truthing information (2014). Classification resulted in the first acoustic sediment map ever produced in the area and offered valuable knowledge for detailed sediment variability. The method proved to be a simple, repeatable strategy that may be applied to similar work and environments.

  12. FDG-PET and CSF biomarker accuracy in prediction of conversion to different dementias in a large multicentre MCI cohort.

    PubMed

    Caminiti, Silvia Paola; Ballarini, Tommaso; Sala, Arianna; Cerami, Chiara; Presotto, Luca; Santangelo, Roberto; Fallanca, Federico; Vanoli, Emilia Giovanna; Gianolli, Luigi; Iannaccone, Sandro; Magnani, Giuseppe; Perani, Daniela

    2018-01-01

    In this multicentre study in clinical settings, we assessed the accuracy of optimized procedures for FDG-PET brain metabolism and CSF classifications in predicting or excluding the conversion to Alzheimer's disease (AD) dementia and non-AD dementias. We included 80 MCI subjects with neurological and neuropsychological assessments, FDG-PET scan and CSF measures at entry, all with clinical follow-up. FDG-PET data were analysed with a validated voxel-based SPM method. Resulting single-subject SPM maps were classified by five imaging experts according to the disease-specific patterns, as "typical-AD", "atypical-AD" (i.e. posterior cortical atrophy, asymmetric logopenic AD variant, frontal-AD variant), "non-AD" (i.e. behavioural variant FTD, corticobasal degeneration, semantic variant FTD; dementia with Lewy bodies) or "negative" patterns. To perform the statistical analyses, the individual patterns were grouped either as "AD dementia vs. non-AD dementia (all diseases)" or as "FTD vs. non-FTD (all diseases)". Aβ42, total and phosphorylated Tau CSF-levels were classified dichotomously, and using the Erlangen Score algorithm. Multivariate logistic models tested the prognostic accuracy of FDG-PET-SPM and CSF dichotomous classifications. Accuracy of Erlangen score and Erlangen Score aided by FDG-PET SPM classification was evaluated. The multivariate logistic model identified FDG-PET "AD" SPM classification (Expβ = 19.35, 95% C.I. 4.8-77.8, p < 0.001) and CSF Aβ42 (Expβ = 6.5, 95% C.I. 1.64-25.43, p < 0.05) as the best predictors of conversion from MCI to AD dementia. The "FTD" SPM pattern significantly predicted conversion to FTD dementias at follow-up (Expβ = 14, 95% C.I. 3.1-63, p < 0.001). Overall, FDG-PET-SPM classification was the most accurate biomarker, able to correctly differentiate either the MCI subjects who converted to AD or FTD dementias, and those who remained stable or reverted to normal cognition (Expβ = 17.9, 95% C.I. 4.55-70.46, p < 0.001). Our results support the relevant role of FDG-PET-SPM classification in predicting progression to different dementia conditions in prodromal MCI phase, and in the exclusion of progression, outperforming CSF biomarkers.

  13. Agricultural Land Cover from Multitemporal C-Band SAR Data

    NASA Astrophysics Data System (ADS)

    Skriver, H.

    2013-12-01

    Henning Skriver DTU Space, Technical University of Denmark Ørsteds Plads, Building 348, DK-2800 Lyngby e-mail: hs@space.dtu.dk Problem description This paper focuses on land cover type from SAR data using high revisit acquisitions, including single and dual polarisation and fully polarimetric data, at C-band. The data set were acquired during an ESA-supported campaign, AgriSAR09, with the Radarsat-2 system. Ground surveys to obtain detailed land cover maps were performed during the campaign. Classification methods using single- and dual-polarisation data, and fully polarimetric data are used with multitemporal data with short revisit time. Results for airborne campaigns have previously been reported in Skriver et al. (2011) and Skriver (2012). In this paper, the short revisit satellite SAR data will be used to assess the trade-off between polarimetric SAR data and data as single or dual polarisation SAR data. This is particularly important in relation to the future GMES Sentinel-1 SAR satellites, where two satellites with a relatively wide swath will ensure a short revisit time globally. Questions dealt with are: which accuracy can we expect from a mission like the Sentinel-1, what is the improvement of using polarimetric SAR compared to single or dual polarisation SAR, and what is the optimum number of acquisitions needed. Methodology The data have sufficient number of looks for the Gaussian assumption to be valid for the backscatter coefficients for the individual polarizations. The classification method used for these data is therefore the standard Bayesian classification method for multivariate Gaussian statistics. For the full-polarimetric cases two classification methods have been applied, the standard ML Wishart classifier, and a method based on a reversible transform of the covariance matrix into backscatter intensities. The following pre-processing steps were performed on both data sets: The scattering matrix data in the form of SLC products were coregistered, converted to covariance matrix format and multilooked to a specific equivalent number of looks. Results The multitemporal data improve significantly the classification results, and single acquisition data cannot provide the necessary classification performance. The multitemporal data are especially important for the single and dual polarization data, but less important for the fully polarimetric data. The satellite data set produces realistic classification results based on about 2000 fields. The best classification results for the single-polarized mode provide classification errors in the mid-twenties. Using the dual-polarized mode reduces the classification error with about 5 percentage points, whereas the polarimetric mode reduces it with about 10 percentage points. These results show, that it will be possible to obtain reasonable results with relatively simple systems with short revisit time. This very important result shows that systems like the Sentinel-1 mission will be able to produce fairly good results for global land cover classification. References Skriver, H. et al., 2011, 'Crop Classification using Short-Revisit Multitemporal SAR Data', IEEE J. Sel. Topics in Appl. Earth Obs. Rem. Sens., vol. 4, pp. 423-431. Skriver, H., 2012, 'Crop classification by multitemporal C- and L-band single- and dual-polarization and fully polarimetric SAR', IEEE Trans. Geosc. Rem. Sens., vol. 50, pp. 2138-2149.

  14. Single trial classification for the categories of perceived emotional facial expressions: an event-related fMRI study

    NASA Astrophysics Data System (ADS)

    Song, Sutao; Huang, Yuxia; Long, Zhiying; Zhang, Jiacai; Chen, Gongxiang; Wang, Shuqing

    2016-03-01

    Recently, several studies have successfully applied multivariate pattern analysis methods to predict the categories of emotions. These studies are mainly focused on self-experienced emotions, such as the emotional states elicited by music or movie. In fact, most of our social interactions involve perception of emotional information from the expressions of other people, and it is an important basic skill for humans to recognize the emotional facial expressions of other people in a short time. In this study, we aimed to determine the discriminability of perceived emotional facial expressions. In a rapid event-related fMRI design, subjects were instructed to classify four categories of facial expressions (happy, disgust, angry and neutral) by pressing different buttons, and each facial expression stimulus lasted for 2s. All participants performed 5 fMRI runs. One multivariate pattern analysis method, support vector machine was trained to predict the categories of facial expressions. For feature selection, ninety masks defined from anatomical automatic labeling (AAL) atlas were firstly generated and each were treated as the input of the classifier; then, the most stable AAL areas were selected according to prediction accuracies, and comprised the final feature sets. Results showed that: for the 6 pair-wise classification conditions, the accuracy, sensitivity and specificity were all above chance prediction, among which, happy vs. neutral , angry vs. disgust achieved the lowest results. These results suggested that specific neural signatures of perceived emotional facial expressions may exist, and happy vs. neutral, angry vs. disgust might be more similar in information representation in the brain.

  15. Multivariate pattern analysis strategies in detection of remitted major depressive disorder using resting state functional connectivity.

    PubMed

    Bhaumik, Runa; Jenkins, Lisanne M; Gowins, Jennifer R; Jacobs, Rachel H; Barba, Alyssa; Bhaumik, Dulal K; Langenecker, Scott A

    2017-01-01

    Understanding abnormal resting-state functional connectivity of distributed brain networks may aid in probing and targeting mechanisms involved in major depressive disorder (MDD). To date, few studies have used resting state functional magnetic resonance imaging (rs-fMRI) to attempt to discriminate individuals with MDD from individuals without MDD, and to our knowledge no investigations have examined a remitted (r) population. In this study, we examined the efficiency of support vector machine (SVM) classifier to successfully discriminate rMDD individuals from healthy controls (HCs) in a narrow early-adult age range. We empirically evaluated four feature selection methods including multivariate Least Absolute Shrinkage and Selection Operator (LASSO) and Elastic Net feature selection algorithms. Our results showed that SVM classification with Elastic Net feature selection achieved the highest classification accuracy of 76.1% (sensitivity of 81.5% and specificity of 68.9%) by leave-one-out cross-validation across subjects from a dataset consisting of 38 rMDD individuals and 29 healthy controls. The highest discriminating functional connections were between the left amygdala, left posterior cingulate cortex, bilateral dorso-lateral prefrontal cortex, and right ventral striatum. These appear to be key nodes in the etiopathophysiology of MDD, within and between default mode, salience and cognitive control networks. This technique demonstrates early promise for using rs-fMRI connectivity as a putative neurobiological marker capable of distinguishing between individuals with and without rMDD. These methods may be extended to periods of risk prior to illness onset, thereby allowing for earlier diagnosis, prevention, and intervention.

  16. Rapid determination of chemical composition and classification of bamboo fractions using visible-near infrared spectroscopy coupled with multivariate data analysis.

    PubMed

    Yang, Zhong; Li, Kang; Zhang, Maomao; Xin, Donglin; Zhang, Junhua

    2016-01-01

    During conversion of bamboo into biofuels and chemicals, it is necessary to efficiently predict the chemical composition and digestibility of biomass. However, traditional methods for determination of lignocellulosic biomass composition are expensive and time consuming. In this work, a novel and fast method for quantitative and qualitative analysis of chemical composition and enzymatic digestibilities of juvenile bamboo and mature bamboo fractions (bamboo green, bamboo timber, bamboo yellow, bamboo node, and bamboo branch) using visible-near infrared spectra was evaluated. The developed partial least squares models yielded coefficients of determination in calibration of 0.88, 0.94, and 0.96, for cellulose, xylan, and lignin of bamboo fractions in raw spectra, respectively. After visible-near infrared spectra being pretreated, the corresponding coefficients of determination in calibration yielded by the developed partial least squares models are 0.994, 0.990, and 0.996, respectively. The score plots of principal component analysis of mature bamboo, juvenile bamboo, and different fractions of mature bamboo were obviously distinguished in raw spectra. Based on partial least squares discriminant analysis, the classification accuracies of mature bamboo, juvenile bamboo, and different fractions of bamboo (bamboo green, bamboo timber, bamboo yellow, and bamboo branch) all reached 100 %. In addition, high accuracies of evaluation of the enzymatic digestibilities of bamboo fractions after pretreatment with aqueous ammonia were also observed. The results showed the potential of visible-near infrared spectroscopy in combination with multivariate analysis in efficiently analyzing the chemical composition and hydrolysabilities of lignocellulosic biomass, such as bamboo fractions.

  17. Automatic Cell Segmentation Using a Shape-Classification Model in Immunohistochemically Stained Cytological Images

    NASA Astrophysics Data System (ADS)

    Shah, Shishir

    This paper presents a segmentation method for detecting cells in immunohistochemically stained cytological images. A two-phase approach to segmentation is used where an unsupervised clustering approach coupled with cluster merging based on a fitness function is used as the first phase to obtain a first approximation of the cell locations. A joint segmentation-classification approach incorporating ellipse as a shape model is used as the second phase to detect the final cell contour. The segmentation model estimates a multivariate density function of low-level image features from training samples and uses it as a measure of how likely each image pixel is to be a cell. This estimate is constrained by the zero level set, which is obtained as a solution to an implicit representation of an ellipse. Results of segmentation are presented and compared to ground truth measurements.

  18. Comprehensive Chemical Fingerprinting of High-Quality Cocoa at Early Stages of Processing: Effectiveness of Combined Untargeted and Targeted Approaches for Classification and Discrimination.

    PubMed

    Magagna, Federico; Guglielmetti, Alessandro; Liberto, Erica; Reichenbach, Stephen E; Allegrucci, Elena; Gobino, Guido; Bicchi, Carlo; Cordero, Chiara

    2017-08-02

    This study investigates chemical information of volatile fractions of high-quality cocoa (Theobroma cacao L. Malvaceae) from different origins (Mexico, Ecuador, Venezuela, Columbia, Java, Trinidad, and Sao Tomè) produced for fine chocolate. This study explores the evolution of the entire pattern of volatiles in relation to cocoa processing (raw, roasted, steamed, and ground beans). Advanced chemical fingerprinting (e.g., combined untargeted and targeted fingerprinting) with comprehensive two-dimensional gas chromatography coupled with mass spectrometry allows advanced pattern recognition for classification, discrimination, and sensory-quality characterization. The entire data set is analyzed for 595 reliable two-dimensional peak regions, including 130 known analytes and 13 potent odorants. Multivariate analysis with unsupervised exploration (principal component analysis) and simple supervised discrimination methods (Fisher ratios and linear regression trees) reveal informative patterns of similarities and differences and identify characteristic compounds related to sample origin and manufacturing step.

  19. Influence of microclimatic ammonia levels on productive performance of different broilers' breeds estimated with univariate and multivariate approaches.

    PubMed

    Soliman, Essam S; Moawed, Sherif A; Hassan, Rania A

    2017-08-01

    Birds litter contains unutilized nitrogen in the form of uric acid that is converted into ammonia; a fact that does not only affect poultry performance but also has a negative effect on people's health around the farm and contributes in the environmental degradation. The influence of microclimatic ammonia emissions on Ross and Hubbard broilers reared in different housing systems at two consecutive seasons (fall and winter) was evaluated using a discriminant function analysis to differentiate between Ross and Hubbard breeds. A total number of 400 air samples were collected and analyzed for ammonia levels during the experimental period. Data were analyzed using univariate and multivariate statistical methods. Ammonia levels were significantly higher (p< 0.01) in the Ross compared to the Hubbard breed farm, although no significant differences (p>0.05) were found between the two farms in body weight, body weight gain, feed intake, feed conversion ratio, and performance index (PI) of broilers. Body weight; weight gain and PI had increased values (p< 0.01) during fall compared to winter irrespective of broiler breed. Ammonia emissions were positively (although weekly) correlated with the ambient relative humidity (r=0.383; p< 0.01), but not with the ambient temperature (r=-0.045; p>0.05). Test of significance of discriminant function analysis did not show a classification based on the studied traits suggesting that they cannot been used as predictor variables. The percentage of correct classification was 52% and it was improved after deletion of highly correlated traits to 57%. The study revealed that broiler's growth was negatively affected by increased microclimatic ammonia concentrations and recommended the analysis of broilers' growth performance parameters data using multivariate discriminant function analysis.

  20. Canonical Measure of Correlation (CMC) and Canonical Measure of Distance (CMD) between sets of data. Part 3. Variable selection in classification.

    PubMed

    Ballabio, Davide; Consonni, Viviana; Mauri, Andrea; Todeschini, Roberto

    2010-01-11

    In multivariate regression and classification issues variable selection is an important procedure used to select an optimal subset of variables with the aim of producing more parsimonious and eventually more predictive models. Variable selection is often necessary when dealing with methodologies that produce thousands of variables, such as Quantitative Structure-Activity Relationships (QSARs) and highly dimensional analytical procedures. In this paper a novel method for variable selection for classification purposes is introduced. This method exploits the recently proposed Canonical Measure of Correlation between two sets of variables (CMC index). The CMC index is in this case calculated for two specific sets of variables, the former being comprised of the independent variables and the latter of the unfolded class matrix. The CMC values, calculated by considering one variable at a time, can be sorted and a ranking of the variables on the basis of their class discrimination capabilities results. Alternatively, CMC index can be calculated for all the possible combinations of variables and the variable subset with the maximal CMC can be selected, but this procedure is computationally more demanding and classification performance of the selected subset is not always the best one. The effectiveness of the CMC index in selecting variables with discriminative ability was compared with that of other well-known strategies for variable selection, such as the Wilks' Lambda, the VIP index based on the Partial Least Squares-Discriminant Analysis, and the selection provided by classification trees. A variable Forward Selection based on the CMC index was finally used in conjunction of Linear Discriminant Analysis. This approach was tested on several chemical data sets. Obtained results were encouraging.

  1. An NRG Oncology/GOG study of molecular classification for risk prediction in endometrioid endometrial cancer

    PubMed Central

    Cosgrove, Casey M; Tritchler, David L; Cohn, David E; Mutch, David G; Rush, Craig M; Lankes, Heather A; Creasman, William T.; Miller, David S; Ramirez, Nilsa C; Geller, Melissa A; Powell, Matthew A; Backes, Floor J; Landrum, Lisa M; Timmers, Cynthia; Suarez, Adrian A; Zaino, Richard J; Pearl, Michael L; DiSilvestro, Paul A; Lele, Shashikant B; Goodfellow, Paul J

    2017-01-01

    Objectives The purpose of this study was to assess the prognostic significance of a simplified, clinically accessible classification system for endometrioid endometrial cancers combining Lynch syndrome screening and molecular risk stratification. Methods Tumors from NRG/GOG GOG210 were evaluated for mismatch repair defects (MSI, MMR IHC, and MLH1 methylation), POLE mutations, and loss of heterozygosity. TP53 was evaluated in a subset of cases. Tumors were assigned to four molecular classes. Relationships between molecular classes and clinicopathologic variables were assessed using contingency tests and Cox proportional methods. Results Molecular classification was successful for 982 tumors. Based on the NCI consensus MSI panel assessing MSI and loss of heterozygosity combined with POLE testing, 49% of tumors were classified copy number stable (CNS), 39% MMR deficient, 8% copy number altered (CNA) and 4% POLE mutant. Cancer-specific mortality occurred in 5% of patients with CNS tumors; 2.6% with POLE tumors; 7.6% with MMR deficient tumors and 19% with CNA tumors. The CNA group had worse progression-free (HR 2.31, 95%CI 1.53–3.49) and cancer-specific survival (HR 3.95; 95%CI 2.10–7.44). The POLE group had improved outcomes, but the differences were not statistically significant. CNA class remained significant for cancer-specific survival (HR 2.11; 95%CI 1.04–4.26) in multivariable analysis. The CNA molecular class was associated with TP53 mutation and expression status. Conclusions A simple molecular classification for endometrioid endometrial cancers that can be easily combined with Lynch syndrome screening provides important prognostic information. These findings support prospective clinical validation and further studies on the predictive value of a simplified molecular classification system. PMID:29132872

  2. Seizure classification in EEG signals utilizing Hilbert-Huang transform.

    PubMed

    Oweis, Rami J; Abdulhay, Enas W

    2011-05-24

    Classification method capable of recognizing abnormal activities of the brain functionality are either brain imaging or brain signal analysis. The abnormal activity of interest in this study is characterized by a disturbance caused by changes in neuronal electrochemical activity that results in abnormal synchronous discharges. The method aims at helping physicians discriminate between healthy and seizure electroencephalographic (EEG) signals. Discrimination in this work is achieved by analyzing EEG signals obtained from freely accessible databases. MATLAB has been used to implement and test the proposed classification algorithm. The analysis in question presents a classification of normal and ictal activities using a feature relied on Hilbert-Huang Transform. Through this method, information related to the intrinsic functions contained in the EEG signal has been extracted to track the local amplitude and the frequency of the signal. Based on this local information, weighted frequencies are calculated and a comparison between ictal and seizure-free determinant intrinsic functions is then performed. Methods of comparison used are the t-test and the Euclidean clustering. The t-test results in a P-value < 0.02 and the clustering leads to accurate (94%) and specific (96%) results. The proposed method is also contrasted against the Multivariate Empirical Mode Decomposition that reaches 80% accuracy. Comparison results strengthen the contribution of this paper not only from the accuracy point of view but also with respect to its fast response and ease to use. An original tool for EEG signal processing giving physicians the possibility to diagnose brain functionality abnormalities is presented in this paper. The proposed system bears the potential of providing several credible benefits such as fast diagnosis, high accuracy, good sensitivity and specificity, time saving and user friendly. Furthermore, the classification of mode mixing can be achieved using the extracted instantaneous information of every IMF, but it would be most likely a hard task if only the average value is used. Extra benefits of this proposed system include low cost, and ease of interface. All of that indicate the usefulness of the tool and its use as an efficient diagnostic tool.

  3. Panic disorder and agoraphobia: A direct comparison of their multivariate comorbidity patterns.

    PubMed

    Greene, Ashley L; Eaton, Nicholas R

    2016-01-15

    Scientific debate has long surrounded whether agoraphobia is a severe consequence of panic disorder or a frequently comorbid diagnosis. Multivariate comorbidity investigations typically treat these diagnoses as fungible in structural models, assuming both are manifestations of the fear-subfactor in the internalizing-externalizing model. No studies have directly compared these disorders' multivariate associations, which could clarify their conceptualization in classification and comorbidity research. In a nationally representative sample (N=43,093), we examined the multivariate comorbidity of panic disorder (1) without agoraphobia, (2) with agoraphobia, and (3) regardless of agoraphobia; and (4) agoraphobia without panic. We conducted exploratory and confirmatory factor analyses of these and 10 other lifetime DSM-IV diagnoses in a nationally representative sample (N=43,093). Differing bivariate and multivariate relations were found. Panic disorder without agoraphobia was largely a distress disorder, related to emotional disorders. Agoraphobia without panic was largely a fear disorder, related to phobias. When considered jointly, concomitant agoraphobia and panic was a fear disorder, and when panic was assessed without regard to agoraphobia (some individuals had agoraphobia while others did not) it was a mixed distress and fear disorder. Diagnoses were obtained from comprehensively trained lay interviewers, not clinicians and analyses used DSM-IV diagnoses (rather than DSM-5). These findings support the conceptualization of agoraphobia as a distinct diagnostic entity and the independent classification of both disorders in DSM-5, suggesting future multivariate comorbidity studies should not assume various panic/agoraphobia diagnoses are invariably fear disorders. Copyright © 2015 Elsevier B.V. All rights reserved.

  4. Decoding of visual activity patterns from fMRI responses using multivariate pattern analyses and convolutional neural network.

    PubMed

    Zafar, Raheel; Kamel, Nidal; Naufal, Mohamad; Malik, Aamir Saeed; Dass, Sarat C; Ahmad, Rana Fayyaz; Abdullah, Jafri M; Reza, Faruque

    2017-01-01

    Decoding of human brain activity has always been a primary goal in neuroscience especially with functional magnetic resonance imaging (fMRI) data. In recent years, Convolutional neural network (CNN) has become a popular method for the extraction of features due to its higher accuracy, however it needs a lot of computation and training data. In this study, an algorithm is developed using Multivariate pattern analysis (MVPA) and modified CNN to decode the behavior of brain for different images with limited data set. Selection of significant features is an important part of fMRI data analysis, since it reduces the computational burden and improves the prediction performance; significant features are selected using t-test. MVPA uses machine learning algorithms to classify different brain states and helps in prediction during the task. General linear model (GLM) is used to find the unknown parameters of every individual voxel and the classification is done using multi-class support vector machine (SVM). MVPA-CNN based proposed algorithm is compared with region of interest (ROI) based method and MVPA based estimated values. The proposed method showed better overall accuracy (68.6%) compared to ROI (61.88%) and estimation values (64.17%).

  5. Fluorescent marker-based and marker-free discrimination between healthy and cancerous human tissues using hyper-spectral imaging

    NASA Astrophysics Data System (ADS)

    Arnold, Thomas; De Biasio, Martin; Leitner, Raimund

    2015-06-01

    Two problems are addressed in this paper (i) the fluorescent marker-based and the (ii) marker-free discrimination between healthy and cancerous human tissues. For both applications the performance of hyper-spectral methods are quantified. Fluorescent marker-based tissue classification uses a number of fluorescent markers to dye specific parts of a human cell. The challenge is that the emission spectra of the fluorescent dyes overlap considerably. They are, furthermore disturbed by the inherent auto-fluorescence of human tissue. This results in ambiguities and decreased image contrast causing difficulties for the treatment decision. The higher spectral resolution introduced by tunable-filter-based spectral imaging in combination with spectral unmixing techniques results in an improvement of the image contrast and therefore more reliable information for the physician to choose the treatment decision. Marker-free tissue classification is based solely on the subtle spectral features of human tissue without the use of artificial markers. The challenge in this case is that the spectral differences between healthy and cancerous tissues are subtle and embedded in intra- and inter-patient variations of these features. The contributions of this paper are (i) the evaluation of hyper-spectral imaging in combination with spectral unmixing techniques for fluorescence marker-based tissue classification, (ii) the evaluation of spectral imaging for marker-free intra surgery tissue classification. Within this paper, we consider real hyper-spectral fluorescence and endoscopy data sets to emphasize the practical capability of the proposed methods. It is shown that the combination of spectral imaging with multivariate statistical methods can improve the sensitivity and specificity of the detection and the staging of cancerous tissues compared to standard procedures.

  6. Improved sparse decomposition based on a smoothed L0 norm using a Laplacian kernel to select features from fMRI data.

    PubMed

    Zhang, Chuncheng; Song, Sutao; Wen, Xiaotong; Yao, Li; Long, Zhiying

    2015-04-30

    Feature selection plays an important role in improving the classification accuracy of multivariate classification techniques in the context of fMRI-based decoding due to the "few samples and large features" nature of functional magnetic resonance imaging (fMRI) data. Recently, several sparse representation methods have been applied to the voxel selection of fMRI data. Despite the low computational efficiency of the sparse representation methods, they still displayed promise for applications that select features from fMRI data. In this study, we proposed the Laplacian smoothed L0 norm (LSL0) approach for feature selection of fMRI data. Based on the fast sparse decomposition using smoothed L0 norm (SL0) (Mohimani, 2007), the LSL0 method used the Laplacian function to approximate the L0 norm of sources. Results of the simulated and real fMRI data demonstrated the feasibility and robustness of LSL0 for the sparse source estimation and feature selection. Simulated results indicated that LSL0 produced more accurate source estimation than SL0 at high noise levels. The classification accuracy using voxels that were selected by LSL0 was higher than that by SL0 in both simulated and real fMRI experiment. Moreover, both LSL0 and SL0 showed higher classification accuracy and required less time than ICA and t-test for the fMRI decoding. LSL0 outperformed SL0 in sparse source estimation at high noise level and in feature selection. Moreover, LSL0 and SL0 showed better performance than ICA and t-test for feature selection. Copyright © 2015 Elsevier B.V. All rights reserved.

  7. Study for Updated Gout Classification Criteria (SUGAR): identification of features to classify gout

    PubMed Central

    Taylor, William J.; Fransen, Jaap; Jansen, Tim L.; Dalbeth, Nicola; Schumacher, H. Ralph; Brown, Melanie; Louthrenoo, Worawit; Vazquez-Mellado, Janitzia; Eliseev, Maxim; McCarthy, Geraldine; Stamp, Lisa K.; Perez-Ruiz, Fernando; Sivera, Francisca; Ea, Hang-Korng; Gerritsen, Martijn; Scire, Carlo; Cavagna, Lorenzo; Lin, Chingtsai; Chou, Yin-Yi; Tausche, Anne-Kathrin; Vargas-Santos, Ana Beatriz; Janssen, Matthijs; Chen, Jiunn-Horng; Slot, Ole; Cimmino, Marco A.; Uhlig, Till; Neogi, Tuhina

    2015-01-01

    Objective To determine which clinical, laboratory and imaging features most accurately distinguished gout from non-gout. Methods A cross-sectional study of consecutive rheumatology clinic patients with at least one swollen joint or subcutaneous tophus. Gout was defined by synovial fluid or tophus aspirate microscopy by certified examiners in all patients. The sample was randomly divided into a model development (2/3) and test sample (1/3). Univariate and multivariate association between clinical features and MSU-defined gout was determined using logistic regression modelling. Shrinkage of regression weights was performed to prevent over-fitting of the final model. Latent class analysis was conducted to identify patterns of joint involvement. Results In total, 983 patients were included. Gout was present in 509 (52%). In the development sample (n=653), these features were selected for the final model (multivariate OR) joint erythema (2.13), difficulty walking (7.34), time to maximal pain < 24 hours (1.32), resolution by 2 weeks (3.58), tophus (7.29), MTP1 ever involved (2.30), location of currently tender joints: Other foot/ankle (2.28), MTP1 (2.82), serum urate level > 6 mg/dl (0.36 mmol/l) (3.35), ultrasound double contour sign (7.23), Xray erosion or cyst (2.49). The final model performed adequately in the test set with no evidence of misfit, high discrimination and predictive ability. MTP1 involvement was the most common joint pattern (39.4%) in gout cases. Conclusion Ten key discriminating features have been identified for further evaluation for new gout classification criteria. Ultrasound findings and degree of uricemia add discriminating value, and will significantly contribute to more accurate classification criteria. PMID:25777045

  8. Comparison of cystatin C and creatinine to determine the incidence of composite adverse outcomes in HIV-infected individuals.

    PubMed

    Yanagisawa, Naoki; Sasaki, Shugo; Suganuma, Akihiko; Imamura, Akifumi; Ajisawa, Atsushi; Ando, Minoru

    2015-02-01

    Cystatin C is an overall biomarker of pathophysiologic abnormalities that accompany chronic kidney disease (CKD). The utility of cystatin C is not fully understood in an HIV-infected population. This prospective study investigated 661 HIV-infected individuals for 4 years to determine the incidence of adverse outcomes, including all-cause mortality, cardiovascular disease, and renal dysfunction. The risk of developing the outcomes was discriminated with a 4 color-coded classification in a 3 × 6 contingency table, that combined 3 grades of dipstick proteinuria with 6 grades of estimated glomerular filtration rate (eGFR) calculated using either serum creatinine (eGFRcr) or cystatin C (eGFRcy): green, low risk; yellow, moderately increased risk; orange, high risk; and red, very high risk. The cumulative incidence of the outcomes was assessed by the Kaplan-Meier method, and the association between color-coded risk and the time to outcome was evaluated using multivariate proportional hazards analysis. Compared with eGFRcr, the use of eGFRcy reduced the prevalence of risk ≥ orange by 0.8%. The adverse outcomes were significantly more likely to occur to the patients with baseline risk category ≥orange than those with ≤ yellow, independent of risk categories based on eGFRcr or eGFRcy. However, in multivariate analysis, risk category ≥orange with eGFRcy-based classification was significantly associated with adverse outcomes, but not the one with eGFRcr. Replacing creatinine by cystatin C in the CKD color-coded risk classification may be appropriate to discriminate HIV-infected patients at increased risk of a poor prognosis. Copyright © 2014 Japanese Society of Chemotherapy and The Japanese Association for Infectious Diseases. Published by Elsevier Ltd. All rights reserved.

  9. A novel latent gaussian copula framework for modeling spatial correlation in quantized SAR imagery with applications to ATR

    NASA Astrophysics Data System (ADS)

    Thelen, Brian T.; Xique, Ismael J.; Burns, Joseph W.; Goley, G. Steven; Nolan, Adam R.; Benson, Jonathan W.

    2017-04-01

    With all of the new remote sensing modalities available, and with ever increasing capabilities and frequency of collection, there is a desire to fundamentally understand/quantify the information content in the collected image data relative to various exploitation goals, such as detection/classification. A fundamental approach for this is the framework of Bayesian decision theory, but a daunting challenge is to have significantly flexible and accurate multivariate models for the features and/or pixels that capture a wide assortment of distributions and dependen- cies. In addition, data can come in the form of both continuous and discrete representations, where the latter is often generated based on considerations of robustness to imaging conditions and occlusions/degradations. In this paper we propose a novel suite of "latent" models fundamentally based on multivariate Gaussian copula models that can be used for quantized data from SAR imagery. For this Latent Gaussian Copula (LGC) model, we derive an approximate, maximum-likelihood estimation algorithm and demonstrate very reasonable estimation performance even for the larger images with many pixels. However applying these LGC models to large dimen- sions/images within a Bayesian decision/classification theory is infeasible due to the computational/numerical issues in evaluating the true full likelihood, and we propose an alternative class of novel pseudo-likelihoood detection statistics that are computationally feasible. We show in a few simple examples that these statistics have the potential to provide very good and robust detection/classification performance. All of this framework is demonstrated on a simulated SLICY data set, and the results show the importance of modeling the dependencies, and of utilizing the pseudo-likelihood methods.

  10. Use of the Analysis of the Volatile Faecal Metabolome in Screening for Colorectal Cancer

    PubMed Central

    2015-01-01

    Diagnosis of colorectal cancer is an invasive and expensive colonoscopy, which is usually carried out after a positive screening test. Unfortunately, existing screening tests lack specificity and sensitivity, hence many unnecessary colonoscopies are performed. Here we report on a potential new screening test for colorectal cancer based on the analysis of volatile organic compounds (VOCs) in the headspace of faecal samples. Faecal samples were obtained from subjects who had a positive faecal occult blood sample (FOBT). Subjects subsequently had colonoscopies performed to classify them into low risk (non-cancer) and high risk (colorectal cancer) groups. Volatile organic compounds were analysed by selected ion flow tube mass spectrometry (SIFT-MS) and then data were analysed using both univariate and multivariate statistical methods. Ions most likely from hydrogen sulphide, dimethyl sulphide and dimethyl disulphide are statistically significantly higher in samples from high risk rather than low risk subjects. Results using multivariate methods show that the test gives a correct classification of 75% with 78% specificity and 72% sensitivity on FOBT positive samples, offering a potentially effective alternative to FOBT. PMID:26086914

  11. Evaluation of Urinary Tract Dilation Classification System for Grading Postnatal Hydronephrosis.

    PubMed

    Hodhod, Amr; Capolicchio, John-Paul; Jednak, Roman; El-Sherif, Eid; El-Doray, Abd El-Alim; El-Sherbiny, Mohamed

    2016-03-01

    We assessed the reliability and validity of the Urinary Tract Dilation classification system as a new grading system for postnatal hydronephrosis. We retrospectively reviewed charts of patients who presented with hydronephrosis from 2008 to 2013. We included patients diagnosed prenatally and those with hydronephrosis discovered incidentally during the first year of life. We excluded cases involving urinary tract infection, neurogenic bladder and chromosomal anomalies, those associated with extraurinary congenital malformations and those with followup of less than 24 months without resolution. Hydronephrosis was graded postnatally using the Society for Fetal Urology system, and then the management protocol was chosen. All units were regraded using the Urinary Tract Dilation classification system and compared to the Society for Fetal Urology system to assess reliability. Univariate and multivariate analyses were performed to assess the validity of the Urinary Tract Dilation classification system in predicting hydronephrosis resolution and surgical intervention. A total of 490 patients (730 renal units) were eligible to participate. The Urinary Tract Dilation classification system was reliable in the assessment of hydronephrosis (parallel forms 0.92). Hydronephrosis resolved in 357 units (49%), and 86 units (12%) were managed by surgical intervention. The remainder of renal units demonstrated stable or improved hydronephrosis. Multivariate analysis revealed that the likelihood of surgical intervention was predicted independently by Urinary Tract Dilation classification system risk group, while Society for Fetal Urology grades were predictive of likelihood of resolution. The Urinary Tract Dilation classification system is reliable for evaluation of postnatal hydronephrosis and is valid in predicting surgical intervention. Copyright © 2016 American Urological Association Education and Research, Inc. Published by Elsevier Inc. All rights reserved.

  12. Chromatographic profiles of Phyllanthus aqueous extracts samples: a proposition of classification using chemometric models.

    PubMed

    Martins, Lucia Regina Rocha; Pereira-Filho, Edenir Rodrigues; Cass, Quezia Bezerra

    2011-04-01

    Taking in consideration the global analysis of complex samples, proposed by the metabolomic approach, the chromatographic fingerprint encompasses an attractive chemical characterization of herbal medicines. Thus, it can be used as a tool in quality control analysis of phytomedicines. The generated multivariate data are better evaluated by chemometric analyses, and they can be modeled by classification methods. "Stone breaker" is a popular Brazilian plant of Phyllanthus genus, used worldwide to treat renal calculus, hepatitis, and many other diseases. In this study, gradient elution at reversed-phase conditions with detection at ultraviolet region were used to obtain chemical profiles (fingerprints) of botanically identified samples of six Phyllanthus species. The obtained chromatograms, at 275 nm, were organized in data matrices, and the time shifts of peaks were adjusted using the Correlation Optimized Warping algorithm. Principal Component Analyses were performed to evaluate similarities among cultivated and uncultivated samples and the discrimination among the species and, after that, the samples were used to compose three classification models using Soft Independent Modeling of Class analogy, K-Nearest Neighbor, and Partial Least Squares for Discriminant Analysis. The ability of classification models were discussed after their successful application for authenticity evaluation of 25 commercial samples of "stone breaker."

  13. How many taxa can be recognized within the complex Tillandsia capillaris (Bromeliaceae, Tillandsioideae)? Analysis of the available classifications using a multivariate approach.

    PubMed

    Castello, Lucía V; Galetto, Leonardo

    2013-01-01

    Tillandsia capillaris Ruiz & Pav., which belongs to the subgenus Diaphoranthema is distributed in Ecuador, Peru, Bolivia, northern and central Argentina, and Chile, and includes forms that are difficult to circumscribe, thus considered to form a complex. The entities of this complex are predominantly small-sized epiphytes, adapted to xeric environments. The most widely used classification defines 5 forms for this complex based on few morphological reproductive traits: Tillandsia capillaris Ruiz & Pav. f. capillaris, Tillandsia capillaris f. incana (Mez) L.B. Sm., Tillandsia capillaris f. cordobensis (Hieron.) L.B. Sm., Tillandsia capillaris f. hieronymi (Mez) L.B. Sm. and Tillandsia capillaris f. virescens (Ruiz & Pav.) L.B. Sm. In this study, 35 floral and vegetative characters were analyzed with a multivariate approach in order to assess and discuss different proposals for classification of the Tillandsia capillaris complex, which presents morphotypes that co-occur in central and northern Argentina. To accomplish this, data of quantitative and categorical morphological characters of flowers and leaves were collected from herbarium specimens and field collections and were analyzed with statistical multivariate techniques. The results suggest that the last classification for the complex seems more comprehensive and three taxa were delimited: Tillandsia capillaris (=Tillandsia capillaris f. incana-hieronymi), Tillandsia virescens s. str. (=Tillandsia capillaris f. cordobensis) and Tillandsia virescens s. l. (=Tillandsia capillaris f. virescens). While Tillandsia capillaris and Tillandsia virescens s. str. co-occur, Tillandsia virescens s. l. is restricted to altitudes above 2000 m in Argentina. Characters previously used for taxa delimitation showed continuous variation and therefore were not useful. New diagnostic characters are proposed and a key is provided for delimiting these three taxa within the complex.

  14. How many taxa can be recognized within the complex Tillandsia capillaris (Bromeliaceae, Tillandsioideae)? Analysis of the available classifications using a multivariate approach

    PubMed Central

    Castello, Lucía V.; Galetto, Leonardo

    2013-01-01

    Abstract Tillandsia capillaris Ruiz & Pav., which belongs to the subgenus Diaphoranthema is distributed in Ecuador, Peru, Bolivia, northern and central Argentina, and Chile, and includes forms that are difficult to circumscribe, thus considered to form a complex. The entities of this complex are predominantly small-sized epiphytes, adapted to xeric environments. The most widely used classification defines 5 forms for this complex based on few morphological reproductive traits: Tillandsia capillaris Ruiz & Pav. f. capillaris, Tillandsia capillaris f. incana (Mez) L.B. Sm., Tillandsia capillaris f. cordobensis (Hieron.) L.B. Sm., Tillandsia capillaris f. hieronymi (Mez) L.B. Sm. and Tillandsia capillaris f. virescens (Ruiz & Pav.) L.B. Sm. In this study, 35 floral and vegetative characters were analyzed with a multivariate approach in order to assess and discuss different proposals for classification of the Tillandsia capillaris complex, which presents morphotypes that co-occur in central and northern Argentina. To accomplish this, data of quantitative and categorical morphological characters of flowers and leaves were collected from herbarium specimens and field collections and were analyzed with statistical multivariate techniques. The results suggest that the last classification for the complex seems more comprehensive and three taxa were delimited: Tillandsia capillaris (=Tillandsia capillaris f. incana-hieronymi), Tillandsia virescens s. str. (=Tillandsia capillaris f. cordobensis) and Tillandsia virescens s. l. (=Tillandsia capillaris f. virescens). While Tillandsia capillaris and Tillandsia virescens s. str. co-occur, Tillandsia virescens s. l. is restricted to altitudes above 2000 m in Argentina. Characters previously used for taxa delimitation showed continuous variation and therefore were not useful. New diagnostic characters are proposed and a key is provided for delimiting these three taxa within the complex. PMID:23805053

  15. A Novel Acoustic Sensor Approach to Classify Seeds Based on Sound Absorption Spectra

    PubMed Central

    Gasso-Tortajada, Vicent; Ward, Alastair J.; Mansur, Hasib; Brøchner, Torben; Sørensen, Claus G.; Green, Ole

    2010-01-01

    A non-destructive and novel in situ acoustic sensor approach based on the sound absorption spectra was developed for identifying and classifying different seed types. The absorption coefficient spectra were determined by using the impedance tube measurement method. Subsequently, a multivariate statistical analysis, i.e., principal component analysis (PCA), was performed as a way to generate a classification of the seeds based on the soft independent modelling of class analogy (SIMCA) method. The results show that the sound absorption coefficient spectra of different seed types present characteristic patterns which are highly dependent on seed size and shape. In general, seed particle size and sphericity were inversely related with the absorption coefficient. PCA presented reliable grouping capabilities within the diverse seed types, since the 95% of the total spectral variance was described by the first two principal components. Furthermore, the SIMCA classification model based on the absorption spectra achieved optimal results as 100% of the evaluation samples were correctly classified. This study contains the initial structuring of an innovative method that will present new possibilities in agriculture and industry for classifying and determining physical properties of seeds and other materials. PMID:22163455

  16. Sex discrimination from the acetabulum in a twentieth-century skeletal sample from France using digital photogrammetry.

    PubMed

    Macaluso, P J

    2011-02-01

    Digital photogrammetric methods were used to collect diameter, area, and perimeter data of the acetabulum for a twentieth-century skeletal sample from France (Georges Olivier Collection, Musée de l'Homme, Paris) consisting of 46 males and 36 females. The measurements were then subjected to both discriminant function and logistic regression analyses in order to develop osteometric standards for sex assessment. Univariate discriminant functions and logistic regression equations yielded overall correct classification accuracy rates for both the left and the right acetabula ranging from 84.1% to 89.6%. The multivariate models developed in this study did not provide increased accuracy over those using only a single variable. Classification sex bias ratios ranged between 1.1% and 7.3% for the majority of models. The results of this study, therefore, demonstrate that metric analysis of acetabular size provides a highly accurate, and easily replicable, method of discriminating sex in this documented skeletal collection. The results further suggest that the addition of area and perimeter data derived from digital images may provide a more effective method of sex assessment than that offered by traditional linear measurements alone. Copyright © 2010 Elsevier GmbH. All rights reserved.

  17. Multivariate evaluation of Thyroid Imaging Reporting and Data System (TI-RADS) in diagnosis malignant thyroid nodule: application to PCA and PLS-DA analysis.

    PubMed

    Zhang, Tan; Li, Fangxuan; Mu, Jiali; Liu, Juntian; Zhang, Sheng

    2017-06-01

    To explore the significance of ultrasonic features in differential diagnosis of thyroid nodules via combining the thyroid imaging reporting and data system (TI-RADS) and multivariate statistical analysis. Patients who received surgical treatment and was diagnosed with single thyroid nodule by postoperative pathology and preoperative ultrasound were enrolled in this study. Multivariate analysis was applied to assess the significant ultrasonic features which correlated with identifying benign or malignance and grading the TI-RADS classification of thyroid nodule. There were significant differences in the nodule size, aspect ratio, internal, echogenicity, boundary, presence or absence of calcifications, calcification type and CDFI between benign and malignant thyroid nodules. Multivariate analysis showed clear-cut distinction both between benign and malignance and among different TI-RADS categories of malignancy nodules. The shape and calcification of the nodule were important factors for distinguish the benign and malignance. Height of the nodule, aspect and calcification was important factors for grading TI-RADS categories of malignancy thyroid nodules. Ill-defined boundary, irregular shape and presence of calcification related with highly malignant risk for thyroid nodule. The larger height and aspect and presence of calcification related with higher TI-RADS classification of malignancy thyroid nodule.

  18. Rapid differentiation of Listeria monocytogenes epidemic clones III and IV and their intact compared with heat-killed populations using Fourier transform infrared spectroscopy and chemometrics.

    PubMed

    Nyarko, Esmond B; Puzey, Kenneth A; Donnelly, Catherine W

    2014-06-01

    The objectives of this study were to determine if Fourier transform infrared (FT-IR) spectroscopy and multivariate statistical analysis (chemometrics) could be used to rapidly differentiate epidemic clones (ECs) of Listeria monocytogenes, as well as their intact compared with heat-killed populations. FT-IR spectra were collected from dried thin smears on infrared slides prepared from aliquots of 10 μL of each L. monocytogenes ECs (ECIII: J1-101 and R2-499; ECIV: J1-129 and J1-220), and also from intact and heat-killed cell populations of each EC strain using 250 scans at a resolution of 4 cm(-1) in the mid-infrared region in a reflectance mode. Chemometric analysis of spectra involved the application of the multivariate discriminant method for canonical variate analysis (CVA) and linear discriminant analysis (LDA). CVA of the spectra in the wavelength region 4000 to 600 cm(-1) separated the EC strains while LDA resulted in a 100% accurate classification of all spectra in the data set. Further, CVA separated intact and heat-killed cells of each EC strain and there was 100% accuracy in the classification of all spectra when LDA was applied. FT-IR spectral wavenumbers 1650 to 1390 cm(-1) were used to separate heat-killed and intact populations of L. monocytogenes. The FT-IR spectroscopy method allowed discrimination between strains that belong to the same EC. FT-IR is a highly discriminatory and reproducible method that can be used for the rapid subtyping of L. monocytogenes, as well as for the detection of live compared with dead populations of the organism. Fourier transform infrared (FT-IR) spectroscopy and multivariate statistical analysis can be used for L. monocytogenes source tracking and for clinical case isolate comparison during epidemiological investigations since the method is capable of differentiating epidemic clones and it uses a library of well-characterized strains. The FT-IR method is potentially less expensive and more rapid compared to genetic subtyping methods, and can be used for L. monocytogenes strain typing by food industries and public health agencies to enable faster response and intervention to listeriosis outbreaks. FT-IR can also be applied for routine monitoring of the pathogen in food processing plants and for investigating postprocessing contamination because it is capable of differentiating heat-killed and viable L. monocytogenes populations. © 2014 Institute of Food Technologists®

  19. Lameness detection in dairy cattle: single predictor v. multivariate analysis of image-based posture processing and behaviour and performance sensing.

    PubMed

    Van Hertem, T; Bahr, C; Schlageter Tello, A; Viazzi, S; Steensels, M; Romanini, C E B; Lokhorst, C; Maltz, E; Halachmi, I; Berckmans, D

    2016-09-01

    The objective of this study was to evaluate if a multi-sensor system (milk, activity, body posture) was a better classifier for lameness than the single-sensor-based detection models. Between September 2013 and August 2014, 3629 cow observations were collected on a commercial dairy farm in Belgium. Human locomotion scoring was used as reference for the model development and evaluation. Cow behaviour and performance was measured with existing sensors that were already present at the farm. A prototype of three-dimensional-based video recording system was used to quantify automatically the back posture of a cow. For the single predictor comparisons, a receiver operating characteristics curve was made. For the multivariate detection models, logistic regression and generalized linear mixed models (GLMM) were developed. The best lameness classification model was obtained by the multi-sensor analysis (area under the receiver operating characteristics curve (AUC)=0.757±0.029), containing a combination of milk and milking variables, activity and gait and posture variables from videos. Second, the multivariate video-based system (AUC=0.732±0.011) performed better than the multivariate milk sensors (AUC=0.604±0.026) and the multivariate behaviour sensors (AUC=0.633±0.018). The video-based system performed better than the combined behaviour and performance-based detection model (AUC=0.669±0.028), indicating that it is worthwhile to consider a video-based lameness detection system, regardless the presence of other existing sensors in the farm. The results suggest that Θ2, the feature variable for the back curvature around the hip joints, with an AUC of 0.719 is the best single predictor variable for lameness detection based on locomotion scoring. In general, this study showed that the video-based back posture monitoring system is outperforming the behaviour and performance sensing techniques for locomotion scoring-based lameness detection. A GLMM with seven specific variables (walking speed, back posture measurement, daytime activity, milk yield, lactation stage, milk peak flow rate and milk peak conductivity) is the best combination of variables for lameness classification. The accuracy on four-level lameness classification was 60.3%. The accuracy improved to 79.8% for binary lameness classification. The binary GLMM obtained a sensitivity of 68.5% and a specificity of 87.6%, which both exceed the sensitivity (52.1%±4.7%) and specificity (83.2%±2.3%) of the multi-sensor logistic regression model. This shows that the repeated measures analysis in the GLMM, taking into account the individual history of the animal, outperforms the classification when thresholds based on herd level (a statistical population) are used.

  20. Application of ¹H NMR for the characterisation and authentication of ''Tonda Gentile Trilobata" hazelnuts from Piedmont (Italy).

    PubMed

    Caligiani, Augusta; Coisson, Jean Daniel; Travaglia, Fabiano; Acquotti, Domenico; Palla, Gerardo; Palla, Luigi; Arlorio, Marco

    2014-04-01

    The Italian hazelnut (Corylus avellana L.) cultivar "Tonda Gentile Trilobata" (TGT) is covered by protected geographical indication "Nocciola Piemonte" and is well-known as the best-suited hazelnut for the industrial transformation into roasted kernel. The hazelnut cultivar identification is primarily based on morphological characteristics, so there is the need for more objective analytical methods for high quality hazelnut authentication. This study reports the (1)H NMR fingerprinting of raw and roasted hazelnut, with the aim of obtaining hazelnut classification based on their spectroscopic pattern. (1)H NMR analyses were carried out on polar extracts of TGT and other cultivars: the data were analysed with multivariate statistical methods. Results showed that (1)H NMR combined with chemometrics is useful to characterise the hazelnuts as a function of the cultivars, both on raw and roasted form. The classification models allowed identifying molecular markers useful to distinguish TGT from other types, among these trigonelline, amino acids and an unidentified orto-disubstituted aromatic compound. Copyright © 2013 Elsevier Ltd. All rights reserved.

  1. Discrimination and characterization of strawberry juice based on electronic nose and tongue: comparison of different juice processing approaches by LDA, PLSR, RF, and SVM.

    PubMed

    Qiu, Shanshan; Wang, Jun; Gao, Liping

    2014-07-09

    An electronic nose (E-nose) and an electronic tongue (E-tongue) have been used to characterize five types of strawberry juices based on processing approaches (i.e., microwave pasteurization, steam blanching, high temperature short time pasteurization, frozen-thawed, and freshly squeezed). Juice quality parameters (vitamin C, pH, total soluble solid, total acid, and sugar/acid ratio) were detected by traditional measuring methods. Multivariate statistical methods (linear discriminant analysis (LDA) and partial least squares regression (PLSR)) and neural networks (Random Forest (RF) and Support Vector Machines) were employed to qualitative classification and quantitative regression. E-tongue system reached higher accuracy rates than E-nose did, and the simultaneous utilization did have an advantage in LDA classification and PLSR regression. According to cross-validation, RF has shown outstanding and indisputable performances in the qualitative and quantitative analysis. This work indicates that the simultaneous utilization of E-nose and E-tongue can discriminate processed fruit juices and predict quality parameters successfully for the beverage industry.

  2. Screening analysis of biodiesel feedstock using UV-vis, NIR and synchronous fluorescence spectrometries and the successive projections algorithm.

    PubMed

    Insausti, Matías; Gomes, Adriano A; Cruz, Fernanda V; Pistonesi, Marcelo F; Araujo, Mario C U; Galvão, Roberto K H; Pereira, Claudete F; Band, Beatriz S F

    2012-08-15

    This paper investigates the use of UV-vis, near infrared (NIR) and synchronous fluorescence (SF) spectrometries coupled with multivariate classification methods to discriminate biodiesel samples with respect to the base oil employed in their production. More specifically, the present work extends previous studies by investigating the discrimination of corn-based biodiesel from two other biodiesel types (sunflower and soybean). Two classification methods are compared, namely full-spectrum SIMCA (soft independent modelling of class analogies) and SPA-LDA (linear discriminant analysis with variables selected by the successive projections algorithm). Regardless of the spectrometric technique employed, full-spectrum SIMCA did not provide an appropriate discrimination of the three biodiesel types. In contrast, all samples were correctly classified on the basis of a reduced number of wavelengths selected by SPA-LDA. It can be concluded that UV-vis, NIR and SF spectrometries can be successfully employed to discriminate corn-based biodiesel from the two other biodiesel types, but wavelength selection by SPA-LDA is key to the proper separation of the classes. Copyright © 2012 Elsevier B.V. All rights reserved.

  3. Impact of FAB classification on predicting outcome in acute myeloid leukemia, not otherwise specified, patients undergoing allogeneic stem cell transplantation in CR1: An analysis of 1690 patients from the acute leukemia working party of EBMT.

    PubMed

    Canaani, Jonathan; Beohou, Eric; Labopin, Myriam; Socié, Gerard; Huynh, Anne; Volin, Liisa; Cornelissen, Jan; Milpied, Noel; Gedde-Dahl, Tobias; Deconinck, Eric; Fegueux, Nathalie; Blaise, Didier; Mohty, Mohamad; Nagler, Arnon

    2017-04-01

    The French, American, and British (FAB) classification system for acute myeloid leukemia (AML) is extensively used and is incorporated into the AML, not otherwise specified (NOS) category in the 2016 WHO edition of myeloid neoplasm classification. While recent data proposes that FAB classification does not provide additional prognostic information for patients for whom NPM1 status is available, it is unknown whether FAB still retains a current prognostic role in predicting outcome of AML patients undergoing allogeneic stem cell transplantation. Using the European Society of Blood and Bone Marrow Transplantation registry we analyzed outcome of 1690 patients transplanted in CR1 to determine if FAB classification provides additional prognostic value. Multivariate analysis revealed that M6/M7 patients had decreased leukemia free survival (hazard ratio (HR) of 1.41, 95% confidence interval (CI), 1.01-1.99; P = .046) in addition to increased nonrelapse mortality (NRM) rates (HR, 1.79; 95% CI, 1.06-3.01; P = .028) compared with other FAB types. In the NPM1 wt AML, NOS cohort, FAB M6/M7 was also associated with increased NRM (HR, 2.17; 95% CI, 1.14-4.16; P = .019). Finally, in FLT3-ITD + patients, multivariate analyses revealed that specific FAB types were tightly associated with adverse outcome. In conclusion, FAB classification may predict outcome following transplantation in AML, NOS patients. © 2017 Wiley Periodicals, Inc.

  4. Comparisons of severity classification systems for oropharyngeal dysfunction in children with cerebral palsy: Relations with other functional profiles.

    PubMed

    Goh, Yu-Ra; Choi, Ja Young; Kim, Seon Ah; Park, Jieun; Park, Eun Sook

    2018-01-01

    This study aimed to investigate the relationships between various classification systems assessing the severity of oropharyngeal dysphagia and communication function and other functional profiles in children with cerebral palsy (CP). This is a prospective, cross-sectional, study in a university-affiliated, tertiary-care hospital. We recruited 151 children with CP (mean age 6.11 years, SD 3.42, range 3-18yr). The Eating and Drinking Ability Classification System (EDACS) and the dysphagia scales of Functional Oral Intake Scale (FOIS), Swallow Function Scales (SFS), and Food Intake Level Scale (FILS) were used. The Communication Function Classification System (CFCS) and Viking Speech Scale (VSS) were employed to classify communication function and speech intelligibility, respectively. The Pediatric Evaluation of Disability Inventory (PEDI) with the Gross Motor Function Classification System (GFMCS) and the Manual Ability Classification System (MACS) level were also assessed. Spearman correlation analysis to investigate the associations between measures and univariate and multivariate logistic regression models to identify significant factors were used. Median GMFCS level of participants was III (interquartile range II-IV). Significant dysphagia based on EDACS level III-V was noted in 23 children (15.2%). There were strong to very strong relationships between the EDACS level with the dysphagia scales. The EDACS presented strong associations with MACS, CFCS, and VSS, a moderate association with GMFCS level, and a moderate to strong association with each domain of the PEDI. In multivariate analysis, poor functioning in EDACS were associated with poor functioning in gross motor and communication functions. Copyright © 2017. Published by Elsevier Ltd.

  5. Optimal mapping of site-specific multivariate soil properties.

    PubMed

    Burrough, P A; Swindell, J

    1997-01-01

    This paper demonstrates how geostatistics and fuzzy k-means classification can be used together to improve our practical understanding of crop yield-site response. Two aspects of soil are important for precision farming: (a) sensible classes for a given crop, and (b) their spatial variation. Local site classifications are more sensitive than general taxonomies and can be provided by the method of fuzzy k-means to transform a multivariate data set with i attributes measured at n sites into k overlapping classes; each site has a membership value mk for each class in the range 0-1. Soil variation is of interest when conditions vary over patches manageable by agricultural machinery. The spatial variation of each of the k classes can be analysed by computing the variograms of mk over the n sites. Memberships for each of the k classes can be mapped by ordinary kriging. Areas of class dominance and the transition zones between them can be identified by an inter-class confusion index; reducing the zones to boundaries gives crisp maps of dominant soil groups that can be used to guide precision farming equipment. Automation of the procedure is straightforward given sufficient data. Time variations in soil properties can be automatically incorporated in the computation of membership values. The procedures are illustrated with multi-year crop yield data collected from a 5 ha demonstration field at the Royal Agricultural College in Cirencester, UK.

  6. Delineation of estuarine management areas using multivariate geostatistics: the case of Sado Estuary.

    PubMed

    Caeiro, Sandra; Goovaerts, Pierre; Painho, Marco; Costa, M Helena

    2003-09-15

    The Sado Estuary is a coastal zone located in the south of Portugal where conflicts between conservation and development exist because of its location near industrialized urban zones and its designation as a natural reserve. The aim of this paper is to evaluate a set of multivariate geostatistical approaches to delineate spatially contiguous regions of sediment structure for Sado Estuary. These areas will be the supporting infrastructure of an environmental management system for this estuary. The boundaries of each homogeneous area were derived from three sediment characterization attributes through three different approaches: (1) cluster analysis of dissimilarity matrix function of geographical separation followed by indicator kriging of the cluster data, (2) discriminant analysis of kriged values of the three sediment attributes, and (3) a combination of methods 1 and 2. Final maximum likelihood classification was integrated into a geographical information system. All methods generated fairly spatially contiguous management areas that reproduce well the environment of the estuary. Map comparison techniques based on kappa statistics showed thatthe resultant three maps are similar, supporting the choice of any of the methods as appropriate for management of the Sado Estuary. However, the results of method 1 seem to be in better agreement with estuary behavior, assessment of contamination sources, and previous work conducted at this site.

  7. Accuracy assessments and areal estimates using two-phase stratified random sampling, cluster plots, and the multivariate composite estimator

    Treesearch

    Raymond L. Czaplewski

    2000-01-01

    Consider the following example of an accuracy assessment. Landsat data are used to build a thematic map of land cover for a multicounty region. The map classifier (e.g., a supervised classification algorithm) assigns each pixel into one category of land cover. The classification system includes 12 different types of forest and land cover: black spruce, balsam fir,...

  8. Risk prediction for myocardial infarction via generalized functional regression models.

    PubMed

    Ieva, Francesca; Paganoni, Anna M

    2016-08-01

    In this paper, we propose a generalized functional linear regression model for a binary outcome indicating the presence/absence of a cardiac disease with multivariate functional data among the relevant predictors. In particular, the motivating aim is the analysis of electrocardiographic traces of patients whose pre-hospital electrocardiogram (ECG) has been sent to 118 Dispatch Center of Milan (the Italian free-toll number for emergencies) by life support personnel of the basic rescue units. The statistical analysis starts with a preprocessing of ECGs treated as multivariate functional data. The signals are reconstructed from noisy observations. The biological variability is then removed by a nonlinear registration procedure based on landmarks. Thus, in order to perform a data-driven dimensional reduction, a multivariate functional principal component analysis is carried out on the variance-covariance matrix of the reconstructed and registered ECGs and their first derivatives. We use the scores of the Principal Components decomposition as covariates in a generalized linear model to predict the presence of the disease in a new patient. Hence, a new semi-automatic diagnostic procedure is proposed to estimate the risk of infarction (in the case of interest, the probability of being affected by Left Bundle Brunch Block). The performance of this classification method is evaluated and compared with other methods proposed in literature. Finally, the robustness of the procedure is checked via leave-j-out techniques. © The Author(s) 2013.

  9. (13)C NMR pattern recognition techniques for the classification of Atlantic salmon (Salmo salar L.) according to their wild, farmed, and geographical origin.

    PubMed

    Aursand, Marit; Standal, Inger B; Praël, Angelika; McEvoy, Lesley; Irvine, Joe; Axelson, David E

    2009-05-13

    (13)C nuclear magnetic resonance (NMR) in combination with multivariate data analysis was used to (1) discriminate between farmed and wild Atlantic salmon ( Salmo salar L.), (2) discriminate between different geographical origins, and (3) verify the origin of market samples. Muscle lipids from 195 Atlantic salmon of known origin (wild and farmed salmon from Norway, Scotland, Canada, Iceland, Ireland, the Faroes, and Tasmania) in addition to market samples were analyzed by (13)C NMR spectroscopy and multivariate analysis. Both probabilistic neural networks (PNN) and support vector machines (SVM) provided excellent discrimination (98.5 and 100.0%, respectively) between wild and farmed salmon. Discrimination with respect to geographical origin was somewhat more difficult, with correct classification rates ranging from 82.2 to 99.3% by PNN and SVM, respectively. In the analysis of market samples, five fish labeled and purchased as wild salmon were classified as farmed salmon (indicating mislabeling), and there were also some discrepancies between the classification and the product declaration with regard to geographical origin.

  10. Investigation of production method, geographical origin and species authentication in commercially relevant shrimps using stable isotope ratio and/or multi-element analyses combined with chemometrics: an exploratory analysis.

    PubMed

    Ortea, Ignacio; Gallardo, José M

    2015-03-01

    Three factors defining the traceability of a food product are production method (wild or farmed), geographical origin and biological species, which have to be checked and guaranteed, not only in order to avoid mislabelling and commercial fraud, but also to address food safety issues and to comply with legal regulations. The aim of this study was to determine whether these three factors could be differentiated in shrimps using stable isotope ratio analysis of carbon and nitrogen and/or multi-element composition. Different multivariate statistics methods were applied to different data subsets in order to evaluate their performance in terms of classification or predictive ability. Although the success rates varied depending on the dataset used, the combination of both techniques allowed the correct classification of 100% of the samples according to their actual origin and method of production, and 93.5% according to biological species. Even though further studies including a larger number of samples in each group are needed in order to validate these findings, we can conclude that these methodologies should be considered for studies regarding seafood product authenticity. Copyright © 2014 Elsevier Ltd. All rights reserved.

  11. Classification of Brazilian and foreign gasolines adulterated with alcohol using infrared spectroscopy.

    PubMed

    da Silva, Neirivaldo C; Pimentel, Maria Fernanda; Honorato, Ricardo S; Talhavini, Marcio; Maldaner, Adriano O; Honorato, Fernanda A

    2015-08-01

    The smuggling of products across the border regions of many countries is a practice to be fought. Brazilian authorities are increasingly worried about the illicit trade of fuels along the frontiers of the country. In order to confirm this as a crime, the Federal Police must have a means of identifying the origin of the fuel. This work describes the development of a rapid and nondestructive methodology to classify gasoline as to its origin (Brazil, Venezuela and Peru), using infrared spectroscopy and multivariate classification. Partial Least Squares Discriminant Analysis (PLS-DA) and Soft Independent Modeling Class Analogy (SIMCA) models were built. Direct standardization (DS) was employed aiming to standardize the spectra obtained in different laboratories of the border units of the Federal Police. Two approaches were considered in this work: (1) local and (2) global classification models. When using Approach 1, the PLS-DA achieved 100% correct classification, and the deviation of the predicted values for the secondary instrument considerably decreased after performing DS. In this case, SIMCA models were not efficient in the classification, even after standardization. Using a global model (Approach 2), both PLS-DA and SIMCA techniques were effective after performing DS. Considering that real situations may involve questioned samples from other nations (such as Peru), the SIMCA method developed according to Approach 2 is a more adequate, since the sample will be classified neither as Brazil nor Venezuelan. This methodology could be applied to other forensic problems involving the chemical classification of a product, provided that a specific modeling is performed. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  12. Probability of identification: adulteration of American Ginseng with Asian Ginseng.

    PubMed

    Harnly, James; Chen, Pei; Harrington, Peter De B

    2013-01-01

    The AOAC INTERNATIONAL guidelines for validation of botanical identification methods were applied to the detection of Asian Ginseng [Panax ginseng (PG)] as an adulterant for American Ginseng [P. quinquefolius (PQ)] using spectral fingerprints obtained by flow injection mass spectrometry (FIMS). Samples of 100% PQ and 100% PG were physically mixed to provide 90, 80, and 50% PQ. The multivariate FIMS fingerprint data were analyzed using soft independent modeling of class analogy (SIMCA) based on 100% PQ. The Q statistic, a measure of the degree of non-fit of the test samples with the calibration model, was used as the analytical parameter. FIMS was able to discriminate between 100% PQ and 100% PG, and between 100% PQ and 90, 80, and 50% PQ. The probability of identification (POI) curve was estimated based on the SD of 90% PQ. A digital model of adulteration, obtained by mathematically summing the experimentally acquired spectra of 100% PQ and 100% PG in the desired ratios, agreed well with the physical data and provided an easy and more accurate method for constructing the POI curve. Two chemometric modeling methods, SIMCA and fuzzy optimal associative memories, and two classification methods, partial least squares-discriminant analysis and fuzzy rule-building expert systems, were applied to the data. The modeling methods correctly identified the adulterated samples; the classification methods did not.

  13. Geographic identification of Boletus mushrooms by data fusion of FT-IR and UV spectroscopies combined with multivariate statistical analysis

    NASA Astrophysics Data System (ADS)

    Yao, Sen; Li, Tao; Li, JieQing; Liu, HongGao; Wang, YuanZhong

    2018-06-01

    Boletus griseus and Boletus edulis are two well-known wild-grown edible mushrooms which have high nutrition, delicious flavor and high economic value distributing in Yunnan Province. In this study, a rapid method using Fourier transform infrared (FT-IR) and ultraviolet (UV) spectroscopies coupled with data fusion was established for the discrimination of Boletus mushrooms from seven different geographical origins with pattern recognition method. Initially, the spectra of 332 mushroom samples obtained from the two spectroscopic techniques were analyzed individually and then the classification performance based on data fusion strategy was investigated. Meanwhile, the latent variables (LVs) of FT-IR and UV spectra were extracted by partial least square discriminant analysis (PLS-DA) and two datasets were concatenated into a new matrix for data fusion. Then, the fusion matrix was further analyzed by support vector machine (SVM). Compared with single spectroscopic technique, data fusion strategy can improve the classification performance effectively. In particular, the accuracy of correct classification of SVM model in training and test sets were 99.10% and 100.00%, respectively. The results demonstrated that data fusion of FT-IR and UV spectra can provide higher synergic effect for the discrimination of different geographical origins of Boletus mushrooms, which may be benefit for further authentication and quality assessment of edible mushrooms.

  14. Optimal statistical damage detection and classification in an experimental wind turbine blade using minimum instrumentation

    NASA Astrophysics Data System (ADS)

    Hoell, Simon; Omenzetter, Piotr

    2017-04-01

    The increasing demand for carbon neutral energy in a challenging economic environment is a driving factor for erecting ever larger wind turbines in harsh environments using novel wind turbine blade (WTBs) designs characterized by high flexibilities and lower buckling capacities. To counteract resulting increasing of operation and maintenance costs, efficient structural health monitoring systems can be employed to prevent dramatic failures and to schedule maintenance actions according to the true structural state. This paper presents a novel methodology for classifying structural damages using vibrational responses from a single sensor. The method is based on statistical classification using Bayes' theorem and an advanced statistic, which allows controlling the performance by varying the number of samples which represent the current state. This is done for multivariate damage sensitive features defined as partial autocorrelation coefficients (PACCs) estimated from vibrational responses and principal component analysis scores from PACCs. Additionally, optimal DSFs are composed not only for damage classification but also for damage detection based on binary statistical hypothesis testing, where features selections are found with a fast forward procedure. The method is applied to laboratory experiments with a small scale WTB with wind-like excitation and non-destructive damage scenarios. The obtained results demonstrate the advantages of the proposed procedure and are promising for future applications of vibration-based structural health monitoring in WTBs.

  15. Geographic identification of Boletus mushrooms by data fusion of FT-IR and UV spectroscopies combined with multivariate statistical analysis.

    PubMed

    Yao, Sen; Li, Tao; Li, JieQing; Liu, HongGao; Wang, YuanZhong

    2018-06-05

    Boletus griseus and Boletus edulis are two well-known wild-grown edible mushrooms which have high nutrition, delicious flavor and high economic value distributing in Yunnan Province. In this study, a rapid method using Fourier transform infrared (FT-IR) and ultraviolet (UV) spectroscopies coupled with data fusion was established for the discrimination of Boletus mushrooms from seven different geographical origins with pattern recognition method. Initially, the spectra of 332 mushroom samples obtained from the two spectroscopic techniques were analyzed individually and then the classification performance based on data fusion strategy was investigated. Meanwhile, the latent variables (LVs) of FT-IR and UV spectra were extracted by partial least square discriminant analysis (PLS-DA) and two datasets were concatenated into a new matrix for data fusion. Then, the fusion matrix was further analyzed by support vector machine (SVM). Compared with single spectroscopic technique, data fusion strategy can improve the classification performance effectively. In particular, the accuracy of correct classification of SVM model in training and test sets were 99.10% and 100.00%, respectively. The results demonstrated that data fusion of FT-IR and UV spectra can provide higher synergic effect for the discrimination of different geographical origins of Boletus mushrooms, which may be benefit for further authentication and quality assessment of edible mushrooms. Copyright © 2018 Elsevier B.V. All rights reserved.

  16. LSST Astroinformatics And Astrostatistics: Data-oriented Astronomical Research

    NASA Astrophysics Data System (ADS)

    Borne, Kirk D.; Stassun, K.; Brunner, R. J.; Djorgovski, S. G.; Graham, M.; Hakkila, J.; Mahabal, A.; Paegert, M.; Pesenson, M.; Ptak, A.; Scargle, J.; Informatics, LSST; Statistics Team

    2011-01-01

    The LSST Informatics and Statistics Science Collaboration (ISSC) focuses on research and scientific discovery challenges posed by the very large and complex data collection that LSST will generate. Application areas include astroinformatics, machine learning, data mining, astrostatistics, visualization, scientific data semantics, time series analysis, and advanced signal processing. Research problems to be addressed with these methodologies include transient event characterization and classification, rare class discovery, correlation mining, outlier/anomaly/surprise detection, improved estimators (e.g., for photometric redshift or early onset supernova classification), exploration of highly dimensional (multivariate) data catalogs, and more. We present sample science results from these data-oriented approaches to large-data astronomical research. We present results from LSST ISSC team members, including the EB (Eclipsing Binary) Factory, the environmental variations in the fundamental plane of elliptical galaxies, and outlier detection in multivariate catalogs.

  17. Multivariate analysis of volatile compounds detected by headspace solid-phase microextraction/gas chromatography: A tool for sensory classification of cork stoppers.

    PubMed

    Prat, Chantal; Besalú, Emili; Bañeras, Lluís; Anticó, Enriqueta

    2011-06-15

    The volatile fraction of aqueous cork macerates of tainted and non-tainted agglomerate cork stoppers was analysed by headspace solid-phase microextraction (HS-SPME)/gas chromatography. Twenty compounds containing terpenoids, aliphatic alcohols, lignin-related compounds and others were selected and analysed in individual corks. Cork stoppers were previously classified in six different classes according to sensory descriptions including, 2,4,6-trichloroanisole taint and other frequent, non-characteristic odours found in cork. A multivariate analysis of the chromatographic data of 20 selected chemical compounds using linear discriminant analysis models helped in the differentiation of the a priori made groups. The discriminant model selected five compounds as the best combination. Selected compounds appear in the model in the following order; 2,4,6 TCA, fenchyl alcohol, 1-octen-3-ol, benzyl alcohol and benzothiazole. Unfortunately, not all six a priori differentiated sensory classes were clearly discriminated in the model, probably indicating that no measurable differences exist in the chromatographic data for some categories. The predictive analyses of a refined model in which two sensory classes were fused together resulted in a good classification. Prediction rates of control (non-tainted), TCA, musty-earthy-vegetative, vegetative and chemical descriptions were 100%, 100%, 85%, 67.3% and 100%, respectively, when the modified model was used. The multivariate analysis of chromatographic data will help in the classification of stoppers and provide a perfect complement to sensory analyses. Copyright © 2010 Elsevier Ltd. All rights reserved.

  18. Plurigon: three dimensional visualization and classification of high-dimensionality data

    PubMed Central

    Martin, Bronwen; Chen, Hongyu; Daimon, Caitlin M.; Chadwick, Wayne; Siddiqui, Sana; Maudsley, Stuart

    2013-01-01

    High-dimensionality data is rapidly becoming the norm for biomedical sciences and many other analytical disciplines. Not only is the collection and processing time for such data becoming problematic, but it has become increasingly difficult to form a comprehensive appreciation of high-dimensionality data. Though data analysis methods for coping with multivariate data are well-documented in technical fields such as computer science, little effort is currently being expended to condense data vectors that exist beyond the realm of physical space into an easily interpretable and aesthetic form. To address this important need, we have developed Plurigon, a data visualization and classification tool for the integration of high-dimensionality visualization algorithms with a user-friendly, interactive graphical interface. Unlike existing data visualization methods, which are focused on an ensemble of data points, Plurigon places a strong emphasis upon the visualization of a single data point and its determining characteristics. Multivariate data vectors are represented in the form of a deformed sphere with a distinct topology of hills, valleys, plateaus, peaks, and crevices. The gestalt structure of the resultant Plurigon object generates an easily-appreciable model. User interaction with the Plurigon is extensive; zoom, rotation, axial and vector display, feature extraction, and anaglyph stereoscopy are currently supported. With Plurigon and its ability to analyze high-complexity data, we hope to see a unification of biomedical and computational sciences as well as practical applications in a wide array of scientific disciplines. Increased accessibility to the analysis of high-dimensionality data may increase the number of new discoveries and breakthroughs, ranging from drug screening to disease diagnosis to medical literature mining. PMID:23885241

  19. Delay differential analysis of time series.

    PubMed

    Lainscsek, Claudia; Sejnowski, Terrence J

    2015-03-01

    Nonlinear dynamical system analysis based on embedding theory has been used for modeling and prediction, but it also has applications to signal detection and classification of time series. An embedding creates a multidimensional geometrical object from a single time series. Traditionally either delay or derivative embeddings have been used. The delay embedding is composed of delayed versions of the signal, and the derivative embedding is composed of successive derivatives of the signal. The delay embedding has been extended to nonuniform embeddings to take multiple timescales into account. Both embeddings provide information on the underlying dynamical system without having direct access to all the system variables. Delay differential analysis is based on functional embeddings, a combination of the derivative embedding with nonuniform delay embeddings. Small delay differential equation (DDE) models that best represent relevant dynamic features of time series data are selected from a pool of candidate models for detection or classification. We show that the properties of DDEs support spectral analysis in the time domain where nonlinear correlation functions are used to detect frequencies, frequency and phase couplings, and bispectra. These can be efficiently computed with short time windows and are robust to noise. For frequency analysis, this framework is a multivariate extension of discrete Fourier transform (DFT), and for higher-order spectra, it is a linear and multivariate alternative to multidimensional fast Fourier transform of multidimensional correlations. This method can be applied to short or sparse time series and can be extended to cross-trial and cross-channel spectra if multiple short data segments of the same experiment are available. Together, this time-domain toolbox provides higher temporal resolution, increased frequency and phase coupling information, and it allows an easy and straightforward implementation of higher-order spectra across time compared with frequency-based methods such as the DFT and cross-spectral analysis.

  20. Design of Embedded System for Multivariate Classification of Finger and Thumb Movements Using EEG Signals for Control of Upper Limb Prosthesis

    PubMed Central

    Javed, Amna; Tiwana, Mohsin I.; Khan, Umar Shahbaz

    2018-01-01

    Brain Computer Interface (BCI) determines the intent of the user from a variety of electrophysiological signals. These signals, Slow Cortical Potentials, are recorded from scalp, and cortical neuronal activity is recorded by implanted electrodes. This paper is focused on design of an embedded system that is used to control the finger movements of an upper limb prosthesis using Electroencephalogram (EEG) signals. This is a follow-up of our previous research which explored the best method to classify three movements of fingers (thumb movement, index finger movement, and first movement). Two-stage logistic regression classifier exhibited the highest classification accuracy while Power Spectral Density (PSD) was used as a feature of the filtered signal. The EEG signal data set was recorded using a 14-channel electrode headset (a noninvasive BCI system) from right-handed, neurologically intact volunteers. Mu (commonly known as alpha waves) and Beta Rhythms (8–30 Hz) containing most of the movement data were retained through filtering using “Arduino Uno” microcontroller followed by 2-stage logistic regression to obtain a mean classification accuracy of 70%. PMID:29888252

  1. All Rural Places Are Not Created Equal: Revisiting the Rural Mortality Penalty in the United States

    PubMed Central

    2014-01-01

    Objectives. I investigated mortality disparities between urban and rural areas by measuring disparities in urban US areas compared with 6 rural classifications, ranging from suburban to remote locales. Methods. Data from the Compressed Mortality File, National Center for Health Statistics, from 1968 to 2007, was used to calculate age-adjusted mortality rates for all rural and urban regions by year. Criteria measuring disparity between regions included excess deaths, annual rate of change in mortality, and proportion of excess deaths by population size. I used multivariable analysis to test for differences in determinants across regions. Results. The rural mortality penalty existed in all rural classifications, but the degree of disparity varied considerably. Rural–urban continuum code 6 was highly disadvantaged, and rural–urban continuum code 9 displayed a favorable mortality profile. Population, socioeconomic, and health care determinants of mortality varied across regions. Conclusions. A 2-decade long trend in mortality disparities existed in all rural classifications, but the penalty was not distributed evenly. This constitutes an important public health problem. Research should target the slow rates of improvement in mortality in the rural United States as an area of concern. PMID:25211763

  2. Study of the questionnaire of the Polytechnic University of Valencia (UPV) teaching staff, using students opinion survey. Statistical treatment

    NASA Astrophysics Data System (ADS)

    Martinez Gomez, Monica

    Quality improvement of university institutions represents the most important challenge in the next years, and the potential tool to achieve it is based on the institutional evaluation in general, and specially the evaluation of the teaching performance. The opinion questionnaire from the students is the most generalised tool used to evaluate the teaching performance at Spanish universities. The general objective of this thesis is to develop a statistical methodology suitable to extract, analyse and interpret the information contained in the Questionnaire of Teaching Evaluation from Student Opinion (CEDA) of the UPV, aimed at optimising its practical use. The study is centred in the application of different multivariate techniques and has been structured in three parts: (1) Evaluation of the reliability, validity and dimensionality of the tool. The multivariate method used for this purpose is the Factorial Analysis. (2) Determination of the capacity of the questionnaire to identify different profiles of lecturers based on the quality perceived by students. This target is conducted with different multivariate classification techniques: hierarchical cluster analysis, non-hierarchical and two-stage analysis. Moreover, those items that best discriminate among the teaching typologies obtained are identified in the questionnaire. (3) Identification of the teaching typologies according to different descriptive characteristics referent to the subject and lecturer, with the use of decision trees. Once identified these typologies, a new discriminant analysis is conducted aimed at identifying those items that best characterise each typology. Finally, a study is carried out with the classification method SIMCA (Soft Independent Modelling of Class Analogy) in order to determine the discriminant loading of every item among the identified teaching typologies, allowing the identification of those that best distinguish the different classes obtained. With the combined use of the proposed techniques, it is expected to optimise the use of CEDA as a measuring tool and an indicator of the teaching quality at the university, that would allow the introduction of actions for the continuous improvement in the teaching processes of the UPV.

  3. Multivariate Approach for Alzheimer's Disease Detection Using Stationary Wavelet Entropy and Predator-Prey Particle Swarm Optimization.

    PubMed

    Zhang, Yudong; Wang, Shuihua; Sui, Yuxiu; Yang, Ming; Liu, Bin; Cheng, Hong; Sun, Junding; Jia, Wenjuan; Phillips, Preetha; Gorriz, Juan Manuel

    2017-07-17

    The number of patients with Alzheimer's disease is increasing rapidly every year. Scholars often use computer vision and machine learning methods to develop an automatic diagnosis system. In this study, we developed a novel machine learning system that can make diagnoses automatically from brain magnetic resonance images. First, the brain imaging was processed, including skull stripping and spatial normalization. Second, one axial slice was selected from the volumetric image, and stationary wavelet entropy (SWE) was done to extract the texture features. Third, a single-hidden-layer neural network was used as the classifier. Finally, a predator-prey particle swarm optimization was proposed to train the weights and biases of the classifier. Our method used 4-level decomposition and yielded 13 SWE features. The classification yielded an overall accuracy of 92.73±1.03%, a sensitivity of 92.69±1.29%, and a specificity of 92.78±1.51%. The area under the curve is 0.95±0.02. Additionally, this method only cost 0.88 s to identify a subject in online stage, after its volumetric image is preprocessed. In terms of classification performance, our method performs better than 10 state-of-the-art approaches and the performance of human observers. Therefore, this proposed method is effective in the detection of Alzheimer's disease.

  4. Discrimination between Alzheimer's Disease and Late Onset Bipolar Disorder Using Multivariate Analysis.

    PubMed

    Besga, Ariadna; Gonzalez, Itxaso; Echeburua, Enrique; Savio, Alexandre; Ayerdi, Borja; Chyzhyk, Darya; Madrigal, Jose L M; Leza, Juan C; Graña, Manuel; Gonzalez-Pinto, Ana Maria

    2015-01-01

    Late onset bipolar disorder (LOBD) is often difficult to distinguish from degenerative dementias, such as Alzheimer disease (AD), due to comorbidities and common cognitive symptoms. Moreover, LOBD prevalence in the elder population is not negligible and it is increasing. Both pathologies share pathophysiological neuroinflammation features. Improvements in differential diagnosis of LOBD and AD will help to select the best personalized treatment. The aim of this study is to assess the relative significance of clinical observations, neuropsychological tests, and specific blood plasma biomarkers (inflammatory and neurotrophic), separately and combined, in the differential diagnosis of LOBD versus AD. It was carried out evaluating the accuracy achieved by classification-based computer-aided diagnosis (CAD) systems based on these variables. A sample of healthy controls (HC) (n = 26), AD patients (n = 37), and LOBD patients (n = 32) was recruited at the Alava University Hospital. Clinical observations, neuropsychological tests, and plasma biomarkers were measured at recruitment time. We applied multivariate machine learning classification methods to discriminate subjects from HC, AD, and LOBD populations in the study. We analyzed, for each classification contrast, feature sets combining clinical observations, neuropsychological measures, and biological markers, including inflammation biomarkers. Furthermore, we analyzed reduced feature sets containing variables with significative differences determined by a Welch's t-test. Furthermore, a battery of classifier architectures were applied, encompassing linear and non-linear Support Vector Machines (SVM), Random Forests (RF), Classification and regression trees (CART), and their performance was evaluated in a leave-one-out (LOO) cross-validation scheme. Post hoc analysis of Gini index in CART classifiers provided a measure of each variable importance. Welch's t-test found one biomarker (Malondialdehyde) with significative differences (p < 0.001) in LOBD vs. AD contrast. Classification results with the best features are as follows: discrimination of HC vs. AD patients reaches accuracy 97.21% and AUC 98.17%. Discrimination of LOBD vs. AD patients reaches accuracy 90.26% and AUC 89.57%. Discrimination of HC vs LOBD patients achieves accuracy 95.76% and AUC 88.46%. It is feasible to build CAD systems for differential diagnosis of LOBD and AD on the basis of a reduced set of clinical variables. Clinical observations provide the greatest discrimination. Neuropsychological tests are improved by the addition of biomarkers, and both contribute significantly to improve the overall predictive performance.

  5. Improved classification accuracy in 1- and 2-dimensional NMR metabolomics data using the variance stabilising generalised logarithm transformation

    PubMed Central

    Parsons, Helen M; Ludwig, Christian; Günther, Ulrich L; Viant, Mark R

    2007-01-01

    Background Classifying nuclear magnetic resonance (NMR) spectra is a crucial step in many metabolomics experiments. Since several multivariate classification techniques depend upon the variance of the data, it is important to first minimise any contribution from unwanted technical variance arising from sample preparation and analytical measurements, and thereby maximise any contribution from wanted biological variance between different classes. The generalised logarithm (glog) transform was developed to stabilise the variance in DNA microarray datasets, but has rarely been applied to metabolomics data. In particular, it has not been rigorously evaluated against other scaling techniques used in metabolomics, nor tested on all forms of NMR spectra including 1-dimensional (1D) 1H, projections of 2D 1H, 1H J-resolved (pJRES), and intact 2D J-resolved (JRES). Results Here, the effects of the glog transform are compared against two commonly used variance stabilising techniques, autoscaling and Pareto scaling, as well as unscaled data. The four methods are evaluated in terms of the effects on the variance of NMR metabolomics data and on the classification accuracy following multivariate analysis, the latter achieved using principal component analysis followed by linear discriminant analysis. For two of three datasets analysed, classification accuracies were highest following glog transformation: 100% accuracy for discriminating 1D NMR spectra of hypoxic and normoxic invertebrate muscle, and 100% accuracy for discriminating 2D JRES spectra of fish livers sampled from two rivers. For the third dataset, pJRES spectra of urine from two breeds of dog, the glog transform and autoscaling achieved equal highest accuracies. Additionally we extended the glog algorithm to effectively suppress noise, which proved critical for the analysis of 2D JRES spectra. Conclusion We have demonstrated that the glog and extended glog transforms stabilise the technical variance in NMR metabolomics datasets. This significantly improves the discrimination between sample classes and has resulted in higher classification accuracies compared to unscaled, autoscaled or Pareto scaled data. Additionally we have confirmed the broad applicability of the glog approach using three disparate datasets from different biological samples using 1D NMR spectra, 1D projections of 2D JRES spectra, and intact 2D JRES spectra. PMID:17605789

  6. Physicochemical properties of honey from Marche, Central Italy: classification of unifloral and multifloral honeys by multivariate analysis.

    PubMed

    Truzzi, Cristina; Illuminati, Silvia; Annibaldia, Anna; Finale, Carolina; Rossetti, Monica; Scarponi, Giuseppe

    2014-11-01

    The purpose of this study was the physicochemical characterization and classification of Italian honey from Marche Region with a chemometric approach. A total of 135 honeys of different botanical origins [acacia (Robinia pseudoacacia L.), chestnut (Castanea sativa), coriander (Coriandrum sativum L.), lime (Tilia spp.), sunflower (Helianthus annuus L.), Metcalfa honeydew and multifloral honey] were considered. The average results of electrical conductivity (0.14-1.45 mS cm(-1)), pH (3.89-5.42), free acidity (10.9-39.0 meq(NaOH) kg(-1)), lactones (2.4-4.5 meq(NaOH) kg(-1)), total acidity (14.5-40.9 meq(NaOH) kg(-1)), proline (229-665 mg kg(-1)) and 5-(hydroxy-methyl)-2-furaldehyde (0.6-3.9 mg kg(-1)) content show wide variability among the analysed honey types, with statistically significant differences between the different honey types. Pattern recognition methods such as principal component analysis and discriminant analysis were performed in order to find a relationship between variables and types of honey and to classify honey on the basis of its physicochemical properties. The variables of electrical conductivity, acidity (free, lactones), pH and proline content exhibited higher discriminant power and provided enough information for the classification and distinction of unifloral honey types, but not for the classification of multifloral honey (100% and 85% of samples correctly classified, respectively).

  7. Predict or classify: The deceptive role of time-locking in brain signal classification

    NASA Astrophysics Data System (ADS)

    Rusconi, Marco; Valleriani, Angelo

    2016-06-01

    Several experimental studies claim to be able to predict the outcome of simple decisions from brain signals measured before subjects are aware of their decision. Often, these studies use multivariate pattern recognition methods with the underlying assumption that the ability to classify the brain signal is equivalent to predict the decision itself. Here we show instead that it is possible to correctly classify a signal even if it does not contain any predictive information about the decision. We first define a simple stochastic model that mimics the random decision process between two equivalent alternatives, and generate a large number of independent trials that contain no choice-predictive information. The trials are first time-locked to the time point of the final event and then classified using standard machine-learning techniques. The resulting classification accuracy is above chance level long before the time point of time-locking. We then analyze the same trials using information theory. We demonstrate that the high classification accuracy is a consequence of time-locking and that its time behavior is simply related to the large relaxation time of the process. We conclude that when time-locking is a crucial step in the analysis of neural activity patterns, both the emergence and the timing of the classification accuracy are affected by structural properties of the network that generates the signal.

  8. Classification and source determination of medium petroleum distillates by chemometric and artificial neural networks: a self organizing feature approach.

    PubMed

    Mat-Desa, Wan N S; Ismail, Dzulkiflee; NicDaeid, Niamh

    2011-10-15

    Three different medium petroleum distillate (MPD) products (white spirit, paint brush cleaner, and lamp oil) were purchased from commercial stores in Glasgow, Scotland. Samples of 10, 25, 50, 75, 90, and 95% evaporated product were prepared, resulting in 56 samples in total which were analyzed using gas chromatography-mass spectrometry. Data sets from the chromatographic patterns were examined and preprocessed for unsupervised multivariate analyses using principal component analysis (PCA), hierarchical cluster analysis (HCA), and a self organizing feature map (SOFM) artificial neural network. It was revealed that data sets comprised of higher boiling point hydrocarbon compounds provided a good means for the classification of the samples and successfully linked highly weathered samples back to their unevaporated counterpart in every case. The classification abilities of SOFM were further tested and validated for their predictive abilities where one set of weather data in each case was withdrawn from the sample set and used as a test set of the retrained network. This revealed SOFM to be an outstanding mechanism for sample discrimination and linkage over the more conventional PCA and HCA methods often suggested for such data analysis. SOFM also has the advantage of providing additional information through the evaluation of component planes facilitating the investigation of underlying variables that account for the classification. © 2011 American Chemical Society

  9. Data mining for water resource management part 2 - methods and approaches to solving contemporary problems

    USGS Publications Warehouse

    Roehl, Edwin A.; Conrads, Paul

    2010-01-01

    This is the second of two papers that describe how data mining can aid natural-resource managers with the difficult problem of controlling the interactions between hydrologic and man-made systems. Data mining is a new science that assists scientists in converting large databases into knowledge, and is uniquely able to leverage the large amounts of real-time, multivariate data now being collected for hydrologic systems. Part 1 gives a high-level overview of data mining, and describes several applications that have addressed major water resource issues in South Carolina. This Part 2 paper describes how various data mining methods are integrated to produce predictive models for controlling surface- and groundwater hydraulics and quality. The methods include: - signal processing to remove noise and decompose complex signals into simpler components; - time series clustering that optimally groups hundreds of signals into "classes" that behave similarly for data reduction and (or) divide-and-conquer problem solving; - classification which optimally matches new data to behavioral classes; - artificial neural networks which optimally fit multivariate data to create predictive models; - model response surface visualization that greatly aids in understanding data and physical processes; and, - decision support systems that integrate data, models, and graphics into a single package that is easy to use.

  10. Multivariate Classification of Original and Fake Perfumes by Ion Analysis and Ethanol Content.

    PubMed

    Gomes, Clêrton L; de Lima, Ari Clecius A; Loiola, Adonay R; da Silva, Abel B R; Cândido, Manuela C L; Nascimento, Ronaldo F

    2016-07-01

    The increased marketing of fake perfumes has encouraged us to investigate how to identify such products by their chemical characteristics and multivariate analysis. The aim of this study was to present an alternative approach to distinguish original from fake perfumes by means of the investigation of sodium, potassium, chloride ions, and ethanol contents by chemometric tools. For this, 50 perfumes were used (25 original and 25 counterfeit) for the analysis of ions (ion chromatography) and ethanol (gas chromatography). The results demonstrated that the fake perfume had low levels of ethanol and high levels of chloride compared to the original product. The data were treated by chemometric tools such as principal component analysis and linear discriminant analysis. This study proved that the analysis of ethanol is an effective method of distinguishing original from the fake products, and it may potentially be used to assist legal authorities in such cases. © 2016 American Academy of Forensic Sciences.

  11. Discrimination of whisky brands and counterfeit identification by UV-Vis spectroscopy and multivariate data analysis.

    PubMed

    Martins, Angélica Rocha; Talhavini, Márcio; Vieira, Maurício Leite; Zacca, Jorge Jardim; Braga, Jez Willian Batista

    2017-08-15

    The discrimination of whisky brands and counterfeit identification were performed by UV-Vis spectroscopy combined with partial least squares for discriminant analysis (PLS-DA). In the proposed method all spectra were obtained with no sample preparation. The discrimination models were built with the employment of seven whisky brands: Red Label, Black Label, White Horse, Chivas Regal (12years), Ballantine's Finest, Old Parr and Natu Nobilis. The method was validated with an independent test set of authentic samples belonging to the seven selected brands and another eleven brands not included in the training samples. Furthermore, seventy-three counterfeit samples were also used to validate the method. Results showed correct classification rates for genuine and false samples over 98.6% and 93.1%, respectively, indicating that the method can be helpful for the forensic analysis of whisky samples. Copyright © 2017 Elsevier Ltd. All rights reserved.

  12. Bayesian Integration and Classification of Composition C-4 Plastic Explosives Based on Time-of-Flight-Secondary Ion Mass Spectrometry and Laser Ablation-Inductively Coupled Plasma Mass Spectrometry.

    PubMed

    Mahoney, Christine M; Kelly, Ryan T; Alexander, Liz; Newburn, Matt; Bader, Sydney; Ewing, Robert G; Fahey, Albert J; Atkinson, David A; Beagley, Nathaniel

    2016-04-05

    Time-of-flight-secondary ion mass spectrometry (TOF-SIMS) and laser ablation-inductively coupled plasma mass spectrometry (LA-ICPMS) were used for characterization and identification of unique signatures from a series of 18 Composition C-4 plastic explosives. The samples were obtained from various commercial and military sources around the country. Positive and negative ion TOF-SIMS data were acquired directly from the C-4 residue on Si surfaces, where the positive ion mass spectra obtained were consistent with the major composition of organic additives, and the negative ion mass spectra were more consistent with explosive content in the C-4 samples. Each series of mass spectra was subjected to partial least squares-discriminant analysis (PLS-DA), a multivariate statistical analysis approach which serves to first find the areas of maximum variance within different classes of C-4 and subsequently to classify unknown samples based on correlations between the unknown data set and the original data set (often referred to as a training data set). This method was able to successfully classify test samples of C-4, though with a limited degree of certainty. The classification accuracy of the method was further improved by integrating the positive and negative ion data using a Bayesian approach. The TOF-SIMS data was combined with a second analytical method, LA-ICPMS, which was used to analyze elemental signatures in the C-4. The integrated data were able to classify test samples with a high degree of certainty. Results indicate that this Bayesian integrated approach constitutes a robust classification method that should be employable even in dirty samples collected in the field.

  13. Importance of recurrence rating, morphology, hernial gap size, and risk factors in ventral and incisional hernia classification.

    PubMed

    Dietz, U A; Winkler, M S; Härtel, R W; Fleischhacker, A; Wiegering, A; Isbert, C; Jurowich, Ch; Heuschmann, P; Germer, C-T

    2014-02-01

    There is limited evidence on the natural course of ventral and incisional hernias and the results of hernia repair, what might partially be explained by the lack of an accepted classification system. The aim of the present study is to investigate the association of the criteria included in the Wuerzburg classification system of ventral and incisional hernias with postoperative complications and long-term recurrence. In a retrospective cohort study, the data on 330 consecutive patients who underwent surgery to repair ventral and incisional hernias were analyzed. The following four classification criteria were applied: (a) recurrence rating (ventral, incisional or incisional recurrent); (b) morphology (location); (c) size of the hernial gap; and (d) risk factors. The primary endpoint was the occurrence of a recurrence during follow-up. Secondary endpoints were incidence of postoperative complications. Independent association between classification criteria, type of surgical procedures and postoperative complications was calculated by multivariate logistic regression analysis and between classification criteria, type of surgical procedures and risk of long-term recurrence by Cox regression analysis. Follow-up lasted a mean 47.7 ± 23.53 months (median 45 months) or 3.9 ± 1.96 years. The criterion "recurrence rating" was found as predictive factor for postoperative complications in the multivariate analysis (OR 2.04; 95 % CI 1.09-3.84; incisional vs. ventral hernia). The criterion "morphology" had influence neither on the incidence of the critical event "recurrence during follow-up" nor on the incidence of postoperative complications. Hernial gap "width" predicted postoperative complications in the multivariate analysis (OR 1.98; 95 % CI 1.19-3.29; ≤5 vs. >5 cm). Length of the hernial gap was found to be an independent prognostic factor for the critical event "recurrence during follow-up" (HR 2.05; 95 % CI 1.25-3.37; ≤5 vs. >5 cm). The presence of 3 or more risk factors was a consistent predictor for "recurrence during follow-up" (HR 2.25; 95 % CI 1.28-9.92). Mesh repair was an independent protective factor for "recurrence during follow-up" compared to suture (HR 0.53; 95 % CI 0.32-0.86). The ventral and incisional hernia classification of Dietz et al. employs a clinically proven terminology and has an open classification structure. Hernial gap size and the number of risk factors are independent predictors for "recurrence during follow-up", whereas recurrence rating and hernial gap size correlated significantly with the incidence of postoperative complications. We propose the application of these criteria for future clinical research, as larger patient numbers will be needed to refine the results.

  14. From sensation to perception: Using multivariate classification of visual illusions to identify neural correlates of conscious awareness in space and time.

    PubMed

    Hogendoorn, Hinze

    2015-01-01

    An important goal of cognitive neuroscience is understanding the neural underpinnings of conscious awareness. Although the low-level processing of sensory input is well understood in most modalities, it remains a challenge to understand how the brain translates such input into conscious awareness. Here, I argue that the application of multivariate pattern classification techniques to neuroimaging data acquired while observers experience perceptual illusions provides a unique way to dissociate sensory mechanisms from mechanisms underlying conscious awareness. Using this approach, it is possible to directly compare patterns of neural activity that correspond to the contents of awareness, independent from changes in sensory input, and to track these neural representations over time at high temporal resolution. I highlight five recent studies using this approach, and provide practical considerations and limitations for future implementations.

  15. Discrimination of Gastrodia elata from Different Geographical Origin for Quality Evaluation Using Newly-Build Near Infrared Spectrum Coupled with Multivariate Analysis.

    PubMed

    Zuo, Yamin; Deng, Xuehua; Wu, Qing

    2018-05-04

    Discrimination of Gastrodia elata ( G. elata ) geographical origin is of great importance to pharmaceutical companies and consumers in China. this paper focuses on the feasibility of near infrared spectrum (NIRS) combined multivariate analysis as a rapid and non-destructive method to prove its fit for this purpose. Firstly, 16 batches of G. elata samples from four main-cultivation regions in China were quantified by traditional HPLC method. It showed that samples from different origins could not be efficiently differentiated by the contents of four phenolic compounds in this study. Secondly, the raw near infrared (NIR) spectra of those samples were acquired and two different pattern recognition techniques were used to classify the geographical origins. The results showed that with spectral transformation optimized, discriminant analysis (DA) provided 97% and 99% correct classification for the calibration and validation sets of samples from discriminating of four different main-cultivation regions, and provided 98% and 99% correct classifications for the calibration and validation sets of samples from eight different cities, respectively, which all performed better than the principal component analysis (PCA) method. Thirdly, as phenolic compounds content (PCC) is highly related with the quality of G. elata , synergy interval partial least squares (Si-PLS) was applied to build the PCC prediction model. The coefficient of determination for prediction (R p ²) of the Si-PLS model was 0.9209, and root mean square error for prediction (RMSEP) was 0.338. The two regions (4800 cm −1 ⁻5200 cm −1 , and 5600 cm −1 ⁻6000 cm −1 ) selected by Si-PLS corresponded to the absorptions of aromatic ring in the basic phenolic structure. It can be concluded that NIR spectroscopy combined with PCA, DA and Si-PLS would be a potential tool to provide a reference for the quality control of G. elata.

  16. Differences in chewing sounds of dry-crisp snacks by multivariate data analysis

    NASA Astrophysics Data System (ADS)

    De Belie, N.; Sivertsvik, M.; De Baerdemaeker, J.

    2003-09-01

    Chewing sounds of different types of dry-crisp snacks (two types of potato chips, prawn crackers, cornflakes and low calorie snacks from extruded starch) were analysed to assess differences in sound emission patterns. The emitted sounds were recorded by a microphone placed over the ear canal. The first bite and the first subsequent chew were selected from the time signal and a fast Fourier transformation provided the power spectra. Different multivariate analysis techniques were used for classification of the snack groups. This included principal component analysis (PCA) and unfold partial least-squares (PLS) algorithms, as well as multi-way techniques such as three-way PLS, three-way PCA (Tucker3), and parallel factor analysis (PARAFAC) on the first bite and subsequent chew. The models were evaluated by calculating the classification errors and the root mean square error of prediction (RMSEP) for independent validation sets. It appeared that the logarithm of the power spectra obtained from the chewing sounds could be used successfully to distinguish the different snack groups. When different chewers were used, recalibration of the models was necessary. Multi-way models distinguished better between chewing sounds of different snack groups than PCA on bite or chew separately and than unfold PLS. From all three-way models applied, N-PLS with three components showed the best classification capabilities, resulting in classification errors of 14-18%. The major amount of incorrect classifications was due to one type of potato chips that had a very irregular shape, resulting in a wide variation of the emitted sounds.

  17. A machine learning approach to identify functional biomarkers in human prefrontal cortex for individuals with traumatic brain injury using functional near-infrared spectroscopy.

    PubMed

    Karamzadeh, Nader; Amyot, Franck; Kenney, Kimbra; Anderson, Afrouz; Chowdhry, Fatima; Dashtestani, Hadis; Wassermann, Eric M; Chernomordik, Victor; Boccara, Claude; Wegman, Edward; Diaz-Arrastia, Ramon; Gandjbakhche, Amir H

    2016-11-01

    We have explored the potential prefrontal hemodynamic biomarkers to characterize subjects with Traumatic Brain Injury (TBI) by employing the multivariate machine learning approach and introducing a novel task-related hemodynamic response detection followed by a heuristic search for optimum set of hemodynamic features. To achieve this goal, the hemodynamic response from a group of 31 healthy controls and 30 chronic TBI subjects were recorded as they performed a complexity task. To determine the optimum hemodynamic features, we considered 11 features and their combinations in characterizing TBI subjects. We investigated the significance of the features by utilizing a machine learning classification algorithm to score all the possible combinations of features according to their predictive power. The identified optimum feature elements resulted in classification accuracy, sensitivity, and specificity of 85%, 85%, and 84%, respectively. Classification improvement was achieved for TBI subject classification through feature combination. It signified the major advantage of the multivariate analysis over the commonly used univariate analysis suggesting that the features that are individually irrelevant in characterizing the data may become relevant when used in combination. We also conducted a spatio-temporal classification to identify regions within the prefrontal cortex (PFC) that contribute in distinguishing between TBI and healthy subjects. As expected, Brodmann areas (BA) 10 within the PFC were isolated as the region that healthy subjects (unlike subjects with TBI), showed major hemodynamic activity in response to the High Complexity task. Overall, our results indicate that identified temporal and spatio-temporal features from PFC's hemodynamic activity are promising biomarkers in classifying subjects with TBI.

  18. Development and validation of a casemix classification to predict costs of specialist palliative care provision across inpatient hospice, hospital and community settings in the UK: a study protocol.

    PubMed

    Guo, Ping; Dzingina, Mendwas; Firth, Alice M; Davies, Joanna M; Douiri, Abdel; O'Brien, Suzanne M; Pinto, Cathryn; Pask, Sophie; Higginson, Irene J; Eagar, Kathy; Murtagh, Fliss E M

    2018-03-17

    Provision of palliative care is inequitable with wide variations across conditions and settings in the UK. Lack of a standard way to classify by case complexity is one of the principle obstacles to addressing this. We aim to develop and validate a casemix classification to support the prediction of costs of specialist palliative care provision. Phase I: A cohort study to determine the variables and potential classes to be included in a casemix classification. Data are collected from clinicians in palliative care services across inpatient hospice, hospital and community settings on: patient demographics, potential complexity/casemix criteria and patient-level resource use. Cost predictors are derived using multivariate regression and then incorporated into a classification using classification and regression trees. Internal validation will be conducted by bootstrapping to quantify any optimism in the predictive performance (calibration and discrimination) of the developed classification. Phase II: A mixed-methods cohort study across settings for external validation of the classification developed in phase I. Patient and family caregiver data will be collected longitudinally on demographics, potential complexity/casemix criteria and patient-level resource use. This will be triangulated with data collected from clinicians on potential complexity/casemix criteria and patient-level resource use, and with qualitative interviews with patients and caregivers about care provision across difference settings. The classification will be refined on the basis of its performance in the validation data set. The study has been approved by the National Health Service Health Research Authority Research Ethics Committee. The results are expected to be disseminated in 2018 through papers for publication in major palliative care journals; policy briefs for clinicians, commissioning leads and policy makers; and lay summaries for patients and public. ISRCTN90752212. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  19. Development and validation of the SIMPLE endoscopic classification of diminutive and small colorectal polyps.

    PubMed

    Iacucci, Marietta; Trovato, Cristina; Daperno, Marco; Akinola, Oluseyi; Greenwald, David; Gross, Seth A; Hoffman, Arthur; Lee, Jeffrey; Lethebe, Brendan C; Lowerison, Mark; Nayor, Jennifer; Neumann, Helmut; Rath, Timo; Sanduleanu, Silvia; Sharma, Prateek; Kiesslich, Ralf; Ghosh, Subrata; Saltzman, John R

    2018-03-23

    Prediction of histology of small polyps facilitates colonoscopic treatment. The aims of this study were: 1) to develop a simplified polyp classification, 2) to evaluate its performance in predicting polyp histology, and 3) to evaluate the reproducibility of the classification by trainees using multiplatform endoscopic systems. In phase 1, a new simplified endoscopic classification for polyps - Simplified Identification Method for Polyp Labeling during Endoscopy (SIMPLE) - was created, using the new I-SCAN OE system (Pentax, Tokyo, Japan), by eight international experts. In phase 2, the accuracy, level of confidence, and interobserver agreement to predict polyp histology before and after training, and univariable/multivariable analysis of the endoscopic features, were performed. In phase 3, the reproducibility of SIMPLE by trainees using different endoscopy platforms was evaluated. Using the SIMPLE classification, the accuracy of experts in predicting polyps was 83 % (95 % confidence interval [CI] 77 % - 88 %) before and 94 % (95 %CI 89 % - 97 %) after training ( P   = 0.002). The sensitivity, specificity, positive predictive value, and negative predictive value after training were 97 %, 88 %, 95 %, and 91 %. The interobserver agreement of polyp diagnosis improved from 0.46 (95 %CI 0.30 - 0.64) before to 0.66 (95 %CI 0.48 - 0.82) after training. The trainees demonstrated that the SIMPLE classification is applicable across endoscopy platforms, with similar post-training accuracies for narrow-band imaging NBI classification (0.69; 95 %CI 0.64 - 0.73) and SIMPLE (0.71; 95 %CI 0.67 - 0.75). Using the I-SCAN OE system, the new SIMPLE classification demonstrated a high degree of accuracy for adenoma diagnosis, meeting the ASGE PIVI recommendations. We demonstrated that SIMPLE may be used with either I-SCAN OE or NBI. © Georg Thieme Verlag KG Stuttgart · New York.

  20. Development of Raman microspectroscopy for automated detection and imaging of basal cell carcinoma

    NASA Astrophysics Data System (ADS)

    Larraona-Puy, Marta; Ghita, Adrian; Zoladek, Alina; Perkins, William; Varma, Sandeep; Leach, Iain H.; Koloydenko, Alexey A.; Williams, Hywel; Notingher, Ioan

    2009-09-01

    We investigate the potential of Raman microspectroscopy (RMS) for automated evaluation of excised skin tissue during Mohs micrographic surgery (MMS). The main aim is to develop an automated method for imaging and diagnosis of basal cell carcinoma (BCC) regions. Selected Raman bands responsible for the largest spectral differences between BCC and normal skin regions and linear discriminant analysis (LDA) are used to build a multivariate supervised classification model. The model is based on 329 Raman spectra measured on skin tissue obtained from 20 patients. BCC is discriminated from healthy tissue with 90+/-9% sensitivity and 85+/-9% specificity in a 70% to 30% split cross-validation algorithm. This multivariate model is then applied on tissue sections from new patients to image tumor regions. The RMS images show excellent correlation with the gold standard of histopathology sections, BCC being detected in all positive sections. We demonstrate the potential of RMS as an automated objective method for tumor evaluation during MMS. The replacement of current histopathology during MMS by a ``generalization'' of the proposed technique may improve the feasibility and efficacy of MMS, leading to a wider use according to clinical need.

  1. Quantitative Outline-based Shape Analysis and Classification of Planetary Craterforms using Supervised Learning Models

    NASA Astrophysics Data System (ADS)

    Slezak, Thomas Joseph; Radebaugh, Jani; Christiansen, Eric

    2017-10-01

    The shapes of craterform morphology on planetary surfaces provides rich information about their origins and evolution. While morphologic information provides rich visual clues to geologic processes and properties, the ability to quantitatively communicate this information is less easily accomplished. This study examines the morphology of craterforms using the quantitative outline-based shape methods of geometric morphometrics, commonly used in biology and paleontology. We examine and compare landforms on planetary surfaces using shape, a property of morphology that is invariant to translation, rotation, and size. We quantify the shapes of paterae on Io, martian calderas, terrestrial basaltic shield calderas, terrestrial ash-flow calderas, and lunar impact craters using elliptic Fourier analysis (EFA) and the Zahn and Roskies (Z-R) shape function, or tangent angle approach to produce multivariate shape descriptors. These shape descriptors are subjected to multivariate statistical analysis including canonical variate analysis (CVA), a multiple-comparison variant of discriminant analysis, to investigate the link between craterform shape and classification. Paterae on Io are most similar in shape to terrestrial ash-flow calderas and the shapes of terrestrial basaltic shield volcanoes are most similar to martian calderas. The shapes of lunar impact craters, including simple, transitional, and complex morphology, are classified with a 100% rate of success in all models. Multiple CVA models effectively predict and classify different craterforms using shape-based identification and demonstrate significant potential for use in the analysis of planetary surfaces.

  2. Comparison of the 7(th) and proposed 8(th) editions of the AJCC/UICC TNM staging system for non-small cell lung cancer undergoing radical surgery.

    PubMed

    Jin, Ying; Chen, Ming; Yu, Xinmin

    2016-09-19

    The present study aims to compare the 7(th) and the proposed 8(th) edition of the AJCC/UICC TNM staging system for NSCLC in a cohort of patients from a single institution. A total of 408 patients with NSCLC who underwent radical surgery were analyzed retrospectively. Survivals were analyzed using the Kaplan -Meier method and were compared using the log-rank test. Multivariate analysis was performed by the Cox proportional hazard model. The Akaike information criterion (AIC) and C-index were applied to compare the two prognostic systems with different numbers of stages. The 7(th) AJCC T categories, the proposed 8(th) AJCC T categories, N categories, visceral pleural invasion, and vessel invasion were found to have statistically significant associations with disease-free survival (DFS) on univariate analysis. In the 7(th) edition staging system as well as in the proposed 8(th) edition, T categories, N categories, and pleural invasion were independent factors for DFS on multivariate analysis. The AIC value was smaller for the 8(th) edition compared to the 7(th) edition staging system. The C-index value was larger for the 8(th) edition compared to the 7(th) edition staging system. Based on the data from our single center, the proposed 8(th) AJCC T classification seems to be superior to the 7(th) AJCC T classification in terms of DFS for patients with NSCLC underwent radical surgery.

  3. Optical biopsy using fluorescence spectroscopy for prostate cancer diagnosis

    NASA Astrophysics Data System (ADS)

    Wu, Binlin; Gao, Xin; Smith, Jason; Bailin, Jacob

    2017-02-01

    Native fluorescence spectra are acquired from fresh normal and cancerous human prostate tissues. The fluorescence data are analyzed using a multivariate analysis algorithm such as non-negative matrix factorization. The nonnegative spectral components are retrieved and attributed to the native fluorophores such as collagen, reduced nicotinamide adenine dinucleotide (NADH), and flavin adenine dinucleotide (FAD) in tissue. The retrieved weights of the components, e.g. NADH and FAD are used to estimate the relative concentrations of the native fluorophores and the redox ratio. A machine learning algorithm such as support vector machine (SVM) is used for classification to distinguish normal and cancerous tissue samples based on either the relative concentrations of NADH and FAD or the redox ratio alone. The classification performance is shown based on statistical measures such as sensitivity, specificity, and accuracy, along with the area under receiver operating characteristic (ROC) curve. A cross validation method such as leave-one-out is used to evaluate the predictive performance of the SVM classifier to avoid bias due to overfitting.

  4. Novel high-resolution computed tomography-based radiomic classifier for screen-identified pulmonary nodules in the National Lung Screening Trial.

    PubMed

    Peikert, Tobias; Duan, Fenghai; Rajagopalan, Srinivasan; Karwoski, Ronald A; Clay, Ryan; Robb, Richard A; Qin, Ziling; Sicks, JoRean; Bartholmai, Brian J; Maldonado, Fabien

    2018-01-01

    Optimization of the clinical management of screen-detected lung nodules is needed to avoid unnecessary diagnostic interventions. Herein we demonstrate the potential value of a novel radiomics-based approach for the classification of screen-detected indeterminate nodules. Independent quantitative variables assessing various radiologic nodule features such as sphericity, flatness, elongation, spiculation, lobulation and curvature were developed from the NLST dataset using 726 indeterminate nodules (all ≥ 7 mm, benign, n = 318 and malignant, n = 408). Multivariate analysis was performed using least absolute shrinkage and selection operator (LASSO) method for variable selection and regularization in order to enhance the prediction accuracy and interpretability of the multivariate model. The bootstrapping method was then applied for the internal validation and the optimism-corrected AUC was reported for the final model. Eight of the originally considered 57 quantitative radiologic features were selected by LASSO multivariate modeling. These 8 features include variables capturing Location: vertical location (Offset carina centroid z), Size: volume estimate (Minimum enclosing brick), Shape: flatness, Density: texture analysis (Score Indicative of Lesion/Lung Aggression/Abnormality (SILA) texture), and surface characteristics: surface complexity (Maximum shape index and Average shape index), and estimates of surface curvature (Average positive mean curvature and Minimum mean curvature), all with P<0.01. The optimism-corrected AUC for these 8 features is 0.939. Our novel radiomic LDCT-based approach for indeterminate screen-detected nodule characterization appears extremely promising however independent external validation is needed.

  5. Objective classification of ecological status in marine water bodies using ecotoxicological information and multivariate analysis.

    PubMed

    Beiras, Ricardo; Durán, Iria

    2014-12-01

    Some relevant shortcomings have been identified in the current approach for the classification of ecological status in marine water bodies, leading to delays in the fulfillment of the Water Framework Directive objectives. Natural variability makes difficult to settle fixed reference values and boundary values for the Ecological Quality Ratios (EQR) for the biological quality elements. Biological responses to environmental degradation are frequently of nonmonotonic nature, hampering the EQR approach. Community structure traits respond only once ecological damage has already been done and do not provide early warning signals. An alternative methodology for the classification of ecological status integrating chemical measurements, ecotoxicological bioassays and community structure traits (species richness and diversity), and using multivariate analyses (multidimensional scaling and cluster analysis), is proposed. This approach does not depend on the arbitrary definition of fixed reference values and EQR boundary values, and it is suitable to integrate nonlinear, sensitive signals of ecological degradation. As a disadvantage, this approach demands the inclusion of sampling sites representing the full range of ecological status in each monitoring campaign. National or international agencies in charge of coastal pollution monitoring have comprehensive data sets available to overcome this limitation.

  6. Multivariate Statistical Analysis of Orthogonal Mass Spectral Data for the Identification of Chemical Attribution Signatures of 3-Methylfentanyl

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mayer, B. P.; Valdez, C. A.; DeHope, A. J.

    Critical to many modern forensic investigations is the chemical attribution of the origin of an illegal drug. This process greatly relies on identification of compounds indicative of its clandestine or commercial production. The results of these studies can yield detailed information on method of manufacture, sophistication of the synthesis operation, starting material source, and final product. In the present work, chemical attribution signatures (CAS) associated with the synthesis of the analgesic 3- methylfentanyl, N-(3-methyl-1-phenethylpiperidin-4-yl)-N-phenylpropanamide, were investigated. Six synthesis methods were studied in an effort to identify and classify route-specific signatures. These methods were chosen to minimize the use of scheduledmore » precursors, complicated laboratory equipment, number of overall steps, and demanding reaction conditions. Using gas and liquid chromatographies combined with mass spectrometric methods (GC-QTOF and LC-QTOF) in conjunction with inductivelycoupled plasma mass spectrometry (ICP-MS), over 240 distinct compounds and elements were monitored. As seen in our previous work with CAS of fentanyl synthesis the complexity of the resultant data matrix necessitated the use of multivariate statistical analysis. Using partial least squares discriminant analysis (PLS-DA), 62 statistically significant, route-specific CAS were identified. Statistical classification models using a variety of machine learning techniques were then developed with the ability to predict the method of 3-methylfentanyl synthesis from three blind crude samples generated by synthetic chemists without prior experience with these methods.« less

  7. A comparison of different chemometrics approaches for the robust classification of electronic nose data.

    PubMed

    Gromski, Piotr S; Correa, Elon; Vaughan, Andrew A; Wedge, David C; Turner, Michael L; Goodacre, Royston

    2014-11-01

    Accurate detection of certain chemical vapours is important, as these may be diagnostic for the presence of weapons, drugs of misuse or disease. In order to achieve this, chemical sensors could be deployed remotely. However, the readout from such sensors is a multivariate pattern, and this needs to be interpreted robustly using powerful supervised learning methods. Therefore, in this study, we compared the classification accuracy of four pattern recognition algorithms which include linear discriminant analysis (LDA), partial least squares-discriminant analysis (PLS-DA), random forests (RF) and support vector machines (SVM) which employed four different kernels. For this purpose, we have used electronic nose (e-nose) sensor data (Wedge et al., Sensors Actuators B Chem 143:365-372, 2009). In order to allow direct comparison between our four different algorithms, we employed two model validation procedures based on either 10-fold cross-validation or bootstrapping. The results show that LDA (91.56% accuracy) and SVM with a polynomial kernel (91.66% accuracy) were very effective at analysing these e-nose data. These two models gave superior prediction accuracy, sensitivity and specificity in comparison to the other techniques employed. With respect to the e-nose sensor data studied here, our findings recommend that SVM with a polynomial kernel should be favoured as a classification method over the other statistical models that we assessed. SVM with non-linear kernels have the advantage that they can be used for classifying non-linear as well as linear mapping from analytical data space to multi-group classifications and would thus be a suitable algorithm for the analysis of most e-nose sensor data.

  8. Classification of autistic individuals and controls using cross-task characterization of fMRI activity

    PubMed Central

    Chanel, Guillaume; Pichon, Swann; Conty, Laurence; Berthoz, Sylvie; Chevallier, Coralie; Grèzes, Julie

    2015-01-01

    Multivariate pattern analysis (MVPA) has been applied successfully to task-based and resting-based fMRI recordings to investigate which neural markers distinguish individuals with autistic spectrum disorders (ASD) from controls. While most studies have focused on brain connectivity during resting state episodes and regions of interest approaches (ROI), a wealth of task-based fMRI datasets have been acquired in these populations in the last decade. This calls for techniques that can leverage information not only from a single dataset, but from several existing datasets that might share some common features and biomarkers. We propose a fully data-driven (voxel-based) approach that we apply to two different fMRI experiments with social stimuli (faces and bodies). The method, based on Support Vector Machines (SVMs) and Recursive Feature Elimination (RFE), is first trained for each experiment independently and each output is then combined to obtain a final classification output. Second, this RFE output is used to determine which voxels are most often selected for classification to generate maps of significant discriminative activity. Finally, to further explore the clinical validity of the approach, we correlate phenotypic information with obtained classifier scores. The results reveal good classification accuracy (range between 69% and 92.3%). Moreover, we were able to identify discriminative activity patterns pertaining to the social brain without relying on a priori ROI definitions. Finally, social motivation was the only dimension which correlated with classifier scores, suggesting that it is the main dimension captured by the classifiers. Altogether, we believe that the present RFE method proves to be efficient and may help identifying relevant biomarkers by taking advantage of acquired task-based fMRI datasets in psychiatric populations. PMID:26793434

  9. Decoding the Traumatic Memory among Women with PTSD: Implications for Neurocircuitry Models of PTSD and Real-Time fMRI Neurofeedback

    PubMed Central

    Cisler, Josh M.; Bush, Keith; James, G. Andrew; Smitherman, Sonet; Kilts, Clinton D.

    2015-01-01

    Posttraumatic Stress Disorder (PTSD) is characterized by intrusive recall of the traumatic memory. While numerous studies have investigated the neural processing mechanisms engaged during trauma memory recall in PTSD, these analyses have only focused on group-level contrasts that reveal little about the predictive validity of the identified brain regions. By contrast, a multivariate pattern analysis (MVPA) approach towards identifying the neural mechanisms engaged during trauma memory recall would entail testing whether a multivariate set of brain regions is reliably predictive of (i.e., discriminates) whether an individual is engaging in trauma or non-trauma memory recall. Here, we use a MVPA approach to test 1) whether trauma memory vs neutral memory recall can be predicted reliably using a multivariate set of brain regions among women with PTSD related to assaultive violence exposure (N=16), 2) the methodological parameters (e.g., spatial smoothing, number of memory recall repetitions, etc.) that optimize classification accuracy and reproducibility of the feature weight spatial maps, and 3) the correspondence between brain regions that discriminate trauma memory recall and the brain regions predicted by neurocircuitry models of PTSD. Cross-validation classification accuracy was significantly above chance for all methodological permutations tested; mean accuracy across participants was 76% for the methodological parameters selected as optimal for both efficiency and accuracy. Classification accuracy was significantly better for a voxel-wise approach relative to voxels within restricted regions-of-interest (ROIs); classification accuracy did not differ when using PTSD-related ROIs compared to randomly generated ROIs. ROI-based analyses suggested the reliable involvement of the left hippocampus in discriminating memory recall across participants and that the contribution of the left amygdala to the decision function was dependent upon PTSD symptom severity. These results have methodological implications for real-time fMRI neurofeedback of the trauma memory in PTSD and conceptual implications for neurocircuitry models of PTSD that attempt to explain core neural processing mechanisms mediating PTSD. PMID:26241958

  10. Decoding the Traumatic Memory among Women with PTSD: Implications for Neurocircuitry Models of PTSD and Real-Time fMRI Neurofeedback.

    PubMed

    Cisler, Josh M; Bush, Keith; James, G Andrew; Smitherman, Sonet; Kilts, Clinton D

    2015-01-01

    Posttraumatic Stress Disorder (PTSD) is characterized by intrusive recall of the traumatic memory. While numerous studies have investigated the neural processing mechanisms engaged during trauma memory recall in PTSD, these analyses have only focused on group-level contrasts that reveal little about the predictive validity of the identified brain regions. By contrast, a multivariate pattern analysis (MVPA) approach towards identifying the neural mechanisms engaged during trauma memory recall would entail testing whether a multivariate set of brain regions is reliably predictive of (i.e., discriminates) whether an individual is engaging in trauma or non-trauma memory recall. Here, we use a MVPA approach to test 1) whether trauma memory vs neutral memory recall can be predicted reliably using a multivariate set of brain regions among women with PTSD related to assaultive violence exposure (N=16), 2) the methodological parameters (e.g., spatial smoothing, number of memory recall repetitions, etc.) that optimize classification accuracy and reproducibility of the feature weight spatial maps, and 3) the correspondence between brain regions that discriminate trauma memory recall and the brain regions predicted by neurocircuitry models of PTSD. Cross-validation classification accuracy was significantly above chance for all methodological permutations tested; mean accuracy across participants was 76% for the methodological parameters selected as optimal for both efficiency and accuracy. Classification accuracy was significantly better for a voxel-wise approach relative to voxels within restricted regions-of-interest (ROIs); classification accuracy did not differ when using PTSD-related ROIs compared to randomly generated ROIs. ROI-based analyses suggested the reliable involvement of the left hippocampus in discriminating memory recall across participants and that the contribution of the left amygdala to the decision function was dependent upon PTSD symptom severity. These results have methodological implications for real-time fMRI neurofeedback of the trauma memory in PTSD and conceptual implications for neurocircuitry models of PTSD that attempt to explain core neural processing mechanisms mediating PTSD.

  11. Intraoperative Raman Spectroscopy of Soft Tissue Sarcomas

    PubMed Central

    Nguyen, John Q.; Gowani, Zain S.; O’Connor, Maggie; Pence, Isaac J.; Nguyen, The-Quyen; Holt, Ginger E.; Schwartz, Herbert S.; Halpern, Jennifer L.; Mahadevan-Jansen, Anita

    2017-01-01

    Background and Objective Soft tissue sarcomas (STS) are a rare and heterogeneous group of malignant tumors that are often treated through surgical resection. Current intraoperative margin assessment methods are limited and highlight the need for an improved approach with respect to time and specificity. Here we investigate the potential of near-infrared Raman spectroscopy for the intraoperative differentiation of STS from surrounding normal tissue. Materials and Methods In vivo Raman measurements at 785 nm excitation were intraoperatively acquired from subjects undergoing STS resection using a probe based spectroscopy system. A multivariate classification algorithm was developed in order to automatically identify spectral features that can be used to differentiate STS from the surrounding normal muscle and fat. The classification algorithm was subsequently tested using leave-one-subject-out cross-validation. Results With the exclusion of well-differentiated liposarcomas, the algorithm was able to classify STS from the surrounding normal muscle and fat with a sensitivity and specificity of 89.5% and 96.4%, respectively. Conclusion These results suggest that single point near-infrared Raman spectroscopy could be utilized as a rapid and non-destructive surgical guidance tool for identifying abnormal tissue margins in need of further excision. PMID:27454580

  12. Dual-modal cancer detection based on optical pH sensing and Raman spectroscopy

    NASA Astrophysics Data System (ADS)

    Kim, Soogeun; Lee, Seung Ho; Min, Sun Young; Byun, Kyung Min; Lee, Soo Yeol

    2017-10-01

    A dual-modal approach using Raman spectroscopy and optical pH sensing was investigated to discriminate between normal and cancerous tissues. Raman spectroscopy has demonstrated the potential for in vivo cancer detection. However, Raman spectroscopy has suffered from strong fluorescence background of biological samples and subtle spectral differences between normal and disease tissues. To overcome those issues, pH sensing is adopted to Raman spectroscopy as a dual-modal approach. Based on the fact that the pH level in cancerous tissues is lower than that in normal tissues due to insufficient vasculature formation, the dual-modal approach combining the chemical information of Raman spectrum and the metabolic information of pH level can improve the specificity of cancer diagnosis. From human breast tissue samples, Raman spectra and pH levels are measured using fiber-optic-based Raman and pH probes, respectively. The pH sensing is based on the dependence of pH level on optical transmission spectrum. Multivariate statistical analysis is performed to evaluate the classification capability of the dual-modal method. The analytical results show that the dual-modal method based on Raman spectroscopy and optical pH sensing can improve the performance of cancer classification.

  13. Novel techniques for characterization of hydrocarbon emission sources in the Barnett Shale

    NASA Astrophysics Data System (ADS)

    Nathan, Brian Joseph

    Changes in ambient atmospheric hydrocarbon concentrations can have both short-term and long-term effects on the atmosphere and on human health. Thus, accurate characterization of emissions sources is critically important. The recent boom in shale gas production has led to an increase in hydrocarbon emissions from associated processes, though the exact extent is uncertain. As an original quantification technique, a model airplane equipped with a specially-designed, open-path methane sensor was flown multiple times over a natural gas compressor station in the Barnett Shale in October 2013. A linear optimization was introduced to a standard Gaussian plume model in an effort to determine the most probable emission rate coming from the station. This is shown to be a suitable approach given an ideal source with a single, central plume. Separately, an analysis was performed to characterize the nonmethane hydrocarbons in the Barnett during the same period. Starting with ambient hourly concentration measurements of forty-six hydrocarbon species, Lagrangian air parcel trajectories were implemented in a meteorological model to extend the resolution of these measurements and achieve domain-fillings of the region for the period of interest. A self-organizing map (a type of unsupervised classification) was then utilized to reduce the dimensionality of the total multivariate set of grids into characteristic one-dimensional signatures. By also introducing a self-organizing map classification of the contemporary wind measurements, the spatial hydrocarbon characterizations are analyzed for periods with similar wind conditions. The accuracy of the classification is verified through assessment of observed spatial mixing ratio enhancements of key species, through site-comparisons with a related long-term study, and through a random forest analysis (an ensemble learning method of supervised classification) to determine the most important species for defining key classes. The hydrocarbon classification is shown to have performed very well in identifying expected signatures near and downwind-of oil and gas facilities with active permits, which showcases this method's usefulness for future regional hydrocarbon source-apportionment analyses.

  14. Decoding Multiple Sound Categories in the Human Temporal Cortex Using High Resolution fMRI

    PubMed Central

    Zhang, Fengqing; Wang, Ji-Ping; Kim, Jieun; Parrish, Todd; Wong, Patrick C. M.

    2015-01-01

    Perception of sound categories is an important aspect of auditory perception. The extent to which the brain’s representation of sound categories is encoded in specialized subregions or distributed across the auditory cortex remains unclear. Recent studies using multivariate pattern analysis (MVPA) of brain activations have provided important insights into how the brain decodes perceptual information. In the large existing literature on brain decoding using MVPA methods, relatively few studies have been conducted on multi-class categorization in the auditory domain. Here, we investigated the representation and processing of auditory categories within the human temporal cortex using high resolution fMRI and MVPA methods. More importantly, we considered decoding multiple sound categories simultaneously through multi-class support vector machine-recursive feature elimination (MSVM-RFE) as our MVPA tool. Results show that for all classifications the model MSVM-RFE was able to learn the functional relation between the multiple sound categories and the corresponding evoked spatial patterns and classify the unlabeled sound-evoked patterns significantly above chance. This indicates the feasibility of decoding multiple sound categories not only within but across subjects. However, the across-subject variation affects classification performance more than the within-subject variation, as the across-subject analysis has significantly lower classification accuracies. Sound category-selective brain maps were identified based on multi-class classification and revealed distributed patterns of brain activity in the superior temporal gyrus and the middle temporal gyrus. This is in accordance with previous studies, indicating that information in the spatially distributed patterns may reflect a more abstract perceptual level of representation of sound categories. Further, we show that the across-subject classification performance can be significantly improved by averaging the fMRI images over items, because the irrelevant variations between different items of the same sound category are reduced and in turn the proportion of signals relevant to sound categorization increases. PMID:25692885

  15. Decoding multiple sound categories in the human temporal cortex using high resolution fMRI.

    PubMed

    Zhang, Fengqing; Wang, Ji-Ping; Kim, Jieun; Parrish, Todd; Wong, Patrick C M

    2015-01-01

    Perception of sound categories is an important aspect of auditory perception. The extent to which the brain's representation of sound categories is encoded in specialized subregions or distributed across the auditory cortex remains unclear. Recent studies using multivariate pattern analysis (MVPA) of brain activations have provided important insights into how the brain decodes perceptual information. In the large existing literature on brain decoding using MVPA methods, relatively few studies have been conducted on multi-class categorization in the auditory domain. Here, we investigated the representation and processing of auditory categories within the human temporal cortex using high resolution fMRI and MVPA methods. More importantly, we considered decoding multiple sound categories simultaneously through multi-class support vector machine-recursive feature elimination (MSVM-RFE) as our MVPA tool. Results show that for all classifications the model MSVM-RFE was able to learn the functional relation between the multiple sound categories and the corresponding evoked spatial patterns and classify the unlabeled sound-evoked patterns significantly above chance. This indicates the feasibility of decoding multiple sound categories not only within but across subjects. However, the across-subject variation affects classification performance more than the within-subject variation, as the across-subject analysis has significantly lower classification accuracies. Sound category-selective brain maps were identified based on multi-class classification and revealed distributed patterns of brain activity in the superior temporal gyrus and the middle temporal gyrus. This is in accordance with previous studies, indicating that information in the spatially distributed patterns may reflect a more abstract perceptual level of representation of sound categories. Further, we show that the across-subject classification performance can be significantly improved by averaging the fMRI images over items, because the irrelevant variations between different items of the same sound category are reduced and in turn the proportion of signals relevant to sound categorization increases.

  16. Evaluation of metabolites extraction strategies for identifying different brain regions and their relationship with alcohol preference and gender difference using NMR metabolomics.

    PubMed

    Wang, Jie; Zeng, Hao-Long; Du, Hongying; Liu, Zeyuan; Cheng, Ji; Liu, Taotao; Hu, Ting; Kamal, Ghulam Mustafa; Li, Xihai; Liu, Huili; Xu, Fuqiang

    2018-03-01

    Metabolomics generate a profile of small molecules from cellular/tissue metabolism, which could directly reflect the mechanisms of complex networks of biochemical reactions. Traditional metabolomics methods, such as OPLS-DA, PLS-DA are mainly used for binary class discrimination. Multiple groups are always involved in the biological system, especially for brain research. Multiple brain regions are involved in the neuronal study of brain metabolic dysfunctions such as alcoholism, Alzheimer's disease, etc. In the current study, 10 different brain regions were utilized for comparative studies between alcohol preferring and non-preferring rats, male and female rats respectively. As many classes are involved (ten different regions and four types of animals), traditional metabolomics methods are no longer efficient for showing differentiation. Here, a novel strategy based on the decision tree algorithm was employed for successfully constructing different classification models to screen out the major characteristics of ten brain regions at the same time. Subsequently, this method was also utilized to select the major effective brain regions related to alcohol preference and gender difference. Compared with the traditional multivariate statistical methods, the decision tree could construct acceptable and understandable classification models for multi-class data analysis. Therefore, the current technology could also be applied to other general metabolomics studies involving multi class data. Copyright © 2017 Elsevier B.V. All rights reserved.

  17. Association of hand or knee osteoarthritis with diabetes mellitus in a population of Hispanics from Puerto Rico

    PubMed Central

    Nieves-Plaza, Mariely; Castro-Santana, Lesliane E.; Font, Yvonne M.; Mayor, Angel M.; Vilá, Luis M.

    2013-01-01

    Background Although a higher prevalence of osteoarthritis (OA) has been reported among diabetes mellitus (DM) patients, inconsistencies and limitations of observational studies have precluded a conclusive association. Objective To evaluate the association of hand or knee OA with DM in a population of Hispanics from Puerto Rico. Methods A cross-sectional study was performed in 202 subjects (100 adult DM patients as per the National Diabetes Data Group Classification, and 102 non-diabetic subjects). OA of hand and knee was ascertained using the American College of Rheumatology classification criteria. Sociodemographic characteristics, health-related behaviors, comorbidities, pharmacotherapy and DM clinical manifestations were determined. Multivariable logistic regression was used to evaluate the association of DM with hand or knee OA, and to evaluate factors associated with hand or knee OA among DM patients. Results The mean (standard deviation, SD) age for DM patients was 51.6 (13.1) years; 64.0% were females. The mean (SD) DM duration was 11.0 (10.4) years. The prevalence of OA in patients with DM and non-diabetics subjects was 49.0% and 26.5%, respectively (p<0.01). In the multivariable analysis, patients with DM had 2.18 the odds of having OA when compared to non-diabetic subjects (95% CI: 1.12–4.24). In a sub-analysis among DM patients, female patients were more likely to have hand or knee OA (OR [95% CI]: 5.06 [1.66–15.66]), whereas patients who did not use insulin alone for DM therapy were more likely to have OA (OR [95% CI]: 4.44 [1.22–16.12]). Conclusion In this population of Hispanics from Puerto Rico, DM patients were more likely to have OA of hands or knees than non-diabetic subjects. This association was retained in multivariable models accounting for established risk factors for OA. Among DM patients, females were at greater risk for OA, whereas the use of insulin was negatively associated. PMID:23319016

  18. PyMVPA: A python toolbox for multivariate pattern analysis of fMRI data.

    PubMed

    Hanke, Michael; Halchenko, Yaroslav O; Sederberg, Per B; Hanson, Stephen José; Haxby, James V; Pollmann, Stefan

    2009-01-01

    Decoding patterns of neural activity onto cognitive states is one of the central goals of functional brain imaging. Standard univariate fMRI analysis methods, which correlate cognitive and perceptual function with the blood oxygenation-level dependent (BOLD) signal, have proven successful in identifying anatomical regions based on signal increases during cognitive and perceptual tasks. Recently, researchers have begun to explore new multivariate techniques that have proven to be more flexible, more reliable, and more sensitive than standard univariate analysis. Drawing on the field of statistical learning theory, these new classifier-based analysis techniques possess explanatory power that could provide new insights into the functional properties of the brain. However, unlike the wealth of software packages for univariate analyses, there are few packages that facilitate multivariate pattern classification analyses of fMRI data. Here we introduce a Python-based, cross-platform, and open-source software toolbox, called PyMVPA, for the application of classifier-based analysis techniques to fMRI datasets. PyMVPA makes use of Python's ability to access libraries written in a large variety of programming languages and computing environments to interface with the wealth of existing machine learning packages. We present the framework in this paper and provide illustrative examples on its usage, features, and programmability.

  19. PyMVPA: A Python toolbox for multivariate pattern analysis of fMRI data

    PubMed Central

    Hanke, Michael; Halchenko, Yaroslav O.; Sederberg, Per B.; Hanson, Stephen José; Haxby, James V.; Pollmann, Stefan

    2009-01-01

    Decoding patterns of neural activity onto cognitive states is one of the central goals of functional brain imaging. Standard univariate fMRI analysis methods, which correlate cognitive and perceptual function with the blood oxygenation-level dependent (BOLD) signal, have proven successful in identifying anatomical regions based on signal increases during cognitive and perceptual tasks. Recently, researchers have begun to explore new multivariate techniques that have proven to be more flexible, more reliable, and more sensitive than standard univariate analysis. Drawing on the field of statistical learning theory, these new classifier-based analysis techniques possess explanatory power that could provide new insights into the functional properties of the brain. However, unlike the wealth of software packages for univariate analyses, there are few packages that facilitate multivariate pattern classification analyses of fMRI data. Here we introduce a Python-based, cross-platform, and open-source software toolbox, called PyMVPA, for the application of classifier-based analysis techniques to fMRI datasets. PyMVPA makes use of Python's ability to access libraries written in a large variety of programming languages and computing environments to interface with the wealth of existing machine-learning packages. We present the framework in this paper and provide illustrative examples on its usage, features, and programmability. PMID:19184561

  20. A revision of chiggers of the minuta species-group (Acari: Trombiculidae: Neotrombicula Hirst, 1925) using multivariate morphometrics.

    PubMed

    Stekolnikov, Alexandr A; Klimov, Pavel B

    2010-09-01

    We revise chiggers belonging to the minuta-species group (genus Neotrombicula Hirst, 1925) from the Palaearctic using size-free multivariate morphometrics. This approach allowed us to resolve several diagnostic problems. We show that the widely distributed Neotrombicula scrupulosa Kudryashova, 1993 forms three spatially and ecologically isolated groups different from each other in size or shape (morphometric property) only: specimens from the Caucasus are distinct from those from Asia in shape, whereas the Asian specimens from plains and mountains are different from each other in size. We developed a multivariate classification model to separate three closely related species: N. scrupulosa, N. lubrica Kudryashova, 1993 and N. minuta Schluger, 1966. This model is based on five shape variables selected from an initial 17 variables by a best subset analysis using a custom size-correction subroutine. The variable selection procedure slightly improved the predictive power of the model, suggesting that it not only removed redundancy but also reduced 'noise' in the dataset. The overall classification accuracy of this model is 96.2, 96.2 and 95.5%, as estimated by internal validation, external validation and jackknife statistics, respectively. Our analyses resulted in one new synonymy: N. dimidiata Stekolnikov, 1995 is considered to be a synonym of N. lubrica. Both N. scrupulosa and N. lubrica are recorded from new localities. A key to species of the minuta-group incorporating results from our multivariate analyses is presented.

  1. Linear Discriminant Analysis Achieves High Classification Accuracy for the BOLD fMRI Response to Naturalistic Movie Stimuli

    PubMed Central

    Mandelkow, Hendrik; de Zwart, Jacco A.; Duyn, Jeff H.

    2016-01-01

    Naturalistic stimuli like movies evoke complex perceptual processes, which are of great interest in the study of human cognition by functional MRI (fMRI). However, conventional fMRI analysis based on statistical parametric mapping (SPM) and the general linear model (GLM) is hampered by a lack of accurate parametric models of the BOLD response to complex stimuli. In this situation, statistical machine-learning methods, a.k.a. multivariate pattern analysis (MVPA), have received growing attention for their ability to generate stimulus response models in a data-driven fashion. However, machine-learning methods typically require large amounts of training data as well as computational resources. In the past, this has largely limited their application to fMRI experiments involving small sets of stimulus categories and small regions of interest in the brain. By contrast, the present study compares several classification algorithms known as Nearest Neighbor (NN), Gaussian Naïve Bayes (GNB), and (regularized) Linear Discriminant Analysis (LDA) in terms of their classification accuracy in discriminating the global fMRI response patterns evoked by a large number of naturalistic visual stimuli presented as a movie. Results show that LDA regularized by principal component analysis (PCA) achieved high classification accuracies, above 90% on average for single fMRI volumes acquired 2 s apart during a 300 s movie (chance level 0.7% = 2 s/300 s). The largest source of classification errors were autocorrelations in the BOLD signal compounded by the similarity of consecutive stimuli. All classifiers performed best when given input features from a large region of interest comprising around 25% of the voxels that responded significantly to the visual stimulus. Consistent with this, the most informative principal components represented widespread distributions of co-activated brain regions that were similar between subjects and may represent functional networks. In light of these results, the combination of naturalistic movie stimuli and classification analysis in fMRI experiments may prove to be a sensitive tool for the assessment of changes in natural cognitive processes under experimental manipulation. PMID:27065832

  2. Analysis of cohort studies with multivariate and partially observed disease classification data.

    PubMed

    Chatterjee, Nilanjan; Sinha, Samiran; Diver, W Ryan; Feigelson, Heather Spencer

    2010-09-01

    Complex diseases like cancers can often be classified into subtypes using various pathological and molecular traits of the disease. In this article, we develop methods for analysis of disease incidence in cohort studies incorporating data on multiple disease traits using a two-stage semiparametric Cox proportional hazards regression model that allows one to examine the heterogeneity in the effect of the covariates by the levels of the different disease traits. For inference in the presence of missing disease traits, we propose a generalization of an estimating equation approach for handling missing cause of failure in competing-risk data. We prove asymptotic unbiasedness of the estimating equation method under a general missing-at-random assumption and propose a novel influence-function-based sandwich variance estimator. The methods are illustrated using simulation studies and a real data application involving the Cancer Prevention Study II nutrition cohort.

  3. Sparse Multivariate Autoregressive Modeling for Mild Cognitive Impairment Classification

    PubMed Central

    Li, Yang; Wee, Chong-Yaw; Jie, Biao; Peng, Ziwen

    2014-01-01

    Brain connectivity network derived from functional magnetic resonance imaging (fMRI) is becoming increasingly prevalent in the researches related to cognitive and perceptual processes. The capability to detect causal or effective connectivity is highly desirable for understanding the cooperative nature of brain network, particularly when the ultimate goal is to obtain good performance of control-patient classification with biological meaningful interpretations. Understanding directed functional interactions between brain regions via brain connectivity network is a challenging task. Since many genetic and biomedical networks are intrinsically sparse, incorporating sparsity property into connectivity modeling can make the derived models more biologically plausible. Accordingly, we propose an effective connectivity modeling of resting-state fMRI data based on the multivariate autoregressive (MAR) modeling technique, which is widely used to characterize temporal information of dynamic systems. This MAR modeling technique allows for the identification of effective connectivity using the Granger causality concept and reducing the spurious causality connectivity in assessment of directed functional interaction from fMRI data. A forward orthogonal least squares (OLS) regression algorithm is further used to construct a sparse MAR model. By applying the proposed modeling to mild cognitive impairment (MCI) classification, we identify several most discriminative regions, including middle cingulate gyrus, posterior cingulate gyrus, lingual gyrus and caudate regions, in line with results reported in previous findings. A relatively high classification accuracy of 91.89 % is also achieved, with an increment of 5.4 % compared to the fully-connected, non-directional Pearson-correlation-based functional connectivity approach. PMID:24595922

  4. Grouping Parturients by Parity, Previous-Cesarean, and Mode of Delivery (P-C-MoD Classification) Better Identifies Groups at Risk for Postpartum Hemorrhage.

    PubMed

    Reichman, Orna; Gal, Micahel; Sela, Hen Y; Khayyat, Izzat; Emanuel, Michael; Samueloff, Arnon

    2016-10-01

    Objective We aimed to create a clinical classification to better identify parturients at risk for postpartum hemorrhage (PPH). Method A retrospective cohort, including all women who delivered at a single tertiary care medical center, between 2006 and 2014. Parturients were grouped by parity and history of cesarean delivery (CD): primiparas, multipara, and multipara with previous CD. Each were further subgrouped by mode of delivery (spontaneous vaginal delivery [SVD], operative vaginal delivery [OVD], emergency or elective CD). In all, 12 subgroups, based on parity, previous cesarean, and mode of delivery, formed the P-C-MoD classification. PPH was defined as a decrease of ≥3 gram% hemoglobin from admission and/or transfusion of blood products. Univariate analysis followed by multivariate analysis was performed to assess risk for PPH, controlling for confounders. Results The crude rate of PPH among 126,693 parturients was 7%. The prevalence differed significantly among independent risk factors: primiparity, 14%; multiparity, 4%; OVD, 22%; and CD, 15%. The P-C-MoD classification, segregated better between parturients at risk for PPH. The prevalence of PPH was highest for primiparous undergoing OVD (27%) compared with multiparous with SVD (3%), odds ratio [OR] = 12.8 (95% confidence interval [CI],11.9-13.9). These finding were consistent in the multivariate analysis OR = 13.1 (95% CI,12.1-14.3). Conclusion Employing the P-C-MoD classification more readily identifies parturients at risk for PPH and is superior to estimations based on single risk factors. Thieme Medical Publishers 333 Seventh Avenue, New York, NY 10001, USA.

  5. Big genomics and clinical data analytics strategies for precision cancer prognosis.

    PubMed

    Ow, Ghim Siong; Kuznetsov, Vladimir A

    2016-11-07

    The field of personalized and precise medicine in the era of big data analytics is growing rapidly. Previously, we proposed our model of patient classification termed Prognostic Signature Vector Matching (PSVM) and identified a 37 variable signature comprising 36 let-7b associated prognostic significant mRNAs and the age risk factor that stratified large high-grade serous ovarian cancer patient cohorts into three survival-significant risk groups. Here, we investigated the predictive performance of PSVM via optimization of the prognostic variable weights, which represent the relative importance of one prognostic variable over the others. In addition, we compared several multivariate prognostic models based on PSVM with classical machine learning techniques such as K-nearest-neighbor, support vector machine, random forest, neural networks and logistic regression. Our results revealed that negative log-rank p-values provides more robust weight values as opposed to the use of other quantities such as hazard ratios, fold change, or a combination of those factors. PSVM, together with the classical machine learning classifiers were combined in an ensemble (multi-test) voting system, which collectively provides a more precise and reproducible patient stratification. The use of the multi-test system approach, rather than the search for the ideal classification/prediction method, might help to address limitations of the individual classification algorithm in specific situation.

  6. Land cover and land use mapping of the iSimangaliso Wetland Park, South Africa: comparison of oblique and orthogonal random forest algorithms

    NASA Astrophysics Data System (ADS)

    Bassa, Zaakirah; Bob, Urmilla; Szantoi, Zoltan; Ismail, Riyad

    2016-01-01

    In recent years, the popularity of tree-based ensemble methods for land cover classification has increased significantly. Using WorldView-2 image data, we evaluate the potential of the oblique random forest algorithm (oRF) to classify a highly heterogeneous protected area. In contrast to the random forest (RF) algorithm, the oRF algorithm builds multivariate trees by learning the optimal split using a supervised model. The oRF binary algorithm is adapted to a multiclass land cover and land use application using both the "one-against-one" and "one-against-all" combination approaches. Results show that the oRF algorithms are capable of achieving high classification accuracies (>80%). However, there was no statistical difference in classification accuracies obtained by the oRF algorithms and the more popular RF algorithm. For all the algorithms, user accuracies (UAs) and producer accuracies (PAs) >80% were recorded for most of the classes. Both the RF and oRF algorithms poorly classified the indigenous forest class as indicated by the low UAs and PAs. Finally, the results from this study advocate and support the utility of the oRF algorithm for land cover and land use mapping of protected areas using WorldView-2 image data.

  7. Classification of Spanish white wines using their electrophoretic profiles obtained by capillary zone electrophoresis with amperometric detection.

    PubMed

    Arribas, Alberto Sánchez; Martínez-Fernández, Marta; Moreno, Mónica; Bermejo, Esperanza; Zapardiel, Antonio; Chicharro, Manuel

    2014-06-01

    A method was developed for the simultaneous detection of eight polyphenols (t-resveratrol, (+)-catechin, quercetin and p-coumaric, caffeic, sinapic, ferulic, and gallic acids) by CZE with electrochemical detection. Separation of these polyphenols was achieved within 25 min using a 200 mM borate buffer (pH 9.4) containing 10% methanol as separation electrolyte. Amperometric detection of polyphenols was carried out with a glassy carbon electrode (GCE) modified with a multiwalled carbon nanotubes (CNT) layer obtained from a dispersion of CNT in polyethylenimine. The excellent electrochemical properties of this modified electrode allowed the detection and quantification of the selected polyphenols in white wines without any pretreatment step, showing remarkable signal stability despite the presence of potential fouling substances in wine. The electrophoretic profiles of white wines, obtained using this methodology, have proven to be useful for the classification of these wines by means of chemometric multivariate techniques. Principal component analysis and discriminant analysis allowed accurate classification of wine samples on the basis of their grape varietal (verdejo and airén) using the information contained in selected zones of the electropherogram. The utility of the proposed CZE methodology based on the electrochemical response of CNT-modified electrodes appears to be promising in the field of wine industry and it is expected to be successfully extended to classification of a wider range of wines made of other grape varietals. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  8. Arm structure in normal spiral galaxies, 1: Multivariate data for 492 galaxies

    NASA Technical Reports Server (NTRS)

    Magri, Christopher

    1994-01-01

    Multivariate data have been collected as part of an effort to develop a new classification system for spiral galaxies, one which is not necessarily based on subjective morphological properties. A sample of 492 moderately bright northern Sa and Sc spirals was chosen for future statistical analysis. New observations were made at 20 and 21 cm; the latter data are described in detail here. Infrared Astronomy Satellite (IRAS) fluxes were obtained from archival data. Finally, new estimates of arm pattern radomness and of local environmental harshness were compiled for most sample objects.

  9. Evidence-based provisional clinical classification criteria for autoinflammatory periodic fevers.

    PubMed

    Federici, Silvia; Sormani, Maria Pia; Ozen, Seza; Lachmann, Helen J; Amaryan, Gayane; Woo, Patricia; Koné-Paut, Isabelle; Dewarrat, Natacha; Cantarini, Luca; Insalaco, Antonella; Uziel, Yosef; Rigante, Donato; Quartier, Pierre; Demirkaya, Erkan; Herlin, Troels; Meini, Antonella; Fabio, Giovanna; Kallinich, Tilmann; Martino, Silvana; Butbul, Aviel Yonatan; Olivieri, Alma; Kuemmerle-Deschner, Jasmin; Neven, Benedicte; Simon, Anna; Ozdogan, Huri; Touitou, Isabelle; Frenkel, Joost; Hofer, Michael; Martini, Alberto; Ruperto, Nicolino; Gattorno, Marco

    2015-05-01

    The objective of this work was to develop and validate a set of clinical criteria for the classification of patients affected by periodic fevers. Patients with inherited periodic fevers (familial Mediterranean fever (FMF); mevalonate kinase deficiency (MKD); tumour necrosis factor receptor-associated periodic fever syndrome (TRAPS); cryopyrin-associated periodic syndromes (CAPS)) enrolled in the Eurofever Registry up until March 2013 were evaluated. Patients with periodic fever, aphthosis, pharyngitis and adenitis (PFAPA) syndrome were used as negative controls. For each genetic disease, patients were considered to be 'gold standard' on the basis of the presence of a confirmatory genetic analysis. Clinical criteria were formulated on the basis of univariate and multivariate analysis in an initial group of patients (training set) and validated in an independent set of patients (validation set). A total of 1215 consecutive patients with periodic fevers were identified, and 518 gold standard patients (291 FMF, 74 MKD, 86 TRAPS, 67 CAPS) and 199 patients with PFAPA as disease controls were evaluated. The univariate and multivariate analyses identified a number of clinical variables that correlated independently with each disease, and four provisional classification scores were created. Cut-off values of the classification scores were chosen using receiver operating characteristic curve analysis as those giving the highest sensitivity and specificity. The classification scores were then tested in an independent set of patients (validation set) with an area under the curve of 0.98 for FMF, 0.95 for TRAPS, 0.96 for MKD, and 0.99 for CAPS. In conclusion, evidence-based provisional clinical criteria with high sensitivity and specificity for the clinical classification of patients with inherited periodic fevers have been developed. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.

  10. [Value of the albumin to globulin ratio in predicting severity and prognosis in myasthenia gravis patients].

    PubMed

    Yang, D H; Su, Z Q; Chen, Y; Chen, Z B; Ding, Z N; Weng, Y Y; Li, J; Li, X; Tong, Q L; Han, Y X; Zhang, X

    2016-03-08

    To assess the predictive value of the albumin to globulin ratio (AGR) in evaluation of disease severity and prognosis in myasthenia gravis patients. A total of 135 myasthenia gravis (MG) patients were enrolled between February 2009 and March 2015. The AGR was detected on the first day of hospitalization and ranked from lowest to highest, and the patients were divided into three equal tertiles according to the AGR values, which were T1 (AGR <1.34), T2 (1.34≤AGR≤1.53) and T3 (AGR>1.53). The Kaplan-Meier curve was used to evaluate the prognostic value of AGR. Cox model analysis was used to evaluate the relevant factors. Multivariate Logistic regression analysis was used to find the predictors of myasthenia crisis during hospitalization. The median length of hospital stay for each tertile was: for the T1 21 days (15-35.5), T2 18 days (14-27.5), and T3 16 days (12-22.5) (P<0.01), and Kaplan-Meier curves showed significant difference among the three groups. In the univariate model, serum albumin, creatinine, AGR and MGFA clinical classification were related to prognosis of myasthenia gravis. At the multivariate Cox regression analysis, the AGR (P<0.001) and MGFA clinical classification (P<0.001) were independent predictive factors of disease severity and prognosis in myasthenia gravis patients. Respectively, the hazard ratio (HR) were 4.655 (95% CI: 2.355-9.202) and 0.596 (95% CI: 0.492-0.723). Multivariate Logistic regression analysis showed the AGR (P<0.001) and MGFA clinical classification were related to myasthenia crisis. The AGR may represent a simple, potentially useful predictive biomarker for evaluating the disease severity and prognosis of patients with myasthenia gravis.

  11. Information spreading by a combination of MEG source estimation and multivariate pattern classification.

    PubMed

    Sato, Masashi; Yamashita, Okito; Sato, Masa-Aki; Miyawaki, Yoichi

    2018-01-01

    To understand information representation in human brain activity, it is important to investigate its fine spatial patterns at high temporal resolution. One possible approach is to use source estimation of magnetoencephalography (MEG) signals. Previous studies have mainly quantified accuracy of this technique according to positional deviations and dispersion of estimated sources, but it remains unclear how accurately MEG source estimation restores information content represented by spatial patterns of brain activity. In this study, using simulated MEG signals representing artificial experimental conditions, we performed MEG source estimation and multivariate pattern analysis to examine whether MEG source estimation can restore information content represented by patterns of cortical current in source brain areas. Classification analysis revealed that the corresponding artificial experimental conditions were predicted accurately from patterns of cortical current estimated in the source brain areas. However, accurate predictions were also possible from brain areas whose original sources were not defined. Searchlight decoding further revealed that this unexpected prediction was possible across wide brain areas beyond the original source locations, indicating that information contained in the original sources can spread through MEG source estimation. This phenomenon of "information spreading" may easily lead to false-positive interpretations when MEG source estimation and classification analysis are combined to identify brain areas that represent target information. Real MEG data analyses also showed that presented stimuli were able to be predicted in the higher visual cortex at the same latency as in the primary visual cortex, also suggesting that information spreading took place. These results indicate that careful inspection is necessary to avoid false-positive interpretations when MEG source estimation and multivariate pattern analysis are combined.

  12. Information spreading by a combination of MEG source estimation and multivariate pattern classification

    PubMed Central

    Sato, Masashi; Yamashita, Okito; Sato, Masa-aki

    2018-01-01

    To understand information representation in human brain activity, it is important to investigate its fine spatial patterns at high temporal resolution. One possible approach is to use source estimation of magnetoencephalography (MEG) signals. Previous studies have mainly quantified accuracy of this technique according to positional deviations and dispersion of estimated sources, but it remains unclear how accurately MEG source estimation restores information content represented by spatial patterns of brain activity. In this study, using simulated MEG signals representing artificial experimental conditions, we performed MEG source estimation and multivariate pattern analysis to examine whether MEG source estimation can restore information content represented by patterns of cortical current in source brain areas. Classification analysis revealed that the corresponding artificial experimental conditions were predicted accurately from patterns of cortical current estimated in the source brain areas. However, accurate predictions were also possible from brain areas whose original sources were not defined. Searchlight decoding further revealed that this unexpected prediction was possible across wide brain areas beyond the original source locations, indicating that information contained in the original sources can spread through MEG source estimation. This phenomenon of “information spreading” may easily lead to false-positive interpretations when MEG source estimation and classification analysis are combined to identify brain areas that represent target information. Real MEG data analyses also showed that presented stimuli were able to be predicted in the higher visual cortex at the same latency as in the primary visual cortex, also suggesting that information spreading took place. These results indicate that careful inspection is necessary to avoid false-positive interpretations when MEG source estimation and multivariate pattern analysis are combined. PMID:29912968

  13. Using Copula Distributions to Support More Accurate Imaging-Based Diagnostic Classifiers for Neuropsychiatric Disorders

    PubMed Central

    Bansal, Ravi; Hao, Xuejun; Liu, Jun; Peterson, Bradley S.

    2014-01-01

    Many investigators have tried to apply machine learning techniques to magnetic resonance images (MRIs) of the brain in order to diagnose neuropsychiatric disorders. Usually the number of brain imaging measures (such as measures of cortical thickness and measures of local surface morphology) derived from the MRIs (i.e., their dimensionality) has been large (e.g. >10) relative to the number of participants who provide the MRI data (<100). Sparse data in a high dimensional space increases the variability of the classification rules that machine learning algorithms generate, thereby limiting the validity, reproducibility, and generalizability of those classifiers. The accuracy and stability of the classifiers can improve significantly if the multivariate distributions of the imaging measures can be estimated accurately. To accurately estimate the multivariate distributions using sparse data, we propose to estimate first the univariate distributions of imaging data and then combine them using a Copula to generate more accurate estimates of their multivariate distributions. We then sample the estimated Copula distributions to generate dense sets of imaging measures and use those measures to train classifiers. We hypothesize that the dense sets of brain imaging measures will generate classifiers that are stable to variations in brain imaging measures, thereby improving the reproducibility, validity, and generalizability of diagnostic classification algorithms in imaging datasets from clinical populations. In our experiments, we used both computer-generated and real-world brain imaging datasets to assess the accuracy of multivariate Copula distributions in estimating the corresponding multivariate distributions of real-world imaging data. Our experiments showed that diagnostic classifiers generated using imaging measures sampled from the Copula were significantly more accurate and more reproducible than were the classifiers generated using either the real-world imaging measures or their multivariate Gaussian distributions. Thus, our findings demonstrate that estimated multivariate Copula distributions can generate dense sets of brain imaging measures that can in turn be used to train classifiers, and those classifiers are significantly more accurate and more reproducible than are those generated using real-world imaging measures alone. PMID:25093634

  14. Identification of the compositional changes in Orthosiphon stamineus leaves triggered by different drying techniques using 1 H NMR metabolomics.

    PubMed

    Pariyani, Raghunath; Ismail, Intan Safinar; Ahmad Azam, Amalina; Abas, Faridah; Shaari, Khozirah

    2017-09-01

    Java tea is a well-known herbal infusion prepared from the leaves of Orthosiphon stamineus (OS). The biological properties of tea are in direct correlation with the primary and secondary metabolite composition, which in turn largely depends on the choice of drying method. Herein, the impact of three commonly used drying methods, i.e. shade, microwave and freeze drying, on the metabolite composition and antioxidant activity of OS leaves was investigated using proton nuclear magnetic resonance ( 1 H NMR) spectroscopy combined with multivariate classification and regression analysis tools. A total of 31 constituents comprising primary and secondary metabolites belonging to the chemical classes of fatty acids, amino acids, sugars, terpenoids and phenolic compounds were identified. Shade-dried leaves were identified to possess the highest concentrations of bioactive secondary metabolites such as chlorogenic acid, caffeic acid, luteolin, orthosiphol and apigenin, followed by microwave-dried samples. Freeze-dried leaves had higher concentrations of choline, amino acids leucine, alanine and glutamine and sugars such as fructose and α-glucose, but contained the lowest levels of secondary metabolites. Metabolite profiling coupled with multivariate analysis identified shade drying as the best method to prepare OS leaves as Java tea or to include in traditional medicine preparation. © 2017 Society of Chemical Industry. © 2017 Society of Chemical Industry.

  15. A multi-analyte serum test for the detection of non-small cell lung cancer

    PubMed Central

    Farlow, E C; Vercillo, M S; Coon, J S; Basu, S; Kim, A W; Faber, L P; Warren, W H; Bonomi, P; Liptay, M J; Borgia, J A

    2010-01-01

    Background: In this study, we appraised a wide assortment of biomarkers previously shown to have diagnostic or prognostic value for non-small cell lung cancer (NSCLC) with the intent of establishing a multi-analyte serum test capable of identifying patients with lung cancer. Methods: Circulating levels of 47 biomarkers were evaluated against patient cohorts consisting of 90 NSCLC and 43 non-cancer controls using commercial immunoassays. Multivariate statistical methods were used on all biomarkers achieving statistical relevance to define an optimised panel of diagnostic biomarkers for NSCLC. The resulting biomarkers were fashioned into a classification algorithm and validated against serum from a second patient cohort. Results: A total of 14 analytes achieved statistical relevance upon evaluation. Multivariate statistical methods then identified a panel of six biomarkers (tumour necrosis factor-α, CYFRA 21-1, interleukin-1ra, matrix metalloproteinase-2, monocyte chemotactic protein-1 and sE-selectin) as being the most efficacious for diagnosing early stage NSCLC. When tested against a second patient cohort, the panel successfully classified 75 of 88 patients. Conclusions: Here, we report the development of a serum algorithm with high specificity for classifying patients with NSCLC against cohorts of various ‘high-risk' individuals. A high rate of false positives was observed within the cohort in which patients had non-neoplastic lung nodules, possibly as a consequence of the inflammatory nature of these conditions. PMID:20859284

  16. Classification of Physical Activity: Information to Artificial Pancreas Control Systems in Real Time.

    PubMed

    Turksoy, Kamuran; Paulino, Thiago Marques Luz; Zaharieva, Dessi P; Yavelberg, Loren; Jamnik, Veronica; Riddell, Michael C; Cinar, Ali

    2015-10-06

    Physical activity has a wide range of effects on glucose concentrations in type 1 diabetes (T1D) depending on the type (ie, aerobic, anaerobic, mixed) and duration of activity performed. This variability in glucose responses to physical activity makes the development of artificial pancreas (AP) systems challenging. Automatic detection of exercise type and intensity, and its classification as aerobic or anaerobic would provide valuable information to AP control algorithms. This can be achieved by using a multivariable AP approach where biometric variables are measured and reported to the AP at high frequency. We developed a classification system that identifies, in real time, the exercise intensity and its reliance on aerobic or anaerobic metabolism and tested this approach using clinical data collected from 5 persons with T1D and 3 individuals without T1D in a controlled laboratory setting using a variety of common types of physical activity. The classifier had an average sensitivity of 98.7% for physiological data collected over a range of exercise modalities and intensities in these subjects. The classifier will be added as a new module to the integrated multivariable adaptive AP system to enable the detection of aerobic and anaerobic exercise for enhancing the accuracy of insulin infusion strategies during and after exercise. © 2015 Diabetes Technology Society.

  17. Multivariate Classification of Structural MRI Data Detects Chronic Low Back Pain

    PubMed Central

    Ung, Hoameng; Brown, Justin E.; Johnson, Kevin A.; Younger, Jarred; Hush, Julia; Mackey, Sean

    2014-01-01

    Chronic low back pain (cLBP) has a tremendous personal and socioeconomic impact, yet the underlying pathology remains a mystery in the majority of cases. An objective measure of this condition, that augments self-report of pain, could have profound implications for diagnostic characterization and therapeutic development. Contemporary research indicates that cLBP is associated with abnormal brain structure and function. Multivariate analyses have shown potential to detect a number of neurological diseases based on structural neuroimaging. Therefore, we aimed to empirically evaluate such an approach in the detection of cLBP, with a goal to also explore the relevant neuroanatomy. We extracted brain gray matter (GM) density from magnetic resonance imaging scans of 47 patients with cLBP and 47 healthy controls. cLBP was classified with an accuracy of 76% by support vector machine analysis. Primary drivers of the classification included areas of the somatosensory, motor, and prefrontal cortices—all areas implicated in the pain experience. Differences in areas of the temporal lobe, including bordering the amygdala, medial orbital gyrus, cerebellum, and visual cortex, were also useful for the classification. Our findings suggest that cLBP is characterized by a pattern of GM changes that can have discriminative power and reflect relevant pathological brain morphology. PMID:23246778

  18. Classification of Physical Activity

    PubMed Central

    Turksoy, Kamuran; Paulino, Thiago Marques Luz; Zaharieva, Dessi P.; Yavelberg, Loren; Jamnik, Veronica; Riddell, Michael C.; Cinar, Ali

    2015-01-01

    Physical activity has a wide range of effects on glucose concentrations in type 1 diabetes (T1D) depending on the type (ie, aerobic, anaerobic, mixed) and duration of activity performed. This variability in glucose responses to physical activity makes the development of artificial pancreas (AP) systems challenging. Automatic detection of exercise type and intensity, and its classification as aerobic or anaerobic would provide valuable information to AP control algorithms. This can be achieved by using a multivariable AP approach where biometric variables are measured and reported to the AP at high frequency. We developed a classification system that identifies, in real time, the exercise intensity and its reliance on aerobic or anaerobic metabolism and tested this approach using clinical data collected from 5 persons with T1D and 3 individuals without T1D in a controlled laboratory setting using a variety of common types of physical activity. The classifier had an average sensitivity of 98.7% for physiological data collected over a range of exercise modalities and intensities in these subjects. The classifier will be added as a new module to the integrated multivariable adaptive AP system to enable the detection of aerobic and anaerobic exercise for enhancing the accuracy of insulin infusion strategies during and after exercise. PMID:26443291

  19. FT-Raman and NIR spectroscopy data fusion strategy for multivariate qualitative analysis of food fraud.

    PubMed

    Márquez, Cristina; López, M Isabel; Ruisánchez, Itziar; Callao, M Pilar

    2016-12-01

    Two data fusion strategies (high- and mid-level) combined with a multivariate classification approach (Soft Independent Modelling of Class Analogy, SIMCA) have been applied to take advantage of the synergistic effect of the information obtained from two spectroscopic techniques: FT-Raman and NIR. Mid-level data fusion consists of merging some of the previous selected variables from the spectra obtained from each spectroscopic technique and then applying the classification technique. High-level data fusion combines the SIMCA classification results obtained individually from each spectroscopic technique. Of the possible ways to make the necessary combinations, we decided to use fuzzy aggregation connective operators. As a case study, we considered the possible adulteration of hazelnut paste with almond. Using the two-class SIMCA approach, class 1 consisted of unadulterated hazelnut samples and class 2 of samples adulterated with almond. Models performance was also studied with samples adulterated with chickpea. The results show that data fusion is an effective strategy since the performance parameters are better than the individual ones: sensitivity and specificity values between 75% and 100% for the individual techniques and between 96-100% and 88-100% for the mid- and high-level data fusion strategies, respectively. Copyright © 2016 Elsevier B.V. All rights reserved.

  20. Discriminant analysis of cardiovascular and respiratory variables for classification of road cyclists by specialty.

    PubMed

    Nikolić, Biljana; Martinović, Jelena; Matić, Milan; Stefanović, Đorđe

    2018-05-29

    Different variables determine the performance of cyclists, which brings up the question how these parameters may help in their classification by specialty. The aim of the study was to determine differences in cardiorespiratory parameters of male cyclists according to their specialty, flat rider (N=21), hill rider (N=35) and sprinter (N=20) and obtain the multivariate model for further cyclists classification by specialties, based on selected variables. Seventeen variables were measured at submaximal and maximum load on the cycle ergometer Cosmed E 400HK (Cosmed, Rome, Italy) (initial 100W with 25W increase, 90-100 rpm). Multivariate discriminant analysis was used to determine which variables group cyclists within their specialty, and to predict which variables can direct cyclists to a particular specialty. Among nine variables that statistically contribute to the discriminant power of the model, achieved power on the anaerobic threshold and the produced CO2 had the biggest impact. The obtained discriminatory model correctly classified 91.43% of flat riders, 85.71% of hill riders, while sprinters were classified completely correct (100%), i.e. 92.10% of examinees were correctly classified, which point out the strength of the discriminatory model. Respiratory indicators mostly contribute to the discriminant power of the model, which may significantly contribute to training practice and laboratory tests in future.

  1. Non-targeted 1H NMR fingerprinting and multivariate statistical analyses for the characterisation of the geographical origin of Italian sweet cherries.

    PubMed

    Longobardi, F; Ventrella, A; Bianco, A; Catucci, L; Cafagna, I; Gallo, V; Mastrorilli, P; Agostiano, A

    2013-12-01

    In this study, non-targeted (1)H NMR fingerprinting was used in combination with multivariate statistical techniques for the classification of Italian sweet cherries based on their different geographical origins (Emilia Romagna and Puglia). As classification techniques, Soft Independent Modelling of Class Analogy (SIMCA), Partial Least Squares Discriminant Analysis (PLS-DA), and Linear Discriminant Analysis (LDA) were carried out and the results were compared. For LDA, before performing a refined selection of the number/combination of variables, two different strategies for a preliminary reduction of the variable number were tested. The best average recognition and CV prediction abilities (both 100.0%) were obtained for all the LDA models, although PLS-DA also showed remarkable performances (94.6%). All the statistical models were validated by observing the prediction abilities with respect to an external set of cherry samples. The best result (94.9%) was obtained with LDA by performing a best subset selection procedure on a set of 30 principal components previously selected by a stepwise decorrelation. The metabolites that mostly contributed to the classification performances of such LDA model, were found to be malate, glucose, fructose, glutamine and succinate. Copyright © 2013 Elsevier Ltd. All rights reserved.

  2. Towards intra-operative diagnosis of tumours during breast conserving surgery by selective-sampling Raman micro-spectroscopy

    NASA Astrophysics Data System (ADS)

    Kong, Kenny; Zaabar, Fazliyana; Rakha, Emad; Ellis, Ian; Koloydenko, Alexey; Notingher, Ioan

    2014-10-01

    Breast-conserving surgery (BCS) is increasingly employed for the treatment of early stage breast cancer. One of the key challenges in BCS is to ensure complete removal of the tumour while conserving as much healthy tissue as possible. In this study we have investigated the potential of Raman micro-spectroscopy (RMS) for automated intra-operative evaluation of tumour excision. First, a multivariate classification model based on Raman spectra of normal and malignant breast tissue samples was built and achieved diagnosis of mammary ductal carcinoma (DC) with 95.6% sensitivity and 96.2% specificity (5-fold cross-validation). The tumour regions were discriminated from the healthy tissue structures based on increased concentration of nucleic acids and reduced concentration of collagen and fat. The multivariate classification model was then applied to sections from fresh tissue of new patients to produce diagnosis images for DC. The diagnosis images obtained by raster scanning RMS were in agreement with the conventional histopathology diagnosis but were limited to long data acquisition times (typically 10 000 spectra mm-2, which is equivalent to ~5 h mm-2). Selective-sampling based on integrated auto-fluorescence imaging and Raman spectroscopy was used to reduce the number of Raman spectra to ~20 spectra mm-2, which is equivalent to an acquisition time of ~15 min for 5 × 5 mm2 tissue samples. This study suggests that selective-sampling Raman microscopy has the potential to provide a rapid and objective intra-operative method to detect mammary carcinoma in tissue and assess resection margins.

  3. Chronic pruritus: evaluation of patient needs and treatment goals with a special regard to differences according to pruritus classification and sex.

    PubMed

    Steinke, S; Bruland, P; Blome, C; Osada, N; Dugas, M; Fritz, F; Augustin, M; Ständer, S

    2017-02-01

    Chronic pruritus (CP) is present in approximately one-third of all dermatological patients. Diagnostics and treatment are challenging and impair patients' quality of life. To analyse therapeutic needs in terms of the importance of treatment goals in a large sample of patients with CP. Routine data of 2747 patients with CP were analysed with descriptive methods and significance tests (univariate and multivariate variance analyses). The importance of 27 need items was measured using the Patient Needs Questionnaire of the Patient Benefit Index. The most important needs were to find a clear diagnosis and treatment, to no longer experience itching and to have confidence in the therapy, which were quite or very important to > 90% of the patients. The least important goals concerned a normal working or sex life. Nine needs related mostly to disease and psychological symptoms, and some social needs differed in importance between sexes (P ≤ 0·05). Patients with pruritus on inflamed skin or with chronic scratch lesions judged more than half of all needs as more important than did patients with pruritus on noninflamed skin (P ≤ 0·05). In the multivariate model, age, pruritus intensity and quality of life had a significant effect on the importance of therapeutic needs besides sex and pruritus classification. Patients with CP present high levels of various therapeutic needs with differences by sex and clinical phenotype. The most important needs can be addressed through medical activities such as appropriate itch medication and a trustful doctor-patient relationship. © 2016 British Association of Dermatologists.

  4. Newer classification and regression tree techniques: Bagging and Random Forests for ecological prediction

    Treesearch

    Anantha M. Prasad; Louis R. Iverson; Andy Liaw; Andy Liaw

    2006-01-01

    We evaluated four statistical models - Regression Tree Analysis (RTA), Bagging Trees (BT), Random Forests (RF), and Multivariate Adaptive Regression Splines (MARS) - for predictive vegetation mapping under current and future climate scenarios according to the Canadian Climate Centre global circulation model.

  5. Study of archaeological coins of different dynasties using libs coupled with multivariate analysis

    NASA Astrophysics Data System (ADS)

    Awasthi, Shikha; Kumar, Rohit; Rai, G. K.; Rai, A. K.

    2016-04-01

    Laser Induced Breakdown Spectroscopy (LIBS) is an atomic emission spectroscopic technique having unique capability of an in-situ monitoring tool for detection and quantification of elements present in different artifacts. Archaeological coins collected form G.R. Sharma Memorial Museum; University of Allahabad, India has been analyzed using LIBS technique. These coins were obtained from excavation of Kausambi, Uttar Pradesh, India. LIBS system assembled in the laboratory (laser Nd:YAG 532 nm, 4 ns pulse width FWHM with Ocean Optics LIBS 2000+ spectrometer) is employed for spectral acquisition. The spectral lines of Ag, Cu, Ca, Sn, Si, Fe and Mg are identified in the LIBS spectra of different coins. LIBS along with Multivariate Analysis play an effective role for classification and contribution of spectral lines in different coins. The discrimination between five coins with Archaeological interest has been carried out using Principal Component Analysis (PCA). The results show the potential relevancy of the methodology used in the elemental identification and classification of artifacts with high accuracy and robustness.

  6. Craters on Earth, Moon, and Mars: Multivariate classification and mode of origin

    USGS Publications Warehouse

    Pike, R.J.

    1974-01-01

    Testing extraterrestrial craters and candidate terrestrial analogs for morphologic similitude is treated as a problem in numerical taxonomy. According to a principal-components solution and a cluster analysis, 402 representative craters on the Earth, the Moon, and Mars divide into two major classes of contrasting shapes and modes of origin. Craters of net accumulation of material (cratered lunar domes, Martian "calderas," and all terrestrial volcanoes except maars and tuff rings) group apart from craters of excavation (terrestrial meteorite impact and experimental explosion craters, typical Martian craters, and all other lunar craters). Maars and tuff rings belong to neither group but are transitional. The classification criteria are four independent attributes of topographic geometry derived from seven descriptive variables by the principal-components transformation. Morphometric differences between crater bowl and raised rim constitute the strongest of the four components. Although single topographic variables cannot confidently predict the genesis of individual extraterrestrial craters, multivariate statistical models constructed from several variables can distinguish consistently between large impact craters and volcanoes. ?? 1974.

  7. Neuroendocrine tumors of colon and rectum: validation of clinical and prognostic values of the World Health Organization 2010 grading classifications and European Neuroendocrine Tumor Society staging systems.

    PubMed

    Shen, Chaoyong; Yin, Yuan; Chen, Huijiao; Tang, Sumin; Yin, Xiaonan; Zhou, Zongguang; Zhang, Bo; Chen, Zhixin

    2017-03-28

    This study evaluated and compared the clinical and prognostic values of the grading criteria used by the World Health Organization (WHO) and the European Neuroendocrine Tumors Society (ENETS). Moreover, this work assessed the current best prognostic model for colorectal neuroendocrine tumors (CRNETs). The 2010 WHO classifications and the ENETS systems can both stratify the patients into prognostic groups, although the 2010 WHO criteria is more applicable to CRNET patients. Along with tumor location, the 2010 WHO criteria are important independent prognostic parameters for CRNETs in both univariate and multivariate analyses through Cox regression (P<0.05). Data from 192 consecutive patients histopathologically diagnosed with CRNETs and had undergone surgical resection from January 2009 to May 2016 in a single center were retrospectively analyzed. Findings suggest that the WHO classifications are superior over the ENETS classification system in predicting the prognosis of CRNETs. Additionally, the WHO classifications can be widely used in clinical practice.

  8. Patterns of brain structural connectivity differentiate normal weight from overweight subjects

    PubMed Central

    Gupta, Arpana; Mayer, Emeran A.; Sanmiguel, Claudia P.; Van Horn, John D.; Woodworth, Davis; Ellingson, Benjamin M.; Fling, Connor; Love, Aubrey; Tillisch, Kirsten; Labus, Jennifer S.

    2015-01-01

    Background Alterations in the hedonic component of ingestive behaviors have been implicated as a possible risk factor in the pathophysiology of overweight and obese individuals. Neuroimaging evidence from individuals with increasing body mass index suggests structural, functional, and neurochemical alterations in the extended reward network and associated networks. Aim To apply a multivariate pattern analysis to distinguish normal weight and overweight subjects based on gray and white-matter measurements. Methods Structural images (N = 120, overweight N = 63) and diffusion tensor images (DTI) (N = 60, overweight N = 30) were obtained from healthy control subjects. For the total sample the mean age for the overweight group (females = 32, males = 31) was 28.77 years (SD = 9.76) and for the normal weight group (females = 32, males = 25) was 27.13 years (SD = 9.62). Regional segmentation and parcellation of the brain images was performed using Freesurfer. Deterministic tractography was performed to measure the normalized fiber density between regions. A multivariate pattern analysis approach was used to examine whether brain measures can distinguish overweight from normal weight individuals. Results 1. White-matter classification: The classification algorithm, based on 2 signatures with 17 regional connections, achieved 97% accuracy in discriminating overweight individuals from normal weight individuals. For both brain signatures, greater connectivity as indexed by increased fiber density was observed in overweight compared to normal weight between the reward network regions and regions of the executive control, emotional arousal, and somatosensory networks. In contrast, the opposite pattern (decreased fiber density) was found between ventromedial prefrontal cortex and the anterior insula, and between thalamus and executive control network regions. 2. Gray-matter classification: The classification algorithm, based on 2 signatures with 42 morphological features, achieved 69% accuracy in discriminating overweight from normal weight. In both brain signatures regions of the reward, salience, executive control and emotional arousal networks were associated with lower morphological values in overweight individuals compared to normal weight individuals, while the opposite pattern was seen for regions of the somatosensory network. Conclusions 1. An increased BMI (i.e., overweight subjects) is associated with distinct changes in gray-matter and fiber density of the brain. 2. Classification algorithms based on white-matter connectivity involving regions of the reward and associated networks can identify specific targets for mechanistic studies and future drug development aimed at abnormal ingestive behavior and in overweight/obesity. PMID:25737959

  9. Pattern classification of fMRI data: applications for analysis of spatially distributed cortical networks.

    PubMed

    Yourganov, Grigori; Schmah, Tanya; Churchill, Nathan W; Berman, Marc G; Grady, Cheryl L; Strother, Stephen C

    2014-08-01

    The field of fMRI data analysis is rapidly growing in sophistication, particularly in the domain of multivariate pattern classification. However, the interaction between the properties of the analytical model and the parameters of the BOLD signal (e.g. signal magnitude, temporal variance and functional connectivity) is still an open problem. We addressed this problem by evaluating a set of pattern classification algorithms on simulated and experimental block-design fMRI data. The set of classifiers consisted of linear and quadratic discriminants, linear support vector machine, and linear and nonlinear Gaussian naive Bayes classifiers. For linear discriminant, we used two methods of regularization: principal component analysis, and ridge regularization. The classifiers were used (1) to classify the volumes according to the behavioral task that was performed by the subject, and (2) to construct spatial maps that indicated the relative contribution of each voxel to classification. Our evaluation metrics were: (1) accuracy of out-of-sample classification and (2) reproducibility of spatial maps. In simulated data sets, we performed an additional evaluation of spatial maps with ROC analysis. We varied the magnitude, temporal variance and connectivity of simulated fMRI signal and identified the optimal classifier for each simulated environment. Overall, the best performers were linear and quadratic discriminants (operating on principal components of the data matrix) and, in some rare situations, a nonlinear Gaussian naïve Bayes classifier. The results from the simulated data were supported by within-subject analysis of experimental fMRI data, collected in a study of aging. This is the first study that systematically characterizes interactions between analysis model and signal parameters (such as magnitude, variance and correlation) on the performance of pattern classifiers for fMRI. Copyright © 2014 Elsevier Inc. All rights reserved.

  10. Efficacy of the Kyoto Classification of Gastritis in Identifying Patients at High Risk for Gastric Cancer.

    PubMed

    Sugimoto, Mitsushige; Ban, Hiromitsu; Ichikawa, Hitomi; Sahara, Shu; Otsuka, Taketo; Inatomi, Osamu; Bamba, Shigeki; Furuta, Takahisa; Andoh, Akira

    2017-01-01

    Objective The Kyoto gastritis classification categorizes the endoscopic characteristics of Helicobacter pylori (H. pylori) infection-associated gastritis and identifies patterns associated with a high risk of gastric cancer. We investigated its efficacy, comparing scores in patients with H. pylori-associated gastritis and with gastric cancer. Methods A total of 1,200 patients with H. pylori-positive gastritis alone (n=932), early-stage H. pylori-positive gastric cancer (n=189), and successfully treated H. pylori-negative cancer (n=79) were endoscopically graded according to the Kyoto gastritis classification for atrophy, intestinal metaplasia, fold hypertrophy, nodularity, and diffuse redness. Results The prevalence of O-II/O-III-type atrophy according to the Kimura-Takemoto classification in early-stage H. pylori-positive gastric cancer and successfully treated H. pylori-negative cancer groups was 45.1%, which was significantly higher than in subjects with gastritis alone (12.7%, p<0.001). Kyoto gastritis scores of atrophy and intestinal metaplasia in the H. pylori-positive cancer group were significantly higher than in subjects with gastritis alone (all p<0.001). No significant differences were noted in the rates of gastric fold hypertrophy or diffuse redness between the two groups. In a multivariate analysis, the risks for H. pylori-positive gastric cancer increased with intestinal metaplasia (odds ratio: 4.453, 95% confidence interval: 3.332-5.950, <0.001) and male sex (1.737, 1.102-2.739, p=0.017). Conclusion Making an appropriate diagnosis and detecting patients at high risk is crucial for achieving total eradication of gastric cancer. The scores of intestinal metaplasia and atrophy of the scoring system in the Kyoto gastritis classification may thus be useful for detecting these patients.

  11. Efficacy of the Kyoto Classification of Gastritis in Identifying Patients at High Risk for Gastric Cancer

    PubMed Central

    Sugimoto, Mitsushige; Ban, Hiromitsu; Ichikawa, Hitomi; Sahara, Shu; Otsuka, Taketo; Inatomi, Osamu; Bamba, Shigeki; Furuta, Takahisa; Andoh, Akira

    2017-01-01

    Objective The Kyoto gastritis classification categorizes the endoscopic characteristics of Helicobacter pylori (H. pylori) infection-associated gastritis and identifies patterns associated with a high risk of gastric cancer. We investigated its efficacy, comparing scores in patients with H. pylori-associated gastritis and with gastric cancer. Methods A total of 1,200 patients with H. pylori-positive gastritis alone (n=932), early-stage H. pylori-positive gastric cancer (n=189), and successfully treated H. pylori-negative cancer (n=79) were endoscopically graded according to the Kyoto gastritis classification for atrophy, intestinal metaplasia, fold hypertrophy, nodularity, and diffuse redness. Results The prevalence of O-II/O-III-type atrophy according to the Kimura-Takemoto classification in early-stage H. pylori-positive gastric cancer and successfully treated H. pylori-negative cancer groups was 45.1%, which was significantly higher than in subjects with gastritis alone (12.7%, p<0.001). Kyoto gastritis scores of atrophy and intestinal metaplasia in the H. pylori-positive cancer group were significantly higher than in subjects with gastritis alone (all p<0.001). No significant differences were noted in the rates of gastric fold hypertrophy or diffuse redness between the two groups. In a multivariate analysis, the risks for H. pylori-positive gastric cancer increased with intestinal metaplasia (odds ratio: 4.453, 95% confidence interval: 3.332-5.950, <0.001) and male sex (1.737, 1.102-2.739, p=0.017). Conclusion Making an appropriate diagnosis and detecting patients at high risk is crucial for achieving total eradication of gastric cancer. The scores of intestinal metaplasia and atrophy of the scoring system in the Kyoto gastritis classification may thus be useful for detecting these patients. PMID:28321054

  12. Rapid discrimination of different Apiaceae species based on HPTLC fingerprints and targeted flavonoids determination using multivariate image analysis.

    PubMed

    Shawky, Eman; Abou El Kheir, Rasha M

    2018-02-11

    Species of Apiaceae are used in folk medicine as spices and in officinal medicinal preparations of drugs. They are an excellent source of phenolics exhibiting antioxidant activity, which are of great benefit to human health. Discrimination among Apiaceae medicinal herbs remains an intricate challenge due to their morphological similarity. In this study, a combined "untargeted" and "targeted" approach to investigate different Apiaceae plants species was proposed by using the merging of high-performance thin layer chromatography (HPTLC)-image analysis and pattern recognition methods which were used for fingerprinting and classification of 42 different Apiaceae samples collected from Egypt. Software for image processing was applied for fingerprinting and data acquisition. HPTLC fingerprint assisted by principal component analysis (PCA) and hierarchical cluster analysis (HCA)-heat maps resulted in a reliable untargeted approach for discrimination and classification of different samples. The "targeted" approach was performed by developing and validating an HPTLC method allowing the quantification of eight flavonoids. The combination of quantitative data with PCA and HCA-heat-maps allowed the different samples to be discriminated from each other. The use of chemometrics tools for evaluation of fingerprints reduced expense and analysis time. The proposed method can be adopted for routine discrimination and evaluation of the phytochemical variability in different Apiaceae species extracts. Copyright © 2018 John Wiley & Sons, Ltd.

  13. Identification of species and geographical strains of Sitophilus oryzae and Sitophilus zeamais using the visible/near-infrared hyperspectral imaging technique.

    PubMed

    Cao, Yang; Zhang, Chaojie; Chen, Quansheng; Li, Yanyu; Qi, Shuai; Tian, Lin; Ren, YongLin

    2015-08-01

    Identifying stored-product insects is essential for granary management. Automated, computer-based classification methods are rapidly developing in many areas. A hyperspectral imaging technique could potentially be developed to identify stored-product insect species and geographical strains. This study tested and adapted the technique using four geographical strains of each of two insect species, the rice weevil and maize weevil, to collect and analyse the resultant hyperspectral data. Three characteristic images that corresponded to the dominant wavelengths, 505, 659 and 955 nm, were selected by multivariate image analysis. Each image was processed, and 22 morphological and textural features from regions of interest were extracted as the inputs for an identification model. We found the backpropagation neural network model to be the superior method for distinguishing between the insect species and geographical strains. The overall recognition rates of the classification model for insect species were 100 and 98.13% for the calibration and prediction sets respectively, while the rates of the model for geographical strains were 94.17 and 86.88% respectively. This study has demonstrated that hyperspectral imaging, together with the appropriate recognition method, could provide a potential instrument for identifying insects and could become a useful tool for identification of Sitophilus oryzae and Sitophilus zeamais to aid in the management of stored-product insects. © 2014 Society of Chemical Industry.

  14. A data fusion-based drought index

    NASA Astrophysics Data System (ADS)

    Azmi, Mohammad; Rüdiger, Christoph; Walker, Jeffrey P.

    2016-03-01

    Drought and water stress monitoring plays an important role in the management of water resources, especially during periods of extreme climate conditions. Here, a data fusion-based drought index (DFDI) has been developed and analyzed for three different locations of varying land use and climate regimes in Australia. The proposed index comprehensively considers all types of drought through a selection of indices and proxies associated with each drought type. In deriving the proposed index, weekly data from three different data sources (OzFlux Network, Asia-Pacific Water Monitor, and MODIS-Terra satellite) were employed to first derive commonly used individual standardized drought indices (SDIs), which were then grouped using an advanced clustering method. Next, three different multivariate methods (principal component analysis, factor analysis, and independent component analysis) were utilized to aggregate the SDIs located within each group. For the two clusters in which the grouped SDIs best reflected the water availability and vegetation conditions, the variables were aggregated based on an averaging between the standardized first principal components of the different multivariate methods. Then, considering those two aggregated indices as well as the classifications of months (dry/wet months and active/non-active months), the proposed DFDI was developed. Finally, the symbolic regression method was used to derive mathematical equations for the proposed DFDI. The results presented here show that the proposed index has revealed new aspects in water stress monitoring which previous indices were not able to, by simultaneously considering both hydrometeorological and ecological concepts to define the real water stress of the study areas.

  15. Weather patterns as a downscaling tool - evaluating their skill in stratifying local climate variables

    NASA Astrophysics Data System (ADS)

    Murawski, Aline; Bürger, Gerd; Vorogushyn, Sergiy; Merz, Bruno

    2016-04-01

    The use of a weather pattern based approach for downscaling of coarse, gridded atmospheric data, as usually obtained from the output of general circulation models (GCM), allows for investigating the impact of anthropogenic greenhouse gas emissions on fluxes and state variables of the hydrological cycle such as e.g. on runoff in large river catchments. Here we aim at attributing changes in high flows in the Rhine catchment to anthropogenic climate change. Therefore we run an objective classification scheme (simulated annealing and diversified randomisation - SANDRA, available from the cost733 classification software) on ERA20C reanalyses data and apply the established classification to GCMs from the CMIP5 project. After deriving weather pattern time series from GCM runs using forcing from all greenhouse gases (All-Hist) and using natural greenhouse gas forcing only (Nat-Hist), a weather generator will be employed to obtain climate data time series for the hydrological model. The parameters of the weather pattern classification (i.e. spatial extent, number of patterns, classification variables) need to be selected in a way that allows for good stratification of the meteorological variables that are of interest for the hydrological modelling. We evaluate the skill of the classification in stratifying meteorological data using a multi-variable approach. This allows for estimating the stratification skill for all meteorological variables together, not separately as usually done in existing similar work. The advantage of the multi-variable approach is to properly account for situations where e.g. two patterns are associated with similar mean daily temperature, but one pattern is dry while the other one is related to considerable amounts of precipitation. Thus, the separation of these two patterns would not be justified when considering temperature only, but is perfectly reasonable when accounting for precipitation as well. Besides that, the weather patterns derived from reanalyses data should be well represented in the All-Hist GCM runs in terms of e.g. frequency, seasonality, and persistence. In this contribution we show how to select the most appropriate weather pattern classification and how the classes derived from it are reflected in the GCMs.

  16. Partial Least Squares for Discrimination in fMRI Data

    PubMed Central

    Andersen, Anders H.; Rayens, William S.; Liu, Yushu; Smith, Charles D.

    2011-01-01

    Multivariate methods for discrimination were used in the comparison of brain activation patterns between groups of cognitively normal women who are at either high or low Alzheimer's disease risk based on family history and apolipoprotein-E4 status. Linear discriminant analysis (LDA) was preceded by dimension reduction using either principal component analysis (PCA), partial least squares (PLS), or a new oriented partial least squares (OrPLS) method. The aim was to identify a spatial pattern of functionally connected brain regions that was differentially expressed by the risk groups and yielded optimal classification accuracy. Multivariate dimension reduction is required prior to LDA when the data contains more feature variables than there are observations on individual subjects. Whereas PCA has been commonly used to identify covariance patterns in neuroimaging data, this approach only identifies gross variability and is not capable of distinguishing among-groups from within-groups variability. PLS and OrPLS provide a more focused dimension reduction by incorporating information on class structure and therefore lead to more parsimonious models for discrimination. Performance was evaluated in terms of the cross-validated misclassification rates. The results support the potential of using fMRI as an imaging biomarker or diagnostic tool to discriminate individuals with disease or high risk. PMID:22227352

  17. Exploring high dimensional data with Butterfly: a novel classification algorithm based on discrete dynamical systems.

    PubMed

    Geraci, Joseph; Dharsee, Moyez; Nuin, Paulo; Haslehurst, Alexandria; Koti, Madhuri; Feilotter, Harriet E; Evans, Ken

    2014-03-01

    We introduce a novel method for visualizing high dimensional data via a discrete dynamical system. This method provides a 2D representation of the relationship between subjects according to a set of variables without geometric projections, transformed axes or principal components. The algorithm exploits a memory-type mechanism inherent in a certain class of discrete dynamical systems collectively referred to as the chaos game that are closely related to iterative function systems. The goal of the algorithm was to create a human readable representation of high dimensional patient data that was capable of detecting unrevealed subclusters of patients from within anticipated classifications. This provides a mechanism to further pursue a more personalized exploration of pathology when used with medical data. For clustering and classification protocols, the dynamical system portion of the algorithm is designed to come after some feature selection filter and before some model evaluation (e.g. clustering accuracy) protocol. In the version given here, a univariate features selection step is performed (in practice more complex feature selection methods are used), a discrete dynamical system is driven by this reduced set of variables (which results in a set of 2D cluster models), these models are evaluated for their accuracy (according to a user-defined binary classification) and finally a visual representation of the top classification models are returned. Thus, in addition to the visualization component, this methodology can be used for both supervised and unsupervised machine learning as the top performing models are returned in the protocol we describe here. Butterfly, the algorithm we introduce and provide working code for, uses a discrete dynamical system to classify high dimensional data and provide a 2D representation of the relationship between subjects. We report results on three datasets (two in the article; one in the appendix) including a public lung cancer dataset that comes along with the included Butterfly R package. In the included R script, a univariate feature selection method is used for the dimension reduction step, but in the future we wish to use a more powerful multivariate feature reduction method based on neural networks (Kriesel, 2007). A script written in R (designed to run on R studio) accompanies this article that implements this algorithm and is available at http://butterflygeraci.codeplex.com/. For details on the R package or for help installing the software refer to the accompanying document, Supporting Material and Appendix.

  18. Intelligent quotient estimation of mental retarded people from different psychometric instruments using artificial neural networks.

    PubMed

    Di Nuovo, Alessandro G; Di Nuovo, Santo; Buono, Serafino

    2012-02-01

    The estimation of a person's intelligence quotient (IQ) by means of psychometric tests is indispensable in the application of psychological assessment to several fields. When complex tests as the Wechsler scales, which are the most commonly used and universally recognized parameter for the diagnosis of degrees of retardation, are not applicable, it is necessary to use other psycho-diagnostic tools more suited for the subject's specific condition. But to ensure a homogeneous diagnosis it is necessary to reach a common metric, thus, the aim of our work is to build models able to estimate accurately and reliably the Wechsler IQ, starting from different psycho-diagnostic tools. Four different psychometric tests (Leiter international performance scale; coloured progressive matrices test; the mental development scale; psycho educational profile), along with the Wechsler scale, were administered to a group of 40 mentally retarded subjects, with various pathologies, and control persons. The obtained database is used to evaluate Wechsler IQ estimation models starting from the scores obtained in the other tests. Five modelling methods, two statistical and three from machine learning, that belong to the family of artificial neural networks (ANNs) are employed to build the estimator. Several error metrics for estimated IQ and for retardation level classification are defined to compare the performance of the various models with univariate and multivariate analyses. Eight empirical studies show that, after ten-fold cross-validation, best average estimation error is of 3.37 IQ points and mental retardation level classification error of 7.5%. Furthermore our experiments prove the superior performance of ANN methods over statistical regression ones, because in all cases considered ANN models show the lowest estimation error (from 0.12 to 0.9 IQ points) and the lowest classification error (from 2.5% to 10%). Since the estimation performance is better than the confidence interval of Wechsler scales (five IQ points), we consider models built very accurate and reliable and they can be used into help clinical diagnosis. Therefore a computer software based on the results of our work is currently used in a clinical center and empirical trails confirm its validity. Furthermore positive results in our multivariate studies suggest new approaches for clinicians. Copyright © 2011 Elsevier B.V. All rights reserved.

  19. Are there pollination syndromes in the Australian epacrids (Ericaceae: Styphelioideae)? A novel statistical method to identify key floral traits per syndrome

    PubMed Central

    Johnson, Karen A.

    2013-01-01

    Background and Aims Convergent floral traits hypothesized as attracting particular pollinators are known as pollination syndromes. Floral diversity suggests that the Australian epacrid flora may be adapted to pollinator type. Currently there are empirical data on the pollination systems for 87 species (approx. 15 % of Australian epacrids). This provides an opportunity to test for pollination syndromes and their important morphological traits in an iconic element of the Australian flora. Methods Data on epacrid–pollinator relationships were obtained from published literature and field observation. A multivariate approach was used to test whether epacrid floral attributes related to pollinator profiles. Statistical classification was then used to rank floral attributes according to their predictive value. Data sets excluding mixed pollination systems were used to test the predictive power of statistical classification to identify pollination models. Key Results Floral attributes are correlated with bird, fly and bee pollination. Using floral attributes identified as correlating with pollinator type, bird pollination is classified with 86 % accuracy, red flowers being the most important predictor. Fly and bee pollination are classified with 78 and 69 % accuracy, but have a lack of individually important floral predictors. Excluding mixed pollination systems improved the accuracy of the prediction of both bee and fly pollination systems. Conclusions Although most epacrids have generalized pollination systems, a correlation between bird pollination and red, long-tubed epacrids is found. Statistical classification highlights the relative importance of each floral attribute in relation to pollinator type and proves useful in classifying epacrids to bird, fly and bee pollination systems. PMID:23681546

  20. The classification of gunshot residue using laser electrospray mass spectrometry and offline multivariate statistical analysis

    USDA-ARS?s Scientific Manuscript database

    Nonresonant laser vaporization combined with high-resolution electrospray time-of-flight mass spectrometry enables analysis of a casing after discharge of a firearm revealing organic signature molecules including methyl centralite (MC), diphenylamine (DPA), N-nitrosodiphenylamine (N-NO-DPA), 4-nitro...

  1. Dual-modal cancer detection based on optical pH sensing and Raman spectroscopy.

    PubMed

    Kim, Soogeun; Lee, Seung Ho; Min, Sun Young; Byun, Kyung Min; Lee, Soo Yeol

    2017-10-01

    A dual-modal approach using Raman spectroscopy and optical pH sensing was investigated to discriminate between normal and cancerous tissues. Raman spectroscopy has demonstrated the potential for in vivo cancer detection. However, Raman spectroscopy has suffered from strong fluorescence background of biological samples and subtle spectral differences between normal and disease tissues. To overcome those issues, pH sensing is adopted to Raman spectroscopy as a dual-modal approach. Based on the fact that the pH level in cancerous tissues is lower than that in normal tissues due to insufficient vasculature formation, the dual-modal approach combining the chemical information of Raman spectrum and the metabolic information of pH level can improve the specificity of cancer diagnosis. From human breast tissue samples, Raman spectra and pH levels are measured using fiber-optic-based Raman and pH probes, respectively. The pH sensing is based on the dependence of pH level on optical transmission spectrum. Multivariate statistical analysis is performed to evaluate the classification capability of the dual-modal method. The analytical results show that the dual-modal method based on Raman spectroscopy and optical pH sensing can improve the performance of cancer classification. (2017) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE).

  2. Classification of type 2 diabetes rats based on urine amino acids metabolic profiling by liquid chromatography coupled with tandem mass spectrometry.

    PubMed

    Wang, Chunyan; Zhu, Hongbin; Pi, Zifeng; Song, Fengrui; Liu, Zhiqiang; Liu, Shuying

    2013-09-15

    An analytical method for quantifying underivatized amino acids (AAs) in urine samples of rats was developed by using liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS). Classification of type 2 diabetes rats was based on urine amino acids metabolic profiling. LC-MS/MS analysis was applied through chromatographic separation and multiple reactions monitoring (MRM) transitions of MS/MS. Multivariate profile-wide predictive models were constructed using partial least squares discriminant analysis (PLS-DA) by SIMAC-P 11.5 version software package and hierarchical cluster analysis (HCA) by SPSS 18.0 version software. Some amino acids in urine of rats have significant change. The results of the present study prove that this method could perform the quantification of free AAs in urine of rats by using LC-MS/MS. In summary, the PLS-DA and HCA statistical analysis in our research were preferable to differentiate healthy rats and type 2 diabetes rats by the quantification of AAs in their urine samples. In addition, comparing with health group the seven increased amino acids in urine of type 2 rats were returned to normal under the treatment of acarbose. Copyright © 2013 Elsevier B.V. All rights reserved.

  3. Classification of buildings mold threat using electronic nose

    NASA Astrophysics Data System (ADS)

    Łagód, Grzegorz; Suchorab, Zbigniew; Guz, Łukasz; Sobczuk, Henryk

    2017-07-01

    Mold is considered to be one of the most important features of Sick Building Syndrome and is an important problem in current building industry. In many cases it is caused by the rising moisture of building envelopes surface and exaggerated humidity of indoor air. Concerning historical buildings it is mostly caused by outdated raising techniques among that is absence of horizontal isolation against moisture and hygroscopic materials applied for construction. Recent buildings also suffer problem of mold risk which is caused in many cases by hermetization leading to improper performance of gravitational ventilation systems that make suitable conditions for mold development. Basing on our research there is proposed a method of buildings mold threat classification using electronic nose, based on a gas sensors array which consists of MOS sensors (metal oxide semiconductor). Used device is frequently applied for air quality assessment in environmental engineering branches. Presented results show the interpretation of e-nose readouts of indoor air sampled in rooms threatened with mold development in comparison with clean reference rooms and synthetic air. Obtained multivariate data were processed, visualized and classified using a PCA (Principal Component Analysis) and ANN (Artificial Neural Network) methods. Described investigation confirmed that electronic nose - gas sensors array supported with data processing enables to classify air samples taken from different rooms affected with mold.

  4. A Novel Hyperspectral Microscopic Imaging System for Evaluating Fresh Degree of Pork.

    PubMed

    Xu, Yi; Chen, Quansheng; Liu, Yan; Sun, Xin; Huang, Qiping; Ouyang, Qin; Zhao, Jiewen

    2018-04-01

    This study proposed a rapid microscopic examination method for pork freshness evaluation by using the self-assembled hyperspectral microscopic imaging (HMI) system with the help of feature extraction algorithm and pattern recognition methods. Pork samples were stored for different days ranging from 0 to 5 days and the freshness of samples was divided into three levels which were determined by total volatile basic nitrogen (TVB-N) content. Meanwhile, hyperspectral microscopic images of samples were acquired by HMI system and processed by the following steps for the further analysis. Firstly, characteristic hyperspectral microscopic images were extracted by using principal component analysis (PCA) and then texture features were selected based on the gray level co-occurrence matrix (GLCM). Next, features data were reduced dimensionality by fisher discriminant analysis (FDA) for further building classification model. Finally, compared with linear discriminant analysis (LDA) model and support vector machine (SVM) model, good back propagation artificial neural network (BP-ANN) model obtained the best freshness classification with a 100 % accuracy rating based on the extracted data. The results confirm that the fabricated HMI system combined with multivariate algorithms has ability to evaluate the fresh degree of pork accurately in the microscopic level, which plays an important role in animal food quality control.

  5. A Novel Hyperspectral Microscopic Imaging System for Evaluating Fresh Degree of Pork

    PubMed Central

    Xu, Yi; Chen, Quansheng; Liu, Yan; Sun, Xin; Huang, Qiping; Ouyang, Qin; Zhao, Jiewen

    2018-01-01

    Abstract This study proposed a rapid microscopic examination method for pork freshness evaluation by using the self-assembled hyperspectral microscopic imaging (HMI) system with the help of feature extraction algorithm and pattern recognition methods. Pork samples were stored for different days ranging from 0 to 5 days and the freshness of samples was divided into three levels which were determined by total volatile basic nitrogen (TVB-N) content. Meanwhile, hyperspectral microscopic images of samples were acquired by HMI system and processed by the following steps for the further analysis. Firstly, characteristic hyperspectral microscopic images were extracted by using principal component analysis (PCA) and then texture features were selected based on the gray level co-occurrence matrix (GLCM). Next, features data were reduced dimensionality by fisher discriminant analysis (FDA) for further building classification model. Finally, compared with linear discriminant analysis (LDA) model and support vector machine (SVM) model, good back propagation artificial neural network (BP-ANN) model obtained the best freshness classification with a 100 % accuracy rating based on the extracted data. The results confirm that the fabricated HMI system combined with multivariate algorithms has ability to evaluate the fresh degree of pork accurately in the microscopic level, which plays an important role in animal food quality control. PMID:29805285

  6. Nearest neighbors by neighborhood counting.

    PubMed

    Wang, Hui

    2006-06-01

    Finding nearest neighbors is a general idea that underlies many artificial intelligence tasks, including machine learning, data mining, natural language understanding, and information retrieval. This idea is explicitly used in the k-nearest neighbors algorithm (kNN), a popular classification method. In this paper, this idea is adopted in the development of a general methodology, neighborhood counting, for devising similarity functions. We turn our focus from neighbors to neighborhoods, a region in the data space covering the data point in question. To measure the similarity between two data points, we consider all neighborhoods that cover both data points. We propose to use the number of such neighborhoods as a measure of similarity. Neighborhood can be defined for different types of data in different ways. Here, we consider one definition of neighborhood for multivariate data and derive a formula for such similarity, called neighborhood counting measure or NCM. NCM was tested experimentally in the framework of kNN. Experiments show that NCM is generally comparable to VDM and its variants, the state-of-the-art distance functions for multivariate data, and, at the same time, is consistently better for relatively large k values. Additionally, NCM consistently outperforms HEOM (a mixture of Euclidean and Hamming distances), the "standard" and most widely used distance function for multivariate data. NCM has a computational complexity in the same order as the standard Euclidean distance function and NCM is task independent and works for numerical and categorical data in a conceptually uniform way. The neighborhood counting methodology is proven sound for multivariate data experimentally. We hope it will work for other types of data.

  7. Discrete Neural Signatures of Basic Emotions.

    PubMed

    Saarimäki, Heini; Gotsopoulos, Athanasios; Jääskeläinen, Iiro P; Lampinen, Jouko; Vuilleumier, Patrik; Hari, Riitta; Sams, Mikko; Nummenmaa, Lauri

    2016-06-01

    Categorical models of emotions posit neurally and physiologically distinct human basic emotions. We tested this assumption by using multivariate pattern analysis (MVPA) to classify brain activity patterns of 6 basic emotions (disgust, fear, happiness, sadness, anger, and surprise) in 3 experiments. Emotions were induced with short movies or mental imagery during functional magnetic resonance imaging. MVPA accurately classified emotions induced by both methods, and the classification generalized from one induction condition to another and across individuals. Brain regions contributing most to the classification accuracy included medial and inferior lateral prefrontal cortices, frontal pole, precentral and postcentral gyri, precuneus, and posterior cingulate cortex. Thus, specific neural signatures across these regions hold representations of different emotional states in multimodal fashion, independently of how the emotions are induced. Similarity of subjective experiences between emotions was associated with similarity of neural patterns for the same emotions, suggesting a direct link between activity in these brain regions and the subjective emotional experience. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  8. Metabolomics approach of infant formula for the evaluation of contamination and degradation using hydrophilic interaction liquid chromatography coupled with mass spectrometry.

    PubMed

    Inoue, Koichi; Tanada, Chihiro; Sakamoto, Tasuku; Tsutsui, Haruhito; Akiba, Takashi; Min, Jun Zhe; Todoroki, Kenichiro; Yamano, Yutaka; Toyo'oka, Toshimasa

    2015-08-15

    In this study including the field of metabolomics approach for food, the evaluation of untargeted compounds using HILIC-ESI/TOF/MS and multivariate statistical analysis method is proposed for the assessment of classification, contamination and degradation of infant formula. HILIC mode is used to monitor more detected numbers in infant formulas in the ESI-positive scan mode than the reversed phase. The repeatability of the non-targeted contents from 4 kinds of infant formulas based on PCA was less than the relative standard deviation of 15% in all groups. The PCA pattern showed that significant differences in the classification of types and origins, the contamination of melamine and the degradations for one week were evaluated using HILIC-ESI/TOF/MS. In the S-plot from the degradation test, we could identify two markers by comparison to standards as nicotinic acid and nicotinamide. With this strategy, the differences from the untargeted compounds could be utilized for quality and safety assessment of infant formula. Copyright © 2015 Elsevier Ltd. All rights reserved.

  9. Geomorphic Classification and Assessment of Channel Dynamics in the Missouri National Recreational River, South Dakota and Nebraska

    USGS Publications Warehouse

    Elliott, Caroline M.; Jacobson, Robert B.

    2006-01-01

    A multiscale geomorphic classification was established for the 39-mile, 59-mile, and adjacent segments of the Missouri National Recreational River administered by the National Park Service in South Dakota and Nebraska. The objective of the classification was to define naturally occurring clusters of geomorphic characteristics that would be indicative of discrete sets of geomorphic processes, with the intent that such a classification would be useful in river-management and rehabilitation decisions. The statistical classification was based on geomorphic characteristics of the river collected from 1999 orthophotography and the persistence of classified units was evaluated by comparison with similar datasets for 2003 and 2004 and by evaluating variation of bank erosion rates by geomorphic class. Changes in channel location and form were also explored using imagery and maps from 1993-2004, 1941 and 1894. The multivariate classification identified a hierarchy of naturally occurring clusters of reach-scale geomorphic characteristics. The simplest level of the hierarchy divides the river from segments into discrete reaches characterized by single and multithread channels and additional hierarchical levels established 4-part and 10-part classifications. The classification system presents a physical framework that can be applied to prioritization and design of bank stabilization projects, design of habitat rehabilitation projects, and stratification of monitoring and assessment sampling programs.

  10. Diffuse reflectance startigraphy - a new method in the study of loess (?)

    NASA Astrophysics Data System (ADS)

    József, Szeberényi; Balázs, Bradák; Klaudia, Kiss; József, Kovács; György, Varga; Réka, Balázs; Viczián, István

    2017-04-01

    The different varieties of loess (and intercalated paleosol layers) together constitute one of the most widespread terrestrial sediments, which was deposited, altered, and redeposited in the course of the changing climatic conditions of the Pleistocene. To reveal more information about Pleistocene climate cycles and/or environments the detailed lithostratigraphical subdivision and classification of the loess variations and paleosols are necessary. Beside the numerous method such as various field measurements, semi-quantitative tests and laboratory investigations, diffuse reflectance spectroscopy (DRS) is one of the well applied method on loess/paleosol sequences. Generally, DRS has been used to separate the detrital and pedogenic mineral component of the loess sections by the hematite/goethite ratio. DRS also has been applied as a joint method of various environmental magnetic investigations such as magnetic susceptibility- and isothermal remanent magnetization measurements. In our study the so-called "diffuse reflectance stratigraphy method" were developed. At First, complex mathematical method was applied to compare the results of the spectral reflectance measurements. One of the most preferred multivariate methods is cluster analysis. Its scope is to group and compare the loess variations and paleosol based on the similarity and common properties of their reflectance curves. In the Second, beside the basic subdivision of the profiles by the different reflectance curves of the layers, the most characteristic wavelength section of the reflectance curve was determined. This sections played the most important role during the classification of the different materials of the section. The reflectance value of individual samples, belonged to the characteristic wavelength were depicted in the function of depth and well correlated with other proxies like grain size distribution and magnetic susceptibility data. The results of the correlation showed the significance of the "combined reflectance stratigraphy" as a stratigraphical method and as an environmental proxy also.

  11. [Mortality in early-stage, surgically resected non-small cell lung cancer less than 3 cm of size: Competing risk analysis].

    PubMed

    Jordá Aragón, Carlos; Peñalver Cuesta, Juan Carlos; Mancheño Franch, Nuria; de Aguiar Quevedo, Karol; Vera Sempere, Francisco; Padilla Alarcón, José

    2015-09-07

    Survival studies of non-small cell lung cancer (NSCLC) are usually based on the Kaplan-Meier method. However, other factors not covered by this method may modify the observation of the event of interest. There are models of cumulative incidence (CI), that take into account these competing risks, enabling more accurate survival estimates and evaluation of the risk of death from other causes. We aimed to evaluate these models in resected early-stage NSCLC patients. This study included 263 patients with resected NSCLC whose diameter was ≤ 3 cm without node involvement (N0). Demographic, clinical, morphopathological and surgical variables, TNM classification and long-term evolution were analysed. To analyse CI, death by another cause was considered to be competitive event. For the univariate analysis, Gray's method was used, while Fine and Gray's method was employed for the multivariate analysis. Mortality by NSCLC was 19.4% at 5 years and 14.3% by another cause. Both curves crossed at 6.3 years, and probability of death by another cause became greater from this point. In multivariate analysis, cancer mortality was conditioned by visceral pleural invasion (VPI) (P=.001) and vascular invasion (P=.020), with age>50 years (P=.034), smoking (P=.009) and the Charlson index ≥ 2 (P=.000) being by no cancer. By the method of CI, VPI and vascular invasion conditioned cancer death in NSCLC >3 cm, while non-tumor causes of long-term death were determined. Copyright © 2014 Elsevier España, S.L.U. All rights reserved.

  12. DEFINITION OF MULTIVARIATE GEOCHEMICAL ASSOCIATIONS WITH POLYMETALLIC MINERAL OCCURRENCES USING A SPATIALLY DEPENDENT CLUSTERING TECHNIQUE AND RASTERIZED STREAM SEDIMENT DATA - AN ALASKAN EXAMPLE.

    USGS Publications Warehouse

    Jenson, Susan K.; Trautwein, C.M.

    1984-01-01

    The application of an unsupervised, spatially dependent clustering technique (AMOEBA) to interpolated raster arrays of stream sediment data has been found to provide useful multivariate geochemical associations for modeling regional polymetallic resource potential. The technique is based on three assumptions regarding the compositional and spatial relationships of stream sediment data and their regional significance. These assumptions are: (1) compositionally separable classes exist and can be statistically distinguished; (2) the classification of multivariate data should minimize the pair probability of misclustering to establish useful compositional associations; and (3) a compositionally defined class represented by three or more contiguous cells within an array is a more important descriptor of a terrane than a class represented by spatial outliers.

  13. Multivariate Feature Selection of Image Descriptors Data for Breast Cancer with Computer-Assisted Diagnosis

    PubMed Central

    Galván-Tejada, Carlos E.; Zanella-Calzada, Laura A.; Galván-Tejada, Jorge I.; Celaya-Padilla, José M.; Gamboa-Rosales, Hamurabi; Garza-Veloz, Idalia; Martinez-Fierro, Margarita L.

    2017-01-01

    Breast cancer is an important global health problem, and the most common type of cancer among women. Late diagnosis significantly decreases the survival rate of the patient; however, using mammography for early detection has been demonstrated to be a very important tool increasing the survival rate. The purpose of this paper is to obtain a multivariate model to classify benign and malignant tumor lesions using a computer-assisted diagnosis with a genetic algorithm in training and test datasets from mammography image features. A multivariate search was conducted to obtain predictive models with different approaches, in order to compare and validate results. The multivariate models were constructed using: Random Forest, Nearest centroid, and K-Nearest Neighbor (K-NN) strategies as cost function in a genetic algorithm applied to the features in the BCDR public databases. Results suggest that the two texture descriptor features obtained in the multivariate model have a similar or better prediction capability to classify the data outcome compared with the multivariate model composed of all the features, according to their fitness value. This model can help to reduce the workload of radiologists and present a second opinion in the classification of tumor lesions. PMID:28216571

  14. Multivariate Feature Selection of Image Descriptors Data for Breast Cancer with Computer-Assisted Diagnosis.

    PubMed

    Galván-Tejada, Carlos E; Zanella-Calzada, Laura A; Galván-Tejada, Jorge I; Celaya-Padilla, José M; Gamboa-Rosales, Hamurabi; Garza-Veloz, Idalia; Martinez-Fierro, Margarita L

    2017-02-14

    Breast cancer is an important global health problem, and the most common type of cancer among women. Late diagnosis significantly decreases the survival rate of the patient; however, using mammography for early detection has been demonstrated to be a very important tool increasing the survival rate. The purpose of this paper is to obtain a multivariate model to classify benign and malignant tumor lesions using a computer-assisted diagnosis with a genetic algorithm in training and test datasets from mammography image features. A multivariate search was conducted to obtain predictive models with different approaches, in order to compare and validate results. The multivariate models were constructed using: Random Forest, Nearest centroid, and K-Nearest Neighbor (K-NN) strategies as cost function in a genetic algorithm applied to the features in the BCDR public databases. Results suggest that the two texture descriptor features obtained in the multivariate model have a similar or better prediction capability to classify the data outcome compared with the multivariate model composed of all the features, according to their fitness value. This model can help to reduce the workload of radiologists and present a second opinion in the classification of tumor lesions.

  15. Artificial Neural Networks in Policy Research: A Current Assessment.

    ERIC Educational Resources Information Center

    Woelfel, Joseph

    1993-01-01

    Suggests that artificial neural networks (ANNs) exhibit properties that promise usefulness for policy researchers. Notes that ANNs have found extensive use in areas once reserved for multivariate statistical programs such as regression and multiple classification analysis and are developing an extensive community of advocates for processing text…

  16. MMPI Modal Profiles in a Juvenile Delinquent Population.

    ERIC Educational Resources Information Center

    Pickett, Lawrence K., Jr.

    1981-01-01

    The MMPI results obtained from 245 adolescent males referred to the evaluation unit of a Juvenile Court were submitted to a multivariate classification system. By correlating individual subject profiles with the modal profiles, six membership groups were formed. No relationship was found between group membership and age or race. (Author)

  17. Sample classification for improved performance of PLS models applied to the quality control of deep-frying oils of different botanic origins analyzed using ATR-FTIR spectroscopy.

    PubMed

    Kuligowski, Julia; Carrión, David; Quintás, Guillermo; Garrigues, Salvador; de la Guardia, Miguel

    2011-01-01

    The selection of an appropriate calibration set is a critical step in multivariate method development. In this work, the effect of using different calibration sets, based on a previous classification of unknown samples, on the partial least squares (PLS) regression model performance has been discussed. As an example, attenuated total reflection (ATR) mid-infrared spectra of deep-fried vegetable oil samples from three botanical origins (olive, sunflower, and corn oil), with increasing polymerized triacylglyceride (PTG) content induced by a deep-frying process were employed. The use of a one-class-classifier partial least squares-discriminant analysis (PLS-DA) and a rooted binary directed acyclic graph tree provided accurate oil classification. Oil samples fried without foodstuff could be classified correctly, independent of their PTG content. However, class separation of oil samples fried with foodstuff, was less evident. The combined use of double-cross model validation with permutation testing was used to validate the obtained PLS-DA classification models, confirming the results. To discuss the usefulness of the selection of an appropriate PLS calibration set, the PTG content was determined by calculating a PLS model based on the previously selected classes. In comparison to a PLS model calculated using a pooled calibration set containing samples from all classes, the root mean square error of prediction could be improved significantly using PLS models based on the selected calibration sets using PLS-DA, ranging between 1.06 and 2.91% (w/w).

  18. Nonparametric, Coupled ,Bayesian ,Dictionary ,and Classifier Learning for Hyperspectral Classification.

    PubMed

    Akhtar, Naveed; Mian, Ajmal

    2017-10-03

    We present a principled approach to learn a discriminative dictionary along a linear classifier for hyperspectral classification. Our approach places Gaussian Process priors over the dictionary to account for the relative smoothness of the natural spectra, whereas the classifier parameters are sampled from multivariate Gaussians. We employ two Beta-Bernoulli processes to jointly infer the dictionary and the classifier. These processes are coupled under the same sets of Bernoulli distributions. In our approach, these distributions signify the frequency of the dictionary atom usage in representing class-specific training spectra, which also makes the dictionary discriminative. Due to the coupling between the dictionary and the classifier, the popularity of the atoms for representing different classes gets encoded into the classifier. This helps in predicting the class labels of test spectra that are first represented over the dictionary by solving a simultaneous sparse optimization problem. The labels of the spectra are predicted by feeding the resulting representations to the classifier. Our approach exploits the nonparametric Bayesian framework to automatically infer the dictionary size--the key parameter in discriminative dictionary learning. Moreover, it also has the desirable property of adaptively learning the association between the dictionary atoms and the class labels by itself. We use Gibbs sampling to infer the posterior probability distributions over the dictionary and the classifier under the proposed model, for which, we derive analytical expressions. To establish the effectiveness of our approach, we test it on benchmark hyperspectral images. The classification performance is compared with the state-of-the-art dictionary learning-based classification methods.

  19. Climate Classification is an Important Factor in ­Assessing Hospital Performance Metrics

    NASA Astrophysics Data System (ADS)

    Boland, M. R.; Parhi, P.; Gentine, P.; Tatonetti, N. P.

    2017-12-01

    Context/Purpose: Climate is a known modulator of disease, but its impact on hospital performance metrics remains unstudied. Methods: We assess the relationship between Köppen-Geiger climate classification and hospital performance metrics, specifically 30-day mortality, as reported in Hospital Compare, and collected for the period July 2013 through June 2014 (7/1/2013 - 06/30/2014). A hospital-level multivariate linear regression analysis was performed while controlling for known socioeconomic factors to explore the relationship between all-cause mortality and climate. Hospital performance scores were obtained from 4,524 hospitals belonging to 15 distinct Köppen-Geiger climates and 2,373 unique counties. Results: Model results revealed that hospital performance metrics for mortality showed significant climate dependence (p<0.001) after adjusting for socioeconomic factors. Interpretation: Currently, hospitals are reimbursed by Governmental agencies using 30-day mortality rates along with 30-day readmission rates. These metrics allow Government agencies to rank hospitals according to their `performance' along these metrics. Various socioeconomic factors are taken into consideration when determining individual hospitals performance. However, no climate-based adjustment is made within the existing framework. Our results indicate that climate-based variability in 30-day mortality rates does exist even after socioeconomic confounder adjustment. Use of standardized high-level climate classification systems (such as Koppen-Geiger) would be useful to incorporate in future metrics. Conclusion: Climate is a significant factor in evaluating hospital 30-day mortality rates. These results demonstrate that climate classification is an important factor when comparing hospital performance across the United States.

  20. Electroencephalogram-based decoding cognitive states using convolutional neural network and likelihood ratio based score fusion.

    PubMed

    Zafar, Raheel; Dass, Sarat C; Malik, Aamir Saeed

    2017-01-01

    Electroencephalogram (EEG)-based decoding human brain activity is challenging, owing to the low spatial resolution of EEG. However, EEG is an important technique, especially for brain-computer interface applications. In this study, a novel algorithm is proposed to decode brain activity associated with different types of images. In this hybrid algorithm, convolutional neural network is modified for the extraction of features, a t-test is used for the selection of significant features and likelihood ratio-based score fusion is used for the prediction of brain activity. The proposed algorithm takes input data from multichannel EEG time-series, which is also known as multivariate pattern analysis. Comprehensive analysis was conducted using data from 30 participants. The results from the proposed method are compared with current recognized feature extraction and classification/prediction techniques. The wavelet transform-support vector machine method is the most popular currently used feature extraction and prediction method. This method showed an accuracy of 65.7%. However, the proposed method predicts the novel data with improved accuracy of 79.9%. In conclusion, the proposed algorithm outperformed the current feature extraction and prediction method.

  1. Performance Assessment of Kernel Density Clustering for Gene Expression Profile Data

    PubMed Central

    Zeng, Beiyan; Chen, Yiping P.; Smith, Oscar H.

    2003-01-01

    Kernel density smoothing techniques have been used in classification or supervised learning of gene expression profile (GEP) data, but their applications to clustering or unsupervised learning of those data have not been explored and assessed. Here we report a kernel density clustering method for analysing GEP data and compare its performance with the three most widely-used clustering methods: hierarchical clustering, K-means clustering, and multivariate mixture model-based clustering. Using several methods to measure agreement, between-cluster isolation, and withincluster coherence, such as the Adjusted Rand Index, the Pseudo F test, the r2 test, and the profile plot, we have assessed the effectiveness of kernel density clustering for recovering clusters, and its robustness against noise on clustering both simulated and real GEP data. Our results show that the kernel density clustering method has excellent performance in recovering clusters from simulated data and in grouping large real expression profile data sets into compact and well-isolated clusters, and that it is the most robust clustering method for analysing noisy expression profile data compared to the other three methods assessed. PMID:18629292

  2. Discerning suicide in drug intoxication deaths: Paucity and primacy of suicide notes and psychiatric history

    PubMed Central

    Caine, Eric D.; Connery, Hilary S.; D’Onofrio, Gail; Gunnell, David J.; Miller, Ted R.; Nolte, Kurt B.; Kaplan, Mark S.; Kapusta, Nestor D.; Lilly, Christa L.; Nelson, Lewis S.; Putnam, Sandra L.; Stack, Steven; Värnik, Peeter; Webster, Lynn R.; Jia, Haomiao

    2018-01-01

    Objective A paucity of corroborative psychological and psychiatric evidence may be inhibiting detection of drug intoxication suicides in the United States. We evaluated the relative importance of suicide notes and psychiatric history in the classification of suicide by drug intoxication versus firearm (gunshot wound) plus hanging/suffocation—the other two major, but overtly violent methods. Methods This observational multilevel (individual/county), multivariable study employed a generalized linear mixed model (GLMM) to analyze pooled suicides and undetermined intent deaths, as possible suicides, among the population aged 15 years and older in the 17 states participating in the National Violent Death Reporting System throughout 2011–2013. The outcome measure was relative odds of suicide versus undetermined classification, adjusted for demographics, precipitating circumstances, and investigation characteristics. Results A suicide note, prior suicide attempt, or affective disorder was documented in less than one-third of suicides and one-quarter of undetermined deaths. The prevalence gaps were larger among drug intoxication cases than gunshot/hanging cases. The latter were more likely than intoxication cases to be classified as suicide versus undetermined manner of death (adjusted odds ratio [OR], 41.14; 95% CI, 34.43–49.15), as were cases documenting a suicide note (OR, 33.90; 95% CI, 26.11–44.05), prior suicide attempt (OR, 2.42; 95% CI, 2.11–2.77), or depression (OR, 1.61; 95% CI, 1.38 to 1.88), or bipolar disorder (OR, 1.41; 95% CI, 1.10–1.81). Stratification by mechanism/cause intensified the association between a note and suicide classification for intoxication cases (OR, 45.43; 95% CI, 31.06–66.58). Prior suicide attempt (OR, 2.64; 95% CI, 2.19–3.18) and depression (OR, 1.48; 95% CI, 1.17–1.87) were associated with suicide classification in intoxication but not gunshot/hanging cases. Conclusions Without psychological/psychiatric evidence contributing to manner of death classification, suicide by drug intoxication in the US is likely profoundly under-reported. Findings harbor adverse implications for surveillance, etiologic understanding, and prevention of suicides and drug deaths. PMID:29320540

  3. Characterization of agricultural land using singular value decomposition

    NASA Astrophysics Data System (ADS)

    Herries, Graham M.; Danaher, Sean; Selige, Thomas

    1995-11-01

    A method is defined and tested for the characterization of agricultural land from multi-spectral imagery, based on singular value decomposition (SVD) and key vector analysis. The SVD technique, which bears a close resemblance to multivariate statistic techniques, has previously been successfully applied to problems of signal extraction for marine data and forestry species classification. In this study the SVD technique is used as a classifier for agricultural regions, using airborne Daedalus ATM data, with 1 m resolution. The specific region chosen is an experimental research farm in Bavaria, Germany. This farm has a large number of crops, within a very small region and hence is not amenable to existing techniques. There are a number of other significant factors which render existing techniques such as the maximum likelihood algorithm less suitable for this area. These include a very dynamic terrain and tessellated pattern soil differences, which together cause large variations in the growth characteristics of the crops. The SVD technique is applied to this data set using a multi-stage classification approach, removing unwanted land-cover classes one step at a time. Typical classification accuracy's for SVD are of the order of 85-100%. Preliminary results indicate that it is a fast and efficient classifier with the ability to differentiate between crop types such as wheat, rye, potatoes and clover. The results of characterizing 3 sub-classes of Winter Wheat are also shown.

  4. Use of genetic algorithm for the selection of EEG features

    NASA Astrophysics Data System (ADS)

    Asvestas, P.; Korda, A.; Kostopoulos, S.; Karanasiou, I.; Ouzounoglou, A.; Sidiropoulos, K.; Ventouras, E.; Matsopoulos, G.

    2015-09-01

    Genetic Algorithm (GA) is a popular optimization technique that can detect the global optimum of a multivariable function containing several local optima. GA has been widely used in the field of biomedical informatics, especially in the context of designing decision support systems that classify biomedical signals or images into classes of interest. The aim of this paper is to present a methodology, based on GA, for the selection of the optimal subset of features that can be used for the efficient classification of Event Related Potentials (ERPs), which are recorded during the observation of correct or incorrect actions. In our experiment, ERP recordings were acquired from sixteen (16) healthy volunteers who observed correct or incorrect actions of other subjects. The brain electrical activity was recorded at 47 locations on the scalp. The GA was formulated as a combinatorial optimizer for the selection of the combination of electrodes that maximizes the performance of the Fuzzy C Means (FCM) classification algorithm. In particular, during the evolution of the GA, for each candidate combination of electrodes, the well-known (Σ, Φ, Ω) features were calculated and were evaluated by means of the FCM method. The proposed methodology provided a combination of 8 electrodes, with classification accuracy 93.8%. Thus, GA can be the basis for the selection of features that discriminate ERP recordings of observations of correct or incorrect actions.

  5. G-mode analysis of the reflection spectra of 84 asteroids.

    NASA Astrophysics Data System (ADS)

    Birlan, M.; Barucci, M. A.; Fulchignoni, M.

    1996-01-01

    A revised version of the G-mode multivariate statistics (Coradini et al. 1977) has been used to analyse a sample of 84 asteroids. This sample of asteroids is described by 29 variables, namely 23 colours between 0.9 and 2.35 microns obtained from the data base collected by Bell et al. (Private communication), 5 colors between 0.3 and 0.85 microns from the ECAS survey (Zellner et al. 1985) and the revised IRAS albedo (Tedesco et al. 1992). The G-mode method allows the user to obtain an automatic classification of the asteroids in spectrally homogeneous groups. The role of the IR colours in separating the various groups is outlined, particularly with regard to the fine subdivision of S and C taxonomical types.

  6. Symposium on Machine Processing of Remotely Sensed Data, Purdue University, West Lafayette, Ind., June 29-July 1, 1976, Proceedings

    NASA Technical Reports Server (NTRS)

    1976-01-01

    Papers are presented on the applicability of Landsat data to water management and control needs, IBIS, a geographic information system based on digital image processing and image raster datatype, and the Image Data Access Method (IDAM) for the Earth Resources Interactive Processing System. Attention is also given to the Prototype Classification and Mensuration System (PROCAMS) applied to agricultural data, the use of Landsat for water quality monitoring in North Carolina, and the analysis of geophysical remote sensing data using multivariate pattern recognition. The Illinois crop-acreage estimation experiment, the Pacific Northwest Resources Inventory Demonstration, and the effects of spatial misregistration on multispectral recognition are also considered. Individual items are announced in this issue.

  7. Subdivision of Holocene Baltic sea sediments by their physical properties [Gliederung holozaner ostseesedimente nach physikalischen Eigenschaften

    USGS Publications Warehouse

    Harff, Jan; Bohling, Geoffrey C.; Endler, R.; Davis, J.C.; Olea, R.A.

    1999-01-01

    The Holocene sediment sequence of a core taken within the centre of the Eastern Gotland Basin was subdivided into 12 lithostratigraphic units based on MSCL-data (sound velocity, wet bulk density, magnetic susceptibility) using a multivariate classification method. The lower 6 units embrace the sediments until the Litorina transgression, and the upper 6 units subdivide the brackish-marine Litorina- and post-Litorina sediments. The upper lithostratigraphic units reflect a change of anoxic (laminated) and oxic (non-laminated) sediments. By application of a numerical stratigraphic correlation method the zonation was extended laterally onto contiguous sediment cores within the central basin. Consequently the change of anoxic and oxic sediments can be used for a general lithostratigraphic subdivision of sediments of the Gotland Basin. A quantitative criterion based on the sediment-physical lithofacies is added to existing subdivisions of the Holocene in the Baltic Sea.

  8. Can the new RCP R0/R1 classification predict the clinical outcome in ductal adenocarcinoma of the pancreatic head?

    PubMed

    Janot, M S; Kersting, S; Belyaev, O; Matuschek, A; Chromik, A M; Suelberg, D; Uhl, W; Tannapfel, A; Bergmann, U

    2012-08-01

    According to the International Union Against Cancer (UICC), R1 is defined as the microscopic presence of tumor cells at the surface of the resection margin (RM). In contrast, the Royal College of Pathologists (RCP) suggested to declare R1 already when tumor cells are found within 1 mm of the RM. The aim of this study was to determine the significance of the RM concerning the prognosis of pancreatic ductal adenocarcinoma (PDAC). From 2007 to 2009, 62 patients underwent a curative operation for PDAC of the pancreatic head. The relevance of R status on cumulative overall survival (OS) was assessed on univariate and multivariate analysis for both the classic R classification (UICC) and the suggestion of the RCP. Following the UICC criteria, a positive RM was detected in 8 %. Along with grading and lymph node ratio, R status revealed a significant impact on OS on univariate and multivariate analysis. Applying the suggestion of the RCP, R1 rate rose to 26 % resulting in no significant impact on OS in univariate analysis. Our study has shown that the RCP suggestion for R status has no impact on the prognosis of PDAC. In contrast, our data confirmed the UICC R classification of RM as well as N category, grading, and lymph node ratio as significant prognostic factors.

  9. Mapping of rock types using a joint approach by combining the multivariate statistics, self-organizing map and Bayesian neural networks: an example from IODP 323 site

    NASA Astrophysics Data System (ADS)

    Karmakar, Mampi; Maiti, Saumen; Singh, Amrita; Ojha, Maheswar; Maity, Bhabani Sankar

    2017-07-01

    Modeling and classification of the subsurface lithology is very important to understand the evolution of the earth system. However, precise classification and mapping of lithology using a single framework are difficult due to the complexity and the nonlinearity of the problem driven by limited core sample information. Here, we implement a joint approach by combining the unsupervised and the supervised methods in a single framework for better classification and mapping of rock types. In the unsupervised method, we use the principal component analysis (PCA), K-means cluster analysis (K-means), dendrogram analysis, Fuzzy C-means (FCM) cluster analysis and self-organizing map (SOM). In the supervised method, we use the Bayesian neural networks (BNN) optimized by the Hybrid Monte Carlo (HMC) (BNN-HMC) and the scaled conjugate gradient (SCG) (BNN-SCG) techniques. We use P-wave velocity, density, neutron porosity, resistivity and gamma ray logs of the well U1343E of the Integrated Ocean Drilling Program (IODP) Expedition 323 in the Bering Sea slope region. While the SOM algorithm allows us to visualize the clustering results in spatial domain, the combined classification schemes (supervised and unsupervised) uncover the different patterns of lithology such of as clayey-silt, diatom-silt and silty-clay from an un-cored section of the drilled hole. In addition, the BNN approach is capable of estimating uncertainty in the predictive modeling of three types of rocks over the entire lithology section at site U1343. Alternate succession of clayey-silt, diatom-silt and silty-clay may be representative of crustal inhomogeneity in general and thus could be a basis for detail study related to the productivity of methane gas in the oceans worldwide. Moreover, at the 530 m depth down below seafloor (DSF), the transition from Pliocene to Pleistocene could be linked to lithological alternation between the clayey-silt and the diatom-silt. The present results could provide the basis for the detailed study to get deeper insight into the Bering Sea' sediment deposition and sequence.

  10. Attachment-based classifications of children's family drawings: psychometric properties and relations with children's adjustment in kindergarten.

    PubMed

    Pianta, R C; Longmaid, K; Ferguson, J E

    1999-06-01

    Investigated an attachment-based theoretical framework and classification system, introduced by Kaplan and Main (1986), for interpreting children's family drawings. This study concentrated on the psychometric properties of the system and the relation between drawings classified using this system and teacher ratings of classroom social-emotional and behavioral functioning, controlling for child age, ethnic status, intelligence, and fine motor skills. This nonclinical sample consisted of 200 kindergarten children of diverse racial and socioeconomic status (SES). Limited support for reliability of this classification system was obtained. Kappas for overall classifications of drawings (e.g., secure) exceeded .80 and mean kappa for discrete drawing features (e.g., figures with smiles) was .82. Coders' endorsement of the presence of certain discrete drawing features predicted their overall classification at 82.5% accuracy. Drawing classification was related to teacher ratings of classroom functioning independent of child age, sex, race, SES, intelligence, and fine motor skills (with p values for the multivariate effects ranging from .043-.001). Results are discussed in terms of the psychometric properties of this system for classifying children's representations of family and the limitations of family drawing techniques for young children.

  11. Stability and bias of classification rates in biological applications of discriminant analysis

    USGS Publications Warehouse

    Williams, B.K.; Titus, K.; Hines, J.E.

    1990-01-01

    We assessed the sampling stability of classification rates in discriminant analysis by using a factorial design with factors for multivariate dimensionality, dispersion structure, configuration of group means, and sample size. A total of 32,400 discriminant analyses were conducted, based on data from simulated populations with appropriate underlying statistical distributions. Simulation results indicated strong bias in correct classification rates when group sample sizes were small and when overlap among groups was high. We also found that stability of the correct classification rates was influenced by these factors, indicating that the number of samples required for a given level of precision increases with the amount of overlap among groups. In a review of 60 published studies, we found that 57% of the articles presented results on classification rates, though few of them mentioned potential biases in their results. Wildlife researchers should choose the total number of samples per group to be at least 2 times the number of variables to be measured when overlap among groups is low. Substantially more samples are required as the overlap among groups increases

  12. Urogenital tuberculosis: definition and classification.

    PubMed

    Kulchavenya, Ekaterina

    2014-10-01

    To improve the approach to the diagnosis and management of urogenital tuberculosis (UGTB), we need clear and unique classification. UGTB remains an important problem, especially in developing countries, but it is often an overlooked disease. As with any other infection, UGTB should be cured by antibacterial therapy, but because of late diagnosis it may often require surgery. Scientific literature dedicated to this problem was critically analyzed and juxtaposed with the author's own more than 30 years' experience in tuberculosis urology. The conception, terms and definition were consolidated into one system; classification stage by stage as well as complications are presented. Classification of any disease includes dispersion on forms and stages and exact definitions for each stage. Clinical features and symptoms significantly vary between different forms and stages of UGTB. A simple diagnostic algorithm was constructed. UGTB is multivariant disease and a standard unified approach to it is impossible. Clear definition as well as unique classification are necessary for real estimation of epidemiology and the optimization of therapy. The term 'UGTB' has insufficient information in order to estimate therapy, surgery and prognosis, or to evaluate the epidemiology.

  13. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kowalchik, Kristin V.; Vallow, Laura A., E-mail: vallow.laura@mayo.edu; McDonough, Michelle

    Purpose: To study the utility of preoperative breast MRI for partial breast irradiation (PBI) patient selection, using multivariable analysis of significant risk factors to create a classification rule. Methods and Materials: Between 2002 and 2009, 712 women with newly diagnosed breast cancer underwent preoperative bilateral breast MRI at Mayo Clinic Florida. Of this cohort, 566 were retrospectively deemed eligible for PBI according to the National Surgical Adjuvant Breast and Bowel Project Protocol B-39 inclusion criteria using physical examination, mammogram, and/or ultrasound. Magnetic resonance images were then reviewed to determine their impact on patient eligibility. The patient and tumor characteristics weremore » evaluated to determine risk factors for altered PBI eligibility after MRI and to create a classification rule. Results: Of the 566 patients initially eligible for PBI, 141 (25%) were found ineligible because of pathologically proven MRI findings. Magnetic resonance imaging detected additional ipsilateral breast cancer in 118 (21%). Of these, 62 (11%) had more extensive disease than originally noted before MRI, and 64 (11%) had multicentric disease. Contralateral breast cancer was detected in 28 (5%). Four characteristics were found to be significantly associated with PBI ineligibility after MRI on multivariable analysis: premenopausal status (P=.021), detection by palpation (P<.001), first-degree relative with a history of breast cancer (P=.033), and lobular histology (P=.002). Risk factors were assigned a score of 0-2. The risk of altered PBI eligibility from MRI based on number of risk factors was 0:18%; 1:22%; 2:42%; 3:65%. Conclusions: Preoperative bilateral breast MRI altered the PBI recommendations for 25% of women. Women who may undergo PBI should be considered for breast MRI, especially those with lobular histology or with 2 or more of the following risk factors: premenopausal, detection by palpation, and first-degree relative with a history of breast cancer.« less

  14. Prognosticators and risk grouping in patients with lung metastasis from nasopharyngeal carcinoma: a more accurate and appropriate assessment of prognosis.

    PubMed

    Cao, Xun; Luo, Rong-Zhen; He, Li-Ru; Li, Yong; Lin, Wen-Qian; Chen, You-Fang; Wen, Zhe-Sheng

    2011-08-26

    Lung metastases arising from nasopharyngeal carcinomas (NPC) have a relatively favourable prognosis. The purpose of this study was to identify the prognostic factors and to establish a risk grouping in patients with lung metastases from NPC. A total of 198 patients who developed lung metastases from NPC after primary therapy were retrospectively recruited from January 1982 to December 2000. Univariate and multivariate analyses of clinical variables were performed using Cox proportional hazards regression models. Actuarial survival rates were plotted against time using the Kaplan-Meier method, and log-rank testing was used to compare the differences between the curves. The median overall survival (OS) period and the lung metastasis survival (LMS) period were 51.5 and 20.9 months, respectively. After univariate and multivariate analyses of the clinical variables, age, T classification, N classification, site of metastases, secondary metastases and disease-free interval (DFI) correlated with OS, whereas age, VCA-IgA titre, number of metastases and secondary metastases were related to LMS. The prognoses of the low- (score 0-1), intermediate- (score 2-3) and high-risk (score 4-8) subsets based on these factors were significantly different. The 3-, 5- and 10-year survival rates of the low-, intermediate- and high-risk subsets, respectively (P < 0.001) were as follows: 77.3%, 60% and 59%; 52.3%, 30% and 27.8%; and 20.5%, 7% and 0%. In this study, clinical variables provided prognostic indicators of survival in NPC patients with lung metastases. Risk subsets would help in a more accurate assessment of a patient's prognosis in the clinical setting and could facilitate the establishment of patient-tailored medical strategies and supports.

  15. Evaluation of the 7(th) edition of the UICC-AJCC tumor, node, metastasis classification for esophageal cancer in a Chinese cohort.

    PubMed

    Huang, Yan; Guo, Weigang; Shi, Shiming; He, Jian

    2016-07-01

    To assess and evaluate the prognostic value of the 7(th) edition of the Union for International Cancer Control-American Joint Committee on Cancer (UICC-AJCC) tumor, node, metastasis (TNM) staging system for Chinese patients with esophageal cancer in comparison with the 6(th) edition. A retrospective review was performed on 766 consecutive esophageal cancer patients treated with esophagectomy between 2008 and 2012. Patients were staged according to the 6(th) and 7(th) editions for esophageal cancer respectively. Survival was calculated by the Kaplan-Meier method, and multivariate analysis was performed using Cox regression model. Overall 3-year survival rate was 59.5%. There were significant differences in 3-year survival rates among T stages both according to the 6(th) edition and the 7(th) edition (P<0.001). According to the 7(th) edition, the 3-year survival rates of N0 (75.4%), N1 (65.2%), N2 (39.7%) and N3 (27.3%) patients were significant differences (P<0.001). Kaplan-Meier curve revealed a good discriminatory ability from stage I to IV, except for stage IB, IIA and IIB in the 7(th) edition staging system. Based on the 7(th) edition, the degree of differentiation, tumor length and tumor location were not independent prognostic factors on multivariate analysis. The multivariate analyses suggested that pT-, pN-, pTNM-category were all the independent prognostic factors based on the 6(th) and 7(th) edition staging system. The 7(th) edition of AJCC TNM staging system of esophageal cancer should discriminate pT2-3N0M0 (stage IB, IIA and IIB) better when considering the esophageal squamous cell cancer patients. Therefore, to improve and optimize the AJCC TNM classification for Chinese patients with esophageal cancer, more considerations about the value of tumor grade and tumor location in pT2-3N0M0 esophageal squamous cell cancer should be taken in the next new TNM staging system.

  16. Analyzing thematic maps and mapping for accuracy

    USGS Publications Warehouse

    Rosenfield, G.H.

    1982-01-01

    Two problems which exist while attempting to test the accuracy of thematic maps and mapping are: (1) evaluating the accuracy of thematic content, and (2) evaluating the effects of the variables on thematic mapping. Statistical analysis techniques are applicable to both these problems and include techniques for sampling the data and determining their accuracy. In addition, techniques for hypothesis testing, or inferential statistics, are used when comparing the effects of variables. A comprehensive and valid accuracy test of a classification project, such as thematic mapping from remotely sensed data, includes the following components of statistical analysis: (1) sample design, including the sample distribution, sample size, size of the sample unit, and sampling procedure; and (2) accuracy estimation, including estimation of the variance and confidence limits. Careful consideration must be given to the minimum sample size necessary to validate the accuracy of a given. classification category. The results of an accuracy test are presented in a contingency table sometimes called a classification error matrix. Usually the rows represent the interpretation, and the columns represent the verification. The diagonal elements represent the correct classifications. The remaining elements of the rows represent errors by commission, and the remaining elements of the columns represent the errors of omission. For tests of hypothesis that compare variables, the general practice has been to use only the diagonal elements from several related classification error matrices. These data are arranged in the form of another contingency table. The columns of the table represent the different variables being compared, such as different scales of mapping. The rows represent the blocking characteristics, such as the various categories of classification. The values in the cells of the tables might be the counts of correct classification or the binomial proportions of these counts divided by either the row totals or the column totals from the original classification error matrices. In hypothesis testing, when the results of tests of multiple sample cases prove to be significant, some form of statistical test must be used to separate any results that differ significantly from the others. In the past, many analyses of the data in this error matrix were made by comparing the relative magnitudes of the percentage of correct classifications, for either individual categories, the entire map or both. More rigorous analyses have used data transformations and (or) two-way classification analysis of variance. A more sophisticated step of data analysis techniques would be to use the entire classification error matrices using the methods of discrete multivariate analysis or of multiviariate analysis of variance.

  17. Statistical Approaches to Type Determination of the Ejector Marks on Cartridge Cases.

    PubMed

    Warren, Eric M; Sheets, H David

    2018-03-01

    While type determination on bullets has been performed for over a century, type determination on cartridge cases is often overlooked. Presented here is an example of type determination of ejector marks on cartridge cases from Glock and Smith & Wesson Sigma series pistols using Naïve Bayes and Random Forest classification methods. The shapes of ejector marks were captured from images of test-fired cartridge cases and subjected to multivariate analysis. Naïve Bayes and Random Forest methods were used to assign the ejector shapes to the correct class of firearm with success rates as high as 98%. This method is easily implemented with equipment already available in crime laboratories and can serve as an investigative lead in the form of a list of firearms that could have fired the evidence. Paired with the FBI's General Rifling Characteristics (GRC) database, this could be an invaluable resource for firearm evidence at crime scenes. © 2017 American Academy of Forensic Sciences.

  18. Vibrational biospectroscopy: from plants to animals to humans. A historical perspective

    NASA Astrophysics Data System (ADS)

    Shaw, R. Anthony; Mantsch, Henry H.

    1999-05-01

    Today, more than ever, vibrational spectroscopy means different things to different people. From their roots as molecular fingerprinting techniques, both infrared and Raman spectroscopy have evolved to the point where they play roles in a staggering variety of scientific endeavors. This survey focuses upon biological and medical applications. The past 40 years have witnessed enormous advances in our understanding of the building blocks of life, and vibrational spectroscopy has played an important role. That role is reviewed briefly here. In parallel with these efforts, the near-IR community developed powerful 'chemometric' methods to extract a wealth of information from spectra that appeared superficially featureless. As vibrational spectroscopy is finding new niches in the medical and clinical realms, these chemometric methods are proving to be a valuable (but not infallible!) adjunct to conventional spectral interpretation. This survey includes a brief outline of biomedical vibrational spectroscopy and imaging, including several representative examples to illustrate the strengths and pitfalls of a growing reliance upon multivariate quantitation and classification methods.

  19. Classification of broiler breast fillets according to storage and to freeze-thaw treatment using near infrared spectroscopy and multivariate analysis

    USDA-ARS?s Scientific Manuscript database

    Visible/near-infrared (NIR) spectroscopy has shown potential for successfully classifying broiler breast fillets according to their texture properties. Freshness and shelf life are also important quality characteristics of boneless skinless chicken breast products in the marketplace. This study deal...

  20. Interpretable Early Classification of Multivariate Time Series

    ERIC Educational Resources Information Center

    Ghalwash, Mohamed F.

    2013-01-01

    Recent advances in technology have led to an explosion in data collection over time rather than in a single snapshot. For example, microarray technology allows us to measure gene expression levels in different conditions over time. Such temporal data grants the opportunity for data miners to develop algorithms to address domain-related problems,…

  1. Prediction of Gestational Diabetes through NMR Metabolomics of Maternal Blood.

    PubMed

    Pinto, Joana; Almeida, Lara M; Martins, Ana S; Duarte, Daniela; Barros, António S; Galhano, Eulália; Pita, Cristina; Almeida, Maria do Céu; Carreira, Isabel M; Gil, Ana M

    2015-06-05

    Metabolic biomarkers of pre- and postdiagnosis gestational diabetes mellitus (GDM) were sought, using nuclear magnetic resonance (NMR) metabolomics of maternal plasma and corresponding lipid extracts. Metabolite differences between controls and disease were identified through multivariate analysis of variable selected (1)H NMR spectra. For postdiagnosis GDM, partial least squares regression identified metabolites with higher dependence on normal gestational age evolution. Variable selection of NMR spectra produced good classification models for both pre- and postdiagnostic GDM. Prediagnosis GDM was accompanied by cholesterol increase and minor increases in lipoproteins (plasma), fatty acids, and triglycerides (extracts). Small metabolite changes comprised variations in glucose (up regulated), amino acids, betaine, urea, creatine, and metabolites related to gut microflora. Most changes were enhanced upon GDM diagnosis, in addition to newly observed changes in low-Mw compounds. GDM prediction seems possible exploiting multivariate profile changes rather than a set of univariate changes. Postdiagnosis GDM is successfully classified using a 26-resonance plasma biomarker. Plasma and extracts display comparable classification performance, the former enabling direct and more rapid analysis. Results and putative biochemical hypotheses require further confirmation in larger cohorts of distinct ethnicities.

  2. Multivariate cross-classification: applying machine learning techniques to characterize abstraction in neural representations

    PubMed Central

    Kaplan, Jonas T.; Man, Kingson; Greening, Steven G.

    2015-01-01

    Here we highlight an emerging trend in the use of machine learning classifiers to test for abstraction across patterns of neural activity. When a classifier algorithm is trained on data from one cognitive context, and tested on data from another, conclusions can be drawn about the role of a given brain region in representing information that abstracts across those cognitive contexts. We call this kind of analysis Multivariate Cross-Classification (MVCC), and review several domains where it has recently made an impact. MVCC has been important in establishing correspondences among neural patterns across cognitive domains, including motor-perception matching and cross-sensory matching. It has been used to test for similarity between neural patterns evoked by perception and those generated from memory. Other work has used MVCC to investigate the similarity of representations for semantic categories across different kinds of stimulus presentation, and in the presence of different cognitive demands. We use these examples to demonstrate the power of MVCC as a tool for investigating neural abstraction and discuss some important methodological issues related to its application. PMID:25859202

  3. Functional Groups Based on Leaf Physiology: Are they Spatially and Temporally Robust?

    NASA Technical Reports Server (NTRS)

    Foster, Tammy E.; Brooks, J. Renee

    2004-01-01

    The functional grouping hypothesis, which suggests that complexity in ecosystem function can be simplified by grouping species with similar responses, was tested in the Florida scrub habitat. Functional groups were identified based on how species in fire maintained Florida scrub regulate exchange of carbon and water with the atmosphere as indicated by both instantaneous gas exchange measurements and integrated measures of function (%N, delta C-13, delta N-15, C-N ratio). Using cluster analysis, five distinct physiologically-based functional groups were identified in the fire maintained scrub. These functional groups were tested to determine if they were robust spatially, temporally, and with management regime. Analysis of Similarities (ANOSIM), a non-parametric multivariate analysis, indicated that these five physiologically-based groupings were not altered by plot differences (R = -0.115, p = 0.893) or by the three different management regimes; prescribed burn, mechanically treated and burn, and fire-suppressed (R = 0.018, p = 0.349). The physiological groupings also remained robust between the two climatically different years 1999 and 2000 (R = -0.027, p = 0.725). Easy-to-measure morphological characteristics indicating functional groups would be more practical for scaling and modeling ecosystem processes than detailed gas-exchange measurements, therefore we tested a variety of morphological characteristics as functional indicators. A combination of non-parametric multivariate techniques (Hierarchical cluster analysis, non-metric Multi-Dimensional Scaling, and ANOSIM) were used to compare the ability of life form, leaf thickness, and specific leaf area classifications to identify the physiologically-based functional groups. Life form classifications (ANOSIM; R = 0.629, p 0.001) were able to depict the physiological groupings more adequately than either specific leaf area (ANOSIM; R = 0.426, p = 0.001) or leaf thickness (ANOSIM; R 0.344, p 0.001). The ability of life forms to depict the physiological groupings was improved by separating the parasitic Ximenia americana from the shrub category (ANOSIM; R = 0.794, p = 0.001). Therefore, a life form classification including parasites was determined to be a good indicator of the physiological processes of scrub species, and would be a useful method of grouping for scaling physiological processes to the ecosystem level.

  4. Age-specific discrimination of blood plasma samples of healthy and ovarian cancer prone mice using laser-induced breakdown spectroscopy

    NASA Astrophysics Data System (ADS)

    Melikechi, Noureddine; Markushin, Yuri; Connolly, Denise C.; Lasue, Jeremie; Ewusi-Annan, Ebo; Makrogiannis, Sokratis

    2016-09-01

    Epithelial ovarian cancer (EOC) mortality rates are strongly correlated with the stage at which it is diagnosed. Detection of EOC prior to its dissemination from the site of origin is known to significantly improve the patient outcome. However, there are currently no effective methods for early detection of the most common and lethal subtype of EOC. We sought to determine whether laser-induced breakdown spectroscopy (LIBS) and classification techniques such as linear discriminant analysis (LDA) and random forest (RF) could classify and differentiate blood plasma specimens from transgenic mice with ovarian carcinoma and wild type control mice. Herein we report results using this approach to distinguish blood plasma samples obtained from serially bled (at 8, 12, and 16 weeks) tumor-bearing TgMISIIR-TAg transgenic and wild type cancer-free littermate control mice. We have calculated the age-specific accuracy of classification using 18,000 laser-induced breakdown spectra of the blood plasma samples from tumor-bearing mice and wild type controls. When the analysis is performed in the spectral range 250 nm to 680 nm using LDA, these are 76.7 (± 2.6)%, 71.2 (± 1.3)%, and 73.1 (± 1.4)%, for the 8, 12 and 16 weeks. When the RF classifier is used, we obtain values of 78.5 (± 2.3)%, 76.9 (± 2.1)% and 75.4 (± 2.0)% in the spectral range of 250 nm to 680 nm, and 81.0 (± 1.8)%, 80.4 (± 2.1)% and 79.6 (± 3.5)% in 220 nm to 850 nm. In addition, we report, the positive and negative predictive values of the classification of the two classes of blood plasma samples. The approach used in this study is rapid, requires only 5 μL of blood plasma, and is based on the use of unsupervised and widely accepted multivariate analysis algorithms. These findings suggest that LIBS and multivariate analysis may be a novel approach for detecting EOC.

  5. Percentage of Positive Biopsy Cores: A Better Risk Stratification Model for Prostate Cancer?

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Huang Jiayi; Vicini, Frank A.; Williams, Scott G.

    2012-07-15

    Purpose: To assess the prognostic value of the percentage of positive biopsy cores (PPC) and perineural invasion in predicting the clinical outcomes after radiotherapy (RT) for prostate cancer and to explore the possibilities to improve on existing risk-stratification models. Methods and Materials: Between 1993 and 2004, 1,056 patients with clinical Stage T1c-T3N0M0 prostate cancer, who had four or more biopsy cores sampled and complete biopsy core data available, were treated with external beam RT, with or without a high-dose-rate brachytherapy boost at William Beaumont Hospital. The median follow-up was 7.6 years. Multivariate Cox regression analysis was performed with PPC, Gleasonmore » score, pretreatment prostate-specific antigen, T stage, PNI, radiation dose, androgen deprivation, age, prostate-specific antigen frequency, and follow-up duration. A new risk stratification (PPC classification) was empirically devised to incorporate PPC and replace the T stage. Results: On multivariate Cox regression analysis, the PPC was an independent predictor of distant metastasis, cause-specific survival, and overall survival (all p < .05). A PPC >50% was associated with significantly greater distant metastasis (hazard ratio, 4.01; 95% confidence interval, 1.86-8.61), and its independent predictive value remained significant with or without androgen deprivation therapy (all p < .05). In contrast, PNI and T stage were only predictive for locoregional recurrence. Combining the PPC ({<=}50% vs. >50%) with National Comprehensive Cancer Network risk stratification demonstrated added prognostic value of distant metastasis for the intermediate-risk (hazard ratio, 5.44; 95% confidence interval, 1.78-16.6) and high-risk (hazard ratio, 4.39; 95% confidence interval, 1.70-11.3) groups, regardless of the use of androgen deprivation and high-dose RT (all p < .05). The proposed PPC classification appears to provide improved stratification of the clinical outcomes relative to the National Comprehensive Cancer Network classification. Conclusions: The PPC is an independent and powerful predictor of clinical outcomes of prostate cancer after RT. A risk model replacing T stage with the PPC to reduce subjectivity demonstrated potentially improved stratification.« less

  6. Landsat TM inventory and assessment of waterbird habitat in the southern altiplano of South America

    USGS Publications Warehouse

    Boyle, T.P.; Caziani, S.M.; Waltermire, R.G.

    2004-01-01

    The diverse set of wetlands in southern altiplano of South America supports a number of endemic and migratory waterbirds. These species include endangered endemic flamingos and shorebirds that nest in North America and winter in the altiplano. This research developed maps from nine Landsat Thematic Mapper (TM) images (254,300 km2) to provide an inventory of aquatic waterbird habitats. Image processing software was used to produce a map with a classification of wetlands according to the habitat requirements of different types of waterbirds. A hierarchical procedure was used to, first, isolate the bodies of water within the TM image; second, execute an unsupervised classification on the subsetted image to produce 300 signatures of cover types, which were further subdivided as necessary. Third, each of the classifications was examined in the light of field data and personal experience for relevance to the determination of the various habitat types. Finally, the signatures were applied to the entire image and other adjacent images to yield a map depicting the location of the various waterbird habitats in the southern altiplano. The data sets referenced with a global positioning system receiver were used to test the classification system. Multivariate analysis of the bird communities censused at each lake by individual habitats indicated a salinity gradient, and then the depth of the water separated the birds. Multivariate analysis of the chemical and physical data from the lakes showed that the variation in lakes were significantly associated with difference in depth, transparency, latitude, elevation, and pH. The presence of gravel bottoms was also one of the qualities distinguishing a group of lakes. This information will be directly useful to the Flamingo Census Project and serve as an element for risk assessment for future development.

  7. Why Multivariate Methods Are Usually Vital in Research: Some Basic Concepts.

    ERIC Educational Resources Information Center

    Thompson, Bruce

    The present paper suggests that multivariate methods ought to be used more frequently in behavioral research and explores the potential consequences of failing to use multivariate methods when these methods are appropriate. The paper explores in detail two reasons why multivariate methods are usually vital. The first is that they limit the…

  8. Complementary biomarker-based methods for characterising Arctic sea ice conditions: A case study comparison between multivariate analysis and the PIP25 index

    NASA Astrophysics Data System (ADS)

    Köseoğlu, Denizcan; Belt, Simon T.; Smik, Lukas; Yao, Haoyi; Panieri, Giuliana; Knies, Jochen

    2018-02-01

    The discovery of IP25 as a qualitative biomarker proxy for Arctic sea ice and subsequent introduction of the so-called PIP25 index for semi-quantitative descriptions of sea ice conditions has significantly advanced our understanding of long-term paleo Arctic sea ice conditions over the past decade. We investigated the potential for classification tree (CT) models to provide a further approach to paleo Arctic sea ice reconstruction through analysis of a suite of highly branched isoprenoid (HBI) biomarkers in ca. 200 surface sediments from the Barents Sea. Four CT models constructed using different HBI assemblages revealed IP25 and an HBI triene as the most appropriate classifiers of sea ice conditions, achieving a >90% cross-validated classification rate. Additionally, lower model performance for locations in the Marginal Ice Zone (MIZ) highlighted difficulties in characterisation of this climatically-sensitive region. CT model classification and semi-quantitative PIP25-derived estimates of spring sea ice concentration (SpSIC) for four downcore records from the region were consistent, although agreement between proxy and satellite/observational records was weaker for a core from the west Svalbard margin, likely due to the highly variable sea ice conditions. The automatic selection of appropriate biomarkers for description of sea ice conditions, quantitative model assessment, and insensitivity to the c-factor used in the calculation of the PIP25 index are key attributes of the CT approach, and we provide an initial comparative assessment between these potentially complementary methods. The CT model should be capable of generating longer-term temporal shifts in sea ice conditions for the climatically sensitive Barents Sea.

  9. Landslide susceptibility mapping using decision-tree based CHi-squared automatic interaction detection (CHAID) and Logistic regression (LR) integration

    NASA Astrophysics Data System (ADS)

    Althuwaynee, Omar F.; Pradhan, Biswajeet; Ahmad, Noordin

    2014-06-01

    This article uses methodology based on chi-squared automatic interaction detection (CHAID), as a multivariate method that has an automatic classification capacity to analyse large numbers of landslide conditioning factors. This new algorithm was developed to overcome the subjectivity of the manual categorization of scale data of landslide conditioning factors, and to predict rainfall-induced susceptibility map in Kuala Lumpur city and surrounding areas using geographic information system (GIS). The main objective of this article is to use CHi-squared automatic interaction detection (CHAID) method to perform the best classification fit for each conditioning factor, then, combining it with logistic regression (LR). LR model was used to find the corresponding coefficients of best fitting function that assess the optimal terminal nodes. A cluster pattern of landslide locations was extracted in previous study using nearest neighbor index (NNI), which were then used to identify the clustered landslide locations range. Clustered locations were used as model training data with 14 landslide conditioning factors such as; topographic derived parameters, lithology, NDVI, land use and land cover maps. Pearson chi-squared value was used to find the best classification fit between the dependent variable and conditioning factors. Finally the relationship between conditioning factors were assessed and the landslide susceptibility map (LSM) was produced. An area under the curve (AUC) was used to test the model reliability and prediction capability with the training and validation landslide locations respectively. This study proved the efficiency and reliability of decision tree (DT) model in landslide susceptibility mapping. Also it provided a valuable scientific basis for spatial decision making in planning and urban management studies.

  10. Parasites as biological tags of fish stocks: a meta-analysis of their discriminatory power.

    PubMed

    Poulin, Robert; Kamiya, Tsukushi

    2015-01-01

    The use of parasites as biological tags to discriminate among marine fish stocks has become a widely accepted method in fisheries management. Here, we first link this approach to its unstated ecological foundation, the decay in the similarity of the species composition of assemblages as a function of increasing distance between them, a phenomenon almost universal in nature. We explain how distance decay of similarity can influence the use of parasites as biological tags. Then, we perform a meta-analysis of 61 uses of parasites as tags of marine fish populations in multivariate discriminant analyses, obtained from 29 articles. Our main finding is that across all studies, the observed overall probability of correct classification of fish based on parasite data was about 71%. This corresponds to a two-fold improvement over the rate of correct classification expected by chance alone, and the average effect size (Zr = 0·463) computed from the original values was also indicative of a medium-to-large effect. However, none of the moderator variables included in the meta-analysis had a significant effect on the proportion of correct classification; these moderators included the total number of fish sampled, the number of parasite species used in the discriminant analysis, the number of localities from which fish were sampled, the minimum and maximum distance between any pair of sampling localities, etc. Therefore, there are no clear-cut situations in which the use of parasites as tags is more useful than others. Finally, we provide recommendations for the future usage of parasites as tags for stock discrimination, to ensure that future applications of the method achieve statistical rigour and a high discriminatory power.

  11. Multivariate pattern analysis reveals subtle brain anomalies relevant to the cognitive phenotype in neurofibromatosis type 1.

    PubMed

    Duarte, João V; Ribeiro, Maria J; Violante, Inês R; Cunha, Gil; Silva, Eduardo; Castelo-Branco, Miguel

    2014-01-01

    Neurofibromatosis Type 1 (NF1) is a common genetic condition associated with cognitive dysfunction. However, the pathophysiology of the NF1 cognitive deficits is not well understood. Abnormal brain structure, including increased total brain volume, white matter (WM) and grey matter (GM) abnormalities have been reported in the NF1 brain. These previous studies employed univariate model-driven methods preventing detection of subtle and spatially distributed differences in brain anatomy. Multivariate pattern analysis allows the combination of information from multiple spatial locations yielding a discriminative power beyond that of single voxels. Here we investigated for the first time subtle anomalies in the NF1 brain, using a multivariate data-driven classification approach. We used support vector machines (SVM) to classify whole-brain GM and WM segments of structural T1 -weighted MRI scans from 39 participants with NF1 and 60 non-affected individuals, divided in children/adolescents and adults groups. We also employed voxel-based morphometry (VBM) as a univariate gold standard to study brain structural differences. SVM classifiers correctly classified 94% of cases (sensitivity 92%; specificity 96%) revealing the existence of brain structural anomalies that discriminate NF1 individuals from controls. Accordingly, VBM analysis revealed structural differences in agreement with the SVM weight maps representing the most relevant brain regions for group discrimination. These included the hippocampus, basal ganglia, thalamus, and visual cortex. This multivariate data-driven analysis thus identified subtle anomalies in brain structure in the absence of visible pathology. Our results provide further insight into the neuroanatomical correlates of known features of the cognitive phenotype of NF1. Copyright © 2012 Wiley Periodicals, Inc.

  12. Formalized classification of moss litters in swampy spruce forests of intermontane depressions of Kuznetsk Alatau

    NASA Astrophysics Data System (ADS)

    Efremova, T. T.; Avrova, A. F.; Efremov, S. P.

    2016-09-01

    The approaches of multivariate statistics have been used for the numerical classification of morphogenetic types of moss litters in swampy spruce forests according to their physicochemical properties (the ash content, decomposition degree, bulk density, pH, mass, and thickness). Three clusters of moss litters— peat, peaty, and high-ash peaty—have been specified. The functions of classification for identification of new objects have been calculated and evaluated. The degree of decomposition and the ash content are the main classification parameters of litters, though all other characteristics are also statistically significant. The final prediction accuracy of the assignment of a litter to a particular cluster is 86%. Two leading factors participating in the clustering of litters have been determined. The first factor—the degree of transformation of plant remains (quality)—specifies 49% of the total variance, and the second factor—the accumulation rate (quantity)— specifies 26% of the total variance. The morphogenetic structure and physicochemical properties of the clusters of moss litters are characterized.

  13. Simultaneous fecal microbial and metabolite profiling enables accurate classification of pediatric irritable bowel syndrome.

    PubMed

    Shankar, Vijay; Reo, Nicholas V; Paliy, Oleg

    2015-12-09

    We previously showed that stool samples of pre-adolescent and adolescent US children diagnosed with diarrhea-predominant IBS (IBS-D) had different compositions of microbiota and metabolites compared to healthy age-matched controls. Here we explored whether observed fecal microbiota and metabolite differences between these two adolescent populations can be used to discriminate between IBS and health. We constructed individual microbiota- and metabolite-based sample classification models based on the partial least squares multivariate analysis and then applied a Bayesian approach to integrate individual models into a single classifier. The resulting combined classification achieved 84 % accuracy of correct sample group assignment and 86 % prediction for IBS-D in cross-validation tests. The performance of the cumulative classification model was further validated by the de novo analysis of stool samples from a small independent IBS-D cohort. High-throughput microbial and metabolite profiling of subject stool samples can be used to facilitate IBS diagnosis.

  14. Classification of Alzheimer's disease patients with hippocampal shape wrapper-based feature selection and support vector machine

    NASA Astrophysics Data System (ADS)

    Young, Jonathan; Ridgway, Gerard; Leung, Kelvin; Ourselin, Sebastien

    2012-02-01

    It is well known that hippocampal atrophy is a marker of the onset of Alzheimer's disease (AD) and as a result hippocampal volumetry has been used in a number of studies to provide early diagnosis of AD and predict conversion of mild cognitive impairment patients to AD. However, rates of atrophy are not uniform across the hippocampus making shape analysis a potentially more accurate biomarker. This study studies the hippocampi from 226 healthy controls, 148 AD patients and 330 MCI patients obtained from T1 weighted structural MRI images from the ADNI database. The hippocampi are anatomically segmented using the MAPS multi-atlas segmentation method, and the resulting binary images are then processed with SPHARM software to decompose their shapes as a weighted sum of spherical harmonic basis functions. The resulting parameterizations are then used as feature vectors in Support Vector Machine (SVM) classification. A wrapper based feature selection method was used as this considers the utility of features in discriminating classes in combination, fully exploiting the multivariate nature of the data and optimizing the selected set of features for the type of classifier that is used. The leave-one-out cross validated accuracy obtained on training data is 88.6% for classifying AD vs controls and 74% for classifying MCI-converters vs MCI-stable with very compact feature sets, showing that this is a highly promising method. There is currently a considerable fall in accuracy on unseen data indicating that the feature selection is sensitive to the data used, however feature ensemble methods may overcome this.

  15. Multivariate analyses of Erzgebirge granite and rhyolite composition: Implications for classification of granites and their genetic relations

    USGS Publications Warehouse

    Forster, H.-J.; Davis, J.C.; Tischendorf, G.; Seltmann, R.

    1999-01-01

    High-precision major, minor and trace element analyses for 44 elements have been made of 329 Late Variscan granitic and rhyolitic rocks from the Erzgebirge metallogenic province of Germany. The intrusive histories of some of these granites are not completely understood and exposures of rock are not adequate to resolve relationships between what apparently are different plutons. Therefore, it is necessary to turn to chemical analyses to decipher the evolution of the plutons and their relationships. A new classification of Erzgebirge plutons into five major groups of granites, based on petrologic interpretations of geochemical and mineralogical relationships (low-F biotite granites; low-F two-mica granites; high-F, high-P2O5 Li-mica granites; high-F, low-P2O5 Li-mica granites; high-F, low-P2O5 biotite granites) was tested by multivariate techniques. Canonical analyses of major elements, minor elements, trace elements and ratio variables all distinguish the groups with differing amounts of success. Univariate ANOVA's, in combination with forward-stepwise and backward-elimination canonical analyses, were used to select ten variables which were most effective in distinguishing groups. In a biplot, groups form distinct clusters roughly arranged along a quadratic path. Within groups, individual plutons tend to be arranged in patterns possibly reflecting granitic evolution. Canonical functions were used to classify samples of rhyolites of unknown association into the five groups. Another canonical analysis was based on ten elements traditionally used in petrology and which were important in the new classification of granites. Their biplot pattern is similar to that from statistically chosen variables but less effective at distinguishing the five groups of granites. This study shows that multivariate statistical techniques can provide significant insight into problems of granitic petrogenesis and may be superior to conventional procedures for petrological interpretation.

  16. Diagnosis of rheumatoid arthritis: multivariate analysis of biomarkers.

    PubMed

    Wild, Norbert; Karl, Johann; Grunert, Veit P; Schmitt, Raluca I; Garczarek, Ursula; Krause, Friedemann; Hasler, Fritz; van Riel, Piet L C M; Bayer, Peter M; Thun, Matthias; Mattey, Derek L; Sharif, Mohammed; Zolg, Werner

    2008-02-01

    To test if a combination of biomarkers can increase the classification power of autoantibodies to cyclic citrullinated peptides (anti-CCP) in the diagnosis of rheumatoid arthritis (RA) depending on the diagnostic situation. Biomarkers were subject to three inclusion/exclusion criteria (discrimination between RA patients and healthy blood donors, ability to identify anti-CCP-negative RA patients, specificity in a panel with major non-rheumatological diseases) before univariate ranking and multivariate analysis was carried out using a modelling panel (n = 906). To enable the evaluation of the classification power in different diagnostic settings the disease controls (n = 542) were weighted according to the admission rates in rheumatology clinics modelling a clinic panel or according to the relative prevalences of musculoskeletal disorders in the general population seen by general practitioners modelling a GP panel. Out of 131 biomarkers considered originally, we evaluated 32 biomarkers in this study, of which only seven passed the three inclusion/exclusion criteria and were combined by multivariate analysis using four different mathematical models. In the modelled clinic panel, anti-CCP was the lead marker with a sensitivity of 75.8% and a specificity of 94.0%. Due to the lack in specificity of the markers other than anti-CCP in this diagnostic setting, any gain in sensitivity by any marker combination is off-set by a corresponding loss in specificity. In the modelled GP panel, the best marker combination of anti-CCP and interleukin (IL)-6 resulted in a sensitivity gain of 7.6% (85.9% vs. 78.3%) at a minor loss in specificity of 1.6% (90.3% vs. 91.9%) compared with anti-CCP as the best single marker. Depending on the composition of the sample panel, anti-CCP alone or anti-CCP in combination with IL-6 has the highest classification power for the diagnosis of established RA.

  17. Comparison of Xenon-Enhanced Area-Detector CT and Krypton Ventilation SPECT/CT for Assessment of Pulmonary Functional Loss and Disease Severity in Smokers.

    PubMed

    Ohno, Yoshiharu; Fujisawa, Yasuko; Takenaka, Daisuke; Kaminaga, Shigeo; Seki, Shinichiro; Sugihara, Naoki; Yoshikawa, Takeshi

    2018-02-01

    The objective of this study was to compare the capability of xenon-enhanced area-detector CT (ADCT) performed with a subtraction technique and coregistered 81m Kr-ventilation SPECT/CT for the assessment of pulmonary functional loss and disease severity in smokers. Forty-six consecutive smokers (32 men and 14 women; mean age, 67.0 years) underwent prospective unenhanced and xenon-enhanced ADCT, 81m Kr-ventilation SPECT/CT, and pulmonary function tests. Disease severity was evaluated according to the Global Initiative for Chronic Obstructive Lung Disease (GOLD) classification. CT-based functional lung volume (FLV), the percentage of wall area to total airway area (WA%), and ventilated FLV on xenon-enhanced ADCT and SPECT/CT were calculated for each smoker. All indexes were correlated with percentage of forced expiratory volume in 1 second (%FEV 1 ) using step-wise regression analyses, and univariate and multivariate logistic regression analyses were performed. In addition, the diagnostic accuracy of the proposed model was compared with that of each radiologic index by means of McNemar analysis. Multivariate logistic regression showed that %FEV 1 was significantly affected (r = 0.77, r 2 = 0.59) by two factors: the first factor, ventilated FLV on xenon-enhanced ADCT (p < 0.0001); and the second factor, WA% (p = 0.004). Univariate logistic regression analyses indicated that all indexes significantly affected GOLD classification (p < 0.05). Multivariate logistic regression analyses revealed that ventilated FLV on xenon-enhanced ADCT and CT-based FLV significantly influenced GOLD classification (p < 0.0001). The diagnostic accuracy of the proposed model was significantly higher than that of ventilated FLV on SPECT/CT (p = 0.03) and WA% (p = 0.008). Xenon-enhanced ADCT is more effective than 81m Kr-ventilation SPECT/CT for the assessment of pulmonary functional loss and disease severity.

  18. Intrapartum fetal heart rate classification from trajectory in Sparse SVM feature space.

    PubMed

    Spilka, J; Frecon, J; Leonarduzzi, R; Pustelnik, N; Abry, P; Doret, M

    2015-01-01

    Intrapartum fetal heart rate (FHR) constitutes a prominent source of information for the assessment of fetal reactions to stress events during delivery. Yet, early detection of fetal acidosis remains a challenging signal processing task. The originality of the present contribution are three-fold: multiscale representations and wavelet leader based multifractal analysis are used to quantify FHR variability ; Supervised classification is achieved by means of Sparse-SVM that aim jointly to achieve optimal detection performance and to select relevant features in a multivariate setting ; Trajectories in the feature space accounting for the evolution along time of features while labor progresses are involved in the construction of indices quantifying fetal health. The classification performance permitted by this combination of tools are quantified on a intrapartum FHR large database (≃ 1250 subjects) collected at a French academic public hospital.

  19. An application of LANDSAT multispectral imagery for the classification of hydrobiological systems, Shark River Slough, Everglades National Park, Florida

    NASA Technical Reports Server (NTRS)

    Rose, P. W.; Rosendahl, P. C. (Principal Investigator)

    1979-01-01

    Multivariant hydrologic parameters over the Shark River Slough were investigated. Ground truth was established utilizing U-2 infrared photography and comprehensive field data to define a control network which represented all hydrobiological systems in the slough. These data were then applied to LANDSAT imagery utilizing an interactive multispectral processor which generated hydrographic maps through classification of the slough and defined the multispectral surface radiance characteristics of the wetlands areas in the park. The spectral response of each hydrobiological zone was determined and plotted to formulate multispectral relationships between the emittent energy from the slough in order to determine the best possible multispectral wavelength combinations to enhance classification results. The extent of each hydrobiological zone in slough was determined and flow vectors for water movement throughout the slough established.

  20. MDAS: an integrated system for metabonomic data analysis.

    PubMed

    Liu, Juan; Li, Bo; Xiong, Jiang-Hui

    2009-03-01

    Metabonomics, the latest 'omics' research field, shows great promise as a tool in biomarker discovery, drug efficacy and toxicity analysis, disease diagnosis and prognosis. One of the major challenges now facing researchers is how to process this data to yield useful information about a biological system, e.g., the mechanism of diseases. Traditional methods employed in metabonomic data analysis use multivariate analysis methods developed independently in chemometrics research. Additionally, with the development of machine learning approaches, some methods such as SVMs also show promise for use in metabonomic data analysis. Aside from the application of general multivariate analysis and machine learning methods to this problem, there is also a need for an integrated tool customized for metabonomic data analysis which can be easily used by biologists to reveal interesting patterns in metabonomic data.In this paper, we present a novel software tool MDAS (Metabonomic Data Analysis System) for metabonomic data analysis which integrates traditional chemometrics methods and newly introduced machine learning approaches. MDAS contains a suite of functional models for metabonomic data analysis and optimizes the flow of data analysis. Several file formats can be accepted as input. The input data can be optionally preprocessed and can then be processed with operations such as feature analysis and dimensionality reduction. The data with reduced dimensionalities can be used for training or testing through machine learning models. The system supplies proper visualization for data preprocessing, feature analysis, and classification which can be a powerful function for users to extract knowledge from the data. MDAS is an integrated platform for metabonomic data analysis, which transforms a complex analysis procedure into a more formalized and simplified one. The software package can be obtained from the authors.

  1. Assessment of self-organizing maps to analyze sole-carbon source utilization profiles.

    PubMed

    Leflaive, Joséphine; Céréghino, Régis; Danger, Michaël; Lacroix, Gérard; Ten-Hage, Loïc

    2005-07-01

    The use of community-level physiological profiles obtained with Biolog microplates is widely employed to consider the functional diversity of bacterial communities. Biolog produces a great amount of data which analysis has been the subject of many studies. In most cases, after some transformations, these data were investigated with classical multivariate analyses. Here we provided an alternative to this method, that is the use of an artificial intelligence technique, the Self-Organizing Maps (SOM, unsupervised neural network). We used data from a microcosm study of algae-associated bacterial communities placed in various nutritive conditions. Analyses were carried out on the net absorbances at two incubation times for each substrates and on the chemical guild categorization of the total bacterial activity. Compared to Principal Components Analysis and cluster analysis, SOM appeared as a valuable tool for community classification, and to establish clear relationships between clusters of bacterial communities and sole-carbon sources utilization. Specifically, SOM offered a clear bidimensional projection of a relatively large volume of data and were easier to interpret than plots commonly obtained with multivariate analyses. They would be recommended to pattern the temporal evolution of communities' functional diversity.

  2. Illustration of compositional variations over time of Chinese porcelain glazes combining micro-X-ray Fluorescence spectrometry, multivariate data analysis and Seger formulas

    NASA Astrophysics Data System (ADS)

    Van Pevenage, J.; Verhaeven, E.; Vekemans, B.; Lauwers, D.; Herremans, D.; De Clercq, W.; Vincze, L.; Moens, L.; Vandenabeele, P.

    2015-01-01

    In this research, the transparent glaze layers of Chinese porcelain samples were investigated. Depending on the production period, these samples can be divided into two groups: the samples of group A dating from the Kangxi period (1661-1722), and the samples of group B produced under emperor Qianlong (1735-1795). Due to the specific sample preparation method and the small spot size of the X-ray beam, investigation of the transparent glaze layers is enabled. Despite the many existing research papers about glaze investigations of ceramics and/or porcelain ware, this research reveals new insights into the glaze composition and structure of Chinese porcelain samples. In this paper it is demonstrated, using micro-X-ray Fluorescence (μ-XRF) spectrometry, multivariate data analysis and statistical analysis (Hotelling's T-Square test) that the transparent glaze layers of the samples of groups A and B are significantly different (95% confidence level). Calculation of the Seger formulas, enabled classification of the glazes. Combining all the information, the difference in composition of the Chinese porcelain glazes of the Kangxi period and the Qianlong period can be demonstrated.

  3. The role of the human leukocyte antigen system in retinopathy of prematurity: a pilot study.

    PubMed

    Flor-de-Lima, Filipa; Rocha, Gustavo; Proença, Elisa; Tafulo, Sandra; Freitas, Fátima; Guimarães, Hercília

    2013-12-01

    To assess the association between the human leukocyte antigen system and retinopathy of prematurity. Neonates of <32 weeks of gestational age, born at two level III neonatal intensive care units from January 2000 to December 2001 and from January 2006 to June 2009, were included in the study. Demographic and clinical data were recorded, and retinopathy was classified according to the International Classification. Epithelial cells were collected from the oral cavity and the HLA were studied using the PCR/SSO method. Univariate and multivariate analyses were performed using SPSS® v.18. We evaluated 156 neonates, including 82 (52.6%) males. Median gestational age was 29 (23-31) weeks, and median birth weight was 1030 (525-1935) grams. Seventy (44.9%) of the neonates developed retinopathy. Alleles HLA-B*38, HLA-Cw*12, HLA-DRB1*09, HLA-DRB1*14 (univariate analysis) and HLA-A*68 and HLA-Cw*12 were associated to retinopathy (multivariate analysis). The results suggest that the HLA system may be associated with the development of retinopathy of prematurity. A large-scale population-based study should be performed to clarify this association. ©2013 Foundation Acta Paediatrica. Published by John Wiley & Sons Ltd.

  4. On the use of spectra from portable Raman and ATR-IR instruments in synthesis route attribution of a chemical warfare agent by multivariate modeling.

    PubMed

    Wiktelius, Daniel; Ahlinder, Linnea; Larsson, Andreas; Höjer Holmgren, Karin; Norlin, Rikard; Andersson, Per Ola

    2018-08-15

    Collecting data under field conditions for forensic investigations of chemical warfare agents calls for the use of portable instruments. In this study, a set of aged, crude preparations of sulfur mustard were characterized spectroscopically without any sample preparation using handheld Raman and portable IR instruments. The spectral data was used to construct Random Forest multivariate models for the attribution of test set samples to the synthetic method used for their production. Colored and fluorescent samples were included in the study, which made Raman spectroscopy challenging although fluorescence was diminished by using an excitation wavelength of 1064 nm. The predictive power of models constructed with IR or Raman data alone, as well as with combined data was investigated. Both techniques gave useful data for attribution. Model performance was enhanced when Raman and IR spectra were combined, allowing correct classification of 19/23 (83%) of test set spectra. The results demonstrate that data obtained with spectroscopy instruments amenable for field deployment can be useful in forensic studies of chemical warfare agents. Copyright © 2018 Elsevier B.V. All rights reserved.

  5. Discriminant analysis of fused positive and negative ion mobility spectra using multivariate self-modeling mixture analysis and neural networks.

    PubMed

    Chen, Ping; Harrington, Peter B

    2008-02-01

    A new method coupling multivariate self-modeling mixture analysis and pattern recognition has been developed to identify toxic industrial chemicals using fused positive and negative ion mobility spectra (dual scan spectra). A Smiths lightweight chemical detector (LCD), which can measure positive and negative ion mobility spectra simultaneously, was used to acquire the data. Simple-to-use interactive self-modeling mixture analysis (SIMPLISMA) was used to separate the analytical peaks in the ion mobility spectra from the background reactant ion peaks (RIP). The SIMPLSIMA analytical components of the positive and negative ion peaks were combined together in a butterfly representation (i.e., negative spectra are reported with negative drift times and reflected with respect to the ordinate and juxtaposed with the positive ion mobility spectra). Temperature constrained cascade-correlation neural network (TCCCN) models were built to classify the toxic industrial chemicals. Seven common toxic industrial chemicals were used in this project to evaluate the performance of the algorithm. Ten bootstrapped Latin partitions demonstrated that the classification of neural networks using the SIMPLISMA components was statistically better than neural network models trained with fused ion mobility spectra (IMS).

  6. Human immunotoxicologic markers of chemical exposures: preliminary validation studies.

    PubMed

    Wartenberg, D; Laskin, D; Kipen, H

    1993-01-01

    The circulating cells of the immune system are sensitive to environmental contaminants, and effects are often manifested as changes in the cell surface differentiation antigens of affected populations of cells, particularly lymphocytes. In this investigation, we explore the likelihood that variation in the expression of the surface markers of immune cells can be used as an index of exposure to toxic chemicals. We recruited 38 healthy New Jersey men to study pesticides effects: 19 orchard farmers (high exposure); 13 berry farmers (low exposure); and 6 hardware store owners (no exposure). Immunophenotyping was performed assaying the following cell surface antigens: CD2, CD4, CD8, CD14, CD20, CD26, CD29, CD45R, CD56, and PMN. Data were analyzed using univariate and multivariate methods. There were no significant differences among the groups with respect to routine medical histories, physical examinations, or routine laboratory parameters. No striking differences between groups were seen in univariate tests. Multivariate tests suggested some differences among groups and limited ability to correctly classify individuals based on immunophenotyping results. Immunophenotyping represents a fruitful area of research for improved exposure classification. Work is needed both on mechanistic understanding of the patterns observed and on the statistical interpretation of these patterns.

  7. Segmentation of prostate biopsy needles in transrectal ultrasound images

    NASA Astrophysics Data System (ADS)

    Krefting, Dagmar; Haupt, Barbara; Tolxdorff, Thomas; Kempkensteffen, Carsten; Miller, Kurt

    2007-03-01

    Prostate cancer is the most common cancer in men. Tissue extraction at different locations (biopsy) is the gold-standard for diagnosis of prostate cancer. These biopsies are commonly guided by transrectal ultrasound imaging (TRUS). Exact location of the extracted tissue within the gland is desired for more specific diagnosis and provides better therapy planning. While the orientation and the position of the needle within clinical TRUS image are limited, the appearing length and visibility of the needle varies strongly. Marker lines are present and tissue inhomogeneities and deflection artefacts may appear. Simple intensity, gradient oder edge-detecting based segmentation methods fail. Therefore a multivariate statistical classificator is implemented. The independent feature model is built by supervised learning using a set of manually segmented needles. The feature space is spanned by common binary object features as size and eccentricity as well as imaging-system dependent features like distance and orientation relative to the marker line. The object extraction is done by multi-step binarization of the region of interest. The ROI is automatically determined at the beginning of the segmentation and marker lines are removed from the images. The segmentation itself is realized by scale-invariant classification using maximum likelihood estimation and Mahalanobis distance as discriminator. The technique presented here could be successfully applied in 94% of 1835 TRUS images from 30 tissue extractions. It provides a robust method for biopsy needle localization in clinical prostate biopsy TRUS images.

  8. Chemical indices and methods of multivariate statistics as a tool for odor classification.

    PubMed

    Mahlke, Ingo T; Thiesen, Peter H; Niemeyer, Bernd

    2007-04-01

    Industrial and agricultural off-gas streams are comprised of numerous volatile compounds, many of which have substantially different odorous properties. State-of-the-art waste-gas treatment includes the characterization of these molecules and is directed at, if possible, either the avoidance of such odorants during processing or the use of existing standardized air purification techniques like bioscrubbing or afterburning, which however, often show low efficiency under ecological and economical regards. Selective odor separation from the off-gas streams could ease many of these disadvantages but is not yet widely applicable. Thus, the aim of this paper is to identify possible model substances in selective odor separation research from 155 volatile molecules mainly originating from livestock facilities, fat refineries, and cocoa and coffee production by knowledge-based methods. All compounds are examined with regard to their structure and information-content using topological and information-theoretical indices. Resulting data are fitted in an observation matrix, and similarities between the substances are computed. Principal component analysis and k-means cluster analysis are conducted showing that clustering of indices data can depict odor information correlating well to molecular composition and molecular shape. Quantitative molecule describtion along with the application of such statistical means therefore provide a good classification tool of malodorant structure properties with no thermodynamic data needed. The approximate look-alike shape of odorous compounds within the clusters suggests a fair choice of possible model molecules.

  9. On the effect of experimental noise on the classification of biological samples using Raman micro-spectroscopy

    NASA Astrophysics Data System (ADS)

    Barton, Sinead J.; Kerr, Laura T.; Domijan, Katarina; Hennelly, Bryan M.

    2016-04-01

    Raman micro-spectroscopy is an optoelectronic technique that can be used to evaluate the chemical composition of biological samples and has been shown to be a powerful diagnostic tool for the investigation of various cancer related diseases including bladder, breast, and cervical cancer. Raman scattering is an inherently weak process with approximately 1 in 107 photons undergoing scattering and for this reason, noise from the recording system can have a significant impact on the quality of the signal, and its suitability for diagnostic classification. The main sources of noise in the recorded signal are shot noise, CCD dark current, and CCD readout noise. Shot noise results from the low signal photon count while dark current results from thermally generated electrons in the semiconductor pixels. Both of these noise sources are time dependent; readout noise is time independent but is inherent in each individual recording and results in the fundamental limit of measurement, arising from the internal electronics of the camera. In this paper, each of the aforementioned noise sources are analysed in isolation, and used to experimentally validate a mathematical model. This model is then used to simulate spectra that might be acquired under various experimental conditions including the use of different cameras, different source wavelength, and power etc. Simulated noisy datasets of T24 and RT112 cell line spectra are generated based on true cell Raman spectrum irradiance values (recorded using very long exposure times) and the addition of simulated noise. These datasets are then input to multivariate classification using Principal Components Analysis and Linear Discriminant Analysis. This method enables an investigation into the effect of noise on the sensitivity and specificity of Raman based classification under various experimental conditions and using different equipment.

  10. Incorporation of support vector machines in the LIBS toolbox for sensitive and robust classification amidst unexpected sample and system variability

    PubMed Central

    ChariDingari, Narahara; Barman, Ishan; Myakalwar, Ashwin Kumar; Tewari, Surya P.; Kumar, G. Manoj

    2012-01-01

    Despite the intrinsic elemental analysis capability and lack of sample preparation requirements, laser-induced breakdown spectroscopy (LIBS) has not been extensively used for real world applications, e.g. quality assurance and process monitoring. Specifically, variability in sample, system and experimental parameters in LIBS studies present a substantive hurdle for robust classification, even when standard multivariate chemometric techniques are used for analysis. Considering pharmaceutical sample investigation as an example, we propose the use of support vector machines (SVM) as a non-linear classification method over conventional linear techniques such as soft independent modeling of class analogy (SIMCA) and partial least-squares discriminant analysis (PLS-DA) for discrimination based on LIBS measurements. Using over-the-counter pharmaceutical samples, we demonstrate that application of SVM enables statistically significant improvements in prospective classification accuracy (sensitivity), due to its ability to address variability in LIBS sample ablation and plasma self-absorption behavior. Furthermore, our results reveal that SVM provides nearly 10% improvement in correct allocation rate and a concomitant reduction in misclassification rates of 75% (cf. PLS-DA) and 80% (cf. SIMCA)-when measurements from samples not included in the training set are incorporated in the test data – highlighting its robustness. While further studies on a wider matrix of sample types performed using different LIBS systems is needed to fully characterize the capability of SVM to provide superior predictions, we anticipate that the improved sensitivity and robustness observed here will facilitate application of the proposed LIBS-SVM toolbox for screening drugs and detecting counterfeit samples as well as in related areas of forensic and biological sample analysis. PMID:22292496

  11. Incorporation of support vector machines in the LIBS toolbox for sensitive and robust classification amidst unexpected sample and system variability.

    PubMed

    Dingari, Narahara Chari; Barman, Ishan; Myakalwar, Ashwin Kumar; Tewari, Surya P; Kumar Gundawar, Manoj

    2012-03-20

    Despite the intrinsic elemental analysis capability and lack of sample preparation requirements, laser-induced breakdown spectroscopy (LIBS) has not been extensively used for real-world applications, e.g., quality assurance and process monitoring. Specifically, variability in sample, system, and experimental parameters in LIBS studies present a substantive hurdle for robust classification, even when standard multivariate chemometric techniques are used for analysis. Considering pharmaceutical sample investigation as an example, we propose the use of support vector machines (SVM) as a nonlinear classification method over conventional linear techniques such as soft independent modeling of class analogy (SIMCA) and partial least-squares discriminant analysis (PLS-DA) for discrimination based on LIBS measurements. Using over-the-counter pharmaceutical samples, we demonstrate that the application of SVM enables statistically significant improvements in prospective classification accuracy (sensitivity), because of its ability to address variability in LIBS sample ablation and plasma self-absorption behavior. Furthermore, our results reveal that SVM provides nearly 10% improvement in correct allocation rate and a concomitant reduction in misclassification rates of 75% (cf. PLS-DA) and 80% (cf. SIMCA)-when measurements from samples not included in the training set are incorporated in the test data-highlighting its robustness. While further studies on a wider matrix of sample types performed using different LIBS systems is needed to fully characterize the capability of SVM to provide superior predictions, we anticipate that the improved sensitivity and robustness observed here will facilitate application of the proposed LIBS-SVM toolbox for screening drugs and detecting counterfeit samples, as well as in related areas of forensic and biological sample analysis.

  12. The Pathways for Intelligible Speech: Multivariate and Univariate Perspectives

    PubMed Central

    Evans, S.; Kyong, J.S.; Rosen, S.; Golestani, N.; Warren, J.E.; McGettigan, C.; Mourão-Miranda, J.; Wise, R.J.S.; Scott, S.K.

    2014-01-01

    An anterior pathway, concerned with extracting meaning from sound, has been identified in nonhuman primates. An analogous pathway has been suggested in humans, but controversy exists concerning the degree of lateralization and the precise location where responses to intelligible speech emerge. We have demonstrated that the left anterior superior temporal sulcus (STS) responds preferentially to intelligible speech (Scott SK, Blank CC, Rosen S, Wise RJS. 2000. Identification of a pathway for intelligible speech in the left temporal lobe. Brain. 123:2400–2406.). A functional magnetic resonance imaging study in Cerebral Cortex used equivalent stimuli and univariate and multivariate analyses to argue for the greater importance of bilateral posterior when compared with the left anterior STS in responding to intelligible speech (Okada K, Rong F, Venezia J, Matchin W, Hsieh IH, Saberi K, Serences JT,Hickok G. 2010. Hierarchical organization of human auditory cortex: evidence from acoustic invariance in the response to intelligible speech. 20: 2486–2495.). Here, we also replicate our original study, demonstrating that the left anterior STS exhibits the strongest univariate response and, in decoding using the bilateral temporal cortex, contains the most informative voxels showing an increased response to intelligible speech. In contrast, in classifications using local “searchlights” and a whole brain analysis, we find greater classification accuracy in posterior rather than anterior temporal regions. Thus, we show that the precise nature of the multivariate analysis used will emphasize different response profiles associated with complex sound to speech processing. PMID:23585519

  13. Prediction of wastewater quality indicators at the inflow to the wastewater treatment plant using data mining methods

    NASA Astrophysics Data System (ADS)

    Szeląg, Bartosz; Barbusiński, Krzysztof; Studziński, Jan; Bartkiewicz, Lidia

    2017-11-01

    In the study, models developed using data mining methods are proposed for predicting wastewater quality indicators: biochemical and chemical oxygen demand, total suspended solids, total nitrogen and total phosphorus at the inflow to wastewater treatment plant (WWTP). The models are based on values measured in previous time steps and daily wastewater inflows. Also, independent prediction systems that can be used in case of monitoring devices malfunction are provided. Models of wastewater quality indicators were developed using MARS (multivariate adaptive regression spline) method, artificial neural networks (ANN) of the multilayer perceptron type combined with the classification model (SOM) and cascade neural networks (CNN). The lowest values of absolute and relative errors were obtained using ANN+SOM, whereas the MARS method produced the highest error values. It was shown that for the analysed WWTP it is possible to obtain continuous prediction of selected wastewater quality indicators using the two developed independent prediction systems. Such models can ensure reliable WWTP work when wastewater quality monitoring systems become inoperable, or are under maintenance.

  14. A Multivariate Analytic Approach to the Differential Diagnosis of Apraxia of Speech

    ERIC Educational Resources Information Center

    Basilakos, Alexandra; Yourganov, Grigori; den Ouden, Dirk-Bart; Fogerty, Daniel; Rorden, Chris; Feenaughty, Lynda; Fridriksson, Julius

    2017-01-01

    Purpose: Apraxia of speech (AOS) is a consequence of stroke that frequently co-occurs with aphasia. Its study is limited by difficulties with its perceptual evaluation and dissociation from co-occurring impairments. This study examined the classification accuracy of several acoustic measures for the differential diagnosis of AOS in a sample of…

  15. Feature combinations and the divergence criterion

    NASA Technical Reports Server (NTRS)

    Decell, H. P., Jr.; Mayekar, S. M.

    1976-01-01

    Classifying large quantities of multidimensional remotely sensed agricultural data requires efficient and effective classification techniques and the construction of certain transformations of a dimension reducing, information preserving nature. The construction of transformations that minimally degrade information (i.e., class separability) is described. Linear dimension reducing transformations for multivariate normal populations are presented. Information content is measured by divergence.

  16. An unconventional approach to ecosystem unit classification in western North Carolina, USA

    Treesearch

    W. Henry McNab; Sara A. Browning; Steven A. Simon; Penelope E. Fouts

    1999-01-01

    The authors used an unconventional combination of data transformation and multivariate analyses to reduce subjectivity in identification of ecosystem units in a mountainous region of western North Carolina, USA. Vegetative cover and environmental variables were measured on 79 stratified, randomly located, 0.1 ha sample plots in a 4000 ha watershed. Binary...

  17. FT-IR spectroscopy and multivariate analysis as an auxiliary tool for diagnosis of mental disorders: Bipolar and schizophrenia cases

    NASA Astrophysics Data System (ADS)

    Ogruc Ildiz, G.; Arslan, M.; Unsalan, O.; Araujo-Andrade, C.; Kurt, E.; Karatepe, H. T.; Yilmaz, A.; Yalcinkaya, O. B.; Herken, H.

    2016-01-01

    In this study, a methodology based on Fourier-transform infrared spectroscopy and principal component analysis and partial least square methods is proposed for the analysis of blood plasma samples in order to identify spectral changes correlated with some biomarkers associated with schizophrenia and bipolarity. Our main goal was to use the spectral information for the calibration of statistical models to discriminate and classify blood plasma samples belonging to bipolar and schizophrenic patients. IR spectra of 30 samples of blood plasma obtained from each, bipolar and schizophrenic patients and healthy control group were collected. The results obtained from principal component analysis (PCA) show a clear discrimination between the bipolar (BP), schizophrenic (SZ) and control group' (CG) blood samples that also give possibility to identify three main regions that show the major differences correlated with both mental disorders (biomarkers). Furthermore, a model for the classification of the blood samples was calibrated using partial least square discriminant analysis (PLS-DA), allowing the correct classification of BP, SZ and CG samples. The results obtained applying this methodology suggest that it can be used as a complimentary diagnostic tool for the detection and discrimination of these mental diseases.

  18. Intraoperative Raman spectroscopy of soft tissue sarcomas.

    PubMed

    Nguyen, John Q; Gowani, Zain S; O'Connor, Maggie; Pence, Isaac J; Nguyen, The-Quyen; Holt, Ginger E; Schwartz, Herbert S; Halpern, Jennifer L; Mahadevan-Jansen, Anita

    2016-10-01

    Soft tissue sarcomas (STS) are a rare and heterogeneous group of malignant tumors that are often treated through surgical resection. Current intraoperative margin assessment methods are limited and highlight the need for an improved approach with respect to time and specificity. Here we investigate the potential of near-infrared Raman spectroscopy for the intraoperative differentiation of STS from surrounding normal tissue. In vivo Raman measurements at 785 nm excitation were intraoperatively acquired from subjects undergoing STS resection using a probe based spectroscopy system. A multivariate classification algorithm was developed in order to automatically identify spectral features that can be used to differentiate STS from the surrounding normal muscle and fat. The classification algorithm was subsequently tested using leave-one-subject-out cross-validation. With the exclusion of well-differentiated liposarcomas, the algorithm was able to classify STS from the surrounding normal muscle and fat with a sensitivity and specificity of 89.5% and 96.4%, respectively. These results suggest that single point near-infrared Raman spectroscopy could be utilized as a rapid and non-destructive surgical guidance tool for identifying abnormal tissue margins in need of further excision. Lasers Surg. Med. 48:774-781, 2016. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.

  19. Overweight and Obesity Prevalence Among School-Aged Nunavik Inuit Children According to Three Body Mass Index Classification Systems.

    PubMed

    Medehouenou, Thierry Comlan Marc; Ayotte, Pierre; St-Jean, Audray; Meziou, Salma; Roy, Cynthia; Muckle, Gina; Lucas, Michel

    2015-07-01

    Little is known about the suitability of three commonly used body mass index (BMI) classification system for Indigenous children. This study aims to estimate overweight and obesity prevalence among school-aged Nunavik Inuit children according to International Obesity Task Force (IOTF), Centers for Disease Control and Prevention (CDC), and World Health Organization (WHO) BMI classification systems, to measure agreement between those classification systems, and to investigate whether BMI status as defined by these classification systems is associated with levels of metabolic and inflammatory biomarkers. Data were collected on 290 school-aged children (aged 8-14 years; 50.7% girls) from the Nunavik Child Development Study with data collected in 2005-2010. Anthropometric parameters were measured and blood sampled. Participants were classified as normal weight, overweight, and obese according to BMI classification systems. Weighted kappa (κw) statistics assessed agreement between different BMI classification systems, and multivariate analysis of variance ascertained their relationship with metabolic and inflammatory biomarkers. The combined prevalence rate of overweight/obesity was 26.9% (with 6.6% obesity) with IOTF, 24.1% (11.0%) with CDC, and 40.4% (12.8%) with WHO classification systems. Agreement was the highest between IOTF and CDC (κw = .87) classifications, and substantial for IOTF and WHO (κw = .69) and for CDC and WHO (κw = .73). Insulin and high-sensitivity C-reactive protein plasma levels were significantly higher from normal weight to obesity, regardless of classification system. Among obese subjects, higher insulin level was observed with IOTF. Compared with other systems, IOTF classification appears to be more specific to identify overweight and obesity in Inuit children. Copyright © 2015 Society for Adolescent Health and Medicine. Published by Elsevier Inc. All rights reserved.

  20. Clinical significance of erythropoietin receptor expression in oral squamous cell carcinoma

    PubMed Central

    2012-01-01

    Background Hypoxic tumors are refractory to radiation and chemotherapy. High expression of biomarkers related to hypoxia in head and neck cancer is associated with a poorer prognosis. The present study aimed to evaluate the clinicopathological significance of erythropoietin receptor (EPOR) expression in oral squamous cell carcinoma (OSCC). Methods The study included 256 patients who underwent primary surgical resection between October 1996 and August 2005 for treatment of OSCC without previous radiotherapy and/or chemotherapy. Clinicopathological information including gender, age, T classification, N classification, and TNM stage was obtained from clinical records and pathology reports. The mRNA and protein expression levels of EPOR in OSCC specimens were evaluated by Q-RT-PCR, Western blotting and immunohistochemistry assays. Results We found that EPOR were overexpressed in OSCC tissues. The study included 17 women and 239 men with an average age of 50.9 years (range, 26–87 years). The mean follow-up period was 67 months (range, 2–171 months). High EPOR expression was significantly correlated with advanced T classification (p < 0.001), advanced TNM stage (p < 0.001), and positive N classification (p = 0.001). Furthermore, the univariate analysis revealed that patients with high tumor EPOR expression had a lower 5-year overall survival rate (p = 0.0011) and 5-year disease-specific survival rate (p = 0.0017) than patients who had low tumor levels of EPOR. However, the multivariate analysis using Cox’s regression model revealed that only the T and N classifications were independent prognostic factors for the 5-year overall survival and 5-year disease-specific survival rates. Conclusions High EPOR expression in OSCC is associated with an aggressive tumor behavior and poorer prognosis in the univariate analysis among patients with OSCC. Thus, EPOR expression may serve as a treatment target for OSCC in the future. PMID:22639817

  1. Contemporary survival of patients with pulmonary arterial hypertension and congenital systemic to pulmonary shunts

    PubMed Central

    Chungsomprasong, Paweena; Bositthipichet, Densiri; Ketsara, Salisa; Titaram, Yuttapon; Chanthong, Prakul; Kanjanauthai, Supaluck

    2018-01-01

    Objective To compare survival of patients with newly diagnosed pulmonary arterial hypertension associated with congenital heart disease (PAH-CHD) according to various clinical classifications with classifications of anatomical-pathophysiological systemic to pulmonary shunts in a single-center cohort. Methods All prevalent cases of PAH-CHD with hemodynamic confirmation by cardiac catheterization in 1995–2015 were retrospectively reviewed. Patients who were younger than three months of age, or with single ventricle following surgery were excluded. Baseline characteristics and clinical outcomes were retrieved from the database. The survival analysis was performed at the end of 2016. Prognostic factors were identified using multivariate analysis. Results A total of 366 consecutive patients (24.5 ± 17.6 years of age, 40% male) with PAH-CHD were analyzed. Most had simple shunts (85 pre-tricuspid, 105 post-tricuspid, 102 combined shunts). Patients with pre-tricuspid shunts were significantly older at diagnosis in comparison to post-tricuspid, combined, and complex shunts. Clinical classifications identified patients as having Eisenmenger syndrome (ES, 26.8%), prevalent left to right shunt (66.7%), PAH with small defect (3%), or PAH following defect correction (3.5%). At follow-up (median = 5.9 years; 0.1–20.7 years), no statistically significant differences in survival rate were seen among the anatomical-pathophysiological shunts (p = 0.1). Conversely, the clinical classifications revealed that patients with PAH-small defect had inferior survival compared to patients with ES, PAH post-corrective surgery, or PAH with prevalent left to right shunt (p = 0.01). Significant mortality risks were functional class III, age < 10 years, PAH-small defect, elevated right atrial pressure > 15 mmHg, and baseline PVR > 8 WU•m.2 Conclusion Patients with PAH-CHD had a modest long-term survival. Different anatomical-pathophysiological shunts affect the natural presentation, while clinical classifications indicate treatment strategies and survival. Contemporary therapy improves survival in deliberately selected patients. PMID:29664959

  2. Hyperspectral image reconstruction using RGB color for foodborne pathogen detection on agar plates

    NASA Astrophysics Data System (ADS)

    Yoon, Seung-Chul; Shin, Tae-Sung; Park, Bosoon; Lawrence, Kurt C.; Heitschmidt, Gerald W.

    2014-03-01

    This paper reports the latest development of a color vision technique for detecting colonies of foodborne pathogens grown on agar plates with a hyperspectral image classification model that was developed using full hyperspectral data. The hyperspectral classification model depended on reflectance spectra measured in the visible and near-infrared spectral range from 400 and 1,000 nm (473 narrow spectral bands). Multivariate regression methods were used to estimate and predict hyperspectral data from RGB color values. The six representative non-O157 Shiga-toxin producing Eschetichia coli (STEC) serogroups (O26, O45, O103, O111, O121, and O145) were grown on Rainbow agar plates. A line-scan pushbroom hyperspectral image sensor was used to scan 36 agar plates grown with pure STEC colonies at each plate. The 36 hyperspectral images of the agar plates were divided in half to create training and test sets. The mean Rsquared value for hyperspectral image estimation was about 0.98 in the spectral range between 400 and 700 nm for linear, quadratic and cubic polynomial regression models and the detection accuracy of the hyperspectral image classification model with the principal component analysis and k-nearest neighbors for the test set was up to 92% (99% with the original hyperspectral images). Thus, the results of the study suggested that color-based detection may be viable as a multispectral imaging solution without much loss of prediction accuracy compared to hyperspectral imaging.

  3. 7 CFR 28.179 - Methods of cotton classification and comparison.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... 7 Agriculture 2 2013-01-01 2013-01-01 false Methods of cotton classification and comparison. 28... STANDARD CONTAINER REGULATIONS COTTON CLASSING, TESTING, AND STANDARDS Classification for Foreign Growth Cotton § 28.179 Methods of cotton classification and comparison. The classification of samples from...

  4. 7 CFR 28.179 - Methods of cotton classification and comparison.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 7 Agriculture 2 2010-01-01 2010-01-01 false Methods of cotton classification and comparison. 28... STANDARD CONTAINER REGULATIONS COTTON CLASSING, TESTING, AND STANDARDS Classification for Foreign Growth Cotton § 28.179 Methods of cotton classification and comparison. The classification of samples from...

  5. 7 CFR 28.179 - Methods of cotton classification and comparison.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 7 Agriculture 2 2012-01-01 2012-01-01 false Methods of cotton classification and comparison. 28... STANDARD CONTAINER REGULATIONS COTTON CLASSING, TESTING, AND STANDARDS Classification for Foreign Growth Cotton § 28.179 Methods of cotton classification and comparison. The classification of samples from...

  6. 7 CFR 28.179 - Methods of cotton classification and comparison.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 7 Agriculture 2 2011-01-01 2011-01-01 false Methods of cotton classification and comparison. 28... STANDARD CONTAINER REGULATIONS COTTON CLASSING, TESTING, AND STANDARDS Classification for Foreign Growth Cotton § 28.179 Methods of cotton classification and comparison. The classification of samples from...

  7. 7 CFR 28.179 - Methods of cotton classification and comparison.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... 7 Agriculture 2 2014-01-01 2014-01-01 false Methods of cotton classification and comparison. 28... STANDARD CONTAINER REGULATIONS COTTON CLASSING, TESTING, AND STANDARDS Classification for Foreign Growth Cotton § 28.179 Methods of cotton classification and comparison. The classification of samples from...

  8. Unsupervised learning on scientific ocean drilling datasets from the South China Sea

    NASA Astrophysics Data System (ADS)

    Tse, Kevin C.; Chiu, Hon-Chim; Tsang, Man-Yin; Li, Yiliang; Lam, Edmund Y.

    2018-06-01

    Unsupervised learning methods were applied to explore data patterns in multivariate geophysical datasets collected from ocean floor sediment core samples coming from scientific ocean drilling in the South China Sea. Compared to studies on similar datasets, but using supervised learning methods which are designed to make predictions based on sample training data, unsupervised learning methods require no a priori information and focus only on the input data. In this study, popular unsupervised learning methods including K-means, self-organizing maps, hierarchical clustering and random forest were coupled with different distance metrics to form exploratory data clusters. The resulting data clusters were externally validated with lithologic units and geologic time scales assigned to the datasets by conventional methods. Compact and connected data clusters displayed varying degrees of correspondence with existing classification by lithologic units and geologic time scales. K-means and self-organizing maps were observed to perform better with lithologic units while random forest corresponded best with geologic time scales. This study sets a pioneering example of how unsupervised machine learning methods can be used as an automatic processing tool for the increasingly high volume of scientific ocean drilling data.

  9. Electroencephalogram-based decoding cognitive states using convolutional neural network and likelihood ratio based score fusion

    PubMed Central

    2017-01-01

    Electroencephalogram (EEG)-based decoding human brain activity is challenging, owing to the low spatial resolution of EEG. However, EEG is an important technique, especially for brain–computer interface applications. In this study, a novel algorithm is proposed to decode brain activity associated with different types of images. In this hybrid algorithm, convolutional neural network is modified for the extraction of features, a t-test is used for the selection of significant features and likelihood ratio-based score fusion is used for the prediction of brain activity. The proposed algorithm takes input data from multichannel EEG time-series, which is also known as multivariate pattern analysis. Comprehensive analysis was conducted using data from 30 participants. The results from the proposed method are compared with current recognized feature extraction and classification/prediction techniques. The wavelet transform-support vector machine method is the most popular currently used feature extraction and prediction method. This method showed an accuracy of 65.7%. However, the proposed method predicts the novel data with improved accuracy of 79.9%. In conclusion, the proposed algorithm outperformed the current feature extraction and prediction method. PMID:28558002

  10. Discriminative analysis of early Alzheimer's disease based on two intrinsically anti-correlated networks with resting-state fMRI.

    PubMed

    Wang, Kun; Jiang, Tianzi; Liang, Meng; Wang, Liang; Tian, Lixia; Zhang, Xinqing; Li, Kuncheng; Liu, Zhening

    2006-01-01

    In this work, we proposed a discriminative model of Alzheimer's disease (AD) on the basis of multivariate pattern classification and functional magnetic resonance imaging (fMRI). This model used the correlation/anti-correlation coefficients of two intrinsically anti-correlated networks in resting brains, which have been suggested by two recent studies, as the feature of classification. Pseudo-Fisher Linear Discriminative Analysis (pFLDA) was then performed on the feature space and a linear classifier was generated. Using leave-one-out (LOO) cross validation, our results showed a correct classification rate of 83%. We also compared the proposed model with another one based on the whole brain functional connectivity. Our proposed model outperformed the other one significantly, and this implied that the two intrinsically anti-correlated networks may be a more susceptible part of the whole brain network in the early stage of AD.

  11. Multivariate analysis of full-term neonatal polysomnographic data.

    PubMed

    Gerla, V; Paul, K; Lhotska, L; Krajca, V

    2009-01-01

    Polysomnography (PSG) is one of the most important noninvasive methods for studying maturation of the child brain. Sleep in infants is significantly different from sleep in adults. This paper addresses the problem of computer analysis of neonatal polygraphic signals. We applied methods designed for differentiating three important neonatal behavioral states: quiet sleep, active sleep, and wakefulness. The proportion of these states is a significant indicator of the maturity of the newborn brain in clinical practice. In this study, we used data provided by the Institute for Care of Mother and Child, Prague (12 newborn infants of similar postconceptional age). The data were scored by an experienced physician to four states (wake, quiet sleep, active sleep, movement artifact). For accurate classification, it was necessary to determine the most informative features. We used a method based on power spectral density (PSD) applied to each EEG channel. We also used features derived from electrooculogram (EOG), electromyogram (EMG), ECG, and respiration [pneumogram (PNG)] signals. The most informative feature was the measure of regularity of respiration from the PNG signal. We designed an algorithm for interpreting these characteristics. This algorithm was based on Markov models. The results of automatic detection of sleep states were compared to the "sleep profiles" determined visually. We evaluated both the success rate and the true positive rate of the classification, and statistically significant agreement of the two scorings was found. Two variants, for learning and for testing, were applied, namely learning from the data of all 12 newborns and tenfold cross-validation, and learning from the data of 11 newborns and testing on the data from the 12th newborn. We utilized information obtained from several biological signals (EEG, ECG, PNG, EMG, EOG) for our final classification. We reached the final success rate of 82.5%. The true positive rate was 81.8% and the false positive rate was 6.1%. The most important step in the whole process is feature extraction and feature selection. In this process, we used visualization as an additional tool that helped us to decide which features to select. Proper selection of features may significantly influence the success rate of the classification. We made a visual comparison of the computed features with the manual scoring provided by the expert. A hidden Markov model was used for classification. The advantage of this model is that it determines the future behavior of the process by its present state. In this way, it preserves information about temporal development.

  12. The Statistical Consulting Center for Astronomy (SCCA)

    NASA Technical Reports Server (NTRS)

    Akritas, Michael

    2001-01-01

    The process by which raw astronomical data acquisition is transformed into scientifically meaningful results and interpretation typically involves many statistical steps. Traditional astronomy limits itself to a narrow range of old and familiar statistical methods: means and standard deviations; least-squares methods like chi(sup 2) minimization; and simple nonparametric procedures such as the Kolmogorov-Smirnov tests. These tools are often inadequate for the complex problems and datasets under investigations, and recent years have witnessed an increased usage of maximum-likelihood, survival analysis, multivariate analysis, wavelet and advanced time-series methods. The Statistical Consulting Center for Astronomy (SCCA) assisted astronomers with the use of sophisticated tools, and to match these tools with specific problems. The SCCA operated with two professors of statistics and a professor of astronomy working together. Questions were received by e-mail, and were discussed in detail with the questioner. Summaries of those questions and answers leading to new approaches were posted on the Web (www.state.psu.edu/ mga/SCCA). In addition to serving individual astronomers, the SCCA established a Web site for general use that provides hypertext links to selected on-line public-domain statistical software and services. The StatCodes site (www.astro.psu.edu/statcodes) provides over 200 links in the areas of: Bayesian statistics; censored and truncated data; correlation and regression, density estimation and smoothing, general statistics packages and information; image analysis; interactive Web tools; multivariate analysis; multivariate clustering and classification; nonparametric analysis; software written by astronomers; spatial statistics; statistical distributions; time series analysis; and visualization tools. StatCodes has received a remarkable high and constant hit rate of 250 hits/week (over 10,000/year) since its inception in mid-1997. It is of interest to scientists both within and outside of astronomy. The most popular sections are multivariate techniques, image analysis, and time series analysis. Hundreds of copies of the ASURV, SLOPES and CENS-TAU codes developed by SCCA scientists were also downloaded from the StatCodes site. In addition to formal SCCA duties, SCCA scientists continued a variety of related activities in astrostatistics, including refereeing of statistically oriented papers submitted to the Astrophysical Journal, talks in meetings including Feigelson's talk to science journalists entitled "The reemergence of astrostatistics" at the American Association for the Advancement of Science meeting, and published papers of astrostatistical content.

  13. Studies of mu-neutrino going to e-neutrino oscillation appearance in the MINOS experiment

    NASA Astrophysics Data System (ADS)

    Sousa, Alexandre Bruno Pereira E.

    The MINOS experiment uses a long baseline neutrino beam, measured 1 km downstream from its origin in the Near Detector at Fermilab, and 734 km later in the large underground Far Detector in the Soudan mine. By comparing these two measurements, MINOS can probe the atmospheric domain of the neutrino oscillation phenomenology with unprecedented precision. Besides the ability to perform a world leading determination of the Dm223 and theta23 parameters, via numu flux disappearance, MINOS has the potential to make a leading measurement of nu mu → nue oscillations in the atmospheric sector by looking for nue appearance at the Far Detector. The observation of nue appearance, tantamount to establishing a non-zero value of the theta13 mixing angle, opens the way to studies of CP violation in the leptonic sector, the neutrino spectral mass pattern ordering and neutrino oscillations in matter, the driving motivations of the next generation of neutrino experiments. In this thesis, we study the MINOS potential for measuring theta13 in the context of the MINOS Mock Data Challenge using a multivariate discriminant analysis method. We show the method's validity in the application to nue event classification and background identification, as well as in its ability to identify a nue signal in a Mock Data sample generated with undisclosed parameters. An independent shower reconstruction method based on three-dimensional hit matching and clustering was developed, providing several useful discriminator variables used in the multivariate analysis method. We also demonstrate that within 2 years of running, MINOS has the potential to improve the current best limit on theta 13, from the CHOOZ experiment, by a factor of 2.

  14. SELECTION AND TRAINING, A SURVEY OF IOWA MANUFACTURING FIRMS. MONOGRAPH SERIES NO. 4.

    ERIC Educational Resources Information Center

    SHERIFF, DON R.; AND OTHERS

    INFORMATION ON EMPLOYEE SELECTION AND TRAINING ACTIVITIES WAS SECURED FROM QUESTIONNAIRES RETURNED BY 215 OF 283 FIRMS EMPLOYING AT LEAST 100 PERSONS. DATA FROM 207 SEPARATE ITEMS FOR EACH FIRM WERE KEY PUNCHED AND TABULATED INTO MULTIVARIATE CROSS-CLASSIFICATIONS. OVER 60 PERCENT OF THE FIRMS WERE IN CITIES HAVING OVER 25,000 POPULATION, 40…

  15. Distributed effects of methylphenidate on the network structure of the resting brain: a connectomic pattern classification analysis.

    PubMed

    Sripada, Chandra Sekhar; Kessler, Daniel; Welsh, Robert; Angstadt, Michael; Liberzon, Israel; Phan, K Luan; Scott, Clayton

    2013-11-01

    Methylphenidate is a psychostimulant medication that produces improvements in functions associated with multiple neurocognitive systems. To investigate the potentially distributed effects of methylphenidate on the brain's intrinsic network architecture, we coupled resting state imaging with multivariate pattern classification. In a within-subject, double-blind, placebo-controlled, randomized, counterbalanced, cross-over design, 32 healthy human volunteers received either methylphenidate or placebo prior to two fMRI resting state scans separated by approximately one week. Resting state connectomes were generated by placing regions of interest at regular intervals throughout the brain, and these connectomes were submitted for support vector machine analysis. We found that methylphenidate produces a distributed, reliably detected, multivariate neural signature. Methylphenidate effects were evident across multiple resting state networks, especially visual, somatomotor, and default networks. Methylphenidate reduced coupling within visual and somatomotor networks. In addition, default network exhibited decoupling with several task positive networks, consistent with methylphenidate modulation of the competitive relationship between these networks. These results suggest that connectivity changes within and between large-scale networks are potentially involved in the mechanisms by which methylphenidate improves attention functioning. Copyright © 2013 Elsevier Inc. All rights reserved.

  16. Tools based on multivariate statistical analysis for classification of soil and groundwater in Apulian agricultural sites.

    PubMed

    Ielpo, Pierina; Leardi, Riccardo; Pappagallo, Giuseppe; Uricchio, Vito Felice

    2017-06-01

    In this paper, the results obtained from multivariate statistical techniques such as PCA (Principal component analysis) and LDA (Linear discriminant analysis) applied to a wide soil data set are presented. The results have been compared with those obtained on a groundwater data set, whose samples were collected together with soil ones, within the project "Improvement of the Regional Agro-meteorological Monitoring Network (2004-2007)". LDA, applied to soil data, has allowed to distinguish the geographical origin of the sample from either one of the two macroaeras: Bari and Foggia provinces vs Brindisi, Lecce e Taranto provinces, with a percentage of correct prediction in cross validation of 87%. In the case of the groundwater data set, the best classification was obtained when the samples were grouped into three macroareas: Foggia province, Bari province and Brindisi, Lecce and Taranto provinces, by reaching a percentage of correct predictions in cross validation of 84%. The obtained information can be very useful in supporting soil and water resource management, such as the reduction of water consumption and the reduction of energy and chemical (nutrients and pesticides) inputs in agriculture.

  17. Gap Shape Classification using Landscape Indices and Multivariate Statistics

    PubMed Central

    Wu, Chih-Da; Cheng, Chi-Chuan; Chang, Che-Chang; Lin, Chinsu; Chang, Kun-Cheng; Chuang, Yung-Chung

    2016-01-01

    This study proposed a novel methodology to classify the shape of gaps using landscape indices and multivariate statistics. Patch-level indices were used to collect the qualified shape and spatial configuration characteristics for canopy gaps in the Lienhuachih Experimental Forest in Taiwan in 1998 and 2002. Non-hierarchical cluster analysis was used to assess the optimal number of gap clusters and canonical discriminant analysis was used to generate the discriminant functions for canopy gap classification. The gaps for the two periods were optimally classified into three categories. In general, gap type 1 had a more complex shape, gap type 2 was more elongated and gap type 3 had the largest gaps that were more regular in shape. The results were evaluated using Wilks’ lambda as satisfactory (p < 0.001). The agreement rate of confusion matrices exceeded 96%. Differences in gap characteristics between the classified gap types that were determined using a one-way ANOVA showed a statistical significance in all patch indices (p = 0.00), except for the Euclidean nearest neighbor distance (ENN) in 2002. Taken together, these results demonstrated the feasibility and applicability of the proposed methodology to classify the shape of a gap. PMID:27901127

  18. Gap Shape Classification using Landscape Indices and Multivariate Statistics.

    PubMed

    Wu, Chih-Da; Cheng, Chi-Chuan; Chang, Che-Chang; Lin, Chinsu; Chang, Kun-Cheng; Chuang, Yung-Chung

    2016-11-30

    This study proposed a novel methodology to classify the shape of gaps using landscape indices and multivariate statistics. Patch-level indices were used to collect the qualified shape and spatial configuration characteristics for canopy gaps in the Lienhuachih Experimental Forest in Taiwan in 1998 and 2002. Non-hierarchical cluster analysis was used to assess the optimal number of gap clusters and canonical discriminant analysis was used to generate the discriminant functions for canopy gap classification. The gaps for the two periods were optimally classified into three categories. In general, gap type 1 had a more complex shape, gap type 2 was more elongated and gap type 3 had the largest gaps that were more regular in shape. The results were evaluated using Wilks' lambda as satisfactory (p < 0.001). The agreement rate of confusion matrices exceeded 96%. Differences in gap characteristics between the classified gap types that were determined using a one-way ANOVA showed a statistical significance in all patch indices (p = 0.00), except for the Euclidean nearest neighbor distance (ENN) in 2002. Taken together, these results demonstrated the feasibility and applicability of the proposed methodology to classify the shape of a gap.

  19. Land Cover Change Detection using Neural Network and Grid Cells Techniques

    NASA Astrophysics Data System (ADS)

    Bagan, H.; Li, Z.; Tangud, T.; Yamagata, Y.

    2017-12-01

    In recent years, many advanced neural network methods have been applied in land cover classification, each of which has both strengths and limitations. In which, the self-organizing map (SOM) neural network method have been used to solve remote sensing data classification problems and have shown potential for efficient classification of remote sensing data. In SOM, both the distribution and the topology of features of the input layer are identified by using an unsupervised, competitive, neighborhood learning method. The high-dimensional data are then projected onto a low-dimensional map (competitive layer), usually as a two-dimensional map. The neurons (nodes) in the competitive layer are arranged by topological order in the input space. Spatio-temporal analyses of land cover change based on grid cells have demonstrated that gridded data are useful for obtaining spatial and temporal information about areas that are smaller than municipal scale and are uniform in size. Analysis based on grid cells has many advantages: grid cells all have the same size allowing for easy comparison; grids integrate easily with other scientific data; grids are stable over time and thus facilitate the modelling and analysis of very large multivariate spatial data sets. This study chose time-series MODIS and Landsat images as data sources, applied SOM neural network method to identify the land utilization in Inner Mongolia Autonomous Region of China. Then the results were integrated into grid cell to get the dynamic change maps. Land cover change using MODIS data in Inner Mongolia showed that urban area increased more than fivefold in recent 15 years, along with the growth of mining area. In terms of geographical distribution, the most obvious place of urban expansion is Ordos in southwest Inner Mongolia. The results using Landsat images from 1986 to 2014 in northeastern part of the Inner Mongolia show degradation in grassland from 1986 to 2014. Grid-cell-based spatial correlation analysis also confirmed a strong negative correlation between grassland and barren land, indicating that grassland degradation in this region is due to the urbanization and coal mining activities over the past three decades.

  20. Diagnostic classification of macular ganglion cell and retinal nerve fiber layer analysis: differentiation of false-positives from glaucoma.

    PubMed

    Kim, Ko Eun; Jeoung, Jin Wook; Park, Ki Ho; Kim, Dong Myung; Kim, Seok Hwan

    2015-03-01

    To investigate the rate and associated factors of false-positive diagnostic classification of ganglion cell analysis (GCA) and retinal nerve fiber layer (RNFL) maps, and characteristic false-positive patterns on optical coherence tomography (OCT) deviation maps. Prospective, cross-sectional study. A total of 104 healthy eyes of 104 normal participants. All participants underwent peripapillary and macular spectral-domain (Cirrus-HD, Carl Zeiss Meditec Inc, Dublin, CA) OCT scans. False-positive diagnostic classification was defined as yellow or red color-coded areas for GCA and RNFL maps. Univariate and multivariate logistic regression analyses were used to determine associated factors. Eyes with abnormal OCT deviation maps were categorized on the basis of the shape and location of abnormal color-coded area. Differences in clinical characteristics among the subgroups were compared. (1) The rate and associated factors of false-positive OCT maps; (2) patterns of false-positive, color-coded areas on the GCA deviation map and associated clinical characteristics. Of the 104 healthy eyes, 42 (40.4%) and 32 (30.8%) showed abnormal diagnostic classifications on any of the GCA and RNFL maps, respectively. Multivariate analysis revealed that false-positive GCA diagnostic classification was associated with longer axial length and larger fovea-disc angle, whereas longer axial length and smaller disc area were associated with abnormal RNFL maps. Eyes with abnormal GCA deviation map were categorized as group A (donut-shaped round area around the inner annulus), group B (island-like isolated area), and group C (diffuse, circular area with an irregular inner margin in either). The axial length showed a significant increasing trend from group A to C (P=0.001), and likewise, the refractive error was more myopic in group C than in groups A (P=0.015) and B (P=0.014). Group C had thinner average ganglion cell-inner plexiform layer thickness compared with other groups (group A=B>C, P=0.004). Abnormal OCT diagnostic classification should be interpreted with caution, especially in eyes with long axial lengths, large fovea-disc angles, and small optic discs. Our findings suggest that the characteristic patterns of OCT deviation map can provide useful clues to distinguish glaucomatous changes from false-positive findings. Copyright © 2015 American Academy of Ophthalmology. Published by Elsevier Inc. All rights reserved.

  1. Predictors of clinical-pathologic stage discrepancy in oral cavity squamous cell carcinoma: A National Cancer Database study.

    PubMed

    Kılıç, Sarah S; Kılıç, Suat; Crippen, Meghan M; Varughese, Denny; Eloy, Jean Anderson; Baredes, Soly; Mahmoud, Omar M; Park, Richard Chan Woo

    2018-04-01

    Few studies have examined the frequency and survival implications of clinicopathologic stage discrepancy in oral cavity squamous cell carcinoma (SCC). Oral cavity SCC cases with full pathologic staging information were identified in the National Cancer Database (NCDB). Clinical and pathologic stages were compared. Multivariate logistic regressions were performed to identify factors associated with stage discrepancy. There were 9110 cases identified, of which 67.3% of the cases were stage concordant, 19.9% were upstaged, and 12.8% were downstaged. The N classification discordance (28.5%) was more common than T classification discordance (27.6%). In cases of T classification discordance, downstaging is more common than upstaging (15.4% vs 12.1% of cases), but in cases of N classification discordance, the reverse is true; upstaging is much more common than downstaging (20.1 vs 8.4% of cases). Clinicopathologic stage discrepancy in oral cavity SCC is a common phenomenon that is associated with a number of clinical factors and has survival implications. © 2018 Wiley Periodicals, Inc.

  2. Partial Least Squares with Structured Output for Modelling the Metabolomics Data Obtained from Complex Experimental Designs: A Study into the Y-Block Coding.

    PubMed

    Xu, Yun; Muhamadali, Howbeer; Sayqal, Ali; Dixon, Neil; Goodacre, Royston

    2016-10-28

    Partial least squares (PLS) is one of the most commonly used supervised modelling approaches for analysing multivariate metabolomics data. PLS is typically employed as either a regression model (PLS-R) or a classification model (PLS-DA). However, in metabolomics studies it is common to investigate multiple, potentially interacting, factors simultaneously following a specific experimental design. Such data often cannot be considered as a "pure" regression or a classification problem. Nevertheless, these data have often still been treated as a regression or classification problem and this could lead to ambiguous results. In this study, we investigated the feasibility of designing a hybrid target matrix Y that better reflects the experimental design than simple regression or binary class membership coding commonly used in PLS modelling. The new design of Y coding was based on the same principle used by structural modelling in machine learning techniques. Two real metabolomics datasets were used as examples to illustrate how the new Y coding can improve the interpretability of the PLS model compared to classic regression/classification coding.

  3. Research on Classification of Chinese Text Data Based on SVM

    NASA Astrophysics Data System (ADS)

    Lin, Yuan; Yu, Hongzhi; Wan, Fucheng; Xu, Tao

    2017-09-01

    Data Mining has important application value in today’s industry and academia. Text classification is a very important technology in data mining. At present, there are many mature algorithms for text classification. KNN, NB, AB, SVM, decision tree and other classification methods all show good classification performance. Support Vector Machine’ (SVM) classification method is a good classifier in machine learning research. This paper will study the classification effect based on the SVM method in the Chinese text data, and use the support vector machine method in the chinese text to achieve the classify chinese text, and to able to combination of academia and practical application.

  4. Monitoring of Water Spectral Pattern Reveals Differences in Probiotics Growth When Used for Rapid Bacteria Selection.

    PubMed

    Slavchev, Aleksandar; Kovacs, Zoltan; Koshiba, Haruki; Nagai, Airi; Bázár, György; Krastanov, Albert; Kubota, Yousuke; Tsenkova, Roumiana

    2015-01-01

    Development of efficient screening method coupled with cell functionality evaluation is highly needed in contemporary microbiology. The presented novel concept and fast non-destructive method brings in to play the water spectral pattern of the solution as a molecular fingerprint of the cell culture system. To elucidate the concept, NIR spectroscopy with Aquaphotomics were applied to monitor the growth of sixteen Lactobacillus bulgaricus one Lactobacillus pentosus and one Lactobacillus gasseri bacteria strains. Their growth rate, maximal optical density, low pH and bile tolerances were measured and further used as a reference data for analysis of the simultaneously acquired spectral data. The acquired spectral data in the region of 1100-1850nm was subjected to various multivariate data analyses - PCA, OPLS-DA, PLSR. The results showed high accuracy of bacteria strains classification according to their probiotic strength. Most informative spectral fingerprints covered the first overtone of water, emphasizing the relation of water molecular system to cell functionality.

  5. Machine learning methods enable predictive modeling of antibody feature:function relationships in RV144 vaccinees.

    PubMed

    Choi, Ickwon; Chung, Amy W; Suscovich, Todd J; Rerks-Ngarm, Supachai; Pitisuttithum, Punnee; Nitayaphan, Sorachai; Kaewkungwal, Jaranit; O'Connell, Robert J; Francis, Donald; Robb, Merlin L; Michael, Nelson L; Kim, Jerome H; Alter, Galit; Ackerman, Margaret E; Bailey-Kellogg, Chris

    2015-04-01

    The adaptive immune response to vaccination or infection can lead to the production of specific antibodies to neutralize the pathogen or recruit innate immune effector cells for help. The non-neutralizing role of antibodies in stimulating effector cell responses may have been a key mechanism of the protection observed in the RV144 HIV vaccine trial. In an extensive investigation of a rich set of data collected from RV144 vaccine recipients, we here employ machine learning methods to identify and model associations between antibody features (IgG subclass and antigen specificity) and effector function activities (antibody dependent cellular phagocytosis, cellular cytotoxicity, and cytokine release). We demonstrate via cross-validation that classification and regression approaches can effectively use the antibody features to robustly predict qualitative and quantitative functional outcomes. This integration of antibody feature and function data within a machine learning framework provides a new, objective approach to discovering and assessing multivariate immune correlates.

  6. Preliminary identification of unicellular algal genus by using combined confocal resonance Raman spectroscopy with PCA and DPLS analysis

    NASA Astrophysics Data System (ADS)

    He, Shixuan; Xie, Wanyi; Zhang, Ping; Fang, Shaoxi; Li, Zhe; Tang, Peng; Gao, Xia; Guo, Jinsong; Tlili, Chaker; Wang, Deqiang

    2018-02-01

    The analysis of algae and dominant alga plays important roles in ecological and environmental fields since it can be used to forecast water bloom and control its potential deleterious effects. Herein, we combine in vivo confocal resonance Raman spectroscopy with multivariate analysis methods to preliminary identify the three algal genera in water blooms at unicellular scale. Statistical analysis of characteristic Raman peaks demonstrates that certain shifts and different normalized intensities, resulting from composition of different carotenoids, exist in Raman spectra of three algal cells. Principal component analysis (PCA) scores and corresponding loading weights show some differences from Raman spectral characteristics which are caused by vibrations of carotenoids in unicellular algae. Then, discriminant partial least squares (DPLS) classification method is used to verify the effectiveness of algal identification with confocal resonance Raman spectroscopy. Our results show that confocal resonance Raman spectroscopy combined with PCA and DPLS could handle the preliminary identification of dominant alga for forecasting and controlling of water blooms.

  7. Machine Learning Methods Enable Predictive Modeling of Antibody Feature:Function Relationships in RV144 Vaccinees

    PubMed Central

    Choi, Ickwon; Chung, Amy W.; Suscovich, Todd J.; Rerks-Ngarm, Supachai; Pitisuttithum, Punnee; Nitayaphan, Sorachai; Kaewkungwal, Jaranit; O'Connell, Robert J.; Francis, Donald; Robb, Merlin L.; Michael, Nelson L.; Kim, Jerome H.; Alter, Galit; Ackerman, Margaret E.; Bailey-Kellogg, Chris

    2015-01-01

    The adaptive immune response to vaccination or infection can lead to the production of specific antibodies to neutralize the pathogen or recruit innate immune effector cells for help. The non-neutralizing role of antibodies in stimulating effector cell responses may have been a key mechanism of the protection observed in the RV144 HIV vaccine trial. In an extensive investigation of a rich set of data collected from RV144 vaccine recipients, we here employ machine learning methods to identify and model associations between antibody features (IgG subclass and antigen specificity) and effector function activities (antibody dependent cellular phagocytosis, cellular cytotoxicity, and cytokine release). We demonstrate via cross-validation that classification and regression approaches can effectively use the antibody features to robustly predict qualitative and quantitative functional outcomes. This integration of antibody feature and function data within a machine learning framework provides a new, objective approach to discovering and assessing multivariate immune correlates. PMID:25874406

  8. Supervised DNA Barcodes species classification: analysis, comparisons and results

    PubMed Central

    2014-01-01

    Background Specific fragments, coming from short portions of DNA (e.g., mitochondrial, nuclear, and plastid sequences), have been defined as DNA Barcode and can be used as markers for organisms of the main life kingdoms. Species classification with DNA Barcode sequences has been proven effective on different organisms. Indeed, specific gene regions have been identified as Barcode: COI in animals, rbcL and matK in plants, and ITS in fungi. The classification problem assigns an unknown specimen to a known species by analyzing its Barcode. This task has to be supported with reliable methods and algorithms. Methods In this work the efficacy of supervised machine learning methods to classify species with DNA Barcode sequences is shown. The Weka software suite, which includes a collection of supervised classification methods, is adopted to address the task of DNA Barcode analysis. Classifier families are tested on synthetic and empirical datasets belonging to the animal, fungus, and plant kingdoms. In particular, the function-based method Support Vector Machines (SVM), the rule-based RIPPER, the decision tree C4.5, and the Naïve Bayes method are considered. Additionally, the classification results are compared with respect to ad-hoc and well-established DNA Barcode classification methods. Results A software that converts the DNA Barcode FASTA sequences to the Weka format is released, to adapt different input formats and to allow the execution of the classification procedure. The analysis of results on synthetic and real datasets shows that SVM and Naïve Bayes outperform on average the other considered classifiers, although they do not provide a human interpretable classification model. Rule-based methods have slightly inferior classification performances, but deliver the species specific positions and nucleotide assignments. On synthetic data the supervised machine learning methods obtain superior classification performances with respect to the traditional DNA Barcode classification methods. On empirical data their classification performances are at a comparable level to the other methods. Conclusions The classification analysis shows that supervised machine learning methods are promising candidates for handling with success the DNA Barcoding species classification problem, obtaining excellent performances. To conclude, a powerful tool to perform species identification is now available to the DNA Barcoding community. PMID:24721333

  9. Estimation of typhoon rainfall in GaoPing River: A Multivariate Maximum Entropy Method

    NASA Astrophysics Data System (ADS)

    Pei-Jui, Wu; Hwa-Lung, Yu

    2016-04-01

    The heavy rainfall from typhoons is the main factor of the natural disaster in Taiwan, which causes the significant loss of human lives and properties. Statistically average 3.5 typhoons invade Taiwan every year, and the serious typhoon, Morakot in 2009, impacted Taiwan in recorded history. Because the duration, path and intensity of typhoon, also affect the temporal and spatial rainfall type in specific region , finding the characteristics of the typhoon rainfall type is advantageous when we try to estimate the quantity of rainfall. This study developed a rainfall prediction model and can be divided three parts. First, using the EEOF(extended empirical orthogonal function) to classify the typhoon events, and decompose the standard rainfall type of all stations of each typhoon event into the EOF and PC(principal component). So we can classify the typhoon events which vary similarly in temporally and spatially as the similar typhoon types. Next, according to the classification above, we construct the PDF(probability density function) in different space and time by means of using the multivariate maximum entropy from the first to forth moment statistically. Therefore, we can get the probability of each stations of each time. Final we use the BME(Bayesian Maximum Entropy method) to construct the typhoon rainfall prediction model , and to estimate the rainfall for the case of GaoPing river which located in south of Taiwan.This study could be useful for typhoon rainfall predictions in future and suitable to government for the typhoon disaster prevention .

  10. Joint Concept Correlation and Feature-Concept Relevance Learning for Multilabel Classification.

    PubMed

    Zhao, Xiaowei; Ma, Zhigang; Li, Zhi; Li, Zhihui

    2018-02-01

    In recent years, multilabel classification has attracted significant attention in multimedia annotation. However, most of the multilabel classification methods focus only on the inherent correlations existing among multiple labels and concepts and ignore the relevance between features and the target concepts. To obtain more robust multilabel classification results, we propose a new multilabel classification method aiming to capture the correlations among multiple concepts by leveraging hypergraph that is proved to be beneficial for relational learning. Moreover, we consider mining feature-concept relevance, which is often overlooked by many multilabel learning algorithms. To better show the feature-concept relevance, we impose a sparsity constraint on the proposed method. We compare the proposed method with several other multilabel classification methods and evaluate the classification performance by mean average precision on several data sets. The experimental results show that the proposed method outperforms the state-of-the-art methods.

  11. Object-Based Random Forest Classification of Land Cover from Remotely Sensed Imagery for Industrial and Mining Reclamation

    NASA Astrophysics Data System (ADS)

    Chen, Y.; Luo, M.; Xu, L.; Zhou, X.; Ren, J.; Zhou, J.

    2018-04-01

    The RF method based on grid-search parameter optimization could achieve a classification accuracy of 88.16 % in the classification of images with multiple feature variables. This classification accuracy was higher than that of SVM and ANN under the same feature variables. In terms of efficiency, the RF classification method performs better than SVM and ANN, it is more capable of handling multidimensional feature variables. The RF method combined with object-based analysis approach could highlight the classification accuracy further. The multiresolution segmentation approach on the basis of ESP scale parameter optimization was used for obtaining six scales to execute image segmentation, when the segmentation scale was 49, the classification accuracy reached the highest value of 89.58 %. The classification accuracy of object-based RF classification was 1.42 % higher than that of pixel-based classification (88.16 %), and the classification accuracy was further improved. Therefore, the RF classification method combined with object-based analysis approach could achieve relatively high accuracy in the classification and extraction of land use information for industrial and mining reclamation areas. Moreover, the interpretation of remotely sensed imagery using the proposed method could provide technical support and theoretical reference for remotely sensed monitoring land reclamation.

  12. BANYAN. XI. The BANYAN Σ Multivariate Bayesian Algorithm to Identify Members of Young Associations with 150 pc

    NASA Astrophysics Data System (ADS)

    Gagné, Jonathan; Mamajek, Eric E.; Malo, Lison; Riedel, Adric; Rodriguez, David; Lafrenière, David; Faherty, Jacqueline K.; Roy-Loubier, Olivier; Pueyo, Laurent; Robin, Annie C.; Doyon, René

    2018-03-01

    BANYAN Σ is a new Bayesian algorithm to identify members of young stellar associations within 150 pc of the Sun. It includes 27 young associations with ages in the range ∼1–800 Myr, modeled with multivariate Gaussians in six-dimensional (6D) XYZUVW space. It is the first such multi-association classification tool to include the nearest sub-groups of the Sco-Cen OB star-forming region, the IC 2602, IC 2391, Pleiades and Platais 8 clusters, and the ρ Ophiuchi, Corona Australis, and Taurus star formation regions. A model of field stars is built from a mixture of multivariate Gaussians based on the Besançon Galactic model. The algorithm can derive membership probabilities for objects with only sky coordinates and proper motion, but can also include parallax and radial velocity measurements, as well as spectrophotometric distance constraints from sequences in color–magnitude or spectral type–magnitude diagrams. BANYAN Σ benefits from an analytical solution to the Bayesian marginalization integrals over unknown radial velocities and distances that makes it more accurate and significantly faster than its predecessor BANYAN II. A contamination versus hit rate analysis is presented and demonstrates that BANYAN Σ achieves a better classification performance than other moving group tools available in the literature, especially in terms of cross-contamination between young associations. An updated list of bona fide members in the 27 young associations, augmented by the Gaia-DR1 release, as well as all parameters for the 6D multivariate Gaussian models for each association and the Galactic field neighborhood within 300 pc are presented. This new tool will make it possible to analyze large data sets such as the upcoming Gaia-DR2 to identify new young stars. IDL and Python versions of BANYAN Σ are made available with this publication, and a more limited online web tool is available at http://www.exoplanetes.umontreal.ca/banyan/banyansigma.php.

  13. Multivariate logistic regression analysis of postoperative complications and risk model establishment of gastrectomy for gastric cancer: A single-center cohort report.

    PubMed

    Zhou, Jinzhe; Zhou, Yanbing; Cao, Shougen; Li, Shikuan; Wang, Hao; Niu, Zhaojian; Chen, Dong; Wang, Dongsheng; Lv, Liang; Zhang, Jian; Li, Yu; Jiao, Xuelong; Tan, Xiaojie; Zhang, Jianli; Wang, Haibo; Zhang, Bingyuan; Lu, Yun; Sun, Zhenqing

    2016-01-01

    Reporting of surgical complications is common, but few provide information about the severity and estimate risk factors of complications. If have, but lack of specificity. We retrospectively analyzed data on 2795 gastric cancer patients underwent surgical procedure at the Affiliated Hospital of Qingdao University between June 2007 and June 2012, established multivariate logistic regression model to predictive risk factors related to the postoperative complications according to the Clavien-Dindo classification system. Twenty-four out of 86 variables were identified statistically significant in univariate logistic regression analysis, 11 significant variables entered multivariate analysis were employed to produce the risk model. Liver cirrhosis, diabetes mellitus, Child classification, invasion of neighboring organs, combined resection, introperative transfusion, Billroth II anastomosis of reconstruction, malnutrition, surgical volume of surgeons, operating time and age were independent risk factors for postoperative complications after gastrectomy. Based on logistic regression equation, p=Exp∑BiXi / (1+Exp∑BiXi), multivariate logistic regression predictive model that calculated the risk of postoperative morbidity was developed, p = 1/(1 + e((4.810-1.287X1-0.504X2-0.500X3-0.474X4-0.405X5-0.318X6-0.316X7-0.305X8-0.278X9-0.255X10-0.138X11))). The accuracy, sensitivity and specificity of the model to predict the postoperative complications were 86.7%, 76.2% and 88.6%, respectively. This risk model based on Clavien-Dindo grading severity of complications system and logistic regression analysis can predict severe morbidity specific to an individual patient's risk factors, estimate patients' risks and benefits of gastric surgery as an accurate decision-making tool and may serve as a template for the development of risk models for other surgical groups.

  14. Predictive factors in patients with hepatocellular carcinoma receiving sorafenib therapy using time-dependent receiver operating characteristic analysis.

    PubMed

    Nishikawa, Hiroki; Nishijima, Norihiro; Enomoto, Hirayuki; Sakamoto, Azusa; Nasu, Akihiro; Komekado, Hideyuki; Nishimura, Takashi; Kita, Ryuichi; Kimura, Toru; Iijima, Hiroko; Nishiguchi, Shuhei; Osaki, Yukio

    2017-01-01

    To investigate variables before sorafenib therapy on the clinical outcomes in hepatocellular carcinoma (HCC) patients receiving sorafenib and to further assess and compare the predictive performance of continuous parameters using time-dependent receiver operating characteristics (ROC) analysis. A total of 225 HCC patients were analyzed. We retrospectively examined factors related to overall survival (OS) and progression free survival (PFS) using univariate and multivariate analyses. Subsequently, we performed time-dependent ROC analysis of continuous parameters which were significant in the multivariate analysis in terms of OS and PFS. Total sum of area under the ROC in all time points (defined as TAAT score) in each case was calculated. Our cohort included 175 male and 50 female patients (median age, 72 years) and included 158 Child-Pugh A and 67 Child-Pugh B patients. The median OS time was 0.68 years, while the median PFS time was 0.24 years. On multivariate analysis, gender, body mass index (BMI), Child-Pugh classification, extrahepatic metastases, tumor burden, aspartate aminotransferase (AST) and alpha-fetoprotein (AFP) were identified as significant predictors of OS and ECOG-performance status, Child-Pugh classification and extrahepatic metastases were identified as significant predictors of PFS. Among three continuous variables (i.e., BMI, AST and AFP), AFP had the highest TAAT score for the entire cohort. In subgroup analyses, AFP had the highest TAAT score except for Child-Pugh B and female among three continuous variables. In continuous variables, AFP could have higher predictive accuracy for survival in HCC patients undergoing sorafenib therapy.

  15. Noninvasive prostate cancer screening based on serum surface-enhanced Raman spectroscopy and support vector machine

    NASA Astrophysics Data System (ADS)

    Li, Shaoxin; Zhang, Yanjiao; Xu, Junfa; Li, Linfang; Zeng, Qiuyao; Lin, Lin; Guo, Zhouyi; Liu, Zhiming; Xiong, Honglian; Liu, Songhao

    2014-09-01

    This study aims to present a noninvasive prostate cancer screening methods using serum surface-enhanced Raman scattering (SERS) and support vector machine (SVM) techniques through peripheral blood sample. SERS measurements are performed using serum samples from 93 prostate cancer patients and 68 healthy volunteers by silver nanoparticles. Three types of kernel functions including linear, polynomial, and Gaussian radial basis function (RBF) are employed to build SVM diagnostic models for classifying measured SERS spectra. For comparably evaluating the performance of SVM classification models, the standard multivariate statistic analysis method of principal component analysis (PCA) is also applied to classify the same datasets. The study results show that for the RBF kernel SVM diagnostic model, the diagnostic accuracy of 98.1% is acquired, which is superior to the results of 91.3% obtained from PCA methods. The receiver operating characteristic curve of diagnostic models further confirm above research results. This study demonstrates that label-free serum SERS analysis technique combined with SVM diagnostic algorithm has great potential for noninvasive prostate cancer screening.

  16. Discriminant analysis in wildlife research: Theory and applications

    USGS Publications Warehouse

    Williams, B.K.; Capen, D.E.

    1981-01-01

    Discriminant analysis, a method of analyzing grouped multivariate data, is often used in ecological investigations. It has both a predictive and an explanatory function, the former aiming at classification of individuals of unknown group membership. The goal of the latter function is to exhibit group separation by means of linear transforms, and the corresponding method is called canonical analysis. This discussion focuses on the application of canonical analysis in ecology. In order to clarify its meaning, a parametric approach is taken instead of the usual data-based formulation. For certain assumptions the data-based canonical variates are shown to result from maximum likelihood estimation, thus insuring consistency and asymptotic efficiency. The distorting effects of covariance heterogeneity are examined, as are certain difficulties which arise in interpreting the canonical functions. A 'distortion metric' is defined, by means of which distortions resulting from the canonical transformation can be assessed. Several sampling problems which arise in ecological applications are considered. It is concluded that the method may prove valuable for data exploration, but is of limited value as an inferential procedure.

  17. Automated Classification and Analysis of Non-metallic Inclusion Data Sets

    NASA Astrophysics Data System (ADS)

    Abdulsalam, Mohammad; Zhang, Tongsheng; Tan, Jia; Webler, Bryan A.

    2018-05-01

    The aim of this study is to utilize principal component analysis (PCA), clustering methods, and correlation analysis to condense and examine large, multivariate data sets produced from automated analysis of non-metallic inclusions. Non-metallic inclusions play a major role in defining the properties of steel and their examination has been greatly aided by automated analysis in scanning electron microscopes equipped with energy dispersive X-ray spectroscopy. The methods were applied to analyze inclusions on two sets of samples: two laboratory-scale samples and four industrial samples from a near-finished 4140 alloy steel components with varying machinability. The laboratory samples had well-defined inclusions chemistries, composed of MgO-Al2O3-CaO, spinel (MgO-Al2O3), and calcium aluminate inclusions. The industrial samples contained MnS inclusions as well as (Ca,Mn)S + calcium aluminate oxide inclusions. PCA could be used to reduce inclusion chemistry variables to a 2D plot, which revealed inclusion chemistry groupings in the samples. Clustering methods were used to automatically classify inclusion chemistry measurements into groups, i.e., no user-defined rules were required.

  18. A matrix-based method of moments for fitting the multivariate random effects model for meta-analysis and meta-regression

    PubMed Central

    Jackson, Dan; White, Ian R; Riley, Richard D

    2013-01-01

    Multivariate meta-analysis is becoming more commonly used. Methods for fitting the multivariate random effects model include maximum likelihood, restricted maximum likelihood, Bayesian estimation and multivariate generalisations of the standard univariate method of moments. Here, we provide a new multivariate method of moments for estimating the between-study covariance matrix with the properties that (1) it allows for either complete or incomplete outcomes and (2) it allows for covariates through meta-regression. Further, for complete data, it is invariant to linear transformations. Our method reduces to the usual univariate method of moments, proposed by DerSimonian and Laird, in a single dimension. We illustrate our method and compare it with some of the alternatives using a simulation study and a real example. PMID:23401213

  19. Processing RoxAnn sonar data to improve its categorization of lake bed surficial sediments

    USGS Publications Warehouse

    Cholwek, Gary; Bonde, John; Li, Xing; Richards, Carl; Yin, Karen

    2000-01-01

    To categorize spawning and nursery habitat for lake trout in Minnesota's near shore waters of Lake Superior, data was collected with a single beam echo sounder coupled with a RoxAnn bottom classification sensor. Test areas representative of different bottom surficial substrates were sampled. The collected data consisted of acoustic signals which showed both depth and substrate type. The location of the signals was tagged in real-time with a DGPS. All data was imported into a GIS database. To better interpret the output signal from the RoxAnn, several pattern classifiers were developed by multivariate statistical method. From the data a detailed and accurate map of lake bed bathymetry and surficial substrate types was produced. This map will be of great value to fishery and other natural resource managers.

  20. Missing data exploration: highlighting graphical presentation of missing pattern.

    PubMed

    Zhang, Zhongheng

    2015-12-01

    Functions shipped with R base can fulfill many tasks of missing data handling. However, because the data volume of electronic medical record (EMR) system is always very large, more sophisticated methods may be helpful in data management. The article focuses on missing data handling by using advanced techniques. There are three types of missing data, that is, missing completely at random (MCAR), missing at random (MAR) and not missing at random (NMAR). This classification system depends on how missing values are generated. Two packages, Multivariate Imputation by Chained Equations (MICE) and Visualization and Imputation of Missing Values (VIM), provide sophisticated functions to explore missing data pattern. In particular, the VIM package is especially helpful in visual inspection of missing data. Finally, correlation analysis provides information on the dependence of missing data on other variables. Such information is useful in subsequent imputations.

  1. Sex estimation of the tibia in modern Turkish: A computed tomography study.

    PubMed

    Ekizoglu, Oguzhan; Er, Ali; Bozdag, Mustafa; Akcaoglu, Mustafa; Can, Ismail Ozgur; García-Donas, Julieta G; Kranioti, Elena F

    2016-11-01

    The utilization of computed tomography is beneficial for the analysis of skeletal remains and it has important advantages for anthropometric studies. The present study investigated morphometry of left tibia using CT images of a contemporary Turkish population. Seven parameters were measured on 203 individuals (124 males and 79 females) within the 19-92-years age group. The first objective of this study was to provide population-specific sex estimation equations for the contemporary Turkish population based on CT images. A second objective was to test the sex estimation formulae on Southern Europeans by Kranioti and Apostol (2015). Univariate discriminant functions resulted in classification accuracy that ranged from 66 to 86%. The best single variable was found to be upper epiphyseal breadth (86%) followed by lower epiphyseal breadth (85%). Multivariate discriminant functions resulted in classification accuracy for cross-validated data ranged from 79 to 86%. Applying the multivariate sex estimation formulae on Southern Europeans (SE) by Kranioti and Apostol in our sample resulted in very high classification accuracy ranging from 81 to 88%. In addition, 35.5-47% of the total Turkish sample is correctly classified with over 95% posterior probability, which is actually higher than the one reported for the original sample (25-43%). We conclude that the tibia is a very useful bone for sex estimation in the contemporary Turkish population. Moreover, our test results support the hypothesis that the SE formulae are sufficient for the contemporary Turkish population and they can be used safely for criminal investigations when posterior probabilities are over 95%. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  2. Crystallization tendency of active pharmaceutical ingredients following rapid solvent evaporation--classification and comparison with crystallization tendency from undercooled melts.

    PubMed

    Van Eerdenbrugh, Bernard; Baird, Jared A; Taylor, Lynne S

    2010-09-01

    In this study, the crystallization behavior of a variety of compounds was studied following rapid solvent evaporation using spin coating. Initial screening to determine model compound suitability was performed using a structurally diverse set of 51 compounds in three different solvent systems [dichloromethane (DCM), a 1:1 (w/w) dichloromethane/ethanol mixture (MIX), and ethanol (EtOH)]. Of this starting set of 153 drug-solvent combinations, 93 (40 compounds) were selected for further evaluation based on solubility, chemical solution stability, and processability criteria. These systems were spin coated and their crystallization was monitored using polarized light microscopy (7 days, dry conditions). The crystallization behavior of the samples could be classified as rapid (Class I: 39 cases), intermediate (Class II: 23 cases), or slow (Class III: 31 cases). The solvent system employed influenced the classification outcome for only four of the compounds. The various compounds showed very diverse crystallization behavior. Upon comparison of classification results with those of a previous study, where cooling from the melt was used as a preparation technique, a good similarity was found whereby 68% of the cases were identically classified. Multivariate analysis was performed using a set of relevant physicochemical compound characteristics. It was found that a number of these parameters tended to differ between the different classes. These could be further interpreted in terms of the nature of the crystallization process. Additional multivariate analysis on the separate classes of compounds indicated some potential in predicting the crystallization tendency of a given compound.

  3. Prognostic factors of non-functioning pancreatic neuroendocrine tumor revisited: The value of WHO 2010 classification.

    PubMed

    Bu, Jiyoung; Youn, Sangmin; Kwon, Wooil; Jang, Kee Taek; Han, Sanghyup; Han, Sunjong; You, Younghun; Heo, Jin Seok; Choi, Seong Ho; Choi, Dong Wook

    2018-02-01

    Various factors have been reported as prognostic factors of non-functional pancreatic neuroendocrine tumors (NF-pNETs). There remains some controversy as to the factors which might actually serve to successfully prognosticate future manifestation and diagnosis of NF-pNETs. As well, consensus regarding management strategy has never been achieved. The aim of this study is to further investigate potential prognostic factors using a large single-center cohort to help determine the management strategy of NF-pNETs. During the time period 1995 through 2013, 166 patients with NF-pNETs who underwent surgery in Samsung Medical Center were entered in a prospective database, and those factors thought to represent predictors of prognosis were tested in uni- and multivariate models. The median follow-up time was 46.5 months; there was a maximum follow-up period of 217 months. The five-year overall survival and disease-free survival rates were 88.5% and 77.0%, respectively. The 2010 WHO classification was found to be the only prognostic factor which affects overall survival and disease-free survival in multivariate analysis. Also, pathologic tumor size and preoperative image tumor size correlated strongly with the WHO grades ( p <0.001, and p <0.001). Our study demonstrates that 2010 WHO classification represents a valuable prognostic factor of NF-pNETs and tumor size on preoperative image correlated with WHO grade. In view of the foregoing, the preoperative image size is thought to represent a reasonable reference with regard to determination and development of treatment strategy of NF-pNETs.

  4. Identifying HIV associated neurocognitive disorder using large-scale Granger causality analysis on resting-state functional MRI

    NASA Astrophysics Data System (ADS)

    DSouza, Adora M.; Abidin, Anas Z.; Leistritz, Lutz; Wismüller, Axel

    2017-02-01

    We investigate the applicability of large-scale Granger Causality (lsGC) for extracting a measure of multivariate information flow between pairs of regional brain activities from resting-state functional MRI (fMRI) and test the effectiveness of these measures for predicting a disease state. Such pairwise multivariate measures of interaction provide high-dimensional representations of connectivity profiles for each subject and are used in a machine learning task to distinguish between healthy controls and individuals presenting with symptoms of HIV Associated Neurocognitive Disorder (HAND). Cognitive impairment in several domains can occur as a result of HIV infection of the central nervous system. The current paradigm for assessing such impairment is through neuropsychological testing. With fMRI data analysis, we aim at non-invasively capturing differences in brain connectivity patterns between healthy subjects and subjects presenting with symptoms of HAND. To classify the extracted interaction patterns among brain regions, we use a prototype-based learning algorithm called Generalized Matrix Learning Vector Quantization (GMLVQ). Our approach to characterize connectivity using lsGC followed by GMLVQ for subsequent classification yields good prediction results with an accuracy of 87% and an area under the ROC curve (AUC) of up to 0.90. We obtain a statistically significant improvement (p<0.01) over a conventional Granger causality approach (accuracy = 0.76, AUC = 0.74). High accuracy and AUC values using our multivariate method to connectivity analysis suggests that our approach is able to better capture changes in interaction patterns between different brain regions when compared to conventional Granger causality analysis known from the literature.

  5. Population differences in the postcrania of modern South Africans and the implications for ancestry estimation.

    PubMed

    Liebenberg, Leandi; L'Abbé, Ericka N; Stull, Kyra E

    2015-12-01

    The cranium is widely recognized as the most important skeletal element to use when evaluating population differences and estimating ancestry. However, the cranium is not always intact or available for analysis, which emphasizes the need for postcranial alternatives. The purpose of this study was to quantify postcraniometric differences among South Africans that can be used to estimate ancestry. Thirty-nine standard measurements from 11 postcranial bones were collected from 360 modern black, white and coloured South Africans; the sex and ancestry distribution were equal. Group differences were explored with analysis of variance (ANOVA) and Tukey's honestly significant difference (HSD) test. Linear and flexible discriminant analysis (LDA and FDA, respectively) were conducted with bone models as well as numerous multivariate subsets to identify the model and method that yielded the highest correct classifications. Leave-one-out (LDA) and k-fold (k=10; FDA) cross-validation with equal priors were used for all models. ANOVA and Tukey's HSD results reveal statistically significant differences between at least two of the three groups for the majority of the variables, with varying degrees of group overlap. Bone models, which consisted of all measurements per bone, resulted in low accuracies that ranged from 46% to 63% (LDA) and 41% to 66% (FDA). In contrast, the multivariate subsets, which consisted of different variable combinations from all elements, achieved accuracies as high as 85% (LDA) and 87% (FDA). Thus, when using a multivariate approach, the postcranial skeleton can distinguish among three modern South African groups with high accuracy. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  6. An advanced method for classifying atmospheric circulation types based on prototypes connectivity graph

    NASA Astrophysics Data System (ADS)

    Zagouras, Athanassios; Argiriou, Athanassios A.; Flocas, Helena A.; Economou, George; Fotopoulos, Spiros

    2012-11-01

    Classification of weather maps at various isobaric levels as a methodological tool is used in several problems related to meteorology, climatology, atmospheric pollution and to other fields for many years. Initially the classification was performed manually. The criteria used by the person performing the classification are features of isobars or isopleths of geopotential height, depending on the type of maps to be classified. Although manual classifications integrate the perceptual experience and other unquantifiable qualities of the meteorology specialists involved, these are typically subjective and time consuming. Furthermore, during the last years different approaches of automated methods for atmospheric circulation classification have been proposed, which present automated and so-called objective classifications. In this paper a new method of atmospheric circulation classification of isobaric maps is presented. The method is based on graph theory. It starts with an intelligent prototype selection using an over-partitioning mode of fuzzy c-means (FCM) algorithm, proceeds to a graph formulation for the entire dataset and produces the clusters based on the contemporary dominant sets clustering method. Graph theory is a novel mathematical approach, allowing a more efficient representation of spatially correlated data, compared to the classical Euclidian space representation approaches, used in conventional classification methods. The method has been applied to the classification of 850 hPa atmospheric circulation over the Eastern Mediterranean. The evaluation of the automated methods is performed by statistical indexes; results indicate that the classification is adequately comparable with other state-of-the-art automated map classification methods, for a variable number of clusters.

  7. Classification bias in commercial business lists for retail food stores in the U.S.

    PubMed

    Han, Euna; Powell, Lisa M; Zenk, Shannon N; Rimkus, Leah; Ohri-Vachaspati, Punam; Chaloupka, Frank J

    2012-04-18

    Aspects of the food environment such as the availability of different types of food stores have recently emerged as key modifiable factors that may contribute to the increased prevalence of obesity. Given that many of these studies have derived their results based on secondary datasets and the relationship of food stores with individual weight outcomes has been reported to vary by store type, it is important to understand the extent to which often-used secondary data correctly classify food stores. We evaluated the classification bias of food stores in Dun & Bradstreet (D&B) and InfoUSA commercial business lists. We performed a full census in 274 randomly selected census tracts in the Chicago metropolitan area and collected detailed store attributes inside stores for classification. Store attributes were compared by classification match status and store type. Systematic classification bias by census tract characteristics was assessed in multivariate regression. D&B had a higher classification match rate than InfoUSA for supermarkets and grocery stores, while InfoUSA was higher for convenience stores. Both lists were more likely to correctly classify large supermarkets, grocery stores, and convenience stores with more cash registers and different types of service counters (supermarkets and grocery stores only). The likelihood of a correct classification match for supermarkets and grocery stores did not vary systemically by tract characteristics whereas convenience stores were more likely to be misclassified in predominately Black tracts. Researches can rely on classification of food stores in commercial datasets for supermarkets and grocery stores whereas classifications for convenience and specialty food stores are subject to some systematic bias by neighborhood racial/ethnic composition.

  8. Pancreatic abnormalities detected by endoscopic ultrasound (EUS) in patients without clinical signs of pancreatic disease: any difference between standard and Rosemont classification scoring?

    PubMed

    Petrone, Maria Chiara; Terracciano, Fulvia; Perri, Francesco; Carrara, Silvia; Cavestro, Giulia Martina; Mariani, Alberto; Testoni, Pier Alberto; Arcidiacono, Paolo Giorgio

    2014-01-01

    The prevalence of nine EUS features of chronic pancreatitis (CP) according to the standard Wiersema classification has been investigated in 489 patients undergoing EUS for an indication not related to pancreatico-biliary disease. We showed that 82 subjects (16.8%) had at least one ductular or parenchymal abnormality. Among them, 18 (3.7% of study population) had ≥3 Wiersema criteria suggestive of CP. Recently, a new classification (Rosemont) of EUS findings consistent, suggestive or indeterminate for CP has been proposed. To stratify healthy subjects into different subgroups on the basis of EUS features of CP according to the Wiersema and Rosemont classifications and to evaluate the agreement in the diagnosis of CP with the two scoring systems. Weighted kappa statistics was computed to evaluate the strength of agreement between the two scoring systems. Univariate and multivariate analysis between any EUS abnormality and habits were performed. Eighty-two EUS videos were reviewed. Using the Wiersema classification, 18 subjects showed ≥3 EUS features suggestive of CP. The EUS diagnosis of CP in these 18 subjects was considered as consistent in only one patient, according to Rosemont classification. Weighted Kappa statistics was 0.34 showing that the strength of agreement was 'fair'. Alcohol use and smoking were identified as risk factors for having pancreatic abnormalities on EUS. The prevalence of EUS features consistent or suggestive of CP in healthy subjects according to the Rosemont classification is lower than that assessed by Wiersema criteria. In that regard the Rosemont classification seems to be more accurate in excluding clinically relevant CP. Overall agreement between the two classifications is fair. Copyright © 2014 IAP and EPC. Published by Elsevier B.V. All rights reserved.

  9. Instructional Method Classifications Lack User Language and Orientation

    ERIC Educational Resources Information Center

    Neumann, Susanne; Koper, Rob

    2010-01-01

    Following publications emphasizing the need of a taxonomy for instructional methods, this article presents a literature review on classifications for learning and teaching in order to identify possible classifications for instructional methods. Data was collected for 37 classifications capturing the origins, theoretical underpinnings, purposes and…

  10. Sparse representation based biomarker selection for schizophrenia with integrated analysis of fMRI and SNPs.

    PubMed

    Cao, Hongbao; Duan, Junbo; Lin, Dongdong; Shugart, Yin Yao; Calhoun, Vince; Wang, Yu-Ping

    2014-11-15

    Integrative analysis of multiple data types can take advantage of their complementary information and therefore may provide higher power to identify potential biomarkers that would be missed using individual data analysis. Due to different natures of diverse data modality, data integration is challenging. Here we address the data integration problem by developing a generalized sparse model (GSM) using weighting factors to integrate multi-modality data for biomarker selection. As an example, we applied the GSM model to a joint analysis of two types of schizophrenia data sets: 759,075 SNPs and 153,594 functional magnetic resonance imaging (fMRI) voxels in 208 subjects (92 cases/116 controls). To solve this small-sample-large-variable problem, we developed a novel sparse representation based variable selection (SRVS) algorithm, with the primary aim to identify biomarkers associated with schizophrenia. To validate the effectiveness of the selected variables, we performed multivariate classification followed by a ten-fold cross validation. We compared our proposed SRVS algorithm with an earlier sparse model based variable selection algorithm for integrated analysis. In addition, we compared with the traditional statistics method for uni-variant data analysis (Chi-squared test for SNP data and ANOVA for fMRI data). Results showed that our proposed SRVS method can identify novel biomarkers that show stronger capability in distinguishing schizophrenia patients from healthy controls. Moreover, better classification ratios were achieved using biomarkers from both types of data, suggesting the importance of integrative analysis. Copyright © 2014 Elsevier Inc. All rights reserved.

  11. In vivo Raman spectroscopy of cervix cancers

    NASA Astrophysics Data System (ADS)

    Rubina, S.; Sathe, Priyanka; Dora, Tapas Kumar; Chopra, Supriya; Maheshwari, Amita; Krishna, C. Murali

    2014-03-01

    Cervix-cancer is the third most common female cancer worldwide. It is the leading cancer among Indian females with more than million new diagnosed cases and 50% mortality, annually. The high mortality rates can be attributed to late diagnosis. Efficacy of Raman spectroscopy in classification of normal and pathological conditions in cervix cancers on diverse populations has already been demonstrated. Our earlier ex vivo studies have shown the feasibility of classifying normal and cancer cervix tissues as well as responders/non-responders to Concurrent chemoradiotherapy (CCRT). The present study was carried out to explore feasibility of in vivo Raman spectroscopic methods in classifying normal and cancerous conditions in Indian population. A total of 182 normal and 132 tumor in vivo Raman spectra, from 63 subjects, were recorded using a fiberoptic probe coupled HE-785 spectrometer, under clinical supervision. Spectra were acquired for 5 s and averaged over 3 times at 80 mW laser power. Spectra of normal conditions suggest strong collagenous features and abundance of non-collagenous proteins and DNA in case of tumors. Preprocessed spectra were subjected to Principal Component-Linear Discrimination Analysis (PCLDA) followed by leave-one-out-cross-validation. Classification efficiency of ~96.7% and 100% for normal and cancerous conditions respectively, were observed. Findings of the study corroborates earlier studies and suggest applicability of Raman spectroscopic methods in combination with appropriate multivariate tool for objective, noninvasive and rapid diagnosis of cervical cancers in Indian population. In view of encouraging results, extensive validation studies will be undertaken to confirm the findings.

  12. Use of multivariate analysis to suggest a new molecular classification of colorectal cancer

    PubMed Central

    Domingo, Enric; Ramamoorthy, Rajarajan; Oukrif, Dahmane; Rosmarin, Daniel; Presz, Michal; Wang, Haitao; Pulker, Hannah; Lockstone, Helen; Hveem, Tarjei; Cranston, Treena; Danielsen, Havard; Novelli, Marco; Davidson, Brian; Xu, Zheng-Zhou; Molloy, Peter; Johnstone, Elaine; Holmes, Christopher; Midgley, Rachel; Kerr, David; Sieber, Oliver; Tomlinson, Ian

    2013-01-01

    Abstract Molecular classification of colorectal cancer (CRC) is currently based on microsatellite instability (MSI), KRAS or BRAF mutation and, occasionally, chromosomal instability (CIN). Whilst useful, these categories may not fully represent the underlying molecular subgroups. We screened 906 stage II/III CRCs from the VICTOR clinical trial for somatic mutations. Multivariate analyses (logistic regression, clustering, Bayesian networks) identified the primary molecular associations. Positive associations occurred between: CIN and TP53 mutation; MSI and BRAF mutation; and KRAS and PIK3CA mutations. Negative associations occurred between: MSI and CIN; MSI and NRAS mutation; and KRAS mutation, and each of NRAS, TP53 and BRAF mutations. Some complex relationships were elucidated: KRAS and TP53 mutations had both a direct negative association and a weaker, confounding, positive association via TP53–CIN–MSI–BRAF–KRAS. Our results suggested a new molecular classification of CRCs: (1) MSI+ and/or BRAF-mutant; (2) CIN+ and/or TP53– mutant, with wild-type KRAS and PIK3CA; (3) KRAS- and/or PIK3CA-mutant, CIN+, TP53-wild-type; (4) KRAS– and/or PIK3CA-mutant, CIN–, TP53-wild-type; (5) NRAS-mutant; (6) no mutations; (7) others. As expected, group 1 cancers were mostly proximal and poorly differentiated, usually occurring in women. Unexpectedly, two different types of CIN+ CRC were found: group 2 cancers were usually distal and occurred in men, whereas group 3 showed neither of these associations but were of higher stage. CIN+ cancers have conventionally been associated with all three of these variables, because they have been tested en masse. Our classification also showed potentially improved prognostic capabilities, with group 3, and possibly group 1, independently predicting disease-free survival. Copyright © 2012 Pathological Society of Great Britain and Ireland. Published by John Wiley & Sons, Ltd. PMID:23165447

  13. Best Merge Region Growing with Integrated Probabilistic Classification for Hyperspectral Imagery

    NASA Technical Reports Server (NTRS)

    Tarabalka, Yuliya; Tilton, James C.

    2011-01-01

    A new method for spectral-spatial classification of hyperspectral images is proposed. The method is based on the integration of probabilistic classification within the hierarchical best merge region growing algorithm. For this purpose, preliminary probabilistic support vector machines classification is performed. Then, hierarchical step-wise optimization algorithm is applied, by iteratively merging regions with the smallest Dissimilarity Criterion (DC). The main novelty of this method consists in defining a DC between regions as a function of region statistical and geometrical features along with classification probabilities. Experimental results are presented on a 200-band AVIRIS image of the Northwestern Indiana s vegetation area and compared with those obtained by recently proposed spectral-spatial classification techniques. The proposed method improves classification accuracies when compared to other classification approaches.

  14. Photos for estimating fuel loadings before and after prescribed burning in the upper coastal plain of the southeast

    Treesearch

    Eric R. Scholl; Thomas A. Waldrop

    1999-01-01

    Although prescribed burning is common in the Southeastern United States, most fuel models apply to only western forests. This paper documents a fuel classification system that was developed for plantations of loblolly and longleaf pines for the Upper Coastal Plain region. Multivariate analysis of variance and discriminant function analysis were used to confirm eight...

  15. Reliable Early Classification on Multivariate Time Series with Numerical and Categorical Attributes

    DTIC Science & Technology

    2015-05-22

    design a procedure of feature extraction in REACT named MEG (Mining Equivalence classes with shapelet Generators) based on the concept of...Equivalence Classes Mining [12, 15]. MEG can efficiently and effectively generate the discriminative features. In addition, several strategies are proposed...technique of parallel computing [4] to propose a process of pa- rallel MEG for substantially reducing the computational overhead of discovering shapelet

  16. 7 CFR 28.35 - Method of classification.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 7 Agriculture 2 2011-01-01 2011-01-01 false Method of classification. 28.35 Section 28.35 Agriculture Regulations of the Department of Agriculture AGRICULTURAL MARKETING SERVICE (Standards... Classification § 28.35 Method of classification. All cotton samples shall be classified on the basis of the...

  17. Molecular Classification of Grade 3 Endometrioid Endometrial Cancers Identifies Distinct Prognostic Subgroups.

    PubMed

    Bosse, Tjalling; Nout, Remi A; McAlpine, Jessica N; McConechy, Melissa K; Britton, Heidi; Hussein, Yaser R; Gonzalez, Carlene; Ganesan, Raji; Steele, Jane C; Harrison, Beth T; Oliva, Esther; Vidal, August; Matias-Guiu, Xavier; Abu-Rustum, Nadeem R; Levine, Douglas A; Gilks, C Blake; Soslow, Robert A

    2018-05-01

    Our aim was to investigate whether molecular classification can be used to refine prognosis in grade 3 endometrial endometrioid carcinomas (EECs). Grade 3 EECs were classified into 4 subgroups: p53 abnormal, based on mutant-like immunostaining (p53abn); MMR deficient, based on loss of mismatch repair protein expression (MMRd); presence of POLE exonuclease domain hotspot mutation (POLE); no specific molecular profile (NSMP), in which none of these aberrations were present. Overall survival (OS) and recurrence-free survival (RFS) rates were compared using the Kaplan-Meier method (Log-rank test) and univariable and multivariable Cox proportional hazard models. In total, 381 patients were included. The median age was 66 years (range, 33 to 96 y). Federation Internationale de Gynecologie et d'Obstetrique stages (2009) were as follows: IA, 171 (44.9%); IB, 120 (31.5%); II, 24 (6.3%); III, 50 (13.1%); IV, 11 (2.9%). There were 49 (12.9%) POLE, 79 (20.7%) p53abn, 115 (30.2%) NSMP, and 138 (36.2%) MMRd tumors. Median follow-up of patients was 6.1 years (range, 0.2 to 17.0 y). Compared to patients with NSMP, patients with POLE mutant grade 3 EEC (OS: hazard ratio [HR], 0.36 [95% confidence interval, 0.18-0.70]; P=0.003; RFS: HR, 0.17 [0.05-0.54]; P=0.003) had a significantly better prognosis; patients with p53abn tumors had a significantly worse RFS (HR, 1.73 [1.09-2.74]; P=0.021); patients with MMRd tumors showed a trend toward better RFS. Estimated 5-year OS rates were as follows: POLE 89%, MMRd 75%, NSMP 69%, p53abn 55% (Log rank P=0.001). Five-year RFS rates were as follows: POLE 96%, MMRd 77%, NSMP 64%, p53abn 47% (P=0.000001), respectively. In a multivariable Cox model that included age and Federation Internationale de Gynecologie et d'Obstetrique stage, POLE and MMRd status remained independent prognostic factors for better RFS; p53 status was an independent prognostic factor for worse RFS. Molecular classification of grade 3 EECs reveals that these tumors are a mixture of molecular subtypes of endometrial carcinoma, rather than a homogeneous group. The addition of molecular markers identifies prognostic subgroups, with potential therapeutic implications.

  18. A Novel approach to monitor chlorophyll-a concentration using an adaptive model from MODIS data at 250 metres spatial resolution

    NASA Astrophysics Data System (ADS)

    El Alem, A.; Chokmani, K.; Laurion, I.; El Adlouni, S.

    2013-12-01

    Occurrence and extent of Harmful Algal Bloom (HAB) has increased in inland water bodies around the world. The appearance of these blooms reflects the advanced state of eutrophication of several aquatic systems caused by urban, agricultural, and industrial development. Algal blooms, especially those cyanobacterial origins, are capable to produce and release toxins, threatening human and animal health, quality of drinking water, and recreational water bodies. Conventional monitoring networks, based on infrequent sampling in a few fixed monitoring stations, cannot provide the information needed as HABs are spatially and temporally heterogeneous. Remote sensing represents an interesting alternative to provide the required spatial and temporal coverage. The usefulness of air-borne and satellite remote sensing data to detect HABs was demonstrated since three decades ago, and since several empirical and semi-empirical models, using satellite imagery, were developed to estimate chlorophyll-a concentration [Chl-a] as a proxy to detect bloom proliferations. However, most of those models presented several weaknesses that are generally linked to the range of [Chl-a] to be estimated. Indeed, models originally calibrated for high [Chl-a] fail to estimate low concentrations and vice versa. In this study, an adaptive model to estimate [Chl-a], spread over a wide range of concentrations, is developed for optically complex inland water bodies based on combination of water spectral response classification and three developed semi-empirical algorithms using a multivariate regression. Three distinct water types (low, medium, and high [Chl-a]) are first identified using the Classification and Regression Tree (CART) method performed on remote sensing reflectance over a dataset of 44 [Chl-a] samples collected from Lakes over Quebec province. Based on the water classification, a specific multivariate model to each water type is developed using the same dataset and the MODIS data at 250-m spatial resolution. By pre-clustering inland water bodies, the results were very interesting as the determination coefficients as well as the relative RMSE of the cross-validation were of 0.99, 0.98 and 0.95 and of 0.5%, 8% and 17% for high, medium, and low [Chl-a], respectively. On the other hand, the adaptive model reached a global success rate of 92% using an independent, semi-qualitative, [Chl-a] samples collected over more than twenty inland water bodies for the years 2009 and 2010 over the Quebec province.

  19. Rapid differentiation of Chinese hop varieties (Humulus lupulus) using volatile fingerprinting by HS-SPME-GC-MS combined with multivariate statistical analysis.

    PubMed

    Liu, Zechang; Wang, Liping; Liu, Yumei

    2018-01-18

    Hops impart flavor to beer, with the volatile components characterizing the various hop varieties and qualities. Fingerprinting, especially flavor fingerprinting, is often used to identify 'flavor products' because inconsistencies in the description of flavor may lead to an incorrect definition of beer quality. Compared to flavor fingerprinting, volatile fingerprinting is simpler and easier. We performed volatile fingerprinting using head space-solid phase micro-extraction gas chromatography-mass spectrometry combined with similarity analysis and principal component analysis (PCA) for evaluating and distinguishing between three major Chinese hops. Eighty-four volatiles were identified, which were classified into seven categories. Volatile fingerprinting based on similarity analysis did not yield any obvious result. By contrast, hop varieties and qualities were identified using volatile fingerprinting based on PCA. The potential variables explained the variance in the three hop varieties. In addition, the dendrogram and principal component score plot described the differences and classifications of hops. Volatile fingerprinting plus multivariate statistical analysis can rapidly differentiate between the different varieties and qualities of the three major Chinese hops. Furthermore, this method can be used as a reference in other fields. © 2018 Society of Chemical Industry. © 2018 Society of Chemical Industry.

  20. Multivariate Analysis as a Method for Evaluating the Conceptual Perceptions of Korean Medicine Students regarding Phlegm Pattern

    PubMed Central

    Kim, Hyungsuk; Park, Young-Jae; Park, Young-Bae

    2013-01-01

    Individuals may perceive the concepts in Korean medicine pattern classification differently because it is performed according to the integration of a variety of information. Therefore, analysis about individual perspective is very important for examining the cross-sectional perspective state of Korean medicine concepts and developing both the clinical guideline including diagnosis and the curriculum of Korean medicine colleges. Moreover, because this conceptual difference is thought to begin with college education, it is worthwhile to observe students' viewpoints. So, we suggested multivariate analysis to explore the dimensional structure of Korean medicine students' conceptual perceptions regarding phlegm pattern. We surveyed 326 students divided into 5 groups based on their year of study. Data were analyzed using multidimensional scaling and factor analysis. Within-group difference was the smallest for third-year students, who have received Korean medicine education in full for the first time. With the exception of first-year students, the conceptual map revealed that each group's mean perceptions of phlegm pattern were distributed in almost linear fashion. To determine the effect of education, we investigated the preference rankings and scores of each symptom. We also extracted factors to identify latent variables and to compare the between-group conceptual characteristics regarding phlegm pattern. PMID:24062789

  1. Stereotactic Body Radiation Therapy and the Influence of Chemotherapy on Overall Survival for Large (≥5 Centimeter) Non-Small Cell Lung Cancer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Verma, Vivek; McMillan, Matthew T.; Grover, Surbhi

    2017-01-01

    Purpose: Stereotactic body radiation therapy (SBRT) for ≥5 cm lesions is poorly defined, largely owing to the low sample sizes in existing studies. The present analysis examined the SBRT outcomes and assessed the effect of chemotherapy in this population. Methods and Materials: The National Cancer Data Base was queried for primary non-small cell lung cancer ≥5 cm treated with SBRT (≤10 fractions). Patient, tumor, and treatment parameters were extracted. The primary outcome was overall survival (OS). Statistical methods involved Kaplan-Meier analysis and multivariable Cox proportional hazards modeling. Results: From 2004 to 2012, data from 201 patients were analyzed. The median follow-upmore » was 41.1 months. The median tumor size was 5.5 cm (interquartile range 5.0-6.0), with cT2a, cT2b, and cT3 disease in 24.9%, 53.2%, and 21.9%, respectively. The median total SBRT dose and fractionation was 50 Gy in 4 fractions, and 92.5% of the patients underwent SBRT with ≤5 fractions. The median OS was 25.1 months. Of the 201 patients, 15% received chemotherapy. The receipt of chemotherapy was associated with longer OS (median 30.6 vs 23.4 months; P=.027). On multivariable analysis, worse OS was seen with increasing age (hazard ratio [HR] 1.03; P=.012), poorly differentiated tumors (HR 2.06; P=.049), and T3 classification (HR 2.13; P=.005). On multivariable analysis, chemotherapy remained independently associated with improved OS (HR 0.57; P=.039). Conclusions: SBRT has utility in the setting of tumors ≥5 cm, with chemotherapy associated with improved OS in this subset. These hypothesis-generating data now raise the necessity of performing prospective analyses to determine whether chemotherapy confers outcome benefits after SBRT.« less

  2. A Comparison of Two-Group Classification Methods

    ERIC Educational Resources Information Center

    Holden, Jocelyn E.; Finch, W. Holmes; Kelley, Ken

    2011-01-01

    The statistical classification of "N" individuals into "G" mutually exclusive groups when the actual group membership is unknown is common in the social and behavioral sciences. The results of such classification methods often have important consequences. Among the most common methods of statistical classification are linear discriminant analysis,…

  3. A multivariate model and statistical method for validating tree grade lumber yield equations

    Treesearch

    Donald W. Seegrist

    1975-01-01

    Lumber yields within lumber grades can be described by a multivariate linear model. A method for validating lumber yield prediction equations when there are several tree grades is presented. The method is based on multivariate simultaneous test procedures.

  4. A new hierarchical method for inter-patient heartbeat classification using random projections and RR intervals

    PubMed Central

    2014-01-01

    Background The inter-patient classification schema and the Association for the Advancement of Medical Instrumentation (AAMI) standards are important to the construction and evaluation of automated heartbeat classification systems. The majority of previously proposed methods that take the above two aspects into consideration use the same features and classification method to classify different classes of heartbeats. The performance of the classification system is often unsatisfactory with respect to the ventricular ectopic beat (VEB) and supraventricular ectopic beat (SVEB). Methods Based on the different characteristics of VEB and SVEB, a novel hierarchical heartbeat classification system was constructed. This was done in order to improve the classification performance of these two classes of heartbeats by using different features and classification methods. First, random projection and support vector machine (SVM) ensemble were used to detect VEB. Then, the ratio of the RR interval was compared to a predetermined threshold to detect SVEB. The optimal parameters for the classification models were selected on the training set and used in the independent testing set to assess the final performance of the classification system. Meanwhile, the effect of different lead configurations on the classification results was evaluated. Results Results showed that the performance of this classification system was notably superior to that of other methods. The VEB detection sensitivity was 93.9% with a positive predictive value of 90.9%, and the SVEB detection sensitivity was 91.1% with a positive predictive value of 42.2%. In addition, this classification process was relatively fast. Conclusions A hierarchical heartbeat classification system was proposed based on the inter-patient data division to detect VEB and SVEB. It demonstrated better classification performance than existing methods. It can be regarded as a promising system for detecting VEB and SVEB of unknown patients in clinical practice. PMID:24981916

  5. Metabolic profiling using HPLC allows classification of drugs according to their mechanisms of action in HL-1 cardiomyocytes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Strigun, Alexander; Wahrheit, Judith; Beckers, Simone

    Along with hepatotoxicity, cardiotoxic side effects remain one of the major reasons for drug withdrawals and boxed warnings. Prediction methods for cardiotoxicity are insufficient. High content screening comprising of not only electrophysiological characterization but also cellular molecular alterations are expected to improve the cardiotoxicity prediction potential. Metabolomic approaches recently have become an important focus of research in pharmacological testing and prediction. In this study, the culture medium supernatants from HL-1 cardiomyocytes after exposure to drugs from different classes (analgesics, antimetabolites, anthracyclines, antihistamines, channel blockers) were analyzed to determine specific metabolic footprints in response to the tested drugs. Since most drugsmore » influence energy metabolism in cardiac cells, the metabolite 'sub-profile' consisting of glucose, lactate, pyruvate and amino acids was considered. These metabolites were quantified using HPLC in samples after exposure of cells to test compounds of the respective drug groups. The studied drug concentrations were selected from concentration response curves for each drug. The metabolite profiles were randomly split into training/validation and test set; and then analysed using multivariate statistics (principal component analysis and discriminant analysis). Discriminant analysis resulted in clustering of drugs according to their modes of action. After cross validation and cross model validation, the underlying training data were able to predict 50%-80% of conditions to the correct classification group. We show that HPLC based characterisation of known cell culture medium components is sufficient to predict a drug's potential classification according to its mode of action.« less

  6. The neuropsychology of male adults with high-functioning autism or asperger syndrome.

    PubMed

    Wilson, C Ellie; Happé, Francesca; Wheelwright, Sally J; Ecker, Christine; Lombardo, Michael V; Johnston, Patrick; Daly, Eileen; Murphy, Clodagh M; Spain, Debbie; Lai, Meng-Chuan; Chakrabarti, Bhismadev; Sauter, Disa A; Baron-Cohen, Simon; Murphy, Declan G M

    2014-10-01

    Autism Spectrum Disorder (ASD) is diagnosed on the basis of behavioral symptoms, but cognitive abilities may also be useful in characterizing individuals with ASD. One hundred seventy-eight high-functioning male adults, half with ASD and half without, completed tasks assessing IQ, a broad range of cognitive skills, and autistic and comorbid symptomatology. The aims of the study were, first, to determine whether significant differences existed between cases and controls on cognitive tasks, and whether cognitive profiles, derived using a multivariate classification method with data from multiple cognitive tasks, could distinguish between the two groups. Second, to establish whether cognitive skill level was correlated with degree of autistic symptom severity, and third, whether cognitive skill level was correlated with degree of comorbid psychopathology. Fourth, cognitive characteristics of individuals with Asperger Syndrome (AS) and high-functioning autism (HFA) were compared. After controlling for IQ, ASD and control groups scored significantly differently on tasks of social cognition, motor performance, and executive function (P's < 0.05). To investigate cognitive profiles, 12 variables were entered into a support vector machine (SVM), which achieved good classification accuracy (81%) at a level significantly better than chance (P < 0.0001). After correcting for multiple correlations, there were no significant associations between cognitive performance and severity of either autistic or comorbid symptomatology. There were no significant differences between AS and HFA groups on the cognitive tasks. Cognitive classification models could be a useful aid to the diagnostic process when used in conjunction with other data sources-including clinical history. © 2014 International Society for Autism Research, Wiley Periodicals, Inc.

  7. Automated Decision Tree Classification of Corneal Shape

    PubMed Central

    Twa, Michael D.; Parthasarathy, Srinivasan; Roberts, Cynthia; Mahmoud, Ashraf M.; Raasch, Thomas W.; Bullimore, Mark A.

    2011-01-01

    Purpose The volume and complexity of data produced during videokeratography examinations present a challenge of interpretation. As a consequence, results are often analyzed qualitatively by subjective pattern recognition or reduced to comparisons of summary indices. We describe the application of decision tree induction, an automated machine learning classification method, to discriminate between normal and keratoconic corneal shapes in an objective and quantitative way. We then compared this method with other known classification methods. Methods The corneal surface was modeled with a seventh-order Zernike polynomial for 132 normal eyes of 92 subjects and 112 eyes of 71 subjects diagnosed with keratoconus. A decision tree classifier was induced using the C4.5 algorithm, and its classification performance was compared with the modified Rabinowitz–McDonnell index, Schwiegerling’s Z3 index (Z3), Keratoconus Prediction Index (KPI), KISA%, and Cone Location and Magnitude Index using recommended classification thresholds for each method. We also evaluated the area under the receiver operator characteristic (ROC) curve for each classification method. Results Our decision tree classifier performed equal to or better than the other classifiers tested: accuracy was 92% and the area under the ROC curve was 0.97. Our decision tree classifier reduced the information needed to distinguish between normal and keratoconus eyes using four of 36 Zernike polynomial coefficients. The four surface features selected as classification attributes by the decision tree method were inferior elevation, greater sagittal depth, oblique toricity, and trefoil. Conclusions Automated decision tree classification of corneal shape through Zernike polynomials is an accurate quantitative method of classification that is interpretable and can be generated from any instrument platform capable of raw elevation data output. This method of pattern classification is extendable to other classification problems. PMID:16357645

  8. Epithelial ovarian carcinoma diagnosis by desorption electrospray ionization mass spectrometry imaging

    PubMed Central

    Dória, Maria Luisa; McKenzie, James S.; Mroz, Anna; Phelps, David L.; Speller, Abigail; Rosini, Francesca; Strittmatter, Nicole; Golf, Ottmar; Veselkov, Kirill; Brown, Robert; Ghaem-Maghami, Sadaf; Takats, Zoltan

    2016-01-01

    Ovarian cancer is highly prevalent among European women, and is the leading cause of gynaecological cancer death. Current histopathological diagnoses of tumour severity are based on interpretation of, for example, immunohistochemical staining. Desorption electrospray mass spectrometry imaging (DESI-MSI) generates spatially resolved metabolic profiles of tissues and supports an objective investigation of tumour biology. In this study, various ovarian tissue types were analysed by DESI-MSI and co-registered with their corresponding haematoxylin and eosin (H&E) stained images. The mass spectral data reveal tissue type-dependent lipid profiles which are consistent across the n = 110 samples (n = 107 patients) used in this study. Multivariate statistical methods were used to classify samples and identify molecular features discriminating between tissue types. Three main groups of samples (epithelial ovarian carcinoma, borderline ovarian tumours, normal ovarian stroma) were compared as were the carcinoma histotypes (serous, endometrioid, clear cell). Classification rates >84% were achieved for all analyses, and variables differing statistically between groups were determined and putatively identified. The changes noted in various lipid types help to provide a context in terms of tumour biochemistry. The classification of unseen samples demonstrates the capability of DESI-MSI to characterise ovarian samples and to overcome existing limitations in classical histopathology. PMID:27976698

  9. Delirium, Frailty, and Fast-Track Surgery in Oncogeriatrics: Is There a Link?

    PubMed Central

    Monacelli, Fiammetta; Signori, Alessio; Prefumo, Matteo; Giannotti, Chiara; Nencioni, Alessio; Romairone, Emanuele; Scabini, Stefano; Odetti, Patrizio

    2018-01-01

    Background/Aims Postoperative delirium (POD) is more frequent in elderly patients undergoing major cancer surgery. The interplay between individual clinical vulnerability and a series of perioperative factors seems to play a relevant role. Surgery is the first-line treatment option for cancer, and fast-track surgery (FTS) has been documented to decrease postoperative complications. The study sought to assess, after comprehensive geriatric assessment (CGA) and frailty stratification (Rockwood 40 items index), which perioperative parameters were predictive of POD development in elderly patients undergoing FTS for colorectal cancer. Methods A total of 107 consecutive subjects admitted for elective colorectal FTS were enrolled. All patients underwent CGA, frailly stratification, Timed up & go (TUG) test, 4AT test for delirium screening, anesthesiologists physical status classification, and Dindo-Clavien classification. Results The incidence of POD was 12.3%. Patients’ prevalent clinical phenotype was pre-frail. The multivariate analysis indicated physical performance (TUG in seconds) as the most significant predictor of POD for each second of increase. Conclusions Only few procedure-specific studies have examined the impact of FTS for colorectal cancer on POD. This is the first study to investigate the risk factors for POD, in a vulnerable octogenarian oncogeriatric population submitted to FTS surgery and frailty stratification. PMID:29515621

  10. An integrated analysis for determining the geographical origin of medicinal herbs using ICP-AES/ICP-MS and (1)H NMR analysis.

    PubMed

    Kwon, Yong-Kook; Bong, Yeon-Sik; Lee, Kwang-Sik; Hwang, Geum-Sook

    2014-10-15

    ICP-MS and (1)H NMR are commonly used to determine the geographical origin of food and crops. In this study, data from multielemental analysis performed by ICP-AES/ICP-MS and metabolomic data obtained from (1)H NMR were integrated to improve the reliability of determining the geographical origin of medicinal herbs. Astragalus membranaceus and Paeonia albiflora with different origins in Korea and China were analysed by (1)H NMR and ICP-AES/ICP-MS, and an integrated multivariate analysis was performed to characterise the differences between their origins. Four classification methods were applied: linear discriminant analysis (LDA), k-nearest neighbour classification (KNN), support vector machines (SVM), and partial least squares-discriminant analysis (PLS-DA). Results were compared using leave-one-out cross-validation and external validation. The integration of multielemental and metabolomic data was more suitable for determining geographical origin than the use of each individual data set alone. The integration of the two analytical techniques allowed diverse environmental factors such as climate and geology, to be considered. Our study suggests that an appropriate integration of different types of analytical data is useful for determining the geographical origin of food and crops with a high degree of reliability. Copyright © 2014 Elsevier Ltd. All rights reserved.

  11. Structure/activity relationships for biodegradability and their role in environmental assessment

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Boethling, R.S.

    1994-12-31

    Assessment of biodegradability is an important part of the review process for both new and existing chemicals under the Toxic Substances Control Act. It is often necessary to estimate biodegradability because experimental data are unavailable. Structure/biodegradability relationships (SBR) are a means to this end. Quantitative SBR have been developed, but this approach has not been very useful because they apply only to a few narrowly defined classes of chemicals. In response to the need for more widely applicable methods, multivariate analysis has been used to develop biodegradability classification models. For example, recent efforts have produced four new models. Two calculatemore » the probability of rapid biodegradation and can be used for classification; the other two models allow semi-quantitative estimation of primary and ultimate biodegradation rates. All are based on multiple regressions against 36 preselected substructures plus molecular weight. Such efforts have been fairly successful by statistical criteria, but in general are hampered by a lack of large and consistent datasets. Knowledge-based expert systems may represent the next step in the evolution of SBR. In principle such systems need not be as severely limited by imperfect datasets. However, the codification of expert knowledge and reasoning is a critical prerequisite. Results of knowledge acquisition exercises and modeling based on them will also be described.« less

  12. Raman spectral post-processing for oral tissue discrimination – a step for an automatized diagnostic system

    PubMed Central

    Carvalho, Luis Felipe C. S.; Nogueira, Marcelo Saito; Neto, Lázaro P. M.; Bhattacharjee, Tanmoy T.; Martin, Airton A.

    2017-01-01

    Most oral injuries are diagnosed by histopathological analysis of a biopsy, which is an invasive procedure and does not give immediate results. On the other hand, Raman spectroscopy is a real time and minimally invasive analytical tool with potential for the diagnosis of diseases. The potential for diagnostics can be improved by data post-processing. Hence, this study aims to evaluate the performance of preprocessing steps and multivariate analysis methods for the classification of normal tissues and pathological oral lesion spectra. A total of 80 spectra acquired from normal and abnormal tissues using optical fiber Raman-based spectroscopy (OFRS) were subjected to PCA preprocessing in the z-scored data set, and the KNN (K-nearest neighbors), J48 (unpruned C4.5 decision tree), RBF (radial basis function), RF (random forest), and MLP (multilayer perceptron) classifiers at WEKA software (Waikato environment for knowledge analysis), after area normalization or maximum intensity normalization. Our results suggest the best classification was achieved by using maximum intensity normalization followed by MLP. Based on these results, software for automated analysis can be generated and validated using larger data sets. This would aid quick comprehension of spectroscopic data and easy diagnosis by medical practitioners in clinical settings. PMID:29188115

  13. Raman spectral post-processing for oral tissue discrimination - a step for an automatized diagnostic system.

    PubMed

    Carvalho, Luis Felipe C S; Nogueira, Marcelo Saito; Neto, Lázaro P M; Bhattacharjee, Tanmoy T; Martin, Airton A

    2017-11-01

    Most oral injuries are diagnosed by histopathological analysis of a biopsy, which is an invasive procedure and does not give immediate results. On the other hand, Raman spectroscopy is a real time and minimally invasive analytical tool with potential for the diagnosis of diseases. The potential for diagnostics can be improved by data post-processing. Hence, this study aims to evaluate the performance of preprocessing steps and multivariate analysis methods for the classification of normal tissues and pathological oral lesion spectra. A total of 80 spectra acquired from normal and abnormal tissues using optical fiber Raman-based spectroscopy (OFRS) were subjected to PCA preprocessing in the z-scored data set, and the KNN (K-nearest neighbors), J48 (unpruned C4.5 decision tree), RBF (radial basis function), RF (random forest), and MLP (multilayer perceptron) classifiers at WEKA software (Waikato environment for knowledge analysis), after area normalization or maximum intensity normalization. Our results suggest the best classification was achieved by using maximum intensity normalization followed by MLP. Based on these results, software for automated analysis can be generated and validated using larger data sets. This would aid quick comprehension of spectroscopic data and easy diagnosis by medical practitioners in clinical settings.

  14. Pharmacologic attenuation of cross-modal sensory augmentation within the chronic pain insula

    PubMed Central

    Harte, Steven E.; Ichesco, Eric; Hampson, Johnson P.; Peltier, Scott J.; Schmidt-Wilcke, Tobias; Clauw, Daniel J.; Harris, Richard E.

    2016-01-01

    Abstract Pain can be elicited through all mammalian sensory pathways yet cross-modal sensory integration, and its relationship to clinical pain, is largely unexplored. Centralized chronic pain conditions such as fibromyalgia are often associated with symptoms of multisensory hypersensitivity. In this study, female patients with fibromyalgia demonstrated cross-modal hypersensitivity to visual and pressure stimuli compared with age- and sex-matched healthy controls. Functional magnetic resonance imaging revealed that insular activity evoked by an aversive level of visual stimulation was associated with the intensity of fibromyalgia pain. Moreover, attenuation of this insular activity by the analgesic pregabalin was accompanied by concomitant reductions in clinical pain. A multivariate classification method using support vector machines (SVM) applied to visual-evoked brain activity distinguished patients with fibromyalgia from healthy controls with 82% accuracy. A separate SVM classification of treatment effects on visual-evoked activity reliably identified when patients were administered pregabalin as compared with placebo. Both SVM analyses identified significant weights within the insular cortex during aversive visual stimulation. These data suggest that abnormal integration of multisensory and pain pathways within the insula may represent a pathophysiological mechanism in some chronic pain conditions and that insular response to aversive visual stimulation may have utility as a marker for analgesic drug development. PMID:27101425

  15. Impact of esophageal invasion on clinicopathological characteristics and long-term outcome of adenocarcinoma of the subcardia.

    PubMed

    Tokunaga, Masanori; Tanizawa, Yutaka; Bando, Etsuro; Kawamura, Taiichi; Tsubosa, Yasuhiro; Terashima, Masanori

    2012-12-01

    A different classification system was used in the 7th edition of the TNM classification for adenocarcinoma of the subcardia either with or without esophageal invasion. The aim of this study was to clarify the clinicopathological and survival impact of esophageal invasion. The present study included 351 patients who underwent gastrectomy for adenocarcinoma located within 5 cm of the esophagogastric junction. The clinicopathological characteristics and survival curves were compared between patients with esophageal invasion [E (+) group, n = 125] and without esophageal invasion [E (-) group, n = 226]. Patients in the E (+) group had more advanced disease. The 5-year survival rate following macroscopic curative resection was significantly better in the E (-) group (80.8%) than in the E (+) (48.7%, P < 0.001), even after stratification by the pathological stage and nodal status. Multivariate analysis identified esophageal invasion (hazard ratio; 3.323, 95% confidential interval; 1.815-6.082) as one of the independent prognostic factors. Esophageal invasion affected the clinicopathological characteristics and long-term outcome of patients. Further study is necessary to clarify whether patients with esophageal invasion should be classified using the system for esophageal cancer or by another method. Copyright © 2012 Wiley Periodicals, Inc.

  16. Multivariate Methods for Meta-Analysis of Genetic Association Studies.

    PubMed

    Dimou, Niki L; Pantavou, Katerina G; Braliou, Georgia G; Bagos, Pantelis G

    2018-01-01

    Multivariate meta-analysis of genetic association studies and genome-wide association studies has received a remarkable attention as it improves the precision of the analysis. Here, we review, summarize and present in a unified framework methods for multivariate meta-analysis of genetic association studies and genome-wide association studies. Starting with the statistical methods used for robust analysis and genetic model selection, we present in brief univariate methods for meta-analysis and we then scrutinize multivariate methodologies. Multivariate models of meta-analysis for a single gene-disease association studies, including models for haplotype association studies, multiple linked polymorphisms and multiple outcomes are discussed. The popular Mendelian randomization approach and special cases of meta-analysis addressing issues such as the assumption of the mode of inheritance, deviation from Hardy-Weinberg Equilibrium and gene-environment interactions are also presented. All available methods are enriched with practical applications and methodologies that could be developed in the future are discussed. Links for all available software implementing multivariate meta-analysis methods are also provided.

  17. Method of Grassland Information Extraction Based on Multi-Level Segmentation and Cart Model

    NASA Astrophysics Data System (ADS)

    Qiao, Y.; Chen, T.; He, J.; Wen, Q.; Liu, F.; Wang, Z.

    2018-04-01

    It is difficult to extract grassland accurately by traditional classification methods, such as supervised method based on pixels or objects. This paper proposed a new method combing the multi-level segmentation with CART (classification and regression tree) model. The multi-level segmentation which combined the multi-resolution segmentation and the spectral difference segmentation could avoid the over and insufficient segmentation seen in the single segmentation mode. The CART model was established based on the spectral characteristics and texture feature which were excavated from training sample data. Xilinhaote City in Inner Mongolia Autonomous Region was chosen as the typical study area and the proposed method was verified by using visual interpretation results as approximate truth value. Meanwhile, the comparison with the nearest neighbor supervised classification method was obtained. The experimental results showed that the total precision of classification and the Kappa coefficient of the proposed method was 95 % and 0.9, respectively. However, the total precision of classification and the Kappa coefficient of the nearest neighbor supervised classification method was 80 % and 0.56, respectively. The result suggested that the accuracy of classification proposed in this paper was higher than the nearest neighbor supervised classification method. The experiment certificated that the proposed method was an effective extraction method of grassland information, which could enhance the boundary of grassland classification and avoid the restriction of grassland distribution scale. This method was also applicable to the extraction of grassland information in other regions with complicated spatial features, which could avoid the interference of woodland, arable land and water body effectively.

  18. Non-destructive geographical traceability of sea cucumber (Apostichopus japonicus) using near infrared spectroscopy combined with chemometric methods

    PubMed Central

    Cai, Rui; Wang, Shisheng; Tang, Bo; Li, Yueqing; Zhao, Weijie

    2018-01-01

    Sea cucumber is the major tonic seafood worldwide, and geographical origin traceability is an important part of its quality and safety control. In this work, a non-destructive method for origin traceability of sea cucumber (Apostichopus japonicus) from northern China Sea and East China Sea using near infrared spectroscopy (NIRS) and multivariate analysis methods was proposed. Total fat contents of 189 fresh sea cucumber samples were determined and partial least-squares (PLS) regression was used to establish the quantitative NIRS model. The ordered predictor selection algorithm was performed to select feasible wavelength regions for the construction of PLS and identification models. The identification model was developed by principal component analysis combined with Mahalanobis distance and scaling to the first range algorithms. In the test set of the optimum PLS models, the root mean square error of prediction was 0.45, and correlation coefficient was 0.90. The correct classification rates of 100% were obtained in both identification calibration model and test model. The overall results indicated that NIRS method combined with chemometric analysis was a suitable tool for origin traceability and identification of fresh sea cucumber samples from nine origins in China. PMID:29410795

  19. Non-destructive geographical traceability of sea cucumber (Apostichopus japonicus) using near infrared spectroscopy combined with chemometric methods.

    PubMed

    Guo, Xiuhan; Cai, Rui; Wang, Shisheng; Tang, Bo; Li, Yueqing; Zhao, Weijie

    2018-01-01

    Sea cucumber is the major tonic seafood worldwide, and geographical origin traceability is an important part of its quality and safety control. In this work, a non-destructive method for origin traceability of sea cucumber ( Apostichopus japonicus ) from northern China Sea and East China Sea using near infrared spectroscopy (NIRS) and multivariate analysis methods was proposed. Total fat contents of 189 fresh sea cucumber samples were determined and partial least-squares (PLS) regression was used to establish the quantitative NIRS model. The ordered predictor selection algorithm was performed to select feasible wavelength regions for the construction of PLS and identification models. The identification model was developed by principal component analysis combined with Mahalanobis distance and scaling to the first range algorithms. In the test set of the optimum PLS models, the root mean square error of prediction was 0.45, and correlation coefficient was 0.90. The correct classification rates of 100% were obtained in both identification calibration model and test model. The overall results indicated that NIRS method combined with chemometric analysis was a suitable tool for origin traceability and identification of fresh sea cucumber samples from nine origins in China.

  20. Urban Image Classification: Per-Pixel Classifiers, Sub-Pixel Analysis, Object-Based Image Analysis, and Geospatial Methods. 10; Chapter

    NASA Technical Reports Server (NTRS)

    Myint, Soe W.; Mesev, Victor; Quattrochi, Dale; Wentz, Elizabeth A.

    2013-01-01

    Remote sensing methods used to generate base maps to analyze the urban environment rely predominantly on digital sensor data from space-borne platforms. This is due in part from new sources of high spatial resolution data covering the globe, a variety of multispectral and multitemporal sources, sophisticated statistical and geospatial methods, and compatibility with GIS data sources and methods. The goal of this chapter is to review the four groups of classification methods for digital sensor data from space-borne platforms; per-pixel, sub-pixel, object-based (spatial-based), and geospatial methods. Per-pixel methods are widely used methods that classify pixels into distinct categories based solely on the spectral and ancillary information within that pixel. They are used for simple calculations of environmental indices (e.g., NDVI) to sophisticated expert systems to assign urban land covers. Researchers recognize however, that even with the smallest pixel size the spectral information within a pixel is really a combination of multiple urban surfaces. Sub-pixel classification methods therefore aim to statistically quantify the mixture of surfaces to improve overall classification accuracy. While within pixel variations exist, there is also significant evidence that groups of nearby pixels have similar spectral information and therefore belong to the same classification category. Object-oriented methods have emerged that group pixels prior to classification based on spectral similarity and spatial proximity. Classification accuracy using object-based methods show significant success and promise for numerous urban 3 applications. Like the object-oriented methods that recognize the importance of spatial proximity, geospatial methods for urban mapping also utilize neighboring pixels in the classification process. The primary difference though is that geostatistical methods (e.g., spatial autocorrelation methods) are utilized during both the pre- and post-classification steps. Within this chapter, each of the four approaches is described in terms of scale and accuracy classifying urban land use and urban land cover; and for its range of urban applications. We demonstrate the overview of four main classification groups in Figure 1 while Table 1 details the approaches with respect to classification requirements and procedures (e.g., reflectance conversion, steps before training sample selection, training samples, spatial approaches commonly used, classifiers, primary inputs for classification, output structures, number of output layers, and accuracy assessment). The chapter concludes with a brief summary of the methods reviewed and the challenges that remain in developing new classification methods for improving the efficiency and accuracy of mapping urban areas.

  1. Effective classification of the prevalence of Schistosoma mansoni.

    PubMed

    Mitchell, Shira A; Pagano, Marcello

    2012-12-01

    To present an effective classification method based on the prevalence of Schistosoma mansoni in the community. We created decision rules (defined by cut-offs for number of positive slides), which account for imperfect sensitivity, both with a simple adjustment of fixed sensitivity and with a more complex adjustment of changing sensitivity with prevalence. To reduce screening costs while maintaining accuracy, we propose a pooled classification method. To estimate sensitivity, we use the De Vlas model for worm and egg distributions. We compare the proposed method with the standard method to investigate differences in efficiency, measured by number of slides read, and accuracy, measured by probability of correct classification. Modelling varying sensitivity lowers the lower cut-off more significantly than the upper cut-off, correctly classifying regions as moderate rather than lower, thus receiving life-saving treatment. The classification method goes directly to classification on the basis of positive pools, avoiding having to know sensitivity to estimate prevalence. For model parameter values describing worm and egg distributions among children, the pooled method with 25 slides achieves an expected 89.9% probability of correct classification, whereas the standard method with 50 slides achieves 88.7%. Among children, it is more efficient and more accurate to use the pooled method for classification of S. mansoni prevalence than the current standard method. © 2012 Blackwell Publishing Ltd.

  2. Multivariate adaptive regression splines analysis to predict biomarkers of spontaneous preterm birth.

    PubMed

    Menon, Ramkumar; Bhat, Geeta; Saade, George R; Spratt, Heidi

    2014-04-01

    To develop classification models of demographic/clinical factors and biomarker data from spontaneous preterm birth in African Americans and Caucasians. Secondary analysis of biomarker data using multivariate adaptive regression splines (MARS), a supervised machine learning algorithm method. Analysis of data on 36 biomarkers from 191 women was reduced by MARS to develop predictive models for preterm birth in African Americans and Caucasians. Maternal plasma, cord plasma collected at admission for preterm or term labor and amniotic fluid at delivery. Data were partitioned into training and testing sets. Variable importance, a relative indicator (0-100%) and area under the receiver operating characteristic curve (AUC) characterized results. Multivariate adaptive regression splines generated models for combined and racially stratified biomarker data. Clinical and demographic data did not contribute to the model. Racial stratification of data produced distinct models in all three compartments. In African Americans maternal plasma samples IL-1RA, TNF-α, angiopoietin 2, TNFRI, IL-5, MIP1α, IL-1β and TGF-α modeled preterm birth (AUC train: 0.98, AUC test: 0.86). In Caucasians TNFR1, ICAM-1 and IL-1RA contributed to the model (AUC train: 0.84, AUC test: 0.68). African Americans cord plasma samples produced IL-12P70, IL-8 (AUC train: 0.82, AUC test: 0.66). Cord plasma in Caucasians modeled IGFII, PDGFBB, TGF-β1 , IL-12P70, and TIMP1 (AUC train: 0.99, AUC test: 0.82). Amniotic fluid in African Americans modeled FasL, TNFRII, RANTES, KGF, IGFI (AUC train: 0.95, AUC test: 0.89) and in Caucasians, TNF-α, MCP3, TGF-β3 , TNFR1 and angiopoietin 2 (AUC train: 0.94 AUC test: 0.79). Multivariate adaptive regression splines models multiple biomarkers associated with preterm birth and demonstrated racial disparity. © 2014 Nordic Federation of Societies of Obstetrics and Gynecology.

  3. Deconstructing multivariate decoding for the study of brain function.

    PubMed

    Hebart, Martin N; Baker, Chris I

    2017-08-04

    Multivariate decoding methods were developed originally as tools to enable accurate predictions in real-world applications. The realization that these methods can also be employed to study brain function has led to their widespread adoption in the neurosciences. However, prior to the rise of multivariate decoding, the study of brain function was firmly embedded in a statistical philosophy grounded on univariate methods of data analysis. In this way, multivariate decoding for brain interpretation grew out of two established frameworks: multivariate decoding for predictions in real-world applications, and classical univariate analysis based on the study and interpretation of brain activation. We argue that this led to two confusions, one reflecting a mixture of multivariate decoding for prediction or interpretation, and the other a mixture of the conceptual and statistical philosophies underlying multivariate decoding and classical univariate analysis. Here we attempt to systematically disambiguate multivariate decoding for the study of brain function from the frameworks it grew out of. After elaborating these confusions and their consequences, we describe six, often unappreciated, differences between classical univariate analysis and multivariate decoding. We then focus on how the common interpretation of what is signal and noise changes in multivariate decoding. Finally, we use four examples to illustrate where these confusions may impact the interpretation of neuroimaging data. We conclude with a discussion of potential strategies to help resolve these confusions in interpreting multivariate decoding results, including the potential departure from multivariate decoding methods for the study of brain function. Copyright © 2017. Published by Elsevier Inc.

  4. Multivariate analysis in thoracic research.

    PubMed

    Mengual-Macenlle, Noemí; Marcos, Pedro J; Golpe, Rafael; González-Rivas, Diego

    2015-03-01

    Multivariate analysis is based in observation and analysis of more than one statistical outcome variable at a time. In design and analysis, the technique is used to perform trade studies across multiple dimensions while taking into account the effects of all variables on the responses of interest. The development of multivariate methods emerged to analyze large databases and increasingly complex data. Since the best way to represent the knowledge of reality is the modeling, we should use multivariate statistical methods. Multivariate methods are designed to simultaneously analyze data sets, i.e., the analysis of different variables for each person or object studied. Keep in mind at all times that all variables must be treated accurately reflect the reality of the problem addressed. There are different types of multivariate analysis and each one should be employed according to the type of variables to analyze: dependent, interdependence and structural methods. In conclusion, multivariate methods are ideal for the analysis of large data sets and to find the cause and effect relationships between variables; there is a wide range of analysis types that we can use.

  5. Correlative and multivariate analysis of increased radon concentration in underground laboratory.

    PubMed

    Maletić, Dimitrije M; Udovičić, Vladimir I; Banjanac, Radomir M; Joković, Dejan R; Dragić, Aleksandar L; Veselinović, Nikola B; Filipović, Jelena

    2014-11-01

    The results of analysis using correlative and multivariate methods, as developed for data analysis in high-energy physics and implemented in the Toolkit for Multivariate Analysis software package, of the relations of the variation of increased radon concentration with climate variables in shallow underground laboratory is presented. Multivariate regression analysis identified a number of multivariate methods which can give a good evaluation of increased radon concentrations based on climate variables. The use of the multivariate regression methods will enable the investigation of the relations of specific climate variable with increased radon concentrations by analysis of regression methods resulting in 'mapped' underlying functional behaviour of radon concentrations depending on a wide spectrum of climate variables. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  6. Multivariate missing data in hydrology - Review and applications

    NASA Astrophysics Data System (ADS)

    Ben Aissia, Mohamed-Aymen; Chebana, Fateh; Ouarda, Taha B. M. J.

    2017-12-01

    Water resources planning and management require complete data sets of a number of hydrological variables, such as flood peaks and volumes. However, hydrologists are often faced with the problem of missing data (MD) in hydrological databases. Several methods are used to deal with the imputation of MD. During the last decade, multivariate approaches have gained popularity in the field of hydrology, especially in hydrological frequency analysis (HFA). However, treating the MD remains neglected in the multivariate HFA literature whereas the focus has been mainly on the modeling component. For a complete analysis and in order to optimize the use of data, MD should also be treated in the multivariate setting prior to modeling and inference. Imputation of MD in the multivariate hydrological framework can have direct implications on the quality of the estimation. Indeed, the dependence between the series represents important additional information that can be included in the imputation process. The objective of the present paper is to highlight the importance of treating MD in multivariate hydrological frequency analysis by reviewing and applying multivariate imputation methods and by comparing univariate and multivariate imputation methods. An application is carried out for multiple flood attributes on three sites in order to evaluate the performance of the different methods based on the leave-one-out procedure. The results indicate that, the performance of imputation methods can be improved by adopting the multivariate setting, compared to mean substitution and interpolation methods, especially when using the copula-based approach.

  7. Multivariate meta-analysis: potential and promise.

    PubMed

    Jackson, Dan; Riley, Richard; White, Ian R

    2011-09-10

    The multivariate random effects model is a generalization of the standard univariate model. Multivariate meta-analysis is becoming more commonly used and the techniques and related computer software, although continually under development, are now in place. In order to raise awareness of the multivariate methods, and discuss their advantages and disadvantages, we organized a one day 'Multivariate meta-analysis' event at the Royal Statistical Society. In addition to disseminating the most recent developments, we also received an abundance of comments, concerns, insights, critiques and encouragement. This article provides a balanced account of the day's discourse. By giving others the opportunity to respond to our assessment, we hope to ensure that the various view points and opinions are aired before multivariate meta-analysis simply becomes another widely used de facto method without any proper consideration of it by the medical statistics community. We describe the areas of application that multivariate meta-analysis has found, the methods available, the difficulties typically encountered and the arguments for and against the multivariate methods, using four representative but contrasting examples. We conclude that the multivariate methods can be useful, and in particular can provide estimates with better statistical properties, but also that these benefits come at the price of making more assumptions which do not result in better inference in every case. Although there is evidence that multivariate meta-analysis has considerable potential, it must be even more carefully applied than its univariate counterpart in practice. Copyright © 2011 John Wiley & Sons, Ltd.

  8. Multivariate meta-analysis: Potential and promise

    PubMed Central

    Jackson, Dan; Riley, Richard; White, Ian R

    2011-01-01

    The multivariate random effects model is a generalization of the standard univariate model. Multivariate meta-analysis is becoming more commonly used and the techniques and related computer software, although continually under development, are now in place. In order to raise awareness of the multivariate methods, and discuss their advantages and disadvantages, we organized a one day ‘Multivariate meta-analysis’ event at the Royal Statistical Society. In addition to disseminating the most recent developments, we also received an abundance of comments, concerns, insights, critiques and encouragement. This article provides a balanced account of the day's discourse. By giving others the opportunity to respond to our assessment, we hope to ensure that the various view points and opinions are aired before multivariate meta-analysis simply becomes another widely used de facto method without any proper consideration of it by the medical statistics community. We describe the areas of application that multivariate meta-analysis has found, the methods available, the difficulties typically encountered and the arguments for and against the multivariate methods, using four representative but contrasting examples. We conclude that the multivariate methods can be useful, and in particular can provide estimates with better statistical properties, but also that these benefits come at the price of making more assumptions which do not result in better inference in every case. Although there is evidence that multivariate meta-analysis has considerable potential, it must be even more carefully applied than its univariate counterpart in practice. Copyright © 2011 John Wiley & Sons, Ltd. PMID:21268052

  9. Plant species classification using flower images—A comparative study of local feature representations

    PubMed Central

    Seeland, Marco; Rzanny, Michael; Alaqraa, Nedal; Wäldchen, Jana; Mäder, Patrick

    2017-01-01

    Steady improvements of image description methods induced a growing interest in image-based plant species classification, a task vital to the study of biodiversity and ecological sensitivity. Various techniques have been proposed for general object classification over the past years and several of them have already been studied for plant species classification. However, results of these studies are selective in the evaluated steps of a classification pipeline, in the utilized datasets for evaluation, and in the compared baseline methods. No study is available that evaluates the main competing methods for building an image representation on the same datasets allowing for generalized findings regarding flower-based plant species classification. The aim of this paper is to comparatively evaluate methods, method combinations, and their parameters towards classification accuracy. The investigated methods span from detection, extraction, fusion, pooling, to encoding of local features for quantifying shape and color information of flower images. We selected the flower image datasets Oxford Flower 17 and Oxford Flower 102 as well as our own Jena Flower 30 dataset for our experiments. Findings show large differences among the various studied techniques and that their wisely chosen orchestration allows for high accuracies in species classification. We further found that true local feature detectors in combination with advanced encoding methods yield higher classification results at lower computational costs compared to commonly used dense sampling and spatial pooling methods. Color was found to be an indispensable feature for high classification results, especially while preserving spatial correspondence to gray-level features. In result, our study provides a comprehensive overview of competing techniques and the implications of their main parameters for flower-based plant species classification. PMID:28234999

  10. Application of multivariate statistics to vestibular testing: discriminating between Meniere's disease and migraine associated dizziness

    NASA Technical Reports Server (NTRS)

    Dimitri, P. S.; Wall, C. 3rd; Oas, J. G.; Rauch, S. D.

    2001-01-01

    Meniere's disease (MD) and migraine associated dizziness (MAD) are two disorders that can have similar symptomatologies, but differ vastly in treatment. Vestibular testing is sometimes used to help differentiate between these disorders, but the inefficiency of a human interpreter analyzing a multitude of variables independently decreases its utility. Our hypothesis was that we could objectively discriminate between patients with MD and those with MAD using select variables from the vestibular test battery. Sinusoidal harmonic acceleration test variables were reduced to three vestibulo-ocular reflex physiologic parameters: gain, time constant, and asymmetry. A combination of these parameters plus a measurement of reduced vestibular response from caloric testing allowed us to achieve a joint classification rate of 91%, independent quadratic classification algorithm. Data from posturography were not useful for this type of differentiation. Overall, our classification function can be used as an unbiased assistant to discriminate between MD and MAD and gave us insight into the pathophysiologic differences between the two disorders.

  11. A Raman spectroscopy bio-sensor for tissue discrimination in surgical robotics.

    PubMed

    Ashok, Praveen C; Giardini, Mario E; Dholakia, Kishan; Sibbett, Wilson

    2014-01-01

    We report the development of a fiber-based Raman sensor to be used in tumour margin identification during endoluminal robotic surgery. Although this is a generic platform, the sensor we describe was adapted for the ARAKNES (Array of Robots Augmenting the KiNematics of Endoluminal Surgery) robotic platform. On such a platform, the Raman sensor is intended to identify ambiguous tissue margins during robot-assisted surgeries. To maintain sterility of the probe during surgical intervention, a disposable sleeve was specially designed. A straightforward user-compatible interface was implemented where a supervised multivariate classification algorithm was used to classify different tissue types based on specific Raman fingerprints so that it could be used without prior knowledge of spectroscopic data analysis. The protocol avoids inter-patient variability in data and the sensor system is not restricted for use in the classification of a particular tissue type. Representative tissue classification assessments were performed using this system on excised tissue. Copyright © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  12. Multiple-factor classification of a human-modified forest landscape in the Hsuehshan Mountain Range, Taiwan.

    PubMed

    Berg, Kevan J; Icyeh, Lahuy; Lin, Yih-Ren; Janz, Arnold; Newmaster, Steven G

    2016-12-01

    Human actions drive landscape heterogeneity, yet most ecosystem classifications omit the role of human influence. This study explores land use history to inform a classification of forestland of the Tayal Mrqwang indigenous people of Taiwan. Our objectives were to determine the extent to which human action drives landscape heterogeneity. We used interviews, field sampling, and multivariate analysis to relate vegetation patterns to environmental gradients and human modification across 76 sites. We identified eleven forest classes. In total, around 70 % of plots were at lower elevations and had a history of shifting cultivation, terrace farming, and settlement that resulted in alder, laurel, oak, pine, and bamboo stands. Higher elevation mixed conifer forests were least disturbed. Arboriculture and selective harvesting were drivers of other conspicuous forest patterns. The findings show that past land uses play a key role in shaping forests, which is important to consider when setting targets to guide forest management.

  13. Analysis of failure in patients with adenoid cystic carcinoma of the head and neck. An international collaborative study.

    PubMed

    Amit, Moran; Binenbaum, Yoav; Sharma, Kanika; Ramer, Naomi; Ramer, Ilana; Agbetoba, Abib; Miles, Brett; Yang, Xinjie; Lei, Delin; Bjøerndal, Kristine; Godballe, Christian; Mücke, Thomas; Wolff, Klaus-Dietrich; Fliss, Dan; Eckardt, André M; Copelli, Chiara; Sesenna, Enrico; Palmer, Frank; Patel, Snehal; Gil, Ziv

    2014-07-01

    Adenoid cystic carcinoma (ACC) is a locally aggressive tumor with a high prevalence of distant metastases. The purpose of this study was to identify independent predictors of outcome and to characterize the patterns of failure. An international retrospective review was conducted of 489 patients with ACC treated between 1985 and 2011 in 9 cancer centers worldwide. Five-year overall-survival (OS), disease-specific survival (DSS), and disease-free survival (DFS) were 76%, 80%, and 68%, respectively. Independent predictors of OS and DSS were: age, site, N classification, and presence of distant metastases. N classification, age, and bone invasion were associated with DFS on multivariate analysis. Age, tumor site, orbital invasion, and N classification were independent predictors of distant metastases. The clinical course of ACC is slow but persistent. Paranasal sinus origin is associated with the lowest distant metastases rate but with the poorest outcome. These prognostic estimates should be considered when tailoring treatment for patients with ACC. Copyright © 2013 Wiley Periodicals, Inc.

  14. Methods for presentation and display of multivariate data

    NASA Technical Reports Server (NTRS)

    Myers, R. H.

    1981-01-01

    Methods for the presentation and display of multivariate data are discussed with emphasis placed on the multivariate analysis of variance problems and the Hotelling T(2) solution in the two-sample case. The methods utilize the concepts of stepwise discrimination analysis and the computation of partial correlation coefficients.

  15. Comparison of Feature Selection Techniques in Machine Learning for Anatomical Brain MRI in Dementia.

    PubMed

    Tohka, Jussi; Moradi, Elaheh; Huttunen, Heikki

    2016-07-01

    We present a comparative split-half resampling analysis of various data driven feature selection and classification methods for the whole brain voxel-based classification analysis of anatomical magnetic resonance images. We compared support vector machines (SVMs), with or without filter based feature selection, several embedded feature selection methods and stability selection. While comparisons of the accuracy of various classification methods have been reported previously, the variability of the out-of-training sample classification accuracy and the set of selected features due to independent training and test sets have not been previously addressed in a brain imaging context. We studied two classification problems: 1) Alzheimer's disease (AD) vs. normal control (NC) and 2) mild cognitive impairment (MCI) vs. NC classification. In AD vs. NC classification, the variability in the test accuracy due to the subject sample did not vary between different methods and exceeded the variability due to different classifiers. In MCI vs. NC classification, particularly with a large training set, embedded feature selection methods outperformed SVM-based ones with the difference in the test accuracy exceeding the test accuracy variability due to the subject sample. The filter and embedded methods produced divergent feature patterns for MCI vs. NC classification that suggests the utility of the embedded feature selection for this problem when linked with the good generalization performance. The stability of the feature sets was strongly correlated with the number of features selected, weakly correlated with the stability of classification accuracy, and uncorrelated with the average classification accuracy.

  16. New wideband radar target classification method based on neural learning and modified Euclidean metric

    NASA Astrophysics Data System (ADS)

    Jiang, Yicheng; Cheng, Ping; Ou, Yangkui

    2001-09-01

    A new method for target classification of high-range resolution radar is proposed. It tries to use neural learning to obtain invariant subclass features of training range profiles. A modified Euclidean metric based on the Box-Cox transformation technique is investigated for Nearest Neighbor target classification improvement. The classification experiments using real radar data of three different aircraft have demonstrated that classification error can reduce 8% if this method proposed in this paper is chosen instead of the conventional method. The results of this paper have shown that by choosing an optimized metric, it is indeed possible to reduce the classification error without increasing the number of samples.

  17. Prognostic value of the new International Association for the Study of Lung Cancer/American Thoracic Society/European Respiratory Society classification in stage IB lung adenocarcinoma.

    PubMed

    Xu, C-h; Wang, W; Wei, Y; Hu, H-d; Zou, J; Yan, J; Yu, L-k; Yang, R-s; Wang, Y

    2015-10-01

    Patients with pathological stage IB lung adenocarcinoma have a variable prognosis, even if received the same treatment. This study investigated the prognostic value of the new International Association for the Study of Lung Cancer, American Thoracic Society, and European Respiratory Society (IASLC/ATS/ERS) lung adenocarcinoma classification in resected stage IB lung adenocarcinoma. We identified 276 patients with pathological stage IB adenocarcinoma who had undergone surgical resection at the Nanjing Chest Hospital between 2005 and 2010. The histological subtypes of all patients were classified according to the 2011 IASLC/ATS/ERS international multidisciplinary lung adenocarcinoma classification. Kaplan-Meier and Cox regression analyses were used to analyze the correlation between the IASLC/ATS/ERS classification and patients' prognosis. Two hundred and seventy-six patients with pathological stage IB adenocarcinoma had an 86.2% 5-year overall survival (OS) and 80.4% 5-year disease-free survival (DFS). Patients with micropapillary and solid predominant tumors had a significantly worse OS and DFS as compared to those with other subtypes predominant tumors (p = 0.003 and 0.001). Multivariate analysis revealed that the new classification was an independent prognostic factor for both OS and DFS of pathological stage IB adenocarcinoma (p = 0.009 and 0.003). Our study revealed that the new IASLC/ATS/ERS classification was an independent prognostic factor of pathological stage IB adenocarcinoma. This new classification is valuable of screening out high risk patients to receive postoperative adjuvant therapy. Copyright © 2015. Published by Elsevier Ltd.

  18. Classification of broiler breast filets according to deboning time using near infrared spectroscopy and multivariate analysis

    USDA-ARS?s Scientific Manuscript database

    Chicken breast filets were deboned and NIR spectra were collected after 2, 4, and 24 hours. The deboning was performed on pairs of filets to minimize differences due only to the meat and not the deboning time (i.e. right at 2 hours, left at 24; right at 2, left at 4; right at 4, left at 24 hrs). The...

  19. Use of the Köppen-Trewartha climate classification to evaluate climatic refugia in statistically derived ecoregions for the People’s Republic of China

    Treesearch

    B. Baker; Henry Diaz; William Hargrove; Forrest Hoffman

    2010-01-01

    Changes in climate as projected by state-of-the-art climate models are likely to result in novel combinations of climate and topo-edaphic factors that will have substantial impacts on the distribution and persistence of natural vegetation and animal species. We have used multivariate techniques to quantify some of these changes; the...

  20. Biometrics from the carbon isotope ratio analysis of amino acids in human hair.

    PubMed

    Jackson, Glen P; An, Yan; Konstantynova, Kateryna I; Rashaid, Ayat H B

    2015-01-01

    This study compares and contrasts the ability to classify individuals into different grouping factors through either bulk isotope ratio analysis or amino-acid-specific isotope ratio analysis of human hair. Using LC-IRMS, we measured the isotope ratios of 14 amino acids in hair proteins independently, and leucine/isoleucine as a co-eluting pair, to provide 15 variables for classification. Multivariate analysis confirmed that the essential amino acids and non-essential amino acids were mostly independent variables in the classification rules, thereby enabling the separation of dietary factors of isotope intake from intrinsic or phenotypic factors of isotope fractionation. Multivariate analysis revealed at least two potential sources of non-dietary factors influencing the carbon isotope ratio values of the amino acids in human hair: body mass index (BMI) and age. These results provide evidence that compound-specific isotope ratio analysis has the potential to go beyond region-of-origin or geospatial movements of individuals-obtainable through bulk isotope measurements-to the provision of physical and characteristic traits about the individuals, such as age and BMI. Further development and refinement, for example to genetic, metabolic, disease and hormonal factors could ultimately be of great assistance in forensic and clinical casework. Copyright © 2014 Forensic Science Society. Published by Elsevier Ireland Ltd. All rights reserved.

Top