Sample records for multivariate classification techniques

  1. Multivariate classification of infrared spectra of cell and tissue samples

    DOEpatents

    Haaland, David M.; Jones, Howland D. T.; Thomas, Edward V.

    1997-01-01

    Multivariate classification techniques are applied to spectra from cell and tissue samples irradiated with infrared radiation to determine if the samples are normal or abnormal (cancerous). Mid and near infrared radiation can be used for in vivo and in vitro classifications using at least different wavelengths.

  2. Data analysis techniques

    NASA Technical Reports Server (NTRS)

    Park, Steve

    1990-01-01

    A large and diverse number of computational techniques are routinely used to process and analyze remotely sensed data. These techniques include: univariate statistics; multivariate statistics; principal component analysis; pattern recognition and classification; other multivariate techniques; geometric correction; registration and resampling; radiometric correction; enhancement; restoration; Fourier analysis; and filtering. Each of these techniques will be considered, in order.

  3. Estuarial fingerprinting through multidimensional fluorescence and multivariate analysis.

    PubMed

    Hall, Gregory J; Clow, Kerin E; Kenny, Jonathan E

    2005-10-01

    As part of a strategy for preventing the introduction of aquatic nuisance species (ANS) to U.S. estuaries, ballast water exchange (BWE) regulations have been imposed. Enforcing these regulations requires a reliable method for determining the port of origin of water in the ballast tanks of ships entering U.S. waters. This study shows that a three-dimensional fluorescence fingerprinting technique, excitation emission matrix (EEM) spectroscopy, holds great promise as a ballast water analysis tool. In our technique, EEMs are analyzed by multivariate classification and curve resolution methods, such as N-way partial least squares Regression-discriminant analysis (NPLS-DA) and parallel factor analysis (PARAFAC). We demonstrate that classification techniques can be used to discriminate among sampling sites less than 10 miles apart, encompassing Boston Harbor and two tributaries in the Mystic River Watershed. To our knowledge, this work is the first to use multivariate analysis to classify water as to location of origin. Furthermore, it is shown that curve resolution can show seasonal features within the multidimensional fluorescence data sets, which correlate with difficulty in classification.

  4. Forensic Discrimination of Latent Fingerprints Using Laser-Induced Breakdown Spectroscopy (LIBS) and Chemometric Approaches.

    PubMed

    Yang, Jun-Ho; Yoh, Jack J

    2018-01-01

    A novel technique is reported for separating overlapping latent fingerprints using chemometric approaches that combine laser-induced breakdown spectroscopy (LIBS) and multivariate analysis. The LIBS technique provides the capability of real time analysis and high frequency scanning as well as the data regarding the chemical composition of overlapping latent fingerprints. These spectra offer valuable information for the classification and reconstruction of overlapping latent fingerprints by implementing appropriate statistical multivariate analysis. The current study employs principal component analysis and partial least square methods for the classification of latent fingerprints from the LIBS spectra. This technique was successfully demonstrated through a classification study of four distinct latent fingerprints using classification methods such as soft independent modeling of class analogy (SIMCA) and partial least squares discriminant analysis (PLS-DA). The novel method yielded an accuracy of more than 85% and was proven to be sufficiently robust. Furthermore, through laser scanning analysis at a spatial interval of 125 µm, the overlapping fingerprints were reconstructed as separate two-dimensional forms.

  5. Multivariate statistical analysis software technologies for astrophysical research involving large data bases

    NASA Technical Reports Server (NTRS)

    Djorgovski, George

    1993-01-01

    The existing and forthcoming data bases from NASA missions contain an abundance of information whose complexity cannot be efficiently tapped with simple statistical techniques. Powerful multivariate statistical methods already exist which can be used to harness much of the richness of these data. Automatic classification techniques have been developed to solve the problem of identifying known types of objects in multiparameter data sets, in addition to leading to the discovery of new physical phenomena and classes of objects. We propose an exploratory study and integration of promising techniques in the development of a general and modular classification/analysis system for very large data bases, which would enhance and optimize data management and the use of human research resource.

  6. Multivariate statistical analysis software technologies for astrophysical research involving large data bases

    NASA Technical Reports Server (NTRS)

    Djorgovski, Stanislav

    1992-01-01

    The existing and forthcoming data bases from NASA missions contain an abundance of information whose complexity cannot be efficiently tapped with simple statistical techniques. Powerful multivariate statistical methods already exist which can be used to harness much of the richness of these data. Automatic classification techniques have been developed to solve the problem of identifying known types of objects in multi parameter data sets, in addition to leading to the discovery of new physical phenomena and classes of objects. We propose an exploratory study and integration of promising techniques in the development of a general and modular classification/analysis system for very large data bases, which would enhance and optimize data management and the use of human research resources.

  7. Combination of laser-induced breakdown spectroscopy and Raman spectroscopy for multivariate classification of bacteria

    NASA Astrophysics Data System (ADS)

    Prochazka, D.; Mazura, M.; Samek, O.; Rebrošová, K.; Pořízka, P.; Klus, J.; Prochazková, P.; Novotný, J.; Novotný, K.; Kaiser, J.

    2018-01-01

    In this work, we investigate the impact of data provided by complementary laser-based spectroscopic methods on multivariate classification accuracy. Discrimination and classification of five Staphylococcus bacterial strains and one strain of Escherichia coli is presented. The technique that we used for measurements is a combination of Raman spectroscopy and Laser-Induced Breakdown Spectroscopy (LIBS). Obtained spectroscopic data were then processed using Multivariate Data Analysis algorithms. Principal Components Analysis (PCA) was selected as the most suitable technique for visualization of bacterial strains data. To classify the bacterial strains, we used Neural Networks, namely a supervised version of Kohonen's self-organizing maps (SOM). We were processing results in three different ways - separately from LIBS measurements, from Raman measurements, and we also merged data from both mentioned methods. The three types of results were then compared. By applying the PCA to Raman spectroscopy data, we observed that two bacterial strains were fully distinguished from the rest of the data set. In the case of LIBS data, three bacterial strains were fully discriminated. Using a combination of data from both methods, we achieved the complete discrimination of all bacterial strains. All the data were classified with a high success rate using SOM algorithm. The most accurate classification was obtained using a combination of data from both techniques. The classification accuracy varied, depending on specific samples and techniques. As for LIBS, the classification accuracy ranged from 45% to 100%, as for Raman Spectroscopy from 50% to 100% and in case of merged data, all samples were classified correctly. Based on the results of the experiments presented in this work, we can assume that the combination of Raman spectroscopy and LIBS significantly enhances discrimination and classification accuracy of bacterial species and strains. The reason is the complementarity in obtained chemical information while using these two methods.

  8. Fast-HPLC Fingerprinting to Discriminate Olive Oil from Other Edible Vegetable Oils by Multivariate Classification Methods.

    PubMed

    Jiménez-Carvelo, Ana M; González-Casado, Antonio; Pérez-Castaño, Estefanía; Cuadros-Rodríguez, Luis

    2017-03-01

    A new analytical method for the differentiation of olive oil from other vegetable oils using reversed-phase LC and applying chemometric techniques was developed. A 3 cm short column was used to obtain the chromatographic fingerprint of the methyl-transesterified fraction of each vegetable oil. The chromatographic analysis took only 4 min. The multivariate classification methods used were k-nearest neighbors, partial least-squares (PLS) discriminant analysis, one-class PLS, support vector machine classification, and soft independent modeling of class analogies. The discrimination of olive oil from other vegetable edible oils was evaluated by several classification quality metrics. Several strategies for the classification of the olive oil were used: one input-class, two input-class, and pseudo two input-class.

  9. Large Uptake of Titania and Iron Oxide Nanoparticles in the Nucleus of Lung Epithelial Cells as Measured by Raman Imaging and Multivariate Classification

    PubMed Central

    Ahlinder, Linnea; Ekstrand-Hammarström, Barbro; Geladi, Paul; Österlund, Lars

    2013-01-01

    It is a challenging task to characterize the biodistribution of nanoparticles in cells and tissue on a subcellular level. Conventional methods to study the interaction of nanoparticles with living cells rely on labeling techniques that either selectively stain the particles or selectively tag them with tracer molecules. In this work, Raman imaging, a label-free technique that requires no extensive sample preparation, was combined with multivariate classification to quantify the spatial distribution of oxide nanoparticles inside living lung epithelial cells (A549). Cells were exposed to TiO2 (titania) and/or α-FeO(OH) (goethite) nanoparticles at various incubation times (4 or 48 h). Using multivariate classification of hyperspectral Raman data with partial least-squares discriminant analysis, we show that a surprisingly large fraction of spectra, classified as belonging to the cell nucleus, show Raman bands associated with nanoparticles. Up to 40% of spectra from the cell nucleus show Raman bands associated with nanoparticles. Complementary transmission electron microscopy data for thin cell sections qualitatively support the conclusions. PMID:23870252

  10. An efficient swarm intelligence approach to feature selection based on invasive weed optimization: Application to multivariate calibration and classification using spectroscopic data

    NASA Astrophysics Data System (ADS)

    Sheykhizadeh, Saheleh; Naseri, Abdolhossein

    2018-04-01

    Variable selection plays a key role in classification and multivariate calibration. Variable selection methods are aimed at choosing a set of variables, from a large pool of available predictors, relevant to the analyte concentrations estimation, or to achieve better classification results. Many variable selection techniques have now been introduced among which, those which are based on the methodologies of swarm intelligence optimization have been more respected during a few last decades since they are mainly inspired by nature. In this work, a simple and new variable selection algorithm is proposed according to the invasive weed optimization (IWO) concept. IWO is considered a bio-inspired metaheuristic mimicking the weeds ecological behavior in colonizing as well as finding an appropriate place for growth and reproduction; it has been shown to be very adaptive and powerful to environmental changes. In this paper, the first application of IWO, as a very simple and powerful method, to variable selection is reported using different experimental datasets including FTIR and NIR data, so as to undertake classification and multivariate calibration tasks. Accordingly, invasive weed optimization - linear discrimination analysis (IWO-LDA) and invasive weed optimization- partial least squares (IWO-PLS) are introduced for multivariate classification and calibration, respectively.

  11. An efficient swarm intelligence approach to feature selection based on invasive weed optimization: Application to multivariate calibration and classification using spectroscopic data.

    PubMed

    Sheykhizadeh, Saheleh; Naseri, Abdolhossein

    2018-04-05

    Variable selection plays a key role in classification and multivariate calibration. Variable selection methods are aimed at choosing a set of variables, from a large pool of available predictors, relevant to the analyte concentrations estimation, or to achieve better classification results. Many variable selection techniques have now been introduced among which, those which are based on the methodologies of swarm intelligence optimization have been more respected during a few last decades since they are mainly inspired by nature. In this work, a simple and new variable selection algorithm is proposed according to the invasive weed optimization (IWO) concept. IWO is considered a bio-inspired metaheuristic mimicking the weeds ecological behavior in colonizing as well as finding an appropriate place for growth and reproduction; it has been shown to be very adaptive and powerful to environmental changes. In this paper, the first application of IWO, as a very simple and powerful method, to variable selection is reported using different experimental datasets including FTIR and NIR data, so as to undertake classification and multivariate calibration tasks. Accordingly, invasive weed optimization - linear discrimination analysis (IWO-LDA) and invasive weed optimization- partial least squares (IWO-PLS) are introduced for multivariate classification and calibration, respectively. Copyright © 2018 Elsevier B.V. All rights reserved.

  12. Multivariate geometry as an approach to algal community analysis

    USGS Publications Warehouse

    Allen, T.F.H.; Skagen, S.

    1973-01-01

    Multivariate analyses are put in the context of more usual approaches to phycological investigations. The intuitive common-sense involved in methods of ordination, classification and discrimination are emphasised by simple geometric accounts which avoid jargon and matrix algebra. Warnings are given that artifacts result from technique abuses by the naive or over-enthusiastic. An analysis of a simple periphyton data set is presented as an example of the approach. Suggestions are made as to situations in phycological investigations, where the techniques could be appropriate. The discipline is reprimanded for its neglect of the multivariate approach.

  13. FT-Raman and NIR spectroscopy data fusion strategy for multivariate qualitative analysis of food fraud.

    PubMed

    Márquez, Cristina; López, M Isabel; Ruisánchez, Itziar; Callao, M Pilar

    2016-12-01

    Two data fusion strategies (high- and mid-level) combined with a multivariate classification approach (Soft Independent Modelling of Class Analogy, SIMCA) have been applied to take advantage of the synergistic effect of the information obtained from two spectroscopic techniques: FT-Raman and NIR. Mid-level data fusion consists of merging some of the previous selected variables from the spectra obtained from each spectroscopic technique and then applying the classification technique. High-level data fusion combines the SIMCA classification results obtained individually from each spectroscopic technique. Of the possible ways to make the necessary combinations, we decided to use fuzzy aggregation connective operators. As a case study, we considered the possible adulteration of hazelnut paste with almond. Using the two-class SIMCA approach, class 1 consisted of unadulterated hazelnut samples and class 2 of samples adulterated with almond. Models performance was also studied with samples adulterated with chickpea. The results show that data fusion is an effective strategy since the performance parameters are better than the individual ones: sensitivity and specificity values between 75% and 100% for the individual techniques and between 96-100% and 88-100% for the mid- and high-level data fusion strategies, respectively. Copyright © 2016 Elsevier B.V. All rights reserved.

  14. Rapid differentiation of Ghana cocoa beans by FT-NIR spectroscopy coupled with multivariate classification

    NASA Astrophysics Data System (ADS)

    Teye, Ernest; Huang, Xingyi; Dai, Huang; Chen, Quansheng

    2013-10-01

    Quick, accurate and reliable technique for discrimination of cocoa beans according to geographical origin is essential for quality control and traceability management. This current study presents the application of Near Infrared Spectroscopy technique and multivariate classification for the differentiation of Ghana cocoa beans. A total of 194 cocoa bean samples from seven cocoa growing regions were used. Principal component analysis (PCA) was used to extract relevant information from the spectral data and this gave visible cluster trends. The performance of four multivariate classification methods: Linear discriminant analysis (LDA), K-nearest neighbors (KNN), Back propagation artificial neural network (BPANN) and Support vector machine (SVM) were compared. The performances of the models were optimized by cross validation. The results revealed that; SVM model was superior to all the mathematical methods with a discrimination rate of 100% in both the training and prediction set after preprocessing with Mean centering (MC). BPANN had a discrimination rate of 99.23% for the training set and 96.88% for prediction set. While LDA model had 96.15% and 90.63% for the training and prediction sets respectively. KNN model had 75.01% for the training set and 72.31% for prediction set. The non-linear classification methods used were superior to the linear ones. Generally, the results revealed that NIR Spectroscopy coupled with SVM model could be used successfully to discriminate cocoa beans according to their geographical origins for effective quality assurance.

  15. DEFINITION OF MULTIVARIATE GEOCHEMICAL ASSOCIATIONS WITH POLYMETALLIC MINERAL OCCURRENCES USING A SPATIALLY DEPENDENT CLUSTERING TECHNIQUE AND RASTERIZED STREAM SEDIMENT DATA - AN ALASKAN EXAMPLE.

    USGS Publications Warehouse

    Jenson, Susan K.; Trautwein, C.M.

    1984-01-01

    The application of an unsupervised, spatially dependent clustering technique (AMOEBA) to interpolated raster arrays of stream sediment data has been found to provide useful multivariate geochemical associations for modeling regional polymetallic resource potential. The technique is based on three assumptions regarding the compositional and spatial relationships of stream sediment data and their regional significance. These assumptions are: (1) compositionally separable classes exist and can be statistically distinguished; (2) the classification of multivariate data should minimize the pair probability of misclustering to establish useful compositional associations; and (3) a compositionally defined class represented by three or more contiguous cells within an array is a more important descriptor of a terrane than a class represented by spatial outliers.

  16. Design of neural networks for classification of remotely sensed imagery

    NASA Technical Reports Server (NTRS)

    Chettri, Samir R.; Cromp, Robert F.; Birmingham, Mark

    1992-01-01

    Classification accuracies of a backpropagation neural network are discussed and compared with a maximum likelihood classifier (MLC) with multivariate normal class models. We have found that, because of its nonparametric nature, the neural network outperforms the MLC in this area. In addition, we discuss techniques for constructing optimal neural nets on parallel hardware like the MasPar MP-1 currently at GSFC. Other important discussions are centered around training and classification times of the two methods, and sensitivity to the training data. Finally, we discuss future work in the area of classification and neural nets.

  17. Chemometric and multivariate statistical analysis of time-of-flight secondary ion mass spectrometry spectra from complex Cu-Fe sulfides.

    PubMed

    Kalegowda, Yogesh; Harmer, Sarah L

    2012-03-20

    Time-of-flight secondary ion mass spectrometry (TOF-SIMS) spectra of mineral samples are complex, comprised of large mass ranges and many peaks. Consequently, characterization and classification analysis of these systems is challenging. In this study, different chemometric and statistical data evaluation methods, based on monolayer sensitive TOF-SIMS data, have been tested for the characterization and classification of copper-iron sulfide minerals (chalcopyrite, chalcocite, bornite, and pyrite) at different flotation pulp conditions (feed, conditioned feed, and Eh modified). The complex mass spectral data sets were analyzed using the following chemometric and statistical techniques: principal component analysis (PCA); principal component-discriminant functional analysis (PC-DFA); soft independent modeling of class analogy (SIMCA); and k-Nearest Neighbor (k-NN) classification. PCA was found to be an important first step in multivariate analysis, providing insight into both the relative grouping of samples and the elemental/molecular basis for those groupings. For samples exposed to oxidative conditions (at Eh ~430 mV), each technique (PCA, PC-DFA, SIMCA, and k-NN) was found to produce excellent classification. For samples at reductive conditions (at Eh ~ -200 mV SHE), k-NN and SIMCA produced the most accurate classification. Phase identification of particles that contain the same elements but a different crystal structure in a mixed multimetal mineral system has been achieved.

  18. Simulation techniques for estimating error in the classification of normal patterns

    NASA Technical Reports Server (NTRS)

    Whitsitt, S. J.; Landgrebe, D. A.

    1974-01-01

    Methods of efficiently generating and classifying samples with specified multivariate normal distributions were discussed. Conservative confidence tables for sample sizes are given for selective sampling. Simulation results are compared with classified training data. Techniques for comparing error and separability measure for two normal patterns are investigated and used to display the relationship between the error and the Chernoff bound.

  19. The classification of secondary colorectal liver cancer in human biopsy samples using angular dispersive x-ray diffraction and multivariate analysis

    NASA Astrophysics Data System (ADS)

    Theodorakou, Chrysoula; Farquharson, Michael J.

    2009-08-01

    The motivation behind this study is to assess whether angular dispersive x-ray diffraction (ADXRD) data, processed using multivariate analysis techniques, can be used for classifying secondary colorectal liver cancer tissue and normal surrounding liver tissue in human liver biopsy samples. The ADXRD profiles from a total of 60 samples of normal liver tissue and colorectal liver metastases were measured using a synchrotron radiation source. The data were analysed for 56 samples using nonlinear peak-fitting software. Four peaks were fitted to all of the ADXRD profiles, and the amplitude, area, amplitude and area ratios for three of the four peaks were calculated and used for the statistical and multivariate analysis. The statistical analysis showed that there are significant differences between all the peak-fitting parameters and ratios between the normal and the diseased tissue groups. The technique of soft independent modelling of class analogy (SIMCA) was used to classify normal liver tissue and colorectal liver metastases resulting in 67% of the normal tissue samples and 60% of the secondary colorectal liver tissue samples being classified correctly. This study has shown that the ADXRD data of normal and secondary colorectal liver cancer are statistically different and x-ray diffraction data analysed using multivariate analysis have the potential to be used as a method of tissue classification.

  20. Newer classification and regression tree techniques: Bagging and Random Forests for ecological prediction

    Treesearch

    Anantha M. Prasad; Louis R. Iverson; Andy Liaw; Andy Liaw

    2006-01-01

    We evaluated four statistical models - Regression Tree Analysis (RTA), Bagging Trees (BT), Random Forests (RF), and Multivariate Adaptive Regression Splines (MARS) - for predictive vegetation mapping under current and future climate scenarios according to the Canadian Climate Centre global circulation model.

  1. Study of archaeological coins of different dynasties using libs coupled with multivariate analysis

    NASA Astrophysics Data System (ADS)

    Awasthi, Shikha; Kumar, Rohit; Rai, G. K.; Rai, A. K.

    2016-04-01

    Laser Induced Breakdown Spectroscopy (LIBS) is an atomic emission spectroscopic technique having unique capability of an in-situ monitoring tool for detection and quantification of elements present in different artifacts. Archaeological coins collected form G.R. Sharma Memorial Museum; University of Allahabad, India has been analyzed using LIBS technique. These coins were obtained from excavation of Kausambi, Uttar Pradesh, India. LIBS system assembled in the laboratory (laser Nd:YAG 532 nm, 4 ns pulse width FWHM with Ocean Optics LIBS 2000+ spectrometer) is employed for spectral acquisition. The spectral lines of Ag, Cu, Ca, Sn, Si, Fe and Mg are identified in the LIBS spectra of different coins. LIBS along with Multivariate Analysis play an effective role for classification and contribution of spectral lines in different coins. The discrimination between five coins with Archaeological interest has been carried out using Principal Component Analysis (PCA). The results show the potential relevancy of the methodology used in the elemental identification and classification of artifacts with high accuracy and robustness.

  2. Recent applications of multivariate data analysis methods in the authentication of rice and the most analyzed parameters: A review.

    PubMed

    Maione, Camila; Barbosa, Rommel Melgaço

    2018-01-24

    Rice is one of the most important staple foods around the world. Authentication of rice is one of the most addressed concerns in the present literature, which includes recognition of its geographical origin and variety, certification of organic rice and many other issues. Good results have been achieved by multivariate data analysis and data mining techniques when combined with specific parameters for ascertaining authenticity and many other useful characteristics of rice, such as quality, yield and others. This paper brings a review of the recent research projects on discrimination and authentication of rice using multivariate data analysis and data mining techniques. We found that data obtained from image processing, molecular and atomic spectroscopy, elemental fingerprinting, genetic markers, molecular content and others are promising sources of information regarding geographical origin, variety and other aspects of rice, being widely used combined with multivariate data analysis techniques. Principal component analysis and linear discriminant analysis are the preferred methods, but several other data classification techniques such as support vector machines, artificial neural networks and others are also frequently present in some studies and show high performance for discrimination of rice.

  3. Differences in chewing sounds of dry-crisp snacks by multivariate data analysis

    NASA Astrophysics Data System (ADS)

    De Belie, N.; Sivertsvik, M.; De Baerdemaeker, J.

    2003-09-01

    Chewing sounds of different types of dry-crisp snacks (two types of potato chips, prawn crackers, cornflakes and low calorie snacks from extruded starch) were analysed to assess differences in sound emission patterns. The emitted sounds were recorded by a microphone placed over the ear canal. The first bite and the first subsequent chew were selected from the time signal and a fast Fourier transformation provided the power spectra. Different multivariate analysis techniques were used for classification of the snack groups. This included principal component analysis (PCA) and unfold partial least-squares (PLS) algorithms, as well as multi-way techniques such as three-way PLS, three-way PCA (Tucker3), and parallel factor analysis (PARAFAC) on the first bite and subsequent chew. The models were evaluated by calculating the classification errors and the root mean square error of prediction (RMSEP) for independent validation sets. It appeared that the logarithm of the power spectra obtained from the chewing sounds could be used successfully to distinguish the different snack groups. When different chewers were used, recalibration of the models was necessary. Multi-way models distinguished better between chewing sounds of different snack groups than PCA on bite or chew separately and than unfold PLS. From all three-way models applied, N-PLS with three components showed the best classification capabilities, resulting in classification errors of 14-18%. The major amount of incorrect classifications was due to one type of potato chips that had a very irregular shape, resulting in a wide variation of the emitted sounds.

  4. Non-targeted 1H NMR fingerprinting and multivariate statistical analyses for the characterisation of the geographical origin of Italian sweet cherries.

    PubMed

    Longobardi, F; Ventrella, A; Bianco, A; Catucci, L; Cafagna, I; Gallo, V; Mastrorilli, P; Agostiano, A

    2013-12-01

    In this study, non-targeted (1)H NMR fingerprinting was used in combination with multivariate statistical techniques for the classification of Italian sweet cherries based on their different geographical origins (Emilia Romagna and Puglia). As classification techniques, Soft Independent Modelling of Class Analogy (SIMCA), Partial Least Squares Discriminant Analysis (PLS-DA), and Linear Discriminant Analysis (LDA) were carried out and the results were compared. For LDA, before performing a refined selection of the number/combination of variables, two different strategies for a preliminary reduction of the variable number were tested. The best average recognition and CV prediction abilities (both 100.0%) were obtained for all the LDA models, although PLS-DA also showed remarkable performances (94.6%). All the statistical models were validated by observing the prediction abilities with respect to an external set of cherry samples. The best result (94.9%) was obtained with LDA by performing a best subset selection procedure on a set of 30 principal components previously selected by a stepwise decorrelation. The metabolites that mostly contributed to the classification performances of such LDA model, were found to be malate, glucose, fructose, glutamine and succinate. Copyright © 2013 Elsevier Ltd. All rights reserved.

  5. Discrimination of irradiated MOX fuel from UOX fuel by multivariate statistical analysis of simulated activities of gamma-emitting isotopes

    NASA Astrophysics Data System (ADS)

    Åberg Lindell, M.; Andersson, P.; Grape, S.; Hellesen, C.; Håkansson, A.; Thulin, M.

    2018-03-01

    This paper investigates how concentrations of certain fission products and their related gamma-ray emissions can be used to discriminate between uranium oxide (UOX) and mixed oxide (MOX) type fuel. Discrimination of irradiated MOX fuel from irradiated UOX fuel is important in nuclear facilities and for transport of nuclear fuel, for purposes of both criticality safety and nuclear safeguards. Although facility operators keep records on the identity and properties of each fuel, tools for nuclear safeguards inspectors that enable independent verification of the fuel are critical in the recovery of continuity of knowledge, should it be lost. A discrimination methodology for classification of UOX and MOX fuel, based on passive gamma-ray spectroscopy data and multivariate analysis methods, is presented. Nuclear fuels and their gamma-ray emissions were simulated in the Monte Carlo code Serpent, and the resulting data was used as input to train seven different multivariate classification techniques. The trained classifiers were subsequently implemented and evaluated with respect to their capabilities to correctly predict the classes of unknown fuel items. The best results concerning successful discrimination of UOX and MOX-fuel were acquired when using non-linear classification techniques, such as the k nearest neighbors method and the Gaussian kernel support vector machine. For fuel with cooling times up to 20 years, when it is considered that gamma-rays from the isotope 134Cs can still be efficiently measured, success rates of 100% were obtained. A sensitivity analysis indicated that these methods were also robust.

  6. Sparse Multivariate Autoregressive Modeling for Mild Cognitive Impairment Classification

    PubMed Central

    Li, Yang; Wee, Chong-Yaw; Jie, Biao; Peng, Ziwen

    2014-01-01

    Brain connectivity network derived from functional magnetic resonance imaging (fMRI) is becoming increasingly prevalent in the researches related to cognitive and perceptual processes. The capability to detect causal or effective connectivity is highly desirable for understanding the cooperative nature of brain network, particularly when the ultimate goal is to obtain good performance of control-patient classification with biological meaningful interpretations. Understanding directed functional interactions between brain regions via brain connectivity network is a challenging task. Since many genetic and biomedical networks are intrinsically sparse, incorporating sparsity property into connectivity modeling can make the derived models more biologically plausible. Accordingly, we propose an effective connectivity modeling of resting-state fMRI data based on the multivariate autoregressive (MAR) modeling technique, which is widely used to characterize temporal information of dynamic systems. This MAR modeling technique allows for the identification of effective connectivity using the Granger causality concept and reducing the spurious causality connectivity in assessment of directed functional interaction from fMRI data. A forward orthogonal least squares (OLS) regression algorithm is further used to construct a sparse MAR model. By applying the proposed modeling to mild cognitive impairment (MCI) classification, we identify several most discriminative regions, including middle cingulate gyrus, posterior cingulate gyrus, lingual gyrus and caudate regions, in line with results reported in previous findings. A relatively high classification accuracy of 91.89 % is also achieved, with an increment of 5.4 % compared to the fully-connected, non-directional Pearson-correlation-based functional connectivity approach. PMID:24595922

  7. Multivariate pattern classification reveals autonomic and experiential representations of discrete emotions.

    PubMed

    Kragel, Philip A; Labar, Kevin S

    2013-08-01

    Defining the structural organization of emotions is a central unresolved question in affective science. In particular, the extent to which autonomic nervous system activity signifies distinct affective states remains controversial. Most prior research on this topic has used univariate statistical approaches in attempts to classify emotions from psychophysiological data. In the present study, electrodermal, cardiac, respiratory, and gastric activity, as well as self-report measures were taken from healthy subjects during the experience of fear, anger, sadness, surprise, contentment, and amusement in response to film and music clips. Information pertaining to affective states present in these response patterns was analyzed using multivariate pattern classification techniques. Overall accuracy for classifying distinct affective states was 58.0% for autonomic measures and 88.2% for self-report measures, both of which were significantly above chance. Further, examining the error distribution of classifiers revealed that the dimensions of valence and arousal selectively contributed to decoding emotional states from self-report, whereas a categorical configuration of affective space was evident in both self-report and autonomic measures. Taken together, these findings extend recent multivariate approaches to study emotion and indicate that pattern classification tools may improve upon univariate approaches to reveal the underlying structure of emotional experience and physiological expression. PsycINFO Database Record (c) 2013 APA, all rights reserved.

  8. Multivariate Pattern Classification Reveals Autonomic and Experiential Representations of Discrete Emotions

    PubMed Central

    Kragel, Philip A.; LaBar, Kevin S.

    2013-01-01

    Defining the structural organization of emotions is a central unresolved question in affective science. In particular, the extent to which autonomic nervous system activity signifies distinct affective states remains controversial. Most prior research on this topic has used univariate statistical approaches in attempts to classify emotions from psychophysiological data. In the present study, electrodermal, cardiac, respiratory, and gastric activity, as well as self-report measures were taken from healthy subjects during the experience of fear, anger, sadness, surprise, contentment, and amusement in response to film and music clips. Information pertaining to affective states present in these response patterns was analyzed using multivariate pattern classification techniques. Overall accuracy for classifying distinct affective states was 58.0% for autonomic measures and 88.2% for self-report measures, both of which were significantly above chance. Further, examining the error distribution of classifiers revealed that the dimensions of valence and arousal selectively contributed to decoding emotional states from self-report, whereas a categorical configuration of affective space was evident in both self-report and autonomic measures. Taken together, these findings extend recent multivariate approaches to study emotion and indicate that pattern classification tools may improve upon univariate approaches to reveal the underlying structure of emotional experience and physiological expression. PMID:23527508

  9. MANCOVA for one way classification with homogeneity of regression coefficient vectors

    NASA Astrophysics Data System (ADS)

    Mokesh Rayalu, G.; Ravisankar, J.; Mythili, G. Y.

    2017-11-01

    The MANOVA and MANCOVA are the extensions of the univariate ANOVA and ANCOVA techniques to multidimensional or vector valued observations. The assumption of a Gaussian distribution has been replaced with the Multivariate Gaussian distribution for the vectors data and residual term variables in the statistical models of these techniques. The objective of MANCOVA is to determine if there are statistically reliable mean differences that can be demonstrated between groups later modifying the newly created variable. When randomization assignment of samples or subjects to groups is not possible, multivariate analysis of covariance (MANCOVA) provides statistical matching of groups by adjusting dependent variables as if all subjects scored the same on the covariates. In this research article, an extension has been made to the MANCOVA technique with more number of covariates and homogeneity of regression coefficient vectors is also tested.

  10. Discrimination of soft tissues using laser-induced breakdown spectroscopy in combination with k nearest neighbors (kNN) and support vector machine (SVM) classifiers

    NASA Astrophysics Data System (ADS)

    Li, Xiaohui; Yang, Sibo; Fan, Rongwei; Yu, Xin; Chen, Deying

    2018-06-01

    In this paper, discrimination of soft tissues using laser-induced breakdown spectroscopy (LIBS) in combination with multivariate statistical methods is presented. Fresh pork fat, skin, ham, loin and tenderloin muscle tissues are manually cut into slices and ablated using a 1064 nm pulsed Nd:YAG laser. Discrimination analyses between fat, skin and muscle tissues, and further between highly similar ham, loin and tenderloin muscle tissues, are performed based on the LIBS spectra in combination with multivariate statistical methods, including principal component analysis (PCA), k nearest neighbors (kNN) classification, and support vector machine (SVM) classification. Performances of the discrimination models, including accuracy, sensitivity and specificity, are evaluated using 10-fold cross validation. The classification models are optimized to achieve best discrimination performances. The fat, skin and muscle tissues can be definitely discriminated using both kNN and SVM classifiers, with accuracy of over 99.83%, sensitivity of over 0.995 and specificity of over 0.998. The highly similar ham, loin and tenderloin muscle tissues can also be discriminated with acceptable performances. The best performances are achieved with SVM classifier using Gaussian kernel function, with accuracy of 76.84%, sensitivity of over 0.742 and specificity of over 0.869. The results show that the LIBS technique assisted with multivariate statistical methods could be a powerful tool for online discrimination of soft tissues, even for tissues of high similarity, such as muscles from different parts of the animal body. This technique could be used for discrimination of tissues suffering minor clinical changes, thus may advance the diagnosis of early lesions and abnormalities.

  11. Feature combinations and the divergence criterion

    NASA Technical Reports Server (NTRS)

    Decell, H. P., Jr.; Mayekar, S. M.

    1976-01-01

    Classifying large quantities of multidimensional remotely sensed agricultural data requires efficient and effective classification techniques and the construction of certain transformations of a dimension reducing, information preserving nature. The construction of transformations that minimally degrade information (i.e., class separability) is described. Linear dimension reducing transformations for multivariate normal populations are presented. Information content is measured by divergence.

  12. (13)C NMR pattern recognition techniques for the classification of Atlantic salmon (Salmo salar L.) according to their wild, farmed, and geographical origin.

    PubMed

    Aursand, Marit; Standal, Inger B; Praël, Angelika; McEvoy, Lesley; Irvine, Joe; Axelson, David E

    2009-05-13

    (13)C nuclear magnetic resonance (NMR) in combination with multivariate data analysis was used to (1) discriminate between farmed and wild Atlantic salmon ( Salmo salar L.), (2) discriminate between different geographical origins, and (3) verify the origin of market samples. Muscle lipids from 195 Atlantic salmon of known origin (wild and farmed salmon from Norway, Scotland, Canada, Iceland, Ireland, the Faroes, and Tasmania) in addition to market samples were analyzed by (13)C NMR spectroscopy and multivariate analysis. Both probabilistic neural networks (PNN) and support vector machines (SVM) provided excellent discrimination (98.5 and 100.0%, respectively) between wild and farmed salmon. Discrimination with respect to geographical origin was somewhat more difficult, with correct classification rates ranging from 82.2 to 99.3% by PNN and SVM, respectively. In the analysis of market samples, five fish labeled and purchased as wild salmon were classified as farmed salmon (indicating mislabeling), and there were also some discrepancies between the classification and the product declaration with regard to geographical origin.

  13. From sensation to perception: Using multivariate classification of visual illusions to identify neural correlates of conscious awareness in space and time.

    PubMed

    Hogendoorn, Hinze

    2015-01-01

    An important goal of cognitive neuroscience is understanding the neural underpinnings of conscious awareness. Although the low-level processing of sensory input is well understood in most modalities, it remains a challenge to understand how the brain translates such input into conscious awareness. Here, I argue that the application of multivariate pattern classification techniques to neuroimaging data acquired while observers experience perceptual illusions provides a unique way to dissociate sensory mechanisms from mechanisms underlying conscious awareness. Using this approach, it is possible to directly compare patterns of neural activity that correspond to the contents of awareness, independent from changes in sensory input, and to track these neural representations over time at high temporal resolution. I highlight five recent studies using this approach, and provide practical considerations and limitations for future implementations.

  14. Portable XRF and principal component analysis for bill characterization in forensic science.

    PubMed

    Appoloni, C R; Melquiades, F L

    2014-02-01

    Several modern techniques have been applied to prevent counterfeiting of money bills. The objective of this study was to demonstrate the potential of Portable X-ray Fluorescence (PXRF) technique and the multivariate analysis method of Principal Component Analysis (PCA) for classification of bills in order to use it in forensic science. Bills of Dollar, Euro and Real (Brazilian currency) were measured directly at different colored regions, without any previous preparation. Spectra interpretation allowed the identification of Ca, Ti, Fe, Cu, Sr, Y, Zr and Pb. PCA analysis separated the bills in three groups and subgroups among Brazilian currency. In conclusion, the samples were classified according to its origin identifying the elements responsible for differentiation and basic pigment composition. PXRF allied to multivariate discriminate methods is a promising technique for rapid and no destructive identification of false bills in forensic science. Copyright © 2013 Elsevier Ltd. All rights reserved.

  15. REGIONAL-SCALE WIND FIELD CLASSIFICATION EMPLOYING CLUSTER ANALYSIS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Glascoe, L G; Glaser, R E; Chin, H S

    2004-06-17

    The classification of time-varying multivariate regional-scale wind fields at a specific location can assist event planning as well as consequence and risk analysis. Further, wind field classification involves data transformation and inference techniques that effectively characterize stochastic wind field variation. Such a classification scheme is potentially useful for addressing overall atmospheric transport uncertainty and meteorological parameter sensitivity issues. Different methods to classify wind fields over a location include the principal component analysis of wind data (e.g., Hardy and Walton, 1978) and the use of cluster analysis for wind data (e.g., Green et al., 1992; Kaufmann and Weber, 1996). The goalmore » of this study is to use a clustering method to classify the winds of a gridded data set, i.e, from meteorological simulations generated by a forecast model.« less

  16. A computer analysis of ERTS data of the Lake Gregory area of South Australia with particular emphasis on its role in terrain classification for engineering. M.S. Thesis

    NASA Technical Reports Server (NTRS)

    Lodwick, G. D. (Principal Investigator)

    1976-01-01

    A digital computer and multivariate statistical techniques were used to analyze 4-band multispectral data. A representation of the original data for each of the four bands allows a certain degree of terrain interpretation; however, variations in appearance of sites within and between bands, without additional criteria for deciding which representation should be preferred, create difficulties for classification. Investigation of the video data groups produced by principal components analysis and cluster analysis techniques shows that effective correlations with classifications of terrain produced by conventional methods could be carried out. The analyses also highlighted underlying relationships between the various elements. The approach used allows large areas (185 cm by 185 cm) to be classified into fundamental units within a matter of hours and can be applied to those parts of the Earth where facilities for conventional studies are poor or lacking.

  17. Multivariate decoding of brain images using ordinal regression.

    PubMed

    Doyle, O M; Ashburner, J; Zelaya, F O; Williams, S C R; Mehta, M A; Marquand, A F

    2013-11-01

    Neuroimaging data are increasingly being used to predict potential outcomes or groupings, such as clinical severity, drug dose response, and transitional illness states. In these examples, the variable (target) we want to predict is ordinal in nature. Conventional classification schemes assume that the targets are nominal and hence ignore their ranked nature, whereas parametric and/or non-parametric regression models enforce a metric notion of distance between classes. Here, we propose a novel, alternative multivariate approach that overcomes these limitations - whole brain probabilistic ordinal regression using a Gaussian process framework. We applied this technique to two data sets of pharmacological neuroimaging data from healthy volunteers. The first study was designed to investigate the effect of ketamine on brain activity and its subsequent modulation with two compounds - lamotrigine and risperidone. The second study investigates the effect of scopolamine on cerebral blood flow and its modulation using donepezil. We compared ordinal regression to multi-class classification schemes and metric regression. Considering the modulation of ketamine with lamotrigine, we found that ordinal regression significantly outperformed multi-class classification and metric regression in terms of accuracy and mean absolute error. However, for risperidone ordinal regression significantly outperformed metric regression but performed similarly to multi-class classification both in terms of accuracy and mean absolute error. For the scopolamine data set, ordinal regression was found to outperform both multi-class and metric regression techniques considering the regional cerebral blood flow in the anterior cingulate cortex. Ordinal regression was thus the only method that performed well in all cases. Our results indicate the potential of an ordinal regression approach for neuroimaging data while providing a fully probabilistic framework with elegant approaches for model selection. Copyright © 2013. Published by Elsevier Inc.

  18. Characterization of agricultural land using singular value decomposition

    NASA Astrophysics Data System (ADS)

    Herries, Graham M.; Danaher, Sean; Selige, Thomas

    1995-11-01

    A method is defined and tested for the characterization of agricultural land from multi-spectral imagery, based on singular value decomposition (SVD) and key vector analysis. The SVD technique, which bears a close resemblance to multivariate statistic techniques, has previously been successfully applied to problems of signal extraction for marine data and forestry species classification. In this study the SVD technique is used as a classifier for agricultural regions, using airborne Daedalus ATM data, with 1 m resolution. The specific region chosen is an experimental research farm in Bavaria, Germany. This farm has a large number of crops, within a very small region and hence is not amenable to existing techniques. There are a number of other significant factors which render existing techniques such as the maximum likelihood algorithm less suitable for this area. These include a very dynamic terrain and tessellated pattern soil differences, which together cause large variations in the growth characteristics of the crops. The SVD technique is applied to this data set using a multi-stage classification approach, removing unwanted land-cover classes one step at a time. Typical classification accuracy's for SVD are of the order of 85-100%. Preliminary results indicate that it is a fast and efficient classifier with the ability to differentiate between crop types such as wheat, rye, potatoes and clover. The results of characterizing 3 sub-classes of Winter Wheat are also shown.

  19. Reliable Early Classification on Multivariate Time Series with Numerical and Categorical Attributes

    DTIC Science & Technology

    2015-05-22

    design a procedure of feature extraction in REACT named MEG (Mining Equivalence classes with shapelet Generators) based on the concept of...Equivalence Classes Mining [12, 15]. MEG can efficiently and effectively generate the discriminative features. In addition, several strategies are proposed...technique of parallel computing [4] to propose a process of pa- rallel MEG for substantially reducing the computational overhead of discovering shapelet

  20. Models of Marine Fish Biodiversity: Assessing Predictors from Three Habitat Classification Schemes.

    PubMed

    Yates, Katherine L; Mellin, Camille; Caley, M Julian; Radford, Ben T; Meeuwig, Jessica J

    2016-01-01

    Prioritising biodiversity conservation requires knowledge of where biodiversity occurs. Such knowledge, however, is often lacking. New technologies for collecting biological and physical data coupled with advances in modelling techniques could help address these gaps and facilitate improved management outcomes. Here we examined the utility of environmental data, obtained using different methods, for developing models of both uni- and multivariate biodiversity metrics. We tested which biodiversity metrics could be predicted best and evaluated the performance of predictor variables generated from three types of habitat data: acoustic multibeam sonar imagery, predicted habitat classification, and direct observer habitat classification. We used boosted regression trees (BRT) to model metrics of fish species richness, abundance and biomass, and multivariate regression trees (MRT) to model biomass and abundance of fish functional groups. We compared model performance using different sets of predictors and estimated the relative influence of individual predictors. Models of total species richness and total abundance performed best; those developed for endemic species performed worst. Abundance models performed substantially better than corresponding biomass models. In general, BRT and MRTs developed using predicted habitat classifications performed less well than those using multibeam data. The most influential individual predictor was the abiotic categorical variable from direct observer habitat classification and models that incorporated predictors from direct observer habitat classification consistently outperformed those that did not. Our results show that while remotely sensed data can offer considerable utility for predictive modelling, the addition of direct observer habitat classification data can substantially improve model performance. Thus it appears that there are aspects of marine habitats that are important for modelling metrics of fish biodiversity that are not fully captured by remotely sensed data. As such, the use of remotely sensed data to model biodiversity represents a compromise between model performance and data availability.

  1. Models of Marine Fish Biodiversity: Assessing Predictors from Three Habitat Classification Schemes

    PubMed Central

    Yates, Katherine L.; Mellin, Camille; Caley, M. Julian; Radford, Ben T.; Meeuwig, Jessica J.

    2016-01-01

    Prioritising biodiversity conservation requires knowledge of where biodiversity occurs. Such knowledge, however, is often lacking. New technologies for collecting biological and physical data coupled with advances in modelling techniques could help address these gaps and facilitate improved management outcomes. Here we examined the utility of environmental data, obtained using different methods, for developing models of both uni- and multivariate biodiversity metrics. We tested which biodiversity metrics could be predicted best and evaluated the performance of predictor variables generated from three types of habitat data: acoustic multibeam sonar imagery, predicted habitat classification, and direct observer habitat classification. We used boosted regression trees (BRT) to model metrics of fish species richness, abundance and biomass, and multivariate regression trees (MRT) to model biomass and abundance of fish functional groups. We compared model performance using different sets of predictors and estimated the relative influence of individual predictors. Models of total species richness and total abundance performed best; those developed for endemic species performed worst. Abundance models performed substantially better than corresponding biomass models. In general, BRT and MRTs developed using predicted habitat classifications performed less well than those using multibeam data. The most influential individual predictor was the abiotic categorical variable from direct observer habitat classification and models that incorporated predictors from direct observer habitat classification consistently outperformed those that did not. Our results show that while remotely sensed data can offer considerable utility for predictive modelling, the addition of direct observer habitat classification data can substantially improve model performance. Thus it appears that there are aspects of marine habitats that are important for modelling metrics of fish biodiversity that are not fully captured by remotely sensed data. As such, the use of remotely sensed data to model biodiversity represents a compromise between model performance and data availability. PMID:27333202

  2. How many taxa can be recognized within the complex Tillandsia capillaris (Bromeliaceae, Tillandsioideae)? Analysis of the available classifications using a multivariate approach.

    PubMed

    Castello, Lucía V; Galetto, Leonardo

    2013-01-01

    Tillandsia capillaris Ruiz & Pav., which belongs to the subgenus Diaphoranthema is distributed in Ecuador, Peru, Bolivia, northern and central Argentina, and Chile, and includes forms that are difficult to circumscribe, thus considered to form a complex. The entities of this complex are predominantly small-sized epiphytes, adapted to xeric environments. The most widely used classification defines 5 forms for this complex based on few morphological reproductive traits: Tillandsia capillaris Ruiz & Pav. f. capillaris, Tillandsia capillaris f. incana (Mez) L.B. Sm., Tillandsia capillaris f. cordobensis (Hieron.) L.B. Sm., Tillandsia capillaris f. hieronymi (Mez) L.B. Sm. and Tillandsia capillaris f. virescens (Ruiz & Pav.) L.B. Sm. In this study, 35 floral and vegetative characters were analyzed with a multivariate approach in order to assess and discuss different proposals for classification of the Tillandsia capillaris complex, which presents morphotypes that co-occur in central and northern Argentina. To accomplish this, data of quantitative and categorical morphological characters of flowers and leaves were collected from herbarium specimens and field collections and were analyzed with statistical multivariate techniques. The results suggest that the last classification for the complex seems more comprehensive and three taxa were delimited: Tillandsia capillaris (=Tillandsia capillaris f. incana-hieronymi), Tillandsia virescens s. str. (=Tillandsia capillaris f. cordobensis) and Tillandsia virescens s. l. (=Tillandsia capillaris f. virescens). While Tillandsia capillaris and Tillandsia virescens s. str. co-occur, Tillandsia virescens s. l. is restricted to altitudes above 2000 m in Argentina. Characters previously used for taxa delimitation showed continuous variation and therefore were not useful. New diagnostic characters are proposed and a key is provided for delimiting these three taxa within the complex.

  3. How many taxa can be recognized within the complex Tillandsia capillaris (Bromeliaceae, Tillandsioideae)? Analysis of the available classifications using a multivariate approach

    PubMed Central

    Castello, Lucía V.; Galetto, Leonardo

    2013-01-01

    Abstract Tillandsia capillaris Ruiz & Pav., which belongs to the subgenus Diaphoranthema is distributed in Ecuador, Peru, Bolivia, northern and central Argentina, and Chile, and includes forms that are difficult to circumscribe, thus considered to form a complex. The entities of this complex are predominantly small-sized epiphytes, adapted to xeric environments. The most widely used classification defines 5 forms for this complex based on few morphological reproductive traits: Tillandsia capillaris Ruiz & Pav. f. capillaris, Tillandsia capillaris f. incana (Mez) L.B. Sm., Tillandsia capillaris f. cordobensis (Hieron.) L.B. Sm., Tillandsia capillaris f. hieronymi (Mez) L.B. Sm. and Tillandsia capillaris f. virescens (Ruiz & Pav.) L.B. Sm. In this study, 35 floral and vegetative characters were analyzed with a multivariate approach in order to assess and discuss different proposals for classification of the Tillandsia capillaris complex, which presents morphotypes that co-occur in central and northern Argentina. To accomplish this, data of quantitative and categorical morphological characters of flowers and leaves were collected from herbarium specimens and field collections and were analyzed with statistical multivariate techniques. The results suggest that the last classification for the complex seems more comprehensive and three taxa were delimited: Tillandsia capillaris (=Tillandsia capillaris f. incana-hieronymi), Tillandsia virescens s. str. (=Tillandsia capillaris f. cordobensis) and Tillandsia virescens s. l. (=Tillandsia capillaris f. virescens). While Tillandsia capillaris and Tillandsia virescens s. str. co-occur, Tillandsia virescens s. l. is restricted to altitudes above 2000 m in Argentina. Characters previously used for taxa delimitation showed continuous variation and therefore were not useful. New diagnostic characters are proposed and a key is provided for delimiting these three taxa within the complex. PMID:23805053

  4. Automated classification of single airborne particles from two-dimensional angle-resolved optical scattering (TAOS) patterns by non-linear filtering

    NASA Astrophysics Data System (ADS)

    Crosta, Giovanni Franco; Pan, Yong-Le; Aptowicz, Kevin B.; Casati, Caterina; Pinnick, Ronald G.; Chang, Richard K.; Videen, Gorden W.

    2013-12-01

    Measurement of two-dimensional angle-resolved optical scattering (TAOS) patterns is an attractive technique for detecting and characterizing micron-sized airborne particles. In general, the interpretation of these patterns and the retrieval of the particle refractive index, shape or size alone, are difficult problems. By reformulating the problem in statistical learning terms, a solution is proposed herewith: rather than identifying airborne particles from their scattering patterns, TAOS patterns themselves are classified through a learning machine, where feature extraction interacts with multivariate statistical analysis. Feature extraction relies on spectrum enhancement, which includes the discrete cosine FOURIER transform and non-linear operations. Multivariate statistical analysis includes computation of the principal components and supervised training, based on the maximization of a suitable figure of merit. All algorithms have been combined together to analyze TAOS patterns, organize feature vectors, design classification experiments, carry out supervised training, assign unknown patterns to classes, and fuse information from different training and recognition experiments. The algorithms have been tested on a data set with more than 3000 TAOS patterns. The parameters that control the algorithms at different stages have been allowed to vary within suitable bounds and are optimized to some extent. Classification has been targeted at discriminating aerosolized Bacillus subtilis particles, a simulant of anthrax, from atmospheric aerosol particles and interfering particles, like diesel soot. By assuming that all training and recognition patterns come from the respective reference materials only, the most satisfactory classification result corresponds to 20% false negatives from B. subtilis particles and <11% false positives from all other aerosol particles. The most effective operations have consisted of thresholding TAOS patterns in order to reject defective ones, and forming training sets from three or four pattern classes. The presented automated classification method may be adapted into a real-time operation technique, capable of detecting and characterizing micron-sized airborne particles.

  5. Classification of bladder cancer cell lines using Raman spectroscopy: a comparison of excitation wavelength, sample substrate and statistical algorithms

    NASA Astrophysics Data System (ADS)

    Kerr, Laura T.; Adams, Aine; O'Dea, Shirley; Domijan, Katarina; Cullen, Ivor; Hennelly, Bryan M.

    2014-05-01

    Raman microspectroscopy can be applied to the urinary bladder for highly accurate classification and diagnosis of bladder cancer. This technique can be applied in vitro to bladder epithelial cells obtained from urine cytology or in vivo as an optical biopsy" to provide results in real-time with higher sensitivity and specificity than current clinical methods. However, there exists a high degree of variability across experimental parameters which need to be standardised before this technique can be utilized in an everyday clinical environment. In this study, we investigate different laser wavelengths (473 nm and 532 nm), sample substrates (glass, fused silica and calcium fluoride) and multivariate statistical methods in order to gain insight into how these various experimental parameters impact on the sensitivity and specificity of Raman cytology.

  6. Superiority of artificial neural networks for a genetic classification procedure.

    PubMed

    Sant'Anna, I C; Tomaz, R S; Silva, G N; Nascimento, M; Bhering, L L; Cruz, C D

    2015-08-19

    The correct classification of individuals is extremely important for the preservation of genetic variability and for maximization of yield in breeding programs using phenotypic traits and genetic markers. The Fisher and Anderson discriminant functions are commonly used multivariate statistical techniques for these situations, which allow for the allocation of an initially unknown individual to predefined groups. However, for higher levels of similarity, such as those found in backcrossed populations, these methods have proven to be inefficient. Recently, much research has been devoted to developing a new paradigm of computing known as artificial neural networks (ANNs), which can be used to solve many statistical problems, including classification problems. The aim of this study was to evaluate the feasibility of ANNs as an evaluation technique of genetic diversity by comparing their performance with that of traditional methods. The discriminant functions were equally ineffective in discriminating the populations, with error rates of 23-82%, thereby preventing the correct discrimination of individuals between populations. The ANN was effective in classifying populations with low and high differentiation, such as those derived from a genetic design established from backcrosses, even in cases of low differentiation of the data sets. The ANN appears to be a promising technique to solve classification problems, since the number of individuals classified incorrectly by the ANN was always lower than that of the discriminant functions. We envisage the potential relevant application of this improved procedure in the genomic classification of markers to distinguish between breeds and accessions.

  7. Identification of Enterococcus, Streptococcus, and Staphylococcus by Multivariate Analysis of Proton Magnetic Resonance Spectroscopic Data from Plate Cultures

    PubMed Central

    Bourne, Roger; Himmelreich, Uwe; Sharma, Ansuiya; Mountford, Carolyn; Sorrell, Tania

    2001-01-01

    A new fingerprinting technique with the potential for rapid identification of bacteria was developed by combining proton magnetic resonance spectroscopy (1H MRS) with multivariate statistical analysis. This resulted in an objective identification strategy for common clinical isolates belonging to the bacterial species Staphylococcus aureus, Staphylococcus epidermidis, Enterococcus faecalis, Streptococcus pneumoniae, Streptococcus pyogenes, Streptococcus agalactiae, and the Streptococcus milleri group. Duplicate cultures of 104 different isolates were examined one or more times using 1H MRS. A total of 312 cultures were examined. An optimized classifier was developed using a bootstrapping process and a seven-group linear discriminant analysis to provide objective classification of the spectra. Identification of isolates was based on consistent high-probability classification of spectra from duplicate cultures and achieved 92% agreement with conventional methods of identification. Fewer than 1% of isolates were identified incorrectly. Identification of the remaining 7% of isolates was defined as indeterminate. PMID:11474013

  8. Biomarkers for Musculoskeletal Pain Conditions: Use of Brain Imaging and Machine Learning.

    PubMed

    Boissoneault, Jeff; Sevel, Landrew; Letzen, Janelle; Robinson, Michael; Staud, Roland

    2017-01-01

    Chronic musculoskeletal pain condition often shows poor correlations between tissue abnormalities and clinical pain. Therefore, classification of pain conditions like chronic low back pain, osteoarthritis, and fibromyalgia depends mostly on self report and less on objective findings like X-ray or magnetic resonance imaging (MRI) changes. However, recent advances in structural and functional brain imaging have identified brain abnormalities in chronic pain conditions that can be used for illness classification. Because the analysis of complex and multivariate brain imaging data is challenging, machine learning techniques have been increasingly utilized for this purpose. The goal of machine learning is to train specific classifiers to best identify variables of interest on brain MRIs (i.e., biomarkers). This report describes classification techniques capable of separating MRI-based brain biomarkers of chronic pain patients from healthy controls with high accuracy (70-92%) using machine learning, as well as critical scientific, practical, and ethical considerations related to their potential clinical application. Although self-report remains the gold standard for pain assessment, machine learning may aid in the classification of chronic pain disorders like chronic back pain and fibromyalgia as well as provide mechanistic information regarding their neural correlates.

  9. Multivariate analyses of Erzgebirge granite and rhyolite composition: Implications for classification of granites and their genetic relations

    USGS Publications Warehouse

    Forster, H.-J.; Davis, J.C.; Tischendorf, G.; Seltmann, R.

    1999-01-01

    High-precision major, minor and trace element analyses for 44 elements have been made of 329 Late Variscan granitic and rhyolitic rocks from the Erzgebirge metallogenic province of Germany. The intrusive histories of some of these granites are not completely understood and exposures of rock are not adequate to resolve relationships between what apparently are different plutons. Therefore, it is necessary to turn to chemical analyses to decipher the evolution of the plutons and their relationships. A new classification of Erzgebirge plutons into five major groups of granites, based on petrologic interpretations of geochemical and mineralogical relationships (low-F biotite granites; low-F two-mica granites; high-F, high-P2O5 Li-mica granites; high-F, low-P2O5 Li-mica granites; high-F, low-P2O5 biotite granites) was tested by multivariate techniques. Canonical analyses of major elements, minor elements, trace elements and ratio variables all distinguish the groups with differing amounts of success. Univariate ANOVA's, in combination with forward-stepwise and backward-elimination canonical analyses, were used to select ten variables which were most effective in distinguishing groups. In a biplot, groups form distinct clusters roughly arranged along a quadratic path. Within groups, individual plutons tend to be arranged in patterns possibly reflecting granitic evolution. Canonical functions were used to classify samples of rhyolites of unknown association into the five groups. Another canonical analysis was based on ten elements traditionally used in petrology and which were important in the new classification of granites. Their biplot pattern is similar to that from statistically chosen variables but less effective at distinguishing the five groups of granites. This study shows that multivariate statistical techniques can provide significant insight into problems of granitic petrogenesis and may be superior to conventional procedures for petrological interpretation.

  10. Use of the Köppen-Trewartha climate classification to evaluate climatic refugia in statistically derived ecoregions for the People’s Republic of China

    Treesearch

    B. Baker; Henry Diaz; William Hargrove; Forrest Hoffman

    2010-01-01

    Changes in climate as projected by state-of-the-art climate models are likely to result in novel combinations of climate and topo-edaphic factors that will have substantial impacts on the distribution and persistence of natural vegetation and animal species. We have used multivariate techniques to quantify some of these changes; the...

  11. Drunk driving detection based on classification of multivariate time series.

    PubMed

    Li, Zhenlong; Jin, Xue; Zhao, Xiaohua

    2015-09-01

    This paper addresses the problem of detecting drunk driving based on classification of multivariate time series. First, driving performance measures were collected from a test in a driving simulator located in the Traffic Research Center, Beijing University of Technology. Lateral position and steering angle were used to detect drunk driving. Second, multivariate time series analysis was performed to extract the features. A piecewise linear representation was used to represent multivariate time series. A bottom-up algorithm was then employed to separate multivariate time series. The slope and time interval of each segment were extracted as the features for classification. Third, a support vector machine classifier was used to classify driver's state into two classes (normal or drunk) according to the extracted features. The proposed approach achieved an accuracy of 80.0%. Drunk driving detection based on the analysis of multivariate time series is feasible and effective. The approach has implications for drunk driving detection. Copyright © 2015 Elsevier Ltd and National Safety Council. All rights reserved.

  12. Various forms of indexing HDMR for modelling multivariate classification problems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aksu, Çağrı; Tunga, M. Alper

    2014-12-10

    The Indexing HDMR method was recently developed for modelling multivariate interpolation problems. The method uses the Plain HDMR philosophy in partitioning the given multivariate data set into less variate data sets and then constructing an analytical structure through these partitioned data sets to represent the given multidimensional problem. Indexing HDMR makes HDMR be applicable to classification problems having real world data. Mostly, we do not know all possible class values in the domain of the given problem, that is, we have a non-orthogonal data structure. However, Plain HDMR needs an orthogonal data structure in the given problem to be modelled.more » In this sense, the main idea of this work is to offer various forms of Indexing HDMR to successfully model these real life classification problems. To test these different forms, several well-known multivariate classification problems given in UCI Machine Learning Repository were used and it was observed that the accuracy results lie between 80% and 95% which are very satisfactory.« less

  13. Refined composite multivariate generalized multiscale fuzzy entropy: A tool for complexity analysis of multichannel signals

    NASA Astrophysics Data System (ADS)

    Azami, Hamed; Escudero, Javier

    2017-01-01

    Multiscale entropy (MSE) is an appealing tool to characterize the complexity of time series over multiple temporal scales. Recent developments in the field have tried to extend the MSE technique in different ways. Building on these trends, we propose the so-called refined composite multivariate multiscale fuzzy entropy (RCmvMFE) whose coarse-graining step uses variance (RCmvMFEσ2) or mean (RCmvMFEμ). We investigate the behavior of these multivariate methods on multichannel white Gaussian and 1/ f noise signals, and two publicly available biomedical recordings. Our simulations demonstrate that RCmvMFEσ2 and RCmvMFEμ lead to more stable results and are less sensitive to the signals' length in comparison with the other existing multivariate multiscale entropy-based methods. The classification results also show that using both the variance and mean in the coarse-graining step offers complexity profiles with complementary information for biomedical signal analysis. We also made freely available all the Matlab codes used in this paper.

  14. Nonlinear multivariate and time series analysis by neural network methods

    NASA Astrophysics Data System (ADS)

    Hsieh, William W.

    2004-03-01

    Methods in multivariate statistical analysis are essential for working with large amounts of geophysical data, data from observational arrays, from satellites, or from numerical model output. In classical multivariate statistical analysis, there is a hierarchy of methods, starting with linear regression at the base, followed by principal component analysis (PCA) and finally canonical correlation analysis (CCA). A multivariate time series method, the singular spectrum analysis (SSA), has been a fruitful extension of the PCA technique. The common drawback of these classical methods is that only linear structures can be correctly extracted from the data. Since the late 1980s, neural network methods have become popular for performing nonlinear regression and classification. More recently, neural network methods have been extended to perform nonlinear PCA (NLPCA), nonlinear CCA (NLCCA), and nonlinear SSA (NLSSA). This paper presents a unified view of the NLPCA, NLCCA, and NLSSA techniques and their applications to various data sets of the atmosphere and the ocean (especially for the El Niño-Southern Oscillation and the stratospheric quasi-biennial oscillation). These data sets reveal that the linear methods are often too simplistic to describe real-world systems, with a tendency to scatter a single oscillatory phenomenon into numerous unphysical modes or higher harmonics, which can be largely alleviated in the new nonlinear paradigm.

  15. Improved classification accuracy in 1- and 2-dimensional NMR metabolomics data using the variance stabilising generalised logarithm transformation

    PubMed Central

    Parsons, Helen M; Ludwig, Christian; Günther, Ulrich L; Viant, Mark R

    2007-01-01

    Background Classifying nuclear magnetic resonance (NMR) spectra is a crucial step in many metabolomics experiments. Since several multivariate classification techniques depend upon the variance of the data, it is important to first minimise any contribution from unwanted technical variance arising from sample preparation and analytical measurements, and thereby maximise any contribution from wanted biological variance between different classes. The generalised logarithm (glog) transform was developed to stabilise the variance in DNA microarray datasets, but has rarely been applied to metabolomics data. In particular, it has not been rigorously evaluated against other scaling techniques used in metabolomics, nor tested on all forms of NMR spectra including 1-dimensional (1D) 1H, projections of 2D 1H, 1H J-resolved (pJRES), and intact 2D J-resolved (JRES). Results Here, the effects of the glog transform are compared against two commonly used variance stabilising techniques, autoscaling and Pareto scaling, as well as unscaled data. The four methods are evaluated in terms of the effects on the variance of NMR metabolomics data and on the classification accuracy following multivariate analysis, the latter achieved using principal component analysis followed by linear discriminant analysis. For two of three datasets analysed, classification accuracies were highest following glog transformation: 100% accuracy for discriminating 1D NMR spectra of hypoxic and normoxic invertebrate muscle, and 100% accuracy for discriminating 2D JRES spectra of fish livers sampled from two rivers. For the third dataset, pJRES spectra of urine from two breeds of dog, the glog transform and autoscaling achieved equal highest accuracies. Additionally we extended the glog algorithm to effectively suppress noise, which proved critical for the analysis of 2D JRES spectra. Conclusion We have demonstrated that the glog and extended glog transforms stabilise the technical variance in NMR metabolomics datasets. This significantly improves the discrimination between sample classes and has resulted in higher classification accuracies compared to unscaled, autoscaled or Pareto scaled data. Additionally we have confirmed the broad applicability of the glog approach using three disparate datasets from different biological samples using 1D NMR spectra, 1D projections of 2D JRES spectra, and intact 2D JRES spectra. PMID:17605789

  16. Classification Techniques for Multivariate Data Analysis.

    DTIC Science & Technology

    1980-03-28

    analysis among biologists, botanists, and ecologists, while some social scientists may refer "typology". Other frequently encountered terms are pattern...the determinantal equation: lB -XW 0 (42) 49 The solutions X. are the eigenvalues of the matrix W-1 B 1 as in discriminant analysis. There are t non...Statistical Package for Social Sciences (SPSS) (14) subprogram FACTOR was used for the principal components analysis. It is designed both for the factor

  17. PyMVPA: A python toolbox for multivariate pattern analysis of fMRI data.

    PubMed

    Hanke, Michael; Halchenko, Yaroslav O; Sederberg, Per B; Hanson, Stephen José; Haxby, James V; Pollmann, Stefan

    2009-01-01

    Decoding patterns of neural activity onto cognitive states is one of the central goals of functional brain imaging. Standard univariate fMRI analysis methods, which correlate cognitive and perceptual function with the blood oxygenation-level dependent (BOLD) signal, have proven successful in identifying anatomical regions based on signal increases during cognitive and perceptual tasks. Recently, researchers have begun to explore new multivariate techniques that have proven to be more flexible, more reliable, and more sensitive than standard univariate analysis. Drawing on the field of statistical learning theory, these new classifier-based analysis techniques possess explanatory power that could provide new insights into the functional properties of the brain. However, unlike the wealth of software packages for univariate analyses, there are few packages that facilitate multivariate pattern classification analyses of fMRI data. Here we introduce a Python-based, cross-platform, and open-source software toolbox, called PyMVPA, for the application of classifier-based analysis techniques to fMRI datasets. PyMVPA makes use of Python's ability to access libraries written in a large variety of programming languages and computing environments to interface with the wealth of existing machine learning packages. We present the framework in this paper and provide illustrative examples on its usage, features, and programmability.

  18. PyMVPA: A Python toolbox for multivariate pattern analysis of fMRI data

    PubMed Central

    Hanke, Michael; Halchenko, Yaroslav O.; Sederberg, Per B.; Hanson, Stephen José; Haxby, James V.; Pollmann, Stefan

    2009-01-01

    Decoding patterns of neural activity onto cognitive states is one of the central goals of functional brain imaging. Standard univariate fMRI analysis methods, which correlate cognitive and perceptual function with the blood oxygenation-level dependent (BOLD) signal, have proven successful in identifying anatomical regions based on signal increases during cognitive and perceptual tasks. Recently, researchers have begun to explore new multivariate techniques that have proven to be more flexible, more reliable, and more sensitive than standard univariate analysis. Drawing on the field of statistical learning theory, these new classifier-based analysis techniques possess explanatory power that could provide new insights into the functional properties of the brain. However, unlike the wealth of software packages for univariate analyses, there are few packages that facilitate multivariate pattern classification analyses of fMRI data. Here we introduce a Python-based, cross-platform, and open-source software toolbox, called PyMVPA, for the application of classifier-based analysis techniques to fMRI datasets. PyMVPA makes use of Python's ability to access libraries written in a large variety of programming languages and computing environments to interface with the wealth of existing machine-learning packages. We present the framework in this paper and provide illustrative examples on its usage, features, and programmability. PMID:19184561

  19. Fourier Transform Infrared (FT-IR) and Laser Ablation Inductively Coupled Plasma-Mass Spectrometry (LA-ICP-MS) Imaging of Cerebral Ischemia: Combined Analysis of Rat Brain Thin Cuts Toward Improved Tissue Classification.

    PubMed

    Balbekova, Anna; Lohninger, Hans; van Tilborg, Geralda A F; Dijkhuizen, Rick M; Bonta, Maximilian; Limbeck, Andreas; Lendl, Bernhard; Al-Saad, Khalid A; Ali, Mohamed; Celikic, Minja; Ofner, Johannes

    2018-02-01

    Microspectroscopic techniques are widely used to complement histological studies. Due to recent developments in the field of chemical imaging, combined chemical analysis has become attractive. This technique facilitates a deepened analysis compared to single techniques or side-by-side analysis. In this study, rat brains harvested one week after induction of photothrombotic stroke were investigated. Adjacent thin cuts from rats' brains were imaged using Fourier transform infrared (FT-IR) microspectroscopy and laser ablation inductively coupled plasma mass spectrometry (LA-ICP-MS). The LA-ICP-MS data were normalized using an internal standard (a thin gold layer). The acquired hyperspectral data cubes were fused and subjected to multivariate analysis. Brain regions affected by stroke as well as unaffected gray and white matter were identified and classified using a model based on either partial least squares discriminant analysis (PLS-DA) or random decision forest (RDF) algorithms. The RDF algorithm demonstrated the best results for classification. Improved classification was observed in the case of fused data in comparison to individual data sets (either FT-IR or LA-ICP-MS). Variable importance analysis demonstrated that both molecular and elemental content contribute to the improved RDF classification. Univariate spectral analysis identified biochemical properties of the assigned tissue types. Classification of multisensor hyperspectral data sets using an RDF algorithm allows access to a novel and in-depth understanding of biochemical processes and solid chemical allocation of different brain regions.

  20. An information-based network approach for protein classification

    PubMed Central

    Wan, Xiaogeng; Zhao, Xin; Yau, Stephen S. T.

    2017-01-01

    Protein classification is one of the critical problems in bioinformatics. Early studies used geometric distances and polygenetic-tree to classify proteins. These methods use binary trees to present protein classification. In this paper, we propose a new protein classification method, whereby theories of information and networks are used to classify the multivariate relationships of proteins. In this study, protein universe is modeled as an undirected network, where proteins are classified according to their connections. Our method is unsupervised, multivariate, and alignment-free. It can be applied to the classification of both protein sequences and structures. Nine examples are used to demonstrate the efficiency of our new method. PMID:28350835

  1. Robust diagnosis of non-Hodgkin lymphoma phenotypes validated on gene expression data from different laboratories.

    PubMed

    Bhanot, Gyan; Alexe, Gabriela; Levine, Arnold J; Stolovitzky, Gustavo

    2005-01-01

    A major challenge in cancer diagnosis from microarray data is the need for robust, accurate, classification models which are independent of the analysis techniques used and can combine data from different laboratories. We propose such a classification scheme originally developed for phenotype identification from mass spectrometry data. The method uses a robust multivariate gene selection procedure and combines the results of several machine learning tools trained on raw and pattern data to produce an accurate meta-classifier. We illustrate and validate our method by applying it to gene expression datasets: the oligonucleotide HuGeneFL microarray dataset of Shipp et al. (www.genome.wi.mit.du/MPR/lymphoma) and the Hu95Av2 Affymetrix dataset (DallaFavera's laboratory, Columbia University). Our pattern-based meta-classification technique achieves higher predictive accuracies than each of the individual classifiers , is robust against data perturbations and provides subsets of related predictive genes. Our techniques predict that combinations of some genes in the p53 pathway are highly predictive of phenotype. In particular, we find that in 80% of DLBCL cases the mRNA level of at least one of the three genes p53, PLK1 and CDK2 is elevated, while in 80% of FL cases, the mRNA level of at most one of them is elevated.

  2. ASTM clustering for improving coal analysis by near-infrared spectroscopy.

    PubMed

    Andrés, J M; Bona, M T

    2006-11-15

    Multivariate analysis techniques have been applied to near-infrared (NIR) spectra coals to investigate the relationship between nine coal properties (moisture (%), ash (%), volatile matter (%), fixed carbon (%), heating value (kcal/kg), carbon (%), hydrogen (%), nitrogen (%) and sulphur (%)) and the corresponding predictor variables. In this work, a whole set of coal samples was grouped into six more homogeneous clusters following the ASTM reference method for classification prior to the application of calibration methods to each coal set. The results obtained showed a considerable improvement of the error determination compared with the calibration for the whole sample set. For some groups, the established calibrations approached the quality required by the ASTM/ISO norms for laboratory analysis. To predict property values for a new coal sample it is necessary the assignation of that sample to its respective group. Thus, the discrimination and classification ability of coal samples by Diffuse Reflectance Infrared Fourier Transform Spectroscopy (DRIFTS) in the NIR range was also studied by applying Soft Independent Modelling of Class Analogy (SIMCA) and Linear Discriminant Analysis (LDA) techniques. Modelling of the groups by SIMCA led to overlapping models that cannot discriminate for unique classification. On the other hand, the application of Linear Discriminant Analysis improved the classification of the samples but not enough to be satisfactory for every group considered.

  3. Multivariate cross-classification: applying machine learning techniques to characterize abstraction in neural representations

    PubMed Central

    Kaplan, Jonas T.; Man, Kingson; Greening, Steven G.

    2015-01-01

    Here we highlight an emerging trend in the use of machine learning classifiers to test for abstraction across patterns of neural activity. When a classifier algorithm is trained on data from one cognitive context, and tested on data from another, conclusions can be drawn about the role of a given brain region in representing information that abstracts across those cognitive contexts. We call this kind of analysis Multivariate Cross-Classification (MVCC), and review several domains where it has recently made an impact. MVCC has been important in establishing correspondences among neural patterns across cognitive domains, including motor-perception matching and cross-sensory matching. It has been used to test for similarity between neural patterns evoked by perception and those generated from memory. Other work has used MVCC to investigate the similarity of representations for semantic categories across different kinds of stimulus presentation, and in the presence of different cognitive demands. We use these examples to demonstrate the power of MVCC as a tool for investigating neural abstraction and discuss some important methodological issues related to its application. PMID:25859202

  4. Quantifying uncertainty in high-resolution coupled hydrodynamic-ecosystem models

    NASA Astrophysics Data System (ADS)

    Allen, J. I.; Somerfield, P. J.; Gilbert, F. J.

    2007-01-01

    Marine ecosystem models are becoming increasingly complex and sophisticated, and are being used to estimate the effects of future changes in the earth system with a view to informing important policy decisions. Despite their potential importance, far too little attention has been, and is generally, paid to model errors and the extent to which model outputs actually relate to real-world processes. With the increasing complexity of the models themselves comes an increasing complexity among model results. If we are to develop useful modelling tools for the marine environment we need to be able to understand and quantify the uncertainties inherent in the simulations. Analysing errors within highly multivariate model outputs, and relating them to even more complex and multivariate observational data, are not trivial tasks. Here we describe the application of a series of techniques, including a 2-stage self-organising map (SOM), non-parametric multivariate analysis, and error statistics, to a complex spatio-temporal model run for the period 1988-1989 in the Southern North Sea, coinciding with the North Sea Project which collected a wealth of observational data. We use model output, large spatio-temporally resolved data sets and a combination of methodologies (SOM, MDS, uncertainty metrics) to simplify the problem and to provide tractable information on model performance. The use of a SOM as a clustering tool allows us to simplify the dimensions of the problem while the use of MDS on independent data grouped according to the SOM classification allows us to validate the SOM. The combination of classification and uncertainty metrics allows us to pinpoint the variables and associated processes which require attention in each region. We recommend the use of this combination of techniques for simplifying complex comparisons of model outputs with real data, and analysis of error distributions.

  5. Attachment-based classifications of children's family drawings: psychometric properties and relations with children's adjustment in kindergarten.

    PubMed

    Pianta, R C; Longmaid, K; Ferguson, J E

    1999-06-01

    Investigated an attachment-based theoretical framework and classification system, introduced by Kaplan and Main (1986), for interpreting children's family drawings. This study concentrated on the psychometric properties of the system and the relation between drawings classified using this system and teacher ratings of classroom social-emotional and behavioral functioning, controlling for child age, ethnic status, intelligence, and fine motor skills. This nonclinical sample consisted of 200 kindergarten children of diverse racial and socioeconomic status (SES). Limited support for reliability of this classification system was obtained. Kappas for overall classifications of drawings (e.g., secure) exceeded .80 and mean kappa for discrete drawing features (e.g., figures with smiles) was .82. Coders' endorsement of the presence of certain discrete drawing features predicted their overall classification at 82.5% accuracy. Drawing classification was related to teacher ratings of classroom functioning independent of child age, sex, race, SES, intelligence, and fine motor skills (with p values for the multivariate effects ranging from .043-.001). Results are discussed in terms of the psychometric properties of this system for classifying children's representations of family and the limitations of family drawing techniques for young children.

  6. Optimizing Functional Network Representation of Multivariate Time Series

    NASA Astrophysics Data System (ADS)

    Zanin, Massimiliano; Sousa, Pedro; Papo, David; Bajo, Ricardo; García-Prieto, Juan; Pozo, Francisco Del; Menasalvas, Ernestina; Boccaletti, Stefano

    2012-09-01

    By combining complex network theory and data mining techniques, we provide objective criteria for optimization of the functional network representation of generic multivariate time series. In particular, we propose a method for the principled selection of the threshold value for functional network reconstruction from raw data, and for proper identification of the network's indicators that unveil the most discriminative information on the system for classification purposes. We illustrate our method by analysing networks of functional brain activity of healthy subjects, and patients suffering from Mild Cognitive Impairment, an intermediate stage between the expected cognitive decline of normal aging and the more pronounced decline of dementia. We discuss extensions of the scope of the proposed methodology to network engineering purposes, and to other data mining tasks.

  7. Optimizing Functional Network Representation of Multivariate Time Series

    PubMed Central

    Zanin, Massimiliano; Sousa, Pedro; Papo, David; Bajo, Ricardo; García-Prieto, Juan; Pozo, Francisco del; Menasalvas, Ernestina; Boccaletti, Stefano

    2012-01-01

    By combining complex network theory and data mining techniques, we provide objective criteria for optimization of the functional network representation of generic multivariate time series. In particular, we propose a method for the principled selection of the threshold value for functional network reconstruction from raw data, and for proper identification of the network's indicators that unveil the most discriminative information on the system for classification purposes. We illustrate our method by analysing networks of functional brain activity of healthy subjects, and patients suffering from Mild Cognitive Impairment, an intermediate stage between the expected cognitive decline of normal aging and the more pronounced decline of dementia. We discuss extensions of the scope of the proposed methodology to network engineering purposes, and to other data mining tasks. PMID:22953051

  8. Lameness detection in dairy cattle: single predictor v. multivariate analysis of image-based posture processing and behaviour and performance sensing.

    PubMed

    Van Hertem, T; Bahr, C; Schlageter Tello, A; Viazzi, S; Steensels, M; Romanini, C E B; Lokhorst, C; Maltz, E; Halachmi, I; Berckmans, D

    2016-09-01

    The objective of this study was to evaluate if a multi-sensor system (milk, activity, body posture) was a better classifier for lameness than the single-sensor-based detection models. Between September 2013 and August 2014, 3629 cow observations were collected on a commercial dairy farm in Belgium. Human locomotion scoring was used as reference for the model development and evaluation. Cow behaviour and performance was measured with existing sensors that were already present at the farm. A prototype of three-dimensional-based video recording system was used to quantify automatically the back posture of a cow. For the single predictor comparisons, a receiver operating characteristics curve was made. For the multivariate detection models, logistic regression and generalized linear mixed models (GLMM) were developed. The best lameness classification model was obtained by the multi-sensor analysis (area under the receiver operating characteristics curve (AUC)=0.757±0.029), containing a combination of milk and milking variables, activity and gait and posture variables from videos. Second, the multivariate video-based system (AUC=0.732±0.011) performed better than the multivariate milk sensors (AUC=0.604±0.026) and the multivariate behaviour sensors (AUC=0.633±0.018). The video-based system performed better than the combined behaviour and performance-based detection model (AUC=0.669±0.028), indicating that it is worthwhile to consider a video-based lameness detection system, regardless the presence of other existing sensors in the farm. The results suggest that Θ2, the feature variable for the back curvature around the hip joints, with an AUC of 0.719 is the best single predictor variable for lameness detection based on locomotion scoring. In general, this study showed that the video-based back posture monitoring system is outperforming the behaviour and performance sensing techniques for locomotion scoring-based lameness detection. A GLMM with seven specific variables (walking speed, back posture measurement, daytime activity, milk yield, lactation stage, milk peak flow rate and milk peak conductivity) is the best combination of variables for lameness classification. The accuracy on four-level lameness classification was 60.3%. The accuracy improved to 79.8% for binary lameness classification. The binary GLMM obtained a sensitivity of 68.5% and a specificity of 87.6%, which both exceed the sensitivity (52.1%±4.7%) and specificity (83.2%±2.3%) of the multi-sensor logistic regression model. This shows that the repeated measures analysis in the GLMM, taking into account the individual history of the animal, outperforms the classification when thresholds based on herd level (a statistical population) are used.

  9. Predicting decisions in human social interactions using real-time fMRI and pattern classification.

    PubMed

    Hollmann, Maurice; Rieger, Jochem W; Baecke, Sebastian; Lützkendorf, Ralf; Müller, Charles; Adolf, Daniela; Bernarding, Johannes

    2011-01-01

    Negotiation and trade typically require a mutual interaction while simultaneously resting in uncertainty which decision the partner ultimately will make at the end of the process. Assessing already during the negotiation in which direction one's counterpart tends would provide a tremendous advantage. Recently, neuroimaging techniques combined with multivariate pattern classification of the acquired data have made it possible to discriminate subjective states of mind on the basis of their neuronal activation signature. However, to enable an online-assessment of the participant's mind state both approaches need to be extended to a real-time technique. By combining real-time functional magnetic resonance imaging (fMRI) and online pattern classification techniques, we show that it is possible to predict human behavior during social interaction before the interacting partner communicates a specific decision. Average accuracy reached approximately 70% when we predicted online the decisions of volunteers playing the ultimatum game, a well-known paradigm in economic game theory. Our results demonstrate the successful online analysis of complex emotional and cognitive states using real-time fMRI, which will enable a major breakthrough for social fMRI by providing information about mental states of partners already during the mutual interaction. Interestingly, an additional whole brain classification across subjects confirmed the online results: anterior insula, ventral striatum, and lateral orbitofrontal cortex, known to act in emotional self-regulation and reward processing for adjustment of behavior, appeared to be strong determinants of later overt behavior in the ultimatum game. Using whole brain classification we were also able to discriminate between brain processes related to subjective emotional and motivational states and brain processes related to the evaluation of objective financial incentives.

  10. Analyzing thematic maps and mapping for accuracy

    USGS Publications Warehouse

    Rosenfield, G.H.

    1982-01-01

    Two problems which exist while attempting to test the accuracy of thematic maps and mapping are: (1) evaluating the accuracy of thematic content, and (2) evaluating the effects of the variables on thematic mapping. Statistical analysis techniques are applicable to both these problems and include techniques for sampling the data and determining their accuracy. In addition, techniques for hypothesis testing, or inferential statistics, are used when comparing the effects of variables. A comprehensive and valid accuracy test of a classification project, such as thematic mapping from remotely sensed data, includes the following components of statistical analysis: (1) sample design, including the sample distribution, sample size, size of the sample unit, and sampling procedure; and (2) accuracy estimation, including estimation of the variance and confidence limits. Careful consideration must be given to the minimum sample size necessary to validate the accuracy of a given. classification category. The results of an accuracy test are presented in a contingency table sometimes called a classification error matrix. Usually the rows represent the interpretation, and the columns represent the verification. The diagonal elements represent the correct classifications. The remaining elements of the rows represent errors by commission, and the remaining elements of the columns represent the errors of omission. For tests of hypothesis that compare variables, the general practice has been to use only the diagonal elements from several related classification error matrices. These data are arranged in the form of another contingency table. The columns of the table represent the different variables being compared, such as different scales of mapping. The rows represent the blocking characteristics, such as the various categories of classification. The values in the cells of the tables might be the counts of correct classification or the binomial proportions of these counts divided by either the row totals or the column totals from the original classification error matrices. In hypothesis testing, when the results of tests of multiple sample cases prove to be significant, some form of statistical test must be used to separate any results that differ significantly from the others. In the past, many analyses of the data in this error matrix were made by comparing the relative magnitudes of the percentage of correct classifications, for either individual categories, the entire map or both. More rigorous analyses have used data transformations and (or) two-way classification analysis of variance. A more sophisticated step of data analysis techniques would be to use the entire classification error matrices using the methods of discrete multivariate analysis or of multiviariate analysis of variance.

  11. Information spreading by a combination of MEG source estimation and multivariate pattern classification.

    PubMed

    Sato, Masashi; Yamashita, Okito; Sato, Masa-Aki; Miyawaki, Yoichi

    2018-01-01

    To understand information representation in human brain activity, it is important to investigate its fine spatial patterns at high temporal resolution. One possible approach is to use source estimation of magnetoencephalography (MEG) signals. Previous studies have mainly quantified accuracy of this technique according to positional deviations and dispersion of estimated sources, but it remains unclear how accurately MEG source estimation restores information content represented by spatial patterns of brain activity. In this study, using simulated MEG signals representing artificial experimental conditions, we performed MEG source estimation and multivariate pattern analysis to examine whether MEG source estimation can restore information content represented by patterns of cortical current in source brain areas. Classification analysis revealed that the corresponding artificial experimental conditions were predicted accurately from patterns of cortical current estimated in the source brain areas. However, accurate predictions were also possible from brain areas whose original sources were not defined. Searchlight decoding further revealed that this unexpected prediction was possible across wide brain areas beyond the original source locations, indicating that information contained in the original sources can spread through MEG source estimation. This phenomenon of "information spreading" may easily lead to false-positive interpretations when MEG source estimation and classification analysis are combined to identify brain areas that represent target information. Real MEG data analyses also showed that presented stimuli were able to be predicted in the higher visual cortex at the same latency as in the primary visual cortex, also suggesting that information spreading took place. These results indicate that careful inspection is necessary to avoid false-positive interpretations when MEG source estimation and multivariate pattern analysis are combined.

  12. Information spreading by a combination of MEG source estimation and multivariate pattern classification

    PubMed Central

    Sato, Masashi; Yamashita, Okito; Sato, Masa-aki

    2018-01-01

    To understand information representation in human brain activity, it is important to investigate its fine spatial patterns at high temporal resolution. One possible approach is to use source estimation of magnetoencephalography (MEG) signals. Previous studies have mainly quantified accuracy of this technique according to positional deviations and dispersion of estimated sources, but it remains unclear how accurately MEG source estimation restores information content represented by spatial patterns of brain activity. In this study, using simulated MEG signals representing artificial experimental conditions, we performed MEG source estimation and multivariate pattern analysis to examine whether MEG source estimation can restore information content represented by patterns of cortical current in source brain areas. Classification analysis revealed that the corresponding artificial experimental conditions were predicted accurately from patterns of cortical current estimated in the source brain areas. However, accurate predictions were also possible from brain areas whose original sources were not defined. Searchlight decoding further revealed that this unexpected prediction was possible across wide brain areas beyond the original source locations, indicating that information contained in the original sources can spread through MEG source estimation. This phenomenon of “information spreading” may easily lead to false-positive interpretations when MEG source estimation and classification analysis are combined to identify brain areas that represent target information. Real MEG data analyses also showed that presented stimuli were able to be predicted in the higher visual cortex at the same latency as in the primary visual cortex, also suggesting that information spreading took place. These results indicate that careful inspection is necessary to avoid false-positive interpretations when MEG source estimation and multivariate pattern analysis are combined. PMID:29912968

  13. Use of collateral information to improve LANDSAT classification accuracies

    NASA Technical Reports Server (NTRS)

    Strahler, A. H. (Principal Investigator)

    1981-01-01

    Methods to improve LANDSAT classification accuracies were investigated including: (1) the use of prior probabilities in maximum likelihood classification as a methodology to integrate discrete collateral data with continuously measured image density variables; (2) the use of the logit classifier as an alternative to multivariate normal classification that permits mixing both continuous and categorical variables in a single model and fits empirical distributions of observations more closely than the multivariate normal density function; and (3) the use of collateral data in a geographic information system as exercised to model a desired output information layer as a function of input layers of raster format collateral and image data base layers.

  14. Partial Least Squares with Structured Output for Modelling the Metabolomics Data Obtained from Complex Experimental Designs: A Study into the Y-Block Coding.

    PubMed

    Xu, Yun; Muhamadali, Howbeer; Sayqal, Ali; Dixon, Neil; Goodacre, Royston

    2016-10-28

    Partial least squares (PLS) is one of the most commonly used supervised modelling approaches for analysing multivariate metabolomics data. PLS is typically employed as either a regression model (PLS-R) or a classification model (PLS-DA). However, in metabolomics studies it is common to investigate multiple, potentially interacting, factors simultaneously following a specific experimental design. Such data often cannot be considered as a "pure" regression or a classification problem. Nevertheless, these data have often still been treated as a regression or classification problem and this could lead to ambiguous results. In this study, we investigated the feasibility of designing a hybrid target matrix Y that better reflects the experimental design than simple regression or binary class membership coding commonly used in PLS modelling. The new design of Y coding was based on the same principle used by structural modelling in machine learning techniques. Two real metabolomics datasets were used as examples to illustrate how the new Y coding can improve the interpretability of the PLS model compared to classic regression/classification coding.

  15. Tools based on multivariate statistical analysis for classification of soil and groundwater in Apulian agricultural sites.

    PubMed

    Ielpo, Pierina; Leardi, Riccardo; Pappagallo, Giuseppe; Uricchio, Vito Felice

    2017-06-01

    In this paper, the results obtained from multivariate statistical techniques such as PCA (Principal component analysis) and LDA (Linear discriminant analysis) applied to a wide soil data set are presented. The results have been compared with those obtained on a groundwater data set, whose samples were collected together with soil ones, within the project "Improvement of the Regional Agro-meteorological Monitoring Network (2004-2007)". LDA, applied to soil data, has allowed to distinguish the geographical origin of the sample from either one of the two macroaeras: Bari and Foggia provinces vs Brindisi, Lecce e Taranto provinces, with a percentage of correct prediction in cross validation of 87%. In the case of the groundwater data set, the best classification was obtained when the samples were grouped into three macroareas: Foggia province, Bari province and Brindisi, Lecce and Taranto provinces, by reaching a percentage of correct predictions in cross validation of 84%. The obtained information can be very useful in supporting soil and water resource management, such as the reduction of water consumption and the reduction of energy and chemical (nutrients and pesticides) inputs in agriculture.

  16. Using Copula Distributions to Support More Accurate Imaging-Based Diagnostic Classifiers for Neuropsychiatric Disorders

    PubMed Central

    Bansal, Ravi; Hao, Xuejun; Liu, Jun; Peterson, Bradley S.

    2014-01-01

    Many investigators have tried to apply machine learning techniques to magnetic resonance images (MRIs) of the brain in order to diagnose neuropsychiatric disorders. Usually the number of brain imaging measures (such as measures of cortical thickness and measures of local surface morphology) derived from the MRIs (i.e., their dimensionality) has been large (e.g. >10) relative to the number of participants who provide the MRI data (<100). Sparse data in a high dimensional space increases the variability of the classification rules that machine learning algorithms generate, thereby limiting the validity, reproducibility, and generalizability of those classifiers. The accuracy and stability of the classifiers can improve significantly if the multivariate distributions of the imaging measures can be estimated accurately. To accurately estimate the multivariate distributions using sparse data, we propose to estimate first the univariate distributions of imaging data and then combine them using a Copula to generate more accurate estimates of their multivariate distributions. We then sample the estimated Copula distributions to generate dense sets of imaging measures and use those measures to train classifiers. We hypothesize that the dense sets of brain imaging measures will generate classifiers that are stable to variations in brain imaging measures, thereby improving the reproducibility, validity, and generalizability of diagnostic classification algorithms in imaging datasets from clinical populations. In our experiments, we used both computer-generated and real-world brain imaging datasets to assess the accuracy of multivariate Copula distributions in estimating the corresponding multivariate distributions of real-world imaging data. Our experiments showed that diagnostic classifiers generated using imaging measures sampled from the Copula were significantly more accurate and more reproducible than were the classifiers generated using either the real-world imaging measures or their multivariate Gaussian distributions. Thus, our findings demonstrate that estimated multivariate Copula distributions can generate dense sets of brain imaging measures that can in turn be used to train classifiers, and those classifiers are significantly more accurate and more reproducible than are those generated using real-world imaging measures alone. PMID:25093634

  17. Study of the questionnaire of the Polytechnic University of Valencia (UPV) teaching staff, using students opinion survey. Statistical treatment

    NASA Astrophysics Data System (ADS)

    Martinez Gomez, Monica

    Quality improvement of university institutions represents the most important challenge in the next years, and the potential tool to achieve it is based on the institutional evaluation in general, and specially the evaluation of the teaching performance. The opinion questionnaire from the students is the most generalised tool used to evaluate the teaching performance at Spanish universities. The general objective of this thesis is to develop a statistical methodology suitable to extract, analyse and interpret the information contained in the Questionnaire of Teaching Evaluation from Student Opinion (CEDA) of the UPV, aimed at optimising its practical use. The study is centred in the application of different multivariate techniques and has been structured in three parts: (1) Evaluation of the reliability, validity and dimensionality of the tool. The multivariate method used for this purpose is the Factorial Analysis. (2) Determination of the capacity of the questionnaire to identify different profiles of lecturers based on the quality perceived by students. This target is conducted with different multivariate classification techniques: hierarchical cluster analysis, non-hierarchical and two-stage analysis. Moreover, those items that best discriminate among the teaching typologies obtained are identified in the questionnaire. (3) Identification of the teaching typologies according to different descriptive characteristics referent to the subject and lecturer, with the use of decision trees. Once identified these typologies, a new discriminant analysis is conducted aimed at identifying those items that best characterise each typology. Finally, a study is carried out with the classification method SIMCA (Soft Independent Modelling of Class Analogy) in order to determine the discriminant loading of every item among the identified teaching typologies, allowing the identification of those that best distinguish the different classes obtained. With the combined use of the proposed techniques, it is expected to optimise the use of CEDA as a measuring tool and an indicator of the teaching quality at the university, that would allow the introduction of actions for the continuous improvement in the teaching processes of the UPV.

  18. Digital soil classification and elemental mapping using imaging Vis-NIR spectroscopy: How to explicitly quantify stagnic properties of a Luvisol under Norway spruce

    NASA Astrophysics Data System (ADS)

    Kriegs, Stefanie; Buddenbaum, Henning; Rogge, Derek; Steffens, Markus

    2015-04-01

    Laboratory imaging Vis-NIR spectroscopy of soil profiles is a novel technique in soil science that can determine quantity and quality of various chemical soil properties with a hitherto unreached spatial resolution in undisturbed soil profiles. We have applied this technique to soil cores in order to get quantitative proof of redoximorphic processes under two different tree species and to proof tree-soil interactions at microscale. Due to the imaging capabilities of Vis-NIR spectroscopy a spatially explicit understanding of soil processes and properties can be achieved. Spatial heterogeneity of the soil profile can be taken into account. We took six 30 cm long rectangular soil columns of adjacent Luvisols derived from quaternary aeolian sediments (Loess) in a forest soil near Freising/Bavaria using stainless steel boxes (100×100×300 mm). Three profiles were sampled under Norway spruce and three under European beech. A hyperspectral camera (VNIR, 400-1000 nm in 160 spectral bands) with spatial resolution of 63×63 µm² per pixel was used for data acquisition. Reference samples were taken at representative spots and analysed for organic carbon (OC) quantity and quality with a CN elemental analyser and for iron oxides (Fe) content using dithionite extraction followed by ICP-OES measurement. We compared two supervised classification algorithms, Spectral Angle Mapper and Maximum Likelihood, using different sets of training areas and spectral libraries. As established in chemometrics we used multivariate analysis such as partial least-squares regression (PLSR) in addition to multivariate adaptive regression splines (MARS) to correlate chemical data with Vis-NIR spectra. As a result elemental mapping of Fe and OC within the soil core at high spatial resolution has been achieved. The regression model was validated by a new set of reference samples for chemical analysis. Digital soil classification easily visualizes soil properties within the soil profiles. By combining both techniques, detailed soil maps, elemental balances and a deeper understanding of soil forming processes at the microscale become feasible for complete soil profiles.

  19. Combination of multivariate curve resolution and multivariate classification techniques for comprehensive high-performance liquid chromatography-diode array absorbance detection fingerprints analysis of Salvia reuterana extracts.

    PubMed

    Hakimzadeh, Neda; Parastar, Hadi; Fattahi, Mohammad

    2014-01-24

    In this study, multivariate curve resolution (MCR) and multivariate classification methods are proposed to develop a new chemometric strategy for comprehensive analysis of high-performance liquid chromatography-diode array absorbance detection (HPLC-DAD) fingerprints of sixty Salvia reuterana samples from five different geographical regions. Different chromatographic problems occurred during HPLC-DAD analysis of S. reuterana samples, such as baseline/background contribution and noise, low signal-to-noise ratio (S/N), asymmetric peaks, elution time shifts, and peak overlap are handled using the proposed strategy. In this way, chromatographic fingerprints of sixty samples are properly segmented to ten common chromatographic regions using local rank analysis and then, the corresponding segments are column-wise augmented for subsequent MCR analysis. Extended multivariate curve resolution-alternating least squares (MCR-ALS) is used to obtain pure component profiles in each segment. In general, thirty-one chemical components were resolved using MCR-ALS in sixty S. reuterana samples and the lack of fit (LOF) values of MCR-ALS models were below 10.0% in all cases. Pure spectral profiles are considered for identification of chemical components by comparing their resolved spectra with the standard ones and twenty-four components out of thirty-one components were identified. Additionally, pure elution profiles are used to obtain relative concentrations of chemical components in different samples for multivariate classification analysis by principal component analysis (PCA) and k-nearest neighbors (kNN). Inspection of the PCA score plot (explaining 76.1% of variance accounted for three PCs) showed that S. reuterana samples belong to four clusters. The degree of class separation (DCS) which quantifies the distance separating clusters in relation to the scatter within each cluster is calculated for four clusters and it was in the range of 1.6-5.8. These results are then confirmed by kNN. In addition, according to the PCA loading plot and kNN dendrogram of thirty-one variables, five chemical constituents of luteolin-7-o-glucoside, salvianolic acid D, rosmarinic acid, lithospermic acid and trijuganone A are identified as the most important variables (i.e., chemical markers) for clusters discrimination. Finally, the effect of different chemical markers on samples differentiation is investigated using counter-propagation artificial neural network (CP-ANN) method. It is concluded that the proposed strategy can be successfully applied for comprehensive analysis of chromatographic fingerprints of complex natural samples. Copyright © 2013 Elsevier B.V. All rights reserved.

  20. The Japanese Histologic Classification and T-score in the Oxford Classification system could predict renal outcome in Japanese IgA nephropathy patients.

    PubMed

    Kaihan, Ahmad Baseer; Yasuda, Yoshinari; Katsuno, Takayuki; Kato, Sawako; Imaizumi, Takahiro; Ozeki, Takaya; Hishida, Manabu; Nagata, Takanobu; Ando, Masahiko; Tsuboi, Naotake; Maruyama, Shoichi

    2017-12-01

    The Oxford Classification is utilized globally, but has not been fully validated. In this study, we conducted a comparative analysis between the Oxford Classification and Japanese Histologic Classification (JHC) to predict renal outcome in Japanese patients with IgA nephropathy (IgAN). A retrospective cohort study including 86 adult IgAN patients was conducted. The Oxford Classification and the JHC were evaluated by 7 independent specialists. The JHC, MEST score in the Oxford Classification, and crescents were analyzed in association with renal outcome, defined as a 50% increase in serum creatinine. In multivariate analysis without the JHC, only the T score was significantly associated with renal outcome. While, a significant association was revealed only in the JHC on multivariate analysis with JHC. The JHC and T score in the Oxford Classification were associated with renal outcome among Japanese patients with IgAN. Superiority of the JHC as a predictive index should be validated with larger study population and cohort studies in different ethnicities.

  1. A statistical approach to root system classification

    PubMed Central

    Bodner, Gernot; Leitner, Daniel; Nakhforoosh, Alireza; Sobotik, Monika; Moder, Karl; Kaul, Hans-Peter

    2013-01-01

    Plant root systems have a key role in ecology and agronomy. In spite of fast increase in root studies, still there is no classification that allows distinguishing among distinctive characteristics within the diversity of rooting strategies. Our hypothesis is that a multivariate approach for “plant functional type” identification in ecology can be applied to the classification of root systems. The classification method presented is based on a data-defined statistical procedure without a priori decision on the classifiers. The study demonstrates that principal component based rooting types provide efficient and meaningful multi-trait classifiers. The classification method is exemplified with simulated root architectures and morphological field data. Simulated root architectures showed that morphological attributes with spatial distribution parameters capture most distinctive features within root system diversity. While developmental type (tap vs. shoot-borne systems) is a strong, but coarse classifier, topological traits provide the most detailed differentiation among distinctive groups. Adequacy of commonly available morphologic traits for classification is supported by field data. Rooting types emerging from measured data, mainly distinguished by diameter/weight and density dominated types. Similarity of root systems within distinctive groups was the joint result of phylogenetic relation and environmental as well as human selection pressure. We concluded that the data-define classification is appropriate for integration of knowledge obtained with different root measurement methods and at various scales. Currently root morphology is the most promising basis for classification due to widely used common measurement protocols. To capture details of root diversity efforts in architectural measurement techniques are essential. PMID:23914200

  2. A statistical approach to root system classification.

    PubMed

    Bodner, Gernot; Leitner, Daniel; Nakhforoosh, Alireza; Sobotik, Monika; Moder, Karl; Kaul, Hans-Peter

    2013-01-01

    Plant root systems have a key role in ecology and agronomy. In spite of fast increase in root studies, still there is no classification that allows distinguishing among distinctive characteristics within the diversity of rooting strategies. Our hypothesis is that a multivariate approach for "plant functional type" identification in ecology can be applied to the classification of root systems. The classification method presented is based on a data-defined statistical procedure without a priori decision on the classifiers. The study demonstrates that principal component based rooting types provide efficient and meaningful multi-trait classifiers. The classification method is exemplified with simulated root architectures and morphological field data. Simulated root architectures showed that morphological attributes with spatial distribution parameters capture most distinctive features within root system diversity. While developmental type (tap vs. shoot-borne systems) is a strong, but coarse classifier, topological traits provide the most detailed differentiation among distinctive groups. Adequacy of commonly available morphologic traits for classification is supported by field data. Rooting types emerging from measured data, mainly distinguished by diameter/weight and density dominated types. Similarity of root systems within distinctive groups was the joint result of phylogenetic relation and environmental as well as human selection pressure. We concluded that the data-define classification is appropriate for integration of knowledge obtained with different root measurement methods and at various scales. Currently root morphology is the most promising basis for classification due to widely used common measurement protocols. To capture details of root diversity efforts in architectural measurement techniques are essential.

  3. Multivariate detrending of fMRI signal drifts for real-time multiclass pattern classification.

    PubMed

    Lee, Dongha; Jang, Changwon; Park, Hae-Jeong

    2015-03-01

    Signal drift in functional magnetic resonance imaging (fMRI) is an unavoidable artifact that limits classification performance in multi-voxel pattern analysis of fMRI. As conventional methods to reduce signal drift, global demeaning or proportional scaling disregards regional variations of drift, whereas voxel-wise univariate detrending is too sensitive to noisy fluctuations. To overcome these drawbacks, we propose a multivariate real-time detrending method for multiclass classification that involves spatial demeaning at each scan and the recursive detrending of drifts in the classifier outputs driven by a multiclass linear support vector machine. Experiments using binary and multiclass data showed that the linear trend estimation of the classifier output drift for each class (a weighted sum of drifts in the class-specific voxels) was more robust against voxel-wise artifacts that lead to inconsistent spatial patterns and the effect of online processing than voxel-wise detrending. The classification performance of the proposed method was significantly better, especially for multiclass data, than that of voxel-wise linear detrending, global demeaning, and classifier output detrending without demeaning. We concluded that the multivariate approach using classifier output detrending of fMRI signals with spatial demeaning preserves spatial patterns, is less sensitive than conventional methods to sample size, and increases classification performance, which is a useful feature for real-time fMRI classification. Copyright © 2014 Elsevier Inc. All rights reserved.

  4. Comparison of Xenon-Enhanced Area-Detector CT and Krypton Ventilation SPECT/CT for Assessment of Pulmonary Functional Loss and Disease Severity in Smokers.

    PubMed

    Ohno, Yoshiharu; Fujisawa, Yasuko; Takenaka, Daisuke; Kaminaga, Shigeo; Seki, Shinichiro; Sugihara, Naoki; Yoshikawa, Takeshi

    2018-02-01

    The objective of this study was to compare the capability of xenon-enhanced area-detector CT (ADCT) performed with a subtraction technique and coregistered 81m Kr-ventilation SPECT/CT for the assessment of pulmonary functional loss and disease severity in smokers. Forty-six consecutive smokers (32 men and 14 women; mean age, 67.0 years) underwent prospective unenhanced and xenon-enhanced ADCT, 81m Kr-ventilation SPECT/CT, and pulmonary function tests. Disease severity was evaluated according to the Global Initiative for Chronic Obstructive Lung Disease (GOLD) classification. CT-based functional lung volume (FLV), the percentage of wall area to total airway area (WA%), and ventilated FLV on xenon-enhanced ADCT and SPECT/CT were calculated for each smoker. All indexes were correlated with percentage of forced expiratory volume in 1 second (%FEV 1 ) using step-wise regression analyses, and univariate and multivariate logistic regression analyses were performed. In addition, the diagnostic accuracy of the proposed model was compared with that of each radiologic index by means of McNemar analysis. Multivariate logistic regression showed that %FEV 1 was significantly affected (r = 0.77, r 2 = 0.59) by two factors: the first factor, ventilated FLV on xenon-enhanced ADCT (p < 0.0001); and the second factor, WA% (p = 0.004). Univariate logistic regression analyses indicated that all indexes significantly affected GOLD classification (p < 0.05). Multivariate logistic regression analyses revealed that ventilated FLV on xenon-enhanced ADCT and CT-based FLV significantly influenced GOLD classification (p < 0.0001). The diagnostic accuracy of the proposed model was significantly higher than that of ventilated FLV on SPECT/CT (p = 0.03) and WA% (p = 0.008). Xenon-enhanced ADCT is more effective than 81m Kr-ventilation SPECT/CT for the assessment of pulmonary functional loss and disease severity.

  5. FT-MIR and NIR spectral data fusion: a synergetic strategy for the geographical traceability of Panax notoginseng.

    PubMed

    Li, Yun; Zhang, Jin-Yu; Wang, Yuan-Zhong

    2018-01-01

    Three data fusion strategies (low-llevel, mid-llevel, and high-llevel) combined with a multivariate classification algorithm (random forest, RF) were applied to authenticate the geographical origins of Panax notoginseng collected from five regions of Yunnan province in China. In low-level fusion, the original data from two spectra (Fourier transform mid-IR spectrum and near-IR spectrum) were directly concatenated into a new matrix, which then was applied for the classification. Mid-level fusion was the strategy that inputted variables extracted from the spectral data into an RF classification model. The extracted variables were processed by iterate variable selection of the RF model and principal component analysis. The use of high-level fusion combined the decision making of each spectroscopic technique and resulted in an ensemble decision. The results showed that the mid-level and high-level data fusion take advantage of the information synergy from two spectroscopic techniques and had better classification performance than that of independent decision making. High-level data fusion is the most effective strategy since the classification results are better than those of the other fusion strategies: accuracy rates ranged between 93% and 96% for the low-level data fusion, between 95% and 98% for the mid-level data fusion, and between 98% and 100% for the high-level data fusion. In conclusion, the high-level data fusion strategy for Fourier transform mid-IR and near-IR spectra can be used as a reliable tool for correct geographical identification of P. notoginseng. Graphical abstract The analytical steps of Fourier transform mid-IR and near-IR spectral data fusion for the geographical traceability of Panax notoginseng.

  6. External Validation of the European Hernia Society Classification for Postoperative Complications after Incisional Hernia Repair: A Cohort Study of 2,191 Patients.

    PubMed

    Kroese, Leonard F; Kleinrensink, Gert-Jan; Lange, Johan F; Gillion, Jean-Francois

    2018-03-01

    Incisional hernia is a frequent complication after midline laparotomy. Surgical hernia repair is associated with complications, but no clear predictive risk factors have been identified. The European Hernia Society (EHS) classification offers a structured framework to describe hernias and to analyze postoperative complications. Because of its structured nature, it might prove to be useful for preoperative patient or treatment classification. The objective of this study was to investigate the EHS classification as a predictor for postoperative complications after incisional hernia surgery. An analysis was performed using a registry-based, large-scale, prospective cohort study, including all patients undergoing incisional hernia surgery between September 1, 2011 and February 29, 2016. Univariate analyses and multivariable logistic regression analysis were performed to identify risk factors for postoperative complications. A total of 2,191 patients were included, of whom 323 (15%) had 1 or more complications. Factors associated with complications in univariate analyses (p < 0.20) and clinically relevant factors were included in the multivariable analysis. In the multivariable analysis, EHS width class, incarceration, open surgery, duration of surgery, Altemeier wound class, and therapeutic antibiotic treatment were independent risk factors for postoperative complications. Third recurrence and emergency surgery were associated with fewer complications. Incisional hernia repair is associated with a 15% complication rate. The EHS width classification is associated with postoperative complications. To identify patients at risk for complications, the EHS classification is useful. Copyright © 2017. Published by Elsevier Inc.

  7. Crystallization tendency of active pharmaceutical ingredients following rapid solvent evaporation--classification and comparison with crystallization tendency from undercooled melts.

    PubMed

    Van Eerdenbrugh, Bernard; Baird, Jared A; Taylor, Lynne S

    2010-09-01

    In this study, the crystallization behavior of a variety of compounds was studied following rapid solvent evaporation using spin coating. Initial screening to determine model compound suitability was performed using a structurally diverse set of 51 compounds in three different solvent systems [dichloromethane (DCM), a 1:1 (w/w) dichloromethane/ethanol mixture (MIX), and ethanol (EtOH)]. Of this starting set of 153 drug-solvent combinations, 93 (40 compounds) were selected for further evaluation based on solubility, chemical solution stability, and processability criteria. These systems were spin coated and their crystallization was monitored using polarized light microscopy (7 days, dry conditions). The crystallization behavior of the samples could be classified as rapid (Class I: 39 cases), intermediate (Class II: 23 cases), or slow (Class III: 31 cases). The solvent system employed influenced the classification outcome for only four of the compounds. The various compounds showed very diverse crystallization behavior. Upon comparison of classification results with those of a previous study, where cooling from the melt was used as a preparation technique, a good similarity was found whereby 68% of the cases were identically classified. Multivariate analysis was performed using a set of relevant physicochemical compound characteristics. It was found that a number of these parameters tended to differ between the different classes. These could be further interpreted in terms of the nature of the crystallization process. Additional multivariate analysis on the separate classes of compounds indicated some potential in predicting the crystallization tendency of a given compound.

  8. Analysis of spreadable cheese by Raman spectroscopy and chemometric tools.

    PubMed

    Oliveira, Kamila de Sá; Callegaro, Layce de Souza; Stephani, Rodrigo; Almeida, Mariana Ramos; de Oliveira, Luiz Fernando Cappa

    2016-03-01

    In this work, FT-Raman spectroscopy was explored to evaluate spreadable cheese samples. A partial least squares discriminant analysis was employed to identify the spreadable cheese samples containing starch. To build the models, two types of samples were used: commercial samples and samples manufactured in local industries. The method of supervised classification PLS-DA was employed to classify the samples as adulterated or without starch. Multivariate regression was performed using the partial least squares method to quantify the starch in the spreadable cheese. The limit of detection obtained for the model was 0.34% (w/w) and the limit of quantification was 1.14% (w/w). The reliability of the models was evaluated by determining the confidence interval, which was calculated using the bootstrap re-sampling technique. The results show that the classification models can be used to complement classical analysis and as screening methods. Copyright © 2015 Elsevier Ltd. All rights reserved.

  9. Classification of white wine aromas with an electronic nose.

    PubMed

    Lozano, J; Santos, J P; Horrillo, M C

    2005-09-15

    This paper reports the use of a tin dioxide multisensor array based electronic nose for recognition of 29 typical aromas in white wine. Headspace technique has been used to extract aroma of the wine. Multivariate analysis, including principal component analysis (PCA) as well as probabilistic neural networks (PNNs), has been used to identify the main aroma added to the wine. The results showed that in spite of the strong influence of ethanol and other majority compounds of wine, the system could discriminate correctly the aromatic compounds added to the wine with a minimum accuracy of 97.2%.

  10. Elimination of RF inhomogeneity effects in segmentation.

    PubMed

    Agus, Onur; Ozkan, Mehmed; Aydin, Kubilay

    2007-01-01

    There are various methods proposed for the segmentation and analysis of MR images. However the efficiency of these techniques is effected by various artifacts that occur in the imaging system. One of the most encountered problems is the intensity variation across an image. To overcome this problem different methods are used. In this paper we propose a method for the elimination of intensity artifacts in segmentation of MRI images. Inter imager variations are also minimized to produce the same tissue segmentation for the same patient. A well-known multivariate classification algorithm, maximum likelihood is employed to illustrate the enhancement in segmentation.

  11. Classification of M1/M2-polarized human macrophages by label-free hyperspectral reflectance confocal microscopy and multivariate analysis.

    PubMed

    Bertani, Francesca R; Mozetic, Pamela; Fioramonti, Marco; Iuliani, Michele; Ribelli, Giulia; Pantano, Francesco; Santini, Daniele; Tonini, Giuseppe; Trombetta, Marcella; Businaro, Luca; Selci, Stefano; Rainer, Alberto

    2017-08-21

    The possibility of detecting and classifying living cells in a label-free and non-invasive manner holds significant theranostic potential. In this work, Hyperspectral Imaging (HSI) has been successfully applied to the analysis of macrophagic polarization, given its central role in several pathological settings, including the regulation of tumour microenvironment. Human monocyte derived macrophages have been investigated using hyperspectral reflectance confocal microscopy, and hyperspectral datasets have been analysed in terms of M1 vs. M2 polarization by Principal Components Analysis (PCA). Following PCA, Linear Discriminant Analysis has been implemented for semi-automatic classification of macrophagic polarization from HSI data. Our results confirm the possibility to perform single-cell-level in vitro classification of M1 vs. M2 macrophages in a non-invasive and label-free manner with a high accuracy (above 98% for cells deriving from the same donor), supporting the idea of applying the technique to the study of complex interacting cellular systems, such in the case of tumour-immunity in vitro models.

  12. Application of a Novel S3 Nanowire Gas Sensor Device in Parallel with GC-MS for the Identification of Rind Percentage of Grated Parmigiano Reggiano.

    PubMed

    Abbatangelo, Marco; Núñez-Carmona, Estefanía; Sberveglieri, Veronica; Zappa, Dario; Comini, Elisabetta; Sberveglieri, Giorgio

    2018-05-18

    Parmigiano Reggiano cheese is one of the most appreciated and consumed foods worldwide, especially in Italy, for its high content of nutrients and taste. However, these characteristics make this product subject to counterfeiting in different forms. In this study, a novel method based on an electronic nose has been developed to investigate the potentiality of this tool to distinguish rind percentages in grated Parmigiano Reggiano packages that should be lower than 18%. Different samples, in terms of percentage, seasoning and rind working process, were considered to tackle the problem at 360°. In parallel, GC-MS technique was used to give a name to the compounds that characterize Parmigiano and to relate them to sensors responses. Data analysis consisted of two stages: Multivariate analysis (PLS) and classification made in a hierarchical way with PLS-DA ad ANNs. Results were promising, in terms of correct classification of the samples. The correct classification rate (%) was higher for ANNs than PLS-DA, with correct identification approaching 100 percent.

  13. Geographic identification of Boletus mushrooms by data fusion of FT-IR and UV spectroscopies combined with multivariate statistical analysis

    NASA Astrophysics Data System (ADS)

    Yao, Sen; Li, Tao; Li, JieQing; Liu, HongGao; Wang, YuanZhong

    2018-06-01

    Boletus griseus and Boletus edulis are two well-known wild-grown edible mushrooms which have high nutrition, delicious flavor and high economic value distributing in Yunnan Province. In this study, a rapid method using Fourier transform infrared (FT-IR) and ultraviolet (UV) spectroscopies coupled with data fusion was established for the discrimination of Boletus mushrooms from seven different geographical origins with pattern recognition method. Initially, the spectra of 332 mushroom samples obtained from the two spectroscopic techniques were analyzed individually and then the classification performance based on data fusion strategy was investigated. Meanwhile, the latent variables (LVs) of FT-IR and UV spectra were extracted by partial least square discriminant analysis (PLS-DA) and two datasets were concatenated into a new matrix for data fusion. Then, the fusion matrix was further analyzed by support vector machine (SVM). Compared with single spectroscopic technique, data fusion strategy can improve the classification performance effectively. In particular, the accuracy of correct classification of SVM model in training and test sets were 99.10% and 100.00%, respectively. The results demonstrated that data fusion of FT-IR and UV spectra can provide higher synergic effect for the discrimination of different geographical origins of Boletus mushrooms, which may be benefit for further authentication and quality assessment of edible mushrooms.

  14. Geographic identification of Boletus mushrooms by data fusion of FT-IR and UV spectroscopies combined with multivariate statistical analysis.

    PubMed

    Yao, Sen; Li, Tao; Li, JieQing; Liu, HongGao; Wang, YuanZhong

    2018-06-05

    Boletus griseus and Boletus edulis are two well-known wild-grown edible mushrooms which have high nutrition, delicious flavor and high economic value distributing in Yunnan Province. In this study, a rapid method using Fourier transform infrared (FT-IR) and ultraviolet (UV) spectroscopies coupled with data fusion was established for the discrimination of Boletus mushrooms from seven different geographical origins with pattern recognition method. Initially, the spectra of 332 mushroom samples obtained from the two spectroscopic techniques were analyzed individually and then the classification performance based on data fusion strategy was investigated. Meanwhile, the latent variables (LVs) of FT-IR and UV spectra were extracted by partial least square discriminant analysis (PLS-DA) and two datasets were concatenated into a new matrix for data fusion. Then, the fusion matrix was further analyzed by support vector machine (SVM). Compared with single spectroscopic technique, data fusion strategy can improve the classification performance effectively. In particular, the accuracy of correct classification of SVM model in training and test sets were 99.10% and 100.00%, respectively. The results demonstrated that data fusion of FT-IR and UV spectra can provide higher synergic effect for the discrimination of different geographical origins of Boletus mushrooms, which may be benefit for further authentication and quality assessment of edible mushrooms. Copyright © 2018 Elsevier B.V. All rights reserved.

  15. Olive oil sensory defects classification with data fusion of instrumental techniques and multivariate analysis (PLS-DA).

    PubMed

    Borràs, Eva; Ferré, Joan; Boqué, Ricard; Mestres, Montserrat; Aceña, Laura; Calvo, Angels; Busto, Olga

    2016-07-15

    Three instrumental techniques, headspace-mass spectrometry (HS-MS), mid-infrared spectroscopy (MIR) and UV-visible spectrophotometry (UV-vis), have been combined to classify virgin olive oil samples based on the presence or absence of sensory defects. The reference sensory values were provided by an official taste panel. Different data fusion strategies were studied to improve the discrimination capability compared to using each instrumental technique individually. A general model was applied to discriminate high-quality non-defective olive oils (extra-virgin) and the lowest-quality olive oils considered non-edible (lampante). A specific identification of key off-flavours, such as musty, winey, fusty and rancid, was also studied. The data fusion of the three techniques improved the classification results in most of the cases. Low-level data fusion was the best strategy to discriminate musty, winey and fusty defects, using HS-MS, MIR and UV-vis, and the rancid defect using only HS-MS and MIR. The mid-level data fusion approach using partial least squares-discriminant analysis (PLS-DA) scores was found to be the best strategy for defective vs non-defective and edible vs non-edible oil discrimination. However, the data fusion did not sufficiently improve the results obtained by a single technique (HS-MS) to classify non-defective classes. These results indicate that instrumental data fusion can be useful for the identification of sensory defects in virgin olive oils. Copyright © 2016 Elsevier Ltd. All rights reserved.

  16. Fast classification and compositional analysis of cornstover fractions using Fourier transform near-infrared techniques.

    PubMed

    Philip Ye, X; Liu, Lu; Hayes, Douglas; Womac, Alvin; Hong, Kunlun; Sokhansanj, Shahab

    2008-10-01

    The objectives of this research were to determine the variation of chemical composition across botanical fractions of cornstover, and to probe the potential of Fourier transform near-infrared (FT-NIR) techniques in qualitatively classifying separated cornstover fractions and in quantitatively analyzing chemical compositions of cornstover by developing calibration models to predict chemical compositions of cornstover based on FT-NIR spectra. Large variations of cornstover chemical composition for wide calibration ranges, which is required by a reliable calibration model, were achieved by manually separating the cornstover samples into six botanical fractions, and their chemical compositions were determined by conventional wet chemical analyses, which proved that chemical composition varies significantly among different botanical fractions of cornstover. Different botanic fractions, having total saccharide content in descending order, are husk, sheath, pith, rind, leaf, and node. Based on FT-NIR spectra acquired on the biomass, classification by Soft Independent Modeling of Class Analogy (SIMCA) was employed to conduct qualitative classification of cornstover fractions, and partial least square (PLS) regression was used for quantitative chemical composition analysis. SIMCA was successfully demonstrated in classifying botanical fractions of cornstover. The developed PLS model yielded root mean square error of prediction (RMSEP %w/w) of 0.92, 1.03, 0.17, 0.27, 0.21, 1.12, and 0.57 for glucan, xylan, galactan, arabinan, mannan, lignin, and ash, respectively. The results showed the potential of FT-NIR techniques in combination with multivariate analysis to be utilized by biomass feedstock suppliers, bioethanol manufacturers, and bio-power producers in order to better manage bioenergy feedstocks and enhance bioconversion.

  17. Incremental Validity of Multidimensional Proficiency Scores from Diagnostic Classification Models: An Illustration for Elementary School Mathematics

    ERIC Educational Resources Information Center

    Kunina-Habenicht, Olga; Rupp, André A.; Wilhelm, Oliver

    2017-01-01

    Diagnostic classification models (DCMs) hold great potential for applications in summative and formative assessment by providing discrete multivariate proficiency scores that yield statistically driven classifications of students. Using data from a newly developed diagnostic arithmetic assessment that was administered to 2032 fourth-grade students…

  18. Unique Characteristics of Diagnostic Classification Models: A Comprehensive Review of the Current State-of-the-Art

    ERIC Educational Resources Information Center

    Rupp, Andre A.; Templin, Jonathan L.

    2008-01-01

    "Diagnostic classification models" (DCM) are frequently promoted by psychometricians as important modelling alternatives for analyzing response data in situations where multivariate classifications of respondents are made on the basis of multiple postulated latent skills. In this review paper, a definitional boundary of the space of DCM…

  19. Testing Multivariate Adaptive Regression Splines (MARS) as a Method of Land Cover Classification of TERRA-ASTER Satellite Images.

    PubMed

    Quirós, Elia; Felicísimo, Angel M; Cuartero, Aurora

    2009-01-01

    This work proposes a new method to classify multi-spectral satellite images based on multivariate adaptive regression splines (MARS) and compares this classification system with the more common parallelepiped and maximum likelihood (ML) methods. We apply the classification methods to the land cover classification of a test zone located in southwestern Spain. The basis of the MARS method and its associated procedures are explained in detail, and the area under the ROC curve (AUC) is compared for the three methods. The results show that the MARS method provides better results than the parallelepiped method in all cases, and it provides better results than the maximum likelihood method in 13 cases out of 17. These results demonstrate that the MARS method can be used in isolation or in combination with other methods to improve the accuracy of soil cover classification. The improvement is statistically significant according to the Wilcoxon signed rank test.

  20. Progress in the detection of neoplastic progress and cancer by Raman spectroscopy

    NASA Astrophysics Data System (ADS)

    Bakker Schut, Tom C.; Stone, Nicholas; Kendall, Catherine A.; Barr, Hugh; Bruining, Hajo A.; Puppels, Gerwin J.

    2000-05-01

    Early detection of cancer is important because of the improved survival rates when the cancer is treated early. We study the application of NIR Raman spectroscopy for detection of dysplasia because this technique is sensitive to the small changes in molecular invasive in vivo detection using fiber-optic probes. The result of an in vitro study to detect neoplastic progress of esophageal Barrett's esophageal tissue will be presented. Using multivariate statistics, we developed three different linear discriminant analysis classification models to predict tissue type on the basis of the measured spectrum. Spectra of normal, metaplastic and dysplasia tissue could be discriminated with an accuracy of up to 88 percent. Therefore Raman spectroscopy seems to be a very suitable technique to detect dysplasia in Barrett's esophageal tissue.

  1. A Hybrid Semi-supervised Classification Scheme for Mining Multisource Geospatial Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vatsavai, Raju; Bhaduri, Budhendra L

    2011-01-01

    Supervised learning methods such as Maximum Likelihood (ML) are often used in land cover (thematic) classification of remote sensing imagery. ML classifier relies exclusively on spectral characteristics of thematic classes whose statistical distributions (class conditional probability densities) are often overlapping. The spectral response distributions of thematic classes are dependent on many factors including elevation, soil types, and ecological zones. A second problem with statistical classifiers is the requirement of large number of accurate training samples (10 to 30 |dimensions|), which are often costly and time consuming to acquire over large geographic regions. With the increasing availability of geospatial databases, itmore » is possible to exploit the knowledge derived from these ancillary datasets to improve classification accuracies even when the class distributions are highly overlapping. Likewise newer semi-supervised techniques can be adopted to improve the parameter estimates of statistical model by utilizing a large number of easily available unlabeled training samples. Unfortunately there is no convenient multivariate statistical model that can be employed for mulitsource geospatial databases. In this paper we present a hybrid semi-supervised learning algorithm that effectively exploits freely available unlabeled training samples from multispectral remote sensing images and also incorporates ancillary geospatial databases. We have conducted several experiments on real datasets, and our new hybrid approach shows over 25 to 35% improvement in overall classification accuracy over conventional classification schemes.« less

  2. Development of an ecological classification system for the Wayne National Forest

    Treesearch

    David M. Hix; Andrea M. Chech

    1993-01-01

    In 1991, a collaborative research project was initiated to create an ecological classification system for the Wayne National Forest of southeastern Ohio. The work focuses on the ecological land type (ELT) level of ecosystem classification. The most common ELTs are being identified and described using information from intensive field sampling and multivariate data...

  3. Simultaneous Force Regression and Movement Classification of Fingers via Surface EMG within a Unified Bayesian Framework.

    PubMed

    Baldacchino, Tara; Jacobs, William R; Anderson, Sean R; Worden, Keith; Rowson, Jennifer

    2018-01-01

    This contribution presents a novel methodology for myolectric-based control using surface electromyographic (sEMG) signals recorded during finger movements. A multivariate Bayesian mixture of experts (MoE) model is introduced which provides a powerful method for modeling force regression at the fingertips, while also performing finger movement classification as a by-product of the modeling algorithm. Bayesian inference of the model allows uncertainties to be naturally incorporated into the model structure. This method is tested using data from the publicly released NinaPro database which consists of sEMG recordings for 6 degree-of-freedom force activations for 40 intact subjects. The results demonstrate that the MoE model achieves similar performance compared to the benchmark set by the authors of NinaPro for finger force regression. Additionally, inherent to the Bayesian framework is the inclusion of uncertainty in the model parameters, naturally providing confidence bounds on the force regression predictions. Furthermore, the integrated clustering step allows a detailed investigation into classification of the finger movements, without incurring any extra computational effort. Subsequently, a systematic approach to assessing the importance of the number of electrodes needed for accurate control is performed via sensitivity analysis techniques. A slight degradation in regression performance is observed for a reduced number of electrodes, while classification performance is unaffected.

  4. Guidelines to classification and nomenclature of Arabian felsic plutonic rocks

    USGS Publications Warehouse

    Ramsay, C.R.; Stoeser, D.B.; Drysdall, A.R.

    1986-01-01

    Well-defined procedures for classifying the felsic plutonic rocks of the Arabian Shield on the basis of petrographic, chemical and lithostratigraphic criteria and mineral-resource potential have been adopted and developed in the Saudi Arabian Deputy Ministry for Mineral Resources over the past decade. A number of problems with conventional classification schemes have been identified and resolved; others, notably those arising from difficulties in identifying precise mineral compositions, continue to present difficulties. The petrographic nomenclature used is essentially that recommended by the International Union of Geological Sciences. Problems that have arisen include the definition of: (1) rocks with sodic, zoned or perthitic feldspar, (2) trondhjemites, and (3) alkali granites. Chemical classification has been largely based on relative molar amounts of alumina, lime and alkalis, and the use of conventional variation diagrams, but pilot studies utilizing univariate and multivariate statistical techniques have been made. The classification used in Saudi Arabia for stratigraphic purposes is a hierarchy of formation-rank units, suites and super-suites as defined in the Saudi Arabian stratigraphic code. For genetic and petrological studies, a grouping as 'associations' of similar and genetically related lithologies is commonly used. In order to indicate mineral-resource potential, the felsic plutons are classed as common, precursor, specialized or mineralized, in order of increasing exploration significance. ?? 1986.

  5. Evaluation of a multi-fibre needle Raman probe for tissue analysis

    NASA Astrophysics Data System (ADS)

    Fullwood, Leanne M.; Iping Petterson, Ingeborg E.; Dudgeon, Alexander P.; Lloyd, Gavin R.; Kendall, Catherine; Hall, Charlie; Day, John C. C.; Stone, Nick

    2016-03-01

    Raman spectroscopy is a rapid technique for the identification of cancers. Its coupling with a hypodermic needle provides a minimally invasive instrument with the potential to aid real time assessment of suspicious lesions in vivo and guide surgery. A fibre optic Raman needle probe was utilised in this study to evaluate the classification ability of the instrument as a diagnostic tool together with multivariate analysis, through measurements of tissues from different animal species as well as various different porcine tissue types. Cross validation was performed and preliminary classification accuracies were calculated as 100% for the identification of tissue type and 97.5% for the identification of animal species. A lymph node sample was also measured using the needle probe to assess the use of the technique for human tissue and hence its efficiency as a clinical instrument. This needle probe has been demonstrated to have the capabilities to classify tissue samples based on their biochemical components. The Raman needle probe also has the potential to act as a diagnostic and surgical tool to delineate cancerous from non-cancerous cells in real time, thus assisting complete removal of a tumour.

  6. Multivariate Density Estimation and Remote Sensing

    NASA Technical Reports Server (NTRS)

    Scott, D. W.

    1983-01-01

    Current efforts to develop methods and computer algorithms to effectively represent multivariate data commonly encountered in remote sensing applications are described. While this may involve scatter diagrams, multivariate representations of nonparametric probability density estimates are emphasized. The density function provides a useful graphical tool for looking at data and a useful theoretical tool for classification. This approach is called a thunderstorm data analysis.

  7. Multivariate analysis of the volatile components in tobacco based on infrared-assisted extraction coupled to headspace solid-phase microextraction and gas chromatography-mass spectrometry.

    PubMed

    Yang, Yanqin; Pan, Yuanjiang; Zhou, Guojun; Chu, Guohai; Jiang, Jian; Yuan, Kailong; Xia, Qian; Cheng, Changhe

    2016-11-01

    A novel infrared-assisted extraction coupled to headspace solid-phase microextraction followed by gas chromatography with mass spectrometry method has been developed for the rapid determination of the volatile components in tobacco. The optimal extraction conditions for maximizing the extraction efficiency were as follows: 65 μm polydimethylsiloxane-divinylbenzene fiber, extraction time of 20 min, infrared power of 175 W, and distance between the infrared lamp and the headspace vial of 2 cm. Under the optimum conditions, 50 components were found to exist in all ten tobacco samples from different geographical origins. Compared with conventional water-bath heating and nonheating extraction methods, the extraction efficiency of infrared-assisted extraction was greatly improved. Furthermore, multivariate analysis including principal component analysis, hierarchical cluster analysis, and similarity analysis were performed to evaluate the chemical information of these samples and divided them into three classifications, including rich, moderate, and fresh flavors. The above-mentioned classification results were consistent with the sensory evaluation, which was pivotal and meaningful for tobacco discrimination. As a simple, fast, cost-effective, and highly efficient method, the infrared-assisted extraction coupled to headspace solid-phase microextraction technique is powerful and promising for distinguishing the geographical origins of the tobacco samples coupled to suitable chemometrics. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  8. Classification of Malaysia aromatic rice using multivariate statistical analysis

    NASA Astrophysics Data System (ADS)

    Abdullah, A. H.; Adom, A. H.; Shakaff, A. Y. Md; Masnan, M. J.; Zakaria, A.; Rahim, N. A.; Omar, O.

    2015-05-01

    Aromatic rice (Oryza sativa L.) is considered as the best quality premium rice. The varieties are preferred by consumers because of its preference criteria such as shape, colour, distinctive aroma and flavour. The price of aromatic rice is higher than ordinary rice due to its special needed growth condition for instance specific climate and soil. Presently, the aromatic rice quality is identified by using its key elements and isotopic variables. The rice can also be classified via Gas Chromatography Mass Spectrometry (GC-MS) or human sensory panels. However, the uses of human sensory panels have significant drawbacks such as lengthy training time, and prone to fatigue as the number of sample increased and inconsistent. The GC-MS analysis techniques on the other hand, require detailed procedures, lengthy analysis and quite costly. This paper presents the application of in-house developed Electronic Nose (e-nose) to classify new aromatic rice varieties. The e-nose is used to classify the variety of aromatic rice based on the samples odour. The samples were taken from the variety of rice. The instrument utilizes multivariate statistical data analysis, including Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA) and K-Nearest Neighbours (KNN) to classify the unknown rice samples. The Leave-One-Out (LOO) validation approach is applied to evaluate the ability of KNN to perform recognition and classification of the unspecified samples. The visual observation of the PCA and LDA plots of the rice proves that the instrument was able to separate the samples into different clusters accordingly. The results of LDA and KNN with low misclassification error support the above findings and we may conclude that the e-nose is successfully applied to the classification of the aromatic rice varieties.

  9. Visual classification of very fine-grained sediments: Evaluation through univariate and multivariate statistics

    USGS Publications Warehouse

    Hohn, M. Ed; Nuhfer, E.B.; Vinopal, R.J.; Klanderman, D.S.

    1980-01-01

    Classifying very fine-grained rocks through fabric elements provides information about depositional environments, but is subject to the biases of visual taxonomy. To evaluate the statistical significance of an empirical classification of very fine-grained rocks, samples from Devonian shales in four cored wells in West Virginia and Virginia were measured for 15 variables: quartz, illite, pyrite and expandable clays determined by X-ray diffraction; total sulfur, organic content, inorganic carbon, matrix density, bulk density, porosity, silt, as well as density, sonic travel time, resistivity, and ??-ray response measured from well logs. The four lithologic types comprised: (1) sharply banded shale, (2) thinly laminated shale, (3) lenticularly laminated shale, and (4) nonbanded shale. Univariate and multivariate analyses of variance showed that the lithologic classification reflects significant differences for the variables measured, difference that can be detected independently of stratigraphic effects. Little-known statistical methods found useful in this work included: the multivariate analysis of variance with more than one effect, simultaneous plotting of samples and variables on canonical variates, and the use of parametric ANOVA and MANOVA on ranked data. ?? 1980 Plenum Publishing Corporation.

  10. Lake bed classification using acoustic data

    USGS Publications Warehouse

    Yin, Karen K.; Li, Xing; Bonde, John; Richards, Carl; Cholwek, Gary

    1998-01-01

    As part of our effort to identify the lake bed surficial substrates using remote sensing data, this work designs pattern classifiers by multivariate statistical methods. Probability distribution of the preprocessed acoustic signal is analyzed first. A confidence region approach is then adopted to improve the design of the existing classifier. A technique for further isolation is proposed which minimizes the expected loss from misclassification. The devices constructed are applicable for real-time lake bed categorization. A mimimax approach is suggested to treat more general cases where the a priori probability distribution of the substrate types is unknown. Comparison of the suggested methods with the traditional likelihood ratio tests is discussed.

  11. Multivariate data analysis and machine learning in Alzheimer's disease with a focus on structural magnetic resonance imaging.

    PubMed

    Falahati, Farshad; Westman, Eric; Simmons, Andrew

    2014-01-01

    Machine learning algorithms and multivariate data analysis methods have been widely utilized in the field of Alzheimer's disease (AD) research in recent years. Advances in medical imaging and medical image analysis have provided a means to generate and extract valuable neuroimaging information. Automatic classification techniques provide tools to analyze this information and observe inherent disease-related patterns in the data. In particular, these classifiers have been used to discriminate AD patients from healthy control subjects and to predict conversion from mild cognitive impairment to AD. In this paper, recent studies are reviewed that have used machine learning and multivariate analysis in the field of AD research. The main focus is on studies that used structural magnetic resonance imaging (MRI), but studies that included positron emission tomography and cerebrospinal fluid biomarkers in addition to MRI are also considered. A wide variety of materials and methods has been employed in different studies, resulting in a range of different outcomes. Influential factors such as classifiers, feature extraction algorithms, feature selection methods, validation approaches, and cohort properties are reviewed, as well as key MRI-based and multi-modal based studies. Current and future trends are discussed.

  12. Fluorescent marker-based and marker-free discrimination between healthy and cancerous human tissues using hyper-spectral imaging

    NASA Astrophysics Data System (ADS)

    Arnold, Thomas; De Biasio, Martin; Leitner, Raimund

    2015-06-01

    Two problems are addressed in this paper (i) the fluorescent marker-based and the (ii) marker-free discrimination between healthy and cancerous human tissues. For both applications the performance of hyper-spectral methods are quantified. Fluorescent marker-based tissue classification uses a number of fluorescent markers to dye specific parts of a human cell. The challenge is that the emission spectra of the fluorescent dyes overlap considerably. They are, furthermore disturbed by the inherent auto-fluorescence of human tissue. This results in ambiguities and decreased image contrast causing difficulties for the treatment decision. The higher spectral resolution introduced by tunable-filter-based spectral imaging in combination with spectral unmixing techniques results in an improvement of the image contrast and therefore more reliable information for the physician to choose the treatment decision. Marker-free tissue classification is based solely on the subtle spectral features of human tissue without the use of artificial markers. The challenge in this case is that the spectral differences between healthy and cancerous tissues are subtle and embedded in intra- and inter-patient variations of these features. The contributions of this paper are (i) the evaluation of hyper-spectral imaging in combination with spectral unmixing techniques for fluorescence marker-based tissue classification, (ii) the evaluation of spectral imaging for marker-free intra surgery tissue classification. Within this paper, we consider real hyper-spectral fluorescence and endoscopy data sets to emphasize the practical capability of the proposed methods. It is shown that the combination of spectral imaging with multivariate statistical methods can improve the sensitivity and specificity of the detection and the staging of cancerous tissues compared to standard procedures.

  13. Incorporation of support vector machines in the LIBS toolbox for sensitive and robust classification amidst unexpected sample and system variability

    PubMed Central

    ChariDingari, Narahara; Barman, Ishan; Myakalwar, Ashwin Kumar; Tewari, Surya P.; Kumar, G. Manoj

    2012-01-01

    Despite the intrinsic elemental analysis capability and lack of sample preparation requirements, laser-induced breakdown spectroscopy (LIBS) has not been extensively used for real world applications, e.g. quality assurance and process monitoring. Specifically, variability in sample, system and experimental parameters in LIBS studies present a substantive hurdle for robust classification, even when standard multivariate chemometric techniques are used for analysis. Considering pharmaceutical sample investigation as an example, we propose the use of support vector machines (SVM) as a non-linear classification method over conventional linear techniques such as soft independent modeling of class analogy (SIMCA) and partial least-squares discriminant analysis (PLS-DA) for discrimination based on LIBS measurements. Using over-the-counter pharmaceutical samples, we demonstrate that application of SVM enables statistically significant improvements in prospective classification accuracy (sensitivity), due to its ability to address variability in LIBS sample ablation and plasma self-absorption behavior. Furthermore, our results reveal that SVM provides nearly 10% improvement in correct allocation rate and a concomitant reduction in misclassification rates of 75% (cf. PLS-DA) and 80% (cf. SIMCA)-when measurements from samples not included in the training set are incorporated in the test data – highlighting its robustness. While further studies on a wider matrix of sample types performed using different LIBS systems is needed to fully characterize the capability of SVM to provide superior predictions, we anticipate that the improved sensitivity and robustness observed here will facilitate application of the proposed LIBS-SVM toolbox for screening drugs and detecting counterfeit samples as well as in related areas of forensic and biological sample analysis. PMID:22292496

  14. Incorporation of support vector machines in the LIBS toolbox for sensitive and robust classification amidst unexpected sample and system variability.

    PubMed

    Dingari, Narahara Chari; Barman, Ishan; Myakalwar, Ashwin Kumar; Tewari, Surya P; Kumar Gundawar, Manoj

    2012-03-20

    Despite the intrinsic elemental analysis capability and lack of sample preparation requirements, laser-induced breakdown spectroscopy (LIBS) has not been extensively used for real-world applications, e.g., quality assurance and process monitoring. Specifically, variability in sample, system, and experimental parameters in LIBS studies present a substantive hurdle for robust classification, even when standard multivariate chemometric techniques are used for analysis. Considering pharmaceutical sample investigation as an example, we propose the use of support vector machines (SVM) as a nonlinear classification method over conventional linear techniques such as soft independent modeling of class analogy (SIMCA) and partial least-squares discriminant analysis (PLS-DA) for discrimination based on LIBS measurements. Using over-the-counter pharmaceutical samples, we demonstrate that the application of SVM enables statistically significant improvements in prospective classification accuracy (sensitivity), because of its ability to address variability in LIBS sample ablation and plasma self-absorption behavior. Furthermore, our results reveal that SVM provides nearly 10% improvement in correct allocation rate and a concomitant reduction in misclassification rates of 75% (cf. PLS-DA) and 80% (cf. SIMCA)-when measurements from samples not included in the training set are incorporated in the test data-highlighting its robustness. While further studies on a wider matrix of sample types performed using different LIBS systems is needed to fully characterize the capability of SVM to provide superior predictions, we anticipate that the improved sensitivity and robustness observed here will facilitate application of the proposed LIBS-SVM toolbox for screening drugs and detecting counterfeit samples, as well as in related areas of forensic and biological sample analysis.

  15. Multivariate classification of the infrared spectra of cell and tissue samples

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Haaland, D.M.; Jones, H.D.; Thomas, E.V.

    1997-03-01

    Infrared microspectroscopy of biopsied canine lymph cells and tissue was performed to investigate the possibility of using IR spectra coupled with multivariate classification methods to classify the samples as normal, hyperplastic, or neoplastic (malignant). IR spectra were obtained in transmission mode through BaF{sub 2} windows and in reflection mode from samples prepared on gold-coated microscope slides. Cytology and histopathology samples were prepared by a variety of methods to identify the optimal methods of sample preparation. Cytospinning procedures that yielded a monolayer of cells on the BaF{sub 2} windows produced a limited set of IR transmission spectra. These transmission spectra weremore » converted to absorbance and formed the basis for a classification rule that yielded 100{percent} correct classification in a cross-validated context. Classifications of normal, hyperplastic, and neoplastic cell sample spectra were achieved by using both partial least-squares (PLS) and principal component regression (PCR) classification methods. Linear discriminant analysis applied to principal components obtained from the spectral data yielded a small number of misclassifications. PLS weight loading vectors yield valuable qualitative insight into the molecular changes that are responsible for the success of the infrared classification. These successful classification results show promise for assisting pathologists in the diagnosis of cell types and offer future potential for {ital in vivo} IR detection of some types of cancer. {copyright} {ital 1997} {ital Society for Applied Spectroscopy}« less

  16. Identification of species and geographical strains of Sitophilus oryzae and Sitophilus zeamais using the visible/near-infrared hyperspectral imaging technique.

    PubMed

    Cao, Yang; Zhang, Chaojie; Chen, Quansheng; Li, Yanyu; Qi, Shuai; Tian, Lin; Ren, YongLin

    2015-08-01

    Identifying stored-product insects is essential for granary management. Automated, computer-based classification methods are rapidly developing in many areas. A hyperspectral imaging technique could potentially be developed to identify stored-product insect species and geographical strains. This study tested and adapted the technique using four geographical strains of each of two insect species, the rice weevil and maize weevil, to collect and analyse the resultant hyperspectral data. Three characteristic images that corresponded to the dominant wavelengths, 505, 659 and 955 nm, were selected by multivariate image analysis. Each image was processed, and 22 morphological and textural features from regions of interest were extracted as the inputs for an identification model. We found the backpropagation neural network model to be the superior method for distinguishing between the insect species and geographical strains. The overall recognition rates of the classification model for insect species were 100 and 98.13% for the calibration and prediction sets respectively, while the rates of the model for geographical strains were 94.17 and 86.88% respectively. This study has demonstrated that hyperspectral imaging, together with the appropriate recognition method, could provide a potential instrument for identifying insects and could become a useful tool for identification of Sitophilus oryzae and Sitophilus zeamais to aid in the management of stored-product insects. © 2014 Society of Chemical Industry.

  17. Photoacoustic discrimination of vascular and pigmented lesions using classical and Bayesian methods

    NASA Astrophysics Data System (ADS)

    Swearingen, Jennifer A.; Holan, Scott H.; Feldman, Mary M.; Viator, John A.

    2010-01-01

    Discrimination of pigmented and vascular lesions in skin can be difficult due to factors such as size, subungual location, and the nature of lesions containing both melanin and vascularity. Misdiagnosis may lead to precancerous or cancerous lesions not receiving proper medical care. To aid in the rapid and accurate diagnosis of such pathologies, we develop a photoacoustic system to determine the nature of skin lesions in vivo. By irradiating skin with two laser wavelengths, 422 and 530 nm, we induce photoacoustic responses, and the relative response at these two wavelengths indicates whether the lesion is pigmented or vascular. This response is due to the distinct absorption spectrum of melanin and hemoglobin. In particular, pigmented lesions have ratios of photoacoustic amplitudes of approximately 1.4 to 1 at the two wavelengths, while vascular lesions have ratios of about 4.0 to 1. Furthermore, we consider two statistical methods for conducting classification of lesions: standard multivariate analysis classification techniques and a Bayesian-model-based approach. We study 15 human subjects with eight vascular and seven pigmented lesions. Using the classical method, we achieve a perfect classification rate, while the Bayesian approach has an error rate of 20%.

  18. Multivariate analysis of fMRI time series: classification and regression of brain responses using machine learning.

    PubMed

    Formisano, Elia; De Martino, Federico; Valente, Giancarlo

    2008-09-01

    Machine learning and pattern recognition techniques are being increasingly employed in functional magnetic resonance imaging (fMRI) data analysis. By taking into account the full spatial pattern of brain activity measured simultaneously at many locations, these methods allow detecting subtle, non-strictly localized effects that may remain invisible to the conventional analysis with univariate statistical methods. In typical fMRI applications, pattern recognition algorithms "learn" a functional relationship between brain response patterns and a perceptual, cognitive or behavioral state of a subject expressed in terms of a label, which may assume discrete (classification) or continuous (regression) values. This learned functional relationship is then used to predict the unseen labels from a new data set ("brain reading"). In this article, we describe the mathematical foundations of machine learning applications in fMRI. We focus on two methods, support vector machines and relevance vector machines, which are respectively suited for the classification and regression of fMRI patterns. Furthermore, by means of several examples and applications, we illustrate and discuss the methodological challenges of using machine learning algorithms in the context of fMRI data analysis.

  19. Semi-supervised anomaly detection - towards model-independent searches of new physics

    NASA Astrophysics Data System (ADS)

    Kuusela, Mikael; Vatanen, Tommi; Malmi, Eric; Raiko, Tapani; Aaltonen, Timo; Nagai, Yoshikazu

    2012-06-01

    Most classification algorithms used in high energy physics fall under the category of supervised machine learning. Such methods require a training set containing both signal and background events and are prone to classification errors should this training data be systematically inaccurate for example due to the assumed MC model. To complement such model-dependent searches, we propose an algorithm based on semi-supervised anomaly detection techniques, which does not require a MC training sample for the signal data. We first model the background using a multivariate Gaussian mixture model. We then search for deviations from this model by fitting to the observations a mixture of the background model and a number of additional Gaussians. This allows us to perform pattern recognition of any anomalous excess over the background. We show by a comparison to neural network classifiers that such an approach is a lot more robust against misspecification of the signal MC than supervised classification. In cases where there is an unexpected signal, a neural network might fail to correctly identify it, while anomaly detection does not suffer from such a limitation. On the other hand, when there are no systematic errors in the training data, both methods perform comparably.

  20. Classification of Brazilian and foreign gasolines adulterated with alcohol using infrared spectroscopy.

    PubMed

    da Silva, Neirivaldo C; Pimentel, Maria Fernanda; Honorato, Ricardo S; Talhavini, Marcio; Maldaner, Adriano O; Honorato, Fernanda A

    2015-08-01

    The smuggling of products across the border regions of many countries is a practice to be fought. Brazilian authorities are increasingly worried about the illicit trade of fuels along the frontiers of the country. In order to confirm this as a crime, the Federal Police must have a means of identifying the origin of the fuel. This work describes the development of a rapid and nondestructive methodology to classify gasoline as to its origin (Brazil, Venezuela and Peru), using infrared spectroscopy and multivariate classification. Partial Least Squares Discriminant Analysis (PLS-DA) and Soft Independent Modeling Class Analogy (SIMCA) models were built. Direct standardization (DS) was employed aiming to standardize the spectra obtained in different laboratories of the border units of the Federal Police. Two approaches were considered in this work: (1) local and (2) global classification models. When using Approach 1, the PLS-DA achieved 100% correct classification, and the deviation of the predicted values for the secondary instrument considerably decreased after performing DS. In this case, SIMCA models were not efficient in the classification, even after standardization. Using a global model (Approach 2), both PLS-DA and SIMCA techniques were effective after performing DS. Considering that real situations may involve questioned samples from other nations (such as Peru), the SIMCA method developed according to Approach 2 is a more adequate, since the sample will be classified neither as Brazil nor Venezuelan. This methodology could be applied to other forensic problems involving the chemical classification of a product, provided that a specific modeling is performed. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  1. Microcomputer-based classification of environmental data in municipal areas

    NASA Astrophysics Data System (ADS)

    Thiergärtner, H.

    1995-10-01

    Multivariate data-processing methods used in mineral resource identification can be used to classify urban regions. Using elements of expert systems, geographical information systems, as well as known classification and prognosis systems, it is possible to outline a single model that consists of resistant and of temporary parts of a knowledge base including graphical input and output treatment and of resistant and temporary elements of a bank of methods and algorithms. Whereas decision rules created by experts will be stored in expert systems directly, powerful classification rules in form of resistant but latent (implicit) decision algorithms may be implemented in the suggested model. The latent functions will be transformed into temporary explicit decision rules by learning processes depending on the actual task(s), parameter set(s), pixels selection(s), and expert control(s). This takes place both at supervised and nonsupervised classification of multivariately described pixel sets representing municipal subareas. The model is outlined briefly and illustrated by results obtained in a target area covering a part of the city of Berlin (Germany).

  2. Multivariate pattern recognition for diagnosis and prognosis in clinical neuroimaging: state of the art, current challenges and future trends.

    PubMed

    Haller, Sven; Lovblad, Karl-Olof; Giannakopoulos, Panteleimon; Van De Ville, Dimitri

    2014-05-01

    Many diseases are associated with systematic modifications in brain morphometry and function. These alterations may be subtle, in particular at early stages of the disease progress, and thus not evident by visual inspection alone. Group-level statistical comparisons have dominated neuroimaging studies for many years, proving fascinating insight into brain regions involved in various diseases. However, such group-level results do not warrant diagnostic value for individual patients. Recently, pattern recognition approaches have led to a fundamental shift in paradigm, bringing multivariate analysis and predictive results, notably for the early diagnosis of individual patients. We review the state-of-the-art fundamentals of pattern recognition including feature selection, cross-validation and classification techniques, as well as limitations including inter-individual variation in normal brain anatomy and neurocognitive reserve. We conclude with the discussion of future trends including multi-modal pattern recognition, multi-center approaches with data-sharing and cloud-computing.

  3. Classification of Fusarium-Infected Korean Hulled Barley Using Near-Infrared Reflectance Spectroscopy and Partial Least Squares Discriminant Analysis

    PubMed Central

    Lim, Jongguk; Kim, Giyoung; Mo, Changyeun; Oh, Kyoungmin; Yoo, Hyeonchae; Ham, Hyeonheui; Kim, Moon S.

    2017-01-01

    The purpose of this study is to use near-infrared reflectance (NIR) spectroscopy equipment to nondestructively and rapidly discriminate Fusarium-infected hulled barley. Both normal hulled barley and Fusarium-infected hulled barley were scanned by using a NIR spectrometer with a wavelength range of 1175 to 2170 nm. Multiple mathematical pretreatments were applied to the reflectance spectra obtained for Fusarium discrimination and the multivariate analysis method of partial least squares discriminant analysis (PLS-DA) was used for discriminant prediction. The PLS-DA prediction model developed by applying the second-order derivative pretreatment to the reflectance spectra obtained from the side of hulled barley without crease achieved 100% accuracy in discriminating the normal hulled barley and the Fusarium-infected hulled barley. These results demonstrated the feasibility of rapid discrimination of the Fusarium-infected hulled barley by combining multivariate analysis with the NIR spectroscopic technique, which is utilized as a nondestructive detection method. PMID:28974012

  4. Novel techniques for characterization of hydrocarbon emission sources in the Barnett Shale

    NASA Astrophysics Data System (ADS)

    Nathan, Brian Joseph

    Changes in ambient atmospheric hydrocarbon concentrations can have both short-term and long-term effects on the atmosphere and on human health. Thus, accurate characterization of emissions sources is critically important. The recent boom in shale gas production has led to an increase in hydrocarbon emissions from associated processes, though the exact extent is uncertain. As an original quantification technique, a model airplane equipped with a specially-designed, open-path methane sensor was flown multiple times over a natural gas compressor station in the Barnett Shale in October 2013. A linear optimization was introduced to a standard Gaussian plume model in an effort to determine the most probable emission rate coming from the station. This is shown to be a suitable approach given an ideal source with a single, central plume. Separately, an analysis was performed to characterize the nonmethane hydrocarbons in the Barnett during the same period. Starting with ambient hourly concentration measurements of forty-six hydrocarbon species, Lagrangian air parcel trajectories were implemented in a meteorological model to extend the resolution of these measurements and achieve domain-fillings of the region for the period of interest. A self-organizing map (a type of unsupervised classification) was then utilized to reduce the dimensionality of the total multivariate set of grids into characteristic one-dimensional signatures. By also introducing a self-organizing map classification of the contemporary wind measurements, the spatial hydrocarbon characterizations are analyzed for periods with similar wind conditions. The accuracy of the classification is verified through assessment of observed spatial mixing ratio enhancements of key species, through site-comparisons with a related long-term study, and through a random forest analysis (an ensemble learning method of supervised classification) to determine the most important species for defining key classes. The hydrocarbon classification is shown to have performed very well in identifying expected signatures near and downwind-of oil and gas facilities with active permits, which showcases this method's usefulness for future regional hydrocarbon source-apportionment analyses.

  5. Discrimination of edible oils and fats by combination of multivariate pattern recognition and FT-IR spectroscopy: A comparative study between different modeling methods

    NASA Astrophysics Data System (ADS)

    Javidnia, Katayoun; Parish, Maryam; Karimi, Sadegh; Hemmateenejad, Bahram

    2013-03-01

    By using FT-IR spectroscopy, many researchers from different disciplines enrich the experimental complexity of their research for obtaining more precise information. Moreover chemometrics techniques have boosted the use of IR instruments. In the present study we aimed to emphasize on the power of FT-IR spectroscopy for discrimination between different oil samples (especially fat from vegetable oils). Also our data were used to compare the performance of different classification methods. FT-IR transmittance spectra of oil samples (Corn, Colona, Sunflower, Soya, Olive, and Butter) were measured in the wave-number interval of 450-4000 cm-1. Classification analysis was performed utilizing PLS-DA, interval PLS-DA, extended canonical variate analysis (ECVA) and interval ECVA methods. The effect of data preprocessing by extended multiplicative signal correction was investigated. Whilst all employed method could distinguish butter from vegetable oils, iECVA resulted in the best performances for calibration and external test set with 100% sensitivity and specificity.

  6. Calibration of an electronic nose for poultry farm

    NASA Astrophysics Data System (ADS)

    Abdullah, A. H.; Shukor, S. A.; Kamis, M. S.; Shakaff, A. Y. M.; Zakaria, A.; Rahim, N. A.; Mamduh, S. M.; Kamarudin, K.; Saad, F. S. A.; Masnan, M. J.; Mustafa, H.

    2017-03-01

    Malodour from the poultry farms could cause air pollution and therefore potentially dangerous to humans' and animals' health. This issue also poses sustainability risk to the poultry industries due to objections from local community. The aim of this paper is to develop and calibrate a cost effective and efficient electronic nose for poultry farm air monitoring. The instrument main components include sensor chamber, array of specific sensors, microcontroller, signal conditioning circuits and wireless sensor networks. The instrument was calibrated to allow classification of different concentrations of main volatile compounds in the poultry farm malodour. The outcome of the process will also confirm the device's reliability prior to being used for poultry farm malodour assessment. The Multivariate Analysis (HCA and KNN) and Artificial Neural Network (ANN) pattern recognition technique was used to process the acquired data. The results show that the instrument is able to calibrate the samples using ANN classification model with high accuracy. The finding verifies the instrument's performance to be used as an effective poultry farm malodour monitoring.

  7. Kinetic study of olive oil degradation monitored by fourier transform infrared spectrometry. Application to oil characterization.

    PubMed

    Román Falcó, Iván P; Grané Teruel, Nuria; Prats Moya, Soledad; Martín Carratalá, M Luisa

    2012-11-28

    A new approach for the determination of kinetic parameters of the cis/trans isomerization during the oxidation process of 24 virgin olive oils belonging to 8 different varieties is presented. The accelerated process of degradation at 100 °C was monitored by recording the Fourier transform infrared spectra. The parameters obtained confirm pseudo-first-order kinetics for the degradation of cis and the appearance of trans double bonds. The kinetic approach affords the induction time and the rate coefficient; these parameters are related to the fatty acid profile of the fresh olive oils. The data obtained were used to compare the oil stability of the samples with the help of multivariate statistical techniques. Fatty acid allowed a classification of the samples in five groups, one of them constituted by the cultivars with higher stability. Meanwhile, the kinetic parameters showed greater ability for the characterization of olive oils, allowing the classification in seven groups.

  8. Particle analysis using laser ablation mass spectroscopy

    DOEpatents

    Parker, Eric P.; Rosenthal, Stephen E.; Trahan, Michael W.; Wagner, John S.

    2003-09-09

    The present invention provides a method of quickly identifying bioaerosols by class, even if the subject bioaerosol has not been previously encountered. The method begins by collecting laser ablation mass spectra from known particles. The spectra are correlated with the known particles, including the species of particle and the classification (e.g., bacteria). The spectra can then be used to train a neural network, for example using genetic algorithm-based training, to recognize each spectra and to recognize characteristics of the classifications. The spectra can also be used in a multivariate patch algorithm. Laser ablation mass specta from unknown particles can be presented as inputs to the trained neural net for identification as to classification. The description below first describes suitable intelligent algorithms and multivariate patch algorithms, then presents an example of the present invention including results.

  9. Parental Perceptions of Their Adolescent's Weight Status: The ECHO Study

    ERIC Educational Resources Information Center

    Hearst, Mary O.; Sherwood, Nancy E.; Klein, Elizabeth G.; Pasch, Keryn E.; Lytle, Leslie A.

    2011-01-01

    Objectives: To assess the correlates of parental classification of adolescent weight status. Methods: Measured adolescent weight status was compared to parent self-report perception data (n 374 dyads) using multivariate analyses with interactions to identify characteristics associated with inaccurate parent classification of adolescent weight…

  10. Estimating the Classification Efficiency of a Test Battery.

    ERIC Educational Resources Information Center

    De Corte, Wilfried

    2000-01-01

    Shows how a theorem proven by H. Brogden (1951, 1959) can be used to estimate the allocation average (a predictor based classification of a test battery) assuming that the predictor intercorrelations and validities are known and that the predictor variables have a joint multivariate normal distribution. (SLD)

  11. Evaluation of change detection techniques for monitoring coastal zone environments

    NASA Technical Reports Server (NTRS)

    Weismiller, R. A. (Principal Investigator); Kristof, S. J.; Scholz, D. K.; Anuta, P. E.; Momin, S. M.

    1977-01-01

    The author has identified the following significant results. Four change detection techniques were designed and implemented for evaluation: (1) post classification comparison change detection, (2) delta data change detection, (3) spectral/temporal change classification, and (4) layered spectral/temporal change classification. The post classification comparison technique reliably identified areas of change and was used as the standard for qualitatively evaluating the other three techniques. The layered spectral/temporal change classification and the delta data change detection results generally agreed with the post classification comparison technique results; however, many small areas of change were not identified. Major discrepancies existed between the post classification comparison and spectral/temporal change detection results.

  12. Statistical methods and neural network approaches for classification of data from multiple sources

    NASA Technical Reports Server (NTRS)

    Benediktsson, Jon Atli; Swain, Philip H.

    1990-01-01

    Statistical methods for classification of data from multiple data sources are investigated and compared to neural network models. A problem with using conventional multivariate statistical approaches for classification of data of multiple types is in general that a multivariate distribution cannot be assumed for the classes in the data sources. Another common problem with statistical classification methods is that the data sources are not equally reliable. This means that the data sources need to be weighted according to their reliability but most statistical classification methods do not have a mechanism for this. This research focuses on statistical methods which can overcome these problems: a method of statistical multisource analysis and consensus theory. Reliability measures for weighting the data sources in these methods are suggested and investigated. Secondly, this research focuses on neural network models. The neural networks are distribution free since no prior knowledge of the statistical distribution of the data is needed. This is an obvious advantage over most statistical classification methods. The neural networks also automatically take care of the problem involving how much weight each data source should have. On the other hand, their training process is iterative and can take a very long time. Methods to speed up the training procedure are introduced and investigated. Experimental results of classification using both neural network models and statistical methods are given, and the approaches are compared based on these results.

  13. The influence of different classification standards of age groups on prognosis in high-grade hemispheric glioma patients.

    PubMed

    Chen, Jian-Wu; Zhou, Chang-Fu; Lin, Zhi-Xiong

    2015-09-15

    Although age is thought to correlate with the prognosis of glioma patients, the most appropriate age-group classification standard to evaluate prognosis had not been fully studied. This study aimed to investigate the influence of age-group classification standards on the prognosis of patients with high-grade hemispheric glioma (HGG). This retrospective study of 125 HGG patients used three different classification standards of age-groups (≤ 50 and >50 years old, ≤ 60 and >60 years old, ≤ 45 and 45-65 and ≥ 65 years old) to evaluate the impact of age on prognosis. The primary end-point was overall survival (OS). The Kaplan-Meier method was applied for univariate analysis and Cox proportional hazards model for multivariate analysis. Univariate analysis showed a significant correlation between OS and all three classification standards of age-groups as well as between OS and pathological grade, gender, location of glioma, and regular chemotherapy and radiotherapy treatment. Multivariate analysis showed that the only independent predictors of OS were classification standard of age-groups ≤ 50 and > 50 years old, pathological grade and regular chemotherapy. In summary, the most appropriate classification standard of age-groups as an independent prognostic factor was ≤ 50 and > 50 years old. Pathological grade and chemotherapy were also independent predictors of OS in post-operative HGG patients. Copyright © 2015. Published by Elsevier B.V.

  14. Laser-induced breakdown spectroscopy-based investigation and classification of pharmaceutical tablets using multivariate chemometric analysis

    PubMed Central

    Myakalwar, Ashwin Kumar; Sreedhar, S.; Barman, Ishan; Dingari, Narahara Chari; Rao, S. Venugopal; Kiran, P. Prem; Tewari, Surya P.; Kumar, G. Manoj

    2012-01-01

    We report the effectiveness of laser-induced breakdown spectroscopy (LIBS) in probing the content of pharmaceutical tablets and also investigate its feasibility for routine classification. This method is particularly beneficial in applications where its exquisite chemical specificity and suitability for remote and on site characterization significantly improves the speed and accuracy of quality control and assurance process. Our experiments reveal that in addition to the presence of carbon, hydrogen, nitrogen and oxygen, which can be primarily attributed to the active pharmaceutical ingredients, specific inorganic atoms were also present in all the tablets. Initial attempts at classification by a ratiometric approach using oxygen to nitrogen compositional values yielded an optimal value (at 746.83 nm) with the least relative standard deviation but nevertheless failed to provide an acceptable classification. To overcome this bottleneck in the detection process, two chemometric algorithms, i.e. principal component analysis (PCA) and soft independent modeling of class analogy (SIMCA), were implemented to exploit the multivariate nature of the LIBS data demonstrating that LIBS has the potential to differentiate and discriminate among pharmaceutical tablets. We report excellent prospective classification accuracy using supervised classification via the SIMCA algorithm, demonstrating its potential for future applications in process analytical technology, especially for fast on-line process control monitoring applications in the pharmaceutical industry. PMID:22099648

  15. Predict or classify: The deceptive role of time-locking in brain signal classification

    NASA Astrophysics Data System (ADS)

    Rusconi, Marco; Valleriani, Angelo

    2016-06-01

    Several experimental studies claim to be able to predict the outcome of simple decisions from brain signals measured before subjects are aware of their decision. Often, these studies use multivariate pattern recognition methods with the underlying assumption that the ability to classify the brain signal is equivalent to predict the decision itself. Here we show instead that it is possible to correctly classify a signal even if it does not contain any predictive information about the decision. We first define a simple stochastic model that mimics the random decision process between two equivalent alternatives, and generate a large number of independent trials that contain no choice-predictive information. The trials are first time-locked to the time point of the final event and then classified using standard machine-learning techniques. The resulting classification accuracy is above chance level long before the time point of time-locking. We then analyze the same trials using information theory. We demonstrate that the high classification accuracy is a consequence of time-locking and that its time behavior is simply related to the large relaxation time of the process. We conclude that when time-locking is a crucial step in the analysis of neural activity patterns, both the emergence and the timing of the classification accuracy are affected by structural properties of the network that generates the signal.

  16. Identification and classification of failure modes in laminated composites by using a multivariate statistical analysis of wavelet coefficients

    NASA Astrophysics Data System (ADS)

    Baccar, D.; Söffker, D.

    2017-11-01

    Acoustic Emission (AE) is a suitable method to monitor the health of composite structures in real-time. However, AE-based failure mode identification and classification are still complex to apply due to the fact that AE waves are generally released simultaneously from all AE-emitting damage sources. Hence, the use of advanced signal processing techniques in combination with pattern recognition approaches is required. In this paper, AE signals generated from laminated carbon fiber reinforced polymer (CFRP) subjected to indentation test are examined and analyzed. A new pattern recognition approach involving a number of processing steps able to be implemented in real-time is developed. Unlike common classification approaches, here only CWT coefficients are extracted as relevant features. Firstly, Continuous Wavelet Transform (CWT) is applied to the AE signals. Furthermore, dimensionality reduction process using Principal Component Analysis (PCA) is carried out on the coefficient matrices. The PCA-based feature distribution is analyzed using Kernel Density Estimation (KDE) allowing the determination of a specific pattern for each fault-specific AE signal. Moreover, waveform and frequency content of AE signals are in depth examined and compared with fundamental assumptions reported in this field. A correlation between the identified patterns and failure modes is achieved. The introduced method improves the damage classification and can be used as a non-destructive evaluation tool.

  17. A methodological approach to study the stability of selected watercolours for painting reintegration, through reflectance spectrophotometry, Fourier transform infrared spectroscopy and hyperspectral imaging

    NASA Astrophysics Data System (ADS)

    Pelosi, Claudia; Capobianco, Giuseppe; Agresti, Giorgia; Bonifazi, Giuseppe; Morresi, Fabio; Rossi, Sara; Santamaria, Ulderico; Serranti, Silvia

    2018-06-01

    The aim of this work is to investigate the stability to simulated solar radiation of some paintings samples through a new methodological approach adopting non-invasive spectroscopic techniques. In particular, commercial watercolours and iron oxide based pigments were used, these last ones being prepared for the experimental by gum Arabic in order to propose a possible substitute for traditional reintegration materials. Reflectance spectrophotometry in the visible range and Hyperspectral Imaging in the short wave infrared were chosen as non-invasive techniques for evaluation the stability to irradiation of the chosen pigments. These were studied before and after artificial ageing procedure performed in Solar Box chamber under controlled conditions. Data were treated and elaborated in order to evaluate the sensitivity of the chosen techniques in identifying the variations on paint layers, induced by photo-degradation, before they could be observed by eye. Furthermore a supervised classification method for monitoring the painted surface changes adopting a multivariate approach was successfully applied.

  18. Portable Infrared Laser Spectroscopy for On-site Mycotoxin Analysis.

    PubMed

    Sieger, Markus; Kos, Gregor; Sulyok, Michael; Godejohann, Matthias; Krska, Rudolf; Mizaikoff, Boris

    2017-03-09

    Mycotoxins are toxic secondary metabolites of fungi that spoil food, and severely impact human health (e.g., causing cancer). Therefore, the rapid determination of mycotoxin contamination including deoxynivalenol and aflatoxin B 1 in food and feed samples is of prime interest for commodity importers and processors. While chromatography-based techniques are well established in laboratory environments, only very few (i.e., mostly immunochemical) techniques exist enabling direct on-site analysis for traders and manufacturers. In this study, we present MYCOSPEC - an innovative approach for spectroscopic mycotoxin contamination analysis at EU regulatory limits for the first time utilizing mid-infrared tunable quantum cascade laser (QCL) spectroscopy. This analysis technique facilitates on-site mycotoxin analysis by combining QCL technology with GaAs/AlGaAs thin-film waveguides. Multivariate data mining strategies (i.e., principal component analysis) enabled the classification of deoxynivalenol-contaminated maize and wheat samples, and of aflatoxin B 1 affected peanuts at EU regulatory limits of 1250 μg kg -1 and 8 μg kg -1 , respectively.

  19. Portable Infrared Laser Spectroscopy for On-site Mycotoxin Analysis

    PubMed Central

    Sieger, Markus; Kos, Gregor; Sulyok, Michael; Godejohann, Matthias; Krska, Rudolf; Mizaikoff, Boris

    2017-01-01

    Mycotoxins are toxic secondary metabolites of fungi that spoil food, and severely impact human health (e.g., causing cancer). Therefore, the rapid determination of mycotoxin contamination including deoxynivalenol and aflatoxin B1 in food and feed samples is of prime interest for commodity importers and processors. While chromatography-based techniques are well established in laboratory environments, only very few (i.e., mostly immunochemical) techniques exist enabling direct on-site analysis for traders and manufacturers. In this study, we present MYCOSPEC - an innovative approach for spectroscopic mycotoxin contamination analysis at EU regulatory limits for the first time utilizing mid-infrared tunable quantum cascade laser (QCL) spectroscopy. This analysis technique facilitates on-site mycotoxin analysis by combining QCL technology with GaAs/AlGaAs thin-film waveguides. Multivariate data mining strategies (i.e., principal component analysis) enabled the classification of deoxynivalenol-contaminated maize and wheat samples, and of aflatoxin B1 affected peanuts at EU regulatory limits of 1250 μg kg−1 and 8 μg kg−1, respectively. PMID:28276454

  20. Portable Infrared Laser Spectroscopy for On-site Mycotoxin Analysis

    NASA Astrophysics Data System (ADS)

    Sieger, Markus; Kos, Gregor; Sulyok, Michael; Godejohann, Matthias; Krska, Rudolf; Mizaikoff, Boris

    2017-03-01

    Mycotoxins are toxic secondary metabolites of fungi that spoil food, and severely impact human health (e.g., causing cancer). Therefore, the rapid determination of mycotoxin contamination including deoxynivalenol and aflatoxin B1 in food and feed samples is of prime interest for commodity importers and processors. While chromatography-based techniques are well established in laboratory environments, only very few (i.e., mostly immunochemical) techniques exist enabling direct on-site analysis for traders and manufacturers. In this study, we present MYCOSPEC - an innovative approach for spectroscopic mycotoxin contamination analysis at EU regulatory limits for the first time utilizing mid-infrared tunable quantum cascade laser (QCL) spectroscopy. This analysis technique facilitates on-site mycotoxin analysis by combining QCL technology with GaAs/AlGaAs thin-film waveguides. Multivariate data mining strategies (i.e., principal component analysis) enabled the classification of deoxynivalenol-contaminated maize and wheat samples, and of aflatoxin B1 affected peanuts at EU regulatory limits of 1250 μg kg-1 and 8 μg kg-1, respectively.

  1. Characterization of the volatile components in green tea by IRAE-HS-SPME/GC-MS combined with multivariate analysis.

    PubMed

    Yang, Yan-Qin; Yin, Hong-Xu; Yuan, Hai-Bo; Jiang, Yong-Wen; Dong, Chun-Wang; Deng, Yu-Liang

    2018-01-01

    In the present work, a novel infrared-assisted extraction coupled to headspace solid-phase microextraction (IRAE-HS-SPME) followed by gas chromatography-mass spectrometry (GC-MS) was developed for rapid determination of the volatile components in green tea. The extraction parameters such as fiber type, sample amount, infrared power, extraction time, and infrared lamp distance were optimized by orthogonal experimental design. Under optimum conditions, a total of 82 volatile compounds in 21 green tea samples from different geographical origins were identified. Compared with classical water-bath heating, the proposed technique has remarkable advantages of considerably reducing the analytical time and high efficiency. In addition, an effective classification of green teas based on their volatile profiles was achieved by partial least square-discriminant analysis (PLS-DA) and hierarchical clustering analysis (HCA). Furthermore, the application of a dual criterion based on the variable importance in the projection (VIP) values of the PLS-DA models and on the category from one-way univariate analysis (ANOVA) allowed the identification of 12 potential volatile markers, which were considered to make the most important contribution to the discrimination of the samples. The results suggest that IRAE-HS-SPME/GC-MS technique combined with multivariate analysis offers a valuable tool to assess geographical traceability of different tea varieties.

  2. Characterization of the volatile components in green tea by IRAE-HS-SPME/GC-MS combined with multivariate analysis

    PubMed Central

    Yin, Hong-Xu; Yuan, Hai-Bo; Jiang, Yong-Wen; Dong, Chun-Wang; Deng, Yu-Liang

    2018-01-01

    In the present work, a novel infrared-assisted extraction coupled to headspace solid-phase microextraction (IRAE-HS-SPME) followed by gas chromatography-mass spectrometry (GC-MS) was developed for rapid determination of the volatile components in green tea. The extraction parameters such as fiber type, sample amount, infrared power, extraction time, and infrared lamp distance were optimized by orthogonal experimental design. Under optimum conditions, a total of 82 volatile compounds in 21 green tea samples from different geographical origins were identified. Compared with classical water-bath heating, the proposed technique has remarkable advantages of considerably reducing the analytical time and high efficiency. In addition, an effective classification of green teas based on their volatile profiles was achieved by partial least square-discriminant analysis (PLS-DA) and hierarchical clustering analysis (HCA). Furthermore, the application of a dual criterion based on the variable importance in the projection (VIP) values of the PLS-DA models and on the category from one-way univariate analysis (ANOVA) allowed the identification of 12 potential volatile markers, which were considered to make the most important contribution to the discrimination of the samples. The results suggest that IRAE-HS-SPME/GC-MS technique combined with multivariate analysis offers a valuable tool to assess geographical traceability of different tea varieties. PMID:29494626

  3. Spatial patterns of brain atrophy in MCI patients, identified via high-dimensional pattern classification, predict subsequent cognitive decline

    PubMed Central

    Fan, Yong; Batmanghelich, Nematollah; Clark, Chris M.; Davatzikos, Christos

    2010-01-01

    Spatial patterns of brain atrophy in mild cognitive impairment (MCI) and Alzheimer’s disease (AD) were measured via methods of computational neuroanatomy. These patterns were spatially complex and involved many brain regions. In addition to the hippocampus and the medial temporal lobe gray matter, a number of other regions displayed significant atrophy, including orbitofrontal and medial-prefrontal grey matter, cingulate (mainly posterior), insula, uncus, and temporal lobe white matter. Approximately 2/3 of the MCI group presented patterns of atrophy that overlapped with AD, whereas the remaining 1/3 overlapped with cognitively normal individuals, thereby indicating that some, but not all, MCI patients have significant and extensive brain atrophy in this cohort of MCI patients. Importantly, the group with AD-like patterns presented much higher rate of MMSE decline in follow-up visits; conversely, pattern classification provided relatively high classification accuracy (87%) of the individuals that presented relatively higher MMSE decline within a year from baseline. High-dimensional pattern classification, a nonlinear multivariate analysis, provided measures of structural abnormality that can potentially be useful for individual patient classification, as well as for predicting progression and examining multivariate relationships in group analyses. PMID:18053747

  4. Interpreting support vector machine models for multivariate group wise analysis in neuroimaging

    PubMed Central

    Gaonkar, Bilwaj; Shinohara, Russell T; Davatzikos, Christos

    2015-01-01

    Machine learning based classification algorithms like support vector machines (SVMs) have shown great promise for turning a high dimensional neuroimaging data into clinically useful decision criteria. However, tracing imaging based patterns that contribute significantly to classifier decisions remains an open problem. This is an issue of critical importance in imaging studies seeking to determine which anatomical or physiological imaging features contribute to the classifier’s decision, thereby allowing users to critically evaluate the findings of such machine learning methods and to understand disease mechanisms. The majority of published work addresses the question of statistical inference for support vector classification using permutation tests based on SVM weight vectors. Such permutation testing ignores the SVM margin, which is critical in SVM theory. In this work we emphasize the use of a statistic that explicitly accounts for the SVM margin and show that the null distributions associated with this statistic are asymptotically normal. Further, our experiments show that this statistic is a lot less conservative as compared to weight based permutation tests and yet specific enough to tease out multivariate patterns in the data. Thus, we can better understand the multivariate patterns that the SVM uses for neuroimaging based classification. PMID:26210913

  5. Authentication of Trappist beers by LC-MS fingerprints and multivariate data analysis.

    PubMed

    Mattarucchi, Elia; Stocchero, Matteo; Moreno-Rojas, José Manuel; Giordano, Giuseppe; Reniero, Fabiano; Guillou, Claude

    2010-12-08

    The aim of this study was to asses the applicability of LC-MS profiling to authenticate a selected Trappist beer as part of a program on traceability funded by the European Commission. A total of 232 beers were fingerprinted and classified through multivariate data analysis. The selected beer was clearly distinguished from beers of different brands, while only 3 samples (3.5% of the test set) were wrongly classified when compared with other types of beer of the same Trappist brewery. The fingerprints were further analyzed to extract the most discriminating variables, which proved to be sufficient for classification, even using a simplified unsupervised model. This reduced fingerprint allowed us to study the influence of batch-to-batch variability on the classification model. Our results can easily be applied to different matrices and they confirmed the effectiveness of LC-MS profiling in combination with multivariate data analysis for the characterization of food products.

  6. Computerized recognition of persons by EEG spectral patterns.

    PubMed

    Stassen, H H

    1980-07-01

    Modified techniques of communication theory in connection with multivariate statistical procedures were applied to a sample of 82 patients for the purpose of defining EEG spectral patterns and for solving the relevant classification problems. Ten measurements per patient were made and it could be shown that a subject can be characterized and be recognized by his EEG spectral pattern with high reliability and a confidence probability of almost 90%. This result is valid not only for normal adults but also for schizophrenic patients, implying a close relationship between the EEG spectral pattern and the individual person. At the moment the nature of this relationship is not clear; in particular the supposed relationship to psychopathology could not be proved.

  7. Combination of complementary data mining methods for geographical characterization of extra virgin olive oils based on mineral composition.

    PubMed

    Sayago, Ana; González-Domínguez, Raúl; Beltrán, Rafael; Fernández-Recamales, Ángeles

    2018-09-30

    This work explores the potential of multi-element fingerprinting in combination with advanced data mining strategies to assess the geographical origin of extra virgin olive oil samples. For this purpose, the concentrations of 55 elements were determined in 125 oil samples from multiple Spanish geographic areas. Several unsupervised and supervised multivariate statistical techniques were used to build classification models and investigate the relationship between mineral composition of olive oils and their provenance. Results showed that Spanish extra virgin olive oils exhibit characteristic element profiles, which can be differentiated on the basis of their origin in accordance with three geographical areas: Atlantic coast (Huelva province), Mediterranean coast and inland regions. Furthermore, statistical modelling yielded high sensitivity and specificity, principally when random forest and support vector machines were employed, thus demonstrating the utility of these techniques in food traceability and authenticity research. Copyright © 2018 Elsevier Ltd. All rights reserved.

  8. Profiling and classification of French propolis by combined multivariate data analysis of planar chromatograms and scanning direct analysis in real time mass spectra.

    PubMed

    Chasset, Thibaut; Häbe, Tim T; Ristivojevic, Petar; Morlock, Gertrud E

    2016-09-23

    Quality control of propolis is challenging, as it is a complex natural mixture of compounds, and thus, very difficult to analyze and standardize. Shown on the example of 30 French propolis samples, a strategy for an improved quality control was demonstrated in which high-performance thin-layer chromatography (HPTLC) fingerprints were evaluated in combination with selected mass signals obtained by desorption-based scanning mass spectrometry (MS). The French propolis sample extracts were separated by a newly developed reversed phase (RP)-HPTLC method. The fingerprints obtained by two different detection modes, i.e. after (1) derivatization and fluorescence detection (FLD) at UV 366nm and (2) scanning direct analysis in real time (DART)-MS, were analyzed by multivariate data analysis. Thus, RP-HPTLC-FLD and RP-HPTLC-DART-MS fingerprints were explored and the best classification was obtained using both methods in combination with pattern recognition techniques, such as principal component analysis. All investigated French propolis samples were divided in two types and characteristic patterns were observed. Phenolic compounds such as caffeic acid, p-coumaric acid, chrysin, pinobanksin, pinobanksin-3-acetate, galangin, kaempferol, tectochrysin and pinocembrin were identified as characteristic marker compounds of French propolis samples. This study expanded the research on the European poplar type of propolis and confirmed the presence of two botanically different types of propolis, known as the blue and orange types. Copyright © 2016 Elsevier B.V. All rights reserved.

  9. Fast classification of hazelnut cultivars through portable infrared spectroscopy and chemometrics

    NASA Astrophysics Data System (ADS)

    Manfredi, Marcello; Robotti, Elisa; Quasso, Fabio; Mazzucco, Eleonora; Calabrese, Giorgio; Marengo, Emilio

    2018-01-01

    The authentication and traceability of hazelnuts is very important for both the consumer and the food industry, to safeguard the protected varieties and the food quality. This study investigates the use of a portable FTIR spectrometer coupled to multivariate statistical analysis for the classification of raw hazelnuts. The method discriminates hazelnuts from different origins/cultivars based on differences of the signal intensities of their IR spectra. The multivariate classification methods, namely principal component analysis (PCA) followed by linear discriminant analysis (LDA) and partial least square discriminant analysis (PLS-DA), with or without variable selection, allowed a very good discrimination among the groups, with PLS-DA coupled to variable selection providing the best results. Due to the fast analysis, high sensitivity, simplicity and no sample preparation, the proposed analytical methodology could be successfully used to verify the cultivar of hazelnuts, and the analysis can be performed quickly and directly on site.

  10. Gravitational Wave Detection of Compact Binaries Through Multivariate Analysis

    NASA Astrophysics Data System (ADS)

    Atallah, Dany Victor; Dorrington, Iain; Sutton, Patrick

    2017-01-01

    The first detection of gravitational waves (GW), GW150914, as produced by a binary black hole merger, has ushered in the era of GW astronomy. The detection technique used to find GW150914 considered only a fraction of the information available describing the candidate event: mainly the detector signal to noise ratios and chi-squared values. In hopes of greatly increasing detection rates, we want to take advantage of all the information available about candidate events. We employ a technique called Multivariate Analysis (MVA) to improve LIGO sensitivity to GW signals. MVA techniques are efficient ways to scan high dimensional data spaces for signal/noise classification. Our goal is to use MVA to classify compact-object binary coalescence (CBC) events composed of any combination of black holes and neutron stars. CBC waveforms are modeled through numerical relativity. Templates of the modeled waveforms are used to search for CBCs and quantify candidate events. Different MVA pipelines are under investigation to look for CBC signals and un-modelled signals, with promising results. One such MVA pipeline used for the un-modelled search can theoretically analyze far more data than the MVA pipelines currently explored for CBCs, potentially making a more powerful classifier. In principle, this extra information could improve the sensitivity to GW signals. We will present the results from our efforts to adapt an MVA pipeline used in the un-modelled search to classify candidate events from the CBC search.

  11. Texture as a basis for acoustic classification of substrate in the nearshore region

    NASA Astrophysics Data System (ADS)

    Dennison, A.; Wattrus, N. J.

    2016-12-01

    Segmentation and classification of substrate type from two locations in Lake Superior, are predicted using multivariate statistical processing of textural measures derived from shallow-water, high-resolution multibeam bathymetric data. During a multibeam sonar survey, both bathymetric and backscatter data are collected. It is well documented that the statistical characteristic of a sonar backscatter mosaic is dependent on substrate type. While classifying the bottom-type on the basis on backscatter alone can accurately predict and map bottom-type, it lacks the ability to resolve and capture fine textural details, an important factor in many habitat mapping studies. Statistical processing can capture the pertinent details about the bottom-type that are rich in textural information. Further multivariate statistical processing can then isolate characteristic features, and provide the basis for an accurate classification scheme. Preliminary results from an analysis of bathymetric data and ground-truth samples collected from the Amnicon River, Superior, Wisconsin, and the Lester River, Duluth, Minnesota, demonstrate the ability to process and develop a novel classification scheme of the bottom type in two geomorphologically distinct areas.

  12. A Deep Learning Architecture for Temporal Sleep Stage Classification Using Multivariate and Multimodal Time Series.

    PubMed

    Chambon, Stanislas; Galtier, Mathieu N; Arnal, Pierrick J; Wainrib, Gilles; Gramfort, Alexandre

    2018-04-01

    Sleep stage classification constitutes an important preliminary exam in the diagnosis of sleep disorders. It is traditionally performed by a sleep expert who assigns to each 30 s of the signal of a sleep stage, based on the visual inspection of signals such as electroencephalograms (EEGs), electrooculograms (EOGs), electrocardiograms, and electromyograms (EMGs). We introduce here the first deep learning approach for sleep stage classification that learns end-to-end without computing spectrograms or extracting handcrafted features, that exploits all multivariate and multimodal polysomnography (PSG) signals (EEG, EMG, and EOG), and that can exploit the temporal context of each 30-s window of data. For each modality, the first layer learns linear spatial filters that exploit the array of sensors to increase the signal-to-noise ratio, and the last layer feeds the learnt representation to a softmax classifier. Our model is compared to alternative automatic approaches based on convolutional networks or decisions trees. Results obtained on 61 publicly available PSG records with up to 20 EEG channels demonstrate that our network architecture yields the state-of-the-art performance. Our study reveals a number of insights on the spatiotemporal distribution of the signal of interest: a good tradeoff for optimal classification performance measured with balanced accuracy is to use 6 EEG with 2 EOG (left and right) and 3 EMG chin channels. Also exploiting 1 min of data before and after each data segment offers the strongest improvement when a limited number of channels are available. As sleep experts, our system exploits the multivariate and multimodal nature of PSG signals in order to deliver the state-of-the-art classification performance with a small computational cost.

  13. Risk factors associated with oroantral perforation during surgical removal of maxillary third molar teeth.

    PubMed

    Hasegawa, Takumi; Tachibana, Akira; Takeda, Daisuke; Iwata, Eiji; Arimoto, Satomi; Sakakibara, Akiko; Akashi, Masaya; Komori, Takahide

    2016-12-01

    The relationship between radiographic findings and the occurrence of oroantral perforation is controversial. Few studies have quantitatively analyzed the risk factors contributing to oroantral perforation, and no study has reported multivariate analysis of the relationship(s) between these various factors. This retrospective study aims to fill this void. Various risk factors for oroantral perforation during maxillary third molar extraction were investigated by univariate and multivariate analysis. The proximity of the roots to the maxillary sinus floor (root-sinus [RS] classification) was assessed using panoramic radiography and classified as types 1-5. The relationship between the maxillary second and third molars was classified according to a modified version of the Archer classification. The relative depth of the maxillary third molar in the bone was classified as class A-C, and its angulation relative to the long axis of the second molar was also recorded. Performance of an incision (OR 5.16), mesioangular tooth angulation (OR 6.05), and type 3 RS classification (i.e., significant superimposition of the roots of all posterior maxillary teeth with the sinus floor; OR 10.18) were all identified as risk factors with significant association to an outcome of oroantral perforation. To our knowledge, this is the first multivariate analysis of the risk factors for oroantral perforation during surgical extraction of the maxillary third molar. This RS classification may offer a new predictive parameter for estimating the risk of oroantral perforation.

  14. Pattern recognition analysis and classification modeling of selenium-producing areas

    USGS Publications Warehouse

    Naftz, D.L.

    1996-01-01

    Established chemometric and geochemical techniques were applied to water quality data from 23 National Irrigation Water Quality Program (NIWQP) study areas in the Western United States. These techniques were applied to the NIWQP data set to identify common geochemical processes responsible for mobilization of selenium and to develop a classification model that uses major-ion concentrations to identify areas that contain elevated selenium concentrations in water that could pose a hazard to water fowl. Pattern recognition modeling of the simple-salt data computed with the SNORM geochemical program indicate three principal components that explain 95% of the total variance. A three-dimensional plot of PC 1, 2 and 3 scores shows three distinct clusters that correspond to distinct hydrochemical facies denoted as facies 1, 2 and 3. Facies 1 samples are distinguished by water samples without the CaCO3 simple salt and elevated concentrations of NaCl, CaSO4, MgSO4 and Na2SO4 simple salts relative to water samples in facies 2 and 3. Water samples in facies 2 are distinguished from facies 1 by the absence of the MgSO4 simple salt and the presence of the CaCO3 simple salt. Water samples in facies 3 are similar to samples in facies 2, with the absence of both MgSO4 and CaSO4 simple salts. Water samples in facies 1 have the largest selenium concentration (10 ??gl-1), compared to a median concentration of 2.0 ??gl-1 and less than 1.0 ??gl-1 for samples in facies 2 and 3. A classification model using the soft independent modeling by class analogy (SIMCA) algorithm was constructed with data from the NIWQP study areas. The classification model was successful in identifying water samples with a selenium concentration that is hazardous to some species of water-fowl from a test data set comprised of 2,060 water samples from throughout Utah and Wyoming. Application of chemometric and geochemical techniques during data synthesis analysis of multivariate environmental databases from other national-scale environmental programs such as the NIWQP could also provide useful insights for addressing 'real world' environmental problems.

  15. Multivariate decoding of cerebral blood flow measures in a clinical model of on-going postsurgical pain.

    PubMed

    O'Muircheartaigh, Jonathan; Marquand, Andre; Hodkinson, Duncan J; Krause, Kristina; Khawaja, Nadine; Renton, Tara F; Huggins, John P; Vennart, William; Williams, Steven C R; Howard, Matthew A

    2015-02-01

    Recent reports of multivariate machine learning (ML) techniques have highlighted their potential use to detect prognostic and diagnostic markers of pain. However, applications to date have focussed on acute experimental nociceptive stimuli rather than clinically relevant pain states. These reports have coincided with others describing the application of arterial spin labeling (ASL) to detect changes in regional cerebral blood flow (rCBF) in patients with on-going clinical pain. We combined these acquisition and analysis methodologies in a well-characterized postsurgical pain model. The principal aims were (1) to assess the classification accuracy of rCBF indices acquired prior to and following surgical intervention and (2) to optimise the amount of data required to maintain accurate classification. Twenty male volunteers, requiring bilateral, lower jaw third molar extraction (TME), underwent ASL examination prior to and following individual left and right TME, representing presurgical and postsurgical states, respectively. Six ASL time points were acquired at each exam. Each ASL image was preceded by visual analogue scale assessments of alertness and subjective pain experiences. Using all data from all sessions, an independent Gaussian Process binary classifier successfully discriminated postsurgical from presurgical states with 94.73% accuracy; over 80% accuracy could be achieved using half of the data (equivalent to 15 min scan time). This work demonstrates the concept and feasibility of time-efficient, probabilistic prediction of clinically relevant pain at the individual level. We discuss the potential of ML techniques to impact on the search for novel approaches to diagnosis, management, and treatment to complement conventional patient self-reporting. © 2014 The Authors. Human Brain Mapping Published by Wiley Periodicals, Inc. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.

  16. Metabolite profiling in retinoblastoma identifies novel clinicopathological subgroups

    PubMed Central

    Kohe, Sarah; Brundler, Marie-Anne; Jenkinson, Helen; Parulekar, Manoj; Wilson, Martin; Peet, Andrew C; McConville, Carmel M

    2015-01-01

    Background: Tumour classification, based on histopathology or molecular pathology, is of value to predict tumour behaviour and to select appropriate treatment. In retinoblastoma, pathology information is not available at diagnosis and only exists for enucleated tumours. Alternative methods of tumour classification, using noninvasive techniques such as magnetic resonance spectroscopy, are urgently required to guide treatment decisions at the time of diagnosis. Methods: High-resolution magic-angle spinning magnetic resonance spectroscopy (HR-MAS MRS) was undertaken on enucleated retinoblastomas. Principal component analysis and cluster analysis of the HR-MAS MRS data was used to identify tumour subgroups. Individual metabolite concentrations were determined and were correlated with histopathological risk factors for each group. Results: Multivariate analysis identified three metabolic subgroups of retinoblastoma, with the most discriminatory metabolites being taurine, hypotaurine, total-choline and creatine. Metabolite concentrations correlated with specific histopathological features: taurine was correlated with differentiation, total-choline and phosphocholine with retrolaminar optic nerve invasion, and total lipids with necrosis. Conclusions: We have demonstrated that a metabolite-based classification of retinoblastoma can be obtained using ex vivo magnetic resonance spectroscopy, and that the subgroups identified correlate with histopathological features. This result justifies future studies to validate the clinical relevance of these subgroups and highlights the potential of in vivo MRS as a noninvasive diagnostic tool for retinoblastoma patient stratification. PMID:26348444

  17. Big genomics and clinical data analytics strategies for precision cancer prognosis.

    PubMed

    Ow, Ghim Siong; Kuznetsov, Vladimir A

    2016-11-07

    The field of personalized and precise medicine in the era of big data analytics is growing rapidly. Previously, we proposed our model of patient classification termed Prognostic Signature Vector Matching (PSVM) and identified a 37 variable signature comprising 36 let-7b associated prognostic significant mRNAs and the age risk factor that stratified large high-grade serous ovarian cancer patient cohorts into three survival-significant risk groups. Here, we investigated the predictive performance of PSVM via optimization of the prognostic variable weights, which represent the relative importance of one prognostic variable over the others. In addition, we compared several multivariate prognostic models based on PSVM with classical machine learning techniques such as K-nearest-neighbor, support vector machine, random forest, neural networks and logistic regression. Our results revealed that negative log-rank p-values provides more robust weight values as opposed to the use of other quantities such as hazard ratios, fold change, or a combination of those factors. PSVM, together with the classical machine learning classifiers were combined in an ensemble (multi-test) voting system, which collectively provides a more precise and reproducible patient stratification. The use of the multi-test system approach, rather than the search for the ideal classification/prediction method, might help to address limitations of the individual classification algorithm in specific situation.

  18. Classification of wheat: Badhwar profile similarity technique

    NASA Technical Reports Server (NTRS)

    Austin, W. W.

    1980-01-01

    The Badwar profile similarity classification technique used successfully for classification of corn was applied to spring wheat classifications. The software programs and the procedures used to generate full-scene classifications are presented, and numerical results of the acreage estimations are given.

  19. R-parametrization and its role in classification of linear multivariable feedback systems

    NASA Technical Reports Server (NTRS)

    Chen, Robert T. N.

    1988-01-01

    A classification of all the compensators that stabilize a given general plant in a linear, time-invariant multi-input, multi-output feedback system is developed. This classification, along with the associated necessary and sufficient conditions for stability of the feedback system, is achieved through the introduction of a new parameterization, referred to as R-Parameterization, which is a dual of the familiar Q-Parameterization. The classification is made to the stability conditions of the compensators and the plant by themselves; and necessary and sufficient conditions are based on the stability of Q and R themselves.

  20. An unsupervised classification technique for multispectral remote sensing data.

    NASA Technical Reports Server (NTRS)

    Su, M. Y.; Cummings, R. E.

    1973-01-01

    Description of a two-part clustering technique consisting of (a) a sequential statistical clustering, which is essentially a sequential variance analysis, and (b) a generalized K-means clustering. In this composite clustering technique, the output of (a) is a set of initial clusters which are input to (b) for further improvement by an iterative scheme. This unsupervised composite technique was employed for automatic classification of two sets of remote multispectral earth resource observations. The classification accuracy by the unsupervised technique is found to be comparable to that by traditional supervised maximum-likelihood classification techniques.

  1. Improved classification and visualization of healthy and pathological hard dental tissues by modeling specular reflections in NIR hyperspectral images

    NASA Astrophysics Data System (ADS)

    Usenik, Peter; Bürmen, Miran; Fidler, Aleš; Pernuš, Franjo; Likar, Boštjan

    2012-03-01

    Despite major improvements in dental healthcare and technology, dental caries remains one of the most prevalent chronic diseases of modern society. The initial stages of dental caries are characterized by demineralization of enamel crystals, commonly known as white spots, which are difficult to diagnose. Near-infrared (NIR) hyperspectral imaging is a new promising technique for early detection of demineralization which can classify healthy and pathological dental tissues. However, due to non-ideal illumination of the tooth surface the hyperspectral images can exhibit specular reflections, in particular around the edges and the ridges of the teeth. These reflections significantly affect the performance of automated classification and visualization methods. Cross polarized imaging setup can effectively remove the specular reflections, however is due to the complexity and other imaging setup limitations not always possible. In this paper, we propose an alternative approach based on modeling the specular reflections of hard dental tissues, which significantly improves the classification accuracy in the presence of specular reflections. The method was evaluated on five extracted human teeth with corresponding gold standard for 6 different healthy and pathological hard dental tissues including enamel, dentin, calculus, dentin caries, enamel caries and demineralized regions. Principal component analysis (PCA) was used for multivariate local modeling of healthy and pathological dental tissues. The classification was performed by employing multiple discriminant analysis. Based on the obtained results we believe the proposed method can be considered as an effective alternative to the complex cross polarized imaging setups.

  2. Classification of Spanish white wines using their electrophoretic profiles obtained by capillary zone electrophoresis with amperometric detection.

    PubMed

    Arribas, Alberto Sánchez; Martínez-Fernández, Marta; Moreno, Mónica; Bermejo, Esperanza; Zapardiel, Antonio; Chicharro, Manuel

    2014-06-01

    A method was developed for the simultaneous detection of eight polyphenols (t-resveratrol, (+)-catechin, quercetin and p-coumaric, caffeic, sinapic, ferulic, and gallic acids) by CZE with electrochemical detection. Separation of these polyphenols was achieved within 25 min using a 200 mM borate buffer (pH 9.4) containing 10% methanol as separation electrolyte. Amperometric detection of polyphenols was carried out with a glassy carbon electrode (GCE) modified with a multiwalled carbon nanotubes (CNT) layer obtained from a dispersion of CNT in polyethylenimine. The excellent electrochemical properties of this modified electrode allowed the detection and quantification of the selected polyphenols in white wines without any pretreatment step, showing remarkable signal stability despite the presence of potential fouling substances in wine. The electrophoretic profiles of white wines, obtained using this methodology, have proven to be useful for the classification of these wines by means of chemometric multivariate techniques. Principal component analysis and discriminant analysis allowed accurate classification of wine samples on the basis of their grape varietal (verdejo and airén) using the information contained in selected zones of the electropherogram. The utility of the proposed CZE methodology based on the electrochemical response of CNT-modified electrodes appears to be promising in the field of wine industry and it is expected to be successfully extended to classification of a wider range of wines made of other grape varietals. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  3. Evaluating effects of methylphenidate on brain activity in cocaine addiction: a machine-learning approach

    NASA Astrophysics Data System (ADS)

    Rish, Irina; Bashivan, Pouya; Cecchi, Guillermo A.; Goldstein, Rita Z.

    2016-03-01

    The objective of this study is to investigate effects of methylphenidate on brain activity in individuals with cocaine use disorder (CUD) using functional MRI (fMRI). Methylphenidate hydrochloride (MPH) is an indirect dopamine agonist commonly used for treating attention deficit/hyperactivity disorders; it was also shown to have some positive effects on CUD subjects, such as improved stop signal reaction times associated with better control/inhibition,1 as well as normalized task-related brain activity2 and resting-state functional connectivity in specific areas.3 While prior fMRI studies of MPH in CUDs have focused on mass-univariate statistical hypothesis testing, this paper evaluates multivariate, whole-brain effects of MPH as captured by the generalization (prediction) accuracy of different classification techniques applied to features extracted from resting-state functional networks (e.g., node degrees). Our multivariate predictive results based on resting-state data from3 suggest that MPH tends to normalize network properties such as voxel degrees in CUD subjects, thus providing additional evidence for potential benefits of MPH in treating cocaine addiction.

  4. Analysis of Big Data in Gait Biomechanics: Current Trends and Future Directions.

    PubMed

    Phinyomark, Angkoon; Petri, Giovanni; Ibáñez-Marcelo, Esther; Osis, Sean T; Ferber, Reed

    2018-01-01

    The increasing amount of data in biomechanics research has greatly increased the importance of developing advanced multivariate analysis and machine learning techniques, which are better able to handle "big data". Consequently, advances in data science methods will expand the knowledge for testing new hypotheses about biomechanical risk factors associated with walking and running gait-related musculoskeletal injury. This paper begins with a brief introduction to an automated three-dimensional (3D) biomechanical gait data collection system: 3D GAIT, followed by how the studies in the field of gait biomechanics fit the quantities in the 5 V's definition of big data: volume, velocity, variety, veracity, and value. Next, we provide a review of recent research and development in multivariate and machine learning methods-based gait analysis that can be applied to big data analytics. These modern biomechanical gait analysis methods include several main modules such as initial input features, dimensionality reduction (feature selection and extraction), and learning algorithms (classification and clustering). Finally, a promising big data exploration tool called "topological data analysis" and directions for future research are outlined and discussed.

  5. Introduction to multivariate discrimination

    NASA Astrophysics Data System (ADS)

    Kégl, Balázs

    2013-07-01

    Multivariate discrimination or classification is one of the best-studied problem in machine learning, with a plethora of well-tested and well-performing algorithms. There are also several good general textbooks [1-9] on the subject written to an average engineering, computer science, or statistics graduate student; most of them are also accessible for an average physics student with some background on computer science and statistics. Hence, instead of writing a generic introduction, we concentrate here on relating the subject to a practitioner experimental physicist. After a short introduction on the basic setup (Section 1) we delve into the practical issues of complexity regularization, model selection, and hyperparameter optimization (Section 2), since it is this step that makes high-complexity non-parametric fitting so different from low-dimensional parametric fitting. To emphasize that this issue is not restricted to classification, we illustrate the concept on a low-dimensional but non-parametric regression example (Section 2.1). Section 3 describes the common algorithmic-statistical formal framework that unifies the main families of multivariate classification algorithms. We explain here the large-margin principle that partly explains why these algorithms work. Section 4 is devoted to the description of the three main (families of) classification algorithms, neural networks, the support vector machine, and AdaBoost. We do not go into the algorithmic details; the goal is to give an overview on the form of the functions these methods learn and on the objective functions they optimize. Besides their technical description, we also make an attempt to put these algorithm into a socio-historical context. We then briefly describe some rather heterogeneous applications to illustrate the pattern recognition pipeline and to show how widespread the use of these methods is (Section 5). We conclude the chapter with three essentially open research problems that are either relevant to or even motivated by certain unorthodox applications of multivariate discrimination in experimental physics.

  6. Potential application of machine vision technology to saffron (Crocus sativus L.) quality characterization.

    PubMed

    Kiani, Sajad; Minaei, Saeid

    2016-12-01

    Saffron quality characterization is an important issue in the food industry and of interest to the consumers. This paper proposes an expert system based on the application of machine vision technology for characterization of saffron and shows how it can be employed in practical usage. There is a correlation between saffron color and its geographic location of production and some chemical attributes which could be properly used for characterization of saffron quality and freshness. This may be accomplished by employing image processing techniques coupled with multivariate data analysis for quantification of saffron properties. Expert algorithms can be made available for prediction of saffron characteristics such as color as well as for product classification. Copyright © 2016. Published by Elsevier Ltd.

  7. Multivariate Analysis As a Support for Diagnostic Flowcharts in Allergic Bronchopulmonary Aspergillosis: A Proof-of-Concept Study.

    PubMed

    Vitte, Joana; Ranque, Stéphane; Carsin, Ania; Gomez, Carine; Romain, Thomas; Cassagne, Carole; Gouitaa, Marion; Baravalle-Einaudi, Mélisande; Bel, Nathalie Stremler-Le; Reynaud-Gaubert, Martine; Dubus, Jean-Christophe; Mège, Jean-Louis; Gaudart, Jean

    2017-01-01

    Molecular-based allergy diagnosis yields multiple biomarker datasets. The classical diagnostic score for allergic bronchopulmonary aspergillosis (ABPA), a severe disease usually occurring in asthmatic patients and people with cystic fibrosis, comprises succinct immunological criteria formulated in 1977: total IgE, anti- Aspergillus fumigatus ( Af ) IgE, anti- Af "precipitins," and anti- Af IgG. Progress achieved over the last four decades led to multiple IgE and IgG(4) Af biomarkers available with quantitative, standardized, molecular-level reports. These newly available biomarkers have not been included in the current diagnostic criteria, either individually or in algorithms, despite persistent underdiagnosis of ABPA. Large numbers of individual biomarkers may hinder their use in clinical practice. Conversely, multivariate analysis using new tools may bring about a better chance of less diagnostic mistakes. We report here a proof-of-concept work consisting of a three-step multivariate analysis of Af IgE, IgG, and IgG4 biomarkers through a combination of principal component analysis, hierarchical ascendant classification, and classification and regression tree multivariate analysis. The resulting diagnostic algorithms might show the way for novel criteria and improved diagnostic efficiency in Af -sensitized patients at risk for ABPA.

  8. Estimating global distribution of boreal, temperate, and tropical tree plant functional types using clustering techniques

    NASA Astrophysics Data System (ADS)

    Wang, Audrey; Price, David T.

    2007-03-01

    A simple integrated algorithm was developed to relate global climatology to distributions of tree plant functional types (PFT). Multivariate cluster analysis was performed to analyze the statistical homogeneity of the climate space occupied by individual tree PFTs. Forested regions identified from the satellite-based GLC2000 classification were separated into tropical, temperate, and boreal sub-PFTs for use in the Canadian Terrestrial Ecosystem Model (CTEM). Global data sets of monthly minimum temperature, growing degree days, an index of climatic moisture, and estimated PFT cover fractions were then used as variables in the cluster analysis. The statistical results for individual PFT clusters were found consistent with other global-scale classifications of dominant vegetation. As an improvement of the quantification of the climatic limitations on PFT distributions, the results also demonstrated overlapping of PFT cluster boundaries that reflected vegetation transitions, for example, between tropical and temperate biomes. The resulting global database should provide a better basis for simulating the interaction of climate change and terrestrial ecosystem dynamics using global vegetation models.

  9. Discrimination of edible oils and fats by combination of multivariate pattern recognition and FT-IR spectroscopy: a comparative study between different modeling methods.

    PubMed

    Javidnia, Katayoun; Parish, Maryam; Karimi, Sadegh; Hemmateenejad, Bahram

    2013-03-01

    By using FT-IR spectroscopy, many researchers from different disciplines enrich the experimental complexity of their research for obtaining more precise information. Moreover chemometrics techniques have boosted the use of IR instruments. In the present study we aimed to emphasize on the power of FT-IR spectroscopy for discrimination between different oil samples (especially fat from vegetable oils). Also our data were used to compare the performance of different classification methods. FT-IR transmittance spectra of oil samples (Corn, Colona, Sunflower, Soya, Olive, and Butter) were measured in the wave-number interval of 450-4000 cm(-1). Classification analysis was performed utilizing PLS-DA, interval PLS-DA, extended canonical variate analysis (ECVA) and interval ECVA methods. The effect of data preprocessing by extended multiplicative signal correction was investigated. Whilst all employed method could distinguish butter from vegetable oils, iECVA resulted in the best performances for calibration and external test set with 100% sensitivity and specificity. Copyright © 2012 Elsevier B.V. All rights reserved.

  10. Detection of Anomalies in Citrus Leaves Using Laser-Induced Breakdown Spectroscopy (LIBS).

    PubMed

    Sankaran, Sindhuja; Ehsani, Reza; Morgan, Kelly T

    2015-08-01

    Nutrient assessment and management are important to maintain productivity in citrus orchards. In this study, laser-induced breakdown spectroscopy (LIBS) was applied for rapid and real-time detection of citrus anomalies. Laser-induced breakdown spectroscopy spectra were collected from citrus leaves with anomalies such as diseases (Huanglongbing, citrus canker) and nutrient deficiencies (iron, manganese, magnesium, zinc), and compared with those of healthy leaves. Baseline correction, wavelet multivariate denoising, and normalization techniques were applied to the LIBS spectra before analysis. After spectral pre-processing, features were extracted using principal component analysis and classified using two models, quadratic discriminant analysis and support vector machine (SVM). The SVM resulted in a high average classification accuracy of 97.5%, with high average canker classification accuracy (96.5%). LIBS peak analysis indicated that high intensities at 229.7, 247.9, 280.3, 393.5, 397.0, and 769.8 nm were observed of 11 peaks found in all the samples. Future studies using controlled experiments with variable nutrient applications are required for quantification of foliar nutrients by using LIBS-based sensing.

  11. Screening analysis of biodiesel feedstock using UV-vis, NIR and synchronous fluorescence spectrometries and the successive projections algorithm.

    PubMed

    Insausti, Matías; Gomes, Adriano A; Cruz, Fernanda V; Pistonesi, Marcelo F; Araujo, Mario C U; Galvão, Roberto K H; Pereira, Claudete F; Band, Beatriz S F

    2012-08-15

    This paper investigates the use of UV-vis, near infrared (NIR) and synchronous fluorescence (SF) spectrometries coupled with multivariate classification methods to discriminate biodiesel samples with respect to the base oil employed in their production. More specifically, the present work extends previous studies by investigating the discrimination of corn-based biodiesel from two other biodiesel types (sunflower and soybean). Two classification methods are compared, namely full-spectrum SIMCA (soft independent modelling of class analogies) and SPA-LDA (linear discriminant analysis with variables selected by the successive projections algorithm). Regardless of the spectrometric technique employed, full-spectrum SIMCA did not provide an appropriate discrimination of the three biodiesel types. In contrast, all samples were correctly classified on the basis of a reduced number of wavelengths selected by SPA-LDA. It can be concluded that UV-vis, NIR and SF spectrometries can be successfully employed to discriminate corn-based biodiesel from the two other biodiesel types, but wavelength selection by SPA-LDA is key to the proper separation of the classes. Copyright © 2012 Elsevier B.V. All rights reserved.

  12. A neuromorphic network for generic multivariate data classification

    PubMed Central

    Schmuker, Michael; Pfeil, Thomas; Nawrot, Martin Paul

    2014-01-01

    Computational neuroscience has uncovered a number of computational principles used by nervous systems. At the same time, neuromorphic hardware has matured to a state where fast silicon implementations of complex neural networks have become feasible. En route to future technical applications of neuromorphic computing the current challenge lies in the identification and implementation of functional brain algorithms. Taking inspiration from the olfactory system of insects, we constructed a spiking neural network for the classification of multivariate data, a common problem in signal and data analysis. In this model, real-valued multivariate data are converted into spike trains using “virtual receptors” (VRs). Their output is processed by lateral inhibition and drives a winner-take-all circuit that supports supervised learning. VRs are conveniently implemented in software, whereas the lateral inhibition and classification stages run on accelerated neuromorphic hardware. When trained and tested on real-world datasets, we find that the classification performance is on par with a naïve Bayes classifier. An analysis of the network dynamics shows that stable decisions in output neuron populations are reached within less than 100 ms of biological time, matching the time-to-decision reported for the insect nervous system. Through leveraging a population code, the network tolerates the variability of neuronal transfer functions and trial-to-trial variation that is inevitably present on the hardware system. Our work provides a proof of principle for the successful implementation of a functional spiking neural network on a configurable neuromorphic hardware system that can readily be applied to real-world computing problems. PMID:24469794

  13. A detailed comparison of analysis processes for MCC-IMS data in disease classification—Automated methods can replace manual peak annotations

    PubMed Central

    Horsch, Salome; Kopczynski, Dominik; Kuthe, Elias; Baumbach, Jörg Ingo; Rahmann, Sven

    2017-01-01

    Motivation Disease classification from molecular measurements typically requires an analysis pipeline from raw noisy measurements to final classification results. Multi capillary column—ion mobility spectrometry (MCC-IMS) is a promising technology for the detection of volatile organic compounds in the air of exhaled breath. From raw measurements, the peak regions representing the compounds have to be identified, quantified, and clustered across different experiments. Currently, several steps of this analysis process require manual intervention of human experts. Our goal is to identify a fully automatic pipeline that yields competitive disease classification results compared to an established but subjective and tedious semi-manual process. Method We combine a large number of modern methods for peak detection, peak clustering, and multivariate classification into analysis pipelines for raw MCC-IMS data. We evaluate all combinations on three different real datasets in an unbiased cross-validation setting. We determine which specific algorithmic combinations lead to high AUC values in disease classifications across the different medical application scenarios. Results The best fully automated analysis process achieves even better classification results than the established manual process. The best algorithms for the three analysis steps are (i) SGLTR (Savitzky-Golay Laplace-operator filter thresholding regions) and LM (Local Maxima) for automated peak identification, (ii) EM clustering (Expectation Maximization) and DBSCAN (Density-Based Spatial Clustering of Applications with Noise) for the clustering step and (iii) RF (Random Forest) for multivariate classification. Thus, automated methods can replace the manual steps in the analysis process to enable an unbiased high throughput use of the technology. PMID:28910313

  14. Multivariate qualitative analysis of banned additives in food safety using surface enhanced Raman scattering spectroscopy

    NASA Astrophysics Data System (ADS)

    He, Shixuan; Xie, Wanyi; Zhang, Wei; Zhang, Liqun; Wang, Yunxia; Liu, Xiaoling; Liu, Yulong; Du, Chunlei

    2015-02-01

    A novel strategy which combines iteratively cubic spline fitting baseline correction method with discriminant partial least squares qualitative analysis is employed to analyze the surface enhanced Raman scattering (SERS) spectroscopy of banned food additives, such as Sudan I dye and Rhodamine B in food, Malachite green residues in aquaculture fish. Multivariate qualitative analysis methods, using the combination of spectra preprocessing iteratively cubic spline fitting (ICSF) baseline correction with principal component analysis (PCA) and discriminant partial least squares (DPLS) classification respectively, are applied to investigate the effectiveness of SERS spectroscopy for predicting the class assignments of unknown banned food additives. PCA cannot be used to predict the class assignments of unknown samples. However, the DPLS classification can discriminate the class assignment of unknown banned additives using the information of differences in relative intensities. The results demonstrate that SERS spectroscopy combined with ICSF baseline correction method and exploratory analysis methodology DPLS classification can be potentially used for distinguishing the banned food additives in field of food safety.

  15. Multivariate pattern analysis strategies in detection of remitted major depressive disorder using resting state functional connectivity.

    PubMed

    Bhaumik, Runa; Jenkins, Lisanne M; Gowins, Jennifer R; Jacobs, Rachel H; Barba, Alyssa; Bhaumik, Dulal K; Langenecker, Scott A

    2017-01-01

    Understanding abnormal resting-state functional connectivity of distributed brain networks may aid in probing and targeting mechanisms involved in major depressive disorder (MDD). To date, few studies have used resting state functional magnetic resonance imaging (rs-fMRI) to attempt to discriminate individuals with MDD from individuals without MDD, and to our knowledge no investigations have examined a remitted (r) population. In this study, we examined the efficiency of support vector machine (SVM) classifier to successfully discriminate rMDD individuals from healthy controls (HCs) in a narrow early-adult age range. We empirically evaluated four feature selection methods including multivariate Least Absolute Shrinkage and Selection Operator (LASSO) and Elastic Net feature selection algorithms. Our results showed that SVM classification with Elastic Net feature selection achieved the highest classification accuracy of 76.1% (sensitivity of 81.5% and specificity of 68.9%) by leave-one-out cross-validation across subjects from a dataset consisting of 38 rMDD individuals and 29 healthy controls. The highest discriminating functional connections were between the left amygdala, left posterior cingulate cortex, bilateral dorso-lateral prefrontal cortex, and right ventral striatum. These appear to be key nodes in the etiopathophysiology of MDD, within and between default mode, salience and cognitive control networks. This technique demonstrates early promise for using rs-fMRI connectivity as a putative neurobiological marker capable of distinguishing between individuals with and without rMDD. These methods may be extended to periods of risk prior to illness onset, thereby allowing for earlier diagnosis, prevention, and intervention.

  16. Parallel Multivariate Spatio-Temporal Clustering of Large Ecological Datasets on Hybrid Supercomputers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sreepathi, Sarat; Kumar, Jitendra; Mills, Richard T.

    A proliferation of data from vast networks of remote sensing platforms (satellites, unmanned aircraft systems (UAS), airborne etc.), observational facilities (meteorological, eddy covariance etc.), state-of-the-art sensors, and simulation models offer unprecedented opportunities for scientific discovery. Unsupervised classification is a widely applied data mining approach to derive insights from such data. However, classification of very large data sets is a complex computational problem that requires efficient numerical algorithms and implementations on high performance computing (HPC) platforms. Additionally, increasing power, space, cooling and efficiency requirements has led to the deployment of hybrid supercomputing platforms with complex architectures and memory hierarchies like themore » Titan system at Oak Ridge National Laboratory. The advent of such accelerated computing architectures offers new challenges and opportunities for big data analytics in general and specifically, large scale cluster analysis in our case. Although there is an existing body of work on parallel cluster analysis, those approaches do not fully meet the needs imposed by the nature and size of our large data sets. Moreover, they had scaling limitations and were mostly limited to traditional distributed memory computing platforms. We present a parallel Multivariate Spatio-Temporal Clustering (MSTC) technique based on k-means cluster analysis that can target hybrid supercomputers like Titan. We developed a hybrid MPI, CUDA and OpenACC implementation that can utilize both CPU and GPU resources on computational nodes. We describe performance results on Titan that demonstrate the scalability and efficacy of our approach in processing large ecological data sets.« less

  17. A methodological approach to study the stability of selected watercolours for painting reintegration, through reflectance spectrophotometry, Fourier transform infrared spectroscopy and hyperspectral imaging.

    PubMed

    Pelosi, Claudia; Capobianco, Giuseppe; Agresti, Giorgia; Bonifazi, Giuseppe; Morresi, Fabio; Rossi, Sara; Santamaria, Ulderico; Serranti, Silvia

    2018-06-05

    The aim of this work is to investigate the stability to simulated solar radiation of some paintings samples through a new methodological approach adopting non-invasive spectroscopic techniques. In particular, commercial watercolours and iron oxide based pigments were used, these last ones being prepared for the experimental by gum Arabic in order to propose a possible substitute for traditional reintegration materials. Reflectance spectrophotometry in the visible range and Hyperspectral Imaging in the short wave infrared were chosen as non-invasive techniques for evaluation the stability to irradiation of the chosen pigments. These were studied before and after artificial ageing procedure performed in Solar Box chamber under controlled conditions. Data were treated and elaborated in order to evaluate the sensitivity of the chosen techniques in identifying the variations on paint layers, induced by photo-degradation, before they could be observed by eye. Furthermore a supervised classification method for monitoring the painted surface changes adopting a multivariate approach was successfully applied. Copyright © 2018 Elsevier B.V. All rights reserved.

  18. The identification of Oligo-Miocene mammalian palaeocommunities from the Riversleigh World Heritage Area, Australia and an appraisal of palaeoecological techniques

    PubMed Central

    Black, Karen H.; Archer, Michael; Hand, Suzanne J.

    2017-01-01

    Fourteen of the best sampled Oligo-Miocene local faunas from the Riversleigh World Heritage Area, north-western Queensland, Australia are analysed using classification and ordination techniques to identify potential mammalian palaeocommunities and palaeocommunity types. Abundance data for these faunas are used, for the first time, in conjunction with presence/absence data. An early Miocene Faunal Zone B and two middle Miocene Faunal Zone C palaeocommunities are recognised, as well as one palaeocommunity type. Change in palaeocommunity structure, between the early Miocene and middle Miocene, may be the result of significant climate change during the Miocene Carbon Isotope Excursion. The complexes of local faunas identified will allow researchers to use novel palaeocommunities in future analyses of Riversleigh’s fossil faunas. The utility of some palaeoecological multivariate indices and techniques is examined. The Dice index is found to outperform other binary similarity/distance coefficients, while the UPGMA algorithm is more useful than neighbour joining. Evidence is equivocal for the usefulness of presence/absence data compared to abundance. PMID:28674663

  19. Unsupervised classification of earth resources data.

    NASA Technical Reports Server (NTRS)

    Su, M. Y.; Jayroe, R. R., Jr.; Cummings, R. E.

    1972-01-01

    A new clustering technique is presented. It consists of two parts: (a) a sequential statistical clustering which is essentially a sequential variance analysis and (b) a generalized K-means clustering. In this composite clustering technique, the output of (a) is a set of initial clusters which are input to (b) for further improvement by an iterative scheme. This unsupervised composite technique was employed for automatic classification of two sets of remote multispectral earth resource observations. The classification accuracy by the unsupervised technique is found to be comparable to that by existing supervised maximum liklihood classification technique.

  20. The classification of anxiety and hysterical states. Part I. Historical review and empirical delineation.

    PubMed

    Sheehan, D V; Sheehan, K H

    1982-08-01

    The history of the classification of anxiety, hysterical, and hypochondriacal disorders is reviewed. Problems in the ability of current classification schemes to predict, control, and describe the relationship between the symptoms and other phenomena are outlined. Existing classification schemes failed the first test of a good classification model--that of providing categories that are mutually exclusive. The independence of these diagnostic categories from each other does not appear to hold up on empirical testing. In the absence of inherently mutually exclusive categories, further empirical investigation of these classes is obstructed since statistically valid analysis of the nominal data and any useful multivariate analysis would be difficult if not impossible. It is concluded that the existing classifications are unsatisfactory and require some fundamental reconceptualization.

  1. A multivariate pattern analysis study of the HIV-related white matter anatomical structural connections alterations

    NASA Astrophysics Data System (ADS)

    Tang, Zhenchao; Liu, Zhenyu; Li, Ruili; Cui, Xinwei; Li, Hongjun; Dong, Enqing; Tian, Jie

    2017-03-01

    It's widely known that HIV infection would cause white matter integrity impairments. Nevertheless, it is still unclear that how the white matter anatomical structural connections are affected by HIV infection. In the current study, we employed a multivariate pattern analysis to explore the HIV-related white matter connections alterations. Forty antiretroviraltherapy- naïve HIV patients and thirty healthy controls were enrolled. Firstly, an Automatic Anatomical Label (AAL) atlas based white matter structural network, a 90 × 90 FA-weighted matrix, was constructed for each subject. Then, the white matter connections deprived from the structural network were entered into a lasso-logistic regression model to perform HIV-control group classification. Using leave one out cross validation, a classification accuracy (ACC) of 90% (P=0.002) and areas under the receiver operating characteristic curve (AUC) of 0.96 was obtained by the classification model. This result indicated that the white matter anatomical structural connections contributed greatly to HIV-control group classification, providing solid evidence that the white matter connections were affected by HIV infection. Specially, 11 white matter connections were selected in the classification model, mainly crossing the regions of frontal lobe, Cingulum, Hippocampus, and Thalamus, which were reported to be damaged in previous HIV studies. This might suggest that the white matter connections adjacent to the HIV-related impaired regions were prone to be damaged.

  2. Risk prediction for myocardial infarction via generalized functional regression models.

    PubMed

    Ieva, Francesca; Paganoni, Anna M

    2016-08-01

    In this paper, we propose a generalized functional linear regression model for a binary outcome indicating the presence/absence of a cardiac disease with multivariate functional data among the relevant predictors. In particular, the motivating aim is the analysis of electrocardiographic traces of patients whose pre-hospital electrocardiogram (ECG) has been sent to 118 Dispatch Center of Milan (the Italian free-toll number for emergencies) by life support personnel of the basic rescue units. The statistical analysis starts with a preprocessing of ECGs treated as multivariate functional data. The signals are reconstructed from noisy observations. The biological variability is then removed by a nonlinear registration procedure based on landmarks. Thus, in order to perform a data-driven dimensional reduction, a multivariate functional principal component analysis is carried out on the variance-covariance matrix of the reconstructed and registered ECGs and their first derivatives. We use the scores of the Principal Components decomposition as covariates in a generalized linear model to predict the presence of the disease in a new patient. Hence, a new semi-automatic diagnostic procedure is proposed to estimate the risk of infarction (in the case of interest, the probability of being affected by Left Bundle Brunch Block). The performance of this classification method is evaluated and compared with other methods proposed in literature. Finally, the robustness of the procedure is checked via leave-j-out techniques. © The Author(s) 2013.

  3. Hierarchical classification method and its application in shape representation

    NASA Astrophysics Data System (ADS)

    Ireton, M. A.; Oakley, John P.; Xydeas, Costas S.

    1992-04-01

    In this paper we describe a technique for performing shaped-based content retrieval of images from a large database. In order to be able to formulate such user-generated queries about visual objects, we have developed an hierarchical classification technique. This hierarchical classification technique enables similarity matching between objects, with the position in the hierarchy signifying the level of generality to be used in the query. The classification technique is unsupervised, robust, and general; it can be applied to any suitable parameter set. To establish the potential of this classifier for aiding visual querying, we have applied it to the classification of the 2-D outlines of leaves.

  4. Modification of Gaussian mixture models for data classification in high energy physics

    NASA Astrophysics Data System (ADS)

    Štěpánek, Michal; Franc, Jiří; Kůs, Václav

    2015-01-01

    In high energy physics, we deal with demanding task of signal separation from background. The Model Based Clustering method involves the estimation of distribution mixture parameters via the Expectation-Maximization algorithm in the training phase and application of Bayes' rule in the testing phase. Modifications of the algorithm such as weighting, missing data processing, and overtraining avoidance will be discussed. Due to the strong dependence of the algorithm on initialization, genetic optimization techniques such as mutation, elitism, parasitism, and the rank selection of individuals will be mentioned. Data pre-processing plays a significant role for the subsequent combination of final discriminants in order to improve signal separation efficiency. Moreover, the results of the top quark separation from the Tevatron collider will be compared with those of standard multivariate techniques in high energy physics. Results from this study has been used in the measurement of the inclusive top pair production cross section employing DØ Tevatron full Runll data (9.7 fb-1).

  5. Targeted Proteomics Approach for Precision Plant Breeding.

    PubMed

    Chawade, Aakash; Alexandersson, Erik; Bengtsson, Therese; Andreasson, Erik; Levander, Fredrik

    2016-02-05

    Selected reaction monitoring (SRM) is a targeted mass spectrometry technique that enables precise quantitation of hundreds of peptides in a single run. This technique provides new opportunities for multiplexed protein biomarker measurements. For precision plant breeding, DNA-based markers have been used extensively, but the potential of protein biomarkers has not been exploited. In this work, we developed an SRM marker panel with assays for 104 potato (Solanum tuberosum) peptides selected using univariate and multivariate statistics. Thereafter, using random forest classification, the prediction markers were identified for Phytopthora infestans resistance in leaves, P. infestans resistance in tubers, and plant yield in potato leaf secretome samples. The results suggest that the marker panel has the predictive potential for three traits, two of which have no commercial DNA markers so far. Furthermore, the marker panel was also tested and found to be applicable to potato clones not used during the marker development. The proposed workflow is thus a proof-of-concept for targeted proteomics as an efficient readout in accelerated breeding for complex and agronomically important traits.

  6. Broiler chickens can benefit from machine learning: support vector machine analysis of observational epidemiological data

    PubMed Central

    Hepworth, Philip J.; Nefedov, Alexey V.; Muchnik, Ilya B.; Morgan, Kenton L.

    2012-01-01

    Machine-learning algorithms pervade our daily lives. In epidemiology, supervised machine learning has the potential for classification, diagnosis and risk factor identification. Here, we report the use of support vector machine learning to identify the features associated with hock burn on commercial broiler farms, using routinely collected farm management data. These data lend themselves to analysis using machine-learning techniques. Hock burn, dermatitis of the skin over the hock, is an important indicator of broiler health and welfare. Remarkably, this classifier can predict the occurrence of high hock burn prevalence with accuracy of 0.78 on unseen data, as measured by the area under the receiver operating characteristic curve. We also compare the results with those obtained by standard multi-variable logistic regression and suggest that this technique provides new insights into the data. This novel application of a machine-learning algorithm, embedded in poultry management systems could offer significant improvements in broiler health and welfare worldwide. PMID:22319115

  7. Broiler chickens can benefit from machine learning: support vector machine analysis of observational epidemiological data.

    PubMed

    Hepworth, Philip J; Nefedov, Alexey V; Muchnik, Ilya B; Morgan, Kenton L

    2012-08-07

    Machine-learning algorithms pervade our daily lives. In epidemiology, supervised machine learning has the potential for classification, diagnosis and risk factor identification. Here, we report the use of support vector machine learning to identify the features associated with hock burn on commercial broiler farms, using routinely collected farm management data. These data lend themselves to analysis using machine-learning techniques. Hock burn, dermatitis of the skin over the hock, is an important indicator of broiler health and welfare. Remarkably, this classifier can predict the occurrence of high hock burn prevalence with accuracy of 0.78 on unseen data, as measured by the area under the receiver operating characteristic curve. We also compare the results with those obtained by standard multi-variable logistic regression and suggest that this technique provides new insights into the data. This novel application of a machine-learning algorithm, embedded in poultry management systems could offer significant improvements in broiler health and welfare worldwide.

  8. Panic disorder and agoraphobia: A direct comparison of their multivariate comorbidity patterns.

    PubMed

    Greene, Ashley L; Eaton, Nicholas R

    2016-01-15

    Scientific debate has long surrounded whether agoraphobia is a severe consequence of panic disorder or a frequently comorbid diagnosis. Multivariate comorbidity investigations typically treat these diagnoses as fungible in structural models, assuming both are manifestations of the fear-subfactor in the internalizing-externalizing model. No studies have directly compared these disorders' multivariate associations, which could clarify their conceptualization in classification and comorbidity research. In a nationally representative sample (N=43,093), we examined the multivariate comorbidity of panic disorder (1) without agoraphobia, (2) with agoraphobia, and (3) regardless of agoraphobia; and (4) agoraphobia without panic. We conducted exploratory and confirmatory factor analyses of these and 10 other lifetime DSM-IV diagnoses in a nationally representative sample (N=43,093). Differing bivariate and multivariate relations were found. Panic disorder without agoraphobia was largely a distress disorder, related to emotional disorders. Agoraphobia without panic was largely a fear disorder, related to phobias. When considered jointly, concomitant agoraphobia and panic was a fear disorder, and when panic was assessed without regard to agoraphobia (some individuals had agoraphobia while others did not) it was a mixed distress and fear disorder. Diagnoses were obtained from comprehensively trained lay interviewers, not clinicians and analyses used DSM-IV diagnoses (rather than DSM-5). These findings support the conceptualization of agoraphobia as a distinct diagnostic entity and the independent classification of both disorders in DSM-5, suggesting future multivariate comorbidity studies should not assume various panic/agoraphobia diagnoses are invariably fear disorders. Copyright © 2015 Elsevier B.V. All rights reserved.

  9. Evaluation of Urinary Tract Dilation Classification System for Grading Postnatal Hydronephrosis.

    PubMed

    Hodhod, Amr; Capolicchio, John-Paul; Jednak, Roman; El-Sherif, Eid; El-Doray, Abd El-Alim; El-Sherbiny, Mohamed

    2016-03-01

    We assessed the reliability and validity of the Urinary Tract Dilation classification system as a new grading system for postnatal hydronephrosis. We retrospectively reviewed charts of patients who presented with hydronephrosis from 2008 to 2013. We included patients diagnosed prenatally and those with hydronephrosis discovered incidentally during the first year of life. We excluded cases involving urinary tract infection, neurogenic bladder and chromosomal anomalies, those associated with extraurinary congenital malformations and those with followup of less than 24 months without resolution. Hydronephrosis was graded postnatally using the Society for Fetal Urology system, and then the management protocol was chosen. All units were regraded using the Urinary Tract Dilation classification system and compared to the Society for Fetal Urology system to assess reliability. Univariate and multivariate analyses were performed to assess the validity of the Urinary Tract Dilation classification system in predicting hydronephrosis resolution and surgical intervention. A total of 490 patients (730 renal units) were eligible to participate. The Urinary Tract Dilation classification system was reliable in the assessment of hydronephrosis (parallel forms 0.92). Hydronephrosis resolved in 357 units (49%), and 86 units (12%) were managed by surgical intervention. The remainder of renal units demonstrated stable or improved hydronephrosis. Multivariate analysis revealed that the likelihood of surgical intervention was predicted independently by Urinary Tract Dilation classification system risk group, while Society for Fetal Urology grades were predictive of likelihood of resolution. The Urinary Tract Dilation classification system is reliable for evaluation of postnatal hydronephrosis and is valid in predicting surgical intervention. Copyright © 2016 American Urological Association Education and Research, Inc. Published by Elsevier Inc. All rights reserved.

  10. Use of genetic algorithm for the selection of EEG features

    NASA Astrophysics Data System (ADS)

    Asvestas, P.; Korda, A.; Kostopoulos, S.; Karanasiou, I.; Ouzounoglou, A.; Sidiropoulos, K.; Ventouras, E.; Matsopoulos, G.

    2015-09-01

    Genetic Algorithm (GA) is a popular optimization technique that can detect the global optimum of a multivariable function containing several local optima. GA has been widely used in the field of biomedical informatics, especially in the context of designing decision support systems that classify biomedical signals or images into classes of interest. The aim of this paper is to present a methodology, based on GA, for the selection of the optimal subset of features that can be used for the efficient classification of Event Related Potentials (ERPs), which are recorded during the observation of correct or incorrect actions. In our experiment, ERP recordings were acquired from sixteen (16) healthy volunteers who observed correct or incorrect actions of other subjects. The brain electrical activity was recorded at 47 locations on the scalp. The GA was formulated as a combinatorial optimizer for the selection of the combination of electrodes that maximizes the performance of the Fuzzy C Means (FCM) classification algorithm. In particular, during the evolution of the GA, for each candidate combination of electrodes, the well-known (Σ, Φ, Ω) features were calculated and were evaluated by means of the FCM method. The proposed methodology provided a combination of 8 electrodes, with classification accuracy 93.8%. Thus, GA can be the basis for the selection of features that discriminate ERP recordings of observations of correct or incorrect actions.

  11. Multi-label classification of chronically ill patients with bag of words and supervised dimensionality reduction algorithms.

    PubMed

    Bromuri, Stefano; Zufferey, Damien; Hennebert, Jean; Schumacher, Michael

    2014-10-01

    This research is motivated by the issue of classifying illnesses of chronically ill patients for decision support in clinical settings. Our main objective is to propose multi-label classification of multivariate time series contained in medical records of chronically ill patients, by means of quantization methods, such as bag of words (BoW), and multi-label classification algorithms. Our second objective is to compare supervised dimensionality reduction techniques to state-of-the-art multi-label classification algorithms. The hypothesis is that kernel methods and locality preserving projections make such algorithms good candidates to study multi-label medical time series. We combine BoW and supervised dimensionality reduction algorithms to perform multi-label classification on health records of chronically ill patients. The considered algorithms are compared with state-of-the-art multi-label classifiers in two real world datasets. Portavita dataset contains 525 diabetes type 2 (DT2) patients, with co-morbidities of DT2 such as hypertension, dyslipidemia, and microvascular or macrovascular issues. MIMIC II dataset contains 2635 patients affected by thyroid disease, diabetes mellitus, lipoid metabolism disease, fluid electrolyte disease, hypertensive disease, thrombosis, hypotension, chronic obstructive pulmonary disease (COPD), liver disease and kidney disease. The algorithms are evaluated using multi-label evaluation metrics such as hamming loss, one error, coverage, ranking loss, and average precision. Non-linear dimensionality reduction approaches behave well on medical time series quantized using the BoW algorithm, with results comparable to state-of-the-art multi-label classification algorithms. Chaining the projected features has a positive impact on the performance of the algorithm with respect to pure binary relevance approaches. The evaluation highlights the feasibility of representing medical health records using the BoW for multi-label classification tasks. The study also highlights that dimensionality reduction algorithms based on kernel methods, locality preserving projections or both are good candidates to deal with multi-label classification tasks in medical time series with many missing values and high label density. Copyright © 2014 Elsevier Inc. All rights reserved.

  12. Automated classification and visualization of healthy and pathological dental tissues based on near-infrared hyper-spectral imaging

    NASA Astrophysics Data System (ADS)

    Usenik, Peter; Bürmen, Miran; Vrtovec, Tomaž; Fidler, Aleš; Pernuš, Franjo; Likar, Boštjan

    2011-03-01

    Despite major improvements in dental healthcare and technology, dental caries remains one of the most prevalent chronic diseases of modern society. The initial stages of dental caries are characterized by demineralization of enamel crystals, commonly known as white spots which are difficult to diagnose. If detected early enough, such demineralization can be arrested and reversed by non-surgical means through well established dental treatments (fluoride therapy, anti-bacterial therapy, low intensity laser irradiation). Near-infrared (NIR) hyper-spectral imaging is a new promising technique for early detection of demineralization based on distinct spectral features of healthy and pathological dental tissues. In this study, we apply NIR hyper-spectral imaging to classify and visualize healthy and pathological dental tissues including enamel, dentin, calculus, dentin caries, enamel caries and demineralized areas. For this purpose, a standardized teeth database was constructed consisting of 12 extracted human teeth with different degrees of natural dental lesions imaged by NIR hyper-spectral system, X-ray and digital color camera. The color and X-ray images of teeth were presented to a clinical expert for localization and classification of the dental tissues, thereby obtaining the gold standard. Principal component analysis was used for multivariate local modeling of healthy and pathological dental tissues. Finally, the dental tissues were classified by employing multiple discriminant analysis. High agreement was observed between the resulting classification and the gold standard with the classification sensitivity and specificity exceeding 85 % and 97 %, respectively. This study demonstrates that NIR hyper-spectral imaging has considerable diagnostic potential for imaging hard dental tissues.

  13. Multivariate evaluation of Thyroid Imaging Reporting and Data System (TI-RADS) in diagnosis malignant thyroid nodule: application to PCA and PLS-DA analysis.

    PubMed

    Zhang, Tan; Li, Fangxuan; Mu, Jiali; Liu, Juntian; Zhang, Sheng

    2017-06-01

    To explore the significance of ultrasonic features in differential diagnosis of thyroid nodules via combining the thyroid imaging reporting and data system (TI-RADS) and multivariate statistical analysis. Patients who received surgical treatment and was diagnosed with single thyroid nodule by postoperative pathology and preoperative ultrasound were enrolled in this study. Multivariate analysis was applied to assess the significant ultrasonic features which correlated with identifying benign or malignance and grading the TI-RADS classification of thyroid nodule. There were significant differences in the nodule size, aspect ratio, internal, echogenicity, boundary, presence or absence of calcifications, calcification type and CDFI between benign and malignant thyroid nodules. Multivariate analysis showed clear-cut distinction both between benign and malignance and among different TI-RADS categories of malignancy nodules. The shape and calcification of the nodule were important factors for distinguish the benign and malignance. Height of the nodule, aspect and calcification was important factors for grading TI-RADS categories of malignancy thyroid nodules. Ill-defined boundary, irregular shape and presence of calcification related with highly malignant risk for thyroid nodule. The larger height and aspect and presence of calcification related with higher TI-RADS classification of malignancy thyroid nodule.

  14. Linear regression analysis and its application to multivariate chromatographic calibration for the quantitative analysis of two-component mixtures.

    PubMed

    Dinç, Erdal; Ozdemir, Abdil

    2005-01-01

    Multivariate chromatographic calibration technique was developed for the quantitative analysis of binary mixtures enalapril maleate (EA) and hydrochlorothiazide (HCT) in tablets in the presence of losartan potassium (LST). The mathematical algorithm of multivariate chromatographic calibration technique is based on the use of the linear regression equations constructed using relationship between concentration and peak area at the five-wavelength set. The algorithm of this mathematical calibration model having a simple mathematical content was briefly described. This approach is a powerful mathematical tool for an optimum chromatographic multivariate calibration and elimination of fluctuations coming from instrumental and experimental conditions. This multivariate chromatographic calibration contains reduction of multivariate linear regression functions to univariate data set. The validation of model was carried out by analyzing various synthetic binary mixtures and using the standard addition technique. Developed calibration technique was applied to the analysis of the real pharmaceutical tablets containing EA and HCT. The obtained results were compared with those obtained by classical HPLC method. It was observed that the proposed multivariate chromatographic calibration gives better results than classical HPLC.

  15. Impact of FAB classification on predicting outcome in acute myeloid leukemia, not otherwise specified, patients undergoing allogeneic stem cell transplantation in CR1: An analysis of 1690 patients from the acute leukemia working party of EBMT.

    PubMed

    Canaani, Jonathan; Beohou, Eric; Labopin, Myriam; Socié, Gerard; Huynh, Anne; Volin, Liisa; Cornelissen, Jan; Milpied, Noel; Gedde-Dahl, Tobias; Deconinck, Eric; Fegueux, Nathalie; Blaise, Didier; Mohty, Mohamad; Nagler, Arnon

    2017-04-01

    The French, American, and British (FAB) classification system for acute myeloid leukemia (AML) is extensively used and is incorporated into the AML, not otherwise specified (NOS) category in the 2016 WHO edition of myeloid neoplasm classification. While recent data proposes that FAB classification does not provide additional prognostic information for patients for whom NPM1 status is available, it is unknown whether FAB still retains a current prognostic role in predicting outcome of AML patients undergoing allogeneic stem cell transplantation. Using the European Society of Blood and Bone Marrow Transplantation registry we analyzed outcome of 1690 patients transplanted in CR1 to determine if FAB classification provides additional prognostic value. Multivariate analysis revealed that M6/M7 patients had decreased leukemia free survival (hazard ratio (HR) of 1.41, 95% confidence interval (CI), 1.01-1.99; P = .046) in addition to increased nonrelapse mortality (NRM) rates (HR, 1.79; 95% CI, 1.06-3.01; P = .028) compared with other FAB types. In the NPM1 wt AML, NOS cohort, FAB M6/M7 was also associated with increased NRM (HR, 2.17; 95% CI, 1.14-4.16; P = .019). Finally, in FLT3-ITD + patients, multivariate analyses revealed that specific FAB types were tightly associated with adverse outcome. In conclusion, FAB classification may predict outcome following transplantation in AML, NOS patients. © 2017 Wiley Periodicals, Inc.

  16. Comparisons of severity classification systems for oropharyngeal dysfunction in children with cerebral palsy: Relations with other functional profiles.

    PubMed

    Goh, Yu-Ra; Choi, Ja Young; Kim, Seon Ah; Park, Jieun; Park, Eun Sook

    2018-01-01

    This study aimed to investigate the relationships between various classification systems assessing the severity of oropharyngeal dysphagia and communication function and other functional profiles in children with cerebral palsy (CP). This is a prospective, cross-sectional, study in a university-affiliated, tertiary-care hospital. We recruited 151 children with CP (mean age 6.11 years, SD 3.42, range 3-18yr). The Eating and Drinking Ability Classification System (EDACS) and the dysphagia scales of Functional Oral Intake Scale (FOIS), Swallow Function Scales (SFS), and Food Intake Level Scale (FILS) were used. The Communication Function Classification System (CFCS) and Viking Speech Scale (VSS) were employed to classify communication function and speech intelligibility, respectively. The Pediatric Evaluation of Disability Inventory (PEDI) with the Gross Motor Function Classification System (GFMCS) and the Manual Ability Classification System (MACS) level were also assessed. Spearman correlation analysis to investigate the associations between measures and univariate and multivariate logistic regression models to identify significant factors were used. Median GMFCS level of participants was III (interquartile range II-IV). Significant dysphagia based on EDACS level III-V was noted in 23 children (15.2%). There were strong to very strong relationships between the EDACS level with the dysphagia scales. The EDACS presented strong associations with MACS, CFCS, and VSS, a moderate association with GMFCS level, and a moderate to strong association with each domain of the PEDI. In multivariate analysis, poor functioning in EDACS were associated with poor functioning in gross motor and communication functions. Copyright © 2017. Published by Elsevier Ltd.

  17. The search for structure - Object classification in large data sets. [for astronomers

    NASA Technical Reports Server (NTRS)

    Kurtz, Michael J.

    1988-01-01

    Research concerning object classifications schemes are reviewed, focusing on large data sets. Classification techniques are discussed, including syntactic, decision theoretic methods, fuzzy techniques, and stochastic and fuzzy grammars. Consideration is given to the automation of MK classification (Morgan and Keenan, 1973) and other problems associated with the classification of spectra. In addition, the classification of galaxies is examined, including the problems of systematic errors, blended objects, galaxy types, and galaxy clusters.

  18. Vegetation monitoring and classification using NOAA/AVHRR satellite data

    NASA Technical Reports Server (NTRS)

    Greegor, D. H., Jr.; Norwine, J. R.

    1983-01-01

    A vegetation gradient model, based on a new surface hydrologic index and NOAA/AVHRR meteorological satellite data, has been analyzed along a 1300 km east-west transect across the state of Texas. The model was developed to test the potential usefulness of such low-resolution data for vegetation stratification and monitoring. Normalized Difference values (ratio of AVHRR bands 1 and 2, considered to be an index of greenness) were determined and evaluated against climatological and vegetation characteristics at 50 sample locations (regular intervals of 0.25 deg longitude) along the transect on five days in 1980. Statistical treatment of the data indicate that a multivariate model incorporating satellite-measured spectral greenness values and a surface hydrologic factor offer promise as a new technique for regional-scale vegetation stratification and monitoring.

  19. Network-based high level data classification.

    PubMed

    Silva, Thiago Christiano; Zhao, Liang

    2012-06-01

    Traditional supervised data classification considers only physical features (e.g., distance or similarity) of the input data. Here, this type of learning is called low level classification. On the other hand, the human (animal) brain performs both low and high orders of learning and it has facility in identifying patterns according to the semantic meaning of the input data. Data classification that considers not only physical attributes but also the pattern formation is, here, referred to as high level classification. In this paper, we propose a hybrid classification technique that combines both types of learning. The low level term can be implemented by any classification technique, while the high level term is realized by the extraction of features of the underlying network constructed from the input data. Thus, the former classifies the test instances by their physical features or class topologies, while the latter measures the compliance of the test instances to the pattern formation of the data. Our study shows that the proposed technique not only can realize classification according to the pattern formation, but also is able to improve the performance of traditional classification techniques. Furthermore, as the class configuration's complexity increases, such as the mixture among different classes, a larger portion of the high level term is required to get correct classification. This feature confirms that the high level classification has a special importance in complex situations of classification. Finally, we show how the proposed technique can be employed in a real-world application, where it is capable of identifying variations and distortions of handwritten digit images. As a result, it supplies an improvement in the overall pattern recognition rate.

  20. Accuracy assessments and areal estimates using two-phase stratified random sampling, cluster plots, and the multivariate composite estimator

    Treesearch

    Raymond L. Czaplewski

    2000-01-01

    Consider the following example of an accuracy assessment. Landsat data are used to build a thematic map of land cover for a multicounty region. The map classifier (e.g., a supervised classification algorithm) assigns each pixel into one category of land cover. The classification system includes 12 different types of forest and land cover: black spruce, balsam fir,...

  1. Multivariate spline methods in surface fitting

    NASA Technical Reports Server (NTRS)

    Guseman, L. F., Jr. (Principal Investigator); Schumaker, L. L.

    1984-01-01

    The use of spline functions in the development of classification algorithms is examined. In particular, a method is formulated for producing spline approximations to bivariate density functions where the density function is decribed by a histogram of measurements. The resulting approximations are then incorporated into a Bayesiaan classification procedure for which the Bayes decision regions and the probability of misclassification is readily computed. Some preliminary numerical results are presented to illustrate the method.

  2. Assessing the Effectiveness of Statistical Classification Techniques in Predicting Future Employment of Participants in the Temporary Assistance for Needy Families Program

    ERIC Educational Resources Information Center

    Montoya, Isaac D.

    2008-01-01

    Three classification techniques (Chi-square Automatic Interaction Detection [CHAID], Classification and Regression Tree [CART], and discriminant analysis) were tested to determine their accuracy in predicting Temporary Assistance for Needy Families program recipients' future employment. Technique evaluation was based on proportion of correctly…

  3. Classification of the Regional Ionospheric Disturbance Based on Machine Learning Techniques

    NASA Astrophysics Data System (ADS)

    Terzi, Merve Begum; Arikan, Orhan; Karatay, Secil; Arikan, Feza; Gulyaeva, Tamara

    2016-08-01

    In this study, Total Electron Content (TEC) estimated from GPS receivers is used to model the regional and local variability that differs from global activity along with solar and geomagnetic indices. For the automated classification of regional disturbances, a classification technique based on a robust machine learning technique that have found wide spread use, Support Vector Machine (SVM) is proposed. Performance of developed classification technique is demonstrated for midlatitude ionosphere over Anatolia using TEC estimates generated from GPS data provided by Turkish National Permanent GPS Network (TNPGN-Active) for solar maximum year of 2011. As a result of implementing developed classification technique to Global Ionospheric Map (GIM) TEC data, which is provided by the NASA Jet Propulsion Laboratory (JPL), it is shown that SVM can be a suitable learning method to detect anomalies in TEC variations.

  4. Multivariate qualitative analysis of banned additives in food safety using surface enhanced Raman scattering spectroscopy.

    PubMed

    He, Shixuan; Xie, Wanyi; Zhang, Wei; Zhang, Liqun; Wang, Yunxia; Liu, Xiaoling; Liu, Yulong; Du, Chunlei

    2015-02-25

    A novel strategy which combines iteratively cubic spline fitting baseline correction method with discriminant partial least squares qualitative analysis is employed to analyze the surface enhanced Raman scattering (SERS) spectroscopy of banned food additives, such as Sudan I dye and Rhodamine B in food, Malachite green residues in aquaculture fish. Multivariate qualitative analysis methods, using the combination of spectra preprocessing iteratively cubic spline fitting (ICSF) baseline correction with principal component analysis (PCA) and discriminant partial least squares (DPLS) classification respectively, are applied to investigate the effectiveness of SERS spectroscopy for predicting the class assignments of unknown banned food additives. PCA cannot be used to predict the class assignments of unknown samples. However, the DPLS classification can discriminate the class assignment of unknown banned additives using the information of differences in relative intensities. The results demonstrate that SERS spectroscopy combined with ICSF baseline correction method and exploratory analysis methodology DPLS classification can be potentially used for distinguishing the banned food additives in field of food safety. Copyright © 2014 Elsevier B.V. All rights reserved.

  5. Geomorphic Classification and Evaluation of Channel Width and Emergent Sandbar Habitat Relations on the Lower Platte River, Nebraska

    USGS Publications Warehouse

    Elliott, Caroline M.

    2011-01-01

    This report presents a summary of geomorphic characteristics extracted from aerial imagery for three broad segments of the Lower Platte River. This report includes a summary of the longitudinal multivariate classification in Elliott and others (2009) and presents a new analysis of total channel width and habitat variables. Three segments on the lower 102.8 miles of the Lower Platte River are addressed in this report: the Loup River to the Elkhorn River (70 miles long), the Elkhorn River to Salt Creek (6.9 miles long), and Salt Creek to the Missouri River (25.9 miles long). The locations of these segments were determined by the locations of tributaries potentially significant to the hydrology or sediment supply of the Lower Platte River. This report summarizes channel characteristics as mapped from July 2006 aerial imagery including river width, valley width, channel curvature, and in-channel habitat features. In-channel habitat measurements were not made under consistent hydrologic conditions and must be considered general estimates of channel condition in late July 2006. Longitudinal patterns in these features are explored and are summarized in the context of the longitudinal multivariate classification in Elliott and others (2009) for the three Lower Platte River segments. Detailed descriptions of data collection and classification methods are described in Elliott and others (2009). Nesting data for the endangered interior least tern (Sternula antillarum) and threatened piping plover (Charadrius melodus) from 2006 through 2009 are examined within the context of the multivariate classification and Lower Platte River segments. The widest reaches of the Lower Platte River are located in the segment downstream from the Loup River to the Elkhorn River. This segment also has the widest valley and highest degree of braiding of the three segments and many large vegetated islands. The short segment of river between the Elkhorn River and Salt Creek has a fairly low valley width and high channel sinuosities at larger scales. The segment from Salt Creek to the Missouri River has narrow valleys and generally low channel sinuosity. Tern and plover nest sites from 2006 through 2009 in the multi-scale multivariate classification indicated relative nesting selection of cluster 2 reaches among the four-cluster classification and reaches containing clusters 2, 3, and 6 from the seven-cluster classification. These classes, with the exception of cluster 6 are common downstream from the Elkhorn River. Trends in total channel width indicated that reaches dominated by dark vegetation (islands) are the widest on the Lower Platte River. Reaches with high percentages of dry sand and dry sand plus light vegetation were the narrowest reaches. This suggests that narrow channel reaches have sufficient transport capacity to maintain sandbars under recent (2006) flow regimes and are likely to be most amenable to maintaining tern and plover habitat in the Lower Platte River. Further investigations into the dynamics of emergent sandbar habitat and the effects of bank stabilization on in-channel habitats will require the collection and analysis of new data, particularly detailed elevation information and an assessment of existing bank stabilization structures.

  6. Assessment of self-organizing maps to analyze sole-carbon source utilization profiles.

    PubMed

    Leflaive, Joséphine; Céréghino, Régis; Danger, Michaël; Lacroix, Gérard; Ten-Hage, Loïc

    2005-07-01

    The use of community-level physiological profiles obtained with Biolog microplates is widely employed to consider the functional diversity of bacterial communities. Biolog produces a great amount of data which analysis has been the subject of many studies. In most cases, after some transformations, these data were investigated with classical multivariate analyses. Here we provided an alternative to this method, that is the use of an artificial intelligence technique, the Self-Organizing Maps (SOM, unsupervised neural network). We used data from a microcosm study of algae-associated bacterial communities placed in various nutritive conditions. Analyses were carried out on the net absorbances at two incubation times for each substrates and on the chemical guild categorization of the total bacterial activity. Compared to Principal Components Analysis and cluster analysis, SOM appeared as a valuable tool for community classification, and to establish clear relationships between clusters of bacterial communities and sole-carbon sources utilization. Specifically, SOM offered a clear bidimensional projection of a relatively large volume of data and were easier to interpret than plots commonly obtained with multivariate analyses. They would be recommended to pattern the temporal evolution of communities' functional diversity.

  7. On the use of spectra from portable Raman and ATR-IR instruments in synthesis route attribution of a chemical warfare agent by multivariate modeling.

    PubMed

    Wiktelius, Daniel; Ahlinder, Linnea; Larsson, Andreas; Höjer Holmgren, Karin; Norlin, Rikard; Andersson, Per Ola

    2018-08-15

    Collecting data under field conditions for forensic investigations of chemical warfare agents calls for the use of portable instruments. In this study, a set of aged, crude preparations of sulfur mustard were characterized spectroscopically without any sample preparation using handheld Raman and portable IR instruments. The spectral data was used to construct Random Forest multivariate models for the attribution of test set samples to the synthetic method used for their production. Colored and fluorescent samples were included in the study, which made Raman spectroscopy challenging although fluorescence was diminished by using an excitation wavelength of 1064 nm. The predictive power of models constructed with IR or Raman data alone, as well as with combined data was investigated. Both techniques gave useful data for attribution. Model performance was enhanced when Raman and IR spectra were combined, allowing correct classification of 19/23 (83%) of test set spectra. The results demonstrate that data obtained with spectroscopy instruments amenable for field deployment can be useful in forensic studies of chemical warfare agents. Copyright © 2018 Elsevier B.V. All rights reserved.

  8. Development of Raman microspectroscopy for automated detection and imaging of basal cell carcinoma

    NASA Astrophysics Data System (ADS)

    Larraona-Puy, Marta; Ghita, Adrian; Zoladek, Alina; Perkins, William; Varma, Sandeep; Leach, Iain H.; Koloydenko, Alexey A.; Williams, Hywel; Notingher, Ioan

    2009-09-01

    We investigate the potential of Raman microspectroscopy (RMS) for automated evaluation of excised skin tissue during Mohs micrographic surgery (MMS). The main aim is to develop an automated method for imaging and diagnosis of basal cell carcinoma (BCC) regions. Selected Raman bands responsible for the largest spectral differences between BCC and normal skin regions and linear discriminant analysis (LDA) are used to build a multivariate supervised classification model. The model is based on 329 Raman spectra measured on skin tissue obtained from 20 patients. BCC is discriminated from healthy tissue with 90+/-9% sensitivity and 85+/-9% specificity in a 70% to 30% split cross-validation algorithm. This multivariate model is then applied on tissue sections from new patients to image tumor regions. The RMS images show excellent correlation with the gold standard of histopathology sections, BCC being detected in all positive sections. We demonstrate the potential of RMS as an automated objective method for tumor evaluation during MMS. The replacement of current histopathology during MMS by a ``generalization'' of the proposed technique may improve the feasibility and efficacy of MMS, leading to a wider use according to clinical need.

  9. Multivariate pattern analysis of obsessive-compulsive disorder using structural neuroanatomy.

    PubMed

    Hu, Xinyu; Liu, Qi; Li, Bin; Tang, Wanjie; Sun, Huaiqiang; Li, Fei; Yang, Yanchun; Gong, Qiyong; Huang, Xiaoqi

    2016-02-01

    Magnetic resonance imaging (MRI) studies have revealed brain structural abnormalities in obsessive-compulsive disorder (OCD) patients, involving both gray matter (GM) and white matter (WM). However, the results of previous publications were based on average differences between groups, which limited their usages in clinical practice. Therefore, the aim of this study was to examine whether the application of multivariate pattern analysis (MVPA) to high-dimensional structural images would allow accurate discrimination between OCD patients and healthy control subjects (HCS). High-resolution T1-weighted images were acquired from 33 OCD patients and 33 demographically matched HCS in a 3.0 T scanner. Differences in GM and WM volume between OCD and HCS were examined using two types of well-established MVPA techniques: support vector machine (SVM) and Gaussian process classifier (GPC). We also drew a receiver operating characteristic (ROC) curve to evaluate the performance of each classifier. The classification accuracies for both classifiers using GM and WM anatomy were all above 75%. The highest classification accuracy (81.82%, P<0.001) was achieved with the SVM classifier using WM information. Regional brain anomalies with high discriminative power were based on three distributed networks including the fronto-striatal circuit, the temporo-parieto-occipital junction and the cerebellum. Our study illustrated that both GM and WM anatomical features may be useful in differentiating OCD patients from HCS. WM volume using the SVM approach showed the highest accuracy in our population for revealing group differences, which suggested its potential diagnostic role in detecting highly enriched OCD patients at the level of the individual. Copyright © 2015 Elsevier B.V. and ECNP. All rights reserved.

  10. Use of Neuroanatomical Pattern Classification to Identify Subjects in At-Risk Mental States of Psychosis and Predict Disease Transition

    PubMed Central

    Koutsouleris, Nikolaos; Meisenzahl, Eva M.; Davatzikos, Christos; Bottlender, Ronald; Frodl, Thomas; Scheuerecker, Johanna; Schmitt, Gisela; Zetzsche, Thomas; Decker, Petra; Reiser, Maximilian; Möller, Hans-Jürgen; Gaser, Christian

    2014-01-01

    Context Identification of individuals at high risk of developing psychosis has relied on prodromal symptomatology. Recently, machine learning algorithms have been successfully used for magnetic resonance imaging–based diagnostic classification of neuropsychiatric patient populations. Objective To determine whether multivariate neuroanatomical pattern classification facilitates identification of individuals in different at-risk mental states (ARMS) of psychosis and enables the prediction of disease transition at the individual level. Design Multivariate neuroanatomical pattern classification was performed on the structural magnetic resonance imaging data of individuals in early or late ARMS vs healthy controls (HCs). The predictive power of the method was then evaluated by categorizing the baseline imaging data of individuals with transition to psychosis vs those without transition vs HCs after 4 years of clinical follow-up. Classification generalizability was estimated by cross-validation and by categorizing an independent cohort of 45 new HCs. Setting Departments of Psychiatry and Psychotherapy, Ludwig-Maximilians-University, Munich, Germany. Participants The first classification analysis included 20 early and 25 late at-risk individuals and 25 matched HCs. The second analysis consisted of 15 individuals with transition, 18 without transition, and 17 matched HCs. Main Outcome Measures Specificity, sensitivity, and accuracy of classification. Results The 3-group, cross-validated classification accuracies of the first analysis were 86% (HCs vs the rest), 91% (early at-risk individuals vs the rest), and 86% (late at-risk individuals vs the rest). The accuracies in the second analysis were 90% (HCs vs the rest), 88% (individuals with transition vs the rest), and 86% (individuals without transition vs the rest). Independent HCs were correctly classified in 96% (first analysis) and 93% (second analysis) of cases. Conclusions Different ARMSs and their clinical outcomes may be reliably identified on an individual basis by assessing patterns of whole-brain neuroanatomical abnormalities. These patterns may serve as valuable biomarkers for the clinician to guide early detection in the prodromal phase of psychosis. PMID:19581561

  11. Elemental analysis of soils using laser ablation inductively coupled plasma mass spectrometry (LA-ICP-MS) and laser-induced breakdown spectroscopy (LIBS) with multivariate discrimination: tape mounting as an alternative to pellets for small forensic transfer specimens.

    PubMed

    Jantzi, Sarah C; Almirall, José R

    2014-01-01

    Elemental analysis of soil is a useful application of both laser ablation inductively coupled plasma mass spectrometry (LA-ICP-MS) and laser-induced breakdown spectroscopy (LIBS) in geological, agricultural, environmental, archeological, planetary, and forensic sciences. In forensic science, the question to be answered is often whether soil specimens found on objects (e.g., shoes, tires, or tools) originated from the crime scene or other location of interest. Elemental analysis of the soil from the object and the locations of interest results in a characteristic elemental profile of each specimen, consisting of the amount of each element present. Because multiple elements are measured, multivariate statistics can be used to compare the elemental profiles in order to determine whether the specimen from the object is similar to one of the locations of interest. Previous work involved milling and pressing 0.5 g of soil into pellets before analysis using LA-ICP-MS and LIBS. However, forensic examiners prefer techniques that require smaller samples, are less time consuming, and are less destructive, allowing for future analysis by other techniques. An alternative sample introduction method was developed to meet these needs while still providing quantitative results suitable for multivariate comparisons. The tape-mounting method involved deposition of a thin layer of soil onto double-sided adhesive tape. A comparison of tape-mounting and pellet method performance is reported for both LA-ICP-MS and LIBS. Calibration standards and reference materials, prepared using the tape method, were analyzed by LA-ICP-MS and LIBS. As with the pellet method, linear calibration curves were achieved with the tape method, as well as good precision and low bias. Soil specimens from Miami-Dade County were prepared by both the pellet and tape methods and analyzed by LA-ICP-MS and LIBS. Principal components analysis and linear discriminant analysis were applied to the multivariate data. Results from both the tape method and the pellet method were nearly identical, with clear groupings and correct classification rates of >94%.

  12. Analysis techniques for multivariate root loci. [a tool in linear control systems

    NASA Technical Reports Server (NTRS)

    Thompson, P. M.; Stein, G.; Laub, A. J.

    1980-01-01

    Analysis and techniques are developed for the multivariable root locus and the multivariable optimal root locus. The generalized eigenvalue problem is used to compute angles and sensitivities for both types of loci, and an algorithm is presented that determines the asymptotic properties of the optimal root locus.

  13. LSST Astroinformatics And Astrostatistics: Data-oriented Astronomical Research

    NASA Astrophysics Data System (ADS)

    Borne, Kirk D.; Stassun, K.; Brunner, R. J.; Djorgovski, S. G.; Graham, M.; Hakkila, J.; Mahabal, A.; Paegert, M.; Pesenson, M.; Ptak, A.; Scargle, J.; Informatics, LSST; Statistics Team

    2011-01-01

    The LSST Informatics and Statistics Science Collaboration (ISSC) focuses on research and scientific discovery challenges posed by the very large and complex data collection that LSST will generate. Application areas include astroinformatics, machine learning, data mining, astrostatistics, visualization, scientific data semantics, time series analysis, and advanced signal processing. Research problems to be addressed with these methodologies include transient event characterization and classification, rare class discovery, correlation mining, outlier/anomaly/surprise detection, improved estimators (e.g., for photometric redshift or early onset supernova classification), exploration of highly dimensional (multivariate) data catalogs, and more. We present sample science results from these data-oriented approaches to large-data astronomical research. We present results from LSST ISSC team members, including the EB (Eclipsing Binary) Factory, the environmental variations in the fundamental plane of elliptical galaxies, and outlier detection in multivariate catalogs.

  14. Multivariate analysis of volatile compounds detected by headspace solid-phase microextraction/gas chromatography: A tool for sensory classification of cork stoppers.

    PubMed

    Prat, Chantal; Besalú, Emili; Bañeras, Lluís; Anticó, Enriqueta

    2011-06-15

    The volatile fraction of aqueous cork macerates of tainted and non-tainted agglomerate cork stoppers was analysed by headspace solid-phase microextraction (HS-SPME)/gas chromatography. Twenty compounds containing terpenoids, aliphatic alcohols, lignin-related compounds and others were selected and analysed in individual corks. Cork stoppers were previously classified in six different classes according to sensory descriptions including, 2,4,6-trichloroanisole taint and other frequent, non-characteristic odours found in cork. A multivariate analysis of the chromatographic data of 20 selected chemical compounds using linear discriminant analysis models helped in the differentiation of the a priori made groups. The discriminant model selected five compounds as the best combination. Selected compounds appear in the model in the following order; 2,4,6 TCA, fenchyl alcohol, 1-octen-3-ol, benzyl alcohol and benzothiazole. Unfortunately, not all six a priori differentiated sensory classes were clearly discriminated in the model, probably indicating that no measurable differences exist in the chromatographic data for some categories. The predictive analyses of a refined model in which two sensory classes were fused together resulted in a good classification. Prediction rates of control (non-tainted), TCA, musty-earthy-vegetative, vegetative and chemical descriptions were 100%, 100%, 85%, 67.3% and 100%, respectively, when the modified model was used. The multivariate analysis of chromatographic data will help in the classification of stoppers and provide a perfect complement to sensory analyses. Copyright © 2010 Elsevier Ltd. All rights reserved.

  15. Accurate Identification of MCI Patients via Enriched White-Matter Connectivity Network

    NASA Astrophysics Data System (ADS)

    Wee, Chong-Yaw; Yap, Pew-Thian; Brownyke, Jeffery N.; Potter, Guy G.; Steffens, David C.; Welsh-Bohmer, Kathleen; Wang, Lihong; Shen, Dinggang

    Mild cognitive impairment (MCI), often a prodromal phase of Alzheimer's disease (AD), is frequently considered to be a good target for early diagnosis and therapeutic interventions of AD. Recent emergence of reliable network characterization techniques have made understanding neurological disorders at a whole brain connectivity level possible. Accordingly, we propose a network-based multivariate classification algorithm, using a collection of measures derived from white-matter (WM) connectivity networks, to accurately identify MCI patients from normal controls. An enriched description of WM connections, utilizing six physiological parameters, i.e., fiber penetration count, fractional anisotropy (FA), mean diffusivity (MD), and principal diffusivities (λ 1, λ 2, λ 3), results in six connectivity networks for each subject to account for the connection topology and the biophysical properties of the connections. Upon parcellating the brain into 90 regions-of-interest (ROIs), the average statistics of each ROI in relation to the remaining ROIs are extracted as features for classification. These features are then sieved to select the most discriminant subset of features for building an MCI classifier via support vector machines (SVMs). Cross-validation results indicate better diagnostic power of the proposed enriched WM connection description than simple description with any single physiological parameter.

  16. Classification of earth terrain using polarimetric synthetic aperture radar images

    NASA Technical Reports Server (NTRS)

    Lim, H. H.; Swartz, A. A.; Yueh, H. A.; Kong, J. A.; Shin, R. T.; Van Zyl, J. J.

    1989-01-01

    Supervised and unsupervised classification techniques are developed and used to classify the earth terrain components from SAR polarimetric images of San Francisco Bay and Traverse City, Michigan. The supervised techniques include the Bayes classifiers, normalized polarimetric classification, and simple feature classification using discriminates such as the absolute and normalized magnitude response of individual receiver channel returns and the phase difference between receiver channels. An algorithm is developed as an unsupervised technique which classifies terrain elements based on the relationship between the orientation angle and the handedness of the transmitting and receiving polariation states. It is found that supervised classification produces the best results when accurate classifier training data are used, while unsupervised classification may be applied when training data are not available.

  17. Discriminant forest classification method and system

    DOEpatents

    Chen, Barry Y.; Hanley, William G.; Lemmond, Tracy D.; Hiller, Lawrence J.; Knapp, David A.; Mugge, Marshall J.

    2012-11-06

    A hybrid machine learning methodology and system for classification that combines classical random forest (RF) methodology with discriminant analysis (DA) techniques to provide enhanced classification capability. A DA technique which uses feature measurements of an object to predict its class membership, such as linear discriminant analysis (LDA) or Andersen-Bahadur linear discriminant technique (AB), is used to split the data at each node in each of its classification trees to train and grow the trees and the forest. When training is finished, a set of n DA-based decision trees of a discriminant forest is produced for use in predicting the classification of new samples of unknown class.

  18. Unsupervised classification of remote multispectral sensing data

    NASA Technical Reports Server (NTRS)

    Su, M. Y.

    1972-01-01

    The new unsupervised classification technique for classifying multispectral remote sensing data which can be either from the multispectral scanner or digitized color-separation aerial photographs consists of two parts: (a) a sequential statistical clustering which is a one-pass sequential variance analysis and (b) a generalized K-means clustering. In this composite clustering technique, the output of (a) is a set of initial clusters which are input to (b) for further improvement by an iterative scheme. Applications of the technique using an IBM-7094 computer on multispectral data sets over Purdue's Flight Line C-1 and the Yellowstone National Park test site have been accomplished. Comparisons between the classification maps by the unsupervised technique and the supervised maximum liklihood technique indicate that the classification accuracies are in agreement.

  19. Investigating common coding of observed and executed actions in the monkey brain using cross-modal multi-variate fMRI classification.

    PubMed

    Fiave, Prosper Agbesi; Sharma, Saloni; Jastorff, Jan; Nelissen, Koen

    2018-05-19

    Mirror neurons are generally described as a neural substrate hosting shared representations of actions, by simulating or 'mirroring' the actions of others onto the observer's own motor system. Since single neuron recordings are rarely feasible in humans, it has been argued that cross-modal multi-variate pattern analysis (MVPA) of non-invasive fMRI data is a suitable technique to investigate common coding of observed and executed actions, allowing researchers to infer the presence of mirror neurons in the human brain. In an effort to close the gap between monkey electrophysiology and human fMRI data with respect to the mirror neuron system, here we tested this proposal for the first time in the monkey. Rhesus monkeys either performed reach-and-grasp or reach-and-touch motor acts with their right hand in the dark or observed videos of human actors performing similar motor acts. Unimodal decoding showed that both executed or observed motor acts could be decoded from numerous brain regions. Specific portions of rostral parietal, premotor and motor cortices, previously shown to house mirror neurons, in addition to somatosensory regions, yielded significant asymmetric action-specific cross-modal decoding. These results validate the use of cross-modal multi-variate fMRI analyses to probe the representations of own and others' actions in the primate brain and support the proposed mapping of others' actions onto the observer's own motor cortices. Copyright © 2018 Elsevier Inc. All rights reserved.

  20. Is Mitochondrial Donation Germ-Line Gene Therapy? Classifications and Ethical Implications.

    PubMed

    Newson, Ainsley J; Wrigley, Anthony

    2017-01-01

    The classification of techniques used in mitochondrial donation, including their role as purported germ-line gene therapies, is far from clear. These techniques exhibit characteristics typical of a variety of classifications that have been used in both scientific and bioethics scholarship. This raises two connected questions, which we address in this paper: (i) how should we classify mitochondrial donation techniques?; and (ii) what ethical implications surround such a classification? First, we outline how methods of genetic intervention, such as germ-line gene therapy, are typically defined or classified. We then consider whether techniques of mitochondrial donation fit into these, whether they might do so with some refinement of these categories, or whether they require some other approach to classification. To answer the second question, we discuss the relationship between classification and several key ethical issues arising from mitochondrial donation. We conclude that the properties characteristic of mitochondrial inheritance mean that most mitochondrial donation techniques belong to a new sub-class of genetic modification, which we call 'conditionally inheritable genomic modification' (CIGM). © 2017 John Wiley & Sons Ltd.

  1. Functional Groups Based on Leaf Physiology: Are they Spatially and Temporally Robust?

    NASA Technical Reports Server (NTRS)

    Foster, Tammy E.; Brooks, J. Renee

    2004-01-01

    The functional grouping hypothesis, which suggests that complexity in ecosystem function can be simplified by grouping species with similar responses, was tested in the Florida scrub habitat. Functional groups were identified based on how species in fire maintained Florida scrub regulate exchange of carbon and water with the atmosphere as indicated by both instantaneous gas exchange measurements and integrated measures of function (%N, delta C-13, delta N-15, C-N ratio). Using cluster analysis, five distinct physiologically-based functional groups were identified in the fire maintained scrub. These functional groups were tested to determine if they were robust spatially, temporally, and with management regime. Analysis of Similarities (ANOSIM), a non-parametric multivariate analysis, indicated that these five physiologically-based groupings were not altered by plot differences (R = -0.115, p = 0.893) or by the three different management regimes; prescribed burn, mechanically treated and burn, and fire-suppressed (R = 0.018, p = 0.349). The physiological groupings also remained robust between the two climatically different years 1999 and 2000 (R = -0.027, p = 0.725). Easy-to-measure morphological characteristics indicating functional groups would be more practical for scaling and modeling ecosystem processes than detailed gas-exchange measurements, therefore we tested a variety of morphological characteristics as functional indicators. A combination of non-parametric multivariate techniques (Hierarchical cluster analysis, non-metric Multi-Dimensional Scaling, and ANOSIM) were used to compare the ability of life form, leaf thickness, and specific leaf area classifications to identify the physiologically-based functional groups. Life form classifications (ANOSIM; R = 0.629, p 0.001) were able to depict the physiological groupings more adequately than either specific leaf area (ANOSIM; R = 0.426, p = 0.001) or leaf thickness (ANOSIM; R 0.344, p 0.001). The ability of life forms to depict the physiological groupings was improved by separating the parasitic Ximenia americana from the shrub category (ANOSIM; R = 0.794, p = 0.001). Therefore, a life form classification including parasites was determined to be a good indicator of the physiological processes of scrub species, and would be a useful method of grouping for scaling physiological processes to the ecosystem level.

  2. Age-specific discrimination of blood plasma samples of healthy and ovarian cancer prone mice using laser-induced breakdown spectroscopy

    NASA Astrophysics Data System (ADS)

    Melikechi, Noureddine; Markushin, Yuri; Connolly, Denise C.; Lasue, Jeremie; Ewusi-Annan, Ebo; Makrogiannis, Sokratis

    2016-09-01

    Epithelial ovarian cancer (EOC) mortality rates are strongly correlated with the stage at which it is diagnosed. Detection of EOC prior to its dissemination from the site of origin is known to significantly improve the patient outcome. However, there are currently no effective methods for early detection of the most common and lethal subtype of EOC. We sought to determine whether laser-induced breakdown spectroscopy (LIBS) and classification techniques such as linear discriminant analysis (LDA) and random forest (RF) could classify and differentiate blood plasma specimens from transgenic mice with ovarian carcinoma and wild type control mice. Herein we report results using this approach to distinguish blood plasma samples obtained from serially bled (at 8, 12, and 16 weeks) tumor-bearing TgMISIIR-TAg transgenic and wild type cancer-free littermate control mice. We have calculated the age-specific accuracy of classification using 18,000 laser-induced breakdown spectra of the blood plasma samples from tumor-bearing mice and wild type controls. When the analysis is performed in the spectral range 250 nm to 680 nm using LDA, these are 76.7 (± 2.6)%, 71.2 (± 1.3)%, and 73.1 (± 1.4)%, for the 8, 12 and 16 weeks. When the RF classifier is used, we obtain values of 78.5 (± 2.3)%, 76.9 (± 2.1)% and 75.4 (± 2.0)% in the spectral range of 250 nm to 680 nm, and 81.0 (± 1.8)%, 80.4 (± 2.1)% and 79.6 (± 3.5)% in 220 nm to 850 nm. In addition, we report, the positive and negative predictive values of the classification of the two classes of blood plasma samples. The approach used in this study is rapid, requires only 5 μL of blood plasma, and is based on the use of unsupervised and widely accepted multivariate analysis algorithms. These findings suggest that LIBS and multivariate analysis may be a novel approach for detecting EOC.

  3. Classification of forensic autopsy reports through conceptual graph-based document representation model.

    PubMed

    Mujtaba, Ghulam; Shuib, Liyana; Raj, Ram Gopal; Rajandram, Retnagowri; Shaikh, Khairunisa; Al-Garadi, Mohammed Ali

    2018-06-01

    Text categorization has been used extensively in recent years to classify plain-text clinical reports. This study employs text categorization techniques for the classification of open narrative forensic autopsy reports. One of the key steps in text classification is document representation. In document representation, a clinical report is transformed into a format that is suitable for classification. The traditional document representation technique for text categorization is the bag-of-words (BoW) technique. In this study, the traditional BoW technique is ineffective in classifying forensic autopsy reports because it merely extracts frequent but discriminative features from clinical reports. Moreover, this technique fails to capture word inversion, as well as word-level synonymy and polysemy, when classifying autopsy reports. Hence, the BoW technique suffers from low accuracy and low robustness unless it is improved with contextual and application-specific information. To overcome the aforementioned limitations of the BoW technique, this research aims to develop an effective conceptual graph-based document representation (CGDR) technique to classify 1500 forensic autopsy reports from four (4) manners of death (MoD) and sixteen (16) causes of death (CoD). Term-based and Systematized Nomenclature of Medicine-Clinical Terms (SNOMED CT) based conceptual features were extracted and represented through graphs. These features were then used to train a two-level text classifier. The first level classifier was responsible for predicting MoD. In addition, the second level classifier was responsible for predicting CoD using the proposed conceptual graph-based document representation technique. To demonstrate the significance of the proposed technique, its results were compared with those of six (6) state-of-the-art document representation techniques. Lastly, this study compared the effects of one-level classification and two-level classification on the experimental results. The experimental results indicated that the CGDR technique achieved 12% to 15% improvement in accuracy compared with fully automated document representation baseline techniques. Moreover, two-level classification obtained better results compared with one-level classification. The promising results of the proposed conceptual graph-based document representation technique suggest that pathologists can adopt the proposed system as their basis for second opinion, thereby supporting them in effectively determining CoD. Copyright © 2018 Elsevier Inc. All rights reserved.

  4. Multivariate neuroanatomical classification of cognitive subtypes in schizophrenia: A support vector machine learning approach

    PubMed Central

    Gould, Ian C.; Shepherd, Alana M.; Laurens, Kristin R.; Cairns, Murray J.; Carr, Vaughan J.; Green, Melissa J.

    2014-01-01

    Heterogeneity in the structural brain abnormalities associated with schizophrenia has made identification of reliable neuroanatomical markers of the disease difficult. The use of more homogenous clinical phenotypes may improve the accuracy of predicting psychotic disorder/s on the basis of observable brain disturbances. Here we investigate the utility of cognitive subtypes of schizophrenia – ‘cognitive deficit’ and ‘cognitively spared’ – in determining whether multivariate patterns of volumetric brain differences can accurately discriminate these clinical subtypes from healthy controls, and from each other. We applied support vector machine classification to grey- and white-matter volume data from 126 schizophrenia patients previously allocated to the cognitive spared subtype, 74 cognitive deficit schizophrenia patients, and 134 healthy controls. Using this method, cognitive subtypes were distinguished from healthy controls with up to 72% accuracy. Cross-validation analyses between subtypes achieved an accuracy of 71%, suggesting that some common neuroanatomical patterns distinguish both subtypes from healthy controls. Notably, cognitive subtypes were best distinguished from one another when the sample was stratified by sex prior to classification analysis: cognitive subtype classification accuracy was relatively low (<60%) without stratification, and increased to 83% for females with sex stratification. Distinct neuroanatomical patterns predicted cognitive subtype status in each sex: sex-specific multivariate patterns did not predict cognitive subtype status in the other sex above chance, and weight map analyses demonstrated negative correlations between the spatial patterns of weights underlying classification for each sex. These results suggest that in typical mixed-sex samples of schizophrenia patients, the volumetric brain differences between cognitive subtypes are relatively minor in contrast to the large common disease-associated changes. Volumetric differences that distinguish between cognitive subtypes on a case-by-case basis appear to occur in a sex-specific manner that is consistent with previous evidence of disrupted relationships between brain structure and cognition in male, but not female, schizophrenia patients. Consideration of sex-specific differences in brain organization is thus likely to assist future attempts to distinguish subgroups of schizophrenia patients on the basis of neuroanatomical features. PMID:25379435

  5. A comparison of different chemometrics approaches for the robust classification of electronic nose data.

    PubMed

    Gromski, Piotr S; Correa, Elon; Vaughan, Andrew A; Wedge, David C; Turner, Michael L; Goodacre, Royston

    2014-11-01

    Accurate detection of certain chemical vapours is important, as these may be diagnostic for the presence of weapons, drugs of misuse or disease. In order to achieve this, chemical sensors could be deployed remotely. However, the readout from such sensors is a multivariate pattern, and this needs to be interpreted robustly using powerful supervised learning methods. Therefore, in this study, we compared the classification accuracy of four pattern recognition algorithms which include linear discriminant analysis (LDA), partial least squares-discriminant analysis (PLS-DA), random forests (RF) and support vector machines (SVM) which employed four different kernels. For this purpose, we have used electronic nose (e-nose) sensor data (Wedge et al., Sensors Actuators B Chem 143:365-372, 2009). In order to allow direct comparison between our four different algorithms, we employed two model validation procedures based on either 10-fold cross-validation or bootstrapping. The results show that LDA (91.56% accuracy) and SVM with a polynomial kernel (91.66% accuracy) were very effective at analysing these e-nose data. These two models gave superior prediction accuracy, sensitivity and specificity in comparison to the other techniques employed. With respect to the e-nose sensor data studied here, our findings recommend that SVM with a polynomial kernel should be favoured as a classification method over the other statistical models that we assessed. SVM with non-linear kernels have the advantage that they can be used for classifying non-linear as well as linear mapping from analytical data space to multi-group classifications and would thus be a suitable algorithm for the analysis of most e-nose sensor data.

  6. On the effect of experimental noise on the classification of biological samples using Raman micro-spectroscopy

    NASA Astrophysics Data System (ADS)

    Barton, Sinead J.; Kerr, Laura T.; Domijan, Katarina; Hennelly, Bryan M.

    2016-04-01

    Raman micro-spectroscopy is an optoelectronic technique that can be used to evaluate the chemical composition of biological samples and has been shown to be a powerful diagnostic tool for the investigation of various cancer related diseases including bladder, breast, and cervical cancer. Raman scattering is an inherently weak process with approximately 1 in 107 photons undergoing scattering and for this reason, noise from the recording system can have a significant impact on the quality of the signal, and its suitability for diagnostic classification. The main sources of noise in the recorded signal are shot noise, CCD dark current, and CCD readout noise. Shot noise results from the low signal photon count while dark current results from thermally generated electrons in the semiconductor pixels. Both of these noise sources are time dependent; readout noise is time independent but is inherent in each individual recording and results in the fundamental limit of measurement, arising from the internal electronics of the camera. In this paper, each of the aforementioned noise sources are analysed in isolation, and used to experimentally validate a mathematical model. This model is then used to simulate spectra that might be acquired under various experimental conditions including the use of different cameras, different source wavelength, and power etc. Simulated noisy datasets of T24 and RT112 cell line spectra are generated based on true cell Raman spectrum irradiance values (recorded using very long exposure times) and the addition of simulated noise. These datasets are then input to multivariate classification using Principal Components Analysis and Linear Discriminant Analysis. This method enables an investigation into the effect of noise on the sensitivity and specificity of Raman based classification under various experimental conditions and using different equipment.

  7. Improved sparse decomposition based on a smoothed L0 norm using a Laplacian kernel to select features from fMRI data.

    PubMed

    Zhang, Chuncheng; Song, Sutao; Wen, Xiaotong; Yao, Li; Long, Zhiying

    2015-04-30

    Feature selection plays an important role in improving the classification accuracy of multivariate classification techniques in the context of fMRI-based decoding due to the "few samples and large features" nature of functional magnetic resonance imaging (fMRI) data. Recently, several sparse representation methods have been applied to the voxel selection of fMRI data. Despite the low computational efficiency of the sparse representation methods, they still displayed promise for applications that select features from fMRI data. In this study, we proposed the Laplacian smoothed L0 norm (LSL0) approach for feature selection of fMRI data. Based on the fast sparse decomposition using smoothed L0 norm (SL0) (Mohimani, 2007), the LSL0 method used the Laplacian function to approximate the L0 norm of sources. Results of the simulated and real fMRI data demonstrated the feasibility and robustness of LSL0 for the sparse source estimation and feature selection. Simulated results indicated that LSL0 produced more accurate source estimation than SL0 at high noise levels. The classification accuracy using voxels that were selected by LSL0 was higher than that by SL0 in both simulated and real fMRI experiment. Moreover, both LSL0 and SL0 showed higher classification accuracy and required less time than ICA and t-test for the fMRI decoding. LSL0 outperformed SL0 in sparse source estimation at high noise level and in feature selection. Moreover, LSL0 and SL0 showed better performance than ICA and t-test for feature selection. Copyright © 2015 Elsevier B.V. All rights reserved.

  8. Classification of autistic individuals and controls using cross-task characterization of fMRI activity

    PubMed Central

    Chanel, Guillaume; Pichon, Swann; Conty, Laurence; Berthoz, Sylvie; Chevallier, Coralie; Grèzes, Julie

    2015-01-01

    Multivariate pattern analysis (MVPA) has been applied successfully to task-based and resting-based fMRI recordings to investigate which neural markers distinguish individuals with autistic spectrum disorders (ASD) from controls. While most studies have focused on brain connectivity during resting state episodes and regions of interest approaches (ROI), a wealth of task-based fMRI datasets have been acquired in these populations in the last decade. This calls for techniques that can leverage information not only from a single dataset, but from several existing datasets that might share some common features and biomarkers. We propose a fully data-driven (voxel-based) approach that we apply to two different fMRI experiments with social stimuli (faces and bodies). The method, based on Support Vector Machines (SVMs) and Recursive Feature Elimination (RFE), is first trained for each experiment independently and each output is then combined to obtain a final classification output. Second, this RFE output is used to determine which voxels are most often selected for classification to generate maps of significant discriminative activity. Finally, to further explore the clinical validity of the approach, we correlate phenotypic information with obtained classifier scores. The results reveal good classification accuracy (range between 69% and 92.3%). Moreover, we were able to identify discriminative activity patterns pertaining to the social brain without relying on a priori ROI definitions. Finally, social motivation was the only dimension which correlated with classifier scores, suggesting that it is the main dimension captured by the classifiers. Altogether, we believe that the present RFE method proves to be efficient and may help identifying relevant biomarkers by taking advantage of acquired task-based fMRI datasets in psychiatric populations. PMID:26793434

  9. Missing data exploration: highlighting graphical presentation of missing pattern.

    PubMed

    Zhang, Zhongheng

    2015-12-01

    Functions shipped with R base can fulfill many tasks of missing data handling. However, because the data volume of electronic medical record (EMR) system is always very large, more sophisticated methods may be helpful in data management. The article focuses on missing data handling by using advanced techniques. There are three types of missing data, that is, missing completely at random (MCAR), missing at random (MAR) and not missing at random (NMAR). This classification system depends on how missing values are generated. Two packages, Multivariate Imputation by Chained Equations (MICE) and Visualization and Imputation of Missing Values (VIM), provide sophisticated functions to explore missing data pattern. In particular, the VIM package is especially helpful in visual inspection of missing data. Finally, correlation analysis provides information on the dependence of missing data on other variables. Such information is useful in subsequent imputations.

  10. Recursive Partitioning Analysis for New Classification of Patients With Esophageal Cancer Treated by Chemoradiotherapy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nomura, Motoo, E-mail: excell@hkg.odn.ne.jp; Department of Clinical Oncology, Aichi Cancer Center Hospital, Nagoya; Department of Radiation Oncology, Aichi Cancer Center Hospital, Nagoya

    2012-11-01

    Background: The 7th edition of the American Joint Committee on Cancer staging system does not include lymph node size in the guidelines for staging patients with esophageal cancer. The objectives of this study were to determine the prognostic impact of the maximum metastatic lymph node diameter (ND) on survival and to develop and validate a new staging system for patients with esophageal squamous cell cancer who were treated with definitive chemoradiotherapy (CRT). Methods: Information on 402 patients with esophageal cancer undergoing CRT at two institutions was reviewed. Univariate and multivariate analyses of data from one institution were used to assessmore » the impact of clinical factors on survival, and recursive partitioning analysis was performed to develop the new staging classification. To assess its clinical utility, the new classification was validated using data from the second institution. Results: By multivariate analysis, gender, T, N, and ND stages were independently and significantly associated with survival (p < 0.05). The resulting new staging classification was based on the T and ND. The four new stages led to good separation of survival curves in both the developmental and validation datasets (p < 0.05). Conclusions: Our results showed that lymph node size is a strong independent prognostic factor and that the new staging system, which incorporated lymph node size, provided good prognostic power, and discriminated effectively for patients with esophageal cancer undergoing CRT.« less

  11. Metacarpophalangeal pattern profile analysis: useful diagnostic tool for differentiating between dyschondrosteosis, Turner syndrome, and hypochondroplasia.

    PubMed

    Laurencikas, E; Sävendahl, L; Jorulf, H

    2006-06-01

    To assess the value of the metacarpophalangeal pattern profile (MCPP) analysis as a diagnostic tool for differentiating between patients with dyschondrosteosis, Turner syndrome, and hypochondroplasia. Radiographic and clinical data from 135 patients between 1 and 51 years of age were collected and analyzed. The study included 25 patients with hypochondroplasia (HCP), 39 with dyschondrosteosis (LWD), and 71 with Turner syndrome (TS). Hand pattern profiles were calculated and compared with those of 110 normal individuals. Pearson correlation coefficient (r) and multivariate discriminant analysis were used for pattern profile analysis. Pattern variability index, a measure of dysmorphogenesis, was calculated for LWD, TS, HCP, and normal controls. Our results demonstrate that patients with LWD, TS, or HCP have distinct pattern profiles that are significantly different from each other and from those of normal controls. Discriminant analysis yielded correct classification of normal versus abnormal individuals in 84% of cases. Classification of the patients into LWD, TS, and HCP groups was successful in 75%. The correct classification rate was higher (85%) when differentiating two pathological groups at a time. Pattern variability index was not helpful for differential diagnosis of LWD, TS, and HCP. Patients with LWD, TS, or HCP have distinct MCPPs and can be successfully differentiated from each other using advanced MCPP analysis. Discriminant analysis is to be preferred over Pearson correlation coefficient because it is a more sensitive and specific technique. MCPP analysis is a helpful tool for differentiating between syndromes with similar clinical and radiological abnormalities.

  12. Hyperspectral image reconstruction using RGB color for foodborne pathogen detection on agar plates

    NASA Astrophysics Data System (ADS)

    Yoon, Seung-Chul; Shin, Tae-Sung; Park, Bosoon; Lawrence, Kurt C.; Heitschmidt, Gerald W.

    2014-03-01

    This paper reports the latest development of a color vision technique for detecting colonies of foodborne pathogens grown on agar plates with a hyperspectral image classification model that was developed using full hyperspectral data. The hyperspectral classification model depended on reflectance spectra measured in the visible and near-infrared spectral range from 400 and 1,000 nm (473 narrow spectral bands). Multivariate regression methods were used to estimate and predict hyperspectral data from RGB color values. The six representative non-O157 Shiga-toxin producing Eschetichia coli (STEC) serogroups (O26, O45, O103, O111, O121, and O145) were grown on Rainbow agar plates. A line-scan pushbroom hyperspectral image sensor was used to scan 36 agar plates grown with pure STEC colonies at each plate. The 36 hyperspectral images of the agar plates were divided in half to create training and test sets. The mean Rsquared value for hyperspectral image estimation was about 0.98 in the spectral range between 400 and 700 nm for linear, quadratic and cubic polynomial regression models and the detection accuracy of the hyperspectral image classification model with the principal component analysis and k-nearest neighbors for the test set was up to 92% (99% with the original hyperspectral images). Thus, the results of the study suggested that color-based detection may be viable as a multispectral imaging solution without much loss of prediction accuracy compared to hyperspectral imaging.

  13. Multi-parameter machine learning approach to the neuroanatomical basis of developmental dyslexia.

    PubMed

    Płoński, Piotr; Gradkowski, Wojciech; Altarelli, Irene; Monzalvo, Karla; van Ermingen-Marbach, Muna; Grande, Marion; Heim, Stefan; Marchewka, Artur; Bogorodzki, Piotr; Ramus, Franck; Jednoróg, Katarzyna

    2017-02-01

    Despite decades of research, the anatomical abnormalities associated with developmental dyslexia are still not fully described. Studies have focused on between-group comparisons in which different neuroanatomical measures were generally explored in isolation, disregarding potential interactions between regions and measures. Here, for the first time a multivariate classification approach was used to investigate grey matter disruptions in children with dyslexia in a large (N = 236) multisite sample. A variety of cortical morphological features, including volumetric (volume, thickness and area) and geometric (folding index and mean curvature) measures were taken into account and generalizability of classification was assessed with both 10-fold and leave-one-out cross validation (LOOCV) techniques. Classification into control vs. dyslexic subjects achieved above chance accuracy (AUC = 0.66 and ACC = 0.65 in the case of 10-fold CV, and AUC = 0.65 and ACC = 0.64 using LOOCV) after principled feature selection. Features that discriminated between dyslexic and control children were exclusively situated in the left hemisphere including superior and middle temporal gyri, subparietal sulcus and prefrontal areas. They were related to geometric properties of the cortex, with generally higher mean curvature and a greater folding index characterizing the dyslexic group. Our results support the hypothesis that an atypical curvature pattern with extra folds in left hemispheric perisylvian regions characterizes dyslexia. Hum Brain Mapp 38:900-908, 2017. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.

  14. Using machine learning techniques to automate sky survey catalog generation

    NASA Technical Reports Server (NTRS)

    Fayyad, Usama M.; Roden, J. C.; Doyle, R. J.; Weir, Nicholas; Djorgovski, S. G.

    1993-01-01

    We describe the application of machine classification techniques to the development of an automated tool for the reduction of a large scientific data set. The 2nd Palomar Observatory Sky Survey provides comprehensive photographic coverage of the northern celestial hemisphere. The photographic plates are being digitized into images containing on the order of 10(exp 7) galaxies and 10(exp 8) stars. Since the size of this data set precludes manual analysis and classification of objects, our approach is to develop a software system which integrates independently developed techniques for image processing and data classification. Image processing routines are applied to identify and measure features of sky objects. Selected features are used to determine the classification of each object. GID3* and O-BTree, two inductive learning techniques, are used to automatically learn classification decision trees from examples. We describe the techniques used, the details of our specific application, and the initial encouraging results which indicate that our approach is well-suited to the problem. The benefits of the approach are increased data reduction throughput, consistency of classification, and the automated derivation of classification rules that will form an objective, examinable basis for classifying sky objects. Furthermore, astronomers will be freed from the tedium of an intensely visual task to pursue more challenging analysis and interpretation problems given automatically cataloged data.

  15. Comparative forensic soil analysis of New Jersey state parks using a combination of simple techniques with multivariate statistics.

    PubMed

    Bonetti, Jennifer; Quarino, Lawrence

    2014-05-01

    This study has shown that the combination of simple techniques with the use of multivariate statistics offers the potential for the comparative analysis of soil samples. Five samples were obtained from each of twelve state parks across New Jersey in both the summer and fall seasons. Each sample was examined using particle-size distribution, pH analysis in both water and 1 M CaCl2 , and a loss on ignition technique. Data from each of the techniques were combined, and principal component analysis (PCA) and canonical discriminant analysis (CDA) were used for multivariate data transformation. Samples from different locations could be visually differentiated from one another using these multivariate plots. Hold-one-out cross-validation analysis showed error rates as low as 3.33%. Ten blind study samples were analyzed resulting in no misclassifications using Mahalanobis distance calculations and visual examinations of multivariate plots. Seasonal variation was minimal between corresponding samples, suggesting potential success in forensic applications. © 2014 American Academy of Forensic Sciences.

  16. Importance of recurrence rating, morphology, hernial gap size, and risk factors in ventral and incisional hernia classification.

    PubMed

    Dietz, U A; Winkler, M S; Härtel, R W; Fleischhacker, A; Wiegering, A; Isbert, C; Jurowich, Ch; Heuschmann, P; Germer, C-T

    2014-02-01

    There is limited evidence on the natural course of ventral and incisional hernias and the results of hernia repair, what might partially be explained by the lack of an accepted classification system. The aim of the present study is to investigate the association of the criteria included in the Wuerzburg classification system of ventral and incisional hernias with postoperative complications and long-term recurrence. In a retrospective cohort study, the data on 330 consecutive patients who underwent surgery to repair ventral and incisional hernias were analyzed. The following four classification criteria were applied: (a) recurrence rating (ventral, incisional or incisional recurrent); (b) morphology (location); (c) size of the hernial gap; and (d) risk factors. The primary endpoint was the occurrence of a recurrence during follow-up. Secondary endpoints were incidence of postoperative complications. Independent association between classification criteria, type of surgical procedures and postoperative complications was calculated by multivariate logistic regression analysis and between classification criteria, type of surgical procedures and risk of long-term recurrence by Cox regression analysis. Follow-up lasted a mean 47.7 ± 23.53 months (median 45 months) or 3.9 ± 1.96 years. The criterion "recurrence rating" was found as predictive factor for postoperative complications in the multivariate analysis (OR 2.04; 95 % CI 1.09-3.84; incisional vs. ventral hernia). The criterion "morphology" had influence neither on the incidence of the critical event "recurrence during follow-up" nor on the incidence of postoperative complications. Hernial gap "width" predicted postoperative complications in the multivariate analysis (OR 1.98; 95 % CI 1.19-3.29; ≤5 vs. >5 cm). Length of the hernial gap was found to be an independent prognostic factor for the critical event "recurrence during follow-up" (HR 2.05; 95 % CI 1.25-3.37; ≤5 vs. >5 cm). The presence of 3 or more risk factors was a consistent predictor for "recurrence during follow-up" (HR 2.25; 95 % CI 1.28-9.92). Mesh repair was an independent protective factor for "recurrence during follow-up" compared to suture (HR 0.53; 95 % CI 0.32-0.86). The ventral and incisional hernia classification of Dietz et al. employs a clinically proven terminology and has an open classification structure. Hernial gap size and the number of risk factors are independent predictors for "recurrence during follow-up", whereas recurrence rating and hernial gap size correlated significantly with the incidence of postoperative complications. We propose the application of these criteria for future clinical research, as larger patient numbers will be needed to refine the results.

  17. Fast discrimination of hydroxypropyl methyl cellulose using portable Raman spectrometer and multivariate methods

    NASA Astrophysics Data System (ADS)

    Song, Biao; Lu, Dan; Peng, Ming; Li, Xia; Zou, Ye; Huang, Meizhen; Lu, Feng

    2017-02-01

    Raman spectroscopy is developed as a fast and non-destructive method for the discrimination and classification of hydroxypropyl methyl cellulose (HPMC) samples. 44 E series and 41 K series of HPMC samples are measured by a self-developed portable Raman spectrometer (Hx-Raman) which is excited by a 785 nm diode laser and the spectrum range is 200-2700 cm-1 with a resolution (FWHM) of 6 cm-1. Multivariate analysis is applied for discrimination of E series from K series. By methods of principal components analysis (PCA) and Fisher discriminant analysis (FDA), a discrimination result with sensitivity of 90.91% and specificity of 95.12% is achieved. The corresponding receiver operating characteristic (ROC) is 0.99, indicting the accuracy of the predictive model. This result demonstrates the prospect of portable Raman spectrometer for rapid, non-destructive classification and discrimination of E series and K series samples of HPMC.

  18. "Relative CIR": an image enhancement and visualization technique

    USGS Publications Warehouse

    Fleming, Michael D.

    1993-01-01

    Many techniques exist to spectrally and spatially enhance digital multispectral scanner data. One technique enhances an image while keeping the colors as they would appear in a color-infrared (CIR) image. This "relative CIR" technique generates an image that is both spectrally and spatially enhanced, while displaying a maximum range of colors. The technique enables an interpreter to visualize either spectral or land cover classes by their relative CIR characteristics. A relative CIR image is generated by developed spectral statistics for each class in the classifications and then, using a nonparametric approach for spectral enhancement, the means of the classes for each band are ranked. A 3 by 3 pixel smoothing filter is applied to the classification for spatial enhancement and the classes are mapped to the representative rank for each band. Practical applications of the technique include displaying an image classification product as a CIR image that was not derived directly from a spectral image, visualizing how a land cover classification would look as a CIR image, and displaying a spectral classification or intermediate product that will be used to label spectral classes.

  19. A machine learning approach to identify functional biomarkers in human prefrontal cortex for individuals with traumatic brain injury using functional near-infrared spectroscopy.

    PubMed

    Karamzadeh, Nader; Amyot, Franck; Kenney, Kimbra; Anderson, Afrouz; Chowdhry, Fatima; Dashtestani, Hadis; Wassermann, Eric M; Chernomordik, Victor; Boccara, Claude; Wegman, Edward; Diaz-Arrastia, Ramon; Gandjbakhche, Amir H

    2016-11-01

    We have explored the potential prefrontal hemodynamic biomarkers to characterize subjects with Traumatic Brain Injury (TBI) by employing the multivariate machine learning approach and introducing a novel task-related hemodynamic response detection followed by a heuristic search for optimum set of hemodynamic features. To achieve this goal, the hemodynamic response from a group of 31 healthy controls and 30 chronic TBI subjects were recorded as they performed a complexity task. To determine the optimum hemodynamic features, we considered 11 features and their combinations in characterizing TBI subjects. We investigated the significance of the features by utilizing a machine learning classification algorithm to score all the possible combinations of features according to their predictive power. The identified optimum feature elements resulted in classification accuracy, sensitivity, and specificity of 85%, 85%, and 84%, respectively. Classification improvement was achieved for TBI subject classification through feature combination. It signified the major advantage of the multivariate analysis over the commonly used univariate analysis suggesting that the features that are individually irrelevant in characterizing the data may become relevant when used in combination. We also conducted a spatio-temporal classification to identify regions within the prefrontal cortex (PFC) that contribute in distinguishing between TBI and healthy subjects. As expected, Brodmann areas (BA) 10 within the PFC were isolated as the region that healthy subjects (unlike subjects with TBI), showed major hemodynamic activity in response to the High Complexity task. Overall, our results indicate that identified temporal and spatio-temporal features from PFC's hemodynamic activity are promising biomarkers in classifying subjects with TBI.

  20. Control-group feature normalization for multivariate pattern analysis of structural MRI data using the support vector machine.

    PubMed

    Linn, Kristin A; Gaonkar, Bilwaj; Satterthwaite, Theodore D; Doshi, Jimit; Davatzikos, Christos; Shinohara, Russell T

    2016-05-15

    Normalization of feature vector values is a common practice in machine learning. Generally, each feature value is standardized to the unit hypercube or by normalizing to zero mean and unit variance. Classification decisions based on support vector machines (SVMs) or by other methods are sensitive to the specific normalization used on the features. In the context of multivariate pattern analysis using neuroimaging data, standardization effectively up- and down-weights features based on their individual variability. Since the standard approach uses the entire data set to guide the normalization, it utilizes the total variability of these features. This total variation is inevitably dependent on the amount of marginal separation between groups. Thus, such a normalization may attenuate the separability of the data in high dimensional space. In this work we propose an alternate approach that uses an estimate of the control-group standard deviation to normalize features before training. We study our proposed approach in the context of group classification using structural MRI data. We show that control-based normalization leads to better reproducibility of estimated multivariate disease patterns and improves the classifier performance in many cases. Copyright © 2016 Elsevier Inc. All rights reserved.

  1. Exploring the CAESAR database using dimensionality reduction techniques

    NASA Astrophysics Data System (ADS)

    Mendoza-Schrock, Olga; Raymer, Michael L.

    2012-06-01

    The Civilian American and European Surface Anthropometry Resource (CAESAR) database containing over 40 anthropometric measurements on over 4000 humans has been extensively explored for pattern recognition and classification purposes using the raw, original data [1-4]. However, some of the anthropometric variables would be impossible to collect in an uncontrolled environment. Here, we explore the use of dimensionality reduction methods in concert with a variety of classification algorithms for gender classification using only those variables that are readily observable in an uncontrolled environment. Several dimensionality reduction techniques are employed to learn the underlining structure of the data. These techniques include linear projections such as the classical Principal Components Analysis (PCA) and non-linear (manifold learning) techniques, such as Diffusion Maps and the Isomap technique. This paper briefly describes all three techniques, and compares three different classifiers, Naïve Bayes, Adaboost, and Support Vector Machines (SVM), for gender classification in conjunction with each of these three dimensionality reduction approaches.

  2. Objective classification of ecological status in marine water bodies using ecotoxicological information and multivariate analysis.

    PubMed

    Beiras, Ricardo; Durán, Iria

    2014-12-01

    Some relevant shortcomings have been identified in the current approach for the classification of ecological status in marine water bodies, leading to delays in the fulfillment of the Water Framework Directive objectives. Natural variability makes difficult to settle fixed reference values and boundary values for the Ecological Quality Ratios (EQR) for the biological quality elements. Biological responses to environmental degradation are frequently of nonmonotonic nature, hampering the EQR approach. Community structure traits respond only once ecological damage has already been done and do not provide early warning signals. An alternative methodology for the classification of ecological status integrating chemical measurements, ecotoxicological bioassays and community structure traits (species richness and diversity), and using multivariate analyses (multidimensional scaling and cluster analysis), is proposed. This approach does not depend on the arbitrary definition of fixed reference values and EQR boundary values, and it is suitable to integrate nonlinear, sensitive signals of ecological degradation. As a disadvantage, this approach demands the inclusion of sampling sites representing the full range of ecological status in each monitoring campaign. National or international agencies in charge of coastal pollution monitoring have comprehensive data sets available to overcome this limitation.

  3. Workshop on Algorithms for Time-Series Analysis

    NASA Astrophysics Data System (ADS)

    Protopapas, Pavlos

    2012-04-01

    abstract-type="normal">SummaryThis Workshop covered the four major subjects listed below in two 90-minute sessions. Each talk or tutorial allowed questions, and concluded with a discussion. Classification: Automatic classification using machine-learning methods is becoming a standard in surveys that generate large datasets. Ashish Mahabal (Caltech) reviewed various methods, and presented examples of several applications. Time-Series Modelling: Suzanne Aigrain (Oxford University) discussed autoregressive models and multivariate approaches such as Gaussian Processes. Meta-classification/mixture of expert models: Karim Pichara (Pontificia Universidad Católica, Chile) described the substantial promise which machine-learning classification methods are now showing in automatic classification, and discussed how the various methods can be combined together. Event Detection: Pavlos Protopapas (Harvard) addressed methods of fast identification of events with low signal-to-noise ratios, enlarging on the characterization and statistical issues of low signal-to-noise ratios and rare events.

  4. Explaining Match Outcome During The Men’s Basketball Tournament at The Olympic Games

    PubMed Central

    Leicht, Anthony S.; Gómez, Miguel A.; Woods, Carl T.

    2017-01-01

    In preparation for the Olympics, there is a limited opportunity for coaches and athletes to interact regularly with team performance indicators providing important guidance to coaches for enhanced match success at the elite level. This study examined the relationship between match outcome and team performance indicators during men’s basketball tournaments at the Olympic Games. Twelve team performance indicators were collated from all men’s teams and matches during the basketball tournament of the 2004-2016 Olympic Games (n = 156). Linear and non-linear analyses examined the relationship between match outcome and team performance indicator characteristics; namely, binary logistic regression and a conditional interference (CI) classification tree. The most parsimonious logistic regression model retained ‘assists’, ‘defensive rebounds’, ‘field-goal percentage’, ‘fouls’, ‘fouls against’, ‘steals’ and ‘turnovers’ (delta AIC <0.01; Akaike weight = 0.28) with a classification accuracy of 85.5%. Conversely, four performance indicators were retained with the CI classification tree with an average classification accuracy of 81.4%. However, it was the combination of ‘field-goal percentage’ and ‘defensive rebounds’ that provided the greatest probability of winning (93.2%). Match outcome during the men’s basketball tournaments at the Olympic Games was identified by a unique combination of performance indicators. Despite the average model accuracy being marginally higher for the logistic regression analysis, the CI classification tree offered a greater practical utility for coaches through its resolution of non-linear phenomena to guide team success. Key points A unique combination of team performance indicators explained 93.2% of winning observations in men’s basketball at the Olympics. Monitoring of these team performance indicators may provide coaches with the capability to devise multiple game plans or strategies to enhance their likelihood of winning. Incorporation of machine learning techniques with team performance indicators may provide a valuable and strategic approach to explain patterns within multivariate datasets in sport science. PMID:29238245

  5. Gene masking - a technique to improve accuracy for cancer classification with high dimensionality in microarray data.

    PubMed

    Saini, Harsh; Lal, Sunil Pranit; Naidu, Vimal Vikash; Pickering, Vincel Wince; Singh, Gurmeet; Tsunoda, Tatsuhiko; Sharma, Alok

    2016-12-05

    High dimensional feature space generally degrades classification in several applications. In this paper, we propose a strategy called gene masking, in which non-contributing dimensions are heuristically removed from the data to improve classification accuracy. Gene masking is implemented via a binary encoded genetic algorithm that can be integrated seamlessly with classifiers during the training phase of classification to perform feature selection. It can also be used to discriminate between features that contribute most to the classification, thereby, allowing researchers to isolate features that may have special significance. This technique was applied on publicly available datasets whereby it substantially reduced the number of features used for classification while maintaining high accuracies. The proposed technique can be extremely useful in feature selection as it heuristically removes non-contributing features to improve the performance of classifiers.

  6. Decoding the Traumatic Memory among Women with PTSD: Implications for Neurocircuitry Models of PTSD and Real-Time fMRI Neurofeedback

    PubMed Central

    Cisler, Josh M.; Bush, Keith; James, G. Andrew; Smitherman, Sonet; Kilts, Clinton D.

    2015-01-01

    Posttraumatic Stress Disorder (PTSD) is characterized by intrusive recall of the traumatic memory. While numerous studies have investigated the neural processing mechanisms engaged during trauma memory recall in PTSD, these analyses have only focused on group-level contrasts that reveal little about the predictive validity of the identified brain regions. By contrast, a multivariate pattern analysis (MVPA) approach towards identifying the neural mechanisms engaged during trauma memory recall would entail testing whether a multivariate set of brain regions is reliably predictive of (i.e., discriminates) whether an individual is engaging in trauma or non-trauma memory recall. Here, we use a MVPA approach to test 1) whether trauma memory vs neutral memory recall can be predicted reliably using a multivariate set of brain regions among women with PTSD related to assaultive violence exposure (N=16), 2) the methodological parameters (e.g., spatial smoothing, number of memory recall repetitions, etc.) that optimize classification accuracy and reproducibility of the feature weight spatial maps, and 3) the correspondence between brain regions that discriminate trauma memory recall and the brain regions predicted by neurocircuitry models of PTSD. Cross-validation classification accuracy was significantly above chance for all methodological permutations tested; mean accuracy across participants was 76% for the methodological parameters selected as optimal for both efficiency and accuracy. Classification accuracy was significantly better for a voxel-wise approach relative to voxels within restricted regions-of-interest (ROIs); classification accuracy did not differ when using PTSD-related ROIs compared to randomly generated ROIs. ROI-based analyses suggested the reliable involvement of the left hippocampus in discriminating memory recall across participants and that the contribution of the left amygdala to the decision function was dependent upon PTSD symptom severity. These results have methodological implications for real-time fMRI neurofeedback of the trauma memory in PTSD and conceptual implications for neurocircuitry models of PTSD that attempt to explain core neural processing mechanisms mediating PTSD. PMID:26241958

  7. Decoding the Traumatic Memory among Women with PTSD: Implications for Neurocircuitry Models of PTSD and Real-Time fMRI Neurofeedback.

    PubMed

    Cisler, Josh M; Bush, Keith; James, G Andrew; Smitherman, Sonet; Kilts, Clinton D

    2015-01-01

    Posttraumatic Stress Disorder (PTSD) is characterized by intrusive recall of the traumatic memory. While numerous studies have investigated the neural processing mechanisms engaged during trauma memory recall in PTSD, these analyses have only focused on group-level contrasts that reveal little about the predictive validity of the identified brain regions. By contrast, a multivariate pattern analysis (MVPA) approach towards identifying the neural mechanisms engaged during trauma memory recall would entail testing whether a multivariate set of brain regions is reliably predictive of (i.e., discriminates) whether an individual is engaging in trauma or non-trauma memory recall. Here, we use a MVPA approach to test 1) whether trauma memory vs neutral memory recall can be predicted reliably using a multivariate set of brain regions among women with PTSD related to assaultive violence exposure (N=16), 2) the methodological parameters (e.g., spatial smoothing, number of memory recall repetitions, etc.) that optimize classification accuracy and reproducibility of the feature weight spatial maps, and 3) the correspondence between brain regions that discriminate trauma memory recall and the brain regions predicted by neurocircuitry models of PTSD. Cross-validation classification accuracy was significantly above chance for all methodological permutations tested; mean accuracy across participants was 76% for the methodological parameters selected as optimal for both efficiency and accuracy. Classification accuracy was significantly better for a voxel-wise approach relative to voxels within restricted regions-of-interest (ROIs); classification accuracy did not differ when using PTSD-related ROIs compared to randomly generated ROIs. ROI-based analyses suggested the reliable involvement of the left hippocampus in discriminating memory recall across participants and that the contribution of the left amygdala to the decision function was dependent upon PTSD symptom severity. These results have methodological implications for real-time fMRI neurofeedback of the trauma memory in PTSD and conceptual implications for neurocircuitry models of PTSD that attempt to explain core neural processing mechanisms mediating PTSD.

  8. Electroencephalogram-based decoding cognitive states using convolutional neural network and likelihood ratio based score fusion.

    PubMed

    Zafar, Raheel; Dass, Sarat C; Malik, Aamir Saeed

    2017-01-01

    Electroencephalogram (EEG)-based decoding human brain activity is challenging, owing to the low spatial resolution of EEG. However, EEG is an important technique, especially for brain-computer interface applications. In this study, a novel algorithm is proposed to decode brain activity associated with different types of images. In this hybrid algorithm, convolutional neural network is modified for the extraction of features, a t-test is used for the selection of significant features and likelihood ratio-based score fusion is used for the prediction of brain activity. The proposed algorithm takes input data from multichannel EEG time-series, which is also known as multivariate pattern analysis. Comprehensive analysis was conducted using data from 30 participants. The results from the proposed method are compared with current recognized feature extraction and classification/prediction techniques. The wavelet transform-support vector machine method is the most popular currently used feature extraction and prediction method. This method showed an accuracy of 65.7%. However, the proposed method predicts the novel data with improved accuracy of 79.9%. In conclusion, the proposed algorithm outperformed the current feature extraction and prediction method.

  9. Accuracy assessment of vegetation community maps generated by aerial photography interpretation: perspective from the tropical savanna, Australia

    NASA Astrophysics Data System (ADS)

    Lewis, Donna L.; Phinn, Stuart

    2011-01-01

    Aerial photography interpretation is the most common mapping technique in the world. However, unlike an algorithm-based classification of satellite imagery, accuracy of aerial photography interpretation generated maps is rarely assessed. Vegetation communities covering an area of 530 km2 on Bullo River Station, Northern Territory, Australia, were mapped using an interpretation of 1:50,000 color aerial photography. Manual stereoscopic line-work was delineated at 1:10,000 and thematic maps generated at 1:25,000 and 1:100,000. Multivariate and intuitive analysis techniques were employed to identify 22 vegetation communities within the study area. The accuracy assessment was based on 50% of a field dataset collected over a 4 year period (2006 to 2009) and the remaining 50% of sites were used for map attribution. The overall accuracy and Kappa coefficient for both thematic maps was 66.67% and 0.63, respectively, calculated from standard error matrices. Our findings highlight the need for appropriate scales of mapping and accuracy assessment of aerial photography interpretation generated vegetation community maps.

  10. Noninvasive prostate cancer screening based on serum surface-enhanced Raman spectroscopy and support vector machine

    NASA Astrophysics Data System (ADS)

    Li, Shaoxin; Zhang, Yanjiao; Xu, Junfa; Li, Linfang; Zeng, Qiuyao; Lin, Lin; Guo, Zhouyi; Liu, Zhiming; Xiong, Honglian; Liu, Songhao

    2014-09-01

    This study aims to present a noninvasive prostate cancer screening methods using serum surface-enhanced Raman scattering (SERS) and support vector machine (SVM) techniques through peripheral blood sample. SERS measurements are performed using serum samples from 93 prostate cancer patients and 68 healthy volunteers by silver nanoparticles. Three types of kernel functions including linear, polynomial, and Gaussian radial basis function (RBF) are employed to build SVM diagnostic models for classifying measured SERS spectra. For comparably evaluating the performance of SVM classification models, the standard multivariate statistic analysis method of principal component analysis (PCA) is also applied to classify the same datasets. The study results show that for the RBF kernel SVM diagnostic model, the diagnostic accuracy of 98.1% is acquired, which is superior to the results of 91.3% obtained from PCA methods. The receiver operating characteristic curve of diagnostic models further confirm above research results. This study demonstrates that label-free serum SERS analysis technique combined with SVM diagnostic algorithm has great potential for noninvasive prostate cancer screening.

  11. A Review of Mid-Infrared and Near-Infrared Imaging: Principles, Concepts and Applications in Plant Tissue Analysis.

    PubMed

    Türker-Kaya, Sevgi; Huck, Christian W

    2017-01-20

    Plant cells, tissues and organs are composed of various biomolecules arranged as structurally diverse units, which represent heterogeneity at microscopic levels. Molecular knowledge about those constituents with their localization in such complexity is very crucial for both basic and applied plant sciences. In this context, infrared imaging techniques have advantages over conventional methods to investigate heterogeneous plant structures in providing quantitative and qualitative analyses with spatial distribution of the components. Thus, particularly, with the use of proper analytical approaches and sampling methods, these technologies offer significant information for the studies on plant classification, physiology, ecology, genetics, pathology and other related disciplines. This review aims to present a general perspective about near-infrared and mid-infrared imaging/microspectroscopy in plant research. It is addressed to compare potentialities of these methodologies with their advantages and limitations. With regard to the organization of the document, the first section will introduce the respective underlying principles followed by instrumentation, sampling techniques, sample preparations, measurement, and an overview of spectral pre-processing and multivariate analysis. The last section will review selected applications in the literature.

  12. Hyperspectral imaging using a color camera and its application for pathogen detection

    NASA Astrophysics Data System (ADS)

    Yoon, Seung-Chul; Shin, Tae-Sung; Heitschmidt, Gerald W.; Lawrence, Kurt C.; Park, Bosoon; Gamble, Gary

    2015-02-01

    This paper reports the results of a feasibility study for the development of a hyperspectral image recovery (reconstruction) technique using a RGB color camera and regression analysis in order to detect and classify colonies of foodborne pathogens. The target bacterial pathogens were the six representative non-O157 Shiga-toxin producing Escherichia coli (STEC) serogroups (O26, O45, O103, O111, O121, and O145) grown in Petri dishes of Rainbow agar. The purpose of the feasibility study was to evaluate whether a DSLR camera (Nikon D700) could be used to predict hyperspectral images in the wavelength range from 400 to 1,000 nm and even to predict the types of pathogens using a hyperspectral STEC classification algorithm that was previously developed. Unlike many other studies using color charts with known and noise-free spectra for training reconstruction models, this work used hyperspectral and color images, separately measured by a hyperspectral imaging spectrometer and the DSLR color camera. The color images were calibrated (i.e. normalized) to relative reflectance, subsampled and spatially registered to match with counterpart pixels in hyperspectral images that were also calibrated to relative reflectance. Polynomial multivariate least-squares regression (PMLR) was previously developed with simulated color images. In this study, partial least squares regression (PLSR) was also evaluated as a spectral recovery technique to minimize multicollinearity and overfitting. The two spectral recovery models (PMLR and PLSR) and their parameters were evaluated by cross-validation. The QR decomposition was used to find a numerically more stable solution of the regression equation. The preliminary results showed that PLSR was more effective especially with higher order polynomial regressions than PMLR. The best classification accuracy measured with an independent test set was about 90%. The results suggest the potential of cost-effective color imaging using hyperspectral image classification algorithms for rapidly differentiating pathogens in agar plates.

  13. Design of multivariable feedback control systems via spectral assignment. [as applied to aircraft flight control

    NASA Technical Reports Server (NTRS)

    Liberty, S. R.; Mielke, R. R.; Tung, L. J.

    1981-01-01

    Applied research in the area of spectral assignment in multivariable systems is reported. A frequency domain technique for determining the set of all stabilizing controllers for a single feedback loop multivariable system is described. It is shown that decoupling and tracking are achievable using this procedure. The technique is illustrated with a simple example.

  14. Comparison on three classification techniques for sex estimation from the bone length of Asian children below 19 years old: an analysis using different group of ages.

    PubMed

    Darmawan, M F; Yusuf, Suhaila M; Kadir, M R Abdul; Haron, H

    2015-02-01

    Sex estimation is used in forensic anthropology to assist the identification of individual remains. However, the estimation techniques tend to be unique and applicable only to a certain population. This paper analyzed sex estimation on living individual child below 19 years old using the length of 19 bones of left hand applied for three classification techniques, which were Discriminant Function Analysis (DFA), Support Vector Machine (SVM) and Artificial Neural Network (ANN) multilayer perceptron. These techniques were carried out on X-ray images of the left hand taken from an Asian population data set. All the 19 bones of the left hand were measured using Free Image software, and all the techniques were performed using MATLAB. The group of age "16-19" years old and "7-9" years old were the groups that could be used for sex estimation with as their average of accuracy percentage was above 80%. ANN model was the best classification technique with the highest average of accuracy percentage in the two groups of age compared to other classification techniques. The results show that each classification technique has the best accuracy percentage on each different group of age. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  15. Classification Techniques for Digital Map Compression

    DTIC Science & Technology

    1989-03-01

    classification improved the performance of the K-means classification algorithm resulting in a compression of 8.06:1 with Lempel - Ziv coding. Run-length coding... compression performance are run-length coding [2], [8] and Lempel - Ziv coding 110], [11]. These techniques are chosen because they are most efficient when...investigated. After the classification, some standard file compression methods, such as Lempel - Ziv and run-length encoding were applied to the

  16. Multivariate classification with random forests for gravitational wave searches of black hole binary coalescence

    NASA Astrophysics Data System (ADS)

    Baker, Paul T.; Caudill, Sarah; Hodge, Kari A.; Talukder, Dipongkar; Capano, Collin; Cornish, Neil J.

    2015-03-01

    Searches for gravitational waves produced by coalescing black hole binaries with total masses ≳25 M⊙ use matched filtering with templates of short duration. Non-Gaussian noise bursts in gravitational wave detector data can mimic short signals and limit the sensitivity of these searches. Previous searches have relied on empirically designed statistics incorporating signal-to-noise ratio and signal-based vetoes to separate gravitational wave candidates from noise candidates. We report on sensitivity improvements achieved using a multivariate candidate ranking statistic derived from a supervised machine learning algorithm. We apply the random forest of bagged decision trees technique to two separate searches in the high mass (≳25 M⊙ ) parameter space. For a search which is sensitive to gravitational waves from the inspiral, merger, and ringdown of binary black holes with total mass between 25 M⊙ and 100 M⊙ , we find sensitive volume improvements as high as 70±13%-109±11% when compared to the previously used ranking statistic. For a ringdown-only search which is sensitive to gravitational waves from the resultant perturbed intermediate mass black hole with mass roughly between 10 M⊙ and 600 M⊙ , we find sensitive volume improvements as high as 61±4%-241±12% when compared to the previously used ranking statistic. We also report how sensitivity improvements can differ depending on mass regime, mass ratio, and available data quality information. Finally, we describe the techniques used to tune and train the random forest classifier that can be generalized to its use in other searches for gravitational waves.

  17. The composite sequential clustering technique for analysis of multispectral scanner data

    NASA Technical Reports Server (NTRS)

    Su, M. Y.

    1972-01-01

    The clustering technique consists of two parts: (1) a sequential statistical clustering which is essentially a sequential variance analysis, and (2) a generalized K-means clustering. In this composite clustering technique, the output of (1) is a set of initial clusters which are input to (2) for further improvement by an iterative scheme. This unsupervised composite technique was employed for automatic classification of two sets of remote multispectral earth resource observations. The classification accuracy by the unsupervised technique is found to be comparable to that by traditional supervised maximum likelihood classification techniques. The mathematical algorithms for the composite sequential clustering program and a detailed computer program description with job setup are given.

  18. Comparisons of neural networks to standard techniques for image classification and correlation

    NASA Technical Reports Server (NTRS)

    Paola, Justin D.; Schowengerdt, Robert A.

    1994-01-01

    Neural network techniques for multispectral image classification and spatial pattern detection are compared to the standard techniques of maximum-likelihood classification and spatial correlation. The neural network produced a more accurate classification than maximum-likelihood of a Landsat scene of Tucson, Arizona. Some of the errors in the maximum-likelihood classification are illustrated using decision region and class probability density plots. As expected, the main drawback to the neural network method is the long time required for the training stage. The network was trained using several different hidden layer sizes to optimize both the classification accuracy and training speed, and it was found that one node per class was optimal. The performance improved when 3x3 local windows of image data were entered into the net. This modification introduces texture into the classification without explicit calculation of a texture measure. Larger windows were successfully used for the detection of spatial features in Landsat and Magellan synthetic aperture radar imagery.

  19. Development of Pattern Recognition Techniques for the Evaluation of Toxicant Impacts to Multispecies Systems

    DTIC Science & Technology

    1993-06-18

    the exception. In the Standardized Aquatic Microcosm and the Mixed Flask Culture (MFC) microcosms, multivariate analysis and clustering methods...rule rather than the exception. In the Standardized Aquatic Microcosm and the Mixed Flask Culture (MFC) microcosms, multivariate analysis and...experiments using two microcosm protocols. We use nonmetric clustering, a multivariate pattern recognition technique developed by Matthews and Heame (1991

  20. Ensemble of sparse classifiers for high-dimensional biological data.

    PubMed

    Kim, Sunghan; Scalzo, Fabien; Telesca, Donatello; Hu, Xiao

    2015-01-01

    Biological data are often high in dimension while the number of samples is small. In such cases, the performance of classification can be improved by reducing the dimension of data, which is referred to as feature selection. Recently, a novel feature selection method has been proposed utilising the sparsity of high-dimensional biological data where a small subset of features accounts for most variance of the dataset. In this study we propose a new classification method for high-dimensional biological data, which performs both feature selection and classification within a single framework. Our proposed method utilises a sparse linear solution technique and the bootstrap aggregating algorithm. We tested its performance on four public mass spectrometry cancer datasets along with two other conventional classification techniques such as Support Vector Machines and Adaptive Boosting. The results demonstrate that our proposed method performs more accurate classification across various cancer datasets than those conventional classification techniques.

  1. Comparative Analysis of RF Emission Based Fingerprinting Techniques for ZigBee Device Classification

    DTIC Science & Technology

    quantify the differences invarious RF fingerprinting techniques via comparative analysis of MDA/ML classification results. The findings herein demonstrate...correct classification rates followed by COR-DNA and then RF-DNA in most test cases and especially in low Eb/N0 ranges, where ZigBee is designed to operate.

  2. The trophic classification of lakes using ERTS multispectral scanner data

    NASA Technical Reports Server (NTRS)

    Blackwell, R. J.; Boland, D. H.

    1975-01-01

    Lake classification methods based on the use of ERTS data are described. Preliminary classification results obtained by multispectral and digital image processing techniques indicate satisfactory correlation between ERTS data and EPA-supplied water analysis. Techniques for determining lake trophic levels using ERTS data are examined, and data obtained for 20 lakes are discussed.

  3. Development Of Polarimetric Decomposition Techniques For Indian Forest Resource Assessment Using Radar Imaging Satellite (Risat-1) Images

    NASA Astrophysics Data System (ADS)

    Sridhar, J.

    2015-12-01

    The focus of this work is to examine polarimetric decomposition techniques primarily focussed on Pauli decomposition and Sphere Di-Plane Helix (SDH) decomposition for forest resource assessment. The data processing methods adopted are Pre-processing (Geometric correction and Radiometric calibration), Speckle Reduction, Image Decomposition and Image Classification. Initially to classify forest regions, unsupervised classification was applied to determine different unknown classes. It was observed K-means clustering method gave better results in comparison with ISO Data method.Using the algorithm developed for Radar Tools, the code for decomposition and classification techniques were applied in Interactive Data Language (IDL) and was applied to RISAT-1 image of Mysore-Mandya region of Karnataka, India. This region is chosen for studying forest vegetation and consists of agricultural lands, water and hilly regions. Polarimetric SAR data possess a high potential for classification of earth surface.After applying the decomposition techniques, classification was done by selecting region of interests andpost-classification the over-all accuracy was observed to be higher in the SDH decomposed image, as it operates on individual pixels on a coherent basis and utilises the complete intrinsic coherent nature of polarimetric SAR data. Thereby, making SDH decomposition particularly suited for analysis of high-resolution SAR data. The Pauli Decomposition represents all the polarimetric information in a single SAR image however interpretation of the resulting image is difficult. The SDH decomposition technique seems to produce better results and interpretation as compared to Pauli Decomposition however more quantification and further analysis are being done in this area of research. The comparison of Polarimetric decomposition techniques and evolutionary classification techniques will be the scope of this work.

  4. Novel Histogram Based Unsupervised Classification Technique to Determine Natural Classes From Biophysically Relevant Fit Parameters to Hyperspectral Data

    DOE PAGES

    McCann, Cooper; Repasky, Kevin S.; Morin, Mikindra; ...

    2017-05-23

    Hyperspectral image analysis has benefited from an array of methods that take advantage of the increased spectral depth compared to multispectral sensors; however, the focus of these developments has been on supervised classification methods. Lack of a priori knowledge regarding land cover characteristics can make unsupervised classification methods preferable under certain circumstances. An unsupervised classification technique is presented in this paper that utilizes physically relevant basis functions to model the reflectance spectra. These fit parameters used to generate the basis functions allow clustering based on spectral characteristics rather than spectral channels and provide both noise and data reduction. Histogram splittingmore » of the fit parameters is then used as a means of producing an unsupervised classification. Unlike current unsupervised classification techniques that rely primarily on Euclidian distance measures to determine similarity, the unsupervised classification technique uses the natural splitting of the fit parameters associated with the basis functions creating clusters that are similar in terms of physical parameters. The data set used in this work utilizes the publicly available data collected at Indian Pines, Indiana. This data set provides reference data allowing for comparisons of the efficacy of different unsupervised data analysis. The unsupervised histogram splitting technique presented in this paper is shown to be better than the standard unsupervised ISODATA clustering technique with an overall accuracy of 34.3/19.0% before merging and 40.9/39.2% after merging. Finally, this improvement is also seen as an improvement of kappa before/after merging of 24.8/30.5 for the histogram splitting technique compared to 15.8/28.5 for ISODATA.« less

  5. Novel Histogram Based Unsupervised Classification Technique to Determine Natural Classes From Biophysically Relevant Fit Parameters to Hyperspectral Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    McCann, Cooper; Repasky, Kevin S.; Morin, Mikindra

    Hyperspectral image analysis has benefited from an array of methods that take advantage of the increased spectral depth compared to multispectral sensors; however, the focus of these developments has been on supervised classification methods. Lack of a priori knowledge regarding land cover characteristics can make unsupervised classification methods preferable under certain circumstances. An unsupervised classification technique is presented in this paper that utilizes physically relevant basis functions to model the reflectance spectra. These fit parameters used to generate the basis functions allow clustering based on spectral characteristics rather than spectral channels and provide both noise and data reduction. Histogram splittingmore » of the fit parameters is then used as a means of producing an unsupervised classification. Unlike current unsupervised classification techniques that rely primarily on Euclidian distance measures to determine similarity, the unsupervised classification technique uses the natural splitting of the fit parameters associated with the basis functions creating clusters that are similar in terms of physical parameters. The data set used in this work utilizes the publicly available data collected at Indian Pines, Indiana. This data set provides reference data allowing for comparisons of the efficacy of different unsupervised data analysis. The unsupervised histogram splitting technique presented in this paper is shown to be better than the standard unsupervised ISODATA clustering technique with an overall accuracy of 34.3/19.0% before merging and 40.9/39.2% after merging. Finally, this improvement is also seen as an improvement of kappa before/after merging of 24.8/30.5 for the histogram splitting technique compared to 15.8/28.5 for ISODATA.« less

  6. Multivariate statistical analysis: Principles and applications to coorbital streams of meteorite falls

    NASA Technical Reports Server (NTRS)

    Wolf, S. F.; Lipschutz, M. E.

    1993-01-01

    Multivariate statistical analysis techniques (linear discriminant analysis and logistic regression) can provide powerful discrimination tools which are generally unfamiliar to the planetary science community. Fall parameters were used to identify a group of 17 H chondrites (Cluster 1) that were part of a coorbital stream which intersected Earth's orbit in May, from 1855 - 1895, and can be distinguished from all other H chondrite falls. Using multivariate statistical techniques, it was demonstrated that a totally different criterion, labile trace element contents - hence thermal histories - or 13 Cluster 1 meteorites are distinguishable from those of 45 non-Cluster 1 H chondrites. Here, we focus upon the principles of multivariate statistical techniques and illustrate their application using non-meteoritic and meteoritic examples.

  7. Nationwide forestry applications program. Analysis of forest classification accuracy

    NASA Technical Reports Server (NTRS)

    Congalton, R. G.; Mead, R. A.; Oderwald, R. G.; Heinen, J. (Principal Investigator)

    1981-01-01

    The development of LANDSAT classification accuracy assessment techniques, and of a computerized system for assessing wildlife habitat from land cover maps are considered. A literature review on accuracy assessment techniques and an explanation for the techniques development under both projects are included along with listings of the computer programs. The presentations and discussions at the National Working Conference on LANDSAT Classification Accuracy are summarized. Two symposium papers which were published on the results of this project are appended.

  8. Risk factors of poor functional results at 1-year after pseudocontinent perineal colostomy for ultralow rectal adenocarcinoma.

    PubMed

    Souadka, Amine; Majbar, Mohammed Anass; Bougutab, Abdeslam; El Othmany, Azzedine; Jalil, Abdelouahed; Ahyoud, Fatema Zahra; El Malki, Hadj Omar; Souadka, Abdelilah

    2013-10-01

    Pseudocontinent perineal colostomy is one of the techniques that helps recover the body image of patients undergoing abdominoperineal resection. This technique is rarely used internationally given its unknown functional results. The study aimed to evaluate 1-year functional outcomes of perineal pseudocontinent colostomy and to determine the risk factors for "poor" functional results. This study is a retrospective interventional case series. This study was conducted at a tertiary care university hospital and oncological center in Morocco. From January 1993 to December 2007, 149 patients underwent pseudocontinent perineal colostomy after abdominoperineal resection for low rectal adenocarcinoma. Pseudocontinent perineal colostomy was performed with the use of the Schmidt technique after abdominoperineal resection. One-year functional results were assessed according to the Kirwan classification system. Functional results were considered "poor" when the Kirwan score was C, D, or E. Univariable and multivariable analyses were used to evaluate the impact of age, sex, type of surgery, irrigation frequency, palpable muscular ring, concomitant chemoradiotherapy, stage, and perineal complications on functional results. One hundred forty-six patients were analyzed. According to the Kirwan system, the scores showed that 100 (68.5%) patients had "good" continence results (stage A-B) and 46 (31.5%) patients had altered functional results (stage C-D-E). With the exception of pelvic recurrences, no conversions from a perineal colostomy to an abdominal colostomy were performed for dissatisfactory functional results. In multivariate analysis, the only independent predictive factors of poor functional results were the occurrence of perineal complications (OR, 3.923; 95% CI, 1.461-10.35; p = 0.007) and extended resection (OR, 3.03; 95% CI, 1.183-7.750; p = 0.021) LIMITATION OF THE STUDY:: This study is an observational retrospective study on selected patients (mainly a young population). This study showed that perineal complications and extended resection are associated with poor functional results after pseudocontinent perineal colostomy. These data can help clinicians to better inform patients about the outcomes of this technique and to assist them in choosing the right reconstruction technique after abdominoperineal resection.

  9. A revision of chiggers of the minuta species-group (Acari: Trombiculidae: Neotrombicula Hirst, 1925) using multivariate morphometrics.

    PubMed

    Stekolnikov, Alexandr A; Klimov, Pavel B

    2010-09-01

    We revise chiggers belonging to the minuta-species group (genus Neotrombicula Hirst, 1925) from the Palaearctic using size-free multivariate morphometrics. This approach allowed us to resolve several diagnostic problems. We show that the widely distributed Neotrombicula scrupulosa Kudryashova, 1993 forms three spatially and ecologically isolated groups different from each other in size or shape (morphometric property) only: specimens from the Caucasus are distinct from those from Asia in shape, whereas the Asian specimens from plains and mountains are different from each other in size. We developed a multivariate classification model to separate three closely related species: N. scrupulosa, N. lubrica Kudryashova, 1993 and N. minuta Schluger, 1966. This model is based on five shape variables selected from an initial 17 variables by a best subset analysis using a custom size-correction subroutine. The variable selection procedure slightly improved the predictive power of the model, suggesting that it not only removed redundancy but also reduced 'noise' in the dataset. The overall classification accuracy of this model is 96.2, 96.2 and 95.5%, as estimated by internal validation, external validation and jackknife statistics, respectively. Our analyses resulted in one new synonymy: N. dimidiata Stekolnikov, 1995 is considered to be a synonym of N. lubrica. Both N. scrupulosa and N. lubrica are recorded from new localities. A key to species of the minuta-group incorporating results from our multivariate analyses is presented.

  10. Selecting reusable components using algebraic specifications

    NASA Technical Reports Server (NTRS)

    Eichmann, David A.

    1992-01-01

    A significant hurdle confronts the software reuser attempting to select candidate components from a software repository - discriminating between those components without resorting to inspection of the implementation(s). We outline a mixed classification/axiomatic approach to this problem based upon our lattice-based faceted classification technique and Guttag and Horning's algebraic specification techniques. This approach selects candidates by natural language-derived classification, by their interfaces, using signatures, and by their behavior, using axioms. We briefly outline our problem domain and related work. Lattice-based faceted classifications are described; the reader is referred to surveys of the extensive literature for algebraic specification techniques. Behavioral support for reuse queries is presented, followed by the conclusions.

  11. Exploring Raman spectroscopy for the evaluation of glaucomatous retinal changes

    NASA Astrophysics Data System (ADS)

    Wang, Qi; Grozdanic, Sinisa D.; Harper, Matthew M.; Hamouche, Nicolas; Kecova, Helga; Lazic, Tatjana; Yu, Chenxu

    2011-10-01

    Glaucoma is a chronic neurodegenerative disease characterized by apoptosis of retinal ganglion cells and subsequent loss of visual function. Early detection of glaucoma is critical for the prevention of permanent structural damage and irreversible vision loss. Raman spectroscopy is a technique that provides rapid biochemical characterization of tissues in a nondestructive and noninvasive fashion. In this study, we explored the potential of using Raman spectroscopy for detection of glaucomatous changes in vitro. Raman spectroscopic imaging was conducted on retinal tissues of dogs with hereditary glaucoma and healthy control dogs. The Raman spectra were subjected to multivariate discriminant analysis with a support vector machine algorithm, and a classification model was developed to differentiate disease tissues versus healthy tissues. Spectroscopic analysis of 105 retinal ganglion cells (RGCs) from glaucomatous dogs and 267 RGCs from healthy dogs revealed spectroscopic markers that differentiated glaucomatous specimens from healthy controls. Furthermore, the multivariate discriminant model differentiated healthy samples and glaucomatous samples with good accuracy [healthy 89.5% and glaucomatous 97.6% for the same breed (Basset Hounds); and healthy 85.0% and glaucomatous 85.5% for different breeds (Beagles versus Basset Hounds)]. Raman spectroscopic screening can be used for in vitro detection of glaucomatous changes in retinal tissue with a high specificity.

  12. Delineation of estuarine management areas using multivariate geostatistics: the case of Sado Estuary.

    PubMed

    Caeiro, Sandra; Goovaerts, Pierre; Painho, Marco; Costa, M Helena

    2003-09-15

    The Sado Estuary is a coastal zone located in the south of Portugal where conflicts between conservation and development exist because of its location near industrialized urban zones and its designation as a natural reserve. The aim of this paper is to evaluate a set of multivariate geostatistical approaches to delineate spatially contiguous regions of sediment structure for Sado Estuary. These areas will be the supporting infrastructure of an environmental management system for this estuary. The boundaries of each homogeneous area were derived from three sediment characterization attributes through three different approaches: (1) cluster analysis of dissimilarity matrix function of geographical separation followed by indicator kriging of the cluster data, (2) discriminant analysis of kriged values of the three sediment attributes, and (3) a combination of methods 1 and 2. Final maximum likelihood classification was integrated into a geographical information system. All methods generated fairly spatially contiguous management areas that reproduce well the environment of the estuary. Map comparison techniques based on kappa statistics showed thatthe resultant three maps are similar, supporting the choice of any of the methods as appropriate for management of the Sado Estuary. However, the results of method 1 seem to be in better agreement with estuary behavior, assessment of contamination sources, and previous work conducted at this site.

  13. Exploring Raman spectroscopy for the evaluation of glaucomatous retinal changes.

    PubMed

    Wang, Qi; Grozdanic, Sinisa D; Harper, Matthew M; Hamouche, Nicolas; Kecova, Helga; Lazic, Tatjana; Yu, Chenxu

    2011-10-01

    Glaucoma is a chronic neurodegenerative disease characterized by apoptosis of retinal ganglion cells and subsequent loss of visual function. Early detection of glaucoma is critical for the prevention of permanent structural damage and irreversible vision loss. Raman spectroscopy is a technique that provides rapid biochemical characterization of tissues in a nondestructive and noninvasive fashion. In this study, we explored the potential of using Raman spectroscopy for detection of glaucomatous changes in vitro. Raman spectroscopic imaging was conducted on retinal tissues of dogs with hereditary glaucoma and healthy control dogs. The Raman spectra were subjected to multivariate discriminant analysis with a support vector machine algorithm, and a classification model was developed to differentiate disease tissues versus healthy tissues. Spectroscopic analysis of 105 retinal ganglion cells (RGCs) from glaucomatous dogs and 267 RGCs from healthy dogs revealed spectroscopic markers that differentiated glaucomatous specimens from healthy controls. Furthermore, the multivariate discriminant model differentiated healthy samples and glaucomatous samples with good accuracy [healthy 89.5% and glaucomatous 97.6% for the same breed (Basset Hounds); and healthy 85.0% and glaucomatous 85.5% for different breeds (Beagles versus Basset Hounds)]. Raman spectroscopic screening can be used for in vitro detection of glaucomatous changes in retinal tissue with a high specificity.

  14. Metal and physico-chemical variations at a hydroelectric reservoir analyzed by Multivariate Analyses and Artificial Neural Networks: environmental management and policy/decision-making tools.

    PubMed

    Cavalcante, Y L; Hauser-Davis, R A; Saraiva, A C F; Brandão, I L S; Oliveira, T F; Silveira, A M

    2013-01-01

    This paper compared and evaluated seasonal variations in physico-chemical parameters and metals at a hydroelectric power station reservoir by applying Multivariate Analyses and Artificial Neural Networks (ANN) statistical techniques. A Factor Analysis was used to reduce the number of variables: the first factor was composed of elements Ca, K, Mg and Na, and the second by Chemical Oxygen Demand. The ANN showed 100% correct classifications in training and validation samples. Physico-chemical analyses showed that water pH values were not statistically different between the dry and rainy seasons, while temperature, conductivity, alkalinity, ammonia and DO were higher in the dry period. TSS, hardness and COD, on the other hand, were higher during the rainy season. The statistical analyses showed that Ca, K, Mg and Na are directly connected to the Chemical Oxygen Demand, which indicates a possibility of their input into the reservoir system by domestic sewage and agricultural run-offs. These statistical applications, thus, are also relevant in cases of environmental management and policy decision-making processes, to identify which factors should be further studied and/or modified to recover degraded or contaminated water bodies. Copyright © 2012 Elsevier B.V. All rights reserved.

  15. A multivariate geostatistical methodology to delineate areas of potential interest for future sedimentary gold exploration.

    PubMed

    Goovaerts, P; Albuquerque, Teresa; Antunes, Margarida

    2016-11-01

    This paper describes a multivariate geostatistical methodology to delineate areas of potential interest for future sedimentary gold exploration, with an application to an abandoned sedimentary gold mining region in Portugal. The main challenge was the existence of only a dozen gold measurements confined to the grounds of the old gold mines, which precluded the application of traditional interpolation techniques, such as cokriging. The analysis could, however, capitalize on 376 stream sediment samples that were analyzed for twenty two elements. Gold (Au) was first predicted at all 376 locations using linear regression (R 2 =0.798) and four metals (Fe, As, Sn and W), which are known to be mostly associated with the local gold's paragenesis. One hundred realizations of the spatial distribution of gold content were generated using sequential indicator simulation and a soft indicator coding of regression estimates, to supplement the hard indicator coding of gold measurements. Each simulated map then underwent a local cluster analysis to identify significant aggregates of low or high values. The one hundred classified maps were processed to derive the most likely classification of each simulated node and the associated probability of occurrence. Examining the distribution of the hot-spots and cold-spots reveals a clear enrichment in Au along the Erges River downstream from the old sedimentary mineralization.

  16. Polarization-based material classification technique using passive millimeter-wave polarimetric imagery.

    PubMed

    Hu, Fei; Cheng, Yayun; Gui, Liangqi; Wu, Liang; Zhang, Xinyi; Peng, Xiaohui; Su, Jinlong

    2016-11-01

    The polarization properties of thermal millimeter-wave emission capture inherent information of objects, e.g., material composition, shape, and surface features. In this paper, a polarization-based material-classification technique using passive millimeter-wave polarimetric imagery is presented. Linear polarization ratio (LPR) is created to be a new feature discriminator that is sensitive to material type and to remove the reflected ambient radiation effect. The LPR characteristics of several common natural and artificial materials are investigated by theoretical and experimental analysis. Based on a priori information about LPR characteristics, the optimal range of incident angle and the classification criterion are discussed. Simulation and measurement results indicate that the presented classification technique is effective for distinguishing between metals and dielectrics. This technique suggests possible applications for outdoor metal target detection in open scenes.

  17. Application of satellite data and LARS's data processing techniques to mapping vegetation of the Dismal Swamp. M.S. Thesis - Old Dominion Univ.

    NASA Technical Reports Server (NTRS)

    Messmore, J. A.

    1976-01-01

    The feasibility of using digital satellite imagery and automatic data processing techniques as a means of mapping swamp forest vegetation was considered, using multispectral scanner data acquired by the LANDSAT-1 satellite. The site for this investigation was the Dismal Swamp, a 210,000 acre swamp forest located south of Suffolk, Va. on the Virginia-North Carolina border. Two basic classification strategies were employed. The initial classification utilized unsupervised techniques which produced a map of the swamp indicating the distribution of thirteen forest spectral classes. These classes were later combined into three informational categories: Atlantic white cedar (Chamaecyparis thyoides), Loblolly pine (Pinus taeda), and deciduous forest. The subsequent classification employed supervised techniques which mapped Atlantic white cedar, Loblolly pine, deciduous forest, water and agriculture within the study site. A classification accuracy of 82.5% was produced by unsupervised techniques compared with 89% accuracy using supervised techniques.

  18. Grouping Parturients by Parity, Previous-Cesarean, and Mode of Delivery (P-C-MoD Classification) Better Identifies Groups at Risk for Postpartum Hemorrhage.

    PubMed

    Reichman, Orna; Gal, Micahel; Sela, Hen Y; Khayyat, Izzat; Emanuel, Michael; Samueloff, Arnon

    2016-10-01

    Objective We aimed to create a clinical classification to better identify parturients at risk for postpartum hemorrhage (PPH). Method A retrospective cohort, including all women who delivered at a single tertiary care medical center, between 2006 and 2014. Parturients were grouped by parity and history of cesarean delivery (CD): primiparas, multipara, and multipara with previous CD. Each were further subgrouped by mode of delivery (spontaneous vaginal delivery [SVD], operative vaginal delivery [OVD], emergency or elective CD). In all, 12 subgroups, based on parity, previous cesarean, and mode of delivery, formed the P-C-MoD classification. PPH was defined as a decrease of ≥3 gram% hemoglobin from admission and/or transfusion of blood products. Univariate analysis followed by multivariate analysis was performed to assess risk for PPH, controlling for confounders. Results The crude rate of PPH among 126,693 parturients was 7%. The prevalence differed significantly among independent risk factors: primiparity, 14%; multiparity, 4%; OVD, 22%; and CD, 15%. The P-C-MoD classification, segregated better between parturients at risk for PPH. The prevalence of PPH was highest for primiparous undergoing OVD (27%) compared with multiparous with SVD (3%), odds ratio [OR] = 12.8 (95% confidence interval [CI],11.9-13.9). These finding were consistent in the multivariate analysis OR = 13.1 (95% CI,12.1-14.3). Conclusion Employing the P-C-MoD classification more readily identifies parturients at risk for PPH and is superior to estimations based on single risk factors. Thieme Medical Publishers 333 Seventh Avenue, New York, NY 10001, USA.

  19. Multivariate analysis of standoff laser-induced breakdown spectroscopy spectra for classification of explosive-containing residues

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    De Lucia, Frank C. Jr.; Gottfried, Jennifer L.; Munson, Chase A.

    2008-11-01

    A technique being evaluated for standoff explosives detection is laser-induced breakdown spectroscopy (LIBS). LIBS is a real-time sensor technology that uses components that can be configured into a ruggedized standoff instrument. The U.S. Army Research Laboratory has been coupling standoff LIBS spectra with chemometrics for several years now in order to discriminate between explosives and nonexplosives. We have investigated the use of partial least squares discriminant analysis (PLS-DA) for explosives detection. We have extended our study of PLS-DA to more complex sample types, including binary mixtures, different types of explosives, and samples not included in the model. We demonstrate themore » importance of building the PLS-DA model by iteratively testing it against sample test sets. Independent test sets are used to test the robustness of the final model.« less

  20. Missing data exploration: highlighting graphical presentation of missing pattern

    PubMed Central

    2015-01-01

    Functions shipped with R base can fulfill many tasks of missing data handling. However, because the data volume of electronic medical record (EMR) system is always very large, more sophisticated methods may be helpful in data management. The article focuses on missing data handling by using advanced techniques. There are three types of missing data, that is, missing completely at random (MCAR), missing at random (MAR) and not missing at random (NMAR). This classification system depends on how missing values are generated. Two packages, Multivariate Imputation by Chained Equations (MICE) and Visualization and Imputation of Missing Values (VIM), provide sophisticated functions to explore missing data pattern. In particular, the VIM package is especially helpful in visual inspection of missing data. Finally, correlation analysis provides information on the dependence of missing data on other variables. Such information is useful in subsequent imputations. PMID:26807411

  1. The use of logistic regression to enhance risk assessment and decision making by mental health administrators.

    PubMed

    Menditto, Anthony A; Linhorst, Donald M; Coleman, James C; Beck, Niels C

    2006-04-01

    Development of policies and procedures to contend with the risks presented by elopement, aggression, and suicidal behaviors are long-standing challenges for mental health administrators. Guidance in making such judgments can be obtained through the use of a multivariate statistical technique known as logistic regression. This procedure can be used to develop a predictive equation that is mathematically formulated to use the best combination of predictors, rather than considering just one factor at a time. This paper presents an overview of logistic regression and its utility in mental health administrative decision making. A case example of its application is presented using data on elopements from Missouri's long-term state psychiatric hospitals. Ultimately, the use of statistical prediction analyses tempered with differential qualitative weighting of classification errors can augment decision-making processes in a manner that provides guidance and flexibility while wrestling with the complex problem of risk assessment and decision making.

  2. Potential of mid-infrared spectroscopy as a non-invasive diagnostic test in urine for endometrial or ovarian cancer.

    PubMed

    Paraskevaidi, Maria; Morais, Camilo L M; Lima, Kássio M G; Ashton, Katherine M; Stringfellow, Helen F; Martin-Hirsch, Pierre L; Martin, Francis L

    2018-06-07

    The current lack of an accurate, cost-effective and non-invasive test that would allow for screening and diagnosis of gynaecological carcinomas, such as endometrial and ovarian cancer, signals the necessity for alternative approaches. The potential of spectroscopic techniques in disease investigation and diagnosis has been previously demonstrated. Here, we used attenuated total reflection Fourier-transform infrared (ATR-FTIR) spectroscopy to analyse urine samples from women with endometrial (n = 10) and ovarian cancer (n = 10), as well as from healthy individuals (n = 10). After applying multivariate analysis and classification algorithms, biomarkers of disease were pointed out and high levels of accuracy were achieved for both endometrial (95% sensitivity, 100% specificity; accuracy: 95%) and ovarian cancer (100% sensitivity, 96.3% specificity; accuracy 100%). The efficacy of this approach, in combination with the non-invasive method for urine collection, suggest a potential diagnostic tool for endometrial and ovarian cancers.

  3. Investigation of production method, geographical origin and species authentication in commercially relevant shrimps using stable isotope ratio and/or multi-element analyses combined with chemometrics: an exploratory analysis.

    PubMed

    Ortea, Ignacio; Gallardo, José M

    2015-03-01

    Three factors defining the traceability of a food product are production method (wild or farmed), geographical origin and biological species, which have to be checked and guaranteed, not only in order to avoid mislabelling and commercial fraud, but also to address food safety issues and to comply with legal regulations. The aim of this study was to determine whether these three factors could be differentiated in shrimps using stable isotope ratio analysis of carbon and nitrogen and/or multi-element composition. Different multivariate statistics methods were applied to different data subsets in order to evaluate their performance in terms of classification or predictive ability. Although the success rates varied depending on the dataset used, the combination of both techniques allowed the correct classification of 100% of the samples according to their actual origin and method of production, and 93.5% according to biological species. Even though further studies including a larger number of samples in each group are needed in order to validate these findings, we can conclude that these methodologies should be considered for studies regarding seafood product authenticity. Copyright © 2014 Elsevier Ltd. All rights reserved.

  4. An integrated analysis for determining the geographical origin of medicinal herbs using ICP-AES/ICP-MS and (1)H NMR analysis.

    PubMed

    Kwon, Yong-Kook; Bong, Yeon-Sik; Lee, Kwang-Sik; Hwang, Geum-Sook

    2014-10-15

    ICP-MS and (1)H NMR are commonly used to determine the geographical origin of food and crops. In this study, data from multielemental analysis performed by ICP-AES/ICP-MS and metabolomic data obtained from (1)H NMR were integrated to improve the reliability of determining the geographical origin of medicinal herbs. Astragalus membranaceus and Paeonia albiflora with different origins in Korea and China were analysed by (1)H NMR and ICP-AES/ICP-MS, and an integrated multivariate analysis was performed to characterise the differences between their origins. Four classification methods were applied: linear discriminant analysis (LDA), k-nearest neighbour classification (KNN), support vector machines (SVM), and partial least squares-discriminant analysis (PLS-DA). Results were compared using leave-one-out cross-validation and external validation. The integration of multielemental and metabolomic data was more suitable for determining geographical origin than the use of each individual data set alone. The integration of the two analytical techniques allowed diverse environmental factors such as climate and geology, to be considered. Our study suggests that an appropriate integration of different types of analytical data is useful for determining the geographical origin of food and crops with a high degree of reliability. Copyright © 2014 Elsevier Ltd. All rights reserved.

  5. Classification of buildings mold threat using electronic nose

    NASA Astrophysics Data System (ADS)

    Łagód, Grzegorz; Suchorab, Zbigniew; Guz, Łukasz; Sobczuk, Henryk

    2017-07-01

    Mold is considered to be one of the most important features of Sick Building Syndrome and is an important problem in current building industry. In many cases it is caused by the rising moisture of building envelopes surface and exaggerated humidity of indoor air. Concerning historical buildings it is mostly caused by outdated raising techniques among that is absence of horizontal isolation against moisture and hygroscopic materials applied for construction. Recent buildings also suffer problem of mold risk which is caused in many cases by hermetization leading to improper performance of gravitational ventilation systems that make suitable conditions for mold development. Basing on our research there is proposed a method of buildings mold threat classification using electronic nose, based on a gas sensors array which consists of MOS sensors (metal oxide semiconductor). Used device is frequently applied for air quality assessment in environmental engineering branches. Presented results show the interpretation of e-nose readouts of indoor air sampled in rooms threatened with mold development in comparison with clean reference rooms and synthetic air. Obtained multivariate data were processed, visualized and classified using a PCA (Principal Component Analysis) and ANN (Artificial Neural Network) methods. Described investigation confirmed that electronic nose - gas sensors array supported with data processing enables to classify air samples taken from different rooms affected with mold.

  6. A nonlinear heartbeat dynamics model approach for personalized emotion recognition.

    PubMed

    Valenza, Gaetano; Citi, Luca; Lanatà, Antonio; Scilingo, Enzo Pasquale; Barbieri, Riccardo

    2013-01-01

    Emotion recognition based on autonomic nervous system signs is one of the ambitious goals of affective computing. It is well-accepted that standard signal processing techniques require relative long-time series of multivariate records to ensure reliability and robustness of recognition and classification algorithms. In this work, we present a novel methodology able to assess cardiovascular dynamics during short-time (i.e. < 10 seconds) affective stimuli, thus overcoming some of the limitations of current emotion recognition approaches. We developed a personalized, fully parametric probabilistic framework based on point-process theory where heartbeat events are modelled using a 2(nd)-order nonlinear autoregressive integrative structure in order to achieve effective performances in short-time affective assessment. Experimental results show a comprehensive emotional characterization of 4 subjects undergoing a passive affective elicitation using a sequence of standardized images gathered from the international affective picture system. Each picture was identified by the IAPS arousal and valence scores as well as by a self-reported emotional label associating a subjective positive or negative emotion. Results show a clear classification of two defined levels of arousal, valence and self-emotional state using features coming from the instantaneous spectrum and bispectrum of the considered RR intervals, reaching up to 90% recognition accuracy.

  7. Biometric Authentication for Gender Classification Techniques: A Review

    NASA Astrophysics Data System (ADS)

    Mathivanan, P.; Poornima, K.

    2017-12-01

    One of the challenging biometric authentication applications is gender identification and age classification, which captures gait from far distance and analyze physical information of the subject such as gender, race and emotional state of the subject. It is found that most of the gender identification techniques have focused only with frontal pose of different human subject, image size and type of database used in the process. The study also classifies different feature extraction process such as, Principal Component Analysis (PCA) and Local Directional Pattern (LDP) that are used to extract the authentication features of a person. This paper aims to analyze different gender classification techniques that help in evaluating strength and weakness of existing gender identification algorithm. Therefore, it helps in developing a novel gender classification algorithm with less computation cost and more accuracy. In this paper, an overview and classification of different gender identification techniques are first presented and it is compared with other existing human identification system by means of their performance.

  8. Risk assessments using the Strain Index and the TLV for HAL, Part I: Task and multi-task job exposure classifications.

    PubMed

    Kapellusch, Jay M; Bao, Stephen S; Silverstein, Barbara A; Merryweather, Andrew S; Thiese, Mathew S; Hegmann, Kurt T; Garg, Arun

    2017-12-01

    The Strain Index (SI) and the American Conference of Governmental Industrial Hygienists (ACGIH) Threshold Limit Value for Hand Activity Level (TLV for HAL) use different constituent variables to quantify task physical exposures. Similarly, time-weighted-average (TWA), Peak, and Typical exposure techniques to quantify physical exposure from multi-task jobs make different assumptions about each task's contribution to the whole job exposure. Thus, task and job physical exposure classifications differ depending upon which model and technique are used for quantification. This study examines exposure classification agreement, disagreement, correlation, and magnitude of classification differences between these models and techniques. Data from 710 multi-task job workers performing 3,647 tasks were analyzed using the SI and TLV for HAL models, as well as with the TWA, Typical and Peak job exposure techniques. Physical exposures were classified as low, medium, and high using each model's recommended, or a priori limits. Exposure classification agreement and disagreement between models (SI, TLV for HAL) and between job exposure techniques (TWA, Typical, Peak) were described and analyzed. Regardless of technique, the SI classified more tasks as high exposure than the TLV for HAL, and the TLV for HAL classified more tasks as low exposure. The models agreed on 48.5% of task classifications (kappa = 0.28) with 15.5% of disagreement between low and high exposure categories. Between-technique (i.e., TWA, Typical, Peak) agreement ranged from 61-93% (kappa: 0.16-0.92) depending on whether the SI or TLV for HAL was used. There was disagreement between the SI and TLV for HAL and between the TWA, Typical and Peak techniques. Disagreement creates uncertainty for job design, job analysis, risk assessments, and developing interventions. Task exposure classifications from the SI and TLV for HAL might complement each other. However, TWA, Typical, and Peak job exposure techniques all have limitations. Part II of this article examines whether the observed differences between these models and techniques produce different exposure-response relationships for predicting prevalence of carpal tunnel syndrome.

  9. Arm structure in normal spiral galaxies, 1: Multivariate data for 492 galaxies

    NASA Technical Reports Server (NTRS)

    Magri, Christopher

    1994-01-01

    Multivariate data have been collected as part of an effort to develop a new classification system for spiral galaxies, one which is not necessarily based on subjective morphological properties. A sample of 492 moderately bright northern Sa and Sc spirals was chosen for future statistical analysis. New observations were made at 20 and 21 cm; the latter data are described in detail here. Infrared Astronomy Satellite (IRAS) fluxes were obtained from archival data. Finally, new estimates of arm pattern radomness and of local environmental harshness were compiled for most sample objects.

  10. Evidence-based provisional clinical classification criteria for autoinflammatory periodic fevers.

    PubMed

    Federici, Silvia; Sormani, Maria Pia; Ozen, Seza; Lachmann, Helen J; Amaryan, Gayane; Woo, Patricia; Koné-Paut, Isabelle; Dewarrat, Natacha; Cantarini, Luca; Insalaco, Antonella; Uziel, Yosef; Rigante, Donato; Quartier, Pierre; Demirkaya, Erkan; Herlin, Troels; Meini, Antonella; Fabio, Giovanna; Kallinich, Tilmann; Martino, Silvana; Butbul, Aviel Yonatan; Olivieri, Alma; Kuemmerle-Deschner, Jasmin; Neven, Benedicte; Simon, Anna; Ozdogan, Huri; Touitou, Isabelle; Frenkel, Joost; Hofer, Michael; Martini, Alberto; Ruperto, Nicolino; Gattorno, Marco

    2015-05-01

    The objective of this work was to develop and validate a set of clinical criteria for the classification of patients affected by periodic fevers. Patients with inherited periodic fevers (familial Mediterranean fever (FMF); mevalonate kinase deficiency (MKD); tumour necrosis factor receptor-associated periodic fever syndrome (TRAPS); cryopyrin-associated periodic syndromes (CAPS)) enrolled in the Eurofever Registry up until March 2013 were evaluated. Patients with periodic fever, aphthosis, pharyngitis and adenitis (PFAPA) syndrome were used as negative controls. For each genetic disease, patients were considered to be 'gold standard' on the basis of the presence of a confirmatory genetic analysis. Clinical criteria were formulated on the basis of univariate and multivariate analysis in an initial group of patients (training set) and validated in an independent set of patients (validation set). A total of 1215 consecutive patients with periodic fevers were identified, and 518 gold standard patients (291 FMF, 74 MKD, 86 TRAPS, 67 CAPS) and 199 patients with PFAPA as disease controls were evaluated. The univariate and multivariate analyses identified a number of clinical variables that correlated independently with each disease, and four provisional classification scores were created. Cut-off values of the classification scores were chosen using receiver operating characteristic curve analysis as those giving the highest sensitivity and specificity. The classification scores were then tested in an independent set of patients (validation set) with an area under the curve of 0.98 for FMF, 0.95 for TRAPS, 0.96 for MKD, and 0.99 for CAPS. In conclusion, evidence-based provisional clinical criteria with high sensitivity and specificity for the clinical classification of patients with inherited periodic fevers have been developed. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.

  11. [Value of the albumin to globulin ratio in predicting severity and prognosis in myasthenia gravis patients].

    PubMed

    Yang, D H; Su, Z Q; Chen, Y; Chen, Z B; Ding, Z N; Weng, Y Y; Li, J; Li, X; Tong, Q L; Han, Y X; Zhang, X

    2016-03-08

    To assess the predictive value of the albumin to globulin ratio (AGR) in evaluation of disease severity and prognosis in myasthenia gravis patients. A total of 135 myasthenia gravis (MG) patients were enrolled between February 2009 and March 2015. The AGR was detected on the first day of hospitalization and ranked from lowest to highest, and the patients were divided into three equal tertiles according to the AGR values, which were T1 (AGR <1.34), T2 (1.34≤AGR≤1.53) and T3 (AGR>1.53). The Kaplan-Meier curve was used to evaluate the prognostic value of AGR. Cox model analysis was used to evaluate the relevant factors. Multivariate Logistic regression analysis was used to find the predictors of myasthenia crisis during hospitalization. The median length of hospital stay for each tertile was: for the T1 21 days (15-35.5), T2 18 days (14-27.5), and T3 16 days (12-22.5) (P<0.01), and Kaplan-Meier curves showed significant difference among the three groups. In the univariate model, serum albumin, creatinine, AGR and MGFA clinical classification were related to prognosis of myasthenia gravis. At the multivariate Cox regression analysis, the AGR (P<0.001) and MGFA clinical classification (P<0.001) were independent predictive factors of disease severity and prognosis in myasthenia gravis patients. Respectively, the hazard ratio (HR) were 4.655 (95% CI: 2.355-9.202) and 0.596 (95% CI: 0.492-0.723). Multivariate Logistic regression analysis showed the AGR (P<0.001) and MGFA clinical classification were related to myasthenia crisis. The AGR may represent a simple, potentially useful predictive biomarker for evaluating the disease severity and prognosis of patients with myasthenia gravis.

  12. Biostatistics Series Module 10: Brief Overview of Multivariate Methods.

    PubMed

    Hazra, Avijit; Gogtay, Nithya

    2017-01-01

    Multivariate analysis refers to statistical techniques that simultaneously look at three or more variables in relation to the subjects under investigation with the aim of identifying or clarifying the relationships between them. These techniques have been broadly classified as dependence techniques, which explore the relationship between one or more dependent variables and their independent predictors, and interdependence techniques, that make no such distinction but treat all variables equally in a search for underlying relationships. Multiple linear regression models a situation where a single numerical dependent variable is to be predicted from multiple numerical independent variables. Logistic regression is used when the outcome variable is dichotomous in nature. The log-linear technique models count type of data and can be used to analyze cross-tabulations where more than two variables are included. Analysis of covariance is an extension of analysis of variance (ANOVA), in which an additional independent variable of interest, the covariate, is brought into the analysis. It tries to examine whether a difference persists after "controlling" for the effect of the covariate that can impact the numerical dependent variable of interest. Multivariate analysis of variance (MANOVA) is a multivariate extension of ANOVA used when multiple numerical dependent variables have to be incorporated in the analysis. Interdependence techniques are more commonly applied to psychometrics, social sciences and market research. Exploratory factor analysis and principal component analysis are related techniques that seek to extract from a larger number of metric variables, a smaller number of composite factors or components, which are linearly related to the original variables. Cluster analysis aims to identify, in a large number of cases, relatively homogeneous groups called clusters, without prior information about the groups. The calculation intensive nature of multivariate analysis has so far precluded most researchers from using these techniques routinely. The situation is now changing with wider availability, and increasing sophistication of statistical software and researchers should no longer shy away from exploring the applications of multivariate methods to real-life data sets.

  13. Automated radial basis function neural network based image classification system for diabetic retinopathy detection in retinal images

    NASA Astrophysics Data System (ADS)

    Anitha, J.; Vijila, C. Kezi Selva; Hemanth, D. Jude

    2010-02-01

    Diabetic retinopathy (DR) is a chronic eye disease for which early detection is highly essential to avoid any fatal results. Image processing of retinal images emerge as a feasible tool for this early diagnosis. Digital image processing techniques involve image classification which is a significant technique to detect the abnormality in the eye. Various automated classification systems have been developed in the recent years but most of them lack high classification accuracy. Artificial neural networks are the widely preferred artificial intelligence technique since it yields superior results in terms of classification accuracy. In this work, Radial Basis function (RBF) neural network based bi-level classification system is proposed to differentiate abnormal DR Images and normal retinal images. The results are analyzed in terms of classification accuracy, sensitivity and specificity. A comparative analysis is performed with the results of the probabilistic classifier namely Bayesian classifier to show the superior nature of neural classifier. Experimental results show promising results for the neural classifier in terms of the performance measures.

  14. Automated simultaneous multiple feature classification of MTI data

    NASA Astrophysics Data System (ADS)

    Harvey, Neal R.; Theiler, James P.; Balick, Lee K.; Pope, Paul A.; Szymanski, John J.; Perkins, Simon J.; Porter, Reid B.; Brumby, Steven P.; Bloch, Jeffrey J.; David, Nancy A.; Galassi, Mark C.

    2002-08-01

    Los Alamos National Laboratory has developed and demonstrated a highly capable system, GENIE, for the two-class problem of detecting a single feature against a background of non-feature. In addition to the two-class case, however, a commonly encountered remote sensing task is the segmentation of multispectral image data into a larger number of distinct feature classes or land cover types. To this end we have extended our existing system to allow the simultaneous classification of multiple features/classes from multispectral data. The technique builds on previous work and its core continues to utilize a hybrid evolutionary-algorithm-based system capable of searching for image processing pipelines optimized for specific image feature extraction tasks. We describe the improvements made to the GENIE software to allow multiple-feature classification and describe the application of this system to the automatic simultaneous classification of multiple features from MTI image data. We show the application of the multiple-feature classification technique to the problem of classifying lava flows on Mauna Loa volcano, Hawaii, using MTI image data and compare the classification results with standard supervised multiple-feature classification techniques.

  15. Identifying functional groups for response to disturbance in an abandoned pasture

    NASA Astrophysics Data System (ADS)

    Lavorel, Sandra; Touzard, Blaise; Lebreton, Jean-Dominique; Clément, Bernard

    1998-06-01

    In an abandoned pasture in Brittany, we compared artificial small-scale disturbances to natural disturbances by wild boar and undisturbed vegetation. We developed a multivariate statistical approach which analyses how species biological attributes explain the response of community composition to disturbances. This technique, which reconciles the inductive and deductive approaches for functional classifications, identifies groups of species with similar responses to disturbance and characterizes their biological profiles. After 5 months of recolonization, artificial disturbances had a greater species richness than undisturbed vegetation as a result of recruitment of new species without the exclusion of pre-existing matrix species. Species morphology, described by canopy structure, canopy height and lateral spread, explained a large part (16 %) of community response to disturbance. Regeneration strategies, described by life history, seed mass, dispersal agent, dormancy and the existence of vegetative multiplication, explained a smaller part of community response to disturbance (8 %). Artificial disturbances were characterized by therophyte and compact rosettes with moderately dormant seeds, including a number of Asteraceae and other early successional species. Natural disturbances were colonized by leafy guerrilla species without seed dormancy. Few species were tightly related to undisturbed vegetation and were essentially grasses with a phalanx rosette morphology. The functional classification obtained is consistent with the classification of the community into fugitives, regenerators and persistors. These groups are structured according to Grubb's model for temperate grasslands, with regenerators and persistors in the matrix and fugitives taking advantage of gaps open by small-scale disturbances. The conjunction of functional diversity and species diversity within functional groups is the key to resilience to disturbance, an important ecosystem function.

  16. Experimental analysis of computer system dependability

    NASA Technical Reports Server (NTRS)

    Iyer, Ravishankar, K.; Tang, Dong

    1993-01-01

    This paper reviews an area which has evolved over the past 15 years: experimental analysis of computer system dependability. Methodologies and advances are discussed for three basic approaches used in the area: simulated fault injection, physical fault injection, and measurement-based analysis. The three approaches are suited, respectively, to dependability evaluation in the three phases of a system's life: design phase, prototype phase, and operational phase. Before the discussion of these phases, several statistical techniques used in the area are introduced. For each phase, a classification of research methods or study topics is outlined, followed by discussion of these methods or topics as well as representative studies. The statistical techniques introduced include the estimation of parameters and confidence intervals, probability distribution characterization, and several multivariate analysis methods. Importance sampling, a statistical technique used to accelerate Monte Carlo simulation, is also introduced. The discussion of simulated fault injection covers electrical-level, logic-level, and function-level fault injection methods as well as representative simulation environments such as FOCUS and DEPEND. The discussion of physical fault injection covers hardware, software, and radiation fault injection methods as well as several software and hybrid tools including FIAT, FERARI, HYBRID, and FINE. The discussion of measurement-based analysis covers measurement and data processing techniques, basic error characterization, dependency analysis, Markov reward modeling, software-dependability, and fault diagnosis. The discussion involves several important issues studies in the area, including fault models, fast simulation techniques, workload/failure dependency, correlated failures, and software fault tolerance.

  17. Classification of Physical Activity: Information to Artificial Pancreas Control Systems in Real Time.

    PubMed

    Turksoy, Kamuran; Paulino, Thiago Marques Luz; Zaharieva, Dessi P; Yavelberg, Loren; Jamnik, Veronica; Riddell, Michael C; Cinar, Ali

    2015-10-06

    Physical activity has a wide range of effects on glucose concentrations in type 1 diabetes (T1D) depending on the type (ie, aerobic, anaerobic, mixed) and duration of activity performed. This variability in glucose responses to physical activity makes the development of artificial pancreas (AP) systems challenging. Automatic detection of exercise type and intensity, and its classification as aerobic or anaerobic would provide valuable information to AP control algorithms. This can be achieved by using a multivariable AP approach where biometric variables are measured and reported to the AP at high frequency. We developed a classification system that identifies, in real time, the exercise intensity and its reliance on aerobic or anaerobic metabolism and tested this approach using clinical data collected from 5 persons with T1D and 3 individuals without T1D in a controlled laboratory setting using a variety of common types of physical activity. The classifier had an average sensitivity of 98.7% for physiological data collected over a range of exercise modalities and intensities in these subjects. The classifier will be added as a new module to the integrated multivariable adaptive AP system to enable the detection of aerobic and anaerobic exercise for enhancing the accuracy of insulin infusion strategies during and after exercise. © 2015 Diabetes Technology Society.

  18. Multivariate Classification of Structural MRI Data Detects Chronic Low Back Pain

    PubMed Central

    Ung, Hoameng; Brown, Justin E.; Johnson, Kevin A.; Younger, Jarred; Hush, Julia; Mackey, Sean

    2014-01-01

    Chronic low back pain (cLBP) has a tremendous personal and socioeconomic impact, yet the underlying pathology remains a mystery in the majority of cases. An objective measure of this condition, that augments self-report of pain, could have profound implications for diagnostic characterization and therapeutic development. Contemporary research indicates that cLBP is associated with abnormal brain structure and function. Multivariate analyses have shown potential to detect a number of neurological diseases based on structural neuroimaging. Therefore, we aimed to empirically evaluate such an approach in the detection of cLBP, with a goal to also explore the relevant neuroanatomy. We extracted brain gray matter (GM) density from magnetic resonance imaging scans of 47 patients with cLBP and 47 healthy controls. cLBP was classified with an accuracy of 76% by support vector machine analysis. Primary drivers of the classification included areas of the somatosensory, motor, and prefrontal cortices—all areas implicated in the pain experience. Differences in areas of the temporal lobe, including bordering the amygdala, medial orbital gyrus, cerebellum, and visual cortex, were also useful for the classification. Our findings suggest that cLBP is characterized by a pattern of GM changes that can have discriminative power and reflect relevant pathological brain morphology. PMID:23246778

  19. Classification of Physical Activity

    PubMed Central

    Turksoy, Kamuran; Paulino, Thiago Marques Luz; Zaharieva, Dessi P.; Yavelberg, Loren; Jamnik, Veronica; Riddell, Michael C.; Cinar, Ali

    2015-01-01

    Physical activity has a wide range of effects on glucose concentrations in type 1 diabetes (T1D) depending on the type (ie, aerobic, anaerobic, mixed) and duration of activity performed. This variability in glucose responses to physical activity makes the development of artificial pancreas (AP) systems challenging. Automatic detection of exercise type and intensity, and its classification as aerobic or anaerobic would provide valuable information to AP control algorithms. This can be achieved by using a multivariable AP approach where biometric variables are measured and reported to the AP at high frequency. We developed a classification system that identifies, in real time, the exercise intensity and its reliance on aerobic or anaerobic metabolism and tested this approach using clinical data collected from 5 persons with T1D and 3 individuals without T1D in a controlled laboratory setting using a variety of common types of physical activity. The classifier had an average sensitivity of 98.7% for physiological data collected over a range of exercise modalities and intensities in these subjects. The classifier will be added as a new module to the integrated multivariable adaptive AP system to enable the detection of aerobic and anaerobic exercise for enhancing the accuracy of insulin infusion strategies during and after exercise. PMID:26443291

  20. Discriminant analysis of cardiovascular and respiratory variables for classification of road cyclists by specialty.

    PubMed

    Nikolić, Biljana; Martinović, Jelena; Matić, Milan; Stefanović, Đorđe

    2018-05-29

    Different variables determine the performance of cyclists, which brings up the question how these parameters may help in their classification by specialty. The aim of the study was to determine differences in cardiorespiratory parameters of male cyclists according to their specialty, flat rider (N=21), hill rider (N=35) and sprinter (N=20) and obtain the multivariate model for further cyclists classification by specialties, based on selected variables. Seventeen variables were measured at submaximal and maximum load on the cycle ergometer Cosmed E 400HK (Cosmed, Rome, Italy) (initial 100W with 25W increase, 90-100 rpm). Multivariate discriminant analysis was used to determine which variables group cyclists within their specialty, and to predict which variables can direct cyclists to a particular specialty. Among nine variables that statistically contribute to the discriminant power of the model, achieved power on the anaerobic threshold and the produced CO2 had the biggest impact. The obtained discriminatory model correctly classified 91.43% of flat riders, 85.71% of hill riders, while sprinters were classified completely correct (100%), i.e. 92.10% of examinees were correctly classified, which point out the strength of the discriminatory model. Respiratory indicators mostly contribute to the discriminant power of the model, which may significantly contribute to training practice and laboratory tests in future.

  1. Lava Morphology Classification of a Fast-Spreading Ridge Using Deep-Towed Sonar Data: East Pacific Rise

    NASA Astrophysics Data System (ADS)

    Meyer, J.; White, S.

    2005-05-01

    Classification of lava morphology on a regional scale contributes to the understanding of the distribution and extent of lava flows at a mid-ocean ridge. Seafloor classification is essential to understand the regional undersea environment at midocean ridges. In this study, the development of a classification scheme is found to identify and extract textural patterns of different lava morphologies along the East Pacific Rise using DSL-120 side-scan and ARGO camera imagery. Application of an accurate image classification technique to side-scan sonar allows us to expand upon the locally available visual ground reference data to make the first comprehensive regional maps of small-scale lava morphology present at a mid-ocean ridge. The submarine lava morphologies focused upon in this study; sheet flows, lobate flows, and pillow flows; have unique textures. Several algorithms were applied to the sonar backscatter intensity images to produce multiple textural image layers useful in distinguishing the different lava morphologies. The intensity and spatially enhanced images were then combined and applied to a hybrid classification technique. The hybrid classification involves two integrated classifiers, a rule-based expert system classifier and a machine learning classifier. The complementary capabilities of the two integrated classifiers provided a higher accuracy of regional seafloor classification compared to using either classifier alone. Once trained, the hybrid classifier can then be applied to classify neighboring images with relative ease. This classification technique has been used to map the lava morphology distribution and infer spatial variability of lava effusion rates along two segments of the East Pacific Rise, 17 deg S and 9 deg N. Future use of this technique may also be useful for attaining temporal information. Repeated documentation of morphology classification in this dynamic environment can be compared to detect regional seafloor change.

  2. Accuracy of land use change detection using support vector machine and maximum likelihood techniques for open-cast coal mining areas.

    PubMed

    Karan, Shivesh Kishore; Samadder, Sukha Ranjan

    2016-08-01

    One objective of the present study was to evaluate the performance of support vector machine (SVM)-based image classification technique with the maximum likelihood classification (MLC) technique for a rapidly changing landscape of an open-cast mine. The other objective was to assess the change in land use pattern due to coal mining from 2006 to 2016. Assessing the change in land use pattern accurately is important for the development and monitoring of coalfields in conjunction with sustainable development. For the present study, Landsat 5 Thematic Mapper (TM) data of 2006 and Landsat 8 Operational Land Imager (OLI)/Thermal Infrared Sensor (TIRS) data of 2016 of a part of Jharia Coalfield, Dhanbad, India, were used. The SVM classification technique provided greater overall classification accuracy when compared to the MLC technique in classifying heterogeneous landscape with limited training dataset. SVM exceeded MLC in handling a difficult challenge of classifying features having near similar reflectance on the mean signature plot, an improvement of over 11 % was observed in classification of built-up area, and an improvement of 24 % was observed in classification of surface water using SVM; similarly, the SVM technique improved the overall land use classification accuracy by almost 6 and 3 % for Landsat 5 and Landsat 8 images, respectively. Results indicated that land degradation increased significantly from 2006 to 2016 in the study area. This study will help in quantifying the changes and can also serve as a basis for further decision support system studies aiding a variety of purposes such as planning and management of mines and environmental impact assessment.

  3. Prediction of cause of death from forensic autopsy reports using text classification techniques: A comparative study.

    PubMed

    Mujtaba, Ghulam; Shuib, Liyana; Raj, Ram Gopal; Rajandram, Retnagowri; Shaikh, Khairunisa

    2018-07-01

    Automatic text classification techniques are useful for classifying plaintext medical documents. This study aims to automatically predict the cause of death from free text forensic autopsy reports by comparing various schemes for feature extraction, term weighing or feature value representation, text classification, and feature reduction. For experiments, the autopsy reports belonging to eight different causes of death were collected, preprocessed and converted into 43 master feature vectors using various schemes for feature extraction, representation, and reduction. The six different text classification techniques were applied on these 43 master feature vectors to construct a classification model that can predict the cause of death. Finally, classification model performance was evaluated using four performance measures i.e. overall accuracy, macro precision, macro-F-measure, and macro recall. From experiments, it was found that that unigram features obtained the highest performance compared to bigram, trigram, and hybrid-gram features. Furthermore, in feature representation schemes, term frequency, and term frequency with inverse document frequency obtained similar and better results when compared with binary frequency, and normalized term frequency with inverse document frequency. Furthermore, the chi-square feature reduction approach outperformed Pearson correlation, and information gain approaches. Finally, in text classification algorithms, support vector machine classifier outperforms random forest, Naive Bayes, k-nearest neighbor, decision tree, and ensemble-voted classifier. Our results and comparisons hold practical importance and serve as references for future works. Moreover, the comparison outputs will act as state-of-art techniques to compare future proposals with existing automated text classification techniques. Copyright © 2017 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.

  4. Classification of Ilex species based on metabolomic fingerprinting using nuclear magnetic resonance and multivariate data analysis.

    PubMed

    Choi, Young Hae; Sertic, Sarah; Kim, Hye Kyong; Wilson, Erica G; Michopoulos, Filippos; Lefeber, Alfons W M; Erkelens, Cornelis; Prat Kricun, Sergio D; Verpoorte, Robert

    2005-02-23

    The metabolomic analysis of 11 Ilex species, I. argentina, I. brasiliensis, I. brevicuspis, I. dumosavar. dumosa, I. dumosa var. guaranina, I. integerrima, I. microdonta, I. paraguariensis var. paraguariensis, I. pseudobuxus, I. taubertiana, and I. theezans, was carried out by NMR spectroscopy and multivariate data analysis. The analysis using principal component analysis and classification of the (1)H NMR spectra showed a clear discrimination of those samples based on the metabolites present in the organic and aqueous fractions. The major metabolites that contribute to the discrimination are arbutin, caffeine, phenylpropanoids, and theobromine. Among those metabolites, arbutin, which has not been reported yet as a constituent of Ilex species, was found to be a biomarker for I. argentina,I. brasiliensis, I. brevicuspis, I. integerrima, I. microdonta, I. pseudobuxus, I. taubertiana, and I. theezans. This reliable method based on the determination of a large number of metabolites makes the chemotaxonomical analysis of Ilex species possible.

  5. Craters on Earth, Moon, and Mars: Multivariate classification and mode of origin

    USGS Publications Warehouse

    Pike, R.J.

    1974-01-01

    Testing extraterrestrial craters and candidate terrestrial analogs for morphologic similitude is treated as a problem in numerical taxonomy. According to a principal-components solution and a cluster analysis, 402 representative craters on the Earth, the Moon, and Mars divide into two major classes of contrasting shapes and modes of origin. Craters of net accumulation of material (cratered lunar domes, Martian "calderas," and all terrestrial volcanoes except maars and tuff rings) group apart from craters of excavation (terrestrial meteorite impact and experimental explosion craters, typical Martian craters, and all other lunar craters). Maars and tuff rings belong to neither group but are transitional. The classification criteria are four independent attributes of topographic geometry derived from seven descriptive variables by the principal-components transformation. Morphometric differences between crater bowl and raised rim constitute the strongest of the four components. Although single topographic variables cannot confidently predict the genesis of individual extraterrestrial craters, multivariate statistical models constructed from several variables can distinguish consistently between large impact craters and volcanoes. ?? 1974.

  6. A conceptual weather-type classification procedure for the Philadelphia, Pennsylvania, area

    USGS Publications Warehouse

    McCabe, Gregory J.

    1990-01-01

    A simple method of weather-type classification, based on a conceptual model of pressure systems that pass through the Philadelphia, Pennsylvania, area, has been developed. The only inputs required for the procedure are daily mean wind direction and cloud cover, which are used to index the relative position of pressure systems and fronts to Philadelphia.Daily mean wind-direction and cloud-cover data recorded at Philadelphia, Pennsylvania, from January 1954 through August 1988 were used to categorize daily weather conditions. The conceptual weather types reflect changes in daily air and dew-point temperatures, and changes in monthly mean temperature and monthly and annual precipitation. The weather-type classification produced by using the conceptual model was similar to a classification produced by using a multivariate statistical classification procedure. Even though the conceptual weather types are derived from a small amount of data, they appear to account for the variability of daily weather patterns sufficiently to describe distinct weather conditions for use in environmental analyses of weather-sensitive processes.

  7. Neuroendocrine tumors of colon and rectum: validation of clinical and prognostic values of the World Health Organization 2010 grading classifications and European Neuroendocrine Tumor Society staging systems.

    PubMed

    Shen, Chaoyong; Yin, Yuan; Chen, Huijiao; Tang, Sumin; Yin, Xiaonan; Zhou, Zongguang; Zhang, Bo; Chen, Zhixin

    2017-03-28

    This study evaluated and compared the clinical and prognostic values of the grading criteria used by the World Health Organization (WHO) and the European Neuroendocrine Tumors Society (ENETS). Moreover, this work assessed the current best prognostic model for colorectal neuroendocrine tumors (CRNETs). The 2010 WHO classifications and the ENETS systems can both stratify the patients into prognostic groups, although the 2010 WHO criteria is more applicable to CRNET patients. Along with tumor location, the 2010 WHO criteria are important independent prognostic parameters for CRNETs in both univariate and multivariate analyses through Cox regression (P<0.05). Data from 192 consecutive patients histopathologically diagnosed with CRNETs and had undergone surgical resection from January 2009 to May 2016 in a single center were retrospectively analyzed. Findings suggest that the WHO classifications are superior over the ENETS classification system in predicting the prognosis of CRNETs. Additionally, the WHO classifications can be widely used in clinical practice.

  8. Classification of damage in structural systems using time series analysis and supervised and unsupervised pattern recognition techniques

    NASA Astrophysics Data System (ADS)

    Omenzetter, Piotr; de Lautour, Oliver R.

    2010-04-01

    Developed for studying long, periodic records of various measured quantities, time series analysis methods are inherently suited and offer interesting possibilities for Structural Health Monitoring (SHM) applications. However, their use in SHM can still be regarded as an emerging application and deserves more studies. In this research, Autoregressive (AR) models were used to fit experimental acceleration time histories from two experimental structural systems, a 3- storey bookshelf-type laboratory structure and the ASCE Phase II SHM Benchmark Structure, in healthy and several damaged states. The coefficients of the AR models were chosen as damage sensitive features. Preliminary visual inspection of the large, multidimensional sets of AR coefficients to check the presence of clusters corresponding to different damage severities was achieved using Sammon mapping - an efficient nonlinear data compression technique. Systematic classification of damage into states based on the analysis of the AR coefficients was achieved using two supervised classification techniques: Nearest Neighbor Classification (NNC) and Learning Vector Quantization (LVQ), and one unsupervised technique: Self-organizing Maps (SOM). This paper discusses the performance of AR coefficients as damage sensitive features and compares the efficiency of the three classification techniques using experimental data.

  9. Land Cover Analysis by Using Pixel-Based and Object-Based Image Classification Method in Bogor

    NASA Astrophysics Data System (ADS)

    Amalisana, Birohmatin; Rokhmatullah; Hernina, Revi

    2017-12-01

    The advantage of image classification is to provide earth’s surface information like landcover and time-series changes. Nowadays, pixel-based image classification technique is commonly performed with variety of algorithm such as minimum distance, parallelepiped, maximum likelihood, mahalanobis distance. On the other hand, landcover classification can also be acquired by using object-based image classification technique. In addition, object-based classification uses image segmentation from parameter such as scale, form, colour, smoothness and compactness. This research is aimed to compare the result of landcover classification and its change detection between parallelepiped pixel-based and object-based classification method. Location of this research is Bogor with 20 years range of observation from 1996 until 2016. This region is famous as urban areas which continuously change due to its rapid development, so that time-series landcover information of this region will be interesting.

  10. Application of multivariate statistical techniques in microbial ecology

    PubMed Central

    Paliy, O.; Shankar, V.

    2016-01-01

    Recent advances in high-throughput methods of molecular analyses have led to an explosion of studies generating large scale ecological datasets. Especially noticeable effect has been attained in the field of microbial ecology, where new experimental approaches provided in-depth assessments of the composition, functions, and dynamic changes of complex microbial communities. Because even a single high-throughput experiment produces large amounts of data, powerful statistical techniques of multivariate analysis are well suited to analyze and interpret these datasets. Many different multivariate techniques are available, and often it is not clear which method should be applied to a particular dataset. In this review we describe and compare the most widely used multivariate statistical techniques including exploratory, interpretive, and discriminatory procedures. We consider several important limitations and assumptions of these methods, and we present examples of how these approaches have been utilized in recent studies to provide insight into the ecology of the microbial world. Finally, we offer suggestions for the selection of appropriate methods based on the research question and dataset structure. PMID:26786791

  11. Weather patterns as a downscaling tool - evaluating their skill in stratifying local climate variables

    NASA Astrophysics Data System (ADS)

    Murawski, Aline; Bürger, Gerd; Vorogushyn, Sergiy; Merz, Bruno

    2016-04-01

    The use of a weather pattern based approach for downscaling of coarse, gridded atmospheric data, as usually obtained from the output of general circulation models (GCM), allows for investigating the impact of anthropogenic greenhouse gas emissions on fluxes and state variables of the hydrological cycle such as e.g. on runoff in large river catchments. Here we aim at attributing changes in high flows in the Rhine catchment to anthropogenic climate change. Therefore we run an objective classification scheme (simulated annealing and diversified randomisation - SANDRA, available from the cost733 classification software) on ERA20C reanalyses data and apply the established classification to GCMs from the CMIP5 project. After deriving weather pattern time series from GCM runs using forcing from all greenhouse gases (All-Hist) and using natural greenhouse gas forcing only (Nat-Hist), a weather generator will be employed to obtain climate data time series for the hydrological model. The parameters of the weather pattern classification (i.e. spatial extent, number of patterns, classification variables) need to be selected in a way that allows for good stratification of the meteorological variables that are of interest for the hydrological modelling. We evaluate the skill of the classification in stratifying meteorological data using a multi-variable approach. This allows for estimating the stratification skill for all meteorological variables together, not separately as usually done in existing similar work. The advantage of the multi-variable approach is to properly account for situations where e.g. two patterns are associated with similar mean daily temperature, but one pattern is dry while the other one is related to considerable amounts of precipitation. Thus, the separation of these two patterns would not be justified when considering temperature only, but is perfectly reasonable when accounting for precipitation as well. Besides that, the weather patterns derived from reanalyses data should be well represented in the All-Hist GCM runs in terms of e.g. frequency, seasonality, and persistence. In this contribution we show how to select the most appropriate weather pattern classification and how the classes derived from it are reflected in the GCMs.

  12. The Classification of Ground Roasted Decaffeinated Coffee Using UV-VIS Spectroscopy and SIMCA Method

    NASA Astrophysics Data System (ADS)

    Yulia, M.; Asnaning, A. R.; Suhandy, D.

    2018-05-01

    In this work, an investigation on the classification between decaffeinated and non- decaffeinated coffee samples using UV-VIS spectroscopy and SIMCA method was investigated. Total 200 samples of ground roasted coffee were used (100 samples for decaffeinated coffee and 100 samples for non-decaffeinated coffee). After extraction and dilution, the spectra of coffee samples solution were acquired using a UV-VIS spectrometer (Genesys™ 10S UV-VIS, Thermo Scientific, USA) in the range of 190-1100 nm. The multivariate analyses of the spectra were performed using principal component analysis (PCA) and soft independent modeling of class analogy (SIMCA). The SIMCA model showed that the classification between decaffeinated and non-decaffeinated coffee samples was detected with 100% sensitivity and specificity.

  13. When and why a colonoscopist should discontinue colonoscopy by himself?

    PubMed

    Gan, Tao; Yang, Jin-Lin; Wu, Jun-Chao; Wang, Yi-Ping; Yang, Li

    2015-07-07

    To investigate when and why a colonoscopist should discontinue incomplete colonoscopy by himself. In this cross-sectional study, 517 difficult colonoscope insertions (Grade C, Kudo's difficulty classification) screened from 37800 colonoscopy insertions were collected from April 2004 to June 2014 by three 4(th)-level (Kudo's classification) colonoscopists. The following common factors for the incomplete insertion were excluded: structural obstruction of the colon or rectum, insufficient colon cleansing, discontinuation due to patient's discomfort or pain, severe colon disease with a perforation risk (e.g., severe ischemic colonopathy). All the excluded patients were re-scheduled if permission was obtained from the patients whose intubation had failed. If the repeat intubations were still a failure because of the difficult operative techniques, those patients were also included in this study. The patient's age, sex, anesthesia and colonoscope type were recorded before colonoscopy. During the colonoscopic examination, the influencing factors of fixation, tortuosity, laxity and redundancy of the colon were assessed, and the insertion time (> 10 min or ≤ 10 min) were registered. The insertion time was analyzed by t-test, and other factors were analyzed by univariate and multivariate logistic regression. Three hundred and twenty-two (62.3%) of the 517 insertions were complete in the colonoscope insertion into the ileocecum, but 195 (37.7%) failed in the insertion. Fixation, tortuosity, laxity or redundancy occurred during the colonoscopic examination. Multivariate logistic regression analysis revealed that fixation (OR = 0.06, 95%CI: 0.03-0.16, P < 0.001) and tortuosity (OR = 0.04, 95%CI: 0.02-0.08, P < 0.001) were significantly related to the insertion into the ileocecum in the left hemicolon; multivariate logistic regression analysis also revealed that fixation (OR = 0.16, 95%CI: 0.06-0.39, P < 0.001), tortuosity (OR 0.23, 95%CI: 0.13-0.43, P < 0.001), redundancy (OR = 0.12, 95%CI: 0.05-0.26, P < 0.001) and sex (OR = 0.35, 95%CI: 0.20-0.63, P < 0.001) were significantly related to the insertion into the ileocecum in the right hemicolon. Prolonged insertion time (> 10 min) was an unfavorable factor for the insertion into the ileocecum. Colonoscopy should be discontinued if freedom of the colonoscope body's insertion and rotation is completely lost, and the insertion time is prolonged over 30 min.

  14. Discrimination of Gastrodia elata from Different Geographical Origin for Quality Evaluation Using Newly-Build Near Infrared Spectrum Coupled with Multivariate Analysis.

    PubMed

    Zuo, Yamin; Deng, Xuehua; Wu, Qing

    2018-05-04

    Discrimination of Gastrodia elata ( G. elata ) geographical origin is of great importance to pharmaceutical companies and consumers in China. this paper focuses on the feasibility of near infrared spectrum (NIRS) combined multivariate analysis as a rapid and non-destructive method to prove its fit for this purpose. Firstly, 16 batches of G. elata samples from four main-cultivation regions in China were quantified by traditional HPLC method. It showed that samples from different origins could not be efficiently differentiated by the contents of four phenolic compounds in this study. Secondly, the raw near infrared (NIR) spectra of those samples were acquired and two different pattern recognition techniques were used to classify the geographical origins. The results showed that with spectral transformation optimized, discriminant analysis (DA) provided 97% and 99% correct classification for the calibration and validation sets of samples from discriminating of four different main-cultivation regions, and provided 98% and 99% correct classifications for the calibration and validation sets of samples from eight different cities, respectively, which all performed better than the principal component analysis (PCA) method. Thirdly, as phenolic compounds content (PCC) is highly related with the quality of G. elata , synergy interval partial least squares (Si-PLS) was applied to build the PCC prediction model. The coefficient of determination for prediction (R p ²) of the Si-PLS model was 0.9209, and root mean square error for prediction (RMSEP) was 0.338. The two regions (4800 cm −1 ⁻5200 cm −1 , and 5600 cm −1 ⁻6000 cm −1 ) selected by Si-PLS corresponded to the absorptions of aromatic ring in the basic phenolic structure. It can be concluded that NIR spectroscopy combined with PCA, DA and Si-PLS would be a potential tool to provide a reference for the quality control of G. elata.

  15. Multivariate analysis as a key tool in chemotaxonomy of brinjal eggplant, African eggplants and wild related species.

    PubMed

    Haliński, Łukasz P; Samuels, John; Stepnowski, Piotr

    2017-12-01

    The brinjal eggplant (Solanum melongena L.) is an important vegetable species worldwide, while African eggplants (S. aethiopicum L., S. macrocarpon L.) are indigenous vegetable species of local significance. Taxonomy of eggplants and their wild relatives is complicated and still unclear. Hence, the objective of the study was to clarify taxonomic position of cultivars and landraces of brinjal, its wild relatives and African eggplant species and their wild ancestors using chemotaxonomic markers and multivariate analysis techniques for data processing, with special attention paid to the recognition of markers characteristic for each group of the plants. The total of 34 accessions belonging to 9 species from genus Solanum L. were used in the study. Chemotaxonomic analysis was based on the profiles of cuticular n-alkanes and methylalkanes, obtained using gas chromatography-mass spectrometry and gas chromatography with flame ionization detector. Standard hierarchical cluster analysis (HCA) and principal component analysis (PCA) were used for the classification, while the latter and two-way HCA allowed to identify markers responsible for the clustering of the species. Cultivars, landraces and wild forms of S. melongena were practically identical in terms of their taxonomic position. The results confirmed high and statistically significant distinctiveness of all African eggplant species from the brinjal eggplant. The latter was characterized mostly by abundant long chain hydrocarbons in the range of 34-37 carbon atoms. The differences between both African eggplant species were, however, also statistically significant; S. aethiopicum displayed the highest contribution of 2-methylalkanes to the total cuticular hydrocarbons, while S. macrocarpon was characterized by elevated n-alkanes in the range of 25-32 carbon atoms. Wild ancestors of both African eggplant species were identical with their cultivated relatives. Concluding, high usefulness of the chemotaxonomic approach in classification of this important group of plants was confirmed. Copyright © 2017 Elsevier Ltd. All rights reserved.

  16. Automatic topic identification of health-related messages in online health community using text classification.

    PubMed

    Lu, Yingjie

    2013-01-01

    To facilitate patient involvement in online health community and obtain informative support and emotional support they need, a topic identification approach was proposed in this paper for identifying automatically topics of the health-related messages in online health community, thus assisting patients in reaching the most relevant messages for their queries efficiently. Feature-based classification framework was presented for automatic topic identification in our study. We first collected the messages related to some predefined topics in a online health community. Then we combined three different types of features, n-gram-based features, domain-specific features and sentiment features to build four feature sets for health-related text representation. Finally, three different text classification techniques, C4.5, Naïve Bayes and SVM were adopted to evaluate our topic classification model. By comparing different feature sets and different classification techniques, we found that n-gram-based features, domain-specific features and sentiment features were all considered to be effective in distinguishing different types of health-related topics. In addition, feature reduction technique based on information gain was also effective to improve the topic classification performance. In terms of classification techniques, SVM outperformed C4.5 and Naïve Bayes significantly. The experimental results demonstrated that the proposed approach could identify the topics of online health-related messages efficiently.

  17. A comparison of autonomous techniques for multispectral image analysis and classification

    NASA Astrophysics Data System (ADS)

    Valdiviezo-N., Juan C.; Urcid, Gonzalo; Toxqui-Quitl, Carina; Padilla-Vivanco, Alfonso

    2012-10-01

    Multispectral imaging has given place to important applications related to classification and identification of objects from a scene. Because of multispectral instruments can be used to estimate the reflectance of materials in the scene, these techniques constitute fundamental tools for materials analysis and quality control. During the last years, a variety of algorithms has been developed to work with multispectral data, whose main purpose has been to perform the correct classification of the objects in the scene. The present study introduces a brief review of some classical as well as a novel technique that have been used for such purposes. The use of principal component analysis and K-means clustering techniques as important classification algorithms is here discussed. Moreover, a recent method based on the min-W and max-M lattice auto-associative memories, that was proposed for endmember determination in hyperspectral imagery, is introduced as a classification method. Besides a discussion of their mathematical foundation, we emphasize their main characteristics and the results achieved for two exemplar images conformed by objects similar in appearance, but spectrally different. The classification results state that the first components computed from principal component analysis can be used to highlight areas with different spectral characteristics. In addition, the use of lattice auto-associative memories provides good results for materials classification even in the cases where some spectral similarities appears in their spectral responses.

  18. Application of multivariable search techniques to the optimization of airfoils in a low speed nonlinear inviscid flow field

    NASA Technical Reports Server (NTRS)

    Hague, D. S.; Merz, A. W.

    1975-01-01

    Multivariable search techniques are applied to a particular class of airfoil optimization problems. These are the maximization of lift and the minimization of disturbance pressure magnitude in an inviscid nonlinear flow field. A variety of multivariable search techniques contained in an existing nonlinear optimization code, AESOP, are applied to this design problem. These techniques include elementary single parameter perturbation methods, organized search such as steepest-descent, quadratic, and Davidon methods, randomized procedures, and a generalized search acceleration technique. Airfoil design variables are seven in number and define perturbations to the profile of an existing NACA airfoil. The relative efficiency of the techniques are compared. It is shown that elementary one parameter at a time and random techniques compare favorably with organized searches in the class of problems considered. It is also shown that significant reductions in disturbance pressure magnitude can be made while retaining reasonable lift coefficient values at low free stream Mach numbers.

  19. Multivariable PID controller design tuning using bat algorithm for activated sludge process

    NASA Astrophysics Data System (ADS)

    Atikah Nor’Azlan, Nur; Asmiza Selamat, Nur; Mat Yahya, Nafrizuan

    2018-04-01

    The designing of a multivariable PID control for multi input multi output is being concerned with this project by applying four multivariable PID control tuning which is Davison, Penttinen-Koivo, Maciejowski and Proposed Combined method. The determination of this study is to investigate the performance of selected optimization technique to tune the parameter of MPID controller. The selected optimization technique is Bat Algorithm (BA). All the MPID-BA tuning result will be compared and analyzed. Later, the best MPID-BA will be chosen in order to determine which techniques are better based on the system performances in terms of transient response.

  20. A Discriminative Approach to EEG Seizure Detection

    PubMed Central

    Johnson, Ashley N.; Sow, Daby; Biem, Alain

    2011-01-01

    Seizures are abnormal sudden discharges in the brain with signatures represented in electroencephalograms (EEG). The efficacy of the application of speech processing techniques to discriminate between seizure and non-seizure states in EEGs is reported. The approach accounts for the challenges of unbalanced datasets (seizure and non-seizure), while also showing a system capable of real-time seizure detection. The Minimum Classification Error (MCE) algorithm, which is a discriminative learning algorithm with wide-use in speech processing, is applied and compared with conventional classification techniques that have already been applied to the discrimination between seizure and non-seizure states in the literature. The system is evaluated on 22 pediatric patients multi-channel EEG recordings. Experimental results show that the application of speech processing techniques and MCE compare favorably with conventional classification techniques in terms of classification performance, while requiring less computational overhead. The results strongly suggests the possibility of deploying the designed system at the bedside. PMID:22195192

  1. Meta-Analytic Structural Equation Modeling (MASEM): Comparison of the Multivariate Methods

    ERIC Educational Resources Information Center

    Zhang, Ying

    2011-01-01

    Meta-analytic Structural Equation Modeling (MASEM) has drawn interest from many researchers recently. In doing MASEM, researchers usually first synthesize correlation matrices across studies using meta-analysis techniques and then analyze the pooled correlation matrix using structural equation modeling techniques. Several multivariate methods of…

  2. The classification of gunshot residue using laser electrospray mass spectrometry and offline multivariate statistical analysis

    USDA-ARS?s Scientific Manuscript database

    Nonresonant laser vaporization combined with high-resolution electrospray time-of-flight mass spectrometry enables analysis of a casing after discharge of a firearm revealing organic signature molecules including methyl centralite (MC), diphenylamine (DPA), N-nitrosodiphenylamine (N-NO-DPA), 4-nitro...

  3. Mapping the Diversity among Runaways: A Descriptive Multivariate Analysis of Selected Social Psychological Background Conditions.

    ERIC Educational Resources Information Center

    Brennan, Tim

    1980-01-01

    A review of prior classification systems of runaways is followed by a descriptive taxonomy of runaways developed using cluster-analytic methods. The empirical types illustrate patterns of weakness in bonds between runaways and families, schools, or peer relationships. (Author)

  4. Influence of intravenous opioid dose on postoperative ileus.

    PubMed

    Barletta, Jeffrey F; Asgeirsson, Theodor; Senagore, Anthony J

    2011-07-01

    Intravenous opioids represent a major component in the pathophysiology of postoperative ileus (POI). However, the most appropriate measure and threshold to quantify the association between opioid dose (eg, average daily, cumulative, maximum daily) and POI remains unknown. To evaluate the relationship between opioid dose, POI, and length of stay (LOS) and identify the opioid measure that was most strongly associated with POI. Consecutive patients admitted to a community teaching hospital who underwent elective colorectal surgery by any technique with an enhanced-recovery protocol postoperatively were retrospectively identified. Patients were excluded if they received epidural analgesia, developed a major intraabdominal complication or medical complication, or had a prolonged workup prior to surgery. Intravenous opioid doses were quantified and converted to hydromorphone equivalents. Classification and regression tree (CART) analysis was used to determine the dosing threshold for the opioid measure most associated with POI and define high versus low use of opioids. Risk factors for POI and prolonged LOS were determined through multivariate analysis. The incidence of POI in 279 patients was 8.6%. CART analysis identified a maximum daily intravenous hydromorphone dose of 2 mg or more as the opioid measure most associated with POI. Multivariate analysis revealed maximum daily hydromorphone dose of 2 mg or more (p = 0.034), open surgical technique (p = 0.045), and days of intravenous narcotic therapy (p = 0.003) as significant risk factors for POI. Variables associated with increased LOS were POI (p < 0.001), maximum daily hydromorphone dose of 2 mg or more (p < 0.001), and age (p = 0.005); laparoscopy (p < 0.001) was associated with a decreased LOS. Intravenous opioid therapy is significantly associated with POI and prolonged LOS, particularly when the maximum hydromorphone dose per day exceeds 2 mg. Clinicians should consider alternative, nonopioid-based pain management options when this occurs.

  5. Geomorphic Classification and Assessment of Channel Dynamics in the Missouri National Recreational River, South Dakota and Nebraska

    USGS Publications Warehouse

    Elliott, Caroline M.; Jacobson, Robert B.

    2006-01-01

    A multiscale geomorphic classification was established for the 39-mile, 59-mile, and adjacent segments of the Missouri National Recreational River administered by the National Park Service in South Dakota and Nebraska. The objective of the classification was to define naturally occurring clusters of geomorphic characteristics that would be indicative of discrete sets of geomorphic processes, with the intent that such a classification would be useful in river-management and rehabilitation decisions. The statistical classification was based on geomorphic characteristics of the river collected from 1999 orthophotography and the persistence of classified units was evaluated by comparison with similar datasets for 2003 and 2004 and by evaluating variation of bank erosion rates by geomorphic class. Changes in channel location and form were also explored using imagery and maps from 1993-2004, 1941 and 1894. The multivariate classification identified a hierarchy of naturally occurring clusters of reach-scale geomorphic characteristics. The simplest level of the hierarchy divides the river from segments into discrete reaches characterized by single and multithread channels and additional hierarchical levels established 4-part and 10-part classifications. The classification system presents a physical framework that can be applied to prioritization and design of bank stabilization projects, design of habitat rehabilitation projects, and stratification of monitoring and assessment sampling programs.

  6. Fatty acid methyl ester analysis to identify sources of soil in surface water.

    PubMed

    Banowetz, Gary M; Whittaker, Gerald W; Dierksen, Karen P; Azevedo, Mark D; Kennedy, Ann C; Griffith, Stephen M; Steiner, Jeffrey J

    2006-01-01

    Efforts to improve land-use practices to prevent contamination of surface waters with soil are limited by an inability to identify the primary sources of soil present in these waters. We evaluated the utility of fatty acid methyl ester (FAME) profiles of dry reference soils for multivariate statistical classification of soils collected from surface waters adjacent to agricultural production fields and a wooded riparian zone. Trials that compared approaches to concentrate soil from surface water showed that aluminum sulfate precipitation provided comparable yields to that obtained by vacuum filtration and was more suitable for handling large numbers of samples. Fatty acid methyl ester profiles were developed from reference soils collected from contrasting land uses in different seasons to determine whether specific fatty acids would consistently serve as variables in multivariate statistical analyses to permit reliable classification of soils. We used a Bayesian method and an independent iterative process to select appropriate fatty acids and found that variable selection was strongly impacted by the season during which soil was collected. The apparent seasonal variation in the occurrence of marker fatty acids in FAME profiles from reference soils prevented preparation of a standardized set of variables. Nevertheless, accurate classification of soil in surface water was achieved utilizing fatty acid variables identified in seasonally matched reference soils. Correlation analysis of entire chromatograms and subsequent discriminant analyses utilizing a restricted number of fatty acid variables showed that FAME profiles of soils exposed to the aquatic environment still had utility for classification at least 1 wk after submersion.

  7. Multivariate pattern analysis of fMRI data reveals deficits in distributed representations in schizophrenia

    PubMed Central

    Yoon, Jong H.; Tamir, Diana; Minzenberg, Michael J.; Ragland, J. Daniel; Ursu, Stefan; Carter, Cameron S.

    2009-01-01

    Background Multivariate pattern analysis is an alternative method of analyzing fMRI data, which is capable of decoding distributed neural representations. We applied this method to test the hypothesis of the impairment in distributed representations in schizophrenia. We also compared the results of this method with traditional GLM-based univariate analysis. Methods 19 schizophrenia and 15 control subjects viewed two runs of stimuli--exemplars of faces, scenes, objects, and scrambled images. To verify engagement with stimuli, subjects completed a 1-back matching task. A multi-voxel pattern classifier was trained to identify category-specific activity patterns on one run of fMRI data. Classification testing was conducted on the remaining run. Correlation of voxel-wise activity across runs evaluated variance over time in activity patterns. Results Patients performed the task less accurately. This group difference was reflected in the pattern analysis results with diminished classification accuracy in patients compared to controls, 59% and 72% respectively. In contrast, there was no group difference in GLM-based univariate measures. In both groups, classification accuracy was significantly correlated with behavioral measures. Both groups showed highly significant correlation between inter-run correlations and classification accuracy. Conclusions Distributed representations of visual objects are impaired in schizophrenia. This impairment is correlated with diminished task performance, suggesting that decreased integrity of cortical activity patterns is reflected in impaired behavior. Comparisons with univariate results suggest greater sensitivity of pattern analysis in detecting group differences in neural activity and reduced likelihood of non-specific factors driving these results. PMID:18822407

  8. The Effect of the Multivariate Box-Cox Transformation on the Power of MANOVA.

    ERIC Educational Resources Information Center

    Kirisci, Levent; Hsu, Tse-Chi

    Most of the multivariate statistical techniques rely on the assumption of multivariate normality. The effects of non-normality on multivariate tests are assumed to be negligible when variance-covariance matrices and sample sizes are equal. Therefore, in practice, investigators do not usually attempt to remove non-normality. In this simulation…

  9. Evaluation of Meterorite Amono Acid Analysis Data Using Multivariate Techniques

    NASA Technical Reports Server (NTRS)

    McDonald, G.; Storrie-Lombardi, M.; Nealson, K.

    1999-01-01

    The amino acid distributions in the Murchison carbonaceous chondrite, Mars meteorite ALH84001, and ice from the Allan Hills region of Antarctica are shown, using a multivariate technique known as Principal Component Analysis (PCA), to be statistically distinct from the average amino acid compostion of 101 terrestrial protein superfamilies.

  10. A discrete wavelet based feature extraction and hybrid classification technique for microarray data analysis.

    PubMed

    Bennet, Jaison; Ganaprakasam, Chilambuchelvan Arul; Arputharaj, Kannan

    2014-01-01

    Cancer classification by doctors and radiologists was based on morphological and clinical features and had limited diagnostic ability in olden days. The recent arrival of DNA microarray technology has led to the concurrent monitoring of thousands of gene expressions in a single chip which stimulates the progress in cancer classification. In this paper, we have proposed a hybrid approach for microarray data classification based on nearest neighbor (KNN), naive Bayes, and support vector machine (SVM). Feature selection prior to classification plays a vital role and a feature selection technique which combines discrete wavelet transform (DWT) and moving window technique (MWT) is used. The performance of the proposed method is compared with the conventional classifiers like support vector machine, nearest neighbor, and naive Bayes. Experiments have been conducted on both real and benchmark datasets and the results indicate that the ensemble approach produces higher classification accuracy than conventional classifiers. This paper serves as an automated system for the classification of cancer and can be applied by doctors in real cases which serve as a boon to the medical community. This work further reduces the misclassification of cancers which is highly not allowed in cancer detection.

  11. Electroencephalogram-based decoding cognitive states using convolutional neural network and likelihood ratio based score fusion

    PubMed Central

    2017-01-01

    Electroencephalogram (EEG)-based decoding human brain activity is challenging, owing to the low spatial resolution of EEG. However, EEG is an important technique, especially for brain–computer interface applications. In this study, a novel algorithm is proposed to decode brain activity associated with different types of images. In this hybrid algorithm, convolutional neural network is modified for the extraction of features, a t-test is used for the selection of significant features and likelihood ratio-based score fusion is used for the prediction of brain activity. The proposed algorithm takes input data from multichannel EEG time-series, which is also known as multivariate pattern analysis. Comprehensive analysis was conducted using data from 30 participants. The results from the proposed method are compared with current recognized feature extraction and classification/prediction techniques. The wavelet transform-support vector machine method is the most popular currently used feature extraction and prediction method. This method showed an accuracy of 65.7%. However, the proposed method predicts the novel data with improved accuracy of 79.9%. In conclusion, the proposed algorithm outperformed the current feature extraction and prediction method. PMID:28558002

  12. Multivariate Feature Selection of Image Descriptors Data for Breast Cancer with Computer-Assisted Diagnosis

    PubMed Central

    Galván-Tejada, Carlos E.; Zanella-Calzada, Laura A.; Galván-Tejada, Jorge I.; Celaya-Padilla, José M.; Gamboa-Rosales, Hamurabi; Garza-Veloz, Idalia; Martinez-Fierro, Margarita L.

    2017-01-01

    Breast cancer is an important global health problem, and the most common type of cancer among women. Late diagnosis significantly decreases the survival rate of the patient; however, using mammography for early detection has been demonstrated to be a very important tool increasing the survival rate. The purpose of this paper is to obtain a multivariate model to classify benign and malignant tumor lesions using a computer-assisted diagnosis with a genetic algorithm in training and test datasets from mammography image features. A multivariate search was conducted to obtain predictive models with different approaches, in order to compare and validate results. The multivariate models were constructed using: Random Forest, Nearest centroid, and K-Nearest Neighbor (K-NN) strategies as cost function in a genetic algorithm applied to the features in the BCDR public databases. Results suggest that the two texture descriptor features obtained in the multivariate model have a similar or better prediction capability to classify the data outcome compared with the multivariate model composed of all the features, according to their fitness value. This model can help to reduce the workload of radiologists and present a second opinion in the classification of tumor lesions. PMID:28216571

  13. Multivariate Feature Selection of Image Descriptors Data for Breast Cancer with Computer-Assisted Diagnosis.

    PubMed

    Galván-Tejada, Carlos E; Zanella-Calzada, Laura A; Galván-Tejada, Jorge I; Celaya-Padilla, José M; Gamboa-Rosales, Hamurabi; Garza-Veloz, Idalia; Martinez-Fierro, Margarita L

    2017-02-14

    Breast cancer is an important global health problem, and the most common type of cancer among women. Late diagnosis significantly decreases the survival rate of the patient; however, using mammography for early detection has been demonstrated to be a very important tool increasing the survival rate. The purpose of this paper is to obtain a multivariate model to classify benign and malignant tumor lesions using a computer-assisted diagnosis with a genetic algorithm in training and test datasets from mammography image features. A multivariate search was conducted to obtain predictive models with different approaches, in order to compare and validate results. The multivariate models were constructed using: Random Forest, Nearest centroid, and K-Nearest Neighbor (K-NN) strategies as cost function in a genetic algorithm applied to the features in the BCDR public databases. Results suggest that the two texture descriptor features obtained in the multivariate model have a similar or better prediction capability to classify the data outcome compared with the multivariate model composed of all the features, according to their fitness value. This model can help to reduce the workload of radiologists and present a second opinion in the classification of tumor lesions.

  14. Artificial Neural Networks in Policy Research: A Current Assessment.

    ERIC Educational Resources Information Center

    Woelfel, Joseph

    1993-01-01

    Suggests that artificial neural networks (ANNs) exhibit properties that promise usefulness for policy researchers. Notes that ANNs have found extensive use in areas once reserved for multivariate statistical programs such as regression and multiple classification analysis and are developing an extensive community of advocates for processing text…

  15. MMPI Modal Profiles in a Juvenile Delinquent Population.

    ERIC Educational Resources Information Center

    Pickett, Lawrence K., Jr.

    1981-01-01

    The MMPI results obtained from 245 adolescent males referred to the evaluation unit of a Juvenile Court were submitted to a multivariate classification system. By correlating individual subject profiles with the modal profiles, six membership groups were formed. No relationship was found between group membership and age or race. (Author)

  16. Text Classification for Organizational Researchers

    PubMed Central

    Kobayashi, Vladimer B.; Mol, Stefan T.; Berkers, Hannah A.; Kismihók, Gábor; Den Hartog, Deanne N.

    2017-01-01

    Organizations are increasingly interested in classifying texts or parts thereof into categories, as this enables more effective use of their information. Manual procedures for text classification work well for up to a few hundred documents. However, when the number of documents is larger, manual procedures become laborious, time-consuming, and potentially unreliable. Techniques from text mining facilitate the automatic assignment of text strings to categories, making classification expedient, fast, and reliable, which creates potential for its application in organizational research. The purpose of this article is to familiarize organizational researchers with text mining techniques from machine learning and statistics. We describe the text classification process in several roughly sequential steps, namely training data preparation, preprocessing, transformation, application of classification techniques, and validation, and provide concrete recommendations at each step. To help researchers develop their own text classifiers, the R code associated with each step is presented in a tutorial. The tutorial draws from our own work on job vacancy mining. We end the article by discussing how researchers can validate a text classification model and the associated output. PMID:29881249

  17. Quality characterization and pollution source identification of surface water using multivariate statistical techniques, Nalagarh Valley, Himachal Pradesh, India

    NASA Astrophysics Data System (ADS)

    Herojeet, Rajkumar; Rishi, Madhuri S.; Lata, Renu; Dolma, Konchok

    2017-09-01

    Sirsa River flows through the central part of the Nalagarh valley, belongs to the rapid industrial belt of Baddi, Barotiwala and Nalagarh (BBN). The appraisal of surface water quality to ascertain its utility in such ecologically sensitive areas is need of the hour. The present study envisages the application of multivariate analysis, water utility class and conventional graphical representation to reveal the hidden factor responsible for deterioration of water quality and determine the hydrochemical facies and its evolution processes of water types in Nalagarh valley, India. The quality assessment is made by estimating pH, electrical conductivity (EC), total dissolved solids (TDS), total hardness, major ions (Na+, K+, Ca2+, Mg2+, HCO3 -, Cl-, SO4 2-, NO3 - and PO4 3-), dissolved oxygen (DO), biological oxygen demand (BOD) and total coliform (TC) to determine its suitability for drinking and domestic purposes. The parameters like pH, TDS, TH, Ca2+, HCO3 -, Cl-, SO4 2-, NO3 - are within the desirable limit as per Bureau of Indian Standards (Indian Standard Drinking Water Specification (Second Edition) IS:10500. Indian Standard Institute, New Delhi, pp 1-18, 2012). Mg2+, Na+ and K+ ions for pre monsoon and EC during pre and post monsoon at few sites and approx 40% samples of BOD and TC for both seasons exceeds the permissible limits indicate organic contamination from human activities. Water quality classification for designated use indicates that maximum surface water samples are not suitable for drinking water source without conventional treatment. The result of piper trillinear and Chadha's diagram classified majority of surface water samples for both seasons fall in the fields of Ca2+-Mg2+-HCO3 - water type indicating temporary hardness. PCA and CA reveal that the surface water chemistry is influenced by natural factors such as weathering of minerals, ion exchange processes and anthropogenic factors. Thus, the present paper illustrates the importance of multivariate techniques for reliable quality characterization of surface water quality to develop effective pollution reduction strategies and maintain a fine balance between the industrialization and ecological integrity.

  18. Predictors of unsuccessful outcome in cemented femoral revisions using bone impaction grafting; Cox regression analysis of 208 cases.

    PubMed

    Te Stroet, Martijn A J; Rijnen, Wim H C; Gardeniers, Jean W M; Schreurs, B Willem; Hannink, Gerjon

    2016-09-29

    Despite improvements in the technique of femoral impaction bone grafting, reconstruction failures still can occur. Therefore, the aim of our study was to determine risk factors for the endpoint re-revision for any reason. We used prospectively collected demographic, clinical and surgical data of all 202 patients who underwent 208 femoral revisions using the X-change Femoral Revision System (Stryker-Howmedica), fresh-frozen morcellised allograft and a cemented polished Exeter stem in our department from 1991 to 2007. Univariable and multivariable Cox regression analyses were performed to identify potential factors associated with re-revision. The mean follow-up was 10.6 (5-21) years. The cumulative re-revision rate was 6.3% (13/208). After univariable selection, sex, age, body mass index (BMI), American Association of Anesthesiologists (ASA) classification, type of removed femoral component, and mesh used for reconstruction were included in multivariable regression analysis.In the multivariable analysis, BMI was the only factor that was significantly associated with the risk of re-revision after bone impaction grafting (BMI ≥30 vs. BMI <30, HR = 6.54 [95% CI 1.89-22.65]; p = 0.003). BMI was the only factor associated with the risk of re-revision for any reason. Besides BMI also other factors, such as Endoklinik score and the type of removed femoral component, can provide guidance in the process of preclinical decision making. With the knowledge obtained from this study, preoperative patient selection, informed consent, and treatment protocols can be better adjusted to the individual patient who needs to undergo a femoral revision with impaction bone grafting.

  19. Multivariate evoked response detection based on the spectral F-test.

    PubMed

    Rocha, Paulo Fábio F; Felix, Leonardo B; Miranda de Sá, Antonio Mauricio F L; Mendes, Eduardo M A M

    2016-05-01

    Objective response detection techniques, such as magnitude square coherence, component synchrony measure, and the spectral F-test, have been used to automate the detection of evoked responses. The performance of these detectors depends on both the signal-to-noise ratio (SNR) and the length of the electroencephalogram (EEG) signal. Recently, multivariate detectors were developed to increase the detection rate even in the case of a low signal-to-noise ratio or of short data records originated from EEG signals. In this context, an extension to the multivariate case of the spectral F-test detector is proposed. The performance of this technique is assessed using Monte Carlo. As an example, EEG data from 12 subjects during photic stimulation is used to demonstrate the usefulness of the proposed detector. The multivariate method showed detection rates consistently higher than those ones when only one signal was used. It is shown that the response detection in EEG signals with the multivariate technique was statistically significant if two or more EEG derivations were used. Copyright © 2016 Elsevier B.V. All rights reserved.

  20. [Modern bacterial taxonomy: techniques review--application to bacteria that nodulate leguminous plants (BNL)].

    PubMed

    Zakhia, Frédéric; de Lajudie, Philippe

    2006-03-01

    Taxonomy is the science that studies the relationships between organisms. It comprises classification, nomenclature, and identification. Modern bacterial taxonomy is polyphasic. This means that it is based on several molecular techniques, each one retrieving the information at different cellular levels (proteins, fatty acids, DNA...). The obtained results are combined and analysed to reach a "consensus taxonomy" of a microorganism. Until 1970, a small number of classification techniques were available for microbiologists (mainly phenotypic characterization was performed: a legume species nodulation ability for a Rhizobium, for example). With the development of techniques based on polymerase chain reaction for characterization, the bacterial taxonomy has undergone great changes. In particular, the classification of the legume nodulating bacteria has been repeatedly modified over the last 20 years. We present here a review of the currently used molecular techniques in bacterial characterization, with examples of application of these techniques for the study of the legume nodulating bacteria.

  1. Transition from a multiport technique to a single-port technique for lung cancer surgery: is lymph node dissection inferior using the single-port technique?†.

    PubMed

    Liu, Chia-Chuan; Shih, Chih-Shiun; Pennarun, Nicolas; Cheng, Chih-Tao

    2016-01-01

    The feasibility and radicalism of lymph node dissection for lung cancer surgery by a single-port technique has frequently been challenged. We performed a retrospective cohort study to investigate this issue. Two chest surgeons initiated multiple-port thoracoscopic surgery in a 180-bed cancer centre in 2005 and shifted to a single-port technique gradually after 2010. Data, including demographic and clinical information, from 389 patients receiving multiport thoracoscopic lobectomy or segmentectomy and 149 consecutive patients undergoing either single-port lobectomy or segmentectomy for primary non-small-cell lung cancer were retrieved and entered for statistical analysis by multivariable linear regression models and Box-Cox transformed multivariable analysis. The mean number of total dissected lymph nodes in the lobectomy group was 28.5 ± 11.7 for the single-port group versus 25.2 ± 11.3 for the multiport group; the mean number of total dissected lymph nodes in the segmentectomy group was 19.5 ± 10.8 for the single-port group versus 17.9 ± 10.3 for the multiport group. In linear multivariable and after Box-Cox transformed multivariable analyses, the single-port approach was still associated with a higher total number of dissected lymph nodes. The total number of dissected lymph nodes for primary lung cancer surgery by single-port video-assisted thoracoscopic surgery (VATS) was higher than by multiport VATS in univariable, multivariable linear regression and Box-Cox transformed multivariable analyses. This study confirmed that highly effective lymph node dissection could be achieved through single-port VATS in our setting. © The Author 2015. Published by Oxford University Press on behalf of the European Association for Cardio-Thoracic Surgery. All rights reserved.

  2. Applications of remote sensing, volume 3

    NASA Technical Reports Server (NTRS)

    Landgrebe, D. A. (Principal Investigator)

    1977-01-01

    The author has identified the following significant results. Of the four change detection techniques (post classification comparison, delta data, spectral/temporal, and layered spectral temporal), the post classification comparison was selected for further development. This was based upon test performances of the four change detection method, straightforwardness of the procedures, and the output products desired. A standardized modified, supervised classification procedure for analyzing the Texas coastal zone data was compiled. This procedure was developed in order that all quadrangles in the study are would be classified using similar analysis techniques to allow for meaningful comparisons and evaluations of the classifications.

  3. Detection and characterization of glaucoma-like canine retinal tissues using Raman spectroscopy

    NASA Astrophysics Data System (ADS)

    Wang, Qi; Grozdanic, Sinisa D.; Harper, Matthew M.; Hamouche, Karl; Hamouche, Nicholas; Kecova, Helga; Lazic, Tatjana; Hernandez-Merino, Elena; Yu, Chenxu

    2013-06-01

    Early detection of pathological changes and progression in glaucoma and other neuroretinal diseases remains a great challenge and is critical to reduce permanent structural and functional retina and optic nerve damage. Raman spectroscopy is a sensitive technique that provides rapid biochemical characterization of tissues in a nondestructive and noninvasive fashion. In this study, spectroscopic analysis was conducted on the retinal tissues of seven beagles with acute elevation of intraocular pressure (AEIOP), six beagles with compressive optic neuropathy (CON), and five healthy beagles. Spectroscopic markers were identified associated with the different neuropathic conditions. Furthermore, the Raman spectra were subjected to multivariate discriminate analysis to classify independent tissue samples into diseased/healthy categories. The multivariate discriminant model yielded an average optimal classification accuracy of 72.6% for AEIOP and 63.4% for CON with 20 principal components being used that accounted for 87% of the total variance in the data set. A strong correlation (R2>0.92) was observed between pattern electroretinography characteristics of AEIOP dogs and Raman separation distance that measures the separation of spectra of diseased tissues from normal tissues; however, the underlining mechanism of this correlation remains to be understood. Since AEIOP mimics the pathological symptoms of acute/early-stage glaucoma, it was demonstrated that Raman spectroscopic screening has the potential to become a powerful tool for the detection and characterization of early-stage disease.

  4. A multivariate geostatistical methodology to delineate areas of potential interest for future sedimentary gold exploration

    PubMed Central

    Goovaerts, P.; Albuquerque, Teresa; Antunes, Margarida

    2015-01-01

    This paper describes a multivariate geostatistical methodology to delineate areas of potential interest for future sedimentary gold exploration, with an application to an abandoned sedimentary gold mining region in Portugal. The main challenge was the existence of only a dozen gold measurements confined to the grounds of the old gold mines, which precluded the application of traditional interpolation techniques, such as cokriging. The analysis could, however, capitalize on 376 stream sediment samples that were analyzed for twenty two elements. Gold (Au) was first predicted at all 376 locations using linear regression (R2=0.798) and four metals (Fe, As, Sn and W), which are known to be mostly associated with the local gold’s paragenesis. One hundred realizations of the spatial distribution of gold content were generated using sequential indicator simulation and a soft indicator coding of regression estimates, to supplement the hard indicator coding of gold measurements. Each simulated map then underwent a local cluster analysis to identify significant aggregates of low or high values. The one hundred classified maps were processed to derive the most likely classification of each simulated node and the associated probability of occurrence. Examining the distribution of the hot-spots and cold-spots reveals a clear enrichment in Au along the Erges River downstream from the old sedimentary mineralization. PMID:27777638

  5. Identification of the compositional changes in Orthosiphon stamineus leaves triggered by different drying techniques using 1 H NMR metabolomics.

    PubMed

    Pariyani, Raghunath; Ismail, Intan Safinar; Ahmad Azam, Amalina; Abas, Faridah; Shaari, Khozirah

    2017-09-01

    Java tea is a well-known herbal infusion prepared from the leaves of Orthosiphon stamineus (OS). The biological properties of tea are in direct correlation with the primary and secondary metabolite composition, which in turn largely depends on the choice of drying method. Herein, the impact of three commonly used drying methods, i.e. shade, microwave and freeze drying, on the metabolite composition and antioxidant activity of OS leaves was investigated using proton nuclear magnetic resonance ( 1 H NMR) spectroscopy combined with multivariate classification and regression analysis tools. A total of 31 constituents comprising primary and secondary metabolites belonging to the chemical classes of fatty acids, amino acids, sugars, terpenoids and phenolic compounds were identified. Shade-dried leaves were identified to possess the highest concentrations of bioactive secondary metabolites such as chlorogenic acid, caffeic acid, luteolin, orthosiphol and apigenin, followed by microwave-dried samples. Freeze-dried leaves had higher concentrations of choline, amino acids leucine, alanine and glutamine and sugars such as fructose and α-glucose, but contained the lowest levels of secondary metabolites. Metabolite profiling coupled with multivariate analysis identified shade drying as the best method to prepare OS leaves as Java tea or to include in traditional medicine preparation. © 2017 Society of Chemical Industry. © 2017 Society of Chemical Industry.

  6. Mini-DIAL system measurements coupled with multivariate data analysis to identify TIC and TIM simulants: preliminary absorption database analysis.

    NASA Astrophysics Data System (ADS)

    Gaudio, P.; Malizia, A.; Gelfusa, M.; Martinelli, E.; Di Natale, C.; Poggi, L. A.; Bellecci, C.

    2017-01-01

    Nowadays Toxic Industrial Components (TICs) and Toxic Industrial Materials (TIMs) are one of the most dangerous and diffuse vehicle of contamination in urban and industrial areas. The academic world together with the industrial and military one are working on innovative solutions to monitor the diffusion in atmosphere of such pollutants. In this phase the most common commercial sensors are based on “point detection” technology but it is clear that such instruments cannot satisfy the needs of the smart cities. The new challenge is developing stand-off systems to continuously monitor the atmosphere. Quantum Electronics and Plasma Physics (QEP) research group has a long experience in laser system development and has built two demonstrators based on DIAL (Differential Absorption of Light) technology could be able to identify chemical agents in atmosphere. In this work the authors will present one of those DIAL system, the miniaturized one, together with the preliminary results of an experimental campaign conducted on TICs and TIMs simulants in cell with aim of use the absorption database for the further atmospheric an analysis using the same DIAL system. The experimental results are analysed with standard multivariate data analysis technique as Principal Component Analysis (PCA) to develop a classification model aimed at identifying organic chemical compound in atmosphere. The preliminary results of absorption coefficients of some chemical compound are shown together pre PCA analysis.

  7. A multivariate study of mangrove morphology (Rhizophora mangle) using both above and below-water plant architecture

    USGS Publications Warehouse

    Brooks, R.A.; Bell, S.S.

    2005-01-01

    A descriptive study of the architecture of the red mangrove, Rhizophora mangle L., habitat of Tampa Bay, FL, was conducted to assess if plant architecture could be used to discriminate overwash from fringing forest type. Seven above-water (e.g., tree height, diameter at breast height, and leaf area) and 10 below-water (e.g., root density, root complexity, and maximum root order) architectural features were measured in eight mangrove stands. A multivariate technique (discriminant analysis) was used to test the ability of different models comprising above-water, below-water, or whole tree architecture to classify forest type. Root architectural features appear to be better than classical forestry measurements at discriminating between fringing and overwash forests but, regardless of the features loaded into the model, misclassification rates were high as forest type was only correctly classified in 66% of the cases. Based upon habitat architecture, the results of this study do not support a sharp distinction between overwash and fringing red mangrove forests in Tampa Bay but rather indicate that the two are architecturally undistinguishable. Therefore, within this northern portion of the geographic range of red mangroves, a more appropriate classification system based upon architecture may be one in which overwash and fringing forest types are combined into a single, "tide dominated" category. ?? 2005 Elsevier Ltd. All rights reserved.

  8. A comparative analysis of swarm intelligence techniques for feature selection in cancer classification.

    PubMed

    Gunavathi, Chellamuthu; Premalatha, Kandasamy

    2014-01-01

    Feature selection in cancer classification is a central area of research in the field of bioinformatics and used to select the informative genes from thousands of genes of the microarray. The genes are ranked based on T-statistics, signal-to-noise ratio (SNR), and F-test values. The swarm intelligence (SI) technique finds the informative genes from the top-m ranked genes. These selected genes are used for classification. In this paper the shuffled frog leaping with Lévy flight (SFLLF) is proposed for feature selection. In SFLLF, the Lévy flight is included to avoid premature convergence of shuffled frog leaping (SFL) algorithm. The SI techniques such as particle swarm optimization (PSO), cuckoo search (CS), SFL, and SFLLF are used for feature selection which identifies informative genes for classification. The k-nearest neighbour (k-NN) technique is used to classify the samples. The proposed work is applied on 10 different benchmark datasets and examined with SI techniques. The experimental results show that the results obtained from k-NN classifier through SFLLF feature selection method outperform PSO, CS, and SFL.

  9. A Bio Medical Waste Identification and Classification Algorithm Using Mltrp and Rvm.

    PubMed

    Achuthan, Aravindan; Ayyallu Madangopal, Vasumathi

    2016-10-01

    We aimed to extract the histogram features for text analysis and, to classify the types of Bio Medical Waste (BMW) for garbage disposal and management. The given BMW was preprocessed by using the median filtering technique that efficiently reduced the noise in the image. After that, the histogram features of the filtered image were extracted with the help of proposed Modified Local Tetra Pattern (MLTrP) technique. Finally, the Relevance Vector Machine (RVM) was used to classify the BMW into human body parts, plastics, cotton and liquids. The BMW image was collected from the garbage image dataset for analysis. The performance of the proposed BMW identification and classification system was evaluated in terms of sensitivity, specificity, classification rate and accuracy with the help of MATLAB. When compared to the existing techniques, the proposed techniques provided the better results. This work proposes a new texture analysis and classification technique for BMW management and disposal. It can be used in many real time applications such as hospital and healthcare management systems for proper BMW disposal.

  10. F100 Multivariable Control Synthesis Program. Computer Implementation of the F100 Multivariable Control Algorithm

    NASA Technical Reports Server (NTRS)

    Soeder, J. F.

    1983-01-01

    As turbofan engines become more complex, the development of controls necessitate the use of multivariable control techniques. A control developed for the F100-PW-100(3) turbofan engine by using linear quadratic regulator theory and other modern multivariable control synthesis techniques is described. The assembly language implementation of this control on an SEL 810B minicomputer is described. This implementation was then evaluated by using a real-time hybrid simulation of the engine. The control software was modified to run with a real engine. These modifications, in the form of sensor and actuator failure checks and control executive sequencing, are discussed. Finally recommendations for control software implementations are presented.

  11. Evaluation of AMOEBA: a spectral-spatial classification method

    USGS Publications Warehouse

    Jenson, Susan K.; Loveland, Thomas R.; Bryant, J.

    1982-01-01

    Muitispectral remotely sensed images have been treated as arbitrary multivariate spectral data for purposes of clustering and classifying. However, the spatial properties of image data can also be exploited. AMOEBA is a clustering and classification method that is based on a spatially derived model for image data. In an evaluation test, Landsat data were classified with both AMOEBA and a widely used spectral classifier. The test showed that irrigated crop types can be classified as accurately with the AMOEBA method as with the generally used spectral method ISOCLS; the AMOEBA method, however, requires less computer time.

  12. Application of multivariate statistical techniques in microbial ecology.

    PubMed

    Paliy, O; Shankar, V

    2016-03-01

    Recent advances in high-throughput methods of molecular analyses have led to an explosion of studies generating large-scale ecological data sets. In particular, noticeable effect has been attained in the field of microbial ecology, where new experimental approaches provided in-depth assessments of the composition, functions and dynamic changes of complex microbial communities. Because even a single high-throughput experiment produces large amount of data, powerful statistical techniques of multivariate analysis are well suited to analyse and interpret these data sets. Many different multivariate techniques are available, and often it is not clear which method should be applied to a particular data set. In this review, we describe and compare the most widely used multivariate statistical techniques including exploratory, interpretive and discriminatory procedures. We consider several important limitations and assumptions of these methods, and we present examples of how these approaches have been utilized in recent studies to provide insight into the ecology of the microbial world. Finally, we offer suggestions for the selection of appropriate methods based on the research question and data set structure. © 2016 John Wiley & Sons Ltd.

  13. Can the new RCP R0/R1 classification predict the clinical outcome in ductal adenocarcinoma of the pancreatic head?

    PubMed

    Janot, M S; Kersting, S; Belyaev, O; Matuschek, A; Chromik, A M; Suelberg, D; Uhl, W; Tannapfel, A; Bergmann, U

    2012-08-01

    According to the International Union Against Cancer (UICC), R1 is defined as the microscopic presence of tumor cells at the surface of the resection margin (RM). In contrast, the Royal College of Pathologists (RCP) suggested to declare R1 already when tumor cells are found within 1 mm of the RM. The aim of this study was to determine the significance of the RM concerning the prognosis of pancreatic ductal adenocarcinoma (PDAC). From 2007 to 2009, 62 patients underwent a curative operation for PDAC of the pancreatic head. The relevance of R status on cumulative overall survival (OS) was assessed on univariate and multivariate analysis for both the classic R classification (UICC) and the suggestion of the RCP. Following the UICC criteria, a positive RM was detected in 8 %. Along with grading and lymph node ratio, R status revealed a significant impact on OS on univariate and multivariate analysis. Applying the suggestion of the RCP, R1 rate rose to 26 % resulting in no significant impact on OS in univariate analysis. Our study has shown that the RCP suggestion for R status has no impact on the prognosis of PDAC. In contrast, our data confirmed the UICC R classification of RM as well as N category, grading, and lymph node ratio as significant prognostic factors.

  14. Simulating Multivariate Nonnormal Data Using an Iterative Algorithm

    ERIC Educational Resources Information Center

    Ruscio, John; Kaczetow, Walter

    2008-01-01

    Simulating multivariate nonnormal data with specified correlation matrices is difficult. One especially popular method is Vale and Maurelli's (1983) extension of Fleishman's (1978) polynomial transformation technique to multivariate applications. This requires the specification of distributional moments and the calculation of an intermediate…

  15. Justification of Fuzzy Declustering Vector Quantization Modeling in Classification of Genotype-Image Phenotypes

    NASA Astrophysics Data System (ADS)

    Ng, Theam Foo; Pham, Tuan D.; Zhou, Xiaobo

    2010-01-01

    With the fast development of multi-dimensional data compression and pattern classification techniques, vector quantization (VQ) has become a system that allows large reduction of data storage and computational effort. One of the most recent VQ techniques that handle the poor estimation of vector centroids due to biased data from undersampling is to use fuzzy declustering-based vector quantization (FDVQ) technique. Therefore, in this paper, we are motivated to propose a justification of FDVQ based hidden Markov model (HMM) for investigating its effectiveness and efficiency in classification of genotype-image phenotypes. The performance evaluation and comparison of the recognition accuracy between a proposed FDVQ based HMM (FDVQ-HMM) and a well-known LBG (Linde, Buzo, Gray) vector quantization based HMM (LBG-HMM) will be carried out. The experimental results show that the performances of both FDVQ-HMM and LBG-HMM are almost similar. Finally, we have justified the competitiveness of FDVQ-HMM in classification of cellular phenotype image database by using hypotheses t-test. As a result, we have validated that the FDVQ algorithm is a robust and an efficient classification technique in the application of RNAi genome-wide screening image data.

  16. Classification of deadlift biomechanics with wearable inertial measurement units.

    PubMed

    O'Reilly, Martin A; Whelan, Darragh F; Ward, Tomas E; Delahunt, Eamonn; Caulfield, Brian M

    2017-06-14

    The deadlift is a compound full-body exercise that is fundamental in resistance training, rehabilitation programs and powerlifting competitions. Accurate quantification of deadlift biomechanics is important to reduce the risk of injury and ensure training and rehabilitation goals are achieved. This study sought to develop and evaluate deadlift exercise technique classification systems utilising Inertial Measurement Units (IMUs), recording at 51.2Hz, worn on the lumbar spine, both thighs and both shanks. It also sought to compare classification quality when these IMUs are worn in combination and in isolation. Two datasets of IMU deadlift data were collected. Eighty participants first completed deadlifts with acceptable technique and 5 distinct, deliberately induced deviations from acceptable form. Fifty-five members of this group also completed a fatiguing protocol (3-Repition Maximum test) to enable the collection of natural deadlift deviations. For both datasets, universal and personalised random-forests classifiers were developed and evaluated. Personalised classifiers outperformed universal classifiers in accuracy, sensitivity and specificity in the binary classification of acceptable or aberrant technique and in the multi-label classification of specific deadlift deviations. Whilst recent research has favoured universal classifiers due to the reduced overhead in setting them up for new system users, this work demonstrates that such techniques may not be appropriate for classifying deadlift technique due to the poor accuracy achieved. However, personalised classifiers perform very well in assessing deadlift technique, even when using data derived from a single lumbar-worn IMU to detect specific naturally occurring technique mistakes. Copyright © 2017 Elsevier Ltd. All rights reserved.

  17. Pattern recognition of satellite cloud imagery for improved weather prediction

    NASA Technical Reports Server (NTRS)

    Gautier, Catherine; Somerville, Richard C. J.; Volfson, Leonid B.

    1986-01-01

    The major accomplishment was the successful development of a method for extracting time derivative information from geostationary meteorological satellite imagery. This research is a proof-of-concept study which demonstrates the feasibility of using pattern recognition techniques and a statistical cloud classification method to estimate time rate of change of large-scale meteorological fields from remote sensing data. The cloud classification methodology is based on typical shape function analysis of parameter sets characterizing the cloud fields. The three specific technical objectives, all of which were successfully achieved, are as follows: develop and test a cloud classification technique based on pattern recognition methods, suitable for the analysis of visible and infrared geostationary satellite VISSR imagery; develop and test a methodology for intercomparing successive images using the cloud classification technique, so as to obtain estimates of the time rate of change of meteorological fields; and implement this technique in a testbed system incorporating an interactive graphics terminal to determine the feasibility of extracting time derivative information suitable for comparison with numerical weather prediction products.

  18. Automated analysis and classification of melanocytic tumor on skin whole slide images.

    PubMed

    Xu, Hongming; Lu, Cheng; Berendt, Richard; Jha, Naresh; Mandal, Mrinal

    2018-06-01

    This paper presents a computer-aided technique for automated analysis and classification of melanocytic tumor on skin whole slide biopsy images. The proposed technique consists of four main modules. First, skin epidermis and dermis regions are segmented by a multi-resolution framework. Next, epidermis analysis is performed, where a set of epidermis features reflecting nuclear morphologies and spatial distributions is computed. In parallel with epidermis analysis, dermis analysis is also performed, where dermal cell nuclei are segmented and a set of textural and cytological features are computed. Finally, the skin melanocytic image is classified into different categories such as melanoma, nevus or normal tissue by using a multi-class support vector machine (mSVM) with extracted epidermis and dermis features. Experimental results on 66 skin whole slide images indicate that the proposed technique achieves more than 95% classification accuracy, which suggests that the technique has the potential to be used for assisting pathologists on skin biopsy image analysis and classification. Copyright © 2018 Elsevier Ltd. All rights reserved.

  19. Data mining for rapid prediction of facility fit and debottlenecking of biomanufacturing facilities.

    PubMed

    Yang, Yang; Farid, Suzanne S; Thornhill, Nina F

    2014-06-10

    Higher titre processes can pose facility fit challenges in legacy biopharmaceutical purification suites with capacities originally matched to lower titre processes. Bottlenecks caused by mismatches in equipment sizes, combined with process fluctuations upon scale-up, can result in discarding expensive product. This paper describes a data mining decisional tool for rapid prediction of facility fit issues and debottlenecking of biomanufacturing facilities exposed to batch-to-batch variability and higher titres. The predictive tool comprised advanced multivariate analysis techniques to interrogate Monte Carlo stochastic simulation datasets that mimicked batch fluctuations in cell culture titres, step yields and chromatography eluate volumes. A decision tree classification method, CART (classification and regression tree) was introduced to explore the impact of these process fluctuations on product mass loss and reveal the root causes of bottlenecks. The resulting pictorial decision tree determined a series of if-then rules for the critical combinations of factors that lead to different mass loss levels. Three different debottlenecking strategies were investigated involving changes to equipment sizes, using higher capacity chromatography resins and elution buffer optimisation. The analysis compared the impact of each strategy on mass output, direct cost of goods per gram and processing time, as well as consideration of extra capital investment and space requirements. Copyright © 2014 The Authors. Published by Elsevier B.V. All rights reserved.

  20. Chemometric studies for the characterization and differentiation of microorganisms using in situ derivatization and thermal desorption ion mobility spectrometry.

    PubMed

    Ochoa, Mariela L; Harrington, Peter B

    2005-02-01

    Whole-cell bacteria were characterized and differentiated by thermal desorption ion mobility spectrometry and chemometric modeling. Principal component analysis was used to evaluate the differences in the ion mobility spectra of whole-cell bacteria and the fatty acid methyl esters (FAMEs) generated in situ after derivatization of the bacterial lipids. Alternating least squares served to extract bacterial peaks from the complex ion mobility spectra of intact microorganisms and, therefore, facilitated the characterization of bacterial strains, species, and Gram type. In situ thermal hydrolysis/methylation with tetramethylammonium hydroxide was necessary for the differentiation of Escherichia coli strains, which otherwise could not be distinguished by spectra acquired with the ITEMISER ion mobility spectrometer. The addition of the methylating agent had no effect on Gram-positive bacteria, and therefore, they could not be differentiated by genera. The classification of E. coli strains was possible by analysis of the IMS spectra from the FAMEs generated in situ. By using the fuzzy multivariate rule-building expert system and cross-validation, a correct classification rate of 96% (22 out of 23 spectra) was obtained. Chemometric modeling on bacterial ion mobility spectra coupled to thermal hydrolysis/methylation proved a simple, rapid (2 min/sample), inexpensive, and sensitive technique to characterize and differentiate intact microorganisms. The ITEMISER ion mobility spectrometer could detect as few as 4 x 10(6) cells/sample.

  1. Segmentation of prostate biopsy needles in transrectal ultrasound images

    NASA Astrophysics Data System (ADS)

    Krefting, Dagmar; Haupt, Barbara; Tolxdorff, Thomas; Kempkensteffen, Carsten; Miller, Kurt

    2007-03-01

    Prostate cancer is the most common cancer in men. Tissue extraction at different locations (biopsy) is the gold-standard for diagnosis of prostate cancer. These biopsies are commonly guided by transrectal ultrasound imaging (TRUS). Exact location of the extracted tissue within the gland is desired for more specific diagnosis and provides better therapy planning. While the orientation and the position of the needle within clinical TRUS image are limited, the appearing length and visibility of the needle varies strongly. Marker lines are present and tissue inhomogeneities and deflection artefacts may appear. Simple intensity, gradient oder edge-detecting based segmentation methods fail. Therefore a multivariate statistical classificator is implemented. The independent feature model is built by supervised learning using a set of manually segmented needles. The feature space is spanned by common binary object features as size and eccentricity as well as imaging-system dependent features like distance and orientation relative to the marker line. The object extraction is done by multi-step binarization of the region of interest. The ROI is automatically determined at the beginning of the segmentation and marker lines are removed from the images. The segmentation itself is realized by scale-invariant classification using maximum likelihood estimation and Mahalanobis distance as discriminator. The technique presented here could be successfully applied in 94% of 1835 TRUS images from 30 tissue extractions. It provides a robust method for biopsy needle localization in clinical prostate biopsy TRUS images.

  2. Chemical indices and methods of multivariate statistics as a tool for odor classification.

    PubMed

    Mahlke, Ingo T; Thiesen, Peter H; Niemeyer, Bernd

    2007-04-01

    Industrial and agricultural off-gas streams are comprised of numerous volatile compounds, many of which have substantially different odorous properties. State-of-the-art waste-gas treatment includes the characterization of these molecules and is directed at, if possible, either the avoidance of such odorants during processing or the use of existing standardized air purification techniques like bioscrubbing or afterburning, which however, often show low efficiency under ecological and economical regards. Selective odor separation from the off-gas streams could ease many of these disadvantages but is not yet widely applicable. Thus, the aim of this paper is to identify possible model substances in selective odor separation research from 155 volatile molecules mainly originating from livestock facilities, fat refineries, and cocoa and coffee production by knowledge-based methods. All compounds are examined with regard to their structure and information-content using topological and information-theoretical indices. Resulting data are fitted in an observation matrix, and similarities between the substances are computed. Principal component analysis and k-means cluster analysis are conducted showing that clustering of indices data can depict odor information correlating well to molecular composition and molecular shape. Quantitative molecule describtion along with the application of such statistical means therefore provide a good classification tool of malodorant structure properties with no thermodynamic data needed. The approximate look-alike shape of odorous compounds within the clusters suggests a fair choice of possible model molecules.

  3. Developing an Automated Machine Learning Marine Oil Spill Detection System with Synthetic Aperture Radar

    NASA Astrophysics Data System (ADS)

    Pinales, J. C.; Graber, H. C.; Hargrove, J. T.; Caruso, M. J.

    2016-02-01

    Previous studies have demonstrated the ability to detect and classify marine hydrocarbon films with spaceborne synthetic aperture radar (SAR) imagery. The dampening effects of hydrocarbon discharges on small surface capillary-gravity waves renders the ocean surface "radar dark" compared with the standard wind-borne ocean surfaces. Given the scope and impact of events like the Deepwater Horizon oil spill, the need for improved, automated and expedient monitoring of hydrocarbon-related marine anomalies has become a pressing and complex issue for governments and the extraction industry. The research presented here describes the development, training, and utilization of an algorithm that detects marine oil spills in an automated, semi-supervised manner, utilizing X-, C-, or L-band SAR data as the primary input. Ancillary datasets include related radar-borne variables (incidence angle, etc.), environmental data (wind speed, etc.) and textural descriptors. Shapefiles produced by an experienced human-analyst served as targets (validation) during the training portion of the investigation. Training and testing datasets were chosen for development and assessment of algorithm effectiveness as well as optimal conditions for oil detection in SAR data. The algorithm detects oil spills by following a 3-step methodology: object detection, feature extraction, and classification. Previous oil spill detection and classification methodologies such as machine learning algorithms, artificial neural networks (ANN), and multivariate classification methods like partial least squares-discriminant analysis (PLS-DA) are evaluated and compared. Statistical, transform, and model-based image texture techniques, commonly used for object mapping directly or as inputs for more complex methodologies, are explored to determine optimal textures for an oil spill detection system. The influence of the ancillary variables is explored, with a particular focus on the role of strong vs. weak wind forcing.

  4. Fluorescence spectroscopy for rapid detection and classification of bacterial pathogens.

    PubMed

    Sohn, Miryeong; Himmelsbach, David S; Barton, Franklin E; Fedorka-Cray, Paula J

    2009-11-01

    This study deals with the rapid detection and differentiation of Escherichia coli, Salmonella, and Campylobacter, which are the most commonly identified commensal and pathogenic bacteria in foods, using fluorescence spectroscopy and multivariate analysis. Each bacterial sample cultured under controlled conditions was diluted in physiologic saline for analysis. Fluorescence spectra were collected over a range of 200-700 nm with 0.5 nm intervals on the PerkinElmer Fluorescence Spectrometer. The synchronous scan technique was employed to find the optimum excitation (lambda(ex)) and emission (lambda(em)) wavelengths for individual bacteria with the wavelength interval (Deltalambda) being varied from 10 to 200 nm. The synchronous spectra and two-dimensional plots showed two maximum lambda(ex) values at 225 nm and 280 nm and one maximum lambda(em) at 335-345 nm (lambda(em) = lambda(ex) + Deltalambda), which correspond to the lambda(ex) = 225 nm, Deltalambda = 110-120 nm, and lambda(ex) = 280 nm, Deltalambda = 60-65 nm. For all three bacterial genera, the same synchronous scan results were obtained. The emission spectra from the three bacteria groups were very similar, creating difficulty in classification. However, the application of principal component analysis (PCA) to the fluorescence spectra resulted in successful classification of the bacteria by their genus as well as determining their concentration. The detection limit was approximately 10(3)-10(4) cells/mL for each bacterial sample. These results demonstrated that fluorescence spectroscopy, when coupled with PCA processing, has the potential to detect and to classify bacterial pathogens in liquids. The methodology is rapid (>10 min), inexpensive, and requires minimal sample preparation compared to standard analytical methods for bacterial detection.

  5. Toward literature-based feature selection for diagnostic classification: a meta-analysis of resting-state fMRI in depression.

    PubMed

    Sundermann, Benedikt; Olde Lütke Beverborg, Mona; Pfleiderer, Bettina

    2014-01-01

    Information derived from functional magnetic resonance imaging (fMRI) during wakeful rest has been introduced as a candidate diagnostic biomarker in unipolar major depressive disorder (MDD). Multiple reports of resting state fMRI in MDD describe group effects. Such prior knowledge can be adopted to pre-select potentially discriminating features for diagnostic classification models with the aim to improve diagnostic accuracy. Purpose of this analysis was to consolidate spatial information about alterations of spontaneous brain activity in MDD, primarily to serve as feature selection for multivariate pattern analysis techniques (MVPA). Thirty two studies were included in final analyses. Coordinates extracted from the original reports were assigned to two categories based on directionality of findings. Meta-analyses were calculated using the non-additive activation likelihood estimation approach with coordinates organized by subject group to account for non-independent samples. Converging evidence revealed a distributed pattern of brain regions with increased or decreased spontaneous activity in MDD. The most distinct finding was hyperactivity/hyperconnectivity presumably reflecting the interaction of cortical midline structures (posterior default mode network components including the precuneus and neighboring posterior cingulate cortices associated with self-referential processing and the subgenual anterior cingulate and neighboring medial frontal cortices) with lateral prefrontal areas related to externally-directed cognition. Other areas of hyperactivity/hyperconnectivity include the left lateral parietal cortex, right hippocampus and right cerebellum whereas hypoactivity/hypoconnectivity was observed mainly in the left temporal cortex, the insula, precuneus, superior frontal gyrus, lentiform nucleus and thalamus. Results are made available in two different data formats to be used as spatial hypotheses in future studies, particularly for diagnostic classification by MVPA.

  6. Contextual classification of multispectral image data: An unbiased estimator for the context distribution

    NASA Technical Reports Server (NTRS)

    Tilton, J. C.; Swain, P. H. (Principal Investigator); Vardeman, S. B.

    1981-01-01

    A key input to a statistical classification algorithm, which exploits the tendency of certain ground cover classes to occur more frequently in some spatial context than in others, is a statistical characterization of the context: the context distribution. An unbiased estimator of the context distribution is discussed which, besides having the advantage of statistical unbiasedness, has the additional advantage over other estimation techniques of being amenable to an adaptive implementation in which the context distribution estimate varies according to local contextual information. Results from applying the unbiased estimator to the contextual classification of three real LANDSAT data sets are presented and contrasted with results from non-contextual classifications and from contextual classifications utilizing other context distribution estimation techniques.

  7. Plant species classification using flower images—A comparative study of local feature representations

    PubMed Central

    Seeland, Marco; Rzanny, Michael; Alaqraa, Nedal; Wäldchen, Jana; Mäder, Patrick

    2017-01-01

    Steady improvements of image description methods induced a growing interest in image-based plant species classification, a task vital to the study of biodiversity and ecological sensitivity. Various techniques have been proposed for general object classification over the past years and several of them have already been studied for plant species classification. However, results of these studies are selective in the evaluated steps of a classification pipeline, in the utilized datasets for evaluation, and in the compared baseline methods. No study is available that evaluates the main competing methods for building an image representation on the same datasets allowing for generalized findings regarding flower-based plant species classification. The aim of this paper is to comparatively evaluate methods, method combinations, and their parameters towards classification accuracy. The investigated methods span from detection, extraction, fusion, pooling, to encoding of local features for quantifying shape and color information of flower images. We selected the flower image datasets Oxford Flower 17 and Oxford Flower 102 as well as our own Jena Flower 30 dataset for our experiments. Findings show large differences among the various studied techniques and that their wisely chosen orchestration allows for high accuracies in species classification. We further found that true local feature detectors in combination with advanced encoding methods yield higher classification results at lower computational costs compared to commonly used dense sampling and spatial pooling methods. Color was found to be an indispensable feature for high classification results, especially while preserving spatial correspondence to gray-level features. In result, our study provides a comprehensive overview of competing techniques and the implications of their main parameters for flower-based plant species classification. PMID:28234999

  8. Stability and bias of classification rates in biological applications of discriminant analysis

    USGS Publications Warehouse

    Williams, B.K.; Titus, K.; Hines, J.E.

    1990-01-01

    We assessed the sampling stability of classification rates in discriminant analysis by using a factorial design with factors for multivariate dimensionality, dispersion structure, configuration of group means, and sample size. A total of 32,400 discriminant analyses were conducted, based on data from simulated populations with appropriate underlying statistical distributions. Simulation results indicated strong bias in correct classification rates when group sample sizes were small and when overlap among groups was high. We also found that stability of the correct classification rates was influenced by these factors, indicating that the number of samples required for a given level of precision increases with the amount of overlap among groups. In a review of 60 published studies, we found that 57% of the articles presented results on classification rates, though few of them mentioned potential biases in their results. Wildlife researchers should choose the total number of samples per group to be at least 2 times the number of variables to be measured when overlap among groups is low. Substantially more samples are required as the overlap among groups increases

  9. Urogenital tuberculosis: definition and classification.

    PubMed

    Kulchavenya, Ekaterina

    2014-10-01

    To improve the approach to the diagnosis and management of urogenital tuberculosis (UGTB), we need clear and unique classification. UGTB remains an important problem, especially in developing countries, but it is often an overlooked disease. As with any other infection, UGTB should be cured by antibacterial therapy, but because of late diagnosis it may often require surgery. Scientific literature dedicated to this problem was critically analyzed and juxtaposed with the author's own more than 30 years' experience in tuberculosis urology. The conception, terms and definition were consolidated into one system; classification stage by stage as well as complications are presented. Classification of any disease includes dispersion on forms and stages and exact definitions for each stage. Clinical features and symptoms significantly vary between different forms and stages of UGTB. A simple diagnostic algorithm was constructed. UGTB is multivariant disease and a standard unified approach to it is impossible. Clear definition as well as unique classification are necessary for real estimation of epidemiology and the optimization of therapy. The term 'UGTB' has insufficient information in order to estimate therapy, surgery and prognosis, or to evaluate the epidemiology.

  10. Characterization of Escherichia coli isolates from different fecal sources by means of classification tree analysis of fatty acid methyl ester (FAME) profiles.

    PubMed

    Seurinck, Sylvie; Deschepper, Ellen; Deboch, Bishaw; Verstraete, Willy; Siciliano, Steven

    2006-03-01

    Microbial source tracking (MST) methods need to be rapid, inexpensive and accurate. Unfortunately, many MST methods provide a wealth of information that is difficult to interpret by the regulators who use this information to make decisions. This paper describes the use of classification tree analysis to interpret the results of a MST method based on fatty acid methyl ester (FAME) profiles of Escherichia coli isolates, and to present results in a format readily interpretable by water quality managers. Raw sewage E. coli isolates and animal E. coli isolates from cow, dog, gull, and horse were isolated and their FAME profiles collected. Correct classification rates determined with leaveone-out cross-validation resulted in an overall low correct classification rate of 61%. A higher overall correct classification rate of 85% was obtained when the animal isolates were pooled together and compared to the raw sewage isolates. Bootstrap aggregation or adaptive resampling and combining of the FAME profile data increased correct classification rates substantially. Other MST methods may be better suited to differentiate between different fecal sources but classification tree analysis has enabled us to distinguish raw sewage from animal E. coli isolates, which previously had not been possible with other multivariate methods such as principal component analysis and cluster analysis.

  11. Three-Way Analysis of Spectrospatial Electromyography Data: Classification and Interpretation

    PubMed Central

    Kauppi, Jukka-Pekka; Hahne, Janne; Müller, Klaus-Robert; Hyvärinen, Aapo

    2015-01-01

    Classifying multivariate electromyography (EMG) data is an important problem in prosthesis control as well as in neurophysiological studies and diagnosis. With modern high-density EMG sensor technology, it is possible to capture the rich spectrospatial structure of the myoelectric activity. We hypothesize that multi-way machine learning methods can efficiently utilize this structure in classification as well as reveal interesting patterns in it. To this end, we investigate the suitability of existing three-way classification methods to EMG-based hand movement classification in spectrospatial domain, as well as extend these methods by sparsification and regularization. We propose to use Fourier-domain independent component analysis as preprocessing to improve classification and interpretability of the results. In high-density EMG experiments on hand movements across 10 subjects, three-way classification yielded higher average performance compared with state-of-the art classification based on temporal features, suggesting that the three-way analysis approach can efficiently utilize detailed spectrospatial information of high-density EMG. Phase and amplitude patterns of features selected by the classifier in finger-movement data were found to be consistent with known physiology. Thus, our approach can accurately resolve hand and finger movements on the basis of detailed spectrospatial information, and at the same time allows for physiological interpretation of the results. PMID:26039100

  12. Predicting trauma patient mortality: ICD [or ICD-10-AM] versus AIS based approaches.

    PubMed

    Willis, Cameron D; Gabbe, Belinda J; Jolley, Damien; Harrison, James E; Cameron, Peter A

    2010-11-01

    The International Classification of Diseases Injury Severity Score (ICISS) has been proposed as an International Classification of Diseases (ICD)-10-based alternative to mortality prediction tools that use Abbreviated Injury Scale (AIS) data, including the Trauma and Injury Severity Score (TRISS). To date, studies have not examined the performance of ICISS using Australian trauma registry data. This study aimed to compare the performance of ICISS with other mortality prediction tools in an Australian trauma registry. This was a retrospective review of prospectively collected data from the Victorian State Trauma Registry. A training dataset was created for model development and a validation dataset for evaluation. The multiplicative ICISS model was compared with a worst injury ICISS approach, Victorian TRISS (V-TRISS, using local coefficients), maximum AIS severity and a multivariable model including ICD-10-AM codes as predictors. Models were investigated for discrimination (C-statistic) and calibration (Hosmer-Lemeshow statistic). The multivariable approach had the highest level of discrimination (C-statistic 0.90) and calibration (H-L 7.65, P= 0.468). Worst injury ICISS, V-TRISS and maximum AIS had similar performance. The multiplicative ICISS produced the lowest level of discrimination (C-statistic 0.80) and poorest calibration (H-L 50.23, P < 0.001). The performance of ICISS may be affected by the data used to develop estimates, the ICD version employed, the methods for deriving estimates and the inclusion of covariates. In this analysis, a multivariable approach using ICD-10-AM codes was the best-performing method. A multivariable ICISS approach may therefore be a useful alternative to AIS-based methods and may have comparable predictive performance to locally derived TRISS models. © 2010 The Authors. ANZ Journal of Surgery © 2010 Royal Australasian College of Surgeons.

  13. The use of the modified Cholesky decomposition in divergence and classification calculations

    NASA Technical Reports Server (NTRS)

    Vanroony, D. L.; Lynn, M. S.; Snyder, C. H.

    1973-01-01

    The use of the Cholesky decomposition technique is analyzed as applied to the feature selection and classification algorithms used in the analysis of remote sensing data (e.g. as in LARSYS). This technique is approximately 30% faster in classification and a factor of 2-3 faster in divergence, as compared with LARSYS. Also numerical stability and accuracy are slightly improved. Other methods necessary to deal with numerical stablity problems are briefly discussed.

  14. The use of the modified Cholesky decomposition in divergence and classification calculations

    NASA Technical Reports Server (NTRS)

    Van Rooy, D. L.; Lynn, M. S.; Snyder, C. H.

    1973-01-01

    This report analyzes the use of the modified Cholesky decomposition technique as applied to the feature selection and classification algorithms used in the analysis of remote sensing data (e.g., as in LARSYS). This technique is approximately 30% faster in classification and a factor of 2-3 faster in divergence, as compared with LARSYS. Also numerical stability and accuracy are slightly improved. Other methods necessary to deal with numerical stability problems are briefly discussed.

  15. Time-reversal imaging for classification of submerged elastic targets via Gibbs sampling and the Relevance Vector Machine.

    PubMed

    Dasgupta, Nilanjan; Carin, Lawrence

    2005-04-01

    Time-reversal imaging (TRI) is analogous to matched-field processing, although TRI is typically very wideband and is appropriate for subsequent target classification (in addition to localization). Time-reversal techniques, as applied to acoustic target classification, are highly sensitive to channel mismatch. Hence, it is crucial to estimate the channel parameters before time-reversal imaging is performed. The channel-parameter statistics are estimated here by applying a geoacoustic inversion technique based on Gibbs sampling. The maximum a posteriori (MAP) estimate of the channel parameters are then used to perform time-reversal imaging. Time-reversal implementation requires a fast forward model, implemented here by a normal-mode framework. In addition to imaging, extraction of features from the time-reversed images is explored, with these applied to subsequent target classification. The classification of time-reversed signatures is performed by the relevance vector machine (RVM). The efficacy of the technique is analyzed on simulated in-channel data generated by a free-field finite element method (FEM) code, in conjunction with a channel propagation model, wherein the final classification performance is demonstrated to be relatively insensitive to the associated channel parameters. The underlying theory of Gibbs sampling and TRI are presented along with the feature extraction and target classification via the RVM.

  16. An automatic taxonomy of galaxy morphology using unsupervised machine learning

    NASA Astrophysics Data System (ADS)

    Hocking, Alex; Geach, James E.; Sun, Yi; Davey, Neil

    2018-01-01

    We present an unsupervised machine learning technique that automatically segments and labels galaxies in astronomical imaging surveys using only pixel data. Distinct from previous unsupervised machine learning approaches used in astronomy we use no pre-selection or pre-filtering of target galaxy type to identify galaxies that are similar. We demonstrate the technique on the Hubble Space Telescope (HST) Frontier Fields. By training the algorithm using galaxies from one field (Abell 2744) and applying the result to another (MACS 0416.1-2403), we show how the algorithm can cleanly separate early and late type galaxies without any form of pre-directed training for what an 'early' or 'late' type galaxy is. We then apply the technique to the HST Cosmic Assembly Near-infrared Deep Extragalactic Legacy Survey (CANDELS) fields, creating a catalogue of approximately 60 000 classifications. We show how the automatic classification groups galaxies of similar morphological (and photometric) type and make the classifications public via a catalogue, a visual catalogue and galaxy similarity search. We compare the CANDELS machine-based classifications to human-classifications from the Galaxy Zoo: CANDELS project. Although there is not a direct mapping between Galaxy Zoo and our hierarchical labelling, we demonstrate a good level of concordance between human and machine classifications. Finally, we show how the technique can be used to identify rarer objects and present lensed galaxy candidates from the CANDELS imaging.

  17. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Honorio, J.; Goldstein, R.; Honorio, J.

    We propose a simple, well grounded classification technique which is suited for group classification on brain fMRI data sets that have high dimensionality, small number of subjects, high noise level, high subject variability, imperfect registration and capture subtle cognitive effects. We propose threshold-split region as a new feature selection method and majority voteas the classification technique. Our method does not require a predefined set of regions of interest. We use average acros ssessions, only one feature perexperimental condition, feature independence assumption, and simple classifiers. The seeming counter-intuitive approach of using a simple design is supported by signal processing and statisticalmore » theory. Experimental results in two block design data sets that capture brain function under distinct monetary rewards for cocaine addicted and control subjects, show that our method exhibits increased generalization accuracy compared to commonly used feature selection and classification techniques.« less

  18. Remodeling characteristics and collagen distribution in synthetic mesh materials explanted from human subjects after abdominal wall reconstruction: an analysis of remodeling characteristics by patient risk factors and surgical site classifications

    PubMed Central

    Cavallo, Jaime A.; Roma, Andres A.; Jasielec, Mateusz S.; Ousley, Jenny; Creamer, Jennifer; Pichert, Matthew D.; Baalman, Sara; Frisella, Margaret M.; Matthews, Brent D.

    2014-01-01

    Background The purpose of this study was to evaluate the associations between patient characteristics or surgical site classifications and the histologic remodeling scores of synthetic meshes biopsied from their abdominal wall repair sites in the first attempt to generate a multivariable risk prediction model of non-constructive remodeling. Methods Biopsies of the synthetic meshes were obtained from the abdominal wall repair sites of 51 patients during a subsequent abdominal re-exploration. Biopsies were stained with hematoxylin and eosin, and evaluated according to a semi-quantitative scoring system for remodeling characteristics (cell infiltration, cell types, extracellular matrix deposition, inflammation, fibrous encapsulation, and neovascularization) and a mean composite score (CR). Biopsies were also stained with Sirius Red and Fast Green, and analyzed to determine the collagen I:III ratio. Based on univariate analyses between subject clinical characteristics or surgical site classification and the histologic remodeling scores, cohort variables were selected for multivariable regression models using a threshold p value of ≤0.200. Results The model selection process for the extracellular matrix score yielded two variables: subject age at time of mesh implantation, and mesh classification (c-statistic = 0.842). For CR score, the model selection process yielded two variables: subject age at time of mesh implantation and mesh classification (r2 = 0.464). The model selection process for the collagen III area yielded a model with two variables: subject body mass index at time of mesh explantation and pack-year history (r2 = 0.244). Conclusion Host characteristics and surgical site assessments may predict degree of remodeling for synthetic meshes used to reinforce abdominal wall repair sites. These preliminary results constitute the first steps in generating a risk prediction model that predicts the patients and clinical circumstances for which non-constructive remodeling of an abdominal wall repair site with synthetic mesh reinforcement is most likely to occur. PMID:24442681

  19. Malware distributed collection and pre-classification system using honeypot technology

    NASA Astrophysics Data System (ADS)

    Grégio, André R. A.; Oliveira, Isabela L.; Santos, Rafael D. C.; Cansian, Adriano M.; de Geus, Paulo L.

    2009-04-01

    Malware has become a major threat in the last years due to the ease of spread through the Internet. Malware detection has become difficult with the use of compression, polymorphic methods and techniques to detect and disable security software. Those and other obfuscation techniques pose a problem for detection and classification schemes that analyze malware behavior. In this paper we propose a distributed architecture to improve malware collection using different honeypot technologies to increase the variety of malware collected. We also present a daemon tool developed to grab malware distributed through spam and a pre-classification technique that uses antivirus technology to separate malware in generic classes.

  20. Characterizing multivariate decoding models based on correlated EEG spectral features

    PubMed Central

    McFarland, Dennis J.

    2013-01-01

    Objective Multivariate decoding methods are popular techniques for analysis of neurophysiological data. The present study explored potential interpretative problems with these techniques when predictors are correlated. Methods Data from sensorimotor rhythm-based cursor control experiments was analyzed offline with linear univariate and multivariate models. Features were derived from autoregressive (AR) spectral analysis of varying model order which produced predictors that varied in their degree of correlation (i.e., multicollinearity). Results The use of multivariate regression models resulted in much better prediction of target position as compared to univariate regression models. However, with lower order AR features interpretation of the spectral patterns of the weights was difficult. This is likely to be due to the high degree of multicollinearity present with lower order AR features. Conclusions Care should be exercised when interpreting the pattern of weights of multivariate models with correlated predictors. Comparison with univariate statistics is advisable. Significance While multivariate decoding algorithms are very useful for prediction their utility for interpretation may be limited when predictors are correlated. PMID:23466267

  1. Neural networks in chemistry

    NASA Astrophysics Data System (ADS)

    Zupan, Jure

    1995-04-01

    All problems that in some way are linked to handling of multi-variate experiments versus multi-variate responses can be approached by the group of methods that has recently became known as the artificial neural network (ANN) techniques. In this lecture, the types of the problems that can be solved by ANN techniques rather than the ANN techniques themselves will be addressed first. This issue is rather important due to the fact that the ANN techniques can be used for a very broad range of problems and choosing the wrong method can often result in either a failure to produce an effective solution or in a very time consuming and ineffective handling. Among the types of problems that can be solved by different ANN techniques the classification, mapping, look-up table, and modelling will be emphasized and discussed. Because all mentioned methods can be solved by different standard techniques, special emphasis will be paid to stress the advantages and drawbacks when employing different ANN techniques. Due to the fact that the range of possible use of ANN is so broad, even a very specific problem can be solved by many different ANN architectures or even using different learning strategies within ANN. In the second part the main learning strategies and corresponding choices of ANN architectures will be discussed. In this part the parameters and some guidelines how to select the method and the design of the ANNs will be shown on the examples of reported ANN applications in chemistry. The ANN learning strategies discussed will be back-propagation of errors, the Kohonen, and the counter propagation learning. The potential user of ANN should first, consider the problem, second, he must inspect the availability of data and the data themselves to decide for which ANN method they are best suited. In this respect, the amount of data, the dimensionality of the measurement space, the form of data (alphanumeric entries, binary, real, or even mixed forms of data) are crucial. After considering all this factors, the determination of the appropriate neural network architecture can be made. Additionally, the selection the optimal ANN involves the determination of specific internal parameters like the learning rate, the momentum term, the neighbourhood function, the time dependent decrease of corrections, etc. Even after all these decisions have been made the learning procedure itself is not a straightforward task. Here, the division of the entire ensemble of data into three data sets: training, controlling and the test set are crucial. This problem is addressed as well.

  2. NIR technique in the classification of cotton leaf grade

    USDA-ARS?s Scientific Manuscript database

    Near infrared (NIR) spectroscopy, a useful technique due to the speed, ease of use, and adaptability to on-line or off-line implementation, has been applied to perform the qualitative classification and quantitative prediction of cotton quality characteristics, including trash index. One term to as...

  3. Metabolomic analysis applied to chemosystematics and evolution of megadiverse Brazilian Vernonieae (Asteraceae).

    PubMed

    Gallon, Marília Elias; Monge, Marcelo; Casoti, Rosana; Da Costa, Fernando Batista; Semir, João; Gobbo-Neto, Leonardo

    2018-06-01

    Vernonia sensu lato is the largest and most complex genus of the tribe Vernonieae (Asteraceae). The tribe is chemically characterized by the presence of sesquiterpene lactones and flavonoids. Over the years, several taxonomic classifications have been proposed for Vernonia s.l. and for the tribe; however, there has been no consensus among the researches. According to traditional classification, Vernonia s.l. comprises more than 1000 species divided into sections, subsections and series (sensu Bentham). In a more recent classification, these species have been segregated into other genera and some subtribes were proposed, while the genus Vernonia sensu stricto was restricted to 22 species distributed mainly in North America (sensu Robinson). In this study, species from the subtribes Vernoniinae, Lepidaploinae and Rolandrinae were analyzed by UHPLC-UV-HRMS followed by multivariate statistical analysis. Data mining was performed using unsupervised (HCA and PCA) and supervised methods (OPLS-DA). The HCA showed the segregation of the species into four main groups. Comparing the HCA with taxonomical classifications of Vernonieae, we observed that the groups of the dendogram, based on metabolic profiling, were in accordance with the generic classification proposed by Robinson and with previous phylogenetic studies. The species of the genera Stenocephalum, Stilpnopappus, Strophopappus and Rolandra (Group 1) were revealed to be more related to the species of the genus Vernonanthura (Group 2), while the genera Cyrtocymura, Chrysolaena and Echinocoryne (Group 3) were chemically more similar to the genera Lessingianthus and Lepidaploa (Group 4). These findings indicated that the subtribes Vernoniinae and Lepidaploinae are non-chemically homogeneous groups and highlighted the application of untargeted metabolomic tools for taxonomy and as indicators of species evolution. Discriminant compounds for the groups obtained by OPLS-DA were determined. Groups 1 and 2 were characterized by the presence of 3',4'-dimethoxyluteolin, glaucolide A and 8-tigloyloxyglaucolide A. The species of Groups 3 and 4 were characterized by the presence of putative acacetin 7-O-rutinoside and glaucolide B. Therefore, untargeted metabolomic approach combined with multivariate statistical analysis, as proposed herein, allowed the identification of potential chemotaxonomic markers, helping in the taxonomic classifications. Copyright © 2018 Elsevier Ltd. All rights reserved.

  4. The Dirichlet-Multinomial Model for Multivariate Randomized Response Data and Small Samples

    ERIC Educational Resources Information Center

    Avetisyan, Marianna; Fox, Jean-Paul

    2012-01-01

    In survey sampling the randomized response (RR) technique can be used to obtain truthful answers to sensitive questions. Although the individual answers are masked due to the RR technique, individual (sensitive) response rates can be estimated when observing multivariate response data. The beta-binomial model for binary RR data will be generalized…

  5. All-Possible-Subsets for MANOVA and Factorial MANOVAs: Less than a Weekend Project

    ERIC Educational Resources Information Center

    Nimon, Kim; Zientek, Linda Reichwein; Kraha, Amanda

    2016-01-01

    Multivariate techniques are increasingly popular as researchers attempt to accurately model a complex world. MANOVA is a multivariate technique used to investigate the dimensions along which groups differ, and how these dimensions may be used to predict group membership. A concern in a MANOVA analysis is to determine if a smaller subset of…

  6. Multivariate mixed linear model analysis of longitudinal data: an information-rich statistical technique for analyzing disease resistance data

    USDA-ARS?s Scientific Manuscript database

    The mixed linear model (MLM) is currently among the most advanced and flexible statistical modeling techniques and its use in tackling problems in plant pathology has begun surfacing in the literature. The longitudinal MLM is a multivariate extension that handles repeatedly measured data, such as r...

  7. Functional Path Analysis as a Multivariate Technique in Developing a Theory of Participation in Adult Education.

    ERIC Educational Resources Information Center

    Martin, James L.

    This paper reports on attempts by the author to construct a theoretical framework of adult education participation using a theory development process and the corresponding multivariate statistical techniques. Two problems are identified: the lack of theoretical framework in studying problems, and the limiting of statistical analysis to univariate…

  8. Classification of broiler breast fillets according to storage and to freeze-thaw treatment using near infrared spectroscopy and multivariate analysis

    USDA-ARS?s Scientific Manuscript database

    Visible/near-infrared (NIR) spectroscopy has shown potential for successfully classifying broiler breast fillets according to their texture properties. Freshness and shelf life are also important quality characteristics of boneless skinless chicken breast products in the marketplace. This study deal...

  9. Interpretable Early Classification of Multivariate Time Series

    ERIC Educational Resources Information Center

    Ghalwash, Mohamed F.

    2013-01-01

    Recent advances in technology have led to an explosion in data collection over time rather than in a single snapshot. For example, microarray technology allows us to measure gene expression levels in different conditions over time. Such temporal data grants the opportunity for data miners to develop algorithms to address domain-related problems,…

  10. Prediction of Gestational Diabetes through NMR Metabolomics of Maternal Blood.

    PubMed

    Pinto, Joana; Almeida, Lara M; Martins, Ana S; Duarte, Daniela; Barros, António S; Galhano, Eulália; Pita, Cristina; Almeida, Maria do Céu; Carreira, Isabel M; Gil, Ana M

    2015-06-05

    Metabolic biomarkers of pre- and postdiagnosis gestational diabetes mellitus (GDM) were sought, using nuclear magnetic resonance (NMR) metabolomics of maternal plasma and corresponding lipid extracts. Metabolite differences between controls and disease were identified through multivariate analysis of variable selected (1)H NMR spectra. For postdiagnosis GDM, partial least squares regression identified metabolites with higher dependence on normal gestational age evolution. Variable selection of NMR spectra produced good classification models for both pre- and postdiagnostic GDM. Prediagnosis GDM was accompanied by cholesterol increase and minor increases in lipoproteins (plasma), fatty acids, and triglycerides (extracts). Small metabolite changes comprised variations in glucose (up regulated), amino acids, betaine, urea, creatine, and metabolites related to gut microflora. Most changes were enhanced upon GDM diagnosis, in addition to newly observed changes in low-Mw compounds. GDM prediction seems possible exploiting multivariate profile changes rather than a set of univariate changes. Postdiagnosis GDM is successfully classified using a 26-resonance plasma biomarker. Plasma and extracts display comparable classification performance, the former enabling direct and more rapid analysis. Results and putative biochemical hypotheses require further confirmation in larger cohorts of distinct ethnicities.

  11. Learning a Mahalanobis Distance-Based Dynamic Time Warping Measure for Multivariate Time Series Classification.

    PubMed

    Mei, Jiangyuan; Liu, Meizhu; Wang, Yuan-Fang; Gao, Huijun

    2016-06-01

    Multivariate time series (MTS) datasets broadly exist in numerous fields, including health care, multimedia, finance, and biometrics. How to classify MTS accurately has become a hot research topic since it is an important element in many computer vision and pattern recognition applications. In this paper, we propose a Mahalanobis distance-based dynamic time warping (DTW) measure for MTS classification. The Mahalanobis distance builds an accurate relationship between each variable and its corresponding category. It is utilized to calculate the local distance between vectors in MTS. Then we use DTW to align those MTS which are out of synchronization or with different lengths. After that, how to learn an accurate Mahalanobis distance function becomes another key problem. This paper establishes a LogDet divergence-based metric learning with triplet constraint model which can learn Mahalanobis matrix with high precision and robustness. Furthermore, the proposed method is applied on nine MTS datasets selected from the University of California, Irvine machine learning repository and Robert T. Olszewski's homepage, and the results demonstrate the improved performance of the proposed approach.

  12. Automatic Classification of Sub-Techniques in Classical Cross-Country Skiing Using a Machine Learning Algorithm on Micro-Sensor Data

    PubMed Central

    Seeberg, Trine M.; Tjønnås, Johannes; Haugnes, Pål; Sandbakk, Øyvind

    2017-01-01

    The automatic classification of sub-techniques in classical cross-country skiing provides unique possibilities for analyzing the biomechanical aspects of outdoor skiing. This is currently possible due to the miniaturization and flexibility of wearable inertial measurement units (IMUs) that allow researchers to bring the laboratory to the field. In this study, we aimed to optimize the accuracy of the automatic classification of classical cross-country skiing sub-techniques by using two IMUs attached to the skier’s arm and chest together with a machine learning algorithm. The novelty of our approach is the reliable detection of individual cycles using a gyroscope on the skier’s arm, while a neural network machine learning algorithm robustly classifies each cycle to a sub-technique using sensor data from an accelerometer on the chest. In this study, 24 datasets from 10 different participants were separated into the categories training-, validation- and test-data. Overall, we achieved a classification accuracy of 93.9% on the test-data. Furthermore, we illustrate how an accurate classification of sub-techniques can be combined with data from standard sports equipment including position, altitude, speed and heart rate measuring systems. Combining this information has the potential to provide novel insight into physiological and biomechanical aspects valuable to coaches, athletes and researchers. PMID:29283421

  13. Two techniques for mapping and area estimation of small grains in California using Landsat digital data

    NASA Technical Reports Server (NTRS)

    Sheffner, E. J.; Hlavka, C. A.; Bauer, E. M.

    1984-01-01

    Two techniques have been developed for the mapping and area estimation of small grains in California from Landsat digital data. The two techniques are Band Ratio Thresholding, a semi-automated version of a manual procedure, and LCLS, a layered classification technique which can be fully automated and is based on established clustering and classification technology. Preliminary evaluation results indicate that the two techniques have potential for providing map products which can be incorporated into existing inventory procedures and automated alternatives to traditional inventory techniques and those which currently employ Landsat imagery.

  14. Landsat TM inventory and assessment of waterbird habitat in the southern altiplano of South America

    USGS Publications Warehouse

    Boyle, T.P.; Caziani, S.M.; Waltermire, R.G.

    2004-01-01

    The diverse set of wetlands in southern altiplano of South America supports a number of endemic and migratory waterbirds. These species include endangered endemic flamingos and shorebirds that nest in North America and winter in the altiplano. This research developed maps from nine Landsat Thematic Mapper (TM) images (254,300 km2) to provide an inventory of aquatic waterbird habitats. Image processing software was used to produce a map with a classification of wetlands according to the habitat requirements of different types of waterbirds. A hierarchical procedure was used to, first, isolate the bodies of water within the TM image; second, execute an unsupervised classification on the subsetted image to produce 300 signatures of cover types, which were further subdivided as necessary. Third, each of the classifications was examined in the light of field data and personal experience for relevance to the determination of the various habitat types. Finally, the signatures were applied to the entire image and other adjacent images to yield a map depicting the location of the various waterbird habitats in the southern altiplano. The data sets referenced with a global positioning system receiver were used to test the classification system. Multivariate analysis of the bird communities censused at each lake by individual habitats indicated a salinity gradient, and then the depth of the water separated the birds. Multivariate analysis of the chemical and physical data from the lakes showed that the variation in lakes were significantly associated with difference in depth, transparency, latitude, elevation, and pH. The presence of gravel bottoms was also one of the qualities distinguishing a group of lakes. This information will be directly useful to the Flamingo Census Project and serve as an element for risk assessment for future development.

  15. False alarm reduction by the And-ing of multiple multivariate Gaussian classifiers

    NASA Astrophysics Data System (ADS)

    Dobeck, Gerald J.; Cobb, J. Tory

    2003-09-01

    The high-resolution sonar is one of the principal sensors used by the Navy to detect and classify sea mines in minehunting operations. For such sonar systems, substantial effort has been devoted to the development of automated detection and classification (D/C) algorithms. These have been spurred by several factors including (1) aids for operators to reduce work overload, (2) more optimal use of all available data, and (3) the introduction of unmanned minehunting systems. The environments where sea mines are typically laid (harbor areas, shipping lanes, and the littorals) give rise to many false alarms caused by natural, biologic, and man-made clutter. The objective of the automated D/C algorithms is to eliminate most of these false alarms while still maintaining a very high probability of mine detection and classification (PdPc). In recent years, the benefits of fusing the outputs of multiple D/C algorithms have been studied. We refer to this as Algorithm Fusion. The results have been remarkable, including reliable robustness to new environments. This paper describes a method for training several multivariate Gaussian classifiers such that their And-ing dramatically reduces false alarms while maintaining a high probability of classification. This training approach is referred to as the Focused- Training method. This work extends our 2001-2002 work where the Focused-Training method was used with three other types of classifiers: the Attractor-based K-Nearest Neighbor Neural Network (a type of radial-basis, probabilistic neural network), the Optimal Discrimination Filter Classifier (based linear discrimination theory), and the Quadratic Penalty Function Support Vector Machine (QPFSVM). Although our experience has been gained in the area of sea mine detection and classification, the principles described herein are general and can be applied to a wide range of pattern recognition and automatic target recognition (ATR) problems.

  16. Landcover classification in MRF context using Dempster-Shafer fusion for multisensor imagery.

    PubMed

    Sarkar, Anjan; Banerjee, Anjan; Banerjee, Nilanjan; Brahma, Siddhartha; Kartikeyan, B; Chakraborty, Manab; Majumder, K L

    2005-05-01

    This work deals with multisensor data fusion to obtain landcover classification. The role of feature-level fusion using the Dempster-Shafer rule and that of data-level fusion in the MRF context is studied in this paper to obtain an optimally segmented image. Subsequently, segments are validated and classification accuracy for the test data is evaluated. Two examples of data fusion of optical images and a synthetic aperture radar image are presented, each set having been acquired on different dates. Classification accuracies of the technique proposed are compared with those of some recent techniques in literature for the same image data.

  17. An efficient ensemble learning method for gene microarray classification.

    PubMed

    Osareh, Alireza; Shadgar, Bita

    2013-01-01

    The gene microarray analysis and classification have demonstrated an effective way for the effective diagnosis of diseases and cancers. However, it has been also revealed that the basic classification techniques have intrinsic drawbacks in achieving accurate gene classification and cancer diagnosis. On the other hand, classifier ensembles have received increasing attention in various applications. Here, we address the gene classification issue using RotBoost ensemble methodology. This method is a combination of Rotation Forest and AdaBoost techniques which in turn preserve both desirable features of an ensemble architecture, that is, accuracy and diversity. To select a concise subset of informative genes, 5 different feature selection algorithms are considered. To assess the efficiency of the RotBoost, other nonensemble/ensemble techniques including Decision Trees, Support Vector Machines, Rotation Forest, AdaBoost, and Bagging are also deployed. Experimental results have revealed that the combination of the fast correlation-based feature selection method with ICA-based RotBoost ensemble is highly effective for gene classification. In fact, the proposed method can create ensemble classifiers which outperform not only the classifiers produced by the conventional machine learning but also the classifiers generated by two widely used conventional ensemble learning methods, that is, Bagging and AdaBoost.

  18. Maximum-likelihood techniques for joint segmentation-classification of multispectral chromosome images.

    PubMed

    Schwartzkopf, Wade C; Bovik, Alan C; Evans, Brian L

    2005-12-01

    Traditional chromosome imaging has been limited to grayscale images, but recently a 5-fluorophore combinatorial labeling technique (M-FISH) was developed wherein each class of chromosomes binds with a different combination of fluorophores. This results in a multispectral image, where each class of chromosomes has distinct spectral components. In this paper, we develop new methods for automatic chromosome identification by exploiting the multispectral information in M-FISH chromosome images and by jointly performing chromosome segmentation and classification. We (1) develop a maximum-likelihood hypothesis test that uses multispectral information, together with conventional criteria, to select the best segmentation possibility; (2) use this likelihood function to combine chromosome segmentation and classification into a robust chromosome identification system; and (3) show that the proposed likelihood function can also be used as a reliable indicator of errors in segmentation, errors in classification, and chromosome anomalies, which can be indicators of radiation damage, cancer, and a wide variety of inherited diseases. We show that the proposed multispectral joint segmentation-classification method outperforms past grayscale segmentation methods when decomposing touching chromosomes. We also show that it outperforms past M-FISH classification techniques that do not use segmentation information.

  19. Significance of clustering and classification applications in digital and physical libraries

    NASA Astrophysics Data System (ADS)

    Triantafyllou, Ioannis; Koulouris, Alexandros; Zervos, Spiros; Dendrinos, Markos; Giannakopoulos, Georgios

    2015-02-01

    Applications of clustering and classification techniques can be proved very significant in both digital and physical (paper-based) libraries. The most essential application, document classification and clustering, is crucial for the content that is produced and maintained in digital libraries, repositories, databases, social media, blogs etc., based on various tags and ontology elements, transcending the traditional library-oriented classification schemes. Other applications with very useful and beneficial role in the new digital library environment involve document routing, summarization and query expansion. Paper-based libraries can benefit as well since classification combined with advanced material characterization techniques such as FTIR (Fourier Transform InfraRed spectroscopy) can be vital for the study and prevention of material deterioration. An improved two-level self-organizing clustering architecture is proposed in order to enhance the discrimination capacity of the learning space, prior to classification, yielding promising results when applied to the above mentioned library tasks.

  20. Application of multivariable statistical techniques in plant-wide WWTP control strategies analysis.

    PubMed

    Flores, X; Comas, J; Roda, I R; Jiménez, L; Gernaey, K V

    2007-01-01

    The main objective of this paper is to present the application of selected multivariable statistical techniques in plant-wide wastewater treatment plant (WWTP) control strategies analysis. In this study, cluster analysis (CA), principal component analysis/factor analysis (PCA/FA) and discriminant analysis (DA) are applied to the evaluation matrix data set obtained by simulation of several control strategies applied to the plant-wide IWA Benchmark Simulation Model No 2 (BSM2). These techniques allow i) to determine natural groups or clusters of control strategies with a similar behaviour, ii) to find and interpret hidden, complex and casual relation features in the data set and iii) to identify important discriminant variables within the groups found by the cluster analysis. This study illustrates the usefulness of multivariable statistical techniques for both analysis and interpretation of the complex multicriteria data sets and allows an improved use of information for effective evaluation of control strategies.

  1. Formalized classification of moss litters in swampy spruce forests of intermontane depressions of Kuznetsk Alatau

    NASA Astrophysics Data System (ADS)

    Efremova, T. T.; Avrova, A. F.; Efremov, S. P.

    2016-09-01

    The approaches of multivariate statistics have been used for the numerical classification of morphogenetic types of moss litters in swampy spruce forests according to their physicochemical properties (the ash content, decomposition degree, bulk density, pH, mass, and thickness). Three clusters of moss litters— peat, peaty, and high-ash peaty—have been specified. The functions of classification for identification of new objects have been calculated and evaluated. The degree of decomposition and the ash content are the main classification parameters of litters, though all other characteristics are also statistically significant. The final prediction accuracy of the assignment of a litter to a particular cluster is 86%. Two leading factors participating in the clustering of litters have been determined. The first factor—the degree of transformation of plant remains (quality)—specifies 49% of the total variance, and the second factor—the accumulation rate (quantity)— specifies 26% of the total variance. The morphogenetic structure and physicochemical properties of the clusters of moss litters are characterized.

  2. Comparing statistical and machine learning classifiers: alternatives for predictive modeling in human factors research.

    PubMed

    Carnahan, Brian; Meyer, Gérard; Kuntz, Lois-Ann

    2003-01-01

    Multivariate classification models play an increasingly important role in human factors research. In the past, these models have been based primarily on discriminant analysis and logistic regression. Models developed from machine learning research offer the human factors professional a viable alternative to these traditional statistical classification methods. To illustrate this point, two machine learning approaches--genetic programming and decision tree induction--were used to construct classification models designed to predict whether or not a student truck driver would pass his or her commercial driver license (CDL) examination. The models were developed and validated using the curriculum scores and CDL exam performances of 37 student truck drivers who had completed a 320-hr driver training course. Results indicated that the machine learning classification models were superior to discriminant analysis and logistic regression in terms of predictive accuracy. Actual or potential applications of this research include the creation of models that more accurately predict human performance outcomes.

  3. Simultaneous fecal microbial and metabolite profiling enables accurate classification of pediatric irritable bowel syndrome.

    PubMed

    Shankar, Vijay; Reo, Nicholas V; Paliy, Oleg

    2015-12-09

    We previously showed that stool samples of pre-adolescent and adolescent US children diagnosed with diarrhea-predominant IBS (IBS-D) had different compositions of microbiota and metabolites compared to healthy age-matched controls. Here we explored whether observed fecal microbiota and metabolite differences between these two adolescent populations can be used to discriminate between IBS and health. We constructed individual microbiota- and metabolite-based sample classification models based on the partial least squares multivariate analysis and then applied a Bayesian approach to integrate individual models into a single classifier. The resulting combined classification achieved 84 % accuracy of correct sample group assignment and 86 % prediction for IBS-D in cross-validation tests. The performance of the cumulative classification model was further validated by the de novo analysis of stool samples from a small independent IBS-D cohort. High-throughput microbial and metabolite profiling of subject stool samples can be used to facilitate IBS diagnosis.

  4. Electroconvulsive therapy-induced brain functional connectivity predicts therapeutic efficacy in patients with schizophrenia: a multivariate pattern recognition study.

    PubMed

    Li, Peng; Jing, Ri-Xing; Zhao, Rong-Jiang; Ding, Zeng-Bo; Shi, Le; Sun, Hong-Qiang; Lin, Xiao; Fan, Teng-Teng; Dong, Wen-Tian; Fan, Yong; Lu, Lin

    2017-05-11

    Previous studies suggested that electroconvulsive therapy can influence regional metabolism and dopamine signaling, thereby alleviating symptoms of schizophrenia. It remains unclear what patients may benefit more from the treatment. The present study sought to identify biomarkers that predict the electroconvulsive therapy response in individual patients. Thirty-four schizophrenia patients and 34 controls were included in this study. Patients were scanned prior to treatment and after 6 weeks of treatment with antipsychotics only (n = 16) or a combination of antipsychotics and electroconvulsive therapy (n = 13). Subject-specific intrinsic connectivity networks were computed for each subject using a group information-guided independent component analysis technique. Classifiers were built to distinguish patients from controls and quantify brain states based on intrinsic connectivity networks. A general linear model was built on the classification scores of first scan (referred to as baseline classification scores) to predict treatment response. Classifiers built on the default mode network, the temporal lobe network, the language network, the corticostriatal network, the frontal-parietal network, and the cerebellum achieved a cross-validated classification accuracy of 83.82%, with specificity of 91.18% and sensitivity of 76.47%. After the electroconvulsive therapy, psychosis symptoms of the patients were relieved and classification scores of the patients were decreased. Moreover, the baseline classification scores were predictive for the treatment outcome. Schizophrenia patients exhibited functional deviations in multiple intrinsic connectivity networks which were able to distinguish patients from healthy controls at an individual level. Patients with lower classification scores prior to treatment had better treatment outcome, indicating that the baseline classification scores before treatment is a good predictor for treatment outcome. CONNECTIVITY NETWORKS REVEAL GOOD CANDIDATES FOR BRAIN STIMULATION: Connectivity patterns in the brain may help identify patients with schizophrenia most likely to benefit from electroconvulsive therapy. A team led by Lin Lu from Peking University, China, and Yong Fan from the University of Pennsylvania, USA, took functional magnetic resonance imaging (MRI) scans of 34 people with schizophrenia and 34 control individuals without mental illness. Those with schizophrenia were scanned before and after treatment; some received antipsychotics alone, others received medication plus electroconvulsive therapy. The researchers created organizational brain maps known as "intrinsic connectivity networks" for each individual, and showed that the neuroimaging pattern could discriminate between people with and without schizophrenia. For the schizophrenia patients, the connectivity networks taken prior to treatment also helped predict who would benefit from the brain-stimulation procedure. Such a biomarker could prove a useful diagnostic tool for clinicians.

  5. Gender, Race, and Survival: A Study in Non-Small-Cell Lung Cancer Brain Metastases Patients Utilizing the Radiation Therapy Oncology Group Recursive Partitioning Analysis Classification

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Videtic, Gregory M.M., E-mail: videtig@ccf.or; Reddy, Chandana A.; Chao, Samuel T.

    Purpose: To explore whether gender and race influence survival in non-small-cell lung cancer (NSCLC) in patients with brain metastases, using our large single-institution brain tumor database and the Radiation Therapy Oncology Group recursive partitioning analysis (RPA) brain metastases classification. Methods and materials: A retrospective review of a single-institution brain metastasis database for the interval January 1982 to September 2004 yielded 835 NSCLC patients with brain metastases for analysis. Patient subsets based on combinations of gender, race, and RPA class were then analyzed for survival differences. Results: Median follow-up was 5.4 months (range, 0-122.9 months). There were 485 male patients (M)more » (58.4%) and 346 female patients (F) (41.6%). Of the 828 evaluable patients (99%), 143 (17%) were black/African American (B) and 685 (83%) were white/Caucasian (W). Median survival time (MST) from time of brain metastasis diagnosis for all patients was 5.8 months. Median survival time by gender (F vs. M) and race (W vs. B) was 6.3 months vs. 5.5 months (p = 0.013) and 6.0 months vs. 5.2 months (p = 0.08), respectively. For patients stratified by RPA class, gender, and race, MST significantly favored BFs over BMs in Class II: 11.2 months vs. 4.6 months (p = 0.021). On multivariable analysis, significant variables were gender (p = 0.041, relative risk [RR] 0.83) and RPA class (p < 0.0001, RR 0.28 for I vs. III; p < 0.0001, RR 0.51 for II vs. III) but not race. Conclusions: Gender significantly influences NSCLC brain metastasis survival. Race trended to significance in overall survival but was not significant on multivariable analysis. Multivariable analysis identified gender and RPA classification as significant variables with respect to survival.« less

  6. Diagnosis of rheumatoid arthritis: multivariate analysis of biomarkers.

    PubMed

    Wild, Norbert; Karl, Johann; Grunert, Veit P; Schmitt, Raluca I; Garczarek, Ursula; Krause, Friedemann; Hasler, Fritz; van Riel, Piet L C M; Bayer, Peter M; Thun, Matthias; Mattey, Derek L; Sharif, Mohammed; Zolg, Werner

    2008-02-01

    To test if a combination of biomarkers can increase the classification power of autoantibodies to cyclic citrullinated peptides (anti-CCP) in the diagnosis of rheumatoid arthritis (RA) depending on the diagnostic situation. Biomarkers were subject to three inclusion/exclusion criteria (discrimination between RA patients and healthy blood donors, ability to identify anti-CCP-negative RA patients, specificity in a panel with major non-rheumatological diseases) before univariate ranking and multivariate analysis was carried out using a modelling panel (n = 906). To enable the evaluation of the classification power in different diagnostic settings the disease controls (n = 542) were weighted according to the admission rates in rheumatology clinics modelling a clinic panel or according to the relative prevalences of musculoskeletal disorders in the general population seen by general practitioners modelling a GP panel. Out of 131 biomarkers considered originally, we evaluated 32 biomarkers in this study, of which only seven passed the three inclusion/exclusion criteria and were combined by multivariate analysis using four different mathematical models. In the modelled clinic panel, anti-CCP was the lead marker with a sensitivity of 75.8% and a specificity of 94.0%. Due to the lack in specificity of the markers other than anti-CCP in this diagnostic setting, any gain in sensitivity by any marker combination is off-set by a corresponding loss in specificity. In the modelled GP panel, the best marker combination of anti-CCP and interleukin (IL)-6 resulted in a sensitivity gain of 7.6% (85.9% vs. 78.3%) at a minor loss in specificity of 1.6% (90.3% vs. 91.9%) compared with anti-CCP as the best single marker. Depending on the composition of the sample panel, anti-CCP alone or anti-CCP in combination with IL-6 has the highest classification power for the diagnosis of established RA.

  7. Computer-Assisted Decision Support for Student Admissions Based on Their Predicted Academic Performance.

    PubMed

    Muratov, Eugene; Lewis, Margaret; Fourches, Denis; Tropsha, Alexander; Cox, Wendy C

    2017-04-01

    Objective. To develop predictive computational models forecasting the academic performance of students in the didactic-rich portion of a doctor of pharmacy (PharmD) curriculum as admission-assisting tools. Methods. All PharmD candidates over three admission cycles were divided into two groups: those who completed the PharmD program with a GPA ≥ 3; and the remaining candidates. Random Forest machine learning technique was used to develop a binary classification model based on 11 pre-admission parameters. Results. Robust and externally predictive models were developed that had particularly high overall accuracy of 77% for candidates with high or low academic performance. These multivariate models were highly accurate in predicting these groups to those obtained using undergraduate GPA and composite PCAT scores only. Conclusion. The models developed in this study can be used to improve the admission process as preliminary filters and thus quickly identify candidates who are likely to be successful in the PharmD curriculum.

  8. Vibrational biospectroscopy: from plants to animals to humans. A historical perspective

    NASA Astrophysics Data System (ADS)

    Shaw, R. Anthony; Mantsch, Henry H.

    1999-05-01

    Today, more than ever, vibrational spectroscopy means different things to different people. From their roots as molecular fingerprinting techniques, both infrared and Raman spectroscopy have evolved to the point where they play roles in a staggering variety of scientific endeavors. This survey focuses upon biological and medical applications. The past 40 years have witnessed enormous advances in our understanding of the building blocks of life, and vibrational spectroscopy has played an important role. That role is reviewed briefly here. In parallel with these efforts, the near-IR community developed powerful 'chemometric' methods to extract a wealth of information from spectra that appeared superficially featureless. As vibrational spectroscopy is finding new niches in the medical and clinical realms, these chemometric methods are proving to be a valuable (but not infallible!) adjunct to conventional spectral interpretation. This survey includes a brief outline of biomedical vibrational spectroscopy and imaging, including several representative examples to illustrate the strengths and pitfalls of a growing reliance upon multivariate quantitation and classification methods.

  9. Qualitative and quantitative analysis of milk for the detection of adulteration by Laser Induced Breakdown Spectroscopy (LIBS).

    PubMed

    Moncayo, S; Manzoor, S; Rosales, J D; Anzano, J; Caceres, J O

    2017-10-01

    The present work focuses on the development of a fast and cost effective method based on Laser Induced Breakdown Spectroscopy (LIBS) to the quality control, traceability and detection of adulteration in milk. Two adulteration cases have been studied; a qualitative analysis for the discrimination between different milk blends and quantification of melamine in adulterated toddler milk powder. Principal Component Analysis (PCA) and neural networks (NN) have been used to analyze LIBS spectra obtaining a correct classification rate of 98% with a 100% of robustness. For the quantification of melamine, two methodologies have been developed; univariate analysis using CN emission band and multivariate calibration NN model obtaining correlation coefficient (R 2 ) values of 0.982 and 0.999 respectively. The results of the use of LIBS technique coupled with chemometric analysis are discussed in terms of its potential use in the food industry to perform the quality control of this dairy product. Copyright © 2017 Elsevier Ltd. All rights reserved.

  10. Evaluation criteria for software classification inventories, accuracies, and maps

    NASA Technical Reports Server (NTRS)

    Jayroe, R. R., Jr.

    1976-01-01

    Statistical criteria are presented for modifying the contingency table used to evaluate tabular classification results obtained from remote sensing and ground truth maps. This classification technique contains information on the spatial complexity of the test site, on the relative location of classification errors, on agreement of the classification maps with ground truth maps, and reduces back to the original information normally found in a contingency table.

  11. Computer-implemented land use classification with pattern recognition software and ERTS digital data. [Mississippi coastal plains

    NASA Technical Reports Server (NTRS)

    Joyce, A. T.

    1974-01-01

    Significant progress has been made in the classification of surface conditions (land uses) with computer-implemented techniques based on the use of ERTS digital data and pattern recognition software. The supervised technique presently used at the NASA Earth Resources Laboratory is based on maximum likelihood ratioing with a digital table look-up approach to classification. After classification, colors are assigned to the various surface conditions (land uses) classified, and the color-coded classification is film recorded on either positive or negative 9 1/2 in. film at the scale desired. Prints of the film strips are then mosaicked and photographed to produce a land use map in the format desired. Computer extraction of statistical information is performed to show the extent of each surface condition (land use) within any given land unit that can be identified in the image. Evaluations of the product indicate that classification accuracy is well within the limits for use by land resource managers and administrators. Classifications performed with digital data acquired during different seasons indicate that the combination of two or more classifications offer even better accuracy.

  12. Motor Oil Classification using Color Histograms and Pattern Recognition Techniques.

    PubMed

    Ahmadi, Shiva; Mani-Varnosfaderani, Ahmad; Habibi, Biuck

    2018-04-20

    Motor oil classification is important for quality control and the identification of oil adulteration. In thiswork, we propose a simple, rapid, inexpensive and nondestructive approach based on image analysis and pattern recognition techniques for the classification of nine different types of motor oils according to their corresponding color histograms. For this, we applied color histogram in different color spaces such as red green blue (RGB), grayscale, and hue saturation intensity (HSI) in order to extract features that can help with the classification procedure. These color histograms and their combinations were used as input for model development and then were statistically evaluated by using linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), and support vector machine (SVM) techniques. Here, two common solutions for solving a multiclass classification problem were applied: (1) transformation to binary classification problem using a one-against-all (OAA) approach and (2) extension from binary classifiers to a single globally optimized multilabel classification model. In the OAA strategy, LDA, QDA, and SVM reached up to 97% in terms of accuracy, sensitivity, and specificity for both the training and test sets. In extension from binary case, despite good performances by the SVM classification model, QDA and LDA provided better results up to 92% for RGB-grayscale-HSI color histograms and up to 93% for the HSI color map, respectively. In order to reduce the numbers of independent variables for modeling, a principle component analysis algorithm was used. Our results suggest that the proposed method is promising for the identification and classification of different types of motor oils.

  13. Determination of fragrance content in perfume by Raman spectroscopy and multivariate calibration

    NASA Astrophysics Data System (ADS)

    Godinho, Robson B.; Santos, Mauricio C.; Poppi, Ronei J.

    2016-03-01

    An alternative methodology is herein proposed for determination of fragrance content in perfumes and their classification according to the guidelines established by fine perfume manufacturers. The methodology is based on Raman spectroscopy associated with multivariate calibration, allowing the determination of fragrance content in a fast, nondestructive, and sustainable manner. The results were considered consistent with the conventional method, whose standard error of prediction values was lower than the 1.0%. This result indicates that the proposed technology is a feasible analytical tool for determination of the fragrance content in a hydro-alcoholic solution for use in manufacturing, quality control and regulatory agencies.

  14. Multivariate assessment of event-related potentials with the t-CWT method.

    PubMed

    Bostanov, Vladimir

    2015-11-05

    Event-related brain potentials (ERPs) are usually assessed with univariate statistical tests although they are essentially multivariate objects. Brain-computer interface applications are a notable exception to this practice, because they are based on multivariate classification of single-trial ERPs. Multivariate ERP assessment can be facilitated by feature extraction methods. One such method is t-CWT, a mathematical-statistical algorithm based on the continuous wavelet transform (CWT) and Student's t-test. This article begins with a geometric primer on some basic concepts of multivariate statistics as applied to ERP assessment in general and to the t-CWT method in particular. Further, it presents for the first time a detailed, step-by-step, formal mathematical description of the t-CWT algorithm. A new multivariate outlier rejection procedure based on principal component analysis in the frequency domain is presented as an important pre-processing step. The MATLAB and GNU Octave implementation of t-CWT is also made publicly available for the first time as free and open source code. The method is demonstrated on some example ERP data obtained in a passive oddball paradigm. Finally, some conceptually novel applications of the multivariate approach in general and of the t-CWT method in particular are suggested and discussed. Hopefully, the publication of both the t-CWT source code and its underlying mathematical algorithm along with a didactic geometric introduction to some basic concepts of multivariate statistics would make t-CWT more accessible to both users and developers in the field of neuroscience research.

  15. Multivariate stochastic simulation with subjective multivariate normal distributions

    Treesearch

    P. J. Ince; J. Buongiorno

    1991-01-01

    In many applications of Monte Carlo simulation in forestry or forest products, it may be known that some variables are correlated. However, for simplicity, in most simulations it has been assumed that random variables are independently distributed. This report describes an alternative Monte Carlo simulation technique for subjectively assesed multivariate normal...

  16. Model-based Clustering of High-Dimensional Data in Astrophysics

    NASA Astrophysics Data System (ADS)

    Bouveyron, C.

    2016-05-01

    The nature of data in Astrophysics has changed, as in other scientific fields, in the past decades due to the increase of the measurement capabilities. As a consequence, data are nowadays frequently of high dimensionality and available in mass or stream. Model-based techniques for clustering are popular tools which are renowned for their probabilistic foundations and their flexibility. However, classical model-based techniques show a disappointing behavior in high-dimensional spaces which is mainly due to their dramatical over-parametrization. The recent developments in model-based classification overcome these drawbacks and allow to efficiently classify high-dimensional data, even in the "small n / large p" situation. This work presents a comprehensive review of these recent approaches, including regularization-based techniques, parsimonious modeling, subspace classification methods and classification methods based on variable selection. The use of these model-based methods is also illustrated on real-world classification problems in Astrophysics using R packages.

  17. Intrapartum fetal heart rate classification from trajectory in Sparse SVM feature space.

    PubMed

    Spilka, J; Frecon, J; Leonarduzzi, R; Pustelnik, N; Abry, P; Doret, M

    2015-01-01

    Intrapartum fetal heart rate (FHR) constitutes a prominent source of information for the assessment of fetal reactions to stress events during delivery. Yet, early detection of fetal acidosis remains a challenging signal processing task. The originality of the present contribution are three-fold: multiscale representations and wavelet leader based multifractal analysis are used to quantify FHR variability ; Supervised classification is achieved by means of Sparse-SVM that aim jointly to achieve optimal detection performance and to select relevant features in a multivariate setting ; Trajectories in the feature space accounting for the evolution along time of features while labor progresses are involved in the construction of indices quantifying fetal health. The classification performance permitted by this combination of tools are quantified on a intrapartum FHR large database (≃ 1250 subjects) collected at a French academic public hospital.

  18. An application of LANDSAT multispectral imagery for the classification of hydrobiological systems, Shark River Slough, Everglades National Park, Florida

    NASA Technical Reports Server (NTRS)

    Rose, P. W.; Rosendahl, P. C. (Principal Investigator)

    1979-01-01

    Multivariant hydrologic parameters over the Shark River Slough were investigated. Ground truth was established utilizing U-2 infrared photography and comprehensive field data to define a control network which represented all hydrobiological systems in the slough. These data were then applied to LANDSAT imagery utilizing an interactive multispectral processor which generated hydrographic maps through classification of the slough and defined the multispectral surface radiance characteristics of the wetlands areas in the park. The spectral response of each hydrobiological zone was determined and plotted to formulate multispectral relationships between the emittent energy from the slough in order to determine the best possible multispectral wavelength combinations to enhance classification results. The extent of each hydrobiological zone in slough was determined and flow vectors for water movement throughout the slough established.

  19. a Single-Exposure Dual-Energy Computed Radiography Technique for Improved Nodule Detection and Classification in Chest Imaging

    NASA Astrophysics Data System (ADS)

    Zink, Frank Edward

    The detection and classification of pulmonary nodules is of great interest in chest radiography. Nodules are often indicative of primary cancer, and their detection is particularly important in asymptomatic patients. The ability to classify nodules as calcified or non-calcified is important because calcification is a positive indicator that the nodule is benign. Dual-energy methods offer the potential to improve both the detection and classification of nodules by allowing the formation of material-selective images. Tissue-selective images can improve detection by virtue of the elimination of obscuring rib structure. Bone -selective images are essentially calcium images, allowing classification of the nodule. A dual-energy technique is introduced which uses a computed radiography system to acquire dual-energy chest radiographs in a single-exposure. All aspects of the dual-energy technique are described, with particular emphasis on scatter-correction, beam-hardening correction, and noise-reduction algorithms. The adaptive noise-reduction algorithm employed improves material-selective signal-to-noise ratio by up to a factor of seven with minimal sacrifice in selectivity. A clinical comparison study is described, undertaken to compare the dual-energy technique to conventional chest radiography for the tasks of nodule detection and classification. Observer performance data were collected using the Free Response Observer Characteristic (FROC) method and the bi-normal Alternative FROC (AFROC) performance model. Results of the comparison study, analyzed using two common multiple observer statistical models, showed that the dual-energy technique was superior to conventional chest radiography for detection of nodules at a statistically significant level (p < .05). Discussion of the comparison study emphasizes the unique combination of data collection and analysis techniques employed, as well as the limitations of comparison techniques in the larger context of technology assessment.

  20. Physical Human Activity Recognition Using Wearable Sensors.

    PubMed

    Attal, Ferhat; Mohammed, Samer; Dedabrishvili, Mariam; Chamroukhi, Faicel; Oukhellou, Latifa; Amirat, Yacine

    2015-12-11

    This paper presents a review of different classification techniques used to recognize human activities from wearable inertial sensor data. Three inertial sensor units were used in this study and were worn by healthy subjects at key points of upper/lower body limbs (chest, right thigh and left ankle). Three main steps describe the activity recognition process: sensors' placement, data pre-processing and data classification. Four supervised classification techniques namely, k-Nearest Neighbor (k-NN), Support Vector Machines (SVM), Gaussian Mixture Models (GMM), and Random Forest (RF) as well as three unsupervised classification techniques namely, k-Means, Gaussian mixture models (GMM) and Hidden Markov Model (HMM), are compared in terms of correct classification rate, F-measure, recall, precision, and specificity. Raw data and extracted features are used separately as inputs of each classifier. The feature selection is performed using a wrapper approach based on the RF algorithm. Based on our experiments, the results obtained show that the k-NN classifier provides the best performance compared to other supervised classification algorithms, whereas the HMM classifier is the one that gives the best results among unsupervised classification algorithms. This comparison highlights which approach gives better performance in both supervised and unsupervised contexts. It should be noted that the obtained results are limited to the context of this study, which concerns the classification of the main daily living human activities using three wearable accelerometers placed at the chest, right shank and left ankle of the subject.

  1. Physical Human Activity Recognition Using Wearable Sensors

    PubMed Central

    Attal, Ferhat; Mohammed, Samer; Dedabrishvili, Mariam; Chamroukhi, Faicel; Oukhellou, Latifa; Amirat, Yacine

    2015-01-01

    This paper presents a review of different classification techniques used to recognize human activities from wearable inertial sensor data. Three inertial sensor units were used in this study and were worn by healthy subjects at key points of upper/lower body limbs (chest, right thigh and left ankle). Three main steps describe the activity recognition process: sensors’ placement, data pre-processing and data classification. Four supervised classification techniques namely, k-Nearest Neighbor (k-NN), Support Vector Machines (SVM), Gaussian Mixture Models (GMM), and Random Forest (RF) as well as three unsupervised classification techniques namely, k-Means, Gaussian mixture models (GMM) and Hidden Markov Model (HMM), are compared in terms of correct classification rate, F-measure, recall, precision, and specificity. Raw data and extracted features are used separately as inputs of each classifier. The feature selection is performed using a wrapper approach based on the RF algorithm. Based on our experiments, the results obtained show that the k-NN classifier provides the best performance compared to other supervised classification algorithms, whereas the HMM classifier is the one that gives the best results among unsupervised classification algorithms. This comparison highlights which approach gives better performance in both supervised and unsupervised contexts. It should be noted that the obtained results are limited to the context of this study, which concerns the classification of the main daily living human activities using three wearable accelerometers placed at the chest, right shank and left ankle of the subject. PMID:26690450

  2. Virtual Sensor of Surface Electromyography in a New Extensive Fault-Tolerant Classification System.

    PubMed

    de Moura, Karina de O A; Balbinot, Alexandre

    2018-05-01

    A few prosthetic control systems in the scientific literature obtain pattern recognition algorithms adapted to changes that occur in the myoelectric signal over time and, frequently, such systems are not natural and intuitive. These are some of the several challenges for myoelectric prostheses for everyday use. The concept of the virtual sensor, which has as its fundamental objective to estimate unavailable measures based on other available measures, is being used in other fields of research. The virtual sensor technique applied to surface electromyography can help to minimize these problems, typically related to the degradation of the myoelectric signal that usually leads to a decrease in the classification accuracy of the movements characterized by computational intelligent systems. This paper presents a virtual sensor in a new extensive fault-tolerant classification system to maintain the classification accuracy after the occurrence of the following contaminants: ECG interference, electrode displacement, movement artifacts, power line interference, and saturation. The Time-Varying Autoregressive Moving Average (TVARMA) and Time-Varying Kalman filter (TVK) models are compared to define the most robust model for the virtual sensor. Results of movement classification were presented comparing the usual classification techniques with the method of the degraded signal replacement and classifier retraining. The experimental results were evaluated for these five noise types in 16 surface electromyography (sEMG) channel degradation case studies. The proposed system without using classifier retraining techniques recovered of mean classification accuracy was of 4% to 38% for electrode displacement, movement artifacts, and saturation noise. The best mean classification considering all signal contaminants and channel combinations evaluated was the classification using the retraining method, replacing the degraded channel by the virtual sensor TVARMA model. This method recovered the classification accuracy after the degradations, reaching an average of 5.7% below the classification of the clean signal, that is the signal without the contaminants or the original signal. Moreover, the proposed intelligent technique minimizes the impact of the motion classification caused by signal contamination related to degrading events over time. There are improvements in the virtual sensor model and in the algorithm optimization that need further development to provide an increase the clinical application of myoelectric prostheses but already presents robust results to enable research with virtual sensors on biological signs with stochastic behavior.

  3. Virtual Sensor of Surface Electromyography in a New Extensive Fault-Tolerant Classification System

    PubMed Central

    Balbinot, Alexandre

    2018-01-01

    A few prosthetic control systems in the scientific literature obtain pattern recognition algorithms adapted to changes that occur in the myoelectric signal over time and, frequently, such systems are not natural and intuitive. These are some of the several challenges for myoelectric prostheses for everyday use. The concept of the virtual sensor, which has as its fundamental objective to estimate unavailable measures based on other available measures, is being used in other fields of research. The virtual sensor technique applied to surface electromyography can help to minimize these problems, typically related to the degradation of the myoelectric signal that usually leads to a decrease in the classification accuracy of the movements characterized by computational intelligent systems. This paper presents a virtual sensor in a new extensive fault-tolerant classification system to maintain the classification accuracy after the occurrence of the following contaminants: ECG interference, electrode displacement, movement artifacts, power line interference, and saturation. The Time-Varying Autoregressive Moving Average (TVARMA) and Time-Varying Kalman filter (TVK) models are compared to define the most robust model for the virtual sensor. Results of movement classification were presented comparing the usual classification techniques with the method of the degraded signal replacement and classifier retraining. The experimental results were evaluated for these five noise types in 16 surface electromyography (sEMG) channel degradation case studies. The proposed system without using classifier retraining techniques recovered of mean classification accuracy was of 4% to 38% for electrode displacement, movement artifacts, and saturation noise. The best mean classification considering all signal contaminants and channel combinations evaluated was the classification using the retraining method, replacing the degraded channel by the virtual sensor TVARMA model. This method recovered the classification accuracy after the degradations, reaching an average of 5.7% below the classification of the clean signal, that is the signal without the contaminants or the original signal. Moreover, the proposed intelligent technique minimizes the impact of the motion classification caused by signal contamination related to degrading events over time. There are improvements in the virtual sensor model and in the algorithm optimization that need further development to provide an increase the clinical application of myoelectric prostheses but already presents robust results to enable research with virtual sensors on biological signs with stochastic behavior. PMID:29723994

  4. Uses of Multivariate Analytical Techniques in Online and Blended Business Education: An Assessment of Current Practice and Recommendations for Future Research

    ERIC Educational Resources Information Center

    Arbaugh, J. B.; Hwang, Alvin

    2013-01-01

    Seeking to assess the analytical rigor of empirical research in management education, this article reviews the use of multivariate statistical techniques in 85 studies of online and blended management education over the past decade and compares them with prescriptions offered by both the organization studies and educational research communities.…

  5. Multimodal Neuroimaging: Basic Concepts and Classification of Neuropsychiatric Diseases.

    PubMed

    Tulay, Emine Elif; Metin, Barış; Tarhan, Nevzat; Arıkan, Mehmet Kemal

    2018-06-01

    Neuroimaging techniques are widely used in neuroscience to visualize neural activity, to improve our understanding of brain mechanisms, and to identify biomarkers-especially for psychiatric diseases; however, each neuroimaging technique has several limitations. These limitations led to the development of multimodal neuroimaging (MN), which combines data obtained from multiple neuroimaging techniques, such as electroencephalography, functional magnetic resonance imaging, and yields more detailed information about brain dynamics. There are several types of MN, including visual inspection, data integration, and data fusion. This literature review aimed to provide a brief summary and basic information about MN techniques (data fusion approaches in particular) and classification approaches. Data fusion approaches are generally categorized as asymmetric and symmetric. The present review focused exclusively on studies based on symmetric data fusion methods (data-driven methods), such as independent component analysis and principal component analysis. Machine learning techniques have recently been introduced for use in identifying diseases and biomarkers of disease. The machine learning technique most widely used by neuroscientists is classification-especially support vector machine classification. Several studies differentiated patients with psychiatric diseases and healthy controls with using combined datasets. The common conclusion among these studies is that the prediction of diseases increases when combining data via MN techniques; however, there remain a few challenges associated with MN, such as sample size. Perhaps in the future N-way fusion can be used to combine multiple neuroimaging techniques or nonimaging predictors (eg, cognitive ability) to overcome the limitations of MN.

  6. SVM classification of microaneurysms with imbalanced dataset based on borderline-SMOTE and data cleaning techniques

    NASA Astrophysics Data System (ADS)

    Wang, Qingjie; Xin, Jingmin; Wu, Jiayi; Zheng, Nanning

    2017-03-01

    Microaneurysms are the earliest clinic signs of diabetic retinopathy, and many algorithms were developed for the automatic classification of these specific pathology. However, the imbalanced class distribution of dataset usually causes the classification accuracy of true microaneurysms be low. Therefore, by combining the borderline synthetic minority over-sampling technique (BSMOTE) with the data cleaning techniques such as Tomek links and Wilson's edited nearest neighbor rule (ENN) to resample the imbalanced dataset, we propose two new support vector machine (SVM) classification algorithms for the microaneurysms. The proposed BSMOTE-Tomek and BSMOTE-ENN algorithms consist of: 1) the adaptive synthesis of the minority samples in the neighborhood of the borderline, and 2) the remove of redundant training samples for improving the efficiency of data utilization. Moreover, the modified SVM classifier with probabilistic outputs is used to divide the microaneurysm candidates into two groups: true microaneurysms and false microaneurysms. The experiments with a public microaneurysms database shows that the proposed algorithms have better classification performance including the receiver operating characteristic (ROC) curve and the free-response receiver operating characteristic (FROC) curve.

  7. Multimodal Feature Integration in the Angular Gyrus during Episodic and Semantic Retrieval

    PubMed Central

    Bonnici, Heidi M.; Richter, Franziska R.; Yazar, Yasemin

    2016-01-01

    Much evidence from distinct lines of investigation indicates the involvement of angular gyrus (AnG) in the retrieval of both episodic and semantic information, but the region's precise function and whether that function differs across episodic and semantic retrieval have yet to be determined. We used univariate and multivariate fMRI analysis methods to examine the role of AnG in multimodal feature integration during episodic and semantic retrieval. Human participants completed episodic and semantic memory tasks involving unimodal (auditory or visual) and multimodal (audio-visual) stimuli. Univariate analyses revealed the recruitment of functionally distinct AnG subregions during the retrieval of episodic and semantic information. Consistent with a role in multimodal feature integration during episodic retrieval, significantly greater AnG activity was observed during retrieval of integrated multimodal episodic memories compared with unimodal episodic memories. Multivariate classification analyses revealed that individual multimodal episodic memories could be differentiated in AnG, with classification accuracy tracking the vividness of participants' reported recollections, whereas distinct unimodal memories were represented in sensory association areas only. In contrast to episodic retrieval, AnG was engaged to a statistically equivalent degree during retrieval of unimodal and multimodal semantic memories, suggesting a distinct role for AnG during semantic retrieval. Modality-specific sensory association areas exhibited corresponding activity during both episodic and semantic retrieval, which mirrored the functional specialization of these regions during perception. The results offer new insights into the integrative processes subserved by AnG and its contribution to our subjective experience of remembering. SIGNIFICANCE STATEMENT Using univariate and multivariate fMRI analyses, we provide evidence that functionally distinct subregions of angular gyrus (AnG) contribute to the retrieval of episodic and semantic memories. Our multivariate pattern classifier could distinguish episodic memory representations in AnG according to whether they were multimodal (audio-visual) or unimodal (auditory or visual) in nature, whereas statistically equivalent AnG activity was observed during retrieval of unimodal and multimodal semantic memories. Classification accuracy during episodic retrieval scaled with the trial-by-trial vividness with which participants experienced their recollections. Therefore, the findings offer new insights into the integrative processes subserved by AnG and how its function may contribute to our subjective experience of remembering. PMID:27194327

  8. Multimodal Feature Integration in the Angular Gyrus during Episodic and Semantic Retrieval.

    PubMed

    Bonnici, Heidi M; Richter, Franziska R; Yazar, Yasemin; Simons, Jon S

    2016-05-18

    Much evidence from distinct lines of investigation indicates the involvement of angular gyrus (AnG) in the retrieval of both episodic and semantic information, but the region's precise function and whether that function differs across episodic and semantic retrieval have yet to be determined. We used univariate and multivariate fMRI analysis methods to examine the role of AnG in multimodal feature integration during episodic and semantic retrieval. Human participants completed episodic and semantic memory tasks involving unimodal (auditory or visual) and multimodal (audio-visual) stimuli. Univariate analyses revealed the recruitment of functionally distinct AnG subregions during the retrieval of episodic and semantic information. Consistent with a role in multimodal feature integration during episodic retrieval, significantly greater AnG activity was observed during retrieval of integrated multimodal episodic memories compared with unimodal episodic memories. Multivariate classification analyses revealed that individual multimodal episodic memories could be differentiated in AnG, with classification accuracy tracking the vividness of participants' reported recollections, whereas distinct unimodal memories were represented in sensory association areas only. In contrast to episodic retrieval, AnG was engaged to a statistically equivalent degree during retrieval of unimodal and multimodal semantic memories, suggesting a distinct role for AnG during semantic retrieval. Modality-specific sensory association areas exhibited corresponding activity during both episodic and semantic retrieval, which mirrored the functional specialization of these regions during perception. The results offer new insights into the integrative processes subserved by AnG and its contribution to our subjective experience of remembering. Using univariate and multivariate fMRI analyses, we provide evidence that functionally distinct subregions of angular gyrus (AnG) contribute to the retrieval of episodic and semantic memories. Our multivariate pattern classifier could distinguish episodic memory representations in AnG according to whether they were multimodal (audio-visual) or unimodal (auditory or visual) in nature, whereas statistically equivalent AnG activity was observed during retrieval of unimodal and multimodal semantic memories. Classification accuracy during episodic retrieval scaled with the trial-by-trial vividness with which participants experienced their recollections. Therefore, the findings offer new insights into the integrative processes subserved by AnG and how its function may contribute to our subjective experience of remembering. Copyright © 2016 Bonnici, Richter, et al.

  9. Thematic mapper design parameter investigation

    NASA Technical Reports Server (NTRS)

    Colby, C. P., Jr.; Wheeler, S. G.

    1978-01-01

    This study simulated the multispectral data sets to be expected from three different Thematic Mapper configurations, and the ground processing of these data sets by three different resampling techniques. The simulated data sets were then evaluated by processing them for multispectral classification, and the Thematic Mapper configuration, and resampling technique which provided the best classification accuracy were identified.

  10. Adjustment of localized alveolar ridge defects by soft tissue transplantation to improve mucogingival esthetics: a proposal for clinical classification and an evaluation of procedures.

    PubMed

    Studer, S; Naef, R; Schärer, P

    1997-12-01

    Esthetically correct treatment of a localized alveolar ridge defect is a frequent prosthetic challenge. Such defects can be overcome not only by a variety of prosthetic means, but also by several periodontal surgical techniques, notably soft tissue augmentations. Preoperative classification of the localized alveolar ridge defect can be greatly useful in evaluating the prognosis and technical difficulties involved. A semiquantitative classification, dependent on the severity of vertical and horizontal dimensional loss, is proposed to supplement the recognized qualitative classification of a ridge defect. Various methods of soft tissue augmentation are evaluated, based on initial volumetric measurements. The roll flap technique is proposed when the problem is related to ridge quality (single-tooth defect with little horizontal and vertical loss). Larger defects in which a volumetric problem must be solved are corrected through the subepithelial connective tissue technique. Additional mucogingival problems (eg, insufficient gingival width, high frenum, gingival scarring, or tattoo) should not be corrected simultaneously with augmentation procedures. In these cases, the onlay transplant technique is favored.

  11. Signal analysis techniques for incipient failure detection in turbomachinery

    NASA Technical Reports Server (NTRS)

    Coffin, T.

    1985-01-01

    Signal analysis techniques for the detection and classification of incipient mechanical failures in turbomachinery were developed, implemented and evaluated. Signal analysis techniques available to describe dynamic measurement characteristics are reviewed. Time domain and spectral methods are described, and statistical classification in terms of moments is discussed. Several of these waveform analysis techniques were implemented on a computer and applied to dynamic signals. A laboratory evaluation of the methods with respect to signal detection capability is described. Plans for further technique evaluation and data base development to characterize turbopump incipient failure modes from Space Shuttle main engine (SSME) hot firing measurements are outlined.

  12. Stochastic modelling of temperatures affecting the in situ performance of a solar-assisted heat pump: The multivariate approach and physical interpretation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Loveday, D.L.; Craggs, C.

    Box-Jenkins-based multivariate stochastic modeling is carried out using data recorded from a domestic heating system. The system comprises an air-source heat pump sited in the roof space of a house, solar assistance being provided by the conventional tile roof acting as a radiation absorber. Multivariate models are presented which illustrate the time-dependent relationships between three air temperatures - at external ambient, at entry to, and at exit from, the heat pump evaporator. Using a deterministic modeling approach, physical interpretations are placed on the results of the multivariate technique. It is concluded that the multivariate Box-Jenkins approach is a suitable techniquemore » for building thermal analysis. Application to multivariate Box-Jenkins approach is a suitable technique for building thermal analysis. Application to multivariate model-based control is discussed, with particular reference to building energy management systems. It is further concluded that stochastic modeling of data drawn from a short monitoring period offers a means of retrofitting an advanced model-based control system in existing buildings, which could be used to optimize energy savings. An approach to system simulation is suggested.« less

  13. The Pathways for Intelligible Speech: Multivariate and Univariate Perspectives

    PubMed Central

    Evans, S.; Kyong, J.S.; Rosen, S.; Golestani, N.; Warren, J.E.; McGettigan, C.; Mourão-Miranda, J.; Wise, R.J.S.; Scott, S.K.

    2014-01-01

    An anterior pathway, concerned with extracting meaning from sound, has been identified in nonhuman primates. An analogous pathway has been suggested in humans, but controversy exists concerning the degree of lateralization and the precise location where responses to intelligible speech emerge. We have demonstrated that the left anterior superior temporal sulcus (STS) responds preferentially to intelligible speech (Scott SK, Blank CC, Rosen S, Wise RJS. 2000. Identification of a pathway for intelligible speech in the left temporal lobe. Brain. 123:2400–2406.). A functional magnetic resonance imaging study in Cerebral Cortex used equivalent stimuli and univariate and multivariate analyses to argue for the greater importance of bilateral posterior when compared with the left anterior STS in responding to intelligible speech (Okada K, Rong F, Venezia J, Matchin W, Hsieh IH, Saberi K, Serences JT,Hickok G. 2010. Hierarchical organization of human auditory cortex: evidence from acoustic invariance in the response to intelligible speech. 20: 2486–2495.). Here, we also replicate our original study, demonstrating that the left anterior STS exhibits the strongest univariate response and, in decoding using the bilateral temporal cortex, contains the most informative voxels showing an increased response to intelligible speech. In contrast, in classifications using local “searchlights” and a whole brain analysis, we find greater classification accuracy in posterior rather than anterior temporal regions. Thus, we show that the precise nature of the multivariate analysis used will emphasize different response profiles associated with complex sound to speech processing. PMID:23585519

  14. A Multivariate Analytic Approach to the Differential Diagnosis of Apraxia of Speech

    ERIC Educational Resources Information Center

    Basilakos, Alexandra; Yourganov, Grigori; den Ouden, Dirk-Bart; Fogerty, Daniel; Rorden, Chris; Feenaughty, Lynda; Fridriksson, Julius

    2017-01-01

    Purpose: Apraxia of speech (AOS) is a consequence of stroke that frequently co-occurs with aphasia. Its study is limited by difficulties with its perceptual evaluation and dissociation from co-occurring impairments. This study examined the classification accuracy of several acoustic measures for the differential diagnosis of AOS in a sample of…

  15. An unconventional approach to ecosystem unit classification in western North Carolina, USA

    Treesearch

    W. Henry McNab; Sara A. Browning; Steven A. Simon; Penelope E. Fouts

    1999-01-01

    The authors used an unconventional combination of data transformation and multivariate analyses to reduce subjectivity in identification of ecosystem units in a mountainous region of western North Carolina, USA. Vegetative cover and environmental variables were measured on 79 stratified, randomly located, 0.1 ha sample plots in a 4000 ha watershed. Binary...

  16. On the classification techniques in data mining for microarray data classification

    NASA Astrophysics Data System (ADS)

    Aydadenta, Husna; Adiwijaya

    2018-03-01

    Cancer is one of the deadly diseases, according to data from WHO by 2015 there are 8.8 million more deaths caused by cancer, and this will increase every year if not resolved earlier. Microarray data has become one of the most popular cancer-identification studies in the field of health, since microarray data can be used to look at levels of gene expression in certain cell samples that serve to analyze thousands of genes simultaneously. By using data mining technique, we can classify the sample of microarray data thus it can be identified with cancer or not. In this paper we will discuss some research using some data mining techniques using microarray data, such as Support Vector Machine (SVM), Artificial Neural Network (ANN), Naive Bayes, k-Nearest Neighbor (kNN), and C4.5, and simulation of Random Forest algorithm with technique of reduction dimension using Relief. The result of this paper show performance measure (accuracy) from classification algorithm (SVM, ANN, Naive Bayes, kNN, C4.5, and Random Forets).The results in this paper show the accuracy of Random Forest algorithm higher than other classification algorithms (Support Vector Machine (SVM), Artificial Neural Network (ANN), Naive Bayes, k-Nearest Neighbor (kNN), and C4.5). It is hoped that this paper can provide some information about the speed, accuracy, performance and computational cost generated from each Data Mining Classification Technique based on microarray data.

  17. Computer implemented classification of vegetation using aircraft acquired multispectral scanner data

    NASA Technical Reports Server (NTRS)

    Cibula, W. G.

    1975-01-01

    The use of aircraft 24-channel multispectral scanner data in conjunction with computer processing techniques to obtain an automated classification of plant species association was discussed. The classification of various plant species associations was related to information needed for specific applications. In addition, the necessity for multiple selection of training fields for a single class in situations where the study area consists of highly irregular terrain was detailed. A single classification was illuminated differently in different areas, resulting in the existence of multiple spectral signatures for a given class. These different signatures result since different qualities of radiation upwell to the detector from portions that have differing qualities of incident radiation. Techniques of training field selection were outlined, and a classification obtained from a natural area in Tishomingo State Park in northern Mississippi was presented.

  18. Supervision of Ethylene Propylene Diene M-Class (EPDM) Rubber Vulcanization and Recovery Processes Using Attenuated Total Reflection Fourier Transform Infrared (ATR FT-IR) Spectroscopy and Multivariate Analysis.

    PubMed

    Riba Ruiz, Jordi-Roger; Canals, Trini; Cantero, Rosa

    2017-01-01

    Ethylene propylene diene monomer (EPDM) rubber is widely used in a diverse type of applications, such as the automotive, industrial and construction sectors among others. Due to its appealing features, the consumption of vulcanized EPDM rubber is growing significantly. However, environmental issues are forcing the application of devulcanization processes to facilitate recovery, which has led rubber manufacturers to implement strict quality controls. Consequently, it is important to develop methods for supervising the vulcanizing and recovery processes of such products. This paper deals with the supervision process of EPDM compounds by means of Fourier transform mid-infrared (FT-IR) spectroscopy and suitable multivariate statistical methods. An expedited and nondestructive classification approach was applied to a sufficient number of EPDM samples with different applied processes, that is, with and without application of vulcanizing agents, vulcanized samples, and microwave treated samples. First the FT-IR spectra of the samples is acquired and next it is processed by applying suitable feature extraction methods, i.e., principal component analysis and canonical variate analysis to obtain the latent variables to be used for classifying test EPDM samples. Finally, the k nearest neighbor algorithm was used in the classification stage. Experimental results prove the accuracy of the proposed method and the potential of FT-IR spectroscopy in this area, since the classification accuracy can be as high as 100%.

  19. Periodontal inflamed surface area as a novel numerical variable describing periodontal conditions

    PubMed Central

    2017-01-01

    Purpose A novel index, the periodontal inflamed surface area (PISA), represents the sum of the periodontal pocket depth of bleeding on probing (BOP)-positive sites. In the present study, we evaluated correlations between PISA and periodontal classifications, and examined PISA as an index integrating the discrete conventional periodontal indexes. Methods This study was a cross-sectional subgroup analysis of data from a prospective cohort study investigating the association between chronic periodontitis and the clinical features of ankylosing spondylitis. Data from 84 patients without systemic diseases (the control group in the previous study) were analyzed in the present study. Results PISA values were positively correlated with conventional periodontal classifications (Spearman correlation coefficient=0.52; P<0.01) and with periodontal indexes, such as BOP and the plaque index (PI) (r=0.94; P<0.01 and r=0.60; P<0.01, respectively; Pearson correlation test). Porphyromonas gingivalis (P. gingivalis) expression and the presence of serum P. gingivalis antibodies were significant factors affecting PISA values in a simple linear regression analysis, together with periodontal classification, PI, bleeding index, and smoking, but not in the multivariate analysis. In the multivariate linear regression analysis, PISA values were positively correlated with the quantity of current smoking, PI, and severity of periodontal disease. Conclusions PISA integrates multiple periodontal indexes, such as probing pocket depth, BOP, and PI into a numerical variable. PISA is advantageous for quantifying periodontal inflammation and plaque accumulation. PMID:29093989

  20. Predicting clinical diagnosis in Huntington's disease: An imaging polymarker

    PubMed Central

    Daws, Richard E.; Soreq, Eyal; Johnson, Eileanoir B.; Scahill, Rachael I.; Tabrizi, Sarah J.; Barker, Roger A.; Hampshire, Adam

    2018-01-01

    Objective Huntington's disease (HD) gene carriers can be identified before clinical diagnosis; however, statistical models for predicting when overt motor symptoms will manifest are too imprecise to be useful at the level of the individual. Perfecting this prediction is integral to the search for disease modifying therapies. This study aimed to identify an imaging marker capable of reliably predicting real‐life clinical diagnosis in HD. Method A multivariate machine learning approach was applied to resting‐state and structural magnetic resonance imaging scans from 19 premanifest HD gene carriers (preHD, 8 of whom developed clinical disease in the 5 years postscanning) and 21 healthy controls. A classification model was developed using cross‐group comparisons between preHD and controls, and within the preHD group in relation to “estimated” and “actual” proximity to disease onset. Imaging measures were modeled individually, and combined, and permutation modeling robustly tested classification accuracy. Results Classification performance for preHDs versus controls was greatest when all measures were combined. The resulting polymarker predicted converters with high accuracy, including those who were not expected to manifest in that time scale based on the currently adopted statistical models. Interpretation We propose that a holistic multivariate machine learning treatment of brain abnormalities in the premanifest phase can be used to accurately identify those patients within 5 years of developing motor features of HD, with implications for prognostication and preclinical trials. Ann Neurol 2018;83:532–543 PMID:29405351

  1. Characterizing multivariate decoding models based on correlated EEG spectral features.

    PubMed

    McFarland, Dennis J

    2013-07-01

    Multivariate decoding methods are popular techniques for analysis of neurophysiological data. The present study explored potential interpretative problems with these techniques when predictors are correlated. Data from sensorimotor rhythm-based cursor control experiments was analyzed offline with linear univariate and multivariate models. Features were derived from autoregressive (AR) spectral analysis of varying model order which produced predictors that varied in their degree of correlation (i.e., multicollinearity). The use of multivariate regression models resulted in much better prediction of target position as compared to univariate regression models. However, with lower order AR features interpretation of the spectral patterns of the weights was difficult. This is likely to be due to the high degree of multicollinearity present with lower order AR features. Care should be exercised when interpreting the pattern of weights of multivariate models with correlated predictors. Comparison with univariate statistics is advisable. While multivariate decoding algorithms are very useful for prediction their utility for interpretation may be limited when predictors are correlated. Copyright © 2013 International Federation of Clinical Neurophysiology. Published by Elsevier Ireland Ltd. All rights reserved.

  2. Overweight and Obesity Prevalence Among School-Aged Nunavik Inuit Children According to Three Body Mass Index Classification Systems.

    PubMed

    Medehouenou, Thierry Comlan Marc; Ayotte, Pierre; St-Jean, Audray; Meziou, Salma; Roy, Cynthia; Muckle, Gina; Lucas, Michel

    2015-07-01

    Little is known about the suitability of three commonly used body mass index (BMI) classification system for Indigenous children. This study aims to estimate overweight and obesity prevalence among school-aged Nunavik Inuit children according to International Obesity Task Force (IOTF), Centers for Disease Control and Prevention (CDC), and World Health Organization (WHO) BMI classification systems, to measure agreement between those classification systems, and to investigate whether BMI status as defined by these classification systems is associated with levels of metabolic and inflammatory biomarkers. Data were collected on 290 school-aged children (aged 8-14 years; 50.7% girls) from the Nunavik Child Development Study with data collected in 2005-2010. Anthropometric parameters were measured and blood sampled. Participants were classified as normal weight, overweight, and obese according to BMI classification systems. Weighted kappa (κw) statistics assessed agreement between different BMI classification systems, and multivariate analysis of variance ascertained their relationship with metabolic and inflammatory biomarkers. The combined prevalence rate of overweight/obesity was 26.9% (with 6.6% obesity) with IOTF, 24.1% (11.0%) with CDC, and 40.4% (12.8%) with WHO classification systems. Agreement was the highest between IOTF and CDC (κw = .87) classifications, and substantial for IOTF and WHO (κw = .69) and for CDC and WHO (κw = .73). Insulin and high-sensitivity C-reactive protein plasma levels were significantly higher from normal weight to obesity, regardless of classification system. Among obese subjects, higher insulin level was observed with IOTF. Compared with other systems, IOTF classification appears to be more specific to identify overweight and obesity in Inuit children. Copyright © 2015 Society for Adolescent Health and Medicine. Published by Elsevier Inc. All rights reserved.

  3. An experiment in multispectral, multitemporal crop classification using relaxation techniques

    NASA Technical Reports Server (NTRS)

    Davis, L. S.; Wang, C.-Y.; Xie, H.-C

    1983-01-01

    The paper describes the result of an experimental study concerning the use of probabilistic relaxation for improving pixel classification rates. Two LACIE sites were used in the study and in both cases, relaxation resulted in a marked improvement in classification rates.

  4. Untargeted metabolomic profiling of seminal plasma in nonobstructive azoospermia men: A noninvasive detection of spermatogenesis.

    PubMed

    Gilany, Kambiz; Mani-Varnosfaderani, Ahmad; Minai-Tehrani, Arash; Mirzajani, Fateme; Ghassempour, Alireza; Sadeghi, Mohammed Reza; Amini, Mehdi; Rezadoost, Hassan

    2017-08-01

    Male factor infertility is involved in almost half of all infertile couples. Lack of the ejaculated sperm owing to testicular malfunction has been reported in 6-10% of infertile men, a condition named nonobstructive azoospermia (NOA). In this study, we investigated untargeted metabolomic profiling of the seminal plasma in NOA men using gas chromatography-mass spectrometry and advance chemometrics. In this regard, the seminal plasma fluids of 11 NOA men with TESE-negative, nine NOA men with TESE-positive and 10 fertile healthy men (as a control group) were collected. Quadratic discriminate analysis (QDA) technique was implemented on total ion chromatograms (TICs) for identification of discriminatory retention times. We developed multivariate classification models using the QDA technique. Our results revealed that the developed QDA models could predict the classes of samples using their TIC data. The receiver operating characteristic curves for these models were >0.88. After recognition of discriminatory retention time's asymmetric penalized least square, evolving factor analysis, correlation optimized warping and alternating least squares strategies were applied for preprocessing and deconvolution of the overlapped chromatographic peaks. We could identify 36 discriminatory metabolites. These metabolites may be considered discriminatory biomarkers for different groups in NOA. Copyright © 2017 John Wiley & Sons, Ltd.

  5. Simple adaptive sparse representation based classification schemes for EEG based brain-computer interface applications.

    PubMed

    Shin, Younghak; Lee, Seungchan; Ahn, Minkyu; Cho, Hohyun; Jun, Sung Chan; Lee, Heung-No

    2015-11-01

    One of the main problems related to electroencephalogram (EEG) based brain-computer interface (BCI) systems is the non-stationarity of the underlying EEG signals. This results in the deterioration of the classification performance during experimental sessions. Therefore, adaptive classification techniques are required for EEG based BCI applications. In this paper, we propose simple adaptive sparse representation based classification (SRC) schemes. Supervised and unsupervised dictionary update techniques for new test data and a dictionary modification method by using the incoherence measure of the training data are investigated. The proposed methods are very simple and additional computation for the re-training of the classifier is not needed. The proposed adaptive SRC schemes are evaluated using two BCI experimental datasets. The proposed methods are assessed by comparing classification results with the conventional SRC and other adaptive classification methods. On the basis of the results, we find that the proposed adaptive schemes show relatively improved classification accuracy as compared to conventional methods without requiring additional computation. Copyright © 2015 Elsevier Ltd. All rights reserved.

  6. A hybrid LIBS-Raman system combined with chemometrics: an efficient tool for plastic identification and sorting.

    PubMed

    Shameem, K M Muhammed; Choudhari, Khoobaram S; Bankapur, Aseefhali; Kulkarni, Suresh D; Unnikrishnan, V K; George, Sajan D; Kartha, V B; Santhosh, C

    2017-05-01

    Classification of plastics is of great importance in the recycling industry as the littering of plastic wastes increases day by day as a result of its extensive use. In this paper, we demonstrate the efficacy of a combined laser-induced breakdown spectroscopy (LIBS)-Raman system for the rapid identification and classification of post-consumer plastics. The atomic information and molecular information of polyethylene terephthalate, polyethylene, polypropylene, and polystyrene were studied using plasma emission spectra and scattered signal obtained in the LIBS and Raman technique, respectively. The collected spectral features of the samples were analyzed using statistical tools (principal component analysis, Mahalanobis distance) to categorize the plastics. The analyses of the data clearly show that elemental information and molecular information obtained from these techniques are efficient for classification of plastics. In addition, the molecular information collected via Raman spectroscopy exhibits clearly distinct features for the transparent plastics (100% discrimination), whereas the LIBS technique shows better spectral feature differences for the colored samples. The study shows that the information obtained from these complementary techniques allows the complete classification of the plastic samples, irrespective of the color or additives. This work further throws some light on the fact that the potential limitations of any of these techniques for sample identification can be overcome by the complementarity of these two techniques. Graphical Abstract ᅟ.

  7. Fully Convolutional Networks for Ground Classification from LIDAR Point Clouds

    NASA Astrophysics Data System (ADS)

    Rizaldy, A.; Persello, C.; Gevaert, C. M.; Oude Elberink, S. J.

    2018-05-01

    Deep Learning has been massively used for image classification in recent years. The use of deep learning for ground classification from LIDAR point clouds has also been recently studied. However, point clouds need to be converted into an image in order to use Convolutional Neural Networks (CNNs). In state-of-the-art techniques, this conversion is slow because each point is converted into a separate image. This approach leads to highly redundant computation during conversion and classification. The goal of this study is to design a more efficient data conversion and ground classification. This goal is achieved by first converting the whole point cloud into a single image. The classification is then performed by a Fully Convolutional Network (FCN), a modified version of CNN designed for pixel-wise image classification. The proposed method is significantly faster than state-of-the-art techniques. On the ISPRS Filter Test dataset, it is 78 times faster for conversion and 16 times faster for classification. Our experimental analysis on the same dataset shows that the proposed method results in 5.22 % of total error, 4.10 % of type I error, and 15.07 % of type II error. Compared to the previous CNN-based technique and LAStools software, the proposed method reduces the total error and type I error (while type II error is slightly higher). The method was also tested on a very high point density LIDAR point clouds resulting in 4.02 % of total error, 2.15 % of type I error and 6.14 % of type II error.

  8. 2014 Bio-Acoustics Data Challenge for the International Community on Machine Learning and Bioacoustics

    DTIC Science & Technology

    2014-09-30

    This ONR grant promotes the development and application of advanced machine learning techniques for detection and classification of marine mammal...sounds. The objective is to engage a broad community of data scientists in the development and application of advanced machine learning techniques for detection and classification of marine mammal sounds.

  9. Non-Destructive Classification Approaches for Equilbrated Ordinary Chondrites

    NASA Technical Reports Server (NTRS)

    Righter, K.; Harrington, R.; Schroeder, C.; Morris, R. V.

    2013-01-01

    Classification of meteorites is most effectively carried out by petrographic and mineralogic studies of thin sections, but a rapid and accurate classification technique for the many samples collected in dense collection areas (hot and cold deserts) is of great interest. Oil immersion techniques have been used to classify a large proportion of the US Antarctic meteorite collections since the mid-1980s [1]. This approach has allowed rapid characterization of thousands of samples over time, but nonetheless utilizes a piece of the sample that has been ground to grains or a powder. In order to compare a few non-destructive techniques with the standard approaches, we have characterized a group of chondrites from the Larkman Nunatak region using magnetic susceptibility and Moessbauer spectroscopy.

  10. Analysis of Landsat-4 Thematic Mapper data for classification of forest stands in Baldwin County, Alabama

    NASA Technical Reports Server (NTRS)

    Hill, C. L.

    1984-01-01

    A computer-implemented classification has been derived from Landsat-4 Thematic Mapper data acquired over Baldwin County, Alabama on January 15, 1983. One set of spectral signatures was developed from the data by utilizing a 3x3 pixel sliding window approach. An analysis of the classification produced from this technique identified forested areas. Additional information regarding only the forested areas. Additional information regarding only the forested areas was extracted by employing a pixel-by-pixel signature development program which derived spectral statistics only for pixels within the forested land covers. The spectral statistics from both approaches were integrated and the data classified. This classification was evaluated by comparing the spectral classes produced from the data against corresponding ground verification polygons. This iterative data analysis technique resulted in an overall classification accuracy of 88.4 percent correct for slash pine, young pine, loblolly pine, natural pine, and mixed hardwood-pine. An accuracy assessment matrix has been produced for the classification.

  11. The Effect of Normalization in Violence Video Classification Performance

    NASA Astrophysics Data System (ADS)

    Ali, Ashikin; Senan, Norhalina

    2017-08-01

    Basically, data pre-processing is an important part of data mining. Normalization is a pre-processing stage for any type of problem statement, especially in video classification. Challenging problems that arises in video classification is because of the heterogeneous content, large variations in video quality and complex semantic meanings of the concepts involved. Therefore, to regularize this problem, it is thoughtful to ensure normalization or basically involvement of thorough pre-processing stage aids the robustness of classification performance. This process is to scale all the numeric variables into certain range to make it more meaningful for further phases in available data mining techniques. Thus, this paper attempts to examine the effect of 2 normalization techniques namely Min-max normalization and Z-score in violence video classifications towards the performance of classification rate using Multi-layer perceptron (MLP) classifier. Using Min-Max Normalization range of [0,1] the result shows almost 98% of accuracy, meanwhile Min-Max Normalization range of [-1,1] accuracy is 59% and for Z-score the accuracy is 50%.

  12. Abstracting of suspected illegal land use in urban areas using case-based classification of remote sensing images

    NASA Astrophysics Data System (ADS)

    Chen, Fulong; Wang, Chao; Yang, Chengyun; Zhang, Hong; Wu, Fan; Lin, Wenjuan; Zhang, Bo

    2008-11-01

    This paper proposed a method that uses a case-based classification of remote sensing images and applied this method to abstract the information of suspected illegal land use in urban areas. Because of the discrete cases for imagery classification, the proposed method dealt with the oscillation of spectrum or backscatter within the same land use category, and it not only overcame the deficiency of maximum likelihood classification (the prior probability of land use could not be obtained) but also inherited the advantages of the knowledge-based classification system, such as artificial intelligence and automatic characteristics. Consequently, the proposed method could do the classifying better. Then the researchers used the object-oriented technique for shadow removal in highly dense city zones. With multi-temporal SPOT 5 images whose resolution was 2.5×2.5 meters, the researchers found that the method can abstract suspected illegal land use information in urban areas using post-classification comparison technique.

  13. Applying machine learning classification techniques to automate sky object cataloguing

    NASA Astrophysics Data System (ADS)

    Fayyad, Usama M.; Doyle, Richard J.; Weir, W. Nick; Djorgovski, Stanislav

    1993-08-01

    We describe the application of an Artificial Intelligence machine learning techniques to the development of an automated tool for the reduction of a large scientific data set. The 2nd Mt. Palomar Northern Sky Survey is nearly completed. This survey provides comprehensive coverage of the northern celestial hemisphere in the form of photographic plates. The plates are being transformed into digitized images whose quality will probably not be surpassed in the next ten to twenty years. The images are expected to contain on the order of 107 galaxies and 108 stars. Astronomers wish to determine which of these sky objects belong to various classes of galaxies and stars. Unfortunately, the size of this data set precludes analysis in an exclusively manual fashion. Our approach is to develop a software system which integrates the functions of independently developed techniques for image processing and data classification. Digitized sky images are passed through image processing routines to identify sky objects and to extract a set of features for each object. These routines are used to help select a useful set of attributes for classifying sky objects. Then GID3 (Generalized ID3) and O-B Tree, two inductive learning techniques, learns classification decision trees from examples. These classifiers will then be applied to new data. These developmnent process is highly interactive, with astronomer input playing a vital role. Astronomers refine the feature set used to construct sky object descriptions, and evaluate the performance of the automated classification technique on new data. This paper gives an overview of the machine learning techniques with an emphasis on their general applicability, describes the details of our specific application, and reports the initial encouraging results. The results indicate that our machine learning approach is well-suited to the problem. The primary benefit of the approach is increased data reduction throughput. Another benefit is consistency of classification. The classification rules which are the product of the inductive learning techniques will form an objective, examinable basis for classifying sky objects. A final, not to be underestimated benefit is that astronomers will be freed from the tedium of an intensely visual task to pursue more challenging analysis and interpretation problems based on automatically catalogued data.

  14. Nondestructive evaluation technique guide

    NASA Technical Reports Server (NTRS)

    Vary, A.

    1973-01-01

    A total of 70 individual nondestructive evaluation (NDE) techniques are described. Information is presented that permits ease of comparison of the merits and limitations of each technique with respect to various NDE problems. An NDE technique classification system is presented. It is based on the system that was adopted by the National Materials Advisory Board (NMAB). The classification system presented follows the NMAB system closely with the exception of additional categories that have been added to cover more advanced techniques presently in use. The rationale of the technique is explained. The format provides for a concise description of each technique, the physical principles involved, objectives of interrogation, example applications, limitations of each technique, a schematic illustration, and key reference material. Cross-index tabulations are also provided so that particular NDE problems can be referred to appropriate techniques.

  15. Characterization of Rheumatoid Arthritis Subtypes Using Symptom Profiles, Clinical Chemistry and Metabolomics Measurements

    PubMed Central

    van der Kooij, Anita J.; Reijmers, Theo H.; Schroën, Yan; Wang, Mei; Xu, Zhiliang; Wang, Xinchang; Kong, Hongwei; Xu, Guowang; Hankemeier, Thomas; Meulman, Jacqueline J.; van der Greef, Jan

    2012-01-01

    Objective The aim is to characterize subgroups or phenotypes of rheumatoid arthritis (RA) patients using a systems biology approach. The discovery of subtypes of rheumatoid arthritis patients is an essential research area for the improvement of response to therapy and the development of personalized medicine strategies. Methods In this study, 39 RA patients are phenotyped using clinical chemistry measurements, urine and plasma metabolomics analysis and symptom profiles. In addition, a Chinese medicine expert classified each RA patient as a Cold or Heat type according to Chinese medicine theory. Multivariate data analysis techniques are employed to detect and validate biochemical and symptom relationships with the classification. Results The questionnaire items ‘Red joints’, ‘Swollen joints’, ‘Warm joints’ suggest differences in the level of inflammation between the groups although c-reactive protein (CRP) and rheumatoid factor (RHF) levels were equal. Multivariate analysis of the urine metabolomics data revealed that the levels of 11 acylcarnitines were lower in the Cold RA than in the Heat RA patients, suggesting differences in muscle breakdown. Additionally, higher dehydroepiandrosterone sulfate (DHEAS) levels in Heat patients compared to Cold patients were found suggesting that the Cold RA group has a more suppressed hypothalamic-pituitary-adrenal (HPA) axis function. Conclusion Significant and relevant biochemical differences are found between Cold and Heat RA patients. Differences in immune function, HPA axis involvement and muscle breakdown point towards opportunities to tailor disease management strategies to each of the subgroups RA patient. PMID:22984493

  16. Application of machine learning methods to describe the effects of conjugated equine estrogens therapy on region-specific brain volumes.

    PubMed

    Casanova, Ramon; Espeland, Mark A; Goveas, Joseph S; Davatzikos, Christos; Gaussoin, Sarah A; Maldjian, Joseph A; Brunner, Robert L; Kuller, Lewis H; Johnson, Karen C; Mysiw, W Jerry; Wagner, Benjamin; Resnick, Susan M

    2011-05-01

    Use of conjugated equine estrogens (CEE) has been linked to smaller regional brain volumes in women aged ≥65 years; however, it is unknown whether this results in a broad-based characteristic pattern of effects. Structural magnetic resonance imaging was used to assess regional volumes of normal tissue and ischemic lesions among 513 women who had been enrolled in a randomized clinical trial of CEE therapy for an average of 6.6 years, beginning at ages 65-80 years. A multivariate pattern analysis, based on a machine learning technique that combined Random Forest and logistic regression with L(1) penalty, was applied to identify patterns among regional volumes associated with therapy and whether patterns discriminate between treatment groups. The multivariate pattern analysis detected smaller regional volumes of normal tissue within the limbic and temporal lobes among women that had been assigned to CEE therapy. Mean decrements ranged as high as 7% in the left entorhinal cortex and 5% in the left perirhinal cortex, which exceeded the effect sizes reported previously in frontal lobe and hippocampus. Overall accuracy of classification based on these patterns, however, was projected to be only 54.5%. Prescription of CEE therapy for an average of 6.6 years is associated with lower regional brain volumes, but it does not induce a characteristic spatial pattern of changes in brain volumes of sufficient magnitude to discriminate users and nonusers. Copyright © 2011 Elsevier Inc. All rights reserved.

  17. Multivariate Statistical Analysis of Orthogonal Mass Spectral Data for the Identification of Chemical Attribution Signatures of 3-Methylfentanyl

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mayer, B. P.; Valdez, C. A.; DeHope, A. J.

    Critical to many modern forensic investigations is the chemical attribution of the origin of an illegal drug. This process greatly relies on identification of compounds indicative of its clandestine or commercial production. The results of these studies can yield detailed information on method of manufacture, sophistication of the synthesis operation, starting material source, and final product. In the present work, chemical attribution signatures (CAS) associated with the synthesis of the analgesic 3- methylfentanyl, N-(3-methyl-1-phenethylpiperidin-4-yl)-N-phenylpropanamide, were investigated. Six synthesis methods were studied in an effort to identify and classify route-specific signatures. These methods were chosen to minimize the use of scheduledmore » precursors, complicated laboratory equipment, number of overall steps, and demanding reaction conditions. Using gas and liquid chromatographies combined with mass spectrometric methods (GC-QTOF and LC-QTOF) in conjunction with inductivelycoupled plasma mass spectrometry (ICP-MS), over 240 distinct compounds and elements were monitored. As seen in our previous work with CAS of fentanyl synthesis the complexity of the resultant data matrix necessitated the use of multivariate statistical analysis. Using partial least squares discriminant analysis (PLS-DA), 62 statistically significant, route-specific CAS were identified. Statistical classification models using a variety of machine learning techniques were then developed with the ability to predict the method of 3-methylfentanyl synthesis from three blind crude samples generated by synthetic chemists without prior experience with these methods.« less

  18. Application of machine learning methods to describe the effects of conjugated equine estrogens therapy on region-specific brain volumes

    PubMed Central

    Casanova, Ramon; Espeland, Mark A.; Goveas, Joseph S.; Davatzikos, Christos; Gaussoin, Sarah A.; Maldjian, Joseph A.; Brunner, Robert L.; Kuller, Lewis H.; Johnson, Karen C.; Mysiw, W. Jerry; Wagner, Benjamin; Resnick, Susan M.

    2011-01-01

    Use of conjugated equine estrogens (CEE) has been linked to smaller regional brain volumes in women aged ≥65 years, however it is unknown whether this results in a broad-based characteristic pattern of effects. Structural MRI was used to assess regional volumes of normal tissue and ischemic lesions among 513 women who had been enrolled in a randomized clinical trial of CEE therapy for an average of 6.6 years, beginning at ages 65-80 years. A multivariate pattern analysis, based on a machine learning technique that combined Random Forest and logistic regression with L1 penalty, was applied to identify patterns among regional volumes associated with therapy and whether patterns discriminate between treatment groups. The multivariate pattern analysis detected smaller regional volumes of normal tissue within the limbic and temporal lobes among women that had been assigned to CEE therapy. Mean decrements ranged as high as 7% in the left entorhinal cortex and 5% in the left perirhinal cortex, which exceeded the effect sizes reported previously in frontal lobe and hippocampus. Overall accuracy of classification based on these patterns, however, was projected to be only 54.5%. Prescription of CEE therapy for an average of 6.6 years is associated with lower regional brain volumes, but it does not induce a characteristic spatial pattern of changes in brain volumes of sufficient magnitude to discriminate users and non-users. PMID:21292420

  19. Cole-Cole, linear and multivariate modeling of capacitance data for on-line monitoring of biomass.

    PubMed

    Dabros, Michal; Dennewald, Danielle; Currie, David J; Lee, Mark H; Todd, Robert W; Marison, Ian W; von Stockar, Urs

    2009-02-01

    This work evaluates three techniques of calibrating capacitance (dielectric) spectrometers used for on-line monitoring of biomass: modeling of cell properties using the theoretical Cole-Cole equation, linear regression of dual-frequency capacitance measurements on biomass concentration, and multivariate (PLS) modeling of scanning dielectric spectra. The performance and robustness of each technique is assessed during a sequence of validation batches in two experimental settings of differing signal noise. In more noisy conditions, the Cole-Cole model had significantly higher biomass concentration prediction errors than the linear and multivariate models. The PLS model was the most robust in handling signal noise. In less noisy conditions, the three models performed similarly. Estimates of the mean cell size were done additionally using the Cole-Cole and PLS models, the latter technique giving more satisfactory results.

  20. Discriminative analysis of early Alzheimer's disease based on two intrinsically anti-correlated networks with resting-state fMRI.

    PubMed

    Wang, Kun; Jiang, Tianzi; Liang, Meng; Wang, Liang; Tian, Lixia; Zhang, Xinqing; Li, Kuncheng; Liu, Zhening

    2006-01-01

    In this work, we proposed a discriminative model of Alzheimer's disease (AD) on the basis of multivariate pattern classification and functional magnetic resonance imaging (fMRI). This model used the correlation/anti-correlation coefficients of two intrinsically anti-correlated networks in resting brains, which have been suggested by two recent studies, as the feature of classification. Pseudo-Fisher Linear Discriminative Analysis (pFLDA) was then performed on the feature space and a linear classifier was generated. Using leave-one-out (LOO) cross validation, our results showed a correct classification rate of 83%. We also compared the proposed model with another one based on the whole brain functional connectivity. Our proposed model outperformed the other one significantly, and this implied that the two intrinsically anti-correlated networks may be a more susceptible part of the whole brain network in the early stage of AD.

  1. A Temporal Pattern Mining Approach for Classifying Electronic Health Record Data

    PubMed Central

    Batal, Iyad; Valizadegan, Hamed; Cooper, Gregory F.; Hauskrecht, Milos

    2013-01-01

    We study the problem of learning classification models from complex multivariate temporal data encountered in electronic health record systems. The challenge is to define a good set of features that are able to represent well the temporal aspect of the data. Our method relies on temporal abstractions and temporal pattern mining to extract the classification features. Temporal pattern mining usually returns a large number of temporal patterns, most of which may be irrelevant to the classification task. To address this problem, we present the Minimal Predictive Temporal Patterns framework to generate a small set of predictive and non-spurious patterns. We apply our approach to the real-world clinical task of predicting patients who are at risk of developing heparin induced thrombocytopenia. The results demonstrate the benefit of our approach in efficiently learning accurate classifiers, which is a key step for developing intelligent clinical monitoring systems. PMID:25309815

  2. Using Multivariate Regression Model with Least Absolute Shrinkage and Selection Operator (LASSO) to Predict the Incidence of Xerostomia after Intensity-Modulated Radiotherapy for Head and Neck Cancer

    PubMed Central

    Ting, Hui-Min; Chang, Liyun; Huang, Yu-Jie; Wu, Jia-Ming; Wang, Hung-Yu; Horng, Mong-Fong; Chang, Chun-Ming; Lan, Jen-Hong; Huang, Ya-Yu; Fang, Fu-Min; Leung, Stephen Wan

    2014-01-01

    Purpose The aim of this study was to develop a multivariate logistic regression model with least absolute shrinkage and selection operator (LASSO) to make valid predictions about the incidence of moderate-to-severe patient-rated xerostomia among head and neck cancer (HNC) patients treated with IMRT. Methods and Materials Quality of life questionnaire datasets from 206 patients with HNC were analyzed. The European Organization for Research and Treatment of Cancer QLQ-H&N35 and QLQ-C30 questionnaires were used as the endpoint evaluation. The primary endpoint (grade 3+ xerostomia) was defined as moderate-to-severe xerostomia at 3 (XER3m) and 12 months (XER12m) after the completion of IMRT. Normal tissue complication probability (NTCP) models were developed. The optimal and suboptimal numbers of prognostic factors for a multivariate logistic regression model were determined using the LASSO with bootstrapping technique. Statistical analysis was performed using the scaled Brier score, Nagelkerke R2, chi-squared test, Omnibus, Hosmer-Lemeshow test, and the AUC. Results Eight prognostic factors were selected by LASSO for the 3-month time point: Dmean-c, Dmean-i, age, financial status, T stage, AJCC stage, smoking, and education. Nine prognostic factors were selected for the 12-month time point: Dmean-i, education, Dmean-c, smoking, T stage, baseline xerostomia, alcohol abuse, family history, and node classification. In the selection of the suboptimal number of prognostic factors by LASSO, three suboptimal prognostic factors were fine-tuned by Hosmer-Lemeshow test and AUC, i.e., Dmean-c, Dmean-i, and age for the 3-month time point. Five suboptimal prognostic factors were also selected for the 12-month time point, i.e., Dmean-i, education, Dmean-c, smoking, and T stage. The overall performance for both time points of the NTCP model in terms of scaled Brier score, Omnibus, and Nagelkerke R2 was satisfactory and corresponded well with the expected values. Conclusions Multivariate NTCP models with LASSO can be used to predict patient-rated xerostomia after IMRT. PMID:24586971

  3. Using multivariate regression model with least absolute shrinkage and selection operator (LASSO) to predict the incidence of Xerostomia after intensity-modulated radiotherapy for head and neck cancer.

    PubMed

    Lee, Tsair-Fwu; Chao, Pei-Ju; Ting, Hui-Min; Chang, Liyun; Huang, Yu-Jie; Wu, Jia-Ming; Wang, Hung-Yu; Horng, Mong-Fong; Chang, Chun-Ming; Lan, Jen-Hong; Huang, Ya-Yu; Fang, Fu-Min; Leung, Stephen Wan

    2014-01-01

    The aim of this study was to develop a multivariate logistic regression model with least absolute shrinkage and selection operator (LASSO) to make valid predictions about the incidence of moderate-to-severe patient-rated xerostomia among head and neck cancer (HNC) patients treated with IMRT. Quality of life questionnaire datasets from 206 patients with HNC were analyzed. The European Organization for Research and Treatment of Cancer QLQ-H&N35 and QLQ-C30 questionnaires were used as the endpoint evaluation. The primary endpoint (grade 3(+) xerostomia) was defined as moderate-to-severe xerostomia at 3 (XER3m) and 12 months (XER12m) after the completion of IMRT. Normal tissue complication probability (NTCP) models were developed. The optimal and suboptimal numbers of prognostic factors for a multivariate logistic regression model were determined using the LASSO with bootstrapping technique. Statistical analysis was performed using the scaled Brier score, Nagelkerke R(2), chi-squared test, Omnibus, Hosmer-Lemeshow test, and the AUC. Eight prognostic factors were selected by LASSO for the 3-month time point: Dmean-c, Dmean-i, age, financial status, T stage, AJCC stage, smoking, and education. Nine prognostic factors were selected for the 12-month time point: Dmean-i, education, Dmean-c, smoking, T stage, baseline xerostomia, alcohol abuse, family history, and node classification. In the selection of the suboptimal number of prognostic factors by LASSO, three suboptimal prognostic factors were fine-tuned by Hosmer-Lemeshow test and AUC, i.e., Dmean-c, Dmean-i, and age for the 3-month time point. Five suboptimal prognostic factors were also selected for the 12-month time point, i.e., Dmean-i, education, Dmean-c, smoking, and T stage. The overall performance for both time points of the NTCP model in terms of scaled Brier score, Omnibus, and Nagelkerke R(2) was satisfactory and corresponded well with the expected values. Multivariate NTCP models with LASSO can be used to predict patient-rated xerostomia after IMRT.

  4. Epileptic seizure detection in EEG signal using machine learning techniques.

    PubMed

    Jaiswal, Abeg Kumar; Banka, Haider

    2018-03-01

    Epilepsy is a well-known nervous system disorder characterized by seizures. Electroencephalograms (EEGs), which capture brain neural activity, can detect epilepsy. Traditional methods for analyzing an EEG signal for epileptic seizure detection are time-consuming. Recently, several automated seizure detection frameworks using machine learning technique have been proposed to replace these traditional methods. The two basic steps involved in machine learning are feature extraction and classification. Feature extraction reduces the input pattern space by keeping informative features and the classifier assigns the appropriate class label. In this paper, we propose two effective approaches involving subpattern based PCA (SpPCA) and cross-subpattern correlation-based PCA (SubXPCA) with Support Vector Machine (SVM) for automated seizure detection in EEG signals. Feature extraction was performed using SpPCA and SubXPCA. Both techniques explore the subpattern correlation of EEG signals, which helps in decision-making process. SVM is used for classification of seizure and non-seizure EEG signals. The SVM was trained with radial basis kernel. All the experiments have been carried out on the benchmark epilepsy EEG dataset. The entire dataset consists of 500 EEG signals recorded under different scenarios. Seven different experimental cases for classification have been conducted. The classification accuracy was evaluated using tenfold cross validation. The classification results of the proposed approaches have been compared with the results of some of existing techniques proposed in the literature to establish the claim.

  5. Classification of Regional Ionospheric Disturbances Based on Support Vector Machines

    NASA Astrophysics Data System (ADS)

    Begüm Terzi, Merve; Arikan, Feza; Arikan, Orhan; Karatay, Secil

    2016-07-01

    Ionosphere is an anisotropic, inhomogeneous, time varying and spatio-temporally dispersive medium whose parameters can be estimated almost always by using indirect measurements. Geomagnetic, gravitational, solar or seismic activities cause variations of ionosphere at various spatial and temporal scales. This complex spatio-temporal variability is challenging to be identified due to extensive scales in period, duration, amplitude and frequency of disturbances. Since geomagnetic and solar indices such as Disturbance storm time (Dst), F10.7 solar flux, Sun Spot Number (SSN), Auroral Electrojet (AE), Kp and W-index provide information about variability on a global scale, identification and classification of regional disturbances poses a challenge. The main aim of this study is to classify the regional effects of global geomagnetic storms and classify them according to their risk levels. For this purpose, Total Electron Content (TEC) estimated from GPS receivers, which is one of the major parameters of ionosphere, will be used to model the regional and local variability that differs from global activity along with solar and geomagnetic indices. In this work, for the automated classification of the regional disturbances, a classification technique based on a robust machine learning technique that have found wide spread use, Support Vector Machine (SVM) is proposed. SVM is a supervised learning model used for classification with associated learning algorithm that analyze the data and recognize patterns. In addition to performing linear classification, SVM can efficiently perform nonlinear classification by embedding data into higher dimensional feature spaces. Performance of the developed classification technique is demonstrated for midlatitude ionosphere over Anatolia using TEC estimates generated from the GPS data provided by Turkish National Permanent GPS Network (TNPGN-Active) for solar maximum year of 2011. As a result of implementing the developed classification technique to the Global Ionospheric Map (GIM) TEC data which is provided by the NASA Jet Propulsion Laboratory (JPL), it will be shown that SVM can be a suitable learning method to detect the anomalies in Total Electron Content (TEC) variations. This study is supported by TUBITAK 114E541 project as a part of the Scientific and Technological Research Projects Funding Program (1001).

  6. Characterization of Used Nuclear Fuel with Multivariate Analysis for Process Monitoring

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dayman, Kenneth J.; Coble, Jamie B.; Orton, Christopher R.

    2014-01-01

    The Multi-Isotope Process (MIP) Monitor combines gamma spectroscopy and multivariate analysis to detect anomalies in various process streams in a nuclear fuel reprocessing system. Measured spectra are compared to models of nominal behavior at each measurement location to detect unexpected changes in system behavior. In order to improve the accuracy and specificity of process monitoring, fuel characterization may be used to more accurately train subsequent models in a full analysis scheme. This paper presents initial development of a reactor-type classifier that is used to select a reactor-specific partial least squares model to predict fuel burnup. Nuclide activities for prototypic usedmore » fuel samples were generated in ORIGEN-ARP and used to investigate techniques to characterize used nuclear fuel in terms of reactor type (pressurized or boiling water reactor) and burnup. A variety of reactor type classification algorithms, including k-nearest neighbors, linear and quadratic discriminant analyses, and support vector machines, were evaluated to differentiate used fuel from pressurized and boiling water reactors. Then, reactor type-specific partial least squares models were developed to predict the burnup of the fuel. Using these reactor type-specific models instead of a model trained for all light water reactors improved the accuracy of burnup predictions. The developed classification and prediction models were combined and applied to a large dataset that included eight fuel assembly designs, two of which were not used in training the models, and spanned the range of the initial 235U enrichment, cooling time, and burnup values expected of future commercial used fuel for reprocessing. Error rates were consistent across the range of considered enrichment, cooling time, and burnup values. Average absolute relative errors in burnup predictions for validation data both within and outside the training space were 0.0574% and 0.0597%, respectively. The errors seen in this work are artificially low, because the models were trained, optimized, and tested on simulated, noise-free data. However, these results indicate that the developed models may generalize well to new data and that the proposed approach constitutes a viable first step in developing a fuel characterization algorithm based on gamma spectra.« less

  7. Multi-material classification of dry recyclables from municipal solid waste based on thermal imaging.

    PubMed

    Gundupalli, Sathish Paulraj; Hait, Subrata; Thakur, Atul

    2017-12-01

    There has been a significant rise in municipal solid waste (MSW) generation in the last few decades due to rapid urbanization and industrialization. Due to the lack of source segregation practice, a need for automated segregation of recyclables from MSW exists in the developing countries. This paper reports a thermal imaging based system for classifying useful recyclables from simulated MSW sample. Experimental results have demonstrated the possibility to use thermal imaging technique for classification and a robotic system for sorting of recyclables in a single process step. The reported classification system yields an accuracy in the range of 85-96% and is comparable with the existing single-material recyclable classification techniques. We believe that the reported thermal imaging based system can emerge as a viable and inexpensive large-scale classification-cum-sorting technology in recycling plants for processing MSW in developing countries. Copyright © 2017 Elsevier Ltd. All rights reserved.

  8. Classification-Based Spatial Error Concealment for Visual Communications

    NASA Astrophysics Data System (ADS)

    Chen, Meng; Zheng, Yefeng; Wu, Min

    2006-12-01

    In an error-prone transmission environment, error concealment is an effective technique to reconstruct the damaged visual content. Due to large variations of image characteristics, different concealment approaches are necessary to accommodate the different nature of the lost image content. In this paper, we address this issue and propose using classification to integrate the state-of-the-art error concealment techniques. The proposed approach takes advantage of multiple concealment algorithms and adaptively selects the suitable algorithm for each damaged image area. With growing awareness that the design of sender and receiver systems should be jointly considered for efficient and reliable multimedia communications, we proposed a set of classification-based block concealment schemes, including receiver-side classification, sender-side attachment, and sender-side embedding. Our experimental results provide extensive performance comparisons and demonstrate that the proposed classification-based error concealment approaches outperform the conventional approaches.

  9. Determination of fragrance content in perfume by Raman spectroscopy and multivariate calibration.

    PubMed

    Godinho, Robson B; Santos, Mauricio C; Poppi, Ronei J

    2016-03-15

    An alternative methodology is herein proposed for determination of fragrance content in perfumes and their classification according to the guidelines established by fine perfume manufacturers. The methodology is based on Raman spectroscopy associated with multivariate calibration, allowing the determination of fragrance content in a fast, nondestructive, and sustainable manner. The results were considered consistent with the conventional method, whose standard error of prediction values was lower than the 1.0%. This result indicates that the proposed technology is a feasible analytical tool for determination of the fragrance content in a hydro-alcoholic solution for use in manufacturing, quality control and regulatory agencies. Copyright © 2015 Elsevier B.V. All rights reserved.

  10. Modulatory effects of ketamine, risperidone and lamotrigine on resting brain perfusion in healthy human subjects.

    PubMed

    Shcherbinin, Sergey; Doyle, Orla; Zelaya, Fernando O; de Simoni, Sara; Mehta, Mitul A; Schwarz, Adam J

    2015-11-01

    Resting brain perfusion, measured using the MRI-based arterial spin labelling (ASL) technique, is sensitive to detect central effects of single, clinically effective, doses of pharmacological compounds. However, pharmacological interaction experiments, such as the modulation of one drug response in the presence of another, have not been widely investigated using a task-free ASL approach. We assessed the effects of three psychoactive compounds (ketamine, risperidone and lamotrigine), and their interaction, on resting brain perfusion in healthy human volunteers. A multivariate Gaussian process classification (GPC) and more conventional univariate analyses were applied. The four pre-infusion conditions for each subject comprised risperidone, lamotrigine and two placebo sessions. The two placebo conditions enabled us to evaluate the classification performance in a test-retest setting, in addition to its performance in distinguishing the active oral drugs from placebo (direct effect on brain perfusion). The post ketamine- or saline-infusion scans allowed the effect of ketamine, and its interaction with risperidone and lamotrigine, on brain perfusion to be characterised. The pseudo-continuous ASL measurements of perfusion were sensitive to the effects of ketamine infusion and risperidone. The GPC captured consistent changes in perfusion across the group and contextualised the univariate changes with a larger pattern of regions contributing to accurate discrimination of ketamine from placebo. The findings argue against perfusion changes confounding in the previously described evoked BOLD response to ketamine and emphasise the blockade of the NMDA receptor over neuronal glutamate release in determining the perfusion changes induced by ketamine.

  11. Instrumental and statistical methods for the comparison of class evidence

    NASA Astrophysics Data System (ADS)

    Liszewski, Elisa Anne

    Trace evidence is a major field within forensic science. Association of trace evidence samples can be problematic due to sample heterogeneity and a lack of quantitative criteria for comparing spectra or chromatograms. The aim of this study is to evaluate different types of instrumentation for their ability to discriminate among samples of various types of trace evidence. Chemometric analysis, including techniques such as Agglomerative Hierarchical Clustering, Principal Components Analysis, and Discriminant Analysis, was employed to evaluate instrumental data. First, automotive clear coats were analyzed by using microspectrophotometry to collect UV absorption data. In total, 71 samples were analyzed with classification accuracy of 91.61%. An external validation was performed, resulting in a prediction accuracy of 81.11%. Next, fiber dyes were analyzed using UV-Visible microspectrophotometry. While several physical characteristics of cotton fiber can be identified and compared, fiber color is considered to be an excellent source of variation, and thus was examined in this study. Twelve dyes were employed, some being visually indistinguishable. Several different analyses and comparisons were done, including an inter-laboratory comparison and external validations. Lastly, common plastic samples and other polymers were analyzed using pyrolysis-gas chromatography/mass spectrometry, and their pyrolysis products were then analyzed using multivariate statistics. The classification accuracy varied dependent upon the number of classes chosen, but the plastics were grouped based on composition. The polymers were used as an external validation and misclassifications occurred with chlorinated samples all being placed into the category containing PVC.

  12. Integration of data mining classification techniques and ensemble learning to identify risk factors and diagnose ovarian cancer recurrence.

    PubMed

    Tseng, Chih-Jen; Lu, Chi-Jie; Chang, Chi-Chang; Chen, Gin-Den; Cheewakriangkrai, Chalong

    2017-05-01

    Ovarian cancer is the second leading cause of deaths among gynecologic cancers in the world. Approximately 90% of women with ovarian cancer reported having symptoms long before a diagnosis was made. Literature shows that recurrence should be predicted with regard to their personal risk factors and the clinical symptoms of this devastating cancer. In this study, ensemble learning and five data mining approaches, including support vector machine (SVM), C5.0, extreme learning machine (ELM), multivariate adaptive regression splines (MARS), and random forest (RF), were integrated to rank the importance of risk factors and diagnose the recurrence of ovarian cancer. The medical records and pathologic status were extracted from the Chung Shan Medical University Hospital Tumor Registry. Experimental results illustrated that the integrated C5.0 model is a superior approach in predicting the recurrence of ovarian cancer. Moreover, the classification accuracies of C5.0, ELM, MARS, RF, and SVM indeed increased after using the selected important risk factors as predictors. Our findings suggest that The International Federation of Gynecology and Obstetrics (FIGO), Pathologic M, Age, and Pathologic T were the four most critical risk factors for ovarian cancer recurrence. In summary, the above information can support the important influence of personality and clinical symptom representations on all phases of guide interventions, with the complexities of multiple symptoms associated with ovarian cancer in all phases of the recurrent trajectory. Copyright © 2017 Elsevier B.V. All rights reserved.

  13. Fast and effective characterization of 3D region of interest in medical image data

    NASA Astrophysics Data System (ADS)

    Kontos, Despina; Megalooikonomou, Vasileios

    2004-05-01

    We propose a framework for detecting, characterizing and classifying spatial Regions of Interest (ROIs) in medical images, such as tumors and lesions in MRI or activation regions in fMRI. A necessary step prior to classification is efficient extraction of discriminative features. For this purpose, we apply a characterization technique especially designed for spatial ROIs. The main idea of this technique is to extract a k-dimensional feature vector using concentric spheres in 3D (or circles in 2D) radiating out of the ROI's center of mass. These vectors form characterization signatures that can be used to represent the initial ROIs. We focus on classifying fMRI ROIs obtained from a study that explores neuroanatomical correlates of semantic processing in Alzheimer's disease (AD). We detect a ROI highly associated with AD and apply the feature extraction technique with different experimental settings. We seek to distinguish control from patient samples. We study how classification can be performed using the extracted signatures as well as how different experimental parameters affect classification accuracy. The obtained classification accuracy ranged from 82% to 87% (based on the selected ROI) suggesting that the proposed classification framework can be potentially useful in supporting medical decision-making.

  14. SELECTION AND TRAINING, A SURVEY OF IOWA MANUFACTURING FIRMS. MONOGRAPH SERIES NO. 4.

    ERIC Educational Resources Information Center

    SHERIFF, DON R.; AND OTHERS

    INFORMATION ON EMPLOYEE SELECTION AND TRAINING ACTIVITIES WAS SECURED FROM QUESTIONNAIRES RETURNED BY 215 OF 283 FIRMS EMPLOYING AT LEAST 100 PERSONS. DATA FROM 207 SEPARATE ITEMS FOR EACH FIRM WERE KEY PUNCHED AND TABULATED INTO MULTIVARIATE CROSS-CLASSIFICATIONS. OVER 60 PERCENT OF THE FIRMS WERE IN CITIES HAVING OVER 25,000 POPULATION, 40…

  15. Proceedings of the Third Annual Symposium on Mathematical Pattern Recognition and Image Analysis

    NASA Technical Reports Server (NTRS)

    Guseman, L. F., Jr.

    1985-01-01

    Topics addressed include: multivariate spline method; normal mixture analysis applied to remote sensing; image data analysis; classifications in spatially correlated environments; probability density functions; graphical nonparametric methods; subpixel registration analysis; hypothesis integration in image understanding systems; rectification of satellite scanner imagery; spatial variation in remotely sensed images; smooth multidimensional interpolation; and optimal frequency domain textural edge detection filters.

  16. Distributed effects of methylphenidate on the network structure of the resting brain: a connectomic pattern classification analysis.

    PubMed

    Sripada, Chandra Sekhar; Kessler, Daniel; Welsh, Robert; Angstadt, Michael; Liberzon, Israel; Phan, K Luan; Scott, Clayton

    2013-11-01

    Methylphenidate is a psychostimulant medication that produces improvements in functions associated with multiple neurocognitive systems. To investigate the potentially distributed effects of methylphenidate on the brain's intrinsic network architecture, we coupled resting state imaging with multivariate pattern classification. In a within-subject, double-blind, placebo-controlled, randomized, counterbalanced, cross-over design, 32 healthy human volunteers received either methylphenidate or placebo prior to two fMRI resting state scans separated by approximately one week. Resting state connectomes were generated by placing regions of interest at regular intervals throughout the brain, and these connectomes were submitted for support vector machine analysis. We found that methylphenidate produces a distributed, reliably detected, multivariate neural signature. Methylphenidate effects were evident across multiple resting state networks, especially visual, somatomotor, and default networks. Methylphenidate reduced coupling within visual and somatomotor networks. In addition, default network exhibited decoupling with several task positive networks, consistent with methylphenidate modulation of the competitive relationship between these networks. These results suggest that connectivity changes within and between large-scale networks are potentially involved in the mechanisms by which methylphenidate improves attention functioning. Copyright © 2013 Elsevier Inc. All rights reserved.

  17. Gap Shape Classification using Landscape Indices and Multivariate Statistics

    PubMed Central

    Wu, Chih-Da; Cheng, Chi-Chuan; Chang, Che-Chang; Lin, Chinsu; Chang, Kun-Cheng; Chuang, Yung-Chung

    2016-01-01

    This study proposed a novel methodology to classify the shape of gaps using landscape indices and multivariate statistics. Patch-level indices were used to collect the qualified shape and spatial configuration characteristics for canopy gaps in the Lienhuachih Experimental Forest in Taiwan in 1998 and 2002. Non-hierarchical cluster analysis was used to assess the optimal number of gap clusters and canonical discriminant analysis was used to generate the discriminant functions for canopy gap classification. The gaps for the two periods were optimally classified into three categories. In general, gap type 1 had a more complex shape, gap type 2 was more elongated and gap type 3 had the largest gaps that were more regular in shape. The results were evaluated using Wilks’ lambda as satisfactory (p < 0.001). The agreement rate of confusion matrices exceeded 96%. Differences in gap characteristics between the classified gap types that were determined using a one-way ANOVA showed a statistical significance in all patch indices (p = 0.00), except for the Euclidean nearest neighbor distance (ENN) in 2002. Taken together, these results demonstrated the feasibility and applicability of the proposed methodology to classify the shape of a gap. PMID:27901127

  18. Gap Shape Classification using Landscape Indices and Multivariate Statistics.

    PubMed

    Wu, Chih-Da; Cheng, Chi-Chuan; Chang, Che-Chang; Lin, Chinsu; Chang, Kun-Cheng; Chuang, Yung-Chung

    2016-11-30

    This study proposed a novel methodology to classify the shape of gaps using landscape indices and multivariate statistics. Patch-level indices were used to collect the qualified shape and spatial configuration characteristics for canopy gaps in the Lienhuachih Experimental Forest in Taiwan in 1998 and 2002. Non-hierarchical cluster analysis was used to assess the optimal number of gap clusters and canonical discriminant analysis was used to generate the discriminant functions for canopy gap classification. The gaps for the two periods were optimally classified into three categories. In general, gap type 1 had a more complex shape, gap type 2 was more elongated and gap type 3 had the largest gaps that were more regular in shape. The results were evaluated using Wilks' lambda as satisfactory (p < 0.001). The agreement rate of confusion matrices exceeded 96%. Differences in gap characteristics between the classified gap types that were determined using a one-way ANOVA showed a statistical significance in all patch indices (p = 0.00), except for the Euclidean nearest neighbor distance (ENN) in 2002. Taken together, these results demonstrated the feasibility and applicability of the proposed methodology to classify the shape of a gap.

  19. Incorporating Multiple-Choice Questions into an AACSB Assurance of Learning Process: A Course-Embedded Assessment Application to an Introductory Finance Course

    ERIC Educational Resources Information Center

    Santos, Michael R.; Hu, Aidong; Jordan, Douglas

    2014-01-01

    The authors offer a classification technique to make a quantitative skills rubric more operational, with the groupings of multiple-choice questions to match the student learning levels in knowledge, calculation, quantitative reasoning, and analysis. The authors applied this classification technique to the mid-term exams of an introductory finance…

  20. Doppler Feature Based Classification of Wind Profiler Data

    NASA Astrophysics Data System (ADS)

    Sinha, Swati; Chandrasekhar Sarma, T. V.; Lourde. R, Mary

    2017-01-01

    Wind Profilers (WP) are coherent pulsed Doppler radars in UHF and VHF bands. They are used for vertical profiling of wind velocity and direction. This information is very useful for weather modeling, study of climatic patterns and weather prediction. Observations at different height and different wind velocities are possible by changing the operating parameters of WP. A set of Doppler power spectra is the standard form of WP data. Wind velocity, direction and wind velocity turbulence at different heights can be derived from it. Modern wind profilers operate for long duration and generate approximately 4 megabytes of data per hour. The radar data stream contains Doppler power spectra from different radar configurations with echoes from different atmospheric targets. In order to facilitate systematic study, this data needs to be segregated according the type of target. A reliable automated target classification technique is required to do this job. Classical techniques of radar target identification use pattern matching and minimization of mean squared error, Euclidean distance etc. These techniques are not effective for the classification of WP echoes, as these targets do not have well-defined signature in Doppler power spectra. This paper presents an effective target classification technique based on range-Doppler features.

  1. Using Unidimensional IRT Models for Dichotomous Classification via Computerized Classification Testing with Multidimensional Data.

    ERIC Educational Resources Information Center

    Lau, Che-Ming Allen; And Others

    This study focused on the robustness of unidimensional item response theory (UIRT) models in computerized classification testing against violation of the unidimensionality assumption. The study addressed whether UIRT models remain acceptable under various testing conditions and dimensionality strengths. Monte Carlo simulation techniques were used…

  2. Development and application of operational techniques for the inventory and monitoring of resources and uses for the Texas coastal zone

    NASA Technical Reports Server (NTRS)

    Harwood, P. (Principal Investigator); Finley, R.; Mcculloch, S.; Marphy, D.; Hupp, B.

    1976-01-01

    The author has identified the following significant results. Image interpretation mapping techniques were successfully applied to test site 5, an area with a semi-arid climate. The land cover/land use classification required further modification. A new program, HGROUP, added to the ADP classification schedule provides a convenient method for examining the spectral similarity between classes. This capability greatly simplifies the task of combining 25-30 unsupervised subclasses into about 15 major classes that approximately correspond to the land use/land cover classification scheme.

  3. Classification of remotely sensed data using OCR-inspired neural network techniques. [Optical Character Recognition

    NASA Technical Reports Server (NTRS)

    Kiang, Richard K.

    1992-01-01

    Neural networks have been applied to classifications of remotely sensed data with some success. To improve the performance of this approach, an examination was made of how neural networks are applied to the optical character recognition (OCR) of handwritten digits and letters. A three-layer, feedforward network, along with techniques adopted from OCR, was used to classify Landsat-4 Thematic Mapper data. Good results were obtained. To overcome the difficulties that are characteristic of remote sensing applications and to attain significant improvements in classification accuracy, a special network architecture may be required.

  4. Diagnostic classification of macular ganglion cell and retinal nerve fiber layer analysis: differentiation of false-positives from glaucoma.

    PubMed

    Kim, Ko Eun; Jeoung, Jin Wook; Park, Ki Ho; Kim, Dong Myung; Kim, Seok Hwan

    2015-03-01

    To investigate the rate and associated factors of false-positive diagnostic classification of ganglion cell analysis (GCA) and retinal nerve fiber layer (RNFL) maps, and characteristic false-positive patterns on optical coherence tomography (OCT) deviation maps. Prospective, cross-sectional study. A total of 104 healthy eyes of 104 normal participants. All participants underwent peripapillary and macular spectral-domain (Cirrus-HD, Carl Zeiss Meditec Inc, Dublin, CA) OCT scans. False-positive diagnostic classification was defined as yellow or red color-coded areas for GCA and RNFL maps. Univariate and multivariate logistic regression analyses were used to determine associated factors. Eyes with abnormal OCT deviation maps were categorized on the basis of the shape and location of abnormal color-coded area. Differences in clinical characteristics among the subgroups were compared. (1) The rate and associated factors of false-positive OCT maps; (2) patterns of false-positive, color-coded areas on the GCA deviation map and associated clinical characteristics. Of the 104 healthy eyes, 42 (40.4%) and 32 (30.8%) showed abnormal diagnostic classifications on any of the GCA and RNFL maps, respectively. Multivariate analysis revealed that false-positive GCA diagnostic classification was associated with longer axial length and larger fovea-disc angle, whereas longer axial length and smaller disc area were associated with abnormal RNFL maps. Eyes with abnormal GCA deviation map were categorized as group A (donut-shaped round area around the inner annulus), group B (island-like isolated area), and group C (diffuse, circular area with an irregular inner margin in either). The axial length showed a significant increasing trend from group A to C (P=0.001), and likewise, the refractive error was more myopic in group C than in groups A (P=0.015) and B (P=0.014). Group C had thinner average ganglion cell-inner plexiform layer thickness compared with other groups (group A=B>C, P=0.004). Abnormal OCT diagnostic classification should be interpreted with caution, especially in eyes with long axial lengths, large fovea-disc angles, and small optic discs. Our findings suggest that the characteristic patterns of OCT deviation map can provide useful clues to distinguish glaucomatous changes from false-positive findings. Copyright © 2015 American Academy of Ophthalmology. Published by Elsevier Inc. All rights reserved.

  5. FDG-PET and CSF biomarker accuracy in prediction of conversion to different dementias in a large multicentre MCI cohort.

    PubMed

    Caminiti, Silvia Paola; Ballarini, Tommaso; Sala, Arianna; Cerami, Chiara; Presotto, Luca; Santangelo, Roberto; Fallanca, Federico; Vanoli, Emilia Giovanna; Gianolli, Luigi; Iannaccone, Sandro; Magnani, Giuseppe; Perani, Daniela

    2018-01-01

    In this multicentre study in clinical settings, we assessed the accuracy of optimized procedures for FDG-PET brain metabolism and CSF classifications in predicting or excluding the conversion to Alzheimer's disease (AD) dementia and non-AD dementias. We included 80 MCI subjects with neurological and neuropsychological assessments, FDG-PET scan and CSF measures at entry, all with clinical follow-up. FDG-PET data were analysed with a validated voxel-based SPM method. Resulting single-subject SPM maps were classified by five imaging experts according to the disease-specific patterns, as "typical-AD", "atypical-AD" (i.e. posterior cortical atrophy, asymmetric logopenic AD variant, frontal-AD variant), "non-AD" (i.e. behavioural variant FTD, corticobasal degeneration, semantic variant FTD; dementia with Lewy bodies) or "negative" patterns. To perform the statistical analyses, the individual patterns were grouped either as "AD dementia vs. non-AD dementia (all diseases)" or as "FTD vs. non-FTD (all diseases)". Aβ42, total and phosphorylated Tau CSF-levels were classified dichotomously, and using the Erlangen Score algorithm. Multivariate logistic models tested the prognostic accuracy of FDG-PET-SPM and CSF dichotomous classifications. Accuracy of Erlangen score and Erlangen Score aided by FDG-PET SPM classification was evaluated. The multivariate logistic model identified FDG-PET "AD" SPM classification (Expβ = 19.35, 95% C.I. 4.8-77.8, p < 0.001) and CSF Aβ42 (Expβ = 6.5, 95% C.I. 1.64-25.43, p < 0.05) as the best predictors of conversion from MCI to AD dementia. The "FTD" SPM pattern significantly predicted conversion to FTD dementias at follow-up (Expβ = 14, 95% C.I. 3.1-63, p < 0.001). Overall, FDG-PET-SPM classification was the most accurate biomarker, able to correctly differentiate either the MCI subjects who converted to AD or FTD dementias, and those who remained stable or reverted to normal cognition (Expβ = 17.9, 95% C.I. 4.55-70.46, p < 0.001). Our results support the relevant role of FDG-PET-SPM classification in predicting progression to different dementia conditions in prodromal MCI phase, and in the exclusion of progression, outperforming CSF biomarkers.

  6. Selected-ion flow-tube mass-spectrometry (SIFT-MS) fingerprinting versus chemical profiling for geographic traceability of Moroccan Argan oils.

    PubMed

    Kharbach, Mourad; Kamal, Rabie; Mansouri, Mohammed Alaoui; Marmouzi, Ilias; Viaene, Johan; Cherrah, Yahia; Alaoui, Katim; Vercammen, Joeri; Bouklouze, Abdelaziz; Vander Heyden, Yvan

    2018-10-15

    This study investigated the effectiveness of SIFT-MS versus chemical profiling, both coupled to multivariate data analysis, to classify 95 Extra Virgin Argan Oils (EVAO), originating from five Moroccan Argan forest locations. The full scan option of SIFT-MS, is suitable to indicate the geographic origin of EVAO based on the fingerprints obtained using the three chemical ionization precursors (H 3 O + , NO + and O 2 + ). The chemical profiling (including acidity, peroxide value, spectrophotometric indices, fatty acids, tocopherols- and sterols composition) was also used for classification. Partial least squares discriminant analysis (PLS-DA), soft independent modeling of class analogy (SIMCA), K-nearest neighbors (KNN), and support vector machines (SVM), were compared. The SIFT-MS data were therefore fed to variable-selection methods to find potential biomarkers for classification. The classification models based either on chemical profiling or SIFT-MS data were able to classify the samples with high accuracy. SIFT-MS was found to be advantageous for rapid geographic classification. Copyright © 2018 Elsevier Ltd. All rights reserved.

  7. Predictors of clinical-pathologic stage discrepancy in oral cavity squamous cell carcinoma: A National Cancer Database study.

    PubMed

    Kılıç, Sarah S; Kılıç, Suat; Crippen, Meghan M; Varughese, Denny; Eloy, Jean Anderson; Baredes, Soly; Mahmoud, Omar M; Park, Richard Chan Woo

    2018-04-01

    Few studies have examined the frequency and survival implications of clinicopathologic stage discrepancy in oral cavity squamous cell carcinoma (SCC). Oral cavity SCC cases with full pathologic staging information were identified in the National Cancer Database (NCDB). Clinical and pathologic stages were compared. Multivariate logistic regressions were performed to identify factors associated with stage discrepancy. There were 9110 cases identified, of which 67.3% of the cases were stage concordant, 19.9% were upstaged, and 12.8% were downstaged. The N classification discordance (28.5%) was more common than T classification discordance (27.6%). In cases of T classification discordance, downstaging is more common than upstaging (15.4% vs 12.1% of cases), but in cases of N classification discordance, the reverse is true; upstaging is much more common than downstaging (20.1 vs 8.4% of cases). Clinicopathologic stage discrepancy in oral cavity SCC is a common phenomenon that is associated with a number of clinical factors and has survival implications. © 2018 Wiley Periodicals, Inc.

  8. A Partial Least Squares Based Procedure for Upstream Sequence Classification in Prokaryotes.

    PubMed

    Mehmood, Tahir; Bohlin, Jon; Snipen, Lars

    2015-01-01

    The upstream region of coding genes is important for several reasons, for instance locating transcription factor, binding sites, and start site initiation in genomic DNA. Motivated by a recently conducted study, where multivariate approach was successfully applied to coding sequence modeling, we have introduced a partial least squares (PLS) based procedure for the classification of true upstream prokaryotic sequence from background upstream sequence. The upstream sequences of conserved coding genes over genomes were considered in analysis, where conserved coding genes were found by using pan-genomics concept for each considered prokaryotic species. PLS uses position specific scoring matrix (PSSM) to study the characteristics of upstream region. Results obtained by PLS based method were compared with Gini importance of random forest (RF) and support vector machine (SVM), which is much used method for sequence classification. The upstream sequence classification performance was evaluated by using cross validation, and suggested approach identifies prokaryotic upstream region significantly better to RF (p-value < 0.01) and SVM (p-value < 0.01). Further, the proposed method also produced results that concurred with known biological characteristics of the upstream region.

  9. Multivariate classification of small order watersheds in the Quabbin Reservoir Basin, Massachusetts

    USGS Publications Warehouse

    Lent, R.M.; Waldron, M.C.; Rader, J.C.

    1998-01-01

    A multivariate approach was used to analyze hydrologic, geologic, geographic, and water-chemistry data from small order watersheds in the Quabbin Reservoir Basin in central Massachusetts. Eighty three small order watersheds were delineated and landscape attributes defining hydrologic, geologic, and geographic features of the watersheds were compiled from geographic information system data layers. Principal components analysis was used to evaluate 11 chemical constituents collected bi-weekly for 1 year at 15 surface-water stations in order to subdivide the basin into subbasins comprised of watersheds with similar water quality characteristics. Three principal components accounted for about 90 percent of the variance in water chemistry data. The principal components were defined as a biogeochemical variable related to wetland density, an acid-neutralization variable, and a road-salt variable related to density of primary roads. Three subbasins were identified. Analysis of variance and multiple comparisons of means were used to identify significant differences in stream water chemistry and landscape attributes among subbasins. All stream water constituents were significantly different among subbasins. Multiple regression techniques were used to relate stream water chemistry to landscape attributes. Important differences in landscape attributes were related to wetlands, slope, and soil type.A multivariate approach was used to analyze hydrologic, geologic, geographic, and water-chemistry data from small order watersheds in the Quabbin Reservoir Basin in central Massachusetts. Eighty three small order watersheds were delineated and landscape attributes defining hydrologic, geologic, and geographic features of the watersheds were compiled from geographic information system data layers. Principal components analysis was used to evaluate 11 chemical constituents collected bi-weekly for 1 year at 15 surface-water stations in order to subdivide the basin into subbasins comprised of watersheds with similar water quality characteristics. Three principal components accounted for about 90 percent of the variance in water chemistry data. The principal components were defined as a biogeochemical variable related to wetland density, an acid-neutralization variable, and a road-salt variable related to density of primary roads. Three subbasins were identified. Analysis of variance and multiple comparisons of means were used to identify significant differences in stream water chemistry and landscape attributes among subbasins. All stream water constituents were significantly different among subbasins. Multiple regression techniques were used to relate stream water chemistry to landscape attributes. Important differences in landscape attributes were related to wetlands, slope, and soil type.

  10. Gold-standard for computer-assisted morphological sperm analysis.

    PubMed

    Chang, Violeta; Garcia, Alejandra; Hitschfeld, Nancy; Härtel, Steffen

    2017-04-01

    Published algorithms for classification of human sperm heads are based on relatively small image databases that are not open to the public, and thus no direct comparison is available for competing methods. We describe a gold-standard for morphological sperm analysis (SCIAN-MorphoSpermGS), a dataset of sperm head images with expert-classification labels in one of the following classes: normal, tapered, pyriform, small or amorphous. This gold-standard is for evaluating and comparing known techniques and future improvements to present approaches for classification of human sperm heads for semen analysis. Although this paper does not provide a computational tool for morphological sperm analysis, we present a set of experiments for comparing sperm head description and classification common techniques. This classification base-line is aimed to be used as a reference for future improvements to present approaches for human sperm head classification. The gold-standard provides a label for each sperm head, which is achieved by majority voting among experts. The classification base-line compares four supervised learning methods (1- Nearest Neighbor, naive Bayes, decision trees and Support Vector Machine (SVM)) and three shape-based descriptors (Hu moments, Zernike moments and Fourier descriptors), reporting the accuracy and the true positive rate for each experiment. We used Fleiss' Kappa Coefficient to evaluate the inter-expert agreement and Fisher's exact test for inter-expert variability and statistical significant differences between descriptors and learning techniques. Our results confirm the high degree of inter-expert variability in the morphological sperm analysis. Regarding the classification base line, we show that none of the standard descriptors or classification approaches is best suitable for tackling the problem of sperm head classification. We discovered that the correct classification rate was highly variable when trying to discriminate among non-normal sperm heads. By using the Fourier descriptor and SVM, we achieved the best mean correct classification: only 49%. We conclude that the SCIAN-MorphoSpermGS will provide a standard tool for evaluation of characterization and classification approaches for human sperm heads. Indeed, there is a clear need for a specific shape-based descriptor for human sperm heads and a specific classification approach to tackle the problem of high variability within subcategories of abnormal sperm cells. Copyright © 2017 Elsevier Ltd. All rights reserved.

  11. Comparative Approach of MRI-Based Brain Tumor Segmentation and Classification Using Genetic Algorithm.

    PubMed

    Bahadure, Nilesh Bhaskarrao; Ray, Arun Kumar; Thethi, Har Pal

    2018-01-17

    The detection of a brain tumor and its classification from modern imaging modalities is a primary concern, but a time-consuming and tedious work was performed by radiologists or clinical supervisors. The accuracy of detection and classification of tumor stages performed by radiologists is depended on their experience only, so the computer-aided technology is very important to aid with the diagnosis accuracy. In this study, to improve the performance of tumor detection, we investigated comparative approach of different segmentation techniques and selected the best one by comparing their segmentation score. Further, to improve the classification accuracy, the genetic algorithm is employed for the automatic classification of tumor stage. The decision of classification stage is supported by extracting relevant features and area calculation. The experimental results of proposed technique are evaluated and validated for performance and quality analysis on magnetic resonance brain images, based on segmentation score, accuracy, sensitivity, specificity, and dice similarity index coefficient. The experimental results achieved 92.03% accuracy, 91.42% specificity, 92.36% sensitivity, and an average segmentation score between 0.82 and 0.93 demonstrating the effectiveness of the proposed technique for identifying normal and abnormal tissues from brain MR images. The experimental results also obtained an average of 93.79% dice similarity index coefficient, which indicates better overlap between the automated extracted tumor regions with manually extracted tumor region by radiologists.

  12. Mapping of rock types using a joint approach by combining the multivariate statistics, self-organizing map and Bayesian neural networks: an example from IODP 323 site

    NASA Astrophysics Data System (ADS)

    Karmakar, Mampi; Maiti, Saumen; Singh, Amrita; Ojha, Maheswar; Maity, Bhabani Sankar

    2017-07-01

    Modeling and classification of the subsurface lithology is very important to understand the evolution of the earth system. However, precise classification and mapping of lithology using a single framework are difficult due to the complexity and the nonlinearity of the problem driven by limited core sample information. Here, we implement a joint approach by combining the unsupervised and the supervised methods in a single framework for better classification and mapping of rock types. In the unsupervised method, we use the principal component analysis (PCA), K-means cluster analysis (K-means), dendrogram analysis, Fuzzy C-means (FCM) cluster analysis and self-organizing map (SOM). In the supervised method, we use the Bayesian neural networks (BNN) optimized by the Hybrid Monte Carlo (HMC) (BNN-HMC) and the scaled conjugate gradient (SCG) (BNN-SCG) techniques. We use P-wave velocity, density, neutron porosity, resistivity and gamma ray logs of the well U1343E of the Integrated Ocean Drilling Program (IODP) Expedition 323 in the Bering Sea slope region. While the SOM algorithm allows us to visualize the clustering results in spatial domain, the combined classification schemes (supervised and unsupervised) uncover the different patterns of lithology such of as clayey-silt, diatom-silt and silty-clay from an un-cored section of the drilled hole. In addition, the BNN approach is capable of estimating uncertainty in the predictive modeling of three types of rocks over the entire lithology section at site U1343. Alternate succession of clayey-silt, diatom-silt and silty-clay may be representative of crustal inhomogeneity in general and thus could be a basis for detail study related to the productivity of methane gas in the oceans worldwide. Moreover, at the 530 m depth down below seafloor (DSF), the transition from Pliocene to Pleistocene could be linked to lithological alternation between the clayey-silt and the diatom-silt. The present results could provide the basis for the detailed study to get deeper insight into the Bering Sea' sediment deposition and sequence.

  13. Semantic Classification of Diseases in Discharge Summaries Using a Context-aware Rule-based Classifier

    PubMed Central

    Solt, Illés; Tikk, Domonkos; Gál, Viktor; Kardkovács, Zsolt T.

    2009-01-01

    Objective Automated and disease-specific classification of textual clinical discharge summaries is of great importance in human life science, as it helps physicians to make medical studies by providing statistically relevant data for analysis. This can be further facilitated if, at the labeling of discharge summaries, semantic labels are also extracted from text, such as whether a given disease is present, absent, questionable in a patient, or is unmentioned in the document. The authors present a classification technique that successfully solves the semantic classification task. Design The authors introduce a context-aware rule-based semantic classification technique for use on clinical discharge summaries. The classification is performed in subsequent steps. First, some misleading parts are removed from the text; then the text is partitioned into positive, negative, and uncertain context segments, then a sequence of binary classifiers is applied to assign the appropriate semantic labels. Measurement For evaluation the authors used the documents of the i2b2 Obesity Challenge and adopted its evaluation measures: F1-macro and F1-micro for measurements. Results On the two subtasks of the Obesity Challenge (textual and intuitive classification) the system performed very well, and achieved a F1-macro = 0.80 for the textual and F1-macro = 0.67 for the intuitive tasks, and obtained second place at the textual and first place at the intuitive subtasks of the challenge. Conclusions The authors show in the paper that a simple rule-based classifier can tackle the semantic classification task more successfully than machine learning techniques, if the training data are limited and some semantic labels are very sparse. PMID:19390101

  14. Identification of Reliable Components in Multivariate Curve Resolution-Alternating Least Squares (MCR-ALS): a Data-Driven Approach across Metabolic Processes.

    PubMed

    Motegi, Hiromi; Tsuboi, Yuuri; Saga, Ayako; Kagami, Tomoko; Inoue, Maki; Toki, Hideaki; Minowa, Osamu; Noda, Tetsuo; Kikuchi, Jun

    2015-11-04

    There is an increasing need to use multivariate statistical methods for understanding biological functions, identifying the mechanisms of diseases, and exploring biomarkers. In addition to classical analyses such as hierarchical cluster analysis, principal component analysis, and partial least squares discriminant analysis, various multivariate strategies, including independent component analysis, non-negative matrix factorization, and multivariate curve resolution, have recently been proposed. However, determining the number of components is problematic. Despite the proposal of several different methods, no satisfactory approach has yet been reported. To resolve this problem, we implemented a new idea: classifying a component as "reliable" or "unreliable" based on the reproducibility of its appearance, regardless of the number of components in the calculation. Using the clustering method for classification, we applied this idea to multivariate curve resolution-alternating least squares (MCR-ALS). Comparisons between conventional and modified methods applied to proton nuclear magnetic resonance ((1)H-NMR) spectral datasets derived from known standard mixtures and biological mixtures (urine and feces of mice) revealed that more plausible results are obtained by the modified method. In particular, clusters containing little information were detected with reliability. This strategy, named "cluster-aided MCR-ALS," will facilitate the attainment of more reliable results in the metabolomics datasets.

  15. Mapping Informative Clusters in a Hierarchial Framework of fMRI Multivariate Analysis

    PubMed Central

    Xu, Rui; Zhen, Zonglei; Liu, Jia

    2010-01-01

    Pattern recognition methods have become increasingly popular in fMRI data analysis, which are powerful in discriminating between multi-voxel patterns of brain activities associated with different mental states. However, when they are used in functional brain mapping, the location of discriminative voxels varies significantly, raising difficulties in interpreting the locus of the effect. Here we proposed a hierarchical framework of multivariate approach that maps informative clusters rather than voxels to achieve reliable functional brain mapping without compromising the discriminative power. In particular, we first searched for local homogeneous clusters that consisted of voxels with similar response profiles. Then, a multi-voxel classifier was built for each cluster to extract discriminative information from the multi-voxel patterns. Finally, through multivariate ranking, outputs from the classifiers were served as a multi-cluster pattern to identify informative clusters by examining interactions among clusters. Results from both simulated and real fMRI data demonstrated that this hierarchical approach showed better performance in the robustness of functional brain mapping than traditional voxel-based multivariate methods. In addition, the mapped clusters were highly overlapped for two perceptually equivalent object categories, further confirming the validity of our approach. In short, the hierarchical framework of multivariate approach is suitable for both pattern classification and brain mapping in fMRI studies. PMID:21152081

  16. Application of multivariate statistical techniques for differentiation of ripe banana flour based on the composition of elements.

    PubMed

    Alkarkhi, Abbas F M; Ramli, Saifullah Bin; Easa, Azhar Mat

    2009-01-01

    Major (sodium, potassium, calcium, magnesium) and minor elements (iron, copper, zinc, manganese) and one heavy metal (lead) of Cavendish banana flour and Dream banana flour were determined, and data were analyzed using multivariate statistical techniques of factor analysis and discriminant analysis. Factor analysis yielded four factors explaining more than 81% of the total variance: the first factor explained 28.73%, comprising magnesium, sodium, and iron; the second factor explained 21.47%, comprising only manganese and copper; the third factor explained 15.66%, comprising zinc and lead; while the fourth factor explained 15.50%, comprising potassium. Discriminant analysis showed that magnesium and sodium exhibited a strong contribution in discriminating the two types of banana flour, affording 100% correct assignation. This study presents the usefulness of multivariate statistical techniques for analysis and interpretation of complex mineral content data from banana flour of different varieties.

  17. BANYAN. XI. The BANYAN Σ Multivariate Bayesian Algorithm to Identify Members of Young Associations with 150 pc

    NASA Astrophysics Data System (ADS)

    Gagné, Jonathan; Mamajek, Eric E.; Malo, Lison; Riedel, Adric; Rodriguez, David; Lafrenière, David; Faherty, Jacqueline K.; Roy-Loubier, Olivier; Pueyo, Laurent; Robin, Annie C.; Doyon, René

    2018-03-01

    BANYAN Σ is a new Bayesian algorithm to identify members of young stellar associations within 150 pc of the Sun. It includes 27 young associations with ages in the range ∼1–800 Myr, modeled with multivariate Gaussians in six-dimensional (6D) XYZUVW space. It is the first such multi-association classification tool to include the nearest sub-groups of the Sco-Cen OB star-forming region, the IC 2602, IC 2391, Pleiades and Platais 8 clusters, and the ρ Ophiuchi, Corona Australis, and Taurus star formation regions. A model of field stars is built from a mixture of multivariate Gaussians based on the Besançon Galactic model. The algorithm can derive membership probabilities for objects with only sky coordinates and proper motion, but can also include parallax and radial velocity measurements, as well as spectrophotometric distance constraints from sequences in color–magnitude or spectral type–magnitude diagrams. BANYAN Σ benefits from an analytical solution to the Bayesian marginalization integrals over unknown radial velocities and distances that makes it more accurate and significantly faster than its predecessor BANYAN II. A contamination versus hit rate analysis is presented and demonstrates that BANYAN Σ achieves a better classification performance than other moving group tools available in the literature, especially in terms of cross-contamination between young associations. An updated list of bona fide members in the 27 young associations, augmented by the Gaia-DR1 release, as well as all parameters for the 6D multivariate Gaussian models for each association and the Galactic field neighborhood within 300 pc are presented. This new tool will make it possible to analyze large data sets such as the upcoming Gaia-DR2 to identify new young stars. IDL and Python versions of BANYAN Σ are made available with this publication, and a more limited online web tool is available at http://www.exoplanetes.umontreal.ca/banyan/banyansigma.php.

  18. Multivariate logistic regression analysis of postoperative complications and risk model establishment of gastrectomy for gastric cancer: A single-center cohort report.

    PubMed

    Zhou, Jinzhe; Zhou, Yanbing; Cao, Shougen; Li, Shikuan; Wang, Hao; Niu, Zhaojian; Chen, Dong; Wang, Dongsheng; Lv, Liang; Zhang, Jian; Li, Yu; Jiao, Xuelong; Tan, Xiaojie; Zhang, Jianli; Wang, Haibo; Zhang, Bingyuan; Lu, Yun; Sun, Zhenqing

    2016-01-01

    Reporting of surgical complications is common, but few provide information about the severity and estimate risk factors of complications. If have, but lack of specificity. We retrospectively analyzed data on 2795 gastric cancer patients underwent surgical procedure at the Affiliated Hospital of Qingdao University between June 2007 and June 2012, established multivariate logistic regression model to predictive risk factors related to the postoperative complications according to the Clavien-Dindo classification system. Twenty-four out of 86 variables were identified statistically significant in univariate logistic regression analysis, 11 significant variables entered multivariate analysis were employed to produce the risk model. Liver cirrhosis, diabetes mellitus, Child classification, invasion of neighboring organs, combined resection, introperative transfusion, Billroth II anastomosis of reconstruction, malnutrition, surgical volume of surgeons, operating time and age were independent risk factors for postoperative complications after gastrectomy. Based on logistic regression equation, p=Exp∑BiXi / (1+Exp∑BiXi), multivariate logistic regression predictive model that calculated the risk of postoperative morbidity was developed, p = 1/(1 + e((4.810-1.287X1-0.504X2-0.500X3-0.474X4-0.405X5-0.318X6-0.316X7-0.305X8-0.278X9-0.255X10-0.138X11))). The accuracy, sensitivity and specificity of the model to predict the postoperative complications were 86.7%, 76.2% and 88.6%, respectively. This risk model based on Clavien-Dindo grading severity of complications system and logistic regression analysis can predict severe morbidity specific to an individual patient's risk factors, estimate patients' risks and benefits of gastric surgery as an accurate decision-making tool and may serve as a template for the development of risk models for other surgical groups.

  19. Comprehensive analysis of Polygoni Multiflori Radix of different geographical origins using ultra-high-performance liquid chromatography fingerprints and multivariate chemometric methods.

    PubMed

    Sun, Li-Li; Wang, Meng; Zhang, Hui-Jie; Liu, Ya-Nan; Ren, Xiao-Liang; Deng, Yan-Ru; Qi, Ai-Di

    2018-01-01

    Polygoni Multiflori Radix (PMR) is increasingly being used not just as a traditional herbal medicine but also as a popular functional food. In this study, multivariate chemometric methods and mass spectrometry were combined to analyze the ultra-high-performance liquid chromatograph (UPLC) fingerprints of PMR from six different geographical origins. A chemometric strategy based on multivariate curve resolution-alternating least squares (MCR-ALS) and three classification methods is proposed to analyze the UPLC fingerprints obtained. Common chromatographic problems, including the background contribution, baseline contribution, and peak overlap, were handled by the established MCR-ALS model. A total of 22 components were resolved. Moreover, relative species concentrations were obtained from the MCR-ALS model, which was used for multivariate classification analysis. Principal component analysis (PCA) and Ward's method have been applied to classify 72 PMR samples from six different geographical regions. The PCA score plot showed that the PMR samples fell into four clusters, which related to the geographical location and climate of the source areas. The results were then corroborated by Ward's method. In addition, according to the variance-weighted distance between cluster centers obtained from Ward's method, five components were identified as the most significant variables (chemical markers) for cluster discrimination. A counter-propagation artificial neural network has been applied to confirm and predict the effects of chemical markers on different samples. Finally, the five chemical markers were identified by UPLC-quadrupole time-of-flight mass spectrometer. Components 3, 12, 16, 18, and 19 were identified as 2,3,5,4'-tetrahydroxy-stilbene-2-O-β-d-glucoside, emodin-8-O-β-d-glucopyranoside, emodin-8-O-(6'-O-acetyl)-β-d-glucopyranoside, emodin, and physcion, respectively. In conclusion, the proposed method can be applied for the comprehensive analysis of natural samples. Copyright © 2016. Published by Elsevier B.V.

  20. Predictive factors in patients with hepatocellular carcinoma receiving sorafenib therapy using time-dependent receiver operating characteristic analysis.

    PubMed

    Nishikawa, Hiroki; Nishijima, Norihiro; Enomoto, Hirayuki; Sakamoto, Azusa; Nasu, Akihiro; Komekado, Hideyuki; Nishimura, Takashi; Kita, Ryuichi; Kimura, Toru; Iijima, Hiroko; Nishiguchi, Shuhei; Osaki, Yukio

    2017-01-01

    To investigate variables before sorafenib therapy on the clinical outcomes in hepatocellular carcinoma (HCC) patients receiving sorafenib and to further assess and compare the predictive performance of continuous parameters using time-dependent receiver operating characteristics (ROC) analysis. A total of 225 HCC patients were analyzed. We retrospectively examined factors related to overall survival (OS) and progression free survival (PFS) using univariate and multivariate analyses. Subsequently, we performed time-dependent ROC analysis of continuous parameters which were significant in the multivariate analysis in terms of OS and PFS. Total sum of area under the ROC in all time points (defined as TAAT score) in each case was calculated. Our cohort included 175 male and 50 female patients (median age, 72 years) and included 158 Child-Pugh A and 67 Child-Pugh B patients. The median OS time was 0.68 years, while the median PFS time was 0.24 years. On multivariate analysis, gender, body mass index (BMI), Child-Pugh classification, extrahepatic metastases, tumor burden, aspartate aminotransferase (AST) and alpha-fetoprotein (AFP) were identified as significant predictors of OS and ECOG-performance status, Child-Pugh classification and extrahepatic metastases were identified as significant predictors of PFS. Among three continuous variables (i.e., BMI, AST and AFP), AFP had the highest TAAT score for the entire cohort. In subgroup analyses, AFP had the highest TAAT score except for Child-Pugh B and female among three continuous variables. In continuous variables, AFP could have higher predictive accuracy for survival in HCC patients undergoing sorafenib therapy.

Top