Addressing multi-label imbalance problem of surgical tool detection using CNN.
Sahu, Manish; Mukhopadhyay, Anirban; Szengel, Angelika; Zachow, Stefan
2017-06-01
A fully automated surgical tool detection framework is proposed for endoscopic video streams. State-of-the-art surgical tool detection methods rely on supervised one-vs-all or multi-class classification techniques, completely ignoring the co-occurrence relationship of the tools and the associated class imbalance. In this paper, we formulate tool detection as a multi-label classification task where tool co-occurrences are treated as separate classes. In addition, imbalance on tool co-occurrences is analyzed and stratification techniques are employed to address the imbalance during convolutional neural network (CNN) training. Moreover, temporal smoothing is introduced as an online post-processing step to enhance runtime prediction. Quantitative analysis is performed on the M2CAI16 tool detection dataset to highlight the importance of stratification, temporal smoothing and the overall framework for tool detection. The analysis on tool imbalance, backed by the empirical results, indicates the need and superiority of the proposed framework over state-of-the-art techniques.
Improved Sparse Multi-Class SVM and Its Application for Gene Selection in Cancer Classification
Huang, Lingkang; Zhang, Hao Helen; Zeng, Zhao-Bang; Bushel, Pierre R.
2013-01-01
Background Microarray techniques provide promising tools for cancer diagnosis using gene expression profiles. However, molecular diagnosis based on high-throughput platforms presents great challenges due to the overwhelming number of variables versus the small sample size and the complex nature of multi-type tumors. Support vector machines (SVMs) have shown superior performance in cancer classification due to their ability to handle high dimensional low sample size data. The multi-class SVM algorithm of Crammer and Singer provides a natural framework for multi-class learning. Despite its effective performance, the procedure utilizes all variables without selection. In this paper, we propose to improve the procedure by imposing shrinkage penalties in learning to enforce solution sparsity. Results The original multi-class SVM of Crammer and Singer is effective for multi-class classification but does not conduct variable selection. We improved the method by introducing soft-thresholding type penalties to incorporate variable selection into multi-class classification for high dimensional data. The new methods were applied to simulated data and two cancer gene expression data sets. The results demonstrate that the new methods can select a small number of genes for building accurate multi-class classification rules. Furthermore, the important genes selected by the methods overlap significantly, suggesting general agreement among different variable selection schemes. Conclusions High accuracy and sparsity make the new methods attractive for cancer diagnostics with gene expression data and defining targets of therapeutic intervention. Availability: The source MATLAB code are available from http://math.arizona.edu/~hzhang/software.html. PMID:23966761
SVM-Fold: a tool for discriminative multi-class protein fold and superfamily recognition
Melvin, Iain; Ie, Eugene; Kuang, Rui; Weston, Jason; Stafford, William Noble; Leslie, Christina
2007-01-01
Background Predicting a protein's structural class from its amino acid sequence is a fundamental problem in computational biology. Much recent work has focused on developing new representations for protein sequences, called string kernels, for use with support vector machine (SVM) classifiers. However, while some of these approaches exhibit state-of-the-art performance at the binary protein classification problem, i.e. discriminating between a particular protein class and all other classes, few of these studies have addressed the real problem of multi-class superfamily or fold recognition. Moreover, there are only limited software tools and systems for SVM-based protein classification available to the bioinformatics community. Results We present a new multi-class SVM-based protein fold and superfamily recognition system and web server called SVM-Fold, which can be found at . Our system uses an efficient implementation of a state-of-the-art string kernel for sequence profiles, called the profile kernel, where the underlying feature representation is a histogram of inexact matching k-mer frequencies. We also employ a novel machine learning approach to solve the difficult multi-class problem of classifying a sequence of amino acids into one of many known protein structural classes. Binary one-vs-the-rest SVM classifiers that are trained to recognize individual structural classes yield prediction scores that are not comparable, so that standard "one-vs-all" classification fails to perform well. Moreover, SVMs for classes at different levels of the protein structural hierarchy may make useful predictions, but one-vs-all does not try to combine these multiple predictions. To deal with these problems, our method learns relative weights between one-vs-the-rest classifiers and encodes information about the protein structural hierarchy for multi-class prediction. In large-scale benchmark results based on the SCOP database, our code weighting approach significantly improves on the standard one-vs-all method for both the superfamily and fold prediction in the remote homology setting and on the fold recognition problem. Moreover, our code weight learning algorithm strongly outperforms nearest-neighbor methods based on PSI-BLAST in terms of prediction accuracy on every structure classification problem we consider. Conclusion By combining state-of-the-art SVM kernel methods with a novel multi-class algorithm, the SVM-Fold system delivers efficient and accurate protein fold and superfamily recognition. PMID:17570145
Support vector machines-based fault diagnosis for turbo-pump rotor
NASA Astrophysics Data System (ADS)
Yuan, Sheng-Fa; Chu, Fu-Lei
2006-05-01
Most artificial intelligence methods used in fault diagnosis are based on empirical risk minimisation principle and have poor generalisation when fault samples are few. Support vector machines (SVM) is a new general machine-learning tool based on structural risk minimisation principle that exhibits good generalisation even when fault samples are few. Fault diagnosis based on SVM is discussed. Since basic SVM is originally designed for two-class classification, while most of fault diagnosis problems are multi-class cases, a new multi-class classification of SVM named 'one to others' algorithm is presented to solve the multi-class recognition problems. It is a binary tree classifier composed of several two-class classifiers organised by fault priority, which is simple, and has little repeated training amount, and the rate of training and recognition is expedited. The effectiveness of the method is verified by the application to the fault diagnosis for turbo pump rotor.
Steganalysis feature improvement using expectation maximization
NASA Astrophysics Data System (ADS)
Rodriguez, Benjamin M.; Peterson, Gilbert L.; Agaian, Sos S.
2007-04-01
Images and data files provide an excellent opportunity for concealing illegal or clandestine material. Currently, there are over 250 different tools which embed data into an image without causing noticeable changes to the image. From a forensics perspective, when a system is confiscated or an image of a system is generated the investigator needs a tool that can scan and accurately identify files suspected of containing malicious information. The identification process is termed the steganalysis problem which focuses on both blind identification, in which only normal images are available for training, and multi-class identification, in which both the clean and stego images at several embedding rates are available for training. In this paper an investigation of a clustering and classification technique (Expectation Maximization with mixture models) is used to determine if a digital image contains hidden information. The steganalysis problem is for both anomaly detection and multi-class detection. The various clusters represent clean images and stego images with between 1% and 10% embedding percentage. Based on the results it is concluded that the EM classification technique is highly suitable for both blind detection and the multi-class problem.
NASA Astrophysics Data System (ADS)
Ma, Weiwei; Gong, Cailan; Hu, Yong; Li, Long; Meng, Peng
2015-10-01
Remote sensing technology has been broadly recognized for its convenience and efficiency in mapping vegetation, particularly in high-altitude and inaccessible areas where there are lack of in-situ observations. In this study, Landsat Thematic Mapper (TM) images and Chinese environmental mitigation satellite CCD sensor (HJ-1 CCD) images, both of which are at 30m spatial resolution were employed for identifying and monitoring of vegetation types in a area of Western China——Qinghai Lake Watershed(QHLW). A decision classification tree (DCT) algorithm using multi-characteristic including seasonal TM/HJ-1 CCD time series data combined with digital elevation models (DEMs) dataset, and a supervised maximum likelihood classification (MLC) algorithm with single-data TM image were applied vegetation classification. Accuracy of the two algorithms was assessed using field observation data. Based on produced vegetation classification maps, it was found that the DCT using multi-season data and geomorphologic parameters was superior to the MLC algorithm using single-data image, improving the overall accuracy by 11.86% at second class level and significantly reducing the "salt and pepper" noise. The DCT algorithm applied to TM /HJ-1 CCD time series data geomorphologic parameters appeared as a valuable and reliable tool for monitoring vegetation at first class level (5 vegetation classes) and second class level(8 vegetation subclasses). The DCT algorithm using multi-characteristic might provide a theoretical basis and general approach to automatic extraction of vegetation types from remote sensing imagery over plateau areas.
Jahandideh, Samad; Srinivasasainagendra, Vinodh; Zhi, Degui
2012-11-07
RNA-protein interaction plays an important role in various cellular processes, such as protein synthesis, gene regulation, post-transcriptional gene regulation, alternative splicing, and infections by RNA viruses. In this study, using Gene Ontology Annotated (GOA) and Structural Classification of Proteins (SCOP) databases an automatic procedure was designed to capture structurally solved RNA-binding protein domains in different subclasses. Subsequently, we applied tuned multi-class SVM (TMCSVM), Random Forest (RF), and multi-class ℓ1/ℓq-regularized logistic regression (MCRLR) for analysis and classifying RNA-binding protein domains based on a comprehensive set of sequence and structural features. In this study, we compared prediction accuracy of three different state-of-the-art predictor methods. From our results, TMCSVM outperforms the other methods and suggests the potential of TMCSVM as a useful tool for facilitating the multi-class prediction of RNA-binding protein domains. On the other hand, MCRLR by elucidating importance of features for their contribution in predictive accuracy of RNA-binding protein domains subclasses, helps us to provide some biological insights into the roles of sequences and structures in protein-RNA interactions.
Semi-supervised classification tool for DubaiSat-2 multispectral imagery
NASA Astrophysics Data System (ADS)
Al-Mansoori, Saeed
2015-10-01
This paper addresses a semi-supervised classification tool based on a pixel-based approach of the multi-spectral satellite imagery. There are not many studies demonstrating such algorithm for the multispectral images, especially when the image consists of 4 bands (Red, Green, Blue and Near Infrared) as in DubaiSat-2 satellite images. The proposed approach utilizes both unsupervised and supervised classification schemes sequentially to identify four classes in the image, namely, water bodies, vegetation, land (developed and undeveloped areas) and paved areas (i.e. roads). The unsupervised classification concept is applied to identify two classes; water bodies and vegetation, based on a well-known index that uses the distinct wavelengths of visible and near-infrared sunlight that is absorbed and reflected by the plants to identify the classes; this index parameter is called "Normalized Difference Vegetation Index (NDVI)". Afterward, the supervised classification is performed by selecting training homogenous samples for roads and land areas. Here, a precise selection of training samples plays a vital role in the classification accuracy. Post classification is finally performed to enhance the classification accuracy, where the classified image is sieved, clumped and filtered before producing final output. Overall, the supervised classification approach produced higher accuracy than the unsupervised method. This paper shows some current preliminary research results which point out the effectiveness of the proposed technique in a virtual perspective.
NASA Technical Reports Server (NTRS)
Mehta, N. C.
1984-01-01
The utility of radar scatterometers for discrimination and characterization of natural vegetation was investigated. Backscatter measurements were acquired with airborne multi-frequency, multi-polarization, multi-angle radar scatterometers over a test site in a southern temperate forest. Separability between ground cover classes was studied using a two-class separability measure. Very good separability is achieved between most classes. Longer wavelength is useful in separating trees from non-tree classes, while shorter wavelength and cross polarization are helpful for discrimination among tree classes. Using the maximum likelihood classifier, 50% overall classification accuracy is achieved using a single, short-wavelength scatterometer channel. Addition of multiple incidence angles and another radar band improves classification accuracy by 20% and 50%, respectively, over the single channel accuracy. Incorporation of a third radar band seems redundant for vegetation classification. Vertical transmit polarization is critically important for all classes.
Deep learning architectures for multi-label classification of intelligent health risk prediction.
Maxwell, Andrew; Li, Runzhi; Yang, Bei; Weng, Heng; Ou, Aihua; Hong, Huixiao; Zhou, Zhaoxian; Gong, Ping; Zhang, Chaoyang
2017-12-28
Multi-label classification of data remains to be a challenging problem. Because of the complexity of the data, it is sometimes difficult to infer information about classes that are not mutually exclusive. For medical data, patients could have symptoms of multiple different diseases at the same time and it is important to develop tools that help to identify problems early. Intelligent health risk prediction models built with deep learning architectures offer a powerful tool for physicians to identify patterns in patient data that indicate risks associated with certain types of chronic diseases. Physical examination records of 110,300 anonymous patients were used to predict diabetes, hypertension, fatty liver, a combination of these three chronic diseases, and the absence of disease (8 classes in total). The dataset was split into training (90%) and testing (10%) sub-datasets. Ten-fold cross validation was used to evaluate prediction accuracy with metrics such as precision, recall, and F-score. Deep Learning (DL) architectures were compared with standard and state-of-the-art multi-label classification methods. Preliminary results suggest that Deep Neural Networks (DNN), a DL architecture, when applied to multi-label classification of chronic diseases, produced accuracy that was comparable to that of common methods such as Support Vector Machines. We have implemented DNNs to handle both problem transformation and algorithm adaption type multi-label methods and compare both to see which is preferable. Deep Learning architectures have the potential of inferring more information about the patterns of physical examination data than common classification methods. The advanced techniques of Deep Learning can be used to identify the significance of different features from physical examination data as well as to learn the contributions of each feature that impact a patient's risk for chronic diseases. However, accurate prediction of chronic disease risks remains a challenging problem that warrants further studies.
Retinex Preprocessing for Improved Multi-Spectral Image Classification
NASA Technical Reports Server (NTRS)
Thompson, B.; Rahman, Z.; Park, S.
2000-01-01
The goal of multi-image classification is to identify and label "similar regions" within a scene. The ability to correctly classify a remotely sensed multi-image of a scene is affected by the ability of the classification process to adequately compensate for the effects of atmospheric variations and sensor anomalies. Better classification may be obtained if the multi-image is preprocessed before classification, so as to reduce the adverse effects of image formation. In this paper, we discuss the overall impact on multi-spectral image classification when the retinex image enhancement algorithm is used to preprocess multi-spectral images. The retinex is a multi-purpose image enhancement algorithm that performs dynamic range compression, reduces the dependence on lighting conditions, and generally enhances apparent spatial resolution. The retinex has been successfully applied to the enhancement of many different types of grayscale and color images. We show in this paper that retinex preprocessing improves the spatial structure of multi-spectral images and thus provides better within-class variations than would otherwise be obtained without the preprocessing. For a series of multi-spectral images obtained with diffuse and direct lighting, we show that without retinex preprocessing the class spectral signatures vary substantially with the lighting conditions. Whereas multi-dimensional clustering without preprocessing produced one-class homogeneous regions, the classification on the preprocessed images produced multi-class non-homogeneous regions. This lack of homogeneity is explained by the interaction between different agronomic treatments applied to the regions: the preprocessed images are closer to ground truth. The principle advantage that the retinex offers is that for different lighting conditions classifications derived from the retinex preprocessed images look remarkably "similar", and thus more consistent, whereas classifications derived from the original images, without preprocessing, are much less similar.
Crabtree, Nathaniel M; Moore, Jason H; Bowyer, John F; George, Nysia I
2017-01-01
A computational evolution system (CES) is a knowledge discovery engine that can identify subtle, synergistic relationships in large datasets. Pareto optimization allows CESs to balance accuracy with model complexity when evolving classifiers. Using Pareto optimization, a CES is able to identify a very small number of features while maintaining high classification accuracy. A CES can be designed for various types of data, and the user can exploit expert knowledge about the classification problem in order to improve discrimination between classes. These characteristics give CES an advantage over other classification and feature selection algorithms, particularly when the goal is to identify a small number of highly relevant, non-redundant biomarkers. Previously, CESs have been developed only for binary class datasets. In this study, we developed a multi-class CES. The multi-class CES was compared to three common feature selection and classification algorithms: support vector machine (SVM), random k-nearest neighbor (RKNN), and random forest (RF). The algorithms were evaluated on three distinct multi-class RNA sequencing datasets. The comparison criteria were run-time, classification accuracy, number of selected features, and stability of selected feature set (as measured by the Tanimoto distance). The performance of each algorithm was data-dependent. CES performed best on the dataset with the smallest sample size, indicating that CES has a unique advantage since the accuracy of most classification methods suffer when sample size is small. The multi-class extension of CES increases the appeal of its application to complex, multi-class datasets in order to identify important biomarkers and features.
Combining multiple decisions: applications to bioinformatics
NASA Astrophysics Data System (ADS)
Yukinawa, N.; Takenouchi, T.; Oba, S.; Ishii, S.
2008-01-01
Multi-class classification is one of the fundamental tasks in bioinformatics and typically arises in cancer diagnosis studies by gene expression profiling. This article reviews two recent approaches to multi-class classification by combining multiple binary classifiers, which are formulated based on a unified framework of error-correcting output coding (ECOC). The first approach is to construct a multi-class classifier in which each binary classifier to be aggregated has a weight value to be optimally tuned based on the observed data. In the second approach, misclassification of each binary classifier is formulated as a bit inversion error with a probabilistic model by making an analogy to the context of information transmission theory. Experimental studies using various real-world datasets including cancer classification problems reveal that both of the new methods are superior or comparable to other multi-class classification methods.
NASA Astrophysics Data System (ADS)
Gao, Lin; Cheng, Wei; Zhang, Jinhua; Wang, Jue
2016-08-01
Brain-computer interface (BCI) systems provide an alternative communication and control approach for people with limited motor function. Therefore, the feature extraction and classification approach should differentiate the relative unusual state of motion intention from a common resting state. In this paper, we sought a novel approach for multi-class classification in BCI applications. We collected electroencephalographic (EEG) signals registered by electrodes placed over the scalp during left hand motor imagery, right hand motor imagery, and resting state for ten healthy human subjects. We proposed using the Kolmogorov complexity (Kc) for feature extraction and a multi-class Adaboost classifier with extreme learning machine as base classifier for classification, in order to classify the three-class EEG samples. An average classification accuracy of 79.5% was obtained for ten subjects, which greatly outperformed commonly used approaches. Thus, it is concluded that the proposed method could improve the performance for classification of motor imagery tasks for multi-class samples. It could be applied in further studies to generate the control commands to initiate the movement of a robotic exoskeleton or orthosis, which finally facilitates the rehabilitation of disabled people.
Faust, Kevin; Xie, Quin; Han, Dominick; Goyle, Kartikay; Volynskaya, Zoya; Djuric, Ugljesa; Diamandis, Phedias
2018-05-16
There is growing interest in utilizing artificial intelligence, and particularly deep learning, for computer vision in histopathology. While accumulating studies highlight expert-level performance of convolutional neural networks (CNNs) on focused classification tasks, most studies rely on probability distribution scores with empirically defined cutoff values based on post-hoc analysis. More generalizable tools that allow humans to visualize histology-based deep learning inferences and decision making are scarce. Here, we leverage t-distributed Stochastic Neighbor Embedding (t-SNE) to reduce dimensionality and depict how CNNs organize histomorphologic information. Unique to our workflow, we develop a quantitative and transparent approach to visualizing classification decisions prior to softmax compression. By discretizing the relationships between classes on the t-SNE plot, we show we can super-impose randomly sampled regions of test images and use their distribution to render statistically-driven classifications. Therefore, in addition to providing intuitive outputs for human review, this visual approach can carry out automated and objective multi-class classifications similar to more traditional and less-transparent categorical probability distribution scores. Importantly, this novel classification approach is driven by a priori statistically defined cutoffs. It therefore serves as a generalizable classification and anomaly detection tool less reliant on post-hoc tuning. Routine incorporation of this convenient approach for quantitative visualization and error reduction in histopathology aims to accelerate early adoption of CNNs into generalized real-world applications where unanticipated and previously untrained classes are often encountered.
Automatic interpretation of ERTS data for forest management
NASA Technical Reports Server (NTRS)
Kirvida, L.; Johnson, G. R.
1973-01-01
Automatic stratification of forested land from ERTS-1 data provides a valuable tool for resource management. The results are useful for wood product yield estimates, recreation and wild life management, forest inventory and forest condition monitoring. Automatic procedures based on both multi-spectral and spatial features are evaluated. With five classes, training and testing on the same samples, classification accuracy of 74% was achieved using the MSS multispectral features. When adding texture computed from 8 x 8 arrays, classification accuracy of 99% was obtained.
ERIC Educational Resources Information Center
Albaqshi, Amani Mohammed H.
2017-01-01
Functional Data Analysis (FDA) has attracted substantial attention for the last two decades. Within FDA, classifying curves into two or more categories is consistently of interest to scientists, but multi-class prediction within FDA is challenged in that most classification tools have been limited to binary response applications. The functional…
Increasing CAD system efficacy for lung texture analysis using a convolutional network
NASA Astrophysics Data System (ADS)
Tarando, Sebastian Roberto; Fetita, Catalin; Faccinetto, Alex; Brillet, Pierre-Yves
2016-03-01
The infiltrative lung diseases are a class of irreversible, non-neoplastic lung pathologies requiring regular follow-up with CT imaging. Quantifying the evolution of the patient status imposes the development of automated classification tools for lung texture. For the large majority of CAD systems, such classification relies on a two-dimensional analysis of axial CT images. In a previously developed CAD system, we proposed a fully-3D approach exploiting a multi-scale morphological analysis which showed good performance in detecting diseased areas, but with a major drawback consisting of sometimes overestimating the pathological areas and mixing different type of lung patterns. This paper proposes a combination of the existing CAD system with the classification outcome provided by a convolutional network, specifically tuned-up, in order to increase the specificity of the classification and the confidence to diagnosis. The advantage of using a deep learning approach is a better regularization of the classification output (because of a deeper insight into a given pathological class over a large series of samples) where the previous system is extra-sensitive due to the multi-scale response on patient-specific, localized patterns. In a preliminary evaluation, the combined approach was tested on a 10 patient database of various lung pathologies, showing a sharp increase of true detections.
[Research on the methods for multi-class kernel CSP-based feature extraction].
Wang, Jinjia; Zhang, Lingzhi; Hu, Bei
2012-04-01
To relax the presumption of strictly linear patterns in the common spatial patterns (CSP), we studied the kernel CSP (KCSP). A new multi-class KCSP (MKCSP) approach was proposed in this paper, which combines the kernel approach with multi-class CSP technique. In this approach, we used kernel spatial patterns for each class against all others, and extracted signal components specific to one condition from EEG data sets of multiple conditions. Then we performed classification using the Logistic linear classifier. Brain computer interface (BCI) competition III_3a was used in the experiment. Through the experiment, it can be proved that this approach could decompose the raw EEG singles into spatial patterns extracted from multi-class of single trial EEG, and could obtain good classification results.
An Assessment of Worldview-2 Imagery for the Classification Of a Mixed Deciduous Forest
NASA Astrophysics Data System (ADS)
Carter, Nahid
Remote sensing provides a variety of methods for classifying forest communities and can be a valuable tool for the impact assessment of invasive species. The emerald ash borer (Agrilus planipennis) infestation of ash trees (Fraxinus) in the United States has resulted in the mortality of large stands of ash throughout the Northeast. This study assessed the suitability of multi-temporal Worldview-2 multispectral satellite imagery for classifying a mixed deciduous forest in Upstate New York. Training sites were collected using a Global Positioning System (GPS) receiver, with each training site consisting of a single tree of a corresponding class. Six classes were collected; Ash, Maple, Oak, Beech, Evergreen, and Other. Three different classifications were investigated on four data sets. A six class classification (6C), a two class classification consisting of ash and all other classes combined (2C), and a merging of the ash and maple classes for a five class classification (5C). The four data sets included Worldview-2 multispectral data collection from June 2010 (J-WV2) and September 2010 (S-WV2), a layer stacked data set using J-WV2 and S-WV2 (LS-WV2), and a reduced data set (RD-WV2). RD-WV2 was created using a statistical analysis of the processed and unprocessed imagery. Statistical analysis was used to reduce the dimensionality of the data and identify key bands to create a fourth data set (RD-WV2). Overall accuracy varied considerably depending upon the classification type, but results indicated that ash was confused with maple in a majority of the classifications. Ash was most accurately identified using the 2C classification and RD-WV2 data set (81.48%). A combination of the ash and maple classes yielded an accuracy of 89.41%. Future work should focus on separating the ash and maple classifiers by using data sources such as hyperspectral imagery, LiDAR, or extensive forest surveys.
Decoding Multiple Sound Categories in the Human Temporal Cortex Using High Resolution fMRI
Zhang, Fengqing; Wang, Ji-Ping; Kim, Jieun; Parrish, Todd; Wong, Patrick C. M.
2015-01-01
Perception of sound categories is an important aspect of auditory perception. The extent to which the brain’s representation of sound categories is encoded in specialized subregions or distributed across the auditory cortex remains unclear. Recent studies using multivariate pattern analysis (MVPA) of brain activations have provided important insights into how the brain decodes perceptual information. In the large existing literature on brain decoding using MVPA methods, relatively few studies have been conducted on multi-class categorization in the auditory domain. Here, we investigated the representation and processing of auditory categories within the human temporal cortex using high resolution fMRI and MVPA methods. More importantly, we considered decoding multiple sound categories simultaneously through multi-class support vector machine-recursive feature elimination (MSVM-RFE) as our MVPA tool. Results show that for all classifications the model MSVM-RFE was able to learn the functional relation between the multiple sound categories and the corresponding evoked spatial patterns and classify the unlabeled sound-evoked patterns significantly above chance. This indicates the feasibility of decoding multiple sound categories not only within but across subjects. However, the across-subject variation affects classification performance more than the within-subject variation, as the across-subject analysis has significantly lower classification accuracies. Sound category-selective brain maps were identified based on multi-class classification and revealed distributed patterns of brain activity in the superior temporal gyrus and the middle temporal gyrus. This is in accordance with previous studies, indicating that information in the spatially distributed patterns may reflect a more abstract perceptual level of representation of sound categories. Further, we show that the across-subject classification performance can be significantly improved by averaging the fMRI images over items, because the irrelevant variations between different items of the same sound category are reduced and in turn the proportion of signals relevant to sound categorization increases. PMID:25692885
Decoding multiple sound categories in the human temporal cortex using high resolution fMRI.
Zhang, Fengqing; Wang, Ji-Ping; Kim, Jieun; Parrish, Todd; Wong, Patrick C M
2015-01-01
Perception of sound categories is an important aspect of auditory perception. The extent to which the brain's representation of sound categories is encoded in specialized subregions or distributed across the auditory cortex remains unclear. Recent studies using multivariate pattern analysis (MVPA) of brain activations have provided important insights into how the brain decodes perceptual information. In the large existing literature on brain decoding using MVPA methods, relatively few studies have been conducted on multi-class categorization in the auditory domain. Here, we investigated the representation and processing of auditory categories within the human temporal cortex using high resolution fMRI and MVPA methods. More importantly, we considered decoding multiple sound categories simultaneously through multi-class support vector machine-recursive feature elimination (MSVM-RFE) as our MVPA tool. Results show that for all classifications the model MSVM-RFE was able to learn the functional relation between the multiple sound categories and the corresponding evoked spatial patterns and classify the unlabeled sound-evoked patterns significantly above chance. This indicates the feasibility of decoding multiple sound categories not only within but across subjects. However, the across-subject variation affects classification performance more than the within-subject variation, as the across-subject analysis has significantly lower classification accuracies. Sound category-selective brain maps were identified based on multi-class classification and revealed distributed patterns of brain activity in the superior temporal gyrus and the middle temporal gyrus. This is in accordance with previous studies, indicating that information in the spatially distributed patterns may reflect a more abstract perceptual level of representation of sound categories. Further, we show that the across-subject classification performance can be significantly improved by averaging the fMRI images over items, because the irrelevant variations between different items of the same sound category are reduced and in turn the proportion of signals relevant to sound categorization increases.
Automatic classification and detection of clinically relevant images for diabetic retinopathy
NASA Astrophysics Data System (ADS)
Xu, Xinyu; Li, Baoxin
2008-03-01
We proposed a novel approach to automatic classification of Diabetic Retinopathy (DR) images and retrieval of clinically-relevant DR images from a database. Given a query image, our approach first classifies the image into one of the three categories: microaneurysm (MA), neovascularization (NV) and normal, and then it retrieves DR images that are clinically-relevant to the query image from an archival image database. In the classification stage, the query DR images are classified by the Multi-class Multiple-Instance Learning (McMIL) approach, where images are viewed as bags, each of which contains a number of instances corresponding to non-overlapping blocks, and each block is characterized by low-level features including color, texture, histogram of edge directions, and shape. McMIL first learns a collection of instance prototypes for each class that maximizes the Diverse Density function using Expectation- Maximization algorithm. A nonlinear mapping is then defined using the instance prototypes and maps every bag to a point in a new multi-class bag feature space. Finally a multi-class Support Vector Machine is trained in the multi-class bag feature space. In the retrieval stage, we retrieve images from the archival database who bear the same label with the query image, and who are the top K nearest neighbors of the query image in terms of similarity in the multi-class bag feature space. The classification approach achieves high classification accuracy, and the retrieval of clinically-relevant images not only facilitates utilization of the vast amount of hidden diagnostic knowledge in the database, but also improves the efficiency and accuracy of DR lesion diagnosis and assessment.
A data set for evaluating the performance of multi-class multi-object video tracking
NASA Astrophysics Data System (ADS)
Chakraborty, Avishek; Stamatescu, Victor; Wong, Sebastien C.; Wigley, Grant; Kearney, David
2017-05-01
One of the challenges in evaluating multi-object video detection, tracking and classification systems is having publically available data sets with which to compare different systems. However, the measures of performance for tracking and classification are different. Data sets that are suitable for evaluating tracking systems may not be appropriate for classification. Tracking video data sets typically only have ground truth track IDs, while classification video data sets only have ground truth class-label IDs. The former identifies the same object over multiple frames, while the latter identifies the type of object in individual frames. This paper describes an advancement of the ground truth meta-data for the DARPA Neovision2 Tower data set to allow both the evaluation of tracking and classification. The ground truth data sets presented in this paper contain unique object IDs across 5 different classes of object (Car, Bus, Truck, Person, Cyclist) for 24 videos of 871 image frames each. In addition to the object IDs and class labels, the ground truth data also contains the original bounding box coordinates together with new bounding boxes in instances where un-annotated objects were present. The unique IDs are maintained during occlusions between multiple objects or when objects re-enter the field of view. This will provide: a solid foundation for evaluating the performance of multi-object tracking of different types of objects, a straightforward comparison of tracking system performance using the standard Multi Object Tracking (MOT) framework, and classification performance using the Neovision2 metrics. These data have been hosted publically.
An Artificial Intelligence Classification Tool and Its Application to Gamma-Ray Bursts
NASA Technical Reports Server (NTRS)
Hakkila, Jon; Haglin, David J.; Roiger, Richard J.; Giblin, Timothy; Paciesas, William S.; Pendleton, Geoffrey N.; Mallozzi, Robert S.
2004-01-01
Despite being the most energetic phenomenon in the known universe, the astrophysics of gamma-ray bursts (GRBs) has still proven difficult to understand. It has only been within the past five years that the GRB distance scale has been firmly established, on the basis of a few dozen bursts with x-ray, optical, and radio afterglows. The afterglows indicate source redshifts of z=1 to z=5, total energy outputs of roughly 10(exp 52) ergs, and energy confined to the far x-ray to near gamma-ray regime of the electromagnetic spectrum. The multi-wavelength afterglow observations have thus far provided more insight on the nature of the GRB mechanism than the GRB observations; far more papers have been written about the few observed gamma-ray burst afterglows in the past few years than about the thousands of detected gamma-ray bursts. One reason the GRB central engine is still so poorly understood is that GRBs have complex, overlapping characteristics that do not appear to be produced by one homogeneous process. At least two subclasses have been found on the basis of duration, spectral hardness, and fluence (time integrated flux); Class 1 bursts are softer, longer, and brighter than Class 2 bursts (with two second durations indicating a rough division). A third GRB subclass, overlapping the other two, has been identified using statistical clustering techniques; Class 3 bursts are intermediate between Class 1 and Class 2 bursts in brightness and duration, but are softer than Class 1 bursts. We are developing a tool to aid scientists in the study of GRB properties. In the process of developing this tool, we are building a large gamma-ray burst classification database. We are also scientifically analyzing some GRB data as we develop the tool. Tool development thus proceeds in tandem with the dataset for which it is being designed. The tool invokes a modified KDD (Knowledge Discovery in Databases) process, which is described as follows.
Sun, Min; Wong, David; Kronenfeld, Barry
2016-01-01
Despite conceptual and technology advancements in cartography over the decades, choropleth map design and classification fail to address a fundamental issue: estimates that are statistically indifferent may be assigned to different classes on maps or vice versa. Recently, the class separability concept was introduced as a map classification criterion to evaluate the likelihood that estimates in two classes are statistical different. Unfortunately, choropleth maps created according to the separability criterion usually have highly unbalanced classes. To produce reasonably separable but more balanced classes, we propose a heuristic classification approach to consider not just the class separability criterion but also other classification criteria such as evenness and intra-class variability. A geovisual-analytic package was developed to support the heuristic mapping process to evaluate the trade-off between relevant criteria and to select the most preferable classification. Class break values can be adjusted to improve the performance of a classification. PMID:28286426
Automated compound classification using a chemical ontology.
Bobach, Claudia; Böhme, Timo; Laube, Ulf; Püschel, Anett; Weber, Lutz
2012-12-29
Classification of chemical compounds into compound classes by using structure derived descriptors is a well-established method to aid the evaluation and abstraction of compound properties in chemical compound databases. MeSH and recently ChEBI are examples of chemical ontologies that provide a hierarchical classification of compounds into general compound classes of biological interest based on their structural as well as property or use features. In these ontologies, compounds have been assigned manually to their respective classes. However, with the ever increasing possibilities to extract new compounds from text documents using name-to-structure tools and considering the large number of compounds deposited in databases, automated and comprehensive chemical classification methods are needed to avoid the error prone and time consuming manual classification of compounds. In the present work we implement principles and methods to construct a chemical ontology of classes that shall support the automated, high-quality compound classification in chemical databases or text documents. While SMARTS expressions have already been used to define chemical structure class concepts, in the present work we have extended the expressive power of such class definitions by expanding their structure-based reasoning logic. Thus, to achieve the required precision and granularity of chemical class definitions, sets of SMARTS class definitions are connected by OR and NOT logical operators. In addition, AND logic has been implemented to allow the concomitant use of flexible atom lists and stereochemistry definitions. The resulting chemical ontology is a multi-hierarchical taxonomy of concept nodes connected by directed, transitive relationships. A proposal for a rule based definition of chemical classes has been made that allows to define chemical compound classes more precisely than before. The proposed structure-based reasoning logic allows to translate chemistry expert knowledge into a computer interpretable form, preventing erroneous compound assignments and allowing automatic compound classification. The automated assignment of compounds in databases, compound structure files or text documents to their related ontology classes is possible through the integration with a chemical structure search engine. As an application example, the annotation of chemical structure files with a prototypic ontology is demonstrated.
Automated compound classification using a chemical ontology
2012-01-01
Background Classification of chemical compounds into compound classes by using structure derived descriptors is a well-established method to aid the evaluation and abstraction of compound properties in chemical compound databases. MeSH and recently ChEBI are examples of chemical ontologies that provide a hierarchical classification of compounds into general compound classes of biological interest based on their structural as well as property or use features. In these ontologies, compounds have been assigned manually to their respective classes. However, with the ever increasing possibilities to extract new compounds from text documents using name-to-structure tools and considering the large number of compounds deposited in databases, automated and comprehensive chemical classification methods are needed to avoid the error prone and time consuming manual classification of compounds. Results In the present work we implement principles and methods to construct a chemical ontology of classes that shall support the automated, high-quality compound classification in chemical databases or text documents. While SMARTS expressions have already been used to define chemical structure class concepts, in the present work we have extended the expressive power of such class definitions by expanding their structure-based reasoning logic. Thus, to achieve the required precision and granularity of chemical class definitions, sets of SMARTS class definitions are connected by OR and NOT logical operators. In addition, AND logic has been implemented to allow the concomitant use of flexible atom lists and stereochemistry definitions. The resulting chemical ontology is a multi-hierarchical taxonomy of concept nodes connected by directed, transitive relationships. Conclusions A proposal for a rule based definition of chemical classes has been made that allows to define chemical compound classes more precisely than before. The proposed structure-based reasoning logic allows to translate chemistry expert knowledge into a computer interpretable form, preventing erroneous compound assignments and allowing automatic compound classification. The automated assignment of compounds in databases, compound structure files or text documents to their related ontology classes is possible through the integration with a chemical structure search engine. As an application example, the annotation of chemical structure files with a prototypic ontology is demonstrated. PMID:23273256
Camouflage target reconnaissance based on hyperspectral imaging technology
NASA Astrophysics Data System (ADS)
Hua, Wenshen; Guo, Tong; Liu, Xun
2015-08-01
Efficient camouflaged target reconnaissance technology makes great influence on modern warfare. Hyperspectral images can provide large spectral range and high spectral resolution, which are invaluable in discriminating between camouflaged targets and backgrounds. Hyperspectral target detection and classification technology are utilized to achieve single class and multi-class camouflaged targets reconnaissance respectively. Constrained energy minimization (CEM), a widely used algorithm in hyperspectral target detection, is employed to achieve one class camouflage target reconnaissance. Then, support vector machine (SVM), a classification method, is proposed to achieve multi-class camouflage target reconnaissance. Experiments have been conducted to demonstrate the efficiency of the proposed method.
A Framework for Inferring Taxonomic Class of Asteroids.
NASA Technical Reports Server (NTRS)
Dotson, J. L.; Mathias, D. L.
2017-01-01
Introduction: Taxonomic classification of asteroids based on their visible / near-infrared spectra or multi band photometry has proven to be a useful tool to infer other properties about asteroids. Meteorite analogs have been identified for several taxonomic classes, permitting detailed inference about asteroid composition. Trends have been identified between taxonomy and measured asteroid density. Thanks to NEOWise (Near-Earth-Object Wide-field Infrared Survey Explorer) and Spitzer (Spitzer Space Telescope), approximately twice as many asteroids have measured albedos than the number with taxonomic classifications. (If one only considers spectroscopically determined classifications, the ratio is greater than 40.) We present a Bayesian framework that provides probabilistic estimates of the taxonomic class of an asteroid based on its albedo. Although probabilistic estimates of taxonomic classes are not a replacement for spectroscopic or photometric determinations, they can be a useful tool for identifying objects for further study or for asteroid threat assessment models. Inputs and Framework: The framework relies upon two inputs: the expected fraction of each taxonomic class in the population and the albedo distribution of each class. Luckily, numerous authors have addressed both of these questions. For example, the taxonomic distribution by number, surface area and mass of the main belt has been estimated and a diameter limited estimate of fractional abundances of the near earth asteroid population was made. Similarly, the albedo distributions for taxonomic classes have been estimated for the combined main belt and NEA (Near Earth Asteroid) populations in different taxonomic systems and for the NEA population specifically. The framework utilizes a Bayesian inference appropriate for categorical data. The population fractions provide the prior while the albedo distributions allow calculation of the likelihood an albedo measurement is consistent with a given taxonomic class. These inputs allows calculation of the probability an asteroid with a specified albedo belongs to any given taxonomic class.
Di-codon Usage for Gene Classification
NASA Astrophysics Data System (ADS)
Nguyen, Minh N.; Ma, Jianmin; Fogel, Gary B.; Rajapakse, Jagath C.
Classification of genes into biologically related groups facilitates inference of their functions. Codon usage bias has been described previously as a potential feature for gene classification. In this paper, we demonstrate that di-codon usage can further improve classification of genes. By using both codon and di-codon features, we achieve near perfect accuracies for the classification of HLA molecules into major classes and sub-classes. The method is illustrated on 1,841 HLA sequences which are classified into two major classes, HLA-I and HLA-II. Major classes are further classified into sub-groups. A binary SVM using di-codon usage patterns achieved 99.95% accuracy in the classification of HLA genes into major HLA classes; and multi-class SVM achieved accuracy rates of 99.82% and 99.03% for sub-class classification of HLA-I and HLA-II genes, respectively. Furthermore, by combining codon and di-codon usages, the prediction accuracies reached 100%, 99.82%, and 99.84% for HLA major class classification, and for sub-class classification of HLA-I and HLA-II genes, respectively.
Borchani, Hanen; Bielza, Concha; Toro, Carlos; Larrañaga, Pedro
2013-03-01
Our aim is to use multi-dimensional Bayesian network classifiers in order to predict the human immunodeficiency virus type 1 (HIV-1) reverse transcriptase and protease inhibitors given an input set of respective resistance mutations that an HIV patient carries. Multi-dimensional Bayesian network classifiers (MBCs) are probabilistic graphical models especially designed to solve multi-dimensional classification problems, where each input instance in the data set has to be assigned simultaneously to multiple output class variables that are not necessarily binary. In this paper, we introduce a new method, named MB-MBC, for learning MBCs from data by determining the Markov blanket around each class variable using the HITON algorithm. Our method is applied to both reverse transcriptase and protease data sets obtained from the Stanford HIV-1 database. Regarding the prediction of antiretroviral combination therapies, the experimental study shows promising results in terms of classification accuracy compared with state-of-the-art MBC learning algorithms. For reverse transcriptase inhibitors, we get 71% and 11% in mean and global accuracy, respectively; while for protease inhibitors, we get more than 84% and 31% in mean and global accuracy, respectively. In addition, the analysis of MBC graphical structures lets us gain insight into both known and novel interactions between reverse transcriptase and protease inhibitors and their respective resistance mutations. MB-MBC algorithm is a valuable tool to analyze the HIV-1 reverse transcriptase and protease inhibitors prediction problem and to discover interactions within and between these two classes of inhibitors. Copyright © 2012 Elsevier B.V. All rights reserved.
Reducing uncertainty on satellite image classification through spatiotemporal reasoning
NASA Astrophysics Data System (ADS)
Partsinevelos, Panagiotis; Nikolakaki, Natassa; Psillakis, Periklis; Miliaresis, George; Xanthakis, Michail
2014-05-01
The natural habitat constantly endures both inherent natural and human-induced influences. Remote sensing has been providing monitoring oriented solutions regarding the natural Earth surface, by offering a series of tools and methodologies which contribute to prudent environmental management. Processing and analysis of multi-temporal satellite images for the observation of the land changes include often classification and change-detection techniques. These error prone procedures are influenced mainly by the distinctive characteristics of the study areas, the remote sensing systems limitations and the image analysis processes. The present study takes advantage of the temporal continuity of multi-temporal classified images, in order to reduce classification uncertainty, based on reasoning rules. More specifically, pixel groups that temporally oscillate between classes are liable to misclassification or indicate problematic areas. On the other hand, constant pixel group growth indicates a pressure prone area. Computational tools are developed in order to disclose the alterations in land use dynamics and offer a spatial reference to the pressures that land use classes endure and impose between them. Moreover, by revealing areas that are susceptible to misclassification, we propose specific target site selection for training during the process of supervised classification. The underlying objective is to contribute to the understanding and analysis of anthropogenic and environmental factors that influence land use changes. The developed algorithms have been tested upon Landsat satellite image time series, depicting the National Park of Ainos in Kefallinia, Greece, where the unique in the world Abies cephalonica grows. Along with the minor changes and pressures indicated in the test area due to harvesting and other human interventions, the developed algorithms successfully captured fire incidents that have been historically confirmed. Overall, the results have shown that the use of the suggested procedures can contribute to the reduction of the classification uncertainty and support the existing knowledge regarding the pressure among land-use changes.
Hoang, Tuan; Tran, Dat; Huang, Xu
2013-01-01
Common Spatial Pattern (CSP) is a state-of-the-art method for feature extraction in Brain-Computer Interface (BCI) systems. However it is designed for 2-class BCI classification problems. Current extensions of this method to multiple classes based on subspace union and covariance matrix similarity do not provide a high performance. This paper presents a new approach to solving multi-class BCI classification problems by forming a subspace resembled from original subspaces and the proposed method for this approach is called Approximation-based Common Principal Component (ACPC). We perform experiments on Dataset 2a used in BCI Competition IV to evaluate the proposed method. This dataset was designed for motor imagery classification with 4 classes. Preliminary experiments show that the proposed ACPC feature extraction method when combining with Support Vector Machines outperforms CSP-based feature extraction methods on the experimental dataset.
NASA Astrophysics Data System (ADS)
Bayoudh, Meriam; Roux, Emmanuel; Richard, Gilles; Nock, Richard
2015-03-01
The number of satellites and sensors devoted to Earth observation has become increasingly elevated, delivering extensive data, especially images. At the same time, the access to such data and the tools needed to process them has considerably improved. In the presence of such data flow, we need automatic image interpretation methods, especially when it comes to the monitoring and prediction of environmental and societal changes in highly dynamic socio-environmental contexts. This could be accomplished via artificial intelligence. The concept described here relies on the induction of classification rules that explicitly take into account structural knowledge, using Aleph, an Inductive Logic Programming (ILP) system, combined with a multi-class classification procedure. This methodology was used to monitor changes in land cover/use of the French Guiana coastline. One hundred and fifty-eight classification rules were induced from 3 diachronic land cover/use maps including 38 classes. These rules were expressed in first order logic language, which makes them easily understandable by non-experts. A 10-fold cross-validation gave significant average values of 84.62%, 99.57% and 77.22% for classification accuracy, specificity and sensitivity, respectively. Our methodology could be beneficial to automatically classify new objects and to facilitate object-based classification procedures.
Classification of multiple sclerosis lesions using adaptive dictionary learning.
Deshpande, Hrishikesh; Maurel, Pierre; Barillot, Christian
2015-12-01
This paper presents a sparse representation and an adaptive dictionary learning based method for automated classification of multiple sclerosis (MS) lesions in magnetic resonance (MR) images. Manual delineation of MS lesions is a time-consuming task, requiring neuroradiology experts to analyze huge volume of MR data. This, in addition to the high intra- and inter-observer variability necessitates the requirement of automated MS lesion classification methods. Among many image representation models and classification methods that can be used for such purpose, we investigate the use of sparse modeling. In the recent years, sparse representation has evolved as a tool in modeling data using a few basis elements of an over-complete dictionary and has found applications in many image processing tasks including classification. We propose a supervised classification approach by learning dictionaries specific to the lesions and individual healthy brain tissues, which include white matter (WM), gray matter (GM) and cerebrospinal fluid (CSF). The size of the dictionaries learned for each class plays a major role in data representation but it is an even more crucial element in the case of competitive classification. Our approach adapts the size of the dictionary for each class, depending on the complexity of the underlying data. The algorithm is validated using 52 multi-sequence MR images acquired from 13 MS patients. The results demonstrate the effectiveness of our approach in MS lesion classification. Copyright © 2015 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Cheng, Gong; Han, Junwei; Zhou, Peicheng; Guo, Lei
2014-12-01
The rapid development of remote sensing technology has facilitated us the acquisition of remote sensing images with higher and higher spatial resolution, but how to automatically understand the image contents is still a big challenge. In this paper, we develop a practical and rotation-invariant framework for multi-class geospatial object detection and geographic image classification based on collection of part detectors (COPD). The COPD is composed of a set of representative and discriminative part detectors, where each part detector is a linear support vector machine (SVM) classifier used for the detection of objects or recurring spatial patterns within a certain range of orientation. Specifically, when performing multi-class geospatial object detection, we learn a set of seed-based part detectors where each part detector corresponds to a particular viewpoint of an object class, so the collection of them provides a solution for rotation-invariant detection of multi-class objects. When performing geographic image classification, we utilize a large number of pre-trained part detectors to discovery distinctive visual parts from images and use them as attributes to represent the images. Comprehensive evaluations on two remote sensing image databases and comparisons with some state-of-the-art approaches demonstrate the effectiveness and superiority of the developed framework.
Random forest wetland classification using ALOS-2 L-band, RADARSAT-2 C-band, and TerraSAR-X imagery
NASA Astrophysics Data System (ADS)
Mahdianpari, Masoud; Salehi, Bahram; Mohammadimanesh, Fariba; Motagh, Mahdi
2017-08-01
Wetlands are important ecosystems around the world, although they are degraded due both to anthropogenic and natural process. Newfoundland is among the richest Canadian province in terms of different wetland classes. Herbaceous wetlands cover extensive areas of the Avalon Peninsula, which are the habitat of a number of animal and plant species. In this study, a novel hierarchical object-based Random Forest (RF) classification approach is proposed for discriminating between different wetland classes in a sub-region located in the north eastern portion of the Avalon Peninsula. Particularly, multi-polarization and multi-frequency SAR data, including X-band TerraSAR-X single polarized (HH), L-band ALOS-2 dual polarized (HH/HV), and C-band RADARSAT-2 fully polarized images, were applied in different classification levels. First, a SAR backscatter analysis of different land cover types was performed by training data and used in Level-I classification to separate water from non-water classes. This was followed by Level-II classification, wherein the water class was further divided into shallow- and deep-water classes, and the non-water class was partitioned into herbaceous and non-herbaceous classes. In Level-III classification, the herbaceous class was further divided into bog, fen, and marsh classes, while the non-herbaceous class was subsequently partitioned into urban, upland, and swamp classes. In Level-II and -III classifications, different polarimetric decomposition approaches, including Cloude-Pottier, Freeman-Durden, Yamaguchi decompositions, and Kennaugh matrix elements were extracted to aid the RF classifier. The overall accuracy and kappa coefficient were determined in each classification level for evaluating the classification results. The importance of input features was also determined using the variable importance obtained by RF. It was found that the Kennaugh matrix elements, Yamaguchi, and Freeman-Durden decompositions were the most important parameters for wetland classification in this study. Using this new hierarchical RF classification approach, an overall accuracy of up to 94% was obtained for classifying different land cover types in the study area.
A fast learning method for large scale and multi-class samples of SVM
NASA Astrophysics Data System (ADS)
Fan, Yu; Guo, Huiming
2017-06-01
A multi-class classification SVM(Support Vector Machine) fast learning method based on binary tree is presented to solve its low learning efficiency when SVM processing large scale multi-class samples. This paper adopts bottom-up method to set up binary tree hierarchy structure, according to achieved hierarchy structure, sub-classifier learns from corresponding samples of each node. During the learning, several class clusters are generated after the first clustering of the training samples. Firstly, central points are extracted from those class clusters which just have one type of samples. For those which have two types of samples, cluster numbers of their positive and negative samples are set respectively according to their mixture degree, secondary clustering undertaken afterwards, after which, central points are extracted from achieved sub-class clusters. By learning from the reduced samples formed by the integration of extracted central points above, sub-classifiers are obtained. Simulation experiment shows that, this fast learning method, which is based on multi-level clustering, can guarantee higher classification accuracy, greatly reduce sample numbers and effectively improve learning efficiency.
A Mixtures-of-Trees Framework for Multi-Label Classification
Hong, Charmgil; Batal, Iyad; Hauskrecht, Milos
2015-01-01
We propose a new probabilistic approach for multi-label classification that aims to represent the class posterior distribution P(Y|X). Our approach uses a mixture of tree-structured Bayesian networks, which can leverage the computational advantages of conditional tree-structured models and the abilities of mixtures to compensate for tree-structured restrictions. We develop algorithms for learning the model from data and for performing multi-label predictions using the learned model. Experiments on multiple datasets demonstrate that our approach outperforms several state-of-the-art multi-label classification methods. PMID:25927011
MultiSpec: A Desktop and Online Geospatial Image Data Processing Tool
NASA Astrophysics Data System (ADS)
Biehl, L. L.; Hsu, W. K.; Maud, A. R. M.; Yeh, T. T.
2017-12-01
MultiSpec is an easy to learn and use, freeware image processing tool for interactively analyzing a broad spectrum of geospatial image data, with capabilities such as image display, unsupervised and supervised classification, feature extraction, feature enhancement, and several other functions. Originally developed for Macintosh and Windows desktop computers, it has a community of several thousand users worldwide, including researchers and educators, as a practical and robust solution for analyzing multispectral and hyperspectral remote sensing data in several different file formats. More recently MultiSpec was adapted to run in the HUBzero collaboration platform so that it can be used within a web browser, allowing new user communities to be engaged through science gateways. MultiSpec Online has also been extended to interoperate with other components (e.g., data management) in HUBzero through integration with the geospatial data building blocks (GABBs) project. This integration enables a user to directly launch MultiSpec Online from data that is stored and/or shared in a HUBzero gateway and to save output data from MultiSpec Online to hub storage, allowing data sharing and multi-step workflows without having to move data between different systems. MultiSpec has also been used in K-12 classes for which one example is the GLOBE program (www.globe.gov) and in outreach material such as that provided by the USGS (eros.usgs.gov/educational-activities). MultiSpec Online now provides teachers with another way to use MultiSpec without having to install the desktop tool. Recently MultiSpec Online was used in a geospatial data session with 30-35 middle school students at the Turned Onto Technology and Leadership (TOTAL) Camp in the summers of 2016 and 2017 at Purdue University. The students worked on a flood mapping exercise using Landsat 5 data to learn about land remote sensing using supervised classification techniques. Online documentation is available for MultiSpec (engineering.purdue.edu/ biehl/MultiSpec/) including a reference manual and several tutorials allowing young high-school students through research faculty to learn the basic functions in MultiSpec. Some of the tutorials have been translated to other languages by MultiSpec users.
NASA Astrophysics Data System (ADS)
He, Xin; Frey, Eric C.
2007-03-01
Binary ROC analysis has solid decision-theoretic foundations and a close relationship to linear discriminant analysis (LDA). In particular, for the case of Gaussian equal covariance input data, the area under the ROC curve (AUC) value has a direct relationship to the Hotelling trace. Many attempts have been made to extend binary classification methods to multi-class. For example, Fukunaga extended binary LDA to obtain multi-class LDA, which uses the multi-class Hotelling trace as a figure-of-merit, and we have previously developed a three-class ROC analysis method. This work explores the relationship between conventional multi-class LDA and three-class ROC analysis. First, we developed a linear observer, the three-class Hotelling observer (3-HO). For Gaussian equal covariance data, the 3- HO provides equivalent performance to the three-class ideal observer and, under less strict conditions, maximizes the signal to noise ratio for classification of all pairs of the three classes simultaneously. The 3-HO templates are not the eigenvectors obtained from multi-class LDA. Second, we show that the three-class Hotelling trace, which is the figureof- merit in the conventional three-class extension of LDA, has significant limitations. Third, we demonstrate that, under certain conditions, there is a linear relationship between the eigenvectors obtained from multi-class LDA and 3-HO templates. We conclude that the 3-HO based on decision theory has advantages both in its decision theoretic background and in the usefulness of its figure-of-merit. Additionally, there exists the possibility of interpreting the two linear features extracted by the conventional extension of LDA from a decision theoretic point of view.
Fan, Ming; Zheng, Bin; Li, Lihua
2015-10-01
Knowledge of the structural class of a given protein is important for understanding its folding patterns. Although a lot of efforts have been made, it still remains a challenging problem for prediction of protein structural class solely from protein sequences. The feature extraction and classification of proteins are the main problems in prediction. In this research, we extended our earlier work regarding these two aspects. In protein feature extraction, we proposed a scheme by calculating the word frequency and word position from sequences of amino acid, reduced amino acid, and secondary structure. For an accurate classification of the structural class of protein, we developed a novel Multi-Agent Ada-Boost (MA-Ada) method by integrating the features of Multi-Agent system into Ada-Boost algorithm. Extensive experiments were taken to test and compare the proposed method using four benchmark datasets in low homology. The results showed classification accuracies of 88.5%, 96.0%, 88.4%, and 85.5%, respectively, which are much better compared with the existing methods. The source code and dataset are available on request.
A Coupled k-Nearest Neighbor Algorithm for Multi-Label Classification
2015-05-22
classification, an image may contain several concepts simultaneously, such as beach, sunset and kangaroo . Such tasks are usually denoted as multi-label...informatics, a gene can belong to both metabolism and transcription classes; and in music categorization, a song may labeled as Mozart and sad. In the
A Generalized Mixture Framework for Multi-label Classification
Hong, Charmgil; Batal, Iyad; Hauskrecht, Milos
2015-01-01
We develop a novel probabilistic ensemble framework for multi-label classification that is based on the mixtures-of-experts architecture. In this framework, we combine multi-label classification models in the classifier chains family that decompose the class posterior distribution P(Y1, …, Yd|X) using a product of posterior distributions over components of the output space. Our approach captures different input–output and output–output relations that tend to change across data. As a result, we can recover a rich set of dependency relations among inputs and outputs that a single multi-label classification model cannot capture due to its modeling simplifications. We develop and present algorithms for learning the mixtures-of-experts models from data and for performing multi-label predictions on unseen data instances. Experiments on multiple benchmark datasets demonstrate that our approach achieves highly competitive results and outperforms the existing state-of-the-art multi-label classification methods. PMID:26613069
Enhanced risk management by an emerging multi-agent architecture
NASA Astrophysics Data System (ADS)
Lin, Sin-Jin; Hsu, Ming-Fu
2014-07-01
Classification in imbalanced datasets has attracted much attention from researchers in the field of machine learning. Most existing techniques tend not to perform well on minority class instances when the dataset is highly skewed because they focus on minimising the forecasting error without considering the relative distribution of each class. This investigation proposes an emerging multi-agent architecture, grounded on cooperative learning, to solve the class-imbalanced classification problem. Additionally, this study deals further with the obscure nature of the multi-agent architecture and expresses comprehensive rules for auditors. The results from this study indicate that the presented model performs satisfactorily in risk management and is able to tackle a highly class-imbalanced dataset comparatively well. Furthermore, the knowledge visualised process, supported by real examples, can assist both internal and external auditors who must allocate limited detecting resources; they can take the rules as roadmaps to modify the auditing programme.
Multi-class SVM model for fMRI-based classification and grading of liver fibrosis
NASA Astrophysics Data System (ADS)
Freiman, M.; Sela, Y.; Edrei, Y.; Pappo, O.; Joskowicz, L.; Abramovitch, R.
2010-03-01
We present a novel non-invasive automatic method for the classification and grading of liver fibrosis from fMRI maps based on hepatic hemodynamic changes. This method automatically creates a model for liver fibrosis grading based on training datasets. Our supervised learning method evaluates hepatic hemodynamics from an anatomical MRI image and three T2*-W fMRI signal intensity time-course scans acquired during the breathing of air, air-carbon dioxide, and carbogen. It constructs a statistical model of liver fibrosis from these fMRI scans using a binary-based one-against-all multi class Support Vector Machine (SVM) classifier. We evaluated the resulting classification model with the leave-one out technique and compared it to both full multi-class SVM and K-Nearest Neighbor (KNN) classifications. Our experimental study analyzed 57 slice sets from 13 mice, and yielded a 98.2% separation accuracy between healthy and low grade fibrotic subjects, and an overall accuracy of 84.2% for fibrosis grading. These results are better than the existing image-based methods which can only discriminate between healthy and high grade fibrosis subjects. With appropriate extensions, our method may be used for non-invasive classification and progression monitoring of liver fibrosis in human patients instead of more invasive approaches, such as biopsy or contrast-enhanced imaging.
Multi-class Mode of Action Classification of Toxic Compounds Using Logic Based Kernel Methods.
Lodhi, Huma; Muggleton, Stephen; Sternberg, Mike J E
2010-09-17
Toxicity prediction is essential for drug design and development of effective therapeutics. In this paper we present an in silico strategy, to identify the mode of action of toxic compounds, that is based on the use of a novel logic based kernel method. The technique uses support vector machines in conjunction with the kernels constructed from first order rules induced by an Inductive Logic Programming system. It constructs multi-class models by using a divide and conquer reduction strategy that splits multi-classes into binary groups and solves each individual problem recursively hence generating an underlying decision list structure. In order to evaluate the effectiveness of the approach for chemoinformatics problems like predictive toxicology, we apply it to toxicity classification in aquatic systems. The method is used to identify and classify 442 compounds with respect to the mode of action. The experimental results show that the technique successfully classifies toxic compounds and can be useful in assessing environmental risks. Experimental comparison of the performance of the proposed multi-class scheme with the standard multi-class Inductive Logic Programming algorithm and multi-class Support Vector Machine yields statistically significant results and demonstrates the potential power and benefits of the approach in identifying compounds of various toxic mechanisms. Copyright © 2010 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
NASA Astrophysics Data System (ADS)
Dronova, I.; Gong, P.; Wang, L.; Clinton, N.; Fu, W.; Qi, S.
2011-12-01
Remote sensing-based vegetation classifications representing plant function such as photosynthesis and productivity are challenging in wetlands with complex cover and difficult field access. Recent advances in object-based image analysis (OBIA) and machine-learning algorithms offer new classification tools; however, few comparisons of different algorithms and spatial scales have been discussed to date. We applied OBIA to delineate wetland plant functional types (PFTs) for Poyang Lake, the largest freshwater lake in China and Ramsar wetland conservation site, from 30-m Landsat TM scene at the peak of spring growing season. We targeted major PFTs (C3 grasses, C3 forbs and different types of C4 grasses and aquatic vegetation) that are both key players in system's biogeochemical cycles and critical providers of waterbird habitat. Classification results were compared among: a) several object segmentation scales (with average object sizes 900-9000 m2); b) several families of statistical classifiers (including Bayesian, Logistic, Neural Network, Decision Trees and Support Vector Machines) and c) two hierarchical levels of vegetation classification, a generalized 3-class set and more detailed 6-class set. We found that classification benefited from object-based approach which allowed including object shape, texture and context descriptors in classification. While a number of classifiers achieved high accuracy at the finest pixel-equivalent segmentation scale, the highest accuracies and best agreement among algorithms occurred at coarser object scales. No single classifier was consistently superior across all scales, although selected algorithms of Neural Network, Logistic and K-Nearest Neighbors families frequently provided the best discrimination of classes at different scales. The choice of vegetation categories also affected classification accuracy. The 6-class set allowed for higher individual class accuracies but lower overall accuracies than the 3-class set because individual classes differed in scales at which they were best discriminated from others. Main classification challenges included a) presence of C3 grasses in C4-grass areas, particularly following harvesting of C4 reeds and b) mixtures of emergent, floating and submerged aquatic plants at sub-object and sub-pixel scales. We conclude that OBIA with advanced statistical classifiers offers useful instruments for landscape vegetation analyses, and that spatial scale considerations are critical in mapping PFTs, while multi-scale comparisons can be used to guide class selection. Future work will further apply fuzzy classification and field-collected spectral data for PFT analysis and compare results with MODIS PFT products.
Young Kim, Eun; Johnson, Hans J
2013-01-01
A robust multi-modal tool, for automated registration, bias correction, and tissue classification, has been implemented for large-scale heterogeneous multi-site longitudinal MR data analysis. This work focused on improving the an iterative optimization framework between bias-correction, registration, and tissue classification inspired from previous work. The primary contributions are robustness improvements from incorporation of following four elements: (1) utilize multi-modal and repeated scans, (2) incorporate high-deformable registration, (3) use extended set of tissue definitions, and (4) use of multi-modal aware intensity-context priors. The benefits of these enhancements were investigated by a series of experiments with both simulated brain data set (BrainWeb) and by applying to highly-heterogeneous data from a 32 site imaging study with quality assessments through the expert visual inspection. The implementation of this tool is tailored for, but not limited to, large-scale data processing with great data variation with a flexible interface. In this paper, we describe enhancements to a joint registration, bias correction, and the tissue classification, that improve the generalizability and robustness for processing multi-modal longitudinal MR scans collected at multi-sites. The tool was evaluated by using both simulated and simulated and human subject MRI images. With these enhancements, the results showed improved robustness for large-scale heterogeneous MRI processing.
Inter-class sparsity based discriminative least square regression.
Wen, Jie; Xu, Yong; Li, Zuoyong; Ma, Zhongli; Xu, Yuanrong
2018-06-01
Least square regression is a very popular supervised classification method. However, two main issues greatly limit its performance. The first one is that it only focuses on fitting the input features to the corresponding output labels while ignoring the correlations among samples. The second one is that the used label matrix, i.e., zero-one label matrix is inappropriate for classification. To solve these problems and improve the performance, this paper presents a novel method, i.e., inter-class sparsity based discriminative least square regression (ICS_DLSR), for multi-class classification. Different from other methods, the proposed method pursues that the transformed samples have a common sparsity structure in each class. For this goal, an inter-class sparsity constraint is introduced to the least square regression model such that the margins of samples from the same class can be greatly reduced while those of samples from different classes can be enlarged. In addition, an error term with row-sparsity constraint is introduced to relax the strict zero-one label matrix, which allows the method to be more flexible in learning the discriminative transformation matrix. These factors encourage the method to learn a more compact and discriminative transformation for regression and thus has the potential to perform better than other methods. Extensive experimental results show that the proposed method achieves the best performance in comparison with other methods for multi-class classification. Copyright © 2018 Elsevier Ltd. All rights reserved.
Fernández, Alberto; Carmona, Cristobal José; José Del Jesus, María; Herrera, Francisco
2017-09-01
Imbalanced classification is related to those problems that have an uneven distribution among classes. In addition to the former, when instances are located into the overlapped areas, the correct modeling of the problem becomes harder. Current solutions for both issues are often focused on the binary case study, as multi-class datasets require an additional effort to be addressed. In this research, we overcome these problems by carrying out a combination between feature and instance selections. Feature selection will allow simplifying the overlapping areas easing the generation of rules to distinguish among the classes. Selection of instances from all classes will address the imbalance itself by finding the most appropriate class distribution for the learning task, as well as possibly removing noise and difficult borderline examples. For the sake of obtaining an optimal joint set of features and instances, we embedded the searching for both parameters in a Multi-Objective Evolutionary Algorithm, using the C4.5 decision tree as baseline classifier in this wrapper approach. The multi-objective scheme allows taking a double advantage: the search space becomes broader, and we may provide a set of different solutions in order to build an ensemble of classifiers. This proposal has been contrasted versus several state-of-the-art solutions on imbalanced classification showing excellent results in both binary and multi-class problems.
Border Lakes land-cover classification
Marvin Bauer; Brian Loeffelholz; Doug Shinneman
2009-01-01
This document contains metadata and description of land-cover classification of approximately 5.1 million acres of land bordering Minnesota, U.S.A. and Ontario, Canada. The classification focused on the separation and identification of specific forest-cover types. Some separation of the nonforest classes also was performed. The classification was derived from multi-...
Multivariate decoding of brain images using ordinal regression.
Doyle, O M; Ashburner, J; Zelaya, F O; Williams, S C R; Mehta, M A; Marquand, A F
2013-11-01
Neuroimaging data are increasingly being used to predict potential outcomes or groupings, such as clinical severity, drug dose response, and transitional illness states. In these examples, the variable (target) we want to predict is ordinal in nature. Conventional classification schemes assume that the targets are nominal and hence ignore their ranked nature, whereas parametric and/or non-parametric regression models enforce a metric notion of distance between classes. Here, we propose a novel, alternative multivariate approach that overcomes these limitations - whole brain probabilistic ordinal regression using a Gaussian process framework. We applied this technique to two data sets of pharmacological neuroimaging data from healthy volunteers. The first study was designed to investigate the effect of ketamine on brain activity and its subsequent modulation with two compounds - lamotrigine and risperidone. The second study investigates the effect of scopolamine on cerebral blood flow and its modulation using donepezil. We compared ordinal regression to multi-class classification schemes and metric regression. Considering the modulation of ketamine with lamotrigine, we found that ordinal regression significantly outperformed multi-class classification and metric regression in terms of accuracy and mean absolute error. However, for risperidone ordinal regression significantly outperformed metric regression but performed similarly to multi-class classification both in terms of accuracy and mean absolute error. For the scopolamine data set, ordinal regression was found to outperform both multi-class and metric regression techniques considering the regional cerebral blood flow in the anterior cingulate cortex. Ordinal regression was thus the only method that performed well in all cases. Our results indicate the potential of an ordinal regression approach for neuroimaging data while providing a fully probabilistic framework with elegant approaches for model selection. Copyright © 2013. Published by Elsevier Inc.
Li, Juntao; Wang, Yanyan; Jiang, Tao; Xiao, Huimin; Song, Xuekun
2018-05-09
Diagnosing acute leukemia is the necessary prerequisite to treating it. Multi-classification on the gene expression data of acute leukemia is help for diagnosing it which contains B-cell acute lymphoblastic leukemia (BALL), T-cell acute lymphoblastic leukemia (TALL) and acute myeloid leukemia (AML). However, selecting cancer-causing genes is a challenging problem in performing multi-classification. In this paper, weighted gene co-expression networks are employed to divide the genes into groups. Based on the dividing groups, a new regularized multinomial regression with overlapping group lasso penalty (MROGL) has been presented to simultaneously perform multi-classification and select gene groups. By implementing this method on three-class acute leukemia data, the grouped genes which work synergistically are identified, and the overlapped genes shared by different groups are also highlighted. Moreover, MROGL outperforms other five methods on multi-classification accuracy. Copyright © 2017. Published by Elsevier B.V.
Gates to Gregg High Voltage Transmission Line Study. [California
NASA Technical Reports Server (NTRS)
Bergis, V.; Maw, K.; Newland, W.; Sinnott, D.; Thornbury, G.; Easterwood, P.; Bonderud, J.
1982-01-01
The usefulness of LANDSAT data in the planning of transmission line routes was assessed. LANDSAT digital data and image processing techniques, specifically a multi-date supervised classification aproach, were used to develop a land cover map for an agricultural area near Fresno, California. Twenty-six land cover classes were identified, of which twenty classes were agricultural crops. High classification accuracies (greater than 80%) were attained for several classes, including cotton, grain, and vineyards. The primary products generated were 1:24,000, 1:100,000 and 1:250,000 scale maps of the classification and acreage summaries for all land cover classes within four alternate transmission line routes.
The Classification of Romanian High-Schools
ERIC Educational Resources Information Center
Ivan, Ion; Milodin, Daniel; Naie, Lucian
2006-01-01
The article tries to tackle the issue of high-schools classification from one city, district or from Romania. The classification criteria are presented. The National Database of Education is also presented and the application of criteria is illustrated. An algorithm for high-school multi-rang classification is proposed in order to build classes of…
NASA Astrophysics Data System (ADS)
Chang Chien, Kuang-Che; Fetita, Catalin; Brillet, Pierre-Yves; Prêteux, Françoise; Chang, Ruey-Feng
2009-02-01
Multi-detector computed tomography (MDCT) has high accuracy and specificity on volumetrically capturing serial images of the lung. It increases the capability of computerized classification for lung tissue in medical research. This paper proposes a three-dimensional (3D) automated approach based on mathematical morphology and fuzzy logic for quantifying and classifying interstitial lung diseases (ILDs) and emphysema. The proposed methodology is composed of several stages: (1) an image multi-resolution decomposition scheme based on a 3D morphological filter is used to detect and analyze the different density patterns of the lung texture. Then, (2) for each pattern in the multi-resolution decomposition, six features are computed, for which fuzzy membership functions define a probability of association with a pathology class. Finally, (3) for each pathology class, the probabilities are combined up according to the weight assigned to each membership function and two threshold values are used to decide the final class of the pattern. The proposed approach was tested on 10 MDCT cases and the classification accuracy was: emphysema: 95%, fibrosis/honeycombing: 84% and ground glass: 97%.
Semantic classification of business images
NASA Astrophysics Data System (ADS)
Erol, Berna; Hull, Jonathan J.
2006-01-01
Digital cameras are becoming increasingly common for capturing information in business settings. In this paper, we describe a novel method for classifying images into the following semantic classes: document, whiteboard, business card, slide, and regular images. Our method is based on combining low-level image features, such as text color, layout, and handwriting features with high-level OCR output analysis. Several Support Vector Machine Classifiers are combined for multi-class classification of input images. The system yields 95% accuracy in classification.
2012-01-01
Background Automated classification of histopathology involves identification of multiple classes, including benign, cancerous, and confounder categories. The confounder tissue classes can often mimic and share attributes with both the diseased and normal tissue classes, and can be particularly difficult to identify, both manually and by automated classifiers. In the case of prostate cancer, they may be several confounding tissue types present in a biopsy sample, posing as major sources of diagnostic error for pathologists. Two common multi-class approaches are one-shot classification (OSC), where all classes are identified simultaneously, and one-versus-all (OVA), where a “target” class is distinguished from all “non-target” classes. OSC is typically unable to handle discrimination of classes of varying similarity (e.g. with images of prostate atrophy and high grade cancer), while OVA forces several heterogeneous classes into a single “non-target” class. In this work, we present a cascaded (CAS) approach to classifying prostate biopsy tissue samples, where images from different classes are grouped to maximize intra-group homogeneity while maximizing inter-group heterogeneity. Results We apply the CAS approach to categorize 2000 tissue samples taken from 214 patient studies into seven classes: epithelium, stroma, atrophy, prostatic intraepithelial neoplasia (PIN), and prostate cancer Gleason grades 3, 4, and 5. A series of increasingly granular binary classifiers are used to split the different tissue classes until the images have been categorized into a single unique class. Our automatically-extracted image feature set includes architectural features based on location of the nuclei within the tissue sample as well as texture features extracted on a per-pixel level. The CAS strategy yields a positive predictive value (PPV) of 0.86 in classifying the 2000 tissue images into one of 7 classes, compared with the OVA (0.77 PPV) and OSC approaches (0.76 PPV). Conclusions Use of the CAS strategy increases the PPV for a multi-category classification system over two common alternative strategies. In classification problems such as histopathology, where multiple class groups exist with varying degrees of heterogeneity, the CAS system can intelligently assign class labels to objects by performing multiple binary classifications according to domain knowledge. PMID:23110677
NASA Astrophysics Data System (ADS)
Pradhan, Biswajeet; Kabiri, Keivan
2012-07-01
This paper describes an assessment of coral reef mapping using multi sensor satellite images such as Landsat ETM, SPOT and IKONOS images for Tioman Island, Malaysia. The study area is known to be one of the best Islands in South East Asia for its unique collection of diversified coral reefs and serves host to thousands of tourists every year. For the coral reef identification, classification and analysis, Landsat ETM, SPOT and IKONOS images were collected processed and classified using hierarchical classification schemes. At first, Decision tree classification method was implemented to separate three main land cover classes i.e. water, rural and vegetation and then maximum likelihood supervised classification method was used to classify these main classes. The accuracy of the classification result is evaluated by a separated test sample set, which is selected based on the fieldwork survey and view interpretation from IKONOS image. Few types of ancillary data in used are: (a) DGPS ground control points; (b) Water quality parameters measured by Hydrolab DS4a; (c) Sea-bed substrates spectrum measured by Unispec and; (d) Landcover observation photos along Tioman island coastal area. The overall accuracy of the final classification result obtained was 92.25% with the kappa coefficient is 0.8940. Key words: Coral reef, Multi-spectral Segmentation, Pixel-Based Classification, Decision Tree, Tioman Island
NASA Astrophysics Data System (ADS)
Zhong, Yanfei; Han, Xiaobing; Zhang, Liangpei
2018-04-01
Multi-class geospatial object detection from high spatial resolution (HSR) remote sensing imagery is attracting increasing attention in a wide range of object-related civil and engineering applications. However, the distribution of objects in HSR remote sensing imagery is location-variable and complicated, and how to accurately detect the objects in HSR remote sensing imagery is a critical problem. Due to the powerful feature extraction and representation capability of deep learning, the deep learning based region proposal generation and object detection integrated framework has greatly promoted the performance of multi-class geospatial object detection for HSR remote sensing imagery. However, due to the translation caused by the convolution operation in the convolutional neural network (CNN), although the performance of the classification stage is seldom influenced, the localization accuracies of the predicted bounding boxes in the detection stage are easily influenced. The dilemma between translation-invariance in the classification stage and translation-variance in the object detection stage has not been addressed for HSR remote sensing imagery, and causes position accuracy problems for multi-class geospatial object detection with region proposal generation and object detection. In order to further improve the performance of the region proposal generation and object detection integrated framework for HSR remote sensing imagery object detection, a position-sensitive balancing (PSB) framework is proposed in this paper for multi-class geospatial object detection from HSR remote sensing imagery. The proposed PSB framework takes full advantage of the fully convolutional network (FCN), on the basis of a residual network, and adopts the PSB framework to solve the dilemma between translation-invariance in the classification stage and translation-variance in the object detection stage. In addition, a pre-training mechanism is utilized to accelerate the training procedure and increase the robustness of the proposed algorithm. The proposed algorithm is validated with a publicly available 10-class object detection dataset.
Thomaz, Ricardo de Lima; Carneiro, Pedro Cunha; Bonin, João Eliton; Macedo, Túlio Augusto Alves; Patrocinio, Ana Claudia; Soares, Alcimar Barbosa
2018-05-01
Detection of early hepatocellular carcinoma (HCC) is responsible for increasing survival rates in up to 40%. One-class classifiers can be used for modeling early HCC in multidetector computed tomography (MDCT), but demand the specific knowledge pertaining to the set of features that best describes the target class. Although the literature outlines several features for characterizing liver lesions, it is unclear which is most relevant for describing early HCC. In this paper, we introduce an unconstrained GA feature selection algorithm based on a multi-objective Mahalanobis fitness function to improve the classification performance for early HCC. We compared our approach to a constrained Mahalanobis function and two other unconstrained functions using Welch's t-test and Gaussian Data Descriptors. The performance of each fitness function was evaluated by cross-validating a one-class SVM. The results show that the proposed multi-objective Mahalanobis fitness function is capable of significantly reducing data dimensionality (96.4%) and improving one-class classification of early HCC (0.84 AUC). Furthermore, the results provide strong evidence that intensity features extracted at the arterial to portal and arterial to equilibrium phases are important for classifying early HCC.
Automatic threshold selection for multi-class open set recognition
NASA Astrophysics Data System (ADS)
Scherreik, Matthew; Rigling, Brian
2017-05-01
Multi-class open set recognition is the problem of supervised classification with additional unknown classes encountered after a model has been trained. An open set classifer often has two core components. The first component is a base classifier which estimates the most likely class of a given example. The second component consists of open set logic which estimates if the example is truly a member of the candidate class. Such a system is operated in a feed-forward fashion. That is, a candidate label is first estimated by the base classifier, and the true membership of the example to the candidate class is estimated afterward. Previous works have developed an iterative threshold selection algorithm for rejecting examples from classes which were not present at training time. In those studies, a Platt-calibrated SVM was used as the base classifier, and the thresholds were applied to class posterior probabilities for rejection. In this work, we investigate the effectiveness of other base classifiers when paired with the threshold selection algorithm and compare their performance with the original SVM solution.
NASA Astrophysics Data System (ADS)
Leena, N.; Saju, K. K.
2018-04-01
Nutritional deficiencies in plants are a major concern for farmers as it affects productivity and thus profit. The work aims to classify nutritional deficiencies in maize plant in a non-destructive mannerusing image processing and machine learning techniques. The colored images of the leaves are analyzed and classified with multi-class support vector machine (SVM) method. Several images of maize leaves with known deficiencies like nitrogen, phosphorous and potassium (NPK) are used to train the SVM classifier prior to the classification of test images. The results show that the method was able to classify and identify nutritional deficiencies.
FEELnc: a tool for long non-coding RNA annotation and its application to the dog transcriptome.
Wucher, Valentin; Legeai, Fabrice; Hédan, Benoît; Rizk, Guillaume; Lagoutte, Lætitia; Leeb, Tosso; Jagannathan, Vidhya; Cadieu, Edouard; David, Audrey; Lohi, Hannes; Cirera, Susanna; Fredholm, Merete; Botherel, Nadine; Leegwater, Peter A J; Le Béguec, Céline; Fieten, Hille; Johnson, Jeremy; Alföldi, Jessica; André, Catherine; Lindblad-Toh, Kerstin; Hitte, Christophe; Derrien, Thomas
2017-05-05
Whole transcriptome sequencing (RNA-seq) has become a standard for cataloguing and monitoring RNA populations. One of the main bottlenecks, however, is to correctly identify the different classes of RNAs among the plethora of reconstructed transcripts, particularly those that will be translated (mRNAs) from the class of long non-coding RNAs (lncRNAs). Here, we present FEELnc (FlExible Extraction of LncRNAs), an alignment-free program that accurately annotates lncRNAs based on a Random Forest model trained with general features such as multi k-mer frequencies and relaxed open reading frames. Benchmarking versus five state-of-the-art tools shows that FEELnc achieves similar or better classification performance on GENCODE and NONCODE data sets. The program also provides specific modules that enable the user to fine-tune classification accuracy, to formalize the annotation of lncRNA classes and to identify lncRNAs even in the absence of a training set of non-coding RNAs. We used FEELnc on a real data set comprising 20 canine RNA-seq samples produced by the European LUPA consortium to substantially expand the canine genome annotation to include 10 374 novel lncRNAs and 58 640 mRNA transcripts. FEELnc moves beyond conventional coding potential classifiers by providing a standardized and complete solution for annotating lncRNAs and is freely available at https://github.com/tderrien/FEELnc. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Slabbinck, Bram; Waegeman, Willem; Dawyndt, Peter; De Vos, Paul; De Baets, Bernard
2010-01-30
Machine learning techniques have shown to improve bacterial species classification based on fatty acid methyl ester (FAME) data. Nonetheless, FAME analysis has a limited resolution for discrimination of bacteria at the species level. In this paper, we approach the species classification problem from a taxonomic point of view. Such a taxonomy or tree is typically obtained by applying clustering algorithms on FAME data or on 16S rRNA gene data. The knowledge gained from the tree can then be used to evaluate FAME-based classifiers, resulting in a novel framework for bacterial species classification. In view of learning in a taxonomic framework, we consider two types of trees. First, a FAME tree is constructed with a supervised divisive clustering algorithm. Subsequently, based on 16S rRNA gene sequence analysis, phylogenetic trees are inferred by the NJ and UPGMA methods. In this second approach, the species classification problem is based on the combination of two different types of data. Herein, 16S rRNA gene sequence data is used for phylogenetic tree inference and the corresponding binary tree splits are learned based on FAME data. We call this learning approach 'phylogenetic learning'. Supervised Random Forest models are developed to train the classification tasks in a stratified cross-validation setting. In this way, better classification results are obtained for species that are typically hard to distinguish by a single or flat multi-class classification model. FAME-based bacterial species classification is successfully evaluated in a taxonomic framework. Although the proposed approach does not improve the overall accuracy compared to flat multi-class classification, it has some distinct advantages. First, it has better capabilities for distinguishing species on which flat multi-class classification fails. Secondly, the hierarchical classification structure allows to easily evaluate and visualize the resolution of FAME data for the discrimination of bacterial species. Summarized, by phylogenetic learning we are able to situate and evaluate FAME-based bacterial species classification in a more informative context.
2010-01-01
Background Machine learning techniques have shown to improve bacterial species classification based on fatty acid methyl ester (FAME) data. Nonetheless, FAME analysis has a limited resolution for discrimination of bacteria at the species level. In this paper, we approach the species classification problem from a taxonomic point of view. Such a taxonomy or tree is typically obtained by applying clustering algorithms on FAME data or on 16S rRNA gene data. The knowledge gained from the tree can then be used to evaluate FAME-based classifiers, resulting in a novel framework for bacterial species classification. Results In view of learning in a taxonomic framework, we consider two types of trees. First, a FAME tree is constructed with a supervised divisive clustering algorithm. Subsequently, based on 16S rRNA gene sequence analysis, phylogenetic trees are inferred by the NJ and UPGMA methods. In this second approach, the species classification problem is based on the combination of two different types of data. Herein, 16S rRNA gene sequence data is used for phylogenetic tree inference and the corresponding binary tree splits are learned based on FAME data. We call this learning approach 'phylogenetic learning'. Supervised Random Forest models are developed to train the classification tasks in a stratified cross-validation setting. In this way, better classification results are obtained for species that are typically hard to distinguish by a single or flat multi-class classification model. Conclusions FAME-based bacterial species classification is successfully evaluated in a taxonomic framework. Although the proposed approach does not improve the overall accuracy compared to flat multi-class classification, it has some distinct advantages. First, it has better capabilities for distinguishing species on which flat multi-class classification fails. Secondly, the hierarchical classification structure allows to easily evaluate and visualize the resolution of FAME data for the discrimination of bacterial species. Summarized, by phylogenetic learning we are able to situate and evaluate FAME-based bacterial species classification in a more informative context. PMID:20113515
Zhang, Jie; Wu, Xiaohong; Yu, Yanmei; Luo, Daisheng
2013-01-01
In optical printed Chinese character recognition (OPCCR), many classifiers have been proposed for the recognition. Among the classifiers, support vector machine (SVM) might be the best classifier. However, SVM is a classifier for two classes. When it is used for multi-classes in OPCCR, its computation is time-consuming. Thus, we propose a neighbor classes based SVM (NC-SVM) to reduce the computation consumption of SVM. Experiments of NC-SVM classification for OPCCR have been done. The results of the experiments have shown that the NC-SVM we proposed can effectively reduce the computation time in OPCCR.
Integrating multisource imagery and GIS analysis for mapping Bermuda`s benthic habitats
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vierros, M.K.
1997-06-01
Bermuda is a group of isolated oceanic situated in the northwest Atlantic Ocean and surrounded by the Sargasso Sea. Bermuda possesses the northernmost coral reefs and mangroves in the Atlantic Ocean, and because of its high population density, both the terrestrial and marine environments are under intense human pressure. Although a long record of scientific research exists, this study is the first attempt to comprehensively map the area`s benthic habitats, despite the need for such a map for resource assessment and management purposes. Multi-source and multi-date imagery were used for producing the habitat map due to lack of a completemore » up-to-date image. Classifications were performed with SPOT data, and the results verified from recent aerial photography and current aerial video, along with extensive ground truthing. Stratification of the image into regions prior to classification reduced the confusing effects of varying water depth. Classification accuracy in shallow areas was increased by derivation of a texture pseudo-channel, while bathymetry was used as a classification tool in deeper areas, where local patterns of zonation were well known. Because of seasonal variation in extent of seagrasses, a classification scheme based on density could not be used. Instead, a set of classes based on the seagrass area`s exposure to the open ocean were developed. The resulting habitat map is currently being assessed for accuracy with promising preliminary results, indicating its usefulness as a basis for future resource assessment studies.« less
NASA Technical Reports Server (NTRS)
Kumar, Uttam; Nemani, Ramakrishna R.; Ganguly, Sangram; Kalia, Subodh; Michaelis, Andrew
2017-01-01
In this work, we use a Fully Constrained Least Squares Subpixel Learning Algorithm to unmix global WELD (Web Enabled Landsat Data) to obtain fractions or abundances of substrate (S), vegetation (V) and dark objects (D) classes. Because of the sheer nature of data and compute needs, we leveraged the NASA Earth Exchange (NEX) high performance computing architecture to optimize and scale our algorithm for large-scale processing. Subsequently, the S-V-D abundance maps were characterized into 4 classes namely, forest, farmland, water and urban areas (with NPP-VIIRS-national polar orbiting partnership visible infrared imaging radiometer suite nighttime lights data) over California, USA using Random Forest classifier. Validation of these land cover maps with NLCD (National Land Cover Database) 2011 products and NAFD (North American Forest Dynamics) static forest cover maps showed that an overall classification accuracy of over 91 percent was achieved, which is a 6 percent improvement in unmixing based classification relative to per-pixel-based classification. As such, abundance maps continue to offer an useful alternative to high-spatial resolution data derived classification maps for forest inventory analysis, multi-class mapping for eco-climatic models and applications, fast multi-temporal trend analysis and for societal and policy-relevant applications needed at the watershed scale.
NASA Astrophysics Data System (ADS)
Ganguly, S.; Kumar, U.; Nemani, R. R.; Kalia, S.; Michaelis, A.
2017-12-01
In this work, we use a Fully Constrained Least Squares Subpixel Learning Algorithm to unmix global WELD (Web Enabled Landsat Data) to obtain fractions or abundances of substrate (S), vegetation (V) and dark objects (D) classes. Because of the sheer nature of data and compute needs, we leveraged the NASA Earth Exchange (NEX) high performance computing architecture to optimize and scale our algorithm for large-scale processing. Subsequently, the S-V-D abundance maps were characterized into 4 classes namely, forest, farmland, water and urban areas (with NPP-VIIRS - national polar orbiting partnership visible infrared imaging radiometer suite nighttime lights data) over California, USA using Random Forest classifier. Validation of these land cover maps with NLCD (National Land Cover Database) 2011 products and NAFD (North American Forest Dynamics) static forest cover maps showed that an overall classification accuracy of over 91% was achieved, which is a 6% improvement in unmixing based classification relative to per-pixel based classification. As such, abundance maps continue to offer an useful alternative to high-spatial resolution data derived classification maps for forest inventory analysis, multi-class mapping for eco-climatic models and applications, fast multi-temporal trend analysis and for societal and policy-relevant applications needed at the watershed scale.
2013-05-28
those of the support vector machine and relevance vector machine, and the model runs more quickly than the other algorithms . When one class occurs...incremental support vector machine algorithm for online learning when fewer than 50 data points are available. (a) Papers published in peer-reviewed journals...learning environments, where data processing occurs one observation at a time and the classification algorithm improves over time with new
Design of partially supervised classifiers for multispectral image data
NASA Technical Reports Server (NTRS)
Jeon, Byeungwoo; Landgrebe, David
1993-01-01
A partially supervised classification problem is addressed, especially when the class definition and corresponding training samples are provided a priori only for just one particular class. In practical applications of pattern classification techniques, a frequently observed characteristic is the heavy, often nearly impossible requirements on representative prior statistical class characteristics of all classes in a given data set. Considering the effort in both time and man-power required to have a well-defined, exhaustive list of classes with a corresponding representative set of training samples, this 'partially' supervised capability would be very desirable, assuming adequate classifier performance can be obtained. Two different classification algorithms are developed to achieve simplicity in classifier design by reducing the requirement of prior statistical information without sacrificing significant classifying capability. The first one is based on optimal significance testing, where the optimal acceptance probability is estimated directly from the data set. In the second approach, the partially supervised classification is considered as a problem of unsupervised clustering with initially one known cluster or class. A weighted unsupervised clustering procedure is developed to automatically define other classes and estimate their class statistics. The operational simplicity thus realized should make these partially supervised classification schemes very viable tools in pattern classification.
Brain tumor classification and segmentation using sparse coding and dictionary learning.
Salman Al-Shaikhli, Saif Dawood; Yang, Michael Ying; Rosenhahn, Bodo
2016-08-01
This paper presents a novel fully automatic framework for multi-class brain tumor classification and segmentation using a sparse coding and dictionary learning method. The proposed framework consists of two steps: classification and segmentation. The classification of the brain tumors is based on brain topology and texture. The segmentation is based on voxel values of the image data. Using K-SVD, two types of dictionaries are learned from the training data and their associated ground truth segmentation: feature dictionary and voxel-wise coupled dictionaries. The feature dictionary consists of global image features (topological and texture features). The coupled dictionaries consist of coupled information: gray scale voxel values of the training image data and their associated label voxel values of the ground truth segmentation of the training data. For quantitative evaluation, the proposed framework is evaluated using different metrics. The segmentation results of the brain tumor segmentation (MICCAI-BraTS-2013) database are evaluated using five different metric scores, which are computed using the online evaluation tool provided by the BraTS-2013 challenge organizers. Experimental results demonstrate that the proposed approach achieves an accurate brain tumor classification and segmentation and outperforms the state-of-the-art methods.
Seismic Data Analysis throught Multi-Class Classification.
NASA Astrophysics Data System (ADS)
Anderson, P.; Kappedal, R. D.; Magana-Zook, S. A.
2017-12-01
In this research, we conducted twenty experiments of varying time and frequency bands on 5000seismic signals with the intent of finding a method to classify signals as either an explosion or anearthquake in an automated fashion. We used a multi-class approach by clustering of the data throughvarious techniques. Dimensional reduction was examined through the use of wavelet transforms withthe use of the coiflet mother wavelet and various coefficients to explore possible computational time vsaccuracy dependencies. Three and four classes were generated from the clustering techniques andexamined with the three class approach producing the most accurate and realistic results.
Zhang, Jie; Wu, Xiaohong; Yu, Yanmei; Luo, Daisheng
2013-01-01
In optical printed Chinese character recognition (OPCCR), many classifiers have been proposed for the recognition. Among the classifiers, support vector machine (SVM) might be the best classifier. However, SVM is a classifier for two classes. When it is used for multi-classes in OPCCR, its computation is time-consuming. Thus, we propose a neighbor classes based SVM (NC-SVM) to reduce the computation consumption of SVM. Experiments of NC-SVM classification for OPCCR have been done. The results of the experiments have shown that the NC-SVM we proposed can effectively reduce the computation time in OPCCR. PMID:23536777
NASA Astrophysics Data System (ADS)
Dondurur, Mehmet
The primary objective of this study was to determine the degree to which modern SAR systems can be used to obtain information about the Earth's vegetative resources. Information obtainable from microwave synthetic aperture radar (SAR) data was compared with that obtainable from LANDSAT-TM and SPOT data. Three hypotheses were tested: (a) Classification of land cover/use from SAR data can be accomplished on a pixel-by-pixel basis with the same overall accuracy as from LANDSAT-TM and SPOT data. (b) Classification accuracy for individual land cover/use classes will differ between sensors. (c) Combining information derived from optical and SAR data into an integrated monitoring system will improve overall and individual land cover/use class accuracies. The study was conducted with three data sets for the Sleeping Bear Dunes test site in the northwestern part of Michigan's lower peninsula, including an October 1982 LANDSAT-TM scene, a June 1989 SPOT scene and C-, L- and P-Band radar data from the Jet Propulsion Laboratory AIRSAR. Reference data were derived from the Michigan Resource Information System (MIRIS) and available color infrared aerial photos. Classification and rectification of data sets were done using ERDAS Image Processing Programs. Classification algorithms included Maximum Likelihood, Mahalanobis Distance, Minimum Spectral Distance, ISODATA, Parallelepiped, and Sequential Cluster Analysis. Classified images were rectified as necessary so that all were at the same scale and oriented north-up. Results were analyzed with contingency tables and percent correctly classified (PCC) and Cohen's Kappa (CK) as accuracy indices using CSLANT and ImagePro programs developed for this study. Accuracy analyses were based upon a 1.4 by 6.5 km area with its long axis east-west. Reference data for this subscene total 55,770 15 by 15 m pixels with sixteen cover types, including seven level III forest classes, three level III urban classes, two level II range classes, two water classes, one wetland class and one agriculture class. An initial analysis was made without correcting the 1978 MIRIS reference data to the different dates of the TM, SPOT and SAR data sets. In this analysis, highest overall classification accuracy (PCC) was 87% with the TM data set, with both SPOT and C-Band SAR at 85%, a difference statistically significant at the 0.05 level. When the reference data were corrected for land cover change between 1978 and 1991, classification accuracy with the C-Band SAR data increased to 87%. Classification accuracy differed from sensor to sensor for individual land cover classes, Combining sensors into hypothetical multi-sensor systems resulted in higher accuracies than for any single sensor. Combining LANDSAT -TM and C-Band SAR yielded an overall classification accuracy (PCC) of 92%. The results of this study indicate that C-Band SAR data provide an acceptable substitute for LANDSAT-TM or SPOT data when land cover information is desired of areas where cloud cover obscures the terrain. Even better results can be obtained by integrating TM and C-Band SAR data into a multi-sensor system.
2002-01-01
their expression profile and for classification of cells into tumerous and non- tumerous classes. Then we will present a parallel tree method for... cancerous cells. We will use the same dataset and use tree structured classifiers with multi-resolution analysis for classifying cancerous from non- cancerous ...cells. We have the expressions of 4096 genes from 98 different cell types. Of these 98, 72 are cancerous while 26 are non- cancerous . We are interested
A Multi-modal, Discriminative and Spatially Invariant CNN for RGB-D Object Labeling.
Asif, Umar; Bennamoun, Mohammed; Sohel, Ferdous
2017-08-30
While deep convolutional neural networks have shown a remarkable success in image classification, the problems of inter-class similarities, intra-class variances, the effective combination of multimodal data, and the spatial variability in images of objects remain to be major challenges. To address these problems, this paper proposes a novel framework to learn a discriminative and spatially invariant classification model for object and indoor scene recognition using multimodal RGB-D imagery. This is achieved through three postulates: 1) spatial invariance - this is achieved by combining a spatial transformer network with a deep convolutional neural network to learn features which are invariant to spatial translations, rotations, and scale changes, 2) high discriminative capability - this is achieved by introducing Fisher encoding within the CNN architecture to learn features which have small inter-class similarities and large intra-class compactness, and 3) multimodal hierarchical fusion - this is achieved through the regularization of semantic segmentation to a multi-modal CNN architecture, where class probabilities are estimated at different hierarchical levels (i.e., imageand pixel-levels), and fused into a Conditional Random Field (CRF)- based inference hypothesis, the optimization of which produces consistent class labels in RGB-D images. Extensive experimental evaluations on RGB-D object and scene datasets, and live video streams (acquired from Kinect) show that our framework produces superior object and scene classification results compared to the state-of-the-art methods.
Camera-Model Identification Using Markovian Transition Probability Matrix
NASA Astrophysics Data System (ADS)
Xu, Guanshuo; Gao, Shang; Shi, Yun Qing; Hu, Ruimin; Su, Wei
Detecting the (brands and) models of digital cameras from given digital images has become a popular research topic in the field of digital forensics. As most of images are JPEG compressed before they are output from cameras, we propose to use an effective image statistical model to characterize the difference JPEG 2-D arrays of Y and Cb components from the JPEG images taken by various camera models. Specifically, the transition probability matrices derived from four different directional Markov processes applied to the image difference JPEG 2-D arrays are used to identify statistical difference caused by image formation pipelines inside different camera models. All elements of the transition probability matrices, after a thresholding technique, are directly used as features for classification purpose. Multi-class support vector machines (SVM) are used as the classification tool. The effectiveness of our proposed statistical model is demonstrated by large-scale experimental results.
A Neural-Network-Based Semi-Automated Geospatial Classification Tool
NASA Astrophysics Data System (ADS)
Hale, R. G.; Herzfeld, U. C.
2014-12-01
North America's largest glacier system, the Bering Bagley Glacier System (BBGS) in Alaska, surged in 2011-2013, as shown by rapid mass transfer, elevation change, and heavy crevassing. Little is known about the physics controlling surge glaciers' semi-cyclic patterns; therefore, it is crucial to collect and analyze as much data as possible so that predictive models can be made. In addition, physical signs frozen in ice in the form of crevasses may help serve as a warning for future surges. The BBGS surge provided an opportunity to develop an automated classification tool for crevasse classification based on imagery collected from small aircraft. The classification allows one to link image classification to geophysical processes associated with ice deformation. The tool uses an approach that employs geostatistical functions and a feed-forward perceptron with error back-propagation. The connectionist-geostatistical approach uses directional experimental (discrete) variograms to parameterize images into a form that the Neural Network (NN) can recognize. In an application to preform analysis on airborne video graphic data from the surge of the BBGS, an NN was able to distinguish 18 different crevasse classes with 95 percent or higher accuracy, for over 3,000 images. Recognizing that each surge wave results in different crevasse types and that environmental conditions affect the appearance in imagery, we designed the tool's semi-automated pre-training algorithm to be adaptable. The tool can be optimized to specific settings and variables of image analysis: (airborne and satellite imagery, different camera types, observation altitude, number and types of classes, and resolution). The generalization of the classification tool brings three important advantages: (1) multiple types of problems in geophysics can be studied, (2) the training process is sufficiently formalized to allow non-experts in neural nets to perform the training process, and (3) the time required to manually pre-sort imagery into classes is greatly reduced.
Lajnef, Tarek; Chaibi, Sahbi; Ruby, Perrine; Aguera, Pierre-Emmanuel; Eichenlaub, Jean-Baptiste; Samet, Mounir; Kachouri, Abdennaceur; Jerbi, Karim
2015-07-30
Sleep staging is a critical step in a range of electrophysiological signal processing pipelines used in clinical routine as well as in sleep research. Although the results currently achievable with automatic sleep staging methods are promising, there is need for improvement, especially given the time-consuming and tedious nature of visual sleep scoring. Here we propose a sleep staging framework that consists of a multi-class support vector machine (SVM) classification based on a decision tree approach. The performance of the method was evaluated using polysomnographic data from 15 subjects (electroencephalogram (EEG), electrooculogram (EOG) and electromyogram (EMG) recordings). The decision tree, or dendrogram, was obtained using a hierarchical clustering technique and a wide range of time and frequency-domain features were extracted. Feature selection was carried out using forward sequential selection and classification was evaluated using k-fold cross-validation. The dendrogram-based SVM (DSVM) achieved mean specificity, sensitivity and overall accuracy of 0.92, 0.74 and 0.88 respectively, compared to expert visual scoring. Restricting DSVM classification to data where both experts' scoring was consistent (76.73% of the data) led to a mean specificity, sensitivity and overall accuracy of 0.94, 0.82 and 0.92 respectively. The DSVM framework outperforms classification with more standard multi-class "one-against-all" SVM and linear-discriminant analysis. The promising results of the proposed methodology suggest that it may be a valuable alternative to existing automatic methods and that it could accelerate visual scoring by providing a robust starting hypnogram that can be further fine-tuned by expert inspection. Copyright © 2015 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Cheng, Tao; Zhang, Jialong; Zheng, Xinyan; Yuan, Rujin
2018-03-01
The project of The First National Geographic Conditions Census developed by Chinese government has designed the data acquisition content and indexes, and has built corresponding classification system mainly based on the natural property of material. However, the unified standard for land cover classification system has not been formed; the production always needs converting to meet the actual needs. Therefore, it proposed a refined classification method based on multi source of remote sensing information fusion. It takes the third-level classes of forest land and grassland for example, and has collected the thematic data of Vegetation Map of China (1:1,000,000), attempts to develop refined classification utilizing raster spatial analysis model. Study area is selected, and refined classification is achieved by using the proposed method. The results show that land cover within study area is divided principally among 20 classes, from subtropical broad-leaved forest (31131) to grass-forb community type of low coverage grassland (41192); what's more, after 30 years in the study area, climatic factors, developmental rhythm characteristics and vegetation ecological geographical characteristics have not changed fundamentally, only part of the original vegetation types have changed in spatial distribution range or land cover types. Research shows that refined classification for the third-level classes of forest land and grassland could make the results take on both the natural attributes of the original and plant community ecology characteristics, which could meet the needs of some industry application, and has certain practical significance for promoting the product of The First National Geographic Conditions Census.
Large-scale optimization-based classification models in medicine and biology.
Lee, Eva K
2007-06-01
We present novel optimization-based classification models that are general purpose and suitable for developing predictive rules for large heterogeneous biological and medical data sets. Our predictive model simultaneously incorporates (1) the ability to classify any number of distinct groups; (2) the ability to incorporate heterogeneous types of attributes as input; (3) a high-dimensional data transformation that eliminates noise and errors in biological data; (4) the ability to incorporate constraints to limit the rate of misclassification, and a reserved-judgment region that provides a safeguard against over-training (which tends to lead to high misclassification rates from the resulting predictive rule); and (5) successive multi-stage classification capability to handle data points placed in the reserved-judgment region. To illustrate the power and flexibility of the classification model and solution engine, and its multi-group prediction capability, application of the predictive model to a broad class of biological and medical problems is described. Applications include: the differential diagnosis of the type of erythemato-squamous diseases; predicting presence/absence of heart disease; genomic analysis and prediction of aberrant CpG island meythlation in human cancer; discriminant analysis of motility and morphology data in human lung carcinoma; prediction of ultrasonic cell disruption for drug delivery; identification of tumor shape and volume in treatment of sarcoma; discriminant analysis of biomarkers for prediction of early atherosclerois; fingerprinting of native and angiogenic microvascular networks for early diagnosis of diabetes, aging, macular degeneracy and tumor metastasis; prediction of protein localization sites; and pattern recognition of satellite images in classification of soil types. In all these applications, the predictive model yields correct classification rates ranging from 80 to 100%. This provides motivation for pursuing its use as a medical diagnostic, monitoring and decision-making tool.
NASA Astrophysics Data System (ADS)
Ganguly, S.; Kumar, U.; Nemani, R. R.; Kalia, S.; Michaelis, A.
2016-12-01
In this work, we use a Fully Constrained Least Squares Subpixel Learning Algorithm to unmix global WELD (Web Enabled Landsat Data) to obtain fractions or abundances of substrate (S), vegetation (V) and dark objects (D) classes. Because of the sheer nature of data and compute needs, we leveraged the NASA Earth Exchange (NEX) high performance computing architecture to optimize and scale our algorithm for large-scale processing. Subsequently, the S-V-D abundance maps were characterized into 4 classes namely, forest, farmland, water and urban areas (with NPP-VIIRS - national polar orbiting partnership visible infrared imaging radiometer suite nighttime lights data) over California, USA using Random Forest classifier. Validation of these land cover maps with NLCD (National Land Cover Database) 2011 products and NAFD (North American Forest Dynamics) static forest cover maps showed that an overall classification accuracy of over 91% was achieved, which is a 6% improvement in unmixing based classification relative to per-pixel based classification. As such, abundance maps continue to offer an useful alternative to high-spatial resolution data derived classification maps for forest inventory analysis, multi-class mapping for eco-climatic models and applications, fast multi-temporal trend analysis and for societal and policy-relevant applications needed at the watershed scale.
Gu, Yingxin; Brown, Jesslyn F.; Miura, Tomoaki; van Leeuwen, Willem J.D.; Reed, Bradley C.
2010-01-01
This study introduces a new geographic framework, phenological classification, for the conterminous United States based on Moderate Resolution Imaging Spectroradiometer (MODIS) Normalized Difference Vegetation Index (NDVI) time-series data and a digital elevation model. The resulting pheno-class map is comprised of 40 pheno-classes, each having unique phenological and topographic characteristics. Cross-comparison of the pheno-classes with the 2001 National Land Cover Database indicates that the new map contains additional phenological and climate information. The pheno-class framework may be a suitable basis for the development of an Advanced Very High Resolution Radiometer (AVHRR)-MODIS NDVI translation algorithm and for various biogeographic studies.
The Gamma-Ray Burst ToolSHED is Open for Business
NASA Astrophysics Data System (ADS)
Giblin, Timothy W.; Hakkila, Jon; Haglin, David J.; Roiger, Richard J.
2004-09-01
The GRB ToolSHED, a Gamma-Ray Burst SHell for Expeditions in Data-Mining, is now online and available via a web browser to all in the scientific community. The ToolSHED is an online web utility that contains pre-processed burst attributes of the BATSE catalog and a suite of induction-based machine learning and statistical tools for classification and cluster analysis. Users create their own login account and study burst properties within user-defined multi-dimensional parameter spaces. Although new GRB attributes are periodically added to the database for user selection, the ToolSHED has a feature that allows users to upload their own burst attributes (e.g. spectral parameters, etc.) so that additional parameter spaces can be explored. A data visualization feature using GNUplot and web-based IDL has also been implemented to provide interactive plotting of user-selected session output. In an era in which GRB observations and attributes are becoming increasingly more complex, a utility such as the GRB ToolSHED may play an important role in deciphering GRB classes and understanding intrinsic burst properties.
NASA Astrophysics Data System (ADS)
Richards, Joseph W.; Starr, Dan L.; Miller, Adam A.; Bloom, Joshua S.; Butler, Nathaniel R.; Brink, Henrik; Crellin-Quick, Arien
2012-12-01
With growing data volumes from synoptic surveys, astronomers necessarily must become more abstracted from the discovery and introspection processes. Given the scarcity of follow-up resources, there is a particularly sharp onus on the frameworks that replace these human roles to provide accurate and well-calibrated probabilistic classification catalogs. Such catalogs inform the subsequent follow-up, allowing consumers to optimize the selection of specific sources for further study and permitting rigorous treatment of classification purities and efficiencies for population studies. Here, we describe a process to produce a probabilistic classification catalog of variability with machine learning from a multi-epoch photometric survey. In addition to producing accurate classifications, we show how to estimate calibrated class probabilities and motivate the importance of probability calibration. We also introduce a methodology for feature-based anomaly detection, which allows discovery of objects in the survey that do not fit within the predefined class taxonomy. Finally, we apply these methods to sources observed by the All-Sky Automated Survey (ASAS), and release the Machine-learned ASAS Classification Catalog (MACC), a 28 class probabilistic classification catalog of 50,124 ASAS sources in the ASAS Catalog of Variable Stars. We estimate that MACC achieves a sub-20% classification error rate and demonstrate that the class posterior probabilities are reasonably calibrated. MACC classifications compare favorably to the classifications of several previous domain-specific ASAS papers and to the ASAS Catalog of Variable Stars, which had classified only 24% of those sources into one of 12 science classes.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Richards, Joseph W.; Starr, Dan L.; Miller, Adam A.
2012-12-15
With growing data volumes from synoptic surveys, astronomers necessarily must become more abstracted from the discovery and introspection processes. Given the scarcity of follow-up resources, there is a particularly sharp onus on the frameworks that replace these human roles to provide accurate and well-calibrated probabilistic classification catalogs. Such catalogs inform the subsequent follow-up, allowing consumers to optimize the selection of specific sources for further study and permitting rigorous treatment of classification purities and efficiencies for population studies. Here, we describe a process to produce a probabilistic classification catalog of variability with machine learning from a multi-epoch photometric survey. In additionmore » to producing accurate classifications, we show how to estimate calibrated class probabilities and motivate the importance of probability calibration. We also introduce a methodology for feature-based anomaly detection, which allows discovery of objects in the survey that do not fit within the predefined class taxonomy. Finally, we apply these methods to sources observed by the All-Sky Automated Survey (ASAS), and release the Machine-learned ASAS Classification Catalog (MACC), a 28 class probabilistic classification catalog of 50,124 ASAS sources in the ASAS Catalog of Variable Stars. We estimate that MACC achieves a sub-20% classification error rate and demonstrate that the class posterior probabilities are reasonably calibrated. MACC classifications compare favorably to the classifications of several previous domain-specific ASAS papers and to the ASAS Catalog of Variable Stars, which had classified only 24% of those sources into one of 12 science classes.« less
Aguiar, G F M; Batista, B L; Rodrigues, J L; Silva, L R S; Campiglia, A D; Barbosa, R M; Barbosa, F
2012-12-01
The reproductive performance of cattle may be influenced by several factors, but mineral imbalances are crucial in terms of direct effects on reproduction. Several studies have shown that elements such as calcium, copper, iron, magnesium, selenium, and zinc are essential for reproduction and can prevent oxidative stress. However, toxic elements such as lead, nickel, and arsenic can have adverse effects on reproduction. In this paper, we applied a simple and fast method of multi-element analysis to bovine semen samples from Zebu and European classes used in reproduction programs and artificial insemination. Samples were analyzed by inductively coupled plasma spectrometry (ICP-MS) using aqueous medium calibration and the samples were diluted in a proportion of 1:50 in a solution containing 0.01% (vol/vol) Triton X-100 and 0.5% (vol/vol) nitric acid. Rhodium, iridium, and yttrium were used as the internal standards for ICP-MS analysis. To develop a reliable method of tracing the class of bovine semen, we used data mining techniques that make it possible to classify unknown samples after checking the differentiation of known-class samples. Based on the determination of 15 elements in 41 samples of bovine semen, 3 machine-learning tools for classification were applied to determine cattle class. Our results demonstrate the potential of support vector machine (SVM), multilayer perceptron (MLP), and random forest (RF) chemometric tools to identify cattle class. Moreover, the selection tools made it possible to reduce the number of chemical elements needed from 15 to just 8. Copyright © 2012 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Sah, Shagan
An increasingly important application of remote sensing is to provide decision support during emergency response and disaster management efforts. Land cover maps constitute one such useful application product during disaster events; if generated rapidly after any disaster, such map products can contribute to the efficacy of the response effort. In light of recent nuclear incidents, e.g., after the earthquake/tsunami in Japan (2011), our research focuses on constructing rapid and accurate land cover maps of the impacted area in case of an accidental nuclear release. The methodology involves integration of results from two different approaches, namely coarse spatial resolution multi-temporal and fine spatial resolution imagery, to increase classification accuracy. Although advanced methods have been developed for classification using high spatial or temporal resolution imagery, only a limited amount of work has been done on fusion of these two remote sensing approaches. The presented methodology thus involves integration of classification results from two different remote sensing modalities in order to improve classification accuracy. The data used included RapidEye and MODIS scenes over the Nine Mile Point Nuclear Power Station in Oswego (New York, USA). The first step in the process was the construction of land cover maps from freely available, high temporal resolution, low spatial resolution MODIS imagery using a time-series approach. We used the variability in the temporal signatures among different land cover classes for classification. The time series-specific features were defined by various physical properties of a pixel, such as variation in vegetation cover and water content over time. The pixels were classified into four land cover classes - forest, urban, water, and vegetation - using Euclidean and Mahalanobis distance metrics. On the other hand, a high spatial resolution commercial satellite, such as RapidEye, can be tasked to capture images over the affected area in the case of a nuclear event. This imagery served as a second source of data to augment results from the time series approach. The classifications from the two approaches were integrated using an a posteriori probability-based fusion approach. This was done by establishing a relationship between the classes, obtained after classification of the two data sources. Despite the coarse spatial resolution of MODIS pixels, acceptable accuracies were obtained using time series features. The overall accuracies using the fusion-based approach were in the neighborhood of 80%, when compared with GIS data sets from New York State. This fusion thus contributed to classification accuracy refinement, with a few additional advantages, such as correction for cloud cover and providing for an approach that is robust against point-in-time seasonal anomalies, due to the inclusion of multi-temporal data. We concluded that this approach is capable of generating land cover maps of acceptable accuracy and rapid turnaround, which in turn can yield reliable estimates of crop acreage of a region. The final algorithm is part of an automated software tool, which can be used by emergency response personnel to generate a nuclear ingestion pathway information product within a few hours of data collection.
Li, Jinyan; Fong, Simon; Sung, Yunsick; Cho, Kyungeun; Wong, Raymond; Wong, Kelvin K L
2016-01-01
An imbalanced dataset is defined as a training dataset that has imbalanced proportions of data in both interesting and uninteresting classes. Often in biomedical applications, samples from the stimulating class are rare in a population, such as medical anomalies, positive clinical tests, and particular diseases. Although the target samples in the primitive dataset are small in number, the induction of a classification model over such training data leads to poor prediction performance due to insufficient training from the minority class. In this paper, we use a novel class-balancing method named adaptive swarm cluster-based dynamic multi-objective synthetic minority oversampling technique (ASCB_DmSMOTE) to solve this imbalanced dataset problem, which is common in biomedical applications. The proposed method combines under-sampling and over-sampling into a swarm optimisation algorithm. It adaptively selects suitable parameters for the rebalancing algorithm to find the best solution. Compared with the other versions of the SMOTE algorithm, significant improvements, which include higher accuracy and credibility, are observed with ASCB_DmSMOTE. Our proposed method tactfully combines two rebalancing techniques together. It reasonably re-allocates the majority class in the details and dynamically optimises the two parameters of SMOTE to synthesise a reasonable scale of minority class for each clustered sub-imbalanced dataset. The proposed methods ultimately overcome other conventional methods and attains higher credibility with even greater accuracy of the classification model.
ERIC Educational Resources Information Center
Chen, Hsin-Liang; Williams, James Patrick
2009-01-01
This project studies the use of multi-modal media objects in an online information literacy class. One hundred sixty-two undergraduate students answered seven surveys. Significant relationships are found among computer skills, teaching materials, communication tools and learning experience. Multi-modal media objects and communication tools are…
Henriksen, James A.; Heasley, John; Kennen, Jonathan G.; Nieswand, Steven
2006-01-01
Applying the Hydroecological Integrity Assessment Process involves four steps: (1) a hydrologic classification of relatively unmodified streams in a geographic area using long-term gage records and 171 ecologically relevant indices; (2) the identification of statistically significant, nonredundant, hydroecologically relevant indices associated with the five major flow components for each stream class; and (3) the development of a stream-classification tool and a hydrologic assessment tool. Four computer software tools have been developed.
Ko, Yi-An; Mukherjee, Bhramar; Smith, Jennifer A; Kardia, Sharon L R; Allison, Matthew; Diez Roux, Ana V
2016-11-01
There has been an increased interest in identifying gene-environment interaction (G × E) in the context of multiple environmental exposures. Most G × E studies analyze one exposure at a time, but we are exposed to multiple exposures in reality. Efficient analysis strategies for complex G × E with multiple environmental factors in a single model are still lacking. Using the data from the Multiethnic Study of Atherosclerosis, we illustrate a two-step approach for modeling G × E with multiple environmental factors. First, we utilize common clustering and classification strategies (e.g., k-means, latent class analysis, classification and regression trees, Bayesian clustering using Dirichlet Process) to define subgroups corresponding to distinct environmental exposure profiles. Second, we illustrate the use of an additive main effects and multiplicative interaction model, instead of the conventional saturated interaction model using product terms of factors, to study G × E with the data-driven exposure subgroups defined in the first step. We demonstrate useful analytical approaches to translate multiple environmental exposures into one summary class. These tools not only allow researchers to consider several environmental exposures in G × E analysis but also provide some insight into how genes modify the effect of a comprehensive exposure profile instead of examining effect modification for each exposure in isolation.
Classification and authentication of unknown water samples using machine learning algorithms.
Kundu, Palash K; Panchariya, P C; Kundu, Madhusree
2011-07-01
This paper proposes the development of water sample classification and authentication, in real life which is based on machine learning algorithms. The proposed techniques used experimental measurements from a pulse voltametry method which is based on an electronic tongue (E-tongue) instrumentation system with silver and platinum electrodes. E-tongue include arrays of solid state ion sensors, transducers even of different types, data collectors and data analysis tools, all oriented to the classification of liquid samples and authentication of unknown liquid samples. The time series signal and the corresponding raw data represent the measurement from a multi-sensor system. The E-tongue system, implemented in a laboratory environment for 6 numbers of different ISI (Bureau of Indian standard) certified water samples (Aquafina, Bisleri, Kingfisher, Oasis, Dolphin, and McDowell) was the data source for developing two types of machine learning algorithms like classification and regression. A water data set consisting of 6 numbers of sample classes containing 4402 numbers of features were considered. A PCA (principal component analysis) based classification and authentication tool was developed in this study as the machine learning component of the E-tongue system. A proposed partial least squares (PLS) based classifier, which was dedicated as well; to authenticate a specific category of water sample evolved out as an integral part of the E-tongue instrumentation system. The developed PCA and PLS based E-tongue system emancipated an overall encouraging authentication percentage accuracy with their excellent performances for the aforesaid categories of water samples. Copyright © 2011 ISA. Published by Elsevier Ltd. All rights reserved.
Cao, Peng; Liu, Xiaoli; Yang, Jinzhu; Zhao, Dazhe; Huang, Min; Zhang, Jian; Zaiane, Osmar
2017-12-01
Alzheimer's disease (AD) has been not only a substantial financial burden to the health care system but also an emotional burden to patients and their families. Making accurate diagnosis of AD based on brain magnetic resonance imaging (MRI) is becoming more and more critical and emphasized at the earliest stages. However, the high dimensionality and imbalanced data issues are two major challenges in the study of computer aided AD diagnosis. The greatest limitations of existing dimensionality reduction and over-sampling methods are that they assume a linear relationship between the MRI features (predictor) and the disease status (response). To better capture the complicated but more flexible relationship, we propose a multi-kernel based dimensionality reduction and over-sampling approaches. We combined Marginal Fisher Analysis with ℓ 2,1 -norm based multi-kernel learning (MKMFA) to achieve the sparsity of region-of-interest (ROI), which leads to simultaneously selecting a subset of the relevant brain regions and learning a dimensionality transformation. Meanwhile, a multi-kernel over-sampling (MKOS) was developed to generate synthetic instances in the optimal kernel space induced by MKMFA, so as to compensate for the class imbalanced distribution. We comprehensively evaluate the proposed models for the diagnostic classification (binary class and multi-class classification) including all subjects from the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset. The experimental results not only demonstrate the proposed method has superior performance over multiple comparable methods, but also identifies relevant imaging biomarkers that are consistent with prior medical knowledge. Copyright © 2017 Elsevier Ltd. All rights reserved.
Multi-site evaluation of IKONOS data for classification of tropical coral reef environments
Andrefouet, S.; Kramer, Philip; Torres-Pulliza, D.; Joyce, K.E.; Hochberg, E.J.; Garza-Perez, R.; Mumby, P.J.; Riegl, Bernhard; Yamano, H.; White, W.H.; Zubia, M.; Brock, J.C.; Phinn, S.R.; Naseer, A.; Hatcher, B.G.; Muller-Karger, F. E.
2003-01-01
Ten IKONOS images of different coral reef sites distributed around the world were processed to assess the potential of 4-m resolution multispectral data for coral reef habitat mapping. Complexity of reef environments, established by field observation, ranged from 3 to 15 classes of benthic habitats containing various combinations of sediments, carbonate pavement, seagrass, algae, and corals in different geomorphologic zones (forereef, lagoon, patch reef, reef flats). Processing included corrections for sea surface roughness and bathymetry, unsupervised or supervised classification, and accuracy assessment based on ground-truth data. IKONOS classification results were compared with classified Landsat 7 imagery for simple to moderate complexity of reef habitats (5-11 classes). For both sensors, overall accuracies of the classifications show a general linear trend of decreasing accuracy with increasing habitat complexity. The IKONOS sensor performed better, with a 15-20% improvement in accuracy compared to Landsat. For IKONOS, overall accuracy was 77% for 4-5 classes, 71% for 7-8 classes, 65% in 9-11 classes, and 53% for more than 13 classes. The Landsat classification accuracy was systematically lower, with an average of 56% for 5-10 classes. Within this general trend, inter-site comparisons and specificities demonstrate the benefits of different approaches. Pre-segmentation of the different geomorphologic zones and depth correction provided different advantages in different environments. Our results help guide scientists and managers in applying IKONOS-class data for coral reef mapping applications. ?? 2003 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Mohamed, Abdul Aziz; Hasan, Abu Bakar; Ghazali, Abu Bakar Mhd.
2017-01-01
Classification of large data into respected classes or groups could be carried out with the help of artificial intelligence (AI) tools readily available in the market. To get the optimum or best results, optimization tool could be applied on those data. Classification and optimization have been used by researchers throughout their works, and the outcomes were very encouraging indeed. Here, the authors are trying to share what they have experienced in three different areas of applied research.
Wan, Shixiang; Duan, Yucong; Zou, Quan
2017-09-01
Predicting the subcellular localization of proteins is an important and challenging problem. Traditional experimental approaches are often expensive and time-consuming. Consequently, a growing number of research efforts employ a series of machine learning approaches to predict the subcellular location of proteins. There are two main challenges among the state-of-the-art prediction methods. First, most of the existing techniques are designed to deal with multi-class rather than multi-label classification, which ignores connections between multiple labels. In reality, multiple locations of particular proteins imply that there are vital and unique biological significances that deserve special focus and cannot be ignored. Second, techniques for handling imbalanced data in multi-label classification problems are necessary, but never employed. For solving these two issues, we have developed an ensemble multi-label classifier called HPSLPred, which can be applied for multi-label classification with an imbalanced protein source. For convenience, a user-friendly webserver has been established at http://server.malab.cn/HPSLPred. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Natural stimuli improve auditory BCIs with respect to ergonomics and performance
NASA Astrophysics Data System (ADS)
Höhne, Johannes; Krenzlin, Konrad; Dähne, Sven; Tangermann, Michael
2012-08-01
Moving from well-controlled, brisk artificial stimuli to natural and less-controlled stimuli seems counter-intuitive for event-related potential (ERP) studies. As natural stimuli typically contain a richer internal structure, they might introduce higher levels of variance and jitter in the ERP responses. Both characteristics are unfavorable for a good single-trial classification of ERPs in the context of a multi-class brain-computer interface (BCI) system, where the class-discriminant information between target stimuli and non-target stimuli must be maximized. For the application in an auditory BCI system, however, the transition from simple artificial tones to natural syllables can be useful despite the variance introduced. In the presented study, healthy users (N = 9) participated in an offline auditory nine-class BCI experiment with artificial and natural stimuli. It is shown that the use of syllables as natural stimuli does not only improve the users’ ergonomic ratings; also the classification performance is increased. Moreover, natural stimuli obtain a better balance in multi-class decisions, such that the number of systematic confusions between the nine classes is reduced. Hopefully, our findings may contribute to make auditory BCI paradigms more user friendly and applicable for patients.
Classification of forest land attributes using multi-source remotely sensed data
NASA Astrophysics Data System (ADS)
Pippuri, Inka; Suvanto, Aki; Maltamo, Matti; Korhonen, Kari T.; Pitkänen, Juho; Packalen, Petteri
2016-02-01
The aim of the study was to (1) examine the classification of forest land using airborne laser scanning (ALS) data, satellite images and sample plots of the Finnish National Forest Inventory (NFI) as training data and to (2) identify best performing metrics for classifying forest land attributes. Six different schemes of forest land classification were studied: land use/land cover (LU/LC) classification using both national classes and FAO (Food and Agricultural Organization of the United Nations) classes, main type, site type, peat land type and drainage status. Special interest was to test different ALS-based surface metrics in classification of forest land attributes. Field data consisted of 828 NFI plots collected in 2008-2012 in southern Finland and remotely sensed data was from summer 2010. Multinomial logistic regression was used as the classification method. Classification of LU/LC classes were highly accurate (kappa-values 0.90 and 0.91) but also the classification of site type, peat land type and drainage status succeeded moderately well (kappa-values 0.51, 0.69 and 0.52). ALS-based surface metrics were found to be the most important predictor variables in classification of LU/LC class, main type and drainage status. In best classification models of forest site types both spectral metrics from satellite data and point cloud metrics from ALS were used. In turn, in the classification of peat land types ALS point cloud metrics played the most important role. Results indicated that the prediction of site type and forest land category could be incorporated into stand level forest management inventory system in Finland.
Astroinformatics in the Age of LSST: Analyzing the Summer 2012 Data Release
NASA Astrophysics Data System (ADS)
Borne, Kirk D.; De Lee, N. M.; Stassun, K.; Paegert, M.; Cargile, P.; Burger, D.; Bloom, J. S.; Richards, J.
2013-01-01
The Large Synoptic Survey Telescope (LSST) will image the visible southern sky every three nights. This multi-band, multi-epoch survey will produce a torrent of data, which traditional methods of object-by-object data analysis will not be able to accommodate. Thus the need for new astroinformatics tools to visualize, simulate, mine, and analyze this quantity of data. The Berkeley Center for Time-Domain Informatics (CTDI) is building the informatics infrastructure for generic light curve classification, including the innovation of new algorithms for feature generation and machine learning. The CTDI portal (http://dotastro.org) contains one of the largest collections of public light curves, with visualization and exploration tools. The group has also published the first calibrated probabilistic classification catalog of 50k variable stars along with a data exploration portal called http://bigmacc.info. Twice a year, the LSST collaboration releases simulated LSST data, in order to aid software development. This poster also showcases a suite of new tools from the Vanderbilt Initiative in Data-instensive Astrophysics (VIDA), designed to take advantage of these large data sets. VIDA's Filtergraph interactive web tool allows one to instantly create an interactive data portal for fast, real-time visualization of large data sets. Filtergraph enables quick selection of interesting objects by easily filtering on many different columns, 2-D and 3-D representations, and on-the-fly arithmetic calculations on the data. It also makes sharing the data and the tool with collaborators very easy. The EB/RRL Factory is a neural-network based variable star classifier, which is designed to quickly identify variable stars in a variety of classes from LSST light curve data (currently tuned to Eclipsing Binaries and RR Lyrae stars), and to provide likelihood-based orbital elements or stellar parameters as appropriate. Finally the LCsimulator software allows one to create simulated light curves of multiple types of variable stars based on an LSST cadence.
EUCLID: automatic classification of proteins in functional classes by their database annotations.
Tamames, J; Ouzounis, C; Casari, G; Sander, C; Valencia, A
1998-01-01
A tool is described for the automatic classification of sequences in functional classes using their database annotations. The Euclid system is based on a simple learning procedure from examples provided by human experts. Euclid is freely available for academics at http://www.gredos.cnb.uam.es/EUCLID, with the corresponding dictionaries for the generation of three, eight and 14 functional classes. E-mail: valencia@cnb.uam.es The results of the EUCLID classification of different genomes are available at http://www.sander.ebi.ac. uk/genequiz/. A detailed description of the different applications mentioned in the text is available at http://www.gredos.cnb.uam. es/EUCLID/Full_Paper
nRC: non-coding RNA Classifier based on structural features.
Fiannaca, Antonino; La Rosa, Massimo; La Paglia, Laura; Rizzo, Riccardo; Urso, Alfonso
2017-01-01
Non-coding RNA (ncRNA) are small non-coding sequences involved in gene expression regulation of many biological processes and diseases. The recent discovery of a large set of different ncRNAs with biologically relevant roles has opened the way to develop methods able to discriminate between the different ncRNA classes. Moreover, the lack of knowledge about the complete mechanisms in regulative processes, together with the development of high-throughput technologies, has required the help of bioinformatics tools in addressing biologists and clinicians with a deeper comprehension of the functional roles of ncRNAs. In this work, we introduce a new ncRNA classification tool, nRC (non-coding RNA Classifier). Our approach is based on features extraction from the ncRNA secondary structure together with a supervised classification algorithm implementing a deep learning architecture based on convolutional neural networks. We tested our approach for the classification of 13 different ncRNA classes. We obtained classification scores, using the most common statistical measures. In particular, we reach an accuracy and sensitivity score of about 74%. The proposed method outperforms other similar classification methods based on secondary structure features and machine learning algorithms, including the RNAcon tool that, to date, is the reference classifier. nRC tool is freely available as a docker image at https://hub.docker.com/r/tblab/nrc/. The source code of nRC tool is also available at https://github.com/IcarPA-TBlab/nrc.
NASA Astrophysics Data System (ADS)
Alevizos, Evangelos; Snellen, Mirjam; Simons, Dick; Siemes, Kerstin; Greinert, Jens
2018-06-01
This study applies three classification methods exploiting the angular dependence of acoustic seafloor backscatter along with high resolution sub-bottom profiling for seafloor sediment characterization in the Eckernförde Bay, Baltic Sea Germany. This area is well suited for acoustic backscatter studies due to its shallowness, its smooth bathymetry and the presence of a wide range of sediment types. Backscatter data were acquired using a Seabeam1180 (180 kHz) multibeam echosounder and sub-bottom profiler data were recorded using a SES-2000 parametric sonar transmitting 6 and 12 kHz. The high density of seafloor soundings allowed extracting backscatter layers for five beam angles over a large part of the surveyed area. A Bayesian probability method was employed for sediment classification based on the backscatter variability at a single incidence angle, whereas Maximum Likelihood Classification (MLC) and Principal Components Analysis (PCA) were applied to the multi-angle layers. The Bayesian approach was used for identifying the optimum number of acoustic classes because cluster validation is carried out prior to class assignment and class outputs are ordinal categorical values. The method is based on the principle that backscatter values from a single incidence angle express a normal distribution for a particular sediment type. The resulting Bayesian classes were well correlated to median grain sizes and the percentage of coarse material. The MLC method uses angular response information from five layers of training areas extracted from the Bayesian classification map. The subsequent PCA analysis is based on the transformation of these five layers into two principal components that comprise most of the data variability. These principal components were clustered in five classes after running an external cluster validation test. In general both methods MLC and PCA, separated the various sediment types effectively, showing good agreement (kappa >0.7) with the Bayesian approach which also correlates well with ground truth data (r2 > 0.7). In addition, sub-bottom data were used in conjunction with the Bayesian classification results to characterize acoustic classes with respect to their geological and stratigraphic interpretation. The joined interpretation of seafloor and sub-seafloor data sets proved to be an efficient approach for a better understanding of seafloor backscatter patchiness and to discriminate acoustically similar classes in different geological/bathymetric settings.
Mapping raised bogs with an iterative one-class classification approach
NASA Astrophysics Data System (ADS)
Mack, Benjamin; Roscher, Ribana; Stenzel, Stefanie; Feilhauer, Hannes; Schmidtlein, Sebastian; Waske, Björn
2016-10-01
Land use and land cover maps are one of the most commonly used remote sensing products. In many applications the user only requires a map of one particular class of interest, e.g. a specific vegetation type or an invasive species. One-class classifiers are appealing alternatives to common supervised classifiers because they can be trained with labeled training data of the class of interest only. However, training an accurate one-class classification (OCC) model is challenging, particularly when facing a large image, a small class and few training samples. To tackle these problems we propose an iterative OCC approach. The presented approach uses a biased Support Vector Machine as core classifier. In an iterative pre-classification step a large part of the pixels not belonging to the class of interest is classified. The remaining data is classified by a final classifier with a novel model and threshold selection approach. The specific objective of our study is the classification of raised bogs in a study site in southeast Germany, using multi-seasonal RapidEye data and a small number of training sample. Results demonstrate that the iterative OCC outperforms other state of the art one-class classifiers and approaches for model selection. The study highlights the potential of the proposed approach for an efficient and improved mapping of small classes such as raised bogs. Overall the proposed approach constitutes a feasible approach and useful modification of a regular one-class classifier.
NASA Astrophysics Data System (ADS)
Dekavalla, Maria; Argialas, Demetre
2017-07-01
The analysis of undersea topography and geomorphological features provides necessary information to related disciplines and many applications. The development of an automated knowledge-based classification approach of undersea topography and geomorphological features is challenging due to their multi-scale nature. The aim of the study is to develop and evaluate an automated knowledge-based OBIA approach to: i) decompose the global undersea topography to multi-scale regions of distinct morphometric properties, and ii) assign the derived regions to characteristic geomorphological features. First, the global undersea topography was decomposed through the SRTM30_PLUS bathymetry data to the so-called morphometric objects of discrete morphometric properties and spatial scales defined by data-driven methods (local variance graphs and nested means) and multi-scale analysis. The derived morphometric objects were combined with additional relative topographic position information computed with a self-adaptive pattern recognition method (geomorphons), and auxiliary data and were assigned to characteristic undersea geomorphological feature classes through a knowledge base, developed from standard definitions. The decomposition of the SRTM30_PLUS data to morphometric objects was considered successful for the requirements of maximizing intra-object and inter-object heterogeneity, based on the near zero values of the Moran's I and the low values of the weighted variance index. The knowledge-based classification approach was tested for its transferability in six case studies of various tectonic settings and achieved the efficient extraction of 11 undersea geomorphological feature classes. The classification results for the six case studies were compared with the digital global seafloor geomorphic features map (GSFM). The 11 undersea feature classes and their producer's accuracies in respect to the GSFM relevant areas were Basin (95%), Continental Shelf (94.9%), Trough (88.4%), Plateau (78.9%), Continental Slope (76.4%), Trench (71.2%), Abyssal Hill (62.9%), Abyssal Plain (62.4%), Ridge (49.8%), Seamount (48.8%) and Continental Rise (25.4%). The knowledge-based OBIA classification approach was considered transferable since the percentages of spatial and thematic agreement between the most of the classified undersea feature classes and the GSFM exhibited low deviations across the six case studies.
Localized contourlet features in vehicle make and model recognition
NASA Astrophysics Data System (ADS)
Zafar, I.; Edirisinghe, E. A.; Acar, B. S.
2009-02-01
Automatic vehicle Make and Model Recognition (MMR) systems provide useful performance enhancements to vehicle recognitions systems that are solely based on Automatic Number Plate Recognition (ANPR) systems. Several vehicle MMR systems have been proposed in literature. In parallel to this, the usefulness of multi-resolution based feature analysis techniques leading to efficient object classification algorithms have received close attention from the research community. To this effect, Contourlet transforms that can provide an efficient directional multi-resolution image representation has recently been introduced. Already an attempt has been made in literature to use Curvelet/Contourlet transforms in vehicle MMR. In this paper we propose a novel localized feature detection method in Contourlet transform domain that is capable of increasing the classification rates up to 4%, as compared to the previously proposed Contourlet based vehicle MMR approach in which the features are non-localized and thus results in sub-optimal classification. Further we show that the proposed algorithm can achieve the increased classification accuracy of 96% at significantly lower computational complexity due to the use of Two Dimensional Linear Discriminant Analysis (2DLDA) for dimensionality reduction by preserving the features with high between-class variance and low inter-class variance.
Zhang, Yiyan; Xin, Yi; Li, Qin; Ma, Jianshe; Li, Shuai; Lv, Xiaodan; Lv, Weiqi
2017-11-02
Various kinds of data mining algorithms are continuously raised with the development of related disciplines. The applicable scopes and their performances of these algorithms are different. Hence, finding a suitable algorithm for a dataset is becoming an important emphasis for biomedical researchers to solve practical problems promptly. In this paper, seven kinds of sophisticated active algorithms, namely, C4.5, support vector machine, AdaBoost, k-nearest neighbor, naïve Bayes, random forest, and logistic regression, were selected as the research objects. The seven algorithms were applied to the 12 top-click UCI public datasets with the task of classification, and their performances were compared through induction and analysis. The sample size, number of attributes, number of missing values, and the sample size of each class, correlation coefficients between variables, class entropy of task variable, and the ratio of the sample size of the largest class to the least class were calculated to character the 12 research datasets. The two ensemble algorithms reach high accuracy of classification on most datasets. Moreover, random forest performs better than AdaBoost on the unbalanced dataset of the multi-class task. Simple algorithms, such as the naïve Bayes and logistic regression model are suitable for a small dataset with high correlation between the task and other non-task attribute variables. K-nearest neighbor and C4.5 decision tree algorithms perform well on binary- and multi-class task datasets. Support vector machine is more adept on the balanced small dataset of the binary-class task. No algorithm can maintain the best performance in all datasets. The applicability of the seven data mining algorithms on the datasets with different characteristics was summarized to provide a reference for biomedical researchers or beginners in different fields.
NASA Astrophysics Data System (ADS)
Modiri, M.; Salehabadi, A.; Mohebbi, M.; Hashemi, A. M.; Masumi, M.
2015-12-01
The use of UAV in the application of photogrammetry to obtain cover images and achieve the main objectives of the photogrammetric mapping has been a boom in the region. The images taken from REGGIOLO region in the province of, Italy Reggio -Emilia by UAV with non-metric camera Canon Ixus and with an average height of 139.42 meters were used to classify urban feature. Using the software provided SURE and cover images of the study area, to produce dense point cloud, DSM and Artvqvtv spatial resolution of 10 cm was prepared. DTM area using Adaptive TIN filtering algorithm was developed. NDSM area was prepared with using the difference between DSM and DTM and a separate features in the image stack. In order to extract features, using simultaneous occurrence matrix features mean, variance, homogeneity, contrast, dissimilarity, entropy, second moment, and correlation for each of the RGB band image was used Orthophoto area. Classes used to classify urban problems, including buildings, trees and tall vegetation, grass and vegetation short, paved road and is impervious surfaces. Class consists of impervious surfaces such as pavement conditions, the cement, the car, the roof is stored. In order to pixel-based classification and selection of optimal features of classification was GASVM pixel basis. In order to achieve the classification results with higher accuracy and spectral composition informations, texture, and shape conceptual image featureOrthophoto area was fencing. The segmentation of multi-scale segmentation method was used.it belonged class. Search results using the proposed classification of urban feature, suggests the suitability of this method of classification complications UAV is a city using images. The overall accuracy and kappa coefficient method proposed in this study, respectively, 47/93% and 84/91% was.
Structure-based classification and ontology in chemistry
2012-01-01
Background Recent years have seen an explosion in the availability of data in the chemistry domain. With this information explosion, however, retrieving relevant results from the available information, and organising those results, become even harder problems. Computational processing is essential to filter and organise the available resources so as to better facilitate the work of scientists. Ontologies encode expert domain knowledge in a hierarchically organised machine-processable format. One such ontology for the chemical domain is ChEBI. ChEBI provides a classification of chemicals based on their structural features and a role or activity-based classification. An example of a structure-based class is 'pentacyclic compound' (compounds containing five-ring structures), while an example of a role-based class is 'analgesic', since many different chemicals can act as analgesics without sharing structural features. Structure-based classification in chemistry exploits elegant regularities and symmetries in the underlying chemical domain. As yet, there has been neither a systematic analysis of the types of structural classification in use in chemistry nor a comparison to the capabilities of available technologies. Results We analyze the different categories of structural classes in chemistry, presenting a list of patterns for features found in class definitions. We compare these patterns of class definition to tools which allow for automation of hierarchy construction within cheminformatics and within logic-based ontology technology, going into detail in the latter case with respect to the expressive capabilities of the Web Ontology Language and recent extensions for modelling structured objects. Finally we discuss the relationships and interactions between cheminformatics approaches and logic-based approaches. Conclusion Systems that perform intelligent reasoning tasks on chemistry data require a diverse set of underlying computational utilities including algorithmic, statistical and logic-based tools. For the task of automatic structure-based classification of chemical entities, essential to managing the vast swathes of chemical data being brought online, systems which are capable of hybrid reasoning combining several different approaches are crucial. We provide a thorough review of the available tools and methodologies, and identify areas of open research. PMID:22480202
Supervised target detection in hyperspectral images using one-class Fukunaga-Koontz Transform
NASA Astrophysics Data System (ADS)
Binol, Hamidullah; Bal, Abdullah
2016-05-01
A novel hyperspectral target detection technique based on Fukunaga-Koontz transform (FKT) is presented. FKT offers significant properties for feature selection and ordering. However, it can only be used to solve multi-pattern classification problems. Target detection may be considered as a two-class classification problem, i.e., target versus background clutter. Nevertheless, background clutter typically contains different types of materials. That's why; target detection techniques are different than classification methods by way of modeling clutter. To avoid the modeling of the background clutter, we have improved one-class FKT (OC-FKT) for target detection. The statistical properties of target training samples are used to define tunnel-like boundary of the target class. Non-target samples are then created synthetically as to be outside of the boundary. Thus, only limited target samples become adequate for training of FKT. The hyperspectral image experiments confirm that the proposed OC-FKT technique provides an effective means for target detection.
Zu, Chen; Jie, Biao; Liu, Mingxia; Chen, Songcan
2015-01-01
Multimodal classification methods using different modalities of imaging and non-imaging data have recently shown great advantages over traditional single-modality-based ones for diagnosis and prognosis of Alzheimer’s disease (AD), as well as its prodromal stage, i.e., mild cognitive impairment (MCI). However, to the best of our knowledge, most existing methods focus on mining the relationship across multiple modalities of the same subjects, while ignoring the potentially useful relationship across different subjects. Accordingly, in this paper, we propose a novel learning method for multimodal classification of AD/MCI, by fully exploring the relationships across both modalities and subjects. Specifically, our proposed method includes two subsequent components, i.e., label-aligned multi-task feature selection and multimodal classification. In the first step, the feature selection learning from multiple modalities are treated as different learning tasks and a group sparsity regularizer is imposed to jointly select a subset of relevant features. Furthermore, to utilize the discriminative information among labeled subjects, a new label-aligned regularization term is added into the objective function of standard multi-task feature selection, where label-alignment means that all multi-modality subjects with the same class labels should be closer in the new feature-reduced space. In the second step, a multi-kernel support vector machine (SVM) is adopted to fuse the selected features from multi-modality data for final classification. To validate our method, we perform experiments on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database using baseline MRI and FDG-PET imaging data. The experimental results demonstrate that our proposed method achieves better classification performance compared with several state-of-the-art methods for multimodal classification of AD/MCI. PMID:26572145
Robust point cloud classification based on multi-level semantic relationships for urban scenes
NASA Astrophysics Data System (ADS)
Zhu, Qing; Li, Yuan; Hu, Han; Wu, Bo
2017-07-01
The semantic classification of point clouds is a fundamental part of three-dimensional urban reconstruction. For datasets with high spatial resolution but significantly more noises, a general trend is to exploit more contexture information to surmount the decrease of discrimination of features for classification. However, previous works on adoption of contexture information are either too restrictive or only in a small region and in this paper, we propose a point cloud classification method based on multi-level semantic relationships, including point-homogeneity, supervoxel-adjacency and class-knowledge constraints, which is more versatile and incrementally propagate the classification cues from individual points to the object level and formulate them as a graphical model. The point-homogeneity constraint clusters points with similar geometric and radiometric properties into regular-shaped supervoxels that correspond to the vertices in the graphical model. The supervoxel-adjacency constraint contributes to the pairwise interactions by providing explicit adjacent relationships between supervoxels. The class-knowledge constraint operates at the object level based on semantic rules, guaranteeing the classification correctness of supervoxel clusters at that level. International Society of Photogrammetry and Remote Sensing (ISPRS) benchmark tests have shown that the proposed method achieves state-of-the-art performance with an average per-area completeness and correctness of 93.88% and 95.78%, respectively. The evaluation of classification of photogrammetric point clouds and DSM generated from aerial imagery confirms the method's reliability in several challenging urban scenes.
Wang, Xinglong; Rak, Rafal; Restificar, Angelo; Nobata, Chikashi; Rupp, C J; Batista-Navarro, Riza Theresa B; Nawaz, Raheel; Ananiadou, Sophia
2011-10-03
The selection of relevant articles for curation, and linking those articles to experimental techniques confirming the findings became one of the primary subjects of the recent BioCreative III contest. The contest's Protein-Protein Interaction (PPI) task consisted of two sub-tasks: Article Classification Task (ACT) and Interaction Method Task (IMT). ACT aimed to automatically select relevant documents for PPI curation, whereas the goal of IMT was to recognise the methods used in experiments for identifying the interactions in full-text articles. We proposed and compared several classification-based methods for both tasks, employing rich contextual features as well as features extracted from external knowledge sources. For IMT, a new method that classifies pair-wise relations between every text phrase and candidate interaction method obtained promising results with an F1 score of 64.49%, as tested on the task's development dataset. We also explored ways to combine this new approach and more conventional, multi-label document classification methods. For ACT, our classifiers exploited automatically detected named entities and other linguistic information. The evaluation results on the BioCreative III PPI test datasets showed that our systems were very competitive: one of our IMT methods yielded the best performance among all participants, as measured by F1 score, Matthew's Correlation Coefficient and AUC iP/R; whereas for ACT, our best classifier was ranked second as measured by AUC iP/R, and also competitive according to other metrics. Our novel approach that converts the multi-class, multi-label classification problem to a binary classification problem showed much promise in IMT. Nevertheless, on the test dataset the best performance was achieved by taking the union of the output of this method and that of a multi-class, multi-label document classifier, which indicates that the two types of systems complement each other in terms of recall. For ACT, our system exploited a rich set of features and also obtained encouraging results. We examined the features with respect to their contributions to the classification results, and concluded that contextual words surrounding named entities, as well as the MeSH headings associated with the documents were among the main contributors to the performance.
Canizo, Brenda V; Escudero, Leticia B; Pérez, María B; Pellerano, Roberto G; Wuilloud, Rodolfo G
2018-03-01
The feasibility of the application of chemometric techniques associated with multi-element analysis for the classification of grape seeds according to their provenance vineyard soil was investigated. Grape seed samples from different localities of Mendoza province (Argentina) were evaluated. Inductively coupled plasma mass spectrometry (ICP-MS) was used for the determination of twenty-nine elements (Ag, As, Ce, Co, Cs, Cu, Eu, Fe, Ga, Gd, La, Lu, Mn, Mo, Nb, Nd, Ni, Pr, Rb, Sm, Te, Ti, Tl, Tm, U, V, Y, Zn and Zr). Once the analytical data were collected, supervised pattern recognition techniques such as linear discriminant analysis (LDA), partial least square discriminant analysis (PLS-DA), k-nearest neighbors (k-NN), support vector machine (SVM) and Random Forest (RF) were applied to construct classification/discrimination rules. The results indicated that nonlinear methods, RF and SVM, perform best with up to 98% and 93% accuracy rate, respectively, and therefore are excellent tools for classification of grapes. Copyright © 2017 Elsevier Ltd. All rights reserved.
Stojanova, Daniela; Ceci, Michelangelo; Malerba, Donato; Dzeroski, Saso
2013-09-26
Ontologies and catalogs of gene functions, such as the Gene Ontology (GO) and MIPS-FUN, assume that functional classes are organized hierarchically, that is, general functions include more specific ones. This has recently motivated the development of several machine learning algorithms for gene function prediction that leverages on this hierarchical organization where instances may belong to multiple classes. In addition, it is possible to exploit relationships among examples, since it is plausible that related genes tend to share functional annotations. Although these relationships have been identified and extensively studied in the area of protein-protein interaction (PPI) networks, they have not received much attention in hierarchical and multi-class gene function prediction. Relations between genes introduce autocorrelation in functional annotations and violate the assumption that instances are independently and identically distributed (i.i.d.), which underlines most machine learning algorithms. Although the explicit consideration of these relations brings additional complexity to the learning process, we expect substantial benefits in predictive accuracy of learned classifiers. This article demonstrates the benefits (in terms of predictive accuracy) of considering autocorrelation in multi-class gene function prediction. We develop a tree-based algorithm for considering network autocorrelation in the setting of Hierarchical Multi-label Classification (HMC). We empirically evaluate the proposed algorithm, called NHMC (Network Hierarchical Multi-label Classification), on 12 yeast datasets using each of the MIPS-FUN and GO annotation schemes and exploiting 2 different PPI networks. The results clearly show that taking autocorrelation into account improves the predictive performance of the learned models for predicting gene function. Our newly developed method for HMC takes into account network information in the learning phase: When used for gene function prediction in the context of PPI networks, the explicit consideration of network autocorrelation increases the predictive performance of the learned models. Overall, we found that this holds for different gene features/ descriptions, functional annotation schemes, and PPI networks: Best results are achieved when the PPI network is dense and contains a large proportion of function-relevant interactions.
NASA Astrophysics Data System (ADS)
Gupta, Rajendra Kumar
The increase in lion and leopard population in the GIR wild life sanctuary and National Park (Gir Protected Area) demands periodic and precision monitoring of habitat at close intervals using space based remote sensing data. Besides characterizing the different forest classes, remote sensing needs to support for the assessment of thermal stress zones and identification of possible corridors for lion dispersion to new home ranges. The study focuses on assessing the thematic forest classification accuracies in percentage terms(CA) attainable using single date post-monsoon (CA=60, kappa = 0.514) as well as leaf shedding (CA=48.4, kappa = 0.372) season data in visible and Near-IR spectral bands of IRS/LISS-III at 23.5 m spatial resolution; and improvement of CA by using joint two date (multi-temporal) data sets (CA=87.2, kappa = 0.843) in the classification. The 188 m spatial resolution IRS/WiFS and 23.5 m spatial resolution LISS-III data were used to study the possible corridors for dispersion of Lions from GIR protected areas (PA). A relative thermal stress index (RTSI) for Gir PA has been developed using NOAA/ AVHRR data sets of post-monsoon, leaf shedded and summer seasons. The paper discusses the role of RTSI as a tool to work out forest management plans using leaf shedded season data to combat the thermal stress in the habitat, by identifying locations for artificial water holes during the ensuing summer season.
Sørensen, Lauge; Nielsen, Mads
2018-05-15
The International Challenge for Automated Prediction of MCI from MRI data offered independent, standardized comparison of machine learning algorithms for multi-class classification of normal control (NC), mild cognitive impairment (MCI), converting MCI (cMCI), and Alzheimer's disease (AD) using brain imaging and general cognition. We proposed to use an ensemble of support vector machines (SVMs) that combined bagging without replacement and feature selection. SVM is the most commonly used algorithm in multivariate classification of dementia, and it was therefore valuable to evaluate the potential benefit of ensembling this type of classifier. The ensemble SVM, using either a linear or a radial basis function (RBF) kernel, achieved multi-class classification accuracies of 55.6% and 55.0% in the challenge test set (60 NC, 60 MCI, 60 cMCI, 60 AD), resulting in a third place in the challenge. Similar feature subset sizes were obtained for both kernels, and the most frequently selected MRI features were the volumes of the two hippocampal subregions left presubiculum and right subiculum. Post-challenge analysis revealed that enforcing a minimum number of selected features and increasing the number of ensemble classifiers improved classification accuracy up to 59.1%. The ensemble SVM outperformed single SVM classifications consistently in the challenge test set. Ensemble methods using bagging and feature selection can improve the performance of the commonly applied SVM classifier in dementia classification. This resulted in competitive classification accuracies in the International Challenge for Automated Prediction of MCI from MRI data. Copyright © 2018 Elsevier B.V. All rights reserved.
A review and analysis of neural networks for classification of remotely sensed multispectral imagery
NASA Technical Reports Server (NTRS)
Paola, Justin D.; Schowengerdt, Robert A.
1993-01-01
A literature survey and analysis of the use of neural networks for the classification of remotely sensed multispectral imagery is presented. As part of a brief mathematical review, the backpropagation algorithm, which is the most common method of training multi-layer networks, is discussed with an emphasis on its application to pattern recognition. The analysis is divided into five aspects of neural network classification: (1) input data preprocessing, structure, and encoding; (2) output encoding and extraction of classes; (3) network architecture, (4) training algorithms; and (5) comparisons to conventional classifiers. The advantages of the neural network method over traditional classifiers are its non-parametric nature, arbitrary decision boundary capabilities, easy adaptation to different types of data and input structures, fuzzy output values that can enhance classification, and good generalization for use with multiple images. The disadvantages of the method are slow training time, inconsistent results due to random initial weights, and the requirement of obscure initialization values (e.g., learning rate and hidden layer size). Possible techniques for ameliorating these problems are discussed. It is concluded that, although the neural network method has several unique capabilities, it will become a useful tool in remote sensing only if it is made faster, more predictable, and easier to use.
Mapping of land cover in northern California with simulated hyperspectral satellite imagery
NASA Astrophysics Data System (ADS)
Clark, Matthew L.; Kilham, Nina E.
2016-09-01
Land-cover maps are important science products needed for natural resource and ecosystem service management, biodiversity conservation planning, and assessing human-induced and natural drivers of land change. Analysis of hyperspectral, or imaging spectrometer, imagery has shown an impressive capacity to map a wide range of natural and anthropogenic land cover. Applications have been mostly with single-date imagery from relatively small spatial extents. Future hyperspectral satellites will provide imagery at greater spatial and temporal scales, and there is a need to assess techniques for mapping land cover with these data. Here we used simulated multi-temporal HyspIRI satellite imagery over a 30,000 km2 area in the San Francisco Bay Area, California to assess its capabilities for mapping classes defined by the international Land Cover Classification System (LCCS). We employed a mapping methodology and analysis framework that is applicable to regional and global scales. We used the Random Forests classifier with three sets of predictor variables (reflectance, MNF, hyperspectral metrics), two temporal resolutions (summer, spring-summer-fall), two sample scales (pixel, polygon) and two levels of classification complexity (12, 20 classes). Hyperspectral metrics provided a 16.4-21.8% and 3.1-6.7% increase in overall accuracy relative to MNF and reflectance bands, respectively, depending on pixel or polygon scales of analysis. Multi-temporal metrics improved overall accuracy by 0.9-3.1% over summer metrics, yet increases were only significant at the pixel scale of analysis. Overall accuracy at pixel scales was 72.2% (Kappa 0.70) with three seasons of metrics. Anthropogenic and homogenous natural vegetation classes had relatively high confidence and producer and user accuracies were over 70%; in comparison, woodland and forest classes had considerable confusion. We next focused on plant functional types with relatively pure spectra by removing open-canopy shrublands, woodlands and mixed forests from the classification. This 12-class map had significantly improved accuracy of 85.1% (Kappa 0.83) and most classes had over 70% producer and user accuracies. Finally, we summarized important metrics from the multi-temporal Random Forests to infer the underlying chemical and structural properties that best discriminated our land-cover classes across seasons.
Automated classification of articular cartilage surfaces based on surface texture.
Stachowiak, G P; Stachowiak, G W; Podsiadlo, P
2006-11-01
In this study the automated classification system previously developed by the authors was used to classify articular cartilage surfaces with different degrees of wear. This automated system classifies surfaces based on their texture. Plug samples of sheep cartilage (pins) were run on stainless steel discs under various conditions using a pin-on-disc tribometer. Testing conditions were specifically designed to produce different severities of cartilage damage due to wear. Environmental scanning electron microscope (SEM) (ESEM) images of cartilage surfaces, that formed a database for pattern recognition analysis, were acquired. The ESEM images of cartilage were divided into five groups (classes), each class representing different wear conditions or wear severity. Each class was first examined and assessed visually. Next, the automated classification system (pattern recognition) was applied to all classes. The results of the automated surface texture classification were compared to those based on visual assessment of surface morphology. It was shown that the texture-based automated classification system was an efficient and accurate method of distinguishing between various cartilage surfaces generated under different wear conditions. It appears that the texture-based classification method has potential to become a useful tool in medical diagnostics.
NASA Astrophysics Data System (ADS)
Gatos, I.; Tsantis, S.; Karamesini, M.; Skouroliakou, A.; Kagadis, G.
2015-09-01
Purpose: The design and implementation of a computer-based image analysis system employing the support vector machine (SVM) classifier system for the classification of Focal Liver Lesions (FLLs) on routine non-enhanced, T2-weighted Magnetic Resonance (MR) images. Materials and Methods: The study comprised 92 patients; each one of them has undergone MRI performed on a Magnetom Concerto (Siemens). Typical signs on dynamic contrast-enhanced MRI and biopsies were employed towards a three class categorization of the 92 cases: 40-benign FLLs, 25-Hepatocellular Carcinomas (HCC) within Cirrhotic liver parenchyma and 27-liver metastases from Non-Cirrhotic liver. Prior to FLLs classification an automated lesion segmentation algorithm based on Marcov Random Fields was employed in order to acquire each FLL Region of Interest. 42 texture features derived from the gray-level histogram, co-occurrence and run-length matrices and 12 morphological features were obtained from each lesion. Stepwise multi-linear regression analysis was utilized to avoid feature redundancy leading to a feature subset that fed the multiclass SVM classifier designed for lesion classification. SVM System evaluation was performed by means of leave-one-out method and ROC analysis. Results: Maximum accuracy for all three classes (90.0%) was obtained by means of the Radial Basis Kernel Function and three textural features (Inverse- Different-Moment, Sum-Variance and Long-Run-Emphasis) that describe lesion's contrast, variability and shape complexity. Sensitivity values for the three classes were 92.5%, 81.5% and 96.2% respectively, whereas specificity values were 94.2%, 95.3% and 95.5%. The AUC value achieved for the selected subset was 0.89 with 0.81 - 0.94 confidence interval. Conclusion: The proposed SVM system exhibit promising results that could be utilized as a second opinion tool to the radiologist in order to decrease the time/cost of diagnosis and the need for patients to undergo invasive examination.
Large Scale Crop Classification in Ukraine using Multi-temporal Landsat-8 Images with Missing Data
NASA Astrophysics Data System (ADS)
Kussul, N.; Skakun, S.; Shelestov, A.; Lavreniuk, M. S.
2014-12-01
At present, there are no globally available Earth observation (EO) derived products on crop maps. This issue is being addressed within the Sentinel-2 for Agriculture initiative where a number of test sites (including from JECAM) participate to provide coherent protocols and best practices for various global agriculture systems, and subsequently crop maps from Sentinel-2. One of the problems in dealing with optical images for large territories (more than 10,000 sq. km) is the presence of clouds and shadows that result in having missing values in data sets. In this abstract, a new approach to classification of multi-temporal optical satellite imagery with missing data due to clouds and shadows is proposed. First, self-organizing Kohonen maps (SOMs) are used to restore missing pixel values in a time series of satellite imagery. SOMs are trained for each spectral band separately using non-missing values. Missing values are restored through a special procedure that substitutes input sample's missing components with neuron's weight coefficients. After missing data restoration, a supervised classification is performed for multi-temporal satellite images. For this, an ensemble of neural networks, in particular multilayer perceptrons (MLPs), is proposed. Ensembling of neural networks is done by the technique of average committee, i.e. to calculate the average class probability over classifiers and select the class with the highest average posterior probability for the given input sample. The proposed approach is applied for large scale crop classification using multi temporal Landsat-8 images for the JECAM test site in Ukraine [1-2]. It is shown that ensemble of MLPs provides better performance than a single neural network in terms of overall classification accuracy and kappa coefficient. The obtained classification map is also validated through estimated crop and forest areas and comparison to official statistics. 1. A.Yu. Shelestov et al., "Geospatial information system for agricultural monitoring," Cybernetics Syst. Anal., vol. 49, no. 1, pp. 124-132, 2013. 2. J. Gallego et al., "Efficiency Assessment of Different Approaches to Crop Classification Based on Satellite and Ground Observations," J. Autom. Inform. Scie., vol. 44, no. 5, pp. 67-80, 2012.
A hierarchical anatomical classification schema for prediction of phenotypic side effects
Kanji, Rakesh
2018-01-01
Prediction of adverse drug reactions is an important problem in drug discovery endeavors which can be addressed with data-driven strategies. SIDER is one of the most reliable and frequently used datasets for identification of key features as well as building machine learning models for side effects prediction. The inherently unbalanced nature of this data presents with a difficult multi-label multi-class problem towards prediction of drug side effects. We highlight the intrinsic issue with SIDER data and methodological flaws in relying on performance measures such as AUC while attempting to predict side effects.We argue for the use of metrics that are robust to class imbalance for evaluation of classifiers. Importantly, we present a ‘hierarchical anatomical classification schema’ which aggregates side effects into organs, sub-systems, and systems. With the help of a weighted performance measure, using 5-fold cross-validation we show that this strategy facilitates biologically meaningful side effects prediction at different levels of anatomical hierarchy. By implementing various machine learning classifiers we show that Random Forest model yields best classification accuracy at each level of coarse-graining. The manually curated, hierarchical schema for side effects can also serve as the basis of future studies towards prediction of adverse reactions and identification of key features linked to specific organ systems. Our study provides a strategy for hierarchical classification of side effects rooted in the anatomy and can pave the way for calibrated expert systems for multi-level prediction of side effects. PMID:29494708
A hierarchical anatomical classification schema for prediction of phenotypic side effects.
Wadhwa, Somin; Gupta, Aishwarya; Dokania, Shubham; Kanji, Rakesh; Bagler, Ganesh
2018-01-01
Prediction of adverse drug reactions is an important problem in drug discovery endeavors which can be addressed with data-driven strategies. SIDER is one of the most reliable and frequently used datasets for identification of key features as well as building machine learning models for side effects prediction. The inherently unbalanced nature of this data presents with a difficult multi-label multi-class problem towards prediction of drug side effects. We highlight the intrinsic issue with SIDER data and methodological flaws in relying on performance measures such as AUC while attempting to predict side effects.We argue for the use of metrics that are robust to class imbalance for evaluation of classifiers. Importantly, we present a 'hierarchical anatomical classification schema' which aggregates side effects into organs, sub-systems, and systems. With the help of a weighted performance measure, using 5-fold cross-validation we show that this strategy facilitates biologically meaningful side effects prediction at different levels of anatomical hierarchy. By implementing various machine learning classifiers we show that Random Forest model yields best classification accuracy at each level of coarse-graining. The manually curated, hierarchical schema for side effects can also serve as the basis of future studies towards prediction of adverse reactions and identification of key features linked to specific organ systems. Our study provides a strategy for hierarchical classification of side effects rooted in the anatomy and can pave the way for calibrated expert systems for multi-level prediction of side effects.
A Stimulus-Independent Hybrid BCI Based on Motor Imagery and Somatosensory Attentional Orientation.
Yao, Lin; Sheng, Xinjun; Zhang, Dingguo; Jiang, Ning; Mrachacz-Kersting, Natalie; Zhu, Xiangyang; Farina, Dario
2017-09-01
Distinctive EEG signals from the motor and somatosensory cortex are generated during mental tasks of motor imagery (MI) and somatosensory attentional orientation (SAO). In this paper, we hypothesize that a combination of these two signal modalities provides improvements in a brain-computer interface (BCI) performance with respect to using the two methods separately, and generate novel types of multi-class BCI systems. Thirty two subjects were randomly divided into a Control-Group and a Hybrid-Group. In the Control-Group, the subjects performed left and right hand motor imagery (i.e., L-MI and R-MI). In the Hybrid-Group, the subjects performed the four mental tasks (i.e., L-MI, R-MI, L-SAO, and R-SAO). The results indicate that combining two of the tasks in a hybrid manner (such as L-SAO and R-MI) resulted in a significantly greater classification accuracy than when using two MI tasks. The hybrid modality reached 86.1% classification accuracy on average, with a 7.70% increase with respect to MI ( ), and 7.21% to SAO ( ) alone. Moreover, all 16 subjects in the hybrid modality reached at least 70% accuracy, which is considered the threshold for BCI illiteracy. In addition to the two-class results, the classification accuracy was 68.1% and 54.1% for the three-class and four-class hybrid BCI. Combining the induced brain signals from motor and somatosensory cortex, the proposed stimulus-independent hybrid BCI has shown improved performance with respect to individual modalities, reducing the portion of BCI-illiterate subjects, and provided novel types of multi-class BCIs.
MFV-class: a multi-faceted visualization tool of object classes.
Zhang, Zhi-meng; Pan, Yun-he; Zhuang, Yue-ting
2004-11-01
Classes are key software components in an object-oriented software system. In many industrial OO software systems, there are some classes that have complicated structure and relationships. So in the processes of software maintenance, testing, software reengineering, software reuse and software restructure, it is a challenge for software engineers to understand these classes thoroughly. This paper proposes a class comprehension model based on constructivist learning theory, and implements a software visualization tool (MFV-Class) to help in the comprehension of a class. The tool provides multiple views of class to uncover manifold facets of class contents. It enables visualizing three object-oriented metrics of classes to help users focus on the understanding process. A case study was conducted to evaluate our approach and the toolkit.
A Novel Multi-Class Ensemble Model for Classifying Imbalanced Biomedical Datasets
NASA Astrophysics Data System (ADS)
Bikku, Thulasi; Sambasiva Rao, N., Dr; Rao, Akepogu Ananda, Dr
2017-08-01
This paper mainly focuseson developing aHadoop based framework for feature selection and classification models to classify high dimensionality data in heterogeneous biomedical databases. Wide research has been performing in the fields of Machine learning, Big data and Data mining for identifying patterns. The main challenge is extracting useful features generated from diverse biological systems. The proposed model can be used for predicting diseases in various applications and identifying the features relevant to particular diseases. There is an exponential growth of biomedical repositories such as PubMed and Medline, an accurate predictive model is essential for knowledge discovery in Hadoop environment. Extracting key features from unstructured documents often lead to uncertain results due to outliers and missing values. In this paper, we proposed a two phase map-reduce framework with text preprocessor and classification model. In the first phase, mapper based preprocessing method was designed to eliminate irrelevant features, missing values and outliers from the biomedical data. In the second phase, a Map-Reduce based multi-class ensemble decision tree model was designed and implemented in the preprocessed mapper data to improve the true positive rate and computational time. The experimental results on the complex biomedical datasets show that the performance of our proposed Hadoop based multi-class ensemble model significantly outperforms state-of-the-art baselines.
A Features Selection for Crops Classification
NASA Astrophysics Data System (ADS)
Liu, Yifan; Shao, Luyi; Yin, Qiang; Hong, Wen
2016-08-01
The components of the polarimetric target decomposition reflect the differences of target since they linked with the scattering properties of the target and can be imported into SVM as the classification features. The result of decomposition usually concentrate on part of the components. Selecting a combination of components can reduce the features that importing into the SVM. The features reduction can lead to less calculation and targeted classification of one target when we classify a multi-class area. In this research, we import different combinations of features into the SVM and find a better combination for classification with a data of AGRISAR.
NASA Astrophysics Data System (ADS)
Huang, Xin; Chen, Huijun; Gong, Jianya
2018-01-01
Spaceborne multi-angle images with a high-resolution are capable of simultaneously providing spatial details and three-dimensional (3D) information to support detailed and accurate classification of complex urban scenes. In recent years, satellite-derived digital surface models (DSMs) have been increasingly utilized to provide height information to complement spectral properties for urban classification. However, in such a way, the multi-angle information is not effectively exploited, which is mainly due to the errors and difficulties of the multi-view image matching and the inaccuracy of the generated DSM over complex and dense urban scenes. Therefore, it is still a challenging task to effectively exploit the available angular information from high-resolution multi-angle images. In this paper, we investigate the potential for classifying urban scenes based on local angular properties characterized from high-resolution ZY-3 multi-view images. Specifically, three categories of angular difference features (ADFs) are proposed to describe the angular information at three levels (i.e., pixel, feature, and label levels): (1) ADF-pixel: the angular information is directly extrapolated by pixel comparison between the multi-angle images; (2) ADF-feature: the angular differences are described in the feature domains by comparing the differences between the multi-angle spatial features (e.g., morphological attribute profiles (APs)). (3) ADF-label: label-level angular features are proposed based on a group of urban primitives (e.g., buildings and shadows), in order to describe the specific angular information related to the types of primitive classes. In addition, we utilize spatial-contextual information to refine the multi-level ADF features using superpixel segmentation, for the purpose of alleviating the effects of salt-and-pepper noise and representing the main angular characteristics within a local area. The experiments on ZY-3 multi-angle images confirm that the proposed ADF features can effectively improve the accuracy of urban scene classification, with a significant increase in overall accuracy (3.8-11.7%) compared to using the spectral bands alone. Furthermore, the results indicated the superiority of the proposed ADFs in distinguishing between the spectrally similar and complex man-made classes, including roads and various types of buildings (e.g., high buildings, urban villages, and residential apartments).
Manifold alignment with Schroedinger eigenmaps
NASA Astrophysics Data System (ADS)
Johnson, Juan E.; Bachmann, Charles M.; Cahill, Nathan D.
2016-05-01
The sun-target-sensor angle can change during aerial remote sensing. In an attempt to compensate BRDF effects in multi-angular hyperspectral images, the Semi-Supervised Manifold Alignment (SSMA) algorithm pulls data from similar classes together and pushes data from different classes apart. SSMA uses Laplacian Eigenmaps (LE) to preserve the original geometric structure of each local data set independently. In this paper, we replace LE with Spatial-Spectral Schoedinger Eigenmaps (SSSE) which was designed to be a semisupervised enhancement to the to extend the SSMA methodology and improve classification of multi-angular hyperspectral images captured over Hog Island in the Virginia Coast Reserve.
Advanced Methods for Passive Acoustic Detection, Classification, and Localization of Marine Mammals
2014-09-30
floor 1176 Howell St Newport RI 02842 phone: (401) 832-5749 fax: (401) 832-4441 email: David.Moretti@navy.mil Steve W. Martin SPAWAR...APPROACH Odontocete click detection and classification. A multi-class support vector machine (SVM) classifier was previously developed ( Jarvis ...beaked whales, Risso’s dolphins, short-finned pilot whales, and sperm whales. Here Moretti’s group, particularly S. Jarvis , is improving the SVM
Intelligent feature selection techniques for pattern classification of Lamb wave signals
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hinders, Mark K.; Miller, Corey A.
2014-02-18
Lamb wave interaction with flaws is a complex, three-dimensional phenomenon, which often frustrates signal interpretation schemes based on mode arrival time shifts predicted by dispersion curves. As the flaw severity increases, scattering and mode conversion effects will often dominate the time-domain signals, obscuring available information about flaws because multiple modes may arrive on top of each other. Even for idealized flaw geometries the scattering and mode conversion behavior of Lamb waves is very complex. Here, multi-mode Lamb waves in a metal plate are propagated across a rectangular flat-bottom hole in a sequence of pitch-catch measurements corresponding to the double crossholemore » tomography geometry. The flaw is sequentially deepened, with the Lamb wave measurements repeated at each flaw depth. Lamb wave tomography reconstructions are used to identify which waveforms have interacted with the flaw and thereby carry information about its depth. Multiple features are extracted from each of the Lamb wave signals using wavelets, which are then fed to statistical pattern classification algorithms that identify flaw severity. In order to achieve the highest classification accuracy, an optimal feature space is required but it’s never known a priori which features are going to be best. For structural health monitoring we make use of the fact that physical flaws, such as corrosion, will only increase over time. This allows us to identify feature vectors which are topologically well-behaved by requiring that sequential classes “line up” in feature vector space. An intelligent feature selection routine is illustrated that identifies favorable class distributions in multi-dimensional feature spaces using computational homology theory. Betti numbers and formal classification accuracies are calculated for each feature space subset to establish a correlation between the topology of the class distribution and the corresponding classification accuracy.« less
Enhanced Data Representation by Kernel Metric Learning for Dementia Diagnosis
Cárdenas-Peña, David; Collazos-Huertas, Diego; Castellanos-Dominguez, German
2017-01-01
Alzheimer's disease (AD) is the kind of dementia that affects the most people around the world. Therefore, an early identification supporting effective treatments is required to increase the life quality of a wide number of patients. Recently, computer-aided diagnosis tools for dementia using Magnetic Resonance Imaging scans have been successfully proposed to discriminate between patients with AD, mild cognitive impairment, and healthy controls. Most of the attention has been given to the clinical data, provided by initiatives as the ADNI, supporting reliable researches on intervention, prevention, and treatments of AD. Therefore, there is a need for improving the performance of classification machines. In this paper, we propose a kernel framework for learning metrics that enhances conventional machines and supports the diagnosis of dementia. Our framework aims at building discriminative spaces through the maximization of center kernel alignment function, aiming at improving the discrimination of the three considered neurological classes. The proposed metric learning performance is evaluated on the widely-known ADNI database using three supervised classification machines (k-nn, SVM and NNs) for multi-class and bi-class scenarios from structural MRIs. Specifically, from ADNI collection 286 AD patients, 379 MCI patients and 231 healthy controls are used for development and validation of our proposed metric learning framework. For the experimental validation, we split the data into two subsets: 30% of subjects used like a blindfolded assessment and 70% employed for parameter tuning. Then, in the preprocessing stage, each structural MRI scan a total of 310 morphological measurements are automatically extracted from by FreeSurfer software package and concatenated to build an input feature matrix. Obtained test performance results, show that including a supervised metric learning improves the compared baseline classifiers in both scenarios. In the multi-class scenario, we achieve the best performance (accuracy 60.1%) for pretrained 1-layered NN, and we obtain measures over 90% in the average for HC vs. AD task. From the machine learning point of view, our proposal enhances the classifier performance by building spaces with a better class separability. From the clinical application, our enhancement results in a more balanced performance in each class than the compared approaches from the CADDementia challenge by increasing the sensitivity of pathological groups and the specificity of healthy controls. PMID:28798659
Youn, Su Hyun; Sim, Taeyong; Choi, Ahnryul; Song, Jinsung; Shin, Ki Young; Lee, Il Kwon; Heo, Hyun Mu; Lee, Daeweon; Mun, Joung Hwan
2015-06-01
Ultrasonic surgical units (USUs) have the advantage of minimizing tissue damage during surgeries that require tissue dissection by reducing problems such as coagulation and unwanted carbonization, but the disadvantage of requiring manual adjustment of power output according to the target tissue. In order to overcome this limitation, it is necessary to determine the properties of in vivo tissues automatically. We propose a multi-classifier that can accurately classify tissues based on the unique impedance of each tissue. For this purpose, a multi-classifier was built based on single classifiers with high classification rates, and the classification accuracy of the proposed model was compared with that of single classifiers for various electrode types (Type-I: 6 mm invasive; Type-II: 3 mm invasive; Type-III: surface). The sensitivity and positive predictive value (PPV) of the multi-classifier by cross checks were determined. According to the 10-fold cross validation results, the classification accuracy of the proposed model was significantly higher (p<0.05 or <0.01) than that of existing single classifiers for all electrode types. In particular, the classification accuracy of the proposed model was highest when the 3mm invasive electrode (Type-II) was used (sensitivity=97.33-100.00%; PPV=96.71-100.00%). The results of this study are an important contribution to achieving automatic optimal output power adjustment of USUs according to the properties of individual tissues. Copyright © 2015 Elsevier Ltd. All rights reserved.
Multi-level discriminative dictionary learning with application to large scale image classification.
Shen, Li; Sun, Gang; Huang, Qingming; Wang, Shuhui; Lin, Zhouchen; Wu, Enhua
2015-10-01
The sparse coding technique has shown flexibility and capability in image representation and analysis. It is a powerful tool in many visual applications. Some recent work has shown that incorporating the properties of task (such as discrimination for classification task) into dictionary learning is effective for improving the accuracy. However, the traditional supervised dictionary learning methods suffer from high computation complexity when dealing with large number of categories, making them less satisfactory in large scale applications. In this paper, we propose a novel multi-level discriminative dictionary learning method and apply it to large scale image classification. Our method takes advantage of hierarchical category correlation to encode multi-level discriminative information. Each internal node of the category hierarchy is associated with a discriminative dictionary and a classification model. The dictionaries at different layers are learnt to capture the information of different scales. Moreover, each node at lower layers also inherits the dictionary of its parent, so that the categories at lower layers can be described with multi-scale information. The learning of dictionaries and associated classification models is jointly conducted by minimizing an overall tree loss. The experimental results on challenging data sets demonstrate that our approach achieves excellent accuracy and competitive computation cost compared with other sparse coding methods for large scale image classification.
Benchmark of Machine Learning Methods for Classification of a SENTINEL-2 Image
NASA Astrophysics Data System (ADS)
Pirotti, F.; Sunar, F.; Piragnolo, M.
2016-06-01
Thanks to mainly ESA and USGS, a large bulk of free images of the Earth is readily available nowadays. One of the main goals of remote sensing is to label images according to a set of semantic categories, i.e. image classification. This is a very challenging issue since land cover of a specific class may present a large spatial and spectral variability and objects may appear at different scales and orientations. In this study, we report the results of benchmarking 9 machine learning algorithms tested for accuracy and speed in training and classification of land-cover classes in a Sentinel-2 dataset. The following machine learning methods (MLM) have been tested: linear discriminant analysis, k-nearest neighbour, random forests, support vector machines, multi layered perceptron, multi layered perceptron ensemble, ctree, boosting, logarithmic regression. The validation is carried out using a control dataset which consists of an independent classification in 11 land-cover classes of an area about 60 km2, obtained by manual visual interpretation of high resolution images (20 cm ground sampling distance) by experts. In this study five out of the eleven classes are used since the others have too few samples (pixels) for testing and validating subsets. The classes used are the following: (i) urban (ii) sowable areas (iii) water (iv) tree plantations (v) grasslands. Validation is carried out using three different approaches: (i) using pixels from the training dataset (train), (ii) using pixels from the training dataset and applying cross-validation with the k-fold method (kfold) and (iii) using all pixels from the control dataset. Five accuracy indices are calculated for the comparison between the values predicted with each model and control values over three sets of data: the training dataset (train), the whole control dataset (full) and with k-fold cross-validation (kfold) with ten folds. Results from validation of predictions of the whole dataset (full) show the random forests method with the highest values; kappa index ranging from 0.55 to 0.42 respectively with the most and least number pixels for training. The two neural networks (multi layered perceptron and its ensemble) and the support vector machines - with default radial basis function kernel - methods follow closely with comparable performance.
2011-01-01
Background The selection of relevant articles for curation, and linking those articles to experimental techniques confirming the findings became one of the primary subjects of the recent BioCreative III contest. The contest’s Protein-Protein Interaction (PPI) task consisted of two sub-tasks: Article Classification Task (ACT) and Interaction Method Task (IMT). ACT aimed to automatically select relevant documents for PPI curation, whereas the goal of IMT was to recognise the methods used in experiments for identifying the interactions in full-text articles. Results We proposed and compared several classification-based methods for both tasks, employing rich contextual features as well as features extracted from external knowledge sources. For IMT, a new method that classifies pair-wise relations between every text phrase and candidate interaction method obtained promising results with an F1 score of 64.49%, as tested on the task’s development dataset. We also explored ways to combine this new approach and more conventional, multi-label document classification methods. For ACT, our classifiers exploited automatically detected named entities and other linguistic information. The evaluation results on the BioCreative III PPI test datasets showed that our systems were very competitive: one of our IMT methods yielded the best performance among all participants, as measured by F1 score, Matthew’s Correlation Coefficient and AUC iP/R; whereas for ACT, our best classifier was ranked second as measured by AUC iP/R, and also competitive according to other metrics. Conclusions Our novel approach that converts the multi-class, multi-label classification problem to a binary classification problem showed much promise in IMT. Nevertheless, on the test dataset the best performance was achieved by taking the union of the output of this method and that of a multi-class, multi-label document classifier, which indicates that the two types of systems complement each other in terms of recall. For ACT, our system exploited a rich set of features and also obtained encouraging results. We examined the features with respect to their contributions to the classification results, and concluded that contextual words surrounding named entities, as well as the MeSH headings associated with the documents were among the main contributors to the performance. PMID:22151769
Object-based class modelling for multi-scale riparian forest habitat mapping
NASA Astrophysics Data System (ADS)
Strasser, Thomas; Lang, Stefan
2015-05-01
Object-based class modelling allows for mapping complex, hierarchical habitat systems. The riparian zone, including forests, represents such a complex ecosystem. Forests within riparian zones are biologically high productive and characterized by a rich biodiversity; thus considered of high community interest with an imperative to be protected and regularly monitored. Satellite earth observation (EO) provides tools for capturing the current state of forest habitats such as forest composition including intermixture of non-native tree species. Here we present a semi-automated object based image analysis (OBIA) approach for the mapping of riparian forests by applying class modelling of habitats based on the European Nature Information System (EUNIS) habitat classifications and the European Habitats Directive (HabDir) Annex 1. A very high resolution (VHR) WorldView-2 satellite image provided the required spatial and spectral details for a multi-scale image segmentation and rule-base composition to generate a six-level hierarchical representation of riparian forest habitats. Thereby habitats were hierarchically represented within an image object hierarchy as forest stands, stands of homogenous tree species and single trees represented by sunlit tree crowns. 522 EUNIS level 3 (EUNIS-3) habitat patches with a mean patch size (MPS) of 12,349.64 m2 were modelled from 938 forest stand patches (MPS = 6868.20 m2) and 43,742 tree stand patches (MPS = 140.79 m2). The delineation quality of the modelled EUNIS-3 habitats (focal level) was quantitatively assessed to an expert-based visual interpretation showing a mean deviation of 11.71%.
NASA Astrophysics Data System (ADS)
Hammann, Mark Gregory
The fusion of electro-optical (EO) multi-spectral satellite imagery with Synthetic Aperture Radar (SAR) data was explored with the working hypothesis that the addition of multi-band SAR will increase the land-cover (LC) classification accuracy compared to EO alone. Three satellite sources for SAR imagery were used: X-band from TerraSAR-X, C-band from RADARSAT-2, and L-band from PALSAR. Images from the RapidEye satellites were the source of the EO imagery. Imagery from the GeoEye-1 and WorldView-2 satellites aided the selection of ground truth. Three study areas were chosen: Wad Medani, Sudan; Campinas, Brazil; and Fresno- Kings Counties, USA. EO imagery were radiometrically calibrated, atmospherically compensated, orthorectifed, co-registered, and clipped to a common area of interest (AOI). SAR imagery were radiometrically calibrated, and geometrically corrected for terrain and incidence angle by converting to ground range and Sigma Naught (?0). The original SAR HH data were included in the fused image stack after despeckling with a 3x3 Enhanced Lee filter. The variance and Gray-Level-Co-occurrence Matrix (GLCM) texture measures of contrast, entropy, and correlation were derived from the non-despeckled SAR HH bands. Data fusion was done with layer stacking and all data were resampled to a common spatial resolution. The Support Vector Machine (SVM) decision rule was used for the supervised classifications. Similar LC classes were identified and tested for each study area. For Wad Medani, nine classes were tested: low and medium intensity urban, sparse forest, water, barren ground, and four agriculture classes (fallow, bare agricultural ground, green crops, and orchards). For Campinas, Brazil, five generic classes were tested: urban, agriculture, forest, water, and barren ground. For the Fresno-Kings Counties location 11 classes were studied: three generic classes (urban, water, barren land), and eight specific crops. In all cases the addition of SAR to EO resulted in higher overall classification accuracies. In many cases using more than a single SAR band also improved the classification accuracy. There was no single best SAR band for all cases; for specific study areas or LC classes, different SAR bands were better. For Wad Medani, the overall accuracy increased nearly 25% over EO by using all three SAR bands and GLCM texture. For Campinas, the improvement over EO was 4.3%; the large areas of vegetation were classified by EO with good accuracy. At Fresno-Kings Counties, EO+SAR fusion improved the overall classification accuracy by 7%. For times or regions where EO is not available due to extended cloud cover, classification with SAR is often the only option; note that SAR alone typically results in lower classification accuracies than when using EO or EO-SAR fusion. Fusion of EO and SAR was especially important to improve the separability of orchards from other crops, and separating urban areas with buildings from bare soil; those classes are difficult to accurately separate with EO. The outcome of this dissertation contributes to the understanding of the benefits of combining data from EO imagery with different SAR bands and SAR derived texture data to identify different LC classes. In times of increased public and private budget constraints and industry consolidation, this dissertation provides insight as to which band packages could be most useful for increased accuracy in LC classification.
NASA Astrophysics Data System (ADS)
Murawski, Aline; Bürger, Gerd; Vorogushyn, Sergiy; Merz, Bruno
2016-04-01
The use of a weather pattern based approach for downscaling of coarse, gridded atmospheric data, as usually obtained from the output of general circulation models (GCM), allows for investigating the impact of anthropogenic greenhouse gas emissions on fluxes and state variables of the hydrological cycle such as e.g. on runoff in large river catchments. Here we aim at attributing changes in high flows in the Rhine catchment to anthropogenic climate change. Therefore we run an objective classification scheme (simulated annealing and diversified randomisation - SANDRA, available from the cost733 classification software) on ERA20C reanalyses data and apply the established classification to GCMs from the CMIP5 project. After deriving weather pattern time series from GCM runs using forcing from all greenhouse gases (All-Hist) and using natural greenhouse gas forcing only (Nat-Hist), a weather generator will be employed to obtain climate data time series for the hydrological model. The parameters of the weather pattern classification (i.e. spatial extent, number of patterns, classification variables) need to be selected in a way that allows for good stratification of the meteorological variables that are of interest for the hydrological modelling. We evaluate the skill of the classification in stratifying meteorological data using a multi-variable approach. This allows for estimating the stratification skill for all meteorological variables together, not separately as usually done in existing similar work. The advantage of the multi-variable approach is to properly account for situations where e.g. two patterns are associated with similar mean daily temperature, but one pattern is dry while the other one is related to considerable amounts of precipitation. Thus, the separation of these two patterns would not be justified when considering temperature only, but is perfectly reasonable when accounting for precipitation as well. Besides that, the weather patterns derived from reanalyses data should be well represented in the All-Hist GCM runs in terms of e.g. frequency, seasonality, and persistence. In this contribution we show how to select the most appropriate weather pattern classification and how the classes derived from it are reflected in the GCMs.
ERIC Educational Resources Information Center
Yalcin, Seher
2018-01-01
In this study, it is aimed to distinguish the reading skills of students participating in PISA 2015 application into multi-level latent classes at the student and country level. Furthermore, it is aimed to examine how the clusters emerged at country-level is predicted by variables as students have the information and communication technology (ICT)…
2014-01-01
Background Laccases (E.C. 1.10.3.2) are multi-copper oxidases that have gained importance in many industries such as biofuels, pulp production, textile dye bleaching, bioremediation, and food production. Their usefulness stems from the ability to act on a diverse range of phenolic compounds such as o-/p-quinols, aminophenols, polyphenols, polyamines, aryl diamines, and aromatic thiols. Despite acting on a wide range of compounds as a family, individual Laccases often exhibit distinctive and varied substrate ranges. This is likely due to Laccases involvement in many metabolic roles across diverse taxa. Classification systems for multi-copper oxidases have been developed using multiple sequence alignments, however, these systems seem to largely follow species taxonomy rather than substrate ranges, enzyme properties, or specific function. It has been suggested that the roles and substrates of various Laccases are related to their optimal pH. This is consistent with the observation that fungal Laccases usually prefer acidic conditions, whereas plant and bacterial Laccases prefer basic conditions. Based on these observations, we hypothesize that a descriptor-based unsupervised learning system could generate homology independent classification system for better describing the functional properties of Laccases. Results In this study, we first utilized unsupervised learning approach to develop a novel homology independent Laccase classification system. From the descriptors considered, physicochemical properties showed the best performance. Physicochemical properties divided the Laccases into twelve subtypes. Analysis of the clusters using a t-test revealed that the majority of the physicochemical descriptors had statistically significant differences between the classes. Feature selection identified the most important features as negatively charges residues, the peptide isoelectric point, and acidic or amidic residues. Secondly, to allow for classification of new Laccases, a supervised learning system was developed from the clusters. The models showed high performance with an overall accuracy of 99.03%, error of 0.49%, MCC of 0.9367, precision of 94.20%, sensitivity of 94.20%, and specificity of 99.47% in a 5-fold cross-validation test. In an independent test, our models still provide a high accuracy of 97.98%, error rate of 1.02%, MCC of 0.8678, precision of 87.88%, sensitivity of 87.88% and specificity of 98.90%. Conclusion This study provides a useful classification system for better understanding of Laccases from their physicochemical properties perspective. We also developed a publically available web tool for the characterization of Laccase protein sequences (http://lacsubpred.bioinfo.ucr.edu/). Finally, the programs used in the study are made available for researchers interested in applying the system to other enzyme classes (https://github.com/tweirick/SubClPred). PMID:25350584
NASA Astrophysics Data System (ADS)
Langer, H. K.; Falsaperla, S. M.; Behncke, B.; Messina, A.; Spampinato, S.
2009-12-01
Artificial Intelligence (AI) has found broad applications in volcano observatories worldwide with the aim of reducing volcanic hazard. The need to process larger and larger quantity of data makes indeed AI techniques appealing for monitoring purposes. Tools based on Artificial Neural Networks and Support Vector Machine have proved to be particularly successful in the classification of seismic events and volcanic tremor changes heralding eruptive activity, such as paroxysmal explosions and lava fountaining at Stromboli and Mt Etna, Italy (e.g., Falsaperla et al., 1996; Langer et al., 2009). Moving on from the excellent results obtained from these applications, we present KKAnalysis, a MATLAB based software which combines several unsupervised pattern classification methods, exploiting routines of the SOM Toolbox 2 for MATLAB (http://www.cis.hut.fi/projects/somtoolbox). KKAnalysis is based on Self Organizing Maps (SOM) and clustering methods consisting of K-Means, Fuzzy C-Means, and a scheme based on a metrics accounting for correlation between components of the feature vector. We show examples of applications of this tool to volcanic tremor data recorded at Mt Etna between 2007 and 2009. This time span - during which Strombolian explosions, 7 episodes of lava fountaining and effusive activity occurred - is particularly interesting, as it encompassed different states of volcanic activity (i.e., non-eruptive, eruptive according to different styles) for the unsupervised classifier to identify, highlighting their development in time. Even subtle changes in the signal characteristics allow the unsupervised classifier to recognize features belonging to the different classes and stages of volcanic activity. A convenient color-code representation shows up the temporal development of the different classes of signal, making this method extremely helpful for monitoring purposes and surveillance. Though being developed for volcanic tremor classification, KKAnalysis is generally applicable to any type of physical or chemical pattern, provided that feature vectors are given in numerical form. References: Falsaperla, S., S. Graziani, G. Nunnari, and S. Spampinato (1996). Automatic classification of volcanic earthquakes by using multy-layered neural networks. Natural Hazard, 13, 205-228. Langer, H., S. Falsaperla, M. Masotti, R. Campanini, S. Spampinato, and A. Messina (2008). Synopsis of supervised and unsupervised pattern classification techniques applied to volcanic tremor data at Mt Etna, Italy. Geophys. J. Int., doi:10.1111/j.1365-246X.2009.04179.x.
NASA Astrophysics Data System (ADS)
Zhao, Lili; Yin, Jianping; Yuan, Lihuan; Liu, Qiang; Li, Kuan; Qiu, Minghui
2017-07-01
Automatic detection of abnormal cells from cervical smear images is extremely demanded in annual diagnosis of women's cervical cancer. For this medical cell recognition problem, there are three different feature sections, namely cytology morphology, nuclear chromatin pathology and region intensity. The challenges of this problem come from feature combination s and classification accurately and efficiently. Thus, we propose an efficient abnormal cervical cell detection system based on multi-instance extreme learning machine (MI-ELM) to deal with above two questions in one unified framework. MI-ELM is one of the most promising supervised learning classifiers which can deal with several feature sections and realistic classification problems analytically. Experiment results over Herlev dataset demonstrate that the proposed method outperforms three traditional methods for two-class classification in terms of well accuracy and less time.
Taghanaki, Saeid Asgari; Kawahara, Jeremy; Miles, Brandon; Hamarneh, Ghassan
2017-07-01
Feature reduction is an essential stage in computer aided breast cancer diagnosis systems. Multilayer neural networks can be trained to extract relevant features by encoding high-dimensional data into low-dimensional codes. Optimizing traditional auto-encoders works well only if the initial weights are close to a proper solution. They are also trained to only reduce the mean squared reconstruction error (MRE) between the encoder inputs and the decoder outputs, but do not address the classification error. The goal of the current work is to test the hypothesis that extending traditional auto-encoders (which only minimize reconstruction error) to multi-objective optimization for finding Pareto-optimal solutions provides more discriminative features that will improve classification performance when compared to single-objective and other multi-objective approaches (i.e. scalarized and sequential). In this paper, we introduce a novel multi-objective optimization of deep auto-encoder networks, in which the auto-encoder optimizes two objectives: MRE and mean classification error (MCE) for Pareto-optimal solutions, rather than just MRE. These two objectives are optimized simultaneously by a non-dominated sorting genetic algorithm. We tested our method on 949 X-ray mammograms categorized into 12 classes. The results show that the features identified by the proposed algorithm allow a classification accuracy of up to 98.45%, demonstrating favourable accuracy over the results of state-of-the-art methods reported in the literature. We conclude that adding the classification objective to the traditional auto-encoder objective and optimizing for finding Pareto-optimal solutions, using evolutionary multi-objective optimization, results in producing more discriminative features. Copyright © 2017 Elsevier B.V. All rights reserved.
A neural network approach for enhancing information extraction from multispectral image data
Liu, J.; Shao, G.; Zhu, H.; Liu, S.
2005-01-01
A back-propagation artificial neural network (ANN) was applied to classify multispectral remote sensing imagery data. The classification procedure included four steps: (i) noisy training that adds minor random variations to the sampling data to make the data more representative and to reduce the training sample size; (ii) iterative or multi-tier classification that reclassifies the unclassified pixels by making a subset of training samples from the original training set, which means the neural model can focus on fewer classes; (iii) spectral channel selection based on neural network weights that can distinguish the relative importance of each channel in the classification process to simplify the ANN model; and (iv) voting rules that adjust the accuracy of classification and produce outputs of different confidence levels. The Purdue Forest, located west of Purdue University, West Lafayette, Indiana, was chosen as the test site. The 1992 Landsat thematic mapper imagery was used as the input data. High-quality airborne photographs of the same Lime period were used for the ground truth. A total of 11 land use and land cover classes were defined, including water, broadleaved forest, coniferous forest, young forest, urban and road, and six types of cropland-grassland. The experiment, indicated that the back-propagation neural network application was satisfactory in distinguishing different land cover types at US Geological Survey levels II-III. The single-tier classification reached an overall accuracy of 85%. and the multi-tier classification an overall accuracy of 95%. For the whole test, region, the final output of this study reached an overall accuracy of 87%. ?? 2005 CASI.
CNN-BLPred: a Convolutional neural network based predictor for β-Lactamases (BL) and their classes.
White, Clarence; Ismail, Hamid D; Saigo, Hiroto; Kc, Dukka B
2017-12-28
The β-Lactamase (BL) enzyme family is an important class of enzymes that plays a key role in bacterial resistance to antibiotics. As the newly identified number of BL enzymes is increasing daily, it is imperative to develop a computational tool to classify the newly identified BL enzymes into one of its classes. There are two types of classification of BL enzymes: Molecular Classification and Functional Classification. Existing computational methods only address Molecular Classification and the performance of these existing methods is unsatisfactory. We addressed the unsatisfactory performance of the existing methods by implementing a Deep Learning approach called Convolutional Neural Network (CNN). We developed CNN-BLPred, an approach for the classification of BL proteins. The CNN-BLPred uses Gradient Boosted Feature Selection (GBFS) in order to select the ideal feature set for each BL classification. Based on the rigorous benchmarking of CCN-BLPred using both leave-one-out cross-validation and independent test sets, CCN-BLPred performed better than the other existing algorithms. Compared with other architectures of CNN, Recurrent Neural Network, and Random Forest, the simple CNN architecture with only one convolutional layer performs the best. After feature extraction, we were able to remove ~95% of the 10,912 features using Gradient Boosted Trees. During 10-fold cross validation, we increased the accuracy of the classic BL predictions by 7%. We also increased the accuracy of Class A, Class B, Class C, and Class D performance by an average of 25.64%. The independent test results followed a similar trend. We implemented a deep learning algorithm known as Convolutional Neural Network (CNN) to develop a classifier for BL classification. Combined with feature selection on an exhaustive feature set and using balancing method such as Random Oversampling (ROS), Random Undersampling (RUS) and Synthetic Minority Oversampling Technique (SMOTE), CNN-BLPred performs significantly better than existing algorithms for BL classification.
NASA Astrophysics Data System (ADS)
Schudlo, Larissa C.; Chau, Tom
2015-12-01
Objective. The majority of near-infrared spectroscopy (NIRS) brain-computer interface (BCI) studies have investigated binary classification problems. Limited work has considered differentiation of more than two mental states, or multi-class differentiation of higher-level cognitive tasks using measurements outside of the anterior prefrontal cortex. Improvements in accuracies are needed to deliver effective communication with a multi-class NIRS system. We investigated the feasibility of a ternary NIRS-BCI that supports mental states corresponding to verbal fluency task (VFT) performance, Stroop task performance, and unconstrained rest using prefrontal and parietal measurements. Approach. Prefrontal and parietal NIRS signals were acquired from 11 able-bodied adults during rest and performance of the VFT or Stroop task. Classification was performed offline using bagging with a linear discriminant base classifier trained on a 10 dimensional feature set. Main results. VFT, Stroop task and rest were classified at an average accuracy of 71.7% ± 7.9%. The ternary classification system provided a statistically significant improvement in information transfer rate relative to a binary system controlled by either mental task (0.87 ± 0.35 bits/min versus 0.73 ± 0.24 bits/min). Significance. These results suggest that effective communication can be achieved with a ternary NIRS-BCI that supports VFT, Stroop task and rest via measurements from the frontal and parietal cortices. Further development of such a system is warranted. Accurate ternary classification can enhance communication rates offered by NIRS-BCIs, improving the practicality of this technology.
Skimming Digits: Neuromorphic Classification of Spike-Encoded Images
Cohen, Gregory K.; Orchard, Garrick; Leng, Sio-Hoi; Tapson, Jonathan; Benosman, Ryad B.; van Schaik, André
2016-01-01
The growing demands placed upon the field of computer vision have renewed the focus on alternative visual scene representations and processing paradigms. Silicon retinea provide an alternative means of imaging the visual environment, and produce frame-free spatio-temporal data. This paper presents an investigation into event-based digit classification using N-MNIST, a neuromorphic dataset created with a silicon retina, and the Synaptic Kernel Inverse Method (SKIM), a learning method based on principles of dendritic computation. As this work represents the first large-scale and multi-class classification task performed using the SKIM network, it explores different training patterns and output determination methods necessary to extend the original SKIM method to support multi-class problems. Making use of SKIM networks applied to real-world datasets, implementing the largest hidden layer sizes and simultaneously training the largest number of output neurons, the classification system achieved a best-case accuracy of 92.87% for a network containing 10,000 hidden layer neurons. These results represent the highest accuracies achieved against the dataset to date and serve to validate the application of the SKIM method to event-based visual classification tasks. Additionally, the study found that using a square pulse as the supervisory training signal produced the highest accuracy for most output determination methods, but the results also demonstrate that an exponential pattern is better suited to hardware implementations as it makes use of the simplest output determination method based on the maximum value. PMID:27199646
WND-CHARM: Multi-purpose image classification using compound image transforms
Orlov, Nikita; Shamir, Lior; Macura, Tomasz; Johnston, Josiah; Eckley, D. Mark; Goldberg, Ilya G.
2008-01-01
We describe a multi-purpose image classifier that can be applied to a wide variety of image classification tasks without modifications or fine-tuning, and yet provide classification accuracy comparable to state-of-the-art task-specific image classifiers. The proposed image classifier first extracts a large set of 1025 image features including polynomial decompositions, high contrast features, pixel statistics, and textures. These features are computed on the raw image, transforms of the image, and transforms of transforms of the image. The feature values are then used to classify test images into a set of pre-defined image classes. This classifier was tested on several different problems including biological image classification and face recognition. Although we cannot make a claim of universality, our experimental results show that this classifier performs as well or better than classifiers developed specifically for these image classification tasks. Our classifier’s high performance on a variety of classification problems is attributed to (i) a large set of features extracted from images; and (ii) an effective feature selection and weighting algorithm sensitive to specific image classification problems. The algorithms are available for free download from openmicroscopy.org. PMID:18958301
Twarog, Nathaniel R.; Low, Jonathan A.; Currier, Duane G.; Miller, Greg; Chen, Taosheng; Shelat, Anang A.
2016-01-01
Phenotypic screening through high-content automated microscopy is a powerful tool for evaluating the mechanism of action of candidate therapeutics. Despite more than a decade of development, however, high content assays have yielded mixed results, identifying robust phenotypes in only a small subset of compound classes. This has led to a combinatorial explosion of assay techniques, analyzing cellular phenotypes across dozens of assays with hundreds of measurements. Here, using a minimalist three-stain assay and only 23 basic cellular measurements, we developed an analytical approach that leverages informative dimensions extracted by linear discriminant analysis to evaluate similarity between the phenotypic trajectories of different compounds in response to a range of doses. This method enabled us to visualize biologically-interpretable phenotypic tracks populated by compounds of similar mechanism of action, cluster compounds according to phenotypic similarity, and classify novel compounds by comparing them to phenotypically active exemplars. Hierarchical clustering applied to 154 compounds from over a dozen different mechanistic classes demonstrated tight agreement with published compound mechanism classification. Using 11 phenotypically active mechanism classes, classification was performed on all 154 compounds: 78% were correctly identified as belonging to one of the 11 exemplar classes or to a different unspecified class, with accuracy increasing to 89% when less phenotypically active compounds were excluded. Importantly, several apparent clustering and classification failures, including rigosertib and 5-fluoro-2’-deoxycytidine, instead revealed more complex mechanisms or off-target effects verified by more recent publications. These results show that a simple, easily replicated, minimalist high-content assay can reveal subtle variations in the cellular phenotype induced by compounds and can correctly predict mechanism of action, as long as the appropriate analytical tools are used. PMID:26886014
Twarog, Nathaniel R; Low, Jonathan A; Currier, Duane G; Miller, Greg; Chen, Taosheng; Shelat, Anang A
2016-01-01
Phenotypic screening through high-content automated microscopy is a powerful tool for evaluating the mechanism of action of candidate therapeutics. Despite more than a decade of development, however, high content assays have yielded mixed results, identifying robust phenotypes in only a small subset of compound classes. This has led to a combinatorial explosion of assay techniques, analyzing cellular phenotypes across dozens of assays with hundreds of measurements. Here, using a minimalist three-stain assay and only 23 basic cellular measurements, we developed an analytical approach that leverages informative dimensions extracted by linear discriminant analysis to evaluate similarity between the phenotypic trajectories of different compounds in response to a range of doses. This method enabled us to visualize biologically-interpretable phenotypic tracks populated by compounds of similar mechanism of action, cluster compounds according to phenotypic similarity, and classify novel compounds by comparing them to phenotypically active exemplars. Hierarchical clustering applied to 154 compounds from over a dozen different mechanistic classes demonstrated tight agreement with published compound mechanism classification. Using 11 phenotypically active mechanism classes, classification was performed on all 154 compounds: 78% were correctly identified as belonging to one of the 11 exemplar classes or to a different unspecified class, with accuracy increasing to 89% when less phenotypically active compounds were excluded. Importantly, several apparent clustering and classification failures, including rigosertib and 5-fluoro-2'-deoxycytidine, instead revealed more complex mechanisms or off-target effects verified by more recent publications. These results show that a simple, easily replicated, minimalist high-content assay can reveal subtle variations in the cellular phenotype induced by compounds and can correctly predict mechanism of action, as long as the appropriate analytical tools are used.
Convolutional Neural Network for Histopathological Analysis of Osteosarcoma.
Mishra, Rashika; Daescu, Ovidiu; Leavey, Patrick; Rakheja, Dinesh; Sengupta, Anita
2018-03-01
Pathologists often deal with high complexity and sometimes disagreement over osteosarcoma tumor classification due to cellular heterogeneity in the dataset. Segmentation and classification of histology tissue in H&E stained tumor image datasets is a challenging task because of intra-class variations, inter-class similarity, crowded context, and noisy data. In recent years, deep learning approaches have led to encouraging results in breast cancer and prostate cancer analysis. In this article, we propose convolutional neural network (CNN) as a tool to improve efficiency and accuracy of osteosarcoma tumor classification into tumor classes (viable tumor, necrosis) versus nontumor. The proposed CNN architecture contains eight learned layers: three sets of stacked two convolutional layers interspersed with max pooling layers for feature extraction and two fully connected layers with data augmentation strategies to boost performance. The use of a neural network results in higher accuracy of average 92% for the classification. We compare the proposed architecture with three existing and proven CNN architectures for image classification: AlexNet, LeNet, and VGGNet. We also provide a pipeline to calculate percentage necrosis in a given whole slide image. We conclude that the use of neural networks can assure both high accuracy and efficiency in osteosarcoma classification.
NASA Astrophysics Data System (ADS)
Hale Topaloğlu, Raziye; Sertel, Elif; Musaoğlu, Nebiye
2016-06-01
This study aims to compare classification accuracies of land cover/use maps created from Sentinel-2 and Landsat-8 data. Istanbul metropolitan city of Turkey, with a population of around 14 million, having different landscape characteristics was selected as study area. Water, forest, agricultural areas, grasslands, transport network, urban, airport- industrial units and barren land- mine land cover/use classes adapted from CORINE nomenclature were used as main land cover/use classes to identify. To fulfil the aims of this research, recently acquired dated 08/02/2016 Sentinel-2 and dated 22/02/2016 Landsat-8 images of Istanbul were obtained and image pre-processing steps like atmospheric and geometric correction were employed. Both Sentinel-2 and Landsat-8 images were resampled to 30m pixel size after geometric correction and similar spectral bands for both satellites were selected to create a similar base for these multi-sensor data. Maximum Likelihood (MLC) and Support Vector Machine (SVM) supervised classification methods were applied to both data sets to accurately identify eight different land cover/ use classes. Error matrix was created using same reference points for Sentinel-2 and Landsat-8 classifications. After the classification accuracy, results were compared to find out the best approach to create current land cover/use map of the region. The results of MLC and SVM classification methods were compared for both images.
Yu, Guan; Liu, Yufeng; Thung, Kim-Han; Shen, Dinggang
2014-01-01
Accurately identifying mild cognitive impairment (MCI) individuals who will progress to Alzheimer's disease (AD) is very important for making early interventions. Many classification methods focus on integrating multiple imaging modalities such as magnetic resonance imaging (MRI) and fluorodeoxyglucose positron emission tomography (FDG-PET). However, the main challenge for MCI classification using multiple imaging modalities is the existence of a lot of missing data in many subjects. For example, in the Alzheimer's Disease Neuroimaging Initiative (ADNI) study, almost half of the subjects do not have PET images. In this paper, we propose a new and flexible binary classification method, namely Multi-task Linear Programming Discriminant (MLPD) analysis, for the incomplete multi-source feature learning. Specifically, we decompose the classification problem into different classification tasks, i.e., one for each combination of available data sources. To solve all different classification tasks jointly, our proposed MLPD method links them together by constraining them to achieve the similar estimated mean difference between the two classes (under classification) for those shared features. Compared with the state-of-the-art incomplete Multi-Source Feature (iMSF) learning method, instead of constraining different classification tasks to choose a common feature subset for those shared features, MLPD can flexibly and adaptively choose different feature subsets for different classification tasks. Furthermore, our proposed MLPD method can be efficiently implemented by linear programming. To validate our MLPD method, we perform experiments on the ADNI baseline dataset with the incomplete MRI and PET images from 167 progressive MCI (pMCI) subjects and 226 stable MCI (sMCI) subjects. We further compared our method with the iMSF method (using incomplete MRI and PET images) and also the single-task classification method (using only MRI or only subjects with both MRI and PET images). Experimental results show very promising performance of our proposed MLPD method.
Yu, Guan; Liu, Yufeng; Thung, Kim-Han; Shen, Dinggang
2014-01-01
Accurately identifying mild cognitive impairment (MCI) individuals who will progress to Alzheimer's disease (AD) is very important for making early interventions. Many classification methods focus on integrating multiple imaging modalities such as magnetic resonance imaging (MRI) and fluorodeoxyglucose positron emission tomography (FDG-PET). However, the main challenge for MCI classification using multiple imaging modalities is the existence of a lot of missing data in many subjects. For example, in the Alzheimer's Disease Neuroimaging Initiative (ADNI) study, almost half of the subjects do not have PET images. In this paper, we propose a new and flexible binary classification method, namely Multi-task Linear Programming Discriminant (MLPD) analysis, for the incomplete multi-source feature learning. Specifically, we decompose the classification problem into different classification tasks, i.e., one for each combination of available data sources. To solve all different classification tasks jointly, our proposed MLPD method links them together by constraining them to achieve the similar estimated mean difference between the two classes (under classification) for those shared features. Compared with the state-of-the-art incomplete Multi-Source Feature (iMSF) learning method, instead of constraining different classification tasks to choose a common feature subset for those shared features, MLPD can flexibly and adaptively choose different feature subsets for different classification tasks. Furthermore, our proposed MLPD method can be efficiently implemented by linear programming. To validate our MLPD method, we perform experiments on the ADNI baseline dataset with the incomplete MRI and PET images from 167 progressive MCI (pMCI) subjects and 226 stable MCI (sMCI) subjects. We further compared our method with the iMSF method (using incomplete MRI and PET images) and also the single-task classification method (using only MRI or only subjects with both MRI and PET images). Experimental results show very promising performance of our proposed MLPD method. PMID:24820966
Galaxy emission line classification using three-dimensional line ratio diagrams
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vogt, Frédéric P. A.; Dopita, Michael A.; Kewley, Lisa J.
2014-10-01
Two-dimensional (2D) line ratio diagnostic diagrams have become a key tool in understanding the excitation mechanisms of galaxies. The curves used to separate the different regions—H II-like or excited by an active galactic nucleus (AGN)—have been refined over time but the core technique has not evolved significantly. However, the classification of galaxies based on their emission line ratios really is a multi-dimensional problem. Here we exploit recent software developments to explore the potential of three-dimensional (3D) line ratio diagnostic diagrams. We introduce the ZQE diagrams, which are a specific set of 3D diagrams that separate the oxygen abundance and themore » ionization parameter of H II region-like spectra and also enable us to probe the excitation mechanism of the gas. By examining these new 3D spaces interactively, we define the ZE diagnostics, a new set of 2D diagnostics that can provide the metallicity of objects excited by hot young stars and that cleanly separate H II region-like objects from the different classes of AGNs. We show that these ZE diagnostics are consistent with the key log [N II]/Hα versus log [O III]/Hβ diagnostic currently used by the community. They also have the advantage of attaching a probability that a given object belongs to one class or the other. Finally, we discuss briefly why ZQE diagrams can provide a new way to differentiate and study the different classes of AGNs in anticipation of a dedicated follow-up study.« less
Multi-class texture analysis in colorectal cancer histology
NASA Astrophysics Data System (ADS)
Kather, Jakob Nikolas; Weis, Cleo-Aron; Bianconi, Francesco; Melchers, Susanne M.; Schad, Lothar R.; Gaiser, Timo; Marx, Alexander; Zöllner, Frank Gerrit
2016-06-01
Automatic recognition of different tissue types in histological images is an essential part in the digital pathology toolbox. Texture analysis is commonly used to address this problem; mainly in the context of estimating the tumour/stroma ratio on histological samples. However, although histological images typically contain more than two tissue types, only few studies have addressed the multi-class problem. For colorectal cancer, one of the most prevalent tumour types, there are in fact no published results on multiclass texture separation. In this paper we present a new dataset of 5,000 histological images of human colorectal cancer including eight different types of tissue. We used this set to assess the classification performance of a wide range of texture descriptors and classifiers. As a result, we found an optimal classification strategy that markedly outperformed traditional methods, improving the state of the art for tumour-stroma separation from 96.9% to 98.6% accuracy and setting a new standard for multiclass tissue separation (87.4% accuracy for eight classes). We make our dataset of histological images publicly available under a Creative Commons license and encourage other researchers to use it as a benchmark for their studies.
Automated analysis and classification of melanocytic tumor on skin whole slide images.
Xu, Hongming; Lu, Cheng; Berendt, Richard; Jha, Naresh; Mandal, Mrinal
2018-06-01
This paper presents a computer-aided technique for automated analysis and classification of melanocytic tumor on skin whole slide biopsy images. The proposed technique consists of four main modules. First, skin epidermis and dermis regions are segmented by a multi-resolution framework. Next, epidermis analysis is performed, where a set of epidermis features reflecting nuclear morphologies and spatial distributions is computed. In parallel with epidermis analysis, dermis analysis is also performed, where dermal cell nuclei are segmented and a set of textural and cytological features are computed. Finally, the skin melanocytic image is classified into different categories such as melanoma, nevus or normal tissue by using a multi-class support vector machine (mSVM) with extracted epidermis and dermis features. Experimental results on 66 skin whole slide images indicate that the proposed technique achieves more than 95% classification accuracy, which suggests that the technique has the potential to be used for assisting pathologists on skin biopsy image analysis and classification. Copyright © 2018 Elsevier Ltd. All rights reserved.
Protein classification using sequential pattern mining.
Exarchos, Themis P; Papaloukas, Costas; Lampros, Christos; Fotiadis, Dimitrios I
2006-01-01
Protein classification in terms of fold recognition can be employed to determine the structural and functional properties of a newly discovered protein. In this work sequential pattern mining (SPM) is utilized for sequence-based fold recognition. One of the most efficient SPM algorithms, cSPADE, is employed for protein primary structure analysis. Then a classifier uses the extracted sequential patterns for classifying proteins of unknown structure in the appropriate fold category. The proposed methodology exhibited an overall accuracy of 36% in a multi-class problem of 17 candidate categories. The classification performance reaches up to 65% when the three most probable protein folds are considered.
Immune Centroids Over-Sampling Method for Multi-Class Classification
2015-05-22
recognize to specific antigens . The response of a receptor to an antigen can activate its hosting B-cell. Activated B-cell then proliferates and...modifying N.K. Jerne’s theory. The theory states that in a pre-existing group of lympho- cytes ( specifically B cells), a specific antigen only...the clusters of each small class, which have high data density, called global immune centroids over-sampling (denoted as Global-IC). Specifically
Transient classification in LIGO data using difference boosting neural network
NASA Astrophysics Data System (ADS)
Mukund, N.; Abraham, S.; Kandhasamy, S.; Mitra, S.; Philip, N. S.
2017-05-01
Detection and classification of transients in data from gravitational wave detectors are crucial for efficient searches for true astrophysical events and identification of noise sources. We present a hybrid method for classification of short duration transients seen in gravitational wave data using both supervised and unsupervised machine learning techniques. To train the classifiers, we use the relative wavelet energy and the corresponding entropy obtained by applying one-dimensional wavelet decomposition on the data. The prediction accuracy of the trained classifier on nine simulated classes of gravitational wave transients and also LIGO's sixth science run hardware injections are reported. Targeted searches for a couple of known classes of nonastrophysical signals in the first observational run of Advanced LIGO data are also presented. The ability to accurately identify transient classes using minimal training samples makes the proposed method a useful tool for LIGO detector characterization as well as searches for short duration gravitational wave signals.
NASA Astrophysics Data System (ADS)
Kaabeche, Hamza; Chabou, Moulley Charaf; Bendaoud, Abderrahmane; Bodinier, Jean-Louis; Lobry, Olivier; Retif, Fabien
2016-06-01
Rising economic value of a large number of metals as a result of their importance for new technologies and industrial development has renewed worldwide interest for mineral exploration and detailed studies of ore deposits. The Dill's (2010) "chessboard" classification of mineral deposits is the most recent attempt to provide an exhaustive overview of all mineral deposits known to date. However, the voluminous Dills review paper is accessible only in print or as PDF file. In this article, we present MetClass, software that provides advanced solutions to perform efficient research and statistics using Dill's classification and the related database. MetClass allows to assemble all results relevant to a given ore deposit on a user-friendly interface. This software is therefore a valuable tool for mineral exploration and research on ore deposits, as well as an educational solution for students in metallogeny.
Building rooftop classification using random forests for large-scale PV deployment
NASA Astrophysics Data System (ADS)
Assouline, Dan; Mohajeri, Nahid; Scartezzini, Jean-Louis
2017-10-01
Large scale solar Photovoltaic (PV) deployment on existing building rooftops has proven to be one of the most efficient and viable sources of renewable energy in urban areas. As it usually requires a potential analysis over the area of interest, a crucial step is to estimate the geometric characteristics of the building rooftops. In this paper, we introduce a multi-layer machine learning methodology to classify 6 roof types, 9 aspect (azimuth) classes and 5 slope (tilt) classes for all building rooftops in Switzerland, using GIS processing. We train Random Forests (RF), an ensemble learning algorithm, to build the classifiers. We use (2 × 2) [m2 ] LiDAR data (considering buildings and vegetation) to extract several rooftop features, and a generalised footprint polygon data to localize buildings. The roof classifier is trained and tested with 1252 labeled roofs from three different urban areas, namely Baden, Luzern, and Winterthur. The results for roof type classification show an average accuracy of 67%. The aspect and slope classifiers are trained and tested with 11449 labeled roofs in the Zurich periphery area. The results for aspect and slope classification show different accuracies depending on the classes: while some classes are well identified, other under-represented classes remain challenging to detect.
A framework for global terrain classification using 250-m DEMs to predict geohazards
NASA Astrophysics Data System (ADS)
Iwahashi, J.; Matsuoka, M.; Yong, A.
2016-12-01
Geomorphology is key for identifying factors that control geohazards induced by landslides, liquefaction, and ground shaking. To systematically identify landforms that affect these hazards, Iwahashi and Pike (2007; IP07) introduced an automated terrain classification scheme using 1-km-scale Shuttle Radar Topography Mission (SRTM) digital elevation models (DEMs). The IP07 classes describe 16 categories of terrain types and were used as a proxy for predicting ground motion amplification (Yong et al., 2012; Seyhan et al., 2014; Stewart et al., 2014; Yong, 2016). These classes, however, were not sufficiently resolved because coarse-scaled SRTM DEMs were the basis for the categories (Yong, 2016). Thus, we develop a new framework consisting of more detailed polygonal global terrain classes to improve estimations of soil-type and material stiffness. We first prepare high resolution 250-m DEMs derived from the 2010 Global Multi-resolution Terrain Elevation Data (GMTED2010). As in IP07, we calculate three geometric signatures (slope, local convexity and surface texture) from the DEMs. We create additional polygons by using the same signatures and multi-resolution segmentation techniques on the GMTED2010. We consider two types of surface texture thresholds in different window sizes (3x3 and 13x13 pixels), in addition to slope and local convexity, to classify pixels within the DEM. Finally, we apply the k-means clustering and thresholding methods to the 250-m DEM and produce more detailed polygonal terrain classes. We compare the new terrain classification maps of Japan and California with geologic, aerial photography, and landslide distribution maps, and visually find good correspondence of key features. To predict ground motion amplification, we apply the Yong (2016) method for estimating VS30. The systematic classification of geomorphology has the potential to provide a better understanding of the susceptibility to geohazards, which is especially vital in populated areas.
A support vector machine approach for classification of welding defects from ultrasonic signals
NASA Astrophysics Data System (ADS)
Chen, Yuan; Ma, Hong-Wei; Zhang, Guang-Ming
2014-07-01
Defect classification is an important issue in ultrasonic non-destructive evaluation. A layered multi-class support vector machine (LMSVM) classification system, which combines multiple SVM classifiers through a layered architecture, is proposed in this paper. The proposed LMSVM classification system is applied to the classification of welding defects from ultrasonic test signals. The measured ultrasonic defect echo signals are first decomposed into wavelet coefficients by the wavelet packet transform. The energy of the wavelet coefficients at different frequency channels are used to construct the feature vectors. The bees algorithm (BA) is then used for feature selection and SVM parameter optimisation for the LMSVM classification system. The BA-based feature selection optimises the energy feature vectors. The optimised feature vectors are input to the LMSVM classification system for training and testing. Experimental results of classifying welding defects demonstrate that the proposed technique is highly robust, precise and reliable for ultrasonic defect classification.
Multi-Class Classification for Identifying JPEG Steganography Embedding Methods
2008-09-01
B.H. (2000). STEGANOGRAPHY: Hidden Images, A New Challenge in the Fight Against Child Porn . UPDATE, Volume 13, Number 2, pp. 1-4, Retrieved June 3...Other crimes involving the use of steganography include child pornography where the stego files are used to hide a predator’s location when posting
Tong, Tong; Ledig, Christian; Guerrero, Ricardo; Schuh, Andreas; Koikkalainen, Juha; Tolonen, Antti; Rhodius, Hanneke; Barkhof, Frederik; Tijms, Betty; Lemstra, Afina W; Soininen, Hilkka; Remes, Anne M; Waldemar, Gunhild; Hasselbalch, Steen; Mecocci, Patrizia; Baroni, Marta; Lötjönen, Jyrki; Flier, Wiesje van der; Rueckert, Daniel
2017-01-01
Differentiating between different types of neurodegenerative diseases is not only crucial in clinical practice when treatment decisions have to be made, but also has a significant potential for the enrichment of clinical trials. The purpose of this study is to develop a classification framework for distinguishing the four most common neurodegenerative diseases, including Alzheimer's disease, frontotemporal lobe degeneration, Dementia with Lewy bodies and vascular dementia, as well as patients with subjective memory complaints. Different biomarkers including features from images (volume features, region-wise grading features) and non-imaging features (CSF measures) were extracted for each subject. In clinical practice, the prevalence of different dementia types is imbalanced, posing challenges for learning an effective classification model. Therefore, we propose the use of the RUSBoost algorithm in order to train classifiers and to handle the class imbalance training problem. Furthermore, a multi-class feature selection method based on sparsity is integrated into the proposed framework to improve the classification performance. It also provides a way for investigating the importance of different features and regions. Using a dataset of 500 subjects, the proposed framework achieved a high accuracy of 75.2% with a balanced accuracy of 69.3% for the five-class classification using ten-fold cross validation, which is significantly better than the results using support vector machine or random forest, demonstrating the feasibility of the proposed framework to support clinical decision making.
NASA Astrophysics Data System (ADS)
Chung, C.; Nagol, J. R.; Tao, X.; Anand, A.; Dempewolf, J.
2015-12-01
Increasing agricultural production while at the same time preserving the environment has become a challenging task. There is a need for new approaches for use of multi-scale and multi-source remote sensing data as well as ground based measurements for mapping and monitoring crop and ecosystem state to support decision making by governmental and non-governmental organizations for sustainable agricultural development. High resolution sub-meter imagery plays an important role in such an integrative framework of landscape monitoring. It helps link the ground based data to more easily available coarser resolution data, facilitating calibration and validation of derived remote sensing products. Here we present a hierarchical Object Based Image Analysis (OBIA) approach to classify sub-meter imagery. The primary reason for choosing OBIA is to accommodate pixel sizes smaller than the object or class of interest. Especially in non-homogeneous savannah regions of Tanzania, this is an important concern and the traditional pixel based spectral signature approach often fails. Ortho-rectified, calibrated, pan sharpened 0.5 meter resolution data acquired from DigitalGlobe's WorldView-2 satellite sensor was used for this purpose. Multi-scale hierarchical segmentation was performed using multi-resolution segmentation approach to facilitate the use of texture, neighborhood context, and the relationship between super and sub objects for training and classification. eCognition, a commonly used OBIA software program, was used for this purpose. Both decision tree and random forest approaches for classification were tested. The Kappa index agreement for both algorithms surpassed the 85%. The results demonstrate that using hierarchical OBIA can effectively and accurately discriminate classes at even LCCS-3 legend.
Summer Crop Classification by Multi-Temporal COSMO-SkyMed® Data
NASA Astrophysics Data System (ADS)
Guarini, Rocchina; Bruzzone, Lorenzo; Santoni, Massimo; Vuolo, Francesco; Luigi, Dini
2016-08-01
In this study, we propose a multi-temporal and multi- polarization approach to discriminate different crop types in the Marchefel region, Austria. The sensitivity of X-band COSMO-SkyMed® (CSK®) data with respect to five crop classes, namely carrot, corn, potato, soybean and sugarbeet is investigated. In particular, the capabilities of dual-polarization (StripMap PingPong) HH/HV, and single-polarization (StripMap Himage), HH and VH, in distinguishing among the five crop types are evaluated. A total of twenty-one Himage and ten PingPong images were acquired in a seven-months period, from April to October 2014. Therefore, the backscattering coefficient was extracted for each dataset and the classification was performed using a pixel-based support vector machine (SVM) approach. The accuracy of the obtained crop classifications was assessed by comparing them with ground truth. The dual-polarization results are contrasted between the HH and HV polarization, and with single-polarization ones (HH and VH polarizations). The best accuracy is obtained by using time-series of StripMap Himage data, at VH polarization, covering the whole season period.
Multi-class segmentation of neuronal electron microscopy images using deep learning
NASA Astrophysics Data System (ADS)
Khobragade, Nivedita; Agarwal, Chirag
2018-03-01
Study of connectivity of neural circuits is an essential step towards a better understanding of functioning of the nervous system. With the recent improvement in imaging techniques, high-resolution and high-volume images are being generated requiring automated segmentation techniques. We present a pixel-wise classification method based on Bayesian SegNet architecture. We carried out multi-class segmentation on serial section Transmission Electron Microscopy (ssTEM) images of Drosophila third instar larva ventral nerve cord, labeling the four classes of neuron membranes, neuron intracellular space, mitochondria and glia / extracellular space. Bayesian SegNet was trained using 256 ssTEM images of 256 x 256 pixels and tested on 64 different ssTEM images of the same size, from the same serial stack. Due to high class imbalance, we used a class-balanced version of Bayesian SegNet by re-weighting each class based on their relative frequency. We achieved an overall accuracy of 93% and a mean class accuracy of 88% for pixel-wise segmentation using this encoder-decoder approach. On evaluating the segmentation results using similarity metrics like SSIM and Dice Coefficient, we obtained scores of 0.994 and 0.886 respectively. Additionally, we used the network trained using the 256 ssTEM images of Drosophila third instar larva for multi-class labeling of ISBI 2012 challenge ssTEM dataset.
Alshamlan, Hala M; Badr, Ghada H; Alohali, Yousef A
2015-06-01
Naturally inspired evolutionary algorithms prove effectiveness when used for solving feature selection and classification problems. Artificial Bee Colony (ABC) is a relatively new swarm intelligence method. In this paper, we propose a new hybrid gene selection method, namely Genetic Bee Colony (GBC) algorithm. The proposed algorithm combines the used of a Genetic Algorithm (GA) along with Artificial Bee Colony (ABC) algorithm. The goal is to integrate the advantages of both algorithms. The proposed algorithm is applied to a microarray gene expression profile in order to select the most predictive and informative genes for cancer classification. In order to test the accuracy performance of the proposed algorithm, extensive experiments were conducted. Three binary microarray datasets are use, which include: colon, leukemia, and lung. In addition, another three multi-class microarray datasets are used, which are: SRBCT, lymphoma, and leukemia. Results of the GBC algorithm are compared with our recently proposed technique: mRMR when combined with the Artificial Bee Colony algorithm (mRMR-ABC). We also compared the combination of mRMR with GA (mRMR-GA) and Particle Swarm Optimization (mRMR-PSO) algorithms. In addition, we compared the GBC algorithm with other related algorithms that have been recently published in the literature, using all benchmark datasets. The GBC algorithm shows superior performance as it achieved the highest classification accuracy along with the lowest average number of selected genes. This proves that the GBC algorithm is a promising approach for solving the gene selection problem in both binary and multi-class cancer classification. Copyright © 2015 Elsevier Ltd. All rights reserved.
Automatic Cloud Classification from Multi-Spectral Satellite Data Over Oceanic Regions
1992-01-14
parameters the first two colors used are, blue for low values and dark green for high parameter values. If a third class is identified, the intermediate...intermediate yellow and high dark green classes. The color sequence blue-yellow-light green- dark green, then characterizes the low to high parameter value...to light green then to dark green correspond to superpixels of increasing (from low to high) variability in their altitude, (see Table V.3). When the
Wang, Jie; Zeng, Hao-Long; Du, Hongying; Liu, Zeyuan; Cheng, Ji; Liu, Taotao; Hu, Ting; Kamal, Ghulam Mustafa; Li, Xihai; Liu, Huili; Xu, Fuqiang
2018-03-01
Metabolomics generate a profile of small molecules from cellular/tissue metabolism, which could directly reflect the mechanisms of complex networks of biochemical reactions. Traditional metabolomics methods, such as OPLS-DA, PLS-DA are mainly used for binary class discrimination. Multiple groups are always involved in the biological system, especially for brain research. Multiple brain regions are involved in the neuronal study of brain metabolic dysfunctions such as alcoholism, Alzheimer's disease, etc. In the current study, 10 different brain regions were utilized for comparative studies between alcohol preferring and non-preferring rats, male and female rats respectively. As many classes are involved (ten different regions and four types of animals), traditional metabolomics methods are no longer efficient for showing differentiation. Here, a novel strategy based on the decision tree algorithm was employed for successfully constructing different classification models to screen out the major characteristics of ten brain regions at the same time. Subsequently, this method was also utilized to select the major effective brain regions related to alcohol preference and gender difference. Compared with the traditional multivariate statistical methods, the decision tree could construct acceptable and understandable classification models for multi-class data analysis. Therefore, the current technology could also be applied to other general metabolomics studies involving multi class data. Copyright © 2017 Elsevier B.V. All rights reserved.
A novel framework for change detection in bi-temporal polarimetric SAR images
NASA Astrophysics Data System (ADS)
Pirrone, Davide; Bovolo, Francesca; Bruzzone, Lorenzo
2016-10-01
Last years have seen relevant increase of polarimetric Synthetic Aperture Radar (SAR) data availability, thanks to satellite sensors like Sentinel-1 or ALOS-2 PALSAR-2. The augmented information lying in the additional polarimetric channels represents a possibility for better discriminate different classes of changes in change detection (CD) applications. This work aims at proposing a framework for CD in multi-temporal multi-polarization SAR data. The framework includes both a tool for an effective visual representation of the change information and a method for extracting the multiple-change information. Both components are designed to effectively handle the multi-dimensionality of polarimetric data. In the novel representation, multi-temporal intensity SAR data are employed to compute a polarimetric log-ratio. The multitemporal information of the polarimetric log-ratio image is represented in a multi-dimensional features space, where changes are highlighted in terms of magnitude and direction. This representation is employed to design a novel unsupervised multi-class CD approach. This approach considers a sequential two-step analysis of the magnitude and the direction information for separating non-changed and changed samples. The proposed approach has been validated on a pair of Sentinel-1 data acquired before and after the flood in Tamil-Nadu in 2015. Preliminary results demonstrate that the representation tool is effective and that the use of polarimetric SAR data is promising in multi-class change detection applications.
A New Tool for Classifying Small Solar System Objects
NASA Astrophysics Data System (ADS)
Desfosses, Ryan; Arel, D.; Walker, M. E.; Ziffer, J.; Harvell, T.; Campins, H.; Fernandez, Y. R.
2011-05-01
An artificial intelligence program, AutoClass, which was developed by NASA's Artificial Intelligence Branch, uses Bayesian classification theory to automatically choose the most probable classification distribution to describe a dataset. To investigate its usefulness to the Planetary Science community, we tested its ability to reproduce the taxonomic classes as defined by Tholen and Barucci (1989). Of the 406 asteroids from the Eight Color Asteroid Survey (ECAS) we chose for our test, 346 were firmly classified and all but 3 (<1%) were classified by Autoclass as they had been in the previous classification system (Walker et al., 2011). We are now applying it to larger datasets to improve the taxonomy of currently unclassified objects. Having demonstrated AutoClass's ability to recreate existing classification effectively, we extended this work to investigations of albedo-based classification systems. To determine how predictive albedo can be, we used data from the Infrared Astronomical Satellite (IRAS) database in conjunction with the large Sloan Digital Sky Survey (SDSS), which contains color and position data for over 200,000 classified and unclassified asteroids (Ivesic et al., 2001). To judge our success we compared our results with a similar approach to classifying objects using IRAS albedo and asteroid color by Tedesco et al. (1989). Understanding the distribution of the taxonomic classes is important to understanding the history and evolution of our Solar System. AutoClass's success in categorizing ECAS, IRAS and SDSS asteroidal data highlights its potential to scan large domains for natural classes in small solar system objects. Based upon our AutoClass results, we intend to make testable predictions about asteroids observed with the Wide-field Infrared Survey Explorer (WISE).
A novel fruit shape classification method based on multi-scale analysis
NASA Astrophysics Data System (ADS)
Gui, Jiangsheng; Ying, Yibin; Rao, Xiuqin
2005-11-01
Shape is one of the major concerns and which is still a difficult problem in automated inspection and sorting of fruits. In this research, we proposed the multi-scale energy distribution (MSED) for object shape description, the relationship between objects shape and its boundary energy distribution at multi-scale was explored for shape extraction. MSED offers not only the mainly energy which represent primary shape information at the lower scales, but also subordinate energy which represent local shape information at higher differential scales. Thus, it provides a natural tool for multi resolution representation and can be used as a feature for shape classification. We addressed the three main processing steps in the MSED-based shape classification. They are namely, 1) image preprocessing and citrus shape extraction, 2) shape resample and shape feature normalization, 3) energy decomposition by wavelet and classification by BP neural network. Hereinto, shape resample is resample 256 boundary pixel from a curve which is approximated original boundary by using cubic spline in order to get uniform raw data. A probability function was defined and an effective method to select a start point was given through maximal expectation, which overcame the inconvenience of traditional methods in order to have a property of rotation invariants. The experiment result is relatively well normal citrus and serious abnormality, with a classification rate superior to 91.2%. The global correct classification rate is 89.77%, and our method is more effective than traditional method. The global result can meet the request of fruit grading.
Oshiyama, Natália F; Bassani, Rosana A; D'Ottaviano, Itala M L; Bassani, José W M
2012-04-01
As technology evolves, the role of medical equipment in the healthcare system, as well as technology management, becomes more important. Although the existence of large databases containing management information is currently common, extracting useful information from them is still difficult. A useful tool for identification of frequently failing equipment, which increases maintenance cost and downtime, would be the classification according to the corrective maintenance data. Nevertheless, establishment of classes may create inconsistencies, since an item may be close to two classes by the same extent. Paraconsistent logic might help solve this problem, as it allows the existence of inconsistent (contradictory) information without trivialization. In this paper, a methodology for medical equipment classification based on the ABC analysis of corrective maintenance data is presented, and complemented with a paraconsistent annotated logic analysis, which may enable the decision maker to take into consideration alerts created by the identification of inconsistencies and indeterminacies in the classification.
Multi-category micro-milling tool wear monitoring with continuous hidden Markov models
NASA Astrophysics Data System (ADS)
Zhu, Kunpeng; Wong, Yoke San; Hong, Geok Soon
2009-02-01
In-process monitoring of tool conditions is important in micro-machining due to the high precision requirement and high tool wear rate. Tool condition monitoring in micro-machining poses new challenges compared to conventional machining. In this paper, a multi-category classification approach is proposed for tool flank wear state identification in micro-milling. Continuous Hidden Markov models (HMMs) are adapted for modeling of the tool wear process in micro-milling, and estimation of the tool wear state given the cutting force features. For a noise-robust approach, the HMM outputs are connected via a medium filter to minimize the tool state before entry into the next state due to high noise level. A detailed study on the selection of HMM structures for tool condition monitoring (TCM) is presented. Case studies on the tool state estimation in the micro-milling of pure copper and steel demonstrate the effectiveness and potential of these methods.
Environmental Monitoring Networks Optimization Using Advanced Active Learning Algorithms
NASA Astrophysics Data System (ADS)
Kanevski, Mikhail; Volpi, Michele; Copa, Loris
2010-05-01
The problem of environmental monitoring networks optimization (MNO) belongs to one of the basic and fundamental tasks in spatio-temporal data collection, analysis, and modeling. There are several approaches to this problem, which can be considered as a design or redesign of monitoring network by applying some optimization criteria. The most developed and widespread methods are based on geostatistics (family of kriging models, conditional stochastic simulations). In geostatistics the variance is mainly used as an optimization criterion which has some advantages and drawbacks. In the present research we study an application of advanced techniques following from the statistical learning theory (SLT) - support vector machines (SVM) and the optimization of monitoring networks when dealing with a classification problem (data are discrete values/classes: hydrogeological units, soil types, pollution decision levels, etc.) is considered. SVM is a universal nonlinear modeling tool for classification problems in high dimensional spaces. The SVM solution is maximizing the decision boundary between classes and has a good generalization property for noisy data. The sparse solution of SVM is based on support vectors - data which contribute to the solution with nonzero weights. Fundamentally the MNO for classification problems can be considered as a task of selecting new measurement points which increase the quality of spatial classification and reduce the testing error (error on new independent measurements). In SLT this is a typical problem of active learning - a selection of the new unlabelled points which efficiently reduce the testing error. A classical approach (margin sampling) to active learning is to sample the points closest to the classification boundary. This solution is suboptimal when points (or generally the dataset) are redundant for the same class. In the present research we propose and study two new advanced methods of active learning adapted to the solution of MNO problem: 1) hierarchical top-down clustering in an input space in order to remove redundancy when data are clustered, and 2) a general method (independent on classifier) which gives posterior probabilities that can be used to define the classifier confidence and corresponding proposals for new measurement points. The basic ideas and procedures are explained by applying simulated data sets. The real case study deals with the analysis and mapping of soil types, which is a multi-class classification problem. Maps of soil types are important for the analysis and 3D modeling of heavy metals migration in soil and prediction risk mapping. The results obtained demonstrate the high quality of SVM mapping and efficiency of monitoring network optimization by using active learning approaches. The research was partly supported by SNSF projects No. 200021-126505 and 200020-121835.
NASA Astrophysics Data System (ADS)
Siregar, V. P.; Agus, S. B.; Subarno, T.; Prabowo, N. W.
2018-05-01
The availability of satellite imagery with a variety of spatial resolution, both free access and commercial become as an option in utilizing the remote sensing technology. Variability of the water column is one of the factors affecting the interpretation results when mapping marine shallow waters. This study aimed to evaluate the influence of water column correction (depth-invariant index) on the accuracy of shallow water habitat classification results using OBIA. This study was conducted in North of Kepulauan Seribu, precisely in Harapan Island and its surrounding areas. Habitat class schemes were based on field observations, which were then used to build habitat classes on satellite imagery. The water column correction was applied to the three pairs of SPOT-7 multispectral bands, which were subsequently used in object-based classification. Satellite image classification was performed with four different approaches, namely (i) using DII transformed bands with single pair band input (B1B2), (ii) multi pairs bands (B1B2, B1B3, and B2B3), (iii) combination of multi pairs band and initial bands, and (iv) only using initial bands. The accuracy test results of the four inputs show the values of Overall Accuracy and Kappa Statistics, respectively 55.84 and 0.48; 68.53 and 0.64; 78.68 and 0.76; 77.66 and 0.74. It shows that the best results when using DII and initial band combination for shallow water benthic classification in this study site.
Retinal Microaneurysms Detection Using Gradient Vector Analysis and Class Imbalance Classification.
Dai, Baisheng; Wu, Xiangqian; Bu, Wei
2016-01-01
Retinal microaneurysms (MAs) are the earliest clinically observable lesions of diabetic retinopathy. Reliable automated MAs detection is thus critical for early diagnosis of diabetic retinopathy. This paper proposes a novel method for the automated MAs detection in color fundus images based on gradient vector analysis and class imbalance classification, which is composed of two stages, i.e. candidate MAs extraction and classification. In the first stage, a candidate MAs extraction algorithm is devised by analyzing the gradient field of the image, in which a multi-scale log condition number map is computed based on the gradient vectors for vessel removal, and then the candidate MAs are localized according to the second order directional derivatives computed in different directions. Due to the complexity of fundus image, besides a small number of true MAs, there are also a large amount of non-MAs in the extracted candidates. Classifying the true MAs and the non-MAs is an extremely class imbalanced classification problem. Therefore, in the second stage, several types of features including geometry, contrast, intensity, edge, texture, region descriptors and other features are extracted from the candidate MAs and a class imbalance classifier, i.e., RUSBoost, is trained for the MAs classification. With the Retinopathy Online Challenge (ROC) criterion, the proposed method achieves an average sensitivity of 0.433 at 1/8, 1/4, 1/2, 1, 2, 4 and 8 false positives per image on the ROC database, which is comparable with the state-of-the-art approaches, and 0.321 on the DiaRetDB1 V2.1 database, which outperforms the state-of-the-art approaches.
NASA Astrophysics Data System (ADS)
Clark, M. L.
2016-12-01
The goal of this study was to assess multi-temporal, Hyperspectral Infrared Imager (HyspIRI) satellite imagery for improved forest class mapping relative to multispectral satellites. The study area was the western San Francisco Bay Area, California and forest alliances (e.g., forest communities defined by dominant or co-dominant trees) were defined using the U.S. National Vegetation Classification System. Simulated 30-m HyspIRI, Landsat 8 and Sentinel-2 imagery were processed from image data acquired by NASA's AVIRIS airborne sensor in year 2015, with summer and multi-temporal (spring, summer, fall) data analyzed separately. HyspIRI reflectance was used to generate a suite of hyperspectral metrics that targeted key spectral features related to chemical and structural properties. The Random Forests classifier was applied to the simulated images and overall accuracies (OA) were compared to those from real Landsat 8 images. For each image group, broad land cover (e.g., Needle-leaf Trees, Broad-leaf Trees, Annual agriculture, Herbaceous, Built-up) was classified first, followed by a finer-detail forest alliance classification for pixels mapped as closed-canopy forest. There were 5 needle-leaf tree alliances and 16 broad-leaf tree alliances, including 7 Quercus (oak) alliance types. No forest alliance classification exceeded 50% OA, indicating that there was broad spectral similarity among alliances, most of which were not spectrally pure but rather a mix of tree species. In general, needle-leaf (Pine, Redwood, Douglas Fir) alliances had better class accuracies than broad-leaf alliances (Oaks, Madrone, Bay Laurel, Buckeye, etc). Multi-temporal data classifications all had 5-6% greater OA than with comparable summer data. For simulated data, HyspIRI metrics had 4-5% greater OA than Landsat 8 and Sentinel-2 multispectral imagery and 3-4% greater OA than HyspIRI reflectance. Finally, HyspIRI metrics had 8% greater OA than real Landsat 8 imagery. In conclusion, forest alliance classification was found to be a difficult remote sensing application with moderate resolution (30 m) satellite imagery; however, of the data tested, HyspIRI spectral metrics had the best performance relative to multispectral satellites.
NASA Astrophysics Data System (ADS)
Wu, Jie; Besnehard, Quentin; Marchessoux, Cédric
2011-03-01
Clinical studies for the validation of new medical imaging devices require hundreds of images. An important step in creating and tuning the study protocol is the classification of images into "difficult" and "easy" cases. This consists of classifying the image based on features like the complexity of the background, the visibility of the disease (lesions). Therefore, an automatic medical background classification tool for mammograms would help for such clinical studies. This classification tool is based on a multi-content analysis framework (MCA) which was firstly developed to recognize image content of computer screen shots. With the implementation of new texture features and a defined breast density scale, the MCA framework is able to automatically classify digital mammograms with a satisfying accuracy. BI-RADS (Breast Imaging Reporting Data System) density scale is used for grouping the mammograms, which standardizes the mammography reporting terminology and assessment and recommendation categories. Selected features are input into a decision tree classification scheme in MCA framework, which is the so called "weak classifier" (any classifier with a global error rate below 50%). With the AdaBoost iteration algorithm, these "weak classifiers" are combined into a "strong classifier" (a classifier with a low global error rate) for classifying one category. The results of classification for one "strong classifier" show the good accuracy with the high true positive rates. For the four categories the results are: TP=90.38%, TN=67.88%, FP=32.12% and FN =9.62%.
Zhao, Nan; Han, Jing Ginger; Shyu, Chi-Ren; Korkin, Dmitry
2014-01-01
Single nucleotide polymorphisms (SNPs) are among the most common types of genetic variation in complex genetic disorders. A growing number of studies link the functional role of SNPs with the networks and pathways mediated by the disease-associated genes. For example, many non-synonymous missense SNPs (nsSNPs) have been found near or inside the protein-protein interaction (PPI) interfaces. Determining whether such nsSNP will disrupt or preserve a PPI is a challenging task to address, both experimentally and computationally. Here, we present this task as three related classification problems, and develop a new computational method, called the SNP-IN tool (non-synonymous SNP INteraction effect predictor). Our method predicts the effects of nsSNPs on PPIs, given the interaction's structure. It leverages supervised and semi-supervised feature-based classifiers, including our new Random Forest self-learning protocol. The classifiers are trained based on a dataset of comprehensive mutagenesis studies for 151 PPI complexes, with experimentally determined binding affinities of the mutant and wild-type interactions. Three classification problems were considered: (1) a 2-class problem (strengthening/weakening PPI mutations), (2) another 2-class problem (mutations that disrupt/preserve a PPI), and (3) a 3-class classification (detrimental/neutral/beneficial mutation effects). In total, 11 different supervised and semi-supervised classifiers were trained and assessed resulting in a promising performance, with the weighted f-measure ranging from 0.87 for Problem 1 to 0.70 for the most challenging Problem 3. By integrating prediction results of the 2-class classifiers into the 3-class classifier, we further improved its performance for Problem 3. To demonstrate the utility of SNP-IN tool, it was applied to study the nsSNP-induced rewiring of two disease-centered networks. The accurate and balanced performance of SNP-IN tool makes it readily available to study the rewiring of large-scale protein-protein interaction networks, and can be useful for functional annotation of disease-associated SNPs. SNIP-IN tool is freely accessible as a web-server at http://korkinlab.org/snpintool/. PMID:24784581
Seismic classification through sparse filter dictionaries
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hickmann, Kyle Scott; Srinivasan, Gowri
We tackle a multi-label classi cation problem involving the relation between acoustic- pro le features and the measured seismogram. To isolate components of the seismo- grams unique to each class of acoustic pro le we build dictionaries of convolutional lters. The convolutional- lter dictionaries for the individual classes are then combined into a large dictionary for the entire seismogram set. A given seismogram is classi ed by computing its representation in the large dictionary and then comparing reconstruction accuracy with this representation using each of the sub-dictionaries. The sub-dictionary with the minimal reconstruction error identi es the seismogram class.
Multicategory Composite Least Squares Classifiers
Park, Seo Young; Liu, Yufeng; Liu, Dacheng; Scholl, Paul
2010-01-01
Classification is a very useful statistical tool for information extraction. In particular, multicategory classification is commonly seen in various applications. Although binary classification problems are heavily studied, extensions to the multicategory case are much less so. In view of the increased complexity and volume of modern statistical problems, it is desirable to have multicategory classifiers that are able to handle problems with high dimensions and with a large number of classes. Moreover, it is necessary to have sound theoretical properties for the multicategory classifiers. In the literature, there exist several different versions of simultaneous multicategory Support Vector Machines (SVMs). However, the computation of the SVM can be difficult for large scale problems, especially for problems with large number of classes. Furthermore, the SVM cannot produce class probability estimation directly. In this article, we propose a novel efficient multicategory composite least squares classifier (CLS classifier), which utilizes a new composite squared loss function. The proposed CLS classifier has several important merits: efficient computation for problems with large number of classes, asymptotic consistency, ability to handle high dimensional data, and simple conditional class probability estimation. Our simulated and real examples demonstrate competitive performance of the proposed approach. PMID:21218128
Investigations of possible contributions NDVI's have to misclassification in AVHRR data analysis
David L. Evans; Raymond L. Czaplewski
1996-01-01
Numerous subcontinental-scale projects have placed significant emphasis on the use of Normalized Difference Vegetation Indices (NDVI's) derived from Advanced Very High Resolution Radiometer (AVHRR) satellite data for vegetation type recognition. In multi-season AVHRR data, overlap of NDVI ranges for vegetation classes may degrade overall classification performance...
A multi-criteria inference approach for anti-desertification management.
Tervonen, Tommi; Sepehr, Adel; Kadziński, Miłosz
2015-10-01
We propose an approach for classifying land zones into categories indicating their resilience against desertification. Environmental management support is provided by a multi-criteria inference method that derives a set of value functions compatible with the given classification examples, and applies them to define, for the rest of the zones, their possible classes. In addition, a representative value function is inferred to explain the relative importance of the criteria to the stakeholders. We use the approach for classifying 28 administrative regions of the Khorasan Razavi province in Iran into three equilibrium classes: collapsed, transition, and sustainable zones. The model is parameterized with enhanced vegetation index measurements from 2005 to 2012, and 7 other natural and anthropogenic indicators for the status of the region in 2012. Results indicate that grazing density and land use changes are the main anthropogenic factors affecting desertification in Khorasan Razavi. The inference procedure suggests that the classification model is underdetermined in terms of attributes, but the approach itself is promising for supporting the management of anti-desertification efforts. Copyright © 2015 Elsevier Ltd. All rights reserved.
Kwon, Min-Seok; Nam, Seungyoon; Lee, Sungyoung; Ahn, Young Zoo; Chang, Hae Ryung; Kim, Yon Hui; Park, Taesung
2017-01-01
The recent creation of enormous, cancer-related “Big Data” public depositories represents a powerful means for understanding tumorigenesis. However, a consistently accurate system for clinically evaluating single/multi-biomarkers remains lacking, and it has been asserted that oft-failed clinical advancement of biomarkers occurs within the very early stages of biomarker assessment. To address these challenges, we developed a clinically testable, web-based tool, CANcer-specific single/multi-biomarker Evaluation System (CANES), to evaluate biomarker effectiveness, across 2,134 whole transcriptome datasets, from 94,147 biological samples (from 18 tumor types). For user-provided single/multi-biomarkers, CANES evaluates the performance of single/multi-biomarker candidates, based on four classification methods, support vector machine, random forest, neural networks, and classification and regression trees. In addition, CANES offers several advantages over earlier analysis tools, including: 1) survival analysis; 2) evaluation of mature miRNAs as markers for user-defined diagnostic or prognostic purposes; and 3) provision of a “pan-cancer” summary view, based on each single marker. We believe that such “landscape” evaluation of single/multi-biomarkers, for diagnostic therapeutic/prognostic decision-making, will be highly valuable for the discovery and “repurposing” of existing biomarkers (and their specific targeted therapies), leading to improved patient therapeutic stratification, a key component of targeted therapy success for the avoidance of therapy resistance. PMID:29050243
DATA-MEAns: an open source tool for the classification and management of neural ensemble recordings.
Bonomini, María P; Ferrandez, José M; Bolea, Jose Angel; Fernandez, Eduardo
2005-10-30
The number of laboratories using techniques that allow to acquire simultaneous recordings of as many units as possible is considerably increasing. However, the development of tools used to analyse this multi-neuronal activity is generally lagging behind the development of the tools used to acquire these data. Moreover, the data exchange between research groups using different multielectrode acquisition systems is hindered by commercial constraints such as exclusive file structures, high priced licenses and hard policies on intellectual rights. This paper presents a free open-source software for the classification and management of neural ensemble data. The main goal is to provide a graphical user interface that links the experimental data to a basic set of routines for analysis, visualization and classification in a consistent framework. To facilitate the adaptation and extension as well as the addition of new routines, tools and algorithms for data analysis, the source code and documentation are freely available.
Socoró, Joan Claudi; Alías, Francesc; Alsina-Pagès, Rosa Ma
2017-10-12
One of the main aspects affecting the quality of life of people living in urban and suburban areas is their continued exposure to high Road Traffic Noise (RTN) levels. Until now, noise measurements in cities have been performed by professionals, recording data in certain locations to build a noise map afterwards. However, the deployment of Wireless Acoustic Sensor Networks (WASN) has enabled automatic noise mapping in smart cities. In order to obtain a reliable picture of the RTN levels affecting citizens, Anomalous Noise Events (ANE) unrelated to road traffic should be removed from the noise map computation. To this aim, this paper introduces an Anomalous Noise Event Detector (ANED) designed to differentiate between RTN and ANE in real time within a predefined interval running on the distributed low-cost acoustic sensors of a WASN. The proposed ANED follows a two-class audio event detection and classification approach, instead of multi-class or one-class classification schemes, taking advantage of the collection of representative acoustic data in real-life environments. The experiments conducted within the DYNAMAP project, implemented on ARM-based acoustic sensors, show the feasibility of the proposal both in terms of computational cost and classification performance using standard Mel cepstral coefficients and Gaussian Mixture Models (GMM). The two-class GMM core classifier relatively improves the baseline universal GMM one-class classifier F1 measure by 18.7% and 31.8% for suburban and urban environments, respectively, within the 1-s integration interval. Nevertheless, according to the results, the classification performance of the current ANED implementation still has room for improvement.
NASA Astrophysics Data System (ADS)
Teutsch, Michael; Saur, Günter
2011-11-01
Spaceborne SAR imagery offers high capability for wide-ranging maritime surveillance especially in situations, where AIS (Automatic Identification System) data is not available. Therefore, maritime objects have to be detected and optional information such as size, orientation, or object/ship class is desired. In recent research work, we proposed a SAR processing chain consisting of pre-processing, detection, segmentation, and classification for single-polarimetric (HH) TerraSAR-X StripMap images to finally assign detection hypotheses to class "clutter", "non-ship", "unstructured ship", or "ship structure 1" (bulk carrier appearance) respectively "ship structure 2" (oil tanker appearance). In this work, we extend the existing processing chain and are now able to handle full-polarimetric (HH, HV, VH, VV) TerraSAR-X data. With the possibility of better noise suppression using the different polarizations, we slightly improve both the segmentation and the classification process. In several experiments we demonstrate the potential benefit for segmentation and classification. Precision of size and orientation estimation as well as correct classification rates are calculated individually for single- and quad-polarization and compared to each other.
NASA Astrophysics Data System (ADS)
Shenoy Handiru, Vikram; Vinod, A. P.; Guan, Cuntai
2017-08-01
Objective. In electroencephalography (EEG)-based brain-computer interface (BCI) systems for motor control tasks the conventional practice is to decode motor intentions by using scalp EEG. However, scalp EEG only reveals certain limited information about the complex tasks of movement with a higher degree of freedom. Therefore, our objective is to investigate the effectiveness of source-space EEG in extracting relevant features that discriminate arm movement in multiple directions. Approach. We have proposed a novel feature extraction algorithm based on supervised factor analysis that models the data from source-space EEG. To this end, we computed the features from the source dipoles confined to Brodmann areas of interest (BA4a, BA4p and BA6). Further, we embedded class-wise labels of multi-direction (multi-class) source-space EEG to an unsupervised factor analysis to make it into a supervised learning method. Main Results. Our approach provided an average decoding accuracy of 71% for the classification of hand movement in four orthogonal directions, that is significantly higher (>10%) than the classification accuracy obtained using state-of-the-art spatial pattern features in sensor space. Also, the group analysis on the spectral characteristics of source-space EEG indicates that the slow cortical potentials from a set of cortical source dipoles reveal discriminative information regarding the movement parameter, direction. Significance. This study presents evidence that low-frequency components in the source space play an important role in movement kinematics, and thus it may lead to new strategies for BCI-based neurorehabilitation.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Morley, Steven
The PyForecastTools package provides Python routines for calculating metrics for model validation, forecast verification and model comparison. For continuous predictands the package provides functions for calculating bias (mean error, mean percentage error, median log accuracy, symmetric signed bias), and for calculating accuracy (mean squared error, mean absolute error, mean absolute scaled error, normalized RMSE, median symmetric accuracy). Convenience routines to calculate the component parts (e.g. forecast error, scaled error) of each metric are also provided. To compare models the package provides: generic skill score; percent better. Robust measures of scale including median absolute deviation, robust standard deviation, robust coefficient ofmore » variation and the Sn estimator are all provided by the package. Finally, the package implements Python classes for NxN contingency tables. In the case of a multi-class prediction, accuracy and skill metrics such as proportion correct and the Heidke and Peirce skill scores are provided as object methods. The special case of a 2x2 contingency table inherits from the NxN class and provides many additional metrics for binary classification: probability of detection, probability of false detection, false alarm ration, threat score, equitable threat score, bias. Confidence intervals for many of these quantities can be calculated using either the Wald method or Agresti-Coull intervals.« less
Guo, Yang; Liu, Shuhui; Li, Zhanhuai; Shang, Xuequn
2018-04-11
The classification of cancer subtypes is of great importance to cancer disease diagnosis and therapy. Many supervised learning approaches have been applied to cancer subtype classification in the past few years, especially of deep learning based approaches. Recently, the deep forest model has been proposed as an alternative of deep neural networks to learn hyper-representations by using cascade ensemble decision trees. It has been proved that the deep forest model has competitive or even better performance than deep neural networks in some extent. However, the standard deep forest model may face overfitting and ensemble diversity challenges when dealing with small sample size and high-dimensional biology data. In this paper, we propose a deep learning model, so-called BCDForest, to address cancer subtype classification on small-scale biology datasets, which can be viewed as a modification of the standard deep forest model. The BCDForest distinguishes from the standard deep forest model with the following two main contributions: First, a named multi-class-grained scanning method is proposed to train multiple binary classifiers to encourage diversity of ensemble. Meanwhile, the fitting quality of each classifier is considered in representation learning. Second, we propose a boosting strategy to emphasize more important features in cascade forests, thus to propagate the benefits of discriminative features among cascade layers to improve the classification performance. Systematic comparison experiments on both microarray and RNA-Seq gene expression datasets demonstrate that our method consistently outperforms the state-of-the-art methods in application of cancer subtype classification. The multi-class-grained scanning and boosting strategy in our model provide an effective solution to ease the overfitting challenge and improve the robustness of deep forest model working on small-scale data. Our model provides a useful approach to the classification of cancer subtypes by using deep learning on high-dimensional and small-scale biology data.
Deep Multi-Task Learning for Tree Genera Classification
NASA Astrophysics Data System (ADS)
Ko, C.; Kang, J.; Sohn, G.
2018-05-01
The goal for our paper is to classify tree genera using airborne Light Detection and Ranging (LiDAR) data with Convolution Neural Network (CNN) - Multi-task Network (MTN) implementation. Unlike Single-task Network (STN) where only one task is assigned to the learning outcome, MTN is a deep learning architect for learning a main task (classification of tree genera) with other tasks (in our study, classification of coniferous and deciduous) simultaneously, with shared classification features. The main contribution of this paper is to improve classification accuracy from CNN-STN to CNN-MTN. This is achieved by introducing a concurrence loss (Lcd) to the designed MTN. This term regulates the overall network performance by minimizing the inconsistencies between the two tasks. Results show that we can increase the classification accuracy from 88.7 % to 91.0 % (from STN to MTN). The second goal of this paper is to solve the problem of small training sample size by multiple-view data generation. The motivation of this goal is to address one of the most common problems in implementing deep learning architecture, the insufficient number of training data. We address this problem by simulating training dataset with multiple-view approach. The promising results from this paper are providing a basis for classifying a larger number of dataset and number of classes in the future.
Danielson, Patrick; Yang, Limin; Jin, Suming; Homer, Collin G.; Napton, Darrell
2016-01-01
We developed a method that analyzes the quality of the cultivated cropland class mapped in the USA National Land Cover Database (NLCD) 2006. The method integrates multiple geospatial datasets and a Multi Index Integrated Change Analysis (MIICA) change detection method that captures spectral changes to identify the spatial distribution and magnitude of potential commission and omission errors for the cultivated cropland class in NLCD 2006. The majority of the commission and omission errors in NLCD 2006 are in areas where cultivated cropland is not the most dominant land cover type. The errors are primarily attributed to the less accurate training dataset derived from the National Agricultural Statistics Service Cropland Data Layer dataset. In contrast, error rates are low in areas where cultivated cropland is the dominant land cover. Agreement between model-identified commission errors and independently interpreted reference data was high (79%). Agreement was low (40%) for omission error comparison. The majority of the commission errors in the NLCD 2006 cultivated crops were confused with low-intensity developed classes, while the majority of omission errors were from herbaceous and shrub classes. Some errors were caused by inaccurate land cover change from misclassification in NLCD 2001 and the subsequent land cover post-classification process.
NASA Astrophysics Data System (ADS)
Jahncke, Raymond; Leblon, Brigitte; Bush, Peter; LaRocque, Armand
2018-06-01
Wetland maps currently in use by the Province of Nova Scotia, namely the Department of Natural Resources (DNR) wetland inventory map and the swamp wetland classes of the DNR forest map, need to be updated. In this study, wetlands were mapped in an area southwest of Halifax, Nova Scotia by classifying a combination of multi-date and multi-beam RADARSAT-2 C-band polarimetric SAR (polSAR) images with spring Lidar, and fall QuickBird optical data using the Random Forests (RF) classifier. The resulting map has five wetland classes (open-water/marsh complex, open bog, open fen, shrub/treed fen/bog, swamp), plus lakes and various upland classes. Its accuracy was assessed using data from 156 GPS wetland sites collected in 2012 and compared to the one obtained with the current wetland map of Nova Scotia. The best overall classification was obtained using a combination of Lidar, RADARSAT-2 HH, HV, VH, VV intensity with polarimetric variables, and QuickBird multispectral (89.2%). The classified image was compared to GPS validation sites to assess the mapping accuracy of the wetlands. It was first done considering a group consisting of all wetland classes including lakes. This showed that only 69.9% of the wetland sites were correctly identified when only the QuickBird classified image was used in the classification. With the addition of variables derived from lidar, the number of correctly identified wetlands increased to 88.5%. The accuracy remained the same with the addition of RADARSAT-2 (88.5%). When we tested the accuracy for identifying wetland classes (e.g. marsh complex vs. open bog) instead of grouped wetlands, the resulting wetland map performed best with either QuickBird and Lidar, or QuickBird, Lidar, and RADARSAT-2 (66%). The Province of Nova Scotia's current wetland inventory and its associated wetland classes (aerial-photo interpreted) were also assessed against the GPS wetland sites. This provincial inventory correctly identified 62.2% of the grouped wetlands and only 18.6% of the wetland classes. The current inventory's poor performance demonstrates the value of incorporating a combination of new data sources into the provincial wetland mapping.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sanchez Almeida, J.; Allende Prieto, C., E-mail: jos@iac.es, E-mail: callende@iac.es
2013-01-20
Large spectroscopic surveys require automated methods of analysis. This paper explores the use of k-means clustering as a tool for automated unsupervised classification of massive stellar spectral catalogs. The classification criteria are defined by the data and the algorithm, with no prior physical framework. We work with a representative set of stellar spectra associated with the Sloan Digital Sky Survey (SDSS) SEGUE and SEGUE-2 programs, which consists of 173,390 spectra from 3800 to 9200 A sampled on 3849 wavelengths. We classify the original spectra as well as the spectra with the continuum removed. The second set only contains spectral lines,more » and it is less dependent on uncertainties of the flux calibration. The classification of the spectra with continuum renders 16 major classes. Roughly speaking, stars are split according to their colors, with enough finesse to distinguish dwarfs from giants of the same effective temperature, but with difficulties to separate stars with different metallicities. There are classes corresponding to particular MK types, intrinsically blue stars, dust-reddened, stellar systems, and also classes collecting faulty spectra. Overall, there is no one-to-one correspondence between the classes we derive and the MK types. The classification of spectra without continuum renders 13 classes, the color separation is not so sharp, but it distinguishes stars of the same effective temperature and different metallicities. Some classes thus obtained present a fairly small range of physical parameters (200 K in effective temperature, 0.25 dex in surface gravity, and 0.35 dex in metallicity), so that the classification can be used to estimate the main physical parameters of some stars at a minimum computational cost. We also analyze the outliers of the classification. Most of them turn out to be failures of the reduction pipeline, but there are also high redshift QSOs, multiple stellar systems, dust-reddened stars, galaxies, and, finally, odd spectra whose nature we have not deciphered. The template spectra representative of the classes are publicly available in the online journal.« less
Complex extreme learning machine applications in terahertz pulsed signals feature sets.
Yin, X-X; Hadjiloucas, S; Zhang, Y
2014-11-01
This paper presents a novel approach to the automatic classification of very large data sets composed of terahertz pulse transient signals, highlighting their potential use in biochemical, biomedical, pharmaceutical and security applications. Two different types of THz spectra are considered in the classification process. Firstly a binary classification study of poly-A and poly-C ribonucleic acid samples is performed. This is then contrasted with a difficult multi-class classification problem of spectra from six different powder samples that although have fairly indistinguishable features in the optical spectrum, they also possess a few discernable spectral features in the terahertz part of the spectrum. Classification is performed using a complex-valued extreme learning machine algorithm that takes into account features in both the amplitude as well as the phase of the recorded spectra. Classification speed and accuracy are contrasted with that achieved using a support vector machine classifier. The study systematically compares the classifier performance achieved after adopting different Gaussian kernels when separating amplitude and phase signatures. The two signatures are presented as feature vectors for both training and testing purposes. The study confirms the utility of complex-valued extreme learning machine algorithms for classification of the very large data sets generated with current terahertz imaging spectrometers. The classifier can take into consideration heterogeneous layers within an object as would be required within a tomographic setting and is sufficiently robust to detect patterns hidden inside noisy terahertz data sets. The proposed study opens up the opportunity for the establishment of complex-valued extreme learning machine algorithms as new chemometric tools that will assist the wider proliferation of terahertz sensing technology for chemical sensing, quality control, security screening and clinic diagnosis. Furthermore, the proposed algorithm should also be very useful in other applications requiring the classification of very large datasets. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Land cover heterogeneity and soil respiration in a west Greenland tundra landscape
NASA Astrophysics Data System (ADS)
Bradley-Cook, J. I.; Burzynski, A.; Hammond, C. R.; Virginia, R. A.
2011-12-01
Multiple direct and indirect pathways underlie the association between land cover classification, temperature and soil respiration. Temperature is a main control of the biological processes that constitute soil respiration, yet the effect of changing atmospheric temperatures on soil carbon flux is unresolved. This study examines associations amongst land cover, soil carbon characteristics, soil respiration, and temperature in an Arctic tundra landscape in western Greenland. We used a 1.34 meter resolution multi-spectral WorldView2 satellite image to conduct an unsupervised multi-staged ISODATA classification to characterize land cover heterogeneity. The four band image was taken on July 10th, 2010, and captures an 18 km by 15 km area in the vicinity of Kangerlussuaq. The four major terrestrial land cover classes identified were: shrub-dominated, graminoid-dominated, mixed vegetation, and bare soil. The bare soil class was comprised of patches where surface soil has been deflated by wind and ridge-top fellfield. We hypothesize that soil respiration and soil carbon storage are associated with land cover classification and temperature. We set up a hierarchical field sampling design to directly observe spatial variation between and within land cover classes along a 20 km temperature gradient extending west from Russell Glacier on the margin of the Greenland Ice Sheet. We used the land cover classification map and ground verification to select nine sites, each containing patches of the four land cover classes. Within each patch we collected soil samples from a 50 cm pit, quantified vegetation, measured active layer depth and determined landscape characteristics. From a subset of field sites we collected additional 10 cm surface soil samples to estimate soil heterogeneity within patches and measured soil respiration using a LiCor 8100 Infrared Gas Analyzer. Soil respiration rates varied with land cover classes, with values ranging from 0.2 mg C/m^2/hr in the bare soil class to over 5 mg C/m^2/hr in the graminoid-dominated class. These findings suggest that shifts in land cover vegetation types, especially soil and vegetation loss (e.g. from wind deflation), can alter landscape soil respiration. We relate soil respiration measurements to soil, vegetation, and permafrost characteristics to understand how ecosystem properties and processes vary at the landscape scale. A long-term goal of this research is to develop a spatially explicit model of soil organic matter, soil respiration, and temperature sensitivity of soil carbon dynamics for a western Greenland permafrost tundra ecosystems.
Neyman-Pearson classification algorithms and NP receiver operating characteristics
Tong, Xin; Feng, Yang; Li, Jingyi Jessica
2018-01-01
In many binary classification applications, such as disease diagnosis and spam detection, practitioners commonly face the need to limit type I error (that is, the conditional probability of misclassifying a class 0 observation as class 1) so that it remains below a desired threshold. To address this need, the Neyman-Pearson (NP) classification paradigm is a natural choice; it minimizes type II error (that is, the conditional probability of misclassifying a class 1 observation as class 0) while enforcing an upper bound, α, on the type I error. Despite its century-long history in hypothesis testing, the NP paradigm has not been well recognized and implemented in classification schemes. Common practices that directly limit the empirical type I error to no more than α do not satisfy the type I error control objective because the resulting classifiers are likely to have type I errors much larger than α, and the NP paradigm has not been properly implemented in practice. We develop the first umbrella algorithm that implements the NP paradigm for all scoring-type classification methods, such as logistic regression, support vector machines, and random forests. Powered by this algorithm, we propose a novel graphical tool for NP classification methods: NP receiver operating characteristic (NP-ROC) bands motivated by the popular ROC curves. NP-ROC bands will help choose α in a data-adaptive way and compare different NP classifiers. We demonstrate the use and properties of the NP umbrella algorithm and NP-ROC bands, available in the R package nproc, through simulation and real data studies. PMID:29423442
Neyman-Pearson classification algorithms and NP receiver operating characteristics.
Tong, Xin; Feng, Yang; Li, Jingyi Jessica
2018-02-01
In many binary classification applications, such as disease diagnosis and spam detection, practitioners commonly face the need to limit type I error (that is, the conditional probability of misclassifying a class 0 observation as class 1) so that it remains below a desired threshold. To address this need, the Neyman-Pearson (NP) classification paradigm is a natural choice; it minimizes type II error (that is, the conditional probability of misclassifying a class 1 observation as class 0) while enforcing an upper bound, α, on the type I error. Despite its century-long history in hypothesis testing, the NP paradigm has not been well recognized and implemented in classification schemes. Common practices that directly limit the empirical type I error to no more than α do not satisfy the type I error control objective because the resulting classifiers are likely to have type I errors much larger than α, and the NP paradigm has not been properly implemented in practice. We develop the first umbrella algorithm that implements the NP paradigm for all scoring-type classification methods, such as logistic regression, support vector machines, and random forests. Powered by this algorithm, we propose a novel graphical tool for NP classification methods: NP receiver operating characteristic (NP-ROC) bands motivated by the popular ROC curves. NP-ROC bands will help choose α in a data-adaptive way and compare different NP classifiers. We demonstrate the use and properties of the NP umbrella algorithm and NP-ROC bands, available in the R package nproc, through simulation and real data studies.
Ferris, Laura K; Farberg, Aaron S; Middlebrook, Brooke; Johnson, Clare E; Lassen, Natalie; Oelschlager, Kristen M; Maetzold, Derek J; Cook, Robert W; Rigel, Darrell S; Gerami, Pedram
2017-05-01
A significant proportion of patients with American Joint Committee on Cancer (AJCC)-defined early-stage cutaneous melanoma have disease recurrence and die. A 31-gene expression profile (GEP) that accurately assesses metastatic risk associated with primary cutaneous melanomas has been described. We sought to compare accuracy of the GEP in combination with risk determined using the web-based AJCC Individualized Melanoma Patient Outcome Prediction Tool. GEP results from 205 stage I/II cutaneous melanomas with sufficient clinical data for prognostication using the AJCC tool were classified as low (class 1) or high (class 2) risk. Two 5-year overall survival cutoffs (AJCC 79% and 68%), reflecting survival for patients with stage IIA or IIB disease, respectively, were assigned for binary AJCC risk. Cox univariate analysis revealed significant risk classification of distant metastasis-free and overall survival (hazard ratio range 3.2-9.4, P < .001) for both tools. In all, 43 (21%) cases had discordant GEP and AJCC classification (using 79% cutoff). Eleven of 13 (85%) deaths in that group were predicted as high risk by GEP but low risk by AJCC. Specimens reflect tertiary care center referrals; more effective therapies have been approved for clinical use after accrual. The GEP provides valuable prognostic information and improves identification of high-risk melanomas when used together with the AJCC online prediction tool. Copyright © 2016 American Academy of Dermatology, Inc. Published by Elsevier Inc. All rights reserved.
Arctic Sea Ice Classification and Mapping for Surface Albedo Parameterization in Sea Ice Modeling
NASA Astrophysics Data System (ADS)
Nghiem, S. V.; Clemente-Colón, P.; Perovich, D. K.; Polashenski, C.; Simpson, W. R.; Rigor, I. G.; Woods, J. E.; Nguyen, D. T.; Neumann, G.
2016-12-01
A regime shift of Arctic sea ice from predominantly perennial sea ice (multi-year ice or MYI) to seasonal sea ice (first-year ice or FYI) has occurred in recent decades. This shift has profoundly altered the proportional composition of different sea ice classes and the surface albedo distribution pertaining to each sea ice class. Such changes impacts physical, chemical, and biological processes in the Arctic atmosphere-ice-ocean system. The drastic changes upset the traditional geophysical representation of surface albedo of the Arctic sea ice cover in current models. A critical science issue is that these profound changes must be rigorously and systematically observed and characterized to enable a transformative re-parameterization of key model inputs, such as ice surface albedo, to ice-ocean-atmosphere climate modeling in order to obtain re-analyses that accurately reproduce Arctic changes and also to improve sea ice and weather forecast models. Addressing this challenge is a strategy identified by the National Research Council study on "Seasonal to Decadal Predictions of Arctic Sea Ice - Challenges and Strategies" to replicate the new Arctic reality. We review results of albedo characteristics associated with different sea ice classes such as FYI and MYI. Then we demonstrate the capability for sea ice classification and mapping using algorithms developed by the Jet Propulsion Laboratory and by the U.S. National Ice Center for use with multi-sourced satellite radar data at L, C, and Ku bands. Results obtained with independent algorithms for different radar frequencies consistently identify sea ice classes and thereby cross-verify the sea ice classification methods. Moreover, field observations obtained from buoy webcams and along an extensive trek across Elson Lagoon and a sector of the Beaufort Sea during the BRomine, Ozone, and Mercury EXperiment (BROMEX) in March 2012 are used to validate satellite products of sea ice classes. This research enables the mapping of Arctic sea ice classes over multiple decades using multiple satellite radar datasets with both coarse resolution for synoptic scales and high resolution for local and regional scales, which are crucial for realistic surface albedo parameterization to significantly advance sea ice forecast and projection models.
Gabriele, Domenico; Jereczek-Fossa, Barbara A; Krengli, Marco; Garibaldi, Elisabetta; Tessa, Maria; Moro, Gregorio; Girelli, Giuseppe; Gabriele, Pietro
2016-02-24
The aim of this work is to develop an algorithm to predict recurrence in prostate cancer patients treated with radical radiotherapy, getting up to a prognostic power higher than traditional D'Amico risk classification. Two thousand four hundred ninety-three men belonging to the EUREKA-2 retrospective multi-centric database on prostate cancer and treated with external-beam radiotherapy as primary treatment comprised the study population. A Cox regression time to PSA failure analysis was performed in univariate and multivariate settings, evaluating the predictive ability of age, pre-treatment PSA, clinical-radiological staging, Gleason score and percentage of positive cores at biopsy (%PC). The accuracy of this model was checked with bootstrapping statistics. Subgroups for all the variables' combinations were combined to classify patients into five different "Candiolo" risk-classes for biochemical Progression Free Survival (bPFS); thereafter, they were also applied to clinical PFS (cPFS), systemic PFS (sPFS) and Prostate Cancer Specific Survival (PCSS), and compared to D'Amico risk grouping performances. The Candiolo classifier splits patients in 5 risk-groups with the following 10-years bPFS, cPFS, sPFS and PCSS: for very-low-risk 90 %, 94 %, 100 % and 100 %; for low-risk 74 %, 88 %, 94 % and 98 %; for intermediate-risk 60 %, 82 %, 91 % and 92 %; for high-risk 43 %, 55 %, 80 % and 89 % and for very-high-risk 14 %, 38 %, 56 % and 70 %. Our classifier outperforms D'Amico risk classes for all the end-points evaluated, with concordance indexes of 71.5 %, 75.5 %, 80 % and 80.5 % versus 63 %, 65.5 %, 69.5 % and 69 %, respectively. Our classification tool, combining five clinical and easily available parameters, seems to better stratify patients in predicting prostate cancer recurrence after radiotherapy compared to the traditional D'Amico risk classes.
Action Recognition Using 3D Histograms of Texture and A Multi-Class Boosting Classifier.
Zhang, Baochang; Yang, Yun; Chen, Chen; Yang, Linlin; Han, Jungong; Shao, Ling
2017-10-01
Human action recognition is an important yet challenging task. This paper presents a low-cost descriptor called 3D histograms of texture (3DHoTs) to extract discriminant features from a sequence of depth maps. 3DHoTs are derived from projecting depth frames onto three orthogonal Cartesian planes, i.e., the frontal, side, and top planes, and thus compactly characterize the salient information of a specific action, on which texture features are calculated to represent the action. Besides this fast feature descriptor, a new multi-class boosting classifier (MBC) is also proposed to efficiently exploit different kinds of features in a unified framework for action classification. Compared with the existing boosting frameworks, we add a new multi-class constraint into the objective function, which helps to maintain a better margin distribution by maximizing the mean of margin, whereas still minimizing the variance of margin. Experiments on the MSRAction3D, MSRGesture3D, MSRActivity3D, and UTD-MHAD data sets demonstrate that the proposed system combining 3DHoTs and MBC is superior to the state of the art.
A hybrid three-class brain-computer interface system utilizing SSSEPs and transient ERPs
NASA Astrophysics Data System (ADS)
Breitwieser, Christian; Pokorny, Christoph; Müller-Putz, Gernot R.
2016-12-01
Objective. This paper investigates the fusion of steady-state somatosensory evoked potentials (SSSEPs) and transient event-related potentials (tERPs), evoked through tactile simulation on the left and right-hand fingertips, in a three-class EEG based hybrid brain-computer interface. It was hypothesized, that fusing the input signals leads to higher classification rates than classifying tERP and SSSEP individually. Approach. Fourteen subjects participated in the studies, consisting of a screening paradigm to determine person dependent resonance-like frequencies and a subsequent online paradigm. The whole setup of the BCI system was based on open interfaces, following suggestions for a common implementation platform. During the online experiment, subjects were instructed to focus their attention on the stimulated fingertips as indicated by a visual cue. The recorded data were classified during runtime using a multi-class shrinkage LDA classifier and the outputs were fused together applying a posterior probability based fusion. Data were further analyzed offline, involving a combined classification of SSSEP and tERP features as a second fusion principle. The final results were tested for statistical significance applying a repeated measures ANOVA. Main results. A significant classification increase was achieved when fusing the results with a combined classification compared to performing an individual classification. Furthermore, the SSSEP classifier was significantly better in detecting a non-control state, whereas the tERP classifier was significantly better in detecting control states. Subjects who had a higher relative band power increase during the screening session also achieved significantly higher classification results than subjects with lower relative band power increase. Significance. It could be shown that utilizing SSSEP and tERP for hBCIs increases the classification accuracy and also that tERP and SSSEP are not classifying control- and non-control states with the same level of accuracy.
Characterization of agricultural land using singular value decomposition
NASA Astrophysics Data System (ADS)
Herries, Graham M.; Danaher, Sean; Selige, Thomas
1995-11-01
A method is defined and tested for the characterization of agricultural land from multi-spectral imagery, based on singular value decomposition (SVD) and key vector analysis. The SVD technique, which bears a close resemblance to multivariate statistic techniques, has previously been successfully applied to problems of signal extraction for marine data and forestry species classification. In this study the SVD technique is used as a classifier for agricultural regions, using airborne Daedalus ATM data, with 1 m resolution. The specific region chosen is an experimental research farm in Bavaria, Germany. This farm has a large number of crops, within a very small region and hence is not amenable to existing techniques. There are a number of other significant factors which render existing techniques such as the maximum likelihood algorithm less suitable for this area. These include a very dynamic terrain and tessellated pattern soil differences, which together cause large variations in the growth characteristics of the crops. The SVD technique is applied to this data set using a multi-stage classification approach, removing unwanted land-cover classes one step at a time. Typical classification accuracy's for SVD are of the order of 85-100%. Preliminary results indicate that it is a fast and efficient classifier with the ability to differentiate between crop types such as wheat, rye, potatoes and clover. The results of characterizing 3 sub-classes of Winter Wheat are also shown.
exprso: an R-package for the rapid implementation of machine learning algorithms.
Quinn, Thomas; Tylee, Daniel; Glatt, Stephen
2016-01-01
Machine learning plays a major role in many scientific investigations. However, non-expert programmers may struggle to implement the elaborate pipelines necessary to build highly accurate and generalizable models. We introduce exprso , a new R package that is an intuitive machine learning suite designed specifically for non-expert programmers. Built initially for the classification of high-dimensional data, exprso uses an object-oriented framework to encapsulate a number of common analytical methods into a series of interchangeable modules. This includes modules for feature selection, classification, high-throughput parameter grid-searching, elaborate cross-validation schemes (e.g., Monte Carlo and nested cross-validation), ensemble classification, and prediction. In addition, exprso also supports multi-class classification (through the 1-vs-all generalization of binary classifiers) and the prediction of continuous outcomes.
Shamim, Mohammad Tabrez Anwar; Anwaruddin, Mohammad; Nagarajaram, H A
2007-12-15
Fold recognition is a key step in the protein structure discovery process, especially when traditional sequence comparison methods fail to yield convincing structural homologies. Although many methods have been developed for protein fold recognition, their accuracies remain low. This can be attributed to insufficient exploitation of fold discriminatory features. We have developed a new method for protein fold recognition using structural information of amino acid residues and amino acid residue pairs. Since protein fold recognition can be treated as a protein fold classification problem, we have developed a Support Vector Machine (SVM) based classifier approach that uses secondary structural state and solvent accessibility state frequencies of amino acids and amino acid pairs as feature vectors. Among the individual properties examined secondary structural state frequencies of amino acids gave an overall accuracy of 65.2% for fold discrimination, which is better than the accuracy by any method reported so far in the literature. Combination of secondary structural state frequencies with solvent accessibility state frequencies of amino acids and amino acid pairs further improved the fold discrimination accuracy to more than 70%, which is approximately 8% higher than the best available method. In this study we have also tested, for the first time, an all-together multi-class method known as Crammer and Singer method for protein fold classification. Our studies reveal that the three multi-class classification methods, namely one versus all, one versus one and Crammer and Singer method, yield similar predictions. Dataset and stand-alone program are available upon request.
NASA Astrophysics Data System (ADS)
Gonulalan, Cansu
In recent years, there has been an increasing demand for applications to monitor the targets related to land-use, using remote sensing images. Advances in remote sensing satellites give rise to the research in this area. Many applications ranging from urban growth planning to homeland security have already used the algorithms for automated object recognition from remote sensing imagery. However, they have still problems such as low accuracy on detection of targets, specific algorithms for a specific area etc. In this thesis, we focus on an automatic approach to classify and detect building foot-prints, road networks and vegetation areas. The automatic interpretation of visual data is a comprehensive task in computer vision field. The machine learning approaches improve the capability of classification in an intelligent way. We propose a method, which has high accuracy on detection and classification. The multi class classification is developed for detecting multiple objects. We present an AdaBoost-based approach along with the supervised learning algorithm. The combi- nation of AdaBoost with "Attentional Cascade" is adopted from Viola and Jones [1]. This combination decreases the computation time and gives opportunity to real time applications. For the feature extraction step, our contribution is to combine Haar-like features that include corner, rectangle and Gabor. Among all features, AdaBoost selects only critical features and generates in extremely efficient cascade structured classifier. Finally, we present and evaluate our experimental results. The overall system is tested and high performance of detection is achieved. The precision rate of the final multi-class classifier is over 98%.
Kocevar, Gabriel; Stamile, Claudio; Hannoun, Salem; Cotton, François; Vukusic, Sandra; Durand-Dubief, Françoise; Sappey-Marinier, Dominique
2016-01-01
Purpose: In this work, we introduce a method to classify Multiple Sclerosis (MS) patients into four clinical profiles using structural connectivity information. For the first time, we try to solve this question in a fully automated way using a computer-based method. The main goal is to show how the combination of graph-derived metrics with machine learning techniques constitutes a powerful tool for a better characterization and classification of MS clinical profiles. Materials and Methods: Sixty-four MS patients [12 Clinical Isolated Syndrome (CIS), 24 Relapsing Remitting (RR), 24 Secondary Progressive (SP), and 17 Primary Progressive (PP)] along with 26 healthy controls (HC) underwent MR examination. T1 and diffusion tensor imaging (DTI) were used to obtain structural connectivity matrices for each subject. Global graph metrics, such as density and modularity, were estimated and compared between subjects' groups. These metrics were further used to classify patients using tuned Support Vector Machine (SVM) combined with Radial Basic Function (RBF) kernel. Results: When comparing MS patients to HC subjects, a greater assortativity, transitivity, and characteristic path length as well as a lower global efficiency were found. Using all graph metrics, the best F -Measures (91.8, 91.8, 75.6, and 70.6%) were obtained for binary (HC-CIS, CIS-RR, RR-PP) and multi-class (CIS-RR-SP) classification tasks, respectively. When using only one graph metric, the best F -Measures (83.6, 88.9, and 70.7%) were achieved for modularity with previous binary classification tasks. Conclusion: Based on a simple DTI acquisition associated with structural brain connectivity analysis, this automatic method allowed an accurate classification of different MS patients' clinical profiles.
2017-01-01
One of the main aspects affecting the quality of life of people living in urban and suburban areas is their continued exposure to high Road Traffic Noise (RTN) levels. Until now, noise measurements in cities have been performed by professionals, recording data in certain locations to build a noise map afterwards. However, the deployment of Wireless Acoustic Sensor Networks (WASN) has enabled automatic noise mapping in smart cities. In order to obtain a reliable picture of the RTN levels affecting citizens, Anomalous Noise Events (ANE) unrelated to road traffic should be removed from the noise map computation. To this aim, this paper introduces an Anomalous Noise Event Detector (ANED) designed to differentiate between RTN and ANE in real time within a predefined interval running on the distributed low-cost acoustic sensors of a WASN. The proposed ANED follows a two-class audio event detection and classification approach, instead of multi-class or one-class classification schemes, taking advantage of the collection of representative acoustic data in real-life environments. The experiments conducted within the DYNAMAP project, implemented on ARM-based acoustic sensors, show the feasibility of the proposal both in terms of computational cost and classification performance using standard Mel cepstral coefficients and Gaussian Mixture Models (GMM). The two-class GMM core classifier relatively improves the baseline universal GMM one-class classifier F1 measure by 18.7% and 31.8% for suburban and urban environments, respectively, within the 1-s integration interval. Nevertheless, according to the results, the classification performance of the current ANED implementation still has room for improvement. PMID:29023397
Dieye, A.M.; Roy, David P.; Hanan, N.P.; Liu, S.; Hansen, M.; Toure, A.
2012-01-01
Spatially explicit land cover land use (LCLU) change information is needed to drive biogeochemical models that simulate soil organic carbon (SOC) dynamics. Such information is increasingly being mapped using remotely sensed satellite data with classification schemes and uncertainties constrained by the sensing system, classification algorithms and land cover schemes. In this study, automated LCLU classification of multi-temporal Landsat satellite data were used to assess the sensitivity of SOC modeled by the Global Ensemble Biogeochemical Modeling System (GEMS). The GEMS was run for an area of 1560 km2 in Senegal under three climate change scenarios with LCLU maps generated using different Landsat classification approaches. This research provides a method to estimate the variability of SOC, specifically the SOC uncertainty due to satellite classification errors, which we show is dependent not only on the LCLU classification errors but also on where the LCLU classes occur relative to the other GEMS model inputs.
Computer Aided Detection of Breast Masses in Digital Tomosynthesis
2008-06-01
the suspicious CAD location were extracted. For the second set, 256x256 ROIs representing the - 8 - summed slab of 5 slices (5 mm) were extracted...region hotelling observer, digital tomosynthesis, multi-slice CAD algorithms, biopsy 16. SECURITY CLASSIFICATION OF: 17. LIMITATION OF ABSTRACT 18...developing computer-aided detection ( CAD ) tools for mammography. Although these tools have shown promise in identifying calcifications, detecting
Multiple directed graph large-class multi-spectral processor
NASA Technical Reports Server (NTRS)
Casasent, David; Liu, Shiaw-Dong; Yoneyama, Hideyuki
1988-01-01
Numerical analysis techniques for the interpretation of high-resolution imaging-spectrometer data are described and demonstrated. The method proposed involves the use of (1) a hierarchical classifier with a tree structure generated automatically by a Fisher linear-discriminant-function algorithm and (2) a novel multiple-directed-graph scheme which reduces the local maxima and the number of perturbations required. Results for a 500-class test problem involving simulated imaging-spectrometer data are presented in tables and graphs; 100-percent-correct classification is achieved with an improvement factor of 5.
Mujtaba, Ghulam; Shuib, Liyana; Raj, Ram Gopal; Rajandram, Retnagowri; Shaikh, Khairunisa; Al-Garadi, Mohammed Ali
2017-01-01
Widespread implementation of electronic databases has improved the accessibility of plaintext clinical information for supplementary use. Numerous machine learning techniques, such as supervised machine learning approaches or ontology-based approaches, have been employed to obtain useful information from plaintext clinical data. This study proposes an automatic multi-class classification system to predict accident-related causes of death from plaintext autopsy reports through expert-driven feature selection with supervised automatic text classification decision models. Accident-related autopsy reports were obtained from one of the largest hospital in Kuala Lumpur. These reports belong to nine different accident-related causes of death. Master feature vector was prepared by extracting features from the collected autopsy reports by using unigram with lexical categorization. This master feature vector was used to detect cause of death [according to internal classification of disease version 10 (ICD-10) classification system] through five automated feature selection schemes, proposed expert-driven approach, five subset sizes of features, and five machine learning classifiers. Model performance was evaluated using precisionM, recallM, F-measureM, accuracy, and area under ROC curve. Four baselines were used to compare the results with the proposed system. Random forest and J48 decision models parameterized using expert-driven feature selection yielded the highest evaluation measure approaching (85% to 90%) for most metrics by using a feature subset size of 30. The proposed system also showed approximately 14% to 16% improvement in the overall accuracy compared with the existing techniques and four baselines. The proposed system is feasible and practical to use for automatic classification of ICD-10-related cause of death from autopsy reports. The proposed system assists pathologists to accurately and rapidly determine underlying cause of death based on autopsy findings. Furthermore, the proposed expert-driven feature selection approach and the findings are generally applicable to other kinds of plaintext clinical reports.
Mujtaba, Ghulam; Shuib, Liyana; Raj, Ram Gopal; Rajandram, Retnagowri; Shaikh, Khairunisa; Al-Garadi, Mohammed Ali
2017-01-01
Objectives Widespread implementation of electronic databases has improved the accessibility of plaintext clinical information for supplementary use. Numerous machine learning techniques, such as supervised machine learning approaches or ontology-based approaches, have been employed to obtain useful information from plaintext clinical data. This study proposes an automatic multi-class classification system to predict accident-related causes of death from plaintext autopsy reports through expert-driven feature selection with supervised automatic text classification decision models. Methods Accident-related autopsy reports were obtained from one of the largest hospital in Kuala Lumpur. These reports belong to nine different accident-related causes of death. Master feature vector was prepared by extracting features from the collected autopsy reports by using unigram with lexical categorization. This master feature vector was used to detect cause of death [according to internal classification of disease version 10 (ICD-10) classification system] through five automated feature selection schemes, proposed expert-driven approach, five subset sizes of features, and five machine learning classifiers. Model performance was evaluated using precisionM, recallM, F-measureM, accuracy, and area under ROC curve. Four baselines were used to compare the results with the proposed system. Results Random forest and J48 decision models parameterized using expert-driven feature selection yielded the highest evaluation measure approaching (85% to 90%) for most metrics by using a feature subset size of 30. The proposed system also showed approximately 14% to 16% improvement in the overall accuracy compared with the existing techniques and four baselines. Conclusion The proposed system is feasible and practical to use for automatic classification of ICD-10-related cause of death from autopsy reports. The proposed system assists pathologists to accurately and rapidly determine underlying cause of death based on autopsy findings. Furthermore, the proposed expert-driven feature selection approach and the findings are generally applicable to other kinds of plaintext clinical reports. PMID:28166263
On search guide phrase compilation for recommending home medical products.
Luo, Gang
2010-01-01
To help people find desired home medical products (HMPs), we developed an intelligent personal health record (iPHR) system that can automatically recommend HMPs based on users' health issues. Using nursing knowledge, we pre-compile a set of "search guide" phrases that provides semantic translation from words describing health issues to their underlying medical meanings. Then iPHR automatically generates queries from those phrases and uses them and a search engine to retrieve HMPs. To avoid missing relevant HMPs during retrieval, the compiled search guide phrases need to be comprehensive. Such compilation is a challenging task because nursing knowledge updates frequently and contains numerous details scattered in many sources. This paper presents a semi-automatic tool facilitating such compilation. Our idea is to formulate the phrase compilation task as a multi-label classification problem. For each newly obtained search guide phrase, we first use nursing knowledge and information retrieval techniques to identify a small set of potentially relevant classes with corresponding hints. Then a nurse makes the final decision on assigning this phrase to proper classes based on those hints. We demonstrate the effectiveness of our techniques by compiling search guide phrases from an occupational therapy textbook.
GENE-07. MOLECULAR NEUROPATHOLOGY 2.0 - INCREASING DIAGNOSTIC ACCURACY IN PEDIATRIC NEUROONCOLOGY
Sturm, Dominik; Jones, David T.W.; Capper, David; Sahm, Felix; von Deimling, Andreas; Rutkoswki, Stefan; Warmuth-Metz, Monika; Bison, Brigitte; Gessi, Marco; Pietsch, Torsten; Pfister, Stefan M.
2017-01-01
Abstract The classification of central nervous system (CNS) tumors into clinically and biologically distinct entities and subgroups is challenging. Children and adolescents can be affected by >100 histological variants with very variable outcomes, some of which are exceedingly rare. The current WHO classification has introduced a number of novel molecular markers to aid routine neuropathological diagnostics, and DNA methylation profiling is emerging as a powerful tool to distinguish CNS tumor classes. The Molecular Neuropathology 2.0 study aims to integrate genome wide (epi-)genetic diagnostics with reference neuropathological assessment for all newly-diagnosed pediatric brain tumors in Germany. To date, >350 patients have been enrolled. A molecular diagnosis is established by epigenetic tumor classification through DNA methylation profiling and targeted panel sequencing of >130 genes to detect diagnostically and/or therapeutically useful DNA mutations, structural alterations, and fusion events. Results are aligned with the reference neuropathological diagnosis, and discrepant findings are discussed in a multi-disciplinary tumor board including reference neuroradiological evaluation. Ten FFPE sections as input material are sufficient to establish a molecular diagnosis in >95% of tumors. Alignment with reference pathology results in four broad categories: a) concordant classification (~77%), b) discrepant classification resolvable by tumor board discussion and/or additional data (~5%), c) discrepant classification without currently available options to resolve (~8%), and d) cases currently unclassifiable by molecular diagnostics (~10%). Discrepancies are enriched in certain histopathological entities, such as histological high grade gliomas with a molecularly low grade profile. Gene panel sequencing reveals predisposing germline events in ~10% of patients. Genome wide (epi-)genetic analyses add a valuable layer of information to routine neuropathological diagnostics. Our study provides insight into CNS tumors with divergent histopathological and molecular classification, opening new avenues for research discoveries and facilitating optimization of clinical management for affected patients in the future.
Classification and disease prediction via mathematical programming
NASA Astrophysics Data System (ADS)
Lee, Eva K.; Wu, Tsung-Lin
2007-11-01
In this chapter, we present classification models based on mathematical programming approaches. We first provide an overview on various mathematical programming approaches, including linear programming, mixed integer programming, nonlinear programming and support vector machines. Next, we present our effort of novel optimization-based classification models that are general purpose and suitable for developing predictive rules for large heterogeneous biological and medical data sets. Our predictive model simultaneously incorporates (1) the ability to classify any number of distinct groups; (2) the ability to incorporate heterogeneous types of attributes as input; (3) a high-dimensional data transformation that eliminates noise and errors in biological data; (4) the ability to incorporate constraints to limit the rate of misclassification, and a reserved-judgment region that provides a safeguard against over-training (which tends to lead to high misclassification rates from the resulting predictive rule) and (5) successive multi-stage classification capability to handle data points placed in the reserved judgment region. To illustrate the power and flexibility of the classification model and solution engine, and its multigroup prediction capability, application of the predictive model to a broad class of biological and medical problems is described. Applications include: the differential diagnosis of the type of erythemato-squamous diseases; predicting presence/absence of heart disease; genomic analysis and prediction of aberrant CpG island meythlation in human cancer; discriminant analysis of motility and morphology data in human lung carcinoma; prediction of ultrasonic cell disruption for drug delivery; identification of tumor shape and volume in treatment of sarcoma; multistage discriminant analysis of biomarkers for prediction of early atherosclerois; fingerprinting of native and angiogenic microvascular networks for early diagnosis of diabetes, aging, macular degeneracy and tumor metastasis; prediction of protein localization sites; and pattern recognition of satellite images in classification of soil types. In all these applications, the predictive model yields correct classification rates ranging from 80% to 100%. This provides motivation for pursuing its use as a medical diagnostic, monitoring and decision-making tool.
Multivariate detrending of fMRI signal drifts for real-time multiclass pattern classification.
Lee, Dongha; Jang, Changwon; Park, Hae-Jeong
2015-03-01
Signal drift in functional magnetic resonance imaging (fMRI) is an unavoidable artifact that limits classification performance in multi-voxel pattern analysis of fMRI. As conventional methods to reduce signal drift, global demeaning or proportional scaling disregards regional variations of drift, whereas voxel-wise univariate detrending is too sensitive to noisy fluctuations. To overcome these drawbacks, we propose a multivariate real-time detrending method for multiclass classification that involves spatial demeaning at each scan and the recursive detrending of drifts in the classifier outputs driven by a multiclass linear support vector machine. Experiments using binary and multiclass data showed that the linear trend estimation of the classifier output drift for each class (a weighted sum of drifts in the class-specific voxels) was more robust against voxel-wise artifacts that lead to inconsistent spatial patterns and the effect of online processing than voxel-wise detrending. The classification performance of the proposed method was significantly better, especially for multiclass data, than that of voxel-wise linear detrending, global demeaning, and classifier output detrending without demeaning. We concluded that the multivariate approach using classifier output detrending of fMRI signals with spatial demeaning preserves spatial patterns, is less sensitive than conventional methods to sample size, and increases classification performance, which is a useful feature for real-time fMRI classification. Copyright © 2014 Elsevier Inc. All rights reserved.
Confidence level estimation in multi-target classification problems
NASA Astrophysics Data System (ADS)
Chang, Shi; Isaacs, Jason; Fu, Bo; Shin, Jaejeong; Zhu, Pingping; Ferrari, Silvia
2018-04-01
This paper presents an approach for estimating the confidence level in automatic multi-target classification performed by an imaging sensor on an unmanned vehicle. An automatic target recognition algorithm comprised of a deep convolutional neural network in series with a support vector machine classifier detects and classifies targets based on the image matrix. The joint posterior probability mass function of target class, features, and classification estimates is learned from labeled data, and recursively updated as additional images become available. Based on the learned joint probability mass function, the approach presented in this paper predicts the expected confidence level of future target classifications, prior to obtaining new images. The proposed approach is tested with a set of simulated sonar image data. The numerical results show that the estimated confidence level provides a close approximation to the actual confidence level value determined a posteriori, i.e. after the new image is obtained by the on-board sensor. Therefore, the expected confidence level function presented in this paper can be used to adaptively plan the path of the unmanned vehicle so as to optimize the expected confidence levels and ensure that all targets are classified with satisfactory confidence after the path is executed.
Borchani, Hanen; Bielza, Concha; Martı Nez-Martı N, Pablo; Larrañaga, Pedro
2012-12-01
Multi-dimensional Bayesian network classifiers (MBCs) are probabilistic graphical models recently proposed to deal with multi-dimensional classification problems, where each instance in the data set has to be assigned to more than one class variable. In this paper, we propose a Markov blanket-based approach for learning MBCs from data. Basically, it consists of determining the Markov blanket around each class variable using the HITON algorithm, then specifying the directionality over the MBC subgraphs. Our approach is applied to the prediction problem of the European Quality of Life-5 Dimensions (EQ-5D) from the 39-item Parkinson's Disease Questionnaire (PDQ-39) in order to estimate the health-related quality of life of Parkinson's patients. Fivefold cross-validation experiments were carried out on randomly generated synthetic data sets, Yeast data set, as well as on a real-world Parkinson's disease data set containing 488 patients. The experimental study, including comparison with additional Bayesian network-based approaches, back propagation for multi-label learning, multi-label k-nearest neighbor, multinomial logistic regression, ordinary least squares, and censored least absolute deviations, shows encouraging results in terms of predictive accuracy as well as the identification of dependence relationships among class and feature variables. Copyright © 2012 Elsevier Inc. All rights reserved.
A Cognitive Computing Approach for Classification of Complaints in the Insurance Industry
NASA Astrophysics Data System (ADS)
Forster, J.; Entrup, B.
2017-10-01
In this paper we present and evaluate a cognitive computing approach for classification of dissatisfaction and four complaint specific complaint classes in correspondence documents between insurance clients and an insurance company. A cognitive computing approach includes the combination classical natural language processing methods, machine learning algorithms and the evaluation of hypothesis. The approach combines a MaxEnt machine learning algorithm with language modelling, tf-idf and sentiment analytics to create a multi-label text classification model. The result is trained and tested with a set of 2500 original insurance communication documents written in German, which have been manually annotated by the partnering insurance company. With a F1-Score of 0.9, a reliable text classification component has been implemented and evaluated. A final outlook towards a cognitive computing insurance assistant is given in the end.
A Directed Acyclic Graph-Large Margin Distribution Machine Model for Music Symbol Classification
Wen, Cuihong; Zhang, Jing; Rebelo, Ana; Cheng, Fanyong
2016-01-01
Optical Music Recognition (OMR) has received increasing attention in recent years. In this paper, we propose a classifier based on a new method named Directed Acyclic Graph-Large margin Distribution Machine (DAG-LDM). The DAG-LDM is an improvement of the Large margin Distribution Machine (LDM), which is a binary classifier that optimizes the margin distribution by maximizing the margin mean and minimizing the margin variance simultaneously. We modify the LDM to the DAG-LDM to solve the multi-class music symbol classification problem. Tests are conducted on more than 10000 music symbol images, obtained from handwritten and printed images of music scores. The proposed method provides superior classification capability and achieves much higher classification accuracy than the state-of-the-art algorithms such as Support Vector Machines (SVMs) and Neural Networks (NNs). PMID:26985826
A Directed Acyclic Graph-Large Margin Distribution Machine Model for Music Symbol Classification.
Wen, Cuihong; Zhang, Jing; Rebelo, Ana; Cheng, Fanyong
2016-01-01
Optical Music Recognition (OMR) has received increasing attention in recent years. In this paper, we propose a classifier based on a new method named Directed Acyclic Graph-Large margin Distribution Machine (DAG-LDM). The DAG-LDM is an improvement of the Large margin Distribution Machine (LDM), which is a binary classifier that optimizes the margin distribution by maximizing the margin mean and minimizing the margin variance simultaneously. We modify the LDM to the DAG-LDM to solve the multi-class music symbol classification problem. Tests are conducted on more than 10000 music symbol images, obtained from handwritten and printed images of music scores. The proposed method provides superior classification capability and achieves much higher classification accuracy than the state-of-the-art algorithms such as Support Vector Machines (SVMs) and Neural Networks (NNs).
NASA Astrophysics Data System (ADS)
Bernales, A. M.; Antolihao, J. A.; Samonte, C.; Campomanes, F.; Rojas, R. J.; dela Serna, A. M.; Silapan, J.
2016-06-01
The threat of the ailments related to urbanization like heat stress is very prevalent. There are a lot of things that can be done to lessen the effect of urbanization to the surface temperature of the area like using green roofs or planting trees in the area. So land use really matters in both increasing and decreasing surface temperature. It is known that there is a relationship between land use land cover (LULC) and land surface temperature (LST). Quantifying this relationship in terms of a mathematical model is very important so as to provide a way to predict LST based on the LULC alone. This study aims to examine the relationship between LST and LULC as well as to create a model that can predict LST using class-level spatial metrics from LULC. LST was derived from a Landsat 8 image and LULC classification was derived from LiDAR and Orthophoto datasets. Class-level spatial metrics were created in FRAGSTATS with the LULC and LST as inputs and these metrics were analysed using a statistical framework. Multi linear regression was done to create models that would predict LST for each class and it was found that the spatial metric "Effective mesh size" was a top predictor for LST in 6 out of 7 classes. The model created can still be refined by adding a temporal aspect by analysing the LST of another farming period (for rural areas) and looking for common predictors between LSTs of these two different farming periods.
Self-organizing ontology of biochemically relevant small molecules
2012-01-01
Background The advent of high-throughput experimentation in biochemistry has led to the generation of vast amounts of chemical data, necessitating the development of novel analysis, characterization, and cataloguing techniques and tools. Recently, a movement to publically release such data has advanced biochemical structure-activity relationship research, while providing new challenges, the biggest being the curation, annotation, and classification of this information to facilitate useful biochemical pattern analysis. Unfortunately, the human resources currently employed by the organizations supporting these efforts (e.g. ChEBI) are expanding linearly, while new useful scientific information is being released in a seemingly exponential fashion. Compounding this, currently existing chemical classification and annotation systems are not amenable to automated classification, formal and transparent chemical class definition axiomatization, facile class redefinition, or novel class integration, thus further limiting chemical ontology growth by necessitating human involvement in curation. Clearly, there is a need for the automation of this process, especially for novel chemical entities of biological interest. Results To address this, we present a formal framework based on Semantic Web technologies for the automatic design of chemical ontology which can be used for automated classification of novel entities. We demonstrate the automatic self-assembly of a structure-based chemical ontology based on 60 MeSH and 40 ChEBI chemical classes. This ontology is then used to classify 200 compounds with an accuracy of 92.7%. We extend these structure-based classes with molecular feature information and demonstrate the utility of our framework for classification of functionally relevant chemicals. Finally, we discuss an iterative approach that we envision for future biochemical ontology development. Conclusions We conclude that the proposed methodology can ease the burden of chemical data annotators and dramatically increase their productivity. We anticipate that the use of formal logic in our proposed framework will make chemical classification criteria more transparent to humans and machines alike and will thus facilitate predictive and integrative bioactivity model development. PMID:22221313
NASA Astrophysics Data System (ADS)
Kaur, Parneet; Singh, Sukhwinder; Garg, Sushil; Harmanpreet
2010-11-01
In this paper we study about classification algorithms for farm DSS. By applying classification algorithms i.e. Limited search, ID3, CHAID, C4.5, Improved C4.5 and One VS all Decision Tree on common data set of crop with specified class, results are obtained. The tool used to derive results is SPINA. The graphical results obtained from tool are compared to suggest best technique to develop farm Decision Support System. This analysis would help to researchers to design effective and fast DSS for farmer to take decision for enhancing their yield.
Optimal mapping of site-specific multivariate soil properties.
Burrough, P A; Swindell, J
1997-01-01
This paper demonstrates how geostatistics and fuzzy k-means classification can be used together to improve our practical understanding of crop yield-site response. Two aspects of soil are important for precision farming: (a) sensible classes for a given crop, and (b) their spatial variation. Local site classifications are more sensitive than general taxonomies and can be provided by the method of fuzzy k-means to transform a multivariate data set with i attributes measured at n sites into k overlapping classes; each site has a membership value mk for each class in the range 0-1. Soil variation is of interest when conditions vary over patches manageable by agricultural machinery. The spatial variation of each of the k classes can be analysed by computing the variograms of mk over the n sites. Memberships for each of the k classes can be mapped by ordinary kriging. Areas of class dominance and the transition zones between them can be identified by an inter-class confusion index; reducing the zones to boundaries gives crisp maps of dominant soil groups that can be used to guide precision farming equipment. Automation of the procedure is straightforward given sufficient data. Time variations in soil properties can be automatically incorporated in the computation of membership values. The procedures are illustrated with multi-year crop yield data collected from a 5 ha demonstration field at the Royal Agricultural College in Cirencester, UK.
Pasquier, C; Promponas, V J; Hamodrakas, S J
2001-08-15
A cascading system of hierarchical, artificial neural networks (named PRED-CLASS) is presented for the generalized classification of proteins into four distinct classes-transmembrane, fibrous, globular, and mixed-from information solely encoded in their amino acid sequences. The architecture of the individual component networks is kept very simple, reducing the number of free parameters (network synaptic weights) for faster training, improved generalization, and the avoidance of data overfitting. Capturing information from as few as 50 protein sequences spread among the four target classes (6 transmembrane, 10 fibrous, 13 globular, and 17 mixed), PRED-CLASS was able to obtain 371 correct predictions out of a set of 387 proteins (success rate approximately 96%) unambiguously assigned into one of the target classes. The application of PRED-CLASS to several test sets and complete proteomes of several organisms demonstrates that such a method could serve as a valuable tool in the annotation of genomic open reading frames with no functional assignment or as a preliminary step in fold recognition and ab initio structure prediction methods. Detailed results obtained for various data sets and completed genomes, along with a web sever running the PRED-CLASS algorithm, can be accessed over the World Wide Web at http://o2.biol.uoa.gr/PRED-CLASS.
Yap, Jonathan; Lim, Fang Yi; Gao, Fei; Teo, Ling Li; Lam, Carolyn Su Ping; Yeo, Khung Keong
2015-10-01
Functional status assessment is the cornerstone of heart failure management and trials. The New York Heart Association (NYHA) classification and 6-minute walk distance (6MWD) are commonly used tools; however, the correlation between them is not well understood. We hypothesised that the relationship between the NYHA classification and 6MWD might vary across studies. A systematic literature search was performed to identify all studies reporting both NYHA class and 6MWD. Two reviewers independently assessed study eligibility and extracted data. Thirty-seven studies involving 5678 patients were included. There was significant heterogeneity across studies in 6MWD within all NYHA classes: I (n = 16, Q = 934.2; P < 0.001), II (n = 25, Q = 1658.3; P < 0.001), III (n = 30, Q = 1020.1; P < 0.001), and IV (n = 6, Q = 335.5; P < 0.001). There was no significant difference in average 6MWD between NYHA I and II (420 m vs 393 m; P = 0.416). There was a significant difference in average 6MWD between NYHA II and III (393 m vs 321 m; P = 0.014) and III and IV (321 m vs 224 m; P = 0.027). This remained significant after adjusting for region of study, age, and sex. Although there is an inverse correlation between NYHA II-IV and 6MWD, there is significant heterogeneity across studies in 6MWD within each NYHA class and overlap in 6MWD between NYHA I and II. The NYHA classification performs well in more symptomatic patients (NYHA III/IV) but less so in asymptomatic/mildly symptomatic patients (NYHA I/II). Nonetheless, the NYHA classification is an easily applied first-line tool in everyday clinical practice, but its potential subjectivity should be considered when performing comparisons across studies. © 2015 Wiley Periodicals, Inc.
A One-Versus-All Class Binarization Strategy for Bearing Diagnostics of Concurrent Defects
Ng, Selina S. Y.; Tse, Peter W.; Tsui, Kwok L.
2014-01-01
In bearing diagnostics using a data-driven modeling approach, a concern is the need for data from all possible scenarios to build a practical model for all operating conditions. This paper is a study on bearing diagnostics with the concurrent occurrence of multiple defect types. The authors are not aware of any work in the literature that studies this practical problem. A strategy based on one-versus-all (OVA) class binarization is proposed to improve fault diagnostics accuracy while reducing the number of scenarios for data collection, by predicting concurrent defects from training data of normal and single defects. The proposed OVA diagnostic approach is evaluated with empirical analysis using support vector machine (SVM) and C4.5 decision tree, two popular classification algorithms frequently applied to system health diagnostics and prognostics. Statistical features are extracted from the time domain and the frequency domain. Prediction performance of the proposed strategy is compared with that of a simple multi-class classification, as well as that of random guess and worst-case classification. We have verified the potential of the proposed OVA diagnostic strategy in performance improvements for single-defect diagnosis and predictions of BPFO plus BPFI concurrent defects using two laboratory-collected vibration data sets. PMID:24419162
A one-versus-all class binarization strategy for bearing diagnostics of concurrent defects.
Ng, Selina S Y; Tse, Peter W; Tsui, Kwok L
2014-01-13
In bearing diagnostics using a data-driven modeling approach, a concern is the need for data from all possible scenarios to build a practical model for all operating conditions. This paper is a study on bearing diagnostics with the concurrent occurrence of multiple defect types. The authors are not aware of any work in the literature that studies this practical problem. A strategy based on one-versus-all (OVA) class binarization is proposed to improve fault diagnostics accuracy while reducing the number of scenarios for data collection, by predicting concurrent defects from training data of normal and single defects. The proposed OVA diagnostic approach is evaluated with empirical analysis using support vector machine (SVM) and C4.5 decision tree, two popular classification algorithms frequently applied to system health diagnostics and prognostics. Statistical features are extracted from the time domain and the frequency domain. Prediction performance of the proposed strategy is compared with that of a simple multi-class classification, as well as that of random guess and worst-case classification. We have verified the potential of the proposed OVA diagnostic strategy in performance improvements for single-defect diagnosis and predictions of BPFO plus BPFI concurrent defects using two laboratory-collected vibration data sets.
Flow-based watershed classes provide useful strata for sampling and assessment schemes both for instream condition and condition of downstream receiving waters, and may provide useful strata for development of nutrient criteria.
Pattern recognition for passive polarimetric data using nonparametric classifiers
NASA Astrophysics Data System (ADS)
Thilak, Vimal; Saini, Jatinder; Voelz, David G.; Creusere, Charles D.
2005-08-01
Passive polarization based imaging is a useful tool in computer vision and pattern recognition. A passive polarization imaging system forms a polarimetric image from the reflection of ambient light that contains useful information for computer vision tasks such as object detection (classification) and recognition. Applications of polarization based pattern recognition include material classification and automatic shape recognition. In this paper, we present two target detection algorithms for images captured by a passive polarimetric imaging system. The proposed detection algorithms are based on Bayesian decision theory. In these approaches, an object can belong to one of any given number classes and classification involves making decisions that minimize the average probability of making incorrect decisions. This minimum is achieved by assigning an object to the class that maximizes the a posteriori probability. Computing a posteriori probabilities requires estimates of class conditional probability density functions (likelihoods) and prior probabilities. A Probabilistic neural network (PNN), which is a nonparametric method that can compute Bayes optimal boundaries, and a -nearest neighbor (KNN) classifier, is used for density estimation and classification. The proposed algorithms are applied to polarimetric image data gathered in the laboratory with a liquid crystal-based system. The experimental results validate the effectiveness of the above algorithms for target detection from polarimetric data.
Model-Based Building Detection from Low-Cost Optical Sensors Onboard Unmanned Aerial Vehicles
NASA Astrophysics Data System (ADS)
Karantzalos, K.; Koutsourakis, P.; Kalisperakis, I.; Grammatikopoulos, L.
2015-08-01
The automated and cost-effective building detection in ultra high spatial resolution is of major importance for various engineering and smart city applications. To this end, in this paper, a model-based building detection technique has been developed able to extract and reconstruct buildings from UAV aerial imagery and low-cost imaging sensors. In particular, the developed approach through advanced structure from motion, bundle adjustment and dense image matching computes a DSM and a true orthomosaic from the numerous GoPro images which are characterised by important geometric distortions and fish-eye effect. An unsupervised multi-region, graphcut segmentation and a rule-based classification is responsible for delivering the initial multi-class classification map. The DTM is then calculated based on inpaininting and mathematical morphology process. A data fusion process between the detected building from the DSM/DTM and the classification map feeds a grammar-based building reconstruction and scene building are extracted and reconstructed. Preliminary experimental results appear quite promising with the quantitative evaluation indicating detection rates at object level of 88% regarding the correctness and above 75% regarding the detection completeness.
A real-time classification algorithm for EEG-based BCI driven by self-induced emotions.
Iacoviello, Daniela; Petracca, Andrea; Spezialetti, Matteo; Placidi, Giuseppe
2015-12-01
The aim of this paper is to provide an efficient, parametric, general, and completely automatic real time classification method of electroencephalography (EEG) signals obtained from self-induced emotions. The particular characteristics of the considered low-amplitude signals (a self-induced emotion produces a signal whose amplitude is about 15% of a really experienced emotion) require exploring and adapting strategies like the Wavelet Transform, the Principal Component Analysis (PCA) and the Support Vector Machine (SVM) for signal processing, analysis and classification. Moreover, the method is thought to be used in a multi-emotions based Brain Computer Interface (BCI) and, for this reason, an ad hoc shrewdness is assumed. The peculiarity of the brain activation requires ad-hoc signal processing by wavelet decomposition, and the definition of a set of features for signal characterization in order to discriminate different self-induced emotions. The proposed method is a two stages algorithm, completely parameterized, aiming at a multi-class classification and may be considered in the framework of machine learning. The first stage, the calibration, is off-line and is devoted at the signal processing, the determination of the features and at the training of a classifier. The second stage, the real-time one, is the test on new data. The PCA theory is applied to avoid redundancy in the set of features whereas the classification of the selected features, and therefore of the signals, is obtained by the SVM. Some experimental tests have been conducted on EEG signals proposing a binary BCI, based on the self-induced disgust produced by remembering an unpleasant odor. Since in literature it has been shown that this emotion mainly involves the right hemisphere and in particular the T8 channel, the classification procedure is tested by using just T8, though the average accuracy is calculated and reported also for the whole set of the measured channels. The obtained classification results are encouraging with percentage of success that is, in the average for the whole set of the examined subjects, above 90%. An ongoing work is the application of the proposed procedure to map a large set of emotions with EEG and to establish the EEG headset with the minimal number of channels to allow the recognition of a significant range of emotions both in the field of affective computing and in the development of auxiliary communication tools for subjects affected by severe disabilities. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Learning to Predict Combinatorial Structures
NASA Astrophysics Data System (ADS)
Vembu, Shankar
2009-12-01
The major challenge in designing a discriminative learning algorithm for predicting structured data is to address the computational issues arising from the exponential size of the output space. Existing algorithms make different assumptions to ensure efficient, polynomial time estimation of model parameters. For several combinatorial structures, including cycles, partially ordered sets, permutations and other graph classes, these assumptions do not hold. In this thesis, we address the problem of designing learning algorithms for predicting combinatorial structures by introducing two new assumptions: (i) The first assumption is that a particular counting problem can be solved efficiently. The consequence is a generalisation of the classical ridge regression for structured prediction. (ii) The second assumption is that a particular sampling problem can be solved efficiently. The consequence is a new technique for designing and analysing probabilistic structured prediction models. These results can be applied to solve several complex learning problems including but not limited to multi-label classification, multi-category hierarchical classification, and label ranking.
NASA Astrophysics Data System (ADS)
Demuzere, Matthias; Kassomenos, P.; Philipp, A.
2011-08-01
In the framework of the COST733 Action "Harmonisation and Applications of Weather Types Classifications for European Regions" a new circulation type classification software (hereafter, referred to as cost733class software) is developed. The cost733class software contains a variety of (European) classification methods and is flexible towards choice of domain of interest, input variables, time step, number of circulation types, sequencing and (weighted) target variables. This work introduces the capabilities of the cost733class software in which the resulting circulation types (CTs) from various circulation type classifications (CTCs) are applied on observed summer surface ozone concentrations in Central Europe. Firstly, the main characteristics of the CTCs in terms of circulation pattern frequencies are addressed using the baseline COST733 catalogue (cat 2.0), at present the latest product of the new cost733class software. In a second step, the probabilistic Brier skill score is used to quantify the explanatory power of all classifications in terms of the maximum 8 hourly mean ozone concentrations exceeding the 120-μg/m3 threshold; this was based on ozone concentrations from 130 Central European measurement stations. Averaged evaluation results over all stations indicate generally higher performance of CTCs with a higher number of types. Within the subset of methodologies with a similar number of types, the results suggest that the use of CTCs based on optimisation algorithms are performing slightly better than those which are based on other algorithms (predefined thresholds, principal component analysis and leader algorithms). The results are further elaborated by exploring additional capabilities of the cost733class software. Sensitivity experiments are performed using different domain sizes, input variables, seasonally based classifications and multiple-day sequencing. As an illustration, CTCs which are also conditioned towards temperature with various weights are derived and tested similarly. All results exploit a physical interpretation by adapting the environment-to-circulation approach, providing more detailed information on specific synoptic conditions prevailing on days with high surface ozone concentrations. This research does not intend to bring forward a favourite classification methodology or construct a statistical ozone forecasting tool but should be seen as an introduction to the possibilities of the cost733class software. It this respect, the results presented here can provide a basic user support for the cost733class software and the development of a more user- or application-specific CTC approach.
NASA Astrophysics Data System (ADS)
Bayramov, Emil; Mammadov, Ramiz
2016-07-01
The main goals of this research are the object-based landcover classification of LANDSAT-8 multi-spectral satellite images in 2014 and 2015, quantification of Normalized Difference Vegetation Indices (NDVI) rates within the land-cover classes, change detection analysis between the NDVIs derived from multi-temporal LANDSAT-8 satellite images and the quantification of those changes within the land-cover classes and detection of changes between land-cover classes. The object-based classification accuracy of the land-cover classes was validated through the standard confusion matrix which revealed 80 % of land-cover classification accuracy for both years. The analysis revealed that the area of agricultural lands increased from 30911 sq. km. in 2014 to 31999 sq. km. in 2015. The area of barelands increased from 3933 sq. km. in 2014 to 4187 sq. km. in 2015. The area of forests increased from 8211 sq. km. in 2014 to 9175 sq. km. in 2015. The area of grasslands decreased from 27176 sq. km. in 2014 to 23294 sq. km. in 2015. The area of urban areas increased from 12479 sq. km. in 2014 to 12956 sq. km. in 2015. The decrease in the area of grasslands was mainly explained by the landuse shifts of grasslands to agricultural and urban lands. The quantification of low and medium NDVI rates revealed the increase within the agricultural, urban and forest land-cover classes in 2015. However, the high NDVI rates within agricultural, urban and forest land-cover classes in 2015 revealed to be lower relative to 2014. The change detection analysis between landscover types of 2014 and 2015 allowed to determine that 7740 sq. km. of grasslands shifted to agricultural landcover type whereas 5442sq. km. of agricultural lands shifted to rangelands. This means that the spatio-temporal patters of agricultural activities occurred in Azerbaijan because some of the areas reduced agricultural activities whereas some of them changed their landuse type to agricultural. Based on the achieved results, it is possible to conclude that the area of agricultural lands in Azerbaijan increased from 2014 to 2015. The crop productivity also increased in the croplands, however some of the areas showed lower productivity in 2015 relative to 2014.
Diversity and evolution of class 2 CRISPR–Cas systems
Shmakov, Sergey; Smargon, Aaron; Scott, David; Cox, David; Pyzocha, Neena; Yan, Winston; Abudayyeh, Omar O.; Gootenberg, Jonathan S.; Makarova, Kira S.; Wolf, Yuri I.; Severinov, Konstantin; Zhang, Feng; Koonin, Eugene V.
2018-01-01
Class 2 CRISPR–Cas systems are characterized by effector modules that consist of a single multidomain protein, such as Cas9 or Cpf1. We designed a computational pipeline for the discovery of novel class 2 variants and used it to identify six new CRISPR–Cas subtypes. The diverse properties of these new systems provide potential for the development of versatile tools for genome editing and regulation. In this Analysis article, we present a comprehensive census of class 2 types and class 2 subtypes in complete and draft bacterial and archaeal genomes, outline evolutionary scenarios for the independent origin of different class 2 CRISPR–Cas systems from mobile genetic elements, and propose an amended classification and nomenclature of CRISPR–Cas. PMID:28111461
Malware distributed collection and pre-classification system using honeypot technology
NASA Astrophysics Data System (ADS)
Grégio, André R. A.; Oliveira, Isabela L.; Santos, Rafael D. C.; Cansian, Adriano M.; de Geus, Paulo L.
2009-04-01
Malware has become a major threat in the last years due to the ease of spread through the Internet. Malware detection has become difficult with the use of compression, polymorphic methods and techniques to detect and disable security software. Those and other obfuscation techniques pose a problem for detection and classification schemes that analyze malware behavior. In this paper we propose a distributed architecture to improve malware collection using different honeypot technologies to increase the variety of malware collected. We also present a daemon tool developed to grab malware distributed through spam and a pre-classification technique that uses antivirus technology to separate malware in generic classes.
Pairwise Classifier Ensemble with Adaptive Sub-Classifiers for fMRI Pattern Analysis.
Kim, Eunwoo; Park, HyunWook
2017-02-01
The multi-voxel pattern analysis technique is applied to fMRI data for classification of high-level brain functions using pattern information distributed over multiple voxels. In this paper, we propose a classifier ensemble for multiclass classification in fMRI analysis, exploiting the fact that specific neighboring voxels can contain spatial pattern information. The proposed method converts the multiclass classification to a pairwise classifier ensemble, and each pairwise classifier consists of multiple sub-classifiers using an adaptive feature set for each class-pair. Simulated and real fMRI data were used to verify the proposed method. Intra- and inter-subject analyses were performed to compare the proposed method with several well-known classifiers, including single and ensemble classifiers. The comparison results showed that the proposed method can be generally applied to multiclass classification in both simulations and real fMRI analyses.
Soranno, Patricia A.; Cheruvelil, Kendra Spence; Webster, Katherine E.; Bremigan, Mary T.; Wagner, Tyler; Stow, Craig A.
2010-01-01
Governmental entities are responsible for managing and conserving large numbers of lake, river, and wetland ecosystems that can be addressed only rarely on a case-by-case basis. We present a system for predictive classification modeling, grounded in the theoretical foundation of landscape limnology, that creates a tractable number of ecosystem classes to which management actions may be tailored. We demonstrate our system by applying two types of predictive classification modeling approaches to develop nutrient criteria for eutrophication management in 1998 north temperate lakes. Our predictive classification system promotes the effective management of multiple ecosystems across broad geographic scales by explicitly connecting management and conservation goals to the classification modeling approach, considering multiple spatial scales as drivers of ecosystem dynamics, and acknowledging the hierarchical structure of freshwater ecosystems. Such a system is critical for adaptive management of complex mosaics of freshwater ecosystems and for balancing competing needs for ecosystem services in a changing world.
A hybrid method for prediction and repositioning of drug Anatomical Therapeutic Chemical classes.
Chen, Lei; Lu, Jing; Zhang, Ning; Huang, Tao; Cai, Yu-Dong
2014-04-01
In the Anatomical Therapeutic Chemical (ATC) classification system, therapeutic drugs are divided into 14 main classes according to the organ or system on which they act and their chemical, pharmacological and therapeutic properties. This system, recommended by the World Health Organization (WHO), provides a global standard for classifying medical substances and serves as a tool for international drug utilization research to improve quality of drug use. In view of this, it is necessary to develop effective computational prediction methods to identify the ATC-class of a given drug, which thereby could facilitate further analysis of this system. In this study, we initiated an attempt to develop a prediction method and to gain insights from it by utilizing ontology information of drug compounds. Since only about one-fourth of drugs in the ATC classification system have ontology information, a hybrid prediction method combining the ontology information, chemical interaction information and chemical structure information of drug compounds was proposed for the prediction of drug ATC-classes. As a result, by using the Jackknife test, the 1st prediction accuracies for identifying the 14 main ATC-classes in the training dataset, the internal validation dataset and the external validation dataset were 75.90%, 75.70% and 66.36%, respectively. Analysis of some samples with false-positive predictions in the internal and external validation datasets indicated that some of them may even have a relationship with the false-positive predicted ATC-class, suggesting novel uses of these drugs. It was conceivable that the proposed method could be used as an efficient tool to identify ATC-classes of novel drugs or to discover novel uses of known drugs.
NASA Astrophysics Data System (ADS)
Tarando, Sebastian Roberto; Fetita, Catalin; Brillet, Pierre-Yves
2017-03-01
The infiltrative lung diseases are a class of irreversible, non-neoplastic lung pathologies requiring regular follow-up with CT imaging. Quantifying the evolution of the patient status imposes the development of automated classification tools for lung texture. Traditionally, such classification relies on a two-dimensional analysis of axial CT images. This paper proposes a cascade of the existing CNN based CAD system, specifically tuned-up. The advantage of using a deep learning approach is a better regularization of the classification output. In a preliminary evaluation, the combined approach was tested on a 13 patient database of various lung pathologies, showing an increase of 10% in True Positive Rate (TPR) with respect to the best suited state of the art CNN for this task.
Czodrowski, Paul
2014-11-01
In the 1960s, the kappa statistic was introduced for the estimation of chance agreement in inter- and intra-rater reliability studies. The kappa statistic was strongly pushed by the medical field where it could be successfully applied via analyzing diagnoses of identical patient groups. Kappa is well suited for classification tasks where ranking is not considered. The main advantage of kappa is its simplicity and the general applicability to multi-class problems which is the major difference to receiver operating characteristic area under the curve. In this manuscript, I will outline the usage of kappa for classification tasks, and I will evaluate the role and uses of kappa in specifically machine learning and cheminformatics.
Can surgical simulation be used to train detection and classification of neural networks?
Zisimopoulos, Odysseas; Flouty, Evangello; Stacey, Mark; Muscroft, Sam; Giataganas, Petros; Nehme, Jean; Chow, Andre; Stoyanov, Danail
2017-10-01
Computer-assisted interventions (CAI) aim to increase the effectiveness, precision and repeatability of procedures to improve surgical outcomes. The presence and motion of surgical tools is a key information input for CAI surgical phase recognition algorithms. Vision-based tool detection and recognition approaches are an attractive solution and can be designed to take advantage of the powerful deep learning paradigm that is rapidly advancing image recognition and classification. The challenge for such algorithms is the availability and quality of labelled data used for training. In this Letter, surgical simulation is used to train tool detection and segmentation based on deep convolutional neural networks and generative adversarial networks. The authors experiment with two network architectures for image segmentation in tool classes commonly encountered during cataract surgery. A commercially-available simulator is used to create a simulated cataract dataset for training models prior to performing transfer learning on real surgical data. To the best of authors' knowledge, this is the first attempt to train deep learning models for surgical instrument detection on simulated data while demonstrating promising results to generalise on real data. Results indicate that simulated data does have some potential for training advanced classification methods for CAI systems.
GIS/RS-based Rapid Reassessment for Slope Land Capability Classification
NASA Astrophysics Data System (ADS)
Chang, T. Y.; Chompuchan, C.
2014-12-01
Farmland resources in Taiwan are limited because about 73% is mountainous and slope land. Moreover, the rapid urbanization and dense population resulted in the highly developed flat area. Therefore, the utilization of slope land for agriculture is more needed. In 1976, "Slope Land Conservation and Utilization Act" was promulgated to regulate the slope land utilization. Consequently, slope land capability was categorized into Class I-IV according to 4 criteria, i.e., average land slope, effective soil depth, degree of soil erosion, and parent rock. The slope land capability Class I-VI are suitable for cultivation and pasture. Whereas, Class V should be used for forestry purpose and Class VI should be the conservation land which requires intensive conservation practices. The field survey was conducted to categorize each land unit as the classification scheme. The landowners may not allow to overuse land capability limitation. In the last decade, typhoons and landslides frequently devastated in Taiwan. The rapid post-disaster reassessment of the slope land capability classification is necessary. However, the large-scale disaster on slope land is the constraint of field investigation. This study focused on using satellite remote sensing and GIS as the rapid re-evaluation method. Chenyulan watershed in Nantou County, Taiwan was selected to be a case study area. Grid-based slope derivation, topographic wetness index (TWI) and USLE soil loss calculation were used to classify slope land capability. The results showed that GIS-based classification give an overall accuracy of 68.32%. In addition, the post-disaster areas of Typhoon Morakot in 2009, which interpreted by SPOT satellite imageries, were suggested to classify as the conservation lands. These tools perform better in the large coverage post-disaster update for slope land capability classification and reduce time-consuming, manpower and material resources to the field investigation.
Training echo state networks for rotation-invariant bone marrow cell classification.
Kainz, Philipp; Burgsteiner, Harald; Asslaber, Martin; Ahammer, Helmut
2017-01-01
The main principle of diagnostic pathology is the reliable interpretation of individual cells in context of the tissue architecture. Especially a confident examination of bone marrow specimen is dependent on a valid classification of myeloid cells. In this work, we propose a novel rotation-invariant learning scheme for multi-class echo state networks (ESNs), which achieves very high performance in automated bone marrow cell classification. Based on representing static images as temporal sequence of rotations, we show how ESNs robustly recognize cells of arbitrary rotations by taking advantage of their short-term memory capacity. The performance of our approach is compared to a classification random forest that learns rotation-invariance in a conventional way by exhaustively training on multiple rotations of individual samples. The methods were evaluated on a human bone marrow image database consisting of granulopoietic and erythropoietic cells in different maturation stages. Our ESN approach to cell classification does not rely on segmentation of cells or manual feature extraction and can therefore directly be applied to image data.
GEAS Spectroscopy Tools for Authentic Research Investigations in the Classroom
NASA Astrophysics Data System (ADS)
Rector, Travis A.; Vogt, Nicole P.
2018-06-01
Spectroscopy is one of the most powerful tools that astronomers use to study the universe. However relatively few resources are available that enable undergraduates to explore astronomical spectra interactively. We present web-based applications which guide students through the analysis of real spectra of stars, galaxies, and quasars. The tools are written in HTML5 and function in all modern web browsers on computers and tablets. No software needs to be installed nor do any datasets need to be downloaded, enabling students to use the tools in or outside of class (e.g., for online classes).Approachable GUIs allow students to analyze spectra in the same manner as professional astronomers. The stellar spectroscopy tool can fit a continuum with a blackbody and identify spectral features, as well as fit line profiles and determine equivalent widths. The galaxy and AGN tools can also measure redshifts and calcium break strengths. The tools provide access to an archive of hundreds of spectra obtained with the optical telescopes at Kitt Peak National Observatory. It is also possible to load your own spectra or to query the Sloan Digital Sky Survey (SDSS) database.We have also developed curricula to investigate these topics: spectral classification, variable stars, redshift, and AGN classification. We will present the functionality of the tools and describe the associated curriculum. The tools are part of the General Education Astronomy Source (GEAS) project based at New Mexico State University, with support from the National Science Foundation (NSF, AST-0349155) and the National Aeronautics and Space Administration (NASA, NNX09AV36G). Curriculum development was supported by the NSF (DUE-0618849 and DUE-0920293).
Maximum Margin Clustering of Hyperspectral Data
NASA Astrophysics Data System (ADS)
Niazmardi, S.; Safari, A.; Homayouni, S.
2013-09-01
In recent decades, large margin methods such as Support Vector Machines (SVMs) are supposed to be the state-of-the-art of supervised learning methods for classification of hyperspectral data. However, the results of these algorithms mainly depend on the quality and quantity of available training data. To tackle down the problems associated with the training data, the researcher put effort into extending the capability of large margin algorithms for unsupervised learning. One of the recent proposed algorithms is Maximum Margin Clustering (MMC). The MMC is an unsupervised SVMs algorithm that simultaneously estimates both the labels and the hyperplane parameters. Nevertheless, the optimization of the MMC algorithm is a non-convex problem. Most of the existing MMC methods rely on the reformulating and the relaxing of the non-convex optimization problem as semi-definite programs (SDP), which are computationally very expensive and only can handle small data sets. Moreover, most of these algorithms are two-class classification, which cannot be used for classification of remotely sensed data. In this paper, a new MMC algorithm is used that solve the original non-convex problem using Alternative Optimization method. This algorithm is also extended for multi-class classification and its performance is evaluated. The results of the proposed algorithm show that the algorithm has acceptable results for hyperspectral data clustering.
NASA Astrophysics Data System (ADS)
Schwindling, Jerome
2010-04-01
This course presents an overview of the concepts of the neural networks and their aplication in the framework of High energy physics analyses. After a brief introduction on the concept of neural networks, the concept is explained in the frame of neuro-biology, introducing the concept of multi-layer perceptron, learning and their use as data classifer. The concept is then presented in a second part using in more details the mathematical approach focussing on typical use cases faced in particle physics. Finally, the last part presents the best way to use such statistical tools in view of event classifers, putting the emphasis on the setup of the multi-layer perceptron. The full article (15 p.) corresponding to this lecture is written in french and is provided in the proceedings of the book SOS 2008.
Thompson, Bryony A.; Greenblatt, Marc S.; Vallee, Maxime P.; Herkert, Johanna C.; Tessereau, Chloe; Young, Erin L.; Adzhubey, Ivan A.; Li, Biao; Bell, Russell; Feng, Bingjian; Mooney, Sean D.; Radivojac, Predrag; Sunyaev, Shamil R.; Frebourg, Thierry; Hofstra, Robert M.W.; Sijmons, Rolf H.; Boucher, Ken; Thomas, Alun; Goldgar, David E.; Spurdle, Amanda B.; Tavtigian, Sean V.
2015-01-01
Classification of rare missense substitutions observed during genetic testing for patient management is a considerable problem in clinical genetics. The Bayesian integrated evaluation of unclassified variants is a solution originally developed for BRCA1/2. Here, we take a step toward an analogous system for the mismatch repair (MMR) genes (MLH1, MSH2, MSH6, and PMS2) that confer colon cancer susceptibility in Lynch syndrome by calibrating in silico tools to estimate prior probabilities of pathogenicity for MMR gene missense substitutions. A qualitative five-class classification system was developed and applied to 143 MMR missense variants. This identified 74 missense substitutions suitable for calibration. These substitutions were scored using six different in silico tools (Align-Grantham Variation Grantham Deviation, multivariate analysis of protein polymorphisms [MAPP], Mut-Pred, PolyPhen-2.1, Sorting Intolerant From Tolerant, and Xvar), using curated MMR multiple sequence alignments where possible. The output from each tool was calibrated by regression against the classifications of the 74 missense substitutions; these calibrated outputs are interpretable as prior probabilities of pathogenicity. MAPP was the most accurate tool and MAPP + PolyPhen-2.1 provided the best-combined model (R2 = 0.62 and area under receiver operating characteristic = 0.93). The MAPP + PolyPhen-2.1 output is sufficiently predictive to feed as a continuous variable into the quantitative Bayesian integrated evaluation for clinical classification of MMR gene missense substitutions. PMID:22949387
Automation of motor dexterity assessment.
Heyer, Patrick; Castrejon, Luis R; Orihuela-Espina, Felipe; Sucar, Luis Enrique
2017-07-01
Motor dexterity assessment is regularly performed in rehabilitation wards to establish patient status and automatization for such routinary task is sought. A system for automatizing the assessment of motor dexterity based on the Fugl-Meyer scale and with loose restrictions on sensing technologies is presented. The system consists of two main elements: 1) A data representation that abstracts the low level information obtained from a variety of sensors, into a highly separable low dimensionality encoding employing t-distributed Stochastic Neighbourhood Embedding, and, 2) central to this communication, a multi-label classifier that boosts classification rates by exploiting the fact that the classes corresponding to the individual exercises are naturally organized as a network. Depending on the targeted therapeutic movement class labels i.e. exercises scores, are highly correlated-patients who perform well in one, tends to perform well in related exercises-; and critically no node can be used as proxy of others - an exercise does not encode the information of other exercises. Over data from a cohort of 20 patients, the novel classifier outperforms classical Naive Bayes, random forest and variants of support vector machines (ANOVA: p < 0.001). The novel multi-label classification strategy fulfills an automatic system for motor dexterity assessment, with implications for lessening therapist's workloads, reducing healthcare costs and providing support for home-based virtual rehabilitation and telerehabilitation alternatives.
Experiments on Supervised Learning Algorithms for Text Categorization
NASA Technical Reports Server (NTRS)
Namburu, Setu Madhavi; Tu, Haiying; Luo, Jianhui; Pattipati, Krishna R.
2005-01-01
Modern information society is facing the challenge of handling massive volume of online documents, news, intelligence reports, and so on. How to use the information accurately and in a timely manner becomes a major concern in many areas. While the general information may also include images and voice, we focus on the categorization of text data in this paper. We provide a brief overview of the information processing flow for text categorization, and discuss two supervised learning algorithms, viz., support vector machines (SVM) and partial least squares (PLS), which have been successfully applied in other domains, e.g., fault diagnosis [9]. While SVM has been well explored for binary classification and was reported as an efficient algorithm for text categorization, PLS has not yet been applied to text categorization. Our experiments are conducted on three data sets: Reuter's- 21578 dataset about corporate mergers and data acquisitions (ACQ), WebKB and the 20-Newsgroups. Results show that the performance of PLS is comparable to SVM in text categorization. A major drawback of SVM for multi-class categorization is that it requires a voting scheme based on the results of pair-wise classification. PLS does not have this drawback and could be a better candidate for multi-class text categorization.
Sub-pixel image classification for forest types in East Texas
NASA Astrophysics Data System (ADS)
Westbrook, Joey
Sub-pixel classification is the extraction of information about the proportion of individual materials of interest within a pixel. Landcover classification at the sub-pixel scale provides more discrimination than traditional per-pixel multispectral classifiers for pixels where the material of interest is mixed with other materials. It allows for the un-mixing of pixels to show the proportion of each material of interest. The materials of interest for this study are pine, hardwood, mixed forest and non-forest. The goal of this project was to perform a sub-pixel classification, which allows a pixel to have multiple labels, and compare the result to a traditional supervised classification, which allows a pixel to have only one label. The satellite image used was a Landsat 5 Thematic Mapper (TM) scene of the Stephen F. Austin Experimental Forest in Nacogdoches County, Texas and the four cover type classes are pine, hardwood, mixed forest and non-forest. Once classified, a multi-layer raster datasets was created that comprised four raster layers where each layer showed the percentage of that cover type within the pixel area. Percentage cover type maps were then produced and the accuracy of each was assessed using a fuzzy error matrix for the sub-pixel classifications, and the results were compared to the supervised classification in which a traditional error matrix was used. The overall accuracy of the sub-pixel classification using the aerial photo for both training and reference data had the highest (65% overall) out of the three sub-pixel classifications. This was understandable because the analyst can visually observe the cover types actually on the ground for training data and reference data, whereas using the FIA (Forest Inventory and Analysis) plot data, the analyst must assume that an entire pixel contains the exact percentage of a cover type found in a plot. An increase in accuracy was found after reclassifying each sub-pixel classification from nine classes with 10 percent interval each to five classes with 20 percent interval each. When compared to the supervised classification which has a satisfactory overall accuracy of 90%, none of the sub-pixel classification achieved the same level. However, since traditional per-pixel classifiers assign only one label to pixels throughout the landscape while sub-pixel classifications assign multiple labels to each pixel, the traditional 85% accuracy of acceptance for pixel-based classifications should not apply to sub-pixel classifications. More research is needed in order to define the level of accuracy that is deemed acceptable for sub-pixel classifications.
Classification of conductance traces with recurrent neural networks
NASA Astrophysics Data System (ADS)
Lauritzen, Kasper P.; Magyarkuti, András; Balogh, Zoltán; Halbritter, András; Solomon, Gemma C.
2018-02-01
We present a new automated method for structural classification of the traces obtained in break junction experiments. Using recurrent neural networks trained on the traces of minimal cross-sectional area in molecular dynamics simulations, we successfully separate the traces into two classes: point contact or nanowire. This is done without any assumptions about the expected features of each class. The trained neural network is applied to experimental break junction conductance traces, and it separates the classes as well as the previously used experimental methods. The effect of using partial conductance traces is explored, and we show that the method performs equally well using full or partial traces (as long as the trace just prior to breaking is included). When only the initial part of the trace is included, the results are still better than random chance. Finally, we show that the neural network classification method can be used to classify experimental conductance traces without using simulated results for training, but instead training the network on a few representative experimental traces. This offers a tool to recognize some characteristic motifs of the traces, which can be hard to find by simple data selection algorithms.
Gross, Douglas P; Zhang, Jing; Steenstra, Ivan; Barnsley, Susan; Haws, Calvin; Amell, Tyler; McIntosh, Greg; Cooper, Juliette; Zaiane, Osmar
2013-12-01
To develop a classification algorithm and accompanying computer-based clinical decision support tool to help categorize injured workers toward optimal rehabilitation interventions based on unique worker characteristics. Population-based historical cohort design. Data were extracted from a Canadian provincial workers' compensation database on all claimants undergoing work assessment between December 2009 and January 2011. Data were available on: (1) numerous personal, clinical, occupational, and social variables; (2) type of rehabilitation undertaken; and (3) outcomes following rehabilitation (receiving time loss benefits or undergoing repeat programs). Machine learning, concerned with the design of algorithms to discriminate between classes based on empirical data, was the foundation of our approach to build a classification system with multiple independent and dependent variables. The population included 8,611 unique claimants. Subjects were predominantly employed (85 %) males (64 %) with diagnoses of sprain/strain (44 %). Baseline clinician classification accuracy was high (ROC = 0.86) for selecting programs that lead to successful return-to-work. Classification performance for machine learning techniques outperformed the clinician baseline classification (ROC = 0.94). The final classifiers were multifactorial and included the variables: injury duration, occupation, job attachment status, work status, modified work availability, pain intensity rating, self-rated occupational disability, and 9 items from the SF-36 Health Survey. The use of machine learning classification techniques appears to have resulted in classification performance better than clinician decision-making. The final algorithm has been integrated into a computer-based clinical decision support tool that requires additional validation in a clinical sample.
Classifying quantum entanglement through topological links
NASA Astrophysics Data System (ADS)
Quinta, Gonçalo M.; André, Rui
2018-04-01
We propose an alternative classification scheme for quantum entanglement based on topological links. This is done by identifying a nonrigid ring to a particle, attributing the act of cutting and removing a ring to the operation of tracing out the particle, and associating linked rings to entangled particles. This analogy naturally leads us to a classification of multipartite quantum entanglement based on all possible distinct links for a given number of rings. To determine all different possibilities, we develop a formalism that associates any link to a polynomial, with each polynomial thereby defining a distinct equivalence class. To demonstrate the use of this classification scheme, we choose qubit quantum states as our example of physical system. A possible procedure to obtain qubit states from the polynomials is also introduced, providing an example state for each link class. We apply the formalism for the quantum systems of three and four qubits and demonstrate the potential of these tools in a context of qubit networks.
A multi-objective sampling design has been implemented through R-EMAP support of a cooperative agreement with the state of West Virginia. Goals of the project include: 1) development and testing of a temperature-adjusted fish IBI for the Central Appalachian Plateau and Western Al...
Vegetation classification and distribution mapping report Mesa Verde National Park
Thomas, Kathryn A.; McTeague, Monica L.; Ogden, Lindsay; Floyd, M. Lisa; Schulz, Keith; Friesen, Beverly A.; Fancher, Tammy; Waltermire, Robert G.; Cully, Anne
2009-01-01
The classification and distribution mapping of the vegetation of Mesa Verde National Park (MEVE) and surrounding environment was achieved through a multi-agency effort between 2004 and 2007. The National Park Service’s Southern Colorado Plateau Network facilitated the team that conducted the work, which comprised the U.S. Geological Survey’s Southwest Biological Science Center, Fort Collins Research Center, and Rocky Mountain Geographic Science Center; Northern Arizona University; Prescott College; and NatureServe. The project team described 47 plant communities for MEVE, 34 of which were described from quantitative classification based on f eld-relevé data collected in 1993 and 2004. The team derived 13 additional plant communities from field observations during the photointerpretation phase of the project. The National Vegetation Classification Standard served as a framework for classifying these plant communities to the alliance and association level. Eleven of the 47 plant communities were classified as “park specials;” that is, plant communities with insufficient data to describe them as new alliances or associations. The project team also developed a spatial vegetation map database representing MEVE, with three different map-class schemas: base, group, and management map classes. The base map classes represent the fi nest level of spatial detail. Initial polygons were developed using Definiens Professional (at the time of our use, this software was called eCognition), assisted by interpretation of 1:12,000 true-color digital orthophoto quarter quadrangles (DOQQs). These polygons (base map classes) were labeled using manual photo interpretation of the DOQQs and 1:12,000 true-color aerial photography. Field visits verified interpretation concepts. The vegetation map database includes 46 base map classes, which consist of associations, alliances, and park specials classified with quantitative analysis, additional associations and park specials noted during photointerpretation, and non-vegetated land cover, such as infrastructure, land use, and geological land cover. The base map classes consist of 5,007 polygons in the project area. A field-based accuracy assessment of the base map classes showed overall accuracy to be 43.5%. Seven map classes comprise 89.1% of the park vegetated land cover. The group map classes represent aggregations of the base map classes, approximating the group level of the National Vegetation Classification Standard, version 2 (Federal Geographic Data Committee 2007), and reflecting physiognomy and floristics. Terrestrial ecological systems, as described by NatureServe (Comer et al. 2003), were used as the fi rst approximation of the group level. The project team identified 14 group map classes for this project. The overall accuracy of the group map classes was determined using the same accuracy assessment data as for the base map classes. The overall accuracy of the group representation of vegetation was 80.3%. In consultation with park staff , the team developed management map classes, consisting of park-defined groupings of base map classes intended to represent a balance between maintaining required accuracy and providing a focus on vegetation of particular interest or import to park managers. The 23 management map classes had an overall accuracy of 73.3%. While the main products of this project are the vegetation classification and the vegetation map database, a number of ancillary digital geographic information system and database products were also produced that can be used independently or to augment the main products. These products include shapefiles of the locations of field-collected data and relational databases of field-collected data.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Paegert, Martin; Stassun, Keivan G.; Burger, Dan M.
2014-08-01
We describe a new neural-net-based light curve classifier and provide it with documentation as a ready-to-use tool for the community. While optimized for identification and classification of eclipsing binary stars, the classifier is general purpose, and has been developed for speed in the context of upcoming massive surveys such as the Large Synoptic Survey Telescope. A challenge for classifiers in the context of neural-net training and massive data sets is to minimize the number of parameters required to describe each light curve. We show that a simple and fast geometric representation that encodes the overall light curve shape, together withmore » a chi-square parameter to capture higher-order morphology information results in efficient yet robust light curve classification, especially for eclipsing binaries. Testing the classifier on the ASAS light curve database, we achieve a retrieval rate of 98% and a false-positive rate of 2% for eclipsing binaries. We achieve similarly high retrieval rates for most other periodic variable-star classes, including RR Lyrae, Mira, and delta Scuti. However, the classifier currently has difficulty discriminating between different sub-classes of eclipsing binaries, and suffers a relatively low (∼60%) retrieval rate for multi-mode delta Cepheid stars. We find that it is imperative to train the classifier's neural network with exemplars that include the full range of light curve quality to which the classifier will be expected to perform; the classifier performs well on noisy light curves only when trained with noisy exemplars. The classifier source code, ancillary programs, a trained neural net, and a guide for use, are provided.« less
Demirci, Oguz; Clark, Vincent P; Magnotta, Vincent A; Andreasen, Nancy C; Lauriello, John; Kiehl, Kent A; Pearlson, Godfrey D; Calhoun, Vince D
2008-09-01
Functional magnetic resonance imaging (fMRI) is a fairly new technique that has the potential to characterize and classify brain disorders such as schizophrenia. It has the possibility of playing a crucial role in designing objective prognostic/diagnostic tools, but also presents numerous challenges to analysis and interpretation. Classification provides results for individual subjects, rather than results related to group differences. This is a more complicated endeavor that must be approached more carefully and efficient methods should be developed to draw generalized and valid conclusions out of high dimensional data with a limited number of subjects, especially for heterogeneous disorders whose pathophysiology is unknown. Numerous research efforts have been reported in the field using fMRI activation of schizophrenia patients and healthy controls. However, the results are usually not generalizable to larger data sets and require careful definition of the techniques used both in designing algorithms and reporting prediction accuracies. In this review paper, we survey a number of previous reports and also identify possible biases (cross-validation, class size, e.g.) in class comparison/prediction problems. Some suggestions to improve the effectiveness of the presentation of the prediction accuracy results are provided. We also present our own results using a projection pursuit algorithm followed by an application of independent component analysis proposed in an earlier study. We classify schizophrenia versus healthy controls using fMRI data of 155 subjects from two sites obtained during three different tasks. The results are compared in order to investigate the effectiveness of each task and differences between patients with schizophrenia and healthy controls were investigated.
NASA Astrophysics Data System (ADS)
Tao, C.-S.; Chen, S.-W.; Li, Y.-Z.; Xiao, S.-P.
2017-09-01
Land cover classification is an important application for polarimetric synthetic aperture radar (PolSAR) data utilization. Rollinvariant polarimetric features such as H / Ani / α / Span are commonly adopted in PolSAR land cover classification. However, target orientation diversity effect makes PolSAR images understanding and interpretation difficult. Only using the roll-invariant polarimetric features may introduce ambiguity in the interpretation of targets' scattering mechanisms and limit the followed classification accuracy. To address this problem, this work firstly focuses on hidden polarimetric feature mining in the rotation domain along the radar line of sight using the recently reported uniform polarimetric matrix rotation theory and the visualization and characterization tool of polarimetric coherence pattern. The former rotates the acquired polarimetric matrix along the radar line of sight and fully describes the rotation characteristics of each entry of the matrix. Sets of new polarimetric features are derived to describe the hidden scattering information of the target in the rotation domain. The latter extends the traditional polarimetric coherence at a given rotation angle to the rotation domain for complete interpretation. A visualization and characterization tool is established to derive new polarimetric features for hidden information exploration. Then, a classification scheme is developed combing both the selected new hidden polarimetric features in rotation domain and the commonly used roll-invariant polarimetric features with a support vector machine (SVM) classifier. Comparison experiments based on AIRSAR and multi-temporal UAVSAR data demonstrate that compared with the conventional classification scheme which only uses the roll-invariant polarimetric features, the proposed classification scheme achieves both higher classification accuracy and better robustness. For AIRSAR data, the overall classification accuracy with the proposed classification scheme is 94.91 %, while that with the conventional classification scheme is 93.70 %. Moreover, for multi-temporal UAVSAR data, the averaged overall classification accuracy with the proposed classification scheme is up to 97.08 %, which is much higher than the 87.79 % from the conventional classification scheme. Furthermore, for multitemporal PolSAR data, the proposed classification scheme can achieve better robustness. The comparison studies also clearly demonstrate that mining and utilization of hidden polarimetric features and information in the rotation domain can gain the added benefits for PolSAR land cover classification and provide a new vision for PolSAR image interpretation and application.
32 CFR 1642.2 - The claim for classification in Class 3-A.
Code of Federal Regulations, 2010 CFR
2010-07-01
... SYSTEM CLASSIFICATION OF REGISTRANTS DEFERRED BECAUSE OF HARDSHIP TO DEPENDENTS § 1642.2 The claim for classification in Class 3-A. A claim for classification in Class 3-A must be made by the registrant in writing... 32 National Defense 6 2010-07-01 2010-07-01 false The claim for classification in Class 3-A. 1642...
32 CFR 1642.2 - The claim for classification in Class 3-A.
Code of Federal Regulations, 2011 CFR
2011-07-01
... SYSTEM CLASSIFICATION OF REGISTRANTS DEFERRED BECAUSE OF HARDSHIP TO DEPENDENTS § 1642.2 The claim for classification in Class 3-A. A claim for classification in Class 3-A must be made by the registrant in writing... 32 National Defense 6 2011-07-01 2011-07-01 false The claim for classification in Class 3-A. 1642...
Shao, Wei; Liu, Mingxia; Zhang, Daoqiang
2016-01-01
The systematic study of subcellular location pattern is very important for fully characterizing the human proteome. Nowadays, with the great advances in automated microscopic imaging, accurate bioimage-based classification methods to predict protein subcellular locations are highly desired. All existing models were constructed on the independent parallel hypothesis, where the cellular component classes are positioned independently in a multi-class classification engine. The important structural information of cellular compartments is missed. To deal with this problem for developing more accurate models, we proposed a novel cell structure-driven classifier construction approach (SC-PSorter) by employing the prior biological structural information in the learning model. Specifically, the structural relationship among the cellular components is reflected by a new codeword matrix under the error correcting output coding framework. Then, we construct multiple SC-PSorter-based classifiers corresponding to the columns of the error correcting output coding codeword matrix using a multi-kernel support vector machine classification approach. Finally, we perform the classifier ensemble by combining those multiple SC-PSorter-based classifiers via majority voting. We evaluate our method on a collection of 1636 immunohistochemistry images from the Human Protein Atlas database. The experimental results show that our method achieves an overall accuracy of 89.0%, which is 6.4% higher than the state-of-the-art method. The dataset and code can be downloaded from https://github.com/shaoweinuaa/. dqzhang@nuaa.edu.cn Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Automatic photointerpretation for plant species and stress identification (ERTS-A1)
NASA Technical Reports Server (NTRS)
Swanlund, G. D. (Principal Investigator); Kirvida, L.; Johnson, G. R.
1973-01-01
The author has identified the following significant results. Automatic stratification of forested land from ERTS-1 data provides a valuable tool for resource management. The results are useful for wood product yield estimates, recreation and wildlife management, forest inventory, and forest condition monitoring. Automatic procedures based on both multispectral and spatial features are evaluated. With five classes, training and testing on the same samples, classification accuracy of 74 percent was achieved using the MSS multispectral features. When adding texture computed from 8 x 8 arrays, classification accuracy of 90 percent was obtained.
Walton, David M; Lefebvre, Andy; Reynolds, Darcy
2015-06-01
Illness representations pertain to the ways in which an individual constructs and understands the experience of a health condition. The Brief Illness Perceptions Questionnaire (BIPQ) comprises 9 items intended to capture the key components of the Illness Representations Model. The purpose of this paper was to explore the utility of the BIPQ for evaluating and classifying uncomplicated mechanical neck pain in the rehabilitation setting. A convenience sample of 198 subjects presenting to physiotherapy for neck pain problems were used in this study. In the first step, 183 subjects completed the BIPQ and a series of related cognitive measures. Latent class analysis (LCA) was used to explore the number of identifiable classes amongst the sample based on BIPQ response patterns. A regression equation was created to facilitate classification. In the second step, an independent sample of 15 subjects were classified using the equation established in step 1, and they were followed over a 3 month period. The LCA revealed 3 classes of subjects with optimal fit statistics: mildly affected, moderately affected, and severely affected. Inter-group comparisons of the secondary cognitive measures supported these labels. Classification accuracy of a regression equation was high (94.5%). Applying the equation to the independent longitudinal sample revealed that it functioned equally well and that the classes may have prognostic value. The BIPQ may be a useful clinical tool for classification of neck pain. Copyright © 2014 Elsevier Ltd. All rights reserved.
Garcia-Chimeno, Yolanda; Garcia-Zapirain, Begonya
2015-01-01
The classification of subjects' pathologies enables a rigorousness to be applied to the treatment of certain pathologies, as doctors on occasions play with so many variables that they can end up confusing some illnesses with others. Thanks to Machine Learning techniques applied to a health-record database, it is possible to make using our algorithm. hClass contains a non-linear classification of either a supervised, non-supervised or semi-supervised type. The machine is configured using other techniques such as validation of the set to be classified (cross-validation), reduction in features (PCA) and committees for assessing the various classifiers. The tool is easy to use, and the sample matrix and features that one wishes to classify, the number of iterations and the subjects who are going to be used to train the machine all need to be introduced as inputs. As a result, the success rate is shown either via a classifier or via a committee if one has been formed. A 90% success rate is obtained in the ADABoost classifier and 89.7% in the case of a committee (comprising three classifiers) when PCA is applied. This tool can be expanded to allow the user to totally characterise the classifiers by adjusting them to each classification use.
NASA Astrophysics Data System (ADS)
De Bonis, Giulia; Bozza, Cristiano
2017-03-01
In the framework of Horizon 2020, the European Commission approved the ASTERICS initiative (ASTronomy ESFRI and Research Infrastructure CluSter) to collect knowledge and experiences from astronomy, astrophysics and particle physics and foster synergies among existing research infrastructures and scientific communities, hence paving the way for future ones. ASTERICS aims at producing a common set of tools and strategies to be applied in Astronomy ESFRI facilities. In particular, it will target the so-called multi-messenger approach to combine information from optical and radio telescopes, photon counters and neutrino telescopes. pLISA is a software tool under development in ASTERICS to help and promote machine learning as a unified approach to multivariate analysis of astrophysical data and signals. The library will offer a collection of classification parameters, estimators, classes and methods to be linked and used in reconstruction programs (and possibly also extended), to characterize events in terms of particle identification and energy. The pLISA library aims at offering the software infras tructure for applications developed inside different experiments and has been designed with an effort to extrapolate general, physics-related estimators from the specific features of the data model related to each particular experiment. pLISA is oriented towards parallel computing architectures, with awareness of the opportunity of using GPUs as accelerators demanding specifically optimized algorithms and to reduce the costs of pro cessing hardware requested for the reconstruction tasks. Indeed, a fast (ideally, real-time) reconstruction can open the way for the development or improvement of alert systems, typically required by multi-messenger search programmes among the different experi mental facilities involved in ASTERICS.
Multi-sensor physical activity recognition in free-living.
Ellis, Katherine; Godbole, Suneeta; Kerr, Jacqueline; Lanckriet, Gert
Physical activity monitoring in free-living populations has many applications for public health research, weight-loss interventions, context-aware recommendation systems and assistive technologies. We present a system for physical activity recognition that is learned from a free-living dataset of 40 women who wore multiple sensors for seven days. The multi-level classification system first learns low-level codebook representations for each sensor and uses a random forest classifier to produce minute-level probabilities for each activity class. Then a higher-level HMM layer learns patterns of transitions and durations of activities over time to smooth the minute-level predictions. [Formula: see text].
Chen, Lei; Liu, Tao; Zhao, Xian
2018-06-01
The anatomical therapeutic chemical (ATC) classification system is a widely accepted drug classification scheme. This system comprises five levels and includes several classes in each level. Drugs are classified into classes according to their therapeutic effects and characteristics. The first level includes 14 main classes. In this study, we proposed two network-based models to infer novel potential chemicals deemed to belong in the first level of ATC classification. To build these models, two large chemical networks were constructed using the chemical-chemical interaction information retrieved from the Search Tool for Interactions of Chemicals (STITCH). Two classic network algorithms, shortest path (SP) and random walk with restart (RWR) algorithms, were executed on the corresponding network to mine novel chemicals for each ATC class using the validated drugs in a class as seed nodes. Then, the obtained chemicals yielded by these two algorithms were further evaluated by a permutation test and an association test. The former can exclude chemicals produced by the structure of the network, i.e., false positive discoveries. By contrast, the latter identifies the most important chemicals that have strong associations with the ATC class. Comparisons indicated that the two models can provide quite dissimilar results, suggesting that the results yielded by one model can be essential supplements for those obtained by the other model. In addition, several representative inferred chemicals were analyzed to confirm the reliability of the results generated by the two models. This article is part of a Special Issue entitled: Accelerating Precision Medicine through Genetic and Genomic Big Data Analysis edited by Yudong Cai & Tao Huang. Copyright © 2017 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Ferreira, Louise Brandes Moura
This study was an application of Philosophy for Children pedagogy to science education. It was designed to answer the question, What roles do a science story (Harry Discovers Science), multi-sensorial activities designed to accompany the story, and classroom dialogue associated with the story---all modeled on the Philosophy for Children curriculum---play in the learning processes of a class of fifth graders with regard to the basic science process skills of classification, observation, and inference? To answer the question, I collected qualitative data as I carried out a participatory study in which I taught science to fifth graders at an international, bilingual private religious school in Brasilia, Brazil for a period of one semester. Twenty-one (n = 21) children participated in the study, 10 females and 11 males, who came from a predominantly middle and upper class social background. Data were collected through student interviews, student class reflection sheets, written learning assessments, audiotapes of all class sessions, including whole-class and small-class group discussions, and a videotape of one class session. Some of the key findings were that the story, activities and dialogue facilitated the children's learning in a number of ways. The story modeled the performance of classification, observation and inference skills for the children as well as reflection on the meaning of inference. The majority of the students identified with the fictional characters, particularly regarding traits such as cleverness and inquisitiveness, and with the learning context of the story. The multi-sensorial activities helped children learn observation and inference skills as well as dialogue. Dialogue also helped children self-correct and build upon each other's ideas. Some students developed theories about how ideal dialogue should work. In spite of the inherent limitations of qualitative and teacher research studies, as well as the limitations of this particular study, and despite the fact that there is a need for further research to confirm the transferability of findings, this study both supports and expands to the domain of basic science process skills the claim that Philosophy for Children helps students develop thinking skills.
NASA Astrophysics Data System (ADS)
Kuttner, Benjamin George
Natural fire return intervals are relatively long in eastern Canadian boreal forests and often allow for the development of stands with multiple, successive cohorts of trees. Multi-cohort forest management (MCM) provides a strategy to maintain such multi-cohort stands that focuses on three broad phases of increasingly complex, post-fire stand development, termed "cohorts", and recommends different silvicultural approaches be applied to emulate different cohort types. Previous research on structural cohort typing has relied upon primarily subjective classification methods; in this thesis, I develop more comprehensive and objective methods for three common boreal mixedwood and black spruce forest types in northeastern Ontario. Additionally, I examine relationships between cohort types and stand age, productivity, and disturbance history and the utility of airborne LiDAR to retrieve ground-based classifications and to extend structural cohort typing from plot- to stand-levels. In both mixedwood and black spruce forest types, stand age and age-related deadwood features varied systematically with cohort classes in support of an age-based interpretation of increasing cohort complexity. However, correlations of stand age with cohort classes were surprisingly weak. Differences in site productivity had a significant effect on the accrual of increasingly complex multi-cohort stand structure in both forest types, especially in black spruce stands. The effects of past harvesting in predictive models of class membership were only significant when considered in isolation of age. As an age-emulation strategy, the three cohort model appeared to be poorly suited to black spruce forests where the accrual of structural complexity appeared to be more a function of site productivity than age. Airborne LiDAR data appear to be particularly useful in recovering plot-based cohort types and extending them to the stand-level. The main gradients of structural variability detected using LiDAR were similar between boreal mixedwood and black spruce forest types; the best LiDAR-based models of cohort type relied upon combinations of tree size, size heterogeneity, and tree density related variables. The methods described here to measure, classify, and predict cohort-related structural complexity assist in translating the conceptual three cohort model to a more precise, measurement-based management system. In addition, the approaches presented here to measure and classify stand structural complexity promise to significantly enhance the detail of structural information in operational forest inventories in support of a wide array of forest management and conservation applications.
Van Wagtendonk, Jan W.; Root, Ralph R.
2003-01-01
The objective of this study was to test the applicability of using Normalized Difference Vegetation Index (NDVI) values derived from a temporal sequence of six Landsat Thematic Mapper (TM) scenes to map fuel models for Yosemite National Park, USA. An unsupervised classification algorithm was used to define 30 unique spectral-temporal classes of NDVI values. A combination of graphical, statistical and visual techniques was used to characterize the 30 classes and identify those that responded similarly and could be combined into fuel models. The final classification of fuel models included six different types: short annual and perennial grasses, tall perennial grasses, medium brush and evergreen hardwoods, short-needled conifers with no heavy fuels, long-needled conifers and deciduous hardwoods, and short-needled conifers with a component of heavy fuels. The NDVI, when analysed over a season of phenologically distinct periods along with ancillary data, can elicit information necessary to distinguish fuel model types. Fuels information derived from remote sensors has proven to be useful for initial classification of fuels and has been applied to fire management situations on the ground.
Multi-Temporal Land Cover Classification with Sequential Recurrent Encoders
NASA Astrophysics Data System (ADS)
Rußwurm, Marc; Körner, Marco
2018-03-01
Earth observation (EO) sensors deliver data with daily or weekly temporal resolution. Most land use and land cover (LULC) approaches, however, expect cloud-free and mono-temporal observations. The increasing temporal capabilities of today's sensors enables the use of temporal, along with spectral and spatial features. Domains, such as speech recognition or neural machine translation, work with inherently temporal data and, today, achieve impressive results using sequential encoder-decoder structures. Inspired by these sequence-to-sequence models, we adapt an encoder structure with convolutional recurrent layers in order to approximate a phenological model for vegetation classes based on a temporal sequence of Sentinel 2 (S2) images. In our experiments, we visualize internal activations over a sequence of cloudy and non-cloudy images and find several recurrent cells, which reduce the input activity for cloudy observations. Hence, we assume that our network has learned cloud-filtering schemes solely from input data, which could alleviate the need for tedious cloud-filtering as a preprocessing step for many EO approaches. Moreover, using unfiltered temporal series of top-of-atmosphere (TOA) reflectance data, we achieved in our experiments state-of-the-art classification accuracies on a large number of crop classes with minimal preprocessing compared to other classification approaches.
Fast multi-scale feature fusion for ECG heartbeat classification
NASA Astrophysics Data System (ADS)
Ai, Danni; Yang, Jian; Wang, Zeyu; Fan, Jingfan; Ai, Changbin; Wang, Yongtian
2015-12-01
Electrocardiogram (ECG) is conducted to monitor the electrical activity of the heart by presenting small amplitude and duration signals; as a result, hidden information present in ECG data is difficult to determine. However, this concealed information can be used to detect abnormalities. In our study, a fast feature-fusion method of ECG heartbeat classification based on multi-linear subspace learning is proposed. The method consists of four stages. First, baseline and high frequencies are removed to segment heartbeat. Second, as an extension of wavelets, wavelet-packet decomposition is conducted to extract features. With wavelet-packet decomposition, good time and frequency resolutions can be provided simultaneously. Third, decomposed confidences are arranged as a two-way tensor, in which feature fusion is directly implemented with generalized N dimensional ICA (GND-ICA). In this method, co-relationship among different data information is considered, and disadvantages of dimensionality are prevented; this method can also be used to reduce computing compared with linear subspace-learning methods (PCA). Finally, support vector machine (SVM) is considered as a classifier in heartbeat classification. In this study, ECG records are obtained from the MIT-BIT arrhythmia database. Four main heartbeat classes are used to examine the proposed algorithm. Based on the results of five measurements, sensitivity, positive predictivity, accuracy, average accuracy, and t-test, our conclusion is that a GND-ICA-based strategy can be used to provide enhanced ECG heartbeat classification. Furthermore, large redundant features are eliminated, and classification time is reduced.
Multi-Temporal Land Cover Classification with Long Short-Term Memory Neural Networks
NASA Astrophysics Data System (ADS)
Rußwurm, M.; Körner, M.
2017-05-01
Land cover classification (LCC) is a central and wide field of research in earth observation and has already put forth a variety of classification techniques. Many approaches are based on classification techniques considering observation at certain points in time. However, some land cover classes, such as crops, change their spectral characteristics due to environmental influences and can thus not be monitored effectively with classical mono-temporal approaches. Nevertheless, these temporal observations should be utilized to benefit the classification process. After extensive research has been conducted on modeling temporal dynamics by spectro-temporal profiles using vegetation indices, we propose a deep learning approach to utilize these temporal characteristics for classification tasks. In this work, we show how long short-term memory (LSTM) neural networks can be employed for crop identification purposes with SENTINEL 2A observations from large study areas and label information provided by local authorities. We compare these temporal neural network models, i.e., LSTM and recurrent neural network (RNN), with a classical non-temporal convolutional neural network (CNN) model and an additional support vector machine (SVM) baseline. With our rather straightforward LSTM variant, we exceeded state-of-the-art classification performance, thus opening promising potential for further research.
Image classification of human carcinoma cells using complex wavelet-based covariance descriptors.
Keskin, Furkan; Suhre, Alexander; Kose, Kivanc; Ersahin, Tulin; Cetin, A Enis; Cetin-Atalay, Rengul
2013-01-01
Cancer cell lines are widely used for research purposes in laboratories all over the world. Computer-assisted classification of cancer cells can alleviate the burden of manual labeling and help cancer research. In this paper, we present a novel computerized method for cancer cell line image classification. The aim is to automatically classify 14 different classes of cell lines including 7 classes of breast and 7 classes of liver cancer cells. Microscopic images containing irregular carcinoma cell patterns are represented by subwindows which correspond to foreground pixels. For each subwindow, a covariance descriptor utilizing the dual-tree complex wavelet transform (DT-[Formula: see text]WT) coefficients and several morphological attributes are computed. Directionally selective DT-[Formula: see text]WT feature parameters are preferred primarily because of their ability to characterize edges at multiple orientations which is the characteristic feature of carcinoma cell line images. A Support Vector Machine (SVM) classifier with radial basis function (RBF) kernel is employed for final classification. Over a dataset of 840 images, we achieve an accuracy above 98%, which outperforms the classical covariance-based methods. The proposed system can be used as a reliable decision maker for laboratory studies. Our tool provides an automated, time- and cost-efficient analysis of cancer cell morphology to classify different cancer cell lines using image-processing techniques, which can be used as an alternative to the costly short tandem repeat (STR) analysis. The data set used in this manuscript is available as supplementary material through http://signal.ee.bilkent.edu.tr/cancerCellLineClassificationSampleImages.html.
Image Classification of Human Carcinoma Cells Using Complex Wavelet-Based Covariance Descriptors
Keskin, Furkan; Suhre, Alexander; Kose, Kivanc; Ersahin, Tulin; Cetin, A. Enis; Cetin-Atalay, Rengul
2013-01-01
Cancer cell lines are widely used for research purposes in laboratories all over the world. Computer-assisted classification of cancer cells can alleviate the burden of manual labeling and help cancer research. In this paper, we present a novel computerized method for cancer cell line image classification. The aim is to automatically classify 14 different classes of cell lines including 7 classes of breast and 7 classes of liver cancer cells. Microscopic images containing irregular carcinoma cell patterns are represented by subwindows which correspond to foreground pixels. For each subwindow, a covariance descriptor utilizing the dual-tree complex wavelet transform (DT-WT) coefficients and several morphological attributes are computed. Directionally selective DT-WT feature parameters are preferred primarily because of their ability to characterize edges at multiple orientations which is the characteristic feature of carcinoma cell line images. A Support Vector Machine (SVM) classifier with radial basis function (RBF) kernel is employed for final classification. Over a dataset of 840 images, we achieve an accuracy above 98%, which outperforms the classical covariance-based methods. The proposed system can be used as a reliable decision maker for laboratory studies. Our tool provides an automated, time- and cost-efficient analysis of cancer cell morphology to classify different cancer cell lines using image-processing techniques, which can be used as an alternative to the costly short tandem repeat (STR) analysis. The data set used in this manuscript is available as supplementary material through http://signal.ee.bilkent.edu.tr/cancerCellLineClassificationSampleImages.html. PMID:23341908
Multi-Class Motor Imagery EEG Decoding for Brain-Computer Interfaces
Wang, Deng; Miao, Duoqian; Blohm, Gunnar
2012-01-01
Recent studies show that scalp electroencephalography (EEG) as a non-invasive interface has great potential for brain-computer interfaces (BCIs). However, one factor that has limited practical applications for EEG-based BCI so far is the difficulty to decode brain signals in a reliable and efficient way. This paper proposes a new robust processing framework for decoding of multi-class motor imagery (MI) that is based on five main processing steps. (i) Raw EEG segmentation without the need of visual artifact inspection. (ii) Considering that EEG recordings are often contaminated not just by electrooculography (EOG) but also other types of artifacts, we propose to first implement an automatic artifact correction method that combines regression analysis with independent component analysis for recovering the original source signals. (iii) The significant difference between frequency components based on event-related (de-) synchronization and sample entropy is then used to find non-contiguous discriminating rhythms. After spectral filtering using the discriminating rhythms, a channel selection algorithm is used to select only relevant channels. (iv) Feature vectors are extracted based on the inter-class diversity and time-varying dynamic characteristics of the signals. (v) Finally, a support vector machine is employed for four-class classification. We tested our proposed algorithm on experimental data that was obtained from dataset 2a of BCI competition IV (2008). The overall four-class kappa values (between 0.41 and 0.80) were comparable to other models but without requiring any artifact-contaminated trial removal. The performance showed that multi-class MI tasks can be reliably discriminated using artifact-contaminated EEG recordings from a few channels. This may be a promising avenue for online robust EEG-based BCI applications. PMID:23087607
Exploring diversity in ensemble classification: Applications in large area land cover mapping
NASA Astrophysics Data System (ADS)
Mellor, Andrew; Boukir, Samia
2017-07-01
Ensemble classifiers, such as random forests, are now commonly applied in the field of remote sensing, and have been shown to perform better than single classifier systems, resulting in reduced generalisation error. Diversity across the members of ensemble classifiers is known to have a strong influence on classification performance - whereby classifier errors are uncorrelated and more uniformly distributed across ensemble members. The relationship between ensemble diversity and classification performance has not yet been fully explored in the fields of information science and machine learning and has never been examined in the field of remote sensing. This study is a novel exploration of ensemble diversity and its link to classification performance, applied to a multi-class canopy cover classification problem using random forests and multisource remote sensing and ancillary GIS data, across seven million hectares of diverse dry-sclerophyll dominated public forests in Victoria Australia. A particular emphasis is placed on analysing the relationship between ensemble diversity and ensemble margin - two key concepts in ensemble learning. The main novelty of our work is on boosting diversity by emphasizing the contribution of lower margin instances used in the learning process. Exploring the influence of tree pruning on diversity is also a new empirical analysis that contributes to a better understanding of ensemble performance. Results reveal insights into the trade-off between ensemble classification accuracy and diversity, and through the ensemble margin, demonstrate how inducing diversity by targeting lower margin training samples is a means of achieving better classifier performance for more difficult or rarer classes and reducing information redundancy in classification problems. Our findings inform strategies for collecting training data and designing and parameterising ensemble classifiers, such as random forests. This is particularly important in large area remote sensing applications, for which training data is costly and resource intensive to collect.
A machine learning approach to multi-level ECG signal quality classification.
Li, Qiao; Rajagopalan, Cadathur; Clifford, Gari D
2014-12-01
Current electrocardiogram (ECG) signal quality assessment studies have aimed to provide a two-level classification: clean or noisy. However, clinical usage demands more specific noise level classification for varying applications. This work outlines a five-level ECG signal quality classification algorithm. A total of 13 signal quality metrics were derived from segments of ECG waveforms, which were labeled by experts. A support vector machine (SVM) was trained to perform the classification and tested on a simulated dataset and was validated using data from the MIT-BIH arrhythmia database (MITDB). The simulated training and test datasets were created by selecting clean segments of the ECG in the 2011 PhysioNet/Computing in Cardiology Challenge database, and adding three types of real ECG noise at different signal-to-noise ratio (SNR) levels from the MIT-BIH Noise Stress Test Database (NSTDB). The MITDB was re-annotated for five levels of signal quality. Different combinations of the 13 metrics were trained and tested on the simulated datasets and the best combination that produced the highest classification accuracy was selected and validated on the MITDB. Performance was assessed using classification accuracy (Ac), and a single class overlap accuracy (OAc), which assumes that an individual type classified into an adjacent class is acceptable. An Ac of 80.26% and an OAc of 98.60% on the test set were obtained by selecting 10 metrics while 57.26% (Ac) and 94.23% (OAc) were the numbers for the unseen MITDB validation data without retraining. By performing the fivefold cross validation, an Ac of 88.07±0.32% and OAc of 99.34±0.07% were gained on the validation fold of MITDB. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Multi-label literature classification based on the Gene Ontology graph.
Jin, Bo; Muller, Brian; Zhai, Chengxiang; Lu, Xinghua
2008-12-08
The Gene Ontology is a controlled vocabulary for representing knowledge related to genes and proteins in a computable form. The current effort of manually annotating proteins with the Gene Ontology is outpaced by the rate of accumulation of biomedical knowledge in literature, which urges the development of text mining approaches to facilitate the process by automatically extracting the Gene Ontology annotation from literature. The task is usually cast as a text classification problem, and contemporary methods are confronted with unbalanced training data and the difficulties associated with multi-label classification. In this research, we investigated the methods of enhancing automatic multi-label classification of biomedical literature by utilizing the structure of the Gene Ontology graph. We have studied three graph-based multi-label classification algorithms, including a novel stochastic algorithm and two top-down hierarchical classification methods for multi-label literature classification. We systematically evaluated and compared these graph-based classification algorithms to a conventional flat multi-label algorithm. The results indicate that, through utilizing the information from the structure of the Gene Ontology graph, the graph-based multi-label classification methods can significantly improve predictions of the Gene Ontology terms implied by the analyzed text. Furthermore, the graph-based multi-label classifiers are capable of suggesting Gene Ontology annotations (to curators) that are closely related to the true annotations even if they fail to predict the true ones directly. A software package implementing the studied algorithms is available for the research community. Through utilizing the information from the structure of the Gene Ontology graph, the graph-based multi-label classification methods have better potential than the conventional flat multi-label classification approach to facilitate protein annotation based on the literature.
32 CFR 1642.3 - Basis for classification in Class 3-A.
Code of Federal Regulations, 2011 CFR
2011-07-01
... 32 National Defense 6 2011-07-01 2011-07-01 false Basis for classification in Class 3-A. 1642.3... CLASSIFICATION OF REGISTRANTS DEFERRED BECAUSE OF HARDSHIP TO DEPENDENTS § 1642.3 Basis for classification in... registrant for classification in Class 3-A, the board will first determine whether the registrant's wife...
14 CFR Sec. 19-4 - Service classes.
Code of Federal Regulations, 2010 CFR
2010-01-01
... a composite of first class, coach, and mixed passenger/cargo service. The following classifications... integral part of services performed pursuant to published flight schedules. The following classifications... Classifications Sec. 19-4 Service classes. The statistical classifications are designed to reflect the operating...
14 CFR Sec. 19-4 - Service classes.
Code of Federal Regulations, 2011 CFR
2011-01-01
... a composite of first class, coach, and mixed passenger/cargo service. The following classifications... integral part of services performed pursuant to published flight schedules. The following classifications... Classifications Sec. 19-4 Service classes. The statistical classifications are designed to reflect the operating...
NASA Astrophysics Data System (ADS)
Sanhouse-García, Antonio J.; Rangel-Peraza, Jesús Gabriel; Bustos-Terrones, Yaneth; García-Ferrer, Alfonso; Mesas-Carrascosa, Francisco J.
2016-02-01
Land cover classification is often based on different characteristics between their classes, but with great homogeneity within each one of them. This cover is obtained through field work or by mean of processing satellite images. Field work involves high costs; therefore, digital image processing techniques have become an important alternative to perform this task. However, in some developing countries and particularly in Casacoima municipality in Venezuela, there is a lack of geographic information systems due to the lack of updated information and high costs in software license acquisition. This research proposes a low cost methodology to develop thematic mapping of local land use and types of coverage in areas with scarce resources. Thematic mapping was developed from CBERS-2 images and spatial information available on the network using open source tools. The supervised classification method per pixel and per region was applied using different classification algorithms and comparing them among themselves. Classification method per pixel was based on Maxver algorithms (maximum likelihood) and Euclidean distance (minimum distance), while per region classification was based on the Bhattacharya algorithm. Satisfactory results were obtained from per region classification, where overall reliability of 83.93% and kappa index of 0.81% were observed. Maxver algorithm showed a reliability value of 73.36% and kappa index 0.69%, while Euclidean distance obtained values of 67.17% and 0.61% for reliability and kappa index, respectively. It was demonstrated that the proposed methodology was very useful in cartographic processing and updating, which in turn serve as a support to develop management plans and land management. Hence, open source tools showed to be an economically viable alternative not only for forestry organizations, but for the general public, allowing them to develop projects in economically depressed and/or environmentally threatened areas.
A Scatter-Based Prototype Framework and Multi-Class Extension of Support Vector Machines
Jenssen, Robert; Kloft, Marius; Zien, Alexander; Sonnenburg, Sören; Müller, Klaus-Robert
2012-01-01
We provide a novel interpretation of the dual of support vector machines (SVMs) in terms of scatter with respect to class prototypes and their mean. As a key contribution, we extend this framework to multiple classes, providing a new joint Scatter SVM algorithm, at the level of its binary counterpart in the number of optimization variables. This enables us to implement computationally efficient solvers based on sequential minimal and chunking optimization. As a further contribution, the primal problem formulation is developed in terms of regularized risk minimization and the hinge loss, revealing the score function to be used in the actual classification of test patterns. We investigate Scatter SVM properties related to generalization ability, computational efficiency, sparsity and sensitivity maps, and report promising results. PMID:23118845
[Constructing a tool for certifying handicap / disability in Colombia].
Moreno-Angarita, Marisol; Cortés-Reyes, Edgar; Cárdenas-Jiménez, Andrea; Mena-Ortiz, Luz Z; Giraldo-Rátiva, Zulma
2013-01-01
Describing how a tool was created/constructed for certifying Colombian people's disability status. This was a descriptive study involving a five-phase, multi-method design. It sought to identify needs, background, categories and procedures from differing view points, using participatory methodology, for identifying and certifying disabled people in Colombia. The study led to an international classification of functioning, disability and health (ICF)-based certification tool which can be used by a multi-professional team in healthcare institution settings to guarantee access to benefits approved by Colombian disability law. Certification (even when voluntary) can be the key to enjoying all the benefits designed for Colombian people suffering disability; such people are not the subjects of mercy and compassion anymore. Certification seeks to identify people suffering disabilities as holders of rights under Colombian law, as clear evidence of Colombian state commitment to ensuring an inclusive society.
Brain-Computer Interface Based on Generation of Visual Images
Bobrov, Pavel; Frolov, Alexander; Cantor, Charles; Fedulova, Irina; Bakhnyan, Mikhail; Zhavoronkov, Alexander
2011-01-01
This paper examines the task of recognizing EEG patterns that correspond to performing three mental tasks: relaxation and imagining of two types of pictures: faces and houses. The experiments were performed using two EEG headsets: BrainProducts ActiCap and Emotiv EPOC. The Emotiv headset becomes widely used in consumer BCI application allowing for conducting large-scale EEG experiments in the future. Since classification accuracy significantly exceeded the level of random classification during the first three days of the experiment with EPOC headset, a control experiment was performed on the fourth day using ActiCap. The control experiment has shown that utilization of high-quality research equipment can enhance classification accuracy (up to 68% in some subjects) and that the accuracy is independent of the presence of EEG artifacts related to blinking and eye movement. This study also shows that computationally-inexpensive Bayesian classifier based on covariance matrix analysis yields similar classification accuracy in this problem as a more sophisticated Multi-class Common Spatial Patterns (MCSP) classifier. PMID:21695206
Mirza, Bilal; Lin, Zhiping
2016-08-01
In this paper, a meta-cognitive online sequential extreme learning machine (MOS-ELM) is proposed for class imbalance and concept drift learning. In MOS-ELM, meta-cognition is used to self-regulate the learning by selecting suitable learning strategies for class imbalance and concept drift problems. MOS-ELM is the first sequential learning method to alleviate the imbalance problem for both binary class and multi-class data streams with concept drift. In MOS-ELM, a new adaptive window approach is proposed for concept drift learning. A single output update equation is also proposed which unifies various application specific OS-ELM methods. The performance of MOS-ELM is evaluated under different conditions and compared with methods each specific to some of the conditions. On most of the datasets in comparison, MOS-ELM outperforms the competing methods. Copyright © 2016 Elsevier Ltd. All rights reserved.
Pan, Rui; Wang, Hansheng; Li, Runze
2016-01-01
This paper is concerned with the problem of feature screening for multi-class linear discriminant analysis under ultrahigh dimensional setting. We allow the number of classes to be relatively large. As a result, the total number of relevant features is larger than usual. This makes the related classification problem much more challenging than the conventional one, where the number of classes is small (very often two). To solve the problem, we propose a novel pairwise sure independence screening method for linear discriminant analysis with an ultrahigh dimensional predictor. The proposed procedure is directly applicable to the situation with many classes. We further prove that the proposed method is screening consistent. Simulation studies are conducted to assess the finite sample performance of the new procedure. We also demonstrate the proposed methodology via an empirical analysis of a real life example on handwritten Chinese character recognition. PMID:28127109
32 CFR 1639.3 - Basis for classification in Class 2-D.
Code of Federal Regulations, 2010 CFR
2010-07-01
... 32 National Defense 6 2010-07-01 2010-07-01 false Basis for classification in Class 2-D. 1639.3... CLASSIFICATION OF REGISTRANTS PREPARING FOR THE MINISTRY § 1639.3 Basis for classification in Class 2-D. (a) In... maintained for qualification for the deferment. (b) The registrant's classification shall be determined on...
32 CFR 1639.3 - Basis for classification in Class 2-D.
Code of Federal Regulations, 2011 CFR
2011-07-01
... 32 National Defense 6 2011-07-01 2011-07-01 false Basis for classification in Class 2-D. 1639.3... CLASSIFICATION OF REGISTRANTS PREPARING FOR THE MINISTRY § 1639.3 Basis for classification in Class 2-D. (a) In... maintained for qualification for the deferment. (b) The registrant's classification shall be determined on...
How a national vegetation classification can help ecological research and management
Franklin, Scott; Comer, Patrick; Evens, Julie; Ezcurra, Exequiel; Faber-Langendoen, Don; Franklin, Janet; Jennings, Michael; Josse, Carmen; Lea, Chris; Loucks, Orie; Muldavin, Esteban; Peet, Robert K.; Ponomarenko, Serguei; Roberts, David G.; Solomeshch, Ayzik; Keeler-Wolf, Todd; Van Kley, James; Weakley, Alan; McKerrow, Alexa; Burke, Marianne; Spurrier, Carol
2015-01-01
The elegance of classification lies in its ability to compile and systematize various terminological conventions and masses of information that are unattainable during typical research projects. Imagine a discipline without standards for collection, analysis, and interpretation; unfortunately, that describes much of 20th-century vegetation ecology. With differing methods, how do we assess community dynamics over decades, much less centuries? How do we compare plant communities from different areas? The need for a widely applied vegetation classification has long been clear. Now imagine a multi-decade effort to assimilate hundreds of disparate vegetation classifications into one common classification for the US. In this letter, we introduce the US National Vegetation Classification (USNVC; www.usnvc.org) as a powerful tool for research and conservation, analogous to the argument made by Schimel and Chadwick (2013) for soils. The USNVC provides a national framework to classify and describe vegetation; here we describe the USNVC and offer brief examples of its efficacy.
Comparison of four approaches to a rock facies classification problem
Dubois, M.K.; Bohling, Geoffrey C.; Chakrabarti, S.
2007-01-01
In this study, seven classifiers based on four different approaches were tested in a rock facies classification problem: classical parametric methods using Bayes' rule, and non-parametric methods using fuzzy logic, k-nearest neighbor, and feed forward-back propagating artificial neural network. Determining the most effective classifier for geologic facies prediction in wells without cores in the Panoma gas field, in Southwest Kansas, was the objective. Study data include 3600 samples with known rock facies class (from core) with each sample having either four or five measured properties (wire-line log curves), and two derived geologic properties (geologic constraining variables). The sample set was divided into two subsets, one for training and one for testing the ability of the trained classifier to correctly assign classes. Artificial neural networks clearly outperformed all other classifiers and are effective tools for this particular classification problem. Classical parametric models were inadequate due to the nature of the predictor variables (high dimensional and not linearly correlated), and feature space of the classes (overlapping). The other non-parametric methods tested, k-nearest neighbor and fuzzy logic, would need considerable improvement to match the neural network effectiveness, but further work, possibly combining certain aspects of the three non-parametric methods, may be justified. ?? 2006 Elsevier Ltd. All rights reserved.
Reading the lesson: eliciting requirements for a mammography training application
NASA Astrophysics Data System (ADS)
Hartswood, M.; Blot, L.; Taylor, P.; Anderson, S.; Procter, R.; Wilkinson, L.; Smart, L.
2009-02-01
Demonstrations of a prototype training tool were used to elicit requirements for an intelligent training system for screening mammography. The prototype allowed senior radiologists (mentors) to select cases from a distributed database of images to meet the specific training requirements of junior colleagues (trainees) and then provided automated feedback in response to trainees' attempts at interpretation. The tool was demonstrated to radiologists and radiographers working in the breast screening service at four evaluation sessions. Participants highlighted ease of selecting cases that can deliver specific learning objectives as important for delivering effective training. To usefully structure a large data set of training images we undertook a classification exercise of mentor authored free text 'learning points' attached to training case obtained from two screening centres (n=333, n=129 respectively). We were able to adduce a hierarchy of abstract categories representing classes of lesson that groups of cases were intended to convey (e.g. Temporal change, Misleading juxtapositions, Position of lesion, Typical/Atypical presentation, and so on). In this paper we present the method used to devise this classification, the classification scheme itself, initial user-feedback, and our plans to incorporated it into a software tool to aid case selection.
Choi, Joon Yul; Yoo, Tae Keun; Seo, Jeong Gi; Kwak, Jiyong; Um, Terry Taewoong; Rim, Tyler Hyungtaek
2017-01-01
Deep learning emerges as a powerful tool for analyzing medical images. Retinal disease detection by using computer-aided diagnosis from fundus image has emerged as a new method. We applied deep learning convolutional neural network by using MatConvNet for an automated detection of multiple retinal diseases with fundus photographs involved in STructured Analysis of the REtina (STARE) database. Dataset was built by expanding data on 10 categories, including normal retina and nine retinal diseases. The optimal outcomes were acquired by using a random forest transfer learning based on VGG-19 architecture. The classification results depended greatly on the number of categories. As the number of categories increased, the performance of deep learning models was diminished. When all 10 categories were included, we obtained results with an accuracy of 30.5%, relative classifier information (RCI) of 0.052, and Cohen's kappa of 0.224. Considering three integrated normal, background diabetic retinopathy, and dry age-related macular degeneration, the multi-categorical classifier showed accuracy of 72.8%, 0.283 RCI, and 0.577 kappa. In addition, several ensemble classifiers enhanced the multi-categorical classification performance. The transfer learning incorporated with ensemble classifier of clustering and voting approach presented the best performance with accuracy of 36.7%, 0.053 RCI, and 0.225 kappa in the 10 retinal diseases classification problem. First, due to the small size of datasets, the deep learning techniques in this study were ineffective to be applied in clinics where numerous patients suffering from various types of retinal disorders visit for diagnosis and treatment. Second, we found that the transfer learning incorporated with ensemble classifiers can improve the classification performance in order to detect multi-categorical retinal diseases. Further studies should confirm the effectiveness of algorithms with large datasets obtained from hospitals.
NASA Technical Reports Server (NTRS)
Spruce, J. P.; Smoot, James; Ellis, Jean; Hilbert, Kent; Swann, Roberta
2012-01-01
This paper discusses the development and implementation of a geospatial data processing method and multi-decadal Landsat time series for computing general coastal U.S. land-use and land-cover (LULC) classifications and change products consisting of seven classes (water, barren, upland herbaceous, non-woody wetland, woody upland, woody wetland, and urban). Use of this approach extends the observational period of the NOAA-generated Coastal Change and Analysis Program (C-CAP) products by almost two decades, assuming the availability of one cloud free Landsat scene from any season for each targeted year. The Mobile Bay region in Alabama was used as a study area to develop, demonstrate, and validate the method that was applied to derive LULC products for nine dates at approximate five year intervals across a 34-year time span, using single dates of data for each classification in which forests were either leaf-on, leaf-off, or mixed senescent conditions. Classifications were computed and refined using decision rules in conjunction with unsupervised classification of Landsat data and C-CAP value-added products. Each classification's overall accuracy was assessed by comparing stratified random locations to available reference data, including higher spatial resolution satellite and aerial imagery, field survey data, and raw Landsat RGBs. Overall classification accuracies ranged from 83 to 91% with overall Kappa statistics ranging from 0.78 to 0.89. The accuracies are comparable to those from similar, generalized LULC products derived from C-CAP data. The Landsat MSS-based LULC product accuracies are similar to those from Landsat TM or ETM+ data. Accurate classifications were computed for all nine dates, yielding effective results regardless of season. This classification method yielded products that were used to compute LULC change products via additive GIS overlay techniques.
NASA Astrophysics Data System (ADS)
Mücher, C. A.; Roupioz, L.; Kramer, H.; Bogers, M. M. B.; Jongman, R. H. G.; Lucas, R. M.; Kosmidou, V. E.; Petrou, Z.; Manakos, I.; Padoa-Schioppa, E.; Adamo, M.; Blonda, P.
2015-05-01
A major challenge is to develop a biodiversity observation system that is cost effective and applicable in any geographic region. Measuring and reliable reporting of trends and changes in biodiversity requires amongst others detailed and accurate land cover and habitat maps in a standard and comparable way. The objective of this paper is to assess the EODHaM (EO Data for Habitat Mapping) classification results for a Dutch case study. The EODHaM system was developed within the BIO_SOS (The BIOdiversity multi-SOurce monitoring System: from Space TO Species) project and contains the decision rules for each land cover and habitat class based on spectral and height information. One of the main findings is that canopy height models, as derived from LiDAR, in combination with very high resolution satellite imagery provides a powerful input for the EODHaM system for the purpose of generic land cover and habitat mapping for any location across the globe. The assessment of the EODHaM classification results based on field data showed an overall accuracy of 74% for the land cover classes as described according to the Food and Agricultural Organization (FAO) Land Cover Classification System (LCCS) taxonomy at level 3, while the overall accuracy was lower (69.0%) for the habitat map based on the General Habitat Category (GHC) system for habitat surveillance and monitoring. A GHC habitat class is determined for each mapping unit on the basis of the composition of the individual life forms and height measurements. The classification showed very good results for forest phanerophytes (FPH) when individual life forms were analyzed in terms of their percentage coverage estimates per mapping unit from the LCCS classification and validated with field surveys. Analysis for shrubby chamaephytes (SCH) showed less accurate results, but might also be due to less accurate field estimates of percentage coverage. Overall, the EODHaM classification results encouraged us to derive the heights of all vegetated objects in the Netherlands from LiDAR data, in preparation for new habitat classifications.
An AUC-based permutation variable importance measure for random forests
2013-01-01
Background The random forest (RF) method is a commonly used tool for classification with high dimensional data as well as for ranking candidate predictors based on the so-called random forest variable importance measures (VIMs). However the classification performance of RF is known to be suboptimal in case of strongly unbalanced data, i.e. data where response class sizes differ considerably. Suggestions were made to obtain better classification performance based either on sampling procedures or on cost sensitivity analyses. However to our knowledge the performance of the VIMs has not yet been examined in the case of unbalanced response classes. In this paper we explore the performance of the permutation VIM for unbalanced data settings and introduce an alternative permutation VIM based on the area under the curve (AUC) that is expected to be more robust towards class imbalance. Results We investigated the performance of the standard permutation VIM and of our novel AUC-based permutation VIM for different class imbalance levels using simulated data and real data. The results suggest that the new AUC-based permutation VIM outperforms the standard permutation VIM for unbalanced data settings while both permutation VIMs have equal performance for balanced data settings. Conclusions The standard permutation VIM loses its ability to discriminate between associated predictors and predictors not associated with the response for increasing class imbalance. It is outperformed by our new AUC-based permutation VIM for unbalanced data settings, while the performance of both VIMs is very similar in the case of balanced classes. The new AUC-based VIM is implemented in the R package party for the unbiased RF variant based on conditional inference trees. The codes implementing our study are available from the companion website: http://www.ibe.med.uni-muenchen.de/organisation/mitarbeiter/070_drittmittel/janitza/index.html. PMID:23560875
An AUC-based permutation variable importance measure for random forests.
Janitza, Silke; Strobl, Carolin; Boulesteix, Anne-Laure
2013-04-05
The random forest (RF) method is a commonly used tool for classification with high dimensional data as well as for ranking candidate predictors based on the so-called random forest variable importance measures (VIMs). However the classification performance of RF is known to be suboptimal in case of strongly unbalanced data, i.e. data where response class sizes differ considerably. Suggestions were made to obtain better classification performance based either on sampling procedures or on cost sensitivity analyses. However to our knowledge the performance of the VIMs has not yet been examined in the case of unbalanced response classes. In this paper we explore the performance of the permutation VIM for unbalanced data settings and introduce an alternative permutation VIM based on the area under the curve (AUC) that is expected to be more robust towards class imbalance. We investigated the performance of the standard permutation VIM and of our novel AUC-based permutation VIM for different class imbalance levels using simulated data and real data. The results suggest that the new AUC-based permutation VIM outperforms the standard permutation VIM for unbalanced data settings while both permutation VIMs have equal performance for balanced data settings. The standard permutation VIM loses its ability to discriminate between associated predictors and predictors not associated with the response for increasing class imbalance. It is outperformed by our new AUC-based permutation VIM for unbalanced data settings, while the performance of both VIMs is very similar in the case of balanced classes. The new AUC-based VIM is implemented in the R package party for the unbiased RF variant based on conditional inference trees. The codes implementing our study are available from the companion website: http://www.ibe.med.uni-muenchen.de/organisation/mitarbeiter/070_drittmittel/janitza/index.html.
Treece, C
1982-05-01
The author describes the use of the DSM-III's diagnostic criteria and classification system as a research instrument and discusses some of the advantages and drawbacks of DMS-III for a specific type of study. A rearrangement of the hierarchical order of the DSM-III diagnostic classes is suggested. This rearrangement provides for levels of certainty in analyzing interrater reliability and offers a simplified framework for summarizing group data. When this approach is combined with a structured interview and response format, it provides a flexible way of managing a large classification system for a smaller study without sacrificing standardization.
LORENE: Spectral methods differential equations solver
NASA Astrophysics Data System (ADS)
Gourgoulhon, Eric; Grandclément, Philippe; Marck, Jean-Alain; Novak, Jérôme; Taniguchi, Keisuke
2016-08-01
LORENE (Langage Objet pour la RElativité NumériquE) solves various problems arising in numerical relativity, and more generally in computational astrophysics. It is a set of C++ classes and provides tools to solve partial differential equations by means of multi-domain spectral methods. LORENE classes implement basic structures such as arrays and matrices, but also abstract mathematical objects, such as tensors, and astrophysical objects, such as stars and black holes.
NASA Astrophysics Data System (ADS)
Leka, K. D.; Barnes, Graham; Wagner, Eric
2018-04-01
A classification infrastructure built upon Discriminant Analysis (DA) has been developed at NorthWest Research Associates for examining the statistical differences between samples of two known populations. Originating to examine the physical differences between flare-quiet and flare-imminent solar active regions, we describe herein some details of the infrastructure including: parametrization of large datasets, schemes for handling "null" and "bad" data in multi-parameter analysis, application of non-parametric multi-dimensional DA, an extension through Bayes' theorem to probabilistic classification, and methods invoked for evaluating classifier success. The classifier infrastructure is applicable to a wide range of scientific questions in solar physics. We demonstrate its application to the question of distinguishing flare-imminent from flare-quiet solar active regions, updating results from the original publications that were based on different data and much smaller sample sizes. Finally, as a demonstration of "Research to Operations" efforts in the space-weather forecasting context, we present the Discriminant Analysis Flare Forecasting System (DAFFS), a near-real-time operationally-running solar flare forecasting tool that was developed from the research-directed infrastructure.
LAMOST OBSERVATIONS IN THE KEPLER FIELD: SPECTRAL CLASSIFICATION WITH THE MKCLASS CODE
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gray, R. O.; Corbally, C. J.; Cat, P. De
2016-01-15
The LAMOST-Kepler project was designed to obtain high-quality, low-resolution spectra of many of the stars in the Kepler field with the Large Sky Area Multi Object Fiber Spectroscopic Telescope (LAMOST) spectroscopic telescope. To date 101,086 spectra of 80,447 objects over the entire Kepler field have been acquired. Physical parameters, radial velocities, and rotational velocities of these stars will be reported in other papers. In this paper we present MK spectral classifications for these spectra determined with the automatic classification code MKCLASS. We discuss the quality and reliability of the spectral types and present histograms showing the frequency of the spectralmore » types in the main table organized according to luminosity class. Finally, as examples of the use of this spectral database, we compute the proportion of A-type stars that are Am stars, and identify 32 new barium dwarf candidates.« less
A Novel Segment-Based Approach for Improving Classification Performance of Transport Mode Detection.
Guvensan, M Amac; Dusun, Burak; Can, Baris; Turkmen, H Irem
2017-12-30
Transportation planning and solutions have an enormous impact on city life. To minimize the transport duration, urban planners should understand and elaborate the mobility of a city. Thus, researchers look toward monitoring people's daily activities including transportation types and duration by taking advantage of individual's smartphones. This paper introduces a novel segment-based transport mode detection architecture in order to improve the results of traditional classification algorithms in the literature. The proposed post-processing algorithm, namely the Healing algorithm, aims to correct the misclassification results of machine learning-based solutions. Our real-life test results show that the Healing algorithm could achieve up to 40% improvement of the classification results. As a result, the implemented mobile application could predict eight classes including stationary, walking, car, bus, tram, train, metro and ferry with a success rate of 95% thanks to the proposed multi-tier architecture and Healing algorithm.
Tran, Thi Huong Giang; Ressl, Camillo; Pfeifer, Norbert
2018-02-03
This paper suggests a new approach for change detection (CD) in 3D point clouds. It combines classification and CD in one step using machine learning. The point cloud data of both epochs are merged for computing features of four types: features describing the point distribution, a feature relating to relative terrain elevation, features specific for the multi-target capability of laser scanning, and features combining the point clouds of both epochs to identify the change. All these features are merged in the points and then training samples are acquired to create the model for supervised classification, which is then applied to the whole study area. The final results reach an overall accuracy of over 90% for both epochs of eight classes: lost tree, new tree, lost building, new building, changed ground, unchanged building, unchanged tree, and unchanged ground.
Speaker-sensitive emotion recognition via ranking: Studies on acted and spontaneous speech☆
Cao, Houwei; Verma, Ragini; Nenkova, Ani
2014-01-01
We introduce a ranking approach for emotion recognition which naturally incorporates information about the general expressivity of speakers. We demonstrate that our approach leads to substantial gains in accuracy compared to conventional approaches. We train ranking SVMs for individual emotions, treating the data from each speaker as a separate query, and combine the predictions from all rankers to perform multi-class prediction. The ranking method provides two natural benefits. It captures speaker specific information even in speaker-independent training/testing conditions. It also incorporates the intuition that each utterance can express a mix of possible emotion and that considering the degree to which each emotion is expressed can be productively exploited to identify the dominant emotion. We compare the performance of the rankers and their combination to standard SVM classification approaches on two publicly available datasets of acted emotional speech, Berlin and LDC, as well as on spontaneous emotional data from the FAU Aibo dataset. On acted data, ranking approaches exhibit significantly better performance compared to SVM classification both in distinguishing a specific emotion from all others and in multi-class prediction. On the spontaneous data, which contains mostly neutral utterances with a relatively small portion of less intense emotional utterances, ranking-based classifiers again achieve much higher precision in identifying emotional utterances than conventional SVM classifiers. In addition, we discuss the complementarity of conventional SVM and ranking-based classifiers. On all three datasets we find dramatically higher accuracy for the test items on whose prediction the two methods agree compared to the accuracy of individual methods. Furthermore on the spontaneous data the ranking and standard classification are complementary and we obtain marked improvement when we combine the two classifiers by late-stage fusion. PMID:25422534
Speaker-sensitive emotion recognition via ranking: Studies on acted and spontaneous speech☆
Cao, Houwei; Verma, Ragini; Nenkova, Ani
2015-01-01
We introduce a ranking approach for emotion recognition which naturally incorporates information about the general expressivity of speakers. We demonstrate that our approach leads to substantial gains in accuracy compared to conventional approaches. We train ranking SVMs for individual emotions, treating the data from each speaker as a separate query, and combine the predictions from all rankers to perform multi-class prediction. The ranking method provides two natural benefits. It captures speaker specific information even in speaker-independent training/testing conditions. It also incorporates the intuition that each utterance can express a mix of possible emotion and that considering the degree to which each emotion is expressed can be productively exploited to identify the dominant emotion. We compare the performance of the rankers and their combination to standard SVM classification approaches on two publicly available datasets of acted emotional speech, Berlin and LDC, as well as on spontaneous emotional data from the FAU Aibo dataset. On acted data, ranking approaches exhibit significantly better performance compared to SVM classification both in distinguishing a specific emotion from all others and in multi-class prediction. On the spontaneous data, which contains mostly neutral utterances with a relatively small portion of less intense emotional utterances, ranking-based classifiers again achieve much higher precision in identifying emotional utterances than conventional SVM classifiers. In addition, we discuss the complementarity of conventional SVM and ranking-based classifiers. On all three datasets we find dramatically higher accuracy for the test items on whose prediction the two methods agree compared to the accuracy of individual methods. Furthermore on the spontaneous data the ranking and standard classification are complementary and we obtain marked improvement when we combine the two classifiers by late-stage fusion.
NASA Technical Reports Server (NTRS)
Degrandi, G.; Lavalle, C.; Degroof, H.; Sieber, A.
1992-01-01
A study on the performance of a supervised fully polarimetric maximum likelihood classifier for synthetic aperture radar (SAR) data when applied to a specific classification context: forest classification based on age classes and in the presence of a sloping terrain is presented. For the experimental part, the polarimetric AIRSAR data at P, L, and C-band, acquired over the German Black Forest near Freiburg in the frame of the 1989 MAESTRO-1 campaign and the 1991 MAC Europe campaign was used, MAESTRO-1 with an ESA/JRC sponsored campaign, and MAC Europe (Multi-sensor Aircraft Campaign); in both cases the multi-frequency polarimetric JPL Airborne Synthetic Aperture Radar (AIRSAR) radar was flown over a number of European test sites. The study is structured as follows. At first, the general characteristics of the classifier and the dependencies from some parameters, like frequency bands, feature vector, calibration, using test areas lying on a flat terrain are investigated. Once it is determined the optimal conditions for the classifier performance, we then move on to the study of the slope effect. The bulk of this work is performed using the Maestrol data set. Next the classifier performance with the MAC Europe data is considered. The study is divided into two stages: first some of the tests done on the Maestro data are repeated, to highlight the improvements due to the new processing scheme that delivers 16 look data. Second we experiment with multi images classification with two goals: to assess the possibility of using a training set measured from one image to classify areas in different images; and to classify areas on critical slopes using different viewing angles. The main points of the study are listed and some of the results obtained so far are highlighted.
Multi-class machine classification of suicide-related communication on Twitter.
Burnap, Pete; Colombo, Gualtiero; Amery, Rosie; Hodorog, Andrei; Scourfield, Jonathan
2017-08-01
The World Wide Web, and online social networks in particular, have increased connectivity between people such that information can spread to millions of people in a matter of minutes. This form of online collective contagion has provided many benefits to society, such as providing reassurance and emergency management in the immediate aftermath of natural disasters. However, it also poses a potential risk to vulnerable Web users who receive this information and could subsequently come to harm. One example of this would be the spread of suicidal ideation in online social networks, about which concerns have been raised. In this paper we report the results of a number of machine classifiers built with the aim of classifying text relating to suicide on Twitter. The classifier distinguishes between the more worrying content, such as suicidal ideation, and other suicide-related topics such as reporting of a suicide, memorial, campaigning and support. It also aims to identify flippant references to suicide. We built a set of baseline classifiers using lexical, structural, emotive and psychological features extracted from Twitter posts. We then improved on the baseline classifiers by building an ensemble classifier using the Rotation Forest algorithm and a Maximum Probability voting classification decision method, based on the outcome of base classifiers. This achieved an F-measure of 0.728 overall (for 7 classes, including suicidal ideation) and 0.69 for the suicidal ideation class. We summarise the results by reflecting on the most significant predictive principle components of the suicidal ideation class to provide insight into the language used on Twitter to express suicidal ideation. Finally, we perform a 12-month case study of suicide-related posts where we further evaluate the classification approach - showing a sustained classification performance and providing anonymous insights into the trends and demographic profile of Twitter users posting content of this type.
Classification of hyperspectral imagery with neural networks: comparison to conventional tools
NASA Astrophysics Data System (ADS)
Merényi, Erzsébet; Farrand, William H.; Taranik, James V.; Minor, Timothy B.
2014-12-01
Efficient exploitation of hyperspectral imagery is of great importance in remote sensing. Artificial intelligence approaches have been receiving favorable reviews for classification of hyperspectral data because the complexity of such data challenges the limitations of many conventional methods. Artificial neural networks (ANNs) were shown to outperform traditional classifiers in many situations. However, studies that use the full spectral dimensionality of hyperspectral images to classify a large number of surface covers are scarce if non-existent. We advocate the need for methods that can handle the full dimensionality and a large number of classes to retain the discovery potential and the ability to discriminate classes with subtle spectral differences. We demonstrate that such a method exists in the family of ANNs. We compare the maximum likelihood, Mahalonobis distance, minimum distance, spectral angle mapper, and a hybrid ANN classifier for real hyperspectral AVIRIS data, using the full spectral resolution to map 23 cover types and using a small training set. Rigorous evaluation of the classification accuracies shows that the ANN outperforms the other methods and achieves ≈90% accuracy on test data.
NASA Astrophysics Data System (ADS)
Malekmohammadi, Bahram; Ramezani Mehrian, Majid; Jafari, Hamid Reza
2012-11-01
One of the most important water-resources management strategies for arid lands is managed aquifer recharge (MAR). In establishing a MAR scheme, site selection is the prime prerequisite that can be assisted by geographic information system (GIS) tools. One of the most important uncertainties in the site-selection process using GIS is finite ranges or intervals resulting from data classification. In order to reduce these uncertainties, a novel method has been developed involving the integration of multi-criteria decision making (MCDM), GIS, and a fuzzy inference system (FIS). The Shemil-Ashkara plain in the Hormozgan Province of Iran was selected as the case study; slope, geology, groundwater depth, potential for runoff, land use, and groundwater electrical conductivity have been considered as site-selection factors. By defining fuzzy membership functions for the input layers and the output layer, and by constructing fuzzy rules, a FIS has been developed. Comparison of the results produced by the proposed method and the traditional simple additive weighted (SAW) method shows that the proposed method yields more precise results. In conclusion, fuzzy-set theory can be an effective method to overcome associated uncertainties in classification of geographic information data.
Wu, Baolin
2006-02-15
Differential gene expression detection and sample classification using microarray data have received much research interest recently. Owing to the large number of genes p and small number of samples n (p > n), microarray data analysis poses big challenges for statistical analysis. An obvious problem owing to the 'large p small n' is over-fitting. Just by chance, we are likely to find some non-differentially expressed genes that can classify the samples very well. The idea of shrinkage is to regularize the model parameters to reduce the effects of noise and produce reliable inferences. Shrinkage has been successfully applied in the microarray data analysis. The SAM statistics proposed by Tusher et al. and the 'nearest shrunken centroid' proposed by Tibshirani et al. are ad hoc shrinkage methods. Both methods are simple, intuitive and prove to be useful in empirical studies. Recently Wu proposed the penalized t/F-statistics with shrinkage by formally using the (1) penalized linear regression models for two-class microarray data, showing good performance. In this paper we systematically discussed the use of penalized regression models for analyzing microarray data. We generalize the two-class penalized t/F-statistics proposed by Wu to multi-class microarray data. We formally derive the ad hoc shrunken centroid used by Tibshirani et al. using the (1) penalized regression models. And we show that the penalized linear regression models provide a rigorous and unified statistical framework for sample classification and differential gene expression detection.
Classification of EEG abnormalities in partial epilepsy with simultaneous EEG-fMRI recordings.
Pedreira, C; Vaudano, A E; Thornton, R C; Chaudhary, U J; Vulliemoz, S; Laufs, H; Rodionov, R; Carmichael, D W; Lhatoo, S D; Guye, M; Quian Quiroga, R; Lemieux, L
2014-10-01
Scalp EEG recordings and the classification of interictal epileptiform discharges (IED) in patients with epilepsy provide valuable information about the epileptogenic network, particularly by defining the boundaries of the "irritative zone" (IZ), and hence are helpful during pre-surgical evaluation of patients with severe refractory epilepsies. The current detection and classification of epileptiform signals essentially rely on expert observers. This is a very time-consuming procedure, which also leads to inter-observer variability. Here, we propose a novel approach to automatically classify epileptic activity and show how this method provides critical and reliable information related to the IZ localization beyond the one provided by previous approaches. We applied Wave_clus, an automatic spike sorting algorithm, for the classification of IED visually identified from pre-surgical simultaneous Electroencephalogram-functional Magnetic Resonance Imagining (EEG-fMRI) recordings in 8 patients affected by refractory partial epilepsy candidate for surgery. For each patient, two fMRI analyses were performed: one based on the visual classification and one based on the algorithmic sorting. This novel approach successfully identified a total of 29 IED classes (compared to 26 for visual identification). The general concordance between methods was good, providing a full match of EEG patterns in 2 cases, additional EEG information in 2 other cases and, in general, covering EEG patterns of the same areas as expert classification in 7 of the 8 cases. Most notably, evaluation of the method with EEG-fMRI data analysis showed hemodynamic maps related to the majority of IED classes representing improved performance than the visual IED classification-based analysis (72% versus 50%). Furthermore, the IED-related BOLD changes revealed by using the algorithm were localized within the presumed IZ for a larger number of IED classes (9) in a greater number of patients than the expert classification (7 and 5, respectively). In contrast, in only one case presented the new algorithm resulted in fewer classes and activation areas. We propose that the use of automated spike sorting algorithms to classify IED provides an efficient tool for mapping IED-related fMRI changes and increases the EEG-fMRI clinical value for the pre-surgical assessment of patients with severe epilepsy. Copyright © 2014 Elsevier Inc. All rights reserved.
Michez, Adrien; Piégay, Hervé; Lisein, Jonathan; Claessens, Hugues; Lejeune, Philippe
2016-03-01
Riparian forests are critically endangered many anthropogenic pressures and natural hazards. The importance of riparian zones has been acknowledged by European Directives, involving multi-scale monitoring. The use of this very-high-resolution and hyperspatial imagery in a multi-temporal approach is an emerging topic. The trend is reinforced by the recent and rapid growth of the use of the unmanned aerial system (UAS), which has prompted the development of innovative methodology. Our study proposes a methodological framework to explore how a set of multi-temporal images acquired during a vegetative period can differentiate some of the deciduous riparian forest species and their health conditions. More specifically, the developed approach intends to identify, through a process of variable selection, which variables derived from UAS imagery and which scale of image analysis are the most relevant to our objectives.The methodological framework is applied to two study sites to describe the riparian forest through two fundamental characteristics: the species composition and the health condition. These characteristics were selected not only because of their use as proxies for the riparian zone ecological integrity but also because of their use for river management.The comparison of various scales of image analysis identified the smallest object-based image analysis (OBIA) objects (ca. 1 m(2)) as the most relevant scale. Variables derived from spectral information (bands ratios) were identified as the most appropriate, followed by variables related to the vertical structure of the forest. Classification results show good overall accuracies for the species composition of the riparian forest (five classes, 79.5 and 84.1% for site 1 and site 2). The classification scenario regarding the health condition of the black alders of the site 1 performed the best (90.6%).The quality of the classification models developed with a UAS-based, cost-effective, and semi-automatic approach competes successfully with those developed using more expensive imagery, such as multi-spectral and hyperspectral airborne imagery. The high overall accuracy results obtained by the classification of the diseased alders open the door to applications dedicated to monitoring of the health conditions of riparian forest. Our methodological framework will allow UAS users to manage large imagery metric datasets derived from those dense time series.
Modified Angle's Classification for Primary Dentition.
Chandranee, Kaushik Narendra; Chandranee, Narendra Jayantilal; Nagpal, Devendra; Lamba, Gagandeep; Choudhari, Purva; Hotwani, Kavita
2017-01-01
This study aims to propose a modification of Angle's classification for primary dentition and to assess its applicability in children from Central India, Nagpur. Modification in Angle's classification has been proposed for application in primary dentition. Small roman numbers i/ii/iii are used for primary dentition notation to represent Angle's Class I/II/III molar relationships as in permanent dentition, respectively. To assess applicability of modified Angle's classification a cross-sectional preschool 2000 children population from central India; 3-6 years of age residing in Nagpur metropolitan city of Maharashtra state were selected randomly as per the inclusion and exclusion criteria. Majority 93.35% children were found to have bilateral Class i followed by 2.5% bilateral Class ii and 0.2% bilateral half cusp Class iii molar relationships as per the modified Angle's classification for primary dentition. About 3.75% children had various combinations of Class ii relationships and 0.2% children were having Class iii subdivision relationship. Modification of Angle's classification for application in primary dentition has been proposed. A cross-sectional investigation using new classification revealed various 6.25% Class ii and 0.4% Class iii molar relationships cases in preschool children population in a metropolitan city of Nagpur. Application of the modified Angle's classification to other population groups is warranted to validate its routine application in clinical pediatric dentistry.
Zhao, Xin; Kuipers, Oscar P
2016-11-07
Gram-positive bacteria of the Bacillales are important producers of antimicrobial compounds that might be utilized for medical, food or agricultural applications. Thanks to the wide availability of whole genome sequence data and the development of specific genome mining tools, novel antimicrobial compounds, either ribosomally- or non-ribosomally produced, of various Bacillales species can be predicted and classified. Here, we provide a classification scheme of known and putative antimicrobial compounds in the specific context of Bacillales species. We identify and describe known and putative bacteriocins, non-ribosomally synthesized peptides (NRPs), polyketides (PKs) and other antimicrobials from 328 whole-genome sequenced strains of 57 species of Bacillales by using web based genome-mining prediction tools. We provide a classification scheme for these bacteriocins, update the findings of NRPs and PKs and investigate their characteristics and suitability for biocontrol by describing per class their genetic organization and structure. Moreover, we highlight the potential of several known and novel antimicrobials from various species of Bacillales. Our extended classification of antimicrobial compounds demonstrates that Bacillales provide a rich source of novel antimicrobials that can now readily be tapped experimentally, since many new gene clusters are identified.
Chandonia, John-Marc; Fox, Naomi K; Brenner, Steven E
2017-02-03
SCOPe (Structural Classification of Proteins-extended, http://scop.berkeley.edu) is a database of relationships between protein structures that extends the Structural Classification of Proteins (SCOP) database. SCOP is an expert-curated ordering of domains from the majority of proteins of known structure in a hierarchy according to structural and evolutionary relationships. SCOPe classifies the majority of protein structures released since SCOP development concluded in 2009, using a combination of manual curation and highly precise automated tools, aiming to have the same accuracy as fully hand-curated SCOP releases. SCOPe also incorporates and updates the ASTRAL compendium, which provides several databases and tools to aid in the analysis of the sequences and structures of proteins classified in SCOPe. SCOPe continues high-quality manual classification of new superfamilies, a key feature of SCOP. Artifacts such as expression tags are now separated into their own class, in order to distinguish them from the homology-based annotations in the remainder of the SCOPe hierarchy. SCOPe 2.06 contains 77,439 Protein Data Bank entries, double the 38,221 structures classified in SCOP. Copyright © 2016 The Author(s). Published by Elsevier Ltd.. All rights reserved.
Classification of ROTSE Variable Stars using Machine Learning
NASA Astrophysics Data System (ADS)
Wozniak, P. R.; Akerlof, C.; Amrose, S.; Brumby, S.; Casperson, D.; Gisler, G.; Kehoe, R.; Lee, B.; Marshall, S.; McGowan, K. E.; McKay, T.; Perkins, S.; Priedhorsky, W.; Rykoff, E.; Smith, D. A.; Theiler, J.; Vestrand, W. T.; Wren, J.; ROTSE Collaboration
2001-12-01
We evaluate several Machine Learning algorithms as potential tools for automated classification of variable stars. Using the ROTSE sample of ~1800 variables from a pilot study of 5% of the whole sky, we compare the effectiveness of a supervised technique (Support Vector Machines, SVM) versus unsupervised methods (K-means and Autoclass). There are 8 types of variables in the sample: RR Lyr AB, RR Lyr C, Delta Scuti, Cepheids, detached eclipsing binaries, contact binaries, Miras and LPVs. Preliminary results suggest a very high ( ~95%) efficiency of SVM in isolating a few best defined classes against the rest of the sample, and good accuracy ( ~70-75%) for all classes considered simultaneously. This includes some degeneracies, irreducible with the information at hand. Supervised methods naturally outperform unsupervised methods, in terms of final error rate, but unsupervised methods offer many advantages for large sets of unlabeled data. Therefore, both types of methods should be considered as promising tools for mining vast variability surveys. We project that there are more than 30,000 periodic variables in the ROTSE-I data base covering the entire local sky between V=10 and 15.5 mag. This sample size is already stretching the time capabilities of human analysts.
Slovut, David Paul; Gray, Bruce H; Saiar, Amin; Bates, Mark C
2017-08-01
Since 2005, the American Board of Vascular Medicine (ABVM) endovascular examination has been used to certify vascular practitioners. Annual rigorous review has confirmed it is psychometrically valid and reliable. However, the evidence basis underlying the examination items has not been studied systematically. The aim of this study was to adjudicate class of recommendation (COR) and level of evidence (LOE) for the 2015 ABVM endovascular examination and establish an additional feedback mechanism for examination improvement based on contemporary evidence-based guidelines. We performed a pooled consensus process to classify each of the 110 items in the 2015 ABVM endovascular examination by COR and LOE as detailed in the current guideline statements. We added additional categories for items that were not eligible for assignment using traditional current evidence-based metrics: 'COR X', cannot be determined, not applicable, or simple recognition; and 'LOE X', cannot be determined or not applicable. COR classifications were assigned in the following proportion: Class I=15%, Class II=40%, Class III=3%, COR X=42%. LOE classifications were assigned in the following proportion: Level A=12%, Level B=34%, Level C=32%, LOE X=22%. Our analysis showed that nearly half of the 2015 ABVM endovascular examination items were supported by strong scientific evidence or fact-based knowledge. COR and LOE analysis yielded notably different results. Use of alternate classification schema may be powerful tools for improving certification exams in healthcare.
Speech Music Discrimination Using Class-Specific Features
2004-08-01
Speech Music Discrimination Using Class-Specific Features Thomas Beierholm...between speech and music . Feature extraction is class-specific and can therefore be tailored to each class meaning that segment size, model orders...interest. Some of the applications of audio signal classification are speech/ music classification [1], acoustical environmental classification [2][3
Brain tumor segmentation based on local independent projection-based classification.
Huang, Meiyan; Yang, Wei; Wu, Yao; Jiang, Jun; Chen, Wufan; Feng, Qianjin
2014-10-01
Brain tumor segmentation is an important procedure for early tumor diagnosis and radiotherapy planning. Although numerous brain tumor segmentation methods have been presented, enhancing tumor segmentation methods is still challenging because brain tumor MRI images exhibit complex characteristics, such as high diversity in tumor appearance and ambiguous tumor boundaries. To address this problem, we propose a novel automatic tumor segmentation method for MRI images. This method treats tumor segmentation as a classification problem. Additionally, the local independent projection-based classification (LIPC) method is used to classify each voxel into different classes. A novel classification framework is derived by introducing the local independent projection into the classical classification model. Locality is important in the calculation of local independent projections for LIPC. Locality is also considered in determining whether local anchor embedding is more applicable in solving linear projection weights compared with other coding methods. Moreover, LIPC considers the data distribution of different classes by learning a softmax regression model, which can further improve classification performance. In this study, 80 brain tumor MRI images with ground truth data are used as training data and 40 images without ground truth data are used as testing data. The segmentation results of testing data are evaluated by an online evaluation tool. The average dice similarities of the proposed method for segmenting complete tumor, tumor core, and contrast-enhancing tumor on real patient data are 0.84, 0.685, and 0.585, respectively. These results are comparable to other state-of-the-art methods.
NASA Technical Reports Server (NTRS)
Djorgovski, Stanislav
1992-01-01
The existing and forthcoming data bases from NASA missions contain an abundance of information whose complexity cannot be efficiently tapped with simple statistical techniques. Powerful multivariate statistical methods already exist which can be used to harness much of the richness of these data. Automatic classification techniques have been developed to solve the problem of identifying known types of objects in multi parameter data sets, in addition to leading to the discovery of new physical phenomena and classes of objects. We propose an exploratory study and integration of promising techniques in the development of a general and modular classification/analysis system for very large data bases, which would enhance and optimize data management and the use of human research resources.
NASA Astrophysics Data System (ADS)
Jalbuena, Rey L.; Peralta, Rudolph V.; Tamondong, Ayin M.
2016-10-01
Mangroves are trees or shrubs that grows at the surface between the land and the sea in tropical and sub-tropical latitudes. Mangroves are essential in supporting various marine life, thus, it is important to preserve and manage these areas. There are many approaches in creating Mangroves maps, one of which is through the use of Light Detection and Ranging (LiDAR). It is a remote sensing technique which uses light pulses to measure distances and to generate three-dimensional point clouds of the Earth's surface. In this study, the topographic LiDAR Data will be used to analyze the geophysical features of the terrain and create a Mangrove map. The dataset that we have were first pre-processed using the LAStools software. It is a software that is used to process LiDAR data sets and create different layers such as DSM, DTM, nDSM, Slope, LiDAR Intensity, LiDAR number of first returns, and CHM. All the aforementioned layers together was used to derive the Mangrove class. Then, an Object-based Image Analysis (OBIA) was performed using eCognition. OBIA analyzes a group of pixels with similar properties called objects, as compared to the traditional pixel-based which only examines a single pixel. Multi-threshold and multiresolution segmentation were used to delineate the different classes and split the image into objects. There are four levels of classification, first is the separation of the Land from the Water. Then the Land class was further dived into Ground and Non-ground objects. Furthermore classification of Nonvegetation, Mangroves, and Other Vegetation was done from the Non-ground objects. Lastly Separation of the mangrove class was done through the Use of field verified training points which was then run into a Support Vector Machine (SVM) classification. Different classes were separated using the different layer feature properties, such as mean, mode, standard deviation, geometrical properties, neighbor-related properties, and textural properties. Accuracy assessment was done using a different set of field validation points. This workflow was applied in the classification of Mangroves to a LiDAR dataset of Naawan and Manticao, Misamis Oriental, Philippines. The process presented in this study shows that LiDAR data and its derivatives can be used in extracting and creating Mangrove maps, which can be helpful in managing coastal environment.
Automated diagnosis of interstitial lung diseases and emphysema in MDCT imaging
NASA Astrophysics Data System (ADS)
Fetita, Catalin; Chang Chien, Kuang-Che; Brillet, Pierre-Yves; Prêteux, Françoise
2007-09-01
Diffuse lung diseases (DLD) include a heterogeneous group of non-neoplasic disease resulting from damage to the lung parenchyma by varying patterns of inflammation. Characterization and quantification of DLD severity using MDCT, mainly in interstitial lung diseases and emphysema, is an important issue in clinical research for the evaluation of new therapies. This paper develops a 3D automated approach for detection and diagnosis of diffuse lung diseases such as fibrosis/honeycombing, ground glass and emphysema. The proposed methodology combines multi-resolution 3D morphological filtering (exploiting the sup-constrained connection cost operator) and graph-based classification for a full characterization of the parenchymal tissue. The morphological filtering performs a multi-level segmentation of the low- and medium-attenuated lung regions as well as their classification with respect to a granularity criterion (multi-resolution analysis). The original intensity range of the CT data volume is thus reduced in the segmented data to a number of levels equal to the resolution depth used (generally ten levels). The specificity of such morphological filtering is to extract tissue patterns locally contrasting with their neighborhood and of size inferior to the resolution depth, while preserving their original shape. A multi-valued hierarchical graph describing the segmentation result is built-up according to the resolution level and the adjacency of the different segmented components. The graph nodes are then enriched with the textural information carried out by their associated components. A graph analysis-reorganization based on the nodes attributes delivers the final classification of the lung parenchyma in normal and ILD/emphysematous regions. It also makes possible to discriminate between different types, or development stages, among the same class of diseases.
Alves, Julio Cesar L; Henriques, Claudete B; Poppi, Ronei J
2014-01-03
The use of near infrared (NIR) spectroscopy combined with chemometric methods have been widely used in petroleum and petrochemical industry and provides suitable methods for process control and quality control. The algorithm support vector machines (SVM) has demonstrated to be a powerful chemometric tool for development of classification models due to its ability to nonlinear modeling and with high generalization capability and these characteristics can be especially important for treating near infrared (NIR) spectroscopy data of complex mixtures such as petroleum refinery streams. In this work, a study on the performance of the support vector machines algorithm for classification was carried out, using C-SVC and ν-SVC, applied to near infrared (NIR) spectroscopy data of different types of streams that make up the diesel pool in a petroleum refinery: light gas oil, heavy gas oil, hydrotreated diesel, kerosene, heavy naphtha and external diesel. In addition to these six streams, the diesel final blend produced in the refinery was added to complete the data set. C-SVC and ν-SVC classification models with 2, 4, 6 and 7 classes were developed for comparison between its results and also for comparison with the soft independent modeling of class analogy (SIMCA) models results. It is demonstrated the superior performance of SVC models especially using ν-SVC for development of classification models for 6 and 7 classes leading to an improvement of sensitivity on validation sample sets of 24% and 15%, respectively, when compared to SIMCA models, providing better identification of chemical compositions of different diesel pool refinery streams. Copyright © 2013 Elsevier B.V. All rights reserved.
Hydrological Classification, a Practical Tool for Mangrove Restoration
Van Loon, Anne F.; Te Brake, Bram; Van Huijgevoort, Marjolein H. J.; Dijksma, Roel
2016-01-01
Mangrove restoration projects, aimed at restoring important values of mangrove forests after degradation, often fail because hydrological conditions are disregarded. We present a simple, but robust methodology to determine hydrological suitability for mangrove species, which can guide restoration practice. In 15 natural and 8 disturbed sites (i.e. disused shrimp ponds) in three case study regions in south-east Asia, water levels were measured and vegetation species composition was determined. Using an existing hydrological classification for mangroves, sites were classified into hydrological classes, based on duration of inundation, and vegetation classes, based on occurrence of mangrove species. For the natural sites hydrological and vegetation classes were similar, showing clear distribution of mangrove species from wet to dry sites. Application of the classification to disturbed sites showed that in some locations hydrological conditions had been restored enough for mangrove vegetation to establish, in some locations hydrological conditions were suitable for various mangrove species but vegetation had not established naturally, and in some locations hydrological conditions were too wet for any mangrove species (natural or planted) to grow. We quantified the effect that removal of obstructions such as dams would have on the hydrology and found that failure of planting at one site could have been prevented. The hydrological classification needs relatively little data, i.e. water levels for a period of only one lunar tidal cycle without additional measurements, and uncertainties in the measurements and analysis are relatively small. For the study locations, the application of the hydrological classification gave important information about how to restore the hydrology to suitable conditions to improve natural regeneration or to plant mangrove species, which could not have been obtained by estimating elevation only. Based on this research a number of recommendations are given to improve the effectiveness of mangrove restoration projects. PMID:27008277
Hydrological Classification, a Practical Tool for Mangrove Restoration.
Van Loon, Anne F; Te Brake, Bram; Van Huijgevoort, Marjolein H J; Dijksma, Roel
2016-01-01
Mangrove restoration projects, aimed at restoring important values of mangrove forests after degradation, often fail because hydrological conditions are disregarded. We present a simple, but robust methodology to determine hydrological suitability for mangrove species, which can guide restoration practice. In 15 natural and 8 disturbed sites (i.e. disused shrimp ponds) in three case study regions in south-east Asia, water levels were measured and vegetation species composition was determined. Using an existing hydrological classification for mangroves, sites were classified into hydrological classes, based on duration of inundation, and vegetation classes, based on occurrence of mangrove species. For the natural sites hydrological and vegetation classes were similar, showing clear distribution of mangrove species from wet to dry sites. Application of the classification to disturbed sites showed that in some locations hydrological conditions had been restored enough for mangrove vegetation to establish, in some locations hydrological conditions were suitable for various mangrove species but vegetation had not established naturally, and in some locations hydrological conditions were too wet for any mangrove species (natural or planted) to grow. We quantified the effect that removal of obstructions such as dams would have on the hydrology and found that failure of planting at one site could have been prevented. The hydrological classification needs relatively little data, i.e. water levels for a period of only one lunar tidal cycle without additional measurements, and uncertainties in the measurements and analysis are relatively small. For the study locations, the application of the hydrological classification gave important information about how to restore the hydrology to suitable conditions to improve natural regeneration or to plant mangrove species, which could not have been obtained by estimating elevation only. Based on this research a number of recommendations are given to improve the effectiveness of mangrove restoration projects.
ZN graded discrete Lax pairs and Yang-Baxter maps
NASA Astrophysics Data System (ADS)
Fordy, Allan P.; Xenitidis, Pavlos
2017-05-01
We recently introduced a class of ZN graded discrete Lax pairs and studied the associated discrete integrable systems (lattice equations). In this paper, we introduce the corresponding Yang-Baxter maps. Many well-known examples belong to this scheme for N=2, so, for N≥3, our systems may be regarded as generalizations of these. In particular, for each N we introduce a class of multi-component Yang-Baxter maps, which include HBIII (of Papageorgiou et al. 2010 SIGMA 6, 003 (9 p). (doi:10.3842/SIGMA.2010.033)), when N=2, and that associated with the discrete modified Boussinesq equation, for N=3. For N≥5 we introduce a new family of Yang-Baxter maps, which have no lower dimensional analogue. We also present new multi-component versions of the Yang-Baxter maps FIV and FV (given in the classification of Adler et al. 2004 Commun. Anal. Geom. 12, 967-1007. (doi:10.4310/CAG.2004.v12.n5.a1)).
[Formula: see text] graded discrete Lax pairs and Yang-Baxter maps.
Fordy, Allan P; Xenitidis, Pavlos
2017-05-01
We recently introduced a class of [Formula: see text] graded discrete Lax pairs and studied the associated discrete integrable systems (lattice equations). In this paper, we introduce the corresponding Yang-Baxter maps. Many well-known examples belong to this scheme for N =2, so, for N ≥3, our systems may be regarded as generalizations of these. In particular, for each N we introduce a class of multi-component Yang-Baxter maps, which include H B III (of Papageorgiou et al. 2010 SIGMA 6, 003 (9 p). (doi:10.3842/SIGMA.2010.033)), when N =2, and that associated with the discrete modified Boussinesq equation, for N =3. For N ≥5 we introduce a new family of Yang-Baxter maps, which have no lower dimensional analogue. We also present new multi-component versions of the Yang-Baxter maps F IV and F V (given in the classification of Adler et al. 2004 Commun. Anal. Geom. 12, 967-1007. (doi:10.4310/CAG.2004.v12.n5.a1)).
ZN graded discrete Lax pairs and Yang–Baxter maps
Fordy, Allan P.
2017-01-01
We recently introduced a class of ZN graded discrete Lax pairs and studied the associated discrete integrable systems (lattice equations). In this paper, we introduce the corresponding Yang–Baxter maps. Many well-known examples belong to this scheme for N=2, so, for N≥3, our systems may be regarded as generalizations of these. In particular, for each N we introduce a class of multi-component Yang–Baxter maps, which include HBIII (of Papageorgiou et al. 2010 SIGMA 6, 003 (9 p). (doi:10.3842/SIGMA.2010.033)), when N=2, and that associated with the discrete modified Boussinesq equation, for N=3. For N≥5 we introduce a new family of Yang–Baxter maps, which have no lower dimensional analogue. We also present new multi-component versions of the Yang–Baxter maps FIV and FV (given in the classification of Adler et al. 2004 Commun. Anal. Geom. 12, 967–1007. (doi:10.4310/CAG.2004.v12.n5.a1)). PMID:28588406
Prasad, Dilip K; Agarwal, Krishna
2016-03-22
We propose a method for classifying radiometric oceanic color data measured by hyperspectral satellite sensors into known spectral classes, irrespective of the downwelling irradiance of the particular day, i.e., the illumination conditions. The focus is not on retrieving the inherent optical properties but to classify the pixels according to the known spectral classes of the reflectances from the ocean. The method compensates for the unknown downwelling irradiance by white balancing the radiometric data at the ocean pixels using the radiometric data of bright pixels (typically from clouds). The white-balanced data is compared with the entries in a pre-calibrated lookup table in which each entry represents the spectral properties of one class. The proposed approach is tested on two datasets of in situ measurements and 26 different daylight illumination spectra for medium resolution imaging spectrometer (MERIS), moderate-resolution imaging spectroradiometer (MODIS), sea-viewing wide field-of-view sensor (SeaWiFS), coastal zone color scanner (CZCS), ocean and land colour instrument (OLCI), and visible infrared imaging radiometer suite (VIIRS) sensors. Results are also shown for CIMEL's SeaPRISM sun photometer sensor used on-board field trips. Accuracy of more than 92% is observed on the validation dataset and more than 86% is observed on the other dataset for all satellite sensors. The potential of applying the algorithms to non-satellite and non-multi-spectral sensors mountable on airborne systems is demonstrated by showing classification results for two consumer cameras. Classification on actual MERIS data is also shown. Additional results comparing the spectra of remote sensing reflectance with level 2 MERIS data and chlorophyll concentration estimates of the data are included.
NASA Astrophysics Data System (ADS)
Fluet-Chouinard, E.; Lehner, B.; Aires, F.; Prigent, C.; McIntyre, P. B.
2017-12-01
Global surface water maps have improved in spatial and temporal resolutions through various remote sensing methods: open water extents with compiled Landsat archives and inundation with topographically downscaled multi-sensor retrievals. These time-series capture variations through time of open water and inundation without discriminating between hydrographic features (e.g. lakes, reservoirs, river channels and wetland types) as other databases have done as static representation. Available data sources present the opportunity to generate a comprehensive map and typology of aquatic environments (deepwater and wetlands) that improves on earlier digitized inventories and maps. The challenge of classifying surface waters globally is to distinguishing wetland types with meaningful characteristics or proxies (hydrology, water chemistry, soils, vegetation) while accommodating limitations of remote sensing data. We present a new wetland classification scheme designed for global application and produce a map of aquatic ecosystem types globally using state-of-the-art remote sensing products. Our classification scheme combines open water extent and expands it with downscaled multi-sensor inundation data to capture the maximal vegetated wetland extent. The hierarchical structure of the classification is modified from the Cowardin Systems (1979) developed for the USA. The first level classification is based on a combination of landscape positions and water source (e.g. lacustrine, riverine, palustrine, coastal and artificial) while the second level represents the hydrologic regime (e.g. perennial, seasonal, intermittent and waterlogged). Class-specific descriptors can further detail the wetland types with soils and vegetation cover. Our globally consistent nomenclature and top-down mapping allows for direct comparison across biogeographic regions, to upscale biogeochemical fluxes as well as other landscape level functions.
NASA Astrophysics Data System (ADS)
Linek, M.; Jungmann, M.; Berlage, T.; Clauser, C.
2005-12-01
Within the Ocean Drilling Program (ODP), image logging tools have been routinely deployed such as the Formation MicroScanner (FMS) or the Resistivity-At-Bit (RAB) tools. Both logging methods are based on resistivity measurements at the borehole wall and therefore are sensitive to conductivity contrasts, which are mapped in color scale images. These images are commonly used to study the structure of the sedimentary rocks and the oceanic crust (petrologic fabric, fractures, veins, etc.). So far, mapping of lithology from electrical images is purely based on visual inspection and subjective interpretation. We apply digital image analysis on electrical borehole wall images in order to develop a method, which augments objective rock identification. We focus on supervised textural pattern recognition which studies the spatial gray level distribution with respect to certain rock types. FMS image intervals of rock classes known from core data are taken in order to train textural characteristics for each class. A so-called gray level co-occurrence matrix is computed by counting the occurrence of a pair of gray levels that are a certain distant apart. Once the matrix for an image interval is computed, we calculate the image contrast, homogeneity, energy, and entropy. We assign characteristic textural features to different rock types by reducing the image information into a small set of descriptive features. Once a discriminating set of texture features for each rock type is found, we are able to discriminate the entire FMS images regarding the trained rock type classification. A rock classification based on texture features enables quantitative lithology mapping and is characterized by a high repeatability, in contrast to a purely visual subjective image interpretation. We show examples for the rock classification between breccias, pillows, massive units, and horizontally bedded tuffs based on ODP image data.
Single trial detection of hand poses in human ECoG using CSP based feature extraction.
Kapeller, C; Schneider, C; Kamada, K; Ogawa, H; Kunii, N; Ortner, R; Pruckl, R; Guger, C
2014-01-01
Decoding brain activity of corresponding highlevel tasks may lead to an independent and intuitively controlled Brain-Computer Interface (BCI). Most of today's BCI research focuses on analyzing the electroencephalogram (EEG) which provides only limited spatial and temporal resolution. Derived electrocorticographic (ECoG) signals allow the investigation of spatially highly focused task-related activation within the high-gamma frequency band, making the discrimination of individual finger movements or complex grasping tasks possible. Common spatial patterns (CSP) are commonly used for BCI systems and provide a powerful tool for feature optimization and dimensionality reduction. This work focused on the discrimination of (i) three complex hand movements, as well as (ii) hand movement and idle state. Two subjects S1 and S2 performed single `open', `peace' and `fist' hand poses in multiple trials. Signals in the high-gamma frequency range between 100 and 500 Hz were spatially filtered based on a CSP algorithm for (i) and (ii). Additionally, a manual feature selection approach was tested for (i). A multi-class linear discriminant analysis (LDA) showed for (i) an error rate of 13.89 % / 7.22 % and 18.42 % / 1.17 % for S1 and S2 using manually / CSP selected features, where for (ii) a two class LDA lead to a classification error of 13.39 % and 2.33 % for S1 and S2, respectively.
NASA Astrophysics Data System (ADS)
El-Abbas, Mustafa M.; Csaplovics, Elmar; Deafalla, Taisser H.
2013-10-01
Nowadays, remote-sensing technologies are becoming increasingly interlinked to the issue of deforestation. They offer a systematized and objective strategy to document, understand and simulate the deforestation process and its associated causes. In this context, the main goal of this study, conducted in the Blue Nile region of Sudan, in which most of the natural habitats were dramatically destroyed, was to develop spatial methodologies to assess the deforestation dynamics and its associated factors. To achieve that, optical multispectral satellite scenes (i.e., ASTER and LANDSAT) integrated with field survey in addition to multiple data sources were used for the analyses. Spatiotemporal Object Based Image Analysis (STOBIA) was applied to assess the change dynamics within the period of study. Broadly, the above mentioned analyses include; Object Based (OB) classifications, post-classification change detection, data fusion, information extraction and spatial analysis. Hierarchical multi-scale segmentation thresholds were applied and each class was delimited with semantic meanings by a set of rules associated with membership functions. Consequently, the fused multi-temporal data were introduced to create detailed objects of change classes from the input LU/LC classes. The dynamic changes were quantified and spatially located as well as the spatial and contextual relations from adjacent areas were analyzed. The main finding of the present study is that, the forest areas were drastically decreased, while the agrarian structure in conversion of forest into agricultural fields and grassland was the main force of deforestation. In contrast, the capability of the area to recover was clearly observed. The study concludes with a brief assessment of an 'oriented' framework, focused on the alarming areas where serious dynamics are located and where urgent plans and interventions are most critical, guided with potential solutions based on the identified driving forces.
2016-10-01
both sexes ), neurological symptoms and organ injury resembling human EHS. Blood and tissue samples were collected at 0.5 h, 3 h, 24 h 4d,9d and 14d of...associated with adipose tissue. Analyses of metabolic hormones and histology in both sexes suggest transient injury or “stunning” to the pancreas...15. SUBJECT TERMS Sex differences, exertional heat stroke, multi-organ injury, heat stress, metabolic hormones 16. SECURITY CLASSIFICATION OF: U 17
NASA Astrophysics Data System (ADS)
Zou, Xiaoliang; Zhao, Guihua; Li, Jonathan; Yang, Yuanxi; Fang, Yong
2016-06-01
With the rapid developments of the sensor technology, high spatial resolution imagery and airborne Lidar point clouds can be captured nowadays, which make classification, extraction, evaluation and analysis of a broad range of object features available. High resolution imagery, Lidar dataset and parcel map can be widely used for classification as information carriers. Therefore, refinement of objects classification is made possible for the urban land cover. The paper presents an approach to object based image analysis (OBIA) combing high spatial resolution imagery and airborne Lidar point clouds. The advanced workflow for urban land cover is designed with four components. Firstly, colour-infrared TrueOrtho photo and laser point clouds were pre-processed to derive the parcel map of water bodies and nDSM respectively. Secondly, image objects are created via multi-resolution image segmentation integrating scale parameter, the colour and shape properties with compactness criterion. Image can be subdivided into separate object regions. Thirdly, image objects classification is performed on the basis of segmentation and a rule set of knowledge decision tree. These objects imagery are classified into six classes such as water bodies, low vegetation/grass, tree, low building, high building and road. Finally, in order to assess the validity of the classification results for six classes, accuracy assessment is performed through comparing randomly distributed reference points of TrueOrtho imagery with the classification results, forming the confusion matrix and calculating overall accuracy and Kappa coefficient. The study area focuses on test site Vaihingen/Enz and a patch of test datasets comes from the benchmark of ISPRS WG III/4 test project. The classification results show higher overall accuracy for most types of urban land cover. Overall accuracy is 89.5% and Kappa coefficient equals to 0.865. The OBIA approach provides an effective and convenient way to combine high resolution imagery and Lidar ancillary data for classification of urban land cover.
Wang, Huiya; Feng, Jun; Wang, Hongyu
2017-07-20
Detection of clustered microcalcification (MC) from mammograms plays essential roles in computer-aided diagnosis for early stage breast cancer. To tackle problems associated with the diversity of data structures of MC lesions and the variability of normal breast tissues, multi-pattern sample space learning is required. In this paper, a novel grouped fuzzy Support Vector Machine (SVM) algorithm with sample space partition based on Expectation-Maximization (EM) (called G-FSVM) is proposed for clustered MC detection. The diversified pattern of training data is partitioned into several groups based on EM algorithm. Then a series of fuzzy SVM are integrated for classification with each group of samples from the MC lesions and normal breast tissues. From DDSM database, a total of 1,064 suspicious regions are selected from 239 mammography, and the measurement of Accuracy, True Positive Rate (TPR), False Positive Rate (FPR) and EVL = TPR* 1-FPR are 0.82, 0.78, 0.14 and 0.72, respectively. The proposed method incorporates the merits of fuzzy SVM and multi-pattern sample space learning, decomposing the MC detection problem into serial simple two-class classification. Experimental results from synthetic data and DDSM database demonstrate that our integrated classification framework reduces the false positive rate significantly while maintaining the true positive rate.
Multi-scale investigation of shrub encroachment in southern Africa
NASA Astrophysics Data System (ADS)
Aplin, Paul; Marston, Christopher; Wilkinson, David; Field, Richard; O'Regan, Hannah
2016-04-01
There is growing speculation that savannah environments throughout Africa have been subject to shrub encroachment in recent years, whereby grassland is lost to woody vegetation cover. Changes in the relative proportions of grassland and woodland are important in the context of conservation of savannah systems, with implications for faunal distributions, environmental management and tourism. Here, we focus on southern Kruger National Park, South Africa, and investigate whether or not shrub encroachment has occurred over the last decade and a half. We use a multi-scale approach, examining the complementarity of medium (e.g. Landsat TM and OLI) and fine (e.g. QuickBird and WorldView-2) spatial resolution satellite sensor imagery, supported by intensive field survey in 2002 and 2014. We employ semi-automated land cover classification, involving a hybrid unsupervised clustering approach with manual class grouping and checking, followed by change detection post-classification comparison analysis. The results show that shrub encroachment is indeed occurring, a finding evidenced through three fine resolution replicate images plus medium resolution imagery. The results also demonstrate the complementarity of medium and fine resolution imagery, though some thematic information must be sacrificed to maintain high medium resolution classification accuracy. Finally, the findings have broader implications for issues such as vegetation seasonality, spatial transferability and management practices.
A Matlab Program for Textural Classification Using Neural Networks
NASA Astrophysics Data System (ADS)
Leite, E. P.; de Souza, C.
2008-12-01
A new MATLAB code that provides tools to perform classification of textural images for applications in the Geosciences is presented. The program, here coined TEXTNN, comprises the computation of variogram maps in the frequency domain for specific lag distances in the neighborhood of a pixel. The result is then converted back to spatial domain, where directional or ominidirectional semivariograms are extracted. Feature vectors are built with textural information composed of the semivariance values at these lag distances and, moreover, with histogram measures of mean, standard deviation and weighted fill-ratio. This procedure is applied to a selected group of pixels or to all pixels in an image using a moving window. A feed- forward back-propagation Neural Network can then be designed and trained on feature vectors of predefined classes (training set). The training phase minimizes the mean-squared error on the training set. Additionally, at each iteration, the mean-squared error for every validation is assessed and a test set is evaluated. The program also calculates contingency matrices, global accuracy and kappa coefficient for the three data sets, allowing a quantitative appraisal of the predictive power of the Neural Network models. The interpreter is able to select the best model obtained from a k-fold cross-validation or to use a unique split-sample data set for classification of all pixels in a given textural image. The code is opened to the geoscientific community and is very flexible, allowing the experienced user to modify it as necessary. The performance of the algorithms and the end-user program were tested using synthetic images, orbital SAR (RADARSAT) imagery for oil seepage detection, and airborne, multi-polarimetric SAR imagery for geologic mapping. The overall results proved very promising.
A tri-fold hybrid classification approach for diagnostics with unexampled faulty states
NASA Astrophysics Data System (ADS)
Tamilselvan, Prasanna; Wang, Pingfeng
2015-01-01
System health diagnostics provides diversified benefits such as improved safety, improved reliability and reduced costs for the operation and maintenance of engineered systems. Successful health diagnostics requires the knowledge of system failures. However, with an increasing system complexity, it is extraordinarily difficult to have a well-tested system so that all potential faulty states can be realized and studied at product testing stage. Thus, real time health diagnostics requires automatic detection of unexampled system faulty states based upon sensory data to avoid sudden catastrophic system failures. This paper presents a trifold hybrid classification (THC) approach for structural health diagnosis with unexampled health states (UHS), which comprises of preliminary UHS identification using a new thresholded Mahalanobis distance (TMD) classifier, UHS diagnostics using a two-class support vector machine (SVM) classifier, and exampled health states diagnostics using a multi-class SVM classifier. The proposed THC approach, which takes the advantages of both TMD and SVM-based classification techniques, is able to identify and isolate the unexampled faulty states through interactively detecting the deviation of sensory data from the exampled health states and forming new ones autonomously. The proposed THC approach is further extended to a generic framework for health diagnostics problems with unexampled faulty states and demonstrated with health diagnostics case studies for power transformers and rolling bearings.
A thyroid nodule classification method based on TI-RADS
NASA Astrophysics Data System (ADS)
Wang, Hao; Yang, Yang; Peng, Bo; Chen, Qin
2017-07-01
Thyroid Imaging Reporting and Data System(TI-RADS) is a valuable tool for differentiating the benign and the malignant thyroid nodules. In clinic, doctors can determine the extent of being benign or malignant in terms of different classes by using TI-RADS. Classification represents the degree of malignancy of thyroid nodules. TI-RADS as a classification standard can be used to guide the ultrasonic doctor to examine thyroid nodules more accurately and reliably. In this paper, we aim to classify the thyroid nodules with the help of TI-RADS. To this end, four ultrasound signs, i.e., cystic and solid, echo pattern, boundary feature and calcification of thyroid nodules are extracted and converted into feature vectors. Then semi-supervised fuzzy C-means ensemble (SS-FCME) model is applied to obtain the classification results. The experimental results demonstrate that the proposed method can help doctors diagnose the thyroid nodules effectively.
NASA Technical Reports Server (NTRS)
Emerson, Charles W.; Sig-NganLam, Nina; Quattrochi, Dale A.
2004-01-01
The accuracy of traditional multispectral maximum-likelihood image classification is limited by the skewed statistical distributions of reflectances from the complex heterogenous mixture of land cover types in urban areas. This work examines the utility of local variance, fractal dimension and Moran's I index of spatial autocorrelation in segmenting multispectral satellite imagery. Tools available in the Image Characterization and Modeling System (ICAMS) were used to analyze Landsat 7 imagery of Atlanta, Georgia. Although segmentation of panchromatic images is possible using indicators of spatial complexity, different land covers often yield similar values of these indices. Better results are obtained when a surface of local fractal dimension or spatial autocorrelation is combined as an additional layer in a supervised maximum-likelihood multispectral classification. The addition of fractal dimension measures is particularly effective at resolving land cover classes within urbanized areas, as compared to per-pixel spectral classification techniques.
A Critical Review of Mode of Action (MOA) Assignment ...
There are various structure-based classification schemes to categorize chemicals based on mode of action (MOA) which have been applied for both eco and human health toxicology. With increasing calls to assess thousands of chemicals, some of which have little available information other than structure, clear understanding how each of these MOA schemes was devised, what information they are based on, and the limitations of each approach is critical. Several groups are developing low-tier methods to more easily classify or assess chemicals, using approaches such as the ecological threshold of concern (eco-TTC) and chemical-activity. Evaluation of these approaches and determination of their domain of applicability is partly dependent on the MOA classification that is used. The most commonly used MOA classification schemes for ecotoxicology include Verhaar and Russom (included in ASTER), both of which are used to predict acute aquatic toxicity MOA. Verhaar is a QSAR-based system that classifies chemicals into one of 4 classes, with a 5th class specified for those chemicals that are not classified in the other 4. ASTER/Russom includes 8 classifications: narcotics (3 groups), oxidative phosphorylation uncouplers, respiratory inhibitors, electrophiles/proelectrophiles, AChE inhibitors, or CNS seizure agents. Other methodologies include TEST (Toxicity Estimation Software Tool), a computational chemistry-based application that allows prediction to one of 5 broad MOA
Modified Angle's Classification for Primary Dentition
Chandranee, Kaushik Narendra; Chandranee, Narendra Jayantilal; Nagpal, Devendra; Lamba, Gagandeep; Choudhari, Purva; Hotwani, Kavita
2017-01-01
Aim: This study aims to propose a modification of Angle's classification for primary dentition and to assess its applicability in children from Central India, Nagpur. Methods: Modification in Angle's classification has been proposed for application in primary dentition. Small roman numbers i/ii/iii are used for primary dentition notation to represent Angle's Class I/II/III molar relationships as in permanent dentition, respectively. To assess applicability of modified Angle's classification a cross-sectional preschool 2000 children population from central India; 3–6 years of age residing in Nagpur metropolitan city of Maharashtra state were selected randomly as per the inclusion and exclusion criteria. Results: Majority 93.35% children were found to have bilateral Class i followed by 2.5% bilateral Class ii and 0.2% bilateral half cusp Class iii molar relationships as per the modified Angle's classification for primary dentition. About 3.75% children had various combinations of Class ii relationships and 0.2% children were having Class iii subdivision relationship. Conclusions: Modification of Angle's classification for application in primary dentition has been proposed. A cross-sectional investigation using new classification revealed various 6.25% Class ii and 0.4% Class iii molar relationships cases in preschool children population in a metropolitan city of Nagpur. Application of the modified Angle's classification to other population groups is warranted to validate its routine application in clinical pediatric dentistry. PMID:29326514
NASA Astrophysics Data System (ADS)
Farda, N. M.
2017-12-01
Coastal wetlands provide ecosystem services essential to people and the environment. Changes in coastal wetlands, especially on land use, are important to monitor by utilizing multi-temporal imagery. The Google Earth Engine (GEE) provides many machine learning algorithms (10 algorithms) that are very useful for extracting land use from imagery. The research objective is to explore machine learning in Google Earth Engine and its accuracy for multi-temporal land use mapping of coastal wetland area. Landsat 3 MSS (1978), Landsat 5 TM (1991), Landsat 7 ETM+ (2001), and Landsat 8 OLI (2014) images located in Segara Anakan lagoon are selected to represent multi temporal images. The input for machine learning are visible and near infrared bands, PCA band, invers PCA bands, bare soil index, vegetation index, wetness index, elevation from ASTER GDEM, and GLCM (Harralick) texture, and also polygon samples in 140 locations. There are 10 machine learning algorithms applied to extract coastal wetlands land use from Landsat imagery. The algorithms are Fast Naive Bayes, CART (Classification and Regression Tree), Random Forests, GMO Max Entropy, Perceptron (Multi Class Perceptron), Winnow, Voting SVM, Margin SVM, Pegasos (Primal Estimated sub-GrAdient SOlver for Svm), IKPamir (Intersection Kernel Passive Aggressive Method for Information Retrieval, SVM). Machine learning in Google Earth Engine are very helpful in multi-temporal land use mapping, the highest accuracy for land use mapping of coastal wetland is CART with 96.98 % Overall Accuracy using K-Fold Cross Validation (K = 10). GEE is particularly useful for multi-temporal land use mapping with ready used image and classification algorithms, and also very challenging for other applications.
Lainscsek, Claudia; Weyhenmeyer, Jonathan; Hernandez, Manuel E; Poizner, Howard; Sejnowski, Terrence J
2013-01-01
Time series analysis with delay differential equations (DDEs) reveals non-linear properties of the underlying dynamical system and can serve as a non-linear time-domain classification tool. Here global DDE models were used to analyze short segments of simulated time series from a known dynamical system, the Rössler system, in high noise regimes. In a companion paper, we apply the DDE model developed here to classify short segments of encephalographic (EEG) data recorded from patients with Parkinson's disease and healthy subjects. Nine simulated subjects in each of two distinct classes were generated by varying the bifurcation parameter b and keeping the other two parameters (a and c) of the Rössler system fixed. All choices of b were in the chaotic parameter range. We diluted the simulated data using white noise ranging from 10 to -30 dB signal-to-noise ratios (SNR). Structure selection was supervised by selecting the number of terms, delays, and order of non-linearity of the model DDE model that best linearly separated the two classes of data. The distances d from the linear dividing hyperplane was then used to assess the classification performance by computing the area A' under the ROC curve. The selected model was tested on untrained data using repeated random sub-sampling validation. DDEs were able to accurately distinguish the two dynamical conditions, and moreover, to quantify the changes in the dynamics. There was a significant correlation between the dynamical bifurcation parameter b of the simulated data and the classification parameter d from our analysis. This correlation still held for new simulated subjects with new dynamical parameters selected from each of the two dynamical regimes. Furthermore, the correlation was robust to added noise, being significant even when the noise was greater than the signal. We conclude that DDE models may be used as a generalizable and reliable classification tool for even small segments of noisy data.
Non-Linear Dynamical Classification of Short Time Series of the Rössler System in High Noise Regimes
Lainscsek, Claudia; Weyhenmeyer, Jonathan; Hernandez, Manuel E.; Poizner, Howard; Sejnowski, Terrence J.
2013-01-01
Time series analysis with delay differential equations (DDEs) reveals non-linear properties of the underlying dynamical system and can serve as a non-linear time-domain classification tool. Here global DDE models were used to analyze short segments of simulated time series from a known dynamical system, the Rössler system, in high noise regimes. In a companion paper, we apply the DDE model developed here to classify short segments of encephalographic (EEG) data recorded from patients with Parkinson’s disease and healthy subjects. Nine simulated subjects in each of two distinct classes were generated by varying the bifurcation parameter b and keeping the other two parameters (a and c) of the Rössler system fixed. All choices of b were in the chaotic parameter range. We diluted the simulated data using white noise ranging from 10 to −30 dB signal-to-noise ratios (SNR). Structure selection was supervised by selecting the number of terms, delays, and order of non-linearity of the model DDE model that best linearly separated the two classes of data. The distances d from the linear dividing hyperplane was then used to assess the classification performance by computing the area A′ under the ROC curve. The selected model was tested on untrained data using repeated random sub-sampling validation. DDEs were able to accurately distinguish the two dynamical conditions, and moreover, to quantify the changes in the dynamics. There was a significant correlation between the dynamical bifurcation parameter b of the simulated data and the classification parameter d from our analysis. This correlation still held for new simulated subjects with new dynamical parameters selected from each of the two dynamical regimes. Furthermore, the correlation was robust to added noise, being significant even when the noise was greater than the signal. We conclude that DDE models may be used as a generalizable and reliable classification tool for even small segments of noisy data. PMID:24379798
Ambure, Pravin; Bhat, Jyotsna; Puzyn, Tomasz; Roy, Kunal
2018-04-23
Alzheimer's disease (AD) is a multi-factorial disease, which can be simply outlined as an irreversible and progressive neurodegenerative disorder with an unclear root cause. It is a major cause of dementia in old aged people. In the present study, utilizing the structural and biological activity information of ligands for five important and mostly studied vital targets (i.e. cyclin-dependant kinase 5, β-secretase, monoamine oxidase B, glycogen synthase kinase 3β, acetylcholinesterase) that are believed to be effective against AD, we have developed five classification models using linear discriminant analysis (LDA) technique. Considering the importance of data curation, we have given more attention towards the chemical and biological data curation, which is a difficult task especially in case of big data-sets. Thus, to ease the curation process we have designed Konstanz Information Miner (KNIME) workflows, which are made available at http://teqip.jdvu.ac.in/QSAR_Tools/ . The developed models were appropriately validated based on the predictions for experiment derived data from test sets, as well as true external set compounds including known multi-target compounds. The domain of applicability for each classification model was checked based on a confidence estimation approach. Further, these validated models were employed for screening of natural compounds collected from the InterBioScreen natural database ( https://www.ibscreen.com/natural-compounds ). Further, the natural compounds that were categorized as 'actives' in at least two classification models out of five developed models were considered as multi-target leads, and these compounds were further screened using the drug-like filter, molecular docking technique and then thoroughly analyzed using molecular dynamics studies. Finally, the most potential multi-target natural compounds against AD are suggested.
Dimitriadis, S I; Liparas, Dimitris; Tsolaki, Magda N
2018-05-15
In the era of computer-assisted diagnostic tools for various brain diseases, Alzheimer's disease (AD) covers a large percentage of neuroimaging research, with the main scope being its use in daily practice. However, there has been no study attempting to simultaneously discriminate among Healthy Controls (HC), early mild cognitive impairment (MCI), late MCI (cMCI) and stable AD, using features derived from a single modality, namely MRI. Based on preprocessed MRI images from the organizers of a neuroimaging challenge, 3 we attempted to quantify the prediction accuracy of multiple morphological MRI features to simultaneously discriminate among HC, MCI, cMCI and AD. We explored the efficacy of a novel scheme that includes multiple feature selections via Random Forest from subsets of the whole set of features (e.g. whole set, left/right hemisphere etc.), Random Forest classification using a fusion approach and ensemble classification via majority voting. From the ADNI database, 60 HC, 60 MCI, 60 cMCI and 60 CE were used as a training set with known labels. An extra dataset of 160 subjects (HC: 40, MCI: 40, cMCI: 40 and AD: 40) was used as an external blind validation dataset to evaluate the proposed machine learning scheme. In the second blind dataset, we succeeded in a four-class classification of 61.9% by combining MRI-based features with a Random Forest-based Ensemble Strategy. We achieved the best classification accuracy of all teams that participated in this neuroimaging competition. The results demonstrate the effectiveness of the proposed scheme to simultaneously discriminate among four groups using morphological MRI features for the very first time in the literature. Hence, the proposed machine learning scheme can be used to define single and multi-modal biomarkers for AD. Copyright © 2017 Elsevier B.V. All rights reserved.
Integration of Advanced Statistical Analysis Tools and Geophysical Modeling
2010-12-01
Carin Duke University Douglas Oldenburg University of British Columbia Stephen Billings, Leonard Pasion Laurens Beran Sky Research...means and covariances estimated for each class [5]. For this study, dipole polarizabilities were fit with a Pasion -Oldenburg parameterization of 8 −1...model for unexploded ordnance classification with EMI data,” IEEE Geosci. Remote Sensing Letters, vol. 4, pp. 629–633, 2007. [4] L. R. Pasion
Domingo-Salvany, Antònia; Bacigalupe, Amaia; Carrasco, José Miguel; Espelt, Albert; Ferrando, Josep; Borrell, Carme
2013-01-01
In Spain, the new National Classification of Occupations (Clasificación Nacional de Ocupaciones [CNO-2011]) is substantially different to the 1994 edition, and requires adaptation of occupational social classes for use in studies of health inequalities. This article presents two proposals to measure social class: the new classification of occupational social class (CSO-SEE12), based on the CNO-2011 and a neo-Weberian perspective, and a social class classification based on a neo-Marxist approach. The CSO-SEE12 is the result of a detailed review of the CNO-2011 codes. In contrast, the neo-Marxist classification is derived from variables related to capital and organizational and skill assets. The proposed CSO-SEE12 consists of seven classes that can be grouped into a smaller number of categories according to study needs. The neo-Marxist classification consists of 12 categories in which home owners are divided into three categories based on capital goods and employed persons are grouped into nine categories composed of organizational and skill assets. These proposals are complemented by a proposed classification of educational level that integrates the various curricula in Spain and provides correspondences with the International Standard Classification of Education. Copyright © 2012 SESPAS. Published by Elsevier Espana. All rights reserved.
Spatz, Erica S; Curry, Leslie A; Masoudi, Frederick A; Zhou, Shengfan; Strait, Kelly M; Gross, Cary P; Curtis, Jeptha P; Lansky, Alexandra J; Soares Barreto-Filho, Jose Augusto; Lampropulos, Julianna F; Bueno, Hector; Chaudhry, Sarwat I; D'Onofrio, Gail; Safdar, Basmah; Dreyer, Rachel P; Murugiah, Karthik; Spertus, John A; Krumholz, Harlan M
2015-11-03
Current classification schemes for acute myocardial infarction (AMI) may not accommodate the breadth of clinical phenotypes in young women. We developed a novel taxonomy among young adults (≤55 years) with AMI enrolled in the Variation in Recovery: Role of Gender on Outcomes of Young AMI Patients (VIRGO) study. We first classified a subset of patients (n=600) according to the Third Universal Definition of MI using a structured abstraction tool. There was heterogeneity within type 2 AMI, and 54 patients (9%; including 51 of 412 women) were unclassified. Using an inductive approach, we iteratively grouped patients with shared clinical characteristics, with the aims of developing a more inclusive taxonomy that could distinguish unique clinical phenotypes. The final VIRGO taxonomy classified 2802 study participants as follows: class 1, plaque-mediated culprit lesion (82.5% of women; 94.9% of men); class 2, obstructive coronary artery disease with supply-demand mismatch (2a: 1.4% women; 0.9% men) and without supply-demand mismatch (2b: 2.4% women; 1.1% men); class 3, nonobstructive coronary artery disease with supply-demand mismatch (3a: 4.3% women; 0.8% men) and without supply-demand mismatch (3b: 7.0% women; 1.9% men); class 4, other identifiable mechanism (spontaneous dissection, vasospasm, embolism; 1.5% women, 0.2% men); and class 5, undetermined classification (0.8% women, 0.2% men). Approximately 1 in 8 young women with AMI is unclassified by the Universal Definition of MI. We propose a more inclusive taxonomy that could serve as a framework for understanding biological disease mechanisms, therapeutic efficacy, and prognosis in this population. © 2015 American Heart Association, Inc.
The VIRGO Classification System: A Taxonomy for Young Women with Acute Myocardial Infarction
Spatz, Erica S.; Curry, Leslie A.; Masoudi, Frederick A.; Zhou, Shengfan; Strait, Kelly M.; Gross, Cary P.; Curtis, Jeptha P.; Lansky, Alexandra J.; Barreto-Filho, Jose Augusto Soares; Lampropulos, Julianna F.; Bueno, Hector; Chaudhry, Sarwat I.; D'Onofrio, Gail; Safdar, Basmah; Dreyer, Rachel P.; Murugiah, Karthik; Spertus, John A.; Krumholz, Harlan M.
2015-01-01
Background Current classification schemes for acute myocardial infarction (AMI) may not accommodate the breadth of clinical phenotypes in young women. Methods and Results We developed a novel taxonomy among young adults (<55 years) with AMI enrolled in the Variation in Recovery: Role of Gender on Outcomes of Young AMI Patients (VIRGO) study. We first classified a subset of patients (n=600) according to the Third Universal Definition of MI using a structured abstraction tool. There was heterogeneity within Type 2 AMI, and 54 patients (9%; including 51 of 412 women) were unclassified. Using an inductive approach, we iteratively grouped patients with shared clinical characteristics, with the aims of developing a more inclusive taxonomy that could distinguish unique clinical phenotypes. The final VIRGO taxonomy classified 2,802 study participants as: Class 1, plaque-mediated culprit lesion (82.5% of women; 94.9% of men); Class 2, obstructive coronary artery disease with supply-demand mismatch (2a: 1.4% women; 0.9% men;) and without supply-demand mismatch (2b: 2.4% women; 1.1% men); Class 3, non-obstructive coronary artery disease with supply-demand mismatch (3a: 4.3% women; 0.8% men) and without supply-demand mismatch (3b: 7.0% women; 1.9% men); Class 4, other identifiable mechanism: spontaneous dissection; vasospasm; embolism (1.5% women; 0.2% men); and Class 5, undetermined classification (0.8% women; 0.2% men). Conclusions Approximately 1 in 8 young women with AMI are unclassified by the Universal Definition of MI. We propose a more inclusive taxonomy that could serve as a framework for understanding biological disease mechanisms, therapeutic efficacy and prognosis in this population. PMID:26350057
NASA Astrophysics Data System (ADS)
Shin, K. H.; Kim, K. H.; Ki, S. J.; Lee, H. G.
2017-12-01
The vulnerability assessment tool at a Tier 1 level, although not often used for regulatory purposes, helps establish pollution prevention and management strategies in the areas of potential environmental concern such as soil and ground water. In this study, the Neural Network Pattern Recognition Tool embedded in MATLAB was used to allow the initial screening of soil and groundwater pollution based on data compiled across about 1000 previously contaminated sites in Korea. The input variables included a series of parameters which were tightly related to downward movement of water and contaminants through soil and ground water, whereas multiple classes were assigned to the sum of concentrations of major pollutants detected. Results showed that in accordance with diverse pollution indices for soil and ground water, pollution levels in both media were strongly modulated by site-specific characteristics such as intrinsic soil and other geologic properties, in addition to pollution sources and rainfall. However, classification accuracy was very sensitive to the number of classes defined as well as the types of the variables incorporated, requiring careful selection of input variables and output categories. Therefore, we believe that the proposed methodology is used not only to modify existing pollution indices so that they are more suitable for addressing local vulnerability, but also to develop a unique assessment tool to support decision making based on locally or nationally available data. This study was funded by a grant from the GAIA project(2016000560002), Korea Environmental Industry & Technology Institute, Republic of Korea.
Very Deep Convolutional Neural Networks for Morphologic Classification of Erythrocytes.
Durant, Thomas J S; Olson, Eben M; Schulz, Wade L; Torres, Richard
2017-12-01
Morphologic profiling of the erythrocyte population is a widely used and clinically valuable diagnostic modality, but one that relies on a slow manual process associated with significant labor cost and limited reproducibility. Automated profiling of erythrocytes from digital images by capable machine learning approaches would augment the throughput and value of morphologic analysis. To this end, we sought to evaluate the performance of leading implementation strategies for convolutional neural networks (CNNs) when applied to classification of erythrocytes based on morphology. Erythrocytes were manually classified into 1 of 10 classes using a custom-developed Web application. Using recent literature to guide architectural considerations for neural network design, we implemented a "very deep" CNN, consisting of >150 layers, with dense shortcut connections. The final database comprised 3737 labeled cells. Ensemble model predictions on unseen data demonstrated a harmonic mean of recall and precision metrics of 92.70% and 89.39%, respectively. Of the 748 cells in the test set, 23 misclassification errors were made, with a correct classification frequency of 90.60%, represented as a harmonic mean across the 10 morphologic classes. These findings indicate that erythrocyte morphology profiles could be measured with a high degree of accuracy with "very deep" CNNs. Further, these data support future efforts to expand classes and optimize practical performance in a clinical environment as a prelude to full implementation as a clinical tool. © 2017 American Association for Clinical Chemistry.
deGraffenried, Jeff B; Shepherd, Keith D
2009-12-15
Human induced soil erosion has severe economic and environmental impacts throughout the world. It is more severe in the tropics than elsewhere and results in diminished food production and security. Kenya has limited arable land and 30 percent of the country experiences severe to very severe human induced soil degradation. The purpose of this research was to test visible near infrared diffuse reflectance spectroscopy (VNIR) as a tool for rapid assessment and benchmarking of soil condition and erosion severity class. The study was conducted in the Saiwa River watershed in the northern Rift Valley Province of western Kenya, a tropical highland area. Soil 137 Cs concentration was measured to validate spectrally derived erosion classes and establish the background levels for difference land use types. Results indicate VNIR could be used to accurately evaluate a large and diverse soil data set and predict soil erosion characteristics. Soil condition was spectrally assessed and modeled. Analysis of mean raw spectra indicated significant reflectance differences between soil erosion classes. The largest differences occurred between 1,350 and 1,950 nm with the largest separation occurring at 1,920 nm. Classification and Regression Tree (CART) analysis indicated that the spectral model had practical predictive success (72%) with Receiver Operating Characteristic (ROC) of 0.74. The change in 137 Cs concentrations supported the premise that VNIR is an effective tool for rapid screening of soil erosion condition.
Classification of stellar spectra with SVM based on within-class scatter and between-class scatter
NASA Astrophysics Data System (ADS)
Liu, Zhong-bao; Zhou, Fang-xiao; Qin, Zhen-tao; Luo, Xue-gang; Zhang, Jing
2018-07-01
Support Vector Machine (SVM) is a popular data mining technique, and it has been widely applied in astronomical tasks, especially in stellar spectra classification. Since SVM doesn't take the data distribution into consideration, and therefore, its classification efficiencies can't be greatly improved. Meanwhile, SVM ignores the internal information of the training dataset, such as the within-class structure and between-class structure. In view of this, we propose a new classification algorithm-SVM based on Within-Class Scatter and Between-Class Scatter (WBS-SVM) in this paper. WBS-SVM tries to find an optimal hyperplane to separate two classes. The difference is that it incorporates minimum within-class scatter and maximum between-class scatter in Linear Discriminant Analysis (LDA) into SVM. These two scatters represent the distributions of the training dataset, and the optimization of WBS-SVM ensures the samples in the same class are as close as possible and the samples in different classes are as far as possible. Experiments on the K-, F-, G-type stellar spectra from Sloan Digital Sky Survey (SDSS), Data Release 8 show that our proposed WBS-SVM can greatly improve the classification accuracies.
Multi-class ERP-based BCI data analysis using a discriminant space self-organizing map.
Onishi, Akinari; Natsume, Kiyohisa
2014-01-01
Emotional or non-emotional image stimulus is recently applied to event-related potential (ERP) based brain computer interfaces (BCI). Though the classification performance is over 80% in a single trial, a discrimination between those ERPs has not been considered. In this research we tried to clarify the discriminability of four-class ERP-based BCI target data elicited by desk, seal, spider images and letter intensifications. A conventional self organizing map (SOM) and newly proposed discriminant space SOM (ds-SOM) were applied, then the discriminabilites were visualized. We also classify all pairs of those ERPs by stepwise linear discriminant analysis (SWLDA) and verify the visualization of discriminabilities. As a result, the ds-SOM showed understandable visualization of the data with a shorter computational time than the traditional SOM. We also confirmed the clear boundary between the letter cluster and the other clusters. The result was coherent with the classification performances by SWLDA. The method might be helpful not only for developing a new BCI paradigm, but also for the big data analysis.
Toward an Attention-Based Diagnostic Tool for Patients With Locked-in Syndrome.
Lesenfants, Damien; Habbal, Dina; Chatelle, Camille; Soddu, Andrea; Laureys, Steven; Noirhomme, Quentin
2018-03-01
Electroencephalography (EEG) has been proposed as a supplemental tool for reducing clinical misdiagnosis in severely brain-injured populations helping to distinguish conscious from unconscious patients. We studied the use of spectral entropy as a measure of focal attention in order to develop a motor-independent, portable, and objective diagnostic tool for patients with locked-in syndrome (LIS), answering the issues of accuracy and training requirement. Data from 20 healthy volunteers, 6 LIS patients, and 10 patients with a vegetative state/unresponsive wakefulness syndrome (VS/UWS) were included. Spectral entropy was computed during a gaze-independent 2-class (attention vs rest) paradigm, and compared with EEG rhythms (delta, theta, alpha, and beta) classification. Spectral entropy classification during the attention-rest paradigm showed 93% and 91% accuracy in healthy volunteers and LIS patients respectively. VS/UWS patients were at chance level. EEG rhythms classification reached a lower accuracy than spectral entropy. Resting-state EEG spectral entropy could not distinguish individual VS/UWS patients from LIS patients. The present study provides evidence that an EEG-based measure of attention could detect command-following in patients with severe motor disabilities. The entropy system could detect a response to command in all healthy subjects and LIS patients, while none of the VS/UWS patients showed a response to command using this system.
Beyond mind-reading: multi-voxel pattern analysis of fMRI data.
Norman, Kenneth A; Polyn, Sean M; Detre, Greg J; Haxby, James V
2006-09-01
A key challenge for cognitive neuroscience is determining how mental representations map onto patterns of neural activity. Recently, researchers have started to address this question by applying sophisticated pattern-classification algorithms to distributed (multi-voxel) patterns of functional MRI data, with the goal of decoding the information that is represented in the subject's brain at a particular point in time. This multi-voxel pattern analysis (MVPA) approach has led to several impressive feats of mind reading. More importantly, MVPA methods constitute a useful new tool for advancing our understanding of neural information processing. We review how researchers are using MVPA methods to characterize neural coding and information processing in domains ranging from visual perception to memory search.
Automatic classification of spectra from the Infrared Astronomical Satellite (IRAS)
NASA Technical Reports Server (NTRS)
Cheeseman, Peter; Stutz, John; Self, Matthew; Taylor, William; Goebel, John; Volk, Kevin; Walker, Helen
1989-01-01
A new classification of Infrared spectra collected by the Infrared Astronomical Satellite (IRAS) is presented. The spectral classes were discovered automatically by a program called Auto Class 2. This program is a method for discovering (inducing) classes from a data base, utilizing a Bayesian probability approach. These classes can be used to give insight into the patterns that occur in the particular domain, in this case, infrared astronomical spectroscopy. The classified spectra are the entire Low Resolution Spectra (LRS) Atlas of 5,425 sources. There are seventy-seven classes in this classification and these in turn were meta-classified to produce nine meta-classes. The classification is presented as spectral plots, IRAS color-color plots, galactic distribution plots and class commentaries. Cross-reference tables, listing the sources by IRAS name and by Auto Class class, are also given. These classes show some of the well known classes, such as the black-body class, and silicate emission classes, but many other classes were unsuspected, while others show important subtle differences within the well known classes.
Ghorai, Santanu; Mukherjee, Anirban; Dutta, Pranab K
2010-06-01
In this brief we have proposed the multiclass data classification by computationally inexpensive discriminant analysis through vector-valued regularized kernel function approximation (VVRKFA). VVRKFA being an extension of fast regularized kernel function approximation (FRKFA), provides the vector-valued response at single step. The VVRKFA finds a linear operator and a bias vector by using a reduced kernel that maps a pattern from feature space into the low dimensional label space. The classification of patterns is carried out in this low dimensional label subspace. A test pattern is classified depending on its proximity to class centroids. The effectiveness of the proposed method is experimentally verified and compared with multiclass support vector machine (SVM) on several benchmark data sets as well as on gene microarray data for multi-category cancer classification. The results indicate the significant improvement in both training and testing time compared to that of multiclass SVM with comparable testing accuracy principally in large data sets. Experiments in this brief also serve as comparison of performance of VVRKFA with stratified random sampling and sub-sampling.
Feature Selection Has a Large Impact on One-Class Classification Accuracy for MicroRNAs in Plants.
Yousef, Malik; Saçar Demirci, Müşerref Duygu; Khalifa, Waleed; Allmer, Jens
2016-01-01
MicroRNAs (miRNAs) are short RNA sequences involved in posttranscriptional gene regulation. Their experimental analysis is complicated and, therefore, needs to be supplemented with computational miRNA detection. Currently computational miRNA detection is mainly performed using machine learning and in particular two-class classification. For machine learning, the miRNAs need to be parametrized and more than 700 features have been described. Positive training examples for machine learning are readily available, but negative data is hard to come by. Therefore, it seems prerogative to use one-class classification instead of two-class classification. Previously, we were able to almost reach two-class classification accuracy using one-class classifiers. In this work, we employ feature selection procedures in conjunction with one-class classification and show that there is up to 36% difference in accuracy among these feature selection methods. The best feature set allowed the training of a one-class classifier which achieved an average accuracy of ~95.6% thereby outperforming previous two-class-based plant miRNA detection approaches by about 0.5%. We believe that this can be improved upon in the future by rigorous filtering of the positive training examples and by improving current feature clustering algorithms to better target pre-miRNA feature selection.
Data Mining Algorithms for Classification of Complex Biomedical Data
ERIC Educational Resources Information Center
Lan, Liang
2012-01-01
In my dissertation, I will present my research which contributes to solve the following three open problems from biomedical informatics: (1) Multi-task approaches for microarray classification; (2) Multi-label classification of gene and protein prediction from multi-source biological data; (3) Spatial scan for movement data. In microarray…
Thematic accuracy of the National Land Cover Database (NLCD) 2001 land cover for Alaska
Selkowitz, D.J.; Stehman, S.V.
2011-01-01
The National Land Cover Database (NLCD) 2001 Alaska land cover classification is the first 30-m resolution land cover product available covering the entire state of Alaska. The accuracy assessment of the NLCD 2001 Alaska land cover classification employed a geographically stratified three-stage sampling design to select the reference sample of pixels. Reference land cover class labels were determined via fixed wing aircraft, as the high resolution imagery used for determining the reference land cover classification in the conterminous U.S. was not available for most of Alaska. Overall thematic accuracy for the Alaska NLCD was 76.2% (s.e. 2.8%) at Level II (12 classes evaluated) and 83.9% (s.e. 2.1%) at Level I (6 classes evaluated) when agreement was defined as a match between the map class and either the primary or alternate reference class label. When agreement was defined as a match between the map class and primary reference label only, overall accuracy was 59.4% at Level II and 69.3% at Level I. The majority of classification errors occurred at Level I of the classification hierarchy (i.e., misclassifications were generally to a different Level I class, not to a Level II class within the same Level I class). Classification accuracy was higher for more abundant land cover classes and for pixels located in the interior of homogeneous land cover patches. ?? 2011.
Geometric mean for subspace selection.
Tao, Dacheng; Li, Xuelong; Wu, Xindong; Maybank, Stephen J
2009-02-01
Subspace selection approaches are powerful tools in pattern classification and data visualization. One of the most important subspace approaches is the linear dimensionality reduction step in the Fisher's linear discriminant analysis (FLDA), which has been successfully employed in many fields such as biometrics, bioinformatics, and multimedia information management. However, the linear dimensionality reduction step in FLDA has a critical drawback: for a classification task with c classes, if the dimension of the projected subspace is strictly lower than c - 1, the projection to a subspace tends to merge those classes, which are close together in the original feature space. If separate classes are sampled from Gaussian distributions, all with identical covariance matrices, then the linear dimensionality reduction step in FLDA maximizes the mean value of the Kullback-Leibler (KL) divergences between different classes. Based on this viewpoint, the geometric mean for subspace selection is studied in this paper. Three criteria are analyzed: 1) maximization of the geometric mean of the KL divergences, 2) maximization of the geometric mean of the normalized KL divergences, and 3) the combination of 1 and 2. Preliminary experimental results based on synthetic data, UCI Machine Learning Repository, and handwriting digits show that the third criterion is a potential discriminative subspace selection method, which significantly reduces the class separation problem in comparing with the linear dimensionality reduction step in FLDA and its several representative extensions.
Ethnic Identity and Perceived Stress Among Ethnically Diverse Immigrants.
Espinosa, Adriana; Tikhonov, Aleksandr; Ellman, Lauren M; Kern, David M; Lui, Florence; Anglin, Deidre
2018-02-01
Recent empirical research suggests that having a strong ethnic identity may be associated with reduced perceived stress. However, the relationship between perceived stress and ethnic identity has not been tested in a large and ethnically diverse sample of immigrants. This study utilized a multi-group latent class analysis of ethnic identity on a sample of first and second generation immigrants (N = 1603), to determine ethnic identity classifications, and their relation to perceived stress. A 4-class ethnic identity structure best fit the data for this immigrant sample, and the proportion within each class varied by ethnicity, but not immigrant generation. High ethnic identity was found to be protective against perceived stress, and this finding was invariant across ethnicity. This study extends the findings of previous research on the protective effect of ethnic identity against perceived stress to immigrant populations of diverse ethnic origins.
Jiménez-Carvelo, Ana M; González-Casado, Antonio; Pérez-Castaño, Estefanía; Cuadros-Rodríguez, Luis
2017-03-01
A new analytical method for the differentiation of olive oil from other vegetable oils using reversed-phase LC and applying chemometric techniques was developed. A 3 cm short column was used to obtain the chromatographic fingerprint of the methyl-transesterified fraction of each vegetable oil. The chromatographic analysis took only 4 min. The multivariate classification methods used were k-nearest neighbors, partial least-squares (PLS) discriminant analysis, one-class PLS, support vector machine classification, and soft independent modeling of class analogies. The discrimination of olive oil from other vegetable edible oils was evaluated by several classification quality metrics. Several strategies for the classification of the olive oil were used: one input-class, two input-class, and pseudo two input-class.
Automatic Classification of Time-variable X-Ray Sources
NASA Astrophysics Data System (ADS)
Lo, Kitty K.; Farrell, Sean; Murphy, Tara; Gaensler, B. M.
2014-05-01
To maximize the discovery potential of future synoptic surveys, especially in the field of transient science, it will be necessary to use automatic classification to identify some of the astronomical sources. The data mining technique of supervised classification is suitable for this problem. Here, we present a supervised learning method to automatically classify variable X-ray sources in the Second XMM-Newton Serendipitous Source Catalog (2XMMi-DR2). Random Forest is our classifier of choice since it is one of the most accurate learning algorithms available. Our training set consists of 873 variable sources and their features are derived from time series, spectra, and other multi-wavelength contextual information. The 10 fold cross validation accuracy of the training data is ~97% on a 7 class data set. We applied the trained classification model to 411 unknown variable 2XMM sources to produce a probabilistically classified catalog. Using the classification margin and the Random Forest derived outlier measure, we identified 12 anomalous sources, of which 2XMM J180658.7-500250 appears to be the most unusual source in the sample. Its X-ray spectra is suggestive of a ultraluminous X-ray source but its variability makes it highly unusual. Machine-learned classification and anomaly detection will facilitate scientific discoveries in the era of all-sky surveys.
Quality grading of Atlantic salmon (Salmo salar) by computer vision.
Misimi, E; Erikson, U; Skavhaug, A
2008-06-01
In this study, we present a promising method of computer vision-based quality grading of whole Atlantic salmon (Salmo salar). Using computer vision, it was possible to differentiate among different quality grades of Atlantic salmon based on the external geometrical information contained in the fish images. Initially, before the image acquisition, the fish were subjectively graded and labeled into grading classes by a qualified human inspector in the processing plant. Prior to classification, the salmon images were segmented into binary images, and then feature extraction was performed on the geometrical parameters of the fish from the grading classes. The classification algorithm was a threshold-based classifier, which was designed using linear discriminant analysis. The performance of the classifier was tested by using the leave-one-out cross-validation method, and the classification results showed a good agreement between the classification done by human inspectors and by the computer vision. The computer vision-based method classified correctly 90% of the salmon from the data set as compared with the classification by human inspector. Overall, it was shown that computer vision can be used as a powerful tool to grade Atlantic salmon into quality grades in a fast and nondestructive manner by a relatively simple classifier algorithm. The low cost of implementation of today's advanced computer vision solutions makes this method feasible for industrial purposes in fish plants as it can replace manual labor, on which grading tasks still rely.
Changes in classification of genetic variants in BRCA1 and BRCA2.
Kast, Karin; Wimberger, Pauline; Arnold, Norbert
2018-02-01
Classification of variants of unknown significance (VUS) in the breast cancer genes BRCA1 and BRCA2 changes with accumulating evidence for clinical relevance. In most cases down-staging towards neutral variants without clinical significance is possible. We searched the database of the German Consortium for Hereditary Breast and Ovarian Cancer (GC-HBOC) for changes in classification of genetic variants as an update to our earlier publication on genetic variants in the Centre of Dresden. Changes between 2015 and 2017 were recorded. In the group of variants of unclassified significance (VUS, Class 3, uncertain), only changes of classification towards neutral genetic variants were noted. In BRCA1, 25% of the Class 3 variants (n = 2/8) changed to Class 2 (likely benign) and Class 1 (benign). In BRCA2, in 50% of the Class 3 variants (n = 16/32), a change to Class 2 (n = 10/16) or Class 1 (n = 6/16) was observed. No change in classification was noted in Class 4 (likely pathogenic) and Class 5 (pathogenic) genetic variants in both genes. No up-staging from Class 1, Class 2 or Class 3 to more clinical significance was observed. All variants with a change in classification in our cohort were down-staged towards no clinical significance by a panel of experts of the German Consortium for Hereditary Breast and Ovarian Cancer (GC-HBOC). Prevention in families with Class 3 variants should be based on pedigree based risks and should not be guided by the presence of a VUS.
Systems Engineering Lessons Learned for Class D Missions
NASA Technical Reports Server (NTRS)
Rojdev, Kristina; Piatek, Irene; Moore, Josh; Calvert, Derek
2015-01-01
One of NASA's goals within human exploration is to determine how to get humans to Mars safely and to live and work on the Martian surface. To accomplish this goal, several smaller missions act as stepping-stones to the larger end goal. NASA uses these smaller missions to develop new technologies and learn about how to survive outside of Low Earth Orbit for long periods. Additionally, keeping a cadence of these missions allows the team to maintain proficiency in the complex art of bringing spacecraft to fruition. Many of these smaller missions are robotic in nature and have smaller timescales, whereas there are others that involve crew and have longer mission timelines. Given the timelines associated with these various missions, different levels of risk and rigor need to be implemented to be more in line with what is appropriate for the mission. Thus, NASA has four different classifications that range from Class A to Class D based on the mission details. One of these projects is the Resource Prospector (RP) Mission, which is a multi-center and multi-institution collaborative project to search for volatiles in the polar regions of the Moon. The RP mission is classified as a Class D mission and as such, has the opportunity to more tightly manage, and therefore accept, greater levels of risk. The requirements for Class D missions were at the forefront of the design and thus presented unique challenges in vehicle development and systems engineering processes. This paper will discuss the systems engineering process at NASA and how that process is tailored for Class D missions, specifically the RP mission.
Ertosun, Mehmet Günhan; Rubin, Daniel L
2015-01-01
Brain glioma is the most common primary malignant brain tumors in adults with different pathologic subtypes: Lower Grade Glioma (LGG) Grade II, Lower Grade Glioma (LGG) Grade III, and Glioblastoma Multiforme (GBM) Grade IV. The survival and treatment options are highly dependent of this glioma grade. We propose a deep learning-based, modular classification pipeline for automated grading of gliomas using digital pathology images. Whole tissue digitized images of pathology slides obtained from The Cancer Genome Atlas (TCGA) were used to train our deep learning modules. Our modular pipeline provides diagnostic quality statistics, such as precision, sensitivity and specificity, of the individual deep learning modules, and (1) facilitates training given the limited data in this domain, (2) enables exploration of different deep learning structures for each module, (3) leads to developing less complex modules that are simpler to analyze, and (4) provides flexibility, permitting use of single modules within the framework or use of other modeling or machine learning applications, such as probabilistic graphical models or support vector machines. Our modular approach helps us meet the requirements of minimum accuracy levels that are demanded by the context of different decision points within a multi-class classification scheme. Convolutional Neural Networks are trained for each module for each sub-task with more than 90% classification accuracies on validation data set, and achieved classification accuracy of 96% for the task of GBM vs LGG classification, 71% for further identifying the grade of LGG into Grade II or Grade III on independent data set coming from new patients from the multi-institutional repository.
NASA Astrophysics Data System (ADS)
Xu, Saiping; Zhao, Qianjun; Yin, Kai; Cui, Bei; Zhang, Xiupeng
2016-10-01
Hollow village is a special phenomenon in the process of urbanization in China, which causes the waste of land resources. Therefore, it's imminent to carry out the hollow village recognition and renovation. However, there are few researches on the remote sensing identification of hollow village. In this context, in order to recognize the abandoned homesteads by remote sensing technique, the experiment was carried out as follows. Firstly, Gram-Schmidt transform method was utilized to complete the image fusion between multi-spectral images and panchromatic image of WorldView-2. Then the fusion images were made edge enhanced by high pass filtering. The multi-resolution segmentation and spectral difference segmentation were carried out to obtain the image objects. Secondly, spectral characteristic parameters were calculated, such as the normalized difference vegetation index (NDVI), the normalized difference water index (NDWI), the normalized difference Soil index (NDSI) etc. The shape feature parameters were extracted, such as Area, Length/Width Ratio and Rectangular Fit etc.. Thirdly, the SEaTH algorithm was used to determine the thresholds and optimize the feature space. Furthermore, the threshold classification method and the random forest classifier were combined, and the appropriate amount of samples were selected to train the classifier in order to determine the important feature parameters and the best classifier parameters involved in classification. Finally, the classification results was verified by computing the confusion matrix. The classification results were continuous and the phenomenon of salt and pepper using pixel classification was avoided effectively. In addition, the results showed that the extracted Abandoned Homesteads were in complete shapes, which could be distinguished from those confusing classes such as Homestead in Use and Roads.
Ertosun, Mehmet Günhan; Rubin, Daniel L.
2015-01-01
Brain glioma is the most common primary malignant brain tumors in adults with different pathologic subtypes: Lower Grade Glioma (LGG) Grade II, Lower Grade Glioma (LGG) Grade III, and Glioblastoma Multiforme (GBM) Grade IV. The survival and treatment options are highly dependent of this glioma grade. We propose a deep learning-based, modular classification pipeline for automated grading of gliomas using digital pathology images. Whole tissue digitized images of pathology slides obtained from The Cancer Genome Atlas (TCGA) were used to train our deep learning modules. Our modular pipeline provides diagnostic quality statistics, such as precision, sensitivity and specificity, of the individual deep learning modules, and (1) facilitates training given the limited data in this domain, (2) enables exploration of different deep learning structures for each module, (3) leads to developing less complex modules that are simpler to analyze, and (4) provides flexibility, permitting use of single modules within the framework or use of other modeling or machine learning applications, such as probabilistic graphical models or support vector machines. Our modular approach helps us meet the requirements of minimum accuracy levels that are demanded by the context of different decision points within a multi-class classification scheme. Convolutional Neural Networks are trained for each module for each sub-task with more than 90% classification accuracies on validation data set, and achieved classification accuracy of 96% for the task of GBM vs LGG classification, 71% for further identifying the grade of LGG into Grade II or Grade III on independent data set coming from new patients from the multi-institutional repository. PMID:26958289
Zhang, Xiong; Zhao, Yacong; Zhang, Yu; Zhong, Xuefei; Fan, Zhaowen
2018-01-01
The novel human-computer interface (HCI) using bioelectrical signals as input is a valuable tool to improve the lives of people with disabilities. In this paper, surface electromyography (sEMG) signals induced by four classes of wrist movements were acquired from four sites on the lower arm with our designed system. Forty-two features were extracted from the time, frequency and time-frequency domains. Optimal channels were determined from single-channel classification performance rank. The optimal-feature selection was according to a modified entropy criteria (EC) and Fisher discrimination (FD) criteria. The feature selection results were evaluated by four different classifiers, and compared with other conventional feature subsets. In online tests, the wearable system acquired real-time sEMG signals. The selected features and trained classifier model were used to control a telecar through four different paradigms in a designed environment with simple obstacles. Performance was evaluated based on travel time (TT) and recognition rate (RR). The results of hardware evaluation verified the feasibility of our acquisition systems, and ensured signal quality. Single-channel analysis results indicated that the channel located on the extensor carpi ulnaris (ECU) performed best with mean classification accuracy of 97.45% for all movement’s pairs. Channels placed on ECU and the extensor carpi radialis (ECR) were selected according to the accuracy rank. Experimental results showed that the proposed FD method was better than other feature selection methods and single-type features. The combination of FD and random forest (RF) performed best in offline analysis, with 96.77% multi-class RR. Online results illustrated that the state-machine paradigm with a 125 ms window had the highest maneuverability and was closest to real-life control. Subjects could accomplish online sessions by three sEMG-based paradigms, with average times of 46.02, 49.06 and 48.08 s, respectively. These experiments validate the feasibility of proposed real-time wearable HCI system and algorithms, providing a potential assistive device interface for persons with disabilities. PMID:29543737
NASA Astrophysics Data System (ADS)
García-Flores, Agustín.; Paz-Gallardo, Abel; Plaza, Antonio; Li, Jun
2016-10-01
This paper describes a new web platform dedicated to the classification of satellite images called Hypergim. The current implementation of this platform enables users to perform classification of satellite images from any part of the world thanks to the worldwide maps provided by Google Maps. To perform this classification, Hypergim uses unsupervised algorithms like Isodata and K-means. Here, we present an extension of the original platform in which we adapt Hypergim in order to use supervised algorithms to improve the classification results. This involves a significant modification of the user interface, providing the user with a way to obtain samples of classes present in the images to use in the training phase of the classification process. Another main goal of this development is to improve the runtime of the image classification process. To achieve this goal, we use a parallel implementation of the Random Forest classification algorithm. This implementation is a modification of the well-known CURFIL software package. The use of this type of algorithms to perform image classification is widespread today thanks to its precision and ease of training. The actual implementation of Random Forest was developed using CUDA platform, which enables us to exploit the potential of several models of NVIDIA graphics processing units using them to execute general purpose computing tasks as image classification algorithms. As well as CUDA, we use other parallel libraries as Intel Boost, taking advantage of the multithreading capabilities of modern CPUs. To ensure the best possible results, the platform is deployed in a cluster of commodity graphics processing units (GPUs), so that multiple users can use the tool in a concurrent way. The experimental results indicate that this new algorithm widely outperform the previous unsupervised algorithms implemented in Hypergim, both in runtime as well as precision of the actual classification of the images.
Ma, Xu; Cheng, Yongmei; Hao, Shuai
2016-12-10
Automatic classification of terrain surfaces from an aerial image is essential for an autonomous unmanned aerial vehicle (UAV) landing at an unprepared site by using vision. Diverse terrain surfaces may show similar spectral properties due to the illumination and noise that easily cause poor classification performance. To address this issue, a multi-stage classification algorithm based on low-rank recovery and multi-feature fusion sparse representation is proposed. First, color moments and Gabor texture feature are extracted from training data and stacked as column vectors of a dictionary. Then we perform low-rank matrix recovery for the dictionary by using augmented Lagrange multipliers and construct a multi-stage terrain classifier. Experimental results on an aerial map database that we prepared verify the classification accuracy and robustness of the proposed method.
Contribution of non-negative matrix factorization to the classification of remote sensing images
NASA Astrophysics Data System (ADS)
Karoui, M. S.; Deville, Y.; Hosseini, S.; Ouamri, A.; Ducrot, D.
2008-10-01
Remote sensing has become an unavoidable tool for better managing our environment, generally by realizing maps of land cover using classification techniques. The classification process requires some pre-processing, especially for data size reduction. The most usual technique is Principal Component Analysis. Another approach consists in regarding each pixel of the multispectral image as a mixture of pure elements contained in the observed area. Using Blind Source Separation (BSS) methods, one can hope to unmix each pixel and to perform the recognition of the classes constituting the observed scene. Our contribution consists in using Non-negative Matrix Factorization (NMF) combined with sparse coding as a solution to BSS, in order to generate new images (which are at least partly separated images) using HRV SPOT images from Oran area, Algeria). These images are then used as inputs of a supervised classifier integrating textural information. The results of classifications of these "separated" images show a clear improvement (correct pixel classification rate improved by more than 20%) compared to classification of initial (i.e. non separated) images. These results show the contribution of NMF as an attractive pre-processing for classification of multispectral remote sensing imagery.
Zhang, Wenyu; Zhang, Zhenjiang
2015-01-01
Decision fusion in sensor networks enables sensors to improve classification accuracy while reducing the energy consumption and bandwidth demand for data transmission. In this paper, we focus on the decentralized multi-class classification fusion problem in wireless sensor networks (WSNs) and a new simple but effective decision fusion rule based on belief function theory is proposed. Unlike existing belief function based decision fusion schemes, the proposed approach is compatible with any type of classifier because the basic belief assignments (BBAs) of each sensor are constructed on the basis of the classifier’s training output confusion matrix and real-time observations. We also derive explicit global BBA in the fusion center under Dempster’s combinational rule, making the decision making operation in the fusion center greatly simplified. Also, sending the whole BBA structure to the fusion center is avoided. Experimental results demonstrate that the proposed fusion rule has better performance in fusion accuracy compared with the naïve Bayes rule and weighted majority voting rule. PMID:26295399
NASA Astrophysics Data System (ADS)
Shyu, Mei-Ling; Sainani, Varsha
The increasing number of network security related incidents have made it necessary for the organizations to actively protect their sensitive data with network intrusion detection systems (IDSs). IDSs are expected to analyze a large volume of data while not placing a significantly added load on the monitoring systems and networks. This requires good data mining strategies which take less time and give accurate results. In this study, a novel data mining assisted multiagent-based intrusion detection system (DMAS-IDS) is proposed, particularly with the support of multiclass supervised classification. These agents can detect and take predefined actions against malicious activities, and data mining techniques can help detect them. Our proposed DMAS-IDS shows superior performance compared to central sniffing IDS techniques, and saves network resources compared to other distributed IDS with mobile agents that activate too many sniffers causing bottlenecks in the network. This is one of the major motivations to use a distributed model based on multiagent platform along with a supervised classification technique.
Photometric redshift estimation based on data mining with PhotoRApToR
NASA Astrophysics Data System (ADS)
Cavuoti, S.; Brescia, M.; De Stefano, V.; Longo, G.
2015-03-01
Photometric redshifts (photo-z) are crucial to the scientific exploitation of modern panchromatic digital surveys. In this paper we present PhotoRApToR (Photometric Research Application To Redshift): a Java/C ++ based desktop application capable to solve non-linear regression and multi-variate classification problems, in particular specialized for photo-z estimation. It embeds a machine learning algorithm, namely a multi-layer neural network trained by the Quasi Newton learning rule, and special tools dedicated to pre- and post-processing data. PhotoRApToR has been successfully tested on several scientific cases. The application is available for free download from the DAME Program web site.
Seo, Jeong Gi; Kwak, Jiyong; Um, Terry Taewoong; Rim, Tyler Hyungtaek
2017-01-01
Deep learning emerges as a powerful tool for analyzing medical images. Retinal disease detection by using computer-aided diagnosis from fundus image has emerged as a new method. We applied deep learning convolutional neural network by using MatConvNet for an automated detection of multiple retinal diseases with fundus photographs involved in STructured Analysis of the REtina (STARE) database. Dataset was built by expanding data on 10 categories, including normal retina and nine retinal diseases. The optimal outcomes were acquired by using a random forest transfer learning based on VGG-19 architecture. The classification results depended greatly on the number of categories. As the number of categories increased, the performance of deep learning models was diminished. When all 10 categories were included, we obtained results with an accuracy of 30.5%, relative classifier information (RCI) of 0.052, and Cohen’s kappa of 0.224. Considering three integrated normal, background diabetic retinopathy, and dry age-related macular degeneration, the multi-categorical classifier showed accuracy of 72.8%, 0.283 RCI, and 0.577 kappa. In addition, several ensemble classifiers enhanced the multi-categorical classification performance. The transfer learning incorporated with ensemble classifier of clustering and voting approach presented the best performance with accuracy of 36.7%, 0.053 RCI, and 0.225 kappa in the 10 retinal diseases classification problem. First, due to the small size of datasets, the deep learning techniques in this study were ineffective to be applied in clinics where numerous patients suffering from various types of retinal disorders visit for diagnosis and treatment. Second, we found that the transfer learning incorporated with ensemble classifiers can improve the classification performance in order to detect multi-categorical retinal diseases. Further studies should confirm the effectiveness of algorithms with large datasets obtained from hospitals. PMID:29095872
Elderly fall risk prediction using static posturography
2017-01-01
Maintaining and controlling postural balance is important for activities of daily living, with poor postural balance being predictive of future falls. This study investigated eyes open and eyes closed standing posturography with elderly adults to identify differences and determine appropriate outcome measure cut-off scores for prospective faller, single-faller, multi-faller, and non-faller classifications. 100 older adults (75.5 ± 6.7 years) stood quietly with eyes open and then eyes closed while Wii Balance Board data were collected. Range in anterior-posterior (AP) and medial-lateral (ML) center of pressure (CoP) motion; AP and ML CoP root mean square distance from mean (RMS); and AP, ML, and vector sum magnitude (VSM) CoP velocity were calculated. Romberg Quotients (RQ) were calculated for all parameters. Participants reported six-month fall history and six-month post-assessment fall occurrence. Groups were retrospective fallers (24), prospective all fallers (42), prospective fallers (22 single, 6 multiple), and prospective non-fallers (47). Non-faller RQ AP range and RQ AP RMS differed from prospective all fallers, fallers, and single fallers. Non-faller eyes closed AP velocity, eyes closed VSM velocity, RQ AP velocity, and RQ VSM velocity differed from multi-fallers. RQ calculations were particularly relevant for elderly fall risk assessments. Cut-off scores from Clinical Cut-off Score, ROC curves, and discriminant functions were clinically viable for multi-faller classification and provided better accuracy than single-faller classification. RQ AP range with cut-off score 1.64 could be used to screen for older people who may fall once. Prospective multi-faller classification with a discriminant function (-1.481 + 0.146 x Eyes Closed AP Velocity—0.114 x Eyes Closed Vector Sum Magnitude Velocity—2.027 x RQ AP Velocity + 2.877 x RQ Vector Sum Magnitude Velocity) and cut-off score 0.541 achieved an accuracy of 84.9% and is viable as a screening tool for older people at risk of multiple falls. PMID:28222191
Elderly fall risk prediction using static posturography.
Howcroft, Jennifer; Lemaire, Edward D; Kofman, Jonathan; McIlroy, William E
2017-01-01
Maintaining and controlling postural balance is important for activities of daily living, with poor postural balance being predictive of future falls. This study investigated eyes open and eyes closed standing posturography with elderly adults to identify differences and determine appropriate outcome measure cut-off scores for prospective faller, single-faller, multi-faller, and non-faller classifications. 100 older adults (75.5 ± 6.7 years) stood quietly with eyes open and then eyes closed while Wii Balance Board data were collected. Range in anterior-posterior (AP) and medial-lateral (ML) center of pressure (CoP) motion; AP and ML CoP root mean square distance from mean (RMS); and AP, ML, and vector sum magnitude (VSM) CoP velocity were calculated. Romberg Quotients (RQ) were calculated for all parameters. Participants reported six-month fall history and six-month post-assessment fall occurrence. Groups were retrospective fallers (24), prospective all fallers (42), prospective fallers (22 single, 6 multiple), and prospective non-fallers (47). Non-faller RQ AP range and RQ AP RMS differed from prospective all fallers, fallers, and single fallers. Non-faller eyes closed AP velocity, eyes closed VSM velocity, RQ AP velocity, and RQ VSM velocity differed from multi-fallers. RQ calculations were particularly relevant for elderly fall risk assessments. Cut-off scores from Clinical Cut-off Score, ROC curves, and discriminant functions were clinically viable for multi-faller classification and provided better accuracy than single-faller classification. RQ AP range with cut-off score 1.64 could be used to screen for older people who may fall once. Prospective multi-faller classification with a discriminant function (-1.481 + 0.146 x Eyes Closed AP Velocity-0.114 x Eyes Closed Vector Sum Magnitude Velocity-2.027 x RQ AP Velocity + 2.877 x RQ Vector Sum Magnitude Velocity) and cut-off score 0.541 achieved an accuracy of 84.9% and is viable as a screening tool for older people at risk of multiple falls.
NASA Astrophysics Data System (ADS)
Roy, A.; Inamdar, A. B.
2017-12-01
Major parts of Upper Godavari River Basin are intensely drought prone and climate vulnerable in Maharashtra State, India. The economy of the state depends on the agronomic productivity of this region. So, it is necessary to monitor and regulate the effects of climate change and anthropogenic activity on agricultural land in that region. This study investigates and maps the barren-lands and alteration of agricultural lands, their decadal deviations with the multi-temporal LANDSAT satellite images; and finally quantifies the agricultural adaptations. This work involves the utilization of remote sensing and GIS tools and modeling. First, climatic trend analysis is carried out with dataset obtained from India Meteorological Department. Then, multi-temporal LANDSAT images are classified (Level I, hybrid classification technique are followed) to determine the decadal Land Use Land Cover (LULC) changes and correlated with the agricultural water demand. Finally, various LANDSAT band analysis is conducted to determine irrigated and non-irrigated cropping area estimation and identifying the agricultural adaptations. The analysis of LANDSAT images shows that barren-lands are most increased class during the study period. The overall spatial extent of barren-lands are increased drastically during the study period. The geospatial study (class-to-class conversion study) shows that, most of the conversion of the barren-lands are from the agricultural land and reserve or open forests. The barren-lands are constantly increasing and the agricultural land is linearly decreasing. Hence, there is an inverse correlation between barren-lands and agricultural land. Moreover, there is a shift to non-irrigated and less water demanding crops, from more water demanding crops, which is a noticeable adaptation. The surface-water availability is highly dependent on rainfall and/or climatic conditions. It is changing either way in a random fashion based upon the quantity of rainfall occurred in near preceding years. The agricultural lands are densely replenished around the dams and natural water bodies which serve as the water supply stations for the irrigation purposes. Hence, the study shows there are alteration in LULC, agricultural practices and surface-water availability and expansion of barren-lands.
Gerli, Sandro; Favilli, Alessandro; Franchini, David; De Giorgi, Marcello; Casucci, Paola; Parazzini, Fabio
2018-01-01
To assess if maternal risk profile and Hospital assistential levels were able to influence the inter-Hospitals comparison in the class 1 and 3 of the "The Ten Group Classification System" (TGCS). A population-based analysis using data from Institutional data-base of an Italian Region was carried out. The 11 maternity wards were divided into two categories: second-level hospitals (SLH), and first-level hospitals (FLH). The recorded deliveries were classified according to the TGCS. To analyze if different maternal characteristics and the hospitals assistential level could influence the cesarean section (CS) risk, a multivariate analysis was done considering separately women in the TGCS class 1 and 3. From January 2011 to December 2013 were recorded 19,987 deliveries. Of those 7,693 were in the TGCS class 1 and 4,919 in the class 3. The CS rates were 20.8% and 14.7% in class 1 (p < 0.0001) and 6.9% and 5.3% (p < 0.0230) in class 3, respectively in the FLH and SLH. The multivariate logistic regression showed that the FLH, older maternal age and gestational diabetes were independent risk factors for CS in groups 1 and 3. Obesity and gestational hypertension were also independent risk factors for group 1. TGCS is a useful tool to analyze the incidence of CS in a single center but in comparing different Hospitals, maternal characteristics and different assistential levels should be considered as potential bias.
46 CFR 56.04-2 - Piping classification according to service.
Code of Federal Regulations, 2010 CFR
2010-10-01
... 46 Shipping 2 2010-10-01 2010-10-01 false Piping classification according to service. 56.04-2... PIPING SYSTEMS AND APPURTENANCES Piping Classification § 56.04-2 Piping classification according to... Piping Classification Service Class 1 Pressure (p.s.i.g.) Temp. (°F) Class B and C poisons 2 I any and 0...
46 CFR 56.04-2 - Piping classification according to service.
Code of Federal Regulations, 2011 CFR
2011-10-01
... 46 Shipping 2 2011-10-01 2011-10-01 false Piping classification according to service. 56.04-2... PIPING SYSTEMS AND APPURTENANCES Piping Classification § 56.04-2 Piping classification according to... Piping Classification Service Class 1 Pressure (p.s.i.g.) Temp. (°F) Class B and C poisons 2 I any and 0...
Uberon, an integrative multi-species anatomy ontology
2012-01-01
We present Uberon, an integrated cross-species ontology consisting of over 6,500 classes representing a variety of anatomical entities, organized according to traditional anatomical classification criteria. The ontology represents structures in a species-neutral way and includes extensive associations to existing species-centric anatomical ontologies, allowing integration of model organism and human data. Uberon provides a necessary bridge between anatomical structures in different taxa for cross-species inference. It uses novel methods for representing taxonomic variation, and has proved to be essential for translational phenotype analyses. Uberon is available at http://uberon.org PMID:22293552
Angle classification revisited 2: a modified Angle classification.
Katz, M I
1992-09-01
Edward Angle, in his classification of malocclusions, appears to have made Class I a range of abnormality, not a point of ideal occlusion. Current goals of orthodontic treatment, however, strive for the designation "Class I occlusion" to be synonymous with the point of ideal intermeshing and not a broad range. If contemporary orthodontists are to continue to use Class I as a goal, then it is appropriate that Dr. Angle's century-old classification, be modified to be more precise.
[ABC supplies classification: a managment tool of costs in nursing].
Lourenço, Karina Gomes; Castilho, Valéria
2006-01-01
The implementation of costs management systems has been extremely helpful to healthcare area owing to their efficacy in cutting expenditures as well as improving service quality. The ABC classification is an applied strategy to stocktaking and control. The research, which consists of an exploratory/descriptive quantitative analysis, has been carried out in order to identify, in a year time period, the demand for supplies at Universidade de Sao Paulo's Hospital. Of 1938 classified materials, 67 itens had been classified that they correspond to the materials with bigger costs for the hospital. 31.3% of these A-Class supplies catalogued items are the nursing materials, more used for the nursing team.
Non-Mutually Exclusive Deep Neural Network Classifier for Combined Modes of Bearing Fault Diagnosis.
Duong, Bach Phi; Kim, Jong-Myon
2018-04-07
The simultaneous occurrence of various types of defects in bearings makes their diagnosis more challenging owing to the resultant complexity of the constituent parts of the acoustic emission (AE) signals. To address this issue, a new approach is proposed in this paper for the detection of multiple combined faults in bearings. The proposed methodology uses a deep neural network (DNN) architecture to effectively diagnose the combined defects. The DNN structure is based on the stacked denoising autoencoder non-mutually exclusive classifier (NMEC) method for combined modes. The NMEC-DNN is trained using data for a single fault and it classifies both single faults and multiple combined faults. The results of experiments conducted on AE data collected through an experimental test-bed demonstrate that the DNN achieves good classification performance with a maximum accuracy of 95%. The proposed method is compared with a multi-class classifier based on support vector machines (SVMs). The NMEC-DNN yields better diagnostic performance in comparison to the multi-class classifier based on SVM. The NMEC-DNN reduces the number of necessary data collections and improves the bearing fault diagnosis performance.
EXhype: A tool for mineral classification using hyperspectral data
NASA Astrophysics Data System (ADS)
Adep, Ramesh Nityanand; shetty, Amba; Ramesh, H.
2017-02-01
Various supervised classification algorithms have been developed to classify earth surface features using hyperspectral data. Each algorithm is modelled based on different human expertises. However, the performance of conventional algorithms is not satisfactory to map especially the minerals in view of their typical spectral responses. This study introduces a new expert system named 'EXhype (Expert system for hyperspectral data classification)' to map minerals. The system incorporates human expertise at several stages of it's implementation: (i) to deal with intra-class variation; (ii) to identify absorption features; (iii) to discriminate spectra by considering absorption features, non-absorption features and by full spectra comparison; and (iv) finally takes a decision based on learning and by emphasizing most important features. It is developed using a knowledge base consisting of an Optimal Spectral Library, Segmented Upper Hull method, Spectral Angle Mapper (SAM) and Artificial Neural Network. The performance of the EXhype is compared with a traditional, most commonly used SAM algorithm using Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) data acquired over Cuprite, Nevada, USA. A virtual verification method is used to collect samples information for accuracy assessment. Further, a modified accuracy assessment method is used to get a real users accuracies in cases where only limited or desired classes are considered for classification. With the modified accuracy assessment method, SAM and EXhype yields an overall accuracy of 60.35% and 90.75% and the kappa coefficient of 0.51 and 0.89 respectively. It was also found that the virtual verification method allows to use most desired stratified random sampling method and eliminates all the difficulties associated with it. The experimental results show that EXhype is not only producing better accuracy compared to traditional SAM but, can also rightly classify the minerals. It is proficient in avoiding misclassification between target classes when applied on minerals.
Federal Register 2010, 2011, 2012, 2013, 2014
2013-04-29
... and Professional Classification Report (SQ-CLASS) AGENCY: U.S. Census Bureau, Commerce. ACTION: Notice... directed to Scott Handmaker, Chief, Economic Classifications Operations Branch, U.S. Census Bureau, 8K149... INFORMATION: I. Abstract The Business and Professional Classification Report survey (SQ- CLASS) collects...
Can a CNN recognize Catalan diet?
NASA Astrophysics Data System (ADS)
Herruzo, P.; Bolaños, M.; Radeva, P.
2016-10-01
Nowadays, we can find several diseases related to the unhealthy diet habits of the population, such as diabetes, obesity, anemia, bulimia and anorexia. In many cases, these diseases are related to the food consumption of people. Mediterranean diet is scientifically known as a healthy diet that helps to prevent many metabolic diseases. In particular, our work focuses on the recognition of Mediterranean food and dishes. The development of this methodology would allow to analise the daily habits of users with wearable cameras, within the topic of lifelogging. By using automatic mechanisms we could build an objective tool for the analysis of the patient's behavior, allowing specialists to discover unhealthy food patterns and understand the user's lifestyle. With the aim to automatically recognize a complete diet, we introduce a challenging multi-labeled dataset related to Mediter-ranean diet called FoodCAT. The first type of label provided consists of 115 food classes with an average of 400 images per dish, and the second one consists of 12 food categories with an average of 3800 pictures per class. This dataset will serve as a basis for the development of automatic diet recognition. In this context, deep learning and more specifically, Convolutional Neural Networks (CNNs), currently are state-of-the-art methods for automatic food recognition. In our work, we compare several architectures for image classification, with the purpose of diet recognition. Applying the best model for recognising food categories, we achieve a top-1 accuracy of 72.29%, and top-5 of 97.07%. In a complete diet recognition of dishes from Mediterranean diet, enlarged with the Food-101 dataset for international dishes recognition, we achieve a top-1 accuracy of 68.07%, and top-5 of 89.53%, for a total of 115+101 food classes.
2015-12-01
research areas in network science. 15. SUBJECT TERMS scenario creation , emulation environment, NSRL 16. SECURITY CLASSIFICATION OF: 17. LIMITATION...mobility aspect of the emulated environment, the development and creation of scenarios play an integral part. By creating scenarios that model certain...during the visualization phase. We examine these 3 phases in detail by describing the creation of a scenario based upon a vignette from the Multi-Level
Classification of Stellar Spectra with Fuzzy Minimum Within-Class Support Vector Machine
NASA Astrophysics Data System (ADS)
Zhong-bao, Liu; Wen-ai, Song; Jing, Zhang; Wen-juan, Zhao
2017-06-01
Classification is one of the important tasks in astronomy, especially in spectra analysis. Support Vector Machine (SVM) is a typical classification method, which is widely used in spectra classification. Although it performs well in practice, its classification accuracies can not be greatly improved because of two limitations. One is it does not take the distribution of the classes into consideration. The other is it is sensitive to noise. In order to solve the above problems, inspired by the maximization of the Fisher's Discriminant Analysis (FDA) and the SVM separability constraints, fuzzy minimum within-class support vector machine (FMWSVM) is proposed in this paper. In FMWSVM, the distribution of the classes is reflected by the within-class scatter in FDA and the fuzzy membership function is introduced to decrease the influence of the noise. The comparative experiments with SVM on the SDSS datasets verify the effectiveness of the proposed classifier FMWSVM.
Bantis, Leonidas E; Nakas, Christos T; Reiser, Benjamin; Myall, Daniel; Dalrymple-Alford, John C
2017-06-01
The three-class approach is used for progressive disorders when clinicians and researchers want to diagnose or classify subjects as members of one of three ordered categories based on a continuous diagnostic marker. The decision thresholds or optimal cut-off points required for this classification are often chosen to maximize the generalized Youden index (Nakas et al., Stat Med 2013; 32: 995-1003). The effectiveness of these chosen cut-off points can be evaluated by estimating their corresponding true class fractions and their associated confidence regions. Recently, in the two-class case, parametric and non-parametric methods were investigated for the construction of confidence regions for the pair of the Youden-index-based optimal sensitivity and specificity fractions that can take into account the correlation introduced between sensitivity and specificity when the optimal cut-off point is estimated from the data (Bantis et al., Biomet 2014; 70: 212-223). A parametric approach based on the Box-Cox transformation to normality often works well while for markers having more complex distributions a non-parametric procedure using logspline density estimation can be used instead. The true class fractions that correspond to the optimal cut-off points estimated by the generalized Youden index are correlated similarly to the two-class case. In this article, we generalize these methods to the three- and to the general k-class case which involves the classification of subjects into three or more ordered categories, where ROC surface or ROC manifold methodology, respectively, is typically employed for the evaluation of the discriminatory capacity of a diagnostic marker. We obtain three- and multi-dimensional joint confidence regions for the optimal true class fractions. We illustrate this with an application to the Trail Making Test Part A that has been used to characterize cognitive impairment in patients with Parkinson's disease.
Novel Computational Protocols for Functionally Classifying and Characterising Serine Beta-Lactamases
Das, Sayoni; Dawson, Natalie L.; Dobrijevic, Dragana; Orengo, Christine
2016-01-01
Beta-lactamases represent the main bacterial mechanism of resistance to beta-lactam antibiotics and are a significant challenge to modern medicine. We have developed an automated classification and analysis protocol that exploits structure- and sequence-based approaches and which allows us to propose a grouping of serine beta-lactamases that more consistently captures and rationalizes the existing three classification schemes: Classes, (A, C and D, which vary in their implementation of the mechanism of action); Types (that largely reflect evolutionary distance measured by sequence similarity); and Variant groups (which largely correspond with the Bush-Jacoby clinical groups). Our analysis platform exploits a suite of in-house and public tools to identify Functional Determinants (FDs), i.e. residue sites, responsible for conferring different phenotypes between different classes, different types and different variants. We focused on Class A beta-lactamases, the most highly populated and clinically relevant class, to identify FDs implicated in the distinct phenotypes associated with different Class A Types and Variants. We show that our FunFHMMer method can separate the known beta-lactamase classes and identify those positions likely to be responsible for the different implementations of the mechanism of action in these enzymes. Two novel algorithms, ASSP and SSPA, allow detection of FD sites likely to contribute to the broadening of the substrate profiles. Using our approaches, we recognise 151 Class A types in UniProt. Finally, we used our beta-lactamase FunFams and ASSP profiles to detect 4 novel Class A types in microbiome samples. Our platforms have been validated by literature studies, in silico analysis and some targeted experimental verification. Although developed for the serine beta-lactamases they could be used to classify and analyse any diverse protein superfamily where sub-families have diverged over both long and short evolutionary timescales. PMID:27332861
Prat, Chantal; Besalú, Emili; Bañeras, Lluís; Anticó, Enriqueta
2011-06-15
The volatile fraction of aqueous cork macerates of tainted and non-tainted agglomerate cork stoppers was analysed by headspace solid-phase microextraction (HS-SPME)/gas chromatography. Twenty compounds containing terpenoids, aliphatic alcohols, lignin-related compounds and others were selected and analysed in individual corks. Cork stoppers were previously classified in six different classes according to sensory descriptions including, 2,4,6-trichloroanisole taint and other frequent, non-characteristic odours found in cork. A multivariate analysis of the chromatographic data of 20 selected chemical compounds using linear discriminant analysis models helped in the differentiation of the a priori made groups. The discriminant model selected five compounds as the best combination. Selected compounds appear in the model in the following order; 2,4,6 TCA, fenchyl alcohol, 1-octen-3-ol, benzyl alcohol and benzothiazole. Unfortunately, not all six a priori differentiated sensory classes were clearly discriminated in the model, probably indicating that no measurable differences exist in the chromatographic data for some categories. The predictive analyses of a refined model in which two sensory classes were fused together resulted in a good classification. Prediction rates of control (non-tainted), TCA, musty-earthy-vegetative, vegetative and chemical descriptions were 100%, 100%, 85%, 67.3% and 100%, respectively, when the modified model was used. The multivariate analysis of chromatographic data will help in the classification of stoppers and provide a perfect complement to sensory analyses. Copyright © 2010 Elsevier Ltd. All rights reserved.
Handling Imbalanced Data Sets in Multistage Classification
NASA Astrophysics Data System (ADS)
López, M.
Multistage classification is a logical approach, based on a divide-and-conquer solution, for dealing with problems with a high number of classes. The classification problem is divided into several sequential steps, each one associated to a single classifier that works with subgroups of the original classes. In each level, the current set of classes is split into smaller subgroups of classes until they (the subgroups) are composed of only one class. The resulting chain of classifiers can be represented as a tree, which (1) simplifies the classification process by using fewer categories in each classifier and (2) makes it possible to combine several algorithms or use different attributes in each stage. Most of the classification algorithms can be biased in the sense of selecting the most populated class in overlapping areas of the input space. This can degrade a multistage classifier performance if the training set sample frequencies do not reflect the real prevalence in the population. Several techniques such as applying prior probabilities, assigning weights to the classes, or replicating instances have been developed to overcome this handicap. Most of them are designed for two-class (accept-reject) problems. In this article, we evaluate several of these techniques as applied to multistage classification and analyze how they can be useful for astronomy. We compare the results obtained by classifying a data set based on Hipparcos with and without these methods.
Kennen, Jonathan G.; Henriksen, James A.; Nieswand, Steven P.
2007-01-01
The natural flow regime paradigm and parallel stream ecological concepts and theories have established the benefits of maintaining or restoring the full range of natural hydrologic variation for physiochemical processes, biodiversity, and the evolutionary potential of aquatic and riparian communities. A synthesis of recent advances in hydroecological research coupled with stream classification has resulted in a new process to determine environmental flows and assess hydrologic alteration. This process has national and international applicability. It allows classification of streams into hydrologic stream classes and identification of a set of non-redundant and ecologically relevant hydrologic indices for 10 critical sub-components of flow. Three computer programs have been developed for implementing the Hydroecological Integrity Assessment Process (HIP): (1) the Hydrologic Indices Tool (HIT), which calculates 171 ecologically relevant hydrologic indices on the basis of daily-flow and peak-flow stream-gage data; (2) the New Jersey Hydrologic Assessment Tool (NJHAT), which can be used to establish a hydrologic baseline period, provide options for setting baseline environmental-flow standards, and compare past and proposed streamflow alterations; and (3) the New Jersey Stream Classification Tool (NJSCT), designed for placing unclassified streams into pre-defined stream classes. Biological and multivariate response models including principal-component, cluster, and discriminant-function analyses aided in the development of software and implementation of the HIP for New Jersey. A pilot effort is currently underway by the New Jersey Department of Environmental Protection in which the HIP is being used to evaluate the effects of past and proposed surface-water use, ground-water extraction, and land-use changes on stream ecosystems while determining the most effective way to integrate the process into ongoing regulatory programs. Ultimately, this scientifically defensible process will help to quantify the effects of anthropogenic changes and development on hydrologic variability and help planners and resource managers balance current and future water requirements with ecological needs.
Machine-learning in grading of gliomas based on multi-parametric magnetic resonance imaging at 3T.
Citak-Er, Fusun; Firat, Zeynep; Kovanlikaya, Ilhami; Ture, Ugur; Ozturk-Isik, Esin
2018-06-15
The objective of this study was to assess the contribution of multi-parametric (mp) magnetic resonance imaging (MRI) quantitative features in the machine learning-based grading of gliomas with a multi-region-of-interests approach. Forty-three patients who were newly diagnosed as having a glioma were included in this study. The patients were scanned prior to any therapy using a standard brain tumor magnetic resonance (MR) imaging protocol that included T1 and T2-weighted, diffusion-weighted, diffusion tensor, MR perfusion and MR spectroscopic imaging. Three different regions-of-interest were drawn for each subject to encompass tumor, immediate tumor periphery, and distant peritumoral edema/normal. The normalized mp-MRI features were used to build machine-learning models for differentiating low-grade gliomas (WHO grades I and II) from high grades (WHO grades III and IV). In order to assess the contribution of regional mp-MRI quantitative features to the classification models, a support vector machine-based recursive feature elimination method was applied prior to classification. A machine-learning model based on support vector machine algorithm with linear kernel achieved an accuracy of 93.0%, a specificity of 86.7%, and a sensitivity of 96.4% for the grading of gliomas using ten-fold cross validation based on the proposed subset of the mp-MRI features. In this study, machine-learning based on multiregional and multi-parametric MRI data has proven to be an important tool in grading glial tumors accurately even in this limited patient population. Future studies are needed to investigate the use of machine learning algorithms for brain tumor classification in a larger patient cohort. Copyright © 2018. Published by Elsevier Ltd.
Bromuri, Stefano; Zufferey, Damien; Hennebert, Jean; Schumacher, Michael
2014-10-01
This research is motivated by the issue of classifying illnesses of chronically ill patients for decision support in clinical settings. Our main objective is to propose multi-label classification of multivariate time series contained in medical records of chronically ill patients, by means of quantization methods, such as bag of words (BoW), and multi-label classification algorithms. Our second objective is to compare supervised dimensionality reduction techniques to state-of-the-art multi-label classification algorithms. The hypothesis is that kernel methods and locality preserving projections make such algorithms good candidates to study multi-label medical time series. We combine BoW and supervised dimensionality reduction algorithms to perform multi-label classification on health records of chronically ill patients. The considered algorithms are compared with state-of-the-art multi-label classifiers in two real world datasets. Portavita dataset contains 525 diabetes type 2 (DT2) patients, with co-morbidities of DT2 such as hypertension, dyslipidemia, and microvascular or macrovascular issues. MIMIC II dataset contains 2635 patients affected by thyroid disease, diabetes mellitus, lipoid metabolism disease, fluid electrolyte disease, hypertensive disease, thrombosis, hypotension, chronic obstructive pulmonary disease (COPD), liver disease and kidney disease. The algorithms are evaluated using multi-label evaluation metrics such as hamming loss, one error, coverage, ranking loss, and average precision. Non-linear dimensionality reduction approaches behave well on medical time series quantized using the BoW algorithm, with results comparable to state-of-the-art multi-label classification algorithms. Chaining the projected features has a positive impact on the performance of the algorithm with respect to pure binary relevance approaches. The evaluation highlights the feasibility of representing medical health records using the BoW for multi-label classification tasks. The study also highlights that dimensionality reduction algorithms based on kernel methods, locality preserving projections or both are good candidates to deal with multi-label classification tasks in medical time series with many missing values and high label density. Copyright © 2014 Elsevier Inc. All rights reserved.
Medical image classification based on multi-scale non-negative sparse coding.
Zhang, Ruijie; Shen, Jian; Wei, Fushan; Li, Xiong; Sangaiah, Arun Kumar
2017-11-01
With the rapid development of modern medical imaging technology, medical image classification has become more and more important in medical diagnosis and clinical practice. Conventional medical image classification algorithms usually neglect the semantic gap problem between low-level features and high-level image semantic, which will largely degrade the classification performance. To solve this problem, we propose a multi-scale non-negative sparse coding based medical image classification algorithm. Firstly, Medical images are decomposed into multiple scale layers, thus diverse visual details can be extracted from different scale layers. Secondly, for each scale layer, the non-negative sparse coding model with fisher discriminative analysis is constructed to obtain the discriminative sparse representation of medical images. Then, the obtained multi-scale non-negative sparse coding features are combined to form a multi-scale feature histogram as the final representation for a medical image. Finally, SVM classifier is combined to conduct medical image classification. The experimental results demonstrate that our proposed algorithm can effectively utilize multi-scale and contextual spatial information of medical images, reduce the semantic gap in a large degree and improve medical image classification performance. Copyright © 2017 Elsevier B.V. All rights reserved.
Multi-Agent Information Classification Using Dynamic Acquaintance Lists.
ERIC Educational Resources Information Center
Mukhopadhyay, Snehasis; Peng, Shengquan; Raje, Rajeev; Palakal, Mathew; Mostafa, Javed
2003-01-01
Discussion of automated information services focuses on information classification and collaborative agents, i.e. intelligent computer programs. Highlights include multi-agent systems; distributed artificial intelligence; thesauri; document representation and classification; agent modeling; acquaintances, or remote agents discovered through…
Fesharaki, Nooshin Jafari; Pourghassem, Hossein
2013-07-01
Due to the daily mass production and the widespread variation of medical X-ray images, it is necessary to classify these for searching and retrieving proposes, especially for content-based medical image retrieval systems. In this paper, a medical X-ray image hierarchical classification structure based on a novel merging and splitting scheme and using shape and texture features is proposed. In the first level of the proposed structure, to improve the classification performance, similar classes with regard to shape contents are grouped based on merging measures and shape features into the general overlapped classes. In the next levels of this structure, the overlapped classes split in smaller classes based on the classification performance of combination of shape and texture features or texture features only. Ultimately, in the last levels, this procedure is also continued forming all the classes, separately. Moreover, to optimize the feature vector in the proposed structure, we use orthogonal forward selection algorithm according to Mahalanobis class separability measure as a feature selection and reduction algorithm. In other words, according to the complexity and inter-class distance of each class, a sub-space of the feature space is selected in each level and then a supervised merging and splitting scheme is applied to form the hierarchical classification. The proposed structure is evaluated on a database consisting of 2158 medical X-ray images of 18 classes (IMAGECLEF 2005 database) and accuracy rate of 93.6% in the last level of the hierarchical structure for an 18-class classification problem is obtained.
NASA Astrophysics Data System (ADS)
Liu, Tao; Abd-Elrahman, Amr
2018-05-01
Deep convolutional neural network (DCNN) requires massive training datasets to trigger its image classification power, while collecting training samples for remote sensing application is usually an expensive process. When DCNN is simply implemented with traditional object-based image analysis (OBIA) for classification of Unmanned Aerial systems (UAS) orthoimage, its power may be undermined if the number training samples is relatively small. This research aims to develop a novel OBIA classification approach that can take advantage of DCNN by enriching the training dataset automatically using multi-view data. Specifically, this study introduces a Multi-View Object-based classification using Deep convolutional neural network (MODe) method to process UAS images for land cover classification. MODe conducts the classification on multi-view UAS images instead of directly on the orthoimage, and gets the final results via a voting procedure. 10-fold cross validation results show the mean overall classification accuracy increasing substantially from 65.32%, when DCNN was applied on the orthoimage to 82.08% achieved when MODe was implemented. This study also compared the performances of the support vector machine (SVM) and random forest (RF) classifiers with DCNN under traditional OBIA and the proposed multi-view OBIA frameworks. The results indicate that the advantage of DCNN over traditional classifiers in terms of accuracy is more obvious when these classifiers were applied with the proposed multi-view OBIA framework than when these classifiers were applied within the traditional OBIA framework.
NASA Astrophysics Data System (ADS)
Molinario, G.; Hansen, M.; Potapov, P.; Altstatt, A. L.; Justice, C. O.
2012-12-01
The FACET forest cover and forest cover loss 2000-2005-2010 data set has been produced by South Dakota State University, the University of Maryland and the Kinshasa-based Observatoire Satellital des Forets D'Afrique Central (OSFAC) with funding from the USAID Central African Regional Program for the Environment (CARPE). The product is now available or being finalized for the DRC, the ROC and Gabon with plans to complete all Congo Basin countries. While FACET provides unprecedented synoptic detail in the extent of Congo Basin forest and the forest cover loss, additional information is required to stratify land cover into types indicative of biomass content. Analysis of the FACET patterns of deforestation, more detailed remote sensing analysis of biophysical attributes within the FACET land cover classes and GIS-derived classes of degradation obtained through variable distance buffers based on relevant literature and ground truth data are combined with the existing FACET classes to produce a ranking of land cover from low biomass to high biomass for the Democratic Republic of Congo. The resulting classification can be used in all Reduced Emissions from Degradation and Deforestation (REDD) pre-inventory phases when baseline forest cover needs to be known and the location and amount of forest biomass inventory plots needs to be designed. FACET cover loss classes were kept in the classification and can provide the Monitoring, Reporting and Verification tools needed for REDD projects. The project will be demonstrated for the Maringa Lopori Wamba Landscape of the DRC where this work was funded by the African Wildlife Foundation to support the design of a REDD pilot project.
Botanical discrimination of Greek unifloral honeys with physico-chemical and chemometric analyses.
Karabagias, Ioannis K; Badeka, Anastasia V; Kontakos, Stavros; Karabournioti, Sofia; Kontominas, Michael G
2014-12-15
The aim of the present study was to investigate the possibility of characterisation and classification of Greek unifloral honeys (pine, thyme, fir and orange blossom) according to botanical origin using volatile compounds, conventional physico-chemical parameters and chemometric analyses (MANOVA and Linear Discriminant Analysis). For this purpose, 119 honey samples were collected during the harvesting period 2011 from 14 different regions in Greece known to produce unifloral honey of good quality. Physico-chemical analysis included the identification and semi quantification of fifty five volatile compounds performed by Headspace Solid Phase Microextraction coupled to gas chromatography/mass spectroscopy and the determination of conventional quality parameters such as pH, free, lactonic, total acidity, electrical conductivity, moisture, ash, lactonic/free acidity ratio and colour parameters L, a, b. Results showed that using 40 diverse variables (30 volatile compounds of different classes and 10 physico-chemical parameters) the honey samples were satisfactorily classified according to botanical origin using volatile compounds (84.0% correct prediction), physicochemical parameters (97.5% correct prediction), and the combination of both (95.8% correct prediction) indicating that multi element analysis comprises a powerful tool for honey discrimination purposes. Copyright © 2014 Elsevier Ltd. All rights reserved.
Regidor, E
2001-01-01
Two of the most important theory-based social class classifications are that of the neo-Weberian Goldthorpe and that of the neo-Marxist Wright. The social class classification proposal of the SES Working Group employed the Goldthorpe schema as a reference due to the empirical and mainly pragmatic aspects involved. In this article, these aspects are discussed and it is also discussed the problem of the validation of the measurements of social class and the problem of the use of the social class as an independent variable.
NASA Astrophysics Data System (ADS)
Mabu, Shingo; Kido, Shoji; Hashimoto, Noriaki; Hirano, Yasushi; Kuremoto, Takashi
2018-02-01
This research proposes a multi-channel deep convolutional neural network (DCNN) for computer-aided diagnosis (CAD) that classifies normal and abnormal opacities of diffuse lung diseases in Computed Tomography (CT) images. Because CT images are gray scale, DCNN usually uses one channel for inputting image data. On the other hand, this research uses multi-channel DCNN where each channel corresponds to the original raw image or the images transformed by some preprocessing techniques. In fact, the information obtained only from raw images is limited and some conventional research suggested that preprocessing of images contributes to improving the classification accuracy. Thus, the combination of the original and preprocessed images is expected to show higher accuracy. The proposed method realizes region of interest (ROI)-based opacity annotation. We used lung CT images taken in Yamaguchi University Hospital, Japan, and they are divided into 32 × 32 ROI images. The ROIs contain six kinds of opacities: consolidation, ground-glass opacity (GGO), emphysema, honeycombing, nodular, and normal. The aim of the proposed method is to classify each ROI into one of the six opacities (classes). The DCNN structure is based on VGG network that secured the first and second places in ImageNet ILSVRC-2014. From the experimental results, the classification accuracy of the proposed method was better than the conventional method with single channel, and there was a significant difference between them.
Myatt, Mark; Limburg, Hans; Minassian, Darwin; Katyola, Damson
2003-01-01
OBJECTIVE: To test the applicability of lot quality assurance sampling (LQAS) for the rapid assessment of the prevalence of active trachoma. METHODS: Prevalence of active trachoma in six communities was found by examining all children aged 2-5 years. Trial surveys were conducted in these communities. A sampling plan appropriate for classifying communities with prevalences < or =20% and > or =40% was applied to the survey data. Operating characteristic and average sample number curves were plotted, and screening test indices were calculated. The ability of LQAS to provide a three-class classification system was investigated. FINDINGS: Ninety-six trial surveys were conducted. All communities with prevalences < or =20% and > or =40% were identified correctly. The method discriminated between communities with prevalences < or =30% and >30%, with sensitivity of 98% (95% confidence interval (CI)=88.2-99.9%), specificity of 84.4% (CI=69.9-93.0%), positive predictive value of 87.7% (CI=75.7-94.5%), negative predictive value of 97.4% (CI=84.9-99.9%), and accuracy of 91.7% (CI=83.8-96.1%). Agreement between the three prevalence classes and survey classifications was 84.4% (CI=75.2-90.7%). The time needed to complete the surveys was consistent with the need to complete a survey in one day. CONCLUSION: Lot quality assurance sampling provides a method of classifying communities according to the prevalence of active trachoma. It merits serious consideration as a replacement for the assessment of the prevalence of active trachoma with the currently used trachoma rapid assessment method. It may be extended to provide a multi-class classification method. PMID:14997240
Taxonomy of multi-focal nematode image stacks by a CNN based image fusion approach.
Liu, Min; Wang, Xueping; Zhang, Hongzhong
2018-03-01
In the biomedical field, digital multi-focal images are very important for documentation and communication of specimen data, because the morphological information for a transparent specimen can be captured in form of a stack of high-quality images. Given biomedical image stacks containing multi-focal images, how to efficiently extract effective features from all layers to classify the image stacks is still an open question. We present to use a deep convolutional neural network (CNN) image fusion based multilinear approach for the taxonomy of multi-focal image stacks. A deep CNN based image fusion technique is used to combine relevant information of multi-focal images within a given image stack into a single image, which is more informative and complete than any single image in the given stack. Besides, multi-focal images within a stack are fused along 3 orthogonal directions, and multiple features extracted from the fused images along different directions are combined by canonical correlation analysis (CCA). Because multi-focal image stacks represent the effect of different factors - texture, shape, different instances within the same class and different classes of objects, we embed the deep CNN based image fusion method within a multilinear framework to propose an image fusion based multilinear classifier. The experimental results on nematode multi-focal image stacks demonstrated that the deep CNN image fusion based multilinear classifier can reach a higher classification rate (95.7%) than that by the previous multilinear based approach (88.7%), even we only use the texture feature instead of the combination of texture and shape features as in the previous work. The proposed deep CNN image fusion based multilinear approach shows great potential in building an automated nematode taxonomy system for nematologists. It is effective to classify multi-focal image stacks. Copyright © 2018 Elsevier B.V. All rights reserved.
Bone age assessment meets SIFT
NASA Astrophysics Data System (ADS)
Kashif, Muhammad; Jonas, Stephan; Haak, Daniel; Deserno, Thomas M.
2015-03-01
Bone age assessment (BAA) is a method of determining the skeletal maturity and finding the growth disorder in the skeleton of a person. BAA is frequently used in pediatric medicine but also a time-consuming and cumbersome task for a radiologist. Conventionally, the Greulich and Pyle and the Tanner and Whitehouse methods are used for bone age assessment, which are based on visual comparison of left hand radiographs with a standard atlas. We present a novel approach for automated bone age assessment, combining scale invariant feature transform (SIFT) features and support vector machine (SVM) classification. In this approach, (i) data is grouped into 30 classes to represent the age range of 0- 18 years, (ii) 14 epiphyseal ROIs are extracted from left hand radiographs, (iii) multi-level image thresholding, using Otsu method, is applied to specify key points on bone and osseous tissues of eROIs, (iv) SIFT features are extracted for specified key points for each eROI of hand radiograph, and (v) classification is performed using a multi-class extension of SVM. A total of 1101 radiographs of University of Southern California are used in training and testing phases using 5- fold cross-validation. Evaluation is performed for two age ranges (0-18 years and 2-17 years) for comparison with previous work and the commercial product BoneXpert, respectively. Results were improved significantly, where the mean errors of 0.67 years and 0.68 years for the age ranges 0-18 years and 2-17 years, respectively, were obtained. Accuracy of 98.09 %, within the range of two years was achieved.
Local Climate Zones Classification to Urban Planning in the Mega City of São Paulo - SP, Brazil
NASA Astrophysics Data System (ADS)
Gonçalves Santos, Rafael; Saraiva Lopes, António Manuel; Prata-Shimomura, Alessandra
2017-04-01
Local Climate Zones Classification to Urban Planning in the Mega city of São Paulo - SP, Brazil Tropical megacities have presented a strong trend in growing urban. Urban management in megacities has as one of the biggest challenges is the lack of integration of urban climate and urban planning to promote ecologically smart cities. Local Climatic Zones (LCZs) are considered as important and recognized tool for urban climate management. Classes are local in scale, climatic in nature, and zonal in representation. They can be understood as regions of uniform surface cover, structure, material and human activity that have to a unique climate response. As an initial tool to promote urban climate planning, LCZs represent a simple composition of different land coverages (buildings, vegetation, soils, rock, roads and water). LCZs are divided in 17 classes, they are based on surface cover (built fraction, soil moisture, albedo), surface structure (sky view factor, roughness height) and cultural activity (anthropogenic heat flux). The aim of this study is the application of the LCZs classification system in the megacity of São Paulo, Brazil. Located at a latitude of 23° 21' and longitude 46° 44' near to the Tropic of Capricorn, presenting humid subtropical climate (Cfa) with diversified topographies. The megacity of São Paulo currently concentrates 11.890.000 inhabitants is characterized by large urban conglomerates with impermeable surfaces and high verticalization, having as result high urban heat island intensity. The result indicates predominance in urban zones of Compact low-rise, Compact Mid-rise, Compact High-rise and Open Low-rise. Non-urban regions are mainly covered by dense vegetation and water. The LCZs classification system promotes significant advantages for climate sensitive urban planning in the megacity of São Paulo. They offers new perspectives to the management of temperature and urban ventilation and allows the formulation of urban planning guidelines and climatic. Key words: Local Climatic Zones; Urban Panning; Megacities; São Paulo.
Dynamic ocean provinces: a multi-sensor approach to global marine ecophysiology
NASA Astrophysics Data System (ADS)
Dowell, M.; Campbell, J.; Moore, T.
The concept of oceanic provinces or domains has existed for well over a century. Such systems, whether real or only conceptual, provide a useful framework for understanding the mechanisms controlling biological, physical and chemical processes and their interactions. Criteria have been established for defining provinces based on physical forcings, availability of light and nutrients, complexity of the marine food web, and other factors. In general, such classification systems reflect the heterogeneous nature of the ocean environment, and the effort of scientists to comprehend the whole system by understanding its various homogeneous components. If provinces are defined strictly on the basis of geospatial or temporal criteria (e.g., latitude zones, bathymetry, or season), the resulting maps exhibit discontinuities that are uncharacteristic of the ocean. While this may be useful for many purposes, it is unsatisfactory in that it does not capture the dynamic nature of fluid boundaries in the ocean. Boundaries fixed in time and space do not allow us to observe interannual or longer-term variability (e.g., regime shifts) that may result from climate change. The current study illustrates the potential of using fuzzy logic as a means of classifying the ocean into objectively defined provinces using properties measurable from satellite sensors (MODIS and SeaWiFS). This approach accommodates the dynamic variability of provinces which can be updated as each image is processed. We adopt this classification as the basis for parameterizing specific algorithms for each of the classes. Once the class specific algorithms have been applied, retrievals are then recomposed into a single blended product based on the "weighted" fuzzy memberships. This will be demonstrated through animations of multi-year time- series of monthly composites of the individual classes or provinces. The provinces themselves are identified on the basis of global fields of chlorophyll, sea surface temperature and PAR which will also be subsequently used to parameterize primary production (PP) algorithms. Two applications of the proposed dynamic classification are presented. The first applies different peer-reviewed PP algorithms to the different classes and objectively evaluates their performance to select the algorithm which performs best, and then merges results into a single primary production product. A second application illustrates the variability of P I parameters in each province and- analyzes province specific variability in the quantum yield of photosynthesis. Finally results illustrating how this approach is implemented in estimating global oceanic primary production are presented.
Myer, Gregory D; Wordeman, Samuel C; Sugimoto, Dai; Bates, Nathaniel A; Roewer, Benjamin D; Medina McKeon, Jennifer M; DiCesare, Christopher A; Di Stasi, Stephanie L; Barber Foss, Kim D; Thomas, Staci M; Hewett, Timothy E
2014-05-01
Multi-center collaborations provide a powerful alternative to overcome the inherent limitations to single-center investigations. Specifically, multi-center projects can support large-scale prospective, longitudinal studies that investigate relatively uncommon outcomes, such as anterior cruciate ligament injury. This project was conceived to assess within- and between-center reliability of an affordable, clinical nomogram utilizing two-dimensional video methods to screen for risk of knee injury. The authors hypothesized that the two-dimensional screening methods would provide good-to-excellent reliability within and between institutions for assessment of frontal and sagittal plane biomechanics. Nineteen female, high school athletes participated. Two-dimensional video kinematics of the lower extremity during a drop vertical jump task were collected on all 19 study participants at each of the three facilities. Within-center and between-center reliability were assessed with intra- and inter-class correlation coefficients. Within-center reliability of the clinical nomogram variables was consistently excellent, but between-center reliability was fair-to-good. Within-center intra-class correlation coefficient for all nomogram variables combined was 0.98, while combined between-center inter-class correlation coefficient was 0.63. Injury risk screening protocols were reliable within and repeatable between centers. These results demonstrate the feasibility of multi-site biomechanical studies and establish a framework for further dissemination of injury risk screening algorithms. Specifically, multi-center studies may allow for further validation and optimization of two-dimensional video screening tools. 2b.
NASA Astrophysics Data System (ADS)
Hussein, I.; Wilkins, M.; Roscoe, C.; Faber, W.; Chakravorty, S.; Schumacher, P.
2016-09-01
Finite Set Statistics (FISST) is a rigorous Bayesian multi-hypothesis management tool for the joint detection, classification and tracking of multi-sensor, multi-object systems. Implicit within the approach are solutions to the data association and target label-tracking problems. The full FISST filtering equations, however, are intractable. While FISST-based methods such as the PHD and CPHD filters are tractable, they require heavy moment approximations to the full FISST equations that result in a significant loss of information contained in the collected data. In this paper, we review Smart Sampling Markov Chain Monte Carlo (SSMCMC) that enables FISST to be tractable while avoiding moment approximations. We study the effect of tuning key SSMCMC parameters on tracking quality and computation time. The study is performed on a representative space object catalog with varying numbers of RSOs. The solution is implemented in the Scala computing language at the Maui High Performance Computing Center (MHPCC) facility.
Zhou, Tao; Li, Zhaofu; Pan, Jianjun
2018-01-27
This paper focuses on evaluating the ability and contribution of using backscatter intensity, texture, coherence, and color features extracted from Sentinel-1A data for urban land cover classification and comparing different multi-sensor land cover mapping methods to improve classification accuracy. Both Landsat-8 OLI and Hyperion images were also acquired, in combination with Sentinel-1A data, to explore the potential of different multi-sensor urban land cover mapping methods to improve classification accuracy. The classification was performed using a random forest (RF) method. The results showed that the optimal window size of the combination of all texture features was 9 × 9, and the optimal window size was different for each individual texture feature. For the four different feature types, the texture features contributed the most to the classification, followed by the coherence and backscatter intensity features; and the color features had the least impact on the urban land cover classification. Satisfactory classification results can be obtained using only the combination of texture and coherence features, with an overall accuracy up to 91.55% and a kappa coefficient up to 0.8935, respectively. Among all combinations of Sentinel-1A-derived features, the combination of the four features had the best classification result. Multi-sensor urban land cover mapping obtained higher classification accuracy. The combination of Sentinel-1A and Hyperion data achieved higher classification accuracy compared to the combination of Sentinel-1A and Landsat-8 OLI images, with an overall accuracy of up to 99.12% and a kappa coefficient up to 0.9889. When Sentinel-1A data was added to Hyperion images, the overall accuracy and kappa coefficient were increased by 4.01% and 0.0519, respectively.
TermGenie – a web-application for pattern-based ontology class generation
Dietze, Heiko; Berardini, Tanya Z.; Foulger, Rebecca E.; ...
2014-01-01
Biological ontologies are continually growing and improving from requests for new classes (terms) by biocurators. These ontology requests can frequently create bottlenecks in the biocuration process, as ontology developers struggle to keep up, while manually processing these requests and create classes. TermGenie allows biocurators to generate new classes based on formally specified design patterns or templates. The system is web-based and can be accessed by any authorized curator through a web browser. Automated rules and reasoning engines are used to ensure validity, uniqueness and relationship to pre-existing classes. In the last 4 years the Gene Ontology TermGenie generated 4715 newmore » classes, about 51.4% of all new classes created. The immediate generation of permanent identifiers proved not to be an issue with only 70 (1.4%) obsoleted classes. Lastly, TermGenie is a web-based class-generation system that complements traditional ontology development tools. All classes added through pre-defined templates are guaranteed to have OWL equivalence axioms that are used for automatic classification and in some cases inter-ontology linkage. At the same time, the system is simple and intuitive and can be used by most biocurators without extensive training.« less
TermGenie – a web-application for pattern-based ontology class generation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dietze, Heiko; Berardini, Tanya Z.; Foulger, Rebecca E.
Biological ontologies are continually growing and improving from requests for new classes (terms) by biocurators. These ontology requests can frequently create bottlenecks in the biocuration process, as ontology developers struggle to keep up, while manually processing these requests and create classes. TermGenie allows biocurators to generate new classes based on formally specified design patterns or templates. The system is web-based and can be accessed by any authorized curator through a web browser. Automated rules and reasoning engines are used to ensure validity, uniqueness and relationship to pre-existing classes. In the last 4 years the Gene Ontology TermGenie generated 4715 newmore » classes, about 51.4% of all new classes created. The immediate generation of permanent identifiers proved not to be an issue with only 70 (1.4%) obsoleted classes. Lastly, TermGenie is a web-based class-generation system that complements traditional ontology development tools. All classes added through pre-defined templates are guaranteed to have OWL equivalence axioms that are used for automatic classification and in some cases inter-ontology linkage. At the same time, the system is simple and intuitive and can be used by most biocurators without extensive training.« less
TermGenie - a web-application for pattern-based ontology class generation.
Dietze, Heiko; Berardini, Tanya Z; Foulger, Rebecca E; Hill, David P; Lomax, Jane; Osumi-Sutherland, David; Roncaglia, Paola; Mungall, Christopher J
2014-01-01
Biological ontologies are continually growing and improving from requests for new classes (terms) by biocurators. These ontology requests can frequently create bottlenecks in the biocuration process, as ontology developers struggle to keep up, while manually processing these requests and create classes. TermGenie allows biocurators to generate new classes based on formally specified design patterns or templates. The system is web-based and can be accessed by any authorized curator through a web browser. Automated rules and reasoning engines are used to ensure validity, uniqueness and relationship to pre-existing classes. In the last 4 years the Gene Ontology TermGenie generated 4715 new classes, about 51.4% of all new classes created. The immediate generation of permanent identifiers proved not to be an issue with only 70 (1.4%) obsoleted classes. TermGenie is a web-based class-generation system that complements traditional ontology development tools. All classes added through pre-defined templates are guaranteed to have OWL equivalence axioms that are used for automatic classification and in some cases inter-ontology linkage. At the same time, the system is simple and intuitive and can be used by most biocurators without extensive training.
Sauwen, N; Acou, M; Van Cauter, S; Sima, D M; Veraart, J; Maes, F; Himmelreich, U; Achten, E; Van Huffel, S
2016-01-01
Tumor segmentation is a particularly challenging task in high-grade gliomas (HGGs), as they are among the most heterogeneous tumors in oncology. An accurate delineation of the lesion and its main subcomponents contributes to optimal treatment planning, prognosis and follow-up. Conventional MRI (cMRI) is the imaging modality of choice for manual segmentation, and is also considered in the vast majority of automated segmentation studies. Advanced MRI modalities such as perfusion-weighted imaging (PWI), diffusion-weighted imaging (DWI) and magnetic resonance spectroscopic imaging (MRSI) have already shown their added value in tumor tissue characterization, hence there have been recent suggestions of combining different MRI modalities into a multi-parametric MRI (MP-MRI) approach for brain tumor segmentation. In this paper, we compare the performance of several unsupervised classification methods for HGG segmentation based on MP-MRI data including cMRI, DWI, MRSI and PWI. Two independent MP-MRI datasets with a different acquisition protocol were available from different hospitals. We demonstrate that a hierarchical non-negative matrix factorization variant which was previously introduced for MP-MRI tumor segmentation gives the best performance in terms of mean Dice-scores for the pathologic tissue classes on both datasets.
NASA Astrophysics Data System (ADS)
Heremans, Stien; Suykens, Johan A. K.; Van Orshoven, Jos
2016-02-01
To be physically interpretable, sub-pixel land cover fractions or abundances should fulfill two constraints, the Abundance Non-negativity Constraint (ANC) and the Abundance Sum-to-one Constraint (ASC). This paper focuses on the effect of imposing these constraints onto the MultiLayer Perceptron (MLP) for a multi-class sub-pixel land cover classification of a time series of low resolution MODIS-images covering the northern part of Belgium. Two constraining modes were compared, (i) an in-training approach that uses 'softmax' as the transfer function in the MLP's output layer and (ii) a post-training approach that linearly rescales the outputs of the unconstrained MLP. Our results demonstrate that the pixel-level prediction accuracy is markedly increased by the explicit enforcement, both in-training and post-training, of the ANC and the ASC. For aggregations of pixels (municipalities), the constrained perceptrons perform at least as well as their unconstrained counterparts. Although the difference in performance between the in-training and post-training approach is small, we recommend the former for integrating the fractional abundance constraints into MLPs meant for sub-pixel land cover estimation, regardless of the targeted level of spatial aggregation.
Deep multi-scale convolutional neural network for hyperspectral image classification
NASA Astrophysics Data System (ADS)
Zhang, Feng-zhe; Yang, Xia
2018-04-01
In this paper, we proposed a multi-scale convolutional neural network for hyperspectral image classification task. Firstly, compared with conventional convolution, we utilize multi-scale convolutions, which possess larger respective fields, to extract spectral features of hyperspectral image. We design a deep neural network with a multi-scale convolution layer which contains 3 different convolution kernel sizes. Secondly, to avoid overfitting of deep neural network, dropout is utilized, which randomly sleeps neurons, contributing to improve the classification accuracy a bit. In addition, new skills like ReLU in deep learning is utilized in this paper. We conduct experiments on University of Pavia and Salinas datasets, and obtained better classification accuracy compared with other methods.
Automatic classification of time-variable X-ray sources
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lo, Kitty K.; Farrell, Sean; Murphy, Tara
2014-05-01
To maximize the discovery potential of future synoptic surveys, especially in the field of transient science, it will be necessary to use automatic classification to identify some of the astronomical sources. The data mining technique of supervised classification is suitable for this problem. Here, we present a supervised learning method to automatically classify variable X-ray sources in the Second XMM-Newton Serendipitous Source Catalog (2XMMi-DR2). Random Forest is our classifier of choice since it is one of the most accurate learning algorithms available. Our training set consists of 873 variable sources and their features are derived from time series, spectra, andmore » other multi-wavelength contextual information. The 10 fold cross validation accuracy of the training data is ∼97% on a 7 class data set. We applied the trained classification model to 411 unknown variable 2XMM sources to produce a probabilistically classified catalog. Using the classification margin and the Random Forest derived outlier measure, we identified 12 anomalous sources, of which 2XMM J180658.7–500250 appears to be the most unusual source in the sample. Its X-ray spectra is suggestive of a ultraluminous X-ray source but its variability makes it highly unusual. Machine-learned classification and anomaly detection will facilitate scientific discoveries in the era of all-sky surveys.« less
ROOT: A C++ framework for petabyte data storage, statistical analysis and visualization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Antcheva, I.; /CERN; Ballintijn, M.
2009-01-01
ROOT is an object-oriented C++ framework conceived in the high-energy physics (HEP) community, designed for storing and analyzing petabytes of data in an efficient way. Any instance of a C++ class can be stored into a ROOT file in a machine-independent compressed binary format. In ROOT the TTree object container is optimized for statistical data analysis over very large data sets by using vertical data storage techniques. These containers can span a large number of files on local disks, the web or a number of different shared file systems. In order to analyze this data, the user can chose outmore » of a wide set of mathematical and statistical functions, including linear algebra classes, numerical algorithms such as integration and minimization, and various methods for performing regression analysis (fitting). In particular, the RooFit package allows the user to perform complex data modeling and fitting while the RooStats library provides abstractions and implementations for advanced statistical tools. Multivariate classification methods based on machine learning techniques are available via the TMVA package. A central piece in these analysis tools are the histogram classes which provide binning of one- and multi-dimensional data. Results can be saved in high-quality graphical formats like Postscript and PDF or in bitmap formats like JPG or GIF. The result can also be stored into ROOT macros that allow a full recreation and rework of the graphics. Users typically create their analysis macros step by step, making use of the interactive C++ interpreter CINT, while running over small data samples. Once the development is finished, they can run these macros at full compiled speed over large data sets, using on-the-fly compilation, or by creating a stand-alone batch program. Finally, if processing farms are available, the user can reduce the execution time of intrinsically parallel tasks - e.g. data mining in HEP - by using PROOF, which will take care of optimally distributing the work over the available resources in a transparent way.« less
Morison, Gordon; Boreham, Philip
2018-01-01
Electromagnetic Interference (EMI) is a technique for capturing Partial Discharge (PD) signals in High-Voltage (HV) power plant apparatus. EMI signals can be non-stationary which makes their analysis difficult, particularly for pattern recognition applications. This paper elaborates upon a previously developed software condition-monitoring model for improved EMI events classification based on time-frequency signal decomposition and entropy features. The idea of the proposed method is to map multiple discharge source signals captured by EMI and labelled by experts, including PD, from the time domain to a feature space, which aids in the interpretation of subsequent fault information. Here, instead of using only one permutation entropy measure, a more robust measure, called Dispersion Entropy (DE), is added to the feature vector. Multi-Class Support Vector Machine (MCSVM) methods are utilized for classification of the different discharge sources. Results show an improved classification accuracy compared to previously proposed methods. This yields to a successful development of an expert’s knowledge-based intelligent system. Since this method is demonstrated to be successful with real field data, it brings the benefit of possible real-world application for EMI condition monitoring. PMID:29385030
A Novel Fiber Optic Based Surveillance System for Prevention of Pipeline Integrity Threats.
Tejedor, Javier; Macias-Guarasa, Javier; Martins, Hugo F; Piote, Daniel; Pastor-Graells, Juan; Martin-Lopez, Sonia; Corredera, Pedro; Gonzalez-Herraez, Miguel
2017-02-12
This paper presents a novel surveillance system aimed at the detection and classification of threats in the vicinity of a long gas pipeline. The sensing system is based on phase-sensitive optical time domain reflectometry ( ϕ -OTDR) technology for signal acquisition and pattern recognition strategies for threat identification. The proposal incorporates contextual information at the feature level and applies a system combination strategy for pattern classification. The contextual information at the feature level is based on the tandem approach (using feature representations produced by discriminatively-trained multi-layer perceptrons) by employing feature vectors that spread different temporal contexts. The system combination strategy is based on a posterior combination of likelihoods computed from different pattern classification processes. The system operates in two different modes: (1) machine + activity identification, which recognizes the activity being carried out by a certain machine, and (2) threat detection, aimed at detecting threats no matter what the real activity being conducted is. In comparison with a previous system based on the same rigorous experimental setup, the results show that the system combination from the contextual feature information improves the results for each individual class in both operational modes, as well as the overall classification accuracy, with statistically-significant improvements.
Mtui, Devolent T.; Lepczyk, Christopher A.; Chen, Qi; Miura, Tomoaki; Cox, Linda J.
2017-01-01
Landscape change in and around protected areas is of concern worldwide given the potential impacts of such change on biodiversity. Given such impacts, we sought to understand the extent of changes in different land-cover types at two protected areas, Tarangire and Katavi National Parks in Tanzania, over the past 27 years. Using Maximum Likelihood classification procedures we derived eight land-cover classes from Landsat TM and ETM+ images, including: woody savannah, savannah, grassland, open and closed shrubland, swamp and water, and bare land. We determined the extent and direction of changes for all land-cover classes using a post-classification comparison technique. The results show declines in woody savannah and increases in barren land and swamps inside and outside Tarangire National Park and increases in woody savannah and savannah, and declines of shrubland and grassland inside and outside Katavi National Park. The decrease of woody savannah was partially due to its conversion into grassland and barren land, possibly caused by human encroachment by cultivation and livestock. Based upon these changes, we recommend management actions to prevent detrimental effects on wildlife populations. PMID:28957397
Real-time Automatic Search for Multi-wavelength Counterparts of DWF Transients
NASA Astrophysics Data System (ADS)
Murphy, Christopher; Cucchiara, Antonino; Andreoni, Igor; Cooke, Jeff; Hegarty, Sarah
2018-01-01
The Deeper Wider Faster (DWF) survey aims to find and classify the fastest transients in the Universe. DWF utilizes the Dark Energy Camera (DECam), collecting a continuous sequence of 20s images over a 3 square degree field of view.Once an interesting transient is detected during DWF observations, the DWF collaboration has access to several facilities for rapid follow-up in multiple wavelengths (from gamma to radio).An online web tool has been designed to help with real-time visual classification of possible astrophysical transients in data collected by the DWF observing program. The goal of this project is to create a python-based code to improve the classification process by querying several existing archive databases. Given the DWF transient location and search radius, the developed code will extract a list of possible counterparts and all available information (e.g. magnitude, radio fluxes, distance separation).Thanks to this tool, the human classifier can make a quicker decision in order to trigger the collaboration rapid-response resources.
46 CFR 8.430 - U.S. Supplement to class rules.
Code of Federal Regulations, 2011 CFR
2011-10-01
... authorization to participate in the ACP, a recognized classification society must prepare, and receive Commandant (CG-521) approval of, a U.S. Supplement to the recognized classification society's class rules... of that classification society or applicable international regulations. ...
46 CFR 8.430 - U.S. Supplement to class rules.
Code of Federal Regulations, 2010 CFR
2010-10-01
... authorization to participate in the ACP, a recognized classification society must prepare, and receive Commandant (CG-521) approval of, a U.S. Supplement to the recognized classification society's class rules... of that classification society or applicable international regulations. ...
Interactive lesion segmentation on dynamic contrast enhanced breast MRI using a Markov model
NASA Astrophysics Data System (ADS)
Wu, Qiu; Salganicoff, Marcos; Krishnan, Arun; Fussell, Donald S.; Markey, Mia K.
2006-03-01
The purpose of this study is to develop a method for segmenting lesions on Dynamic Contrast-Enhanced (DCE) breast MRI. DCE breast MRI, in which the breast is imaged before, during, and after the administration of a contrast agent, enables a truly 3D examination of breast tissues. This functional angiogenic imaging technique provides noninvasive assessment of microcirculatory characteristics of tissues in addition to traditional anatomical structure information. Since morphological features and kinetic curves from segmented lesions are to be used for diagnosis and treatment decisions, lesion segmentation is a key pre-processing step for classification. In our study, the ROI is defined by a bounding box containing the enhancement region in the subtraction image, which is generated by subtracting the pre-contrast image from 1st post-contrast image. A maximum a posteriori (MAP) estimate of the class membership (lesion vs. non-lesion) for each voxel is obtained using the Iterative Conditional Mode (ICM) method. The prior distribution of the class membership is modeled as a multi-level logistic model, a Markov Random Field model in which the class membership of each voxel is assumed to depend upon its nearest neighbors only. The likelihood distribution is assumed to be Gaussian. The parameters of each Gaussian distribution are estimated from a dozen voxels manually selected as representative of the class. The experimental segmentation results demonstrate anatomically plausible breast tissue segmentation and the predicted class membership of voxels from the interactive segmentation algorithm agrees with the manual classifications made by inspection of the kinetic enhancement curves. The proposed method is advantageous in that it is efficient, flexible, and robust.
AutoFACT: An Automatic Functional Annotation and Classification Tool
Koski, Liisa B; Gray, Michael W; Lang, B Franz; Burger, Gertraud
2005-01-01
Background Assignment of function to new molecular sequence data is an essential step in genomics projects. The usual process involves similarity searches of a given sequence against one or more databases, an arduous process for large datasets. Results We present AutoFACT, a fully automated and customizable annotation tool that assigns biologically informative functions to a sequence. Key features of this tool are that it (1) analyzes nucleotide and protein sequence data; (2) determines the most informative functional description by combining multiple BLAST reports from several user-selected databases; (3) assigns putative metabolic pathways, functional classes, enzyme classes, GeneOntology terms and locus names; and (4) generates output in HTML, text and GFF formats for the user's convenience. We have compared AutoFACT to four well-established annotation pipelines. The error rate of functional annotation is estimated to be only between 1–2%. Comparison of AutoFACT to the traditional top-BLAST-hit annotation method shows that our procedure increases the number of functionally informative annotations by approximately 50%. Conclusion AutoFACT will serve as a useful annotation tool for smaller sequencing groups lacking dedicated bioinformatics staff. It is implemented in PERL and runs on LINUX/UNIX platforms. AutoFACT is available at . PMID:15960857
A machine learning pipeline for automated registration and classification of 3D lidar data
NASA Astrophysics Data System (ADS)
Rajagopal, Abhejit; Chellappan, Karthik; Chandrasekaran, Shivkumar; Brown, Andrew P.
2017-05-01
Despite the large availability of geospatial data, registration and exploitation of these datasets remains a persis- tent challenge in geoinformatics. Popular signal processing and machine learning algorithms, such as non-linear SVMs and neural networks, rely on well-formatted input models as well as reliable output labels, which are not always immediately available. In this paper we outline a pipeline for gathering, registering, and classifying initially unlabeled wide-area geospatial data. As an illustrative example, we demonstrate the training and test- ing of a convolutional neural network to recognize 3D models in the OGRIP 2007 LiDAR dataset using fuzzy labels derived from OpenStreetMap as well as other datasets available on OpenTopography.org. When auxiliary label information is required, various text and natural language processing filters are used to extract and cluster keywords useful for identifying potential target classes. A subset of these keywords are subsequently used to form multi-class labels, with no assumption of independence. Finally, we employ class-dependent geometry extraction routines to identify candidates from both training and testing datasets. Our regression networks are able to identify the presence of 6 structural classes, including roads, walls, and buildings, in volumes as big as 8000 m3 in as little as 1.2 seconds on a commodity 4-core Intel CPU. The presented framework is neither dataset nor sensor-modality limited due to the registration process, and is capable of multi-sensor data-fusion.
Fernández-López, Juan Antonio; Fernández-Fidalgo, María; Geoffrey, Reed; Stucki, Gerold; Cieza, Alarcos
2009-01-01
The World Health Organization's International Classification of Functioning, Disability and Health (ICF) has provided a new foundation for our understanding of health, functioning, and disability. It covers most of the health and health-related domains that make up the human experience, and the most environmental factors that influence that experience of functioning and disability. With the exhaustive ICF, patients' functioning -including its components body functions and structures and activities and participation-, becomes a central perspective in medicine. To implement the ICF in medicine and other fields, practical tools (= ICF Core Sets) have been developed. They are selected sets of categories out of the whole classification which serve as minimal standards for the assessment and reporting of functioning and health for clinical studies and clinical encounters (Brief ICF Core Set) or as standards for multiprofessional comprehensive assessment (Comprehensive ICF Core Set). Different from generic and condition-specific health-status measures, the ICF Core Sets include important body functions and structures and contextual factors. The use of the ICF Core Sets provides an important step towards improved communications between healthcare providers and professionals, and will enable patients and their families to understand and communicate with health professionals about their functioning and treatment goals. Specific applications include multi- and interdisciplinary assessment in clinical settings and in legal expert evaluations and use in disease or functioning-management programs. The ICF has also a potential as a conceptual framework to clarify an interrelated universe of health-related concepts which can be elucidated based on the ICF and therefore will be an ideal tool for teaching students in all medical fields and may open doors to multi-professional learning.
EnzML: multi-label prediction of enzyme classes using InterPro signatures
2012-01-01
Background Manual annotation of enzymatic functions cannot keep up with automatic genome sequencing. In this work we explore the capacity of InterPro sequence signatures to automatically predict enzymatic function. Results We present EnzML, a multi-label classification method that can efficiently account also for proteins with multiple enzymatic functions: 50,000 in UniProt. EnzML was evaluated using a standard set of 300,747 proteins for which the manually curated Swiss-Prot and KEGG databases have agreeing Enzyme Commission (EC) annotations. EnzML achieved more than 98% subset accuracy (exact match of all correct Enzyme Commission classes of a protein) for the entire dataset and between 87 and 97% subset accuracy in reannotating eight entire proteomes: human, mouse, rat, mouse-ear cress, fruit fly, the S. pombe yeast, the E. coli bacterium and the M. jannaschii archaebacterium. To understand the role played by the dataset size, we compared the cross-evaluation results of smaller datasets, either constructed at random or from specific taxonomic domains such as archaea, bacteria, fungi, invertebrates, plants and vertebrates. The results were confirmed even when the redundancy in the dataset was reduced using UniRef100, UniRef90 or UniRef50 clusters. Conclusions InterPro signatures are a compact and powerful attribute space for the prediction of enzymatic function. This representation makes multi-label machine learning feasible in reasonable time (30 minutes to train on 300,747 instances with 10,852 attributes and 2,201 class values) using the Mulan Binary Relevance Nearest Neighbours algorithm implementation (BR-kNN). PMID:22533924
Wavelet packets for multi- and hyper-spectral imagery
NASA Astrophysics Data System (ADS)
Benedetto, J. J.; Czaja, W.; Ehler, M.; Flake, C.; Hirn, M.
2010-01-01
State of the art dimension reduction and classification schemes in multi- and hyper-spectral imaging rely primarily on the information contained in the spectral component. To better capture the joint spatial and spectral data distribution we combine the Wavelet Packet Transform with the linear dimension reduction method of Principal Component Analysis. Each spectral band is decomposed by means of the Wavelet Packet Transform and we consider a joint entropy across all the spectral bands as a tool to exploit the spatial information. Dimension reduction is then applied to the Wavelet Packets coefficients. We present examples of this technique for hyper-spectral satellite imaging. We also investigate the role of various shrinkage techniques to model non-linearity in our approach.
Song, Weiran; Wang, Hui; Maguire, Paul; Nibouche, Omar
2018-06-07
Partial Least Squares Discriminant Analysis (PLS-DA) is one of the most effective multivariate analysis methods for spectral data analysis, which extracts latent variables and uses them to predict responses. In particular, it is an effective method for handling high-dimensional and collinear spectral data. However, PLS-DA does not explicitly address data multimodality, i.e., within-class multimodal distribution of data. In this paper, we present a novel method termed nearest clusters based PLS-DA (NCPLS-DA) for addressing the multimodality and nonlinearity issues explicitly and improving the performance of PLS-DA on spectral data classification. The new method applies hierarchical clustering to divide samples into clusters and calculates the corresponding centre of every cluster. For a given query point, only clusters whose centres are nearest to such a query point are used for PLS-DA. Such a method can provide a simple and effective tool for separating multimodal and nonlinear classes into clusters which are locally linear and unimodal. Experimental results on 17 datasets, including 12 UCI and 5 spectral datasets, show that NCPLS-DA can outperform 4 baseline methods, namely, PLS-DA, kernel PLS-DA, local PLS-DA and k-NN, achieving the highest classification accuracy most of the time. Copyright © 2018 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Gautam, Nitin
The main objectives of this thesis are to develop a robust statistical method for the classification of ocean precipitation based on physical properties to which the SSM/I is sensitive and to examine how these properties vary globally and seasonally. A two step approach is adopted for the classification of oceanic precipitation classes from multispectral SSM/I data: (1)we subjectively define precipitation classes using a priori information about the precipitating system and its possible distinct signature on SSM/I data such as scattering by ice particles aloft in the precipitating cloud, emission by liquid rain water below freezing level, the difference of polarization at 19 GHz-an indirect measure of optical depth, etc.; (2)we then develop an objective classification scheme which is found to reproduce the subjective classification with high accuracy. This hybrid strategy allows us to use the characteristics of the data to define and encode classes and helps retain the physical interpretation of classes. The classification methods based on k-nearest neighbor and neural network are developed to objectively classify six precipitation classes. It is found that the classification method based neural network yields high accuracy for all precipitation classes. An inversion method based on minimum variance approach was used to retrieve gross microphysical properties of these precipitation classes such as column integrated liquid water path, column integrated ice water path, and column integrated min water path. This classification method is then applied to 2 years (1991-92) of SSM/I data to examine and document the seasonal and global distribution of precipitation frequency corresponding to each of these objectively defined six classes. The characteristics of the distribution are found to be consistent with assumptions used in defining these six precipitation classes and also with well known climatological patterns of precipitation regions. The seasonal and global distribution of these six classes is also compared with the earlier results obtained from Comprehensive Ocean Atmosphere Data Sets (COADS). It is found that the gross pattern of the distributions obtained from SSM/I and COADS data match remarkably well with each other.
The 'Soil Cover App' - a new tool for fast determination of dead and living biomass on soil
NASA Astrophysics Data System (ADS)
Bauer, Thomas; Strauss, Peter; Riegler-Nurscher, Peter; Prankl, Johann; Prankl, Heinrich
2017-04-01
Worldwide many agricultural practices aim on soil protection strategies using living or dead biomass as soil cover. Especially for the case when management practices are focusing on soil erosion mitigation the effectiveness of these practices is directly driven by the amount of soil coverleft on the soil surface. Hence there is a need for quick and reliable methods of soil cover estimation not only for living biomass but particularly for dead biomass (mulch). Available methods for the soil cover measurement are either subjective, depending on an educated guess or time consuming, e.g., if the image is analysed manually at grid points. We therefore developed a mobile application using an algorithm based on entangled forest classification. The final output of the algorithm gives classified labels for each pixel of the input image as well as the percentage of each class which are living biomass, dead biomass, stones and soil. Our training dataset consisted of more than 250 different images and their annotated class information. Images have been taken in a set of different environmental conditions such as light, soil coverages from between 0% to 100%, different materials such as living plants, residues, straw material and stones. We compared the results provided by our mobile application with a data set of 180 images that had been manually annotated A comparison between both methods revealed a regression slope of 0.964 with a coefficient of determination R2 = 0.92, corresponding to an average error of about 4%. While average error of living plant classification was about 3%, dead residue classification resulted in an 8% error. Thus the new mobile application tool offers a fast and easy way to obtain information on the protective potential of a particular agricultural management site.
NASA Astrophysics Data System (ADS)
Coopersmith, Evan Joseph
The techniques and information employed for decision-making vary with the spatial and temporal scope of the assessment required. In modern agriculture, the farm owner or manager makes decisions on a day-to-day or even hour-to-hour basis for dozens of fields scattered over as much as a fifty-mile radius from some central location. Following precipitation events, land begins to dry. Land-owners and managers often trace serpentine paths of 150+ miles every morning to inspect the conditions of their various parcels. His or her objective lies in appropriate resource usage -- is a given tract of land dry enough to be workable at this moment or would he or she be better served waiting patiently? Longer-term, these owners and managers decide upon which seeds will grow most effectively and which crops will make their operations profitable. At even longer temporal scales, decisions are made regarding which fields must be acquired and sold and what types of equipment will be necessary in future operations. This work develops and validates algorithms for these shorter-term decisions, along with models of national climate patterns and climate changes to enable longer-term operational planning. A test site at the University of Illinois South Farms (Urbana, IL, USA) served as the primary location to validate machine learning algorithms, employing public sources of precipitation and potential evapotranspiration to model the wetting/drying process. In expanding such local decision support tools to locations on a national scale, one must recognize the heterogeneity of hydroclimatic and soil characteristics throughout the United States. Machine learning algorithms modeling the wetting/drying process must address this variability, and yet it is wholly impractical to construct a separate algorithm for every conceivable location. For this reason, a national hydrological classification system is presented, allowing clusters of hydroclimatic similarity to emerge naturally from annual regime curve data and facilitate the development of cluster-specific algorithms. Given the desire to enable intelligent decision-making at any location, this classification system is developed in a manner that will allow for classification anywhere in the U.S., even in an ungauged basin. Daily time series data from 428 catchments in the MOPEX database are analyzed to produce an empirical classification tree, partitioning the United States into regions of hydroclimatic similarity. In constructing a classification tree based upon 55 years of data, it is important to recognize the non-stationary nature of climate data. The shifts in climatic regimes will cause certain locations to shift their ultimate position within the classification tree, requiring decision-makers to alter land usage, farming practices, and equipment needs, and algorithms to adjust accordingly. This work adapts the classification model to address the issue of regime shifts over larger temporal scales and suggests how land-usage and farming protocol may vary from hydroclimatic shifts in decades to come. Finally, the generalizability of the hydroclimatic classification system is tested with a physically-based soil moisture model calibrated at several locations throughout the continental United States. The soil moisture model is calibrated at a given site and then applied with the same parameters at other sites within and outside the same hydroclimatic class. The model's performance deteriorates minimally if the calibration and validation location are within the same hydroclimatic class, but deteriorates significantly if the calibration and validates sites are located in different hydroclimatic classes. These soil moisture estimates at the field scale are then further refined by the introduction of LiDAR elevation data, distinguishing faster-drying peaks and ridges from slower-drying valleys. The inclusion of LiDAR enabled multiple locations within the same field to be predicted accurately despite non-identical topography. This cross-application of parametric calibrations and LiDAR-driven disaggregation facilitates decision-support at locations without proximally-located soil moisture sensors.
NASA Astrophysics Data System (ADS)
Wei, Hongqiang; Zhou, Guiyun; Zhou, Junjie
2018-04-01
The classification of leaf and wood points is an essential preprocessing step for extracting inventory measurements and canopy characterization of trees from the terrestrial laser scanning (TLS) data. The geometry-based approach is one of the widely used classification method. In the geometry-based method, it is common practice to extract salient features at one single scale before the features are used for classification. It remains unclear how different scale(s) used affect the classification accuracy and efficiency. To assess the scale effect on the classification accuracy and efficiency, we extracted the single-scale and multi-scale salient features from the point clouds of two oak trees of different sizes and conducted the classification on leaf and wood. Our experimental results show that the balanced accuracy of the multi-scale method is higher than the average balanced accuracy of the single-scale method by about 10 % for both trees. The average speed-up ratio of single scale classifiers over multi-scale classifier for each tree is higher than 30.
7 CFR 27.39 - Issuance of certificates.
Code of Federal Regulations, 2010 CFR
2010-01-01
... REGULATIONS COTTON CLASSIFICATION UNDER COTTON FUTURES LEGISLATION Regulations Cotton Class Certificates § 27... after the classification of cotton has been completed by a Marketing Services Office, the Marketing Services Office shall issue a cotton class certificate showing the results of such classification. Each...
7 CFR 27.39 - Issuance of certificates.
Code of Federal Regulations, 2012 CFR
2012-01-01
... REGULATIONS COTTON CLASSIFICATION UNDER COTTON FUTURES LEGISLATION Regulations Cotton Class Certificates § 27... after the classification of cotton has been completed by a Marketing Services Office, the Marketing Services Office shall issue a cotton class certificate showing the results of such classification. Each...
7 CFR 27.39 - Issuance of certificates.
Code of Federal Regulations, 2011 CFR
2011-01-01
... REGULATIONS COTTON CLASSIFICATION UNDER COTTON FUTURES LEGISLATION Regulations Cotton Class Certificates § 27... after the classification of cotton has been completed by a Marketing Services Office, the Marketing Services Office shall issue a cotton class certificate showing the results of such classification. Each...
Jiménez-Carvelo, Ana M; Pérez-Castaño, Estefanía; González-Casado, Antonio; Cuadros-Rodríguez, Luis
2017-04-15
A new method for differentiation of olive oil (independently of the quality category) from other vegetable oils (canola, safflower, corn, peanut, seeds, grapeseed, palm, linseed, sesame and soybean) has been developed. The analytical procedure for chromatographic fingerprinting of the methyl-transesterified fraction of each vegetable oil, using normal-phase liquid chromatography, is described and the chemometric strategies applied and discussed. Some chemometric methods, such as k-nearest neighbours (kNN), partial least squared-discriminant analysis (PLS-DA), support vector machine classification analysis (SVM-C), and soft independent modelling of class analogies (SIMCA), were applied to build classification models. Performance of the classification was evaluated and ranked using several classification quality metrics. The discriminant analysis, based on the use of one input-class, (plus a dummy class) was applied for the first time in this study. Copyright © 2016 Elsevier Ltd. All rights reserved.
Multiple Signal Classification for Gravitational Wave Burst Search
NASA Astrophysics Data System (ADS)
Cao, Junwei; He, Zhengqi
2013-01-01
This work is mainly focused on the application of the multiple signal classification (MUSIC) algorithm for gravitational wave burst search. This algorithm extracts important gravitational wave characteristics from signals coming from detectors with arbitrary position, orientation and noise covariance. In this paper, the MUSIC algorithm is described in detail along with the necessary adjustments required for gravitational wave burst search. The algorithm's performance is measured using simulated signals and noise. MUSIC is compared with the Q-transform for signal triggering and with Bayesian analysis for direction of arrival (DOA) estimation, using the Ω-pipeline. Experimental results show that MUSIC has a lower resolution but is faster. MUSIC is a promising tool for real-time gravitational wave search for multi-messenger astronomy.
Epigenetic Patterns of PTSD: DNA Methylation In Serum of OIF/OEF Servicemembers
2011-01-01
ascertained via query of the International Classification of Diseases , 9th Revision (ICD-9) codes 290-320. To attempt to control for confounding by other...other CNS tissues is not clear. Although relevant to a different class of disease , many of the aberrations that have been detected in the DNA of...valuable diagnostic tool in various diseases . (50-53) Compared with cultured cells, clinical specimens, such as whole blood, serum, and even brain
Classification of asteroid spectra using a neural network
NASA Technical Reports Server (NTRS)
Howell, E. S.; Merenyi, E.; Lebofsky, L. A.
1994-01-01
The 52-color asteroid survey (Bell et al., 1988) together with the 8-color asteroid survey (Zellner et al., 1985) provide a data set of asteroid spectra spanning 0.3-2.5 micrometers. An artificial neural network clusters these asteroid spectra based on their similarity to each other. We have also trained the neural network with a categorization learning output layer in a supervised mode to associate the established clusters with taxonomic classes. Results of our classification agree with Tholen's classification based on the 8-color data alone. When extending the spectral range using the 52-color survey data, we find that some modification of the Tholen classes is indicated to produce a cleaner, self-consistent set of taxonomic classes. After supervised training using our modified classes, the network correctly classifies both the training examples, and additional spectra into the correct class with an average of 90% accuracy. Our classification supports the separation of the K class from the S class, as suggested by Bell et al. (1987), based on the near-infrared spectrum. We define two end-member subclasses which seem to have compositional significance within the S class: the So class, which is olivine-rich and red, and the Sp class, which is pyroxene-rich and less red. The remaining S-class asteroids have intermediate compositions of both olivine and pyroxene and moderately red continua. The network clustering suggests some additional structure within the E-, M-, and P-class asteroids, even in the absence of albedo information, which is the only discriminant between these in the Tholen classification. New relationships are seen between the C class and related G, B, and F classes. However, in both cases, the number of spectra is too small to interpret or determine the significance of these separations.
NASA Astrophysics Data System (ADS)
Verma, A. K.; Garg, P. K.; Prasad, K. S. H.; Dadhwal, V. K.
2016-12-01
Agriculture is a backbone of Indian economy, providing livelihood to about 70% of the population. The primary objective of this research is to investigate the general applicability of time-series MODIS 250m Normalized difference vegetation index (NDVI) and Enhanced vegetation index (EVI) data for various Land use/Land cover (LULC) classification. The other objective is the retrieval of crop biophysical parameter using MODIS 250m resolution data. The Uttar Pradesh state of India is selected for this research work. A field study of 38 farms was conducted during entire crop season of the year 2015 to evaluate the applicability of MODIS 8-day, 250m resolution composite images for assessment of crop condition. The spectroradiometer is used for ground reflectance and the AccuPAR LP-80 Ceptometer is used to measure the agricultural crops Leaf Area Index (LAI). The AccuPAR measures Photosynthetically Active Radiation (PAR) and can invert these readings to give LAI for plant canopy. Ground-based canopy reflectance and LAI were used to calibrate a radiative transfer model to create look-up table (LUT) that was used to simulate LAI. The seasonal trend of MODIS-derived LAI was used to find crop parameter by adjusting the LAI simulated from climate-based crop yield model. Cloud free MODIS images of 250m resolution (16 day composite period) were downloaded using LP-DAAC website over a period of 12 months (Jan to Dec 2015). MODIS both the VI products were found to have sufficient spectral, spatial and temporal resolution to detect unique signatures for each class (water, fallow land, urban, dense vegetation, orchard, sugarcane and other crops). Ground truth data were collected using JUNO GPS. Multi-temporal VI signatures for vegetation classes were consistent with its general phenological characteristic and were spectrally separable at some point during the growing season. The MODIS NDVI and EVI multi-temporal images tracked similar seasonal responses for all croplands and were highly correlated across the growing season. The confusion matrix method is used for accuracy assessment and reference data which has been taken during the field visit. Total 520 pixels have been selected for various classes to determine the accuracy. The classification accuracy and kappa coefficient is found to be 79.76% and 0.78 respectively.
Mediterranean Land Use and Land Cover Classification Assessment Using High Spatial Resolution Data
NASA Astrophysics Data System (ADS)
Elhag, Mohamed; Boteva, Silvena
2016-10-01
Landscape fragmentation is noticeably practiced in Mediterranean regions and imposes substantial complications in several satellite image classification methods. To some extent, high spatial resolution data were able to overcome such complications. For better classification performances in Land Use Land Cover (LULC) mapping, the current research adopts different classification methods comparison for LULC mapping using Sentinel-2 satellite as a source of high spatial resolution. Both of pixel-based and an object-based classification algorithms were assessed; the pixel-based approach employs Maximum Likelihood (ML), Artificial Neural Network (ANN) algorithms, Support Vector Machine (SVM), and, the object-based classification uses the Nearest Neighbour (NN) classifier. Stratified Masking Process (SMP) that integrates a ranking process within the classes based on spectral fluctuation of the sum of the training and testing sites was implemented. An analysis of the overall and individual accuracy of the classification results of all four methods reveals that the SVM classifier was the most efficient overall by distinguishing most of the classes with the highest accuracy. NN succeeded to deal with artificial surface classes in general while agriculture area classes, and forest and semi-natural area classes were segregated successfully with SVM. Furthermore, a comparative analysis indicates that the conventional classification method yielded better accuracy results than the SMP method overall with both classifiers used, ML and SVM.
On multi-site damage identification using single-site training data
NASA Astrophysics Data System (ADS)
Barthorpe, R. J.; Manson, G.; Worden, K.
2017-11-01
This paper proposes a methodology for developing multi-site damage location systems for engineering structures that can be trained using single-site damaged state data only. The methodology involves training a sequence of binary classifiers based upon single-site damage data and combining the developed classifiers into a robust multi-class damage locator. In this way, the multi-site damage identification problem may be decomposed into a sequence of binary decisions. In this paper Support Vector Classifiers are adopted as the means of making these binary decisions. The proposed methodology represents an advancement on the state of the art in the field of multi-site damage identification which require either: (1) full damaged state data from single- and multi-site damage cases or (2) the development of a physics-based model to make multi-site model predictions. The potential benefit of the proposed methodology is that a significantly reduced number of recorded damage states may be required in order to train a multi-site damage locator without recourse to physics-based model predictions. In this paper it is first demonstrated that Support Vector Classification represents an appropriate approach to the multi-site damage location problem, with methods for combining binary classifiers discussed. Next, the proposed methodology is demonstrated and evaluated through application to a real engineering structure - a Piper Tomahawk trainer aircraft wing - with its performance compared to classifiers trained using the full damaged-state dataset.
Wang, Shuihua; Yang, Ming; Du, Sidan; Yang, Jiquan; Liu, Bin; Gorriz, Juan M.; Ramírez, Javier; Yuan, Ti-Fei; Zhang, Yudong
2016-01-01
Highlights We develop computer-aided diagnosis system for unilateral hearing loss detection in structural magnetic resonance imaging.Wavelet entropy is introduced to extract image global features from brain images. Directed acyclic graph is employed to endow support vector machine an ability to handle multi-class problems.The developed computer-aided diagnosis system achieves an overall accuracy of 95.1% for this three-class problem of differentiating left-sided and right-sided hearing loss from healthy controls. Aim: Sensorineural hearing loss (SNHL) is correlated to many neurodegenerative disease. Now more and more computer vision based methods are using to detect it in an automatic way. Materials: We have in total 49 subjects, scanned by 3.0T MRI (Siemens Medical Solutions, Erlangen, Germany). The subjects contain 14 patients with right-sided hearing loss (RHL), 15 patients with left-sided hearing loss (LHL), and 20 healthy controls (HC). Method: We treat this as a three-class classification problem: RHL, LHL, and HC. Wavelet entropy (WE) was selected from the magnetic resonance images of each subjects, and then submitted to a directed acyclic graph support vector machine (DAG-SVM). Results: The 10 repetition results of 10-fold cross validation shows 3-level decomposition will yield an overall accuracy of 95.10% for this three-class classification problem, higher than feedforward neural network, decision tree, and naive Bayesian classifier. Conclusions: This computer-aided diagnosis system is promising. We hope this study can attract more computer vision method for detecting hearing loss. PMID:27807415
Wang, Shuihua; Yang, Ming; Du, Sidan; Yang, Jiquan; Liu, Bin; Gorriz, Juan M; Ramírez, Javier; Yuan, Ti-Fei; Zhang, Yudong
2016-01-01
Highlights We develop computer-aided diagnosis system for unilateral hearing loss detection in structural magnetic resonance imaging.Wavelet entropy is introduced to extract image global features from brain images. Directed acyclic graph is employed to endow support vector machine an ability to handle multi-class problems.The developed computer-aided diagnosis system achieves an overall accuracy of 95.1% for this three-class problem of differentiating left-sided and right-sided hearing loss from healthy controls. Aim: Sensorineural hearing loss (SNHL) is correlated to many neurodegenerative disease. Now more and more computer vision based methods are using to detect it in an automatic way. Materials: We have in total 49 subjects, scanned by 3.0T MRI (Siemens Medical Solutions, Erlangen, Germany). The subjects contain 14 patients with right-sided hearing loss (RHL), 15 patients with left-sided hearing loss (LHL), and 20 healthy controls (HC). Method: We treat this as a three-class classification problem: RHL, LHL, and HC. Wavelet entropy (WE) was selected from the magnetic resonance images of each subjects, and then submitted to a directed acyclic graph support vector machine (DAG-SVM). Results: The 10 repetition results of 10-fold cross validation shows 3-level decomposition will yield an overall accuracy of 95.10% for this three-class classification problem, higher than feedforward neural network, decision tree, and naive Bayesian classifier. Conclusions: This computer-aided diagnosis system is promising. We hope this study can attract more computer vision method for detecting hearing loss.
Test of spectral/spatial classifier
NASA Technical Reports Server (NTRS)
Landgrebe, D. A. (Principal Investigator); Kast, J. L.; Davis, B. J.
1977-01-01
The author has identified the following significant results. The supervised ECHO processor (which utilizes class statistics for object identification) successfully exploits the redundancy of states characteristic of sampled imagery of ground scenes to achieve better classification accuracy, reduce the number of classifications required, and reduce the variability of classification results. The nonsupervised ECHO processor (which identifies objects without the benefit of class statistics) successfully reduces the number of classifications required and the variability of the classification results.
Automated simultaneous multiple feature classification of MTI data
NASA Astrophysics Data System (ADS)
Harvey, Neal R.; Theiler, James P.; Balick, Lee K.; Pope, Paul A.; Szymanski, John J.; Perkins, Simon J.; Porter, Reid B.; Brumby, Steven P.; Bloch, Jeffrey J.; David, Nancy A.; Galassi, Mark C.
2002-08-01
Los Alamos National Laboratory has developed and demonstrated a highly capable system, GENIE, for the two-class problem of detecting a single feature against a background of non-feature. In addition to the two-class case, however, a commonly encountered remote sensing task is the segmentation of multispectral image data into a larger number of distinct feature classes or land cover types. To this end we have extended our existing system to allow the simultaneous classification of multiple features/classes from multispectral data. The technique builds on previous work and its core continues to utilize a hybrid evolutionary-algorithm-based system capable of searching for image processing pipelines optimized for specific image feature extraction tasks. We describe the improvements made to the GENIE software to allow multiple-feature classification and describe the application of this system to the automatic simultaneous classification of multiple features from MTI image data. We show the application of the multiple-feature classification technique to the problem of classifying lava flows on Mauna Loa volcano, Hawaii, using MTI image data and compare the classification results with standard supervised multiple-feature classification techniques.
Machine learning in infrared object classification - an all-sky selection of YSO candidates
NASA Astrophysics Data System (ADS)
Marton, Gabor; Zahorecz, Sarolta; Toth, L. Viktor; Magnus McGehee, Peregrine; Kun, Maria
2015-08-01
Object classification is a fundamental and challenging problem in the era of big data. I will discuss up-to-date methods and their application to classify infrared point sources.We analysed the ALLWISE catalogue, the most recent public source catalogue of the Wide-field Infrared Survey Explorer (WISE) to compile a reliable list of Young Stellar Object (YSO) candidates. We tested and compared classical and up-to-date statistical methods as well, to discriminate source types like extragalactic objects, evolved stars, main sequence stars, objects related to the interstellar medium and YSO candidates by using their mid-IR WISE properties and associated near-IR 2MASS data.In the particular classification problem the Support Vector Machines (SVM), a class of supervised learning algorithm turned out to be the best tool. As a result we classify Class I and II YSOs with >90% accuracy while the fraction of contaminating extragalactic objects remains well below 1%, based on the number of known objects listed in the SIMBAD and VizieR databases. We compare our results to other classification schemes from the literature and show that the SVM outperforms methods that apply linear cuts on the colour-colour and colour-magnitude space. Our homogenous YSO candidate catalog can serve as an excellent pathfinder for future detailed observations of individual objects and a starting point of statistical studies that aim to add pieces to the big picture of star formation theory.
Extraction of urban vegetation with Pleiades multiangular images
NASA Astrophysics Data System (ADS)
Lefebvre, Antoine; Nabucet, Jean; Corpetti, Thomas; Courty, Nicolas; Hubert-Moy, Laurence
2016-10-01
Vegetation is essential in urban environments since it provides significant services in terms of health, heat, property value, ecology ... As part of the European Union Biodiversity Strategy Plan for 2020, the protection and development of green-infrastructures is strengthened in urban areas. In order to evaluate and monitor the quality of the green infra-structures, this article investigates contributions of Pléiades multi-angular images to extract and characterize low and high urban vegetation. From such images one can extract both spectral and elevation information from optical images. Our method is composed of 3 main steps : (1) the computation of a normalized Digital Surface Model from the multi-angular images ; (2) Extraction of spectral and contextual features ; (3) a classification of vegetation classes (tree and grass) performed with a random forest classifier. Results performed in the city of Rennes in France show the ability of multi-angular images to extract DEM in urban area despite building height. It also highlights its importance and its complementarity with contextual information to extract urban vegetation.
Soft Computing Application in Fault Detection of Induction Motor
DOE Office of Scientific and Technical Information (OSTI.GOV)
Konar, P.; Puhan, P. S.; Chattopadhyay, P. Dr.
2010-10-26
The paper investigates the effectiveness of different patter classifier like Feed Forward Back Propagation (FFBPN), Radial Basis Function (RBF) and Support Vector Machine (SVM) for detection of bearing faults in Induction Motor. The steady state motor current with Park's Transformation has been used for discrimination of inner race and outer race bearing defects. The RBF neural network shows very encouraging results for multi-class classification problems and is hoped to set up a base for incipient fault detection of induction motor. SVM is also found to be a very good fault classifier which is highly competitive with RBF.
Acoustic classification of multiple simultaneous bird species: a multi-instance multi-label approach
F. Briggs; B. Lakshminarayanan; L. Neal; X.Z. Fern; R. Raich; S.F. Hadley; A.S. Hadley; M.G. Betts
2012-01-01
Although field-collected recordings typically contain multiple simultaneously vocalizing birds of different species, acoustic species classification in this setting has received little study so far. This work formulates the problem of classifying the set of species present in an audio recording using the multi-instance multi-label (MIML) framework for machine learning...
TMS combined with EEG in genetic generalized epilepsy: A phase II diagnostic accuracy study.
Kimiskidis, Vasilios K; Tsimpiris, Alkiviadis; Ryvlin, Philippe; Kalviainen, Reetta; Koutroumanidis, Michalis; Valentin, Antonio; Laskaris, Nikolaos; Kugiumtzis, Dimitris
2017-02-01
(A) To develop a TMS-EEG stimulation and data analysis protocol in genetic generalized epilepsy (GGE). (B) To investigate the diagnostic accuracy of TMS-EEG in GGE. Pilot experiments resulted in the development and optimization of a paired-pulse TMS-EEG protocol at rest, during hyperventilation (HV), and post-HV combined with multi-level data analysis. This protocol was applied in 11 controls (C) and 25 GGE patients (P), further dichotomized into responders to antiepileptic drugs (R, n=13) and non-responders (n-R, n=12).Features (n=57) extracted from TMS-EEG responses after multi-level analysis were given to a feature selection scheme and a Bayesian classifier, and the accuracy of assigning participants into the classes P-C and R-nR was computed. On the basis of the optimal feature subset, the cross-validated accuracy of TMS-EEG for the classification P-C was 0.86 at rest, 0.81 during HV and 0.92 at post-HV, whereas for R-nR the corresponding figures are 0.80, 0.78 and 0.65, respectively. Applying a fusion approach on all conditions resulted in an accuracy of 0.84 for the classification P-C and 0.76 for the classification R-nR. TMS-EEG can be used for diagnostic purposes and for assessing the response to antiepileptic drugs. TMS-EEG holds significant diagnostic potential in GGE. Copyright © 2016 International Federation of Clinical Neurophysiology. Published by Elsevier B.V. All rights reserved.
Towards human behavior recognition based on spatio temporal features and support vector machines
NASA Astrophysics Data System (ADS)
Ghabri, Sawsen; Ouarda, Wael; Alimi, Adel M.
2017-03-01
Security and surveillance are vital issues in today's world. The recent acts of terrorism have highlighted the urgent need for efficient surveillance. There is indeed a need for an automated system for video surveillance which can detect identity and activity of person. In this article, we propose a new paradigm to recognize an aggressive human behavior such as boxing action. Our proposed system for human activity detection includes the use of a fusion between Spatio Temporal Interest Point (STIP) and Histogram of Oriented Gradient (HoG) features. The novel feature called Spatio Temporal Histogram Oriented Gradient (STHOG). To evaluate the robustness of our proposed paradigm with a local application of HoG technique on STIP points, we made experiments on KTH human action dataset based on Multi Class Support Vector Machines classification. The proposed scheme outperforms basic descriptors like HoG and STIP to achieve 82.26% us an accuracy value of classification rate.
Discovering disease-disease associations by fusing systems-level molecular data
Žitnik, Marinka; Janjić, Vuk; Larminie, Chris; Zupan, Blaž; Pržulj, Nataša
2013-01-01
The advent of genome-scale genetic and genomic studies allows new insight into disease classification. Recently, a shift was made from linking diseases simply based on their shared genes towards systems-level integration of molecular data. Here, we aim to find relationships between diseases based on evidence from fusing all available molecular interaction and ontology data. We propose a multi-level hierarchy of disease classes that significantly overlaps with existing disease classification. In it, we find 14 disease-disease associations currently not present in Disease Ontology and provide evidence for their relationships through comorbidity data and literature curation. Interestingly, even though the number of known human genetic interactions is currently very small, we find they are the most important predictor of a link between diseases. Finally, we show that omission of any one of the included data sources reduces prediction quality, further highlighting the importance in the paradigm shift towards systems-level data fusion. PMID:24232732
Discovering disease-disease associations by fusing systems-level molecular data.
Žitnik, Marinka; Janjić, Vuk; Larminie, Chris; Zupan, Blaž; Pržulj, Nataša
2013-11-15
The advent of genome-scale genetic and genomic studies allows new insight into disease classification. Recently, a shift was made from linking diseases simply based on their shared genes towards systems-level integration of molecular data. Here, we aim to find relationships between diseases based on evidence from fusing all available molecular interaction and ontology data. We propose a multi-level hierarchy of disease classes that significantly overlaps with existing disease classification. In it, we find 14 disease-disease associations currently not present in Disease Ontology and provide evidence for their relationships through comorbidity data and literature curation. Interestingly, even though the number of known human genetic interactions is currently very small, we find they are the most important predictor of a link between diseases. Finally, we show that omission of any one of the included data sources reduces prediction quality, further highlighting the importance in the paradigm shift towards systems-level data fusion.
Estuarine Habitat Assessment for Construction of a Submarine Transmission Line
NASA Astrophysics Data System (ADS)
Hamouda, Amr Z.; Abdel-Salam, Khaled M.
2010-07-01
The present paper describes a submarine survey using the acoustic discrimination system QTC VIEW (Series V) as an exploratory tool to adjust final route alignment of a new pipeline. By using acoustic sound survey as an exploratory tool described in this paper to adjust final route alignment of a new pipeline to minimize the environmental impact caused and ultimately to avoid any mitigation measures. The transmission pipeline extended from the shore line of Abu-Qir Bay, on the Mediterranean Sea in Egypt, out to 70 nautical miles at sea (60 m water depth). Four main surface sediment types were defined in the study area, namely fine sand, silty sand, silt and clay. Results of the acoustic classification revealed four acoustic classes. The first acoustic class corresponded to fine sand, absence of shell debris and very poor habitats characteristics. The second acoustic class is predominant in the study area and corresponds to the region occupied by silt. It is also characterized by intermediate diversity of macrobenthic invertebrate community which is mainly characterized by polychaeta. The third acoustic class is characterized by silt to silty clay. It is characterized by a high diversity of macrobenthic invertebrate community which is mainly polychaeta with an intermediate diversity of gastropoda and bivalvia. The final acoustic class is characterized by clay and high occurrence of shell debris of gastropoda, bivalvia and polychaeta.
NASA Astrophysics Data System (ADS)
Poletti, Enea; Veronese, Elisa; Calabrese, Massimiliano; Bertoldo, Alessandra; Grisan, Enrico
2012-02-01
The automatic segmentation of brain tissues in magnetic resonance (MR) is usually performed on T1-weighted images, due to their high spatial resolution. T1w sequence, however, has some major downsides when brain lesions are present: the altered appearance of diseased tissues causes errors in tissues classification. In order to overcome these drawbacks, we employed two different MR sequences: fluid attenuated inversion recovery (FLAIR) and double inversion recovery (DIR). The former highlights both gray matter (GM) and white matter (WM), the latter highlights GM alone. We propose here a supervised classification scheme that does not require any anatomical a priori information to identify the 3 classes, "GM", "WM", and "background". Features are extracted by means of a local multi-scale texture analysis, computed for each pixel of the DIR and FLAIR sequences. The 9 textures considered are average, standard deviation, kurtosis, entropy, contrast, correlation, energy, homogeneity, and skewness, evaluated on a neighborhood of 3x3, 5x5, and 7x7 pixels. Hence, the total number of features associated to a pixel is 56 (9 textures x3 scales x2 sequences +2 original pixel values). The classifier employed is a Support Vector Machine with Radial Basis Function as kernel. From each of the 4 brain volumes evaluated, a DIR and a FLAIR slice have been selected and manually segmented by 2 expert neurologists, providing 1st and 2nd human reference observations which agree with an average accuracy of 99.03%. SVM performances have been assessed with a 4-fold cross-validation, yielding an average classification accuracy of 98.79%.
Normalization of relative and incomplete temporal expressions in clinical narratives.
Sun, Weiyi; Rumshisky, Anna; Uzuner, Ozlem
2015-09-01
To improve the normalization of relative and incomplete temporal expressions (RI-TIMEXes) in clinical narratives. We analyzed the RI-TIMEXes in temporally annotated corpora and propose two hypotheses regarding the normalization of RI-TIMEXes in the clinical narrative domain: the anchor point hypothesis and the anchor relation hypothesis. We annotated the RI-TIMEXes in three corpora to study the characteristics of RI-TMEXes in different domains. This informed the design of our RI-TIMEX normalization system for the clinical domain, which consists of an anchor point classifier, an anchor relation classifier, and a rule-based RI-TIMEX text span parser. We experimented with different feature sets and performed an error analysis for each system component. The annotation confirmed the hypotheses that we can simplify the RI-TIMEXes normalization task using two multi-label classifiers. Our system achieves anchor point classification, anchor relation classification, and rule-based parsing accuracy of 74.68%, 87.71%, and 57.2% (82.09% under relaxed matching criteria), respectively, on the held-out test set of the 2012 i2b2 temporal relation challenge. Experiments with feature sets reveal some interesting findings, such as: the verbal tense feature does not inform the anchor relation classification in clinical narratives as much as the tokens near the RI-TIMEX. Error analysis showed that underrepresented anchor point and anchor relation classes are difficult to detect. We formulate the RI-TIMEX normalization problem as a pair of multi-label classification problems. Considering only RI-TIMEX extraction and normalization, the system achieves statistically significant improvement over the RI-TIMEX results of the best systems in the 2012 i2b2 challenge. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Operational Tree Species Mapping in a Diverse Tropical Forest with Airborne Imaging Spectroscopy.
Baldeck, Claire A; Asner, Gregory P; Martin, Robin E; Anderson, Christopher B; Knapp, David E; Kellner, James R; Wright, S Joseph
2015-01-01
Remote identification and mapping of canopy tree species can contribute valuable information towards our understanding of ecosystem biodiversity and function over large spatial scales. However, the extreme challenges posed by highly diverse, closed-canopy tropical forests have prevented automated remote species mapping of non-flowering tree crowns in these ecosystems. We set out to identify individuals of three focal canopy tree species amongst a diverse background of tree and liana species on Barro Colorado Island, Panama, using airborne imaging spectroscopy data. First, we compared two leading single-class classification methods--binary support vector machine (SVM) and biased SVM--for their performance in identifying pixels of a single focal species. From this comparison we determined that biased SVM was more precise and created a multi-species classification model by combining the three biased SVM models. This model was applied to the imagery to identify pixels belonging to the three focal species and the prediction results were then processed to create a map of focal species crown objects. Crown-level cross-validation of the training data indicated that the multi-species classification model had pixel-level producer's accuracies of 94-97% for the three focal species, and field validation of the predicted crown objects indicated that these had user's accuracies of 94-100%. Our results demonstrate the ability of high spatial and spectral resolution remote sensing to accurately detect non-flowering crowns of focal species within a diverse tropical forest. We attribute the success of our model to recent classification and mapping techniques adapted to species detection in diverse closed-canopy forests, which can pave the way for remote species mapping in a wider variety of ecosystems.
Operational Tree Species Mapping in a Diverse Tropical Forest with Airborne Imaging Spectroscopy
Baldeck, Claire A.; Asner, Gregory P.; Martin, Robin E.; Anderson, Christopher B.; Knapp, David E.; Kellner, James R.; Wright, S. Joseph
2015-01-01
Remote identification and mapping of canopy tree species can contribute valuable information towards our understanding of ecosystem biodiversity and function over large spatial scales. However, the extreme challenges posed by highly diverse, closed-canopy tropical forests have prevented automated remote species mapping of non-flowering tree crowns in these ecosystems. We set out to identify individuals of three focal canopy tree species amongst a diverse background of tree and liana species on Barro Colorado Island, Panama, using airborne imaging spectroscopy data. First, we compared two leading single-class classification methods—binary support vector machine (SVM) and biased SVM—for their performance in identifying pixels of a single focal species. From this comparison we determined that biased SVM was more precise and created a multi-species classification model by combining the three biased SVM models. This model was applied to the imagery to identify pixels belonging to the three focal species and the prediction results were then processed to create a map of focal species crown objects. Crown-level cross-validation of the training data indicated that the multi-species classification model had pixel-level producer’s accuracies of 94–97% for the three focal species, and field validation of the predicted crown objects indicated that these had user’s accuracies of 94–100%. Our results demonstrate the ability of high spatial and spectral resolution remote sensing to accurately detect non-flowering crowns of focal species within a diverse tropical forest. We attribute the success of our model to recent classification and mapping techniques adapted to species detection in diverse closed-canopy forests, which can pave the way for remote species mapping in a wider variety of ecosystems. PMID:26153693
A Parallel Finite Set Statistical Simulator for Multi-Target Detection and Tracking
NASA Astrophysics Data System (ADS)
Hussein, I.; MacMillan, R.
2014-09-01
Finite Set Statistics (FISST) is a powerful Bayesian inference tool for the joint detection, classification and tracking of multi-target environments. FISST is capable of handling phenomena such as clutter, misdetections, and target birth and decay. Implicit within the approach are solutions to the data association and target label-tracking problems. Finally, FISST provides generalized information measures that can be used for sensor allocation across different types of tasks such as: searching for new targets, and classification and tracking of known targets. These FISST capabilities have been demonstrated on several small-scale illustrative examples. However, for implementation in a large-scale system as in the Space Situational Awareness problem, these capabilities require a lot of computational power. In this paper, we implement FISST in a parallel environment for the joint detection and tracking of multi-target systems. In this implementation, false alarms and misdetections will be modeled. Target birth and decay will not be modeled in the present paper. We will demonstrate the success of the method for as many targets as we possibly can in a desktop parallel environment. Performance measures will include: number of targets in the simulation, certainty of detected target tracks, computational time as a function of clutter returns and number of targets, among other factors.
Kirschneck, Michaela; Sabariego, Carla; Singer, Susanne; Tschiesner, Uta
2014-07-01
The International Classification of Functioning, Disability and Health Core Set for Head and Neck Cancer (ICF-HNC) covers the typical spectrum of problems in functioning in head and neck cancer. This study is part of a multistep process to develop practical guidelines in Germany. The purpose of this study was to identify instruments for the assessment of functioning using the ICF-HNC as reference. Four Delphi surveys with physicians, physiotherapists, psychologists, and social workers were performed to identify which aspects of the ICF-HNC are being treated and which assessment tools are recommended for the assessment of functioning. Ninety-seven percent categories of the ICF-HNC were treated by healthcare professionals participating in the current study. Altogether, 33 assessment tools were recommended for therapy monitoring, food intake, pain, further organic problems/laboratory tests, and psychosocial areas. Although the ICF-HNC is being currently implemented by the head and neck cancer experts, several areas are not covered regularly. Additionally, validated tools were rarely recommended. Copyright © 2013 Wiley Periodicals, Inc.
Bryan, Kenneth; Cunningham, Pádraig
2008-01-01
Background Microarrays have the capacity to measure the expressions of thousands of genes in parallel over many experimental samples. The unsupervised classification technique of bicluster analysis has been employed previously to uncover gene expression correlations over subsets of samples with the aim of providing a more accurate model of the natural gene functional classes. This approach also has the potential to aid functional annotation of unclassified open reading frames (ORFs). Until now this aspect of biclustering has been under-explored. In this work we illustrate how bicluster analysis may be extended into a 'semi-supervised' ORF annotation approach referred to as BALBOA. Results The efficacy of the BALBOA ORF classification technique is first assessed via cross validation and compared to a multi-class k-Nearest Neighbour (kNN) benchmark across three independent gene expression datasets. BALBOA is then used to assign putative functional annotations to unclassified yeast ORFs. These predictions are evaluated using existing experimental and protein sequence information. Lastly, we employ a related semi-supervised method to predict the presence of novel functional modules within yeast. Conclusion In this paper we demonstrate how unsupervised classification methods, such as bicluster analysis, may be extended using of available annotations to form semi-supervised approaches within the gene expression analysis domain. We show that such methods have the potential to improve upon supervised approaches and shed new light on the functions of unclassified ORFs and their co-regulation. PMID:18831786
NASA Astrophysics Data System (ADS)
Snavely, Rachel A.
Focusing on the semi-arid and highly disturbed landscape of San Clemente Island, California, this research tests the effectiveness of incorporating a hierarchal object-based image analysis (OBIA) approach with high-spatial resolution imagery and light detection and range (LiDAR) derived canopy height surfaces for mapping vegetation communities. The study is part of a large-scale research effort conducted by researchers at San Diego State University's (SDSU) Center for Earth Systems Analysis Research (CESAR) and Soil Ecology and Restoration Group (SERG), to develop an updated vegetation community map which will support both conservation and management decisions on Naval Auxiliary Landing Field (NALF) San Clemente Island. Trimble's eCognition Developer software was used to develop and generate vegetation community maps for two study sites, with and without vegetation height data as input. Overall and class-specific accuracies were calculated and compared across the two classifications. The highest overall accuracy (approximately 80%) was observed with the classification integrating airborne visible and near infrared imagery having very high spatial resolution with a LiDAR derived canopy height model. Accuracies for individual vegetation classes differed between both classification methods, but were highest when incorporating the LiDAR digital surface data. The addition of a canopy height model, however, yielded little difference in classification accuracies for areas of very dense shrub cover. Overall, the results show the utility of the OBIA approach for mapping vegetation with high spatial resolution imagery, and emphasizes the advantage of both multi-scale analysis and digital surface data for accuracy characterizing highly disturbed landscapes. The integrated imagery and digital canopy height model approach presented both advantages and limitations, which have to be considered prior to its operational use in mapping vegetation communities.
NASA Astrophysics Data System (ADS)
Besic, Nikola; Ventura, Jordi Figueras i.; Grazioli, Jacopo; Gabella, Marco; Germann, Urs; Berne, Alexis
2016-09-01
Polarimetric radar-based hydrometeor classification is the procedure of identifying different types of hydrometeors by exploiting polarimetric radar observations. The main drawback of the existing supervised classification methods, mostly based on fuzzy logic, is a significant dependency on a presumed electromagnetic behaviour of different hydrometeor types. Namely, the results of the classification largely rely upon the quality of scattering simulations. When it comes to the unsupervised approach, it lacks the constraints related to the hydrometeor microphysics. The idea of the proposed method is to compensate for these drawbacks by combining the two approaches in a way that microphysical hypotheses can, to a degree, adjust the content of the classes obtained statistically from the observations. This is done by means of an iterative approach, performed offline, which, in a statistical framework, examines clustered representative polarimetric observations by comparing them to the presumed polarimetric properties of each hydrometeor class. Aside from comparing, a routine alters the content of clusters by encouraging further statistical clustering in case of non-identification. By merging all identified clusters, the multi-dimensional polarimetric signatures of various hydrometeor types are obtained for each of the studied representative datasets, i.e. for each radar system of interest. These are depicted by sets of centroids which are then employed in operational labelling of different hydrometeors. The method has been applied on three C-band datasets, each acquired by different operational radar from the MeteoSwiss Rad4Alp network, as well as on two X-band datasets acquired by two research mobile radars. The results are discussed through a comparative analysis which includes a corresponding supervised and unsupervised approach, emphasising the operational potential of the proposed method.
NASA Astrophysics Data System (ADS)
Candare, Rudolph Joshua; Japitana, Michelle; Cubillas, James Earl; Ramirez, Cherry Bryan
2016-06-01
This research describes the methods involved in the mapping of different high value crops in Agusan del Norte Philippines using LiDAR. This project is part of the Phil-LiDAR 2 Program which aims to conduct a nationwide resource assessment using LiDAR. Because of the high resolution data involved, the methodology described here utilizes object-based image analysis and the use of optimal features from LiDAR data and Orthophoto. Object-based classification was primarily done by developing rule-sets in eCognition. Several features from the LiDAR data and Orthophotos were used in the development of rule-sets for classification. Generally, classes of objects can't be separated by simple thresholds from different features making it difficult to develop a rule-set. To resolve this problem, the image-objects were subjected to Support Vector Machine learning. SVMs have gained popularity because of their ability to generalize well given a limited number of training samples. However, SVMs also suffer from parameter assignment issues that can significantly affect the classification results. More specifically, the regularization parameter C in linear SVM has to be optimized through cross validation to increase the overall accuracy. After performing the segmentation in eCognition, the optimization procedure as well as the extraction of the equations of the hyper-planes was done in Matlab. The learned hyper-planes separating one class from another in the multi-dimensional feature-space can be thought of as super-features which were then used in developing the classifier rule set in eCognition. In this study, we report an overall classification accuracy of greater than 90% in different areas.
Feasibility of Active Machine Learning for Multiclass Compound Classification.
Lang, Tobias; Flachsenberg, Florian; von Luxburg, Ulrike; Rarey, Matthias
2016-01-25
A common task in the hit-to-lead process is classifying sets of compounds into multiple, usually structural classes, which build the groundwork for subsequent SAR studies. Machine learning techniques can be used to automate this process by learning classification models from training compounds of each class. Gathering class information for compounds can be cost-intensive as the required data needs to be provided by human experts or experiments. This paper studies whether active machine learning can be used to reduce the required number of training compounds. Active learning is a machine learning method which processes class label data in an iterative fashion. It has gained much attention in a broad range of application areas. In this paper, an active learning method for multiclass compound classification is proposed. This method selects informative training compounds so as to optimally support the learning progress. The combination with human feedback leads to a semiautomated interactive multiclass classification procedure. This method was investigated empirically on 15 compound classification tasks containing 86-2870 compounds in 3-38 classes. The empirical results show that active learning can solve these classification tasks using 10-80% of the data which would be necessary for standard learning techniques.
Enriching User-Oriented Class Associations for Library Classification Schemes.
ERIC Educational Resources Information Center
Pu, Hsiao-Tieh; Yang, Chyan
2003-01-01
Explores the possibility of adding user-oriented class associations to hierarchical library classification schemes. Analyses a log of book circulation records from a university library in Taiwan and shows that classification schemes can be made more adaptable by analyzing circulation patterns of similar users. (Author/LRW)
Single-trial EEG RSVP classification using convolutional neural networks
NASA Astrophysics Data System (ADS)
Shamwell, Jared; Lee, Hyungtae; Kwon, Heesung; Marathe, Amar R.; Lawhern, Vernon; Nothwang, William
2016-05-01
Traditionally, Brain-Computer Interfaces (BCI) have been explored as a means to return function to paralyzed or otherwise debilitated individuals. An emerging use for BCIs is in human-autonomy sensor fusion where physiological data from healthy subjects is combined with machine-generated information to enhance the capabilities of artificial systems. While human-autonomy fusion of physiological data and computer vision have been shown to improve classification during visual search tasks, to date these approaches have relied on separately trained classification models for each modality. We aim to improve human-autonomy classification performance by developing a single framework that builds codependent models of human electroencephalograph (EEG) and image data to generate fused target estimates. As a first step, we developed a novel convolutional neural network (CNN) architecture and applied it to EEG recordings of subjects classifying target and non-target image presentations during a rapid serial visual presentation (RSVP) image triage task. The low signal-to-noise ratio (SNR) of EEG inherently limits the accuracy of single-trial classification and when combined with the high dimensionality of EEG recordings, extremely large training sets are needed to prevent overfitting and achieve accurate classification from raw EEG data. This paper explores a new deep CNN architecture for generalized multi-class, single-trial EEG classification across subjects. We compare classification performance from the generalized CNN architecture trained across all subjects to the individualized XDAWN, HDCA, and CSP neural classifiers which are trained and tested on single subjects. Preliminary results show that our CNN meets and slightly exceeds the performance of the other classifiers despite being trained across subjects.
Wildlife management by habitat units: A preliminary plan of action
NASA Technical Reports Server (NTRS)
Frentress, C. D.; Frye, R. G.
1975-01-01
Procedures for yielding vegetation type maps were developed using LANDSAT data and a computer assisted classification analysis (LARSYS) to assist in managing populations of wildlife species by defined area units. Ground cover in Travis County, Texas was classified on two occasions using a modified version of the unsupervised approach to classification. The first classification produced a total of 17 classes. Examination revealed that further grouping was justified. A second analysis produced 10 classes which were displayed on printouts which were later color-coded. The final classification was 82 percent accurate. While the classification map appeared to satisfactorily depict the existing vegetation, two classes were determined to contain significant error. The major sources of error could have been eliminated by stratifying cluster sites more closely among previously mapped soil associations that are identified with particular plant associations and by precisely defining class nomenclature using established criteria early in the analysis.
A soil map of a large watershed in China: applying digital soil mapping in a data sparse region
NASA Astrophysics Data System (ADS)
Barthold, F.; Blank, B.; Wiesmeier, M.; Breuer, L.; Frede, H.-G.
2009-04-01
Prediction of soil classes in data sparse regions is a major research challenge. With the advent of machine learning the possibilities to spatially predict soil classes have increased tremendously and given birth to new possibilities in soil mapping. Digital soil mapping is a research field that has been established during the last decades and has been accepted widely. We now need to develop tools to reduce the uncertainty in soil predictions. This is especially challenging in data sparse regions. One approach to do this is to implement soil taxonomic distance as a classification error criterion in classification and regression trees (CART) as suggested by Minasny et al. (Geoderma 142 (2007) 285-293). This approach assumes that the classification error should be larger between soils that are more dissimilar, i.e. differ in a larger number of soil properties, and smaller between more similar soils. Our study area is the Xilin River Basin, which is located in central Inner Mongolia in China. It is characterized by semi arid climate conditions and is representative for the natural occurring steppe ecosystem. The study area comprises 3600 km2. We applied a random, stratified sampling design after McKenzie and Ryan (Geoderma 89 (1999) 67-94) with landuse and topography as stratifying variables. We defined 10 sampling classes, from each class 14 replicates were randomly drawn and sampled. The dataset was split into 100 soil profiles for training and 40 soil profiles for validation. We then applied classification and regression trees (CART) to quantify the relationships between soil classes and environmental covariates. The classification tree explained 75.5% of the variance with land use and geology as most important predictor variables. Among the 8 soil classes that we predicted, the Kastanozems cover most of the area. They are predominantly found in steppe areas. However, even some of the soils at sand dune sites, which were thought to show only little soil formation, can be classified as Kastanozems. Besides the Kastanozems, Regosols are most common at the sand dune sites as well as at sites that are defined as bare soil which are characterized by little or no vegetation. Gleysols are mostly found at sites in the vicinity of the Xilin river, which are connected to the groundwater. They can also be found in small valleys or depressions where sub-surface waters from neighboring areas collect. The richest soils are found in mountain meadow areas. Pedogenetic conditions here are most favorable and lead to the formation of Chernozems with deep humic Ah horizons. Other soil types that occur in the study area are Arenosols, Calcisols, Cambisol and Phaeozems. In addition, soil taxonomic distance is implemented into the decision tree procedure as a measure of classification error. The results of incorporating taxonomic distance as a loss function in the decision tree will be compared with the standard application of the decision tree.
NASA Astrophysics Data System (ADS)
Fezzani, Ridha; Berger, Laurent
2018-06-01
An automated signal-based method was developed in order to analyse the seafloor backscatter data logged by calibrated multibeam echosounder. The processing consists first in the clustering of each survey sub-area into a small number of homogeneous sediment types, based on the backscatter average level at one or several incidence angles. Second, it uses their local average angular response to extract discriminant descriptors, obtained by fitting the field data to the Generic Seafloor Acoustic Backscatter parametric model. Third, the descriptors are used for seafloor type classification. The method was tested on the multi-year data recorded by a calibrated 90-kHz Simrad ME70 multibeam sonar operated in the Bay of Biscay, France and Celtic Sea, Ireland. It was applied for seafloor-type classification into 12 classes, to a dataset of 158 spots surveyed for demersal and benthic fauna study and monitoring. Qualitative analyses and classified clusters using extracted parameters show a good discriminatory potential, indicating the robustness of this approach.