NASA Technical Reports Server (NTRS)
Tilton, J. C.; Swain, P. H. (Principal Investigator); Vardeman, S. B.
1981-01-01
A key input to a statistical classification algorithm, which exploits the tendency of certain ground cover classes to occur more frequently in some spatial context than in others, is a statistical characterization of the context: the context distribution. An unbiased estimator of the context distribution is discussed which, besides having the advantage of statistical unbiasedness, has the additional advantage over other estimation techniques of being amenable to an adaptive implementation in which the context distribution estimate varies according to local contextual information. Results from applying the unbiased estimator to the contextual classification of three real LANDSAT data sets are presented and contrasted with results from non-contextual classifications and from contextual classifications utilizing other context distribution estimation techniques.
ERIC Educational Resources Information Center
Montoya, Isaac D.
2008-01-01
Three classification techniques (Chi-square Automatic Interaction Detection [CHAID], Classification and Regression Tree [CART], and discriminant analysis) were tested to determine their accuracy in predicting Temporary Assistance for Needy Families program recipients' future employment. Technique evaluation was based on proportion of correctly…
An unsupervised classification technique for multispectral remote sensing data.
NASA Technical Reports Server (NTRS)
Su, M. Y.; Cummings, R. E.
1973-01-01
Description of a two-part clustering technique consisting of (a) a sequential statistical clustering, which is essentially a sequential variance analysis, and (b) a generalized K-means clustering. In this composite clustering technique, the output of (a) is a set of initial clusters which are input to (b) for further improvement by an iterative scheme. This unsupervised composite technique was employed for automatic classification of two sets of remote multispectral earth resource observations. The classification accuracy by the unsupervised technique is found to be comparable to that by traditional supervised maximum-likelihood classification techniques.
Unsupervised classification of earth resources data.
NASA Technical Reports Server (NTRS)
Su, M. Y.; Jayroe, R. R., Jr.; Cummings, R. E.
1972-01-01
A new clustering technique is presented. It consists of two parts: (a) a sequential statistical clustering which is essentially a sequential variance analysis and (b) a generalized K-means clustering. In this composite clustering technique, the output of (a) is a set of initial clusters which are input to (b) for further improvement by an iterative scheme. This unsupervised composite technique was employed for automatic classification of two sets of remote multispectral earth resource observations. The classification accuracy by the unsupervised technique is found to be comparable to that by existing supervised maximum liklihood classification technique.
Bolin, Jocelyn Holden; Finch, W Holmes
2014-01-01
Statistical classification of phenomena into observed groups is very common in the social and behavioral sciences. Statistical classification methods, however, are affected by the characteristics of the data under study. Statistical classification can be further complicated by initial misclassification of the observed groups. The purpose of this study is to investigate the impact of initial training data misclassification on several statistical classification and data mining techniques. Misclassification conditions in the three group case will be simulated and results will be presented in terms of overall as well as subgroup classification accuracy. Results show decreased classification accuracy as sample size, group separation and group size ratio decrease and as misclassification percentage increases with random forests demonstrating the highest accuracy across conditions.
NASA Technical Reports Server (NTRS)
Park, Steve
1990-01-01
A large and diverse number of computational techniques are routinely used to process and analyze remotely sensed data. These techniques include: univariate statistics; multivariate statistics; principal component analysis; pattern recognition and classification; other multivariate techniques; geometric correction; registration and resampling; radiometric correction; enhancement; restoration; Fourier analysis; and filtering. Each of these techniques will be considered, in order.
Theory and analysis of statistical discriminant techniques as applied to remote sensing data
NASA Technical Reports Server (NTRS)
Odell, P. L.
1973-01-01
Classification of remote earth resources sensing data according to normed exponential density statistics is reported. The use of density models appropriate for several physical situations provides an exact solution for the probabilities of classifications associated with the Bayes discriminant procedure even when the covariance matrices are unequal.
Automated Tissue Classification Framework for Reproducible Chronic Wound Assessment
Mukherjee, Rashmi; Manohar, Dhiraj Dhane; Das, Dev Kumar; Achar, Arun; Mitra, Analava; Chakraborty, Chandan
2014-01-01
The aim of this paper was to develop a computer assisted tissue classification (granulation, necrotic, and slough) scheme for chronic wound (CW) evaluation using medical image processing and statistical machine learning techniques. The red-green-blue (RGB) wound images grabbed by normal digital camera were first transformed into HSI (hue, saturation, and intensity) color space and subsequently the “S” component of HSI color channels was selected as it provided higher contrast. Wound areas from 6 different types of CW were segmented from whole images using fuzzy divergence based thresholding by minimizing edge ambiguity. A set of color and textural features describing granulation, necrotic, and slough tissues in the segmented wound area were extracted using various mathematical techniques. Finally, statistical learning algorithms, namely, Bayesian classification and support vector machine (SVM), were trained and tested for wound tissue classification in different CW images. The performance of the wound area segmentation protocol was further validated by ground truth images labeled by clinical experts. It was observed that SVM with 3rd order polynomial kernel provided the highest accuracies, that is, 86.94%, 90.47%, and 75.53%, for classifying granulation, slough, and necrotic tissues, respectively. The proposed automated tissue classification technique achieved the highest overall accuracy, that is, 87.61%, with highest kappa statistic value (0.793). PMID:25114925
NASA Technical Reports Server (NTRS)
Djorgovski, George
1993-01-01
The existing and forthcoming data bases from NASA missions contain an abundance of information whose complexity cannot be efficiently tapped with simple statistical techniques. Powerful multivariate statistical methods already exist which can be used to harness much of the richness of these data. Automatic classification techniques have been developed to solve the problem of identifying known types of objects in multiparameter data sets, in addition to leading to the discovery of new physical phenomena and classes of objects. We propose an exploratory study and integration of promising techniques in the development of a general and modular classification/analysis system for very large data bases, which would enhance and optimize data management and the use of human research resource.
NASA Technical Reports Server (NTRS)
Djorgovski, Stanislav
1992-01-01
The existing and forthcoming data bases from NASA missions contain an abundance of information whose complexity cannot be efficiently tapped with simple statistical techniques. Powerful multivariate statistical methods already exist which can be used to harness much of the richness of these data. Automatic classification techniques have been developed to solve the problem of identifying known types of objects in multi parameter data sets, in addition to leading to the discovery of new physical phenomena and classes of objects. We propose an exploratory study and integration of promising techniques in the development of a general and modular classification/analysis system for very large data bases, which would enhance and optimize data management and the use of human research resources.
Unsupervised classification of remote multispectral sensing data
NASA Technical Reports Server (NTRS)
Su, M. Y.
1972-01-01
The new unsupervised classification technique for classifying multispectral remote sensing data which can be either from the multispectral scanner or digitized color-separation aerial photographs consists of two parts: (a) a sequential statistical clustering which is a one-pass sequential variance analysis and (b) a generalized K-means clustering. In this composite clustering technique, the output of (a) is a set of initial clusters which are input to (b) for further improvement by an iterative scheme. Applications of the technique using an IBM-7094 computer on multispectral data sets over Purdue's Flight Line C-1 and the Yellowstone National Park test site have been accomplished. Comparisons between the classification maps by the unsupervised technique and the supervised maximum liklihood technique indicate that the classification accuracies are in agreement.
NASA Technical Reports Server (NTRS)
Hill, C. L.
1984-01-01
A computer-implemented classification has been derived from Landsat-4 Thematic Mapper data acquired over Baldwin County, Alabama on January 15, 1983. One set of spectral signatures was developed from the data by utilizing a 3x3 pixel sliding window approach. An analysis of the classification produced from this technique identified forested areas. Additional information regarding only the forested areas. Additional information regarding only the forested areas was extracted by employing a pixel-by-pixel signature development program which derived spectral statistics only for pixels within the forested land covers. The spectral statistics from both approaches were integrated and the data classified. This classification was evaluated by comparing the spectral classes produced from the data against corresponding ground verification polygons. This iterative data analysis technique resulted in an overall classification accuracy of 88.4 percent correct for slash pine, young pine, loblolly pine, natural pine, and mixed hardwood-pine. An accuracy assessment matrix has been produced for the classification.
Statistical approach for selection of biologically informative genes.
Das, Samarendra; Rai, Anil; Mishra, D C; Rai, Shesh N
2018-05-20
Selection of informative genes from high dimensional gene expression data has emerged as an important research area in genomics. Many gene selection techniques have been proposed so far are either based on relevancy or redundancy measure. Further, the performance of these techniques has been adjudged through post selection classification accuracy computed through a classifier using the selected genes. This performance metric may be statistically sound but may not be biologically relevant. A statistical approach, i.e. Boot-MRMR, was proposed based on a composite measure of maximum relevance and minimum redundancy, which is both statistically sound and biologically relevant for informative gene selection. For comparative evaluation of the proposed approach, we developed two biological sufficient criteria, i.e. Gene Set Enrichment with QTL (GSEQ) and biological similarity score based on Gene Ontology (GO). Further, a systematic and rigorous evaluation of the proposed technique with 12 existing gene selection techniques was carried out using five gene expression datasets. This evaluation was based on a broad spectrum of statistically sound (e.g. subject classification) and biological relevant (based on QTL and GO) criteria under a multiple criteria decision-making framework. The performance analysis showed that the proposed technique selects informative genes which are more biologically relevant. The proposed technique is also found to be quite competitive with the existing techniques with respect to subject classification and computational time. Our results also showed that under the multiple criteria decision-making setup, the proposed technique is best for informative gene selection over the available alternatives. Based on the proposed approach, an R Package, i.e. BootMRMR has been developed and available at https://cran.r-project.org/web/packages/BootMRMR. This study will provide a practical guide to select statistical techniques for selecting informative genes from high dimensional expression data for breeding and system biology studies. Published by Elsevier B.V.
The composite sequential clustering technique for analysis of multispectral scanner data
NASA Technical Reports Server (NTRS)
Su, M. Y.
1972-01-01
The clustering technique consists of two parts: (1) a sequential statistical clustering which is essentially a sequential variance analysis, and (2) a generalized K-means clustering. In this composite clustering technique, the output of (1) is a set of initial clusters which are input to (2) for further improvement by an iterative scheme. This unsupervised composite technique was employed for automatic classification of two sets of remote multispectral earth resource observations. The classification accuracy by the unsupervised technique is found to be comparable to that by traditional supervised maximum likelihood classification techniques. The mathematical algorithms for the composite sequential clustering program and a detailed computer program description with job setup are given.
Evaluation criteria for software classification inventories, accuracies, and maps
NASA Technical Reports Server (NTRS)
Jayroe, R. R., Jr.
1976-01-01
Statistical criteria are presented for modifying the contingency table used to evaluate tabular classification results obtained from remote sensing and ground truth maps. This classification technique contains information on the spatial complexity of the test site, on the relative location of classification errors, on agreement of the classification maps with ground truth maps, and reduces back to the original information normally found in a contingency table.
Analyzing thematic maps and mapping for accuracy
Rosenfield, G.H.
1982-01-01
Two problems which exist while attempting to test the accuracy of thematic maps and mapping are: (1) evaluating the accuracy of thematic content, and (2) evaluating the effects of the variables on thematic mapping. Statistical analysis techniques are applicable to both these problems and include techniques for sampling the data and determining their accuracy. In addition, techniques for hypothesis testing, or inferential statistics, are used when comparing the effects of variables. A comprehensive and valid accuracy test of a classification project, such as thematic mapping from remotely sensed data, includes the following components of statistical analysis: (1) sample design, including the sample distribution, sample size, size of the sample unit, and sampling procedure; and (2) accuracy estimation, including estimation of the variance and confidence limits. Careful consideration must be given to the minimum sample size necessary to validate the accuracy of a given. classification category. The results of an accuracy test are presented in a contingency table sometimes called a classification error matrix. Usually the rows represent the interpretation, and the columns represent the verification. The diagonal elements represent the correct classifications. The remaining elements of the rows represent errors by commission, and the remaining elements of the columns represent the errors of omission. For tests of hypothesis that compare variables, the general practice has been to use only the diagonal elements from several related classification error matrices. These data are arranged in the form of another contingency table. The columns of the table represent the different variables being compared, such as different scales of mapping. The rows represent the blocking characteristics, such as the various categories of classification. The values in the cells of the tables might be the counts of correct classification or the binomial proportions of these counts divided by either the row totals or the column totals from the original classification error matrices. In hypothesis testing, when the results of tests of multiple sample cases prove to be significant, some form of statistical test must be used to separate any results that differ significantly from the others. In the past, many analyses of the data in this error matrix were made by comparing the relative magnitudes of the percentage of correct classifications, for either individual categories, the entire map or both. More rigorous analyses have used data transformations and (or) two-way classification analysis of variance. A more sophisticated step of data analysis techniques would be to use the entire classification error matrices using the methods of discrete multivariate analysis or of multiviariate analysis of variance.
NASA Technical Reports Server (NTRS)
Sung, Q. C.; Miller, L. D.
1977-01-01
Three methods were tested for collection of the training sets needed to establish the spectral signatures of the land uses/land covers sought due to the difficulties of retrospective collection of representative ground control data. Computer preprocessing techniques applied to the digital images to improve the final classification results were geometric corrections, spectral band or image ratioing and statistical cleaning of the representative training sets. A minimal level of statistical verification was made based upon the comparisons between the airphoto estimates and the classification results. The verifications provided a further support to the selection of MSS band 5 and 7. It also indicated that the maximum likelihood ratioing technique can achieve more agreeable classification results with the airphoto estimates than the stepwise discriminant analysis.
An automated approach to the design of decision tree classifiers
NASA Technical Reports Server (NTRS)
Argentiero, P.; Chin, R.; Beaudet, P.
1982-01-01
An automated technique is presented for designing effective decision tree classifiers predicated only on a priori class statistics. The procedure relies on linear feature extractions and Bayes table look-up decision rules. Associated error matrices are computed and utilized to provide an optimal design of the decision tree at each so-called 'node'. A by-product of this procedure is a simple algorithm for computing the global probability of correct classification assuming the statistical independence of the decision rules. Attention is given to a more precise definition of decision tree classification, the mathematical details on the technique for automated decision tree design, and an example of a simple application of the procedure using class statistics acquired from an actual Landsat scene.
NASA Astrophysics Data System (ADS)
Ranaie, Mehrdad; Soffianian, Alireza; Pourmanafi, Saeid; Mirghaffari, Noorollah; Tarkesh, Mostafa
2018-03-01
In recent decade, analyzing the remotely sensed imagery is considered as one of the most common and widely used procedures in the environmental studies. In this case, supervised image classification techniques play a central role. Hence, taking a high resolution Worldview-3 over a mixed urbanized landscape in Iran, three less applied image classification methods including Bagged CART, Stochastic gradient boosting model and Neural network with feature extraction were tested and compared with two prevalent methods: random forest and support vector machine with linear kernel. To do so, each method was run ten time and three validation techniques was used to estimate the accuracy statistics consist of cross validation, independent validation and validation with total of train data. Moreover, using ANOVA and Tukey test, statistical difference significance between the classification methods was significantly surveyed. In general, the results showed that random forest with marginal difference compared to Bagged CART and stochastic gradient boosting model is the best performing method whilst based on independent validation there was no significant difference between the performances of classification methods. It should be finally noted that neural network with feature extraction and linear support vector machine had better processing speed than other.
"Relative CIR": an image enhancement and visualization technique
Fleming, Michael D.
1993-01-01
Many techniques exist to spectrally and spatially enhance digital multispectral scanner data. One technique enhances an image while keeping the colors as they would appear in a color-infrared (CIR) image. This "relative CIR" technique generates an image that is both spectrally and spatially enhanced, while displaying a maximum range of colors. The technique enables an interpreter to visualize either spectral or land cover classes by their relative CIR characteristics. A relative CIR image is generated by developed spectral statistics for each class in the classifications and then, using a nonparametric approach for spectral enhancement, the means of the classes for each band are ranked. A 3 by 3 pixel smoothing filter is applied to the classification for spatial enhancement and the classes are mapped to the representative rank for each band. Practical applications of the technique include displaying an image classification product as a CIR image that was not derived directly from a spectral image, visualizing how a land cover classification would look as a CIR image, and displaying a spectral classification or intermediate product that will be used to label spectral classes.
NASA Astrophysics Data System (ADS)
Zink, Frank Edward
The detection and classification of pulmonary nodules is of great interest in chest radiography. Nodules are often indicative of primary cancer, and their detection is particularly important in asymptomatic patients. The ability to classify nodules as calcified or non-calcified is important because calcification is a positive indicator that the nodule is benign. Dual-energy methods offer the potential to improve both the detection and classification of nodules by allowing the formation of material-selective images. Tissue-selective images can improve detection by virtue of the elimination of obscuring rib structure. Bone -selective images are essentially calcium images, allowing classification of the nodule. A dual-energy technique is introduced which uses a computed radiography system to acquire dual-energy chest radiographs in a single-exposure. All aspects of the dual-energy technique are described, with particular emphasis on scatter-correction, beam-hardening correction, and noise-reduction algorithms. The adaptive noise-reduction algorithm employed improves material-selective signal-to-noise ratio by up to a factor of seven with minimal sacrifice in selectivity. A clinical comparison study is described, undertaken to compare the dual-energy technique to conventional chest radiography for the tasks of nodule detection and classification. Observer performance data were collected using the Free Response Observer Characteristic (FROC) method and the bi-normal Alternative FROC (AFROC) performance model. Results of the comparison study, analyzed using two common multiple observer statistical models, showed that the dual-energy technique was superior to conventional chest radiography for detection of nodules at a statistically significant level (p < .05). Discussion of the comparison study emphasizes the unique combination of data collection and analysis techniques employed, as well as the limitations of comparison techniques in the larger context of technology assessment.
Signal analysis techniques for incipient failure detection in turbomachinery
NASA Technical Reports Server (NTRS)
Coffin, T.
1985-01-01
Signal analysis techniques for the detection and classification of incipient mechanical failures in turbomachinery were developed, implemented and evaluated. Signal analysis techniques available to describe dynamic measurement characteristics are reviewed. Time domain and spectral methods are described, and statistical classification in terms of moments is discussed. Several of these waveform analysis techniques were implemented on a computer and applied to dynamic signals. A laboratory evaluation of the methods with respect to signal detection capability is described. Plans for further technique evaluation and data base development to characterize turbopump incipient failure modes from Space Shuttle main engine (SSME) hot firing measurements are outlined.
Using statistical text classification to identify health information technology incidents
Chai, Kevin E K; Anthony, Stephen; Coiera, Enrico; Magrabi, Farah
2013-01-01
Objective To examine the feasibility of using statistical text classification to automatically identify health information technology (HIT) incidents in the USA Food and Drug Administration (FDA) Manufacturer and User Facility Device Experience (MAUDE) database. Design We used a subset of 570 272 incidents including 1534 HIT incidents reported to MAUDE between 1 January 2008 and 1 July 2010. Text classifiers using regularized logistic regression were evaluated with both ‘balanced’ (50% HIT) and ‘stratified’ (0.297% HIT) datasets for training, validation, and testing. Dataset preparation, feature extraction, feature selection, cross-validation, classification, performance evaluation, and error analysis were performed iteratively to further improve the classifiers. Feature-selection techniques such as removing short words and stop words, stemming, lemmatization, and principal component analysis were examined. Measurements κ statistic, F1 score, precision and recall. Results Classification performance was similar on both the stratified (0.954 F1 score) and balanced (0.995 F1 score) datasets. Stemming was the most effective technique, reducing the feature set size to 79% while maintaining comparable performance. Training with balanced datasets improved recall (0.989) but reduced precision (0.165). Conclusions Statistical text classification appears to be a feasible method for identifying HIT reports within large databases of incidents. Automated identification should enable more HIT problems to be detected, analyzed, and addressed in a timely manner. Semi-supervised learning may be necessary when applying machine learning to big data analysis of patient safety incidents and requires further investigation. PMID:23666777
Multiple signal classification algorithm for super-resolution fluorescence microscopy
Agarwal, Krishna; Macháň, Radek
2016-01-01
Single-molecule localization techniques are restricted by long acquisition and computational times, or the need of special fluorophores or biologically toxic photochemical environments. Here we propose a statistical super-resolution technique of wide-field fluorescence microscopy we call the multiple signal classification algorithm which has several advantages. It provides resolution down to at least 50 nm, requires fewer frames and lower excitation power and works even at high fluorophore concentrations. Further, it works with any fluorophore that exhibits blinking on the timescale of the recording. The multiple signal classification algorithm shows comparable or better performance in comparison with single-molecule localization techniques and four contemporary statistical super-resolution methods for experiments of in vitro actin filaments and other independently acquired experimental data sets. We also demonstrate super-resolution at timescales of 245 ms (using 49 frames acquired at 200 frames per second) in samples of live-cell microtubules and live-cell actin filaments imaged without imaging buffers. PMID:27934858
Ben Chaabane, Salim; Fnaiech, Farhat
2014-01-23
Color image segmentation has been so far applied in many areas; hence, recently many different techniques have been developed and proposed. In the medical imaging area, the image segmentation may be helpful to provide assistance to doctor in order to follow-up the disease of a certain patient from the breast cancer processed images. The main objective of this work is to rebuild and also to enhance each cell from the three component images provided by an input image. Indeed, from an initial segmentation obtained using the statistical features and histogram threshold techniques, the resulting segmentation may represent accurately the non complete and pasted cells and enhance them. This allows real help to doctors, and consequently, these cells become clear and easy to be counted. A novel method for color edges extraction based on statistical features and automatic threshold is presented. The traditional edge detector, based on the first and the second order neighborhood, describing the relationship between the current pixel and its neighbors, is extended to the statistical domain. Hence, color edges in an image are obtained by combining the statistical features and the automatic threshold techniques. Finally, on the obtained color edges with specific primitive color, a combination rule is used to integrate the edge results over the three color components. Breast cancer cell images were used to evaluate the performance of the proposed method both quantitatively and qualitatively. Hence, a visual and a numerical assessment based on the probability of correct classification (PC), the false classification (Pf), and the classification accuracy (Sens(%)) are presented and compared with existing techniques. The proposed method shows its superiority in the detection of points which really belong to the cells, and also the facility of counting the number of the processed cells. Computer simulations highlight that the proposed method substantially enhances the segmented image with smaller error rates better than other existing algorithms under the same settings (patterns and parameters). Moreover, it provides high classification accuracy, reaching the rate of 97.94%. Additionally, the segmentation method may be extended to other medical imaging types having similar properties.
NASA Technical Reports Server (NTRS)
Hoffer, R. M. (Principal Investigator); Knowlton, D. J.; Dean, M. E.
1981-01-01
A set of training statistics for the 30 meter resolution simulated thematic mapper MSS data was generated based on land use/land cover classes. In addition to this supervised data set, a nonsupervised multicluster block of training statistics is being defined in order to compare the classification results and evaluate the effect of the different training selection methods on classification performance. Two test data sets, defined using a stratified sampling procedure incorporating a grid system with dimensions of 50 lines by 50 columns, and another set based on an analyst supervised set of test fields were used to evaluate the classifications of the TMS data. The supervised training data set generated training statistics, and a per point Gaussian maximum likelihood classification of the 1979 TMS data was obtained. The August 1980 MSS data was radiometrically adjusted. The SAR data was redigitized and the SAR imagery was qualitatively analyzed.
Comparison analysis for classification algorithm in data mining and the study of model use
NASA Astrophysics Data System (ADS)
Chen, Junde; Zhang, Defu
2018-04-01
As a key technique in data mining, classification algorithm was received extensive attention. Through an experiment of classification algorithm in UCI data set, we gave a comparison analysis method for the different algorithms and the statistical test was used here. Than that, an adaptive diagnosis model for preventive electricity stealing and leakage was given as a specific case in the paper.
Kalegowda, Yogesh; Harmer, Sarah L
2012-03-20
Time-of-flight secondary ion mass spectrometry (TOF-SIMS) spectra of mineral samples are complex, comprised of large mass ranges and many peaks. Consequently, characterization and classification analysis of these systems is challenging. In this study, different chemometric and statistical data evaluation methods, based on monolayer sensitive TOF-SIMS data, have been tested for the characterization and classification of copper-iron sulfide minerals (chalcopyrite, chalcocite, bornite, and pyrite) at different flotation pulp conditions (feed, conditioned feed, and Eh modified). The complex mass spectral data sets were analyzed using the following chemometric and statistical techniques: principal component analysis (PCA); principal component-discriminant functional analysis (PC-DFA); soft independent modeling of class analogy (SIMCA); and k-Nearest Neighbor (k-NN) classification. PCA was found to be an important first step in multivariate analysis, providing insight into both the relative grouping of samples and the elemental/molecular basis for those groupings. For samples exposed to oxidative conditions (at Eh ~430 mV), each technique (PCA, PC-DFA, SIMCA, and k-NN) was found to produce excellent classification. For samples at reductive conditions (at Eh ~ -200 mV SHE), k-NN and SIMCA produced the most accurate classification. Phase identification of particles that contain the same elements but a different crystal structure in a mixed multimetal mineral system has been achieved.
Some sequential, distribution-free pattern classification procedures with applications
NASA Technical Reports Server (NTRS)
Poage, J. L.
1971-01-01
Some sequential, distribution-free pattern classification techniques are presented. The decision problem to which the proposed classification methods are applied is that of discriminating between two kinds of electroencephalogram responses recorded from a human subject: spontaneous EEG and EEG driven by a stroboscopic light stimulus at the alpha frequency. The classification procedures proposed make use of the theory of order statistics. Estimates of the probabilities of misclassification are given. The procedures were tested on Gaussian samples and the EEG responses.
Using classification tree analysis to predict oak wilt distribution in Minnesota and Texas
Marla c. Downing; Vernon L. Thomas; Jennifer Juzwik; David N. Appel; Robin M. Reich; Kim Camilli
2008-01-01
We developed a methodology and compared results for predicting the potential distribution of Ceratocystis fagacearum (causal agent of oak wilt), in both Anoka County, MN, and Fort Hood, TX. The Potential Distribution of Oak Wilt (PDOW) utilizes a binary classification tree statistical technique that incorporates: geographical information systems (GIS...
ERIC Educational Resources Information Center
Acar, Tu¨lin
2014-01-01
In literature, it has been observed that many enhanced criteria are limited by factor analysis techniques. Besides examinations of statistical structure and/or psychological structure, such validity studies as cross validation and classification-sequencing studies should be performed frequently. The purpose of this study is to examine cross…
Pattern recognition of satellite cloud imagery for improved weather prediction
NASA Technical Reports Server (NTRS)
Gautier, Catherine; Somerville, Richard C. J.; Volfson, Leonid B.
1986-01-01
The major accomplishment was the successful development of a method for extracting time derivative information from geostationary meteorological satellite imagery. This research is a proof-of-concept study which demonstrates the feasibility of using pattern recognition techniques and a statistical cloud classification method to estimate time rate of change of large-scale meteorological fields from remote sensing data. The cloud classification methodology is based on typical shape function analysis of parameter sets characterizing the cloud fields. The three specific technical objectives, all of which were successfully achieved, are as follows: develop and test a cloud classification technique based on pattern recognition methods, suitable for the analysis of visible and infrared geostationary satellite VISSR imagery; develop and test a methodology for intercomparing successive images using the cloud classification technique, so as to obtain estimates of the time rate of change of meteorological fields; and implement this technique in a testbed system incorporating an interactive graphics terminal to determine the feasibility of extracting time derivative information suitable for comparison with numerical weather prediction products.
Applications of Support Vector Machines In Chemo And Bioinformatics
NASA Astrophysics Data System (ADS)
Jayaraman, V. K.; Sundararajan, V.
2010-10-01
Conventional linear & nonlinear tools for classification, regression & data driven modeling are being replaced on a rapid scale by newer techniques & tools based on artificial intelligence and machine learning. While the linear techniques are not applicable for inherently nonlinear problems, newer methods serve as attractive alternatives for solving real life problems. Support Vector Machine (SVM) classifiers are a set of universal feed-forward network based classification algorithms that have been formulated from statistical learning theory and structural risk minimization principle. SVM regression closely follows the classification methodology. In this work recent applications of SVM in Chemo & Bioinformatics will be described with suitable illustrative examples.
Text Classification for Organizational Researchers
Kobayashi, Vladimer B.; Mol, Stefan T.; Berkers, Hannah A.; Kismihók, Gábor; Den Hartog, Deanne N.
2017-01-01
Organizations are increasingly interested in classifying texts or parts thereof into categories, as this enables more effective use of their information. Manual procedures for text classification work well for up to a few hundred documents. However, when the number of documents is larger, manual procedures become laborious, time-consuming, and potentially unreliable. Techniques from text mining facilitate the automatic assignment of text strings to categories, making classification expedient, fast, and reliable, which creates potential for its application in organizational research. The purpose of this article is to familiarize organizational researchers with text mining techniques from machine learning and statistics. We describe the text classification process in several roughly sequential steps, namely training data preparation, preprocessing, transformation, application of classification techniques, and validation, and provide concrete recommendations at each step. To help researchers develop their own text classifiers, the R code associated with each step is presented in a tutorial. The tutorial draws from our own work on job vacancy mining. We end the article by discussing how researchers can validate a text classification model and the associated output. PMID:29881249
Design of partially supervised classifiers for multispectral image data
NASA Technical Reports Server (NTRS)
Jeon, Byeungwoo; Landgrebe, David
1993-01-01
A partially supervised classification problem is addressed, especially when the class definition and corresponding training samples are provided a priori only for just one particular class. In practical applications of pattern classification techniques, a frequently observed characteristic is the heavy, often nearly impossible requirements on representative prior statistical class characteristics of all classes in a given data set. Considering the effort in both time and man-power required to have a well-defined, exhaustive list of classes with a corresponding representative set of training samples, this 'partially' supervised capability would be very desirable, assuming adequate classifier performance can be obtained. Two different classification algorithms are developed to achieve simplicity in classifier design by reducing the requirement of prior statistical information without sacrificing significant classifying capability. The first one is based on optimal significance testing, where the optimal acceptance probability is estimated directly from the data set. In the second approach, the partially supervised classification is considered as a problem of unsupervised clustering with initially one known cluster or class. A weighted unsupervised clustering procedure is developed to automatically define other classes and estimate their class statistics. The operational simplicity thus realized should make these partially supervised classification schemes very viable tools in pattern classification.
Wu, Jianning; Wu, Bin
2015-01-01
The accurate identification of gait asymmetry is very beneficial to the assessment of at-risk gait in the clinical applications. This paper investigated the application of classification method based on statistical learning algorithm to quantify gait symmetry based on the assumption that the degree of intrinsic change in dynamical system of gait is associated with the different statistical distributions between gait variables from left-right side of lower limbs; that is, the discrimination of small difference of similarity between lower limbs is considered the reorganization of their different probability distribution. The kinetic gait data of 60 participants were recorded using a strain gauge force platform during normal walking. The classification method is designed based on advanced statistical learning algorithm such as support vector machine algorithm for binary classification and is adopted to quantitatively evaluate gait symmetry. The experiment results showed that the proposed method could capture more intrinsic dynamic information hidden in gait variables and recognize the right-left gait patterns with superior generalization performance. Moreover, our proposed techniques could identify the small significant difference between lower limbs when compared to the traditional symmetry index method for gait. The proposed algorithm would become an effective tool for early identification of the elderly gait asymmetry in the clinical diagnosis. PMID:25705672
Wu, Jianning; Wu, Bin
2015-01-01
The accurate identification of gait asymmetry is very beneficial to the assessment of at-risk gait in the clinical applications. This paper investigated the application of classification method based on statistical learning algorithm to quantify gait symmetry based on the assumption that the degree of intrinsic change in dynamical system of gait is associated with the different statistical distributions between gait variables from left-right side of lower limbs; that is, the discrimination of small difference of similarity between lower limbs is considered the reorganization of their different probability distribution. The kinetic gait data of 60 participants were recorded using a strain gauge force platform during normal walking. The classification method is designed based on advanced statistical learning algorithm such as support vector machine algorithm for binary classification and is adopted to quantitatively evaluate gait symmetry. The experiment results showed that the proposed method could capture more intrinsic dynamic information hidden in gait variables and recognize the right-left gait patterns with superior generalization performance. Moreover, our proposed techniques could identify the small significant difference between lower limbs when compared to the traditional symmetry index method for gait. The proposed algorithm would become an effective tool for early identification of the elderly gait asymmetry in the clinical diagnosis.
Gunavathi, Chellamuthu; Premalatha, Kandasamy
2014-01-01
Feature selection in cancer classification is a central area of research in the field of bioinformatics and used to select the informative genes from thousands of genes of the microarray. The genes are ranked based on T-statistics, signal-to-noise ratio (SNR), and F-test values. The swarm intelligence (SI) technique finds the informative genes from the top-m ranked genes. These selected genes are used for classification. In this paper the shuffled frog leaping with Lévy flight (SFLLF) is proposed for feature selection. In SFLLF, the Lévy flight is included to avoid premature convergence of shuffled frog leaping (SFL) algorithm. The SI techniques such as particle swarm optimization (PSO), cuckoo search (CS), SFL, and SFLLF are used for feature selection which identifies informative genes for classification. The k-nearest neighbour (k-NN) technique is used to classify the samples. The proposed work is applied on 10 different benchmark datasets and examined with SI techniques. The experimental results show that the results obtained from k-NN classifier through SFLLF feature selection method outperform PSO, CS, and SFL.
NASA Astrophysics Data System (ADS)
Äijälä, Mikko; Heikkinen, Liine; Fröhlich, Roman; Canonaco, Francesco; Prévôt, André S. H.; Junninen, Heikki; Petäjä, Tuukka; Kulmala, Markku; Worsnop, Douglas; Ehn, Mikael
2017-03-01
Mass spectrometric measurements commonly yield data on hundreds of variables over thousands of points in time. Refining and synthesizing this raw data into chemical information necessitates the use of advanced, statistics-based data analytical techniques. In the field of analytical aerosol chemistry, statistical, dimensionality reductive methods have become widespread in the last decade, yet comparable advanced chemometric techniques for data classification and identification remain marginal. Here we present an example of combining data dimensionality reduction (factorization) with exploratory classification (clustering), and show that the results cannot only reproduce and corroborate earlier findings, but also complement and broaden our current perspectives on aerosol chemical classification. We find that applying positive matrix factorization to extract spectral characteristics of the organic component of air pollution plumes, together with an unsupervised clustering algorithm, k-means+ + , for classification, reproduces classical organic aerosol speciation schemes. Applying appropriately chosen metrics for spectral dissimilarity along with optimized data weighting, the source-specific pollution characteristics can be statistically resolved even for spectrally very similar aerosol types, such as different combustion-related anthropogenic aerosol species and atmospheric aerosols with similar degree of oxidation. In addition to the typical oxidation level and source-driven aerosol classification, we were also able to classify and characterize outlier groups that would likely be disregarded in a more conventional analysis. Evaluating solution quality for the classification also provides means to assess the performance of mass spectral similarity metrics and optimize weighting for mass spectral variables. This facilitates algorithm-based evaluation of aerosol spectra, which may prove invaluable for future development of automatic methods for spectra identification and classification. Robust, statistics-based results and data visualizations also provide important clues to a human analyst on the existence and chemical interpretation of data structures. Applying these methods to a test set of data, aerosol mass spectrometric data of organic aerosol from a boreal forest site, yielded five to seven different recurring pollution types from various sources, including traffic, cooking, biomass burning and nearby sawmills. Additionally, three distinct, minor pollution types were discovered and identified as amine-dominated aerosols.
Superiority of artificial neural networks for a genetic classification procedure.
Sant'Anna, I C; Tomaz, R S; Silva, G N; Nascimento, M; Bhering, L L; Cruz, C D
2015-08-19
The correct classification of individuals is extremely important for the preservation of genetic variability and for maximization of yield in breeding programs using phenotypic traits and genetic markers. The Fisher and Anderson discriminant functions are commonly used multivariate statistical techniques for these situations, which allow for the allocation of an initially unknown individual to predefined groups. However, for higher levels of similarity, such as those found in backcrossed populations, these methods have proven to be inefficient. Recently, much research has been devoted to developing a new paradigm of computing known as artificial neural networks (ANNs), which can be used to solve many statistical problems, including classification problems. The aim of this study was to evaluate the feasibility of ANNs as an evaluation technique of genetic diversity by comparing their performance with that of traditional methods. The discriminant functions were equally ineffective in discriminating the populations, with error rates of 23-82%, thereby preventing the correct discrimination of individuals between populations. The ANN was effective in classifying populations with low and high differentiation, such as those derived from a genetic design established from backcrosses, even in cases of low differentiation of the data sets. The ANN appears to be a promising technique to solve classification problems, since the number of individuals classified incorrectly by the ANN was always lower than that of the discriminant functions. We envisage the potential relevant application of this improved procedure in the genomic classification of markers to distinguish between breeds and accessions.
NASA Technical Reports Server (NTRS)
Joyce, A. T.
1974-01-01
Significant progress has been made in the classification of surface conditions (land uses) with computer-implemented techniques based on the use of ERTS digital data and pattern recognition software. The supervised technique presently used at the NASA Earth Resources Laboratory is based on maximum likelihood ratioing with a digital table look-up approach to classification. After classification, colors are assigned to the various surface conditions (land uses) classified, and the color-coded classification is film recorded on either positive or negative 9 1/2 in. film at the scale desired. Prints of the film strips are then mosaicked and photographed to produce a land use map in the format desired. Computer extraction of statistical information is performed to show the extent of each surface condition (land use) within any given land unit that can be identified in the image. Evaluations of the product indicate that classification accuracy is well within the limits for use by land resource managers and administrators. Classifications performed with digital data acquired during different seasons indicate that the combination of two or more classifications offer even better accuracy.
ERIC Educational Resources Information Center
Spearing, Debra; Woehlke, Paula
To assess the effect on discriminant analysis in terms of correct classification into two groups, the following parameters were systematically altered using Monte Carlo techniques: sample sizes; proportions of one group to the other; number of independent variables; and covariance matrices. The pairing of the off diagonals (or covariances) with…
NASA Technical Reports Server (NTRS)
Coggeshall, M. E.; Hoffer, R. M.
1973-01-01
Remote sensing equipment and automatic data processing techniques were employed as aids in the institution of improved forest resource management methods. On the basis of automatically calculated statistics derived from manually selected training samples, the feature selection processor of LARSYS selected, upon consideration of various groups of the four available spectral regions, a series of channel combinations whose automatic classification performances (for six cover types, including both deciduous and coniferous forest) were tested, analyzed, and further compared with automatic classification results obtained from digitized color infrared photography.
Das, Dev Kumar; Ghosh, Madhumala; Pal, Mallika; Maiti, Asok K; Chakraborty, Chandan
2013-02-01
The aim of this paper is to address the development of computer assisted malaria parasite characterization and classification using machine learning approach based on light microscopic images of peripheral blood smears. In doing this, microscopic image acquisition from stained slides, illumination correction and noise reduction, erythrocyte segmentation, feature extraction, feature selection and finally classification of different stages of malaria (Plasmodium vivax and Plasmodium falciparum) have been investigated. The erythrocytes are segmented using marker controlled watershed transformation and subsequently total ninety six features describing shape-size and texture of erythrocytes are extracted in respect to the parasitemia infected versus non-infected cells. Ninety four features are found to be statistically significant in discriminating six classes. Here a feature selection-cum-classification scheme has been devised by combining F-statistic, statistical learning techniques i.e., Bayesian learning and support vector machine (SVM) in order to provide the higher classification accuracy using best set of discriminating features. Results show that Bayesian approach provides the highest accuracy i.e., 84% for malaria classification by selecting 19 most significant features while SVM provides highest accuracy i.e., 83.5% with 9 most significant features. Finally, the performance of these two classifiers under feature selection framework has been compared toward malaria parasite classification. Copyright © 2012 Elsevier Ltd. All rights reserved.
Yang, Jun-Ho; Yoh, Jack J
2018-01-01
A novel technique is reported for separating overlapping latent fingerprints using chemometric approaches that combine laser-induced breakdown spectroscopy (LIBS) and multivariate analysis. The LIBS technique provides the capability of real time analysis and high frequency scanning as well as the data regarding the chemical composition of overlapping latent fingerprints. These spectra offer valuable information for the classification and reconstruction of overlapping latent fingerprints by implementing appropriate statistical multivariate analysis. The current study employs principal component analysis and partial least square methods for the classification of latent fingerprints from the LIBS spectra. This technique was successfully demonstrated through a classification study of four distinct latent fingerprints using classification methods such as soft independent modeling of class analogy (SIMCA) and partial least squares discriminant analysis (PLS-DA). The novel method yielded an accuracy of more than 85% and was proven to be sufficiently robust. Furthermore, through laser scanning analysis at a spatial interval of 125 µm, the overlapping fingerprints were reconstructed as separate two-dimensional forms.
Dasgupta, Nilanjan; Carin, Lawrence
2005-04-01
Time-reversal imaging (TRI) is analogous to matched-field processing, although TRI is typically very wideband and is appropriate for subsequent target classification (in addition to localization). Time-reversal techniques, as applied to acoustic target classification, are highly sensitive to channel mismatch. Hence, it is crucial to estimate the channel parameters before time-reversal imaging is performed. The channel-parameter statistics are estimated here by applying a geoacoustic inversion technique based on Gibbs sampling. The maximum a posteriori (MAP) estimate of the channel parameters are then used to perform time-reversal imaging. Time-reversal implementation requires a fast forward model, implemented here by a normal-mode framework. In addition to imaging, extraction of features from the time-reversed images is explored, with these applied to subsequent target classification. The classification of time-reversed signatures is performed by the relevance vector machine (RVM). The efficacy of the technique is analyzed on simulated in-channel data generated by a free-field finite element method (FEM) code, in conjunction with a channel propagation model, wherein the final classification performance is demonstrated to be relatively insensitive to the associated channel parameters. The underlying theory of Gibbs sampling and TRI are presented along with the feature extraction and target classification via the RVM.
Best Merge Region Growing with Integrated Probabilistic Classification for Hyperspectral Imagery
NASA Technical Reports Server (NTRS)
Tarabalka, Yuliya; Tilton, James C.
2011-01-01
A new method for spectral-spatial classification of hyperspectral images is proposed. The method is based on the integration of probabilistic classification within the hierarchical best merge region growing algorithm. For this purpose, preliminary probabilistic support vector machines classification is performed. Then, hierarchical step-wise optimization algorithm is applied, by iteratively merging regions with the smallest Dissimilarity Criterion (DC). The main novelty of this method consists in defining a DC between regions as a function of region statistical and geometrical features along with classification probabilities. Experimental results are presented on a 200-band AVIRIS image of the Northwestern Indiana s vegetation area and compared with those obtained by recently proposed spectral-spatial classification techniques. The proposed method improves classification accuracies when compared to other classification approaches.
Das, D K; Maiti, A K; Chakraborty, C
2015-03-01
In this paper, we propose a comprehensive image characterization cum classification framework for malaria-infected stage detection using microscopic images of thin blood smears. The methodology mainly includes microscopic imaging of Leishman stained blood slides, noise reduction and illumination correction, erythrocyte segmentation, feature selection followed by machine classification. Amongst three-image segmentation algorithms (namely, rule-based, Chan-Vese-based and marker-controlled watershed methods), marker-controlled watershed technique provides better boundary detection of erythrocytes specially in overlapping situations. Microscopic features at intensity, texture and morphology levels are extracted to discriminate infected and noninfected erythrocytes. In order to achieve subgroup of potential features, feature selection techniques, namely, F-statistic and information gain criteria are considered here for ranking. Finally, five different classifiers, namely, Naive Bayes, multilayer perceptron neural network, logistic regression, classification and regression tree (CART), RBF neural network have been trained and tested by 888 erythrocytes (infected and noninfected) for each features' subset. Performance evaluation of the proposed methodology shows that multilayer perceptron network provides higher accuracy for malaria-infected erythrocytes recognition and infected stage classification. Results show that top 90 features ranked by F-statistic (specificity: 98.64%, sensitivity: 100%, PPV: 99.73% and overall accuracy: 96.84%) and top 60 features ranked by information gain provides better results (specificity: 97.29%, sensitivity: 100%, PPV: 99.46% and overall accuracy: 96.73%) for malaria-infected stage classification. © 2014 The Authors Journal of Microscopy © 2014 Royal Microscopical Society.
NASA Astrophysics Data System (ADS)
Kerr, Laura T.; Adams, Aine; O'Dea, Shirley; Domijan, Katarina; Cullen, Ivor; Hennelly, Bryan M.
2014-05-01
Raman microspectroscopy can be applied to the urinary bladder for highly accurate classification and diagnosis of bladder cancer. This technique can be applied in vitro to bladder epithelial cells obtained from urine cytology or in vivo as an optical biopsy" to provide results in real-time with higher sensitivity and specificity than current clinical methods. However, there exists a high degree of variability across experimental parameters which need to be standardised before this technique can be utilized in an everyday clinical environment. In this study, we investigate different laser wavelengths (473 nm and 532 nm), sample substrates (glass, fused silica and calcium fluoride) and multivariate statistical methods in order to gain insight into how these various experimental parameters impact on the sensitivity and specificity of Raman cytology.
A Hybrid Semi-supervised Classification Scheme for Mining Multisource Geospatial Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vatsavai, Raju; Bhaduri, Budhendra L
2011-01-01
Supervised learning methods such as Maximum Likelihood (ML) are often used in land cover (thematic) classification of remote sensing imagery. ML classifier relies exclusively on spectral characteristics of thematic classes whose statistical distributions (class conditional probability densities) are often overlapping. The spectral response distributions of thematic classes are dependent on many factors including elevation, soil types, and ecological zones. A second problem with statistical classifiers is the requirement of large number of accurate training samples (10 to 30 |dimensions|), which are often costly and time consuming to acquire over large geographic regions. With the increasing availability of geospatial databases, itmore » is possible to exploit the knowledge derived from these ancillary datasets to improve classification accuracies even when the class distributions are highly overlapping. Likewise newer semi-supervised techniques can be adopted to improve the parameter estimates of statistical model by utilizing a large number of easily available unlabeled training samples. Unfortunately there is no convenient multivariate statistical model that can be employed for mulitsource geospatial databases. In this paper we present a hybrid semi-supervised learning algorithm that effectively exploits freely available unlabeled training samples from multispectral remote sensing images and also incorporates ancillary geospatial databases. We have conducted several experiments on real datasets, and our new hybrid approach shows over 25 to 35% improvement in overall classification accuracy over conventional classification schemes.« less
Automatic classification of animal vocalizations
NASA Astrophysics Data System (ADS)
Clemins, Patrick J.
2005-11-01
Bioacoustics, the study of animal vocalizations, has begun to use increasingly sophisticated analysis techniques in recent years. Some common tasks in bioacoustics are repertoire determination, call detection, individual identification, stress detection, and behavior correlation. Each research study, however, uses a wide variety of different measured variables, called features, and classification systems to accomplish these tasks. The well-established field of human speech processing has developed a number of different techniques to perform many of the aforementioned bioacoustics tasks. Melfrequency cepstral coefficients (MFCCs) and perceptual linear prediction (PLP) coefficients are two popular feature sets. The hidden Markov model (HMM), a statistical model similar to a finite autonoma machine, is the most commonly used supervised classification model and is capable of modeling both temporal and spectral variations. This research designs a framework that applies models from human speech processing for bioacoustic analysis tasks. The development of the generalized perceptual linear prediction (gPLP) feature extraction model is one of the more important novel contributions of the framework. Perceptual information from the species under study can be incorporated into the gPLP feature extraction model to represent the vocalizations as the animals might perceive them. By including this perceptual information and modifying parameters of the HMM classification system, this framework can be applied to a wide range of species. The effectiveness of the framework is shown by analyzing African elephant and beluga whale vocalizations. The features extracted from the African elephant data are used as input to a supervised classification system and compared to results from traditional statistical tests. The gPLP features extracted from the beluga whale data are used in an unsupervised classification system and the results are compared to labels assigned by experts. The development of a framework from which to build animal vocalization classifiers will provide bioacoustics researchers with a consistent platform to analyze and classify vocalizations. A common framework will also allow studies to compare results across species and institutions. In addition, the use of automated classification techniques can speed analysis and uncover behavioral correlations not readily apparent using traditional techniques.
Na, X D; Zang, S Y; Wu, C S; Li, W L
2015-11-01
Knowledge of the spatial extent of forested wetlands is essential to many studies including wetland functioning assessment, greenhouse gas flux estimation, and wildlife suitable habitat identification. For discriminating forested wetlands from their adjacent land cover types, researchers have resorted to image analysis techniques applied to numerous remotely sensed data. While with some success, there is still no consensus on the optimal approaches for mapping forested wetlands. To address this problem, we examined two machine learning approaches, random forest (RF) and K-nearest neighbor (KNN) algorithms, and applied these two approaches to the framework of pixel-based and object-based classifications. The RF and KNN algorithms were constructed using predictors derived from Landsat 8 imagery, Radarsat-2 advanced synthetic aperture radar (SAR), and topographical indices. The results show that the objected-based classifications performed better than per-pixel classifications using the same algorithm (RF) in terms of overall accuracy and the difference of their kappa coefficients are statistically significant (p<0.01). There were noticeably omissions for forested and herbaceous wetlands based on the per-pixel classifications using the RF algorithm. As for the object-based image analysis, there were also statistically significant differences (p<0.01) of Kappa coefficient between results performed based on RF and KNN algorithms. The object-based classification using RF provided a more visually adequate distribution of interested land cover types, while the object classifications based on the KNN algorithm showed noticeably commissions for forested wetlands and omissions for agriculture land. This research proves that the object-based classification with RF using optical, radar, and topographical data improved the mapping accuracy of land covers and provided a feasible approach to discriminate the forested wetlands from the other land cover types in forestry area.
Longobardi, F; Ventrella, A; Bianco, A; Catucci, L; Cafagna, I; Gallo, V; Mastrorilli, P; Agostiano, A
2013-12-01
In this study, non-targeted (1)H NMR fingerprinting was used in combination with multivariate statistical techniques for the classification of Italian sweet cherries based on their different geographical origins (Emilia Romagna and Puglia). As classification techniques, Soft Independent Modelling of Class Analogy (SIMCA), Partial Least Squares Discriminant Analysis (PLS-DA), and Linear Discriminant Analysis (LDA) were carried out and the results were compared. For LDA, before performing a refined selection of the number/combination of variables, two different strategies for a preliminary reduction of the variable number were tested. The best average recognition and CV prediction abilities (both 100.0%) were obtained for all the LDA models, although PLS-DA also showed remarkable performances (94.6%). All the statistical models were validated by observing the prediction abilities with respect to an external set of cherry samples. The best result (94.9%) was obtained with LDA by performing a best subset selection procedure on a set of 30 principal components previously selected by a stepwise decorrelation. The metabolites that mostly contributed to the classification performances of such LDA model, were found to be malate, glucose, fructose, glutamine and succinate. Copyright © 2013 Elsevier Ltd. All rights reserved.
Gilbert, Fabian; Böhm, Dirk; Eden, Lars; Schmalzl, Jonas; Meffert, Rainer H; Köstler, Herbert; Weng, Andreas M; Ziegler, Dirk
2016-08-22
The Goutallier Classification is a semi quantitative classification system to determine the amount of fatty degeneration in rotator cuff muscles. Although initially proposed for axial computer tomography scans it is currently applied to magnet-resonance-imaging-scans. The role for its clinical use is controversial, as the reliability of the classification has been shown to be inconsistent. The purpose of this study was to compare the semi quantitative MRI-based Goutallier Classification applied by 5 different raters to experimental MR spectroscopic quantitative fat measurement in order to determine the correlation between this classification system and the true extent of fatty degeneration shown by spectroscopy. MRI-scans of 42 patients with rotator cuff tears were examined by 5 shoulder surgeons and were graduated according to the MRI-based Goutallier Classification proposed by Fuchs et al. Additionally the fat/water ratio was measured with MR spectroscopy using the experimental SPLASH technique. The semi quantitative grading according to the Goutallier Classification was statistically correlated with the quantitative measured fat/water ratio using Spearman's rank correlation. Statistical analysis of the data revealed only fair correlation of the Goutallier Classification system and the quantitative fat/water ratio with R = 0.35 (p < 0.05). By dichotomizing the scale the correlation was 0.72. The interobserver and intraobserver reliabilities were substantial with R = 0.62 and R = 0.74 (p < 0.01). The correlation between the semi quantitative MRI based Goutallier Classification system and MR spectroscopic fat measurement is weak. As an adequate estimation of fatty degeneration based on standard MRI may not be possible, quantitative methods need to be considered in order to increase diagnostic safety and thus provide patients with ideal care in regard to the amount of fatty degeneration. Spectroscopic MR measurement may increase the accuracy of the Goutallier classification and thus improve the prediction of clinical results after rotator cuff repair. However, these techniques are currently only available in an experimental setting.
Rule groupings in expert systems using nearest neighbour decision rules, and convex hulls
NASA Technical Reports Server (NTRS)
Anastasiadis, Stergios
1991-01-01
Expert System shells are lacking in many areas of software engineering. Large rule based systems are not semantically comprehensible, difficult to debug, and impossible to modify or validate. Partitioning a set of rules found in CLIPS (C Language Integrated Production System) into groups of rules which reflect the underlying semantic subdomains of the problem, will address adequately the concerns stated above. Techniques are introduced to structure a CLIPS rule base into groups of rules that inherently have common semantic information. The concepts involved are imported from the field of A.I., Pattern Recognition, and Statistical Inference. Techniques focus on the areas of feature selection, classification, and a criteria of how 'good' the classification technique is, based on Bayesian Decision Theory. A variety of distance metrics are discussed for measuring the 'closeness' of CLIPS rules and various Nearest Neighbor classification algorithms are described based on the above metric.
Trophic classification of selected Colorado lakes
NASA Technical Reports Server (NTRS)
Blackwell, R. J.; Boland, D. H. P.
1979-01-01
Multispectral scanner data, acquired over several Colorado lakes using LANDSAT-1 and aircraft, were used in conjunction with contact-sensed water quality data to determine the feasibility of assessing lacustrine trophic levels. A trophic state index was developed using contact-sensed data for several trophic indicators. Relationships between the digitally processed multispectral scanner data, several trophic indicators, and the trophic index were examined using a supervised multispectral classification technique and regression techniques. Statistically significant correlations exist between spectral bands, several of the trophic indicators and the trophic state index. Color-coded photomaps were generated which depict the spectral aspects of trophic state.
ADP of multispectral scanner data for land use mapping
NASA Technical Reports Server (NTRS)
Hoffer, R. M.
1971-01-01
The advantages and disadvantages of various remote sensing instrumentation and analysis techniques are reviewed. The use of multispectral scanner data and the automatic data processing techniques are considered. A computer-aided analysis system for remote sensor data is described with emphasis on the image display, statistics processor, wavelength band selection, classification processor, and results display. Advanced techniques in using spectral and temporal data are also considered.
Innovative use of self-organising maps (SOMs) in model validation.
NASA Astrophysics Data System (ADS)
Jolly, Ben; McDonald, Adrian; Coggins, Jack
2016-04-01
We present an innovative combination of techniques for validation of numerical weather prediction (NWP) output against both observations and reanalyses using two classification schemes, demonstrated by a validation of the operational NWP 'AMPS' (the Antarctic Mesoscale Prediction System). Historically, model validation techniques have centred on case studies or statistics at various time scales (yearly/seasonal/monthly). Within the past decade the latter technique has been expanded by the addition of classification schemes in place of time scales, allowing more precise analysis. Classifications are typically generated for either the model or the observations, then used to create composites for both which are compared. Our method creates and trains a single self-organising map (SOM) on both the model output and observations, which is then used to classify both datasets using the same class definitions. In addition to the standard statistics on class composites, we compare the classifications themselves between the model and the observations. To add further context to the area studied, we use the same techniques to compare the SOM classifications with regimes developed for another study to great effect. The AMPS validation study compares model output against surface observations from SNOWWEB and existing University of Wisconsin-Madison Antarctic Automatic Weather Stations (AWS) during two months over the austral summer of 2014-15. Twelve SOM classes were defined in a '4 x 3' pattern, trained on both model output and observations of 2 m wind components, then used to classify both training datasets. Simple statistics (correlation, bias and normalised root-mean-square-difference) computed for SOM class composites showed that AMPS performed well during extreme weather events, but less well during lighter winds and poorly during the more changeable conditions between either extreme. Comparison of the classification time-series showed that, while correlations were lower during lighter wind periods, AMPS actually forecast the existence of those periods well suggesting that the correlations may be unfairly low. Further investigation showed poor temporal alignment during more changeable conditions, highlighting problems AMPS has around the exact timing of events. There was also a tendency for AMPS to over-predict certain wind flow patterns at the expense of others. In order to gain a larger scale perspective, we compared our mesoscale SOM classification time-series with synoptic scale regimes developed by another study using ERA-Interim reanalysis output and k-means clustering. There was good alignment between the regimes and the observations classifications (observations/regimes), highlighting the effect of synoptic scale forcing on the area. However, comparing the alignment between observations/regimes and AMPS/regimes showed that AMPS may have problems accurately resolving the strength and location of cyclones in the Ross Sea to the north of the target area.
Statistical classification techniques for engineering and climatic data samples
NASA Technical Reports Server (NTRS)
Temple, E. C.; Shipman, J. R.
1981-01-01
Fisher's sample linear discriminant function is modified through an appropriate alteration of the common sample variance-covariance matrix. The alteration consists of adding nonnegative values to the eigenvalues of the sample variance covariance matrix. The desired results of this modification is to increase the number of correct classifications by the new linear discriminant function over Fisher's function. This study is limited to the two-group discriminant problem.
Sleep staging with movement-related signals.
Jansen, B H; Shankar, K
1993-05-01
Body movement related signals (i.e., activity due to postural changes and the ballistocardiac effort) were recorded from six normal volunteers using the static-charge-sensitive bed (SCSB). Visual sleep staging was performed on the basis of simultaneously recorded EEG, EMG and EOG signals. A statistical classification technique was used to determine if reliable sleep staging could be performed using only the SCSB signal. A classification rate of between 52% and 75% was obtained for sleep staging in the five conventional sleep stages and the awake state. These rates improved from 78% to 89% for classification between awake, REM and non-REM sleep and from 86% to 98% for awake versus asleep classification.
NASA Technical Reports Server (NTRS)
Dixon, C. M.
1981-01-01
Land cover information derived from LANDSAT is being utilized by Piedmont Planning District Commission located in the State of Virginia. Progress to date is reported on a level one land cover classification map being produced with nine categories. The nine categories of classification are defined. The computer compatible tape selection is presented. Two unsupervised classifications were done, with 50 and 70 classes respectively. Twenty-eight spectral classes were developed using the supervised technique, employing actual ground truth training sites. The accuracy of the unsupervised classifications are estimated through comparison with local county statistics and with an actual pixel count of LANDSAT information compared to ground truth.
ERIC Educational Resources Information Center
Hayashi, Atsuhiro
Both the Rule Space Method (RSM) and the Neural Network Model (NNM) are techniques of statistical pattern recognition and classification approaches developed for applications from different fields. RSM was developed in the domain of educational statistics. It started from the use of an incidence matrix Q that characterizes the underlying cognitive…
A hybrid sensing approach for pure and adulterated honey classification.
Subari, Norazian; Mohamad Saleh, Junita; Md Shakaff, Ali Yeon; Zakaria, Ammar
2012-10-17
This paper presents a comparison between data from single modality and fusion methods to classify Tualang honey as pure or adulterated using Linear Discriminant Analysis (LDA) and Principal Component Analysis (PCA) statistical classification approaches. Ten different brands of certified pure Tualang honey were obtained throughout peninsular Malaysia and Sumatera, Indonesia. Various concentrations of two types of sugar solution (beet and cane sugar) were used in this investigation to create honey samples of 20%, 40%, 60% and 80% adulteration concentrations. Honey data extracted from an electronic nose (e-nose) and Fourier Transform Infrared Spectroscopy (FTIR) were gathered, analyzed and compared based on fusion methods. Visual observation of classification plots revealed that the PCA approach able to distinct pure and adulterated honey samples better than the LDA technique. Overall, the validated classification results based on FTIR data (88.0%) gave higher classification accuracy than e-nose data (76.5%) using the LDA technique. Honey classification based on normalized low-level and intermediate-level FTIR and e-nose fusion data scored classification accuracies of 92.2% and 88.7%, respectively using the Stepwise LDA method. The results suggested that pure and adulterated honey samples were better classified using FTIR and e-nose fusion data than single modality data.
15 CFR 30.61 - Statistical classification schedules.
Code of Federal Regulations, 2011 CFR
2011-01-01
... 15 Commerce and Foreign Trade 1 2011-01-01 2011-01-01 false Statistical classification schedules... § 30.61 Statistical classification schedules. The following statistical classification schedules are....census.gov/trade. (a) Schedule B—Statistical Classification for Domestic and Foreign Commodities Exported...
15 CFR 30.61 - Statistical classification schedules.
Code of Federal Regulations, 2010 CFR
2010-01-01
... 15 Commerce and Foreign Trade 1 2010-01-01 2010-01-01 false Statistical classification schedules... § 30.61 Statistical classification schedules. The following statistical classification schedules are....census.gov/trade. (a) Schedule B—Statistical Classification for Domestic and Foreign Commodities Exported...
Shameem, K M Muhammed; Choudhari, Khoobaram S; Bankapur, Aseefhali; Kulkarni, Suresh D; Unnikrishnan, V K; George, Sajan D; Kartha, V B; Santhosh, C
2017-05-01
Classification of plastics is of great importance in the recycling industry as the littering of plastic wastes increases day by day as a result of its extensive use. In this paper, we demonstrate the efficacy of a combined laser-induced breakdown spectroscopy (LIBS)-Raman system for the rapid identification and classification of post-consumer plastics. The atomic information and molecular information of polyethylene terephthalate, polyethylene, polypropylene, and polystyrene were studied using plasma emission spectra and scattered signal obtained in the LIBS and Raman technique, respectively. The collected spectral features of the samples were analyzed using statistical tools (principal component analysis, Mahalanobis distance) to categorize the plastics. The analyses of the data clearly show that elemental information and molecular information obtained from these techniques are efficient for classification of plastics. In addition, the molecular information collected via Raman spectroscopy exhibits clearly distinct features for the transparent plastics (100% discrimination), whereas the LIBS technique shows better spectral feature differences for the colored samples. The study shows that the information obtained from these complementary techniques allows the complete classification of the plastic samples, irrespective of the color or additives. This work further throws some light on the fact that the potential limitations of any of these techniques for sample identification can be overcome by the complementarity of these two techniques. Graphical Abstract ᅟ.
NASA Astrophysics Data System (ADS)
Wan, Xiaoqing; Zhao, Chunhui; Wang, Yanchun; Liu, Wu
2017-11-01
This paper proposes a novel classification paradigm for hyperspectral image (HSI) using feature-level fusion and deep learning-based methodologies. Operation is carried out in three main steps. First, during a pre-processing stage, wave atoms are introduced into bilateral filter to smooth HSI, and this strategy can effectively attenuate noise and restore texture information. Meanwhile, high quality spectral-spatial features can be extracted from HSI by taking geometric closeness and photometric similarity among pixels into consideration simultaneously. Second, higher order statistics techniques are firstly introduced into hyperspectral data classification to characterize the phase correlations of spectral curves. Third, multifractal spectrum features are extracted to characterize the singularities and self-similarities of spectra shapes. To this end, a feature-level fusion is applied to the extracted spectral-spatial features along with higher order statistics and multifractal spectrum features. Finally, stacked sparse autoencoder is utilized to learn more abstract and invariant high-level features from the multiple feature sets, and then random forest classifier is employed to perform supervised fine-tuning and classification. Experimental results on two real hyperspectral data sets demonstrate that the proposed method outperforms some traditional alternatives.
Nineteen hundred seventy three significant accomplishments. [Landsat satellite data applications
NASA Technical Reports Server (NTRS)
1974-01-01
Data collected by the Skylab remote sensing satellites was used to develop applications techniques and to combine automatic data classification with statistical clustering methods. Continuing research was concentrated in the correlation and registration of data products and in the definition of the atmospheric effects on remote sensing. The causes of errors encountered in the automated classification of agricultural data are identified. Other applications in forestry, geography, environmental geology, and land use are discussed.
Motor Oil Classification using Color Histograms and Pattern Recognition Techniques.
Ahmadi, Shiva; Mani-Varnosfaderani, Ahmad; Habibi, Biuck
2018-04-20
Motor oil classification is important for quality control and the identification of oil adulteration. In thiswork, we propose a simple, rapid, inexpensive and nondestructive approach based on image analysis and pattern recognition techniques for the classification of nine different types of motor oils according to their corresponding color histograms. For this, we applied color histogram in different color spaces such as red green blue (RGB), grayscale, and hue saturation intensity (HSI) in order to extract features that can help with the classification procedure. These color histograms and their combinations were used as input for model development and then were statistically evaluated by using linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), and support vector machine (SVM) techniques. Here, two common solutions for solving a multiclass classification problem were applied: (1) transformation to binary classification problem using a one-against-all (OAA) approach and (2) extension from binary classifiers to a single globally optimized multilabel classification model. In the OAA strategy, LDA, QDA, and SVM reached up to 97% in terms of accuracy, sensitivity, and specificity for both the training and test sets. In extension from binary case, despite good performances by the SVM classification model, QDA and LDA provided better results up to 92% for RGB-grayscale-HSI color histograms and up to 93% for the HSI color map, respectively. In order to reduce the numbers of independent variables for modeling, a principle component analysis algorithm was used. Our results suggest that the proposed method is promising for the identification and classification of different types of motor oils.
Semi-Automated Classification of Seafloor Data Collected on the Delmarva Inner Shelf
NASA Astrophysics Data System (ADS)
Sweeney, E. M.; Pendleton, E. A.; Brothers, L. L.; Mahmud, A.; Thieler, E. R.
2017-12-01
We tested automated classification methods on acoustic bathymetry and backscatter data collected by the U.S. Geological Survey (USGS) and National Oceanic and Atmospheric Administration (NOAA) on the Delmarva inner continental shelf to efficiently and objectively identify sediment texture and geomorphology. Automated classification techniques are generally less subjective and take significantly less time than manual classification methods. We used a semi-automated process combining unsupervised and supervised classification techniques to characterize seafloor based on bathymetric slope and relative backscatter intensity. Statistical comparison of our automated classification results with those of a manual classification conducted on a subset of the acoustic imagery indicates that our automated method was highly accurate (95% total accuracy and 93% Kappa). Our methods resolve sediment ridges, zones of flat seafloor and areas of high and low backscatter. We compared our classification scheme with mean grain size statistics of samples collected in the study area and found that strong correlations between backscatter intensity and sediment texture exist. High backscatter zones are associated with the presence of gravel and shells mixed with sand, and low backscatter areas are primarily clean sand or sand mixed with mud. Slope classes further elucidate textural and geomorphologic differences in the seafloor, such that steep slopes (>0.35°) with high backscatter are most often associated with the updrift side of sand ridges and bedforms, whereas low slope with high backscatter correspond to coarse lag or shell deposits. Low backscatter and high slopes are most often found on the downdrift side of ridges and bedforms, and low backscatter and low slopes identify swale areas and sand sheets. We found that poor acoustic data quality was the most significant cause of inaccurate classification results, which required additional user input to mitigate. Our method worked well along the primarily sandy Delmarva inner continental shelf, and outlines a method that can be used to efficiently and consistently produce surficial geologic interpretations of the seafloor from ground-truthed geophysical or hydrographic data.
NASA Technical Reports Server (NTRS)
Leake, M. A.
1982-01-01
Planetary imagery techniques, errors in measurement or degradation assignment, and statistical formulas are presented with respect to cratering data. Base map photograph preparation, measurement of crater diameters and sampled area, and instruments used are discussed. Possible uncertainties, such as Sun angle, scale factors, degradation classification, and biases in crater recognition are discussed. The mathematical formulas used in crater statistics are presented.
Tahir, Muhammad; Jan, Bismillah; Hayat, Maqsood; Shah, Shakir Ullah; Amin, Muhammad
2018-04-01
Discriminative and informative feature extraction is the core requirement for accurate and efficient classification of protein subcellular localization images so that drug development could be more effective. The objective of this paper is to propose a novel modification in the Threshold Adjacency Statistics technique and enhance its discriminative power. In this work, we utilized Threshold Adjacency Statistics from a novel perspective to enhance its discrimination power and efficiency. In this connection, we utilized seven threshold ranges to produce seven distinct feature spaces, which are then used to train seven SVMs. The final prediction is obtained through the majority voting scheme. The proposed ETAS-SubLoc system is tested on two benchmark datasets using 5-fold cross-validation technique. We observed that our proposed novel utilization of TAS technique has improved the discriminative power of the classifier. The ETAS-SubLoc system has achieved 99.2% accuracy, 99.3% sensitivity and 99.1% specificity for Endogenous dataset outperforming the classical Threshold Adjacency Statistics technique. Similarly, 91.8% accuracy, 96.3% sensitivity and 91.6% specificity values are achieved for Transfected dataset. Simulation results validated the effectiveness of ETAS-SubLoc that provides superior prediction performance compared to the existing technique. The proposed methodology aims at providing support to pharmaceutical industry as well as research community towards better drug designing and innovation in the fields of bioinformatics and computational biology. The implementation code for replicating the experiments presented in this paper is available at: https://drive.google.com/file/d/0B7IyGPObWbSqRTRMcXI2bG5CZWs/view?usp=sharing. Copyright © 2018 Elsevier B.V. All rights reserved.
Efficient Feature Selection and Classification of Protein Sequence Data in Bioinformatics
Faye, Ibrahima; Samir, Brahim Belhaouari; Md Said, Abas
2014-01-01
Bioinformatics has been an emerging area of research for the last three decades. The ultimate aims of bioinformatics were to store and manage the biological data, and develop and analyze computational tools to enhance their understanding. The size of data accumulated under various sequencing projects is increasing exponentially, which presents difficulties for the experimental methods. To reduce the gap between newly sequenced protein and proteins with known functions, many computational techniques involving classification and clustering algorithms were proposed in the past. The classification of protein sequences into existing superfamilies is helpful in predicting the structure and function of large amount of newly discovered proteins. The existing classification results are unsatisfactory due to a huge size of features obtained through various feature encoding methods. In this work, a statistical metric-based feature selection technique has been proposed in order to reduce the size of the extracted feature vector. The proposed method of protein classification shows significant improvement in terms of performance measure metrics: accuracy, sensitivity, specificity, recall, F-measure, and so forth. PMID:25045727
Supervised target detection in hyperspectral images using one-class Fukunaga-Koontz Transform
NASA Astrophysics Data System (ADS)
Binol, Hamidullah; Bal, Abdullah
2016-05-01
A novel hyperspectral target detection technique based on Fukunaga-Koontz transform (FKT) is presented. FKT offers significant properties for feature selection and ordering. However, it can only be used to solve multi-pattern classification problems. Target detection may be considered as a two-class classification problem, i.e., target versus background clutter. Nevertheless, background clutter typically contains different types of materials. That's why; target detection techniques are different than classification methods by way of modeling clutter. To avoid the modeling of the background clutter, we have improved one-class FKT (OC-FKT) for target detection. The statistical properties of target training samples are used to define tunnel-like boundary of the target class. Non-target samples are then created synthetically as to be outside of the boundary. Thus, only limited target samples become adequate for training of FKT. The hyperspectral image experiments confirm that the proposed OC-FKT technique provides an effective means for target detection.
PROTAX-Sound: A probabilistic framework for automated animal sound identification
Somervuo, Panu; Ovaskainen, Otso
2017-01-01
Autonomous audio recording is stimulating new field in bioacoustics, with a great promise for conducting cost-effective species surveys. One major current challenge is the lack of reliable classifiers capable of multi-species identification. We present PROTAX-Sound, a statistical framework to perform probabilistic classification of animal sounds. PROTAX-Sound is based on a multinomial regression model, and it can utilize as predictors any kind of sound features or classifications produced by other existing algorithms. PROTAX-Sound combines audio and image processing techniques to scan environmental audio files. It identifies regions of interest (a segment of the audio file that contains a vocalization to be classified), extracts acoustic features from them and compares with samples in a reference database. The output of PROTAX-Sound is the probabilistic classification of each vocalization, including the possibility that it represents species not present in the reference database. We demonstrate the performance of PROTAX-Sound by classifying audio from a species-rich case study of tropical birds. The best performing classifier achieved 68% classification accuracy for 200 bird species. PROTAX-Sound improves the classification power of current techniques by combining information from multiple classifiers in a manner that yields calibrated classification probabilities. PMID:28863178
PROTAX-Sound: A probabilistic framework for automated animal sound identification.
de Camargo, Ulisses Moliterno; Somervuo, Panu; Ovaskainen, Otso
2017-01-01
Autonomous audio recording is stimulating new field in bioacoustics, with a great promise for conducting cost-effective species surveys. One major current challenge is the lack of reliable classifiers capable of multi-species identification. We present PROTAX-Sound, a statistical framework to perform probabilistic classification of animal sounds. PROTAX-Sound is based on a multinomial regression model, and it can utilize as predictors any kind of sound features or classifications produced by other existing algorithms. PROTAX-Sound combines audio and image processing techniques to scan environmental audio files. It identifies regions of interest (a segment of the audio file that contains a vocalization to be classified), extracts acoustic features from them and compares with samples in a reference database. The output of PROTAX-Sound is the probabilistic classification of each vocalization, including the possibility that it represents species not present in the reference database. We demonstrate the performance of PROTAX-Sound by classifying audio from a species-rich case study of tropical birds. The best performing classifier achieved 68% classification accuracy for 200 bird species. PROTAX-Sound improves the classification power of current techniques by combining information from multiple classifiers in a manner that yields calibrated classification probabilities.
Parallel and Scalable Clustering and Classification for Big Data in Geosciences
NASA Astrophysics Data System (ADS)
Riedel, M.
2015-12-01
Machine learning, data mining, and statistical computing are common techniques to perform analysis in earth sciences. This contribution will focus on two concrete and widely used data analytics methods suitable to analyse 'big data' in the context of geoscience use cases: clustering and classification. From the broad class of available clustering methods we focus on the density-based spatial clustering of appliactions with noise (DBSCAN) algorithm that enables the identification of outliers or interesting anomalies. A new open source parallel and scalable DBSCAN implementation will be discussed in the light of a scientific use case that detects water mixing events in the Koljoefjords. The second technique we cover is classification, with a focus set on the support vector machines algorithm (SVMs), as one of the best out-of-the-box classification algorithm. A parallel and scalable SVM implementation will be discussed in the light of a scientific use case in the field of remote sensing with 52 different classes of land cover types.
NASA Astrophysics Data System (ADS)
Mandal, Shyamapada; Santhi, B.; Sridhar, S.; Vinolia, K.; Swaminathan, P.
2017-06-01
In this paper, an online fault detection and classification method is proposed for thermocouples used in nuclear power plants. In the proposed method, the fault data are detected by the classification method, which classifies the fault data from the normal data. Deep belief network (DBN), a technique for deep learning, is applied to classify the fault data. The DBN has a multilayer feature extraction scheme, which is highly sensitive to a small variation of data. Since the classification method is unable to detect the faulty sensor; therefore, a technique is proposed to identify the faulty sensor from the fault data. Finally, the composite statistical hypothesis test, namely generalized likelihood ratio test, is applied to compute the fault pattern of the faulty sensor signal based on the magnitude of the fault. The performance of the proposed method is validated by field data obtained from thermocouple sensors of the fast breeder test reactor.
Modelling spruce bark beetle infestation probability
Paulius Zolubas; Jose Negron; A. Steven Munson
2009-01-01
Spruce bark beetle (Ips typographus L.) risk model, based on pure Norway spruce (Picea abies Karst.) stand characteristics in experimental and control plots was developed using classification and regression tree statistical technique under endemic pest population density. The most significant variable in spruce bark beetle...
A Hybrid Sensing Approach for Pure and Adulterated Honey Classification
Subari, Norazian; Saleh, Junita Mohamad; Shakaff, Ali Yeon Md; Zakaria, Ammar
2012-01-01
This paper presents a comparison between data from single modality and fusion methods to classify Tualang honey as pure or adulterated using Linear Discriminant Analysis (LDA) and Principal Component Analysis (PCA) statistical classification approaches. Ten different brands of certified pure Tualang honey were obtained throughout peninsular Malaysia and Sumatera, Indonesia. Various concentrations of two types of sugar solution (beet and cane sugar) were used in this investigation to create honey samples of 20%, 40%, 60% and 80% adulteration concentrations. Honey data extracted from an electronic nose (e-nose) and Fourier Transform Infrared Spectroscopy (FTIR) were gathered, analyzed and compared based on fusion methods. Visual observation of classification plots revealed that the PCA approach able to distinct pure and adulterated honey samples better than the LDA technique. Overall, the validated classification results based on FTIR data (88.0%) gave higher classification accuracy than e-nose data (76.5%) using the LDA technique. Honey classification based on normalized low-level and intermediate-level FTIR and e-nose fusion data scored classification accuracies of 92.2% and 88.7%, respectively using the Stepwise LDA method. The results suggested that pure and adulterated honey samples were better classified using FTIR and e-nose fusion data than single modality data. PMID:23202033
NASA Astrophysics Data System (ADS)
Al-Doasari, Ahmad E.
The 1991 Gulf War caused massive environmental damage in Kuwait. Deposition of oil and soot droplets from hundreds of burning oil-wells created a layer of tarcrete on the desert surface covering over 900 km2. This research investigates the spatial change in the tarcrete extent from 1991 to 1998 using Landsat Thematic Mapper (TM) imagery and statistical modeling techniques. The pixel structure of TM data allows the spatial analysis of the change in tarcrete extent to be conducted at the pixel (cell) level within a geographical information system (GIS). There are two components to this research. The first is a comparison of three remote sensing classification techniques used to map the tarcrete layer. The second is a spatial-temporal analysis and simulation of tarcrete changes through time. The analysis focuses on an area of 389 km2 located south of the Al-Burgan oil field. Five TM images acquired in 1991, 1993, 1994, 1995, and 1998 were geometrically and atmospherically corrected. These images were classified into six classes: oil lakes; heavy, intermediate, light, and traces of tarcrete; and sand. The classification methods tested were unsupervised, supervised, and neural network supervised (fuzzy ARTMAP). Field data of tarcrete characteristics were collected to support the classification process and to evaluate the classification accuracies. Overall, the neural network method is more accurate (60 percent) than the other two methods; both the unsupervised and the supervised classification accuracy assessments resulted in 46 percent accuracy. The five classifications were used in a lagged autologistic model to analyze the spatial changes of the tarcrete through time. The autologistic model correctly identified overall tarcrete contraction between 1991--1993 and 1995--1998. However, tarcrete contraction between 1993--1994 and 1994--1995 was less well marked, in part because of classification errors in the maps from these time periods. Initial simulations of tarcrete contraction with a cellular automaton model were not very successful. However, more accurate classifications could improve the simulations. This study illustrates how an empirical investigation using satellite images, field data, GIS, and spatial statistics can simulate dynamic land-cover change through the use of a discrete statistical and cellular automaton model.
MANCOVA for one way classification with homogeneity of regression coefficient vectors
NASA Astrophysics Data System (ADS)
Mokesh Rayalu, G.; Ravisankar, J.; Mythili, G. Y.
2017-11-01
The MANOVA and MANCOVA are the extensions of the univariate ANOVA and ANCOVA techniques to multidimensional or vector valued observations. The assumption of a Gaussian distribution has been replaced with the Multivariate Gaussian distribution for the vectors data and residual term variables in the statistical models of these techniques. The objective of MANCOVA is to determine if there are statistically reliable mean differences that can be demonstrated between groups later modifying the newly created variable. When randomization assignment of samples or subjects to groups is not possible, multivariate analysis of covariance (MANCOVA) provides statistical matching of groups by adjusting dependent variables as if all subjects scored the same on the covariates. In this research article, an extension has been made to the MANCOVA technique with more number of covariates and homogeneity of regression coefficient vectors is also tested.
NASA Astrophysics Data System (ADS)
Crosta, Giovanni Franco; Pan, Yong-Le; Aptowicz, Kevin B.; Casati, Caterina; Pinnick, Ronald G.; Chang, Richard K.; Videen, Gorden W.
2013-12-01
Measurement of two-dimensional angle-resolved optical scattering (TAOS) patterns is an attractive technique for detecting and characterizing micron-sized airborne particles. In general, the interpretation of these patterns and the retrieval of the particle refractive index, shape or size alone, are difficult problems. By reformulating the problem in statistical learning terms, a solution is proposed herewith: rather than identifying airborne particles from their scattering patterns, TAOS patterns themselves are classified through a learning machine, where feature extraction interacts with multivariate statistical analysis. Feature extraction relies on spectrum enhancement, which includes the discrete cosine FOURIER transform and non-linear operations. Multivariate statistical analysis includes computation of the principal components and supervised training, based on the maximization of a suitable figure of merit. All algorithms have been combined together to analyze TAOS patterns, organize feature vectors, design classification experiments, carry out supervised training, assign unknown patterns to classes, and fuse information from different training and recognition experiments. The algorithms have been tested on a data set with more than 3000 TAOS patterns. The parameters that control the algorithms at different stages have been allowed to vary within suitable bounds and are optimized to some extent. Classification has been targeted at discriminating aerosolized Bacillus subtilis particles, a simulant of anthrax, from atmospheric aerosol particles and interfering particles, like diesel soot. By assuming that all training and recognition patterns come from the respective reference materials only, the most satisfactory classification result corresponds to 20% false negatives from B. subtilis particles and <11% false positives from all other aerosol particles. The most effective operations have consisted of thresholding TAOS patterns in order to reject defective ones, and forming training sets from three or four pattern classes. The presented automated classification method may be adapted into a real-time operation technique, capable of detecting and characterizing micron-sized airborne particles.
Rahman, Md Mostafizur; Fattah, Shaikh Anowarul
2017-01-01
In view of recent increase of brain computer interface (BCI) based applications, the importance of efficient classification of various mental tasks has increased prodigiously nowadays. In order to obtain effective classification, efficient feature extraction scheme is necessary, for which, in the proposed method, the interchannel relationship among electroencephalogram (EEG) data is utilized. It is expected that the correlation obtained from different combination of channels will be different for different mental tasks, which can be exploited to extract distinctive feature. The empirical mode decomposition (EMD) technique is employed on a test EEG signal obtained from a channel, which provides a number of intrinsic mode functions (IMFs), and correlation coefficient is extracted from interchannel IMF data. Simultaneously, different statistical features are also obtained from each IMF. Finally, the feature matrix is formed utilizing interchannel correlation features and intrachannel statistical features of the selected IMFs of EEG signal. Different kernels of the support vector machine (SVM) classifier are used to carry out the classification task. An EEG dataset containing ten different combinations of five different mental tasks is utilized to demonstrate the classification performance and a very high level of accuracy is achieved by the proposed scheme compared to existing methods.
Signal Detection Techniques for Diagnostic Monitoring of Space Shuttle Main Engine Turbomachinery
NASA Technical Reports Server (NTRS)
Coffin, Thomas; Jong, Jen-Yi
1986-01-01
An investigation to develop, implement, and evaluate signal analysis techniques for the detection and classification of incipient mechanical failures in turbomachinery is reviewed. A brief description of the Space Shuttle Main Engine (SSME) test/measurement program is presented. Signal analysis techniques available to describe dynamic measurement characteristics are reviewed. Time domain and spectral methods are described, and statistical classification in terms of moments is discussed. Several of these waveform analysis techniques have been implemented on a computer and applied to dynamc signals. A laboratory evaluation of the methods with respect to signal detection capability is described. A unique coherence function (the hyper-coherence) was developed through the course of this investigation, which appears promising as a diagnostic tool. This technique and several other non-linear methods of signal analysis are presented and illustrated by application. Software for application of these techniques has been installed on the signal processing system at the NASA/MSFC Systems Dynamics Laboratory.
High-accuracy user identification using EEG biometrics.
Koike-Akino, Toshiaki; Mahajan, Ruhi; Marks, Tim K; Ye Wang; Watanabe, Shinji; Tuzel, Oncel; Orlik, Philip
2016-08-01
We analyze brain waves acquired through a consumer-grade EEG device to investigate its capabilities for user identification and authentication. First, we show the statistical significance of the P300 component in event-related potential (ERP) data from 14-channel EEGs across 25 subjects. We then apply a variety of machine learning techniques, comparing the user identification performance of various different combinations of a dimensionality reduction technique followed by a classification algorithm. Experimental results show that an identification accuracy of 72% can be achieved using only a single 800 ms ERP epoch. In addition, we demonstrate that the user identification accuracy can be significantly improved to more than 96.7% by joint classification of multiple epochs.
Anantha M. Prasad; Louis R. Iverson; Andy Liaw; Andy Liaw
2006-01-01
We evaluated four statistical models - Regression Tree Analysis (RTA), Bagging Trees (BT), Random Forests (RF), and Multivariate Adaptive Regression Splines (MARS) - for predictive vegetation mapping under current and future climate scenarios according to the Canadian Climate Centre global circulation model.
A statistical approach to combining multisource information in one-class classifiers
Simonson, Katherine M.; Derek West, R.; Hansen, Ross L.; ...
2017-06-08
A new method is introduced in this paper for combining information from multiple sources to support one-class classification. The contributing sources may represent measurements taken by different sensors of the same physical entity, repeated measurements by a single sensor, or numerous features computed from a single measured image or signal. The approach utilizes the theory of statistical hypothesis testing, and applies Fisher's technique for combining p-values, modified to handle nonindependent sources. Classifier outputs take the form of fused p-values, which may be used to gauge the consistency of unknown entities with one or more class hypotheses. The approach enables rigorousmore » assessment of classification uncertainties, and allows for traceability of classifier decisions back to the constituent sources, both of which are important for high-consequence decision support. Application of the technique is illustrated in two challenge problems, one for skin segmentation and the other for terrain labeling. Finally, the method is seen to be particularly effective for relatively small training samples.« less
A statistical approach to combining multisource information in one-class classifiers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Simonson, Katherine M.; Derek West, R.; Hansen, Ross L.
A new method is introduced in this paper for combining information from multiple sources to support one-class classification. The contributing sources may represent measurements taken by different sensors of the same physical entity, repeated measurements by a single sensor, or numerous features computed from a single measured image or signal. The approach utilizes the theory of statistical hypothesis testing, and applies Fisher's technique for combining p-values, modified to handle nonindependent sources. Classifier outputs take the form of fused p-values, which may be used to gauge the consistency of unknown entities with one or more class hypotheses. The approach enables rigorousmore » assessment of classification uncertainties, and allows for traceability of classifier decisions back to the constituent sources, both of which are important for high-consequence decision support. Application of the technique is illustrated in two challenge problems, one for skin segmentation and the other for terrain labeling. Finally, the method is seen to be particularly effective for relatively small training samples.« less
Aircraft target detection algorithm based on high resolution spaceborne SAR imagery
NASA Astrophysics Data System (ADS)
Zhang, Hui; Hao, Mengxi; Zhang, Cong; Su, Xiaojing
2018-03-01
In this paper, an image classification algorithm for airport area is proposed, which based on the statistical features of synthetic aperture radar (SAR) images and the spatial information of pixels. The algorithm combines Gamma mixture model and MRF. The algorithm using Gamma mixture model to obtain the initial classification result. Pixel space correlation based on the classification results are optimized by the MRF technique. Additionally, morphology methods are employed to extract airport (ROI) region where the suspected aircraft target samples are clarified to reduce the false alarm and increase the detection performance. Finally, this paper presents the plane target detection, which have been verified by simulation test.
Territories typification technique with use of statistical models
NASA Astrophysics Data System (ADS)
Galkin, V. I.; Rastegaev, A. V.; Seredin, V. V.; Andrianov, A. V.
2018-05-01
Territories typification is required for solution of many problems. The results of geological zoning received by means of various methods do not always agree. That is why the main goal of the research given is to develop a technique of obtaining a multidimensional standard classified indicator for geological zoning. In the course of the research, the probabilistic approach was used. In order to increase the reliability of geological information classification, the authors suggest using complex multidimensional probabilistic indicator P K as a criterion of the classification. The second criterion chosen is multidimensional standard classified indicator Z. These can serve as characteristics of classification in geological-engineering zoning. Above mentioned indicators P K and Z are in good correlation. Correlation coefficient values for the entire territory regardless of structural solidity equal r = 0.95 so each indicator can be used in geological-engineering zoning. The method suggested has been tested and the schematic map of zoning has been drawn.
NASA Technical Reports Server (NTRS)
Jensen, John R.; Hodgson, Michael E.; Mackey, Halkard E., Jr.; Krabill, William
1987-01-01
Wetlands in a portion of the Savannah River swamp forest, the Steel Creek Delta, were mapped using April 26, 1985 high-resolution aircraft multispectral scanner (MSS) data. Due to the complex spectral characteristics of the wetland vegetation, it was necessary to implement several techniques in the classification of the MSS imagery of the Steel Creek Delta. In particular, when performing unsupervised classification, an iterative cluster busting technique was used which simplified the cluster labeling process. In addition to the MSS data, light detecting and ranging (LIDAR) data were acquired by National Aeronautics and Space Administration (NASA) personnel along two flightlines over the Steel Creek Delta. These data were registered with the wetland classification map and correlated. Statistical analyses demonstrated that the laser derived canopy height information was significantly correlated with the Steel Creek Delta wetland classes encountered along the profiling transect of the LIDAR data.
Gemovic, Branislava; Perovic, Vladimir; Glisic, Sanja; Veljkovic, Nevena
2013-01-01
There are more than 500 amino acid substitutions in each human genome, and bioinformatics tools irreplaceably contribute to determination of their functional effects. We have developed feature-based algorithm for the detection of mutations outside conserved functional domains (CFDs) and compared its classification efficacy with the most commonly used phylogeny-based tools, PolyPhen-2 and SIFT. The new algorithm is based on the informational spectrum method (ISM), a feature-based technique, and statistical analysis. Our dataset contained neutral polymorphisms and mutations associated with myeloid malignancies from epigenetic regulators ASXL1, DNMT3A, EZH2, and TET2. PolyPhen-2 and SIFT had significantly lower accuracies in predicting the effects of amino acid substitutions outside CFDs than expected, with especially low sensitivity. On the other hand, only ISM algorithm showed statistically significant classification of these sequences. It outperformed PolyPhen-2 and SIFT by 15% and 13%, respectively. These results suggest that feature-based methods, like ISM, are more suitable for the classification of amino acid substitutions outside CFDs than phylogeny-based tools.
An automated approach to the design of decision tree classifiers
NASA Technical Reports Server (NTRS)
Argentiero, P.; Chin, P.; Beaudet, P.
1980-01-01
The classification of large dimensional data sets arising from the merging of remote sensing data with more traditional forms of ancillary data is considered. Decision tree classification, a popular approach to the problem, is characterized by the property that samples are subjected to a sequence of decision rules before they are assigned to a unique class. An automated technique for effective decision tree design which relies only on apriori statistics is presented. This procedure utilizes a set of two dimensional canonical transforms and Bayes table look-up decision rules. An optimal design at each node is derived based on the associated decision table. A procedure for computing the global probability of correct classfication is also provided. An example is given in which class statistics obtained from an actual LANDSAT scene are used as input to the program. The resulting decision tree design has an associated probability of correct classification of .76 compared to the theoretically optimum .79 probability of correct classification associated with a full dimensional Bayes classifier. Recommendations for future research are included.
Statistical Emulator for Expensive Classification Simulators
NASA Technical Reports Server (NTRS)
Ross, Jerret; Samareh, Jamshid A.
2016-01-01
Expensive simulators prevent any kind of meaningful analysis to be performed on the phenomena they model. To get around this problem the concept of using a statistical emulator as a surrogate representation of the simulator was introduced in the 1980's. Presently, simulators have become more and more complex and as a result running a single example on these simulators is very expensive and can take days to weeks or even months. Many new techniques have been introduced, termed criteria, which sequentially select the next best (most informative to the emulator) point that should be run on the simulator. These criteria methods allow for the creation of an emulator with only a small number of simulator runs. We follow and extend this framework to expensive classification simulators.
Camera-Model Identification Using Markovian Transition Probability Matrix
NASA Astrophysics Data System (ADS)
Xu, Guanshuo; Gao, Shang; Shi, Yun Qing; Hu, Ruimin; Su, Wei
Detecting the (brands and) models of digital cameras from given digital images has become a popular research topic in the field of digital forensics. As most of images are JPEG compressed before they are output from cameras, we propose to use an effective image statistical model to characterize the difference JPEG 2-D arrays of Y and Cb components from the JPEG images taken by various camera models. Specifically, the transition probability matrices derived from four different directional Markov processes applied to the image difference JPEG 2-D arrays are used to identify statistical difference caused by image formation pipelines inside different camera models. All elements of the transition probability matrices, after a thresholding technique, are directly used as features for classification purpose. Multi-class support vector machines (SVM) are used as the classification tool. The effectiveness of our proposed statistical model is demonstrated by large-scale experimental results.
Solt, Illés; Tikk, Domonkos; Gál, Viktor; Kardkovács, Zsolt T.
2009-01-01
Objective Automated and disease-specific classification of textual clinical discharge summaries is of great importance in human life science, as it helps physicians to make medical studies by providing statistically relevant data for analysis. This can be further facilitated if, at the labeling of discharge summaries, semantic labels are also extracted from text, such as whether a given disease is present, absent, questionable in a patient, or is unmentioned in the document. The authors present a classification technique that successfully solves the semantic classification task. Design The authors introduce a context-aware rule-based semantic classification technique for use on clinical discharge summaries. The classification is performed in subsequent steps. First, some misleading parts are removed from the text; then the text is partitioned into positive, negative, and uncertain context segments, then a sequence of binary classifiers is applied to assign the appropriate semantic labels. Measurement For evaluation the authors used the documents of the i2b2 Obesity Challenge and adopted its evaluation measures: F1-macro and F1-micro for measurements. Results On the two subtasks of the Obesity Challenge (textual and intuitive classification) the system performed very well, and achieved a F1-macro = 0.80 for the textual and F1-macro = 0.67 for the intuitive tasks, and obtained second place at the textual and first place at the intuitive subtasks of the challenge. Conclusions The authors show in the paper that a simple rule-based classifier can tackle the semantic classification task more successfully than machine learning techniques, if the training data are limited and some semantic labels are very sparse. PMID:19390101
Artillery/mortar type classification based on detected acoustic transients
NASA Astrophysics Data System (ADS)
Morcos, Amir; Grasing, David; Desai, Sachi
2008-04-01
Feature extraction methods based on the statistical analysis of the change in event pressure levels over a period and the level of ambient pressure excitation facilitate the development of a robust classification algorithm. The features reliably discriminates mortar and artillery variants via acoustic signals produced during the launch events. Utilizing acoustic sensors to exploit the sound waveform generated from the blast for the identification of mortar and artillery variants as type A, etcetera through analysis of the waveform. Distinct characteristics arise within the different mortar/artillery variants because varying HE mortar payloads and related charges emphasize varying size events at launch. The waveform holds various harmonic properties distinct to a given mortar/artillery variant that through advanced signal processing and data mining techniques can employed to classify a given type. The skewness and other statistical processing techniques are used to extract the predominant components from the acoustic signatures at ranges exceeding 3000m. Exploiting these techniques will help develop a feature set highly independent of range, providing discrimination based on acoustic elements of the blast wave. Highly reliable discrimination will be achieved with a feed-forward neural network classifier trained on a feature space derived from the distribution of statistical coefficients, frequency spectrum, and higher frequency details found within different energy bands. The processes that are described herein extend current technologies, which emphasis acoustic sensor systems to provide such situational awareness.
Artillery/mortar round type classification to increase system situational awareness
NASA Astrophysics Data System (ADS)
Desai, Sachi; Grasing, David; Morcos, Amir; Hohil, Myron
2008-04-01
Feature extraction methods based on the statistical analysis of the change in event pressure levels over a period and the level of ambient pressure excitation facilitate the development of a robust classification algorithm. The features reliably discriminates mortar and artillery variants via acoustic signals produced during the launch events. Utilizing acoustic sensors to exploit the sound waveform generated from the blast for the identification of mortar and artillery variants as type A, etcetera through analysis of the waveform. Distinct characteristics arise within the different mortar/artillery variants because varying HE mortar payloads and related charges emphasize varying size events at launch. The waveform holds various harmonic properties distinct to a given mortar/artillery variant that through advanced signal processing and data mining techniques can employed to classify a given type. The skewness and other statistical processing techniques are used to extract the predominant components from the acoustic signatures at ranges exceeding 3000m. Exploiting these techniques will help develop a feature set highly independent of range, providing discrimination based on acoustic elements of the blast wave. Highly reliable discrimination will be achieved with a feedforward neural network classifier trained on a feature space derived from the distribution of statistical coefficients, frequency spectrum, and higher frequency details found within different energy bands. The processes that are described herein extend current technologies, which emphasis acoustic sensor systems to provide such situational awareness.
A statistical approach to root system classification
Bodner, Gernot; Leitner, Daniel; Nakhforoosh, Alireza; Sobotik, Monika; Moder, Karl; Kaul, Hans-Peter
2013-01-01
Plant root systems have a key role in ecology and agronomy. In spite of fast increase in root studies, still there is no classification that allows distinguishing among distinctive characteristics within the diversity of rooting strategies. Our hypothesis is that a multivariate approach for “plant functional type” identification in ecology can be applied to the classification of root systems. The classification method presented is based on a data-defined statistical procedure without a priori decision on the classifiers. The study demonstrates that principal component based rooting types provide efficient and meaningful multi-trait classifiers. The classification method is exemplified with simulated root architectures and morphological field data. Simulated root architectures showed that morphological attributes with spatial distribution parameters capture most distinctive features within root system diversity. While developmental type (tap vs. shoot-borne systems) is a strong, but coarse classifier, topological traits provide the most detailed differentiation among distinctive groups. Adequacy of commonly available morphologic traits for classification is supported by field data. Rooting types emerging from measured data, mainly distinguished by diameter/weight and density dominated types. Similarity of root systems within distinctive groups was the joint result of phylogenetic relation and environmental as well as human selection pressure. We concluded that the data-define classification is appropriate for integration of knowledge obtained with different root measurement methods and at various scales. Currently root morphology is the most promising basis for classification due to widely used common measurement protocols. To capture details of root diversity efforts in architectural measurement techniques are essential. PMID:23914200
A statistical approach to root system classification.
Bodner, Gernot; Leitner, Daniel; Nakhforoosh, Alireza; Sobotik, Monika; Moder, Karl; Kaul, Hans-Peter
2013-01-01
Plant root systems have a key role in ecology and agronomy. In spite of fast increase in root studies, still there is no classification that allows distinguishing among distinctive characteristics within the diversity of rooting strategies. Our hypothesis is that a multivariate approach for "plant functional type" identification in ecology can be applied to the classification of root systems. The classification method presented is based on a data-defined statistical procedure without a priori decision on the classifiers. The study demonstrates that principal component based rooting types provide efficient and meaningful multi-trait classifiers. The classification method is exemplified with simulated root architectures and morphological field data. Simulated root architectures showed that morphological attributes with spatial distribution parameters capture most distinctive features within root system diversity. While developmental type (tap vs. shoot-borne systems) is a strong, but coarse classifier, topological traits provide the most detailed differentiation among distinctive groups. Adequacy of commonly available morphologic traits for classification is supported by field data. Rooting types emerging from measured data, mainly distinguished by diameter/weight and density dominated types. Similarity of root systems within distinctive groups was the joint result of phylogenetic relation and environmental as well as human selection pressure. We concluded that the data-define classification is appropriate for integration of knowledge obtained with different root measurement methods and at various scales. Currently root morphology is the most promising basis for classification due to widely used common measurement protocols. To capture details of root diversity efforts in architectural measurement techniques are essential.
Adaptive statistical pattern classifiers for remotely sensed data
NASA Technical Reports Server (NTRS)
Gonzalez, R. C.; Pace, M. O.; Raulston, H. S.
1975-01-01
A technique for the adaptive estimation of nonstationary statistics necessary for Bayesian classification is developed. The basic approach to the adaptive estimation procedure consists of two steps: (1) an optimal stochastic approximation of the parameters of interest and (2) a projection of the parameters in time or position. A divergence criterion is developed to monitor algorithm performance. Comparative results of adaptive and nonadaptive classifier tests are presented for simulated four dimensional spectral scan data.
Bag-of-features approach for improvement of lung tissue classification in diffuse lung disease
NASA Astrophysics Data System (ADS)
Kato, Noriji; Fukui, Motofumi; Isozaki, Takashi
2009-02-01
Many automated techniques have been proposed to classify diffuse lung disease patterns. Most of the techniques utilize texture analysis approaches with second and higher order statistics, and show successful classification result among various lung tissue patterns. However, the approaches do not work well for the patterns with inhomogeneous texture distribution within a region of interest (ROI), such as reticular and honeycombing patterns, because the statistics can only capture averaged feature over the ROI. In this work, we have introduced the bag-of-features approach to overcome this difficulty. In the approach, texture images are represented as histograms or distributions of a few basic primitives, which are obtained by clustering local image features. The intensity descriptor and the Scale Invariant Feature Transformation (SIFT) descriptor are utilized to extract the local features, which have significant discriminatory power due to their specificity to a particular image class. In contrast, the drawback of the local features is lack of invariance under translation and rotation. We improved the invariance by sampling many local regions so that the distribution of the local features is unchanged. We evaluated the performance of our system in the classification task with 5 image classes (ground glass, reticular, honeycombing, emphysema, and normal) using 1109 ROIs from 211 patients. Our system achieved high classification accuracy of 92.8%, which is superior to that of the conventional system with the gray level co-occurrence matrix (GLCM) feature especially for inhomogeneous texture patterns.
Modality-Driven Classification and Visualization of Ensemble Variance
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bensema, Kevin; Gosink, Luke; Obermaier, Harald
Advances in computational power now enable domain scientists to address conceptual and parametric uncertainty by running simulations multiple times in order to sufficiently sample the uncertain input space. While this approach helps address conceptual and parametric uncertainties, the ensemble datasets produced by this technique present a special challenge to visualization researchers as the ensemble dataset records a distribution of possible values for each location in the domain. Contemporary visualization approaches that rely solely on summary statistics (e.g., mean and variance) cannot convey the detailed information encoded in ensemble distributions that are paramount to ensemble analysis; summary statistics provide no informationmore » about modality classification and modality persistence. To address this problem, we propose a novel technique that classifies high-variance locations based on the modality of the distribution of ensemble predictions. Additionally, we develop a set of confidence metrics to inform the end-user of the quality of fit between the distribution at a given location and its assigned class. We apply a similar method to time-varying ensembles to illustrate the relationship between peak variance and bimodal or multimodal behavior. These classification schemes enable a deeper understanding of the behavior of the ensemble members by distinguishing between distributions that can be described by a single tendency and distributions which reflect divergent trends in the ensemble.« less
14 CFR Section 19 - Uniform Classification of Operating Statistics
Code of Federal Regulations, 2011 CFR
2011-01-01
... Statistics Section 19 Section 19 Aeronautics and Space OFFICE OF THE SECRETARY, DEPARTMENT OF TRANSPORTATION... AIR CARRIERS Operating Statistics Classifications Section 19 Uniform Classification of Operating Statistics ...
14 CFR Section 19 - Uniform Classification of Operating Statistics
Code of Federal Regulations, 2010 CFR
2010-01-01
... Statistics Section 19 Section 19 Aeronautics and Space OFFICE OF THE SECRETARY, DEPARTMENT OF TRANSPORTATION... AIR CARRIERS Operating Statistics Classifications Section 19 Uniform Classification of Operating Statistics ...
Automatic Cataract Hardness Classification Ex Vivo by Ultrasound Techniques.
Caixinha, Miguel; Santos, Mário; Santos, Jaime
2016-04-01
To demonstrate the feasibility of a new methodology for cataract hardness characterization and automatic classification using ultrasound techniques, different cataract degrees were induced in 210 porcine lenses. A 25-MHz ultrasound transducer was used to obtain acoustical parameters (velocity and attenuation) and backscattering signals. B-Scan and parametric Nakagami images were constructed. Ninety-seven parameters were extracted and subjected to a Principal Component Analysis. Bayes, K-Nearest-Neighbours, Fisher Linear Discriminant and Support Vector Machine (SVM) classifiers were used to automatically classify the different cataract severities. Statistically significant increases with cataract formation were found for velocity, attenuation, mean brightness intensity of the B-Scan images and mean Nakagami m parameter (p < 0.01). The four classifiers showed a good performance for healthy versus cataractous lenses (F-measure ≥ 92.68%), while for initial versus severe cataracts the SVM classifier showed the higher performance (90.62%). The results showed that ultrasound techniques can be used for non-invasive cataract hardness characterization and automatic classification. Copyright © 2016 World Federation for Ultrasound in Medicine & Biology. Published by Elsevier Inc. All rights reserved.
Probabilistic topic modeling for the analysis and classification of genomic sequences
2015-01-01
Background Studies on genomic sequences for classification and taxonomic identification have a leading role in the biomedical field and in the analysis of biodiversity. These studies are focusing on the so-called barcode genes, representing a well defined region of the whole genome. Recently, alignment-free techniques are gaining more importance because they are able to overcome the drawbacks of sequence alignment techniques. In this paper a new alignment-free method for DNA sequences clustering and classification is proposed. The method is based on k-mers representation and text mining techniques. Methods The presented method is based on Probabilistic Topic Modeling, a statistical technique originally proposed for text documents. Probabilistic topic models are able to find in a document corpus the topics (recurrent themes) characterizing classes of documents. This technique, applied on DNA sequences representing the documents, exploits the frequency of fixed-length k-mers and builds a generative model for a training group of sequences. This generative model, obtained through the Latent Dirichlet Allocation (LDA) algorithm, is then used to classify a large set of genomic sequences. Results and conclusions We performed classification of over 7000 16S DNA barcode sequences taken from Ribosomal Database Project (RDP) repository, training probabilistic topic models. The proposed method is compared to the RDP tool and Support Vector Machine (SVM) classification algorithm in a extensive set of trials using both complete sequences and short sequence snippets (from 400 bp to 25 bp). Our method reaches very similar results to RDP classifier and SVM for complete sequences. The most interesting results are obtained when short sequence snippets are considered. In these conditions the proposed method outperforms RDP and SVM with ultra short sequences and it exhibits a smooth decrease of performance, at every taxonomic level, when the sequence length is decreased. PMID:25916734
Collected Notes on the Workshop for Pattern Discovery in Large Databases
NASA Technical Reports Server (NTRS)
Buntine, Wray (Editor); Delalto, Martha (Editor)
1991-01-01
These collected notes are a record of material presented at the Workshop. The core data analysis is addressed that have traditionally required statistical or pattern recognition techniques. Some of the core tasks include classification, discrimination, clustering, supervised and unsupervised learning, discovery and diagnosis, i.e., general pattern discovery.
NASA Technical Reports Server (NTRS)
Logan, T. L.; Huning, J. R.; Glackin, D. L.
1983-01-01
The use of two dimensional Fast Fourier Transforms (FFTs) subjected to pattern recognition technology for the identification and classification of low altitude stratus cloud structure from Geostationary Operational Environmental Satellite (GOES) imagery was examined. The development of a scene independent pattern recognition methodology, unconstrained by conventional cloud morphological classifications was emphasized. A technique for extracting cloud shape, direction, and size attributes from GOES visual imagery was developed. These attributes were combined with two statistical attributes (cloud mean brightness, cloud standard deviation), and interrogated using unsupervised clustering amd maximum likelihood classification techniques. Results indicate that: (1) the key cloud discrimination attributes are mean brightness, direction, shape, and minimum size; (2) cloud structure can be differentiated at given pixel scales; (3) cloud type may be identifiable at coarser scales; (4) there are positive indications of scene independence which would permit development of a cloud signature bank; (5) edge enhancement of GOES imagery does not appreciably improve cloud classification over the use of raw data; and (6) the GOES imagery must be apodized before generation of FFTs.
Fernández, Katherina; Labarca, Ximena; Bordeu, Edmundo; Guesalaga, Andrés; Agosin, Eduardo
2007-11-01
Wine tannins are fundamental to the determination of wine quality. However, the chemical and sensorial analysis of these compounds is not straightforward and a simple and rapid technique is necessary. We analyzed the mid-infrared spectra of white, red, and model wines spiked with known amounts of skin or seed tannins, collected using Fourier transform mid-infrared (FT-MIR) transmission spectroscopy (400-4000 cm(-1)). The spectral data were classified according to their tannin source, skin or seed, and tannin concentration by means of discriminant analysis (DA) and soft independent modeling of class analogy (SIMCA) to obtain a probabilistic classification. Wines were also classified sensorially by a trained panel and compared with FT-MIR. SIMCA models gave the most accurate classification (over 97%) and prediction (over 60%) among the wine samples. The prediction was increased (over 73%) using the leave-one-out cross-validation technique. Sensory classification of the wines was less accurate than that obtained with FT-MIR and SIMCA. Overall, these results show the potential of FT-MIR spectroscopy, in combination with adequate statistical tools, to discriminate wines with different tannin levels.
NASA Astrophysics Data System (ADS)
Hossen, Jakir; Jacobs, Eddie L.; Chari, Srikant
2014-03-01
In this paper, we propose a real-time human versus animal classification technique using a pyro-electric sensor array and Hidden Markov Model. The technique starts with the variational energy functional level set segmentation technique to separate the object from background. After segmentation, we convert the segmented object to a signal by considering column-wise pixel values and then finding the wavelet coefficients of the signal. HMMs are trained to statistically model the wavelet features of individuals through an expectation-maximization learning process. Human versus animal classifications are made by evaluating a set of new wavelet feature data against the trained HMMs using the maximum-likelihood criterion. Human and animal data acquired-using a pyro-electric sensor in different terrains are used for performance evaluation of the algorithms. Failures of the computationally effective SURF feature based approach that we develop in our previous research are because of distorted images produced when the object runs very fast or if the temperature difference between target and background is not sufficient to accurately profile the object. We show that wavelet based HMMs work well for handling some of the distorted profiles in the data set. Further, HMM achieves improved classification rate over the SURF algorithm with almost the same computational time.
NASA Technical Reports Server (NTRS)
Buntine, Wray
1991-01-01
Algorithms for learning classification trees have had successes in artificial intelligence and statistics over many years. How a tree learning algorithm can be derived from Bayesian decision theory is outlined. This introduces Bayesian techniques for splitting, smoothing, and tree averaging. The splitting rule turns out to be similar to Quinlan's information gain splitting rule, while smoothing and averaging replace pruning. Comparative experiments with reimplementations of a minimum encoding approach, Quinlan's C4 and Breiman et al. Cart show the full Bayesian algorithm is consistently as good, or more accurate than these other approaches though at a computational price.
Muthu Rama Krishnan, M; Shah, Pratik; Chakraborty, Chandan; Ray, Ajoy K
2012-04-01
The objective of this paper is to provide an improved technique, which can assist oncopathologists in correct screening of oral precancerous conditions specially oral submucous fibrosis (OSF) with significant accuracy on the basis of collagen fibres in the sub-epithelial connective tissue. The proposed scheme is composed of collagen fibres segmentation, its textural feature extraction and selection, screening perfomance enhancement under Gaussian transformation and finally classification. In this study, collagen fibres are segmented on R,G,B color channels using back-probagation neural network from 60 normal and 59 OSF histological images followed by histogram specification for reducing the stain intensity variation. Henceforth, textural features of collgen area are extracted using fractal approaches viz., differential box counting and brownian motion curve . Feature selection is done using Kullback-Leibler (KL) divergence criterion and the screening performance is evaluated based on various statistical tests to conform Gaussian nature. Here, the screening performance is enhanced under Gaussian transformation of the non-Gaussian features using hybrid distribution. Moreover, the routine screening is designed based on two statistical classifiers viz., Bayesian classification and support vector machines (SVM) to classify normal and OSF. It is observed that SVM with linear kernel function provides better classification accuracy (91.64%) as compared to Bayesian classifier. The addition of fractal features of collagen under Gaussian transformation improves Bayesian classifier's performance from 80.69% to 90.75%. Results are here studied and discussed.
Wong, Raymond
2013-01-01
Voice biometrics is one kind of physiological characteristics whose voice is different for each individual person. Due to this uniqueness, voice classification has found useful applications in classifying speakers' gender, mother tongue or ethnicity (accent), emotion states, identity verification, verbal command control, and so forth. In this paper, we adopt a new preprocessing method named Statistical Feature Extraction (SFX) for extracting important features in training a classification model, based on piecewise transformation treating an audio waveform as a time-series. Using SFX we can faithfully remodel statistical characteristics of the time-series; together with spectral analysis, a substantial amount of features are extracted in combination. An ensemble is utilized in selecting only the influential features to be used in classification model induction. We focus on the comparison of effects of various popular data mining algorithms on multiple datasets. Our experiment consists of classification tests over four typical categories of human voice data, namely, Female and Male, Emotional Speech, Speaker Identification, and Language Recognition. The experiments yield encouraging results supporting the fact that heuristically choosing significant features from both time and frequency domains indeed produces better performance in voice classification than traditional signal processing techniques alone, like wavelets and LPC-to-CC. PMID:24288684
Characterization of agricultural land using singular value decomposition
NASA Astrophysics Data System (ADS)
Herries, Graham M.; Danaher, Sean; Selige, Thomas
1995-11-01
A method is defined and tested for the characterization of agricultural land from multi-spectral imagery, based on singular value decomposition (SVD) and key vector analysis. The SVD technique, which bears a close resemblance to multivariate statistic techniques, has previously been successfully applied to problems of signal extraction for marine data and forestry species classification. In this study the SVD technique is used as a classifier for agricultural regions, using airborne Daedalus ATM data, with 1 m resolution. The specific region chosen is an experimental research farm in Bavaria, Germany. This farm has a large number of crops, within a very small region and hence is not amenable to existing techniques. There are a number of other significant factors which render existing techniques such as the maximum likelihood algorithm less suitable for this area. These include a very dynamic terrain and tessellated pattern soil differences, which together cause large variations in the growth characteristics of the crops. The SVD technique is applied to this data set using a multi-stage classification approach, removing unwanted land-cover classes one step at a time. Typical classification accuracy's for SVD are of the order of 85-100%. Preliminary results indicate that it is a fast and efficient classifier with the ability to differentiate between crop types such as wheat, rye, potatoes and clover. The results of characterizing 3 sub-classes of Winter Wheat are also shown.
Iliyasu, Abdullah M; Fatichah, Chastine
2017-12-19
A quantum hybrid (QH) intelligent approach that blends the adaptive search capability of the quantum-behaved particle swarm optimisation (QPSO) method with the intuitionistic rationality of traditional fuzzy k -nearest neighbours (Fuzzy k -NN) algorithm (known simply as the Q-Fuzzy approach) is proposed for efficient feature selection and classification of cells in cervical smeared (CS) images. From an initial multitude of 17 features describing the geometry, colour, and texture of the CS images, the QPSO stage of our proposed technique is used to select the best subset features (i.e., global best particles) that represent a pruned down collection of seven features. Using a dataset of almost 1000 images, performance evaluation of our proposed Q-Fuzzy approach assesses the impact of our feature selection on classification accuracy by way of three experimental scenarios that are compared alongside two other approaches: the All-features (i.e., classification without prior feature selection) and another hybrid technique combining the standard PSO algorithm with the Fuzzy k -NN technique (P-Fuzzy approach). In the first and second scenarios, we further divided the assessment criteria in terms of classification accuracy based on the choice of best features and those in terms of the different categories of the cervical cells. In the third scenario, we introduced new QH hybrid techniques, i.e., QPSO combined with other supervised learning methods, and compared the classification accuracy alongside our proposed Q-Fuzzy approach. Furthermore, we employed statistical approaches to establish qualitative agreement with regards to the feature selection in the experimental scenarios 1 and 3. The synergy between the QPSO and Fuzzy k -NN in the proposed Q-Fuzzy approach improves classification accuracy as manifest in the reduction in number cell features, which is crucial for effective cervical cancer detection and diagnosis.
Review and classification of variability analysis techniques with clinical applications.
Bravi, Andrea; Longtin, André; Seely, Andrew J E
2011-10-10
Analysis of patterns of variation of time-series, termed variability analysis, represents a rapidly evolving discipline with increasing applications in different fields of science. In medicine and in particular critical care, efforts have focussed on evaluating the clinical utility of variability. However, the growth and complexity of techniques applicable to this field have made interpretation and understanding of variability more challenging. Our objective is to provide an updated review of variability analysis techniques suitable for clinical applications. We review more than 70 variability techniques, providing for each technique a brief description of the underlying theory and assumptions, together with a summary of clinical applications. We propose a revised classification for the domains of variability techniques, which include statistical, geometric, energetic, informational, and invariant. We discuss the process of calculation, often necessitating a mathematical transform of the time-series. Our aims are to summarize a broad literature, promote a shared vocabulary that would improve the exchange of ideas, and the analyses of the results between different studies. We conclude with challenges for the evolving science of variability analysis.
Review and classification of variability analysis techniques with clinical applications
2011-01-01
Analysis of patterns of variation of time-series, termed variability analysis, represents a rapidly evolving discipline with increasing applications in different fields of science. In medicine and in particular critical care, efforts have focussed on evaluating the clinical utility of variability. However, the growth and complexity of techniques applicable to this field have made interpretation and understanding of variability more challenging. Our objective is to provide an updated review of variability analysis techniques suitable for clinical applications. We review more than 70 variability techniques, providing for each technique a brief description of the underlying theory and assumptions, together with a summary of clinical applications. We propose a revised classification for the domains of variability techniques, which include statistical, geometric, energetic, informational, and invariant. We discuss the process of calculation, often necessitating a mathematical transform of the time-series. Our aims are to summarize a broad literature, promote a shared vocabulary that would improve the exchange of ideas, and the analyses of the results between different studies. We conclude with challenges for the evolving science of variability analysis. PMID:21985357
The evaluation of alternate methodologies for land cover classification in an urbanizing area
NASA Technical Reports Server (NTRS)
Smekofski, R. M.
1981-01-01
The usefulness of LANDSAT in classifying land cover and in identifying and classifying land use change was investigated using an urbanizing area as the study area. The question of what was the best technique for classification was the primary focus of the study. The many computer-assisted techniques available to analyze LANDSAT data were evaluated. Techniques of statistical training (polygons from CRT, unsupervised clustering, polygons from digitizer and binary masks) were tested with minimum distance to the mean, maximum likelihood and canonical analysis with minimum distance to the mean classifiers. The twelve output images were compared to photointerpreted samples, ground verified samples and a current land use data base. Results indicate that for a reconnaissance inventory, the unsupervised training with canonical analysis-minimum distance classifier is the most efficient. If more detailed ground truth and ground verification is available, the polygons from the digitizer training with the canonical analysis minimum distance is more accurate.
Belgiu, Mariana; Dr Guţ, Lucian; Strobl, Josef
2014-01-01
The increasing availability of high resolution imagery has triggered the need for automated image analysis techniques, with reduced human intervention and reproducible analysis procedures. The knowledge gained in the past might be of use to achieving this goal, if systematically organized into libraries which would guide the image analysis procedure. In this study we aimed at evaluating the variability of digital classifications carried out by three experts who were all assigned the same interpretation task. Besides the three classifications performed by independent operators, we developed an additional rule-based classification that relied on the image classifications best practices found in the literature, and used it as a surrogate for libraries of object characteristics. The results showed statistically significant differences among all operators who classified the same reference imagery. The classifications carried out by the experts achieved satisfactory results when transferred to another area for extracting the same classes of interest, without modification of the developed rules.
Belgiu, Mariana; Drǎguţ, Lucian; Strobl, Josef
2014-01-01
The increasing availability of high resolution imagery has triggered the need for automated image analysis techniques, with reduced human intervention and reproducible analysis procedures. The knowledge gained in the past might be of use to achieving this goal, if systematically organized into libraries which would guide the image analysis procedure. In this study we aimed at evaluating the variability of digital classifications carried out by three experts who were all assigned the same interpretation task. Besides the three classifications performed by independent operators, we developed an additional rule-based classification that relied on the image classifications best practices found in the literature, and used it as a surrogate for libraries of object characteristics. The results showed statistically significant differences among all operators who classified the same reference imagery. The classifications carried out by the experts achieved satisfactory results when transferred to another area for extracting the same classes of interest, without modification of the developed rules. PMID:24623959
NASA Astrophysics Data System (ADS)
Belgiu, Mariana; ǎguţ, Lucian, , Dr; Strobl, Josef
2014-01-01
The increasing availability of high resolution imagery has triggered the need for automated image analysis techniques, with reduced human intervention and reproducible analysis procedures. The knowledge gained in the past might be of use to achieving this goal, if systematically organized into libraries which would guide the image analysis procedure. In this study we aimed at evaluating the variability of digital classifications carried out by three experts who were all assigned the same interpretation task. Besides the three classifications performed by independent operators, we developed an additional rule-based classification that relied on the image classifications best practices found in the literature, and used it as a surrogate for libraries of object characteristics. The results showed statistically significant differences among all operators who classified the same reference imagery. The classifications carried out by the experts achieved satisfactory results when transferred to another area for extracting the same classes of interest, without modification of the developed rules.
Robust Feature Selection Technique using Rank Aggregation.
Sarkar, Chandrima; Cooley, Sarah; Srivastava, Jaideep
2014-01-01
Although feature selection is a well-developed research area, there is an ongoing need to develop methods to make classifiers more efficient. One important challenge is the lack of a universal feature selection technique which produces similar outcomes with all types of classifiers. This is because all feature selection techniques have individual statistical biases while classifiers exploit different statistical properties of data for evaluation. In numerous situations this can put researchers into dilemma as to which feature selection method and a classifiers to choose from a vast range of choices. In this paper, we propose a technique that aggregates the consensus properties of various feature selection methods to develop a more optimal solution. The ensemble nature of our technique makes it more robust across various classifiers. In other words, it is stable towards achieving similar and ideally higher classification accuracy across a wide variety of classifiers. We quantify this concept of robustness with a measure known as the Robustness Index (RI). We perform an extensive empirical evaluation of our technique on eight data sets with different dimensions including Arrythmia, Lung Cancer, Madelon, mfeat-fourier, internet-ads, Leukemia-3c and Embryonal Tumor and a real world data set namely Acute Myeloid Leukemia (AML). We demonstrate not only that our algorithm is more robust, but also that compared to other techniques our algorithm improves the classification accuracy by approximately 3-4% (in data set with less than 500 features) and by more than 5% (in data set with more than 500 features), across a wide range of classifiers.
An Analysis of Variance Framework for Matrix Sampling.
ERIC Educational Resources Information Center
Sirotnik, Kenneth
Significant cost savings can be achieved with the use of matrix sampling in estimating population parameters from psychometric data. The statistical design is intuitively simple, using the framework of the two-way classification analysis of variance technique. For example, the mean and variance are derived from the performance of a certain grade…
Calibration of remotely sensed proportion or area estimates for misclassification error
Raymond L. Czaplewski; Glenn P. Catts
1992-01-01
Classifications of remotely sensed data contain misclassification errors that bias areal estimates. Monte Carlo techniques were used to compare two statistical methods that correct or calibrate remotely sensed areal estimates for misclassification bias using reference data from an error matrix. The inverse calibration estimator was consistently superior to the...
Statistical analysis of texture in trunk images for biometric identification of tree species.
Bressane, Adriano; Roveda, José A F; Martins, Antônio C G
2015-04-01
The identification of tree species is a key step for sustainable management plans of forest resources, as well as for several other applications that are based on such surveys. However, the present available techniques are dependent on the presence of tree structures, such as flowers, fruits, and leaves, limiting the identification process to certain periods of the year. Therefore, this article introduces a study on the application of statistical parameters for texture classification of tree trunk images. For that, 540 samples from five Brazilian native deciduous species were acquired and measures of entropy, uniformity, smoothness, asymmetry (third moment), mean, and standard deviation were obtained from the presented textures. Using a decision tree, a biometric species identification system was constructed and resulted to a 0.84 average precision rate for species classification with 0.83accuracy and 0.79 agreement. Thus, it can be considered that the use of texture presented in trunk images can represent an important advance in tree identification, since the limitations of the current techniques can be overcome.
NASA Technical Reports Server (NTRS)
Lodwick, G. D. (Principal Investigator)
1976-01-01
A digital computer and multivariate statistical techniques were used to analyze 4-band multispectral data. A representation of the original data for each of the four bands allows a certain degree of terrain interpretation; however, variations in appearance of sites within and between bands, without additional criteria for deciding which representation should be preferred, create difficulties for classification. Investigation of the video data groups produced by principal components analysis and cluster analysis techniques shows that effective correlations with classifications of terrain produced by conventional methods could be carried out. The analyses also highlighted underlying relationships between the various elements. The approach used allows large areas (185 cm by 185 cm) to be classified into fundamental units within a matter of hours and can be applied to those parts of the Earth where facilities for conventional studies are poor or lacking.
Variance approximations for assessments of classification accuracy
R. L. Czaplewski
1994-01-01
Variance approximations are derived for the weighted and unweighted kappa statistics, the conditional kappa statistic, and conditional probabilities. These statistics are useful to assess classification accuracy, such as accuracy of remotely sensed classifications in thematic maps when compared to a sample of reference classifications made in the field. Published...
Mladinich, C.
2010-01-01
Human disturbance is a leading ecosystem stressor. Human-induced modifications include transportation networks, areal disturbances due to resource extraction, and recreation activities. High-resolution imagery and object-oriented classification rather than pixel-based techniques have successfully identified roads, buildings, and other anthropogenic features. Three commercial, automated feature-extraction software packages (Visual Learning Systems' Feature Analyst, ENVI Feature Extraction, and Definiens Developer) were evaluated by comparing their ability to effectively detect the disturbed surface patterns from motorized vehicle traffic. Each package achieved overall accuracies in the 70% range, demonstrating the potential to map the surface patterns. The Definiens classification was more consistent and statistically valid. Copyright ?? 2010 by Bellwether Publishing, Ltd. All rights reserved.
Lightweight and Statistical Techniques for Petascale PetaScale Debugging
DOE Office of Scientific and Technical Information (OSTI.GOV)
Miller, Barton
2014-06-30
This project investigated novel techniques for debugging scientific applications on petascale architectures. In particular, we developed lightweight tools that narrow the problem space when bugs are encountered. We also developed techniques that either limit the number of tasks and the code regions to which a developer must apply a traditional debugger or that apply statistical techniques to provide direct suggestions of the location and type of error. We extend previous work on the Stack Trace Analysis Tool (STAT), that has already demonstrated scalability to over one hundred thousand MPI tasks. We also extended statistical techniques developed to isolate programming errorsmore » in widely used sequential or threaded applications in the Cooperative Bug Isolation (CBI) project to large scale parallel applications. Overall, our research substantially improved productivity on petascale platforms through a tool set for debugging that complements existing commercial tools. Previously, Office Of Science application developers relied either on primitive manual debugging techniques based on printf or they use tools, such as TotalView, that do not scale beyond a few thousand processors. However, bugs often arise at scale and substantial effort and computation cycles are wasted in either reproducing the problem in a smaller run that can be analyzed with the traditional tools or in repeated runs at scale that use the primitive techniques. New techniques that work at scale and automate the process of identifying the root cause of errors were needed. These techniques significantly reduced the time spent debugging petascale applications, thus leading to a greater overall amount of time for application scientists to pursue the scientific objectives for which the systems are purchased. We developed a new paradigm for debugging at scale: techniques that reduced the debugging scenario to a scale suitable for traditional debuggers, e.g., by narrowing the search for the root-cause analysis to a small set of nodes or by identifying equivalence classes of nodes and sampling our debug targets from them. We implemented these techniques as lightweight tools that efficiently work on the full scale of the target machine. We explored four lightweight debugging refinements: generic classification parameters, such as stack traces, application-specific classification parameters, such as global variables, statistical data acquisition techniques and machine learning based approaches to perform root cause analysis. Work done under this project can be divided into two categories, new algorithms and techniques for scalable debugging, and foundation infrastructure work on our MRNet multicast-reduction framework for scalability, and Dyninst binary analysis and instrumentation toolkits.« less
Fetit, Ahmed E; Novak, Jan; Peet, Andrew C; Arvanitits, Theodoros N
2015-09-01
The aim of this study was to assess the efficacy of three-dimensional texture analysis (3D TA) of conventional MR images for the classification of childhood brain tumours in a quantitative manner. The dataset comprised pre-contrast T1 - and T2-weighted MRI series obtained from 48 children diagnosed with brain tumours (medulloblastoma, pilocytic astrocytoma and ependymoma). 3D and 2D TA were carried out on the images using first-, second- and higher order statistical methods. Six supervised classification algorithms were trained with the most influential 3D and 2D textural features, and their performances in the classification of tumour types, using the two feature sets, were compared. Model validation was carried out using the leave-one-out cross-validation (LOOCV) approach, as well as stratified 10-fold cross-validation, in order to provide additional reassurance. McNemar's test was used to test the statistical significance of any improvements demonstrated by 3D-trained classifiers. Supervised learning models trained with 3D textural features showed improved classification performances to those trained with conventional 2D features. For instance, a neural network classifier showed 12% improvement in area under the receiver operator characteristics curve (AUC) and 19% in overall classification accuracy. These improvements were statistically significant for four of the tested classifiers, as per McNemar's tests. This study shows that 3D textural features extracted from conventional T1 - and T2-weighted images can improve the diagnostic classification of childhood brain tumours. Long-term benefits of accurate, yet non-invasive, diagnostic aids include a reduction in surgical procedures, improvement in surgical and therapy planning, and support of discussions with patients' families. It remains necessary, however, to extend the analysis to a multicentre cohort in order to assess the scalability of the techniques used. Copyright © 2015 John Wiley & Sons, Ltd.
Choice-Based Conjoint Analysis: Classification vs. Discrete Choice Models
NASA Astrophysics Data System (ADS)
Giesen, Joachim; Mueller, Klaus; Taneva, Bilyana; Zolliker, Peter
Conjoint analysis is a family of techniques that originated in psychology and later became popular in market research. The main objective of conjoint analysis is to measure an individual's or a population's preferences on a class of options that can be described by parameters and their levels. We consider preference data obtained in choice-based conjoint analysis studies, where one observes test persons' choices on small subsets of the options. There are many ways to analyze choice-based conjoint analysis data. Here we discuss the intuition behind a classification based approach, and compare this approach to one based on statistical assumptions (discrete choice models) and to a regression approach. Our comparison on real and synthetic data indicates that the classification approach outperforms the discrete choice models.
A Classification of Statistics Courses (A Framework for Studying Statistical Education)
ERIC Educational Resources Information Center
Turner, J. C.
1976-01-01
A classification of statistics courses in presented, with main categories of "course type,""methods of presentation,""objectives," and "syllabus." Examples and suggestions for uses of the classification are given. (DT)
Gold-standard for computer-assisted morphological sperm analysis.
Chang, Violeta; Garcia, Alejandra; Hitschfeld, Nancy; Härtel, Steffen
2017-04-01
Published algorithms for classification of human sperm heads are based on relatively small image databases that are not open to the public, and thus no direct comparison is available for competing methods. We describe a gold-standard for morphological sperm analysis (SCIAN-MorphoSpermGS), a dataset of sperm head images with expert-classification labels in one of the following classes: normal, tapered, pyriform, small or amorphous. This gold-standard is for evaluating and comparing known techniques and future improvements to present approaches for classification of human sperm heads for semen analysis. Although this paper does not provide a computational tool for morphological sperm analysis, we present a set of experiments for comparing sperm head description and classification common techniques. This classification base-line is aimed to be used as a reference for future improvements to present approaches for human sperm head classification. The gold-standard provides a label for each sperm head, which is achieved by majority voting among experts. The classification base-line compares four supervised learning methods (1- Nearest Neighbor, naive Bayes, decision trees and Support Vector Machine (SVM)) and three shape-based descriptors (Hu moments, Zernike moments and Fourier descriptors), reporting the accuracy and the true positive rate for each experiment. We used Fleiss' Kappa Coefficient to evaluate the inter-expert agreement and Fisher's exact test for inter-expert variability and statistical significant differences between descriptors and learning techniques. Our results confirm the high degree of inter-expert variability in the morphological sperm analysis. Regarding the classification base line, we show that none of the standard descriptors or classification approaches is best suitable for tackling the problem of sperm head classification. We discovered that the correct classification rate was highly variable when trying to discriminate among non-normal sperm heads. By using the Fourier descriptor and SVM, we achieved the best mean correct classification: only 49%. We conclude that the SCIAN-MorphoSpermGS will provide a standard tool for evaluation of characterization and classification approaches for human sperm heads. Indeed, there is a clear need for a specific shape-based descriptor for human sperm heads and a specific classification approach to tackle the problem of high variability within subcategories of abnormal sperm cells. Copyright © 2017 Elsevier Ltd. All rights reserved.
Automatic age and gender classification using supervised appearance model
NASA Astrophysics Data System (ADS)
Bukar, Ali Maina; Ugail, Hassan; Connah, David
2016-11-01
Age and gender classification are two important problems that recently gained popularity in the research community, due to their wide range of applications. Research has shown that both age and gender information are encoded in the face shape and texture, hence the active appearance model (AAM), a statistical model that captures shape and texture variations, has been one of the most widely used feature extraction techniques for the aforementioned problems. However, AAM suffers from some drawbacks, especially when used for classification. This is primarily because principal component analysis (PCA), which is at the core of the model, works in an unsupervised manner, i.e., PCA dimensionality reduction does not take into account how the predictor variables relate to the response (class labels). Rather, it explores only the underlying structure of the predictor variables, thus, it is no surprise if PCA discards valuable parts of the data that represent discriminatory features. Toward this end, we propose a supervised appearance model (sAM) that improves on AAM by replacing PCA with partial least-squares regression. This feature extraction technique is then used for the problems of age and gender classification. Our experiments show that sAM has better predictive power than the conventional AAM.
NASA Technical Reports Server (NTRS)
Benediktsson, Jon A.; Swain, Philip H.; Ersoy, Okan K.
1990-01-01
Neural network learning procedures and statistical classificaiton methods are applied and compared empirically in classification of multisource remote sensing and geographic data. Statistical multisource classification by means of a method based on Bayesian classification theory is also investigated and modified. The modifications permit control of the influence of the data sources involved in the classification process. Reliability measures are introduced to rank the quality of the data sources. The data sources are then weighted according to these rankings in the statistical multisource classification. Four data sources are used in experiments: Landsat MSS data and three forms of topographic data (elevation, slope, and aspect). Experimental results show that two different approaches have unique advantages and disadvantages in this classification application.
Geological mapping in northwestern Saudi Arabia using LANDSAT multispectral techniques
NASA Technical Reports Server (NTRS)
Blodget, H. W.; Brown, G. F.; Moik, J. G.
1975-01-01
Various computer enhancement and data extraction systems using LANDSAT data were assessed and used to complement a continuing geologic mapping program. Interactive digital classification techniques using both the parallel-piped and maximum-likelihood statistical approaches achieve very limited success in areas of highly dissected terrain. Computer enhanced imagery developed by color compositing stretched MSS ratio data was constructed for a test site in northwestern Saudi Arabia. Initial results indicate that several igneous and sedimentary rock types can be discriminated.
Rank-k Maximal Statistics for Divergence and Probability of Misclassification
NASA Technical Reports Server (NTRS)
Decell, H. P., Jr.
1972-01-01
A technique is developed for selecting from n-channel multispectral data some k combinations of the n-channels upon which to base a given classification technique so that some measure of the loss of the ability to distinguish between classes, using the compressed k-dimensional data, is minimized. Information loss in compressing the n-channel data to k channels is taken to be the difference in the average interclass divergences (or probability of misclassification) in n-space and in k-space.
ERIC Educational Resources Information Center
Galbraith, Craig S.; Merrill, Gregory B.; Kline, Doug M.
2012-01-01
In this study we investigate the underlying relational structure between student evaluations of teaching effectiveness (SETEs) and achievement of student learning outcomes in 116 business related courses. Utilizing traditional statistical techniques, a neural network analysis and a Bayesian data reduction and classification algorithm, we find…
Enhancing the Classification Accuracy of IP Geolocation
2013-10-01
accurately identify the geographic location of Internet devices has signficant implications for online- advertisers, application developers , network...Real Media, Comedy Central, Netflix and Spotify) and target advertising (e.g., Google). More re- cently, IP geolocation techniques have been deployed...distance to delay function and how they triangulate the position of the target. Statistical Geolocation [14] develops a joint probability density
NASA Astrophysics Data System (ADS)
Li, Xiaohui; Yang, Sibo; Fan, Rongwei; Yu, Xin; Chen, Deying
2018-06-01
In this paper, discrimination of soft tissues using laser-induced breakdown spectroscopy (LIBS) in combination with multivariate statistical methods is presented. Fresh pork fat, skin, ham, loin and tenderloin muscle tissues are manually cut into slices and ablated using a 1064 nm pulsed Nd:YAG laser. Discrimination analyses between fat, skin and muscle tissues, and further between highly similar ham, loin and tenderloin muscle tissues, are performed based on the LIBS spectra in combination with multivariate statistical methods, including principal component analysis (PCA), k nearest neighbors (kNN) classification, and support vector machine (SVM) classification. Performances of the discrimination models, including accuracy, sensitivity and specificity, are evaluated using 10-fold cross validation. The classification models are optimized to achieve best discrimination performances. The fat, skin and muscle tissues can be definitely discriminated using both kNN and SVM classifiers, with accuracy of over 99.83%, sensitivity of over 0.995 and specificity of over 0.998. The highly similar ham, loin and tenderloin muscle tissues can also be discriminated with acceptable performances. The best performances are achieved with SVM classifier using Gaussian kernel function, with accuracy of 76.84%, sensitivity of over 0.742 and specificity of over 0.869. The results show that the LIBS technique assisted with multivariate statistical methods could be a powerful tool for online discrimination of soft tissues, even for tissues of high similarity, such as muscles from different parts of the animal body. This technique could be used for discrimination of tissues suffering minor clinical changes, thus may advance the diagnosis of early lesions and abnormalities.
Classical Statistics and Statistical Learning in Imaging Neuroscience
Bzdok, Danilo
2017-01-01
Brain-imaging research has predominantly generated insight by means of classical statistics, including regression-type analyses and null-hypothesis testing using t-test and ANOVA. Throughout recent years, statistical learning methods enjoy increasing popularity especially for applications in rich and complex data, including cross-validated out-of-sample prediction using pattern classification and sparsity-inducing regression. This concept paper discusses the implications of inferential justifications and algorithmic methodologies in common data analysis scenarios in neuroimaging. It is retraced how classical statistics and statistical learning originated from different historical contexts, build on different theoretical foundations, make different assumptions, and evaluate different outcome metrics to permit differently nuanced conclusions. The present considerations should help reduce current confusion between model-driven classical hypothesis testing and data-driven learning algorithms for investigating the brain with imaging techniques. PMID:29056896
NASA Technical Reports Server (NTRS)
Emerson, Charles W.; Sig-NganLam, Nina; Quattrochi, Dale A.
2004-01-01
The accuracy of traditional multispectral maximum-likelihood image classification is limited by the skewed statistical distributions of reflectances from the complex heterogenous mixture of land cover types in urban areas. This work examines the utility of local variance, fractal dimension and Moran's I index of spatial autocorrelation in segmenting multispectral satellite imagery. Tools available in the Image Characterization and Modeling System (ICAMS) were used to analyze Landsat 7 imagery of Atlanta, Georgia. Although segmentation of panchromatic images is possible using indicators of spatial complexity, different land covers often yield similar values of these indices. Better results are obtained when a surface of local fractal dimension or spatial autocorrelation is combined as an additional layer in a supervised maximum-likelihood multispectral classification. The addition of fractal dimension measures is particularly effective at resolving land cover classes within urbanized areas, as compared to per-pixel spectral classification techniques.
Active learning methods for interactive image retrieval.
Gosselin, Philippe Henri; Cord, Matthieu
2008-07-01
Active learning methods have been considered with increased interest in the statistical learning community. Initially developed within a classification framework, a lot of extensions are now being proposed to handle multimedia applications. This paper provides algorithms within a statistical framework to extend active learning for online content-based image retrieval (CBIR). The classification framework is presented with experiments to compare several powerful classification techniques in this information retrieval context. Focusing on interactive methods, active learning strategy is then described. The limitations of this approach for CBIR are emphasized before presenting our new active selection process RETIN. First, as any active method is sensitive to the boundary estimation between classes, the RETIN strategy carries out a boundary correction to make the retrieval process more robust. Second, the criterion of generalization error to optimize the active learning selection is modified to better represent the CBIR objective of database ranking. Third, a batch processing of images is proposed. Our strategy leads to a fast and efficient active learning scheme to retrieve sets of online images (query concept). Experiments on large databases show that the RETIN method performs well in comparison to several other active strategies.
Classification without labels: learning from mixed samples in high energy physics
NASA Astrophysics Data System (ADS)
Metodiev, Eric M.; Nachman, Benjamin; Thaler, Jesse
2017-10-01
Modern machine learning techniques can be used to construct powerful models for difficult collider physics problems. In many applications, however, these models are trained on imperfect simulations due to a lack of truth-level information in the data, which risks the model learning artifacts of the simulation. In this paper, we introduce the paradigm of classification without labels (CWoLa) in which a classifier is trained to distinguish statistical mixtures of classes, which are common in collider physics. Crucially, neither individual labels nor class proportions are required, yet we prove that the optimal classifier in the CWoLa paradigm is also the optimal classifier in the traditional fully-supervised case where all label information is available. After demonstrating the power of this method in an analytical toy example, we consider a realistic benchmark for collider physics: distinguishing quark- versus gluon-initiated jets using mixed quark/gluon training samples. More generally, CWoLa can be applied to any classification problem where labels or class proportions are unknown or simulations are unreliable, but statistical mixtures of the classes are available.
Classification without labels: learning from mixed samples in high energy physics
Metodiev, Eric M.; Nachman, Benjamin; Thaler, Jesse
2017-10-25
Modern machine learning techniques can be used to construct powerful models for difficult collider physics problems. In many applications, however, these models are trained on imperfect simulations due to a lack of truth-level information in the data, which risks the model learning artifacts of the simulation. In this paper, we introduce the paradigm of classification without labels (CWoLa) in which a classifier is trained to distinguish statistical mixtures of classes, which are common in collider physics. Crucially, neither individual labels nor class proportions are required, yet we prove that the optimal classifier in the CWoLa paradigm is also the optimalmore » classifier in the traditional fully-supervised case where all label information is available. After demonstrating the power of this method in an analytical toy example, we consider a realistic benchmark for collider physics: distinguishing quark- versus gluon-initiated jets using mixed quark/gluon training samples. More generally, CWoLa can be applied to any classification problem where labels or class proportions are unknown or simulations are unreliable, but statistical mixtures of the classes are available.« less
Classification without labels: learning from mixed samples in high energy physics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Metodiev, Eric M.; Nachman, Benjamin; Thaler, Jesse
Modern machine learning techniques can be used to construct powerful models for difficult collider physics problems. In many applications, however, these models are trained on imperfect simulations due to a lack of truth-level information in the data, which risks the model learning artifacts of the simulation. In this paper, we introduce the paradigm of classification without labels (CWoLa) in which a classifier is trained to distinguish statistical mixtures of classes, which are common in collider physics. Crucially, neither individual labels nor class proportions are required, yet we prove that the optimal classifier in the CWoLa paradigm is also the optimalmore » classifier in the traditional fully-supervised case where all label information is available. After demonstrating the power of this method in an analytical toy example, we consider a realistic benchmark for collider physics: distinguishing quark- versus gluon-initiated jets using mixed quark/gluon training samples. More generally, CWoLa can be applied to any classification problem where labels or class proportions are unknown or simulations are unreliable, but statistical mixtures of the classes are available.« less
ANALYSIS OF A CLASSIFICATION ERROR MATRIX USING CATEGORICAL DATA TECHNIQUES.
Rosenfield, George H.; Fitzpatrick-Lins, Katherine
1984-01-01
Summary form only given. A classification error matrix typically contains tabulation results of an accuracy evaluation of a thematic classification, such as that of a land use and land cover map. The diagonal elements of the matrix represent the counts corrected, and the usual designation of classification accuracy has been the total percent correct. The nondiagonal elements of the matrix have usually been neglected. The classification error matrix is known in statistical terms as a contingency table of categorical data. As an example, an application of these methodologies to a problem of remotely sensed data concerning two photointerpreters and four categories of classification indicated that there is no significant difference in the interpretation between the two photointerpreters, and that there are significant differences among the interpreted category classifications. However, two categories, oak and cottonwood, are not separable in classification in this experiment at the 0. 51 percent probability. A coefficient of agreement is determined for the interpreted map as a whole, and individually for each of the interpreted categories. A conditional coefficient of agreement for the individual categories is compared to other methods for expressing category accuracy which have already been presented in the remote sensing literature.
Automotive System for Remote Surface Classification.
Bystrov, Aleksandr; Hoare, Edward; Tran, Thuy-Yung; Clarke, Nigel; Gashinova, Marina; Cherniakov, Mikhail
2017-04-01
In this paper we shall discuss a novel approach to road surface recognition, based on the analysis of backscattered microwave and ultrasonic signals. The novelty of our method is sonar and polarimetric radar data fusion, extraction of features for separate swathes of illuminated surface (segmentation), and using of multi-stage artificial neural network for surface classification. The developed system consists of 24 GHz radar and 40 kHz ultrasonic sensor. The features are extracted from backscattered signals and then the procedures of principal component analysis and supervised classification are applied to feature data. The special attention is paid to multi-stage artificial neural network which allows an overall increase in classification accuracy. The proposed technique was tested for recognition of a large number of real surfaces in different weather conditions with the average accuracy of correct classification of 95%. The obtained results thereby demonstrate that the use of proposed system architecture and statistical methods allow for reliable discrimination of various road surfaces in real conditions.
Automotive System for Remote Surface Classification
Bystrov, Aleksandr; Hoare, Edward; Tran, Thuy-Yung; Clarke, Nigel; Gashinova, Marina; Cherniakov, Mikhail
2017-01-01
In this paper we shall discuss a novel approach to road surface recognition, based on the analysis of backscattered microwave and ultrasonic signals. The novelty of our method is sonar and polarimetric radar data fusion, extraction of features for separate swathes of illuminated surface (segmentation), and using of multi-stage artificial neural network for surface classification. The developed system consists of 24 GHz radar and 40 kHz ultrasonic sensor. The features are extracted from backscattered signals and then the procedures of principal component analysis and supervised classification are applied to feature data. The special attention is paid to multi-stage artificial neural network which allows an overall increase in classification accuracy. The proposed technique was tested for recognition of a large number of real surfaces in different weather conditions with the average accuracy of correct classification of 95%. The obtained results thereby demonstrate that the use of proposed system architecture and statistical methods allow for reliable discrimination of various road surfaces in real conditions. PMID:28368297
Statistical methods and neural network approaches for classification of data from multiple sources
NASA Technical Reports Server (NTRS)
Benediktsson, Jon Atli; Swain, Philip H.
1990-01-01
Statistical methods for classification of data from multiple data sources are investigated and compared to neural network models. A problem with using conventional multivariate statistical approaches for classification of data of multiple types is in general that a multivariate distribution cannot be assumed for the classes in the data sources. Another common problem with statistical classification methods is that the data sources are not equally reliable. This means that the data sources need to be weighted according to their reliability but most statistical classification methods do not have a mechanism for this. This research focuses on statistical methods which can overcome these problems: a method of statistical multisource analysis and consensus theory. Reliability measures for weighting the data sources in these methods are suggested and investigated. Secondly, this research focuses on neural network models. The neural networks are distribution free since no prior knowledge of the statistical distribution of the data is needed. This is an obvious advantage over most statistical classification methods. The neural networks also automatically take care of the problem involving how much weight each data source should have. On the other hand, their training process is iterative and can take a very long time. Methods to speed up the training procedure are introduced and investigated. Experimental results of classification using both neural network models and statistical methods are given, and the approaches are compared based on these results.
McDermott, P A; Hale, R L
1982-07-01
Tested diagnostic classifications of child psychopathology produced by a computerized technique known as multidimensional actuarial classification (MAC) against the criterion of expert psychological opinion. The MAC program applies series of statistical decision rules to assess the importance of and relationships among several dimensions of classification, i.e., intellectual functioning, academic achievement, adaptive behavior, and social and behavioral adjustment, to perform differential diagnosis of children's mental retardation, specific learning disabilities, behavioral and emotional disturbance, possible communication or perceptual-motor impairment, and academic under- and overachievement in reading and mathematics. Classifications rendered by MAC are compared to those offered by two expert child psychologists for cases of 73 children referred for psychological services. Experts' agreement with MAC was significant for all classification areas, as was MAC's agreement with the experts held as a conjoint reference standard. Whereas the experts' agreement with MAC averaged 86.0% above chance, their agreement with one another averaged 76.5% above chance. Implications of the findings are explored and potential advantages of the systems-actuarial approach are discussed.
Havla, Lukas; Schneider, Moritz J; Thierfelder, Kolja M; Beyer, Sebastian E; Ertl-Wagner, Birgit; Reiser, Maximilian F; Sommer, Wieland H; Dietrich, Olaf
2016-02-01
The purpose of this study was to propose and evaluate a new wavelet-based technique for classification of arterial and venous vessels using time-resolved cerebral CT perfusion data sets. Fourteen consecutive patients (mean age 73 yr, range 17-97) with suspected stroke but no pathology in follow-up MRI were included. A CT perfusion scan with 32 dynamic phases was performed during intravenous bolus contrast-agent application. After rigid-body motion correction, a Paul wavelet (order 1) was used to calculate voxelwise the wavelet power spectrum (WPS) of each attenuation-time course. The angiographic intensity A was defined as the maximum of the WPS, located at the coordinates T (time axis) and W (scale/width axis) within the WPS. Using these three parameters (A, T, W) separately as well as combined by (1) Fisher's linear discriminant analysis (FLDA), (2) logistic regression (LogR) analysis, or (3) support vector machine (SVM) analysis, their potential to classify 18 different arterial and venous vessel segments per subject was evaluated. The best vessel classification was obtained using all three parameters A and T and W [area under the curve (AUC): 0.953 with FLDA and 0.957 with LogR or SVM]. In direct comparison, the wavelet-derived parameters provided performance at least equal to conventional attenuation-time-course parameters. The maximum AUC obtained from the proposed wavelet parameters was slightly (although not statistically significantly) higher than the maximum AUC (0.945) obtained from the conventional parameters. A new method to classify arterial and venous cerebral vessels with high statistical accuracy was introduced based on the time-domain wavelet transform of dynamic CT perfusion data in combination with linear or nonlinear multidimensional classification techniques.
Pulsed terahertz imaging of breast cancer in freshly excised murine tumors
NASA Astrophysics Data System (ADS)
Bowman, Tyler; Chavez, Tanny; Khan, Kamrul; Wu, Jingxian; Chakraborty, Avishek; Rajaram, Narasimhan; Bailey, Keith; El-Shenawee, Magda
2018-02-01
This paper investigates terahertz (THz) imaging and classification of freshly excised murine xenograft breast cancer tumors. These tumors are grown via injection of E0771 breast adenocarcinoma cells into the flank of mice maintained on high-fat diet. Within 1 h of excision, the tumor and adjacent tissues are imaged using a pulsed THz system in the reflection mode. The THz images are classified using a statistical Bayesian mixture model with unsupervised and supervised approaches. Correlation with digitized pathology images is conducted using classification images assigned by a modal class decision rule. The corresponding receiver operating characteristic curves are obtained based on the classification results. A total of 13 tumor samples obtained from 9 tumors are investigated. The results show good correlation of THz images with pathology results in all samples of cancer and fat tissues. For tumor samples of cancer, fat, and muscle tissues, THz images show reasonable correlation with pathology where the primary challenge lies in the overlapping dielectric properties of cancer and muscle tissues. The use of a supervised regression approach shows improvement in the classification images although not consistently in all tissue regions. Advancing THz imaging of breast tumors from mice and the development of accurate statistical models will ultimately progress the technique for the assessment of human breast tumor margins.
NASA Technical Reports Server (NTRS)
Hebert, Martial
1999-01-01
This report describes a three-month research program undertook jointly by the Robotics Institute at Carnegie Mellon University and Ames Research Center as part of the Ames' Joint Research Initiative (JRI.) The work was conducted at the Ames Research Center by Mr. Liam Pedersen, a graduate student in the CMU Ph.D. program in Robotics under the supervision Dr. Ted Roush at the Space Science Division of the Ames Research Center from May 15 1999 to August 15, 1999. Dr. Martial Hebert is Mr. Pedersen's research adviser at CMU and is Principal Investigator of this Grant. The goal of this project is to investigate and implement methods suitable for a robotic rover to autonomously identify rocks and minerals in its vicinity, and to statistically characterize the local geological environment. Although primary sensors for these tasks are a reflection spectrometer and color camera, the goal is to create a framework under which data from multiple sensors, and multiple readings on the same object, can be combined in a principled manner. Furthermore, it is envisioned that knowledge of the local area, either a priori or gathered by the robot, will be used to improve classification accuracy. The key results obtained during this project are: The continuation of the development of a rock classifier; development of theoretical statistical methods; development of methods for evaluating and selecting sensors; and experimentation with data mining techniques on the Ames spectral library. The results of this work are being applied at CMU, in particular in the context of the Winter 99 Antarctica expedition in which the classification techniques will be used on the Nomad robot. Conversely, the software developed based on those techniques will continue to be made available to NASA Ames and the data collected from the Nomad experiments will also be made available.
Analysis of the Tanana River Basin using LANDSAT data
NASA Technical Reports Server (NTRS)
Morrissey, L. A.; Ambrosia, V. G.; Carson-Henry, C.
1981-01-01
Digital image classification techniques were used to classify land cover/resource information in the Tanana River Basin of Alaska. Portions of four scenes of LANDSAT digital data were analyzed using computer systems at Ames Research Center in an unsupervised approach to derive cluster statistics. The spectral classes were identified using the IDIMS display and color infrared photography. Classification errors were corrected using stratification procedures. The classification scheme resulted in the following eleven categories; sedimented/shallow water, clear/deep water, coniferous forest, mixed forest, deciduous forest, shrub and grass, bog, alpine tundra, barrens, snow and ice, and cultural features. Color coded maps and acreage summaries of the major land cover categories were generated for selected USGS quadrangles (1:250,000) which lie within the drainage basin. The project was completed within six months.
NASA Astrophysics Data System (ADS)
Li, Manchun; Ma, Lei; Blaschke, Thomas; Cheng, Liang; Tiede, Dirk
2016-07-01
Geographic Object-Based Image Analysis (GEOBIA) is becoming more prevalent in remote sensing classification, especially for high-resolution imagery. Many supervised classification approaches are applied to objects rather than pixels, and several studies have been conducted to evaluate the performance of such supervised classification techniques in GEOBIA. However, these studies did not systematically investigate all relevant factors affecting the classification (segmentation scale, training set size, feature selection and mixed objects). In this study, statistical methods and visual inspection were used to compare these factors systematically in two agricultural case studies in China. The results indicate that Random Forest (RF) and Support Vector Machines (SVM) are highly suitable for GEOBIA classifications in agricultural areas and confirm the expected general tendency, namely that the overall accuracies decline with increasing segmentation scale. All other investigated methods except for RF and SVM are more prone to obtain a lower accuracy due to the broken objects at fine scales. In contrast to some previous studies, the RF classifiers yielded the best results and the k-nearest neighbor classifier were the worst results, in most cases. Likewise, the RF and Decision Tree classifiers are the most robust with or without feature selection. The results of training sample analyses indicated that the RF and adaboost. M1 possess a superior generalization capability, except when dealing with small training sample sizes. Furthermore, the classification accuracies were directly related to the homogeneity/heterogeneity of the segmented objects for all classifiers. Finally, it was suggested that RF should be considered in most cases for agricultural mapping.
Integrating remote sensing and terrain data in forest fire modeling
NASA Astrophysics Data System (ADS)
Medler, Michael Johns
Forest fire policies are changing. Managers now face conflicting imperatives to re-establish pre-suppression fire regimes, while simultaneously preventing resource destruction. They must, therefore, understand the spatial patterns of fires. Geographers can facilitate this understanding by developing new techniques for mapping fire behavior. This dissertation develops such techniques for mapping recent fires and using these maps to calibrate models of potential fire hazards. In so doing, it features techniques that strive to address the inherent complexity of modeling the combinations of variables found in most ecological systems. Image processing techniques were used to stratify the elements of terrain, slope, elevation, and aspect. These stratification images were used to assure sample placement considered the role of terrain in fire behavior. Examination of multiple stratification images indicated samples were placed representatively across a controlled range of scales. The incorporation of terrain data also improved preliminary fire hazard classification accuracy by 40%, compared with remotely sensed data alone. A Kauth-Thomas transformation (KT) of pre-fire and post-fire Thematic Mapper (TM) remotely sensed data produced brightness, greenness, and wetness images. Image subtraction indicated fire induced change in brightness, greenness, and wetness. Field data guided a fuzzy classification of these change images. Because fuzzy classification can characterize a continuum of a phenomena where discrete classification may produce artificial borders, fuzzy classification was found to offer a range of fire severity information unavailable with discrete classification. These mapped fire patterns were used to calibrate a model of fire hazards for the entire mountain range. Pre-fire TM, and a digital elevation model produced a set of co-registered images. Training statistics were developed from 30 polygons associated with the previously mapped fire severity. Fuzzy classifications of potential burn patterns were produced from these images. Observed field data values were displayed over the hazard imagery to indicate the effectiveness of the model. Areas that burned without suppression during maximum fire severity are predicted best. Areas with widely spaced trees and grassy understory appear to be misrepresented, perhaps as a consequence of inaccuracies in the initial fire mapping.
NASA Astrophysics Data System (ADS)
Theodorakou, Chrysoula; Farquharson, Michael J.
2009-08-01
The motivation behind this study is to assess whether angular dispersive x-ray diffraction (ADXRD) data, processed using multivariate analysis techniques, can be used for classifying secondary colorectal liver cancer tissue and normal surrounding liver tissue in human liver biopsy samples. The ADXRD profiles from a total of 60 samples of normal liver tissue and colorectal liver metastases were measured using a synchrotron radiation source. The data were analysed for 56 samples using nonlinear peak-fitting software. Four peaks were fitted to all of the ADXRD profiles, and the amplitude, area, amplitude and area ratios for three of the four peaks were calculated and used for the statistical and multivariate analysis. The statistical analysis showed that there are significant differences between all the peak-fitting parameters and ratios between the normal and the diseased tissue groups. The technique of soft independent modelling of class analogy (SIMCA) was used to classify normal liver tissue and colorectal liver metastases resulting in 67% of the normal tissue samples and 60% of the secondary colorectal liver tissue samples being classified correctly. This study has shown that the ADXRD data of normal and secondary colorectal liver cancer are statistically different and x-ray diffraction data analysed using multivariate analysis have the potential to be used as a method of tissue classification.
Michael L. Hoppus; Andrew J. Lister
2002-01-01
A Landsat TM classification method (iterative guided spectral class rejection) produced a forest cover map of southern West Virginia that provided the stratification layer for producing estimates of timberland area from Forest Service FIA ground plots using a stratified sampling technique. These same high quality and expensive FIA ground plots provided ground reference...
Inventory and mapping of flood inundation using interactive digital image analysis techniques
Rohde, Wayne G.; Nelson, Charles A.; Taranik, J.V.
1979-01-01
LANDSAT digital data and color infra-red photographs were used in a multiphase sampling scheme to estimate the area of agricultural land affected by a flood. The LANDSAT data were classified with a maximum likelihood algorithm. Stratification of the LANDSAT data, prior to classification, greatly reduced misclassification errors. The classification results were used to prepare a map overlay showing the areal extent of flooding. These data also provided statistics required to estimate sample size in a two phase sampling scheme, and provided quick, accurate estimates of areas flooded for the first phase. The measurements made in the second phase, based on ground data and photo-interpretation, were used with two phase sampling statistics to estimate the area of agricultural land affected by flooding These results show that LANDSAT digital data can be used to prepare map overlays showing the extent of flooding on agricultural land and, with two phase sampling procedures, can provide acreage estimates with sampling errors of about 5 percent. This procedure provides a technique for rapidly assessing the areal extent of flood conditions on agricultural land and would provide a basis for designing a sampling framework to estimate the impact of flooding on crop production.
Comparing Pattern Recognition Feature Sets for Sorting Triples in the FIRST Database
NASA Astrophysics Data System (ADS)
Proctor, D. D.
2006-07-01
Pattern recognition techniques have been used with increasing success for coping with the tremendous amounts of data being generated by automated surveys. Usually this process involves construction of training sets, the typical examples of data with known classifications. Given a feature set, along with the training set, statistical methods can be employed to generate a classifier. The classifier is then applied to process the remaining data. Feature set selection, however, is still an issue. This paper presents techniques developed for accommodating data for which a substantive portion of the training set cannot be classified unambiguously, a typical case for low-resolution data. Significance tests on the sort-ordered, sample-size-normalized vote distribution of an ensemble of decision trees is introduced as a method of evaluating relative quality of feature sets. The technique is applied to comparing feature sets for sorting a particular radio galaxy morphology, bent-doubles, from the Faint Images of the Radio Sky at Twenty Centimeters (FIRST) database. Also examined are alternative functional forms for feature sets. Associated standard deviations provide the means to evaluate the effect of the number of folds, the number of classifiers per fold, and the sample size on the resulting classifications. The technique also may be applied to situations for which, although accurate classifications are available, the feature set is clearly inadequate, but is desired nonetheless to make the best of available information.
NASA Technical Reports Server (NTRS)
Hoffer, R. M.; Dean, M. E.; Knowlton, D. J.; Latty, R. S.
1982-01-01
Kershaw County, South Carolina was selected as the study site for analyzing simulated thematic mapper MSS data and dual-polarized X-band synthetic aperture radar (SAR) data. The impact of the improved spatial and spectral characteristics of the LANDSAT D thematic mapper data on computer aided analysis for forest cover type mapping was examined as well as the value of synthetic aperture radar data for differentiating forest and other cover types. The utility of pattern recognition techniques for analyzing SAR data was assessed. Topics covered include: (1) collection and of TMS and reference data; (2) reformatting, geometric and radiometric rectification, and spatial resolution degradation of TMS data; (3) development of training statistics and test data sets; (4) evaluation of different numbers and combinations of wavelength bands on classification performance; (5) comparison among three classification algorithms; and (6) the effectiveness of the principal component transformation in data analysis. The collection, digitization, reformatting, and geometric adjustment of SAR data are also discussed. Image interpretation results and classification results are presented.
Tu, Li-ping; Chen, Jing-bo; Hu, Xiao-juan; Zhang, Zhi-feng
2016-01-01
Background and Goal. The application of digital image processing techniques and machine learning methods in tongue image classification in Traditional Chinese Medicine (TCM) has been widely studied nowadays. However, it is difficult for the outcomes to generalize because of lack of color reproducibility and image standardization. Our study aims at the exploration of tongue colors classification with a standardized tongue image acquisition process and color correction. Methods. Three traditional Chinese medical experts are chosen to identify the selected tongue pictures taken by the TDA-1 tongue imaging device in TIFF format through ICC profile correction. Then we compare the mean value of L * a * b * of different tongue colors and evaluate the effect of the tongue color classification by machine learning methods. Results. The L * a * b * values of the five tongue colors are statistically different. Random forest method has a better performance than SVM in classification. SMOTE algorithm can increase classification accuracy by solving the imbalance of the varied color samples. Conclusions. At the premise of standardized tongue acquisition and color reproduction, preliminary objectification of tongue color classification in Traditional Chinese Medicine (TCM) is feasible. PMID:28050555
Qi, Zhen; Tu, Li-Ping; Chen, Jing-Bo; Hu, Xiao-Juan; Xu, Jia-Tuo; Zhang, Zhi-Feng
2016-01-01
Background and Goal . The application of digital image processing techniques and machine learning methods in tongue image classification in Traditional Chinese Medicine (TCM) has been widely studied nowadays. However, it is difficult for the outcomes to generalize because of lack of color reproducibility and image standardization. Our study aims at the exploration of tongue colors classification with a standardized tongue image acquisition process and color correction. Methods . Three traditional Chinese medical experts are chosen to identify the selected tongue pictures taken by the TDA-1 tongue imaging device in TIFF format through ICC profile correction. Then we compare the mean value of L * a * b * of different tongue colors and evaluate the effect of the tongue color classification by machine learning methods. Results . The L * a * b * values of the five tongue colors are statistically different. Random forest method has a better performance than SVM in classification. SMOTE algorithm can increase classification accuracy by solving the imbalance of the varied color samples. Conclusions . At the premise of standardized tongue acquisition and color reproduction, preliminary objectification of tongue color classification in Traditional Chinese Medicine (TCM) is feasible.
IRIS COLOUR CLASSIFICATION SCALES – THEN AND NOW
Grigore, Mariana; Avram, Alina
2015-01-01
Eye colour is one of the most obvious phenotypic traits of an individual. Since the first documented classification scale developed in 1843, there have been numerous attempts to classify the iris colour. In the past centuries, iris colour classification scales has had various colour categories and mostly relied on comparison of an individual’s eye with painted glass eyes. Once photography techniques were refined, standard iris photographs replaced painted eyes, but this did not solve the problem of painted/ printed colour variability in time. Early clinical scales were easy to use, but lacked objectivity and were not standardised or statistically tested for reproducibility. The era of automated iris colour classification systems came with the technological development. Spectrophotometry, digital analysis of high-resolution iris images, hyper spectral analysis of the human real iris and the dedicated iris colour analysis software, all accomplished an objective, accurate iris colour classification, but are quite expensive and limited in use to research environment. Iris colour classification systems evolved continuously due to their use in a wide range of studies, especially in the fields of anthropology, epidemiology and genetics. Despite the wide range of the existing scales, up until present there has been no generally accepted iris colour classification scale. PMID:27373112
IRIS COLOUR CLASSIFICATION SCALES--THEN AND NOW.
Grigore, Mariana; Avram, Alina
2015-01-01
Eye colour is one of the most obvious phenotypic traits of an individual. Since the first documented classification scale developed in 1843, there have been numerous attempts to classify the iris colour. In the past centuries, iris colour classification scales has had various colour categories and mostly relied on comparison of an individual's eye with painted glass eyes. Once photography techniques were refined, standard iris photographs replaced painted eyes, but this did not solve the problem of painted/ printed colour variability in time. Early clinical scales were easy to use, but lacked objectivity and were not standardised or statistically tested for reproducibility. The era of automated iris colour classification systems came with the technological development. Spectrophotometry, digital analysis of high-resolution iris images, hyper spectral analysis of the human real iris and the dedicated iris colour analysis software, all accomplished an objective, accurate iris colour classification, but are quite expensive and limited in use to research environment. Iris colour classification systems evolved continuously due to their use in a wide range of studies, especially in the fields of anthropology, epidemiology and genetics. Despite the wide range of the existing scales, up until present there has been no generally accepted iris colour classification scale.
Nahid, Abdullah-Al; Mehrabi, Mohamad Ali; Kong, Yinan
2018-01-01
Breast Cancer is a serious threat and one of the largest causes of death of women throughout the world. The identification of cancer largely depends on digital biomedical photography analysis such as histopathological images by doctors and physicians. Analyzing histopathological images is a nontrivial task, and decisions from investigation of these kinds of images always require specialised knowledge. However, Computer Aided Diagnosis (CAD) techniques can help the doctor make more reliable decisions. The state-of-the-art Deep Neural Network (DNN) has been recently introduced for biomedical image analysis. Normally each image contains structural and statistical information. This paper classifies a set of biomedical breast cancer images (BreakHis dataset) using novel DNN techniques guided by structural and statistical information derived from the images. Specifically a Convolutional Neural Network (CNN), a Long-Short-Term-Memory (LSTM), and a combination of CNN and LSTM are proposed for breast cancer image classification. Softmax and Support Vector Machine (SVM) layers have been used for the decision-making stage after extracting features utilising the proposed novel DNN models. In this experiment the best Accuracy value of 91.00% is achieved on the 200x dataset, the best Precision value 96.00% is achieved on the 40x dataset, and the best F -Measure value is achieved on both the 40x and 100x datasets.
Yang, X; Le, D; Zhang, Y L; Liang, L Z; Yang, G; Hu, W J
2016-10-18
To explore a crown form classification method for upper central incisor which is more objective and scientific than traditional classification method based on the standardized photography technique. To analyze the relationship between crown form of upper central incisors and papilla filling in periodontally healthy Chinese Han-nationality youth. In the study, 180 periodontally healthy Chinese youth ( 75 males, and 105 females ) aged 20-30 (24.3±4.5) years were included. With the standardized upper central incisor photography technique, pictures of 360 upper central incisors were obtained. Each tooth was classified as triangular, ovoid or square by 13 experienced specialist majors in prothodontics independently and the final classification result was decided by most evaluators in order to ensure objectivity. The standardized digital photo was also used to evaluate the gingival papilla filling situation. The papilla filling result was recorded as present or absent according to naked eye observation. The papilla filling rates of different crown forms were analyzed. Statistical analyses were performed with SPSS 19.0. The proportions of triangle, ovoid and square forms of upper central incisor in Chinese Han-nationality youth were 31.4% (113/360), 37.2% (134/360) and 31.4% (113/360 ), respectively, and no statistical difference was found between the males and females. Average κ value between each two evaluators was 0.381. Average κ value was raised up to 0.563 when compared with the final classification result. In the study, 24 upper central incisors without contact were excluded, and the papilla filling rates of triangle, ovoid and square crown were 56.4% (62/110), 69.6% (87/125), 76.2% (77/101) separately. The papilla filling rate of square form was higher (P=0.007). The proportion of clinical crown form of upper central incisor in Chinese Han-nationality youth is obtained. Compared with triangle form, square form is found to favor a gingival papilla that fills the interproximal embrasure space. The consistency of the present classification method for upper central incisor is not satisfying, which indicates that a new classification method, more scientific and objective than the present one, is to be found.
Superordinate Shape Classification Using Natural Shape Statistics
ERIC Educational Resources Information Center
Wilder, John; Feldman, Jacob; Singh, Manish
2011-01-01
This paper investigates the classification of shapes into broad natural categories such as "animal" or "leaf". We asked whether such coarse classifications can be achieved by a simple statistical classification of the shape skeleton. We surveyed databases of natural shapes, extracting shape skeletons and tabulating their…
Evaluation of normalization methods for cDNA microarray data by k-NN classification
Wu, Wei; Xing, Eric P; Myers, Connie; Mian, I Saira; Bissell, Mina J
2005-01-01
Background Non-biological factors give rise to unwanted variations in cDNA microarray data. There are many normalization methods designed to remove such variations. However, to date there have been few published systematic evaluations of these techniques for removing variations arising from dye biases in the context of downstream, higher-order analytical tasks such as classification. Results Ten location normalization methods that adjust spatial- and/or intensity-dependent dye biases, and three scale methods that adjust scale differences were applied, individually and in combination, to five distinct, published, cancer biology-related cDNA microarray data sets. Leave-one-out cross-validation (LOOCV) classification error was employed as the quantitative end-point for assessing the effectiveness of a normalization method. In particular, a known classifier, k-nearest neighbor (k-NN), was estimated from data normalized using a given technique, and the LOOCV error rate of the ensuing model was computed. We found that k-NN classifiers are sensitive to dye biases in the data. Using NONRM and GMEDIAN as baseline methods, our results show that single-bias-removal techniques which remove either spatial-dependent dye bias (referred later as spatial effect) or intensity-dependent dye bias (referred later as intensity effect) moderately reduce LOOCV classification errors; whereas double-bias-removal techniques which remove both spatial- and intensity effect reduce LOOCV classification errors even further. Of the 41 different strategies examined, three two-step processes, IGLOESS-SLFILTERW7, ISTSPLINE-SLLOESS and IGLOESS-SLLOESS, all of which removed intensity effect globally and spatial effect locally, appear to reduce LOOCV classification errors most consistently and effectively across all data sets. We also found that the investigated scale normalization methods do not reduce LOOCV classification error. Conclusion Using LOOCV error of k-NNs as the evaluation criterion, three double-bias-removal normalization strategies, IGLOESS-SLFILTERW7, ISTSPLINE-SLLOESS and IGLOESS-SLLOESS, outperform other strategies for removing spatial effect, intensity effect and scale differences from cDNA microarray data. The apparent sensitivity of k-NN LOOCV classification error to dye biases suggests that this criterion provides an informative measure for evaluating normalization methods. All the computational tools used in this study were implemented using the R language for statistical computing and graphics. PMID:16045803
Evaluation of normalization methods for cDNA microarray data by k-NN classification.
Wu, Wei; Xing, Eric P; Myers, Connie; Mian, I Saira; Bissell, Mina J
2005-07-26
Non-biological factors give rise to unwanted variations in cDNA microarray data. There are many normalization methods designed to remove such variations. However, to date there have been few published systematic evaluations of these techniques for removing variations arising from dye biases in the context of downstream, higher-order analytical tasks such as classification. Ten location normalization methods that adjust spatial- and/or intensity-dependent dye biases, and three scale methods that adjust scale differences were applied, individually and in combination, to five distinct, published, cancer biology-related cDNA microarray data sets. Leave-one-out cross-validation (LOOCV) classification error was employed as the quantitative end-point for assessing the effectiveness of a normalization method. In particular, a known classifier, k-nearest neighbor (k-NN), was estimated from data normalized using a given technique, and the LOOCV error rate of the ensuing model was computed. We found that k-NN classifiers are sensitive to dye biases in the data. Using NONRM and GMEDIAN as baseline methods, our results show that single-bias-removal techniques which remove either spatial-dependent dye bias (referred later as spatial effect) or intensity-dependent dye bias (referred later as intensity effect) moderately reduce LOOCV classification errors; whereas double-bias-removal techniques which remove both spatial- and intensity effect reduce LOOCV classification errors even further. Of the 41 different strategies examined, three two-step processes, IGLOESS-SLFILTERW7, ISTSPLINE-SLLOESS and IGLOESS-SLLOESS, all of which removed intensity effect globally and spatial effect locally, appear to reduce LOOCV classification errors most consistently and effectively across all data sets. We also found that the investigated scale normalization methods do not reduce LOOCV classification error. Using LOOCV error of k-NNs as the evaluation criterion, three double-bias-removal normalization strategies, IGLOESS-SLFILTERW7, ISTSPLINE-SLLOESS and IGLOESS-SLLOESS, outperform other strategies for removing spatial effect, intensity effect and scale differences from cDNA microarray data. The apparent sensitivity of k-NN LOOCV classification error to dye biases suggests that this criterion provides an informative measure for evaluating normalization methods. All the computational tools used in this study were implemented using the R language for statistical computing and graphics.
Multiclass Bayes error estimation by a feature space sampling technique
NASA Technical Reports Server (NTRS)
Mobasseri, B. G.; Mcgillem, C. D.
1979-01-01
A general Gaussian M-class N-feature classification problem is defined. An algorithm is developed that requires the class statistics as its only input and computes the minimum probability of error through use of a combined analytical and numerical integration over a sequence simplifying transformations of the feature space. The results are compared with those obtained by conventional techniques applied to a 2-class 4-feature discrimination problem with results previously reported and 4-class 4-feature multispectral scanner Landsat data classified by training and testing of the available data.
Mapping forest vegetation with ERTS-1 MSS data and automatic data processing techniques
NASA Technical Reports Server (NTRS)
Messmore, J.; Copeland, G. E.; Levy, G. F.
1975-01-01
This study was undertaken with the intent of elucidating the forest mapping capabilities of ERTS-1 MSS data when analyzed with the aid of LARS' automatic data processing techniques. The site for this investigation was the Great Dismal Swamp, a 210,000 acre wilderness area located on the Middle Atlantic coastal plain. Due to inadequate ground truth information on the distribution of vegetation within the swamp, an unsupervised classification scheme was utilized. Initially pictureprints, resembling low resolution photographs, were generated in each of the four ERTS-1 channels. Data found within rectangular training fields was then clustered into 13 spectral groups and defined statistically. Using a maximum likelihood classification scheme, the unknown data points were subsequently classified into one of the designated training classes. Training field data was classified with a high degree of accuracy (greater than 95%), and progress is being made towards identifying the mapped spectral classes.
Mapping forest vegetation with ERTS-1 MSS data and automatic data processing techniques
NASA Technical Reports Server (NTRS)
Messmore, J.; Copeland, G. E.; Levy, G. F.
1975-01-01
This study was undertaken with the intent of elucidating the forest mapping capabilities of ERTS-1 MSS data when analyzed with the aid of LARS' automatic data processing techniques. The site for this investigation was the Great Dismal Swamp, a 210,000 acre wilderness area located on the Middle Atlantic coastal plain. Due to inadequate ground truth information on the distribution of vegetation within the swamp, an unsupervised classification scheme was utilized. Initially pictureprints, resembling low resolution photographs, were generated in each of the four ERTS-1 channels. Data found within rectangular training fields was then clustered into 13 spectral groups and defined statistically. Using a maximum likelihood classification scheme, the unknown data points were subsequently classified into one of the designated training classes. Training field data was classified with a high degree of accuracy (greater than 95 percent), and progress is being made towards identifying the mapped spectral classes.
Landenburger, L.; Lawrence, R.L.; Podruzny, S.; Schwartz, C.C.
2008-01-01
Moderate resolution satellite imagery traditionally has been thought to be inadequate for mapping vegetation at the species level. This has made comprehensive mapping of regional distributions of sensitive species, such as whitebark pine, either impractical or extremely time consuming. We sought to determine whether using a combination of moderate resolution satellite imagery (Landsat Enhanced Thematic Mapper Plus), extensive stand data collected by land management agencies for other purposes, and modern statistical classification techniques (boosted classification trees) could result in successful mapping of whitebark pine. Overall classification accuracies exceeded 90%, with similar individual class accuracies. Accuracies on a localized basis varied based on elevation. Accuracies also varied among administrative units, although we were not able to determine whether these differences related to inherent spatial variations or differences in the quality of available reference data.
A Comparison of Two-Group Classification Methods
ERIC Educational Resources Information Center
Holden, Jocelyn E.; Finch, W. Holmes; Kelley, Ken
2011-01-01
The statistical classification of "N" individuals into "G" mutually exclusive groups when the actual group membership is unknown is common in the social and behavioral sciences. The results of such classification methods often have important consequences. Among the most common methods of statistical classification are linear discriminant analysis,…
2010-12-01
recommend [13]. 2.2 Commercial content scanning technology In [16], a companion piece to this paper, Magar completed a thorough review of commercially...defense Canada Chef de file au Canada en matiere de science et de technologie pour la defense et la securite nationale DEFENCE ~~EFENSE (_.,./ www.drdc-rddc.gc.ca
A Visual mining based framework for classification accuracy estimation
NASA Astrophysics Data System (ADS)
Arun, Pattathal Vijayakumar
2013-12-01
Classification techniques have been widely used in different remote sensing applications and correct classification of mixed pixels is a tedious task. Traditional approaches adopt various statistical parameters, however does not facilitate effective visualisation. Data mining tools are proving very helpful in the classification process. We propose a visual mining based frame work for accuracy assessment of classification techniques using open source tools such as WEKA and PREFUSE. These tools in integration can provide an efficient approach for getting information about improvements in the classification accuracy and helps in refining training data set. We have illustrated framework for investigating the effects of various resampling methods on classification accuracy and found that bilinear (BL) is best suited for preserving radiometric characteristics. We have also investigated the optimal number of folds required for effective analysis of LISS-IV images. Techniki klasyfikacji są szeroko wykorzystywane w różnych aplikacjach teledetekcyjnych, w których poprawna klasyfikacja pikseli stanowi poważne wyzwanie. Podejście tradycyjne wykorzystujące różnego rodzaju parametry statystyczne nie zapewnia efektywnej wizualizacji. Wielce obiecujące wydaje się zastosowanie do klasyfikacji narzędzi do eksploracji danych. W artykule zaproponowano podejście bazujące na wizualnej analizie eksploracyjnej, wykorzystujące takie narzędzia typu open source jak WEKA i PREFUSE. Wymienione narzędzia ułatwiają korektę pół treningowych i efektywnie wspomagają poprawę dokładności klasyfikacji. Działanie metody sprawdzono wykorzystując wpływ różnych metod resampling na zachowanie dokładności radiometrycznej i uzyskując najlepsze wyniki dla metody bilinearnej (BL).
Evaluation of change detection techniques for monitoring coastal zone environments
NASA Technical Reports Server (NTRS)
Weismiller, R. A. (Principal Investigator); Kristof, S. J.; Scholz, D. K.; Anuta, P. E.; Momin, S. M.
1977-01-01
The author has identified the following significant results. Four change detection techniques were designed and implemented for evaluation: (1) post classification comparison change detection, (2) delta data change detection, (3) spectral/temporal change classification, and (4) layered spectral/temporal change classification. The post classification comparison technique reliably identified areas of change and was used as the standard for qualitatively evaluating the other three techniques. The layered spectral/temporal change classification and the delta data change detection results generally agreed with the post classification comparison technique results; however, many small areas of change were not identified. Major discrepancies existed between the post classification comparison and spectral/temporal change detection results.
Assessment of statistical methods used in library-based approaches to microbial source tracking.
Ritter, Kerry J; Carruthers, Ethan; Carson, C Andrew; Ellender, R D; Harwood, Valerie J; Kingsley, Kyle; Nakatsu, Cindy; Sadowsky, Michael; Shear, Brian; West, Brian; Whitlock, John E; Wiggins, Bruce A; Wilbur, Jayson D
2003-12-01
Several commonly used statistical methods for fingerprint identification in microbial source tracking (MST) were examined to assess the effectiveness of pattern-matching algorithms to correctly identify sources. Although numerous statistical methods have been employed for source identification, no widespread consensus exists as to which is most appropriate. A large-scale comparison of several MST methods, using identical fecal sources, presented a unique opportunity to assess the utility of several popular statistical methods. These included discriminant analysis, nearest neighbour analysis, maximum similarity and average similarity, along with several measures of distance or similarity. Threshold criteria for excluding uncertain or poorly matched isolates from final analysis were also examined for their ability to reduce false positives and increase prediction success. Six independent libraries used in the study were constructed from indicator bacteria isolated from fecal materials of humans, seagulls, cows and dogs. Three of these libraries were constructed using the rep-PCR technique and three relied on antibiotic resistance analysis (ARA). Five of the libraries were constructed using Escherichia coli and one using Enterococcus spp. (ARA). Overall, the outcome of this study suggests a high degree of variability across statistical methods. Despite large differences in correct classification rates among the statistical methods, no single statistical approach emerged as superior. Thresholds failed to consistently increase rates of correct classification and improvement was often associated with substantial effective sample size reduction. Recommendations are provided to aid in selecting appropriate analyses for these types of data.
Hassan, Ahnaf Rashik; Bhuiyan, Mohammed Imamul Hassan
2017-03-01
Automatic sleep staging is essential for alleviating the burden of the physicians of analyzing a large volume of data by visual inspection. It is also a precondition for making an automated sleep monitoring system feasible. Further, computerized sleep scoring will expedite large-scale data analysis in sleep research. Nevertheless, most of the existing works on sleep staging are either multichannel or multiple physiological signal based which are uncomfortable for the user and hinder the feasibility of an in-home sleep monitoring device. So, a successful and reliable computer-assisted sleep staging scheme is yet to emerge. In this work, we propose a single channel EEG based algorithm for computerized sleep scoring. In the proposed algorithm, we decompose EEG signal segments using Ensemble Empirical Mode Decomposition (EEMD) and extract various statistical moment based features. The effectiveness of EEMD and statistical features are investigated. Statistical analysis is performed for feature selection. A newly proposed classification technique, namely - Random under sampling boosting (RUSBoost) is introduced for sleep stage classification. This is the first implementation of EEMD in conjunction with RUSBoost to the best of the authors' knowledge. The proposed feature extraction scheme's performance is investigated for various choices of classification models. The algorithmic performance of our scheme is evaluated against contemporary works in the literature. The performance of the proposed method is comparable or better than that of the state-of-the-art ones. The proposed algorithm gives 88.07%, 83.49%, 92.66%, 94.23%, and 98.15% for 6-state to 2-state classification of sleep stages on Sleep-EDF database. Our experimental outcomes reveal that RUSBoost outperforms other classification models for the feature extraction framework presented in this work. Besides, the algorithm proposed in this work demonstrates high detection accuracy for the sleep states S1 and REM. Statistical moment based features in the EEMD domain distinguish the sleep states successfully and efficaciously. The automated sleep scoring scheme propounded herein can eradicate the onus of the clinicians, contribute to the device implementation of a sleep monitoring system, and benefit sleep research. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Challenges in the automated classification of variable stars in large databases
NASA Astrophysics Data System (ADS)
Graham, Matthew; Drake, Andrew; Djorgovski, S. G.; Mahabal, Ashish; Donalek, Ciro
2017-09-01
With ever-increasing numbers of astrophysical transient surveys, new facilities and archives of astronomical time series, time domain astronomy is emerging as a mainstream discipline. However, the sheer volume of data alone - hundreds of observations for hundreds of millions of sources - necessitates advanced statistical and machine learning methodologies for scientific discovery: characterization, categorization, and classification. Whilst these techniques are slowly entering the astronomer's toolkit, their application to astronomical problems is not without its issues. In this paper, we will review some of the challenges posed by trying to identify variable stars in large data collections, including appropriate feature representations, dealing with uncertainties, establishing ground truths, and simple discrete classes.
NASA Astrophysics Data System (ADS)
Hasaan, Zahra
2016-07-01
Remote sensing is very useful for the production of land use and land cover statistics which can be beneficial to determine the distribution of land uses. Using remote sensing techniques to develop land use classification mapping is a convenient and detailed way to improve the selection of areas designed to agricultural, urban and/or industrial areas of a region. In Islamabad city and surrounding the land use has been changing, every day new developments (urban, industrial, commercial and agricultural) are emerging leading to decrease in vegetation cover. The purpose of this work was to develop the land use of Islamabad and its surrounding area that is an important natural resource. For this work the eCognition Developer 64 computer software was used to develop a land use classification using SPOT 5 image of year 2012. For image processing object-based classification technique was used and important land use features i.e. Vegetation cover, barren land, impervious surface, built up area and water bodies were extracted on the basis of object variation and compared the results with the CDA Master Plan. The great increase was found in built-up area and impervious surface area. On the other hand vegetation cover and barren area followed a declining trend. Accuracy assessment of classification yielded 92% accuracies of the final land cover land use maps. In addition these improved land cover/land use maps which are produced by remote sensing technique of class definition, meet the growing need of legend standardization.
Task Oriented Evaluation of Module Extraction Techniques
NASA Astrophysics Data System (ADS)
Palmisano, Ignazio; Tamma, Valentina; Payne, Terry; Doran, Paul
Ontology Modularization techniques identify coherent and often reusable regions within an ontology. The ability to identify such modules, thus potentially reducing the size or complexity of an ontology for a given task or set of concepts is increasingly important in the Semantic Web as domain ontologies increase in terms of size, complexity and expressivity. To date, many techniques have been developed, but evaluation of the results of these techniques is sketchy and somewhat ad hoc. Theoretical properties of modularization algorithms have only been studied in a small number of cases. This paper presents an empirical analysis of a number of modularization techniques, and the modules they identify over a number of diverse ontologies, by utilizing objective, task-oriented measures to evaluate the fitness of the modules for a number of statistical classification problems.
B. Baker; Henry Diaz; William Hargrove; Forrest Hoffman
2010-01-01
Changes in climate as projected by state-of-the-art climate models are likely to result in novel combinations of climate and topo-edaphic factors that will have substantial impacts on the distribution and persistence of natural vegetation and animal species. We have used multivariate techniques to quantify some of these changes; the...
Artificial intelligence in the diagnosis of low back pain.
Mann, N H; Brown, M D
1991-04-01
Computerized methods are used to recognize the characteristics of patient pain drawings. Artificial neural network (ANN) models are compared with expert predictions and traditional statistical classification methods when placing the pain drawings of low back pain patients into one of five clinically significant categories. A discussion is undertaken outlining the differences in these classifiers and the potential benefits of the ANN model as an artificial intelligence technique.
Inverse imaging of the breast with a material classification technique.
Manry, C W; Broschat, S L
1998-03-01
In recent publications [Chew et al., IEEE Trans. Blomed. Eng. BME-9, 218-225 (1990); Borup et al., Ultrason. Imaging 14, 69-85 (1992)] the inverse imaging problem has been solved by means of a two-step iterative method. In this paper, a third step is introduced for ultrasound imaging of the breast. In this step, which is based on statistical pattern recognition, classification of tissue types and a priori knowledge of the anatomy of the breast are integrated into the iterative method. Use of this material classification technique results in more rapid convergence to the inverse solution--approximately 40% fewer iterations are required--as well as greater accuracy. In addition, tumors are detected early in the reconstruction process. Results for reconstructions of a simple two-dimensional model of the human breast are presented. These reconstructions are extremely accurate when system noise and variations in tissue parameters are not too great. However, for the algorithm used, degradation of the reconstructions and divergence from the correct solution occur when system noise and variations in parameters exceed threshold values. Even in this case, however, tumors are still identified within a few iterations.
Predictive analysis effectiveness in determining the epidemic disease infected area
NASA Astrophysics Data System (ADS)
Ibrahim, Najihah; Akhir, Nur Shazwani Md.; Hassan, Fadratul Hafinaz
2017-10-01
Epidemic disease outbreak had caused nowadays community to raise their great concern over the infectious disease controlling, preventing and handling methods to diminish the disease dissemination percentage and infected area. Backpropagation method was used for the counter measure and prediction analysis of the epidemic disease. The predictive analysis based on the backpropagation method can be determine via machine learning process that promotes the artificial intelligent in pattern recognition, statistics and features selection. This computational learning process will be integrated with data mining by measuring the score output as the classifier to the given set of input features through classification technique. The classification technique is the features selection of the disease dissemination factors that likely have strong interconnection between each other in causing infectious disease outbreaks. The predictive analysis of epidemic disease in determining the infected area was introduced in this preliminary study by using the backpropagation method in observation of other's findings. This study will classify the epidemic disease dissemination factors as the features for weight adjustment on the prediction of epidemic disease outbreaks. Through this preliminary study, the predictive analysis is proven to be effective method in determining the epidemic disease infected area by minimizing the error value through the features classification.
Test of spectral/spatial classifier
NASA Technical Reports Server (NTRS)
Landgrebe, D. A. (Principal Investigator); Kast, J. L.; Davis, B. J.
1977-01-01
The author has identified the following significant results. The supervised ECHO processor (which utilizes class statistics for object identification) successfully exploits the redundancy of states characteristic of sampled imagery of ground scenes to achieve better classification accuracy, reduce the number of classifications required, and reduce the variability of classification results. The nonsupervised ECHO processor (which identifies objects without the benefit of class statistics) successfully reduces the number of classifications required and the variability of the classification results.
Comparative Analysis of Document level Text Classification Algorithms using R
NASA Astrophysics Data System (ADS)
Syamala, Maganti; Nalini, N. J., Dr; Maguluri, Lakshamanaphaneendra; Ragupathy, R., Dr.
2017-08-01
From the past few decades there has been tremendous volumes of data available in Internet either in structured or unstructured form. Also, there is an exponential growth of information on Internet, so there is an emergent need of text classifiers. Text mining is an interdisciplinary field which draws attention on information retrieval, data mining, machine learning, statistics and computational linguistics. And to handle this situation, a wide range of supervised learning algorithms has been introduced. Among all these K-Nearest Neighbor(KNN) is efficient and simplest classifier in text classification family. But KNN suffers from imbalanced class distribution and noisy term features. So, to cope up with this challenge we use document based centroid dimensionality reduction(CentroidDR) using R Programming. By combining these two text classification techniques, KNN and Centroid classifiers, we propose a scalable and effective flat classifier, called MCenKNN which works well substantially better than CenKNN.
A new classification scheme of plastic wastes based upon recycling labels
DOE Office of Scientific and Technical Information (OSTI.GOV)
Özkan, Kemal, E-mail: kozkan@ogu.edu.tr; Ergin, Semih, E-mail: sergin@ogu.edu.tr; Işık, Şahin, E-mail: sahini@ogu.edu.tr
Highlights: • PET, HPDE or PP types of plastics are considered. • An automated classification of plastic bottles based on the feature extraction and classification methods is performed. • The decision mechanism consists of PCA, Kernel PCA, FLDA, SVD and Laplacian Eigenmaps methods. • SVM is selected to achieve the classification task and majority voting technique is used. - Abstract: Since recycling of materials is widely assumed to be environmentally and economically beneficial, reliable sorting and processing of waste packaging materials such as plastics is very important for recycling with high efficiency. An automated system that can quickly categorize thesemore » materials is certainly needed for obtaining maximum classification while maintaining high throughput. In this paper, first of all, the photographs of the plastic bottles have been taken and several preprocessing steps were carried out. The first preprocessing step is to extract the plastic area of a bottle from the background. Then, the morphological image operations are implemented. These operations are edge detection, noise removal, hole removing, image enhancement, and image segmentation. These morphological operations can be generally defined in terms of the combinations of erosion and dilation. The effect of bottle color as well as label are eliminated using these operations. Secondly, the pixel-wise intensity values of the plastic bottle images have been used together with the most popular subspace and statistical feature extraction methods to construct the feature vectors in this study. Only three types of plastics are considered due to higher existence ratio of them than the other plastic types in the world. The decision mechanism consists of five different feature extraction methods including as Principal Component Analysis (PCA), Kernel PCA (KPCA), Fisher’s Linear Discriminant Analysis (FLDA), Singular Value Decomposition (SVD) and Laplacian Eigenmaps (LEMAP) and uses a simple experimental setup with a camera and homogenous backlighting. Due to the giving global solution for a classification problem, Support Vector Machine (SVM) is selected to achieve the classification task and majority voting technique is used as the decision mechanism. This technique equally weights each classification result and assigns the given plastic object to the class that the most classification results agree on. The proposed classification scheme provides high accuracy rate, and also it is able to run in real-time applications. It can automatically classify the plastic bottle types with approximately 90% recognition accuracy. Besides this, the proposed methodology yields approximately 96% classification rate for the separation of PET or non-PET plastic types. It also gives 92% accuracy for the categorization of non-PET plastic types into HPDE or PP.« less
Automatic Fault Characterization via Abnormality-Enhanced Classification
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bronevetsky, G; Laguna, I; de Supinski, B R
Enterprise and high-performance computing systems are growing extremely large and complex, employing hundreds to hundreds of thousands of processors and software/hardware stacks built by many people across many organizations. As the growing scale of these machines increases the frequency of faults, system complexity makes these faults difficult to detect and to diagnose. Current system management techniques, which focus primarily on efficient data access and query mechanisms, require system administrators to examine the behavior of various system services manually. Growing system complexity is making this manual process unmanageable: administrators require more effective management tools that can detect faults and help tomore » identify their root causes. System administrators need timely notification when a fault is manifested that includes the type of fault, the time period in which it occurred and the processor on which it originated. Statistical modeling approaches can accurately characterize system behavior. However, the complex effects of system faults make these tools difficult to apply effectively. This paper investigates the application of classification and clustering algorithms to fault detection and characterization. We show experimentally that naively applying these methods achieves poor accuracy. Further, we design novel techniques that combine classification algorithms with information on the abnormality of application behavior to improve detection and characterization accuracy. Our experiments demonstrate that these techniques can detect and characterize faults with 65% accuracy, compared to just 5% accuracy for naive approaches.« less
Gao, Chao; Sun, Hanbo; Wang, Tuo; Tang, Ming; Bohnen, Nicolaas I; Müller, Martijn L T M; Herman, Talia; Giladi, Nir; Kalinin, Alexandr; Spino, Cathie; Dauer, William; Hausdorff, Jeffrey M; Dinov, Ivo D
2018-05-08
In this study, we apply a multidisciplinary approach to investigate falls in PD patients using clinical, demographic and neuroimaging data from two independent initiatives (University of Michigan and Tel Aviv Sourasky Medical Center). Using machine learning techniques, we construct predictive models to discriminate fallers and non-fallers. Through controlled feature selection, we identified the most salient predictors of patient falls including gait speed, Hoehn and Yahr stage, postural instability and gait difficulty-related measurements. The model-based and model-free analytical methods we employed included logistic regression, random forests, support vector machines, and XGboost. The reliability of the forecasts was assessed by internal statistical (5-fold) cross validation as well as by external out-of-bag validation. Four specific challenges were addressed in the study: Challenge 1, develop a protocol for harmonizing and aggregating complex, multisource, and multi-site Parkinson's disease data; Challenge 2, identify salient predictive features associated with specific clinical traits, e.g., patient falls; Challenge 3, forecast patient falls and evaluate the classification performance; and Challenge 4, predict tremor dominance (TD) vs. posture instability and gait difficulty (PIGD). Our findings suggest that, compared to other approaches, model-free machine learning based techniques provide a more reliable clinical outcome forecasting of falls in Parkinson's patients, for example, with a classification accuracy of about 70-80%.
Statistical strategy for anisotropic adventitia modelling in IVUS.
Gil, Debora; Hernández, Aura; Rodriguez, Oriol; Mauri, Josepa; Radeva, Petia
2006-06-01
Vessel plaque assessment by analysis of intravascular ultrasound sequences is a useful tool for cardiac disease diagnosis and intervention. Manual detection of luminal (inner) and media-adventitia (external) vessel borders is the main activity of physicians in the process of lumen narrowing (plaque) quantification. Difficult definition of vessel border descriptors, as well as, shades, artifacts, and blurred signal response due to ultrasound physical properties trouble automated adventitia segmentation. In order to efficiently approach such a complex problem, we propose blending advanced anisotropic filtering operators and statistical classification techniques into a vessel border modelling strategy. Our systematic statistical analysis shows that the reported adventitia detection achieves an accuracy in the range of interobserver variability regardless of plaque nature, vessel geometry, and incomplete vessel borders.
NASA Astrophysics Data System (ADS)
McNabb, R. W.; Womble, J. N.; Prakash, A.; Gens, R.; Ver Hoef, J.
2014-12-01
Tidewater glaciers play an important role in many landscape and ecosystem processes in fjords, terminating in the sea and calving icebergs and discharging meltwater directly into the ocean. Tidewater glaciers provide floating ice for use as habitat for harbor seals (Phoca vitulina richardii) for resting, pupping, nursing, molting, and avoiding predators. Tidewater glaciers are found in high concentrations in Southeast and Southcentral Alaska; currently, many of these glaciers are retreating or have stabilized in a retracted state, raising questions about the future availability of ice in these fjords as habitat for seals. Our primary objective is to investigate the relationship between harbor seal distribution and ice availability at an advancing tidewater glacier in Johns Hopkins Inlet, Glacier Bay National Park, Alaska. To this end, we use a combination of visible and infrared aerial photographs, object-based image analysis (OBIA), and statistical modeling techniques. We have developed a workflow to automate the processing of the imagery and the classification of the fjordscape (e.g., individual icebergs, brash ice, and open water), providing quantitative information on ice coverage as well as properties not typically found in traditional pixel-based classification techniques, such as block angularity and seal density across the fjord. Reflectance variation in the red channel of the optical images has proven to be the most important first-level criterion to separate open water from floating ice. This first-level criterion works well in areas without dense brash ice, but tends to misclassify dense brash ice as single icebergs. Isolating these large misclassified regions and applying a higher reflectance threshold as a second-level criterion helps to isolate individual ice blocks surrounded by dense brash ice. We present classification results from surveys taken during June and August, 2007-2013, as well as preliminary results from statistical modeling of the spatio-temporal distribution of seals and ice. OBIA is a powerful method of habitat classification and offers an effective approach to compare the spatio-temporal distribution and availability of glacial ice habitats for harbor seals in tidewater glacial fjords.
NASA Astrophysics Data System (ADS)
Irvine, John M.; Ghadar, Nastaran; Duncan, Steve; Floyd, David; O'Dowd, David; Lin, Kristie; Chang, Tom
2017-03-01
Quantitative biomarkers for assessing the presence, severity, and progression of age-related macular degeneration (AMD) would benefit research, diagnosis, and treatment. This paper explores development of quantitative biomarkers derived from OCT imagery of the retina. OCT images for approximately 75 patients with Wet AMD, Dry AMD, and no AMD (healthy eyes) were analyzed to identify image features indicative of the patients' conditions. OCT image features provide a statistical characterization of the retina. Healthy eyes exhibit a layered structure, whereas chaotic patterns indicate the deterioration associated with AMD. Our approach uses wavelet and Frangi filtering, combined with statistical features that do not rely on image segmentation, to assess patient conditions. Classification analysis indicates clear separability of Wet AMD from other conditions, including Dry AMD and healthy retinas. The probability of correct classification of was 95.7%, as determined from cross validation. Similar classification analysis predicts the response of Wet AMD patients to treatment, as measured by the Best Corrected Visual Acuity (BCVA). A statistical model predicts BCVA from the imagery features with R2 = 0.846. Initial analysis of OCT imagery indicates that imagery-derived features can provide useful biomarkers for characterization and quantification of AMD: Accurate assessment of Wet AMD compared to other conditions; image-based prediction of outcome for Wet AMD treatment; and features derived from the OCT imagery accurately predict BCVA; unlike many methods in the literature, our techniques do not rely on segmentation of the OCT image. Next steps include larger scale testing and validation.
Classification images for localization performance in ramp-spectrum noise.
Abbey, Craig K; Samuelson, Frank W; Zeng, Rongping; Boone, John M; Eckstein, Miguel P; Myers, Kyle
2018-05-01
This study investigates forced localization of targets in simulated images with statistical properties similar to trans-axial sections of x-ray computed tomography (CT) volumes. A total of 24 imaging conditions are considered, comprising two target sizes, three levels of background variability, and four levels of frequency apodization. The goal of the study is to better understand how human observers perform forced-localization tasks in images with CT-like statistical properties. The transfer properties of CT systems are modeled by a shift-invariant transfer function in addition to apodization filters that modulate high spatial frequencies. The images contain noise that is the combination of a ramp-spectrum component, simulating the effect of acquisition noise in CT, and a power-law component, simulating the effect of normal anatomy in the background, which are modulated by the apodization filter as well. Observer performance is characterized using two psychophysical techniques: efficiency analysis and classification image analysis. Observer efficiency quantifies how much diagnostic information is being used by observers to perform a task, and classification images show how that information is being accessed in the form of a perceptual filter. Psychophysical studies from five subjects form the basis of the results. Observer efficiency ranges from 29% to 77% across the different conditions. The lowest efficiency is observed in conditions with uniform backgrounds, where significant effects of apodization are found. The classification images, estimated using smoothing windows, suggest that human observers use center-surround filters to perform the task, and these are subjected to a number of subsequent analyses. When implemented as a scanning linear filter, the classification images appear to capture most of the observer variability in efficiency (r 2 = 0.86). The frequency spectra of the classification images show that frequency weights generally appear bandpass in nature, with peak frequency and bandwidth that vary with statistical properties of the images. In these experiments, the classification images appear to capture important features of human-observer performance. Frequency apodization only appears to have a significant effect on performance in the absence of anatomical variability, where the observers appear to underweight low spatial frequencies that have relatively little noise. Frequency weights derived from the classification images generally have a bandpass structure, with adaptation to different conditions seen in the peak frequency and bandwidth. The classification image spectra show relatively modest changes in response to different levels of apodization, with some evidence that observers are attempting to rebalance the apodized spectrum presented to them. © 2018 American Association of Physicists in Medicine.
Dorph, Annalie
2017-01-01
Defining an acoustic repertoire is essential to understanding vocal signalling and communicative interactions within a species. Currently, quantitative and statistical definition is lacking for the vocalisations of many dasyurids, an important group of small to medium-sized marsupials from Australasia that includes the eastern quoll (Dasyurus viverrinus), a species of conservation concern. Beyond generating a better understanding of this species' social interactions, determining an acoustic repertoire will further improve detection rates and inference of vocalisations gathered by automated bioacoustic recorders. Hence, this study investigated eastern quoll vocalisations using objective signal processing techniques to quantitatively analyse spectrograms recorded from 15 different individuals. Recordings were collected in conjunction with observations of the behaviours associated with each vocalisation to develop an acoustic-based behavioural repertoire for the species. Analysis of recordings produced a putative classification of five vocalisation types: Bark, Growl, Hiss, Cp-cp, and Chuck. These were most frequently observed during agonistic encounters between conspecifics, most likely as a graded sequence from Hisses occurring in a warning context through to Growls and finally Barks being given prior to, or during, physical confrontations between individuals. Quantitative and statistical methods were used to objectively establish the accuracy of these five putative call types. A multinomial logistic regression indicated a 97.27% correlation with the perceptual classification, demonstrating support for the five different vocalisation types. This putative classification was further supported by hierarchical cluster analysis and silhouette information that determined the optimal number of clusters to be five. Minor disparity between the objective and perceptual classifications was potentially the result of gradation between vocalisations, or subtle differences present within vocalisations not discernible to the human ear. The implication of these different vocalisations and their given context is discussed in relation to the ecology of the species and the potential application of passive acoustic monitoring techniques. PMID:28686679
NASA Astrophysics Data System (ADS)
Berlin, Cynthia Jane
1998-12-01
This research addresses the identification of the areal extent of the intertidal wetlands of Willapa Bay, Washington, and the evaluation of the potential for exotic Spartina alterniflora (smooth cordgrass) expansion in the bay using a spatial geographic approach. It is hoped that the results will address not only the management needs of the study area but provide a research design that may be applied to studies of other coastal wetlands. Four satellite images, three Landsat Multi-Spectral (MSS) and one Thematic Mapper (TM), are used to derive a map showing areas of water, low, middle and high intertidal, and upland. Two multi-date remote sensing mapping techniques are assessed: a supervised classification using density-slicing and an unsupervised classification using an ISODATA algorithm. Statistical comparisons are made between the resultant derived maps and the U.S.G.S. topographic maps for the Willapa Bay area. The potential for Spartina expansion in the bay is assessed using a sigmoidal (logistic) growth model and a spatial modelling procedure for four possible growth scenarios: without management controls (Business-as-Usual), with moderate management controls (e.g. harvesting to eliminate seed setting), under a hypothetical increase in the growth rate that may reflect favorable environmental changes, and under a hypothetical decrease in the growth rate that may reflect aggressive management controls. Comparisons for the statistics of the two mapping techniques suggest that although the unsupervised classification method performed satisfactorily, the supervised classification (density-slicing) method provided more satisfactory results. Results from the modelling of potential Spartina expansion suggest that Spartina expansion will proceed rapidly for the Business-as-Usual and hypothetical increase in the growth rate scenario, and at a slower rate for the elimination of seed setting and hypothetical decrease in the growth rate scenarios, until all potential habitat is filled.
Steganalysis of recorded speech
NASA Astrophysics Data System (ADS)
Johnson, Micah K.; Lyu, Siwei; Farid, Hany
2005-03-01
Digital audio provides a suitable cover for high-throughput steganography. At 16 bits per sample and sampled at a rate of 44,100 Hz, digital audio has the bit-rate to support large messages. In addition, audio is often transient and unpredictable, facilitating the hiding of messages. Using an approach similar to our universal image steganalysis, we show that hidden messages alter the underlying statistics of audio signals. Our statistical model begins by building a linear basis that captures certain statistical properties of audio signals. A low-dimensional statistical feature vector is extracted from this basis representation and used by a non-linear support vector machine for classification. We show the efficacy of this approach on LSB embedding and Hide4PGP. While no explicit assumptions about the content of the audio are made, our technique has been developed and tested on high-quality recorded speech.
Federal Register 2010, 2011, 2012, 2013, 2014
2013-02-07
... DEPARTMENT OF HEALTH AND HUMAN SERVICES Centers for Disease Control and Prevention National Center for Health Statistics (NCHS), Classifications and Public Health Data Standards Staff, Announces the..., Medical Systems Administrator, Classifications and Public Health Data Standards Staff, NCHS, 3311 Toledo...
49 CFR 1248.100 - Commodity classification designated.
Code of Federal Regulations, 2011 CFR
2011-10-01
... 49 Transportation 9 2011-10-01 2011-10-01 false Commodity classification designated. 1248.100... STATISTICS Commodity Code § 1248.100 Commodity classification designated. Commencing with reports for the..., reports of commodity statistics required to be made to the Board, shall be based on the commodity codes...
NASA Astrophysics Data System (ADS)
Kanniyappan, Udayakumar; Gnanatheepaminstein, Einstein; Prakasarao, Aruna; Dornadula, Koteeswaran; Singaravelu, Ganesan
2017-02-01
Cancer is one of the most common human threats around the world and diagnosis based on optical spectroscopy especially fluorescence technique has been established as the standard approach among scientist to explore the biochemical and morphological changes in tissues. In this regard, the present work aims to extract spectral signatures of the various fluorophores present in oral tissues using parallel factor analysis (PARAFAC). Subsequently, the statistical analysis also to be performed to show its diagnostic potential in distinguishing malignant, premalignant from normal oral tissues. Hence, the present study may lead to the possible and/or alternative tool for oral cancer diagnosis.
Random forests for classification in ecology
Cutler, D.R.; Edwards, T.C.; Beard, K.H.; Cutler, A.; Hess, K.T.; Gibson, J.; Lawler, J.J.
2007-01-01
Classification procedures are some of the most widely used statistical methods in ecology. Random forests (RF) is a new and powerful statistical classifier that is well established in other disciplines but is relatively unknown in ecology. Advantages of RF compared to other statistical classifiers include (1) very high classification accuracy; (2) a novel method of determining variable importance; (3) ability to model complex interactions among predictor variables; (4) flexibility to perform several types of statistical data analysis, including regression, classification, survival analysis, and unsupervised learning; and (5) an algorithm for imputing missing values. We compared the accuracies of RF and four other commonly used statistical classifiers using data on invasive plant species presence in Lava Beds National Monument, California, USA, rare lichen species presence in the Pacific Northwest, USA, and nest sites for cavity nesting birds in the Uinta Mountains, Utah, USA. We observed high classification accuracy in all applications as measured by cross-validation and, in the case of the lichen data, by independent test data, when comparing RF to other common classification methods. We also observed that the variables that RF identified as most important for classifying invasive plant species coincided with expectations based on the literature. ?? 2007 by the Ecological Society of America.
DOE Office of Scientific and Technical Information (OSTI.GOV)
The plpdfa software is a product of an LDRD project at LLNL entitked "Adaptive Sampling for Very High Throughput Data Streams" (tracking number 11-ERD-035). This software was developed by a graduate student summer intern, Chris Challis, who worked under project PI Dan Merl furing the summer of 2011. The software the source code is implementing is a statistical analysis technique for clustering and classification of text-valued data. The method had been previously published by the PI in the open literature.
Classification Techniques for Multivariate Data Analysis.
1980-03-28
analysis among biologists, botanists, and ecologists, while some social scientists may refer "typology". Other frequently encountered terms are pattern...the determinantal equation: lB -XW 0 (42) 49 The solutions X. are the eigenvalues of the matrix W-1 B 1 as in discriminant analysis. There are t non...Statistical Package for Social Sciences (SPSS) (14) subprogram FACTOR was used for the principal components analysis. It is designed both for the factor
Reboiro-Jato, Miguel; Arrais, Joel P; Oliveira, José Luis; Fdez-Riverola, Florentino
2014-01-30
The diagnosis and prognosis of several diseases can be shortened through the use of different large-scale genome experiments. In this context, microarrays can generate expression data for a huge set of genes. However, to obtain solid statistical evidence from the resulting data, it is necessary to train and to validate many classification techniques in order to find the best discriminative method. This is a time-consuming process that normally depends on intricate statistical tools. geneCommittee is a web-based interactive tool for routinely evaluating the discriminative classification power of custom hypothesis in the form of biologically relevant gene sets. While the user can work with different gene set collections and several microarray data files to configure specific classification experiments, the tool is able to run several tests in parallel. Provided with a straightforward and intuitive interface, geneCommittee is able to render valuable information for diagnostic analyses and clinical management decisions based on systematically evaluating custom hypothesis over different data sets using complementary classifiers, a key aspect in clinical research. geneCommittee allows the enrichment of microarrays raw data with gene functional annotations, producing integrated datasets that simplify the construction of better discriminative hypothesis, and allows the creation of a set of complementary classifiers. The trained committees can then be used for clinical research and diagnosis. Full documentation including common use cases and guided analysis workflows is freely available at http://sing.ei.uvigo.es/GC/.
Federal Register 2010, 2011, 2012, 2013, 2014
2010-07-08
... DEPARTMENT OF HEALTH AND HUMAN SERVICES Centers for Disease Control and Prevention National Center for Health Statistics (NCHS), Classifications and Public Health Data Standards Staff, Announces the... Prevention, Classifications and Public Health Data Standards, 3311 Toledo Road, Room 2337, Hyattsville, MD...
Federal Register 2010, 2011, 2012, 2013, 2014
2013-08-28
... DEPARTMENT OF HEALTH AND HUMAN SERVICES Centers for Disease Control and Prevention National Center for Health Statistics (NCHS), Classifications and Public Health Data Standards Staff, Announces the... Administrator, Classifications and Public Health Data Standards Staff, NCHS, 3311 Toledo Road, Room 2337...
Descriptive Statistics of the Genome: Phylogenetic Classification of Viruses.
Hernandez, Troy; Yang, Jie
2016-10-01
The typical process for classifying and submitting a newly sequenced virus to the NCBI database involves two steps. First, a BLAST search is performed to determine likely family candidates. That is followed by checking the candidate families with the pairwise sequence alignment tool for similar species. The submitter's judgment is then used to determine the most likely species classification. The aim of this article is to show that this process can be automated into a fast, accurate, one-step process using the proposed alignment-free method and properly implemented machine learning techniques. We present a new family of alignment-free vectorizations of the genome, the generalized vector, that maintains the speed of existing alignment-free methods while outperforming all available methods. This new alignment-free vectorization uses the frequency of genomic words (k-mers), as is done in the composition vector, and incorporates descriptive statistics of those k-mers' positional information, as inspired by the natural vector. We analyze five different characterizations of genome similarity using k-nearest neighbor classification and evaluate these on two collections of viruses totaling over 10,000 viruses. We show that our proposed method performs better than, or as well as, other methods at every level of the phylogenetic hierarchy. The data and R code is available upon request.
A Comparative Study of Land Cover Classification by Using Multispectral and Texture Data
Qadri, Salman; Khan, Dost Muhammad; Ahmad, Farooq; Qadri, Syed Furqan; Babar, Masroor Ellahi; Shahid, Muhammad; Ul-Rehman, Muzammil; Razzaq, Abdul; Shah Muhammad, Syed; Fahad, Muhammad; Ahmad, Sarfraz; Pervez, Muhammad Tariq; Naveed, Nasir; Aslam, Naeem; Jamil, Mutiullah; Rehmani, Ejaz Ahmad; Ahmad, Nazir; Akhtar Khan, Naeem
2016-01-01
The main objective of this study is to find out the importance of machine vision approach for the classification of five types of land cover data such as bare land, desert rangeland, green pasture, fertile cultivated land, and Sutlej river land. A novel spectra-statistical framework is designed to classify the subjective land cover data types accurately. Multispectral data of these land covers were acquired by using a handheld device named multispectral radiometer in the form of five spectral bands (blue, green, red, near infrared, and shortwave infrared) while texture data were acquired with a digital camera by the transformation of acquired images into 229 texture features for each image. The most discriminant 30 features of each image were obtained by integrating the three statistical features selection techniques such as Fisher, Probability of Error plus Average Correlation, and Mutual Information (F + PA + MI). Selected texture data clustering was verified by nonlinear discriminant analysis while linear discriminant analysis approach was applied for multispectral data. For classification, the texture and multispectral data were deployed to artificial neural network (ANN: n-class). By implementing a cross validation method (80-20), we received an accuracy of 91.332% for texture data and 96.40% for multispectral data, respectively. PMID:27376088
NASA Astrophysics Data System (ADS)
Liu, Chanjuan; van Netten, Jaap J.; Klein, Marvin E.; van Baal, Jeff G.; Bus, Sicco A.; van der Heijden, Ferdi
2013-12-01
Early detection of (pre-)signs of ulceration on a diabetic foot is valuable for clinical practice. Hyperspectral imaging is a promising technique for detection and classification of such (pre-)signs. However, the number of the spectral bands should be limited to avoid overfitting, which is critical for pixel classification with hyperspectral image data. The goal was to design a detector/classifier based on spectral imaging (SI) with a small number of optical bandpass filters. The performance and stability of the design were also investigated. The selection of the bandpass filters boils down to a feature selection problem. A dataset was built, containing reflectance spectra of 227 skin spots from 64 patients, measured with a spectrometer. Each skin spot was annotated manually by clinicians as "healthy" or a specific (pre-)sign of ulceration. Statistical analysis on the data set showed the number of required filters is between 3 and 7, depending on additional constraints on the filter set. The stability analysis revealed that shot noise was the most critical factor affecting the classification performance. It indicated that this impact could be avoided in future SI systems with a camera sensor whose saturation level is higher than 106, or by postimage processing.
Validity and reliability of the Paprosky acetabular defect classification.
Yu, Raymond; Hofstaetter, Jochen G; Sullivan, Thomas; Costi, Kerry; Howie, Donald W; Solomon, Lucian B
2013-07-01
The Paprosky acetabular defect classification is widely used but has not been appropriately validated. Reliability of the Paprosky system has not been evaluated in combination with standardized techniques of measurement and scoring. This study evaluated the reliability, teachability, and validity of the Paprosky acetabular defect classification. Preoperative radiographs from a random sample of 83 patients undergoing 85 acetabular revisions were classified by four observers, and their classifications were compared with quantitative intraoperative measurements. Teachability of the classification scheme was tested by dividing the four observers into two groups. The observers in Group 1 underwent three teaching sessions; those in Group 2 underwent one session and the influence of teaching on the accuracy of their classifications was ascertained. Radiographic evaluation showed statistically significant relationships with intraoperative measurements of anterior, medial, and superior acetabular defect sizes. Interobserver reliability improved substantially after teaching and did not improve without it. The weighted kappa coefficient went from 0.56 at Occasion 1 to 0.79 after three teaching sessions in Group 1 observers, and from 0.49 to 0.65 after one teaching session in Group 2 observers. The Paprosky system is valid and shows good reliability when combined with standardized definitions of radiographic landmarks and a structured analysis. Level II, diagnostic study. See the Guidelines for Authors for a complete description of levels of evidence.
Classifying GRB 170817A/GW170817 in a Fermi duration-hardness plane
NASA Astrophysics Data System (ADS)
Horváth, I.; Tóth, B. G.; Hakkila, J.; Tóth, L. V.; Balázs, L. G.; Rácz, I. I.; Pintér, S.; Bagoly, Z.
2018-03-01
GRB 170817A, associated with the LIGO-Virgo GW170817 neutron-star merger event, lacks the short duration and hard spectrum of a Short gamma-ray burst (GRB) expected from long-standing classification models. Correctly identifying the class to which this burst belongs requires comparison with other GRBs detected by the Fermi GBM. The aim of our analysis is to classify Fermi GRBs and to test whether or not GRB 170817A belongs—as suggested—to the Short GRB class. The Fermi GBM catalog provides a large database with many measured variables that can be used to explore gamma-ray burst classification. We use statistical techniques to look for clustering in a sample of 1298 gamma-ray bursts described by duration and spectral hardness. Classification of the detected bursts shows that GRB 170817A most likely belongs to the Intermediate, rather than the Short GRB class. We discuss this result in light of theoretical neutron-star merger models and existing GRB classification schemes. It appears that GRB classification schemes may not yet be linked to appropriate theoretical models, and that theoretical models may not yet adequately account for known GRB class properties. We conclude that GRB 170817A may not fit into a simple phenomenological classification scheme.
High-Reproducibility and High-Accuracy Method for Automated Topic Classification
NASA Astrophysics Data System (ADS)
Lancichinetti, Andrea; Sirer, M. Irmak; Wang, Jane X.; Acuna, Daniel; Körding, Konrad; Amaral, Luís A. Nunes
2015-01-01
Much of human knowledge sits in large databases of unstructured text. Leveraging this knowledge requires algorithms that extract and record metadata on unstructured text documents. Assigning topics to documents will enable intelligent searching, statistical characterization, and meaningful classification. Latent Dirichlet allocation (LDA) is the state of the art in topic modeling. Here, we perform a systematic theoretical and numerical analysis that demonstrates that current optimization techniques for LDA often yield results that are not accurate in inferring the most suitable model parameters. Adapting approaches from community detection in networks, we propose a new algorithm that displays high reproducibility and high accuracy and also has high computational efficiency. We apply it to a large set of documents in the English Wikipedia and reveal its hierarchical structure.
Crack classification in concrete beams using AE parameters
NASA Astrophysics Data System (ADS)
Bahari, N. A. A. S.; Shahidan, S.; Abdullah, S. R.; Ali, N.; Zuki, S. S. Mohd; Ibrahim, M. H. W.; Rahim, M. A.
2017-11-01
The acoustic emission (AE) technique is an effective tool for the evaluation of crack growth. The aim of this study is to evaluate crack classification in reinforced concrete beams using statistical analysis. AE has been applied for the early monitoring of reinforced concrete structures using AE parameters such as average frequency, rise time, amplitude counts and duration. This experimental study focuses on the utilisation of this method in evaluating reinforced concrete beams. Beam specimens measuring 150 mm × 250 mm × 1200 mm were tested using a three-point load flexural test using Universal Testing Machines (UTM) together with an AE monitoring system. The results indicated that RA value can be used to determine the relationship between tensile crack and shear movement in reinforced concrete beams.
HOS network-based classification of power quality events via regression algorithms
NASA Astrophysics Data System (ADS)
Palomares Salas, José Carlos; González de la Rosa, Juan José; Sierra Fernández, José María; Pérez, Agustín Agüera
2015-12-01
This work compares seven regression algorithms implemented in artificial neural networks (ANNs) supported by 14 power-quality features, which are based in higher-order statistics. Combining time and frequency domain estimators to deal with non-stationary measurement sequences, the final goal of the system is the implementation in the future smart grid to guarantee compatibility between all equipment connected. The principal results are based in spectral kurtosis measurements, which easily adapt to the impulsive nature of the power quality events. These results verify that the proposed technique is capable of offering interesting results for power quality (PQ) disturbance classification. The best results are obtained using radial basis networks, generalized regression, and multilayer perceptron, mainly due to the non-linear nature of data.
Rosso, Osvaldo A; Ospina, Raydonal; Frery, Alejandro C
2016-01-01
We present a new approach for handwritten signature classification and verification based on descriptors stemming from time causal information theory. The proposal uses the Shannon entropy, the statistical complexity, and the Fisher information evaluated over the Bandt and Pompe symbolization of the horizontal and vertical coordinates of signatures. These six features are easy and fast to compute, and they are the input to an One-Class Support Vector Machine classifier. The results are better than state-of-the-art online techniques that employ higher-dimensional feature spaces which often require specialized software and hardware. We assess the consistency of our proposal with respect to the size of the training sample, and we also use it to classify the signatures into meaningful groups.
Classification of wheat: Badhwar profile similarity technique
NASA Technical Reports Server (NTRS)
Austin, W. W.
1980-01-01
The Badwar profile similarity classification technique used successfully for classification of corn was applied to spring wheat classifications. The software programs and the procedures used to generate full-scene classifications are presented, and numerical results of the acreage estimations are given.
Exploring Genome-Wide Expression Profiles Using Machine Learning Techniques.
Kebschull, Moritz; Papapanou, Panos N
2017-01-01
Although contemporary high-throughput -omics methods produce high-dimensional data, the resulting wealth of information is difficult to assess using traditional statistical procedures. Machine learning methods facilitate the detection of additional patterns, beyond the mere identification of lists of features that differ between groups.Here, we demonstrate the utility of (1) supervised classification algorithms in class validation, and (2) unsupervised clustering in class discovery. We use data from our previous work that described the transcriptional profiles of gingival tissue samples obtained from subjects suffering from chronic or aggressive periodontitis (1) to test whether the two diagnostic entities were also characterized by differences on the molecular level, and (2) to search for a novel, alternative classification of periodontitis based on the tissue transcriptomes.Using machine learning technology, we provide evidence for diagnostic imprecision in the currently accepted classification of periodontitis, and demonstrate that a novel, alternative classification based on differences in gingival tissue transcriptomes is feasible. The outlined procedures allow for the unbiased interrogation of high-dimensional datasets for characteristic underlying classes, and are applicable to a broad range of -omics data.
Using complex networks for text classification: Discriminating informative and imaginative documents
NASA Astrophysics Data System (ADS)
de Arruda, Henrique F.; Costa, Luciano da F.; Amancio, Diego R.
2016-01-01
Statistical methods have been widely employed in recent years to grasp many language properties. The application of such techniques have allowed an improvement of several linguistic applications, such as machine translation and document classification. In the latter, many approaches have emphasised the semantical content of texts, as is the case of bag-of-word language models. These approaches have certainly yielded reasonable performance. However, some potential features such as the structural organization of texts have been used only in a few studies. In this context, we probe how features derived from textual structure analysis can be effectively employed in a classification task. More specifically, we performed a supervised classification aiming at discriminating informative from imaginative documents. Using a networked model that describes the local topological/dynamical properties of function words, we achieved an accuracy rate of up to 95%, which is much higher than similar networked approaches. A systematic analysis of feature relevance revealed that symmetry and accessibility measurements are among the most prominent network measurements. Our results suggest that these measurements could be used in related language applications, as they play a complementary role in characterising texts.
Speaker gender identification based on majority vote classifiers
NASA Astrophysics Data System (ADS)
Mezghani, Eya; Charfeddine, Maha; Nicolas, Henri; Ben Amar, Chokri
2017-03-01
Speaker gender identification is considered among the most important tools in several multimedia applications namely in automatic speech recognition, interactive voice response systems and audio browsing systems. Gender identification systems performance is closely linked to the selected feature set and the employed classification model. Typical techniques are based on selecting the best performing classification method or searching optimum tuning of one classifier parameters through experimentation. In this paper, we consider a relevant and rich set of features involving pitch, MFCCs as well as other temporal and frequency-domain descriptors. Five classification models including decision tree, discriminant analysis, nave Bayes, support vector machine and k-nearest neighbor was experimented. The three best perming classifiers among the five ones will contribute by majority voting between their scores. Experimentations were performed on three different datasets spoken in three languages: English, German and Arabic in order to validate language independency of the proposed scheme. Results confirm that the presented system has reached a satisfying accuracy rate and promising classification performance thanks to the discriminating abilities and diversity of the used features combined with mid-level statistics.
G0-WISHART Distribution Based Classification from Polarimetric SAR Images
NASA Astrophysics Data System (ADS)
Hu, G. C.; Zhao, Q. H.
2017-09-01
Enormous scientific and technical developments have been carried out to further improve the remote sensing for decades, particularly Polarimetric Synthetic Aperture Radar(PolSAR) technique, so classification method based on PolSAR images has getted much more attention from scholars and related department around the world. The multilook polarmetric G0-Wishart model is a more flexible model which describe homogeneous, heterogeneous and extremely heterogeneous regions in the image. Moreover, the polarmetric G0-Wishart distribution dose not include the modified Bessel function of the second kind. It is a kind of simple statistical distribution model with less parameter. To prove its feasibility, a process of classification has been tested with the full-polarized Synthetic Aperture Radar (SAR) image by the method. First, apply multilook polarimetric SAR data process and speckle filter to reduce speckle influence for classification result. Initially classify the image into sixteen classes by H/A/α decomposition. Using the ICM algorithm to classify feature based on the G0-Wshart distance. Qualitative and quantitative results show that the proposed method can classify polaimetric SAR data effectively and efficiently.
Multisensor multiresolution data fusion for improvement in classification
NASA Astrophysics Data System (ADS)
Rubeena, V.; Tiwari, K. C.
2016-04-01
The rapid advancements in technology have facilitated easy availability of multisensor and multiresolution remote sensing data. Multisensor, multiresolution data contain complementary information and fusion of such data may result in application dependent significant information which may otherwise remain trapped within. The present work aims at improving classification by fusing features of coarse resolution hyperspectral (1 m) LWIR and fine resolution (20 cm) RGB data. The classification map comprises of eight classes. The class names are Road, Trees, Red Roof, Grey Roof, Concrete Roof, Vegetation, bare Soil and Unclassified. The processing methodology for hyperspectral LWIR data comprises of dimensionality reduction, resampling of data by interpolation technique for registering the two images at same spatial resolution, extraction of the spatial features to improve classification accuracy. In the case of fine resolution RGB data, the vegetation index is computed for classifying the vegetation class and the morphological building index is calculated for buildings. In order to extract the textural features, occurrence and co-occurence statistics is considered and the features will be extracted from all the three bands of RGB data. After extracting the features, Support Vector Machine (SVMs) has been used for training and classification. To increase the classification accuracy, post processing steps like removal of any spurious noise such as salt and pepper noise is done which is followed by filtering process by majority voting within the objects for better object classification.
Russell, Thomas A; Mir, Hassan R; Stoneback, Jason; Cohen, Jose; Downs, Brandon
2008-07-01
To determine our rate of malalignment in proximal femoral shaft fractures treated with intramedullary (IM) nails, with and without the use of a minimally invasive nail insertion technique (MINIT). Retrospective study. Level 1 trauma center. Between July 1, 2003, and June 31, 2005, 100 consecutive proximal femoral shaft fractures (97 patients) were treated with IM nails. The average age of the 56 men and 41 women was 43.5 years (range, 17 to 96 years). There were 92 closed fractures and 8 open fractures. Fractures were classified according to the Russell-Taylor classification (69 type 1A, 11 type 1B, 3 type 2A, 17 type 2B). All patients underwent antegrade IM nailing using a fracture table in the supine (83) or lateral (17) position. A total of 72 entry portals were trochanteric, and 28 were piriformis. Seventy-seven percent of the femurs were opened with MINIT, a technique that uses a percutaneous cannulated channel reamer over a guide pin as opposed to the standard method of Kuntscher, which employs a femoral awl. Nails were locked proximally using standard locking in 37 fractures, and recon mode in 63. Fracture reduction was examined on immediate postoperative films to determine angulation in the coronal and sagittal planes. Criteria for acceptable reduction were less than 5 degrees angulation in any plane. In addition, surgical position, entry portal, mechanism of injury, Russell-Taylor classification, OTA classification, open or closed fracture, open or closed reduction, and type of implant used were analyzed for significance. The frequency of malalignment was 10% for the entire group of patients. Malalignment occurred in 26% of fractures treated without the use of the MINIT and in 5.2% when the MINIT was used (P < 0.01). There was no statistically significant difference between the different Russell-Taylor fracture types, although there was a trend towards more malalignment in type 2A and 2B fractures (P = 0.06). None of the other factors studied had a statistically significant effect on malalignment. A whole-model test of the factors that were surgeon-controlled (use of the MINIT, surgical position, open or closed reduction, type of implant used, and entry portal) found that only use of the MINIT had a statistically significant effect on malalignment (P < 0.01). The results indicate that use of the minimally invasive nail insertion technique (MINIT) significantly decreases the occurrence of malalignment in proximal femoral shaft fractures.
[Surgical correction of cleft palate].
Kimura, F T; Pavia Noble, A; Soriano Padilla, F; Soto Miranda, A; Medellín Rodríguez, A
1990-04-01
This study presents a statistical review of corrective surgery for cleft palate, based on cases treated at the maxillo-facial surgery units of the Pediatrics Hospital of the Centro Médico Nacional and at Centro Médico La Raza of the National Institute of Social Security of Mexico, over a five-year period. Interdisciplinary management as performed at the Cleft-Palate Clinic, in an integrated approach involving specialists in maxillo-facial surgery, maxillar orthopedics, genetics, social work and mental hygiene, pursuing to reestablish the stomatological and psychological functions of children afflicted by cleft palate, is amply described. The frequency and classification of the various techniques practiced in that service are described, as well as surgical statistics for 188 patients, which include a total of 256 palate surgeries performed from March 1984 to March 1989, applying three different techniques and proposing a combination of them in a single surgical time, in order to avoid complementary surgery.
Sayago, Ana; González-Domínguez, Raúl; Beltrán, Rafael; Fernández-Recamales, Ángeles
2018-09-30
This work explores the potential of multi-element fingerprinting in combination with advanced data mining strategies to assess the geographical origin of extra virgin olive oil samples. For this purpose, the concentrations of 55 elements were determined in 125 oil samples from multiple Spanish geographic areas. Several unsupervised and supervised multivariate statistical techniques were used to build classification models and investigate the relationship between mineral composition of olive oils and their provenance. Results showed that Spanish extra virgin olive oils exhibit characteristic element profiles, which can be differentiated on the basis of their origin in accordance with three geographical areas: Atlantic coast (Huelva province), Mediterranean coast and inland regions. Furthermore, statistical modelling yielded high sensitivity and specificity, principally when random forest and support vector machines were employed, thus demonstrating the utility of these techniques in food traceability and authenticity research. Copyright © 2018 Elsevier Ltd. All rights reserved.
Variance estimates and confidence intervals for the Kappa measure of classification accuracy
M. A. Kalkhan; R. M. Reich; R. L. Czaplewski
1997-01-01
The Kappa statistic is frequently used to characterize the results of an accuracy assessment used to evaluate land use and land cover classifications obtained by remotely sensed data. This statistic allows comparisons of alternative sampling designs, classification algorithms, photo-interpreters, and so forth. In order to make these comparisons, it is...
Automatic Classification of Medical Text: The Influence of Publication Form1
Cole, William G.; Michael, Patricia A.; Stewart, James G.; Blois, Marsden S.
1988-01-01
Previous research has shown that within the domain of medical journal abstracts the statistical distribution of words is neither random nor uniform, but is highly characteristic. Many words are used mainly or solely by one medical specialty or when writing about one particular level of description. Due to this regularity of usage, automatic classification within journal abstracts has proved quite successful. The present research asks two further questions. It investigates whether this statistical regularity and automatic classification success can also be achieved in medical textbook chapters. It then goes on to see whether the statistical distribution found in textbooks is sufficiently similar to that found in abstracts to permit accurate classification of abstracts based solely on previous knowledge of textbooks. 14 textbook chapters and 45 MEDLINE abstracts were submitted to an automatic classification program that had been trained only on chapters drawn from a standard textbook series. Statistical analysis of the properties of abstracts vs. chapters revealed important differences in word use. Automatic classification performance was good for chapters, but poor for abstracts.
A land classification protocol for pollinator ecology research: An urbanization case study.
Samuelson, Ash E; Leadbeater, Ellouise
2018-06-01
Land-use change is one of the most important drivers of widespread declines in pollinator populations. Comprehensive quantitative methods for land classification are critical to understanding these effects, but co-option of existing human-focussed land classifications is often inappropriate for pollinator research. Here, we present a flexible GIS-based land classification protocol for pollinator research using a bottom-up approach driven by reference to pollinator ecology, with urbanization as a case study. Our multistep method involves manually generating land cover maps at multiple biologically relevant radii surrounding study sites using GIS, with a focus on identifying land cover types that have a specific relevance to pollinators. This is followed by a three-step refinement process using statistical tools: (i) definition of land-use categories, (ii) principal components analysis on the categories, and (iii) cluster analysis to generate a categorical land-use variable for use in subsequent analysis. Model selection is then used to determine the appropriate spatial scale for analysis. We demonstrate an application of our protocol using a case study of 38 sites across a gradient of urbanization in South-East England. In our case study, the land classification generated a categorical land-use variable at each of four radii based on the clustering of sites with different degrees of urbanization, open land, and flower-rich habitat. Studies of land-use effects on pollinators have historically employed a wide array of land classification techniques from descriptive and qualitative to complex and quantitative. We suggest that land-use studies in pollinator ecology should broadly adopt GIS-based multistep land classification techniques to enable robust analysis and aid comparative research. Our protocol offers a customizable approach that combines specific relevance to pollinator research with the potential for application to a wide range of ecological questions, including agroecological studies of pest control.
Decision rules for unbiased inventory estimates
NASA Technical Reports Server (NTRS)
Argentiero, P. D.; Koch, D.
1979-01-01
An efficient and accurate procedure for estimating inventories from remote sensing scenes is presented. In place of the conventional and expensive full dimensional Bayes decision rule, a one-dimensional feature extraction and classification technique was employed. It is shown that this efficient decision rule can be used to develop unbiased inventory estimates and that for large sample sizes typical of satellite derived remote sensing scenes, resulting accuracies are comparable or superior to more expensive alternative procedures. Mathematical details of the procedure are provided in the body of the report and in the appendix. Results of a numerical simulation of the technique using statistics obtained from an observed LANDSAT scene are included. The simulation demonstrates the effectiveness of the technique in computing accurate inventory estimates.
A multiple-point spatially weighted k-NN method for object-based classification
NASA Astrophysics Data System (ADS)
Tang, Yunwei; Jing, Linhai; Li, Hui; Atkinson, Peter M.
2016-10-01
Object-based classification, commonly referred to as object-based image analysis (OBIA), is now commonly regarded as able to produce more appealing classification maps, often of greater accuracy, than pixel-based classification and its application is now widespread. Therefore, improvement of OBIA using spatial techniques is of great interest. In this paper, multiple-point statistics (MPS) is proposed for object-based classification enhancement in the form of a new multiple-point k-nearest neighbour (k-NN) classification method (MPk-NN). The proposed method first utilises a training image derived from a pre-classified map to characterise the spatial correlation between multiple points of land cover classes. The MPS borrows spatial structures from other parts of the training image, and then incorporates this spatial information, in the form of multiple-point probabilities, into the k-NN classifier. Two satellite sensor images with a fine spatial resolution were selected to evaluate the new method. One is an IKONOS image of the Beijing urban area and the other is a WorldView-2 image of the Wolong mountainous area, in China. The images were object-based classified using the MPk-NN method and several alternatives, including the k-NN, the geostatistically weighted k-NN, the Bayesian method, the decision tree classifier (DTC), and the support vector machine classifier (SVM). It was demonstrated that the new spatial weighting based on MPS can achieve greater classification accuracy relative to the alternatives and it is, thus, recommended as appropriate for object-based classification.
NASA Technical Reports Server (NTRS)
Amis, M. L.; Martin, M. V.; Mcguire, W. G.; Shen, S. S. (Principal Investigator)
1982-01-01
Studies completed in fiscal year 1981 in support of the clustering/classification and preprocessing activities of the Domestic Crops and Land Cover project. The theme throughout the study was the improvement of subanalysis district (usually county level) crop hectarage estimates, as reflected in the following three objectives: (1) to evaluate the current U.S. Department of Agriculture Statistical Reporting Service regression approach to crop area estimation as applied to the problem of obtaining subanalysis district estimates; (2) to develop and test alternative approaches to subanalysis district estimation; and (3) to develop and test preprocessing techniques for use in improving subanalysis district estimates.
Impervious surfaces mapping using high resolution satellite imagery
NASA Astrophysics Data System (ADS)
Shirmeen, Tahmina
In recent years, impervious surfaces have emerged not only as an indicator of the degree of urbanization, but also as an indicator of environmental quality. As impervious surface area increases, storm water runoff increases in velocity, quantity, temperature and pollution load. Any of these attributes can contribute to the degradation of natural hydrology and water quality. Various image processing techniques have been used to identify the impervious surfaces, however, most of the existing impervious surface mapping tools used moderate resolution imagery. In this project, the potential of standard image processing techniques to generate impervious surface data for change detection analysis using high-resolution satellite imagery was evaluated. The city of Oxford, MS was selected as the study site for this project. Standard image processing techniques, including Normalized Difference Vegetation Index (NDVI), Principal Component Analysis (PCA), a combination of NDVI and PCA, and image classification algorithms, were used to generate impervious surfaces from multispectral IKONOS and QuickBird imagery acquired in both leaf-on and leaf-off conditions. Accuracy assessments were performed, using truth data generated by manual classification, with Kappa statistics and Zonal statistics to select the most appropriate image processing techniques for impervious surface mapping. The performance of selected image processing techniques was enhanced by incorporating Soil Brightness Index (SBI) and Greenness Index (GI) derived from Tasseled Cap Transformed (TCT) IKONOS and QuickBird imagery. A time series of impervious surfaces for the time frame between 2001 and 2007 was made using the refined image processing techniques to analyze the changes in IS in Oxford. It was found that NDVI and the combined NDVI--PCA methods are the most suitable image processing techniques for mapping impervious surfaces in leaf-off and leaf-on conditions respectively, using high resolution multispectral imagery. It was also found that IS data generated by these techniques can be refined by removing the conflicting dry soil patches using SBI and GI obtained from TCT of the same imagery used for IS data generation. The change detection analysis of the IS time series shows that Oxford experienced the major changes in IS from the year 2001 to 2004 and 2006 to 2007.
Rutzinger, Martin; Höfle, Bernhard; Hollaus, Markus; Pfeifer, Norbert
2008-01-01
Airborne laser scanning (ALS) is a remote sensing technique well-suited for 3D vegetation mapping and structure characterization because the emitted laser pulses are able to penetrate small gaps in the vegetation canopy. The backscattered echoes from the foliage, woody vegetation, the terrain, and other objects are detected, leading to a cloud of points. Higher echo densities (>20 echoes/m2) and additional classification variables from full-waveform (FWF) ALS data, namely echo amplitude, echo width and information on multiple echoes from one shot, offer new possibilities in classifying the ALS point cloud. Currently FWF sensor information is hardly used for classification purposes. This contribution presents an object-based point cloud analysis (OBPA) approach, combining segmentation and classification of the 3D FWF ALS points designed to detect tall vegetation in urban environments. The definition tall vegetation includes trees and shrubs, but excludes grassland and herbage. In the applied procedure FWF ALS echoes are segmented by a seeded region growing procedure. All echoes sorted descending by their surface roughness are used as seed points. Segments are grown based on echo width homogeneity. Next, segment statistics (mean, standard deviation, and coefficient of variation) are calculated by aggregating echo features such as amplitude and surface roughness. For classification a rule base is derived automatically from a training area using a statistical classification tree. To demonstrate our method we present data of three sites with around 500,000 echoes each. The accuracy of the classified vegetation segments is evaluated for two independent validation sites. In a point-wise error assessment, where the classification is compared with manually classified 3D points, completeness and correctness better than 90% are reached for the validation sites. In comparison to many other algorithms the proposed 3D point classification works on the original measurements directly, i.e. the acquired points. Gridding of the data is not necessary, a process which is inherently coupled to loss of data and precision. The 3D properties provide especially a good separability of buildings and terrain points respectively, if they are occluded by vegetation. PMID:27873771
NASA Technical Reports Server (NTRS)
Abbey, Craig K.; Eckstein, Miguel P.
2002-01-01
We consider estimation and statistical hypothesis testing on classification images obtained from the two-alternative forced-choice experimental paradigm. We begin with a probabilistic model of task performance for simple forced-choice detection and discrimination tasks. Particular attention is paid to general linear filter models because these models lead to a direct interpretation of the classification image as an estimate of the filter weights. We then describe an estimation procedure for obtaining classification images from observer data. A number of statistical tests are presented for testing various hypotheses from classification images based on some more compact set of features derived from them. As an example of how the methods we describe can be used, we present a case study investigating detection of a Gaussian bump profile.
Diagnostic methodology for incipient system disturbance based on a neural wavelet approach
NASA Astrophysics Data System (ADS)
Won, In-Ho
Since incipient system disturbances are easily mixed up with other events or noise sources, the signal from the system disturbance can be neglected or identified as noise. Thus, as available knowledge and information is obtained incompletely or inexactly from the measurements; an exploration into the use of artificial intelligence (AI) tools to overcome these uncertainties and limitations was done. A methodology integrating the feature extraction efficiency of the wavelet transform with the classification capabilities of neural networks is developed for signal classification in the context of detecting incipient system disturbances. The synergistic effects of wavelets and neural networks present more strength and less weakness than either technique taken alone. A wavelet feature extractor is developed to form concise feature vectors for neural network inputs. The feature vectors are calculated from wavelet coefficients to reduce redundancy and computational expense. During this procedure, the statistical features based on the fractal concept to the wavelet coefficients play a role as crucial key in the wavelet feature extractor. To verify the proposed methodology, two applications are investigated and successfully tested. The first involves pump cavitation detection using dynamic pressure sensor. The second pertains to incipient pump cavitation detection using signals obtained from a current sensor. Also, through comparisons between three proposed feature vectors and with statistical techniques, it is shown that the variance feature extractor provides a better approach in the performed applications.
Forkan, Abdur Rahim Mohammad; Khalil, Ibrahim
2017-02-01
In home-based context-aware monitoring patient's real-time data of multiple vital signs (e.g. heart rate, blood pressure) are continuously generated from wearable sensors. The changes in such vital parameters are highly correlated. They are also patient-centric and can be either recurrent or can fluctuate. The objective of this study is to develop an intelligent method for personalized monitoring and clinical decision support through early estimation of patient-specific vital sign values, and prediction of anomalies using the interrelation among multiple vital signs. In this paper, multi-label classification algorithms are applied in classifier design to forecast these values and related abnormalities. We proposed a completely new approach of patient-specific vital sign prediction system using their correlations. The developed technique can guide healthcare professionals to make accurate clinical decisions. Moreover, our model can support many patients with various clinical conditions concurrently by utilizing the power of cloud computing technology. The developed method also reduces the rate of false predictions in remote monitoring centres. In the experimental settings, the statistical features and correlations of six vital signs are formulated as multi-label classification problem. Eight multi-label classification algorithms along with three fundamental machine learning algorithms are used and tested on a public dataset of 85 patients. Different multi-label classification evaluation measures such as Hamming score, F1-micro average, and accuracy are used for interpreting the prediction performance of patient-specific situation classifications. We achieved 90-95% Hamming score values across 24 classifier combinations for 85 different patients used in our experiment. The results are compared with single-label classifiers and without considering the correlations among the vitals. The comparisons show that multi-label method is the best technique for this problem domain. The evaluation results reveal that multi-label classification techniques using the correlations among multiple vitals are effective ways for early estimation of future values of those vitals. In context-aware remote monitoring this process can greatly help the doctors in quick diagnostic decision making. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Bach, Bo; Sellbom, Martin; Skjernov, Mathias; Simonsen, Erik
2018-05-01
The five personality disorder trait domains in the proposed International Classification of Diseases, 11th edition and the Diagnostic and Statistical Manual of Mental Disorders, 5th edition are comparable in terms of Negative Affectivity, Detachment, Antagonism/Dissociality and Disinhibition. However, the International Classification of Diseases, 11th edition model includes a separate domain of Anankastia, whereas the Diagnostic and Statistical Manual of Mental Disorders, 5th edition model includes an additional domain of Psychoticism. This study examined associations of International Classification of Diseases, 11th edition and Diagnostic and Statistical Manual of Mental Disorders, 5th edition trait domains, simultaneously, with categorical personality disorders. Psychiatric outpatients ( N = 226) were administered the Structured Clinical Interview for DSM-IV Axis II Personality Disorders Interview and the Personality Inventory for DSM-5. International Classification of Diseases, 11th edition and Diagnostic and Statistical Manual of Mental Disorders, 5th edition trait domain scores were obtained using pertinent scoring algorithms for the Personality Inventory for DSM-5. Associations between categorical personality disorders and trait domains were examined using correlation and multiple regression analyses. Both the International Classification of Diseases, 11th edition and the Diagnostic and Statistical Manual of Mental Disorders, 5th edition domain models showed relevant continuity with categorical personality disorders and captured a substantial amount of their information. As expected, the International Classification of Diseases, 11th edition model was superior in capturing obsessive-compulsive personality disorder, whereas the Diagnostic and Statistical Manual of Mental Disorders, 5th edition model was superior in capturing schizotypal personality disorder. These preliminary findings suggest that little information is 'lost' in a transition to trait domain models and potentially adds to narrowing the gap between Diagnostic and Statistical Manual of Mental Disorders, 5th edition and the proposed International Classification of Diseases, 11th edition model. Accordingly, the International Classification of Diseases, 11th edition and Diagnostic and Statistical Manual of Mental Disorders, 5th edition domain models may be used to delineate one another as well as features of familiar categorical personality disorder types. A preliminary category-to-domain 'cross walk' is provided in the article.
49 CFR 1248.100 - Commodity classification designated.
Code of Federal Regulations, 2010 CFR
2010-10-01
... STATISTICS Commodity Code § 1248.100 Commodity classification designated. Commencing with reports for the..., reports of commodity statistics required to be made to the Board, shall be based on the commodity codes... Statistics, 1963, issued by the Bureau of the Budget, and on additional codes 411 through 462 shown in § 1248...
[The informational support of statistical observation related to children disability].
Son, I M; Polikarpov, A V; Ogrizko, E V; Golubeva, T Yu
2016-01-01
Within the framework of the Convention on rights of the disabled the revision is specified concerning criteria of identification of disability of children and reformation of system of medical social expertise according international standards of indices of health and indices related to health. In connection with it, it is important to consider the relationship between alterations in forms of the Federal statistical monitoring in the part of registration of disabled children in the Russian Federation and classification of health indices and indices related to health applied at identification of disability. The article presents analysis of relationship between alterations in forms of the Federal statistical monitoring in the part of registration of disabled children in the Russian Federation and applied classifications used at identification of disability (International classification of impairments, disabilities and handicap (ICDH), international classification of functioning, disability and health (ICF), international classification of functioning, disability and health, version for children and youth (ICF-CY). The intersectorial interaction is considered within the framework of statistics of children disability.
ERIC Educational Resources Information Center
National Forum on Education Statistics, 2011
2011-01-01
In this handbook, "Prior-to-Secondary School Course Classification System: School Codes for the Exchange of Data" (SCED), the National Center for Education Statistics (NCES) and the National Forum on Education Statistics have extended the existing secondary course classification system with codes and descriptions for courses offered at…
NASA Astrophysics Data System (ADS)
Peresan, Antonella; Gentili, Stefania
2017-04-01
Identification and statistical characterization of seismic clusters may provide useful insights about the features of seismic energy release and their relation to physical properties of the crust within a given region. Moreover, a number of studies based on spatio-temporal analysis of main-shocks occurrence require preliminary declustering of the earthquake catalogs. Since various methods, relying on different physical/statistical assumptions, may lead to diverse classifications of earthquakes into main events and related events, we aim to investigate the classification differences among different declustering techniques. Accordingly, a formal selection and comparative analysis of earthquake clusters is carried out for the most relevant earthquakes in North-Eastern Italy, as reported in the local OGS-CRS bulletins, compiled at the National Institute of Oceanography and Experimental Geophysics since 1977. The comparison is then extended to selected earthquake sequences associated with a different seismotectonic setting, namely to events that occurred in the region struck by the recent Central Italy destructive earthquakes, making use of INGV data. Various techniques, ranging from classical space-time windows methods to ad hoc manual identification of aftershocks, are applied for detection of earthquake clusters. In particular, a statistical method based on nearest-neighbor distances of events in space-time-energy domain, is considered. Results from clusters identification by the nearest-neighbor method turn out quite robust with respect to the time span of the input catalogue, as well as to minimum magnitude cutoff. The identified clusters for the largest events reported in North-Eastern Italy since 1977 are well consistent with those reported in earlier studies, which were aimed at detailed manual aftershocks identification. The study shows that the data-driven approach, based on the nearest-neighbor distances, can be satisfactorily applied to decompose the seismic catalog into background seismicity and individual sequences of earthquake clusters, also in areas characterized by moderate seismic activity, where the standard declustering techniques may turn out rather gross approximations. With these results acquired, the main statistical features of seismic clusters are explored, including complex interdependence of related events, with the aim to characterize the space-time patterns of earthquakes occurrence in North-Eastern Italy and capture their basic differences with Central Italy sequences.
NASA Astrophysics Data System (ADS)
Sreejith, Sreevarsha; Pereverzyev, Sergiy, Jr.; Kelvin, Lee S.; Marleau, Francine R.; Haltmeier, Markus; Ebner, Judith; Bland-Hawthorn, Joss; Driver, Simon P.; Graham, Alister W.; Holwerda, Benne W.; Hopkins, Andrew M.; Liske, Jochen; Loveday, Jon; Moffett, Amanda J.; Pimbblet, Kevin A.; Taylor, Edward N.; Wang, Lingyu; Wright, Angus H.
2018-03-01
We apply four statistical learning methods to a sample of 7941 galaxies (z < 0.06) from the Galaxy And Mass Assembly survey to test the feasibility of using automated algorithms to classify galaxies. Using 10 features measured for each galaxy (sizes, colours, shape parameters, and stellar mass), we apply the techniques of Support Vector Machines, Classification Trees, Classification Trees with Random Forest (CTRF) and Neural Networks, and returning True Prediction Ratios (TPRs) of 75.8 per cent, 69.0 per cent, 76.2 per cent, and 76.0 per cent, respectively. Those occasions whereby all four algorithms agree with each other yet disagree with the visual classification (`unanimous disagreement') serves as a potential indicator of human error in classification, occurring in ˜ 9 per cent of ellipticals, ˜ 9 per cent of little blue spheroids, ˜ 14 per cent of early-type spirals, ˜ 21 per cent of intermediate-type spirals, and ˜ 4 per cent of late-type spirals and irregulars. We observe that the choice of parameters rather than that of algorithms is more crucial in determining classification accuracy. Due to its simplicity in formulation and implementation, we recommend the CTRF algorithm for classifying future galaxy data sets. Adopting the CTRF algorithm, the TPRs of the five galaxy types are : E, 70.1 per cent; LBS, 75.6 per cent; S0-Sa, 63.6 per cent; Sab-Scd, 56.4 per cent, and Sd-Irr, 88.9 per cent. Further, we train a binary classifier using this CTRF algorithm that divides galaxies into spheroid-dominated (E, LBS, and S0-Sa) and disc-dominated (Sab-Scd and Sd-Irr), achieving an overall accuracy of 89.8 per cent. This translates into an accuracy of 84.9 per cent for spheroid-dominated systems and 92.5 per cent for disc-dominated systems.
Experimental analysis of computer system dependability
NASA Technical Reports Server (NTRS)
Iyer, Ravishankar, K.; Tang, Dong
1993-01-01
This paper reviews an area which has evolved over the past 15 years: experimental analysis of computer system dependability. Methodologies and advances are discussed for three basic approaches used in the area: simulated fault injection, physical fault injection, and measurement-based analysis. The three approaches are suited, respectively, to dependability evaluation in the three phases of a system's life: design phase, prototype phase, and operational phase. Before the discussion of these phases, several statistical techniques used in the area are introduced. For each phase, a classification of research methods or study topics is outlined, followed by discussion of these methods or topics as well as representative studies. The statistical techniques introduced include the estimation of parameters and confidence intervals, probability distribution characterization, and several multivariate analysis methods. Importance sampling, a statistical technique used to accelerate Monte Carlo simulation, is also introduced. The discussion of simulated fault injection covers electrical-level, logic-level, and function-level fault injection methods as well as representative simulation environments such as FOCUS and DEPEND. The discussion of physical fault injection covers hardware, software, and radiation fault injection methods as well as several software and hybrid tools including FIAT, FERARI, HYBRID, and FINE. The discussion of measurement-based analysis covers measurement and data processing techniques, basic error characterization, dependency analysis, Markov reward modeling, software-dependability, and fault diagnosis. The discussion involves several important issues studies in the area, including fault models, fast simulation techniques, workload/failure dependency, correlated failures, and software fault tolerance.
NASA Astrophysics Data System (ADS)
Wang, Audrey; Price, David T.
2007-03-01
A simple integrated algorithm was developed to relate global climatology to distributions of tree plant functional types (PFT). Multivariate cluster analysis was performed to analyze the statistical homogeneity of the climate space occupied by individual tree PFTs. Forested regions identified from the satellite-based GLC2000 classification were separated into tropical, temperate, and boreal sub-PFTs for use in the Canadian Terrestrial Ecosystem Model (CTEM). Global data sets of monthly minimum temperature, growing degree days, an index of climatic moisture, and estimated PFT cover fractions were then used as variables in the cluster analysis. The statistical results for individual PFT clusters were found consistent with other global-scale classifications of dominant vegetation. As an improvement of the quantification of the climatic limitations on PFT distributions, the results also demonstrated overlapping of PFT cluster boundaries that reflected vegetation transitions, for example, between tropical and temperate biomes. The resulting global database should provide a better basis for simulating the interaction of climate change and terrestrial ecosystem dynamics using global vegetation models.
Audio Classification in Speech and Music: A Comparison between a Statistical and a Neural Approach
NASA Astrophysics Data System (ADS)
Bugatti, Alessandro; Flammini, Alessandra; Migliorati, Pierangelo
2002-12-01
We focus the attention on the problem of audio classification in speech and music for multimedia applications. In particular, we present a comparison between two different techniques for speech/music discrimination. The first method is based on Zero crossing rate and Bayesian classification. It is very simple from a computational point of view, and gives good results in case of pure music or speech. The simulation results show that some performance degradation arises when the music segment contains also some speech superimposed on music, or strong rhythmic components. To overcome these problems, we propose a second method, that uses more features, and is based on neural networks (specifically a multi-layer Perceptron). In this case we obtain better performance, at the expense of a limited growth in the computational complexity. In practice, the proposed neural network is simple to be implemented if a suitable polynomial is used as the activation function, and a real-time implementation is possible even if low-cost embedded systems are used.
Large deformation image classification using generalized locality-constrained linear coding.
Zhang, Pei; Wee, Chong-Yaw; Niethammer, Marc; Shen, Dinggang; Yap, Pew-Thian
2013-01-01
Magnetic resonance (MR) imaging has been demonstrated to be very useful for clinical diagnosis of Alzheimer's disease (AD). A common approach to using MR images for AD detection is to spatially normalize the images by non-rigid image registration, and then perform statistical analysis on the resulting deformation fields. Due to the high nonlinearity of the deformation field, recent studies suggest to use initial momentum instead as it lies in a linear space and fully encodes the deformation field. In this paper we explore the use of initial momentum for image classification by focusing on the problem of AD detection. Experiments on the public ADNI dataset show that the initial momentum, together with a simple sparse coding technique-locality-constrained linear coding (LLC)--can achieve a classification accuracy that is comparable to or even better than the state of the art. We also show that the performance of LLC can be greatly improved by introducing proper weights to the codebook.
Classification of speech dysfluencies using LPC based parameterization techniques.
Hariharan, M; Chee, Lim Sin; Ai, Ooi Chia; Yaacob, Sazali
2012-06-01
The goal of this paper is to discuss and compare three feature extraction methods: Linear Predictive Coefficients (LPC), Linear Prediction Cepstral Coefficients (LPCC) and Weighted Linear Prediction Cepstral Coefficients (WLPCC) for recognizing the stuttered events. Speech samples from the University College London Archive of Stuttered Speech (UCLASS) were used for our analysis. The stuttered events were identified through manual segmentation and were used for feature extraction. Two simple classifiers namely, k-nearest neighbour (kNN) and Linear Discriminant Analysis (LDA) were employed for speech dysfluencies classification. Conventional validation method was used for testing the reliability of the classifier results. The study on the effect of different frame length, percentage of overlapping, value of ã in a first order pre-emphasizer and different order p were discussed. The speech dysfluencies classification accuracy was found to be improved by applying statistical normalization before feature extraction. The experimental investigation elucidated LPC, LPCC and WLPCC features can be used for identifying the stuttered events and WLPCC features slightly outperforms LPCC features and LPC features.
A Generic multi-dimensional feature extraction method using multiobjective genetic programming.
Zhang, Yang; Rockett, Peter I
2009-01-01
In this paper, we present a generic feature extraction method for pattern classification using multiobjective genetic programming. This not only evolves the (near-)optimal set of mappings from a pattern space to a multi-dimensional decision space, but also simultaneously optimizes the dimensionality of that decision space. The presented framework evolves vector-to-vector feature extractors that maximize class separability. We demonstrate the efficacy of our approach by making statistically-founded comparisons with a wide variety of established classifier paradigms over a range of datasets and find that for most of the pairwise comparisons, our evolutionary method delivers statistically smaller misclassification errors. At very worst, our method displays no statistical difference in a few pairwise comparisons with established classifier/dataset combinations; crucially, none of the misclassification results produced by our method is worse than any comparator classifier. Although principally focused on feature extraction, feature selection is also performed as an implicit side effect; we show that both feature extraction and selection are important to the success of our technique. The presented method has the practical consequence of obviating the need to exhaustively evaluate a large family of conventional classifiers when faced with a new pattern recognition problem in order to attain a good classification accuracy.
Applications of LANDSAT data to the integrated economic development of Mindoro, Phillipines
NASA Technical Reports Server (NTRS)
Wagner, T. W.; Fernandez, J. C.
1977-01-01
LANDSAT data is seen as providing essential up-to-date resource information for the planning process. LANDSAT data of Mindoro Island in the Philippines was processed to provide thematic maps showing patterns of agriculture, forest cover, terrain, wetlands and water turbidity. A hybrid approach using both supervised and unsupervised classification techniques resulted in 30 different scene classes which were subsequently color-coded and mapped at a scale of 1:250,000. In addition, intensive image analysis is being carried out in evaluating the images. The images, maps, and aerial statistics are being used to provide data to seven technical departments in planning the economic development of Mindoro. Multispectral aircraft imagery was collected to compliment the application of LANDSAT data and validate the classification results.
Ospina, Raydonal; Frery, Alejandro C.
2016-01-01
We present a new approach for handwritten signature classification and verification based on descriptors stemming from time causal information theory. The proposal uses the Shannon entropy, the statistical complexity, and the Fisher information evaluated over the Bandt and Pompe symbolization of the horizontal and vertical coordinates of signatures. These six features are easy and fast to compute, and they are the input to an One-Class Support Vector Machine classifier. The results are better than state-of-the-art online techniques that employ higher-dimensional feature spaces which often require specialized software and hardware. We assess the consistency of our proposal with respect to the size of the training sample, and we also use it to classify the signatures into meaningful groups. PMID:27907014
Hierarchical classification method and its application in shape representation
NASA Astrophysics Data System (ADS)
Ireton, M. A.; Oakley, John P.; Xydeas, Costas S.
1992-04-01
In this paper we describe a technique for performing shaped-based content retrieval of images from a large database. In order to be able to formulate such user-generated queries about visual objects, we have developed an hierarchical classification technique. This hierarchical classification technique enables similarity matching between objects, with the position in the hierarchy signifying the level of generality to be used in the query. The classification technique is unsupervised, robust, and general; it can be applied to any suitable parameter set. To establish the potential of this classifier for aiding visual querying, we have applied it to the classification of the 2-D outlines of leaves.
UNDERSTANDING THE INTERNATIONAL CONSENSUS FOR ACUTE PANCREATITIS: CLASSIFICATION OF ATLANTA 2012
de SOUZA, Gleim Dias; SOUZA, Luciana Rodrigues Queiroz; CUENCA, Ronaldo Máfia; JERÔNIMO, Bárbara Stephane de Medeiros; de SOUZA, Guilherme Medeiros; VILELA, Vinícius Martins
2016-01-01
ABSTRACT Introduction: Contrast computed tomography and magnetic resonance imaging are widely used due to its image quality and ability to study pancreatic and peripancreatic morphology. The understanding of the various subtypes of the disease and identification of possible complications requires a familiarity with the terminology, which allows effective communication between the different members of the multidisciplinary team. Aim: Demonstrate the terminology and parameters to identify the different classifications and findings of the disease based on the international consensus for acute pancreatitis ( Atlanta Classification 2012). Methods: Search and analysis of articles in the "CAPES Portal de Periódicos with headings "acute pancreatitis" and "Atlanta Review". Results: Were selected 23 articles containing radiological descriptions, management or statistical data related to pathology. Additional statistical data were obtained from Datasus and Population Census 2010. The radiological diagnostic criterion adopted was the Radiology American College system. The "acute pancreatitis - 2012 Rating: Review Atlanta classification and definitions for international consensus" tries to eliminate inconsistency and divergence from the determination of uniformity to the radiological findings, especially the terminology related to fluid collections. More broadly as "pancreatic abscess" and "phlegmon" went into disuse and the evolution of the collection of patient fluids can be described as "acute peripancreatic collections", "acute necrotic collections", "pseudocyst" and "necrosis pancreatic walled or isolated". Conclusion: Computed tomography and magnetic resonance represent the best techniques with sequential images available for diagnosis. Standardization of the terminology is critical and should improve the management of patients with multiple professionals care, risk stratification and adequate treatment. PMID:27759788
Surveillance of Arthropod Vector-Borne Infectious Diseases Using Remote Sensing Techniques: A Review
Kalluri, Satya; Gilruth, Peter; Rogers, David; Szczur, Martha
2007-01-01
Epidemiologists are adopting new remote sensing techniques to study a variety of vector-borne diseases. Associations between satellite-derived environmental variables such as temperature, humidity, and land cover type and vector density are used to identify and characterize vector habitats. The convergence of factors such as the availability of multi-temporal satellite data and georeferenced epidemiological data, collaboration between remote sensing scientists and biologists, and the availability of sophisticated, statistical geographic information system and image processing algorithms in a desktop environment creates a fertile research environment. The use of remote sensing techniques to map vector-borne diseases has evolved significantly over the past 25 years. In this paper, we review the status of remote sensing studies of arthropod vector-borne diseases due to mosquitoes, ticks, blackflies, tsetse flies, and sandflies, which are responsible for the majority of vector-borne diseases in the world. Examples of simple image classification techniques that associate land use and land cover types with vector habitats, as well as complex statistical models that link satellite-derived multi-temporal meteorological observations with vector biology and abundance, are discussed here. Future improvements in remote sensing applications in epidemiology are also discussed. PMID:17967056
NASA Technical Reports Server (NTRS)
Hsu, Wei-Chen; Kuss, Amber Jean; Ketron, Tyler; Nguyen, Andrew; Remar, Alex Covello; Newcomer, Michelle; Fleming, Erich; Debout, Leslie; Debout, Brad; Detweiler, Angela;
2011-01-01
Tidal marshes are highly productive ecosystems that support migratory birds as roosting and over-wintering habitats on the Pacific Flyway. Microphytobenthos, or more commonly 'biofilms' contribute significantly to the primary productivity of wetland ecosystems, and provide a substantial food source for macroinvertebrates and avian communities. In this study, biofilms were characterized based on taxonomic classification, density differences, and spectral signatures. These techniques were then applied to remotely sensed images to map biofilm densities and distributions in the South Bay Salt Ponds and predict the carrying capacity of these newly restored ponds for migratory birds. The GER-1500 spectroradiometer was used to obtain in situ spectral signatures for each density-class of biofilm. The spectral variation and taxonomic classification between high, medium, and low density biofilm cover types was mapped using in-situ spectral measurements and classification of EO-1 Hyperion and Landsat TM 5 images. Biofilm samples were also collected in the field to perform laboratory analyses including chlorophyll-a, taxonomic classification, and energy content. Comparison of the spectral signatures between the three density groups shows distinct variations useful for classification. Also, analysis of chlorophyll-a concentrations show statistically significant differences between each density group, using the Tukey-Kramer test at an alpha level of 0.05. The potential carrying capacity in South Bay Salt Ponds is estimated to be 250,000 birds.
Lewicke, Aaron; Sazonov, Edward; Corwin, Michael J; Neuman, Michael; Schuckers, Stephanie
2008-01-01
Reliability of classification performance is important for many biomedical applications. A classification model which considers reliability in the development of the model such that unreliable segments are rejected would be useful, particularly, in large biomedical data sets. This approach is demonstrated in the development of a technique to reliably determine sleep and wake using only the electrocardiogram (ECG) of infants. Typically, sleep state scoring is a time consuming task in which sleep states are manually derived from many physiological signals. The method was tested with simultaneous 8-h ECG and polysomnogram (PSG) determined sleep scores from 190 infants enrolled in the collaborative home infant monitoring evaluation (CHIME) study. Learning vector quantization (LVQ) neural network, multilayer perceptron (MLP) neural network, and support vector machines (SVMs) are tested as the classifiers. After systematic rejection of difficult to classify segments, the models can achieve 85%-87% correct classification while rejecting only 30% of the data. This corresponds to a Kappa statistic of 0.65-0.68. With rejection, accuracy improves by about 8% over a model without rejection. Additionally, the impact of the PSG scored indeterminate state epochs is analyzed. The advantages of a reliable sleep/wake classifier based only on ECG include high accuracy, simplicity of use, and low intrusiveness. Reliability of the classification can be built directly in the model, such that unreliable segments are rejected.
NASA Astrophysics Data System (ADS)
Navratil, Peter; Wilps, Hans
2013-01-01
Three different object-based image classification techniques are applied to high-resolution satellite data for the mapping of the habitats of Asian migratory locust (Locusta migratoria migratoria) in the southern Aral Sea basin, Uzbekistan. A set of panchromatic and multispectral Système Pour l'Observation de la Terre-5 satellite images was spectrally enhanced by normalized difference vegetation index and tasseled cap transformation and segmented into image objects, which were then classified by three different classification approaches: a rule-based hierarchical fuzzy threshold (HFT) classification method was compared to a supervised nearest neighbor classifier and classification tree analysis by the quick, unbiased, efficient statistical trees algorithm. Special emphasis was laid on the discrimination of locust feeding and breeding habitats due to the significance of this discrimination for practical locust control. Field data on vegetation and land cover, collected at the time of satellite image acquisition, was used to evaluate classification accuracy. The results show that a robust HFT classifier outperformed the two automated procedures by 13% overall accuracy. The classification method allowed a reliable discrimination of locust feeding and breeding habitats, which is of significant importance for the application of the resulting data for an economically and environmentally sound control of locust pests because exact spatial knowledge on the habitat types allows a more effective surveying and use of pesticides.
Dinov, Ivo D; Heavner, Ben; Tang, Ming; Glusman, Gustavo; Chard, Kyle; Darcy, Mike; Madduri, Ravi; Pa, Judy; Spino, Cathie; Kesselman, Carl; Foster, Ian; Deutsch, Eric W; Price, Nathan D; Van Horn, John D; Ames, Joseph; Clark, Kristi; Hood, Leroy; Hampstead, Benjamin M; Dauer, William; Toga, Arthur W
2016-01-01
A unique archive of Big Data on Parkinson's Disease is collected, managed and disseminated by the Parkinson's Progression Markers Initiative (PPMI). The integration of such complex and heterogeneous Big Data from multiple sources offers unparalleled opportunities to study the early stages of prevalent neurodegenerative processes, track their progression and quickly identify the efficacies of alternative treatments. Many previous human and animal studies have examined the relationship of Parkinson's disease (PD) risk to trauma, genetics, environment, co-morbidities, or life style. The defining characteristics of Big Data-large size, incongruency, incompleteness, complexity, multiplicity of scales, and heterogeneity of information-generating sources-all pose challenges to the classical techniques for data management, processing, visualization and interpretation. We propose, implement, test and validate complementary model-based and model-free approaches for PD classification and prediction. To explore PD risk using Big Data methodology, we jointly processed complex PPMI imaging, genetics, clinical and demographic data. Collective representation of the multi-source data facilitates the aggregation and harmonization of complex data elements. This enables joint modeling of the complete data, leading to the development of Big Data analytics, predictive synthesis, and statistical validation. Using heterogeneous PPMI data, we developed a comprehensive protocol for end-to-end data characterization, manipulation, processing, cleaning, analysis and validation. Specifically, we (i) introduce methods for rebalancing imbalanced cohorts, (ii) utilize a wide spectrum of classification methods to generate consistent and powerful phenotypic predictions, and (iii) generate reproducible machine-learning based classification that enables the reporting of model parameters and diagnostic forecasting based on new data. We evaluated several complementary model-based predictive approaches, which failed to generate accurate and reliable diagnostic predictions. However, the results of several machine-learning based classification methods indicated significant power to predict Parkinson's disease in the PPMI subjects (consistent accuracy, sensitivity, and specificity exceeding 96%, confirmed using statistical n-fold cross-validation). Clinical (e.g., Unified Parkinson's Disease Rating Scale (UPDRS) scores), demographic (e.g., age), genetics (e.g., rs34637584, chr12), and derived neuroimaging biomarker (e.g., cerebellum shape index) data all contributed to the predictive analytics and diagnostic forecasting. Model-free Big Data machine learning-based classification methods (e.g., adaptive boosting, support vector machines) can outperform model-based techniques in terms of predictive precision and reliability (e.g., forecasting patient diagnosis). We observed that statistical rebalancing of cohort sizes yields better discrimination of group differences, specifically for predictive analytics based on heterogeneous and incomplete PPMI data. UPDRS scores play a critical role in predicting diagnosis, which is expected based on the clinical definition of Parkinson's disease. Even without longitudinal UPDRS data, however, the accuracy of model-free machine learning based classification is over 80%. The methods, software and protocols developed here are openly shared and can be employed to study other neurodegenerative disorders (e.g., Alzheimer's, Huntington's, amyotrophic lateral sclerosis), as well as for other predictive Big Data analytics applications.
NASA Astrophysics Data System (ADS)
Weller, Andrew F.; Harris, Anthony J.; Ware, J. Andrew; Jarvis, Paul S.
2006-11-01
The classification of sedimentary organic matter (OM) images can be improved by determining the saliency of image analysis (IA) features measured from them. Knowing the saliency of IA feature measurements means that only the most significant discriminating features need be used in the classification process. This is an important consideration for classification techniques such as artificial neural networks (ANNs), where too many features can lead to the 'curse of dimensionality'. The classification scheme adopted in this work is a hybrid of morphologically and texturally descriptive features from previous manual classification schemes. Some of these descriptive features are assigned to IA features, along with several others built into the IA software (Halcon) to ensure that a valid cross-section is available. After an image is captured and segmented, a total of 194 features are measured for each particle. To reduce this number to a more manageable magnitude, the SPSS AnswerTree Exhaustive CHAID (χ 2 automatic interaction detector) classification tree algorithm is used to establish each measurement's saliency as a classification discriminator. In the case of continuous data as used here, the F-test is used as opposed to the published algorithm. The F-test checks various statistical hypotheses about the variance of groups of IA feature measurements obtained from the particles to be classified. The aim is to reduce the number of features required to perform the classification without reducing its accuracy. In the best-case scenario, 194 inputs are reduced to 8, with a subsequent multi-layer back-propagation ANN recognition rate of 98.65%. This paper demonstrates the ability of the algorithm to reduce noise, help overcome the curse of dimensionality, and facilitate an understanding of the saliency of IA features as discriminators for sedimentary OM classification.
Patange Subba Rao, Sheethal Prasad; Lewis, James; Haddad, Ziad; Paringe, Vishal; Mohanty, Khitish
2014-10-01
The aim of the study was to evaluate inter-observer reliability and intra-observer reproducibility between the three-column classification and Schatzker classification systems using 2D and 3D CT models. Fifty-two consecutive patients with tibial plateau fractures were evaluated by five orthopaedic surgeons. All patients were classified into Schatzker and three-column classification systems using x-rays and 2D and 3D CT images. The inter-observer reliability was evaluated in the first round and the intra-observer reliability was determined during the second round 2 weeks later. The average intra-observer reproducibility for the three-column classification was from substantial to excellent in all sub classifications, as compared with Schatzker classification. The inter-observer kappa values increased from substantial to excellent in three-column classification and to moderate in Schatzker classification The average values for three-column classification for all the categories are as follows: (I-III) k2D = 0.718, 95% CI 0.554-0.864, p < 0.0001 and average 3D = 0.874, 95% CI 0.754-0.890, p < 0.0001. For Schatzker classification system, the average values for all six categories are as follows: (I-VI) k2D = 0.536, 95% CI 0.365-0.685, p < 0.0001 and average k3D = 0.552 95% CI 0.405-0.700, p < 0.0001. The values are statistically significant. Statistically significant inter-observer values in both rounds were noted with the three-column classification, making it statistically an excellent agreement. The intra-observer reproducibility for the three-column classification improved as compared with the Schatzker classification. The three-column classification seems to be an effective way to characterise and classify fractures of tibial plateau.
Multi-class SVM model for fMRI-based classification and grading of liver fibrosis
NASA Astrophysics Data System (ADS)
Freiman, M.; Sela, Y.; Edrei, Y.; Pappo, O.; Joskowicz, L.; Abramovitch, R.
2010-03-01
We present a novel non-invasive automatic method for the classification and grading of liver fibrosis from fMRI maps based on hepatic hemodynamic changes. This method automatically creates a model for liver fibrosis grading based on training datasets. Our supervised learning method evaluates hepatic hemodynamics from an anatomical MRI image and three T2*-W fMRI signal intensity time-course scans acquired during the breathing of air, air-carbon dioxide, and carbogen. It constructs a statistical model of liver fibrosis from these fMRI scans using a binary-based one-against-all multi class Support Vector Machine (SVM) classifier. We evaluated the resulting classification model with the leave-one out technique and compared it to both full multi-class SVM and K-Nearest Neighbor (KNN) classifications. Our experimental study analyzed 57 slice sets from 13 mice, and yielded a 98.2% separation accuracy between healthy and low grade fibrotic subjects, and an overall accuracy of 84.2% for fibrosis grading. These results are better than the existing image-based methods which can only discriminate between healthy and high grade fibrosis subjects. With appropriate extensions, our method may be used for non-invasive classification and progression monitoring of liver fibrosis in human patients instead of more invasive approaches, such as biopsy or contrast-enhanced imaging.
On the prompt identification of traces of explosives
NASA Astrophysics Data System (ADS)
Trobajo, M. T.; López-Cabeceira, M. M.; Carriegos, M. V.; Díez-Machío, H.
2014-12-01
Some recent results in the use of Raman spectroscopy for recognition of explosives are reviewed. Experimental study using spectra data base has been developed. In order to simulate a more real situation, both blank substances and explosives substances have been considered in this research. Statistic classification techniques have been performed. Estimations of prediction errors were obtained by cross-validation methods. These results can be applied in airport security systems in order to prevent terror acts (by the detection of explosive/flammable substances).
NASA Astrophysics Data System (ADS)
Smid, Marek; Costa, Ana; Pebesma, Edzer; Granell, Carlos; Bhattacharya, Devanjan
2016-04-01
Human kind is currently predominantly urban based, and the majority of ever continuing population growth will take place in urban agglomerations. Urban systems are not only major drivers of climate change, but also the impact hot spots. Furthermore, climate change impacts are commonly managed at city scale. Therefore, assessing climate change impacts on urban systems is a very relevant subject of research. Climate and its impacts on all levels (local, meso and global scale) and also the inter-scale dependencies of those processes should be a subject to detail analysis. While global and regional projections of future climate are currently available, local-scale information is lacking. Hence, statistical downscaling methodologies represent a potentially efficient way to help to close this gap. In general, the methodological reviews of downscaling procedures cover the various methods according to their application (e.g. downscaling for the hydrological modelling). Some of the most recent and comprehensive studies, such as the ESSEM COST Action ES1102 (VALUE), use the concept of Perfect Prog and MOS. Other examples of classification schemes of downscaling techniques consider three main categories: linear methods, weather classifications and weather generators. Downscaling and climate modelling represent a multidisciplinary field, where researchers from various backgrounds intersect their efforts, resulting in specific terminology, which may be somewhat confusing. For instance, the Polynomial Regression (also called the Surface Trend Analysis) is a statistical technique. In the context of the spatial interpolation procedures, it is commonly classified as a deterministic technique, and kriging approaches are classified as stochastic. Furthermore, the terms "statistical" and "stochastic" (frequently used as names of sub-classes in downscaling methodological reviews) are not always considered as synonymous, even though both terms could be seen as identical since they are referring to methods handling input modelling factors as variables with certain probability distributions. In addition, the recent development is going towards multi-step methodologies containing deterministic and stochastic components. This evolution leads to the introduction of new terms like hybrid or semi-stochastic approaches, which makes the efforts to systematically classifying downscaling methods to the previously defined categories even more challenging. This work presents a review of statistical downscaling procedures, which classifies the methods in two steps. In the first step, we describe several techniques that produce a single climatic surface based on observations. The methods are classified into two categories using an approximation to the broadest consensual statistical terms: linear and non-linear methods. The second step covers techniques that use simulations to generate alternative surfaces, which correspond to different realizations of the same processes. Those simulations are essential because there is a limited number of real observational data, and such procedures are crucial for modelling extremes. This work emphasises the link between statistical downscaling methods and the research of climate change impacts at city scale.
NASA Astrophysics Data System (ADS)
Cameron, Enrico; Pilla, Giorgio; Stella, Fabio A.
2018-06-01
The application of statistical classification methods is investigated—in comparison also to spatial interpolation methods—for predicting the acceptability of well-water quality in a situation where an effective quantitative model of the hydrogeological system under consideration cannot be developed. In the example area in northern Italy, in particular, the aquifer is locally affected by saline water and the concentration of chloride is the main indicator of both saltwater occurrence and groundwater quality. The goal is to predict if the chloride concentration in a water well will exceed the allowable concentration so that the water is unfit for the intended use. A statistical classification algorithm achieved the best predictive performances and the results of the study show that statistical classification methods provide further tools for dealing with groundwater quality problems concerning hydrogeological systems that are too difficult to describe analytically or to simulate effectively.
Statistical inference for classification of RRIM clone series using near IR reflectance properties
NASA Astrophysics Data System (ADS)
Ismail, Faridatul Aima; Madzhi, Nina Korlina; Hashim, Hadzli; Abdullah, Noor Ezan; Khairuzzaman, Noor Aishah; Azmi, Azrie Faris Mohd; Sampian, Ahmad Faiz Mohd; Harun, Muhammad Hafiz
2015-08-01
RRIM clone is a rubber breeding series produced by RRIM (Rubber Research Institute of Malaysia) through "rubber breeding program" to improve latex yield and producing clones attractive to farmers. The objective of this work is to analyse measurement of optical sensing device on latex of selected clone series. The device using transmitting NIR properties and its reflectance is converted in terms of voltage. The obtained reflectance index value via voltage was analyzed using statistical technique in order to find out the discrimination among the clones. From the statistical results using error plots and one-way ANOVA test, there is an overwhelming evidence showing discrimination of RRIM 2002, RRIM 2007 and RRIM 3001 clone series with p value = 0.000. RRIM 2008 cannot be discriminated with RRIM 2014; however both of these groups are distinct from the other clones.
High Dimensional Classification Using Features Annealed Independence Rules.
Fan, Jianqing; Fan, Yingying
2008-01-01
Classification using high-dimensional features arises frequently in many contemporary statistical studies such as tumor classification using microarray or other high-throughput data. The impact of dimensionality on classifications is largely poorly understood. In a seminal paper, Bickel and Levina (2004) show that the Fisher discriminant performs poorly due to diverging spectra and they propose to use the independence rule to overcome the problem. We first demonstrate that even for the independence classification rule, classification using all the features can be as bad as the random guessing due to noise accumulation in estimating population centroids in high-dimensional feature space. In fact, we demonstrate further that almost all linear discriminants can perform as bad as the random guessing. Thus, it is paramountly important to select a subset of important features for high-dimensional classification, resulting in Features Annealed Independence Rules (FAIR). The conditions under which all the important features can be selected by the two-sample t-statistic are established. The choice of the optimal number of features, or equivalently, the threshold value of the test statistics are proposed based on an upper bound of the classification error. Simulation studies and real data analysis support our theoretical results and demonstrate convincingly the advantage of our new classification procedure.
Lyons-Weiler, James; Pelikan, Richard; Zeh, Herbert J; Whitcomb, David C; Malehorn, David E; Bigbee, William L; Hauskrecht, Milos
2005-01-01
Peptide profiles generated using SELDI/MALDI time of flight mass spectrometry provide a promising source of patient-specific information with high potential impact on the early detection and classification of cancer and other diseases. The new profiling technology comes, however, with numerous challenges and concerns. Particularly important are concerns of reproducibility of classification results and their significance. In this work we describe a computational validation framework, called PACE (Permutation-Achieved Classification Error), that lets us assess, for a given classification model, the significance of the Achieved Classification Error (ACE) on the profile data. The framework compares the performance statistic of the classifier on true data samples and checks if these are consistent with the behavior of the classifier on the same data with randomly reassigned class labels. A statistically significant ACE increases our belief that a discriminative signal was found in the data. The advantage of PACE analysis is that it can be easily combined with any classification model and is relatively easy to interpret. PACE analysis does not protect researchers against confounding in the experimental design, or other sources of systematic or random error. We use PACE analysis to assess significance of classification results we have achieved on a number of published data sets. The results show that many of these datasets indeed possess a signal that leads to a statistically significant ACE.
Protocol Design Challenges in the Detection of Awareness in Aware Subjects Using EEG Signals.
Henriques, J; Gabriel, D; Grigoryeva, L; Haffen, E; Moulin, T; Aubry, R; Pazart, L; Ortega, J-P
2016-10-01
Recent studies have evidenced serious difficulties in detecting covert awareness with electroencephalography-based techniques both in unresponsive patients and in healthy control subjects. This work reproduces the protocol design in two recent mental imagery studies with a larger group comprising 20 healthy volunteers. The main goal is assessing if modifications in the signal extraction techniques, training-testing/cross-validation routines, and hypotheses evoked in the statistical analysis, can provide solutions to the serious difficulties documented in the literature. The lack of robustness in the results advises for further search of alternative protocols more suitable for machine learning classification and of better performing signal treatment techniques. Specific recommendations are made using the findings in this work. © EEG and Clinical Neuroscience Society (ECNS) 2014.
Jenson, Susan K.; Trautwein, C.M.
1984-01-01
The application of an unsupervised, spatially dependent clustering technique (AMOEBA) to interpolated raster arrays of stream sediment data has been found to provide useful multivariate geochemical associations for modeling regional polymetallic resource potential. The technique is based on three assumptions regarding the compositional and spatial relationships of stream sediment data and their regional significance. These assumptions are: (1) compositionally separable classes exist and can be statistically distinguished; (2) the classification of multivariate data should minimize the pair probability of misclustering to establish useful compositional associations; and (3) a compositionally defined class represented by three or more contiguous cells within an array is a more important descriptor of a terrane than a class represented by spatial outliers.
NASA Astrophysics Data System (ADS)
Zhao, Bei; Zhong, Yanfei; Zhang, Liangpei
2016-06-01
Land-use classification of very high spatial resolution remote sensing (VHSR) imagery is one of the most challenging tasks in the field of remote sensing image processing. However, the land-use classification is hard to be addressed by the land-cover classification techniques, due to the complexity of the land-use scenes. Scene classification is considered to be one of the expected ways to address the land-use classification issue. The commonly used scene classification methods of VHSR imagery are all derived from the computer vision community that mainly deal with terrestrial image recognition. Differing from terrestrial images, VHSR images are taken by looking down with airborne and spaceborne sensors, which leads to the distinct light conditions and spatial configuration of land cover in VHSR imagery. Considering the distinct characteristics, two questions should be answered: (1) Which type or combination of information is suitable for the VHSR imagery scene classification? (2) Which scene classification algorithm is best for VHSR imagery? In this paper, an efficient spectral-structural bag-of-features scene classifier (SSBFC) is proposed to combine the spectral and structural information of VHSR imagery. SSBFC utilizes the first- and second-order statistics (the mean and standard deviation values, MeanStd) as the statistical spectral descriptor for the spectral information of the VHSR imagery, and uses dense scale-invariant feature transform (SIFT) as the structural feature descriptor. From the experimental results, the spectral information works better than the structural information, while the combination of the spectral and structural information is better than any single type of information. Taking the characteristic of the spatial configuration into consideration, SSBFC uses the whole image scene as the scope of the pooling operator, instead of the scope generated by a spatial pyramid (SP) commonly used in terrestrial image classification. The experimental results show that the whole image as the scope of the pooling operator performs better than the scope generated by SP. In addition, SSBFC codes and pools the spectral and structural features separately to avoid mutual interruption between the spectral and structural features. The coding vectors of spectral and structural features are then concatenated into a final coding vector. Finally, SSBFC classifies the final coding vector by support vector machine (SVM) with a histogram intersection kernel (HIK). Compared with the latest scene classification methods, the experimental results with three VHSR datasets demonstrate that the proposed SSBFC performs better than the other classification methods for VHSR image scenes.
EVALUATION OF REGISTRATION, COMPRESSION AND CLASSIFICATION ALGORITHMS
NASA Technical Reports Server (NTRS)
Jayroe, R. R.
1994-01-01
Several types of algorithms are generally used to process digital imagery such as Landsat data. The most commonly used algorithms perform the task of registration, compression, and classification. Because there are different techniques available for performing registration, compression, and classification, imagery data users need a rationale for selecting a particular approach to meet their particular needs. This collection of registration, compression, and classification algorithms was developed so that different approaches could be evaluated and the best approach for a particular application determined. Routines are included for six registration algorithms, six compression algorithms, and two classification algorithms. The package also includes routines for evaluating the effects of processing on the image data. This collection of routines should be useful to anyone using or developing image processing software. Registration of image data involves the geometrical alteration of the imagery. Registration routines available in the evaluation package include image magnification, mapping functions, partitioning, map overlay, and data interpolation. The compression of image data involves reducing the volume of data needed for a given image. Compression routines available in the package include adaptive differential pulse code modulation, two-dimensional transforms, clustering, vector reduction, and picture segmentation. Classification of image data involves analyzing the uncompressed or compressed image data to produce inventories and maps of areas of similar spectral properties within a scene. The classification routines available include a sequential linear technique and a maximum likelihood technique. The choice of the appropriate evaluation criteria is quite important in evaluating the image processing functions. The user is therefore given a choice of evaluation criteria with which to investigate the available image processing functions. All of the available evaluation criteria basically compare the observed results with the expected results. For the image reconstruction processes of registration and compression, the expected results are usually the original data or some selected characteristics of the original data. For classification processes the expected result is the ground truth of the scene. Thus, the comparison process consists of determining what changes occur in processing, where the changes occur, how much change occurs, and the amplitude of the change. The package includes evaluation routines for performing such comparisons as average uncertainty, average information transfer, chi-square statistics, multidimensional histograms, and computation of contingency matrices. This collection of routines is written in FORTRAN IV for batch execution and has been implemented on an IBM 360 computer with a central memory requirement of approximately 662K of 8 bit bytes. This collection of image processing and evaluation routines was developed in 1979.
Statistical Methods for Passive Vehicle Classification in Urban Traffic Surveillance and Control
DOT National Transportation Integrated Search
1980-01-01
A statistical approach to passive vehicle classification using the phase-shift signature from electromagnetic presence-type vehicle detectors is developed with digitized samples of the analog phase-shift signature, the problem of classifying vehicle ...
The search for structure - Object classification in large data sets. [for astronomers
NASA Technical Reports Server (NTRS)
Kurtz, Michael J.
1988-01-01
Research concerning object classifications schemes are reviewed, focusing on large data sets. Classification techniques are discussed, including syntactic, decision theoretic methods, fuzzy techniques, and stochastic and fuzzy grammars. Consideration is given to the automation of MK classification (Morgan and Keenan, 1973) and other problems associated with the classification of spectra. In addition, the classification of galaxies is examined, including the problems of systematic errors, blended objects, galaxy types, and galaxy clusters.
An Examination of Application of Artificial Neural Network in Cognitive Radios
NASA Astrophysics Data System (ADS)
Bello Salau, H.; Onwuka, E. N.; Aibinu, A. M.
2013-12-01
Recent advancement in software radio technology has led to the development of smart device known as cognitive radio. This type of radio fuses powerful techniques taken from artificial intelligence, game theory, wideband/multiple antenna techniques, information theory and statistical signal processing to create an outstanding dynamic behavior. This cognitive radio is utilized in achieving diverse set of applications such as spectrum sensing, radio parameter adaptation and signal classification. This paper contributes by reviewing different cognitive radio implementation that uses artificial intelligence such as the hidden markov models, metaheuristic algorithm and artificial neural networks (ANNs). Furthermore, different areas of application of ANNs and their performance metrics based approach are also examined.
Characteristics and Classification of Least Altered Streamflows in Massachusetts
Armstrong, David S.; Parker, Gene W.; Richards, Todd A.
2008-01-01
Streamflow records from 85 streamflow-gaging stations at which streamflows were considered to be least altered were used to characterize natural streamflows within southern New England. Period-of-record streamflow data were used to determine annual hydrographs of median monthly flows. The shapes and magnitudes of annual hydrographs of median monthly flows, normalized by drainage area, differed among stations in different geographic areas of southern New England. These differences were gradational across southern New England and were attributed to differences in basin and climate characteristics. Period-of-record streamflow data were also used to analyze the statistical properties of daily streamflows at 61 stations across southern New England by using L-moment ratios. An L-moment ratio diagram of L-skewness and L-kurtosis showed a continuous gradation in these properties between stations and indicated differences between base-flow dominated and runoff-dominated rivers. Streamflow records from a concurrent period (1960-2004) for 61 stations were used in a multivariate statistical analysis to develop a hydrologic classification of rivers in southern New England. Missing records from 46 of these stations were extended by using a Maintenance of Variation Extension technique. The concurrent-period streamflows were used in the Indicators of Hydrologic Alteration and Hydrologic Index Tool programs to determine 224 hydrologic indices for the 61 stations. Principal-components analysis (PCA) was used to reduce the number of hydrologic indices to 20 that provided nonredundant information. The PCA also indicated that the major patterns of variability in the dataset are related to differences in flow variability and low-flow magnitude among the stations. Hierarchical cluster analysis was used to classify stations into groups with similar hydrologic properties. The cluster analysis classified rivers in southern New England into two broad groups: (1) base-flow dominated rivers, whose statistical properties indicated less flow variability and high magnitudes of low flow, and (2) runoff-dominated rivers, whose statistical properties indicated greater flow variability and lower magnitudes of low flow. A four-cluster classification further classified the runoff-dominated streams into three groups that varied in gradient, elevation, and differences in winter streamflow conditions: high-gradient runoff-dominated rivers, northern runoff-dominated rivers, and southern runoff-dominated rivers. A nine-cluster division indicated that basin size also becomes a distinguishing factor among basins at finer levels of classification. Smaller basins (less than 10 square miles) were classified into different groups than larger basins. A comparison of station classifications indicated that a classification based on multiple hydrologic indices that represent different aspects of the flow regime did not result in the same classification of stations as a classification based on a single type of statistic such as a monthly median. River basins identified by the cluster analysis as having similar hydrologic properties tended to have similar basin and climate characteristics and to be in close proximity to one another. Stations were not classified in the same cluster on the basis of geographic location alone; as a result, boundaries cannot be drawn between geographic regions with similar streamflow characteristics. Rivers with different basin and climate characteristics were classified in different clusters, even if they were in adjacent basins or upstream and downstream within the same basin.
Oelze, Michael L.; Mamou, Jonathan
2017-01-01
Conventional medical imaging technologies, including ultrasound, have continued to improve over the years. For example, in oncology, medical imaging is characterized by high sensitivity, i.e., the ability to detect anomalous tissue features, but the ability to classify these tissue features from images often lacks specificity. As a result, a large number of biopsies of tissues with suspicious image findings are performed each year with a vast majority of these biopsies resulting in a negative finding. To improve specificity of cancer imaging, quantitative imaging techniques can play an important role. Conventional ultrasound B-mode imaging is mainly qualitative in nature. However, quantitative ultrasound (QUS) imaging can provide specific numbers related to tissue features that can increase the specificity of image findings leading to improvements in diagnostic ultrasound. QUS imaging techniques can encompass a wide variety of techniques including spectral-based parameterization, elastography, shear wave imaging, flow estimation and envelope statistics. Currently, spectral-based parameterization and envelope statistics are not available on most conventional clinical ultrasound machines. However, in recent years QUS techniques involving spectral-based parameterization and envelope statistics have demonstrated success in many applications, providing additional diagnostic capabilities. Spectral-based techniques include the estimation of the backscatter coefficient, estimation of attenuation, and estimation of scatterer properties such as the correlation length associated with an effective scatterer diameter and the effective acoustic concentration of scatterers. Envelope statistics include the estimation of the number density of scatterers and quantification of coherent to incoherent signals produced from the tissue. Challenges for clinical application include correctly accounting for attenuation effects and transmission losses and implementation of QUS on clinical devices. Successful clinical and pre-clinical applications demonstrating the ability of QUS to improve medical diagnostics include characterization of the myocardium during the cardiac cycle, cancer detection, classification of solid tumors and lymph nodes, detection and quantification of fatty liver disease, and monitoring and assessment of therapy. PMID:26761606
Network-based high level data classification.
Silva, Thiago Christiano; Zhao, Liang
2012-06-01
Traditional supervised data classification considers only physical features (e.g., distance or similarity) of the input data. Here, this type of learning is called low level classification. On the other hand, the human (animal) brain performs both low and high orders of learning and it has facility in identifying patterns according to the semantic meaning of the input data. Data classification that considers not only physical attributes but also the pattern formation is, here, referred to as high level classification. In this paper, we propose a hybrid classification technique that combines both types of learning. The low level term can be implemented by any classification technique, while the high level term is realized by the extraction of features of the underlying network constructed from the input data. Thus, the former classifies the test instances by their physical features or class topologies, while the latter measures the compliance of the test instances to the pattern formation of the data. Our study shows that the proposed technique not only can realize classification according to the pattern formation, but also is able to improve the performance of traditional classification techniques. Furthermore, as the class configuration's complexity increases, such as the mixture among different classes, a larger portion of the high level term is required to get correct classification. This feature confirms that the high level classification has a special importance in complex situations of classification. Finally, we show how the proposed technique can be employed in a real-world application, where it is capable of identifying variations and distortions of handwritten digit images. As a result, it supplies an improvement in the overall pattern recognition rate.
NASA Astrophysics Data System (ADS)
Podladchikova, O.; Lefebvre, B.; Krasnoselskikh, V.; Podladchikov, V.
An important task for the problem of coronal heating is to produce reliable evaluation of the statistical properties of energy release and eruptive events such as micro-and nanoflares in the solar corona. Different types of distributions for the peak flux, peak count rate measurements, pixel intensities, total energy flux or emission measures increases or waiting times have appeared in the literature. This raises the question of a precise evaluation and classification of such distributions. For this purpose, we use the method proposed by K. Pearson at the beginning of the last century, based on the relationship between the first 4 moments of the distribution. Pearson's technique encompasses and classifies a broad range of distributions, including some of those which have appeared in the literature about coronal heating. This technique is successfully applied to simulated data from the model of Krasnoselskikh et al. (2002). It allows to provide successful fits to the empirical distributions of the dissipated energy, and to classify them as a function of model parameters such as dissipation mechanisms and threshold.
Bajoub, Aadil; Medina-Rodríguez, Santiago; Gómez-Romero, María; Ajal, El Amine; Bagur-González, María Gracia; Fernández-Gutiérrez, Alberto; Carrasco-Pancorbo, Alegría
2017-01-15
High Performance Liquid Chromatography (HPLC) with diode array (DAD) and fluorescence (FLD) detection was used to acquire the fingerprints of the phenolic fraction of monovarietal extra-virgin olive oils (extra-VOOs) collected over three consecutive crop seasons (2011/2012-2013/2014). The chromatographic fingerprints of 140 extra-VOO samples processed from olive fruits of seven olive varieties, were recorded and statistically treated for varietal authentication purposes. First, DAD and FLD chromatographic-fingerprint datasets were separately processed and, subsequently, were joined using "Low-level" and "Mid-Level" data fusion methods. After the preliminary examination by principal component analysis (PCA), three supervised pattern recognition techniques, Partial Least Squares Discriminant Analysis (PLS-DA), Soft Independent Modeling of Class Analogies (SIMCA) and K-Nearest Neighbors (k-NN) were applied to the four chromatographic-fingerprinting matrices. The classification models built were very sensitive and selective, showing considerably good recognition and prediction abilities. The combination "chromatographic dataset+chemometric technique" allowing the most accurate classification for each monovarietal extra-VOO was highlighted. Copyright © 2016 Elsevier Ltd. All rights reserved.
A Categorization of Dynamic Analyzers
NASA Technical Reports Server (NTRS)
Lujan, Michelle R.
1997-01-01
Program analysis techniques and tools are essential to the development process because of the support they provide in detecting errors and deficiencies at different phases of development. The types of information rendered through analysis includes the following: statistical measurements of code, type checks, dataflow analysis, consistency checks, test data,verification of code, and debugging information. Analyzers can be broken into two major categories: dynamic and static. Static analyzers examine programs with respect to syntax errors and structural properties., This includes gathering statistical information on program content, such as the number of lines of executable code, source lines. and cyclomatic complexity. In addition, static analyzers provide the ability to check for the consistency of programs with respect to variables. Dynamic analyzers in contrast are dependent on input and the execution of a program providing the ability to find errors that cannot be detected through the use of static analysis alone. Dynamic analysis provides information on the behavior of a program rather than on the syntax. Both types of analysis detect errors in a program, but dynamic analyzers accomplish this through run-time behavior. This paper focuses on the following broad classification of dynamic analyzers: 1) Metrics; 2) Models; and 3) Monitors. Metrics are those analyzers that provide measurement. The next category, models, captures those analyzers that present the state of the program to the user at specified points in time. The last category, monitors, checks specified code based on some criteria. The paper discusses each classification and the techniques that are included under them. In addition, the role of each technique in the software life cycle is discussed. Familiarization with the tools that measure, model and monitor programs provides a framework for understanding the program's dynamic behavior from different, perspectives through analysis of the input/output data.
van der Ploeg, Tjeerd; Nieboer, Daan; Steyerberg, Ewout W
2016-10-01
Prediction of medical outcomes may potentially benefit from using modern statistical modeling techniques. We aimed to externally validate modeling strategies for prediction of 6-month mortality of patients suffering from traumatic brain injury (TBI) with predictor sets of increasing complexity. We analyzed individual patient data from 15 different studies including 11,026 TBI patients. We consecutively considered a core set of predictors (age, motor score, and pupillary reactivity), an extended set with computed tomography scan characteristics, and a further extension with two laboratory measurements (glucose and hemoglobin). With each of these sets, we predicted 6-month mortality using default settings with five statistical modeling techniques: logistic regression (LR), classification and regression trees, random forests (RFs), support vector machines (SVM) and neural nets. For external validation, a model developed on one of the 15 data sets was applied to each of the 14 remaining sets. This process was repeated 15 times for a total of 630 validations. The area under the receiver operating characteristic curve (AUC) was used to assess the discriminative ability of the models. For the most complex predictor set, the LR models performed best (median validated AUC value, 0.757), followed by RF and support vector machine models (median validated AUC value, 0.735 and 0.732, respectively). With each predictor set, the classification and regression trees models showed poor performance (median validated AUC value, <0.7). The variability in performance across the studies was smallest for the RF- and LR-based models (inter quartile range for validated AUC values from 0.07 to 0.10). In the area of predicting mortality from TBI, nonlinear and nonadditive effects are not pronounced enough to make modern prediction methods beneficial. Copyright © 2016 Elsevier Inc. All rights reserved.
Machado, Daniel Gonçalves; da Cruz Cerqueira, Sergio Auto; de Lima, Alexandre Fernandes; de Mathias, Marcelo Bezerra; Aramburu, José Paulo Gabbi; Rodarte, Rodrigo Ribeiro Pinho
2016-01-01
Objective The objective of this study was to evaluate the current classifications for fractures of the distal extremity of the radius, since the classifications made using traditional radiographs in anteroposterior and lateral views have been questioned regarding their reproducibility. In the literature, it has been suggested that other options are needed, such as use of preoperative radiographs on fractures of the distal radius subjected to traction, with stratification by the evaluators. The aim was to demonstrate which classification systems present better statistical reliability. Results In the Universal classification, the results from the third-year resident group (R3) and from the group of more experienced evaluators (Staff) presented excellent correlation, with a statistically significant p-value (p < 0.05). Neither of the groups presented a statistically significant result through the Frykman classification. In the AO classification, there were high correlations in the R3 and Staff groups (respectively 0.950 and 0.800), with p-values lower than 0.05 (respectively <0.001 and 0.003). Conclusion It can be concluded that radiographs performed under traction showed good concordance in the Staff group and in the R3 group, and that this is a good tactic for radiographic evaluations of fractures of the distal extremity of the radius. PMID:26962498
Wu, Allison Chia-Yi; Rifkin, Scott A
2015-03-27
Recent techniques for tagging and visualizing single molecules in fixed or living organisms and cell lines have been revolutionizing our understanding of the spatial and temporal dynamics of fundamental biological processes. However, fluorescence microscopy images are often noisy, and it can be difficult to distinguish a fluorescently labeled single molecule from background speckle. We present a computational pipeline to distinguish the true signal of fluorescently labeled molecules from background fluorescence and noise. We test our technique using the challenging case of wide-field, epifluorescence microscope image stacks from single molecule fluorescence in situ experiments on nematode embryos where there can be substantial out-of-focus light and structured noise. The software recognizes and classifies individual mRNA spots by measuring several features of local intensity maxima and classifying them with a supervised random forest classifier. A key innovation of this software is that, by estimating the probability that each local maximum is a true spot in a statistically principled way, it makes it possible to estimate the error introduced by image classification. This can be used to assess the quality of the data and to estimate a confidence interval for the molecule count estimate, all of which are important for quantitative interpretations of the results of single-molecule experiments. The software classifies spots in these images well, with >95% AUROC on realistic artificial data and outperforms other commonly used techniques on challenging real data. Its interval estimates provide a unique measure of the quality of an image and confidence in the classification.
NASA Technical Reports Server (NTRS)
Sabol, Donald E., Jr.; Roberts, Dar A.; Adams, John B.; Smith, Milton O.
1993-01-01
An important application of remote sensing is to map and monitor changes over large areas of the land surface. This is particularly significant with the current interest in monitoring vegetation communities. Most of traditional methods for mapping different types of plant communities are based upon statistical classification techniques (i.e., parallel piped, nearest-neighbor, etc.) applied to uncalibrated multispectral data. Classes from these techniques are typically difficult to interpret (particularly to a field ecologist/botanist). Also, classes derived for one image can be very different from those derived from another image of the same area, making interpretation of observed temporal changes nearly impossible. More recently, neural networks have been applied to classification. Neural network classification, based upon spectral matching, is weak in dealing with spectral mixtures (a condition prevalent in images of natural surfaces). Another approach to mapping vegetation communities is based on spectral mixture analysis, which can provide a consistent framework for image interpretation. Roberts et al. (1990) mapped vegetation using the band residuals from a simple mixing model (the same spectral endmembers applied to all image pixels). Sabol et al. (1992b) and Roberts et al. (1992) used different methods to apply the most appropriate spectral endmembers to each image pixel, thereby allowing mapping of vegetation based upon the the different endmember spectra. In this paper, we describe a new approach to classification of vegetation communities based upon the spectra fractions derived from spectral mixture analysis. This approach was applied to three 1992 AVIRIS images of Jasper Ridge, California to observe seasonal changes in surface composition.
Classification of the Regional Ionospheric Disturbance Based on Machine Learning Techniques
NASA Astrophysics Data System (ADS)
Terzi, Merve Begum; Arikan, Orhan; Karatay, Secil; Arikan, Feza; Gulyaeva, Tamara
2016-08-01
In this study, Total Electron Content (TEC) estimated from GPS receivers is used to model the regional and local variability that differs from global activity along with solar and geomagnetic indices. For the automated classification of regional disturbances, a classification technique based on a robust machine learning technique that have found wide spread use, Support Vector Machine (SVM) is proposed. Performance of developed classification technique is demonstrated for midlatitude ionosphere over Anatolia using TEC estimates generated from GPS data provided by Turkish National Permanent GPS Network (TNPGN-Active) for solar maximum year of 2011. As a result of implementing developed classification technique to Global Ionospheric Map (GIM) TEC data, which is provided by the NASA Jet Propulsion Laboratory (JPL), it is shown that SVM can be a suitable learning method to detect anomalies in TEC variations.
Machine learning modelling for predicting soil liquefaction susceptibility
NASA Astrophysics Data System (ADS)
Samui, P.; Sitharam, T. G.
2011-01-01
This study describes two machine learning techniques applied to predict liquefaction susceptibility of soil based on the standard penetration test (SPT) data from the 1999 Chi-Chi, Taiwan earthquake. The first machine learning technique which uses Artificial Neural Network (ANN) based on multi-layer perceptions (MLP) that are trained with Levenberg-Marquardt backpropagation algorithm. The second machine learning technique uses the Support Vector machine (SVM) that is firmly based on the theory of statistical learning theory, uses classification technique. ANN and SVM have been developed to predict liquefaction susceptibility using corrected SPT [(N1)60] and cyclic stress ratio (CSR). Further, an attempt has been made to simplify the models, requiring only the two parameters [(N1)60 and peck ground acceleration (amax/g)], for the prediction of liquefaction susceptibility. The developed ANN and SVM models have also been applied to different case histories available globally. The paper also highlights the capability of the SVM over the ANN models.
McLaughlin, Eamon J; Cunningham, Michael J; Kazahaya, Ken; Hsing, Julianna; Kawai, Kosuke; Adil, Eelam A
2016-06-01
To evaluate the feasibility of radiofrequency surgical instrumentation for endoscopic resection of juvenile nasopharyngeal angiofibroma (JNA) and to test the hypothesis that endoscopic radiofrequency ablation-assisted (RFA) resection will have superior intraoperative and/or postoperative outcomes as compared with traditional endoscopic (TE) resection techniques. Case series with chart review. Two tertiary care pediatric hospitals. Twenty-nine pediatric patients who underwent endoscopic transnasal resection of JNA from January 2000 to December 2014. Twenty-nine patients underwent RFA (n = 13) or TE (n = 16) JNA resection over the 15-year study period. Mean patient age was not statistically different between the 2 groups (P = .41); neither was their University of Pittsburgh Medical Center classification stage (P = .79). All patients underwent preoperative embolization. Mean operative times were not statistically different (P = .29). Mean intraoperative blood loss and the need for a transfusion were also not statistically different (P = .27 and .47, respectively). Length of hospital stay was not statistically different (P = .46). Recurrence rates did not differ between groups (P = .99) over a mean follow-up period of 2.3 years. There were no significant differences between RFA and TE resection in intraoperative or postoperative outcome parameters. © American Academy of Otolaryngology—Head and Neck Surgery Foundation 2016.
NASA Astrophysics Data System (ADS)
Yao, Sen; Li, Tao; Li, JieQing; Liu, HongGao; Wang, YuanZhong
2018-06-01
Boletus griseus and Boletus edulis are two well-known wild-grown edible mushrooms which have high nutrition, delicious flavor and high economic value distributing in Yunnan Province. In this study, a rapid method using Fourier transform infrared (FT-IR) and ultraviolet (UV) spectroscopies coupled with data fusion was established for the discrimination of Boletus mushrooms from seven different geographical origins with pattern recognition method. Initially, the spectra of 332 mushroom samples obtained from the two spectroscopic techniques were analyzed individually and then the classification performance based on data fusion strategy was investigated. Meanwhile, the latent variables (LVs) of FT-IR and UV spectra were extracted by partial least square discriminant analysis (PLS-DA) and two datasets were concatenated into a new matrix for data fusion. Then, the fusion matrix was further analyzed by support vector machine (SVM). Compared with single spectroscopic technique, data fusion strategy can improve the classification performance effectively. In particular, the accuracy of correct classification of SVM model in training and test sets were 99.10% and 100.00%, respectively. The results demonstrated that data fusion of FT-IR and UV spectra can provide higher synergic effect for the discrimination of different geographical origins of Boletus mushrooms, which may be benefit for further authentication and quality assessment of edible mushrooms.
Yao, Sen; Li, Tao; Li, JieQing; Liu, HongGao; Wang, YuanZhong
2018-06-05
Boletus griseus and Boletus edulis are two well-known wild-grown edible mushrooms which have high nutrition, delicious flavor and high economic value distributing in Yunnan Province. In this study, a rapid method using Fourier transform infrared (FT-IR) and ultraviolet (UV) spectroscopies coupled with data fusion was established for the discrimination of Boletus mushrooms from seven different geographical origins with pattern recognition method. Initially, the spectra of 332 mushroom samples obtained from the two spectroscopic techniques were analyzed individually and then the classification performance based on data fusion strategy was investigated. Meanwhile, the latent variables (LVs) of FT-IR and UV spectra were extracted by partial least square discriminant analysis (PLS-DA) and two datasets were concatenated into a new matrix for data fusion. Then, the fusion matrix was further analyzed by support vector machine (SVM). Compared with single spectroscopic technique, data fusion strategy can improve the classification performance effectively. In particular, the accuracy of correct classification of SVM model in training and test sets were 99.10% and 100.00%, respectively. The results demonstrated that data fusion of FT-IR and UV spectra can provide higher synergic effect for the discrimination of different geographical origins of Boletus mushrooms, which may be benefit for further authentication and quality assessment of edible mushrooms. Copyright © 2018 Elsevier B.V. All rights reserved.
Using Context Variety and Students' Discussions in Recognizing Statistical Situations
ERIC Educational Resources Information Center
Silva, José Luis Ángel Rodríguez; Aguilar, Mario Sánchez
2016-01-01
We present a proposal for helping students to cope with statistical word problems related to the classification of different cases of confidence intervals. The proposal promotes an environment where students can explicitly discuss the reasons underlying their classification of cases.
Training site statistics from Landsat and Seasat satellite imagery registered to a common map base
NASA Technical Reports Server (NTRS)
Clark, J.
1981-01-01
Landsat and Seasat satellite imagery and training site boundary coordinates were registered to a common Universal Transverse Mercator map base in the Newport Beach area of Orange County, California. The purpose was to establish a spatially-registered, multi-sensor data base which would test the use of Seasat synthetic aperture radar imagery to improve spectral separability of channels used for land use classification of an urban area. Digital image processing techniques originally developed for the digital mosaics of the California Desert and the State of Arizona were adapted to spatially register multispectral and radar data. Techniques included control point selection from imagery and USGS topographic quadrangle maps, control point cataloguing with the Image Based Information System, and spatial and spectral rectifications of the imagery. The radar imagery was pre-processed to reduce its tendency toward uniform data distributions, so that training site statistics for selected Landsat and pre-processed Seasat imagery indicated good spectral separation between channels.
Recognition of speaker-dependent continuous speech with KEAL
NASA Astrophysics Data System (ADS)
Mercier, G.; Bigorgne, D.; Miclet, L.; Le Guennec, L.; Querre, M.
1989-04-01
A description of the speaker-dependent continuous speech recognition system KEAL is given. An unknown utterance, is recognized by means of the followng procedures: acoustic analysis, phonetic segmentation and identification, word and sentence analysis. The combination of feature-based, speaker-independent coarse phonetic segmentation with speaker-dependent statistical classification techniques is one of the main design features of the acoustic-phonetic decoder. The lexical access component is essentially based on a statistical dynamic programming technique which aims at matching a phonemic lexical entry containing various phonological forms, against a phonetic lattice. Sentence recognition is achieved by use of a context-free grammar and a parsing algorithm derived from Earley's parser. A speaker adaptation module allows some of the system parameters to be adjusted by matching known utterances with their acoustical representation. The task to be performed, described by its vocabulary and its grammar, is given as a parameter of the system. Continuously spoken sentences extracted from a 'pseudo-Logo' language are analyzed and results are presented.
Automatic tissue characterization from ultrasound imagery
NASA Astrophysics Data System (ADS)
Kadah, Yasser M.; Farag, Aly A.; Youssef, Abou-Bakr M.; Badawi, Ahmed M.
1993-08-01
In this work, feature extraction algorithms are proposed to extract the tissue characterization parameters from liver images. Then the resulting parameter set is further processed to obtain the minimum number of parameters representing the most discriminating pattern space for classification. This preprocessing step was applied to over 120 pathology-investigated cases to obtain the learning data for designing the classifier. The extracted features are divided into independent training and test sets and are used to construct both statistical and neural classifiers. The optimal criteria for these classifiers are set to have minimum error, ease of implementation and learning, and the flexibility for future modifications. Various algorithms for implementing various classification techniques are presented and tested on the data. The best performance was obtained using a single layer tensor model functional link network. Also, the voting k-nearest neighbor classifier provided comparably good diagnostic rates.
Forest inventory using multistage sampling with probability proportional to size. [Brazil
NASA Technical Reports Server (NTRS)
Parada, N. D. J. (Principal Investigator); Lee, D. C. L.; Hernandezfilho, P.; Shimabukuro, Y. E.; Deassis, O. R.; Demedeiros, J. S.
1984-01-01
A multistage sampling technique, with probability proportional to size, for forest volume inventory using remote sensing data is developed and evaluated. The study area is located in the Southeastern Brazil. The LANDSAT 4 digital data of the study area are used in the first stage for automatic classification of reforested areas. Four classes of pine and eucalypt with different tree volumes are classified utilizing a maximum likelihood classification algorithm. Color infrared aerial photographs are utilized in the second stage of sampling. In the third state (ground level) the time volume of each class is determined. The total time volume of each class is expanded through a statistical procedure taking into account all the three stages of sampling. This procedure results in an accurate time volume estimate with a smaller number of aerial photographs and reduced time in field work.
Multispectral scanner system parameter study and analysis software system description, volume 2
NASA Technical Reports Server (NTRS)
Landgrebe, D. A. (Principal Investigator); Mobasseri, B. G.; Wiersma, D. J.; Wiswell, E. R.; Mcgillem, C. D.; Anuta, P. E.
1978-01-01
The author has identified the following significant results. The integration of the available methods provided the analyst with the unified scanner analysis package (USAP), the flexibility and versatility of which was superior to many previous integrated techniques. The USAP consisted of three main subsystems; (1) a spatial path, (2) a spectral path, and (3) a set of analytic classification accuracy estimators which evaluated the system performance. The spatial path consisted of satellite and/or aircraft data, data correlation analyzer, scanner IFOV, and random noise model. The output of the spatial path was fed into the analytic classification and accuracy predictor. The spectral path consisted of laboratory and/or field spectral data, EXOSYS data retrieval, optimum spectral function calculation, data transformation, and statistics calculation. The output of the spectral path was fended into the stratified posterior performance estimator.
Comparative analysis of classification based algorithms for diabetes diagnosis using iris images.
Samant, Piyush; Agarwal, Ravinder
2018-01-01
Photo-diagnosis is always an intriguing area for the researchers, with the advancement of image processing and computer machine vision techniques it have become more reliable and popular in recent years. The objective of this paper is to study the change in the features of iris, particularly irregularities in the pigmentation of certain areas of the iris with respect to diabetic health of an individual. Apart from the point that iris recognition concentrates on the overall structure of the iris, diagnostic techniques emphasises the local variations in the particular area of iris. Pre-image processing techniques have been applied to extract iris and thereafter, region of interest from the extracted iris have been cropped out. In order to observe the changes in the tissue pigmentation of region of interest, statistical, texture textural and wavelet features have been extracted. At the end, a comparison of accuracies of five different classifiers has been presented to classify two subject groups of diabetic and non-diabetic. Best classification accuracy has been calculated as 89.66% by the random forest classifier. Results have been shown the effectiveness and diagnostic significance of the proposed methodology. Presented piece of work offers a novel systemic perspective of non-invasive and automatic diabetic diagnosis.
14 CFR Sec. 19-4 - Service classes.
Code of Federal Regulations, 2010 CFR
2010-01-01
... a composite of first class, coach, and mixed passenger/cargo service. The following classifications... integral part of services performed pursuant to published flight schedules. The following classifications... Classifications Sec. 19-4 Service classes. The statistical classifications are designed to reflect the operating...
14 CFR Sec. 19-4 - Service classes.
Code of Federal Regulations, 2011 CFR
2011-01-01
... a composite of first class, coach, and mixed passenger/cargo service. The following classifications... integral part of services performed pursuant to published flight schedules. The following classifications... Classifications Sec. 19-4 Service classes. The statistical classifications are designed to reflect the operating...
NASA Astrophysics Data System (ADS)
Prakash, A.; Haselwimmer, C. E.; Gens, R.; Womble, J. N.; Ver Hoef, J.
2013-12-01
Tidewater glaciers are prominent landscape features that play a significant role in landscape and ecosystem processes along the southeastern and southcentral coasts of Alaska. Tidewater glaciers calve large icebergs that serve as an important substrate for harbor seals (Phoca vitulina richardii) for resting, pupping, nursing young, molting, and avoiding predators. Many of the tidewater glaciers in Alaska are retreating, which may influence harbor seal populations. Our objectives are to investigate the relationship between ice conditions and harbor seal distributions, which are poorly understood, in John's Hopkins Inlet, Glacier Bay National Park, Alaska, using a combination of airborne remote sensing and statistical modeling techniques. We present an overview of some results from Object-Based Image Analysis (OBIA) for classification of a time series of very high spatial resolution (4 cm pixels) airborne imagery acquired over John's Hopkins Inlet during the harbor seal pupping season in June and during the molting season in August from 2007 - 2012. Using OBIA we have developed a workflow to automate processing of the large volumes (~1250 images/survey) of airborne visible imagery for 1) classification of ice products (e.g. percent ice cover, percent brash ice, percent ice bergs) at a range of scales, and 2) quantitative determination of ice morphological properties such as iceberg size, roundness, and texture that are not found in traditional per-pixel classification approaches. These ice classifications and morphological variables are then used in statistical models to assess relationships with harbor seal abundance and distribution. Ultimately, understanding these relationships may provide novel perspectives on the spatial and temporal variation of harbor seals in tidewater glacial fjords.
Pointwise probability reinforcements for robust statistical inference.
Frénay, Benoît; Verleysen, Michel
2014-02-01
Statistical inference using machine learning techniques may be difficult with small datasets because of abnormally frequent data (AFDs). AFDs are observations that are much more frequent in the training sample that they should be, with respect to their theoretical probability, and include e.g. outliers. Estimates of parameters tend to be biased towards models which support such data. This paper proposes to introduce pointwise probability reinforcements (PPRs): the probability of each observation is reinforced by a PPR and a regularisation allows controlling the amount of reinforcement which compensates for AFDs. The proposed solution is very generic, since it can be used to robustify any statistical inference method which can be formulated as a likelihood maximisation. Experiments show that PPRs can be easily used to tackle regression, classification and projection: models are freed from the influence of outliers. Moreover, outliers can be filtered manually since an abnormality degree is obtained for each observation. Copyright © 2013 Elsevier Ltd. All rights reserved.
NASA Technical Reports Server (NTRS)
Forbes, G. S.; Pielke, R. A.
1985-01-01
Various empirical and statistical weather-forecasting studies which utilize stratification by weather regime are described. Objective classification was used to determine weather regime in some studies. In other cases the weather pattern was determined on the basis of a parameter representing the physical and dynamical processes relevant to the anticipated mesoscale phenomena, such as low level moisture convergence and convective precipitation, or the Froude number and the occurrence of cold-air damming. For mesoscale phenomena already in existence, new forecasting techniques were developed. The use of cloud models in operational forecasting is discussed. Models to calculate the spatial scales of forcings and resultant response for mesoscale systems are presented. The use of these models to represent the climatologically most prevalent systems, and to perform case-by-case simulations is reviewed. Operational implementation of mesoscale data into weather forecasts, using both actual simulation output and method-output statistics is discussed.
Using ontology network structure in text mining.
Berndt, Donald J; McCart, James A; Luther, Stephen L
2010-11-13
Statistical text mining treats documents as bags of words, with a focus on term frequencies within documents and across document collections. Unlike natural language processing (NLP) techniques that rely on an engineered vocabulary or a full-featured ontology, statistical approaches do not make use of domain-specific knowledge. The freedom from biases can be an advantage, but at the cost of ignoring potentially valuable knowledge. The approach proposed here investigates a hybrid strategy based on computing graph measures of term importance over an entire ontology and injecting the measures into the statistical text mining process. As a starting point, we adapt existing search engine algorithms such as PageRank and HITS to determine term importance within an ontology graph. The graph-theoretic approach is evaluated using a smoking data set from the i2b2 National Center for Biomedical Computing, cast as a simple binary classification task for categorizing smoking-related documents, demonstrating consistent improvements in accuracy.
Classification of earth terrain using polarimetric synthetic aperture radar images
NASA Technical Reports Server (NTRS)
Lim, H. H.; Swartz, A. A.; Yueh, H. A.; Kong, J. A.; Shin, R. T.; Van Zyl, J. J.
1989-01-01
Supervised and unsupervised classification techniques are developed and used to classify the earth terrain components from SAR polarimetric images of San Francisco Bay and Traverse City, Michigan. The supervised techniques include the Bayes classifiers, normalized polarimetric classification, and simple feature classification using discriminates such as the absolute and normalized magnitude response of individual receiver channel returns and the phase difference between receiver channels. An algorithm is developed as an unsupervised technique which classifies terrain elements based on the relationship between the orientation angle and the handedness of the transmitting and receiving polariation states. It is found that supervised classification produces the best results when accurate classifier training data are used, while unsupervised classification may be applied when training data are not available.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Getman, Daniel J
2008-01-01
Many attempts to observe changes in terrestrial systems over time would be significantly enhanced if it were possible to improve the accuracy of classifications of low-resolution historic satellite data. In an effort to examine improving the accuracy of historic satellite image classification by combining satellite and air photo data, two experiments were undertaken in which low-resolution multispectral data and high-resolution panchromatic data were combined and then classified using the ECHO spectral-spatial image classification algorithm and the Maximum Likelihood technique. The multispectral data consisted of 6 multispectral channels (30-meter pixel resolution) from Landsat 7. These data were augmented with panchromatic datamore » (15m pixel resolution) from Landsat 7 in the first experiment, and with a mosaic of digital aerial photography (1m pixel resolution) in the second. The addition of the Landsat 7 panchromatic data provided a significant improvement in the accuracy of classifications made using the ECHO algorithm. Although the inclusion of aerial photography provided an improvement in accuracy, this improvement was only statistically significant at a 40-60% level. These results suggest that once error levels associated with combining aerial photography and multispectral satellite data are reduced, this approach has the potential to significantly enhance the precision and accuracy of classifications made using historic remotely sensed data, as a way to extend the time range of efforts to track temporal changes in terrestrial systems.« less
A Comprehensive Study of Retinal Vessel Classification Methods in Fundus Images
Miri, Maliheh; Amini, Zahra; Rabbani, Hossein; Kafieh, Raheleh
2017-01-01
Nowadays, it is obvious that there is a relationship between changes in the retinal vessel structure and diseases such as diabetic, hypertension, stroke, and the other cardiovascular diseases in adults as well as retinopathy of prematurity in infants. Retinal fundus images provide non-invasive visualization of the retinal vessel structure. Applying image processing techniques in the study of digital color fundus photographs and analyzing their vasculature is a reliable approach for early diagnosis of the aforementioned diseases. Reduction in the arteriolar–venular ratio of retina is one of the primary signs of hypertension, diabetic, and cardiovascular diseases which can be calculated by analyzing the fundus images. To achieve a precise measuring of this parameter and meaningful diagnostic results, accurate classification of arteries and veins is necessary. Classification of vessels in fundus images faces with some challenges that make it difficult. In this paper, a comprehensive study of the proposed methods for classification of arteries and veins in fundus images is presented. Considering that these methods are evaluated on different datasets and use different evaluation criteria, it is not possible to conduct a fair comparison of their performance. Therefore, we evaluate the classification methods from modeling perspective. This analysis reveals that most of the proposed approaches have focused on statistics, and geometric models in spatial domain and transform domain models have received less attention. This could suggest the possibility of using transform models, especially data adaptive ones, for modeling of the fundus images in future classification approaches. PMID:28553578
DOE Office of Scientific and Technical Information (OSTI.GOV)
Masci, Frank J.; Grillmair, Carl J.; Cutri, Roc M.
2014-07-01
We describe a methodology to classify periodic variable stars identified using photometric time-series measurements constructed from the Wide-field Infrared Survey Explorer (WISE) full-mission single-exposure Source Databases. This will assist in the future construction of a WISE Variable Source Database that assigns variables to specific science classes as constrained by the WISE observing cadence with statistically meaningful classification probabilities. We have analyzed the WISE light curves of 8273 variable stars identified in previous optical variability surveys (MACHO, GCVS, and ASAS) and show that Fourier decomposition techniques can be extended into the mid-IR to assist with their classification. Combined with other periodicmore » light-curve features, this sample is then used to train a machine-learned classifier based on the random forest (RF) method. Consistent with previous classification studies of variable stars in general, the RF machine-learned classifier is superior to other methods in terms of accuracy, robustness against outliers, and relative immunity to features that carry little or redundant class information. For the three most common classes identified by WISE: Algols, RR Lyrae, and W Ursae Majoris type variables, we obtain classification efficiencies of 80.7%, 82.7%, and 84.5% respectively using cross-validation analyses, with 95% confidence intervals of approximately ±2%. These accuracies are achieved at purity (or reliability) levels of 88.5%, 96.2%, and 87.8% respectively, similar to that achieved in previous automated classification studies of periodic variable stars.« less
NASA Astrophysics Data System (ADS)
Zavaletta, Vanessa A.; Bartholmai, Brian J.; Robb, Richard A.
2007-03-01
Diffuse lung diseases, such as idiopathic pulmonary fibrosis (IPF), can be characterized and quantified by analysis of volumetric high resolution CT scans of the lungs. These data sets typically have dimensions of 512 x 512 x 400. It is too subjective and labor intensive for a radiologist to analyze each slice and quantify regional abnormalities manually. Thus, computer aided techniques are necessary, particularly texture analysis techniques which classify various lung tissue types. Second and higher order statistics which relate the spatial variation of the intensity values are good discriminatory features for various textures. The intensity values in lung CT scans range between [-1024, 1024]. Calculation of second order statistics on this range is too computationally intensive so the data is typically binned between 16 or 32 gray levels. There are more effective ways of binning the gray level range to improve classification. An optimal and very efficient way to nonlinearly bin the histogram is to use a dynamic programming algorithm. The objective of this paper is to show that nonlinear binning using dynamic programming is computationally efficient and improves the discriminatory power of the second and higher order statistics for more accurate quantification of diffuse lung disease.
Sun, Min; Wong, David; Kronenfeld, Barry
2016-01-01
Despite conceptual and technology advancements in cartography over the decades, choropleth map design and classification fail to address a fundamental issue: estimates that are statistically indifferent may be assigned to different classes on maps or vice versa. Recently, the class separability concept was introduced as a map classification criterion to evaluate the likelihood that estimates in two classes are statistical different. Unfortunately, choropleth maps created according to the separability criterion usually have highly unbalanced classes. To produce reasonably separable but more balanced classes, we propose a heuristic classification approach to consider not just the class separability criterion but also other classification criteria such as evenness and intra-class variability. A geovisual-analytic package was developed to support the heuristic mapping process to evaluate the trade-off between relevant criteria and to select the most preferable classification. Class break values can be adjusted to improve the performance of a classification. PMID:28286426
Discriminant forest classification method and system
Chen, Barry Y.; Hanley, William G.; Lemmond, Tracy D.; Hiller, Lawrence J.; Knapp, David A.; Mugge, Marshall J.
2012-11-06
A hybrid machine learning methodology and system for classification that combines classical random forest (RF) methodology with discriminant analysis (DA) techniques to provide enhanced classification capability. A DA technique which uses feature measurements of an object to predict its class membership, such as linear discriminant analysis (LDA) or Andersen-Bahadur linear discriminant technique (AB), is used to split the data at each node in each of its classification trees to train and grow the trees and the forest. When training is finished, a set of n DA-based decision trees of a discriminant forest is produced for use in predicting the classification of new samples of unknown class.
Forster, H.-J.; Davis, J.C.; Tischendorf, G.; Seltmann, R.
1999-01-01
High-precision major, minor and trace element analyses for 44 elements have been made of 329 Late Variscan granitic and rhyolitic rocks from the Erzgebirge metallogenic province of Germany. The intrusive histories of some of these granites are not completely understood and exposures of rock are not adequate to resolve relationships between what apparently are different plutons. Therefore, it is necessary to turn to chemical analyses to decipher the evolution of the plutons and their relationships. A new classification of Erzgebirge plutons into five major groups of granites, based on petrologic interpretations of geochemical and mineralogical relationships (low-F biotite granites; low-F two-mica granites; high-F, high-P2O5 Li-mica granites; high-F, low-P2O5 Li-mica granites; high-F, low-P2O5 biotite granites) was tested by multivariate techniques. Canonical analyses of major elements, minor elements, trace elements and ratio variables all distinguish the groups with differing amounts of success. Univariate ANOVA's, in combination with forward-stepwise and backward-elimination canonical analyses, were used to select ten variables which were most effective in distinguishing groups. In a biplot, groups form distinct clusters roughly arranged along a quadratic path. Within groups, individual plutons tend to be arranged in patterns possibly reflecting granitic evolution. Canonical functions were used to classify samples of rhyolites of unknown association into the five groups. Another canonical analysis was based on ten elements traditionally used in petrology and which were important in the new classification of granites. Their biplot pattern is similar to that from statistically chosen variables but less effective at distinguishing the five groups of granites. This study shows that multivariate statistical techniques can provide significant insight into problems of granitic petrogenesis and may be superior to conventional procedures for petrological interpretation.
Contextual classification of multispectral image data: Approximate algorithm
NASA Technical Reports Server (NTRS)
Tilton, J. C. (Principal Investigator)
1980-01-01
An approximation to a classification algorithm incorporating spatial context information in a general, statistical manner is presented which is computationally less intensive. Classifications that are nearly as accurate are produced.
NASA Astrophysics Data System (ADS)
Broderick, Ciaran; Fealy, Rowan
2013-04-01
Circulation type classifications (CTCs) compiled as part of the COST733 Action, entitled 'Harmonisation and Application of Weather Type Classifications for European Regions', are examined for their synoptic and climatological applicability to Ireland based on their ability to characterise surface temperature and precipitation. In all 16 different objective classification schemes, representative of four different methodological approaches to circulation typing (optimization algorithms, threshold based methods, eigenvector techniques and leader algorithms) are considered. Several statistical metrics which variously quantify the ability of CTCs to discretize daily data into well-defined homogeneous groups are used to evaluate and compare different approaches to synoptic typing. The records from 14 meteorological stations located across the island of Ireland are used in the study. The results indicate that while it was not possible to identify a single optimum classification or approach to circulation typing - conditional on the location and surface variables considered - a number of general assertions regarding the performance of different schemes can be made. The findings for surface temperature indicate that that those classifications based on predefined thresholds (e.g. Litynski, GrossWetterTypes and original Lamb Weather Type) perform well, as do the Kruizinga and Lund classification schemes. Similarly for precipitation predefined type classifications return high skill scores, as do those classifications derived using some optimization procedure (e.g. SANDRA, Self Organizing Maps and K-Means clustering). For both temperature and precipitation the results generally indicate that the classifications perform best for the winter season - reflecting the closer coupling between large-scale circulation and surface conditions during this period. In contrast to the findings for temperature, spatial patterns in the performance of classifications were more evident for precipitation. In the case of this variable those more westerly synoptic stations open to zonal airflow and less influenced by regional scale forcings generally exhibited a stronger link with large-scale circulation.
Salari, Nader; Shohaimi, Shamarina; Najafi, Farid; Nallappan, Meenakshii; Karishnarajah, Isthrinayagy
2014-01-01
Among numerous artificial intelligence approaches, k-Nearest Neighbor algorithms, genetic algorithms, and artificial neural networks are considered as the most common and effective methods in classification problems in numerous studies. In the present study, the results of the implementation of a novel hybrid feature selection-classification model using the above mentioned methods are presented. The purpose is benefitting from the synergies obtained from combining these technologies for the development of classification models. Such a combination creates an opportunity to invest in the strength of each algorithm, and is an approach to make up for their deficiencies. To develop proposed model, with the aim of obtaining the best array of features, first, feature ranking techniques such as the Fisher's discriminant ratio and class separability criteria were used to prioritize features. Second, the obtained results that included arrays of the top-ranked features were used as the initial population of a genetic algorithm to produce optimum arrays of features. Third, using a modified k-Nearest Neighbor method as well as an improved method of backpropagation neural networks, the classification process was advanced based on optimum arrays of the features selected by genetic algorithms. The performance of the proposed model was compared with thirteen well-known classification models based on seven datasets. Furthermore, the statistical analysis was performed using the Friedman test followed by post-hoc tests. The experimental findings indicated that the novel proposed hybrid model resulted in significantly better classification performance compared with all 13 classification methods. Finally, the performance results of the proposed model was benchmarked against the best ones reported as the state-of-the-art classifiers in terms of classification accuracy for the same data sets. The substantial findings of the comprehensive comparative study revealed that performance of the proposed model in terms of classification accuracy is desirable, promising, and competitive to the existing state-of-the-art classification models. PMID:25419659
Theory of Image Analysis and Recognition.
1983-01-24
Stanley M. Dunn, "Texture Classification with Change Point Statistics," TR- 1082 , July 1981. 97. R. Chellappa, "Synthesis of Textures Using Simultane...July 1981. 96. Stanley M. Dunn, "Texture Classification with Change Point Statistics," TR- 1082 , July 1981. * 97. R. Chellappa, "Synthesis of Textures
NASA Astrophysics Data System (ADS)
Yu, Xin; Wen, Zongyong; Zhu, Zhaorong; Xia, Qiang; Shun, Lan
2016-06-01
Image classification will still be a long way in the future, although it has gone almost half a century. In fact, researchers have gained many fruits in the image classification domain, but there is still a long distance between theory and practice. However, some new methods in the artificial intelligence domain will be absorbed into the image classification domain and draw on the strength of each to offset the weakness of the other, which will open up a new prospect. Usually, networks play the role of a high-level language, as is seen in Artificial Intelligence and statistics, because networks are used to build complex model from simple components. These years, Bayesian Networks, one of probabilistic networks, are a powerful data mining technique for handling uncertainty in complex domains. In this paper, we apply Tree Augmented Naive Bayesian Networks (TAN) to texture classification of High-resolution remote sensing images and put up a new method to construct the network topology structure in terms of training accuracy based on the training samples. Since 2013, China government has started the first national geographical information census project, which mainly interprets geographical information based on high-resolution remote sensing images. Therefore, this paper tries to apply Bayesian network to remote sensing image classification, in order to improve image interpretation in the first national geographical information census project. In the experiment, we choose some remote sensing images in Beijing. Experimental results demonstrate TAN outperform than Naive Bayesian Classifier (NBC) and Maximum Likelihood Classification Method (MLC) in the overall classification accuracy. In addition, the proposed method can reduce the workload of field workers and improve the work efficiency. Although it is time consuming, it will be an attractive and effective method for assisting office operation of image interpretation.
NASA Astrophysics Data System (ADS)
Wilson, Machelle; Ustin, Susan L.; Rocke, David
2003-03-01
Remote sensing technologies with high spatial and spectral resolution show a great deal of promise in addressing critical environmental monitoring issues, but the ability to analyze and interpret the data lags behind the technology. Robust analytical methods are required before the wealth of data available through remote sensing can be applied to a wide range of environmental problems for which remote detection is the best method. In this study we compare the classification effectiveness of two relatively new techniques on data consisting of leaf-level reflectance from plants that have been exposed to varying levels of heavy metal toxicity. If these methodologies work well on leaf-level data, then there is some hope that they will also work well on data from airborne and space-borne platforms. The classification methods compared were support vector machine classification of exposed and non-exposed plants based on the reflectance data, and partial east squares compression of the reflectance data followed by classification using logistic discrimination (PLS/LD). PLS/LD was performed in two ways. We used the continuous concentration data as the response during compression, and then used the binary response required during logistic discrimination. We also used a binary response during compression followed by logistic discrimination. The statistics we used to compare the effectiveness of the methodologies was the leave-one-out cross validation estimate of the prediction error.
Low-cost real-time automatic wheel classification system
NASA Astrophysics Data System (ADS)
Shabestari, Behrouz N.; Miller, John W. V.; Wedding, Victoria
1992-11-01
This paper describes the design and implementation of a low-cost machine vision system for identifying various types of automotive wheels which are manufactured in several styles and sizes. In this application, a variety of wheels travel on a conveyor in random order through a number of processing steps. One of these processes requires the identification of the wheel type which was performed manually by an operator. A vision system was designed to provide the required identification. The system consisted of an annular illumination source, a CCD TV camera, frame grabber, and 386-compatible computer. Statistical pattern recognition techniques were used to provide robust classification as well as a simple means for adding new wheel designs to the system. Maintenance of the system can be performed by plant personnel with minimal training. The basic steps for identification include image acquisition, segmentation of the regions of interest, extraction of selected features, and classification. The vision system has been installed in a plant and has proven to be extremely effective. The system properly identifies the wheels correctly up to 30 wheels per minute regardless of rotational orientation in the camera's field of view. Correct classification can even be achieved if a portion of the wheel is blocked off from the camera. Significant cost savings have been achieved by a reduction in scrap associated with incorrect manual classification as well as a reduction of labor in a tedious task.
Classifiers utilized to enhance acoustic based sensors to identify round types of artillery/mortar
NASA Astrophysics Data System (ADS)
Grasing, David; Desai, Sachi; Morcos, Amir
2008-04-01
Feature extraction methods based on the statistical analysis of the change in event pressure levels over a period and the level of ambient pressure excitation facilitate the development of a robust classification algorithm. The features reliably discriminates mortar and artillery variants via acoustic signals produced during the launch events. Utilizing acoustic sensors to exploit the sound waveform generated from the blast for the identification of mortar and artillery variants as type A, etcetera through analysis of the waveform. Distinct characteristics arise within the different mortar/artillery variants because varying HE mortar payloads and related charges emphasize varying size events at launch. The waveform holds various harmonic properties distinct to a given mortar/artillery variant that through advanced signal processing and data mining techniques can employed to classify a given type. The skewness and other statistical processing techniques are used to extract the predominant components from the acoustic signatures at ranges exceeding 3000m. Exploiting these techniques will help develop a feature set highly independent of range, providing discrimination based on acoustic elements of the blast wave. Highly reliable discrimination will be achieved with a feedforward neural network classifier trained on a feature space derived from the distribution of statistical coefficients, frequency spectrum, and higher frequency details found within different energy bands. The processes that are described herein extend current technologies, which emphasis acoustic sensor systems to provide such situational awareness.
Is Mitochondrial Donation Germ-Line Gene Therapy? Classifications and Ethical Implications.
Newson, Ainsley J; Wrigley, Anthony
2017-01-01
The classification of techniques used in mitochondrial donation, including their role as purported germ-line gene therapies, is far from clear. These techniques exhibit characteristics typical of a variety of classifications that have been used in both scientific and bioethics scholarship. This raises two connected questions, which we address in this paper: (i) how should we classify mitochondrial donation techniques?; and (ii) what ethical implications surround such a classification? First, we outline how methods of genetic intervention, such as germ-line gene therapy, are typically defined or classified. We then consider whether techniques of mitochondrial donation fit into these, whether they might do so with some refinement of these categories, or whether they require some other approach to classification. To answer the second question, we discuss the relationship between classification and several key ethical issues arising from mitochondrial donation. We conclude that the properties characteristic of mitochondrial inheritance mean that most mitochondrial donation techniques belong to a new sub-class of genetic modification, which we call 'conditionally inheritable genomic modification' (CIGM). © 2017 John Wiley & Sons Ltd.
Case-mix groups for VA hospital-based home care.
Smith, M E; Baker, C R; Branch, L G; Walls, R C; Grimes, R M; Karklins, J M; Kashner, M; Burrage, R; Parks, A; Rogers, P
1992-01-01
The purpose of this study is to group hospital-based home care (HBHC) patients homogeneously by their characteristics with respect to cost of care to develop alternative case mix methods for management and reimbursement (allocation) purposes. Six Veterans Affairs (VA) HBHC programs in Fiscal Year (FY) 1986 that maximized patient, program, and regional variation were selected, all of which agreed to participate. All HBHC patients active in each program on October 1, 1987, in addition to all new admissions through September 30, 1988 (FY88), comprised the sample of 874 unique patients. Statistical methods include the use of classification and regression trees (CART software: Statistical Software; Lafayette, CA), analysis of variance, and multiple linear regression techniques. The resulting algorithm is a three-factor model that explains 20% of the cost variance (R2 = 20%, with a cross validation R2 of 12%). Similar classifications such as the RUG-II, which is utilized for VA nursing home and intermediate care, the VA outpatient resource allocation model, and the RUG-HHC, utilized in some states for reimbursing home health care in the private sector, explained less of the cost variance and, therefore, are less adequate for VA home care resource allocation.
Ielpo, Pierina; Leardi, Riccardo; Pappagallo, Giuseppe; Uricchio, Vito Felice
2017-06-01
In this paper, the results obtained from multivariate statistical techniques such as PCA (Principal component analysis) and LDA (Linear discriminant analysis) applied to a wide soil data set are presented. The results have been compared with those obtained on a groundwater data set, whose samples were collected together with soil ones, within the project "Improvement of the Regional Agro-meteorological Monitoring Network (2004-2007)". LDA, applied to soil data, has allowed to distinguish the geographical origin of the sample from either one of the two macroaeras: Bari and Foggia provinces vs Brindisi, Lecce e Taranto provinces, with a percentage of correct prediction in cross validation of 87%. In the case of the groundwater data set, the best classification was obtained when the samples were grouped into three macroareas: Foggia province, Bari province and Brindisi, Lecce and Taranto provinces, by reaching a percentage of correct predictions in cross validation of 84%. The obtained information can be very useful in supporting soil and water resource management, such as the reduction of water consumption and the reduction of energy and chemical (nutrients and pesticides) inputs in agriculture.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Davis, E.L.
A novel method for performing real-time acquisition and processing Landsat/EROS data covers all aspects including radiometric and geometric corrections of multispectral scanner or return-beam vidicon inputs, image enhancement, statistical analysis, feature extraction, and classification. Radiometric transformations include bias/gain adjustment, noise suppression, calibration, scan angle compensation, and illumination compensation, including topography and atmospheric effects. Correction or compensation for geometric distortion includes sensor-related distortions, such as centering, skew, size, scan nonlinearity, radial symmetry, and tangential symmetry. Also included are object image-related distortions such as aspect angle (altitude), scale distortion (altitude), terrain relief, and earth curvature. Ephemeral corrections are also applied to compensatemore » for satellite forward movement, earth rotation, altitude variations, satellite vibration, and mirror scan velocity. Image enhancement includes high-pass, low-pass, and Laplacian mask filtering and data restoration for intermittent losses. Resource classification is provided by statistical analysis including histograms, correlational analysis, matrix manipulations, and determination of spectral responses. Feature extraction includes spatial frequency analysis, which is used in parallel discriminant functions in each array processor for rapid determination. The technique uses integrated parallel array processors that decimate the tasks concurrently under supervision of a control processor. The operator-machine interface is optimized for programming ease and graphics image windowing.« less
Motion Control of Drives for Prosthetic Hand Using Continuous Myoelectric Signals
NASA Astrophysics Data System (ADS)
Purushothaman, Geethanjali; Ray, Kalyan Kumar
2016-03-01
In this paper the authors present motion control of a prosthetic hand, through continuous myoelectric signal acquisition, classification and actuation of the prosthetic drive. A four channel continuous electromyogram (EMG) signal also known as myoelectric signals (MES) are acquired from the abled-body to classify the six unique movements of hand and wrist, viz, hand open (HO), hand close (HC), wrist flexion (WF), wrist extension (WE), ulnar deviation (UD) and radial deviation (RD). The classification technique involves in extracting the features/pattern through statistical time domain (TD) parameter/autoregressive coefficients (AR), which are reduced using principal component analysis (PCA). The reduced statistical TD features and or AR coefficients are used to classify the signal patterns through k nearest neighbour (kNN) as well as neural network (NN) classifier and the performance of the classifiers are compared. Performance comparison of the above two classifiers clearly shows that kNN classifier in identifying the hidden intended motion in the myoelectric signals is better than that of NN classifier. Once the classifier identifies the intended motion, the signal is amplified to actuate the three low power DC motor to perform the above mentioned movements.
Estimating Classification Consistency and Accuracy for Cognitive Diagnostic Assessment
ERIC Educational Resources Information Center
Cui, Ying; Gierl, Mark J.; Chang, Hua-Hua
2012-01-01
This article introduces procedures for the computation and asymptotic statistical inference for classification consistency and accuracy indices specifically designed for cognitive diagnostic assessments. The new classification indices can be used as important indicators of the reliability and validity of classification results produced by…
77 FR 59989 - Labor Surplus Area Classification Under Executive Orders
Federal Register 2010, 2011, 2012, 2013, 2014
2012-10-01
... DEPARTMENT OF LABOR Employment and Training Administration Labor Surplus Area Classification Under... Bureau of Labor Statistics are used in making these classifications. The average unemployment rate for all states includes data for the Commonwealth of Puerto Rico. The basic LSA classification criteria...
Classification software technique assessment
NASA Technical Reports Server (NTRS)
Jayroe, R. R., Jr.; Atkinson, R.; Dasarathy, B. V.; Lybanon, M.; Ramapryian, H. K.
1976-01-01
A catalog of software options is presented for the use of local user communities to obtain software for analyzing remotely sensed multispectral imagery. The resources required to utilize a particular software program are described. Descriptions of how a particular program analyzes data and the performance of that program for an application and data set provided by the user are shown. An effort is made to establish a statistical performance base for various software programs with regard to different data sets and analysis applications, to determine the status of the state-of-the-art.
International experience on the use of artificial neural networks in gastroenterology.
Grossi, E; Mancini, A; Buscema, M
2007-03-01
In this paper, we reconsider the scientific background for the use of artificial intelligence tools in medicine. A review of some recent significant papers shows that artificial neural networks, the more advanced and effective artificial intelligence technique, can improve the classification accuracy and survival prediction of a number of gastrointestinal diseases. We discuss the 'added value' the use of artificial neural networks-based tools can bring in the field of gastroenterology, both at research and clinical application level, when compared with traditional statistical or clinical-pathological methods.
Optical recognition of statistical patterns
NASA Astrophysics Data System (ADS)
Lee, S. H.
1981-12-01
Optical implementation of the Fukunaga-Koontz transform (FKT) and the Least-Squares Linear Mapping Technique (LSLMT) is described. The FKT is a linear transformation which performs image feature extraction for a two-class image classification problem. The LSLMT performs a transform from large dimensional feature space to small dimensional decision space for separating multiple image classes by maximizing the interclass differences while minimizing the intraclass variations. The FKT and the LSLMT were optically implemented by utilizing a coded phase optical processor. The transform was used for classifying birds and fish. After the F-K basis functions were calculated, those most useful for classification were incorporated into a computer generated hologram. The output of the optical processor, consisting of the squared magnitude of the F-K coefficients, was detected by a T.V. camera, digitized, and fed into a micro-computer for classification. A simple linear classifier based on only two F-K coefficients was able to separate the images into two classes, indicating that the F-K transform had chosen good features. Two advantages of optically implementing the FKT and LSLMT are parallel and real time processing.
Optical recognition of statistical patterns
NASA Technical Reports Server (NTRS)
Lee, S. H.
1981-01-01
Optical implementation of the Fukunaga-Koontz transform (FKT) and the Least-Squares Linear Mapping Technique (LSLMT) is described. The FKT is a linear transformation which performs image feature extraction for a two-class image classification problem. The LSLMT performs a transform from large dimensional feature space to small dimensional decision space for separating multiple image classes by maximizing the interclass differences while minimizing the intraclass variations. The FKT and the LSLMT were optically implemented by utilizing a coded phase optical processor. The transform was used for classifying birds and fish. After the F-K basis functions were calculated, those most useful for classification were incorporated into a computer generated hologram. The output of the optical processor, consisting of the squared magnitude of the F-K coefficients, was detected by a T.V. camera, digitized, and fed into a micro-computer for classification. A simple linear classifier based on only two F-K coefficients was able to separate the images into two classes, indicating that the F-K transform had chosen good features. Two advantages of optically implementing the FKT and LSLMT are parallel and real time processing.
Fuzzy Classification of Ocean Color Satellite Data for Bio-optical Algorithm Constituent Retrievals
NASA Technical Reports Server (NTRS)
Campbell, Janet W.
1998-01-01
The ocean has been traditionally viewed as a 2 class system. Morel and Prieur (1977) classified ocean water according to the dominant absorbent particle suspended in the water column. Case 1 is described as having a high concentration of phytoplankton (and detritus) relative to other particles. Conversely, case 2 is described as having inorganic particles such as suspended sediments in high concentrations. Little work has gone into the problem of mixing bio-optical models for these different water types. An approach is put forth here to blend bio-optical algorithms based on a fuzzy classification scheme. This scheme involves two procedures. First, a clustering procedure identifies classes and builds class statistics from in-situ optical measurements. Next, a classification procedure assigns satellite pixels partial memberships to these classes based on their ocean color reflectance signature. These membership assignments can be used as the basis for a weighting retrievals from class-specific bio-optical algorithms. This technique is demonstrated with in-situ optical measurements and an image from the SeaWiFS ocean color satellite.
Georgouli, Konstantia; Martinez Del Rincon, Jesus; Koidis, Anastasios
2017-02-15
The main objective of this work was to develop a novel dimensionality reduction technique as a part of an integrated pattern recognition solution capable of identifying adulterants such as hazelnut oil in extra virgin olive oil at low percentages based on spectroscopic chemical fingerprints. A novel Continuous Locality Preserving Projections (CLPP) technique is proposed which allows the modelling of the continuous nature of the produced in-house admixtures as data series instead of discrete points. The maintenance of the continuous structure of the data manifold enables the better visualisation of this examined classification problem and facilitates the more accurate utilisation of the manifold for detecting the adulterants. The performance of the proposed technique is validated with two different spectroscopic techniques (Raman and Fourier transform infrared, FT-IR). In all cases studied, CLPP accompanied by k-Nearest Neighbors (kNN) algorithm was found to outperform any other state-of-the-art pattern recognition techniques. Copyright © 2016 Elsevier Ltd. All rights reserved.
Smolinski, Tomasz G; Buchanan, Roger; Boratyn, Grzegorz M; Milanova, Mariofanna; Prinz, Astrid A
2006-01-01
Background Independent Component Analysis (ICA) proves to be useful in the analysis of neural activity, as it allows for identification of distinct sources of activity. Applied to measurements registered in a controlled setting and under exposure to an external stimulus, it can facilitate analysis of the impact of the stimulus on those sources. The link between the stimulus and a given source can be verified by a classifier that is able to "predict" the condition a given signal was registered under, solely based on the components. However, the ICA's assumption about statistical independence of sources is often unrealistic and turns out to be insufficient to build an accurate classifier. Therefore, we propose to utilize a novel method, based on hybridization of ICA, multi-objective evolutionary algorithms (MOEA), and rough sets (RS), that attempts to improve the effectiveness of signal decomposition techniques by providing them with "classification-awareness." Results The preliminary results described here are very promising and further investigation of other MOEAs and/or RS-based classification accuracy measures should be pursued. Even a quick visual analysis of those results can provide an interesting insight into the problem of neural activity analysis. Conclusion We present a methodology of classificatory decomposition of signals. One of the main advantages of our approach is the fact that rather than solely relying on often unrealistic assumptions about statistical independence of sources, components are generated in the light of a underlying classification problem itself. PMID:17118151
A classification scheme for turbulent flows based on their joint velocity-intermittency structure
NASA Astrophysics Data System (ADS)
Keylock, C. J.; Nishimura, K.; Peinke, J.
2011-12-01
Kolmogorov's classic theory for turbulence assumed an independence between velocity increments and the value for the velocity itself. However, this assumption is questionable, particularly in complex geophysical flows. Here we propose a framework for studying velocity-intermittency coupling that is similar in essence to the popular quadrant analysis method for studying near-wall flows. However, we study the dominant (longitudinal) velocity component along with a measure of the roughness of the signal, given mathematically by its series of Hölder exponents. Thus, we permit a possible dependence between velocity and intermittency. We compare boundary layer data obtained in a wind tunnel to turbulent jets and wake flows. These flow classes all have distinct velocity-intermittency characteristics, which cause them to be readily distinguished using our technique. Our method is much simpler and quicker to apply than approaches that condition the velocity increment statistics at some scale, r, on the increment statistics at a neighbouring, larger spatial scale, r+Δ, and the velocity itself. Classification of environmental flows is then possible based on their similarities to the idealised flow classes and we demonstrate this using laboratory data for flow in a parallel-channel confluence where the region of flow recirculation in the lee of the step is discriminated as a flow class distinct from boundary layer, jet and wake flows. Hence, using our method, it is possible to assign a flow classification to complex geophysical, turbulent flows depending upon which idealised flow class they most resemble.
Dinov, Ivo D.; Heavner, Ben; Tang, Ming; Glusman, Gustavo; Chard, Kyle; Darcy, Mike; Madduri, Ravi; Pa, Judy; Spino, Cathie; Kesselman, Carl; Foster, Ian; Deutsch, Eric W.; Price, Nathan D.; Van Horn, John D.; Ames, Joseph; Clark, Kristi; Hood, Leroy; Hampstead, Benjamin M.; Dauer, William; Toga, Arthur W.
2016-01-01
Background A unique archive of Big Data on Parkinson’s Disease is collected, managed and disseminated by the Parkinson’s Progression Markers Initiative (PPMI). The integration of such complex and heterogeneous Big Data from multiple sources offers unparalleled opportunities to study the early stages of prevalent neurodegenerative processes, track their progression and quickly identify the efficacies of alternative treatments. Many previous human and animal studies have examined the relationship of Parkinson’s disease (PD) risk to trauma, genetics, environment, co-morbidities, or life style. The defining characteristics of Big Data–large size, incongruency, incompleteness, complexity, multiplicity of scales, and heterogeneity of information-generating sources–all pose challenges to the classical techniques for data management, processing, visualization and interpretation. We propose, implement, test and validate complementary model-based and model-free approaches for PD classification and prediction. To explore PD risk using Big Data methodology, we jointly processed complex PPMI imaging, genetics, clinical and demographic data. Methods and Findings Collective representation of the multi-source data facilitates the aggregation and harmonization of complex data elements. This enables joint modeling of the complete data, leading to the development of Big Data analytics, predictive synthesis, and statistical validation. Using heterogeneous PPMI data, we developed a comprehensive protocol for end-to-end data characterization, manipulation, processing, cleaning, analysis and validation. Specifically, we (i) introduce methods for rebalancing imbalanced cohorts, (ii) utilize a wide spectrum of classification methods to generate consistent and powerful phenotypic predictions, and (iii) generate reproducible machine-learning based classification that enables the reporting of model parameters and diagnostic forecasting based on new data. We evaluated several complementary model-based predictive approaches, which failed to generate accurate and reliable diagnostic predictions. However, the results of several machine-learning based classification methods indicated significant power to predict Parkinson’s disease in the PPMI subjects (consistent accuracy, sensitivity, and specificity exceeding 96%, confirmed using statistical n-fold cross-validation). Clinical (e.g., Unified Parkinson's Disease Rating Scale (UPDRS) scores), demographic (e.g., age), genetics (e.g., rs34637584, chr12), and derived neuroimaging biomarker (e.g., cerebellum shape index) data all contributed to the predictive analytics and diagnostic forecasting. Conclusions Model-free Big Data machine learning-based classification methods (e.g., adaptive boosting, support vector machines) can outperform model-based techniques in terms of predictive precision and reliability (e.g., forecasting patient diagnosis). We observed that statistical rebalancing of cohort sizes yields better discrimination of group differences, specifically for predictive analytics based on heterogeneous and incomplete PPMI data. UPDRS scores play a critical role in predicting diagnosis, which is expected based on the clinical definition of Parkinson’s disease. Even without longitudinal UPDRS data, however, the accuracy of model-free machine learning based classification is over 80%. The methods, software and protocols developed here are openly shared and can be employed to study other neurodegenerative disorders (e.g., Alzheimer’s, Huntington’s, amyotrophic lateral sclerosis), as well as for other predictive Big Data analytics applications. PMID:27494614
Federal Register 2010, 2011, 2012, 2013, 2014
2010-09-16
... DEPARTMENT OF HEALTH AND HUMAN SERVICES Centers for Disease Control and Prevention National Center for Health Statistics (NCHS), Classifications and Public Health Data Standards Staff, Announces the... Public Health Data Standards Staff, NCHS, 3311 Toledo Road, Room 2337, Hyattsville, Maryland 20782, e...
Applying Descriptive Statistics to Teaching the Regional Classification of Climate.
ERIC Educational Resources Information Center
Lindquist, Peter S.; Hammel, Daniel J.
1998-01-01
Describes an exercise for college and high school students that relates descriptive statistics to the regional climatic classification. The exercise introduces students to simple calculations of central tendency and dispersion, the construction and interpretation of scatterplots, and the definition of climatic regions. Forces students to engage…
NASA Technical Reports Server (NTRS)
Bizzell, R. M.; Feiveson, A. H.; Hall, F. G.; Bauer, M. E.; Davis, B. J.; Malila, W. A.; Rice, D. P.
1975-01-01
The CITARS was an experiment designed to quantitatively evaluate crop identification performance for corn and soybeans in various environments using a well-defined set of automatic data processing (ADP) techniques. Each technique was applied to data acquired to recognize and estimate proportions of corn and soybeans. The CITARS documentation summarizes, interprets, and discusses the crop identification performances obtained using (1) different ADP procedures; (2) a linear versus a quadratic classifier; (3) prior probability information derived from historic data; (4) local versus nonlocal recognition training statistics and the associated use of preprocessing; (5) multitemporal data; (6) classification bias and mixed pixels in proportion estimation; and (7) data with differnt site characteristics, including crop, soil, atmospheric effects, and stages of crop maturity.
Supervised Classification Techniques for Hyperspectral Data
NASA Technical Reports Server (NTRS)
Jimenez, Luis O.
1997-01-01
The recent development of more sophisticated remote sensing systems enables the measurement of radiation in many mm-e spectral intervals than previous possible. An example of this technology is the AVIRIS system, which collects image data in 220 bands. The increased dimensionality of such hyperspectral data provides a challenge to the current techniques for analyzing such data. Human experience in three dimensional space tends to mislead one's intuition of geometrical and statistical properties in high dimensional space, properties which must guide our choices in the data analysis process. In this paper high dimensional space properties are mentioned with their implication for high dimensional data analysis in order to illuminate the next steps that need to be taken for the next generation of hyperspectral data classifiers.
Mujtaba, Ghulam; Shuib, Liyana; Raj, Ram Gopal; Rajandram, Retnagowri; Shaikh, Khairunisa; Al-Garadi, Mohammed Ali
2018-06-01
Text categorization has been used extensively in recent years to classify plain-text clinical reports. This study employs text categorization techniques for the classification of open narrative forensic autopsy reports. One of the key steps in text classification is document representation. In document representation, a clinical report is transformed into a format that is suitable for classification. The traditional document representation technique for text categorization is the bag-of-words (BoW) technique. In this study, the traditional BoW technique is ineffective in classifying forensic autopsy reports because it merely extracts frequent but discriminative features from clinical reports. Moreover, this technique fails to capture word inversion, as well as word-level synonymy and polysemy, when classifying autopsy reports. Hence, the BoW technique suffers from low accuracy and low robustness unless it is improved with contextual and application-specific information. To overcome the aforementioned limitations of the BoW technique, this research aims to develop an effective conceptual graph-based document representation (CGDR) technique to classify 1500 forensic autopsy reports from four (4) manners of death (MoD) and sixteen (16) causes of death (CoD). Term-based and Systematized Nomenclature of Medicine-Clinical Terms (SNOMED CT) based conceptual features were extracted and represented through graphs. These features were then used to train a two-level text classifier. The first level classifier was responsible for predicting MoD. In addition, the second level classifier was responsible for predicting CoD using the proposed conceptual graph-based document representation technique. To demonstrate the significance of the proposed technique, its results were compared with those of six (6) state-of-the-art document representation techniques. Lastly, this study compared the effects of one-level classification and two-level classification on the experimental results. The experimental results indicated that the CGDR technique achieved 12% to 15% improvement in accuracy compared with fully automated document representation baseline techniques. Moreover, two-level classification obtained better results compared with one-level classification. The promising results of the proposed conceptual graph-based document representation technique suggest that pathologists can adopt the proposed system as their basis for second opinion, thereby supporting them in effectively determining CoD. Copyright © 2018 Elsevier Inc. All rights reserved.
78 FR 63248 - Labor Surplus Area Classification under Executive Orders 12073 and 10582
Federal Register 2010, 2011, 2012, 2013, 2014
2013-10-23
... DEPARTMENT OF LABOR Employment and Training Administration Labor Surplus Area Classification under... Statistics unemployment estimates to make these classifications. The average unemployment rate for all states includes data for the Commonwealth of Puerto Rico. The basic LSA classification criteria include a ``floor...
A new classification scheme of plastic wastes based upon recycling labels.
Özkan, Kemal; Ergin, Semih; Işık, Şahin; Işıklı, Idil
2015-01-01
Since recycling of materials is widely assumed to be environmentally and economically beneficial, reliable sorting and processing of waste packaging materials such as plastics is very important for recycling with high efficiency. An automated system that can quickly categorize these materials is certainly needed for obtaining maximum classification while maintaining high throughput. In this paper, first of all, the photographs of the plastic bottles have been taken and several preprocessing steps were carried out. The first preprocessing step is to extract the plastic area of a bottle from the background. Then, the morphological image operations are implemented. These operations are edge detection, noise removal, hole removing, image enhancement, and image segmentation. These morphological operations can be generally defined in terms of the combinations of erosion and dilation. The effect of bottle color as well as label are eliminated using these operations. Secondly, the pixel-wise intensity values of the plastic bottle images have been used together with the most popular subspace and statistical feature extraction methods to construct the feature vectors in this study. Only three types of plastics are considered due to higher existence ratio of them than the other plastic types in the world. The decision mechanism consists of five different feature extraction methods including as Principal Component Analysis (PCA), Kernel PCA (KPCA), Fisher's Linear Discriminant Analysis (FLDA), Singular Value Decomposition (SVD) and Laplacian Eigenmaps (LEMAP) and uses a simple experimental setup with a camera and homogenous backlighting. Due to the giving global solution for a classification problem, Support Vector Machine (SVM) is selected to achieve the classification task and majority voting technique is used as the decision mechanism. This technique equally weights each classification result and assigns the given plastic object to the class that the most classification results agree on. The proposed classification scheme provides high accuracy rate, and also it is able to run in real-time applications. It can automatically classify the plastic bottle types with approximately 90% recognition accuracy. Besides this, the proposed methodology yields approximately 96% classification rate for the separation of PET or non-PET plastic types. It also gives 92% accuracy for the categorization of non-PET plastic types into HPDE or PP. Copyright © 2014 Elsevier Ltd. All rights reserved.
Using machine learning techniques to automate sky survey catalog generation
NASA Technical Reports Server (NTRS)
Fayyad, Usama M.; Roden, J. C.; Doyle, R. J.; Weir, Nicholas; Djorgovski, S. G.
1993-01-01
We describe the application of machine classification techniques to the development of an automated tool for the reduction of a large scientific data set. The 2nd Palomar Observatory Sky Survey provides comprehensive photographic coverage of the northern celestial hemisphere. The photographic plates are being digitized into images containing on the order of 10(exp 7) galaxies and 10(exp 8) stars. Since the size of this data set precludes manual analysis and classification of objects, our approach is to develop a software system which integrates independently developed techniques for image processing and data classification. Image processing routines are applied to identify and measure features of sky objects. Selected features are used to determine the classification of each object. GID3* and O-BTree, two inductive learning techniques, are used to automatically learn classification decision trees from examples. We describe the techniques used, the details of our specific application, and the initial encouraging results which indicate that our approach is well-suited to the problem. The benefits of the approach are increased data reduction throughput, consistency of classification, and the automated derivation of classification rules that will form an objective, examinable basis for classifying sky objects. Furthermore, astronomers will be freed from the tedium of an intensely visual task to pursue more challenging analysis and interpretation problems given automatically cataloged data.
Oelze, Michael L; Mamou, Jonathan
2016-02-01
Conventional medical imaging technologies, including ultrasound, have continued to improve over the years. For example, in oncology, medical imaging is characterized by high sensitivity, i.e., the ability to detect anomalous tissue features, but the ability to classify these tissue features from images often lacks specificity. As a result, a large number of biopsies of tissues with suspicious image findings are performed each year with a vast majority of these biopsies resulting in a negative finding. To improve specificity of cancer imaging, quantitative imaging techniques can play an important role. Conventional ultrasound B-mode imaging is mainly qualitative in nature. However, quantitative ultrasound (QUS) imaging can provide specific numbers related to tissue features that can increase the specificity of image findings leading to improvements in diagnostic ultrasound. QUS imaging can encompass a wide variety of techniques including spectral-based parameterization, elastography, shear wave imaging, flow estimation, and envelope statistics. Currently, spectral-based parameterization and envelope statistics are not available on most conventional clinical ultrasound machines. However, in recent years, QUS techniques involving spectral-based parameterization and envelope statistics have demonstrated success in many applications, providing additional diagnostic capabilities. Spectral-based techniques include the estimation of the backscatter coefficient (BSC), estimation of attenuation, and estimation of scatterer properties such as the correlation length associated with an effective scatterer diameter (ESD) and the effective acoustic concentration (EAC) of scatterers. Envelope statistics include the estimation of the number density of scatterers and quantification of coherent to incoherent signals produced from the tissue. Challenges for clinical application include correctly accounting for attenuation effects and transmission losses and implementation of QUS on clinical devices. Successful clinical and preclinical applications demonstrating the ability of QUS to improve medical diagnostics include characterization of the myocardium during the cardiac cycle, cancer detection, classification of solid tumors and lymph nodes, detection and quantification of fatty liver disease, and monitoring and assessment of therapy.
NASA Astrophysics Data System (ADS)
Arnold, Thomas; De Biasio, Martin; Leitner, Raimund
2015-06-01
Two problems are addressed in this paper (i) the fluorescent marker-based and the (ii) marker-free discrimination between healthy and cancerous human tissues. For both applications the performance of hyper-spectral methods are quantified. Fluorescent marker-based tissue classification uses a number of fluorescent markers to dye specific parts of a human cell. The challenge is that the emission spectra of the fluorescent dyes overlap considerably. They are, furthermore disturbed by the inherent auto-fluorescence of human tissue. This results in ambiguities and decreased image contrast causing difficulties for the treatment decision. The higher spectral resolution introduced by tunable-filter-based spectral imaging in combination with spectral unmixing techniques results in an improvement of the image contrast and therefore more reliable information for the physician to choose the treatment decision. Marker-free tissue classification is based solely on the subtle spectral features of human tissue without the use of artificial markers. The challenge in this case is that the spectral differences between healthy and cancerous tissues are subtle and embedded in intra- and inter-patient variations of these features. The contributions of this paper are (i) the evaluation of hyper-spectral imaging in combination with spectral unmixing techniques for fluorescence marker-based tissue classification, (ii) the evaluation of spectral imaging for marker-free intra surgery tissue classification. Within this paper, we consider real hyper-spectral fluorescence and endoscopy data sets to emphasize the practical capability of the proposed methods. It is shown that the combination of spectral imaging with multivariate statistical methods can improve the sensitivity and specificity of the detection and the staging of cancerous tissues compared to standard procedures.
ChariDingari, Narahara; Barman, Ishan; Myakalwar, Ashwin Kumar; Tewari, Surya P.; Kumar, G. Manoj
2012-01-01
Despite the intrinsic elemental analysis capability and lack of sample preparation requirements, laser-induced breakdown spectroscopy (LIBS) has not been extensively used for real world applications, e.g. quality assurance and process monitoring. Specifically, variability in sample, system and experimental parameters in LIBS studies present a substantive hurdle for robust classification, even when standard multivariate chemometric techniques are used for analysis. Considering pharmaceutical sample investigation as an example, we propose the use of support vector machines (SVM) as a non-linear classification method over conventional linear techniques such as soft independent modeling of class analogy (SIMCA) and partial least-squares discriminant analysis (PLS-DA) for discrimination based on LIBS measurements. Using over-the-counter pharmaceutical samples, we demonstrate that application of SVM enables statistically significant improvements in prospective classification accuracy (sensitivity), due to its ability to address variability in LIBS sample ablation and plasma self-absorption behavior. Furthermore, our results reveal that SVM provides nearly 10% improvement in correct allocation rate and a concomitant reduction in misclassification rates of 75% (cf. PLS-DA) and 80% (cf. SIMCA)-when measurements from samples not included in the training set are incorporated in the test data – highlighting its robustness. While further studies on a wider matrix of sample types performed using different LIBS systems is needed to fully characterize the capability of SVM to provide superior predictions, we anticipate that the improved sensitivity and robustness observed here will facilitate application of the proposed LIBS-SVM toolbox for screening drugs and detecting counterfeit samples as well as in related areas of forensic and biological sample analysis. PMID:22292496
Dingari, Narahara Chari; Barman, Ishan; Myakalwar, Ashwin Kumar; Tewari, Surya P; Kumar Gundawar, Manoj
2012-03-20
Despite the intrinsic elemental analysis capability and lack of sample preparation requirements, laser-induced breakdown spectroscopy (LIBS) has not been extensively used for real-world applications, e.g., quality assurance and process monitoring. Specifically, variability in sample, system, and experimental parameters in LIBS studies present a substantive hurdle for robust classification, even when standard multivariate chemometric techniques are used for analysis. Considering pharmaceutical sample investigation as an example, we propose the use of support vector machines (SVM) as a nonlinear classification method over conventional linear techniques such as soft independent modeling of class analogy (SIMCA) and partial least-squares discriminant analysis (PLS-DA) for discrimination based on LIBS measurements. Using over-the-counter pharmaceutical samples, we demonstrate that the application of SVM enables statistically significant improvements in prospective classification accuracy (sensitivity), because of its ability to address variability in LIBS sample ablation and plasma self-absorption behavior. Furthermore, our results reveal that SVM provides nearly 10% improvement in correct allocation rate and a concomitant reduction in misclassification rates of 75% (cf. PLS-DA) and 80% (cf. SIMCA)-when measurements from samples not included in the training set are incorporated in the test data-highlighting its robustness. While further studies on a wider matrix of sample types performed using different LIBS systems is needed to fully characterize the capability of SVM to provide superior predictions, we anticipate that the improved sensitivity and robustness observed here will facilitate application of the proposed LIBS-SVM toolbox for screening drugs and detecting counterfeit samples, as well as in related areas of forensic and biological sample analysis.
NASA Astrophysics Data System (ADS)
Kukunda, Collins B.; Duque-Lazo, Joaquín; González-Ferreiro, Eduardo; Thaden, Hauke; Kleinn, Christoph
2018-03-01
Distinguishing tree species is relevant in many contexts of remote sensing assisted forest inventory. Accurate tree species maps support management and conservation planning, pest and disease control and biomass estimation. This study evaluated the performance of applying ensemble techniques with the goal of automatically distinguishing Pinus sylvestris L. and Pinus uncinata Mill. Ex Mirb within a 1.3 km2 mountainous area in Barcelonnette (France). Three modelling schemes were examined, based on: (1) high-density LiDAR data (160 returns m-2), (2) Worldview-2 multispectral imagery, and (3) Worldview-2 and LiDAR in combination. Variables related to the crown structure and height of individual trees were extracted from the normalized LiDAR point cloud at individual-tree level, after performing individual tree crown (ITC) delineation. Vegetation indices and the Haralick texture indices were derived from Worldview-2 images and served as independent spectral variables. Selection of the best predictor subset was done after a comparison of three variable selection procedures: (1) Random Forests with cross validation (AUCRFcv), (2) Akaike Information Criterion (AIC) and (3) Bayesian Information Criterion (BIC). To classify the species, 9 regression techniques were combined using ensemble models. Predictions were evaluated using cross validation and an independent dataset. Integration of datasets and models improved individual tree species classification (True Skills Statistic, TSS; from 0.67 to 0.81) over individual techniques and maintained strong predictive power (Relative Operating Characteristic, ROC = 0.91). Assemblage of regression models and integration of the datasets provided more reliable species distribution maps and associated tree-scale mapping uncertainties. Our study highlights the potential of model and data assemblage at improving species classifications needed in present-day forest planning and management.
76 FR 53699 - Labor Surplus Area Classification Under Executive Orders 12073 and 10582
Federal Register 2010, 2011, 2012, 2013, 2014
2011-08-29
... DEPARTMENT OF LABOR Employment and Training Administration Labor Surplus Area Classification Under... estimates provided to ETA by the Bureau of Labor Statistics are used in making these classifications. The... classification criteria include a ``floor unemployment rate'' (6.0%) and a ``ceiling unemployment rate'' (10.0...
Fries, Christopher J
2008-11-01
ABSTRACTOBJECTIVETo develop a classification of complementary and alternative medicine (CAM) practices widely available in Canada based on physicians' effectiveness ratings of the therapies.DESIGNA self-administered postal questionnaire asking family physicians to rate their "belief in the degree of therapeutic effectiveness" of 15 CAM therapies.SETTINGProvince of Alberta.PARTICIPANTSA total of 875 family physicians.MAIN OUTCOME MEASURESDescriptive statistics of physicians' awareness of and effectiveness ratings for each of the therapies; factor analysis was applied to the ratings of the 15 therapies in order to explore whether or not the data support the proposed classification of CAM practices into categories of accepted and rejected.RESULTSPhysicians believed that acupuncture, massage therapy, chiropractic care, relaxation therapy, biofeedback, and spiritual or religious healing were effective when used in conjunction with biomedicine to treat chronic or psychosomatic indications. Physicians attributed little effectiveness to homeopathy or naturopathy, Feldenkrais or Alexander technique, Rolfing, herbal medicine, traditional Chinese medicine, and reflexology. The factor analysis revealed an underlying dimensionality to physicians' effectiveness ratings of the CAM therapies that supports the classification of these practices as either accepted or rejected.CONCLUSIONThis study provides Canadian family physicians with information concerning which CAM therapies are generally accepted by their peers as effective and which are not.
Parametric Time-Frequency Analysis and Its Applications in Music Classification
NASA Astrophysics Data System (ADS)
Shen, Ying; Li, Xiaoli; Ma, Ngok-Wah; Krishnan, Sridhar
2010-12-01
Analysis of nonstationary signals, such as music signals, is a challenging task. The purpose of this study is to explore an efficient and powerful technique to analyze and classify music signals in higher frequency range (44.1 kHz). The pursuit methods are good tools for this purpose, but they aimed at representing the signals rather than classifying them as in Y. Paragakin et al., 2009. Among the pursuit methods, matching pursuit (MP), an adaptive true nonstationary time-frequency signal analysis tool, is applied for music classification. First, MP decomposes the sample signals into time-frequency functions or atoms. Atom parameters are then analyzed and manipulated, and discriminant features are extracted from atom parameters. Besides the parameters obtained using MP, an additional feature, central energy, is also derived. Linear discriminant analysis and the leave-one-out method are used to evaluate the classification accuracy rate for different feature sets. The study is one of the very few works that analyze atoms statistically and extract discriminant features directly from the parameters. From our experiments, it is evident that the MP algorithm with the Gabor dictionary decomposes nonstationary signals, such as music signals, into atoms in which the parameters contain strong discriminant information sufficient for accurate and efficient signal classifications.
Guidelines to classification and nomenclature of Arabian felsic plutonic rocks
Ramsay, C.R.; Stoeser, D.B.; Drysdall, A.R.
1986-01-01
Well-defined procedures for classifying the felsic plutonic rocks of the Arabian Shield on the basis of petrographic, chemical and lithostratigraphic criteria and mineral-resource potential have been adopted and developed in the Saudi Arabian Deputy Ministry for Mineral Resources over the past decade. A number of problems with conventional classification schemes have been identified and resolved; others, notably those arising from difficulties in identifying precise mineral compositions, continue to present difficulties. The petrographic nomenclature used is essentially that recommended by the International Union of Geological Sciences. Problems that have arisen include the definition of: (1) rocks with sodic, zoned or perthitic feldspar, (2) trondhjemites, and (3) alkali granites. Chemical classification has been largely based on relative molar amounts of alumina, lime and alkalis, and the use of conventional variation diagrams, but pilot studies utilizing univariate and multivariate statistical techniques have been made. The classification used in Saudi Arabia for stratigraphic purposes is a hierarchy of formation-rank units, suites and super-suites as defined in the Saudi Arabian stratigraphic code. For genetic and petrological studies, a grouping as 'associations' of similar and genetically related lithologies is commonly used. In order to indicate mineral-resource potential, the felsic plutons are classed as common, precursor, specialized or mineralized, in order of increasing exploration significance. ?? 1986.
Pu, Hongbin; Sun, Da-Wen; Ma, Ji; Cheng, Jun-Hu
2015-01-01
The potential of visible and near infrared hyperspectral imaging was investigated as a rapid and nondestructive technique for classifying fresh and frozen-thawed meats by integrating critical spectral and image features extracted from hyperspectral images in the region of 400-1000 nm. Six feature wavelengths (400, 446, 477, 516, 592 and 686 nm) were identified using uninformative variable elimination and successive projections algorithm. Image textural features of the principal component images from hyperspectral images were obtained using histogram statistics (HS), gray level co-occurrence matrix (GLCM) and gray level-gradient co-occurrence matrix (GLGCM). By these spectral and textural features, probabilistic neural network (PNN) models for classification of fresh and frozen-thawed pork meats were established. Compared with the models using the optimum wavelengths only, optimum wavelengths with HS image features, and optimum wavelengths with GLCM image features, the model integrating optimum wavelengths with GLGCM gave the highest classification rate of 93.14% and 90.91% for calibration and validation sets, respectively. Results indicated that the classification accuracy can be improved by combining spectral features with textural features and the fusion of critical spectral and textural features had better potential than single spectral extraction in classifying fresh and frozen-thawed pork meat. Copyright © 2014 Elsevier Ltd. All rights reserved.
Classifying patents based on their semantic content.
Bergeaud, Antonin; Potiron, Yoann; Raimbault, Juste
2017-01-01
In this paper, we extend some usual techniques of classification resulting from a large-scale data-mining and network approach. This new technology, which in particular is designed to be suitable to big data, is used to construct an open consolidated database from raw data on 4 million patents taken from the US patent office from 1976 onward. To build the pattern network, not only do we look at each patent title, but we also examine their full abstract and extract the relevant keywords accordingly. We refer to this classification as semantic approach in contrast with the more common technological approach which consists in taking the topology when considering US Patent office technological classes. Moreover, we document that both approaches have highly different topological measures and strong statistical evidence that they feature a different model. This suggests that our method is a useful tool to extract endogenous information.
Comparison of citrus orchard inventory using LISS-III and LISS-IV data
NASA Astrophysics Data System (ADS)
Singh, Niti; Chaudhari, K. N.; Manjunath, K. R.
2016-04-01
In India, in terms of area under cultivation, citrus is the third most cultivated fruit crop after Banana and Mango. Among citrus group, lime is one of the most important horticultural crops in India as the demand for its consumption is very high. Hence, preparing citrus crop inventories using remote sensing techniques would help in maintaining a record of its area and production statistics. This study shows how accurately citrus orchard can be classified using both IRS Resourcesat-2 LISS-III and LISS-IV data and depicts the optimum bio-widow for procuring satellite data to achieve high classification accuracy required for maintaining inventory of crop. Findings of the study show classification accuracy increased from 55% (using LISS-III) to 77% (using LISS-IV). Also, according to classified outputs and NDVI values obtained, April and May months were identified as optimum bio-window for citrus crop identification.
User oriented ERTS-1 images. [vegetation identification in Canada through image enhancement
NASA Technical Reports Server (NTRS)
Shlien, S.; Goodenough, D.
1974-01-01
Photographic reproduction of ERTS-1 images are capable of displaying only a portion of the total information available from the multispectral scanner. Methods are being developed to generate ERTS-1 images oriented towards special users such as agriculturists, foresters, and hydrologists by applying image enhancement techniques and interactive statistical classification schemes. Spatial boundaries and linear features can be emphasized and delineated using simple filters. Linear and nonlinear transformations can be applied to the spectral data to emphasize certain ground information. An automatic classification scheme was developed to identify particular ground cover classes such as fallow, grain, rape seed or various vegetation covers. The scheme applies the maximum likelihood decision rule to the spectral information and classifies the ERTS-1 image on a pixel by pixel basis. Preliminary results indicate that the classifier has limited success in distinguishing crops, but is well adapted for identifying different types of vegetation.
NASA Technical Reports Server (NTRS)
Leake, M. A.
1982-01-01
The geologic framework of the intercrater plains on Mercury and the Moon as determined through geologic mapping is presented. The strategies used in such mapping are discussed first. Then, because the degree of crater degradation is applied to both mapping and crater statistics, the correlation of degradation classification of lunar and Mercurian craters is thoroughly addressed. Different imaging systems can potentially affect this classification, and are therefore also discussed. The techniques used in mapping Mercury are discussed in Section 2, followed by presentation of the Geologic Map of Mercury in Section 3. Material units, structures, and relevant albedo and color data are discussed therein. Preliminary conclusions regarding plains' origins are given there. The last section presents the mapping analyses of the lunar intercrater plains, including tentative conclusions of their origin.
Classifying patents based on their semantic content
2017-01-01
In this paper, we extend some usual techniques of classification resulting from a large-scale data-mining and network approach. This new technology, which in particular is designed to be suitable to big data, is used to construct an open consolidated database from raw data on 4 million patents taken from the US patent office from 1976 onward. To build the pattern network, not only do we look at each patent title, but we also examine their full abstract and extract the relevant keywords accordingly. We refer to this classification as semantic approach in contrast with the more common technological approach which consists in taking the topology when considering US Patent office technological classes. Moreover, we document that both approaches have highly different topological measures and strong statistical evidence that they feature a different model. This suggests that our method is a useful tool to extract endogenous information. PMID:28445550
Exploring the CAESAR database using dimensionality reduction techniques
NASA Astrophysics Data System (ADS)
Mendoza-Schrock, Olga; Raymer, Michael L.
2012-06-01
The Civilian American and European Surface Anthropometry Resource (CAESAR) database containing over 40 anthropometric measurements on over 4000 humans has been extensively explored for pattern recognition and classification purposes using the raw, original data [1-4]. However, some of the anthropometric variables would be impossible to collect in an uncontrolled environment. Here, we explore the use of dimensionality reduction methods in concert with a variety of classification algorithms for gender classification using only those variables that are readily observable in an uncontrolled environment. Several dimensionality reduction techniques are employed to learn the underlining structure of the data. These techniques include linear projections such as the classical Principal Components Analysis (PCA) and non-linear (manifold learning) techniques, such as Diffusion Maps and the Isomap technique. This paper briefly describes all three techniques, and compares three different classifiers, Naïve Bayes, Adaboost, and Support Vector Machines (SVM), for gender classification in conjunction with each of these three dimensionality reduction approaches.
Stephens, David; Diesing, Markus
2014-01-01
Detailed seabed substrate maps are increasingly in demand for effective planning and management of marine ecosystems and resources. It has become common to use remotely sensed multibeam echosounder data in the form of bathymetry and acoustic backscatter in conjunction with ground-truth sampling data to inform the mapping of seabed substrates. Whilst, until recently, such data sets have typically been classified by expert interpretation, it is now obvious that more objective, faster and repeatable methods of seabed classification are required. This study compares the performances of a range of supervised classification techniques for predicting substrate type from multibeam echosounder data. The study area is located in the North Sea, off the north-east coast of England. A total of 258 ground-truth samples were classified into four substrate classes. Multibeam bathymetry and backscatter data, and a range of secondary features derived from these datasets were used in this study. Six supervised classification techniques were tested: Classification Trees, Support Vector Machines, k-Nearest Neighbour, Neural Networks, Random Forest and Naive Bayes. Each classifier was trained multiple times using different input features, including i) the two primary features of bathymetry and backscatter, ii) a subset of the features chosen by a feature selection process and iii) all of the input features. The predictive performances of the models were validated using a separate test set of ground-truth samples. The statistical significance of model performances relative to a simple baseline model (Nearest Neighbour predictions on bathymetry and backscatter) were tested to assess the benefits of using more sophisticated approaches. The best performing models were tree based methods and Naive Bayes which achieved accuracies of around 0.8 and kappa coefficients of up to 0.5 on the test set. The models that used all input features didn't generally perform well, highlighting the need for some means of feature selection.
Statistical inference for remote sensing-based estimates of net deforestation
Ronald E. McRoberts; Brian F. Walters
2012-01-01
Statistical inference requires expression of an estimate in probabilistic terms, usually in the form of a confidence interval. An approach to constructing confidence intervals for remote sensing-based estimates of net deforestation is illustrated. The approach is based on post-classification methods using two independent forest/non-forest classifications because...
Saini, Harsh; Lal, Sunil Pranit; Naidu, Vimal Vikash; Pickering, Vincel Wince; Singh, Gurmeet; Tsunoda, Tatsuhiko; Sharma, Alok
2016-12-05
High dimensional feature space generally degrades classification in several applications. In this paper, we propose a strategy called gene masking, in which non-contributing dimensions are heuristically removed from the data to improve classification accuracy. Gene masking is implemented via a binary encoded genetic algorithm that can be integrated seamlessly with classifiers during the training phase of classification to perform feature selection. It can also be used to discriminate between features that contribute most to the classification, thereby, allowing researchers to isolate features that may have special significance. This technique was applied on publicly available datasets whereby it substantially reduced the number of features used for classification while maintaining high accuracies. The proposed technique can be extremely useful in feature selection as it heuristically removes non-contributing features to improve the performance of classifiers.
Texture as a basis for acoustic classification of substrate in the nearshore region
NASA Astrophysics Data System (ADS)
Dennison, A.; Wattrus, N. J.
2016-12-01
Segmentation and classification of substrate type from two locations in Lake Superior, are predicted using multivariate statistical processing of textural measures derived from shallow-water, high-resolution multibeam bathymetric data. During a multibeam sonar survey, both bathymetric and backscatter data are collected. It is well documented that the statistical characteristic of a sonar backscatter mosaic is dependent on substrate type. While classifying the bottom-type on the basis on backscatter alone can accurately predict and map bottom-type, it lacks the ability to resolve and capture fine textural details, an important factor in many habitat mapping studies. Statistical processing can capture the pertinent details about the bottom-type that are rich in textural information. Further multivariate statistical processing can then isolate characteristic features, and provide the basis for an accurate classification scheme. Preliminary results from an analysis of bathymetric data and ground-truth samples collected from the Amnicon River, Superior, Wisconsin, and the Lester River, Duluth, Minnesota, demonstrate the ability to process and develop a novel classification scheme of the bottom type in two geomorphologically distinct areas.
NASA Astrophysics Data System (ADS)
Lage, A.; Taboada, J. J.
Precipitation is the most obvious of the weather elements in its effects on normal life. Numerical weather prediction (NWP) is generally used to produce quantitative precip- itation forecast (QPF) beyond the 1-3 h time frame. These models often fail to predict small-scale variations of rain because of spin-up problems and their coarse spatial and temporal resolution (Antolik, 2000). Moreover, there are some uncertainties about the behaviour of the NWP models in extreme situations (de Bruijn and Brandsma, 2000). Hybrid techniques, combining the benefits of NWP and statistical approaches in a flexible way, are very useful to achieve a good QPF. In this work, a new technique of QPF for Galicia (NW of Spain) is presented. This region has a percentage of rainy days per year greater than 50% with quantities that may cause floods, with human and economical damages. The technique is composed of a NWP model (ARPS) and a statistical downscaling process based on an automated classification scheme of at- mospheric circulation patterns for the Iberian Peninsula (J. Ribalaygua and R. Boren, 1995). Results show that QPF for Galicia is improved using this hybrid technique. [1] Antolik, M.S. 2000 "An Overview of the National Weather Service's centralized statistical quantitative precipitation forecasts". Journal of Hydrology, 239, pp:306- 337. [2] de Bruijn, E.I.F and T. Brandsma "Rainfall prediction for a flooding event in Ireland caused by the remnants of Hurricane Charley". Journal of Hydrology, 239, pp:148-161. [3] Ribalaygua, J. and Boren R. "Clasificación de patrones espaciales de precipitación diaria sobre la España Peninsular". Informes N 3 y 4 del Servicio de Análisis e Investigación del Clima. Instituto Nacional de Meteorología. Madrid. 53 pp.
Olives, Casey; Valadez, Joseph J; Brooker, Simon J; Pagano, Marcello
2012-01-01
Originally a binary classifier, Lot Quality Assurance Sampling (LQAS) has proven to be a useful tool for classification of the prevalence of Schistosoma mansoni into multiple categories (≤10%, >10 and <50%, ≥50%), and semi-curtailed sampling has been shown to effectively reduce the number of observations needed to reach a decision. To date the statistical underpinnings for Multiple Category-LQAS (MC-LQAS) have not received full treatment. We explore the analytical properties of MC-LQAS, and validate its use for the classification of S. mansoni prevalence in multiple settings in East Africa. We outline MC-LQAS design principles and formulae for operating characteristic curves. In addition, we derive the average sample number for MC-LQAS when utilizing semi-curtailed sampling and introduce curtailed sampling in this setting. We also assess the performance of MC-LQAS designs with maximum sample sizes of n=15 and n=25 via a weighted kappa-statistic using S. mansoni data collected in 388 schools from four studies in East Africa. Overall performance of MC-LQAS classification was high (kappa-statistic of 0.87). In three of the studies, the kappa-statistic for a design with n=15 was greater than 0.75. In the fourth study, where these designs performed poorly (kappa-statistic less than 0.50), the majority of observations fell in regions where potential error is known to be high. Employment of semi-curtailed and curtailed sampling further reduced the sample size by as many as 0.5 and 3.5 observations per school, respectively, without increasing classification error. This work provides the needed analytics to understand the properties of MC-LQAS for assessing the prevalance of S. mansoni and shows that in most settings a sample size of 15 children provides a reliable classification of schools.
Darmawan, M F; Yusuf, Suhaila M; Kadir, M R Abdul; Haron, H
2015-02-01
Sex estimation is used in forensic anthropology to assist the identification of individual remains. However, the estimation techniques tend to be unique and applicable only to a certain population. This paper analyzed sex estimation on living individual child below 19 years old using the length of 19 bones of left hand applied for three classification techniques, which were Discriminant Function Analysis (DFA), Support Vector Machine (SVM) and Artificial Neural Network (ANN) multilayer perceptron. These techniques were carried out on X-ray images of the left hand taken from an Asian population data set. All the 19 bones of the left hand were measured using Free Image software, and all the techniques were performed using MATLAB. The group of age "16-19" years old and "7-9" years old were the groups that could be used for sex estimation with as their average of accuracy percentage was above 80%. ANN model was the best classification technique with the highest average of accuracy percentage in the two groups of age compared to other classification techniques. The results show that each classification technique has the best accuracy percentage on each different group of age. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Elhenawy, Mohammed; Jahangiri, Arash; Rakha, Hesham A; El-Shawarby, Ihab
2015-10-01
The ability to model driver stop/run behavior at signalized intersections considering the roadway surface condition is critical in the design of advanced driver assistance systems. Such systems can reduce intersection crashes and fatalities by predicting driver stop/run behavior. The research presented in this paper uses data collected from two controlled field experiments on the Smart Road at the Virginia Tech Transportation Institute (VTTI) to model driver stop/run behavior at the onset of a yellow indication for different roadway surface conditions. The paper offers two contributions. First, it introduces a new predictor related to driver aggressiveness and demonstrates that this measure enhances the modeling of driver stop/run behavior. Second, it applies well-known artificial intelligence techniques including: adaptive boosting (AdaBoost), random forest, and support vector machine (SVM) algorithms as well as traditional logistic regression techniques on the data in order to develop a model that can be used by traffic signal controllers to predict driver stop/run decisions in a connected vehicle environment. The research demonstrates that by adding the proposed driver aggressiveness predictor to the model, there is a statistically significant increase in the model accuracy. Moreover the false alarm rate is significantly reduced but this reduction is not statistically significant. The study demonstrates that, for the subject data, the SVM machine learning algorithm performs the best in terms of optimum classification accuracy and false positive rates. However, the SVM model produces the best performance in terms of the classification accuracy only. Copyright © 2015 Elsevier Ltd. All rights reserved.
Classification Techniques for Digital Map Compression
1989-03-01
classification improved the performance of the K-means classification algorithm resulting in a compression of 8.06:1 with Lempel - Ziv coding. Run-length coding... compression performance are run-length coding [2], [8] and Lempel - Ziv coding 110], [11]. These techniques are chosen because they are most efficient when...investigated. After the classification, some standard file compression methods, such as Lempel - Ziv and run-length encoding were applied to the
Automated texture-based identification of ovarian cancer in confocal microendoscope images
NASA Astrophysics Data System (ADS)
Srivastava, Saurabh; Rodriguez, Jeffrey J.; Rouse, Andrew R.; Brewer, Molly A.; Gmitro, Arthur F.
2005-03-01
The fluorescence confocal microendoscope provides high-resolution, in-vivo imaging of cellular pathology during optical biopsy. There are indications that the examination of human ovaries with this instrument has diagnostic implications for the early detection of ovarian cancer. The purpose of this study was to develop a computer-aided system to facilitate the identification of ovarian cancer from digital images captured with the confocal microendoscope system. To achieve this goal, we modeled the cellular-level structure present in these images as texture and extracted features based on first-order statistics, spatial gray-level dependence matrices, and spatial-frequency content. Selection of the best features for classification was performed using traditional feature selection techniques including stepwise discriminant analysis, forward sequential search, a non-parametric method, principal component analysis, and a heuristic technique that combines the results of these methods. The best set of features selected was used for classification, and performance of various machine classifiers was compared by analyzing the areas under their receiver operating characteristic curves. The results show that it is possible to automatically identify patients with ovarian cancer based on texture features extracted from confocal microendoscope images and that the machine performance is superior to that of the human observer.
Practical protocols for fast histopathology by Fourier transform infrared spectroscopic imaging
NASA Astrophysics Data System (ADS)
Keith, Frances N.; Reddy, Rohith K.; Bhargava, Rohit
2008-02-01
Fourier transform infrared (FT-IR) spectroscopic imaging is an emerging technique that combines the molecular selectivity of spectroscopy with the spatial specificity of optical microscopy. We demonstrate a new concept in obtaining high fidelity data using commercial array detectors coupled to a microscope and Michelson interferometer. Next, we apply the developed technique to rapidly provide automated histopathologic information for breast cancer. Traditionally, disease diagnoses are based on optical examinations of stained tissue and involve a skilled recognition of morphological patterns of specific cell types (histopathology). Consequently, histopathologic determinations are a time consuming, subjective process with innate intra- and inter-operator variability. Utilizing endogenous molecular contrast inherent in vibrational spectra, specially designed tissue microarrays and pattern recognition of specific biochemical features, we report an integrated algorithm for automated classifications. The developed protocol is objective, statistically significant and, being compatible with current tissue processing procedures, holds potential for routine clinical diagnoses. We first demonstrate that the classification of tissue type (histology) can be accomplished in a manner that is robust and rigorous. Since data quality and classifier performance are linked, we quantify the relationship through our analysis model. Last, we demonstrate the application of the minimum noise fraction (MNF) transform to improve tissue segmentation.
Modeling Verdict Outcomes Using Social Network Measures: The Watergate and Caviar Network Cases.
Masías, Víctor Hugo; Valle, Mauricio; Morselli, Carlo; Crespo, Fernando; Vargas, Augusto; Laengle, Sigifredo
2016-01-01
Modelling criminal trial verdict outcomes using social network measures is an emerging research area in quantitative criminology. Few studies have yet analyzed which of these measures are the most important for verdict modelling or which data classification techniques perform best for this application. To compare the performance of different techniques in classifying members of a criminal network, this article applies three different machine learning classifiers-Logistic Regression, Naïve Bayes and Random Forest-with a range of social network measures and the necessary databases to model the verdicts in two real-world cases: the U.S. Watergate Conspiracy of the 1970's and the now-defunct Canada-based international drug trafficking ring known as the Caviar Network. In both cases it was found that the Random Forest classifier did better than either Logistic Regression or Naïve Bayes, and its superior performance was statistically significant. This being so, Random Forest was used not only for classification but also to assess the importance of the measures. For the Watergate case, the most important one proved to be betweenness centrality while for the Caviar Network, it was the effective size of the network. These results are significant because they show that an approach combining machine learning with social network analysis not only can generate accurate classification models but also helps quantify the importance social network variables in modelling verdict outcomes. We conclude our analysis with a discussion and some suggestions for future work in verdict modelling using social network measures.
Review of chart recognition in document images
NASA Astrophysics Data System (ADS)
Liu, Yan; Lu, Xiaoqing; Qin, Yeyang; Tang, Zhi; Xu, Jianbo
2013-01-01
As an effective information transmitting way, chart is widely used to represent scientific statistics datum in books, research papers, newspapers etc. Though textual information is still the major source of data, there has been an increasing trend of introducing graphs, pictures, and figures into the information pool. Text recognition techniques for documents have been accomplished using optical character recognition (OCR) software. Chart recognition techniques as a necessary supplement of OCR for document images are still an unsolved problem due to the great subjectiveness and variety of charts styles. This paper reviews the development process of chart recognition techniques in the past decades and presents the focuses of current researches. The whole process of chart recognition is presented systematically, which mainly includes three parts: chart segmentation, chart classification, and chart Interpretation. In each part, the latest research work is introduced. In the last, the paper concludes with a summary and promising future research direction.
Derrac, Joaquín; Triguero, Isaac; Garcia, Salvador; Herrera, Francisco
2012-10-01
Cooperative coevolution is a successful trend of evolutionary computation which allows us to define partitions of the domain of a given problem, or to integrate several related techniques into one, by the use of evolutionary algorithms. It is possible to apply it to the development of advanced classification methods, which integrate several machine learning techniques into a single proposal. A novel approach integrating instance selection, instance weighting, and feature weighting into the framework of a coevolutionary model is presented in this paper. We compare it with a wide range of evolutionary and nonevolutionary related methods, in order to show the benefits of the employment of coevolution to apply the techniques considered simultaneously. The results obtained, contrasted through nonparametric statistical tests, show that our proposal outperforms other methods in the comparison, thus becoming a suitable tool in the task of enhancing the nearest neighbor classifier.
NASA Astrophysics Data System (ADS)
Srinivasan, Yeshwanth; Hernes, Dana; Tulpule, Bhakti; Yang, Shuyu; Guo, Jiangling; Mitra, Sunanda; Yagneswaran, Sriraja; Nutter, Brian; Jeronimo, Jose; Phillips, Benny; Long, Rodney; Ferris, Daron
2005-04-01
Automated segmentation and classification of diagnostic markers in medical imagery are challenging tasks. Numerous algorithms for segmentation and classification based on statistical approaches of varying complexity are found in the literature. However, the design of an efficient and automated algorithm for precise classification of desired diagnostic markers is extremely image-specific. The National Library of Medicine (NLM), in collaboration with the National Cancer Institute (NCI), is creating an archive of 60,000 digitized color images of the uterine cervix. NLM is developing tools for the analysis and dissemination of these images over the Web for the study of visual features correlated with precancerous neoplasia and cancer. To enable indexing of images of the cervix, it is essential to develop algorithms for the segmentation of regions of interest, such as acetowhitened regions, and automatic identification and classification of regions exhibiting mosaicism and punctation. Success of such algorithms depends, primarily, on the selection of relevant features representing the region of interest. We present color and geometric features based statistical classification and segmentation algorithms yielding excellent identification of the regions of interest. The distinct classification of the mosaic regions from the non-mosaic ones has been obtained by clustering multiple geometric and color features of the segmented sections using various morphological and statistical approaches. Such automated classification methodologies will facilitate content-based image retrieval from the digital archive of uterine cervix and have the potential of developing an image based screening tool for cervical cancer.
NASA Astrophysics Data System (ADS)
Berti, Matteo; Corsini, Alessandro; Franceschini, Silvia; Iannacone, Jean Pascal
2013-04-01
The application of space borne synthetic aperture radar interferometry has progressed, over the last two decades, from the pioneer use of single interferograms for analyzing changes on the earth's surface to the development of advanced multi-interferogram techniques to analyze any sort of natural phenomena which involves movements of the ground. The success of multi-interferograms techniques in the analysis of natural hazards such as landslides and subsidence is widely documented in the scientific literature and demonstrated by the consensus among the end-users. Despite the great potential of this technique, radar interpretation of slope movements is generally based on the sole analysis of average displacement velocities, while the information embraced in multi interferogram time series is often overlooked if not completely neglected. The underuse of PS time series is probably due to the detrimental effect of residual atmospheric errors, which make the PS time series characterized by erratic, irregular fluctuations often difficult to interpret, and also to the difficulty of performing a visual, supervised analysis of the time series for a large dataset. In this work is we present a procedure for automatic classification of PS time series based on a series of statistical characterization tests. The procedure allows to classify the time series into six distinctive target trends (0=uncorrelated; 1=linear; 2=quadratic; 3=bilinear; 4=discontinuous without constant velocity; 5=discontinuous with change in velocity) and retrieve for each trend a series of descriptive parameters which can be efficiently used to characterize the temporal changes of ground motion. The classification algorithms were developed and tested using an ENVISAT datasets available in the frame of EPRS-E project (Extraordinary Plan of Environmental Remote Sensing) of the Italian Ministry of Environment (track "Modena", Northern Apennines). This dataset was generated using standard processing, then the time series are typically affected by a significant noise to signal ratio. The results of the analysis show that even with such a rough-quality dataset, our automated classification procedure can greatly improve radar interpretation of mass movements. In general, uncorrelated PS (type 0) are concentrated in flat areas such as fluvial terraces and valley bottoms, and along stable watershed divides; linear PS (type 1) are mainly located on slopes (both inside or outside mapped landslides) or near the edge of scarps or steep slopes; non-linear PS (types 2 to 5) typically fall inside landslide deposits or in the surrounding areas. The spatial distribution of classified PS allows to detect deformation phenomena not visible by considering the average velocity alone, and provide important information on the temporal evolution of the phenomena such as acceleration, deceleration, seasonal fluctuations, abrupt or continuous changes of the displacement rate. Based on these encouraging results we integrated all the classification algorithms into a Graphical User Interface (called PSTime) which is freely available as a standalone application.
Measuring severe maternal morbidity: validation of potential measures.
Main, Elliott K; Abreo, Anisha; McNulty, Jennifer; Gilbert, William; McNally, Colleen; Poeltler, Debra; Lanner-Cusin, Katarina; Fenton, Douglas; Gipps, Theresa; Melsop, Kathryn; Greene, Naomi; Gould, Jeffrey B; Kilpatrick, Sarah
2016-05-01
Both maternal mortality rate and severe maternal morbidity rate have risen significantly in the United Sates. Recently, the Centers for Disease Control and Prevention introduced International Classification of Diseases, 9th revision, criteria for defining severe maternal morbidity with the use of administrative data sources; however, those criteria have not been validated with the use of chart reviews. The primary aim of the current study was to validate the Centers for Disease Control and Prevention International Classification of Diseases, 9th revision, criteria for the identification of severe maternal morbidity. This analysis initially required the development of a reproducible set of clinical conditions that were judged to be consistent with severe maternal morbidity to be used as the clinical gold standard for validation. Alternative criteria for severe maternal morbidity were also examined. The 67,468 deliveries that occurred during a 12-month period from 16 participating California hospitals were screened initially for severe maternal morbidity with the presence of any of 4 criteria: (1) Centers for Disease Control and Prevention International Classification of Diseases, 9th revision, diagnosis and procedure codes; (2) prolonged postpartum length of stay (>3 standard deviations beyond the mean length of stay for the California population); (3) any maternal intensive care unit admissions (with the use of hospital billing sources); and (4) the administration of any blood product (with the use of transfusion service data). Complete medical records for all screen-positive cases were examined to determine whether they satisfied the criteria for the clinical gold standard (determined by 4 rounds of a modified Delphi technique). Descriptive and statistical analyses that included area under the receiver operating characteristic curve and C-statistic were performed. The Centers for Disease Control and Prevention International Classification of Diseases, 9th revision, criteria had a reasonably high sensitivity of 0.77 and a positive predictive value of 0.44 with a C-statistic of 0.87. The most important source of false-positive cases were mothers whose only criterion was 1-2 units of blood products. The Centers for Disease Control and Prevention International Classification of Diseases, 9th revision, criteria screen rate ranged from 0.51-2.45% among hospitals. True positive severe maternal morbidity ranged from 0.05-1.13%. When hospitals were grouped by their neonatal intensive care unit level of care, severe maternal morbidity rates were statistically lower at facilities with lower level neonatal intensive care units (P < .0001). The Centers for Disease Control and Prevention International Classification of Diseases, 9th revision, criteria can serve as a reasonable administrative metric for measuring severe maternal morbidity at population levels. Caution should be used with the use of these criteria for individual hospitals, because case-mix effects appear to be strong. Copyright © 2016 Elsevier Inc. All rights reserved.
Matías, J M; Taboada, J; Ordóñez, C; Nieto, P G
2007-08-17
This article describes a methodology to model the degree of remedial action required to make short stretches of a roadway suitable for dangerous goods transport (DGT), particularly pollutant substances, using different variables associated with the characteristics of each segment. Thirty-one factors determining the impact of an accident on a particular stretch of road were identified and subdivided into two major groups: accident probability factors and accident severity factors. Given the number of factors determining the state of a particular road segment, the only viable statistical methods for implementing the model were machine learning techniques, such as multilayer perceptron networks (MLPs), classification trees (CARTs) and support vector machines (SVMs). The results produced by these techniques on a test sample were more favourable than those produced by traditional discriminant analysis, irrespective of whether dimensionality reduction techniques were applied. The best results were obtained using SVMs specifically adapted to ordinal data. This technique takes advantage of the ordinal information contained in the data without penalising the computational load. Furthermore, the technique permits the estimation of the utility function that is latent in expert knowledge.
SENTINEL-1 and SENTINEL-2 Data Fusion for Wetlands Mapping: Balikdami, Turkey
NASA Astrophysics Data System (ADS)
Kaplan, G.; Avdan, U.
2018-04-01
Wetlands provide a number of environmental and socio-economic benefits such as their ability to store floodwaters and improve water quality, providing habitats for wildlife and supporting biodiversity, as well as aesthetic values. Remote sensing technology has proven to be a useful and frequent application in monitoring and mapping wetlands. Combining optical and microwave satellite data can help with mapping and monitoring the biophysical characteristics of wetlands and wetlands` vegetation. Also, fusing radar and optical remote sensing data can increase the wetland classification accuracy. In this paper, data from the fine spatial resolution optical satellite, Sentinel-2 and the Synthetic Aperture Radar Satellite, Sentinel-1, were fused for mapping wetlands. Both Sentinel-1 and Sentinel-2 images were pre-processed. After the pre-processing, vegetation indices were calculated using the Sentinel-2 bands and the results were included in the fusion data set. For the classification of the fused data, three different classification approaches were used and compared. The results showed significant improvement in the wetland classification using both multispectral and microwave data. Also, the presence of the red edge bands and the vegetation indices used in the data set showed significant improvement in the discrimination between wetlands and other vegetated areas. The statistical results of the fusion of the optical and radar data showed high wetland mapping accuracy, showing an overall classification accuracy of approximately 90 % in the object-based classification method. For future research, we recommend multi-temporal image use, terrain data collection, as well as a comparison of the used method with the traditional image fusion techniques.
[Bile leakage after liver resection: A retrospective cohort study].
Menclová, K; Bělina, F; Pudil, J; Langer, D; Ryska, M
2015-12-01
Many previous reports have focused on bile leakage after liver resection. Despite the improvements in surgical techniques and perioperative care the incidence of this complication rather keeps increasing. A number of predictive factors have been analyzed. There is still no consensus regarding their influence on the formation of bile leakage. The objective of our analysis was to evaluate the incidence of bile leakage, its impact on mortality and duration of hospitalization at our department. At the same time, we conducted an analysis of known predictive factors. The authors present a retrospective review of the set of 146 patients who underwent liver resection at the Department of Surgery of the 2nd Faculty of Medicine of the Charles University and Central Military Hospital Prague, performed between 20102013. We used the current ISGLS (International Study Group of Liver Surgery) classification to evaluate the bile leakage. The severity of this complication was determined according to the Clavien-Dindo classification system. Statistical significance of the predictive factors was determined using Fishers exact test and Students t-test. The incidence of bile leakage was 21%. According to ISGLS classification the A, B, and C rates were 6.5%, 61.2%, and 32.3%, respectively. The severity of bile leakage according to the Clavien-Dindo classification system - I-II, IIIa, IIIb, IV and V rates were 19.3%, 42%, 9.7%, 9.7%, and 19.3%, respectively. We determined the following predictive factors as statistically significant: surgery for malignancy (p<0.001), major hepatic resection (p=0.001), operative time (p<0.001), high intraoperative blood loss (p=0.02), construction of HJA (p=0.005), portal venous embolization/two-stage surgery (p=0.009) and ASA score (p=0.02). Bile leakage significantly prolonged hospitalization time (p<0.001). In the group of patients with bile leakage the perioperative mortality was 23 times higher (p<0.001) than in the group with no leakage. Bile leakage is one of the most serious complications of liver surgery. Most of the risk factors are not easily controllable and there is no clear consensus on their influence. Intraoperative leak tests could probably reduce the incidence of bile leakage. In the future, further studies will be required to improve the perioperative management and techniques to prevent such serious complications. Multidisciplinary approach is essential in the treatment.
Comparisons of neural networks to standard techniques for image classification and correlation
NASA Technical Reports Server (NTRS)
Paola, Justin D.; Schowengerdt, Robert A.
1994-01-01
Neural network techniques for multispectral image classification and spatial pattern detection are compared to the standard techniques of maximum-likelihood classification and spatial correlation. The neural network produced a more accurate classification than maximum-likelihood of a Landsat scene of Tucson, Arizona. Some of the errors in the maximum-likelihood classification are illustrated using decision region and class probability density plots. As expected, the main drawback to the neural network method is the long time required for the training stage. The network was trained using several different hidden layer sizes to optimize both the classification accuracy and training speed, and it was found that one node per class was optimal. The performance improved when 3x3 local windows of image data were entered into the net. This modification introduces texture into the classification without explicit calculation of a texture measure. Larger windows were successfully used for the detection of spatial features in Landsat and Magellan synthetic aperture radar imagery.
Feature extraction and classification algorithms for high dimensional data
NASA Technical Reports Server (NTRS)
Lee, Chulhee; Landgrebe, David
1993-01-01
Feature extraction and classification algorithms for high dimensional data are investigated. Developments with regard to sensors for Earth observation are moving in the direction of providing much higher dimensional multispectral imagery than is now possible. In analyzing such high dimensional data, processing time becomes an important factor. With large increases in dimensionality and the number of classes, processing time will increase significantly. To address this problem, a multistage classification scheme is proposed which reduces the processing time substantially by eliminating unlikely classes from further consideration at each stage. Several truncation criteria are developed and the relationship between thresholds and the error caused by the truncation is investigated. Next an approach to feature extraction for classification is proposed based directly on the decision boundaries. It is shown that all the features needed for classification can be extracted from decision boundaries. A characteristic of the proposed method arises by noting that only a portion of the decision boundary is effective in discriminating between classes, and the concept of the effective decision boundary is introduced. The proposed feature extraction algorithm has several desirable properties: it predicts the minimum number of features necessary to achieve the same classification accuracy as in the original space for a given pattern recognition problem; and it finds the necessary feature vectors. The proposed algorithm does not deteriorate under the circumstances of equal means or equal covariances as some previous algorithms do. In addition, the decision boundary feature extraction algorithm can be used both for parametric and non-parametric classifiers. Finally, some problems encountered in analyzing high dimensional data are studied and possible solutions are proposed. First, the increased importance of the second order statistics in analyzing high dimensional data is recognized. By investigating the characteristics of high dimensional data, the reason why the second order statistics must be taken into account in high dimensional data is suggested. Recognizing the importance of the second order statistics, there is a need to represent the second order statistics. A method to visualize statistics using a color code is proposed. By representing statistics using color coding, one can easily extract and compare the first and the second statistics.
Ensemble of sparse classifiers for high-dimensional biological data.
Kim, Sunghan; Scalzo, Fabien; Telesca, Donatello; Hu, Xiao
2015-01-01
Biological data are often high in dimension while the number of samples is small. In such cases, the performance of classification can be improved by reducing the dimension of data, which is referred to as feature selection. Recently, a novel feature selection method has been proposed utilising the sparsity of high-dimensional biological data where a small subset of features accounts for most variance of the dataset. In this study we propose a new classification method for high-dimensional biological data, which performs both feature selection and classification within a single framework. Our proposed method utilises a sparse linear solution technique and the bootstrap aggregating algorithm. We tested its performance on four public mass spectrometry cancer datasets along with two other conventional classification techniques such as Support Vector Machines and Adaptive Boosting. The results demonstrate that our proposed method performs more accurate classification across various cancer datasets than those conventional classification techniques.
Comparative Analysis of RF Emission Based Fingerprinting Techniques for ZigBee Device Classification
quantify the differences invarious RF fingerprinting techniques via comparative analysis of MDA/ML classification results. The findings herein demonstrate...correct classification rates followed by COR-DNA and then RF-DNA in most test cases and especially in low Eb/N0 ranges, where ZigBee is designed to operate.
The trophic classification of lakes using ERTS multispectral scanner data
NASA Technical Reports Server (NTRS)
Blackwell, R. J.; Boland, D. H.
1975-01-01
Lake classification methods based on the use of ERTS data are described. Preliminary classification results obtained by multispectral and digital image processing techniques indicate satisfactory correlation between ERTS data and EPA-supplied water analysis. Techniques for determining lake trophic levels using ERTS data are examined, and data obtained for 20 lakes are discussed.
SAR-based change detection using hypothesis testing and Markov random field modelling
NASA Astrophysics Data System (ADS)
Cao, W.; Martinis, S.
2015-04-01
The objective of this study is to automatically detect changed areas caused by natural disasters from bi-temporal co-registered and calibrated TerraSAR-X data. The technique in this paper consists of two steps: Firstly, an automatic coarse detection step is applied based on a statistical hypothesis test for initializing the classification. The original analytical formula as proposed in the constant false alarm rate (CFAR) edge detector is reviewed and rewritten in a compact form of the incomplete beta function, which is a builtin routine in commercial scientific software such as MATLAB and IDL. Secondly, a post-classification step is introduced to optimize the noisy classification result in the previous step. Generally, an optimization problem can be formulated as a Markov random field (MRF) on which the quality of a classification is measured by an energy function. The optimal classification based on the MRF is related to the lowest energy value. Previous studies provide methods for the optimization problem using MRFs, such as the iterated conditional modes (ICM) algorithm. Recently, a novel algorithm was presented based on graph-cut theory. This method transforms a MRF to an equivalent graph and solves the optimization problem by a max-flow/min-cut algorithm on the graph. In this study this graph-cut algorithm is applied iteratively to improve the coarse classification. At each iteration the parameters of the energy function for the current classification are set by the logarithmic probability density function (PDF). The relevant parameters are estimated by the method of logarithmic cumulants (MoLC). Experiments are performed using two flood events in Germany and Australia in 2011 and a forest fire on La Palma in 2009 using pre- and post-event TerraSAR-X data. The results show convincing coarse classifications and considerable improvement by the graph-cut post-classification step.
NASA Technical Reports Server (NTRS)
Spruce, J. P.; Smoot, James; Ellis, Jean; Hilbert, Kent; Swann, Roberta
2012-01-01
This paper discusses the development and implementation of a geospatial data processing method and multi-decadal Landsat time series for computing general coastal U.S. land-use and land-cover (LULC) classifications and change products consisting of seven classes (water, barren, upland herbaceous, non-woody wetland, woody upland, woody wetland, and urban). Use of this approach extends the observational period of the NOAA-generated Coastal Change and Analysis Program (C-CAP) products by almost two decades, assuming the availability of one cloud free Landsat scene from any season for each targeted year. The Mobile Bay region in Alabama was used as a study area to develop, demonstrate, and validate the method that was applied to derive LULC products for nine dates at approximate five year intervals across a 34-year time span, using single dates of data for each classification in which forests were either leaf-on, leaf-off, or mixed senescent conditions. Classifications were computed and refined using decision rules in conjunction with unsupervised classification of Landsat data and C-CAP value-added products. Each classification's overall accuracy was assessed by comparing stratified random locations to available reference data, including higher spatial resolution satellite and aerial imagery, field survey data, and raw Landsat RGBs. Overall classification accuracies ranged from 83 to 91% with overall Kappa statistics ranging from 0.78 to 0.89. The accuracies are comparable to those from similar, generalized LULC products derived from C-CAP data. The Landsat MSS-based LULC product accuracies are similar to those from Landsat TM or ETM+ data. Accurate classifications were computed for all nine dates, yielding effective results regardless of season. This classification method yielded products that were used to compute LULC change products via additive GIS overlay techniques.
NASA Astrophysics Data System (ADS)
Sridhar, J.
2015-12-01
The focus of this work is to examine polarimetric decomposition techniques primarily focussed on Pauli decomposition and Sphere Di-Plane Helix (SDH) decomposition for forest resource assessment. The data processing methods adopted are Pre-processing (Geometric correction and Radiometric calibration), Speckle Reduction, Image Decomposition and Image Classification. Initially to classify forest regions, unsupervised classification was applied to determine different unknown classes. It was observed K-means clustering method gave better results in comparison with ISO Data method.Using the algorithm developed for Radar Tools, the code for decomposition and classification techniques were applied in Interactive Data Language (IDL) and was applied to RISAT-1 image of Mysore-Mandya region of Karnataka, India. This region is chosen for studying forest vegetation and consists of agricultural lands, water and hilly regions. Polarimetric SAR data possess a high potential for classification of earth surface.After applying the decomposition techniques, classification was done by selecting region of interests andpost-classification the over-all accuracy was observed to be higher in the SDH decomposed image, as it operates on individual pixels on a coherent basis and utilises the complete intrinsic coherent nature of polarimetric SAR data. Thereby, making SDH decomposition particularly suited for analysis of high-resolution SAR data. The Pauli Decomposition represents all the polarimetric information in a single SAR image however interpretation of the resulting image is difficult. The SDH decomposition technique seems to produce better results and interpretation as compared to Pauli Decomposition however more quantification and further analysis are being done in this area of research. The comparison of Polarimetric decomposition techniques and evolutionary classification techniques will be the scope of this work.
McCann, Cooper; Repasky, Kevin S.; Morin, Mikindra; ...
2017-05-23
Hyperspectral image analysis has benefited from an array of methods that take advantage of the increased spectral depth compared to multispectral sensors; however, the focus of these developments has been on supervised classification methods. Lack of a priori knowledge regarding land cover characteristics can make unsupervised classification methods preferable under certain circumstances. An unsupervised classification technique is presented in this paper that utilizes physically relevant basis functions to model the reflectance spectra. These fit parameters used to generate the basis functions allow clustering based on spectral characteristics rather than spectral channels and provide both noise and data reduction. Histogram splittingmore » of the fit parameters is then used as a means of producing an unsupervised classification. Unlike current unsupervised classification techniques that rely primarily on Euclidian distance measures to determine similarity, the unsupervised classification technique uses the natural splitting of the fit parameters associated with the basis functions creating clusters that are similar in terms of physical parameters. The data set used in this work utilizes the publicly available data collected at Indian Pines, Indiana. This data set provides reference data allowing for comparisons of the efficacy of different unsupervised data analysis. The unsupervised histogram splitting technique presented in this paper is shown to be better than the standard unsupervised ISODATA clustering technique with an overall accuracy of 34.3/19.0% before merging and 40.9/39.2% after merging. Finally, this improvement is also seen as an improvement of kappa before/after merging of 24.8/30.5 for the histogram splitting technique compared to 15.8/28.5 for ISODATA.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
McCann, Cooper; Repasky, Kevin S.; Morin, Mikindra
Hyperspectral image analysis has benefited from an array of methods that take advantage of the increased spectral depth compared to multispectral sensors; however, the focus of these developments has been on supervised classification methods. Lack of a priori knowledge regarding land cover characteristics can make unsupervised classification methods preferable under certain circumstances. An unsupervised classification technique is presented in this paper that utilizes physically relevant basis functions to model the reflectance spectra. These fit parameters used to generate the basis functions allow clustering based on spectral characteristics rather than spectral channels and provide both noise and data reduction. Histogram splittingmore » of the fit parameters is then used as a means of producing an unsupervised classification. Unlike current unsupervised classification techniques that rely primarily on Euclidian distance measures to determine similarity, the unsupervised classification technique uses the natural splitting of the fit parameters associated with the basis functions creating clusters that are similar in terms of physical parameters. The data set used in this work utilizes the publicly available data collected at Indian Pines, Indiana. This data set provides reference data allowing for comparisons of the efficacy of different unsupervised data analysis. The unsupervised histogram splitting technique presented in this paper is shown to be better than the standard unsupervised ISODATA clustering technique with an overall accuracy of 34.3/19.0% before merging and 40.9/39.2% after merging. Finally, this improvement is also seen as an improvement of kappa before/after merging of 24.8/30.5 for the histogram splitting technique compared to 15.8/28.5 for ISODATA.« less
An objective and parsimonious approach for classifying natural flow regimes at a continental scale
NASA Astrophysics Data System (ADS)
Archfield, S. A.; Kennen, J.; Carlisle, D.; Wolock, D.
2013-12-01
Hydroecological stream classification--the process of grouping streams by similar hydrologic responses and, thereby, similar aquatic habitat--has been widely accepted and is often one of the first steps towards developing ecological flow targets. Despite its importance, the last national classification of streamgauges was completed about 20 years ago. A new classification of 1,534 streamgauges in the contiguous United States is presented using a novel and parsimonious approach to understand similarity in ecological streamflow response. This new classification approach uses seven fundamental daily streamflow statistics (FDSS) rather than winnowing down an uncorrelated subset from 200 or more ecologically relevant streamflow statistics (ERSS) commonly used in hydroecological classification studies. The results of this investigation demonstrate that the distributions of 33 tested ERSS are consistently different among the classes derived from the seven FDSS. It is further shown that classification based solely on the 33 ERSS generally does a poorer job in grouping similar streamgauges than the classification based on the seven FDSS. This new classification approach has the additional advantages of overcoming some of the subjectivity associated with the selection of the classification variables and provides a set of robust continental-scale classes of US streamgauges.
Nationwide forestry applications program. Analysis of forest classification accuracy
NASA Technical Reports Server (NTRS)
Congalton, R. G.; Mead, R. A.; Oderwald, R. G.; Heinen, J. (Principal Investigator)
1981-01-01
The development of LANDSAT classification accuracy assessment techniques, and of a computerized system for assessing wildlife habitat from land cover maps are considered. A literature review on accuracy assessment techniques and an explanation for the techniques development under both projects are included along with listings of the computer programs. The presentations and discussions at the National Working Conference on LANDSAT Classification Accuracy are summarized. Two symposium papers which were published on the results of this project are appended.
Maulik, Ujjwal; Mallik, Saurav; Mukhopadhyay, Anirban; Bandyopadhyay, Sanghamitra
2015-01-01
Microarray and beadchip are two most efficient techniques for measuring gene expression and methylation data in bioinformatics. Biclustering deals with the simultaneous clustering of genes and samples. In this article, we propose a computational rule mining framework, StatBicRM (i.e., statistical biclustering-based rule mining) to identify special type of rules and potential biomarkers using integrated approaches of statistical and binary inclusion-maximal biclustering techniques from the biological datasets. At first, a novel statistical strategy has been utilized to eliminate the insignificant/low-significant/redundant genes in such way that significance level must satisfy the data distribution property (viz., either normal distribution or non-normal distribution). The data is then discretized and post-discretized, consecutively. Thereafter, the biclustering technique is applied to identify maximal frequent closed homogeneous itemsets. Corresponding special type of rules are then extracted from the selected itemsets. Our proposed rule mining method performs better than the other rule mining algorithms as it generates maximal frequent closed homogeneous itemsets instead of frequent itemsets. Thus, it saves elapsed time, and can work on big dataset. Pathway and Gene Ontology analyses are conducted on the genes of the evolved rules using David database. Frequency analysis of the genes appearing in the evolved rules is performed to determine potential biomarkers. Furthermore, we also classify the data to know how much the evolved rules are able to describe accurately the remaining test (unknown) data. Subsequently, we also compare the average classification accuracy, and other related factors with other rule-based classifiers. Statistical significance tests are also performed for verifying the statistical relevance of the comparative results. Here, each of the other rule mining methods or rule-based classifiers is also starting with the same post-discretized data-matrix. Finally, we have also included the integrated analysis of gene expression and methylation for determining epigenetic effect (viz., effect of methylation) on gene expression level. PMID:25830807
Maulik, Ujjwal; Mallik, Saurav; Mukhopadhyay, Anirban; Bandyopadhyay, Sanghamitra
2015-01-01
Microarray and beadchip are two most efficient techniques for measuring gene expression and methylation data in bioinformatics. Biclustering deals with the simultaneous clustering of genes and samples. In this article, we propose a computational rule mining framework, StatBicRM (i.e., statistical biclustering-based rule mining) to identify special type of rules and potential biomarkers using integrated approaches of statistical and binary inclusion-maximal biclustering techniques from the biological datasets. At first, a novel statistical strategy has been utilized to eliminate the insignificant/low-significant/redundant genes in such way that significance level must satisfy the data distribution property (viz., either normal distribution or non-normal distribution). The data is then discretized and post-discretized, consecutively. Thereafter, the biclustering technique is applied to identify maximal frequent closed homogeneous itemsets. Corresponding special type of rules are then extracted from the selected itemsets. Our proposed rule mining method performs better than the other rule mining algorithms as it generates maximal frequent closed homogeneous itemsets instead of frequent itemsets. Thus, it saves elapsed time, and can work on big dataset. Pathway and Gene Ontology analyses are conducted on the genes of the evolved rules using David database. Frequency analysis of the genes appearing in the evolved rules is performed to determine potential biomarkers. Furthermore, we also classify the data to know how much the evolved rules are able to describe accurately the remaining test (unknown) data. Subsequently, we also compare the average classification accuracy, and other related factors with other rule-based classifiers. Statistical significance tests are also performed for verifying the statistical relevance of the comparative results. Here, each of the other rule mining methods or rule-based classifiers is also starting with the same post-discretized data-matrix. Finally, we have also included the integrated analysis of gene expression and methylation for determining epigenetic effect (viz., effect of methylation) on gene expression level.
Rock classification based on resistivity patterns in electrical borehole wall images
NASA Astrophysics Data System (ADS)
Linek, Margarete; Jungmann, Matthias; Berlage, Thomas; Pechnig, Renate; Clauser, Christoph
2007-06-01
Electrical borehole wall images represent grey-level-coded micro-resistivity measurements at the borehole wall. Different scientific methods have been implemented to transform image data into quantitative log curves. We introduce a pattern recognition technique applying texture analysis, which uses second-order statistics based on studying the occurrence of pixel pairs. We calculate so-called Haralick texture features such as contrast, energy, entropy and homogeneity. The supervised classification method is used for assigning characteristic texture features to different rock classes and assessing the discriminative power of these image features. We use classifiers obtained from training intervals to characterize the entire image data set recovered in ODP hole 1203A. This yields a synthetic lithology profile based on computed texture data. We show that Haralick features accurately classify 89.9% of the training intervals. We obtained misclassification for vesicular basaltic rocks. Hence, further image analysis tools are used to improve the classification reliability. We decompose the 2D image signal by the application of wavelet transformation in order to enhance image objects horizontally, diagonally and vertically. The resulting filtered images are used for further texture analysis. This combined classification based on Haralick features and wavelet transformation improved our classification up to a level of 98%. The application of wavelet transformation increases the consistency between standard logging profiles and texture-derived lithology. Texture analysis of borehole wall images offers the potential to facilitate objective analysis of multiple boreholes with the same lithology.
Selecting reusable components using algebraic specifications
NASA Technical Reports Server (NTRS)
Eichmann, David A.
1992-01-01
A significant hurdle confronts the software reuser attempting to select candidate components from a software repository - discriminating between those components without resorting to inspection of the implementation(s). We outline a mixed classification/axiomatic approach to this problem based upon our lattice-based faceted classification technique and Guttag and Horning's algebraic specification techniques. This approach selects candidates by natural language-derived classification, by their interfaces, using signatures, and by their behavior, using axioms. We briefly outline our problem domain and related work. Lattice-based faceted classifications are described; the reader is referred to surveys of the extensive literature for algebraic specification techniques. Behavioral support for reuse queries is presented, followed by the conclusions.
Correcting evaluation bias of relational classifiers with network cross validation
Neville, Jennifer; Gallagher, Brian; Eliassi-Rad, Tina; ...
2011-01-04
Recently, a number of modeling techniques have been developed for data mining and machine learning in relational and network domains where the instances are not independent and identically distributed (i.i.d.). These methods specifically exploit the statistical dependencies among instances in order to improve classification accuracy. However, there has been little focus on how these same dependencies affect our ability to draw accurate conclusions about the performance of the models. More specifically, the complex link structure and attribute dependencies in relational data violate the assumptions of many conventional statistical tests and make it difficult to use these tests to assess themore » models in an unbiased manner. In this work, we examine the task of within-network classification and the question of whether two algorithms will learn models that will result in significantly different levels of performance. We show that the commonly used form of evaluation (paired t-test on overlapping network samples) can result in an unacceptable level of Type I error. Furthermore, we show that Type I error increases as (1) the correlation among instances increases and (2) the size of the evaluation set increases (i.e., the proportion of labeled nodes in the network decreases). Lastly, we propose a method for network cross-validation that combined with paired t-tests produces more acceptable levels of Type I error while still providing reasonable levels of statistical power (i.e., 1–Type II error).« less
Progress in the detection of neoplastic progress and cancer by Raman spectroscopy
NASA Astrophysics Data System (ADS)
Bakker Schut, Tom C.; Stone, Nicholas; Kendall, Catherine A.; Barr, Hugh; Bruining, Hajo A.; Puppels, Gerwin J.
2000-05-01
Early detection of cancer is important because of the improved survival rates when the cancer is treated early. We study the application of NIR Raman spectroscopy for detection of dysplasia because this technique is sensitive to the small changes in molecular invasive in vivo detection using fiber-optic probes. The result of an in vitro study to detect neoplastic progress of esophageal Barrett's esophageal tissue will be presented. Using multivariate statistics, we developed three different linear discriminant analysis classification models to predict tissue type on the basis of the measured spectrum. Spectra of normal, metaplastic and dysplasia tissue could be discriminated with an accuracy of up to 88 percent. Therefore Raman spectroscopy seems to be a very suitable technique to detect dysplasia in Barrett's esophageal tissue.
Karimi, Mohammad H; Asemani, Davud
2014-05-01
Ceramic and tile industries should indispensably include a grading stage to quantify the quality of products. Actually, human control systems are often used for grading purposes. An automatic grading system is essential to enhance the quality control and marketing of the products. Since there generally exist six different types of defects originating from various stages of tile manufacturing lines with distinct textures and morphologies, many image processing techniques have been proposed for defect detection. In this paper, a survey has been made on the pattern recognition and image processing algorithms which have been used to detect surface defects. Each method appears to be limited for detecting some subgroup of defects. The detection techniques may be divided into three main groups: statistical pattern recognition, feature vector extraction and texture/image classification. The methods such as wavelet transform, filtering, morphology and contourlet transform are more effective for pre-processing tasks. Others including statistical methods, neural networks and model-based algorithms can be applied to extract the surface defects. Although, statistical methods are often appropriate for identification of large defects such as Spots, but techniques such as wavelet processing provide an acceptable response for detection of small defects such as Pinhole. A thorough survey is made in this paper on the existing algorithms in each subgroup. Also, the evaluation parameters are discussed including supervised and unsupervised parameters. Using various performance parameters, different defect detection algorithms are compared and evaluated. Copyright © 2013 ISA. Published by Elsevier Ltd. All rights reserved.
Hu, Fei; Cheng, Yayun; Gui, Liangqi; Wu, Liang; Zhang, Xinyi; Peng, Xiaohui; Su, Jinlong
2016-11-01
The polarization properties of thermal millimeter-wave emission capture inherent information of objects, e.g., material composition, shape, and surface features. In this paper, a polarization-based material-classification technique using passive millimeter-wave polarimetric imagery is presented. Linear polarization ratio (LPR) is created to be a new feature discriminator that is sensitive to material type and to remove the reflected ambient radiation effect. The LPR characteristics of several common natural and artificial materials are investigated by theoretical and experimental analysis. Based on a priori information about LPR characteristics, the optimal range of incident angle and the classification criterion are discussed. Simulation and measurement results indicate that the presented classification technique is effective for distinguishing between metals and dielectrics. This technique suggests possible applications for outdoor metal target detection in open scenes.
NASA Astrophysics Data System (ADS)
Yang, Jing; Zammit, Christian; Dudley, Bruce
2017-04-01
The phenomenon of losing and gaining in rivers normally takes place in lowland where often there are various, sometimes conflicting uses for water resources, e.g., agriculture, industry, recreation, and maintenance of ecosystem function. To better support water allocation decisions, it is crucial to understand the location and seasonal dynamics of these losses and gains. We present a statistical methodology to predict losing and gaining river reaches in New Zealand based on 1) information surveys with surface water and groundwater experts from regional government, 2) A collection of river/watershed characteristics, including climate, soil and hydrogeologic information, and 3) the random forests technique. The surveys on losing and gaining reaches were conducted face-to-face at 16 New Zealand regional government authorities, and climate, soil, river geometry, and hydrogeologic data from various sources were collected and compiled to represent river/watershed characteristics. The random forests technique was used to build up the statistical relationship between river reach status (gain and loss) and river/watershed characteristics, and then to predict for river reaches at Strahler order one without prior losing and gaining information. Results show that the model has a classification error of around 10% for "gain" and "loss". The results will assist further research, and water allocation decisions in lowland New Zealand.
ERIC Educational Resources Information Center
Vaughn, Brandon K.; Wang, Qui
2009-01-01
Many areas in educational and psychological research involve the use of classification statistical analysis. For example, school districts might be interested in attaining variables that provide optimal prediction of school dropouts. In psychology, a researcher might be interested in the classification of a subject into a particular psychological…
NASA Technical Reports Server (NTRS)
Messmore, J. A.
1976-01-01
The feasibility of using digital satellite imagery and automatic data processing techniques as a means of mapping swamp forest vegetation was considered, using multispectral scanner data acquired by the LANDSAT-1 satellite. The site for this investigation was the Dismal Swamp, a 210,000 acre swamp forest located south of Suffolk, Va. on the Virginia-North Carolina border. Two basic classification strategies were employed. The initial classification utilized unsupervised techniques which produced a map of the swamp indicating the distribution of thirteen forest spectral classes. These classes were later combined into three informational categories: Atlantic white cedar (Chamaecyparis thyoides), Loblolly pine (Pinus taeda), and deciduous forest. The subsequent classification employed supervised techniques which mapped Atlantic white cedar, Loblolly pine, deciduous forest, water and agriculture within the study site. A classification accuracy of 82.5% was produced by unsupervised techniques compared with 89% accuracy using supervised techniques.
NASA Technical Reports Server (NTRS)
Botkin, Daniel B.
1987-01-01
The analysis of ground-truth data from the boreal forest plots in the Superior National Forest, Minnesota, was completed. Development of statistical methods was completed for dimension analysis (equations to estimate the biomass of trees from measurements of diameter and height). The dimension-analysis equations were applied to the data obtained from ground-truth plots, to estimate the biomass. Classification and analyses of remote sensing images of the Superior National Forest were done as a test of the technique to determine forest biomass and ecological state by remote sensing. Data was archived on diskette and tape and transferred to UCSB to be used in subsequent research.
Crop identification and area estimation over large geographic areas using LANDSAT MSS data
NASA Technical Reports Server (NTRS)
Bauer, M. E. (Principal Investigator)
1977-01-01
The author has identified the following significant results. LANDSAT MSS data was adequate to accurately identify wheat in Kansas; corn and soybean estimates in Indiana were less accurate. Computer-aided analysis techniques were effectively used to extract crop identification information from LANDSAT data. Systematic sampling of entire counties made possible by computer classification methods resulted in very precise area estimates at county, district, and state levels. Training statistics were successfully extended from one county to other counties having similar crops and soils if the training areas sampled the total variation of the area to be classified.
Evaluation of large area crop estimation techniques using LANDSAT and ground-derived data. [Missouri
NASA Technical Reports Server (NTRS)
Amis, M. L.; Lennington, R. K.; Martin, M. V.; Mcguire, W. G.; Shen, S. S. (Principal Investigator)
1981-01-01
The results of the Domestic Crops and Land Cover Classification and Clustering study on large area crop estimation using LANDSAT and ground truth data are reported. The current crop area estimation approach of the Economics and Statistics Service of the U.S. Department of Agriculture was evaluated in terms of the factors that are likely to influence the bias and variance of the estimator. Also, alternative procedures involving replacements for the clustering algorithm, the classifier, or the regression model used in the original U.S. Department of Agriculture procedures were investigated.
NASA Astrophysics Data System (ADS)
Alexakis, Dimitris; Hadjimitsis, Diofantos; Agapiou, Athos; Themistocleous, Kyriacos; Retalis, Adrianos
2011-11-01
The increase of flood inundation occuring in different regions all over the world have enhanced the need for effective flood risk management. As floods frequency is increasing with a steady rate due to ever increasing human activities on physical floodplains there is a respectively increasing of financial destructive impact of floods. A flood can be determined as a mass of water that produces runoff on land that is not normally covered by water. However, earth observation techniques such as satellite remote sensing can contribute toward a more efficient flood risk mapping according to EU Directives of 2007/60. This study strives to highlight the need of digital mapping of urban sprawl in a catchment area in Cyprus and the assessment of its contribution to flood risk. The Yialias river (Nicosia, Cyprus) was selected as case study where devastating flash floods events took place at 2003 and 2009. In order to search the diachronic land cover regime of the study area multi-temporal satellite imagery was processed and analyzed (e.g Landsat TMETM+, Aster). The land cover regime was examined in detail by using sophisticated post-processing classification algorithms such as Maximum Likelihood, Parallelepiped Algorithm, Minimum Distance, Spectral Angle and Isodata. Texture features were calculated using the Grey Level Co-Occurence Matrix. In addition three classification techniques were compared : multispectral classification, texture based classification and a combination of both. The classification products were compared and evaluated for their accuracy. Moreover, a knowledge-rule method is proposed based on spectral, texture and shape features in order to create efficient land use and land cover maps of the study area. Morphometric parameters such as stream frequency, drainage density and elongation ratio were calculated in order to extract the basic watershed characteristics. In terms of the impacts of land use/cover on flooding, GIS and Fragstats tool were used to detect identifying trends, both visually and statistically, resulting from land use changes in a flood prone area such as Yialias by the use of spatial metrics. The results indicated that there is a considerable increase of urban areas cover during the period of the last 30 years. All these denoted that one of the main driving force of the increasing flood risk in catchment areas in Cyprus is generally associated to human activities.
NASA Astrophysics Data System (ADS)
Steinberg, P. D.; Brener, G.; Duffy, D.; Nearing, G. S.; Pelissier, C.
2017-12-01
Hyperparameterization, of statistical models, i.e. automated model scoring and selection, such as evolutionary algorithms, grid searches, and randomized searches, can improve forecast model skill by reducing errors associated with model parameterization, model structure, and statistical properties of training data. Ensemble Learning Models (Elm), and the related Earthio package, provide a flexible interface for automating the selection of parameters and model structure for machine learning models common in climate science and land cover classification, offering convenient tools for loading NetCDF, HDF, Grib, or GeoTiff files, decomposition methods like PCA and manifold learning, and parallel training and prediction with unsupervised and supervised classification, clustering, and regression estimators. Continuum Analytics is using Elm to experiment with statistical soil moisture forecasting based on meteorological forcing data from NASA's North American Land Data Assimilation System (NLDAS). There Elm is using the NSGA-2 multiobjective optimization algorithm for optimizing statistical preprocessing of forcing data to improve goodness-of-fit for statistical models (i.e. feature engineering). This presentation will discuss Elm and its components, including dask (distributed task scheduling), xarray (data structures for n-dimensional arrays), and scikit-learn (statistical preprocessing, clustering, classification, regression), and it will show how NSGA-2 is being used for automate selection of soil moisture forecast statistical models for North America.
Biometric Authentication for Gender Classification Techniques: A Review
NASA Astrophysics Data System (ADS)
Mathivanan, P.; Poornima, K.
2017-12-01
One of the challenging biometric authentication applications is gender identification and age classification, which captures gait from far distance and analyze physical information of the subject such as gender, race and emotional state of the subject. It is found that most of the gender identification techniques have focused only with frontal pose of different human subject, image size and type of database used in the process. The study also classifies different feature extraction process such as, Principal Component Analysis (PCA) and Local Directional Pattern (LDP) that are used to extract the authentication features of a person. This paper aims to analyze different gender classification techniques that help in evaluating strength and weakness of existing gender identification algorithm. Therefore, it helps in developing a novel gender classification algorithm with less computation cost and more accuracy. In this paper, an overview and classification of different gender identification techniques are first presented and it is compared with other existing human identification system by means of their performance.
Decision trees in epidemiological research.
Venkatasubramaniam, Ashwini; Wolfson, Julian; Mitchell, Nathan; Barnes, Timothy; JaKa, Meghan; French, Simone
2017-01-01
In many studies, it is of interest to identify population subgroups that are relatively homogeneous with respect to an outcome. The nature of these subgroups can provide insight into effect mechanisms and suggest targets for tailored interventions. However, identifying relevant subgroups can be challenging with standard statistical methods. We review the literature on decision trees, a family of techniques for partitioning the population, on the basis of covariates, into distinct subgroups who share similar values of an outcome variable. We compare two decision tree methods, the popular Classification and Regression tree (CART) technique and the newer Conditional Inference tree (CTree) technique, assessing their performance in a simulation study and using data from the Box Lunch Study, a randomized controlled trial of a portion size intervention. Both CART and CTree identify homogeneous population subgroups and offer improved prediction accuracy relative to regression-based approaches when subgroups are truly present in the data. An important distinction between CART and CTree is that the latter uses a formal statistical hypothesis testing framework in building decision trees, which simplifies the process of identifying and interpreting the final tree model. We also introduce a novel way to visualize the subgroups defined by decision trees. Our novel graphical visualization provides a more scientifically meaningful characterization of the subgroups identified by decision trees. Decision trees are a useful tool for identifying homogeneous subgroups defined by combinations of individual characteristics. While all decision tree techniques generate subgroups, we advocate the use of the newer CTree technique due to its simplicity and ease of interpretation.
NASA Technical Reports Server (NTRS)
Nguyen, Andrew; Gole, Alexander; Randall, Jarom; Dlott, Glade; Zhang, Sylvia; Alfaro, Brian; Schmidt, Cindy; Skiles, J. W.
2011-01-01
Mapping and predicting the spatial distribution of invasive plant species is central to habitat management, however difficult to implement at landscape and regional scales. Remote sensing techniques can reduce the impact field campaigns have on these ecologically sensitive areas and can provide a regional and multi-temporal view of invasive species spread. Invasive perennial pepperweed (Lepidium latifolium) is now widespread in fragmented estuaries of the South San Francisco Bay, and is shown to degrade native vegetation in estuaries and adjacent habitats, thereby reducing forage and shelter for wildlife. The purpose of this study is to map the present distribution of pepperweed in estuarine areas of the South San Francisco Bay Salt Pond Restoration Project (Alviso, CA), and create a habitat suitability model to predict future spread. Pepperweed reflectance data were collected in-situ with a GER 1500 spectroradiometer along with 88 corresponding pepperweed presence and absence points used for building the statistical models. The spectral angle mapper (SAM) classification algorithm was used to distinguish the reflectance spectrum of pepperweed and map its distribution using an image from EO-1 Hyperion. To map pepperweed, we performed a supervised classification on an ASTER image with a resulting classification accuracy of 71.8%. We generated a weighted overlay analysis model within a geographic information system (GIS) framework to predict areas in the study site most susceptible to pepperweed colonization. Variables for the model included propensity for disturbance, status of pond restoration, proximity to water channels, and terrain curvature. A Generalized Additive Model (GAM) was also used to generate a probability map and investigate the statistical probability that each variable contributed to predict pepperweed spread. Results from the GAM revealed distance to channels, distance to ponds and curvature were statistically significant (p < 0.01) in determining the locations of suitable pepperweed habitats.
Slip, David J.; Hocking, David P.; Harcourt, Robert G.
2016-01-01
Constructing activity budgets for marine animals when they are at sea and cannot be directly observed is challenging, but recent advances in bio-logging technology offer solutions to this problem. Accelerometers can potentially identify a wide range of behaviours for animals based on unique patterns of acceleration. However, when analysing data derived from accelerometers, there are many statistical techniques available which when applied to different data sets produce different classification accuracies. We investigated a selection of supervised machine learning methods for interpreting behavioural data from captive otariids (fur seals and sea lions). We conducted controlled experiments with 12 seals, where their behaviours were filmed while they were wearing 3-axis accelerometers. From video we identified 26 behaviours that could be grouped into one of four categories (foraging, resting, travelling and grooming) representing key behaviour states for wild seals. We used data from 10 seals to train four predictive classification models: stochastic gradient boosting (GBM), random forests, support vector machine using four different kernels and a baseline model: penalised logistic regression. We then took the best parameters from each model and cross-validated the results on the two seals unseen so far. We also investigated the influence of feature statistics (describing some characteristic of the seal), testing the models both with and without these. Cross-validation accuracies were lower than training accuracy, but the SVM with a polynomial kernel was still able to classify seal behaviour with high accuracy (>70%). Adding feature statistics improved accuracies across all models tested. Most categories of behaviour -resting, grooming and feeding—were all predicted with reasonable accuracy (52–81%) by the SVM while travelling was poorly categorised (31–41%). These results show that model selection is important when classifying behaviour and that by using animal characteristics we can strengthen the overall accuracy. PMID:28002450
Kapellusch, Jay M; Bao, Stephen S; Silverstein, Barbara A; Merryweather, Andrew S; Thiese, Mathew S; Hegmann, Kurt T; Garg, Arun
2017-12-01
The Strain Index (SI) and the American Conference of Governmental Industrial Hygienists (ACGIH) Threshold Limit Value for Hand Activity Level (TLV for HAL) use different constituent variables to quantify task physical exposures. Similarly, time-weighted-average (TWA), Peak, and Typical exposure techniques to quantify physical exposure from multi-task jobs make different assumptions about each task's contribution to the whole job exposure. Thus, task and job physical exposure classifications differ depending upon which model and technique are used for quantification. This study examines exposure classification agreement, disagreement, correlation, and magnitude of classification differences between these models and techniques. Data from 710 multi-task job workers performing 3,647 tasks were analyzed using the SI and TLV for HAL models, as well as with the TWA, Typical and Peak job exposure techniques. Physical exposures were classified as low, medium, and high using each model's recommended, or a priori limits. Exposure classification agreement and disagreement between models (SI, TLV for HAL) and between job exposure techniques (TWA, Typical, Peak) were described and analyzed. Regardless of technique, the SI classified more tasks as high exposure than the TLV for HAL, and the TLV for HAL classified more tasks as low exposure. The models agreed on 48.5% of task classifications (kappa = 0.28) with 15.5% of disagreement between low and high exposure categories. Between-technique (i.e., TWA, Typical, Peak) agreement ranged from 61-93% (kappa: 0.16-0.92) depending on whether the SI or TLV for HAL was used. There was disagreement between the SI and TLV for HAL and between the TWA, Typical and Peak techniques. Disagreement creates uncertainty for job design, job analysis, risk assessments, and developing interventions. Task exposure classifications from the SI and TLV for HAL might complement each other. However, TWA, Typical, and Peak job exposure techniques all have limitations. Part II of this article examines whether the observed differences between these models and techniques produce different exposure-response relationships for predicting prevalence of carpal tunnel syndrome.
Improving permafrost distribution modelling using feature selection algorithms
NASA Astrophysics Data System (ADS)
Deluigi, Nicola; Lambiel, Christophe; Kanevski, Mikhail
2016-04-01
The availability of an increasing number of spatial data on the occurrence of mountain permafrost allows the employment of machine learning (ML) classification algorithms for modelling the distribution of the phenomenon. One of the major problems when dealing with high-dimensional dataset is the number of input features (variables) involved. Application of ML classification algorithms to this large number of variables leads to the risk of overfitting, with the consequence of a poor generalization/prediction. For this reason, applying feature selection (FS) techniques helps simplifying the amount of factors required and improves the knowledge on adopted features and their relation with the studied phenomenon. Moreover, taking away irrelevant or redundant variables from the dataset effectively improves the quality of the ML prediction. This research deals with a comparative analysis of permafrost distribution models supported by FS variable importance assessment. The input dataset (dimension = 20-25, 10 m spatial resolution) was constructed using landcover maps, climate data and DEM derived variables (altitude, aspect, slope, terrain curvature, solar radiation, etc.). It was completed with permafrost evidences (geophysical and thermal data and rock glacier inventories) that serve as training permafrost data. Used FS algorithms informed about variables that appeared less statistically important for permafrost presence/absence. Three different algorithms were compared: Information Gain (IG), Correlation-based Feature Selection (CFS) and Random Forest (RF). IG is a filter technique that evaluates the worth of a predictor by measuring the information gain with respect to the permafrost presence/absence. Conversely, CFS is a wrapper technique that evaluates the worth of a subset of predictors by considering the individual predictive ability of each variable along with the degree of redundancy between them. Finally, RF is a ML algorithm that performs FS as part of its overall operation. It operates by constructing a large collection of decorrelated classification trees, and then predicts the permafrost occurrence through a majority vote. With the so-called out-of-bag (OOB) error estimate, the classification of permafrost data can be validated as well as the contribution of each predictor can be assessed. The performances of compared permafrost distribution models (computed on independent testing sets) increased with the application of FS algorithms on the original dataset and irrelevant or redundant variables were removed. As a consequence, the process provided faster and more cost-effective predictors and a better understanding of the underlying structures residing in permafrost data. Our work demonstrates the usefulness of a feature selection step prior to applying a machine learning algorithm. In fact, permafrost predictors could be ranked not only based on their heuristic and subjective importance (expert knowledge), but also based on their statistical relevance in relation of the permafrost distribution.
Photoacoustic discrimination of vascular and pigmented lesions using classical and Bayesian methods
NASA Astrophysics Data System (ADS)
Swearingen, Jennifer A.; Holan, Scott H.; Feldman, Mary M.; Viator, John A.
2010-01-01
Discrimination of pigmented and vascular lesions in skin can be difficult due to factors such as size, subungual location, and the nature of lesions containing both melanin and vascularity. Misdiagnosis may lead to precancerous or cancerous lesions not receiving proper medical care. To aid in the rapid and accurate diagnosis of such pathologies, we develop a photoacoustic system to determine the nature of skin lesions in vivo. By irradiating skin with two laser wavelengths, 422 and 530 nm, we induce photoacoustic responses, and the relative response at these two wavelengths indicates whether the lesion is pigmented or vascular. This response is due to the distinct absorption spectrum of melanin and hemoglobin. In particular, pigmented lesions have ratios of photoacoustic amplitudes of approximately 1.4 to 1 at the two wavelengths, while vascular lesions have ratios of about 4.0 to 1. Furthermore, we consider two statistical methods for conducting classification of lesions: standard multivariate analysis classification techniques and a Bayesian-model-based approach. We study 15 human subjects with eight vascular and seven pigmented lesions. Using the classical method, we achieve a perfect classification rate, while the Bayesian approach has an error rate of 20%.
Formisano, Elia; De Martino, Federico; Valente, Giancarlo
2008-09-01
Machine learning and pattern recognition techniques are being increasingly employed in functional magnetic resonance imaging (fMRI) data analysis. By taking into account the full spatial pattern of brain activity measured simultaneously at many locations, these methods allow detecting subtle, non-strictly localized effects that may remain invisible to the conventional analysis with univariate statistical methods. In typical fMRI applications, pattern recognition algorithms "learn" a functional relationship between brain response patterns and a perceptual, cognitive or behavioral state of a subject expressed in terms of a label, which may assume discrete (classification) or continuous (regression) values. This learned functional relationship is then used to predict the unseen labels from a new data set ("brain reading"). In this article, we describe the mathematical foundations of machine learning applications in fMRI. We focus on two methods, support vector machines and relevance vector machines, which are respectively suited for the classification and regression of fMRI patterns. Furthermore, by means of several examples and applications, we illustrate and discuss the methodological challenges of using machine learning algorithms in the context of fMRI data analysis.
Talbot, S. S.; Shasby, M.B.; Bailey, T.N.
1985-01-01
A Landsat-based vegetation map was prepared for Kenai National Wildlife Refuge and adjacent lands, 2 million and 2.5 million acres respectively. The refuge lies within the middle boreal sub zone of south central Alaska. Seven major classes and sixteen subclasses were recognized: forest (closed needleleaf, needleleaf woodland, mixed); deciduous scrub (lowland and montane, subalpine); dwarf scrub (dwarf shrub tundra, lichen tundra, dwarf shrub and lichen tundra, dwarf shrub peatland, string bog/wetlands); herbaceous (graminoid meadows and marshes); scarcely vegetated areas ; water (clear, moderately turbid, highly turbid); and glaciers. The methodology employed a cluster-block technique. Sample areas were described based on a combination of helicopter-ground survey, aerial photo interpretation, and digital Landsat data. Major steps in the Landsat analysis involved: preprocessing (geometric connection), spectral class labeling of sample areas, derivation of statistical parameters for spectral classes, preliminary classification of the entree study area using a maximum-likelihood algorithm, and final classification through ancillary information such as digital elevation data. The vegetation map (scale 1:250,000) was a pioneering effort since there were no intermediate-sclae maps of the area. Representative of distinctive regional patterns, the map was suitable for use in comprehensive conservation planning and wildlife management.
Data Analytics for Smart Parking Applications.
Piovesan, Nicola; Turi, Leo; Toigo, Enrico; Martinez, Borja; Rossi, Michele
2016-09-23
We consider real-life smart parking systems where parking lot occupancy data are collected from field sensor devices and sent to backend servers for further processing and usage for applications. Our objective is to make these data useful to end users, such as parking managers, and, ultimately, to citizens. To this end, we concoct and validate an automated classification algorithm having two objectives: (1) outlier detection: to detect sensors with anomalous behavioral patterns, i.e., outliers; and (2) clustering: to group the parking sensors exhibiting similar patterns into distinct clusters. We first analyze the statistics of real parking data, obtaining suitable simulation models for parking traces. We then consider a simple classification algorithm based on the empirical complementary distribution function of occupancy times and show its limitations. Hence, we design a more sophisticated algorithm exploiting unsupervised learning techniques (self-organizing maps). These are tuned following a supervised approach using our trace generator and are compared against other clustering schemes, namely expectation maximization, k-means clustering and DBSCAN, considering six months of data from a real sensor deployment. Our approach is found to be superior in terms of classification accuracy, while also being capable of identifying all of the outliers in the dataset.
Random whole metagenomic sequencing for forensic discrimination of soils.
Khodakova, Anastasia S; Smith, Renee J; Burgoyne, Leigh; Abarno, Damien; Linacre, Adrian
2014-01-01
Here we assess the ability of random whole metagenomic sequencing approaches to discriminate between similar soils from two geographically distinct urban sites for application in forensic science. Repeat samples from two parklands in residential areas separated by approximately 3 km were collected and the DNA was extracted. Shotgun, whole genome amplification (WGA) and single arbitrarily primed DNA amplification (AP-PCR) based sequencing techniques were then used to generate soil metagenomic profiles. Full and subsampled metagenomic datasets were then annotated against M5NR/M5RNA (taxonomic classification) and SEED Subsystems (metabolic classification) databases. Further comparative analyses were performed using a number of statistical tools including: hierarchical agglomerative clustering (CLUSTER); similarity profile analysis (SIMPROF); non-metric multidimensional scaling (NMDS); and canonical analysis of principal coordinates (CAP) at all major levels of taxonomic and metabolic classification. Our data showed that shotgun and WGA-based approaches generated highly similar metagenomic profiles for the soil samples such that the soil samples could not be distinguished accurately. An AP-PCR based approach was shown to be successful at obtaining reproducible site-specific metagenomic DNA profiles, which in turn were employed for successful discrimination of visually similar soil samples collected from two different locations.
Van Wagtendonk, Jan W.; Root, Ralph R.
2003-01-01
The objective of this study was to test the applicability of using Normalized Difference Vegetation Index (NDVI) values derived from a temporal sequence of six Landsat Thematic Mapper (TM) scenes to map fuel models for Yosemite National Park, USA. An unsupervised classification algorithm was used to define 30 unique spectral-temporal classes of NDVI values. A combination of graphical, statistical and visual techniques was used to characterize the 30 classes and identify those that responded similarly and could be combined into fuel models. The final classification of fuel models included six different types: short annual and perennial grasses, tall perennial grasses, medium brush and evergreen hardwoods, short-needled conifers with no heavy fuels, long-needled conifers and deciduous hardwoods, and short-needled conifers with a component of heavy fuels. The NDVI, when analysed over a season of phenologically distinct periods along with ancillary data, can elicit information necessary to distinguish fuel model types. Fuels information derived from remote sensors has proven to be useful for initial classification of fuels and has been applied to fire management situations on the ground.
The footprints of visual attention in the Posner cueing paradigm revealed by classification images
NASA Technical Reports Server (NTRS)
Eckstein, Miguel P.; Shimozaki, Steven S.; Abbey, Craig K.
2002-01-01
In the Posner cueing paradigm, observers' performance in detecting a target is typically better in trials in which the target is present at the cued location than in trials in which the target appears at the uncued location. This effect can be explained in terms of a Bayesian observer where visual attention simply weights the information differently at the cued (attended) and uncued (unattended) locations without a change in the quality of processing at each location. Alternatively, it could also be explained in terms of visual attention changing the shape of the perceptual filter at the cued location. In this study, we use the classification image technique to compare the human perceptual filters at the cued and uncued locations in a contrast discrimination task. We did not find statistically significant differences between the shapes of the inferred perceptual filters across the two locations, nor did the observed differences account for the measured cueing effects in human observers. Instead, we found a difference in the magnitude of the classification images, supporting the idea that visual attention changes the weighting of information at the cued and uncued location, but does not change the quality of processing at each individual location.
Data Analytics for Smart Parking Applications
Piovesan, Nicola; Turi, Leo; Toigo, Enrico; Martinez, Borja; Rossi, Michele
2016-01-01
We consider real-life smart parking systems where parking lot occupancy data are collected from field sensor devices and sent to backend servers for further processing and usage for applications. Our objective is to make these data useful to end users, such as parking managers, and, ultimately, to citizens. To this end, we concoct and validate an automated classification algorithm having two objectives: (1) outlier detection: to detect sensors with anomalous behavioral patterns, i.e., outliers; and (2) clustering: to group the parking sensors exhibiting similar patterns into distinct clusters. We first analyze the statistics of real parking data, obtaining suitable simulation models for parking traces. We then consider a simple classification algorithm based on the empirical complementary distribution function of occupancy times and show its limitations. Hence, we design a more sophisticated algorithm exploiting unsupervised learning techniques (self-organizing maps). These are tuned following a supervised approach using our trace generator and are compared against other clustering schemes, namely expectation maximization, k-means clustering and DBSCAN, considering six months of data from a real sensor deployment. Our approach is found to be superior in terms of classification accuracy, while also being capable of identifying all of the outliers in the dataset. PMID:27669259
Intelligent image processing for vegetation classification using multispectral LANDSAT data
NASA Astrophysics Data System (ADS)
Santos, Stewart R.; Flores, Jorge L.; Garcia-Torales, G.
2015-09-01
We propose an intelligent computational technique for analysis of vegetation imaging, which are acquired with multispectral scanner (MSS) sensor. This work focuses on intelligent and adaptive artificial neural network (ANN) methodologies that allow segmentation and classification of spectral remote sensing (RS) signatures, in order to obtain a high resolution map, in which we can delimit the wooded areas and quantify the amount of combustible materials present into these areas. This could provide important information to prevent fires and deforestation of wooded areas. The spectral RS input data, acquired by the MSS sensor, are considered in a random propagation remotely sensed scene with unknown statistics for each Thematic Mapper (TM) band. Performing high-resolution reconstruction and adding these spectral values with neighbor pixels information from each TM band, we can include contextual information into an ANN. The biggest challenge in conventional classifiers is how to reduce the number of components in the feature vector, while preserving the major information contained in the data, especially when the dimensionality of the feature space is high. Preliminary results show that the Adaptive Modified Neural Network method is a promising and effective spectral method for segmentation and classification in RS images acquired with MSS sensor.
voomDDA: discovery of diagnostic biomarkers and classification of RNA-seq data.
Zararsiz, Gokmen; Goksuluk, Dincer; Klaus, Bernd; Korkmaz, Selcuk; Eldem, Vahap; Karabulut, Erdem; Ozturk, Ahmet
2017-01-01
RNA-Seq is a recent and efficient technique that uses the capabilities of next-generation sequencing technology for characterizing and quantifying transcriptomes. One important task using gene-expression data is to identify a small subset of genes that can be used to build diagnostic classifiers particularly for cancer diseases. Microarray based classifiers are not directly applicable to RNA-Seq data due to its discrete nature. Overdispersion is another problem that requires careful modeling of mean and variance relationship of the RNA-Seq data. In this study, we present voomDDA classifiers: variance modeling at the observational level (voom) extensions of the nearest shrunken centroids (NSC) and the diagonal discriminant classifiers. VoomNSC is one of these classifiers and brings voom and NSC approaches together for the purpose of gene-expression based classification. For this purpose, we propose weighted statistics and put these weighted statistics into the NSC algorithm. The VoomNSC is a sparse classifier that models the mean-variance relationship using the voom method and incorporates voom's precision weights into the NSC classifier via weighted statistics. A comprehensive simulation study was designed and four real datasets are used for performance assessment. The overall results indicate that voomNSC performs as the sparsest classifier. It also provides the most accurate results together with power-transformed Poisson linear discriminant analysis, rlog transformed support vector machines and random forests algorithms. In addition to prediction purposes, the voomNSC classifier can be used to identify the potential diagnostic biomarkers for a condition of interest. Through this work, statistical learning methods proposed for microarrays can be reused for RNA-Seq data. An interactive web application is freely available at http://www.biosoft.hacettepe.edu.tr/voomDDA/.
Knez, J; Saridogan, E; Van Den Bosch, T; Mavrelos, D; Ambler, G; Jurkovic, D
2018-04-01
What would be a potential impact of implementing the new ESHRE/European Society of Gynaecological Endoscopy (ESGE) female genital anomalies classification system on the management of women with previous diagnosis of arcuate uteri based on the modified American Society for Reproductive Medicine (ASRM) criteria? A significant number of women with previous diagnosis of arcuate uteri are reclassified as having partial septate uteri according to the new ESHRE/ESGE classification system which may increase the number of remedial surgical procedures. The ESHRE/ESGE classification system has defined measurement techniques, reference points and specific cut-offs to facilitate the differentiation between normal and septate uteri. These criteria have been arbitrarily defined and they rely on the measurement of uterine wall thickness and depth of distortion of uterine fundus. This was a retrospective cohort study. We searched our ultrasound clinic database from January 2011 to December 2014 to identify all women diagnosed with arcuate uterus on three-dimensional ultrasound according to the modified ASRM criteria. For each woman, the ultrasound images were stored in our clinical database and they were re-examined according to ESHRE/ESGE specifications. The presence and location of all acquired uterine anomalies, such as fibroids or adenomyosis was noted. We applied the two diagnostic approaches as specified by the ESHRE/ESGE classification: the main option (MO) and the alternative option (AO). We used the Kappa statistic to quantify the agreement between the two approaches. We also compared the number of previous miscarriages in women with normal and partial septate uteri according to the ESHRE/ESGE classification. Non-parametric Mann-Whitney and Kruskal-Wallis tests were used for the analyses and receiver-operating characteristic curves were constructed to assess the predictive values of the calculated uterine distortion indices for the detection of women at risk of suffering multiple pregnancy losses. We included 270 women diagnosed with arcuate uterus in the study. In all, 77 women (28.5%, 95% confidence interval (CI) 23.1-33.9) had evidence of fibroids or adenomyosis. These abnormalities precluded the application of either proposed ESHRE/ESGE techniques to assess uterine morphology in 25 women (9.3%, 95% CI 5.8-12.7). When using the MO, 138/237 (58.2%, 95% CI 51.9-64.3) women were diagnosed with partial septate uterus compared to 61/230 (26.5%, 95% CI 21.2-32.6) women when using the AO. In 222 women in whom we were able to apply both MO and AO, there was agreement in the diagnosis of septate uterus between the two techniques in 146/222 cases (65.8%, 95% CI 59.3-71.7; Kappa 0.42, 95%CI 0.35-0.5). There was no statistical difference in the proportion of women with history of previous multiple miscarriages between those diagnosed with normal or partial septate uteri using either MO (6.2%, 95% CI 2.9-12.9 vs. 9.5%, 95% CI 5.6-15.6; P = 0.47) or AO (7.2%, 95% CI 4.2-12.1 vs. 11.7%, 95% CI 5.8-22.2; P = 0.29). This study was retrospective in nature and the definition of arcuate uterus used in the study is not universally accepted. The reproductive history data were collected retrospectively and therefore may be prone to bias. There are methodological weaknesses in the new ESHRE/ESGE classification system which would need to be addressed in future revisions. There was no significant difference in the past reproductive outcomes between women diagnosed with normal and anomalous uteri and the clinicians should exercise caution when offering surgical correction to women diagnosed with partial septate uteri using the new ESHRE/ESGE classification. No study funding was received and no competing interests are present. N/A.
NASA Astrophysics Data System (ADS)
Schulz, Hans Martin; Thies, Boris; Chang, Shih-Chieh; Bendix, Jörg
2016-03-01
The mountain cloud forest of Taiwan can be delimited from other forest types using a map of the ground fog frequency. In order to create such a frequency map from remotely sensed data, an algorithm able to detect ground fog is necessary. Common techniques for ground fog detection based on weather satellite data cannot be applied to fog occurrences in Taiwan as they rely on several assumptions regarding cloud properties. Therefore a new statistical method for the detection of ground fog in mountainous terrain from MODIS Collection 051 data is presented. Due to the sharpening of input data using MODIS bands 1 and 2, the method provides fog masks in a resolution of 250 m per pixel. The new technique is based on negative correlations between optical thickness and terrain height that can be observed if a cloud that is relatively plane-parallel is truncated by the terrain. A validation of the new technique using camera data has shown that the quality of fog detection is comparable to that of another modern fog detection scheme developed and validated for the temperate zones. The method is particularly applicable to optically thinner water clouds. Beyond a cloud optical thickness of ≈ 40, classification errors significantly increase.
NASA Astrophysics Data System (ADS)
Anitha, J.; Vijila, C. Kezi Selva; Hemanth, D. Jude
2010-02-01
Diabetic retinopathy (DR) is a chronic eye disease for which early detection is highly essential to avoid any fatal results. Image processing of retinal images emerge as a feasible tool for this early diagnosis. Digital image processing techniques involve image classification which is a significant technique to detect the abnormality in the eye. Various automated classification systems have been developed in the recent years but most of them lack high classification accuracy. Artificial neural networks are the widely preferred artificial intelligence technique since it yields superior results in terms of classification accuracy. In this work, Radial Basis function (RBF) neural network based bi-level classification system is proposed to differentiate abnormal DR Images and normal retinal images. The results are analyzed in terms of classification accuracy, sensitivity and specificity. A comparative analysis is performed with the results of the probabilistic classifier namely Bayesian classifier to show the superior nature of neural classifier. Experimental results show promising results for the neural classifier in terms of the performance measures.
Automated simultaneous multiple feature classification of MTI data
NASA Astrophysics Data System (ADS)
Harvey, Neal R.; Theiler, James P.; Balick, Lee K.; Pope, Paul A.; Szymanski, John J.; Perkins, Simon J.; Porter, Reid B.; Brumby, Steven P.; Bloch, Jeffrey J.; David, Nancy A.; Galassi, Mark C.
2002-08-01
Los Alamos National Laboratory has developed and demonstrated a highly capable system, GENIE, for the two-class problem of detecting a single feature against a background of non-feature. In addition to the two-class case, however, a commonly encountered remote sensing task is the segmentation of multispectral image data into a larger number of distinct feature classes or land cover types. To this end we have extended our existing system to allow the simultaneous classification of multiple features/classes from multispectral data. The technique builds on previous work and its core continues to utilize a hybrid evolutionary-algorithm-based system capable of searching for image processing pipelines optimized for specific image feature extraction tasks. We describe the improvements made to the GENIE software to allow multiple-feature classification and describe the application of this system to the automatic simultaneous classification of multiple features from MTI image data. We show the application of the multiple-feature classification technique to the problem of classifying lava flows on Mauna Loa volcano, Hawaii, using MTI image data and compare the classification results with standard supervised multiple-feature classification techniques.
Thomas, Minta; De Brabanter, Kris; De Moor, Bart
2014-05-10
DNA microarrays are potentially powerful technology for improving diagnostic classification, treatment selection, and prognostic assessment. The use of this technology to predict cancer outcome has a history of almost a decade. Disease class predictors can be designed for known disease cases and provide diagnostic confirmation or clarify abnormal cases. The main input to this class predictors are high dimensional data with many variables and few observations. Dimensionality reduction of these features set significantly speeds up the prediction task. Feature selection and feature transformation methods are well known preprocessing steps in the field of bioinformatics. Several prediction tools are available based on these techniques. Studies show that a well tuned Kernel PCA (KPCA) is an efficient preprocessing step for dimensionality reduction, but the available bandwidth selection method for KPCA was computationally expensive. In this paper, we propose a new data-driven bandwidth selection criterion for KPCA, which is related to least squares cross-validation for kernel density estimation. We propose a new prediction model with a well tuned KPCA and Least Squares Support Vector Machine (LS-SVM). We estimate the accuracy of the newly proposed model based on 9 case studies. Then, we compare its performances (in terms of test set Area Under the ROC Curve (AUC) and computational time) with other well known techniques such as whole data set + LS-SVM, PCA + LS-SVM, t-test + LS-SVM, Prediction Analysis of Microarrays (PAM) and Least Absolute Shrinkage and Selection Operator (Lasso). Finally, we assess the performance of the proposed strategy with an existing KPCA parameter tuning algorithm by means of two additional case studies. We propose, evaluate, and compare several mathematical/statistical techniques, which apply feature transformation/selection for subsequent classification, and consider its application in medical diagnostics. Both feature selection and feature transformation perform well on classification tasks. Due to the dynamic selection property of feature selection, it is hard to define significant features for the classifier, which predicts classes of future samples. Moreover, the proposed strategy enjoys a distinctive advantage with its relatively lesser time complexity.
Transperineal ultrasonography: First level exam in IBD patients with perianal disease.
Terracciano, Fulvia; Scalisi, Giuseppe; Bossa, Fabrizio; Scimeca, Daniela; Biscaglia, Giuseppe; Mangiacotti, Michele; Valvano, Maria Rosa; Perri, Francesco; Simeone, Anna; Andriulli, Angelo
2016-08-01
A pelvic magnetic resonance imaging (MRI) represents the front-line method for evaluating perianal disease in patients with inflammatory bowel disease (IBD). Recently, transperineal ultrasonography (TPUS) has been proposed as a simple, safe, time-sparing and useful diagnostic technique to assess different pathological conditions of the pelvic floor. The aim of this prospective single centre study was to evaluate the accuracy of TPUS versus MRI for the detection and classification of perineal disease in IBD patients. From November 2013 to November 2014, 28 IBD patients underwent either TPUS or MRI. Fistulae and abscesses were classified according to Parks' and AGA's classification methods. A concordance was assessed by k statistics. Overall, 33 fistulae and 8 abscesses were recognized by TPUS (30 and 7 by MRI, respectively). The agreement between TPUS and MRI was 75% according to Parks' classification (k=0.67) and 86% according to AGA classification (k=0.83), while it was 36% (k=0.34) for classifying abscesses. TPUS proved to be as accurate as MRI for detecting superficial and small abscesses and for classifying perianal disease. Both examinations may be performed at the initial presentation of the patient, but TPUS is a cheaper, time-sparing procedure. The optimal use of TPUS might be in follow-up patients. Copyright © 2016 Editrice Gastroenterologica Italiana S.r.l. Published by Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Abdullahi, Sahra; Schardt, Mathias; Pretzsch, Hans
2017-05-01
Forest structure at stand level plays a key role for sustainable forest management, since the biodiversity, productivity, growth and stability of the forest can be positively influenced by managing its structural diversity. In contrast to field-based measurements, remote sensing techniques offer a cost-efficient opportunity to collect area-wide information about forest stand structure with high spatial and temporal resolution. Especially Interferometric Synthetic Aperture Radar (InSAR), which facilitates worldwide acquisition of 3d information independent from weather conditions and illumination, is convenient to capture forest stand structure. This study purposes an unsupervised two-stage clustering approach for forest structure classification based on height information derived from interferometric X-band SAR data which was performed in complex temperate forest stands of Traunstein forest (South Germany). In particular, a four dimensional input data set composed of first-order height statistics was non-linearly projected on a two-dimensional Self-Organizing Map, spatially ordered according to similarity (based on the Euclidean distance) in the first stage and classified using the k-means algorithm in the second stage. The study demonstrated that X-band InSAR data exhibits considerable capabilities for forest structure classification. Moreover, the unsupervised classification approach achieved meaningful and reasonable results by means of comparison to aerial imagery and LiDAR data.
NASA Astrophysics Data System (ADS)
Baccar, D.; Söffker, D.
2017-11-01
Acoustic Emission (AE) is a suitable method to monitor the health of composite structures in real-time. However, AE-based failure mode identification and classification are still complex to apply due to the fact that AE waves are generally released simultaneously from all AE-emitting damage sources. Hence, the use of advanced signal processing techniques in combination with pattern recognition approaches is required. In this paper, AE signals generated from laminated carbon fiber reinforced polymer (CFRP) subjected to indentation test are examined and analyzed. A new pattern recognition approach involving a number of processing steps able to be implemented in real-time is developed. Unlike common classification approaches, here only CWT coefficients are extracted as relevant features. Firstly, Continuous Wavelet Transform (CWT) is applied to the AE signals. Furthermore, dimensionality reduction process using Principal Component Analysis (PCA) is carried out on the coefficient matrices. The PCA-based feature distribution is analyzed using Kernel Density Estimation (KDE) allowing the determination of a specific pattern for each fault-specific AE signal. Moreover, waveform and frequency content of AE signals are in depth examined and compared with fundamental assumptions reported in this field. A correlation between the identified patterns and failure modes is achieved. The introduced method improves the damage classification and can be used as a non-destructive evaluation tool.
NASA Astrophysics Data System (ADS)
Meyer, J.; White, S.
2005-05-01
Classification of lava morphology on a regional scale contributes to the understanding of the distribution and extent of lava flows at a mid-ocean ridge. Seafloor classification is essential to understand the regional undersea environment at midocean ridges. In this study, the development of a classification scheme is found to identify and extract textural patterns of different lava morphologies along the East Pacific Rise using DSL-120 side-scan and ARGO camera imagery. Application of an accurate image classification technique to side-scan sonar allows us to expand upon the locally available visual ground reference data to make the first comprehensive regional maps of small-scale lava morphology present at a mid-ocean ridge. The submarine lava morphologies focused upon in this study; sheet flows, lobate flows, and pillow flows; have unique textures. Several algorithms were applied to the sonar backscatter intensity images to produce multiple textural image layers useful in distinguishing the different lava morphologies. The intensity and spatially enhanced images were then combined and applied to a hybrid classification technique. The hybrid classification involves two integrated classifiers, a rule-based expert system classifier and a machine learning classifier. The complementary capabilities of the two integrated classifiers provided a higher accuracy of regional seafloor classification compared to using either classifier alone. Once trained, the hybrid classifier can then be applied to classify neighboring images with relative ease. This classification technique has been used to map the lava morphology distribution and infer spatial variability of lava effusion rates along two segments of the East Pacific Rise, 17 deg S and 9 deg N. Future use of this technique may also be useful for attaining temporal information. Repeated documentation of morphology classification in this dynamic environment can be compared to detect regional seafloor change.
Karan, Shivesh Kishore; Samadder, Sukha Ranjan
2016-08-01
One objective of the present study was to evaluate the performance of support vector machine (SVM)-based image classification technique with the maximum likelihood classification (MLC) technique for a rapidly changing landscape of an open-cast mine. The other objective was to assess the change in land use pattern due to coal mining from 2006 to 2016. Assessing the change in land use pattern accurately is important for the development and monitoring of coalfields in conjunction with sustainable development. For the present study, Landsat 5 Thematic Mapper (TM) data of 2006 and Landsat 8 Operational Land Imager (OLI)/Thermal Infrared Sensor (TIRS) data of 2016 of a part of Jharia Coalfield, Dhanbad, India, were used. The SVM classification technique provided greater overall classification accuracy when compared to the MLC technique in classifying heterogeneous landscape with limited training dataset. SVM exceeded MLC in handling a difficult challenge of classifying features having near similar reflectance on the mean signature plot, an improvement of over 11 % was observed in classification of built-up area, and an improvement of 24 % was observed in classification of surface water using SVM; similarly, the SVM technique improved the overall land use classification accuracy by almost 6 and 3 % for Landsat 5 and Landsat 8 images, respectively. Results indicated that land degradation increased significantly from 2006 to 2016 in the study area. This study will help in quantifying the changes and can also serve as a basis for further decision support system studies aiding a variety of purposes such as planning and management of mines and environmental impact assessment.
A comprehensive simulation study on classification of RNA-Seq data.
Zararsız, Gökmen; Goksuluk, Dincer; Korkmaz, Selcuk; Eldem, Vahap; Zararsiz, Gozde Erturk; Duru, Izzet Parug; Ozturk, Ahmet
2017-01-01
RNA sequencing (RNA-Seq) is a powerful technique for the gene-expression profiling of organisms that uses the capabilities of next-generation sequencing technologies. Developing gene-expression-based classification algorithms is an emerging powerful method for diagnosis, disease classification and monitoring at molecular level, as well as providing potential markers of diseases. Most of the statistical methods proposed for the classification of gene-expression data are either based on a continuous scale (eg. microarray data) or require a normal distribution assumption. Hence, these methods cannot be directly applied to RNA-Seq data since they violate both data structure and distributional assumptions. However, it is possible to apply these algorithms with appropriate modifications to RNA-Seq data. One way is to develop count-based classifiers, such as Poisson linear discriminant analysis and negative binomial linear discriminant analysis. Another way is to bring the data closer to microarrays and apply microarray-based classifiers. In this study, we compared several classifiers including PLDA with and without power transformation, NBLDA, single SVM, bagging SVM (bagSVM), classification and regression trees (CART), and random forests (RF). We also examined the effect of several parameters such as overdispersion, sample size, number of genes, number of classes, differential-expression rate, and the transformation method on model performances. A comprehensive simulation study is conducted and the results are compared with the results of two miRNA and two mRNA experimental datasets. The results revealed that increasing the sample size, differential-expression rate and decreasing the dispersion parameter and number of groups lead to an increase in classification accuracy. Similar with differential-expression studies, the classification of RNA-Seq data requires careful attention when handling data overdispersion. We conclude that, as a count-based classifier, the power transformed PLDA and, as a microarray-based classifier, vst or rlog transformed RF and SVM classifiers may be a good choice for classification. An R/BIOCONDUCTOR package, MLSeq, is freely available at https://www.bioconductor.org/packages/release/bioc/html/MLSeq.html.
Mujtaba, Ghulam; Shuib, Liyana; Raj, Ram Gopal; Rajandram, Retnagowri; Shaikh, Khairunisa
2018-07-01
Automatic text classification techniques are useful for classifying plaintext medical documents. This study aims to automatically predict the cause of death from free text forensic autopsy reports by comparing various schemes for feature extraction, term weighing or feature value representation, text classification, and feature reduction. For experiments, the autopsy reports belonging to eight different causes of death were collected, preprocessed and converted into 43 master feature vectors using various schemes for feature extraction, representation, and reduction. The six different text classification techniques were applied on these 43 master feature vectors to construct a classification model that can predict the cause of death. Finally, classification model performance was evaluated using four performance measures i.e. overall accuracy, macro precision, macro-F-measure, and macro recall. From experiments, it was found that that unigram features obtained the highest performance compared to bigram, trigram, and hybrid-gram features. Furthermore, in feature representation schemes, term frequency, and term frequency with inverse document frequency obtained similar and better results when compared with binary frequency, and normalized term frequency with inverse document frequency. Furthermore, the chi-square feature reduction approach outperformed Pearson correlation, and information gain approaches. Finally, in text classification algorithms, support vector machine classifier outperforms random forest, Naive Bayes, k-nearest neighbor, decision tree, and ensemble-voted classifier. Our results and comparisons hold practical importance and serve as references for future works. Moreover, the comparison outputs will act as state-of-art techniques to compare future proposals with existing automated text classification techniques. Copyright © 2017 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.
Texture Classification by Texton: Statistical versus Binary
Guo, Zhenhua; Zhang, Zhongcheng; Li, Xiu; Li, Qin; You, Jane
2014-01-01
Using statistical textons for texture classification has shown great success recently. The maximal response 8 (Statistical_MR8), image patch (Statistical_Joint) and locally invariant fractal (Statistical_Fractal) are typical statistical texton algorithms and state-of-the-art texture classification methods. However, there are two limitations when using these methods. First, it needs a training stage to build a texton library, thus the recognition accuracy will be highly depended on the training samples; second, during feature extraction, local feature is assigned to a texton by searching for the nearest texton in the whole library, which is time consuming when the library size is big and the dimension of feature is high. To address the above two issues, in this paper, three binary texton counterpart methods were proposed, Binary_MR8, Binary_Joint, and Binary_Fractal. These methods do not require any training step but encode local feature into binary representation directly. The experimental results on the CUReT, UIUC and KTH-TIPS databases show that binary texton could get sound results with fast feature extraction, especially when the image size is not big and the quality of image is not poor. PMID:24520346
Kuhn, M A; Burch, M; Chinnock, R E; Fenton, M J
2017-10-01
Intravascular ultrasound (IVUS) has been routinely used in some centers to investigate cardiac allograft vasculopathy in pediatric heart transplant recipients. We present an alternative method using more sophisticated imaging software. This study presents a comparison of this method with an established standard method. All patients who had IVUS performed in 2014 were retrospectively evaluated. The standard technique consisted of analysis of 10 operator-selected segments along the vessel. Each study was re-evaluated using a longitudinal technique, taken at every third cardiac cycle, along the entire vessel. Semiautomatic edge detection software was used to detect vessel imaging planes. Measurements included outer and inner diameter, total and luminal area, maximal intimal thickness (MIT), and intimal index. Each IVUS was graded for severity using the Stanford classification. All results were given as mean ± standard deviation (SD). Groups were compared using Student t test. A P value <.05 was considered significant. There were 59 IVUS studies performed on 58 patients. There was no statistically significant difference between outer diameter, inner diameter, or total area. In the longitudinal group, there was a significantly smaller luminal area, higher MIT, and higher intimal index. Using the longitudinal technique, there was an increase in Stanford classification in 20 patients. The longitudinal technique appeared more sensitive in assessing the degree of cardiac allograft vasculopathy and may play a role in the increase in the degree of thickening seen. It may offer an alternative way of grading severity of cardiac allograft vasculopathy in pediatric heart transplant recipients. Copyright © 2017 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Cattaneo, Alessandro; Park, Gyuhae; Farrar, Charles; Mascareñas, David
2012-04-01
The acoustic emission (AE) phenomena generated by a rapid release in the internal stress of a material represent a promising technique for structural health monitoring (SHM) applications. AE events typically result in a discrete number of short-time, transient signals. The challenge associated with capturing these events using classical techniques is that very high sampling rates must be used over extended periods of time. The result is that a very large amount of data is collected to capture a phenomenon that rarely occurs. Furthermore, the high energy consumption associated with the required high sampling rates makes the implementation of high-endurance, low-power, embedded AE sensor nodes difficult to achieve. The relatively rare occurrence of AE events over long time scales implies that these measurements are inherently sparse in the spike domain. The sparse nature of AE measurements makes them an attractive candidate for the application of compressed sampling techniques. Collecting compressed measurements of sparse AE signals will relax the requirements on the sampling rate and memory demands. The focus of this work is to investigate the suitability of compressed sensing techniques for AE-based SHM. The work explores estimating AE signal statistics in the compressed domain for low-power classification applications. In the event compressed classification finds an event of interest, ι1 norm minimization will be used to reconstruct the measurement for further analysis. The impact of structured noise on compressive measurements is specifically addressed. The suitability of a particular algorithm, called Justice Pursuit, to increase robustness to a small amount of arbitrary measurement corruption is investigated.
An objective and parsimonious approach for classifying natural flow regimes at a continental scale
Archfield, Stacey A.; Kennen, Jonathan G.; Carlisle, Daren M.; Wolock, David M.
2014-01-01
Hydro-ecological stream classification-the process of grouping streams by similar hydrologic responses and, by extension, similar aquatic habitat-has been widely accepted and is considered by some to be one of the first steps towards developing ecological flow targets. A new classification of 1543 streamgauges in the contiguous USA is presented by use of a novel and parsimonious approach to understand similarity in ecological streamflow response. This novel classification approach uses seven fundamental daily streamflow statistics (FDSS) rather than winnowing down an uncorrelated subset from 200 or more ecologically relevant streamflow statistics (ERSS) commonly used in hydro-ecological classification studies. The results of this investigation demonstrate that the distributions of 33 tested ERSS are consistently different among the classification groups derived from the seven FDSS. It is further shown that classification based solely on the 33 ERSS generally does a poorer job in grouping similar streamgauges than the classification based on the seven FDSS. This new classification approach has the additional advantages of overcoming some of the subjectivity associated with the selection of the classification variables and provides a set of robust continental-scale classes of US streamgauges. Published 2013. This article is a U.S. Government work and is in the public domain in the USA.
Olives, Casey; Valadez, Joseph J.; Brooker, Simon J.; Pagano, Marcello
2012-01-01
Background Originally a binary classifier, Lot Quality Assurance Sampling (LQAS) has proven to be a useful tool for classification of the prevalence of Schistosoma mansoni into multiple categories (≤10%, >10 and <50%, ≥50%), and semi-curtailed sampling has been shown to effectively reduce the number of observations needed to reach a decision. To date the statistical underpinnings for Multiple Category-LQAS (MC-LQAS) have not received full treatment. We explore the analytical properties of MC-LQAS, and validate its use for the classification of S. mansoni prevalence in multiple settings in East Africa. Methodology We outline MC-LQAS design principles and formulae for operating characteristic curves. In addition, we derive the average sample number for MC-LQAS when utilizing semi-curtailed sampling and introduce curtailed sampling in this setting. We also assess the performance of MC-LQAS designs with maximum sample sizes of n = 15 and n = 25 via a weighted kappa-statistic using S. mansoni data collected in 388 schools from four studies in East Africa. Principle Findings Overall performance of MC-LQAS classification was high (kappa-statistic of 0.87). In three of the studies, the kappa-statistic for a design with n = 15 was greater than 0.75. In the fourth study, where these designs performed poorly (kappa-statistic less than 0.50), the majority of observations fell in regions where potential error is known to be high. Employment of semi-curtailed and curtailed sampling further reduced the sample size by as many as 0.5 and 3.5 observations per school, respectively, without increasing classification error. Conclusion/Significance This work provides the needed analytics to understand the properties of MC-LQAS for assessing the prevalance of S. mansoni and shows that in most settings a sample size of 15 children provides a reliable classification of schools. PMID:22970333
Comparison of Danish dichotomous and BI-RADS classifications of mammographic density.
Hodge, Rebecca; Hellmann, Sophie Sell; von Euler-Chelpin, My; Vejborg, Ilse; Andersen, Zorana Jovanovic
2014-06-01
In the Copenhagen mammography screening program from 1991 to 2001, mammographic density was classified either as fatty or mixed/dense. This dichotomous mammographic density classification system is unique internationally, and has not been validated before. To compare the Danish dichotomous mammographic density classification system from 1991 to 2001 with the density BI-RADS classifications, in an attempt to validate the Danish classification system. The study sample consisted of 120 mammograms taken in Copenhagen in 1991-2001, which tested false positive, and which were in 2012 re-assessed and classified according to the BI-RADS classification system. We calculated inter-rater agreement between the Danish dichotomous mammographic classification as fatty or mixed/dense and the four-level BI-RADS classification by the linear weighted Kappa statistic. Of the 120 women, 32 (26.7%) were classified as having fatty and 88 (73.3%) as mixed/dense mammographic density, according to Danish dichotomous classification. According to BI-RADS density classification, 12 (10.0%) women were classified as having predominantly fatty (BI-RADS code 1), 46 (38.3%) as having scattered fibroglandular (BI-RADS code 2), 57 (47.5%) as having heterogeneously dense (BI-RADS 3), and five (4.2%) as having extremely dense (BI-RADS code 4) mammographic density. The inter-rater variability assessed by weighted kappa statistic showed a substantial agreement (0.75). The dichotomous mammographic density classification system utilized in early years of Copenhagen's mammographic screening program (1991-2001) agreed well with the BI-RADS density classification system.
Gupta, Mamta; Vang, Russell; Yemelyanova, Anna V; Kurman, Robert J; Li, Fanghong Rose; Maambo, Emily C; Murphy, Kathleen M; DeScipio, Cheryl; Thompson, Carol B; Ronnett, Brigitte M
2012-12-01
Distinction of hydatidiform moles from nonmolar specimens (NMs) and subclassification of hydatidiform moles as complete hydatidiform mole (CHM) and partial hydatidiform mole (PHM) are important for clinical practice and investigational studies; however, diagnosis based solely on morphology is affected by interobserver variability. Molecular genotyping can distinguish these entities by discerning androgenetic diploidy, diandric triploidy, and biparental diploidy to diagnose CHMs, PHMs, and NMs, respectively. Eighty genotyped cases (27 CHMs, 27 PHMs, 26 NMs) were selected from a series of 200 potentially molar specimens previously diagnosed using p57 immunohistochemistry and genotyping. Cases were classified by 6 pathologists (3 faculty level gynecologic pathologists and 3 fellows) on the basis of morphology, masked to p57 immunostaining and genotyping results, into 1 of 3 categories (CHM, PHM, or NM) during 2 diagnostic rounds; a third round incorporating p57 immunostaining results was also conducted. Consensus diagnoses (those rendered by 2 of 3 pathologists in each group) were also determined. Performance of experienced gynecologic pathologists versus fellow pathologists was compared, using genotyping results as the gold standard. Correct classification of CHMs ranged from 59% to 100%; there were no statistically significant differences in performance of faculty versus fellows in any round (P-values of 0.13, 0.67, and 0.54 for rounds 1 to 3, respectively). Correct classification of PHMs ranged from 26% to 93%, with statistically significantly better performance of faculty versus fellows in each round (P-values of 0.04, <0.01, and <0.01 for rounds 1 to 3, respectively). Correct classification of NMs ranged from 31% to 92%, with statistically significantly better performance of faculty only in round 2 (P-values of 1.0, <0.01, and 0.61 for rounds 1 to 3, respectively). Correct classification of all cases combined ranged from 51% to 75% by morphology and 70% to 80% with p57, with statistically significantly better performance of faculty only in round 2 (P-values of 0.69, <0.01, and 0.15 for rounds 1 to 3, respectively). p57 immunostaining significantly improved recognition of CHMs (P<0.01) and had high reproducibility (κ=0.93 to 0.96) but had no impact on distinction of PHMs and NMs. Genotyping provides a definitive diagnosis for the ∼25% to 50% of cases that are misclassified by morphology, especially those that are also unresolved by p57 immunostaining.
NASA Astrophysics Data System (ADS)
Omenzetter, Piotr; de Lautour, Oliver R.
2010-04-01
Developed for studying long, periodic records of various measured quantities, time series analysis methods are inherently suited and offer interesting possibilities for Structural Health Monitoring (SHM) applications. However, their use in SHM can still be regarded as an emerging application and deserves more studies. In this research, Autoregressive (AR) models were used to fit experimental acceleration time histories from two experimental structural systems, a 3- storey bookshelf-type laboratory structure and the ASCE Phase II SHM Benchmark Structure, in healthy and several damaged states. The coefficients of the AR models were chosen as damage sensitive features. Preliminary visual inspection of the large, multidimensional sets of AR coefficients to check the presence of clusters corresponding to different damage severities was achieved using Sammon mapping - an efficient nonlinear data compression technique. Systematic classification of damage into states based on the analysis of the AR coefficients was achieved using two supervised classification techniques: Nearest Neighbor Classification (NNC) and Learning Vector Quantization (LVQ), and one unsupervised technique: Self-organizing Maps (SOM). This paper discusses the performance of AR coefficients as damage sensitive features and compares the efficiency of the three classification techniques using experimental data.
Wang, Xue; Bi, Dao-wei; Ding, Liang; Wang, Sheng
2007-01-01
The recent availability of low cost and miniaturized hardware has allowed wireless sensor networks (WSNs) to retrieve audio and video data in real world applications, which has fostered the development of wireless multimedia sensor networks (WMSNs). Resource constraints and challenging multimedia data volume make development of efficient algorithms to perform in-network processing of multimedia contents imperative. This paper proposes solving problems in the domain of WMSNs from the perspective of multi-agent systems. The multi-agent framework enables flexible network configuration and efficient collaborative in-network processing. The focus is placed on target classification in WMSNs where audio information is retrieved by microphones. To deal with the uncertainties related to audio information retrieval, the statistical approaches of power spectral density estimates, principal component analysis and Gaussian process classification are employed. A multi-agent negotiation mechanism is specially developed to efficiently utilize limited resources and simultaneously enhance classification accuracy and reliability. The negotiation is composed of two phases, where an auction based approach is first exploited to allocate the classification task among the agents and then individual agent decisions are combined by the committee decision mechanism. Simulation experiments with real world data are conducted and the results show that the proposed statistical approaches and negotiation mechanism not only reduce memory and computation requirements in WMSNs but also significantly enhance classification accuracy and reliability. PMID:28903223
Multiple Sparse Representations Classification
Plenge, Esben; Klein, Stefan S.; Niessen, Wiro J.; Meijering, Erik
2015-01-01
Sparse representations classification (SRC) is a powerful technique for pixelwise classification of images and it is increasingly being used for a wide variety of image analysis tasks. The method uses sparse representation and learned redundant dictionaries to classify image pixels. In this empirical study we propose to further leverage the redundancy of the learned dictionaries to achieve a more accurate classifier. In conventional SRC, each image pixel is associated with a small patch surrounding it. Using these patches, a dictionary is trained for each class in a supervised fashion. Commonly, redundant/overcomplete dictionaries are trained and image patches are sparsely represented by a linear combination of only a few of the dictionary elements. Given a set of trained dictionaries, a new patch is sparse coded using each of them, and subsequently assigned to the class whose dictionary yields the minimum residual energy. We propose a generalization of this scheme. The method, which we call multiple sparse representations classification (mSRC), is based on the observation that an overcomplete, class specific dictionary is capable of generating multiple accurate and independent estimates of a patch belonging to the class. So instead of finding a single sparse representation of a patch for each dictionary, we find multiple, and the corresponding residual energies provides an enhanced statistic which is used to improve classification. We demonstrate the efficacy of mSRC for three example applications: pixelwise classification of texture images, lumen segmentation in carotid artery magnetic resonance imaging (MRI), and bifurcation point detection in carotid artery MRI. We compare our method with conventional SRC, K-nearest neighbor, and support vector machine classifiers. The results show that mSRC outperforms SRC and the other reference methods. In addition, we present an extensive evaluation of the effect of the main mSRC parameters: patch size, dictionary size, and sparsity level. PMID:26177106
SLO blind data set inversion and classification using physically complete models
NASA Astrophysics Data System (ADS)
Shamatava, I.; Shubitidze, F.; Fernández, J. P.; Barrowes, B. E.; O'Neill, K.; Grzegorczyk, T. M.; Bijamov, A.
2010-04-01
Discrimination studies carried out on TEMTADS and Metal Mapper blind data sets collected at the San Luis Obispo UXO site are presented. The data sets included four types of targets of interest: 2.36" rockets, 60-mm mortar shells, 81-mm projectiles, and 4.2" mortar items. The total parameterized normalized magnetic source (NSMS) amplitudes were used to discriminate TOI from metallic clutter and among the different hazardous UXO. First, in object's frame coordinate, the total NSMS were determined for each TOI along three orthogonal axes from the training data provided by the Strategic Environmental Research and Development Program (SERDP) along with the referred blind data sets. Then the inverted total NSMS were used to extract the time-decay classification features. Once our inversion and classification algorithms were tested on the calibration data sets then we applied the same procedure to all blind data sets. The combined NSMS and differential evolution algorithm is utilized for determine the NSMS strengths for each cell. The obtained total NSMS time-decay curves were used to extract the discrimination features and perform classification using the training data as reference. In addition, for cross validation, the inverted locations and orientations from NSMS-DE algorithm were compared against the inverted data that obtained via the magnetic field, vector and scalar potentials (HAP) method and the combined dipole and Gauss-Newton approach technique. We examined the entire time decay history of the total NSMS case-by-case for classification purposes. Also, we use different multi-class statistical classification algorithms for separating the dangerous objects from non hazardous items. The inverted targets were ranked by target ID and submitted to SERDP for independent scoring. The independent scoring results are presented.
Discovering interesting molecular substructures for molecular classification.
Lam, Winnie W M; Chan, Keith C C
2010-06-01
Given a set of molecular structure data preclassified into a number of classes, the molecular classification problem is concerned with the discovering of interesting structural patterns in the data so that "unseen" molecules not originally in the dataset can be accurately classified. To tackle the problem, interesting molecular substructures have to be discovered and this is done typically by first representing molecular structures in molecular graphs, and then, using graph-mining algorithms to discover frequently occurring subgraphs in them. These subgraphs are then used to characterize different classes for molecular classification. While such an approach can be very effective, it should be noted that a substructure that occurs frequently in one class may also does occur in another. The discovering of frequent subgraphs for molecular classification may, therefore, not always be the most effective. In this paper, we propose a novel technique called mining interesting substructures in molecular data for classification (MISMOC) that can discover interesting frequent subgraphs not just for the characterization of a molecular class but also for the distinguishing of it from the others. Using a test statistic, MISMOC screens each frequent subgraph to determine if they are interesting. For those that are interesting, their degrees of interestingness are determined using an information-theoretic measure. When classifying an unseen molecule, its structure is then matched against the interesting subgraphs in each class and a total interestingness measure for the unseen molecule to be classified into a particular class is determined, which is based on the interestingness of each matched subgraphs. The performance of MISMOC is evaluated using both artificial and real datasets, and the results show that it can be an effective approach for molecular classification.
Land Cover Analysis by Using Pixel-Based and Object-Based Image Classification Method in Bogor
NASA Astrophysics Data System (ADS)
Amalisana, Birohmatin; Rokhmatullah; Hernina, Revi
2017-12-01
The advantage of image classification is to provide earth’s surface information like landcover and time-series changes. Nowadays, pixel-based image classification technique is commonly performed with variety of algorithm such as minimum distance, parallelepiped, maximum likelihood, mahalanobis distance. On the other hand, landcover classification can also be acquired by using object-based image classification technique. In addition, object-based classification uses image segmentation from parameter such as scale, form, colour, smoothness and compactness. This research is aimed to compare the result of landcover classification and its change detection between parallelepiped pixel-based and object-based classification method. Location of this research is Bogor with 20 years range of observation from 1996 until 2016. This region is famous as urban areas which continuously change due to its rapid development, so that time-series landcover information of this region will be interesting.
Computerized Classification Testing with the Rasch Model
ERIC Educational Resources Information Center
Eggen, Theo J. H. M.
2011-01-01
If classification in a limited number of categories is the purpose of testing, computerized adaptive tests (CATs) with algorithms based on sequential statistical testing perform better than estimation-based CATs (e.g., Eggen & Straetmans, 2000). In these computerized classification tests (CCTs), the Sequential Probability Ratio Test (SPRT) (Wald,…
Suir, Glenn M.; Evers, D. Elaine; Steyer, Gregory D.; Sasser, Charles E.
2013-01-01
Coastal Louisiana is a dynamic and ever-changing landscape. From 1956 to 2010, over 3,734 km2 of Louisiana's coastal wetlands have been lost due to a combination of natural and human-induced activities. The resulting landscape constitutes a mosaic of conditions from highly deteriorated to relatively stable with intact landmasses. Understanding how and why coastal landscapes change over time is critical to restoration and rehabilitation efforts. Historically, changes in marsh pattern (i.e., size and spatial distribution of marsh landmasses and water bodies) have been distinguished using visual identification by individual researchers. Difficulties associated with this approach include subjective interpretation, uncertain reproducibility, and laborious techniques. In order to minimize these limitations, this study aims to expand existing tools and techniques via a computer-based method, which uses geospatial technologies for determining shifts in landscape patterns. Our method is based on a raster framework and uses landscape statistics to develop conditions and thresholds for a marsh classification scheme. The classification scheme incorporates land and water classified imagery and a two-part classification system: (1) ratio of water to land, and (2) configuration and connectivity of water within wetland landscapes to evaluate changes in marsh patterns. This analysis system can also be used to trace trajectories in landscape patterns through space and time. Overall, our method provides a more automated means of quantifying landscape patterns and may serve as a reliable landscape evaluation tool for future investigations of wetland ecosystem processes in the northern Gulf of Mexico.
NASA Astrophysics Data System (ADS)
Åberg Lindell, M.; Andersson, P.; Grape, S.; Hellesen, C.; Håkansson, A.; Thulin, M.
2018-03-01
This paper investigates how concentrations of certain fission products and their related gamma-ray emissions can be used to discriminate between uranium oxide (UOX) and mixed oxide (MOX) type fuel. Discrimination of irradiated MOX fuel from irradiated UOX fuel is important in nuclear facilities and for transport of nuclear fuel, for purposes of both criticality safety and nuclear safeguards. Although facility operators keep records on the identity and properties of each fuel, tools for nuclear safeguards inspectors that enable independent verification of the fuel are critical in the recovery of continuity of knowledge, should it be lost. A discrimination methodology for classification of UOX and MOX fuel, based on passive gamma-ray spectroscopy data and multivariate analysis methods, is presented. Nuclear fuels and their gamma-ray emissions were simulated in the Monte Carlo code Serpent, and the resulting data was used as input to train seven different multivariate classification techniques. The trained classifiers were subsequently implemented and evaluated with respect to their capabilities to correctly predict the classes of unknown fuel items. The best results concerning successful discrimination of UOX and MOX-fuel were acquired when using non-linear classification techniques, such as the k nearest neighbors method and the Gaussian kernel support vector machine. For fuel with cooling times up to 20 years, when it is considered that gamma-rays from the isotope 134Cs can still be efficiently measured, success rates of 100% were obtained. A sensitivity analysis indicated that these methods were also robust.
Modeling Verdict Outcomes Using Social Network Measures: The Watergate and Caviar Network Cases
2016-01-01
Modelling criminal trial verdict outcomes using social network measures is an emerging research area in quantitative criminology. Few studies have yet analyzed which of these measures are the most important for verdict modelling or which data classification techniques perform best for this application. To compare the performance of different techniques in classifying members of a criminal network, this article applies three different machine learning classifiers–Logistic Regression, Naïve Bayes and Random Forest–with a range of social network measures and the necessary databases to model the verdicts in two real–world cases: the U.S. Watergate Conspiracy of the 1970’s and the now–defunct Canada–based international drug trafficking ring known as the Caviar Network. In both cases it was found that the Random Forest classifier did better than either Logistic Regression or Naïve Bayes, and its superior performance was statistically significant. This being so, Random Forest was used not only for classification but also to assess the importance of the measures. For the Watergate case, the most important one proved to be betweenness centrality while for the Caviar Network, it was the effective size of the network. These results are significant because they show that an approach combining machine learning with social network analysis not only can generate accurate classification models but also helps quantify the importance social network variables in modelling verdict outcomes. We conclude our analysis with a discussion and some suggestions for future work in verdict modelling using social network measures. PMID:26824351
Applications of remote sensing, volume 1
NASA Technical Reports Server (NTRS)
Landgrebe, D. A. (Principal Investigator)
1977-01-01
The author has identified the following significant results. ECHO successfully exploits the redundancy of states characteristics of sampled imagery of ground scenes to achieve better classification accuracy, reduce the number of classifications required, and reduce the variability of classification results. The information required to produce ECHO classifications are cell size, cell homogeneity, cell-to-field annexation parameters, input data, and a class conditional marginal density statistics deck.
Lu, Yingjie
2013-01-01
To facilitate patient involvement in online health community and obtain informative support and emotional support they need, a topic identification approach was proposed in this paper for identifying automatically topics of the health-related messages in online health community, thus assisting patients in reaching the most relevant messages for their queries efficiently. Feature-based classification framework was presented for automatic topic identification in our study. We first collected the messages related to some predefined topics in a online health community. Then we combined three different types of features, n-gram-based features, domain-specific features and sentiment features to build four feature sets for health-related text representation. Finally, three different text classification techniques, C4.5, Naïve Bayes and SVM were adopted to evaluate our topic classification model. By comparing different feature sets and different classification techniques, we found that n-gram-based features, domain-specific features and sentiment features were all considered to be effective in distinguishing different types of health-related topics. In addition, feature reduction technique based on information gain was also effective to improve the topic classification performance. In terms of classification techniques, SVM outperformed C4.5 and Naïve Bayes significantly. The experimental results demonstrated that the proposed approach could identify the topics of online health-related messages efficiently.
A comparison of autonomous techniques for multispectral image analysis and classification
NASA Astrophysics Data System (ADS)
Valdiviezo-N., Juan C.; Urcid, Gonzalo; Toxqui-Quitl, Carina; Padilla-Vivanco, Alfonso
2012-10-01
Multispectral imaging has given place to important applications related to classification and identification of objects from a scene. Because of multispectral instruments can be used to estimate the reflectance of materials in the scene, these techniques constitute fundamental tools for materials analysis and quality control. During the last years, a variety of algorithms has been developed to work with multispectral data, whose main purpose has been to perform the correct classification of the objects in the scene. The present study introduces a brief review of some classical as well as a novel technique that have been used for such purposes. The use of principal component analysis and K-means clustering techniques as important classification algorithms is here discussed. Moreover, a recent method based on the min-W and max-M lattice auto-associative memories, that was proposed for endmember determination in hyperspectral imagery, is introduced as a classification method. Besides a discussion of their mathematical foundation, we emphasize their main characteristics and the results achieved for two exemplar images conformed by objects similar in appearance, but spectrally different. The classification results state that the first components computed from principal component analysis can be used to highlight areas with different spectral characteristics. In addition, the use of lattice auto-associative memories provides good results for materials classification even in the cases where some spectral similarities appears in their spectral responses.
NASA Technical Reports Server (NTRS)
Kim, H.; Swain, P. H.
1991-01-01
A method of classifying multisource data in remote sensing is presented. The proposed method considers each data source as an information source providing a body of evidence, represents statistical evidence by interval-valued probabilities, and uses Dempster's rule to integrate information based on multiple data source. The method is applied to the problems of ground-cover classification of multispectral data combined with digital terrain data such as elevation, slope, and aspect. Then this method is applied to simulated 201-band High Resolution Imaging Spectrometer (HIRIS) data by dividing the dimensionally huge data source into smaller and more manageable pieces based on the global statistical correlation information. It produces higher classification accuracy than the Maximum Likelihood (ML) classification method when the Hughes phenomenon is apparent.
Radiographic classifications in Perthes disease
Huhnstock, Stefan; Svenningsen, Svein; Merckoll, Else; Catterall, Anthony; Terjesen, Terje; Wiig, Ola
2017-01-01
Background and purpose Different radiographic classifications have been proposed for prediction of outcome in Perthes disease. We assessed whether the modified lateral pillar classification would provide more reliable interobserver agreement and prognostic value compared with the original lateral pillar classification and the Catterall classification. Patients and methods 42 patients (38 boys) with Perthes disease were included in the interobserver study. Their mean age at diagnosis was 6.5 (3–11) years. 5 observers classified the radiographs in 2 separate sessions according to the Catterall classification, the original and the modified lateral pillar classifications. Interobserver agreement was analysed using weighted kappa statistics. We assessed the associations between the classifications and femoral head sphericity at 5-year follow-up in 37 non-operatively treated patients in a crosstable analysis (Gamma statistics for ordinal variables, γ). Results The original lateral pillar and Catterall classifications showed moderate interobserver agreement (kappa 0.49 and 0.43, respectively) while the modified lateral pillar classification had fair agreement (kappa 0.40). The original lateral pillar classification was strongly associated with the 5-year radiographic outcome, with a mean γ correlation coefficient of 0.75 (95% CI: 0.61–0.95) among the 5 observers. The modified lateral pillar and Catterall classifications showed moderate associations (mean γ correlation coefficient 0.55 [95% CI: 0.38–0.66] and 0.64 [95% CI: 0.57–0.72], respectively). Interpretation The Catterall classification and the original lateral pillar classification had sufficient interobserver agreement and association to late radiographic outcome to be suitable for clinical use. Adding the borderline B/C group did not increase the interobserver agreement or prognostic value of the original lateral pillar classification. PMID:28613966
A Discriminative Approach to EEG Seizure Detection
Johnson, Ashley N.; Sow, Daby; Biem, Alain
2011-01-01
Seizures are abnormal sudden discharges in the brain with signatures represented in electroencephalograms (EEG). The efficacy of the application of speech processing techniques to discriminate between seizure and non-seizure states in EEGs is reported. The approach accounts for the challenges of unbalanced datasets (seizure and non-seizure), while also showing a system capable of real-time seizure detection. The Minimum Classification Error (MCE) algorithm, which is a discriminative learning algorithm with wide-use in speech processing, is applied and compared with conventional classification techniques that have already been applied to the discrimination between seizure and non-seizure states in the literature. The system is evaluated on 22 pediatric patients multi-channel EEG recordings. Experimental results show that the application of speech processing techniques and MCE compare favorably with conventional classification techniques in terms of classification performance, while requiring less computational overhead. The results strongly suggests the possibility of deploying the designed system at the bedside. PMID:22195192
Collagen morphology and texture analysis: from statistics to classification
Mostaço-Guidolin, Leila B.; Ko, Alex C.-T.; Wang, Fei; Xiang, Bo; Hewko, Mark; Tian, Ganghong; Major, Arkady; Shiomi, Masashi; Sowa, Michael G.
2013-01-01
In this study we present an image analysis methodology capable of quantifying morphological changes in tissue collagen fibril organization caused by pathological conditions. Texture analysis based on first-order statistics (FOS) and second-order statistics such as gray level co-occurrence matrix (GLCM) was explored to extract second-harmonic generation (SHG) image features that are associated with the structural and biochemical changes of tissue collagen networks. Based on these extracted quantitative parameters, multi-group classification of SHG images was performed. With combined FOS and GLCM texture values, we achieved reliable classification of SHG collagen images acquired from atherosclerosis arteries with >90% accuracy, sensitivity and specificity. The proposed methodology can be applied to a wide range of conditions involving collagen re-modeling, such as in skin disorders, different types of fibrosis and muscular-skeletal diseases affecting ligaments and cartilage. PMID:23846580
NASA Technical Reports Server (NTRS)
Benediktsson, J. A.; Swain, P. H.; Ersoy, O. K.
1993-01-01
Application of neural networks to classification of remote sensing data is discussed. Conventional two-layer backpropagation is found to give good results in classification of remote sensing data but is not efficient in training. A more efficient variant, based on conjugate-gradient optimization, is used for classification of multisource remote sensing and geographic data and very-high-dimensional data. The conjugate-gradient neural networks give excellent performance in classification of multisource data, but do not compare as well with statistical methods in classification of very-high-dimentional data.
Hydrologic Landscape Regionalisation Using Deductive Classification and Random Forests
Brown, Stuart C.; Lester, Rebecca E.; Versace, Vincent L.; Fawcett, Jonathon; Laurenson, Laurie
2014-01-01
Landscape classification and hydrological regionalisation studies are being increasingly used in ecohydrology to aid in the management and research of aquatic resources. We present a methodology for classifying hydrologic landscapes based on spatial environmental variables by employing non-parametric statistics and hybrid image classification. Our approach differed from previous classifications which have required the use of an a priori spatial unit (e.g. a catchment) which necessarily results in the loss of variability that is known to exist within those units. The use of a simple statistical approach to identify an appropriate number of classes eliminated the need for large amounts of post-hoc testing with different number of groups, or the selection and justification of an arbitrary number. Using statistical clustering, we identified 23 distinct groups within our training dataset. The use of a hybrid classification employing random forests extended this statistical clustering to an area of approximately 228,000 km2 of south-eastern Australia without the need to rely on catchments, landscape units or stream sections. This extension resulted in a highly accurate regionalisation at both 30-m and 2.5-km resolution, and a less-accurate 10-km classification that would be more appropriate for use at a continental scale. A smaller case study, of an area covering 27,000 km2, demonstrated that the method preserved the intra- and inter-catchment variability that is known to exist in local hydrology, based on previous research. Preliminary analysis linking the regionalisation to streamflow indices is promising suggesting that the method could be used to predict streamflow behaviour in ungauged catchments. Our work therefore simplifies current classification frameworks that are becoming more popular in ecohydrology, while better retaining small-scale variability in hydrology, thus enabling future attempts to explain and visualise broad-scale hydrologic trends at the scale of catchments and continents. PMID:25396410
Hydrologic landscape regionalisation using deductive classification and random forests.
Brown, Stuart C; Lester, Rebecca E; Versace, Vincent L; Fawcett, Jonathon; Laurenson, Laurie
2014-01-01
Landscape classification and hydrological regionalisation studies are being increasingly used in ecohydrology to aid in the management and research of aquatic resources. We present a methodology for classifying hydrologic landscapes based on spatial environmental variables by employing non-parametric statistics and hybrid image classification. Our approach differed from previous classifications which have required the use of an a priori spatial unit (e.g. a catchment) which necessarily results in the loss of variability that is known to exist within those units. The use of a simple statistical approach to identify an appropriate number of classes eliminated the need for large amounts of post-hoc testing with different number of groups, or the selection and justification of an arbitrary number. Using statistical clustering, we identified 23 distinct groups within our training dataset. The use of a hybrid classification employing random forests extended this statistical clustering to an area of approximately 228,000 km2 of south-eastern Australia without the need to rely on catchments, landscape units or stream sections. This extension resulted in a highly accurate regionalisation at both 30-m and 2.5-km resolution, and a less-accurate 10-km classification that would be more appropriate for use at a continental scale. A smaller case study, of an area covering 27,000 km2, demonstrated that the method preserved the intra- and inter-catchment variability that is known to exist in local hydrology, based on previous research. Preliminary analysis linking the regionalisation to streamflow indices is promising suggesting that the method could be used to predict streamflow behaviour in ungauged catchments. Our work therefore simplifies current classification frameworks that are becoming more popular in ecohydrology, while better retaining small-scale variability in hydrology, thus enabling future attempts to explain and visualise broad-scale hydrologic trends at the scale of catchments and continents.
Study of on-board compression of earth resources data
NASA Technical Reports Server (NTRS)
Habibi, A.
1975-01-01
The current literature on image bandwidth compression was surveyed and those methods relevant to compression of multispectral imagery were selected. Typical satellite multispectral data was then analyzed statistically and the results used to select a smaller set of candidate bandwidth compression techniques particularly relevant to earth resources data. These were compared using both theoretical analysis and simulation, under various criteria of optimality such as mean square error (MSE), signal-to-noise ratio, classification accuracy, and computational complexity. By concatenating some of the most promising techniques, three multispectral data compression systems were synthesized which appear well suited to current and future NASA earth resources applications. The performance of these three recommended systems was then examined in detail by all of the above criteria. Finally, merits and deficiencies were summarized and a number of recommendations for future NASA activities in data compression proposed.
CROP type analysis using Landsat digital data
NASA Technical Reports Server (NTRS)
Brown, C. E.; Thomas, R. W.; Wall, S. L.
1981-01-01
Classification and statistical sampling techniques for crop type discrimination using Landsat digital data have been developed by the University of California in cooperation with NASA and the California Department of Water Resources. Ratioed bands (MSS 7/5 and 5/4) and a sun-angle corrected Euclidean albedo band were prepared from data for the Sacramento Valley for five different dates. The test area was stratified into general crop groupings based on the particular patterns of irrigation timing for each crop. Data classified within each stratum were used to produce a crop type map. Comparison with ground data indicates that certain crops and crop groups are discernable. Small grains and rice are easily identifiable, as are deciduous fruit varieties as a group. However, it is not feasible to separate various fruit and nut varieties, or separate vegetable crops with these techniques at present.
Current application of chemometrics in traditional Chinese herbal medicine research.
Huang, Yipeng; Wu, Zhenwei; Su, Rihui; Ruan, Guihua; Du, Fuyou; Li, Gongke
2016-07-15
Traditional Chinese herbal medicines (TCHMs) are promising approach for the treatment of various diseases which have attracted increasing attention all over the world. Chemometrics in quality control of TCHMs are great useful tools that harnessing mathematics, statistics and other methods to acquire information maximally from the data obtained from various analytical approaches. This feature article focuses on the recent studies which evaluating the pharmacological efficacy and quality of TCHMs by determining, identifying and discriminating the bioactive or marker components in different samples with the help of chemometric techniques. In this work, the application of chemometric techniques in the classification of TCHMs based on their efficacy and usage was introduced. The recent advances of chemometrics applied in the chemical analysis of TCHMs were reviewed in detail. Copyright © 2015 Elsevier B.V. All rights reserved.
Computer Aided Diagnostic Support System for Skin Cancer: A Review of Techniques and Algorithms
Masood, Ammara; Al-Jumaily, Adel Ali
2013-01-01
Image-based computer aided diagnosis systems have significant potential for screening and early detection of malignant melanoma. We review the state of the art in these systems and examine current practices, problems, and prospects of image acquisition, pre-processing, segmentation, feature extraction and selection, and classification of dermoscopic images. This paper reports statistics and results from the most important implementations reported to date. We compared the performance of several classifiers specifically developed for skin lesion diagnosis and discussed the corresponding findings. Whenever available, indication of various conditions that affect the technique's performance is reported. We suggest a framework for comparative assessment of skin cancer diagnostic models and review the results based on these models. The deficiencies in some of the existing studies are highlighted and suggestions for future research are provided. PMID:24575126
Checa-Moreno, R; Manzano, E; Mirón, G; Capitan-Vallvey, L F
2008-05-15
In this paper, we performed a comparison between commonly used strategies amino acid ratios (Aa ratios), two-dimensional ratio plots (2D-Plot) and statistical correlation factor (SCF) and a classification technique, soft independent modelling of class analogy (SIMCA), to identify protein binders present in old artwork samples. To do this, we used a natural standard collection of proteinaceous binders prepared in our laboratory using old recipes and eleven samples coming from Cultural Heritage, such as mural and easel paintings, manuscripts and polychrome sculptures from the 15-18th centuries. Protein binder samples were hydrolyzed and their constitutive amino acids were determined as PITC-derivatives using HPLC-DAD. Amino acid profile data were used to perform the comparison between the four different strategies mentioned above. Traditional strategies can lead to ambiguous or non-conclusive results. With SIMCA, it is possible to provide a more robust and less subjective identification knowing the confidence level of identification. As a standard, we used proteinaceous albumin (whole egg, yolk and glair); casein (goat, cow and sheep) and collagen (mammalian and fish). The process results in a more robust understanding of proteinaceous binding media in old artworks that makes it possible to distinguish them according to their origin.
NASA Astrophysics Data System (ADS)
Ghaffarian, S.; Ghaffarian, S.
2014-08-01
This paper presents a novel approach to detect the buildings by automization of the training area collecting stage for supervised classification. The method based on the fact that a 3d building structure should cast a shadow under suitable imaging conditions. Therefore, the methodology begins with the detection and masking out the shadow areas using luminance component of the LAB color space, which indicates the lightness of the image, and a novel double thresholding technique. Further, the training areas for supervised classification are selected by automatically determining a buffer zone on each building whose shadow is detected by using the shadow shape and the sun illumination direction. Thereafter, by calculating the statistic values of each buffer zone which is collected from the building areas the Improved Parallelepiped Supervised Classification is executed to detect the buildings. Standard deviation thresholding applied to the Parallelepiped classification method to improve its accuracy. Finally, simple morphological operations conducted for releasing the noises and increasing the accuracy of the results. The experiments were performed on set of high resolution Google Earth images. The performance of the proposed approach was assessed by comparing the results of the proposed approach with the reference data by using well-known quality measurements (Precision, Recall and F1-score) to evaluate the pixel-based and object-based performances of the proposed approach. Evaluation of the results illustrates that buildings detected from dense and suburban districts with divers characteristics and color combinations using our proposed method have 88.4 % and 853 % overall pixel-based and object-based precision performances, respectively.
Taxonomy-aware feature engineering for microbiome classification.
Oudah, Mai; Henschel, Andreas
2018-06-15
What is a healthy microbiome? The pursuit of this and many related questions, especially in light of the recently recognized microbial component in a wide range of diseases has sparked a surge in metagenomic studies. They are often not simply attributable to a single pathogen but rather are the result of complex ecological processes. Relatedly, the increasing DNA sequencing depth and number of samples in metagenomic case-control studies enabled the applicability of powerful statistical methods, e.g. Machine Learning approaches. For the latter, the feature space is typically shaped by the relative abundances of operational taxonomic units, as determined by cost-effective phylogenetic marker gene profiles. While a substantial body of microbiome/microbiota research involves unsupervised and supervised Machine Learning, very little attention has been put on feature selection and engineering. We here propose the first algorithm to exploit phylogenetic hierarchy (i.e. an all-encompassing taxonomy) in feature engineering for microbiota classification. The rationale is to exploit the often mono- or oligophyletic distribution of relevant (but hidden) traits by virtue of taxonomic abstraction. The algorithm is embedded in a comprehensive microbiota classification pipeline, which we applied to a diverse range of datasets, distinguishing healthy from diseased microbiota samples. We demonstrate substantial improvements over the state-of-the-art microbiota classification tools in terms of classification accuracy, regardless of the actual Machine Learning technique while using drastically reduced feature spaces. Moreover, generalized features bear great explanatory value: they provide a concise description of conditions and thus help to provide pathophysiological insights. Indeed, the automatically and reproducibly derived features are consistent with previously published domain expert analyses.
Yabe, Tetsuji; Tsuda, Tomoyuki; Hirose, Shunsuke; Ozawa, Toshiyuki
2012-05-01
In this article, a comparison of replantation using microsurgical replantation (replantation) and the Brent method and its modification (pocket principle) in the treatment of fingertip amputation is reported. As a classification of amputation level, we used Ishikawa's subzone classification of fingertip amputation, and the cases of amputations only in subzone 2 were included in this study. Between these two groups, there was no statistical difference in survival rate, postoperative atrophy, or postoperative range of motion. In terms of sensory recovery, some records were lost and exact study was difficult. But there was no obvious difference between these cases. In our comparison of microsurgical replantation versus the pocket principle in treatment of subzone 2 fingertip amputation, there was no difference in postoperative results. Each method has pros and cons, and the surgeon should choose which technique to use based on his or her understanding of the characteristics of both methods. Thieme Medical Publishers 333 Seventh Avenue, New York, NY 10001, USA.
LaViola, Joseph J; Zeleznik, Robert C
2007-11-01
We present a practical technique for using a writer-independent recognition engine to improve the accuracy and speed while reducing the training requirements of a writer-dependent symbol recognizer. Our writer-dependent recognizer uses a set of binary classifiers based on the AdaBoost learning algorithm, one for each possible pairwise symbol comparison. Each classifier consists of a set of weak learners, one of which is based on a writer-independent handwriting recognizer. During online recognition, we also use the n-best list of the writer-independent recognizer to prune the set of possible symbols and thus reduce the number of required binary classifications. In this paper, we describe the geometric and statistical features used in our recognizer and our all-pairs classification algorithm. We also present the results of experiments that quantify the effect incorporating a writer-independent recognition engine into a writer-dependent recognizer has on accuracy, speed, and user training time.
Román Falcó, Iván P; Grané Teruel, Nuria; Prats Moya, Soledad; Martín Carratalá, M Luisa
2012-11-28
A new approach for the determination of kinetic parameters of the cis/trans isomerization during the oxidation process of 24 virgin olive oils belonging to 8 different varieties is presented. The accelerated process of degradation at 100 °C was monitored by recording the Fourier transform infrared spectra. The parameters obtained confirm pseudo-first-order kinetics for the degradation of cis and the appearance of trans double bonds. The kinetic approach affords the induction time and the rate coefficient; these parameters are related to the fatty acid profile of the fresh olive oils. The data obtained were used to compare the oil stability of the samples with the help of multivariate statistical techniques. Fatty acid allowed a classification of the samples in five groups, one of them constituted by the cultivars with higher stability. Meanwhile, the kinetic parameters showed greater ability for the characterization of olive oils, allowing the classification in seven groups.
Mapping the Philippines' mangrove forests using Landsat imagery
Long, Jordan; Giri, Chandra
2011-01-01
Current, accurate, and reliable information on the areal extent and spatial distribution of mangrove forests in the Philippines is limited. Previous estimates of mangrove extent do not illustrate the spatial distribution for the entire country. This study, part of a global assessment of mangrove dynamics, mapped the spatial distribution and areal extent of the Philippines’ mangroves circa 2000. We used publicly available Landsat data acquired primarily from the Global Land Survey to map the total extent and spatial distribution. ISODATA clustering, an unsupervised classification technique, was applied to 61 Landsat images. Statistical analysis indicates the total area of mangrove forest cover was approximately 256,185 hectares circa 2000 with overall classification accuracy of 96.6% and a kappa coefficient of 0.926. These results differ substantially from most recent estimates of mangrove area in the Philippines. The results of this study may assist the decision making processes for rehabilitation and conservation efforts that are currently needed to protect and restore the Philippines’ degraded mangrove forests.
Bourne, Roger; Himmelreich, Uwe; Sharma, Ansuiya; Mountford, Carolyn; Sorrell, Tania
2001-01-01
A new fingerprinting technique with the potential for rapid identification of bacteria was developed by combining proton magnetic resonance spectroscopy (1H MRS) with multivariate statistical analysis. This resulted in an objective identification strategy for common clinical isolates belonging to the bacterial species Staphylococcus aureus, Staphylococcus epidermidis, Enterococcus faecalis, Streptococcus pneumoniae, Streptococcus pyogenes, Streptococcus agalactiae, and the Streptococcus milleri group. Duplicate cultures of 104 different isolates were examined one or more times using 1H MRS. A total of 312 cultures were examined. An optimized classifier was developed using a bootstrapping process and a seven-group linear discriminant analysis to provide objective classification of the spectra. Identification of isolates was based on consistent high-probability classification of spectra from duplicate cultures and achieved 92% agreement with conventional methods of identification. Fewer than 1% of isolates were identified incorrectly. Identification of the remaining 7% of isolates was defined as indeterminate. PMID:11474013
Himmelreich, Uwe; Somorjai, Ray L.; Dolenko, Brion; Lee, Ok Cha; Daniel, Heide-Marie; Murray, Ronan; Mountford, Carolyn E.; Sorrell, Tania C.
2003-01-01
Nuclear magnetic resonance (NMR) spectra were acquired from suspensions of clinically important yeast species of the genus Candida to characterize the relationship between metabolite profiles and species identification. Major metabolites were identified by using two-dimensional correlation NMR spectroscopy. One-dimensional proton NMR spectra were analyzed by using a staged statistical classification strategy. Analysis of NMR spectra from 442 isolates of Candida albicans, C. glabrata, C. krusei, C. parapsilosis, and C. tropicalis resulted in rapid, accurate identification when compared with conventional and DNA-based identification. Spectral regions used for the classification of the five yeast species revealed species-specific differences in relative amounts of lipids, trehalose, polyols, and other metabolites. Isolates of C. parapsilosis and C. glabrata with unusual PCR fingerprinting patterns also generated atypical NMR spectra, suggesting the possibility of intraspecies discontinuity. We conclude that NMR spectroscopy combined with a statistical classification strategy is a rapid, nondestructive, and potentially valuable method for identification and chemotaxonomic characterization that may be broadly applicable to fungi and other microorganisms. PMID:12902244
Yilmaz, E; Kayikcioglu, T; Kayipmaz, S
2017-07-01
In this article, we propose a decision support system for effective classification of dental periapical cyst and keratocystic odontogenic tumor (KCOT) lesions obtained via cone beam computed tomography (CBCT). CBCT has been effectively used in recent years for diagnosing dental pathologies and determining their boundaries and content. Unlike other imaging techniques, CBCT provides detailed and distinctive information about the pathologies by enabling a three-dimensional (3D) image of the region to be displayed. We employed 50 CBCT 3D image dataset files as the full dataset of our study. These datasets were identified by experts as periapical cyst and KCOT lesions according to the clinical, radiographic and histopathologic features. Segmentation operations were performed on the CBCT images using viewer software that we developed. Using the tools of this software, we marked the lesional volume of interest and calculated and applied the order statistics and 3D gray-level co-occurrence matrix for each CBCT dataset. A feature vector of the lesional region, including 636 different feature items, was created from those statistics. Six classifiers were used for the classification experiments. The Support Vector Machine (SVM) classifier achieved the best classification performance with 100% accuracy, and 100% F-score (F1) scores as a result of the experiments in which a ten-fold cross validation method was used with a forward feature selection algorithm. SVM achieved the best classification performance with 96.00% accuracy, and 96.00% F1 scores in the experiments in which a split sample validation method was used with a forward feature selection algorithm. SVM additionally achieved the best performance of 94.00% accuracy, and 93.88% F1 in which a leave-one-out (LOOCV) method was used with a forward feature selection algorithm. Based on the results, we determined that periapical cyst and KCOT lesions can be classified with a high accuracy with the models that we built using the new dataset selected for this study. The studies mentioned in this article, along with the selected 3D dataset, 3D statistics calculated from the dataset, and performance results of the different classifiers, comprise an important contribution to the field of computer-aided diagnosis of dental apical lesions. Copyright © 2017 Elsevier B.V. All rights reserved.
Bennet, Jaison; Ganaprakasam, Chilambuchelvan Arul; Arputharaj, Kannan
2014-01-01
Cancer classification by doctors and radiologists was based on morphological and clinical features and had limited diagnostic ability in olden days. The recent arrival of DNA microarray technology has led to the concurrent monitoring of thousands of gene expressions in a single chip which stimulates the progress in cancer classification. In this paper, we have proposed a hybrid approach for microarray data classification based on nearest neighbor (KNN), naive Bayes, and support vector machine (SVM). Feature selection prior to classification plays a vital role and a feature selection technique which combines discrete wavelet transform (DWT) and moving window technique (MWT) is used. The performance of the proposed method is compared with the conventional classifiers like support vector machine, nearest neighbor, and naive Bayes. Experiments have been conducted on both real and benchmark datasets and the results indicate that the ensemble approach produces higher classification accuracy than conventional classifiers. This paper serves as an automated system for the classification of cancer and can be applied by doctors in real cases which serve as a boon to the medical community. This work further reduces the misclassification of cancers which is highly not allowed in cancer detection.
Forest tree species discrimination in western Himalaya using EO-1 Hyperion
NASA Astrophysics Data System (ADS)
George, Rajee; Padalia, Hitendra; Kushwaha, S. P. S.
2014-05-01
The information acquired in the narrow bands of hyperspectral remote sensing data has potential to capture plant species spectral variability, thereby improving forest tree species mapping. This study assessed the utility of spaceborne EO-1 Hyperion data in discrimination and classification of broadleaved evergreen and conifer forest tree species in western Himalaya. The pre-processing of 242 bands of Hyperion data resulted into 160 noise-free and vertical stripe corrected reflectance bands. Of these, 29 bands were selected through step-wise exclusion of bands (Wilk's Lambda). Spectral Angle Mapper (SAM) and Support Vector Machine (SVM) algorithms were applied to the selected bands to assess their effectiveness in classification. SVM was also applied to broadband data (Landsat TM) to compare the variation in classification accuracy. All commonly occurring six gregarious tree species, viz., white oak, brown oak, chir pine, blue pine, cedar and fir in western Himalaya could be effectively discriminated. SVM produced a better species classification (overall accuracy 82.27%, kappa statistic 0.79) than SAM (overall accuracy 74.68%, kappa statistic 0.70). It was noticed that classification accuracy achieved with Hyperion bands was significantly higher than Landsat TM bands (overall accuracy 69.62%, kappa statistic 0.65). Study demonstrated the potential utility of narrow spectral bands of Hyperion data in discriminating tree species in a hilly terrain.
NASA Astrophysics Data System (ADS)
Zhang, Y.; Li, F.; Zhang, S.; Hao, W.; Zhu, T.; Yuan, L.; Xiao, F.
2017-09-01
In this paper, Statistical Distribution based Conditional Random Fields (STA-CRF) algorithm is exploited for improving marginal ice-water classification. Pixel level ice concentration is presented as the comparison of methods based on CRF. Furthermore, in order to explore the effective statistical distribution model to be integrated into STA-CRF, five statistical distribution models are investigated. The STA-CRF methods are tested on 2 scenes around Prydz Bay and Adélie Depression, where contain a variety of ice types during melt season. Experimental results indicate that the proposed method can resolve sea ice edge well in Marginal Ice Zone (MIZ) and show a robust distinction of ice and water.
ERIC Educational Resources Information Center
Kunina-Habenicht, Olga; Rupp, André A.; Wilhelm, Oliver
2017-01-01
Diagnostic classification models (DCMs) hold great potential for applications in summative and formative assessment by providing discrete multivariate proficiency scores that yield statistically driven classifications of students. Using data from a newly developed diagnostic arithmetic assessment that was administered to 2032 fourth-grade students…
75 FR 34319 - User Fees for 2010 Crop Cotton Classification Services to Growers
Federal Register 2010, 2011, 2012, 2013, 2014
2010-06-17
...-AC99 User Fees for 2010 Crop Cotton Classification Services to Growers AGENCY: Agricultural Marketing... fees for cotton producers for 2010 crop cotton classification services under the Cotton Statistics and Estimates Act at the same level as in 2009. These fees are also authorized under the Cotton Standards Act of...
77 FR 33289 - User Fees for 2012 Crop Cotton Classification Services to Growers
Federal Register 2010, 2011, 2012, 2013, 2014
2012-06-06
... User Fees for 2012 Crop Cotton Classification Services to Growers AGENCY: Agricultural Marketing... fees for cotton producers for 2012 crop cotton classification services under the Cotton Statistics and Estimates Act and the Cotton Standards Act of 1923 at $2.20 per bale--the same level as in 2011. This fee...
76 FR 25533 - User Fees for 2011 Crop Cotton Classification Services to Growers
Federal Register 2010, 2011, 2012, 2013, 2014
2011-05-05
...-AD11 User Fees for 2011 Crop Cotton Classification Services to Growers AGENCY: Agricultural Marketing... fees for cotton producers for 2011 crop cotton classification services under the Cotton Statistics and Estimates Act at the same level as in 2010. These fees are also authorized under the Cotton Standards Act of...
Lisa A. Schulte; David J. Mladenoff; Erik V. Nordheim
2002-01-01
We developed a quantitative and replicable classification system to improve understanding of historical composition and structure within northern Wisconsin's forests. The classification system was based on statistical cluster analysis and two forest metrics, relative dominance (% basal area) and relative importance (mean of relative dominance and relative density...
A discrimlnant function approach to ecological site classification in northern New England
James M. Fincher; Marie-Louise Smith
1994-01-01
Describes one approach to ecologically based classification of upland forest community types of the White and Green Mountain physiographic regions. The classification approach is based on an intensive statistical analysis of the relationship between the communities and soil-site factors. Discriminant functions useful in distinguishing between types based on soil-site...
Fast Solution in Sparse LDA for Binary Classification
NASA Technical Reports Server (NTRS)
Moghaddam, Baback
2010-01-01
An algorithm that performs sparse linear discriminant analysis (Sparse-LDA) finds near-optimal solutions in far less time than the prior art when specialized to binary classification (of 2 classes). Sparse-LDA is a type of feature- or variable- selection problem with numerous applications in statistics, machine learning, computer vision, computational finance, operations research, and bio-informatics. Because of its combinatorial nature, feature- or variable-selection problems are NP-hard or computationally intractable in cases involving more than 30 variables or features. Therefore, one typically seeks approximate solutions by means of greedy search algorithms. The prior Sparse-LDA algorithm was a greedy algorithm that considered the best variable or feature to add/ delete to/ from its subsets in order to maximally discriminate between multiple classes of data. The present algorithm is designed for the special but prevalent case of 2-class or binary classification (e.g. 1 vs. 0, functioning vs. malfunctioning, or change versus no change). The present algorithm provides near-optimal solutions on large real-world datasets having hundreds or even thousands of variables or features (e.g. selecting the fewest wavelength bands in a hyperspectral sensor to do terrain classification) and does so in typical computation times of minutes as compared to days or weeks as taken by the prior art. Sparse LDA requires solving generalized eigenvalue problems for a large number of variable subsets (represented by the submatrices of the input within-class and between-class covariance matrices). In the general (fullrank) case, the amount of computation scales at least cubically with the number of variables and thus the size of the problems that can be solved is limited accordingly. However, in binary classification, the principal eigenvalues can be found using a special analytic formula, without resorting to costly iterative techniques. The present algorithm exploits this analytic form along with the inherent sequential nature of greedy search itself. Together this enables the use of highly-efficient partitioned-matrix-inverse techniques that result in large speedups of computation in both the forward-selection and backward-elimination stages of greedy algorithms in general.
Collaborative classification of hyperspectral and visible images with convolutional neural network
NASA Astrophysics Data System (ADS)
Zhang, Mengmeng; Li, Wei; Du, Qian
2017-10-01
Recent advances in remote sensing technology have made multisensor data available for the same area, and it is well-known that remote sensing data processing and analysis often benefit from multisource data fusion. Specifically, low spatial resolution of hyperspectral imagery (HSI) degrades the quality of the subsequent classification task while using visible (VIS) images with high spatial resolution enables high-fidelity spatial analysis. A collaborative classification framework is proposed to fuse HSI and VIS images for finer classification. First, the convolutional neural network model is employed to extract deep spectral features for HSI classification. Second, effective binarized statistical image features are learned as contextual basis vectors for the high-resolution VIS image, followed by a classifier. The proposed approach employs diversified data in a decision fusion, leading to an integration of the rich spectral information, spatial information, and statistical representation information. In particular, the proposed approach eliminates the potential problems of the curse of dimensionality and excessive computation time. The experiments evaluated on two standard data sets demonstrate better classification performance offered by this framework.
Investigation of Error Patterns in Geographical Databases
NASA Technical Reports Server (NTRS)
Dryer, David; Jacobs, Derya A.; Karayaz, Gamze; Gronbech, Chris; Jones, Denise R. (Technical Monitor)
2002-01-01
The objective of the research conducted in this project is to develop a methodology to investigate the accuracy of Airport Safety Modeling Data (ASMD) using statistical, visualization, and Artificial Neural Network (ANN) techniques. Such a methodology can contribute to answering the following research questions: Over a representative sampling of ASMD databases, can statistical error analysis techniques be accurately learned and replicated by ANN modeling techniques? This representative ASMD sample should include numerous airports and a variety of terrain characterizations. Is it possible to identify and automate the recognition of patterns of error related to geographical features? Do such patterns of error relate to specific geographical features, such as elevation or terrain slope? Is it possible to combine the errors in small regions into an error prediction for a larger region? What are the data density reduction implications of this work? ASMD may be used as the source of terrain data for a synthetic visual system to be used in the cockpit of aircraft when visual reference to ground features is not possible during conditions of marginal weather or reduced visibility. In this research, United States Geologic Survey (USGS) digital elevation model (DEM) data has been selected as the benchmark. Artificial Neural Networks (ANNS) have been used and tested as alternate methods in place of the statistical methods in similar problems. They often perform better in pattern recognition, prediction and classification and categorization problems. Many studies show that when the data is complex and noisy, the accuracy of ANN models is generally higher than those of comparable traditional methods.
NASA Astrophysics Data System (ADS)
Vathsala, H.; Koolagudi, Shashidhar G.
2017-01-01
In this paper we discuss a data mining application for predicting peninsular Indian summer monsoon rainfall, and propose an algorithm that combine data mining and statistical techniques. We select likely predictors based on association rules that have the highest confidence levels. We then cluster the selected predictors to reduce their dimensions and use cluster membership values for classification. We derive the predictors from local conditions in southern India, including mean sea level pressure, wind speed, and maximum and minimum temperatures. The global condition variables include southern oscillation and Indian Ocean dipole conditions. The algorithm predicts rainfall in five categories: Flood, Excess, Normal, Deficit and Drought. We use closed itemset mining, cluster membership calculations and a multilayer perceptron function in the algorithm to predict monsoon rainfall in peninsular India. Using Indian Institute of Tropical Meteorology data, we found the prediction accuracy of our proposed approach to be exceptionally good.
Menditto, Anthony A; Linhorst, Donald M; Coleman, James C; Beck, Niels C
2006-04-01
Development of policies and procedures to contend with the risks presented by elopement, aggression, and suicidal behaviors are long-standing challenges for mental health administrators. Guidance in making such judgments can be obtained through the use of a multivariate statistical technique known as logistic regression. This procedure can be used to develop a predictive equation that is mathematically formulated to use the best combination of predictors, rather than considering just one factor at a time. This paper presents an overview of logistic regression and its utility in mental health administrative decision making. A case example of its application is presented using data on elopements from Missouri's long-term state psychiatric hospitals. Ultimately, the use of statistical prediction analyses tempered with differential qualitative weighting of classification errors can augment decision-making processes in a manner that provides guidance and flexibility while wrestling with the complex problem of risk assessment and decision making.
Classification of Malaysia aromatic rice using multivariate statistical analysis
NASA Astrophysics Data System (ADS)
Abdullah, A. H.; Adom, A. H.; Shakaff, A. Y. Md; Masnan, M. J.; Zakaria, A.; Rahim, N. A.; Omar, O.
2015-05-01
Aromatic rice (Oryza sativa L.) is considered as the best quality premium rice. The varieties are preferred by consumers because of its preference criteria such as shape, colour, distinctive aroma and flavour. The price of aromatic rice is higher than ordinary rice due to its special needed growth condition for instance specific climate and soil. Presently, the aromatic rice quality is identified by using its key elements and isotopic variables. The rice can also be classified via Gas Chromatography Mass Spectrometry (GC-MS) or human sensory panels. However, the uses of human sensory panels have significant drawbacks such as lengthy training time, and prone to fatigue as the number of sample increased and inconsistent. The GC-MS analysis techniques on the other hand, require detailed procedures, lengthy analysis and quite costly. This paper presents the application of in-house developed Electronic Nose (e-nose) to classify new aromatic rice varieties. The e-nose is used to classify the variety of aromatic rice based on the samples odour. The samples were taken from the variety of rice. The instrument utilizes multivariate statistical data analysis, including Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA) and K-Nearest Neighbours (KNN) to classify the unknown rice samples. The Leave-One-Out (LOO) validation approach is applied to evaluate the ability of KNN to perform recognition and classification of the unspecified samples. The visual observation of the PCA and LDA plots of the rice proves that the instrument was able to separate the samples into different clusters accordingly. The results of LDA and KNN with low misclassification error support the above findings and we may conclude that the e-nose is successfully applied to the classification of the aromatic rice varieties.
NASA Astrophysics Data System (ADS)
Cánovas-García, Fulgencio; Alonso-Sarría, Francisco; Gomariz-Castillo, Francisco; Oñate-Valdivieso, Fernando
2017-06-01
Random forest is a classification technique widely used in remote sensing. One of its advantages is that it produces an estimation of classification accuracy based on the so called out-of-bag cross-validation method. It is usually assumed that such estimation is not biased and may be used instead of validation based on an external data-set or a cross-validation external to the algorithm. In this paper we show that this is not necessarily the case when classifying remote sensing imagery using training areas with several pixels or objects. According to our results, out-of-bag cross-validation clearly overestimates accuracy, both overall and per class. The reason is that, in a training patch, pixels or objects are not independent (from a statistical point of view) of each other; however, they are split by bootstrapping into in-bag and out-of-bag as if they were really independent. We believe that putting whole patch, rather than pixels/objects, in one or the other set would produce a less biased out-of-bag cross-validation. To deal with the problem, we propose a modification of the random forest algorithm to split training patches instead of the pixels (or objects) that compose them. This modified algorithm does not overestimate accuracy and has no lower predictive capability than the original. When its results are validated with an external data-set, the accuracy is not different from that obtained with the original algorithm. We analysed three remote sensing images with different classification approaches (pixel and object based); in the three cases reported, the modification we propose produces a less biased accuracy estimation.
Hidden Markov models of biological primary sequence information.
Baldi, P; Chauvin, Y; Hunkapiller, T; McClure, M A
1994-01-01
Hidden Markov model (HMM) techniques are used to model families of biological sequences. A smooth and convergent algorithm is introduced to iteratively adapt the transition and emission parameters of the models from the examples in a given family. The HMM approach is applied to three protein families: globins, immunoglobulins, and kinases. In all cases, the models derived capture the important statistical characteristics of the family and can be used for a number of tasks, including multiple alignments, motif detection, and classification. For K sequences of average length N, this approach yields an effective multiple-alignment algorithm which requires O(KN2) operations, linear in the number of sequences. PMID:8302831
Lake bed classification using acoustic data
Yin, Karen K.; Li, Xing; Bonde, John; Richards, Carl; Cholwek, Gary
1998-01-01
As part of our effort to identify the lake bed surficial substrates using remote sensing data, this work designs pattern classifiers by multivariate statistical methods. Probability distribution of the preprocessed acoustic signal is analyzed first. A confidence region approach is then adopted to improve the design of the existing classifier. A technique for further isolation is proposed which minimizes the expected loss from misclassification. The devices constructed are applicable for real-time lake bed categorization. A mimimax approach is suggested to treat more general cases where the a priori probability distribution of the substrate types is unknown. Comparison of the suggested methods with the traditional likelihood ratio tests is discussed.
Navas, Juan Moreno; Telfer, Trevor C; Ross, Lindsay G
2011-08-01
Combining GIS with neuro-fuzzy modeling has the advantage that expert scientific knowledge in coastal aquaculture activities can be incorporated into a geospatial model to classify areas particularly vulnerable to pollutants. Data on the physical environment and its suitability for aquaculture in an Irish fjard, which is host to a number of different aquaculture activities, were derived from a three-dimensional hydrodynamic and GIS models. Subsequent incorporation into environmental vulnerability models, based on neuro-fuzzy techniques, highlighted localities particularly vulnerable to aquaculture development. The models produced an overall classification accuracy of 85.71%, with a Kappa coefficient of agreement of 81%, and were sensitive to different input parameters. A statistical comparison between vulnerability scores and nitrogen concentrations in sediment associated with salmon cages showed good correlation. Neuro-fuzzy techniques within GIS modeling classify vulnerability of coastal regions appropriately and have a role in policy decisions for aquaculture site selection. Copyright © 2011 Elsevier Ltd. All rights reserved.
Cloud field classification based on textural features
NASA Technical Reports Server (NTRS)
Sengupta, Sailes Kumar
1989-01-01
An essential component in global climate research is accurate cloud cover and type determination. Of the two approaches to texture-based classification (statistical and textural), only the former is effective in the classification of natural scenes such as land, ocean, and atmosphere. In the statistical approach that was adopted, parameters characterizing the stochastic properties of the spatial distribution of grey levels in an image are estimated and then used as features for cloud classification. Two types of textural measures were used. One is based on the distribution of the grey level difference vector (GLDV), and the other on a set of textural features derived from the MaxMin cooccurrence matrix (MMCM). The GLDV method looks at the difference D of grey levels at pixels separated by a horizontal distance d and computes several statistics based on this distribution. These are then used as features in subsequent classification. The MaxMin tectural features on the other hand are based on the MMCM, a matrix whose (I,J)th entry give the relative frequency of occurrences of the grey level pair (I,J) that are consecutive and thresholded local extremes separated by a given pixel distance d. Textural measures are then computed based on this matrix in much the same manner as is done in texture computation using the grey level cooccurrence matrix. The database consists of 37 cloud field scenes from LANDSAT imagery using a near IR visible channel. The classification algorithm used is the well known Stepwise Discriminant Analysis. The overall accuracy was estimated by the percentage or correct classifications in each case. It turns out that both types of classifiers, at their best combination of features, and at any given spatial resolution give approximately the same classification accuracy. A neural network based classifier with a feed forward architecture and a back propagation training algorithm is used to increase the classification accuracy, using these two classes of features. Preliminary results based on the GLDV textural features alone look promising.
Carboni, Davide; Gluhak, Alex; McCann, Julie A.; Beach, Thomas H.
2016-01-01
Water monitoring in households is important to ensure the sustainability of fresh water reserves on our planet. It provides stakeholders with the statistics required to formulate optimal strategies in residential water management. However, this should not be prohibitive and appliance-level water monitoring cannot practically be achieved by deploying sensors on every faucet or water-consuming device of interest due to the higher hardware costs and complexity, not to mention the risk of accidental leakages that can derive from the extra plumbing needed. Machine learning and data mining techniques are promising techniques to analyse monitored data to obtain non-intrusive water usage disaggregation. This is because they can discern water usage from the aggregated data acquired from a single point of observation. This paper provides an overview of water usage disaggregation systems and related techniques adopted for water event classification. The state-of-the art of algorithms and testbeds used for fixture recognition are reviewed and a discussion on the prominent challenges and future research are also included. PMID:27213397
Machine learning techniques for medical diagnosis of diabetes using iris images.
Samant, Piyush; Agarwal, Ravinder
2018-04-01
Complementary and alternative medicine techniques have shown their potential for the treatment and diagnosis of chronical diseases like diabetes, arthritis etc. On the same time digital image processing techniques for disease diagnosis is reliable and fastest growing field in biomedical. Proposed model is an attempt to evaluate diagnostic validity of an old complementary and alternative medicine technique, iridology for diagnosis of type-2 diabetes using soft computing methods. Investigation was performed over a close group of total 338 subjects (180 diabetic and 158 non-diabetic). Infra-red images of both the eyes were captured simultaneously. The region of interest from the iris image was cropped as zone corresponds to the position of pancreas organ according to the iridology chart. Statistical, texture and discrete wavelength transformation features were extracted from the region of interest. The results show best classification accuracy of 89.63% calculated from RF classifier. Maximum specificity and sensitivity were absorbed as 0.9687 and 0.988, respectively. Results have revealed the effectiveness and diagnostic significance of proposed model for non-invasive and automatic diabetes diagnosis. Copyright © 2018 Elsevier B.V. All rights reserved.