C-fuzzy variable-branch decision tree with storage and classification error rate constraints
NASA Astrophysics Data System (ADS)
Yang, Shiueng-Bien
2009-10-01
The C-fuzzy decision tree (CFDT), which is based on the fuzzy C-means algorithm, has recently been proposed. The CFDT is grown by selecting the nodes to be split according to its classification error rate. However, the CFDT design does not consider the classification time taken to classify the input vector. Thus, the CFDT can be improved. We propose a new C-fuzzy variable-branch decision tree (CFVBDT) with storage and classification error rate constraints. The design of the CFVBDT consists of two phases: growing and pruning. The CFVBDT is grown by selecting the nodes to be split according to the classification error rate and the classification time in the decision tree. Additionally, the pruning method selects the nodes to prune based on the storage requirement and the classification time of the CFVBDT. Furthermore, the number of branches of each internal node is variable in the CFVBDT. Experimental results indicate that the proposed CFVBDT outperforms the CFDT and other methods.
Effects of uncertainty and variability on population declines and IUCN Red List classifications.
Rueda-Cediel, Pamela; Anderson, Kurt E; Regan, Tracey J; Regan, Helen M
2018-01-22
The International Union for Conservation of Nature (IUCN) Red List Categories and Criteria is a quantitative framework for classifying species according to extinction risk. Population models may be used to estimate extinction risk or population declines. Uncertainty and variability arise in threat classifications through measurement and process error in empirical data and uncertainty in the models used to estimate extinction risk and population declines. Furthermore, species traits are known to affect extinction risk. We investigated the effects of measurement and process error, model type, population growth rate, and age at first reproduction on the reliability of IUCN Red List classifications based on projected population declines. We used an age-structured population model to simulate true population trajectories with different growth rates, reproductive ages and levels of variation, and subjected them to measurement error. We evaluated the ability of scalar and matrix models parameterized with these simulated time series to accurately capture the IUCN Red List classification generated with true population declines. Under all levels of measurement error tested and low process error, classifications were reasonably accurate; scalar and matrix models yielded roughly the same rate of misclassification, but the distribution of errors differed; matrix models led to greater overestimation of extinction risk than underestimation; process error tended to contribute to misclassifications to a greater extent than measurement error; and more misclassifications occurred for fast, rather than slow, life histories. These results indicate that classifications of highly threatened taxa (i.e., taxa with low growth rates) under criterion A are more likely to be reliable than those of less threatened taxa when assessed with population models. Greater scrutiny needs to be placed on data used to parameterize population models for species with high growth rates, particularly when available evidence indicates a potential transition to higher risk categories. © 2018 Society for Conservation Biology.
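As a rough illustration of how a projected decline from an age-structured (Leslie matrix) model maps onto Red List categories under criterion A, the following Python sketch uses a hypothetical matrix and the published criterion A3 decline thresholds; it is not the authors' simulation code.

```python
import numpy as np

def projected_decline(leslie, n0, generations=3, gen_time=5):
    """Proportional population decline projected over three generations,
    the window used by IUCN Red List criterion A."""
    n = n0.copy()
    for _ in range(generations * gen_time):
        n = leslie @ n
    return 1.0 - n.sum() / n0.sum()

def red_list_category(decline):
    # Criterion A3 thresholds for projected declines.
    if decline >= 0.80:
        return "Critically Endangered"
    if decline >= 0.50:
        return "Endangered"
    if decline >= 0.30:
        return "Vulnerable"
    return "Least Concern / Near Threatened"

# Hypothetical Leslie matrix: fecundities on the first row, survival on the subdiagonal.
L = np.array([[0.00, 1.25, 1.36],
              [0.40, 0.00, 0.00],
              [0.00, 0.70, 0.00]])
n0 = np.array([100.0, 40.0, 25.0])
d = projected_decline(L, n0)
print(f"projected decline: {d:.1%} -> {red_list_category(d)}")
```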
Classification based upon gene expression data: bias and precision of error rates.
Wood, Ian A; Visscher, Peter M; Mengersen, Kerrie L
2007-06-01
Gene expression data offer a large number of potentially useful predictors for the classification of tissue samples into classes, such as diseased and non-diseased. The predictive error rate of classifiers can be estimated using methods such as cross-validation. We have investigated issues of interpretation and potential bias in the reporting of error rate estimates. The issues considered here are optimization and selection biases, sampling effects, measures of misclassification rate, baseline error rates, two-level external cross-validation and a novel proposal for detection of bias using the permutation mean. Reporting an optimal estimated error rate incurs an optimization bias. Downward bias of 3-5% was found in an existing study of classification based on gene expression data and may be endemic in similar studies. Using a simulated non-informative dataset and two example datasets from existing studies, we show how bias can be detected through the use of label permutations and avoided using two-level external cross-validation. Some studies avoid optimization bias by using single-level cross-validation and a test set, but error rates can be more accurately estimated via two-level cross-validation. In addition to estimating the simple overall error rate, we recommend reporting class error rates plus where possible the conditional risk incorporating prior class probabilities and a misclassification cost matrix. We also describe baseline error rates derived from three trivial classifiers which ignore the predictors. R code which implements two-level external cross-validation with the PAMR package, experiment code, dataset details and additional figures are freely available for non-commercial use from http://www.maths.qut.edu.au/profiles/wood/permr.jsp
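The two-level (nested) cross-validation and label-permutation checks described above can be sketched as follows; scikit-learn, an SVM, and a synthetic non-informative data set stand in for the authors' R/PAMR implementation.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 500))            # non-informative data: predictors unrelated to labels
y = np.repeat([0, 1], 30)

# Two-level (nested) cross-validation: the inner loop tunes the classifier,
# the outer loop estimates its error, so tuning never sees the test folds.
inner = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=StratifiedKFold(5))
outer_acc = cross_val_score(inner, X, y, cv=StratifiedKFold(5))
print("nested CV error:", 1 - outer_acc.mean())

# Label-permutation check: with shuffled labels the estimated error should sit
# near the 0.5 baseline; a markedly lower value would signal optimization bias.
perm_acc = cross_val_score(inner, X, rng.permutation(y), cv=StratifiedKFold(5))
print("permuted-label error:", 1 - perm_acc.mean())
```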
Particle Swarm Optimization approach to defect detection in armour ceramics.
Kesharaju, Manasa; Nagarajah, Romesh
2017-03-01
In this research, various extracted features were used in the development of an automated ultrasonic sensor based inspection system that enables defect classification in each ceramic component prior to despatch to the field. Classification is an important task, and the large number of irrelevant, redundant features commonly introduced to a dataset reduces a classifier's performance. Feature selection aims to reduce the dimensionality of the dataset while improving the performance of a classification system. In the context of a multi-criteria optimization problem (i.e. to minimize the classification error rate and reduce the number of features) such as the one discussed in this research, the literature suggests that evolutionary algorithms offer good results. Moreover, Particle Swarm Optimization (PSO) has not been explored specifically for the classification of high-frequency ultrasonic signals. Hence, a binary coded Particle Swarm Optimization (BPSO) technique is investigated for feature subset selection and for optimizing the classification error rate. In the proposed method, the population data are used as input to an Artificial Neural Network (ANN) based classification system to obtain the error rate, with the ANN serving as the evaluator of the PSO fitness function. Copyright © 2016. Published by Elsevier B.V.
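A minimal sketch of binary PSO feature selection with a neural-network error rate as the fitness function is given below; scikit-learn's MLPClassifier, the breast-cancer demo data, and the inertia/acceleration constants are illustrative stand-ins, not the authors' implementation.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(1)
n_particles, n_feat, n_iter = 8, X.shape[1], 10

def error_rate(mask):
    # Fitness: cross-validated error of the ANN on the selected feature subset.
    sel = mask.astype(bool)
    if not sel.any():
        return 1.0
    ann = make_pipeline(StandardScaler(),
                        MLPClassifier(hidden_layer_sizes=(10,), max_iter=300, random_state=0))
    return 1.0 - cross_val_score(ann, X[:, sel], y, cv=3).mean()

pos = (rng.random((n_particles, n_feat)) < 0.5).astype(float)   # binary positions = feature masks
vel = rng.normal(0.0, 1.0, (n_particles, n_feat))
pbest = pos.copy()
pbest_err = np.array([error_rate(p) for p in pos])
gbest = pbest[pbest_err.argmin()].copy()

for _ in range(n_iter):
    r1, r2 = rng.random((2, n_particles, n_feat))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = (rng.random((n_particles, n_feat)) < 1.0 / (1.0 + np.exp(-vel))).astype(float)  # sigmoid transfer
    err = np.array([error_rate(p) for p in pos])
    better = err < pbest_err
    pbest[better], pbest_err[better] = pos[better], err[better]
    gbest = pbest[pbest_err.argmin()].copy()

print(f"selected {int(gbest.sum())} of {n_feat} features, error rate {pbest_err.min():.3f}")
```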
An extension of the receiver operating characteristic curve and AUC-optimal classification.
Takenouchi, Takashi; Komori, Osamu; Eguchi, Shinto
2012-10-01
While most proposed methods for solving classification problems focus on minimization of the classification error rate, we are interested in the receiver operating characteristic (ROC) curve, which provides more information about classification performance than the error rate does. The area under the ROC curve (AUC) is a natural measure for overall assessment of a classifier based on the ROC curve. We discuss a class of concave functions for AUC maximization in which a boosting-type algorithm including RankBoost is considered, and the Bayesian risk consistency and the lower bound of the optimum function are discussed. A procedure derived by maximizing a specific optimum function has high robustness, based on gross error sensitivity. Additionally, we focus on the partial AUC, which is the partial area under the ROC curve. For example, in medical screening, a high true-positive rate at a fixed low false-positive rate is preferable, and thus the partial AUC corresponding to low false-positive rates is much more important than the remaining AUC. We extend the class of concave optimum functions for partial AUC optimality with the boosting algorithm. We investigated the validity of the proposed method through several experiments with data sets in the UCI repository.
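For reference, the full and partial AUC discussed above can be computed with scikit-learn as follows; the scores are synthetic, and max_fpr restricts the area to the low false-positive region relevant to screening.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y = np.r_[np.zeros(500, dtype=int), np.ones(500, dtype=int)]
scores = np.r_[rng.normal(0.0, 1.0, 500), rng.normal(1.0, 1.0, 500)]   # hypothetical classifier scores

# The full AUC summarizes the whole ROC curve; setting max_fpr returns the
# standardized partial AUC over the low false-positive region only.
print("AUC:", roc_auc_score(y, scores))
print("partial AUC (FPR <= 0.1):", roc_auc_score(y, scores, max_fpr=0.1))
```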
Error Detection in Mechanized Classification Systems
ERIC Educational Resources Information Center
Hoyle, W. G.
1976-01-01
When documentary material is indexed by a mechanized classification system, and the results judged by trained professionals, the number of documents in disagreement, after suitable adjustment, defines the error rate of the system. In a test case disagreement was 22 percent and, of this 22 percent, the computer correctly identified two-thirds of…
Kalpathy-Cramer, Jayashree; Hersh, William
2008-01-01
In 2006 and 2007, Oregon Health & Science University (OHSU) participated in the automatic image annotation task for medical images at ImageCLEF, an annual international benchmarking event that is part of the Cross Language Evaluation Forum (CLEF). The goal of the automatic annotation task was to classify 1000 test images based on the Image Retrieval in Medical Applications (IRMA) code, given a set of 10,000 training images. There were 116 distinct classes in 2006 and 2007. We evaluated the efficacy of a variety of primarily global features for this classification task. These included features based on histograms, gray level correlation matrices and the gist technique. A multitude of classifiers including k-nearest neighbors, two-level neural networks, support vector machines, and maximum likelihood classifiers were evaluated. Our official error rate for the 1000 test images was 26% in 2006 using the flat classification structure, and our error count was 67.8 in 2007 using the hierarchical classification error computation based on the IRMA code. Confusion matrices as well as clustering experiments were used to identify visually similar classes. The use of the IRMA code did not help us in the classification task as the semantic hierarchy of the IRMA classes did not correspond well with the hierarchy based on clustering of image features that we used. Our most frequent misclassification errors were along the view axis. Subsequent experiments based on a two-stage classification system decreased our error rate to 19.8% for the 2006 dataset and our error count to 55.4 for the 2007 data. PMID:19884953
Using Gaussian mixture models to detect and classify dolphin whistles and pulses.
Peso Parada, Pablo; Cardenal-López, Antonio
2014-06-01
In recent years, a number of automatic detection systems for free-ranging cetaceans have been proposed that aim to detect not just surfaced, but also submerged, individuals. These systems are typically based on pattern-recognition techniques applied to underwater acoustic recordings. Using a Gaussian mixture model, a classification system was developed that detects sounds in recordings and classifies them as one of four types: background noise, whistles, pulses, and combined whistles and pulses. The classifier was tested using a database of underwater recordings made off the Spanish coast during 2011. Using cepstral-coefficient-based parameterization, a sound detection rate of 87.5% was achieved for a 23.6% classification error rate. To improve these results, two parameters computed using the multiple signal classification algorithm and an unpredictability measure were included in the classifier. These parameters, which helped to classify the segments containing whistles, increased the detection rate to 90.3% and reduced the classification error rate to 18.1%. Finally, the potential of the multiple signal classification algorithm and unpredictability measure for estimating whistle contours and classifying cetacean species was also explored, with promising results.
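A minimal sketch of the one-GMM-per-class decision rule described above, with random vectors standing in for the cepstral-coefficient features, is given below; class names, feature dimensions and mixture sizes are illustrative assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
classes = ["noise", "whistle", "pulse", "whistle+pulse"]
# Random vectors stand in for the cepstral features of labelled training segments.
train = {c: rng.normal(loc=i, scale=1.0, size=(200, 12)) for i, c in enumerate(classes)}

# One GMM per class; a detected segment is assigned to the class whose mixture
# gives the highest average log-likelihood over the segment's feature frames.
models = {c: GaussianMixture(n_components=4, covariance_type="diag", random_state=0).fit(f)
          for c, f in train.items()}

def classify(segment_frames):
    scores = {c: m.score(segment_frames) for c, m in models.items()}
    return max(scores, key=scores.get)

segment = rng.normal(loc=2.0, scale=1.0, size=(50, 12))   # frames from one detected sound
print(classify(segment))
```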
Tsuji, Toshikazu; Nagata, Kenichiro; Kawashiri, Takehiro; Yamada, Takaaki; Irisa, Toshihiro; Murakami, Yuko; Kanaya, Akiko; Egashira, Nobuaki; Masuda, Satohiro
2016-01-01
There are many reports regarding various medical institutions' attempts at the prevention of dispensing errors. However, the relationship between the occurrence timing of dispensing errors and the subsequent danger to patients has not been studied with drugs classified by efficacy. Therefore, we analyzed the relationship between position and time regarding the occurrence of dispensing errors. Furthermore, we investigated the relationship between their occurrence timing and the danger to patients. In this study, dispensing errors and incidents in three categories (drug name errors, drug strength errors, drug count errors) were classified into two groups in terms of drug efficacy (efficacy similarity (-) group, efficacy similarity (+) group) and into three classes in terms of the occurrence timing of the dispensing errors (initial phase errors, middle phase errors, final phase errors). Then, the rates of damage shifting from "dispensing errors" to "damage to patients" were compared as an index of danger between the two groups and among the three classes. Consequently, the rate of damage in the "efficacy similarity (-) group" was significantly higher than that in the "efficacy similarity (+) group". Furthermore, the rate of damage was highest for "initial phase errors" and lowest for "final phase errors" among the three classes. From the results of this study, it became clear that the earlier a dispensing error occurs, the more severe the damage to patients becomes.
A Guide for Setting the Cut-Scores to Minimize Weighted Classification Errors in Test Batteries
ERIC Educational Resources Information Center
Grabovsky, Irina; Wainer, Howard
2017-01-01
In this article, we extend the methodology of the Cut-Score Operating Function that we introduced previously and apply it to a testing scenario with multiple independent components and different testing policies. We derive analytically the overall classification error rate for a test battery under the policy when several retakes are allowed for…
NASA Astrophysics Data System (ADS)
Situmorang, B. H.; Setiawan, M. P.; Tosida, E. T.
2017-01-01
Refractive errors are abnormalities in the refraction of light such that images do not focus precisely on the retina, resulting in blurred vision [1]. Refractive errors require the patient to wear glasses or contact lenses so that eyesight returns to normal. The appropriate glasses or contact lenses differ from person to person, influenced by patient age, the amount of tear production, the vision prescription, and astigmatism. Because the eye is a vital organ for sight, accuracy in determining which glasses or contact lenses will be used is required. This research aims to develop a decision support system that can recommend the right contact lenses for refractive error patients with 100% accuracy. The Iterative Dichotomiser 3 (ID3) classification method generates gain and entropy values for attributes that include the sample data code, the age of the patient, astigmatic, the ratio of tear production, the vision prescription, and the class, which determine the resulting decision tree. The eye specialist test on the training data gave an accuracy rate of 96.7% and an error rate of 3.3%; the test using a confusion matrix gave an accuracy rate of 96.1% and an error rate of 3.1%; and the testing data gave an accuracy rate of 100% and an error rate of 0%.
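The entropy and information-gain computations at the core of ID3 can be sketched as follows; the records and attribute values are hypothetical stand-ins for the paper's contact-lens data.

```python
import math
from collections import Counter

def entropy(labels):
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(rows, labels, attribute):
    """Gain of splitting the data set on one attribute, the quantity ID3
    maximizes when choosing the next node of the decision tree."""
    base = entropy(labels)
    remainder = 0.0
    for v in set(row[attribute] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attribute] == v]
        remainder += len(subset) / len(labels) * entropy(subset)
    return base - remainder

# Hypothetical records: (age, astigmatic, tear production) -> lens recommendation.
rows = [
    {"age": "young",  "astigmatic": "no",  "tears": "normal"},
    {"age": "young",  "astigmatic": "yes", "tears": "reduced"},
    {"age": "senior", "astigmatic": "no",  "tears": "normal"},
    {"age": "senior", "astigmatic": "yes", "tears": "normal"},
]
labels = ["soft", "none", "soft", "hard"]
for a in ("age", "astigmatic", "tears"):
    print(a, round(information_gain(rows, labels, a), 3))
```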
Ensemble of classifiers for confidence-rated classification of NDE signal
NASA Astrophysics Data System (ADS)
Banerjee, Portia; Safdarnejad, Seyed; Udpa, Lalita; Udpa, Satish
2016-02-01
Ensembles of classifiers, in general, aim to improve classification accuracy by combining results from multiple weak hypotheses into a single strong classifier through weighted majority voting. Improved versions of ensembles of classifiers generate self-rated confidence scores which estimate the reliability of each of their predictions and boost the classifier using these confidence-rated predictions. However, such a confidence metric is based only on the rate of correct classification. Although ensembles of classifiers have been widely used in computational intelligence, existing work largely overlooks the effect of all factors of unreliability on the confidence of classification. With relevance to NDE, classification results are affected by the inherent ambiguity of classification, non-discriminative features, inadequate training samples and measurement noise. In this paper, we extend existing ensemble classification by maximizing the confidence of every classification decision in addition to minimizing the classification error. Initial results of the approach on data from eddy current inspection show improvement in classification performance of defect and non-defect indications.
Porter, Teresita M.; Golding, G. Brian
2012-01-01
Nuclear large subunit ribosomal DNA is widely used in fungal phylogenetics and, to an increasing extent, in amplicon-based environmental sequencing. The relatively short reads produced by next-generation sequencing, however, make primer choice and sequence error important variables for obtaining accurate taxonomic classifications. In this simulation study we tested the performance of three classification methods: 1) a similarity-based method (BLAST + Metagenomic Analyzer, MEGAN); 2) a composition-based method (Ribosomal Database Project naïve Bayesian classifier, NBC); and 3) a phylogeny-based method (Statistical Assignment Package, SAP). We also tested the effects of sequence length, primer choice, and sequence error on classification accuracy and perceived community composition. Using a leave-one-out cross validation approach, results for classifications to the genus rank were as follows: BLAST + MEGAN had the lowest error rate and was particularly robust to sequence error; SAP accuracy was highest when long LSU query sequences were classified; and NBC runs significantly faster than the other tested methods. All methods performed poorly with the shortest 50–100 bp sequences. Increasing simulated sequence error reduced classification accuracy. Community shifts were detected due to sequence error and primer selection even though there was no change in the underlying community composition. Short read datasets from individual primers, as well as pooled datasets, appear to only approximate the true community composition. We hope this work informs investigators of some of the factors that affect the quality and interpretation of their environmental gene surveys. PMID:22558215
Optimization of the ANFIS using a genetic algorithm for physical work rate classification.
Habibi, Ehsanollah; Salehi, Mina; Yadegarfar, Ghasem; Taheri, Ali
2018-03-13
Recently, a new method was proposed for physical work rate classification based on an adaptive neuro-fuzzy inference system (ANFIS). This study aims to present a genetic algorithm (GA)-optimized ANFIS model for a highly accurate classification of physical work rate. Thirty healthy men participated in this study. Directly measured heart rate and oxygen consumption of the participants in the laboratory were used for training the ANFIS classifier model in MATLAB version 8.0.0 using a hybrid algorithm. A similar process was done using the GA as an optimization technique. The accuracy, sensitivity and specificity of the ANFIS classifier model were increased successfully. The mean accuracy of the model was increased from 92.95 to 97.92%. Also, the calculated root mean square error of the model was reduced from 5.4186 to 3.1882. The maximum estimation error of the optimized ANFIS during the network testing process was ± 5%. The GA can be effectively used for ANFIS optimization and leads to an accurate classification of physical work rate. In addition to high accuracy, simple implementation and inter-individual variability consideration are two other advantages of the presented model.
Documentation of procedures for textural/spatial pattern recognition techniques
NASA Technical Reports Server (NTRS)
Haralick, R. M.; Bryant, W. F.
1976-01-01
A C-130 aircraft was flown over the Sam Houston National Forest on March 21, 1973 at 10,000 feet altitude to collect multispectral scanner (MSS) data. Existing textural and spatial automatic processing techniques were used to classify the MSS imagery into specified timber categories. Several classification experiments were performed on this data using features selected from the spectral bands and a textural transform band. The results indicate that (1) spatial post-processing of a classified image can cut the classification error to 1/2 or 1/3 of its initial value, (2) spatial post-processing of the classified image using combined spectral and textural features produces a resulting image with less error than post-processing of a classified image using only spectral features, and (3) classification without spatial post-processing using the combined spectral and textural features tends to produce about the same error rate as classification without spatial post-processing using only spectral features.
ERIC Educational Resources Information Center
Greve, Kevin W.; Springer, Steven; Bianchini, Kevin J.; Black, F. William; Heinly, Matthew T.; Love, Jeffrey M.; Swift, Douglas A.; Ciota, Megan A.
2007-01-01
This study examined the sensitivity and false-positive error rate of reliable digit span (RDS) and the WAIS-III Digit Span (DS) scaled score in persons alleging toxic exposure and determined whether error rates differed from published rates in traumatic brain injury (TBI) and chronic pain (CP). Data were obtained from the files of 123 persons…
Wright, C.; Gallant, Alisa L.
2007-01-01
The U.S. Fish and Wildlife Service uses the term palustrine wetland to describe vegetated wetlands traditionally identified as marsh, bog, fen, swamp, or wet meadow. Landsat TM imagery was combined with image texture and ancillary environmental data to model probabilities of palustrine wetland occurrence in Yellowstone National Park using classification trees. Model training and test locations were identified from National Wetlands Inventory maps, and classification trees were built for seven years spanning a range of annual precipitation. At a coarse level, palustrine wetland was separated from upland. At a finer level, five palustrine wetland types were discriminated: aquatic bed (PAB), emergent (PEM), forested (PFO), scrub–shrub (PSS), and unconsolidated shore (PUS). TM-derived variables alone were relatively accurate at separating wetland from upland, but model error rates dropped incrementally as image texture, DEM-derived terrain variables, and other ancillary GIS layers were added. For classification trees making use of all available predictors, average overall test error rates were 7.8% for palustrine wetland/upland models and 17.0% for palustrine wetland type models, with consistent accuracies across years. However, models were prone to wetland over-prediction. While the predominant PEM class was classified with omission and commission error rates less than 14%, we had difficulty identifying the PAB and PSS classes. Ancillary vegetation information greatly improved PSS classification and moderately improved PFO discrimination. Association with geothermal areas distinguished PUS wetlands. Wetland over-prediction was exacerbated by class imbalance in likely combination with spatial and spectral limitations of the TM sensor. Wetland probability surfaces may be more informative than hard classification, and appear to respond to climate-driven wetland variability. The developed method is portable, relatively easy to implement, and should be applicable in other settings and over larger extents.
Linear and Order Statistics Combiners for Pattern Classification
NASA Technical Reports Server (NTRS)
Tumer, Kagan; Ghosh, Joydeep; Lau, Sonie (Technical Monitor)
2001-01-01
Several researchers have experimentally shown that substantial improvements can be obtained in difficult pattern recognition problems by combining or integrating the outputs of multiple classifiers. This chapter provides an analytical framework to quantify the improvements in classification results due to combining. The results apply to both linear combiners and order statistics combiners. We first show that, to a first order approximation, the error rate obtained over and above the Bayes error rate is directly proportional to the variance of the actual decision boundaries around the Bayes optimum boundary. Combining classifiers in output space reduces this variance, and hence reduces the 'added' error. If N unbiased classifiers are combined by simple averaging, the added error rate can be reduced by a factor of N if the individual errors in approximating the decision boundaries are uncorrelated. Expressions are then derived for linear combiners which are biased or correlated, and the effect of output correlations on ensemble performance is quantified. For order statistics based non-linear combiners, we derive expressions that indicate how much the median, the maximum and, in general, the i-th order statistic can improve classifier performance. The analysis presented here facilitates the understanding of the relationships among error rates, classifier boundary distributions, and combining in output space. Experimental results on several public domain data sets are provided to illustrate the benefits of combining and to support the analytical results.
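A small numerical check of the first-order result above (added error proportional to boundary variance, hence reduced roughly by a factor of N when N unbiased, uncorrelated classifiers are averaged) is sketched below, using an assumed 1-D two-Gaussian problem rather than any data set from the chapter.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Assumed 1-D problem: class 0 ~ N(-1, 1), class 1 ~ N(+1, 1); Bayes boundary at x = 0.
def error_rate(boundary):
    return 0.5 * (norm.sf(boundary, loc=-1.0) + norm.cdf(boundary, loc=+1.0))

bayes = error_rate(0.0)
sigma_b = 0.5                  # assumed spread of each classifier's estimated boundary

for N in (1, 5, 25):
    # Average N unbiased, uncorrelated boundary estimates, then measure the added error.
    boundaries = rng.normal(0.0, sigma_b, size=(5000, N)).mean(axis=1)
    added = np.mean([error_rate(b) for b in boundaries]) - bayes
    print(f"N = {N:2d}: added error = {added:.4f}")
```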
Bayes Error Rate Estimation Using Classifier Ensembles
NASA Technical Reports Server (NTRS)
Tumer, Kagan; Ghosh, Joydeep
2003-01-01
The Bayes error rate gives a statistical lower bound on the error achievable for a given classification problem and the associated choice of features. By reliably estimating this rate, one can assess the usefulness of the feature set that is being used for classification. Moreover, by comparing the accuracy achieved by a given classifier with the Bayes rate, one can quantify how effective that classifier is. Classical approaches for estimating or finding bounds for the Bayes error, in general, yield rather weak results for small sample sizes, unless the problem has some simple characteristics, such as Gaussian class-conditional likelihoods. This article shows how the outputs of a classifier ensemble can be used to provide reliable and easily obtainable estimates of the Bayes error with negligible extra computation. Three methods of varying sophistication are described. First, we present a framework that estimates the Bayes error when multiple classifiers, each providing an estimate of the a posteriori class probabilities, are combined through averaging. Second, we bolster this approach by adding an information theoretic measure of output correlation to the estimate. Finally, we discuss a more general method that just looks at the class labels indicated by ensemble members and provides error estimates based on the disagreements among classifiers. The methods are illustrated for artificial data, a difficult four-class problem involving underwater acoustic data, and two benchmark problems. For data sets with known Bayes error, the combiner-based methods introduced in this article outperform existing methods. The estimates obtained by the proposed methods also seem quite reliable for the real-life data sets for which the true Bayes rates are unknown.
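A sketch of the first, averaging-based estimator described above, using off-the-shelf scikit-learn classifiers on synthetic data: the plug-in quantity E[1 - max_k p(k|x)], computed from the averaged posterior estimates, serves as the Bayes error estimate. The classifiers and data are illustrative assumptions, not those used in the article.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=3000, n_features=10, n_informative=5,
                           n_classes=3, flip_y=0.1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

members = [RandomForestClassifier(random_state=0),
           LogisticRegression(max_iter=1000),
           KNeighborsClassifier(15)]
# Average the members' a posteriori probability estimates on held-out data.
probs = np.mean([m.fit(X_tr, y_tr).predict_proba(X_te) for m in members], axis=0)

# Plug-in estimate: E[1 - max_k p(k|x)] computed from the averaged posteriors.
bayes_estimate = np.mean(1.0 - probs.max(axis=1))
ensemble_error = np.mean(probs.argmax(axis=1) != y_te)
print(f"estimated Bayes error: {bayes_estimate:.3f}, ensemble test error: {ensemble_error:.3f}")
```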
Robust Transmission of H.264/AVC Streams Using Adaptive Group Slicing and Unequal Error Protection
NASA Astrophysics Data System (ADS)
Thomos, Nikolaos; Argyropoulos, Savvas; Boulgouris, Nikolaos V.; Strintzis, Michael G.
2006-12-01
We present a novel scheme for the transmission of H.264/AVC video streams over lossy packet networks. The proposed scheme exploits the error-resilient features of H.264/AVC codec and employs Reed-Solomon codes to protect effectively the streams. A novel technique for adaptive classification of macroblocks into three slice groups is also proposed. The optimal classification of macroblocks and the optimal channel rate allocation are achieved by iterating two interdependent steps. Dynamic programming techniques are used for the channel rate allocation process in order to reduce complexity. Simulations clearly demonstrate the superiority of the proposed method over other recent algorithms for transmission of H.264/AVC streams.
On the statistical assessment of classifiers using DNA microarray data
Ancona, N; Maglietta, R; Piepoli, A; D'Addabbo, A; Cotugno, R; Savino, M; Liuni, S; Carella, M; Pesole, G; Perri, F
2006-01-01
Background In this paper we present a method for the statistical assessment of cancer predictors which make use of gene expression profiles. The methodology is applied to a new data set of microarray gene expression data collected in Casa Sollievo della Sofferenza Hospital, Foggia – Italy. The data set is made up of normal (22) and tumor (25) specimens extracted from 25 patients affected by colon cancer. We propose to give answers to some questions which are relevant for the automatic diagnosis of cancer such as: Is the size of the available data set sufficient to build accurate classifiers? What is the statistical significance of the associated error rates? In what ways can accuracy be considered dependent on the adopted classification scheme? How many genes are correlated with the pathology and how many are sufficient for an accurate colon cancer classification? The method we propose answers these questions whilst avoiding the potential pitfalls hidden in the analysis and interpretation of microarray data. Results We estimate the generalization error, evaluated through the Leave-K-Out Cross Validation error, for three different classification schemes by varying the number of training examples and the number of the genes used. The statistical significance of the error rate is measured by using a permutation test. We provide a statistical analysis in terms of the frequencies of the genes involved in the classification. Using the whole set of genes, we found that the Weighted Voting Algorithm (WVA) classifier learns the distinction between normal and tumor specimens with 25 training examples, providing e = 21% (p = 0.045) as an error rate. This remains constant even when the number of examples increases. Moreover, Regularized Least Squares (RLS) and Support Vector Machines (SVM) classifiers can learn with only 15 training examples, with an error rate of e = 19% (p = 0.035) and e = 18% (p = 0.037) respectively. Moreover, the error rate decreases as the training set size increases, reaching its best performances with 35 training examples. In this case, RLS and SVM have error rates of e = 14% (p = 0.027) and e = 11% (p = 0.019). Concerning the number of genes, we found about 6000 genes (p < 0.05) correlated with the pathology, resulting from the signal-to-noise statistic. Moreover, the performances of RLS and SVM classifiers do not change when 74% of the genes are used; they progressively degrade, reaching e = 16% (p < 0.05) when only 2 genes are employed. The biological relevance of a set of genes determined by our statistical analysis and the major roles they play in colorectal tumorigenesis are discussed. Conclusions The method proposed provides statistically significant answers to precise questions relevant for the diagnosis and prognosis of cancer. We found that, with as few as 15 examples, it is possible to train statistically significant classifiers for colon cancer diagnosis. As for the definition of the number of genes sufficient for a reliable classification of colon cancer, our results suggest that it depends on the accuracy required. PMID:16919171
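The permutation test used above to attach a p-value to a cross-validated error rate can be sketched as follows; the feature matrix here is a random stand-in with the same 22/25 class sizes, and an SVM with 5-fold cross-validation replaces the paper's specific classifiers and Leave-K-Out scheme.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(47, 2000))                               # stand-in for 47 expression profiles
y = np.r_[np.zeros(22, dtype=int), np.ones(25, dtype=int)]    # 22 normal vs. 25 tumor labels

cv = StratifiedKFold(5, shuffle=True, random_state=0)
observed_err = 1 - cross_val_score(SVC(), X, y, cv=cv).mean()

# Permutation test: refit with shuffled labels to build a null distribution of
# the error rate; the p-value is the fraction of permutations doing at least as well.
null_errs = np.array([1 - cross_val_score(SVC(), X, rng.permutation(y), cv=cv).mean()
                      for _ in range(100)])
p_value = np.mean(null_errs <= observed_err)
print(f"error rate = {observed_err:.2f}, permutation p = {p_value:.3f}")
```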
Rueckauer, Bodo; Lungu, Iulia-Alexandra; Hu, Yuhuang; Pfeiffer, Michael; Liu, Shih-Chii
2017-01-01
Spiking neural networks (SNNs) can potentially offer an efficient way of doing inference because the neurons in the networks are sparsely activated and computations are event-driven. Previous work showed that simple continuous-valued deep Convolutional Neural Networks (CNNs) can be converted into accurate spiking equivalents. These networks did not include certain common operations such as max-pooling, softmax, batch-normalization and Inception-modules. This paper presents spiking equivalents of these operations, thereby allowing conversion of nearly arbitrary CNN architectures. We show conversion of popular CNN architectures, including VGG-16 and Inception-v3, into SNNs that produce the best results reported to date on MNIST, CIFAR-10 and the challenging ImageNet dataset. SNNs can trade off classification error rate against the number of available operations, whereas deep continuous-valued neural networks require a fixed number of operations to achieve their classification error rate. From the examples of LeNet for MNIST and BinaryNet for CIFAR-10, we show that with an increase in error rate of a few percentage points, the SNNs can achieve more than 2x reductions in operations compared to the original CNNs. This highlights the potential of SNNs, in particular when deployed on power-efficient neuromorphic spiking neuron chips for use in embedded applications. PMID:29375284
Farwell, Lawrence A.; Richardson, Drew C.; Richardson, Graham M.; Furedy, John J.
2014-01-01
A classification concealed information test (CIT) used the “brain fingerprinting” method of applying P300 event-related potential (ERP) in detecting information that is (1) acquired in real life and (2) unique to US Navy experts in military medicine. Military medicine experts and non-experts were asked to push buttons in response to three types of text stimuli. Targets contain known information relevant to military medicine, are identified to subjects as relevant, and require pushing one button. Subjects are told to push another button to all other stimuli. Probes contain concealed information relevant to military medicine, and are not identified to subjects. Irrelevants contain equally plausible, but incorrect/irrelevant information. Error rate was 0%. Median and mean statistical confidences for individual determinations were 99.9% with no indeterminates (results lacking sufficiently high statistical confidence to be classified). We compared error rate and statistical confidence for determinations of both information present and information absent produced by classification CIT (Is a probe ERP more similar to a target or to an irrelevant ERP?) vs. comparison CIT (Does a probe produce a larger ERP than an irrelevant?) using P300 plus the late negative component (LNP; together, P300-MERMER). Comparison CIT produced a significantly higher error rate (20%) and lower statistical confidences: mean 67%; information-absent mean was 28.9%, less than chance (50%). We compared analysis using P300 alone with the P300 + LNP. P300 alone produced the same 0% error rate but significantly lower statistical confidences. These findings add to the evidence that the brain fingerprinting methods as described here provide sufficient conditions to produce less than 1% error rate and greater than 95% median statistical confidence in a CIT on information obtained in the course of real life that is characteristic of individuals with specific training, expertise, or organizational affiliation. PMID:25565941
Evaluation of normalization methods for cDNA microarray data by k-NN classification
Wu, Wei; Xing, Eric P; Myers, Connie; Mian, I Saira; Bissell, Mina J
2005-01-01
Background Non-biological factors give rise to unwanted variations in cDNA microarray data. There are many normalization methods designed to remove such variations. However, to date there have been few published systematic evaluations of these techniques for removing variations arising from dye biases in the context of downstream, higher-order analytical tasks such as classification. Results Ten location normalization methods that adjust spatial- and/or intensity-dependent dye biases, and three scale methods that adjust scale differences were applied, individually and in combination, to five distinct, published, cancer biology-related cDNA microarray data sets. Leave-one-out cross-validation (LOOCV) classification error was employed as the quantitative end-point for assessing the effectiveness of a normalization method. In particular, a known classifier, k-nearest neighbor (k-NN), was estimated from data normalized using a given technique, and the LOOCV error rate of the ensuing model was computed. We found that k-NN classifiers are sensitive to dye biases in the data. Using NONRM and GMEDIAN as baseline methods, our results show that single-bias-removal techniques which remove either spatial-dependent dye bias (referred later as spatial effect) or intensity-dependent dye bias (referred later as intensity effect) moderately reduce LOOCV classification errors; whereas double-bias-removal techniques which remove both spatial- and intensity effect reduce LOOCV classification errors even further. Of the 41 different strategies examined, three two-step processes, IGLOESS-SLFILTERW7, ISTSPLINE-SLLOESS and IGLOESS-SLLOESS, all of which removed intensity effect globally and spatial effect locally, appear to reduce LOOCV classification errors most consistently and effectively across all data sets. We also found that the investigated scale normalization methods do not reduce LOOCV classification error. Conclusion Using LOOCV error of k-NNs as the evaluation criterion, three double-bias-removal normalization strategies, IGLOESS-SLFILTERW7, ISTSPLINE-SLLOESS and IGLOESS-SLLOESS, outperform other strategies for removing spatial effect, intensity effect and scale differences from cDNA microarray data. The apparent sensitivity of k-NN LOOCV classification error to dye biases suggests that this criterion provides an informative measure for evaluating normalization methods. All the computational tools used in this study were implemented using the R language for statistical computing and graphics. PMID:16045803
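A sketch of the evaluation end-point described above, i.e. the LOOCV error of a k-NN classifier computed before and after a normalization step; the expression matrix, the array-specific bias, and the median-centering step are hypothetical stand-ins for the ten location and three scale methods actually tested (which were implemented in R).

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def loocv_knn_error(expression, labels, k=3):
    """LOOCV error of a k-NN classifier: the criterion used to judge how well
    a normalization method removed dye biases."""
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=k), expression, labels, cv=LeaveOneOut())
    return 1.0 - acc.mean()

rng = np.random.default_rng(0)
signal = rng.normal(size=(40, 300))
bias = rng.normal(size=(40, 1)) * np.linspace(0.0, 2.0, 300)   # hypothetical array-specific, intensity-dependent bias
raw = signal + bias
labels = np.repeat([0, 1], 20)

normalized = raw - np.median(raw, axis=1, keepdims=True)        # stand-in for one normalization step
print("raw LOOCV error:       ", loocv_knn_error(raw, labels))
print("normalized LOOCV error:", loocv_knn_error(normalized, labels))
```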
Improved classification accuracy by feature extraction using genetic algorithms
NASA Astrophysics Data System (ADS)
Patriarche, Julia; Manduca, Armando; Erickson, Bradley J.
2003-05-01
A feature extraction algorithm has been developed for the purposes of improving classification accuracy. The algorithm uses a genetic algorithm / hill-climber hybrid to generate a set of linearly recombined features, which may be of reduced dimensionality compared with the original set. The genetic algorithm performs the global exploration, and a hill climber explores local neighborhoods. Hybridizing the genetic algorithm with a hill climber improves both the rate of convergence and the final overall cost function value; it also reduces the sensitivity of the genetic algorithm to parameter selection. The genetic algorithm includes the operators crossover, mutation, and deletion/reactivation; the last of these effects dimensionality reduction. The feature extractor is supervised, and is capable of deriving a separate feature space for each tissue (which are reintegrated during classification). A non-anatomical digital phantom was developed as a gold standard for testing purposes. In tests with the phantom, and with images of multiple sclerosis patients, classification with features derived by the feature extractor yielded lower error rates than classification using standard pulse sequences or features derived using principal components analysis. Using the multiple sclerosis patient data, the algorithm resulted in a mean 31% reduction in classification error for pure tissues.
NASA Astrophysics Data System (ADS)
Wu, Jie; Besnehard, Quentin; Marchessoux, Cédric
2011-03-01
Clinical studies for the validation of new medical imaging devices require hundreds of images. An important step in creating and tuning the study protocol is the classification of images into "difficult" and "easy" cases. This consists of classifying the image based on features like the complexity of the background and the visibility of the disease (lesions). Therefore, an automatic medical background classification tool for mammograms would help for such clinical studies. This classification tool is based on a multi-content analysis (MCA) framework which was first developed to recognize the image content of computer screenshots. With the implementation of new texture features and a defined breast density scale, the MCA framework is able to automatically classify digital mammograms with satisfactory accuracy. The BI-RADS (Breast Imaging Reporting and Data System) density scale is used for grouping the mammograms; it standardizes the mammography reporting terminology and the assessment and recommendation categories. Selected features are input into a decision tree classification scheme in the MCA framework, the so-called "weak classifier" (any classifier with a global error rate below 50%). With the AdaBoost iteration algorithm, these "weak classifiers" are combined into a "strong classifier" (a classifier with a low global error rate) for classifying one category. The classification results for one "strong classifier" show good accuracy with high true-positive rates. For the four categories the results are: TP = 90.38%, TN = 67.88%, FP = 32.12% and FN = 9.62%.
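The AdaBoost step that combines weak decision-tree classifiers into one strong classifier per category can be sketched with scikit-learn as follows; the synthetic features stand in for the texture and BI-RADS density features, and the tree depth and number of rounds are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the texture/density features of one BI-RADS category.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# AdaBoost reweights the training data at each iteration and combines shallow
# decision trees (weak classifiers, error below 50%) into one strong classifier.
strong = AdaBoostClassifier(DecisionTreeClassifier(max_depth=2),
                            n_estimators=100, random_state=0).fit(X_tr, y_tr)
print("test error rate:", 1 - strong.score(X_te, y_te))
```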
Spencer, Bruce D
2012-06-01
Latent class models are increasingly used to assess the accuracy of medical diagnostic tests and other classifications when no gold standard is available and the true state is unknown. When the latent class is treated as the true class, the latent class models provide measures of components of accuracy including specificity and sensitivity and their complements, type I and type II error rates. The error rates according to the latent class model differ from the true error rates, however, and empirical comparisons with a gold standard suggest the true error rates often are larger. We investigate conditions under which the true type I and type II error rates are larger than those provided by the latent class models. Results from Uebersax (1988, Psychological Bulletin 104, 405-416) are extended to accommodate random effects and covariates affecting the responses. The results are important for interpreting the results of latent class analyses. An error decomposition is presented that incorporates an error component from invalidity of the latent class model. © 2011, The International Biometric Society.
Simulated rRNA/DNA Ratios Show Potential To Misclassify Active Populations as Dormant
DOE Office of Scientific and Technical Information (OSTI.GOV)
Steven, Blaire; Hesse, Cedar; Soghigian, John
The use of rRNA/DNA ratios derived from surveys of rRNA sequences in RNA and DNA extracts is an appealing but poorly validated approach to infer the activity status of environmental microbes. To improve the interpretation of rRNA/DNA ratios, we performed simulations to investigate the effects of community structure, rRNA amplification, and sampling depth on the accuracy of rRNA/DNA ratios in classifying bacterial populations as "active" or "dormant." Community structure was an insignificant factor. In contrast, the extent of rRNA amplification that occurs as cells transition from dormant to growing had a significant effect (P < 0.0001) on classification accuracy, with misclassification errors ranging from 16 to 28%, depending on the rRNA amplification model. The error rate increased to 47% when communities included a mixture of rRNA amplification models, but most of the inflated error was false negatives (i.e., active populations misclassified as dormant). Sampling depth also affected error rates (P < 0.001). Inadequate sampling depth produced various artifacts that are characteristic of rRNA/DNA ratios generated from real communities. These data show important constraints on the use of rRNA/DNA ratios to infer activity status. Whereas classification of populations as active based on rRNA/DNA ratios appears generally valid, classification of populations as dormant is potentially far less accurate.
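A toy version of such a simulation can be sketched as follows; the lognormal abundances, the per-population amplification factors, and the ratio-of-one decision rule are illustrative assumptions, not the authors' models.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pop = 1000
active = rng.random(n_pop) < 0.5                               # true activity status of each population
dna = rng.lognormal(mean=2.0, sigma=1.0, size=n_pop)           # relative DNA (gene) abundance

# Assumed rRNA amplification model: active populations carry an r-fold excess of
# rRNA over their DNA signal, dormant ones a deficit, plus multiplicative noise.
r = np.where(active, rng.uniform(2.0, 10.0, n_pop), rng.uniform(0.2, 1.0, n_pop))
rna = dna * r * rng.lognormal(0.0, 0.5, n_pop)

ratio = rna / dna
called_active = ratio >= 1.0                                   # simple rRNA/DNA >= 1 rule
false_negative = np.mean(active & ~called_active)              # active populations called dormant
false_positive = np.mean(~active & called_active)
print(f"false negatives: {false_negative:.2%}, false positives: {false_positive:.2%}")
```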
NASA Astrophysics Data System (ADS)
Bechet, P.; Mitran, R.; Munteanu, M.
2013-08-01
Non-contact methods for the assessment of vital signs are of great interest for specialists due to the benefits obtained in both medical and special applications, such as those for surveillance, monitoring, and search and rescue. This paper investigates the possibility of implementing a digital processing algorithm based on MUSIC (Multiple Signal Classification) parametric spectral estimation in order to reduce the observation time needed to accurately measure the heart rate. It demonstrates that, by properly dimensioning the signal subspace, the MUSIC algorithm can be optimized to accurately assess the heart rate during an 8-28 s time interval. The performance of the processing algorithm was validated by minimizing the mean heart rate error in simultaneous comparative measurements on several subjects. To calculate the error, the reference heart rate was measured with a classic, direct-contact measurement system.
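A minimal MUSIC pseudospectrum estimator restricted to a plausible heart-rate band might look as follows; the sampling rate, correlation-matrix order, subspace dimension, and synthetic test signal are assumptions for illustration, not the parameters used in the paper.

```python
import numpy as np

def music_peak_frequency(x, fs, p=2, m=40, fmin=0.8, fmax=3.0):
    """Estimate the dominant frequency of x (Hz) from the MUSIC pseudospectrum.
    p is the assumed signal-subspace dimension, m the correlation-matrix order."""
    x = x - x.mean()
    # Correlation matrix from overlapping length-m snapshots (covariance method).
    snapshots = np.array([x[i:i + m] for i in range(len(x) - m)])
    R = snapshots.T @ snapshots / snapshots.shape[0]
    eigval, eigvec = np.linalg.eigh(R)
    noise = eigvec[:, :-p]                          # eigenvectors spanning the noise subspace
    freqs = np.linspace(fmin, fmax, 1000)           # search only the plausible heart-rate band
    k = np.arange(m)
    pseudo = []
    for f in freqs:
        a = np.exp(-2j * np.pi * f / fs * k)        # steering vector at frequency f
        pseudo.append(1.0 / np.linalg.norm(noise.conj().T @ a) ** 2)
    return freqs[int(np.argmax(pseudo))]

# Synthetic test: a 1.2 Hz (72 bpm) "cardiac" component in noise, 10 s at 50 Hz.
fs = 50.0
t = np.arange(0, 10, 1 / fs)
sig = np.sin(2 * np.pi * 1.2 * t) + 0.8 * np.random.default_rng(0).normal(size=t.size)
f_hat = music_peak_frequency(sig, fs)
print(f"estimated heart rate: {60 * f_hat:.1f} bpm")
```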
Development of a methodology for classifying software errors
NASA Technical Reports Server (NTRS)
Gerhart, S. L.
1976-01-01
A mathematical formalization of the intuition behind classification of software errors is devised and then extended to a classification discipline: Every classification scheme should have an easily discernible mathematical structure and certain properties of the scheme should be decidable (although whether or not these properties hold is relative to the intended use of the scheme). Classification of errors then becomes an iterative process of generalization from actual errors to terms defining the errors together with adjustment of definitions according to the classification discipline. Alternatively, whenever possible, small scale models may be built to give more substance to the definitions. The classification discipline and the difficulties of definition are illustrated by examples of classification schemes from the literature and a new study of observed errors in published papers of programming methodologies.
Classification and reduction of pilot error
NASA Technical Reports Server (NTRS)
Rogers, W. H.; Logan, A. L.; Boley, G. D.
1989-01-01
Human error is a primary or contributing factor in about two-thirds of commercial aviation accidents worldwide. With the ultimate goal of reducing pilot error accidents, this contract effort is aimed at understanding the factors underlying error events and reducing the probability of certain types of errors by modifying underlying factors such as flight deck design and procedures. A review of the literature relevant to error classification was conducted. Classification includes categorizing types of errors, the information processing mechanisms and factors underlying them, and identifying factor-mechanism-error relationships. The classification scheme developed by Jens Rasmussen was adopted because it provided a comprehensive yet basic error classification shell or structure that could easily accommodate addition of details on domain-specific factors. For these purposes, factors specific to the aviation environment were incorporated. Hypotheses concerning the relationship of a small number of underlying factors, information processing mechanisms, and error types identified in the classification scheme were formulated. ASRS data were reviewed and a simulation experiment was performed to evaluate and quantify the hypotheses.
A real-time heat strain risk classifier using heart rate and skin temperature.
Buller, Mark J; Latzka, William A; Yokota, Miyo; Tharion, William J; Moran, Daniel S
2008-12-01
Heat injury is a real concern to workers engaged in physically demanding tasks in high heat strain environments. Several real-time physiological monitoring systems exist that can provide indices of heat strain, e.g. the physiological strain index (PSI), and provide alerts to medical personnel. However, these systems depend on core temperature measurement using expensive, ingestible thermometer pills. Seeking a better solution, we suggest the use of a model which can identify the probability that individuals are 'at risk' from heat injury using non-invasive measures. The intent is for the system to identify individuals who need to be monitored more closely or who should apply heat strain mitigation strategies. We generated a model that can identify 'at risk' (PSI ≥ 7.5) workers from measures of heart rate and chest skin temperature. The model was built using data from six previously published exercise studies in which some subjects wore chemical protective equipment. The model has an overall classification error rate of 10% with one false negative error (2.7%), and outperforms an earlier model and a least squares regression model with classification errors of 21% and 14%, respectively. Additionally, the model allows the classification criteria to be adjusted based on the task and acceptable level of risk. We conclude that the model could be a valuable part of a multi-faceted heat strain management system.
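The abstract does not specify the model form, so the sketch below uses a logistic-regression stand-in on hypothetical heart-rate and skin-temperature data, simply to show how a probability-of-risk output supports an adjustable decision threshold; none of the numbers are taken from the study.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: heart rate (bpm), chest skin temperature (deg C),
# and whether the concurrently measured PSI reached 7.5 ("at risk").
rng = np.random.default_rng(0)
hr = rng.uniform(60, 190, 500)
tsk = rng.uniform(33, 39, 500)
at_risk = ((hr - 60) / 130 + (tsk - 33) / 6 + rng.normal(0, 0.2, 500) > 1.2).astype(int)

model = LogisticRegression().fit(np.c_[hr, tsk], at_risk)

# The decision threshold on the predicted probability can be shifted to trade
# false negatives for false positives, matching the acceptable level of risk.
p = model.predict_proba(np.c_[[165.0], [37.5]])[0, 1]
print("risk probability:", round(p, 2), "flag:", p > 0.5)
```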
Validation of tool mark analysis of cut costal cartilage.
Love, Jennifer C; Derrick, Sharon M; Wiersema, Jason M; Peters, Charles
2012-03-01
This study was designed to establish the potential error rate associated with the generally accepted method of tool mark analysis of cut marks in costal cartilage. Three knives with different blade types were used to make experimental cut marks in costal cartilage of pigs. Each cut surface was cast, and each cast was examined by three analysts working independently. The presence of striations, regularity of striations, and presence of a primary and secondary striation pattern were recorded for each cast. The distance between each striation was measured. The results showed that striations were not consistently impressed on the cut surface by the blade's cutting edge. Also, blade type classification by the presence or absence of striations led to a 65% misclassification rate. Use of the classification tree and cross-validation methods and inclusion of the mean interstriation distance decreased the error rate to c. 50%. © 2011 American Academy of Forensic Sciences.
Umut, İlhan; Çentik, Güven
2016-01-01
The number of channels used for polysomnographic recording frequently causes difficulties for patients because of the many cables connected. It also increases the risk of problems during the recording process and increases the storage volume. In this study, the aim is to detect periodic leg movements (PLM) in sleep using channels other than leg electromyography (EMG) by analysing polysomnography (PSG) data with digital signal processing (DSP) and machine learning methods. PSG records of 153 patients of different ages and genders with a PLM disorder diagnosis were examined retrospectively. Novel software was developed for the analysis of the PSG records. The software utilizes machine learning algorithms, statistical methods, and DSP methods. In order to classify PLM, popular machine learning methods (multilayer perceptron, K-nearest neighbour, and random forests) and logistic regression were used. Comparison of the classification results showed that while the K-nearest neighbour algorithm had the highest average classification rate (91.87%) and the lowest average classification error value (RMSE = 0.2850), the multilayer perceptron algorithm had the lowest average classification rate (83.29%) and the highest average classification error value (RMSE = 0.3705). The results showed that PLM can be classified with high accuracy (91.87%) without a leg EMG record being present. PMID:27213008
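The classifier comparison reported above (classification rate plus RMSE for each method) can be reproduced in outline with scikit-learn; the synthetic features stand in for measures derived from the non-EMG polysomnography channels.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

# Stand-in for features derived from the non-EMG polysomnography channels.
X, y = make_classification(n_samples=800, n_features=12, n_informative=6, random_state=0)

models = {"k-NN": KNeighborsClassifier(5),
          "MLP": MLPClassifier(hidden_layer_sizes=(20,), max_iter=500, random_state=0),
          "Random forest": RandomForestClassifier(random_state=0),
          "Logistic regression": LogisticRegression(max_iter=1000)}

for name, model in models.items():
    pred = cross_val_predict(model, X, y, cv=5)
    rate = (pred == y).mean()                       # classification rate
    rmse = np.sqrt(((pred - y) ** 2).mean())        # RMSE between predicted and true labels
    print(f"{name}: rate = {rate:.2%}, RMSE = {rmse:.4f}")
```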
Effects of stress typicality during speeded grammatical classification.
Arciuli, Joanne; Cupples, Linda
2003-01-01
The experiments reported here were designed to investigate the influence of stress typicality during speeded grammatical classification of disyllabic English words by native and non-native speakers. Trochaic nouns and iambic verbs were considered to be typically stressed, whereas iambic nouns and trochaic verbs were considered to be atypically stressed. Experiments 1a and 2a showed that while native speakers classified typically stressed words more quickly and more accurately than atypically stressed words during reading, there were no overall effects during classification of spoken stimuli. However, a subgroup of native speakers with high error rates did show a significant effect during classification of spoken stimuli. Experiments 1b and 2b showed that non-native speakers classified typically stressed words more quickly and more accurately than atypically stressed words during reading. Typically stressed words were classified more accurately than atypically stressed words when the stimuli were spoken. Importantly, there was a significant relationship between error rates, vocabulary size and the size of the stress typicality effect in each experiment. We conclude that participants use information about lexical stress to help them distinguish between disyllabic nouns and verbs during speeded grammatical classification. This is especially so for individuals with a limited vocabulary who lack other knowledge (e.g., semantic knowledge) about the differences between these grammatical categories.
Influence of nuclei segmentation on breast cancer malignancy classification
NASA Astrophysics Data System (ADS)
Jelen, Lukasz; Fevens, Thomas; Krzyzak, Adam
2009-02-01
Breast cancer is one of the deadliest cancers affecting middle-aged women. Accurate diagnosis and prognosis are crucial to reduce the high death rate. Nowadays there are numerous diagnostic tools for breast cancer diagnosis. In this paper we discuss the role of nuclear segmentation from fine needle aspiration biopsy (FNA) slides and its influence on malignancy classification. Classification of malignancy plays a very important role during the diagnosis process of breast cancer. Out of all cancer diagnostic tools, FNA slides provide the most valuable information about the cancer malignancy grade, which helps to choose an appropriate treatment. This process involves assessing numerous nuclear features, and therefore precise segmentation of nuclei is very important. In this work we compare three powerful segmentation approaches and test their impact on the classification of breast cancer malignancy. The studied approaches involve level set segmentation, fuzzy c-means segmentation and textural segmentation based on the co-occurrence matrix. Segmented nuclei were used to extract nuclear features for malignancy classification. For classification purposes four different classifiers were trained and tested with the previously extracted features. The compared classifiers are the Multilayer Perceptron (MLP), Self-Organizing Maps (SOM), Principal Component-based Neural Network (PCA) and Support Vector Machines (SVM). The presented results show that level set segmentation yields the best results of the three compared approaches and leads to good feature extraction, with the lowest average error rate of 6.51% over the four different classifiers. The best single-classifier performance was recorded for the multilayer perceptron, with an error rate of 3.07% using fuzzy c-means segmentation.
The nearest neighbor and the bayes error rates.
Loizou, G; Maybank, S J
1987-02-01
The (k, l) nearest neighbor method of pattern classification is compared to the Bayes method. If the two acceptance rates are equal then the asymptotic error rates satisfy the inequalities E_{k,l+1} ≤ E*(λ) ≤ E_{k,l} ≤ dE*(λ), where d is a function of k, l, and the number of pattern classes, and λ is the reject threshold for the Bayes method. An explicit expression for d is given which is optimal in the sense that for some probability distributions E_{k,l} and dE*(λ) are equal.
Production rates for crews using hand tools on firelines
Lisa Haven; T. Parkin Hunter; Theodore G. Storey
1982-01-01
Reported rates at which hand crews construct firelines can vary widely because of differences in fuels, fire and measurement conditions, and fuel resistance-to-control classification schemes. Real-time fire dispatching and fire simulation planning models, however, require accurate estimates of hand crew productivity. Errors in estimating rate of fireline production...
Goo, Yeung-Ja James; Chi, Der-Jang; Shen, Zong-De
2016-01-01
The purpose of this study is to establish rigorous and reliable going concern doubt (GCD) prediction models. This study first uses the least absolute shrinkage and selection operator (LASSO) to select variables and then applies data mining techniques to establish prediction models, such as neural network (NN), classification and regression tree (CART), and support vector machine (SVM). The samples of this study include 48 GCD listed companies and 124 NGCD (non-GCD) listed companies from 2002 to 2013 in the TEJ database. We conduct fivefold cross validation in order to identify the prediction accuracy. According to the empirical results, the prediction accuracy of the LASSO-NN model is 88.96 % (Type I error rate is 12.22 %; Type II error rate is 7.50 %), the prediction accuracy of the LASSO-CART model is 88.75 % (Type I error rate is 13.61 %; Type II error rate is 14.17 %), and the prediction accuracy of the LASSO-SVM model is 89.79 % (Type I error rate is 10.00 %; Type II error rate is 15.83 %).
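A hedged sketch of a LASSO-then-classify pipeline of the kind described above, on synthetic data; the scikit-learn components (L1-penalised logistic regression standing in for LASSO selection) and the Type I/Type II bookkeeping (Type I = a GCD firm predicted as non-GCD) are assumptions about one reasonable implementation, not the authors' code:

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(1)
y = np.r_[np.ones(48), np.zeros(124)].astype(int)   # 48 GCD vs 124 non-GCD firms
X = rng.normal(size=(172, 30))                      # stand-in financial ratios
X[y == 1, :4] += 1.0                                # a few ratios carry signal

# L1-penalised logistic regression plays the role of LASSO variable selection,
# feeding the retained variables into an SVM classifier.
lasso_select = SelectFromModel(
    LogisticRegression(penalty="l1", solver="liblinear", C=0.5))
pipe = make_pipeline(StandardScaler(), lasso_select, SVC(kernel="rbf"))

fp = fn = n_pos = n_neg = correct = 0
for train, test in StratifiedKFold(n_splits=5, shuffle=True, random_state=1).split(X, y):
    pred = pipe.fit(X[train], y[train]).predict(X[test])
    correct += np.sum(pred == y[test])
    fn += np.sum((y[test] == 1) & (pred == 0))   # Type I: GCD firm missed
    fp += np.sum((y[test] == 0) & (pred == 1))   # Type II: non-GCD firm flagged
    n_pos += np.sum(y[test] == 1)
    n_neg += np.sum(y[test] == 0)

print(f"accuracy={correct / len(y):.3f}, "
      f"type I rate={fn / n_pos:.3f}, type II rate={fp / n_neg:.3f}")
```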
Hettick, Justin M; Green, Brett J; Buskirk, Amanda D; Kashon, Michael L; Slaven, James E; Janotka, Erika; Blachere, Francoise M; Schmechel, Detlef; Beezhold, Donald H
2008-09-15
Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) was used to generate highly reproducible mass spectral fingerprints for 12 species of fungi of the genus Aspergillus and 5 different strains of Aspergillus flavus. Prior to MALDI-TOF MS analysis, the fungi were subjected to three 1-min bead beating cycles in an acetonitrile/trifluoroacetic acid solvent. The mass spectra contain abundant peaks in the range of 5 to 20 kDa and may be used to discriminate between species unambiguously. A discriminant analysis using all peaks from the MALDI-TOF MS data yielded error rates for classification of 0 and 18.75% for resubstitution and cross-validation methods, respectively. If a subset of 28 significant peaks is chosen, resubstitution and cross-validation error rates are 0%. Discriminant analysis of the MALDI-TOF MS data for 5 strains of A. flavus using all peaks yielded error rates for classification of 0 and 5% for resubstitution and cross-validation methods, respectively. These data indicate that MALDI-TOF MS data may be used for unambiguous identification of members of the genus Aspergillus at both the species and strain levels.
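A small sketch contrasting resubstitution and cross-validated error estimates for a discriminant analysis, in the spirit of the comparison above; the synthetic "peak intensity" features and the scikit-learn LDA are assumptions, not the authors' pipeline:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n_species, per_class, n_peaks = 12, 8, 28
X = np.vstack([rng.normal(loc=i, scale=2.0, size=(per_class, n_peaks))
               for i in range(n_species)])   # stand-in peak intensities
y = np.repeat(np.arange(n_species), per_class)

lda = LinearDiscriminantAnalysis()

# Resubstitution error: train and evaluate on the same spectra (optimistic).
resub_err = 1 - lda.fit(X, y).score(X, y)

# Cross-validation error: held-out spectra give a less biased estimate.
cv_err = 1 - cross_val_score(lda, X, y, cv=4).mean()

print(f"resubstitution error={resub_err:.3f}, cross-validation error={cv_err:.3f}")
```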
3D multi-view convolutional neural networks for lung nodule classification
Kang, Guixia; Hou, Beibei; Zhang, Ningbo
2017-01-01
The 3D convolutional neural network (CNN) is able to make full use of the spatial 3D context information of lung nodules, and the multi-view strategy has been shown to be useful for improving the performance of 2D CNN in classifying lung nodules. In this paper, we explore the classification of lung nodules using 3D multi-view convolutional neural networks (MV-CNN) with both chain architecture and directed acyclic graph architecture, including 3D Inception and 3D Inception-ResNet. All networks employ the multi-view-one-network strategy. We conduct a binary classification (benign and malignant) and a ternary classification (benign, primary malignant and metastatic malignant) on Computed Tomography (CT) images from the Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI) database. All results are obtained via 10-fold cross validation. As regards the MV-CNN with chain architecture, results show that the performance of 3D MV-CNN surpasses that of 2D MV-CNN by a significant margin. Finally, a 3D Inception network achieved an error rate of 4.59% for the binary classification and 7.70% for the ternary classification, both of which represent superior results for the corresponding task. We compare the multi-view-one-network strategy with the one-view-one-network strategy. The results reveal that the multi-view-one-network strategy can achieve a lower error rate than the one-view-one-network strategy. PMID:29145492
Statistical inference for template aging
NASA Astrophysics Data System (ADS)
Schuckers, Michael E.
2006-04-01
A change in classification error rates for a biometric device is often referred to as template aging. Here we offer two methods for determining whether the effect of time is statistically significant. The first is the use of a generalized linear model to determine whether these error rates change linearly over time. This approach generalizes previous work assessing the impact of covariates using generalized linear models. The second approach uses likelihood ratio test methodology. The focus here is on statistical methods for estimation, not the underlying cause of the change in error rates over time. These methodologies are applied to data from the National Institute of Standards and Technology Biometric Score Set Release 1. The results of these applications are discussed.
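A hedged sketch of testing for a linear time trend in error rates with a binomial GLM and a likelihood-ratio test, as described above; the simulated match attempts and the statsmodels toolchain are assumptions, not the paper's analysis code:

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(3)
months = np.repeat(np.arange(12), 200)      # time since enrolment, per attempt
p_err = 0.02 + 0.002 * months               # simulated slow aging effect
errors = rng.binomial(1, p_err)             # 1 = false non-match

X_full = sm.add_constant(months.astype(float))
full = sm.GLM(errors, X_full, family=sm.families.Binomial()).fit()
null = sm.GLM(errors, np.ones_like(months, dtype=float),
              family=sm.families.Binomial()).fit()

# Likelihood-ratio test: does adding the time covariate improve the fit?
lr_stat = 2 * (full.llf - null.llf)
p_value = chi2.sf(lr_stat, df=1)
print(f"slope={full.params[1]:.4f}, LR statistic={lr_stat:.2f}, p={p_value:.4f}")
```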
Improved EEG Event Classification Using Differential Energy.
Harati, A; Golmohammadi, M; Lopez, S; Obeid, I; Picone, J
2015-12-01
Feature extraction for automatic classification of EEG signals typically relies on time frequency representations of the signal. Techniques such as cepstral-based filter banks or wavelets are popular analysis techniques in many signal processing applications including EEG classification. In this paper, we present a comparison of a variety of approaches to estimating and postprocessing features. To further aid in discrimination of periodic signals from aperiodic signals, we add a differential energy term. We evaluate our approaches on the TUH EEG Corpus, which is the largest publicly available EEG corpus and an exceedingly challenging task due to the clinical nature of the data. We demonstrate that a variant of a standard filter bank-based approach, coupled with first and second derivatives, provides a substantial reduction in the overall error rate. The combination of differential energy and derivatives produces a 24 % absolute reduction in the error rate and improves our ability to discriminate between signal events and background noise. This relatively simple approach proves to be comparable to other popular feature extraction approaches such as wavelets, but is much more computationally efficient.
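A minimal numpy sketch of appending first and second derivatives (delta features) and a differential-energy term to frame-level cepstral features, assuming the cepstral frames have already been computed; the array shapes and the exact differential-energy definition used here are illustrative assumptions, not the paper's formulation:

```python
import numpy as np

def deltas(frames, width=2):
    """Regression-style delta over +/- width neighbouring frames."""
    padded = np.pad(frames, ((width, width), (0, 0)), mode="edge")
    num = sum(w * (padded[width + w: len(frames) + width + w]
                   - padded[width - w: len(frames) + width - w])
              for w in range(1, width + 1))
    return num / (2 * sum(w * w for w in range(1, width + 1)))

rng = np.random.default_rng(4)
cepstra = rng.normal(size=(500, 13))   # frames x cepstral coefficients
energy = cepstra[:, :1]                # treat c0 as a frame-energy proxy

d1 = deltas(cepstra)                   # first derivative
d2 = deltas(d1)                        # second derivative
diff_energy = np.abs(deltas(energy))   # differential energy term

features = np.hstack([cepstra, d1, d2, diff_energy])
print(features.shape)                  # (500, 40)
```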
NASA Astrophysics Data System (ADS)
Kurniawan, Dian; Suparti; Sugito
2018-05-01
Population growth in Indonesia has increased every year. According to the population census conducted by the Central Bureau of Statistics (BPS) in 2010, the population of Indonesia has reached 237.6 million people. Therefore, to control the population growth rate, the government runs the Family Planning, or Keluarga Berencana (KB), program for couples of childbearing age. The purpose of this program is to improve the health of mothers and children and to build a prosperous society by controlling births and thereby controlling population growth. The data used in this study are the updated family data of Semarang city for 2016, collected by the National Family Planning Coordinating Board (BKKBN). From these data, classifiers based on kernel discriminant analysis are built and their classification accuracy is estimated. The analysis showed that normal kernel discriminant analysis gives 71.05 % classification accuracy with 28.95 % classification error, whereas triweight kernel discriminant analysis gives 73.68 % classification accuracy with 26.32 % classification error. For classifying the family planning participation of childbearing-age couples in Semarang City in 2016, triweight kernel discriminant analysis can therefore be considered better than normal kernel discriminant analysis.
The generalization ability of online SVM classification based on Markov sampling.
Xu, Jie; Yan Tang, Yuan; Zou, Bin; Xu, Zongben; Li, Luoqing; Lu, Yang
2015-03-01
In this paper, we consider online support vector machine (SVM) classification learning algorithms with uniformly ergodic Markov chain (u.e.M.c.) samples. We establish the bound on the misclassification error of an online SVM classification algorithm with u.e.M.c. samples based on reproducing kernel Hilbert spaces and obtain a satisfactory convergence rate. We also introduce a novel online SVM classification algorithm based on Markov sampling, and present numerical studies of the learning ability of online SVM classification based on Markov sampling on benchmark repository datasets. The numerical studies show that the learning performance of the online SVM classification algorithm based on Markov sampling is better than that of classical online SVM classification based on random sampling as the size of the training set becomes larger.
Fully Convolutional Networks for Ground Classification from LIDAR Point Clouds
NASA Astrophysics Data System (ADS)
Rizaldy, A.; Persello, C.; Gevaert, C. M.; Oude Elberink, S. J.
2018-05-01
Deep Learning has been massively used for image classification in recent years. The use of deep learning for ground classification from LIDAR point clouds has also been recently studied. However, point clouds need to be converted into an image in order to use Convolutional Neural Networks (CNNs). In state-of-the-art techniques, this conversion is slow because each point is converted into a separate image. This approach leads to highly redundant computation during conversion and classification. The goal of this study is to design a more efficient data conversion and ground classification. This goal is achieved by first converting the whole point cloud into a single image. The classification is then performed by a Fully Convolutional Network (FCN), a modified version of CNN designed for pixel-wise image classification. The proposed method is significantly faster than state-of-the-art techniques. On the ISPRS Filter Test dataset, it is 78 times faster for conversion and 16 times faster for classification. Our experimental analysis on the same dataset shows that the proposed method results in 5.22 % of total error, 4.10 % of type I error, and 15.07 % of type II error. Compared to the previous CNN-based technique and LAStools software, the proposed method reduces the total error and type I error (while type II error is slightly higher). The method was also tested on a very high point density LIDAR point clouds resulting in 4.02 % of total error, 2.15 % of type I error and 6.14 % of type II error.
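A short sketch of how the total, type I, and type II errors quoted above can be computed from a ground/non-ground confusion matrix, using the usual filter-test convention (ground points rejected = type I, non-ground points accepted as ground = type II); the counts below are made-up placeholders:

```python
import numpy as np

# Confusion matrix rows = reference class, columns = predicted class,
# ordered (ground, non-ground). Counts are illustrative only.
cm = np.array([[90_000,  3_900],    # reference ground
               [ 6_100, 50_000]])   # reference non-ground

ground_as_nonground = cm[0, 1]      # type I: bare-earth points rejected
nonground_as_ground = cm[1, 0]      # type II: object points accepted as ground

type1 = ground_as_nonground / cm[0].sum()
type2 = nonground_as_ground / cm[1].sum()
total = (ground_as_nonground + nonground_as_ground) / cm.sum()
print(f"total error={total:.2%}, type I={type1:.2%}, type II={type2:.2%}")
```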
Consequences of land-cover misclassification in models of impervious surface
McMahon, G.
2007-01-01
Model estimates of impervious area as a function of land-cover area may be biased and imprecise because of errors in the land-cover classification. This investigation of the effects of land-cover misclassification on impervious surface models that use National Land Cover Data (NLCD) evaluates the consequences of adjusting land cover within a watershed to reflect uncertainty assessment information. Model validation results indicate that using error-matrix information to adjust land-cover values used in impervious surface models does not substantially improve impervious surface predictions. Validation results indicate that the resolution of the land-cover data (Level I and Level II) is more important in predicting impervious surface accurately than whether the land-cover data have been adjusted using information in the error matrix. Level I NLCD, adjusted for land-cover misclassification, is preferable to the other land-cover options for use in models of impervious surface. This result is tied to the lower classification error rates for the Level I NLCD. © 2007 American Society for Photogrammetry and Remote Sensing.
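A hedged sketch of the general idea of adjusting mapped class proportions with an error (confusion) matrix; this is a generic inverse-calibration illustration with toy numbers, not the specific adjustment used in the study:

```python
import numpy as np

# Error matrix estimated from reference data: entry [i, j] is the probability
# that a pixel whose true class is i is mapped as class j (rows sum to 1).
# The two classes here are a toy pair (developed land, other).
P = np.array([[0.85, 0.15],
              [0.10, 0.90]])

mapped = np.array([0.30, 0.70])   # class proportions read off the map

# The mapped proportions satisfy mapped = true @ P, so invert to recover
# an estimate of the true proportions (requires P to be well conditioned).
true_est = np.linalg.solve(P.T, mapped)
print(true_est)                    # adjusted class proportions
```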
Chaves, Sandra; Gadanho, Mário; Tenreiro, Rogério; Cabrita, José
1999-01-01
Metronidazole susceptibility of 100 Helicobacter pylori strains was assessed by determining the inhibition zone diameters by disk diffusion test and the MICs by agar dilution and PDM Epsilometer test (E test). Linear regression analysis was performed, allowing the definition of significant linear relations, and revealed correlations of disk diffusion results with both E-test and agar dilution results (r2 = 0.88 and 0.81, respectively). No significant differences (P = 0.84) were found between MICs defined by E test and those defined by agar dilution, taken as a standard. Reproducibility comparison between E-test and disk diffusion tests showed that they are equivalent and with good precision. Two interpretative susceptibility schemes (with or without an intermediate class) were compared by an interpretative error rate analysis method. The susceptibility classification scheme that included the intermediate category was retained, and breakpoints were assessed for diffusion assay with 5-μg metronidazole disks. Strains with inhibition zone diameters less than 16 mm were defined as resistant (MIC > 8 μg/ml), those with zone diameters equal to or greater than 16 mm but less than 21 mm were considered intermediate (4 μg/ml < MIC ≤ 8 μg/ml), and those with zone diameters of 21 mm or greater were regarded as susceptible (MIC ≤ 4 μg/ml). Error rate analysis applied to this classification scheme showed occurrence frequencies of 1% for major errors and 7% for minor errors, when the results were compared to those obtained by agar dilution. No very major errors were detected, suggesting that disk diffusion might be a good alternative for determining the metronidazole sensitivity of H. pylori strains. PMID:10203543
Image Augmentation for Object Image Classification Based On Combination of Pre-Trained CNN and SVM
NASA Astrophysics Data System (ADS)
Shima, Yoshihiro
2018-04-01
Neural networks are a powerful means of classifying object images. The proposed image category classification method for object images combines convolutional neural networks (CNNs) and support vector machines (SVMs). A pre-trained CNN, Alex-Net, is used as a pattern-feature extractor; rather than being trained from scratch, Alex-Net is used as pre-trained on the large-scale object-image dataset ImageNet. An SVM is used as the trainable classifier, taking the feature vectors produced by Alex-Net as input. The STL-10 dataset, with ten classes and clearly separated training and test samples, provides the object images. The SVM is trained on STL-10 object images with data augmentation. We use a pattern transformation method based on the cosine function, and also apply other augmentation methods such as rotation, skewing and elastic distortion. Using the cosine function, the original patterns were left-justified, right-justified, top-justified, or bottom-justified; patterns were also center-justified and enlarged. Augmentation with the cosine transformation decreases the test error rate by 0.435 percentage points from 16.055%, whereas the other augmentation methods (rotation, skewing and elastic distortion) increase error rates compared with no augmentation. The augmented data are 30 times the size of the original 5000 STL-10 training samples. The experimental test error rate on the 8000 STL-10 test images was 15.620%, which shows that image augmentation is effective for image category classification.
A data-driven modeling approach to stochastic computation for low-energy biomedical devices.
Lee, Kyong Ho; Jang, Kuk Jin; Shoeb, Ali; Verma, Naveen
2011-01-01
Low-power devices that can detect clinically relevant correlations in physiologically-complex patient signals can enable systems capable of closed-loop response (e.g., controlled actuation of therapeutic stimulators, continuous recording of disease states, etc.). In ultra-low-power platforms, however, hardware error sources are becoming increasingly limiting. In this paper, we present how data-driven methods, which allow us to accurately model physiological signals, also allow us to effectively model and overcome prominent hardware error sources with nearly no additional overhead. Two applications, EEG-based seizure detection and ECG-based arrhythmia-beat classification, are synthesized to a logic-gate implementation, and two prominent error sources are introduced: (1) SRAM bit-cell errors and (2) logic-gate switching errors ('stuck-at' faults). Using patient data from the CHB-MIT and MIT-BIH databases, performance similar to error-free hardware is achieved even for very high fault rates (up to 0.5 for SRAMs and 7 × 10⁻² for logic) that cause computational bit error rates as high as 50%.
High-density force myography: A possible alternative for upper-limb prosthetic control.
Radmand, Ashkan; Scheme, Erik; Englehart, Kevin
2016-01-01
Several multiple degree-of-freedom upper-limb prostheses that have the promise of highly dexterous control have recently been developed. Inadequate controllability, however, has limited adoption of these devices. Introducing more robust control methods will likely result in higher acceptance rates. This work investigates the suitability of using high-density force myography (HD-FMG) for prosthetic control. HD-FMG uses a high-density array of pressure sensors to detect changes in the pressure patterns between the residual limb and socket caused by the contraction of the forearm muscles. In this work, HD-FMG outperforms the standard electromyography (EMG)-based system in detecting different wrist and hand gestures. With the arm in a fixed, static position, eight hand and wrist motions were classified with 0.33% error using the HD-FMG technique. Comparatively, classification errors in the range of 2.2%-11.3% have been reported in the literature for multichannel EMG-based approaches. As with EMG, position variation in HD-FMG can introduce classification error, but incorporating position variation into the training protocol reduces this effect. Channel reduction was also applied to the HD-FMG technique to decrease the dimensionality of the problem as well as the size of the sensorized area. We found that with informed, symmetric channel reduction, classification error could be decreased to 0.02%.
Classification of echolocation clicks from odontocetes in the Southern California Bight.
Roch, Marie A; Klinck, Holger; Baumann-Pickering, Simone; Mellinger, David K; Qui, Simon; Soldevilla, Melissa S; Hildebrand, John A
2011-01-01
This study presents a system for classifying echolocation clicks of six species of odontocetes in the Southern California Bight: Visually confirmed bottlenose dolphins, short- and long-beaked common dolphins, Pacific white-sided dolphins, Risso's dolphins, and presumed Cuvier's beaked whales. Echolocation clicks are represented by cepstral feature vectors that are classified by Gaussian mixture models. A randomized cross-validation experiment is designed to provide conditions similar to those found in a field-deployed system. To prevent matched conditions from inappropriately lowering the error rate, echolocation clicks associated with a single sighting are never split across the training and test data. Sightings are randomly permuted before assignment to folds in the experiment. This allows different combinations of the training and test data to be used while keeping data from each sighting entirely in the training or test set. The system achieves a mean error rate of 22% across 100 randomized three-fold cross-validation experiments. Four of the six species had mean error rates lower than the overall mean, with the presumed Cuvier's beaked whale clicks showing the best performance (<2% error rate). Long-beaked common and bottlenose dolphins proved the most difficult to classify, with mean error rates of 53% and 68%, respectively.
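A minimal sketch of the key experimental-design point above, keeping all clicks from one sighting on the same side of the train/test split, using scikit-learn's GroupKFold with a Gaussian-mixture-per-species classifier; the synthetic cepstral features and two-species setup are an illustration of the protocol, not the authors' system:

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(5)
n_clicks, n_ceps = 3000, 10
species = rng.integers(0, 2, size=n_clicks)            # two toy species
sighting = rng.integers(0, 30, size=n_clicks)          # acoustic encounter id
X = rng.normal(size=(n_clicks, n_ceps)) + species[:, None]  # cepstral features

errs = []
for train, test in GroupKFold(n_splits=3).split(X, species, groups=sighting):
    # One GMM per species, fitted on training clicks only.
    gmms = [GaussianMixture(n_components=4, random_state=0)
            .fit(X[train][species[train] == s]) for s in (0, 1)]
    scores = np.column_stack([g.score_samples(X[test]) for g in gmms])
    pred = scores.argmax(axis=1)
    errs.append(np.mean(pred != species[test]))
print(f"mean error over folds: {np.mean(errs):.3f}")
```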
Franson, J.C.; Hohman, W.L.; Moore, J.L.; Smith, M.R.
1996-01-01
We used 363 blood samples collected from wild canvasback ducks (Aythya valisineria) at Catahoula Lake, Louisiana, U.S.A. to evaluate the effect of sample storage time on the efficacy of erythrocytic protoporphyrin as an indicator of lead exposure. The protoporphyrin concentration of each sample was determined by hematofluorometry within 5 min of blood collection and after refrigeration at 4 °C for 24 and 48 h. All samples were analyzed for lead by atomic absorption spectrophotometry. Based on a blood lead concentration of ≥0.2 ppm wet weight as positive evidence for lead exposure, the protoporphyrin technique resulted in overall error rates of 29%, 20%, and 19% and false negative error rates of 47%, 29% and 25% when hematofluorometric determinations were made on blood at 5 min, 24 h, and 48 h, respectively. False positive error rates were less than 10% for all three measurement times. The accuracy of the 24-h erythrocytic protoporphyrin classification of blood samples as positive or negative for lead exposure was significantly greater than that of the 5-min classification, but no improvement in accuracy was gained when samples were tested at 48 h. The false negative errors were probably due, at least in part, to the lag time between lead exposure and the increase of blood protoporphyrin concentrations. False negatives resulted in an underestimation of the true number of canvasbacks exposed to lead, indicating that hematofluorometry provides a conservative estimate of lead exposure.
Locally Weighted Score Estimation for Quantile Classification in Binary Regression Models
Rice, John D.; Taylor, Jeremy M. G.
2016-01-01
One common use of binary response regression methods is classification based on an arbitrary probability threshold dictated by the particular application. Since this is given to us a priori, it is sensible to incorporate the threshold into our estimation procedure. Specifically, for the linear logistic model, we solve a set of locally weighted score equations, using a kernel-like weight function centered at the threshold. The bandwidth for the weight function is selected by cross validation of a novel hybrid loss function that combines classification error and a continuous measure of divergence between observed and fitted values; other possible cross-validation functions based on more common binary classification metrics are also examined. This work has much in common with robust estimation, but differs from previous approaches in this area in its focus on prediction, specifically classification into high- and low-risk groups. Simulation results are given showing the reduction in error rates that can be obtained with this method when compared with maximum likelihood estimation, especially under certain forms of model misspecification. Analysis of a melanoma data set is presented to illustrate the use of the method in practice. PMID:28018492
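A hedged sketch of the central idea, weighting observations by a kernel centred at the classification threshold when fitting the logistic model; this uses sample weights in scikit-learn rather than the authors' score-equation machinery, and the pilot-fit scheme, threshold, and bandwidth below are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)
X = rng.normal(size=(2000, 3))
p_true = 1 / (1 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 1])))
y = rng.binomial(1, p_true)

threshold = 0.2    # application-specific risk cut-off
bandwidth = 0.15   # would be chosen by cross-validation in practice

# Pilot fit gives provisional probabilities; observations near the threshold
# receive higher weight via a Gaussian kernel centred at the threshold.
pilot = LogisticRegression().fit(X, y)
p_hat = pilot.predict_proba(X)[:, 1]
w = np.exp(-0.5 * ((p_hat - threshold) / bandwidth) ** 2)

weighted = LogisticRegression().fit(X, y, sample_weight=w)
pred_class = (weighted.predict_proba(X)[:, 1] >= threshold).astype(int)
print("classification error:", np.mean(pred_class != y))
```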
Land use in the Paraiba Valley through remotely sensed data. [Brazil
NASA Technical Reports Server (NTRS)
Dejesusparada, N. (Principal Investigator); Lombardo, M. A.; Novo, E. M. L. D.; Niero, M.; Foresti, C.
1980-01-01
A methodology for land use survey was developed and land use modification rates were determined using LANDSAT imagery of the Paraiba Valley (state of Sao Paulo). Both visual and automatic interpretation methods were employed to analyze seven land use classes: urban area, industrial area, bare soil, cultivated area, pastureland, reforestation and natural vegetation. Visual interpretation revealed little spectral difference among these classes. The automatic classification of LANDSAT MSS data using a maximum likelihood algorithm shows a 39% average error of omission and a 3.4% error of inclusion for the seven classes. The complexity of land uses in the study area, the large spectral variations of the analyzed classes, and the low resolution of the LANDSAT data influenced the classification results.
Multilayer perceptron, fuzzy sets, and classification
NASA Technical Reports Server (NTRS)
Pal, Sankar K.; Mitra, Sushmita
1992-01-01
A fuzzy neural network model based on the multilayer perceptron, using the back-propagation algorithm, and capable of fuzzy classification of patterns is described. The input vector consists of membership values to linguistic properties while the output vector is defined in terms of fuzzy class membership values. This allows efficient modeling of fuzzy or uncertain patterns with appropriate weights being assigned to the backpropagated errors depending upon the membership values at the corresponding outputs. During training, the learning rate is gradually decreased in discrete steps until the network converges to a minimum error solution. The effectiveness of the algorithm is demonstrated on a speech recognition problem. The results are compared with those of the conventional MLP, the Bayes classifier, and the other related models.
Classification-Based Spatial Error Concealment for Visual Communications
NASA Astrophysics Data System (ADS)
Chen, Meng; Zheng, Yefeng; Wu, Min
2006-12-01
In an error-prone transmission environment, error concealment is an effective technique to reconstruct the damaged visual content. Due to large variations of image characteristics, different concealment approaches are necessary to accommodate the different nature of the lost image content. In this paper, we address this issue and propose using classification to integrate the state-of-the-art error concealment techniques. The proposed approach takes advantage of multiple concealment algorithms and adaptively selects the suitable algorithm for each damaged image area. With growing awareness that the design of sender and receiver systems should be jointly considered for efficient and reliable multimedia communications, we propose a set of classification-based block concealment schemes, including receiver-side classification, sender-side attachment, and sender-side embedding. Our experimental results provide extensive performance comparisons and demonstrate that the proposed classification-based error concealment approaches outperform the conventional approaches.
AVNM: A Voting based Novel Mathematical Rule for Image Classification.
Vidyarthi, Ankit; Mittal, Namita
2016-12-01
In machine learning, system accuracy depends on the classification result, and classification accuracy plays an imperative role in various domains. Non-parametric classifiers such as K-Nearest Neighbor (KNN) are the most widely used classifiers for pattern analysis. Despite its simplicity and effectiveness, the main problem associated with the KNN classifier is the selection of the number of nearest neighbors, i.e. "k", used in the computation. At present, it is hard to find the optimal value of "k" using any statistical algorithm that gives perfect accuracy in terms of a low misclassification error rate. Motivated by this problem, a new sample-space-reduction weighted voting mathematical rule (AVNM) is proposed for classification in machine learning. Like KNN, the proposed AVNM rule is non-parametric. AVNM uses a weighted voting mechanism with sample space reduction to learn and predict the class label of an unidentified sample. Unlike the KNN algorithm, AVNM requires no initial selection of a predefined variable or number of neighbors. The proposed classifier also reduces the effect of outliers. To verify the performance of the proposed AVNM classifier, experiments were made on 10 standard datasets taken from the UCI database and one manually created dataset. The experimental results show that the proposed AVNM rule outperforms the KNN classifier and its variants. Results based on the confusion-matrix accuracy measure show higher accuracy for the AVNM rule. The proposed AVNM rule is based on a sample space reduction mechanism for identifying an optimal number of nearest neighbors. AVNM results in better classification accuracy and a lower error rate compared with the state-of-the-art algorithm, KNN, and its variants. The proposed rule automates the selection of nearest neighbors and improves the classification rate for the UCI datasets and the manually created dataset. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Modeling habitat dynamics accounting for possible misclassification
Veran, Sophie; Kleiner, Kevin J.; Choquet, Remi; Collazo, Jaime; Nichols, James D.
2012-01-01
Land cover data are widely used in ecology as land cover change is a major component of changes affecting ecological systems. Landscape change estimates are characterized by classification errors. Researchers have used error matrices to adjust estimates of areal extent, but estimation of land cover change is more difficult and more challenging, with error in classification being confused with change. We modeled land cover dynamics for a discrete set of habitat states. The approach accounts for state uncertainty to produce unbiased estimates of habitat transition probabilities using ground information to inform error rates. We consider the case when true and observed habitat states are available for the same geographic unit (pixel) and when true and observed states are obtained at one level of resolution, but transition probabilities estimated at a different level of resolution (aggregations of pixels). Simulation results showed a strong bias when estimating transition probabilities if misclassification was not accounted for. Scaling-up does not necessarily decrease the bias and can even increase it. Analyses of land cover data in the Southeast region of the USA showed that land change patterns appeared distorted if misclassification was not accounted for: rate of habitat turnover was artificially increased and habitat composition appeared more homogeneous. Not properly accounting for land cover misclassification can produce misleading inferences about habitat state and dynamics and also misleading predictions about species distributions based on habitat. Our models that explicitly account for state uncertainty should be useful in obtaining more accurate inferences about change from data that include errors.
Refractive errors in children and adolescents in Bucaramanga (Colombia).
Galvis, Virgilio; Tello, Alejandro; Otero, Johanna; Serrano, Andrés A; Gómez, Luz María; Castellanos, Yuly
2017-01-01
The aim of this study was to establish the frequency of refractive errors in children and adolescents aged between 8 and 17 years old, living in the metropolitan area of Bucaramanga (Colombia). This study was a secondary analysis of two descriptive cross-sectional studies that applied sociodemographic surveys and assessed visual acuity and refraction. Ametropias were classified as myopic errors, hyperopic errors, and mixed astigmatism. Eyes were considered emmetropic if none of these classifications were made. The data were collated using free software and analyzed with STATA/IC 11.2. One thousand two hundred twenty-eight individuals were included in this study. Girls showed a higher rate of ametropia than boys. Hyperopic refractive errors were present in 23.1% of the subjects, and myopic errors in 11.2%. Only 0.2% of the eyes had high myopia (≤-6.00 D). Mixed astigmatism and anisometropia were uncommon, and myopia frequency increased with age. Keratometric readings were significantly steeper in myopic than in hyperopic eyes. The overall frequency of refractive errors we found (36.7%) is moderate compared with global data. The rates and parameters differed statistically by sex and age group. Our findings are useful for establishing refractive error rate benchmarks in low-middle-income countries and as a baseline for following their variation by sociodemographic factors.
Löpprich, Martin; Krauss, Felix; Ganzinger, Matthias; Senghas, Karsten; Riezler, Stefan; Knaup, Petra
2016-08-05
In the Multiple Myeloma clinical registry at Heidelberg University Hospital, most data are extracted from discharge letters. Our aim was to analyze whether it is possible to make the manual documentation process more efficient by using methods of natural language processing for multiclass classification of free-text diagnostic reports, in order to automatically document the diagnosis and state of disease of myeloma patients. The first objective was to create a corpus consisting of free-text diagnosis paragraphs of patients with multiple myeloma from German diagnostic reports, with manual annotation of relevant data elements by documentation specialists. The second objective was to construct and evaluate a framework using different NLP methods to enable automatic multiclass classification of relevant data elements from free-text diagnostic reports. The main diagnoses paragraph was extracted from the clinical reports of a randomly selected third of the patients in the multiple myeloma research database of Heidelberg University Hospital (737 patients in total). An EDC system was set up, and two data entry specialists independently performed manual documentation of at least nine specific data elements for multiple myeloma characterization. Both data entries were compared and assessed by a third specialist, and an annotated text corpus was created. A framework was constructed, consisting of a self-developed package to split multiple diagnosis sequences into several subsequences, four different preprocessing steps to normalize the input data, and two classifiers: a maximum entropy classifier (MEC) and a support vector machine (SVM). In total 15 different pipelines were examined and assessed by ten-fold cross-validation, reiterated 100 times. As quality indicators, the average error rate and the average F1-score were computed. For significance testing the approximate randomization test was used. The created annotated corpus consists of 737 different diagnosis paragraphs with a total of 865 coded diagnoses. The dataset is publicly available in the supplementary online files for training and testing of further NLP methods. Both classifiers showed low average error rates (MEC: 1.05; SVM: 0.84) and high F1-scores (MEC: 0.89; SVM: 0.92). However, the results varied widely depending on the classified data element. Preprocessing methods increased this effect and had a significant impact on classification, both positive and negative. The automatic diagnosis splitter increased the average error rate significantly, even though the F1-score decreased only slightly. The low average error rates and high average F1-scores of each pipeline demonstrate the suitability of the investigated NLP methods. However, it was also shown that there is no single best practice for automatic classification of data elements from free-text diagnostic reports.
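A small sketch of the kind of text-classification pipeline evaluated above, a maximum entropy classifier (logistic regression) and a linear SVM over bag-of-words features scored by cross-validation; the vectorizer, toy diagnosis strings, and metric choices are illustrative assumptions, not the study's framework:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy stand-ins for diagnosis subsequences and their coded data elements.
texts = ["multiples myelom igg kappa stadium iii",
         "multiples myelom igg lambda stadium ii",
         "plasmozytom iga kappa stadium i",
         "multiples myelom leichtketten kappa stadium iii"] * 50
labels = [0, 1, 2, 0] * 50

pipelines = {
    "MEC": make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000)),
    "SVM": make_pipeline(TfidfVectorizer(), LinearSVC()),
}
for name, pipe in pipelines.items():
    f1 = cross_val_score(pipe, texts, labels, cv=10, scoring="f1_macro")
    err = 1 - cross_val_score(pipe, texts, labels, cv=10, scoring="accuracy")
    print(f"{name}: mean F1={f1.mean():.3f}, mean error={err.mean():.3f}")
```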
Common component classification: what can we learn from machine learning?
Anderson, Ariana; Labus, Jennifer S; Vianna, Eduardo P; Mayer, Emeran A; Cohen, Mark S
2011-05-15
Machine learning methods have been applied to classifying fMRI scans by studying locations in the brain that exhibit temporal intensity variation between groups, frequently reporting classification accuracy of 90% or better. Although empirical results are quite favorable, one might doubt the ability of classification methods to withstand changes in task ordering and the reproducibility of activation patterns over runs, and question how much of the classification machines' power is due to artifactual noise versus genuine neurological signal. To examine the true strength and power of machine learning classifiers we create and then deconstruct a classifier to examine its sensitivity to physiological noise, task reordering, and across-scan classification ability. The models are trained and tested both within and across runs to assess stability and reproducibility across conditions. We demonstrate the use of independent components analysis for both feature extraction and artifact removal and show that removal of such artifacts can reduce predictive accuracy even when data has been cleaned in the preprocessing stages. We demonstrate how mistakes in the feature selection process can cause the cross-validation error seen in publication to be a biased estimate of the testing error seen in practice and measure this bias by purposefully making flawed models. We discuss other ways to introduce bias and the statistical assumptions lying behind the data and model themselves. Finally we discuss the complications in drawing inference from the smaller sample sizes typically seen in fMRI studies, the effects of small or unbalanced samples on the Type 1 and Type 2 error rates, and how publication bias can give a false confidence of the power of such methods. Collectively this work identifies challenges specific to fMRI classification and methods affecting the stability of models. Copyright © 2010 Elsevier Inc. All rights reserved.
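One of the pitfalls described above, selecting features on the full dataset before cross-validation, can be made concrete with a small simulation on pure noise; a correct pipeline keeps the selection inside each training fold. This is a generic illustration of the bias, not the authors' fMRI code:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(7)
X = rng.normal(size=(40, 5000))   # small-n, large-p noise, like voxel features
y = rng.integers(0, 2, size=40)   # labels carry no real signal

# Flawed: choose the "best" features using all data, then cross-validate.
X_peek = SelectKBest(f_classif, k=20).fit_transform(X, y)
biased = cross_val_score(SVC(), X_peek, y, cv=5).mean()

# Correct: selection happens inside each training fold of the pipeline.
pipe = make_pipeline(SelectKBest(f_classif, k=20), SVC())
unbiased = cross_val_score(pipe, X, y, cv=5).mean()

print(f"selection before CV: {biased:.2f} accuracy (optimistic)")
print(f"selection inside CV: {unbiased:.2f} accuracy (~chance on noise)")
```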
Correlation-based pattern recognition for implantable defibrillators.
Wilkins, J.
1996-01-01
An estimated 300,000 Americans die each year from cardiac arrhythmias. Historically, drug therapy or surgery were the only treatment options available for patients suffering from arrhythmias. Recently, implantable arrhythmia management devices have been developed. These devices allow abnormal cardiac rhythms to be sensed and corrected in vivo. Proper arrhythmia classification is critical to selecting the appropriate therapeutic intervention. The classification problem is made more challenging by the power/computation constraints imposed by the short battery life of implantable devices. Current devices utilize heart rate-based classification algorithms. Although easy to implement, rate-based approaches have unacceptably high error rates in distinguishing supraventricular tachycardia (SVT) from ventricular tachycardia (VT). Conventional morphology assessment techniques used in ECG analysis often require too much computation to be practical for implantable devices. In this paper, a computationally efficient arrhythmia classification architecture using correlation-based morphology assessment is presented. The architecture classifies individual heart beats by assessing similarity between an incoming cardiac signal vector and a series of prestored class templates. A series of these beat classifications is then used to make an overall rhythm assessment. The system makes use of several new results in the field of pattern recognition. The resulting system achieved excellent accuracy in discriminating SVT and VT. PMID:8947674
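A minimal numpy sketch of the correlation-based morphology assessment idea, comparing an incoming beat vector against prestored class templates via normalized correlation; the synthetic waveforms, templates, and class names are illustrative, not the device's algorithm:

```python
import numpy as np

def normalized_correlation(x, t):
    """Pearson-style correlation between a beat and a template."""
    x = (x - x.mean()) / (x.std() + 1e-12)
    t = (t - t.mean()) / (t.std() + 1e-12)
    return float(np.dot(x, t) / len(x))

n = 128
t_axis = np.linspace(0, 1, n)
templates = {
    "SVT-like": np.sin(2 * np.pi * 3 * t_axis),          # narrow, regular morphology
    "VT-like": np.sign(np.sin(2 * np.pi * 3 * t_axis)),  # broader, squarer morphology
}

rng = np.random.default_rng(8)
beat = np.sin(2 * np.pi * 3 * t_axis) + 0.2 * rng.normal(size=n)  # incoming beat

scores = {name: normalized_correlation(beat, tpl) for name, tpl in templates.items()}
label = max(scores, key=scores.get)   # assign the best-matching class
print(scores, "->", label)
```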
Multiple-rule bias in the comparison of classification rules
Yousefi, Mohammadmahdi R.; Hua, Jianping; Dougherty, Edward R.
2011-01-01
Motivation: There is growing discussion in the bioinformatics community concerning overoptimism of reported results. Two approaches contributing to overoptimism in classification are (i) the reporting of results on datasets for which a proposed classification rule performs well and (ii) the comparison of multiple classification rules on a single dataset that purports to show the advantage of a certain rule. Results: This article provides a careful probabilistic analysis of the second issue and the ‘multiple-rule bias’, resulting from choosing a classification rule having minimum estimated error on the dataset. It quantifies this bias corresponding to estimating the expected true error of the classification rule possessing minimum estimated error and it characterizes the bias from estimating the true comparative advantage of the chosen classification rule relative to the others by the estimated comparative advantage on the dataset. The analysis is applied to both synthetic and real data using a number of classification rules and error estimators. Availability: We have implemented in C code the synthetic data distribution model, classification rules, feature selection routines and error estimation methods. The code for multiple-rule analysis is implemented in MATLAB. The source code is available at http://gsp.tamu.edu/Publications/supplementary/yousefi11a/. Supplementary simulation results are also included. Contact: edward@ece.tamu.edu Supplementary Information: Supplementary data are available at Bioinformatics online. PMID:21546390
Van de Vreede, Melita; McGrath, Anne; de Clifford, Jan
2018-05-14
Objective. The aim of the present study was to identify and quantify medication errors reportedly related to electronic medication management systems (eMMS) and those considered likely to occur more frequently with eMMS. This included developing a new classification system relevant to eMMS errors. Methods. Eight Victorian hospitals with eMMS participated in a retrospective audit of reported medication incidents from their incident reporting databases between May and July 2014. Site-appointed project officers submitted deidentified incidents they deemed new or likely to occur more frequently due to eMMS, together with the Incident Severity Rating (ISR). The authors reviewed and classified incidents. Results. There were 5826 medication-related incidents reported. In total, 93 (47 prescribing errors, 46 administration errors) were identified as new or potentially related to eMMS. Only one ISR2 (moderate) and no ISR1 (severe or death) errors were reported, so harm to patients in this 3-month period was minimal. The most commonly reported error types were 'human factors' and 'unfamiliarity or training' (70%) and 'cross-encounter or hybrid system errors' (22%). Conclusions. Although the results suggest that the errors reported were of low severity, organisations must remain vigilant to the risk of new errors and avoid the assumption that eMMS is the panacea to all medication error issues. What is known about the topic? eMMS have been shown to reduce some types of medication errors, but it has been reported that some new medication errors have been identified and some are likely to occur more frequently with eMMS. There are few published Australian studies that have reported on medication error types that are likely to occur more frequently with eMMS in more than one organisation and that include administration and prescribing errors. What does this paper add? This paper includes a new simple classification system for eMMS that is useful and outlines the most commonly reported incident types and can inform organisations and vendors on possible eMMS improvements. The paper suggests a new classification system for eMMS medication errors. What are the implications for practitioners? The results of the present study will highlight to organisations the need for ongoing review of system design, refinement of workflow issues, staff education and training and reporting and monitoring of errors.
Nebuloni, G; Di Giulio, P; Gregori, D; Sandonà, P; Berchialla, P; Foltran, F; Renga, G
2011-01-01
Since 2003, the Lombardy region has introduced a case-mix reimbursement system for nursing homes based on the SOSIA form which classifies residents into eight classes of frailty. In the present study the agreement between SOSIA classification and other well documented instruments, including Barthel Index, Mini Mental State Examination and Clinical Dementia Rating Scale is evaluated in 100 nursing home residents. Only 50% of residents with severe dementia have been recognized as seriously impaired when assessed with SOSIA form; since misclassification errors underestimate residents' care needs, they determine an insufficient reimbursement limiting nursing home possibility to offer care appropriate for the case-mix.
Toward noncooperative iris recognition: a classification approach using multiple signatures.
Proença, Hugo; Alexandre, Luís A
2007-04-01
This paper focuses on noncooperative iris recognition, i.e., the capture of iris images at large distances, under less controlled lighting conditions, and without active participation of the subjects. This increases the probability of capturing very heterogeneous images (regarding focus, contrast, or brightness) and with several noise factors (iris obstructions and reflections). Current iris recognition systems are unable to deal with noisy data and substantially increase their error rates, especially the false rejections, in these conditions. We propose an iris classification method that divides the segmented and normalized iris image into six regions, makes an independent feature extraction and comparison for each region, and combines each of the dissimilarity values through a classification rule. Experiments show a substantial decrease, higher than 40 percent, of the false rejection rates in the recognition of noisy iris images.
Clarification of terminology in medication errors: definitions and classification.
Ferner, Robin E; Aronson, Jeffrey K
2006-01-01
We have previously described and analysed some terms that are used in drug safety and have proposed definitions. Here we discuss and define terms that are used in the field of medication errors, particularly terms that are sometimes misunderstood or misused. We also discuss the classification of medication errors. A medication error is a failure in the treatment process that leads to, or has the potential to lead to, harm to the patient. Errors can be classified according to whether they are mistakes, slips, or lapses. Mistakes are errors in the planning of an action. They can be knowledge based or rule based. Slips and lapses are errors in carrying out an action - a slip through an erroneous performance and a lapse through an erroneous memory. Classification of medication errors is important because the probabilities of errors of different classes are different, as are the potential remedies.
Classification of burn wounds using support vector machines
NASA Astrophysics Data System (ADS)
Acha, Begona; Serrano, Carmen; Palencia, Sergio; Murillo, Juan Jose
2004-05-01
The purpose of this work is to improve a previous method developed by the authors for the classification of burn wounds into their depths. The inputs of the system are color and texture information, as these are the characteristics observed by physicians in order to give a diagnosis. Our previous work consisted in segmenting the burn wound from the rest of the image and classifying the burn into its depth. In this paper we focus on the classification problem only. We previously proposed to use a Fuzzy-ARTMAP neural network (NN); however, we may take advantage of newer, powerful classification tools such as Support Vector Machines (SVM). We apply a fivefold cross-validation scheme to divide the database into training and validation sets. Then, we apply a feature selection method for each classifier, which gives us the set of features that yields the smallest classification error for that classifier. The features used for classification are first-order statistical parameters extracted from the L*, u* and v* color components of the image. The feature selection algorithms used are the Sequential Forward Selection (SFS) and the Sequential Backward Selection (SBS) methods. As the data in this problem are not linearly separable, the SVM was trained using several different kernels. The validation process shows that the SVM, when using a Gaussian kernel of variance 1, outperforms the classification results obtained with the rest of the classifiers, yielding a classification error rate of 0.7%, whereas the Fuzzy-ARTMAP NN attained 1.6%.
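A small sketch pairing sequential forward selection with an RBF-kernel SVM, mirroring the procedure above; scikit-learn's SequentialFeatureSelector stands in for SFS, the synthetic colour-statistic features are placeholders, and mapping a "Gaussian kernel of variance 1" to gamma = 0.5 under the exp(-gamma * ||x - x'||^2) convention is our assumption:

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(9)
X = rng.normal(size=(300, 12))        # stand-in L*u*v* first-order statistics
y = (X[:, 2] - 0.7 * X[:, 5] + 0.3 * rng.normal(size=300) > 0).astype(int)

svm = SVC(kernel="rbf", gamma=0.5)    # "Gaussian kernel of variance 1"

# Sequential forward selection of a feature subset for this classifier.
sfs = SequentialFeatureSelector(svm, n_features_to_select=4,
                                direction="forward", cv=5)
pipe = make_pipeline(StandardScaler(), sfs, svm)

err = 1 - cross_val_score(pipe, X, y, cv=5).mean()
print(f"cross-validated classification error: {err:.3%}")
```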
NASA Astrophysics Data System (ADS)
Rajwa, Bartek; Bayraktar, Bulent; Banada, Padmapriya P.; Huff, Karleigh; Bae, Euiwon; Hirleman, E. Daniel; Bhunia, Arun K.; Robinson, J. Paul
2006-10-01
Bacterial contamination by Listeria monocytogenes puts the public at risk and is also costly for the food-processing industry. Traditional methods for pathogen identification require complicated sample preparation for reliable results. Previously, we have reported development of a noninvasive optical forward-scattering system for rapid identification of Listeria colonies grown on solid surfaces. The presented system included application of computer-vision and pattern-recognition techniques to classify scatter patterns formed by bacterial colonies irradiated with laser light. This report shows an extension of the proposed method. A new scatterometer equipped with a high-resolution CCD chip and the application of two additional sets of image features for classification allow for higher accuracy and lower error rates. Features based on Zernike moments are supplemented by Tchebichef moments and Haralick texture descriptors in the new version of the algorithm. Fisher's criterion has been used for feature selection to decrease the training time of the machine learning systems. An algorithm based on support vector machines was used for classification of patterns. Low error rates determined by cross-validation, reproducibility of the measurements, and robustness of the system prove that the proposed technology can be implemented in automated devices for detection and classification of pathogenic bacteria.
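A short numpy sketch of Fisher's criterion used to rank features before training the classifier, as mentioned above; this is the common two-class form on synthetic data, and the moment and texture features themselves are not reproduced here:

```python
import numpy as np

def fisher_score(X, y):
    """Two-class Fisher criterion per feature: (m1 - m2)^2 / (s1^2 + s2^2)."""
    X0, X1 = X[y == 0], X[y == 1]
    num = (X0.mean(axis=0) - X1.mean(axis=0)) ** 2
    den = X0.var(axis=0) + X1.var(axis=0) + 1e-12
    return num / den

rng = np.random.default_rng(10)
X = rng.normal(size=(400, 50))     # stand-in moment/texture descriptors
y = rng.integers(0, 2, size=400)
X[y == 1, :5] += 1.0               # make the first five features informative

scores = fisher_score(X, y)
top = np.argsort(scores)[::-1][:10]   # keep the highest-scoring features
print("selected feature indices:", top)
```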
On Correlations, Distances and Error Rates.
ERIC Educational Resources Information Center
Dorans, Neil J.
The nature of the criterion (dependent) variable may play a useful role in structuring a list of classification/prediction problems. Such criteria are continuous in nature, binary dichotomous, or multichotomous. In this paper, discussion is limited to the continuous normally distributed criterion scenarios. For both cases, it is assumed that the…
The presence of English and Spanish dyslexia in the Web
NASA Astrophysics Data System (ADS)
Rello, Luz; Baeza-Yates, Ricardo
2012-09-01
In this study we present a lower bound of the prevalence of dyslexia in the Web for English and Spanish. On the basis of analysis of corpora written by dyslexic people, we propose a classification of the different kinds of dyslexic errors. A representative data set of dyslexic words is used to calculate this lower bound in web pages containing English and Spanish dyslexic errors. We also present an analysis of dyslexic errors in major Internet domains, social media sites, and throughout English- and Spanish-speaking countries. To show the independence of our estimations from the presence of other kinds of errors, we compare them with the overall lexical quality of the Web and with the error rate of noncorrected corpora. The presence of dyslexic errors in the Web motivates work in web accessibility for dyslexic users.
Roch, Marie A; Stinner-Sloan, Johanna; Baumann-Pickering, Simone; Wiggins, Sean M
2015-01-01
A concern for applications of machine learning techniques to bioacoustics is whether or not classifiers learn the categories for which they were trained. Unfortunately, information such as characteristics of specific recording equipment or noise environments can also be learned. This question is examined in the context of identifying delphinid species by their echolocation clicks. To reduce the ambiguity between species classification performance and other confounding factors, species whose clicks can be readily distinguished were used in this study: Pacific white-sided and Risso's dolphins. A subset of data from autonomous acoustic recorders located at seven sites in the Southern California Bight collected between 2006 and 2012 was selected. Cepstral-based features were extracted for each echolocation click and Gaussian mixture models were used to classify groups of 100 clicks. One hundred Monte-Carlo three-fold experiments were conducted to examine classification performance where fold composition was determined by acoustic encounter, recorder characteristics, or recording site. The error rate increased from 6.1% when grouped by acoustic encounter to 18.1%, 46.2%, and 33.2% for grouping by equipment, equipment category, and site, respectively. A noise compensation technique reduced error for these grouping schemes to 2.7%, 4.4%, 6.7%, and 11.4%, respectively, a reduction in error rate of 56%-86%.
Smith, Lauren H; Hargrove, Levi J; Lock, Blair A; Kuiken, Todd A
2011-04-01
Pattern recognition-based control of myoelectric prostheses has shown great promise in research environments, but has not been optimized for use in a clinical setting. To explore the relationship between classification error, controller delay, and real-time controllability, 13 able-bodied subjects were trained to operate a virtual upper-limb prosthesis using pattern recognition of electromyogram (EMG) signals. Classification error and controller delay were varied by training different classifiers with a variety of analysis window lengths ranging from 50 to 550 ms and either two or four EMG input channels. Offline analysis showed that classification error decreased with longer window lengths (p < 0.01 ). Real-time controllability was evaluated with the target achievement control (TAC) test, which prompted users to maneuver the virtual prosthesis into various target postures. The results indicated that user performance improved with lower classification error (p < 0.01 ) and was reduced with longer controller delay (p < 0.01 ), as determined by the window length. Therefore, both of these effects should be considered when choosing a window length; it may be beneficial to increase the window length if this results in a reduced classification error, despite the corresponding increase in controller delay. For the system employed in this study, the optimal window length was found to be between 150 and 250 ms, which is within acceptable controller delays for conventional multistate amplitude controllers.
Satellite inventory of Minnesota forest resources
NASA Technical Reports Server (NTRS)
Bauer, Marvin E.; Burk, Thomas E.; Ek, Alan R.; Coppin, Pol R.; Lime, Stephen D.; Walsh, Terese A.; Walters, David K.; Befort, William; Heinzen, David F.
1993-01-01
The methods and results of using Landsat Thematic Mapper (TM) data to classify and estimate the acreage of forest covertypes in northeastern Minnesota are described. Portions of six TM scenes covering five counties with a total area of 14,679 square miles were classified into six forest and five nonforest classes. The approach involved the integration of cluster sampling, image processing, and estimation. Using cluster sampling, 343 plots, each 88 acres in size, were photo interpreted and field mapped as a source of reference data for classifier training and calibration of the TM data classifications. Classification accuracies of up to 75 percent were achieved; most misclassification was between similar or related classes. An inverse method of calibration, based on the error rates obtained from the classifications of the cluster plots, was used to adjust the classification class proportions for classification errors. The resulting area estimates for total forest land in the five-county area were within 3 percent of the estimate made independently by the USDA Forest Service. Area estimates for conifer and hardwood forest types were within 0.8 and 6.0 percent respectively, of the Forest Service estimates. A trial of a second method of estimating the same classes as the Forest Service resulted in standard errors of 0.002 to 0.015. A study of the use of multidate TM data for change detection showed that forest canopy depletion, canopy increment, and no change could be identified with greater than 90 percent accuracy. The project results have been the basis for the Minnesota Department of Natural Resources and the Forest Service to define and begin to implement an annual system of forest inventory which utilizes Landsat TM data to detect changes in forest cover.
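The inverse calibration step can be illustrated with a small sketch, assuming a confusion matrix counted on the reference plots and the map-wide class proportions; the numbers are invented, and the exact estimator used in the project may differ in detail.

```python
# Sketch of inverse calibration of classified area proportions, assuming a
# confusion matrix from reference plots (rows = true class, columns = classified
# class) and the map-wide proportion of area assigned to each class.
import numpy as np

# hypothetical reference-plot confusion counts for three classes
confusion = np.array([[80, 15,  5],
                      [10, 70, 20],
                      [ 5, 10, 85]], dtype=float)

map_proportions = np.array([0.40, 0.35, 0.25])   # proportions as classified

# P(true class i | classified class j), estimated column-wise from the plots
p_true_given_classified = confusion / confusion.sum(axis=0, keepdims=True)

# calibrated estimate of the true class proportions (sums to 1 by construction)
calibrated = p_true_given_classified @ map_proportions
print(calibrated, calibrated.sum())
```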
Adamo, Margaret Peggy; Boten, Jessica A; Coyle, Linda M; Cronin, Kathleen A; Lam, Clara J K; Negoita, Serban; Penberthy, Lynne; Stevens, Jennifer L; Ward, Kevin C
2017-02-15
Researchers have used prostate-specific antigen (PSA) values collected by central cancer registries to evaluate tumors for potential aggressive clinical disease. An independent study collecting PSA values suggested a high error rate (18%) related to implied decimal points. To evaluate the error rate in the Surveillance, Epidemiology, and End Results (SEER) program, a comprehensive review of PSA values recorded across all SEER registries was performed. Consolidated PSA values for eligible prostate cancer cases in SEER registries were reviewed and compared with text documentation from abstracted records. Four types of classification errors were identified: implied decimal point errors, abstraction or coding implementation errors, nonsignificant errors, and changes related to "unknown" values. A total of 50,277 prostate cancer cases diagnosed in 2012 were reviewed. Approximately 94.15% of cases did not have meaningful changes (85.85% correct, 5.58% with a nonsignificant change of <1 ng/mL, and 2.80% with no clinical change). Approximately 5.70% of cases had meaningful changes (1.93% due to implied decimal point errors, 1.54% due to abstract or coding errors, and 2.23% due to errors related to unknown categories). Only 419 of the original 50,277 cases (0.83%) resulted in a change in disease stage due to a corrected PSA value. The implied decimal error rate was only 1.93% of all cases in the current validation study, with a meaningful error rate of 5.81%. The reasons for the lower error rate in SEER are likely due to ongoing and rigorous quality control and visual editing processes by the central registries. The SEER program currently is reviewing and correcting PSA values back to 2004 and will re-release these data in the public use research file. Cancer 2017;123:697-703. © 2016 American Cancer Society. © 2016 The Authors. Cancer published by Wiley Periodicals, Inc. on behalf of American Cancer Society.
Cabilan, C J; Hughes, James A; Shannon, Carl
2017-12-01
To describe the contextual, modal and psychological classification of medication errors in the emergency department and to identify the factors associated with the reported medication errors. The causes of medication errors are unique in every clinical setting; hence, error minimisation strategies are not always effective. For this reason, it is fundamental to understand the causes specific to the emergency department so that targeted strategies can be implemented. Retrospective analysis of reported medication errors in the emergency department. All voluntarily staff-reported medication-related incidents from 2010-2015 from the hospital's electronic incident management system were retrieved for analysis. Contextual classification involved the time, place and the type of medications involved. Modal classification pertained to the stage and issue (e.g. wrong medication, wrong patient). Psychological classification categorised the errors in planning (knowledge-based and rule-based errors) and skill (slips and lapses). There were 405 errors reported. Most errors occurred in the acute care area, short-stay unit and resuscitation area, during the busiest shifts (0800-1559, 1600-2259). Half of the errors involved high-alert medications. Many of the errors occurred during administration (62·7%) or prescribing (28·6%), and 18·5% occurred during both stages. Wrong dose, wrong medication and omission were the issues that dominated. Knowledge-based errors characterised the errors that occurred in prescribing and administration. The highest proportion of slips (79·5%) and lapses (76·1%) occurred during medication administration. It is likely that some of the errors occurred due to the lack of adherence to safety protocols. Technology such as computerised prescribing, barcode medication administration and reminder systems could potentially decrease the medication errors in the emergency department. There was a possibility that some of the errors could be prevented if safety protocols were adhered to, which highlights the need to also address clinicians' attitudes towards safety. Technology can be implemented to help minimise errors in the ED, but this must be coupled with efforts to enhance the culture of safety. © 2017 John Wiley & Sons Ltd.
Morbi, Abigail H M; Hamady, Mohamad S; Riga, Celia V; Kashef, Elika; Pearch, Ben J; Vincent, Charles; Moorthy, Krishna; Vats, Amit; Cheshire, Nicholas J W; Bicknell, Colin D
2012-08-01
To determine the type and frequency of errors during vascular interventional radiology (VIR) and design and implement an intervention to reduce error and improve efficiency in this setting. Ethical guidance was sought from the Research Services Department at Imperial College London. Informed consent was not obtained. Field notes were recorded during 55 VIR procedures by a single observer. Two blinded assessors identified failures from field notes and categorized them into one or more errors by using a 22-part classification system. The potential to cause harm, disruption to procedural flow, and preventability of each failure was determined. A preprocedural team rehearsal (PPTR) was then designed and implemented to target frequent preventable potential failures. Thirty-three procedures were observed subsequently to determine the efficacy of the PPTR. Nonparametric statistical analysis was used to determine the effect of intervention on potential failure rates, potential to cause harm and procedural flow disruption scores (Mann-Whitney U test), and number of preventable failures (Fisher exact test). Before intervention, 1197 potential failures were recorded, of which 54.6% were preventable. A total of 2040 errors were deemed to have occurred to produce these failures. Planning error (19.7%), staff absence (16.2%), equipment unavailability (12.2%), communication error (11.2%), and lack of safety consciousness (6.1%) were the most frequent errors, accounting for 65.4% of the total. After intervention, 352 potential failures were recorded. Classification resulted in 477 errors. Preventable failures decreased from 54.6% to 27.3% (P < .001) with implementation of PPTR. Potential failure rates per hour decreased from 18.8 to 9.2 (P < .001), with no increase in potential to cause harm or procedural flow disruption per failure. Failures during VIR procedures are largely because of ineffective planning, communication error, and equipment difficulties, rather than a result of technical or patient-related issues. Many of these potential failures are preventable. A PPTR is an effective means of targeting frequent preventable failures, reducing procedural delays and improving patient safety.
Underwater target classification using wavelet packets and neural networks.
Azimi-Sadjadi, M R; Yao, D; Huang, Q; Dobeck, G J
2000-01-01
In this paper, a new subband-based classification scheme is developed for classifying underwater mines and mine-like targets from the acoustic backscattered signals. The system consists of a feature extractor using wavelet packets in conjunction with linear predictive coding (LPC), a feature selection scheme, and a backpropagation neural-network classifier. The data set used for this study consists of the backscattered signals from six different objects: two mine-like targets and four nontargets for several aspect angles. Simulation results on ten different noisy realizations and for signal-to-noise ratio (SNR) of 12 dB are presented. The receiver operating characteristic (ROC) curve of the classifier generated based on these results demonstrated excellent classification performance of the system. The generalization ability of the trained network was demonstrated by computing the error and classification rate statistics on a large data set. A multiaspect fusion scheme was also adopted in order to further improve the classification performance.
NASA Technical Reports Server (NTRS)
Alexander, Tiffaney Miller
2017-01-01
Research results have shown that more than half of aviation, aerospace and aeronautics mishaps/incidents are attributed to human error. As a part of Safety within space exploration ground processing operations, the underlying contributors and causes of human error must be identified and/or classified in order to manage human error. This research provides a framework and methodology using the Human Error Assessment and Reduction Technique (HEART) and Human Factor Analysis and Classification System (HFACS), as an analysis tool to identify contributing factors, their impact on human error events, and predict the Human Error probabilities (HEPs) of future occurrences. This research methodology was applied (retrospectively) to six (6) NASA ground processing operations scenarios and thirty (30) years of Launch Vehicle related mishap data. This modifiable framework can be used and followed by other space and similar complex operations.
NASA Technical Reports Server (NTRS)
Alexander, Tiffaney Miller
2017-01-01
Research results have shown that more than half of aviation, aerospace and aeronautics mishaps/incidents are attributed to human error. As a part of Quality within space exploration ground processing operations, the underlying contributors and causes of human error must be identified and/or classified in order to manage human error. This presentation will provide a framework and methodology using the Human Error Assessment and Reduction Technique (HEART) and Human Factor Analysis and Classification System (HFACS), as an analysis tool to identify contributing factors, their impact on human error events, and predict the Human Error probabilities (HEPs) of future occurrences. This research methodology was applied (retrospectively) to six (6) NASA ground processing operations scenarios and thirty (30) years of Launch Vehicle related mishap data. This modifiable framework can be used and followed by other space and similar complex operations.
Masked and unmasked error-related potentials during continuous control and feedback
NASA Astrophysics Data System (ADS)
Lopes Dias, Catarina; Sburlea, Andreea I.; Müller-Putz, Gernot R.
2018-06-01
The detection of error-related potentials (ErrPs) in tasks with discrete feedback is well established in the brain–computer interface (BCI) field. However, the decoding of ErrPs in tasks with continuous feedback is still in its early stages. Objective. We developed a task in which subjects have continuous control of a cursor’s position by means of a joystick. The cursor’s position was shown to the participants in two different modalities of continuous feedback: normal and jittered. The jittered feedback was created to mimic the instability that could exist if participants controlled the trajectory directly with brain signals. Approach. This paper studies the electroencephalographic (EEG)-measurable signatures caused by a loss of control over the cursor’s trajectory, causing a target miss. Main results. In both feedback modalities, time-locked potentials revealed the typical frontal-central components of error-related potentials. Errors occurring during the jittered feedback (masked errors) were delayed in comparison to errors occurring during normal feedback (unmasked errors). Masked errors displayed lower peak amplitudes than unmasked errors. Time-locked classification analysis allowed a good distinction between correct and error classes (average Cohen's kappa, average TPR = 81.8% and average TNR = 96.4%). Time-locked classification analysis between masked error and unmasked error classes revealed results at chance level (average Cohen's kappa, average TPR = 60.9% and average TNR = 58.3%). Afterwards, we performed asynchronous detection of ErrPs, combining both masked and unmasked trials. The asynchronous detection of ErrPs in a simulated online scenario resulted in an average TNR of 84.0% and in an average TPR of 64.9%. Significance. The time-locked classification results suggest that the masked and unmasked errors were indistinguishable in terms of classification. The asynchronous classification results suggest that the feedback modality did not hinder the asynchronous detection of ErrPs.
Statistical sensor fusion of ECG data using automotive-grade sensors
NASA Astrophysics Data System (ADS)
Koenig, A.; Rehg, T.; Rasshofer, R.
2015-11-01
Driver states such as fatigue, stress, aggression, distraction or even medical emergencies continue to lead to severe driving mistakes and promote accidents. A pathway towards improving driver state assessment can be found in psycho-physiological measures that directly quantify the driver's state from physiological recordings. Although heart rate is a well-established physiological variable that reflects cognitive stress, obtaining heart rate contactless and reliably is a challenging task in an automotive environment. Our aim was to investigate how sensory fusion of two automotive-grade sensors would influence the accuracy of automatic classification of cognitive stress levels. We induced cognitive stress in subjects and estimated levels from their heart rate signals, acquired from automotive-ready ECG sensors. Using signal quality indices and Kalman filters, we were able to decrease the Root Mean Squared Error (RMSE) of heart rate recordings by 10 beats per minute. We then trained a neural network to classify the cognitive workload state of subjects from heart rate and compared classification performance for ground truth, the individual sensors and the fused heart rate signal. Fusing the signals yielded 5% more correct classifications than the individual sensors, staying only 4% below the maximum classification accuracy achievable from ground truth. These results are a first step towards real-world applications of psycho-physiological measurements in vehicle settings. Future implementations of driver state modeling will be able to draw from a larger pool of data sources, such as additional physiological values or vehicle-related data, which can be expected to drive classification accuracy to significantly higher values.
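A minimal sketch of the sensor-fusion idea follows, assuming a scalar random-walk model for the true heart rate and fixed per-sensor measurement variances standing in for the signal-quality indices; all numbers are synthetic.

```python
# Sketch: fusing two noisy heart-rate streams with a scalar Kalman filter,
# assuming a random-walk model for the true heart rate and per-sensor
# measurement variances (hypothetical stand-ins for signal-quality weighting).
import numpy as np

rng = np.random.default_rng(2)
true_hr = 70 + np.cumsum(rng.normal(0, 0.3, size=300))   # slowly drifting heart rate
sensor_a = true_hr + rng.normal(0, 5.0, size=300)        # noisier ECG sensor
sensor_b = true_hr + rng.normal(0, 3.0, size=300)

q, r_a, r_b = 0.3**2, 5.0**2, 3.0**2                      # process / measurement variances
x, p = sensor_a[0], 10.0                                  # state estimate and its variance
fused = []
for za, zb in zip(sensor_a, sensor_b):
    p += q                                                # predict (random walk)
    for z, r in ((za, r_a), (zb, r_b)):                   # sequential measurement updates
        k = p / (p + r)
        x += k * (z - x)
        p *= (1 - k)
    fused.append(x)

rmse = lambda est: np.sqrt(np.mean((np.asarray(est) - true_hr) ** 2))
print(f"RMSE A {rmse(sensor_a):.2f}, B {rmse(sensor_b):.2f}, fused {rmse(fused):.2f}")
```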
A Generic Deep-Learning-Based Approach for Automated Surface Inspection.
Ren, Ruoxu; Hung, Terence; Tan, Kay Chen
2018-03-01
Automated surface inspection (ASI) is a challenging task in industry, as collecting a training dataset is usually costly and related methods are highly dataset-dependent. In this paper, a generic approach that requires only a small amount of training data for ASI is proposed. First, this approach builds a classifier on the features of image patches, where the features are transferred from a pretrained deep learning network. Next, pixel-wise prediction is obtained by convolving the trained classifier over the input image. Experiments are carried out on three public data sets and one industrial data set. The experiments involve two tasks: 1) image classification and 2) defect segmentation. The results of the proposed algorithm are compared against the best benchmarks in the literature. In the classification tasks, the proposed method improves accuracy by 0.66%-25.50%. In the segmentation tasks, the proposed method reduces error escape rates by 6.00%-19.00% in three defect types and improves accuracies by 2.29%-9.86% in all seven defect types. In addition, the proposed method achieves a 0.0% error escape rate in the segmentation task on the industrial data.
Jang, Hojin; Plis, Sergey M.; Calhoun, Vince D.; Lee, Jong-Hwan
2016-01-01
Feedforward deep neural networks (DNN), artificial neural networks with multiple hidden layers, have recently demonstrated a record-breaking performance in multiple areas of applications in computer vision and speech processing. Following the success, DNNs have been applied to neuroimaging modalities including functional/structural magnetic resonance imaging (MRI) and positron-emission tomography data. However, no study has explicitly applied DNNs to 3D whole-brain fMRI volumes and thereby extracted hidden volumetric representations of fMRI that are discriminative for a task performed as the fMRI volume was acquired. Our study applied fully connected feedforward DNN to fMRI volumes collected in four sensorimotor tasks (i.e., left-hand clenching, right-hand clenching, auditory attention, and visual stimulus) undertaken by 12 healthy participants. Using a leave-one-subject-out cross-validation scheme, a restricted Boltzmann machine-based deep belief network was pretrained and used to initialize weights of the DNN. The pretrained DNN was fine-tuned while systematically controlling weight-sparsity levels across hidden layers. Optimal weight-sparsity levels were determined from a minimum validation error rate of fMRI volume classification. Minimum error rates (mean ± standard deviation; %) of 6.9 (± 3.8) were obtained from the three-layer DNN with the sparsest condition of weights across the three hidden layers. These error rates were even lower than the error rates from the single-layer network (9.4 ± 4.6) and the two-layer network (7.4 ± 4.1). The estimated DNN weights showed spatial patterns that are remarkably task-specific, particularly in the higher layers. The output values of the third hidden layer represented distinct patterns/codes of the 3D whole-brain fMRI volume and encoded the information of the tasks as evaluated from representational similarity analysis. Our reported findings show the ability of the DNN to classify a single fMRI volume based on the extraction of hidden representations of fMRI volumes associated with tasks across multiple hidden layers. Our study may be beneficial to the automatic classification/diagnosis of neuropsychiatric and neurological diseases and prediction of disease severity and recovery in (pre-) clinical settings using fMRI volumes without requiring an estimation of activation patterns or ad hoc statistical evaluation. PMID:27079534
Jang, Hojin; Plis, Sergey M; Calhoun, Vince D; Lee, Jong-Hwan
2017-01-15
Feedforward deep neural networks (DNNs), artificial neural networks with multiple hidden layers, have recently demonstrated a record-breaking performance in multiple areas of applications in computer vision and speech processing. Following the success, DNNs have been applied to neuroimaging modalities including functional/structural magnetic resonance imaging (MRI) and positron-emission tomography data. However, no study has explicitly applied DNNs to 3D whole-brain fMRI volumes and thereby extracted hidden volumetric representations of fMRI that are discriminative for a task performed as the fMRI volume was acquired. Our study applied fully connected feedforward DNN to fMRI volumes collected in four sensorimotor tasks (i.e., left-hand clenching, right-hand clenching, auditory attention, and visual stimulus) undertaken by 12 healthy participants. Using a leave-one-subject-out cross-validation scheme, a restricted Boltzmann machine-based deep belief network was pretrained and used to initialize weights of the DNN. The pretrained DNN was fine-tuned while systematically controlling weight-sparsity levels across hidden layers. Optimal weight-sparsity levels were determined from a minimum validation error rate of fMRI volume classification. Minimum error rates (mean±standard deviation; %) of 6.9 (±3.8) were obtained from the three-layer DNN with the sparsest condition of weights across the three hidden layers. These error rates were even lower than the error rates from the single-layer network (9.4±4.6) and the two-layer network (7.4±4.1). The estimated DNN weights showed spatial patterns that are remarkably task-specific, particularly in the higher layers. The output values of the third hidden layer represented distinct patterns/codes of the 3D whole-brain fMRI volume and encoded the information of the tasks as evaluated from representational similarity analysis. Our reported findings show the ability of the DNN to classify a single fMRI volume based on the extraction of hidden representations of fMRI volumes associated with tasks across multiple hidden layers. Our study may be beneficial to the automatic classification/diagnosis of neuropsychiatric and neurological diseases and prediction of disease severity and recovery in (pre-) clinical settings using fMRI volumes without requiring an estimation of activation patterns or ad hoc statistical evaluation. Copyright © 2016 Elsevier Inc. All rights reserved.
Data quality in a DRG-based information system.
Colin, C; Ecochard, R; Delahaye, F; Landrivon, G; Messy, P; Morgon, E; Matillon, Y
1994-09-01
The aim of this study initiated in May 1990 was to evaluate the quality of the medical data collected from the main hospital of the "Hospices Civils de Lyon", Edouard Herriot Hospital. We studied a random sample of 593 discharge abstracts from 12 wards of the hospital. Quality control was performed by checking multi-hospitalized patients' personal data, checking that each discharge abstract was exhaustive, examining the quality of abstracting, studying diagnoses and medical procedures coding, and checking data entry. Assessment of personal data showed a 4.4% error rate. It was mainly accounted for by spelling mistakes in surnames and first names, and mistakes in dates of birth. The quality of a discharge abstract was estimated according to the two purposes of the medical information system: description of hospital morbidity per patient and Diagnosis Related Group's case mix. Error rates in discharge abstracts were expressed in two ways: an overall rate for errors of concordance between Discharge Abstracts and Medical Records, and a specific rate for errors modifying classification in Diagnosis Related Groups (DRG). For abstracting medical information, these error rates were 11.5% (SE +/- 2.2) and 7.5% (SE +/- 1.9) respectively. For coding diagnoses and procedures, they were 11.4% (SE +/- 1.5) and 1.3% (SE +/- 0.5) respectively. For data entry on the computerized data base, the error rate was 2% (SE +/- 0.5) and 0.2% (SE +/- 0.05). Quality control must be performed regularly because it demonstrates the degree of participation from health care teams and the coherence of the database.(ABSTRACT TRUNCATED AT 250 WORDS)
Benchmark data on the separability among crops in the southern San Joaquin Valley of California
NASA Technical Reports Server (NTRS)
Morse, A.; Card, D. H.
1984-01-01
Landsat MSS data were input to a discriminant analysis of 21 crops on each of eight dates in 1979 using a total of 4,142 fields in southern Fresno County, California. The 21 crops, which together account for over 70 percent of the agricultural acreage in the southern San Joaquin Valley, were analyzed to quantify the spectral separability, defined as omission error, between all pairs of crops. On each date the fields were segregated into six groups based on the mean value of the MSS7/MSS5 ratio, which is correlated with green biomass. Discriminant analysis was run on each group on each date. The resulting contingency tables offer information that can be profitably used in conjunction with crop calendars to pick the best dates for a classification. The tables show expected percent correct classification and error rates for all the crops. The patterns in the contingency tables show that the percent correct classification for crops generally increases with the amount of greenness in the fields being classified. However, there are exceptions to this general rule, notably grain.
Multicategory nets of single-layer perceptrons: complexity and sample-size issues.
Raudys, Sarunas; Kybartas, Rimantas; Zavadskas, Edmundas Kazimieras
2010-05-01
The standard cost function of multicategory single-layer perceptrons (SLPs) does not minimize the classification error rate. In order to reduce classification error, it is necessary to: 1) reject the traditional cost function, 2) obtain near-optimal pairwise linear classifiers by specially organized SLP training and optimal stopping, and 3) fuse their decisions properly. To obtain better classification in unbalanced training set situations, we introduce an unbalance-correcting term. It was found that fusion based on the Kullback-Leibler (K-L) distance and the Wu-Lin-Weng (WLW) method result in approximately the same performance in situations where sample sizes are relatively small. The explanation for this observation lies in the theoretically known fact that excessive minimization of inexact criteria can become harmful. Comprehensive comparative investigations of six real-world pattern recognition (PR) problems demonstrated that SLP-based pairwise classifiers are comparable to, and as often as not outperform, linear support vector (SV) classifiers in moderate-dimensional situations. The colored noise injection used to design pseudovalidation sets proves to be a powerful tool for mitigating finite sample problems in moderate-dimensional PR tasks.
Kopps, Anna M; Kang, Jungkoo; Sherwin, William B; Palsbøll, Per J
2015-06-30
Kinship analyses are important pillars of ecological and conservation genetic studies with potentially far-reaching implications. There is a need for power analyses that address a range of possible relationships. Nevertheless, such analyses are rarely applied, and studies that use genetic-data-based-kinship inference often ignore the influence of intrinsic population characteristics. We investigated 11 questions regarding the correct classification rate of dyads to relatedness categories (relatedness category assignments; RCA) using an individual-based model with realistic life history parameters. We investigated the effects of the number of genetic markers; marker type (microsatellite, single nucleotide polymorphism SNP, or both); minor allele frequency; typing error; mating system; and the number of overlapping generations under different demographic conditions. We found that (i) an increasing number of genetic markers increased the correct classification rate of the RCA so that up to >80% first cousins can be correctly assigned; (ii) the minimum number of genetic markers required for assignments with 80 and 95% correct classifications differed between relatedness categories, mating systems, and the number of overlapping generations; (iii) the correct classification rate was improved by adding additional relatedness categories and age and mitochondrial DNA data; and (iv) a combination of microsatellite and single-nucleotide polymorphism data increased the correct classification rate if <800 SNP loci were available. This study shows how intrinsic population characteristics, such as mating system and the number of overlapping generations, life history traits, and genetic marker characteristics, can influence the correct classification rate of an RCA study. Therefore, species-specific power analyses are essential for empirical studies. Copyright © 2015 Kopps et al.
Statistical approaches to account for false-positive errors in environmental DNA samples.
Lahoz-Monfort, José J; Guillera-Arroita, Gurutzeta; Tingley, Reid
2016-05-01
Environmental DNA (eDNA) sampling is prone to both false-positive and false-negative errors. We review statistical methods to account for such errors in the analysis of eDNA data and use simulations to compare the performance of different modelling approaches. Our simulations illustrate that even low false-positive rates can produce biased estimates of occupancy and detectability. We further show that removing or classifying single PCR detections in an ad hoc manner under the suspicion that such records represent false positives, as sometimes advocated in the eDNA literature, also results in biased estimation of occupancy, detectability and false-positive rates. We advocate alternative approaches to account for false-positive errors that rely on prior information, or the collection of ancillary detection data at a subset of sites using a sampling method that is not prone to false-positive errors. We illustrate the advantages of these approaches over ad hoc classifications of detections and provide practical advice and code for fitting these models in maximum likelihood and Bayesian frameworks. Given the severe bias induced by false-negative and false-positive errors, the methods presented here should be more routinely adopted in eDNA studies. © 2015 John Wiley & Sons Ltd.
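One standard formulation of a site-occupancy model with false-positive detections, fitted by maximum likelihood, is sketched below; it is not necessarily the exact model of the reviewed studies, and the data are simulated. The comment on identifiability echoes the authors' point about prior information and ancillary detection data.

```python
# Sketch: occupancy model with false positives. A site is occupied with
# probability psi; each of K replicate eDNA samples detects with probability
# p11 if occupied and p10 (false positive) if not. Without constraints, prior
# information, or ancillary data the two mixture components can be weakly
# identified, so the starting point below assumes a small false-positive rate.
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

def neg_log_lik(params, detections, K):
    psi, p11, p10 = expit(params)                 # keep probabilities in (0, 1)
    y = detections                                # detections per site (0..K)
    lik_occ = p11**y * (1 - p11)**(K - y)
    lik_unocc = p10**y * (1 - p10)**(K - y)
    return -np.sum(np.log(psi * lik_occ + (1 - psi) * lik_unocc))

rng = np.random.default_rng(3)
K, n_sites = 5, 200
occupied = rng.random(n_sites) < 0.4
p_det = np.where(occupied, 0.6, 0.05)             # true p11 = 0.6, p10 = 0.05
detections = rng.binomial(K, p_det)

fit = minimize(neg_log_lik, x0=np.array([0.0, 0.0, -2.0]),
               args=(detections, K), method="Nelder-Mead")
print("estimated psi, p11, p10:", np.round(expit(fit.x), 3))
```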
On the use of interaction error potentials for adaptive brain computer interfaces.
Llera, A; van Gerven, M A J; Gómez, V; Jensen, O; Kappen, H J
2011-12-01
We propose an adaptive classification method for the Brain Computer Interfaces (BCI) which uses Interaction Error Potentials (IErrPs) as a reinforcement signal and adapts the classifier parameters when an error is detected. We analyze the quality of the proposed approach in relation to the misclassification of the IErrPs. In addition we compare static versus adaptive classification performance using artificial and MEG data. We show that the proposed adaptive framework significantly improves the static classification methods. Copyright © 2011 Elsevier Ltd. All rights reserved.
A neural network for noise correlation classification
NASA Astrophysics Data System (ADS)
Paitz, Patrick; Gokhberg, Alexey; Fichtner, Andreas
2018-02-01
We present an artificial neural network (ANN) for the classification of ambient seismic noise correlations into two categories, suitable and unsuitable for noise tomography. By using only a small manually classified data subset for network training, the ANN allows us to classify large data volumes with low human effort and to encode the valuable subjective experience of data analysts that cannot be captured by a deterministic algorithm. Based on a new feature extraction procedure that exploits the wavelet-like nature of seismic time-series, we efficiently reduce the dimensionality of noise correlation data, still keeping relevant features needed for automated classification. Using global- and regional-scale data sets, we show that classification errors of 20 per cent or less can be achieved when the network training is performed with as little as 3.5 per cent and 16 per cent of the data sets, respectively. Furthermore, the ANN trained on the regional data can be applied to the global data, and vice versa, without a significant increase of the classification error. An experiment in which four students manually classified the data revealed that the classification error they would assign to each other is substantially larger than the classification error of the ANN (>35 per cent). This indicates that reproducibility would be hampered more by human subjectivity than by imperfections of the ANN.
Microscopic saw mark analysis: an empirical approach.
Love, Jennifer C; Derrick, Sharon M; Wiersema, Jason M; Peters, Charles
2015-01-01
Microscopic saw mark analysis is a well-published and generally accepted qualitative analytical method. However, little research has focused on identifying and mitigating potential sources of error associated with the method. The presented study proposes the use of classification trees and random forest classifiers as an optimal, statistically sound approach to mitigate the potential for error arising from variability and to characterize outcome error in microscopic saw mark analysis. The statistical model was applied to 58 experimental saw marks created with four types of saws. The saw marks were made in fresh human femurs obtained through anatomical gift and were analyzed using a Keyence digital microscope. The statistical approach weighed the variables based on discriminatory value and produced decision trees with an associated outcome error rate of 8.62-17.82%. © 2014 American Academy of Forensic Sciences.
Analyzing thematic maps and mapping for accuracy
Rosenfield, G.H.
1982-01-01
Two problems which exist while attempting to test the accuracy of thematic maps and mapping are: (1) evaluating the accuracy of thematic content, and (2) evaluating the effects of the variables on thematic mapping. Statistical analysis techniques are applicable to both these problems and include techniques for sampling the data and determining their accuracy. In addition, techniques for hypothesis testing, or inferential statistics, are used when comparing the effects of variables. A comprehensive and valid accuracy test of a classification project, such as thematic mapping from remotely sensed data, includes the following components of statistical analysis: (1) sample design, including the sample distribution, sample size, size of the sample unit, and sampling procedure; and (2) accuracy estimation, including estimation of the variance and confidence limits. Careful consideration must be given to the minimum sample size necessary to validate the accuracy of a given classification category. The results of an accuracy test are presented in a contingency table sometimes called a classification error matrix. Usually the rows represent the interpretation, and the columns represent the verification. The diagonal elements represent the correct classifications. The remaining elements of the rows represent errors by commission, and the remaining elements of the columns represent the errors of omission. For tests of hypothesis that compare variables, the general practice has been to use only the diagonal elements from several related classification error matrices. These data are arranged in the form of another contingency table. The columns of the table represent the different variables being compared, such as different scales of mapping. The rows represent the blocking characteristics, such as the various categories of classification. The values in the cells of the tables might be the counts of correct classification or the binomial proportions of these counts divided by either the row totals or the column totals from the original classification error matrices. In hypothesis testing, when the results of tests of multiple sample cases prove to be significant, some form of statistical test must be used to separate any results that differ significantly from the others. In the past, many analyses of the data in this error matrix were made by comparing the relative magnitudes of the percentage of correct classifications, for either individual categories, the entire map or both. More rigorous analyses have used data transformations and (or) two-way classification analysis of variance. A more sophisticated step of data analysis techniques would be to use the entire classification error matrices using the methods of discrete multivariate analysis or of multivariate analysis of variance.
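The basic error-matrix summaries described here can be sketched in a few lines; the counts are invented, and the row/column convention follows the text above (rows = interpretation, columns = verification).

```python
# Sketch: overall accuracy and per-category commission/omission error rates
# from a classification error matrix (rows = interpreted, columns = verified).
import numpy as np

matrix = np.array([[120,  8,  4],
                   [ 10, 95,  7],
                   [  6,  9, 88]], dtype=float)

overall_accuracy = np.trace(matrix) / matrix.sum()
commission_error = 1 - np.diag(matrix) / matrix.sum(axis=1)   # off-diagonal row share
omission_error = 1 - np.diag(matrix) / matrix.sum(axis=0)     # off-diagonal column share
print(overall_accuracy, commission_error, omission_error)
```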
Analysis of DSN software anomalies
NASA Technical Reports Server (NTRS)
Galorath, D. D.; Hecht, H.; Hecht, M.; Reifer, D. J.
1981-01-01
A categorized data base of software errors which were discovered during the various stages of development and operational use of the Deep Space Network DSN/Mark 3 System was developed. A study team identified several existing error classification schemes (taxonomies), prepared a detailed annotated bibliography of the error taxonomy literature, and produced a new classification scheme which was tuned to the DSN anomaly reporting system and encapsulated the work of others. Based upon the DSN/RCI error taxonomy, error data on approximately 1000 reported DSN/Mark 3 anomalies were analyzed, interpreted and classified. Next, the error data were summarized and histograms were produced highlighting key tendencies.
Neyman-Pearson classification algorithms and NP receiver operating characteristics
Tong, Xin; Feng, Yang; Li, Jingyi Jessica
2018-01-01
In many binary classification applications, such as disease diagnosis and spam detection, practitioners commonly face the need to limit type I error (that is, the conditional probability of misclassifying a class 0 observation as class 1) so that it remains below a desired threshold. To address this need, the Neyman-Pearson (NP) classification paradigm is a natural choice; it minimizes type II error (that is, the conditional probability of misclassifying a class 1 observation as class 0) while enforcing an upper bound, α, on the type I error. Despite its century-long history in hypothesis testing, the NP paradigm has not been well recognized and implemented in classification schemes. Common practices that directly limit the empirical type I error to no more than α do not satisfy the type I error control objective because the resulting classifiers are likely to have type I errors much larger than α, and the NP paradigm has not been properly implemented in practice. We develop the first umbrella algorithm that implements the NP paradigm for all scoring-type classification methods, such as logistic regression, support vector machines, and random forests. Powered by this algorithm, we propose a novel graphical tool for NP classification methods: NP receiver operating characteristic (NP-ROC) bands motivated by the popular ROC curves. NP-ROC bands will help choose α in a data-adaptive way and compare different NP classifiers. We demonstrate the use and properties of the NP umbrella algorithm and NP-ROC bands, available in the R package nproc, through simulation and real data studies. PMID:29423442
Neyman-Pearson classification algorithms and NP receiver operating characteristics.
Tong, Xin; Feng, Yang; Li, Jingyi Jessica
2018-02-01
In many binary classification applications, such as disease diagnosis and spam detection, practitioners commonly face the need to limit type I error (that is, the conditional probability of misclassifying a class 0 observation as class 1) so that it remains below a desired threshold. To address this need, the Neyman-Pearson (NP) classification paradigm is a natural choice; it minimizes type II error (that is, the conditional probability of misclassifying a class 1 observation as class 0) while enforcing an upper bound, α, on the type I error. Despite its century-long history in hypothesis testing, the NP paradigm has not been well recognized and implemented in classification schemes. Common practices that directly limit the empirical type I error to no more than α do not satisfy the type I error control objective because the resulting classifiers are likely to have type I errors much larger than α, and the NP paradigm has not been properly implemented in practice. We develop the first umbrella algorithm that implements the NP paradigm for all scoring-type classification methods, such as logistic regression, support vector machines, and random forests. Powered by this algorithm, we propose a novel graphical tool for NP classification methods: NP receiver operating characteristic (NP-ROC) bands motivated by the popular ROC curves. NP-ROC bands will help choose α in a data-adaptive way and compare different NP classifiers. We demonstrate the use and properties of the NP umbrella algorithm and NP-ROC bands, available in the R package nproc, through simulation and real data studies.
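A simplified sketch of the order-statistic threshold rule that, on our reading, underlies the NP umbrella algorithm follows (the authors' reference implementation is the R package nproc); the scores and parameter values are placeholders.

```python
# Sketch of an NP-style threshold choice: among the sorted held-out class-0
# scores, pick the smallest order statistic whose probability of yielding
# type I error > alpha is at most a tolerance delta. Scores can come from any
# base classifier; higher scores indicate class 1.
import numpy as np
from scipy.stats import binom

def np_threshold(class0_scores, alpha=0.05, delta=0.05):
    s = np.sort(class0_scores)
    n = len(s)
    for k in range(1, n + 1):
        # violation probability of the k-th smallest score as threshold
        # equals P(Binomial(n, 1 - alpha) >= k)
        if binom.sf(k - 1, n, 1 - alpha) <= delta:
            return s[k - 1]
    raise ValueError("not enough class-0 scores to guarantee the bound")

rng = np.random.default_rng(4)
scores0 = rng.normal(0, 1, size=500)      # classifier scores on held-out class-0 data
t = np_threshold(scores0, alpha=0.05, delta=0.05)
print("threshold:", round(t, 3), "empirical type I error:", np.mean(scores0 > t))
```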
Men, Hong; Fu, Songlin; Yang, Jialin; Cheng, Meiqi; Shi, Yan; Liu, Jingjing
2018-01-18
Paraffin odor intensity is an important quality indicator when a paraffin inspection is performed. Currently, paraffin odor level assessment is mainly dependent on an artificial sensory evaluation. In this paper, we developed a paraffin odor analysis system to classify and grade four kinds of paraffin samples. The original feature set was optimized using Principal Component Analysis (PCA) and Partial Least Squares (PLS). Support Vector Machine (SVM), Random Forest (RF), and Extreme Learning Machine (ELM) were applied to three different feature data sets for classification and level assessment of paraffin. For classification, the model based on SVM, with an accuracy rate of 100%, was superior to that based on RF, with an accuracy rate of 98.33-100%, and ELM, with an accuracy rate of 98.01-100%. For level assessment, the R² related to the training set was above 0.97 and the R² related to the test set was above 0.87. Through comprehensive comparison, the generalization of the model based on ELM was superior to those based on SVM and RF. The scoring errors for the three models were 0.0016-0.3494, lower than the error of 0.5-1.0 measured by industry standard experts, meaning these methods have a higher prediction accuracy for scoring paraffin level.
Automated Identification of Abnormal Adult EEGs
López, S.; Suarez, G.; Jungreis, D.; Obeid, I.; Picone, J.
2016-01-01
The interpretation of electroencephalograms (EEGs) is a process that is still dependent on the subjective analysis of the examiners. Though interrater agreement on critical events such as seizures is high, it is much lower on subtler events (e.g., when there are benign variants). The process used by an expert to interpret an EEG is quite subjective and hard to replicate by machine. The performance of machine learning technology is far from human performance. We have been developing an interpretation system, AutoEEG, with a goal of exceeding human performance on this task. In this work, we are focusing on one of the early decisions made in this process – whether an EEG is normal or abnormal. We explore two baseline classification algorithms: k-Nearest Neighbor (kNN) and Random Forest Ensemble Learning (RF). A subset of the TUH EEG Corpus was used to evaluate performance. Principal Components Analysis (PCA) was used to reduce the dimensionality of the data. kNN achieved a 41.8% detection error rate while RF achieved an error rate of 31.7%. These error rates are significantly lower than those obtained by random guessing based on priors (49.5%). The majority of the errors were related to misclassification of normal EEGs. PMID:27195311
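The two baselines can be sketched with a standard pipeline; the feature vectors and labels below are synthetic placeholders, and the PCA dimensionality is an assumption.

```python
# Sketch of the two baselines described: PCA for dimensionality reduction,
# then k-nearest-neighbour and random-forest classification of normal vs
# abnormal EEGs. Feature vectors stand in for real per-recording EEG features.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
X = rng.normal(size=(600, 300))           # placeholder per-recording feature vectors
y = rng.integers(0, 2, size=600)          # 0 = normal, 1 = abnormal (placeholder labels)

for name, clf in [("kNN", KNeighborsClassifier(n_neighbors=5)),
                  ("RF", RandomForestClassifier(n_estimators=200, random_state=0))]:
    pipe = make_pipeline(PCA(n_components=40), clf)
    err = 1 - cross_val_score(pipe, X, y, cv=5).mean()
    print(f"{name} detection error rate: {err:.3f}")
```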
Evaluation of drug administration errors in a teaching hospital
2012-01-01
Background Medication errors can occur at any of the three steps of the medication use process: prescribing, dispensing and administration. We aimed to determine the incidence, type and clinical importance of drug administration errors and to identify risk factors. Methods Prospective study based on disguised observation technique in four wards in a teaching hospital in Paris, France (800 beds). A pharmacist accompanied nurses and witnessed the preparation and administration of drugs to all patients during the three drug rounds on each of six days per ward. Main outcomes were number, type and clinical importance of errors and associated risk factors. Drug administration error rate was calculated with and without wrong time errors. Relationships between the occurrence of errors and potential risk factors were investigated using logistic regression models with random effects. Results Twenty-eight nurses caring for 108 patients were observed. Among 1501 opportunities for error, 415 administrations (430 errors) with one or more errors were detected (27.6%). There were 312 wrong time errors, ten simultaneously with another type of error, resulting in an error rate without wrong time error of 7.5% (113/1501). The most frequently administered drugs were the cardiovascular drugs (425/1501, 28.3%). The highest risks of error in a drug administration were for dermatological drugs. No potentially life-threatening errors were witnessed and 6% of errors were classified as having a serious or significant impact on patients (mainly omission). In multivariate analysis, the occurrence of errors was associated with drug administration route, drug classification (ATC) and the number of patients under the nurse's care. Conclusion Medication administration errors are frequent. The identification of their determinants helps to design targeted interventions. PMID:22409837
Evaluation of drug administration errors in a teaching hospital.
Berdot, Sarah; Sabatier, Brigitte; Gillaizeau, Florence; Caruba, Thibaut; Prognon, Patrice; Durieux, Pierre
2012-03-12
Medication errors can occur at any of the three steps of the medication use process: prescribing, dispensing and administration. We aimed to determine the incidence, type and clinical importance of drug administration errors and to identify risk factors. Prospective study based on disguised observation technique in four wards in a teaching hospital in Paris, France (800 beds). A pharmacist accompanied nurses and witnessed the preparation and administration of drugs to all patients during the three drug rounds on each of six days per ward. Main outcomes were number, type and clinical importance of errors and associated risk factors. Drug administration error rate was calculated with and without wrong time errors. Relationships between the occurrence of errors and potential risk factors were investigated using logistic regression models with random effects. Twenty-eight nurses caring for 108 patients were observed. Among 1501 opportunities for error, 415 administrations (430 errors) with one or more errors were detected (27.6%). There were 312 wrong time errors, ten simultaneously with another type of error, resulting in an error rate without wrong time error of 7.5% (113/1501). The most frequently administered drugs were the cardiovascular drugs (425/1501, 28.3%). The highest risks of error in a drug administration were for dermatological drugs. No potentially life-threatening errors were witnessed and 6% of errors were classified as having a serious or significant impact on patients (mainly omission). In multivariate analysis, the occurrence of errors was associated with drug administration route, drug classification (ATC) and the number of patients under the nurse's care. Medication administration errors are frequent. The identification of their determinants helps to design targeted interventions.
Mocellin, Simone; Thompson, John F; Pasquali, Sandro; Montesco, Maria C; Pilati, Pierluigi; Nitti, Donato; Saw, Robyn P; Scolyer, Richard A; Stretch, Jonathan R; Rossi, Carlo R
2009-12-01
To improve selection for sentinel node (SN) biopsy (SNB) in patients with cutaneous melanoma using statistical models predicting SN status. About 80% of patients currently undergoing SNB are node negative. In the absence of conclusive evidence of an SNB-associated survival benefit, these patients may be over-treated. Here, we tested the efficiency of 4 different models in predicting SN status. The clinicopathologic data (age, gender, tumor thickness, Clark level, regression, ulceration, histologic subtype, and mitotic index) of 1132 melanoma patients who had undergone SNB at institutions in Italy and Australia were analyzed. Logistic regression, classification tree, random forest, and support vector machine models were fitted to the data. The predictive models were built with the aim of maximizing the negative predictive value (NPV) and reducing the rate of SNB procedures while minimizing the error rate. After cross-validation, the logistic regression, classification tree, random forest, and support vector machine predictive models obtained clinically relevant NPVs (93.6%, 94.0%, 97.1%, and 93.0%, respectively), SNB reductions (27.5%, 29.8%, 18.2%, and 30.1%, respectively), and error rates (1.8%, 1.8%, 0.5%, and 2.1%, respectively). Using commonly available clinicopathologic variables, predictive models can preoperatively identify a proportion of patients (approximately 25%) who might be spared SNB, with an acceptable (1%-2%) error. If validated in large prospective series, these models might be implemented in the clinical setting for improved patient selection, which ultimately would lead to better quality of life for patients and optimization of resource allocation for the health care system.
Bayesian learning for spatial filtering in an EEG-based brain-computer interface.
Zhang, Haihong; Yang, Huijuan; Guan, Cuntai
2013-07-01
Spatial filtering for EEG feature extraction and classification is an important tool in brain-computer interface. However, there is generally no established theory that links spatial filtering directly to Bayes classification error. To address this issue, this paper proposes and studies a Bayesian analysis theory for spatial filtering in relation to Bayes error. Following the maximum entropy principle, we introduce a gamma probability model for describing single-trial EEG power features. We then formulate and analyze the theoretical relationship between Bayes classification error and the so-called Rayleigh quotient, which is a function of spatial filters and basically measures the ratio in power features between two classes. This paper also reports our extensive study that examines the theory and its use in classification, using three publicly available EEG data sets and state-of-the-art spatial filtering techniques and various classifiers. Specifically, we validate the positive relationship between Bayes error and Rayleigh quotient in real EEG power features. Finally, we demonstrate that the Bayes error can be practically reduced by applying a new spatial filter with lower Rayleigh quotient.
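The Rayleigh quotient discussed here can be illustrated directly from two class covariance matrices; the covariances below are synthetic, and the generalized eigen-decomposition (as used in common spatial patterns) gives the filter with the smallest quotient.

```python
# Sketch: Rayleigh quotient of a spatial filter w, i.e., the ratio of EEG band
# power that w passes for class 1 versus class 2, computed from the two class
# covariance matrices. The filter minimising it is the smallest generalized
# eigenvector (the CSP solution). Covariances here are synthetic.
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(6)
A = rng.normal(size=(8, 8)); cov1 = A @ A.T + 8 * np.eye(8)   # class-1 covariance
B = rng.normal(size=(8, 8)); cov2 = B @ B.T + np.eye(8)       # class-2 covariance

def rayleigh_quotient(w, c1, c2):
    return (w @ c1 @ w) / (w @ c2 @ w)

# generalized eigen-decomposition: cov1 w = lambda * cov2 w
eigvals, eigvecs = eigh(cov1, cov2)
w_min = eigvecs[:, 0]                      # filter with the smallest Rayleigh quotient
print(rayleigh_quotient(w_min, cov1, cov2), eigvals[0])   # the two values coincide
```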
Sources of error in estimating truck traffic from automatic vehicle classification data
DOT National Transportation Integrated Search
1998-10-01
Truck annual average daily traffic estimation errors resulting from sample classification counts are computed in this paper under two scenarios. One scenario investigates an improper factoring procedure that may be used by highway agencies. The study...
Lyons-Weiler, James; Pelikan, Richard; Zeh, Herbert J; Whitcomb, David C; Malehorn, David E; Bigbee, William L; Hauskrecht, Milos
2005-01-01
Peptide profiles generated using SELDI/MALDI time of flight mass spectrometry provide a promising source of patient-specific information with high potential impact on the early detection and classification of cancer and other diseases. The new profiling technology comes, however, with numerous challenges and concerns. Particularly important are concerns of reproducibility of classification results and their significance. In this work we describe a computational validation framework, called PACE (Permutation-Achieved Classification Error), that lets us assess, for a given classification model, the significance of the Achieved Classification Error (ACE) on the profile data. The framework compares the performance statistic of the classifier on true data samples and checks if these are consistent with the behavior of the classifier on the same data with randomly reassigned class labels. A statistically significant ACE increases our belief that a discriminative signal was found in the data. The advantage of PACE analysis is that it can be easily combined with any classification model and is relatively easy to interpret. PACE analysis does not protect researchers against confounding in the experimental design, or other sources of systematic or random error. We use PACE analysis to assess significance of classification results we have achieved on a number of published data sets. The results show that many of these datasets indeed possess a signal that leads to a statistically significant ACE.
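A PACE-style significance check can be sketched as follows, with synthetic data standing in for peptide profiles and a linear SVM standing in for the classification model; the number of permutations and the planted signal are arbitrary.

```python
# Sketch: compare the achieved cross-validated classification error (ACE) with
# the errors obtained when class labels are randomly permuted; the fraction of
# permutations doing at least as well gives a permutation p-value.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
X = rng.normal(size=(80, 200))
y = rng.integers(0, 2, size=80)
X[y == 1, :5] += 1.0                      # plant a weak discriminative signal

def cv_error(X, y):
    return 1 - cross_val_score(LinearSVC(dual=False, max_iter=5000), X, y, cv=5).mean()

ace = cv_error(X, y)                      # achieved classification error
perm_errors = [cv_error(X, rng.permutation(y)) for _ in range(200)]
p_value = np.mean(np.array(perm_errors) <= ace)
print(f"ACE = {ace:.3f}, permutation p-value = {p_value:.3f}")
```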
Automated Classification of Phonological Errors in Aphasic Language
Ahuja, Sanjeev B.; Reggia, James A.; Berndt, Rita S.
1984-01-01
Using heuristically-guided state space search, a prototype program has been developed to simulate and classify phonemic errors occurring in the speech of neurologically-impaired patients. Simulations are based on an interchangeable rule/operator set of elementary errors which represent a theory of phonemic processing faults. This work introduces and evaluates a novel approach to error simulation and classification, it provides a prototype simulation tool for neurolinguistic research, and it forms the initial phase of a larger research effort involving computer modelling of neurolinguistic processes.
ANALYSIS OF A CLASSIFICATION ERROR MATRIX USING CATEGORICAL DATA TECHNIQUES.
Rosenfield, George H.; Fitzpatrick-Lins, Katherine
1984-01-01
Summary form only given. A classification error matrix typically contains tabulation results of an accuracy evaluation of a thematic classification, such as that of a land use and land cover map. The diagonal elements of the matrix represent the counts correctly classified, and the usual designation of classification accuracy has been the total percent correct. The nondiagonal elements of the matrix have usually been neglected. The classification error matrix is known in statistical terms as a contingency table of categorical data. As an example, an application of these methodologies to a problem of remotely sensed data concerning two photointerpreters and four categories of classification indicated that there is no significant difference in the interpretation between the two photointerpreters, and that there are significant differences among the interpreted category classifications. However, two categories, oak and cottonwood, are not separable in classification in this experiment at the 0.51 percent probability level. A coefficient of agreement is determined for the interpreted map as a whole, and individually for each of the interpreted categories. A conditional coefficient of agreement for the individual categories is compared to other methods for expressing category accuracy which have already been presented in the remote sensing literature.
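The overall coefficient of agreement (Cohen's kappa) and one common per-category conditional kappa can be computed directly from the error matrix; the counts below are invented, and the exact per-category coefficient used in the paper may differ in detail.

```python
# Sketch: Cohen's kappa for the whole error matrix and a conditional kappa per
# interpreted category (rows = interpreted category, columns = verified category).
import numpy as np

m = np.array([[65,  4,  2,  3],
              [ 6, 81,  5,  1],
              [ 2,  7, 60, 10],
              [ 4,  2,  9, 72]], dtype=float)

p = m / m.sum()
row, col = p.sum(axis=1), p.sum(axis=0)

po = np.trace(p)                         # observed agreement
pe = np.sum(row * col)                   # chance-expected agreement
kappa = (po - pe) / (1 - pe)

# conditional kappa for each interpreted category i
kappa_i = (np.diag(p) / row - col) / (1 - col)
print(round(kappa, 3), np.round(kappa_i, 3))
```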
NASA Astrophysics Data System (ADS)
Jiang, Yicheng; Cheng, Ping; Ou, Yangkui
2001-09-01
A new method for target classification of high-range resolution radar is proposed. It applies neural learning to obtain invariant subclass features from the training range profiles. A modified Euclidean metric based on the Box-Cox transformation technique is investigated for improving Nearest Neighbor target classification. Classification experiments using real radar data of three different aircraft have demonstrated that the classification error can be reduced by 8% if the proposed method is chosen instead of the conventional method. The results of this paper show that by choosing an optimized metric, it is indeed possible to reduce the classification error without increasing the number of samples.
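One plausible reading of a Box-Cox-modified Euclidean metric is sketched below; this is an assumption, not the paper's exact construction, and the transform is applied here to the coordinate-wise differences with lambda left as a tunable parameter.

import numpy as np

def box_cox(u, lam, eps=1e-12):
    u = np.maximum(u, eps)                       # the transform needs positive input
    return np.log(u) if lam == 0 else (u**lam - 1.0) / lam

def box_cox_distance(x, y, lam=0.5):
    return box_cox(np.abs(x - y), lam).sum()

def nearest_neighbour_label(query, templates, labels, lam=0.5):
    d = [box_cox_distance(query, t, lam) for t in templates]
    return labels[int(np.argmin(d))]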
ERIC Educational Resources Information Center
Protopapas, Athanassios; Fakou, Aikaterini; Drakopoulou, Styliani; Skaloumbakas, Christos; Mouzaki, Angeliki
2013-01-01
In this study we propose a classification system for spelling errors and determine the most common spelling difficulties of Greek children with and without dyslexia. Spelling skills of 542 children from the general population and 44 children with dyslexia, Grades 3-4 and 7, were assessed with a dictated common word list and age-appropriate…
Validation Relaxation: A Quality Assurance Strategy for Electronic Data Collection.
Kenny, Avi; Gordon, Nicholas; Griffiths, Thomas; Kraemer, John D; Siedner, Mark J
2017-08-18
The use of mobile devices for data collection in developing world settings is becoming increasingly common and may offer advantages in data collection quality and efficiency relative to paper-based methods. However, mobile data collection systems can hamper many standard quality assurance techniques due to the lack of a hardcopy backup of data. Consequently, mobile health data collection platforms have the potential to generate datasets that appear valid, but are susceptible to unidentified database design flaws, areas of miscomprehension by enumerators, and data recording errors. We describe the design and evaluation of a strategy for estimating data error rates and assessing enumerator performance during electronic data collection, which we term "validation relaxation." Validation relaxation involves the intentional omission of data validation features for select questions to allow for data recording errors to be committed, detected, and monitored. We analyzed data collected during a cluster sample population survey in rural Liberia using an electronic data collection system (Open Data Kit). We first developed a classification scheme for types of detectable errors and validation alterations required to detect them. We then implemented the following validation relaxation techniques to enable data error conduct and detection: intentional redundancy, removal of "required" constraint, and illogical response combinations. This allowed for up to 11 identifiable errors to be made per survey. The error rate was defined as the total number of errors committed divided by the number of potential errors. We summarized crude error rates and estimated changes in error rates over time for both individuals and the entire program using logistic regression. The aggregate error rate was 1.60% (125/7817). Error rates did not differ significantly between enumerators (P=.51), but decreased for the cohort with increasing days of application use, from 2.3% at survey start (95% CI 1.8%-2.8%) to 0.6% at day 45 (95% CI 0.3%-0.9%; OR=0.969; P<.001). The highest error rate (84/618, 13.6%) occurred for an intentional redundancy question for a birthdate field, which was repeated in separate sections of the survey. We found low error rates (0.0% to 3.1%) for all other possible errors. A strategy of removing validation rules on electronic data capture platforms can be used to create a set of detectable data errors, which can subsequently be used to assess group and individual enumerator error rates, their trends over time, and categories of data collection that require further training or additional quality control measures. This strategy may be particularly useful for identifying individual enumerators or systematic data errors that are responsive to enumerator training and is best applied to questions for which errors cannot be prevented through training or software design alone. Validation relaxation should be considered as a component of a holistic data quality assurance strategy. ©Avi Kenny, Nicholas Gordon, Thomas Griffiths, John D Kraemer, Mark J Siedner. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 18.08.2017.
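As a rough illustration of the analysis described above (the record layout and numbers below are invented, not the study's data), the crude error rate and the trend over days of application use could be computed along these lines:

import numpy as np
from sklearn.linear_model import LogisticRegression

records = [  # (errors_committed, errors_possible, days_of_use) -- made-up rows
    (3, 11, 1), (2, 11, 5), (1, 11, 15), (0, 11, 30), (1, 11, 45),
]
errors = sum(e for e, _, _ in records)
possible = sum(p for _, p, _ in records)
print("crude error rate:", errors / possible)

# Expand each potential error into one binary observation for the regression.
X, y = [], []
for e, p, day in records:
    X += [[day]] * p
    y += [1] * e + [0] * (p - e)
model = LogisticRegression().fit(np.array(X), np.array(y))
print("odds ratio per day of use:", float(np.exp(model.coef_[0][0])))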
Zheng, Wenjing; Balzer, Laura; van der Laan, Mark; Petersen, Maya
2018-01-30
Binary classification problems are ubiquitous in health and social sciences. In many cases, one wishes to balance two competing optimality considerations for a binary classifier. For instance, in resource-limited settings, a human immunodeficiency virus prevention program based on offering pre-exposure prophylaxis (PrEP) to select high-risk individuals must balance the sensitivity of the binary classifier in detecting future seroconverters (and hence offering them PrEP regimens) with the total number of PrEP regimens that is financially and logistically feasible for the program. In this article, we consider a general class of constrained binary classification problems wherein the objective function and the constraint are both monotonic with respect to a threshold. These include the minimization of the rate of positive predictions subject to a minimum sensitivity, the maximization of sensitivity subject to a maximum rate of positive predictions, and the Neyman-Pearson paradigm, which minimizes the type II error subject to an upper bound on the type I error. We propose an ensemble approach to these binary classification problems based on the Super Learner methodology. This approach linearly combines a user-supplied library of scoring algorithms, with combination weights and a discriminating threshold chosen to minimize the constrained optimality criterion. We then illustrate the application of the proposed classifier to develop an individualized PrEP targeting strategy in a resource-limited setting, with the goal of minimizing the number of PrEP offerings while achieving a minimum required sensitivity. This proof of concept data analysis uses baseline data from the ongoing Sustainable East Africa Research in Community Health study. Copyright © 2017 John Wiley & Sons, Ltd.
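For intuition, the constrained threshold choice described above can be sketched as a plain grid search over cut-offs on the risk scores produced by any scoring algorithm; this shows only the thresholding step, not the Super Learner combination itself.

import numpy as np

def constrained_threshold(scores, labels, min_sensitivity=0.9):
    """Minimise the rate of positive predictions subject to a minimum sensitivity."""
    scores, labels = np.asarray(scores), np.asarray(labels)
    best_t, best_ppr = None, np.inf
    for t in np.unique(scores):
        pred = scores >= t
        sens = pred[labels == 1].mean()          # sensitivity among true positives
        ppr = pred.mean()                        # rate of positive predictions
        if sens >= min_sensitivity and ppr < best_ppr:
            best_t, best_ppr = t, ppr
    return best_t, best_ppr

The mirror problem (maximising sensitivity subject to a maximum rate of positive predictions) follows the same pattern with the roles of objective and constraint swapped.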
Digital Troposcatter Performance Model. Users Manual.
1983-11-01
Report documentation page (partially garbled in the source). Keywords: Diffraction Multipath Prediction; MD-918 Modem Error Rate Prediction; AN/TRC-170 Link Analysis. Abstract fragment: the model addresses configurations used in the Defense Communications System (DCS) and prediction of the performance of both the MD-918 and AN/TRC-170 digital troposcatter modems.
Deep neural networks for texture classification-A theoretical analysis.
Basu, Saikat; Mukhopadhyay, Supratik; Karki, Manohar; DiBiano, Robert; Ganguly, Sangram; Nemani, Ramakrishna; Gayaka, Shreekant
2018-01-01
We investigate the use of Deep Neural Networks for the classification of image datasets where texture features are important for generating class-conditional discriminative representations. To this end, we first derive the size of the feature space for some standard textural features extracted from the input dataset and then use the theory of Vapnik-Chervonenkis dimension to show that hand-crafted feature extraction creates low-dimensional representations which help in reducing the overall excess error rate. As a corollary to this analysis, we derive for the first time upper bounds on the VC dimension of Convolutional Neural Networks as well as Dropout and Dropconnect networks, and the relation between the excess error rates of Dropout and Dropconnect networks. The concept of intrinsic dimension is used to validate the intuition that texture-based datasets are inherently higher dimensional as compared to handwritten digits or other object recognition datasets and hence more difficult to be shattered by neural networks. We then derive the mean distance from the centroid to the nearest and farthest sampling points in an n-dimensional manifold and show that the Relative Contrast of the sample data vanishes as dimensionality of the underlying vector space tends to infinity. Copyright © 2017 Elsevier Ltd. All rights reserved.
Error-related brain activity and error awareness in an error classification paradigm.
Di Gregorio, Francesco; Steinhauser, Marco; Maier, Martin E
2016-10-01
Error-related brain activity has been linked to error detection enabling adaptive behavioral adjustments. However, it is still unclear which role error awareness plays in this process. Here, we show that the error-related negativity (Ne/ERN), an event-related potential reflecting early error monitoring, is dissociable from the degree of error awareness. Participants responded to a target while ignoring two different incongruent distractors. After responding, they indicated whether they had committed an error, and if so, whether they had responded to one or to the other distractor. This error classification paradigm allowed us to distinguish partially aware errors (i.e., errors that were noticed but misclassified) from fully aware errors (i.e., errors that were correctly classified). The Ne/ERN was larger for partially aware errors than for fully aware errors. Whereas this speaks against the idea that the Ne/ERN foreshadows the degree of error awareness, it confirms the prediction of a computational model, which relates the Ne/ERN to post-response conflict. This model predicts that stronger distractor processing - a prerequisite of error classification in our paradigm - leads to lower post-response conflict and thus a smaller Ne/ERN. This implies that the relationship between Ne/ERN and error awareness depends on how error awareness is related to response conflict in a specific task. Our results further indicate that the Ne/ERN but not the degree of error awareness determines adaptive performance adjustments. Taken together, we conclude that the Ne/ERN is dissociable from error awareness and foreshadows adaptive performance adjustments. Our results suggest that the relationship between the Ne/ERN and error awareness is correlative and mediated by response conflict. Copyright © 2016 Elsevier Inc. All rights reserved.
2012-01-01
Background Electromyography (EMG) pattern-recognition based control strategies for multifunctional myoelectric prosthesis systems have been studied commonly in a controlled laboratory setting. Before these myoelectric prosthesis systems are clinically viable, it will be necessary to assess the effect of some disparities between the ideal laboratory setting and practical use on the control performance. One important obstacle is the impact of arm position variation that causes changes in the EMG pattern when performing identical motions in different arm positions. This study aimed to investigate the impacts of arm position variation on EMG pattern-recognition based motion classification in upper-limb amputees and the solutions for reducing these impacts. Methods With five unilateral transradial (TR) amputees, the EMG signals and tri-axial accelerometer mechanomyography (ACC-MMG) signals were simultaneously collected from both amputated and intact arms when performing six classes of arm and hand movements in each of the five arm positions considered in the study. The effect of the arm position changes was estimated in terms of motion classification error and compared between amputated and intact arms. Then the performance of three proposed methods in attenuating the impact of arm positions was evaluated. Results With EMG signals, the average intra-position and inter-position classification errors across all five arm positions and five subjects were around 7.3% and 29.9% from amputated arms, respectively, about 1.0% and 10% lower than those from intact arms. While ACC-MMG signals could yield a similar intra-position classification error (9.9%) to EMG, they had a much higher inter-position classification error, with an average value of 81.1% over the arm positions and the subjects. When the EMG data from all five arm positions were involved in the training set, the average classification error reached a value of around 10.8% for amputated arms. Using a two-stage cascade classifier, the average classification error was around 9.0% over all five arm positions. Reducing ACC-MMG channels from 8 to 2 only increased the average position classification error across all five arm positions from 0.7% to 1.0% in amputated arms. Conclusions The performance of the EMG pattern-recognition based method in classifying movements strongly depends on arm positions. This dependency is a little stronger in the intact arm than in the amputated arm, which suggests that investigations associated with practical use of a myoelectric prosthesis should use limb amputees as subjects instead of able-bodied subjects. The two-stage cascade classifier mode with ACC-MMG for limb position identification and EMG for limb motion classification may be a promising way to reduce the effect of limb position variation on classification performance. PMID:23036049
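A conceptual sketch of the two-stage cascade classifier mentioned in the conclusions follows; the classifier choice (LDA) and feature shapes are placeholders rather than the study's actual configuration, and position labels are assumed to be integers 0..n_positions-1 in numpy arrays.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

class CascadeClassifier:
    def __init__(self, n_positions):
        self.position_clf = LDA()                                 # stage 1: ACC-MMG -> arm position
        self.motion_clfs = [LDA() for _ in range(n_positions)]    # stage 2: EMG -> motion, per position

    def fit(self, acc_feats, emg_feats, positions, motions):
        self.position_clf.fit(acc_feats, positions)
        for p, clf in enumerate(self.motion_clfs):
            idx = positions == p
            clf.fit(emg_feats[idx], motions[idx])
        return self

    def predict(self, acc_feats, emg_feats):
        pos = self.position_clf.predict(acc_feats)
        return np.array([self.motion_clfs[int(p)].predict(e[None, :])[0]
                         for p, e in zip(pos, emg_feats)])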
NASA Astrophysics Data System (ADS)
Chestek, Cynthia A.; Gilja, Vikash; Blabe, Christine H.; Foster, Brett L.; Shenoy, Krishna V.; Parvizi, Josef; Henderson, Jaimie M.
2013-04-01
Objective. Brain-machine interface systems translate recorded neural signals into command signals for assistive technology. In individuals with upper limb amputation or cervical spinal cord injury, the restoration of a useful hand grasp could significantly improve daily function. We sought to determine if electrocorticographic (ECoG) signals contain sufficient information to select among multiple hand postures for a prosthetic hand, orthotic, or functional electrical stimulation system. Approach. We recorded ECoG signals from subdural macro- and microelectrodes implanted in motor areas of three participants who were undergoing inpatient monitoring for diagnosis and treatment of intractable epilepsy. Participants performed five distinct isometric hand postures, as well as four distinct finger movements. Several control experiments were attempted in order to remove sensory information from the classification results. Online experiments were performed with two participants. Main results. Classification rates were 68%, 84% and 81% for correct identification of 5 isometric hand postures offline. Using 3 potential controls for removing sensory signals, error rates were approximately doubled on average (2.1×). A similar increase in errors (2.6×) was noted when the participant was asked to make simultaneous wrist movements along with the hand postures. In online experiments, fist versus rest was successfully classified on 97% of trials; the classification output drove a prosthetic hand. Online classification performance for a larger number of hand postures remained above chance, but substantially below offline performance. In addition, the long integration windows used would preclude the use of decoded signals for control of a BCI system. Significance. These results suggest that ECoG is a plausible source of command signals for prosthetic grasp selection. Overall, avenues remain for improvement through better electrode designs and placement, better participant training, and characterization of non-stationarities such that ECoG could be a viable signal source for grasp control for amputees or individuals with paralysis.
Algorithmic Classification of Five Characteristic Types of Paraphasias.
Fergadiotis, Gerasimos; Gorman, Kyle; Bedrick, Steven
2016-12-01
This study was intended to evaluate a series of algorithms developed to perform automatic classification of paraphasic errors (formal, semantic, mixed, neologistic, and unrelated errors). We analyzed 7,111 paraphasias from the Moss Aphasia Psycholinguistics Project Database (Mirman et al., 2010) and evaluated the classification accuracy of 3 automated tools. First, we used frequency norms from the SUBTLEXus database (Brysbaert & New, 2009) to differentiate nonword errors and real-word productions. Then we implemented a phonological-similarity algorithm to identify phonologically related real-word errors. Last, we assessed the performance of a semantic-similarity criterion that was based on word2vec (Mikolov, Yih, & Zweig, 2013). Overall, the algorithmic classification replicated human scoring for the major categories of paraphasias studied with high accuracy. The tool that was based on the SUBTLEXus frequency norms was more than 97% accurate in making lexicality judgments. The phonological-similarity criterion was approximately 91% accurate, and the overall classification accuracy of the semantic classifier ranged from 86% to 90%. Overall, the results highlight the potential of tools from the field of natural language processing for the development of highly reliable, cost-effective diagnostic tools suitable for collecting high-quality measurement data for research and clinical purposes.
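The pipeline lends itself to a compact sketch. The version below is illustrative only: it uses character-level edit distance as a stand-in for the phonological-similarity algorithm, cosine similarity over word2vec-style vectors for the semantic criterion, and arbitrary thresholds; the frequency table and embeddings are assumed inputs, not the authors' resources.

import numpy as np

def classify_paraphasia(target, response, freq_norms, embeddings,
                        phon_thresh=0.5, sem_thresh=0.4):
    if response not in freq_norms:               # step 1: lexicality judgement from frequency norms
        return "neologistic"
    phon_sim = 1.0 - levenshtein(target, response) / max(len(target), len(response))
    t, r = embeddings[target], embeddings[response]
    sem_sim = float(np.dot(t, r) / (np.linalg.norm(t) * np.linalg.norm(r)))
    if phon_sim >= phon_thresh and sem_sim >= sem_thresh:
        return "mixed"
    if phon_sim >= phon_thresh:                  # step 2: phonologically related real-word error
        return "formal"
    if sem_sim >= sem_thresh:                    # step 3: semantically related real-word error
        return "semantic"
    return "unrelated"

def levenshtein(a, b):
    d = np.arange(len(b) + 1)
    for i, ca in enumerate(a, 1):
        prev, d[0] = d[0], i
        for j, cb in enumerate(b, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (ca != cb))
    return int(d[-1])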
DOT National Transportation Integrated Search
2001-02-01
The Human Factors Analysis and Classification System (HFACS) is a general human error framework : originally developed and tested within the U.S. military as a tool for investigating and analyzing the human : causes of aviation accidents. Based upon ...
Content-based multiple bitstream image transmission over noisy channels.
Cao, Lei; Chen, Chang Wen
2002-01-01
In this paper, we propose a novel combined source and channel coding scheme for image transmission over noisy channels. The main feature of the proposed scheme is a systematic decomposition of image sources so that unequal error protection can be applied according to not only bit error sensitivity but also visual content importance. The wavelet transform is adopted to hierarchically decompose the image. The association between the wavelet coefficients and what they represent spatially in the original image is fully exploited so that wavelet blocks are classified based on their corresponding image content. The classification produces wavelet blocks in each class with similar content and statistics, therefore enables high performance source compression using the set partitioning in hierarchical trees (SPIHT) algorithm. To combat the channel noise, an unequal error protection strategy with rate-compatible punctured convolutional/cyclic redundancy check (RCPC/CRC) codes is implemented based on the bit contribution to both peak signal-to-noise ratio (PSNR) and visual quality. At the receiving end, a postprocessing method making use of the SPIHT decoding structure and the classification map is developed to restore the degradation due to the residual error after channel decoding. Experimental results show that the proposed scheme is indeed able to provide protection both for the bits that are more sensitive to errors and for the more important visual content under a noisy transmission environment. In particular, the reconstructed images illustrate consistently better visual quality than using the single-bitstream-based schemes.
Parallel photonic information processing at gigabyte per second data rates using transient states
NASA Astrophysics Data System (ADS)
Brunner, Daniel; Soriano, Miguel C.; Mirasso, Claudio R.; Fischer, Ingo
2013-01-01
The increasing demands on information processing require novel computational concepts and true parallelism. Nevertheless, hardware realizations of unconventional computing approaches never exceeded a marginal existence. While the application of optics in super-computing receives reawakened interest, new concepts, partly neuro-inspired, are being considered and developed. Here we experimentally demonstrate the potential of a simple photonic architecture to process information at unprecedented data rates, implementing a learning-based approach. A semiconductor laser subject to delayed self-feedback and optical data injection is employed to solve computationally hard tasks. We demonstrate simultaneous spoken digit and speaker recognition and chaotic time-series prediction at data rates beyond 1Gbyte/s. We identify all digits with very low classification errors and perform chaotic time-series prediction with 10% error. Our approach bridges the areas of photonic information processing, cognitive and information science.
Sheehan, David V; Giddens, Jennifer M; Sheehan, Kathy Harnett
2014-09-01
Standard international classification criteria require that classification categories be comprehensive to avoid type II error. Categories should be mutually exclusive and definitions should be clear and unambiguous (to avoid type I and type II errors). In addition, the classification system should be robust enough to last over time and provide comparability between data collections. This article was designed to evaluate the extent to which the classification system contained in the United States Food and Drug Administration 2012 Draft Guidance for the prospective assessment and classification of suicidal ideation and behavior in clinical trials meets these criteria. A critical review is used to assess the extent to which the proposed categories contained in the Food and Drug Administration 2012 Draft Guidance are comprehensive, unambiguous, and robust. Assumptions that underlie the classification system are also explored. The Food and Drug Administration classification system contained in the 2012 Draft Guidance does not capture the full range of suicidal ideation and behavior (type II error). Definitions, moreover, are frequently ambiguous (susceptible to multiple interpretations), and the potential for misclassification (type I and type II errors) is compounded by frequent mismatches in category titles and definitions. These issues have the potential to compromise data comparability within clinical trial sites, across sites, and over time. These problems need to be remedied because of the potential for flawed data output and consequent threats to public health, to research on the safety of medications, and to the search for effective medication treatments for suicidality.
Evaluation of communication in wireless underground sensor networks
NASA Astrophysics Data System (ADS)
Yu, X. Q.; Zhang, Z. L.; Han, W. T.
2017-06-01
Wireless underground sensor networks (WUSN) are an emerging area of research that promises to provide communication capabilities to buried sensors. In this paper, experimental measurements have been conducted with commodity sensor motes at frequencies of 2.4 GHz and 433 MHz. Experiments are run to examine the received signal strength of correctly received packets and the packet error rate for a communication link. The tests show the potential feasibility of the WUSN with the use of powerful RF transceivers at the 433 MHz frequency. Moreover, we also illustrate a classification for wireless underground sensor network communication. Finally, we characterize the effects of burial depth, inter-node distance, and volumetric water content of the soil on the signal strength and packet error rate in WUSN communication.
NASA Astrophysics Data System (ADS)
Dash, Jatindra K.; Kale, Mandar; Mukhopadhyay, Sudipta; Khandelwal, Niranjan; Prabhakar, Nidhi; Garg, Mandeep; Kalra, Naveen
2017-03-01
In this paper, we investigate the effect of the error criteria used during the training phase of an artificial neural network (ANN) on the accuracy of the classifier for classification of lung tissues affected with Interstitial Lung Diseases (ILD). Mean square error (MSE) and cross-entropy (CE) criteria are chosen, being the most popular choices in state-of-the-art implementations. The classification experiment is performed on six ILD patterns, viz. Consolidation, Emphysema, Ground Glass Opacity, Micronodules, Fibrosis, and Healthy, from the MedGIFT database. The texture features from an arbitrary region of interest (AROI) are extracted using a Gabor filter. Two different neural networks are trained with the scaled conjugate gradient back propagation algorithm, with the MSE and CE error criterion functions, respectively, used for weight updates. Performance is evaluated in terms of the average accuracy of these classifiers using 4-fold cross-validation. Each network is trained five times for each fold with randomly initialized weight vectors, and accuracies are computed. A significant improvement in classification accuracy is observed when the ANN is trained using CE (67.27%) as the error function compared to MSE (63.60%). Moreover, the standard deviation of the classification accuracy for the network trained with the CE criterion (6.69) is lower than that for the network trained with the MSE criterion (10.32).
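To make the two error criteria concrete, the toy sketch below trains a single softmax layer by gradient descent under each criterion on synthetic data; the architecture, optimizer, and data are illustrative placeholders for the paper's Gabor-feature, scaled-conjugate-gradient setup.

import numpy as np

rng = np.random.default_rng(0)
y = rng.integers(0, 6, size=300)                            # six ILD-like classes
X = rng.normal(size=(300, 10)) + np.eye(6)[y] @ rng.normal(size=(6, 10))
Y = np.eye(6)[y]                                            # one-hot targets

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train(loss, epochs=200, lr=0.1):
    W = np.zeros((10, 6))
    for _ in range(epochs):
        P = softmax(X @ W)
        if loss == "ce":                                    # gradient of cross-entropy through softmax
            G = X.T @ (P - Y) / len(X)
        else:                                               # gradient of mean squared error through softmax
            dP = 2 * (P - Y) / len(X)
            G = X.T @ (P * (dP - (dP * P).sum(axis=1, keepdims=True)))
        W -= lr * G
    return (softmax(X @ W).argmax(axis=1) == y).mean()

print("train accuracy, CE :", train("ce"))
print("train accuracy, MSE:", train("mse"))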
An evaluation of computer assisted clinical classification algorithms.
Chute, C G; Yang, Y; Buntrock, J
1994-01-01
The Mayo Clinic has a long tradition of indexing patient records in high resolution and volume. Several algorithms have been developed which promise to help human coders in the classification process. We evaluate variations on code browsers and free text indexing systems with respect to their speed and error rates in our production environment. The more sophisticated indexing systems save measurable time in the coding process, but suffer from incompleteness which requires a back-up system or human verification. Expert Network does the best job of rank ordering clinical text, potentially enabling the creation of thresholds for the pass through of computer coded data without human review.
The impact of OCR accuracy on automated cancer classification of pathology reports.
Zuccon, Guido; Nguyen, Anthony N; Bergheim, Anton; Wickman, Sandra; Grayson, Narelle
2012-01-01
To evaluate the effects of Optical Character Recognition (OCR) on the automatic cancer classification of pathology reports. Scanned images of pathology reports were converted to electronic free-text using a commercial OCR system. A state-of-the-art cancer classification system, the Medical Text Extraction (MEDTEX) system, was used to automatically classify the OCR reports. Classifications produced by MEDTEX on the OCR versions of the reports were compared with the classification from a human amended version of the OCR reports. The employed OCR system was found to recognise scanned pathology reports with up to 99.12% character accuracy and up to 98.95% word accuracy. Errors in the OCR processing were found to have minimal impact on the automatic classification of scanned pathology reports into notifiable groups. However, the impact of OCR errors is not negligible when considering the extraction of cancer notification items, such as primary site, histological type, etc. The automatic cancer classification system used in this work, MEDTEX, has proven to be robust to errors produced by the acquisition of free-text pathology reports from scanned images through OCR software. However, issues emerge when considering the extraction of cancer notification items.
Evaluating structural pattern recognition for handwritten math via primitive label graphs
NASA Astrophysics Data System (ADS)
Zanibbi, Richard; Mouchère, Harold; Viard-Gaudin, Christian
2013-01-01
Currently, structural pattern recognizer evaluations compare graphs of detected structure to target structures (i.e. ground truth) using recognition rates, recall and precision for object segmentation, classification and relationships. In document recognition, these target objects (e.g. symbols) are frequently comprised of multiple primitives (e.g. connected components, or strokes for online handwritten data), but current metrics do not characterize errors at the primitive level, from which object-level structure is obtained. Primitive label graphs are directed graphs defined over primitives and primitive pairs. We define new metrics obtained by Hamming distances over label graphs, which allow classification, segmentation and parsing errors to be characterized separately, or using a single measure. Recall and precision for detected objects may also be computed directly from label graphs. We illustrate the new metrics by comparing a new primitive-level evaluation to the symbol-level evaluation performed for the CROHME 2012 handwritten math recognition competition. A Python-based set of utilities for evaluating, visualizing and translating label graphs is publicly available.
NASA Astrophysics Data System (ADS)
Sakuma, Jun; Wright, Rebecca N.
Privacy-preserving classification is the task of learning or training a classifier on the union of privately distributed datasets without sharing the datasets. The emphasis of existing studies in privacy-preserving classification has primarily been put on the design of privacy-preserving versions of particular data mining algorithms. However, in classification problems, preprocessing and postprocessing, such as model selection or attribute selection, play a prominent role in achieving higher classification accuracy. In this paper, we show that the generalization error of classifiers in privacy-preserving classification can be securely evaluated without sharing prediction results. Our main technical contribution is a new generalized Hamming distance protocol that is universally applicable to preprocessing and postprocessing of various privacy-preserving classification problems, such as model selection in support vector machines and attribute selection in naive Bayes classification.
McDonnell, Mark D.; Tissera, Migel D.; Vladusich, Tony; van Schaik, André; Tapson, Jonathan
2015-01-01
Recent advances in training deep (multi-layer) architectures have inspired a renaissance in neural network use. For example, deep convolutional networks are becoming the default option for difficult tasks on large datasets, such as image and speech recognition. However, here we show that error rates below 1% on the MNIST handwritten digit benchmark can be replicated with shallow non-convolutional neural networks. This is achieved by training such networks using the ‘Extreme Learning Machine’ (ELM) approach, which also enables a very rapid training time (∼ 10 minutes). Adding distortions, as is common practise for MNIST, reduces error rates even further. Our methods are also shown to be capable of achieving less than 5.5% error rates on the NORB image database. To achieve these results, we introduce several enhancements to the standard ELM algorithm, which individually and in combination can significantly improve performance. The main innovation is to ensure each hidden-unit operates only on a randomly sized and positioned patch of each image. This form of random ‘receptive field’ sampling of the input ensures the input weight matrix is sparse, with about 90% of weights equal to zero. Furthermore, combining our methods with a small number of iterations of a single-batch backpropagation method can significantly reduce the number of hidden-units required to achieve a particular performance. Our close to state-of-the-art results for MNIST and NORB suggest that the ease of use and accuracy of the ELM algorithm for designing a single-hidden-layer neural network classifier should cause it to be given greater consideration either as a standalone method for simpler problems, or as the final classification stage in deep neural networks applied to more difficult problems. PMID:26262687
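A minimal sketch of the main innovation, random 'receptive field' input weights for an ELM-style single-hidden-layer network, is given below; the patch-size range, nonlinearity, and ridge parameter are illustrative choices rather than the paper's tuned values.

import numpy as np

def random_receptive_field_weights(n_hidden, side, rng, min_patch=3, max_patch=14):
    """Each hidden unit sees only a random square patch, so most input weights are zero."""
    W = np.zeros((n_hidden, side * side))
    for h in range(n_hidden):
        s = rng.integers(min_patch, max_patch + 1)         # random patch size
        r, c = rng.integers(0, side - s + 1, size=2)       # random patch position
        mask = np.zeros((side, side), dtype=bool)
        mask[r:r + s, c:c + s] = True
        W[h, mask.ravel()] = rng.normal(size=s * s)
    return W

def elm_fit(X, Y, W, ridge=1e-2):
    """Ridge-regression solution for the output weights; Y is one-hot."""
    H = np.tanh(X @ W.T)                                    # hidden-layer activations
    return np.linalg.solve(H.T @ H + ridge * np.eye(H.shape[1]), H.T @ Y)

def elm_predict(X, W, beta):
    return (np.tanh(X @ W.T) @ beta).argmax(axis=1)

rng = np.random.default_rng(0)
W = random_receptive_field_weights(n_hidden=500, side=28, rng=rng)
print("fraction of zero input weights:", (W == 0).mean())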
Men, Hong; Fu, Songlin; Yang, Jialin; Cheng, Meiqi; Shi, Yan
2018-01-01
Paraffin odor intensity is an important quality indicator when a paraffin inspection is performed. Currently, paraffin odor level assessment is mainly dependent on an artificial sensory evaluation. In this paper, we developed a paraffin odor analysis system to classify and grade four kinds of paraffin samples. The original feature set was optimized using Principal Component Analysis (PCA) and Partial Least Squares (PLS). Support Vector Machine (SVM), Random Forest (RF), and Extreme Learning Machine (ELM) were applied to three different feature data sets for classification and level assessment of paraffin. For classification, the model based on SVM, with an accuracy rate of 100%, was superior to that based on RF, with an accuracy rate of 98.33–100%, and ELM, with an accuracy rate of 98.01–100%. For level assessment, the R2 related to the training set was above 0.97 and the R2 related to the test set was above 0.87. Through comprehensive comparison, the generalization of the model based on ELM was superior to those based on SVM and RF. The scoring errors for the three models were 0.0016–0.3494, lower than the error of 0.5–1.0 measured by industry standard experts, meaning these methods have a higher prediction accuracy for scoring paraffin level. PMID:29346328
Information analysis of a spatial database for ecological land classification
NASA Technical Reports Server (NTRS)
Davis, Frank W.; Dozier, Jeff
1990-01-01
An ecological land classification was developed for a complex region in southern California using geographic information system techniques of map overlay and contingency table analysis. Land classes were identified by mutual information analysis of vegetation pattern in relation to other mapped environmental variables. The analysis was weakened by map errors, especially errors in the digital elevation data. Nevertheless, the resulting land classification was ecologically reasonable and performed well when tested with higher quality data from the region.
Incidence of speech recognition errors in the emergency department.
Goss, Foster R; Zhou, Li; Weiner, Scott G
2016-09-01
Physician use of computerized speech recognition (SR) technology has risen in recent years due to its ease of use and efficiency at the point of care. However, error rates between 10 and 23% have been observed, raising concern about the number of errors being entered into the permanent medical record, their impact on quality of care and medical liability that may arise. Our aim was to determine the incidence and types of SR errors introduced by this technology in the emergency department (ED). Level 1 emergency department with 42,000 visits/year in a tertiary academic teaching hospital. A random sample of 100 notes dictated by attending emergency physicians (EPs) using SR software was collected from the ED electronic health record between January and June 2012. Two board-certified EPs annotated the notes and conducted error analysis independently. An existing classification schema was adopted to classify errors into eight errors types. Critical errors deemed to potentially impact patient care were identified. There were 128 errors in total or 1.3 errors per note, and 14.8% (n=19) errors were judged to be critical. 71% of notes contained errors, and 15% contained one or more critical errors. Annunciation errors were the highest at 53.9% (n=69), followed by deletions at 18.0% (n=23) and added words at 11.7% (n=15). Nonsense errors, homonyms and spelling errors were present in 10.9% (n=14), 4.7% (n=6), and 0.8% (n=1) of notes, respectively. There were no suffix or dictionary errors. Inter-annotator agreement was 97.8%. This is the first estimate at classifying speech recognition errors in dictated emergency department notes. Speech recognition errors occur commonly with annunciation errors being the most frequent. Error rates were comparable if not lower than previous studies. 15% of errors were deemed critical, potentially leading to miscommunication that could affect patient care. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
A bayesian approach to classification criteria for spectacled eiders
Taylor, B.L.; Wade, P.R.; Stehn, R.A.; Cochrane, J.F.
1996-01-01
To facilitate decisions to classify species according to risk of extinction, we used Bayesian methods to analyze trend data for the Spectacled Eider, an arctic sea duck. Trend data from three independent surveys of the Yukon-Kuskokwim Delta were analyzed individually and in combination to yield posterior distributions for population growth rates. We used classification criteria developed by the recovery team for Spectacled Eiders that seek to equalize errors of under- or overprotecting the species. We conducted both a Bayesian decision analysis and a frequentist (classical statistical inference) decision analysis. Bayesian decision analyses are computationally easier, yield basically the same results, and yield results that are easier to explain to nonscientists. With the exception of the aerial survey analysis of the 10 most recent years, both Bayesian and frequentist methods indicated that an endangered classification is warranted. The discrepancy between surveys warrants further research. Although the trend data are abundance indices, we used a preliminary estimate of absolute abundance to demonstrate how to calculate extinction distributions using the joint probability distributions for population growth rate and variance in growth rate generated by the Bayesian analysis. Recent apparent increases in abundance highlight the need for models that apply to declining and then recovering species.
Disregarding population specificity: its influence on the sex assessment methods from the tibia.
Kotěrová, Anežka; Velemínská, Jana; Dupej, Ján; Brzobohatá, Hana; Pilný, Aleš; Brůžek, Jaroslav
2017-01-01
Forensic anthropology has developed classification techniques for sex estimation of unknown skeletal remains, for example population-specific discriminant function analyses. These methods were designed for populations that lived mostly in the late nineteenth and twentieth centuries. Their level of reliability or misclassification is important for practical use in today's forensic practice; it is, however, unknown. We addressed the question of what the likelihood of errors would be if population specificity of discriminant functions of the tibia were disregarded. Moreover, five classification functions in a Czech sample were proposed (accuracies 82.1-87.5 %, sex bias ranged from -1.3 to -5.4 %). We measured ten variables traditionally used for sex assessment of the tibia on a sample of 30 male and 26 female models from recent Czech population. To estimate the classification accuracy and error (misclassification) rates ignoring population specificity, we selected published classification functions of tibia for the Portuguese, south European, and the North American populations. These functions were applied on the dimensions of the Czech population. Comparing the classification success of the reference and the tested Czech sample showed that females from Czech population were significantly overestimated and mostly misclassified as males. Overall accuracy of sex assessment significantly decreased (53.6-69.7 %), sex bias -29.4-100 %, which is most probably caused by secular trend and the generally high variability of body size. Results indicate that the discriminant functions, developed for skeletal series representing geographically and chronologically diverse populations, are not applicable in current forensic investigations. Finally, implications and recommendations for future research are discussed.
Rawlins, B G; Scheib, C; Tyler, A N; Beamish, D
2012-12-01
Regulatory authorities need ways to estimate natural terrestrial gamma radiation dose rates (nGy h⁻¹) across the landscape accurately, to assess its potential deleterious health effects. The primary method for estimating outdoor dose rate is to use an in situ detector supported 1 m above the ground, but such measurements are costly and cannot capture the landscape-scale variation in dose rates which are associated with changes in soil and parent material mineralogy. We investigate the potential for improving estimates of terrestrial gamma dose rates across Northern Ireland (13,542 km²) using measurements from 168 sites and two sources of ancillary data: (i) a map based on a simplified classification of soil parent material, and (ii) dose estimates from a national-scale, airborne radiometric survey. We used the linear mixed modelling framework in which the two ancillary variables were included in separate models as fixed effects, plus a correlation structure which captures the spatially correlated variance component. We used a cross-validation procedure to determine the magnitude of the prediction errors for the different models. We removed a random subset of 10 terrestrial measurements and formed the model from the remainder (n = 158), and then used the model to predict values at the other 10 sites. We repeated this procedure 50 times. The measurements of terrestrial dose vary between 1 and 103 (nGy h⁻¹). The median absolute model prediction errors (nGy h⁻¹) for the three models declined in the following order: no ancillary data (10.8) > simple geological classification (8.3) > airborne radiometric dose (5.4) as a single fixed effect. Estimates of airborne radiometric gamma dose rate can significantly improve the spatial prediction of terrestrial dose rate.
van der Heijden, R T; Heijnen, J J; Hellinga, C; Romein, B; Luyben, K C
1994-01-05
Measurements provide the basis for process monitoring and control as well as for model development and validation. Systematic approaches to increase the accuracy and credibility of the empirical data set are therefore of great value. In (bio)chemical conversions, linear conservation relations such as the balance equations for charge, enthalpy, and/or chemical elements, can be employed to relate conversion rates. In a practical situation, some of these rates will be measured (in effect, be calculated directly from primary measurements of, e.g., concentrations and flow rates), while others may or may not be calculable from the measured ones. When certain measured rates can also be calculated from other measured rates, the set of equations is redundant, and the accuracy and credibility of the measured rates can indeed be improved by, respectively, balancing and gross error diagnosis. The balanced conversion rates are more accurate, and form a consistent set of data, which is more suitable for further application (e.g., to calculate nonmeasured rates) than the raw measurements. Such an approach has drawn attention in previous studies. The current study deals mainly with the problem of mathematically classifying the conversion rates into balanceable and calculable rates, given the subset of measured rates. The significance of this problem is illustrated with some examples. It is shown that a simple matrix equation can be derived that contains the vector of measured conversion rates and the redundancy matrix R. Matrix R plays a predominant role in the classification problem. In supplementary articles, the significance of the redundancy matrix R for an improved gross error diagnosis approach will be shown. In addition, efficient equations have been derived to calculate the balanceable and/or calculable rates. The method is completely based on matrix algebra (principally different from the graph-theoretical approach), and it is easily implemented into a computer program. (c) 1994 John Wiley & Sons, Inc.
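One standard linear-algebra reading of this classification step is sketched below, offered as an assumption-laden illustration rather than a reproduction of the paper's derivation: project the measured part of the balance equations onto the space the unmeasured rates cannot absorb to obtain a redundancy matrix, and inspect the null space of the unmeasured block for calculability.

import numpy as np
from scipy.linalg import null_space, pinv

def classify_rates(E, measured, tol=1e-10):
    """E: balance matrix (rows = conservation relations, columns = conversion rates)."""
    E = np.asarray(E, dtype=float)
    unmeasured = [j for j in range(E.shape[1]) if j not in measured]
    Em, Ec = E[:, measured], E[:, unmeasured]
    # Redundancy matrix: the measured part projected orthogonally to what the
    # unmeasured rates can absorb. Non-zero columns correspond to balanceable rates.
    R = (np.eye(E.shape[0]) - Ec @ pinv(Ec)) @ Em
    balanceable = [measured[i] for i in range(len(measured))
                   if np.abs(R[:, i]).max() > tol]
    # An unmeasured rate is calculable iff it has no remaining freedom in null(Ec).
    N = null_space(Ec)
    calculable = [unmeasured[j] for j in range(len(unmeasured))
                  if N.size == 0 or np.abs(N[j, :]).max() < tol]
    return R, balanceable, calculable

# Example: a single carbon balance over three rates, with rates 0 and 1 measured.
R, bal, calc = classify_rates([[1.0, 1.0, 1.0]], measured=[0, 1])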
Beheshti, Iman; Demirel, Hasan; Farokhian, Farnaz; Yang, Chunlan; Matsuda, Hiroshi
2016-12-01
This paper presents an automatic computer-aided diagnosis (CAD) system based on feature ranking for detection of Alzheimer's disease (AD) using structural magnetic resonance imaging (sMRI) data. The proposed CAD system is composed of four systematic stages. First, global and local differences in the gray matter (GM) of AD patients compared to the GM of healthy controls (HCs) are analyzed using a voxel-based morphometry technique. The aim is to identify significant local differences in the volume of GM as volumes of interest (VOIs). Second, the voxel intensity values of the VOIs are extracted as raw features. Third, the raw features are ranked using seven feature-ranking methods, namely, statistical dependency (SD), mutual information (MI), information gain (IG), Pearson's correlation coefficient (PCC), t-test score (TS), Fisher's criterion (FC), and the Gini index (GI). The features with higher scores are more discriminative. To determine the number of top features, the estimated classification error based on a training set made up of the AD and HC groups is calculated, and the vector size that minimizes this error is selected as the number of top discriminative features. Fourth, the classification is performed using a support vector machine (SVM). In addition, a data fusion approach among feature-ranking methods is introduced to improve the classification performance. The proposed method is evaluated using a data-set from ADNI (130 AD and 130 HC) with 10-fold cross-validation. The classification accuracy of the proposed automatic system for the diagnosis of AD is up to 92.48% using the sMRI data. An automatic CAD system for the classification of AD based on feature ranking and classification error estimation is proposed. In this regard, seven feature-ranking methods (i.e., SD, MI, IG, PCC, TS, FC, and GI) are evaluated. The optimal number of top discriminative features is determined by the classification error estimation in the training phase. The experimental results indicate that the performance of the proposed system is comparable to that of state-of-the-art classification models. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Three-Class Mammogram Classification Based on Descriptive CNN Features.
Jadoon, M Mohsin; Zhang, Qianni; Haq, Ihsan Ul; Butt, Sharjeel; Jadoon, Adeel
2017-01-01
In this paper, a novel classification technique for large data set of mammograms using a deep learning method is proposed. The proposed model targets a three-class classification study (normal, malignant, and benign cases). In our model we have presented two methods, namely, convolutional neural network-discrete wavelet (CNN-DW) and convolutional neural network-curvelet transform (CNN-CT). An augmented data set is generated by using mammogram patches. To enhance the contrast of mammogram images, the data set is filtered by contrast limited adaptive histogram equalization (CLAHE). In the CNN-DW method, enhanced mammogram images are decomposed as its four subbands by means of two-dimensional discrete wavelet transform (2D-DWT), while in the second method discrete curvelet transform (DCT) is used. In both methods, dense scale invariant feature (DSIFT) for all subbands is extracted. Input data matrix containing these subband features of all the mammogram patches is created that is processed as input to convolutional neural network (CNN). Softmax layer and support vector machine (SVM) layer are used to train CNN for classification. Proposed methods have been compared with existing methods in terms of accuracy rate, error rate, and various validation assessment measures. CNN-DW and CNN-CT have achieved accuracy rate of 81.83% and 83.74%, respectively. Simulation results clearly validate the significance and impact of our proposed model as compared to other well-known existing techniques.
Kreilinger, Alex; Hiebel, Hannah; Müller-Putz, Gernot R
2016-03-01
This work aimed to find and evaluate a new method for detecting errors in continuous brain-computer interface (BCI) applications. Instead of classifying errors on a single-trial basis, the new method was based on multiple events (MEs) analysis to increase the accuracy of error detection. In a BCI-driven car game, based on motor imagery (MI), discrete events were triggered whenever subjects collided with coins and/or barriers. Coins counted as correct events, whereas barriers were errors. This new method, termed ME method, combined and averaged the classification results of single events (SEs) and determined the correctness of MI trials, which consisted of event sequences instead of SEs. The benefit of this method was evaluated in an offline simulation. In an online experiment, the new method was used to detect erroneous MI trials. Such MI trials were discarded and could be repeated by the users. We found that, even with low SE error potential (ErrP) detection rates, feasible accuracies can be achieved when combining MEs to distinguish erroneous from correct MI trials. Online, all subjects reached higher scores with error detection than without, at the cost of longer times needed for completing the game. Findings suggest that ErrP detection may become a reliable tool for monitoring continuous states in BCI applications when combining MEs. This paper demonstrates a novel technique for detecting errors in online continuous BCI applications, which yields promising results even with low single-trial detection rates.
Who gets a mammogram amongst European women aged 50-69 years?
2012-01-01
On the basis of the Survey of Health, Ageing, and Retirement (SHARE), we analyse the determinants of who engages in mammography screening focusing on European women aged 50-69 years. A special emphasis is put on the measurement error of subjective life expectancy and on the measurement and impact of physician quality. Our main findings are that physician quality, better education, having a partner, younger age and better health are associated with higher rates of receipt. The impact of subjective life expectancy on screening decision substantially increases after taking measurement error into account. JEL Classification C 36, I 11, I 18 PMID:22828268
Mandava, Pitchaiah; Krumpelman, Chase S; Shah, Jharna N; White, Donna L; Kent, Thomas A
2013-01-01
Clinical trial outcomes often involve an ordinal scale of subjective functional assessments but the optimal way to quantify results is not clear. In stroke, the most commonly used scale, the modified Rankin Score (mRS), a range of scores ("Shift") is proposed as superior to dichotomization because of greater information transfer. The influence of known uncertainties in mRS assessment has not been quantified. We hypothesized that errors caused by uncertainties could be quantified by applying information theory. Using Shannon's model, we quantified errors of the "Shift" compared to dichotomized outcomes using published distributions of mRS uncertainties and applied this model to clinical trials. We identified 35 randomized stroke trials that met inclusion criteria. Each trial's mRS distribution was multiplied with the noise distribution from published mRS inter-rater variability to generate an error percentage for "shift" and dichotomized cut-points. For the SAINT I neuroprotectant trial, considered positive by "shift" mRS while the larger follow-up SAINT II trial was negative, we recalculated sample size required if classification uncertainty was taken into account. Considering the full mRS range, error rate was 26.1%±5.31 (Mean±SD). Error rates were lower for all dichotomizations tested using cut-points (e.g. mRS 1; 6.8%±2.89; overall p<0.001). Taking errors into account, SAINT I would have required 24% more subjects than were randomized. We show when uncertainty in assessments is considered, the lowest error rates are with dichotomization. While using the full range of mRS is conceptually appealing, a gain of information is counter-balanced by a decrease in reliability. The resultant errors need to be considered since sample size may otherwise be underestimated. In principle, we have outlined an approach to error estimation for any condition in which there are uncertainties in outcome assessment. We provide the user with programs to calculate and incorporate errors into sample size estimation.
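The error-rate comparison can be illustrated with a small calculation: given a trial's true mRS distribution and an inter-rater confusion matrix (the inputs below are assumptions, not the published distributions), any disagreement counts as an error over the full range, while a dichotomized error requires crossing the cut-point.

import numpy as np

def shift_and_dichotomized_error(p_true, confusion, cut):
    p_true = np.asarray(p_true, dtype=float)
    C = np.asarray(confusion, dtype=float)                # C[i, j] = P(observed = j | true = i)
    shift_err = float(np.sum(p_true * (1.0 - np.diag(C))))
    good = np.arange(len(p_true)) <= cut                  # e.g. mRS 0..cut = good outcome
    cross = np.array([C[i, good != good[i]].sum() for i in range(len(p_true))])
    return shift_err, float(np.sum(p_true * cross))

p = [0.05, 0.10, 0.15, 0.20, 0.20, 0.15, 0.15]            # made-up mRS 0..6 distribution
C = 0.8 * np.eye(7) + 0.2 / 7                             # made-up inter-rater confusion matrix
print(shift_and_dichotomized_error(p, C, cut=1))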
Large-scale optimization-based classification models in medicine and biology.
Lee, Eva K
2007-06-01
We present novel optimization-based classification models that are general purpose and suitable for developing predictive rules for large heterogeneous biological and medical data sets. Our predictive model simultaneously incorporates (1) the ability to classify any number of distinct groups; (2) the ability to incorporate heterogeneous types of attributes as input; (3) a high-dimensional data transformation that eliminates noise and errors in biological data; (4) the ability to incorporate constraints to limit the rate of misclassification, and a reserved-judgment region that provides a safeguard against over-training (which tends to lead to high misclassification rates from the resulting predictive rule); and (5) successive multi-stage classification capability to handle data points placed in the reserved-judgment region. To illustrate the power and flexibility of the classification model and solution engine, and its multi-group prediction capability, application of the predictive model to a broad class of biological and medical problems is described. Applications include: the differential diagnosis of the type of erythemato-squamous diseases; predicting presence/absence of heart disease; genomic analysis and prediction of aberrant CpG island methylation in human cancer; discriminant analysis of motility and morphology data in human lung carcinoma; prediction of ultrasonic cell disruption for drug delivery; identification of tumor shape and volume in treatment of sarcoma; discriminant analysis of biomarkers for prediction of early atherosclerosis; fingerprinting of native and angiogenic microvascular networks for early diagnosis of diabetes, aging, macular degeneracy and tumor metastasis; prediction of protein localization sites; and pattern recognition of satellite images in classification of soil types. In all these applications, the predictive model yields correct classification rates ranging from 80 to 100%. This provides motivation for pursuing its use as a medical diagnostic, monitoring and decision-making tool.
Wiegmann, D A; Shappell, S A
2001-11-01
The Human Factors Analysis and Classification System (HFACS) is a general human error framework originally developed and tested within the U.S. military as a tool for investigating and analyzing the human causes of aviation accidents. Based on Reason's (1990) model of latent and active failures, HFACS addresses human error at all levels of the system, including the condition of aircrew and organizational factors. The purpose of the present study was to assess the utility of the HFACS framework as an error analysis and classification tool outside the military. The HFACS framework was used to analyze human error data associated with aircrew-related commercial aviation accidents that occurred between January 1990 and December 1996 using database records maintained by the NTSB and the FAA. Investigators were able to reliably accommodate all the human causal factors associated with the commercial aviation accidents examined in this study using the HFACS system. In addition, the classification of data using HFACS highlighted several critical safety issues in need of intervention research. These results demonstrate that the HFACS framework can be a viable tool for use within the civil aviation arena. However, additional research is needed to examine its applicability to areas outside the flight deck, such as aircraft maintenance and air traffic control domains.
MicroRNA Expression Profile Selection for Cancer Staging Classification Using Backpropagation
NASA Astrophysics Data System (ADS)
Anjarwati; Wibowo, Adi; Adhy, Satriyo; Kusumaningrum, Retno
2018-05-01
Ovarian cancer, breast cancer, and lung cancer are deadly diseases and require serious treatment. These cancers are among the five most common causes of cancer-induced deaths, especially for women. The high mortality rate of cancer is caused by the lack of effective strategies for early detection: if a cancer is detected in the early stages, the survival rate of patients is 90%, whereas it is only 30% when the cancer is detected at metastatic stages, when cancer cells have spread from the primary site. MicroRNAs can be used as potential biomarkers for cancer due to their expression profiles in these cancers. In this paper, we propose feature selection of microRNA expression profiles for classification of cancer stages using a backpropagation neural network. The cancer stages are classified into before metastasis and after metastasis. Several combinations of microRNA expression profiles from medical references are compared to find the best features for the classification. Accuracy and mean squared error are used as the basis for the comparison.
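As a rough illustration of the workflow described (hypothetical expression values, not the study's data), a small backpropagation network can be trained on each candidate microRNA feature subset and the subsets compared by accuracy and mean squared error.

    import numpy as np
    from sklearn.neural_network import MLPClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score, mean_squared_error

    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 6))            # hypothetical miRNA expression profiles
    y = (X[:, 0] - X[:, 3] > 0).astype(int)  # 0 = before metastasis, 1 = after

    for features in [(0, 1), (0, 3), (2, 4, 5)]:   # candidate feature combinations
        Xtr, Xte, ytr, yte = train_test_split(X[:, features], y, random_state=0)
        net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0).fit(Xtr, ytr)
        pred = net.predict(Xte)
        print(features, accuracy_score(yte, pred), mean_squared_error(yte, pred))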
Classifier fusion for VoIP attacks classification
NASA Astrophysics Data System (ADS)
Safarik, Jakub; Rezac, Filip
2017-05-01
SIP is one of the most successful protocols in the field of IP telephony communication. It establishes and manages VoIP calls. As the number of SIP implementations rises, we can expect a higher number of attacks on the communication system in the near future. This work aims at malicious SIP traffic classification. A number of various machine learning algorithms have been developed for attack classification. The paper presents a comparison of current research and the use of a classifier fusion method leading to a potential decrease in classification error rate. The use of classifier combination makes for a more robust solution without the difficulties that may affect single algorithms. Different voting schemes, combination rules, and classifiers are discussed to improve the overall performance. All classifiers have been trained on real malicious traffic. The concept of traffic monitoring depends on a network of honeypot nodes. These honeypots run in several networks spread across different locations. Separation of honeypots allows us to gain independent and trustworthy attack information.
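A minimal fusion sketch using a generic toolkit (scikit-learn; the paper's own feature set and classifier pool are not reproduced here): several base classifiers are combined by majority voting, which typically lowers the variance-driven part of the classification error.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import VotingClassifier, RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.model_selection import cross_val_score

    # Stand-in data; in the paper, features come from honeypot-captured SIP traffic.
    X, y = make_classification(n_samples=500, n_features=10, n_informative=6, random_state=0)

    fusion = VotingClassifier(
        estimators=[("lr", LogisticRegression(max_iter=1000)),
                    ("rf", RandomForestClassifier(random_state=0)),
                    ("knn", KNeighborsClassifier())],
        voting="hard")   # simple majority vote; "soft" would average predicted probabilities

    print("fused accuracy:", cross_val_score(fusion, X, y, cv=5).mean())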
Authentication of the botanical origin of honey by near-infrared spectroscopy.
Ruoff, Kaspar; Luginbühl, Werner; Bogdanov, Stefan; Bosset, Jacques Olivier; Estermann, Barbara; Ziolko, Thomas; Amado, Renato
2006-09-06
Fourier transform near-infrared spectroscopy (FT-NIR) was evaluated for the authentication of eight unifloral and polyfloral honey types (n = 364 samples) previously classified using traditional methods such as chemical, pollen, and sensory analysis. Chemometric evaluation of the spectra was carried out by applying principal component analysis and linear discriminant analysis. The corresponding error rates were calculated according to Bayes' theorem. NIR spectroscopy enabled a reliable discrimination of acacia, chestnut, and fir honeydew honey from the other unifloral and polyfloral honey types studied. The error rates ranged from <0.1 to 6.3% depending on the honey type. NIR also proved useful for the classification of blossom and honeydew honeys. The results demonstrate that near-infrared spectrometry is a valuable, rapid, and nondestructive tool for the authentication of the above-mentioned honeys, but not for all varieties studied.
Multiplex Microsphere Immunoassays for the Detection of IgM and IgG to Arboviral Diseases
Basile, Alison J.; Horiuchi, Kalanthe; Panella, Amanda J.; Laven, Janeen; Kosoy, Olga; Lanciotti, Robert S.; Venkateswaran, Neeraja; Biggerstaff, Brad J.
2013-01-01
Serodiagnosis of arthropod-borne viruses (arboviruses) at the Division of Vector-Borne Diseases, CDC, employs a combination of individual enzyme-linked immunosorbent assays and microsphere immunoassays (MIAs) to test for IgM and IgG, followed by confirmatory plaque-reduction neutralization tests. Based upon the geographic origin of a sample, it may be tested concurrently for multiple arboviruses, which can be a cumbersome task. The advent of multiplexing represents an opportunity to streamline these types of assays; however, because serologic cross-reactivity of the arboviral antigens often confounds results, it is of interest to employ data analysis methods that address this issue. Here, we constructed 13-virus multiplexed IgM and IgG MIAs that included internal and external controls, based upon the Luminex platform. Results from samples tested using these methods were analyzed using 8 different statistical schemes to identify the best way to classify the data. Geographic batteries were also devised to serve as a more practical diagnostic format, and further samples were tested using the abbreviated multiplexes. Comparative error rates for the classification schemes identified a specific boosting method based on logistic regression “Logitboost” as the classification method of choice. When the data from all samples tested were combined into one set, error rates from the multiplex IgM and IgG MIAs were <5% for all geographic batteries. This work represents both the most comprehensive, validated multiplexing method for arboviruses to date, and also the most systematic attempt to determine the most useful classification method for use with these types of serologic tests. PMID:24086608
Does the cost function matter in Bayes decision rule?
Schlüter, Ralf; Nussbaum-Thom, Markus; Ney, Hermann
2012-02-01
In many tasks in pattern recognition, such as automatic speech recognition (ASR), optical character recognition (OCR), part-of-speech (POS) tagging, and other string recognition tasks, we are faced with a well-known inconsistency: The Bayes decision rule is usually used to minimize string (symbol sequence) error, whereas, in practice, we want to minimize symbol (word, character, tag, etc.) error. When comparing different recognition systems, we do indeed use symbol error rate as an evaluation measure. The topic of this work is to analyze the relation between string (i.e., 0-1) and symbol error (i.e., metric, integer valued) cost functions in the Bayes decision rule, for which fundamental analytic results are derived. Simple conditions are derived for which the Bayes decision rule with integer-valued metric cost function and with 0-1 cost gives the same decisions or leads to classes with limited cost. The corresponding conditions can be tested with complexity linear in the number of classes. The results obtained do not make any assumption w.r.t. the structure of the underlying distributions or the classification problem. Nevertheless, the general analytic results are analyzed via simulations of string recognition problems with Levenshtein (edit) distance cost function. The results support earlier findings that considerable improvements are to be expected when initial error rates are high.
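In the notation commonly used for this setting (a sketch of the standard definitions, not the paper's derivations), the two decision rules being compared are

    \hat{c}_{0\text{-}1}(x) \;=\; \arg\max_{c}\, p(c \mid x)
    \qquad\text{and}\qquad
    \hat{c}_{\mathrm{cost}}(x) \;=\; \arg\min_{c} \sum_{c'} p(c' \mid x)\, L(c, c'),

where c ranges over symbol sequences (strings). The left rule minimizes string (sequence) error, while the right rule minimizes the expected integer-valued metric cost L, e.g. the Levenshtein distance; the conditions derived in the paper characterize when the two rules give the same decisions or lead to classes with limited cost.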
Word-level language modeling for P300 spellers based on discriminative graphical models
NASA Astrophysics Data System (ADS)
Delgado Saa, Jaime F.; de Pesters, Adriana; McFarland, Dennis; Çetin, Müjdat
2015-04-01
Objective. In this work we propose a probabilistic graphical model framework that uses language priors at the level of words as a mechanism to increase the performance of P300-based spellers. Approach. This paper is concerned with brain-computer interfaces based on P300 spellers. Motivated by P300 spelling scenarios involving communication based on a limited vocabulary, we propose a probabilistic graphical model framework and an associated classification algorithm that uses learned statistical models of language at the level of words. Exploiting such high-level contextual information helps reduce the error rate of the speller. Main results. Our experimental results demonstrate that the proposed approach offers several advantages over existing methods. Most importantly, it increases the classification accuracy while reducing the number of times the letters need to be flashed, increasing the communication rate of the system. Significance. The proposed approach models all the variables in the P300 speller in a unified framework and has the capability to correct errors in previous letters in a word, given the data for the current one. The structure of the model we propose allows the use of efficient inference algorithms, which in turn makes it possible to use this approach in real-time applications.
Zheng, Ling; Yumak, Hasan; Chen, Ling; Ochs, Christopher; Geller, James; Kapusnik-Uner, Joan; Perl, Yehoshua
2017-09-01
The National Drug File - Reference Terminology (NDF-RT) is a large and complex drug terminology consisting of several classification hierarchies on top of an extensive collection of drug concepts. These hierarchies provide important information about clinical drugs, e.g., their chemical ingredients, mechanisms of action, dosage form and physiological effects. Within NDF-RT such information is represented using tens of thousands of roles connecting drugs to classifications. In previous studies, we have introduced various kinds of Abstraction Networks to summarize the content and structure of terminologies in order to facilitate their visual comprehension, and support quality assurance of terminologies. However, these previous kinds of Abstraction Networks are not appropriate for summarizing the NDF-RT classification hierarchies, due to its unique structure. In this paper, we present the novel Ingredient Abstraction Network (IAbN) to summarize, visualize and support the audit of NDF-RT's Chemical Ingredients hierarchy and its associated drugs. A common theme in our quality assurance framework is to use characterizations of sets of concepts, revealed by the Abstraction Network structure, to capture concepts, the modeling of which is more complex than for other concepts. For the IAbN, we characterize drug ingredient concepts as more complex if they belong to IAbN groups with multiple parent groups. We show that such concepts have a statistically significantly higher rate of errors than a control sample and identify two especially common patterns of errors. Copyright © 2017 Elsevier Inc. All rights reserved.
Saha, Monjoy; Chakraborty, Chandan
2018-05-01
We present an efficient deep learning framework for identifying, segmenting, and classifying cell membranes and nuclei from human epidermal growth factor receptor-2 (HER2)-stained breast cancer images with minimal user intervention. This is a long-standing issue for pathologists because the manual quantification of HER2 is error-prone, costly, and time-consuming. Hence, we propose a deep learning-based HER2 deep neural network (Her2Net) to solve this issue. The convolutional and deconvolutional parts of the proposed Her2Net framework consisted mainly of multiple convolution layers, max-pooling layers, spatial pyramid pooling layers, deconvolution layers, up-sampling layers, and trapezoidal long short-term memory (TLSTM). A fully connected layer and a softmax layer were also used for classification and error estimation. Finally, HER2 scores were calculated based on the classification results. The main contribution of our proposed Her2Net framework includes the implementation of TLSTM and a deep learning framework for cell membrane and nucleus detection, segmentation, and classification and HER2 scoring. Our proposed Her2Net achieved 96.64% precision, 96.79% recall, 96.71% F-score, 93.08% negative predictive value, 98.33% accuracy, and a 6.84% false-positive rate. Our results demonstrate the high accuracy and wide applicability of the proposed Her2Net in the context of HER2 scoring for breast cancer evaluation.
ERIC Educational Resources Information Center
Byars, Alvin Gregg
The objectives of this investigation are to develop, describe, assess, and demonstrate procedures for constructing mastery tests to minimize errors of classification and to maximize decision reliability. The guidelines are based on conditions where item exchangeability is a reasonable assumption and the test constructor can control the number of…
Comparing K-mer based methods for improved classification of 16S sequences.
Vinje, Hilde; Liland, Kristian Hovde; Almøy, Trygve; Snipen, Lars
2015-07-01
The need for precise and stable taxonomic classification is highly relevant in modern microbiology. Parallel to the explosion in the amount of sequence data accessible, there has also been a shift in focus for classification methods. Previously, alignment-based methods were the most applicable tools. Now, methods based on counting K-mers by sliding windows are the most interesting classification approach with respect to both speed and accuracy. Here, we present a systematic comparison of five different K-mer based classification methods for the 16S rRNA gene. The methods differ from each other both in data usage and modelling strategies. We have based our study on the commonly known and well-used naïve Bayes classifier from the RDP project, and four other methods were implemented and tested on two different data sets, on full-length sequences as well as fragments of typical read-length. The differences in classification error obtained by the methods were small, but they were stable across both data sets tested. The Preprocessed nearest-neighbour (PLSNN) method performed best for full-length 16S rRNA sequences, significantly better than the naïve Bayes RDP method. On fragmented sequences the naïve Bayes Multinomial method performed best, significantly better than all other methods. For both data sets explored, and on both full-length and fragmented sequences, all five methods reached an error plateau. We conclude that no K-mer based method is universally best for classifying both full-length sequences and fragments (reads). All methods approach an error plateau, indicating that improved training data are needed to improve classification further. Classification errors occur most frequently for genera with few sequences present. For improving the taxonomy and testing new classification methods, a better, more universal and robust training data set is crucial.
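A minimal sliding-window K-mer counter of the kind these methods build on (illustrative only; the compared methods differ in how the counts are subsequently modelled, e.g. naïve Bayes multinomial versus preprocessed nearest-neighbour):

    from collections import Counter

    def kmer_counts(sequence, k=4):
        """Count overlapping K-mers in a 16S sequence by sliding a window of length k."""
        seq = sequence.upper()
        return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

    # Toy fragment; real classifiers are trained on full-length or read-length 16S data.
    print(kmer_counts("ACGTACGGTACG", k=4).most_common(3))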
2013-01-01
Background The information of electromyographic signals can be used by Myoelectric Control Systems (MCSs) to actuate prostheses. These devices allow the performing of movements that cannot be carried out by persons with amputated limbs. The state of the art in the development of MCSs is based on the use of individual principal component analysis (iPCA) as a stage of pre-processing of the classifiers. The iPCA pre-processing implies an optimization stage which has not yet been deeply explored. Methods The present study considers two factors in the iPCA stage: namely A (the fitness function) and B (the search algorithm). The A factor comprises two levels, namely A1 (the classification error) and A2 (the correlation factor). The B factor, in turn, has four levels, specifically B1 (the Sequential Forward Selection, SFS), B2 (the Sequential Floating Forward Selection, SFFS), B3 (Artificial Bee Colony, ABC), and B4 (Particle Swarm Optimization, PSO). This work evaluates the effect of each of the eight possible combinations of the A and B factors on the classification error of the MCS. Results A two-factor ANOVA was performed on the computed classification errors and determined that: (1) the interactive effects on the classification error are not significant (F(0.01; 3, 72) = 4.0659 > f_AB = 0.09), (2) the levels of factor A have significant effects on the classification error (F(0.02; 1, 72) = 5.0162 < f_A = 6.56), and (3) the levels of factor B do not have significant effects on the classification error (F(0.01; 3, 72) = 4.0659 > f_B = 0.08). Conclusions Considering classification performance, we found that using level A2 was superior in combination with any of the levels of factor B. With respect to time performance, the analysis suggests that the PSO algorithm is at least 14 percent better than its best competitor. The latter behavior has been observed for a particular configuration set of parameters in the search algorithms. Future works will investigate the effect of these parameters on the classification performance, such as the length of the reduced size vector, the number of particles and bees used during optimal search, the cognitive parameters in the PSO algorithm, as well as the limit of cycles to improve a solution in the ABC algorithm. PMID:24369728
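A sketch of the two-factor ANOVA described (hypothetical error values; statsmodels is assumed as the toolkit): the classification error is modelled as a function of factor A (fitness function), factor B (search algorithm), and their interaction, with replicated runs per combination.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf
    from statsmodels.stats.anova import anova_lm

    rng = np.random.default_rng(0)
    rows = []
    for a in ["A1_classification_error", "A2_correlation"]:
        for b in ["SFS", "SFFS", "ABC", "PSO"]:
            for _ in range(10):   # replicated runs per A x B combination
                rows.append({"A": a, "B": b,
                             "error": 0.20 - 0.03 * a.startswith("A2") + rng.normal(0, 0.02)})
    df = pd.DataFrame(rows)

    model = smf.ols("error ~ C(A) * C(B)", data=df).fit()
    print(anova_lm(model, typ=2))   # F statistics for A, B and the A:B interaction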
Hooper, Brionny J; O'Hare, David P A
2013-08-01
Human error classification systems theoretically allow researchers to analyze postaccident data in an objective and consistent manner. The Human Factors Analysis and Classification System (HFACS) framework is one such practical analysis tool that has been widely used to classify human error in aviation. The Cognitive Error Taxonomy (CET) is another. It has been postulated that the focus on interrelationships within HFACS can facilitate the identification of the underlying causes of pilot error. The CET provides increased granularity at the level of unsafe acts. The aim was to analyze the influence of factors at higher organizational levels on the unsafe acts of front-line operators and to compare the errors of fixed-wing and rotary-wing operations. This study analyzed 288 aircraft incidents involving human error from an Australasian military organization occurring between 2001 and 2008. Action errors accounted for almost twice (44%) the proportion of rotary wing compared to fixed wing (23%) incidents. Both classificatory systems showed significant relationships between precursor factors such as the physical environment, mental and physiological states, crew resource management, training and personal readiness, and skill-based, but not decision-based, acts. The CET analysis showed different predisposing factors for different aspects of skill-based behaviors. Skill-based errors in military operations are more prevalent in rotary wing incidents and are related to higher level supervisory processes in the organization. The Cognitive Error Taxonomy provides increased granularity to HFACS analyses of unsafe acts.
Application of human reliability analysis to nursing errors in hospitals.
Inoue, Kayoko; Koizumi, Akio
2004-12-01
Adverse events in hospitals, such as in surgery, anesthesia, radiology, intensive care, internal medicine, and pharmacy, are of worldwide concern and it is important, therefore, to learn from such incidents. There are currently no appropriate tools based on state-of-the art models available for the analysis of large bodies of medical incident reports. In this study, a new model was developed to facilitate medical error analysis in combination with quantitative risk assessment. This model enables detection of the organizational factors that underlie medical errors, and the expedition of decision making in terms of necessary action. Furthermore, it determines medical tasks as module practices and uses a unique coding system to describe incidents. This coding system has seven vectors for error classification: patient category, working shift, module practice, linkage chain (error type, direct threat, and indirect threat), medication, severity, and potential hazard. Such mathematical formulation permitted us to derive two parameters: error rates for module practices and weights for the aforementioned seven elements. The error rate of each module practice was calculated by dividing the annual number of incident reports of each module practice by the annual number of the corresponding module practice. The weight of a given element was calculated by the summation of incident report error rates for an element of interest. This model was applied specifically to nursing practices in six hospitals over a year; 5,339 incident reports with a total of 63,294,144 module practices conducted were analyzed. Quality assurance (QA) of our model was introduced by checking the records of quantities of practices and reproducibility of analysis of medical incident reports. For both items, QA guaranteed legitimacy of our model. Error rates for all module practices were approximately of the order 10⁻⁴ in all hospitals. Three major organizational factors were found to underlie medical errors: "violation of rules" with a weight of 826 × 10⁻⁴, "failure of labor management" with a weight of 661 × 10⁻⁴, and "defects in the standardization of nursing practices" with a weight of 495 × 10⁻⁴.
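The two derived parameters can be computed directly from report counts; a simplified sketch with made-up counts (not the study's data) is given below, with the weight of an element taken as the sum of the error rates of the module practices whose incident reports involve that element.

    # Hypothetical annual counts per module practice: (incident reports, practices performed)
    counts = {
        "oral_medication": (120, 1_500_000),
        "injection":       (45,    400_000),
        "infusion_pump":   (30,    250_000),
    }

    # Error rate of a module practice = incident reports / number of practices performed
    error_rates = {m: reports / total for m, (reports, total) in counts.items()}

    # Weight of an element (e.g. "violation of rules") = summation of the error rates
    # of the module practices whose incident reports involve that element.
    element_to_modules = {"violation_of_rules": ["oral_medication", "injection"]}
    weights = {e: sum(error_rates[m] for m in mods) for e, mods in element_to_modules.items()}

    print(error_rates)
    print(weights)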
NASA Astrophysics Data System (ADS)
Wang, Ke-Yan; Li, Yun-Song; Liu, Kai; Wu, Cheng-Ke
2008-08-01
A novel compression algorithm for interferential multispectral images based on adaptive classification and curve-fitting is proposed. The image is first partitioned adaptively into major-interference region and minor-interference region. Different approximating functions are then constructed for two kinds of regions respectively. For the major interference region, some typical interferential curves are selected to predict other curves. These typical curves are then processed by curve-fitting method. For the minor interference region, the data of each interferential curve are independently approximated. Finally the approximating errors of two regions are entropy coded. The experimental results show that, compared with JPEG2000, the proposed algorithm not only decreases the average output bit-rate by about 0.2 bit/pixel for lossless compression, but also improves the reconstructed images and reduces the spectral distortion greatly, especially at high bit-rate for lossy compression.
Waring, R; Knight, R
2013-01-01
Children with speech sound disorders (SSD) form a heterogeneous group who differ in terms of the severity of their condition, underlying cause, speech errors, involvement of other aspects of the linguistic system and treatment response. To date there is no universal and agreed-upon classification system. Instead, a number of theoretically differing classification systems have been proposed based on either an aetiological (medical) approach, a descriptive-linguistic approach or a processing approach. To describe and review the supporting evidence, and to provide a critical evaluation of the current childhood SSD classification systems. Descriptions of the major specific approaches to classification are reviewed and research papers supporting the reliability and validity of the systems are evaluated. Three specific paediatric SSD classification systems are identified as potentially useful in classifying children with SSD into homogeneous subgroups: the aetiologic-based Speech Disorders Classification System, the descriptive-linguistic Differential Diagnosis system, and the processing-based Psycholinguistic Framework. The Differential Diagnosis system has a growing body of empirical support from clinical population studies, across language error pattern studies and treatment efficacy studies. The Speech Disorders Classification System is currently a research tool with eight proposed subgroups. The Psycholinguistic Framework is a potential bridge to linking cause and surface level speech errors. There is a need for a universally agreed-upon classification system that is useful to clinicians and researchers. The resulting classification system needs to be robust, reliable and valid. A universal classification system would allow for improved tailoring of treatments to subgroups of SSD which may, in turn, lead to improved treatment efficacy. © 2012 Royal College of Speech and Language Therapists.
The application of Aronson's taxonomy to medication errors in nursing.
Johnson, Maree; Young, Helen
2011-01-01
Medication administration is a frequent nursing activity that is prone to error. In this study of 318 self-reported medication incidents (including near misses), very few resulted in patient harm; only 7% required intervention or prolonged hospitalization, or caused temporary harm. Aronson's classification system provided an excellent framework for analysis of the incidents, with a close connection between the type of error and the change strategy to minimize medication incidents. Taking a behavioral approach to medication error classification has provided helpful strategies for nurses, such as nurse-call cards on patient lockers when patients are absent and checking of medication sign-off by outgoing and incoming staff at handover.
Center for Seismic Studies Final Technical Report, October 1992 through October 1993
1994-02-07
Figure 42: Upper limit of depth error as a function of mb for estimates based on P and S waves for three networks: GSETT-2, ALPHA, and ALPHA + a 50 station...
HIV classification using the coalescent theory
Bulla, Ingo; Schultz, Anne-Kathrin; Schreiber, Fabian; Zhang, Ming; Leitner, Thomas; Korber, Bette; Morgenstern, Burkhard; Stanke, Mario
2010-01-01
Motivation: Existing coalescent models and phylogenetic tools based on them are not designed for studying the genealogy of sequences like those of HIV, since in HIV recombinants with multiple cross-over points between the parental strains frequently arise. Hence, ambiguous cases in the classification of HIV sequences into subtypes and circulating recombinant forms (CRFs) have been treated with ad hoc methods for lack of tools based on a comprehensive coalescent model accounting for complex recombination patterns. Results: We developed the program ARGUS that scores classifications of sequences into subtypes and recombinant forms. It reconstructs ancestral recombination graphs (ARGs) that reflect the genealogy of the input sequences given a classification hypothesis. An ARG with maximal probability is approximated using a Markov chain Monte Carlo approach. ARGUS was able to distinguish the correct classification with a low error rate from plausible alternative classifications in simulation studies with realistic parameters. We applied our algorithm to decide between two recently debated alternatives in the classification of CRF02 of HIV-1 and find that CRF02 is indeed a recombinant of Subtypes A and G. Availability: ARGUS is implemented in C++ and the source code is available at http://gobics.de/software Contact: ibulla@uni-goettingen.de Supplementary Information: Supplementary data are available at Bioinformatics online. PMID:20400454
NASA Technical Reports Server (NTRS)
Maslanik, J. A.; Key, J.
1992-01-01
An expert system framework has been developed to classify sea ice types using satellite passive microwave data, an operational classification algorithm, spatial and temporal information, ice types estimated from a dynamic-thermodynamic model, output from a neural network that detects the onset of melt, and knowledge about season and region. The rule base imposes boundary conditions upon the ice classification, modifies parameters in the ice algorithm, determines a 'confidence' measure for the classified data, and under certain conditions, replaces the algorithm output with model output. Results demonstrate the potential power of such a system for minimizing overall error in the classification and for providing non-expert data users with a means of assessing the usefulness of the classification results for their applications.
Spelling in adolescents with dyslexia: errors and modes of assessment.
Tops, Wim; Callens, Maaike; Bijn, Evi; Brysbaert, Marc
2014-01-01
In this study we focused on the spelling of high-functioning students with dyslexia. We made a detailed classification of the errors in a word and sentence dictation task made by 100 students with dyslexia and 100 matched control students. All participants were in the first year of their bachelor's studies and had Dutch as mother tongue. Three main error categories were distinguished: phonological, orthographic, and grammatical errors (on the basis of morphology and language-specific spelling rules). The results indicated that higher-education students with dyslexia made on average twice as many spelling errors as the controls, with effect sizes of d ≥ 2. When the errors were classified as phonological, orthographic, or grammatical, we found a slight dominance of phonological errors in students with dyslexia. Sentence dictation did not provide more information than word dictation in the correct classification of students with and without dyslexia. © Hammill Institute on Disabilities 2012.
Classification and disease prediction via mathematical programming
NASA Astrophysics Data System (ADS)
Lee, Eva K.; Wu, Tsung-Lin
2007-11-01
In this chapter, we present classification models based on mathematical programming approaches. We first provide an overview on various mathematical programming approaches, including linear programming, mixed integer programming, nonlinear programming and support vector machines. Next, we present our effort of novel optimization-based classification models that are general purpose and suitable for developing predictive rules for large heterogeneous biological and medical data sets. Our predictive model simultaneously incorporates (1) the ability to classify any number of distinct groups; (2) the ability to incorporate heterogeneous types of attributes as input; (3) a high-dimensional data transformation that eliminates noise and errors in biological data; (4) the ability to incorporate constraints to limit the rate of misclassification, and a reserved-judgment region that provides a safeguard against over-training (which tends to lead to high misclassification rates from the resulting predictive rule) and (5) successive multi-stage classification capability to handle data points placed in the reserved judgment region. To illustrate the power and flexibility of the classification model and solution engine, and its multigroup prediction capability, application of the predictive model to a broad class of biological and medical problems is described. Applications include: the differential diagnosis of the type of erythemato-squamous diseases; predicting presence/absence of heart disease; genomic analysis and prediction of aberrant CpG island methylation in human cancer; discriminant analysis of motility and morphology data in human lung carcinoma; prediction of ultrasonic cell disruption for drug delivery; identification of tumor shape and volume in treatment of sarcoma; multistage discriminant analysis of biomarkers for prediction of early atherosclerosis; fingerprinting of native and angiogenic microvascular networks for early diagnosis of diabetes, aging, macular degeneration and tumor metastasis; prediction of protein localization sites; and pattern recognition of satellite images in classification of soil types. In all these applications, the predictive model yields correct classification rates ranging from 80% to 100%. This provides motivation for pursuing its use as a medical diagnostic, monitoring and decision-making tool.
Danielson, Patrick; Yang, Limin; Jin, Suming; Homer, Collin G.; Napton, Darrell
2016-01-01
We developed a method that analyzes the quality of the cultivated cropland class mapped in the USA National Land Cover Database (NLCD) 2006. The method integrates multiple geospatial datasets and a Multi Index Integrated Change Analysis (MIICA) change detection method that captures spectral changes to identify the spatial distribution and magnitude of potential commission and omission errors for the cultivated cropland class in NLCD 2006. The majority of the commission and omission errors in NLCD 2006 are in areas where cultivated cropland is not the most dominant land cover type. The errors are primarily attributed to the less accurate training dataset derived from the National Agricultural Statistics Service Cropland Data Layer dataset. In contrast, error rates are low in areas where cultivated cropland is the dominant land cover. Agreement between model-identified commission errors and independently interpreted reference data was high (79%). Agreement was low (40%) for omission error comparison. The majority of the commission errors in the NLCD 2006 cultivated crops were confused with low-intensity developed classes, while the majority of omission errors were from herbaceous and shrub classes. Some errors were caused by inaccurate land cover change from misclassification in NLCD 2001 and the subsequent land cover post-classification process.
[Classifications in forensic medicine and their logical basis].
Kovalev, A V; Shmarov, L A; Ten'kov, A A
2014-01-01
The objective of the present study was to characterize the main requirements for the correct construction of classifications used in forensic medicine, with special reference to the errors that occur in the relevant text-books, guidelines, and manuals and the ways to avoid them. This publication continues the series of thematic articles of the authors devoted to the logical errors in the expert conclusions. The preparation of further publications is underway to report the results of the in-depth analysis of the logical errors encountered in expert conclusions, text-books, guidelines, and manuals.
Dieye, A.M.; Roy, David P.; Hanan, N.P.; Liu, S.; Hansen, M.; Toure, A.
2012-01-01
Spatially explicit land cover land use (LCLU) change information is needed to drive biogeochemical models that simulate soil organic carbon (SOC) dynamics. Such information is increasingly being mapped using remotely sensed satellite data with classification schemes and uncertainties constrained by the sensing system, classification algorithms and land cover schemes. In this study, automated LCLU classification of multi-temporal Landsat satellite data were used to assess the sensitivity of SOC modeled by the Global Ensemble Biogeochemical Modeling System (GEMS). The GEMS was run for an area of 1560 km² in Senegal under three climate change scenarios with LCLU maps generated using different Landsat classification approaches. This research provides a method to estimate the variability of SOC, specifically the SOC uncertainty due to satellite classification errors, which we show is dependent not only on the LCLU classification errors but also on where the LCLU classes occur relative to the other GEMS model inputs.
[Land cover classification of Four Lakes Region in Hubei Province based on MODIS and ENVISAT data].
Xue, Lian; Jin, Wei-Bin; Xiong, Qin-Xue; Liu, Zhang-Yong
2010-03-01
Based on the differences in backscattering coefficient in ENVISAT ASAR data, a classification was made of the towns, waters, and vegetation-covered areas in the Four Lakes Region of Hubei Province. According to the local cropping systems and phenological characteristics in the region, and by using the discrepancies of the MODIS-NDVI index from late April to early May, the vegetation-covered areas were classified into croplands and non-croplands. The classification results based on the above-mentioned procedure were verified against the classification results based on the ETM data with high spatial resolution. Based on the DEM data, the non-croplands were categorized into forest land and bottomland; and based on the discrepancies of mean NDVI index per month, the crops were identified as mid rice, late rice, and cotton, and the croplands were identified as paddy field and upland field. The land cover classification based on the MODIS data with low spatial resolution was basically consistent with that based on the ETM data with high spatial resolution, and the total error rate was about 13.15% when the classification results based on ETM data were taken as the standard. The utilization of the above-mentioned procedures for large-scale land cover classification and mapping could enable fast tracking of regional land cover classification.
NASA Astrophysics Data System (ADS)
Porto, C. D. N.; Costa Filho, C. F. F.; Macedo, M. M. G.; Gutierrez, M. A.; Costa, M. G. F.
2017-03-01
Studies in intravascular optical coherence tomography (IV-OCT) have demonstrated the importance of coronary bifurcation regions in intravascular medical imaging analysis, as plaques are more likely to accumulate in this region leading to coronary disease. A typical IV-OCT pullback acquires hundreds of frames, thus developing an automated tool to classify the OCT frames as bifurcation or non-bifurcation can be an important step to speed up OCT pullbacks analysis and assist automated methods for atherosclerotic plaque quantification. In this work, we evaluate the performance of two state-of-the-art classifiers, SVM and Neural Networks in the bifurcation classification task. The study included IV-OCT frames from 9 patients. In order to improve classification performance, we trained and tested the SVM with different parameters by means of a grid search and different stop criteria were applied to the Neural Network classifier: mean square error, early stop and regularization. Different sets of features were tested, using feature selection techniques: PCA, LDA and scalar feature selection with correlation. Training and test were performed in sets with a maximum of 1460 OCT frames. We quantified our results in terms of false positive rate, true positive rate, accuracy, specificity, precision, false alarm, f-measure and area under ROC curve. Neural networks obtained the best classification accuracy, 98.83%, overcoming the results found in literature. Our methods appear to offer a robust and reliable automated classification of OCT frames that might assist physicians indicating potential frames to analyze. Methods for improving neural networks generalization have increased the classification performance.
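A condensed sketch of the SVM grid-search step described above (scikit-learn assumed; the actual IV-OCT features and parameter grids are not reproduced):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    # Stand-in features; in the study these come from OCT frames after PCA/LDA feature selection.
    X, y = make_classification(n_samples=600, n_features=20, random_state=0)

    grid = GridSearchCV(
        SVC(kernel="rbf"),
        param_grid={"C": [0.1, 1, 10, 100], "gamma": [1e-3, 1e-2, 1e-1]},
        scoring="accuracy", cv=5)
    grid.fit(X, y)
    print(grid.best_params_, grid.best_score_)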
Kim, Junghoe; Calhoun, Vince D.; Shim, Eunsoo; Lee, Jong-Hwan
2015-01-01
Functional connectivity (FC) patterns obtained from resting-state functional magnetic resonance imaging data are commonly employed to study neuropsychiatric conditions by using pattern classifiers such as the support vector machine (SVM). Meanwhile, a deep neural network (DNN) with multiple hidden layers has shown its ability to systematically extract lower-to-higher level information of image and speech data from lower-to-higher hidden layers, markedly enhancing classification accuracy. The objective of this study was to adopt the DNN for whole-brain resting-state FC pattern classification of schizophrenia (SZ) patients vs. healthy controls (HCs) and identification of aberrant FC patterns associated with SZ. We hypothesized that the lower-to-higher level features learned via the DNN would significantly enhance the classification accuracy, and proposed an adaptive learning algorithm to explicitly control the weight sparsity in each hidden layer via L1-norm regularization. Furthermore, the weights were initialized via stacked autoencoder based pre-training to further improve the classification performance. Classification accuracy was systematically evaluated as a function of (1) the number of hidden layers/nodes, (2) the use of L1-norm regularization, (3) the use of the pre-training, (4) the use of framewise displacement (FD) removal, and (5) the use of anatomical/functional parcellation. Using FC patterns from anatomically parcellated regions without FD removal, an error rate of 14.2% was achieved by employing three hidden layers and 50 hidden nodes with both L1-norm regularization and pre-training, which was substantially lower than the error rate from the SVM (22.3%). Moreover, the trained DNN weights (i.e., the learned features) were found to represent the hierarchical organization of aberrant FC patterns in SZ compared with HC. Specifically, pairs of nodes extracted from the lower hidden layer represented sparse FC patterns implicated in SZ, which was quantified by using kurtosis/modularity measures and features from the higher hidden layer showed holistic/global FC patterns differentiating SZ from HC. Our proposed schemes and reported findings attained by using the DNN classifier and whole-brain FC data suggest that such approaches show improved ability to learn hidden patterns in brain imaging data, which may be useful for developing diagnostic tools for SZ and other neuropsychiatric disorders and identifying associated aberrant FC patterns. PMID:25987366
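A compact sketch of the kind of L1-regularized network described (Keras assumed as the framework; the layer sizes follow the abstract's three hidden layers of 50 nodes, the input dimension is a placeholder, and the stacked-autoencoder pre-training and adaptive sparsity control are omitted, with a plain L1 weight penalty standing in):

    import tensorflow as tf

    n_features = 6670   # placeholder for the number of FC features per subject

    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_features,)),
        # Three hidden layers of 50 nodes; the L1 penalty encourages sparse weights.
        *[tf.keras.layers.Dense(50, activation="sigmoid",
                                kernel_regularizer=tf.keras.regularizers.l1(1e-4))
          for _ in range(3)],
        tf.keras.layers.Dense(1, activation="sigmoid"),   # SZ vs. HC
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.summary()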
Evaluating data mining algorithms using molecular dynamics trajectories.
Tatsis, Vasileios A; Tjortjis, Christos; Tzirakis, Panagiotis
2013-01-01
Molecular dynamics simulations provide a sample of a molecule's conformational space. Experiments on the μs time scale, resulting in large amounts of data, are nowadays routine. Data mining techniques such as classification provide a way to analyse such data. In this work, we evaluate and compare several classification algorithms using three data sets which resulted from computer simulations of a potential enzyme mimetic biomolecule. We evaluated 65 classifiers available in the well-known data mining toolkit Weka, using classification errors to assess algorithmic performance. Results suggest that: (i) 'meta' classifiers perform better than the other groups, when applied to molecular dynamics data sets; (ii) Random Forest and Rotation Forest are the best classifiers for all three data sets; and (iii) classification via clustering yields the highest classification error. Our findings are consistent with bibliographic evidence, suggesting a 'roadmap' for dealing with such data.
Classification of breast cancer cytological specimen using convolutional neural network
NASA Astrophysics Data System (ADS)
Żejmo, Michał; Kowal, Marek; Korbicz, Józef; Monczak, Roman
2017-01-01
The paper presents a deep learning approach for automatic classification of breast tumors based on fine needle cytology. The main aim of the system is to distinguish benign from malignant cases based on microscopic images. Experiment was carried out on cytological samples derived from 50 patients (25 benign cases + 25 malignant cases) diagnosed in Regional Hospital in Zielona Góra. To classify microscopic images, we used convolutional neural networks (CNN) of two types: GoogLeNet and AlexNet. Due to the very large size of images of cytological specimen (on average 200000 × 100000 pixels), they were divided into smaller patches of size 256 × 256 pixels. Breast cancer classification usually is based on morphometric features of nuclei. Therefore, training and validation patches were selected using Support Vector Machine (SVM) so that suitable amount of cell material was depicted. Neural classifiers were tuned using GPU accelerated implementation of gradient descent algorithm. Training error was defined as a cross-entropy classification loss. Classification accuracy was defined as the percentage ratio of successfully classified validation patches to the total number of validation patches. The best accuracy rate of 83% was obtained by GoogLeNet model. We observed that more misclassified patches belong to malignant cases.
Evaluation criteria for software classification inventories, accuracies, and maps
NASA Technical Reports Server (NTRS)
Jayroe, R. R., Jr.
1976-01-01
Statistical criteria are presented for modifying the contingency table used to evaluate tabular classification results obtained from remote sensing and ground truth maps. This classification technique contains information on the spatial complexity of the test site, on the relative location of classification errors, on agreement of the classification maps with ground truth maps, and reduces back to the original information normally found in a contingency table.
NASA Astrophysics Data System (ADS)
Gummeson, Anna; Arvidsson, Ida; Ohlsson, Mattias; Overgaard, Niels C.; Krzyzanowska, Agnieszka; Heyden, Anders; Bjartell, Anders; Aström, Kalle
2017-03-01
Prostate cancer is the most diagnosed cancer in men. The diagnosis is confirmed by pathologists based on ocular inspection of prostate biopsies in order to classify them according to Gleason score. The main goal of this paper is to automate the classification using convolutional neural networks (CNNs). The introduction of CNNs has broadened the field of pattern recognition. It replaces the classical way of designing and extracting hand-made features used for classification with the substantially different strategy of letting the computer itself decide which features are of importance. For automated prostate cancer classification into the classes: Benign, Gleason grade 3, 4 and 5 we propose a CNN with small convolutional filters that has been trained from scratch using stochastic gradient descent with momentum. The input consists of microscopic images of haematoxylin and eosin stained tissue, the output is a coarse segmentation into regions of the four different classes. The dataset used consists of 213 images, each considered to be of one class only. Using four-fold cross-validation we obtained an error rate of 7.3%, which is significantly better than previous state of the art using the same dataset. Although the dataset was rather small, good results were obtained. From this we conclude that CNN is a promising method for this problem. Future work includes obtaining a larger dataset, which potentially could diminish the error margin.
The search for structure - Object classification in large data sets. [for astronomers
NASA Technical Reports Server (NTRS)
Kurtz, Michael J.
1988-01-01
Research concerning object classification schemes is reviewed, focusing on large data sets. Classification techniques are discussed, including syntactic, decision theoretic methods, fuzzy techniques, and stochastic and fuzzy grammars. Consideration is given to the automation of MK classification (Morgan and Keenan, 1973) and other problems associated with the classification of spectra. In addition, the classification of galaxies is examined, including the problems of systematic errors, blended objects, galaxy types, and galaxy clusters.
Zollanvari, Amin; Dougherty, Edward R
2014-06-01
The most important aspect of any classifier is its error rate, because this quantifies its predictive capacity. Thus, the accuracy of error estimation is critical. Error estimation is problematic in small-sample classifier design because the error must be estimated using the same data from which the classifier has been designed. Use of prior knowledge, in the form of a prior distribution on an uncertainty class of feature-label distributions to which the true, but unknown, feature-distribution belongs, can facilitate accurate error estimation (in the mean-square sense) in circumstances where accurate completely model-free error estimation is impossible. This paper provides analytic asymptotically exact finite-sample approximations for various performance metrics of the resulting Bayesian Minimum Mean-Square-Error (MMSE) error estimator in the case of linear discriminant analysis (LDA) in the multivariate Gaussian model. These performance metrics include the first, second, and cross moments of the Bayesian MMSE error estimator with the true error of LDA, and therefore, the Root-Mean-Square (RMS) error of the estimator. We lay down the theoretical groundwork for Kolmogorov double-asymptotics in a Bayesian setting, which enables us to derive asymptotic expressions of the desired performance metrics. From these we produce analytic finite-sample approximations and demonstrate their accuracy via numerical examples. Various examples illustrate the behavior of these approximations and their use in determining the necessary sample size to achieve a desired RMS. The Supplementary Material contains derivations for some equations and added figures.
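In the standard notation for this setting (a sketch of the definitions only), with \varepsilon_n the true error of the designed LDA classifier and \hat{\varepsilon} the Bayesian MMSE error estimator, the RMS studied through the first, second, and cross moments is

    \mathrm{RMS}(\hat{\varepsilon})
    \;=\; \sqrt{\,\mathbb{E}\!\left[(\hat{\varepsilon}-\varepsilon_n)^2\right]}
    \;=\; \sqrt{\,\mathbb{E}[\hat{\varepsilon}^{\,2}] \;-\; 2\,\mathbb{E}[\hat{\varepsilon}\,\varepsilon_n] \;+\; \mathbb{E}[\varepsilon_n^{2}]\,},

so asymptotic approximations of these three moments directly yield an analytic finite-sample approximation of the RMS.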
Jane, Nancy Yesudhas; Nehemiah, Khanna Harichandran; Arputharaj, Kannan
2016-01-01
Clinical time-series data acquired from electronic health records (EHR) are liable to temporal complexities such as irregular observations, missing values and time constrained attributes that make the knowledge discovery process challenging. This paper presents a temporal rough set induced neuro-fuzzy (TRiNF) mining framework that handles these complexities and builds an effective clinical decision-making system. TRiNF provides two functionalities, namely temporal data acquisition (TDA) and temporal classification. In TDA, a time-series forecasting model is constructed by adopting an improved double exponential smoothing method. The forecasting model is used in missing value imputation and temporal pattern extraction. The relevant attributes are selected using a temporal pattern based rough set approach. In temporal classification, a classification model is built with the selected attributes using a temporal pattern induced neuro-fuzzy classifier. For experimentation, this work uses two clinical time-series datasets of hepatitis and thrombosis patients. The experimental result shows that with the proposed TRiNF framework, there is a significant reduction in the error rate, thereby obtaining an average classification accuracy of 92.59% for the hepatitis dataset and 91.69% for the thrombosis dataset. The obtained classification results prove the efficiency of the proposed framework in terms of its improved classification accuracy.
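A plain Holt-type double exponential smoothing sketch of the kind used in the TDA stage (illustrative parameters; the paper's improvement to the standard method is not reproduced): the one-step-ahead forecasts can substitute for missing observations in an irregular clinical series.

    def double_exponential_smoothing(series, alpha=0.5, beta=0.3):
        """Holt's linear method: smooth a series and return one-step-ahead forecasts."""
        level, trend = series[0], series[1] - series[0]
        forecasts = [series[0]]
        for value in series[1:]:
            forecasts.append(level + trend)     # one-step-ahead forecast
            last_level = level
            level = alpha * value + (1 - alpha) * (level + trend)
            trend = beta * (level - last_level) + (1 - beta) * trend
        return forecasts

    print(double_exponential_smoothing([2.1, 2.4, 2.9, 3.1, 3.6]))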
Procedural Error and Task Interruption
2016-09-30
...for research on errors and individual differences. Results indicate predictive validity for fluid intelligence and specific forms of work... Subject terms: procedural error, task interruption, individual differences, fluid intelligence, sleep deprivation. It generates rich data on several kinds of errors, including procedural errors in which steps are skipped or repeated.
The Sources of Error in Spanish Writing.
ERIC Educational Resources Information Center
Justicia, Fernando; Defior, Sylvia; Pelegrina, Santiago; Martos, Francisco J.
1999-01-01
Determines the pattern of errors in Spanish spelling. Analyzes and proposes a classification system for the errors made by children in the initial stages of the acquisition of spelling skills. Finds that the diverse forms of only 20 Spanish words produce 36% of the spelling errors in Spanish, and that substitution is the most frequent type of error. (RS)
Photoacoustic discrimination of vascular and pigmented lesions using classical and Bayesian methods
NASA Astrophysics Data System (ADS)
Swearingen, Jennifer A.; Holan, Scott H.; Feldman, Mary M.; Viator, John A.
2010-01-01
Discrimination of pigmented and vascular lesions in skin can be difficult due to factors such as size, subungual location, and the nature of lesions containing both melanin and vascularity. Misdiagnosis may lead to precancerous or cancerous lesions not receiving proper medical care. To aid in the rapid and accurate diagnosis of such pathologies, we develop a photoacoustic system to determine the nature of skin lesions in vivo. By irradiating skin with two laser wavelengths, 422 and 530 nm, we induce photoacoustic responses, and the relative response at these two wavelengths indicates whether the lesion is pigmented or vascular. This response is due to the distinct absorption spectrum of melanin and hemoglobin. In particular, pigmented lesions have ratios of photoacoustic amplitudes of approximately 1.4 to 1 at the two wavelengths, while vascular lesions have ratios of about 4.0 to 1. Furthermore, we consider two statistical methods for conducting classification of lesions: standard multivariate analysis classification techniques and a Bayesian-model-based approach. We study 15 human subjects with eight vascular and seven pigmented lesions. Using the classical method, we achieve a perfect classification rate, while the Bayesian approach has an error rate of 20%.
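A toy decision rule based on the reported amplitude ratios (the threshold is placed between the ~1.4 and ~4.0 ratios quoted above; the order of the ratio as 422 nm over 530 nm is assumed from the wavelength listing, and the paper's actual multivariate and Bayesian classifiers are more elaborate):

    def classify_lesion(amp_422, amp_530, threshold=2.5):
        """Classify a lesion from the ratio of photoacoustic amplitudes at the two wavelengths."""
        ratio = amp_422 / amp_530
        return "vascular" if ratio > threshold else "pigmented"

    print(classify_lesion(4.1, 1.0))   # ratio ~4.0 -> vascular
    print(classify_lesion(1.4, 1.0))   # ratio ~1.4 -> pigmented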
Sonnleitner, Andreas; Treder, Matthias Sebastian; Simon, Michael; Willmann, Sven; Ewald, Arne; Buchner, Axel; Schrauf, Michael
2014-01-01
Driver distraction is responsible for a substantial number of traffic accidents. This paper describes the impact of an auditory secondary task on drivers' mental states during a primary driving task. N=20 participants performed the test procedure in a car following task with repeated forced braking on a non-public test track. Performance measures (provoked reaction time to brake lights) and brain activity (EEG alpha spindles) were analyzed to describe distracted drivers. Further, a classification approach was used to investigate whether alpha spindles can predict drivers' mental states. Results show that reaction times and alpha spindle rate increased with time-on-task. Moreover, brake reaction times and alpha spindle rate were significantly higher while driving with auditory secondary task opposed to driving only. In single-trial classification, a combination of spindle parameters yielded a median classification error of about 8% in discriminating the distracted from the alert driving. Reduced driving performance (i.e., prolonged brake reaction times) during increased cognitive load is assumed to be indicated by EEG alpha spindles, enabling the quantification of driver distraction in experiments on public roads without verbally assessing the drivers' mental states. Copyright © 2013 Elsevier Ltd. All rights reserved.
Development of a scale of executive functioning for the RBANS.
Spencer, Robert J; Kitchen Andren, Katherine A; Tolle, Kathryn A
2018-01-01
The Repeatable Battery for the Assessment of Neuropsychological Status (RBANS) is a cognitive battery that contains scales of several cognitive abilities, but no scale in the instrument is exclusively dedicated to executive functioning. Although the subtests allow for observation of executive-type errors, each error is of fairly low base rate, and healthy and clinical normative data are lacking on the frequency of these types of errors, making their significance difficult to interpret in isolation. The aim of this project was to create an RBANS executive errors scale (RBANS EE) with items comprised of qualitatively dysexecutive errors committed throughout the test. Participants included Veterans referred for outpatient neuropsychological testing. Items were initially selected based on theoretical literature and were retained based on item-total correlations. The RBANS EE (a percentage calculated by dividing the number of dysexecutive errors by the total number of responses) was moderately related to each of seven established measures of executive functioning and was strongly predictive of dichotomous classification of executive impairment. Thus, the scale had solid concurrent validity, justifying its use as a supplementary scale. The RBANS EE requires no additional administration time and can provide a quantified measure of otherwise unmeasured aspects of executive functioning.
NASA Astrophysics Data System (ADS)
Książek, Judyta
2015-10-01
At present, there is great interest in the development of texture-based image classification methods in many different areas. This study presents the results of research carried out to assess the usefulness of selected textural features for the detection of asbestos-cement roofs in orthophotomap classification. Two different orthophotomaps of southern Poland (with ground resolutions of 5 cm and 25 cm) were used. On both orthoimages, representative samples for two classes, asbestos-cement roofing sheets and other roofing materials, were selected. The estimation of texture analysis usefulness was conducted using machine learning methods based on decision trees (the C5.0 algorithm). For this purpose, various sets of texture parameters were calculated in the MaZda software. During the calculation of the decision trees, different numbers of texture parameter groups were considered. In order to obtain the best settings for the decision tree models, cross-validation was performed. The decision tree models with the lowest mean classification error were selected. The accuracy of the classification was assessed based on validation data sets, which were not used for classification learning. For the 5 cm ground resolution samples, the lowest mean classification error was 15.6%. The lowest mean classification error in the case of 25 cm ground resolution was 20.0%. The obtained results confirm the potential usefulness of texture-parameter image processing for the detection of asbestos-cement roofing sheets. In order to improve the accuracy, an extended study should be considered in which additional textural features as well as spectral characteristics are analyzed.
Caprihan, A; Pearlson, G D; Calhoun, V D
2008-08-15
Principal component analysis (PCA) is often used to reduce the dimension of data before applying more sophisticated data analysis methods such as non-linear classification algorithms or independent component analysis. This practice is based on selecting components corresponding to the largest eigenvalues. If the ultimate goal is separation of the data into two groups, then this set of components need not have the most discriminatory power. We measured the distance between the two populations using the Mahalanobis distance and chose the eigenvectors that maximize it; we call this modified PCA method discriminant PCA (DPCA). DPCA was applied to diffusion tensor-based fractional anisotropy images to distinguish age-matched schizophrenia subjects from healthy controls. The performance of the proposed method was evaluated by the leave-one-out method. We show that for this fractional anisotropy data set, the classification error with 60 components was close to the minimum error and that the Mahalanobis distance was twice as large with DPCA as with PCA. Finally, by masking the discriminant function with the white matter tracts of the Johns Hopkins University atlas, we identified left superior longitudinal fasciculus as the tract that gave the lowest classification error. In addition, with six optimally chosen tracts the classification error was zero.
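To make the DPCA idea described above concrete, a minimal sketch of a discriminant-PCA-style component ranking is given below: instead of keeping the components with the largest eigenvalues, the components along which the two groups are most separated in a Mahalanobis sense are kept. The function name, the per-component ranking criterion, and the toy data are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch: rank pooled PCA components by per-component Mahalanobis-style
# separation between two groups, instead of by eigenvalue. Illustrative only.
import numpy as np

def dpca_components(X1, X2, n_keep):
    X = np.vstack([X1, X2])
    Xc = X - X.mean(axis=0)
    # Pooled PCA via SVD of the centred data matrix; rows of Vt are components.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P1, P2 = X1 @ Vt.T, X2 @ Vt.T             # project each group onto the components
    mdiff = P1.mean(axis=0) - P2.mean(axis=0)
    pooled_var = (P1.var(axis=0, ddof=1) + P2.var(axis=0, ddof=1)) / 2.0
    score = mdiff ** 2 / pooled_var            # per-component separation score
    order = np.argsort(score)[::-1]            # most discriminative components first
    return Vt[order[:n_keep]]

# Toy usage: the groups differ along a low-variance axis, which plain PCA would discard.
rng = np.random.default_rng(0)
X1 = rng.normal([0.0, 0.0], [5.0, 0.2], size=(100, 2))
X2 = rng.normal([0.0, 1.0], [5.0, 0.2], size=(100, 2))
print(dpca_components(X1, X2, n_keep=1))
```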
Using reconstructed IVUS images for coronary plaque classification.
Caballero, Karla L; Barajas, Joel; Pujol, Oriol; Rodriguez, Oriol; Radeva, Petia
2007-01-01
Coronary plaque rupture is one of the principal causes of sudden death in western societies. Reliable diagnosis of the different plaque types is of great interest to the medical community for predicting their evolution and applying an effective treatment. To achieve this, a tissue classification must be performed. Intravascular Ultrasound (IVUS) is a technique to explore the vessel walls and to observe their histological properties. In this paper, a method to reconstruct IVUS images from the raw Radio Frequency (RF) data coming from the ultrasound catheter is proposed. This framework offers a normalization scheme to compare different patient studies accurately. The automatic tissue classification is based on texture analysis and the Adaptive Boosting (AdaBoost) learning technique combined with Error Correcting Output Codes (ECOC). In this study, 9 in-vivo cases are reconstructed with 7 different parameter sets. The method improves the image-based classification rate, yielding 91% well-detected tissue using the best parameter set. It also reduces the inter-patient variability compared with the analysis of DICOM images obtained from the commercial equipment.
Taghanaki, Saeid Asgari; Kawahara, Jeremy; Miles, Brandon; Hamarneh, Ghassan
2017-07-01
Feature reduction is an essential stage in computer aided breast cancer diagnosis systems. Multilayer neural networks can be trained to extract relevant features by encoding high-dimensional data into low-dimensional codes. Optimizing traditional auto-encoders works well only if the initial weights are close to a proper solution. They are also trained to only reduce the mean squared reconstruction error (MRE) between the encoder inputs and the decoder outputs, but do not address the classification error. The goal of the current work is to test the hypothesis that extending traditional auto-encoders (which only minimize reconstruction error) to multi-objective optimization for finding Pareto-optimal solutions provides more discriminative features that will improve classification performance when compared to single-objective and other multi-objective approaches (i.e. scalarized and sequential). In this paper, we introduce a novel multi-objective optimization of deep auto-encoder networks, in which the auto-encoder optimizes two objectives: MRE and mean classification error (MCE) for Pareto-optimal solutions, rather than just MRE. These two objectives are optimized simultaneously by a non-dominated sorting genetic algorithm. We tested our method on 949 X-ray mammograms categorized into 12 classes. The results show that the features identified by the proposed algorithm allow a classification accuracy of up to 98.45%, demonstrating favourable accuracy over the results of state-of-the-art methods reported in the literature. We conclude that adding the classification objective to the traditional auto-encoder objective and optimizing for finding Pareto-optimal solutions, using evolutionary multi-objective optimization, results in producing more discriminative features. Copyright © 2017 Elsevier B.V. All rights reserved.
Wildlife management by habitat units: A preliminary plan of action
NASA Technical Reports Server (NTRS)
Frentress, C. D.; Frye, R. G.
1975-01-01
Procedures for yielding vegetation type maps were developed using LANDSAT data and a computer assisted classification analysis (LARSYS) to assist in managing populations of wildlife species by defined area units. Ground cover in Travis County, Texas was classified on two occasions using a modified version of the unsupervised approach to classification. The first classification produced a total of 17 classes. Examination revealed that further grouping was justified. A second analysis produced 10 classes which were displayed on printouts which were later color-coded. The final classification was 82 percent accurate. While the classification map appeared to satisfactorily depict the existing vegetation, two classes were determined to contain significant error. The major sources of error could have been eliminated by stratifying cluster sites more closely among previously mapped soil associations that are identified with particular plant associations and by precisely defining class nomenclature using established criteria early in the analysis.
NASA Astrophysics Data System (ADS)
Richards, Joseph W.; Starr, Dan L.; Miller, Adam A.; Bloom, Joshua S.; Butler, Nathaniel R.; Brink, Henrik; Crellin-Quick, Arien
2012-12-01
With growing data volumes from synoptic surveys, astronomers necessarily must become more abstracted from the discovery and introspection processes. Given the scarcity of follow-up resources, there is a particularly sharp onus on the frameworks that replace these human roles to provide accurate and well-calibrated probabilistic classification catalogs. Such catalogs inform the subsequent follow-up, allowing consumers to optimize the selection of specific sources for further study and permitting rigorous treatment of classification purities and efficiencies for population studies. Here, we describe a process to produce a probabilistic classification catalog of variability with machine learning from a multi-epoch photometric survey. In addition to producing accurate classifications, we show how to estimate calibrated class probabilities and motivate the importance of probability calibration. We also introduce a methodology for feature-based anomaly detection, which allows discovery of objects in the survey that do not fit within the predefined class taxonomy. Finally, we apply these methods to sources observed by the All-Sky Automated Survey (ASAS), and release the Machine-learned ASAS Classification Catalog (MACC), a 28 class probabilistic classification catalog of 50,124 ASAS sources in the ASAS Catalog of Variable Stars. We estimate that MACC achieves a sub-20% classification error rate and demonstrate that the class posterior probabilities are reasonably calibrated. MACC classifications compare favorably to the classifications of several previous domain-specific ASAS papers and to the ASAS Catalog of Variable Stars, which had classified only 24% of those sources into one of 12 science classes.
Shahly, Victoria; Berglund, Patricia A; Coulouvrat, Catherine; Fitzgerald, Timothy; Hajak, Goeran; Roth, Thomas; Shillington, Alicia C; Stephenson, Judith J; Walsh, James K; Kessler, Ronald C
2012-10-01
Insomnia is a common and seriously impairing condition that often goes unrecognized. To examine associations of broadly defined insomnia (ie, meeting inclusion criteria for a diagnosis from International Statistical Classification of Diseases, 10th Revision, DSM-IV, or Research Diagnostic Criteria/International Classification of Sleep Disorders, Second Edition) with costly workplace accidents and errors after excluding other chronic conditions among workers in the America Insomnia Survey (AIS). A national cross-sectional telephone survey (65.0% cooperation rate) of commercially insured health plan members selected from the more than 34 million in the HealthCore Integrated Research Database. Four thousand nine hundred ninety-one employed AIS respondents. Costly workplace accidents or errors in the 12 months before the AIS interview were assessed with one question about workplace accidents "that either caused damage or work disruption with a value of $500 or more" and another about other mistakes "that cost your company $500 or more." Current insomnia with duration of at least 12 months was assessed with the Brief Insomnia Questionnaire, a validated (area under the receiver operating characteristic curve, 0.86 compared with diagnoses based on blinded clinical reappraisal interviews), fully structured diagnostic interview. Eighteen other chronic conditions were assessed with medical/pharmacy claims records and validated self-report scales. Insomnia was significantly associated with workplace accidents and/or errors after controlling for other chronic conditions (odds ratio, 1.4). The odds ratio did not vary significantly with respondent age, sex, educational level, or comorbidity. The average costs of insomnia-related accidents and errors ($32 062) were significantly higher than those of other accidents and errors ($21 914). Simulations estimated that insomnia was associated with 7.2% of all costly workplace accidents and errors and 23.7% of all the costs of these incidents. These proportions are higher than for any other chronic condition, with annualized US population projections of 274 000 costly insomnia-related workplace accidents and errors having a combined value of US $31.1 billion. Effectiveness trials are needed to determine whether expanded screening, outreach, and treatment of workers with insomnia would yield a positive return on investment for employers.
Classification Model for Forest Fire Hotspot Occurrences Prediction Using ANFIS Algorithm
NASA Astrophysics Data System (ADS)
Wijayanto, A. K.; Sani, O.; Kartika, N. D.; Herdiyeni, Y.
2017-01-01
This study proposed the application of a data mining technique, the Adaptive Neuro-Fuzzy Inference System (ANFIS), to forest fire hotspot data to develop classification models for hotspot occurrence in Central Kalimantan. A hotspot is a point indicated as the location of a fire. In this study, the hotspot distribution is categorized into true alarms and false alarms. ANFIS is a soft computing method in which a given input-output data set is expressed in a fuzzy inference system (FIS). The FIS implements a nonlinear mapping from its input space to the output space. The method classified hotspots as target objects by correlating spatial attribute data, using three folds in the ANFIS algorithm to obtain the best model. The best result, obtained from the 3rd fold, provided a low training error (error = 0.0093676) and also a low testing error (error = 0.0093676). Distance to road is the most influential attribute for the probability of true and false alarms, as the level of human activity is higher near roads. This classification model can be used to develop an early warning system for forest fires.
CREW Escape Capsule Retrorocket Concept. Volume 2. Selection of a Retrorocket System
1977-05-01
… combination with recovery system descent rates of 30 through 60 ft/sec … thrusts are calculated by equation 1. A value of ignition height error E = ±0.5 ft, or 2|E| = 1.0 ft, was selected as a design goal based on previous …
2004-01-01
… chlorophyll content, and the more vigorous the growth, the greater the reflectance. This helps the photointerpreter to better distinguish between plant and … land cover map and the field determinations for each class. Based on the error matrix, the accuracy rate for classifying each map class can be … duckweed (Lemna, Spirodela, and Wolffia) and other nonrooted floating aquatics. Because duckweed is free-floating, it can relocate day-to-day …
A simple randomisation procedure for validating discriminant analysis: a methodological note.
Wastell, D G
1987-04-01
Because the goal of discriminant analysis (DA) is to optimise classification, it designedly exaggerates between-group differences. This bias complicates validation of DA. Jack-knifing has been used for validation but is inappropriate when stepwise selection (SWDA) is employed. A simple randomisation test is presented which is shown to give correct decisions for SWDA. The general superiority of randomisation tests over orthodox significance tests is discussed. Current work on non-parametric methods of estimating the error rates of prediction rules is briefly reviewed.
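To make the idea above concrete, here is a minimal sketch of a label-permutation (randomisation) test wrapped around a stepwise discriminant analysis: the entire selection-plus-classification pipeline is refitted on permuted labels to build a null distribution of the apparent accuracy. The greedy forward-selection rule, the number of permutations, and the use of scikit-learn's LDA are illustrative assumptions rather than Wastell's exact procedure.

```python
# Hedged sketch of a randomisation test for stepwise discriminant analysis (SWDA).
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def swda_apparent_accuracy(X, y, n_features):
    """Greedy forward feature selection, then resubstitution (apparent) accuracy."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(n_features):
        best_f, best_acc = None, -1.0
        for f in remaining:
            cols = selected + [f]
            acc = LinearDiscriminantAnalysis().fit(X[:, cols], y).score(X[:, cols], y)
            if acc > best_acc:
                best_f, best_acc = f, acc
        selected.append(best_f)
        remaining.remove(best_f)
    return best_acc

def randomisation_test(X, y, n_features=3, n_perm=200, seed=0):
    rng = np.random.default_rng(seed)
    observed = swda_apparent_accuracy(X, y, n_features)
    null = [swda_apparent_accuracy(X, rng.permutation(y), n_features)
            for _ in range(n_perm)]
    # p-value: how often randomly labelled data does at least as well as the real labels.
    p = (1 + sum(a >= observed for a in null)) / (n_perm + 1)
    return observed, p
```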
Iterative random vs. Kennard-Stone sampling for IR spectrum-based classification task using PLS2-DA
NASA Astrophysics Data System (ADS)
Lee, Loong Chuen; Liong, Choong-Yeun; Jemain, Abdul Aziz
2018-04-01
External testing (ET) is preferred over auto-prediction (AP) or k-fold cross-validation for estimating the more realistic predictive ability of a statistical model. With IR spectra, the Kennard-Stone (KS) sampling algorithm is often used to split the data into training and test sets, i.e. for model construction and for model testing, respectively. On the other hand, iterative random sampling (IRS) has not been the favored choice, though it is theoretically more likely to produce a reliable estimation. The aim of this preliminary work is to compare the performance of KS and IRS in sampling a representative training set from an attenuated total reflectance - Fourier transform infrared spectral dataset (of four varieties of blue gel pen inks) for PLS2-DA modeling. The 'best' performance achievable from the dataset is estimated with AP on the full dataset (APF,error). Both IRS (n = 200) and KS were used to split the dataset in the ratio of 7:3. The classic decision rule (i.e. maximum value-based) is employed for new sample prediction via partial least squares - discriminant analysis (PLS2-DA). The error rate of each model was estimated repeatedly via: (a) AP on the full data (APF,error); (b) AP on the training set (APS,error); and (c) ET on the respective test set (ETS,error). A good PLS2-DA model is expected to produce APS,error and ETS,error values similar to APF,error. Bearing that in mind, the similarities between (a) APS,error vs. APF,error; (b) ETS,error vs. APF,error; and (c) APS,error vs. ETS,error were evaluated using correlation tests (i.e. Pearson and Spearman's rank tests) on series of PLS2-DA models computed from the KS set and the IRS set, respectively. Overall, models constructed from the IRS set exhibit more similarity between the internal and external error rates than those from the respective KS set, i.e. less risk of overfitting. In conclusion, IRS is more reliable than KS in sampling a representative training set.
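For reference, a minimal sketch of the Kennard-Stone split used in the comparison above is shown below (iterative random sampling, by contrast, simply repeats a random 7:3 split many times). The function name and the plain Euclidean distance are illustrative assumptions.

```python
# Hedged sketch of Kennard-Stone (KS) sample selection on a spectral matrix X
# (rows = samples): start from the two most distant samples, then repeatedly add
# the sample farthest from the already-selected set.
import numpy as np

def kennard_stone(X, n_train):
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)   # pairwise distances
    i, j = np.unravel_index(np.argmax(d), d.shape)
    chosen = [int(i), int(j)]
    rest = set(range(len(X))) - set(chosen)
    while len(chosen) < n_train:
        rest_idx = np.array(sorted(rest))
        nearest = d[np.ix_(rest_idx, chosen)].min(axis=1)   # distance to nearest chosen sample
        nxt = int(rest_idx[np.argmax(nearest)])
        chosen.append(nxt)
        rest.remove(nxt)
    return np.array(chosen)

# Usage sketch for a 7:3 split:
# train_idx = kennard_stone(X, int(0.7 * len(X)))
# test_idx = np.setdiff1d(np.arange(len(X)), train_idx)
```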
Approximated mutual information training for speech recognition using myoelectric signals.
Guo, Hua J; Chan, A D C
2006-01-01
A new training algorithm called the approximated maximum mutual information (AMMI) is proposed to improve the accuracy of myoelectric speech recognition using hidden Markov models (HMMs). Previous studies have demonstrated that automatic speech recognition can be performed using myoelectric signals from articulatory muscles of the face. Classification of facial myoelectric signals can be performed using HMMs that are trained using the maximum likelihood (ML) algorithm; however, this algorithm maximizes the likelihood of the observations in the training sequence, which is not directly associated with optimal classification accuracy. The AMMI training algorithm attempts to maximize the mutual information, thereby training the HMMs to optimize their parameters for discrimination. Our results show that AMMI training consistently reduces the error rates compared to those obtained with ML training, increasing the accuracy by approximately 3% on average.
LACIE performance predictor FOC users manual
NASA Technical Reports Server (NTRS)
1976-01-01
The LACIE Performance Predictor (LPP) is a computer simulation of the LACIE process for predicting worldwide wheat production. The simulation provides for the introduction of various errors into the system and provides estimates based on these errors, thus allowing the user to determine the impact of selected error sources. The FOC LPP simulates the acquisition of the sample segment data by the LANDSAT Satellite (DAPTS), the classification of the agricultural area within the sample segment (CAMS), the estimation of the wheat yield (YES), and the production estimation and aggregation (CAS). These elements include data acquisition characteristics, environmental conditions, classification algorithms, the LACIE aggregation and data adjustment procedures. The operational structure for simulating these elements consists of the following key programs: (1) LACIE Utility Maintenance Process, (2) System Error Executive, (3) Ephemeris Generator, (4) Access Generator, (5) Acquisition Selector, (6) LACIE Error Model (LEM), and (7) Post Processor.
Sensitivity of geographic information system outputs to errors in remotely sensed data
NASA Technical Reports Server (NTRS)
Ramapriyan, H. K.; Boyd, R. K.; Gunther, F. J.; Lu, Y. C.
1981-01-01
The sensitivity of the outputs of a geographic information system (GIS) to errors in inputs derived from remotely sensed data (RSD) is investigated using a suitability model with per-cell decisions and a gridded geographic data base whose cells are larger than the RSD pixels. The process of preparing RSD as input to a GIS is analyzed, and the errors associated with classification and registration are examined. In the case of the model considered, it is found that the errors caused during classification and registration are partially compensated by the aggregation of pixels. The compensation is quantified by means of an analytical model, a Monte Carlo simulation, and experiments with Landsat data. The results show that error reductions of the order of 50% occur because of aggregation when 25 pixels of RSD are used per cell in the geographic data base.
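A minimal Monte Carlo sketch of the aggregation effect described above is shown below: per-pixel classification errors are simulated, 25 pixels are aggregated into each geographic cell, and the cell-level proportion error comes out well below the pixel-level error rate. The error rate and cell size are illustrative values, not the study's parameters.

```python
# Hedged sketch: how aggregating 25 RSD pixels per GIS cell dampens per-pixel
# classification error when estimating per-cell class proportions.
import numpy as np

rng = np.random.default_rng(1)
n_cells, pixels_per_cell, pixel_error = 10_000, 25, 0.15

true_prop = rng.uniform(0.0, 1.0, n_cells)                        # per-cell class fraction
true_pixels = rng.random((n_cells, pixels_per_cell)) < true_prop[:, None]

# Each pixel label is flipped independently with the per-pixel error rate.
flips = rng.random((n_cells, pixels_per_cell)) < pixel_error
observed_pixels = np.where(flips, ~true_pixels, true_pixels)

pixel_level_error = flips.mean()
cell_level_error = np.abs(observed_pixels.mean(axis=1)
                          - true_pixels.mean(axis=1)).mean()
print(f"per-pixel error rate:                {pixel_level_error:.3f}")
print(f"mean abs. error in cell proportions: {cell_level_error:.3f}")
```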
Improving ECG Classification Accuracy Using an Ensemble of Neural Network Modules
Javadi, Mehrdad; Ebrahimpour, Reza; Sajedin, Atena; Faridi, Soheil; Zakernejad, Shokoufeh
2011-01-01
This paper illustrates the use of a combined neural network model based on Stacked Generalization method for classification of electrocardiogram (ECG) beats. In conventional Stacked Generalization method, the combiner learns to map the base classifiers' outputs to the target data. We claim adding the input pattern to the base classifiers' outputs helps the combiner to obtain knowledge about the input space and as the result, performs better on the same task. Experimental results support our claim that the additional knowledge according to the input space, improves the performance of the proposed method which is called Modified Stacked Generalization. In particular, for classification of 14966 ECG beats that were not previously seen during training phase, the Modified Stacked Generalization method reduced the error rate for 12.41% in comparison with the best of ten popular classifier fusion methods including Max, Min, Average, Product, Majority Voting, Borda Count, Decision Templates, Weighted Averaging based on Particle Swarm Optimization and Stacked Generalization. PMID:22046232
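A minimal sketch of the core idea (giving the combiner both the base classifiers' outputs and the raw input pattern) is given below. The base and combiner classifiers, the toy data, and the direct use of training-set outputs (proper stacking would use out-of-fold base outputs) are illustrative assumptions, not the paper's configuration.

```python
# Hedged sketch of Modified Stacked Generalization: the combiner's features are the
# base classifiers' class probabilities, optionally concatenated with the input pattern.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, n_informative=8,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

bases = [MLPClassifier(max_iter=500, random_state=0).fit(X_tr, y_tr),
         DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_tr, y_tr)]

def combiner_features(X, add_input):
    probs = np.hstack([b.predict_proba(X) for b in bases])
    return np.hstack([probs, X]) if add_input else probs

for add_input in (False, True):
    comb = LogisticRegression(max_iter=1000).fit(combiner_features(X_tr, add_input), y_tr)
    acc = comb.score(combiner_features(X_te, add_input), y_te)
    label = "outputs + input pattern" if add_input else "outputs only"
    print(f"combiner features = {label}: accuracy = {acc:.3f}")
```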
Kuster, Nils; Cristol, Jean-Paul; Cavalier, Etienne; Bargnoux, Anne-Sophie; Halimi, Jean-Michel; Froissart, Marc; Piéroni, Laurence; Delanaye, Pierre
2014-01-20
The National Kidney Disease Education Program group demonstrated that the MDRD equation is sensitive to creatinine measurement error, particularly at higher glomerular filtration rates. Thus, MDRD-based eGFR above 60 mL/min/1.73 m² should not be reported numerically. However, little is known about the impact of analytical error on CKD-EPI-based estimates. This study aimed at assessing the impact of the analytical characteristics (bias and imprecision) of 12 enzymatic and 4 compensated Jaffe creatinine assays, previously characterized, on MDRD and CKD-EPI eGFR. In a simulation study, the impact of analytical error was assessed on a hospital population of 24084 patients. The ability of each assay to correctly classify patients according to chronic kidney disease (CKD) stages was evaluated. For eGFR between 60 and 90 mL/min/1.73 m², both equations were sensitive to analytical error. Compensated Jaffe assays displayed high bias in this range and led to poorer sensitivity/specificity for classification according to CKD stages than enzymatic assays. As compared to the MDRD equation, the CKD-EPI equation decreases the impact of analytical error in creatinine measurement above 90 mL/min/1.73 m². Compensated Jaffe creatinine assays lead to important errors in eGFR and should be avoided. Accurate enzymatic assays allow estimation of eGFR up to 90 mL/min/1.73 m² with the MDRD equation and 120 mL/min/1.73 m² with the CKD-EPI equation. Copyright © 2013 Elsevier B.V. All rights reserved.
Objective Assessment of Patient Inhaler User Technique Using an Audio-Based Classification Approach.
Taylor, Terence E; Zigel, Yaniv; Egan, Clarice; Hughes, Fintan; Costello, Richard W; Reilly, Richard B
2018-02-01
Many patients make critical user technique errors when using pressurised metered dose inhalers (pMDIs) which reduce the clinical efficacy of respiratory medication. Such critical errors include poor actuation coordination (poor timing of medication release during inhalation) and inhaling too fast (peak inspiratory flow rate over 90 L/min). Here, we present a novel audio-based method that objectively assesses patient pMDI user technique. The Inhaler Compliance Assessment device was employed to record inhaler audio signals from 62 respiratory patients as they used a pMDI with an In-Check Flo-Tone device attached to the inhaler mouthpiece. Using a quadratic discriminant analysis approach, the audio-based method generated a total frame-by-frame accuracy of 88.2% in classifying sound events (actuation, inhalation and exhalation). The audio-based method estimated the peak inspiratory flow rate and volume of inhalations with an accuracy of 88.2% and 83.94% respectively. It was detected that 89% of patients made at least one critical user technique error even after tuition from an expert clinical reviewer. This method provides a more clinically accurate assessment of patient inhaler user technique than standard checklist methods.
Error Analysis in Composition of Iranian Lower Intermediate Students
ERIC Educational Resources Information Center
Taghavi, Mehdi
2012-01-01
Learners make errors during the process of learning languages. This study examines errors in the writing task of twenty Iranian lower intermediate male students aged between 13 and 15. The subject given to the participants was a composition about the seasons of the year. All of the errors were identified and classified. Corder's classification (1967)…
Hinton-Bayre, Anton D
2011-02-01
There is an ongoing debate over the preferred method(s) for determining the reliable change (RC) in individual scores over time. In the present paper, specificity comparisons of several classic and contemporary RC models were made using a real data set. This included a more detailed review of a new RC model recently proposed in this journal, that used the within-subjects standard deviation (WSD) as the error term. It was suggested that the RC(WSD) was more sensitive to change and theoretically superior. The current paper demonstrated that even in the presence of mean practice effects, false-positive rates were comparable across models when reliability was good and initial and retest variances were equivalent. However, when variances differed, discrepancies in classification across models became evident. Notably, the RC using the WSD provided unacceptably high false-positive rates in this setting. It was considered that the WSD was never intended for measuring change in this manner. The WSD actually combines systematic and error variance. The systematic variance comes from measurable between-treatment differences, commonly referred to as practice effect. It was further demonstrated that removal of the systematic variance and appropriate modification of the residual error term for the purpose of testing individual change yielded an error term already published and criticized in the literature. A consensus on the RC approach is needed. To that end, further comparison of models under varied conditions is encouraged.
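For context, one widely cited practice-adjusted reliable change formulation is sketched below; this is a generic textbook form, and the specific RC models compared in the paper (including the WSD-based variant) differ precisely in how the error term is constructed.

```latex
\[
\mathrm{RC} \;=\; \frac{(X_2 - X_1) - (\bar{X}_2 - \bar{X}_1)}
{\sqrt{\mathrm{SEM}_1^{2} + \mathrm{SEM}_2^{2}}},
\qquad
\mathrm{SEM}_k \;=\; \mathrm{SD}_k\,\sqrt{1 - r_{12}},
\]
```

Here $X_1$ and $X_2$ are an individual's test and retest scores, $\bar{X}_2 - \bar{X}_1$ is the mean practice effect, $\mathrm{SD}_k$ are the group standard deviations at each occasion, and $r_{12}$ is the test-retest reliability; the point of contention in the paper is what should replace the denominator when initial and retest variances differ.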
Combining multiple decisions: applications to bioinformatics
NASA Astrophysics Data System (ADS)
Yukinawa, N.; Takenouchi, T.; Oba, S.; Ishii, S.
2008-01-01
Multi-class classification is one of the fundamental tasks in bioinformatics and typically arises in cancer diagnosis studies by gene expression profiling. This article reviews two recent approaches to multi-class classification by combining multiple binary classifiers, which are formulated based on a unified framework of error-correcting output coding (ECOC). The first approach is to construct a multi-class classifier in which each binary classifier to be aggregated has a weight value to be optimally tuned based on the observed data. In the second approach, misclassification of each binary classifier is formulated as a bit inversion error with a probabilistic model by making an analogy to the context of information transmission theory. Experimental studies using various real-world datasets including cancer classification problems reveal that both of the new methods are superior or comparable to other multi-class classification methods.
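A minimal sketch of the shared ECOC machinery that both reviewed approaches build on is shown below: each class is assigned a binary code word, one binary classifier is trained per code bit, and a new sample is decoded to the class with the nearest code word. The code matrix, the logistic regression base learners, and plain Hamming decoding are illustrative assumptions; the reviewed methods respectively add learned weights per binary classifier and a probabilistic bit-inversion model.

```python
# Hedged sketch of error-correcting output coding (ECOC) for multi-class problems.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Example 4-class code matrix: rows = classes, columns = binary dichotomies.
CODE = np.array([[0, 0, 0, 1, 1, 1],
                 [0, 1, 1, 0, 0, 1],
                 [1, 0, 1, 0, 1, 0],
                 [1, 1, 0, 1, 0, 0]])

def ecoc_fit(X, y):
    # One binary learner per column; its targets are that column's bits for each label.
    return [LogisticRegression(max_iter=1000).fit(X, CODE[y, j])
            for j in range(CODE.shape[1])]

def ecoc_predict(models, X):
    bits = np.column_stack([m.predict(X) for m in models])
    # Decode: assign the class whose code word is nearest in Hamming distance.
    hamming = np.abs(bits[:, None, :] - CODE[None, :, :]).sum(axis=2)
    return hamming.argmin(axis=1)
```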
Credit Risk Evaluation Using a C-Variable Least Squares Support Vector Classification Model
NASA Astrophysics Data System (ADS)
Yu, Lean; Wang, Shouyang; Lai, K. K.
Credit risk evaluation is one of the most important issues in financial risk management. In this paper, a C-variable least squares support vector classification (C-VLSSVC) model is proposed for credit risk analysis. The main idea of this model is based on the prior knowledge that different classes may have different importance for modeling and that more weight should be given to the classes with more importance. The C-VLSSVC model can be constructed by a simple modification of the regularization parameter in LSSVC, whereby more weight is given to the least squares classification errors of important classes than to those of unimportant classes, while keeping the regularization term in its original form. For illustration purposes, a real-world credit dataset is used to test the effectiveness of the C-VLSSVC model.
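A sketch of the class-weighted objective implied by this description is given below; the notation is illustrative and may not match the authors' exactly.

```latex
\[
\min_{w,\,b,\,e}\;\; \tfrac{1}{2}\,w^{\top}w
\;+\; \tfrac{C}{2}\sum_{i=1}^{N} c_{y_i}\, e_i^{2}
\qquad \text{s.t.}\quad
y_i\bigl(w^{\top}\phi(x_i) + b\bigr) \;=\; 1 - e_i,
\quad i = 1,\dots,N,
\]
```

where the regularization term keeps its original LSSVC form and $c_{y_i} > 0$ is larger for the more important class, so the least squares classification errors of that class are penalized more heavily.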
Multiclass Classification of Cardiac Arrhythmia Using Improved Feature Selection and SVM Invariants.
Mustaqeem, Anam; Anwar, Syed Muhammad; Majid, Muahammad
2018-01-01
Arrhythmia is considered a life-threatening disease causing serious health issues in patients, when left untreated. An early diagnosis of arrhythmias would be helpful in saving lives. This study is conducted to classify patients into one of the sixteen subclasses, among which one class represents absence of disease and the other fifteen classes represent electrocardiogram records of various subtypes of arrhythmias. The research is carried out on the dataset taken from the University of California at Irvine Machine Learning Data Repository. The dataset contains a large volume of feature dimensions which are reduced using wrapper based feature selection technique. For multiclass classification, support vector machine (SVM) based approaches including one-against-one (OAO), one-against-all (OAA), and error-correction code (ECC) are employed to detect the presence and absence of arrhythmias. The SVM method results are compared with other standard machine learning classifiers using varying parameters and the performance of the classifiers is evaluated using accuracy, kappa statistics, and root mean square error. The results show that OAO method of SVM outperforms all other classifiers by achieving an accuracy rate of 81.11% when used with 80/20 data split and 92.07% using 90/10 data split option.
Exception handling for sensor fusion
NASA Astrophysics Data System (ADS)
Chavez, G. T.; Murphy, Robin R.
1993-08-01
This paper presents a control scheme for handling sensing failures (sensor malfunctions, significant degradations in performance due to changes in the environment, and errant expectations) in sensor fusion for autonomous mobile robots. The advantages of the exception handling mechanism are that it emphasizes a fast response to sensing failures, is able to use only a partial causal model of sensing failure, and leads to a graceful degradation of sensing if the sensing failure cannot be compensated for. The exception handling mechanism consists of two modules: error classification and error recovery. The error classification module in the exception handler attempts to classify the type and source(s) of the error using a modified generate-and-test procedure. If the source of the error is isolated, the error recovery module examines its cache of recovery schemes, which either repair or replace the current sensing configuration. If the failure is due to an error in expectation or cannot be identified, the planner is alerted. Experiments using actual sensor data collected by the CSM Mobile Robotics/Machine Perception Laboratory's Denning mobile robot demonstrate the operation of the exception handling mechanism.
Using beta binomials to estimate classification uncertainty for ensemble models.
Clark, Robert D; Liang, Wenkel; Lee, Adam C; Lawless, Michael S; Fraczkiewicz, Robert; Waldman, Marvin
2014-01-01
Quantitative structure-activity relationship (QSAR) models have enormous potential for reducing drug discovery and development costs as well as the need for animal testing. Great strides have been made in estimating their overall reliability, but to fully realize that potential, researchers and regulators need to know how confident they can be in individual predictions. Submodels in an ensemble model that have been trained on different subsets of a shared training pool represent multiple samples of the model space, and the degree of agreement among them contains information on the reliability of ensemble predictions. For artificial neural network ensembles (ANNEs) using two different methods for determining ensemble classification - one using vote tallies and the other averaging individual network outputs - we have found that the distribution of predictions across positive vote tallies can be reasonably well-modeled as a beta binomial distribution, as can the distribution of errors. Together, these two distributions can be used to estimate the probability that a given predictive classification will be in error. Large data sets comprised of logP, Ames mutagenicity, and CYP2D6 inhibition data are used to illustrate and validate the method. The distributions of predictions and errors for the training pool accurately predicted the distribution of predictions and errors for large external validation sets, even when the numbers of positive and negative examples in the training pool were not balanced. Moreover, the likelihood of a given compound being prospectively misclassified as a function of the degree of consensus between networks in the ensemble could in most cases be estimated accurately from the fitted beta binomial distributions for the training pool. Confidence in an individual predictive classification by an ensemble model can be accurately assessed by examining the distributions of predictions and errors as a function of the degree of agreement among the constituent submodels. Further, ensemble uncertainty estimation can often be improved by adjusting the voting or classification threshold based on the parameters of the error distribution. Finally, the profiles for models whose predictive uncertainty estimates are not reliable provide clues to that effect without the need for comparison to an external test set.
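As a concrete illustration of the first step above, a minimal sketch of fitting a beta binomial to ensemble vote tallies is given below; the maximum likelihood fit via Nelder-Mead, the toy data, and the function names are assumptions made for the example, not the authors' implementation.

```python
# Hedged sketch: fit a beta-binomial to the positive-vote tallies of an n-member
# ensemble. A second fit to the tallies of misclassified cases would then allow
# estimating the probability that a prediction with a given tally is in error.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import betabinom

def fit_betabinom(tallies, n):
    """Maximum likelihood estimate of beta-binomial parameters (a, b)."""
    def nll(log_ab):
        a, b = np.exp(log_ab)
        return -betabinom.logpmf(tallies, n, a, b).sum()
    res = minimize(nll, x0=np.log([1.0, 1.0]), method="Nelder-Mead")
    return np.exp(res.x)

# Toy usage: vote tallies from a 20-member ensemble with Beta(6, 2) vote probabilities.
rng = np.random.default_rng(0)
n_members = 20
tallies = rng.binomial(n_members, rng.beta(6.0, 2.0, size=1000))
a_hat, b_hat = fit_betabinom(tallies, n_members)
print(f"fitted beta-binomial parameters: a = {a_hat:.2f}, b = {b_hat:.2f}")
```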
Pornography classification: The hidden clues in video space-time.
Moreira, Daniel; Avila, Sandra; Perez, Mauricio; Moraes, Daniel; Testoni, Vanessa; Valle, Eduardo; Goldenstein, Siome; Rocha, Anderson
2016-11-01
As web technologies and social networks become part of the general public's life, the problem of automatically detecting pornography is on every parent's mind - nobody feels completely safe when their children go online. In this paper, we focus on video-pornography classification, a hard problem in which traditional methods often employ still-image techniques - labeling frames individually prior to a global decision. Frame-based approaches, however, ignore significant cogent information brought by motion. Here, we introduce a space-temporal interest point detector and descriptor called Temporal Robust Features (TRoF). TRoF was custom-tailored for efficient (low processing time and memory footprint) and effective (high classification accuracy and low false negative rate) motion description, particularly suited to the task at hand. We aggregate local information extracted by TRoF into a mid-level representation using Fisher Vectors, the state-of-the-art model of Bags of Visual Words (BoVW). We evaluate our original strategy, contrasting it both to commercial pornography detection solutions, and to BoVW solutions based upon other space-temporal features from the scientific literature. The performance is assessed using the Pornography-2k dataset, a new challenging pornographic benchmark, comprising 2000 web videos and 140h of video footage. The dataset is also a contribution of this work and is very assorted, including both professional and amateur content, and it depicts several genres of pornography, from cartoon to live action, with diverse behavior and ethnicity. The best approach, based on a dense application of TRoF, yields a classification error reduction of almost 79% when compared to the best commercial classifier. A sparse description relying on the TRoF detector is also noteworthy, for yielding a classification error reduction of over 69%, with a 19× smaller memory footprint than the dense solution, and it can also be implemented to meet real-time requirements. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Wang, Rong
2015-01-01
In real-world applications, the images of faces vary with illumination, facial expression, and pose. More training samples are able to reveal more of the possible appearances of a face. Though minimum squared error classification (MSEC) is a widely used method, its application to face recognition usually suffers from the problem of a limited number of training samples. In this paper, we improve MSEC by using mirror faces as virtual training samples. We obtain the mirror faces generated from the original training samples and put both kinds of samples into a new training set. The face recognition experiments show that our method achieves high classification accuracy.
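A minimal sketch of the approach just described is given below: each training image is flipped horizontally to create a mirror face, the mirrored copies are added as virtual training samples, and a minimum squared error (least squares) classifier is fitted to the enlarged set. The one-hot least squares formulation of MSEC and the function names are illustrative assumptions.

```python
# Hedged sketch: mirror faces as virtual training samples for a minimum squared
# error classifier (least squares regression onto one-hot class targets).
import numpy as np

def augment_with_mirrors(images, labels):
    """images: (n, height, width). Appends horizontally flipped copies."""
    mirrored = images[:, :, ::-1]
    return np.concatenate([images, mirrored]), np.concatenate([labels, labels])

def msec_train(images, labels, n_classes):
    X = images.reshape(len(images), -1).astype(float)
    X = np.hstack([X, np.ones((len(X), 1))])           # bias column
    T = np.eye(n_classes)[labels]                       # one-hot targets
    W, *_ = np.linalg.lstsq(X, T, rcond=None)
    return W

def msec_predict(W, images):
    X = images.reshape(len(images), -1).astype(float)
    X = np.hstack([X, np.ones((len(X), 1))])
    return (X @ W).argmax(axis=1)

# Usage sketch:
# X_aug, y_aug = augment_with_mirrors(train_images, train_labels)
# W = msec_train(X_aug, y_aug, n_classes)
# predictions = msec_predict(W, test_images)
```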
Kim, Junghoe; Calhoun, Vince D; Shim, Eunsoo; Lee, Jong-Hwan
2016-01-01
Functional connectivity (FC) patterns obtained from resting-state functional magnetic resonance imaging data are commonly employed to study neuropsychiatric conditions by using pattern classifiers such as the support vector machine (SVM). Meanwhile, a deep neural network (DNN) with multiple hidden layers has shown its ability to systematically extract lower-to-higher level information of image and speech data from lower-to-higher hidden layers, markedly enhancing classification accuracy. The objective of this study was to adopt the DNN for whole-brain resting-state FC pattern classification of schizophrenia (SZ) patients vs. healthy controls (HCs) and identification of aberrant FC patterns associated with SZ. We hypothesized that the lower-to-higher level features learned via the DNN would significantly enhance the classification accuracy, and proposed an adaptive learning algorithm to explicitly control the weight sparsity in each hidden layer via L1-norm regularization. Furthermore, the weights were initialized via stacked autoencoder based pre-training to further improve the classification performance. Classification accuracy was systematically evaluated as a function of (1) the number of hidden layers/nodes, (2) the use of L1-norm regularization, (3) the use of the pre-training, (4) the use of framewise displacement (FD) removal, and (5) the use of anatomical/functional parcellation. Using FC patterns from anatomically parcellated regions without FD removal, an error rate of 14.2% was achieved by employing three hidden layers and 50 hidden nodes with both L1-norm regularization and pre-training, which was substantially lower than the error rate from the SVM (22.3%). Moreover, the trained DNN weights (i.e., the learned features) were found to represent the hierarchical organization of aberrant FC patterns in SZ compared with HC. Specifically, pairs of nodes extracted from the lower hidden layer represented sparse FC patterns implicated in SZ, which was quantified by using kurtosis/modularity measures and features from the higher hidden layer showed holistic/global FC patterns differentiating SZ from HC. Our proposed schemes and reported findings attained by using the DNN classifier and whole-brain FC data suggest that such approaches show improved ability to learn hidden patterns in brain imaging data, which may be useful for developing diagnostic tools for SZ and other neuropsychiatric disorders and identifying associated aberrant FC patterns. Copyright © 2015 Elsevier Inc. All rights reserved.
Rank preserving sparse learning for Kinect based scene classification.
Tao, Dapeng; Jin, Lianwen; Yang, Zhao; Li, Xuelong
2013-10-01
With the rapid development of RGB-D sensors and the rapidly growing adoption of the low-cost Microsoft Kinect sensor, scene classification, which is a hard, yet important, problem in computer vision, has gained a resurgence of interest recently. That is because the depth information provided by the Kinect sensor opens an effective and innovative way for scene classification. In this paper, we propose a new scheme for scene classification, which applies locality-constrained linear coding (LLC) to local SIFT features for representing the RGB-D samples and classifies scenes through the cooperation between a new rank preserving sparse learning (RPSL) based dimension reduction and a simple classification method. RPSL considers four aspects: 1) it preserves the rank order information of the within-class samples in a local patch; 2) it maximizes the margin between the between-class samples on the local patch; 3) the L1-norm penalty is introduced to obtain the parsimony property; and 4) it models classification error minimization using least-squares error minimization. Experiments are conducted on the NYU Depth V1 dataset and demonstrate the robustness and effectiveness of RPSL for scene classification.
Authentication of the botanical and geographical origin of honey by mid-infrared spectroscopy.
Ruoff, Kaspar; Luginbühl, Werner; Künzli, Raphael; Iglesias, María Teresa; Bogdanov, Stefan; Bosset, Jacques Olivier; von der Ohe, Katharina; von der Ohe, Werner; Amado, Renato
2006-09-06
The potential of Fourier transform mid-infrared spectroscopy (FT-MIR) using an attenuated total reflectance (ATR) cell was evaluated for the authentication of 11 unifloral (acacia, alpine rose, chestnut, dandelion, heather, lime, rape, fir honeydew, metcalfa honeydew, oak honeydew) and polyfloral honey types (n = 411 samples) previously classified with traditional methods such as chemical, pollen, and sensory analysis. Chemometric evaluation of the spectra was carried out by applying principal component analysis and linear discriminant analysis, the error rates of the discriminant models being calculated by using Bayes' theorem. The error rates ranged from <0.1% (polyfloral and heather honeys as well as honeydew honeys from metcalfa, oak, and fir) to 8.3% (alpine rose honey) in both jackknife classification and validation, depending on the honey type considered. This study indicates that ATR-MIR spectroscopy is a valuable tool for the authentication of the botanical origin and quality control and may also be useful for the determination of the geographical origin of honey.
On the error in crop acreage estimation using satellite (LANDSAT) data
NASA Technical Reports Server (NTRS)
Chhikara, R. (Principal Investigator)
1983-01-01
The problem of crop acreage estimation using satellite data is discussed. Bias and variance of a crop proportion estimate in an area segment obtained from the classification of its multispectral sensor data are derived as functions of the means, variances, and covariance of error rates. The linear discriminant analysis and the class proportion estimation for the two class case are extended to include a third class of measurement units, where these units are mixed on ground. Special attention is given to the investigation of mislabeling in training samples and its effect on crop proportion estimation. It is shown that the bias and variance of the estimate of a specific crop acreage proportion increase as the disparity in mislabeling rates between two classes increases. Some interaction is shown to take place, causing the bias and the variance to decrease at first and then to increase, as the mixed unit class varies in size from 0 to 50 percent of the total area segment.
NASA Astrophysics Data System (ADS)
Rokni Deilmai, B.; Ahmad, B. Bin; Zabihi, H.
2014-06-01
Mapping is essential for the analysis of land use and land cover, which influence many environmental processes and properties. When creating land cover maps, it is important to minimize error, because errors propagate into later analyses based on those maps. The reliability of land cover maps derived from remotely sensed data depends on an accurate classification. In this study, we analyzed multispectral data using two different classifiers, the Maximum Likelihood Classifier (MLC) and the Support Vector Machine (SVM). To pursue this aim, Landsat Thematic Mapper data and identical field-based training sample datasets for Johor, Malaysia, were used for each classification method, resulting in five land cover classes: forest, oil palm, urban area, water, and rubber. Classification results indicate that SVM was more accurate than MLC. With a demonstrated capability to produce reliable results, the SVM method should be especially useful for land cover classification.
On the Discriminant Analysis in the 2-Populations Case
NASA Astrophysics Data System (ADS)
Rublík, František
2008-01-01
The empirical Bayes Gaussian rule, which in the normal case yields good values of the total error probability, may yield high values of the maximum error probability. From this point of view, the modified version of the classification rule of Broffitt, Randles and Hogg presented here appears to be superior. The modification introduced in this paper is termed the WR method, and the choice of its weights is discussed. These methods are also compared with the K nearest neighbours classification rule.
Basavanhally, Ajay; Viswanath, Satish; Madabhushi, Anant
2015-01-01
Clinical trials increasingly employ medical imaging data in conjunction with supervised classifiers, where the latter require large amounts of training data to accurately model the system. Yet, a classifier selected at the start of the trial based on smaller and more accessible datasets may yield inaccurate and unstable classification performance. In this paper, we aim to address two common concerns in classifier selection for clinical trials: (1) predicting expected classifier performance for large datasets based on error rates calculated from smaller datasets and (2) the selection of appropriate classifiers based on expected performance for larger datasets. We present a framework for comparative evaluation of classifiers using only limited amounts of training data by using random repeated sampling (RRS) in conjunction with a cross-validation sampling strategy. Extrapolated error rates are subsequently validated via comparison with leave-one-out cross-validation performed on a larger dataset. The ability to predict error rates as dataset size increases is demonstrated on both synthetic data as well as three different computational imaging tasks: detecting cancerous image regions in prostate histopathology, differentiating high and low grade cancer in breast histopathology, and detecting cancerous metavoxels in prostate magnetic resonance spectroscopy. For each task, the relationships between 3 distinct classifiers (k-nearest neighbor, naive Bayes, Support Vector Machine) are explored. Further quantitative evaluation in terms of interquartile range (IQR) suggests that our approach consistently yields error rates with lower variability (mean IQRs of 0.0070, 0.0127, and 0.0140) than a traditional RRS approach (mean IQRs of 0.0297, 0.0779, and 0.305) that does not employ cross-validation sampling for all three datasets. PMID:25993029
A classification of errors in lay comprehension of medical documents.
Keselman, Alla; Smith, Catherine Arnott
2012-12-01
Emphasis on participatory medicine requires that patients and consumers participate in tasks traditionally reserved for healthcare providers. This includes reading and comprehending medical documents, often but not necessarily in the context of interacting with Personal Health Records (PHRs). Research suggests that while giving patients access to medical documents has many benefits (e.g., improved patient-provider communication), lay people often have difficulty understanding medical information. Informatics can address the problem by developing tools that support comprehension; this requires in-depth understanding of the nature and causes of errors that lay people make when comprehending clinical documents. The objective of this study was to develop a classification scheme of comprehension errors, based on lay individuals' retellings of two documents containing clinical text: a description of a clinical trial and a typical office visit note. While not comprehensive, the scheme can serve as a foundation of further development of a taxonomy of patients' comprehension errors. Eighty participants, all healthy volunteers, read and retold two medical documents. A data-driven content analysis procedure was used to extract and classify retelling errors. The resulting hierarchical classification scheme contains nine categories and 23 subcategories. The most common error made by the participants involved incorrectly recalling brand names of medications. Other common errors included misunderstanding clinical concepts, misreporting the objective of a clinical research study and physician's findings during a patient's visit, and confusing and misspelling clinical terms. A combination of informatics support and health education is likely to improve the accuracy of lay comprehension of medical documents. Published by Elsevier Inc.
NASA Astrophysics Data System (ADS)
Song, YoungJae; Sepulveda, Francisco
2017-02-01
Objective. Self-paced EEG-based BCIs (SP-BCIs) have traditionally been avoided due to two sources of uncertainty: (1) precisely when an intentional command is sent by the brain, i.e., the command onset detection problem, and (2) how different the intentional command is when compared to non-specific (or idle) states. Performance evaluation is also a problem and there are no suitable standard metrics available. In this paper we attempted to tackle these issues. Approach. Self-paced covert sound-production cognitive tasks (i.e., high pitch and siren-like sounds) were used to distinguish between intentional commands (IC) and idle states. The IC states were chosen for their ease of execution and negligible overlap with common cognitive states. Band power and a digital wavelet transform were used for feature extraction, and the Davies-Bouldin index was used for feature selection. Classification was performed using linear discriminant analysis. Main results. Performance was evaluated under offline and simulated-online conditions. For the latter, a performance score called true-false-positive (TFP) rate, ranging from 0 (poor) to 100 (perfect), was created to take into account both classification performance and onset timing errors. Averaging the results from the best performing IC task for all seven participants, an 77.7% true-positive (TP) rate was achieved in offline testing. For simulated-online analysis the best IC average TFP score was 76.67% (87.61% TP rate, 4.05% false-positive rate). Significance. Results were promising when compared to previous IC onset detection studies using motor imagery, in which best TP rates were reported as 72.0% and 79.7%, and which, crucially, did not take timing errors into account. Moreover, based on our literature review, there is no previous covert sound-production onset detection system for spBCIs. Results showed that the proposed onset detection technique and TFP performance metric have good potential for use in SP-BCIs.
Boubchir, Larbi; Touati, Youcef; Daachi, Boubaker; Chérif, Arab Ali
2015-08-01
In thought-based steering of robots, error potentials (ErrP) can appear when the action resulting from the brain-machine interface (BMI) classifier/controller does not correspond to the user's thought. Using the Steady State Visual Evoked Potentials (SSVEP) techniques, ErrP, which appear when a classification error occurs, are not easily recognizable by only examining the temporal or frequency characteristics of EEG signals. A supplementary classification process is therefore needed to identify them in order to stop the course of the action and back up to a recovery state. This paper presents a set of time-frequency (t-f) features for the detection and classification of EEG ErrP in extra-brain activities due to misclassification observed by a user exploiting non-invasive BMI and robot control in the task space. The proposed features are able to characterize and detect ErrP activities in the t-f domain. These features are derived from the information embedded in the t-f representation of EEG signals, and include the Instantaneous Frequency (IF), t-f information complexity, SVD information, energy concentration and sub-bands' energies. The experiment results on real EEG data show that the use of the proposed t-f features for detecting and classifying EEG ErrP achieved an overall classification accuracy up to 97% for 50 EEG segments using 2-class SVM classifier.
Dai, Shengfa; Wei, Qingguo
2017-01-01
The common spatial pattern algorithm is widely used to estimate spatial filters in motor imagery based brain-computer interfaces. However, the use of a large number of channels makes common spatial pattern prone to over-fitting and the classification of electroencephalographic signals time-consuming. To overcome these problems, it is necessary to choose an optimal subset of all channels to save computational time and improve the classification accuracy. In this paper, the backtracking search optimization algorithm is proposed to automatically select the optimal channel set for common spatial pattern. Each individual in the population is an N-dimensional binary vector, with each component representing one channel. A population of binary codes is generated randomly at the beginning, and channels are then selected according to the evolution of these codes. The number and positions of 1's in a code denote the number and positions of the chosen channels. The objective function of the backtracking search optimization algorithm is defined as the combination of the classification error rate and the relative number of channels, as sketched below. Experimental results suggest that higher classification accuracy can be achieved with far fewer channels compared to standard common spatial pattern with all channels.
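A minimal sketch of such an objective function is given below: a binary code selects the channels, a simple log-variance feature plus LDA pipeline is cross-validated on those channels, and the error rate is combined with the relative number of channels retained. The log-variance features stand in for the paper's CSP features, and the weighting constant is an illustrative assumption.

```python
# Hedged sketch of the channel-selection fitness evaluated for each binary code.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

def channel_fitness(code, epochs, y, lam=0.1):
    """code: binary vector over channels; epochs: (n_trials, n_channels, n_times)."""
    keep = np.flatnonzero(code)
    if keep.size == 0:
        return np.inf                                    # an empty channel set is invalid
    # Log band power of each kept channel as a simple stand-in for CSP features.
    feats = np.log(epochs[:, keep, :].var(axis=2))
    acc = cross_val_score(LinearDiscriminantAnalysis(), feats, y, cv=5).mean()
    error_rate = 1.0 - acc
    # Lower is better: misclassification plus a penalty on the channel count.
    return error_rate + lam * keep.size / epochs.shape[1]
```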
Evaluation of spatial filtering on the accuracy of wheat area estimate
NASA Technical Reports Server (NTRS)
Dejesusparada, N. (Principal Investigator); Moreira, M. A.; Chen, S. C.; Delima, A. M.
1982-01-01
A 3 x 3 pixel spatial filter for postclassification was used for wheat classification to evaluate the effects of this procedure on the accuracy of area estimation using LANDSAT digital data obtained from a single pass. Quantitative analyses were carried out in five test sites (approx 40 sq km each) and t tests showed that filtering with threshold values significantly decreased errors of commission and omission. In area estimation filtering improved the overestimate of 4.5% to 2.7% and the root-mean-square error decreased from 126.18 ha to 107.02 ha. Extrapolating the same procedure of automatic classification using spatial filtering for postclassification to the whole study area, the accuracy in area estimate was improved from the overestimate of 10.9% to 9.7%. It is concluded that when single pass LANDSAT data is used for crop identification and area estimation the postclassification procedure using a spatial filter provides a more accurate area estimate by reducing classification errors.
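A minimal sketch of a 3 x 3 majority (modal) post-classification filter with a threshold, as used in the study above, is shown below; the threshold value and the use of scipy's generic_filter are illustrative assumptions, not the study's implementation.

```python
# Hedged sketch: 3 x 3 modal filter for smoothing a per-pixel classification map,
# replacing the centre label only when enough of the 9 neighbours agree.
import numpy as np
from scipy.ndimage import generic_filter

def majority_filter(class_map, threshold=5):
    def modal(window):                    # window arrives as a flattened 3 x 3 patch
        labels, counts = np.unique(window.astype(int), return_counts=True)
        k = counts.argmax()
        centre = window[len(window) // 2]
        return labels[k] if counts[k] >= threshold else centre
    return generic_filter(class_map, modal, size=3, mode="nearest")

# Usage sketch: smoothed_map = majority_filter(classified_image)
```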
NASA Technical Reports Server (NTRS)
Chittineni, C. B.
1979-01-01
The problem of estimating label imperfections and the use of the estimation in identifying mislabeled patterns is presented. Expressions for the maximum likelihood estimates of classification errors and a priori probabilities are derived from the classification of a set of labeled patterns. Expressions also are given for the asymptotic variances of probability of correct classification and proportions. Simple models are developed for imperfections in the labels and for classification errors and are used in the formulation of a maximum likelihood estimation scheme. Schemes are presented for the identification of mislabeled patterns in terms of threshold on the discriminant functions for both two-class and multiclass cases. Expressions are derived for the probability that the imperfect label identification scheme will result in a wrong decision and are used in computing thresholds. The results of practical applications of these techniques in the processing of remotely sensed multispectral data are presented.
Quality improvement through implementation of discharge order reconciliation.
Lu, Yun; Clifford, Pamela; Bjorneby, Andreas; Thompson, Bruce; VanNorman, Samuel; Won, Katie; Larsen, Kevin
2013-05-01
A coordinated multidisciplinary process to reduce medication errors related to patient discharges to skilled-nursing facilities (SNFs) is described. After determining that medication errors were a frequent cause of readmission among patients discharged to SNFs, a medical center launched a two-phase quality-improvement project focused on cardiac and medical patients. Phase one of the project entailed a three-month failure modes and effects analysis of existing discharge procedures, followed by the development and pilot testing of a multidisciplinary, closed-loop workflow process involving staff and resident physicians, clinical nurse coordinators, and clinical pharmacists. During pilot testing of the new workflow process, the rate of discharge medication errors involving SNF patients was tracked, and data on medication-related readmissions in a designated intervention group (n = 87) and a control group of patients (n = 1893) discharged to SNFs via standard procedures during a nine-month period were collected, with the data stratified using severity of illness (SOI) classification. Analysis of the collected data indicated a cumulative 30-day medication-related readmission rate for study group patients in the minor, moderate, and major SOI categories of 5.4% (4 of 74 patients), compared with a rate of 9.5% (169 of 1780 patients) in the control group. In phase two of the project, the revised SNF discharge medication reconciliation procedure was implemented throughout the hospital; since hospitalwide implementation of the new workflow, the readmission rate for SNF patients has been maintained at about 6.7%. Implementing a standardized discharge order reconciliation process that includes pharmacists led to decreased readmission rates and improved care for patients discharged to SNFs.
Automatic tissue characterization from ultrasound imagery
NASA Astrophysics Data System (ADS)
Kadah, Yasser M.; Farag, Aly A.; Youssef, Abou-Bakr M.; Badawi, Ahmed M.
1993-08-01
In this work, feature extraction algorithms are proposed to extract the tissue characterization parameters from liver images. Then the resulting parameter set is further processed to obtain the minimum number of parameters representing the most discriminating pattern space for classification. This preprocessing step was applied to over 120 pathology-investigated cases to obtain the learning data for designing the classifier. The extracted features are divided into independent training and test sets and are used to construct both statistical and neural classifiers. The optimal criteria for these classifiers are set to have minimum error, ease of implementation and learning, and the flexibility for future modifications. Various algorithms for implementing various classification techniques are presented and tested on the data. The best performance was obtained using a single layer tensor model functional link network. Also, the voting k-nearest neighbor classifier provided comparably good diagnostic rates.
A robust probabilistic collaborative representation based classification for multimodal biometrics
NASA Astrophysics Data System (ADS)
Zhang, Jing; Liu, Huanxi; Ding, Derui; Xiao, Jianli
2018-04-01
Most traditional biometric recognition systems perform recognition with a single biometric indicator. These systems suffer from noisy data, interclass variations, unacceptable error rates, forged identities, and so on. Because of these inherent problems, attempts to enhance the performance of unimodal biometric systems based on single features have had limited success. Thus, multimodal biometrics is investigated to reduce some of these defects. This paper proposes a new multimodal biometric recognition approach that fuses faces and fingerprints. For more recognizable features, the proposed method extracts block local binary pattern features for all modalities and then combines them into a single framework. For better classification, it employs the robust probabilistic collaborative representation based classifier to recognize individuals. Experimental results indicate that the proposed method improves recognition accuracy compared to unimodal biometrics.
Andreev, Victor P; Gillespie, Brenda W; Helfand, Brian T; Merion, Robert M
2016-01-01
Unsupervised classification methods are gaining acceptance in omics studies of complex common diseases, which are often vaguely defined and are likely collections of disease subtypes. Unsupervised classification based on the molecular signatures identified in omics studies has the potential to reflect molecular mechanisms of the subtypes of the disease and to lead to more targeted and successful interventions for the identified subtypes. Multiple classification algorithms exist but none is ideal for all types of data. Importantly, there are no established methods to estimate sample size in unsupervised classification (unlike power analysis in hypothesis testing). Therefore, we developed a simulation approach allowing comparison of misclassification errors and estimating the required sample size for a given effect size, number, and correlation matrix of the differentially abundant proteins in targeted proteomics studies. All the experiments were performed in silico. The simulated data imitated data expected from a study of the plasma of patients with lower urinary tract dysfunction using the aptamer-based proteomics assay Somascan (SomaLogic Inc, Boulder, CO), which targeted 1129 proteins, including 330 involved in inflammation, 180 in stress response, 80 in aging, etc. Three popular clustering methods (hierarchical, k-means, and k-medoids) were compared. K-means clustering performed much better for the simulated data than the other two methods and enabled classification with misclassification error below 5% in the simulated cohort of 100 patients based on the molecular signatures of 40 differentially abundant proteins (effect size 1.5) from among the 1129-protein panel. PMID:27524871
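As a rough illustration of the simulation idea described (two subtypes separated in 40 of 1129 proteins with effect size 1.5, clustered with k-means, misclassification counted after matching clusters to labels), a hedged sketch follows; independent proteins and Hungarian matching are simplifying assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy.optimize import linear_sum_assignment

def simulate_misclassification(n_per_group=50, n_proteins=1129, n_informative=40,
                               effect_size=1.5, seed=0):
    """Simulate two patient subtypes differing in `n_informative` proteins and
    report the k-means misclassification error (cluster ids matched to true
    labels with the Hungarian algorithm). Sizes mirror the abstract; the
    independence of proteins is a simplifying assumption."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(2 * n_per_group, n_proteins))
    X[n_per_group:, :n_informative] += effect_size          # shift subtype 2
    truth = np.repeat([0, 1], n_per_group)
    pred = KMeans(n_clusters=2, n_init=10, random_state=seed).fit_predict(X)
    cm = np.zeros((2, 2), dtype=int)                         # confusion counts
    for t, p in zip(truth, pred):
        cm[t, p] += 1
    rows, cols = linear_sum_assignment(-cm)                  # best label matching
    return 1.0 - cm[rows, cols].sum() / truth.size

print(f"misclassification error: {simulate_misclassification():.3f}")
```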
An analysis of USSPACECOM's space surveillance network sensor tasking methodology
NASA Astrophysics Data System (ADS)
Berger, Jeff M.; Moles, Joseph B.; Wilsey, David G.
1992-12-01
This study provides the basis for the development of a cost/benefit assessment model to determine the effects of alterations to the Space Surveillance Network (SSN) on orbital element (OE) set accuracy. It provides a review of current methods used by NORAD and the SSN to gather and process observations, an alternative to the current Gabbard classification method, and the development of a model to determine the effects of observation rate and correction interval on OE set accuracy. The proposed classification scheme is based on satellite J2 perturbations. Specifically, classes were established based on mean motion, eccentricity, and inclination since J2 perturbation effects are functions of only these elements. Model development began by creating representative sensor observations using a highly accurate orbital propagation model. These observations were compared to predicted observations generated using the NORAD Simplified General Perturbation (SGP4) model and differentially corrected using a Bayes, sequential estimation, algorithm. A 10-run Monte Carlo analysis was performed using this model on 12 satellites using 16 different observation rate/correction interval combinations. An ANOVA and confidence interval analysis of the results show that this model does demonstrate the differences in steady state position error based on varying observation rate and correction interval.
Computer discrimination procedures applicable to aerial and ERTS multispectral data
NASA Technical Reports Server (NTRS)
Richardson, A. J.; Torline, R. J.; Allen, W. A.
1970-01-01
Two statistical models are compared in the classification of crops recorded on color aerial photographs. A theory of error ellipses is applied to the pattern recognition problem. An elliptical boundary condition classification model (EBC), useful for recognition of candidate patterns, evolves out of error ellipse theory. The EBC model is compared with the minimum distance to the mean (MDM) classification model in terms of pattern recognition ability. The pattern recognition results of both models are interpreted graphically using scatter diagrams to represent measurement space. Measurement space, for this report, is determined by optical density measurements collected from Kodak Ektachrome Infrared Aero Film 8443 (EIR). The EBC model is shown to be a significant improvement over the MDM model.
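A hedged sketch of the two decision rules being compared, minimum distance to the mean versus an elliptical boundary built from per-class error ellipses (Mahalanobis distance under a chi-square cutoff), is shown below; the cutoff value and the toy data are assumptions, not the report's parameters.

```python
import numpy as np

def train_stats(X, y):
    """Per-class mean vectors and covariance matrices from training samples."""
    classes = np.unique(y)
    means = {c: X[y == c].mean(axis=0) for c in classes}
    covs = {c: np.cov(X[y == c], rowvar=False) for c in classes}
    return classes, means, covs

def classify_mdm(x, classes, means):
    """Minimum distance to the mean (Euclidean)."""
    d = [np.linalg.norm(x - means[c]) for c in classes]
    return classes[int(np.argmin(d))]

def classify_ebc(x, classes, means, covs, chi2_cut=9.21):
    """Elliptical boundary condition: accept a class only if the candidate
    falls inside its error ellipse (Mahalanobis distance below a chi-square
    cutoff, here the assumed 99% point for 2 features); otherwise reject."""
    best, best_d2 = None, np.inf
    for c in classes:
        diff = x - means[c]
        d2 = diff @ np.linalg.inv(covs[c]) @ diff
        if d2 < chi2_cut and d2 < best_d2:
            best, best_d2 = c, d2
    return best            # None means "unrecognized pattern"

rng = np.random.default_rng(1)
X = np.vstack([rng.normal([0, 0], 0.5, (50, 2)), rng.normal([3, 1], 0.5, (50, 2))])
y = np.repeat([0, 1], 50)
classes, means, covs = train_stats(X, y)
print(classify_mdm(np.array([2.5, 1.0]), classes, means),
      classify_ebc(np.array([10.0, 10.0]), classes, means, covs))
```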
NASA Astrophysics Data System (ADS)
Carter, Jeffrey R.; Simon, Wayne E.
1990-08-01
Neural networks are trained using Recursive Error Minimization (REM) equations to perform statistical classification. Using REM equations with continuous input variables reduces the required number of training experiences by factors of one to two orders of magnitude over standard back propagation. Replacing the continuous input variables with discrete binary representations reduces the number of connections by a factor proportional to the number of variables, reducing the required number of experiences by another order of magnitude. Undesirable effects of using recurrent experience to train neural networks for statistical classification problems are demonstrated, and nonrecurrent experience is used to avoid these undesirable effects. 1. THE I-4I PROBLEM. The statistical classification problem which we address is that of assigning points in d-dimensional space to one of two classes. The first class has covariance matrix I (the identity matrix); the covariance matrix of the second class is 4I. For this reason the problem is known as the I-4I problem. Both classes have equal probability of occurrence, and samples from both classes may appear anywhere throughout the d-dimensional space. Most samples near the origin of the coordinate system will be from the first class, while most samples away from the origin will be from the second class. Since the two classes completely overlap, it is impossible to have a classifier with zero error. The minimum possible error is known as the Bayes error and
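As a quick illustration of this setup, the Monte-Carlo sketch below estimates the Bayes error of the I-4I problem; for two zero-mean Gaussians with covariances I and 4I and equal priors, the log-likelihood ratio reduces the Bayes rule to a threshold on the squared norm of the sample. The dimensions and sample size are arbitrary.

```python
import numpy as np

def i_4i_bayes_error(d=4, n=200_000, seed=0):
    """Monte-Carlo estimate of the Bayes error for the I-4I problem in d
    dimensions: both classes are zero-mean Gaussians, covariances I and 4I,
    equal priors. The Bayes rule reduces to a threshold on ||x||^2."""
    rng = np.random.default_rng(seed)
    threshold = (8.0 / 3.0) * d * np.log(2.0)       # from the log-likelihood ratio
    x1 = rng.normal(0.0, 1.0, size=(n, d))          # class 1, covariance I
    x2 = rng.normal(0.0, 2.0, size=(n, d))          # class 2, covariance 4I
    err1 = np.mean(np.sum(x1**2, axis=1) > threshold)   # class 1 called class 2
    err2 = np.mean(np.sum(x2**2, axis=1) <= threshold)  # class 2 called class 1
    return 0.5 * (err1 + err2)

for d in (2, 4, 8):
    print(f"d={d}: estimated Bayes error = {i_4i_bayes_error(d):.3f}")
```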
Calibration of remotely sensed proportion or area estimates for misclassification error
Raymond L. Czaplewski; Glenn P. Catts
1992-01-01
Classifications of remotely sensed data contain misclassification errors that bias areal estimates. Monte Carlo techniques were used to compare two statistical methods that correct or calibrate remotely sensed areal estimates for misclassification bias using reference data from an error matrix. The inverse calibration estimator was consistently superior to the...
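The general idea of calibrating map proportions with an error matrix can be sketched as below; the confusion counts and the simple matrix-inversion form are illustrative assumptions and do not reproduce the specific estimators compared in the paper.

```python
import numpy as np

# Minimal sketch of calibrating map-based area proportions with an error
# matrix (rows = reference/true class, columns = map class); the counts are
# made up for the example.
error_matrix = np.array([[80,  5],     # reference class 1 pixels
                         [10, 55]])    # reference class 2 pixels

# P[i, j] = probability a pixel of true class i is mapped as class j
P = error_matrix / error_matrix.sum(axis=1, keepdims=True)

map_proportions = np.array([0.62, 0.38])          # proportions read off the map

# Solve map = P' . true for the calibrated (true) proportions
true_proportions = np.linalg.solve(P.T, map_proportions)
true_proportions = np.clip(true_proportions, 0, None)
true_proportions /= true_proportions.sum()        # renormalize after clipping
print(true_proportions)
```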
NASA Astrophysics Data System (ADS)
Young, A. J.; Kuiken, T. A.; Hargrove, L. J.
2014-10-01
Objective. The purpose of this study was to determine the contribution of electromyography (EMG) data, in combination with a diverse array of mechanical sensors, to locomotion mode intent recognition in transfemoral amputees using powered prostheses. Additionally, we determined the effect of adding time history information using a dynamic Bayesian network (DBN) for both the mechanical and EMG sensors. Approach. EMG signals from the residual limbs of amputees have been proposed to enhance pattern recognition-based intent recognition systems for powered lower limb prostheses, but mechanical sensors on the prosthesis—such as inertial measurement units, position and velocity sensors, and load cells—may be just as useful. EMG and mechanical sensor data were collected from 8 transfemoral amputees using a powered knee/ankle prosthesis over basic locomotion modes such as walking, slopes and stairs. An offline study was conducted to determine the benefit of different sensor sets for predicting intent. Main results. EMG information was not as accurate alone as mechanical sensor information (p < 0.05) for any classification strategy. However, EMG in combination with the mechanical sensor data did significantly reduce intent recognition errors (p < 0.05) both for transitions between locomotion modes and steady-state locomotion. The sensor time history (DBN) classifier significantly reduced error rates compared to a linear discriminant classifier for steady-state steps, without increasing the transitional error, for both EMG and mechanical sensors. Combining EMG and mechanical sensor data with sensor time history reduced the average transitional error from 18.4% to 12.2% and the average steady-state error from 3.8% to 1.0% when classifying level-ground walking, ramps, and stairs in eight transfemoral amputee subjects. Significance. These results suggest that a neural interface in combination with time history methods for locomotion mode classification can enhance intent recognition performance; this strategy should be considered for future real-time experiments.
NASA Astrophysics Data System (ADS)
d'Oleire-Oltmanns, Sebastian; Marzolff, Irene; Tiede, Dirk; Blaschke, Thomas
2015-04-01
The need for area-wide landform mapping approaches, especially in terms of land degradation, can be ascribed to the fact that within area-wide landform mapping approaches, the (spatial) context of erosional landforms is considered by providing additional information on the physiography neighboring the distinct landform. This study presents an approach for the detection of gully-affected areas by applying object-based image analysis in the region of Taroudannt, Morocco, which is highly affected by gully erosion while simultaneously representing a major region of agro-industry with a high demand for arable land. Various sensors provide readily available high-resolution optical satellite data with a much better temporal resolution than 3D terrain data, which led to the development of an area-wide mapping approach to extract gully-affected areas using only optical satellite imagery. The classification rule-set was developed with a clear focus on virtual spatial independence within the software environment of eCognition Developer. This allows the incorporation of knowledge about the target objects under investigation. Only optical QuickBird-2 satellite data and freely-available OpenStreetMap (OSM) vector data were used as input data. The OSM vector data were incorporated in order to mask out plantations and residential areas. Optical input data are more readily available for a broad range of users compared to terrain data, which is considered to be a major advantage. The methodology additionally incorporates expert knowledge and freely-available vector data in a cyclic object-based image analysis approach. This connects the two fields of geomorphology and remote sensing. The classification results allow conclusions on the current distribution of gullies. The results of the classification were checked against manually delineated reference data incorporating expert knowledge based on several field campaigns in the area, resulting in an overall classification accuracy of 62%. The error of omission accounts for 38% and the error of commission for 16%. Additionally, a manual assessment was carried out to assess the quality of the applied classification algorithm. The limited error of omission contributes with 23% to the overall error of omission and the limited error of commission contributes with 98% to the overall error of commission. This assessment improves the results and confirms the high quality of the developed approach for area-wide mapping of gully-affected areas in larger regions. In the field of landform mapping, the overall quality of the classification results is often assessed with more than one method to incorporate all aspects adequately.
Rowen, Donna; Stevens, Katherine; Labeit, Alexander; Elliott, Jackie; Mulhern, Brendan; Carlton, Jill; Basarir, Hasan; Ratcliffe, Julie; Brazier, John
2018-01-01
To describe the use of a novel approach in health valuation of a discrete-choice experiment (DCE) including a cost attribute to value a recently developed classification system for measuring the quality-of-life impact (both health and treatment experience) of self-management for diabetes. A large online survey was conducted using DCE with cost on UK respondents from the general population (n = 1497) and individuals with diabetes (n = 405). The data were modeled using a conditional logit model with robust standard errors. The marginal rate of substitution was used to generate willingness-to-pay (WTP) estimates for every state defined by the classification system. Robustness of results was assessed by including interaction effects for household income. There were some logical inconsistencies and insignificant coefficients for the milder levels of some attributes. There were some differences in the rank ordering of different attributes for the general population and diabetic patients. The WTP to avoid the most severe state was £1118.53 per month for the general population and £2356.02 per month for the diabetic patient population. The results were largely robust. Health and self-management can be valued in a single classification system using DCE with cost. The marginal rate of substitution for key attributes can be used to inform cost-benefit analysis of self-management interventions in diabetes using results from clinical studies in which this new classification system has been applied. The method shows promise, but found large WTP estimates exceeding the cost levels used in the survey. Copyright © 2018 International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc. All rights reserved.
Schwartzkopf, Wade C; Bovik, Alan C; Evans, Brian L
2005-12-01
Traditional chromosome imaging has been limited to grayscale images, but recently a 5-fluorophore combinatorial labeling technique (M-FISH) was developed wherein each class of chromosomes binds with a different combination of fluorophores. This results in a multispectral image, where each class of chromosomes has distinct spectral components. In this paper, we develop new methods for automatic chromosome identification by exploiting the multispectral information in M-FISH chromosome images and by jointly performing chromosome segmentation and classification. We (1) develop a maximum-likelihood hypothesis test that uses multispectral information, together with conventional criteria, to select the best segmentation possibility; (2) use this likelihood function to combine chromosome segmentation and classification into a robust chromosome identification system; and (3) show that the proposed likelihood function can also be used as a reliable indicator of errors in segmentation, errors in classification, and chromosome anomalies, which can be indicators of radiation damage, cancer, and a wide variety of inherited diseases. We show that the proposed multispectral joint segmentation-classification method outperforms past grayscale segmentation methods when decomposing touching chromosomes. We also show that it outperforms past M-FISH classification techniques that do not use segmentation information.
NASA Technical Reports Server (NTRS)
Mobasseri, B. G.; Mcgillem, C. D.; Anuta, P. E. (Principal Investigator)
1978-01-01
The author has identified the following significant results. The probability of correct classification of various populations in the data was defined as the primary performance index. Because the multispectral data were also multiclass in nature, a Bayes error estimation procedure that depended on a set of class statistics alone was required. The classification error was expressed in terms of an N-dimensional integral, where N was the dimensionality of the feature space. The multispectral scanner spatial model was represented by a linear shift-invariant multiple-port system in which the N spectral bands comprised the input processes. The scanner characteristic function, the relationship governing the transformation of the input spatial (and hence spectral) correlation matrices through the system, was developed.
Carroll, Kristen L; Murray, Kathleen A; MacLeod, Lynne M; Hennessey, Theresa A; Woiczik, Marcella R; Roach, James W
2011-06-01
Numerous studies underscore the poor intraobserver and interobserver reliability of both the center edge angle (CEA) and the Severin classification using plain film measurements. In this study, experienced observers applied a computer-assisted measurement program to determine the CEA in digital pelvic radiographs of adults who had been previously treated for dysplasia of the hip (DDH). Using a teaching aid/algorithm of the Severin classification, the observers then assigned a Severin rating to these hips. Intraobserver and interobserver errors were then calculated on both the CEA measurements and the Severin classifications. Four pediatric orthopaedic surgeons and 1 pediatric radiologist calculated the CEAs using the OrthoView TM planning system and then determined the Severin classification on 41 blinded digital pelvic radiographs. The radiographs were evaluated by each examiner twice, with evaluations separated by 2 months. All examiners reviewed a Severin classification algorithm before making their Severin assignments. The intraobserver and interobserver reliability for both the CEA and the Severin classification were calculated using the interclass correlation coefficients and Cohen and Fleiss κ scores, respectively. The intraobserver and interobserver reliability for CEA measurement was moderate to almost perfect. When we separated the Severin classification into 3 clinically relevant groups of good (Severin I and II), dysplastic (Severin III), and poor (Severin IV and above), our interobserver reliability neared almost perfect. The Severin classification is an extremely useful and oft-used radiographic measure for the success of DDH treatment. Our research found digital radiography, computer-aided measurement tools, the use of a Severin algorithm, and separating the Severin classification into 3 clinically relevant groups significantly increased the intraobserver and interobserver reliability of both the CEA and Severin classification. This finding will assist future studies using the CEA and Severin classification in the radiographic assessment of DDH treatment outcomes.
NASA Technical Reports Server (NTRS)
Fisher, Brad; Wolff, David B.
2010-01-01
Passive and active microwave rain sensors onboard earth-orbiting satellites estimate monthly rainfall from the instantaneous rain statistics collected during satellite overpasses. It is well known that climate-scale rain estimates from meteorological satellites incur sampling errors resulting from the process of discrete temporal sampling and statistical averaging. Sampling and retrieval errors ultimately become entangled in the estimation of the mean monthly rain rate. The sampling component of the error budget effectively introduces statistical noise into climate-scale rain estimates that obscures the error component associated with the instantaneous rain retrieval. Estimating the accuracy of the retrievals on monthly scales therefore necessitates a decomposition of the total error budget into sampling and retrieval error quantities. This paper presents results from a statistical evaluation of the sampling and retrieval errors for five different space-borne rain sensors on board nine orbiting satellites. Using an error decomposition methodology developed by one of the authors, sampling and retrieval errors were estimated at 0.25° resolution within 150 km of ground-based weather radars located at Kwajalein, Marshall Islands and Melbourne, Florida. Error and bias statistics were calculated according to the land, ocean and coast classifications of the surface terrain mask developed for the Goddard Profiling (GPROF) rain algorithm. Variations in the comparative error statistics are attributed to various factors related to differences in the swath geometry of each rain sensor, the orbital and instrument characteristics of the satellite and the regional climatology. The most significant result from this study was that each of the satellites incurred negative long-term oceanic retrieval biases of 10 to 30%.
Unbiased Taxonomic Annotation of Metagenomic Samples
Fosso, Bruno; Pesole, Graziano; Rosselló, Francesc
2018-01-01
Abstract The classification of reads from a metagenomic sample using a reference taxonomy is usually based on first mapping the reads to the reference sequences and then classifying each read at a node under the lowest common ancestor of the candidate sequences in the reference taxonomy with the least classification error. However, this taxonomic annotation can be biased by an imbalanced taxonomy and also by the presence of multiple nodes in the taxonomy with the least classification error for a given read. In this article, we show that the Rand index is a better indicator of classification error than the often used area under the receiver operating characteristic (ROC) curve and F-measure for both balanced and imbalanced reference taxonomies, and we also address the second source of bias by reducing the taxonomic annotation problem for a whole metagenomic sample to a set cover problem, for which a logarithmic approximation can be obtained in linear time and an exact solution can be obtained by integer linear programming. Experimental results with a proof-of-concept implementation of the set cover approach to taxonomic annotation in a next release of the TANGO software show that the set cover approach further reduces ambiguity in the taxonomic annotation obtained with TANGO without distorting the relative abundance profile of the metagenomic sample. PMID:29028181
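A greedy algorithm is the standard way to obtain the logarithmic approximation to set cover mentioned above; the sketch below illustrates it on a toy read set, with all identifiers invented (the TANGO implementation may differ).

```python
def greedy_set_cover(reads, assignments):
    """Greedy logarithmic-factor approximation of set cover, as one way to pick
    a small set of taxonomy nodes that together explain every read.

    reads       : set of read identifiers
    assignments : dict mapping each candidate taxonomy node to the set of reads
                  it could annotate with least classification error
    (Names are illustrative; the TANGO implementation may differ.)"""
    uncovered = set(reads)
    chosen = []
    while uncovered:
        node = max(assignments, key=lambda n: len(assignments[n] & uncovered))
        gained = assignments[node] & uncovered
        if not gained:                     # remaining reads cannot be covered
            break
        chosen.append(node)
        uncovered -= gained
    return chosen

reads = {"r1", "r2", "r3", "r4", "r5"}
candidates = {"nodeA": {"r1", "r2", "r3"},
              "nodeB": {"r3", "r4"},
              "nodeC": {"r4", "r5"},
              "nodeD": {"r1", "r5"}}
print(greedy_set_cover(reads, candidates))   # e.g. ['nodeA', 'nodeC']
```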
Global land cover mapping: a review and uncertainty analysis
Congalton, Russell G.; Gu, Jianyu; Yadav, Kamini; Thenkabail, Prasad S.; Ozdogan, Mutlu
2014-01-01
Given the advances in remotely sensed imagery and associated technologies, several global land cover maps have been produced in recent times including IGBP DISCover, UMD Land Cover, Global Land Cover 2000 and GlobCover 2009. However, the utility of these maps for specific applications has often been hampered due to considerable amounts of uncertainties and inconsistencies. A thorough review of these global land cover projects including evaluating the sources of error and uncertainty is prudent and enlightening. Therefore, this paper describes our work in which we compared, summarized and conducted an uncertainty analysis of the four global land cover mapping projects using an error budget approach. The results showed that the classification scheme and the validation methodology had the highest error contribution and implementation priority. A comparison of the classification schemes showed that there are many inconsistencies between the definitions of the map classes. This is especially true for the mixed type classes for which thresholds vary for the attributes/discriminators used in the classification process. Examination of these four global mapping projects provided quite a few important lessons for the future global mapping projects including the need for clear and uniform definitions of the classification scheme and an efficient, practical, and valid design of the accuracy assessment.
Reliability of the Walker Cranial Nonmetric Method and Implications for Sex Estimation.
Lewis, Cheyenne J; Garvin, Heather M
2016-05-01
The cranial trait scoring method presented in Buikstra and Ubelaker (Standards for data collection from human skeletal remains. Fayetteville, AR: Arkansas Archeological Survey Research Series No. 44, 1994) and Walker (Am J Phys Anthropol 136:39, 2008) is the most common nonmetric cranial sex estimation method utilized by physical and forensic anthropologists. As such, the reliability and accuracy of the method is vital to ensure its validity in forensic applications. In this study, inter- and intra-observer error rates for the Walker scoring method were calculated using a sample of U.S. White and Black individuals (n = 135). Cohen's weighted kappas, intraclass correlation coefficients, and percentage agreements indicate good agreement between trials and observers for all traits except the mental eminence. Slight disagreement in scoring, however, was found to impact sex classifications, leading to lower accuracy rates than those published by Walker. Furthermore, experience does appear to impact trait scoring and sex classification. The use of revised population-specific equations that avoid the mental eminence is highly recommended to minimize the potential for misclassifications. © 2016 American Academy of Forensic Sciences.
Error minimizing algorithms for nearest neighbor classifiers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Porter, Reid B; Hush, Don; Zimmer, G. Beate
2011-01-03
Stack Filters define a large class of discrete nonlinear filters first introduced in image and signal processing for noise removal. In recent years we have suggested their application to classification problems, and investigated their relationship to other types of discrete classifiers such as Decision Trees. In this paper we focus on a continuous domain version of Stack Filter Classifiers which we call Ordered Hypothesis Machines (OHM), and investigate their relationship to Nearest Neighbor classifiers. We show that OHM classifiers provide a novel framework in which to train Nearest Neighbor type classifiers by minimizing empirical error based loss functions. We use the framework to investigate a new cost-sensitive loss function that allows us to train a Nearest Neighbor type classifier for low false alarm rate applications. We report results on both synthetic data and real-world image data.
Defining and classifying medical error: lessons for patient safety reporting systems.
Tamuz, M; Thomas, E J; Franchois, K E
2004-02-01
It is important for healthcare providers to report safety related events, but little attention has been paid to how the definition and classification of events affects a hospital's ability to learn from its experience. To examine how the definition and classification of safety related events influences key organizational routines for gathering information, allocating incentives, and analyzing event reporting data. In semi-structured interviews, professional staff and administrators in a tertiary care teaching hospital and its pharmacy were asked to describe the existing programs designed to monitor medication safety, including the reporting systems. With a focus primarily on the pharmacy staff, interviews were audio recorded, transcribed, and analyzed using qualitative research methods. Eighty six interviews were conducted, including 36 in the hospital pharmacy. Examples are presented which show that: (1) the definition of an event could lead to under-reporting; (2) the classification of a medication error into alternative categories can influence the perceived incentives and disincentives for incident reporting; (3) event classification can enhance or impede organizational routines for data analysis and learning; and (4) routines that promote organizational learning within the pharmacy can reduce the flow of medication error data to the hospital. These findings from one hospital raise important practical and research questions about how reporting systems are influenced by the definition and classification of safety related events. By understanding more clearly how hospitals define and classify their experience, we may improve our capacity to learn and ultimately improve patient safety.
Fisher classifier and its probability of error estimation
NASA Technical Reports Server (NTRS)
Chittineni, C. B.
1979-01-01
Computationally efficient expressions are derived for estimating the probability of error using the leave-one-out method. The optimal threshold for the classification of patterns projected onto Fisher's direction is derived. A simple generalization of the Fisher classifier to multiple classes is presented. Computational expressions are developed for estimating the probability of error of the multiclass Fisher classifier.
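A rough sketch of these ingredients, Fisher's direction, a threshold on the projected values, and a leave-one-out error estimate, is shown below; the midpoint-of-projected-means threshold is a simple stand-in for the optimal threshold derived in the paper, and the data are synthetic.

```python
import numpy as np

def fisher_direction(X, y):
    """Fisher's discriminant direction for two classes (0 and 1)."""
    X0, X1 = X[y == 0], X[y == 1]
    Sw = np.cov(X0, rowvar=False) * (len(X0) - 1) + np.cov(X1, rowvar=False) * (len(X1) - 1)
    return np.linalg.solve(Sw, X1.mean(axis=0) - X0.mean(axis=0))

def leave_one_out_error(X, y):
    """Leave-one-out estimate of the probability of error for the Fisher
    classifier; the threshold is the midpoint of the projected class means."""
    errors = 0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        w = fisher_direction(X[keep], y[keep])
        p = X[keep] @ w
        t = 0.5 * (p[y[keep] == 0].mean() + p[y[keep] == 1].mean())
        pred = int(X[i] @ w > t)
        errors += pred != y[i]
    return errors / len(y)

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (40, 3)), rng.normal(1.5, 1, (40, 3))])
y = np.repeat([0, 1], 40)
print(f"LOO error estimate: {leave_one_out_error(X, y):.3f}")
```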
Human factors analysis and classification system-HFACS.
DOT National Transportation Integrated Search
2000-02-01
Human error has been implicated in 70 to 80% of all civil and military aviation accidents. Yet, most accident : reporting systems are not designed around any theoretical framework of human error. As a result, most : accident databases are not conduci...
Automatic detection of malaria parasite in blood images using two parameters.
Kim, Jong-Dae; Nam, Kyeong-Min; Park, Chan-Young; Kim, Yu-Seop; Song, Hye-Jeong
2015-01-01
Malaria must be diagnosed quickly and accurately at the initial infection stage and treated early to cure it properly. Microscope-based malaria diagnosis requires much labor and time from a skilled expert, and the diagnostic results vary greatly between individual diagnosticians. Therefore, to measure malaria parasite infection quickly and accurately, studies have been conducted on automated classification techniques using various parameters. In this study, classification performance was measured as two parameters were varied, and the parameter values that best distinguish normal from plasmodium-infected red blood cells were determined. To reduce the stain deviation of the acquired images, a principal component analysis (PCA) grayscale conversion method was used, and the two parameters were the malaria-infected area and the threshold value used in binarization. The best-performing parameter values were determined by selecting the malaria threshold value (72) with the lowest error rate for detecting plasmodium-infected red blood cells, given a cell threshold value of 128.
Selecting a Classification Ensemble and Detecting Process Drift in an Evolving Data Stream
DOE Office of Scientific and Technical Information (OSTI.GOV)
Heredia-Langner, Alejandro; Rodriguez, Luke R.; Lin, Andy
2015-09-30
We characterize the commercial behavior of a group of companies in a common line of business using a small ensemble of classifiers on a stream of records containing commercial activity information. This approach is able to effectively find a subset of classifiers that can be used to predict company labels with reasonable accuracy. Performance of the ensemble, its error rate under stable conditions, can be characterized using an exponentially weighted moving average (EWMA) statistic. The behavior of the EWMA statistic can be used to monitor a record stream from the commercial network and determine when significant changes have occurred. Results indicate that larger classification ensembles may not necessarily be optimal, pointing to the need to search the combinatorial classifier space in a systematic way. Results also show that current and past performance of an ensemble can be used to detect when statistically significant changes in the activity of the network have occurred. The dataset used in this work contains tens of thousands of high level commercial activity records with continuous and categorical variables and hundreds of labels, making classification challenging.
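Monitoring a stream of per-record classification errors with an EWMA statistic can be sketched as below; the smoothing constant, control-limit width, and in-control error rate are illustrative choices, not the values used in the report.

```python
import numpy as np

def ewma_monitor(error_stream, lam=0.1, L=3.0, p0=0.05):
    """Monitor a stream of per-record classification errors (0/1) with an EWMA
    chart. `p0` is the assumed in-control error rate; a signal is raised when
    the statistic leaves the control limits. Parameter values are illustrative."""
    z = p0
    signals = []
    for t, e in enumerate(error_stream):
        z = lam * e + (1 - lam) * z
        # asymptotic control limits for an EWMA of Bernoulli(p0) observations
        sigma = np.sqrt(p0 * (1 - p0) * lam / (2 - lam))
        if abs(z - p0) > L * sigma:
            signals.append(t)
    return signals

rng = np.random.default_rng(3)
stream = np.concatenate([rng.binomial(1, 0.05, 400),    # stable behaviour
                         rng.binomial(1, 0.20, 200)])   # process drift
print("first signal at record:", ewma_monitor(stream)[:1])
```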
Impacts of Patch Size and Landscape Heterogeneity on Thematic Image Classification Accuracy.
Currently, most thematic accuracy assessments of classified remotely sensed images only account for errors between the various classes employed, at particular pixels of interest, thu...
Sea ice classification using fast learning neural networks
NASA Technical Reports Server (NTRS)
Dawson, M. S.; Fung, A. K.; Manry, M. T.
1992-01-01
A fast learning neural network approach to the classification of sea ice is presented. The fast learning (FL) neural network and a multilayer perceptron (MLP) trained with backpropagation learning (BP network) were tested on simulated data sets based on the known dominant scattering characteristics of the target class. Four classes were used in the data simulation: open water, thick lossy saline ice, thin saline ice, and multiyear ice. The BP network was unable to consistently converge to less than 25 percent error, while the FL method yielded an average error of approximately 1 percent on the first iteration of training. The fast learning method presented can significantly reduce the CPU time necessary to train a neural network as well as consistently yield higher classification accuracy than BP networks.
Haghighi, Mohammad Hosein Hayavi; Dehghani, Mohammad; Teshnizi, Saeid Hoseini; Mahmoodi, Hamid
2014-01-01
Accurate cause of death coding leads to organised and usable death information but there are some factors that influence documentation on death certificates and therefore affect the coding. We reviewed the role of documentation errors on the accuracy of death coding at Shahid Mohammadi Hospital (SMH), Bandar Abbas, Iran. We studied the death certificates of all deceased patients in SMH from October 2010 to March 2011. Researchers determined and coded the underlying cause of death on the death certificates according to the guidelines issued by the World Health Organization in Volume 2 of the International Statistical Classification of Diseases and Health Related Problems-10th revision (ICD-10). Necessary ICD coding rules (such as the General Principle, Rules 1-3, the modification rules and other instructions about death coding) were applied to select the underlying cause of death on each certificate. Demographic details and documentation errors were then extracted. Data were analysed with descriptive statistics and chi square tests. The accuracy rate of causes of death coding was 51.7%, demonstrating a statistically significant relationship (p=.001) with major errors but not such a relationship with minor errors. Factors that result in poor quality of Cause of Death coding in SMH are lack of coder training, documentation errors and the undesirable structure of death certificates.
J-Plus: Morphological Classification Of Compact And Extended Sources By Pdf Analysis
NASA Astrophysics Data System (ADS)
López-Sanjuan, C.; Vázquez-Ramió, H.; Varela, J.; Spinoso, D.; Cristóbal-Hornillos, D.; Viironen, K.; Muniesa, D.; J-PLUS Collaboration
2017-10-01
We present a morphological classification of J-PLUS EDR sources into compact (i.e. stars) and extended (i.e. galaxies). Such a classification is based on the Bayesian modelling of the concentration distribution, including observational errors and magnitude + sky position priors. We provide the star / galaxy probability of each source computed from the gri images. The comparison with the SDSS number counts supports our classification up to r ~ 21. The 31.7 deg² analysed comprise 150k stars and 101k galaxies.
Analysis of data mining classification by comparison of C4.5 and ID algorithms
NASA Astrophysics Data System (ADS)
Sudrajat, R.; Irianingsih, I.; Krisnawan, D.
2017-01-01
The rapid development of information technology has triggered its intensive use. For example, data mining is widely used in investment. Among the many techniques that can assist in investment, the method used here for classification is the decision tree. Decision trees have a variety of algorithms, such as C4.5 and ID3. Both algorithms can generate different models, with different accuracy, from similar data sets. With discrete data, the C4.5 and ID3 algorithms achieve accuracies of 87.16% and 99.83%, respectively, and with numerical data the C4.5 algorithm achieves 89.69%. With discrete data, C4.5 and ID3 classify 520 and 598 customers, respectively, and with numerical data C4.5 classifies 546 customers. From the analysis of both algorithms it can be concluded that they classify quite well, since the error rate is less than 15%.
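The difference between the two split criteria can be made concrete: ID3 ranks attributes by information gain, while C4.5 normalizes the gain by the split information (gain ratio). The toy attributes and labels below are invented for illustration.

```python
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(feature, labels):
    """ID3 split criterion: reduction in entropy after splitting on `feature`."""
    total = entropy(labels)
    for value, count in zip(*np.unique(feature, return_counts=True)):
        total -= count / len(labels) * entropy(labels[feature == value])
    return total

def gain_ratio(feature, labels):
    """C4.5 criterion: information gain normalized by the split information,
    which penalizes attributes with many distinct values."""
    split_info = entropy(feature)
    return information_gain(feature, labels) / split_info if split_info else 0.0

# Toy customer data: credit history and income band versus a yes/no label.
credit = np.array(["good", "good", "bad", "bad", "good", "bad"])
income = np.array(["high", "low", "low", "high", "high", "low"])
label  = np.array(["yes", "yes", "no", "no", "yes", "no"])
print(information_gain(credit, label), gain_ratio(income, label))
```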
Rhodes, Nathaniel J.; Richardson, Chad L.; Heraty, Ryan; Liu, Jiajun; Malczynski, Michael; Qi, Chao
2014-01-01
While a lack of concordance is known between gold standard MIC determinations and Vitek 2, the magnitude of the discrepancy and its impact on treatment decisions for extended-spectrum-β-lactamase (ESBL)-producing Escherichia coli are not. Clinical isolates of ESBL-producing E. coli were collected from blood, tissue, and body fluid samples from January 2003 to July 2009. Resistance genotypes were identified by PCR. Primary analyses evaluated the discordance between Vitek 2 and gold standard methods using cefepime susceptibility breakpoint cutoff values of 8, 4, and 2 μg/ml. The discrepancies in MICs between the methods were classified per convention as very major, major, and minor errors. Sensitivity, specificity, and positive and negative predictive values for susceptibility classifications were calculated. A total of 304 isolates were identified; 59% (179) of the isolates carried blaCTX-M, 47% (143) carried blaTEM, and 4% (12) carried blaSHV. At a breakpoint MIC of 8 μg/ml, Vitek 2 produced a categorical agreement of 66.8% and exhibited very major, major, and minor error rates of 23% (20/87 isolates), 5.1% (8/157 isolates), and 24% (73/304), respectively. The sensitivity, specificity, and positive and negative predictive values for a susceptibility breakpoint of 8 μg/ml were 94.9%, 61.2%, 72.3%, and 91.8%, respectively. The sensitivity, specificity, and positive and negative predictive values for a susceptibility breakpoint of 2 μg/ml were 83.8%, 65.3%, 41%, and 93.3%, respectively. Vitek 2 results in unacceptably high error rates for cefepime compared to those of agar dilution for ESBL-producing E. coli. Clinicians should be wary of making treatment decisions on the basis of Vitek 2 susceptibility results for ESBL-producing E. coli. PMID:24752253
Typing mineral deposits using their grades and tonnages in an artificial neural network
Singer, Donald A.; Kouda, Ryoichi
2003-01-01
A test of the ability of a probabilistic neural network to classify deposits into types on the basis of deposit tonnage and average Cu, Mo, Ag, Au, Zn, and Pb grades is conducted. The purpose is to examine whether this type of system might serve as a basis for integrating geoscience information available in large mineral databases to classify sites by deposit type. Benefits of proper classification of many sites in large regions are relatively rapid identification of terranes permissive for deposit types and recognition of specific sites perhaps worthy of exploring further. Total tonnages and average grades of 1,137 well-explored deposits identified in published grade and tonnage models representing 13 deposit types were used to train and test the network. Tonnages were transformed by logarithms and grades by square roots to reduce effects of skewness. All values were scaled by subtracting the variable's mean and dividing by its standard deviation. Half of the deposits were selected randomly to be used in training the probabilistic neural network and the other half were used for independent testing. Tests were performed with a probabilistic neural network employing a Gaussian kernel and separate sigma weights for each class (type) and each variable (grade or tonnage). Deposit types were selected to challenge the neural network. For many types, tonnages or average grades are significantly different from other types, but individual deposits may plot in the grade and tonnage space of more than one type. Porphyry Cu, porphyry Cu-Au, and porphyry Cu-Mo types have similar tonnages and relatively small differences in grades. Redbed Cu deposits typically have tonnages that could be confused with porphyry Cu deposits, and also contain Cu and, in some situations, Ag. Cyprus and kuroko massive sulfide types have about the same tonnages and Cu, Zn, Ag, and Au grades. Polymetallic vein, sedimentary exhalative Zn-Pb, and Zn-Pb skarn types contain many of the same metals. Sediment-hosted Au, Comstock Au-Ag, and low-sulfide Au-quartz vein types are principally Au deposits with differing amounts of Ag. Given the intent to test the neural network under the most difficult conditions, an overall 75% agreement between the experts and the neural network is considered excellent. Among the largest classification errors are skarn Zn-Pb and Cyprus massive sulfide deposits classed by the neural network as kuroko massive sulfides (24 and 63% error, respectively). Other large errors are the classification of 92% of porphyry Cu-Mo as porphyry Cu deposits. Most of the larger classification errors involve 25 or fewer training deposits, suggesting that some errors might be the result of small sample size. About 91% of the gold deposit types were classed properly and 98% of porphyry Cu deposits were classed as some type of porphyry Cu deposit. An experienced economic geologist would not make many of the classification errors that were made by the neural network because the geologic settings of deposits would be used to reduce errors. In a separate test, the probabilistic neural network correctly classed 93% of 336 deposits in eight deposit types when trained with presence or absence of 58 minerals and six generalized rock types. The overall success rate of the probabilistic neural network when trained on tonnage and average grades would probably be more than 90% with additional information on the presence of a few rock types.
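The core computation of a probabilistic neural network, a Gaussian kernel density per class with classification to the largest, can be sketched as below; the single shared sigma and the toy grade-and-tonnage data are simplifications of the paper's separate sigma weights per class and per variable.

```python
import numpy as np

def pnn_classify(x, X_train, y_train, sigma=0.5):
    """Probabilistic neural network with a Gaussian kernel: each training
    deposit contributes a kernel vote to its own class; the class with the
    largest summed kernel density wins. A single shared `sigma` is used here,
    whereas the paper fits separate sigmas per class and per variable."""
    scores = {}
    for c in np.unique(y_train):
        d2 = np.sum((X_train[y_train == c] - x) ** 2, axis=1)
        scores[c] = np.mean(np.exp(-d2 / (2.0 * sigma ** 2)))
    return max(scores, key=scores.get)

rng = np.random.default_rng(4)
# toy "deposits": log-tonnage plus two transformed grades, three deposit types
X = np.vstack([rng.normal([1, 0.5, 0.2], 0.3, (30, 3)),
               rng.normal([2, 0.4, 0.8], 0.3, (30, 3)),
               rng.normal([3, 1.0, 0.1], 0.3, (30, 3))])
y = np.repeat([0, 1, 2], 30)
print(pnn_classify(np.array([2.1, 0.45, 0.75]), X, y))
```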
Halftoning Algorithms and Systems.
1996-08-01
[Report documentation page fragment. Keywords: halftoning algorithms; error diffusion; color printing; topographic maps. Recoverable abstract fragments mention calibrating graylevels for each screen level and, for error diffusion algorithms, a novel centering concept for overlapping correction on paper/transparency (patent applied 5/94), with applications to error diffusion and dithering (IS&T).]
Simulation techniques for estimating error in the classification of normal patterns
NASA Technical Reports Server (NTRS)
Whitsitt, S. J.; Landgrebe, D. A.
1974-01-01
Methods of efficiently generating and classifying samples with specified multivariate normal distributions were discussed. Conservative confidence tables for sample sizes are given for selective sampling. Simulation results are compared with classified training data. Techniques for comparing error and separability measure for two normal patterns are investigated and used to display the relationship between the error and the Chernoff bound.
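The kind of comparison described, a Monte-Carlo error estimate for two multivariate normal patterns set against the Chernoff bound, can be sketched as follows; the means, covariances, and sample sizes are arbitrary illustrations.

```python
import numpy as np

def chernoff_bound_gaussians(m0, m1, S0, S1, s=0.5):
    """Chernoff bound on the Bayes error for two Gaussians with equal priors
    (s = 0.5 gives the Bhattacharyya bound)."""
    Sm = (1 - s) * S0 + s * S1
    diff = m1 - m0
    k = (s * (1 - s) / 2) * diff @ np.linalg.solve(Sm, diff) \
        + 0.5 * np.log(np.linalg.det(Sm)
                       / (np.linalg.det(S0) ** (1 - s) * np.linalg.det(S1) ** s))
    return 0.5 * np.exp(-k)

def empirical_error(m0, m1, S0, S1, n=100_000, seed=5):
    """Monte-Carlo error of the Bayes (quadratic) rule on simulated samples."""
    rng = np.random.default_rng(seed)
    def loglik(x, m, S):
        d = x - m
        return -0.5 * np.sum(d @ np.linalg.inv(S) * d, axis=1) \
               - 0.5 * np.log(np.linalg.det(S))
    x0 = rng.multivariate_normal(m0, S0, n)
    x1 = rng.multivariate_normal(m1, S1, n)
    e0 = np.mean(loglik(x0, m1, S1) > loglik(x0, m0, S0))
    e1 = np.mean(loglik(x1, m0, S0) > loglik(x1, m1, S1))
    return 0.5 * (e0 + e1)

m0, m1 = np.zeros(2), np.array([1.5, 0.5])
S0, S1 = np.eye(2), np.array([[1.0, 0.3], [0.3, 1.5]])
print(empirical_error(m0, m1, S0, S1), chernoff_bound_gaussians(m0, m1, S0, S1))
```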
An embedded implementation based on adaptive filter bank for brain-computer interface systems.
Belwafi, Kais; Romain, Olivier; Gannouni, Sofien; Ghaffari, Fakhreddine; Djemal, Ridha; Ouni, Bouraoui
2018-07-15
Brain-computer interface (BCI) is a new communication pathway for users with neurological deficiencies. The implementation of a BCI system requires complex electroencephalography (EEG) signal processing including filtering, feature extraction and classification algorithms. Most current BCI systems are implemented on personal computers. Therefore, there is a great interest in implementing BCI on embedded platforms to meet system specifications in terms of time response, cost effectiveness, power consumption, and accuracy. This article presents an embedded-BCI (EBCI) system based on a Stratix-IV field programmable gate array. The proposed system relies on the weighted overlap-add (WOLA) algorithm to perform dynamic filtering of EEG-signals by analyzing the event-related desynchronization/synchronization (ERD/ERS). The EEG-signals are classified, using the linear discriminant analysis algorithm, based on their spatial features. The proposed system performs fast classification within a time delay of 0.430 s/trial, achieving an average accuracy of 76.80% according to an offline approach and 80.25% using our own recording. The estimated power consumption of the prototype is approximately 0.7 W. Results show that the proposed EBCI system reduces the overall classification error rate for the three datasets of the BCI-competition by 5% compared to other similar implementations. Moreover, experiments show that the proposed system maintains a high accuracy rate with a short processing time, a low power consumption, and a low cost. Performing dynamic filtering of EEG-signals using WOLA increases the recognition rate of ERD/ERS patterns of motor imagery brain activity. This approach makes it possible to develop a complete prototype of an EBCI system that achieves excellent accuracy rates. Copyright © 2018 Elsevier B.V. All rights reserved.
Zhang, Jian-Hua; Peng, Xiao-Di; Liu, Hua; Raisch, Jörg; Wang, Ru-Bin
2013-12-01
The human operator's ability to perform their tasks can fluctuate over time. Because the cognitive demands of the task can also vary it is possible that the capabilities of the operator are not sufficient to satisfy the job demands. This can lead to serious errors when the operator is overwhelmed by the task demands. Psychophysiological measures, such as heart rate and brain activity, can be used to monitor operator cognitive workload. In this paper, the most influential psychophysiological measures are extracted to characterize Operator Functional State (OFS) in automated tasks under a complex form of human-automation interaction. The fuzzy c-mean (FCM) algorithm is used and tested for its OFS classification performance. The results obtained have shown the feasibility and effectiveness of the FCM algorithm as well as the utility of the selected input features for OFS classification. Besides being able to cope with nonlinearity and fuzzy uncertainty in the psychophysiological data it can provide information about the relative importance of the input features as well as the confidence estimate of the classification results. The OFS pattern classification method developed can be incorporated into an adaptive aiding system in order to enhance the overall performance of a large class of safety-critical human-machine cooperative systems.
NASA Astrophysics Data System (ADS)
Tsevas, S.; Iakovidis, D. K.
2011-11-01
Pulmonary infiltrates are common radiological findings indicating the filling of airspaces with fluid, inflammatory exudates, or cells. They are most common in cases of pneumonia, acute respiratory syndrome, atelectasis, pulmonary oedema and haemorrhage, whereas their extent is usually correlated with the extent or the severity of the underlying disease. In this paper we propose a novel pattern recognition framework for the measurement of the extent of pulmonary infiltrates in routine chest radiographs. The proposed framework follows a hierarchical approach to the assessment of image content. It includes the following: (a) sampling of the lung fields; (b) extraction of patient-specific grey-level histogram signatures from each sample; (c) classification of the extracted signatures into classes representing normal lung parenchyma and pulmonary infiltrates; (d) the samples for which the probability of belonging to one of the two classes does not reach an acceptable level are rejected and classified according to their textural content; (e) merging of the classification results of the two classification stages. The proposed framework has been evaluated on real radiographic images with pulmonary infiltrates caused by bacterial infections. The results show that accurate measurements of the infiltration areas can be obtained with respect to each lung field area. The average measurement error rate on the considered dataset reached 9.7% ± 1.0%.
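Steps (d) and (e), rejecting low-confidence samples and passing them to a second classifier before merging, can be illustrated with a small sketch; the acceptance threshold, the placeholder second-stage classifier, and all names are assumptions made for illustration.

```python
import numpy as np

def two_stage_classify(prob_stage1, accept=0.8, stage2_fn=None):
    """Hierarchical decision: samples whose first-stage class probability is
    confident enough are labeled directly; the rest are deferred to a second
    (e.g. texture-based) classifier. Threshold and second stage are placeholders."""
    labels = np.full(len(prob_stage1), -1)          # -1 = undecided
    confident = prob_stage1.max(axis=1) >= accept
    labels[confident] = prob_stage1[confident].argmax(axis=1)
    if stage2_fn is not None:
        labels[~confident] = stage2_fn(np.flatnonzero(~confident))
    return labels

rng = np.random.default_rng(6)
p = rng.dirichlet([2, 2], size=10)                  # toy 2-class probabilities
print(two_stage_classify(p, stage2_fn=lambda idx: np.ones(len(idx), dtype=int)))
```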
Harmonization of forest disturbance datasets of the conterminous USA from 1986 to 2011
Soulard, Christopher E.; Acevedo, William; Cohen, Warren B.; Yang, Zhiqiang; Stehman, Stephen V.; Taylor, Janis L.
2017-01-01
Several spatial forest disturbance datasets exist for the conterminous USA. The major problem with forest disturbance mapping is that variability between map products leads to uncertainty regarding the actual rate of disturbance. In this article, harmonized maps were produced from multiple data sources (i.e., Global Forest Change, LANDFIRE Vegetation Disturbance, National Land Cover Database, Vegetation Change Tracker, and Web-Enabled Landsat Data). The harmonization process involved fitting common class ontologies and determining spatial congruency to produce forest disturbance maps for four time intervals (1986–1992, 1992–2001, 2001–2006, and 2006–2011). Pixels mapped as disturbed for two or more datasets were labeled as disturbed in the harmonized maps. The primary advantage gained by harmonization was improvement in commission error rates relative to the individual disturbance products. Disturbance omission errors were high for both harmonized and individual forest disturbance maps due to underlying limitations in mapping subtle disturbances with Landsat classification algorithms. To enhance the value of the harmonized disturbance products, we used fire perimeter maps to add information on the cause of disturbance.
Rogel-Castillo, Cristian; Boulton, Roger; Opastpongkarn, Arunwong; Huang, Guangwei; Mitchell, Alyson E
2016-07-27
Concealed damage (CD) is defined as a brown discoloration of the kernel interior (nutmeat) that appears only after moderate to high heat treatment (e.g., blanching, drying, roasting, etc.). Raw almonds with CD have no visible defects before heat treatment. Currently, there are no screening methods available for detecting CD in raw almonds. Herein, the feasibility of using near-infrared (NIR) spectroscopy between 1125 and 2153 nm for the detection of CD in almonds is demonstrated. Almond kernels with CD have less NIR absorbance in the region related with oil, protein, and carbohydrates. With the use of partial least squares discriminant analysis (PLS-DA) and selection of specific wavelengths, three classification models were developed. The calibration models have false-positive and false-negative error rates ranging between 12.4 and 16.1% and between 10.6 and 17.2%, respectively. The percent error rates ranged between 8.2 and 9.2%. Second-derivative preprocessing of the selected wavelength resulted in the most robust predictive model.
Zeng, Xueqiang; Luo, Gang
2017-12-01
Machine learning is broadly used for clinical data analysis. Before training a model, a machine learning algorithm must be selected. Also, the values of one or more model parameters termed hyper-parameters must be set. Selecting algorithms and hyper-parameter values requires advanced machine learning knowledge and many labor-intensive manual iterations. To lower the bar to machine learning, miscellaneous automatic selection methods for algorithms and/or hyper-parameter values have been proposed. Existing automatic selection methods are inefficient on large data sets. This poses a challenge for using machine learning in the clinical big data era. To address the challenge, this paper presents progressive sampling-based Bayesian optimization, an efficient and automatic selection method for both algorithms and hyper-parameter values. We report an implementation of the method. We show that compared to a state of the art automatic selection method, our method can significantly reduce search time, classification error rate, and standard deviation of error rate due to randomization. This is major progress towards enabling fast turnaround in identifying high-quality solutions required by many machine learning-based clinical data analysis tasks.
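As a rough illustration of the progressive-sampling idea only (the Bayesian-optimization component is not reproduced), the sketch below evaluates candidate settings on successively larger training samples and discards the worst half at each round; all names and numbers are invented.

```python
import numpy as np

def progressive_selection(candidates, evaluate, n_total, start=200, grow=2.0, keep=0.5):
    """Successively larger training samples: evaluate all remaining candidate
    (algorithm, hyper-parameter) settings on a small sample, discard the worst
    half, then double the sample size, and repeat until one setting remains."""
    size = start
    remaining = list(candidates)
    while len(remaining) > 1 and size < n_total:
        scores = {c: evaluate(c, size) for c in remaining}
        cutoff = int(np.ceil(len(remaining) * keep))
        remaining = sorted(remaining, key=scores.get)[:cutoff]   # lower error is better
        size = int(size * grow)
    return remaining[0]

# toy evaluator: error shrinks with sample size, at a rate that depends on the setting
def toy_error(c, size):
    return 0.3 / (1 + c) + 1.0 / np.sqrt(size) + np.random.default_rng(c).normal(0, 0.01)

print(progressive_selection(candidates=range(8), evaluate=toy_error, n_total=50_000))
```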
Optimal number of features as a function of sample size for various classification rules.
Hua, Jianping; Xiong, Zixiang; Lowey, James; Suh, Edward; Dougherty, Edward R
2005-04-15
Given the joint feature-label distribution, increasing the number of features always results in decreased classification error; however, this is not the case when a classifier is designed via a classification rule from sample data. Typically (but not always), for fixed sample size, the error of a designed classifier decreases and then increases as the number of features grows. The potential downside of using too many features is most critical for small samples, which are commonplace for gene-expression-based classifiers for phenotype discrimination. For fixed sample size and feature-label distribution, the issue is to find an optimal number of features. Since only in rare cases is there a known distribution of the error as a function of the number of features and sample size, this study employs simulation for various feature-label distributions and classification rules, and across a wide range of sample and feature-set sizes. To achieve the desired end, finding the optimal number of features as a function of sample size, it employs massively parallel computation. Seven classifiers are treated: 3-nearest-neighbor, Gaussian kernel, linear support vector machine, polynomial support vector machine, perceptron, regular histogram and linear discriminant analysis. Three Gaussian-based models are considered: linear, nonlinear and bimodal. In addition, real patient data from a large breast-cancer study is considered. To mitigate the combinatorial search for finding optimal feature sets, and to model the situation in which subsets of genes are co-regulated and correlation is internal to these subsets, we assume that the covariance matrix of the features is blocked, with each block corresponding to a group of correlated features. Altogether there are a large number of error surfaces for the many cases. These are provided in full on a companion website, which is meant to serve as a resource for those working with small-sample classification. For the companion website, please visit http://public.tgen.org/tamu/ofs/ e-dougherty@ee.tamu.edu.
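A toy simulation along the lines of the peaking phenomenon described above (test error of a designed classifier changing non-monotonically as features are added at fixed small sample size) is sketched below; the simple Gaussian model and LDA classifier are illustrative and do not use the paper's block-covariance setup or parallel computation.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
n_train, n_test, n_rep = 30, 2000, 50

def mean_test_error(d):
    """Mean test error of an LDA classifier trained on n_train samples with d features."""
    mu = np.full(d, 0.3)                      # weak class separation in every feature
    errs = []
    for _ in range(n_rep):
        X_tr = np.vstack([rng.normal(0, 1, (n_train // 2, d)),
                          rng.normal(mu, 1, (n_train // 2, d))])
        y_tr = np.r_[np.zeros(n_train // 2), np.ones(n_train // 2)]
        X_te = np.vstack([rng.normal(0, 1, (n_test // 2, d)),
                          rng.normal(mu, 1, (n_test // 2, d))])
        y_te = np.r_[np.zeros(n_test // 2), np.ones(n_test // 2)]
        clf = LinearDiscriminantAnalysis().fit(X_tr, y_tr)
        errs.append(np.mean(clf.predict(X_te) != y_te))
    return float(np.mean(errs))

for d in (2, 5, 10, 20, 40):
    print(f"{d:3d} features: mean test error {mean_test_error(d):.3f}")
```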
The Human Factors Analysis and Classification System : HFACS : final report.
DOT National Transportation Integrated Search
2000-02-01
Human error has been implicated in 70 to 80% of all civil and military aviation accidents. Yet, most accident reporting systems are not designed around any theoretical framework of human error. As a result, most accident databases are not conducive t...
A Confidence Paradigm for Classification Systems
2008-09-01
methodology to determine how much confidence one should have in a classifier output. This research proposes a framework to determine the level of...theoretical framework that attempts to unite the viewpoints of the classification system developer (or engineer) and the classification system user (or...operating point. An algorithm is developed that minimizes a "confidence" measure called Binned Error in the Posterior (BEP). Then, we prove that training a
A False Alarm Reduction Method for a Gas Sensor Based Electronic Nose.
Rahman, Mohammad Mizanur; Charoenlarpnopparut, Chalie; Suksompong, Prapun; Toochinda, Pisanu; Taparugssanagorn, Attaphongse
2017-09-12
Electronic noses (E-Noses) are becoming popular for food and fruit quality assessment due to their robustness and repeated usability without fatigue, unlike human experts. An E-Nose equipped with classification algorithms that have open-ended classification boundaries, such as the k-nearest neighbor (k-NN), support vector machine (SVM), and multilayer perceptron neural network (MLPNN), is found to suffer from false classification errors on irrelevant odor data. To reduce false classification and misclassification errors, and to improve correct rejection performance, algorithms with a hyperspheric boundary, such as a radial basis function neural network (RBFNN) or a generalized regression neural network (GRNN) with a Gaussian activation function in the hidden layer, should be used. The simulation results presented in this paper show that GRNN has higher correct classification efficiency and false alarm reduction capability than RBFNN. As the design of a GRNN or RBFNN is complex and expensive due to the large number of neurons required, a simple hyperspheric classification method based on the minimum, maximum, and mean (MMM) values of each class of the training dataset was presented. The MMM algorithm was simple and found to be fast and efficient in correctly classifying data from the training classes and correctly rejecting data from extraneous odors, thereby reducing false alarms.
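The MMM idea lends itself to a very small implementation: store the per-class minimum, maximum, and mean of the training features, accept a sample only if it falls inside some class's min-max envelope, and otherwise reject it as an extraneous odor. The sketch below uses synthetic sensor features; the "reject" label and tie-breaking by distance to the class mean are assumptions, not details from the paper.

```python
import numpy as np

def fit_mmm(X, y):
    """Per-class (min, max, mean) envelopes computed from training data."""
    return {c: (X[y == c].min(0), X[y == c].max(0), X[y == c].mean(0))
            for c in np.unique(y)}

def predict_mmm(model, x):
    """Return the nearest class whose [min, max] box contains x, else 'reject'."""
    inside = {c: np.linalg.norm(x - mean)
              for c, (lo, hi, mean) in model.items()
              if np.all(x >= lo) and np.all(x <= hi)}
    return min(inside, key=inside.get) if inside else "reject"

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (50, 4)), rng.normal(3, 0.3, (50, 4))])
y = np.r_[np.zeros(50, int), np.ones(50, int)]
model = fit_mmm(X, y)

print(predict_mmm(model, np.full(4, 0.1)))   # known odor  -> class 0
print(predict_mmm(model, np.full(4, 10.0)))  # extraneous odor -> reject
```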
Differences in chewing sounds of dry-crisp snacks by multivariate data analysis
NASA Astrophysics Data System (ADS)
De Belie, N.; Sivertsvik, M.; De Baerdemaeker, J.
2003-09-01
Chewing sounds of different types of dry-crisp snacks (two types of potato chips, prawn crackers, cornflakes and low calorie snacks from extruded starch) were analysed to assess differences in sound emission patterns. The emitted sounds were recorded by a microphone placed over the ear canal. The first bite and the first subsequent chew were selected from the time signal and a fast Fourier transformation provided the power spectra. Different multivariate analysis techniques were used for classification of the snack groups. These included principal component analysis (PCA) and unfold partial least-squares (PLS) algorithms, as well as multi-way techniques such as three-way PLS, three-way PCA (Tucker3), and parallel factor analysis (PARAFAC) on the first bite and subsequent chew. The models were evaluated by calculating the classification errors and the root mean square error of prediction (RMSEP) for independent validation sets. It appeared that the logarithm of the power spectra obtained from the chewing sounds could be used successfully to distinguish the different snack groups. When different chewers were used, recalibration of the models was necessary. Multi-way models distinguished between the chewing sounds of different snack groups better than PCA on the bite or chew separately, and better than unfold PLS. Of all the three-way models applied, N-PLS with three components showed the best classification capabilities, resulting in classification errors of 14-18%. Most incorrect classifications were due to one type of potato chips that had a very irregular shape, resulting in a wide variation of the emitted sounds.
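The spectral front end described above (FFT of a bite or chew segment, then the logarithm of the power spectrum feeding a multivariate model) can be sketched as follows; the synthetic "crisp" and "softer" segments and the two-component PCA stand in for the recorded chewing sounds and the multi-way models, which are not reproduced here.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n = 4096                                         # segment length (illustrative)

def log_power_spectrum(segment):
    spec = np.abs(np.fft.rfft(segment)) ** 2
    return np.log10(spec + 1e-12)

# Two hypothetical snack "classes" with different spectral colouring.
crisp  = [log_power_spectrum(np.diff(rng.normal(size=n + 1))) for _ in range(20)]
softer = [log_power_spectrum(np.cumsum(rng.normal(size=n)) * 1e-2) for _ in range(20)]
X = np.vstack(crisp + softer)

scores = PCA(n_components=2).fit_transform(X)
print("class means in PC space:")
print("crisp :", scores[:20].mean(axis=0))
print("softer:", scores[20:].mean(axis=0))
```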
Kernel Wiener filter and its application to pattern recognition.
Yoshino, Hirokazu; Dong, Chen; Washizawa, Yoshikazu; Yamashita, Yukihiko
2010-11-01
The Wiener filter (WF) is widely used for inverse problems. From an observed signal, it provides the best estimated signal with respect to the squared error averaged over the original and the observed signals among linear operators. The kernel WF (KWF), extended directly from WF, has the problem that additive noise must be handled via samples. Since the computational complexity of kernel methods depends on the number of samples, this incurs a huge computational cost. By using the first-order approximation of kernel functions, we realize a KWF that can handle such noise not via samples but as a random variable. We also propose an error estimation method for kernel filters by using the approximations. In order to show the advantages of the proposed methods, we conducted experiments on image denoising and error estimation. We also apply KWF to classification, since KWF can provide an approximated result of the maximum a posteriori classifier that provides the best recognition accuracy. The noise term in the criterion can be used for classification in the presence of noise or as a new regularization to suppress changes in the input space, whereas the ordinary regularization for the kernel method suppresses changes in the feature space. In order to show the advantages of the proposed methods, we conducted experiments on binary and multiclass classification and on classification in the presence of noise.
Errors in clinical laboratories or errors in laboratory medicine?
Plebani, Mario
2006-01-01
Laboratory testing is a highly complex process and, although laboratory services are relatively safe, they are not as safe as they could or should be. Clinical laboratories have long focused their attention on quality control methods and quality assessment programs dealing with analytical aspects of testing. However, a growing body of evidence accumulated in recent decades demonstrates that quality in clinical laboratories cannot be assured by merely focusing on purely analytical aspects. The more recent surveys on errors in laboratory medicine conclude that in the delivery of laboratory testing, mistakes occur more frequently before (pre-analytical) and after (post-analytical) the test has been performed. Most errors are due to pre-analytical factors (46-68.2% of total errors), while a high error rate (18.5-47% of total errors) has also been found in the post-analytical phase. Errors due to analytical problems have been significantly reduced over time, but there is evidence that, particularly for immunoassays, interference may have a serious impact on patients. A description of the most frequent and risky pre-, intra- and post-analytical errors and advice on practical steps for measuring and reducing the risk of errors is therefore given in the present paper. Many mistakes in the Total Testing Process are called "laboratory errors", although these may be due to poor communication, action taken by others involved in the testing process (e.g., physicians, nurses and phlebotomists), or poorly designed processes, all of which are beyond the laboratory's control. Likewise, there is evidence that laboratory information is only partially utilized. A recent document from the International Organization for Standardization (ISO) recommends a new, broader definition of the term "laboratory error" and a classification of errors according to different criteria. In a modern approach to total quality, centered on patients' needs and satisfaction, the risk of errors and mistakes in pre- and post-examination steps must be minimized to guarantee the total quality of laboratory services.
Context-sensitive extraction of tree crown objects in urban areas using VHR satellite images
NASA Astrophysics Data System (ADS)
Ardila, Juan P.; Bijker, Wietske; Tolpekin, Valentyn A.; Stein, Alfred
2012-04-01
Municipalities need accurate and updated inventories of urban vegetation in order to manage green resources and estimate their return on investment in urban forestry activities. Earlier studies have shown that semi-automatic tree detection using remote sensing is a challenging task. This study aims to develop a reproducible geographic object-based image analysis (GEOBIA) methodology to locate and delineate tree crowns in urban areas using high resolution imagery. We propose a GEOBIA approach that considers the spectral, spatial and contextual characteristics of tree objects in the urban space. The study presents classification rules that exploit object features at multiple segmentation scales modifying the labeling and shape of image-objects. The GEOBIA methodology was implemented on QuickBird images acquired over the cities of Enschede and Delft (The Netherlands), resulting in an identification rate of 70% and 82% respectively. False negative errors concentrated on small trees and false positive errors in private gardens. The quality of crown boundaries was acceptable, with an overall delineation error <0.24 outside of gardens and backyards.
Error detection and reduction in blood banking.
Motschman, T L; Moore, S B
1996-12-01
Error management plays a major role in facility process improvement efforts. By detecting and reducing errors, quality and, therefore, patient care improve. It begins with a strong organizational foundation of management attitude with clear, consistent employee direction and appropriate physical facilities. Clearly defined critical processes, critical activities, and SOPs act as the framework for operations as well as active quality monitoring. To assure that personnel can detect and report errors, they must be trained in both operational duties and error management practices. Use of simulated/intentional errors and incorporation of error detection into competency assessment keeps employees practiced and confident, and diminishes fear of the unknown. Personnel can clearly see that errors are indeed used as opportunities for process improvement and not for punishment. The facility must have a clearly defined and consistently used definition for reportable errors. Reportable errors should include those errors with potentially harmful outcomes as well as those errors that are "upstream," and thus further away from the outcome. A well-written error report consists of who, what, when, where, why/how, and follow-up to the error. Before correction can occur, an investigation to determine the underlying cause of the error should be undertaken. Obviously, the best corrective action is prevention. Correction can occur at five different levels; however, only three of these levels are directed at prevention. Prevention requires a method to collect and analyze data concerning errors. In the authors' facility a functional error classification method and a quality system-based classification have been useful. An active method to search for problems uncovers them further upstream, before they can have disastrous outcomes. In the continual quest for improving processes, an error management program is itself a process that needs improvement, and we must strive to always close the circle of quality assurance. Ultimately, the goal of better patient care will be the reward.
Jeyasingh, Suganthi; Veluchamy, Malathi
2017-05-01
Early diagnosis of breast cancer is essential to save the lives of patients. Usually, medical datasets include a large variety of data that can lead to confusion during diagnosis. The Knowledge Discovery in Databases (KDD) process helps to improve efficiency. It requires elimination of inappropriate and repeated data from the dataset before final diagnosis. This can be done using any of the feature selection algorithms available in data mining. Feature selection is considered a vital step to increase classification accuracy. This paper proposes a Modified Bat Algorithm (MBA) for feature selection to eliminate irrelevant features from an original dataset. The Bat algorithm was modified using simple random sampling to select random instances from the dataset. Ranking against the global best features was used to recognize the predominant features available in the dataset. The selected features are used to train a Random Forest (RF) classification algorithm. The MBA feature selection algorithm enhanced the classification accuracy of RF in identifying the occurrence of breast cancer. The Wisconsin Diagnosis Breast Cancer Dataset (WDBC) was used to evaluate the performance of the proposed MBA feature selection algorithm. The proposed algorithm achieved better performance in terms of Kappa statistic, Matthews Correlation Coefficient, Precision, F-measure, Recall, Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Relative Absolute Error (RAE) and Root Relative Squared Error (RRSE).
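A sketch of the classification stage only (feature selection followed by a Random Forest and a few of the reported metrics), using scikit-learn's bundled Wisconsin breast cancer data; the univariate SelectKBest step is a stand-in for the Modified Bat Algorithm, which is not reproduced here, and the number of retained features is arbitrary.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import cohen_kappa_score, matthews_corrcoef, f1_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Stand-in for the MBA feature selection: keep the 10 most informative features.
sel = SelectKBest(f_classif, k=10).fit(X_tr, y_tr)
X_tr, X_te = sel.transform(X_tr), sel.transform(X_te)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
y_hat = rf.predict(X_te)

print("kappa:", cohen_kappa_score(y_te, y_hat))
print("MCC  :", matthews_corrcoef(y_te, y_hat))
print("F1   :", f1_score(y_te, y_hat))
```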
Regulation of IAP (Inhibitor of Apoptosis) Gene Expression by the p53 Tumor Suppressor Protein
2005-05-01
[Only report-form fields and figure-caption fragments were extracted for this entry. Recoverable caption content: averaged results of three independent experiments, with standard error; level of p53 in infected cells probed with the Ab-6 antibody (Calbiochem); the arrow marks oligomerized BAK; the right panel depicts the purity of BMH-crosslinked mitochondria.]
2010-03-15
Based on Reason's "Swiss cheese" model of human error (1990), Figure 1 describes how an accident is likely to occur when all of the errors, or "holes," align. A detailed description of HFACS can be found in Wiegmann and Shappell (2003).
Death Certification Errors and the Effect on Mortality Statistics.
McGivern, Lauri; Shulman, Leanne; Carney, Jan K; Shapiro, Steven; Bundock, Elizabeth
Errors in cause and manner of death on death certificates are common and affect families, mortality statistics, and public health research. The primary objective of this study was to characterize errors in the cause and manner of death on death certificates completed by non-Medical Examiners. A secondary objective was to determine the effects of errors on national mortality statistics. We retrospectively compared 601 death certificates completed between July 1, 2015, and January 31, 2016, from the Vermont Electronic Death Registration System with clinical summaries from medical records. Medical Examiners, blinded to original certificates, reviewed summaries, generated mock certificates, and compared mock certificates with original certificates. They then graded errors using a scale from 1 to 4 (higher numbers indicated increased impact on interpretation of the cause) to determine the prevalence of minor and major errors. They also compared International Classification of Diseases, 10th Revision (ICD-10) codes on original certificates with those on mock certificates. Of 601 original death certificates, 319 (53%) had errors; 305 (51%) had major errors; and 59 (10%) had minor errors. We found no significant differences by certifier type (physician vs nonphysician). We did find significant differences in major errors by place of death (P < .001). Certificates for deaths occurring in hospitals were more likely to have major errors than certificates for deaths occurring at a private residence (59% vs 39%, P < .001). A total of 580 (93%) death certificates had a change in ICD-10 codes between the original and mock certificates, of which 348 (60%) had a change in the underlying cause-of-death code. Error rates on death certificates in Vermont are high and extend to ICD-10 coding, thereby affecting national mortality statistics. Surveillance and certifier education must expand beyond local and state efforts. Simplifying and standardizing underlying literal text for cause of death may improve accuracy, decrease coding errors, and improve national mortality statistics.
Superiority of artificial neural networks for a genetic classification procedure.
Sant'Anna, I C; Tomaz, R S; Silva, G N; Nascimento, M; Bhering, L L; Cruz, C D
2015-08-19
The correct classification of individuals is extremely important for the preservation of genetic variability and for maximization of yield in breeding programs using phenotypic traits and genetic markers. The Fisher and Anderson discriminant functions are commonly used multivariate statistical techniques for these situations, which allow for the allocation of an initially unknown individual to predefined groups. However, for higher levels of similarity, such as those found in backcrossed populations, these methods have proven to be inefficient. Recently, much research has been devoted to developing a new paradigm of computing known as artificial neural networks (ANNs), which can be used to solve many statistical problems, including classification problems. The aim of this study was to evaluate the feasibility of ANNs as an evaluation technique of genetic diversity by comparing their performance with that of traditional methods. The discriminant functions were equally ineffective in discriminating the populations, with error rates of 23-82%, thereby preventing the correct discrimination of individuals between populations. The ANN was effective in classifying populations with low and high differentiation, such as those derived from a genetic design established from backcrosses, even in cases of low differentiation of the data sets. The ANN appears to be a promising technique to solve classification problems, since the number of individuals classified incorrectly by the ANN was always lower than that of the discriminant functions. We envisage the potential relevant application of this improved procedure in the genomic classification of markers to distinguish between breeds and accessions.
Wendel, Jochen; Buttenfield, Barbara P.; Stanislawski, Larry V.
2016-01-01
Knowledge of landscape type can inform cartographic generalization of hydrographic features, because landscape characteristics provide an important geographic context that affects variation in channel geometry, flow pattern, and network configuration. Landscape types are characterized by expansive spatial gradients lacking abrupt changes between adjacent classes, and by a limited number of outliers that might confound classification. The US Geological Survey (USGS) is exploring methods to automate generalization of features in the National Hydrography Dataset (NHD), to associate specific sequences of processing operations and parameters with specific landscape characteristics, thus obviating manual selection of a unique processing strategy for every NHD watershed unit. A chronology of methods to delineate physiographic regions for the United States is described, including a recent maximum likelihood classification based on seven input variables. This research compares unsupervised and supervised algorithms applied to these seven input variables, to evaluate and possibly refine the recent classification. Evaluation metrics for unsupervised methods include the Davies–Bouldin index, the Silhouette index, and the Dunn index as well as quantization and topographic error metrics. Cross validation and misclassification rate analysis are used to evaluate supervised classification methods. The paper reports the comparative analysis and its impact on the selection of landscape regions. The compared solutions show problems in areas of high landscape diversity. There is some indication that additional input variables, additional classes, or more sophisticated methods can refine the existing classification.
NASA Astrophysics Data System (ADS)
Knoefel, Patrick; Loew, Fabian; Conrad, Christopher
2015-04-01
Crop maps based on classification of remotely sensed data are receiving increased attention in agricultural management. This calls for more detailed knowledge about the reliability of such spatial information. However, classification of agricultural land use is often limited by high spectral similarities of the studied crop types. Moreover, spatially and temporally varying agro-ecological conditions can introduce confusion in crop mapping. Classification errors in crop maps may in turn influence model outputs, such as agricultural production monitoring. One major goal of the PhenoS project ("Phenological structuring to determine optimal acquisition dates for Sentinel-2 data for field crop classification") is the detection of optimal phenological time windows for land cover classification purposes. Since many crop species are spectrally highly similar, accurate classification requires the right selection of satellite images for a certain classification task. In the course of one growing season, phenological phases exist where crops are separable with higher accuracies. For this purpose, coupling of multi-temporal spectral characteristics and phenological events is promising. The focus of this study is the separation of spectrally similar cereal crops like winter wheat, barley, and rye at two test sites in Germany called "Harz/Central German Lowland" and "Demmin". This study uses object-based random forest (RF) classification to investigate the impact of image acquisition frequency and timing on crop classification uncertainty by permuting all possible combinations of available RapidEye time series recorded on the test sites between 2010 and 2014. The permutations were applied to different segmentation parameters. Then, classification uncertainty was assessed and analysed based on the probabilistic soft output from the RF algorithm on a per-field basis. From this soft output, entropy was calculated as a spatial measure of classification uncertainty. The results indicate that uncertainty estimates provide a valuable addition to traditional accuracy assessments and help the user locate errors in crop maps.
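The per-field uncertainty measure described above is the Shannon entropy of the classifier's class-probability output; a minimal sketch on synthetic data follows, with the random forest settings and feature set purely illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=600, n_features=12, n_informative=6,
                           n_classes=3, random_state=0)
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X[:500], y[:500])

# Probabilistic soft output for each "field" (object), then Shannon entropy.
proba = rf.predict_proba(X[500:])
entropy = -np.sum(proba * np.log(proba + 1e-12), axis=1)

print("mean entropy:", entropy.mean())
print("most uncertain objects:", np.argsort(entropy)[-5:])
```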
A new classification of glaucomas
Bordeianu, Constantin-Dan
2014-01-01
Purpose To suggest a new glaucoma classification that is pathogenic, etiologic, and clinical. Methods After discussing the logical pathway used in criteria selection, the paper presents the new classification and compares it with the classification currently in use, that is, the one issued by the European Glaucoma Society in 2008. Results The paper proves that the new classification is clear (being based on a coherent and consistently followed set of criteria), is comprehensive (framing all forms of glaucoma), and helps in understanding the disease (in that it uses a logical framing system). The great advantage is that it facilitates therapeutic decision making in that it offers direct therapeutic suggestions and avoids errors leading to disasters. Moreover, the scheme remains open to any new development. Conclusion The suggested classification is a pathogenic, etiologic, and clinical classification that fulfills the conditions of an ideal classification. The suggested classification is the first classification in which the main criterion is consistently used for the first 5 to 7 crossings until its differentiation capabilities are exhausted. Then, secondary criteria (etiologic and clinical) pick up the relay until each form finds its logical place in the scheme. In order to avoid unclear aspects, the genetic criterion is no longer used, being replaced by age, one of the clinical criteria. The suggested classification brings only benefits to all categories of ophthalmologists: the beginners will have a tool to better understand the disease and to ease their decision making, whereas the experienced doctors will have their practice simplified. For all doctors, errors leading to therapeutic disasters will be less likely to happen. Finally, researchers will have the object of their work gathered in the group of glaucoma with unknown or uncertain pathogenesis, whereas the results of their work will easily find a logical place in the scheme, as the suggested classification remains open to any new development. PMID:25246759
Xiao, Bo; Imel, Zac E; Georgiou, Panayiotis G; Atkins, David C; Narayanan, Shrikanth S
2015-01-01
The technology for evaluating patient-provider interactions in psychotherapy (observational coding) has not changed in 70 years. It is labor-intensive, error-prone, and expensive, limiting its use in evaluating psychotherapy in the real world. Engineering solutions from speech and language processing provide new methods for the automatic evaluation of provider ratings from session recordings. The primary data are 200 Motivational Interviewing (MI) sessions from a study on MI training methods with observer ratings of counselor empathy. Automatic Speech Recognition (ASR) was used to transcribe sessions, and the resulting words were used in a text-based predictive model of empathy. Two supporting datasets trained the speech processing tasks including ASR (1200 transcripts from heterogeneous psychotherapy sessions and 153 transcripts and session recordings from 5 MI clinical trials). The accuracy of computationally-derived empathy ratings was evaluated against human ratings for each provider. Computationally-derived empathy scores and classifications (high vs. low) were highly accurate against human-based codes and classifications, with a correlation of 0.65 and F-score (a weighted average of sensitivity and specificity) of 0.86, respectively. Empathy prediction using human transcription as input (as opposed to ASR) resulted in a slight increase in prediction accuracies, suggesting that the fully automatic system with ASR is relatively robust. Using speech and language processing methods, it is possible to generate accurate predictions of provider performance in psychotherapy from audio recordings alone. This technology can support large-scale evaluation of psychotherapy for dissemination and process studies.
Detection and Classification of Whale Acoustic Signals
NASA Astrophysics Data System (ADS)
Xian, Yin
This dissertation focuses on two vital challenges in relation to whale acoustic signals: detection and classification. In detection, we evaluated the influence of the uncertain ocean environment on the spectrogram-based detector, and derived the likelihood ratio of the proposed Short Time Fourier Transform detector. Experimental results showed that the proposed detector outperforms detectors based on the spectrogram. The proposed detector is more sensitive to environmental changes because it includes phase information. In classification, our focus is on finding a robust and sparse representation of whale vocalizations. Because whale vocalizations can be modeled as polynomial phase signals, we can represent the whale calls by their polynomial phase coefficients. In this dissertation, we used the Weyl transform to capture chirp rate information, and used a two dimensional feature set to represent whale vocalizations globally. Experimental results showed that our Weyl feature set outperforms chirplet coefficients and MFCC (Mel Frequency Cepstral Coefficients) when applied to our collected data. Since whale vocalizations can be represented by polynomial phase coefficients, it is plausible that the signals lie on a manifold parameterized by these coefficients. We also studied the intrinsic structure of high dimensional whale data by exploiting its geometry. Experimental results showed that nonlinear mappings such as Laplacian Eigenmap and ISOMAP outperform linear mappings such as PCA and MDS, suggesting that the whale acoustic data is nonlinear. We also explored deep learning algorithms on whale acoustic data. We built each layer as convolutions with either a PCA filter bank (PCANet) or a DCT filter bank (DCTNet). With the DCT filter bank, each layer has a different time-frequency scale representation, and from this, one can extract different physical information. Experimental results showed that our PCANet and DCTNet achieve a high classification rate on the whale vocalization data set. The word error rate of the DCTNet feature is similar to the MFSC in speech recognition tasks, suggesting that the convolutional network is able to reveal acoustic content of speech signals.
Optimizing pattern recognition-based control for partial-hand prosthesis application.
Earley, Eric J; Adewuyi, Adenike A; Hargrove, Levi J
2014-01-01
Partial-hand amputees often retain good residual wrist motion, which is essential for functional activities involving use of the hand. Thus, a crucial design criterion for a myoelectric, partial-hand prosthesis control scheme is that it allows the user to retain residual wrist motion. Pattern recognition (PR) of electromyographic (EMG) signals is a well-studied method of controlling myoelectric prostheses. However, wrist motion degrades a PR system's ability to correctly predict hand-grasp patterns. We studied the effects of (1) window length and number of hand-grasps, (2) static and dynamic wrist motion, and (3) EMG muscle source on the ability of a PR-based control scheme to classify functional hand-grasp patterns. Our results show that training PR classifiers with both extrinsic and intrinsic muscle EMG yields a lower error rate than training with either group by itself (p<0.001); and that training in only variable wrist positions, with only dynamic wrist movements, or with both variable wrist positions and movements results in lower error rates than training in only the neutral wrist position (p<0.001). Finally, our results show that both an increase in window length and a decrease in the number of grasps available to the classifier significantly decrease classification error (p<0.001). These results remained consistent whether the classifier selected or maintained a hand-grasp.
NASA Technical Reports Server (NTRS)
1984-01-01
Rectifications of multispectral scanner and thematic mapper data sets for full and subscene areas, analyses of planimetric errors, assessments of the number and distribution of ground control points required to minimize errors, and factors contributing to error residual are examined. Other investigations include the generation of three dimensional terrain models and the effects of spatial resolution on digital classification accuracies.
Olives, Casey; Pagano, Marcello; Deitchler, Megan; Hedt, Bethany L; Egge, Kari; Valadez, Joseph J
2009-04-01
Traditional lot quality assurance sampling (LQAS) methods require simple random sampling to guarantee valid results. However, cluster sampling has been proposed to reduce the number of random starting points. This study uses simulations to examine the classification error of two such designs, a 67x3 (67 clusters of three observations) and a 33x6 (33 clusters of six observations) sampling scheme to assess the prevalence of global acute malnutrition (GAM). Further, we explore the use of a 67x3 sequential sampling scheme for LQAS classification of GAM prevalence. Results indicate that, for independent clusters with moderate intracluster correlation for the GAM outcome, the three sampling designs maintain approximate validity for LQAS analysis. Sequential sampling can substantially reduce the average sample size that is required for data collection. The presence of intercluster correlation can impact dramatically the classification error that is associated with LQAS analysis.
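The kind of simulation described above can be sketched with a beta-binomial draw that induces intracluster correlation, followed by an LQAS decision rule on the total number of GAM cases; the prevalence values, correlation, and decision threshold below are placeholders rather than the study's parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_lqas(prev, n_clusters=67, per_cluster=3, icc=0.1,
                  decision_threshold=20, n_sim=5000):
    """Fraction of simulated surveys classified as 'high GAM prevalence'."""
    # Beta-binomial parameterization: within-cluster correlation equals icc.
    a = prev * (1 - icc) / icc
    b = (1 - prev) * (1 - icc) / icc
    high = 0
    for _ in range(n_sim):
        p_cluster = rng.beta(a, b, size=n_clusters)       # cluster-level prevalence
        cases = rng.binomial(per_cluster, p_cluster).sum()
        high += cases >= decision_threshold                # LQAS decision rule
    return high / n_sim

# Classification behaviour below and above a hypothetical 10% GAM threshold.
print("true prevalence  5%:", simulate_lqas(0.05))   # ideally classified 'low'
print("true prevalence 15%:", simulate_lqas(0.15))   # ideally classified 'high'
```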
Influence of ECG measurement accuracy on ECG diagnostic statements.
Zywietz, C; Celikag, D; Joseph, G
1996-01-01
Computer analysis of electrocardiograms (ECGs) provides a large amount of ECG measurement data, which may be used for diagnostic classification and storage in ECG databases. Until now, neither have error limits for ECG measurements been specified nor has their influence on diagnostic statements been systematically investigated. An analytical method is presented to estimate the influence of measurement errors on the accuracy of diagnostic ECG statements. Systematic (offset) errors will usually result in an increase of false positive or false negative statements since they cause a shift of the working point on the receiver operating characteristics curve. Measurement error dispersion broadens the distribution function of discriminative measurement parameters and, therefore, usually increases the overlap between discriminative parameters. This results in a flattening of the receiver operating characteristics curve and an increase of false positive and false negative classifications. The method developed has been applied to ECG conduction defect diagnoses by using the International Electrotechnical Commission's proposed interval measurement tolerance limits. These limits appear too large because more than 30% of false positive atrial conduction defect statements and 10-18% of false intraventricular conduction defect statements could be expected due to tolerated measurement errors. To assure long-term usability of ECG measurement databases, it is recommended that systems provide their error tolerance limits obtained on a defined test set.
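The two error mechanisms described above, an offset shifting the working point and dispersion broadening the measurement distributions, can be illustrated with a small simulation; the interval means, standard deviations, and diagnostic cut-off below are invented and are not the IEC tolerance limits.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# True interval (ms) for normal and conduction-defect populations (illustrative).
normal = rng.normal(95, 8, n)
defect = rng.normal(125, 10, n)
cutoff = 110.0                                # diagnostic threshold (illustrative)

def rates(offset=0.0, dispersion=0.0):
    """False-positive / false-negative rates after adding measurement error."""
    meas_n = normal + offset + rng.normal(0, dispersion, n)
    meas_d = defect + offset + rng.normal(0, dispersion, n)
    return np.mean(meas_n >= cutoff), np.mean(meas_d < cutoff)

print("no error        :", rates())
print("offset +5 ms    :", rates(offset=5.0))       # shifts the operating point
print("dispersion 10 ms:", rates(dispersion=10.0))  # broadens both distributions
```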
NASA Astrophysics Data System (ADS)
Benaouda, D.; Wadge, G.; Whitmarsh, R. B.; Rothwell, R. G.; MacLeod, C.
1999-02-01
In boreholes with partial or no core recovery, interpretations of lithology in the remainder of the hole are routinely attempted using data from downhole geophysical sensors. We present a practical neural net-based technique that greatly enhances lithological interpretation in holes with partial core recovery by using downhole data to train classifiers to give a global classification scheme for those parts of the borehole for which no core was retrieved. We describe the system and its underlying methods of data exploration, selection and classification, and present a typical example of the system in use. Although the technique is equally applicable to oil industry boreholes, we apply it here to an Ocean Drilling Program (ODP) borehole (Hole 792E, Izu-Bonin forearc, a mixture of volcaniclastic sandstones, conglomerates and claystones). The quantitative benefits of quality-control measures and different subsampling strategies are shown. Direct comparisons between a number of discriminant analysis methods and the use of neural networks with back-propagation of error are presented. The neural networks perform better than the discriminant analysis techniques both in terms of performance rates with test data sets (2-3 per cent better) and in qualitative correlation with non-depth-matched core. We illustrate with the Hole 792E data how vital it is to have a system that permits the number and membership of training classes to be changed as analysis proceeds. The initial classification for Hole 792E evolved from a five-class to a three-class and then to a four-class scheme with resultant classification performance rates for the back-propagation neural network method of 83, 84 and 93 per cent respectively.
Aspen, climate, and sudden decline in western USA
Gerald E. Rehfeldt; Dennis E. Ferguson; Nicholas L. Crookston
2009-01-01
A bioclimate model predicting the presence or absence of aspen, Populus tremuloides, in western USA from climate variables was developed by using the Random Forests classification tree on Forest Inventory data from about 118,000 permanent sample plots. A reasonably parsimonious model used eight predictors to describe aspen's climate profile. Classification errors...
Multi-template tensor-based morphometry: Application to analysis of Alzheimer's disease
Koikkalainen, Juha; Lötjönen, Jyrki; Thurfjell, Lennart; Rueckert, Daniel; Waldemar, Gunhild; Soininen, Hilkka
2012-01-01
In this paper methods for using multiple templates in tensor-based morphometry (TBM) are presented and compared to the conventional single-template approach. TBM analysis requires non-rigid registrations which are often subject to registration errors. When using multiple templates and, therefore, multiple registrations, it can be assumed that the registration errors are averaged and eventually compensated. Four different methods are proposed for multi-template TBM. The methods were evaluated using magnetic resonance (MR) images of healthy controls, patients with stable or progressive mild cognitive impairment (MCI), and patients with Alzheimer's disease (AD) from the ADNI database (N=772). The performance of TBM features in classifying images was evaluated both quantitatively and qualitatively. Classification results show that the multi-template methods are statistically significantly better than the single-template method. The overall classification accuracy was 86.0% for the classification of control and AD subjects, and 72.1% for the classification of stable and progressive MCI subjects. The statistical group-level difference maps produced using multi-template TBM were smoother, formed larger continuous regions, and had larger t-values than the maps obtained with single-template TBM. PMID:21419228
Predicting alpine headwater stream intermittency: a case study in the northern Rocky Mountains
Sando, Thomas R.; Blasch, Kyle W.
2015-01-01
This investigation used climatic, geological, and environmental data coupled with observational stream intermittency data to predict alpine headwater stream intermittency. Prediction was made using a random forest classification model. Results showed that the most important variables in the prediction model were snowpack persistence, represented by average snow extent from March through July, mean annual mean monthly minimum temperature, and surface geology types. For stream catchments with intermittent headwater streams, snowpack, on average, persisted until early June, whereas for stream catchments with perennial headwater streams, snowpack, on average, persisted until early July. Additionally, on average, stream catchments with intermittent headwater streams were about 0.7 °C warmer than stream catchments with perennial headwater streams. Finally, headwater stream catchments primarily underlain by coarse, permeable sediment are significantly more likely to have intermittent headwater streams than those primarily underlain by impermeable bedrock. Comparison of the predicted streamflow classification with observed stream status indicated a four percent classification error for first-order streams and a 21 percent classification error for all stream orders in the study area.
Using failure mode and effects analysis to improve the safety of neonatal parenteral nutrition.
Arenas Villafranca, Jose Javier; Gómez Sánchez, Araceli; Nieto Guindo, Miriam; Faus Felipe, Vicente
2014-07-15
Failure mode and effects analysis (FMEA) was used to identify potential errors and to enable the implementation of measures to improve the safety of neonatal parenteral nutrition (PN). FMEA was used to analyze the preparation and dispensing of neonatal PN from the perspective of the pharmacy service in a general hospital. A process diagram was drafted, illustrating the different phases of the neonatal PN process. Next, the failures that could occur in each of these phases were compiled and cataloged, and a questionnaire was developed in which respondents were asked to rate the following aspects of each error: incidence, detectability, and severity. The highest scoring failures were considered high risk and identified as priority areas for improvements to be made. The evaluation process detected a total of 82 possible failures. Among the phases with the highest number of possible errors were transcription of the medical order, formulation of the PN, and preparation of material for the formulation. After the classification of these 82 possible failures and of their relative importance, a checklist was developed to achieve greater control in the error-detection process. FMEA demonstrated that use of the checklist reduced the level of risk and improved the detectability of errors. FMEA was useful for detecting medication errors in the PN preparation process and enabling corrective measures to be taken. A checklist was developed to reduce errors in the most critical aspects of the process. Copyright © 2014 by the American Society of Health-System Pharmacists, Inc. All rights reserved.
A framework for software fault tolerance in real-time systems
NASA Technical Reports Server (NTRS)
Anderson, T.; Knight, J. C.
1983-01-01
A classification scheme for errors and a technique for the provision of software fault tolerance in cyclic real-time systems are presented. The technique requires that the process structure of a system be represented by a synchronization graph, which is used by an executive as a specification of the relative times at which processes will communicate during execution. Communication between concurrent processes is severely limited and may only take place between processes engaged in an exchange. A history of error occurrences is maintained by an error handler. When an error is detected, the error handler classifies it using the error history information and then initiates appropriate recovery action.
A Robust Unified Approach to Analyzing Methylation and Gene Expression Data
Khalili, Abbas; Huang, Tim; Lin, Shili
2009-01-01
Microarray technology has made it possible to investigate expression levels, and more recently methylation signatures, of thousands of genes simultaneously, in a biological sample. Since more and more data from different biological systems or technological platforms are being generated at an incredible rate, there is an increasing need to develop statistical methods that are applicable to multiple data types and platforms. Motivated by such a need, a flexible finite mixture model that is applicable to methylation, gene expression, and potentially data from other biological systems, is proposed. Two major thrusts of this approach are to allow for a variable number of components in the mixture to capture non-biological variation and small biases, and to use a robust procedure for parameter estimation and probe classification. The method was applied to the analysis of methylation signatures of three breast cancer cell lines. It was also tested on three sets of expression microarray data to study its power and type I error rates. Comparison with a number of existing methods in the literature yielded very encouraging results; lower type I error rates and comparable/better power were achieved based on the limited study. Furthermore, the method also leads to more biologically interpretable results for the three breast cancer cell lines. PMID:20161265
Connors, B M; Cooper, A B
2014-12-01
Categorization of the status of populations, species, and ecosystems underpins most conservation activities. Status is often based on how a system's current indicator value (e.g., change in abundance) relates to some threshold of conservation concern. Receiver operating characteristic (ROC) curves can be used to quantify the statistical reliability of indicators of conservation status and evaluate trade-offs between correct (true positive) and incorrect (false positive) classifications across a range of decision thresholds. However, ROC curves assume a discrete, binary relationship between an indicator and the conservation status it is meant to track, which is a simplification of the more realistic continuum of conservation status, and may limit the applicability of ROC curves in conservation science. We describe a modified ROC curve that treats conservation status as a continuum rather than a discrete state. We explored the influence of this continuum and typical sources of variation in abundance that can lead to classification errors (i.e., random variation and measurement error) on the true and false positive rates corresponding to varying decision thresholds and the reliability of change in abundance as an indicator of conservation status, respectively. We applied our modified ROC approach to an indicator of endangerment in Pacific salmon (Oncorhynchus nerka) (i.e., percent decline in geometric mean abundance) and an indicator of marine ecosystem structure and function (i.e., detritivore biomass). Failure to treat conservation status as a continuum when choosing thresholds for indicators resulted in the misidentification of trade-offs between true and false positive rates and the overestimation of an indicator's reliability. We argue for treating conservation status as a continuum when ROC curves are used to evaluate decision thresholds in indicators for the assessment of conservation status. © 2014 Society for Conservation Biology.
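The difference between treating status as binary and treating it as a continuum can be sketched by simulating true declines, adding observation noise to form the indicator, and comparing true and false positive rates at a fixed decision threshold; the thresholds, noise level, and decline distributions below are invented, not the modified ROC method itself.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000
status_threshold = 0.30        # true decline defining "threatened" (illustrative)
decision_threshold = 0.30      # observed decline that triggers listing

def tpr_fpr(true_decline, noise_sd=0.10):
    """True/false positive rates of the indicator against true status."""
    observed = true_decline + rng.normal(0, noise_sd, true_decline.size)
    threatened = true_decline >= status_threshold
    listed = observed >= decision_threshold
    return np.mean(listed[threatened]), np.mean(listed[~threatened])

# Discrete (binary) treatment: two well-separated status groups.
binary = np.r_[np.full(n // 2, 0.10), np.full(n // 2, 0.50)]
# Continuum treatment: declines spread continuously either side of the threshold.
continuum = rng.uniform(0.0, 0.6, n)

print("binary status    TPR/FPR:", tpr_fpr(binary))
print("continuum status TPR/FPR:", tpr_fpr(continuum))
```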
Robust through-the-wall radar image classification using a target-model alignment procedure.
Smith, Graeme E; Mobasseri, Bijan G
2012-02-01
A through-the-wall radar image (TWRI) bears little resemblance to the equivalent optical image, making it difficult to interpret. To maximize the intelligence that may be obtained, it is desirable to automate the classification of targets in the image to support human operators. This paper presents a technique for classifying stationary targets based on the high-range resolution profile (HRRP) extracted from 3-D TWRIs. The dependence of the image on the target location is discussed using a system point spread function (PSF) approach. It is shown that the position dependence will cause a classifier to fail, unless the image to be classified is aligned to a classifier-training location. A target image alignment technique based on deconvolution of the image with the system PSF is proposed. Comparison of the aligned target images with measured images shows the alignment process introducing normalized mean squared error (NMSE) ≤ 9%. The HRRP extracted from aligned target images are classified using a naive Bayesian classifier supported by principal component analysis. The classifier is tested using a real TWRI of canonical targets behind a concrete wall and shown to obtain correct classification rates ≥ 97%. © 2011 IEEE
Breathing Pattern Interpretation as an Alternative and Effective Voice Communication Solution.
Elsahar, Yasmin; Bouazza-Marouf, Kaddour; Kerr, David; Gaur, Atul; Kaushik, Vipul; Hu, Sijung
2018-05-15
Augmentative and alternative communication (AAC) systems tend to rely on the interpretation of purposeful gestures for interaction. Existing AAC methods can be cumbersome and limit the versatility of the solutions. The study aims to interpret breathing patterns (BPs) to converse with the outside world by means of a unidirectional microphone and researches breathing-pattern interpretation (BPI) to encode messages in an interactive manner with minimal training. We present BP processing work with (1) output synthesized machine-spoken words (SMSW) along with single-channel Wiener filtering (WF) for signal de-noising, and (2) k-nearest neighbor (k-NN) classification of BPs associated with embedded dynamic time warping (DTW). An approved protocol to collect analogue modulated BP sets belonging to 4 distinct classes with 10 training BPs per class and 5 live BPs per class was implemented with 23 healthy subjects. An 86% accuracy of k-NN classification was obtained with decreasing error rates of 17%, 14%, and 11% for the live classifications of classes 2, 3, and 4, respectively. The results express a systematic reliability of 89% with increased familiarity. The outcomes from the current AAC setup recommend a durable engineering solution directly beneficial to the sufferers.
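A minimal sketch of the k-NN-with-DTW classification stage, with a plain dynamic-programming DTW and sinusoidal stand-ins for the breathing patterns; the class shapes, sequence length, and value of k are assumptions rather than details from the study.

```python
import numpy as np

def dtw(a, b):
    """Dynamic time warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def knn_dtw(templates, labels, query, k=3):
    """k-NN with DTW distance: majority vote among the k nearest templates."""
    d = np.array([dtw(t, query) for t in templates])
    nearest = np.argsort(d)[:k]
    return int(np.bincount(np.asarray(labels)[nearest]).argmax())

rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 60)
# Hypothetical breathing-pattern classes: slow vs. fast modulation.
templates = [np.sin(t) + rng.normal(0, 0.1, t.size) for _ in range(10)] + \
            [np.sin(3 * t) + rng.normal(0, 0.1, t.size) for _ in range(10)]
labels = [0] * 10 + [1] * 10

query = np.sin(3 * t) + rng.normal(0, 0.1, t.size)
print("predicted class:", knn_dtw(templates, labels, query))
```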
Can single classifiers be as useful as model ensembles to produce benthic seabed substratum maps?
NASA Astrophysics Data System (ADS)
Turner, Joseph A.; Babcock, Russell C.; Hovey, Renae; Kendrick, Gary A.
2018-05-01
Numerous machine-learning classifiers are available for benthic habitat map production, which can lead to different results. This study highlights the performance of the Random Forest (RF) classifier, which was significantly better than Classification Trees (CT), Naïve Bayes (NB), and a multi-model ensemble in terms of overall accuracy, Balanced Error Rate (BER), Kappa, and area under the curve (AUC) values. RF accuracy was often higher than 90% for each substratum class, even at the most detailed level of the substratum classification and AUC values also indicated excellent performance (0.8-1). Total agreement between classifiers was high at the broadest level of classification (75-80%) when differentiating between hard and soft substratum. However, this sharply declined as the number of substratum categories increased (19-45%) including a mix of rock, gravel, pebbles, and sand. The model ensemble, produced from the results of all three classifiers by majority voting, did not show any increase in predictive performance when compared to the single RF classifier. This study shows how a single classifier may be sufficient to produce benthic seabed maps and model ensembles of multiple classifiers.
Classification of DNA nucleotides with transverse tunneling currents
NASA Astrophysics Data System (ADS)
Nyvold Pedersen, Jonas; Boynton, Paul; Di Ventra, Massimiliano; Jauho, Antti-Pekka; Flyvbjerg, Henrik
2017-01-01
It has been theoretically suggested and experimentally demonstrated that fast and low-cost sequencing of DNA, RNA, and peptide molecules might be achieved by passing such molecules between electrodes embedded in a nanochannel. The experimental realization of this scheme faces major challenges, however. In realistic liquid environments, typical currents in tunneling devices are of the order of picoamps. This corresponds to only six electrons per microsecond, and this number affects the integration time required to do current measurements in real experiments. This limits the speed of sequencing, though current fluctuations due to Brownian motion of the molecule average out during the required integration time. Moreover, data acquisition equipment introduces noise, and electronic filters create correlations in time-series data. We discuss how these effects must be included in the analysis of, e.g., the assignment of specific nucleobases to current signals. As the signals from different molecules overlap, unambiguous classification is impossible with a single measurement. We argue that the assignment of molecules to a signal is a standard pattern classification problem and calculation of the error rates is straightforward. The ideas presented here can be extended to other sequencing approaches of current interest.
Quantifying uncertainty in carbon and nutrient pools of coarse woody debris
NASA Astrophysics Data System (ADS)
See, C. R.; Campbell, J. L.; Fraver, S.; Domke, G. M.; Harmon, M. E.; Knoepp, J. D.; Woodall, C. W.
2016-12-01
Woody detritus constitutes a major pool of both carbon and nutrients in forested ecosystems. Estimating coarse wood stocks relies on many assumptions, even when full surveys are conducted. Researchers rarely report error in coarse wood pool estimates, despite the importance to ecosystem budgets and modelling efforts. To date, no study has attempted a comprehensive assessment of error rates and uncertainty inherent in the estimation of this pool. Here, we use Monte Carlo analysis to propagate the error associated with the major sources of uncertainty present in the calculation of coarse wood carbon and nutrient (i.e., N, P, K, Ca, Mg, Na) pools. We also evaluate individual sources of error to identify the importance of each source of uncertainty in our estimates. We quantify sampling error by comparing the three most common field methods used to survey coarse wood (two transect methods and a whole-plot survey). We quantify the measurement error associated with length and diameter measurement, and technician error in species identification and decay class using plots surveyed by multiple technicians. We use previously published values of model error for the four most common methods of volume estimation: Smalian's, conical frustum, conic paraboloid, and average-of-ends. We also use previously published values for error in the collapse ratio (cross-sectional height/width) of decayed logs that serves as a surrogate for the volume remaining. We consider sampling error in chemical concentration and density for all decay classes, using distributions from both published and unpublished studies. Analytical uncertainty is calculated using standard reference plant material from the National Institute of Standards. Our results suggest that technician error in decay classification can have a large effect on uncertainty, since many of the error distributions included in the calculation (e.g. density, chemical concentration, volume-model selection, collapse ratio) are decay-class specific.
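Monte Carlo propagation of the kind described can be sketched by drawing each input (dimensions, density, carbon concentration) from its error distribution and recomputing the pool on every draw; all distributions and the simple cylinder volume stand-in below are placeholders, not the study's field values or volume models.

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws = 100_000

# Placeholder per-log estimates and error models (not field values).
length_m   = rng.normal(4.0, 0.05, n_draws)      # measurement error on length
diameter_m = rng.normal(0.30, 0.01, n_draws)     # measurement error on diameter
density    = rng.normal(350.0, 40.0, n_draws)    # kg m^-3, decay-class specific
c_fraction = rng.normal(0.48, 0.02, n_draws)     # carbon concentration

# Simple cylinder volume as a stand-in for the volume models named above.
volume = np.pi * (diameter_m / 2) ** 2 * length_m
carbon_kg = volume * density * c_fraction

lo, hi = np.percentile(carbon_kg, [2.5, 97.5])
print(f"carbon pool: {carbon_kg.mean():.1f} kg (95% interval {lo:.1f}-{hi:.1f})")
```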
Software platform for managing the classification of error- related potentials of observers
NASA Astrophysics Data System (ADS)
Asvestas, P.; Ventouras, E.-C.; Kostopoulos, S.; Sidiropoulos, K.; Korfiatis, V.; Korda, A.; Uzunolglu, A.; Karanasiou, I.; Kalatzis, I.; Matsopoulos, G.
2015-09-01
Human learning is partly based on observation. Electroencephalographic recordings of subjects who perform acts (actors) or observe actors (observers) contain a negative waveform in the Evoked Potentials (EPs) of the actors that commit errors and of observers who observe the error-committing actors. This waveform is called the Error-Related Negativity (ERN). Its detection has applications in the context of Brain-Computer Interfaces. The present work describes a software system developed for managing EPs of observers, with the aim of classifying them into observations of either correct or incorrect actions. It consists of an integrated platform for the storage, management, processing and classification of EPs recorded during error-observation experiments. The system was developed using C# and the following development tools and frameworks: MySQL, .NET Framework, Entity Framework and Emgu CV, for interfacing with the machine learning library of OpenCV. Up to six features can be computed per EP recording per electrode. The user can select among various feature selection algorithms and then proceed to train one of three types of classifiers: Artificial Neural Networks, Support Vector Machines, or k-nearest neighbours. The trained classifier can then be used to classify any EP curve that has been entered into the database.
Wang, L; Qin, X C; Lin, H C; Deng, K F; Luo, Y W; Sun, Q R; Du, Q X; Wang, Z Y; Tuo, Y; Sun, J H
2018-02-01
To analyse the relationship between the Fourier transform infrared (FTIR) spectrum of rat spleen tissue and the postmortem interval (PMI) for PMI estimation using FTIR spectroscopy combined with data mining methods. Rats were sacrificed by cervical dislocation, and the cadavers were placed at 20 ℃. FTIR spectra of the rats' spleen tissues were acquired at different time points. After pretreatment, the data were analysed with data mining methods. The absorption peak intensity of the rat spleen tissue spectrum changed with the PMI, while the absorption peak position was unchanged. The results of principal component analysis (PCA) showed that the cumulative contribution rate of the first three principal components was 96%. There was an obvious clustering tendency for the spectrum samples at each time point. The methods of partial least squares discriminant analysis (PLS-DA) and support vector machine classification (SVMC) effectively divided the spectrum samples with different PMI into four categories (0-24 h, 48-72 h, 96-120 h and 144-168 h). The determination coefficient (R²) of the PMI estimation model established by PLS regression analysis was 0.96, and the root mean square error of calibration (RMSEC) and root mean square error of cross validation (RMSECV) were 9.90 h and 11.39 h, respectively. In the prediction set, the R² was 0.97, and the root mean square error of prediction (RMSEP) was 10.49 h. The FTIR spectrum of rat spleen tissue can be effectively analysed qualitatively and quantitatively by the combination of FTIR spectroscopy and data mining methods, and classification and PLS regression models can be established for PMI estimation. Copyright© by the Editorial Department of Journal of Forensic Medicine.
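A minimal sketch of the PCA-plus-PLS-regression workflow described above, using simulated spectra as stand-ins for FTIR measurements; the number of samples, wavenumbers, latent components, and the train/test split are all assumptions for illustration.

```python
# Sketch: PCA inspection and PLS regression of spectra against PMI (simulated stand-in data).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
n_samples, n_wavenumbers = 80, 400
pmi = rng.uniform(0, 168, n_samples)  # hours
base = np.sin(np.linspace(0, 6, n_wavenumbers))
spectra = np.outer(1 + 0.002 * pmi, base) + rng.normal(0, 0.02, (n_samples, n_wavenumbers))

# PCA: how much variance do the first three components capture?
pca = PCA(n_components=3).fit(spectra)
print("cumulative variance:", pca.explained_variance_ratio_.cumsum())

X_tr, X_te, y_tr, y_te = train_test_split(spectra, pmi, test_size=0.25, random_state=0)
pls = PLSRegression(n_components=5).fit(X_tr, y_tr)

rmsec = mean_squared_error(y_tr, pls.predict(X_tr)) ** 0.5
rmsep = mean_squared_error(y_te, pls.predict(X_te)) ** 0.5
print(f"RMSEC {rmsec:.2f} h, RMSEP {rmsep:.2f} h, R^2 {pls.score(X_te, y_te):.2f}")
```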
Development of the ICD-10 simplified version and field test.
Paoin, Wansa; Yuenyongsuwan, Maliwan; Yokobori, Yukiko; Endo, Hiroyoshi; Kim, Sukil
2018-05-01
The International Statistical Classification of Diseases and Related Health Problems, 10th Revision (ICD-10) has been used in various Asia-Pacific countries for more than 20 years. Although ICD-10 is a powerful tool, clinical coding processes are complex; therefore, many developing countries have not been able to implement ICD-10-based health statistics (WHO-FIC APN, 2007). This study aimed to simplify ICD-10 clinical coding processes, to modify index terms to facilitate computer searching and to provide a simplified version of ICD-10 for use in developing countries. The World Health Organization Family of International Classifications Asia-Pacific Network (APN) developed a simplified version of the ICD-10 and conducted field testing in Cambodia during February and March 2016. Ten hospitals were selected to participate. Each hospital sent a team to join a training workshop before using the ICD-10 simplified version to code 100 cases. All hospitals subsequently sent their coded records to the researchers. Overall, there were 1038 coded records with a total of 1099 ICD clinical codes assigned. The average accuracy rate was calculated as 80.71% (66.67-93.41%). Three types of clinical coding error were found: errors attributable to the coder (14.56%), errors resulting from physician documentation (1.27%) and system errors (3.46%). The field trial results demonstrated that the APN ICD-10 simplified version is feasible to implement and is an effective tool for ICD-10 clinical coding in hospitals. Developing countries may consider adopting the APN ICD-10 simplified version for ICD-10 code assignment in hospitals and health care centres. The simplified version can be viewed as an introductory tool which leads to the implementation of the full ICD-10 and may support subsequent ICD-11 adoption.
NASA Astrophysics Data System (ADS)
Swan, B.; Laverdiere, M.; Yang, L.
2017-12-01
In the past five years, deep Convolutional Neural Networks (CNN) have been increasingly favored for computer vision applications due to their high accuracy and ability to generalize well in very complex problems; however, details of how they function and in turn how they may be optimized are still imperfectly understood. In particular, their complex and highly nonlinear network architecture, including many hidden layers and self-learned parameters, as well as their mathematical implications, presents open questions about how to effectively select training data. Without knowledge of the exact ways the model processes and transforms its inputs, intuition alone may fail as a guide to selecting highly relevant training samples. Working in the context of improving a CNN-based building extraction model used for the LandScan USA gridded population dataset, we have approached this problem by developing a semi-supervised, highly-scalable approach to select training samples from a dataset of identified commission errors. Due to the large scope of this project, tens of thousands of potential samples could be derived from identified commission errors. To efficiently trim those samples down to a manageable and effective set for creating additional training samples, we statistically summarized the spectral characteristics of areas with high commission error rates at the image tile level and grouped these tiles using affinity propagation. Highly representative members of each commission error cluster were then used to select sites for training sample creation. The model will be incrementally re-trained with the new training data to allow for an assessment of how the addition of different types of samples affects the model performance, such as precision and recall rates. By using quantitative analysis and data clustering techniques to select highly relevant training samples, we hope to improve model performance in a manner that is resource efficient, both in terms of the training process and in sample creation.
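A minimal sketch of the tile-clustering step described above, using scikit-learn's affinity propagation; the feature construction (per-tile spectral summaries) and array shapes are illustrative assumptions, not the project's actual data pipeline.

```python
# Sketch: clustering tile-level spectral summaries of commission errors with affinity
# propagation, then picking the exemplar tiles as sites for new training samples.
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)

# One row per image tile: e.g. mean and std of each spectral band over commission-error
# pixels, plus the tile's commission-error rate (here: random stand-in values).
tile_features = rng.normal(size=(500, 9))
tile_features = StandardScaler().fit_transform(tile_features)

ap = AffinityPropagation(damping=0.9, random_state=0).fit(tile_features)
exemplar_tiles = ap.cluster_centers_indices_   # indices of highly representative tiles

print(f"{len(exemplar_tiles)} exemplar tiles selected from {len(tile_features)}")
# New training samples would then be digitized only around these exemplar tiles.
```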
Influence of chronic back pain on kinematic reactions to unpredictable arm pulls.
Götze, Martin; Ernst, Michael; Koch, Markus; Blickhan, Reinhard
2015-03-01
There is evidence that muscle reflexes are delayed in patients with chronic low back pain in response to perturbations. It remains unknown whether these delays are accompanied by altered kinematics or are compensated by adaptation of other muscle parameters. The aim of this study was to investigate whether chronic low back pain patients show an altered kinematic reaction and if such data are reliable for the classification of chronic low back pain. In an experiment involving 30 females, sudden lateral perturbations were applied to the arm of a subject in an upright, standing position. Kinematics was used to distinguish between chronic low back pain patients and healthy controls. A model derived from a stepwise discriminant function analysis correctly classified 100% of patients and 80% of healthy controls. The estimation of the classification error revealed a constant rate for the classification of the healthy controls and a slightly decreased rate for the patients. Observed reflex delays and identified kinematic differences inside and outside the region of pain during impaired movement indicated that chronic low back pain patients have an altered motor control that is not restricted to the lumbo-pelvic region. This applied paradigm of external perturbations can be used to detect chronic low back pain patients and also persons without chronic low back pain but with an altered motor control. Further investigations are essential to reveal whether healthy persons with changes in motor function have an increased potential to develop chronic back pain. Copyright © 2015 Elsevier Ltd. All rights reserved.
Acoustic Biometric System Based on Preprocessing Techniques and Linear Support Vector Machines
del Val, Lara; Izquierdo-Fuente, Alberto; Villacorta, Juan J.; Raboso, Mariano
2015-01-01
Drawing on the results of an acoustic biometric system based on a MSE classifier, a new biometric system has been implemented. This new system preprocesses acoustic images, extracts several parameters and finally classifies them, based on Support Vector Machine (SVM). The preprocessing techniques used are spatial filtering; segmentation, based on a Gaussian Mixture Model (GMM), to separate the person from the background; masking, to reduce the dimensions of the images; and binarization, to reduce the size of each image. An analysis of classification error and a study of the sensitivity of the error versus the computational burden of each implemented algorithm are presented. This allows the selection of the most relevant algorithms, according to the benefits required by the system. A significant improvement of the biometric system has been achieved by reducing the classification error, the computational burden and the storage requirements. PMID:26091392
Acoustic Biometric System Based on Preprocessing Techniques and Linear Support Vector Machines.
del Val, Lara; Izquierdo-Fuente, Alberto; Villacorta, Juan J; Raboso, Mariano
2015-06-17
Drawing on the results of an acoustic biometric system based on a MSE classifier, a new biometric system has been implemented. This new system preprocesses acoustic images, extracts several parameters and finally classifies them, based on Support Vector Machine (SVM). The preprocessing techniques used are spatial filtering; segmentation, based on a Gaussian Mixture Model (GMM), to separate the person from the background; masking, to reduce the dimensions of the images; and binarization, to reduce the size of each image. An analysis of classification error and a study of the sensitivity of the error versus the computational burden of each implemented algorithm are presented. This allows the selection of the most relevant algorithms, according to the benefits required by the system. A significant improvement of the biometric system has been achieved by reducing the classification error, the computational burden and the storage requirements.
NASA Astrophysics Data System (ADS)
Zamora Ramos, Ernesto
Artificial Intelligence is a big part of automation and, with today's technological advances, artificial intelligence has taken great strides towards positioning itself as the technology of the future to control, enhance and perfect automation. Computer vision encompasses pattern recognition, classification and machine learning. Computer vision is at the core of decision making and it is a vast and fruitful branch of artificial intelligence. In this work, we expose novel algorithms and techniques built upon existing technologies to improve pattern recognition and neural network training, initially motivated by a multidisciplinary effort to build a robot that helps maintain and optimize solar panel energy production. Our contributions detail an improved non-linear pre-processing technique to enhance poorly illuminated images based on modifications to the standard histogram equalization for an image. While the original motivation was to improve nocturnal navigation, the results have applications in surveillance, search and rescue, medical image enhancement, and many others. We created a vision system for precise camera distance positioning, motivated by the need to correctly locate the robot for capture of solar panel images for classification. The classification algorithm marks solar panels as clean or dirty for later processing. Our algorithm extends past image classification and, based on historical and experimental data, it identifies the optimal moment at which to perform maintenance on marked solar panels so as to minimize energy and profit loss. In order to improve upon the classification algorithm, we delved into feedforward neural networks because of their recent advancements, proven universal approximation and classification capabilities, and excellent recognition rates. We explore state-of-the-art neural network training techniques, offering pointers and insights, culminating in the implementation of a complete library with support for modern deep learning architectures, multilayer perceptrons and convolutional neural networks. Our research with neural networks encountered considerable difficulty regarding hyperparameter estimation for good training convergence and accuracy. Most hyperparameters, including architecture, learning rate, regularization, trainable parameter (weight) initialization, and so on, are chosen via a trial-and-error process with some educated guesses. However, we developed the first quantitative method to compare weight initialization strategies, a critical hyperparameter choice during training, to estimate, among a group of candidate strategies, which would make the network converge to the highest classification accuracy fastest with high probability. Our method provides a quick, objective measure to compare initialization strategies and select the best possible among them beforehand, without having to complete multiple training sessions for each candidate strategy to compare final results.
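A minimal sketch of comparing weight-initialization strategies empirically, in the spirit of the quantitative comparison described above (not the dissertation's actual method): the same small network is trained several times per strategy under a fixed budget and the validation accuracies are compared. The network, synthetic data, optimizer settings, and number of repeats are all illustrative assumptions.

```python
# Sketch: comparing weight-initialization strategies by repeated short training runs.
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(2000, 20)
y = (X[:, :5].sum(dim=1) > 0).long()  # synthetic binary task
X_tr, y_tr, X_va, y_va = X[:1500], y[:1500], X[1500:], y[1500:]

def make_model(init_fn):
    model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
    for m in model:
        if isinstance(m, nn.Linear):
            init_fn(m.weight)
            nn.init.zeros_(m.bias)
    return model

def val_accuracy(init_fn, epochs=30):
    model = make_model(init_fn)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(X_tr), y_tr).backward()
        opt.step()
    return (model(X_va).argmax(dim=1) == y_va).float().mean().item()

strategies = {
    "xavier_uniform": nn.init.xavier_uniform_,
    "kaiming_normal": nn.init.kaiming_normal_,
    "small_normal": lambda w: nn.init.normal_(w, std=0.01),
}
for name, fn in strategies.items():
    accs = [val_accuracy(fn) for _ in range(5)]  # repeat to estimate variability
    print(name, sum(accs) / len(accs))
```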
Khondoker, Mizanur R; Bachmann, Till T; Mewissen, Muriel; Dickinson, Paul; Dobrzelecki, Bartosz; Campbell, Colin J; Mount, Andrew R; Walton, Anthony J; Crain, Jason; Schulze, Holger; Giraud, Gerard; Ross, Alan J; Ciani, Ilenia; Ember, Stuart W J; Tlili, Chaker; Terry, Jonathan G; Grant, Eilidh; McDonnell, Nicola; Ghazal, Peter
2010-12-01
Machine learning and statistical model-based classifiers have increasingly been used with more complex and high dimensional biological data obtained from high-throughput technologies. Understanding the impact of various factors associated with large and complex microarray datasets on the predictive performance of classifiers is computationally intensive, under-investigated, yet vital in determining the optimal number of biomarkers for various classification purposes aimed towards improved detection, diagnosis, and therapeutic monitoring of diseases. We investigate the impact of microarray-based data characteristics on the predictive performance for various classification rules using simulation studies. Our investigation using Random Forest, Support Vector Machines, Linear Discriminant Analysis and k-Nearest Neighbour shows that the predictive performance of classifiers is strongly influenced by training set size, biological and technical variability, replication, fold change and correlation between biomarkers. The optimal number of biomarkers for a classification problem should therefore be estimated taking account of the impact of all these factors. A database of average generalization errors is built for various combinations of these factors. The database of generalization errors can be used for estimating the optimal number of biomarkers for given levels of predictive accuracy as a function of these factors. Examples show that curves from actual biological data resemble those of simulated data with corresponding levels of data characteristics. An R package optBiomarker implementing the method is freely available for academic use from the Comprehensive R Archive Network (http://www.cran.r-project.org/web/packages/optBiomarker/).
Schneider, Bruce A.; Avivi-Reich, Meital; Mozuraitis, Mindaugas
2015-01-01
A number of statistical textbooks recommend using an analysis of covariance (ANCOVA) to control for the effects of extraneous factors that might influence the dependent measure of interest. However, it is not generally recognized that serious problems of interpretation can arise when the design contains comparisons of participants sampled from different populations (classification designs). Designs that include a comparison of younger and older adults, or a comparison of musicians and non-musicians are examples of classification designs. In such cases, estimates of differences among groups can be contaminated by differences in the covariate population means across groups. A second problem of interpretation will arise if the experimenter fails to center the covariate measures (subtracting the mean covariate score from each covariate score) whenever the design contains within-subject factors. Unless the covariate measures on the participants are centered, estimates of within-subject factors are distorted, and significant increases in Type I error rates, and/or losses in power can occur when evaluating the effects of within-subject factors. This paper: (1) alerts potential users of ANCOVA of the need to center the covariate measures when the design contains within-subject factors, and (2) indicates how they can avoid biases when one cannot assume that the expected value of the covariate measure is the same for all of the groups in a classification design. PMID:25954230
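A minimal sketch of covariate centering before an ANCOVA-style model with a classification factor; the data, column names, and formula are illustrative assumptions (there is no within-subject factor here), intended only to show the centering step itself.

```python
# Sketch: centering a covariate (subtracting its mean) before an ANCOVA-style model.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 60
df = pd.DataFrame({
    "group": np.repeat(["young", "old"], n // 2),  # classification factor
    "covariate": np.concatenate([rng.normal(100, 10, n // 2),
                                 rng.normal(85, 10, n // 2)]),  # group covariate means differ
})
df["score"] = 0.5 * df["covariate"] + (df["group"] == "young") * 5 + rng.normal(0, 3, n)

# Center the covariate so the intercept and group effect are evaluated at the mean
# covariate value; because the covariate means differ across groups, the adjusted
# group effect must still be interpreted with care.
df["covariate_c"] = df["covariate"] - df["covariate"].mean()

model = smf.ols("score ~ group + covariate_c", data=df).fit()
print(model.params)
```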
Automatic classification of diseases from free-text death certificates for real-time surveillance.
Koopman, Bevan; Karimi, Sarvnaz; Nguyen, Anthony; McGuire, Rhydwyn; Muscatello, David; Kemp, Madonna; Truran, Donna; Zhang, Ming; Thackway, Sarah
2015-07-15
Death certificates provide an invaluable source for mortality statistics which can be used for surveillance and early warnings of increases in disease activity and to support the development and monitoring of prevention or response strategies. However, their value can be realised only if accurate, quantitative data can be extracted from death certificates, an aim hampered by both the volume and variable nature of certificates written in natural language. This study aims to develop a set of machine learning and rule-based methods to automatically classify death certificates according to four high impact diseases of interest: diabetes, influenza, pneumonia and HIV. Two classification methods are presented: i) a machine learning approach, where detailed features (terms, term n-grams and SNOMED CT concepts) are extracted from death certificates and used to train a set of supervised machine learning models (Support Vector Machines); and ii) a set of keyword-matching rules. These methods were used to identify the presence of diabetes, influenza, pneumonia and HIV in a death certificate. An empirical evaluation was conducted using 340,142 death certificates, divided between training and test sets, covering deaths from 2000-2007 in New South Wales, Australia. Precision and recall (positive predictive value and sensitivity) were used as evaluation measures, with F-measure providing a single, overall measure of effectiveness. A detailed error analysis was performed on classification errors. Classification of diabetes, influenza, pneumonia and HIV was highly accurate (F-measure 0.96). More fine-grained ICD-10 classification effectiveness was more variable but still high (F-measure 0.80). The error analysis revealed that word variations as well as certain word combinations adversely affected classification. In addition, anomalies in the ground truth likely led to an underestimation of the effectiveness. The high accuracy and low cost of the classification methods allow for an effective means for automatic and real-time surveillance of diabetes, influenza, pneumonia and HIV deaths. In addition, the methods are generally applicable to other diseases of interest and to other sources of medical free-text besides death certificates.
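A minimal sketch of the machine learning approach described above (term n-gram features feeding a linear SVM) for flagging one disease of interest; the example certificates, labels, and feature settings are fabricated stand-ins purely to show the pipeline shape.

```python
# Sketch: n-gram features + linear SVM for detecting a disease in free-text death certificates.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

certificates = [
    "ischaemic heart disease; type 2 diabetes mellitus",
    "pneumonia due to influenza a virus",
    "metastatic lung carcinoma",
    "septicaemia; diabetes with renal complications",
    "cerebrovascular accident",
    "influenza with pneumonia and respiratory failure",
]
has_diabetes = [1, 0, 0, 1, 0, 0]

pipeline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), lowercase=True),  # terms and term bigrams
    LinearSVC(),
)
pipeline.fit(certificates, has_diabetes)

pred = pipeline.predict(["acute myocardial infarction; diabetes mellitus"])
print("predicted diabetes flag:", pred[0])
# On a held-out set one would report precision, recall and F-measure per disease.
```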
Measures of Linguistic Accuracy in Second Language Writing Research.
ERIC Educational Resources Information Center
Polio, Charlene G.
1997-01-01
Investigates the reliability of measures of linguistic accuracy in second language writing. The study uses a holistic scale, error-free T-units, and an error classification system on the essays of English-as-a-Second-Language students and discusses why disagreements arise within a rater and between raters. (24 references) (Author/CK)
Sun, Xiao-Gang; Tang, Hong; Yuan, Gui-Bin
2008-05-01
For the total light scattering particle sizing technique, an inversion and classification method based on the dependent-model algorithm is proposed. The measured particle system was inverted simultaneously with different particle size distribution functions whose mathematical form was known in advance, and then classified according to the inversion errors. The simulation experiments illustrated that it is feasible to use the inversion errors to determine the particle size distribution. The particle size distribution function was obtained accurately at only three wavelengths in the visible light range with the genetic algorithm, and the inversion results were steady and reliable, which minimized the number of wavelengths required and increased the flexibility of light source selection. The single-peak distribution inversion error was less than 5% and the bimodal distribution inversion error was less than 10% when 5% stochastic noise was added to the transmission extinction measurements at two wavelengths. The running time of this method was less than 2 s. The method has the advantages of simplicity, rapidity, and suitability for on-line particle size measurement.
Classification accuracy for stratification with remotely sensed data
Raymond L. Czaplewski; Paul L. Patterson
2003-01-01
Tools are developed that help specify the classification accuracy required from remotely sensed data. These tools are applied during the planning stage of a sample survey that will use poststratification, prestratification with proportional allocation, or double sampling for stratification. Accuracy standards are developed in terms of an "error matrix," which is...
NASA Astrophysics Data System (ADS)
McClanahan, James Patrick
Eddy Current Testing (ECT) is a Non-Destructive Examination (NDE) technique that is widely used in power generating plants (both nuclear and fossil) to test the integrity of heat exchanger (HX) and steam generator (SG) tubing. Specifically for this research, laboratory-generated, flawed tubing data were examined. The purpose of this dissertation is to develop and implement an automated method for the classification and an advanced characterization of defects in HX and SG tubing. These two improvements enhanced the robustness of characterization as compared to traditional bobbin-coil ECT data analysis methods. A more robust classification and characterization of the tube flaw in-situ (while the SG is on-line but not when the plant is operating) should provide valuable information to the power industry. The following are the conclusions reached from this research. A feature extraction program acquiring relevant information from both the mixed, absolute and differential data was successfully implemented. The continuous wavelet transform (CWT) was utilized to extract more information from the mixed, complex differential data. Image processing techniques, used to extract the information contained in the generated CWT, classified the data with a high success rate. The data were accurately classified, utilizing the compressed feature vector and using a Bayes classification system. An estimation of the upper bound for the probability of error, using the Bhattacharyya distance, was successfully applied to the Bayesian classification. The classified data were separated according to flaw-type (classification) to enhance characterization. The characterization routine used dedicated, flaw-type-specific artificial neural networks (ANNs) that made the characterization of the tube flaw more robust. The inclusion of outliers may help complete the feature space so that classification accuracy is increased. Given that the eddy current test signals appear very similar, there may not be sufficient information to make an extremely accurate (>95%) classification or an advanced characterization using this system. It is necessary to have a larger database for more accurate system learning.
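A minimal sketch of the Bhattacharyya upper bound on Bayes classification error mentioned above, computed for two Gaussian classes; the feature vectors are random stand-ins for eddy-current feature vectors and equal priors are assumed.

```python
# Sketch: Bhattacharyya distance between two Gaussian classes and the bound
# P_error <= sqrt(P1*P2) * exp(-D_B) on the Bayes error.
import numpy as np

rng = np.random.default_rng(5)
class_a = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.2], [0.2, 1.0]], 400)
class_b = rng.multivariate_normal([1.5, 1.0], [[1.2, 0.1], [0.1, 0.8]], 400)

def bhattacharyya_distance(x, y):
    mu1, mu2 = x.mean(axis=0), y.mean(axis=0)
    s1, s2 = np.cov(x, rowvar=False), np.cov(y, rowvar=False)
    s = 0.5 * (s1 + s2)
    diff = mu2 - mu1
    term1 = 0.125 * diff @ np.linalg.solve(s, diff)
    term2 = 0.5 * np.log(np.linalg.det(s) / np.sqrt(np.linalg.det(s1) * np.linalg.det(s2)))
    return term1 + term2

db = bhattacharyya_distance(class_a, class_b)
p1 = p2 = 0.5  # assumed equal priors
error_upper_bound = np.sqrt(p1 * p2) * np.exp(-db)
print(f"Bhattacharyya distance {db:.3f}, error upper bound {error_upper_bound:.3f}")
```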
Comparison of wheat classification accuracy using different classifiers of the image-100 system
NASA Technical Reports Server (NTRS)
Dejesusparada, N. (Principal Investigator); Chen, S. C.; Moreira, M. A.; Delima, A. M.
1981-01-01
Classification results using single-cell and multi-cell signature acquisition options, a point-by-point Gaussian maximum-likelihood classifier, and K-means clustering of the Image-100 system are presented. Conclusions reached are that: a better indication of correct classification can be provided by using a test area which contains various cover types of the study area; classification accuracy should be evaluated considering both the percentages of correct classification and error of commission; supervised classification approaches are better than K-means clustering; Gaussian distribution maximum likelihood classifier is better than Single-cell and Multi-cell Signature Acquisition Options of the Image-100 system; and in order to obtain a high classification accuracy in a large and heterogeneous crop area, using Gaussian maximum-likelihood classifier, homogeneous spectral subclasses of the study crop should be created to derive training statistics.
NASA Astrophysics Data System (ADS)
Habibzadeh, Mehdi; Jannesari, Mahboobeh; Rezaei, Zahra; Baharvand, Hossein; Totonchi, Mehdi
2018-04-01
This work gives an account of the evaluation of white blood cell differential counts via a computer-aided diagnosis (CAD) system and hematology rules. Leukocytes, also called white blood cells (WBCs), play a central role in the immune system. Leukocytes are responsible for phagocytosis and immunity, and therefore for defense against infections that contribute to disease incidence and mortality. Admittedly, microscopic examination of blood samples is a time-consuming, expensive and error-prone task. A manual diagnosis searches for specific leukocytes and count abnormalities in the blood slides while a complete blood count (CBC) examination is performed. Complications may arise from the large number of varying samples, including different leukocyte types, related sub-types and concentrations in blood, which makes the analysis prone to human error. This process can be automated by computerized techniques which are more reliable and economical. In essence, we seek a fast, accurate mechanism for classification and for gathering information about the distribution of white blood cells, which may help to diagnose the degree of any abnormality during a CBC test. In this work, we consider the problem of pre-processing and supervised classification of white blood cells into their four primary types, namely neutrophils, eosinophils, lymphocytes, and monocytes, using a proposed deep learning framework. In the first step, this research proposes three consecutive pre-processing operations, namely color distortion, bounding box distortion (cropping) and image flipping/mirroring. In the second phase, white blood cell recognition is performed with hierarchical feature extraction using Inception and ResNet architectures. Finally, the results obtained from the preliminary analysis of cell classification, with 11,200 training samples and an evaluation set of 1,244 white blood cells, are presented in confusion matrices and interpreted using accuracy and false positive rates, with the classification framework validated in experiments conducted on poor-quality blood images sized 320 × 240 pixels. The outcomes in the challenging cell detection task, as shown in the results section, indicate that there is a significant achievement in using the Inception and ResNet architectures with the proposed settings. Our framework detects on average 100% of the four main white blood cell types using ResNet V1 50, while alternative promising results with 99.84% and 99.46% accuracy were obtained with ResNet V1 152 and ResNet V1 101, respectively, with 3000 epochs and fine-tuning of all layers. Further statistical confusion matrix tests revealed that this work achieved sensitivity values of 1, 0.9979 and 0.9989, with area under the curve (AUC) scores of 1, 0.9992 and 0.9833, for the three proposed techniques. In addition, the current work shows negligible false negative counts (0, 2 and 1) and small false positive counts (0, 0 and 5) in leukocyte detection.
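A minimal sketch of fine-tuning a pretrained ResNet-50 for four white blood cell classes, in the spirit of the framework above; the directory layout, image size, hyperparameters, and use of a recent torchvision API are all assumptions and do not reproduce the paper's exact setup.

```python
# Sketch: fine-tuning all layers of a pretrained ResNet-50 for 4 WBC classes.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),          # upscale the 320x240 crops (assumed size)
    transforms.RandomHorizontalFlip(),      # mirroring, echoing the pre-processing step
    transforms.ColorJitter(0.2, 0.2, 0.2),  # color distortion
    transforms.ToTensor(),
])
train_set = datasets.ImageFolder("wbc/train", transform=transform)  # hypothetical path
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 4)  # neutrophil/eosinophil/lymphocyte/monocyte

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)  # all layers trainable
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(10):  # the study reports far more epochs
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```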
Unsupervised classification of operator workload from brain signals.
Schultze-Kraft, Matthias; Dähne, Sven; Gugler, Manfred; Curio, Gabriel; Blankertz, Benjamin
2016-06-01
In this study we aimed for the classification of operator workload as it is expected in many real-life workplace environments. We explored brain-signal based workload predictors that differ with respect to the level of label information required for training, including entirely unsupervised approaches. Subjects executed a task on a touch screen that required continuous effort of visual and motor processing with alternating difficulty. We first employed classical approaches for workload state classification that operate on the sensor space of EEG and compared those to the performance of three state-of-the-art spatial filtering methods: common spatial patterns (CSPs) analysis, which requires binary label information; source power co-modulation (SPoC) analysis, which uses the subjects' error rate as a target function; and canonical SPoC (cSPoC) analysis, which solely makes use of cross-frequency power correlations induced by different states of workload and thus represents an unsupervised approach. Finally, we investigated the effects of fusing brain signals and peripheral physiological measures (PPMs) and examined the added value for improving classification performance. Mean classification accuracies of 94%, 92% and 82% were achieved with CSP, SPoC, cSPoC, respectively. These methods outperformed the approaches that did not use spatial filtering and they extracted physiologically plausible components. The performance of the unsupervised cSPoC is significantly increased by augmenting it with PPM features. Our analyses ensured that the signal sources used for classification were of cortical origin and not contaminated with artifacts. Our findings show that workload states can be successfully differentiated from brain signals, even when less and less information from the experimental paradigm is used, thus paving the way for real-world applications in which label information may be noisy or entirely unavailable.
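A minimal sketch of the supervised CSP-based pipeline mentioned above (CSP spatial filtering followed by a linear classifier), assuming the MNE-Python library is available; the epoch array, labels, and component count are simulated stand-ins for real EEG workload data.

```python
# Sketch: CSP spatial filtering + LDA for binary workload-state classification.
import numpy as np
from mne.decoding import CSP
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n_epochs, n_channels, n_times = 200, 32, 500
X = rng.normal(size=(n_epochs, n_channels, n_times))  # EEG epochs (stand-in)
y = rng.integers(0, 2, n_epochs)                       # 0 = low, 1 = high workload

clf = make_pipeline(CSP(n_components=6, log=True), LinearDiscriminantAnalysis())
scores = cross_val_score(clf, X, y, cv=5)
print("mean cross-validated accuracy:", scores.mean())
```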
Unsupervised classification of operator workload from brain signals
NASA Astrophysics Data System (ADS)
Schultze-Kraft, Matthias; Dähne, Sven; Gugler, Manfred; Curio, Gabriel; Blankertz, Benjamin
2016-06-01
Objective. In this study we aimed for the classification of operator workload as it is expected in many real-life workplace environments. We explored brain-signal based workload predictors that differ with respect to the level of label information required for training, including entirely unsupervised approaches. Approach. Subjects executed a task on a touch screen that required continuous effort of visual and motor processing with alternating difficulty. We first employed classical approaches for workload state classification that operate on the sensor space of EEG and compared those to the performance of three state-of-the-art spatial filtering methods: common spatial patterns (CSPs) analysis, which requires binary label information; source power co-modulation (SPoC) analysis, which uses the subjects’ error rate as a target function; and canonical SPoC (cSPoC) analysis, which solely makes use of cross-frequency power correlations induced by different states of workload and thus represents an unsupervised approach. Finally, we investigated the effects of fusing brain signals and peripheral physiological measures (PPMs) and examined the added value for improving classification performance. Main results. Mean classification accuracies of 94%, 92% and 82% were achieved with CSP, SPoC, cSPoC, respectively. These methods outperformed the approaches that did not use spatial filtering and they extracted physiologically plausible components. The performance of the unsupervised cSPoC is significantly increased by augmenting it with PPM features. Significance. Our analyses ensured that the signal sources used for classification were of cortical origin and not contaminated with artifacts. Our findings show that workload states can be successfully differentiated from brain signals, even when less and less information from the experimental paradigm is used, thus paving the way for real-world applications in which label information may be noisy or entirely unavailable.
Computer Assisted Navigation in Knee Arthroplasty
Bae, Dae Kyung
2011-01-01
Computer assisted surgery (CAS) was used to improve the positioning of implants during total knee arthroplasty (TKA). Most studies have reported that computer assisted navigation reduced the outliers of alignment and component malpositioning. However, additional sophisticated studies are necessary to determine if the improvement of alignment will improve long-term clinical results and increase the survival rate of the implant. Knowledge of CAS-TKA technology and understanding the advantages and limitations of navigation are crucial to the successful application of the CAS technique in TKA. In this article, we review the components of navigation, classification of the system, surgical method, potential error, clinical results, advantages, and disadvantages. PMID:22162787
Capacitively coupled EMG detection via ultra-low-power microcontroller STFT.
Roland, Theresa; Baumgartner, Werner; Amsuess, Sebastian; Russold, Michael F
2017-07-01
As motion artefacts are a major problem with electromyography sensors, a new algorithm is developed to differentiate artefacts from contraction EMG. The performance of myoelectric prostheses is improved with this algorithm. The implementation is done for an ultra-low-power microcontroller with limited calculation resources and memory. The Short-Time Fourier Transform (STFT) is used to enable real-time application. The sum of the differences (SOD) of the currently measured EMG to a reference contraction EMG is calculated. The SOD is a new parameter introduced for EMG classification. Satisfactory error rates were determined from measurements with the capacitively coupled EMG prototype recently developed by the research group.
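A minimal sketch of computing an STFT-based sum-of-differences feature against a stored reference contraction and thresholding it, in the spirit of the SOD parameter described above; the signals, sampling rate, window length, and threshold are all illustrative assumptions.

```python
# Sketch: STFT sum-of-differences (SOD) between a measured EMG window and a reference.
import numpy as np
from scipy.signal import stft

fs = 1000  # Hz, assumed sampling rate
rng = np.random.default_rng(0)
t = np.arange(0, 1.0, 1 / fs)

reference = np.sin(2 * np.pi * 80 * t) * rng.normal(1.0, 0.1, t.size)  # stored contraction
current = np.sin(2 * np.pi * 80 * t) * rng.normal(1.0, 0.1, t.size)    # new contraction
artefact = np.exp(-t * 5) * rng.normal(0.0, 2.0, t.size)               # motion-like transient

def sod(signal, ref, nperseg=128):
    _, _, z_sig = stft(signal, fs=fs, nperseg=nperseg)
    _, _, z_ref = stft(ref, fs=fs, nperseg=nperseg)
    return np.sum(np.abs(np.abs(z_sig) - np.abs(z_ref)))

threshold = 1.5 * sod(current, reference)  # illustrative decision threshold
for name, sig in [("contraction", current), ("artefact", artefact)]:
    value = sod(sig, reference)
    print(name, "SOD =", round(value, 2), "flagged as artefact:", value > threshold)
```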
The generalization ability of SVM classification based on Markov sampling.
Xu, Jie; Tang, Yuan Yan; Zou, Bin; Xu, Zongben; Li, Luoqing; Lu, Yang; Zhang, Baochang
2015-06-01
The previously known works studying the generalization ability of support vector machine classification (SVMC) algorithms are usually based on the assumption of independent and identically distributed samples. In this paper, we go far beyond this classical framework by studying the generalization ability of SVMC based on uniformly ergodic Markov chain (u.e.M.c.) samples. We analyze the excess misclassification error of SVMC based on u.e.M.c. samples, and obtain the optimal learning rate of SVMC for u.e.M.c. samples. We also introduce a new Markov sampling algorithm for SVMC to generate u.e.M.c. samples from a given dataset, and present numerical studies on the learning performance of SVMC based on Markov sampling for benchmark datasets. The numerical studies show that SVMC based on Markov sampling not only has better generalization ability as the number of training samples grows, but also yields sparser classifiers when the size of the dataset is large relative to the input dimension.
Artificial neural networks for processing fluorescence spectroscopy data in skin cancer diagnostics
NASA Astrophysics Data System (ADS)
Lenhardt, L.; Zeković, I.; Dramićanin, T.; Dramićanin, M. D.
2013-11-01
Over the years various optical spectroscopic techniques have been widely used as diagnostic tools in the discrimination of many types of malignant diseases. Recently, synchronous fluorescent spectroscopy (SFS) coupled with chemometrics has been applied in cancer diagnostics. The SFS method involves simultaneous scanning of both emission and excitation wavelengths while keeping the interval of wavelengths (constant-wavelength mode) or frequencies (constant-energy mode) between them constant. This method is fast, relatively inexpensive, sensitive and non-invasive. Total synchronous fluorescence spectra of normal skin, nevus and melanoma samples were used as input for training of artificial neural networks. Two different types of artificial neural networks were trained, the self-organizing map and the feed-forward neural network. Histopathology results of investigated skin samples were used as the gold standard for network output. Based on the obtained classification success rate of neural networks, we concluded that both networks provided high sensitivity with classification errors between 2 and 4%.
Structural analysis of online handwritten mathematical symbols based on support vector machines
NASA Astrophysics Data System (ADS)
Simistira, Foteini; Papavassiliou, Vassilis; Katsouros, Vassilis; Carayannis, George
2013-01-01
Mathematical expression recognition is still a very challenging task for the research community mainly because of the two-dimensional (2d) structure of mathematical expressions (MEs). In this paper, we present a novel approach for the structural analysis between two on-line handwritten mathematical symbols of a ME, based on spatial features of the symbols. We introduce six features to represent the spatial affinity of the symbols and compare two multi-class classification methods that employ support vector machines (SVMs): one based on the "one-against-one" technique and one based on the "one-against-all", in identifying the relation between a pair of symbols (i.e. subscript, numerator, etc). A dataset containing 1906 spatial relations derived from the Competition on Recognition of Online Handwritten Mathematical Expressions (CROHME) 2012 training dataset is constructed to evaluate the classifiers and compare them with the rule-based classifier of the ILSP-1 system participated in the contest. The experimental results give an overall mean error rate of 2.61% for the "one-against-one" SVM approach, 6.57% for the "one-against-all" SVM technique and 12.31% error rate for the ILSP-1 classifier.
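A minimal sketch comparing the "one-against-one" and "one-against-all" multi-class SVM strategies mentioned above; the six spatial features and relation labels are random stand-ins for the CROHME-derived dataset, so the resulting error rates are not meaningful.

```python
# Sketch: one-vs-one vs one-vs-rest SVM strategies for spatial-relation classification.
import numpy as np
from sklearn.svm import SVC
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1906, 6))     # six spatial-affinity features (stand-in)
y = rng.integers(0, 6, size=1906)  # relation labels (subscript, superscript, ...)

for name, clf in [("one-against-one", OneVsOneClassifier(SVC(kernel="rbf"))),
                  ("one-against-all", OneVsRestClassifier(SVC(kernel="rbf")))]:
    error = 1.0 - cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: mean cross-validated error rate {error:.2%}")
```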
NASA Technical Reports Server (NTRS)
Simpson, C. A.
1985-01-01
In the present study of the responses of pairs of pilots to aircraft warning classification tasks using an isolated word, speaker-dependent speech recognition system, the induced stress was manipulated by means of different scoring procedures for the classification task and by the inclusion of a competitive manual control task. Both speech patterns and recognition accuracy were analyzed, and recognition errors were recorded by type for an isolated word speaker-dependent system and by an offline technique for a connected word speaker-dependent system. While errors increased with task loading for the isolated word system, there was no such effect for task loading in the case of the connected word system.
MacDonald, Shannon E; Schopflocher, Donald P; Golonka, Richard P
2014-01-04
Accurate classification of children's immunization status is essential for clinical care, administration and evaluation of immunization programs, and vaccine program research. Computerized immunization registries have been proposed as a valuable alternative to provider paper records or parent report, but there is a need to better understand the challenges associated with their use. This study assessed the accuracy of immunization status classification in an immunization registry as compared to parent report and determined the number and type of errors occurring in both sources. This study was a sub-analysis of a larger study which compared the characteristics of children whose immunizations were up to date (UTD) at two years as compared to those not UTD. Children's immunization status was initially determined from a population-based immunization registry, and then compared to parent report of immunization status, as reported in a postal survey. Discrepancies between the two sources were adjudicated by review of immunization providers' hard-copy clinic records. Descriptive analyses included calculating proportions and confidence intervals for errors in classification and reporting of the type and frequency of errors. Among the 461 survey respondents, there were 60 discrepancies in immunization status. The majority of errors were due to parent report (n = 44), but the registry was not without fault (n = 16). Parents tended to erroneously report their child as UTD, whereas the registry was more likely to wrongly classify children as not UTD. Reasons for registry errors included failure to account for varicella disease history, variable number of doses required due to age at series initiation, and doses administered out of the region. These results confirm that parent report is often flawed, but also identify that registries are prone to misclassification of immunization status. Immunization program administrators and researchers need to institute measures to identify and reduce misclassification, in order for registries to play an effective role in the control of vaccine-preventable disease.
2014-01-01
Background Accurate classification of children’s immunization status is essential for clinical care, administration and evaluation of immunization programs, and vaccine program research. Computerized immunization registries have been proposed as a valuable alternative to provider paper records or parent report, but there is a need to better understand the challenges associated with their use. This study assessed the accuracy of immunization status classification in an immunization registry as compared to parent report and determined the number and type of errors occurring in both sources. Methods This study was a sub-analysis of a larger study which compared the characteristics of children whose immunizations were up to date (UTD) at two years as compared to those not UTD. Children’s immunization status was initially determined from a population-based immunization registry, and then compared to parent report of immunization status, as reported in a postal survey. Discrepancies between the two sources were adjudicated by review of immunization providers’ hard-copy clinic records. Descriptive analyses included calculating proportions and confidence intervals for errors in classification and reporting of the type and frequency of errors. Results Among the 461 survey respondents, there were 60 discrepancies in immunization status. The majority of errors were due to parent report (n = 44), but the registry was not without fault (n = 16). Parents tended to erroneously report their child as UTD, whereas the registry was more likely to wrongly classify children as not UTD. Reasons for registry errors included failure to account for varicella disease history, variable number of doses required due to age at series initiation, and doses administered out of the region. Conclusions These results confirm that parent report is often flawed, but also identify that registries are prone to misclassification of immunization status. Immunization program administrators and researchers need to institute measures to identify and reduce misclassification, in order for registries to play an effective role in the control of vaccine-preventable disease. PMID:24387002
NASA Astrophysics Data System (ADS)
Park, M.; Stenstrom, M. K.
2004-12-01
Recognizing urban information from satellite imagery is problematic due to the diverse features and dynamic changes of urban land use. The use of Landsat imagery for urban land use classification involves inherent uncertainty due to its spatial resolution and the low separability among land uses. To resolve the uncertainty problem, we investigated the performance of Bayesian networks to classify urban land use, since Bayesian networks provide a quantitative way of handling uncertainty and have been successfully used in many areas. In this study, we developed the optimized networks for urban land use classification from Landsat ETM+ images of the Marina del Rey area based on USGS land cover/use classification level III. The networks started from a tree structure based on mutual information between variables and added links to improve accuracy. This methodology offers several advantages: (1) The network structure shows the dependency relationships between variables. The class node value can be predicted even with particular band information missing due to sensor system error. The missing information can be inferred from other dependent bands. (2) The network structure provides information on which variables are important for the classification, which is not available from conventional classification methods such as neural networks and maximum likelihood classification. In our case, for example, bands 1, 5 and 6 are the most important inputs in determining the land use of each pixel. (3) The networks can be reduced to those input variables important for classification, which simplifies the problem without considering all possible variables. We also examined the effect of incorporating ancillary data: geospatial information such as X and Y coordinate values of each pixel and DEM data, and vegetation indices such as NDVI and the Tasseled Cap transformation. The results showed that the locational information improved overall accuracy (81%) and kappa coefficient (76%), and lowered the omission and commission errors compared with using only spectral data (accuracy 71%, kappa coefficient 62%). Incorporating DEM data did not significantly improve overall accuracy (74%) and kappa coefficient (66%) but lowered the omission and commission errors. Incorporating NDVI did not appreciably improve the overall accuracy (72%) or kappa coefficient (65%). Including the Tasseled Cap transformation reduced the accuracy (accuracy 70%, kappa 61%). Therefore, additional information from the DEM and vegetation indices was not as useful as the locational ancillary data.
Vector quantizer designs for joint compression and terrain categorization of multispectral imagery
NASA Technical Reports Server (NTRS)
Gorman, John D.; Lyons, Daniel F.
1994-01-01
Two vector quantizer designs for compression of multispectral imagery and their impact on terrain categorization performance are evaluated. The mean-squared error (MSE) and classification performance of the two quantizers are compared, and it is shown that a simple two-stage design minimizing MSE subject to a constraint on classification performance has a significantly better classification performance than a standard MSE-based tree-structured vector quantizer followed by maximum likelihood classification. This improvement in classification performance is obtained with minimal loss in MSE performance. The results show that it is advantageous to tailor compression algorithm designs to the required data exploitation tasks. Applications of joint compression/classification include compression for the archival or transmission of Landsat imagery that is later used for land utility surveys and/or radiometric analysis.
Association of Safety Culture with Surgical Site Infection Outcomes.
Fan, Caleb J; Pawlik, Timothy M; Daniels, Tania; Vernon, Nora; Banks, Katie; Westby, Peggy; Wick, Elizabeth C; Sexton, J Bryan; Makary, Martin A
2016-02-01
Hospital workplace culture may have an impact on surgical outcomes; however, this association has not been established. We designed a study to evaluate the association between safety culture and surgical site infection (SSI). Using the Hospital Survey on Patient Safety Culture and National Healthcare Safety Network definitions, we measured 12 dimensions of safety culture and colon SSI rates, respectively, in the surgical units of Minnesota community hospitals. A Pearson's r correlation was calculated for each of 12 dimensions of surgical unit safety culture and SSI rate and then adjusted for surgical volume and American Society of Anesthesiologists (ASA) classification. Seven hospitals participated in the study, with a mean survey response rate of 43%. The SSI rates ranged from 0% to 30%, and surgical unit safety culture scores ranged from 16 to 92 on a scale of 0 to 100. Ten dimensions of surgical unit safety culture were associated with colon SSI rates: teamwork across units (r = -0.96; 95% CI [-0.76, -0.99]), organizational learning (r = -0.95; 95% CI [-0.71, -0.99]), feedback and communication about error (r = -0.92; 95% CI [-0.56, -0.99]), overall perceptions of safety (r = -0.90; 95% CI [-0.45, -0.99]), management support for patient safety (r = -0.90; 95% CI [-0.44, -0.98]), teamwork within units (r = -0.88; 95% CI [-0.38, -0.98]), communication openness (r = -0.85; 95% CI [-0.26, -0.98]), supervisor/manager expectations and actions promoting safety (r = -0.85; 95% CI [-0.25, -0.98]), non-punitive response to error (r = -0.78; 95% CI [-0.07, -0.97]), and frequency of events reported (r = -0.76; 95% CI [-0.01, -0.96]). After adjusting for surgical volume and ASA classification, 9 of 12 dimensions of surgical unit safety culture were significantly associated with lower colon SSI rates. These data suggest an important role for positive safety and teamwork culture and engaged hospital management in producing high-quality surgical outcomes. Copyright © 2016. Published by Elsevier Inc.
Medication errors: definitions and classification
Aronson, Jeffrey K
2009-01-01
To understand medication errors and to identify preventive strategies, we need to classify them and define the terms that describe them. The four main approaches to defining technical terms consider etymology, usage, previous definitions, and the Ramsey–Lewis method (based on an understanding of theory and practice). A medication error is ‘a failure in the treatment process that leads to, or has the potential to lead to, harm to the patient’. Prescribing faults, a subset of medication errors, should be distinguished from prescription errors. A prescribing fault is ‘a failure in the prescribing [decision-making] process that leads to, or has the potential to lead to, harm to the patient’. The converse of this, ‘balanced prescribing’ is ‘the use of a medicine that is appropriate to the patient's condition and, within the limits created by the uncertainty that attends therapeutic decisions, in a dosage regimen that optimizes the balance of benefit to harm’. This excludes all forms of prescribing faults, such as irrational, inappropriate, and ineffective prescribing, underprescribing and overprescribing. A prescription error is ‘a failure in the prescription writing process that results in a wrong instruction about one or more of the normal features of a prescription’. The ‘normal features’ include the identity of the recipient, the identity of the drug, the formulation, dose, route, timing, frequency, and duration of administration. Medication errors can be classified, invoking psychological theory, as knowledge-based mistakes, rule-based mistakes, action-based slips, and memory-based lapses. This classification informs preventive strategies. PMID:19594526
NASA Astrophysics Data System (ADS)
Ziemba, Alexander; El Serafy, Ghada
2016-04-01
Ecological modeling and water quality investigations are complex processes which can require a high level of parameterization and a multitude of varying data sets in order to properly execute the model in question. Since models are generally complex, their calibration and validation can benefit from the application of data and information fusion techniques. The data applied to ecological models come from a wide range of sources such as remote sensing, earth observation, and in-situ measurements, resulting in a high variability in the temporal and spatial resolution of the various data sets available to water quality investigators. It is proposed that effective fusion into a comprehensive singular set will provide a more complete and robust data resource with which models can be calibrated, validated, and driven. Each individual product contains a unique valuation of error resulting from the method of measurement and application of pre-processing techniques. The uncertainty and error are further compounded when the data being fused are of varying temporal and spatial resolution. In order to have a reliable fusion-based model and data set, the uncertainty of the results and the confidence interval of the data being reported must be effectively communicated to those who would utilize the data product or model outputs in a decision-making process [2]. Here we review an array of data fusion techniques applied to various remote sensing, earth observation, and in-situ data sets whose domains vary in spatial and temporal resolution. The data sets examined are combined so that the various classifications of data (complementary, redundant, and cooperative) are all assessed to determine each classification's impact on the propagation and compounding of error. In order to assess the error of the fused data products, a comparison is conducted with data sets containing a known confidence interval and quality rating. We conclude with a quantification of the performance of the data fusion techniques and a recommendation on the feasibility of applying the fused products in operational forecast systems and modeling scenarios. The error bands and confidence intervals derived can be used to clarify the error and confidence of water quality variables produced by prediction and forecasting models. References [1] F. Castanedo, "A Review of Data Fusion Techniques", The Scientific World Journal, vol. 2013, pp. 1-19, 2013. [2] T. Keenan, M. Carbone, M. Reichstein and A. Richardson, "The model-data fusion pitfall: assuming certainty in an uncertain world", Oecologia, vol. 167, no. 3, pp. 587-597, 2011.
Prevalence of refractive error in Europe: the European Eye Epidemiology (E(3)) Consortium.
Williams, Katie M; Verhoeven, Virginie J M; Cumberland, Phillippa; Bertelsen, Geir; Wolfram, Christian; Buitendijk, Gabriëlle H S; Hofman, Albert; van Duijn, Cornelia M; Vingerling, Johannes R; Kuijpers, Robert W A M; Höhn, René; Mirshahi, Alireza; Khawaja, Anthony P; Luben, Robert N; Erke, Maja Gran; von Hanno, Therese; Mahroo, Omar; Hogg, Ruth; Gieger, Christian; Cougnard-Grégoire, Audrey; Anastasopoulos, Eleftherios; Bron, Alain; Dartigues, Jean-François; Korobelnik, Jean-François; Creuzot-Garcher, Catherine; Topouzis, Fotis; Delcourt, Cécile; Rahi, Jugnoo; Meitinger, Thomas; Fletcher, Astrid; Foster, Paul J; Pfeiffer, Norbert; Klaver, Caroline C W; Hammond, Christopher J
2015-04-01
To estimate the prevalence of refractive error in adults across Europe. Refractive data (mean spherical equivalent) collected between 1990 and 2013 from fifteen population-based cohort and cross-sectional studies of the European Eye Epidemiology (E(3)) Consortium were combined in a random effects meta-analysis stratified by 5-year age intervals and gender. Participants were excluded if they were identified as having had cataract surgery, retinal detachment, refractive surgery or other factors that might influence refraction. Estimates of refractive error prevalence were obtained including the following classifications: myopia ≤-0.75 diopters (D), high myopia ≤-6D, hyperopia ≥1D and astigmatism ≥1D. Meta-analysis of refractive error was performed for 61,946 individuals from fifteen studies with median age ranging from 44 to 81 and minimal ethnic variation (98 % European ancestry). The age-standardised prevalences (using the 2010 European Standard Population, limited to those ≥25 and <90 years old) were: myopia 30.6 % [95 % confidence interval (CI) 30.4-30.9], high myopia 2.7 % (95 % CI 2.69-2.73), hyperopia 25.2 % (95 % CI 25.0-25.4) and astigmatism 23.9 % (95 % CI 23.7-24.1). Age-specific estimates revealed a high prevalence of myopia in younger participants [47.2 % (CI 41.8-52.5) in 25-29 years-olds]. Refractive error affects just over a half of European adults. The greatest burden of refractive error is due to myopia, with high prevalence rates in young adults. Using the 2010 European population estimates, we estimate there are 227.2 million people with myopia across Europe.
NASA Astrophysics Data System (ADS)
Szuflitowska, B.; Orlowski, P.
2017-08-01
The automated detection system consists of two key steps: extraction of features from EEG signals and classification to detect pathological activity. The EEG sequences were analyzed using the Short-Time Fourier Transform, and the classification was performed using Linear Discriminant Analysis. The accuracy of the technique was tested on three sets of EEG signals: epilepsy, healthy and Alzheimer's disease. A classification error below 10% was considered a success. Higher accuracy was obtained for new data of unknown classes than for the testing data. The methodology can be helpful in differentiating epileptic seizures from disturbances in the EEG signal in Alzheimer's disease.
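The abstract gives no implementation detail; a minimal sketch of the described pipeline (Short-Time Fourier Transform features followed by Linear Discriminant Analysis), assuming scipy and scikit-learn, might look like the following. The `signals` and `labels` arrays are placeholders, not the study's EEG data.

```python
import numpy as np
from scipy.signal import stft
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

# Hypothetical inputs: `signals` is (n_epochs, n_samples) EEG sampled at fs Hz,
# `labels` holds one of {epilepsy, healthy, ad} per epoch.
fs = 256
rng = np.random.default_rng(0)
signals = rng.standard_normal((90, fs * 4))            # placeholder data
labels = np.repeat(["epilepsy", "healthy", "ad"], 30)

def stft_features(x, fs):
    # Magnitude spectrogram averaged over time -> one feature per frequency bin.
    f, t, Z = stft(x, fs=fs, nperseg=fs)
    return np.abs(Z).mean(axis=1)

X = np.array([stft_features(x, fs) for x in signals])
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.3, random_state=0)

clf = LinearDiscriminantAnalysis().fit(X_tr, y_tr)
error = 1.0 - clf.score(X_te, y_te)
print(f"classification error: {error:.2%}")   # the abstract's success criterion is < 10%
```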
Xiao, Bo; Imel, Zac E.; Georgiou, Panayiotis G.; Atkins, David C.; Narayanan, Shrikanth S.
2015-01-01
The technology for evaluating patient-provider interactions in psychotherapy (observational coding) has not changed in 70 years. It is labor-intensive, error-prone, and expensive, limiting its use in evaluating psychotherapy in the real world. Engineering solutions from speech and language processing provide new methods for the automatic evaluation of provider ratings from session recordings. The primary data are 200 Motivational Interviewing (MI) sessions from a study on MI training methods with observer ratings of counselor empathy. Automatic Speech Recognition (ASR) was used to transcribe sessions, and the resulting words were used in a text-based predictive model of empathy. Two supporting datasets trained the speech processing tasks, including ASR (1200 transcripts from heterogeneous psychotherapy sessions and 153 transcripts and session recordings from 5 MI clinical trials). The accuracy of computationally-derived empathy ratings was evaluated against human ratings for each provider. Computationally-derived empathy scores and classifications (high vs. low) were highly accurate against human-based codes and classifications, with a correlation of 0.65 and F-score (a weighted average of sensitivity and specificity) of 0.86, respectively. Empathy prediction using human transcription as input (as opposed to ASR) resulted in a slight increase in prediction accuracies, suggesting that the fully automatic system with ASR is relatively robust. Using speech and language processing methods, it is possible to generate accurate predictions of provider performance in psychotherapy from audio recordings alone. This technology can support large-scale evaluation of psychotherapy for dissemination and process studies. PMID:26630392
A comparison of the weights-of-evidence method and probabilistic neural networks
Singer, Donald A.; Kouda, Ryoichi
1999-01-01
The need to integrate large quantities of digital geoscience information to classify locations as mineral deposits or nondeposits has been met by the weights-of-evidence method in many situations. Widespread selection of this method may be more the result of its ease of use and interpretation rather than comparisons with alternative methods. A comparison of the weights-of-evidence method to probabilistic neural networks is performed here with data from Chisel Lake-Anderson Lake, Manitoba, Canada. Each method is designed to estimate the probability of belonging to learned classes where the estimated probabilities are used to classify the unknowns. Using these data, significantly lower classification error rates were observed for the neural network, not only when test and training data were the same (0.02% versus 23%), but also when validation data, not used in any training, were used to test the efficiency of classification (0.7% versus 17%). Despite these data containing too few deposits, these tests of this set of data demonstrate the neural network's ability to make unbiased probability estimates and achieve lower error rates when measured by number of polygons or by the area of land misclassified. For both methods, independent validation tests are required to ensure that estimates are representative of real-world results. Results from the weights-of-evidence method demonstrate a strong bias where most errors are barren areas misclassified as deposits. The weights-of-evidence method is based on Bayes rule, which requires independent variables in order to make unbiased estimates. The chi-square test for independence indicates no significant correlations among the variables in the Chisel Lake-Anderson Lake data. However, the expected number of deposits test clearly demonstrates that these data violate the independence assumption. Other, independent simulations with three variables show that using variables with correlations of 1.0 can double the expected number of deposits, as can correlations of −1.0. Studies done in the 1970s on methods that use Bayes rule show that moderate correlations among attributes seriously affect estimates and even small correlations lead to increases in misclassifications. Adverse effects have been observed with small to moderate correlations when only six to eight variables were used. Consistent evidence of upward biased probability estimates from multivariate methods founded on Bayes rule must be of considerable concern to institutions and governmental agencies where unbiased estimates are required. In addition to increasing the misclassification rate, biased probability estimates make classification into deposit and nondeposit classes an arbitrary subjective decision. The probabilistic neural network has no problem dealing with correlated variables; its performance depends strongly on having a thoroughly representative training set. Probabilistic neural networks or logistic regression should receive serious consideration where unbiased estimates are required. The weights-of-evidence method would serve to estimate thresholds between anomalies and background and for exploratory data analysis.
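For readers unfamiliar with the weights-of-evidence calculation referred to above, the sketch below shows the positive and negative weights for a single binary evidence layer. The toy deposit/evidence arrays are hypothetical and only illustrate the formula, not the Chisel Lake data.

```python
import numpy as np

def weights_of_evidence(evidence, deposit):
    """Positive/negative weights and contrast for one binary evidence layer.

    evidence, deposit: boolean arrays over the same map units (e.g. pixels).
    W+ = ln[ P(evidence | deposit) / P(evidence | no deposit) ]; W- is the
    analogue for absence of evidence. The method assumes conditional
    independence between layers, which the study above shows is easily violated.
    """
    d, nd = deposit, ~deposit
    p_e_d = evidence[d].mean()
    p_e_nd = evidence[nd].mean()
    w_plus = np.log(p_e_d / p_e_nd)
    w_minus = np.log((1 - p_e_d) / (1 - p_e_nd))
    contrast = w_plus - w_minus
    return w_plus, w_minus, contrast

# Hypothetical toy map: 10,000 cells, 1% deposits, evidence correlated with deposits.
rng = np.random.default_rng(1)
deposit = rng.random(10_000) < 0.01
evidence = np.where(deposit, rng.random(10_000) < 0.7, rng.random(10_000) < 0.2)
print(weights_of_evidence(evidence.astype(bool), deposit))
```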
The Influence of Item Calibration Error on Variable-Length Computerized Adaptive Testing
ERIC Educational Resources Information Center
Patton, Jeffrey M.; Cheng, Ying; Yuan, Ke-Hai; Diao, Qi
2013-01-01
Variable-length computerized adaptive testing (VL-CAT) allows both items and test length to be "tailored" to examinees, thereby achieving the measurement goal (e.g., scoring precision or classification) with as few items as possible. Several popular test termination rules depend on the standard error of the ability estimate, which in turn depends…
Lexical Errors in Second Language Scientific Writing: Some Conceptual Implications
ERIC Educational Resources Information Center
Carrió Pastor, María Luisa; Mestre-Mestre, Eva María
2014-01-01
Nowadays, scientific writers are required to have not only a thorough knowledge of their subject field, but also a sound command of English as a lingua franca. In this paper, the lexical errors produced in scientific texts written in English by non-native researchers are identified to propose a classification of the categories they contain. This study…
Land use surveys by means of automatic interpretation of LANDSAT system data
NASA Technical Reports Server (NTRS)
Dejesusparada, N. (Principal Investigator); Lombardo, M. A.; Novo, E. M. L. D.; Niero, M.; Foresti, C.
1981-01-01
Analyses for seven land-use classes are presented. The classes are: urban area, industrial area, bare soil, cultivated area, pastureland, reforestation, and natural vegetation. The automatic classification of LANDSAT MSS data using a maximum likelihood algorithm shows a 39% average error of omission and a 3.45% error of commission for the seven classes.
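Omission and commission errors of this kind follow directly from a confusion matrix; the sketch below illustrates the computation with scikit-learn, using randomly generated reference and predicted labels as stand-ins for the LANDSAT classification results.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

classes = ["urban", "industrial", "bare soil", "cultivated",
           "pasture", "reforestation", "natural vegetation"]

# Hypothetical reference (ground truth) and classified labels.
rng = np.random.default_rng(2)
reference = rng.choice(classes, size=500)
predicted = np.where(rng.random(500) < 0.8, reference, rng.choice(classes, size=500))

cm = confusion_matrix(reference, predicted, labels=classes)
omission = 1 - np.diag(cm) / cm.sum(axis=1)    # reference pixels missed for each class
commission = 1 - np.diag(cm) / cm.sum(axis=0)  # pixels wrongly assigned to each class
for name, o, c in zip(classes, omission, commission):
    print(f"{name:20s} omission {o:.1%}  commission {c:.1%}")
```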
Kranz, R
2015-01-01
Objective: To establish the prevalence of red dot markers in a sample of wrist radiographs and to identify any anatomical and/or pathological characteristics that predict “incorrect” red dot classification. Methods: Accident and emergency (A&E) wrist cases from a digital imaging and communications in medicine/digital teaching library were examined for red dot prevalence and for the presence of several anatomical and pathological features. Binary logistic regression analyses were run to establish if any of these features were predictors of incorrect red dot classification. Results: 398 cases were analysed. Red dot was “incorrectly” classified in 8.5% of cases; 6.3% were “false negatives” (“FNs”) and 2.3% false positives (FPs) (to one decimal place). Old fractures [odds ratio (OR), 5.070 (1.256–20.471)] and reported degenerative change [OR, 9.870 (2.300–42.359)] were found to predict FPs. Frykman V [OR, 9.500 (1.954–46.179)], Frykman VI [OR, 6.333 (1.205–33.283)] and non-Frykman positive abnormalities [OR, 4.597 (1.264–16.711)] predict “FNs”. Old fractures and Frykman VI were predictive of error at the 90% confidence interval (CI); the rest at the 95% CI. Conclusion: The five predictors of incorrect red dot classification may inform the image interpretation training of radiographers and other professionals to reduce diagnostic error. Verification with larger samples would reinforce these findings. Advances in knowledge: All healthcare providers strive to eradicate diagnostic error. By examining specific anatomical and pathological predictors on radiographs for such error, as well as extrinsic factors that may affect reporting accuracy, image interpretation training can focus on these “problem” areas and influence which radiographic abnormality detection schemes are appropriate to implement in A&E departments. PMID:25496373
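The odds ratios and confidence intervals quoted above are standard outputs of binary logistic regression; a minimal sketch with statsmodels is shown below, using simulated binary predictors in place of the study's radiograph data.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical data: one row per radiograph, binary predictors and outcome
# (1 = red dot incorrectly classified).
rng = np.random.default_rng(3)
df = pd.DataFrame({
    "old_fracture": rng.integers(0, 2, 398),
    "degenerative_change": rng.integers(0, 2, 398),
    "incorrect_red_dot": rng.integers(0, 2, 398),
})

X = sm.add_constant(df[["old_fracture", "degenerative_change"]])
fit = sm.Logit(df["incorrect_red_dot"], X).fit(disp=0)

odds_ratios = np.exp(fit.params)           # e.g. the OR reported for old fractures
ci = np.exp(fit.conf_int(alpha=0.05))      # 95% confidence intervals on the OR scale
print(pd.concat([odds_ratios, ci], axis=1))
```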
ERIC Educational Resources Information Center
Waring, R.; Knight, R.
2013-01-01
Background: Children with speech sound disorders (SSD) form a heterogeneous group who differ in terms of the severity of their condition, underlying cause, speech errors, involvement of other aspects of the linguistic system and treatment response. To date there is no universal and agreed-upon classification system. Instead, a number of…
Ruuska, Salla; Hämäläinen, Wilhelmiina; Kajava, Sari; Mughal, Mikaela; Matilainen, Pekka; Mononen, Jaakko
2018-03-01
The aim of the present study was to empirically evaluate confusion matrices in device validation. We compared the confusion matrix method to linear regression and error indices in the validation of a device measuring feeding behaviour of dairy cattle. In addition, we studied how to extract additional information on classification errors with confusion probabilities. The data consisted of 12-h behaviour measurements from five dairy cows; feeding and other behaviour were detected simultaneously with a device and from video recordings. The resulting 216 000 pairs of classifications were used to construct confusion matrices and calculate performance measures. In addition, hourly durations of each behaviour were calculated and the accuracy of measurements was evaluated with linear regression and error indices. All three validation methods agreed when the behaviour was detected very accurately or inaccurately. Otherwise, in the intermediate cases, the confusion matrix method and error indices produced relatively concordant results, but the linear regression method often disagreed with them. Our study supports the use of confusion matrix analysis in validation since it is robust to any data distribution and type of relationship, it makes a stringent evaluation of validity, and it offers extra information on the type and sources of errors. Copyright © 2018 Elsevier B.V. All rights reserved.
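As an illustration of the confusion matrix method, the sketch below builds a two-class confusion matrix from paired device/video classifications and derives sensitivity and specificity; the paired label arrays are simulated, not the study's data.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical paired observations: behaviour from video (reference) and from
# the device, one pair per second of recording.
rng = np.random.default_rng(4)
video = rng.choice(["feeding", "other"], size=216_000, p=[0.4, 0.6])
device = np.where(rng.random(216_000) < 0.9, video,
                  rng.choice(["feeding", "other"], size=216_000))

tn, fp, fn, tp = confusion_matrix(video, device, labels=["other", "feeding"]).ravel()
sensitivity = tp / (tp + fn)   # feeding seconds the device detected
specificity = tn / (tn + fp)   # non-feeding seconds correctly rejected
print(f"sensitivity {sensitivity:.3f}, specificity {specificity:.3f}")
```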
Bayes-LQAS: classifying the prevalence of global acute malnutrition.
Olives, Casey; Pagano, Marcello
2010-06-09
Lot Quality Assurance Sampling (LQAS) applications in health have generally relied on frequentist interpretations for statistical validity. Yet health professionals often seek statements about the probability distribution of unknown parameters to answer questions of interest. The frequentist paradigm does not pretend to yield such information, although a Bayesian formulation might. This is the source of an error made in a recent paper published in this journal. Many applications lend themselves to a Bayesian treatment, and would benefit from such considerations in their design. We discuss Bayes-LQAS (B-LQAS), which allows for incorporation of prior information into the LQAS classification procedure, and thus shows how to correct the aforementioned error. Further, we pay special attention to the formulation of Bayes Operating Characteristic Curves and the use of prior information to improve survey designs. As a motivating example, we discuss the classification of Global Acute Malnutrition prevalence and draw parallels between the Bayes and classical classifications schemes. We also illustrate the impact of informative and non-informative priors on the survey design. Results indicate that using a Bayesian approach allows the incorporation of expert information and/or historical data and is thus potentially a valuable tool for making accurate and precise classifications.
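One way to illustrate the B-LQAS idea is a Beta-Binomial update of the prevalence followed by a posterior-probability classification rule. The sketch below illustrates that general approach, not the authors' procedure; the prior, threshold and sample numbers are hypothetical.

```python
from scipy.stats import beta

def bayes_lqas_classify(n, d, prior_a=1.0, prior_b=1.0, threshold=0.10, p_cut=0.5):
    """Classify GAM prevalence as above/below `threshold` from an LQAS sample.

    n: sample size, d: number of malnourished children found.
    With a Beta(prior_a, prior_b) prior, the posterior is
    Beta(prior_a + d, prior_b + n - d). Classify 'high' if the posterior
    probability that prevalence exceeds the threshold is at least p_cut.
    """
    posterior = beta(prior_a + d, prior_b + n - d)
    p_above = posterior.sf(threshold)   # P(prevalence > threshold | data)
    return ("high" if p_above >= p_cut else "low"), p_above

# Example: 96 children sampled, 13 cases of global acute malnutrition,
# weakly informative prior centred near 8% prevalence (all numbers illustrative).
print(bayes_lqas_classify(n=96, d=13, prior_a=2, prior_b=23))
```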
Galaxy Zoo 1: data release of morphological classifications for nearly 900 000 galaxies
DOE Office of Scientific and Technical Information (OSTI.GOV)
Linott, C.; Slosar, A.; Lintott, C.
Morphology is a powerful indicator of a galaxy's dynamical and merger history. It is strongly correlated with many physical parameters, including mass, star formation history and the distribution of mass. The Galaxy Zoo project collected simple morphological classifications of nearly 900,000 galaxies drawn from the Sloan Digital Sky Survey, contributed by hundreds of thousands of volunteers. This large number of classifications allows us to exclude classifier error, and measure the influence of subtle biases inherent in morphological classification. This paper presents the data collected by the project, alongside measures of classification accuracy and bias. The data are now publicly available and full catalogues can be downloaded in electronic format from http://data.galaxyzoo.org.
Spotting East African mammals in open savannah from space.
Yang, Zheng; Wang, Tiejun; Skidmore, Andrew K; de Leeuw, Jan; Said, Mohammed Y; Freer, Jim
2014-01-01
Knowledge of population dynamics is essential for managing and conserving wildlife. Traditional methods of counting wild animals such as aerial survey or ground counts not only disturb animals, but also can be labour intensive and costly. New, commercially available very high-resolution satellite images offer great potential for accurate estimates of animal abundance over large open areas. However, little research has been conducted in the area of satellite-aided wildlife census, although computer processing speeds and image analysis algorithms have vastly improved. This paper explores the possibility of detecting large animals in the open savannah of Maasai Mara National Reserve, Kenya from very high-resolution GeoEye-1 satellite images. A hybrid image classification method was employed for this specific purpose by incorporating the advantages of both pixel-based and object-based image classification approaches. This was performed in two steps: firstly, a pixel-based image classification method, i.e., artificial neural network was applied to classify potential targets with similar spectral reflectance at pixel level; and then an object-based image classification method was used to further differentiate animal targets from the surrounding landscapes through the applications of expert knowledge. As a result, the large animals in two pilot study areas were successfully detected with an average count error of 8.2%, omission error of 6.6% and commission error of 13.7%. The results of the study show for the first time that it is feasible to perform automated detection and counting of large wild animals in open savannahs from space, and therefore provide a complementary and alternative approach to the conventional wildlife survey techniques.
Roach, Jennifer K.; Griffith, Brad; Verbyla, David
2012-01-01
Programs to monitor lake area change are becoming increasingly important in high latitude regions, and their development often requires evaluating tradeoffs among different approaches in terms of accuracy of measurement, consistency across multiple users over long time periods, and efficiency. We compared three supervised methods for lake classification from Landsat imagery (density slicing, classification trees, and feature extraction). The accuracy of lake area and number estimates was evaluated relative to high-resolution aerial photography acquired within two days of satellite overpasses. The shortwave infrared band 5 was better at separating surface water from nonwater when used alone than when combined with other spectral bands. The simplest of the three methods, density slicing, performed best overall. The classification tree method resulted in the most omission errors (approx. 2x), feature extraction resulted in the most commission errors (approx. 4x), and density slicing had the least directional bias (approx. half of the lakes with overestimated area and half of the lakes with underestimated area). Feature extraction was the least consistent across training sets (i.e., large standard error among different training sets). Density slicing was the best of the three at classifying small lakes as evidenced by its lower optimal minimum lake size criterion of 5850 m² compared with the other methods (8550 m²). Contrary to conventional wisdom, the use of additional spectral bands and a more sophisticated method not only required additional processing effort but also had a cost in terms of the accuracy and consistency of lake classifications.
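Density slicing amounts to thresholding a single band; a minimal sketch over a hypothetical Landsat band-5 reflectance array is given below. The threshold value and pixel size are assumptions for illustration only.

```python
import numpy as np

def density_slice_water(band5, threshold):
    """Binary water mask: shortwave-infrared reflectance below the slice
    threshold is labelled water (water absorbs strongly in band 5)."""
    return band5 < threshold

def lake_area_m2(water_mask, pixel_size=30.0, min_area_m2=5850.0):
    # Count water pixels and drop "lakes" below the minimum-size criterion.
    # (A real workflow would label connected components before filtering.)
    area = water_mask.sum() * pixel_size ** 2
    return area if area >= min_area_m2 else 0.0

rng = np.random.default_rng(5)
band5 = rng.uniform(0.0, 0.4, size=(100, 100))   # hypothetical reflectance values
mask = density_slice_water(band5, threshold=0.05)
print(mask.sum(), "water pixels,", lake_area_m2(mask), "m^2")
```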
Survey methods for assessing land cover map accuracy
Nusser, S.M.; Klaas, E.E.
2003-01-01
The increasing availability of digital photographic materials has fueled efforts by agencies and organizations to generate land cover maps for states, regions, and the United States as a whole. Regardless of the information sources and classification methods used, land cover maps are subject to numerous sources of error. In order to understand the quality of the information contained in these maps, it is desirable to generate statistically valid estimates of accuracy rates describing misclassification errors. We explored a full sample survey framework for creating accuracy assessment study designs that balance statistical and operational considerations in relation to study objectives for a regional assessment of GAP land cover maps. We focused not only on appropriate sample designs and estimation approaches, but on aspects of the data collection process, such as gaining cooperation of land owners and using pixel clusters as an observation unit. The approach was tested in a pilot study to assess the accuracy of Iowa GAP land cover maps. A stratified two-stage cluster sampling design addressed sample size requirements for land covers and the need for geographic spread while minimizing operational effort. Recruitment methods used for private land owners yielded high response rates, minimizing a source of nonresponse error. Collecting data for a 9-pixel cluster centered on the sampled pixel was simple to implement, and provided better information on rarer vegetation classes as well as substantial gains in precision relative to observing data at a single pixel.
Olives, Casey; Pagano, Marcello; Deitchler, Megan; Hedt, Bethany L; Egge, Kari; Valadez, Joseph J
2009-01-01
Traditional lot quality assurance sampling (LQAS) methods require simple random sampling to guarantee valid results. However, cluster sampling has been proposed to reduce the number of random starting points. This study uses simulations to examine the classification error of two such designs, a 67×3 (67 clusters of three observations) and a 33×6 (33 clusters of six observations) sampling scheme, to assess the prevalence of global acute malnutrition (GAM). Further, we explore the use of a 67×3 sequential sampling scheme for LQAS classification of GAM prevalence. Results indicate that, for independent clusters with moderate intracluster correlation for the GAM outcome, the three sampling designs maintain approximate validity for LQAS analysis. Sequential sampling can substantially reduce the average sample size that is required for data collection. The presence of intercluster correlation can dramatically impact the classification error that is associated with LQAS analysis. PMID:20011037
Electroencephalography epilepsy classifications using hybrid cuckoo search and neural network
NASA Astrophysics Data System (ADS)
Pratiwi, A. B.; Damayanti, A.; Miswanto
2017-07-01
Epilepsy is a condition that affects the brain and causes repeated seizures. These seizures are episodes that can vary from brief, nearly undetectable lapses to long periods of vigorous shaking or contractions. Epilepsy can often be confirmed with electroencephalography (EEG). Neural networks have been used in biomedical signal analysis and have successfully classified biomedical signals such as the EEG. In this paper, a hybrid cuckoo search and neural network approach is used to recognize EEG signals for epilepsy classification. The weights of the multilayer perceptron are optimized by the cuckoo search algorithm based on the network's error. The aim of this method is to make the network reach a local or global optimum faster so that the classification process becomes more accurate. Based on a comparison with the traditional multilayer perceptron, the hybrid cuckoo search and multilayer perceptron provides better performance in terms of error convergence and accuracy. The proposed method gives an MSE of 0.001 and an accuracy of 90.0%.
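The abstract does not include the algorithm itself; the sketch below is a generic cuckoo search (Lévy flights plus abandonment of the worst nests) used to minimise the mean squared error of a small multilayer perceptron whose weights are packed into a flat vector. All data, dimensions and parameter settings are illustrative assumptions, not the authors' configuration.

```python
import numpy as np
from math import gamma, pi, sin

def levy_flight(rng, size, beta=1.5):
    # Mantegna's algorithm for Levy-distributed step lengths.
    sigma = (gamma(1 + beta) * sin(pi * beta / 2) /
             (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma, size)
    v = rng.normal(0.0, 1.0, size)
    return u / np.abs(v) ** (1 / beta)

def cuckoo_search(objective, dim, n_nests=15, iters=200, pa=0.25, seed=0):
    rng = np.random.default_rng(seed)
    nests = rng.uniform(-1, 1, (n_nests, dim))
    fitness = np.array([objective(x) for x in nests])
    for _ in range(iters):
        best = nests[fitness.argmin()].copy()
        for i in range(n_nests):
            # New candidate via a Levy flight biased towards the best nest.
            candidate = nests[i] + 0.01 * levy_flight(rng, dim) * (nests[i] - best)
            f = objective(candidate)
            j = rng.integers(n_nests)          # compare against a random nest
            if f < fitness[j]:
                nests[j], fitness[j] = candidate, f
        # Abandon a fraction pa of the worst nests and rebuild them randomly.
        n_abandon = max(1, int(pa * n_nests))
        worst = fitness.argsort()[-n_abandon:]
        nests[worst] = rng.uniform(-1, 1, (n_abandon, dim))
        fitness[worst] = [objective(x) for x in nests[worst]]
    return nests[fitness.argmin()], fitness.min()

# Hypothetical objective: MSE of a one-hidden-layer perceptron (4 inputs, 5 hidden units)
# on toy data, with all weights and biases packed into a 31-element vector.
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

def mlp_mse(w, hidden=5):
    W1, b1 = w[:20].reshape(4, hidden), w[20:25]
    W2, b2 = w[25:30], w[30]
    h = np.tanh(X @ W1 + b1)
    out = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
    return float(np.mean((out - y) ** 2))

best_w, best_err = cuckoo_search(mlp_mse, dim=31)
print("best MSE:", best_err)
```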
Effects of audio compression in automatic detection of voice pathologies.
Sáenz-Lechón, Nicolás; Osma-Ruiz, Víctor; Godino-Llorente, Juan I; Blanco-Velasco, Manuel; Cruz-Roldán, Fernando; Arias-Londoño, Julián D
2008-12-01
This paper investigates the performance of an automatic system for voice pathology detection when the voice samples have been compressed in MP3 format at different bit rates (160, 96, 64, 48, 24, and 8 kb/s). The detectors employ cepstral and noise measurements, along with their derivatives, to characterize the voice signals. The classification is performed using Gaussian mixture models and support vector machines. The results between the different proposed detectors are compared by means of detector error tradeoff (DET) and receiver operating characteristic (ROC) curves, concluding that there are no significant differences in the performance of the detector when the bit rates of the compressed data are above 64 kb/s. This has useful applications in telemedicine, reducing the storage space of voice recordings or transmitting them over narrow-band communications channels.
Neural network modelling of the influence of channelopathies on reflex visual attention.
Gravier, Alexandre; Quek, Chai; Duch, Włodzisław; Wahab, Abdul; Gravier-Rymaszewska, Joanna
2016-02-01
This paper introduces a model of Emergent Visual Attention in the presence of calcium channelopathy (EVAC). By modelling channelopathy, EVAC constitutes an effort towards identifying the possible causes of autism. The network structure embodies the dual pathways model of cortical processing of visual input, with reflex attention as an emergent property of neural interactions. EVAC extends existing work by introducing attention shift in a larger-scale network and applying a phenomenological model of channelopathy. In the presence of a distractor, the channelopathic network's rate of failure to shift attention is lower than the control network's, but overall, the control network exhibits a lower classification error rate. The simulation results also show differences in task-relative reaction times between control and channelopathic networks. The attention shift timings inferred from the model are consistent with studies of attention shift in autistic children.
Assiri, Ghadah Asaad; Shebl, Nada Atef; Mahmoud, Mansour Adam; Aloudah, Nouf; Grant, Elizabeth; Aljadhey, Hisham; Sheikh, Aziz
2018-05-05
To investigate the epidemiology of medication errors and error-related adverse events in adults in primary care, ambulatory care and patients' homes. Systematic review. Six international databases were searched for publications between 1 January 2006 and 31 December 2015. Two researchers independently extracted data from eligible studies and assessed the quality of these using established instruments. Synthesis of data was informed by an appreciation of the medicines' management process and the conceptual framework from the International Classification for Patient Safety. 60 studies met the inclusion criteria, of which 53 studies focused on medication errors, 3 on error-related adverse events and 4 on risk factors only. The prevalence of prescribing errors was reported in 46 studies: prevalence estimates ranged widely from 2% to 94%. Inappropriate prescribing was the most common type of error reported. Only one study reported the prevalence of monitoring errors, finding that incomplete therapeutic/safety laboratory-test monitoring occurred in 73% of patients. The incidence of preventable adverse drug events (ADEs) was estimated as 15/1000 person-years, the prevalence of drug-drug interaction-related adverse drug reactions as 7% and the prevalence of preventable ADE as 0.4%. A number of patient, healthcare professional and medication-related risk factors were identified, including the number of medications used by the patient, increased patient age, the number of comorbidities, use of anticoagulants, cases where more than one physician was involved in patients' care and care being provided by family physicians/general practitioners. A very wide variation in the medication error and error-related adverse events rates is reported in the studies, this reflecting heterogeneity in the populations studied, study designs employed and outcomes evaluated. This review has identified important limitations and discrepancies in the methodologies used and gaps in the literature on the epidemiology and outcomes of medication errors in community settings. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
NASA Astrophysics Data System (ADS)
Dementev, A. O.; Dmitriev, E. V.; Kozoderov, V. V.; Egorov, V. D.
2017-10-01
Hyperspectral imaging is an up-to-date, promising technology widely applied for accurate thematic mapping. The presence of a large number of narrow survey channels allows us to use subtle differences in the spectral characteristics of objects and to make a more detailed classification than in the case of using standard multispectral data. The difficulties encountered in the processing of hyperspectral images are usually associated with the redundancy of spectral information, which leads to the curse of dimensionality. Methods currently used for recognizing objects in multispectral and hyperspectral images are usually based on standard supervised base classification algorithms of varying complexity. The accuracy of these algorithms can differ significantly depending on the classification task considered. In this paper we study the performance of ensemble classification methods for the problem of classification of forest vegetation. Error correcting output codes and boosting are tested on artificial data and real hyperspectral images. It is demonstrated that boosting gives a more significant improvement when used with simple base classifiers. The accuracy in this case is comparable to that of the error correcting output code (ECOC) classifier with a Gaussian kernel SVM base algorithm. However, the necessity of boosting ECOC with a Gaussian kernel SVM is questionable. It is demonstrated that the selected ensemble classifiers allow us to recognize forest species with accuracy high enough to be compared with ground-based forest inventory data.
Error, Power, and Blind Sentinels: The Statistics of Seagrass Monitoring
Schultz, Stewart T.; Kruschel, Claudia; Bakran-Petricioli, Tatjana; Petricioli, Donat
2015-01-01
We derive statistical properties of standard methods for monitoring of habitat cover worldwide, and criticize them in the context of mandated seagrass monitoring programs, as exemplified by Posidonia oceanica in the Mediterranean Sea. We report the novel result that cartographic methods with non-trivial classification errors are generally incapable of reliably detecting habitat cover losses less than about 30 to 50%, and the field labor required to increase their precision can be orders of magnitude higher than that required to estimate habitat loss directly in a field campaign. We derive a universal utility threshold of classification error in habitat maps that represents the minimum habitat map accuracy above which direct methods are superior. Widespread government reliance on blind-sentinel methods for monitoring seafloor can obscure the gradual and currently ongoing losses of benthic resources until the time has long passed for meaningful management intervention. We find two classes of methods with very high statistical power for detecting small habitat cover losses: 1) fixed-plot direct methods, which are over 100 times as efficient as direct random-plot methods in a variable habitat mosaic; and 2) remote methods with very low classification error such as geospatial underwater videography, which is an emerging, low-cost, non-destructive method for documenting small changes at millimeter visual resolution. General adoption of these methods and their further development will require a fundamental cultural change in conservation and management bodies towards the recognition and promotion of requirements of minimal statistical power and precision in the development of international goals for monitoring these valuable resources and the ecological services they provide. PMID:26367863
Chanani, Sheila; Wacksman, Jeremy; Deshmukh, Devika; Pantvaidya, Shanti; Fernandez, Armida; Jayaraman, Anuja
2016-12-01
Acute malnutrition is linked to child mortality and morbidity. Community-Based Management of Acute Malnutrition (CMAM) programs can be instrumental in large-scale detection and treatment of undernutrition. The World Health Organization (WHO) 2006 weight-for-height/length tables are diagnostic tools available to screen for acute malnutrition. Frontline workers (FWs) in a CMAM program in Dharavi, Mumbai, were using CommCare, a mobile application, for monitoring and case management of children in combination with the paper-based WHO simplified tables. A strategy was undertaken to digitize the WHO tables into the CommCare application. To measure differences in diagnostic accuracy in community-based screening for acute malnutrition, by FWs, using a mobile-based solution. Twenty-seven FWs initially used the paper-based tables and then switched to an updated mobile application that included a nutritional grade calculator. Human error rates specifically associated with grade classification were calculated by comparison of the grade assigned by the FW to the grade each child should have received based on the same WHO tables. Cohen kappa coefficient, sensitivity and specificity rates were also calculated and compared for paper-based grade assignments and calculator grade assignments. Comparing FWs (N = 14) who completed at least 40 screenings without and 40 with the calculator, the error rates were 5.5% and 0.7%, respectively (p < .0001). Interrater reliability (κ) increased to an almost perfect level (>.90), from .79 to .97, after switching to the mobile calculator. Sensitivity and specificity also improved significantly. The mobile calculator significantly reduces an important component of human error in using the WHO tables to assess acute malnutrition at the community level. © The Author(s) 2016.
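The mobile calculator essentially replaces a manual table lookup with a computed classification. The sketch below is schematic only: the cut-offs follow the usual z-score definitions (SAM below -3 SD, MAM from -3 to below -2 SD), but the reference median/SD values are placeholders rather than the actual WHO 2006 weight-for-height tables.

```python
# Illustrative only: median/SD values below are placeholders, not WHO reference data.
WHZ_REFERENCE = {
    # height (cm) -> (median weight kg, standard deviation kg), one sex, one table
    65.0: (7.4, 0.7),
    65.5: (7.6, 0.7),
    66.0: (7.7, 0.7),
}

def nutrition_grade(weight_kg, height_cm):
    # Snap to the nearest tabulated height, compute a z-score, map to a grade.
    height = min(WHZ_REFERENCE, key=lambda h: abs(h - height_cm))
    median, sd = WHZ_REFERENCE[height]
    z = (weight_kg - median) / sd
    if z < -3:
        return "SAM"        # severe acute malnutrition
    if z < -2:
        return "MAM"        # moderate acute malnutrition
    return "normal"

print(nutrition_grade(6.1, 65.8))
```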
Mehrad, Mitra; Chernock, Rebecca D; El-Mofty, Samir K; Lewis, James S
2015-12-01
Medical error is a significant problem in the United States, and pathologic diagnoses are a significant source of errors. Prior studies have shown that second-opinion pathology review results in clinically major diagnosis changes in approximately 0.6% to 5.8% of patients. The few studies specifically on head and neck pathology have suggested rates of changed diagnoses that are even higher. Objectives: To evaluate the diagnostic discrepancy rates in patients referred to our institution, where all such cases are reviewed by a head and neck subspecialty service, and to identify specific areas with greater susceptibility to errors. Five hundred consecutive, scanned head and neck pathology reports from patients referred to our institution were compared for discrepancies between the outside and in-house diagnoses. Major discrepancies were defined as those resulting in a significant change in patient clinical management and/or prognosis. Major discrepancies occurred in 20 cases (4% overall). Informative follow-up material was available on 11 of the 20 patients (55.0%), among whom the second opinion was supported in 11 of 11 cases (100%). Dysplasia versus invasive squamous cell carcinoma was the most common (7 of 20; 35%) area of discrepancy, and by anatomic subsite, the sinonasal tract (4 of 21; 19.0%) had the highest rate of discrepant diagnoses. Of the major discrepant diagnoses, 12 (12 of 20; 60%) involved a change from benign to malignant, one a change from malignant to benign (1 of 20; 5%), and 6 involved tumor classification (6 of 20; 30%). Head and neck pathology is a relatively high-risk area, prone to erroneous diagnoses in a small fraction of patients. This study supports the importance of second-opinion review by subspecialized pathologists for the best care of patients.
Detecting paroxysmal coughing from pertussis cases using voice recognition technology.
Parker, Danny; Picone, Joseph; Harati, Amir; Lu, Shuang; Jenkyns, Marion H; Polgreen, Philip M
2013-01-01
Pertussis is highly contagious; thus, prompt identification of cases is essential to control outbreaks. Clinicians experienced with the disease can easily identify classic cases, where patients have bursts of rapid coughing followed by gasps, and a characteristic whooping sound. However, many clinicians have never seen a case, and thus may miss initial cases during an outbreak. The purpose of this project was to use voice-recognition software to distinguish pertussis coughs from croup and other coughs. We collected a series of recordings representing pertussis, croup and miscellaneous coughing by children. We manually categorized coughs as either pertussis or non-pertussis, and extracted features for each category. We used Mel-frequency cepstral coefficients (MFCC), a sampling rate of 16 kHz, a frame duration of 25 ms, and a frame rate of 10 ms. The coughs were filtered. Each cough was divided into 3 sections of proportion 3-4-3. The average of the 13 MFCCs for each section was computed and made into a 39-element feature vector used for the classification. We used the following machine learning algorithms: Neural Networks, K-Nearest Neighbor (KNN), and a 200-tree Random Forest (RF). Data were reserved for cross-validation of the KNN and RF. The Neural Network was trained 100 times, and the averaged results are presented. After categorization, we had 16 examples of non-pertussis coughs and 31 examples of pertussis coughs. Over 90% of all pertussis coughs were properly classified as pertussis. The error rates were: Type I errors of 7%, 12%, and 25% and Type II errors of 8%, 0%, and 0%, using the Neural Network, Random Forest, and KNN, respectively. Our results suggest that we can build a robust classifier to assist clinicians and the public to help identify pertussis cases in children presenting with typical symptoms.
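The 39-element feature vector described above (13 MFCCs averaged over three sections in 3-4-3 proportion) can be sketched with librosa as follows; the audio arrays and the classifier shown in the trailing comment are assumptions for illustration, not the authors' implementation.

```python
import numpy as np
import librosa

def cough_features(y, sr=16_000):
    """13 MFCCs averaged over three sections of the cough in 3:4:3 proportion,
    concatenated into a 39-element feature vector (as described in the abstract)."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                                n_fft=int(0.025 * sr),       # 25 ms frame duration
                                hop_length=int(0.010 * sr))  # 10 ms frame rate
    n = mfcc.shape[1]
    cuts = [0, int(0.3 * n), int(0.7 * n), n]                # 3-4-3 split
    return np.concatenate([mfcc[:, a:b].mean(axis=1) for a, b in zip(cuts, cuts[1:])])

# Hypothetical usage with a random-forest classifier:
# X = np.array([cough_features(y) for y in cough_waveforms])
# clf = sklearn.ensemble.RandomForestClassifier(n_estimators=200).fit(X, labels)
```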
McElroy, Lisa M; Daud, Amna; Lapin, Brittany; Ross, Olivia; Woods, Donna M; Skaro, Anton I; Holl, Jane L; Ladner, Daniela P
2014-11-01
Rates of medical errors and adverse events remain high for patients who undergo kidney transplantation; they are particularly vulnerable because of the complexity of their disease and the kidney transplantation procedure. Although institutional incident-reporting systems are used in hospitals around the country, they often fail to capture a substantial proportion of medical errors. The goal of this study was to assess the ability of a proactive, web-based clinician safety debriefing to augment the information about medical errors and adverse events obtained via traditional incident reporting systems. Debriefings were sent to all individuals listed on operating room personnel reports for kidney transplantation surgeries between April 2010 and April 2011, and incident reports were collected for the same time period. The World Health Organization International Classification for Patient Safety was used to classify all issues reported. A total of 270 debriefings reported 334 patient safety issues (179 safety incidents, 155 contributing factors), and 57 incident reports reported 92 patient safety issues (56 safety incidents, 36 contributing factors). Compared with incident reports, more attending physicians completed the debriefings (32.0% vs 3.5%). The use of a proactive, web-based debriefing to augment an incident reporting system in assessing safety risks in kidney transplantation demonstrated increased information, more perspectives of a single safety issue, and increased breadth of participants. Copyright © 2014 Elsevier Inc. All rights reserved.
One-Class Classification-Based Real-Time Activity Error Detection in Smart Homes.
Das, Barnan; Cook, Diane J; Krishnan, Narayanan C; Schmitter-Edgecombe, Maureen
2016-08-01
Caring for individuals with dementia is frequently associated with extreme physical and emotional stress, which often leads to depression. Smart home technology and advances in machine learning techniques can provide innovative solutions to reduce caregiver burden. One key service that caregivers provide is prompting individuals with memory limitations to initiate and complete daily activities. We hypothesize that sensor technologies combined with machine learning techniques can automate the process of providing reminder-based interventions. The first step towards automated interventions is to detect when an individual faces difficulty with activities. We propose machine learning approaches based on one-class classification that learn normal activity patterns. When we apply these classifiers to activity patterns that were not seen before, the classifiers are able to detect activity errors, which represent potential prompt situations. We validate our approaches on smart home sensor data obtained from older adult participants, some of whom faced difficulties performing routine activities and thus committed errors.
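A minimal sketch of the one-class idea, using scikit-learn's OneClassSVM as a stand-in for the authors' classifiers: train on feature vectors from correctly performed activities only, then flag departures in new instances as potential activity errors. The feature vectors below are simulated.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Hypothetical features: one vector per activity instance (e.g. durations,
# sensor-event counts), collected while the activity was performed correctly.
rng = np.random.default_rng(6)
normal_activities = rng.normal(0, 1, size=(200, 10))

detector = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(normal_activities)

# New, unseen activity instances: +1 = looks normal, -1 = potential activity
# error, i.e. a candidate moment for an automated prompt.
new_instances = np.vstack([rng.normal(0, 1, (5, 10)),     # normal-looking
                           rng.normal(4, 1, (5, 10))])    # anomalous
print(detector.predict(new_instances))
```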
NASA Astrophysics Data System (ADS)
Zhu, Likai; Radeloff, Volker C.; Ives, Anthony R.
2017-06-01
Mapping crop types is of great importance for assessing agricultural production, land-use patterns, and the environmental effects of agriculture. Indeed, both the radiometric and spatial resolution of Landsat's sensors are optimized for cropland monitoring. However, accurate mapping of crop types requires frequent cloud-free images during the growing season, which are often not available, and this raises the question of whether Landsat data can be combined with data from other satellites. Here, our goal is to evaluate to what degree fusing Landsat with MODIS Nadir Bidirectional Reflectance Distribution Function (BRDF)-Adjusted Reflectance (NBAR) data can improve crop-type classification. Choosing either one or two images from all cloud-free Landsat observations available for the Arlington Agricultural Research Station area in Wisconsin from 2010 to 2014, we generated 87 combinations of images, and used each combination as input into the Spatial and Temporal Adaptive Reflectance Fusion Model (STARFM) algorithm to predict Landsat-like images at the nominal dates of each 8-day MODIS NBAR product. Both the original Landsat and STARFM-predicted images were then classified with a support vector machine (SVM), and we compared the classification errors of three scenarios: 1) classifying the one or two original Landsat images of each combination only, 2) classifying the one or two original Landsat images plus all STARFM-predicted images, and 3) classifying the one or two original Landsat images together with STARFM-predicted images for key dates. Our results indicated that using two Landsat images as the input of STARFM did not significantly improve the STARFM predictions compared to using only one, and predictions using Landsat images between July and August as input were most accurate. Including all STARFM-predicted images together with the Landsat images significantly increased average classification error by 4 percentage points (from 21% to 25%) compared to using only Landsat images. However, incorporating only STARFM-predicted images for key dates decreased average classification error by 2 percentage points (from 21% to 19%) compared to using only Landsat images. In particular, if only a single Landsat image was available, adding STARFM predictions for key dates significantly decreased the average classification error by 4 percentage points from 30% to 26% (p < 0.05). We conclude that adding STARFM-predicted images can be effective for improving crop-type classification when only limited Landsat observations are available, but carefully selecting images from a full set of STARFM predictions is crucial. We developed an approach to identify the optimal subsets of all STARFM predictions, which gives an alternative method of feature selection for future research.
Foundation and methodologies in computer-aided diagnosis systems for breast cancer detection.
Jalalian, Afsaneh; Mashohor, Syamsiah; Mahmud, Rozi; Karasfi, Babak; Saripan, M Iqbal B; Ramli, Abdul Rahman B
2017-01-01
Breast cancer is the most prevalent cancer that affects women all over the world. Early detection and treatment of breast cancer could reduce the mortality rate. Some issues, such as technical reasons related to imaging quality and human error, increase misdiagnosis of breast cancer by radiologists. Computer-aided detection systems (CADs) have been developed to overcome these restrictions and have been studied in many imaging modalities for breast cancer detection in recent years. The CAD systems improve radiologists' performance in finding and discriminating between the normal and abnormal tissues. These procedures are performed only as a double reader, but the absolute decisions are still made by the radiologist. In this study, the recent CAD systems for breast cancer detection on different modalities such as mammography, ultrasound, MRI, and biopsy histopathological images are introduced. The foundation of CAD systems generally consists of four stages: Pre-processing, Segmentation, Feature extraction, and Classification. The approaches applied to design the different stages of a CAD system are summarised. Advantages and disadvantages of different segmentation, feature extraction and classification techniques are listed. In addition, the impact of imbalanced datasets on classification outcomes and appropriate methods to solve these issues are discussed. Finally, performance evaluation metrics for various stages of breast cancer detection CAD systems are reviewed.
Using phase for radar scatterer classification
NASA Astrophysics Data System (ADS)
Moore, Linda J.; Rigling, Brian D.; Penno, Robert P.; Zelnio, Edmund G.
2017-04-01
Traditional synthetic aperture radar (SAR) systems tend to discard phase information of formed complex radar imagery prior to automatic target recognition (ATR). This practice has historically been driven by available hardware storage, processing capabilities, and data link capacity. Recent advances in high performance computing (HPC) have enabled extremely dense storage and processing solutions. Therefore, previous motives for discarding radar phase information in ATR applications have been mitigated. First, we characterize the value of phase in one-dimensional (1-D) radar range profiles with respect to the ability to correctly estimate target features, which are currently employed in ATR algorithms for target discrimination. These features correspond to physical characteristics of targets through radio frequency (RF) scattering phenomenology. Physics-based electromagnetic scattering models developed from the geometrical theory of diffraction are utilized for the information analysis presented here. Information is quantified by the error of target parameter estimates from noisy radar signals when phase is either retained or discarded. Operating conditions (OCs) of signal-to-noise ratio (SNR) and bandwidth are considered. Second, we investigate the value of phase in 1-D radar returns with respect to the ability to correctly classify canonical targets. Classification performance is evaluated via logistic regression for three targets (sphere, plate, tophat). Phase information is demonstrated to improve radar target classification rates, particularly at low SNRs and low bandwidths.
Exploring diversity in ensemble classification: Applications in large area land cover mapping
NASA Astrophysics Data System (ADS)
Mellor, Andrew; Boukir, Samia
2017-07-01
Ensemble classifiers, such as random forests, are now commonly applied in the field of remote sensing, and have been shown to perform better than single classifier systems, resulting in reduced generalisation error. Diversity across the members of ensemble classifiers is known to have a strong influence on classification performance - whereby classifier errors are uncorrelated and more uniformly distributed across ensemble members. The relationship between ensemble diversity and classification performance has not yet been fully explored in the fields of information science and machine learning and has never been examined in the field of remote sensing. This study is a novel exploration of ensemble diversity and its link to classification performance, applied to a multi-class canopy cover classification problem using random forests and multisource remote sensing and ancillary GIS data, across seven million hectares of diverse dry-sclerophyll dominated public forests in Victoria Australia. A particular emphasis is placed on analysing the relationship between ensemble diversity and ensemble margin - two key concepts in ensemble learning. The main novelty of our work is on boosting diversity by emphasizing the contribution of lower margin instances used in the learning process. Exploring the influence of tree pruning on diversity is also a new empirical analysis that contributes to a better understanding of ensemble performance. Results reveal insights into the trade-off between ensemble classification accuracy and diversity, and through the ensemble margin, demonstrate how inducing diversity by targeting lower margin training samples is a means of achieving better classifier performance for more difficult or rarer classes and reducing information redundancy in classification problems. Our findings inform strategies for collecting training data and designing and parameterising ensemble classifiers, such as random forests. This is particularly important in large area remote sensing applications, for which training data is costly and resource intensive to collect.
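The ensemble margin discussed above can be computed from the distribution of votes across ensemble members. The sketch below uses a scikit-learn random forest on synthetic data as a stand-in, with the unsupervised margin defined as the normalised difference between the two most voted classes; data, class count and thresholds are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=12, n_classes=3,
                           n_informative=6, random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Votes of each tree for each instance: shape (n_samples, n_trees).
votes = np.stack([tree.predict(X) for tree in forest.estimators_], axis=1)

def unsupervised_margin(row):
    # (votes for most popular class - votes for second most popular) / n_trees.
    counts = np.sort(np.bincount(row.astype(int), minlength=3))[::-1]
    return (counts[0] - counts[1]) / len(row)

margins = np.apply_along_axis(unsupervised_margin, 1, votes)
# Low-margin instances are the "more difficult" samples that the study suggests
# emphasising when inducing diversity in the ensemble.
print("mean margin:", margins.mean(), "low-margin fraction:", (margins < 0.2).mean())
```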
Analysis of swallowing sounds using hidden Markov models.
Aboofazeli, Mohammad; Moussavi, Zahra
2008-04-01
In recent years, acoustical analysis of the swallowing mechanism has received considerable attention due to its diagnostic potential. This paper presents a hidden Markov model (HMM) based method for swallowing sound segmentation and classification. Swallowing sound signals of 15 healthy and 11 dysphagic subjects were studied. The signals were divided into sequences of 25 ms segments, each of which was represented by seven features. The sequences of features were modeled by HMMs. Trained HMMs were used for segmentation of the swallowing sounds into three distinct phases, i.e., initial quiet period, initial discrete sounds (IDS) and bolus transit sounds (BTS). Among the seven features, accuracy of segmentation by the HMM based on the multi-scale product of wavelet coefficients was higher than that of the other HMMs, and the linear prediction coefficient (LPC)-based HMM showed the weakest performance. In addition, HMMs were used for classification of the swallowing sounds of healthy subjects and dysphagic patients. Classification accuracy of different HMM configurations was investigated. When we increased the number of states of the HMMs from 4 to 8, the classification error gradually decreased. In most cases, classification error for N=9 was higher than that for N=8. Among the seven features used, root mean square (RMS) and waveform fractal dimension (WFD) showed the best performance in the HMM-based classification of swallowing sounds. When the sequences of the features of the IDS segment were modeled separately, the accuracy reached up to 85.5%. As a second-stage classification, a screening algorithm was used which correctly classified all the subjects but one healthy subject when RMS was used as the characteristic feature of the swallowing sounds and the number of states was set to N=8.
Spelling in Adolescents with Dyslexia: Errors and Modes of Assessment
ERIC Educational Resources Information Center
Tops, Wim; Callens, Maaike; Bijn, Evi; Brysbaert, Marc
2014-01-01
In this study we focused on the spelling of high-functioning students with dyslexia. We made a detailed classification of the errors in a word and sentence dictation task made by 100 students with dyslexia and 100 matched control students. All participants were in the first year of their bachelor's studies and had Dutch as their mother tongue. Three…
Estimation of a cover-type change matrix from error-prone data
Steen Magnussen
2009-01-01
Coregistration and classification errors seriously compromise per-pixel estimates of land cover change. A more robust estimation of change is proposed in which adjacent pixels are grouped into 3x3 clusters and treated as a unit of observation. A complete change matrix is recovered in a two-step process. The diagonal elements of a change matrix are recovered from...
ERIC Educational Resources Information Center
Chen, Chau-Kuang
2010-01-01
Artificial Neural Network (ANN) and Support Vector Machine (SVM) approaches have been on the cutting edge of science and technology for pattern recognition and data classification. In the ANN model, classification accuracy can be achieved by using the feed-forward of inputs, back-propagation of errors, and the adjustment of connection weights. In…
Gerald E. Rehfeldt; Nicholas L. Crookston; Cuauhtemoc Saenz-Romero; Elizabeth M. Campbell
2012-01-01
Data points intensively sampling 46 North American biomes were used to predict the geographic distribution of biomes from climate variables using the Random Forests classification tree. Techniques were incorporated to accommodate a large number of classes and to predict the future occurrence of climates beyond the contemporary climatic range of the biomes. Errors of...
Feasibility of Equivalent Dipole Models for Electroencephalogram-Based Brain Computer Interfaces.
Schimpf, Paul H
2017-09-15
This article examines the localization errors of equivalent dipolar sources inverted from the surface electroencephalogram in order to determine the feasibility of using their location as classification parameters for non-invasive brain computer interfaces. Inverse localization errors are examined for two head models: a model represented by four concentric spheres and a realistic model based on medical imagery. It is shown that the spherical model results in localization ambiguity such that a number of dipolar sources, with different azimuths and varying orientations, provide a near match to the electroencephalogram of the best equivalent source. No such ambiguity exists for the elevation of inverted sources, indicating that for spherical head models, only the elevation of inverted sources (and not the azimuth) can be expected to provide meaningful classification parameters for brain-computer interfaces. In a realistic head model, all three parameters of the inverted source location are found to be reliable, providing a more robust set of parameters. In both cases, the residual error hypersurfaces demonstrate local minima, indicating that a search for the best-matching sources should be global. Source localization error vs. signal-to-noise ratio is also demonstrated for both head models.
Identification of terrain cover using the optimum polarimetric classifier
NASA Technical Reports Server (NTRS)
Kong, J. A.; Swartz, A. A.; Yueh, H. A.; Novak, L. M.; Shin, R. T.
1988-01-01
A systematic approach for the identification of terrain media such as vegetation canopy, forest, and snow-covered fields is developed using the optimum polarimetric classifier. The covariance matrices for various terrain cover are computed from theoretical models of random medium by evaluating the scattering matrix elements. The optimal classification scheme makes use of a quadratic distance measure and is applied to classify a vegetation canopy consisting of both trees and grass. Experimentally measured data are used to validate the classification scheme. Analytical and Monte Carlo simulated classification errors using the fully polarimetric feature vector are compared with classification based on single features which include the phase difference between the VV and HH polarization returns. It is shown that the full polarimetric results are optimal and provide better classification performance than single feature measurements.
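A minimal sketch of a quadratic-distance polarimetric classifier: each pixel's complex scattering vector is assigned to the class whose covariance matrix yields the smallest Gaussian log-likelihood distance. The class covariances below are randomly generated placeholders rather than values derived from the random-medium models.

```python
import numpy as np

def quadratic_distance(x, cov):
    """Gaussian log-likelihood distance for a complex polarimetric feature
    vector x = (S_hh, S_hv, S_vv) under a zero-mean class covariance."""
    inv = np.linalg.inv(cov)
    return np.real(np.conj(x) @ inv @ x) + np.log(np.abs(np.linalg.det(cov)))

def classify(x, class_covs):
    names = list(class_covs)
    distances = [quadratic_distance(x, class_covs[n]) for n in names]
    return names[int(np.argmin(distances))]

# Hypothetical class covariances (stand-ins for model-derived matrices).
rng = np.random.default_rng(7)
def random_cov():
    a = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
    return a @ a.conj().T + 3 * np.eye(3)   # Hermitian, positive definite

class_covs = {"trees": random_cov(), "grass": random_cov()}
x = rng.normal(size=3) + 1j * rng.normal(size=3)
print(classify(x, class_covs))
```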
Classifying nursing errors in clinical management within an Australian hospital.
Tran, D T; Johnson, M
2010-12-01
Although many classification systems relating to patient safety exist, no taxonomy was identified that classified nursing errors in clinical management. To develop a classification system for nursing errors relating to clinical management (NECM taxonomy) and to describe contributing factors and patient consequences. We analysed 241 (11%) self-reported incidents relating to clinical management in nursing in a metropolitan hospital. Descriptive analysis of numeric data and content analysis of text data were undertaken to derive the NECM taxonomy, contributing factors and consequences for patients. Clinical management incidents represented 1.63 incidents per 1000 occupied bed days. The four themes of the NECM taxonomy were nursing care process (67%), communication (22%), administrative process (5%), and knowledge and skill (6%). Half of the incidents did not cause any patient harm. Contributing factors (n=111) included the following: patient clinical, social conditions and behaviours (27%); resources (22%); environment and workload (18%); other health professionals (15%); communication (13%); and nurse's knowledge and experience (5%). The NECM taxonomy provides direction to clinicians and managers on areas in clinical management that are most vulnerable to error, and therefore, priorities for system change management. Nurses who wish to classify nursing errors relating to clinical management could use this taxonomy. This study informs further research into risk management behaviour and self-assessment tools for clinicians. Globally, nurses need to continue to monitor and act upon patient safety issues. © 2010 The Authors. International Nursing Review © 2010 International Council of Nurses.
In-vivo determination of chewing patterns using FBG and artificial neural networks
NASA Astrophysics Data System (ADS)
Pegorini, Vinicius; Zen Karam, Leandro; Rocha Pitta, Christiano S.; Ribeiro, Richardson; Simioni Assmann, Tangriani; Cardozo da Silva, Jean Carlos; Bertotti, Fábio L.; Kalinowski, Hypolito J.; Cardoso, Rafael
2015-09-01
This paper reports the process of pattern classification of the chewing process of ruminants. We propose a simplified signal processing scheme for optical fiber Bragg grating (FBG) sensors based on machine learning techniques. The FBG sensors measure the biomechanical forces during jaw movements and an artificial neural network is responsible for the classification of the associated chewing pattern. In this study, three patterns associated with dietary supplement, hay and ryegrass were considered. Additionally, two other events important for ingestive behavior studies were monitored: rumination and idle periods. Experimental results show that the proposed approach for pattern classification has been capable of differentiating the materials involved in the chewing process with a small classification error.
NASA Technical Reports Server (NTRS)
Card, Don H.; Strong, Laurence L.
1989-01-01
An application of a classification accuracy assessment procedure is described for a vegetation and land cover map prepared by digital image processing of LANDSAT multispectral scanner data. A statistical sampling procedure called Stratified Plurality Sampling was used to assess the accuracy of portions of a map of the Arctic National Wildlife Refuge coastal plain. Results are tabulated as percent correct classification overall as well as per category with associated confidence intervals. Although values of percent correct were disappointingly low for most categories, the study was useful in highlighting sources of classification error and demonstrating shortcomings of the plurality sampling method.
Hyperspectral image classification based on local binary patterns and PCANet
NASA Astrophysics Data System (ADS)
Yang, Huizhen; Gao, Feng; Dong, Junyu; Yang, Yang
2018-04-01
Hyperspectral image classification has been well acknowledged as one of the challenging tasks of hyperspectral data processing. In this paper, we propose a novel hyperspectral image classification framework based on local binary pattern (LBP) features and PCANet. In the proposed method, linear prediction error (LPE) is first employed to select a subset of informative bands, and LBP is utilized to extract texture features. Then, spectral and texture features are stacked into a high-dimensional vector. Next, the extracted features of a specified position are transformed to a 2-D image. The obtained images of all pixels are fed into PCANet for classification. Experimental results on a real hyperspectral dataset demonstrate the effectiveness of the proposed method.
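A minimal sketch of the band-selection and feature-stacking step described above, using scikit-image's local_binary_pattern; the band subset, LBP parameters, and cube shape are placeholders, and the LPE band selection and PCANet stages are omitted.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def stack_features(cube, band_idx, P=8, R=1.0):
    """cube: (rows, cols, bands) hyperspectral array.
    Returns a (rows*cols, n_spectral + n_texture) feature matrix."""
    rows, cols, _ = cube.shape
    spectral = cube[:, :, band_idx].reshape(rows * cols, -1)
    texture = []
    for b in band_idx:
        lbp = local_binary_pattern(cube[:, :, b], P, R, method="uniform")
        texture.append(lbp.reshape(rows * cols))
    texture = np.stack(texture, axis=1)
    # Stack spectral and texture features into one high-dimensional vector per pixel
    return np.hstack([spectral, texture])

# Toy usage on a random cube with 3 "informative" bands assumed pre-selected
cube = np.random.rand(16, 16, 50)
features = stack_features(cube, band_idx=[5, 20, 40])
print(features.shape)  # (256, 6)
```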
Sub-pixel image classification for forest types in East Texas
NASA Astrophysics Data System (ADS)
Westbrook, Joey
Sub-pixel classification is the extraction of information about the proportion of individual materials of interest within a pixel. Landcover classification at the sub-pixel scale provides more discrimination than traditional per-pixel multispectral classifiers for pixels where the material of interest is mixed with other materials. It allows for the un-mixing of pixels to show the proportion of each material of interest. The materials of interest for this study are pine, hardwood, mixed forest and non-forest. The goal of this project was to perform a sub-pixel classification, which allows a pixel to have multiple labels, and compare the result to a traditional supervised classification, which allows a pixel to have only one label. The satellite image used was a Landsat 5 Thematic Mapper (TM) scene of the Stephen F. Austin Experimental Forest in Nacogdoches County, Texas, and the four cover type classes are pine, hardwood, mixed forest and non-forest. Once classified, a multi-layer raster dataset was created comprising four raster layers, where each layer showed the percentage of that cover type within the pixel area. Percentage cover type maps were then produced and the accuracy of each was assessed using a fuzzy error matrix for the sub-pixel classifications; the results were compared to the supervised classification, for which a traditional error matrix was used. The sub-pixel classification using the aerial photo for both training and reference data had the highest overall accuracy (65%) of the three sub-pixel classifications. This was understandable because the analyst can visually observe the cover types actually on the ground for training and reference data, whereas using the FIA (Forest Inventory and Analysis) plot data, the analyst must assume that an entire pixel contains the exact percentage of a cover type found in a plot. An increase in accuracy was found after reclassifying each sub-pixel classification from nine classes with 10 percent intervals to five classes with 20 percent intervals. When compared to the supervised classification, which had a satisfactory overall accuracy of 90%, none of the sub-pixel classifications achieved the same level. However, since traditional per-pixel classifiers assign only one label to pixels throughout the landscape while sub-pixel classifications assign multiple labels to each pixel, the traditional 85% accuracy of acceptance for pixel-based classifications should not apply to sub-pixel classifications. More research is needed in order to define the level of accuracy that is deemed acceptable for sub-pixel classifications.
Correcting for sequencing error in maximum likelihood phylogeny inference.
Kuhner, Mary K; McGill, James
2014-11-04
Accurate phylogenies are critical to taxonomy as well as studies of speciation processes and other evolutionary patterns. Accurate branch lengths in phylogenies are critical for dating and rate measurements. Such accuracy may be jeopardized by unacknowledged sequencing error. We use simulated data to test a correction for DNA sequencing error in maximum likelihood phylogeny inference. Over a wide range of data polymorphism and true error rate, we found that correcting for sequencing error improves recovery of the branch lengths, even if the assumed error rate is up to twice the true error rate. Low error rates have little effect on recovery of the topology. When error is high, correction improves topological inference; however, when error is extremely high, using an assumed error rate greater than the true error rate leads to poor recovery of both topology and branch lengths. The error correction approach tested here was proposed in 2004 but has not been widely used, perhaps because researchers do not want to commit to an estimate of the error rate. This study shows that correction with an approximate error rate is generally preferable to ignoring the issue. Copyright © 2014 Kuhner and McGill.
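The correction amounts to replacing the usual 0/1 tip likelihoods with probabilities that allow for miscalled bases. A minimal sketch of that idea, assuming a uniform per-base error rate epsilon and equal probability for the three wrong bases (the exact parameterisation in the original 2004 proposal may differ):

```python
import numpy as np

BASES = "ACGT"

def tip_likelihoods(observed_base, epsilon=0.01):
    """P(observed | true base) for each possible true base.
    Without error correction this would be a 0/1 indicator vector."""
    probs = np.full(4, epsilon / 3.0)
    probs[BASES.index(observed_base)] = 1.0 - epsilon
    return probs

print(tip_likelihoods("A"))        # [0.99, 0.0033..., 0.0033..., 0.0033...]
print(tip_likelihoods("A", 0.0))   # reduces to the standard indicator vector
```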
Statistical Deviations From the Theoretical Only-SBU Model to Estimate MCU Rates in SRAMs
NASA Astrophysics Data System (ADS)
Franco, Francisco J.; Clemente, Juan Antonio; Baylac, Maud; Rey, Solenne; Villa, Francesca; Mecha, Hortensia; Agapito, Juan A.; Puchner, Helmut; Hubert, Guillaume; Velazco, Raoul
2017-08-01
This paper addresses a well-known problem that occurs when memories are exposed to radiation: determining whether a bit flip is isolated or belongs to a multiple event. As it is unusual to know the physical layout of the memory, this paper proposes to evaluate the statistical properties of the sets of corrupted addresses and to compare the results with a mathematical prediction model in which all of the events are single bit upsets. A set of rules easy to implement in common programming languages can be iteratively applied if anomalies are observed, thus yielding a classification of errors much closer to reality (more than 80% accuracy in our experiments).
Gaia eclipsing binary and multiple systems. Supervised classification and self-organizing maps
NASA Astrophysics Data System (ADS)
Süveges, M.; Barblan, F.; Lecoeur-Taïbi, I.; Prša, A.; Holl, B.; Eyer, L.; Kochoska, A.; Mowlavi, N.; Rimoldini, L.
2017-07-01
Context. Large surveys producing tera- and petabyte-scale databases require machine-learning and knowledge discovery methods to deal with the overwhelming quantity of data and the difficulties of extracting concise, meaningful information with reliable assessment of its uncertainty. This study investigates the potential of a few machine-learning methods for the automated analysis of eclipsing binaries in the data of such surveys. Aims: We aim to aid the extraction of samples of eclipsing binaries from such databases and to provide basic information about the objects. We intend to estimate class labels according to two different, well-known classification systems, one based on the light curve morphology (EA/EB/EW classes) and the other based on the physical characteristics of the binary system (system morphology classes; detached through overcontact systems). Furthermore, we explore low-dimensional surfaces along which the light curves of eclipsing binaries are concentrated, and consider their use in the characterization of the binary systems and in the exploration of biases of the full unknown Gaia data with respect to the training sets. Methods: We have explored the performance of principal component analysis (PCA), linear discriminant analysis (LDA), Random Forest classification and self-organizing maps (SOM) for the above aims. We pre-processed the photometric time series by combining a double Gaussian profile fit and a constrained smoothing spline, in order to de-noise and interpolate the observed light curves. We achieved further denoising, and selected the most important variability elements from the light curves using PCA. Supervised classification was performed using Random Forest and LDA based on the PC decomposition, while SOM gives a continuous 2-dimensional manifold of the light curves arranged by a few important features. We estimated the uncertainty of the supervised methods due to the specific finite training set using ensembles of models constructed on randomized training sets. Results: We obtain excellent results (about 5% global error rate) with classification into light curve morphology classes on the Hipparcos data. The classification into system morphology classes using the Catalog and Atlas of Eclipsing binaries (CALEB) has a higher error rate (about 10.5%), most importantly due to the (sometimes strong) similarity of the photometric light curves originating from physically different systems. When trained on CALEB and then applied to Kepler-detected eclipsing binaries subsampled according to Gaia observing times, LDA and SOM provide tractable, easy-to-visualize subspaces of the full (functional) space of light curves that summarize the most important phenomenological elements of the individual light curves. The sequence of light curves ordered by their first linear discriminant coefficient is compared to results obtained using local linear embedding. The SOM method proves able to find a 2-dimensional embedded surface in the space of the light curves which separates the system morphology classes in its different regions, and also identifies a few other phenomena, such as the asymmetry of the light curves due to spots, eccentric systems, and systems with a single eclipse. Furthermore, when data from other surveys are projected to the same SOM surface, the resulting map yields a good overview of the general biases and distortions due to differences in time sampling or population.
NASA Astrophysics Data System (ADS)
Zhu, Jun; Chen, Lijun; Ma, Lantao; Li, Dejian; Jiang, Wei; Pan, Lihong; Shen, Huiting; Jia, Hongmin; Hsiang, Chingyun; Cheng, Guojie; Ling, Li; Chen, Shijie; Wang, Jun; Liao, Wenkui; Zhang, Gary
2014-04-01
Defect review is a time-consuming job, and human error makes the results inconsistent. Defects located in don't-care areas, such as defects in dark areas, do not hurt yield and need not be reviewed. However, critical-area defects, such as defects in clear areas, can impact yield dramatically and need more attention during review. With decreasing integrated circuit dimensions, thousands of mask defects, or even more, are typically detected during an inspection. Traditional manual or simple classification approaches are unable to meet efficiency and accuracy requirements. This paper focuses on an automatic defect management and classification solution using the image output of Lasertec inspection equipment and Anchor pattern-centric image processing technology. The number of mask defects found during an inspection is always in the range of thousands or more, and this system can handle a large number of defects with quick and accurate classification results. Our experiments include Die-to-Die and Single-Die modes. The classification accuracy reaches 87.4% and 93.3%, respectively, and no critical or printable defects are missed in our test cases. The rates of defects missed by the classification are 0.25% in Die-to-Die mode and 0.24% in Single-Die mode, which is encouraging and acceptable for application on a production line. The results can be output and reloaded back to the inspection machine for further review. This step helps users validate uncertain defects with clear, magnified images when the captured images cannot provide enough information to make a judgment. This system effectively reduces expensive inline defect review time. As a fully inline automated defect management solution, the system is compatible with the current inspection approach and can be integrated with optical simulation, including a scoring function, and can guide wafer-level defect inspection.
Li, Man; Ling, Cheng; Xu, Qi; Gao, Jingyang
2018-02-01
Sequence classification is crucial in predicting the function of newly discovered sequences. In recent years, prediction for the growing scale and diversity of sequences has relied heavily on machine-learning algorithms. To improve prediction accuracy, these algorithms must confront the key challenge of extracting valuable features. In this work, we propose a feature-enhanced protein classification approach that combines multiple sequence alignment, an N-gram probabilistic language model and deep learning. The essence of the proposed method is that if each group of sequences can be represented by one feature sequence composed of homologous sites, then rebuilding that feature sequence should incur less loss when a more closely related sequence is added to the group. On this basis, deciding whether a query sequence belongs to a group of sequences is transformed into calculating the probability that the new feature sequence evolves from the original one. The proposed work focuses on the hierarchical classification of G-protein Coupled Receptors (GPCRs), which begins by extracting the feature sequences from the multiple sequence alignment results of the GPCR sub-subfamilies. The N-gram model is then applied to construct the input vectors. Finally, these vectors are fed into a convolutional neural network to make a prediction. The experimental results show that the proposed method provides significant performance improvements: the classification error rate is reduced by at least 4.67% (family level I) and 5.75% (family level II) in comparison with current state-of-the-art methods. The implementation program of the proposed work is freely available at: https://github.com/alanFchina/CNN .
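A minimal sketch of the N-gram step, turning an amino-acid sequence into a fixed-length vector of n-gram frequencies that could feed a downstream CNN; the alphabet, n = 2, and normalisation are illustrative choices rather than the authors' exact settings.

```python
from itertools import product
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def ngram_vector(sequence, n=2):
    """Relative frequency of every possible amino-acid n-gram in `sequence`."""
    vocab = {"".join(g): i for i, g in enumerate(product(AMINO_ACIDS, repeat=n))}
    counts = np.zeros(len(vocab))
    for i in range(len(sequence) - n + 1):
        gram = sequence[i:i + n]
        if gram in vocab:                 # skip grams containing non-standard residues
            counts[vocab[gram]] += 1
    total = counts.sum()
    return counts / total if total > 0 else counts

vec = ngram_vector("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")
print(vec.shape)  # (400,) for 2-grams over 20 amino acids
```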
Kumar, Shiu; Mamun, Kabir; Sharma, Alok
2017-12-01
Classification of electroencephalography (EEG) signals for motor imagery based brain computer interface (MI-BCI) is an exigent task and common spatial pattern (CSP) has been extensively explored for this purpose. In this work, we focused on developing a new framework for classification of EEG signals for MI-BCI. We propose a single band CSP framework for MI-BCI that utilizes the concept of tangent space mapping (TSM) in the manifold of covariance matrices. The proposed method is named CSP-TSM. Spatial filtering is performed on the bandpass filtered MI EEG signal. Riemannian tangent space is utilized for extracting features from the spatial filtered signal. The TSM features are then fused with the CSP variance based features and feature selection is performed using Lasso. Linear discriminant analysis (LDA) is then applied to the selected features and finally classification is done using a support vector machine (SVM) classifier. The proposed framework gives improved performance for MI EEG signal classification in comparison with several competing methods. Experiments conducted show that the proposed framework reduces the overall classification error rate for MI-BCI by 3.16%, 5.10% and 1.70% (for BCI Competition III dataset IVa, BCI Competition IV Dataset I and BCI Competition IV Dataset IIb, respectively) compared to the conventional CSP method under the same experimental settings. The proposed CSP-TSM method produces promising results when compared with several competing methods in this paper. In addition, the computational complexity is lower than that of the TSM method. Our proposed CSP-TSM framework can be potentially used for developing improved MI-BCI systems. Copyright © 2017 Elsevier Ltd. All rights reserved.
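A minimal sketch of the back end of the CSP-TSM pipeline, assuming the CSP log-variance features and Riemannian tangent-space features have already been computed and are simply concatenated; scikit-learn stands in for whatever toolchain the authors used, and the Lasso penalty, trial counts, and SVM kernel are placeholders.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.feature_selection import SelectFromModel
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n_trials = 120
csp_feats = rng.normal(size=(n_trials, 8))    # CSP log-variance features (assumed precomputed)
tsm_feats = rng.normal(size=(n_trials, 36))   # tangent-space features (assumed precomputed)
# Synthetic labels correlated with the features, standing in for left/right motor imagery
y = (csp_feats[:, 0] + tsm_feats[:, 0] > 0).astype(int)

# Fuse TSM features with the CSP variance features
X = np.hstack([csp_feats, tsm_feats])

clf = make_pipeline(
    SelectFromModel(Lasso(alpha=0.01)),       # Lasso-based feature selection
    LinearDiscriminantAnalysis(),             # LDA projection of the selected features
    SVC(kernel="rbf"),                        # final SVM classification
)
clf.fit(X, y)
print(clf.score(X, y))
```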
Mansberger, Steven L; Menda, Shivali A; Fortune, Brad A; Gardiner, Stuart K; Demirel, Shaban
2017-02-01
To characterize the error of optical coherence tomography (OCT) measurements of retinal nerve fiber layer (RNFL) thickness when using automated retinal layer segmentation algorithms without manual refinement. Cross-sectional study. This study was set in a glaucoma clinical practice, and the dataset included 3490 scans from 412 eyes of 213 individuals with a diagnosis of glaucoma or glaucoma suspect. We used spectral domain OCT (Spectralis) to measure RNFL thickness in a 6-degree peripapillary circle, and exported the native "automated segmentation only" results. In addition, we exported the results after "manual refinement" to correct errors in the automated segmentation of the anterior (internal limiting membrane) and the posterior boundary of the RNFL. Our outcome measures included differences in RNFL thickness and glaucoma classification (i.e., normal, borderline, or outside normal limits) between scans with automated segmentation only and scans using manual refinement. Automated segmentation only resulted in a thinner global RNFL thickness (1.6 μm thinner, P < .001) when compared to manual refinement. When adjusted by operator, a multivariate model showed increased differences with decreasing RNFL thickness (P < .001), decreasing scan quality (P < .001), and increasing age (P < .03). Manual refinement changed 298 of 3486 (8.5%) of scans to a different global glaucoma classification, wherein 146 of 617 (23.7%) of borderline classifications became normal. Superior and inferior temporal clock hours had the largest differences. Automated segmentation without manual refinement resulted in reduced global RNFL thickness and overestimated the classification of glaucoma. Differences increased in eyes with a thinner RNFL thickness, older age, and decreased scan quality. Operators should inspect and manually refine OCT retinal layer segmentation when assessing RNFL thickness in the management of patients with glaucoma. Copyright © 2016 Elsevier Inc. All rights reserved.
Bertolini, F; Galimberti, G; Schiavo, G; Mastrangelo, S; Di Gerlando, R; Strillacci, M G; Bagnato, A; Portolano, B; Fontanesi, L
2018-01-01
Commercial single nucleotide polymorphism (SNP) arrays have been recently developed for several species and can be used to identify informative markers to differentiate breeds or populations for several downstream applications. To identify the most discriminating genetic markers among thousands of genotyped SNPs, a few statistical approaches have been proposed. In this work, we compared several methods of SNP preselection (Delta, Fst and principal component analysis (PCA)) in addition to Random Forest classifications to analyse SNP data from six dairy cattle breeds, including cosmopolitan (Holstein, Brown and Simmental) and autochthonous Italian breeds raised in two different regions and subjected to limited or no breeding programmes (Cinisara and Modicana, raised only in Sicily, and Reggiana, raised only in Emilia Romagna). From these classifications, two panels of 96 and 48 SNPs containing the most discriminant SNPs were created for each preselection method. These panels were evaluated in terms of their ability to discriminate as a whole and breed-by-breed, as well as the linkage disequilibrium within each panel. The results showed that for the 48-SNP panel, the error rate increased mainly for the autochthonous breeds, probably as a consequence of their admixed origin, lower selection pressure and ascertainment bias in the construction of the SNP chip. The 96-SNP panels were generally more able to discriminate all breeds. The panel derived by PCA-chrom (obtained by a chromosome-by-chromosome preselection) could identify informative SNPs that were particularly useful for the assignment of minor breeds, reaching the lowest Out Of Bag error even in the Cinisara, whose value was quite high with all other panels. Moreover, this panel contained the lowest number of SNPs in linkage disequilibrium. Several selected SNPs are located near genes affecting breed-specific phenotypic traits (coat colour and stature) or associated with production traits. In general, our results demonstrated the usefulness of Random Forest in combination with other reduction techniques to identify population-informative SNPs.
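A minimal sketch of the Random Forest step used to rank SNPs and monitor the out-of-bag (OOB) error, with scikit-learn standing in for the authors' implementation; the genotype coding (0/1/2), forest size, and the 96-marker panel size are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_animals, n_snps = 300, 5000
X = rng.integers(0, 3, size=(n_animals, n_snps))   # SNP genotypes coded 0/1/2
breeds = rng.integers(0, 6, size=n_animals)        # 6 breed labels

rf = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=0)
rf.fit(X, breeds)
print("OOB error (all SNPs):", 1.0 - rf.oob_score_)

# Rank SNPs by importance and keep a reduced panel of 96 markers
panel_96 = np.argsort(rf.feature_importances_)[::-1][:96]

# Re-fit on the reduced panel and check how the OOB error changes
rf_panel = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=0)
rf_panel.fit(X[:, panel_96], breeds)
print("OOB error (96-SNP panel):", 1.0 - rf_panel.oob_score_)
```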
Acquiring Research-grade ALSM Data in the Commercial Marketplace
NASA Astrophysics Data System (ADS)
Haugerud, R. A.; Harding, D. J.; Latypov, D.; Martinez, D.; Routh, S.; Ziegler, J.
2003-12-01
The Puget Sound Lidar Consortium, working with TerraPoint, LLC, has procured a large volume of ALSM (topographic lidar) data for scientific research. Research-grade ALSM data can be characterized by their completeness, density, and accuracy. Complete data include, at a minimum, X, Y, Z, time, and classification (ground, vegetation, structure, blunder) for each laser reflection. Off-nadir angle and return number for multiple returns are also useful. We began with a pulse density of 1/sq m, and after limited experiments still find this density satisfactory in the dense second-growth forests of western Washington. Lower pulse densities would have produced unacceptably limited sampling in forested areas and aliased some topographic features. Higher pulse densities do not produce markedly better topographic models, in part because of limitations of reproducibility between the overlapping survey swaths used to achieve higher density. Our experience in a variety of forest types demonstrates that the fraction of pulses that produce ground returns varies with vegetation cover, laser beam divergence, laser power, and detector sensitivity, but we have not quantified this relationship. The most significant operational limits on vertical accuracy of ALSM appear to be instrument calibration and the accuracy with which returns are classified as ground or vegetation. TerraPoint has recently implemented in-situ calibration using overlapping swaths (Latypov and Zosse, 2002, see http://www.terrapoint.com/News_damirACSM_ASPRS2002.html). On the consumer side, we routinely perform a similar overlap analysis to produce maps of relative Z error between swaths; we find that in bare, low-slope regions the in-situ calibration has reduced this internal Z error to 6-10 cm RMSE. Comparison with independent ground control points commonly illuminates inconsistencies in how GPS heights have been reduced to orthometric heights. Once these inconsistencies are resolved, it appears that the internal errors are the bulk of the error of the survey. The error maps suggest that with in-situ calibration, minor time-varying errors with a period of circa 1 sec are the largest remaining source of survey error. For forested terrain, limited ground penetration and errors in return classification can severely limit the accuracy of resulting topographic models. Initial work by Haugerud and Harding demonstrated the feasibility of fully-automatic return classification; however, TerraPoint has found that better results can be obtained more effectively with 3rd-party classification software that allows a mix of automated routines and human intervention. Our relationship has been evolving since early 2000. Important aspects of this relationship include close communication between data producer and consumer, a willingness to learn from each other, significant technical expertise and resources on the consumer side, and continued refinement of achievable, quantitative performance and accuracy specifications. Most recently we have instituted a slope-dependent Z accuracy specification that TerraPoint first developed as a heuristic for surveying mountainous terrain in Switzerland. We are now working on quantifying the internal consistency of topographic models in forested areas, using a variant of overlap analysis, and standards for the spatial distribution of internal errors.
Ramírez, J; Górriz, J M; Ortiz, A; Martínez-Murcia, F J; Segovia, F; Salas-Gonzalez, D; Castillo-Barnes, D; Illán, I A; Puntonet, C G
2018-05-15
Alzheimer's disease (AD) is the most common cause of dementia in the elderly and affects approximately 30 million individuals worldwide. Mild cognitive impairment (MCI) is very frequently a prodromal phase of AD, and existing studies have suggested that people with MCI tend to progress to AD at a rate of about 10-15% per year. However, the ability of clinicians and machine learning systems to predict AD based on MRI biomarkers at an early stage is still a challenging problem that can have a great impact on improving treatments. The proposed system, developed by the SiPBA-UGR team for this challenge, is based on feature standardization, ANOVA feature selection, partial least squares feature dimension reduction and an ensemble of One vs. Rest random forest classifiers. With the aim of improving its performance when discriminating healthy controls (HC) from MCI, a second binary classification level was introduced that reconsiders the HC and MCI predictions of the first level. The system was trained and evaluated on an ADNI dataset that consists of T1-weighted MRI morphological measurements from HC, stable MCI, converter MCI and AD subjects. The proposed system yields a 56.25% classification score on the test subset which consists of 160 real subjects. The classifier yielded the best performance when compared to: (i) One vs. One (OvO), One vs. Rest (OvR) and error correcting output codes (ECOC) as strategies for reducing the multiclass classification task to multiple binary classification problems, (ii) support vector machines, gradient boosting classifier and random forest as base binary classifiers, and (iii) bagging ensemble learning. A robust method has been proposed for the international challenge on MCI prediction based on MRI data. The system yielded the second best performance during the competition with an accuracy rate of 56.25% when evaluated on the real subjects of the test set. Copyright © 2017 Elsevier B.V. All rights reserved.
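A minimal sketch of the classification chain described above (standardisation, ANOVA feature selection, partial least squares dimension reduction, One-vs-Rest random forests), assuming precomputed morphological feature vectors; the feature counts and component numbers are placeholders, and the second-level HC/MCI re-classification is omitted.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.cross_decomposition import PLSRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 300))        # MRI morphological measurements (synthetic stand-in)
y = rng.integers(0, 4, size=200)       # 0=HC, 1=stable MCI, 2=converter MCI, 3=AD

# Feature standardisation and ANOVA-based selection
X_std = StandardScaler().fit_transform(X)
selector = SelectKBest(f_classif, k=100).fit(X_std, y)
X_sel = selector.transform(X_std)

# Partial least squares dimension reduction, fit against one-hot class labels
Y_onehot = np.eye(4)[y]
pls = PLSRegression(n_components=10).fit(X_sel, Y_onehot)
X_red = pls.transform(X_sel)

# Ensemble of One-vs-Rest random forest classifiers
clf = OneVsRestClassifier(RandomForestClassifier(n_estimators=200, random_state=0))
clf.fit(X_red, y)
print(clf.score(X_red, y))
```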
Boursier, Jérôme; Bertrais, Sandrine; Oberti, Frédéric; Gallois, Yves; Fouchard-Hubert, Isabelle; Rousselet, Marie-Christine; Zarski, Jean-Pierre; Calès, Paul
2011-11-30
Non-invasive tests have been constructed and evaluated mainly for binary diagnoses such as significant fibrosis. Recently, detailed fibrosis classifications for several non-invasive tests have been developed, but their accuracy has not been thoroughly evaluated in comparison to liver biopsy, especially in clinical practice and for Fibroscan. Therefore, the main aim of the present study was to evaluate the accuracy of detailed fibrosis classifications available for non-invasive tests and liver biopsy. The secondary aim was to validate these accuracies in independent populations. Four HCV populations provided 2,068 patients with liver biopsy, four different pathologist skill-levels and non-invasive tests. Results were expressed as percentages of correctly classified patients. In population #1 including 205 patients and comparing liver biopsy (reference: consensus reading by two experts) and blood tests, Metavir fibrosis (FM) stage accuracy was 64.4% in local pathologists vs. 82.2% (p < 10-3) in single expert pathologist. Significant discrepancy (≥ 2FM vs reference histological result) rates were: Fibrotest: 17.2%, FibroMeter2G: 5.6%, local pathologists: 4.9%, FibroMeter3G: 0.5%, expert pathologist: 0% (p < 10-3). In population #2 including 1,056 patients and comparing blood tests, the discrepancy scores, taking into account the error magnitude, of detailed fibrosis classification were significantly different between FibroMeter2G (0.30 ± 0.55) and FibroMeter3G (0.14 ± 0.37, p < 10-3) or Fibrotest (0.84 ± 0.80, p < 10-3). In population #3 (and #4) including 458 (359) patients and comparing blood tests and Fibroscan, accuracies of detailed fibrosis classification were, respectively: Fibrotest: 42.5% (33.5%), Fibroscan: 64.9% (50.7%), FibroMeter2G: 68.7% (68.2%), FibroMeter3G: 77.1% (83.4%), p < 10-3 (p < 10-3). Significant discrepancy (≥ 2 FM) rates were, respectively: Fibrotest: 21.3% (22.2%), Fibroscan: 12.9% (12.3%), FibroMeter2G: 5.7% (6.0%), FibroMeter3G: 0.9% (0.9%), p < 10-3 (p < 10-3). The accuracy in detailed fibrosis classification of the best-performing blood test outperforms liver biopsy read by a local pathologist, i.e., in clinical practice; however, the classification precision is apparently lesser. This detailed classification accuracy is much lower than that of significant fibrosis with Fibroscan and even Fibrotest but higher with FibroMeter3G. FibroMeter classification accuracy was significantly higher than those of other non-invasive tests. Finally, for hepatitis C evaluation in clinical practice, fibrosis degree can be evaluated using an accurate blood test.
Kazaryan, Airazat M.; Røsok, Bård I.; Edwin, Bjørn
2013-01-01
Background. Morbidity is a cornerstone of assessing surgical treatment; nevertheless, surgeons have not reached broad consensus on this problem. Methods and Findings. Clavien, Dindo, and Strasberg with coauthors (1992, 2004, 2009, and 2010) made significant efforts toward the standardization of surgical morbidity (Clavien-Dindo-Strasberg classification, last revision, the Accordion classification). However, this classification includes only postoperative complications and has two principal shortcomings: disregard of intraoperative events and confusing terminology. Postoperative events have a major impact on patient well-being. However, intraoperative events should also be recorded and reported even if they do not evidently affect the patient's postoperative well-being. The term surgical complication applied in the Clavien-Dindo-Strasberg classification may be regarded as an incident resulting in a complication caused by technical failure of surgery, in contrast to the so-called medical complications. Therefore, the term surgical complication contributes to misinterpretation of perioperative morbidity. The term perioperative adverse events, comprising both intraoperative unfavourable incidents and postoperative complications, could be regarded as a better alternative. In 2005, Satava suggested a simple grading to evaluate intraoperative surgical errors. Based on that approach, we have elaborated a 3-grade classification of intraoperative incidents so that it can be used to grade intraoperative events of any type of surgery. Refinements have been made to the Accordion classification of postoperative complications. Interpretation. The proposed systematization of perioperative adverse events utilizing the combined application of two appraisal tools, that is, the elaborated classification of intraoperative incidents on the basis of the Satava approach to surgical error evaluation together with the modified Accordion classification of postoperative complications, appears to be an effective tool for comprehensive assessment of surgical outcomes. This concept was validated in regard to various surgical procedures. Broad implementation of this approach will promote the development of surgical science and practice. PMID:23762627
Furlanello, Cesare; Serafini, Maria; Merler, Stefano; Jurman, Giuseppe
2003-11-06
We describe the E-RFE method for gene ranking, which is useful for the identification of markers in the predictive classification of array data. The method supports a practical modeling scheme designed to avoid the construction of classification rules based on the selection of too small gene subsets (an effect known as the selection bias, in which the estimated predictive errors are too optimistic due to testing on samples already considered in the feature selection process). With E-RFE, we speed up the recursive feature elimination (RFE) with SVM classifiers by eliminating chunks of uninteresting genes using an entropy measure of the SVM weights distribution. An optimal subset of genes is selected according to a two-strata model evaluation procedure: modeling is replicated by an external stratified-partition resampling scheme, and, within each run, an internal K-fold cross-validation is used for E-RFE ranking. Also, the optimal number of genes can be estimated according to the saturation of Zipf's law profiles. Without a decrease of classification accuracy, E-RFE allows a speed-up factor of 100 with respect to standard RFE, while improving on alternative parametric RFE reduction strategies. Thus, a process for gene selection and error estimation is made practical, ensuring control of the selection bias, and providing additional diagnostic indicators of gene importance.
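A minimal sketch of recursive feature elimination with a linear SVM; scikit-learn's RFE with a large `step` stands in for the entropy-driven chunk elimination of E-RFE (which removes variable-sized chunks based on the entropy of the SVM weight distribution), so this is the standard RFE baseline rather than the authors' accelerated variant.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.feature_selection import RFE

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2000))            # 60 arrays, 2000 genes (synthetic)
y = rng.integers(0, 2, size=60)
X[:, :20] += y[:, None] * 1.5              # make the first 20 genes informative

# Eliminate genes in chunks of 10% per iteration instead of one at a time
rfe = RFE(estimator=LinearSVC(dual=False), n_features_to_select=50, step=0.1)
rfe.fit(X, y)
selected_genes = np.where(rfe.support_)[0]
print(len(selected_genes), selected_genes[:10])
```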
DMSP SSJ4 Data Restoration, Classification, and On-Line Data Access
NASA Technical Reports Server (NTRS)
Wing, Simon; Bredekamp, Joseph H. (Technical Monitor)
2000-01-01
Compress and clean raw data files for permanent storage. We have identified various error conditions/types and developed algorithms to remove these errors and noise, including the more complicated noise in the newer data sets (status = 100% complete). Internet access of compacted raw data. It is now possible to access the raw data via our web site, http://www.jhuapl.edu/Aurora/index.html. The software to read and plot the compacted raw data is also available from the same web site. The users can now download the raw data, read, plot, or manipulate the data as they wish on their own computer. The users are able to access the cleaned data sets. Internet access of the color spectrograms. This task has also been completed. It is now possible to access the spectrograms from the web site mentioned above. Improve the particle precipitation region classification. The algorithm for doing this task has been developed and implemented. As a result, the accuracies improved. Now the web site routinely distributes the results of applying the new algorithm to the cleaned data set. Mark the classification region on the spectrograms. The software to mark the classification region in the spectrograms has been completed. This is also available from our web site.
Predicting mountain lion activity using radiocollars equipped with mercury tip-sensors
Janis, Michael W.; Clark, Joseph D.; Johnson, Craig
1999-01-01
Radiotelemetry collars with tip-sensors have long been used to monitor wildlife activity. However, comparatively few researchers have tested the reliability of the technique on the species being studied. To evaluate the efficacy of using tip-sensors to assess mountain lion (Puma concolor) activity, we radiocollared 2 hand-reared mountain lions and simultaneously recorded their behavior and the associated telemetry signal characteristics. We noted both the number of pulse-rate changes and the percentage of time the transmitter emitted a fast pulse rate (i.e., head up) within sampling intervals ranging from 1-5 minutes. Based on 27 hours of observations, we were able to correctly distinguish between active and inactive behaviors >93% of the time using a logistic regression model. We present several models to predict activity of mountain lions; the selection of which to use would depend on study objectives and logistics. Our results indicate that field protocols that use only pulse-rate changes to indicate activity can lead to significant classification errors.
Prediction of change in protein unfolding rates upon point mutations in two state proteins.
Chaudhary, Priyashree; Naganathan, Athi N; Gromiha, M Michael
2016-09-01
Studies on protein unfolding rates are limited and challenging due to the complexity of the unfolding mechanism and the larger dynamic range of the experimental data. Though attempts have been made to predict unfolding rates using protein sequence-structure information, there is no available method for predicting the unfolding rates of proteins upon specific point mutations. In this work, we have systematically analyzed a set of 790 single mutants and developed a robust method for predicting protein unfolding rates upon mutations (Δlnku) in two-state proteins by combining amino acid properties and knowledge-based classification of mutants with a multiple linear regression technique. We obtain a mean absolute error (MAE) of 0.79/s and a Pearson correlation coefficient (PCC) of 0.71 between predicted unfolding rates and experimental observations using a jack-knife test. We have developed a web server for predicting protein unfolding rates upon mutation and it is freely available at https://www.iitm.ac.in/bioinfo/proteinunfolding/unfoldingrace.html. Prominent features that determine unfolding kinetics as well as plausible reasons for the observed outliers are also discussed. Copyright © 2016 Elsevier B.V. All rights reserved.
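A minimal sketch of the regression-plus-jack-knife evaluation scheme described above, using scikit-learn; the property matrix is a random stand-in for the amino-acid property features and mutant classes used in the study.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n_mutants, n_props = 790, 8
X = rng.normal(size=(n_mutants, n_props))                    # amino-acid property features per mutant
true_coef = rng.normal(size=n_props)
y = X @ true_coef + rng.normal(scale=0.5, size=n_mutants)    # delta ln(ku) upon mutation (synthetic)

# Jack-knife (leave-one-out) predictions from a multiple linear regression
pred = cross_val_predict(LinearRegression(), X, y, cv=LeaveOneOut())

mae = np.mean(np.abs(pred - y))
pcc, _ = pearsonr(pred, y)
print(f"MAE = {mae:.2f}, PCC = {pcc:.2f}")
```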
Classifications for Cesarean Section: A Systematic Review
Torloni, Maria Regina; Betran, Ana Pilar; Souza, Joao Paulo; Widmer, Mariana; Allen, Tomas; Gulmezoglu, Metin; Merialdi, Mario
2011-01-01
Background Rising cesarean section (CS) rates are a major public health concern and cause worldwide debates. To propose and implement effective measures to reduce or increase CS rates where necessary requires an appropriate classification. Despite several existing CS classifications, there has not yet been a systematic review of these. This study aimed to 1) identify the main CS classifications used worldwide, 2) analyze advantages and deficiencies of each system. Methods and Findings Three electronic databases were searched for classifications published 1968–2008. Two reviewers independently assessed classifications using a form created based on items rated as important by international experts. Seven domains (ease, clarity, mutually exclusive categories, totally inclusive classification, prospective identification of categories, reproducibility, implementability) were assessed and graded. Classifications were tested in 12 hypothetical clinical case-scenarios. From a total of 2948 citations, 60 were selected for full-text evaluation and 27 classifications identified. Indications classifications present important limitations and their overall score ranged from 2–9 (maximum grade = 14). Degree of urgency classifications also had several drawbacks (overall scores 6–9). Woman-based classifications performed best (scores 5–14). Other types of classifications require data not routinely collected and may not be relevant in all settings (scores 3–8). Conclusions This review and critical appraisal of CS classifications is a methodologically sound contribution to establish the basis for the appropriate monitoring and rational use of CS. Results suggest that women-based classifications in general, and Robson's classification, in particular, would be in the best position to fulfill current international and local needs and that efforts to develop an internationally applicable CS classification would be most appropriately placed in building upon this classification. The use of a single CS classification will facilitate auditing, analyzing and comparing CS rates across different settings and help to create and implement effective strategies specifically targeted to optimize CS rates where necessary. PMID:21283801
Contrast enhancement of mail piece images
NASA Astrophysics Data System (ADS)
Shin, Yong-Chul; Sridhar, Ramalingam; Demjanenko, Victor; Palumbo, Paul W.; Hull, Jonathan J.
1992-08-01
A new approach to contrast enhancement of mail piece images is presented. The contrast enhancement is used as a preprocessing step in the real-time address block location (RT-ABL) system. The RT-ABL system processes a stream of mail piece images and locates destination address blocks. Most of the mail pieces (classified into letters) show high contrast between background and foreground. As an extreme case, however, seasonal greeting cards often use colored envelopes, which results in reduced contrast. Image quality is measured by an OCR error rate using a linear distributed associative memory (DAM). The DAM is trained to recognize the spectra of three classes of images: those with high, medium, and low OCR error rates. The DAM is not forced to make a classification every time; it is allowed to reject as unknown a spectrum presented that does not closely resemble any that has been stored in the DAM. The DAM was fairly accurate with noisy images but conservative (i.e., rejected several text images as unknowns) when there was little background and foreground degradation. The proposed enhancement improves degraded images without affecting the nondegraded images. This approach provides local enhancement which adapts to local features. In order to simplify the computation of A and (sigma), a dynamic programming technique is used. Implementation details, performance, and the results on test images are presented in this paper.
Software errors and complexity: An empirical investigation
NASA Technical Reports Server (NTRS)
Basili, Victor R.; Perricone, Berry T.
1983-01-01
The distributions and relationships derived from the change data collected during the development of a medium scale satellite software project show that meaningful results can be obtained which allow an insight into software traits and the environment in which it is developed. Modified and new modules were shown to behave similarly. An abstract classification scheme for errors which allows a better understanding of the overall traits of a software project is also shown. Finally, various size and complexity metrics are examined with respect to errors detected within the software yielding some interesting results.
Ho, B T; Tsai, M J; Wei, J; Ma, M; Saipetch, P
1996-01-01
A new method of video compression for angiographic images has been developed to achieve a high compression ratio (~20:1) while eliminating the block artifacts that lead to loss of diagnostic accuracy. This method adopts the Motion Picture Experts Group's (MPEG) motion-compensated prediction to take advantage of frame-to-frame correlation. However, in contrast to MPEG, the error images arising from mismatches in the motion estimation are encoded by the discrete wavelet transform (DWT) rather than the block discrete cosine transform (DCT). Furthermore, the authors developed a classification scheme which labels each block in an image as intra, error, or background type and encodes it accordingly. This hybrid coding can significantly improve the compression efficiency in certain cases. The method can be generalized to any dynamic image sequence application sensitive to block artifacts.
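A minimal numpy sketch of the block-labelling idea: each block of the motion-compensation residual is tagged as background, error, or intra depending on its energy, so that a different coder can be applied per block; the thresholds and block size are illustrative, and both the motion estimation and the DWT/DCT coding stages are omitted.

```python
import numpy as np

def classify_blocks(current, predicted, block=16, t_bg=1.0, t_intra=50.0):
    """Label each block as 'background', 'error', or 'intra' from its residual energy."""
    residual = current.astype(float) - predicted.astype(float)
    rows, cols = residual.shape
    labels = {}
    for r in range(0, rows, block):
        for c in range(0, cols, block):
            mse = np.mean(residual[r:r + block, c:c + block] ** 2)
            if mse < t_bg:
                labels[(r, c)] = "background"   # skip / copy from reference frame
            elif mse < t_intra:
                labels[(r, c)] = "error"        # encode residual (e.g. with a DWT)
            else:
                labels[(r, c)] = "intra"        # encode the block directly
    return labels

# Toy usage on random frames
rng = np.random.default_rng(0)
cur = rng.integers(0, 256, size=(64, 64))
pred = cur + rng.normal(scale=2.0, size=(64, 64))
print(set(classify_blocks(cur, pred).values()))
```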
Empirically Estimable Classification Bounds Based on a Nonparametric Divergence Measure
Berisha, Visar; Wisler, Alan; Hero, Alfred O.; Spanias, Andreas
2015-01-01
Information divergence functions play a critical role in statistics and information theory. In this paper we show that a non-parametric f-divergence measure can be used to provide improved bounds on the minimum binary classification probability of error for the case when the training and test data are drawn from the same distribution and for the case where there exists some mismatch between training and test distributions. We confirm the theoretical results by designing feature selection algorithms using the criteria from these bounds and by evaluating the algorithms on a series of pathological speech classification tasks. PMID:26807014
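A minimal sketch of the Friedman-Rafsky-style estimator this line of work uses for the divergence: build a Euclidean minimum spanning tree over the pooled samples and count edges joining points from different classes; the toy data are synthetic and the plug-in of the estimate into the actual error bounds is left out.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.sparse.csgraph import minimum_spanning_tree

def dp_divergence_estimate(X0, X1):
    """MST-based estimate of the divergence between samples X0 and X1."""
    m, n = len(X0), len(X1)
    pooled = np.vstack([X0, X1])
    labels = np.array([0] * m + [1] * n)
    dist = cdist(pooled, pooled)
    mst = minimum_spanning_tree(dist).tocoo()
    # Number of MST edges connecting points from different samples
    cross = np.sum(labels[mst.row] != labels[mst.col])
    return 1.0 - cross * (m + n) / (2.0 * m * n)

rng = np.random.default_rng(0)
X0 = rng.normal(loc=0.0, size=(100, 2))
X1 = rng.normal(loc=2.0, size=(100, 2))
print(dp_divergence_estimate(X0, X1))   # close to 1 for well-separated classes, near 0 for identical ones
```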
NASA Astrophysics Data System (ADS)
Huo, Ming-Xia; Li, Ying
2017-12-01
Quantum error correction is important to quantum information processing, as it allows us to reliably process information encoded in quantum error correction codes. Efficient quantum error correction benefits from knowledge of the error rates. We propose a protocol for monitoring error rates in real time without interrupting the quantum error correction. No adaptation of the quantum error correction code or its implementation circuit is required. The protocol can be directly applied to the most advanced quantum error correction techniques, e.g. the surface code. A Gaussian process algorithm is used to estimate and predict error rates based on past error correction data. We find that using these estimated error rates, the probability of error correction failures can be significantly reduced by a factor increasing with the code distance.
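A minimal sketch of the idea of estimating and extrapolating a drifting error rate from past error-correction data with a Gaussian process, using scikit-learn's GaussianProcessRegressor; the synthetic drift, kernel, and observation window are all assumptions rather than details of the proposed protocol.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 50)[:, None]                    # time (in units of QEC rounds)
true_rate = 1e-3 * (1 + 0.3 * np.sin(t.ravel()))       # slowly drifting physical error rate
observed = true_rate + rng.normal(scale=5e-5, size=t.shape[0])   # noisy per-window estimates

kernel = RBF(length_scale=2.0) + WhiteKernel(noise_level=1e-9)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(t, observed)

# Predict the current and near-future error rate with an uncertainty estimate
t_future = np.linspace(10, 12, 10)[:, None]
mean, std = gp.predict(t_future, return_std=True)
print(mean[0], std[0])
```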
Comparisons of neural networks to standard techniques for image classification and correlation
NASA Technical Reports Server (NTRS)
Paola, Justin D.; Schowengerdt, Robert A.
1994-01-01
Neural network techniques for multispectral image classification and spatial pattern detection are compared to the standard techniques of maximum-likelihood classification and spatial correlation. The neural network produced a more accurate classification than maximum-likelihood of a Landsat scene of Tucson, Arizona. Some of the errors in the maximum-likelihood classification are illustrated using decision region and class probability density plots. As expected, the main drawback to the neural network method is the long time required for the training stage. The network was trained using several different hidden layer sizes to optimize both the classification accuracy and training speed, and it was found that one node per class was optimal. The performance improved when 3x3 local windows of image data were entered into the net. This modification introduces texture into the classification without explicit calculation of a texture measure. Larger windows were successfully used for the detection of spatial features in Landsat and Magellan synthetic aperture radar imagery.
Code of Federal Regulations, 2011 CFR
2011-07-01
Section 222.68 (Education): ...different classifications of real property are taxed at different rates? If the real property of an LEA and its generally comparable LEAs consists of two or more classifications of real property taxed at...
Kim, Ko Eun; Jeoung, Jin Wook; Park, Ki Ho; Kim, Dong Myung; Kim, Seok Hwan
2015-03-01
To investigate the rate and associated factors of false-positive diagnostic classification of ganglion cell analysis (GCA) and retinal nerve fiber layer (RNFL) maps, and characteristic false-positive patterns on optical coherence tomography (OCT) deviation maps. Prospective, cross-sectional study. A total of 104 healthy eyes of 104 normal participants. All participants underwent peripapillary and macular spectral-domain (Cirrus-HD, Carl Zeiss Meditec Inc, Dublin, CA) OCT scans. False-positive diagnostic classification was defined as yellow or red color-coded areas for GCA and RNFL maps. Univariate and multivariate logistic regression analyses were used to determine associated factors. Eyes with abnormal OCT deviation maps were categorized on the basis of the shape and location of abnormal color-coded area. Differences in clinical characteristics among the subgroups were compared. (1) The rate and associated factors of false-positive OCT maps; (2) patterns of false-positive, color-coded areas on the GCA deviation map and associated clinical characteristics. Of the 104 healthy eyes, 42 (40.4%) and 32 (30.8%) showed abnormal diagnostic classifications on any of the GCA and RNFL maps, respectively. Multivariate analysis revealed that false-positive GCA diagnostic classification was associated with longer axial length and larger fovea-disc angle, whereas longer axial length and smaller disc area were associated with abnormal RNFL maps. Eyes with abnormal GCA deviation map were categorized as group A (donut-shaped round area around the inner annulus), group B (island-like isolated area), and group C (diffuse, circular area with an irregular inner margin in either). The axial length showed a significant increasing trend from group A to C (P=0.001), and likewise, the refractive error was more myopic in group C than in groups A (P=0.015) and B (P=0.014). Group C had thinner average ganglion cell-inner plexiform layer thickness compared with other groups (group A=B>C, P=0.004). Abnormal OCT diagnostic classification should be interpreted with caution, especially in eyes with long axial lengths, large fovea-disc angles, and small optic discs. Our findings suggest that the characteristic patterns of OCT deviation map can provide useful clues to distinguish glaucomatous changes from false-positive findings. Copyright © 2015 American Academy of Ophthalmology. Published by Elsevier Inc. All rights reserved.
Analysis and application of classification methods of complex carbonate reservoirs
NASA Astrophysics Data System (ADS)
Li, Xiongyan; Qin, Ruibao; Ping, Haitao; Wei, Dan; Liu, Xiaomei
2018-06-01
There are abundant carbonate reservoirs from the Cenozoic to Mesozoic era in the Middle East. Due to variation in the sedimentary environment and diagenetic processes, several porosity types coexist in these reservoirs. Because of the complex lithologies and pore types, as well as the impact of microfractures, the pore structure is very complicated, and it is therefore difficult to accurately calculate the reservoir parameters. In order to accurately evaluate carbonate reservoirs, classification methods based on capillary pressure curves and flow units are analyzed, starting from an evaluation of pore structure. Although the carbonate reservoirs can be classified based on capillary pressure curves, the relationship between porosity and permeability after classification is not ideal. On the basis of flow units, a high-precision functional relationship between porosity and permeability can be established after classification. Therefore, the carbonate reservoirs can be quantitatively evaluated based on the classification of flow units. In the dolomite reservoirs, the average absolute error of calculated permeability decreases from 15.13 to 7.44 mD. Similarly, the average absolute error of calculated permeability of limestone reservoirs is reduced from 20.33 to 7.37 mD. Only by accurately characterizing pore structures and classifying reservoir types can reservoir parameters be calculated accurately. Therefore, characterizing pore structures and classifying reservoir types are very important to the accurate evaluation of complex carbonate reservoirs in the Middle East.
NASA Astrophysics Data System (ADS)
Xie, W.-J.; Zhang, L.; Chen, H.-P.; Zhou, J.; Mao, W.-J.
2018-04-01
The purpose of national geographic conditions monitoring is to obtain information on surface changes caused by human social and economic activities, so that geographic information can better serve government, enterprises and the public. Land cover data contain detailed geographic conditions information and have therefore been listed as one of the key products of the national geographic conditions monitoring project. At present, the main issue in producing land cover data is how to improve classification accuracy, and classification accuracy is likewise an important check point for data quality inspection and acceptance. So far, classification accuracy inspection in the project has relied mainly on human-computer interaction or manual checking, which is time consuming and laborious. Using automatic high-resolution remote sensing image change detection based on the ERDAS IMAGINE platform, this paper carries out a classification accuracy inspection test of the project's land cover data and presents a corresponding technical route comprising data pre-processing, change detection, result output and information extraction. The results of the quality inspection test show the effectiveness of the technical route: it meets the inspection needs for the two typical errors, namely missing updates and incorrect updates, effectively reduces the workload of interactive inspection for quality inspectors, and provides a technical reference for the production and quality control of land cover data.
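The paper's workflow runs on the ERDAS IMAGINE platform; the toy sketch below only illustrates the underlying difference-and-threshold idea for flagging pixels whose land cover label may need inspection. The arrays and the two-sigma threshold are assumptions for illustration.

```python
# Toy difference-based change detection for flagging candidate update errors;
# the images and threshold are stand-ins, not the ERDAS IMAGINE workflow.
import numpy as np

epoch1 = np.random.rand(4, 512, 512)   # stand-in for the earlier image (bands, rows, cols)
epoch2 = np.random.rand(4, 512, 512)   # stand-in for the later image

# Per-pixel spectral change magnitude across bands.
change_mag = np.sqrt(((epoch2 - epoch1) ** 2).sum(axis=0))

# Flag pixels above an assumed threshold as candidates for missing/incorrect updates.
threshold = change_mag.mean() + 2 * change_mag.std()
candidates = change_mag > threshold
print(f"{candidates.mean():.1%} of pixels flagged for manual inspection")
```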
Boatin, A A; Cullinane, F; Torloni, M R; Betrán, A P
2018-01-01
In most regions worldwide, caesarean section (CS) rates are increasing. In these settings, new strategies are needed to reduce CS rates. To identify, critically appraise and synthesise studies using the Robson classification as a system to categorise and analyse data in clinical audit cycles to reduce CS rates. Medline, Embase, CINAHL and LILACS were searched from 2001 to 2016. Studies reporting use of the Robson classification to categorise and analyse data in clinical audit cycles to reduce CS rates. Data on study design, interventions used, CS rates, and perinatal outcomes were extracted. Of 385 citations, 30 were assessed for full text review and six studies, conducted in Brazil, Chile, Italy and Sweden, were included. All studies measured initial CS rates, provided feedback and monitored performance using the Robson classification. In two studies, the audit cycle consisted exclusively of feedback using the Robson classification; the other four used audit and feedback as part of a multifaceted intervention. Baseline CS rates ranged from 20 to 36.8%; after the intervention, CS rates ranged from 3.1 to 21.2%. No studies were randomised or controlled and all had a high risk of bias. We identified six studies using the Robson classification within clinical audit cycles to reduce CS rates. All six report reductions in CS rates; however, results should be interpreted with caution because of limited methodological quality. Future trials are needed to evaluate the role of the Robson classification within audit cycles aimed at reducing CS rates. Use of the Robson classification in clinical audit cycles to reduce caesarean rates. © 2017 The Authors. BJOG An International Journal of Obstetrics and Gynaecology published by John Wiley & Sons Ltd on behalf of Royal College of Obstetricians and Gynaecologists.
Bayesian inference for heterogeneous caprock permeability based on above zone pressure monitoring
DOE Office of Scientific and Technical Information (OSTI.GOV)
Namhata, Argha; Small, Mitchell J.; Dilmore, Robert
The presence of faults/fractures or highly permeable zones in the primary sealing caprock of a CO2 storage reservoir can result in leakage of CO2. Monitoring of leakage requires the capability to detect and resolve the onset, location, and volume of leakage in a systematic and timely manner. Pressure-based monitoring possesses such capabilities. This study demonstrates a basis for monitoring network design based on the characterization of CO2 leakage scenarios through an assessment of the integrity and permeability of the caprock inferred from above zone pressure measurements. Four representative heterogeneous fractured seal types are characterized to demonstrate seal permeability ranging from highly permeable to impermeable. Based on Bayesian classification theory, the probability of each fractured caprock scenario given above zone pressure measurements with measurement error is inferred. The sensitivity to injection rate and caprock thickness is also evaluated and the probability of proper classification is calculated. The time required to distinguish between above zone pressure outcomes and the associated leakage scenarios is also computed.
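A minimal sketch of the Bayesian classification step described above: the posterior probability of each caprock scenario given a pressure observation with Gaussian measurement error. The scenario pressure predictions, prior, noise level and observation are hypothetical stand-ins for reservoir model output.

```python
# Bayes classification of caprock leakage scenarios from an above-zone pressure
# measurement with Gaussian error; all numbers are hypothetical stand-ins.
import numpy as np
from scipy.stats import norm

scenarios = {"impermeable": 10.00, "low_perm": 10.15,    # predicted above-zone
             "moderate": 10.60, "high_perm": 11.40}      # pressure (MPa) per scenario
prior = {s: 0.25 for s in scenarios}                     # assumed uniform prior
sigma = 0.10                                             # assumed measurement error (MPa)

observed = 10.55                                         # hypothetical observation

likelihood = {s: norm.pdf(observed, loc=p, scale=sigma) for s, p in scenarios.items()}
evidence = sum(prior[s] * likelihood[s] for s in scenarios)
posterior = {s: prior[s] * likelihood[s] / evidence for s in scenarios}

for s, p in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print(f"P({s} | pressure) = {p:.3f}")
```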
Assessing and minimizing contamination in time of flight based validation data
NASA Astrophysics Data System (ADS)
Lennox, Kristin P.; Rosenfield, Paul; Blair, Brenton; Kaplan, Alan; Ruz, Jaime; Glenn, Andrew; Wurtz, Ronald
2017-10-01
Time of flight experiments are the gold standard method for generating labeled training and testing data for the neutron/gamma pulse shape discrimination problem. As the popularity of supervised classification methods increases in this field, there will also be increasing reliance on time of flight data for algorithm development and evaluation. However, time of flight experiments are subject to various sources of contamination that lead to neutron and gamma pulses being mislabeled. Such labeling errors have a detrimental effect on classification algorithm training and testing, and should therefore be minimized. This paper presents a method for identifying minimally contaminated data sets from time of flight experiments and estimating the residual contamination rate. This method leverages statistical models describing neutron and gamma travel time distributions and is easily implemented using existing statistical software. The method produces a set of optimal intervals that balance the trade-off between interval size and nuisance particle contamination, and its use is demonstrated on a time of flight data set for Cf-252. The particular properties of the optimal intervals for the demonstration data are explored in detail.
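The exact statistical models and optimization used in the paper are not given in the abstract; the sketch below only illustrates the stated trade-off, using assumed Gaussian travel-time models for gammas and neutrons, an assumed mixing fraction, and a symmetric gamma acceptance window whose residual neutron contamination is estimated.

```python
# Sketch of choosing a time-of-flight window for gamma pulses and estimating the
# residual neutron contamination; the Gaussian models and mixing fraction are
# assumptions, not fitted values from the experiment.
from scipy.stats import norm

gamma_tof = norm(loc=3.0, scale=0.4)      # assumed gamma travel-time model (ns)
neutron_tof = norm(loc=30.0, scale=8.0)   # assumed neutron travel-time model (ns)
neutron_fraction = 0.5                    # assumed fraction of neutrons among detected pulses

def contamination(lo, hi):
    """Fraction of pulses inside [lo, hi] that are neutrons (i.e., mislabelled gammas)."""
    g = (1 - neutron_fraction) * (gamma_tof.cdf(hi) - gamma_tof.cdf(lo))
    n = neutron_fraction * (neutron_tof.cdf(hi) - neutron_tof.cdf(lo))
    return n / (g + n)

# Trade-off: widening the window keeps more gammas but admits more neutrons.
for width in (1.0, 2.0, 4.0):
    lo, hi = 3.0 - width, 3.0 + width
    yield_frac = gamma_tof.cdf(hi) - gamma_tof.cdf(lo)
    print(f"window +/-{width} ns: gamma yield {yield_frac:.2f}, "
          f"contamination {contamination(lo, hi):.4f}")
```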
Myint, Soe W.; Yuan, May; Cerveny, Randall S.; Giri, Chandra P.
2008-01-01
Remote sensing techniques have been shown effective for large-scale damage surveys after a hazardous event in both near real-time or post-event analyses. The paper aims to compare accuracy of common imaging processing techniques to detect tornado damage tracks from Landsat TM data. We employed the direct change detection approach using two sets of images acquired before and after the tornado event to produce a principal component composite images and a set of image difference bands. Techniques in the comparison include supervised classification, unsupervised classification, and object-oriented classification approach with a nearest neighbor classifier. Accuracy assessment is based on Kappa coefficient calculated from error matrices which cross tabulate correctly identified cells on the TM image and commission and omission errors in the result. Overall, the Object-oriented Approach exhibits the highest degree of accuracy in tornado damage detection. PCA and Image Differencing methods show comparable outcomes. While selected PCs can improve detection accuracy 5 to 10%, the Object-oriented Approach performs significantly better with 15-20% higher accuracy than the other two techniques. PMID:27879757
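The accuracy assessment above rests on the Kappa coefficient computed from an error matrix; a minimal worked example follows, with hypothetical counts for a two-class (damage / no damage) map.

```python
# Cohen's kappa from an error (confusion) matrix; the counts are hypothetical.
import numpy as np

error_matrix = np.array([[120,  10],    # rows: reference classes
                         [ 15, 255]])   # cols: classified classes (damage / no damage)

n = error_matrix.sum()
observed_agreement = np.trace(error_matrix) / n
expected_agreement = (error_matrix.sum(axis=0) * error_matrix.sum(axis=1)).sum() / n**2
kappa = (observed_agreement - expected_agreement) / (1 - expected_agreement)
print(f"overall accuracy = {observed_agreement:.3f}, kappa = {kappa:.3f}")
```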
NASA Technical Reports Server (NTRS)
Ackleson, S. G.; Klemas, V.
1987-01-01
Landsat MSS and TM imagery, obtained simultaneously over Guinea Marsh, VA, was analyzed and compared for its ability to detect submerged aquatic vegetation (SAV). An unsupervised clustering algorithm was applied to each image, where the input classification parameters are defined as functions of apparent sensor noise. Class confidence and accuracy were computed for all water areas by comparing the classified images, pixel-by-pixel, to rasterized SAV distributions derived from color aerial photography. To illustrate the effect of water depth on classification error, areas of depth greater than 1.9 m were masked, and class confidence and accuracy were recalculated. A single-scattering radiative-transfer model is used to illustrate how percent canopy cover and water depth affect the volume reflectance from a water column containing SAV. For a submerged canopy that is morphologically and optically similar to Zostera marina inhabiting Lower Chesapeake Bay, dense canopies may be isolated by masking optically deep water. For less dense canopies, the effect of increasing water depth is to increase the apparent percent crown cover, which may result in classification error.
Texture analysis improves level set segmentation of the anterior abdominal wall
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xu, Zhoubing; Allen, Wade M.; Baucom, Rebeccah B.
2013-12-15
Purpose: The treatment of ventral hernias (VH) has been a challenging problem for medical care. Repair of these hernias is fraught with failure; recurrence rates ranging from 24% to 43% have been reported, even with the use of biocompatible mesh. Currently, computed tomography (CT) is used to guide intervention through expert, but qualitative, clinical judgments; notably, quantitative metrics based on image processing are not used. The authors propose that image segmentation methods to capture the three-dimensional structure of the abdominal wall and its abnormalities will provide a foundation on which to measure geometric properties of hernias and surrounding tissues and, therefore, to optimize intervention. Methods: In this study with 20 clinically acquired CT scans on postoperative patients, the authors demonstrated a novel approach to geometric classification of the abdominal wall. The authors' approach uses a texture analysis based on Gabor filters to extract feature vectors and follows a fuzzy c-means clustering method to estimate voxelwise probability memberships for eight clusters. The memberships estimated from the texture analysis are helpful to identify anatomical structures with inhomogeneous intensities. The membership was used to guide the level set evolution, as well as to derive an initial start close to the abdominal wall. Results: Segmentation results on abdominal walls were both quantitatively and qualitatively validated with surface errors based on manually labeled ground truth. Using texture, mean surface errors for the outer surface of the abdominal wall were less than 2 mm, with 91% of the outer surface less than 5 mm away from the manual tracings; errors were significantly greater (2-5 mm) for methods that did not use the texture. Conclusions: The authors' approach establishes a baseline for characterizing the abdominal wall for improving VH care. Inherent texture patterns in CT scans are helpful to the tissue classification, and texture analysis can improve the level set segmentation around the abdominal region.
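A compact sketch of the fuzzy c-means step that produces the voxelwise membership maps; the Gabor feature vectors here are random stand-ins and the level-set evolution itself is not shown, so this is an illustration of the clustering idea rather than the authors' pipeline.

```python
# Compact fuzzy c-means on texture feature vectors to obtain voxelwise
# membership maps; features are random stand-ins for Gabor filter responses.
import numpy as np

def fuzzy_cmeans(X, c=8, m=2.0, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(c), size=len(X))           # initial memberships
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]     # weighted cluster centers
        d = np.linalg.norm(X[:, None, :] - centers[None], axis=2) + 1e-12
        U = 1.0 / (d ** (2 / (m - 1)))
        U /= U.sum(axis=1, keepdims=True)                # renormalize memberships
    return U, centers

features = np.random.rand(5000, 12)                      # stand-in Gabor feature vectors
memberships, centers = fuzzy_cmeans(features, c=8)
print(memberships.shape)   # (voxels, clusters) probabilities guiding the level set
```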
Selecting Power-Efficient Signal Features for a Low-Power Fall Detector.
Wang, Changhong; Redmond, Stephen J; Lu, Wei; Stevens, Michael C; Lord, Stephen R; Lovell, Nigel H
2017-11-01
Falls are a serious threat to the health of older people. A wearable fall detector can automatically detect the occurrence of a fall and alert a caregiver or an emergency response service so they may deliver immediate assistance, improving the chances of recovering from fall-related injuries. One constraint of such a wearable technology is its limited battery life. Thus, minimization of power consumption is an important design concern, all the while maintaining satisfactory accuracy of the fall detection algorithms implemented on the wearable device. This paper proposes an approach for selecting power-efficient signal features such that the minimum desirable fall detection accuracy is assured. Using data collected in simulated falls, simulated activities of daily living, and real free-living trials, all using young volunteers, the proposed approach selects four features from a set of ten commonly used features, providing a power saving of 75.3%, while limiting the error rate of a binary classification decision tree fall detection algorithm to 7.1%.
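The selection procedure used by the authors is not detailed in the abstract; the following is a hedged greedy sketch of the general idea: add the cheapest features first until a decision-tree detector meets an error budget. The per-feature power costs, synthetic data and the 10% budget are all hypothetical.

```python
# Greedy sketch of power-aware feature selection for a decision-tree fall
# detector; power costs, data and the error budget are hypothetical.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=600, n_features=10, n_informative=6, random_state=0)
power_cost = np.linspace(1.0, 10.0, 10)     # assumed per-feature power cost, cheapest first
error_budget = 0.10

selected = []
for f in np.argsort(power_cost):             # consider features from cheapest to dearest
    selected.append(f)
    acc = cross_val_score(DecisionTreeClassifier(max_depth=4, random_state=0),
                          X[:, selected], y, cv=5).mean()
    if 1 - acc <= error_budget:
        break

print(f"selected features {selected}, error {1 - acc:.3f}, "
      f"power used {power_cost[selected].sum():.1f} of {power_cost.sum():.1f}")
```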
Development and validation of Aviation Causal Contributors for Error Reporting Systems (ACCERS).
Baker, David P; Krokos, Kelley J
2007-04-01
This investigation sought to develop a reliable and valid classification system for identifying and classifying the underlying causes of pilot errors reported under the Aviation Safety Action Program (ASAP). ASAP is a voluntary safety program that air carriers may establish to study pilot and crew performance on the line. In ASAP programs, similar to the Aviation Safety Reporting System, pilots self-report incidents by filing a short text description of the event. The identification of contributors to errors is critical if organizations are to improve human performance, yet it is difficult for analysts to extract this information from text narratives. A taxonomy was needed that could be used by pilots to classify the causes of errors. After completing a thorough literature review, pilot interviews and a card-sorting task were conducted in Studies 1 and 2 to develop the initial structure of the Aviation Causal Contributors for Event Reporting Systems (ACCERS) taxonomy. The reliability and utility of ACCERS was then tested in studies 3a and 3b by having pilots independently classify the primary and secondary causes of ASAP reports. The results provided initial evidence for the internal and external validity of ACCERS. Pilots were found to demonstrate adequate levels of agreement with respect to their category classifications. ACCERS appears to be a useful system for studying human error captured under pilot ASAP reports. Future work should focus on how ACCERS is organized and whether it can be used or modified to classify human error in ASAP programs for other aviation-related job categories such as dispatchers. Potential applications of this research include systems in which individuals self-report errors and that attempt to extract and classify the causes of those events.
Gender differences in facial emotion recognition in persons with chronic schizophrenia.
Weiss, Elisabeth M; Kohler, Christian G; Brensinger, Colleen M; Bilker, Warren B; Loughead, James; Delazer, Margarete; Nolan, Karen A
2007-03-01
The aim of the present study was to investigate possible sex differences in the recognition of facial expressions of emotion and to investigate the pattern of classification errors in schizophrenic males and females. Such an approach provides an opportunity to inspect the degree to which males and females differ in perceiving and interpreting the different emotions displayed to them and to analyze which emotions are most susceptible to recognition errors. Fifty-six chronically hospitalized schizophrenic patients (38 men and 18 women) completed the Penn Emotion Recognition Test (ER40), a computerized emotion discrimination test presenting 40 color photographs of evoked happy, sad, angry and fearful expressions, plus neutral expressions, balanced for poser gender and ethnicity. We found a significant sex difference in the patterns of error rates in the Penn Emotion Recognition Test. Neutral faces were more commonly mistaken as angry in schizophrenic men, whereas schizophrenic women misinterpreted neutral faces more frequently as sad. Moreover, female faces were better recognized overall, but fear was better recognized in same-gender photographs, whereas anger was better recognized in different-gender photographs. The findings of the present study lend support to the notion that sex differences in aggressive behavior could be related to a cognitive style characterized by hostile attributions to neutral faces in schizophrenic men.
Automated spectral classification and the GAIA project
NASA Technical Reports Server (NTRS)
Lasala, Jerry; Kurtz, Michael J.
1995-01-01
Two-dimensional spectral types for each of the stars observed in the global astrometric interferometer for astrophysics (GAIA) mission would provide additional information for galactic structure and stellar evolution studies, as well as helping in the identification of unusual objects and populations. Classifying the large quantity of spectra generated requires that automated techniques be implemented. Approaches for automatic classification are reviewed, and a metric-distance method is discussed. In tests, the metric-distance method produced spectral types with mean errors comparable to those of human classifiers working at similar resolution. Data and equipment requirements for an automated classification survey are discussed. A program of auxiliary observations is proposed to yield spectral types and radial velocities for the GAIA-observed stars.
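The abstract does not define the metric; as a hedged illustration, the sketch below assigns each spectrum the type of the nearest template under a simple squared-distance metric after normalization. The templates and test spectrum are random stand-ins, not GAIA data.

```python
# Minimal metric-distance classifier: assign each spectrum the type of the
# nearest template; templates and the test spectrum are random stand-ins.
import numpy as np

rng = np.random.default_rng(1)
templates = {"A0V": rng.random(200), "G2V": rng.random(200), "K5III": rng.random(200)}

def classify(spectrum, templates):
    # Normalize to unit mean so the metric compares shape rather than flux level.
    s = spectrum / spectrum.mean()
    dists = {t: np.sum((s - ref / ref.mean()) ** 2) for t, ref in templates.items()}
    return min(dists, key=dists.get)

observed = rng.random(200)
print("assigned type:", classify(observed, templates))
```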
Speaker normalization and adaptation using second-order connectionist networks.
Watrous, R L
1993-01-01
A method for speaker normalization and adaptation using connectionist networks is developed. A speaker-specific linear transformation of observations of the speech signal is computed using second-order network units. Classification is accomplished by a multilayer feedforward network that operates on the normalized speech data. The network is adapted for a new talker by modifying the transformation parameters while leaving the classifier fixed. This is accomplished by backpropagating classification error through the classifier to the second-order transformation units. This method was evaluated for the classification of ten vowels for 76 speakers using the first two formant values of the Peterson-Barney data. The results suggest that rapid speaker adaptation resulting in high classification accuracy can be accomplished by this method.
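A hedged sketch of the adaptation idea: error is backpropagated through a frozen classifier so that only a speaker-specific input transform is updated. For brevity this uses a plain linear transform rather than the paper's second-order units, and random formant-like data as stand-ins.

```python
# Adapt only the input transform for a new talker while the classifier stays
# frozen; linear transform and random data are simplifying stand-ins.
import torch
import torch.nn as nn

classifier = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 10))
for p in classifier.parameters():        # classifier was trained earlier; freeze it
    p.requires_grad_(False)

transform = nn.Linear(2, 2)              # speaker-specific normalization, near identity
nn.init.eye_(transform.weight)
nn.init.zeros_(transform.bias)

x_new = torch.randn(64, 2)               # new talker's formant observations (stand-in)
y_new = torch.randint(0, 10, (64,))      # vowel labels (stand-in)

opt = torch.optim.SGD(transform.parameters(), lr=0.05)
loss_fn = nn.CrossEntropyLoss()
for _ in range(200):
    opt.zero_grad()
    loss = loss_fn(classifier(transform(x_new)), y_new)
    loss.backward()                      # error flows back through the frozen classifier
    opt.step()                           # ...but only the transform parameters move
print(f"adaptation loss: {loss.item():.3f}")
```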
Errors in imaging patients in the emergency setting.
Pinto, Antonio; Reginelli, Alfonso; Pinto, Fabio; Lo Re, Giuseppe; Midiri, Federico; Muzj, Carlo; Romano, Luigia; Brunese, Luca
2016-01-01
Emergency and trauma care produces a "perfect storm" for radiological errors: uncooperative patients, inadequate histories, time-critical decisions, concurrent tasks and often junior personnel working after hours in busy emergency departments. The main cause of diagnostic errors in the emergency department is the failure to correctly interpret radiographs, and the majority of diagnoses missed on radiographs are fractures. Missed diagnoses potentially have important consequences for patients, clinicians and radiologists. Radiologists play a pivotal role in the diagnostic assessment of polytrauma patients and of patients with non-traumatic craniothoracoabdominal emergencies, and key elements to reduce errors in the emergency setting are knowledge, experience and the correct application of imaging protocols. This article aims to highlight the definition and classification of errors in radiology, the causes of errors in emergency radiology and the spectrum of diagnostic errors in radiography, ultrasonography and CT in the emergency setting.
A user-friendly SSVEP-based brain-computer interface using a time-domain classifier.
Luo, An; Sullivan, Thomas J
2010-04-01
We introduce a user-friendly steady-state visual evoked potential (SSVEP)-based brain-computer interface (BCI) system. Single-channel EEG is recorded using a low-noise dry electrode. Compared to traditional gel-based multi-sensor EEG systems, a dry sensor proves to be more convenient, comfortable and cost effective. A hardware system was built that displays four LED light panels flashing at different frequencies and synchronizes with EEG acquisition. The visual stimuli have been carefully designed such that potential risk to photosensitive people is minimized. We describe a novel stimulus-locked inter-trace correlation (SLIC) method for SSVEP classification using EEG time-locked to stimulus onsets. We studied how the performance of the algorithm is affected by different selection of parameters. Using the SLIC method, the average light detection rate is 75.8% with very low error rates (an 8.4% false positive rate and a 1.3% misclassification rate). Compared to a traditional frequency-domain-based method, the SLIC method is more robust (resulting in less annoyance to the users) and is also suitable for irregular stimulus patterns.
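A simplified, hedged sketch of the stimulus-locked inter-trace correlation (SLIC) idea: epoch the EEG at each LED's onset times and score the mean pairwise correlation of the traces, choosing the stimulus with the highest score. The sampling rate, frequencies, window and synthetic EEG are assumptions, and the published method includes refinements not shown here.

```python
# Simplified SLIC: score each candidate stimulus by the mean pairwise
# correlation of EEG traces time-locked to its onsets; data are synthetic.
import numpy as np

fs, duration = 240, 8.0
t = np.arange(0, duration, 1 / fs)
eeg = np.sin(2 * np.pi * 10 * t) + 0.8 * np.random.randn(t.size)   # pretend 10 Hz SSVEP

def slic_score(eeg, onsets, win):
    traces = np.array([eeg[o:o + win] for o in onsets if o + win <= eeg.size])
    corr = np.corrcoef(traces)
    iu = np.triu_indices_from(corr, k=1)
    return corr[iu].mean()        # high when traces repeat in lock-step with the stimulus

win = int(0.2 * fs)
scores = {}
for f in (8, 10, 12, 15):          # candidate flash frequencies (divisors of fs)
    onsets = np.arange(0, eeg.size, fs // f)
    scores[f] = slic_score(eeg, onsets, win)
print("detected stimulus:", max(scores, key=scores.get), scores)
```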
48 CFR 47.305-9 - Commodity description and freight classification.
Code of Federal Regulations, 2010 CFR
2010-10-01
... freight classification. 47.305-9 Section 47.305-9 Federal Acquisition Regulations System FEDERAL... Commodity description and freight classification. (a) Generally, the freight rate for supplies is based on the rating applicable to the freight classification description published in the National Motor...
Stinchfield, Randy; McCready, John; Turner, Nigel E; Jimenez-Murcia, Susana; Petry, Nancy M; Grant, Jon; Welte, John; Chapman, Heather; Winters, Ken C
2016-09-01
The DSM-5 was published in 2013 and it included two substantive revisions for gambling disorder (GD). These changes are the reduction in the threshold from five to four criteria and elimination of the illegal activities criterion. The purpose of this study was twofold: first, to assess the reliability, validity and classification accuracy of the DSM-5 diagnostic criteria for GD; second, to compare the DSM-5 and DSM-IV on reliability, validity, and classification accuracy, including an examination of the effect of the elimination of the illegal acts criterion on diagnostic accuracy. To compare DSM-5 and DSM-IV, eight datasets from three different countries (Canada, USA, and Spain; total N = 3247) were used. All datasets were based on similar research methods. Participants were recruited from outpatient gambling treatment services to represent the group with a GD and from the community to represent the group without a GD. All participants were administered a standardized measure of diagnostic criteria. The DSM-5 yielded satisfactory reliability, validity and classification accuracy. In comparing the DSM-5 to the DSM-IV, most comparisons of reliability, validity and classification accuracy showed more similarities than differences. There was evidence of modest improvements in classification accuracy for DSM-5 over DSM-IV, particularly in reduction of false negative errors. This reduction in false negative errors was largely a function of lowering the cut score from five to four and this revision is an improvement over DSM-IV. From a statistical standpoint, eliminating the illegal acts criterion did not make a significant impact on diagnostic accuracy. From a clinical standpoint, illegal acts can still be addressed in the context of the DSM-5 criterion of lying to others.
Olives, Casey; Valadez, Joseph J; Brooker, Simon J; Pagano, Marcello
2012-01-01
Originally a binary classifier, Lot Quality Assurance Sampling (LQAS) has proven to be a useful tool for classification of the prevalence of Schistosoma mansoni into multiple categories (≤10%, >10 and <50%, ≥50%), and semi-curtailed sampling has been shown to effectively reduce the number of observations needed to reach a decision. To date, the statistical underpinnings for Multiple Category-LQAS (MC-LQAS) have not received full treatment. We explore the analytical properties of MC-LQAS, and validate its use for the classification of S. mansoni prevalence in multiple settings in East Africa. We outline MC-LQAS design principles and formulae for operating characteristic curves. In addition, we derive the average sample number for MC-LQAS when utilizing semi-curtailed sampling and introduce curtailed sampling in this setting. We also assess the performance of MC-LQAS designs with maximum sample sizes of n=15 and n=25 via a weighted kappa-statistic using S. mansoni data collected in 388 schools from four studies in East Africa. Overall performance of MC-LQAS classification was high (kappa-statistic of 0.87). In three of the studies, the kappa-statistic for a design with n=15 was greater than 0.75. In the fourth study, where these designs performed poorly (kappa-statistic less than 0.50), the majority of observations fell in regions where potential error is known to be high. Employment of semi-curtailed and curtailed sampling further reduced the sample size by as many as 0.5 and 3.5 observations per school, respectively, without increasing classification error. This work provides the needed analytics to understand the properties of MC-LQAS for assessing the prevalence of S. mansoni and shows that in most settings a sample size of 15 children provides a reliable classification of schools.
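A hedged sketch of what an operating characteristic for a three-category LQAS rule looks like: with a fixed sample of n=15 children and illustrative decision thresholds (not the authors' design), the probability of each classification is computed from the binomial distribution as a function of the true prevalence.

```python
# Three-category LQAS rule (<=10%, 10-50%, >=50%) with n = 15 and illustrative
# thresholds; the operating characteristic follows from the binomial.
from scipy.stats import binom

n, d1, d2 = 15, 2, 7     # assumed cutoffs: low if <=2 positives, high if >=8, else moderate

def operating_characteristic(p):
    low = binom.cdf(d1, n, p)
    high = 1 - binom.cdf(d2, n, p)
    return {"low": low, "moderate": 1 - low - high, "high": high}

for p in (0.05, 0.10, 0.30, 0.50, 0.70):
    oc = operating_characteristic(p)
    print(f"true prevalence {p:.2f}: " +
          ", ".join(f"P({k})={v:.2f}" for k, v in oc.items()))
```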
Nineteen hundred seventy three significant accomplishments. [Landsat satellite data applications
NASA Technical Reports Server (NTRS)
1974-01-01
Data collected by the Skylab remote sensing satellites was used to develop applications techniques and to combine automatic data classification with statistical clustering methods. Continuing research was concentrated in the correlation and registration of data products and in the definition of the atmospheric effects on remote sensing. The causes of errors encountered in the automated classification of agricultural data are identified. Other applications in forestry, geography, environmental geology, and land use are discussed.
Fergus, Paul; Hussain, Abir; Al-Jumeily, Dhiya; Huang, De-Shuang; Bouguila, Nizar
2017-07-06
Visual inspection of cardiotocography traces by obstetricians and midwives is the gold standard for monitoring the wellbeing of the foetus during antenatal care. However, inter- and intra-observer variability is high, with only a 30% positive predictive value for the classification of pathological outcomes. This has a significant negative impact on the perinatal foetus and often results in cardio-pulmonary arrest, brain and vital organ damage, cerebral palsy, hearing, visual and cognitive defects and, in severe cases, death. This paper shows that using machine learning and foetal heart rate signals provides direct information about the foetal state and helps to filter the subjective opinions of medical practitioners when used as a decision support tool. The primary aim is to provide a proof-of-concept that demonstrates how machine learning can be used to objectively determine when medical intervention, such as caesarean section, is required and help avoid preventable perinatal deaths. This is evidenced using an open dataset that comprises 506 controls (normal vaginal deliveries) and 46 cases (caesarean due to pH ≤ 7.20-acidosis, n = 18; pH > 7.20 and pH < 7.25-foetal deterioration, n = 4; or clinical decision without evidence of pathological outcome measures, n = 24). Several machine-learning algorithms are trained, and validated, using binary classifier performance measures. The findings show that deep learning classification achieves sensitivity = 94%, specificity = 91%, area under the curve = 99%, F-score = 100%, and mean square error = 1%. The results demonstrate that machine learning significantly improves the efficiency for the detection of caesarean section and normal vaginal deliveries using foetal heart rate signals compared with obstetrician and midwife predictions and systems reported in previous studies.
Cohen, Aaron M
2008-01-01
We participated in the i2b2 smoking status classification challenge task. The purpose of this task was to evaluate the ability of systems to automatically identify patient smoking status from discharge summaries. Our submission included several techniques that we compared and studied, including hot-spot identification, zero-vector filtering, inverse class frequency weighting, error-correcting output codes, and post-processing rules. We evaluated our approaches using the same methods as the i2b2 task organizers, using micro- and macro-averaged F1 as the primary performance metric. Our best performing system achieved a micro-F1 of 0.9000 on the test collection, equivalent to the best performing system submitted to the i2b2 challenge. Hot-spot identification, zero-vector filtering, classifier weighting, and error correcting output coding contributed additively to increased performance, with hot-spot identification having by far the largest positive effect. High performance on automatic identification of patient smoking status from discharge summaries is achievable with the efficient and straightforward machine learning techniques studied here.
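Error-correcting output codes, one of the techniques credited above, are available off the shelf in scikit-learn; the sketch below shows the idea on synthetic multi-class data with a linear base classifier, which is not the authors' exact pipeline or feature set.

```python
# Minimal error-correcting output codes example with scikit-learn; the synthetic
# features stand in for bag-of-words vectors and this is not the i2b2 system.
from sklearn.datasets import make_classification
from sklearn.multiclass import OutputCodeClassifier
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=400, n_features=50, n_informative=20,
                           n_classes=4, n_clusters_per_class=1, random_state=0)

ecoc = OutputCodeClassifier(LinearSVC(), code_size=2.0, random_state=0)
print("mean CV accuracy:", cross_val_score(ecoc, X, y, cv=5).mean())
```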
Hybrid Clustering-GWO-NARX neural network technique in predicting stock price
NASA Astrophysics Data System (ADS)
Das, Debashish; Safa Sadiq, Ali; Mirjalili, Seyedali; Noraziah, A.
2017-09-01
Stock price prediction is one of the most challenging tasks owing to the nonlinear nature of stock data. Although numerous attempts have been made to predict stock prices with various techniques, the predicted prices are not always accurate and the error rates remain high. Consequently, this paper seeks an efficient stock prediction strategy by combining the Grey Wolf Optimizer (GWO), clustering and the Nonlinear Autoregressive Exogenous (NARX) technique. The study uses stock data from prominent markets, the New York Stock Exchange (NYSE) and NASDAQ, and from emerging markets, the Malaysian Stock Market (Bursa Malaysia) and the Dhaka Stock Exchange (DSE). It applies the K-means clustering algorithm to determine the most promising cluster, then MGWO is used to determine the classification rate, and finally the stock price is predicted with the NARX neural network algorithm. The prediction performance obtained experimentally is compared and assessed to guide investors in making investment decisions. The results of this hybrid Clustering-GWO-NARX neural network technique are promising, showing accurate prediction and an improved error rate. In future work we intend to examine the effect of various factors on stock price movement and on the selection of parameters, to investigate the influence of positive or negative company news on stock price movement, and to predict stock indices.
Federal Register 2010, 2011, 2012, 2013, 2014
2010-11-08
... Authorization of Additional Classification and Rate, Standard Form 1444 AGENCY: Department of Defense (DOD... of Additional Classification and Rate, Standard Form 1444. DATES: Comments may be submitted on or.../or business confidential information provided. FOR FURTHER INFORMATION CONTACT: Mr. Ernest Woodson...
Code of Federal Regulations, 2011 CFR
2011-10-01
..., charges, classifications, rules or regulations. 565.9 Section 565.9 Shipping FEDERAL MARITIME COMMISSION... Commission review, suspension and prohibition of rates, charges, classifications, rules or regulations. (a)(1..., charges, classifications, rules or regulations) from the Commission, each controlled carrier shall file a...
Code of Federal Regulations, 2010 CFR
2010-10-01
..., charges, classifications, rules or regulations. 565.9 Section 565.9 Shipping FEDERAL MARITIME COMMISSION... Commission review, suspension and prohibition of rates, charges, classifications, rules or regulations. (a)(1..., charges, classifications, rules or regulations) from the Commission, each controlled carrier shall file a...
Federal Register 2010, 2011, 2012, 2013, 2014
2013-03-26
...-AM78 Prevailing Rate Systems; North American Industry Classification System Based Federal Wage System... 2007 North American Industry Classification System (NAICS) codes currently used in Federal Wage System... (OPM) issued a final rule (73 FR 45853) to update the 2002 North American Industry Classification...
76 FR 53699 - Labor Surplus Area Classification Under Executive Orders 12073 and 10582
Federal Register 2010, 2011, 2012, 2013, 2014
2011-08-29
... DEPARTMENT OF LABOR Employment and Training Administration Labor Surplus Area Classification Under... estimates provided to ETA by the Bureau of Labor Statistics are used in making these classifications. The... classification criteria include a ``floor unemployment rate'' (6.0%) and a ``ceiling unemployment rate'' (10.0...
A classification on human factor accident/incident of China civil aviation in recent twelve years.
Luo, Xiao-li
2004-10-01
To study human factor accidents and incidents that occurred during 1990-2001 using a new classification standard. The human factor accident/incident classification standard was developed on the basis of Reason's model, combined with CAAC's traditional classification method, and applied to a classified statistical analysis of 361 flying incidents and 35 flight accidents of China civil aviation that were induced by human factors and occurred from 1990 to 2001. 1) The incident percentage during taxi and cruise is higher than that during takeoff, climb and descent. 2) The dominant types of flight incident are deviation from the runway, overrunning, near-miss, tail/wingtip/engine strike and ground obstacle impact. 3) The top three accident types are loss of control caused by the crew, collision with mountains and runway overrun. 4) Crews' basic operating skill is lower than expected, shown most often as poor ability to correct flight errors once they occur. 5) Crew errors mainly take the form of incorrect control inputs, violations of regulations and procedures, disorientation and deviation from the correct flight level. Poor CRM skill is the dominant factor affecting China civil aviation safety; this result is consistent with previous studies, but the most affected flight phases, the types of crew error and the behavioral performance differ markedly from those reported for more advanced countries. CRM training for all pilots should be strengthened, tailored to the behavioral characteristics of Chinese pilots, in order to improve the safety level of China civil aviation.
Bias in error estimation when using cross-validation for model selection.
Varma, Sudhir; Simon, Richard
2006-02-23
Cross-validation (CV) is an effective method for estimating the prediction error of a classifier. Some recent articles have proposed methods for optimizing classifiers by choosing classifier parameter values that minimize the CV error estimate. We have evaluated the validity of using the CV error estimate of the optimized classifier as an estimate of the true error expected on independent data. We used CV to optimize the classification parameters for two kinds of classifiers; Shrunken Centroids and Support Vector Machines (SVM). Random training datasets were created, with no difference in the distribution of the features between the two classes. Using these "null" datasets, we selected classifier parameter values that minimized the CV error estimate. 10-fold CV was used for Shrunken Centroids while Leave-One-Out-CV (LOOCV) was used for the SVM. Independent test data was created to estimate the true error. With "null" and "non null" (with differential expression between the classes) data, we also tested a nested CV procedure, where an inner CV loop is used to perform the tuning of the parameters while an outer CV is used to compute an estimate of the error. The CV error estimate for the classifier with the optimal parameters was found to be a substantially biased estimate of the true error that the classifier would incur on independent data. Even though there is no real difference between the two classes for the "null" datasets, the CV error estimate for the Shrunken Centroid with the optimal parameters was less than 30% on 18.5% of simulated training data-sets. For SVM with optimal parameters the estimated error rate was less than 30% on 38% of "null" data-sets. Performance of the optimized classifiers on the independent test set was no better than chance. The nested CV procedure reduces the bias considerably and gives an estimate of the error that is very close to that obtained on the independent testing set for both Shrunken Centroids and SVM classifiers for "null" and "non-null" data distributions. We show that using CV to compute an error estimate for a classifier that has itself been tuned using CV gives a significantly biased estimate of the true error. Proper use of CV for estimating true error of a classifier developed using a well defined algorithm requires that all steps of the algorithm, including classifier parameter tuning, be repeated in each CV loop. A nested CV procedure provides an almost unbiased estimate of the true error.
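The nested cross-validation procedure described above is easy to reproduce with scikit-learn: an inner grid search tunes the classifier while an outer loop estimates error on held-out folds. The sketch below uses a synthetic "null" dataset (no real class difference), so the nested estimate should hover near chance while the naive tuned-CV score is optimistically biased.

```python
# Nested CV sketch: inner GridSearchCV tunes the SVM, outer loop estimates error;
# the "null" data below have no real class difference.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 500))            # null data: features unrelated to labels
y = np.repeat([0, 1], 30)

inner = GridSearchCV(SVC(kernel="linear"), param_grid={"C": [0.01, 0.1, 1, 10]}, cv=5)
naive = inner.fit(X, y).best_score_                    # optimistically biased estimate
nested = cross_val_score(inner, X, y, cv=5).mean()      # nearly unbiased estimate

print(f"naive CV accuracy {naive:.2f} vs nested CV accuracy {nested:.2f}")
```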
Federal Register 2010, 2011, 2012, 2013, 2014
2011-01-31
... for Authorization of Additional Classification and Rate, Standard Form 1444 AGENCIES: Department of... Request for Authorization of Additional Classification and Rate, Standard Form 1444. A notice published in... personal and/or business confidential information provided. FOR FURTHER INFORMATION CONTACT: Ms. Clare...
Automatic feature design for optical character recognition using an evolutionary search procedure.
Stentiford, F W
1985-03-01
An automatic evolutionary search is applied to the problem of feature extraction in an OCR application. A performance measure based on feature independence is used to generate features which do not appear to suffer from peaking effects [17]. Features are extracted from a training set of 30 600 machine printed 34 class alphanumeric characters derived from British mail. Classification results on the training set and a test set of 10 200 characters are reported for an increasing number of features. A 1.01 percent forced decision error rate is obtained on the test data using 316 features. The hardware implementation should be cheap and fast to operate. The performance compares favorably with current low cost OCR page readers.
Luo, Lei; Yang, Jian; Qian, Jianjun; Tai, Ying; Lu, Gui-Fu
2017-09-01
Dealing with partial occlusion or illumination changes is one of the most challenging problems in image representation and classification. In this problem, the characterization of the representation error plays a crucial role. In most current approaches, the error matrix needs to be stretched into a vector and each element is assumed to be independently corrupted. This ignores the dependence between the elements of the error. In this paper, it is assumed that the error image caused by partial occlusion or illumination changes is a random matrix variate and follows the extended matrix variate power exponential distribution. This distribution has heavy-tailed regions and can be used to describe a matrix pattern of l×m-dimensional observations that are not independent. This paper reveals the essence of the proposed distribution: it actually alleviates the correlations between pixels in an error matrix E and makes E approximately Gaussian. On the basis of this distribution, we derive a Schatten p-norm-based matrix regression model with Lq regularization. The alternating direction method of multipliers is applied to solve this model. To get a closed-form solution in each step of the algorithm, two singular value function thresholding operators are introduced. In addition, the extended Schatten p-norm is utilized to characterize the distance between the test samples and classes in the design of the classifier. Extensive experimental results for image reconstruction and classification with structural noise demonstrate that the proposed algorithm works much more robustly than some existing regression-based methods.
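For readers unfamiliar with singular value thresholding operators, the sketch below shows the simplest case, soft-thresholding of singular values (the p = 1, nuclear-norm case); the general Schatten-p operators introduced in the paper are not reproduced here.

```python
# Singular value soft-thresholding, the nuclear-norm special case of the
# singular value function thresholding used inside ADMM-style iterations.
import numpy as np

def svt(M, tau):
    """Shrink the singular values of M by tau (soft threshold)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

E = np.random.randn(8, 6)                 # stand-in error matrix
E_shrunk = svt(E, tau=1.5)
print(np.linalg.matrix_rank(E_shrunk))    # thresholding typically reduces the rank
```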
Berger, Rachel P; Parks, Sharyn; Fromkin, Janet; Rubin, Pamela; Pecora, Peter J
2015-04-01
To assess the accuracy of an International Classification of Diseases (ICD) code-based operational case definition for abusive head trauma (AHT). Subjects were children <5 years of age evaluated for AHT by a hospital-based Child Protection Team (CPT) at a tertiary care paediatric hospital with a completely electronic medical record (EMR) system. Subjects were designated as non-AHT traumatic brain injury (TBI) or AHT based on whether the CPT determined that the injuries were due to AHT. The sensitivity and specificity of the ICD-based definition were calculated. There were 223 children evaluated for AHT: 117 AHT and 106 non-AHT TBI. The sensitivity and specificity of the ICD-based operational case definition were 92% (95% CI 85.8 to 96.2) and 96% (95% CI 92.3 to 99.7), respectively. All errors in sensitivity and three of the four specificity errors were due to coder error; one specificity error was a physician error. In a paediatric tertiary care hospital with an EMR system, the accuracy of an ICD-based case definition for AHT was high. Additional studies are needed to assess the accuracy of this definition in all types of hospitals in which children with AHT are cared for. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
NASA Technical Reports Server (NTRS)
Fraser, R. S.; Bahethi, O. P.; Al-Abbas, A. H.
1977-01-01
The effect of differences in atmospheric turbidity on the classification of Landsat 1 observations of a rural scene is presented. The observations are classified by an unsupervised clustering technique. These clusters serve as a training set for use of a maximum-likelihood algorithm. The measured radiances in each of the four spectral bands are then changed by amounts measured by Landsat 1. These changes can be associated with a decrease in atmospheric turbidity by a factor of 1.3. The classification of 22% of the pixels changes as a result of the modification. The modified observations are then reclassified as an independent set. Only 3% of the pixels have a different classification than the unmodified set. Hence, if classification errors of rural areas are not to exceed 15%, a new training set has to be developed whenever the difference in turbidity between the training and test sets reaches unity.
Multinomial mixture model with heterogeneous classification probabilities
Holland, M.D.; Gray, B.R.
2011-01-01
Royle and Link (Ecology 86(9):2505-2512, 2005) proposed an analytical method that allowed estimation of multinomial distribution parameters and classification probabilities from categorical data measured with error. While useful, we demonstrate algebraically and by simulations that this method yields biased multinomial parameter estimates when the probabilities of correct category classifications vary among sampling units. We address this shortcoming by treating these probabilities as logit-normal random variables within a Bayesian framework. We use Markov chain Monte Carlo to compute Bayes estimates from a simulated sample from the posterior distribution. Based on simulations, this elaborated Royle-Link model yields nearly unbiased estimates of multinomial and correct classification probability estimates when classification probabilities are allowed to vary according to the normal distribution on the logit scale or according to the Beta distribution. The method is illustrated using categorical submersed aquatic vegetation data. © 2010 Springer Science+Business Media, LLC.
NASA Astrophysics Data System (ADS)
Vahidi, Vahid; Saberinia, Ebrahim; Regentova, Emma E.
2017-10-01
A channel estimation (CE) method based on compressed sensing (CS) is proposed to estimate the sparse and doubly selective (DS) channel for hyperspectral image transmission from unmanned aerial vehicles to ground stations. The proposed method contains three steps: (1) a priori estimation of the channel by orthogonal matching pursuit (OMP), (2) calculation of the linear minimum mean square error (LMMSE) estimate of the received pilots given the estimated channel, and (3) estimation of the complex amplitudes and Doppler shifts of the channel using the enhanced received pilot data and a second round of a CS algorithm. The proposed method is named DS-LMMSE-OMP, and its performance is evaluated by simulating transmission of AVIRIS hyperspectral data via the communication channel and assessing their fidelity for automated analysis after demodulation. The performance of the DS-LMMSE-OMP approach is compared with that of two other state-of-the-art CE methods. The simulation results show up to an 8-dB improvement in bit error rate performance and a 50% improvement in hyperspectral image classification accuracy.
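A toy version of step (1) only: an OMP estimate of a sparse channel from noisy pilot observations. Real OFDM channels are complex-valued and the LMMSE refinement and Doppler estimation steps are omitted, so the measurement matrix, sparsity level and noise below are illustrative assumptions.

```python
# Toy OMP recovery of a sparse channel from noisy pilot observations.
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
n_pilots, n_taps, sparsity = 64, 128, 4

A = rng.normal(size=(n_pilots, n_taps))          # pilot measurement matrix (stand-in)
h = np.zeros(n_taps)
h[rng.choice(n_taps, sparsity, replace=False)] = rng.normal(size=sparsity)  # sparse channel
y = A @ h + 0.05 * rng.normal(size=n_pilots)     # received pilots with noise

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=sparsity).fit(A, y)
print("support recovered:", sorted(np.flatnonzero(omp.coef_)),
      "true:", sorted(np.flatnonzero(h)))
```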
Stretchy binary classification.
Toh, Kar-Ann; Lin, Zhiping; Sun, Lei; Li, Zhengguo
2018-01-01
In this article, we introduce an analytic formulation for compressive binary classification. The formulation seeks to minimize the ℓp-norm of the parameter vector subject to a classification error constraint. An analytic and stretchable estimation is conjectured where the estimation can be viewed as an extension of the pseudoinverse with left and right constructions. Our variance analysis indicates that the estimation based on the left pseudoinverse is unbiased and the estimation based on the right pseudoinverse is biased. Sparseness can be obtained for the biased estimation under certain mild conditions. The proposed estimation is investigated numerically using both synthetic and real-world data. Copyright © 2017 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Hramov, Alexander E.; Frolov, Nikita S.; Musatov, Vyachaslav Yu.
2018-02-01
In the present work we studied the classification of human brain states corresponding to real hand and leg movements. For this purpose we used a supervised learning algorithm based on feed-forward artificial neural networks (ANNs) with error back-propagation, along with the support vector machine (SVM) method. We compared the quality of operator movement classification from experimentally recorded EEG signals without preliminary processing and after filtering in different ranges up to 25 Hz. It was shown that low-frequency filtering of multichannel EEG data significantly improved the accuracy of operator movement classification.
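A hedged sketch of the filtering-plus-ANN pipeline: low-pass filter multichannel EEG below 25 Hz, then train a feed-forward network on the flattened epochs. The epochs and labels below are random stand-ins, and the SVM comparison is omitted.

```python
# Low-pass filter multichannel EEG below 25 Hz, then classify epochs with an MLP;
# the EEG epochs and movement labels are random stand-ins.
import numpy as np
from scipy.signal import butter, filtfilt
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

fs = 250
epochs = np.random.randn(120, 8, fs)          # (trials, channels, samples), stand-in EEG
labels = np.random.randint(0, 2, 120)         # hand vs leg movement labels (stand-in)

b, a = butter(4, 25, btype="low", fs=fs)      # 4th-order low-pass at 25 Hz
filtered = filtfilt(b, a, epochs, axis=-1)

X = filtered.reshape(len(labels), -1)         # flatten channels x samples per trial
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
print("CV accuracy:", cross_val_score(clf, X, labels, cv=5).mean())
```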
GDF v2.0, an enhanced version of GDF
NASA Astrophysics Data System (ADS)
Tsoulos, Ioannis G.; Gavrilis, Dimitris; Dermatas, Evangelos
2007-12-01
An improved version of the function estimation program GDF is presented. The main enhancements of the new version include multi-output function estimation, the capability of defining custom functions in the grammar, and selection of the error function. The new version has been evaluated on a series of classification and regression datasets that are widely used for the evaluation of such methods. It is compared to two known neural networks and outperforms them in 5 (out of 10) datasets. Program summary. Title of program: GDF v2.0. Catalogue identifier: ADXC_v2_0. Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADXC_v2_0.html. Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland. Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html. No. of lines in distributed program, including test data, etc.: 98 147. No. of bytes in distributed program, including test data, etc.: 2 040 684. Distribution format: tar.gz. Programming language: GNU C++. Computer: the program is designed to be portable to all systems running the GNU C++ compiler. Operating system: Linux, Solaris, FreeBSD. RAM: 200000 bytes. Classification: 4.9. Does the new version supersede the previous version?: Yes. Nature of problem: the technique of function estimation tries to discover, from a series of input data, a functional form that best describes them. This can be performed with the use of parametric models, whose parameters can adapt according to the input data. Solution method: functional forms are created by genetic programming as approximate solutions to the symbolic regression problem. Reasons for new version: the GDF package was extended in order to be more flexible and user-customizable than the old package. The user can extend the package by defining custom error functions and can extend the grammar of the package by adding new functions to the function repertoire. Also, the new version can perform function estimation of multi-output functions and can be used for classification problems. Summary of revisions: the following features have been added to the package GDF. Multi-output function approximation: the package can now approximate any function f:Rn→Rm, which also gives it the capability of performing classification and not only regression. User-defined functions can be added to the repertoire of the grammar, extending the regression capabilities of the package; this feature is limited to 3 functions, but this number can easily be increased. Capability of selecting the error function: apart from the mean square error, the package now offers other error functions such as the mean absolute square error and the maximum square error, and user-defined error functions can be added to the set of error functions. More verbose output: the main program displays more information to the user as well as the default values for the parameters, and the package lets the user define an output file where the output of the gdf program for the testing set is stored after the termination of the process. Additional comments: a technical report describing the revisions, experiments and test runs is packaged with the source code. Running time: depends on the training data.
Ground truth management system to support multispectral scanner /MSS/ digital analysis
NASA Technical Reports Server (NTRS)
Coiner, J. C.; Ungar, S. G.
1977-01-01
A computerized geographic information system for management of ground truth has been designed and implemented to relate MSS classification results to in situ observations. The ground truth system transforms, generalizes and rectifies ground observations to conform to the pixel size and shape of high resolution MSS aircraft data. These observations can then be aggregated for comparison to lower resolution sensor data. Construction of a digital ground truth array allows direct pixel by pixel comparison between classification results of MSS data and ground truth. By making comparisons, analysts can identify spatial distribution of error within the MSS data as well as usual figures of merit for the classifications. Use of the ground truth system permits investigators to compare a variety of environmental or anthropogenic data, such as soil color or tillage patterns, with classification results and allows direct inclusion of such data into classification operations. To illustrate the system, examples from classification of simulated Thematic Mapper data for agricultural test sites in North Dakota and Kansas are provided.
NASA Astrophysics Data System (ADS)
Iwahashi, J.; Yamazaki, D.; Matsuoka, M.; Thamarux, P.; Herrick, J.; Yong, A.; Mital, U.
2017-12-01
A seamless model of landform classifications with regional accuracy will be a powerful platform for geophysical studies that forecast geologic hazards. Spatial variability as a function of landform on a global scale was captured in the automated classifications of Iwahashi and Pike (2007) and additional developments are presented here that incorporate more accurate depictions using higher-resolution elevation data than the original 1-km scale Shuttle Radar Topography Mission digital elevation model (DEM). We create polygon-based terrain classifications globally by using the 280-m DEM interpolated from the Multi-Error-Removed Improved-Terrain DEM (MERIT; Yamazaki et al., 2017). The multi-scale pixel-image analysis method, known as Multi-resolution Segmentation (Baatz and Schäpe, 2000), is first used to classify the terrains based on geometric signatures (slope and local convexity) calculated from the 280-m DEM. Next, we apply the machine learning method of "k-means clustering" to prepare the polygon-based classification at the globe-scale using slope, local convexity and surface texture. We then group the divisions with similar properties by hierarchical clustering and other statistical analyses using geological and geomorphological data of the area where landslides and earthquakes are frequent (e.g. Japan and California). We find the 280-m DEM resolution is only partially sufficient for classifying plains. We nevertheless observe that the categories correspond to reported landslide and liquefaction features at the global scale, suggesting that our model is an appropriate platform to forecast ground failure. To predict seismic amplification, we estimate site conditions using the time-averaged shear-wave velocity in the upper 30-m (VS30) measurements compiled by Yong et al. (2016) and the terrain model developed by Yong (2016; Y16). We plan to test our method on finer resolution DEMs and report our findings to obtain a more globally consistent terrain model as there are known errors in DEM derivatives at higher-resolutions. We expect the improvement in DEM resolution (4 times greater detail) and the combination of regional and global coverage will yield a consistent dataset of polygons that have the potential to improve relations to the Y16 estimates significantly.
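A hedged sketch of the k-means step only: clustering terrain units by slope, local convexity and surface texture. The three rasters below are random stand-ins for values derived from the 280-m MERIT DEM, and the Multi-resolution Segmentation and hierarchical grouping steps are omitted.

```python
# Cluster terrain by slope, local convexity and surface texture with k-means;
# the rasters are random stand-ins for DEM-derived layers.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rows, cols = 200, 200
slope = np.random.gamma(2.0, 2.0, (rows, cols))        # stand-in slope (degrees)
convexity = np.random.normal(0.0, 1.0, (rows, cols))   # stand-in local convexity
texture = np.random.rand(rows, cols)                   # stand-in surface texture

features = np.column_stack([slope.ravel(), convexity.ravel(), texture.ravel()])
features = StandardScaler().fit_transform(features)

labels = KMeans(n_clusters=12, n_init=10, random_state=0).fit_predict(features)
terrain_classes = labels.reshape(rows, cols)           # raster of landform classes
print(np.bincount(labels))
```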
ERIC Educational Resources Information Center
Severo, Milton; Silva-Pereira, Fernanda; Ferreira, Maria Amelia
2013-01-01
Several studies have shown that the standard error of measurement (SEM) can be used as an additional “safety net” to reduce the frequency of false-positive or false-negative student grading classifications. Practical examinations in clinical anatomy are often used as diagnostic tests to admit students to course final examinations. The aim of this…
NASA Astrophysics Data System (ADS)
Al-Ghraibah, Amani
Solar flares release stored magnetic energy in the form of radiation and can have significant detrimental effects on Earth, including damage to technological infrastructure. Recent work has considered methods to predict future flare activity on the basis of quantitative measures of the solar magnetic field. Accurate advance warning of solar flare occurrence is an area of increasing concern and much research is ongoing in this area. Our previous work [111] utilized standard pattern recognition and classification techniques to determine (classify) whether a region is expected to flare within a predictive time window, using a Relevance Vector Machine (RVM) classification method. We extracted 38 features describing the complexity of the photospheric magnetic field; the resulting classification metrics provide the baseline against which we compare our new work. We find a true positive rate (TPR) of 0.8, true negative rate (TNR) of 0.7, and true skill score (TSS) of 0.49. This dissertation proposes three basic topics. The first topic is an extension to our previous work [111], where we consider a feature selection method to determine an appropriate feature subset with cross-validation classification based on a histogram analysis of selected features. Classification using the top five features resulting from this analysis yields better classification accuracies across a large unbalanced dataset. In particular, the feature subsets provide better discrimination of the many regions that flare, where we find a TPR of 0.85, a TNR of 0.65 (slightly lower than our previous work), and a TSS of 0.5, an improvement over our previous work. In the second topic, we study the prediction of solar flare size and time-to-flare using support vector regression (SVR). When we consider flaring regions only, we find an average error in estimating flare size of approximately half a GOES class. When we additionally consider non-flaring regions, we find an increased average error of approximately three-quarters of a GOES class. We also consider thresholding the regressed flare size for the experiment containing both flaring and non-flaring regions and find a TPR of 0.69 and a TNR of 0.86 for flare prediction, consistent with our previous studies of flare prediction using the same magnetic complexity features. The results for both of these size regression experiments are consistent across a wide range of predictive time windows, indicating that the magnetic complexity features may be persistent in appearance long before flare activity. This conjecture is supported by our larger error rates of some 40 hours in the time-to-flare regression problem. The magnetic complexity features considered here appear to have discriminative potential for flare size, but their persistence in time makes them less discriminative for the time-to-flare problem. We also study the prediction of solar flare size and time-to-flare using two temporal features, namely the Δ- and Δ-Δ-features; the same average size and time-to-flare regression errors are found when these temporal features are used in size and time-to-flare prediction. In the third topic, we study the temporal evolution of active region magnetic fields using Hidden Markov Models (HMMs), one of the efficient temporal analysis methods found in the literature. We extracted 38 features describing the complexity of the photospheric magnetic field. These features are converted into a sequence of symbols using a k-nearest neighbor search method.
We study several parameters before prediction, such as the length of the training window Wtrain, which denotes the number of history images used to train the flare and non-flare HMMs, and the number of hidden states Q. In the training phase, the model parameters of the HMM for each category are optimized so as to best describe the training symbol sequences. In the testing phase, we use the best flare and non-flare models to predict/classify active regions as flaring or non-flaring using a sliding-window method. The best prediction results are found when the training history comprises 15 images (i.e., Wtrain = 15) and the length of the sliding testing window is less than or equal to Wtrain; this gives a TPR of 0.79, consistent with previous flare prediction work, a TNR of 0.87 and a TSS of 0.66, the latter two both higher than in our previous flare prediction work. We find that the number of hidden states that best describes the temporal evolution of the solar ARs is five; at the same time, similar metrics are found using different numbers of states.
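A minimal sketch of the decision rule described above: score a test symbol sequence under a flare HMM and a non-flare HMM (both assumed already trained, e.g. by Baum-Welch on the Wtrain history images) and predict the class with the higher log-likelihood. The parameterization below (pi, A, B tuples) and all names are illustrative, not the dissertation's implementation.

```python
import numpy as np

def forward_loglik(obs, pi, A, B):
    """Log-likelihood of a discrete symbol sequence under an HMM.

    obs: sequence of symbol indices; pi: (Q,) initial state distribution;
    A: (Q, Q) transition matrix; B: (Q, M) emission matrix.
    Uses the scaled forward algorithm to avoid numerical underflow.
    """
    alpha = pi * B[:, obs[0]]
    loglik = np.log(alpha.sum())
    alpha /= alpha.sum()
    for symbol in obs[1:]:
        alpha = (alpha @ A) * B[:, symbol]
        loglik += np.log(alpha.sum())
        alpha /= alpha.sum()
    return loglik

def classify_region(obs, flare_hmm, nonflare_hmm):
    """Label a symbol sequence as flaring (1) or non-flaring (0) by model likelihood."""
    return int(forward_loglik(obs, *flare_hmm) >= forward_loglik(obs, *nonflare_hmm))
```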
Use of scan overlap redundancy to enhance multispectral aircraft scanner data
NASA Technical Reports Server (NTRS)
Lindenlaub, J. C.; Keat, J.
1973-01-01
Two criteria were suggested for optimizing the resolution error versus signal-to-noise-ratio tradeoff. The first criterion uses equal weighting coefficients and chooses n, the number of lines averaged, so as to make the average resolution error equal to the noise error. The second criterion adjusts both the number and relative sizes of the weighting coefficients so as to minimize the total error (resolution error plus noise error). The optimum set of coefficients depends upon the geometry of the resolution element, the number of redundant scan lines, the scan line increment, and the original signal-to-noise ratio of the channel. Programs were developed to find the optimum number and relative weights of the averaging coefficients. A working definition of signal-to-noise ratio was given and used to try line averaging on a typical set of data. Line averaging was evaluated only with respect to its effect on classification accuracy.
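The sketch below illustrates the first criterion under simple assumptions: the noise error of an n-line equal-weight average falls as 1/sqrt(n), and the resolution error is supplied as a user-provided model of the resolution-element geometry (a placeholder here, since the report's actual error model is not reproduced).

```python
import numpy as np

def choose_n_lines(sigma_noise, resolution_error, n_max=10):
    """Pick the number n of redundant scan lines to average with equal weights.

    resolution_error: callable returning the average resolution error for a
    given n (a placeholder for a model of the resolution-element geometry).
    Returns the n whose resolution error most nearly equals the noise error
    sigma_noise / sqrt(n), i.e. the first optimization criterion.
    """
    ns = np.arange(1, n_max + 1)
    noise_err = sigma_noise / np.sqrt(ns)
    res_err = np.array([resolution_error(n) for n in ns])
    return int(ns[np.argmin(np.abs(res_err - noise_err))])

def average_lines(lines, weights=None):
    """Weighted average of redundant scan lines (rows of `lines`); equal weights by default."""
    lines = np.asarray(lines, dtype=float)
    if weights is None:
        weights = np.full(lines.shape[0], 1.0 / lines.shape[0])
    return np.average(lines, axis=0, weights=weights)
```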
Bug Distribution and Statistical Pattern Classification.
ERIC Educational Resources Information Center
Tatsuoka, Kikumi K.; Tatsuoka, Maurice M.
1987-01-01
The rule space model permits measurement of cognitive skill acquisition and error diagnosis. Further discussion introduces Bayesian hypothesis testing and bug distribution. An illustration involves an artificial intelligence approach to testing fractions and arithmetic. (Author/GDC)
Simultaneous Control of Error Rates in fMRI Data Analysis
Kang, Hakmook; Blume, Jeffrey; Ombao, Hernando; Badre, David
2015-01-01
The key idea of statistical hypothesis testing is to fix, and thereby control, the Type I error (false positive) rate across samples of any size. Multiple comparisons inflate the global (family-wise) Type I error rate and the traditional solution to maintaining control of the error rate is to increase the local (comparison-wise) Type II error (false negative) rates. However, in the analysis of human brain imaging data, the number of comparisons is so large that this solution breaks down: the local Type II error rate ends up being so large that scientifically meaningful analysis is precluded. Here we propose a novel solution to this problem: allow the Type I error rate to converge to zero along with the Type II error rate. It works because when the Type I error rate per comparison is very small, the accumulated (or global) Type I error rate is also small. This solution is achieved by employing the Likelihood paradigm, which uses likelihood ratios to measure the strength of evidence on a voxel-by-voxel basis. In this paper, we provide theoretical and empirical justification for a likelihood approach to the analysis of human brain imaging data. In addition, we present extensive simulations that show the likelihood approach is viable, leading to 'cleaner'-looking brain maps and operational superiority (lower average error rates). Finally, we include a case study on cognitive control related activation in the prefrontal cortex of the human brain. PMID:26272730
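For one simple case (normal errors and a point alternative of assumed effect size delta), the voxel-wise likelihood ratio can be computed as below. The effect size, the evidence threshold k, and all variable names are assumptions for illustration, not the authors' exact model.

```python
import numpy as np

def voxel_evidence(beta_hat, se, delta, k=8.0):
    """Voxel-wise likelihood-ratio evidence for activation.

    beta_hat: estimated activation per voxel; se: its standard error;
    delta: assumed effect size under H1; k: evidence threshold.
    Returns the likelihood ratio L(H1)/L(H0) per voxel and a boolean map of
    voxels whose evidence for activation exceeds k.
    """
    # Normal-model likelihood ratio: exp{ [beta^2 - (beta - delta)^2] / (2 se^2) }
    log_lr = (beta_hat ** 2 - (beta_hat - delta) ** 2) / (2.0 * se ** 2)
    lr = np.exp(log_lr)
    return lr, lr >= k
```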
A SVM framework for fault detection of the braking system in a high speed train
NASA Astrophysics Data System (ADS)
Liu, Jie; Li, Yan-Fu; Zio, Enrico
2017-03-01
By April 2015, the number of operating High Speed Trains (HSTs) in the world had reached 3603. An efficient, effective and very reliable braking system is clearly critical for trains running at a speed around 300 km/h. Failure of a highly reliable braking system is a rare event and, consequently, informative recorded data on fault conditions are scarce. This renders the fault detection problem a classification problem with highly unbalanced data. In this paper, a Support Vector Machine (SVM) framework, including feature selection, feature vector selection, model construction and decision boundary optimization, is proposed for tackling this problem. Feature vector selection can largely reduce the data size and, thus, the computational burden. The constructed model is a modified version of the least square SVM, in which a higher cost is assigned to the error of classifying faulty conditions than to the error of classifying normal conditions. The proposed framework is successfully validated on a number of public unbalanced datasets. Then, it is applied to the fault detection of braking systems in HSTs: in comparison with several SVM approaches for unbalanced datasets, the proposed framework gives better results.
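The paper's classifier is a modified least-square SVM with a higher cost on misclassifying faulty conditions. As a rough analogue rather than the authors' formulation, scikit-learn's SVC exposes per-class misclassification costs through class_weight; the 10x weight on the faulty class below is an illustrative value, not one from the paper.

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# X_train: selected feature vectors; y_train: 1 = faulty (rare), 0 = normal.
cost_sensitive_svm = make_pipeline(
    StandardScaler(),
    SVC(kernel="rbf", C=1.0, class_weight={0: 1.0, 1: 10.0}),  # penalize missed faults more
)
# cost_sensitive_svm.fit(X_train, y_train)
# y_pred = cost_sensitive_svm.predict(X_test)
```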
Semi-supervised anomaly detection - towards model-independent searches of new physics
NASA Astrophysics Data System (ADS)
Kuusela, Mikael; Vatanen, Tommi; Malmi, Eric; Raiko, Tapani; Aaltonen, Timo; Nagai, Yoshikazu
2012-06-01
Most classification algorithms used in high energy physics fall under the category of supervised machine learning. Such methods require a training set containing both signal and background events and are prone to classification errors should this training data be systematically inaccurate, for example due to the assumed MC model. To complement such model-dependent searches, we propose an algorithm based on semi-supervised anomaly detection techniques, which does not require a MC training sample for the signal data. We first model the background using a multivariate Gaussian mixture model. We then search for deviations from this model by fitting to the observations a mixture of the background model and a number of additional Gaussians. This allows us to perform pattern recognition of any anomalous excess over the background. We show by a comparison to neural network classifiers that such an approach is considerably more robust against misspecification of the signal MC than supervised classification. In cases where there is an unexpected signal, a neural network might fail to correctly identify it, while anomaly detection does not suffer from such a limitation. On the other hand, when there are no systematic errors in the training data, both methods perform comparably.
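A compact sketch of the background-modelling idea, assuming the background sample is an (n_events, n_features) array bkg. scikit-learn's GaussianMixture stands in for the multivariate Gaussian mixture, and flagging low-likelihood events is a simplification of the paper's refit with additional signal Gaussians.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_background(bkg, n_components=5, seed=0):
    """Model the background distribution with a Gaussian mixture."""
    return GaussianMixture(n_components=n_components, random_state=seed).fit(bkg)

def anomaly_scores(model, data):
    """Higher score = less compatible with the background model."""
    return -model.score_samples(data)  # negative log-likelihood per event

# Events in, say, the top 1% of scores could be inspected as a candidate
# anomalous excess over the background:
# threshold = np.quantile(anomaly_scores(gmm, data), 0.99)
```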
Nancy Jane, Y; Khanna Nehemiah, H; Arputharaj, Kannan
2016-04-01
Parkinson's disease (PD) is a movement disorder that affects the patient's nervous system; health-care applications mostly use wearable sensors to collect the relevant data. Since these sensors generate time-stamped data, analyzing gait disturbances in PD becomes a challenging task. The objective of this paper is to develop an effective clinical decision-making system (CDMS) that aids the physician in diagnosing the severity of gait disturbances in PD-affected patients. This paper presents a Q-backpropagated time delay neural network (Q-BTDNN) classifier that builds a temporal classification model, which performs the task of classification and prediction in the CDMS. The proposed Q-learning induced backpropagation (Q-BP) training algorithm trains the Q-BTDNN by generating a reinforced error signal. The network's weights are adjusted by backpropagating the generated error signal. For experimentation, the proposed work uses a PD gait database, which contains gait measures collected through wearable sensors from three different PD research studies. The experimental result proves the efficiency of Q-BP in terms of its improved classification accuracy of 91.49%, 92.19% and 90.91% on the three datasets, respectively, compared to other neural network training algorithms. Copyright © 2016 Elsevier Inc. All rights reserved.
Bayesian logistic regression approaches to predict incorrect DRG assignment.
Suleiman, Mani; Demirhan, Haydar; Boyd, Leanne; Girosi, Federico; Aksakalli, Vural
2018-05-07
Episodes of care involving similar diagnoses and treatments and requiring similar levels of resource utilisation are grouped to the same Diagnosis-Related Group (DRG). In jurisdictions which implement DRG-based payment systems, DRGs are a major determinant of funding for inpatient care. Hence, service providers often dedicate auditing staff to the task of checking that episodes have been coded to the correct DRG. The use of statistical models to estimate an episode's probability of DRG error can significantly improve the efficiency of clinical coding audits. This study implements Bayesian logistic regression models with weakly informative prior distributions to estimate the likelihood that episodes require a DRG revision, comparing these models with each other and to classical maximum likelihood estimates. All Bayesian approaches had more stable model parameters than maximum likelihood. The best performing Bayesian model improved overall classification performance by 6% compared to maximum likelihood, and by 34% compared to random classification. We found that the original DRG, the coder and the day of coding all have a significant effect on the likelihood of DRG error. Use of Bayesian approaches has improved model parameter stability and classification accuracy. This method has already led to improved audit efficiency in an operational capacity.
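At the point-estimate level, a weakly informative Gaussian prior on the coefficients corresponds to L2-regularized (ridge) logistic regression, so scikit-learn can serve as a MAP-style stand-in for the paper's fully Bayesian models. The feature encoding and the prior scale (via C) below are assumptions, not the study's specification.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# X: episode features (e.g. one-hot encoded original DRG, coder, day of coding);
# y: 1 if the episode's DRG was revised on audit, else 0.
# Smaller C = stronger (more informative) Gaussian prior on standardized coefficients.
map_drg_model = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l2", C=1.0, max_iter=1000),
)
# map_drg_model.fit(X, y)
# audit_priority = map_drg_model.predict_proba(X_new)[:, 1]  # estimated revision probability
```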
Muroi, Maki; Shen, Jay J; Angosta, Alona
2017-02-01
Registered nurses (RNs) play an important role in safe medication administration and patient safety. This study examined a total of 1276 medication error (ME) incident reports made by RNs in hospital inpatient settings in the southwestern region of the United States. The most common drug class associated with MEs was cardiovascular drugs (24.7%). Among this class, anticoagulants had the most errors (11.3%). Antimicrobials were the second most common drug class associated with errors (19.1%), and vancomycin was the most common antimicrobial that caused errors in this category (6.1%). MEs occurred more frequently in the medical-surgical and intensive care units than in any other hospital units. Ten percent of MEs reached the patients with harm and 11% reached the patients with increased monitoring. Understanding the contributing factors related to MEs, addressing and eliminating risk of errors across hospital units, and providing education and resources for nurses may help reduce MEs. Copyright © 2016 Elsevier Inc. All rights reserved.
Detecting Paroxysmal Coughing from Pertussis Cases Using Voice Recognition Technology
Parker, Danny; Picone, Joseph; Harati, Amir; Lu, Shuang; Jenkyns, Marion H.; Polgreen, Philip M.
2013-01-01
Background Pertussis is highly contagious; thus, prompt identification of cases is essential to control outbreaks. Clinicians experienced with the disease can easily identify classic cases, where patients have bursts of rapid coughing followed by gasps, and a characteristic whooping sound. However, many clinicians have never seen a case, and thus may miss initial cases during an outbreak. The purpose of this project was to use voice-recognition software to distinguish pertussis coughs from croup and other coughs. Methods We collected a series of recordings representing pertussis, croup and miscellaneous coughing by children. We manually categorized coughs as either pertussis or non-pertussis, and extracted features for each category. We used Mel-frequency cepstral coefficients (MFCC), a sampling rate of 16 kHz, a frame duration of 25 msec, and a frame rate of 10 msec. The coughs were filtered. Each cough was divided into 3 sections of proportion 3-4-3. The average of the 13 MFCCs for each section was computed and made into a 39-element feature vector used for the classification. We used the following machine learning algorithms: Neural Networks, K-Nearest Neighbor (KNN), and a 200-tree Random Forest (RF). Data were reserved for cross-validation of the KNN and RF. The Neural Network was trained 100 times, and the averaged results are presented. Results After categorization, we had 16 examples of non-pertussis coughs and 31 examples of pertussis coughs. Over 90% of all pertussis coughs were properly classified as pertussis. The error rates were: Type I errors of 7%, 12%, and 25% and Type II errors of 8%, 0%, and 0%, using the Neural Network, Random Forest, and KNN, respectively. Conclusion Our results suggest that we can build a robust classifier to assist clinicians and the public to help identify pertussis cases in children presenting with typical symptoms. PMID:24391730
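A sketch of the described feature extraction, assuming librosa is used for the MFCC computation; the 25 ms / 10 ms framing, the 3-4-3 split, and the per-section averaging follow the text, while the filtering step and file handling are simplified.

```python
import numpy as np
import librosa

def cough_features(path):
    """13 MFCCs averaged over three sections (3-4-3 split) -> 39-element feature vector."""
    y, sr = librosa.load(path, sr=16000)                     # 16 kHz sampling rate
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                                n_fft=int(0.025 * sr),       # 25 ms frame duration
                                hop_length=int(0.010 * sr))  # 10 ms frame rate
    n_frames = mfcc.shape[1]
    bounds = [0, int(0.3 * n_frames), int(0.7 * n_frames), n_frames]  # 3-4-3 proportions
    sections = [mfcc[:, bounds[i]:bounds[i + 1]].mean(axis=1) for i in range(3)]
    return np.concatenate(sections)                          # shape (39,)
```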
1983-01-01
changes. Concurrently, CIA formed an Ad Hoc Intelligence Community Working Group to re… esting to step back and look at the U.S. security… administrative error; to prevent embarrassment to a person, organization, or agency; to restrain competition; or to prevent or delay the public release of… expected damage will be. If you foresee the damage, the decision will be to classify the information. But note that in this thought process, you
Automatic and semi-automatic approaches for arteriolar-to-venular computation in retinal photographs
NASA Astrophysics Data System (ADS)
Mendonça, Ana Maria; Remeseiro, Beatriz; Dashtbozorg, Behdad; Campilho, Aurélio
2017-03-01
The Arteriolar-to-Venular Ratio (AVR) is a popular dimensionless measure which allows the assessment of patients' condition for the early diagnosis of different diseases, including hypertension and diabetic retinopathy. This paper presents two new approaches for AVR computation in retinal photographs which include a sequence of automated processing steps: vessel segmentation, caliber measurement, optic disc segmentation, artery/vein classification, region of interest delineation, and AVR calculation. Both approaches have been tested on the INSPIRE-AVR dataset, and compared with a ground-truth provided by two medical specialists. The obtained results demonstrate the reliability of the fully automatic approach, which provides AVR ratios very similar to those of at least one of the observers. Furthermore, the semi-automatic approach, which includes the manual modification of the artery/vein classification if needed, makes it possible to significantly reduce the error to a level below the human error.
Bayes classification of terrain cover using normalized polarimetric data
NASA Technical Reports Server (NTRS)
Yueh, H. A.; Swartz, A. A.; Kong, J. A.; Shin, R. T.; Novak, L. M.
1988-01-01
The normalized polarimetric classifier (NPC) which uses only the relative magnitudes and phases of the polarimetric data is proposed for discrimination of terrain elements. The probability density functions (PDFs) of polarimetric data are assumed to have a complex Gaussian distribution, and the marginal PDF of the normalized polarimetric data is derived by adopting the Euclidean norm as the normalization function. The general form of the distance measure for the NPC is also obtained. It is demonstrated that for polarimetric data with an arbitrary PDF, the distance measure of NPC will be independent of the normalization function selected even when the classifier is mistrained. A complex Gaussian distribution is assumed for the polarimetric data consisting of grass and tree regions. The probability of error for the NPC is compared with those of several other single-feature classifiers. The classification error of NPCs is shown to be independent of the normalization function.
Learning optimal features for visual pattern recognition
NASA Astrophysics Data System (ADS)
Labusch, Kai; Siewert, Udo; Martinetz, Thomas; Barth, Erhardt
2007-02-01
The optimal coding hypothesis proposes that the human visual system has adapted to the statistical properties of the environment by the use of relatively simple optimality criteria. We here (i) discuss how the properties of different models of image coding, i.e. sparseness, decorrelation, and statistical independence, are related to each other, (ii) propose to evaluate the different models by verifiable performance measures, and (iii) analyse the classification performance on images of handwritten digits (MNIST database). We first employ the SPARSENET algorithm (Olshausen, 1998) to derive a local filter basis (on 13 × 13 pixel windows). We then filter the images in the database (28 × 28 pixel images of digits) and reduce the dimensionality of the resulting feature space by selecting the locally maximal filter responses. We then train a support vector machine on a training set to classify the digits and report results obtained on a separate test set. Currently, the best state-of-the-art result on the MNIST database has an error rate of 0.4%. This result, however, has been obtained by using explicit knowledge that is specific to the data (an elastic distortion model for digits). We here obtain an error rate of 0.55%, which is second best but does not use explicit data-specific knowledge. In particular it outperforms by far all methods that do not use data-specific knowledge.
Incremental Support Vector Machine Framework for Visual Sensor Networks
NASA Astrophysics Data System (ADS)
Awad, Mariette; Jiang, Xianhua; Motai, Yuichi
2006-12-01
Motivated by the emerging requirements of surveillance networks, we present in this paper an incremental multiclassification support vector machine (SVM) technique as a new framework for action classification based on real-time multivideo collected by homogeneous sites. The technique is based on an adaptation of least square SVM (LS-SVM) formulation but extends beyond the static image-based learning of current SVM methodologies. In applying the technique, an initial supervised offline learning phase is followed by a visual behavior data acquisition and an online learning phase during which the cluster head performs an ensemble of model aggregations based on the sensor nodes inputs. The cluster head then selectively switches on designated sensor nodes for future incremental learning. Combining sensor data offers an improvement over single camera sensing especially when the latter has an occluded view of the target object. The optimization involved alleviates the burdens of power consumption and communication bandwidth requirements. The resulting misclassification error rate, the iterative error reduction rate of the proposed incremental learning, and the decision fusion technique prove its validity when applied to visual sensor networks. Furthermore, the enabled online learning allows an adaptive domain knowledge insertion and offers the advantage of reducing both the model training time and the information storage requirements of the overall system which makes it even more attractive for distributed sensor networks communication.
Viral quasispecies inference from 454 pyrosequencing
2013-01-01
Background Many potentially life-threatening infectious viruses are highly mutable in nature. Characterizing the fittest variants within a quasispecies from infected patients is expected to allow unprecedented opportunities to investigate the relationship between quasispecies diversity and disease epidemiology. The advent of next-generation sequencing technologies has allowed the study of virus diversity with high-throughput sequencing, although these methods come with higher rates of errors which can artificially increase diversity. Results Here we introduce a novel computational approach that incorporates base quality scores from next-generation sequencers for reconstructing viral genome sequences that simultaneously infers the number of variants within a quasispecies that are present. Comparisons on simulated and clinical data on dengue virus suggest that the novel approach provides a more accurate inference of the underlying number of variants within the quasispecies, which is vital for clinical efforts in mapping the within-host viral diversity. Sequence alignments generated by our approach are also found to exhibit lower rates of error. Conclusions The ability to infer the viral quasispecies colony that is present within a human host provides the potential for a more accurate classification of the viral phenotype. Understanding the genomics of viruses will be relevant not just to studying how to control or even eradicate these viral infectious diseases, but also in learning about the innate protection in the human host against the viruses. PMID:24308284
NASA Technical Reports Server (NTRS)
Leake, M. A.
1982-01-01
Planetary imagery techniques, errors in measurement or degradation assignment, and statistical formulas are presented with respect to cratering data. Base map photograph preparation, measurement of crater diameters and sampled area, and instruments used are discussed. Possible uncertainties, such as Sun angle, scale factors, degradation classification, and biases in crater recognition are discussed. The mathematical formulas used in crater statistics are presented.
Zhang, Bin; He, Xin; Ouyang, Fusheng; Gu, Dongsheng; Dong, Yuhao; Zhang, Lu; Mo, Xiaokai; Huang, Wenhui; Tian, Jie; Zhang, Shuixing
2017-09-10
We aimed to identify optimal machine-learning methods for radiomics-based prediction of local failure and distant failure in advanced nasopharyngeal carcinoma (NPC). We enrolled 110 patients with advanced NPC. A total of 970 radiomic features were extracted from MRI images for each patient. Six feature selection methods and nine classification methods were evaluated in terms of their performance. We applied 10-fold cross-validation as the criterion for feature selection and classification. We repeated each combination 50 times to obtain the mean area under the curve (AUC) and test error. We observed that the combination of Random Forest (RF) + RF (AUC, 0.8464 ± 0.0069; test error, 0.3135 ± 0.0088) had the highest prognostic performance, followed by RF + Adaptive Boosting (AdaBoost) (AUC, 0.8204 ± 0.0095; test error, 0.3384 ± 0.0097), and Sure Independence Screening (SIS) + Linear Support Vector Machines (LSVM) (AUC, 0.7883 ± 0.0096; test error, 0.3985 ± 0.0100). Our radiomics study identified optimal machine-learning methods for the radiomics-based prediction of local failure and distant failure in advanced NPC, which could enhance the applications of radiomics in precision oncology and clinical practice. Copyright © 2017 Elsevier B.V. All rights reserved.
A cascaded coding scheme for error control and its performance analysis
NASA Technical Reports Server (NTRS)
Lin, Shu; Kasami, Tadao; Fujiwara, Tohru; Takata, Toyoo
1986-01-01
A coding scheme is investigated for error control in data communication systems. The scheme is obtained by cascading two error correcting codes, called the inner and outer codes. The error performance of the scheme is analyzed for a binary symmetric channel with bit error rate epsilon < 1/2. It is shown that if the inner and outer codes are chosen properly, extremely high reliability can be attained even for a high channel bit error rate. Various specific example schemes with inner codes ranging from high rates to very low rates and Reed-Solomon codes as outer codes are considered, and their error probabilities are evaluated. They all provide extremely high reliability even for very high bit error rates. Several example schemes are being considered by NASA for satellite and spacecraft downlink error control.
Supporting diagnosis of attention-deficit hyperactive disorder with novelty detection.
Lee, Hyoung-Joo; Cho, Sungzoon; Shin, Min-Sup
2008-03-01
Computerized continuous performance test (CPT) is a widely used diagnostic tool for attention-deficit hyperactivity disorder (ADHD). It measures the number of correctly detected stimuli as well as response times. Typically, when calculating a cut-off score for discriminating between normal and abnormal, only the normal children's data are collected. Then the average and standard deviation of each measure or variable are computed. If any of the variables is more than 2 sigma above the average, that child is diagnosed as abnormal. We will call this approach the "T-score 70" classifier. However, its performance leaves a lot to be desired due to a high false-negative error rate. In order to improve the classification accuracy we propose to use novelty detection approaches for supporting ADHD diagnosis. Novelty detection is a model building framework where a classifier is constructed using only one class of training data and a new input pattern is classified according to its similarity to the training data. A total of eight novelty detectors are introduced and applied to our ADHD datasets collected from two modes of tests, visual and auditory. They are evaluated and compared with the T-score model on validation datasets in terms of false positive and negative error rates, and area under the receiver operating characteristics curve (AuROC). Experimental results show that the cut-off score of 70 is suboptimal, leading to a low false positive error but a very high false negative error. A few novelty detectors such as Parzen density estimators yield much more balanced classification performances. Moreover, most novelty detectors outperform the T-score method for most age groups statistically with a significance level of 1% in terms of AuROC. In particular, we recommend the Parzen and Gaussian density estimators, kernel principal component analysis, one-class support vector machine, and K-means clustering novelty detectors, which can improve upon the T-score method on average by at least 30% for the visual test and 40% for the auditory test. In addition, their performances are relatively stable over various parameter values as long as they are within reasonable ranges. The proposed novelty detection approaches can replace the T-score method, which has been considered the "gold standard" for supporting ADHD diagnosis. Furthermore, they can be applied to other psychological tests where only normal data are available.
High Dimensional Classification Using Features Annealed Independence Rules.
Fan, Jianqing; Fan, Yingying
2008-01-01
Classification using high-dimensional features arises frequently in many contemporary statistical studies such as tumor classification using microarray or other high-throughput data. The impact of dimensionality on classification is poorly understood. In a seminal paper, Bickel and Levina (2004) show that the Fisher discriminant performs poorly due to diverging spectra, and they propose to use the independence rule to overcome the problem. We first demonstrate that even for the independence classification rule, classification using all the features can be as bad as random guessing due to noise accumulation in estimating population centroids in high-dimensional feature space. In fact, we demonstrate further that almost all linear discriminants can perform as badly as random guessing. Thus, it is critically important to select a subset of important features for high-dimensional classification, resulting in Features Annealed Independence Rules (FAIR). The conditions under which all the important features can be selected by the two-sample t-statistic are established. The choice of the optimal number of features, or equivalently, the threshold value of the test statistics, is proposed based on an upper bound of the classification error. Simulation studies and real data analysis support our theoretical results and demonstrate convincingly the advantage of our new classification procedure.
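A minimal sketch of the two-step idea under simplifying assumptions: rank features by absolute two-sample t-statistics, keep the top m, and classify with a diagonal-variance nearest-centroid (independence) rule. The fixed m below stands in for the paper's data-driven choice based on the classification-error bound.

```python
import numpy as np

def fair_fit(X, y, m=50):
    """Features Annealed Independence Rule, simplified.

    X: (n, p) training matrix; y: binary labels in {0, 1};
    m: number of features kept after t-statistic ranking.
    """
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    se2 = X0.var(axis=0, ddof=1) / len(X0) + X1.var(axis=0, ddof=1) / len(X1)
    t = (mu1 - mu0) / np.sqrt(se2 + 1e-12)             # two-sample t-statistics
    keep = np.argsort(-np.abs(t))[:m]                  # indices of the top-m features
    pooled_var = (((len(X0) - 1) * X0.var(axis=0, ddof=1)
                   + (len(X1) - 1) * X1.var(axis=0, ddof=1)) / (len(X) - 2))[keep] + 1e-12
    return keep, mu0[keep], mu1[keep], pooled_var

def fair_predict(Xnew, keep, mu0, mu1, var):
    """Assign each row the class whose centroid is closer in variance-weighted distance."""
    Z = Xnew[:, keep]
    d0 = ((Z - mu0) ** 2 / var).sum(axis=1)
    d1 = ((Z - mu1) ** 2 / var).sum(axis=1)
    return (d1 < d0).astype(int)
```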
45 CFR 98.100 - Error Rate Report.
Code of Federal Regulations, 2013 CFR
2013-10-01
... 45 Public Welfare 1 2013-10-01 2013-10-01 false Error Rate Report. 98.100 Section 98.100 Public Welfare DEPARTMENT OF HEALTH AND HUMAN SERVICES GENERAL ADMINISTRATION CHILD CARE AND DEVELOPMENT FUND Error Rate Reporting § 98.100 Error Rate Report. (a) Applicability—The requirements of this subpart...
45 CFR 98.100 - Error Rate Report.
Code of Federal Regulations, 2014 CFR
2014-10-01
... 45 Public Welfare 1 2014-10-01 2014-10-01 false Error Rate Report. 98.100 Section 98.100 Public Welfare Department of Health and Human Services GENERAL ADMINISTRATION CHILD CARE AND DEVELOPMENT FUND Error Rate Reporting § 98.100 Error Rate Report. (a) Applicability—The requirements of this subpart...
45 CFR 98.100 - Error Rate Report.
Code of Federal Regulations, 2012 CFR
2012-10-01
... 45 Public Welfare 1 2012-10-01 2012-10-01 false Error Rate Report. 98.100 Section 98.100 Public Welfare DEPARTMENT OF HEALTH AND HUMAN SERVICES GENERAL ADMINISTRATION CHILD CARE AND DEVELOPMENT FUND Error Rate Reporting § 98.100 Error Rate Report. (a) Applicability—The requirements of this subpart...
45 CFR 98.100 - Error Rate Report.
Code of Federal Regulations, 2011 CFR
2011-10-01
... 45 Public Welfare 1 2011-10-01 2011-10-01 false Error Rate Report. 98.100 Section 98.100 Public Welfare DEPARTMENT OF HEALTH AND HUMAN SERVICES GENERAL ADMINISTRATION CHILD CARE AND DEVELOPMENT FUND Error Rate Reporting § 98.100 Error Rate Report. (a) Applicability—The requirements of this subpart...
A label distance maximum-based classifier for multi-label learning.
Liu, Xiaoli; Bao, Hang; Zhao, Dazhe; Cao, Peng
2015-01-01
Multi-label classification is useful in many bioinformatics tasks such as gene function prediction and protein site localization. This paper presents an improved neural network algorithm, Max Label Distance Back Propagation Algorithm for Multi-Label Classification. The method was formulated by modifying the total error function of the standard BP by adding a penalty term, which was realized by maximizing the distance between the positive and negative labels. Extensive experiments were conducted to compare this method against state-of-the-art multi-label methods on three popular bioinformatic benchmark datasets. The results illustrated that this proposed method is more effective for bioinformatic multi-label classification compared to commonly used techniques.
Lacie phase 1 Classification and Mensuration Subsystem (CAMS) rework experiment
NASA Technical Reports Server (NTRS)
Chhikara, R. S.; Hsu, E. M.; Liszcz, C. J.
1976-01-01
An experiment was designed to test the ability of the Classification and Mensuration Subsystem rework operations to improve wheat proportion estimates for segments that had been processed previously. Sites selected for the experiment included three in Kansas and three in Texas, with the remaining five distributed in Montana and North and South Dakota. The acquisition dates were selected to be representative of imagery available in actual operations. No more than one acquisition per biophase were used, and biophases were determined by actual crop calendars. All sites were worked by each of four Analyst-Interpreter/Data Processing Analyst Teams who reviewed the initial processing of each segment and accepted or reworked it for an estimate of the proportion of small grains in the segment. Classification results, acquisitions and classification errors and performance results between CAMS regular and ITS rework are tabulated.
Paul Sullins, D
2017-12-01
Because of classification errors reported by the National Center for Health Statistics, an estimated 42 % of the same-sex married partners in the sample for this study are misclassified different-sex married partners, thus calling into question findings regarding same-sex married parents. Including biological parentage as a control variable suppresses same-sex/different-sex differences, thus obscuring the data error. Parentage is not appropriate as a control because it correlates nearly perfectly (+.97, gamma) with the same-sex/different-sex distinction and is invariant for the category of joint biological parents.
NASA Astrophysics Data System (ADS)
Guha, Daipayan; Jakubovic, Raphael; Gupta, Shaurya; Yang, Victor X. D.
2017-02-01
Computer-assisted navigation (CAN) may guide spinal surgeries, reliably reducing screw breach rates. Definitions of screw breach, if reported, vary widely across studies. Absolute quantitative error is theoretically a more precise and generalizable metric of navigation accuracy, but has been computed variably and reported in fewer than 25% of clinical studies of CAN-guided pedicle screw accuracy. We reviewed a prospectively-collected series of 209 pedicle screws placed with CAN guidance to characterize the correlation between clinical pedicle screw accuracy, based on postoperative imaging, and absolute quantitative navigation accuracy. We found that acceptable screw accuracy was achieved for significantly fewer screws based on 2mm grade vs. Heary grade, particularly in the lumbar spine. Inter-rater agreement was good for the Heary classification and moderate for the 2mm grade, significantly greater among radiologists than surgeon raters. Mean absolute translational/angular accuracies were 1.75mm/3.13° and 1.20mm/3.64° in the axial and sagittal planes, respectively. There was no correlation between clinical and absolute navigation accuracy, in part because surgeons appear to compensate for perceived translational navigation error by adjusting screw medialization angle. Future studies of navigation accuracy should therefore report absolute translational and angular errors. Clinical screw grades based on post-operative imaging, if reported, may be more reliable if performed in multiple by radiologist raters.
Higher criticism thresholding: Optimal feature selection when useful features are rare and weak
Donoho, David; Jin, Jiashun
2008-01-01
In important application fields today—genomics and proteomics are examples—selecting a small subset of useful features is crucial for success of Linear Classification Analysis. We study feature selection by thresholding of feature Z-scores and introduce a principle of threshold selection, based on the notion of higher criticism (HC). For i = 1, 2, …, p, let πi denote the two-sided P-value associated with the ith feature Z-score and π(i) denote the ith order statistic of the collection of P-values. The HC threshold is the absolute Z-score corresponding to the P-value maximizing the HC objective (i/p − π(i)) / sqrt(i/p (1 − i/p)). We consider a rare/weak (RW) feature model, where the fraction of useful features is small and the useful features are each too weak to be of much use on their own. HC thresholding (HCT) has interesting behavior in this setting, with an intimate link between maximizing the HC objective and minimizing the error rate of the designed classifier, and very different behavior from popular threshold selection procedures such as false discovery rate thresholding (FDRT). In the most challenging RW settings, HCT uses an unconventionally low threshold; this keeps the missed-feature detection rate under better control than FDRT and yields a classifier with improved misclassification performance. Replacing cross-validated threshold selection in the popular Shrunken Centroid classifier with the computationally less expensive and simpler HCT reduces the variance of the selected threshold and the error rate of the constructed classifier. Results on standard real datasets and in asymptotic theory confirm the advantages of HCT. PMID:18815365
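A sketch that turns the HC objective above into code, assuming the feature Z-scores are already in hand: compute two-sided P-values, sort them, evaluate (i/p − π(i)) / sqrt(i/p (1 − i/p)) at each order statistic, and return the |Z| of the maximizing P-value. In practice the search is often restricted to the smallest fraction of P-values; the full range is used here for brevity.

```python
import numpy as np
from scipy.stats import norm

def hc_threshold(z):
    """Higher-criticism threshold on |Z| for a vector of feature Z-scores."""
    z = np.asarray(z, dtype=float)
    p = z.size
    pvals = 2.0 * norm.sf(np.abs(z))            # two-sided P-values
    order = np.argsort(pvals)
    p_sorted = pvals[order]
    frac = np.arange(1, p + 1) / p              # i/p
    hc = (frac - p_sorted) / np.sqrt(frac * (1.0 - frac) + 1e-12)
    best = np.argmax(hc)
    return np.abs(z)[order][best]

# Features with |Z| >= hc_threshold(z) would be retained for the linear classifier.
```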
Matheny, Michael E; Normand, Sharon-Lise T; Gross, Thomas P; Marinac-Dabic, Danica; Loyo-Berrios, Nilsa; Vidi, Venkatesan D; Donnelly, Sharon; Resnic, Frederic S
2011-12-14
Automated adverse outcome surveillance tools and methods have potential utility in quality improvement and medical product surveillance activities. Their use for assessing hospital performance on the basis of patient outcomes has received little attention. We compared risk-adjusted sequential probability ratio testing (RA-SPRT) implemented in an automated tool to Massachusetts public reports of 30-day mortality after isolated coronary artery bypass graft surgery. A total of 23,020 isolated adult coronary artery bypass surgery admissions performed in Massachusetts hospitals between January 1, 2002 and September 30, 2007 were retrospectively re-evaluated. The RA-SPRT method was implemented within an automated surveillance tool to identify hospital outliers in yearly increments. We used an overall type I error rate of 0.05, an overall type II error rate of 0.10, and a threshold that signaled if the odds of dying 30 days after surgery were at least twice those expected. Annual hospital outlier status, based on the state-reported classification, was considered the gold standard. An event was defined as at least one occurrence of a higher-than-expected hospital mortality rate during a given year. We examined a total of 83 hospital-year observations. The RA-SPRT method alerted on 6 events among three hospitals for 30-day mortality compared with 5 events among two hospitals using the state public reports, yielding a sensitivity of 100% (5/5) and specificity of 98.8% (79/80). The automated RA-SPRT method performed well, detecting all of the true institutional outliers with a small false positive alerting rate. Such a system could provide confidential automated notification to local institutions in advance of public reporting, providing opportunities for earlier quality improvement interventions.
An educational and audit tool to reduce prescribing error in intensive care.
Thomas, A N; Boxall, E M; Laha, S K; Day, A J; Grundy, D
2008-10-01
To reduce prescribing errors in an intensive care unit by providing prescriber education in tutorials, ward-based teaching and feedback in 3-monthly cycles with each new group of trainee medical staff. Prescribing audits were conducted three times in each 3-month cycle, once pretraining, once post-training and a final audit after 6 weeks. The audit information was fed back to prescribers with their correct prescribing rates, rates for individual error types and total error rates together with anonymised information about other prescribers' error rates. The percentage of prescriptions with errors decreased over each 3-month cycle (pretraining 25%, 19%, (one missing data point), post-training 23%, 6%, 11%, final audit 7%, 3%, 5% (p<0.0005)). The total number of prescriptions and error rates varied widely between trainees (data collection one; cycle two: range of prescriptions written: 1-61, median 18; error rate: 0-100%; median: 15%). Prescriber education and feedback reduce manual prescribing errors in intensive care.
A Six Sigma Trial For Reduction of Error Rates in Pathology Laboratory.
Tosuner, Zeynep; Gücin, Zühal; Kiran, Tuğçe; Büyükpinarbaşili, Nur; Turna, Seval; Taşkiran, Olcay; Arici, Dilek Sema
2016-01-01
A major target of quality assurance is the minimization of error rates in order to enhance patient safety. Six Sigma is a method used in industry that targets near-zero error (3.4 errors per million events). The five main principles of Six Sigma are defining, measuring, analysing, improving and controlling. Using this methodology, the causes of errors can be examined and process improvement strategies can be identified. The aim of our study was to evaluate the utility of Six Sigma methodology in error reduction in our pathology laboratory. The errors encountered between April 2014 and April 2015 were recorded by the pathology personnel. Error follow-up forms were examined by the quality control supervisor, administrative supervisor and the head of the department. Using Six Sigma methodology, the rate of errors was measured monthly and the distribution of errors at the pre-analytical, analytical and post-analytical phases was analysed. Improvement strategies were discussed in the monthly intradepartmental meetings and the units with high error rates were monitored. Fifty-six (52.4%) of the 107 recorded errors were at the pre-analytical phase. Forty-five errors (42%) were recorded as analytical and 6 errors (5.6%) as post-analytical. Two of the 45 errors were major irrevocable errors. The error rate was 6.8 per million in the first half of the year and 1.3 per million in the second half, decreasing by 79.77%. The Six Sigma trial in our pathology laboratory provided a reduction of the error rates mainly in the pre-analytical and analytical phases.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hughes, Michael J.; Hayes, Daniel J
2014-01-01
Use of Landsat data to answer ecological questions is contingent on the effective removal of cloud and cloud shadow from satellite images. We develop a novel algorithm to identify and classify clouds and cloud shadow, SPARCS: Spatial Procedures for Automated Removal of Cloud and Shadow. The method uses neural networks to determine cloud, cloud-shadow, water, snow/ice, and clear-sky membership of each pixel in a Landsat scene, and then applies a set of procedures to enforce spatial rules. In a comparison to FMask, a high-quality cloud and cloud-shadow classification algorithm currently available, SPARCS performs favorably, with similar omission errors for clouds (0.8% and 0.9%, respectively), substantially lower omission error for cloud-shadow (8.3% and 1.1%), and fewer errors of commission (7.8% and 5.0%). Additionally, SPARCS provides a measure of uncertainty in its classification that can be exploited by other processes that use the cloud and cloud-shadow detection. To illustrate this, we present an application that constructs obstruction-free composites of images acquired on different dates in support of algorithms detecting vegetation change.
Data Analysis & Statistical Methods for Command File Errors
NASA Technical Reports Server (NTRS)
Meshkat, Leila; Waggoner, Bruce; Bryant, Larry
2014-01-01
This paper explains current work on modeling for managing the risk of command file errors. It is focused on analyzing actual data from a JPL spaceflight mission to build models for evaluating and predicting error rates as a function of several key variables. We constructed a rich dataset by considering the number of errors and the number of files radiated, including the number of commands and blocks in each file, as well as subjective estimates of workload and operational novelty. We have assessed these data using different curve fitting and distribution fitting techniques, such as multiple regression analysis and maximum likelihood estimation, to see how much of the variability in the error rates can be explained by these. We have also used goodness-of-fit testing strategies and principal component analysis to further assess our data. Finally, we constructed a model of expected error rates based on what these statistics bore out as critical drivers of the error rate. This model allows project management to evaluate the error rate against a theoretically expected rate as well as anticipate future error rates.
Detecting Signatures of GRACE Sensor Errors in Range-Rate Residuals
NASA Astrophysics Data System (ADS)
Goswami, S.; Flury, J.
2016-12-01
In order to reach the accuracy of the GRACE baseline predicted earlier from the design simulations, efforts have been ongoing for a decade. The GRACE error budget is highly dominated by noise from sensors, dealiasing models and modeling errors. GRACE range-rate residuals contain these errors. Thus, their analysis provides insight into the individual contributions to the error budget. Hence, we analyze the range-rate residuals with a focus on the contribution of sensor errors due to mis-pointing and poor ranging performance in GRACE solutions. For the analysis of pointing errors, we consider two different reprocessed attitude datasets with differences in pointing performance. Range-rate residuals are then computed from these two datasets, respectively, and analysed. We further compare the system noise of the four K- and Ka-band frequencies of the two spacecraft with the range-rate residuals. Strong signatures of mis-pointing errors can be seen in the range-rate residuals. Correlations between range frequency noise and range-rate residuals are also seen.
An experiment in multispectral, multitemporal crop classification using relaxation techniques
NASA Technical Reports Server (NTRS)
Davis, L. S.; Wang, C.-Y.; Xie, H.-C
1983-01-01
The paper describes the result of an experimental study concerning the use of probabilistic relaxation for improving pixel classification rates. Two LACIE sites were used in the study and in both cases, relaxation resulted in a marked improvement in classification rates.
Identifying medication error chains from critical incident reports: a new analytic approach.
Huckels-Baumgart, Saskia; Manser, Tanja
2014-10-01
Research into the distribution of medication errors usually focuses on isolated stages within the medication use process. Our study aimed to provide a novel process-oriented approach to medication incident analysis focusing on medication error chains. Our study was conducted at a 900-bed teaching hospital in Switzerland. All 1,591 medication errors reported from 2009 to 2012 were categorized using the Medication Error Index NCC MERP and the WHO Classification for Patient Safety Methodology. In order to identify medication error chains, each reported medication incident was allocated to the relevant stage of the hospital medication use process. Only 25.8% of the reported medication errors were detected before they propagated through the medication use process. The majority of medication errors (74.2%) formed an error chain encompassing two or more stages. The most frequent error chain comprised preparation up to and including medication administration (45.2%). "Non-consideration of documentation/prescribing" during drug preparation was the most frequent contributor to "wrong dose" during the administration of medication. Medication error chains provide important insights for detecting and stopping medication errors before they reach the patient. Existing and new safety barriers need to be extended to interrupt error chains and to improve patient safety. © 2014, The American College of Clinical Pharmacology.
Akbar, Shahid; Hayat, Maqsood; Iqbal, Muhammad; Jan, Mian Ahmad
2017-06-01
Cancer is a fatal disease, responsible for one-quarter of all deaths in developed countries. Traditional anticancer therapies, such as chemotherapy and radiation, are highly expensive, susceptible to errors, and often ineffective; these conventional techniques also induce severe side-effects on human cells. Due to the perilous impact of cancer, the development of an accurate and highly efficient intelligent computational model is desirable for the identification of anticancer peptides. In this paper, an evolutionary intelligent genetic algorithm-based ensemble model, 'iACP-GAEnsC', is proposed for the identification of anticancer peptides. In this model, the protein sequences are formulated using three different discrete feature representation methods, i.e., amphiphilic pseudo amino acid composition, g-gap dipeptide composition, and reduced amino acid alphabet composition. The performance of the extracted feature spaces is investigated separately and then merged to exhibit the significance of hybridization. In addition, the predicted results of individual classifiers are combined together, using an optimized genetic algorithm and a simple majority technique, in order to enhance the true classification rate. It is observed that genetic algorithm-based ensemble classification outperforms individual classifiers as well as the simple majority voting-based ensemble. The performance of genetic algorithm-based ensemble classification is highest on the hybrid feature space, with an accuracy of 96.45%. In comparison to the existing techniques, the 'iACP-GAEnsC' model has achieved remarkable improvement in terms of various performance metrics. Based on the simulation results, it is observed that the 'iACP-GAEnsC' model might be a leading tool in the field of drug design and proteomics for researchers. Copyright © 2017 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Jones, D. O.; Scolnic, D. M.; Riess, A. G.; Kessler, R.; Rest, A.; Kirshner, R. P.; Berger, E.; Ortega, C. A.; Foley, R. J.; Chornock, R.; Challis, P. J.; Burgett, W. S.; Chambers, K. C.; Draper, P. W.; Flewelling, H.; Huber, M. E.; Kaiser, N.; Kudritzki, R.-P.; Metcalfe, N.; Wainscoat, R. J.; Waters, C.
2017-07-01
The Pan-STARRS (PS1) Medium Deep Survey discovered over 5000 likely supernovae (SNe) but obtained spectral classifications for just 10% of its SN candidates. We measured spectroscopic host galaxy redshifts for 3147 of these likely SNe and estimate that ˜1000 are Type Ia SNe (SNe Ia) with light-curve quality sufficient for a cosmological analysis. We use these data with simulations to determine the impact of core-collapse SN (CC SN) contamination on measurements of the dark energy equation of state parameter, w. Using the method of Bayesian Estimation Applied to Multiple Species (BEAMS), distances to SNe Ia and the contaminating CC SN distribution are simultaneously determined. We test light-curve-based SN classification priors for BEAMS as well as a new classification method that relies upon host galaxy spectra and the association of SN type with host type. By testing several SN classification methods and CC SN parameterizations on large SN simulations, we estimate that CC SN contamination gives a systematic error on w (σ_w^CC) of 0.014, 29% of the statistical uncertainty. Our best method gives σ_w^CC = 0.004, just 8% of the statistical uncertainty, but could be affected by incomplete knowledge of the CC SN distribution. This method determines the SALT2 color and shape coefficients, α and β, with ˜3% bias. However, we find that some variants require α and β to be fixed to known values for BEAMS to yield accurate measurements of w. Finally, the inferred abundance of bright CC SNe in our sample is greater than expected based on measured CC SN rates and luminosity functions.
Using spectrotemporal indices to improve the fruit-tree crop classification accuracy
NASA Astrophysics Data System (ADS)
Peña, M. A.; Liao, R.; Brenning, A.
2017-06-01
This study assesses the potential of spectrotemporal indices derived from satellite image time series (SITS) to improve the classification accuracy of fruit-tree crops. Six major fruit-tree crop types in the Aconcagua Valley, Chile, were classified by applying various linear discriminant analysis (LDA) techniques on a Landsat-8 time series of nine images corresponding to the 2014-15 growing season. As features we not only used the complete spectral resolution of the SITS, but also all possible normalized difference indices (NDIs) that can be constructed from any two bands of the time series, a novel approach to derive features from SITS. Due to the high dimensionality of this "enhanced" feature set we used the lasso and ridge penalized variants of LDA (PLDA). Although classification accuracies yielded by the standard LDA applied on the full-band SITS were good (misclassification error rate, MER = 0.13), they were further improved by 23% (MER = 0.10) with ridge PLDA using the enhanced feature set. The most important bands to discriminate the crops of interest were mainly concentrated on the first two image dates of the time series, corresponding to the crops' greenup stage. Despite the high predictor weights provided by the red and near infrared bands, typically used to construct greenness spectral indices, other spectral regions were also found important for the discrimination, such as the shortwave infrared band at 2.11-2.19 μm, sensitive to foliar water changes. These findings support the usefulness of spectrotemporal indices in the context of SITS-based crop type classifications, which until now have been mainly constructed by the arithmetic combination of two bands of the same image date in order to derive greenness temporal profiles like those from the normalized difference vegetation index.
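A sketch of the feature-construction idea, assuming X is a (pixels x bands) array obtained by stacking every band of every image date; each normalized difference index comes from one (possibly cross-date) band pair. scikit-learn's shrinkage LDA is used here as a stand-in for the ridge-penalized LDA of the study.

```python
import numpy as np
from itertools import combinations
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def ndi_features(X):
    """All pairwise normalized difference indices (b_i - b_j) / (b_i + b_j).

    X: (n_pixels, n_bands) array stacking every band of every image date.
    """
    pairs = list(combinations(range(X.shape[1]), 2))
    out = np.empty((X.shape[0], len(pairs)))
    for k, (i, j) in enumerate(pairs):
        out[:, k] = (X[:, i] - X[:, j]) / (X[:, i] + X[:, j] + 1e-12)
    return out

# Shrinkage LDA as a regularized stand-in for ridge PLDA on the enhanced features:
# clf = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto")
# clf.fit(ndi_features(X_train), y_train)
```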
Methods for data classification
Garrity, George [Okemos, MI; Lilburn, Timothy G [Front Royal, VA
2011-10-11
The present invention provides methods for classifying data and uncovering and correcting annotation errors. In particular, the present invention provides a self-organizing, self-correcting algorithm for use in classifying data. Additionally, the present invention provides a method for classifying biological taxa.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yan, H; Chen, Z; Nath, R
Purpose: kV fluoroscopic imaging combined with MV treatment beam imaging has been investigated for intrafractional motion monitoring and correction. It is, however, subject to additional kV imaging dose to normal tissue. To balance tracking accuracy and imaging dose, we previously proposed an adaptive imaging strategy to dynamically decide future imaging type and moments based on motion tracking uncertainty. kV imaging may be used continuously for maximal accuracy or only when the position uncertainty (probability of being out of threshold) is high if a preset imaging dose limit is considered. In this work, we propose more accurate methods to estimate tracking uncertainty through analyzing acquired data in real time. Methods: We simulated the motion tracking process based on a previously developed imaging framework (MV + initial seconds of kV imaging) using real-time breathing data from 42 patients. Motion tracking errors for each time point were collected together with the time point's corresponding features, such as tumor motion speed and 2D tracking error of previous time points, etc. We tested three methods for error uncertainty estimation based on the features: conditional probability distribution, logistic regression modeling, and support vector machine (SVM) classification to detect errors exceeding a threshold. Results: For the conditional probability distribution, polynomial regressions on three features (previous tracking error, prediction quality, and cosine of the angle between the trajectory and the treatment beam) showed strong correlation with the variation (uncertainty) of the mean 3D tracking error and its standard deviation: R-square = 0.94 and 0.90, respectively. The logistic regression and SVM classification successfully identified about 95% of tracking errors exceeding the 2.5 mm threshold. Conclusion: The proposed methods can reliably estimate the motion tracking uncertainty in real time, which can be used to guide adaptive additional imaging to confirm the tumor is within the margin or to initialize motion compensation if it is out of the margin.