Methods for data classification
Garrity, George [Okemos, MI]; Lilburn, Timothy G [Front Royal, VA]
2011-10-11
The present invention provides methods for classifying data and uncovering and correcting annotation errors. In particular, the present invention provides a self-organizing, self-correcting algorithm for use in classifying data. Additionally, the present invention provides a method for classifying biological taxa.
Ahmadian, Alireza; Ay, Mohammad R; Bidgoli, Javad H; Sarkar, Saeed; Zaidi, Habib
2008-10-01
Oral contrast is usually administered in most X-ray computed tomography (CT) examinations of the abdomen and the pelvis as it allows more accurate identification of the bowel and facilitates the interpretation of abdominal and pelvic CT studies. However, the misclassification of contrast medium as high-density bone in CT-based attenuation correction (CTAC) is known to generate artifacts in the attenuation map (μ-map), thus resulting in overcorrection for attenuation of positron emission tomography (PET) images. In this study, we developed an automated algorithm for segmentation and classification of regions containing oral contrast medium to correct for artifacts in CT-attenuation-corrected PET images using the segmented contrast correction (SCC) algorithm. The proposed algorithm consists of two steps: first, high CT number object segmentation using combined region- and boundary-based segmentation and second, object classification to bone and contrast agent using a knowledge-based nonlinear fuzzy classifier. Thereafter, the CT numbers of pixels belonging to the region classified as contrast medium are substituted with their equivalent effective bone CT numbers using the SCC algorithm. The generated CT images are then down-sampled followed by Gaussian smoothing to match the resolution of PET images. A piecewise calibration curve was then used to convert CT pixel values to linear attenuation coefficients at 511 keV. The visual assessment of segmented regions performed by an experienced radiologist confirmed the accuracy of the segmentation and classification algorithms for delineation of contrast-enhanced regions in clinical CT images. The quantitative analysis of generated μ-maps of 21 clinical CT colonoscopy datasets showed an overestimation ranging between 24.4% and 37.3% in the 3D-classified regions depending on their volume and the concentration of contrast medium.
Two PET/CT studies known to be problematic demonstrated the applicability of the technique in clinical setting. More importantly, correction of oral contrast artifacts improved the readability and interpretation of the PET scan and showed substantial decrease of the SUV (104.3%) after correction. An automated segmentation algorithm for classification of irregular shapes of regions containing contrast medium was developed for wider applicability of the SCC algorithm for correction of oral contrast artifacts during the CTAC procedure. The algorithm is being refined and further validated in clinical setting.
A Random Forest-based ensemble method for activity recognition.
Feng, Zengtao; Mo, Lingfei; Li, Meng
2015-01-01
This paper presents a multi-sensor ensemble approach to human physical activity (PA) recognition, using random forest. We designed an ensemble learning algorithm, which integrates several independent Random Forest classifiers based on different sensor feature sets to build a more stable, more accurate and faster classifier for human activity recognition. To evaluate the algorithm, PA data collected from the PAMAP (Physical Activity Monitoring for Aging People), which is a standard, publicly available database, was utilized to train and test. The experimental results show that the algorithm is able to correctly recognize 19 PA types with an accuracy of 93.44%, while the training is faster than others. The ensemble classifier system based on the RF (Random Forest) algorithm can achieve high recognition accuracy and fast calculation.
Automatic analysis and classification of surface electromyography.
Abou-Chadi, F E; Nashar, A; Saad, M
2001-01-01
In this paper, parametric modeling algorithms for surface electromyography (SEMG) that facilitate automatic feature extraction are combined with artificial neural networks (ANN) to provide an integrated system for the automatic analysis and diagnosis of myopathic disorders. Three paradigms of ANN were investigated: the multilayer backpropagation algorithm, the self-organizing feature map algorithm and a probabilistic neural network model. The performance of the three classifiers was compared with that of the older Fisher linear discriminant (FLD) classifier. The results show that the three ANN models give higher performance, with the percentage of correct classification reaching 90%. Poorer diagnostic performance was obtained from the FLD classifier. The system presented here indicates that surface EMG, when properly processed, can be used to provide the physician with a diagnostic assist device.
Development of the Landsat Data Continuity Mission Cloud Cover Assessment Algorithms
Scaramuzza, Pat; Bouchard, M.A.; Dwyer, John L.
2012-01-01
The upcoming launch of the Operational Land Imager (OLI) will start the next era of the Landsat program. However, the Automated Cloud-Cover Assessment (CCA) (ACCA) algorithm used on Landsat 7 requires a thermal band and is thus not suited for OLI. There will be a thermal instrument on the Landsat Data Continuity Mission (LDCM), the Thermal Infrared Sensor, which may not be available during all OLI collections. This illustrates a need for CCA for LDCM in the absence of thermal data. To research possibilities for full-resolution OLI cloud assessment, a global data set of 207 Landsat 7 scenes with manually generated cloud masks was created. It was used to evaluate the ACCA algorithm, showing that the algorithm correctly classified 79.9% of a standard test subset of 3.95 × 10^9 pixels. The data set was also used to develop and validate two successor algorithms for use with OLI data: one derived from an off-the-shelf machine learning package and one based on ACCA but enhanced by a simple neural network. These comprehensive CCA algorithms were shown to correctly classify pixels as cloudy or clear 88.5% and 89.7% of the time, respectively.
NASA Astrophysics Data System (ADS)
Pitarch, Jaime; Ruiz-Verdú, Antonio; Sendra, María. D.; Santoleri, Rosalia
2017-02-01
We studied the performance of the MERIS maximum peak height (MPH) algorithm in the retrieval of chlorophyll-a concentration (CHL), using a matchup data set of Bottom-of-Rayleigh Reflectances (BRR) and CHL from a hypertrophic lake (Albufera de Valencia). The MPH algorithm produced a slight underestimation of CHL in the pixels classified as cyanobacteria (83% of the total) and a strong overestimation in those classified as eukaryotic phytoplankton (17%). In situ biomass data showed that the binary classification of MPH was not appropriate for mixed phytoplankton populations, producing also unrealistic discontinuities in the CHL maps. We recalibrated MPH using our matchup data set and found that a single calibration curve of third degree fitted equally well to all matchups regardless of how they were classified. As a modification to the former approach, we incorporated the Phycocyanin Index (PCI) in the formula, thus taking into account the gradient of phytoplankton composition, which reduced the CHL retrieval errors. By using in situ biomass data, we also proved that PCI was indeed an indicator of cyanobacterial dominance. We applied our recalibration of the MPH algorithm to the whole MERIS data set (2002-2012). Results highlight the usefulness of the MPH algorithm as a tool to monitor eutrophication. The relevance of this fact is higher since MPH does not require a complete atmospheric correction, which often fails over such waters. An adequate flagging or correction of sun glint is advisable though, since the MPH algorithm was sensitive to sun glint.
SVM based colon polyps classifier in a wireless active stereo endoscope.
Ayoub, J; Granado, B; Mhanna, Y; Romain, O
2010-01-01
This work focuses on the recognition of three-dimensional colon polyps captured by an active stereo vision sensor. The detection algorithm consists of an SVM classifier trained on robust feature descriptors. The study is related to Cyclope, a prototype sensor that allows real-time 3D object reconstruction and continues to be optimized technically to improve its classification task by differentiating between hyperplastic and adenomatous polyps. Experimental results were encouraging and show a correct classification rate of approximately 97%. The work contains detailed statistics about the detection rate and the computing complexity. Inspired by the intensity histogram, the work presents a new approach that extracts a set of features based on the depth histogram and combines stereo measurement with SVM classifiers to correctly classify benign and malignant polyps.
Muhlbaier, Michael D; Topalis, Apostolos; Polikar, Robi
2009-01-01
We have previously introduced an incremental learning algorithm Learn(++), which learns novel information from consecutive data sets by generating an ensemble of classifiers with each data set, and combining them by weighted majority voting. However, Learn(++) suffers from an inherent "outvoting" problem when asked to learn a new class omega(new) introduced by a subsequent data set, as earlier classifiers not trained on this class are guaranteed to misclassify omega(new) instances. The collective votes of earlier classifiers, for an inevitably incorrect decision, then outweigh the votes of the new classifiers' correct decision on omega(new) instances--until there are enough new classifiers to counteract the unfair outvoting. This forces Learn(++) to generate an unnecessarily large number of classifiers. This paper describes Learn(++).NC, specifically designed for efficient incremental learning of multiple new classes using significantly fewer classifiers. To do so, Learn (++).NC introduces dynamically weighted consult and vote (DW-CAV), a novel voting mechanism for combining classifiers: individual classifiers consult with each other to determine which ones are most qualified to classify a given instance, and decide how much weight, if any, each classifier's decision should carry. Experiments on real-world problems indicate that the new algorithm performs remarkably well with substantially fewer classifiers, not only as compared to its predecessor Learn(++), but also as compared to several other algorithms recently proposed for similar problems.
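The outvoting effect described in this abstract can be illustrated with a minimal sketch of plain weighted majority voting. This is not the Learn(++) or DW-CAV implementation; the function name `weighted_majority_vote`, the class labels, and all weights are hypothetical values chosen only to show how earlier classifiers can outweigh a correct later one.

```python
from collections import defaultdict


def weighted_majority_vote(votes):
    """Return the label with the largest total weight.

    votes: iterable of (predicted_label, weight) pairs.
    Assumption: weights are non-negative voting weights; how they are
    derived is outside this sketch.
    """
    totals = defaultdict(float)
    for label, weight in votes:
        totals[label] += weight
    return max(totals, key=totals.get)


# Hypothetical earlier classifiers: never trained on class "new",
# so they can only answer with the old labels "A" or "B".
old_votes = [("A", 1.0), ("A", 0.9), ("B", 0.8)]

# Hypothetical later classifier trained on "new"; it votes correctly.
new_votes = [("new", 1.2)]

# One new classifier is outvoted by the combined weight behind "A".
outvoted = weighted_majority_vote(old_votes + new_votes)

# Only after a second new classifier joins does "new" win the vote.
recovered = weighted_majority_vote(old_votes + new_votes * 2)
```

With these illustrative weights, a single correct vote for the new class loses to the combined old votes, which is the situation that forces the ensemble to keep adding classifiers.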
NASA Astrophysics Data System (ADS)
Dementev, A. O.; Dmitriev, E. V.; Kozoderov, V. V.; Egorov, V. D.
2017-10-01
Hyperspectral imaging is a promising, up-to-date technology widely applied for accurate thematic mapping. The presence of a large number of narrow survey channels allows us to use subtle differences in spectral characteristics of objects and to make a more detailed classification than in the case of using standard multispectral data. The difficulties encountered in the processing of hyperspectral images are usually associated with the redundancy of spectral information, which leads to the problem of the curse of dimensionality. Methods currently used for recognizing objects on multispectral and hyperspectral images are usually based on standard supervised base classification algorithms of various complexity. The accuracy of these algorithms can differ significantly depending on the classification task considered. In this paper we study the performance of ensemble classification methods for the problem of classification of forest vegetation. Error correcting output codes and boosting are tested on artificial data and real hyperspectral images. It is demonstrated that boosting gives a more significant improvement when used with simple base classifiers. The accuracy in this case is comparable to that of the error correcting output code (ECOC) classifier with a Gaussian kernel SVM base algorithm. However, the necessity of boosting ECOC with Gaussian kernel SVM is questionable. It is demonstrated that the selected ensemble classifiers allow us to recognize forest species with sufficiently high accuracy, comparable with ground-based forest inventory data.
An expert support system for breast cancer diagnosis using color wavelet features.
Issac Niwas, S; Palanisamy, P; Chibbar, Rajni; Zhang, W J
2012-10-01
Breast cancer can be diagnosed through pathologic assessment of breast tissue samples, such as those obtained by core needle biopsy. The pathologist's analysis of the sample is crucial for the breast cancer patient. In this paper, nuclei in tissue samples are investigated after decomposition by means of the Log-Gabor wavelet in the HSV color domain, and an algorithm is developed to compute the color wavelet features. These features are used for breast cancer diagnosis with a Support Vector Machine (SVM) classifier. The ability of a properly trained SVM to classify patterns correctly makes it particularly suitable for use in an expert system that aids in the diagnosis of cancer tissue samples. The results are compared with other multivariate classifiers such as the Naïve Bayes classifier and an Artificial Neural Network. The overall accuracy of the proposed method using the SVM classifier will be further useful for automation in cancer diagnosis.
A neural network for the identification of measured helicopter noise
NASA Technical Reports Server (NTRS)
Cabell, R. H.; Fuller, C. R.; O'Brien, W. F.
1991-01-01
The results of a preliminary study of the components of a novel acoustic helicopter identification system are described. The identification system uses the relationship between the amplitudes of the first eight harmonics in the main rotor noise spectrum to distinguish between helicopter types. Two classification algorithms are tested: a statistically optimal Bayes classifier and a neural network adaptive classifier. The performance of these classifiers is tested using measured noise of three helicopters. The statistical classifier can correctly identify the helicopter an average of 67 percent of the time, while the neural network is correct an average of 65 percent of the time. These results indicate the need for additional study of the envelope of harmonic amplitudes as a component of a helicopter identification system. Issues concerning the implementation of the neural network classifier, such as training time and structure of the network, are discussed.
NASA Astrophysics Data System (ADS)
Xiao, Zhongxiu
2018-04-01
This paper proposes a method for measuring and correcting the tilt of anti-vibration wind turbines based on a screening algorithm. First, we design a device built around the ADXL203 acceleration sensor; the inclination is measured by installing the device on the tower of the wind turbine as well as in the engine room. Next, a Kalman filter is applied to filter the signal effectively, using a state-space model for signal and noise. We then use MATLAB for simulation. Considering the impact of tower and nacelle vibration on the collected data, the original data and the filtered data are classified and stored by the screening algorithm, and the filtered data are then filtered again to make the output data more accurate. Finally, installation errors are eliminated algorithmically to achieve the tilt correction. The device based on this method offers high precision, low cost, and resistance to vibration. It has a wide range of application and promotion value.
Srinivasan, Pratul P.; Kim, Leo A.; Mettu, Priyatham S.; Cousins, Scott W.; Comer, Grant M.; Izatt, Joseph A.; Farsiu, Sina
2014-01-01
We present a novel fully automated algorithm for the detection of retinal diseases via optical coherence tomography (OCT) imaging. Our algorithm utilizes multiscale histograms of oriented gradient descriptors as feature vectors of a support vector machine based classifier. The spectral domain OCT data sets used for cross-validation consisted of volumetric scans acquired from 45 subjects: 15 normal subjects, 15 patients with dry age-related macular degeneration (AMD), and 15 patients with diabetic macular edema (DME). Our classifier correctly identified 100% of cases with AMD, 100% cases with DME, and 86.67% cases of normal subjects. This algorithm is a potentially impactful tool for the remote diagnosis of ophthalmic diseases. PMID:25360373
NASA Astrophysics Data System (ADS)
Khan, F.; Enzmann, F.; Kersten, M.
2015-12-01
In X-ray computed microtomography (μXCT) image processing is the most important operation prior to image analysis. Such processing mainly involves artefact reduction and image segmentation. We propose a new two-stage post-reconstruction procedure of an image of a geological rock core obtained by polychromatic cone-beam μXCT technology. In the first stage, the beam-hardening (BH) is removed applying a best-fit quadratic surface algorithm to a given image data set (reconstructed slice), which minimizes the BH offsets of the attenuation data points from that surface. The final BH-corrected image is extracted from the residual data, or the difference between the surface elevation values and the original grey-scale values. For the second stage, we propose using a least square support vector machine (a non-linear classifier algorithm) to segment the BH-corrected data as a pixel-based multi-classification task. A combination of the two approaches was used to classify a complex multi-mineral rock sample. The Matlab code for this approach is provided in the Appendix. A minor drawback is that the proposed segmentation algorithm may become computationally demanding in the case of a high dimensional training data set.
Hierarchical Learning of Tree Classifiers for Large-Scale Plant Species Identification.
Fan, Jianping; Zhou, Ning; Peng, Jinye; Gao, Ling
2015-11-01
In this paper, a hierarchical multi-task structural learning algorithm is developed to support large-scale plant species identification, where a visual tree is constructed for organizing large numbers of plant species in a coarse-to-fine fashion and determining the inter-related learning tasks automatically. For a given parent node on the visual tree, it contains a set of sibling coarse-grained categories of plant species or sibling fine-grained plant species, and a multi-task structural learning algorithm is developed to train their inter-related classifiers jointly for enhancing their discrimination power. The inter-level relationship constraint, e.g., a plant image must first be assigned to a parent node (high-level non-leaf node) correctly if it can further be assigned to the most relevant child node (low-level non-leaf node or leaf node) on the visual tree, is formally defined and leveraged to learn more discriminative tree classifiers over the visual tree. Our experimental results have demonstrated the effectiveness of our hierarchical multi-task structural learning algorithm on training more discriminative tree classifiers for large-scale plant species identification.
NASA Astrophysics Data System (ADS)
Chandra, Malavika; Scheiman, James; Simeone, Diane; McKenna, Barbara; Purdy, Julianne; Mycek, Mary-Ann
2010-01-01
Pancreatic adenocarcinoma is one of the leading causes of cancer death, in part because of the inability of current diagnostic methods to reliably detect early-stage disease. We present the first assessment of the diagnostic accuracy of algorithms developed for pancreatic tissue classification using data from fiber optic probe-based bimodal optical spectroscopy, a real-time approach that would be compatible with minimally invasive diagnostic procedures for early cancer detection in the pancreas. A total of 96 fluorescence and 96 reflectance spectra are considered from 50 freshly excised tissue sites-including human pancreatic adenocarcinoma, chronic pancreatitis (inflammation), and normal tissues-on nine patients. Classification algorithms using linear discriminant analysis are developed to distinguish among tissues, and leave-one-out cross-validation is employed to assess the classifiers' performance. The spectral areas and ratios classifier (SpARC) algorithm employs a combination of reflectance and fluorescence data and has the best performance, with sensitivity, specificity, negative predictive value, and positive predictive value for correctly identifying adenocarcinoma being 85, 89, 92, and 80%, respectively.
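The four figures of merit quoted in this abstract follow from the standard confusion-matrix definitions. The sketch below shows those definitions only; the function name `diagnostic_metrics` and the counts are hypothetical and are not the study's data.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard binary diagnostic metrics from confusion-matrix counts.

    tp/fn: diseased sites classified as diseased / as not diseased.
    tn/fp: non-diseased sites classified as not diseased / as diseased.
    Assumption: every denominator is non-zero.
    """
    return {
        "sensitivity": tp / (tp + fn),  # fraction of diseased sites detected
        "specificity": tn / (tn + fp),  # fraction of non-diseased sites cleared
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Hypothetical counts for illustration only (not taken from the study).
metrics = diagnostic_metrics(tp=8, fp=2, tn=18, fn=2)
```

In a leave-one-out setting, each site would be classified by a model trained on all remaining sites, and the counts would be accumulated over those held-out predictions before the metrics are computed.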
Automated spike sorting algorithm based on Laplacian eigenmaps and k-means clustering.
Chah, E; Hok, V; Della-Chiesa, A; Miller, J J H; O'Mara, S M; Reilly, R B
2011-02-01
This study presents a new automatic spike sorting method based on feature extraction by Laplacian eigenmaps combined with k-means clustering. The performance of the proposed method was compared against previously reported algorithms such as principal component analysis (PCA) and amplitude-based feature extraction. Two types of classifier (namely k-means and classification expectation-maximization) were incorporated within the spike sorting algorithms, in order to find a suitable classifier for the feature sets. Simulated data sets and in-vivo tetrode multichannel recordings were employed to assess the performance of the spike sorting algorithms. The results show that the proposed algorithm yields significantly improved performance with mean sorting accuracy of 73% and sorting error of 10% compared to PCA which combined with k-means had a sorting accuracy of 58% and sorting error of 10%. A correction was made to this article on 22 February 2011. The spacing of the title was amended on the abstract page. No changes were made to the article PDF and the print version was unaffected.
NASA Astrophysics Data System (ADS)
Khehra, Baljit Singh; Pharwaha, Amar Partap Singh
2017-04-01
Ductal carcinoma in situ (DCIS) is one type of breast cancer. Clusters of microcalcifications (MCCs) are symptoms of DCIS that are recognized by mammography. Feature selection is the process of choosing an optimal subset of features from a large number of available features in a given problem domain, after feature extraction and before any classification scheme. Feature selection reduces the feature space, which improves the performance of the classifier and decreases the computational burden that many features impose on it. Selecting an optimal subset from a large number of available features is a difficult search problem. For n features, the total number of possible subsets is 2^n. Thus, the selection of an optimal subset of features belongs to the category of NP-hard problems. In this paper, an attempt is made to find the optimal subset of MCC features from all possible subsets using a genetic algorithm (GA), particle swarm optimization (PSO) and biogeography-based optimization (BBO). For simulation, a total of 380 benign and malignant MCC samples have been selected from mammogram images of the DDSM database. A total of 50 features extracted from benign and malignant MCC samples are used in this study. In these algorithms, the fitness function is the correct classification rate of the classifier. A support vector machine is used as the classifier. From the experimental results, it is observed that the PSO-based and BBO-based algorithms perform better than the GA-based algorithm at selecting an optimal subset of features for classifying MCCs as benign or malignant.
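To make the size of the search space concrete: a set of n features has 2^n subsets, counting the empty one. The sketch below enumerates them for a tiny hypothetical feature list and computes the count for the 50 features used in the study. The names `all_subsets` and `features` are illustrative, and exhaustive enumeration is shown only to motivate heuristic search; it is not the GA, PSO, or BBO procedure.

```python
from itertools import combinations


def all_subsets(features):
    """Yield every subset of `features`, from the empty set to the full set."""
    for k in range(len(features) + 1):
        yield from combinations(features, k)


# Hypothetical four-feature example: small enough to enumerate fully.
features = ["f1", "f2", "f3", "f4"]
subsets = list(all_subsets(features))

# With 50 features, exhaustive enumeration is no longer practical.
n_subsets_50 = 2 ** 50
```

Four features give 16 subsets, while 50 features give 1,125,899,906,842,624, which is why a fitness-guided search over subsets is used instead of trying each one.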
Toward detecting deception in intelligent systems
NASA Astrophysics Data System (ADS)
Santos, Eugene, Jr.; Johnson, Gregory, Jr.
2004-08-01
Contemporary decision makers often must choose a course of action using knowledge from several sources. Knowledge may be provided from many diverse sources including electronic sources such as knowledge-based diagnostic or decision support systems or through data mining techniques. As the decision maker becomes more dependent on these electronic information sources, detecting deceptive information from these sources becomes vital to making a correct, or at least more informed, decision. This applies to unintentional disinformation as well as intentional misinformation. Our ongoing research focuses on employing models of deception and deception detection from the fields of psychology and cognitive science to these systems as well as implementing deception detection algorithms for probabilistic intelligent systems. The deception detection algorithms are used to detect, classify and correct attempts at deception. Algorithms for detecting unexpected information rely upon a prediction algorithm from the collaborative filtering domain to predict agent responses in a multi-agent system.
Progressive Classification Using Support Vector Machines
NASA Technical Reports Server (NTRS)
Wagstaff, Kiri; Kocurek, Michael
2009-01-01
An algorithm for progressive classification of data, analogous to progressive rendering of images, makes it possible to compromise between speed and accuracy. This algorithm uses support vector machines (SVMs) to classify data. An SVM is a machine learning algorithm that builds a mathematical model of the desired classification concept by identifying the critical data points, called support vectors. Coarse approximations to the concept require only a few support vectors, while precise, highly accurate models require far more support vectors. Once the model has been constructed, the SVM can be applied to new observations. The cost of classifying a new observation is proportional to the number of support vectors in the model. When computational resources are limited, an SVM of the appropriate complexity can be produced. However, if the constraints are not known when the model is constructed, or if they can change over time, a method for adaptively responding to the current resource constraints is required. This capability is particularly relevant for spacecraft (or any other real-time systems) that perform onboard data analysis. The new algorithm enables the fast, interactive application of an SVM classifier to a new set of data. The classification process achieved by this algorithm is characterized as progressive because a coarse approximation to the true classification is generated rapidly and thereafter iteratively refined. The algorithm uses two SVMs: (1) a fast, approximate one and (2) slow, highly accurate one. New data are initially classified by the fast SVM, producing a baseline approximate classification. For each classified data point, the algorithm calculates a confidence index that indicates the likelihood that it was classified correctly in the first pass. Next, the data points are sorted by their confidence indices and progressively reclassified by the slower, more accurate SVM, starting with the items most likely to be incorrectly classified. 
The user can halt this reclassification process at any point, thereby obtaining the best possible result for a given amount of computation time. Alternatively, the results can be displayed as they are generated, providing the user with real-time feedback about the current accuracy of classification.
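The two-pass procedure described in this abstract can be sketched as follows. This is a simplified illustration, not the NASA implementation: the `fast` and `slow` callables stand in for the two SVMs, the confidence value is whatever the fast model reports, and `budget` (the number of slow reclassifications allowed) is a hypothetical stand-in for the available computation time.

```python
def progressive_classify(points, fast, slow, budget):
    """Classify with a fast model, then refine the least confident items.

    fast(x) -> (label, confidence); slow(x) -> label.
    budget: maximum number of calls to the slow model (assumed parameter).
    """
    first_pass = [fast(x) for x in points]
    labels = [label for label, _ in first_pass]

    # Items most likely to be wrong (lowest confidence) are refined first.
    order = sorted(range(len(points)), key=lambda i: first_pass[i][1])
    for i in order[:budget]:
        labels[i] = slow(points[i])
    return labels


# Hypothetical stand-ins for the two classifiers, on 1-D data.
def fast(x):
    # Coarse rule: threshold at 0; confidence is the distance from it.
    return (1 if x > 0 else 0), abs(x)


def slow(x):
    # More accurate rule (assumed to be the "true" boundary for this demo).
    return 1 if x > 0.3 else 0


points = [-2.0, 0.1, 0.2, 1.5, -0.05]
coarse = progressive_classify(points, fast, slow, budget=0)
partial = progressive_classify(points, fast, slow, budget=2)
refined = progressive_classify(points, fast, slow, budget=3)
```

Raising the budget moves the result from the coarse labels toward the slow model's labels, and the points near the fast model's decision boundary are corrected first.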
Adaptive sleep-wake discrimination for wearable devices.
Karlen, Walter; Floreano, Dario
2011-04-01
Sleep/wake classification systems that rely on physiological signals suffer from intersubject differences that make accurate classification with a single, subject-independent model difficult. To overcome the limitations of intersubject variability, we suggest a novel online adaptation technique that updates the sleep/wake classifier in real time. The objective of the present study was to evaluate the performance of a newly developed adaptive classification algorithm that was embedded on a wearable sleep/wake classification system called SleePic. The algorithm processed ECG and respiratory effort signals for the classification task and applied behavioral measurements (obtained from accelerometer and press-button data) for the automatic adaptation task. When trained as a subject-independent classifier algorithm, the SleePic device was only able to correctly classify 74.94 ± 6.76% of the human-rated sleep/wake data. By using the suggested automatic adaptation method, the mean classification accuracy could be significantly improved to 92.98 ± 3.19%. A subject-independent classifier based on activity data only showed a comparable accuracy of 90.44 ± 3.57%. We demonstrated that subject-independent models used for online sleep-wake classification can successfully be adapted to previously unseen subjects without the intervention of human experts or off-line calibration.
Evaluation of an Algorithm to Predict Menstrual-Cycle Phase at the Time of Injury.
Tourville, Timothy W; Shultz, Sandra J; Vacek, Pamela M; Knudsen, Emily J; Bernstein, Ira M; Tourville, Kelly J; Hardy, Daniel M; Johnson, Robert J; Slauterbeck, James R; Beynnon, Bruce D
2016-01-01
Women are 2 to 8 times more likely to sustain an anterior cruciate ligament (ACL) injury than men, and previous studies indicated an increased risk for injury during the preovulatory phase of the menstrual cycle (MC). However, investigations of risk rely on retrospective classification of MC phase, and no tools for this have been validated. To evaluate the accuracy of an algorithm for retrospectively classifying MC phase at the time of a mock injury based on MC history and salivary progesterone (P4) concentration. Descriptive laboratory study. Research laboratory. Thirty-one healthy female collegiate athletes (age range, 18-24 years) provided serum or saliva (or both) samples at 8 visits over 1 complete MC. Self-reported MC information was obtained on a randomized date (1-45 days) after mock injury, which is the typical timeframe in which researchers have access to ACL-injured study participants. The MC phase was classified using the algorithm as applied in a stand-alone computational fashion and also by 4 clinical experts using the algorithm and additional subjective hormonal history information to help inform their decision. To assess algorithm accuracy, phase classifications were compared with the actual MC phase at the time of mock injury (ascertained using urinary luteinizing hormone tests and serial serum P4 samples). Clinical expert and computed classifications were compared using κ statistics. Fourteen participants (45%) experienced anovulatory cycles. The algorithm correctly classified MC phase for 23 participants (74%): 22 (76%) of 29 who were preovulatory/anovulatory and 1 (50%) of 2 who were postovulatory. Agreement between expert and algorithm classifications ranged from 80.6% (κ = 0.50) to 93% (κ = 0.83). Classifications based on same-day saliva sample and optimal P4 threshold were the same as those based on MC history alone (87.1% correct). Algorithm accuracy varied during the MC but at no time were both sensitivity and specificity levels acceptable. 
These findings raise concerns about the accuracy of previous retrospective MC-phase classification systems, particularly in a population with a high occurrence of anovulatory cycles.
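The κ statistic used in this abstract to compare expert and algorithm classifications corrects raw agreement for the agreement expected by chance. A minimal sketch of Cohen's kappa for two raters follows; the label sequences are invented for illustration and are not the study's data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels.

    Assumption: the raters do not both assign one single label to every
    item, so the chance-agreement term is below 1.
    """
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement from each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    expected = sum(count_a[l] * count_b[l] for l in labels) / (n * n)

    return (observed - expected) / (1 - expected)


# Hypothetical phase labels from an expert and from the algorithm.
expert = ["pre"] * 6 + ["post"] * 4
algorithm = ["pre"] * 5 + ["post"] + ["post"] * 3 + ["pre"]
kappa = cohen_kappa(expert, algorithm)
```

Here the raters agree on 8 of 10 items (80%), but because both label 60% of items "pre", chance alone would produce 52% agreement, so kappa is well below the raw agreement rate.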
Classifying threats with a 14-MeV neutron interrogation system.
Strellis, Dan; Gozani, Tsahi
2005-01-01
SeaPODDS (Sea Portable Drug Detection System) is a non-intrusive tool for detecting concealed threats in hidden compartments of maritime vessels. This system consists of an electronic neutron generator, a gamma-ray detector, a data acquisition computer, and a laptop computer user-interface. Although initially developed to detect narcotics, recent algorithm developments have shown that the system is capable of correctly classifying a threat into one of four distinct categories: narcotic, explosive, chemical weapon, or radiological dispersion device (RDD). Detection of narcotics, explosives, and chemical weapons is based on gamma-ray signatures unique to the chemical elements. Elements are identified by their characteristic prompt gamma-rays induced by fast and thermal neutrons. Detection of RDD is accomplished by detecting gamma-rays emitted by common radioisotopes and nuclear reactor fission products. The algorithm phenomenology for classifying threats into the proper categories is presented here.
Automatic red eye correction and its quality metric
NASA Astrophysics Data System (ADS)
Safonov, Ilia V.; Rychagov, Michael N.; Kang, KiMin; Kim, Sang Ho
2008-01-01
Red-eye artifacts are a troublesome defect of amateur photos. Correcting red eyes during printing without user intervention, and thereby making photos more pleasant for an observer, is an important task. A novel, efficient technique for automatic red-eye correction aimed at photo printers is proposed. The algorithm is independent of face orientation and is capable of detecting paired red eyes as well as single red eyes. The approach is based on 3D tables with typicalness levels for red eyes and human skin tones, and on directional edge detection filters for processing the redness image. Machine learning is applied for feature selection. For classification of red-eye regions, a cascade of classifiers including a Gentle AdaBoost committee of Classification and Regression Trees (CART) is applied. The retouching stage includes desaturation, darkening and blending with the initial image. Several versions of the implementation are possible, trading off detection and correction quality, processing time, and memory volume. A numeric quality criterion for automatic red-eye correction is proposed. This quality metric is constructed by applying the Analytic Hierarchy Process (AHP) to consumer opinions about correction outcomes. The proposed metric helped to choose algorithm parameters via an optimization procedure. Experimental results demonstrate high accuracy and efficiency of the proposed algorithm in comparison with existing solutions.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stinnett, Jacob; Sullivan, Clair J.; Xiong, Hao
Low-resolution isotope identifiers are widely deployed for nuclear security purposes, but these detectors currently demonstrate problems in making correct identifications in many typical usage scenarios. While there are many hardware alternatives and improvements that can be made, performance on existing low resolution isotope identifiers should be able to be improved by developing new identification algorithms. We have developed a wavelet-based peak extraction algorithm and an implementation of a Bayesian classifier for automated peak-based identification. The peak extraction algorithm has been extended to compute uncertainties in the peak area calculations. To build empirical joint probability distributions of the peak areas and uncertainties, a large set of spectra were simulated in MCNP6 and processed with the wavelet-based feature extraction algorithm. Kernel density estimation was then used to create a new component of the likelihood function in the Bayesian classifier. Furthermore, identification performance is demonstrated on a variety of real low-resolution spectra, including Category I quantities of special nuclear material.
Spectral band selection for classification of soil organic matter content
NASA Technical Reports Server (NTRS)
Henderson, Tracey L.; Szilagyi, Andrea; Baumgardner, Marion F.; Chen, Chih-Chien Thomas; Landgrebe, David A.
1989-01-01
This paper describes the spectral-band-selection (SBS) algorithm of Chen and Landgrebe (1987, 1988, and 1989) and uses the algorithm to classify the organic matter content in the earth's surface soil. The effectiveness of the algorithm was evaluated comparing the results of classification of the soil organic matter using SBS bands with those obtained using Landsat MSS bands and TM bands, showing that the algorithm was successful in finding important spectral bands for classification of organic matter content. Using the calculated bands, the probabilities of correct classification for climate-stratified data were found to range from 0.910 to 0.980.
Sea ice type maps from Alaska synthetic aperture radar facility imagery: An assessment
NASA Technical Reports Server (NTRS)
Fetterer, Florence M.; Gineris, Denise; Kwok, Ronald
1994-01-01
Synthetic aperture radar (SAR) imagery received at the Alaskan SAR Facility is routinely and automatically classified on the Geophysical Processor System (GPS) to create ice type maps. We evaluated the wintertime performance of the GPS classification algorithm by comparing ice type percentages from supervised classification with percentages from the algorithm. The root mean square (RMS) difference for multiyear ice is about 6%, while the inconsistency in supervised classification is about 3%. The algorithm separates first-year from multiyear ice well, although it sometimes fails to correctly classify new ice and open water owing to the wide distribution of backscatter for these classes. Our results imply a high degree of accuracy and consistency in the growing archive of multiyear and first-year ice distribution maps. These results have implications for heat and mass balance studies which are furthered by the ability to accurately characterize ice type distributions over a large part of the Arctic.
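As a worked illustration of the RMS difference reported above, the following sketch compares two lists of ice-type percentages. The function name `rms_difference` and the percentages are invented for the example and are not values from the assessment.

```python
import math

def rms_difference(reference, estimate):
    """Root mean square of the element-wise differences between two lists."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(r - e) ** 2 for r, e in zip(reference, estimate)]
    return math.sqrt(sum(squared) / len(squared))

# Made-up multiyear-ice percentages for four images (illustration only).
supervised_pct = [60.0, 72.0, 55.0, 80.0]
algorithm_pct = [66.0, 70.0, 49.0, 82.0]
rms = rms_difference(supervised_pct, algorithm_pct)
```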
Classification of voting algorithms for N-version software
NASA Astrophysics Data System (ADS)
Tsarev, R. Yu; Durmuş, M. S.; Üstoglu, I.; Morozov, V. A.
2018-05-01
A voting algorithm in N-version software is a crucial component that evaluates the execution of each of the N versions and determines the correct result. The result of the voting algorithm therefore determines the outcome of the N-version software in general, so the choice of the voting algorithm is a vital issue. Many voting algorithms have already been developed, and they may be selected for implementation based on the specifics of the analysis of input data. However, the voting algorithms applied in N-version software have not been classified. This article presents an overview of classic and recent voting algorithms used in N-version software and the authors' classification of the voting algorithms. Moreover, the steps of the voting algorithms are presented and the distinctive features of the voting algorithms in N-version software are defined.
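One classic scheme of the kind surveyed here is exact majority voting. The sketch below is a generic illustration, not the authors' classification or any specific algorithm from the article; the function name `majority_vote` is hypothetical, and version outputs are assumed to be hashable values compared for exact equality.

```python
from collections import Counter

def majority_vote(outputs):
    """Return the value produced by a strict majority of versions, else None."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    # Require more than half of the versions to agree before accepting a result.
    return value if count > len(outputs) / 2 else None

# Three versions, one of which disagrees.
result = majority_vote([42, 42, 41])
```

Returning `None` when no majority exists lets the caller decide how to handle disagreement, for example by falling back to another voter.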
Active Learning with Irrelevant Examples
NASA Technical Reports Server (NTRS)
Wagstaff, Kiri; Mazzoni, Dominic
2009-01-01
An improved active learning method has been devised for training data classifiers. One example of a data classifier is the algorithm used by the United States Postal Service since the 1960s to recognize scans of handwritten digits for processing zip codes. Active learning algorithms enable rapid training with minimal investment of time on the part of human experts to provide training examples consisting of correctly classified (labeled) input data. They function by identifying which examples would be most profitable for a human expert to label. The goal is to maximize classifier accuracy while minimizing the number of examples the expert must label. Although there are several well-established methods for active learning, they may not operate well when irrelevant examples are present in the data set. That is, they may select an item for labeling that the expert simply cannot assign to any of the valid classes. In the context of classifying handwritten digits, the irrelevant items may include stray marks, smudges, and mis-scans. Querying the expert about these items results in wasted time or erroneous labels, if the expert is forced to assign the item to one of the valid classes. In contrast, the new algorithm provides a specific mechanism for avoiding querying the irrelevant items. This algorithm has two components: an active learner (which could be a conventional active learning algorithm) and a relevance classifier. The combination of these components yields a method, denoted Relevance Bias, that enables the active learner to avoid querying irrelevant data so as to increase its learning rate and efficiency when irrelevant items are present. The algorithm collects irrelevant data in a set of rejected examples, then trains the relevance classifier to distinguish between labeled (relevant) training examples and the rejected ones. 
The active learner combines its ranking of the items with the probability that they are relevant to yield a final decision about which item to present to the expert for labeling. Experiments on several data sets have demonstrated that the Relevance Bias approach significantly decreases the number of irrelevant items queried and also accelerates learning speed.
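The final selection step can be sketched as follows. The abstract does not state how the ranking and the relevance probability are combined; the product used here is an assumption made for illustration, and the function name `select_query` and the candidate tuples are hypothetical.

```python
def select_query(candidates):
    """Pick the item to show the expert.

    candidates: list of (item_id, informativeness, p_relevant) tuples, where
    informativeness is the active learner's ranking score and p_relevant is
    the relevance classifier's estimate. Their product is an assumed rule.
    """
    if not candidates:
        raise ValueError("no candidates to choose from")
    return max(candidates, key=lambda c: c[1] * c[2])[0]

# Made-up pool: a smudge looks informative but is probably irrelevant.
pool = [("smudge", 0.9, 0.1), ("digit_7", 0.6, 0.95), ("digit_1", 0.2, 0.99)]
chosen = select_query(pool)
```

With these made-up scores the smudge is skipped even though it has the highest informativeness, which mirrors the behaviour the abstract describes.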
NASA Astrophysics Data System (ADS)
Sokolov, Anton; Gengembre, Cyril; Dmitriev, Egor; Delbarre, Hervé
2017-04-01
This work considers the classification of local atmospheric meteorological events in a coastal area, such as sea breezes, fogs and storms. In-situ meteorological data such as wind speed and direction, temperature, humidity and turbulence are used as predictors. Local atmospheric events of 2013-2014 were analysed manually to train classification algorithms for the coastal area of the English Channel at Dunkirk (France). Then, ultrasonic anemometer data and LIDAR wind profiler data were used as predictors. Several algorithms were applied to determine meteorological events from local data, including a decision tree, the nearest neighbour classifier and a support vector machine. The classification algorithms were compared, and the most important predictors for each event type were determined. It was shown that in more than 80 percent of the cases machine learning algorithms detect the meteorological class correctly. We expect that this methodology could also be applied to classify events from climatological in-situ data or from modelling data. It allows the frequency of each event to be estimated in the perspective of climate change.
Guinness, Robert E
2015-04-28
This paper presents the results of research on the use of smartphone sensors (namely, GPS and accelerometers), geospatial information (points of interest, such as bus stops and train stations) and machine learning (ML) to sense mobility contexts. Our goal is to develop techniques to continuously and automatically detect a smartphone user's mobility activities, including walking, running, driving and using a bus or train, in real-time or near-real-time (<5 s). We investigated a wide range of supervised learning techniques for classification, including decision trees (DT), support vector machines (SVM), naive Bayes classifiers (NB), Bayesian networks (BN), logistic regression (LR), artificial neural networks (ANN) and several instance-based classifiers (KStar, LWL and IBk). Applying ten-fold cross-validation, the best performers in terms of correct classification rate (i.e., recall) were DT (96.5%), BN (90.9%), LWL (95.5%) and KStar (95.6%). In particular, the DT-algorithm RandomForest exhibited the best overall performance. After a feature selection process for a subset of algorithms, the performance was improved slightly. Furthermore, after tuning the parameters of RandomForest, performance improved to above 97.5%. Lastly, we measured the computational complexity of the classifiers, in terms of central processing unit (CPU) time needed for classification, to provide a rough comparison between the algorithms in terms of battery usage requirements. As a result, the classifiers can be ranked from lowest to highest complexity (i.e., computational cost) as follows: SVM, ANN, LR, BN, DT, NB, IBk, LWL and KStar. The instance-based classifiers take considerably more computational time than the non-instance-based classifiers, whereas the slowest non-instance-based classifier (NB) required about five times as much CPU time as the fastest classifier (SVM).
The above results suggest that DT algorithms are excellent candidates for detecting mobility contexts in smartphones, both in terms of performance and computational complexity.
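The ten-fold cross-validation used to obtain the correct classification rates can be sketched in plain Python. This is a generic illustration and not the study's evaluation code: the interleaved fold assignment, the helper names, the nearest-mean stand-in classifier and the speed values are all assumptions made for the example.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists for k interleaved folds."""
    folds = [list(range(start, n_samples, k)) for start in range(k)]
    for held_out, test in enumerate(folds):
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test

def cross_validated_rate(samples, labels, fit, k=10):
    """Fraction of samples classified correctly while held out of training."""
    correct = 0
    for train, test in k_fold_indices(len(samples), k):
        predict = fit([samples[i] for i in train], [labels[i] for i in train])
        correct += sum(predict(samples[i]) == labels[i] for i in test)
    return correct / len(samples)

def fit_nearest_mean(xs, ys):
    """Hypothetical stand-in classifier: nearest class mean on one feature."""
    means = {}
    for label in set(ys):
        values = [x for x, y in zip(xs, ys) if y == label]
        means[label] = sum(values) / len(values)
    return lambda x: min(means, key=lambda label: abs(x - means[label]))

# Made-up speeds in km/h (illustration only, not data from the study).
speeds = [float(v) for v in range(10)] + [float(v) for v in range(30, 40)]
modes = ["walking"] * 10 + ["driving"] * 10
rate = cross_validated_rate(speeds, modes, fit_nearest_mean)
```

Each sample is predicted exactly once, by a model that never saw it during training, so the returned rate is an out-of-sample estimate.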
Guinness, Robert E.
2015-01-01
This paper presents the results of research on the use of smartphone sensors (namely, GPS and accelerometers), geospatial information (points of interest, such as bus stops and train stations) and machine learning (ML) to sense mobility contexts. Our goal is to develop techniques to continuously and automatically detect a smartphone user's mobility activities, including walking, running, driving and using a bus or train, in real-time or near-real-time (<5 s). We investigated a wide range of supervised learning techniques for classification, including decision trees (DT), support vector machines (SVM), naive Bayes classifiers (NB), Bayesian networks (BN), logistic regression (LR), artificial neural networks (ANN) and several instance-based classifiers (KStar, LWL and IBk). Applying ten-fold cross-validation, the best performers in terms of correct classification rate (i.e., recall) were DT (96.5%), BN (90.9%), LWL (95.5%) and KStar (95.6%). In particular, the DT-algorithm RandomForest exhibited the best overall performance. After a feature selection process for a subset of algorithms, the performance was improved slightly. Furthermore, after tuning the parameters of RandomForest, performance improved to above 97.5%. Lastly, we measured the computational complexity of the classifiers, in terms of central processing unit (CPU) time needed for classification, to provide a rough comparison between the algorithms in terms of battery usage requirements. As a result, the classifiers can be ranked from lowest to highest complexity (i.e., computational cost) as follows: SVM, ANN, LR, BN, DT, NB, IBk, LWL and KStar. The instance-based classifiers take considerably more computational time than the non-instance-based classifiers, whereas the slowest non-instance-based classifier (NB) required about five times as much CPU time as the fastest classifier (SVM).
The above results suggest that DT algorithms are excellent candidates for detecting mobility contexts in smartphones, both in terms of performance and computational complexity. PMID:25928060
Trong Bui, Duong; Nguyen, Nhan Duc; Jeong, Gu-Min
2018-06-25
Human activity recognition and pedestrian dead reckoning are fields of interest because of their important uses in daily-life healthcare. Currently, these fields face many challenges, one of which is the lack of a robust algorithm with high performance. This paper proposes a new method to implement a robust step detection and adaptive distance estimation algorithm based on the classification of five daily wrist activities during walking at various speeds using a smart band. The key idea is that the non-parametric adaptive distance estimator is applied after two activity classifiers and a robust step detector. In this study, two classifiers perform two phases of recognizing five wrist activities during walking. Then, a robust step detection algorithm, which is integrated with an adaptive threshold and a peak and valley correction algorithm, is applied to the classified activities to detect the walking steps. In addition, the misclassified activities are fed back to the previous layer. Finally, three adaptive distance estimators, which are based on a non-parametric model of the average walking speed, calculate the length of each strike. The experimental results show that the average classification accuracy is about 99%, and the accuracy of the step detection is 98.7%. The error of the estimated distance is 2.2-4.2% depending on the type of wrist activity.
Automated extraction and classification of time-frequency contours in humpback vocalizations.
Ou, Hui; Au, Whitlow W L; Zurk, Lisa M; Lammers, Marc O
2013-01-01
A time-frequency contour extraction and classification algorithm was created to analyze humpback whale vocalizations. The algorithm automatically extracted contours of whale vocalization units by searching for gray-level discontinuities in the spectrogram images. The unit-to-unit similarity was quantified by cross-correlating the contour lines. A library of distinctive humpback units was then generated by applying an unsupervised, cluster-based learning algorithm. The purpose of this study was to provide a fast and automated feature selection tool to describe the vocal signatures of animal groups. This approach could benefit a variety of applications such as species description, identification, and evolution of song structures. The algorithm was tested on humpback whale song data recorded at various locations in Hawaii from 2002 to 2003. Results presented in this paper showed low probability of false alarm (0%-4%) under noisy environments with small boat vessels and snapping shrimp. The classification algorithm was tested on a controlled set of 30 units forming six unit types, and all the units were correctly classified. In a case study on humpback data collected in the Auau Channel, Hawaii, in 2002, the algorithm extracted 951 units, which were classified into 12 distinctive types.
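The unit-to-unit similarity step can be illustrated with a minimal sketch, assuming each contour is a list of frequency values sampled at equal time steps and that similarity is taken as the peak of the normalized cross-correlation over all lags. The abstract does not give these details, so both the representation and the function name `contour_similarity` are assumptions.

```python
import math

def contour_similarity(a, b):
    """Peak normalized cross-correlation of two mean-removed contours."""
    if not a or not b:
        raise ValueError("contours must be non-empty")

    def centered(x):
        mean = sum(x) / len(x)
        return [v - mean for v in x]

    a, b = centered(a), centered(b)
    norm = math.sqrt(sum(v * v for v in a) * sum(v * v for v in b))
    if norm == 0.0:
        return 0.0  # a flat contour carries no shape information
    best = -1.0
    # Slide b across a and keep the highest normalized correlation.
    for lag in range(-(len(b) - 1), len(a)):
        total = 0.0
        for i, v in enumerate(a):
            j = i - lag
            if 0 <= j < len(b):
                total += v * b[j]
        best = max(best, total / norm)
    return best

# Made-up contours in Hz: an upsweep compared with itself and with a downsweep.
same = contour_similarity([100.0, 200.0, 300.0], [100.0, 200.0, 300.0])
opposite = contour_similarity([100.0, 200.0, 300.0], [300.0, 200.0, 100.0])
```

A pairwise matrix of such scores is the kind of input a cluster-based learning step could then group into unit types.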
NASA Technical Reports Server (NTRS)
Nalepka, R. F. (Principal Investigator); Cicone, R. C.; Stinson, J. L.; Balon, R. J.
1977-01-01
The author has identified the following significant results. Two examples of haze correction algorithms were tested: CROP-A and XSTAR. The CROP-A was tested in a unitemporal mode on data collected in 1973-74 over ten sample segments in Kansas. Because of the uniformly low level of haze present in these segments, no conclusion could be reached about CROP-A's ability to compensate for haze. It was noted, however, that in some cases CROP-A made serious errors which actually degraded classification performance. The haze correction algorithm XSTAR was tested in a multitemporal mode on 1975-76 LACIE sample segment data over 23 blind sites in Kansas and 18 sample segments in North Dakota, providing a wide range of haze levels and other conditions for algorithm evaluation. It was found that this algorithm substantially improved signature extension classification accuracy when a sum-of-likelihoods classifier was used with an alien rejection threshold.
Retinopathy of Prematurity-assist: Novel Software for Detecting Plus Disease
Pour, Elias Khalili; Pourreza, Hamidreza; Zamani, Kambiz Ameli; Mahmoudi, Alireza; Sadeghi, Arash Mir Mohammad; Shadravan, Mahla; Karkhaneh, Reza; Pour, Ramak Rouhi
2017-01-01
Purpose: To design software with a novel algorithm that analyzes the tortuosity and vascular dilatation in fundal images of retinopathy of prematurity (ROP) patients with acceptable accuracy for detecting plus disease. Methods: Eighty-seven well-focused fundal images taken with RetCam were classified into three groups (plus, non-plus, and pre-plus) by agreement between three ROP experts. Automated algorithms in this study were designed based on two methods, the curvature measure and the distance transform, for assessment of tortuosity and vascular dilatation, respectively, the two major parameters of plus disease detection. Results: Thirty-eight plus, 12 pre-plus, and 37 non-plus images, which were classified by three experts, were tested by an automated algorithm, and the software evaluated the correct grouping of images in comparison to expert voting with three different classifiers: k-nearest neighbor, support vector machine and multilayer perceptron network. The plus, pre-plus, and non-plus images were analyzed with 72.3%, 83.7%, and 84.4% accuracy, respectively. Conclusions: The new automated algorithm used in this pilot scheme for diagnosis and screening of patients with plus ROP has acceptable accuracy. With more improvements, it may become particularly useful, especially in centers without a skilled person in the ROP field. PMID:29022295
Yang, Xiaofeng; Wu, Shengyong; Sechopoulos, Ioannis; Fei, Baowei
2012-10-01
To develop and test an automated algorithm to classify the different tissues present in dedicated breast CT images. The original CT images are first corrected to overcome cupping artifacts, and then a multiscale bilateral filter is used to reduce noise while keeping edge information on the images. As skin and glandular tissues have similar CT values on breast CT images, morphologic processing is used to identify the skin mask based on its position information. A modified fuzzy C-means (FCM) classification method is then used to classify breast tissue as fat and glandular tissue. By combining the results of the skin mask with the FCM, the breast tissue is classified as skin, fat, and glandular tissue. To evaluate the authors' classification method, the authors use Dice overlap ratios to compare the results of the automated classification to those obtained by manual segmentation on eight patient images. The correction method was able to correct the cupping artifacts and improve the quality of the breast CT images. For glandular tissue, the overlap ratios between the authors' automatic classification and manual segmentation were 91.6% ± 2.0%. A cupping artifact correction method and an automatic classification method were applied and evaluated for high-resolution dedicated breast CT images. Breast tissue classification can provide quantitative measurements regarding breast composition, density, and tissue distribution.
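The Dice overlap ratio used for evaluation is 2|A ∩ B| / (|A| + |B|) for two segmentations A and B. The sketch below computes it for masks represented as sets of pixel coordinates; that representation, the function name `dice_overlap` and the toy masks are assumptions for illustration.

```python
def dice_overlap(mask_a, mask_b):
    """Dice overlap ratio between two masks given as sets of pixel coordinates."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0  # two empty masks are treated as identical
    return 2 * len(a & b) / (len(a) + len(b))

# Made-up glandular-tissue masks (illustration only).
automatic = {(0, 0), (0, 1), (1, 1)}
manual = {(0, 1), (1, 1), (2, 1)}
dice = dice_overlap(automatic, manual)
```

The ratio is 1.0 for identical masks and 0.0 for disjoint ones, so a reported 91.6% corresponds to a high but not perfect overlap.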
Yang, Xiaofeng; Wu, Shengyong; Sechopoulos, Ioannis; Fei, Baowei
2012-01-01
Purpose: To develop and test an automated algorithm to classify the different tissues present in dedicated breast CT images. Methods: The original CT images are first corrected to overcome cupping artifacts, and then a multiscale bilateral filter is used to reduce noise while keeping edge information on the images. As skin and glandular tissues have similar CT values on breast CT images, morphologic processing is used to identify the skin mask based on its position information. A modified fuzzy C-means (FCM) classification method is then used to classify breast tissue as fat and glandular tissue. By combining the results of the skin mask with the FCM, the breast tissue is classified as skin, fat, and glandular tissue. To evaluate the authors’ classification method, the authors use Dice overlap ratios to compare the results of the automated classification to those obtained by manual segmentation on eight patient images. Results: The correction method was able to correct the cupping artifacts and improve the quality of the breast CT images. For glandular tissue, the overlap ratios between the authors’ automatic classification and manual segmentation were 91.6% ± 2.0%. Conclusions: A cupping artifact correction method and an automatic classification method were applied and evaluated for high-resolution dedicated breast CT images. Breast tissue classification can provide quantitative measurements regarding breast composition, density, and tissue distribution. PMID:23039675
Optical Detection of Degraded Therapeutic Proteins.
Herrington, William F; Singh, Gajendra P; Wu, Di; Barone, Paul W; Hancock, William; Ram, Rajeev J
2018-03-23
The quality of therapeutic proteins such as hormones, subunit and conjugate vaccines, and antibodies is critical to the safety and efficacy of modern medicine. Identifying malformed proteins at the point-of-care can prevent adverse immune reactions in patients; this is of special concern when there is an insecure supply chain resulting in the delivery of degraded, or even counterfeit, drug product. Identification of degraded protein, for example human growth hormone, is demonstrated by applying automated anomaly detection algorithms. Detection of the degraded protein differs from previous applications of machine-learning and classification to spectral analysis: only example spectra of genuine, high-quality drug products are used to construct the classifier. The algorithm is tested on Raman spectra acquired on protein dilutions typical of formulated drug product and at sample volumes of 25 µL, below the typical overfill (waste) volumes present in vials of injectable drug product. The algorithm is demonstrated to correctly classify anomalous recombinant human growth hormone (rhGH) with 92% sensitivity and 98% specificity even when the algorithm has only previously encountered high-quality drug product.
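The idea of building a detector from genuine spectra only can be sketched with a very simple one-class model. This is not the authors' algorithm, which the abstract does not specify: the per-channel z-score rule, the threshold of 3, the function names and the three-channel toy spectra are all assumptions made for illustration.

```python
import statistics

def fit_genuine(spectra):
    """Per-channel mean and sample standard deviation of genuine spectra."""
    channels = list(zip(*spectra))
    means = [statistics.mean(c) for c in channels]
    # Floor the spread so a constant channel cannot cause division by zero.
    stdevs = [max(statistics.stdev(c), 1e-12) for c in channels]
    return means, stdevs

def anomaly_score(model, spectrum):
    """Largest absolute z-score of any channel against the genuine model."""
    means, stdevs = model
    return max(abs(x - m) / s for x, m, s in zip(spectrum, means, stdevs))

def is_anomalous(model, spectrum, threshold=3.0):
    return anomaly_score(model, spectrum) > threshold

# Made-up three-channel intensities (illustration only).
genuine = [[1.0, 2.0, 3.0], [1.1, 2.1, 2.9], [0.9, 1.9, 3.1]]
model = fit_genuine(genuine)
flag_good = is_anomalous(model, [1.05, 2.0, 3.0])
flag_bad = is_anomalous(model, [1.0, 2.6, 3.0])
```

Because no degraded examples are needed for training, a detector of this kind can in principle flag kinds of degradation it has never seen, which is the property the abstract emphasizes.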
Stinnett, Jacob; Sullivan, Clair J.; Xiong, Hao
2017-03-02
Low-resolution isotope identifiers are widely deployed for nuclear security purposes, but these detectors currently demonstrate problems in making correct identifications in many typical usage scenarios. While there are many hardware alternatives and improvements that can be made, performance on existing low resolution isotope identifiers should be able to be improved by developing new identification algorithms. We have developed a wavelet-based peak extraction algorithm and an implementation of a Bayesian classifier for automated peak-based identification. The peak extraction algorithm has been extended to compute uncertainties in the peak area calculations. To build empirical joint probability distributions of the peak areas and uncertainties, a large set of spectra were simulated in MCNP6 and processed with the wavelet-based feature extraction algorithm. Kernel density estimation was then used to create a new component of the likelihood function in the Bayesian classifier. Furthermore, identification performance is demonstrated on a variety of real low-resolution spectra, including Category I quantities of special nuclear material.
ERIC Educational Resources Information Center
Wang, Chun
2013-01-01
Cognitive diagnostic computerized adaptive testing (CD-CAT) purports to combine the strengths of both CAT and cognitive diagnosis. Cognitive diagnosis models aim at classifying examinees into the correct mastery profile group so as to pinpoint the strengths and weakness of each examinee whereas CAT algorithms choose items to determine those…
Genome-Wide Comparative Gene Family Classification
Frech, Christian; Chen, Nansheng
2010-01-01
Correct classification of genes into gene families is important for understanding gene function and evolution. Although gene families of many species have been resolved both computationally and experimentally with high accuracy, gene family classification in most newly sequenced genomes has not been done with the same high standard. This project has been designed to develop a strategy to effectively and accurately classify gene families across genomes. We first examine and compare the performance of computer programs developed for automated gene family classification. We demonstrate that some programs, including the hierarchical average-linkage clustering algorithm MC-UPGMA and the popular Markov clustering algorithm TRIBE-MCL, can reconstruct manual curation of gene families accurately. However, their performance is highly sensitive to parameter setting, i.e. different gene families require different program parameters for correct resolution. To circumvent the problem of parameterization, we have developed a comparative strategy for gene family classification. This strategy takes advantage of existing curated gene families of reference species to find suitable parameters for classifying genes in related genomes. To demonstrate the effectiveness of this novel strategy, we use TRIBE-MCL to classify chemosensory and ABC transporter gene families in C. elegans and its four sister species. We conclude that fully automated programs can establish biologically accurate gene families if parameterized accordingly. Comparative gene family classification finds optimal parameters automatically, thus allowing rapid insights into gene families of newly sequenced species. PMID:20976221
Bilayer segmentation of webcam videos using tree-based classifiers.
Yin, Pei; Criminisi, Antonio; Winn, John; Essa, Irfan
2011-01-01
This paper presents an automatic segmentation algorithm for video frames captured by a (monocular) webcam that closely approximates depth segmentation from a stereo camera. The frames are segmented into foreground and background layers that comprise a subject (participant) and other objects and individuals. The algorithm produces correct segmentations even in the presence of large background motion with a nearly stationary foreground. This research makes three key contributions: First, we introduce a novel motion representation, referred to as "motons," inspired by research in object recognition. Second, we propose estimating the segmentation likelihood from the spatial context of motion. The estimation is efficiently learned by random forests. Third, we introduce a general taxonomy of tree-based classifiers that facilitates both theoretical and experimental comparisons of several known classification algorithms and generates new ones. In our bilayer segmentation algorithm, diverse visual cues such as motion, motion context, color, contrast, and spatial priors are fused by means of a conditional random field (CRF) model. Segmentation is then achieved by binary min-cut. Experiments on many sequences of our videochat application demonstrate that our algorithm, which requires no initialization, is effective in a variety of scenes, and the segmentation results are comparable to those obtained by stereo systems.
NASA Astrophysics Data System (ADS)
Ruske, S. T.; Topping, D. O.; Foot, V. E.; Kaye, P. H.; Stanley, W. R.; Morse, A. P.; Crawford, I.; Gallagher, M. W.
2016-12-01
Characterisation of bio-aerosols has important implications within the Environment and Public Health sectors. Recent developments in Ultra-Violet Light Induced Fluorescence (UV-LIF) detectors such as the Wideband Integrated bio-aerosol Spectrometer (WIBS) and the newly introduced Multiparameter bio-aerosol Spectrometer (MBS) have allowed for the real-time collection of fluorescence, size and morphology measurements for the purpose of discriminating between bacteria, fungal spores and pollen. This new generation of instruments has enabled ever-larger data sets to be compiled with the aim of studying more complex environments, yet the algorithms used for species classification remain largely unvalidated. It is therefore imperative that we validate the performance of different algorithms that can be used for the task of classification, which is the focus of this study. For unsupervised learning we test Hierarchical Agglomerative Clustering with various different linkages. For supervised learning, ten methods were tested, including decision trees; ensemble methods (Random Forests, Gradient Boosting and AdaBoost); two implementations of support vector machines (libsvm and liblinear); Gaussian methods (Gaussian naïve Bayes, quadratic and linear discriminant analysis); and the k-nearest neighbours algorithm. The methods were applied to two different data sets measured using a new Multiparameter bio-aerosol Spectrometer. We find that clustering, in general, performs slightly worse than the supervised learning methods, correctly classifying, at best, only 72.7 and 91.1 percent for the two data sets. For supervised learning the gradient boosting algorithm was found to be the most effective, on average correctly classifying 88.1 and 97.8 percent of the testing data respectively across the two data sets. We discuss the wider relevance of these results with regards to challenging existing classification in real-world environments.
A False Alarm Reduction Method for a Gas Sensor Based Electronic Nose
Rahman, Mohammad Mizanur; Suksompong, Prapun; Toochinda, Pisanu; Taparugssanagorn, Attaphongse
2017-01-01
Electronic noses (E-Noses) are becoming popular for food and fruit quality assessment due to their robustness and repeated usability without fatigue, unlike human experts. An E-Nose equipped with classification algorithms and having open ended classification boundaries such as the k-nearest neighbor (k-NN), support vector machine (SVM), and multilayer perceptron neural network (MLPNN), are found to suffer from false classification errors of irrelevant odor data. To reduce false classification and misclassification errors, and to improve correct rejection performance; algorithms with a hyperspheric boundary, such as a radial basis function neural network (RBFNN) and generalized regression neural network (GRNN) with a Gaussian activation function in the hidden layer should be used. The simulation results presented in this paper show that GRNN has more correct classification efficiency and false alarm reduction capability compared to RBFNN. As the design of a GRNN and RBFNN is complex and expensive due to large numbers of neuron requirements, a simple hyperspheric classification method based on minimum, maximum, and mean (MMM) values of each class of the training dataset was presented. The MMM algorithm was simple and found to be fast and efficient in correctly classifying data of training classes, and correctly rejecting data of extraneous odors, and thereby reduced false alarms. PMID:28895910
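A closed-boundary classifier built from per-class minimum, maximum, and mean values can be sketched as follows. The abstract does not spell out the MMM decision rule, so this is an assumed variant: a sample is accepted by a class only if every feature lies within that class's training range, ties are broken by distance to the class mean, and anything outside all ranges is rejected. Note that the range test is an axis-aligned box, a simplification of a true hyperspheric boundary. All names and sensor values are hypothetical.

```python
import math

def fit_mmm(training):
    """training: {label: [feature_vector, ...]} -> {label: (mins, maxs, means)}."""
    model = {}
    for label, rows in training.items():
        columns = list(zip(*rows))
        model[label] = (
            [min(c) for c in columns],
            [max(c) for c in columns],
            [sum(c) / len(c) for c in columns],
        )
    return model

def classify_mmm(model, sample):
    """Return the accepted label nearest by class mean, or None if rejected."""
    best_label, best_dist = None, math.inf
    for label, (mins, maxs, means) in model.items():
        inside = all(lo <= x <= hi for lo, x, hi in zip(mins, sample, maxs))
        if inside:
            dist = math.dist(sample, means)
            if dist < best_dist:
                best_label, best_dist = label, dist
    return best_label

# Made-up two-sensor readings for two fruit odors (illustration only).
training = {
    "banana": [[1.0, 2.0], [1.2, 2.2], [0.9, 1.9]],
    "mango": [[3.0, 0.5], [3.3, 0.7], [2.9, 0.4]],
}
model = fit_mmm(training)
known = classify_mmm(model, [1.1, 2.0])
extraneous = classify_mmm(model, [5.0, 5.0])
```

Unlike an open-ended boundary, this rule has an explicit "none of the above" outcome, which is what allows extraneous odors to be rejected rather than forced into a trained class.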
A False Alarm Reduction Method for a Gas Sensor Based Electronic Nose.
Rahman, Mohammad Mizanur; Charoenlarpnopparut, Chalie; Suksompong, Prapun; Toochinda, Pisanu; Taparugssanagorn, Attaphongse
2017-09-12
Electronic noses (E-Noses) are becoming popular for food and fruit quality assessment due to their robustness and repeated usability without fatigue, unlike human experts. E-Noses equipped with classification algorithms that have open-ended classification boundaries, such as the k-nearest neighbor (k-NN), support vector machine (SVM), and multilayer perceptron neural network (MLPNN), are found to suffer from false classification errors on irrelevant odor data. To reduce false classification and misclassification errors, and to improve correct rejection performance, algorithms with a hyperspheric boundary, such as a radial basis function neural network (RBFNN) and a generalized regression neural network (GRNN) with a Gaussian activation function in the hidden layer, should be used. The simulation results presented in this paper show that GRNN has higher correct classification efficiency and better false alarm reduction capability than RBFNN. As the design of a GRNN or RBFNN is complex and expensive due to the large number of neurons required, a simple hyperspheric classification method based on the minimum, maximum, and mean (MMM) values of each class of the training dataset is presented. The MMM algorithm was found to be simple, fast, and efficient in correctly classifying data of the training classes and correctly rejecting data of extraneous odors, thereby reducing false alarms.
Swarm intelligence applied to the risk evaluation for congenital heart surgery.
Zapata-Impata, Brayan S; Ruiz-Fernandez, Daniel; Monsalve-Torra, Ana
2015-01-01
Particle Swarm Optimization is an optimization technique based on the positions of several particles created to find the best solution to a problem. In this work we analyze the accuracy of a modification of this algorithm for classifying the level of risk of a surgery used to correct malformations in children that involve congenital heart disease.
2012-01-01
Background Development and application of transcriptomics-based gene classifiers for ecotoxicological applications lag far behind those of biomedical sciences. Many such classifiers discovered thus far lack vigorous statistical and experimental validations. A combination of genetic algorithm/support vector machines and genetic algorithm/K nearest neighbors was used in this study to search for classifiers of endocrine-disrupting chemicals (EDCs) in zebrafish. Searches were conducted on both tissue-specific and tissue-combined datasets, either across the entire transcriptome or within individual transcription factor (TF) networks previously linked to EDC effects. Candidate classifiers were evaluated by gene set enrichment analysis (GSEA) on both the original training data and a dedicated validation dataset. Results Multi-tissue dataset yielded no classifiers. Among the 19 chemical-tissue conditions evaluated, the transcriptome-wide searches yielded classifiers for six of them, each having approximately 20 to 30 gene features unique to a condition. Searches within individual TF networks produced classifiers for 15 chemical-tissue conditions, each containing 100 or fewer top-ranked gene features pooled from those of multiple TF networks and also unique to each condition. For the training dataset, 10 out of 11 classifiers successfully identified the gene expression profiles (GEPs) of their targeted chemical-tissue conditions by GSEA. For the validation dataset, classifiers for prochloraz-ovary and flutamide-ovary also correctly identified the GEPs of corresponding conditions while no classifier could predict the GEP from prochloraz-brain. Conclusions The discrepancies in the performance of these classifiers were attributed in part to varying data complexity among the conditions, as measured to some degree by Fisher’s discriminant ratio statistic. 
This variation in data complexity could likely be compensated by adjusting sample size for individual chemical-tissue conditions, thus suggesting a need for a preliminary survey of transcriptomic responses before launching a full scale classifier discovery effort. Classifier discovery based on individual TF networks could yield more mechanistically-oriented biomarkers. GSEA proved to be a flexible and effective tool for application of gene classifiers but a similar and more refined algorithm, connectivity mapping, should also be explored. The distribution characteristics of classifiers across tissues, chemicals, and TF networks suggested a differential biological impact among the EDCs on zebrafish transcriptome involving some basic cellular functions. PMID:22849515
Swanson, Alexandra; Kosmala, Margaret; Lintott, Chris; Packer, Craig
2016-06-01
Citizen science has the potential to expand the scope and scale of research in ecology and conservation, but many professional researchers remain skeptical of data produced by nonexperts. We devised an approach for producing accurate, reliable data from untrained, nonexpert volunteers. On the citizen science website www.snapshotserengeti.org, more than 28,000 volunteers classified 1.51 million images taken in a large-scale camera-trap survey in Serengeti National Park, Tanzania. Each image was circulated to, on average, 27 volunteers, and their classifications were aggregated using a simple plurality algorithm. We validated the aggregated answers against a data set of 3829 images verified by experts and calculated 3 certainty metrics to measure confidence that an aggregated answer was correct: level of agreement among classifications (evenness), fraction of classifications supporting the aggregated answer (fraction support), and fraction of classifiers who reported "nothing here" for an image that was ultimately classified as containing an animal (fraction blank). Overall, aggregated volunteer answers agreed with the expert-verified data on 98% of images, but accuracy differed by species commonness such that rare species had higher rates of false positives and false negatives. Easily calculated analysis of variance and post-hoc Tukey tests indicated that the certainty metrics were significant indicators of whether each image was correctly classified or classifiable. Thus, the certainty metrics can be used to identify images for expert review. Bootstrapping analyses further indicated that 90% of images were correctly classified with just 5 volunteers per image. Species classifications based on the plurality vote of multiple citizen scientists can provide a reliable foundation for large-scale monitoring of African wildlife. © 2016 The Authors. Conservation Biology published by Wiley Periodicals, Inc. on behalf of Society for Conservation Biology.
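The plurality aggregation and the three certainty metrics can be sketched in a few lines of Python. This is an illustrative reading only: the function name `aggregate_votes`, the use of a normalized Shannon entropy (Pielou-style) for evenness, and the choice to compute fraction support over non-blank votes are assumptions, since the abstract does not define the formulas.

```python
from collections import Counter
from math import log


def aggregate_votes(votes, blank="nothing here"):
    """Plurality answer plus evenness, fraction support, and fraction blank."""
    if not votes:
        raise ValueError("at least one classification is required")
    fraction_blank = votes.count(blank) / len(votes)
    animal_votes = [v for v in votes if v != blank]
    if not animal_votes:
        # Every volunteer reported an empty image.
        return {"answer": blank, "fraction_support": 1.0,
                "fraction_blank": 1.0, "evenness": 0.0}
    counts = Counter(animal_votes)
    # Plurality vote; ties go to the species seen first in the vote list.
    answer, top = counts.most_common(1)[0]
    shares = [c / len(animal_votes) for c in counts.values()]
    entropy = -sum(p * log(p) for p in shares)
    # Assumed evenness: entropy normalized to [0, 1]; 0 means full agreement.
    evenness = entropy / log(len(counts)) if len(counts) > 1 else 0.0
    return {"answer": answer, "fraction_support": top / len(animal_votes),
            "fraction_blank": fraction_blank, "evenness": evenness}
```

Images with high evenness, low fraction support, or high fraction blank would be the natural candidates to route to expert review.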
Kosmala, Margaret; Lintott, Chris; Packer, Craig
2016-01-01
Abstract Citizen science has the potential to expand the scope and scale of research in ecology and conservation, but many professional researchers remain skeptical of data produced by nonexperts. We devised an approach for producing accurate, reliable data from untrained, nonexpert volunteers. On the citizen science website www.snapshotserengeti.org, more than 28,000 volunteers classified 1.51 million images taken in a large‐scale camera‐trap survey in Serengeti National Park, Tanzania. Each image was circulated to, on average, 27 volunteers, and their classifications were aggregated using a simple plurality algorithm. We validated the aggregated answers against a data set of 3829 images verified by experts and calculated 3 certainty metrics—level of agreement among classifications (evenness), fraction of classifications supporting the aggregated answer (fraction support), and fraction of classifiers who reported “nothing here” for an image that was ultimately classified as containing an animal (fraction blank)—to measure confidence that an aggregated answer was correct. Overall, aggregated volunteer answers agreed with the expert‐verified data on 98% of images, but accuracy differed by species commonness such that rare species had higher rates of false positives and false negatives. Easily calculated analysis of variance and post‐hoc Tukey tests indicated that the certainty metrics were significant indicators of whether each image was correctly classified or classifiable. Thus, the certainty metrics can be used to identify images for expert review. Bootstrapping analyses further indicated that 90% of images were correctly classified with just 5 volunteers per image. Species classifications based on the plurality vote of multiple citizen scientists can provide a reliable foundation for large‐scale monitoring of African wildlife. PMID:27111678
Two algorithms for neural-network design and training with application to channel equalization.
Sweatman, C Z; Mulgrew, B; Gibson, G J
1998-01-01
We describe two algorithms for designing and training neural-network classifiers. The first, the linear programming slab algorithm (LPSA), is motivated by the problem of reconstructing digital signals corrupted by passage through a dispersive channel and by additive noise. It constructs a multilayer perceptron (MLP) to separate two disjoint sets by using linear programming methods to identify network parameters. The second, the perceptron learning slab algorithm (PLSA), avoids the computational costs of linear programming by using an error-correction approach to identify parameters. Both algorithms operate in highly constrained parameter spaces and are able to exploit symmetry in the classification problem. Using these algorithms, we develop a number of procedures for the adaptive equalization of a complex linear 4-quadrature amplitude modulation (QAM) channel, and compare their performance in a simulation study. Results are given for both stationary and time-varying channels, the latter based on the COST 207 GSM propagation model.
A three-parameter model for classifying anurans into four genera based on advertisement calls.
Gingras, Bruno; Fitch, William Tecumseh
2013-01-01
The vocalizations of anurans are innate in structure and may therefore contain indicators of phylogenetic history. Thus, advertisement calls of species which are more closely related phylogenetically are predicted to be more similar than those of distant species. This hypothesis was evaluated by comparing several widely used machine-learning algorithms. Recordings of advertisement calls from 142 species belonging to four genera were analyzed. A logistic regression model, using mean values for dominant frequency, coefficient of variation of root-mean square energy, and spectral flux, correctly classified advertisement calls with regard to genus with an accuracy above 70%. Similar accuracy rates were obtained using these parameters with a support vector machine model, a K-nearest neighbor algorithm, and a multivariate Gaussian distribution classifier, whereas a Gaussian mixture model performed slightly worse. In contrast, models based on mel-frequency cepstral coefficients did not fare as well. Comparable accuracy levels were obtained on out-of-sample recordings from 52 of the 142 original species. The results suggest that a combination of low-level acoustic attributes is sufficient to discriminate efficiently between the vocalizations of these four genera, thus supporting the initial premise and validating the use of high-throughput algorithms on animal vocalizations to evaluate phylogenetic hypotheses.
A lightweight QRS detector for single lead ECG signals using a max-min difference algorithm.
Pandit, Diptangshu; Zhang, Li; Liu, Chengyu; Chattopadhyay, Samiran; Aslam, Nauman; Lim, Chee Peng
2017-06-01
Detection of the R-peak pertaining to the QRS complex of an ECG signal plays an important role in the diagnosis of a patient's heart condition. To accurately identify the QRS locations from the acquired raw ECG signals, we need to handle a number of challenges, which include noise, baseline wander, varying peak amplitudes, and signal abnormality. This research aims to address these challenges by developing an efficient lightweight algorithm for QRS (i.e., R-peak) detection from raw ECG signals. A lightweight real-time sliding window-based Max-Min Difference (MMD) algorithm for QRS detection from Lead II ECG signals is proposed. To achieve the best trade-off between computational efficiency and detection accuracy, the proposed algorithm consists of five key steps for QRS detection, namely, baseline correction, MMD curve generation, dynamic threshold computation, R-peak detection, and error correction. Five annotated databases from Physionet are used for evaluating the proposed algorithm in R-peak detection. Integrated with a feature extraction technique and a neural network classifier, the proposed QRS detection algorithm has also been extended to undertake normal and abnormal heartbeat detection from ECG signals. The proposed algorithm exhibits a high degree of robustness in QRS detection and achieves an average sensitivity of 99.62% and an average positive predictivity of 99.67%. Its performance compares favorably with those from the existing state-of-the-art models reported in the literature. With regard to normal and abnormal heartbeat detection, the proposed QRS detection algorithm in combination with the feature extraction technique and neural network classifier achieves an overall accuracy rate of 93.44% based on an empirical evaluation using the MIT-BIH Arrhythmia data set with 10-fold cross validation. 
In comparison with other related studies, the proposed algorithm offers a lightweight adaptive alternative for R-peak detection with good computational efficiency. The empirical results indicate that it not only yields a high accuracy rate in QRS detection, but also exhibits efficient computational complexity at the order of O(n), where n is the length of an ECG signal. Copyright © 2017 Elsevier B.V. All rights reserved.
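To make the MMD curve and peak-picking steps concrete, here is a deliberately simplified Python sketch. It covers only two of the five steps (MMD curve generation and R-peak detection), replaces the paper's dynamic threshold with a fixed fraction of the curve maximum, and uses a naive O(n·w) window scan rather than an O(n) one; the names `mmd_curve` and `detect_peaks` and the default parameters are assumptions.

```python
def mmd_curve(signal, window):
    """Max-min difference inside a sliding window centred on each sample."""
    half = window // 2
    n = len(signal)
    curve = []
    for i in range(n):
        segment = signal[max(0, i - half):min(n, i + half + 1)]
        curve.append(max(segment) - min(segment))
    return curve


def detect_peaks(signal, window=5, fraction=0.5):
    """Return one peak index per region where the MMD curve exceeds the threshold."""
    curve = mmd_curve(signal, window)
    # Assumed stand-in for the dynamic threshold: a fixed share of the maximum.
    threshold = fraction * max(curve)
    peaks = []
    i, n = 0, len(signal)
    while i < n:
        if curve[i] > threshold:
            j = i
            while j < n and curve[j] > threshold:
                j += 1
            # Within the active region, take the sample with the largest amplitude.
            peaks.append(max(range(i, j), key=lambda k: signal[k]))
            i = j
        else:
            i += 1
    return peaks
```

The sketch assumes a non-empty, baseline-corrected signal; a real detector would also need the baseline and error-correction stages the abstract lists.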
Wohlmeister, Denise; Vianna, Débora Renz Barreto; Helfer, Virginia Etges; Calil, Luciane Noal; Buffon, Andréia; Fuentefria, Alexandre Meneghello; Corbellini, Valeriano Antonio; Pilger, Diogo André
2017-10-01
Pathogenic Candida species are detected in clinical infections. CHROMagar™ is a phenotypical method used to identify Candida species, although it has limitations, which indicates the need for more sensitive and specific techniques. Infrared Spectroscopy (FT-IR) is an analytical vibrational technique used, in association with classificatory chemometric algorithms, to identify patterns in the metabolic fingerprint of biological matrixes, particularly whole microbial cell systems such as Candida sp. On the other hand, Soft Independent Modeling by Class Analogy (SIMCA) is one of the typical algorithms still little employed in microbiological classification. This study demonstrates the applicability of the FT-IR technique by specular reflectance associated with SIMCA to discriminate Candida species isolated from vaginal discharges and grown on CHROMagar™. The differences in spectra of C. albicans, C. glabrata and C. krusei were suitable for use in the discrimination of these species, which was observed by PCA. Then, a SIMCA model was constructed with standard samples of three species and using the spectral region of 1792-1561 cm⁻¹. All samples (n=48) were properly classified based on the chromogenic method using CHROMagar™ Candida. In total, 93.4% (n=45) of the samples were correctly and unambiguously classified (Class I). Two samples of C. albicans were classified correctly, though these could have been C. glabrata (Class II). Also, one C. glabrata sample could have been classified as C. krusei (Class II). Concerning these three samples, one triplicate of each was included in Class II and two in Class I. Therefore, FT-IR associated with SIMCA can be used to identify samples of C. albicans, C. glabrata, and C. krusei grown in CHROMagar™ Candida aiming to improve clinical applications of this technique. Copyright © 2017 Elsevier B.V. All rights reserved.
Piekarczyk, Marcin; Ogiela, Marek R.
2017-01-01
The aim of this paper is to propose and evaluate a novel method of template generation, matching, comparison and visualization applied to motion capture (kinematic) analysis. To evaluate our approach, we have used motion capture recordings (MoCap) of two highly skilled black belt karate athletes consisting of 560 recordings of various karate techniques acquired with wearable sensors. We have evaluated the quality of generated templates; we have validated the matching algorithm that calculates similarities and differences between various MoCap data; and we have examined visualizations of important differences and similarities between MoCap data. We have concluded that our algorithms work best when we are dealing with relatively short (2–4 s) actions that might be averaged and aligned with the dynamic time warping framework. In practice, the methodology is designed to optimize the performance of some full body techniques performed in various sport disciplines, for example combat sports and martial arts. We can also use this approach to generate templates or to compare the correct performance of techniques between various top sportsmen in order to generate a knowledge base of reference MoCap videos. The motion template generated by our method can be used for action recognition purposes. We have used the DTW classifier with angle-based features to classify various karate kicks. We have performed leave-one-out action recognition for the Shorin-ryu and Oyama karate master separately. In this case, 100% of actions were correctly classified. In another experiment, we used templates generated from Oyama master recordings to classify Shorin-ryu master recordings and vice versa. In this experiment, the overall recognition rate was 94.2%, which is a very good result for this type of complex action. PMID:29125560
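As a rough illustration of DTW-based template matching, the Python sketch below computes the classic dynamic time warping distance between two one-dimensional sequences and assigns a query to the nearest template. The paper works with multi-dimensional angle-based features and generated motion templates; the scalar sequences, the absolute-difference local cost, and the names `dtw_distance` and `classify` are simplifying assumptions.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance with an absolute-difference local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best alignment cost of a[:i] against b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a advances alone
                                     cost[i][j - 1],      # b advances alone
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def classify(sequence, templates):
    """Return the label of the template with the smallest DTW distance."""
    return min(templates, key=lambda label: dtw_distance(sequence, templates[label]))
```

Because warping absorbs differences in timing, a slower execution of the same movement still matches its template closely, which is consistent with the abstract's remark that short actions align well under DTW.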
Discrimination of herbicide-resistant kochia with hyperspectral imaging
NASA Astrophysics Data System (ADS)
Nugent, Paul W.; Shaw, Joseph A.; Jha, Prashant; Scherrer, Bryan; Donelick, Andrew; Kumar, Vipan
2018-01-01
A hyperspectral imager was used to differentiate herbicide-resistant versus herbicide-susceptible biotypes of the agronomic weed kochia, in different crops in the field at the Southern Agricultural Research Center in Huntley, Montana. Controlled greenhouse experiments showed that enough information was captured by the imager to classify plants as either a crop, herbicide-susceptible or herbicide-resistant kochia. The current analysis is developing an algorithm that will work in more uncontrolled outdoor situations. In overcast conditions, the algorithm correctly identified dicamba-resistant kochia, glyphosate-resistant kochia, and glyphosate- and dicamba-susceptible kochia with 67%, 76%, and 80% success rates, respectively.
NASA Astrophysics Data System (ADS)
Schmalz, M.; Ritter, G.; Key, R.
Accurate and computationally efficient spectral signature classification is a crucial step in the nonimaging detection and recognition of spaceborne objects. In classical hyperspectral recognition applications using linear mixing models, signature classification accuracy depends on accurate spectral endmember discrimination [1]. If the endmembers cannot be classified correctly, then the signatures cannot be classified correctly, and object recognition from hyperspectral data will be inaccurate. In practice, the number of endmembers accurately classified often depends linearly on the number of inputs. This can lead to potentially severe classification errors in the presence of noise or densely interleaved signatures. In this paper, we present a comparison of emerging technologies for nonimaging spectral signature classification based on a highly accurate, efficient search engine called Tabular Nearest Neighbor Encoding (TNE) [3,4] and a neural network technology called Morphological Neural Networks (MNNs) [5]. Based on prior results, TNE can optimize its classifier performance to track input nonergodicities, as well as yield measures of confidence or caution for evaluation of classification results. Unlike neural networks, TNE does not have a hidden intermediate data structure (e.g., the neural net weight matrix). Instead, TNE generates and exploits a user-accessible data structure called the agreement map (AM), which can be manipulated by Boolean logic operations to effect accurate classifier refinement algorithms. The open architecture and programmability of TNE's agreement map processing allows a TNE programmer or user to determine classification accuracy, as well as characterize in detail the signatures for which TNE did not obtain classification matches, and why such mismatches occurred. 
In this study, we will compare TNE and MNN based endmember classification, using performance metrics such as probability of correct classification (Pd) and rate of false detections (Rfa). As proof of principle, we analyze classification of multiple closely spaced signatures from a NASA database of space material signatures. Additional analysis pertains to computational complexity and noise sensitivity, which are superior to Bayesian techniques based on classical neural networks. [1] Winter, M.E. "Fast autonomous spectral end-member determination in hyperspectral data," in Proceedings of the 13th International Conference On Applied Geologic Remote Sensing, Vancouver, B.C., Canada, pp. 337-44 (1999). [2] N. Keshava, "A survey of spectral unmixing algorithms," Lincoln Laboratory Journal 14:55-78 (2003). [3] Key, G., M.S. SCHMALZ, F.M. Caimi, and G.X. Ritter. "Performance analysis of tabular nearest neighbor encoding algorithm for joint compression and ATR", in Proceedings SPIE 3814:115-126 (1999). [4] Schmalz, M.S. and G. Key. "Algorithms for hyperspectral signature classification in unresolved object detection using tabular nearest neighbor encoding" in Proceedings of the 2007 AMOS Conference, Maui HI (2007). [5] Ritter, G.X., G. Urcid, and M.S. Schmalz. "Autonomous single-pass endmember approximation using lattice auto-associative memories", Neurocomputing (Elsevier), accepted (June 2008).
Zhong, Victor W; Obeid, Jihad S; Craig, Jean B; Pfaff, Emily R; Thomas, Joan; Jaacks, Lindsay M; Beavers, Daniel P; Carey, Timothy S; Lawrence, Jean M; Dabelea, Dana; Hamman, Richard F; Bowlby, Deborah A; Pihoker, Catherine; Saydah, Sharon H
2016-01-01
Objective To develop an efficient surveillance approach for childhood diabetes by type across 2 large US health care systems, using phenotyping algorithms derived from electronic health record (EHR) data. Materials and Methods Presumptive diabetes cases <20 years of age from 2 large independent health care systems were identified as those having ≥1 of the 5 indicators in the past 3.5 years, including elevated HbA1c, elevated blood glucose, diabetes-related billing codes, patient problem list, and outpatient anti-diabetic medications. EHRs of all the presumptive cases were manually reviewed, and true diabetes status and diabetes type were determined. Algorithms for identifying diabetes cases overall and classifying diabetes type were either prespecified or derived from classification and regression tree analysis. Surveillance approach was developed based on the best algorithms identified. Results We developed a stepwise surveillance approach using billing code–based prespecified algorithms and targeted manual EHR review, which efficiently and accurately ascertained and classified diabetes cases by type, in both health care systems. The sensitivity and positive predictive values in both systems were approximately ≥90% for ascertaining diabetes cases overall and classifying cases with type 1 or type 2 diabetes. About 80% of the cases with “other” type were also correctly classified. This stepwise surveillance approach resulted in a >70% reduction in the number of cases requiring manual validation compared to traditional surveillance methods. Conclusion EHR data may be used to establish an efficient approach for large-scale surveillance for childhood diabetes by type, although some manual effort is still needed. PMID:27107449
Mata-Cases, Manel; Mauricio, Dídac; Real, Jordi; Bolíbar, Bonaventura; Franch-Nadal, Josep
2016-11-01
To assess the prevalence of miscoding, misclassification, misdiagnosis and under-registration of diabetes mellitus (DM) in primary health care in Catalonia (Spain), and to explore use of automated algorithms to identify them. In this cross-sectional, retrospective study using an anonymized electronic general practice database, data were collected from patients or users with a diabetes-related code or from patients with no DM or prediabetes code but treated with antidiabetic drugs (unregistered DM). Decision algorithms were designed to classify the true diagnosis of type 1 DM (T1DM), type 2 DM (T2DM), and undetermined DM (UDM), and to classify unregistered DM patients treated with antidiabetic drugs. Data were collected from a total of 376,278 subjects with a DM ICD-10 code, and from 8707 patients with no DM or prediabetes code but treated with antidiabetic drugs. After application of the algorithms, 13.9% of patients with T1DM were identified as misclassified, and were probably T2DM; 80.9% of patients with UDM were reclassified as T2DM, and 19.1% of them were misdiagnosed as DM when they probably had prediabetes. The overall prevalence of miscoding (multiple codes or UDM) was 2.2%. Finally, 55.2% of subjects with unregistered DM were classified as prediabetes, 35.7% as T2DM, 8.5% as UDM treated with insulin, and 0.6% as T1DM. The prevalence of inappropriate codification or classification and under-registration of DM is relevant in primary care. Implementation of algorithms could automatically flag cases that need review and would substantially decrease the risk of inappropriate registration or coding. Copyright © 2016 SEEN. Publicado por Elsevier España, S.L.U. All rights reserved.
Image processing for x-ray inspection of pistachio nuts
NASA Astrophysics Data System (ADS)
Casasent, David P.
2001-03-01
A review is provided of image processing techniques that have been applied to the inspection of pistachio nuts using X-ray images. X-ray sensors provide non-destructive internal product detail not available from other sensors. The primary concern in this data is detecting the presence of worm infestations in nuts, since they have been linked to the presence of aflatoxin. We describe new techniques for segmentation, feature selection, selection of product categories (clusters), classifier design, etc. Specific novel results include: a new segmentation algorithm to produce images of isolated product items; preferable classifier operation (the classifier with the best probability of correct recognition Pc is not best); higher-order discrimination information is present in standard features (thus, high-order features appear useful); classifiers that use new cluster categories of samples achieve improved performance. Results are presented for X-ray images of pistachio nuts; however, all techniques have use in other product inspection applications.
Optimal threshold estimation for binary classifiers using game theory.
Sanchez, Ignacio Enrique
2016-01-01
Many bioinformatics algorithms can be understood as binary classifiers. They are usually compared using the area under the receiver operating characteristic (ROC) curve. On the other hand, choosing the best threshold for practical use is a complex task, due to uncertain and context-dependent skews in the abundance of positives in nature and in the yields/costs for correct/incorrect classification. We argue that considering a classifier as a player in a zero-sum game allows us to use the minimax principle from game theory to determine the optimal operating point. The proposed classifier threshold corresponds to the intersection between the ROC curve and the descending diagonal in ROC space and yields a minimax accuracy of 1-FPR. Our proposal can be readily implemented in practice, and reveals that the empirical condition for threshold estimation of "specificity equals sensitivity" maximizes robustness against uncertainties in the abundance of positives in nature and classification costs.
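On an empirical ROC curve the intersection with the descending diagonal (TPR = 1 - FPR) rarely falls exactly on an observed point, so a practical approximation is to pick the candidate threshold at which sensitivity and specificity are closest. The Python sketch below does this by brute force; the function name `minimax_threshold`, the convention that scores at or above the threshold are called positive, and the nearest-point approximation are assumptions rather than the paper's procedure.

```python
def minimax_threshold(scores, labels):
    """Return (threshold, sensitivity, specificity) where the two rates are closest.

    labels are 1 for positives and 0 for negatives; a sample is predicted
    positive when its score is greater than or equal to the threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t)
        sensitivity = tp / positives
        specificity = 1 - fp / negatives
        gap = abs(sensitivity - specificity)
        # Keep the first threshold with the smallest gap.
        if best is None or gap < best[0]:
            best = (gap, t, sensitivity, specificity)
    return best[1], best[2], best[3]
```

At the chosen point sensitivity and specificity coincide (or nearly so), so accuracy is approximately 1 - FPR whatever the share of positives in the population, which is the robustness property the abstract describes. The sketch assumes at least one positive and one negative example.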
Comparison of ring artifact removal methods using flat panel detector based CT images
2011-01-01
Background Ring artifacts are concentric rings superimposed on tomographic images, often caused by defective and insufficiently calibrated detector elements as well as by damaged scintillator crystals of the flat panel detector. They may also be generated by objects attenuating X-rays very differently in different projection directions. Ring artifact reduction techniques so far reported in the literature can be broadly classified into two groups. One category of approaches is based on sinogram processing, also known as the pre-processing techniques, and the other category performs processing on the 2-D reconstructed images, recognized as the post-processing techniques in the literature. The strengths and weaknesses of these categories of approaches are yet to be explored from a common platform. Method In this paper, a comparative study of the two categories of ring artifact reduction techniques basically designed for the multi-slice CT instruments is presented from a common platform. For comparison, two representative algorithms from each of the two categories are selected from the published literature. A very recently reported state-of-the-art sinogram domain ring artifact correction method that classifies the ring artifacts according to their strength and then corrects the artifacts using class adaptive correction schemes is also included in this comparative study. The first sinogram domain correction method uses a wavelet based technique to detect the corrupted pixels and then estimates the responses of the bad pixels using a simple linear interpolation technique. The second sinogram based correction method performs all the filtering operations in the transform domain, i.e., in the wavelet and Fourier domain. On the other hand, the two post-processing based correction techniques actually operate on the polar transform domain of the reconstructed CT images. 
The first method extracts the ring artifact template vector using a homogeneity test and then corrects the CT images by subtracting the artifact template vector from the uncorrected images. The second post-processing based correction technique performs median and mean filtering on the reconstructed images to produce the corrected images. Results The performances of the compared algorithms have been tested by using both quantitative and perceptual measures. For quantitative analysis, two different numerical performance indices are chosen. On the other hand, different types of artifact patterns, e.g., single/band ring, artifacts from defective and mis-calibrated detector elements, rings in highly structural object and also in hard object, rings from different flat-panel detectors are analyzed to perceptually investigate the strengths and weaknesses of the five methods. An investigation has also been carried out to compare the efficacy of these algorithms in correcting the volume images from a cone beam CT with the parameters determined from one particular slice. Finally, the capability of each correction technique in retaining the image information (e.g., small object at the iso-center) accurately in the corrected CT image has also been tested. Conclusions The results show that the performances of the algorithms are limited and none is fully suitable for correcting different types of ring artifacts without introducing processing distortion to the image structure. To achieve the diagnostic quality of the corrected slices a combination of the two approaches (sinogram- and post-processing) can be used. Also the compared methods are not suitable for correcting the volume images from a cone beam flat-panel detector based CT. PMID:21846411
NASA Astrophysics Data System (ADS)
Chander, Shard; Ganguly, Debojyoti
2017-01-01
Water level over the Ukai reservoir was estimated using the AltiKa radar altimeter onboard the SARAL satellite, with algorithms modified specifically for inland water bodies. The methodology was based on waveform classification, waveform retracking, and dedicated inland range correction algorithms. The 40-Hz waveforms were classified using linear discriminant analysis and a Bayesian classifier. Waveforms were retracked using the Brown, Ice-2, threshold, and offset center of gravity methods. Retracking algorithms were applied to full waveforms and subwaveforms (only one leading edge) to estimate the improvement in the retrieved range. European Centre for Medium-Range Weather Forecasts (ECMWF) operational and re-analysis pressure fields and global ionosphere maps were used to estimate the range corrections accurately. Microwave and optical images were used to estimate the extent of the water body and the altimeter track location. Four global positioning system (GPS) field trips were conducted on the same day as the SARAL pass using two dual-frequency GPS receivers. One GPS receiver was mounted close to the dam in static mode and the other was used on a moving vehicle within the reservoir in kinematic mode. The in situ gauge dataset was provided by the Ukai dam authority for the period January 1972 to March 2015. The altimeter-retrieved water levels were then validated against the GPS survey and the in situ gauge dataset. With good selection of the virtual station (waveform classification, backscattering coefficient), the Ice-2 retracker and the subwaveform retracker both work better, with an overall root-mean-square error <15 cm. The results support that the AltiKa dataset, due to the smaller footprint and sharp trailing edge of the Ka-band waveform, can be utilized for more accurate water level information over inland water bodies.
Validating predictions from climate envelope models
Watling, J.; Bucklin, D.; Speroterra, C.; Brandt, L.; Cabal, C.; Romañach, Stephanie S.; Mazzotti, Frank J.
2013-01-01
Climate envelope models are a potentially important conservation tool, but their ability to accurately forecast species’ distributional shifts using independent survey data has not been fully evaluated. We created climate envelope models for 12 species of North American breeding birds previously shown to have experienced poleward range shifts. For each species, we evaluated three different approaches to climate envelope modeling that differed in the way they treated climate-induced range expansion and contraction, using random forests and maximum entropy modeling algorithms. All models were calibrated using occurrence data from 1967–1971 (t1) and evaluated using occurrence data from 1998–2002 (t2). Model sensitivity (the ability to correctly classify species presences) was greater using the maximum entropy algorithm than the random forest algorithm. Although sensitivity did not differ significantly among approaches, for many species, sensitivity was maximized using a hybrid approach that assumed range expansion, but not contraction, in t2. Species for which the hybrid approach resulted in the greatest improvement in sensitivity have been reported from more land cover types than species for which there was little difference in sensitivity between hybrid and dynamic approaches, suggesting that habitat generalists may be buffered somewhat against climate-induced range contractions. Specificity (the ability to correctly classify species absences) was maximized using the random forest algorithm and was lowest using the hybrid approach. Overall, our results suggest cautious optimism for the use of climate envelope models to forecast range shifts, but also underscore the importance of considering non-climate drivers of species range limits. The use of alternative climate envelope models that make different assumptions about range expansion and contraction is a new and potentially useful way to help inform our understanding of climate change effects on species.
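The two evaluation metrics defined in the abstract above can be written out directly. The following is a minimal sketch, assuming presence/absence data coded as 1/0; the function name and the example values are hypothetical and are not data from the study.

```python
def sensitivity_specificity(observed, predicted):
    """Return (sensitivity, specificity) for binary presence/absence data.

    observed, predicted: equal-length sequences of 0 (absence) or 1 (presence).
    Sensitivity = fraction of observed presences predicted as presences.
    Specificity = fraction of observed absences predicted as absences.
    """
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical t2 survey cells (made-up values for illustration only).
observed_t2 = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted_t2 = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(observed_t2, predicted_t2)
```

A model that over-predicts range (as a hybrid approach assuming expansion but not contraction might) tends to raise sensitivity at the cost of specificity, which is why the two are reported separately.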
Single pulse analysis of intracranial pressure for a hydrocephalus implant.
Elixmann, I M; Hansinger, J; Goffin, C; Antes, S; Radermacher, K; Leonhardt, S
2012-01-01
The intracranial pressure (ICP) waveform contains important diagnostic information. Changes in ICP are associated with changes of the pulse waveform. This change has explicitly been observed in 13 infusion tests by analyzing 100 Hz ICP data. An algorithm is proposed which automatically extracts the pulse waves and categorizes them into predefined patterns. The developed algorithm correctly assigned 88% ± 8% (mean ± SD) of all classified pulse waves to the predefined patterns. This algorithm has low computational cost and is independent of pressure drift in the sensor because it uses only the relationship between special waveform characteristics. Hence, it could be implemented on a microcontroller of a future electromechanical hydrocephalus shunt system to control the drainage of cerebrospinal fluid (CSF).
NASA Astrophysics Data System (ADS)
Chen, Dan; Guo, Lin-yuan; Wang, Chen-hao; Ke, Xi-zheng
2017-07-01
Equalization can compensate for channel distortion caused by channel multipath effects and effectively improve the convergence of the modulation constellation diagram in an optical wireless system. In this paper, the subspace blind equalization algorithm is used to preprocess the M-ary phase shift keying (MPSK) subcarrier modulation signal at the receiver. Mountain clustering is adopted to obtain the cluster centers of the MPSK modulation constellation diagram, and the modulation order is automatically identified by a k-nearest neighbor (KNN) classifier. The experiment was carried out under four different weather conditions. Experimental results show that the convergence of the constellation diagram is improved effectively after applying the subspace blind equalization algorithm, which means that the accuracy of modulation recognition is increased. The correct recognition rate of 16PSK can be up to 85% under any of the weather conditions considered in the paper. The correct recognition rate is highest in cloudy conditions and lowest in heavy rain.
Deriving pathway maps from automated text analysis using a grammar-based approach.
Olsson, Björn; Gawronska, Barbara; Erlendsson, Björn
2006-04-01
We demonstrate how automated text analysis can be used to support the large-scale analysis of metabolic and regulatory pathways by deriving pathway maps from textual descriptions found in the scientific literature. The main assumption is that correct syntactic analysis combined with domain-specific heuristics provides a good basis for relation extraction. Our method uses an algorithm that searches through the syntactic trees produced by a parser based on a Referent Grammar formalism, identifies relations mentioned in the sentence, and classifies them with respect to their semantic class and epistemic status (facts, counterfactuals, hypotheses). The semantic categories used in the classification are based on the relation set used in KEGG (Kyoto Encyclopedia of Genes and Genomes), so that pathway maps using KEGG notation can be automatically generated. We present the current version of the relation extraction algorithm and an evaluation based on a corpus of abstracts obtained from PubMed. The results indicate that the method is able to combine a reasonable coverage with high accuracy. We found that 61% of all sentences were parsed, and 97% of the parse trees were judged to be correct. The extraction algorithm was tested on a sample of 300 parse trees and was found to produce correct extractions in 90.5% of the cases.
Celik, Turgay; Lee, Hwee Kuan; Petznick, Andrea; Tong, Louis
2013-01-01
Background Infrared (IR) meibography is an imaging technique to capture the Meibomian glands in the eyelids. These ocular surface structures are responsible for producing the lipid layer of the tear film which helps to reduce tear evaporation. In a normal healthy eye, the glands have similar morphological features in terms of spatial width, in-plane elongation, length. On the other hand, eyes with Meibomian gland dysfunction show visible structural irregularities that help in the diagnosis and prognosis of the disease. However, currently there is no universally accepted algorithm for detection of these image features which will be clinically useful. We aim to develop a method of automated gland segmentation which allows images to be classified. Methods A set of 131 meibography images were acquired from patients from the Singapore National Eye Center. We used a method of automated gland segmentation using Gabor wavelets. Features of the imaged glands including orientation, width, length and curvature were extracted and the IR images enhanced. The images were classified as ‘healthy’, ‘intermediate’ or ‘unhealthy’, through the use of a support vector machine classifier (SVM). Half the images were used for training the SVM and the other half for validation. Independently of this procedure, the meibographs were classified by an expert clinician into the same 3 grades. Results The algorithm correctly detected 94% and 98% of mid-line pixels of gland and inter-gland regions, respectively, on healthy images. On intermediate images, correct detection rates of 92% and 97% of mid-line pixels of gland and inter-gland regions were achieved respectively. The true positive rate of detecting healthy images was 86%, and for intermediate images, 74%. The corresponding false positive rates were 15% and 31% respectively. Using the SVM, the proposed method has 88% accuracy in classifying images into the 3 classes. 
The classification of images into healthy and unhealthy classes achieved a 100% accuracy, but 7/38 intermediate images were incorrectly classified. Conclusions This technique of image analysis in meibography can help clinicians to interpret the degree of gland destruction in patients with dry eye and meibomian gland dysfunction.
Development and Testing of Data Mining Algorithms for Earth Observation
NASA Technical Reports Server (NTRS)
Glymour, Clark
2005-01-01
The new algorithms developed under this project included a principled procedure for classification of objects, events or circumstances according to a target variable when a very large number of potential predictor variables is available but the number of cases that can be used for training a classifier is relatively small. These "high dimensional" problems require finding a minimal set of variables (called the Markov Blanket) sufficient for predicting the value of the target variable. An algorithm, the Markov Blanket Fan Search, was developed, implemented and tested on both simulated and real data in conjunction with a graphical model classifier, which was also implemented. Another algorithm developed and implemented in TETRAD IV for time series elaborated on work by C. Granger and N. Swanson, which in turn exploited some of our earlier work. The algorithms in question learn a linear time series model from data. Given such a time series, the simultaneous residual covariances, after factoring out time dependencies, may provide information about causal processes that occur more rapidly than the time series representation allows, so-called simultaneous or contemporaneous causal processes. Working with A. Monetta, a graduate student from Italy, we produced the correct statistics for estimating the contemporaneous causal structure from time series data using the TETRAD IV suite of algorithms. Two economists, David Bessler and Kevin Hoover, have independently published applications using TETRAD-style algorithms for the same purpose. These implementations and algorithmic developments were separately used in two kinds of studies of climate data: short time series of geographically proximate climate variables predicting agricultural effects in California, and longer duration climate measurements of temperature teleconnections.
NASA Astrophysics Data System (ADS)
Fatehi, Moslem; Asadi, Hooshang H.
2017-04-01
In this study, the application of a transductive support vector machine (TSVM), an innovative semi-supervised learning algorithm, has been proposed for mapping the potential drill targets at a detailed exploration stage. The semi-supervised learning method is a hybrid of supervised and unsupervised learning approach that simultaneously uses both training and non-training data to design a classifier. By using the TSVM algorithm, exploration layers at the Dalli porphyry Cu-Au deposit in the central Iran were integrated to locate the boundary of the Cu-Au mineralization for further drilling. By applying this algorithm on the non-training (unlabeled) and limited training (labeled) Dalli exploration data, the study area was classified in two domains of Cu-Au ore and waste. Then, the results were validated by the earlier block models created, using the available borehole and trench data. In addition to TSVM, the support vector machine (SVM) algorithm was also implemented on the study area for comparison. Thirty percent of the labeled exploration data was used to evaluate the performance of these two algorithms. The results revealed 87 percent correct recognition accuracy for the TSVM algorithm and 82 percent for the SVM algorithm. The deepest inclined borehole, recently drilled in the western part of the Dalli deposit, indicated that the boundary of Cu-Au mineralization, as identified by the TSVM algorithm, was only 15 m off from the actual boundary intersected by this borehole. According to the results of the TSVM algorithm, six new boreholes were suggested for further drilling at the Dalli deposit. This study showed that the TSVM algorithm could be a useful tool for enhancing the mineralization zones and consequently, ensuring a more accurate drill hole planning.
Das, A.J.; Battles, J.J.; Stephenson, N.L.; van Mantgem, P.J.
2007-01-01
We examined mortality of Abies concolor (Gord. & Glend.) Lindl. (white fir) and Pinus lambertiana Dougl. (sugar pine) by developing logistic models using three growth indices obtained from tree rings: average growth, growth trend, and count of abrupt growth declines. For P. lambertiana, models with average growth, growth trend, and count of abrupt declines improved overall prediction (78.6% dead trees correctly classified, 83.7% live trees correctly classified) compared with a model with average recent growth alone (69.6% dead trees correctly classified, 67.3% live trees correctly classified). For A. concolor, counts of abrupt declines and longer time intervals improved overall classification (trees with DBH ≥20 cm: 78.9% dead trees correctly classified and 76.7% live trees correctly classified vs. 64.9% dead trees correctly classified and 77.9% live trees correctly classified; trees with DBH <20 cm: 71.6% dead trees correctly classified and 71.0% live trees correctly classified vs. 67.2% dead trees correctly classified and 66.7% live trees correctly classified). In general, count of abrupt declines improved live-tree classification. External validation of A. concolor models showed that they functioned well at stands not used in model development, and the development of size-specific models demonstrated important differences in mortality risk between understory and canopy trees. Population-level mortality-risk models were developed for A. concolor and generated realistic mortality rates at two sites. Our results support the contention that a more comprehensive use of the growth record yields a more robust assessment of mortality risk. © 2007 NRC.
2016-01-01
This paper presents an algorithm, for use with a Portable Powered Ankle-Foot Orthosis (i.e., PPAFO) that can automatically detect changes in gait modes (level ground, ascent and descent of stairs or ramps), thus allowing for appropriate ankle actuation control during swing phase. An artificial neural network (ANN) algorithm used input signals from an inertial measurement unit and foot switches, that is, vertical velocity and segment angle of the foot. Output from the ANN was filtered and adjusted to generate a final data set used to classify different gait modes. Five healthy male subjects walked with the PPAFO on the right leg for two test scenarios (walking over level ground and up and down stairs or a ramp; three trials per scenario). Success rate was quantified by the number of correctly classified steps with respect to the total number of steps. The results indicated that the proposed algorithm's success rate was high (99.3%, 100%, and 98.3% for level, ascent, and descent modes in the stairs scenario, respectively; 98.9%, 97.8%, and 100% in the ramp scenario). The proposed algorithm continuously detected each step's gait mode with faster timing and higher accuracy compared to a previous algorithm that used a decision tree based on maximizing the reliability of the mode recognition. PMID:28070188
NASA Astrophysics Data System (ADS)
Khan, Faisal; Enzmann, Frieder; Kersten, Michael
2016-03-01
Image processing of X-ray-computed polychromatic cone-beam micro-tomography (μXCT) data of geological samples mainly involves artefact reduction and phase segmentation. For the former, the main beam-hardening (BH) artefact is removed by applying a best-fit quadratic surface algorithm to a given image data set (reconstructed slice), which minimizes the BH offsets of the attenuation data points from that surface. A Matlab code for this approach is provided in the Appendix. The final BH-corrected image is extracted from the residual data or from the difference between the surface elevation values and the original grey-scale values. For the segmentation, we propose a novel least-squares support vector machine (LS-SVM, an algorithm for pixel-based multi-phase classification) approach. A receiver operating characteristic (ROC) analysis was performed on BH-corrected and uncorrected samples to show that BH correction is in fact an important prerequisite for accurate multi-phase classification. The combination of the two approaches was thus used to classify successfully three different more or less complex multi-phase rock core samples.
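The beam-hardening step described above (fit a quadratic surface to a reconstructed slice, then keep the residual) can be sketched as follows. This is not the Matlab code from the paper's Appendix, which is not reproduced here; it is a minimal NumPy illustration that assumes an ordinary least-squares fit of z = a + b·x + c·y + d·x² + e·x·y + f·y² over all pixels (or over an optional mask). The function names and the synthetic test slice are hypothetical.

```python
import numpy as np


def fit_quadratic_surface(image, mask=None):
    """Least-squares fit of a quadratic surface to a 2-D grey-scale slice.

    mask (optional): boolean array selecting the pixels used for the fit,
    e.g. pixels of a single phase. Returns the fitted surface on the full grid.
    """
    image = np.asarray(image, dtype=float)
    rows, cols = image.shape
    y, x = np.mgrid[0:rows, 0:cols].astype(float)
    # Centre the coordinates so the design matrix is better conditioned.
    x = x.ravel() - (cols - 1) / 2.0
    y = y.ravel() - (rows - 1) / 2.0
    design = np.column_stack([np.ones_like(x), x, y, x**2, x * y, y**2])
    keep = np.ones(image.size, dtype=bool) if mask is None else np.asarray(mask).ravel().astype(bool)
    coeffs, *_ = np.linalg.lstsq(design[keep], image.ravel()[keep], rcond=None)
    return (design @ coeffs).reshape(rows, cols)


def remove_beam_hardening(image, mask=None):
    """Return the residual (image minus fitted surface) as the corrected slice."""
    return np.asarray(image, dtype=float) - fit_quadratic_surface(image, mask)


# Synthetic slice: a flat material with a radially increasing "cupping" offset.
yy, xx = np.mgrid[0:33, 0:33].astype(float)
cupped = 100.0 + 0.01 * ((xx - 16.0) ** 2 + (yy - 16.0) ** 2)
corrected = remove_beam_hardening(cupped)
```

Because the residual removes the mean grey level along with the cupping, a practical pipeline would likely add a constant offset back before segmentation; that choice is left open here.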
Algorithms and Results of Eye Tissues Differentiation Based on RF Ultrasound
Jurkonis, R.; Janušauskas, A.; Marozas, V.; Jegelevičius, D.; Daukantas, S.; Patašius, M.; Paunksnis, A.; Lukoševičius, A.
2012-01-01
Algorithms and software were developed for analysis of B-scan ultrasonic signals acquired from commercial diagnostic ultrasound system. The algorithms process raw ultrasonic signals in backscattered spectrum domain, which is obtained using two time-frequency methods: short-time Fourier and Hilbert-Huang transformations. The signals from selected regions of eye tissues are characterized by parameters: B-scan envelope amplitude, approximated spectral slope, approximated spectral intercept, mean instantaneous frequency, mean instantaneous bandwidth, and parameters of Nakagami distribution characterizing Hilbert-Huang transformation output. The backscattered ultrasound signal parameters characterizing intraocular and orbit tissues were processed by decision tree data mining algorithm. The pilot trial proved that applied methods are able to correctly classify signals from corpus vitreum blood, extraocular muscle, and orbit tissues. In 26 cases of ocular tissues classification, one error occurred, when tissues were classified into classes of corpus vitreum blood, extraocular muscle, and orbit tissue. In this pilot classification parameters of spectral intercept and Nakagami parameter for instantaneous frequencies distribution of the 1st intrinsic mode function were found specific for corpus vitreum blood, orbit and extraocular muscle tissues. We conclude that ultrasound data should be further collected in clinical database to establish background for decision support system for ocular tissue noninvasive differentiation. PMID:22654643
NASA Astrophysics Data System (ADS)
Roberge, S.; Chokmani, K.; De Sève, D.
2012-04-01
Snow cover plays an important role in the hydrological cycle of Quebec (Eastern Canada). Consequently, evaluating its spatial extent is of interest to the authorities responsible for the management of water resources, especially hydropower companies. The main objective of this study is the development of a snow-cover mapping strategy using remote sensing data and ensemble-based systems techniques. Planned to be tested in a near real-time operational mode, this snow-cover mapping strategy has the advantage of providing the probability that a pixel is snow covered, together with its uncertainty. Ensemble systems are made of two key components. First, a method is needed to build an ensemble of classifiers that is as diverse as possible. Second, an approach is required to combine the outputs of the individual classifiers that make up the ensemble in such a way that correct decisions are amplified and incorrect ones are cancelled out. In this study, we demonstrate the potential of ensemble systems for snow-cover mapping using remote sensing data. The chosen classifier is a sequential thresholds algorithm using NOAA-AVHRR data, adapted to conditions over Eastern Canada. Its special feature is the use of a combination of six sequential thresholds that vary according to the day in the winter season. Two versions of the snow-cover mapping algorithm have been developed: one specific to autumn (from October 1st to December 31st) and the other to spring (from March 16th to May 31st). To build the ensemble-based system, different versions of the algorithm are created by randomly varying its parameters. One hundred of these versions are included in the ensemble. The probability that a pixel is snow, no-snow or cloud covered corresponds to the number of votes classifying the pixel as such across all classifiers. 
The overall performance of ensemble based mapping is compared to the overall performance of the chosen classifier, and also with ground observations at meteorological stations.
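The voting scheme described above can be illustrated with a toy example. The sketch below does not reproduce the six sequential NOAA-AVHRR thresholds, which the abstract does not specify; it assumes a hypothetical two-threshold classifier (a reflectance threshold and a brightness-temperature threshold) whose parameters are perturbed randomly to create 100 ensemble members. All threshold values and names are illustrative assumptions.

```python
import numpy as np

NO_SNOW, SNOW, CLOUD = 0, 1, 2


def classify(pixels, reflectance_threshold, temperature_threshold):
    """Toy stand-in for the sequential-threshold classifier (hypothetical).

    pixels: array of shape (n, 2); column 0 is a visible reflectance,
    column 1 a brightness temperature in kelvin.
    """
    labels = np.full(len(pixels), NO_SNOW, dtype=int)
    bright = pixels[:, 0] > reflectance_threshold
    cold = pixels[:, 1] < temperature_threshold
    labels[bright & ~cold] = SNOW
    labels[bright & cold] = CLOUD
    return labels


def ensemble_probabilities(pixels, n_members=100, seed=0):
    """Fraction of ensemble votes per class, shape (n_pixels, 3)."""
    pixels = np.asarray(pixels, dtype=float)
    rng = np.random.default_rng(seed)
    votes = np.zeros((len(pixels), 3))
    for _ in range(n_members):
        # Each member uses randomly perturbed thresholds (assumed spreads).
        refl_t = rng.normal(0.30, 0.03)
        temp_t = rng.normal(255.0, 3.0)
        labels = classify(pixels, refl_t, temp_t)
        votes[np.arange(len(pixels)), labels] += 1
    return votes / n_members


# Three clear-cut pixels and one lying near the reflectance threshold.
demo_pixels = [[0.90, 280.0], [0.05, 280.0], [0.90, 200.0], [0.30, 280.0]]
probs = ensemble_probabilities(demo_pixels)
```

Pixels far from every threshold receive unanimous votes, while a pixel near a threshold receives split votes, which is how the ensemble expresses classification uncertainty.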
Riihimaki, Laura D.; Comstock, Jennifer M.; Anderson, Kevin K.; ...
2016-06-10
Knowledge of cloud phase (liquid, ice, mixed, etc.) is necessary to describe the radiative impact of clouds and their lifetimes, but is a property that is difficult to simulate correctly in climate models. One step towards improving those simulations is to make observations of cloud phase with sufficient accuracy to help constrain model representations of cloud processes. In this study, we outline a methodology using a basic Bayesian classifier to estimate the probabilities of cloud-phase class from Atmospheric Radiation Measurement (ARM) vertically pointing active remote sensors. The advantage of this method over previous ones is that it provides uncertainty information on the phase classification. We also test the value of including higher moments of the cloud radar Doppler spectrum than are traditionally used operationally. Using training data of known phase from the Mixed-Phase Arctic Cloud Experiment (M-PACE) field campaign, we demonstrate a proof of concept for how the method can be used to train an algorithm that identifies ice, liquid, mixed phase, and snow. Over 95 % of data are identified correctly for pure ice and liquid cases used in this study. Mixed-phase and snow cases are more problematic to identify correctly. When lidar data are not available, including additional information from the Doppler spectrum provides substantial improvement to the algorithm. As a result, this is a first step towards an operational algorithm and can be expanded to include additional categories such as drizzle with additional training data.
NASA Astrophysics Data System (ADS)
Riihimaki, Laura D.; Comstock, Jennifer M.; Anderson, Kevin K.; Holmes, Aimee; Luke, Edward
2016-06-01
Knowledge of cloud phase (liquid, ice, mixed, etc.) is necessary to describe the radiative impact of clouds and their lifetimes, but is a property that is difficult to simulate correctly in climate models. One step towards improving those simulations is to make observations of cloud phase with sufficient accuracy to help constrain model representations of cloud processes. In this study, we outline a methodology using a basic Bayesian classifier to estimate the probabilities of cloud-phase class from Atmospheric Radiation Measurement (ARM) vertically pointing active remote sensors. The advantage of this method over previous ones is that it provides uncertainty information on the phase classification. We also test the value of including higher moments of the cloud radar Doppler spectrum than are traditionally used operationally. Using training data of known phase from the Mixed-Phase Arctic Cloud Experiment (M-PACE) field campaign, we demonstrate a proof of concept for how the method can be used to train an algorithm that identifies ice, liquid, mixed phase, and snow. Over 95 % of data are identified correctly for pure ice and liquid cases used in this study. Mixed-phase and snow cases are more problematic to identify correctly. When lidar data are not available, including additional information from the Doppler spectrum provides substantial improvement to the algorithm. This is a first step towards an operational algorithm and can be expanded to include additional categories such as drizzle with additional training data.
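A Bayesian classifier of the kind mentioned above returns a posterior probability for each class rather than a single label, which is where the uncertainty information comes from. The abstract does not state the form of the likelihoods; the sketch below assumes independent Gaussian likelihoods per feature (a Gaussian naive Bayes model) and uses synthetic two-feature training data. The class name, feature values, and labels are hypothetical.

```python
import numpy as np


class GaussianBayes:
    """Minimal Gaussian naive Bayes returning class posteriors (assumed model)."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # Small constant keeps the variances strictly positive.
        self.vars_ = np.array([X[y == c].var(axis=0) + 1e-9 for c in self.classes_])
        self.log_priors_ = np.log(np.array([(y == c).mean() for c in self.classes_]))
        return self

    def predict_proba(self, X):
        X = np.asarray(X, dtype=float)
        diff = X[:, None, :] - self.means_[None, :, :]
        log_like = -0.5 * np.sum(
            np.log(2.0 * np.pi * self.vars_)[None, :, :] + diff**2 / self.vars_[None, :, :],
            axis=2,
        )
        log_post = log_like + self.log_priors_[None, :]
        # Subtract the row maximum before exponentiating for numerical stability.
        log_post -= log_post.max(axis=1, keepdims=True)
        post = np.exp(log_post)
        return post / post.sum(axis=1, keepdims=True)


# Synthetic training set with two well-separated classes (illustrative only).
rng = np.random.default_rng(42)
X_train = np.vstack([
    rng.normal([0.0, 0.0], 1.0, size=(200, 2)),
    rng.normal([6.0, 6.0], 1.0, size=(200, 2)),
])
y_train = np.array(["ice"] * 200 + ["liquid"] * 200)
model = GaussianBayes().fit(X_train, y_train)
posterior = model.predict_proba([[0.0, 0.0], [6.0, 6.0], [3.0, 3.0]])
```

The third query point lies midway between the two class centres, so its posterior is expected to be much less decisive than the first two; that spread is the uncertainty estimate a hard-threshold scheme would not provide.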
Multiscale corner detection and classification using local properties and semantic patterns
NASA Astrophysics Data System (ADS)
Gallo, Giovanni; Giuoco, Alessandro L.
2002-05-01
A new technique to detect, localize and classify corners in digital closed curves is proposed. The technique is based on correct estimation of support regions for each point. We compute multiscale curvature to detect and to localize corners. As a further step, with the aid of some local features, it's possible to classify corners into seven distinct types. Classification is performed using a set of rules, which describe corners according to preset semantic patterns. Compared with existing techniques, the proposed approach inscribes itself into the family of algorithms that try to explain the curve, instead of simple labeling. Moreover, our technique works in manner similar to what is believed are typical mechanisms of human perception.
NASA Astrophysics Data System (ADS)
Metzger, Andrew; Benavides, Amanda; Nopoulos, Peg; Magnotta, Vincent
2016-03-01
The goal of this project was to develop two age appropriate atlases (neonatal and one year old) that account for the rapid growth and maturational changes that occur during early development. Tissue maps from this age group were initially created by manually correcting the resulting tissue maps after applying an expectation maximization (EM) algorithm and an adult atlas to pediatric subjects. The EM algorithm classified each voxel into one of ten possible tissue types including several subcortical structures. This was followed by a novel level set segmentation designed to improve differentiation between distal cortical gray matter and white matter. To minimize the required manual corrections, the adult atlas was registered to the pediatric scans using high-dimensional, symmetric image normalization (SyN) registration. The subject images were then mapped to an age specific atlas space, again using SyN registration, and the resulting transformation applied to the manually corrected tissue maps. The individual maps were averaged in the age specific atlas space and blurred to generate the age appropriate anatomical priors. The resulting anatomical priors were then used by the EM algorithm to re-segment the initial training set as well as an independent testing set. The results from the adult and age-specific anatomical priors were compared to the manually corrected results. The age appropriate atlas provided superior results as compared to the adult atlas. The image analysis pipeline used in this work was built using the open source software package BRAINSTools.
Diagnosis of Chronic Kidney Disease Based on Support Vector Machine by Feature Selection Methods.
Polat, Huseyin; Danaei Mehr, Homay; Cetin, Aydin
2017-04-01
As Chronic Kidney Disease progresses slowly, early detection and effective treatment are the only way to reduce the mortality rate. Machine learning techniques are gaining significance in medical diagnosis because of their classification ability with high accuracy rates. The accuracy of classification algorithms depends on the use of correct feature selection algorithms to reduce the dimension of datasets. In this study, the Support Vector Machine classification algorithm was used to diagnose Chronic Kidney Disease. Two essential types of feature selection methods, namely wrapper and filter approaches, were chosen to reduce the dimension of the Chronic Kidney Disease dataset. In the wrapper approach, a classifier subset evaluator with a greedy stepwise search engine and a wrapper subset evaluator with the Best First search engine were used. In the filter approach, a correlation feature selection subset evaluator with a greedy stepwise search engine and a filtered subset evaluator with the Best First search engine were used. The results showed that the Support Vector Machine classifier using the filtered subset evaluator with the Best First search engine feature selection method has a higher accuracy rate (98.5%) in the diagnosis of Chronic Kidney Disease compared to the other selected methods.
Automatic Classification of Specific Melanocytic Lesions Using Artificial Intelligence
Jaworek-Korjakowska, Joanna; Kłeczek, Paweł
2016-01-01
Background. Given its propensity to metastasize, and lack of effective therapies for most patients with advanced disease, early detection of melanoma is a clinical imperative. Different computer-aided diagnosis (CAD) systems have been proposed to increase the specificity and sensitivity of melanoma detection. Although such computer programs are developed for different diagnostic algorithms, to the best of our knowledge, a system to classify different melanocytic lesions has not been proposed yet. Method. In this research we present a new approach to the classification of melanocytic lesions. This work is focused not only on categorization of skin lesions as benign or malignant but also on specifying the exact type of a skin lesion including melanoma, Clark nevus, Spitz/Reed nevus, and blue nevus. The proposed automatic algorithm contains the following steps: image enhancement, lesion segmentation, feature extraction, and selection as well as classification. Results. The algorithm has been tested on 300 dermoscopic images and achieved accuracy of 92% indicating that the proposed approach classified most of the melanocytic lesions correctly. Conclusions. A proposed system can not only help to precisely diagnose the type of the skin mole but also decrease the amount of biopsies and reduce the morbidity related to skin lesion excision. PMID:26885520
Automatic Classification of Specific Melanocytic Lesions Using Artificial Intelligence.
Jaworek-Korjakowska, Joanna; Kłeczek, Paweł
2016-01-01
Given its propensity to metastasize, and lack of effective therapies for most patients with advanced disease, early detection of melanoma is a clinical imperative. Different computer-aided diagnosis (CAD) systems have been proposed to increase the specificity and sensitivity of melanoma detection. Although such computer programs are developed for different diagnostic algorithms, to the best of our knowledge, a system to classify different melanocytic lesions has not been proposed yet. In this research we present a new approach to the classification of melanocytic lesions. This work is focused not only on categorization of skin lesions as benign or malignant but also on specifying the exact type of a skin lesion including melanoma, Clark nevus, Spitz/Reed nevus, and blue nevus. The proposed automatic algorithm contains the following steps: image enhancement, lesion segmentation, feature extraction, and selection as well as classification. The algorithm has been tested on 300 dermoscopic images and achieved accuracy of 92% indicating that the proposed approach classified most of the melanocytic lesions correctly. A proposed system can not only help to precisely diagnose the type of the skin mole but also decrease the amount of biopsies and reduce the morbidity related to skin lesion excision.
Tormene, Paolo; Giorgino, Toni; Quaglini, Silvana; Stefanelli, Mario
2009-01-01
The purpose of this study was to assess the performance of a real-time ("open-end") version of the dynamic time warping (DTW) algorithm for the recognition of motor exercises. Given a possibly incomplete input stream of data and a reference time series, the open-end DTW algorithm computes both the size of the prefix of the reference which is best matched by the input, and the dissimilarity between the matched portions. The algorithm was used to provide real-time feedback to neurological patients undergoing motor rehabilitation. We acquired a dataset of multivariate time series from a sensorized long-sleeve shirt which contains 29 strain sensors distributed on the upper limb. Seven typical rehabilitation exercises were recorded in several variations, both correctly and incorrectly executed, and at various speeds, totaling a data set of 840 time series. Nearest-neighbour classifiers were built according to the outputs of open-end DTW alignments and their global counterparts on exercise pairs. The classifiers were also tested on well-known public datasets from heterogeneous domains. Nonparametric tests show that (1) on full time series the two algorithms achieve the same classification accuracy (p-value = 0.32); (2) on partial time series, classifiers based on open-end DTW have a far higher accuracy (kappa = 0.898 versus kappa = 0.447; p < 10^-5); and (3) the prediction of the matched fraction follows closely the ground truth (root mean square < 10%). The results hold for the motor rehabilitation and the other datasets tested, as well. The open-end variant of the DTW algorithm is suitable for the classification of truncated quantitative time series, even in the presence of noise. Early recognition and accurate class prediction can be achieved, provided that enough variance is available over the time span of the reference. Therefore, the proposed technique expands the use of DTW to a wider range of applications, such as real-time biofeedback systems.
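The open-end matching described in the abstract above can be illustrated with a minimal sketch. It assumes scalar (not multivariate) series, an absolute-difference local cost, unweighted steps, and normalization by the combined length of the matched portions; the function name open_end_dtw and these choices are illustrative assumptions, not the authors' implementation.

```python
import math


def open_end_dtw(query, reference):
    """Align the whole query against the best-matching prefix of reference.

    Returns (prefix_length, normalized_distance). The local cost, step
    pattern and normalization are illustrative assumptions.
    """
    n, m = len(query), len(reference)
    # cost[i][j]: cumulative cost of aligning query[:i] with reference[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(query[i - 1] - reference[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # query advances, reference waits
                cost[i][j - 1],      # reference advances, query waits
                cost[i - 1][j - 1],  # both advance
            )
    # Open end: the query must be consumed fully, but the alignment may stop
    # at any reference index j. Pick the j with the lowest normalized cost.
    best_j = min(range(1, m + 1), key=lambda j: cost[n][j] / (n + j))
    return best_j, cost[n][best_j] / (n + best_j)
```

For a query that covers the first half of a reference exercise, the returned prefix length divided by len(reference) gives an estimate of the fraction completed so far, which is the quantity the abstract calls the matched fraction.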
Ahmed, Shiek S. S. J.; Ramakrishnan, V.
2012-01-01
Background: Poor oral bioavailability is an important factor in the failure of drug candidates. Approximately 50% of drugs in development fail because of unfavorable oral bioavailability. In silico prediction of oral bioavailability (%F) based on physicochemical properties is highly needed. Although many computational models have been developed to predict oral bioavailability, their accuracy remains low, with a significant number of false positives. In this study, we present an oral bioavailability model based on a systems biology approach, using a machine learning algorithm coupled with an optimal discriminative set of physicochemical properties. Results: The models were developed from 247 computationally derived physicochemical descriptors of 2279 molecules, among which 969, 605 and 705 molecules correspond to the oral bioavailability, intestinal absorption (HIA) and Caco-2 permeability data sets, respectively. Partial least squares discriminant analysis showed that 49 descriptors of HIA and 50 descriptors of Caco-2 are the major contributing descriptors in classifying into groups. Of these descriptors, 47 were common to HIA and Caco-2, which suggests that they play a vital role in classifying oral bioavailability. To determine the best machine learning algorithm, 21 classifiers were compared using a bioavailability data set of 969 molecules with 47 descriptors. Each molecule in the data set was represented by a set of 47 physicochemical properties, with the functional relevance labeled as (+bioavailability/−bioavailability) to indicate good-bioavailability/poor-bioavailability molecules. The best-performing algorithm was the logistic algorithm. The correlation-based feature selection (CFS) algorithm was implemented, which confirmed that these 47 descriptors are the fundamental descriptors for oral bioavailability prediction.
Conclusion: The logistic algorithm with 47 selected descriptors correctly predicted the oral bioavailability, with a predictive accuracy of more than 71%. Overall, the method captures the fundamental molecular descriptors, which can be used as an entity to facilitate prediction of oral bioavailability. PMID:22815781
Self-recovery reversible image watermarking algorithm
Sun, He; Gao, Shangbing; Jin, Shenghua
2018-01-01
The integrity of image content is essential, yet most watermarking algorithms can achieve image authentication but cannot automatically repair damaged areas or restore the original image. In this paper, a self-recovery reversible image watermarking algorithm is proposed to recover tampered areas effectively. First, the original image is divided into homogeneous blocks and non-homogeneous blocks through multi-scale decomposition, and the feature information of each block is calculated as the recovery watermark. Then, the original image is divided into 4×4 non-overlapping blocks, which are classified into smooth blocks and texture blocks according to image texture. Finally, the recovery watermark generated by homogeneous blocks and error-correcting codes is embedded into the corresponding smooth block by mapping; watermark information generated by non-homogeneous blocks and error-correcting codes is embedded into the corresponding non-embedded smooth block and the texture block via mapping. The correlation attack is detected by invariant moments when the watermarked image is attacked. To determine whether a sub-block has been tampered with, its feature is calculated and the recovery watermark is extracted from the corresponding block. If the image has been tampered with, it can be recovered. The experimental results show that the proposed algorithm can effectively recover the tampered areas with high accuracy and high quality. The algorithm is characterized by sound visual quality and excellent image restoration. PMID:29920528
Modified fuzzy c-means applied to a Bragg grating-based spectral imager for material clustering
NASA Astrophysics Data System (ADS)
Rodríguez, Aida; Nieves, Juan Luis; Valero, Eva; Garrote, Estíbaliz; Hernández-Andrés, Javier; Romero, Javier
2012-01-01
We have modified the fuzzy c-means algorithm for an application related to the segmentation of hyperspectral images. The classical fuzzy c-means algorithm uses the Euclidean distance to compute sample membership in each cluster. We have introduced a different distance metric, the Spectral Similarity Value (SSV), in order to have a more suitable similarity measure for reflectance information. The SSV distance metric considers both magnitude difference (through the Euclidean distance) and spectral shape (through the Pearson correlation). Experiments confirmed that introducing this metric improves the quality of hyperspectral image segmentation, creating spectrally denser clusters and increasing the number of correctly classified pixels.
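A distance that mixes a magnitude term with a shape term, as the abstract above describes, might look like the following sketch. The exact way the two terms are combined here (root-mean-square Euclidean distance and 1 - r^2, added in quadrature) is an assumption for illustration; the abstract does not state the formula, and the function name is hypothetical.

```python
import math


def spectral_similarity_value(x, y):
    """Combine magnitude difference and spectral shape into one distance.

    Assumed form: sqrt(d_e**2 + (1 - r**2)**2), where d_e is the RMS
    Euclidean distance and r is the Pearson correlation of the two spectra.
    """
    n = len(x)
    # Magnitude term: root-mean-square difference between the spectra
    d_e = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / n)
    # Shape term: based on the Pearson correlation coefficient
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # A flat spectrum has no defined correlation; treat it as uncorrelated
    r = cov / math.sqrt(var_x * var_y) if var_x > 0 and var_y > 0 else 0.0
    shape = 1.0 - r * r
    return math.sqrt(d_e ** 2 + shape ** 2)
```

With this form, two spectra with the same shape but different brightness differ only through the magnitude term, while two spectra of similar brightness but different shape are separated by the correlation term. Inside fuzzy c-means, such a function would replace the Euclidean distance between a pixel spectrum and each cluster center.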
Ozcift, Akin; Gulten, Arif
2011-12-01
Improving the accuracy of machine learning algorithms is vital in designing high-performance computer-aided diagnosis (CADx) systems. Research has shown that the performance of a base classifier might be enhanced by ensemble classification strategies. In this study, we construct rotation forest (RF) ensemble classifiers of 30 machine learning algorithms to evaluate their classification performance using Parkinson's, diabetes and heart disease datasets from the literature. In the experiments, first the feature dimension of the three datasets is reduced using the correlation-based feature selection (CFS) algorithm. Second, the classification performance of 30 machine learning algorithms is calculated for the three datasets. Third, 30 classifier ensembles are constructed based on the RF algorithm to assess the performance of the respective classifiers with the same disease data. All the experiments are carried out with a leave-one-out validation strategy, and the performance of the 60 algorithms is evaluated using three metrics: classification accuracy (ACC), kappa error (KE) and area under the receiver operating characteristic (ROC) curve (AUC). Base classifiers achieved average accuracies of 72.15%, 77.52% and 84.43% for the diabetes, heart and Parkinson's datasets, respectively. The RF classifier ensembles produced average accuracies of 74.47%, 80.49% and 87.13% for the respective diseases. RF, a newly proposed classifier ensemble algorithm, might be used to improve the accuracy of miscellaneous machine learning algorithms to design advanced CADx systems.
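Two of the metrics named above can be computed from paired label lists as in the sketch below. It assumes that the "kappa error" of the abstract refers to Cohen's kappa statistic; the function name and the toy labels in the usage note are illustrative only.

```python
from collections import Counter


def accuracy_and_kappa(y_true, y_pred):
    """Return (accuracy, Cohen's kappa) for two equal-length label lists."""
    n = len(y_true)
    # Observed agreement is the plain classification accuracy
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement: probability that true and predicted labels coincide
    # if they were drawn independently from their marginal distributions
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    # Kappa rescales accuracy so that chance-level agreement maps to 0
    # (undefined when expected == 1, i.e. a single class everywhere)
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa
```

For example, with y_true = [1, 1, 1, 1, 0, 0, 0, 0] and y_pred = [1, 1, 1, 0, 0, 0, 0, 1], the accuracy is 0.75 while kappa is 0.5, because half of the agreement would be expected by chance on a balanced two-class problem.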
Physical Human Activity Recognition Using Wearable Sensors.
Attal, Ferhat; Mohammed, Samer; Dedabrishvili, Mariam; Chamroukhi, Faicel; Oukhellou, Latifa; Amirat, Yacine
2015-12-11
This paper presents a review of different classification techniques used to recognize human activities from wearable inertial sensor data. Three inertial sensor units were used in this study and were worn by healthy subjects at key points of upper/lower body limbs (chest, right thigh and left ankle). Three main steps describe the activity recognition process: sensors' placement, data pre-processing and data classification. Four supervised classification techniques namely, k-Nearest Neighbor (k-NN), Support Vector Machines (SVM), Gaussian Mixture Models (GMM), and Random Forest (RF) as well as three unsupervised classification techniques namely, k-Means, Gaussian mixture models (GMM) and Hidden Markov Model (HMM), are compared in terms of correct classification rate, F-measure, recall, precision, and specificity. Raw data and extracted features are used separately as inputs of each classifier. The feature selection is performed using a wrapper approach based on the RF algorithm. Based on our experiments, the results obtained show that the k-NN classifier provides the best performance compared to other supervised classification algorithms, whereas the HMM classifier is the one that gives the best results among unsupervised classification algorithms. This comparison highlights which approach gives better performance in both supervised and unsupervised contexts. It should be noted that the obtained results are limited to the context of this study, which concerns the classification of the main daily living human activities using three wearable accelerometers placed at the chest, right shank and left ankle of the subject.
García-Massó, X; Serra-Añó, P; Gonzalez, L M; Ye-Lin, Y; Prats-Boluda, G; Garcia-Casado, J
2015-10-01
This was a cross-sectional study. The main objective of this study was to develop and test classification algorithms based on machine learning using accelerometers to identify the activity type performed by manual wheelchair users with spinal cord injury (SCI). The study was conducted in the Physical Therapy department and the Physical Education and Sports department of the University of Valencia. A total of 20 volunteers were asked to perform 10 physical activities, lying down, body transfers, moving items, mopping, working on a computer, watching TV, arm-ergometer exercises, passive propulsion, slow propulsion and fast propulsion, while fitted with four accelerometers placed on both wrists, chest and waist. The activities were grouped into five categories: sedentary, locomotion, housework, body transfers and moderate physical activity. Different machine learning algorithms were used to develop individual and group activity classifiers from the acceleration data for different combinations of number and position of the accelerometers. We found that although the accuracy of the classifiers for individual activities was moderate (55-72%), with higher values for a greater number of accelerometers, grouped activities were correctly classified in a high percentage of cases (83.2-93.6%). With only two accelerometers and the quadratic discriminant analysis algorithm we achieved a reasonably accurate group activity recognition system (>90%). Such a system with the minimum of intervention would be a valuable tool for studying physical activity in individuals with SCI.
Physical Human Activity Recognition Using Wearable Sensors
Attal, Ferhat; Mohammed, Samer; Dedabrishvili, Mariam; Chamroukhi, Faicel; Oukhellou, Latifa; Amirat, Yacine
2015-01-01
This paper presents a review of different classification techniques used to recognize human activities from wearable inertial sensor data. Three inertial sensor units were used in this study and were worn by healthy subjects at key points of upper/lower body limbs (chest, right thigh and left ankle). Three main steps describe the activity recognition process: sensors’ placement, data pre-processing and data classification. Four supervised classification techniques namely, k-Nearest Neighbor (k-NN), Support Vector Machines (SVM), Gaussian Mixture Models (GMM), and Random Forest (RF) as well as three unsupervised classification techniques namely, k-Means, Gaussian mixture models (GMM) and Hidden Markov Model (HMM), are compared in terms of correct classification rate, F-measure, recall, precision, and specificity. Raw data and extracted features are used separately as inputs of each classifier. The feature selection is performed using a wrapper approach based on the RF algorithm. Based on our experiments, the results obtained show that the k-NN classifier provides the best performance compared to other supervised classification algorithms, whereas the HMM classifier is the one that gives the best results among unsupervised classification algorithms. This comparison highlights which approach gives better performance in both supervised and unsupervised contexts. It should be noted that the obtained results are limited to the context of this study, which concerns the classification of the main daily living human activities using three wearable accelerometers placed at the chest, right shank and left ankle of the subject. PMID:26690450
Use of genetic algorithm for the selection of EEG features
NASA Astrophysics Data System (ADS)
Asvestas, P.; Korda, A.; Kostopoulos, S.; Karanasiou, I.; Ouzounoglou, A.; Sidiropoulos, K.; Ventouras, E.; Matsopoulos, G.
2015-09-01
Genetic Algorithm (GA) is a popular optimization technique that can detect the global optimum of a multivariable function containing several local optima. GA has been widely used in the field of biomedical informatics, especially in the context of designing decision support systems that classify biomedical signals or images into classes of interest. The aim of this paper is to present a methodology, based on GA, for the selection of the optimal subset of features that can be used for the efficient classification of Event Related Potentials (ERPs), which are recorded during the observation of correct or incorrect actions. In our experiment, ERP recordings were acquired from sixteen (16) healthy volunteers who observed correct or incorrect actions of other subjects. The brain electrical activity was recorded at 47 locations on the scalp. The GA was formulated as a combinatorial optimizer for the selection of the combination of electrodes that maximizes the performance of the Fuzzy C Means (FCM) classification algorithm. In particular, during the evolution of the GA, for each candidate combination of electrodes, the well-known (Σ, Φ, Ω) features were calculated and were evaluated by means of the FCM method. The proposed methodology provided a combination of 8 electrodes, with classification accuracy 93.8%. Thus, GA can be the basis for the selection of features that discriminate ERP recordings of observations of correct or incorrect actions.
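The combinatorial search described above, in which a GA picks a fixed-size combination of electrodes that maximizes a classifier's score, can be sketched as follows. Everything here is a simplified assumption: the operators (truncation selection, union-based crossover, single-swap mutation), the parameter defaults, and the name genetic_subset_search are illustrative, and the fitness function is supplied by the caller rather than being the FCM-based score used in the paper.

```python
import random


def genetic_subset_search(n_items, subset_size, fitness,
                          generations=40, pop_size=20, seed=0):
    """Search for a subset of indices (e.g. electrodes) with high fitness."""
    rng = random.Random(seed)

    def random_subset():
        return tuple(sorted(rng.sample(range(n_items), subset_size)))

    def crossover(a, b):
        # The child draws its members from the union of both parents
        pool = sorted(set(a) | set(b))
        return tuple(sorted(rng.sample(pool, subset_size)))

    def mutate(subset):
        members = list(subset)
        outside = [i for i in range(n_items) if i not in members]
        # Occasionally swap one member for an index not yet in the subset
        if outside and rng.random() < 0.3:
            members[rng.randrange(subset_size)] = rng.choice(outside)
        return tuple(sorted(members))

    population = [random_subset() for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the better half unchanged (elitism), refill with offspring
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents)))
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)
```

In a setting like the one in the abstract, n_items would be 47 and subset_size 8, and fitness would train and score a classifier on the features from the chosen electrodes. Because that evaluation is expensive, caching fitness values per subset would be a natural refinement.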
NASA Astrophysics Data System (ADS)
Liu, J.; Lan, T.; Qin, H.
2017-10-01
Traditional data cleaning identifies dirty data by classifying original data sequences, which is a class-imbalanced problem since the proportion of incorrect data is much lower than the proportion of correct data for most diagnostic systems in Magnetic Confinement Fusion (MCF) devices. When machine learning algorithms are used to classify diagnostic data based on a class-imbalanced training set, most classifiers are biased towards the majority class and show very poor classification rates on the minority class. By transforming the direct classification problem about original data sequences into a classification problem about the physical similarity between data sequences, the class-balancing effect of the Time-Domain Global Similarity (TDGS) method on training set structure is investigated in this paper. Meanwhile, the impact of the improved training set structure on the data cleaning performance of the TDGS method is demonstrated with an application example in the EAST POlarimetry-INTerferometry (POINT) system.
Classification of ring artifacts for their effective removal using type adaptive correction schemes.
Anas, Emran Mohammad Abu; Lee, Soo Yeol; Hasan, Kamrul
2011-06-01
High-resolution tomographic images acquired with a digital X-ray detector are often degraded by so-called ring artifacts. In this paper, a detailed analysis including the classification, detection and correction of these ring artifacts is presented. First, a novel idea for classifying rings into two categories, namely type I and type II rings, is proposed based on their statistical characteristics. Defective detector elements and dusty scintillator screens result in type I rings, and mis-calibrated detector elements lead to type II rings. Unlike conventional approaches, we emphasize separate detection and correction schemes for each type of ring for their effective removal. For the detection of type I rings, the histogram of the responses of the detector elements is used, and a modified fast image inpainting algorithm is adopted to correct the responses of the defective pixels. To detect type II rings, a simple filtering scheme based on the fast Fourier transform (FFT) is first presented to smooth the sum curve derived from the type I ring-corrected projection data. The difference between the sum curve and its smoothed version is then used to detect their positions. Then, to remove the constant bias suffered by the responses of the mis-calibrated detector elements with view angle, an estimated dc shift is subtracted from them. The performance of the proposed algorithm is evaluated using real micro-CT images and is compared with three recently reported algorithms. Simulation results demonstrate superior performance of the proposed technique as compared to the techniques reported in the literature.
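The type II detection idea above (compare the sum curve with a smoothed version of itself, then subtract an estimated offset) can be sketched in a few lines. Note the substitution: a running median stands in for the FFT-based smoothing filter of the paper, and the window size, threshold, and function name are hypothetical values chosen for illustration.

```python
from statistics import median


def detect_and_correct_offsets(sum_curve, window=5, threshold=3.0):
    """Flag detector elements whose summed response departs from a smooth
    trend and subtract the estimated offset at those positions.

    A running median replaces the FFT low-pass filter of the original
    method; window and threshold are illustrative assumptions.
    """
    half = window // 2
    # Smoothed trend: median of each element's neighborhood
    smoothed = [
        median(sum_curve[max(0, i - half): i + half + 1])
        for i in range(len(sum_curve))
    ]
    # Residual between the raw sum curve and its smoothed version
    residual = [s - m for s, m in zip(sum_curve, smoothed)]
    flagged = [i for i, r in enumerate(residual) if abs(r) > threshold]
    # Treat the residual as the constant (dc) shift and remove it
    corrected = list(sum_curve)
    for i in flagged:
        corrected[i] -= residual[i]
    return flagged, corrected
```

Here sum_curve is assumed to hold one value per detector element, obtained by summing that element's response over all view angles, so that a constant bias in one element shows up as an isolated spike.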
Detection and Classification of Human Body Odor Using an Electronic Nose
Wongchoosuk, Chatchawal; Lutz, Mario; Kerdcharoen, Teerakiat
2009-01-01
An electronic nose (E-nose) has been designed and equipped with software that can detect and classify human armpit body odor. An array of metal oxide sensors was used for detecting volatile organic compounds. The measurement circuit employs a voltage divider resistor to measure the sensitivity of each sensor. This E-nose was controlled by in-house developed software through a portable USB data acquisition card, with a principal component analysis (PCA) algorithm implemented for pattern recognition and classification. Because gas sensor sensitivity in the detection of armpit odor samples is affected by humidity, we propose a new method and algorithms combining hardware and software for the correction of the humidity noise. After the humidity correction, the E-nose showed the capability of detecting human body odor and distinguishing the body odors of two persons in a relative manner. The E-nose is still able to recognize people, even after application of deodorant. In conclusion, this is the first report of the application of an E-nose for armpit odor recognition. PMID:22399995
Pediatric medical complexity algorithm: a new method to stratify children by medical complexity.
Simon, Tamara D; Cawthon, Mary Lawrence; Stanford, Susan; Popalisky, Jean; Lyons, Dorothy; Woodcox, Peter; Hood, Margaret; Chen, Alex Y; Mangione-Smith, Rita
2014-06-01
The goal of this study was to develop an algorithm based on International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM), codes for classifying children with chronic disease (CD) according to level of medical complexity and to assess the algorithm's sensitivity and specificity. A retrospective observational study was conducted among 700 children insured by Washington State Medicaid with ≥1 Seattle Children's Hospital emergency department and/or inpatient encounter in 2010. The gold standard population included 350 children with complex chronic disease (C-CD), 100 with noncomplex chronic disease (NC-CD), and 250 without CD. An existing ICD-9-CM-based algorithm called the Chronic Disability Payment System was modified to develop a new algorithm called the Pediatric Medical Complexity Algorithm (PMCA). The sensitivity and specificity of PMCA were assessed. Using hospital discharge data, PMCA's sensitivity for correctly classifying children was 84% for C-CD, 41% for NC-CD, and 96% for those without CD. Using Medicaid claims data, PMCA's sensitivity was 89% for C-CD, 45% for NC-CD, and 80% for those without CD. Specificity was 90% to 92% in hospital discharge data and 85% to 91% in Medicaid claims data for all 3 groups. PMCA identified children with C-CD (who have accessed tertiary hospital care) with good sensitivity and good to excellent specificity when applied to hospital discharge or Medicaid claims data. PMCA may be useful for targeting resources such as care coordination to children with C-CD.
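Per-group sensitivity and specificity for a three-way classification, as reported above, are usually obtained by treating each group in turn as the positive class. The sketch below shows that one-vs-rest calculation; the function name is hypothetical, and any labels passed to it in examples are made-up toy data, not study results.

```python
def one_vs_rest_rates(y_true, y_pred, labels):
    """Return {label: (sensitivity, specificity)}, each label taken in turn
    as the positive class and all other labels pooled as negative."""
    pairs = list(zip(y_true, y_pred))
    rates = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        tn = sum(t != label and p != label for t, p in pairs)
        # Sensitivity: share of true members of the group that were found
        # Specificity: share of non-members correctly kept out of the group
        rates[label] = (tp / (tp + fn), tn / (tn + fp))
    return rates
```

This makes explicit why a group can show low sensitivity but high specificity at the same time: few of its true members are recognized, yet few outsiders are wrongly assigned to it.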
Generalized expectation-maximization segmentation of brain MR images
NASA Astrophysics Data System (ADS)
Devalkeneer, Arnaud A.; Robe, Pierre A.; Verly, Jacques G.; Phillips, Christophe L. M.
2006-03-01
Manual segmentation of medical images is impractical because it is time-consuming, not reproducible, and prone to human error. It is also very difficult to take into account the 3D nature of the images. Thus, semi- or fully-automatic methods are of great interest. Current segmentation algorithms based on an Expectation-Maximization (EM) procedure present some limitations. The algorithm by Ashburner et al., 2005, does not allow multichannel inputs, e.g. two MR images of different contrast, and does not use spatial constraints between adjacent voxels, e.g. Markov random field (MRF) constraints. The solution of Van Leemput et al., 1999, employs a simplified model (mixture coefficients are not estimated and only one Gaussian is used per tissue class, with three for the image background). We have thus implemented an algorithm that combines the features of these two approaches: multichannel inputs, intensity bias correction, a multi-Gaussian histogram model, and Markov random field (MRF) constraints. Our proposed method classifies tissues in three main iterative stages by way of a Generalized-EM (GEM) algorithm: (1) estimation of the Gaussian parameters modeling the histogram of the images, (2) correction of image intensity non-uniformity, and (3) modification of prior classification knowledge by MRF techniques. The goal of the GEM algorithm is to maximize the log-likelihood across the classes and voxels. Our segmentation algorithm was validated on synthetic data (with the Dice metric criterion) and real data (by a neurosurgeon) and compared to the original algorithms by Ashburner et al. and Van Leemput et al. Our combined approach leads to more robust and accurate segmentation.
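Stage (1) above, estimating the Gaussian parameters that model the intensity histogram, is the classical EM step for a mixture model. The sketch below shows plain EM for a one-dimensional Gaussian mixture only; it deliberately omits the bias correction and MRF stages, and the function names, starting variances, and synthetic data generator are assumptions for illustration.

```python
import math
import random


def em_gmm_1d(data, means, n_iter=50):
    """Fit a 1-D Gaussian mixture by EM, starting from the given means.

    Returns (weights, means, variances). Unit starting variances and
    equal starting weights are illustrative assumptions.
    """
    k = len(means)
    means = list(means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior probability of each class for each intensity
        resp = []
        for x in data:
            dens = [
                w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate parameters from the soft assignments
        for j in range(k):
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # guard against collapse
            weights[j] = nj / len(data)
    return weights, means, variances


def synthetic_intensities(seed=0):
    """Two well-separated intensity classes (hypothetical test data)."""
    rng = random.Random(seed)
    return ([rng.gauss(0.0, 1.0) for _ in range(200)]
            + [rng.gauss(10.0, 1.0) for _ in range(200)])
```

Calling em_gmm_1d(synthetic_intensities(), means=[1.0, 8.0]) should recover means near 0 and 10 with roughly equal weights. In a full GEM segmentation the responsibilities computed in the E-step would additionally be modulated by spatial priors before the parameters are updated.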
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xi, T; Jones, I M; Mohrenweiser, H W
2003-11-03
Over 520 different amino acid substitution variants have been previously identified in the systematic screening of 91 human DNA repair genes for sequence variation. Two algorithms were employed to predict the impact of these amino acid substitutions on protein activity. Sorting Intolerant From Tolerant (SIFT) classified 226 of 508 variants (44%) as "Intolerant". Polymorphism Phenotyping (PolyPhen) classed 165 of 489 amino acid substitutions (34%) as "Probably or Possibly Damaging". Another 9-15% of the variants were classed as "Potentially Intolerant or Damaging". The results from the two algorithms are highly associated, with concordance in predicted impact observed for approximately 62% of the variants. Twenty-one to thirty-one percent of the variant proteins are predicted to exhibit reduced activity by both algorithms. These variants occur at slightly lower individual allele frequency than do the variants classified as "Tolerant" or "Benign". Both algorithms correctly predicted the impact of 26 functionally characterized amino acid substitutions in the APE1 protein on biochemical activity, with one exception. It is concluded that a substantial fraction of the missense variants observed in the general human population are functionally relevant. These variants are expected to be the molecular genetic and biochemical basis for the associations of reduced DNA repair capacity phenotypes with elevated cancer risk.
Chaotic Particle Swarm Optimization with Mutation for Classification
Assarzadeh, Zahra; Naghsh-Nilchi, Ahmad Reza
2015-01-01
In this paper, a chaotic particle swarm optimization with mutation-based classifier particle swarm optimization is proposed to classify patterns of different classes in the feature space. The introduced mutation operators and chaotic sequences allow us to overcome the problem of early convergence to a local minimum associated with particle swarm optimization algorithms. That is, the mutation operator sharpens the convergence and tunes the best possible solution. Furthermore, to remove irrelevant data and reduce the dimensionality of medical datasets, a feature selection approach using a binary version of the proposed particle swarm optimization is introduced. In order to demonstrate the effectiveness of our proposed classifier, mutation-based classifier particle swarm optimization, it is tested on three classification data sets, namely Wisconsin diagnostic breast cancer, Wisconsin breast cancer and heart-statlog, with different feature vector dimensions. The proposed algorithm is compared with different classifier algorithms including k-nearest neighbor, as a conventional classifier, and particle swarm-classifier, genetic algorithm, and Imperialist competitive algorithm-classifier, as more sophisticated ones. The performance of each classifier was evaluated by calculating the accuracy, sensitivity, specificity and Matthews's correlation coefficient. The experimental results show that the mutation-based classifier particle swarm optimization unequivocally performs better than all the compared algorithms. PMID:25709937
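The four evaluation measures listed above all derive from the counts of a binary confusion matrix. The sketch below uses the standard definitions; the function name and the counts in the usage note are made up for illustration and are not results from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity, specificity and Matthews correlation
    coefficient (MCC) from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal count is zero
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }
```

For example, binary_metrics(tp=40, fp=5, tn=50, fn=5) gives an accuracy of 0.9 and an MCC of about 0.798. Unlike accuracy, MCC stays informative when the two classes are of very different sizes, which is common in diagnostic data sets.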
An automated approach to the design of decision tree classifiers
NASA Technical Reports Server (NTRS)
Argentiero, P.; Chin, R.; Beaudet, P.
1982-01-01
An automated technique is presented for designing effective decision tree classifiers predicated only on a priori class statistics. The procedure relies on linear feature extractions and Bayes table look-up decision rules. Associated error matrices are computed and utilized to provide an optimal design of the decision tree at each so-called 'node'. A by-product of this procedure is a simple algorithm for computing the global probability of correct classification assuming the statistical independence of the decision rules. Attention is given to a more precise definition of decision tree classification, the mathematical details on the technique for automated decision tree design, and an example of a simple application of the procedure using class statistics acquired from an actual Landsat scene.
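Under the independence assumption mentioned in this abstract, one plausible reading is that a sample is classified correctly only if every decision node on the path to its leaf decides correctly, so the per-node probabilities multiply. The sketch below follows that reading; the class names, priors and per-node probabilities are hypothetical and are not taken from the report.

```python
from math import prod

def global_correct_probability(priors, path_node_accuracies):
    """Prior-weighted sum, over classes, of the product of per-node
    correct-decision probabilities along each class's root-to-leaf path.
    Assumes the decision rules are statistically independent."""
    return sum(priors[c] * prod(path_node_accuracies[c]) for c in priors)

# Hypothetical three-class tree: "water" is separated at the root,
# "forest" and "urban" need a second decision node.
priors = {"water": 0.2, "forest": 0.5, "urban": 0.3}
paths = {"water": [0.95], "forest": [0.95, 0.9], "urban": [0.95, 0.9]}

p_correct = global_correct_probability(priors, paths)
print(round(p_correct, 4))
```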
Lurie, Jon D.; Tosteson, Anna N.A.; Deyo, Richard A.; Tosteson, Tor; Weinstein, James; Mirza, Sohail K.
2014-01-01
Study Design Retrospective analysis of Medicare claims linked to a multi-center clinical trial. Objective The Spine Patient Outcomes Research Trial (SPORT) provided a unique opportunity to examine the validity of a claims-based algorithm for grouping patients by surgical indication. SPORT enrolled patients for lumbar disc herniation, spinal stenosis, and degenerative spondylolisthesis. We compared the surgical indication derived from Medicare claims to that provided by SPORT surgeons, the “gold standard”. Summary of Background Data Administrative data are frequently used to report procedure rates, surgical safety outcomes, and costs in the management of spinal surgery. However, the accuracy of using diagnosis codes to classify patients by surgical indication has not been examined. Methods Medicare claims were linked to beneficiaries enrolled in SPORT. The sensitivity and specificity of three claims-based approaches to group patients based on surgical indications were examined: 1) using the first listed diagnosis; 2) using all diagnoses independently; and 3) using a diagnosis hierarchy based on the support for fusion surgery. Results Medicare claims were obtained from 376 SPORT participants, including 21 with disc herniation, 183 with spinal stenosis, and 172 with degenerative spondylolisthesis. The hierarchical coding algorithm was the most accurate approach for classifying patients by surgical indication, with sensitivities of 76.2%, 88.1%, and 84.3% for disc herniation, spinal stenosis, and degenerative spondylolisthesis cohorts, respectively. The specificity was 98.3% for disc herniation, 83.2% for spinal stenosis, and 90.7% for degenerative spondylolisthesis. Misclassifications were primarily due to codes attributing more complex pathology to the case. Conclusion Standardized approaches for using claims data to accurately group patients by surgical indications have widespread interest. 
We found that a hierarchical coding approach correctly classified over 90% of spine patients into their respective SPORT cohorts. Therefore, claims data appears to be a reasonably valid approach to classifying patients by surgical indication. PMID:24525995
Goebel, Georg; Seppi, Klaus; Donnemiller, Eveline; Warwitz, Boris; Wenning, Gregor K; Virgolini, Irene; Poewe, Werner; Scherfler, Christoph
2011-04-01
The purpose of this study was to develop an observer-independent algorithm for the correct classification of dopamine transporter SPECT images as Parkinson's disease (PD), multiple system atrophy parkinson variant (MSA-P), progressive supranuclear palsy (PSP) or normal. A total of 60 subjects with clinically probable PD (n = 15), MSA-P (n = 15) and PSP (n = 15), and 15 age-matched healthy volunteers, were studied with the dopamine transporter ligand [(123)I]β-CIT. Parametric images of the specific-to-nondisplaceable equilibrium partition coefficient (BP(ND)) were generated. Following a voxel-wise ANOVA, cut-off values were calculated from the voxel values of the resulting six post-hoc t-test maps. The percentages of the volume of an individual BP(ND) image remaining below and above the cut-off values were determined. The higher percentage of image volume from all six cut-off matrices was used to classify an individual's image. For validation, the algorithm was compared to a conventional region of interest analysis. The predictive diagnostic accuracy of the algorithm in the correct assignment of a [(123)I]β-CIT SPECT image was 83.3% and increased to 93.3% on merging the MSA-P and PSP groups. In contrast the multinomial logistic regression of mean region of interest values of the caudate, putamen and midbrain revealed a diagnostic accuracy of 71.7%. In contrast to a rater-driven approach, this novel method was superior in classifying [(123)I]β-CIT-SPECT images as one of four diagnostic entities. In combination with the investigator-driven visual assessment of SPECT images, this clinical decision support tool would help to improve the diagnostic yield of [(123)I]β-CIT SPECT in patients presenting with parkinsonism at their initial visit.
Moore, Richard G.; McMeekin, D. Scott; Brown, Amy K.; DiSilvestro, Paul; Miller, M. Craig; Allard, W. Jeffrey; Gajewski, Walter; Kurman, Robert; Bast, Robert C.; Skates, Steven J.
2012-01-01
Introduction Patients diagnosed with epithelial ovarian cancer (EOC) have improved outcomes when cared for at centers experienced in the management of EOC. The objective of this trial was to validate a predictive model to assess the risk for EOC in women with a pelvic mass. Methods Women diagnosed with a pelvic mass and scheduled to have surgery were enrolled on a multicenter prospective study. Preoperative serum levels of HE4 and CA125 were measured. Separate logistic regression algorithms for premenopausal and postmenopausal women were utilized to categorize patients into low and high risk groups for EOC. Results Twelve sites enrolled 531 evaluable patients with 352 benign tumors, 129 EOC, 22 LMP tumors, 6 non EOC and 22 non ovarian cancers. The postmenopausal group contained 150 benign cases of which 112 were classified as low risk giving a specificity of 75.0% (95% CI 66.9-81.4), and 111 EOC and 6 LMP tumors of which 108 were classified as high risk giving a sensitivity of 92.3% (95% CI 85.9-96.4). The premenopausal group had 202 benign cases of which 151 were classified as low risk providing a specificity of 74.8% (95% CI 68.2-80.6), and 18 EOC and 16 LMP tumors of which 26 were classified as high risk, providing a sensitivity of 76.5% (95% CI 58.8-89.3). Conclusion An algorithm utilizing HE4 and CA125 successfully classified patients into high and low risk groups with 93.8% of EOC correctly classified as high risk. This model can be used to effectively triage patients to centers of excellence. PMID:18851871
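The sensitivity and specificity figures in this abstract follow directly from the reported counts. As a short worked example, the sketch below recomputes three of them; the helper name is an assumption, while the counts are those given in the Results.

```python
def percent(correct, total):
    """Share of correctly classified cases, as a percentage."""
    return 100.0 * correct / total

# Premenopausal group: 151 of 202 benign cases classified as low risk.
pre_specificity = percent(151, 202)
# Premenopausal group: 26 of 34 EOC and LMP tumors classified as high risk.
pre_sensitivity = percent(26, 34)
# Postmenopausal group: 108 of 117 EOC and LMP tumors classified as high risk.
post_sensitivity = percent(108, 117)

print(round(pre_specificity, 1), round(pre_sensitivity, 1), round(post_sensitivity, 1))
```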
Correcting Evaluation Bias of Relational Classifiers with Network Cross Validation
2010-01-01
classification algorithms: simple random resampling (RRS), equal-instance random resampling (ERS), and network cross-validation (NCV). The first two... NCV procedure that eliminates overlap between test sets altogether. The procedure samples for k disjoint test sets that will be used for evaluation...propLabeled ∗ S) nodes from trainPool; inferenceSet = network − trainSet; F = F ∪ <trainSet, testSet, inferenceSet>; end for; output: F. NCV addresses
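The surviving pseudocode fragment suggests the overall shape of the procedure: k disjoint test sets, a training set drawn from the remaining nodes, and an inference set equal to the network minus the training set. The sketch below fills the elided parts with assumptions that are not from the source: test sets are equal-sized slices of a shuffled node list, and the training-set size is taken as `prop_labeled` times the number of nodes.

```python
import random

def network_cross_validation(nodes, k, prop_labeled, seed=0):
    """Return k folds of (train_set, test_set, inference_set).
    Test sets are pairwise disjoint; sizes are illustrative assumptions."""
    rng = random.Random(seed)
    shuffled = list(nodes)
    rng.shuffle(shuffled)
    fold_size = len(shuffled) // k
    folds = []
    for i in range(k):
        test_set = set(shuffled[i * fold_size:(i + 1) * fold_size])
        # Training nodes are drawn only from nodes outside this test set.
        train_pool = [n for n in shuffled if n not in test_set]
        n_train = min(len(train_pool), int(prop_labeled * len(shuffled)))
        train_set = set(rng.sample(train_pool, n_train))
        # inferenceSet = network - trainSet, as in the fragment above.
        inference_set = set(shuffled) - train_set
        folds.append((train_set, test_set, inference_set))
    return folds

folds = network_cross_validation(range(20), k=4, prop_labeled=0.3)
for train, test, inference in folds:
    print(len(train), len(test), len(inference))
```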
Chen, Peng; Li, Jinyan
2010-05-17
Prediction of long-range inter-residue contacts is an important topic in bioinformatics research. It is helpful for determining protein structures, understanding protein foldings, and therefore advancing the annotation of protein functions. In this paper, we propose a novel ensemble of genetic algorithm classifiers (GaCs) to address the long-range contact prediction problem. Our method is based on the key idea called sequence profile centers (SPCs). Each SPC is the average sequence profiles of residue pairs belonging to the same contact class or non-contact class. GaCs train on multiple but different pairs of long-range contact data (positive data) and long-range non-contact data (negative data). The negative data sets, having roughly the same sizes as the positive ones, are constructed by random sampling over the original imbalanced negative data. As a result, about 21.5% long-range contacts are correctly predicted. We also found that the ensemble of GaCs indeed makes an accuracy improvement by around 5.6% over the single GaC. Classifiers with the use of sequence profile centers may advance the long-range contact prediction. In line with this approach, key structural features in proteins would be determined with high efficiency and accuracy.
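A sequence profile center, as defined in this abstract, is the average profile of the residue pairs in one class. The sketch below computes such centers and assigns a query to the nearest one. This nearest-center rule is a simplified stand-in for illustration only; it is not the genetic algorithm classifier the authors describe, and the two-dimensional toy profiles are hypothetical.

```python
def profile_center(profiles):
    """Element-wise mean of equally long profile vectors."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_center_label(profile, centers):
    """Label of the class whose center is closest to the profile."""
    return min(centers, key=lambda label: squared_distance(profile, centers[label]))

# Hypothetical toy profiles for contact and non-contact residue pairs.
contact = [[1.0, 3.0], [3.0, 5.0]]
non_contact = [[6.0, 0.0], [8.0, 2.0]]
centers = {"contact": profile_center(contact),
           "non-contact": profile_center(non_contact)}

predicted = nearest_center_label([2.5, 3.0], centers)
print(centers, predicted)
```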
Robust online tracking via adaptive samples selection with saliency detection
NASA Astrophysics Data System (ADS)
Yan, Jia; Chen, Xi; Zhu, QiuPing
2013-12-01
Online tracking has been shown to be successful in tracking previously unknown objects. However, two important factors lead to the drift problem in online tracking: one is how to select correctly labeled samples even when the target locations are inaccurate, and the other is how to handle confusors that have features similar to those of the target. In this article, we propose a robust online tracking algorithm with adaptive sample selection based on saliency detection to overcome the drift problem. To deal with the degradation of the classifiers by misaligned samples, we introduce saliency detection into our tracking problem. Saliency maps and the strong classifiers are combined to extract the most correct positive samples. Our approach employs a simple saliency detection algorithm based on image spectral residual analysis. Furthermore, instead of using random patches as the negative samples, we propose a reasonable selection criterion in which both the saliency confidence and similarity are considered, with the benefit that confusors in the surrounding background are incorporated into the classifier update process before the drift occurs. The tracking task is formulated as a binary classification via an online boosting framework. Experimental results on several challenging video sequences demonstrate the accuracy and stability of our tracker.
Semantic segmentation of mFISH images using convolutional networks.
Pardo, Esteban; Morgado, José Mário T; Malpica, Norberto
2018-04-30
Multicolor in situ hybridization (mFISH) is a karyotyping technique used to detect major chromosomal alterations using fluorescent probes and imaging techniques. Manual interpretation of mFISH images is a time-consuming step that can be automated using machine learning; in previous works, pixel or patch wise classification was employed, overlooking spatial information which can help identify chromosomes. In this work, we propose a fully convolutional semantic segmentation network for the interpretation of mFISH images, which uses both spatial and spectral information to classify each pixel in an end-to-end fashion. The semantic segmentation network developed was tested on samples extracted from a public dataset using cross validation. Despite having no labeling information of the image it was tested on, our algorithm yielded an average correct classification ratio (CCR) of 87.41%. Previously, this level of accuracy was only achieved with state of the art algorithms when classifying pixels from the same image in which the classifier has been trained. These results provide evidence that fully convolutional semantic segmentation networks may be employed in the computer aided diagnosis of genetic diseases with improved performance over the current image analysis methods. © 2018 International Society for Advancement of Cytometry.
NASA Astrophysics Data System (ADS)
Thomaz, Ricardo L.; Carneiro, Pedro C.; Patrocinio, Ana C.
2017-03-01
Breast cancer is the leading cause of death for women in most countries. The high levels of mortality relate mostly to late diagnosis and to the directly proportional relationship between breast density and breast cancer development. Therefore, the correct assessment of breast density is important to provide better screening for higher risk patients. However, in modern digital mammography the discrimination among breast densities is highly complex due to increased contrast and visual information for all densities. Thus, a computational system for classifying breast density might be a useful tool for aiding medical staff. Several machine-learning algorithms are already capable of classifying a small number of classes with good accuracy. However, the main constraint of machine-learning algorithms relates to the set of features extracted and used for classification. Although well-known feature extraction techniques might provide a good set of features, it is a complex task to select an initial set during the design of a classifier. Thus, we propose feature extraction using a Convolutional Neural Network (CNN) for classifying breast density by a usual machine-learning classifier. We used 307 mammographic images downsampled to 260x200 pixels to train a CNN and extract features from a deep layer. After training, the activations of 8 neurons from a deep fully connected layer are extracted and used as features. Then, these features are fed forward to a single hidden layer neural network that is cross-validated using 10 folds to classify among four classes of breast density. The global accuracy of this method is 98.4%, presenting only 1.6% of misclassification. However, the small set of samples and memory constraints required the reuse of data in both CNN and MLP-NN, therefore overfitting might have influenced the results even though we cross-validated the network. 
Thus, although we presented a promising method for extracting features and classifying breast density, a greater database is still required for evaluating the results.
NASA Astrophysics Data System (ADS)
Sun, Yankui; Li, Shan; Sun, Zhongyang
2017-01-01
We propose a framework for automated detection of dry age-related macular degeneration (AMD) and diabetic macular edema (DME) from retina optical coherence tomography (OCT) images, based on sparse coding and dictionary learning. The study aims to improve the classification performance of state-of-the-art methods. First, our method presents a general approach to automatically align and crop retina regions; then it obtains global representations of images by using sparse coding and a spatial pyramid; finally, a multiclass linear support vector machine classifier is employed for classification. We apply two datasets for validating our algorithm: the Duke spectral domain OCT (SD-OCT) dataset, consisting of volumetric scans acquired from 45 subjects (15 normal subjects, 15 AMD patients, and 15 DME patients); and a clinical SD-OCT dataset, consisting of 678 OCT retina scans acquired from clinics in Beijing (168, 297, and 213 OCT images for AMD, DME, and normal retinas, respectively). For the former dataset, our classifier correctly identifies 100%, 100%, and 93.33% of the volumes with DME, AMD, and normal subjects, respectively, and thus performs much better than the conventional method; for the latter dataset, our classifier leads to a correct classification rate of 99.67%, 99.67%, and 100.00% for DME, AMD, and normal images, respectively.
Authenticity assessment of banknotes using portable near infrared spectrometer and chemometrics.
da Silva Oliveira, Vanessa; Honorato, Ricardo Saldanha; Honorato, Fernanda Araújo; Pereira, Claudete Fernandes
2018-05-01
Spectra recorded using a portable near infrared (NIR) spectrometer, Soft Independent Modeling of Class Analogy (SIMCA) and Linear Discriminant Analysis (LDA) associated to Successive Projections Algorithm (SPA) models were applied to identify counterfeit and authentic Brazilian Real (R$20, R$50 and R$100) banknotes, enabling a simple field analysis. NIR spectra (950-1650nm) were recorded from seven different areas of the banknotes (two with fluorescent ink, one over watermark, three with intaglio printing process and one over the serial numbers with typography printing). SIMCA and SPA-LDA models were built using 1st derivative preprocessed spectral data from one of the intaglio areas. For the SIMCA models, all authentic (300) banknotes were correctly classified and the counterfeits (227) were not classified. For the two classes SPA-LDA models (authentic and counterfeit currencies), all the test samples were correctly classified into their respective class. The number of selected variables by SPA varied from two to nineteen for R$20, R$50 and R$100 currencies. These results show that the use of the portable near-infrared with SIMCA or SPA-LDA models can be a completely effective, fast, and non-destructive way to identify authenticity of banknotes as well as permitting field analysis. Copyright © 2018 Elsevier B.V. All rights reserved.
GENIE: a hybrid genetic algorithm for feature classification in multispectral images
NASA Astrophysics Data System (ADS)
Perkins, Simon J.; Theiler, James P.; Brumby, Steven P.; Harvey, Neal R.; Porter, Reid B.; Szymanski, John J.; Bloch, Jeffrey J.
2000-10-01
We consider the problem of pixel-by-pixel classification of a multispectral image using supervised learning. Conventional supervised classification techniques such as maximum likelihood classification, and less conventional ones such as neural networks, typically base such classifications solely on the spectral components of each pixel. It is easy to see why: the color of a pixel provides a nice, bounded, fixed dimensional space in which these classifiers work well. It is often the case, however, that spectral information alone is not sufficient to correctly classify a pixel. Maybe spatial neighborhood information is required as well. Or maybe the raw spectral components do not themselves make for easy classification, but some arithmetic combination of them would. In either of these cases we have the problem of selecting suitable spatial, spectral or spatio-spectral features that allow the classifier to do its job well. The number of all possible such features is extremely large. How can we select a suitable subset? We have developed GENIE, a hybrid learning system that combines a genetic algorithm that searches a space of image processing operations for a set that can produce suitable feature planes, and a more conventional classifier which uses those feature planes to output a final classification. In this paper we show that the use of a hybrid GA provides significant advantages over using either a GA alone or more conventional classification methods alone. We present results using high-resolution IKONOS data, looking for regions of burned forest and for roads.
Chiu, Shih-Hau; Chen, Chien-Chi; Yuan, Gwo-Fang; Lin, Thy-Hou
2006-06-15
The number of sequences compiled in many genome projects is growing exponentially, but most of them have not been characterized experimentally. An automatic annotation scheme is urgently needed to reduce the gap between the amount of new sequences produced and reliable functional annotation. This work proposes rules for automatically classifying fungus genes. The approach involves elucidating the enzyme classifying rule that is hidden in the UniProt protein knowledgebase and then applying it for classification. The association algorithm, Apriori, is utilized to mine the relationship between the enzyme class and significant InterPro entries. The candidate rules are evaluated for their classificatory capacity. Five datasets were collected from Swiss-Prot for establishing the annotation rules. These were treated as the training sets. The TrEMBL entries were treated as the testing set. A correct enzyme classification rate of 70% was obtained for the prokaryote datasets and a similar rate of about 80% was obtained for the eukaryote datasets. The fungus training dataset, which lacks an enzyme class description, was also used to evaluate the fungus candidate rules. A total of 88 out of 5085 test entries were matched with the fungus rule set. These were otherwise poorly annotated using their functional descriptions. The feasibility of using the method presented here to classify enzyme classes based on the enzyme domain rules is evident. The rules may also be employed by protein annotators in manual annotation or implemented in an automatic annotation flowchart.
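Association rule mining of the kind described here ranks candidate rules by support and confidence. The sketch below computes both for a single rule of the form "InterPro entry implies enzyme class". It is not a full Apriori implementation (there is no frequent-itemset search), and the identifiers and records are hypothetical placeholders, not real InterPro or Swiss-Prot data.

```python
def rule_stats(records, interpro_id, enzyme_class):
    """Support and confidence of the rule interpro_id -> enzyme_class.
    records: list of (set_of_interpro_ids, enzyme_class) pairs."""
    with_entry = [r for r in records if interpro_id in r[0]]
    both = [r for r in with_entry if r[1] == enzyme_class]
    support = len(both) / len(records)
    confidence = len(both) / len(with_entry) if with_entry else 0.0
    return support, confidence

# Hypothetical annotated training proteins.
records = [
    ({"IPR_A", "IPR_B"}, "EC 1"),
    ({"IPR_A"}, "EC 1"),
    ({"IPR_A"}, "EC 2"),
    ({"IPR_C"}, "EC 3"),
]

support, confidence = rule_stats(records, "IPR_A", "EC 1")
print(support, round(confidence, 3))
```

A rule would then be kept as a candidate only if both values exceed chosen minimum thresholds.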
Quality grading of Atlantic salmon (Salmo salar) by computer vision.
Misimi, E; Erikson, U; Skavhaug, A
2008-06-01
In this study, we present a promising method of computer vision-based quality grading of whole Atlantic salmon (Salmo salar). Using computer vision, it was possible to differentiate among different quality grades of Atlantic salmon based on the external geometrical information contained in the fish images. Initially, before the image acquisition, the fish were subjectively graded and labeled into grading classes by a qualified human inspector in the processing plant. Prior to classification, the salmon images were segmented into binary images, and then feature extraction was performed on the geometrical parameters of the fish from the grading classes. The classification algorithm was a threshold-based classifier, which was designed using linear discriminant analysis. The performance of the classifier was tested by using the leave-one-out cross-validation method, and the classification results showed a good agreement between the classification done by human inspectors and by the computer vision. The computer vision-based method classified correctly 90% of the salmon from the data set as compared with the classification by human inspector. Overall, it was shown that computer vision can be used as a powerful tool to grade Atlantic salmon into quality grades in a fast and nondestructive manner by a relatively simple classifier algorithm. The low cost of implementation of today's advanced computer vision solutions makes this method feasible for industrial purposes in fish plants as it can replace manual labor, on which grading tasks still rely.
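Leave-one-out cross-validation, the evaluation method used in this study, refits the classifier once per sample with that sample held out. The sketch below applies it to a one-feature threshold classifier whose threshold is the midpoint of the two class means, which is what linear discriminant analysis reduces to in one dimension under equal variances and priors. The feature values and grades are hypothetical, not the salmon data.

```python
def fit_threshold(samples):
    """samples: list of (feature, label) with labels 0/1.
    Returns the midpoint of the class means and whether class 1 lies above it."""
    values0 = [x for x, y in samples if y == 0]
    values1 = [x for x, y in samples if y == 1]
    mean0 = sum(values0) / len(values0)
    mean1 = sum(values1) / len(values1)
    return (mean0 + mean1) / 2, mean1 > mean0

def loocv_accuracy(samples):
    """Fraction of samples classified correctly when each is held out in turn."""
    correct = 0
    for i, (x, y) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]
        threshold, one_is_high = fit_threshold(train)
        predicted = int((x > threshold) == one_is_high)
        correct += predicted == y
    return correct / len(samples)

# Hypothetical geometric feature for two quality grades.
samples = [(1.0, 0), (1.2, 0), (1.4, 0), (1.9, 0),
           (2.0, 1), (2.2, 1), (2.4, 1), (1.5, 1)]
accuracy = loocv_accuracy(samples)
print(accuracy)
```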
Fully convolutional network with cluster for semantic segmentation
NASA Astrophysics Data System (ADS)
Ma, Xiao; Chen, Zhongbi; Zhang, Jianlin
2018-04-01
Image semantic segmentation is an active research topic in computer vision and artificial intelligence. In particular, extensive research on deep neural networks for image recognition has greatly promoted the development of semantic segmentation. This paper puts forward a method that combines a fully convolutional network with the k-means clustering algorithm. The clustering step uses the image's low-level features and initializes the cluster centers from a super-pixel segmentation; within each cluster region, points with low reliability, which are very likely to be misclassified, are corrected using the points with high reliability. This method refines the segmentation of the target contour and improves the accuracy of the image segmentation.
Threshold Assessment of Gear Diagnostic Tools on Flight and Test Rig Data
NASA Technical Reports Server (NTRS)
Dempsey, Paula J.; Mosher, Marianne; Huff, Edward M.
2003-01-01
A method was developed for defining thresholds for vibration-based algorithms that provide the minimum number of false alarms while maintaining sensitivity to gear damage. This analysis focused on two vibration-based gear damage detection algorithms, FM4 and MSA. The method was developed using vibration data collected during surface fatigue tests performed in a spur gearbox rig. The thresholds were defined based on damage progression during tests with damage. The thresholds' false alarm rates were then evaluated on spur gear tests without damage. Next, the same thresholds were applied to flight data from an OH-58 helicopter transmission. Results showed that thresholds defined in test rigs can be used to define thresholds in flight to correctly classify the transmission operation as normal.
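The trade-off described in this abstract, keeping sensitivity to damage while minimizing false alarms, can be illustrated with a simple rule: pick the highest threshold that still flags a required fraction of the damaged-gear readings, then measure how often healthy readings exceed it. This is an illustrative assumption, not the procedure from the report, and all metric values below are hypothetical.

```python
import math

def select_threshold(damaged_values, min_detection_rate=1.0):
    """Highest threshold that still flags at least min_detection_rate
    of the damaged readings (a reading is flagged when value >= threshold)."""
    ordered = sorted(damaged_values, reverse=True)
    needed = math.ceil(min_detection_rate * len(ordered))
    return ordered[needed - 1]

def false_alarm_rate(healthy_values, threshold):
    """Fraction of healthy readings that would be flagged."""
    return sum(v >= threshold for v in healthy_values) / len(healthy_values)

# Hypothetical diagnostic metric values from rig tests.
damaged = [4.1, 5.0, 6.3, 7.2]
healthy = [2.8, 3.0, 3.3, 4.2, 2.9]

strict = select_threshold(damaged)          # flags every damaged reading
relaxed = select_threshold(damaged, 0.75)   # tolerates one missed reading
print(strict, false_alarm_rate(healthy, strict))
print(relaxed, false_alarm_rate(healthy, relaxed))
```

Raising the threshold from the strict to the relaxed setting removes the single false alarm at the cost of missing one damaged reading.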
AUC-Maximizing Ensembles through Metalearning.
LeDell, Erin; van der Laan, Mark J; Petersen, Maya
2016-05-01
Area Under the ROC Curve (AUC) is often used to measure the performance of an estimator in binary classification problems. An AUC-maximizing classifier can have significant advantages in cases where ranking correctness is valued or if the outcome is rare. In a Super Learner ensemble, maximization of the AUC can be achieved by the use of an AUC-maximizing metalearning algorithm. We discuss an implementation of an AUC-maximization technique that is formulated as a nonlinear optimization problem. We also evaluate the effectiveness of a large number of different nonlinear optimization algorithms to maximize the cross-validated AUC of the ensemble fit. The results provide evidence that AUC-maximizing metalearners can, and often do, out-perform non-AUC-maximizing metalearning methods, with respect to ensemble AUC. The results also demonstrate that as the level of imbalance in the training data increases, the Super Learner ensemble outperforms the top base algorithm by a larger degree.
AUC-Maximizing Ensembles through Metalearning
LeDell, Erin; van der Laan, Mark J.; Peterson, Maya
2016-01-01
Area Under the ROC Curve (AUC) is often used to measure the performance of an estimator in binary classification problems. An AUC-maximizing classifier can have significant advantages in cases where ranking correctness is valued or if the outcome is rare. In a Super Learner ensemble, maximization of the AUC can be achieved by the use of an AUC-maximizing metalearning algorithm. We discuss an implementation of an AUC-maximization technique that is formulated as a nonlinear optimization problem. We also evaluate the effectiveness of a large number of different nonlinear optimization algorithms to maximize the cross-validated AUC of the ensemble fit. The results provide evidence that AUC-maximizing metalearners can, and often do, out-perform non-AUC-maximizing metalearning methods, with respect to ensemble AUC. The results also demonstrate that as the level of imbalance in the training data increases, the Super Learner ensemble outperforms the top base algorithm by a larger degree. PMID:27227721
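To make the idea of an AUC-maximizing metalearner concrete, the sketch below computes AUC as the fraction of correctly ranked positive/negative pairs and then searches for the convex weight between two base learners that maximizes it. A plain grid search is used here as a stand-in for the nonlinear optimizers evaluated in the paper, and the predictions, which would be cross-validated in practice, are hypothetical.

```python
def auc(scores, labels):
    """Rank-based AUC: share of (positive, negative) pairs in which the
    positive has the higher score; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def best_convex_weight(pred_a, pred_b, labels, steps=20):
    """Grid search for w in [0, 1] maximizing the AUC of w*a + (1-w)*b."""
    def ensemble(w):
        return [w * a + (1 - w) * b for a, b in zip(pred_a, pred_b)]
    weights = [i / steps for i in range(steps + 1)]
    best_w = max(weights, key=lambda w: auc(ensemble(w), labels))
    return best_w, auc(ensemble(best_w), labels)

# Hypothetical predictions from two base learners.
labels = [1, 0, 1, 0, 0, 1]
pred_a = [0.9, 0.8, 0.4, 0.35, 0.1, 0.6]
pred_b = [0.7, 0.2, 0.3, 0.6, 0.5, 0.9]

weight, ensemble_auc = best_convex_weight(pred_a, pred_b, labels)
print(weight, ensemble_auc, auc(pred_a, labels), auc(pred_b, labels))
```

Because the grid includes the weights 0 and 1, the ensemble AUC on these data can never fall below that of the better base learner.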
Bruno, Rossella; Alì, Greta; Giannini, Riccardo; Proietti, Agnese; Lucchi, Marco; Chella, Antonio; Melfi, Franca; Mussi, Alfredo; Fontanini, Gabriella
2017-01-10
Malignant pleural mesothelioma (MPM) is a rare asbestos related cancer, aggressive and unresponsive to therapies. Histological examination of pleural lesions is the gold standard of MPM diagnosis, although it is sometimes hard to discriminate the epithelioid type of MPM from benign mesothelial hyperplasia (MH). This work aims to define a new molecular tool for the differential diagnosis of MPM, using the expression profile of 117 genes deregulated in this tumour. The gene expression analysis was performed by nanoString System on tumour tissues from 36 epithelioid MPM and 17 MH patients, and on 14 mesothelial pleural samples analysed in a blind way. Data analysis included raw nanoString data normalization, unsupervised cluster analysis by Pearson correlation, non-parametric Mann-Whitney U-test and molecular classification by the Uncorrelated Shrunken Centroid (USC) Algorithm. The Mann-Whitney U-test found 35 genes upregulated and 31 downregulated in MPM. The unsupervised cluster analysis revealed two clusters, one composed only of MPM and one only of MH samples, thus revealing class-specific gene profiles. The Uncorrelated Shrunken Centroid algorithm identified two classifiers, one including 22 genes and the other 40 genes, able to properly classify all the samples as benign or malignant using gene expression data; both classifiers were also able to correctly determine, in a blind analysis, the diagnostic categories of all the 14 unknown samples. In conclusion, we delineated a diagnostic tool combining molecular data (gene expression) and computational analysis (USC algorithm), which can be applied in clinical practice for the differential diagnosis of MPM.
Pediatric Medical Complexity Algorithm: A New Method to Stratify Children by Medical Complexity
Cawthon, Mary Lawrence; Stanford, Susan; Popalisky, Jean; Lyons, Dorothy; Woodcox, Peter; Hood, Margaret; Chen, Alex Y.; Mangione-Smith, Rita
2014-01-01
OBJECTIVES: The goal of this study was to develop an algorithm based on International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM), codes for classifying children with chronic disease (CD) according to level of medical complexity and to assess the algorithm’s sensitivity and specificity. METHODS: A retrospective observational study was conducted among 700 children insured by Washington State Medicaid with ≥1 Seattle Children’s Hospital emergency department and/or inpatient encounter in 2010. The gold standard population included 350 children with complex chronic disease (C-CD), 100 with noncomplex chronic disease (NC-CD), and 250 without CD. An existing ICD-9-CM–based algorithm called the Chronic Disability Payment System was modified to develop a new algorithm called the Pediatric Medical Complexity Algorithm (PMCA). The sensitivity and specificity of PMCA were assessed. RESULTS: Using hospital discharge data, PMCA’s sensitivity for correctly classifying children was 84% for C-CD, 41% for NC-CD, and 96% for those without CD. Using Medicaid claims data, PMCA’s sensitivity was 89% for C-CD, 45% for NC-CD, and 80% for those without CD. Specificity was 90% to 92% in hospital discharge data and 85% to 91% in Medicaid claims data for all 3 groups. CONCLUSIONS: PMCA identified children with C-CD (who have accessed tertiary hospital care) with good sensitivity and good to excellent specificity when applied to hospital discharge or Medicaid claims data. PMCA may be useful for targeting resources such as care coordination to children with C-CD. PMID:24819580
Facial Expression Recognition with Fusion Features Extracted from Salient Facial Areas.
Liu, Yanpeng; Li, Yibin; Ma, Xin; Song, Rui
2017-03-29
In the pattern recognition domain, deep architectures are currently widely used and they have achieved fine results. However, these deep architectures make particular demands, especially in terms of their requirement for big datasets and GPU. Aiming to gain better results without deep networks, we propose a simplified algorithm framework using fusion features extracted from the salient areas of faces. Furthermore, the proposed algorithm has achieved a better result than some deep architectures. To extract more effective features, this paper first defines the salient areas on the faces. This paper normalizes the salient areas of the same location in the faces to the same size; therefore, it can extract more similar features from different subjects. LBP and HOG features are extracted from the salient areas, the dimensions of the fusion features are reduced by Principal Component Analysis (PCA), and we apply several classifiers to classify the six basic expressions at once. This paper proposes a method for defining the salient areas that compares peak expression frames with neutral faces. This paper also proposes and applies the idea of normalizing the salient areas to align the specific areas which express the different expressions. As a result, the salient areas found from different subjects are the same size. In addition, gamma correction is first applied to the LBP features in our algorithm framework, which improves our recognition rates significantly. By applying this algorithm framework, our research has gained state-of-the-art performance on the CK+ database and the JAFFE database.
Multiple directed graph large-class multi-spectral processor
NASA Technical Reports Server (NTRS)
Casasent, David; Liu, Shiaw-Dong; Yoneyama, Hideyuki
1988-01-01
Numerical analysis techniques for the interpretation of high-resolution imaging-spectrometer data are described and demonstrated. The method proposed involves the use of (1) a hierarchical classifier with a tree structure generated automatically by a Fisher linear-discriminant-function algorithm and (2) a novel multiple-directed-graph scheme which reduces the local maxima and the number of perturbations required. Results for a 500-class test problem involving simulated imaging-spectrometer data are presented in tables and graphs; 100-percent-correct classification is achieved with an improvement factor of 5.
QRS detection based ECG quality assessment.
Hayn, Dieter; Jammerbund, Bernhard; Schreier, Günter
2012-09-01
Although immediate feedback concerning ECG signal quality during recording is useful, little literature describing quality measures has been available so far. We have implemented and evaluated four ECG quality measures. Empty lead criterion (A), spike detection criterion (B) and lead crossing point criterion (C) were calculated from basic signal properties. Measure D quantified the robustness of QRS detection when applied to the signal. An advanced Matlab-based algorithm combining all four measures and a simplified algorithm for Android platforms, excluding measure D, were developed. Both algorithms were evaluated by taking part in the Computing in Cardiology Challenge 2011. Each measure's accuracy and computing time were evaluated separately. During the challenge, the advanced algorithm correctly classified 93.3% of the ECGs in the training-set and 91.6% in the test-set. Scores for the simplified algorithm were 0.834 in event 2 and 0.873 in event 3. Computing time for measure D was almost five times higher than for other measures. Required accuracy levels depend on the application and are related to computing time. While our simplified algorithm may be accurate for real-time feedback during ECG self-recordings, QRS detection based measures can further increase the performance if sufficient computing power is available.
2013-01-01
Amplification of the human epidermal growth factor receptor 2 (HER2) is a prognostic marker for poor clinical outcome and a predictive marker for therapeutic response to targeted therapies in breast cancer patients. With the introduction of anti-HER2 therapies, accurate assessment of HER2 status has become essential. Fluorescence in situ hybridization (FISH) is a widely used technique for the determination of HER2 status in breast cancer. However, manual signal enumeration is time-consuming. Therefore, several companies, such as MetaSystems, have developed automated image analysis software. Some of these signal enumeration software packages employ the so-called “tile-sampling classifier”, a programming algorithm through which the software quantifies fluorescent signals in images on the basis of square tiles of fixed dimensions. Considering that the size of a tile does not always correspond to the size of a single tumor cell nucleus, some users argue that this analysis method might not completely reflect the biology of cells. For that reason, MetaSystems has developed a new classifier which is able to recognize nuclei within tissue sections in order to determine the HER2 amplification status on a per-nucleus basis. We call this new programming algorithm the “nuclei-sampling classifier”. In this study, we evaluated the accuracy of the “nuclei-sampling classifier” in determining HER2 gene amplification by FISH in nuclei of breast cancer cells. To this end, we randomly selected from our cohort 64 breast cancer specimens (32 nonamplified and 32 amplified) and we compared results obtained through manual scoring and through this new classifier. The new classifier automatically recognized individual nuclei. The automated analysis was followed by an optional human correction, during which the user interacted with the software in order to refine the automatic selection of cell nuclei.
Overall concordance between manual scoring and automated nuclei-sampling analysis was 98.4% (100% for nonamplified cases and 96.9% for amplified cases). However, after human correction, concordance between the two methods was 100%. We conclude that the nuclei-based classifier is a newly available tool for automated quantitative analysis of HER2 FISH signals in nuclei in breast cancer specimens and that it can be used for clinical purposes. PMID:23379971
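As a quick consistency check on the concordance figures, the short sketch below recomputes the percentages. The agreement counts (31 of 32 amplified cases, 63 of 64 overall) are inferred here from the reported percentages and the stated cohort size; they are an assumption, not numbers given in the abstract.

```python
def concordance(agree, total):
    """Percentage of cases on which two scoring methods agree."""
    return 100.0 * agree / total


# Agreement counts inferred from the reported percentages (assumption).
nonamplified = concordance(32, 32)
amplified = concordance(31, 32)
overall = concordance(32 + 31, 64)

print(round(nonamplified, 1), round(amplified, 1), round(overall, 1))
```

Under that assumption, a single discordant amplified specimen before human correction reproduces all three reported values.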
Chiu, Shih-Hau; Chen, Chien-Chi; Yuan, Gwo-Fang; Lin, Thy-Hou
2006-01-01
Background The number of sequences compiled in many genome projects is growing exponentially, but most of them have not been characterized experimentally. An automatic annotation scheme is urgently needed to reduce the gap between the amount of new sequences produced and reliable functional annotation. This work proposes rules for automatically classifying fungus genes. The approach involves elucidating the enzyme classifying rule that is hidden in the UniProt protein knowledgebase and then applying it for classification. The association algorithm, Apriori, is utilized to mine the relationship between the enzyme class and significant InterPro entries. The candidate rules are evaluated for their classificatory capacity. Results Five datasets were collected from Swiss-Prot for establishing the annotation rules. These were treated as the training sets. The TrEMBL entries were treated as the testing set. A correct enzyme classification rate of 70% was obtained for the prokaryote datasets and a rate of about 80% was obtained for the eukaryote datasets. The fungus training dataset, which lacks an enzyme class description, was also used to evaluate the fungus candidate rules. A total of 88 out of 5085 test entries were matched with the fungus rule set. These were otherwise poorly annotated using their functional descriptions. Conclusion The feasibility of using the method presented here to classify enzyme classes based on the enzyme domain rules is evident. The rules may also be employed by protein annotators in manual annotation or implemented in an automatic annotation flowchart. PMID:16776838
NASA Astrophysics Data System (ADS)
Janaki Sathya, D.; Geetha, K.
2017-12-01
Automatic mass or lesion classification systems are developed to aid in distinguishing between malignant and benign lesions present in breast DCE-MR images. To be successful in clinical use, such systems need to improve both the sensitivity and specificity of DCE-MR image interpretation. A new classifier (a set of features together with a classification method) based on artificial neural networks trained using the artificial fish swarm optimization (AFSO) algorithm is proposed in this paper. The basic idea behind the proposed classifier is to use the AFSO algorithm to search for the best combination of synaptic weights for the neural network. An optimal set of features based on statistical textural features is presented. The experimental outcomes of the proposed suspicious lesion classifier algorithm confirm that the resulting classifier performs better than other such classifiers reported in the literature. This classifier therefore demonstrates that improvements in both sensitivity and specificity are possible through automated image analysis.
Geometry correction Algorithm for UAV Remote Sensing Image Based on Improved Neural Network
NASA Astrophysics Data System (ADS)
Liu, Ruian; Liu, Nan; Zeng, Beibei; Chen, Tingting; Yin, Ninghao
2018-03-01
To address the shortcomings of current geometry correction algorithms for UAV remote sensing images, a new algorithm is proposed. An adaptive genetic algorithm (AGA) and an RBF neural network are introduced into this algorithm. Combined with the geometry correction principle for UAV remote sensing images, the algorithm and solving steps of AGA-RBF are presented in order to realize geometry correction for UAV remote sensing. The correction accuracy and operational efficiency are improved by optimizing the structure and the connection weights of the RBF neural network separately with the AGA and LMS algorithms. Finally, experiments show that the AGA-RBF algorithm has the advantages of high correction accuracy, high running speed and strong generalization ability.
SU-D-BRB-01: A Predictive Planning Tool for Stereotactic Radiosurgery
DOE Office of Scientific and Technical Information (OSTI.GOV)
Palefsky, S; Roper, J; Elder, E
Purpose: To demonstrate the feasibility of a predictive planning tool which provides SRS planning guidance based on simple patient anatomical properties: PTV size, PTV shape and distance from critical structures. Methods: Ten framed SRS cases treated at Winship Cancer Institute of Emory University were analyzed to extract data on PTV size, sphericity (shape), and distance from critical structures such as the brainstem and optic chiasm. The cases consisted of five pairs. Each pair consisted of two cases with a similar diagnosis (such as pituitary adenoma or arteriovenous malformation) that were treated with different techniques: DCA or IMRS. A Naive Bayes Classifier was trained on this data to establish the conditions under which each treatment modality was used. This model was validated by classifying ten other randomly-selected cases into DCA or IMRS classes, calculating the probability of each technique, and comparing results to the treated technique. Results: Of the ten cases used to validate the model, nine had their technique predicted correctly. The three cases treated with IMRS were all identified as such. Their probabilities of being treated with IMRS ranged between 59% and 100%. Six of the seven cases treated with DCA were correctly classified. These probabilities ranged between 51% and 95%. One case treated with DCA was incorrectly predicted to be an IMRS plan. The model’s confidence in this case was 91%. Conclusion: These findings indicate that a predictive planning tool based on simple patient anatomical properties can predict the SRS technique used for treatment. The algorithm operated with 90% accuracy. With further validation on larger patient populations, this tool may be used clinically to guide planners in choosing an appropriate treatment technique. The prediction algorithm could also be adapted to guide selection of treatment parameters such as treatment modality and number of fields for radiotherapy across anatomical sites.
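To illustrate how a Naive Bayes classifier turns a few anatomical features into a per-technique probability, here is a minimal Gaussian Naive Bayes sketch. The abstract does not say which Naive Bayes variant was used, so the Gaussian likelihood is an assumption, and the feature values (PTV volume, sphericity, distance to a critical structure) are invented for illustration only.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate the prior and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small variance floor avoids division by zero on tiny samples.
        variances = [
            max(sum((v - m) ** 2 for v in col) / n, 1e-6)
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def predict_proba(model, features):
    """Posterior probability of each class, assuming independent features."""
    log_scores = {}
    for label, (prior, means, variances) in model.items():
        log_p = math.log(prior)
        for x, m, var in zip(features, means, variances):
            log_p += -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
        log_scores[label] = log_p
    top = max(log_scores.values())          # subtract the max for stability
    weights = {k: math.exp(v - top) for k, v in log_scores.items()}
    norm = sum(weights.values())
    return {k: w / norm for k, w in weights.items()}


# Invented training cases: [PTV volume (cc), sphericity, distance (mm)].
X = [[1.0, 0.95, 12.0], [1.5, 0.90, 10.0], [0.8, 0.92, 15.0],
     [4.0, 0.60, 2.0], [5.0, 0.55, 3.0], [3.5, 0.65, 1.5]]
y = ["DCA", "DCA", "DCA", "IMRS", "IMRS", "IMRS"]

model = fit_gaussian_nb(X, y)
probs = predict_proba(model, [4.2, 0.58, 2.5])
print(probs)
```

With so few training cases the fitted variances are very narrow, so the posterior probabilities tend toward 0 or 1; this is one reason the abstract's call for validation on larger populations matters.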
NASA Astrophysics Data System (ADS)
Knapmeyer-Endrun, B.; Hammer, C.
2014-12-01
The seismometers that the Apollo astronauts deployed on the Moon provide the only recordings of seismic events from any extra-terrestrial body so far. These lunar events are significantly different from ones recorded on Earth, in terms of both signal shape and source processes. Thus they are a valuable test case for any experiment in planetary seismology. In this study, we analyze Apollo 16 data with a single-station event detection and classification algorithm in view of NASA's upcoming InSight mission to Mars. InSight, scheduled for launch in early 2016, has the goal to investigate Mars' internal structure by deploying a seismometer on its surface. As the mission does not feature any orbiter, continuous data will be relayed to Earth at a reduced rate. Full range data will only be available by requesting specific time-windows within a few days after the receipt of the original transmission. We apply a recently introduced algorithm based on hidden Markov models that requires only a single example waveform of each event class for training appropriate models. After constructing the prototypes we detect and classify impacts and deep and shallow moonquakes. Initial results for 1972 (year of station installation with 8 months of data) indicate a high detection rate of over 95% for impacts, of which more than 80% are classified correctly. Deep moonquakes, which occur in large amounts, but often show only very weak signals, are detected with less certainty (~70%). As there is only one weak shallow moonquake covered, results for this event class are not statistically significant. Daily adjustments of the background noise model help to reduce false alarms, which are mainly erroneous deep moonquake detections, by about 25%. 
The algorithm enables us to classify events that were previously listed in the catalog without classification, and, through the combined use of long period and short period data, identify some unlisted local impacts as well as at least two yet unreported deep moonquakes.
Classification of physical activities based on body-segments coordination.
Fradet, Laetitia; Marin, Frederic
2016-09-01
Numerous innovations based on connected objects and physical activity (PA) monitoring have been proposed. However, recognition of PAs requires robust algorithms and methodology. The current study presents an innovative approach for PA recognition. It is based on the heuristic definition of postures and the use of body-segments coordination obtained through external sensors. The first part of this study presents the methodology required to define the set of accelerations which is the most appropriate to represent the particular body-segments coordination involved in the chosen PAs (here walking, running, and cycling). For that purpose, subjects of different ages and heterogeneous physical conditions walked, ran, cycled, and performed daily activities at different paces. From the 3D motion capture, vertical and horizontal accelerations of 8 anatomical landmarks representative of the body were computed. Then, the 680 combinations of up to 3 accelerations were compared to identify the most appropriate set of accelerations to discriminate the PAs in terms of body-segment coordination. The discrimination was based on the maximal Hausdorff distance obtained between the different sets of accelerations. The vertical accelerations of both knees demonstrated the best PA discrimination. The second step was the proof of concept, implementing the proposed algorithm to classify PAs of a new group of subjects. The originality of the proposed algorithm is the possibility to use the subject's specific measures as reference data. With the proposed algorithm, 94% of the trials were correctly classified. In conclusion, our study proposed a flexible and extendable methodology. At the current stage, the algorithm has been shown to be valid for heterogeneous subjects, which suggests that it could be deployed in clinical or health-related applications regardless of the subjects' physical abilities or characteristics. Copyright © 2016 Elsevier Ltd. All rights reserved.
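The Hausdorff distance used for discrimination measures how far two point sets are from each other in the worst case. The sketch below shows the standard symmetric definition on small 2D point sets; how the study mapped acceleration signals to point sets is not described in the abstract, so the example points are purely illustrative.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from a point in `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two finite point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Illustrative points, e.g. pairs of (acceleration 1, acceleration 2) samples.
set_a = [(0.0, 0.0), (1.0, 0.0)]
set_b = [(0.0, 1.0), (1.0, 1.0), (3.0, 0.0)]

print(hausdorff(set_a, set_b))
```

A larger distance between the point clouds of two activities means they are easier to tell apart, which is presumably why the maximal distance was used as the selection criterion.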
On-Board Cryospheric Change Detection By The Autonomous Sciencecraft Experiment
NASA Astrophysics Data System (ADS)
Doggett, T.; Greeley, R.; Castano, R.; Cichy, B.; Chien, S.; Davies, A.; Baker, V.; Dohm, J.; Ip, F.
2004-12-01
The Autonomous Sciencecraft Experiment (ASE) is operating on-board Earth Observing - 1 (EO-1) with the Hyperion hyper-spectral visible/near-IR spectrometer. ASE science activities include autonomous monitoring of cryospheric changes, triggering the collection of additional data when change is detected and filtering of null data such as no change or cloud cover. This would have application to the study of cryospheres on Earth, Mars and the icy moons of the outer solar system. A cryosphere classification algorithm, in combination with a previously developed cloud algorithm [1], has been tested on-board ten times from March through August 2004. The cloud algorithm correctly screened out three scenes with total cloud cover, while the cryosphere algorithm detected alpine snow cover in the Rocky Mountains, lake thaw near Madison, Wisconsin, and the presence and subsequent break-up of sea ice in the Barrow Strait of the Canadian Arctic. Hyperion has 220 bands ranging from 400 to 2400 nm, with a spatial resolution of 30 m/pixel and a spectral resolution of 10 nm. Limited on-board memory and processing speed imposed the constraint that only partially processed Level 0.5 data, with dark image subtraction and gain factors applied but not full radiometric calibration, could be used. In addition, a maximum of 12 bands could be used for any stacked sequence of algorithms run for a scene on-board. The cryosphere algorithm was developed to classify snow, water, ice and land, using six Hyperion bands at 427, 559, 661, 864, 1245 and 1649 nm. Of these, only the 427 nm band overlaps with the cloud algorithm. The cloud algorithm was developed with Level 1 data, which introduces complications because of the incomplete calibration of SWIR in Level 0.5 data, including a high level of noise in the 1377 nm band used by the cloud algorithm.
Development of a more robust cryosphere classifier, including cloud classification specifically adapted to Level 0.5, is in progress for deployment on EO-1 as part of continued ASE operations. [1] Griffin, M.K. et al., Cloud Cover Detection Algorithm For EO-1 Hyperion Imagery, SPIE 17, 2003.
Classification of change detection and change blindness from near-infrared spectroscopy signals
NASA Astrophysics Data System (ADS)
Tanaka, Hirokazu; Katura, Takusige
2011-08-01
Using a machine-learning classification algorithm applied to near-infrared spectroscopy (NIRS) signals, we classify a success (change detection) or a failure (change blindness) in detecting visual changes for a change-detection task. Five subjects perform a change-detection task, and their brain activities are continuously monitored. A support-vector-machine algorithm is applied to classify the change-detection and change-blindness trials, and correct classification probability of 70-90% is obtained for four subjects. Two types of temporal shapes in classification probabilities are found: one exhibiting a maximum value after the task is completed (postdictive type), and another exhibiting a maximum value during the task (predictive type). As for the postdictive type, the classification probability begins to increase immediately after the task completion and reaches its maximum in about the time scale of neuronal hemodynamic response, reflecting a subjective report of change detection. As for the predictive type, the classification probability shows an increase at the task initiation and is maximal while subjects are performing the task, predicting the task performance in detecting a change. We conclude that decoding change detection and change blindness from NIRS signal is possible and argue some future applications toward brain-machine interfaces.
Comparison of seven protocols to identify fecal contamination sources using Escherichia coli
Stoeckel, D.M.; Mathes, M.V.; Hyer, K.E.; Hagedorn, C.; Kator, H.; Lukasik, J.; O'Brien, T. L.; Fenger, T.W.; Samadpour, M.; Strickler, K.M.; Wiggins, B.A.
2004-01-01
Microbial source tracking (MST) uses various approaches to classify fecal-indicator microorganisms to source hosts. Reproducibility, accuracy, and robustness of seven phenotypic and genotypic MST protocols were evaluated by use of Escherichia coli from an eight-host library of known-source isolates and a separate, blinded challenge library. In reproducibility tests, measuring each protocol's ability to reclassify blinded replicates, only one (pulsed-field gel electrophoresis; PFGE) correctly classified all test replicates to host species; three protocols classified 48-62% correctly, and the remaining three classified fewer than 25% correctly. In accuracy tests, measuring each protocol's ability to correctly classify new isolates, ribotyping with EcoRI and PvuII approached 100% correct classification but only 6% of isolates were classified; four of the other six protocols (antibiotic resistance analysis, PFGE, and two repetitive-element PCR protocols) achieved better than random accuracy rates when 30-100% of challenge isolates were classified. In robustness tests, measuring each protocol's ability to recognize isolates from nonlibrary hosts, three protocols correctly classified 33-100% of isolates as "unknown origin," whereas four protocols classified all isolates to a source category. A relevance test, summarizing interpretations for a hypothetical water sample containing 30 challenge isolates, indicated that false-positive classifications would hinder interpretations for most protocols. Study results indicate that more representation in known-source libraries and better classification accuracy would be needed before field application. Thorough reliability assessment of classification results is crucial before and during application of MST protocols.
Novel probabilistic neuroclassifier
NASA Astrophysics Data System (ADS)
Hong, Jiang; Serpen, Gursel
2003-09-01
A novel probabilistic potential function neural network classifier algorithm to deal with classes which are multi-modally distributed and formed from sets of disjoint pattern clusters is proposed in this paper. The proposed classifier has a number of desirable properties which distinguish it from other neural network classifiers. A complete description of the algorithm in terms of its architecture and the pseudocode is presented. Simulation analysis of the newly proposed neuro-classifier algorithm on a set of benchmark problems is presented. Benchmark problems tested include IRIS, Sonar, Vowel Recognition, Two-Spiral, Wisconsin Breast Cancer, Cleveland Heart Disease and Thyroid Gland Disease. Simulation results indicate that the proposed neuro-classifier performs consistently better for a subset of problems for which other neural classifiers perform relatively poorly.
Massé, Fabien; Gonzenbach, Roman R; Arami, Arash; Paraschiv-Ionescu, Anisoara; Luft, Andreas R; Aminian, Kamiar
2015-08-25
Stroke survivors often suffer from mobility deficits. Current clinical evaluation methods, including questionnaires and motor function tests, cannot provide an objective measure of the patients' mobility in daily life. Physical activity performance in daily-life can be assessed using unobtrusive monitoring, for example with a single sensor module fixed on the trunk. Existing approaches based on inertial sensors have limited performance, particularly in detecting transitions between different activities and postures, due to the inherent inter-patient variability of kinematic patterns. To overcome these limitations, one possibility is to use additional information from a barometric pressure (BP) sensor. Our study aims at integrating BP and inertial sensor data into an activity classifier in order to improve the activity (sitting, standing, walking, lying) recognition and the corresponding body elevation (during climbing stairs or when taking an elevator). Taking into account the trunk elevation changes during postural transitions (sit-to-stand, stand-to-sit), we devised an event-driven activity classifier based on fuzzy-logic. Data were acquired from 12 stroke patients with impaired mobility, using a trunk-worn inertial and BP sensor. Events, including walking and lying periods and potential postural transitions, were first extracted. These events were then fed into a double-stage hierarchical Fuzzy Inference System (H-FIS). The first stage processed the events to infer activities and the second stage improved activity recognition by applying behavioral constraints. Finally, the body elevation was estimated using a pattern-enhancing algorithm applied on BP. The patients were videotaped for reference. The performance of the algorithm was estimated using the Correct Classification Rate (CCR) and F-score. The BP-based classification approach was benchmarked against a previously-published fuzzy-logic classifier (FIS-IMU) and a conventional epoch-based classifier (EPOCH). 
The algorithm performance for posture/activity detection, in terms of CCR, was 90.4%, with 3.3% and 5.6% improvements against FIS-IMU and EPOCH, respectively. The proposed classifier essentially benefits from a better recognition of standing activity (70.3% versus 61.5% [FIS-IMU] and 42.5% [EPOCH]) with 98.2% CCR for body elevation estimation. The monitoring and recognition of daily activities in mobility-impaired stroke patients can be significantly improved using a trunk-fixed sensor that integrates BP, inertial sensors, and an event-based activity classifier.
Freedson, Patty S; Lyden, Kate; Kozey-Keadle, Sarah; Staudenmayer, John
2011-12-01
Previous work from our laboratory provided a "proof of concept" for use of artificial neural networks (nnets) to estimate metabolic equivalents (METs) and identify activity type from accelerometer data (Staudenmayer J, Pober D, Crouter S, Bassett D, Freedson P, J Appl Physiol 107: 1330-1307, 2009). The purpose of this study was to develop new nnets based on a larger, more diverse, training data set and apply these nnet prediction models to an independent sample to evaluate the robustness and flexibility of this machine-learning modeling technique. The nnet training data set (University of Massachusetts) included 277 participants who each completed 11 activities. The independent validation sample (n = 65) (University of Tennessee) completed one of three activity routines. Criterion measures were 1) measured METs assessed using open-circuit indirect calorimetry; and 2) observed activity to identify activity type. The nnet input variables included five accelerometer count distribution features and the lag-1 autocorrelation. The bias and root mean square errors for the nnet MET trained on University of Massachusetts and applied to University of Tennessee were +0.32 and 1.90 METs, respectively. Seventy-seven percent of the activities were correctly classified as sedentary/light, moderate, or vigorous intensity. For activity type, household and locomotion activities were correctly classified by the nnet activity type 98.1 and 89.5% of the time, respectively, and sport was correctly classified 23.7% of the time. This machine-learning technique performs reasonably well when applied to an independent sample. We propose the creation of an open-access activity dictionary, including accelerometer data from a broad array of activities, leading to further improvements in prediction accuracy for METs, activity intensity, and activity type.
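Two quantities named in this abstract are simple to state in code: the lag-1 autocorrelation used as an input feature, and the bias and root mean square error used to score MET predictions. The sketch below uses common textbook definitions; the exact estimator and the sign convention for bias (predicted minus measured) are assumptions, and the numbers are made up.

```python
import math


def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation of a sequence (common estimator)."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1))
    return num / denom


def bias_and_rmse(predicted, measured):
    """Mean error (predicted - measured) and root mean square error."""
    errors = [p - m for p, m in zip(predicted, measured)]
    bias = sum(errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return bias, rmse


# Made-up accelerometer counts and MET values for illustration only.
counts = [1, 2, 3, 4, 5]
acf1 = lag1_autocorrelation(counts)

bias, rmse = bias_and_rmse([2.0, 4.0, 6.0], [1.0, 4.0, 8.0])
print(acf1, bias, rmse)
```

A small bias alongside a much larger RMSE, as reported in the abstract, indicates that errors largely cancel on average while individual predictions can still be well off.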
Mehra, Lucky K; Cowger, Christina; Gross, Kevin; Ojiambo, Peter S
2016-01-01
Pre-planting factors have been associated with the late-season severity of Stagonospora nodorum blotch (SNB), caused by the fungal pathogen Parastagonospora nodorum, in winter wheat (Triticum aestivum). The relative importance of these factors in the risk of SNB has not been determined and this knowledge can facilitate disease management decisions prior to planting of the wheat crop. In this study, we examined the performance of multiple regression (MR) and three machine learning algorithms namely artificial neural networks, categorical and regression trees, and random forests (RF), in predicting the pre-planting risk of SNB in wheat. Pre-planting factors tested as potential predictor variables were cultivar resistance, latitude, longitude, previous crop, seeding rate, seed treatment, tillage type, and wheat residue. Disease severity assessed at the end of the growing season was used as the response variable. The models were developed using 431 disease cases (unique combinations of predictors) collected from 2012 to 2014 and these cases were randomly divided into training, validation, and test datasets. Models were evaluated based on the regression of observed against predicted severity values of SNB, sensitivity-specificity ROC analysis, and the Kappa statistic. A strong relationship was observed between late-season severity of SNB and specific pre-planting factors in which latitude, longitude, wheat residue, and cultivar resistance were the most important predictors. The MR model explained 33% of variability in the data, while machine learning models explained 47 to 79% of the total variability. Similarly, the MR model correctly classified 74% of the disease cases, while machine learning models correctly classified 81 to 83% of these cases. Results show that the RF algorithm, which explained 79% of the variability within the data, was the most accurate in predicting the risk of SNB, with an accuracy rate of 93%. 
The RF algorithm could allow early assessment of the risk of SNB, facilitating sound disease management decisions prior to planting of wheat.
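The Kappa statistic mentioned among the evaluation criteria corrects raw agreement for the agreement expected by chance. A minimal sketch of Cohen's kappa follows; the risk labels and the ten cases are hypothetical and unrelated to the study's 431 disease cases.

```python
def cohens_kappa(observed, predicted):
    """Cohen's kappa: chance-corrected agreement between two labelings."""
    n = len(observed)
    labels = set(observed) | set(predicted)
    # Observed agreement: fraction of cases where the labels match.
    p_o = sum(o == p for o, p in zip(observed, predicted)) / n
    # Expected agreement if the two labelings were independent.
    p_e = sum(
        (observed.count(label) / n) * (predicted.count(label) / n)
        for label in labels
    )
    return (p_o - p_e) / (1 - p_e)


# Hypothetical risk classes for ten cases (illustration only).
observed = ["high"] * 5 + ["low"] * 5
predicted = ["high", "high", "high", "high", "low",
             "low", "low", "low", "low", "high"]

kappa = cohens_kappa(observed, predicted)
print(kappa)
```

Here raw agreement is 80%, but with balanced classes half of that would be expected by chance, which is why kappa is lower than the accuracy figure.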
AdaBoost-based algorithm for network intrusion detection.
Hu, Weiming; Hu, Wei; Maybank, Steve
2008-04-01
Network intrusion detection aims at distinguishing the attacks on the Internet from normal use of the Internet. It is an indispensable part of the information security system. Due to the variety of network behaviors and the rapid development of attack fashions, it is necessary to develop fast machine-learning-based intrusion detection algorithms with high detection rates and low false-alarm rates. In this correspondence, we propose an intrusion detection algorithm based on the AdaBoost algorithm. In the algorithm, decision stumps are used as weak classifiers. The decision rules are provided for both categorical and continuous features. By combining the weak classifiers for continuous features and the weak classifiers for categorical features into a strong classifier, the relations between these two different types of features are handled naturally, without any forced conversions between continuous and categorical features. Adaptable initial weights and a simple strategy for avoiding overfitting are adopted to improve the performance of the algorithm. Experimental results show that our algorithm has low computational complexity and error rates, as compared with algorithms of higher computational complexity, as tested on the benchmark sample data.
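The core idea of boosting decision stumps can be sketched in a few lines. The version below is plain discrete AdaBoost with threshold stumps on continuous features only; it does not include the categorical-feature stumps, adaptable initial weights, or overfitting strategy described in the abstract, and the toy data are invented.

```python
import math


def best_stump(X, y, w):
    """Return (error, feature, threshold, sign) with lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                preds = [sign if row[j] <= thr else -sign for row in X]
                err = sum(wi for wi, p, t in zip(w, preds, y) if p != t)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best


def stump_predict(stump, row):
    _, j, thr, sign = stump
    return sign if row[j] <= thr else -sign


def adaboost(X, y, rounds):
    """Train an ensemble of weighted stumps; labels must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n                       # uniform initial weights
    ensemble = []
    for _ in range(rounds):
        stump = best_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)   # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified samples, then renormalize.
        w = [wi * math.exp(-alpha * t * stump_predict(stump, row))
             for wi, row, t in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, row):
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score >= 0 else -1


# Toy 1-D problem that no single stump can solve (invented data).
X = [[1], [2], [3], [4], [5], [6]]
y = [1, 1, -1, -1, 1, 1]
ensemble = adaboost(X, y, rounds=3)
print([predict(ensemble, row) for row in X])
```

No single threshold separates the middle interval from both ends, but the weighted vote of three stumps does, which is the sense in which weak classifiers are combined into a strong one.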
Automatic identification of inertial sensor placement on human body segments during walking
2013-01-01
Background Current inertial motion capture systems are rarely used in biomedical applications. The attachment and connection of the sensors with cables is often a complex and time consuming task. Moreover, it is prone to errors, because each sensor has to be attached to a predefined body segment. By using wireless inertial sensors and automatic identification of their positions on the human body, the complexity of the set-up can be reduced and incorrect attachments are avoided. We present a novel method for the automatic identification of inertial sensors on human body segments during walking. This method allows the user to place (wireless) inertial sensors on arbitrary body segments. Next, the user walks for just a few seconds and the segment to which each sensor is attached is identified automatically. Methods Walking data was recorded from ten healthy subjects using an Xsens MVN Biomech system with full-body configuration (17 inertial sensors). Subjects were asked to walk for about 6 seconds at normal walking speed (about 5 km/h). After rotating the sensor data to a global coordinate frame with x-axis in walking direction, y-axis pointing left and z-axis vertical, RMS, mean, and correlation coefficient features were extracted from x-, y- and z-components and magnitudes of the accelerations, angular velocities and angular accelerations. As a classifier, a decision tree based on the C4.5 algorithm was developed using Weka (Waikato Environment for Knowledge Analysis). Results and conclusions After testing the algorithm with 10-fold cross-validation using 31 walking trials (involving 527 sensors), 514 sensors were correctly classified (97.5%). When a decision tree for a lower body plus trunk configuration (8 inertial sensors) was trained and tested using 10-fold cross-validation, 100% of the sensors were correctly identified. 
This decision tree was also tested on walking trials of 7 patients (17 walking trials) after anterior cruciate ligament reconstruction, which also resulted in 100% correct identification, thus illustrating the robustness of the method. PMID:23517757
Automatic identification of inertial sensor placement on human body segments during walking.
Weenk, Dirk; van Beijnum, Bert-Jan F; Baten, Chris T M; Hermens, Hermie J; Veltink, Peter H
2013-03-21
Current inertial motion capture systems are rarely used in biomedical applications. The attachment and connection of the sensors with cables is often a complex and time consuming task. Moreover, it is prone to errors, because each sensor has to be attached to a predefined body segment. By using wireless inertial sensors and automatic identification of their positions on the human body, the complexity of the set-up can be reduced and incorrect attachments are avoided. We present a novel method for the automatic identification of inertial sensors on human body segments during walking. This method allows the user to place (wireless) inertial sensors on arbitrary body segments. Next, the user walks for just a few seconds and the segment to which each sensor is attached is identified automatically. Walking data was recorded from ten healthy subjects using an Xsens MVN Biomech system with full-body configuration (17 inertial sensors). Subjects were asked to walk for about 6 seconds at normal walking speed (about 5 km/h). After rotating the sensor data to a global coordinate frame with x-axis in walking direction, y-axis pointing left and z-axis vertical, RMS, mean, and correlation coefficient features were extracted from x-, y- and z-components and magnitudes of the accelerations, angular velocities and angular accelerations. As a classifier, a decision tree based on the C4.5 algorithm was developed using Weka (Waikato Environment for Knowledge Analysis). After testing the algorithm with 10-fold cross-validation using 31 walking trials (involving 527 sensors), 514 sensors were correctly classified (97.5%). When a decision tree for a lower body plus trunk configuration (8 inertial sensors) was trained and tested using 10-fold cross-validation, 100% of the sensors were correctly identified.
This decision tree was also tested on walking trials of 7 patients (17 walking trials) after anterior cruciate ligament reconstruction, which also resulted in 100% correct identification, thus illustrating the robustness of the method.
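The feature extraction step described above (RMS, mean, and correlation coefficient over the x-, y- and z-components and the magnitude of a signal) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, and the choice to correlate pairs of axes within one signal is an assumption, since the abstract does not say which signals were correlated.

```python
import math

def mean(xs):
    """Arithmetic mean of a sequence of samples."""
    return sum(xs) / len(xs)

def rms(xs):
    """Root mean square of a sequence of samples."""
    return math.sqrt(sum(x * x for x in xs) / len(xs))

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either signal is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def sensor_features(ax, ay, az):
    """Hypothetical feature vector for one 3-axis signal (e.g. acceleration)."""
    mag = [math.sqrt(x * x + y * y + z * z) for x, y, z in zip(ax, ay, az)]
    feats = {}
    for name, sig in (("x", ax), ("y", ay), ("z", az), ("mag", mag)):
        feats["rms_" + name] = rms(sig)
        feats["mean_" + name] = mean(sig)
    # Assumption: correlations are taken between axis pairs of the same signal.
    feats["corr_xy"] = pearson(ax, ay)
    feats["corr_xz"] = pearson(ax, az)
    feats["corr_yz"] = pearson(ay, az)
    return feats
```

A feature dictionary of this kind, computed per sensor, is the sort of input a decision tree learner would take; the actual feature set in the study may differ.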
Neural network classification of questionable EGRET events
NASA Astrophysics Data System (ADS)
Meetre, C. A.; Norris, J. P.
1992-02-01
High energy gamma rays (greater than 20 MeV) pair producing in the spark chamber of the Energetic Gamma Ray Telescope Experiment (EGRET) give rise to a characteristic but highly variable 3-D locus of spark sites, which must be processed to decide whether the event is to be included in the database. A significant fraction (about 15 percent or 10^4 events/day) of the candidate events cannot be categorized (accept/reject) by an automated rule-based procedure; they are therefore tagged, and must be examined and classified manually by a team of expert analysts. We describe a feedforward, back-propagation neural network approach to the classification of the questionable events. The algorithm computes a set of coefficients using representative exemplars drawn from the preclassified set of questionable events. These coefficients map a given input event into a decision vector that, ideally, describes the correct disposition of the event. The net's accuracy is then tested using a different subset of preclassified events. Preliminary results demonstrate the net's ability to correctly classify a large proportion of the events for some categories of questionables. Current work includes the use of much larger training sets to improve the accuracy of the net.
Application of hidden Markov models to biological data mining: a case study
NASA Astrophysics Data System (ADS)
Yin, Michael M.; Wang, Jason T.
2000-04-01
In this paper we present an example of biological data mining: the detection of splicing junction acceptors in eukaryotic genes. Identification or prediction of transcribed sequences from within genomic DNA has been a major rate-limiting step in the pursuit of genes. Programs currently available are far from being powerful enough to elucidate the gene structure completely. Here we develop a hidden Markov model (HMM) to represent the degeneracy features of splicing junction acceptor sites in eukaryotic genes. The HMM system is fully trained using an expectation maximization (EM) algorithm and the system performance is evaluated using the 10-way cross-validation method. Experimental results show that our HMM system can correctly classify more than 94% of the candidate sequences (including true and false acceptor sites) into the right categories. About 90% of the true acceptor sites and 96% of the false acceptor sites in the test data are classified correctly. These results are very promising considering that only the local information in DNA is used. The proposed model will be a very important component of an effective and accurate gene structure detection system currently being developed in our lab.
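The 10-way cross-validation used for evaluation above can be sketched as a simple index splitter. This is a minimal sketch under stated assumptions: the function name `k_fold_indices` is hypothetical, folds are contiguous, and no shuffling or stratification is applied, because the abstract does not describe how the folds were formed.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds.

    Fold sizes differ by at most one sample. Shuffling and stratification
    are deliberately left out of this sketch.
    """
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size
```

Each sample appears in exactly one test fold, so averaging the per-fold accuracy gives an estimate computed only on data the model did not see during training.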
Face recognition: database acquisition, hybrid algorithms, and human studies
NASA Astrophysics Data System (ADS)
Gutta, Srinivas; Huang, Jeffrey R.; Singh, Dig; Wechsler, Harry
1997-02-01
One of the most important technologies absent in traditional and emerging frontiers of computing is the management of visual information. Faces are accessible `windows' into the mechanisms that govern our emotional and social lives. The corresponding face recognition tasks considered herein include: (1) Surveillance, (2) CBIR, and (3) CBIR subject to correct ID (`match') displaying specific facial landmarks such as wearing glasses. We developed robust matching (`classification') and retrieval schemes based on hybrid classifiers and showed their feasibility using the FERET database. The hybrid classifier architecture consists of an ensemble of connectionist networks (radial basis functions) and decision trees. The specific characteristics of our hybrid architecture include (a) query by consensus as provided by ensembles of networks for coping with the inherent variability of the image formation and data acquisition process, and (b) flexible and adaptive thresholds as opposed to ad hoc and hard thresholds. Experimental results, proving the feasibility of our approach, yield (i) 96% accuracy, using cross validation (CV), for surveillance on a database consisting of 904 images, (ii) 97% accuracy for CBIR tasks on a database of 1084 images, and (iii) 93% accuracy, using CV, for CBIR subject to correct ID match tasks on a database of 200 images.
Neural network classification of questionable EGRET events
NASA Technical Reports Server (NTRS)
Meetre, C. A.; Norris, J. P.
1992-01-01
High energy gamma rays (greater than 20 MeV) pair producing in the spark chamber of the Energetic Gamma Ray Telescope Experiment (EGRET) give rise to a characteristic but highly variable 3-D locus of spark sites, which must be processed to decide whether the event is to be included in the database. A significant fraction (about 15 percent or 10(exp 4) events/day) of the candidate events cannot be categorized (accept/reject) by an automated rule-based procedure; they are therefore tagged, and must be examined and classified manually by a team of expert analysts. We describe a feedforward, back-propagation neural network approach to the classification of the questionable events. The algorithm computes a set of coefficients using representative exemplars drawn from the preclassified set of questionable events. These coefficients map a given input event into a decision vector that, ideally, describes the correct disposition of the event. The net's accuracy is then tested using a different subset of preclassified events. Preliminary results demonstrate the net's ability to correctly classify a large proportion of the events for some categories of questionables. Current work includes the use of much larger training sets to improve the accuracy of the net.
Improving diagnostic recognition of primary hyperparathyroidism with machine learning.
Somnay, Yash R; Craven, Mark; McCoy, Kelly L; Carty, Sally E; Wang, Tracy S; Greenberg, Caprice C; Schneider, David F
2017-04-01
Parathyroidectomy offers the only cure for primary hyperparathyroidism, but today only 50% of primary hyperparathyroidism patients are referred for operation, in large part, because the condition is widely under-recognized. The diagnosis of primary hyperparathyroidism can be especially challenging with mild biochemical indices. Machine learning is a collection of methods in which computers build predictive algorithms based on labeled examples. With the aim of facilitating diagnosis, we tested the ability of machine learning to distinguish primary hyperparathyroidism from normal physiology using clinical and laboratory data. This retrospective cohort study used a labeled training set and 10-fold cross-validation to evaluate accuracy of the algorithm. Measures of accuracy included area under the receiver operating characteristic curve, precision (sensitivity), and positive and negative predictive value. Several different algorithms and ensembles of algorithms were tested using the Weka platform. Among 11,830 patients managed operatively at 3 high-volume endocrine surgery programs from March 2001 to August 2013, 6,777 underwent parathyroidectomy for confirmed primary hyperparathyroidism, and 5,053 control patients without primary hyperparathyroidism underwent thyroidectomy. Test-set accuracies for machine learning models were determined using 10-fold cross-validation. Age, sex, and serum levels of preoperative calcium, phosphate, parathyroid hormone, vitamin D, and creatinine were defined as potential predictors of primary hyperparathyroidism. Mild primary hyperparathyroidism was defined as primary hyperparathyroidism with normal preoperative calcium or parathyroid hormone levels. After testing a variety of machine learning algorithms, Bayesian network models proved most accurate, classifying correctly 95.2% of all primary hyperparathyroidism patients (area under receiver operating characteristic = 0.989). 
Omitting parathyroid hormone from the model did not decrease the accuracy significantly (area under receiver operating characteristic = 0.985). In mild disease cases, however, the Bayesian network model classified correctly 71.1% of patients with normal calcium and 92.1% with normal parathyroid hormone levels preoperatively. Bayesian networking and AdaBoost improved the accuracy of all parathyroid hormone patients to 97.2% cases (area under receiver operating characteristic = 0.994), and 91.9% of primary hyperparathyroidism patients with mild disease. This was significantly improved relative to Bayesian networking alone (P < .0001). Machine learning can diagnose accurately primary hyperparathyroidism without human input even in mild disease. Incorporation of this tool into electronic medical record systems may aid in recognition of this under-diagnosed disorder. Copyright © 2016 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Tahernezhad-Javazm, Farajollah; Azimirad, Vahid; Shoaran, Maryam
2018-04-01
Objective. Considering the importance and the near-future development of noninvasive brain-machine interface (BMI) systems, this paper presents a comprehensive theoretical-experimental survey on the classification and evolutionary methods for BMI-based systems in which EEG signals are used. Approach. The paper is divided into two main parts. In the first part, a wide range of different types of the base and combinatorial classifiers including boosting and bagging classifiers and evolutionary algorithms are reviewed and investigated. In the second part, these classifiers and evolutionary algorithms are assessed and compared based on two types of relatively widely used BMI systems, sensory motor rhythm-BMI and event-related potentials-BMI. Moreover, in the second part, some of the improved evolutionary algorithms as well as bi-objective algorithms are experimentally assessed and compared. Main results. In this study two databases are used, and cross-validation accuracy (CVA) and stability to data volume (SDV) are considered as the evaluation criteria for the classifiers. According to the experimental results on both databases, regarding the base classifiers, linear discriminant analysis and support vector machines with respect to CVA evaluation metric, and naive Bayes with respect to SDV demonstrated the best performances. Among the combinatorial classifiers, four classifiers, Bagg-DT (bagging decision tree), LogitBoost, and GentleBoost with respect to CVA, and Bagging-LR (bagging logistic regression) and AdaBoost (adaptive boosting) with respect to SDV had the best performances. Finally, regarding the evolutionary algorithms, single-objective invasive weed optimization (IWO) and bi-objective nondominated sorting IWO algorithms demonstrated the best performances. Significance. 
We present a general survey on the base and the combinatorial classification methods for EEG signals (sensory motor rhythm and event-related potentials) as well as their optimization methods through the evolutionary algorithms. In addition, experimental and statistical significance tests are carried out to study the applicability and effectiveness of the reviewed methods.
Mandarin Chinese Tone Identification in Cochlear Implants: Predictions from Acoustic Models
Morton, Kenneth D.; Torrione, Peter A.; Throckmorton, Chandra S.; Collins, Leslie M.
2015-01-01
It has been established that current cochlear implants do not supply adequate spectral information for perception of tonal languages. Comprehension of a tonal language, such as Mandarin Chinese, requires recognition of lexical tones. New strategies of cochlear stimulation such as variable stimulation rate and current steering may provide the means of delivering more spectral information and thus may provide the auditory fine structure required for tone recognition. Several cochlear implant signal processing strategies are examined in this study, the continuous interleaved sampling (CIS) algorithm, the frequency amplitude modulation encoding (FAME) algorithm, and the multiple carrier frequency algorithm (MCFA). These strategies provide different types and amounts of spectral information. Pattern recognition techniques can be applied to data from Mandarin Chinese tone recognition tasks using acoustic models as a means of testing the abilities of these algorithms to transmit the changes in fundamental frequency indicative of the four lexical tones. The ability of processed Mandarin Chinese tones to be correctly classified may predict trends in the effectiveness of different signal processing algorithms in cochlear implants. The proposed techniques can predict trends in performance of the signal processing techniques in quiet conditions but fail to do so in noise. PMID:18706497
NASA Astrophysics Data System (ADS)
Lestari, A. W.; Rustam, Z.
2017-07-01
In the last decade, breast cancer has become a focus of world attention, as the disease is one of the leading causes of death for women. Therefore, it is necessary to have the correct precautions and treatment. In previous studies, the Fuzzy Kernel K-Medoid algorithm has been used for multi-class data. This paper proposes an algorithm to classify high-dimensional breast cancer data using Fuzzy Possibilistic C-Means (FPCM) and a new method based on clustering analysis using Normed Kernel Function-Based Fuzzy Possibilistic C-Means (NKFPCM). The objective of this paper is to obtain the best accuracy in classification of breast cancer data. To improve the accuracy of the two methods, the candidate features are evaluated using feature selection with the Laplacian Score. The results compare the accuracy and running time of FPCM and NKFPCM with and without feature selection.
Herrera, Lara Maria; Fernandes, Clemente Maia da Silva; Serra, Mônica da Costa
2018-01-01
This study aimed to develop and to assess an algorithm to facilitate lip print visualization, and to digitally analyze lip prints on different supports, by superimposition. It also aimed to classify lip prints according to sex. A batch image processing algorithm was developed, which facilitated the identification and extraction of information about lip grooves. However, it performed better for lip print images with a uniform background. Paper and glass slab allowed more correct identifications than glass and both sides of compact disks. There was no significant difference between the type of support and the amount of matching structures located in the middle area of the lower lip. There was no evidence of association between types of lip grooves and sex. Lip groove patterns of type III and type I were the most common for both sexes. The development of systems for lip print analysis is necessary, mainly concerning digital methods. © 2017 American Academy of Forensic Sciences.
NASA Astrophysics Data System (ADS)
Renkoski, Timothy E.; Hatch, Kenneth D.; Utzinger, Urs
2012-03-01
With no sufficient screening test for ovarian cancer, a method to evaluate the ovarian disease state quickly and nondestructively is needed. The authors have applied a wide-field spectral imager to freshly resected ovaries of 30 human patients in a study believed to be the first of its magnitude. Endogenous fluorescence was excited with 365-nm light and imaged in eight emission bands collectively covering the 400- to 640-nm range. Linear discriminant analysis was used to classify all image pixels and generate diagnostic maps of the ovaries. Training the classifier with previously collected single-point autofluorescence measurements of a spectroscopic probe enabled this novel classification. The process by which probe-collected spectra were transformed for comparison with imager spectra is described. Sensitivity of 100% and specificity of 51% were obtained in classifying normal and cancerous ovaries using autofluorescence data alone. Specificity increased to 69% when autofluorescence data were divided by green reflectance data to correct for spatial variation in tissue absorption properties. Benign neoplasm ovaries were also found to classify as nonmalignant using the same algorithm. Although applied ex vivo, the method described here appears useful for quick assessment of cancer presence in the human ovary.
Software platform for managing the classification of error-related potentials of observers
NASA Astrophysics Data System (ADS)
Asvestas, P.; Ventouras, E.-C.; Kostopoulos, S.; Sidiropoulos, K.; Korfiatis, V.; Korda, A.; Uzunolglu, A.; Karanasiou, I.; Kalatzis, I.; Matsopoulos, G.
2015-09-01
Human learning is partly based on observation. Electroencephalographic recordings of subjects who perform acts (actors) or observe actors (observers), contain a negative waveform in the Evoked Potentials (EPs) of the actors that commit errors and of observers who observe the error-committing actors. This waveform is called the Error-Related Negativity (ERN). Its detection has applications in the context of Brain-Computer Interfaces. The present work describes a software system developed for managing EPs of observers, with the aim of classifying them into observations of either correct or incorrect actions. It consists of an integrated platform for the storage, management, processing and classification of EPs recorded during error-observation experiments. The system was developed using C# and the following development tools and frameworks: MySQL, .NET Framework, Entity Framework and Emgu CV, for interfacing with the machine learning library of OpenCV. Up to six features can be computed per EP recording per electrode. The user can select among various feature selection algorithms and then proceed to train one of three types of classifiers: Artificial Neural Networks, Support Vector Machines, k-nearest neighbour. Next the classifier can be used for classifying any EP curve that has been inputted to the database.
Feature Selection and Effective Classifiers.
ERIC Educational Resources Information Center
Deogun, Jitender S.; Choubey, Suresh K.; Raghavan, Vijay V.; Sever, Hayri
1998-01-01
Develops and analyzes four algorithms for feature selection in the context of rough set methodology. Experimental results confirm the expected relationship between the time complexity of these algorithms and the classification accuracy of the resulting upper classifiers. When compared, results of upper classifiers perform better than lower…
Non-proliferative diabetic retinopathy symptoms detection and classification using neural network.
Al-Jarrah, Mohammad A; Shatnawi, Hadeel
2017-08-01
Diabetic retinopathy (DR) is a cause of blindness among working-age people with diabetes in most countries. The increasing number of people with diabetes worldwide suggests that DR will continue to be a major contributor to vision loss. Early detection of retinopathy progress in individuals with diabetes is critical for preventing visual loss. Non-proliferative DR (NPDR) is an early stage of DR. Moreover, NPDR can be classified into mild, moderate and severe. This paper proposes a novel morphology-based algorithm for detecting retinal lesions and classifying each case. First, the proposed algorithm detects the three DR lesions, namely haemorrhages, microaneurysms and exudates. Second, we defined and extracted a set of features from detected lesions. The set of selected features emulates what physicians look for in classifying an NPDR case. Finally, we designed an artificial neural network (ANN) classifier with three layers to classify NPDR into normal, mild, moderate and severe. Bayesian regularisation and resilient backpropagation algorithms are used to train the ANN. The accuracies of the proposed classifiers based on the Bayesian regularisation and resilient backpropagation algorithms are 96.6 and 89.9, respectively. The obtained results are compared with those of a recently published classifier. Our proposed classifier outperforms the best in terms of sensitivity and specificity.
Johnson, LeeAnn K; Brown, Mary B; Carruthers, Ethan A; Ferguson, John A; Dombek, Priscilla E; Sadowsky, Michael J
2004-08-01
A horizontal, fluorophore-enhanced, repetitive extragenic palindromic-PCR (rep-PCR) DNA fingerprinting technique (HFERP) was developed and evaluated as a means to differentiate human from animal sources of Escherichia coli. Box A1R primers and PCR were used to generate 2,466 rep-PCR and 1,531 HFERP DNA fingerprints from E. coli strains isolated from fecal material from known human and 12 animal sources: dogs, cats, horses, deer, geese, ducks, chickens, turkeys, cows, pigs, goats, and sheep. HFERP DNA fingerprinting reduced within-gel grouping of DNA fingerprints and improved alignment of DNA fingerprints between gels, relative to that achieved using rep-PCR DNA fingerprinting. Jackknife analysis of the complete rep-PCR DNA fingerprint library, done using Pearson's product-moment correlation coefficient, indicated that animal and human isolates were assigned to the correct source groups with an 82.2% average rate of correct classification. However, when only unique isolates were examined, isolates from a single animal having a unique DNA fingerprint, Jackknife analysis showed that isolates were assigned to the correct source groups with a 60.5% average rate of correct classification. The percentages of correctly classified isolates were about 15 and 17% greater for rep-PCR and HFERP, respectively, when analyses were done using the curve-based Pearson's product-moment correlation coefficient, rather than the band-based Jaccard algorithm. Rarefaction analysis indicated that, despite the relatively large size of the known-source database, genetic diversity in E. coli was very great and is most likely accounting for our inability to correctly classify many environmental E. coli isolates. Our data indicate that removal of duplicate genotypes within DNA fingerprint libraries, increased database size, proper methods of statistical analysis, and correct alignment of band data within and between gels improve the accuracy of microbial source tracking methods.
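The contrast drawn above between a band-based Jaccard measure and a curve-based Pearson measure can be illustrated with a small sketch. The representations are assumptions made for illustration only: a fingerprint is taken to be either a set of discrete band positions (Jaccard) or a list of intensity samples along the lane (Pearson). The function names are hypothetical and this is not the software used in the study.

```python
import math

def jaccard_similarity(bands_a, bands_b):
    """Band-based similarity: shared bands divided by all bands present."""
    a, b = set(bands_a), set(bands_b)
    union = a | b
    if not union:
        return 1.0  # convention assumed here: two empty fingerprints match
    return len(a & b) / len(union)

def pearson_similarity(curve_a, curve_b):
    """Curve-based similarity: Pearson correlation of two intensity profiles."""
    n = len(curve_a)
    ma = sum(curve_a) / n
    mb = sum(curve_b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(curve_a, curve_b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in curve_a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in curve_b))
    if sa == 0 or sb == 0:
        return 0.0  # a flat profile carries no shape information
    return cov / (sa * sb)
```

The Jaccard measure depends on a prior band-calling step, whereas the Pearson measure compares whole intensity profiles, which is one plausible reason the two can rank isolates differently.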
Tighe, Patrick J; Lucas, Stephen D; Edwards, David A; Boezaart, André P; Aytug, Haldun; Bihorac, Azra
2012-10-01
The purpose of this project was to determine whether machine-learning classifiers could predict which patients would require a preoperative acute pain service (APS) consultation. Retrospective cohort. University teaching hospital. The records of 9,860 surgical patients posted between January 1 and June 30, 2010 were reviewed. Request for APS consultation. A cohort of machine-learning classifiers was compared according to its ability or inability to classify surgical cases as requiring a request for a preoperative APS consultation. Classifiers were then optimized utilizing ensemble techniques. Computational efficiency was measured with the central processing unit processing times required for model training. Classifiers were tested using the full feature set, as well as the reduced feature set that was optimized using a merit-based dimensional reduction strategy. Machine-learning classifiers correctly predicted preoperative requests for APS consultations in 92.3% (95% confidence intervals [CI], 91.8-92.8) of all surgical cases. Bayesian methods yielded the highest area under the receiver operating curve (0.87, 95% CI 0.84-0.89) and lowest training times (0.0018 seconds, 95% CI, 0.0017-0.0019 for the NaiveBayesUpdateable algorithm). An ensemble of high-performing machine-learning classifiers did not yield a higher area under the receiver operating curve than its component classifiers. Dimensional reduction decreased the computational requirements for multiple classifiers, but did not adversely affect classification performance. Using historical data, machine-learning classifiers can predict which surgical cases should prompt a preoperative request for an APS consultation. Dimensional reduction improved computational efficiency and preserved predictive performance. Wiley Periodicals, Inc.
Bourke, Alan K; Klenk, Jochen; Schwickert, Lars; Aminian, Kamiar; Ihlen, Espen A F; Mellone, Sabato; Helbostad, Jorunn L; Chiari, Lorenzo; Becker, Clemens
2016-08-01
Automatic fall detection will promote independent living and reduce the consequences of falls in the elderly by ensuring people can confidently live safely at home for longer. In laboratory studies inertial sensor technology has been shown capable of distinguishing falls from normal activities. However, less than 7% of fall-detection algorithm studies have used fall data recorded from elderly people in real life. The FARSEEING project has compiled a database of real life falls from elderly people, to gain new knowledge about fall events and to develop fall detection algorithms to combat the problems associated with falls. We have extracted 12 different kinematic, temporal and kinetic related features from a data-set of 89 real-world falls and 368 activities of daily living. Using the extracted features we applied machine learning techniques and produced a selection of algorithms based on different feature combinations. The best algorithm employs 10 different features and produced a sensitivity of 0.88 and a specificity of 0.87 in classifying falls correctly. This algorithm can be used to distinguish real-world falls from normal activities of daily living in a sensor consisting of a tri-axial accelerometer and tri-axial gyroscope located at L5.
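The sensitivity and specificity figures quoted above follow the usual confusion-matrix definitions, sketched below. The counts in the usage note are hypothetical: the abstract reports only the totals (89 falls, 368 activities of daily living) and the two rates, not the underlying confusion matrix.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of actual positives (falls) that were detected."""
    return true_positives / (true_positives + false_negatives)

def specificity(true_negatives, false_positives):
    """Fraction of actual negatives (normal activities) correctly rejected."""
    return true_negatives / (true_negatives + false_positives)

# Hypothetical counts, chosen only so that they sum to the reported totals
# and round to the reported rates; they are NOT taken from the study.
example_sens = sensitivity(true_positives=78, false_negatives=11)   # 78 / 89
example_spec = specificity(true_negatives=320, false_positives=48)  # 320 / 368
```

With 368 normal activities against 89 falls, even a specificity near 0.87 implies a few dozen false alarms in a dataset of this size, which is why both rates are reported rather than accuracy alone.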
Grassmann, Felix; Mengelkamp, Judith; Brandl, Caroline; Harsch, Sebastian; Zimmermann, Martina E; Linkohr, Birgit; Peters, Annette; Heid, Iris M; Palm, Christoph; Weber, Bernhard H F
2018-04-10
Age-related macular degeneration (AMD) is a common threat to vision. While classification of disease stages is critical to understanding disease risk and progression, several systems based on color fundus photographs are known. Most of these require in-depth and time-consuming analysis of fundus images. Herein, we present an automated computer-based classification algorithm. Algorithm development for AMD classification based on a large collection of color fundus images. Validation is performed on a cross-sectional, population-based study. We included 120 656 manually graded color fundus images from 3654 Age-Related Eye Disease Study (AREDS) participants. AREDS participants were >55 years of age, and non-AMD sight-threatening diseases were excluded at recruitment. In addition, performance of our algorithm was evaluated in 5555 fundus images from the population-based Kooperative Gesundheitsforschung in der Region Augsburg (KORA; Cooperative Health Research in the Region of Augsburg) study. We defined 13 classes (9 AREDS steps, 3 late AMD stages, and 1 for ungradable images) and trained several convolution deep learning architectures. An ensemble of network architectures improved prediction accuracy. An independent dataset was used to evaluate the performance of our algorithm in a population-based study. κ Statistics and accuracy to evaluate the concordance between predicted and expert human grader classification. A network ensemble of 6 different neural net architectures predicted the 13 classes in the AREDS test set with a quadratic weighted κ of 92% (95% confidence interval, 89%-92%) and an overall accuracy of 63.3%. In the independent KORA dataset, images wrongly classified as AMD were mainly the result of a macular reflex observed in young individuals. By restricting the KORA analysis to individuals >55 years of age and prior exclusion of other retinopathies, the weighted and unweighted κ increased to 50% and 63%, respectively. 
Importantly, the algorithm detected 84.2% of all fundus images with definite signs of early or late AMD. Overall, 94.3% of healthy fundus images were classified correctly. Our deep learning algorithm revealed a weighted κ outperforming human graders in the AREDS study and is suitable to classify AMD fundus images in other datasets using individuals >55 years of age. Copyright © 2018 American Academy of Ophthalmology. Published by Elsevier Inc. All rights reserved.
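The quadratic weighted κ used above to compare predicted and expert grades can be sketched as follows. This is a generic textbook formulation, not the authors' evaluation code; the function name is hypothetical, class labels are assumed to be integers from 0 to n_classes - 1 on an ordinal scale, and the result is a fraction rather than the percentage quoted in the abstract.

```python
def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    """Cohen's kappa with quadratic disagreement weights for ordinal labels."""
    n = len(y_true)
    # Observed confusion matrix: rows are true grades, columns are predictions.
    observed = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(n_classes))
                  for j in range(n_classes)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            weight = (i - j) ** 2 / (n_classes - 1) ** 2
            weighted_observed += weight * observed[i][j]
            # Expected count under independence of the two graders.
            weighted_expected += weight * row_totals[i] * col_totals[j] / n
    if weighted_expected == 0:
        return 1.0  # degenerate case: only one grade ever used
    return 1.0 - weighted_observed / weighted_expected
```

Because the weights grow with the squared distance between grades, a prediction that is off by one step is penalized far less than one that is off by several, which helps explain how a high weighted κ can coexist with a modest overall accuracy across 13 classes.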
A Modified Decision Tree Algorithm Based on Genetic Algorithm for Mobile User Classification Problem
Liu, Dong-sheng; Fan, Shu-jiang
2014-01-01
To offer mobile customers better service, mobile users should first be classified. To address the limitations of previous classification methods, this paper puts forward a modified decision tree algorithm for mobile user classification, which introduces a genetic algorithm to optimize the results of the decision tree algorithm. We also take context information as a classification attribute for the mobile user, and we classify the context into public context and private context classes. Then we analyze the processes and operators of the algorithm. Finally, we run an experiment on mobile users with the algorithm; it classifies mobile users into Basic service user, E-service user, Plus service user, and Total service user classes and also yields some rules about the mobile user. Compared to the C4.5 decision tree algorithm and the SVM algorithm, the proposed algorithm has higher accuracy and greater simplicity. PMID:24688389
Sparse feature selection for classification and prediction of metastasis in endometrial cancer.
Ahsen, Mehmet Eren; Boren, Todd P; Singh, Nitin K; Misganaw, Burook; Mutch, David G; Moore, Kathleen N; Backes, Floor J; McCourt, Carolyn K; Lea, Jayanthi S; Miller, David S; White, Michael A; Vidyasagar, Mathukumalli
2017-03-27
Metastasis via pelvic and/or para-aortic lymph nodes is a major risk factor for endometrial cancer. Lymph-node resection ameliorates risk but is associated with significant co-morbidities. Incidence in patients with stage I disease is 4-22% but no mechanism exists to accurately predict it. Therefore, national guidelines for primary staging surgery include pelvic and para-aortic lymph node dissection for all patients whose tumor exceeds 2cm in diameter. We sought to identify a robust molecular signature that can accurately classify risk of lymph node metastasis in endometrial cancer patients. 86 tumors matched for age and race, and evenly distributed between lymph node-positive and lymph node-negative cases, were selected as a training cohort. Genomic micro-RNA expression was profiled for each sample to serve as the predictive feature matrix. An independent set of 28 tumor samples was collected and similarly characterized to serve as a test cohort. A feature selection algorithm was designed for applications where the number of samples is far smaller than the number of measured features per sample. A predictive miRNA expression signature was developed using this algorithm, which was then used to predict the metastatic status of the independent test cohort. A weighted classifier, using 18 micro-RNAs, achieved 100% accuracy on the training cohort. When applied to the testing cohort, the classifier correctly predicted 90% of node-positive cases, and 80% of node-negative cases (FDR = 6.25%). Results indicate that the evaluation of the quantitative sparse-feature classifier proposed here in clinical trials may lead to significant improvement in the prediction of lymphatic metastases in endometrial cancer patients.
Mining sequential patterns for protein fold recognition.
Exarchos, Themis P; Papaloukas, Costas; Lampros, Christos; Fotiadis, Dimitrios I
2008-02-01
Protein data contain discriminative patterns that can be used in many beneficial applications if they are defined correctly. In this work sequential pattern mining (SPM) is utilized for sequence-based fold recognition. Protein classification in terms of fold recognition plays an important role in computational protein analysis, since it can contribute to the determination of the function of a protein whose structure is unknown. Specifically, one of the most efficient SPM algorithms, cSPADE, is employed for the analysis of protein sequence. A classifier uses the extracted sequential patterns to classify proteins in the appropriate fold category. For training and evaluating the proposed method we used the protein sequences from the Protein Data Bank and the annotation of the SCOP database. The method exhibited an overall accuracy of 25% in a classification problem with 36 candidate categories. The classification performance reaches up to 56% when the five most probable protein folds are considered.
LDA boost classification: boosting by topics
NASA Astrophysics Data System (ADS)
Lei, La; Qiao, Guo; Qimin, Cao; Qitao, Li
2012-12-01
AdaBoost is an effective classification algorithm, especially in text categorization (TC) tasks. Setting up a classifier committee and voting on the documents to be classified can achieve high categorization precision. However, the traditional Vector Space Model easily leads to the curse of dimensionality and to feature sparsity, which seriously degrade classification performance. This article proposes a novel boosting-based classification algorithm called LDABoost, which uses Latent Dirichlet Allocation (LDA) to model the feature space. Instead of words or phrases, LDABoost uses latent topics as features, which significantly reduces the feature dimension. An improved Naïve Bayes (NB) classifier is designed as the weak classifier; it keeps the efficiency advantage of the classic NB algorithm and has higher precision. Moreover, a two-stage iterative weighting method called Cute Integration is proposed to improve accuracy by integrating weak classifiers into a strong classifier in a more rational way. Mutual information is used as the metric for weight allocation. The voting information and the categorization decisions made by the base classifiers are fully utilized in generating the strong classifier. Experimental results reveal that LDABoost, which categorizes in a low-dimensional space, has higher accuracy than traditional AdaBoost algorithms and many other classic classification algorithms. Moreover, its runtime is lower than that of different versions of AdaBoost and of TC algorithms based on support vector machines and neural networks.
Carvalho, Gustavo A; Minnett, Peter J; Fleming, Lora E; Banzon, Viva F; Baringer, Warner
2010-06-01
In a continuing effort to develop suitable methods for the surveillance of Harmful Algal Blooms (HABs) of Karenia brevis using satellite radiometers, a new multi-algorithm method was developed to explore whether improvements in the remote sensing detection of the Florida Red Tide were possible. A Hybrid Scheme was introduced that sequentially applies the optimized versions of two pre-existing satellite-based algorithms: an Empirical Approach (using water-leaving radiance as a function of chlorophyll concentration) and a Bio-optical Technique (using particulate backscatter along with chlorophyll concentration). The long-term evaluation of the new multi-algorithm method was performed using a multi-year MODIS dataset (2002 to 2006; during the boreal Summer-Fall periods - July to December) along the Central West Florida Shelf between 25.75°N and 28.25°N. Algorithm validation was done with in situ measurements of the abundances of K. brevis; cell counts ≥1.5×10^4 cells l^-1 defined a detectable HAB. Encouraging statistical results were derived when either or both algorithms correctly flagged known samples. The majority of the valid match-ups were correctly identified (~80% of both HABs and non-blooming conditions) and few false negatives or false positives were produced (~20% of each). Additionally, most of the HAB-positive identifications in the satellite data were indeed HAB samples (positive predictive value: ~70%) and those classified as HAB-negative were almost all non-bloom cases (negative predictive value: ~86%). These results demonstrate an excellent detection capability, on average ~10% more accurate than the individual algorithms used separately. Thus, the new Hybrid Scheme could become a powerful tool for environmental monitoring of K. brevis blooms, with valuable consequences including leading to the more rapid and efficient use of ships to make in situ measurements of HABs. PMID:21037979
Wang, Qing-Tao; Li, Yong-Zhe; Liang, Yu-Fang; Hu, Chao-Jun; Zhai, Yu-Hua; Zhao, Guan-Fei; Zhang, Jian; Li, Ning; Ni, An-Ping; Chen, Wen-Ming; Xu, Yang
2009-04-01
A diagnosis of multiple myeloma (MM) is difficult to make on the basis of any single laboratory test result. Accurate diagnosis of MM generally results from a number of costly and invasive laboratory tests and medical procedures. The aim of this work is to find a new, highly specific and sensitive method for MM diagnosis. Serum samples were tested in groups representing MM (n = 54) and non-MM (n = 108). These included a subgroup of 17 plasma cell dyscrasias, a subgroup of 17 reactive plasmacytosis, 5 B cell lymphomas, and 7 other tumors with osseous metastasis, as well as 62 healthy donors as controls. Bioinformatic calculations associated with MM were performed. The decision algorithm, with a panel of three biomarkers, correctly identified 24 of 24 (100%) MM samples and 46 of 49 (93.88%) non-MM samples in the training set. During the masked test for the discriminatory model, 26 of 30 MM patients (sensitivity, 86.67%) were precisely recognized, and all 34 normal donors were successfully classified; patients with reactive plasmacytosis were also correctly classified into the non-MM group, and 11 of the other patients were incorrectly classified as MM. The results suggested that proteomic fingerprint technology combining magnetic beads with MALDI-TOF-MS has the potential for identifying individuals with MM. The biomarker classification model was suitable for preliminary assessment of MM and could potentially serve as a useful tool for MM diagnosis and differential diagnosis.
Experimental testing of four correction algorithms for the forward scattering spectrometer probe
NASA Technical Reports Server (NTRS)
Hovenac, Edward A.; Oldenburg, John R.; Lock, James A.
1992-01-01
Three number density correction algorithms and one size distribution correction algorithm for the Forward Scattering Spectrometer Probe (FSSP) were compared with data taken by the Phase Doppler Particle Analyzer (PDPA) and an optical number density measuring instrument (NDMI). Of the three number density correction algorithms, the one that compared best to the PDPA and NDMI data was the algorithm developed by Baumgardner, Strapp, and Dye (1985). The algorithm that corrects sizing errors in the FSSP that was developed by Lock and Hovenac (1989) was shown to be within 25 percent of the Phase Doppler measurements at number densities as high as 3000/cc.
NASA Astrophysics Data System (ADS)
Sun, Wei; Ding, Wei; Yan, Huifang; Duan, Shunli
2018-06-01
Shoe-mounted pedestrian navigation systems based on micro inertial sensors rely on zero velocity updates to correct their positioning errors in time, so determining the zero velocity interval plays a key role during normal walking. However, as walking gaits are complicated and vary from person to person, it is difficult to detect walking gaits with a fixed threshold method. This paper proposes a pedestrian gait classification method based on a hidden Markov model. Pedestrian gait data are collected with a micro inertial measurement unit installed at the instep. On the basis of analyzing the characteristics of the pedestrian walk, a single direction angular rate gyro output is used to classify gait features. The angular rate data are modeled as a univariate Gaussian mixture model with three components, and a four-state left–right continuous hidden Markov model (CHMM) is designed to classify the normal walking gait. The model parameters are trained and optimized using the Baum–Welch algorithm, and then the sliding window Viterbi algorithm is used to decode the gait. Walking data are collected from eight subjects walking along the same route at three different speeds; leave-one-subject-out cross validation is used to test the model. Experimental results show that the proposed algorithm can accurately detect the zero velocity intervals of different walking gaits. The location experiment shows that the precision of CHMM-based pedestrian navigation improved by 40% when compared to the angular rate threshold method.
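The decoding step described above can be illustrated with a minimal Python sketch of log-domain Viterbi decoding. This is not the authors' implementation: it decodes the whole sequence at once rather than in a sliding window, uses a single Gaussian per state instead of a three-component mixture, and assumes the last state wraps back to the first. The state names, transition probabilities, means and gyro samples are all made up for illustration.

```python
import numpy as np

def viterbi(log_pi, log_A, log_B):
    """Most likely state path from log initial, transition and emission scores.

    log_pi: (S,), log_A: (S, S), log_B: (T, S) with log_B[t, s] = log p(x_t | s).
    """
    T, S = log_B.shape
    delta = np.empty((T, S))             # best log score of a path ending in s at t
    back = np.zeros((T, S), dtype=int)   # best predecessor of state s at time t
    delta[0] = log_pi + log_B[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A   # scores[i, j]: from i to j
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[t]
    path = np.empty(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):       # follow the back-pointers
        path[t] = back[t + 1, path[t + 1]]
    return path

# Hypothetical 4-phase gait cycle: 0 = stance (zero velocity), 1 = heel-off,
# 2 = swing, 3 = heel-strike. Each state may persist or advance to the next;
# the wrap from state 3 to state 0 is an assumption for continuous walking.
A = np.array([[0.8, 0.2, 0.0, 0.0],
              [0.0, 0.8, 0.2, 0.0],
              [0.0, 0.0, 0.8, 0.2],
              [0.2, 0.0, 0.0, 0.8]])
pi = np.full(4, 0.25)
means = np.array([0.0, -2.0, 3.0, -1.0])   # made-up angular rates (rad/s)
std = 0.5

gyro = np.array([0, 0, 0, -2, -2, 3, 3, 3, -1, -1, 0, 0], dtype=float)
# Gaussian log-likelihood of every sample under every state.
log_B = (-0.5 * ((gyro[:, None] - means) / std) ** 2
         - np.log(std * np.sqrt(2 * np.pi)))
with np.errstate(divide="ignore"):         # log(0) = -inf marks forbidden moves
    log_A = np.log(A)

path = viterbi(np.log(pi), log_A, log_B)
zero_velocity = path == 0                  # samples usable for zero velocity updates
```

In this toy sequence the samples decoded as state 0 form the zero velocity intervals; a real system would estimate the transition and emission parameters from training data, as the abstract describes.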
A unified classifier for robust face recognition based on combining multiple subspace algorithms
NASA Astrophysics Data System (ADS)
Ijaz Bajwa, Usama; Ahmad Taj, Imtiaz; Waqas Anwar, Muhammad
2012-10-01
Face recognition, the fastest growing biometric technology, has expanded manifold in the last few years. Various new algorithms and commercial systems have been proposed and developed. However, none of the proposed or developed algorithms is a complete solution, because an algorithm may work very well on one set of images with, say, illumination changes but may not work properly on another set of image variations such as expression variations. This study is motivated by the fact that no single classifier can claim generally better performance against all facial image variations. To overcome this shortcoming and achieve generality, combining several classifiers using various strategies has been studied extensively, including the question of the suitability of any given classifier for this task. The study is based on the outcome of a comprehensive comparative analysis conducted on a combination of six subspace extraction algorithms and four distance metrics on three facial databases. The analysis leads to the selection of the most suitable classifiers, each of which performs better on one task or another. These classifiers are then combined into an ensemble classifier by two different strategies: weighted sum and re-ranking. The results of the ensemble classifier show that these strategies can be effectively used to construct a single classifier that can successfully handle varying facial image conditions of illumination, aging and facial expressions.
NASA Astrophysics Data System (ADS)
Jin, Minglei; Jin, Weiqi; Li, Yiyang; Li, Shuo
2015-08-01
In this paper, we propose a novel scene-based non-uniformity correction algorithm for infrared image processing: a temporal high-pass non-uniformity correction algorithm based on grayscale mapping (THP and GM). The main sources of non-uniformity are: (1) detector fabrication inaccuracies; (2) non-linearity and variations in the read-out electronics; and (3) optical path effects. Non-uniformity is reduced by non-uniformity correction (NUC) algorithms, which are often divided into calibration-based (CBNUC) and scene-based (SBNUC) algorithms. Because non-uniformity drifts temporally, CBNUC algorithms must be repeated by inserting a uniform radiation source into the view, which SBNUC algorithms do not need, so SBNUC algorithms have become an essential part of infrared imaging systems. The poor robustness of SBNUC algorithms often leads to two defects: artifacts and over-correction. Moreover, because of their complicated calculation process and large storage consumption, hardware implementation of SBNUC algorithms is difficult, especially on Field Programmable Gate Array (FPGA) platforms. The THP and GM algorithm proposed in this paper can eliminate the non-uniformity without causing defects. The hardware implementation of the algorithm, based only on an FPGA, has two advantages: (1) low resource consumption and (2) small hardware delay (less than 20 lines). It can be transplanted to a variety of infrared detectors equipped with an FPGA image processing module, and it can reduce both stripe non-uniformity and ripple non-uniformity.
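For orientation, the classical temporal high-pass idea that scene-based methods of this family build on can be sketched in a few lines of Python. This is only the textbook recursive filter, not the THP and GM algorithm itself, whose grayscale-mapping step is not specified in the abstract. The function name, the time constant and the synthetic frames are assumptions made for illustration.

```python
import numpy as np

def temporal_high_pass_nuc(frames, time_constant=16):
    """Subtract a per-pixel recursive low-pass estimate from every frame.

    frames: array of shape (n_frames, height, width). The low-pass term tracks
    the slowly varying part of each pixel (which includes its fixed-pattern
    offset), so subtracting it leaves the fast-changing scene content.
    """
    frames = np.asarray(frames, dtype=float)
    low = frames[0].copy()
    out = np.empty_like(frames)
    for n, x in enumerate(frames):
        low += (x - low) / time_constant   # recursive low-pass update
        out[n] = x - low                   # high-pass output
    return out

# Synthetic test data: a changing scene plus a fixed column-stripe offset.
rng = np.random.default_rng(0)
scene = rng.normal(100.0, 10.0, size=(400, 8, 8))
stripes = np.linspace(-5.0, 5.0, 8)        # one fixed offset per column
raw = scene + stripes
corrected = temporal_high_pass_nuc(raw)

stripe_before = np.ptp(raw.mean(axis=(0, 1)))
stripe_after = np.abs(corrected[100:].mean(axis=(0, 1))).max()
```

The plain filter also removes any scene content that stays still, which is one source of the artifacts and over-correction the abstract mentions.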
Wilkoff, B L; Kühlkamp, V; Volosin, K; Ellenbogen, K; Waldecker, B; Kacet, S; Gillberg, J M; DeSouza, C M
2001-01-23
One of the perceived benefits of dual-chamber implantable cardioverter-defibrillators (ICDs) is the reduction in inappropriate therapy due to new detection algorithms. It was the purpose of the present investigation to propose methods to minimize bias during such comparisons and to report the arrhythmia detection clinical results of the PR Logic dual-chamber detection algorithm in the GEM DR ICD in the context of these methods. Between November 1997 and October 1998, 933 patients received the GEM DR ICD in this prospective multicenter study. A total of 4856 sustained arrhythmia episodes (n=311) with stored electrogram and marker channel were classified by the investigators; 3488 episodes (n=232) were ventricular tachycardia (VT)/ventricular fibrillation (VF), and 1368 episodes (n=149) were supraventricular tachycardia (SVT). The overall detection results were corrected for multiple episodes within a patient with the generalized estimating equations (GEE) method with an exchangeable correlation structure between episodes. The relative sensitivity for detection of sustained VT and/or VF was 100.0% (3488 of 3488, n=232; 95% CI 98.3% to 100%), the VT/VF positive predictivity was 88.4% uncorrected (3488 of 3945, n=278) and 78.1% corrected (95% CI 73.3% to 82.3%) with the GEE method, and the SVT positive predictivity was 100.0% (911 of 911, n=101; 95% CI 96% to 100%). A structured approach to analysis limits the bias inherent in the evaluation of tachycardia discrimination algorithms through the use of relative VT/VF sensitivity, VT/VF positive predictivity, and SVT positive predictivity along with corrections for multiple tachycardia episodes in a single patient.
Kenttä, Tuomas; Porthan, Kimmo; Tikkanen, Jani T; Väänänen, Heikki; Oikarinen, Lasse; Viitasalo, Matti; Karanko, Hannu; Laaksonen, Maarit; Huikuri, Heikki V
2015-07-01
Early repolarization (ER) is defined as an elevation of the QRS-ST junction in at least two inferior or lateral leads of the standard 12-lead electrocardiogram (ECG). Our purpose was to create an algorithm for the automated detection and classification of ER. A total of 6,047 electrocardiograms were manually graded for ER by two experienced readers. The automated detection of ER was based on quantification of the characteristic slurring or notching in ER-positive leads. The ER detection algorithm was tested and its results were compared with manual grading, which served as the reference. Readers graded 183 ECGs (3.0%) as ER positive, of which the algorithm detected 176 recordings, resulting in sensitivity of 96.2%. Of the 5,864 ER-negative recordings, the algorithm classified 5,281 as negative, resulting in 90.1% specificity. Positive and negative predictive values for the algorithm were 23.2% and 99.9%, respectively, and its accuracy was 90.2%. Inferior ER was correctly detected in 84.6% and lateral ER in 98.6% of the cases. As the automatic algorithm has high sensitivity, it could be used as a prescreening tool for ER; only the electrocardiograms graded positive by the algorithm would be reviewed manually. This would reduce the need for manual labor by 90%. © 2014 Wiley Periodicals, Inc.
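The reported performance figures follow directly from the counts given in the abstract. A minimal Python sketch, assuming the 7 missed ER-positive recordings and 583 false alarms implied by those counts (the function name is illustrative):

```python
def screening_metrics(tp, fn, tn, fp):
    """Return standard diagnostic metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }

# Counts from the abstract: 183 reader-positive ECGs, 176 of them detected;
# 5,864 reader-negative ECGs, 5,281 of them classified negative.
er = screening_metrics(tp=176, fn=183 - 176, tn=5281, fp=5864 - 5281)
for name, value in er.items():
    print(f"{name}: {value:.1%}")
```

The low positive predictive value alongside high sensitivity reflects the 3.0% prevalence of ER in the sample: even a modest false-positive rate produces more false alarms than true detections.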
Correcting Classifiers for Sample Selection Bias in Two-Phase Case-Control Studies
Theis, Fabian J.
2017-01-01
Epidemiological studies often utilize stratified data in which rare outcomes or exposures are artificially enriched. This design can increase precision in association tests but distorts predictions when applying classifiers on nonstratified data. Several methods correct for this so-called sample selection bias, but their performance remains unclear especially for machine learning classifiers. With an emphasis on two-phase case-control studies, we aim to assess which corrections to perform in which setting and to obtain methods suitable for machine learning techniques, especially the random forest. We propose two new resampling-based methods to resemble the original data and covariance structure: stochastic inverse-probability oversampling and parametric inverse-probability bagging. We compare all techniques for the random forest and other classifiers, both theoretically and on simulated and real data. Empirical results show that the random forest profits from only the parametric inverse-probability bagging proposed by us. For other classifiers, correction is mostly advantageous, and methods perform uniformly. We discuss consequences of inappropriate distribution assumptions and reason for different behaviors between the random forest and other classifiers. In conclusion, we provide guidance for choosing correction methods when training classifiers on biased samples. For random forests, our method outperforms state-of-the-art procedures if distribution assumptions are roughly fulfilled. We provide our implementation in the R package sambia. PMID:29312464
NASA Astrophysics Data System (ADS)
Rasel, Sikdar M. M.; Chang, Hsing-Chung; Ralph, Tim; Saintilan, Neil
2015-10-01
Saltmarsh is one of the important wetland communities; however, due to a range of pressures, it has been declared an EEC (Endangered Ecological Community) in Australia. Developing spectral libraries of saltmarsh species is essential for correctly identifying the different species and monitoring this EEC. Hyperspectral remote sensing can be applied to wetland monitoring and mapping. The benefits of Hyperion data for wetland monitoring have been studied at Hunter Wetland Park, NSW, Australia. After exclusion of bad bands from the original data, an atmospheric correction model was applied to minimize atmospheric effects and to retrieve apparent surface reflectance for different land covers. The large data dimensionality was reduced by the Forward Minimum Noise Fraction (MNF) algorithm. The first 32 MNF bands were found to contain more than 80% of the information in the image. The Pixel Purity Index (PPI) algorithm worked properly to extract pure pixels for water, built-up area and three vegetation types: Casuarina sp., Phragmitis sp. and green grass. The results showed that it was challenging to extract extremely pure pixels for Sporobolus and Sarcocornia from the data because of the coarse resolution (30 m) and the small patch size (<3 m) of that vegetation on the ground. Spectral Angle Mapper classified the image into five classes (Casuarina, Saltmarsh (Phragmitis), Green grass, Water and Built-up area) with 43.55% accuracy. This classification also failed to classify Sporobolus as a distinct group, for the same reason. High spatial resolution airborne hyperspectral data and a new study site with bigger patches of Sporobolus and Sarcocornia are proposed to overcome the issue.
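The Spectral Angle Mapper step can be illustrated with a short Python sketch. It treats each spectrum as a vector and assigns a pixel to the reference (endmember) spectrum with the smallest angle. The four-band spectra and the 0.1 rad rejection threshold below are made-up values, not Hyperion data or the study's settings.

```python
import numpy as np

def spectral_angle(pixels, reference):
    """Angle in radians between each pixel spectrum and a reference spectrum.

    pixels: array of shape (n_pixels, n_bands); reference: shape (n_bands,).
    """
    dot = pixels @ reference
    norms = np.linalg.norm(pixels, axis=1) * np.linalg.norm(reference)
    return np.arccos(np.clip(dot / norms, -1.0, 1.0))

def sam_classify(pixels, endmembers, max_angle=0.1):
    """Label each pixel with the index of the closest endmember by angle.

    Pixels whose smallest angle exceeds max_angle (radians) are labeled -1.
    """
    angles = np.stack([spectral_angle(pixels, e) for e in endmembers], axis=1)
    labels = angles.argmin(axis=1)
    labels[angles.min(axis=1) > max_angle] = -1
    return labels

# Toy 4-band spectra (hypothetical values).
endmembers = np.array([[0.05, 0.04, 0.03, 0.02],    # water-like
                       [0.04, 0.08, 0.05, 0.45]])   # vegetation-like
pixels = np.array([[0.10, 0.08, 0.06, 0.04],        # brighter water
                   [0.02, 0.04, 0.025, 0.225],      # darker vegetation
                   [0.30, 0.30, 0.30, 0.30]])       # flat spectrum, no match
labels = sam_classify(pixels, endmembers)
```

Because the angle ignores vector length, the brighter and darker pixels still match their endmembers. A 30 m pixel that mixes several small vegetation patches yields a spectrum between endmembers, which is consistent with the difficulty the abstract reports for Sporobolus and Sarcocornia.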
Chastek, Benjamin J; Oleen-Burkey, Merrikay; Lopez-Bresnahan, Maria V
2010-01-01
Relapse is a common measure of disease activity in relapsing-remitting multiple sclerosis (MS). The objective of this study was to test the content validity of an operational algorithm for detecting relapse in claims data. A claims-based relapse detection algorithm was tested by comparing its detection rate over a 1-year period with relapses identified based on medical chart review. According to the algorithm, MS patients in a US healthcare claims database who had either (1) a primary claim for MS during hospitalization or (2) a corticosteroid claim following an MS-related outpatient visit were designated as having a relapse. Patient charts were examined for explicit indication of relapse or care suggestive of relapse. Positive and negative predictive values were calculated. Medical charts were reviewed for 300 MS patients, half of whom had a relapse according to the algorithm. The claims-based criteria correctly classified 67.3% of patients with relapses (positive predictive value) and 70.0% of patients without relapses (negative predictive value; kappa 0.373; p < 0.001). Alternative algorithms did not improve on the predictive value of the operational algorithm. Limitations of the algorithm include a lack of differentiation between relapsing-remitting MS and other types, and the fact that it does not incorporate measures of function and disability. The claims-based algorithm appeared to successfully detect moderate-to-severe MS relapse. This validated definition can be applied to future claims-based MS studies.
NASA Astrophysics Data System (ADS)
Liu, WenXiang; Mou, WeiHua; Wang, FeiXue
2012-03-01
With the introduction of triple-frequency signals in GNSS, multi-frequency ionosphere correction technology has developed rapidly. References indicate that the triple-frequency second order ionosphere correction is worse than the dual-frequency first order ionosphere correction because of the larger noise amplification factor. On the assumption that the variances of the three frequency pseudoranges were equal, other references presented the triple-frequency first order ionosphere correction, which proved worse or better than the dual-frequency first order correction in different situations. In practice, the PN code rate, carrier-to-noise ratio, DLL parameters and multipath effect of each frequency are not the same, so the pseudorange variances of the three frequencies are unequal. Under this consideration, a new unequal-weighted triple-frequency first order ionosphere correction algorithm, which minimizes the variance of the pseudorange ionosphere-free combination, is proposed in this paper. It is found that conventional dual-frequency first-order correction algorithms and the equal-weighted triple-frequency first order correction algorithm are special cases of the new algorithm. A new pseudorange variance estimation method based on the three carrier combination is also introduced. Theoretical analysis shows that the new algorithm is optimal. The experiment with COMPASS G3 satellite observations demonstrates that the ionosphere-free pseudorange combination variance of the new algorithm is smaller than that of traditional multi-frequency correction algorithms.
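One standard way to formulate a minimum-variance first-order ionosphere-free combination is as a small constrained least-squares problem; the Python sketch below follows that formulation and is not taken from the paper. The weights must sum to one (so the geometric range is preserved) and their sum weighted by 1/f^2 must vanish (so the first-order ionospheric delay cancels). The function name and the noise values are assumptions; the carrier frequencies are those of the BeiDou (COMPASS) B1, B2 and B3 signals.

```python
import numpy as np

def iono_free_weights(freqs_hz, sigmas_m):
    """Minimum-variance weights a_i for a first-order ionosphere-free combination.

    Minimizes sum(a_i^2 * sigma_i^2) subject to
      sum(a_i) = 1           (geometry preserved)
      sum(a_i / f_i^2) = 0   (first-order ionospheric delay cancelled)
    Returns the weights and the variance of the combined pseudorange.
    """
    f = np.asarray(freqs_hz, dtype=float)
    var = np.asarray(sigmas_m, dtype=float) ** 2
    # Scale 1/f^2 by f[0]^2 so both constraint rows have similar magnitude.
    C = np.vstack([np.ones_like(f), (f[0] / f) ** 2])
    d = np.array([1.0, 0.0])
    W_inv = np.diag(1.0 / var)
    # Closed-form solution of the equality-constrained quadratic problem.
    a = W_inv @ C.T @ np.linalg.solve(C @ W_inv @ C.T, d)
    return a, float(a @ (var * a))

# BeiDou (COMPASS) B1, B2 and B3 carrier frequencies in Hz.
freqs = [1561.098e6, 1207.14e6, 1268.52e6]
sigmas = [0.30, 0.30, 0.10]   # hypothetical pseudorange noise (m)

a3, var3 = iono_free_weights(freqs, sigmas)           # triple frequency
a2, var2 = iono_free_weights(freqs[:2], sigmas[:2])   # dual frequency (B1, B2)
```

With only two frequencies the constraints fix the weights completely, which reproduces the conventional dual-frequency combination. With three frequencies one degree of freedom remains, and it is spent on reducing noise, so the triple-frequency variance can never exceed that of any dual-frequency pair.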
An efficient robust sound classification algorithm for hearing aids.
Nordqvist, Peter; Leijon, Arne
2004-06-01
An efficient robust sound classification algorithm based on hidden Markov models is presented. The system would enable a hearing aid to automatically change its behavior for differing listening environments according to the user's preferences. This work attempts to distinguish between three listening environment categories: speech in traffic noise, speech in babble, and clean speech, regardless of the signal-to-noise ratio. The classifier uses only the modulation characteristics of the signal. The classifier ignores the absolute sound pressure level and the absolute spectrum shape, resulting in an algorithm that is robust against irrelevant acoustic variations. The measured classification hit rate was 96.7%-99.5% when the classifier was tested with sounds representing one of the three environment categories included in the classifier. False-alarm rates were 0.2%-1.7% in these tests. The algorithm is robust and efficient and consumes a small amount of instructions and memory. It is fully possible to implement the classifier in a DSP-based hearing instrument.
Atmospheric Correction Algorithm for Hyperspectral Remote Sensing of Ocean Color from Space
2000-02-20
Existing atmospheric correction algorithms for multichannel remote sensing of ocean color from space were designed for retrieving water-leaving...atmospheric correction algorithm for hyperspectral remote sensing of ocean color with the near-future Coastal Ocean Imaging Spectrometer. The algorithm uses
Walking Objectively Measured: Classifying Accelerometer Data with GPS and Travel Diaries
Kang, Bumjoon; Moudon, Anne V.; Hurvitz, Philip M.; Reichley, Lucas; Saelens, Brian E.
2013-01-01
Purpose: This study developed and tested an algorithm to classify accelerometer data as walking or non-walking using either GPS or travel diary data within a large sample of adults under free-living conditions. Methods: Participants wore an accelerometer and a GPS unit, and concurrently completed a travel diary for 7 consecutive days. Physical activity (PA) bouts were identified using accelerometry count sequences. PA bouts were then classified as walking or non-walking based on a decision-tree algorithm consisting of 7 classification scenarios. Algorithm reliability was examined relative to two independent analysts’ classification of a 100-bout verification sample. The algorithm was then applied to the entire set of PA bouts. Results: The 706 participants (mean age 51 years, 62% female, 80% non-Hispanic white, 70% college graduate or higher) yielded 4,702 person-days of data and had a total of 13,971 PA bouts. The algorithm showed a mean agreement of 95% with the independent analysts. It classified physical activity into 8,170 (58.5%) walking bouts and 5,337 (38.2%) non-walking bouts; 464 (3.3%) bouts were not classified for lack of GPS and diary data. Nearly 70% of the walking bouts and 68% of the non-walking bouts were classified using only the objective accelerometer and GPS data. Travel diary data helped classify 30% of all bouts with no GPS data. The mean duration of PA bouts classified as walking was 15.2 min (SD=12.9). On average, participants had 1.7 walking bouts and 25.4 total walking minutes per day. Conclusions: GPS and travel diary information can be helpful in classifying most accelerometer-derived PA bouts into walking or non-walking behavior. PMID:23439414
Omran, Dalia Abd El Hamid; Awad, AbuBakr Hussein; Mabrouk, Mahasen Abd El Rahman; Soliman, Ahmad Fouad; Aziz, Ashraf Omar Abdel
2015-01-01
Hepatocellular carcinoma (HCC) is the second most common malignancy in Egypt. Data mining is a method of predictive analysis which can explore tremendous volumes of information to discover hidden patterns and relationships. Our aim here was to develop a non-invasive algorithm for prediction of HCC. Such an algorithm should be economical, reliable, easy to apply and acceptable to domain experts. This cross-sectional study enrolled 315 patients with hepatitis C virus (HCV) related chronic liver disease (CLD): 135 with HCC, 116 cirrhotic patients without HCC and 64 patients with chronic hepatitis C. Using data mining analysis, we constructed a decision tree learning algorithm to predict HCC. The decision tree algorithm was able to predict HCC with recall (sensitivity) of 83.5% and precision (specificity) of 83.3% using only routine data. The correctly classified instances were 259 (82.2%), and the incorrectly classified instances were 56 (17.8%). Out of 29 attributes, serum alpha fetoprotein (AFP), with an optimal cutoff value of ≥50.3 ng/ml, was selected as the best predictor of HCC. To a lesser extent, male sex, presence of cirrhosis, AST>64U/L, and ascites were variables associated with HCC. Data mining analysis allows discovery of hidden patterns and enables the development of models to predict HCC, utilizing routine data as an alternative to CT and liver biopsy. This study has highlighted a new cutoff for AFP (≥50.3 ng/ml). Presence of a score of >2 risk variables (out of 5) can successfully predict HCC with a sensitivity of 96% and specificity of 82%.
New inverse synthetic aperture radar algorithm for translational motion compensation
NASA Astrophysics Data System (ADS)
Bocker, Richard P.; Henderson, Thomas B.; Jones, Scott A.; Frieden, B. R.
1991-10-01
Inverse synthetic aperture radar (ISAR) is an imaging technique that shows real promise in classifying airborne targets in real time under all weather conditions. Over the past few years a large body of ISAR data has been collected and considerable effort has been expended to develop algorithms to form high-resolution images from this data. One important goal of workers in this field is to develop software that will do the best job of imaging under the widest range of conditions. The success of classifying targets using ISAR is predicated upon forming highly focused radar images of these targets. Efforts to develop highly focused imaging computer software have been challenging, mainly because the imaging depends on and is affected by the motion of the target, which in general is not precisely known. Specifically, the target generally has both rotational motion about some axis and translational motion as a whole with respect to the radar. The slant-range translational motion kinematic quantities must be first accurately estimated from the data and compensated before the image can be focused. Following slant-range motion compensation, the image is further focused by determining and correcting for target rotation. The use of the burst derivative measure is proposed as a means to improve the computational efficiency of currently used ISAR algorithms. The use of this measure in motion compensation ISAR algorithms for estimating the slant-range translational motion kinematic quantities of an uncooperative target is described. Preliminary tests have been performed on simulated as well as actual ISAR data using both a Sun 4 workstation and a parallel processing transputer array. Results indicate that the burst derivative measure gives significant improvement in processing speed over the traditional entropy measure now employed.
Development of PET projection data correction algorithm
NASA Astrophysics Data System (ADS)
Bazhanov, P. V.; Kotina, E. D.
2017-12-01
Positron emission tomography (PET) is a modern nuclear medicine method used to examine metabolism and the function of internal organs. The method allows diseases to be diagnosed at early stages. Mathematical algorithms are widely used not only for image reconstruction but also for PET data correction. This paper considers the implementation of random coincidence and scatter correction algorithms, as well as an algorithm for modeling PET projection data acquisition to verify the corrections.
Retinal biometrics based on Iterative Closest Point algorithm.
Hatanaka, Yuji; Tajima, Mikiya; Kawasaki, Ryo; Saito, Koko; Ogohara, Kazunori; Muramatsu, Chisako; Sunayama, Wataru; Fujita, Hiroshi
2017-07-01
The pattern of blood vessels in the eye is unique to each person and rarely changes over time. Therefore, it is well known that retinal blood vessels are useful for biometrics. This paper describes a biometrics method using the Jaccard similarity coefficient (JSC) based on blood vessel regions in retinal image pairs. The retinal image pairs were roughly matched by the centers of their optic discs. The image pairs were then aligned using the Iterative Closest Point algorithm based on detailed blood vessel skeletons. For registration, a perspective transform was applied to the retinal images. Finally, the pairs were classified as either correct or incorrect using the JSC of the blood vessel region in the image pairs. The proposed method was applied to temporal retinal images, which were obtained in 2009 (695 images) and 2013 (87 images). The 87 images acquired in 2013 were all from persons already examined in 2009. The accuracy of the proposed method reached 100%.
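As a rough illustration of the final classification step, the Jaccard similarity coefficient of two aligned binary vessel masks is the size of their intersection divided by the size of their union. The sketch below assumes the masks are already registered and given as nested lists of 0/1 values; the function names and the 0.5 decision threshold are hypothetical and are not taken from the paper.

```python
def jaccard(mask_a, mask_b):
    """Jaccard similarity coefficient of two equally sized binary masks."""
    if len(mask_a) != len(mask_b) or any(
        len(ra) != len(rb) for ra, rb in zip(mask_a, mask_b)
    ):
        raise ValueError("masks must have the same shape")
    inter = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    # Assumption: two empty masks are treated as identical.
    return inter / union if union else 1.0


def same_person(mask_a, mask_b, threshold=0.5):
    """Accept the pair when the overlap reaches a (hypothetical) threshold."""
    return jaccard(mask_a, mask_b) >= threshold
```

In practice the threshold would have to be chosen from labelled image pairs; the value used here is only a placeholder.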
Detection of text strings from mixed text/graphics images
NASA Astrophysics Data System (ADS)
Tsai, Chien-Hua; Papachristou, Christos A.
2000-12-01
A robust system for separating text strings from mixed text/graphics images is presented. Based on a union-find (region growing) strategy, the algorithm is able to separate text from graphics and adapts to changes in document type, language category (e.g., English, Chinese and Japanese), text font style and size, and text string orientation within digital images. In addition, it tolerates the skew that commonly occurs in documents without requiring skew correction prior to discrimination, whereas methods such as projection profiles or run-length coding are not always suitable under this condition. The method has been tested on a variety of printed documents from different origins with one common set of parameters, and its computational efficiency is demonstrated on several test images from the evaluation.
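The abstract does not give implementation details, so the following is only a generic sketch of how a union-find structure can grow regions in a binary image: neighbouring foreground pixels are merged into one set, and each set is summarised by its bounding box. The 4-connectivity and the names `UnionFind` and `component_boxes` are assumptions made for illustration.

```python
class UnionFind:
    """Disjoint-set structure with path compression."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:  # compress the path just walked
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def component_boxes(image):
    """Return sorted bounding boxes (r0, c0, r1, c1) of 4-connected regions."""
    uf = UnionFind()
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            if not value:
                continue
            uf.find((r, c))  # register the pixel
            if r > 0 and image[r - 1][c]:
                uf.union((r - 1, c), (r, c))
            if c > 0 and row[c - 1]:
                uf.union((r, c - 1), (r, c))
    boxes = {}
    for r, c in list(uf.parent):
        root = uf.find((r, c))
        r0, c0, r1, c1 = boxes.get(root, (r, c, r, c))
        boxes[root] = (min(r0, r), min(c0, c), max(r1, r), max(c1, c))
    return sorted(boxes.values())
```

A text/graphics separator could then inspect the size and aspect ratio of each box, although the actual criteria used by the authors are not stated in the abstract.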
NASA Astrophysics Data System (ADS)
Pacheco-Vega, Arturo
2016-09-01
In this work a new set of correlation equations is developed and introduced to accurately describe the thermal performance of compact heat exchangers with possible condensation. The feasible operating conditions for the thermal system correspond to dry-surface, dropwise condensation, and film condensation. Using a prescribed form for each condition, a global regression analysis for the best-fit correlation to experimental data is carried out with a simulated annealing optimization technique. The experimental data were taken from the literature and algorithmically classified into three groups (related to the possible operating conditions) with a previously introduced Gaussian-mixture-based methodology. Prior to their use in the analysis, the correct data classification was assessed and confirmed via artificial neural networks. Predictions from the correlations obtained for the different conditions are within the uncertainty of the experiments and substantially more accurate than those commonly used.
An efficient algorithm for automatic phase correction of NMR spectra based on entropy minimization
NASA Astrophysics Data System (ADS)
Chen, Li; Weng, Zhiqiang; Goh, LaiYoong; Garland, Marc
2002-09-01
A new algorithm for automatic phase correction of NMR spectra based on entropy minimization is proposed. The optimal zero-order and first-order phase corrections for an NMR spectrum are determined by minimizing entropy. The objective function is constructed using a Shannon-type information entropy measure. Entropy is defined as the normalized derivative of the NMR spectral data. The algorithm has been successfully applied to experimental 1H NMR spectra. The results of automatic phase correction are found to be comparable to, or perhaps better than, manual phase correction. The advantages of this automatic phase correction algorithm include its simple mathematical basis and the straightforward, reproducible, and efficient optimization procedure. The algorithm is implemented in the Matlab program ACME (Automated phase Correction based on Minimization of Entropy).
Automated validation of patient safety clinical incident classification: macro analysis.
Gupta, Jaiprakash; Patrick, Jon
2013-01-01
Patient safety is the buzzword in healthcare. The Incident Information Management System (IIMS) is electronic software that stores narratives of clinical mishaps at the places where patients are treated. It is estimated that in one state alone over one million electronic text documents are available in IIMS. In this paper we investigate the data density available in the fields entered to notify an incident and the validity of the built-in classification used by clinicians to categorize the incidents. Waikato Environment for Knowledge Analysis (WEKA) software was used to test the classes. Four statistical classifiers based on the J48, Naïve Bayes (NB), Naïve Bayes Multinomial (NBM) and Support Vector Machine with radial basis function (SVM_RBF) algorithms were used to validate the classes. The data pool was 10,000 clinical incidents drawn from 7 hospitals in one state in Australia. In the first part of the study 1000 clinical incidents were selected to determine the type and number of fields worth investigating, and in the second part another 5448 clinical incidents were randomly selected to validate 13 clinical incident types. Results show that 74.6% of the cells were empty and only 23 fields had content over 70% of the time. The percentage correctly classified by the four algorithms ranged from 42% to 49% using the categorical dataset, from 65% to 77% using the free-text dataset, and from 72% to 79% using both datasets. The kappa statistic ranged from 0.36 to 0.4 for categorical data, from 0.61 to 0.74 for free text, and from 0.67 to 0.77 for both datasets. Similar increases in performance across the 3 experiments were noted for true positive rate, precision, F-measure and area under the curve (AUC) of the receiver operating characteristic (ROC). The study demonstrates that only 14 of 73 fields in IIMS have data that is usable for machine learning experiments. Irrespective of the type of algorithm used, performance was better when all datasets were used. The NBM classifier showed the best performance. We think the classifier can be improved further by reclassifying the most confused classes, and there is scope to apply text mining tools to patient safety classifications.
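For readers unfamiliar with the kappa statistic reported above, the sketch below computes Cohen's kappa from a confusion matrix: observed agreement corrected for the agreement expected by chance from the row and column totals. The function name and the example matrix are illustrative assumptions and do not reproduce the study's data.

```python
def cohen_kappa(confusion):
    """Cohen's kappa for a square confusion matrix (rows: true, cols: predicted)."""
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(k)) / n
    row_tot = [sum(row) for row in confusion]
    col_tot = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_tot[i] * col_tot[i] for i in range(k)) / (n * n)
    if expected == 1.0:
        return 1.0  # degenerate case: only one class is present
    return (observed - expected) / (1.0 - expected)


# Hypothetical two-class example: 70% raw agreement, 50% expected by chance.
example = [[20, 5], [10, 15]]
kappa = cohen_kappa(example)
```

Here the raw agreement is 0.7 but kappa is only about 0.4, which shows why kappa values are lower than the corresponding percentages of correctly classified cases.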
Gene-Based Multiclass Cancer Diagnosis with Class-Selective Rejections
Jrad, Nisrine; Grall-Maës, Edith; Beauseroy, Pierre
2009-01-01
Supervised learning of microarray data has received much attention in recent years. Multiclass cancer diagnosis based on selected gene profiles is used as an adjunct to clinical diagnosis. However, supervised diagnosis may hinder patient care, add expense or confound a result. To avoid such misleading outcomes, a multiclass cancer diagnosis with class-selective rejection is proposed. It rejects some patients from one, some, or all classes in order to ensure higher reliability while reducing time and expense costs. Moreover, this classifier takes into account asymmetric penalties dependent on each class and on each wrong or partially correct decision. It is based on ν-1-SVM coupled with its regularization path and minimizes a general loss function defined in the class-selective rejection scheme. State-of-the-art multiclass algorithms can be considered a particular case of the proposed algorithm in which the number of decisions is given by the classes and the loss function is defined by the Bayesian risk. Two experiments are carried out in the Bayesian and the class-selective rejection frameworks. Five gene-selected datasets are used to assess the performance of the proposed method. Results are discussed and accuracies are compared with those computed by the Naive Bayes, Nearest Neighbor, Linear Perceptron, Multilayer Perceptron, and Support Vector Machines classifiers. PMID:19584932
AUTOCLASSIFICATION OF THE VARIABLE 3XMM SOURCES USING THE RANDOM FOREST MACHINE LEARNING ALGORITHM
DOE Office of Scientific and Technical Information (OSTI.GOV)
Farrell, Sean A.; Murphy, Tara; Lo, Kitty K., E-mail: s.farrell@physics.usyd.edu.au
In the current era of large surveys and massive data sets, autoclassification of astrophysical sources using intelligent algorithms is becoming increasingly important. In this paper we present the catalog of variable sources in the Third XMM-Newton Serendipitous Source catalog (3XMM) autoclassified using the Random Forest machine learning algorithm. We used a sample of manually classified variable sources from the second data release of the XMM-Newton catalogs (2XMMi-DR2) to train the classifier, obtaining an accuracy of ∼92%. We also evaluated the effectiveness of identifying spurious detections using a sample of spurious sources, achieving an accuracy of ∼95%. Manual investigation of a random sample of classified sources confirmed these accuracy levels and showed that the Random Forest machine learning algorithm is highly effective at automatically classifying 3XMM sources. Here we present the catalog of classified 3XMM variable sources. We also present three previously unidentified unusual sources that were flagged as outlier sources by the algorithm: a new candidate supergiant fast X-ray transient, a 400 s X-ray pulsar, and an eclipsing 5 hr binary system coincident with a known Cepheid.
Evaluation of hybrids algorithms for mass detection in digitalized mammograms
NASA Astrophysics Data System (ADS)
Cordero, José; Garzón Reyes, Johnson
2011-01-01
Breast cancer remains a significant public health problem, and early detection of lesions can increase the chances of successful medical treatment. Mammography is an imaging modality that is effective for the early diagnosis of abnormalities: the image of the mammary gland is obtained with low-dose X-rays, which allows a tumor or circumscribed mass to be detected two to three years before it becomes clinically palpable, and it is the only method that has so far been shown to reduce breast cancer mortality. In this paper three hybrid algorithms for circumscribed mass detection in digitized mammograms are evaluated. The first stage is a review of the enhancement and segmentation techniques used in processing mammographic images. Shape filtering was then applied to the resulting regions. The surviving regions were processed by means of a Bayesian filter, for which the feature vector of the classifier was constructed from a small number of measurements. The implemented algorithms were then evaluated using ROC curves on 40 test images: 20 normal images and 20 images with circumscribed lesions. Finally, the advantages and disadvantages of each algorithm in correctly detecting a lesion are discussed.
Automated target classification in high resolution dual frequency sonar imagery
NASA Astrophysics Data System (ADS)
Aridgides, Tom; Fernández, Manuel
2007-04-01
An improved computer-aided-detection / computer-aided-classification (CAD/CAC) processing string has been developed. The classified objects of 2 distinct strings are fused using the classification confidence values and their expansions as features, and using "summing" or log-likelihood-ratio-test (LLRT) based fusion rules. The utility of the overall processing strings and their fusion was demonstrated with new high-resolution dual frequency sonar imagery. Three significant fusion algorithm improvements were made. First, a nonlinear 2nd order (Volterra) feature LLRT fusion algorithm was developed. Second, a Box-Cox nonlinear feature LLRT fusion algorithm was developed. The Box-Cox transformation consists of raising the features to a to-be-determined power. Third, a repeated application of a subset feature selection / feature orthogonalization / Volterra feature LLRT fusion block was utilized. It was shown that cascaded Volterra feature LLRT fusion of the CAD/CAC processing strings outperforms summing, baseline single-stage Volterra and Box-Cox feature LLRT algorithms, yielding significant improvements over the best single CAD/CAC processing string results, and providing the capability to correctly call the majority of targets while maintaining a very low false alarm rate. Additionally, the robustness of cascaded Volterra feature fusion was demonstrated, by showing that the algorithm yields similar performance with the training and test sets.
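The abstract describes the Box-Cox transformation as raising the features to a to-be-determined power. The sketch below uses the standard textbook form, which reduces to the logarithm when the power is zero. Whether the authors used exactly this normalised form is an assumption, and the function names are hypothetical.

```python
import math


def box_cox(x, lam):
    """Standard Box-Cox transform of a positive value x with power lam."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive inputs")
    if lam == 0:
        return math.log(x)  # limiting case as lam -> 0
    return (x ** lam - 1.0) / lam


def transform_features(features, lam):
    """Apply the same power to every feature before fusion."""
    return [box_cox(value, lam) for value in features]
```

In a fusion pipeline the power would typically be tuned on training data, for example by a simple search over candidate values.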
Dobson, Ruaraidh; Semple, Sean
2018-06-18
Second-hand smoke (SHS) at home is a target for public health interventions, such as air quality feedback interventions using low-cost particle monitors. However, these monitors also detect fine particles generated from non-SHS sources. The Dylos DC1700 reports particle counts in the coarse and fine size ranges. As tobacco smoke produces far more fine particles than coarse ones, and tobacco is generally the greatest source of particulate pollution in a smoking home, the ratio of coarse to fine particles may provide a useful method to identify the presence of SHS in homes. An algorithm was developed to differentiate smoking from smoke-free homes. Particle concentration data from 116 smoking homes and 25 non-smoking homes were used to test this algorithm. The algorithm correctly classified the smoking status of 135 of the 141 homes (96%), comparing favourably with a test of mean mass concentration. Applying this algorithm to Dylos particle count measurements may help identify the presence of SHS in homes or other indoor environments. Future research should adapt it to detect individual smoking periods within a 24 h or longer measurement period. Copyright © 2018 Elsevier Inc. All rights reserved.
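The idea of separating homes by their coarse-to-fine particle ratio can be sketched as below. The abstract does not state the decision rule, so the cut-off `max_ratio` and the function names are purely hypothetical; only the final accuracy check (135 of 141 homes) comes from the text.

```python
def coarse_fine_ratio(coarse_counts, fine_counts):
    """Ratio of total coarse to total fine particle counts over a recording."""
    total_fine = sum(fine_counts)
    if total_fine == 0:
        raise ValueError("no fine particles recorded")
    return sum(coarse_counts) / total_fine


def looks_like_smoking_home(coarse_counts, fine_counts, max_ratio=0.05):
    """Flag a home when fine particles dominate (hypothetical cut-off)."""
    return coarse_fine_ratio(coarse_counts, fine_counts) < max_ratio


# Reported result: 135 of 141 homes correctly classified.
reported_accuracy = 135 / 141
```

The reported figure evaluates to roughly 0.957, consistent with the 96% quoted in the abstract.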
Prosthetic joint infection development of an evidence-based diagnostic algorithm.
Mühlhofer, Heinrich M L; Pohlig, Florian; Kanz, Karl-Georg; Lenze, Ulrich; Lenze, Florian; Toepfer, Andreas; Kelch, Sarah; Harrasser, Norbert; von Eisenhart-Rothe, Rüdiger; Schauwecker, Johannes
2017-03-09
Increasing rates of prosthetic joint infection (PJI) have presented challenges for general practitioners, orthopedic surgeons and the health care system in recent years. The diagnosis of PJI is complex; multiple diagnostic tools are used in the attempt to correctly diagnose PJI. Evidence-based algorithms can help to identify PJI using standardized diagnostic steps. We reviewed relevant publications between 1990 and 2015 using a systematic literature search in MEDLINE and PUBMED. The selected search results were then classified into levels of evidence. The keywords were prosthetic joint infection, biofilm, diagnosis, sonication, antibiotic treatment, implant-associated infection, Staph. aureus, rifampicin, implant retention, pcr, maldi-tof, serology, synovial fluid, c-reactive protein level, total hip arthroplasty (THA), total knee arthroplasty (TKA) and combinations of these terms. From an initial 768 publications, 156 publications were stringently reviewed. Publications with class I-III recommendations (EAST) were considered. We developed an algorithm for the diagnostic approach to display the complex diagnosis of PJI in a clear and logically structured process according to ISO 5807. The evidence-based standardized algorithm combines modern clinical requirements and evidence-based treatment principles. The algorithm provides a detailed transparent standard operating procedure (SOP) for diagnosing PJI. Thus, consistently high, examiner-independent process quality is assured to meet the demands of modern quality management in PJI diagnosis.
Hip and Wrist Accelerometer Algorithms for Free-Living Behavior Classification.
Ellis, Katherine; Kerr, Jacqueline; Godbole, Suneeta; Staudenmayer, John; Lanckriet, Gert
2016-05-01
Accelerometers are a valuable tool for objective measurement of physical activity (PA). Wrist-worn devices may improve compliance over standard hip placement, but more research is needed to evaluate their validity for measuring PA in free-living settings. Traditional cut-point methods for accelerometers can be inaccurate and need testing in free living with wrist-worn devices. In this study, we developed and tested the performance of machine learning (ML) algorithms for classifying PA types from both hip and wrist accelerometer data. Forty overweight or obese women (mean age = 55.2 ± 15.3 yr; BMI = 32.0 ± 3.7) wore two ActiGraph GT3X+ accelerometers (right hip, nondominant wrist; ActiGraph, Pensacola, FL) for seven free-living days. Wearable cameras captured ground truth activity labels. A classifier consisting of a random forest and hidden Markov model classified the accelerometer data into four activities (sitting, standing, walking/running, and riding in a vehicle). Free-living wrist and hip ML classifiers were compared with each other, with traditional accelerometer cut points, and with an algorithm developed in a laboratory setting. The ML classifier obtained average values of 89.4% and 84.6% balanced accuracy over the four activities using the hip and wrist accelerometer, respectively. In our data set with average values of 28.4 min of walking or running per day, the ML classifier predicted average values of 28.5 and 24.5 min of walking or running using the hip and wrist accelerometer, respectively. Intensity-based cut points and the laboratory algorithm significantly underestimated walking minutes. Our results demonstrate the superior performance of our PA-type classification algorithm, particularly in comparison with traditional cut points. Although the hip algorithm performed better, additional compliance achieved with wrist devices might justify using a slightly lower performing algorithm.
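Balanced accuracy, the metric quoted above, is the mean of the per-class recalls, so rarely occurring activities count as much as common ones. The sketch below computes it from a confusion matrix; the function name and the two-class example are illustrative assumptions, not the study's data.

```python
def balanced_accuracy(confusion):
    """Mean per-class recall (rows: true class, cols: predicted class)."""
    recalls = []
    for i, row in enumerate(confusion):
        total = sum(row)
        if total:  # skip classes that never occur in the ground truth
            recalls.append(row[i] / total)
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced example: plain accuracy is 95/110 (about 0.86),
# but the rare class is recognised only half of the time.
example = [[90, 10], [5, 5]]
score = balanced_accuracy(example)
```

In this example the balanced accuracy is 0.7, noticeably lower than the plain accuracy, because the minority class contributes equally to the average.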
Hatch, Kenneth D.
2012-01-01
With no sufficient screening test for ovarian cancer, a method to evaluate the ovarian disease state quickly and nondestructively is needed. The authors have applied a wide-field spectral imager to freshly resected ovaries of 30 human patients in a study believed to be the first of its magnitude. Endogenous fluorescence was excited with 365-nm light and imaged in eight emission bands collectively covering the 400- to 640-nm range. Linear discriminant analysis was used to classify all image pixels and generate diagnostic maps of the ovaries. Training the classifier with previously collected single-point autofluorescence measurements of a spectroscopic probe enabled this novel classification. The process by which probe-collected spectra were transformed for comparison with imager spectra is described. Sensitivity of 100% and specificity of 51% were obtained in classifying normal and cancerous ovaries using autofluorescence data alone. Specificity increased to 69% when autofluorescence data were divided by green reflectance data to correct for spatial variation in tissue absorption properties. Benign neoplasm ovaries were also found to classify as nonmalignant using the same algorithm. Although applied ex vivo, the method described here appears useful for quick assessment of cancer presence in the human ovary. PMID:22502561
X-ray agricultural product inspection: segmentation and classification
NASA Astrophysics Data System (ADS)
Casasent, David P.; Talukder, Ashit; Lee, Ha-Woon
1997-09-01
Processing of real-time x-ray images of randomly oriented and touching pistachio nuts for product inspection is considered. We describe the image processing used to isolate individual nuts (segmentation). This involves a new watershed transform algorithm. Segmentation results on approximately 3000 x-ray (film) and real time x-ray (linescan) nut images were excellent (greater than 99.9% correct). Initial classification results on film images are presented that indicate that the percentage of infested nuts can be reduced to 1.6% of the crop with only 2% of the good nuts rejected; this performance is much better than present manual methods and other automated classifiers have achieved.
Artificial Neural Network applied to lightning flashes
NASA Astrophysics Data System (ADS)
Gin, R. B.; Guedes, D.; Bianchi, R.
2013-05-01
The development of video cameras has enabled scientists to study the behavior of lightning discharges with more precision. The main goal of this project is to create a system able to detect images of lightning discharges stored in videos and classify them using an Artificial Neural Network (ANN), implemented in the C language with the OpenCV libraries. The developed system can be split into two modules: a detection module and a classification module. The detection module uses OpenCV's computer vision libraries and image processing techniques to detect whether there are significant differences between frames in a sequence, indicating that something, not yet classified, has occurred. Whenever there is a significant difference between two consecutive frames, two main algorithms are used to analyze the frame image: brightness and shape algorithms. These algorithms detect both the shape and brightness of the event, removing irrelevant events such as birds, and also detect the exact position of relevant events, allowing the system to track them over time. The classification module uses a neural network to classify the relevant events as horizontal or vertical lightning, saves the event's images and calculates its number of discharges. The neural network was implemented using the backpropagation algorithm and was trained with 42 training images containing 57 lightning events (one image can contain more than one lightning event). The ANN was tested with one to five hidden layers, with up to 50 neurons each. The best configuration achieved a success rate of 95%, with one layer containing 20 neurons (33 test images with 42 events were used in this phase). This configuration was implemented in the developed system to analyze 20 video files containing 63 lightning discharges previously detected manually. Results showed that all the lightning discharges were detected, many irrelevant events were discarded, and each event's number of discharges was correctly computed. The neural network used in this project achieved a success rate of 90%. The videos used in this experiment were acquired by seven video cameras installed in São Bernardo do Campo, Brazil, that continuously recorded lightning events during the summer. The cameras were arranged in a 360-degree loop, recording all data at a time resolution of 33 ms. During this period, several convective storms were recorded.
Nonuniformity correction for an infrared focal plane array based on diamond search block matching.
Sheng-Hui, Rong; Hui-Xin, Zhou; Han-Lin, Qin; Rui, Lai; Kun, Qian
2016-05-01
In scene-based nonuniformity correction algorithms, ghosting artifacts and image blurring severely degrade the correction quality. In this paper, an improved algorithm based on the diamond search block matching algorithm and an adaptive learning rate is proposed. First, accurate transform pairs between two adjacent frames are estimated by the diamond search block matching algorithm. Then, based on the error between the corresponding transform pairs, the gradient descent algorithm is applied to update the correction parameters. During gradient descent, the local standard deviation and a threshold are used to control the learning rate and so avoid the accumulation of matching error. Finally, the nonuniformity correction is realized by a linear model with the updated correction parameters. The performance of the proposed algorithm is thoroughly studied with four real infrared image sequences. Experimental results indicate that the proposed algorithm can reduce the nonuniformity with fewer ghosting artifacts in moving areas and can also overcome the problem of image blurring in static areas.
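The update described above can be illustrated for a single detector pixel with a linear model, corrected = gain × raw + offset, whose two parameters are adjusted by gradient descent on the squared error. This is a simplified sketch: in the paper the target comes from the matched pixel in the adjacent frame, whereas here it is supplied directly, and the gating rule in `adaptive_lr`, its default values, and all function names are assumptions.

```python
def correct(raw, gain, offset):
    """Linear correction model: corrected = gain * raw + offset."""
    return gain * raw + offset


def adaptive_lr(local_std, base_lr=0.05, std_threshold=2.0):
    """Hypothetical gating: stop updating where the local std is too high."""
    return base_lr if local_std < std_threshold else 0.0


def lms_step(raw, target, gain, offset, lr):
    """One gradient-descent step on the squared error (corrected - target)^2."""
    err = correct(raw, gain, offset) - target
    return gain - lr * err * raw, offset - lr * err


def calibrate_pixel(samples, steps, local_std=0.0):
    """Cycle through (raw, target) samples and return the learned (gain, offset)."""
    gain, offset = 1.0, 0.0
    for step in range(steps):
        raw, target = samples[step % len(samples)]
        gain, offset = lms_step(raw, target, gain, offset, adaptive_lr(local_std))
    return gain, offset
```

For a pixel whose response is raw = 1.25 × scene + 0.2, repeated updates should drive the parameters towards gain 0.8 and offset -0.16, which invert that response.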
Optimal two-stage enrichment design correcting for biomarker misclassification.
Zang, Yong; Guo, Beibei
2018-01-01
The enrichment design is an important clinical trial design to detect the treatment effect of the molecularly targeted agent (MTA) in personalized medicine. Under this design, patients are stratified into marker-positive and marker-negative subgroups based on their biomarker statuses and only the marker-positive patients are enrolled into the trial and randomized to receive either the MTA or a standard treatment. As the biomarker plays a key role in determining the enrollment of the trial, a misclassification of the biomarker can induce substantial bias, undermine the integrity of the trial, and seriously affect the treatment evaluation. In this paper, we propose a two-stage optimal enrichment design that utilizes the surrogate marker to correct for the biomarker misclassification. The proposed design is optimal in the sense that it maximizes the probability of correctly classifying each patient's biomarker status based on the surrogate marker information. In addition, after analytically deriving the bias caused by the biomarker misclassification, we develop a likelihood ratio test based on the EM algorithm to correct for such bias. We conduct comprehensive simulation studies to investigate the operating characteristics of the optimal design and the results confirm the desirable performance of the proposed design.
Robust feature extraction for rapid classification of damage in composites
NASA Astrophysics Data System (ADS)
Coelho, Clyde K.; Reynolds, Whitney; Chattopadhyay, Aditi
2009-03-01
The ability to detect anomalies in signals from sensors is imperative for structural health monitoring (SHM) applications. Many of the candidate algorithms for these applications either require a lot of training examples or are very computationally inefficient for large sample sizes. The damage detection framework presented in this paper uses a combination of Linear Discriminant Analysis (LDA) along with Support Vector Machines (SVM) to obtain a computationally efficient classification scheme for rapid damage state determination. LDA was used for feature extraction of damage signals from piezoelectric sensors on a composite plate and these features were used to train the SVM algorithm in parts, reducing the computational intensity associated with the quadratic optimization problem that needs to be solved during training. SVM classifiers were organized into a binary tree structure to speed up classification, which also reduces the total training time required. This framework was validated on composite plates that were impacted at various locations. The results show that the algorithm was able to correctly predict the different impact damage cases in composite laminates using less than 21 percent of the total available training data after data reduction.
Testing and Validating Machine Learning Classifiers by Metamorphic Testing
Xie, Xiaoyuan; Ho, Joshua W. K.; Murphy, Christian; Kaiser, Gail; Xu, Baowen; Chen, Tsong Yueh
2011-01-01
Machine learning algorithms provide core functionality in many application domains, such as bioinformatics and computational linguistics. However, it is difficult to detect faults in such applications because often there is no “test oracle” to verify the correctness of the computed outputs. To help address software quality, in this paper we present a technique for testing the implementations of the machine learning classification algorithms that support such applications. Our approach is based on “metamorphic testing”, which has been shown to be effective in alleviating the oracle problem. Also presented are a case study on a real-world machine learning application framework and a discussion of how programmers implementing machine learning algorithms can avoid the common pitfalls discovered in our study. We also conduct mutation analysis and cross-validation, which reveal that our method is highly effective in killing mutants, and that observing the expected cross-validation result alone is not sufficient to detect faults in a supervised classification program. The effectiveness of metamorphic testing is further confirmed by the detection of real faults in a popular open-source classification program. PMID:21532969
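Metamorphic testing checks relations between the outputs of related runs instead of comparing against a known correct answer. The sketch below applies two relations to a small k-nearest-neighbour classifier: shifting every feature by a constant, and reversing the attribute order in both training and query data, should leave predictions unchanged. The classifier, the choice of relations, and the deliberately faulty variant are illustrative assumptions and are not taken from the paper's case study.

```python
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """k-nearest-neighbour vote using squared Euclidean distance."""
    order = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]


def buggy_knn_predict(train_x, train_y, query, k=3):
    """Hypothetical fault: only the first attribute enters the distance."""
    trimmed = [row[:1] for row in train_x]
    return knn_predict(trimmed, train_y, query[:1], k)


def shift(rows, delta=7):
    return [[value + delta for value in row] for row in rows]


def reverse_attributes(rows):
    return [row[::-1] for row in rows]


def check_relations(predict, train_x, train_y, queries):
    """Return (translation_ok, attribute_permutation_ok) for a predictor."""
    base = [predict(train_x, train_y, q) for q in queries]
    shifted = [predict(shift(train_x), train_y, q) for q in shift(queries)]
    permuted = [
        predict(reverse_attributes(train_x), train_y, q)
        for q in reverse_attributes(queries)
    ]
    return base == shifted, base == permuted


TRAIN_X = [[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]]
TRAIN_Y = ["a", "a", "a", "b", "b", "b"]
QUERIES = [[1, 1], [5, 4], [0, 6]]
```

With this data the correct classifier satisfies both relations, while the faulty variant violates the attribute-permutation relation, so the fault is exposed without knowing the true labels of the queries.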
Pion-Massicotte, Joëlle; Godbout, Roger; Savard, Pierre; Roy, Jean-François
2018-02-23
Portable polysomnography is often too complex and encumbering for recording sleep at home. We recorded sleep using a biometric shirt (electrocardiogram sensors, respiratory inductance plethysmography bands and an accelerometer) in 21 healthy young adults recorded in a sleep laboratory for two consecutive nights, together with standard polysomnography. Polysomnographic recordings were scored using standard methods. An algorithm was developed to classify the biometric shirt recordings into rapid eye movement sleep, non-rapid eye movement sleep and wake. The algorithm was based on breathing rate and heart rate variability, body movement, and included a correction for sleep onset and offset. The overall mean percentage of agreement between the two sets of recordings was 77.4%; when non-rapid eye movement and rapid eye movement sleep epochs were grouped together, it increased to 90.8%. The overall kappa coefficient was 0.53. Five of the seven sleep variables were significantly correlated. The findings of this pilot study indicate that this simple portable system could be used to estimate the general sleep pattern of young healthy adults. © 2018 European Sleep Research Society.
Belchansky, Gennady I.; Douglas, David C.
2000-01-01
This paper presents methods for classifying Arctic sea ice using both passive and active (2-channel) microwave imagery acquired by the Russian OKEAN 01 polar-orbiting satellite series. Methods and results are compared to sea ice classifications derived from nearly coincident Special Sensor Microwave Imager (SSM/I) and Advanced Very High Resolution Radiometer (AVHRR) image data of the Barents, Kara, and Laptev Seas. The Russian OKEAN 01 satellite data were collected over weekly intervals during October 1995 through December 1997. Methods are presented for calibrating, georeferencing and classifying the raw active radar and passive microwave OKEAN 01 data, and for correcting the OKEAN 01 microwave radiometer calibration wedge based on concurrent 37 GHz horizontal polarization SSM/I brightness temperature data. Sea ice type and ice concentration algorithms utilized OKEAN's two-channel radar and passive microwave data in a linear mixture model based on the measured values of brightness temperature and radar backscatter, together with a priori knowledge about the scattering parameters and natural emissivities of basic sea ice types. OKEAN 01 data and algorithms tended to classify lower concentrations of young or first-year sea ice when concentrations were less than 60%, and to produce higher concentrations of multi-year sea ice when concentrations were greater than 40%, when compared to estimates produced from SSM/I data. Overall, total sea ice concentration maps derived independently from OKEAN 01, SSM/I, and AVHRR satellite imagery were all highly correlated, with uniform biases, and mean differences in total ice concentration of less than four percent (sd<15%).
Code-based Diagnostic Algorithms for Idiopathic Pulmonary Fibrosis. Case Validation and Improvement.
Ley, Brett; Urbania, Thomas; Husson, Gail; Vittinghoff, Eric; Brush, David R; Eisner, Mark D; Iribarren, Carlos; Collard, Harold R
2017-06-01
Population-based studies of idiopathic pulmonary fibrosis (IPF) in the United States have been limited by reliance on diagnostic code-based algorithms that lack clinical validation. The objectives were to validate a well-accepted International Classification of Diseases, Ninth Revision, code-based algorithm for IPF using patient-level information and to develop a modified algorithm for IPF with enhanced predictive value. The traditional IPF algorithm was used to identify potential cases of IPF in the Kaiser Permanente Northern California adult population from 2000 to 2014. Incidence and prevalence were determined overall and by age, sex, and race/ethnicity. A validation subset of cases (n = 150) underwent expert medical record and chest computed tomography review. A modified IPF algorithm was then derived and validated to optimize positive predictive value. From 2000 to 2014, the traditional IPF algorithm identified 2,608 cases among 5,389,627 at-risk adults in the Kaiser Permanente Northern California population. Annual incidence was 6.8/100,000 person-years (95% confidence interval [CI], 6.1-7.7) and was higher in patients with older age, male sex, and white race. The positive predictive value of the IPF algorithm was only 42.2% (95% CI, 30.6 to 54.6%); sensitivity was 55.6% (95% CI, 21.2 to 86.3%). The corrected incidence was estimated at 5.6/100,000 person-years (95% CI, 2.6-10.3). A modified IPF algorithm had improved positive predictive value but reduced sensitivity compared with the traditional algorithm. A well-accepted International Classification of Diseases, Ninth Revision, code-based IPF algorithm performs poorly, falsely classifying many non-IPF cases as IPF and missing a substantial proportion of IPF cases. A modification of the IPF algorithm may be useful for future population-based studies of IPF.
NASA Astrophysics Data System (ADS)
Verma, Sneha K.; Chun, Sophia; Liu, Brent J.
2014-03-01
Pain is a common complication after spinal cord injury, with prevalence estimates ranging from 77% to 81%, and it strongly affects a patient's lifestyle and well-being. In the current clinical setting, paper-based forms are used to classify pain; however, the accuracy of diagnosis and the optimal management of pain depend largely on an expert reviewer, who is often unavailable because there are very few experts in this field. The need for a clinical decision support system that can be used by expert and non-expert clinicians has been cited in the literature, but such a system has not been developed. We have designed and developed a stand-alone tool for classifying pain type in spinal cord injury (SCI) patients using Bayesian decision theory. Various machine learning simulation methods were used to verify the algorithm on a pilot data set of 48 patients. The data set consists of the paper-based forms collected at the Long Beach VA clinic, with pain classification done by an expert in the field. Using WEKA as the machine learning tool, we tested on the 48-patient data set the hypothesis that the attributes collected on the forms and the pain location marked by patients have a significant impact on pain type classification. This tool will be integrated with an imaging informatics system to support a clinical study that will test the effectiveness of using Proton Beam radiotherapy for treating spinal cord injury (SCI) related neuropathic pain as an alternative to invasive surgical lesioning.
Multiclassifier system with hybrid learning applied to the control of bioprosthetic hand.
Kurzynski, Marek; Krysmann, Maciej; Trajdos, Pawel; Wolczowski, Andrzej
2016-02-01
In this paper the problem of recognition of the intended hand movements for the control of a bio-prosthetic hand is addressed. The proposed method is based on recognition of electromyographic (EMG) and mechanomyographic (MMG) biosignals using a multiclassifier system (MCS) working in a two-level structure with a dynamic ensemble selection (DES) scheme and original concepts of competence function. Additionally, feedback information coming from bioprosthesis sensors on the correct/incorrect classification is applied to the adjustment of the combining mechanism during MCS operation through adaptive tuning of the competences of base classifiers depending on their decisions. Three MCS systems operating in a decision tree structure and with different tuning algorithms are developed. In the MCS1 system, competence is uniformly allocated to each class belonging to the group indicated by the feedback signal. In the MCS2 system, the modification of competence depends on the node of the decision tree at which a correct/incorrect classification is made. In the MCS3 system, the randomized model of classifier and the concept of cross-competence are used in the tuning procedure. Experimental investigations on real data with a computer-simulated procedure for generating feedback signals are performed. In these investigations the classification accuracy of the MCS systems developed is compared and, furthermore, the MCS systems are evaluated with respect to the effectiveness of the procedure of tuning competence. The results obtained indicate that modification of competence of base classifiers during the working phase essentially improves performance of the MCS system and that this improvement depends on the MCS system and tuning method used. Copyright © 2015 Elsevier Ltd. All rights reserved.
ERIC Educational Resources Information Center
Beretvas, S. Natasha; Murphy, Daniel L.
2013-01-01
The authors assessed correct model identification rates of Akaike's information criterion (AIC), corrected criterion (AICC), consistent AIC (CAIC), Hannan and Quinn's information criterion (HQIC), and Bayesian information criterion (BIC) for selecting among cross-classified random effects models. Performance of default values for the 5…
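For readers unfamiliar with the five criteria named above, the sketch below evaluates their textbook forms from a maximized log-likelihood. It is an illustrative sketch, not the authors' procedure: the function name is hypothetical, and `n` is treated as a single sample size, which is itself an assumption for multilevel models.

```python
import math


def information_criteria(log_lik, k, n):
    """Textbook information criteria for a model with k estimated
    parameters fit to n observations (smaller is better)."""
    if n - k - 1 <= 0:
        raise ValueError("AICC requires n > k + 1")
    deviance = -2.0 * log_lik
    aic = deviance + 2 * k
    return {
        "AIC": aic,
        "AICC": aic + 2 * k * (k + 1) / (n - k - 1),    # small-sample correction
        "CAIC": deviance + k * (math.log(n) + 1),
        "HQIC": deviance + 2 * k * math.log(math.log(n)),
        "BIC": deviance + k * math.log(n),
    }


# Hypothetical comparison of two candidate models on the same data.
simple = information_criteria(log_lik=-100.0, k=3, n=50)
richer = information_criteria(log_lik=-98.5, k=6, n=50)
for name in simple:
    better = "simple" if simple[name] < richer[name] else "richer"
    print(f"{name}: simple={simple[name]:.2f} richer={richer[name]:.2f} -> {better}")
```

The criteria differ only in how heavily they penalize `k`, which is why they can disagree about which model to select.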
Efficient Fingercode Classification
NASA Astrophysics Data System (ADS)
Sun, Hong-Wei; Law, Kwok-Yan; Gollmann, Dieter; Chung, Siu-Leung; Li, Jian-Bin; Sun, Jia-Guang
In this paper, we present an efficient fingerprint classification algorithm which is an essential component in many critical security application systems, e.g., systems in the e-government and e-finance domains. Fingerprint identification is one of the most important security requirements in homeland security systems such as personnel screening and anti-money laundering. The problem of fingerprint identification involves searching (matching) the fingerprint of a person against each of the fingerprints of all registered persons. To enhance performance and reliability, a common approach is to reduce the search space by first classifying the fingerprints and then performing the search in the respective class. Jain et al. proposed a fingerprint classification algorithm based on a two-stage classifier, which uses a K-nearest neighbor classifier in its first stage. The fingerprint classification algorithm is based on the fingercode representation, an encoding of fingerprints that has been demonstrated to be an effective fingerprint biometric scheme because of its ability to capture both local and global details in a fingerprint image. We enhance this approach by improving the efficiency of the K-nearest neighbor classifier for fingercode-based fingerprint classification. Our research first investigates the various fast search algorithms in vector quantization (VQ) and their potential application in fingerprint classification, and then proposes two efficient algorithms based on the pyramid-based search algorithms in VQ. Experimental results on DB1 of FVC 2004 demonstrate that our algorithms can outperform the full search algorithm and the original pyramid-based search algorithms in terms of computational efficiency without sacrificing accuracy.
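The baseline that the abstract above sets out to speed up, a full-search K-nearest neighbor classifier, can be sketched as follows. This is a minimal illustration on toy 2-D vectors with hypothetical class labels; real fingercode vectors are much longer, and the pyramid-based pruning proposed by the authors is not reproduced here.

```python
import math
from collections import Counter


def knn_classify(query, samples, labels, k=3):
    """Full-search k-NN: compute the distance to every stored vector,
    then take a plurality vote among the k closest."""
    ranked = sorted(range(len(samples)),
                    key=lambda i: math.dist(query, samples[i]))
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy feature vectors and hypothetical class names.
samples = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (1.0, 1.0), (0.9, 1.1), (1.2, 0.8)]
labels = ["A", "A", "A", "B", "B", "B"]
print(knn_classify((0.1, 0.0), samples, labels))
print(knn_classify((1.0, 0.9), samples, labels))
```

The cost of the full search grows linearly with the number of stored vectors, which is the inefficiency that fast VQ search methods aim to reduce.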
Murphy, Malia S Q; Hawken, Steven; Atkinson, Katherine M; Milburn, Jennifer; Pervin, Jesmin; Gravett, Courtney; Stringer, Jeffrey S A; Rahman, Anisur; Lackritz, Eve; Chakraborty, Pranesh; Wilson, Kumanan
2017-01-01
Background: Knowledge of gestational age (GA) is critical for guiding neonatal care and quantifying regional burdens of preterm birth. In settings where access to ultrasound dating is limited, postnatal estimates are frequently used despite the issues of accuracy associated with postnatal approaches. Newborn metabolic profiles are known to vary by severity of preterm birth. Recent work by our group and others has highlighted the accuracy of postnatal GA estimation algorithms derived from routinely collected newborn screening profiles. This protocol outlines the validation of a GA model originally developed in a North American cohort among international newborn cohorts. Methods: Our primary objective is to use blood spot samples collected from infants born in Zambia and Bangladesh to evaluate our algorithm’s capacity to correctly classify GA within 1, 2, 3 and 4 weeks. Secondary objectives are to 1) determine the algorithm's accuracy in small-for-gestational-age and large-for-gestational-age infants, 2) determine its ability to correctly discriminate GA of newborns across dichotomous thresholds of preterm birth (≤34 weeks, <37 weeks GA) and 3) compare the relative performance of algorithms derived from newborn screening panels including all available analytes and those restricted to analyte subsets. The study population will consist of infants born to mothers already enrolled in one of two preterm birth cohorts in Lusaka, Zambia, and Matlab, Bangladesh. Dried blood spot samples will be collected and sent for analysis in Ontario, Canada, for model validation. Discussion: This study will determine the validity of a GA estimation algorithm across ethnically diverse infant populations and assess population specific variations in newborn metabolic profiles. PMID:29104765
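One way to read the primary outcome above is as the share of infants whose estimated GA falls within a given number of weeks of the reference GA. The sketch below computes that share; the interpretation (absolute error no larger than the tolerance), the function name, and all values are assumptions for illustration, not data from the protocol.

```python
def share_within(reference_ga, estimated_ga, tolerance_weeks):
    """Fraction of infants whose estimated gestational age lies within
    +/- tolerance_weeks of the reference gestational age."""
    if not reference_ga or len(reference_ga) != len(estimated_ga):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(abs(r - e) <= tolerance_weeks
               for r, e in zip(reference_ga, estimated_ga))
    return hits / len(reference_ga)


# Hypothetical reference (e.g. ultrasound-dated) and model-estimated GA, in weeks.
reference = [38.0, 35.5, 40.0, 32.0, 39.0]
estimated = [38.6, 37.2, 39.1, 35.5, 39.0]
for weeks in (1, 2, 3, 4):
    print(weeks, share_within(reference, estimated, weeks))
```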
Recognition of pornographic web pages by classifying texts and images.
Hu, Weiming; Wu, Ou; Chen, Zhouyao; Fu, Zhouyu; Maybank, Steve
2007-06-01
With the rapid development of the World Wide Web, people benefit more and more from the sharing of information. However, Web pages with obscene, harmful, or illegal content can be easily accessed. It is important to recognize such unsuitable, offensive, or pornographic Web pages. In this paper, a novel framework for recognizing pornographic Web pages is described. A C4.5 decision tree is used to divide Web pages, according to content representations, into continuous text pages, discrete text pages, and image pages. These three categories of Web pages are handled, respectively, by a continuous text classifier, a discrete text classifier, and an algorithm that fuses the results from the image classifier and the discrete text classifier. In the continuous text classifier, statistical and semantic features are used to recognize pornographic texts. In the discrete text classifier, the naive Bayes rule is used to calculate the probability that a discrete text is pornographic. In the image classifier, the object's contour-based features are extracted to recognize pornographic images. In the text and image fusion algorithm, the Bayes theory is used to combine the recognition results from images and texts. Experimental results demonstrate that the continuous text classifier outperforms the traditional keyword-statistics-based classifier, the contour-based image classifier outperforms the traditional skin-region-based image classifier, the results obtained by our fusion algorithm outperform those by either of the individual classifiers, and our framework can be adapted to different categories of Web pages.
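The abstract above does not spell out its fusion rule, so the following is only one common way to combine two classifier outputs with Bayes' theorem: treat the text and image evidence as conditionally independent given the class. The function name, the default prior, and the example probabilities are assumptions.

```python
def fuse_posteriors(p_text, p_image, prior=0.5):
    """Combine two posterior probabilities for the positive class,
    assuming the two evidence sources are conditionally independent
    given the class (a naive-Bayes style product rule)."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    positive = p_text * p_image / prior
    negative = (1.0 - p_text) * (1.0 - p_image) / (1.0 - prior)
    if positive + negative == 0.0:
        raise ValueError("the two classifiers give contradictory certainties")
    return positive / (positive + negative)


# Two moderately confident classifiers reinforce each other.
print(round(fuse_posteriors(0.7, 0.8), 4))
# An uninformative classifier (0.5 at a 0.5 prior) leaves the other unchanged.
print(round(fuse_posteriors(0.5, 0.8), 4))
```

Under this rule, agreement between the two classifiers yields a fused probability more extreme than either input, which is consistent with a fusion step outperforming the individual classifiers.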
An assessment of support vector machines for land cover classification
Huang, C.; Davis, L.S.; Townshend, J.R.G.
2002-01-01
The support vector machine (SVM) is a group of theoretically superior machine learning algorithms. It has been found to be competitive with the best available machine learning algorithms in classifying high-dimensional data sets. This paper gives an introduction to the theoretical development of the SVM and an experimental evaluation of its accuracy, stability and training speed in deriving land cover classifications from satellite images. The SVM was compared to three other popular classifiers: the maximum likelihood classifier (MLC), neural network classifiers (NNC) and decision tree classifiers (DTC). The impacts of kernel configuration on the performance of the SVM and of the selection of training data and input variables on the four classifiers were also evaluated in this experiment.
NASA Astrophysics Data System (ADS)
Kaddoura, Tarek; Vadlamudi, Karunakar; Kumar, Shine; Bobhate, Prashant; Guo, Long; Jain, Shreepal; Elgendi, Mohamed; Coe, James Y.; Kim, Daniel; Taylor, Dylan; Tymchak, Wayne; Schuurmans, Dale; Zemp, Roger J.; Adatia, Ian
2016-09-01
We hypothesized that an automated speech-recognition-inspired classification algorithm could differentiate between the heart sounds in subjects with and without pulmonary hypertension (PH) and outperform physicians. Heart sounds, electrocardiograms, and mean pulmonary artery pressures (mPAp) were recorded simultaneously. Heart sound recordings were digitized to train and test speech-recognition-inspired classification algorithms. We used mel-frequency cepstral coefficients to extract features from the heart sounds. Gaussian-mixture models classified the features as PH (mPAp ≥ 25 mmHg) or normal (mPAp < 25 mmHg). Physicians blinded to patient data listened to the same heart sound recordings and attempted a diagnosis. We studied 164 subjects: 86 with mPAp ≥ 25 mmHg (mPAp 41 ± 12 mmHg) and 78 with mPAp < 25 mmHg (mPAp 17 ± 5 mmHg) (p < 0.005). The correct diagnostic rate of the automated speech-recognition-inspired algorithm was 74% compared to 56% by physicians (p = 0.005). The false positive rate for the algorithm was 34% versus 50% (p = 0.04) for clinicians. The false negative rate for the algorithm was 23% versus 68% (p = 0.0002) for physicians. We developed an automated speech-recognition-inspired classification algorithm for the acoustic diagnosis of PH that outperforms physicians and could be used to screen for PH and encourage earlier specialist referral.
NASA Astrophysics Data System (ADS)
Niu, Chaojun; Han, Xiang'e.
2015-10-01
Adaptive optics (AO) technology is an effective way to alleviate the effect of turbulence on free space optical communication (FSO). A new adaptive compensation method can be used without a wave-front sensor. The artificial bee colony (ABC) algorithm is a population-based heuristic evolutionary algorithm inspired by the intelligent foraging behaviour of the honeybee swarm; its advantages are simplicity, a good convergence rate, robustness, and few parameters to set. In this paper, we simulate the application of the improved ABC algorithm to correct the distorted wavefront and demonstrate its effectiveness. Then we simulate the application of the ABC algorithm, the differential evolution (DE) algorithm and the stochastic parallel gradient descent (SPGD) algorithm to the FSO system and analyze their wavefront correction capabilities by comparing the coupling efficiency, the error rate and the intensity fluctuation under different turbulence conditions before and after correction. The results show that the ABC algorithm corrects much faster than the DE algorithm and corrects strong turbulence better than the SPGD algorithm. Intensity fluctuation can be effectively reduced in strong turbulence, but the reduction is less effective in weak turbulence.
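To make the sensorless idea concrete, the sketch below shows the general shape of one of the comparison methods named above, stochastic parallel gradient descent (SPGD): all control channels are perturbed at once, and the measured change in a scalar quality metric steers the update. This is a generic illustration, not the paper's simulation. The quadratic toy metric stands in for a measured quantity such as coupling efficiency, and the gain, perturbation size, and target values are assumptions.

```python
import random


def spgd_maximize(metric, u, gain=3.0, sigma=0.1, iterations=200, seed=0):
    """Two-sided SPGD: perturb every control channel simultaneously by
    +/- sigma, measure the metric on both sides, and step along the
    perturbation in proportion to the measured difference."""
    rng = random.Random(seed)
    u = list(u)
    for _ in range(iterations):
        delta = [sigma * rng.choice((-1.0, 1.0)) for _ in u]
        j_plus = metric([a + d for a, d in zip(u, delta)])
        j_minus = metric([a - d for a, d in zip(u, delta)])
        dj = j_plus - j_minus
        u = [a + gain * dj * d for a, d in zip(u, delta)]
    return u


# Toy stand-in for a measured quality metric: highest when the control
# vector cancels a hypothetical aberration vector exactly.
ABERRATION = [0.5, -0.3, 0.2, 0.4, -0.1, 0.3, -0.4, 0.1]


def toy_metric(u):
    return -sum((a - t) ** 2 for a, t in zip(u, ABERRATION))


start = [0.0] * len(ABERRATION)
final = spgd_maximize(toy_metric, start)
print(toy_metric(start), toy_metric(final))
```

No wavefront sensor appears anywhere in the loop; only the scalar metric is measured, which is the property shared by SPGD, DE, and ABC in this setting.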
Klancar, Gregor; Kristan, Matej; Kovacic, Stanislav; Orqueda, Omar
2004-07-01
In this paper a global vision scheme for estimation of positions and orientations of mobile robots is presented. It is applied to robot soccer, a fast, dynamic game that therefore needs an efficient and robust vision system. The vision system is also applicable to other robot applications such as mobile transport robots in production and warehouses, attendant robots, fast vision tracking of targets of interest, and entertainment robotics. Basic operation of the vision system is divided into two steps. In the first, the incoming image is scanned and pixels are classified into a finite number of classes. At the same time, a segmentation algorithm is used to find corresponding regions belonging to one of the classes. In the second step, all the regions are examined. Selection of the ones that are a part of the observed object is made by means of simple logic procedures. The novelty lies in optimizing the processing time needed to finish the estimation of possible object positions. Better results of the vision system are achieved by implementing camera calibration and a shading correction algorithm. The former corrects camera lens distortion, while the latter increases robustness to irregular illumination conditions.
Data Compression Techniques for Maps
1989-01-01
Lempel-Ziv compression is applied to the classified and unclassified images as well as to the output of the compression algorithms. The algorithms ...resulted in a compression of 7:1. The output of the quadtree coding algorithm was then compressed using Lempel-Ziv coding. The compression ratio achieved...using Lempel-Ziv coding. The unclassified image gave a compression ratio of only 1.4:1. The K means classified image
Classification of neocortical interneurons using affinity propagation.
Santana, Roberto; McGarry, Laura M; Bielza, Concha; Larrañaga, Pedro; Yuste, Rafael
2013-01-01
In spite of over a century of research on cortical circuits, it is still unknown how many classes of cortical neurons exist. In fact, neuronal classification is a difficult problem because it is unclear how to designate a neuronal cell class and what are the best characteristics to define them. Recently, unsupervised classifications using cluster analysis based on morphological, physiological, or molecular characteristics, have provided quantitative and unbiased identification of distinct neuronal subtypes, when applied to selected datasets. However, better and more robust classification methods are needed for increasingly complex and larger datasets. Here, we explored the use of affinity propagation, a recently developed unsupervised classification algorithm imported from machine learning, which gives a representative example or exemplar for each cluster. As a case study, we applied affinity propagation to a test dataset of 337 interneurons belonging to four subtypes, previously identified based on morphological and physiological characteristics. We found that affinity propagation correctly classified most of the neurons in a blind, non-supervised manner. Affinity propagation outperformed Ward's method, a current standard clustering approach, in classifying the neurons into 4 subtypes. Affinity propagation could therefore be used in future studies to validly classify neurons, as a first step to help reverse engineer neural circuits.
Banzato, T; Cherubini, G B; Atzori, M; Zotti, A
2018-05-01
An established deep neural network (DNN) based on transfer learning and a newly designed DNN were tested to predict the grade of meningiomas from magnetic resonance (MR) images in dogs and to determine the accuracy of classification using pre- and post-contrast T1-weighted (T1W) and T2-weighted (T2W) MR images. The images were randomly assigned to a training set, a validation set and a test set, comprising 60%, 10% and 30% of images, respectively. The combination of DNN and MR sequence displaying the highest discriminating accuracy was used to develop an image classifier to predict the grading of new cases. The algorithm based on transfer learning using the established DNN did not provide satisfactory results, whereas the newly designed DNN had high classification accuracy. On the basis of classification accuracy, an image classifier built on the newly designed DNN using post-contrast T1W images was developed. This image classifier correctly predicted the grading of 8 out of 10 images not included in the data set. Copyright © 2018 The Authors. Published by Elsevier Ltd. All rights reserved.
Textual and visual content-based anti-phishing: a Bayesian approach.
Zhang, Haijun; Liu, Gang; Chow, Tommy W S; Liu, Wenyin
2011-10-01
A novel framework using a Bayesian approach for content-based phishing web page detection is presented. Our model takes into account textual and visual contents to measure the similarity between the protected web page and suspicious web pages. A text classifier, an image classifier, and an algorithm fusing the results from classifiers are introduced. An outstanding feature of this paper is the exploration of a Bayesian model to estimate the matching threshold. This is required in the classifier for determining the class of the web page and identifying whether the web page is phishing or not. In the text classifier, the naive Bayes rule is used to calculate the probability that a web page is phishing. In the image classifier, the earth mover's distance is employed to measure the visual similarity, and our Bayesian model is designed to determine the threshold. In the data fusion algorithm, the Bayes theory is used to synthesize the classification results from textual and visual content. The effectiveness of our proposed approach was examined in a large-scale dataset collected from real phishing cases. Experimental results demonstrated that the text classifier and the image classifier we designed deliver promising results, the fusion algorithm outperforms either of the individual classifiers, and our model can be adapted to different phishing cases. © 2011 IEEE
Moteghaed, Niloofar Yousefi; Maghooli, Keivan; Garshasbi, Masoud
2018-01-01
Background: Gene expression data are characteristically high dimensional with a small sample size in contrast to the feature size and variability inherent in biological processes that contribute to difficulties in analysis. Selection of highly discriminative features decreases the computational cost and complexity of the classifier and improves its reliability for prediction of a new class of samples. Methods: The present study used hybrid particle swarm optimization and genetic algorithms for gene selection and a fuzzy support vector machine (SVM) as the classifier. Fuzzy logic is used to infer the importance of each sample in the training phase and decrease the outlier sensitivity of the system to increase the ability to generalize the classifier. A decision-tree algorithm was applied to the most frequent genes to develop a set of rules for each type of cancer. This improved the abilities of the algorithm by finding the best parameters for the classifier during the training phase without the need for trial-and-error by the user. The proposed approach was tested on four benchmark gene expression profiles. Results: Good results have been demonstrated for the proposed algorithm. The classification accuracy for leukemia data is 100%, for colon cancer is 96.67% and for breast cancer is 98%. The results show that the best kernel used in training the SVM classifier is the radial basis function. Conclusions: The experimental results show that the proposed algorithm can decrease the dimensionality of the dataset, determine the most informative gene subset, and improve classification accuracy using the optimal parameters of the classifier with no user interface. PMID:29535919
Yook, Sunhyun; Nam, Kyoung Won; Kim, Heepyung; Hong, Sung Hwa; Jang, Dong Pyo; Kim, In Young
2015-04-01
In order to provide more consistent sound intelligibility for the hearing-impaired person, regardless of environment, it is necessary to adjust the setting of the hearing-support (HS) device to accommodate various environmental circumstances. In this study, a fully automatic HS device management algorithm that can adapt to various environmental situations is proposed; it is composed of a listening-situation classifier, a noise-type classifier, an adaptive noise-reduction algorithm, and a management algorithm that can selectively turn on/off one or more of the three basic algorithms (beamforming, noise reduction, and feedback cancellation) and can also adjust internal gains and parameters of the wide-dynamic-range compression (WDRC) and noise-reduction (NR) algorithms in accordance with variations in environmental situations. Experimental results demonstrated that the implemented algorithms can classify both listening situation and ambient noise type situations with high accuracies (92.8-96.4% and 90.9-99.4%, respectively), and the gains and parameters of the WDRC and NR algorithms were successfully adjusted according to variations in environmental situation. The average values of signal-to-noise ratio (SNR), frequency-weighted segmental SNR, Perceptual Evaluation of Speech Quality, and mean opinion test scores of 10 normal-hearing volunteers of the adaptive multiband spectral subtraction (MBSS) algorithm were improved by 1.74 dB, 2.11 dB, 0.49, and 0.68, respectively, compared to the conventional fixed-parameter MBSS algorithm. These results indicate that the proposed environment-adaptive management algorithm can be applied to HS devices to improve sound intelligibility for hearing-impaired individuals in various acoustic environments. Copyright © 2014 International Center for Artificial Organs and Transplantation and Wiley Periodicals, Inc.
Method of Menu Selection by Gaze Movement Using AC EOG Signals
NASA Astrophysics Data System (ADS)
Kanoh, Shin'ichiro; Futami, Ryoko; Yoshinobu, Tatsuo; Hoshimiya, Nozomu
A method to detect the direction and the distance of voluntary eye gaze movement from EOG (electrooculogram) signals was proposed and tested. In this method, AC-amplified vertical and horizontal transient EOG signals were classified into 8-class directions and 2-class distances of voluntary eye gaze movements. The horizontal and vertical EOG values at each sampling time during eye gaze movement were treated as a two-dimensional vector, and the center of gravity of the sample vectors whose norms were more than 80% of the maximum norm was used as the feature vector to be classified. Classification using the k-nearest neighbor algorithm showed that the averaged correct detection rates for each subject were 98.9%, 98.7%, and 94.4%, respectively. This method avoids strict EOG-based eye tracking, which requires DC amplification of very small signals. It would be useful for developing robust human interfacing systems based on menu selection for severely paralyzed patients.
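The feature described above (the center of gravity of the samples whose norm exceeds 80% of the maximum norm) can be sketched in a few lines. The sample values below are made up for illustration, the function name is hypothetical, and the subsequent k-nearest neighbor step is omitted.

```python
import math


def eog_feature(samples, threshold=0.8):
    """Centre of gravity of the (horizontal, vertical) EOG samples whose
    norm exceeds `threshold` times the largest norm in the recording."""
    if not samples:
        raise ValueError("no samples")
    norms = [math.hypot(h, v) for h, v in samples]
    peak = max(norms)
    if peak == 0.0:
        return (0.0, 0.0)  # flat recording: no movement to describe
    kept = [s for s, n in zip(samples, norms) if n > threshold * peak]
    cx = sum(h for h, _ in kept) / len(kept)
    cy = sum(v for _, v in kept) / len(kept)
    return (cx, cy)


# Made-up transient: small values at onset and offset, large near the peak.
recording = [(0.1, 0.0), (0.5, 0.4), (1.0, 0.9), (0.9, 1.0), (0.2, 0.1)]
print(eog_feature(recording))
```

Keeping only the near-peak samples discards the onset and decay of the AC-coupled transient, so the feature mostly reflects the direction and size of the gaze movement.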
Single-cultivar extra virgin olive oil classification using a potentiometric electronic tongue.
Dias, Luís G; Fernandes, Andreia; Veloso, Ana C A; Machado, Adélio A S C; Pereira, José A; Peres, António M
2014-10-01
Label authentication of monovarietal extra virgin olive oils is of great importance. A novel approach based on a potentiometric electronic tongue is proposed to classify oils obtained from single olive cultivars (Portuguese cvs. Cobrançosa, Madural, Verdeal Transmontana; Spanish cvs. Arbequina, Hojiblanca, Picual). A meta-heuristic simulated annealing algorithm was applied to select the most informative sets of sensors to establish predictive linear discriminant models. Olive oils were correctly classified according to olive cultivar (sensitivities greater than 97%) and each Spanish olive oil was satisfactorily discriminated from the Portuguese ones with the exception of cv. Arbequina (sensitivities from 61% to 98%). Also, the discriminant ability was related to the polar compounds contents of olive oils and so, indirectly, with organoleptic properties like bitterness, astringency or pungency. Therefore the proposed E-tongue can be foreseen as a useful auxiliary tool for trained sensory panels for the classification of monovarietal extra virgin olive oils. Copyright © 2014 Elsevier Ltd. All rights reserved.
Wang, Jinjia; Liu, Yuan
2015-04-01
This paper presents a feature extraction method based on multivariate empirical mode decomposition (MEMD) combined with power spectrum features; the method targets the non-stationary electroencephalogram (EEG) or magnetoencephalogram (MEG) signals in brain-computer interface (BCI) systems. First, we used the MEMD algorithm to decompose multichannel brain signals into a series of intrinsic mode functions (IMFs), which are approximately stationary and multi-scale. Then we extracted the power features from each IMF and reduced them to a lower dimension using principal component analysis (PCA). Finally, we classified the motor imagery tasks with a linear discriminant analysis classifier. Experimental verification showed that the correct recognition rates for the two-class and four-class tasks of BCI competition III and competition IV reached 92.0% and 46.2%, respectively, which were superior to the winner of the BCI competition. The experiments showed that the proposed method is reasonably effective and stable and that it provides a new approach to feature extraction.
Neural system for heartbeats recognition using genetically integrated ensemble of classifiers.
Osowski, Stanislaw; Siwek, Krzysztof; Siroic, Robert
2011-03-01
This paper presents the application of a genetic algorithm to the integration of neural classifiers combined in an ensemble for the accurate recognition of heartbeat types on the basis of ECG registration. The idea presented in this paper is that using many classifiers arranged in the form of an ensemble increases the accuracy of recognition. In such an ensemble the important problem is the integration of all classifiers into one effective classification system. This paper proposes the use of a genetic algorithm. It was shown that application of the genetic algorithm is very efficient and significantly reduces the total error of heartbeat recognition. This was confirmed by numerical experiments performed on the MIT BIH Arrhythmia Database. Copyright © 2011 Elsevier Ltd. All rights reserved.
Optimization of Support Vector Machine (SVM) for Object Classification
NASA Technical Reports Server (NTRS)
Scholten, Matthew; Dhingra, Neil; Lu, Thomas T.; Chao, Tien-Hsin
2012-01-01
The Support Vector Machine (SVM) is a powerful algorithm, useful in classifying data into species. The SVMs implemented in this research were used as classifiers for the final stage in a Multistage Automatic Target Recognition (ATR) system. A single kernel SVM known as SVMlight, and a modified version known as a SVM with K-Means Clustering were used. These SVM algorithms were tested as classifiers under varying conditions. Image noise levels varied, and the orientation of the targets changed. The classifiers were then optimized to demonstrate their maximum potential as classifiers. Results demonstrate the reliability of SVM as a method for classification. From trial to trial, SVM produces consistent results.
NASA Astrophysics Data System (ADS)
Zhang, Zhenhai; Li, Kejie; Wu, Xiaobing; Zhang, Shujiang
2008-03-01
An unwrapping and correction algorithm based on the Coordinate Rotation Digital Computer (CORDIC) and bilinear interpolation is presented in this paper for processing dynamic panoramic annular images. An original annular panoramic image captured by a panoramic annular lens (PAL) can be unwrapped and corrected to a conventional rectangular image without distortion, which is much closer to human vision. The algorithm for panoramic image processing is modeled in VHDL and implemented in an FPGA. The experimental results show that the proposed unwrapping and distortion correction algorithm has lower computational complexity, and that the architecture for dynamic panoramic image processing has lower hardware cost and power consumption. The results also confirm that the proposed algorithm is valid.
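A software sketch of the unwrapping step described above may help: each output pixel is mapped back to polar coordinates in the annular image and sampled by bilinear interpolation. This is an illustration only. The paper computes the trigonometric terms with CORDIC in an FPGA, whereas this sketch uses the `math` library as a stand-in, and the mapping of the outer radius to the top row is an assumption.

```python
import math


def bilinear(img, x, y):
    """Sample a list-of-rows image at fractional (x, y), clamping to the border."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def unwrap_annulus(img, cx, cy, r_inner, r_outer, out_w, out_h):
    """Unwrap an annular image into a rectangle: columns sweep the angle,
    rows sweep the radius (outer radius at the top, by assumption)."""
    out = []
    for row in range(out_h):
        r = r_outer - (r_outer - r_inner) * row / max(out_h - 1, 1)
        line = []
        for col in range(out_w):
            theta = 2.0 * math.pi * col / out_w
            line.append(bilinear(img,
                                 cx + r * math.cos(theta),
                                 cy + r * math.sin(theta)))
        out.append(line)
    return out


# Tiny synthetic image whose intensity equals its column index.
image = [[float(x) for x in range(9)] for _ in range(9)]
panorama = unwrap_annulus(image, cx=4.0, cy=4.0, r_inner=1.0, r_outer=4.0,
                          out_w=8, out_h=3)
print(len(panorama), len(panorama[0]))
```

Each output pixel needs one sine, one cosine, and four reads from the source image, which is the per-pixel workload that a CORDIC-based hardware pipeline would carry.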
2017-01-01
Background: Machine learning techniques may be an effective and efficient way to classify open-text reports on doctor’s activity for the purposes of quality assurance, safety, and continuing professional development. Objective: The objective of the study was to evaluate the accuracy of machine learning algorithms trained to classify open-text reports of doctor performance and to assess the potential for classifications to identify significant differences in doctors’ professional performance in the United Kingdom. Methods: We used 1636 open-text comments (34,283 words) relating to the performance of 548 doctors collected from a survey of clinicians’ colleagues using the General Medical Council Colleague Questionnaire (GMC-CQ). We coded 77.75% (1272/1636) of the comments into 5 global themes (innovation, interpersonal skills, popularity, professionalism, and respect) using a qualitative framework. We trained 8 machine learning algorithms to classify comments and assessed their performance using several training samples. We evaluated doctor performance using the GMC-CQ and compared scores between doctors with different classifications using t tests. Results: Individual algorithm performance was high (range F score=.68 to .83). Interrater agreement between the algorithms and the human coder was highest for codes relating to “popular” (recall=.97), “innovator” (recall=.98), and “respected” (recall=.87) codes and was lower for the “interpersonal” (recall=.80) and “professional” (recall=.82) codes. A 10-fold cross-validation demonstrated similar performance in each analysis. When combined together into an ensemble of multiple algorithms, mean human-computer interrater agreement was .88. Comments that were classified as “respected,” “professional,” and “interpersonal” related to higher doctor scores on the GMC-CQ compared with comments that were not classified (P<.05). Scores did not vary between doctors who were rated as popular or innovative and those who were not rated at all (P>.05). Conclusions: Machine learning algorithms can classify open-text feedback of doctor performance into multiple themes derived by human raters with high performance. Colleague open-text comments that signal respect, professionalism, and being interpersonal may be key indicators of doctor’s performance. PMID:28298265
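The recall and F score figures quoted above, and the idea of pooling several classifiers, can be illustrated with the sketch below. It rests on stated assumptions: each comment carries a single theme label, the ensemble is a simple plurality vote (the abstract does not say how the algorithms were combined), and all labels and predictions are invented for the example.

```python
from collections import Counter


def recall_precision_f1(true_labels, predicted, theme):
    """Per-theme recall, precision, and F1 against the human coder."""
    pairs = list(zip(true_labels, predicted))
    tp = sum(t == theme and p == theme for t, p in pairs)
    fp = sum(t != theme and p == theme for t, p in pairs)
    fn = sum(t == theme and p != theme for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return recall, precision, f1


def majority_vote(predictions_per_model):
    """Plurality vote across models, one label per comment."""
    return [Counter(votes).most_common(1)[0][0]
            for votes in zip(*predictions_per_model)]


# Invented labels for four comments and three hypothetical models.
human = ["respected", "popular", "professional", "respected"]
model_a = ["respected", "popular", "respected", "respected"]
model_b = ["respected", "innovator", "professional", "popular"]
model_c = ["popular", "popular", "professional", "respected"]

ensemble = majority_vote([model_a, model_b, model_c])
print(ensemble)
print(recall_precision_f1(human, model_a, "respected"))
print(recall_precision_f1(human, ensemble, "respected"))
```

In this toy case each model makes a different mistake, so the vote recovers the human coding even though no single model does.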
Gibbons, Chris; Richards, Suzanne; Valderas, Jose Maria; Campbell, John
2017-03-15
Machine learning techniques may be an effective and efficient way to classify open-text reports on doctor's activity for the purposes of quality assurance, safety, and continuing professional development. The objective of the study was to evaluate the accuracy of machine learning algorithms trained to classify open-text reports of doctor performance and to assess the potential for classifications to identify significant differences in doctors' professional performance in the United Kingdom. We used 1636 open-text comments (34,283 words) relating to the performance of 548 doctors collected from a survey of clinicians' colleagues using the General Medical Council Colleague Questionnaire (GMC-CQ). We coded 77.75% (1272/1636) of the comments into 5 global themes (innovation, interpersonal skills, popularity, professionalism, and respect) using a qualitative framework. We trained 8 machine learning algorithms to classify comments and assessed their performance using several training samples. We evaluated doctor performance using the GMC-CQ and compared scores between doctors with different classifications using t tests. Individual algorithm performance was high (range F score=.68 to .83). Interrater agreement between the algorithms and the human coder was highest for codes relating to "popular" (recall=.97), "innovator" (recall=.98), and "respected" (recall=.87) codes and was lower for the "interpersonal" (recall=.80) and "professional" (recall=.82) codes. A 10-fold cross-validation demonstrated similar performance in each analysis. When combined together into an ensemble of multiple algorithms, mean human-computer interrater agreement was .88. Comments that were classified as "respected," "professional," and "interpersonal" related to higher doctor scores on the GMC-CQ compared with comments that were not classified (P<.05). Scores did not vary between doctors who were rated as popular or innovative and those who were not rated at all (P>.05). 
Machine learning algorithms can classify open-text feedback of doctor performance into multiple themes derived by human raters with high performance. Colleague open-text comments that signal respect, professionalism, and being interpersonal may be key indicators of doctor's performance. ©Chris Gibbons, Suzanne Richards, Jose Maria Valderas, John Campbell. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 15.03.2017.
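The ensemble step described in this abstract can be illustrated with a small sketch. It assumes the ensemble combines classifiers by simple majority vote and that agreement is the plain proportion of matching labels; the abstract does not state either choice, and the labels and classifier outputs below are invented for illustration.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most common label; ties go to the label that appears first."""
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label

def ensemble_agreement(per_classifier_labels, human_labels):
    """Vote across classifiers for each comment, then compare with the human code."""
    votes = [majority_vote(list(column)) for column in zip(*per_classifier_labels)]
    matches = sum(v == h for v, h in zip(votes, human_labels))
    return votes, matches / len(human_labels)

# Hypothetical outputs of three classifiers for four comments.
classifier_outputs = [
    ["respected", "popular", "professional", "innovator"],
    ["respected", "popular", "interpersonal", "innovator"],
    ["popular", "popular", "professional", "respected"],
]
human = ["respected", "popular", "professional", "popular"]

votes, agreement = ensemble_agreement(classifier_outputs, human)
print(votes, agreement)
```

A chance-corrected statistic such as Cohen's kappa could replace the raw proportion if that is closer to what the study reported.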
Deenik, Axel; van Mameren, Henk; de Visser, Enrico; de Waal Malefijt, Maarten; Draijer, Frits; de Bie, Rob
2008-12-01
Chevron osteotomy is a widely accepted osteotomy for correction of hallux valgus (18). Algorithms were developed to overcome the limitations of distal osteotomies. Scarf osteotomy has become popular as a versatile procedure that should be able to correct most cases of acquired hallux valgus. The purpose of this study was to evaluate whether patients with moderate or severe hallux valgus have better correction with a scarf osteotomy than with a chevron osteotomy. After informed consent, 136 feet in 115 patients were randomized to 66 scarf and 70 chevron osteotomies. Deformities were classified as mild, moderate, or severe according to the intermetatarsal angle (IMA), and both groups were compared with independent t-tests. The results were measured using radiographic hallux valgus angle (HVA), IMA, and distal metatarsal articular angle (DMAA) measurements. There were no statistical differences in HVA, IMA, and DMAA between scarf and chevron osteotomy in mild to moderate hallux valgus. In severe hallux valgus, chevron osteotomy corrected HVA better than scarf osteotomy, although this group consisted of only twelve patients. Five patients in the chevron group and seven in the scarf group developed recurrent subluxation of the metatarsophalangeal joint. In patients with moderate and severe hallux valgus, chevron osteotomy was at least as effective as scarf osteotomy. Recurrent subluxation of the first metatarsophalangeal joint was the main cause of insufficient correction. We favor the chevron osteotomy because it is less invasive, without sacrificing correction of HVA and IMA.
A review of classification algorithms for EEG-based brain–computer interfaces: a 10 year update
NASA Astrophysics Data System (ADS)
Lotte, F.; Bougrain, L.; Cichocki, A.; Clerc, M.; Congedo, M.; Rakotomamonjy, A.; Yger, F.
2018-06-01
Objective. Most current electroencephalography (EEG)-based brain–computer interfaces (BCIs) are based on machine learning algorithms. There is a large diversity of classifier types that are used in this field, as described in our 2007 review paper. Now, approximately ten years after this review publication, many new algorithms have been developed and tested to classify EEG signals in BCIs. The time is therefore ripe for an updated review of EEG classification algorithms for BCIs. Approach. We surveyed the BCI and machine learning literature from 2007 to 2017 to identify the new classification approaches that have been investigated to design BCIs. We synthesize these studies in order to present such algorithms, to report how they were used for BCIs, what were the outcomes, and to identify their pros and cons. Main results. We found that the recently designed classification algorithms for EEG-based BCIs can be divided into four main categories: adaptive classifiers, matrix and tensor classifiers, transfer learning and deep learning, plus a few other miscellaneous classifiers. Among these, adaptive classifiers were demonstrated to be generally superior to static ones, even with unsupervised adaptation. Transfer learning can also prove useful although the benefits of transfer learning remain unpredictable. Riemannian geometry-based methods have reached state-of-the-art performances on multiple BCI problems and deserve to be explored more thoroughly, along with tensor-based methods. Shrinkage linear discriminant analysis and random forests also appear particularly useful for small training samples settings. On the other hand, deep learning methods have not yet shown convincing improvement over state-of-the-art BCI methods. Significance. This paper provides a comprehensive overview of the modern classification algorithms used in EEG-based BCIs, presents the principles of these methods and guidelines on when and how to use them. 
It also identifies a number of challenges to further advance EEG classification in BCI.
Contextual classification on a CDC Flexible Processor system. [for photomapped remote sensing data
NASA Technical Reports Server (NTRS)
Smith, B. W.; Siegel, H. J.; Swain, P. H.
1981-01-01
A potential hardware organization for the Flexible Processor Array is presented. An algorithm that implements a contextual classifier for remote sensing data analysis is given, along with uniprocessor classification algorithms. The Flexible Processor algorithm is provided, as are simulated timings for contextual classifiers run on the Flexible Processor Array and another system. The timings are analyzed for context neighborhoods of sizes three and nine.
MacRae, J; Darlow, B; McBain, L; Jones, O; Stubbe, M; Turner, N; Dowell, A
2015-08-21
To develop a natural language processing software inference algorithm to classify the content of primary care consultations using electronic health record Big Data and subsequently test the algorithm's ability to estimate the prevalence and burden of childhood respiratory illness in primary care. Algorithm development and validation study. To classify consultations, the algorithm is designed to interrogate clinical narrative entered as free text, diagnostic (Read) codes created and medications prescribed on the day of the consultation. Thirty-six consenting primary care practices from a mixed urban and semirural region of New Zealand. Three independent sets of 1200 child consultation records were randomly extracted from a data set of all general practitioner consultations in participating practices between 1 January 2008-31 December 2013 for children under 18 years of age (n=754,242). Each consultation record within these sets was independently classified by two expert clinicians as respiratory or non-respiratory, and subclassified according to respiratory diagnostic categories to create three 'gold standard' sets of classified records. These three gold standard record sets were used to train, test and validate the algorithm. Sensitivity, specificity, positive predictive value and F-measure were calculated to illustrate the algorithm's ability to replicate judgements of expert clinicians within the 1200 record gold standard validation set. The algorithm was able to identify respiratory consultations in the 1200 record validation set with a sensitivity of 0.72 (95% CI 0.67 to 0.78) and a specificity of 0.95 (95% CI 0.93 to 0.98). The positive predictive value of algorithm respiratory classification was 0.93 (95% CI 0.89 to 0.97). 
The positive predictive value of the algorithm classifying consultations as being related to specific respiratory diagnostic categories ranged from 0.68 (95% CI 0.40 to 1.00; other respiratory conditions) to 0.91 (95% CI 0.79 to 1.00; throat infections). A software inference algorithm that uses primary care Big Data can accurately classify the content of clinical consultations. This algorithm will enable accurate estimation of the prevalence of childhood respiratory illness in primary care and resultant service utilisation. The methodology can also be applied to other areas of clinical care. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
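For readers who want to reproduce the validation measures named in this abstract, the following is a minimal sketch of how sensitivity, specificity, positive predictive value, and F-measure follow from a 2x2 confusion matrix. The counts are invented and are not the study's data; the function name is hypothetical.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard measures for a binary classifier judged against a gold standard."""
    sensitivity = tp / (tp + fn)   # share of true respiratory records that were found
    specificity = tn / (tn + fp)   # share of non-respiratory records correctly rejected
    ppv = tp / (tp + fp)           # share of positive calls that were right
    f_measure = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "f_measure": f_measure,
    }

# Invented counts for illustration only.
metrics = diagnostic_metrics(tp=40, fp=5, tn=45, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```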
Multi-objective evolutionary algorithms for fuzzy classification in survival prediction.
Jiménez, Fernando; Sánchez, Gracia; Juárez, José M
2014-03-01
This paper presents a novel rule-based fuzzy classification methodology for survival/mortality prediction in severely burned patients. Due to the ethical aspects involved in this medical scenario, physicians tend not to accept a computer-based evaluation unless they understand why and how such a recommendation is given. Therefore, any fuzzy classifier model must be both accurate and interpretable. The proposed methodology is a three-step process: (1) multi-objective constrained optimization of a patient data set, using Pareto-based elitist multi-objective evolutionary algorithms to maximize accuracy and minimize the complexity (number of rules) of classifiers, subject to interpretability constraints; this step produces a set of alternative (Pareto) classifiers; (2) linguistic labeling, which assigns a linguistic label to each fuzzy set of the classifiers; this step is essential to the interpretability of the classifiers; (3) decision making, whereby a classifier is chosen, if it is satisfactory, according to the preferences of the decision maker. If no classifier is satisfactory for the decision maker, the process starts again in step (1) with a different input parameter set. The performance of three multi-objective evolutionary algorithms, namely the niched pre-selection multi-objective algorithm, the elitist Pareto-based multi-objective evolutionary algorithm for diversity reinforcement (ENORA), and the non-dominated sorting genetic algorithm (NSGA-II), was tested using a patient data set from an intensive care burn unit and a standard machine learning data set from a standard machine learning repository. The results are compared using the hypervolume multi-objective metric. In addition, the results were compared with other non-evolutionary techniques and validated with a multi-objective cross-validation technique. 
Our proposal improves the classification rate obtained by other non-evolutionary techniques (decision trees, artificial neural networks, Naive Bayes, and case-based reasoning) obtaining with ENORA a classification rate of 0.9298, specificity of 0.9385, and sensitivity of 0.9364, with 14.2 interpretable fuzzy rules on average. Our proposal improves the accuracy and interpretability of the classifiers, compared with other non-evolutionary techniques. We also conclude that ENORA outperforms niched pre-selection and NSGA-II algorithms. Moreover, given that our multi-objective evolutionary methodology is non-combinational based on real parameter optimization, the time cost is significantly reduced compared with other evolutionary approaches existing in literature based on combinational optimization. Copyright © 2014 Elsevier B.V. All rights reserved.
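Step (1) of the methodology yields a set of Pareto-optimal classifiers that trade accuracy against the number of rules. The sketch below shows only that non-dominance test, not ENORA or NSGA-II themselves; the candidate (accuracy, rule count) pairs are invented.

```python
def dominates(a, b):
    """a and b are (accuracy, n_rules); higher accuracy and fewer rules are better."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better

def pareto_front(candidates):
    """Keep every candidate that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]

# Hypothetical classifiers: (classification rate, number of fuzzy rules).
candidates = [(0.90, 10), (0.93, 14), (0.88, 12), (0.93, 20), (0.85, 6)]
front = pareto_front(candidates)
print(front)
```

The decision maker in step (3) would then pick one classifier from `front` according to how much accuracy is worth an extra rule.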
Arnold, J B; Liow, J S; Schaper, K A; Stern, J J; Sled, J G; Shattuck, D W; Worth, A J; Cohen, M S; Leahy, R M; Mazziotta, J C; Rottenberg, D A
2001-05-01
The desire to correct intensity nonuniformity in magnetic resonance images has led to the proliferation of nonuniformity-correction (NUC) algorithms with different theoretical underpinnings. In order to provide end users with a rational basis for selecting a given algorithm for a specific neuroscientific application, we evaluated the performance of six NUC algorithms. We used simulated and real MRI data volumes, including six repeat scans of the same subject, in order to rank the accuracy, precision, and stability of the nonuniformity corrections. We also compared algorithms using data volumes from different subjects and different (1.5T and 3.0T) MRI scanners in order to relate differences in algorithmic performance to intersubject variability and/or differences in scanner performance. In phantom studies, the correlation of the extracted with the applied nonuniformity was highest in the transaxial (left-to-right) direction and lowest in the axial (top-to-bottom) direction. Two of the six algorithms demonstrated a high degree of stability, as measured by the iterative application of the algorithm to its corrected output. While none of the algorithms performed ideally under all circumstances, locally adaptive methods generally outperformed nonadaptive methods. Copyright 2001 Academic Press.
Measuring the lesion load of multiple sclerosis patients within the corticospinal tract
NASA Astrophysics Data System (ADS)
Klein, Jan; Hanken, Katrin; Koceva, Jasna; Hildebrandt, Helmut; Hahn, Horst K.
2015-03-01
In this paper we present a framework for reliable determination of the lesion load within the corticospinal tract (CST) of multiple sclerosis patients. The basis is a probabilistic fiber tracking approach which checks possible parameter intervals on the fly using an anatomical brain atlas. By exploiting the range of those intervals, the algorithm is able to resolve fiber crossings and to determine the CST in its entirety, although it can use a simple diffusion tensor model. Another advantage is its short running time: tracking the CST takes less than a minute. For segmenting the lesions we developed a semi-automatic approach. First, a trained classifier is applied to multimodal MRI data (T1/FLAIR) where the spectrum of lesions has been determined in advance by a clustering algorithm. This leads to an automatic detection of the lesions, which can be manually corrected afterwards using a threshold-based approach. For evaluation we scanned 46 MS patients and 16 healthy controls. Fiber tracking was performed using our novel fiber tracking algorithm and a standard deflection-based algorithm. Regression analysis of the old and new versions of the algorithm showed a highly significant superiority of the new algorithm for disease duration. Additionally, a low correlation between the old and new approaches supports the observation that standard DTI fiber tracking is not always able to track and quantify the CST reliably.
NASA Astrophysics Data System (ADS)
Movia, A.; Beinat, A.; Crosilla, F.
2015-04-01
The recognition of vegetation by the analysis of very high resolution (VHR) aerial images provides meaningful information about environmental features; nevertheless, VHR images frequently contain shadows that generate significant problems for the classification of the image components and for the extraction of the needed information. The aim of this research is to classify, from VHR aerial images, vegetation involved in the balance process of the environmental biochemical cycle, and to discriminate it from urban and agricultural features. Three classification algorithms were tested in order to better recognize vegetation and were compared with the NDVI index; unfortunately, all these methods are affected by the presence of shadows in the images. The literature presents several algorithms to detect and remove shadows in a scene; most of them are based on RGB to HSI transformations. In this work some of them were implemented and compared with one based on RGB bands. Subsequently, in order to remove shadows and restore brightness in the images, some innovative algorithms based on Procrustes theory were implemented and applied. Among these, we evaluate the capability of the so-called "not-centered oblique Procrustes" and "anisotropic Procrustes" methods to efficiently restore brightness with respect to a linear correlation correction based on the Cholesky decomposition. Some experimental results obtained by different classification methods after shadow removal with the innovative algorithms are presented and discussed.
NASA Technical Reports Server (NTRS)
Pagnutti, Mary
2006-01-01
This viewgraph presentation reviews the creation of a prototype algorithm for atmospheric correction using high spatial resolution earth observing imaging systems. The objective of the work was to evaluate accuracy of a prototype algorithm that uses satellite-derived atmospheric products to generate scene reflectance maps for high spatial resolution (HSR) systems. This presentation focused on preliminary results of only the satellite-based atmospheric correction algorithm.
A cDNA microarray gene expression data classifier for clinical diagnostics based on graph theory.
Benso, Alfredo; Di Carlo, Stefano; Politano, Gianfranco
2011-01-01
Despite great advances in discovering cancer molecular profiles, the proper application of microarray technology to routine clinical diagnostics is still a challenge. Current practices in the classification of microarray data show two main limitations: the reliability of the training data sets used to build the classifiers, and the classifiers' performance, especially when the sample to be classified does not belong to any of the available classes. In this case, state-of-the-art algorithms usually produce a high rate of false positives that, in real diagnostic applications, is unacceptable. To address this problem, this paper presents a new cDNA microarray data classification algorithm that is based on graph theory and is able to overcome most of the limitations of known classification methodologies. The classifier works by analyzing gene expression data organized in an innovative data structure based on graphs, where vertices correspond to genes and edges to gene expression relationships. To demonstrate the novelty of the proposed approach, the authors present an experimental performance comparison between the proposed classifier and several state-of-the-art classification algorithms.
Performance evaluation of various classifiers for color prediction of rice paddy plant leaf
NASA Astrophysics Data System (ADS)
Singh, Amandeep; Singh, Maninder Lal
2016-11-01
The food industry is one of the industries that uses machine vision for nondestructive quality evaluation of produce. These quality-measuring systems and software packages are based on various image-processing algorithms, which generally use a particular type of classifier. These classifiers play a vital role in making the algorithms intelligent enough to perform the quality evaluations well, by translating human perception into machine vision and hence machine learning. The crop of interest is rice, and the color of this crop indicates the health status of the plant. An enormous number of classifiers are available for color prediction, and choosing the best among them is the focus of this paper. The performance of a total of 60 classifiers has been analyzed from the application point of view, and the results are discussed. The motivation comes from the idea of providing a set of classifiers with excellent performance and implementing them in a single algorithm for the improvement of machine vision learning and, hence, associated applications.
Bjornsson, Christopher S; Lin, Gang; Al-Kofahi, Yousef; Narayanaswamy, Arunachalam; Smith, Karen L; Shain, William; Roysam, Badrinath
2009-01-01
Brain structural complexity has confounded prior efforts to extract quantitative image-based measurements. We present a systematic ‘divide and conquer’ methodology for analyzing three-dimensional (3D) multi-parameter images of brain tissue to delineate and classify key structures, and compute quantitative associations among them. To demonstrate the method, thick (~100 μm) slices of rat brain tissue were labeled using 3 – 5 fluorescent signals, and imaged using spectral confocal microscopy and unmixing algorithms. Automated 3D segmentation and tracing algorithms were used to delineate cell nuclei, vasculature, and cell processes. From these segmentations, a set of 23 intrinsic and 8 associative image-based measurements was computed for each cell. These features were used to classify astrocytes, microglia, neurons, and endothelial cells. Associations among cells and between cells and vasculature were computed and represented as graphical networks to enable further analysis. The automated results were validated using a graphical interface that permits investigator inspection and corrective editing of each cell in 3D. Nuclear counting accuracy was >89%, and cell classification accuracy ranged from 81–92% depending on cell type. We present a software system named FARSIGHT implementing our methodology. Its output is a detailed XML file containing measurements that may be used for diverse quantitative hypothesis-driven and exploratory studies of the central nervous system. PMID:18294697
Face recognition using total margin-based adaptive fuzzy support vector machines.
Liu, Yi-Hung; Chen, Yen-Ting
2007-01-01
This paper presents a new classifier called total margin-based adaptive fuzzy support vector machines (TAF-SVM) that deals with several problems that may occur in support vector machines (SVMs) when applied to face recognition. The proposed TAF-SVM not only solves the overfitting problem caused by outliers, by fuzzifying the penalty, but also corrects the skew of the optimal separating hyperplane caused by very imbalanced data sets, by using a different-cost algorithm. In addition, by introducing the total margin algorithm to replace the conventional soft margin algorithm, a lower generalization error bound can be obtained. These three functions are embodied in the traditional SVM to form the proposed TAF-SVM, which is reformulated for both linear and nonlinear cases. Using two databases, the Chung Yuan Christian University (CYCU) multiview and the facial recognition technology (FERET) face databases, and using the kernel Fisher's discriminant analysis (KFDA) algorithm to extract discriminating face features, experimental results show that the proposed TAF-SVM is superior to SVM in terms of face-recognition accuracy. The results also indicate that the proposed TAF-SVM achieves smaller error variances than SVM over a number of tests, so that better recognition stability can be obtained.
Haque, Mohammad Nazmul; Noman, Nasimul; Berretta, Regina; Moscato, Pablo
2016-01-01
Classification of datasets with imbalanced sample distributions has always been a challenge. In general, a popular approach for enhancing classification performance is the construction of an ensemble of classifiers. However, the performance of an ensemble is dependent on the choice of constituent base classifiers. Therefore, we propose a genetic algorithm-based search method for finding the optimum combination from a pool of base classifiers to form a heterogeneous ensemble. The algorithm, called GA-EoC, utilises 10-fold cross-validation on training data for evaluating the quality of each candidate ensemble. In order to combine the base classifiers’ decisions into the ensemble’s output, we used the simple and widely used majority voting approach. The proposed algorithm, along with the random sub-sampling approach to balance the class distribution, has been used for classifying class-imbalanced datasets. Additionally, if a feature set was not available, we used the (α, β) − k Feature Set method to select a better subset of features for classification. We have tested GA-EoC with three benchmarking datasets from the UCI-Machine Learning repository, one Alzheimer’s disease dataset and a subset of the PubFig database of Columbia University. In general, the performance of the proposed method on the chosen datasets is robust and better than that of the constituent base classifiers and many other well-known ensembles. Based on our empirical study we claim that a genetic algorithm is a superior and reliable approach to heterogeneous ensemble construction and we expect that the proposed GA-EoC would perform consistently in other cases. PMID:26764911
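The search described here can be sketched as a genetic algorithm over bitmasks, where each bit says whether a base classifier joins the majority-vote ensemble. This is a simplified illustration and not GA-EoC itself: fitness is plain accuracy on one labelled set rather than 10-fold cross-validation, the operators (elitism, uniform crossover, bit-flip mutation) and all parameter values are assumptions, and the predictions are invented.

```python
import random

def vote_accuracy(mask, predictions, truth):
    """Accuracy of the majority vote over the classifiers selected by mask."""
    chosen = [p for bit, p in zip(mask, predictions) if bit]
    if not chosen:
        return 0.0
    hits = 0
    for i, y in enumerate(truth):
        ones = sum(p[i] for p in chosen)
        vote = 1 if 2 * ones > len(chosen) else 0  # ties fall back to class 0
        hits += (vote == y)
    return hits / len(truth)

def ga_select(predictions, truth, generations=30, pop_size=12, p_mut=0.1, seed=0):
    rng = random.Random(seed)
    n = len(predictions)

    def fitness(mask):
        return vote_accuracy(mask, predictions, truth)

    # Start from every single-classifier ensemble, then pad with random masks.
    population = [tuple(int(i == j) for j in range(n)) for i in range(n)]
    while len(population) < pop_size:
        population.append(tuple(rng.randint(0, 1) for _ in range(n)))

    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        next_gen = population[:2]                      # elitism: keep the two best
        while len(next_gen) < pop_size:
            a, b = rng.sample(population[: pop_size // 2], 2)
            child = tuple(rng.choice(pair) for pair in zip(a, b))         # uniform crossover
            child = tuple(bit ^ (rng.random() < p_mut) for bit in child)  # bit-flip mutation
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)

# Invented binary predictions of four base classifiers on eight samples.
truth = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = [
    [1, 0, 1, 0, 0, 1, 1, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 1, 0],
    [1, 0, 0, 1, 0, 0, 1, 1],
]
best = ga_select(predictions, truth)
print(best, vote_accuracy(best, predictions, truth))
```

Because the initial population contains every single-classifier mask and the best individuals are always carried forward, the returned ensemble is never worse on this set than the best base classifier.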
The impact of missing trauma data on predicting massive transfusion
Trickey, Amber W.; Fox, Erin E.; del Junco, Deborah J.; Ning, Jing; Holcomb, John B.; Brasel, Karen J.; Cohen, Mitchell J.; Schreiber, Martin A.; Bulger, Eileen M.; Phelan, Herb A.; Alarcon, Louis H.; Myers, John G.; Muskat, Peter; Cotton, Bryan A.; Wade, Charles E.; Rahbar, Mohammad H.
2013-01-01
INTRODUCTION: Missing data are inherent in clinical research and may be especially problematic for trauma studies. This study describes a sensitivity analysis to evaluate the impact of missing data on clinical risk prediction algorithms. Three blood transfusion prediction models were evaluated utilizing an observational trauma dataset with valid missing data. METHODS: The PRospective Observational Multi-center Major Trauma Transfusion (PROMMTT) study included patients requiring ≥ 1 unit of red blood cells (RBC) at 10 participating U.S. Level I trauma centers from July 2009 – October 2010. Physiologic, laboratory, and treatment data were collected prospectively up to 24h after hospital admission. Subjects who received ≥ 10 RBC units within 24h of admission were classified as massive transfusion (MT) patients. Correct classification percentages for three MT prediction models were evaluated using complete case analysis and multiple imputation. A sensitivity analysis for missing data was conducted to determine the upper and lower bounds for correct classification percentages. RESULTS: PROMMTT enrolled 1,245 subjects. MT was received by 297 patients (24%). Missing percentage ranged from 2.2% (heart rate) to 45% (respiratory rate). Proportions of complete cases utilized in the MT prediction models ranged from 41% to 88%. All models demonstrated similar correct classification percentages using complete case analysis and multiple imputation. In the sensitivity analysis, correct classification upper-lower bound ranges per model were 4%, 10%, and 12%. Predictive accuracy for all models using PROMMTT data was lower than reported in the original datasets. CONCLUSIONS: Evaluating the accuracy of clinical prediction models with missing data can be misleading, especially with many predictor variables and moderate levels of missingness per variable. The proposed sensitivity analysis describes the influence of missing data on risk prediction algorithms. 
Reporting upper/lower bounds for percent correct classification may be more informative than multiple imputation, which provided similar results to complete case analysis in this study. PMID:23778514
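One simple way to obtain upper and lower bounds of this kind is a best-case/worst-case calculation: assume every subject excluded for missing predictors would have been classified wrongly (lower bound) or correctly (upper bound). The abstract does not describe how its bounds were derived, so this is an assumed illustration with invented counts rather than the study's procedure.

```python
def classification_bounds(n_correct, n_complete, n_missing):
    """Bounds on percent correct when n_missing subjects could not be scored.

    n_correct  -- correctly classified subjects among the complete cases
    n_complete -- subjects with all predictors observed
    n_missing  -- subjects excluded because a predictor was missing
    """
    n_total = n_complete + n_missing
    lower = n_correct / n_total                # all excluded subjects misclassified
    upper = (n_correct + n_missing) / n_total  # all excluded subjects classified correctly
    complete_case = n_correct / n_complete     # the usual complete-case estimate
    return lower, complete_case, upper

# Invented counts for illustration only.
lower, complete_case, upper = classification_bounds(n_correct=70, n_complete=80, n_missing=20)
print(f"lower={lower:.3f} complete-case={complete_case:.3f} upper={upper:.3f}")
```

The width `upper - lower` equals the fraction of excluded subjects, which is why bounds widen as missingness grows.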
NASA Astrophysics Data System (ADS)
Roverso, Davide
2003-08-01
Many-class learning is the problem of training a classifier to discriminate among a large number of target classes. Together with the problem of dealing with high-dimensional patterns (i.e. a high-dimensional input space), the many class problem (i.e. a high-dimensional output space) is a major obstacle to be faced when scaling-up classifier systems and algorithms from small pilot applications to large full-scale applications. The Autonomous Recursive Task Decomposition (ARTD) algorithm is here proposed as a solution to the problem of many-class learning. Example applications of ARTD to neural classifier training are also presented. In these examples, improvements in training time are shown to range from 4-fold to more than 30-fold in pattern classification tasks of both static and dynamic character.
Automatic morphological classification of galaxy images
Shamir, Lior
2009-01-01
We describe an image analysis supervised learning algorithm that can automatically classify galaxy images. The algorithm is first trained using manually classified images of elliptical, spiral, and edge-on galaxies. A large set of image features is extracted from each image, and the most informative features are selected using Fisher scores. Test images can then be classified using a simple Weighted Nearest Neighbor rule such that the Fisher scores are used as the feature weights. Experimental results show that galaxy images from Galaxy Zoo can be classified automatically into spiral, elliptical, and edge-on galaxies with an accuracy of ~90% compared to classifications carried out by the author. Full compilable source code of the algorithm is available for free download, and its general-purpose nature makes it suitable for other uses that involve automatic image analysis of celestial objects. PMID:20161594
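A minimal sketch of the two ideas in this abstract, Fisher scores as feature weights and a weighted nearest-neighbor rule, is given below. The Fisher score is taken here as the variance of the class means divided by the mean within-class variance, which is one common definition and an assumption about the paper's exact formula; the two-feature training data are invented.

```python
from statistics import mean, pvariance

def fisher_scores(samples, labels):
    """One score per feature: spread of class means over average within-class spread."""
    classes = sorted(set(labels))
    scores = []
    for f in range(len(samples[0])):
        groups = [[x[f] for x, y in zip(samples, labels) if y == c] for c in classes]
        between = pvariance([mean(g) for g in groups])
        within = mean(pvariance(g) for g in groups)
        scores.append(between / within if within > 0 else 0.0)
    return scores

def classify_wnn(x, samples, labels, weights):
    """Label of the training sample closest to x under a weighted squared distance."""
    def dist(a, b):
        return sum(w * (p - q) ** 2 for w, p, q in zip(weights, a, b))
    nearest = min(range(len(samples)), key=lambda i: dist(x, samples[i]))
    return labels[nearest]

# Invented data: feature 0 separates the classes, feature 1 is noise.
train = [(1.0, 5.0), (1.2, 9.0), (3.0, 5.5), (3.2, 8.5)]
labels = ["elliptical", "elliptical", "spiral", "spiral"]

weights = fisher_scores(train, labels)
predicted = classify_wnn((2.9, 9.0), train, labels, weights)
print(weights, predicted)
```

Features with a score near zero contribute almost nothing to the distance, so the weighting acts as a soft form of feature selection.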
Orżanowski, Tomasz
2016-01-01
This paper presents an infrared focal plane array (IRFPA) response nonuniformity correction (NUC) algorithm that is easy to implement in hardware. The proposed NUC algorithm is based on a linear correction scheme with a practical method for updating the pixel offset correction coefficients. The new approach uses the change in pixel response, measured with a shutter at the actual operating conditions relative to the reference conditions, to compensate for temporal drift of the pixel offset. It also removes any optics shading effect from the output image. To show the efficiency of the proposed NUC algorithm, test results for a microbolometer IRFPA are presented.
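The general shape of a linear NUC with a shutter-based offset update can be sketched as follows. This is a generic one-point update under stated assumptions, not the paper's specific coefficient update: each pixel is corrected as gain * raw + offset, and after the shutter (assumed to be a uniform scene) is imaged, offsets are chosen so the corrected shutter frame is flat at its mean level. The gains and shutter readings are invented.

```python
def apply_nuc(raw, gain, offset):
    """Per-pixel linear correction: corrected = gain * raw + offset."""
    return [[g * x + o for x, g, o in zip(raw_row, gain_row, offset_row)]
            for raw_row, gain_row, offset_row in zip(raw, gain, offset)]

def offsets_from_shutter(shutter, gain):
    """Pick offsets that make the gain-corrected shutter frame uniform."""
    scaled = [[g * s for s, g in zip(s_row, g_row)]
              for s_row, g_row in zip(shutter, gain)]
    values = [v for row in scaled for v in row]
    target = sum(values) / len(values)          # common level for all pixels
    return [[target - v for v in row] for row in scaled]

# Invented 2x2 array: per-pixel gains and raw readings of the closed shutter.
gain = [[1.0, 1.1], [0.9, 1.0]]
shutter = [[100.0, 95.0], [110.0, 102.0]]

offset = offsets_from_shutter(shutter, gain)
flat = apply_nuc(shutter, gain, offset)
print(flat)
```

Repeating the shutter measurement during operation refreshes `offset`, which is how a drift in pixel offsets would be tracked in this simplified scheme.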
Vision-based posture recognition using an ensemble classifier and a vote filter
NASA Astrophysics Data System (ADS)
Ji, Peng; Wu, Changcheng; Xu, Xiaonong; Song, Aiguo; Li, Huijun
2016-10-01
Posture recognition is an important means of Human-Robot Interaction (HRI). To segment the effective posture from an image, we propose an improved region-growing algorithm combined with the Single Gauss Color Model. Experiments show that the improved region-growing algorithm yields a more complete and accurate posture than the traditional Single Gauss Model and region-growing algorithm, and that it also eliminates similar regions from the background. For posture recognition, we propose a CNN ensemble classifier to improve the recognition rate, and a vote filter, applied to the sequence of recognition results, to reduce misjudgments during continuous gesture control. The proposed CNN ensemble classifier yields a 96.27% recognition rate, which is better than that of a single CNN classifier, and the proposed vote filter improves the recognition results and reduces misjudgments during consecutive gesture switches.
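A vote filter over a sequence of per-frame recognition results can be sketched as a sliding-window majority vote. The window length, the tie-breaking behavior, and the gesture labels below are assumptions made for illustration; the abstract does not specify the filter's details.

```python
from collections import Counter, deque

def vote_filter(labels, window=3):
    """Replace each label with the most common label among the last `window` results."""
    recent = deque(maxlen=window)
    filtered = []
    for label in labels:
        recent.append(label)
        winner, _ = Counter(recent).most_common(1)[0]
        filtered.append(winner)
    return filtered

# Hypothetical per-frame classifier outputs with one isolated misjudgment at index 3.
raw = ["fist", "fist", "fist", "palm", "fist", "fist", "palm", "palm", "palm", "palm"]
filtered = vote_filter(raw, window=3)
```

The isolated "palm" is suppressed, while the genuine switch to "palm" is accepted one frame late, which is the usual trade-off between stability and latency in this kind of filter.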
Testing of the Support Vector Machine for Binary-Class Classification
NASA Technical Reports Server (NTRS)
Scholten, Matthew
2011-01-01
The Support Vector Machine is a powerful algorithm, useful in classifying data into species. The Support Vector Machines implemented in this research were used as classifiers for the final stage in a Multistage Autonomous Target Recognition system. A single-kernel SVM known as SVMlight, and a modified version known as a Support Vector Machine with K-Means Clustering, were used. These SVM algorithms were tested as classifiers under varying conditions. Image noise levels varied, and the orientation of the targets changed. The classifiers were then optimized to demonstrate their maximum potential as classifiers. Results demonstrate the reliability of SVM as a method for classification. From trial to trial, SVM produces consistent results.
Sensor feature fusion for detecting buried objects
DOE Office of Scientific and Technical Information (OSTI.GOV)
Clark, G.A.; Sengupta, S.K.; Sherwood, R.J.
1993-04-01
Given multiple registered images of the Earth's surface from dual-band sensors, our system fuses information from the sensors to reduce the effects of clutter and improve the ability to detect buried or surface target sites. The sensor suite currently includes two sensors (5 micron and 10 micron wavelengths) and one ground penetrating radar (GPR) of the wide-band pulsed synthetic aperture type. We use a supervised learning pattern recognition approach to detect metal and plastic land mines buried in soil. The overall process consists of four main parts: preprocessing, feature extraction, feature selection, and classification. These parts are used in a two-step process to classify a subimage. The first step, referred to as feature selection, determines the features of sub-images which result in the greatest separability among the classes. The second step, image labeling, uses the selected features and the decisions from a pattern classifier to label the regions in the image which are likely to correspond to buried mines. We extract features from the images, and use feature selection algorithms to select only the most important features according to their contribution to correct detections. This allows us to save computational complexity and determine which of the sensors add value to the detection system. The most important features from the various sensors are fused using supervised learning pattern classifiers (including neural networks). We present results of experiments to detect buried land mines from real data, and evaluate the usefulness of fusing feature information from multiple sensor types, including dual-band infrared and ground penetrating radar. The novelty of the work lies mostly in the combination of the algorithms and their application to the very important and currently unsolved operational problem of detecting buried land mines from an airborne standoff platform.
Computing group cardinality constraint solutions for logistic regression problems.
Zhang, Yong; Kwon, Dongjin; Pohl, Kilian M
2017-01-01
We derive an algorithm to directly solve logistic regression under a group cardinality (group sparsity) constraint and use it to distinguish intra-subject MRI sequences (e.g. cine MRIs) of healthy subjects from those of diseased subjects. Group cardinality constraint models are often applied to medical images in order to avoid overfitting of the classifier to the training data. Solutions within these models are generally determined by relaxing the cardinality constraint to a weighted feature selection scheme. However, these solutions relate to the original sparse problem only under specific assumptions, which generally do not hold for medical image applications. In addition, inferring clinical meaning from features weighted by a classifier is an ongoing topic of discussion. Avoiding the weighting of features, we propose to directly solve the group cardinality constraint logistic regression problem by generalizing the Penalty Decomposition method. To do so, we assume that an intra-subject series of images represents repeated samples of the same disease patterns. We model this assumption by combining the series of measurements created by a feature across time into a single group. Our algorithm then derives a solution within that model by decoupling the minimization of the logistic regression function from enforcing the group sparsity constraint. The minimum of the smooth and convex logistic regression problem is determined via gradient descent, while we derive a closed-form solution for finding a sparse approximation of that minimum. We apply our method to cine MRI of 38 healthy controls and 44 adult patients who received reconstructive surgery for Tetralogy of Fallot (TOF) during infancy. Our method correctly identifies regions impacted by TOF and generally obtains statistically significantly higher classification accuracy than alternative solutions to this model, i.e., ones relaxing group cardinality constraints. Copyright © 2016 Elsevier B.V. All rights reserved.
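The closed-form sparse approximation step can be illustrated with the standard Euclidean projection onto a group cardinality constraint: keep the k groups with the largest norm and zero the rest. This sketch shows only that projection, under the assumption that it matches the role of the sparse-approximation step mentioned above; the weight vector, the grouping, and k are hypothetical, and the gradient-descent half of the method is not shown.

```python
def project_group_cardinality(w, groups, k):
    """Keep the k groups of w with the largest squared L2 norm; set all other entries to zero."""
    norms = [sum(w[i] ** 2 for i in group) for group in groups]
    keep = sorted(range(len(groups)), key=lambda j: norms[j], reverse=True)[:k]
    sparse = [0.0] * len(w)
    for j in keep:
        for i in groups[j]:
            sparse[i] = w[i]
    return sparse

# Hypothetical weights: each group collects one feature's measurements across time points.
w = [0.1, -0.2, 3.0, 1.0, 0.5, 0.4]
groups = [[0, 1], [2, 3], [4, 5]]
sparse_w = project_group_cardinality(w, groups, k=1)
```

Grouping a feature's measurements across time means a feature is either selected for the whole image series or dropped entirely, which mirrors the modeling assumption stated in the abstract.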
Latency correction of event-related potentials between different experimental protocols
NASA Astrophysics Data System (ADS)
Iturrate, I.; Chavarriaga, R.; Montesano, L.; Minguez, J.; Millán, JdR
2014-06-01
Objective. A fundamental issue in EEG event-related potentials (ERPs) studies is the amount of data required to have an accurate ERP model. This also impacts the time required to train a classifier for a brain-computer interface (BCI). This issue is mainly due to the poor signal-to-noise ratio and the large fluctuations of the EEG caused by several sources of variability. One of these sources is directly related to the experimental protocol or application designed, and may affect the amplitude or latency of ERPs. This usually prevents BCI classifiers from generalizing among different experimental protocols. In this paper, we analyze the effect of the amplitude and the latency variations among different experimental protocols based on the same type of ERP. Approach. We present a method to analyze and compensate for the latency variations in BCI applications. The algorithm has been tested on two widely used ERPs (P300 and observation error potentials), in three experimental protocols in each case. We report the ERP analysis and single-trial classification. Main results. The results obtained show that the designed experimental protocols significantly affect the latency of the recorded potentials but not the amplitudes. Significance. These results show how the use of latency-corrected data can be used to generalize the BCIs, reducing the calibration time when facing a new experimental protocol.
Watson, Robert A
2014-08-01
To test the hypothesis that machine learning algorithms increase the predictive power to classify surgical expertise using surgeons' hand motion patterns. In 2012 at the University of North Carolina at Chapel Hill, 14 surgical attendings and 10 first- and second-year surgical residents each performed two bench model venous anastomoses. During the simulated tasks, the participants wore an inertial measurement unit on the dorsum of their dominant (right) hand to capture their hand motion patterns. The pattern from each bench model task performed was preprocessed into a symbolic time series and labeled as expert (attending) or novice (resident). The labeled hand motion patterns were processed and used to train a Support Vector Machine (SVM) classification algorithm. The trained algorithm was then tested for discriminative/predictive power against unlabeled (blinded) hand motion patterns from tasks not used in the training. The Lempel-Ziv (LZ) complexity metric was also measured from each hand motion pattern, with an optimal threshold calculated to separately classify the patterns. The LZ metric classified unlabeled (blinded) hand motion patterns into expert and novice groups with an accuracy of 70% (sensitivity 64%, specificity 80%). The SVM algorithm had an accuracy of 83% (sensitivity 86%, specificity 80%). The results confirmed the hypothesis. The SVM algorithm increased the predictive power to classify blinded surgical hand motion patterns into expert versus novice groups. With further development, the system used in this study could become a viable tool for low-cost, objective assessment of procedural proficiency in a competency-based curriculum.
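A complexity score for a symbolic time series can be sketched by counting the distinct phrases produced by an incremental Lempel-Ziv parse. The variant below is an LZ78-style phrase count chosen for brevity; the study's exact LZ definition, its symbol alphabet, and its threshold are not given in the abstract, so everything here is an assumption.

```python
def lz_phrase_count(symbols):
    """Count phrases in an LZ78-style parse: each phrase is the shortest prefix not seen before."""
    seen = set()
    phrase = ""
    count = 0
    for s in symbols:
        phrase += s
        if phrase not in seen:
            seen.add(phrase)
            count += 1
            phrase = ""
    if phrase:
        count += 1  # trailing phrase that repeats an earlier one
    return count

# Hypothetical symbolic motion patterns of equal length.
constant = lz_phrase_count("aaaaaaaa")
periodic = lz_phrase_count("abababab")
irregular = lz_phrase_count("abcdbadc")
```

More regular sequences parse into fewer phrases, so a single threshold on this count gives a one-feature classifier of the kind the LZ baseline represents.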
New Dandelion Algorithm Optimizes Extreme Learning Machine for Biomedical Classification Problems
Li, Xiguang; Zhao, Liang; Gong, Changqing; Liu, Xiaojing
2017-01-01
Inspired by the behavior of dandelion sowing, a novel swarm intelligence algorithm, namely the dandelion algorithm (DA), is proposed in this paper for global optimization of complex functions. In DA, the dandelion population is divided into two subpopulations, and the subpopulations undergo different sowing behaviors. Moreover, another sowing method is designed to escape local optima. To validate DA, we compare the proposed algorithm with other existing algorithms, including the bat algorithm, particle swarm optimization, and the enhanced fireworks algorithm. Simulations show that the proposed algorithm appears much superior to the other algorithms. The proposed algorithm can also be applied to optimize the extreme learning machine (ELM) for biomedical classification problems, with considerable effect. Finally, we use different fusion methods to form different fusion classifiers, and the fusion classifiers can achieve higher accuracy and better stability to some extent. PMID:29085425
Andreini, Daniele; Lin, Fay Y; Rizvi, Asim; Cho, Iksung; Heo, Ran; Pontone, Gianluca; Bartorelli, Antonio L; Mushtaq, Saima; Villines, Todd C; Carrascosa, Patricia; Choi, Byoung Wook; Bloom, Stephen; Wei, Han; Xing, Yan; Gebow, Dan; Gransar, Heidi; Chang, Hyuk-Jae; Leipsic, Jonathon; Min, James K
2018-06-01
Motion artifact can reduce the diagnostic accuracy of coronary CT angiography (CCTA) for coronary artery disease (CAD). The purpose of this study was to compare the diagnostic performance of an algorithm dedicated to correcting coronary motion artifact with the performance of standard reconstruction methods in a prospective international multicenter study. Patients referred for clinically indicated invasive coronary angiography (ICA) for suspected CAD prospectively underwent an investigational CCTA examination free from heart rate-lowering medications before they underwent ICA. Blinded core laboratory interpretations of motion-corrected and standard reconstructions for obstructive CAD (≥ 50% stenosis) were compared with ICA findings. Segments unevaluable owing to artifact were considered obstructive. The primary endpoint was per-subject diagnostic accuracy of the intracycle motion correction algorithm for obstructive CAD found at ICA. Among 230 patients who underwent CCTA with the motion correction algorithm and standard reconstruction, 92 (40.0%) had obstructive CAD on the basis of ICA findings. At a mean heart rate of 68.0 ± 11.7 beats/min, the motion correction algorithm reduced the number of nondiagnostic scans compared with standard reconstruction (20.4% vs 34.8%; p < 0.001). Diagnostic accuracy for obstructive CAD with the motion correction algorithm (62%; 95% CI, 56-68%) was not significantly different from that of standard reconstruction on a per-subject basis (59%; 95% CI, 53-66%; p = 0.28) but was superior on a per-vessel basis: 77% (95% CI, 74-80%) versus 72% (95% CI, 69-75%) (p = 0.02). The motion correction algorithm was superior in subgroups of patients with severely obstructive (≥ 70%) stenosis, heart rate ≥ 70 beats/min, and vessels in the atrioventricular groove. The motion correction algorithm studied reduces artifacts and improves diagnostic performance for obstructive CAD on a per-vessel basis and in selected subgroups on a per-subject basis.
NASA Technical Reports Server (NTRS)
Gao, Bo-Cai; Montes, Marcos J.; Davis, Curtiss O.
2003-01-01
This SIMBIOS contract supports several activities over its three-year time-span. These include certain computational aspects of atmospheric correction, including the modification of our hyperspectral atmospheric correction algorithm Tafkaa for various multi-spectral instruments, such as SeaWiFS, MODIS, and GLI. Additionally, since absorbing aerosols are becoming common in many coastal areas, we are making the model calculations to incorporate various absorbing aerosol models into tables used by our Tafkaa atmospheric correction algorithm. Finally, we have developed the algorithms to use MODIS data to characterize thin cirrus effects on aerosol retrieval.
Imitating manual curation of text-mined facts in biomedicine.
Rodriguez-Esteban, Raul; Iossifov, Ivan; Rzhetsky, Andrey
2006-09-08
Text-mining algorithms make mistakes in extracting facts from natural-language texts. In biomedical applications, which rely on use of text-mined data, it is critical to assess the quality (the probability that the message is correctly extracted) of individual facts--to resolve data conflicts and inconsistencies. Using a large set of almost 100,000 manually produced evaluations (most facts were independently reviewed more than once, producing independent evaluations), we implemented and tested a collection of algorithms that mimic human evaluation of facts provided by an automated information-extraction system. The performance of our best automated classifiers closely approached that of our human evaluators (ROC score close to 0.95). Our hypothesis is that, were we to use a larger number of human experts to evaluate any given sentence, we could implement an artificial-intelligence curator that would perform the classification job at least as accurately as an average individual human evaluator. We illustrated our analysis by visualizing the predicted accuracy of the text-mined relations involving the term cocaine.
NASA Astrophysics Data System (ADS)
Tan, Xiangli; Yang, Jungang; Deng, Xinpu
2018-04-01
In the geometric correction of remote sensing images, a large number of redundant control points may occasionally result in low correction accuracy. To solve this problem, a control point filtering algorithm based on RANdom SAmple Consensus (RANSAC) is proposed. The basic idea of the RANSAC algorithm is to use the smallest possible data set to estimate the model parameters and then enlarge this set with consistent data points. In this paper, unlike traditional methods of geometric correction using Ground Control Points (GCPs), simulation experiments are carried out to correct remote sensing images using visible stars as control points. In addition, the accuracy of geometric correction without Star Control Point (SCP) optimization is also shown. The experimental results show that the SCP filtering method based on the RANSAC algorithm greatly improves the accuracy of remote sensing image correction.
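The RANSAC idea of fitting a model to a minimal sample and then growing the consensus set can be sketched with a deliberately simple model. A 2D line stands in for the geometric correction transform here, which is an assumption made for brevity; the tolerance, iteration count, and points are likewise hypothetical.

```python
import random

def ransac_line(points, n_iters=200, tol=0.5, seed=0):
    """Return the largest set of points consistent with a line through two sampled points."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(n_iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)   # minimal sample for a line
        if x1 == x2:
            continue                                  # skip vertical lines in this sketch
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        inliers = [(x, y) for x, y in points if abs(y - (a * x + b)) <= tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return best_inliers

def least_squares_line(points):
    """Refit slope and intercept on the consensus set."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x

# Hypothetical control points: ten lie on y = 2x + 1, three are gross outliers.
good = [(float(x), 2.0 * x + 1.0) for x in range(10)]
bad = [(2.0, 30.0), (5.0, -20.0), (8.0, 50.0)]
inliers = ransac_line(good + bad)
slope, intercept = least_squares_line(inliers)
```

Only the consensus set is used for the final fit, so the gross outliers no longer pull the estimated parameters.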
Brown, Anna M; Nagala, Sidhartha; McLean, Mary A; Lu, Yonggang; Scoffings, Daniel; Apte, Aditya; Gonen, Mithat; Stambuk, Hilda E; Shaha, Ashok R; Tuttle, R Michael; Deasy, Joseph O; Priest, Andrew N; Jani, Piyush; Shukla-Dave, Amita; Griffiths, John
2016-04-01
Ultrasound-guided fine needle aspirate cytology fails to diagnose many malignant thyroid nodules; consequently, patients may undergo diagnostic lobectomy. This study assessed whether textural analysis (TA) could noninvasively stratify thyroid nodules accurately using diffusion-weighted MRI (DW-MRI). This multi-institutional study examined 3T DW-MRI images obtained with spin echo echo planar imaging sequences. The training data set included 26 patients from Cambridge, United Kingdom, and the test data set included 18 thyroid cancer patients from Memorial Sloan Kettering Cancer Center (New York, New York, USA). Apparent diffusion coefficients (ADCs) were compared over regions of interest (ROIs) defined on thyroid nodules. TA, linear discriminant analysis (LDA), and feature reduction were performed using the 21 MaZda-generated texture parameters that best distinguished benign and malignant ROIs. Training data set mean ADC values were significantly different for benign and malignant nodules (P = 0.02) with a sensitivity and specificity of 70% and 63%, respectively, and a receiver operator characteristic (ROC) area under the curve (AUC) of 0.73. The LDA model of the top 21 textural features correctly classified 89/94 DW-MRI ROIs with 92% sensitivity, 96% specificity, and an AUC of 0.97. This algorithm correctly classified 16/18 (89%) patients in the independently obtained test set of thyroid DW-MRI scans. TA classifies thyroid nodules with high sensitivity and specificity on multi-institutional DW-MRI data sets. This method requires further validation in a larger prospective study. © 2015 The Authors. Magnetic Resonance in Medicine published by Wiley Periodicals, Inc. on behalf of International Society for Magnetic Resonance in Medicine.
Non-Uniformity Correction Using Nonlinear Characteristic Performance Curves for Calibration
NASA Astrophysics Data System (ADS)
Lovejoy, McKenna Roberts
Infrared imaging is an expansive field with many applications. Advances in infrared technology have led to a greater demand from both commercial and military sectors. However, a known problem with infrared imaging is its non-uniformity. This non-uniformity stems from the fact that each pixel in an infrared focal plane array has its own photoresponse. Many factors such as exposure time, temperature, and amplifier choice affect how the pixels respond to incoming illumination and thus impact image uniformity. To improve performance, non-uniformity correction (NUC) techniques are applied. Standard calibration-based techniques commonly use a linear model to approximate the nonlinear response. This often leaves unacceptable levels of residual non-uniformity. Calibration techniques often have to be repeated during use to continually correct the image. In this dissertation, alternatives to linear NUC algorithms are investigated. The goal of this dissertation is to determine and compare nonlinear non-uniformity correction algorithms. Ideally the results will provide better NUC performance, resulting in less residual non-uniformity, as well as reduce the need for recalibration. This dissertation will consider new approaches to nonlinear NUC such as higher-order polynomials and exponentials. More specifically, a new gain equalization algorithm has been developed. The various nonlinear non-uniformity correction algorithms will be compared with common linear non-uniformity correction algorithms. Performance will be compared based on RMS errors, residual non-uniformity, and the impact quantization has on correction. Performance will be improved by identifying and replacing bad pixels prior to correction. Two bad pixel identification and replacement techniques will be investigated and compared. Performance will be presented in the form of simulation results as well as before and after images taken with short wave infrared cameras.
The initial results show that a third-order polynomial with 16-bit precision gives significant improvement over the one- and two-point correction algorithms. All algorithms have been implemented in software with satisfactory results, and the third-order gain equalization non-uniformity correction algorithm has been implemented in hardware.
2017-01-01
In this paper, we propose a new automatic hyperparameter selection approach for determining the optimal network configuration (network structure and hyperparameters) for deep neural networks using particle swarm optimization (PSO) in combination with a steepest gradient descent algorithm. In the proposed approach, network configurations were coded as a set of real-number m-dimensional vectors as the individuals of the PSO algorithm in the search procedure. During the search procedure, the PSO algorithm is employed to search for optimal network configurations via the particles moving in a finite search space, and the steepest gradient descent algorithm is used to train the DNN classifier with a few training epochs (to find a local optimal solution) during the population evaluation of PSO. After the optimization scheme, the steepest gradient descent algorithm is performed with more epochs and the final solutions (pbest and gbest) of the PSO algorithm to train a final ensemble model and individual DNN classifiers, respectively. The local search ability of the steepest gradient descent algorithm and the global search capabilities of the PSO algorithm are exploited to determine an optimal solution that is close to the global optimum. We conducted several experiments on hand-written characters and biological activity prediction datasets to show that the DNN classifiers trained by the network configurations expressed by the final solutions of the PSO algorithm, employed to construct an ensemble model and individual classifier, outperform the random approach in terms of the generalization performance. Therefore, the proposed approach can be regarded as an alternative tool for automatic network structure and parameter selection for deep neural networks. PMID:29236718
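The PSO search loop with personal bests (pbest) and a global best (gbest) can be sketched as below. To keep the example self-contained, a hypothetical quadratic `proxy_loss` stands in for "train a DNN for a few epochs and return its validation loss"; the inertia and acceleration constants, the bounds, and the two-dimensional configuration vector are also assumptions, not values from the paper.

```python
import random

def pso_minimize(objective, bounds, n_particles=20, n_iters=100, seed=0):
    """Basic global-best PSO over a box-constrained real-valued search space."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    w, c1, c2 = 0.7, 1.5, 1.5   # inertia, cognitive and social weights (assumed values)
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def proxy_loss(config):
    """Hypothetical stand-in for a short training run scored on validation data."""
    return (config[0] - 3.0) ** 2 + (config[1] + 1.0) ** 2

best_config, best_loss = pso_minimize(proxy_loss, bounds=[(0.0, 10.0), (-5.0, 5.0)])
```

In the setting described above, each objective evaluation would be expensive, which is why the abstract limits training to a few epochs during the population evaluation.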
Jackowski, Konrad; Krawczyk, Bartosz; Woźniak, Michał
2014-05-01
Currently, methods of combined classification are the focus of intense research. A properly designed group of combined classifiers exploiting knowledge gathered in a pool of elementary classifiers can successfully outperform a single classifier. There are two essential issues to consider when creating combined classifiers: how to establish the most comprehensive pool and how to design a fusion model that allows for taking full advantage of the collected knowledge. In this work, we address these issues and propose AdaSS+, a training algorithm dedicated to the compound classifier system that effectively exploits local specialization of the elementary classifiers. The training procedure consists of two phases. The first phase detects the classifier competencies and adjusts the respective fusion parameters. The second phase boosts classification accuracy by elevating the degree of local specialization. The quality of the proposed algorithm is evaluated on the basis of a wide range of computer experiments, which show that AdaSS+ can outperform the original method and several reference classifiers.
Reduction from cost-sensitive ordinal ranking to weighted binary classification.
Lin, Hsuan-Tien; Li, Ling
2012-05-01
We present a reduction framework from ordinal ranking to binary classification. The framework consists of three steps: extracting extended examples from the original examples, learning a binary classifier on the extended examples with any binary classification algorithm, and constructing a ranker from the binary classifier. Based on the framework, we show that a weighted 0/1 loss of the binary classifier upper-bounds the mislabeling cost of the ranker, both error-wise and regret-wise. Our framework allows not only the design of good ordinal ranking algorithms based on well-tuned binary classification approaches, but also the derivation of new generalization bounds for ordinal ranking from known bounds for binary classification. In addition, our framework unifies many existing ordinal ranking algorithms, such as perceptron ranking and support vector ordinal regression. When compared empirically on benchmark data sets, some of our newly designed algorithms enjoy advantages in terms of both training speed and generalization performance over existing algorithms. In addition, the newly designed algorithms lead to better cost-sensitive ordinal ranking performance, as well as improved listwise ranking performance.
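The first and third steps of the framework, extracting extended examples and constructing a ranker from a binary classifier, can be sketched as follows. The 0/1 label encoding, the weight rule (absolute difference between adjacent entries of the cost vector), and the hand-written stand-in classifier are assumptions for illustration; the second step (training a real binary learner on the weighted extended examples) is omitted.

```python
def extend_examples(x, y, costs):
    """Turn one ordinal example into K-1 weighted binary examples.

    costs[k - 1] is the cost of predicting rank k for this example (ranks are 1..K).
    Each extended example asks: "is the rank of x greater than k?"
    """
    K = len(costs)
    extended = []
    for k in range(1, K):
        label = 1 if y > k else 0
        weight = abs(costs[k - 1] - costs[k])   # assumed weighting rule
        extended.append(((x, k), label, weight))
    return extended

def rank_from_binary(g, x, K):
    """Rank = 1 + number of thresholds k for which the binary classifier answers yes."""
    return 1 + sum(1 for k in range(1, K) if g(x, k) == 1)

# Hypothetical example with K = 4 ranks, true rank 3, and absolute cost |k - y|.
costs = [abs(k - 3) for k in range(1, 5)]
extended = extend_examples("item", 3, costs)

# Hand-written stand-in for a trained binary classifier on a scalar feature.
def g(x, k):
    return 1 if x > k + 0.5 else 0

predicted_rank = rank_from_binary(g, 2.7, K=4)
```

With the absolute cost every extended example receives weight 1, so the reduction becomes an ordinary unweighted binary problem in that special case.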
Power System Transient Stability Based on Data Mining Theory
NASA Astrophysics Data System (ADS)
Cui, Zhen; Shi, Jia; Wu, Runsheng; Lu, Dan; Cui, Mingde
2018-01-01
To study the stability of power systems, a transient stability assessment method based on data mining theory is designed. By introducing association rule analysis from data mining theory, an association classification method for transient stability assessment is presented. A mathematical model of transient stability assessment based on data mining technology is established. Combining rule reasoning with classification prediction, the association classification method is then used to perform transient stability assessment. The transient stability index is used to identify the samples that cannot be correctly classified by association classification. Then, according to the critical stability of each such sample, the time domain simulation method is used to determine its state, so as to ensure the accuracy of the final results. The results show that this stability assessment system can improve the speed of operation while keeping the analysis result correct, and that the improved algorithm can find the inherent relation between changes in the power system operation mode and changes in the degree of transient stability.
Collaborative reputation systems in a cultural heritage scenario
NASA Astrophysics Data System (ADS)
Cuomo, Salvatore; De Michele, Pasquale; Galletti, Ardelio; Ponti, Giovanni
2016-10-01
In the last decade, algorithms for reputation systems have been widely developed in order to achieve correct ratings for products, services, companies, digital content and people. We start from a comprehensive mathematical model for Collaborative Reputation Systems (CRSes), present in the literature and formally defined as a recurrence relation that generates a sequence of trust matrices, from which the reputation of the items and the raters can be derived. Even though this model can be applied to several scenarios, this work focuses on its application to a real case, namely a cultural event scenario. In more detail, in a cultural heritage environment, the data collected in an event represent the basic knowledge to be inferred. The main idea is to correctly use the available technology and data to give a reliable rating (reputation) for both visitors and artworks. These ratings will be very useful to classify the visiting style of the visitors and to identify the artworks that have most attracted visitors.
NASA Astrophysics Data System (ADS)
Oza, Nikunj
2012-03-01
A supervised learning task involves constructing a mapping from input data (normally described by several features) to the appropriate outputs. A set of training examples— examples with known output values—is used by a learning algorithm to generate a model. This model is intended to approximate the mapping between the inputs and outputs. This model can be used to generate predicted outputs for inputs that have not been seen before. Within supervised learning, one type of task is a classification learning task, in which each output is one or more classes to which the input belongs. For example, we may have data consisting of observations of sunspots. In a classification learning task, our goal may be to learn to classify sunspots into one of several types. Each example may correspond to one candidate sunspot with various measurements or just an image. A learning algorithm would use the supplied examples to generate a model that approximates the mapping between each supplied set of measurements and the type of sunspot. This model can then be used to classify previously unseen sunspots based on the candidate’s measurements. The generalization performance of a learned model (how closely the target outputs and the model’s predicted outputs agree for patterns that have not been presented to the learning algorithm) would provide an indication of how well the model has learned the desired mapping. More formally, a classification learning algorithm L takes a training set T as its input. The training set consists of |T| examples or instances. It is assumed that there is a probability distribution D from which all training examples are drawn independently—that is, all the training examples are independently and identically distributed (i.i.d.). 
The ith training example is of the form (x_i, y_i), where x_i is a vector of values of several features and y_i represents the class to be predicted. In the sunspot classification example given above, each training example would represent one sunspot's classification (y_i) and the corresponding set of measurements (x_i). The output of a supervised learning algorithm is a model h that approximates the unknown mapping from the inputs to the outputs. In our example, h would map from the sunspot measurements to the type of sunspot. We may have a test set S, a set of examples not used in training that we use to test how well the model h predicts the outputs on new examples. Just as with the examples in T, the examples in S are assumed to be independent and identically distributed (i.i.d.) draws from the distribution D. We measure the error of h on the test set as the proportion of test cases that h misclassifies: $\frac{1}{|S|} \sum_{(x,y) \in S} I(h(x) \neq y)$, where I(v) is the indicator function: it returns 1 if v is true and 0 otherwise. In our sunspot classification example, we would identify additional examples of sunspots that were not used in generating the model, and use these to determine how accurate the model is, that is, the fraction of the test samples that the model classifies correctly. An example of a classification model is the decision tree shown in Figure 23.1. We will discuss the decision tree learning algorithm in more detail later; for now, we assume that, given a training set with examples of sunspots, this decision tree is derived. This can be used to classify previously unseen examples of sunspots. For example, if a new sunspot's inputs indicate that its "Group Length" is in the range 10-15, then the decision tree would classify the sunspot as being of type "E," whereas if the "Group Length" is "NULL," the "Magnetic Type" is "bipolar," and the "Penumbra" is "rudimentary," then it would be classified as type "C."
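The test-set error defined above, the fraction of examples in S that h misclassifies, can be computed directly. In this sketch the model `h` encodes only the two decision-tree paths quoted in the text and returns a placeholder for every other input; the dictionary keys and the four test examples are hypothetical.

```python
def error_rate(h, S):
    """Proportion of (x, y) pairs in the test set S for which h(x) differs from y."""
    return sum(1 for x, y in S if h(x) != y) / len(S)

def h(x):
    """Partial stand-in model: only the two tree paths described in the text are encoded."""
    length = x["group_length"]
    if length is not None and 10 <= length <= 15:
        return "E"
    if length is None and x["magnetic_type"] == "bipolar" and x["penumbra"] == "rudimentary":
        return "C"
    return "unknown"   # placeholder for branches not described in the text

# Hypothetical test set: the last example's true label disagrees with the model.
S = [
    ({"group_length": 12, "magnetic_type": "bipolar", "penumbra": "mature"}, "E"),
    ({"group_length": None, "magnetic_type": "bipolar", "penumbra": "rudimentary"}, "C"),
    ({"group_length": 14, "magnetic_type": "unipolar", "penumbra": "none"}, "E"),
    ({"group_length": 11, "magnetic_type": "bipolar", "penumbra": "mature"}, "C"),
]
err = error_rate(h, S)
accuracy = 1.0 - err
```

Accuracy, the fraction classified correctly, is simply one minus this error.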
In this chapter, we will add to the above description of classification problems. We will discuss decision trees and several other classification models. In particular, we will discuss the learning algorithms that generate these classification models, how to use them to classify new examples, and the strengths and weaknesses of these models. We will end with pointers to further reading on classification methods applied to astronomy data.
Application of Metamorphic Testing to Supervised Classifiers
Xie, Xiaoyuan; Ho, Joshua; Kaiser, Gail; Xu, Baowen; Chen, Tsong Yueh
2010-01-01
Many applications in the field of scientific computing - such as computational biology, computational linguistics, and others - depend on Machine Learning algorithms to provide important core functionality to support solutions in the particular problem domains. However, it is difficult to test such applications because often there is no “test oracle” to indicate what the correct output should be for arbitrary input. To help address the quality of such software, in this paper we present a technique for testing the implementations of supervised machine learning classification algorithms on which such scientific computing software depends. Our technique is based on an approach called “metamorphic testing”, which has been shown to be effective in such cases. More importantly, we demonstrate that our technique not only serves the purpose of verification, but also can be applied in validation. In addition to presenting our technique, we describe a case study we performed on a real-world machine learning application framework, and discuss how programmers implementing machine learning algorithms can avoid the common pitfalls discovered in our study. We also discuss how our findings can be of use to other areas outside scientific computing, as well. PMID:21243103
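As a rough illustration of the idea, a metamorphic test checks a relation between the outputs of two related runs instead of comparing one output with a known correct answer. The sketch below applies two relations to a small k-nearest-neighbour classifier written for this example: permuting the attributes consistently, and scaling all attributes by the same positive factor, should leave predictions unchanged. The classifier, the data and the choice of relations are assumptions made here, not necessarily those used in the paper.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[int, ...]
Labelled = Tuple[Point, str]


def squared_distance(a: Sequence[int], b: Sequence[int]) -> int:
    return sum((p - q) ** 2 for p, q in zip(a, b))


def knn_predict(train: List[Labelled], query: Point, k: int = 3) -> str:
    """Majority vote among the k training points closest to the query."""
    neighbours = sorted(train, key=lambda ex: squared_distance(ex[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def permute(point: Point, order: Sequence[int]) -> Point:
    return tuple(point[i] for i in order)


def holds_under_attribute_permutation(train, queries, order, k=3) -> bool:
    """Relation 1: reordering the attributes must not change any prediction."""
    permuted = [(permute(x, order), y) for x, y in train]
    return all(knn_predict(train, q, k) == knn_predict(permuted, permute(q, order), k)
               for q in queries)


def holds_under_uniform_scaling(train, queries, factor, k=3) -> bool:
    """Relation 2: scaling every attribute by one positive factor keeps the
    neighbour ranking, so predictions must not change."""
    scaled = [(tuple(factor * v for v in x), y) for x, y in train]
    return all(knn_predict(train, q, k)
               == knn_predict(scaled, tuple(factor * v for v in q), k)
               for q in queries)


train = [((0, 0), "a"), ((1, 0), "a"), ((0, 1), "a"),
         ((5, 5), "b"), ((6, 5), "b"), ((5, 6), "b")]
queries = [(1, 1), (4, 4), (3, 2)]
```

Integer attributes are used so that the distances are exact; with floating-point data a violated relation may also indicate rounding effects rather than a defect.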
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nadeau, Jeremy S.; Wright, Bob W.; Synovec, Robert E.
2010-04-15
A critical comparison of methods for correcting severely retention time shifted gas chromatography-mass spectrometry (GC-MS) data is presented. The method reported herein is an adaptation to the Piecewise Alignment Algorithm to quickly align severely shifted one-dimensional (1D) total ion current (TIC) data, then applying these shifts to broadly align all mass channels throughout the separation, referred to as a TIC shift function (SF). The maximum shift varied from (-) 5 s in the beginning of the chromatographic separation to (+) 20 s toward the end of the separation, equivalent to a maximum shift of over 5 peak widths. Implementing the TIC shift function (TIC SF) prior to Fisher Ratio (F-Ratio) feature selection and then principal component analysis (PCA) was found to be a viable approach to classify complex chromatograms, that in this study were obtained from GC-MS separations of three gasoline samples serving as complex test mixtures, referred to as types C, M and S. The reported alignment algorithm via the TIC SF approach corrects for large dynamic shifting in the data as well as subtle peak-to-peak shifts. The benefits of the overall TIC SF alignment and feature selection approach were quantified using the degree-of-class separation (DCS) metric of the PCA scores plots using the type C and M samples, since they were the most similar, and thus the most challenging samples to properly classify. The DCS values showed an increase from an initial value of essentially zero for the unaligned GC-TIC data to a value of 7.9 following alignment; however, the DCS was unchanged by feature selection using F-Ratios for the GC-TIC data. The full mass spectral data provided an increase to a final DCS of 13.7 after alignment and two-dimensional (2D) F-Ratio feature selection.
Chen, Zhiru; Hong, Wenxue
2016-02-01
Considering the low prediction accuracy on positive samples and the poor overall classification caused by unbalanced sample data of microRNA (miRNA) targets, we propose in this paper a support vector machine (SVM)-integration of under-sampling and weight (IUSM) algorithm, an under-sampling method based on ensemble learning. The algorithm adopts SVM as the learning algorithm and AdaBoost as the integration framework, and embeds clustering-based under-sampling into the iterative process, aiming to reduce the imbalance between positive and negative samples. Meanwhile, during adaptive adjustment of the sample weights, the SVM-IUSM algorithm eliminates abnormal negative samples with a robust sample-weight smoothing mechanism so as to avoid over-learning. Finally, the integrated miRNA target classifier makes its prediction by combining multiple weak classifiers through a voting mechanism. The experiment revealed that SVM-IUSM, compared with other algorithms on a collection of unbalanced datasets, could not only improve the accuracy on positive targets and the overall classification, but also enhance the generalization ability of the miRNA target classifier.
Solution for the nonuniformity correction of infrared focal plane arrays.
Zhou, Huixin; Liu, Shangqian; Lai, Rui; Wang, Dabao; Cheng, Yubao
2005-05-20
Based on the S-curve model of the detector response of infrared focal plane arrays (IRFPAs), an improved two-point correction algorithm is presented. The algorithm first transforms the nonlinear image data into linear data and then uses the normal two-point algorithm to correct the linear data. The algorithm can effectively overcome the influence of nonlinearity of the detector's response, and it improves the correction precision and extends the dynamic range of the response. A real-time imaging-signal-processing system for IRFPAs that is based on a digital signal processor and field-programmable gate arrays is also presented. The nonuniformity correction capability of the presented solution is validated by experimental imaging procedures of a 128 x 128 pixel IRFPA camera prototype.
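The normal two-point step mentioned above can be sketched as follows, assuming two calibration frames taken while the array views uniform low and high reference sources. A gain and an offset are derived for every pixel so that both reference frames map onto their frame means. The S-curve linearisation that the paper applies first is not specified in the abstract and is therefore omitted; the function names and the 2 x 2 frames are made up for illustration.

```python
from typing import List, Tuple

Frame = List[List[float]]


def frame_mean(frame: Frame) -> float:
    values = [v for row in frame for v in row]
    return sum(values) / len(values)


def two_point_coefficients(low: Frame, high: Frame) -> Tuple[Frame, Frame]:
    """Per-pixel gain and offset from two uniform reference frames.

    A pixel whose low and high readings are equal (a dead pixel) would cause
    a division by zero; handling such pixels is left out of this sketch.
    """
    target_low, target_high = frame_mean(low), frame_mean(high)
    gains: Frame = []
    offsets: Frame = []
    for low_row, high_row in zip(low, high):
        gain_row = [(target_high - target_low) / (h - l)
                    for l, h in zip(low_row, high_row)]
        offset_row = [target_low - g * l for g, l in zip(gain_row, low_row)]
        gains.append(gain_row)
        offsets.append(offset_row)
    return gains, offsets


def correct(raw: Frame, gains: Frame, offsets: Frame) -> Frame:
    """Apply corrected = gain * raw + offset to every pixel."""
    return [[g * v + o for v, g, o in zip(raw_row, gain_row, offset_row)]
            for raw_row, gain_row, offset_row in zip(raw, gains, offsets)]


# Made-up reference frames from a detector with pixel-to-pixel gain variation.
low_frame = [[10.0, 12.0], [8.0, 10.0]]
high_frame = [[110.0, 132.0], [88.0, 110.0]]
gains, offsets = two_point_coefficients(low_frame, high_frame)
flat_low = correct(low_frame, gains, offsets)
flat_high = correct(high_frame, gains, offsets)
```

After correction both reference frames become uniform, which is the property the two-point method is built to guarantee when the response is linear between the two calibration levels.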
Peng, Jiangtao; Peng, Silong; Xie, Qiong; Wei, Jiping
2011-04-01
In order to eliminate lower order polynomial interferences, a new quantitative calibration algorithm, "Baseline Correction Combined Partial Least Squares (BCC-PLS)", which combines baseline correction and conventional PLS, is proposed. By embedding baseline correction constraints into PLS weights selection, the proposed calibration algorithm overcomes the uncertainty in baseline correction and can meet the requirement of on-line attenuated total reflectance Fourier transform infrared (ATR-FTIR) quantitative analysis. The effectiveness of the algorithm is evaluated by the analysis of glucose and marzipan ATR-FTIR spectra. The BCC-PLS algorithm shows improved prediction performance over PLS. The root mean square error of cross-validation (RMSECV) on marzipan spectra for the prediction of the moisture is found to be 0.53%, w/w (range 7-19%). The sugar content is predicted with an RMSECV of 2.04%, w/w (range 33-68%). Copyright © 2011 Elsevier B.V. All rights reserved.
Development and Validation of the Pediatric Medical Complexity Algorithm (PMCA) Version 3.0.
Simon, Tamara D; Haaland, Wren; Hawley, Katherine; Lambka, Karen; Mangione-Smith, Rita
2018-02-26
To modify the Pediatric Medical Complexity Algorithm (PMCA) to include both International Classification of Diseases, Ninth and Tenth Revisions, Clinical Modification (ICD-9/10-CM) codes for classifying children with chronic disease (CD) by level of medical complexity and to assess the sensitivity and specificity of the new PMCA version 3.0 for correctly identifying level of medical complexity. To create version 3.0, PMCA version 2.0 was modified to include ICD-10-CM codes. We applied PMCA version 3.0 to Seattle Children's Hospital data for children with ≥1 emergency department (ED), day surgery, and/or inpatient encounter from January 1, 2016, to June 30, 2017. Starting with the encounter date, up to 3 years of retrospective discharge data were used to classify children as having complex chronic disease (C-CD), noncomplex chronic disease (NC-CD), and no CD. We then selected a random sample of 300 children (100 per CD group). Blinded medical record review was conducted to ascertain the levels of medical complexity for these 300 children. The sensitivity and specificity of PMCA version 3.0 was assessed. PMCA version 3.0 identified children with C-CD with 86% sensitivity and 86% specificity, children with NC-CD with 65% sensitivity and 84% specificity, and children without CD with 77% sensitivity and 93% specificity. PMCA version 3.0 is an updated publicly available algorithm that identifies children with C-CD, who have accessed tertiary hospital emergency department, day surgery, or inpatient care, with very good sensitivity and specificity when applied to hospital discharge data and with performance to earlier versions of PMCA. Copyright © 2018 Academic Pediatric Association. Published by Elsevier Inc. All rights reserved.
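For a three-group classification such as C-CD, NC-CD and no CD, sensitivity and specificity are usually reported one group at a time, treating that group as positive and the other two as negative. A minimal sketch of that calculation follows; the label strings and the six reviewed cases are invented for illustration and are not data from the study.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(reference: Sequence[str],
                            predicted: Sequence[str],
                            positive: str) -> Tuple[float, float]:
    """One-vs-rest sensitivity and specificity for the group `positive`.

    `reference` holds the gold-standard labels (for example from record
    review) and `predicted` holds the algorithm's labels.
    """
    tp = fn = tn = fp = 0
    for ref, pred in zip(reference, predicted):
        if ref == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


# Invented example: six children, labels from review vs. from the algorithm.
review = ["C-CD", "C-CD", "NC-CD", "NC-CD", "no CD", "no CD"]
algorithm = ["C-CD", "NC-CD", "NC-CD", "NC-CD", "no CD", "C-CD"]
sens, spec = sensitivity_specificity(review, algorithm, positive="C-CD")
```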
Quantum ensembles of quantum classifiers.
Schuld, Maria; Petruccione, Francesco
2018-02-09
Quantum machine learning is witnessing an increasing number of quantum algorithms for data-driven decision making, a problem with potential applications ranging from automated image recognition to medical diagnosis. Many of those algorithms are implementations of quantum classifiers, or models for the classification of data inputs with a quantum computer. Following the success of collective decision making with ensembles in classical machine learning, this paper introduces the concept of quantum ensembles of quantum classifiers. Creating the ensemble corresponds to a state preparation routine, after which the quantum classifiers are evaluated in parallel and their combined decision is accessed by a single-qubit measurement. This framework naturally allows for exponentially large ensembles in which, similar to Bayesian learning, the individual classifiers do not have to be trained. As an example, we analyse an exponentially large quantum ensemble in which each classifier is weighted according to its performance in classifying the training data, leading to new results for quantum as well as classical machine learning.
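A purely classical analogue of the accuracy-weighted ensemble may help fix the idea: enumerate a family of untrained classifiers, weight each one by its accuracy on the training data, and take the sign of the weighted vote. The sketch below does this for one-dimensional threshold classifiers. It is an assumption-laden toy, says nothing about the quantum state preparation or measurement, and the classifier family, data and function names are chosen here for illustration only.

```python
from typing import Callable, List, Tuple

Classifier = Callable[[float], int]


def make_stump(threshold: float, polarity: int) -> Classifier:
    """Untrained base classifier: outputs +polarity above the threshold."""
    def stump(x: float) -> int:
        return polarity if x > threshold else -polarity
    return stump


def accuracy(h: Classifier, data: List[Tuple[float, int]]) -> float:
    return sum(1 for x, y in data if h(x) == y) / len(data)


def weighted_vote(ensemble: List[Classifier], weights: List[float], x: float) -> int:
    """Sign of the accuracy-weighted sum of the individual decisions."""
    total = sum(w * h(x) for h, w in zip(ensemble, weights))
    return 1 if total >= 0 else -1


# Toy training data with labels in {-1, +1}.
training = [(0.0, -1), (1.0, -1), (2.0, -1), (3.0, 1), (4.0, 1), (5.0, 1)]

# Every threshold with both polarities; none of these is fitted to the data.
ensemble = [make_stump(t, p)
            for p in (1, -1)
            for t in (0.5, 1.5, 2.5, 3.5, 4.5)]
weights = [accuracy(h, training) for h in ensemble]
```

Because each classifier and its negation both belong to the family, a pair with accuracies a and 1 - a contributes (2a - 1) times the better member's decision, so classifiers that do no better than chance cancel out.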
Comparison of Different EHG Feature Selection Methods for the Detection of Preterm Labor
Alamedine, D.; Khalil, M.; Marque, C.
2013-01-01
Numerous types of linear and nonlinear features have been extracted from the electrohysterogram (EHG) in order to classify labor and pregnancy contractions. As a result, the number of available features is now very large. The goal of this study is to reduce the number of features by selecting only the relevant ones which are useful for solving the classification problem. This paper presents three methods for feature subset selection that can be applied to choose the best subsets for classifying labor and pregnancy contractions: an algorithm using the Jeffrey divergence (JD) distance, a sequential forward selection (SFS) algorithm, and a binary particle swarm optimization (BPSO) algorithm. The two last methods are based on a classifier and were tested with three types of classifiers. These methods have allowed us to identify common features which are relevant for contraction classification. PMID:24454536
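Sequential forward selection can be sketched as a greedy loop: start with no features and repeatedly add the single feature that most improves a subset score. In a wrapper setting such as the one described above, that score would be the performance of a classifier trained on the candidate subset. The feature names and the scoring function below are hypothetical stand-ins so that the example runs without any data.

```python
from typing import Callable, FrozenSet, List, Sequence


def sequential_forward_selection(features: Sequence[str],
                                 score: Callable[[FrozenSet[str]], float],
                                 n_select: int) -> List[str]:
    """Greedily pick n_select features, one best addition at a time."""
    if not 0 < n_select <= len(features):
        raise ValueError("n_select must be between 1 and the number of features")
    selected: List[str] = []
    remaining = list(features)
    while len(selected) < n_select:
        # Score every candidate subset formed by adding one remaining feature.
        best = max(remaining, key=lambda f: score(frozenset(selected + [f])))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical stand-in for classifier accuracy: each feature has a utility,
# and two of the features are largely redundant with each other.
UTILITY = {"rms": 0.5, "median_freq": 0.4, "sample_entropy": 0.1}


def toy_score(subset: FrozenSet[str]) -> float:
    total = sum(UTILITY[f] for f in subset)
    if {"rms", "median_freq"} <= subset:
        total -= 0.35  # redundancy penalty
    return total


chosen = sequential_forward_selection(list(UTILITY), toy_score, n_select=2)
```

The second pick shows why subsets are scored jointly: the individually second-best feature is passed over because it adds little once the first one is in the set.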
Vehicle Classification Using an Imbalanced Dataset Based on a Single Magnetic Sensor.
Xu, Chang; Wang, Yingguan; Bao, Xinghe; Li, Fengrong
2018-05-24
This paper aims to improve the accuracy of automatic vehicle classifiers for imbalanced datasets. Classification is made through utilizing a single anisotropic magnetoresistive sensor, with the models of vehicles involved being classified into hatchbacks, sedans, buses, and multi-purpose vehicles (MPVs). Using time domain and frequency domain features in combination with three common classification algorithms in pattern recognition, we develop a novel feature extraction method for vehicle classification. These three common classification algorithms are the k-nearest neighbor, the support vector machine, and the back-propagation neural network. Nevertheless, a problem remains with the original vehicle magnetic dataset collected being imbalanced, and may lead to inaccurate classification results. With this in mind, we propose an approach called SMOTE, which can further boost the performance of classifiers. Experimental results show that the k-nearest neighbor (KNN) classifier with the SMOTE algorithm can reach a classification accuracy of 95.46%, thus minimizing the effect of the imbalance.
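The core of SMOTE (synthetic minority over-sampling technique) is interpolation: each synthetic sample lies on the line segment between a minority-class sample and one of its nearest minority-class neighbours. A minimal sketch under that description is given below; the function name, its parameters and the sample points are assumptions made here, and production work would normally rely on an established implementation.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, ...]


def smote(minority: List[Point], n_synthetic: int, k: int = 3,
          seed: int = 0) -> List[Point]:
    """Create n_synthetic points by interpolating between minority samples."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[Point] = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        # k nearest minority-class neighbours of the chosen base sample
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b)
                               for b, n in zip(base, neighbour)))
    return synthetic


# Made-up minority-class feature vectors.
minority_samples = [(1.0, 2.0), (1.5, 2.5), (2.0, 2.0), (1.2, 3.0)]
new_samples = smote(minority_samples, n_synthetic=5)
```

The synthetic points are added to the training set only; the test set should keep its original class proportions so that the reported accuracy is not inflated.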
Cloud Detection of Optical Satellite Images Using Support Vector Machine
NASA Astrophysics Data System (ADS)
Lee, Kuan-Yi; Lin, Chao-Hung
2016-06-01
Cloud cover is generally present in optical remote-sensing images, which limits the usage of acquired images and increases the difficulty of data analysis, such as image compositing, correction of atmospheric effects, calculation of vegetation indices, land cover classification, and land cover change detection. In previous studies, thresholding has been a common and useful method for cloud detection. However, a selected threshold is usually suitable only for certain cases or local study areas, and it may fail in other cases. In other words, thresholding-based methods are data-sensitive. Besides, there are many exceptions to control, and the environment changes dynamically, so using the same threshold value on various data is not effective. In this study, a threshold-free method based on Support Vector Machine (SVM) is proposed, which can avoid the abovementioned problems. The main idea of this study is to adopt a statistical model to detect clouds instead of a subjective thresholding-based method. The features used in a classifier are the key to a successful classification. Therefore, the Automatic Cloud Cover Assessment (ACCA) algorithm, which is based on physical characteristics of clouds, is used to distinguish clouds from other objects. Similarly, the algorithm called Fmask (Zhu et al., 2012) uses many thresholds and criteria to screen clouds, cloud shadows, and snow. The feature extraction is therefore based on the ACCA algorithm and Fmask. Spatial and temporal information is also important for satellite images. Consequently, the co-occurrence matrix and the temporal variance with uniformity of the major principal axis are used in the proposed method. We aim to classify images into three groups: cloud, non-cloud and others. In experiments, images acquired by the Landsat 7 Enhanced Thematic Mapper Plus (ETM+) and images containing landscapes of agriculture, snow areas, and islands are tested. Experimental results demonstrate that the detection accuracy of the proposed method is better than that of related methods.
Yu, Shuang; Liu, Guo-hai; Xia, Rong-sheng; Jiang, Hui
2016-01-01
To achieve rapid monitoring of the process state of solid state fermentation (SSF), this study attempted qualitative identification of the process state of SSF of feed protein using Fourier transform near infrared (FT-NIR) spectroscopy. More specifically, FT-NIR spectroscopy combined with the Adaboost-SRDA-NN ensemble learning algorithm was used to accurately and rapidly monitor chemical and physical changes in SSF of feed protein without the need for chemical analysis. First, the raw spectra of all 140 fermentation samples were collected with a Fourier transform near infrared spectrometer (Antaris II) and preprocessed with the standard normal variate (SNV) transformation. Thereafter, the characteristic information of the preprocessed spectra was extracted by spectral regression discriminant analysis (SRDA). Finally, the nearest neighbors (NN) algorithm was selected as the base classifier to build a state recognition model that identifies the different fermentation samples in the validation set. The SRDA-NN model outperformed two other NN models, which were built on features extracted by principal component analysis (PCA) and linear discriminant analysis (LDA), and its correct recognition rate reached 94.28% in the validation set. To further improve the recognition accuracy of the final model, the Adaboost-SRDA-NN ensemble learning algorithm was proposed by integrating the Adaboost and SRDA-NN methods, and it was used to construct the online monitoring model of the process state of SSF of feed protein. With Adaboost boosting, the prediction performance of the SRDA-NN model was further enhanced, and the correct recognition rate of the Adaboost-SRDA-NN model reached 100% in the validation set. The overall results demonstrate that the SRDA algorithm can effectively extract spectral feature information and reduce the spectral dimension in the model calibration process of qualitative NIR analysis. In addition, the Adaboost boosting algorithm can improve the classification accuracy of the final model. These results provide a research foundation for developing online instruments for monitoring the SSF process.
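The SNV preprocessing step named above is simple enough to sketch: each spectrum is centred on its own mean and divided by its own standard deviation, which removes additive offsets and multiplicative scaling between spectra. The sketch uses the sample standard deviation (n - 1 in the denominator), which is an assumption since the abstract does not say which form was used; the example values are made up.

```python
import statistics
from typing import List, Sequence


def snv(spectrum: Sequence[float]) -> List[float]:
    """Standard normal variate: per-spectrum centring and scaling."""
    mean = statistics.fmean(spectrum)
    std = statistics.stdev(spectrum)  # sample standard deviation (n - 1)
    if std == 0:
        raise ValueError("a constant spectrum cannot be SNV-transformed")
    return [(value - mean) / std for value in spectrum]


# Made-up absorbance values, plus the same spectrum with a baseline offset
# and a multiplicative scatter factor applied.
raw = [1.0, 4.0, 2.5, 3.0]
distorted = [2.0 * value + 5.0 for value in raw]

snv_raw = snv(raw)
snv_distorted = snv(distorted)
```

Both versions of the spectrum give the same transformed values, which is the reason the transform is applied before feature extraction.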
Machine Learning Algorithms for Automatic Classification of Marmoset Vocalizations
Ribeiro, Sidarta; Pereira, Danillo R.; Papa, João P.; de Albuquerque, Victor Hugo C.
2016-01-01
Automatic classification of vocalization type could potentially become a useful tool for the acoustic monitoring of captive colonies of highly vocal primates. However, for classification to be useful in practice, a reliable algorithm that can be successfully trained on small datasets is necessary. In this work, we consider seven different classification algorithms with the goal of finding a robust classifier that can be successfully trained on small datasets. We found good classification performance (accuracy > 0.83 and F1-score > 0.84) using the Optimum Path Forest classifier. Dataset and algorithms are made publicly available. PMID:27654941
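The two reported metrics can be computed as sketched below. For a multi-class problem the F1-score has to be averaged over classes in some way; macro-averaging (an unweighted mean of the per-class scores) is assumed here because the abstract does not state which average was used. The call-type labels and predictions are illustrative only.

```python
from typing import Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denominator = 2 * tp + fp + fn
        scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(scores) / len(scores)


# Illustrative labels for six calls and a classifier's predictions.
truth = ["phee", "phee", "trill", "trill", "twitter", "twitter"]
guess = ["phee", "trill", "trill", "trill", "twitter", "twitter"]

acc = accuracy(truth, guess)
f1 = macro_f1(truth, guess)
```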
NASA Astrophysics Data System (ADS)
Devpura, S.; Siddiqui, M. S.; Chen, D.; Liu, D.; Li, H.; Kumar, S.; Gordon, J.; Ajlouni, M.; Movsas, B.; Chetty, I. J.
2014-03-01
The purpose of this study was to systematically evaluate dose distributions computed with 5 different dose algorithms for patients with lung cancers treated using stereotactic ablative body radiotherapy (SABR). Treatment plans for 133 lung cancer patients, initially computed with a 1D-pencil beam (equivalent-path-length, EPL-1D) algorithm, were recalculated with 4 other algorithms commissioned for treatment planning, including 3-D pencil-beam (EPL-3D), anisotropic analytical algorithm (AAA), collapsed cone convolution superposition (CCC), and Monte Carlo (MC). The plan prescription dose was 48 Gy in 4 fractions normalized to the 95% isodose line. Tumors were classified according to location: peripheral tumors surrounded by lung (lung-island, N=39), peripheral tumors attached to the rib-cage or chest wall (lung-wall, N=44), and centrally-located tumors (lung-central, N=50). Relative to the EPL-1D algorithm, PTV D95 and mean dose values computed with the other 4 algorithms were lowest for "lung-island" tumors with smallest field sizes (3-5 cm). On the other hand, the smallest differences were noted for lung-central tumors treated with largest field widths (7-10 cm). Amongst all locations, dose distribution differences were most strongly correlated with tumor size for lung-island tumors. For most cases, convolution/superposition and MC algorithms were in good agreement. Mean lung dose (MLD) values computed with the EPL-1D algorithm were highly correlated with that of the other algorithms (correlation coefficient =0.99). The MLD values were found to be ~10% lower for small lung-island tumors with the model-based (conv/superposition and MC) vs. the correction-based (pencil-beam) algorithms with the model-based algorithms predicting greater low dose spread within the lungs. This study suggests that pencil beam algorithms should be avoided for lung SABR planning. 
For the most challenging cases, small tumors surrounded entirely by lung tissue (lung-island type), a Monte-Carlo-based algorithm may be warranted.
NASA Astrophysics Data System (ADS)
Lee, Dong Hyuk; Kim, JongHyo; Kim, Hee C.; Lee, Yong W.; Min, Byong Goo
1997-04-01
There have been a number of studies on the quantitative evaluation of diffuse liver disease using texture analysis techniques. However, previous studies have focused on classification between only normal and abnormal patterns based on textural properties, resulting in a lack of clinically useful information about the progressive status of liver disease. Considering our collaborative research experience with clinical experts, we judged that not only texture information but also several shape properties are necessary in order to successfully classify the various states of disease from a liver ultrasonogram. Nine image parameters were selected experimentally. One of these was a texture parameter and the others were shape parameters measured as length, area and curvature. We have developed a neural-net algorithm that classifies liver ultrasonograms into 9 categories of liver disease: 3 main categories with 3 sub-steps each. The nine parameters were collected semi-automatically from the user with a graphical user interface tool, and then processed to give a grade for each parameter. The classifying algorithm consists of two steps. In the first step, each parameter was graded into pre-defined levels using a neural network. In the next step, a neural network classifier determined the disease status from the nine graded parameters. We implemented a PC-based computer-assisted diagnosis workstation and installed it in the radiology department of Seoul National University Hospital. Using this workstation we collected 662 cases during 6 months. Some of these were used for training and others were used for evaluating the accuracy of the developed algorithm. In conclusion, a liver ultrasonogram classifying algorithm was developed using both texture and shape parameters and a neural network classifier. Preliminary results indicate that the proposed algorithm is useful for evaluation of diffuse liver disease.
Triacylglycerol stereospecific analysis and linear discriminant analysis for milk speciation.
Blasi, Francesca; Lombardi, Germana; Damiani, Pietro; Simonetti, Maria Stella; Giua, Laura; Cossignani, Lina
2013-05-01
Product authenticity is an important topic in the dairy sector. Dairy products sold for public consumption must be accurately labelled in accordance with the milk species they contain. Linear discriminant analysis (LDA), a common chemometric procedure, has been applied to fatty acid percentage composition to classify pure milk samples (cow, ewe, buffalo, donkey, goat). All original grouped cases were correctly classified, while 90% of cross-validated grouped cases were correctly classified. Another objective of this research was the characterisation of cow-ewe milk mixtures in order to reveal a common fraud in the dairy field, namely the addition of cow milk to ewe milk. Stereospecific analysis of triacylglycerols (TAG), a method based on chemical-enzymatic procedures coupled with chromatographic techniques, has been carried out to detect fraudulent milk additions, in particular 1, 3, 5% cow milk added to ewe milk. When only TAG composition data were used for the elaboration, 75% of original grouped cases were correctly classified, while all samples were correctly classified when both total and intrapositional TAG data were used. The results of cross-validation were also better when TAG stereospecific analysis data were considered as LDA variables. In particular, 100% of cross-validated grouped cases were correctly classified when 5% cow milk mixtures were considered.
NASA Astrophysics Data System (ADS)
Ruske, Simon; Topping, David O.; Foot, Virginia E.; Kaye, Paul H.; Stanley, Warren R.; Crawford, Ian; Morse, Andrew P.; Gallagher, Martin W.
2017-03-01
Characterisation of bioaerosols has important implications within environment and public health sectors. Recent developments in ultraviolet light-induced fluorescence (UV-LIF) detectors such as the Wideband Integrated Bioaerosol Spectrometer (WIBS) and the newly introduced Multiparameter Bioaerosol Spectrometer (MBS) have allowed for the real-time collection of fluorescence, size and morphology measurements for the purpose of discriminating between bacteria, fungal spores and pollen. This new generation of instruments has enabled ever larger data sets to be compiled with the aim of studying more complex environments. In real world data sets, particularly those from an urban environment, the population may be dominated by non-biological fluorescent interferents, bringing into question the accuracy of measurements of quantities such as concentrations. It is therefore imperative that we validate the performance of different algorithms which can be used for the task of classification. For unsupervised learning we tested hierarchical agglomerative clustering with various different linkages. For supervised learning, 11 methods were tested, including decision trees, ensemble methods (random forests, gradient boosting and AdaBoost), two implementations for support vector machines (libsvm and liblinear) and Gaussian methods (Gaussian naïve Bayesian, quadratic and linear discriminant analysis, the k-nearest neighbours algorithm and artificial neural networks). The methods were applied to two different data sets produced using the new MBS, which provides multichannel UV-LIF fluorescence signatures for single airborne biological particles. The first data set contained mixed PSLs and the second contained a variety of laboratory-generated aerosol. Clustering in general performs slightly worse than the supervised learning methods, correctly classifying, at best, only 67.6 and 91.1 % for the two data sets respectively. For supervised learning the gradient boosting algorithm was found to be the most effective, on average correctly classifying 82.8 and 98.27 % of the testing data, respectively, across the two data sets. A possible alternative to gradient boosting is neural networks. We do however note that this method requires much more user input than the other methods, and we suggest that further research should be conducted using this method, especially using parallelised hardware such as the GPU, which would allow for larger networks to be trained, which could possibly yield better results. We also saw that some methods, such as clustering, failed to utilise the additional shape information provided by the instrument, whilst for others, such as the decision trees, ensemble methods and neural networks, improved performance could be attained with the inclusion of such information.
Automatic discrimination of fine roots in minirhizotron images.
Zeng, Guang; Birchfield, Stanley T; Wells, Christina E
2008-01-01
Minirhizotrons provide detailed information on the production, life history and mortality of fine roots. However, manual processing of minirhizotron images is time-consuming, limiting the number and size of experiments that can reasonably be analysed. Previously, an algorithm was developed to automatically detect and measure individual roots in minirhizotron images. Here, species-specific root classifiers were developed to discriminate detected roots from bright background artifacts. Classifiers were developed from training images of peach (Prunus persica), freeman maple (Acer x freemanii) and sweetbay magnolia (Magnolia virginiana) using the Adaboost algorithm. True- and false-positive rates for classifiers were estimated using receiver operating characteristic curves. Classifiers gave true positive rates of 89-94% and false positive rates of 3-7% when applied to nontraining images of the species for which they were developed. The application of a classifier trained on one species to images from another species resulted in little or no reduction in accuracy. These results suggest that a single root classifier can be used to distinguish roots from background objects across multiple minirhizotron experiments. By incorporating root detection and discrimination algorithms into an open-source minirhizotron image analysis application, many analysis tasks that are currently performed by hand can be automated.
Marks, Daniel L; Oldenburg, Amy L; Reynolds, J Joshua; Boppart, Stephen A
2003-01-10
The resolution of optical coherence tomography (OCT) often suffers from blurring caused by material dispersion. We present a numerical algorithm for computationally correcting the effect of material dispersion on OCT reflectance data for homogeneous and stratified media. This is experimentally demonstrated by correcting the image of a polydimethyl siloxane microfluidic structure and of glass slides. The algorithm can be implemented using the fast Fourier transform. With broad spectral bandwidths and highly dispersive media or thick objects, dispersion correction becomes increasingly important.
NASA Astrophysics Data System (ADS)
Marks, Daniel L.; Oldenburg, Amy L.; Reynolds, J. Joshua; Boppart, Stephen A.
2003-01-01
The resolution of optical coherence tomography (OCT) often suffers from blurring caused by material dispersion. We present a numerical algorithm for computationally correcting the effect of material dispersion on OCT reflectance data for homogeneous and stratified media. This is experimentally demonstrated by correcting the image of a polydimethyl siloxane microfluidic structure and of glass slides. The algorithm can be implemented using the fast Fourier transform. With broad spectral bandwidths and highly dispersive media or thick objects, dispersion correction becomes increasingly important.
NASA Astrophysics Data System (ADS)
Avetisyan, H.; Bruna, O.; Holub, J.
2016-11-01
Numerous techniques and algorithms are dedicated to extracting emotions from input data. Our investigation found that emotion-detection approaches can be classified into the following 3 types: keyword-based (lexical-based), learning-based, and hybrid. The most commonly used techniques, such as the keyword-spotting method, Support Vector Machines, the Naïve Bayes classifier, Hidden Markov Models and hybrid algorithms, achieve impressive results in this area and can reach a detection accuracy of more than 90%.
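The keyword-spotting (lexical) approach is the simplest of the three types and can be sketched as a lexicon lookup: each word of the input is matched against a list of emotion keywords, and the most frequently matched emotion is returned. The tiny lexicon below is invented for illustration; real systems rely on much larger curated lexicons.

```python
import re
from collections import Counter

# Hypothetical miniature lexicon mapping keywords to emotion labels.
LEXICON = {
    "happy": "joy", "glad": "joy", "delighted": "joy",
    "sad": "sadness", "unhappy": "sadness",
    "angry": "anger", "furious": "anger",
    "afraid": "fear", "scared": "fear",
}


def detect_emotion(text: str) -> str:
    """Return the emotion with the most keyword hits, or 'neutral' if none."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = Counter(LEXICON[token] for token in tokens if token in LEXICON)
    if not hits:
        return "neutral"
    return hits.most_common(1)[0][0]
```

A lookup of this kind ignores context, so a phrase such as "not happy" would still be counted as joy; that weakness is one motivation for the learning-based and hybrid approaches.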
Deferred discrimination algorithm (nibbling) for target filter management
NASA Astrophysics Data System (ADS)
Caulfield, H. John; Johnson, John L.
1999-07-01
A new method of classifying objects is presented. Rather than trying to form the classifier in one step or in one training algorithm, it is done in a series of small steps, or nibbles. This leads to an efficient and versatile system that is trained in series with single one-shot examples but applied in parallel, is implemented with single layer perceptrons, yet maintains its fully sequential hierarchical structure. Based on the nibbling algorithm, a basic new method of target reference filter management is described.
Cao, Yuan-Yuan; Su, Yan-Gang; Bai, Jin; Wang, Wei; Wang, Jing-Feng; Qin, Sheng-Mei; Ge, Jun-Bo
2015-01-01
Loss of left ventricular (LV) capture may lead to deterioration of heart failure in patients with cardiac resynchronization therapy (CRT). Timely recognition of loss of LV capture is important in clinical practice. A total of 422 electrocardiograms were acquired and analyzed from 53 CRT patients at 8 different pacing settings (LV only, right ventricle [RV] only, biventricular [BV] pacing with LV preactivation of 60, 40, 20, and 0 milliseconds and RV preactivation of 20 and 40 milliseconds). A modified Ammann algorithm, which adds a third step, the presence of a Q (q, or QS) wave, to the original 2-step Ammann algorithm, and a QRS axis shift method were devised to identify the loss of LV capture. The accuracy of the modified Ammann algorithm was significantly higher than that of the Ammann algorithm (78.9% vs. 69.1%, P < 0.001). The accuracy of the axis shift method was 66.4%, which was significantly lower than that of the modified Ammann algorithm (P < 0.001) and similar to that of the original one (P = 0.412). However, in the ECGs with QRS axis shift, 96.8% were correctly classified. LV preactivation or simultaneous BV activation and an LV lead positioned in the nonposterior or noninferior wall could elevate the accuracies of the modified Ammann algorithm and the QRS axis shift method. The accuracy of the modified Ammann algorithm is greatly improved. The QRS axis shift method can help diagnose LV capture. LV preactivation, or simultaneous BV activation and an LV lead positioned in the nonposterior or noninferior wall, can increase the diagnostic power of the modified Ammann algorithm and the QRS axis shift method. © 2014 Wiley Periodicals, Inc.
NASA Astrophysics Data System (ADS)
Kostadinov, T. S.; Harpold, A.; Hill, R.; McGwire, K.
2017-12-01
Seasonal snow cover is a key component of the hydrologic regime in many regions of the world, especially those in temperate latitudes with mountainous terrain and dry summers. Such regions support large human populations which depend on the mountain snowpack for their water supplies. It is thus important to quantify snow cover accurately and continuously in these regions. Optical remote-sensing methods are able to detect snow and leverage space-borne spectroradiometers with global coverage such as MODIS to produce global snow cover maps. However, snow is harder to detect accurately in mountainous forested terrain, where topography influences retrieval algorithms, and importantly - forest canopies complicate radiative transfer and obfuscate the snow. Current satellite snow cover algorithms assume that fractional snow-covered area (fSCA) under the canopy is the same as the fSCA in the visible portion of the pixel. In-situ observations and first principles considerations indicate otherwise, therefore there is a need for improvement of the under-canopy correction of snow cover. Here, we leverage multiple LIDAR overflights and in-situ observations with a distributed fiber-optic temperature sensor (DTS) to quantify snow cover under canopy as opposed to gap areas at the Sagehen Experimental Forest in the Northern Sierra Nevada, California, USA. Snow-off LIDAR overflights from 2014 are used to create a baseline high-resolution digital elevation model and classify pixels at 1 m resolution as canopy-covered or gap. Low canopy pixels are excluded from the analysis. Snow-on LIDAR overflights conducted by the Airborne Snow Observatory in 2016 are then used to classify all pixels as snow-covered or not and quantify fSCA under canopies vs. in gap areas over the Sagehen watershed. DTS observations are classified as snow-covered or not based on diel temperature fluctuations and used as validation for the LIDAR observations. 
LIDAR- and DTS-derived fSCA is also compared with retrievals from hyperspectral imaging spectroradiometer (AVIRIS) data. Initial evidence suggests that fSCA was generally lower under canopy and that overall snow cover was overestimated as a result. Implications for a canopy correction applicable to coarser-resolution sensors like MODIS are discussed, as are topography and view angle effects.
Balouchestani, Mohammadreza; Krishnan, Sridhar
2014-01-01
Long-term recording of electrocardiogram (ECG) signals plays an important role in health care systems for the diagnosis and treatment of heart diseases. Clustering and classification of the collected data are essential for detecting concealed information in the P-QRS-T waves of long-term ECG recordings. Currently used algorithms have two main drawbacks: 1) clustering and classification cannot be done in real time; 2) they suffer from high energy consumption and sampling load. These drawbacks motivated us to develop a novel optimized clustering algorithm that can easily scan large ECG datasets, enabling low-power long-term ECG recording. In this paper, we present an advanced K-means clustering algorithm based on Compressed Sensing (CS) theory as a random sampling procedure. Then, two dimensionality reduction methods, Principal Component Analysis (PCA) and Linear Correlation Coefficient (LCC), followed by sorting the data using the K-Nearest Neighbours (K-NN) and Probabilistic Neural Network (PNN) classifiers, are applied to the proposed algorithm. We show that our algorithm based on PCA features in combination with the K-NN classifier performs better than other methods. The proposed algorithm outperforms existing algorithms, increasing classification accuracy by 11%. In addition, the proposed algorithm achieves a classification accuracy for the K-NN and PNN classifiers, and a Receiver Operating Characteristics (ROC) area, of 99.98%, 99.83%, and 99.75%, respectively.
Classifying seismic waveforms from scratch: a case study in the alpine environment
NASA Astrophysics Data System (ADS)
Hammer, C.; Ohrnberger, M.; Fäh, D.
2013-01-01
Nowadays, an increasing amount of seismic data is collected by daily observatory routines. The basic step for successfully analyzing those data is the correct detection of various event types. However, visual scanning is a time-consuming task. Applying standard detection techniques such as the STA/LTA trigger still requires manual control for classification. Here, we present a useful alternative. The incoming data stream is scanned automatically for events of interest. A stochastic classifier, called a hidden Markov model, is learned for each class of interest, enabling the recognition of highly variable waveforms. In contrast to other automatic techniques such as neural networks or support vector machines, the algorithm allows classification to start from scratch as soon as interesting events are identified. Neither the tedious process of collecting training samples nor a time-consuming configuration of the classifier is required. An approach originally introduced for the volcanic task force action allows classifier properties to be learned from a single waveform example and some hours of background recording. Besides reducing the required workload, this also makes it possible to detect very rare events. The latter feature in particular is a milestone for the use of seismic devices in alpine warning systems. Furthermore, the system offers the opportunity to flag new signal classes that have not been defined before. We demonstrate the application of the classification system using a data set from the Swiss Seismological Survey, achieving very high recognition rates. We document in detail all refinements of the classifier, providing a step-by-step guide for the fast setup of a well-working classification system.
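For orientation, the STA/LTA trigger that the abstract names as the standard detection baseline can be sketched as below. This is only an illustration of that baseline, not the authors' hidden Markov model classifier; the function names, window lengths, threshold and synthetic trace are all assumptions made for the example.

```python
def sta_lta(signal, n_sta, n_lta):
    """Ratio of short-term to long-term average energy for each sample.

    Samples before the first full long-term window keep a ratio of 0.0.
    """
    energy = [x * x for x in signal]
    ratios = [0.0] * len(signal)
    for i in range(n_lta - 1, len(signal)):
        sta = sum(energy[i - n_sta + 1 : i + 1]) / n_sta
        lta = sum(energy[i - n_lta + 1 : i + 1]) / n_lta
        ratios[i] = sta / lta if lta > 0 else 0.0
    return ratios


def trigger_onsets(ratios, threshold):
    """Indices at which the ratio rises from below to at/above the threshold."""
    onsets = []
    above = False
    for i, ratio in enumerate(ratios):
        if ratio >= threshold and not above:
            onsets.append(i)
        above = ratio >= threshold
    return onsets


# Hypothetical trace: low-amplitude noise with a strong burst at samples 60-69.
trace = [0.1 if i % 2 == 0 else -0.1 for i in range(100)]
for i in range(60, 70):
    trace[i] = 2.0 if i % 2 == 0 else -2.0

ratios = sta_lta(trace, n_sta=5, n_lta=50)
onsets = trigger_onsets(ratios, threshold=3.0)
print(onsets)
```

A trigger of this kind only flags that something happened at a given sample; deciding what kind of event it was is the classification step that the abstract assigns to the hidden Markov models.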
"ON ALGEBRAIC DECODING OF Q-ARY REED-MULLER AND PRODUCT REED-SOLOMON CODES"
DOE Office of Scientific and Technical Information (OSTI.GOV)
SANTHI, NANDAKISHORE
We consider a list decoding algorithm recently proposed by Pellikaan-Wu for q-ary Reed-Muller codes $RM_q(\ell, m, n)$ of length $n \le q^m$ when $\ell \le q$. A simple and easily accessible correctness proof is given which shows that this algorithm achieves a relative error-correction radius of $\tau \le 1 - \sqrt{\ell q^{m-1}/n}$. This is an improvement over the proof using one-point Algebraic-Geometric decoding method given in. The described algorithm can be adapted to decode product Reed-Solomon codes. We then propose a new low complexity recursive algebraic decoding algorithm for product Reed-Solomon codes and Reed-Muller codes. This algorithm achieves a relative error correction radius of $\tau \le \prod_{i=1}^{m} (1 - \sqrt{k_i/q})$. This algorithm is then proved to outperform the Pellikaan-Wu algorithm in both complexity and error correction radius over a wide range of code rates.
Bitter or not? BitterPredict, a tool for predicting taste from chemical structure.
Dagan-Wiener, Ayana; Nissim, Ido; Ben Abu, Natalie; Borgonovo, Gigliola; Bassoli, Angela; Niv, Masha Y
2017-09-21
Bitter taste is an innately aversive taste modality that is considered to protect animals from consuming toxic compounds. Yet, bitterness is not always noxious and some bitter compounds have beneficial effects on health. Hundreds of bitter compounds were reported (and are accessible via the BitterDB http://bitterdb.agri.huji.ac.il/dbbitter.php ), but numerous additional bitter molecules are still unknown. The dramatic chemical diversity of bitterants makes bitterness prediction a difficult task. Here we present a machine learning classifier, BitterPredict, which predicts whether a compound is bitter or not, based on its chemical structure. BitterDB was used as the positive set, and non-bitter molecules were gathered from literature to create the negative set. The Adaptive Boosting (AdaBoost) machine-learning algorithm, based on decision trees, was applied to molecules that were represented using physicochemical and ADME/Tox descriptors. BitterPredict correctly classifies over 80% of the compounds in the hold-out test set, and 70-90% of the compounds in three independent external sets and in sensory test validation, providing a quick and reliable tool for classifying large sets of compounds into bitter and non-bitter groups. BitterPredict suggests that about 40% of random molecules, and a large portion (66%) of clinical and experimental drugs, and of natural products (77%) are bitter.
Algorithms for detecting cherry pits on the basis of transmittance mode hyperspectral data
NASA Astrophysics Data System (ADS)
Siedliska, Anna; Zubik, Monika; Baranowski, Piotr; Mazurek, Wojciech
2017-10-01
The suitability of the hyperspectral transmittance imaging technique was assessed in terms of detecting the internal intrusions (pits and their fragments) in cherries. Herein, hyperspectral transmission images were acquired in the visible and near-infrared range (450-1000 nm) from pitted and intact cherries of three popular cultivars: 'Łutówka', 'Pandy 103', and 'Groniasta', differing by soluble solid content. The hyperspectral transmittance data of fresh cherries were used to determine the influence of differing soluble solid content in fruit tissues on pit detection effectiveness. Models for predicting the soluble solid content of cherries were also developed. The principal component analysis and the second derivative pre-treatment of the hyperspectral data were used to construct the supervised classification models. In this study, five classifiers were tested for pit detection. From all the classifiers studied, the best prediction accuracies for the whole pit or pit fragment detection were obtained via the backpropagation neural networks model (87.6% of correctly classified instances for the training/test set and 81.4% for the validation set). The accuracy of distinguishing between drilled and intact cherries was close to 96%. These results showed that the hyperspectral transmittance imaging technique is feasible and useful for the non-destructive detection of pits in cherries.
Zhang, Jianhua; Yin, Zhong; Wang, Rubin
2017-01-01
This paper developed a cognitive task-load (CTL) classification algorithm and allocation strategy to sustain the optimal operator CTL levels over time in safety-critical human-machine integrated systems. An adaptive human-machine system is designed based on a non-linear dynamic CTL classifier, which maps a set of electroencephalogram (EEG) and electrocardiogram (ECG) related features to a few CTL classes. The least-squares support vector machine (LSSVM) is used as dynamic pattern classifier. A series of electrophysiological and performance data acquisition experiments were performed on seven volunteer participants under a simulated process control task environment. The participant-specific dynamic LSSVM model is constructed to classify the instantaneous CTL into five classes at each time instant. The initial feature set, comprising 56 EEG and ECG related features, is reduced to a set of 12 salient features (including 11 EEG-related features) by using the locality preserving projection (LPP) technique. An overall correct classification rate of about 80% is achieved for the 5-class CTL classification problem. Then the predicted CTL is used to adaptively allocate the number of process control tasks between operator and computer-based controller. Simulation results showed that the overall performance of the human-machine system can be improved by using the adaptive automation strategy proposed.
Accuracy and efficiency of area classifications based on tree tally
Michael S. Williams; Hans T. Schreuder; Raymond L. Czaplewski
2001-01-01
Inventory data are often used to estimate the area of the land base that is classified as a specific condition class. Examples include areas classified as old-growth forest, private ownership, or suitable habitat for a given species. Many inventory programs rely on classification algorithms of varying complexity to determine condition class. These algorithms can be...
Weng, Wei-Hung; Wagholikar, Kavishwar B; McCray, Alexa T; Szolovits, Peter; Chueh, Henry C
2017-12-01
The medical subdomain of a clinical note, such as cardiology or neurology, is useful content-derived metadata for developing machine learning downstream applications. To classify the medical subdomain of a note accurately, we have constructed a machine learning-based natural language processing (NLP) pipeline and developed medical subdomain classifiers based on the content of the note. We constructed the pipeline using the clinical NLP system, clinical Text Analysis and Knowledge Extraction System (cTAKES), the Unified Medical Language System (UMLS) Metathesaurus, Semantic Network, and learning algorithms to extract features from two datasets - clinical notes from Integrating Data for Analysis, Anonymization, and Sharing (iDASH) data repository (n = 431) and Massachusetts General Hospital (MGH) (n = 91,237), and built medical subdomain classifiers with different combinations of data representation methods and supervised learning algorithms. We evaluated the performance of classifiers and their portability across the two datasets. The convolutional recurrent neural network with neural word embeddings trained-medical subdomain classifier yielded the best performance measurement on iDASH and MGH datasets with area under receiver operating characteristic curve (AUC) of 0.975 and 0.991, and F1 scores of 0.845 and 0.870, respectively. Considering better clinical interpretability, linear support vector machine-trained medical subdomain classifier using hybrid bag-of-words and clinically relevant UMLS concepts as the feature representation, with term frequency-inverse document frequency (tf-idf)-weighting, outperformed other shallow learning classifiers on iDASH and MGH datasets with AUC of 0.957 and 0.964, and F1 scores of 0.932 and 0.934 respectively. We trained classifiers on one dataset, applied to the other dataset and yielded the threshold of F1 score of 0.7 in classifiers for half of the medical subdomains we studied. 
Our study shows that a supervised learning-based NLP approach is useful to develop medical subdomain classifiers. The deep learning algorithm with distributed word representation yields better performance, yet shallow learning algorithms with the word and concept representation achieve comparable performance with better clinical interpretability. Portable classifiers may also be used across datasets from different institutions.
NASA Astrophysics Data System (ADS)
Sakkas, Georgios; Sakellariou, Nikolaos
2018-05-01
Strong motion recordings are key in many earthquake engineering applications and are also fundamental for seismic design. The present study focuses on the automated correction of analog and digital accelerograms. The main feature of the proposed algorithm is the automatic selection of the cut-off frequencies based on a minimum spectral value in a predefined frequency bandwidth, instead of the typical signal-to-noise approach. The algorithm follows the basic steps of the correction procedure (instrument correction, baseline correction and appropriate filtering). Besides the corrected time histories, the Peak Ground Acceleration, Peak Ground Velocity and Peak Ground Displacement values, the corrected Fourier spectra and the response spectra are also calculated. The algorithm is written in Matlab and is fast enough to be used for batch processing or in real-time applications. In addition, options are provided to use a signal-to-noise ratio criterion and to perform causal or acausal filtering. The algorithm has been tested on six significant earthquakes (Kozani-Grevena 1995, Aigio 1995, Athens 1999, Lefkada 2003 and Kefalonia 2014) of the Greek territory with analog and digital accelerograms.
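The cut-off selection rule described in this abstract (take the frequency with the minimum spectral value inside a predefined band) can be illustrated with a few lines of Python. The abstract gives no implementation details and states that the original is written in Matlab, so the function name `select_cutoff`, the band limits and the toy spectrum below are assumptions made purely for illustration.

```python
def select_cutoff(freqs, amplitudes, band):
    """Return the frequency with the smallest spectral amplitude inside `band`.

    `band` is a (low, high) tuple in the same units as `freqs`; values outside
    the band are ignored even if their amplitude is smaller.
    """
    low, high = band
    candidates = [(a, f) for f, a in zip(freqs, amplitudes) if low <= f <= high]
    if not candidates:
        raise ValueError("no spectral samples fall inside the requested band")
    return min(candidates)[1]


# Hypothetical Fourier amplitude spectrum (frequency in Hz, arbitrary amplitude units).
freqs = [0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 1.0]
amps = [0.20, 0.90, 0.60, 0.25, 0.70, 0.80, 0.10]

cutoff = select_cutoff(freqs, amps, band=(0.1, 0.5))
print(cutoff)
```

In this toy spectrum the smallest amplitudes lie outside the band, which shows why restricting the search to a predefined bandwidth matters: the chosen cut-off is the in-band minimum, not the global one.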
Wang, Chang; Qin, Xin; Liu, Yan; Zhang, Wenchao
2016-06-01
An adaptive inertia weight particle swarm algorithm is proposed in this study to solve the local optimum problem of traditional particle swarm optimization when estimating the magnetic resonance (MR) image bias field. An indicator measuring the degree of premature convergence was designed to address this defect of the traditional particle swarm optimization algorithm. The inertia weight was adjusted adaptively based on this indicator to ensure that the particle swarm is optimized globally and to prevent it from falling into a local optimum. The Legendre polynomial was used to fit the bias field, the polynomial parameters were optimized globally, and finally the bias field was estimated and corrected. Compared to the improved entropy minimum algorithm, the entropy of the corrected image was smaller and the estimated bias field was more accurate in this study. The corrected image was then segmented, and the segmentation accuracy obtained in this research was 10% higher than that with the improved entropy minimum algorithm. This algorithm can be applied to the correction of MR image bias field.
Towards automated assistance for operating home medical devices.
Gao, Zan; Detyniecki, Marcin; Chen, Ming-Yu; Wu, Wen; Hauptmann, Alexander G; Wactlar, Howard D
2010-01-01
To detect errors when subjects operate a home medical device, we observe them with multiple cameras. We then perform action recognition with a robust approach to recognize action information based on explicitly encoding motion information. This algorithm detects interest points and encodes not only their local appearance but also explicitly models local motion. Our goal is to recognize individual human actions in the operations of a home medical device to see if the patient has correctly performed the required actions in the prescribed sequence. Using a specific infusion pump as a test case, requiring 22 operation steps from 6 action classes, our best classifier selects high likelihood action estimates from 4 available cameras, to obtain an average class recognition rate of 69%.
de Lasarte, Marta; Pujol, Jaume; Arjona, Montserrat; Vilaseca, Meritxell
2007-01-10
We present an optimized linear algorithm for the spatial nonuniformity correction of a CCD color camera's imaging system and the experimental methodology developed for its implementation. We assess the influence of the algorithm's variables on the quality of the correction, that is, the dark image, the base correction image, and the reference level, and the range of application of the correction using a uniform radiance field provided by an integrator cube. The best spatial nonuniformity correction is achieved by having a nonzero dark image, by using an image with a mean digital level placed in the linear response range of the camera as the base correction image and taking the mean digital level of the image as the reference digital level. The response of the CCD color camera's imaging system to the uniform radiance field shows a high level of spatial uniformity after the optimized algorithm has been applied, which also allows us to achieve a high-quality spatial nonuniformity correction of captured images under different exposure conditions.
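The three variables named in this abstract (dark image, base correction image, reference level) fit the common flat-field form of a linear non-uniformity correction, sketched below. The abstract does not give the exact formula, so the expression `(raw - dark) / (base - dark) * reference`, the function names and the 2x2 pixel values are assumptions for illustration only.

```python
def mean_level(image):
    """Mean digital level of an image stored as a list of rows."""
    pixels = [value for row in image for value in row]
    return sum(pixels) / len(pixels)


def correct_nonuniformity(raw, dark, base):
    """Per-pixel linear correction; the reference is the mean level of `base`.

    Assumes every pixel of `base` differs from the matching pixel of `dark`.
    """
    reference = mean_level(base)
    return [
        [(r - d) / (b - d) * reference for r, d, b in zip(raw_row, dark_row, base_row)]
        for raw_row, dark_row, base_row in zip(raw, dark, base)
    ]


# Hypothetical 2x2 sensor: a nonzero dark image and a non-uniform base image
# taken of a uniform radiance field.
dark = [[2, 2], [2, 2]]
base = [[102, 82], [122, 92]]

# Correcting the base image itself should give a flat image at the reference level.
flat = correct_nonuniformity(base, dark, base)

# The same uniform field at half the exposure should also come out flat.
half_exposure = [[52, 42], [62, 47]]
half = correct_nonuniformity(half_exposure, dark, base)
print(flat, half)
```

The second call reflects the abstract's point about different exposure conditions: as long as the sensor response stays linear, one base correction image flattens images taken at other exposure levels too.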
Calculated X-ray Intensities Using Monte Carlo Algorithms: A Comparison to Experimental EPMA Data
NASA Technical Reports Server (NTRS)
Carpenter, P. K.
2005-01-01
Monte Carlo (MC) modeling has been used extensively to simulate electron scattering and x-ray emission from complex geometries. Here are presented comparisons between MC results and experimental electron-probe microanalysis (EPMA) measurements as well as phi(rhoz) correction algorithms. Experimental EPMA measurements made on NIST SRM 481 (AgAu) and 482 (CuAu) alloys, at a range of accelerating potential and instrument take-off angles, represent a formal microanalysis data set that has been widely used to develop phi(rhoz) correction algorithms. X-ray intensity data produced by MC simulations represent an independent test of both experimental and phi(rhoz) correction algorithms. The alpha-factor method has previously been used to evaluate systematic errors in the analysis of semiconductor and silicate minerals, and is used here to compare the accuracy of experimental and MC-calculated x-ray data. X-ray intensities calculated by MC are used to generate alpha-factors using the certificated compositions in the CuAu binary relative to pure Cu and Au standards. MC simulations are obtained using the NIST, WinCasino, and WinXray algorithms; derived x-ray intensities have a built-in atomic number correction, and are further corrected for absorption and characteristic fluorescence using the PAP phi(rhoz) correction algorithm. The Penelope code additionally simulates both characteristic and continuum x-ray fluorescence and thus requires no further correction for use in calculating alpha-factors.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Iwai, P; Lins, L Nadler
Purpose: There is a lack of studies with significant cohort data about patients using pacemaker (PM), implanted cardioverter defibrillator (ICD) or cardiac resynchronization therapy (CRT) device undergoing radiotherapy. There is no literature comparing the cumulative doses delivered to those cardiac implanted electronic devices (CIED) calculated by different algorithms, nor studies comparing doses with heterogeneity correction or not. The aim of this study was to evaluate the influence of the algorithms Pencil Beam Convolution (PBC), Analytical Anisotropic Algorithm (AAA) and Acuros XB (AXB) as well as heterogeneity correction on risk categorization of patients. Methods: A retrospective analysis of 19 3DCRT or IMRT plans of 17 patients was conducted, calculating the dose delivered to CIED using three different calculation algorithms. Doses were evaluated with and without heterogeneity correction for comparison. Risk categorization of the patients was based on their CIED dependency and cumulative dose in the devices. Results: Total estimated doses at CIED calculated by AAA or AXB were higher than those calculated by PBC in 56% of the cases. On average, the doses at CIED calculated by AAA and AXB were higher than those calculated by PBC (29% and 4% higher, respectively). The maximum difference of doses calculated by each algorithm was about 1 Gy, either using heterogeneity correction or not. Values of maximum dose calculated with heterogeneity correction showed that dose at CIED was at least equal or higher in 84% of the cases with PBC, 77% with AAA and 67% with AXB than dose obtained with no heterogeneity correction. Conclusion: The dose calculation algorithm and heterogeneity correction did not change the risk categorization. 
Since higher estimated doses delivered to CIED do not compromise treatment precautions to be taken, it is recommended that the most sophisticated algorithm available be used to predict dose at the CIED using heterogeneity correction.
Comparative analysis of peak-detection techniques for comprehensive two-dimensional chromatography.
Latha, Indu; Reichenbach, Stephen E; Tao, Qingping
2011-09-23
Comprehensive two-dimensional gas chromatography (GC×GC) is a powerful technology for separating complex samples. The typical goal of GC×GC peak detection is to aggregate data points of analyte peaks based on their retention times and intensities. Two techniques commonly used for two-dimensional peak detection are the two-step algorithm and the watershed algorithm. A recent study [4] compared the performance of the two-step and watershed algorithms for GC×GC data with retention-time shifts in the second-column separations. In that analysis, the peak retention-time shifts were corrected while applying the two-step algorithm but the watershed algorithm was applied without shift correction. The results indicated that the watershed algorithm has a higher probability of erroneously splitting a single two-dimensional peak than the two-step approach. This paper reconsiders the analysis by comparing peak-detection performance for resolved peaks after correcting retention-time shifts for both the two-step and watershed algorithms. Simulations with wide-ranging conditions indicate that when shift correction is employed with both algorithms, the watershed algorithm detects resolved peaks with greater accuracy than the two-step method. Copyright © 2011 Elsevier B.V. All rights reserved.
Segmentation of vessels: the corkscrew algorithm
NASA Astrophysics Data System (ADS)
Wesarg, Stefan; Firle, Evelyn A.
2004-05-01
Medical imaging is nowadays much more than only providing data for diagnosis. It also links 'classical' diagnosis to modern forms of treatment such as image guided surgery. Those systems require the identification of organs, anatomical regions of the human body etc., i.e. the segmentation of structures from medical data sets. The algorithms used for these segmentation tasks strongly depend on the object to be segmented. One type of structure that plays an important role in surgery planning is vessels, which are found everywhere in the human body. Several approaches for their extraction already exist. However, there is no general one which is suitable for all types of data or all sorts of vascular structures. This work presents a new algorithm for the segmentation of vessels. It can be classified as a skeleton-based approach working on 3D data sets, and has been designed for a reliable segmentation of coronary arteries. The algorithm is a semi-automatic extraction technique requiring the definition of the start and end point of the (centerline) path to be found. A first estimation of the vessel's centerline is calculated and then corrected iteratively by detecting the vessel's border perpendicular to the centerline. We used contrast enhanced CT data sets of the thorax for testing our approach. Coronary arteries have been extracted from the data sets using the 'corkscrew algorithm' presented in this work. The segmentation turned out to be robust even if moderate breathing artifacts were present in the data sets.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cao, Zheng; Ouyang, Bing; Principe, Jose
A multi-static serial LiDAR system prototype was developed under DE-EE0006787 to detect, classify, and record interactions of marine life with marine hydrokinetic generation equipment. This software implements a shape-matching based classifier algorithm for the underwater automated detection of marine life for that system. In addition to applying shape descriptors, the algorithm also adopts information theoretical learning based affine shape registration, improving point correspondences found by shape descriptors as well as the final similarity measure.
Automatic identification of high impact articles in PubMed to support clinical decision making.
Bian, Jiantao; Morid, Mohammad Amin; Jonnalagadda, Siddhartha; Luo, Gang; Del Fiol, Guilherme
2017-09-01
The practice of evidence-based medicine involves integrating the latest best available evidence into patient care decisions. Yet, critical barriers exist for clinicians' retrieval of evidence that is relevant for a particular patient from primary sources such as randomized controlled trials and meta-analyses. To help address those barriers, we investigated machine learning algorithms that find clinical studies with high clinical impact from PubMed®. Our machine learning algorithms use a variety of features including bibliometric features (e.g., citation count), social media attention, journal impact factors, and citation metadata. The algorithms were developed and evaluated with a gold standard composed of 502 high impact clinical studies that are referenced in 11 clinical evidence-based guidelines on the treatment of various diseases. We tested the following hypotheses: (1) our high impact classifier outperforms a state-of-the-art classifier based on citation metadata and citation terms, and PubMed's® relevance sort algorithm; and (2) the performance of our high impact classifier does not decrease significantly after removing proprietary features such as citation count. The mean top 20 precision of our high impact classifier was 34% versus 11% for the state-of-the-art classifier and 4% for PubMed's® relevance sort (p=0.009); and the performance of our high impact classifier did not decrease significantly after removing proprietary features (mean top 20 precision=34% vs. 36%; p=0.085). The high impact classifier, using features such as bibliometrics, social media attention and MEDLINE® metadata, outperformed previous approaches and is a promising alternative to identifying high impact studies for clinical decision support. Copyright © 2017 Elsevier Inc. All rights reserved.
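The headline metric in this abstract, mean top 20 precision, is straightforward to compute. The sketch below shows one common definition (fraction of the first k ranked items that are in the gold standard, averaged over queries); whether the authors handled short result lists or ties in the same way is not stated, and the identifiers and toy rankings are hypothetical.

```python
def precision_at_k(ranked_ids, relevant_ids, k=20):
    """Fraction of the top-k ranked items that appear in the gold standard."""
    top_k = ranked_ids[:k]
    hits = sum(1 for item in top_k if item in relevant_ids)
    return hits / k


def mean_precision_at_k(rankings, gold_standards, k=20):
    """Average precision@k over paired (ranking, gold standard) queries."""
    scores = [precision_at_k(r, g, k) for r, g in zip(rankings, gold_standards)]
    return sum(scores) / len(scores)


# Hypothetical article identifiers for two queries, evaluated at k=5 for brevity.
rankings = [["a", "b", "c", "d", "e"], ["p", "q", "r", "s", "t"]]
gold = [{"b", "e", "z"}, {"t"}]

first = precision_at_k(rankings[0], gold[0], k=5)
overall = mean_precision_at_k(rankings, gold, k=5)
print(first, overall)
```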
NASA Astrophysics Data System (ADS)
Baránek, M.; Běhal, J.; Bouchal, Z.
2018-01-01
In the phase retrieval applications, the Gerchberg-Saxton (GS) algorithm is widely used for the simplicity of implementation. This iterative process can advantageously be deployed in the combination with a spatial light modulator (SLM) enabling simultaneous correction of optical aberrations. As recently demonstrated, the accuracy and efficiency of the aberration correction using the GS algorithm can be significantly enhanced by a vortex image spot used as the target intensity pattern in the iterative process. Here we present an optimization of the spiral phase modulation incorporated into the GS algorithm.
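For readers unfamiliar with the Gerchberg-Saxton iteration that this abstract builds on, a minimal one-dimensional sketch follows. It shows only the plain textbook algorithm, not the vortex-spot target or the spiral phase optimization the authors describe; the naive DFT, the function names and the two-spot target are assumptions chosen to keep the example dependency-free.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform (the inverse includes the 1/n factor)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(
            values[j] * cmath.exp(sign * 2j * math.pi * j * k / n) for j in range(n)
        )
        out.append(total / n if inverse else total)
    return out


def impose_amplitude(field, amplitude):
    """Keep the phase of each sample but replace its magnitude."""
    return [a * cmath.exp(1j * cmath.phase(f)) for f, a in zip(field, amplitude)]


def gerchberg_saxton(source_amp, target_amp, iterations=20):
    """Return the retrieved source-plane phase and the target-plane error history."""
    field = [complex(a) for a in source_amp]  # start from a flat phase
    errors = []
    for _ in range(iterations):
        far = dft(field)
        errors.append(sum((abs(f) - t) ** 2 for f, t in zip(far, target_amp)))
        far = impose_amplitude(far, target_amp)     # enforce target magnitude
        near = dft(far, inverse=True)
        field = impose_amplitude(near, source_amp)  # enforce source magnitude
    return [cmath.phase(f) for f in field], errors


# Hypothetical setup: uniform illumination, target of two equal spots that
# together carry the full energy (sum |G|^2 = n * sum |g|^2 = 256).
n = 16
source_amp = [1.0] * n
target_amp = [0.0] * n
target_amp[3] = target_amp[5] = math.sqrt(128.0)

phase, errors = gerchberg_saxton(source_amp, target_amp)
print(errors[0], errors[-1])
```

The returned phase is what would be displayed on a phase-only modulator such as an SLM, and the error history illustrates the well-known property that the target-plane error does not increase from one iteration to the next.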
An Efficient Statistical Computation Technique for Health Care Big Data using R
NASA Astrophysics Data System (ADS)
Sushma Rani, N.; Srinivasa Rao, P., Dr; Parimala, P.
2017-08-01
Due to changes in living conditions and other factors, many critical health-related problems are arising. Diagnosing a problem at an early stage increases the chances of survival and fast recovery, and reduces the recovery time and the cost of treatment. One such medical issue is cancer, and breast cancer has been identified as the second leading cause of cancer death. If detected at an early stage, it can be cured. Once a patient is found to have a breast tumor, the tumor should be classified as cancerous or non-cancerous. The paper therefore uses the k-nearest neighbors (KNN) algorithm, which is one of the simplest machine learning algorithms and is an instance-based learning algorithm, to classify the data. New records are added day to day, which leads to an increase in the data to be classified, and this tends to become a big data problem. The algorithm is implemented in R, which is the most popular platform applied to machine learning algorithms for statistical computing. Experiments are conducted using various classification evaluation metrics on various values of k. The results show that the KNN algorithm performs better than existing models.
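The k-nearest neighbors rule at the core of this abstract can be sketched in a few lines. The paper's implementation is in R and is not reproduced here; the Python version below is only an illustration, and the two-feature records, labels and value of k are hypothetical rather than taken from any breast cancer data set.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Label `query` by majority vote among its k nearest training records.

    `train` is a list of (feature_tuple, label) pairs; distance is Euclidean.
    """
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical two-feature tumor records (features are made-up, unitless values).
train = [
    ((1.0, 1.0), "benign"),
    ((1.2, 0.8), "benign"),
    ((0.9, 1.1), "benign"),
    ((3.0, 3.2), "malignant"),
    ((3.1, 2.9), "malignant"),
    ((2.8, 3.0), "malignant"),
]

prediction_a = knn_predict(train, (2.9, 3.1), k=3)
prediction_b = knn_predict(train, (1.1, 1.0), k=3)
print(prediction_a, prediction_b)
```

Because every prediction scans the whole training set, the cost grows with the number of stored records, which is the scaling concern the abstract raises as new records are added daily.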
Distortion correction of echo-planar diffusion-weighted images of uterine cervix.
deSouza, Nandita M; Orton, Matthew; Downey, Kate; Morgan, Veronica A; Collins, David J; Giles, Sharon L; Payne, Geoffrey S
2016-05-01
To investigate the clinical utility of the reverse gradient algorithm in correcting distortions in diffusion-weighted images of the cervix and for increasing diagnostic performance. Forty-one patients ages 25-72 years (mean 40 ± 11 years) with suspected or early stage cervical cancer were imaged at 3T using an endovaginal coil. T2-weighted (T2W) and diffusion-weighted images with right and left phase-encode gradient directions were obtained coronal to the cervix (b = 0, 100, 300, 500, 800 s mm⁻²). Differences in angle of the endocervical canal to the x-axis between T2W and right-gradient, left-gradient, and corrected images were measured. Uncorrected and corrected images were assessed for diagnostic performance when viewed together with T2W images by two independent observers against subsequent histology. The angles of the endocervical canal relative to the x-axis were significantly different between the T2W images and the right-gradient images (P = 0.007), approached significance for left-gradient images (P = 0.055), and were not significantly different after correction (P = 0.95). Corrected images enabled a definitive diagnosis in 34% (n = 14) of patients classified as equivocal on uncorrected images. Tumor volume in this subset was 0.18 ± 0.44 cm³ (mean ± SD; sensitivity of detection 100% [8/8], specificity 50% [3/6] for an experienced observer). Correction did not improve diagnostic performance for the less-experienced observer. Distortion-corrected diffusion-weighted images improved correspondence with T2W images and diagnostic performance in a third of cases. © 2015 The Authors Journal of Magnetic Resonance Imaging published by Wiley Periodicals, Inc. on behalf of International Society for Magnetic Resonance in Medicine.
Implementation and performance of shutterless uncooled micro-bolometer cameras
NASA Astrophysics Data System (ADS)
Das, J.; de Gaspari, D.; Cornet, P.; Deroo, P.; Vermeiren, J.; Merken, P.
2015-06-01
A shutterless algorithm is implemented into the Xenics LWIR thermal cameras and modules. Based on a calibration set and a global temperature coefficient the optimal non-uniformity correction is calculated onboard of the camera. The limited resources in the camera require a compact algorithm, hence the efficiency of the coding is important. The performance of the shutterless algorithm is studied by a comparison of the residual non-uniformity (RNU) and signal-to-noise ratio (SNR) between the shutterless and shuttered correction algorithm. From this comparison we conclude that the shutterless correction is only slightly less performant compared to the standard shuttered algorithm, making this algorithm very interesting for thermal infrared applications where small weight and size, and continuous operation are important.
Identification of DNA-Binding Proteins Using Structural, Electrostatic and Evolutionary Features
Nimrod, Guy; Szilágyi, András; Leslie, Christina; Ben-Tal, Nir
2009-01-01
Summary DNA binding proteins (DBPs) often take part in various crucial processes of the cell's life cycle. Therefore, the identification and characterization of these proteins are of great importance. We present here a random forests classifier for identifying DBPs among proteins with known three-dimensional structures. First, clusters of evolutionarily conserved regions (patches) on the protein's surface are detected using the PatchFinder algorithm; previous studies showed that these regions are typically the proteins' functionally important regions. Next, we train a classifier using features like the electrostatic potential, cluster-based amino acid conservation patterns and the secondary structure content of the patches, as well as features of the whole protein including its dipole moment. Using 10-fold cross validation on a dataset of 138 DNA-binding proteins and 110 proteins which do not bind DNA, the classifier achieved a sensitivity and a specificity of 0.90, which is overall better than the performance of previously published methods. Furthermore, when we tested 5 different methods on 11 new DBPs which did not appear in the original dataset, only our method annotated all correctly. The resulting classifier was applied to a collection of 757 proteins of known structure and unknown function. Of these proteins, 218 were predicted to bind DNA, and we anticipate that some of them interact with DNA using new structural motifs. The use of complementary computational tools supports the notion that at least some of them do bind DNA. PMID:19233205
Identification of DNA-binding proteins using structural, electrostatic and evolutionary features.
Nimrod, Guy; Szilágyi, András; Leslie, Christina; Ben-Tal, Nir
2009-04-10
DNA-binding proteins (DBPs) participate in various crucial processes in the life-cycle of the cells, and the identification and characterization of these proteins is of great importance. We present here a random forests classifier for identifying DBPs among proteins with known 3D structures. First, clusters of evolutionarily conserved regions (patches) on the surface of proteins were detected using the PatchFinder algorithm; earlier studies showed that these regions are typically the functionally important regions of proteins. Next, we trained a classifier using features like the electrostatic potential, cluster-based amino acid conservation patterns and the secondary structure content of the patches, as well as features of the whole protein, including its dipole moment. Using 10-fold cross-validation on a dataset of 138 DBPs and 110 proteins that do not bind DNA, the classifier achieved a sensitivity and a specificity of 0.90, which is overall better than the performance of published methods. Furthermore, when we tested five different methods on 11 new DBPs that did not appear in the original dataset, only our method annotated all correctly. The resulting classifier was applied to a collection of 757 proteins of known structure and unknown function. Of these proteins, 218 were predicted to bind DNA, and we anticipate that some of them interact with DNA using new structural motifs. The use of complementary computational tools supports the notion that at least some of them do bind DNA.
Pattern classifier for health monitoring of helicopter gearboxes
NASA Technical Reports Server (NTRS)
Chin, Hsinyung; Danai, Kourosh; Lewicki, David G.
1993-01-01
The application of a newly developed diagnostic method to a helicopter gearbox is demonstrated. This method is a pattern classifier which uses a multi-valued influence matrix (MVIM) as its diagnostic model. The method benefits from a fast learning algorithm, based on error feedback, that enables it to estimate gearbox health from a small set of measurement-fault data. The MVIM method can also assess the diagnosability of the system and variability of the fault signatures as the basis to improve fault signatures. This method was tested on vibration signals reflecting various faults in an OH-58A main rotor transmission gearbox. The vibration signals were then digitized and processed by a vibration signal analyzer to enhance and extract various features of the vibration data. The parameters obtained from this analyzer were utilized to train and test the performance of the MVIM method in both detection and diagnosis. The results indicate that the MVIM method provided excellent detection results when the full range of fault effects on the measurements was included in training, and it had a correct diagnostic rate of 95 percent when the faults were included in training.
Shift-, rotation-, and scale-invariant shape recognition system using an optical Hough transform
NASA Astrophysics Data System (ADS)
Schmid, Volker R.; Bader, Gerhard; Lueder, Ernst H.
1998-02-01
We present a hybrid shape recognition system with an optical Hough transform processor. The features of the Hough space offer a separate cancellation of distortions caused by translations and rotations. Scale invariance is also provided by suitable normalization. The proposed system extends the capabilities of Hough transform based detection from only straight lines to areas bounded by edges. A very compact optical design is achieved by a microlens array processor accepting incoherent light as direct optical input and realizing the computationally expensive connections massively parallel. Our newly developed algorithm extracts rotation and translation invariant normalized patterns of bright spots on a 2D grid. A neural network classifier maps the 2D features via a nonlinear hidden layer onto the classification output vector. We propose initialization of the connection weights according to regions of activity specifically assigned to each neuron in the hidden layer using a competitive network. The presented system is designed for industry inspection applications. Presently we have demonstrated detection of six different machined parts in real-time. Our method yields very promising detection results of more than 96% correctly classified parts.
Machine learning vortices at the Kosterlitz-Thouless transition
NASA Astrophysics Data System (ADS)
Beach, Matthew J. S.; Golubeva, Anna; Melko, Roger G.
2018-01-01
Efficient and automated classification of phases from minimally processed data is one goal of machine learning in condensed-matter and statistical physics. Supervised algorithms trained on raw samples of microstates can successfully detect conventional phase transitions via learning a bulk feature such as an order parameter. In this paper, we investigate whether neural networks can learn to classify phases based on topological defects. We address this question on the two-dimensional classical XY model which exhibits a Kosterlitz-Thouless transition. We find significant feature engineering of the raw spin states is required to convincingly claim that features of the vortex configurations are responsible for learning the transition temperature. We further show a single-layer network does not correctly classify the phases of the XY model, while a convolutional network easily performs classification by learning the global magnetization. Finally, we design a deep network capable of learning vortices without feature engineering. We demonstrate the detection of vortices does not necessarily result in the best classification accuracy, especially for lattices of less than approximately 1000 spins. For larger systems, it remains a difficult task to learn vortices.
Efficient error correction for next-generation sequencing of viral amplicons
2012-01-01
Background Next-generation sequencing allows the analysis of an unprecedented number of viral sequence variants from infected patients, presenting a novel opportunity for understanding virus evolution, drug resistance and immune escape. However, sequencing in bulk is error prone. Thus, the generated data require error identification and correction. Most error-correction methods to date are not optimized for amplicon analysis and assume that the error rate is randomly distributed. Recent quality assessment of amplicon sequences obtained using 454-sequencing showed that the error rate is strongly linked to the presence and size of homopolymers, position in the sequence and length of the amplicon. All these parameters are strongly sequence specific and should be incorporated into the calibration of error-correction algorithms designed for amplicon sequencing. Results In this paper, we present two new efficient error correction algorithms optimized for viral amplicons: (i) k-mer-based error correction (KEC) and (ii) empirical frequency threshold (ET). Both were compared to a previously published clustering algorithm (SHORAH), in order to evaluate their relative performance on 24 experimental datasets obtained by 454-sequencing of amplicons with known sequences. All three algorithms show similar accuracy in finding true haplotypes. However, KEC and ET were significantly more efficient than SHORAH in removing false haplotypes and estimating the frequency of true ones. Conclusions Both algorithms, KEC and ET, are highly suitable for rapid recovery of error-free haplotypes obtained by 454-sequencing of amplicons from heterogeneous viruses. The implementations of the algorithms and data sets used for their testing are available at: http://alan.cs.gsu.edu/NGS/?q=content/pyrosequencing-error-correction-algorithm PMID:22759430
Efficient error correction for next-generation sequencing of viral amplicons.
Skums, Pavel; Dimitrova, Zoya; Campo, David S; Vaughan, Gilberto; Rossi, Livia; Forbi, Joseph C; Yokosawa, Jonny; Zelikovsky, Alex; Khudyakov, Yury
2012-06-25
Next-generation sequencing allows the analysis of an unprecedented number of viral sequence variants from infected patients, presenting a novel opportunity for understanding virus evolution, drug resistance and immune escape. However, sequencing in bulk is error prone. Thus, the generated data require error identification and correction. Most error-correction methods to date are not optimized for amplicon analysis and assume that the error rate is randomly distributed. Recent quality assessment of amplicon sequences obtained using 454-sequencing showed that the error rate is strongly linked to the presence and size of homopolymers, position in the sequence and length of the amplicon. All these parameters are strongly sequence specific and should be incorporated into the calibration of error-correction algorithms designed for amplicon sequencing. In this paper, we present two new efficient error correction algorithms optimized for viral amplicons: (i) k-mer-based error correction (KEC) and (ii) empirical frequency threshold (ET). Both were compared to a previously published clustering algorithm (SHORAH), in order to evaluate their relative performance on 24 experimental datasets obtained by 454-sequencing of amplicons with known sequences. All three algorithms show similar accuracy in finding true haplotypes. However, KEC and ET were significantly more efficient than SHORAH in removing false haplotypes and estimating the frequency of true ones. Both algorithms, KEC and ET, are highly suitable for rapid recovery of error-free haplotypes obtained by 454-sequencing of amplicons from heterogeneous viruses. The implementations of the algorithms and data sets used for their testing are available at: http://alan.cs.gsu.edu/NGS/?q=content/pyrosequencing-error-correction-algorithm.
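The abstract does not spell out how KEC works, so the sketch below shows only the general intuition behind k-mer-based approaches: substrings of length k that occur rarely across all reads are more likely to carry a sequencing error. The function names, the toy reads, and the `min_count` threshold are assumptions for illustration and are not taken from the authors' implementation.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every length-k substring (k-mer) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def rare_kmers(reads, k, min_count):
    """Return k-mers seen fewer than min_count times (candidate error sites)."""
    counts = kmer_counts(reads, k)
    return {kmer for kmer, c in counts.items() if c < min_count}


# Hypothetical reads: three identical copies plus one with a single substitution.
reads = ["ACGTAC", "ACGTAC", "ACGTAC", "ACCTAC"]
print(sorted(rare_kmers(reads, k=3, min_count=2)))
```

In this toy case the three rare k-mers all overlap the substituted base, which is the kind of signal a correction step could act on.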
A Dual-Wavelength Radar Technique to Detect Hydrometeor Phases
NASA Technical Reports Server (NTRS)
Liao, Liang; Meneghini, Robert
2016-01-01
This study is aimed at investigating the feasibility of a Ku- and Ka-band space/air-borne dual wavelength radar algorithm to discriminate various phase states of precipitating hydrometeors. A phase-state classification algorithm has been developed from the radar measurements of snow, mixed-phase and rain obtained from stratiform storms. The algorithm, presented in the form of the look-up table that links the Ku-band radar reflectivities and dual-frequency ratio (DFR) to the phase states of hydrometeors, is checked by applying it to the measurements of the Jet Propulsion Laboratory, California Institute of Technology, Airborne Precipitation Radar Second Generation (APR-2). In creating the statistically-based phase look-up table, the attenuation corrected (or true) radar reflectivity factors are employed, leading to better accuracy in determining the hydrometeor phase. In practice, however, the true radar reflectivities are not always available before the phase states of the hydrometeors are determined. Therefore, it is desirable to make use of the measured radar reflectivities in classifying the phase states. To do this, a phase-identification procedure is proposed that uses only measured radar reflectivities. The procedure is then tested using APR-2 airborne radar data. Analysis of the classification results in stratiform rain indicates that the regions of snow, mixed-phase and rain derived from the phase-identification algorithm coincide reasonably well with those determined from the measured radar reflectivities and linear depolarization ratio (LDR).
Identification of Anisomerous Motor Imagery EEG Signals Based on Complex Algorithms
Zhang, Zhiwen; Duan, Feng; Zhou, Xin; Meng, Zixuan
2017-01-01
Motor imagery (MI) electroencephalograph (EEG) signals are widely applied in brain-computer interface (BCI). However, classified MI states are limited, and their classification accuracy rates are low because of the characteristics of nonlinearity and nonstationarity. This study proposes a novel MI pattern recognition system that is based on complex algorithms for classifying MI EEG signals. In electrooculogram (EOG) artifact preprocessing, band-pass filtering is performed to obtain the frequency band of MI-related signals, and then, canonical correlation analysis (CCA) combined with wavelet threshold denoising (WTD) is used for EOG artifact preprocessing. We propose a regularized common spatial pattern (R-CSP) algorithm for EEG feature extraction by incorporating the principle of generic learning. A new classifier combining the K-nearest neighbor (KNN) and support vector machine (SVM) approaches is used to classify four anisomerous states, namely, imaginary movements with the left hand, right foot, and right shoulder and the resting state. The highest classification accuracy rate is 92.5%, and the average classification accuracy rate is 87%. The proposed complex algorithm identification method can significantly improve the identification rate of the minority samples and the overall classification performance. PMID:28874909
Extraction and classification of 3D objects from volumetric CT data
NASA Astrophysics Data System (ADS)
Song, Samuel M.; Kwon, Junghyun; Ely, Austin; Enyeart, John; Johnson, Chad; Lee, Jongkyu; Kim, Namho; Boyd, Douglas P.
2016-05-01
We propose an Automatic Threat Detection (ATD) algorithm for Explosive Detection System (EDS) using our multistage Segmentation Carving (SC) followed by Support Vector Machine (SVM) classifier. The multi-stage Segmentation and Carving (SC) step extracts all suspect 3-D objects. The feature vector is then constructed for all extracted objects and the feature vector is classified by the Support Vector Machine (SVM) previously learned using a set of ground truth threat and benign objects. The learned SVM classifier has shown to be effective in classification of different types of threat materials. The proposed ATD algorithm robustly deals with CT data that are prone to artifacts due to scatter, beam hardening as well as other systematic idiosyncrasies of the CT data. Furthermore, the proposed ATD algorithm is amenable for including newly emerging threat materials as well as for accommodating data from newly developing sensor technologies. Efficacy of the proposed ATD algorithm with the SVM classifier is demonstrated by the Receiver Operating Characteristics (ROC) curve that relates Probability of Detection (PD) as a function of Probability of False Alarm (PFA). The tests performed using CT data of passenger bags shows excellent performance characteristics.
Szantoi, Zoltan; Escobedo, Francisco J; Abd-Elrahman, Amr; Pearlstine, Leonard; Dewitt, Bon; Smith, Scot
2015-05-01
Mapping of wetlands (marsh vs. swamp vs. upland) is a common remote sensing application. Yet, discriminating between similar freshwater communities such as graminoid/sedge from remotely sensed imagery is more difficult. Most of this activity has been performed using medium to low resolution imagery. There are only a few studies using high spatial resolution imagery and machine learning image classification algorithms for mapping heterogeneous wetland plant communities. This study addresses this void by analyzing whether machine learning classifiers such as decision trees (DT) and artificial neural networks (ANN) can accurately classify graminoid/sedge communities using high resolution aerial imagery and image texture data in the Everglades National Park, Florida. In addition to spectral bands, the normalized difference vegetation index, and first- and second-order texture features derived from the near-infrared band were analyzed. Classifier accuracies were assessed using confusion tables and the calculated kappa coefficients of the resulting maps. The results indicated that an ANN (multilayer perceptron based on backpropagation) algorithm produced a statistically significantly higher accuracy (82.04%) than the DT (QUEST) algorithm (80.48%) or the maximum likelihood (80.56%) classifier (α<0.05). Findings show that using multiple window sizes provided the best results. First-order texture features also provided computational advantages and results that were not significantly different from those using second-order texture features.
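For readers unfamiliar with the normalized difference vegetation index mentioned above, it is conventionally computed per pixel as (NIR − Red) / (NIR + Red). The short sketch below applies that standard definition; the function name, the zero-denominator convention, and the sample reflectance values are assumptions for this example.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index for one pixel."""
    total = nir + red
    if total == 0:
        # Convention assumed here: return 0.0 when both bands are zero.
        return 0.0
    return (nir - red) / total


# Hypothetical reflectance values for a vegetated and a bare pixel.
print(ndvi(0.5, 0.1))
print(ndvi(0.2, 0.2))
```

Values near 1 suggest dense green vegetation, while values near 0 or below suggest bare ground or water.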
Rodriguez-Diaz, Eladio; Castanon, David A; Singh, Satish K; Bigio, Irving J
2011-06-01
Optical spectroscopy has shown potential as a real-time, in vivo, diagnostic tool for identifying neoplasia during endoscopy. We present the development of a diagnostic algorithm to classify elastic-scattering spectroscopy (ESS) spectra as either neoplastic or non-neoplastic. The algorithm is based on pattern recognition methods, including ensemble classifiers, in which members of the ensemble are trained on different regions of the ESS spectrum, and misclassification-rejection, where the algorithm identifies and refrains from classifying samples that are at higher risk of being misclassified. These "rejected" samples can be reexamined by simply repositioning the probe to obtain additional optical readings or ultimately by sending the polyp for histopathological assessment, as per standard practice. Prospective validation using separate training and testing sets result in a baseline performance of sensitivity = .83, specificity = .79, using the standard framework of feature extraction (principal component analysis) followed by classification (with linear support vector machines). With the developed algorithm, performance improves to Se ∼ 0.90, Sp ∼ 0.90, at a cost of rejecting 20-33% of the samples. These results are on par with a panel of expert pathologists. For colonoscopic prevention of colorectal cancer, our system could reduce biopsy risk and cost, obviate retrieval of non-neoplastic polyps, decrease procedure time, and improve assessment of cancer risk.
Rodriguez-Diaz, Eladio; Castanon, David A.; Singh, Satish K.; Bigio, Irving J.
2011-01-01
Optical spectroscopy has shown potential as a real-time, in vivo, diagnostic tool for identifying neoplasia during endoscopy. We present the development of a diagnostic algorithm to classify elastic-scattering spectroscopy (ESS) spectra as either neoplastic or non-neoplastic. The algorithm is based on pattern recognition methods, including ensemble classifiers, in which members of the ensemble are trained on different regions of the ESS spectrum, and misclassification-rejection, where the algorithm identifies and refrains from classifying samples that are at higher risk of being misclassified. These “rejected” samples can be reexamined by simply repositioning the probe to obtain additional optical readings or ultimately by sending the polyp for histopathological assessment, as per standard practice. Prospective validation using separate training and testing sets result in a baseline performance of sensitivity = .83, specificity = .79, using the standard framework of feature extraction (principal component analysis) followed by classification (with linear support vector machines). With the developed algorithm, performance improves to Se ∼ 0.90, Sp ∼ 0.90, at a cost of rejecting 20–33% of the samples. These results are on par with a panel of expert pathologists. For colonoscopic prevention of colorectal cancer, our system could reduce biopsy risk and cost, obviate retrieval of non-neoplastic polyps, decrease procedure time, and improve assessment of cancer risk. PMID:21721830
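The misclassification-rejection idea can be sketched generically: a sample whose classifier score falls too close to the decision boundary is left unclassified, and sensitivity and specificity are then computed on the accepted samples only. The scores, labels, the 0.5 boundary, and the `reject_band` width below are hypothetical and do not reproduce the authors' ensemble or their thresholds.

```python
def classify_with_rejection(p_neoplastic, reject_band=0.2):
    """Label a sample, or refuse when the score is near the 0.5 boundary."""
    if abs(p_neoplastic - 0.5) < reject_band:
        return "rejected"
    return "neoplastic" if p_neoplastic > 0.5 else "non-neoplastic"


def summarize(scores, truths, reject_band=0.2):
    """Sensitivity and specificity on accepted samples, plus the rejection rate."""
    tp = fn = tn = fp = rejected = 0
    for p, truth in zip(scores, truths):
        decision = classify_with_rejection(p, reject_band)
        if decision == "rejected":
            rejected += 1
        elif truth == "neoplastic":
            if decision == "neoplastic":
                tp += 1
            else:
                fn += 1
        elif decision == "non-neoplastic":
            tn += 1
        else:
            fp += 1
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else None,
        "specificity": tn / (tn + fp) if tn + fp else None,
        "rejection_rate": rejected / len(scores),
    }


# Hypothetical classifier scores and ground-truth labels.
scores = [0.9, 0.55, 0.1, 0.35, 0.8]
truths = ["neoplastic", "neoplastic", "non-neoplastic", "non-neoplastic", "non-neoplastic"]
print(summarize(scores, truths))
```

Widening `reject_band` trades a higher rejection rate for fewer borderline calls, which mirrors the trade-off reported in the abstract.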
Classifying Imbalanced Data Streams via Dynamic Feature Group Weighting with Importance Sampling.
Wu, Ke; Edwards, Andrea; Fan, Wei; Gao, Jing; Zhang, Kun
2014-04-01
Data stream classification and imbalanced data learning are two important areas of data mining research. Each has been well studied to date with many interesting algorithms developed. However, only a few approaches reported in literature address the intersection of these two fields due to their complex interplay. In this work, we proposed an importance sampling driven, dynamic feature group weighting framework (DFGW-IS) for classifying data streams of imbalanced distribution. Two components are tightly incorporated into the proposed approach to address the intrinsic characteristics of concept-drifting, imbalanced streaming data. Specifically, the ever-evolving concepts are tackled by a weighted ensemble trained on a set of feature groups with each sub-classifier (i.e. a single classifier or an ensemble) weighed by its discriminative power and stable level. The un-even class distribution, on the other hand, is typically battled by the sub-classifier built in a specific feature group with the underlying distribution rebalanced by the importance sampling technique. We derived the theoretical upper bound for the generalization error of the proposed algorithm. We also studied the empirical performance of our method on a set of benchmark synthetic and real world data, and significant improvement has been achieved over the competing algorithms in terms of standard evaluation metrics and parallel running time. Algorithm implementations and datasets are available upon request.
Proposed hybrid-classifier ensemble algorithm to map snow cover area
NASA Astrophysics Data System (ADS)
Nijhawan, Rahul; Raman, Balasubramanian; Das, Josodhir
2018-01-01
The metaclassification ensemble approach is known to improve the prediction performance for snow-covered area. The methodology adopted here is based on a neural network along with four state-of-the-art machine learning algorithms (support vector machine, artificial neural networks, spectral angle mapper, and K-means clustering) and a snow index, the normalized difference snow index. An AdaBoost ensemble algorithm related to decision trees for snow-cover mapping is also proposed. According to the available literature, these methods have rarely been used for snow-cover mapping. Employing the above techniques, a study was conducted for the Raktavarn and Chaturangi Bamak glaciers, Uttarakhand, Himalaya, using a multispectral Landsat 7 ETM+ (enhanced thematic mapper) image. The study also compares the results with those obtained from statistical combination methods (majority rule and belief functions) and with the accuracies of individual classifiers. Accuracy assessment is performed by computing the quantity and allocation disagreement and by analyzing statistical measures (accuracy, precision, specificity, AUC, and sensitivity) and receiver operating characteristic curves. A total of 225 combinations of parameters for individual classifiers were trained and tested on the dataset, and the results were compared with the proposed approach. It was observed that the proposed methodology produced the highest classification accuracy (95.21%), close to that (94.01%) produced by the proposed AdaBoost ensemble algorithm. From the sets of observations, it was concluded that the ensemble of classifiers produced better results compared to individual classifiers.
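Of the statistical combination methods named above, the majority rule is the simplest to illustrate: each pixel takes the label that most of the individual classifiers assigned to it. The sketch below is a generic version of that rule; the label names and the three toy classifier outputs are hypothetical, and it does not represent the metaclassifier proposed in the paper.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label among the classifiers' outputs."""
    return Counter(labels).most_common(1)[0][0]


def combine_maps(classifier_maps):
    """Combine per-pixel label lists, one list per classifier, by majority rule."""
    return [majority_vote(pixel_labels) for pixel_labels in zip(*classifier_maps)]


# Hypothetical outputs of three classifiers over the same three pixels.
maps = [
    ["snow", "snow", "rock"],
    ["snow", "rock", "rock"],
    ["rock", "snow", "rock"],
]
print(combine_maps(maps))
```

Using an odd number of classifiers for a two-class problem avoids ties; with ties, this sketch keeps whichever label was encountered first.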
Cho, Dongrae; Ham, Jinsil; Oh, Jooyoung; Park, Jeanho; Kim, Sayup; Lee, Nak-Kyu; Lee, Boreom
2017-10-24
Virtual reality (VR) is a computer technique that creates an artificial environment composed of realistic images, sounds, and other sensations. Many researchers have used VR devices to generate various stimuli, and have utilized them to perform experiments or to provide treatment. In this study, the participants performed mental tasks using a VR device while physiological signals were measured: a photoplethysmogram (PPG), electrodermal activity (EDA), and skin temperature (SKT). In general, stress is an important factor that can influence the autonomic nervous system (ANS). Heart-rate variability (HRV) is known to be related to ANS activity, so we used an HRV derived from the PPG peak interval. In addition, the peak characteristics of the skin conductance (SC) from EDA and SKT variation can also reflect ANS activity; we utilized them as well. Then, we applied a kernel-based extreme-learning machine (K-ELM) to classify the stress levels induced by the VR task into five different stress situations: baseline, mild stress, moderate stress, severe stress, and recovery. Twelve healthy subjects voluntarily participated in the study. Three physiological signals were measured in a stress environment generated by a VR device. As a result, the average classification accuracy was over 95% using K-ELM and the integrated feature (IT = HRV + SC + SKT). In addition, the proposed algorithm can be embedded in a microcontroller chip because the K-ELM algorithm has a very short computation time. Therefore, a compact wearable device classifying stress levels using physiological signals can be developed.
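The abstract does not say which HRV measures were derived from the PPG peak intervals. As an assumption-laden sketch, the code below computes two common time-domain measures, SDNN and RMSSD, from a list of peak times; the function names and sample values are hypothetical.

```python
import math
import statistics


def rr_intervals_ms(peak_times_s):
    """Convert successive peak times (seconds) into peak-to-peak intervals (ms)."""
    return [(b - a) * 1000.0 for a, b in zip(peak_times_s, peak_times_s[1:])]


def sdnn(intervals_ms):
    """Sample standard deviation of the intervals."""
    return statistics.stdev(intervals_ms)


def rmssd(intervals_ms):
    """Root mean square of successive interval differences."""
    diffs = [b - a for a, b in zip(intervals_ms, intervals_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical peak-to-peak intervals in milliseconds.
intervals = [800, 810, 790, 805]
print(sdnn(intervals), rmssd(intervals))
```

Both measures need only a running list of intervals and a few arithmetic operations, which fits the low-computation setting the abstract describes.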
Seizures in the elderly: development and validation of a diagnostic algorithm.
Dupont, Sophie; Verny, Marc; Harston, Sandrine; Cartz-Piver, Leslie; Schück, Stéphane; Martin, Jennifer; Puisieux, François; Alecu, Cosmin; Vespignani, Hervé; Marchal, Cécile; Derambure, Philippe
2010-05-01
Seizures are frequent in the elderly, but their diagnosis can be challenging. The objective of this work was to develop and validate an expert-based algorithm for the diagnosis of seizures in elderly people. A multidisciplinary group of neurologists and geriatricians developed a diagnostic algorithm using a combination of selected clinical, electroencephalographical and radiological criteria. The algorithm was validated by multicentre retrospective analysis of data of patients referred for specific symptoms and classified by the experts as epileptic patients or not. The algorithm was applied to all the patients, and the diagnosis provided by the algorithm was compared to the clinical diagnosis of the experts. Twenty-nine clinical, electroencephalographical and radiological criteria were selected for the algorithm. According to criteria combination, seizures were classified in four levels of diagnosis: certain, highly probable, possible or improbable. To validate the algorithm, the medical records of 269 elderly patients were analyzed (138 with epileptic seizures, 131 with non-epileptic manifestations). Patients were mainly referred for a transient focal deficit (40%), confusion (38%), or unconsciousness (27%). The algorithm best classified certain and probable seizures versus possible and improbable seizures, with 86.2% sensitivity and 67.2% specificity. Using logistic regression, 2 simplified models were developed, the first with 13 criteria (Se 85.5%, Sp 90.1%), and the second with 7 criteria only (Se 84.8%, Sp 88.6%). In conclusion, the present study validated the use of a revised diagnostic algorithm to help diagnose epileptic seizures in the elderly. A prospective study is planned to further validate this algorithm. Copyright 2010 Elsevier B.V. All rights reserved.
Dementia diagnoses from clinical and neuropsychological data compared: the Cache County study.
Tschanz, J T; Welsh-Bohmer, K A; Skoog, I; West, N; Norton, M C; Wyse, B W; Nickles, R; Breitner, J C
2000-03-28
To validate a neuropsychological algorithm for dementia diagnosis. We developed a neuropsychological algorithm in a sample of 1,023 elderly residents of Cache County, UT. We compared algorithmic and clinical dementia diagnoses both based on DSM-III-R criteria. The algorithm diagnosed dementia when there was impairment in memory and at least one other cognitive domain. We also tested a variant of the algorithm that incorporated functional measures that were based on structured informant reports. Of 1,023 participants, 87% could be classified by the basic algorithm, 94% when functional measures were considered. There was good concordance between basic psychometric and clinical diagnoses (79% agreement, kappa = 0.57). This improved after incorporating functional measures (90% agreement, kappa = 0.76). Neuropsychological algorithms may reasonably classify individuals on dementia status across a range of severity levels and ages and may provide a useful adjunct to clinical diagnoses in population studies.
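The decision rule stated above (impairment in memory plus at least one other cognitive domain) and the kappa agreement statistic can both be sketched in a few lines. The domain names, the toy diagnosis vectors, and the function names below are assumptions for illustration; the kappa function follows the standard Cohen's kappa definition rather than any code from the study.

```python
def algorithmic_dementia(impaired):
    """Apply the rule: memory impaired AND at least one other domain impaired.

    `impaired` maps a cognitive domain name to True or False.
    """
    others = sum(flag for domain, flag in impaired.items() if domain != "memory")
    return impaired.get("memory", False) and others >= 1


def cohen_kappa(a, b):
    """Cohen's kappa for two equally long lists of categorical ratings."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    categories = set(a) | set(b)
    # Agreement expected by chance from each rater's marginal frequencies.
    expected = sum((a.count(c) / n) * (b.count(c) / n) for c in categories)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical participant and hypothetical diagnosis vectors (1 = dementia).
print(algorithmic_dementia({"memory": True, "language": True, "executive": False}))
print(cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0]))
```

Kappa discounts the agreement that would arise by chance, which is why it can be noticeably lower than the raw percentage agreement reported alongside it.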
Meteorological correction of optical beam refraction
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lukin, V.P.; Melamud, A.E.; Mironov, V.L.
1986-02-01
At the present time laser reference systems (LRS's) are widely used in agrotechnology and in geodesy. The demands for accuracy in LRS's constantly increase, so that a study of error sources and means of considering and correcting them is of practical importance. A theoretical algorithm is presented for correction of the regular component of atmospheric refraction for various types of hydrostatic stability of the atmospheric layer adjacent to the earth. The algorithm obtained is compared to regression equations obtained by processing an experimental data base. It is shown that within admissible accuracy limits the refraction correction algorithm obtained permits construction of correction tables and design of optical systems with programmable correction for atmospheric refraction on the basis of rapid meteorological measurements.
Liu, Yanqiu; Lu, Huijuan; Yan, Ke; Xia, Haixia; An, Chunlin
2016-01-01
Embedding cost-sensitive factors into the classifiers increases the classification stability and reduces the classification costs for classifying high-scale, redundant, and imbalanced datasets, such as the gene expression data. In this study, we extend our previous work, that is, Dissimilar ELM (D-ELM), by introducing misclassification costs into the classifier. We name the proposed algorithm as the cost-sensitive D-ELM (CS-D-ELM). Furthermore, we embed rejection cost into the CS-D-ELM to increase the classification stability of the proposed algorithm. Experimental results show that the rejection cost embedded CS-D-ELM algorithm effectively reduces the average and overall cost of the classification process, while the classification accuracy still remains competitive. The proposed method can be extended to classification problems of other redundant and imbalanced data.
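The role of misclassification and rejection costs can be illustrated with a generic minimum-expected-cost decision rule: predict the class with the lowest expected cost, and reject when even that cost exceeds the cost of rejecting. This is a textbook-style sketch, not the CS-D-ELM algorithm; the class names, posterior probabilities, cost matrix, and rejection costs are hypothetical.

```python
def min_cost_decision(probs, cost, reject_cost):
    """Pick the prediction with the lowest expected cost, or reject.

    probs: dict mapping each true class to its estimated probability.
    cost:  cost[true_class][predicted_class] is the misclassification cost.
    """
    expected = {
        pred: sum(probs[true] * cost[true][pred] for true in probs)
        for pred in probs
    }
    best = min(expected, key=expected.get)
    return best if expected[best] <= reject_cost else "reject"


# Hypothetical posterior and costs: missing a tumor costs 10x a false alarm.
probs = {"tumor": 0.3, "normal": 0.7}
cost = {
    "tumor": {"tumor": 0.0, "normal": 10.0},
    "normal": {"tumor": 1.0, "normal": 0.0},
}
print(min_cost_decision(probs, cost, reject_cost=1.0))
print(min_cost_decision(probs, cost, reject_cost=0.5))
```

Note that the less probable class can still be chosen when the cost of missing it is high, and that lowering `reject_cost` makes the rule abstain more often.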
Neurons from the adult human dentate nucleus: neural networks in the neuron classification.
Grbatinić, Ivan; Marić, Dušica L; Milošević, Nebojša T
2015-04-07
Topological (central vs. border neuron type) and morphological classification of adult human dentate nucleus neurons according to their quantified histomorphological properties, using neural networks on real and virtual neuron samples. In the real sample, 53.1% and 14.1% of central and border neurons, respectively, are classified correctly, with a total of 32.8% misclassified neurons. The most important result is that 62.2% of neurons in the border group are misclassified, which exceeds the share of correctly classified neurons (37.8%) in that group and shows an obvious failure of the network to classify neurons correctly based on the computational parameters used in our study. In the virtual sample, 97.3% of neurons in the border group are misclassified, far more than the correctly classified share (2.7%) in that group, which again confirms the failure of the network to classify neurons correctly. Statistical analysis shows that there is no statistically significant difference between central and border neurons for each measured parameter (p>0.05). In total, 96.74% of neurons are morphologically classified correctly by neural networks, and each one belongs to one of four histomorphological types: (a) neurons with small soma and short dendrites, (b) neurons with small soma and long dendrites, (c) neurons with large soma and short dendrites, (d) neurons with large soma and long dendrites. Statistical analysis supports these results (p<0.05). Human dentate nucleus neurons can be classified into four neuron types according to their quantitative histomorphological properties. These neuron types consist of two neuron sets, small and large with respect to their perikarya, with subtypes differing in dendrite length, i.e. neurons with short vs. long dendrites. Besides confirming the classification of neurons into small and large ones, already shown in the literature, we found two new subtypes, i.e. neurons with small soma and long dendrites and neurons with large soma and short dendrites. These neurons are most probably equally distributed throughout the dentate nucleus, as no significant difference in their topological distribution is observed. Copyright © 2015 Elsevier Ltd. All rights reserved.
Comparison of artificial intelligence classifiers for SIP attack data
NASA Astrophysics Data System (ADS)
Safarik, Jakub; Slachta, Jiri
2016-05-01
A honeypot application is a source of valuable data about attacks on the network. We run several SIP honeypots in various computer networks, which are separated geographically and logically. Each honeypot runs on a public IP address and uses standard SIP PBX ports. All information gathered via the honeypots is periodically sent to a centralized server. This server classifies all attack data with a neural network algorithm. The paper describes optimizations of the neural network classifier that lower the classification error. The article compares two neural network algorithms used for the classification of validation data. The first is the original implementation of the neural network described in recent work; the second uses further optimizations such as input normalization and a cross-entropy cost function. We also use other implementations of neural networks and machine learning classification algorithms. The comparison tests their capabilities on validation data to find the optimal classifier. The results show promise for further development of an accurate SIP attack classification engine.
Statistical process control using optimized neural networks: a case study.
Addeh, Jalil; Ebrahimzadeh, Ata; Azarbad, Milad; Ranaee, Vahid
2014-09-01
The most common statistical process control (SPC) tools employed for monitoring process changes are control charts. A control chart indicates that the process has changed by generating an out-of-control signal. This study investigates the design of an accurate system for control chart pattern (CCP) recognition in two respects. First, an efficient system is introduced that includes two main modules: a feature extraction module and a classifier module. In the feature extraction module, a proper set of shape features and statistical features is proposed as the efficient characteristics of the patterns. In the classifier module, several neural networks, such as the multilayer perceptron, probabilistic neural network and radial basis function, are investigated. Based on an experimental study, the best classifier is chosen to recognize the CCPs. Second, a hybrid heuristic recognition system based on the cuckoo optimization algorithm (COA) is introduced to improve the generalization performance of the classifier. The simulation results show that the proposed algorithm has high recognition accuracy. Copyright © 2013 ISA. Published by Elsevier Ltd. All rights reserved.
Fluorescence intensity positivity classification of Hep-2 cells images using fuzzy logic
NASA Astrophysics Data System (ADS)
Sazali, Dayang Farzana Abang; Janier, Josefina Barnachea; May, Zazilah Bt.
2014-10-01
Indirect Immunofluorescence (IIF) is a good standard used for the antinuclear autoantibody (ANA) test using Hep-2 cells to determine specific diseases. Different classifier algorithms have been proposed in previous works; however, there is still no valid set accepted as a standard for classifying the fluorescence intensity. This paper presents the use of fuzzy logic to classify the fluorescence intensity and to determine the positivity of Hep-2 cell serum samples. The fuzzy algorithm involves pre-processing the image by filtering noise and smoothing, then converting the image from the red, green and blue (RGB) color space to the LAB color space (luminosity layer and chromaticity layers "a" and "b"). The mean values of the lightness layer and chromaticity layer "a" are extracted and classified with the fuzzy logic algorithm based on the standard score ranges of ANA fluorescence intensity. Using 100 data sets of positive and intermediate fluorescence intensity to test performance, the fuzzy logic classifier obtained accuracies of 85% and 87% for the intermediate and positive classes, respectively.
Ensemble Methods for Classification of Physical Activities from Wrist Accelerometry.
Chowdhury, Alok Kumar; Tjondronegoro, Dian; Chandran, Vinod; Trost, Stewart G
2017-09-01
To investigate whether the use of ensemble learning algorithms improves physical activity recognition accuracy compared to single classifier algorithms, and to compare the classification accuracy achieved by three conventional ensemble machine learning methods (bagging, boosting, random forest) and a custom ensemble model comprising four algorithms commonly used for activity recognition (binary decision tree, k nearest neighbor, support vector machine, and neural network). The study used three independent data sets that included wrist-worn accelerometer data. For each data set, a four-step classification framework consisting of data preprocessing, feature extraction, normalization and feature selection, and classifier training and testing was implemented. For the custom ensemble, decisions from the single classifiers were aggregated using three decision fusion methods: weighted majority vote, naïve Bayes combination, and behavior knowledge space combination. Classifiers were cross-validated using leave-one-subject-out cross-validation and compared on the basis of average F1 scores. In all three data sets, ensemble learning methods consistently outperformed the individual classifiers. Among the conventional ensemble methods, random forest models provided consistently high activity recognition; however, the custom ensemble model using weighted majority voting demonstrated the highest classification accuracy in two of the three data sets. Combining multiple individual classifiers using conventional or custom ensemble learning methods can improve activity recognition accuracy from wrist-worn accelerometer data.
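Of the three decision fusion methods named above, the weighted majority vote is the simplest to sketch. The example below assumes each base classifier emits one label and carries one scalar weight (for instance a validation score); how the study actually derived its weights is not stated in the abstract, so the weights and activity labels here are hypothetical.

```python
from collections import defaultdict


def weighted_majority_vote(predictions, weights):
    """Return the label whose supporting classifiers have the largest total weight.

    predictions : one predicted label per base classifier
    weights     : one non-negative weight per base classifier (assumed given)
    """
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)


# Hypothetical outputs of four base classifiers for one accelerometer window.
predictions = ["walk", "run", "run", "sit"]
fused = weighted_majority_vote(predictions, [0.9, 0.4, 0.3, 0.2])
```

With these weights a single strong classifier can outvote two weaker ones, which is the behavior that distinguishes a weighted vote from a plain majority vote.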
Yousef, Malik; Khalifa, Waleed; AbedAllah, Loai
2016-12-22
The performance of many learning and data mining algorithms depends critically on suitable metrics to assess efficiency over the input space. Learning a suitable metric from examples may, therefore, be the key to successful application of these algorithms. We have demonstrated that the k-nearest neighbor (kNN) classification can be significantly improved by learning a distance metric from labeled examples. The clustering ensemble is used to define the distance between points with respect to how they co-cluster. This distance is then used within the framework of the kNN algorithm to define a classifier named ensemble clustering kNN classifier (EC-kNN). In many instances in our experiments we achieved the highest accuracy while SVM failed to perform as well. In this study, we compare the performance of a two-class classifier using EC-kNN with different one-class and two-class classifiers. The comparison was applied to seven different plant microRNA species considering eight feature selection methods. In this study, the averaged results show that EC-kNN outperforms all other methods employed here and previously published results for the same data. In conclusion, this study shows that the chosen classifier shows high performance when the distance metric is carefully chosen.
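One plausible reading of a co-clustering distance is the fraction of clusterings in the ensemble that place two points in different clusters. The sketch below uses that assumed definition inside a plain kNN vote; the function names, the toy clusterings, and the exact distance formula are illustrative and may differ from the published EC-kNN method.

```python
from collections import Counter


def cocluster_distance(a, b, clusterings):
    """Fraction of clusterings in which points a and b land in different clusters."""
    differ = sum(1 for labels in clusterings if labels[a] != labels[b])
    return differ / len(clusterings)


def ec_knn_predict(query, train_idx, train_labels, clusterings, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    ranked = sorted(train_idx,
                    key=lambda i: cocluster_distance(query, i, clusterings))
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical ensemble: three clusterings over five points (index 4 is the query).
clusterings = [
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 1, 1],
]
train_labels = {0: "A", 1: "A", 2: "B", 3: "B"}
label = ec_knn_predict(4, [0, 1, 2, 3], train_labels, clusterings, k=3)
```

Because the distance depends only on cluster assignments, the query point has to be clustered together with the training points before it can be classified.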
NASA Astrophysics Data System (ADS)
Huang, Lei; Zhou, Chenlu; Gong, Mali; Ma, Xingkun; Bian, Qi
2016-07-01
The deformable mirror (DM) is a widely used wavefront corrector in adaptive optics systems, especially in astronomical, image and laser optics. A new DM structure, the 3D DM, is proposed, which has removable actuators and can correct different aberrations with different actuator arrangements. A 3D DM consists of several reflection mirrors. Every mirror has a single actuator, and the mirrors are independent of each other. Two actuator arrangement algorithms are compared: the random disturbance algorithm (RDA) and the global arrangement algorithm (GAA). The correction effects of the two algorithms are analyzed and compared through numerical simulation. The simulation results show that a 3D DM with removable actuators can clearly improve the correction effects.
Coupé, Veerle M. H.; Knottnerus, Bart J.; Geerlings, Suzanne E.; Moll van Charante, Eric P.; ter Riet, Gerben
2017-01-01
Background Uncomplicated Urinary Tract Infections (UTIs) are common in primary care resulting in substantial costs. Since antimicrobial resistance against antibiotics for UTIs is rising, accurate diagnosis is needed in settings with low rates of multidrug-resistant bacteria. Objective To compare the cost-effectiveness of different strategies to diagnose UTIs in women who contacted their general practitioner (GP) with painful and/or frequent micturition between 2006 and 2008 in and around Amsterdam, The Netherlands. Methods This is a model-based cost-effectiveness analysis using data from 196 women who underwent four tests: history, urine stick, sediment, dipslide, and the gold standard, a urine culture. Decision trees were constructed reflecting 15 diagnostic strategies comprising different parallel and sequential combinations of the four tests. Using the decision trees, for each strategy the costs and the proportion of women with a correct positive or negative diagnosis were estimated. Probabilistic sensitivity analysis was used to estimate uncertainty surrounding costs and effects. Uncertainty was presented using cost-effectiveness planes and acceptability curves. Results Most sequential testing strategies resulted in higher proportions of correctly classified women and lower costs than parallel testing strategies. 
For different willingness to pay thresholds, the most cost-effective strategies were: 1) performing a dipstick after a positive history for thresholds below €10 per additional correctly classified patient, 2) performing both a history and dipstick for thresholds between €10 and €17 per additional correctly classified patient, 3) performing a dipstick if history was negative, followed by a sediment if the dipstick was negative for thresholds between €17 and €118 per additional correctly classified patient, 4) performing a dipstick if history was negative, followed by a dipslide if the dipstick was negative for thresholds above €118 per additional correctly classified patient. Conclusion Depending on decision makers’ willingness to pay for one additional correctly classified woman, the strategy consisting of performing a history and dipstick simultaneously (ceiling ratios between €10 and €17) or performing a sediment if history and subsequent dipstick are negative (ceiling ratios between €17 and €118) are the most cost-effective strategies to diagnose a UTI. PMID:29186185
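The thresholds in this abstract are expressed as cost per additional correctly classified patient, which is an incremental cost-effectiveness ratio (ICER). The sketch below shows the arithmetic under stated assumptions: the function name `icer` and all cost and accuracy figures are hypothetical and are not results from the study.

```python
def icer(cost_a, effect_a, cost_b, effect_b):
    """Extra cost of strategy B per additional correctly classified patient vs. A.

    cost_*   : mean cost per patient (e.g. in euros)
    effect_* : proportion of patients correctly classified
    """
    delta_effect = effect_b - effect_a
    if delta_effect == 0:
        raise ValueError("strategies have equal effect; ICER is undefined")
    return (cost_b - cost_a) / delta_effect


# Hypothetical strategies, not figures from the study.
ratio = icer(cost_a=10.0, effect_a=0.80, cost_b=12.0, effect_b=0.90)
willingness_to_pay = 25.0
prefer_b = ratio <= willingness_to_pay
```

Strategy B is preferred only when the decision maker's willingness to pay is at least the ratio, which is how the threshold ranges quoted above partition the strategies.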
A survey of provably correct fault-tolerant clock synchronization techniques
NASA Technical Reports Server (NTRS)
Butler, Ricky W.
1988-01-01
Six provably correct fault-tolerant clock synchronization algorithms are examined. These algorithms are all presented in the same notation to permit easier comprehension and comparison. The advantages and disadvantages of the different techniques are examined and issues related to the implementation of these algorithms are discussed. The paper argues for the use of such algorithms in life-critical applications.
NASA Astrophysics Data System (ADS)
He, Xiaojun; Ma, Haotong; Luo, Chuanxin
2016-10-01
The optical multi-aperture imaging system is an effective way to enlarge the aperture and increase the resolution of a telescope optical system; its difficulty lies in detecting and correcting the co-phase error. This paper presents a method based on the stochastic parallel gradient descent (SPGD) algorithm to correct the co-phase error. Compared with the current method, the SPGD method avoids detecting the co-phase error. This paper analyzes the influence of piston error and tilt error on image quality based on a double-aperture imaging system, introduces the basic principle of the SPGD algorithm, and discusses the influence of the SPGD algorithm's key parameters (the gain coefficient and the disturbance amplitude) on error control performance. The results show that SPGD can efficiently correct the co-phase error. The convergence speed of the SPGD algorithm improves as the gain coefficient and disturbance amplitude increase, but the stability of the algorithm is reduced. An adaptive gain coefficient can appropriately solve this problem. The results can provide a theoretical reference for co-phase error correction in multi-aperture imaging systems.
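The basic SPGD iteration can be sketched as follows: perturb all control channels at once by a random ±amplitude, measure the change in a quality metric, and step each channel in proportion to that change. The toy quadratic metric, the `TARGET` vector, and the parameter values below are assumptions standing in for a real image-quality metric and real piston/tilt controls.

```python
import random

# Hypothetical "ideal" control values; a real system would not know these.
TARGET = [0.5, -0.3, 0.9]


def toy_metric(u):
    """Stand-in for an image-quality metric: largest (zero) when u equals TARGET."""
    return -sum((ui - ti) ** 2 for ui, ti in zip(u, TARGET))


def spgd_maximize(metric, u, gain=5.0, amplitude=0.1, iterations=200, seed=0):
    """Maximize `metric` with stochastic parallel gradient descent (two-sided)."""
    rng = random.Random(seed)
    u = list(u)
    for _ in range(iterations):
        # Random parallel perturbation: every channel gets +/- amplitude.
        delta = [amplitude * rng.choice((-1.0, 1.0)) for _ in u]
        j_plus = metric([ui + di for ui, di in zip(u, delta)])
        j_minus = metric([ui - di for ui, di in zip(u, delta)])
        dj = j_plus - j_minus
        # Move each channel along its perturbation, scaled by the metric change.
        u = [ui + gain * dj * di for ui, di in zip(u, delta)]
    return u


start = [0.0, 0.0, 0.0]
result = spgd_maximize(toy_metric, start)
```

No measurement of the error itself is needed, only the scalar metric, which matches the abstract's point that SPGD avoids detecting the co-phase error. Raising `gain` or `amplitude` speeds convergence on this toy problem up to a point, after which the update overshoots and becomes unstable.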
Tan, Robin; Perkowski, Marek
2017-01-01
Electrocardiogram (ECG) signals sensed from mobile devices hold potential for biometric identity recognition applicable in remote access control systems where enhanced data security is in demand. In this study, we propose a new algorithm that consists of a two-stage classifier combining random forest and wavelet distance measure through a probabilistic threshold schema, to improve the effectiveness and robustness of a biometric recognition system using ECG data acquired from a biosensor integrated into mobile devices. The proposed algorithm is evaluated using a mixed dataset from 184 subjects under different health conditions. The proposed two-stage classifier achieves a total of 99.52% subject verification accuracy, better than the 98.33% accuracy from random forest alone and 96.31% accuracy from the wavelet distance measure algorithm alone. These results demonstrate the superiority of the proposed algorithm for biometric identification, hence supporting its practicality in areas such as cloud data security, cyber-security or remote healthcare systems. PMID:28230745
Tchounga, Boris K; Inwoley, Andre; Coffie, Patrick A; Minta, Daouda; Messou, Eugene; Bado, Guillaume; Minga, Albert; Hawerlander, Denise; Kane, Coumba; Eholie, Serge P; Dabis, François; Ekouevi, Didier K
2014-01-01
Introduction West Africa is characterized by the circulation of HIV-1 and HIV-2. The laboratory diagnosis of these two infections as well as the choice of a first-line antiretroviral therapy (ART) is challenging, considering the limited access to second-line regimens. This study aimed at confirming the classification of HIV-2 and HIV-1&2 dually reactive patients followed up in the HIV-2 cohort of the West African Database to evaluate AIDS collaboration. Method A cross-sectional survey was conducted from March to December 2012 in Burkina Faso, Côte d’Ivoire and Mali among patients classified as HIV-2 or HIV-1&2 dually reactive according to the national HIV testing algorithms. A 5-ml blood sample was collected from each patient and tested in a single reference laboratory in Côte d’Ivoire (CeDReS, Abidjan) with two immuno-enzymatic tests: ImmunoCombII® (HIV-1&2 ImmunoComb BiSpot – Alere) and an in-house ELISA test, approved by the French National AIDS and hepatitis Research Agency (ANRS). Results A total of 547 patients were included; 57% of them were initially classified as HIV-2 and 43% as HIV-1&2 dually reactive. Half of the patients had CD4≥500 cells/mm3 and 68.6% were on ART. Of the 312 patients initially classified as HIV-2, 267 (85.7%) were confirmed as HIV-2 with ImmunoCombII® and in-house ELISA while 16 (5.1%) and 9 (2.9%) were reclassified as HIV-1 and HIV-1&2, respectively (Kappa=0.69; p<0.001). Among the 235 patients initially classified as HIV-1&2 dually reactive, only 54 (23.0%) were confirmed as dually reactive with ImmunoCombII® and in-house ELISA, while 103 (43.8%) and 33 (14.0%) were reclassified as HIV-1 and HIV-2 mono-infected, respectively (kappa= 0.70; p<0.001). Overall, 300 samples (54.8%) were concordantly classified as HIV-2, 63 (11.5%) as HIV-1&2 dually reactive and 119 (21.8%) as HIV-1 (kappa=0.79; p<0.001). The two tests gave discordant results for 65 samples (11.9%). 
Conclusions Patients with HIV-2 mono-infection are correctly discriminated by the national algorithms used in West African countries. HIV-1&2 dually reactive patients should be systematically investigated, with a standardized algorithm using more accurate tests, before initiating ART as at least 4 out of 10 of them could initiate an effective first-line ART for HIV-1 and optimize their second-line treatment options. PMID:25128907
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stanke, Monika, E-mail: monika@fizyka.umk.pl; Palikot, Ewa, E-mail: epalikot@doktorant.umk.pl; Adamowicz, Ludwik, E-mail: ludwik@email.arizona.edu
2016-05-07
Algorithms for calculating the leading mass-velocity (MV) and Darwin (D) relativistic corrections are derived for electronic wave functions expanded in terms of n-electron explicitly correlated Gaussian functions with shifted centers and without pre-exponential angular factors. The algorithms are implemented and tested in calculations of MV and D corrections for several points on the ground-state potential energy curves of the H₂ and LiH molecules. The algorithms are general and can be applied in calculations of systems with an arbitrary number of electrons.
Age and gender classification of Merriam's turkeys from foot measurements
Mark A. Rumble; Todd R. Mills; Brian F. Wakeling; Richard W. Hoffman
1996-01-01
Wild turkey sex and age information is needed to define population structure but is difficult to obtain. We classified age and gender of Merriam's turkeys (Meleagris gallopavo merriami) accurately based on measurements of two foot characteristics. Gender of birds was correctly classified 93% of the time from measurements of middle toe pads; correct...
Leukocyte Recognition Using EM-Algorithm
NASA Astrophysics Data System (ADS)
Colunga, Mario Chirinos; Siordia, Oscar Sánchez; Maybank, Stephen J.
This document describes a method for classifying images of blood cells. Three different classes of cells are used: Band Neutrophils, Eosinophils and Lymphocytes. The image pattern is projected down to a lower dimensional sub space using PCA; the probability density function for each class is modeled with a Gaussian mixture using the EM-Algorithm. A new cell image is classified using the maximum a posteriori decision rule.
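The final step described above, the maximum a posteriori (MAP) decision rule over per-class Gaussian mixtures, can be sketched in a few lines. The example assumes a single scalar feature (for instance one PCA coefficient) and mixture parameters that have already been fitted; the EM fitting itself is not shown, and every number below is hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def mixture_pdf(x, components):
    """Density of a Gaussian mixture given (weight, mean, variance) tuples."""
    return sum(w * gaussian_pdf(x, mu, var) for w, mu, var in components)


def map_classify(x, class_models, priors):
    """Return the class maximizing prior * class-conditional mixture density."""
    return max(class_models,
               key=lambda c: priors[c] * mixture_pdf(x, class_models[c]))


# Hypothetical pre-fitted mixtures for two of the cell classes.
class_models = {
    "lymphocyte": [(1.0, 0.0, 1.0)],
    "eosinophil": [(0.5, 4.0, 1.0), (0.5, 6.0, 1.0)],
}
priors = {"lymphocyte": 0.5, "eosinophil": 0.5}
label = map_classify(5.0, class_models, priors)
```

In the multivariate case the same rule applies with the scalar density replaced by a multivariate normal density over the PCA coefficients.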
Cloud classification from satellite data using a fuzzy sets algorithm: A polar example
NASA Technical Reports Server (NTRS)
Key, J. R.; Maslanik, J. A.; Barry, R. G.
1988-01-01
Where spatial boundaries between phenomena are diffuse, classification methods which construct mutually exclusive clusters seem inappropriate. The Fuzzy c-means (FCM) algorithm assigns each observation to all clusters, with membership values as a function of distance to the cluster center. The FCM algorithm is applied to AVHRR data for the purpose of classifying polar clouds and surfaces. Careful analysis of the fuzzy sets can provide information on which spectral channels are best suited to the classification of particular features, and can help determine likely areas of misclassification. General agreement in the resulting classes and cloud fraction was found between the FCM algorithm, a manual classification, and an unsupervised maximum likelihood classifier.
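The membership values mentioned above follow the standard fuzzy c-means formula, in which a point's membership in a cluster falls off with its distance to that cluster's center relative to its distances to all centers. The sketch below computes memberships for one observation given fixed centers; the fuzzifier `m = 2` and the two-channel example values are assumptions, and the center-update half of the algorithm is omitted.

```python
import math


def fcm_memberships(point, centers, m=2.0):
    """Fuzzy c-means membership of `point` in each cluster.

    u_i = 1 / sum_j (d_i / d_j) ** (2 / (m - 1)), where d_i is the distance
    from the point to center i and m > 1 is the fuzzifier.
    """
    dists = [math.dist(point, c) for c in centers]
    if any(d == 0.0 for d in dists):
        # The point coincides with one or more centers: split membership among them.
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** exponent for dj in dists) for di in dists]


# Hypothetical two-channel observation and two cluster centers.
memberships = fcm_memberships((1.0, 0.0), [(0.0, 0.0), (3.0, 0.0)])
```

Memberships always sum to one, so an observation near a diffuse boundary shows up as a split such as 0.6/0.4 rather than a forced hard label.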
Raposo, Letícia M; Nobre, Flavio F
2017-08-30
Resistance to antiretrovirals (ARVs) is a major problem faced by HIV-infected individuals. Different rule-based algorithms were developed to infer HIV-1 susceptibility to antiretrovirals from genotypic data. However, there is discordance between them, resulting in difficulties for clinical decisions about which treatment to use. Here, we developed ensemble classifiers integrating three interpretation algorithms: Agence Nationale de Recherche sur le SIDA (ANRS), Rega, and the genotypic resistance interpretation system from Stanford HIV Drug Resistance Database (HIVdb). Three approaches were applied to develop a classifier with a single resistance profile: stacked generalization, a simple plurality vote scheme and the selection of the interpretation system with the best performance. The strategies were compared with the Friedman's test and the performance of the classifiers was evaluated using the F-measure, sensitivity and specificity values. We found that the three strategies had similar performances for the selected antiretrovirals. For some cases, the stacking technique with naïve Bayes as the learning algorithm showed a statistically superior F-measure. This study demonstrates that ensemble classifiers can be an alternative tool for clinical decision-making since they provide a single resistance profile from the most commonly used resistance interpretation systems.
Sosnovik, David E; Dai, Guangping; Nahrendorf, Matthias; Rosen, Bruce R; Seethamraju, Ravi
2007-08-01
To evaluate the use of a transmit-receive surface (TRS) coil and a cardiac-tailored intensity-correction algorithm for cardiac MRI in mice at 9.4 Tesla (9.4T). Fast low-angle shot (FLASH) cines, with and without delays alternating with nutations for tailored excitation (DANTE) tagging, were acquired in 13 mice. An intensity-correction algorithm was developed to compensate for the sensitivity profile of the surface coil, and was tailored to account for the unique distribution of noise and flow artifacts in cardiac MR images. Image quality was extremely high and allowed fine structures such as trabeculations, valve cusps, and coronary arteries to be clearly visualized. The tag lines created with the surface coil were also sharp and clearly visible. Application of the intensity-correction algorithm improved signal intensity, tissue contrast, and image quality even further. Importantly, the cardiac-tailored properties of the correction algorithm prevented noise and flow artifacts from being significantly amplified. The feasibility and value of cardiac MRI in mice with a TRS coil has been demonstrated. In addition, a cardiac-tailored intensity-correction algorithm has been developed and shown to improve image quality even further. The use of these techniques could produce significant potential benefits over a broad range of scanners, coil configurations, and field strengths. (c) 2007 Wiley-Liss, Inc.
Enhancement web proxy cache performance using Wrapper Feature Selection methods with NB and J48
NASA Astrophysics Data System (ADS)
Mahmoud Al-Qudah, Dua'a.; Funke Olanrewaju, Rashidah; Wong Azman, Amelia
2017-11-01
The web proxy cache technique reduces response time by storing copies of pages between the client and server sides. If requested pages are cached in the proxy, there is no need to access the server. Because the cache is limited in size and expensive compared to other storage, a cache replacement algorithm is used to decide which page to evict when the cache is full. However, conventional replacement algorithms such as Least Recently Used (LRU), First In First Out (FIFO), Least Frequently Used (LFU), and randomized policies may discard important pages just before they are used. Furthermore, conventional algorithms cannot be well optimized, since evicting a page intelligently requires some decision before replacement. Hence, most researchers propose integrating intelligent classifiers with replacement algorithms to improve replacement performance. This research proposes using automated wrapper feature selection methods to choose the best subset of features that are relevant and influence the classifiers' prediction accuracy. The results show that the wrapper feature selection methods, namely Best First (BFS), Incremental Wrapper Subset Selection (IWSS) embedded NB, and particle swarm optimization (PSO), reduce the number of features and have a good impact on reducing computation time. Using PSO enhances NB classifier accuracy by 1.1%, 0.43% and 0.22% over using NB with all features, using BFS and using IWSS embedded NB, respectively. PSO raises J48 accuracy by 0.03%, 1.91 and 0.04% over using the J48 classifier with all features, using IWSS embedded NB and using BFS, respectively. IWSS embedded NB speeds up the NB and J48 classifiers much more than BFS and PSO do: it reduces the computation time of NB by 0.1383 and that of J48 by 2.998.
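For reference, the conventional LRU policy that this abstract treats as a baseline can be sketched as below. The class name `LRUCache` and the URL keys are illustrative; this shows only the baseline eviction rule, not the classifier-assisted replacement the paper proposes.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity page cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # oldest entry first, newest last

    def get(self, url):
        """Return the cached page, or None on a miss."""
        if url not in self.pages:
            return None
        self.pages.move_to_end(url)  # mark as most recently used
        return self.pages[url]

    def put(self, url, content):
        """Insert or refresh a page, evicting the oldest one if over capacity."""
        if url in self.pages:
            self.pages.move_to_end(url)
        self.pages[url] = content
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # drop the least recently used page


cache = LRUCache(capacity=2)
cache.put("/a", "page A")
cache.put("/b", "page B")
cache.get("/a")            # "/a" is now the most recently used
cache.put("/c", "page C")  # evicts "/b"
```

The weakness the abstract points to is visible here: `/b` is evicted purely because of recency, with no estimate of whether it is about to be requested again.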
Yue, Dan; Xu, Shuyan; Nie, Haitao; Wang, Zongyang
2016-01-01
The misalignment between recorded in-focus and out-of-focus images using the Phase Diversity (PD) algorithm leads to a dramatic decline in wavefront detection accuracy and image recovery quality for segmented active optics systems. This paper demonstrates the theoretical relationship between the image misalignment and tip-tilt terms in Zernike polynomials of the wavefront phase for the first time, and an efficient two-step alignment correction algorithm is proposed to eliminate these misalignment effects. This algorithm processes a spatial 2-D cross-correlation of the misaligned images, revising the offset to 1 or 2 pixels and narrowing the search range for alignment. Then, it eliminates the need for subpixel fine alignment to achieve adaptive correction by adding additional tip-tilt terms to the Optical Transfer Function (OTF) of the out-of-focus channel. The experimental results demonstrate the feasibility and validity of the proposed correction algorithm to improve the measurement accuracy during the co-phasing of segmented mirrors. With this alignment correction, the reconstructed wavefront is more accurate, and the recovered image is of higher quality. PMID:26934045
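The coarse first step, estimating an integer-pixel offset from a spatial 2-D cross-correlation, can be sketched with a brute-force search over small shifts. This is an unnormalized correlation on nested lists, written for clarity rather than speed; the function name `best_shift`, the search radius, and the test images are assumptions, and the second step (tip-tilt terms in the OTF) is not shown.

```python
def best_shift(ref, img, max_shift=2):
    """Find the (dy, dx) shift of `img` relative to `ref` that maximizes
    the raw cross-correlation sum over the overlapping region."""
    rows, cols = len(ref), len(ref[0])
    best, best_score = (0, 0), float("-inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            score = 0.0
            for y in range(rows):
                for x in range(cols):
                    yy, xx = y + dy, x + dx
                    # Only pixels that overlap after shifting contribute.
                    if 0 <= yy < rows and 0 <= xx < cols:
                        score += ref[y][x] * img[yy][xx]
            if score > best_score:
                best, best_score = (dy, dx), score
    return best


def blank(n):
    return [[0.0] * n for _ in range(n)]


# Hypothetical images: one bright pixel, displaced by 1 row and 2 columns.
ref = blank(6)
ref[2][2] = 1.0
img = blank(6)
img[3][4] = 1.0
shift = best_shift(ref, img)
```

A production implementation would normally compute the correlation with FFTs and normalize it, but the search above is enough to show how the offset narrows the range for the subsequent fine correction.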
Multiobjective GAs, quantitative indices, and pattern classification.
Bandyopadhyay, Sanghamitra; Pal, Sankar K; Aruna, B
2004-10-01
The concept of multiobjective optimization (MOO) has been integrated with variable length chromosomes for the development of a nonparametric genetic classifier which can overcome the problems, like overfitting/overlearning and ignoring smaller classes, as faced by single objective classifiers. The classifier can efficiently approximate any kind of linear and/or nonlinear class boundaries of a data set using an appropriate number of hyperplanes. While designing the classifier the aim is to simultaneously minimize the number of misclassified training points and the number of hyperplanes, and to maximize the product of class wise recognition scores. The concepts of validation set (in addition to training and test sets) and validation functional are introduced in the multiobjective classifier for selecting a solution from a set of nondominated solutions provided by the MOO algorithm. This genetic classifier incorporates elitism and some domain specific constraints in the search process, and is called the CEMOGA-Classifier (constrained elitist multiobjective genetic algorithm based classifier). Two new quantitative indices, namely, the purity and minimal spacing, are developed for evaluating the performance of different MOO techniques. These are used, along with classification accuracy, required number of hyperplanes and the computation time, to compare the CEMOGA-Classifier with other related ones.
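The notion of a nondominated solution set, from which the validation functional then picks one classifier, can be sketched as follows. The three objectives mirror those listed in the abstract (misclassified points and hyperplanes are minimized, the product of class-wise recognition scores is maximized); the candidate values are made up for illustration.

```python
def dominates(a, b):
    """True if solution a is at least as good as b on every objective and
    strictly better on at least one. Each solution is a tuple
    (misclassified, hyperplanes, recognition_product): the first two are
    minimized, the third is maximized."""
    no_worse = a[0] <= b[0] and a[1] <= b[1] and a[2] >= b[2]
    better = a[0] < b[0] or a[1] < b[1] or a[2] > b[2]
    return no_worse and better


def nondominated(solutions):
    """Keep the solutions that no other solution dominates."""
    return [s for s in solutions
            if not any(dominates(t, s) for t in solutions)]


candidates = [
    (5, 3, 0.80),
    (4, 4, 0.85),
    (6, 3, 0.70),  # dominated by (5, 3, 0.80)
    (4, 5, 0.85),  # dominated by (4, 4, 0.85)
]
front = nondominated(candidates)
```

The two surviving candidates trade fewer errors against fewer hyperplanes, which is exactly the situation in which a separate validation set is needed to choose between them.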
Mete, Mutlu; Sakoglu, Unal; Spence, Jeffrey S; Devous, Michael D; Harris, Thomas S; Adinoff, Bryon
2016-10-06
Neuroimaging studies have yielded significant advances in the understanding of neural processes relevant to the development and persistence of addiction. However, these advances have not been explored extensively for diagnostic accuracy in human subjects. The aim of this study was to develop a statistical approach, using a machine learning framework, to correctly classify brain images of cocaine-dependent participants and healthy controls. In this study, a framework suitable for educing potential brain regions that differed between the two groups was developed and implemented. Single Photon Emission Computerized Tomography (SPECT) images obtained during rest or a saline infusion in three cohorts of 2-4 week abstinent cocaine-dependent participants (n = 93) and healthy controls (n = 69) were used to develop a classification model. An information theoretic-based feature selection algorithm was first applied to reduce the number of voxels. A density-based clustering algorithm was then used to form spatially connected voxel clouds in three-dimensional space. A statistical classifier, Support Vector Machine (SVM), was then used for participant classification. Statistically insignificant voxels of spatially connected brain regions were removed iteratively and classification accuracy was reported through the iterations. The voxel-based analysis identified 1,500 spatially connected voxels in 30 distinct clusters after a grid search in SVM parameters. Participants were successfully classified with 0.88 and 0.89 F-measure accuracies in 10-fold cross validation (10xCV) and leave-one-out (LOO) approaches, respectively. Sensitivity and specificity were 0.90 and 0.89 for LOO; 0.83 and 0.83 for 10xCV. Many of the 30 selected clusters are highly relevant to the addictive process, including regions relevant to cognitive control, default mode network related self-referential thought, behavioral inhibition, and contextual memories.
Relative hyperactivity and hypoactivity of regional cerebral blood flow in brain regions in cocaine-dependent participants are presented with corresponding level of significance. The SVM-based approach successfully classified cocaine-dependent and healthy control participants using voxels selected with information theoretic-based and statistical methods from participants' SPECT data. The regions found in this study align with brain regions reported in the literature. These findings support the future use of brain imaging and SVM-based classifier in the diagnosis of substance use disorders and furthering an understanding of their underlying pathology.
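The sensitivity, specificity and F-measure figures quoted in this abstract all derive from a confusion matrix. The sketch below shows the standard definitions; the counts are invented for illustration and are not the study's data.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard metrics from confusion-matrix counts of a binary classifier."""
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # positive predictive value
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
    }


# Hypothetical counts: 10 patients (8 detected) and 10 controls (9 cleared).
metrics = binary_metrics(tp=8, fp=1, tn=9, fn=2)
```

Because the F-measure ignores true negatives, it is usually reported together with specificity, as the abstract does.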
Harmony Search Algorithm for Word Sense Disambiguation
Abed, Saad Adnan; Tiun, Sabrina; Omar, Nazlia
2015-01-01
Word Sense Disambiguation (WSD) is the task of determining which sense of an ambiguous word (a word with multiple meanings) is intended in a particular use of that word, by considering its context. A sentence is considered ambiguous if it contains ambiguous word(s). In practice, any sentence that has been classified as ambiguous usually has multiple interpretations, but only one of them is correct. We propose an unsupervised method that exploits knowledge-based approaches for word sense disambiguation using the Harmony Search Algorithm (HSA) based on a Stanford dependencies generator (HSDG). The role of the dependency generator is to parse sentences to obtain their dependency relations, whereas the goal of the HSA is to maximize the overall semantic similarity of the set of parsed words. The HSA fitness function combines semantic similarity and relatedness measures, namely Jiang and Conrath (jcn) and an adapted Lesk algorithm. Our proposed method was evaluated on benchmark datasets and yielded results comparable to state-of-the-art WSD methods. To evaluate the effectiveness of the dependency generator, we applied the same methodology without the parser, using a window of words instead. The empirical results demonstrate that the proposed method is able to produce effective solutions for most instances of the datasets used. PMID:26422368
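To give a feel for the gloss-overlap idea behind Lesk-style relatedness measures, here is a simplified Lesk sketch in Python. It is not the adapted Lesk variant or the harmony search of the paper: the sense labels and glosses are hypothetical, no stop-word filtering is done, and the sense is chosen greedily for a single word rather than optimized over the whole sentence.

```python
def simplified_lesk(context_words, sense_glosses):
    """Pick the sense whose gloss shares the most words with the context."""
    context = {w.lower() for w in context_words}

    def overlap(gloss):
        return len(context & set(gloss.lower().split()))

    return max(sense_glosses, key=lambda sense: overlap(sense_glosses[sense]))


# Hypothetical two-sense inventory for the word "bank".
glosses = {
    "bank.financial": "an institution that accepts deposits and lends money",
    "bank.river": "sloping land beside a body of water such as a river",
}
context = "she sat on the bank of the river and watched the water".split()
sense = simplified_lesk(context, glosses)
```

A search method such as HSA becomes useful when several ambiguous words must be resolved jointly, because the number of sense combinations grows multiplicatively with sentence length.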
Evaluating data mining algorithms using molecular dynamics trajectories.
Tatsis, Vasileios A; Tjortjis, Christos; Tzirakis, Panagiotis
2013-01-01
Molecular dynamics simulations provide a sample of a molecule's conformational space. Experiments on the μs time scale, resulting in large amounts of data, are nowadays routine. Data mining techniques such as classification provide a way to analyse such data. In this work, we evaluate and compare several classification algorithms using three data sets which resulted from computer simulations of a potential enzyme mimetic biomolecule. We evaluated 65 classifiers available in the well-known data mining toolkit Weka, using 'classification' errors to assess algorithmic performance. Results suggest that: (i) 'meta' classifiers perform better than the other groups, when applied to molecular dynamics data sets; (ii) Random Forest and Rotation Forest are the best classifiers for all three data sets; and (iii) classification via clustering yields the highest classification error. Our findings are consistent with bibliographic evidence, suggesting a 'roadmap' for dealing with such data.
Airline Passenger Profiling Based on Fuzzy Deep Machine Learning.
Zheng, Yu-Jun; Sheng, Wei-Guo; Sun, Xing-Ming; Chen, Sheng-Yong
2017-12-01
Passenger profiling is a vital part of commercial aviation security, but classical methods become very inefficient in handling the rapidly increasing amounts of electronic records. This paper proposes a deep learning approach to passenger profiling. The center of our approach is a Pythagorean fuzzy deep Boltzmann machine (PFDBM), whose parameters are expressed by Pythagorean fuzzy numbers such that each neuron can learn how a feature affects the production of the correct output from both the positive and negative sides. We propose a hybrid algorithm combining a gradient-based method and an evolutionary algorithm for training the PFDBM. Based on the novel learning model, we develop a deep neural network (DNN) for classifying normal passengers and potential attackers, and further develop an integrated DNN for identifying group attackers whose individual features are insufficient to reveal the abnormality. Experiments on data sets from Air China show that our approach provides much higher learning ability and classification accuracy than existing profilers. It is expected that the fuzzy deep learning approach can be adapted for a variety of complex pattern analysis tasks.
New machine-learning algorithms for prediction of Parkinson's disease
NASA Astrophysics Data System (ADS)
Mandal, Indrajit; Sairam, N.
2014-03-01
This article presents enhanced prediction accuracy for the diagnosis of Parkinson's disease (PD), to prevent delay and misdiagnosis of patients, using the proposed robust inference system. New machine-learning methods are proposed, and performance comparisons are based on specificity, sensitivity, accuracy and other measurable parameters. The robust methods used for Parkinson's disease include sparse multinomial logistic regression, rotation forest ensemble with support vector machines and principal components analysis, artificial neural networks, and boosting methods. A new ensemble method comprising the Bayesian network optimised by Tabu search algorithm as classifier and Haar wavelets as projection filter is used for relevant feature selection and ranking. The highest accuracy obtained by linear logistic regression and sparse multinomial logistic regression is 100%, with sensitivity and specificity of 0.983 and 0.996, respectively. All the experiments are conducted at 95% and 99% confidence levels, and the results are established with corrected t-tests. This work shows a high degree of advancement in software reliability and quality of the computer-aided diagnosis system and experimentally shows best results with supportive statistical inference.
HomoTarget: a new algorithm for prediction of microRNA targets in Homo sapiens.
Ahmadi, Hamed; Ahmadi, Ali; Azimzadeh-Jamalkandi, Sadegh; Shoorehdeli, Mahdi Aliyari; Salehzadeh-Yazdi, Ali; Bidkhori, Gholamreza; Masoudi-Nejad, Ali
2013-02-01
MiRNAs play an essential role in the networks of gene regulation by inhibiting the translation of target mRNAs. Several computational approaches have been proposed for the prediction of miRNA target-genes. Reports reveal a large fraction of under-predicted or falsely predicted target genes. Thus, there is an imperative need to develop a computational method by which the target mRNAs of existing miRNAs can be correctly identified. In this study, a combined pattern recognition neural network (PRNN) and principal component analysis (PCA) architecture has been proposed in order to model the complicated relationship between miRNAs and their target mRNAs in humans. The results of several types of intelligent classifiers and our proposed model were compared, showing that our algorithm outperformed them with higher sensitivity and specificity. Using the recent release of the mirBase database to find potential targets of miRNAs, this model incorporated twelve structural, thermodynamic and positional features of miRNA:mRNA binding sites to select target candidates. Copyright © 2012 Elsevier Inc. All rights reserved.
Studying fish near ocean energy devices using underwater video
DOE Office of Scientific and Technical Information (OSTI.GOV)
Matzner, Shari; Hull, Ryan E.; Harker-Klimes, Genevra EL
The effects of energy devices on fish populations are not well-understood, and studying the interactions of fish with tidal and instream turbines is challenging. To address this problem, we have evaluated algorithms to automatically detect fish in underwater video and propose a semi-automated method for ocean and river energy device ecological monitoring. The key contributions of this work are the demonstration of a background subtraction algorithm (ViBE) that detected 87% of human-identified fish events and is suitable for use in a real-time system to reduce data volume, and the demonstration of a statistical model to classify detections as fish or not fish that achieved a correct classification rate of 85% overall and 92% for detections larger than 5 pixels. Specific recommendations for underwater video acquisition to better facilitate automated processing are given. The recommendations will help energy developers put effective monitoring systems in place, and could lead to a standard approach that simplifies the monitoring effort and advances the scientific understanding of the ecological impacts of ocean and river energy devices.
Chen, Hsiu-Chin; Bennett, Sean
2016-08-01
Little evidence shows the use of decision-tree algorithms in identifying predictors and analyzing their associations with pass rates for the NCLEX-RN® in associate degree nursing students. This longitudinal and retrospective cohort study investigated whether a decision-tree algorithm could be used to develop an accurate prediction model for the students' passing or failing the NCLEX-RN. This study used archived data from 453 associate degree nursing students in a selected program. The chi-squared automatic interaction detection analysis of the decision trees module was used to examine the effect of the collected predictors on passing/failing the NCLEX-RN. The actual percentage scores of Assessment Technologies Institute®'s RN Comprehensive Predictor® accurately identified students at risk of failing. The classification model correctly classified 92.7% of the students for passing. This study applied the decision-tree model to analyze a sequence database for developing a prediction model for early remediation in preparation for the NCLEX-RN. [J Nurs Educ. 2016;55(8):454-457.] Copyright 2016, SLACK Incorporated.
Sun, Rongrong; Wang, Yuanyuan
2008-11-01
Predicting the spontaneous termination of the atrial fibrillation (AF) leads to not only better understanding of mechanisms of the arrhythmia but also the improved treatment of the sustained AF. A novel method is proposed to characterize the AF based on structure and the quantification of the recurrence plot (RP) to predict the termination of the AF. The RP of the electrocardiogram (ECG) signal is firstly obtained and eleven features are extracted to characterize its three basic patterns. Then the sequential forward search (SFS) algorithm and Davies-Bouldin criterion are utilized to select the feature subset which can predict the AF termination effectively. Finally, the multilayer perceptron (MLP) neural network is applied to predict the AF termination. An AF database which includes one training set and two testing sets (A and B) of Holter ECG recordings is studied. Experiment results show that 97% of testing set A and 95% of testing set B are correctly classified. It demonstrates that this algorithm has the ability to predict the spontaneous termination of the AF effectively.
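The recurrence plot at the core of this method can be sketched as a thresholded distance matrix. The version below is a simplification under stated assumptions: it compares scalar samples directly (no time-delay embedding), the threshold `eps` is arbitrary, and the recurrence rate is shown only as one common quantification measure, not necessarily one of the eleven features used in the paper.

```python
def recurrence_matrix(signal, eps):
    """R[i][j] = 1 when samples i and j are within eps of each other."""
    n = len(signal)
    return [[1 if abs(signal[i] - signal[j]) <= eps else 0 for j in range(n)]
            for i in range(n)]


def recurrence_rate(matrix):
    """Fraction of recurrent points in the plot (main diagonal included)."""
    n = len(matrix)
    return sum(map(sum, matrix)) / (n * n)


signal = [0.0, 1.0, 0.0, 1.0]  # toy periodic signal standing in for an ECG
rp = recurrence_matrix(signal, eps=0.1)
rate = recurrence_rate(rp)
```

For a strictly periodic signal the plot shows regular diagonal lines, whereas irregular activity breaks them up; features of this kind are what a classifier such as the MLP would then consume.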
NASA Astrophysics Data System (ADS)
Klose, C. D.; Kim, H. K.; Netz, U.; Blaschke, S.; Zwaka, P. A.; Mueller, G. A.; Beuthan, J.; Hielscher, A. H.
2009-02-01
Novel methods that can help in the diagnosis and monitoring of joint disease are essential for efficient use of novel arthritis therapies that are currently emerging. Building on previous studies that involved continuous wave imaging systems, we present here the first clinical data obtained with a new frequency-domain imaging system. Three-dimensional tomographic data sets of absorption and scattering coefficients were generated for 107 fingers. The data were analyzed using ANOVA, MANOVA, discriminant analysis (DA), and a machine-learning algorithm that is based on self-organizing mapping (SOM) for clustering data in 2-dimensional parameter spaces. Overall we found that the SOM algorithm outperforms the more traditional analysis methods in terms of correctly classifying finger joints. Using SOM, healthy and affected joints can now be separated with a sensitivity of 0.97 and specificity of 0.91. Furthermore, preliminary results suggest that if a combination of multiple image properties is used, statistically significant differences can be found between RA-affected finger joints that show different clinical features (e.g. effusion, synovitis or erosion).
Kim, Il-Hwan; Bong, Jae-Hwan; Park, Jooyoung; Park, Shinsuk
2017-01-01
Driver assistance systems have become a major safety feature of modern passenger vehicles. The advanced driver assistance system (ADAS) is one of the active safety systems to improve the vehicle control performance and, thus, the safety of the driver and the passengers. To use the ADAS for lane change control, rapid and correct detection of the driver’s intention is essential. This study proposes a novel preprocessing algorithm for the ADAS to improve the accuracy in classifying the driver’s intention for lane change by augmenting basic measurements from conventional on-board sensors. The information on the vehicle states and the road surface condition is augmented by using artificial neural network (ANN) models, and the augmented information is fed to a support vector machine (SVM) to detect the driver’s intention with high accuracy. The feasibility of the developed algorithm was tested through driving simulator experiments. The results show that the classification accuracy for the driver’s intention can be improved by providing an SVM model with sufficient driving information augmented by using ANN models of vehicle dynamics. PMID:28604582
Insausti, Matías; Gomes, Adriano A; Cruz, Fernanda V; Pistonesi, Marcelo F; Araujo, Mario C U; Galvão, Roberto K H; Pereira, Claudete F; Band, Beatriz S F
2012-08-15
This paper investigates the use of UV-vis, near infrared (NIR) and synchronous fluorescence (SF) spectrometries coupled with multivariate classification methods to discriminate biodiesel samples with respect to the base oil employed in their production. More specifically, the present work extends previous studies by investigating the discrimination of corn-based biodiesel from two other biodiesel types (sunflower and soybean). Two classification methods are compared, namely full-spectrum SIMCA (soft independent modelling of class analogies) and SPA-LDA (linear discriminant analysis with variables selected by the successive projections algorithm). Regardless of the spectrometric technique employed, full-spectrum SIMCA did not provide an appropriate discrimination of the three biodiesel types. In contrast, all samples were correctly classified on the basis of a reduced number of wavelengths selected by SPA-LDA. It can be concluded that UV-vis, NIR and SF spectrometries can be successfully employed to discriminate corn-based biodiesel from the two other biodiesel types, but wavelength selection by SPA-LDA is key to the proper separation of the classes. Copyright © 2012 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Bal, A.; Alam, M. S.; Aslan, M. S.
2006-05-01
Often sensor ego-motion or fast target movement causes the target to temporarily go out of the field-of-view, leading to the reappearing target detection problem in target tracking applications. Since the target goes out of the current frame and reenters at a later frame, the reentering location and the variations in rotation, scale, and other 3D orientations of the target are not known, thus complicating the detection. A detection algorithm has been developed using the Fukunaga-Koontz Transform (FKT) and the distance classifier correlation filter (DCCF). The detection algorithm uses target and background information, extracted from training samples, to detect possible candidate target images. The detected candidate target images are then introduced into the second algorithm, DCCF, called the clutter rejection module, to determine the target coordinates. Once the target coordinates are detected, the tracking algorithm is initiated. The performance of the proposed FKT-DCCF based target detection algorithm has been tested using real-world forward looking infrared (FLIR) video sequences.
Adaboost multi-view face detection based on YCgCr skin color model
NASA Astrophysics Data System (ADS)
Lan, Qi; Xu, Zhiyong
2016-09-01
The traditional Adaboost face detection algorithm trains face classifiers on Haar-like features and achieves a low detection error rate in face regions. Under complex backgrounds, however, the classifiers easily misdetect background regions whose gray-level distribution resembles that of a face, so the error detection rate of the traditional Adaboost algorithm is high. Skin, one of the most important features of a face, clusters well in the YCgCr color space, so non-face areas can be excluded quickly with a skin color model. Therefore, combining the advantages of the Adaboost algorithm and the skin color detection algorithm, this paper proposes an Adaboost face detection method based on the YCgCr skin color model. Experiments show that, compared with the traditional algorithm, the proposed method significantly improves detection accuracy and reduces detection errors.
Enhancement of Fast Face Detection Algorithm Based on a Cascade of Decision Trees
NASA Astrophysics Data System (ADS)
Khryashchev, V. V.; Lebedev, A. A.; Priorov, A. L.
2017-05-01
A face detection algorithm based on a cascade of ensembles of decision trees (CEDT) is presented. The new approach allows detecting faces in positions other than frontal through the use of multiple classifiers. Each classifier is trained for a specific range of head rotation angles. The results showed a high rate of productivity for CEDT on images of standard size. The algorithm increases the area under the ROC curve by 13% compared to a standard Viola-Jones face detection algorithm. The final implementation of the algorithm consists of 5 different cascades for frontal/non-frontal faces. The simulation results also show that the CEDT algorithm has low computational complexity in comparison with the standard Viola-Jones approach. This could prove important in the embedded system and mobile device industries because it can reduce the cost of hardware and make battery life longer.
A UMLS-based spell checker for natural language processing in vaccine safety.
Tolentino, Herman D; Matters, Michael D; Walop, Wikke; Law, Barbara; Tong, Wesley; Liu, Fang; Fontelo, Paul; Kohl, Katrin; Payne, Daniel C
2007-02-12
The Institute of Medicine has identified patient safety as a key goal for health care in the United States. Detecting vaccine adverse events is an important public health activity that contributes to patient safety. Reports about adverse events following immunization (AEFI) from surveillance systems contain free-text components that can be analyzed using natural language processing. To extract Unified Medical Language System (UMLS) concepts from free text and classify AEFI reports based on concepts they contain, we first needed to clean the text by expanding abbreviations and shortcuts and correcting spelling errors. Our objective in this paper was to create a UMLS-based spelling error correction tool as a first step in the natural language processing (NLP) pipeline for AEFI reports. We developed spell checking algorithms using open source tools. We used de-identified AEFI surveillance reports to create free-text data sets for analysis. After expansion of abbreviated clinical terms and shortcuts, we performed spelling correction in four steps: (1) error detection, (2) word list generation, (3) word list disambiguation and (4) error correction. We then measured the performance of the resulting spell checker by comparing it to manual correction. We used 12,056 words to train the spell checker and tested its performance on 8,131 words. During testing, sensitivity, specificity, and positive predictive value (PPV) for the spell checker were 74% (95% CI: 74-75), 100% (95% CI: 100-100), and 47% (95% CI: 46%-48%), respectively. We created a prototype spell checker that can be used to process AEFI reports. We used the UMLS Specialist Lexicon as the primary source of dictionary terms and the WordNet lexicon as a secondary source. We used the UMLS as a domain-specific source of dictionary terms to compare potentially misspelled words in the corpus. The prototype sensitivity was comparable to currently available tools, but the specificity was much superior. 
The slow processing speed may be improved by trimming it down to the most useful component algorithms. Other investigators may find the methods we developed useful for cleaning text using lexicons specific to their area of interest. PMID:17295907
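The four correction steps listed in this abstract (error detection, word list generation, word list disambiguation, error correction) can be sketched with a dictionary lookup and an edit-distance search. This is an illustrative stand-in, not the authors' tool: the tiny `lexicon` and its frequency counts are hypothetical substitutes for the UMLS Specialist Lexicon and WordNet, and ranking candidates by distance and then frequency is an assumed disambiguation rule.

```python
def edit_distance(a, b):
    """Levenshtein distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def correct(word, lexicon, max_distance=2):
    """Return the best lexicon entry for word, or word itself if none is close."""
    # (1) Error detection: a word found in the lexicon is taken as correct.
    if word in lexicon:
        return word
    # (2) Word list generation: all entries within max_distance edits.
    scored = [(edit_distance(word, entry), entry) for entry in lexicon]
    candidates = [(d, e) for d, e in scored if d <= max_distance]
    if not candidates:
        return word
    # (3) Disambiguation: smallest distance, ties broken by higher frequency.
    best = min(candidates, key=lambda c: (c[0], -lexicon[c[1]], c[1]))
    # (4) Error correction: substitute the chosen entry.
    return best[1]


# Hypothetical lexicon mapping terms to made-up corpus frequencies.
lexicon = {"fever": 120, "rash": 80, "swelling": 60, "vaccine": 200}
fixed = [correct(w, lexicon) for w in ["fevr", "vacine", "rash", "xyzzy"]]
```

Scanning the whole lexicon for every unknown word is slow for a large vocabulary, which is consistent with the processing-speed concern raised in the abstract; indexing candidates by length or n-grams is a common remedy.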
Algorithm Updates for the Fourth SeaWiFS Data Reprocessing
NASA Technical Reports Server (NTRS)
Hooker, Stanford B. (Editor); Firestone, Elaine R. (Editor); Patt, Frederick S.; Barnes, Robert A.; Eplee, Robert E., Jr.; Franz, Bryan A.; Robinson, Wayne D.; Feldman, Gene Carl; Bailey, Sean W.
2003-01-01
The efforts to improve the data quality for the Sea-viewing Wide Field-of-view Sensor (SeaWiFS) data products have continued, following the third reprocessing of the global data set in May 2000. Analyses have been ongoing to address all aspects of the processing algorithms, particularly the calibration methodologies, atmospheric correction, and data flagging and masking. All proposed changes were subjected to rigorous testing, evaluation and validation. The results of these activities culminated in the fourth reprocessing, which was completed in July 2002. The algorithm changes, which were implemented for this reprocessing, are described in the chapters of this volume. Chapter 1 presents an overview of the activities leading up to the fourth reprocessing, and summarizes the effects of the changes. Chapter 2 describes the modifications to the on-orbit calibration, specifically the focal plane temperature correction and the temporal dependence. Chapter 3 describes the changes to the vicarious calibration, including the stray light correction to the Marine Optical Buoy (MOBY) data and improved data screening procedures. Chapter 4 describes improvements to the near-infrared (NIR) band correction algorithm. Chapter 5 describes changes to the atmospheric correction and the oceanic property retrieval algorithms, including out-of-band corrections, NIR noise reduction, and handling of unusual conditions. Chapter 6 describes various changes to the flags and masks, to increase the number of valid retrievals, improve the detection of the flag conditions, and add new flags. Chapter 7 describes modifications to the level-1a and level-3 algorithms, to improve the navigation accuracy, correct certain types of spacecraft time anomalies, and correct a binning logic error. Chapter 8 describes the algorithm used to generate the SeaWiFS photosynthetically available radiation (PAR) product.
Chapter 9 describes a coupled ocean-atmosphere model, which is used in one of the changes described in Chapter 4. Finally, Chapter 10 describes a comparison of results from the third and fourth reprocessings along the U.S. Northeast coast.
A comparative study of nonparametric methods for pattern recognition
NASA Technical Reports Server (NTRS)
Hahn, S. F.; Nelson, G. D.
1972-01-01
The applied research discussed in this report determines and compares the correct classification percentage of the nonparametric sign test, Wilcoxon's signed rank test, and K-class classifier with the performance of the Bayes classifier. The performance is determined for data which have Gaussian, Laplacian and Rayleigh probability density functions. The correct classification percentage is shown graphically for differences in modes and/or means of the probability density functions for four, eight and sixteen samples. The K-class classifier performed very well with respect to the other classifiers used. Since the K-class classifier is a nonparametric technique, it usually performed better than the Bayes classifier which assumes the data to be Gaussian even though it may not be. The K-class classifier has the advantage over the Bayes in that it works well with non-Gaussian data without having to determine the probability density function of the data. It should be noted that the data in this experiment was always unimodal.
Phase 2 development of Great Lakes algorithms for Nimbus-7 coastal zone color scanner
NASA Technical Reports Server (NTRS)
Tanis, Fred J.
1984-01-01
A series of experiments have been conducted in the Great Lakes designed to evaluate the application of the NIMBUS-7 Coastal Zone Color Scanner (CZCS). Atmospheric and water optical models were used to relate surface and subsurface measurements to satellite measured radiances. Absorption and scattering measurements were reduced to obtain a preliminary optical model for the Great Lakes. Algorithms were developed for geometric correction, correction for Rayleigh and aerosol path radiance, and prediction of chlorophyll-a pigment and suspended mineral concentrations. The atmospheric algorithm developed compared favorably with existing algorithms and was the only algorithm found to adequately predict the radiance variations in the 670 nm band. The atmospheric correction algorithm developed was designed to extract needed algorithm parameters from the CZCS radiance values. The Gordon/NOAA ocean algorithms could not be demonstrated to work for Great Lakes waters. Predicted values of chlorophyll-a concentration compared favorably with expected and measured data for several areas of the Great Lakes.
Locating Encrypted Data Hidden Among Non-Encrypted Data Using Statistical Tools
2007-03-01
length of a compressed sequence). If a bit sequence can be significantly compressed, then it is not random. Lempel-Ziv Compression Test This test...communication, targeting, and a host of other tasks. This software will most assuredly contain classified data or algorithms requiring protection in...containing the classified data and algorithms. As the program is executed the soldier would have access to the common unclassified tasks, however, to
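The compressibility idea in this excerpt can be sketched in a few lines. The example below is only an illustration, not the report's test: it uses Python's `zlib` (DEFLATE, a Lempel-Ziv family compressor) as a stand-in, and the function names, the 0.95 threshold, and the use of `os.urandom` to imitate encrypted bytes are all assumptions.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (smaller means more compressible)."""
    if not data:
        raise ValueError("data must be non-empty")
    return len(zlib.compress(data, 9)) / len(data)


def looks_random(data: bytes, threshold: float = 0.95) -> bool:
    """Flag a block as random-looking when compression barely shrinks it.

    The threshold is a hypothetical choice for illustration only.
    """
    return compression_ratio(data) >= threshold


plain = b"the quick brown fox jumps over the lazy dog " * 200
noise = os.urandom(len(plain))  # stand-in for encrypted data

print("plain text flagged:", looks_random(plain))
print("random bytes flagged:", looks_random(noise))
```

Repetitive plain text compresses to a small fraction of its size, while random bytes do not shrink at all, so a ratio near 1 marks a block as a candidate for encrypted content.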
Gustaf: Detecting and correctly classifying SVs in the NGS twilight zone.
Trappe, Kathrin; Emde, Anne-Katrin; Ehrlich, Hans-Christian; Reinert, Knut
2014-12-15
The landscape of structural variation (SV) including complex duplication and translocation patterns is far from resolved. SV detection tools usually exhibit low agreement, are often geared toward certain types or size ranges of variation and struggle to correctly classify the type and exact size of SVs. We present Gustaf (Generic mUlti-SpliT Alignment Finder), a sound generic multi-split SV detection tool that detects and classifies deletions, inversions, dispersed duplications and translocations of ≥ 30 bp. Our approach is based on a generic multi-split alignment strategy that can identify SV breakpoints with base pair resolution. We show that Gustaf correctly identifies SVs, especially in the range from 30 to 100 bp, which we call the next-generation sequencing (NGS) twilight zone of SVs, as well as larger SVs >500 bp. Gustaf performs better than similar tools in our benchmark and is furthermore able to correctly identify size and location of dispersed duplications and translocations, which otherwise might be wrongly classified, for example, as large deletions. © The Author 2014. Published by Oxford University Press.
Classifier-Guided Sampling for Complex Energy System Optimization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Backlund, Peter B.; Eddy, John P.
2015-09-01
This report documents the results of a Laboratory Directed Research and Development (LDRD) effort entitled "Classifier-Guided Sampling for Complex Energy System Optimization" that was conducted during FY 2014 and FY 2015. The goal of this project was to develop, implement, and test major improvements to the classifier-guided sampling (CGS) algorithm. CGS is a type of evolutionary algorithm for performing search and optimization over a set of discrete design variables in the face of one or more objective functions. Existing evolutionary algorithms, such as genetic algorithms, may require a large number of objective function evaluations to identify optimal or near-optimal solutions. Reducing the number of evaluations can result in significant time savings, especially if the objective function is computationally expensive. CGS reduces the evaluation count by using a Bayesian network classifier to filter out non-promising candidate designs, prior to evaluation, based on their posterior probabilities. In this project, both the single-objective and multi-objective versions of the CGS are developed and tested on a set of benchmark problems. As a domain-specific case study, CGS is used to design a microgrid for use in islanded mode during an extended bulk power grid outage.
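The filtering step described above (discard candidate designs whose posterior probability of being promising is low, before paying for an objective evaluation) might look roughly like the sketch below. It is not the report's implementation: a naive Bayes classifier stands in for the Bayesian network classifier, and the class name, the toy binary designs, the labels, and the 0.5 cutoff are all assumptions.

```python
from collections import defaultdict


class NaiveBayes:
    """Naive Bayes over discrete design variables (stand-in for a Bayesian network)."""

    def __init__(self, n_values):
        self.n_values = n_values              # number of levels per design variable
        self.class_counts = defaultdict(int)  # class -> number of training designs
        self.value_counts = defaultdict(int)  # (class, variable index, value) -> count

    def fit(self, designs, labels):
        for design, label in zip(designs, labels):
            self.class_counts[label] += 1
            for i, value in enumerate(design):
                self.value_counts[(label, i, value)] += 1

    def posterior(self, design, target=True):
        """Posterior probability that `design` belongs to class `target`."""
        total = sum(self.class_counts.values())
        scores = {}
        for label, n_c in list(self.class_counts.items()):
            p = n_c / total
            for i, value in enumerate(design):
                # Laplace smoothing keeps unseen values from zeroing the product.
                p *= (self.value_counts[(label, i, value)] + 1) / (n_c + self.n_values)
            scores[label] = p
        return scores.get(target, 0.0) / sum(scores.values())


# Hypothetical already-evaluated designs; True marks a "promising" design.
evaluated = [(0, 0, 0, 1), (0, 0, 1, 0), (0, 1, 0, 0),
             (1, 1, 1, 0), (1, 1, 0, 1), (1, 0, 1, 1)]
promising = [True, True, True, False, False, False]

model = NaiveBayes(n_values=2)
model.fit(evaluated, promising)

candidates = [(0, 0, 0, 0), (1, 1, 1, 1)]
# Only candidates that pass the filter would be sent to the expensive objective.
to_evaluate = [c for c in candidates if model.posterior(c) >= 0.5]
print(to_evaluate)
```

The saving comes from the last line: rejected candidates never reach the objective function, so the classifier only has to be cheaper than one evaluation to pay off.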
NASA Astrophysics Data System (ADS)
Ross, Z. E.; Meier, M. A.; Hauksson, E.
2017-12-01
Accurate first-motion polarities are essential for determining earthquake focal mechanisms, but are difficult to measure automatically because of picking errors and signal to noise issues. Here we develop an algorithm for reliable automated classification of first-motion polarities using machine learning algorithms. A classifier is designed to identify whether the first-motion polarity is up, down, or undefined by examining the waveform data directly. We first improve the accuracy of automatic P-wave onset picks by maximizing a weighted signal/noise ratio for a suite of candidate picks around the automatic pick. We then use the waveform amplitudes before and after the optimized pick as features for the classification. We demonstrate the method's potential by training and testing the classifier on tens of thousands of hand-made first-motion picks by the Southern California Seismic Network. The classifier assigned the same polarity as chosen by an analyst in more than 94% of the records. We show that the method is generalizable to a variety of learning algorithms, including neural networks and random forest classifiers. The method is suitable for automated processing of large seismic waveform datasets, and can potentially be used in real-time applications, e.g. for improving the source characterizations of earthquake early warning algorithms.
Efficient audio signal processing for embedded systems
NASA Astrophysics Data System (ADS)
Chiu, Leung Kin
As mobile platforms continue to pack on more computational power, electronics manufacturers start to differentiate their products by enhancing the audio features. However, consumers also demand smaller devices that can operate for a longer time, hence imposing design constraints. In this research, we investigate two design strategies that would allow us to efficiently process audio signals on embedded systems such as mobile phones and portable electronics. In the first strategy, we exploit properties of the human auditory system to process audio signals. We designed a sound enhancement algorithm to make piezoelectric loudspeakers sound "richer" and "fuller." Piezoelectric speakers have a small form factor but exhibit poor response in the low-frequency region. In the algorithm, we combine psychoacoustic bass extension and dynamic range compression to improve the perceived bass coming out from the tiny speakers. We also developed an audio energy reduction algorithm for loudspeaker power management. The perceptually transparent algorithm extends the battery life of mobile devices and prevents thermal damage in speakers. This method is similar to audio compression algorithms, which encode audio signals in such a way that the compression artifacts are not easily perceivable. Instead of reducing the storage space, however, we suppress the audio contents that are below the hearing threshold, therefore reducing the signal energy. In the second strategy, we use low-power analog circuits to process the signal before digitizing it. We designed an analog front-end for sound detection and implemented it on a field programmable analog array (FPAA). The system is an example of an analog-to-information converter. The sound classifier front-end can be used in a wide range of applications because programmable floating-gate transistors are employed to store classifier weights. Moreover, we incorporated a feature selection algorithm to simplify the analog front-end. The machine learning algorithm AdaBoost is used to select the most relevant features for a particular sound detection application. In this classifier architecture, we combine simple "base" analog classifiers to form a strong one. We also designed the circuits to implement the AdaBoost-based analog classifier.
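The way AdaBoost doubles as a feature selector can be shown with a small software sketch. This is a generic textbook AdaBoost over one-feature threshold "stumps", not the thesis's analog circuit; the toy data and all function names are assumptions. Each boosting round picks the single feature whose stump has the lowest weighted error, so the set of features used by the ensemble is the selected subset.

```python
import math


def stump_predict(row, feature, threshold, polarity):
    """One-feature threshold classifier returning +1 or -1."""
    return polarity if row[feature] >= threshold else -polarity


def train_stump(X, y, weights):
    """Return (weighted error, feature, threshold, polarity) of the best stump."""
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for polarity in (1, -1):
                err = sum(w for w, row, t in zip(weights, X, y)
                          if stump_predict(row, feature, threshold, polarity) != t)
                if best is None or err < best[0]:
                    best = (err, feature, threshold, polarity)
    return best


def adaboost(X, y, rounds=3):
    """Train an ensemble of (alpha, feature, threshold, polarity) tuples."""
    weights = [1.0 / len(X)] * len(X)
    ensemble = []
    for _ in range(rounds):
        err, feature, threshold, polarity = train_stump(X, y, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)      # avoid division by zero
        alpha = 0.5 * math.log((1 - err) / err)    # vote strength of this stump
        ensemble.append((alpha, feature, threshold, polarity))
        # Increase the weight of misclassified samples, decrease the rest.
        weights = [w * math.exp(-alpha * t * stump_predict(row, feature, threshold, polarity))
                   for w, row, t in zip(weights, X, y)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, row):
    score = sum(a * stump_predict(row, f, thr, pol) for a, f, thr, pol in ensemble)
    return 1 if score >= 0 else -1


def selected_features(ensemble):
    return sorted({feature for _, feature, _, _ in ensemble})


# Hypothetical data: feature 0 separates the classes, feature 1 is uninformative.
X = [[0.1, 5], [0.2, 1], [0.3, 4], [0.7, 2], [0.8, 5], [0.9, 1]]
y = [-1, -1, -1, 1, 1, 1]
model = adaboost(X, y)
print("selected features:", selected_features(model))
```

Features that never appear in the ensemble need no front-end hardware at all, which is presumably why this kind of selection simplifies an analog implementation.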
The SEASAT altimeter wet tropospheric range correction revisited
NASA Technical Reports Server (NTRS)
Tapley, D. B.; Lundberg, J. B.; Born, G. H.
1984-01-01
An expanded set of radiosonde observations was used to calculate the wet tropospheric range correction for the brightness temperature measurements of the SEASAT scanning multichannel microwave radiometer (SMMR). The accuracy of the conventional algorithm for wet tropospheric range correction was evaluated. On the basis of the expanded observational data set, the algorithm was found to have a bias of about 1.0 cm and a standard deviation of 2.8 cm. In order to improve the algorithm, the exact linear, quadratic and logarithmic relationships between brightness temperatures and range corrections were determined. Various combinations of measurement parameters were used to reduce the standard deviation between SEASAT SMMR and radiosonde observations to about 2.1 cm. The performance of various range correction formulas is compared in a table.
Texture segmentation by genetic programming.
Song, Andy; Ciesielski, Vic
2008-01-01
This paper describes a texture segmentation method using genetic programming (GP), which is one of the most powerful evolutionary computation algorithms. By choosing an appropriate representation, texture classifiers can be evolved without computing texture features. Due to the absence of time-consuming feature extraction, the evolved classifiers enable the development of the proposed texture segmentation algorithm. This GP-based method can achieve a segmentation speed that is significantly higher than that of conventional methods. This method does not require a human expert to manually construct models for texture feature extraction. In an analysis of the evolved classifiers, it can be seen that these GP classifiers are not arbitrary. Certain textural regularities are captured by these classifiers to discriminate different textures. GP has been shown in this study to be a feasible and powerful approach for texture classification and segmentation, which are generally considered complex vision tasks.
Mohapatra, Bidyut R; Broersma, Klaas; Mazumder, Asit
2008-04-01
Determination of the non-point sources of fecal pollution is essential for the assessment of potential public health risk and development of appropriate management practices for prevention of further contamination. Repetitive extragenic palindromic-PCR coupled with (GTG)5 primer [(GTG)5-PCR] was performed on 573 Escherichia coli isolates obtained from the feces of poultry (chicken, duck and turkey) and free-living (Canada goose, hawk, magpie, seagull and songbird) birds to evaluate the efficacy of (GTG)5-PCR genomic fingerprinting in the prediction of the correct source of fecal pollution. A discriminant analysis with the jack-knife algorithm of (GTG)5-PCR DNA fingerprints revealed that 95%, 94.1%, 93.2%, 84.6%, 79.7%, 76.7%, 75.3% and 70.7% of magpie, hawk, turkey, seagull, Canada goose, chicken, duck and songbird fecal E. coli isolates classified into the correct host source, respectively. The results of this study indicate that (GTG)5-PCR can be considered a complementary molecular tool for the rapid determination of E. coli isolate identity and for tracking the non-point sources of fecal pollution.
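A jack-knife (leave-one-out) estimate of the rate of correct classification, as reported above, can be sketched as follows. The study's discriminant analysis is not reproduced here: a nearest-centroid rule stands in for it, and the two-dimensional "fingerprints", host labels, and function name are hypothetical.

```python
def jackknife_correct_rate(samples, labels):
    """Leave each sample out in turn, classify it from the rest, return the hit rate."""
    correct = 0
    for i, (x, true_label) in enumerate(zip(samples, labels)):
        sums, counts = {}, {}
        for j, (other, label) in enumerate(zip(samples, labels)):
            if j == i:
                continue  # the held-out sample must not influence its own centroid
            acc = sums.setdefault(label, [0.0] * len(other))
            for k, value in enumerate(other):
                acc[k] += value
            counts[label] = counts.get(label, 0) + 1

        def squared_distance(label):
            return sum((x[k] - sums[label][k] / counts[label]) ** 2
                       for k in range(len(x)))

        predicted = min(sums, key=squared_distance)
        correct += predicted == true_label
    return correct / len(samples)


# Hypothetical fingerprint features for two host sources.
fingerprints = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
hosts = ["goose", "goose", "goose", "duck", "duck", "duck"]
print(jackknife_correct_rate(fingerprints, hosts))
```

Holding each isolate out before classifying it keeps the estimate from being inflated by a model that has already seen the isolate it is asked to label.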
NASA Technical Reports Server (NTRS)
Kitzis, J. L.; Kitzis, S. N.
1979-01-01
The brightness temperature data produced by the SMMR final Antenna Pattern Correction (APC) algorithm are discussed. The evaluation consisted of: (1) a direct comparison of the outputs of the final and interim APC algorithms; and (2) an analysis of a possible relationship between observed cross track gradients in the interim brightness temperatures and the asymmetry in the antenna temperature data. Results indicate a bias between the brightness temperatures produced by the final and interim APC algorithms.
Implementation and performance evaluation of acoustic denoising algorithms for UAV
NASA Astrophysics Data System (ADS)
Chowdhury, Ahmed Sony Kamal
Unmanned Aerial Vehicles (UAVs) have become a popular alternative for wildlife monitoring and border surveillance applications. Eliminating the UAV's background noise and classifying the target audio signal effectively remain major challenges. The main goal of this thesis is to remove the UAV's background noise by means of acoustic denoising techniques. Existing denoising algorithms, such as Adaptive Least Mean Square (LMS), Wavelet Denoising, Time-Frequency Block Thresholding, and Wiener Filter, were implemented and their performance evaluated. The denoising algorithms were evaluated using the average Signal to Noise Ratio (SNR), Segmental SNR (SSNR), Log Likelihood Ratio (LLR), and Log Spectral Distance (LSD) metrics. To evaluate the effectiveness of the denoising algorithms on classification of target audio, we implemented Support Vector Machine (SVM) and Naive Bayes classification algorithms. Simulation results demonstrate that the LMS and Discrete Wavelet Transform (DWT) denoising algorithms performed better than the other algorithms. Finally, we implemented the LMS and DWT algorithms on a DSP board for hardware evaluation. Experimental results showed that the LMS algorithm is more robust than DWT across various noise types when classifying target audio signals.
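For readers unfamiliar with adaptive LMS denoising, a minimal noise-cancellation sketch follows. It is the generic textbook LMS filter, not the thesis's implementation; the synthetic sine "target", the white-noise reference, the two-tap noise path, and the step size are all assumptions chosen for illustration.

```python
import math
import random


def lms_cancel(primary, reference, taps=4, mu=0.01):
    """Adaptive noise cancellation; returns the error signal e[n] = d[n] - w . x[n]."""
    w = [0.0] * taps
    cleaned = []
    for n in range(len(primary)):
        # Most recent `taps` reference samples (zero-padded at the start).
        x = [reference[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        estimate = sum(wi * xi for wi, xi in zip(w, x))  # estimated noise
        e = primary[n] - estimate                        # cleaned sample
        w = [wi + 2 * mu * e * xi for wi, xi in zip(w, x)]
        cleaned.append(e)
    return cleaned


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


random.seed(0)
n_samples = 4000
target = [math.sin(2 * math.pi * 0.01 * n) for n in range(n_samples)]
reference = [random.gauss(0.0, 1.0) for _ in range(n_samples)]
# Hypothetical acoustic path from the noise source to the primary microphone.
noise = [0.8 * reference[n] + 0.3 * (reference[n - 1] if n else 0.0)
         for n in range(n_samples)]
primary = [t + v for t, v in zip(target, noise)]

cleaned = lms_cancel(primary, reference)
half = n_samples // 2  # compare only after the filter has had time to converge
print("MSE before:", mse(primary[half:], target[half:]))
print("MSE after: ", mse(cleaned[half:], target[half:]))
```

The filter needs a reference input that is correlated with the noise but not with the target; on a UAV that might be a microphone near the rotors, though that placement is a guess rather than something the abstract states.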
NASA Technical Reports Server (NTRS)
Freeman, A.; Villasenor, J.; Klein, J. D.
1991-01-01
We describe the calibration and analysis of multi-frequency, multi-polarization radar backscatter signatures over an agriculture test site in the Netherlands. The calibration procedure involved two stages: in the first stage, polarimetric and radiometric calibrations (ignoring noise) were carried out using square-base trihedral corner reflector signatures and some properties of the clutter background. In the second stage, a novel algorithm was used to estimate the noise level in the polarimetric data channels by using the measured signature of an idealized rough surface with Bragg scattering (the ocean in this case). This estimated noise level was then used to correct the measured backscatter signatures from the agriculture fields. We examine the significance of several key parameters extracted from the calibrated and noise-corrected backscatter signatures. The significance is assessed in terms of the ability to uniquely separate among classes from 13 different backscatter types selected from the test site data, including eleven different crops, one forest and one ocean area. Using the parameters with the highest separation for a given class, we use a hierarchical algorithm to classify the entire image. We find that many classes, including ocean, forest, potato, and beet, can be identified with high reliability, while the classes for which no single parameter exhibits sufficient separation have higher rates of misclassification. We expect that modified decision criteria involving simultaneous consideration of several parameters increase performance for these classes.
Incremental learning of concept drift in nonstationary environments.
Elwell, Ryan; Polikar, Robi
2011-10-01
We introduce an ensemble of classifiers-based approach for incremental learning of concept drift, characterized by nonstationary environments (NSEs), where the underlying data distributions change over time. The proposed algorithm, named Learn++.NSE, learns from consecutive batches of data without making any assumptions on the nature or rate of drift; it can learn from such environments that experience constant or variable rate of drift, addition or deletion of concept classes, as well as cyclical drift. The algorithm learns incrementally, as other members of the Learn++ family of algorithms, that is, without requiring access to previously seen data. Learn++.NSE trains one new classifier for each batch of data it receives, and combines these classifiers using a dynamically weighted majority voting. The novelty of the approach is in determining the voting weights, based on each classifier's time-adjusted accuracy on current and past environments. This approach allows the algorithm to recognize, and act accordingly, to the changes in underlying data distributions, as well as to a possible reoccurrence of an earlier distribution. We evaluate the algorithm on several synthetic datasets designed to simulate a variety of nonstationary environments, as well as a real-world weather prediction dataset. Comparisons with several other approaches are also included. Results indicate that Learn++.NSE can track the changing environments very closely, regardless of the type of concept drift. To allow future use, comparison and benchmarking by interested researchers, we also release our data used in this paper. © 2011 IEEE
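The idea of weighting each ensemble member by its time-adjusted accuracy can be illustrated with a deliberately simplified sketch. This is not the published Learn++.NSE weighting rule: the exponential decay, the clamping of errors at 0.5, the log-odds weight, and the example error histories are all assumptions.

```python
import math


def voting_weight(errors, decay=0.5):
    """Weight a classifier from its per-batch error history (oldest first).

    Recent batches count more than old ones; an averaged error of 0.5 or
    worse yields a weight of zero, so that classifier is effectively ignored.
    """
    k = len(errors)
    coeffs = [decay ** (k - 1 - i) for i in range(k)]
    avg = sum(c * e for c, e in zip(coeffs, errors)) / sum(coeffs)
    avg = min(max(avg, 1e-6), 0.5)
    return math.log((1 - avg) / avg)


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight."""
    totals = {}
    for label, weight in zip(predictions, weights):
        totals[label] = totals.get(label, 0.0) + weight
    return max(totals, key=totals.get)


# Hypothetical error histories: the oldest classifier has degraded under drift.
histories = [[0.10, 0.40, 0.45], [0.20, 0.10], [0.15]]
weights = [voting_weight(h) for h in histories]
print(weighted_vote(["rain", "dry", "dry"], weights))
```

Because weights are recomputed from errors on the newest batch, an old classifier regains influence if its error drops again, which is how a recurring distribution could be exploited without retraining.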
NASA Astrophysics Data System (ADS)
Antoine, David; Morel, Andre
1997-02-01
An algorithm is proposed for the atmospheric correction of the ocean color observations by the MERIS instrument. The principle of the algorithm, which accounts for all multiple scattering effects, is presented. The algorithm is then tested, and its accuracy assessed in terms of errors in the retrieved marine reflectances.
On Algorithms for Generating Computationally Simple Piecewise Linear Classifiers
1989-05-01
suffers. - Waveform classification, e.g. speech recognition, seismic analysis (i.e. discrimination between earthquakes and nuclear explosions), target...assuming Gaussian distributions (B-G) d) Bayes classifier with probability densities estimated with the k-N-N method (B-kNN) e) The -arest neighbour...range of classifiers are chosen including a fast, easy computable and often used classifier (B-G), reliable and complex classifiers (B-kNN and NNR
Classifying epileptic EEG signals with delay permutation entropy and Multi-Scale K-means.
Zhu, Guohun; Li, Yan; Wen, Peng Paul; Wang, Shuaifang
2015-01-01
Most epileptic EEG classification algorithms are supervised and require large training datasets, which hinders their use in real-time applications. This chapter proposes an unsupervised Multi-Scale K-means (MSK-means) algorithm to distinguish epileptic EEG signals and identify epileptic zones. The random initialization of the K-means algorithm can lead to wrong clusters. Based on the characteristics of EEGs, the MSK-means algorithm initializes the coarse-scale centroid of a cluster with a suitable scale factor. In this chapter, the MSK-means algorithm is proved theoretically superior to the K-means algorithm in efficiency. In addition, three classifiers, namely K-means, MSK-means and a support vector machine (SVM), are used to identify seizures and localize the epileptogenic zone using delay permutation entropy features. The experimental results demonstrate that identifying seizures with the MSK-means algorithm and delay permutation entropy achieves 4.7% higher accuracy than K-means and 0.7% higher accuracy than the SVM.
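Delay permutation entropy, the feature used above, can be computed as in the sketch below. It follows the standard ordinal-pattern definition with an embedding delay; the function name, the default order, and the synthetic test signals are assumptions, and the chapter's exact parameter choices are not reproduced.

```python
import math
import random
from collections import Counter


def permutation_entropy(signal, order=3, delay=1):
    """Normalized permutation entropy in [0, 1] of a 1-D sequence.

    Each window takes `order` samples spaced `delay` apart and is reduced
    to the ordering (ordinal pattern) of its values.
    """
    n_windows = len(signal) - (order - 1) * delay
    if n_windows <= 0:
        raise ValueError("signal too short for this order and delay")
    patterns = Counter()
    for i in range(n_windows):
        window = [signal[i + k * delay] for k in range(order)]
        patterns[tuple(sorted(range(order), key=window.__getitem__))] += 1
    entropy = sum(-(c / n_windows) * math.log2(c / n_windows)
                  for c in patterns.values())
    return entropy / math.log2(math.factorial(order))


random.seed(1)
ramp = list(range(200))                          # perfectly regular signal
noise = [random.random() for _ in range(2000)]   # irregular signal

print(permutation_entropy(ramp, order=3, delay=2))
print(permutation_entropy(noise, order=3, delay=1))
```

A regular signal produces a single ordinal pattern and an entropy of zero, while an irregular one spreads over all patterns and approaches one, which is what makes the value usable as a clustering feature.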
Convolutional neural networks for transient candidate vetting in large-scale surveys
NASA Astrophysics Data System (ADS)
Gieseke, Fabian; Bloemen, Steven; van den Bogaard, Cas; Heskes, Tom; Kindler, Jonas; Scalzo, Richard A.; Ribeiro, Valério A. R. M.; van Roestel, Jan; Groot, Paul J.; Yuan, Fang; Möller, Anais; Tucker, Brad E.
2017-12-01
Current synoptic sky surveys monitor large areas of the sky to find variable and transient astronomical sources. As the number of detections per night at a single telescope easily exceeds several thousand, current detection pipelines make intensive use of machine learning algorithms to classify the detected objects and to filter out the most interesting candidates. A number of upcoming surveys will produce up to three orders of magnitude more data, which renders high-precision classification systems essential to reduce the manual and, hence, expensive vetting by human experts. We present an approach based on convolutional neural networks to discriminate between true astrophysical sources and artefacts in reference-subtracted optical images. We show that relatively simple networks are already competitive with state-of-the-art systems and that their quality can further be improved via slightly deeper networks and additional pre-processing steps - eventually yielding models outperforming state-of-the-art systems. In particular, our best model correctly classifies about 97.3 per cent of all 'real' and 99.7 per cent of all 'bogus' instances on a test set containing 1942 'bogus' and 227 'real' instances in total. Furthermore, the networks considered in this work can also successfully classify these objects at hand without relying on difference images, which might pave the way for future detection pipelines not containing image subtraction steps at all.
Effective user guidance in online interactive semantic segmentation
NASA Astrophysics Data System (ADS)
Petersen, Jens; Bendszus, Martin; Debus, Jürgen; Heiland, Sabine; Maier-Hein, Klaus H.
2017-03-01
With the recent success of machine learning based solutions for automatic image parsing, the availability of reference image annotations for algorithm training is one of the major bottlenecks in medical image segmentation. We are interested in interactive semantic segmentation methods that can be used in an online fashion to generate expert segmentations. These can be used to train automated segmentation techniques or, from an application perspective, for quick and accurate tumor progression monitoring. Using simulated user interactions in an MRI glioblastoma segmentation task, we show that if the user possesses knowledge of the correct segmentation it is significantly (p <= 0.009) better to present data and current segmentation to the user in such a manner that they can easily identify falsely classified regions compared to guiding the user to regions where the classifier exhibits high uncertainty, resulting in differences of mean Dice scores between +0.070 (Whole tumor) and +0.136 (Tumor Core) after 20 iterations. The annotation process should cover all classes equally, which results in a significant (p <= 0.002) improvement compared to completely random annotations anywhere in falsely classified regions for small tumor regions such as the necrotic tumor core (mean Dice +0.151 after 20 iterations) and non-enhancing abnormalities (mean Dice +0.069 after 20 iterations). These findings provide important insights for the development of efficient interactive segmentation systems and user interfaces.
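The Dice score used to report these results is the standard overlap measure 2|A ∩ B| / (|A| + |B|). A minimal sketch, assuming segmentations are given as collections of voxel indices (the voxel sets below are made up):

```python
def dice(predicted, reference):
    """Dice coefficient between two segmentations given as voxel-index collections."""
    a, b = set(predicted), set(reference)
    if not a and not b:
        return 1.0  # two empty segmentations agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical 1-D voxel indices for a reference and a predicted tumor region.
reference = set(range(10))              # voxels 0..9
predicted = set(range(8)) | {20, 21}    # 8 correct voxels, 2 false positives
print(dice(predicted, reference))
```

A difference of +0.070 to +0.151 in mean Dice, as reported above, is therefore a change on a scale that runs from 0 (no overlap) to 1 (identical segmentations).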
Fang, Hongqing; He, Lei; Si, Hao; Liu, Peng; Xie, Xiaolei
2014-09-01
In this paper, the back-propagation (BP) algorithm is used to train a feed-forward neural network for human activity recognition in smart home environments, and an inter-class distance method for feature selection of observed motion sensor events is discussed and tested. The human activity recognition performance of the neural network trained with the BP algorithm is then evaluated and compared with other probabilistic algorithms: the Naïve Bayes (NB) classifier and the Hidden Markov Model (HMM). The results show that different feature datasets yield different activity recognition accuracy. The selection of unsuitable feature datasets increases the computational complexity and degrades the activity recognition accuracy. Furthermore, the neural network trained with the BP algorithm has relatively better human activity recognition performance than the NB classifier and the HMM. Copyright © 2014 ISA. Published by Elsevier Ltd.
NASA Astrophysics Data System (ADS)
Ahlers, Volker; Weigl, Paul; Schachtzabel, Hartmut
2005-04-01
Due to the increasing demand for high-quality ceramic crowns and bridges, the CAD/CAM-based production of dental restorations has been a subject of intensive research during the last fifteen years. A prerequisite for the efficient processing of the 3D measurement of prepared teeth with a minimal amount of user interaction is the automatic determination of the preparation line, which defines the sealing margin between the restoration and the prepared tooth. Current dental CAD/CAM systems mostly require the interactive definition of the preparation line by the user, at least by means of giving a number of start points. Previous approaches to the automatic extraction of the preparation line rely on single contour detection algorithms. In contrast, we use a combination of different contour detection algorithms to find several independent potential preparation lines from a height profile of the measured data. The different algorithms (gradient-based, contour-based, and region-based) show their strengths and weaknesses in different clinical situations. A classifier consisting of three stages (range check, decision tree, support vector machine), which is trained by human experts with real-world data, finally decides which is the correct preparation line. In a test with 101 clinical preparations, a success rate of 92.0% has been achieved. Thus the combination of different contour detection algorithms yields a reliable method for the automatic extraction of the preparation line, which enables the setup of a turn-key dental CAD/CAM process chain with a minimal amount of interactive screen work.
Sokolenko, Stanislav; Aucoin, Marc G
2015-09-04
The growing ubiquity of metabolomic techniques has facilitated high frequency time-course data collection for an increasing number of applications. While the concentration trends of individual metabolites can be modeled with common curve fitting techniques, a more accurate representation of the data needs to consider effects that act on more than one metabolite in a given sample. To this end, we present a simple algorithm that uses nonparametric smoothing carried out on all observed metabolites at once to identify and correct systematic error from dilution effects. In addition, we develop a simulation of metabolite concentration time-course trends to supplement available data and explore algorithm performance. Although we focus on nuclear magnetic resonance (NMR) analysis in the context of cell culture, a number of possible extensions are discussed. Realistic metabolic data was successfully simulated using a 4-step process. Starting with a set of metabolite concentration time-courses from a metabolomic experiment, each time-course was classified as either increasing, decreasing, concave, or approximately constant. Trend shapes were simulated from generic functions corresponding to each classification. The resulting shapes were then scaled to simulated compound concentrations. Finally, the scaled trends were perturbed using a combination of random and systematic errors. To detect systematic errors, a nonparametric fit was applied to each trend and percent deviations calculated at every timepoint. Systematic errors could be identified at time-points where the median percent deviation exceeded a threshold value, determined by the choice of smoothing model and the number of observed trends. Regardless of model, increasing the number of observations over a time-course resulted in more accurate error estimates, although the improvement was not particularly large between 10 and 20 samples per trend. 
The presented algorithm was able to identify systematic errors as small as 2.5% under a wide range of conditions. Both the simulation framework and error correction method represent examples of time-course analysis that can be applied to further developments in 1H-NMR methodology and the more general application of quantitative metabolomics.
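The detection step described in this abstract (smooth every metabolite trend, compute percent deviations at each timepoint, and flag timepoints where the median deviation across metabolites exceeds a threshold) can be sketched as follows. A moving median stands in for the paper's nonparametric smoother, and the window size, the 2.5% threshold used here, and the synthetic trends are assumptions.

```python
from statistics import median


def moving_median(values, window=5):
    """Simple nonparametric smoother; windows are truncated at the ends."""
    half = window // 2
    return [median(values[max(0, i - half): i + half + 1])
            for i in range(len(values))]


def flag_systematic_errors(trends, threshold=2.5, window=5):
    """Return timepoints whose median percent deviation exceeds `threshold`.

    `trends` holds one equal-length concentration series per metabolite.
    """
    fits = [moving_median(series, window) for series in trends]
    flagged = []
    for t in range(len(trends[0])):
        deviations = [100.0 * (series[t] - fit[t]) / fit[t]
                      for series, fit in zip(trends, fits)]
        if abs(median(deviations)) > threshold:
            flagged.append(t)
    return flagged


# Hypothetical increasing, decreasing, and constant metabolite trends.
n = 10
trends = [[10.0 + i for i in range(n)],
          [30.0 - i for i in range(n)],
          [5.0] * n]
for series in trends:
    series[5] *= 0.85  # a 15% dilution error hits every metabolite in sample 5

print(flag_systematic_errors(trends))
```

Taking the median across metabolites is what separates a dilution effect, which shifts every compound in a sample the same way, from ordinary noise or smoothing bias in a single trend.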
Selection of Norway spruce somatic embryos by computer vision
NASA Astrophysics Data System (ADS)
Hamalainen, Jari J.; Jokinen, Kari J.
1993-05-01
A computer vision system was developed for the classification of plant somatic embryos. The embryos are in a Petri dish that is moved at constant speed, and they are recognized as they pass a line scan camera. A classification algorithm needs to be installed for every plant species. This paper describes an algorithm for the recognition of Norway spruce (Picea abies) embryos. A short review of conifer micropropagation by somatic embryogenesis is also given. The recognition algorithm is based on features calculated from the boundary of the object. Only part of the boundary corresponding to the developing cotyledons (2 - 15) and the straight sides of the embryo are used for recognition. An index of the length of the cotyledons describes the developmental stage of the embryo. The testing set for classifier performance consisted of 118 embryos and 478 nonembryos. With the classification tolerances chosen, 69% of the objects classified as embryos by a human classifier were selected and 31% rejected. Less than 1% of the nonembryos were classified as embryos. The basic features developed can probably be easily adapted for the recognition of other conifer somatic embryos.
NASA Astrophysics Data System (ADS)
Prochazka, D.; Mazura, M.; Samek, O.; Rebrošová, K.; Pořízka, P.; Klus, J.; Prochazková, P.; Novotný, J.; Novotný, K.; Kaiser, J.
2018-01-01
In this work, we investigate the impact of data provided by complementary laser-based spectroscopic methods on multivariate classification accuracy. Discrimination and classification of five Staphylococcus bacterial strains and one strain of Escherichia coli is presented. The technique that we used for measurements is a combination of Raman spectroscopy and Laser-Induced Breakdown Spectroscopy (LIBS). Obtained spectroscopic data were then processed using Multivariate Data Analysis algorithms. Principal Components Analysis (PCA) was selected as the most suitable technique for visualization of bacterial strains data. To classify the bacterial strains, we used Neural Networks, namely a supervised version of Kohonen's self-organizing maps (SOM). We processed the results in three ways: from LIBS measurements alone, from Raman measurements alone, and from the merged data of both methods. The three types of results were then compared. By applying the PCA to Raman spectroscopy data, we observed that two bacterial strains were fully distinguished from the rest of the data set. In the case of LIBS data, three bacterial strains were fully discriminated. Using a combination of data from both methods, we achieved the complete discrimination of all bacterial strains. All the data were classified with a high success rate using the SOM algorithm. The most accurate classification was obtained using a combination of data from both techniques. The classification accuracy varied, depending on specific samples and techniques. For LIBS, the classification accuracy ranged from 45% to 100%; for Raman spectroscopy, from 50% to 100%; and for the merged data, all samples were classified correctly. Based on the results of the experiments presented in this work, we can assume that the combination of Raman spectroscopy and LIBS significantly enhances discrimination and classification accuracy of bacterial species and strains. 
The reason is the complementarity in obtained chemical information while using these two methods.
Phytoplankton global mapping from space with a support vector machine algorithm
NASA Astrophysics Data System (ADS)
de Boissieu, Florian; Menkes, Christophe; Dupouy, Cécile; Rodier, Martin; Bonnet, Sophie; Mangeas, Morgan; Frouin, Robert J.
2014-11-01
In recent years great progress has been made in global mapping of phytoplankton from space. Two main trends have emerged, the recognition of phytoplankton functional types (PFT) based on reflectance normalized to chlorophyll-a concentration, and the recognition of phytoplankton size class (PSC) based on the relationship between cell size and chlorophyll-a concentration. However, PFTs and PSCs are not decorrelated, and one approach can complement the other in a recognition task. In this paper, we explore the recognition of several dominant PFTs by combining reflectance anomalies, chlorophyll-a concentration and other environmental parameters, such as sea surface temperature and wind speed. Remote sensing pixels are labeled thanks to coincident in-situ pigment data from GeP&CO, NOMAD and MAREDAT datasets, covering various oceanographic environments. The recognition is made with a supervised Support Vector Machine classifier trained on the labeled pixels. This algorithm enables a non-linear separation of the classes in the input space and is especially adapted for small training datasets as available here. Moreover, it provides a class probability estimate, allowing one to enhance the robustness of the classification results through the choice of a minimum probability threshold. A greedy feature selection associated to a 10-fold cross-validation procedure is applied to select the most discriminative input features and evaluate the classification performance. The best classifiers are finally applied on daily remote sensing datasets (SeaWIFS, MODISA) and the resulting dominant PFT maps are compared with other studies. 
Several conclusions are drawn: (1) the feature selection highlights the weight of temperature, chlorophyll-a and wind speed variables in phytoplankton recognition; (2) the classifiers show good results and dominant PFT maps in agreement with phytoplankton distribution knowledge; (3) classification on MODISA data seems to perform better than on SeaWIFS data, (4) the probability threshold screens correctly the areas of smallest confidence such as the interclass regions.
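The minimum probability threshold mentioned in the abstract can be illustrated with a small sketch. It assumes the classifier has already produced per-class probability estimates for a pixel; the function name, the class names and the threshold values are hypothetical and do not come from the paper.

```python
def classify_with_threshold(probabilities, min_probability):
    """Return the most probable class, or None when confidence is too low.

    `probabilities` maps a class label to its estimated probability
    (for example the output of a probabilistic SVM, assumed here).
    """
    label, best = max(probabilities.items(), key=lambda item: item[1])
    return label if best >= min_probability else None


# Hypothetical pixel lying between two classes.
pixel = {"class_a": 0.48, "class_b": 0.42, "class_c": 0.10}
print(classify_with_threshold(pixel, 0.60))  # pixel is screened out
print(classify_with_threshold(pixel, 0.40))  # pixel keeps its top label
```

A higher threshold leaves more pixels unlabeled but keeps only the confident ones, which is the trade-off conclusion (4) refers to.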
Olofsson, Per; Norén, Håkan; Carlsson, Ann
2018-02-01
The updated intrapartum cardiotocography (CTG) classification system by FIGO in 2015 (FIGO2015) and the FIGO2015-approached classification by the Swedish Society of Obstetricians and Gynecologists in 2017 (SSOG2017) are not harmonized with the fetal ECG ST analysis (STAN) algorithm from 2007 (STAN2007). The study aimed to reveal homogeneity and agreement between the systems in classifying CTG and ST events, and relate them to maternal and perinatal outcomes. Among CTG traces with ST events, 100 traces originally classified as normal, 100 as suspicious and 100 as pathological were randomly selected from a STAN database and classified by two experts in consensus. Homogeneity and agreement statistics between the CTG classifications were computed. Maternal and perinatal outcomes were evaluated in cases with clinically hidden ST data (n = 151). A two-tailed p < 0.05 was regarded as significant. For CTG classes, the heterogeneity was significant between the old and new systems, and agreements were moderate to strong (proportion of agreement, kappa index 0.70-0.86). Between the new classifications, heterogeneity was significant and agreements strong (0.90, 0.92). For significant ST events, heterogeneities were significant and agreements moderate to almost perfect (STAN2007 vs. FIGO2015 0.86, 0.72; STAN2007 vs. SSOG2017 0.92, 0.84; FIGO2015 vs. SSOG2017 0.94, 0.87). Significant ST events occurred more often combined with STAN2007 than with FIGO2015 classification, but not with SSOG2017; correct identification of adverse outcomes was not significantly different between the systems. There are discrepancies in the classification of CTG patterns and significant ST events between the old and new systems. The clinical relevance of the findings remains to be shown. © 2017 The Authors. Acta Obstetricia et Gynecologica Scandinavica published by John Wiley & Sons Ltd on behalf of Nordic Federation of Societies of Obstetrics and Gynecology (NFOG).
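The two agreement figures quoted for each pair of systems (proportion of agreement, kappa index) can be computed as sketched below. The abstract does not say which kappa variant was used, so unweighted Cohen's kappa is assumed here; the function name and the toy labels are hypothetical.

```python
import math
from collections import Counter


def agreement_and_kappa(labels_a, labels_b):
    """Proportion of agreement and unweighted Cohen's kappa for two raters."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both label lists must be non-empty and of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected by chance from the marginal class frequencies.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return observed, math.nan  # kappa is undefined when only one class occurs
    return observed, (observed - expected) / (1.0 - expected)


# Hypothetical classifications of four traces by two systems.
system_1 = ["normal", "normal", "suspicious", "pathological"]
system_2 = ["normal", "suspicious", "suspicious", "pathological"]
print(agreement_and_kappa(system_1, system_2))
```

Kappa discounts the agreement that would occur by chance, which is why it is lower than the raw proportion of agreement in each reported pair.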
PPCM: Combing multiple classifiers to improve protein-protein interaction prediction
Yao, Jianzhuang; Guo, Hong; Yang, Xiaohan
2015-08-01
Determining protein-protein interaction (PPI) in biological systems is of considerable importance, and prediction of PPI has become a popular research area. Although different classifiers have been developed for PPI prediction, no single classifier seems to be able to predict PPI with high confidence. We postulated that by combining individual classifiers the accuracy of PPI prediction could be improved. We developed a method called protein-protein interaction prediction classifiers merger (PPCM), and this method combines output from two PPI prediction tools, GO2PPI and Phyloprof, using Random Forests algorithm. The performance of PPCM was tested by area under the curve (AUC) using an assembled Gold Standard database that contains both positive and negative PPI pairs. Our AUC test showed that PPCM significantly improved the PPI prediction accuracy over the corresponding individual classifiers. We found that additional classifiers incorporated into PPCM could lead to further improvement in the PPI prediction accuracy. Furthermore, cross species PPCM could achieve competitive and even better prediction accuracy compared to the single species PPCM. This study established a robust pipeline for PPI prediction by integrating multiple classifiers using Random Forests algorithm. Ultimately, this pipeline will be useful for predicting PPI in nonmodel species.
Iannaccone, Reto; Hauser, Tobias U; Ball, Juliane; Brandeis, Daniel; Walitza, Susanne; Brem, Silvia
2015-10-01
Attention-deficit/hyperactivity disorder (ADHD) is a common disabling psychiatric disorder associated with consistent deficits in error processing, inhibition and regionally decreased grey matter volumes. The diagnosis is based on clinical presentation, interviews and questionnaires, which are to some degree subjective and would benefit from verification through biomarkers. Here, pattern recognition of multiple discriminative functional and structural brain patterns was applied to classify adolescents with ADHD and controls. Functional activation features in a Flanker/NoGo task probing error processing and inhibition along with structural magnetic resonance imaging data served to predict group membership using support vector machines (SVMs). The SVM pattern recognition algorithm correctly classified 77.78% of the subjects with a sensitivity and specificity of 77.78% based on error processing. Predictive regions for controls were mainly detected in core areas for error processing and attention such as the medial and dorsolateral frontal areas reflecting deficient processing in ADHD (Hart et al., in Hum Brain Mapp 35:3083-3094, 2014), and overlapped with decreased activations in patients in conventional group comparisons. Regions more predictive for ADHD patients were identified in the posterior cingulate, temporal and occipital cortex. Interestingly despite pronounced univariate group differences in inhibition-related activation and grey matter volumes the corresponding classifiers failed or only yielded a poor discrimination. The present study corroborates the potential of task-related brain activation for classification shown in previous studies. It remains to be clarified whether error processing, which performed best here, also contributes to the discrimination of useful dimensions and subtypes, different psychiatric disorders, and prediction of treatment success across studies and sites.
Support vector machine as a binary classifier for automated object detection in remotely sensed data
NASA Astrophysics Data System (ADS)
Wardaya, P. D.
2014-02-01
In the present paper, the author proposes the application of the Support Vector Machine (SVM) to the analysis of satellite imagery. One of the advantages of the SVM is that, with limited training data, it may generate comparable or even better results than other methods. The SVM algorithm is used for automated object detection and characterization. Specifically, the SVM is applied in its basic form as a binary classifier that distinguishes two classes, namely object and background. The algorithm aims at effectively detecting an object from its background with minimal training data. A synthetic image containing noise is used for algorithm testing. Furthermore, the algorithm is applied to remote sensing image analysis tasks such as identification of island vegetation, water bodies, and oil spills from satellite imagery. The results indicate that the SVM provides fast and accurate analysis with acceptable results.
Martin, Bryan D.; Wolfson, Julian; Adomavicius, Gediminas; Fan, Yingling
2017-01-01
We propose and compare combinations of several methods for classifying transportation activity data from smartphone GPS and accelerometer sensors. We have two main objectives. First, we aim to classify our data as accurately as possible. Second, we aim to reduce the dimensionality of the data as much as possible in order to reduce the computational burden of the classification. We combine dimension reduction and classification algorithms and compare them with a metric that balances accuracy and dimensionality. In doing so, we develop a classification algorithm that accurately classifies five different modes of transportation (i.e., walking, biking, car, bus and rail) while being computationally simple enough to run on a typical smartphone. Further, we use data that required no behavioral changes from the smartphone users to collect. Our best classification model uses the random forest algorithm to achieve 96.8% accuracy. PMID:28885550
Martin, Bryan D; Addona, Vittorio; Wolfson, Julian; Adomavicius, Gediminas; Fan, Yingling
2017-09-08
We propose and compare combinations of several methods for classifying transportation activity data from smartphone GPS and accelerometer sensors. We have two main objectives. First, we aim to classify our data as accurately as possible. Second, we aim to reduce the dimensionality of the data as much as possible in order to reduce the computational burden of the classification. We combine dimension reduction and classification algorithms and compare them with a metric that balances accuracy and dimensionality. In doing so, we develop a classification algorithm that accurately classifies five different modes of transportation (i.e., walking, biking, car, bus and rail) while being computationally simple enough to run on a typical smartphone. Further, we use data that required no behavioral changes from the smartphone users to collect. Our best classification model uses the random forest algorithm to achieve 96.8% accuracy.
A novel algorithm for simplification of complex gene classifiers in cancer
Wilson, Raphael A.; Teng, Ling; Bachmeyer, Karen M.; Bissonnette, Mei Lin Z.; Husain, Aliya N.; Parham, David M.; Triche, Timothy J.; Wing, Michele R.; Gastier-Foster, Julie M.; Barr, Frederic G.; Hawkins, Douglas S.; Anderson, James R.; Skapek, Stephen X.; Volchenboum, Samuel L.
2013-01-01
The clinical application of complex molecular classifiers as diagnostic or prognostic tools has been limited by the time and cost needed to apply them to patients. Using an existing fifty-gene expression signature known to separate two molecular subtypes of the pediatric cancer rhabdomyosarcoma, we show that an exhaustive iterative search algorithm can distill this complex classifier down to two or three features with equal discrimination. We validated the two-gene signatures using three separate and distinct data sets, including one that uses degraded RNA extracted from formalin-fixed, paraffin-embedded material. Finally, to demonstrate the generalizability of our algorithm, we applied it to a lung cancer data set to find minimal gene signatures that can distinguish survival. Our approach can easily be generalized and coupled to existing technical platforms to facilitate the discovery of simplified signatures that are ready for routine clinical use. PMID:23913937
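The idea of an exhaustive search over small feature subsets can be sketched as follows. This is not the authors' implementation: the scoring rule (resubstitution accuracy of a nearest-centroid classifier), the function names and the toy data are all assumptions chosen to keep the example short.

```python
from itertools import combinations


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [sum(column) / len(rows) for column in zip(*rows)]


def nearest_centroid_accuracy(samples, labels, features):
    """Fraction of samples assigned to their own class using only `features`."""
    projected = [[sample[f] for f in features] for sample in samples]
    classes = sorted(set(labels))
    centroids = {
        c: centroid([p for p, label in zip(projected, labels) if label == c])
        for c in classes
    }
    correct = 0
    for point, label in zip(projected, labels):
        predicted = min(
            classes,
            key=lambda c: sum((x - y) ** 2 for x, y in zip(point, centroids[c])),
        )
        correct += predicted == label
    return correct / len(samples)


def best_subset(samples, labels, size):
    """Try every feature subset of the given size and keep the best-scoring one."""
    n_features = len(samples[0])
    return max(
        combinations(range(n_features), size),
        key=lambda subset: nearest_centroid_accuracy(samples, labels, subset),
    )


# Hypothetical expression values: only the third feature separates the classes.
samples = [[1.0, 0.0, 0.1], [2.0, 1.0, 0.2], [1.0, 0.0, 0.9], [2.0, 1.0, 1.0]]
labels = ["A", "A", "B", "B"]
print(best_subset(samples, labels, 1))
```

For a fifty-gene signature there are 1,225 pairs and 19,600 triples, so enumerating every subset of size two or three is cheap; the cost grows quickly for larger subsets.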
NASA Technical Reports Server (NTRS)
Mazzoni, Dominic; Wagstaff, Kiri; Bornstein, Benjamin; Tang, Nghia; Roden, Joseph
2006-01-01
PixelLearn is an integrated user-interface computer program for classifying pixels in scientific images. Heretofore, training a machine-learning algorithm to classify pixels in images has been tedious and difficult. PixelLearn provides a graphical user interface that makes it faster and more intuitive, leading to more interactive exploration of image data sets. PixelLearn also provides image-enhancement controls to make it easier to see subtle details in images. PixelLearn opens images or sets of images in a variety of common scientific file formats and enables the user to interact with several supervised or unsupervised machine-learning pixel-classifying algorithms while the user continues to browse through the images. The machinelearning algorithms in PixelLearn use advanced clustering and classification methods that enable accuracy much higher than is achievable by most other software previously available for this purpose. PixelLearn is written in portable C++ and runs natively on computers running Linux, Windows, or Mac OS X.
Classification of the Correct Quranic Letters Pronunciation of Male and Female Reciters
NASA Astrophysics Data System (ADS)
Khairuddin, Safiah; Ahmad, Salmiah; Embong, Abdul Halim; Nur Wahidah Nik Hashim, Nik; Altamas, Tareq M. K.; Nuratikah Syd Badaruddin, Syarifah; Shahbudin Hassan, Surul
2017-11-01
Recitation of the Holy Quran with the correct Tajweed is essential for every Muslim. Islam has encouraged Quranic education since early age as the recitation of the Quran correctly will represent the correct meaning of the words of Allah. It is important to recite the Quranic verses according to its characteristics (sifaat) and from its point of articulations (makhraj). This paper presents the identification and classification analysis of Quranic letters pronunciation for both male and female reciters, to obtain the unique representation of each letter by male as compared to female expert reciters. Linear Discriminant Analysis (LDA) was used as the classifier to classify the data with Formants and Power Spectral Density (PSD) as the acoustic features. The result shows that linear classifier of PSD with band 1 and band 2 power spectral combinations gives a high percentage of classification accuracy for most of the Quranic letters. It is also shown that the pronunciation by male reciters gives better result in the classification of the Quranic letters.
Meat mixture detection in Iberian pork sausages.
Ortiz-Somovilla, V; España-España, F; De Pedro-Sanz, E J; Gaitán-Jurado, A J
2005-11-01
Five homogenized meat mixture treatments of Iberian (I) and/or Standard (S) pork were set up. Each treatment was analyzed by NIRS as a fresh product (N=75) and as dry-cured sausage (N=75). Spectra acquisition was carried out using DA 7000 equipment (Perten Instruments), obtaining a total of 750 spectra. Several absorption peaks and bands were selected as the most representative for homogenized dry-cured and fresh sausages. Discriminant analysis and mixture prediction equations were carried out based on the spectral data gathered. The best results using discriminant models were for fresh products, with 98.3% (calibration) and 60% (validation) correct classification. For dry-cured sausages 91.7% (calibration) and 80% (validation) of the samples were correctly classified. Models developed using mixture prediction equations showed SECV=4.7, r(2)=0.98 (calibration) and 73.3% of validation set were correctly classified for the fresh product. These values for dry-cured sausages were SECV=5.9, r(2)=0.99 (calibration) and 93.3% correctly classified for validation.
NASA Astrophysics Data System (ADS)
Lee, Kwon-Ho; Kim, Wonkook
2017-04-01
The Geostationary Ocean Color Imager-II (GOCI-II) is designed for ocean environmental monitoring with better spatial resolution (250 m for local and 1 km for full disk) and spectral resolution (13 bands) than the currently operational GOCI-I. GOCI-II will be launched in 2018. This study presents an algorithm, currently under development, for atmospheric correction and retrieval of surface reflectance over land, optimized for the sensor's characteristics. We first derived top-of-atmosphere radiances as proxy data from a parameterized radiative transfer code in the 13 bands of GOCI-II. Based on the proxy data, the algorithm was built from cloud masking, gas absorption correction, aerosol inversion, and computation of the aerosol extinction correction. The retrieved surface reflectances are evaluated against the MODIS level 2 surface reflectance products (MOD09). For the initial test period, the algorithm gave an error within 0.05 compared to MOD09. Further work will proceed to fully implement the GOCI-II Ground Segment system (G2GS) algorithm development environment. This atmospherically corrected surface reflectance product will be the standard GOCI-II product after launch.
Learning with imperfectly labeled patterns
NASA Technical Reports Server (NTRS)
Chittineni, C. B.
1979-01-01
The problem of learning in pattern recognition using imperfectly labeled patterns is considered. The performance of the Bayes and nearest neighbor classifiers with imperfect labels is discussed using a probabilistic model for the mislabeling of the training patterns. Schemes for training the classifier using both parametric and non parametric techniques are presented. Methods for the correction of imperfect labels were developed. To gain an understanding of the learning process, expressions are derived for success probability as a function of training time for a one dimensional increment error correction classifier with imperfect labels. Feature selection with imperfectly labeled patterns is described.
Fan, Jianping; Gao, Yuli; Luo, Hangzai
2008-03-01
In this paper, we have developed a new scheme for achieving multilevel annotations of large-scale images automatically. To achieve more sufficient representation of various visual properties of the images, both the global visual features and the local visual features are extracted for image content representation. To tackle the problem of huge intraconcept visual diversity, multiple types of kernels are integrated to characterize the diverse visual similarity relationships between the images more precisely, and a multiple kernel learning algorithm is developed for SVM image classifier training. To address the problem of huge interconcept visual similarity, a novel multitask learning algorithm is developed to learn the correlated classifiers for the sibling image concepts under the same parent concept and enhance their discrimination and adaptation power significantly. To tackle the problem of huge intraconcept visual diversity for the image concepts at the higher levels of the concept ontology, a novel hierarchical boosting algorithm is developed to learn their ensemble classifiers hierarchically. In order to assist users on selecting more effective hypotheses for image classifier training, we have developed a novel hyperbolic framework for large-scale image visualization and interactive hypotheses assessment. Our experiments on large-scale image collections have also obtained very positive results.
Multiple Ordinal Regression by Maximizing the Sum of Margins
Hamsici, Onur C.; Martinez, Aleix M.
2016-01-01
Human preferences are usually measured using ordinal variables. A system whose goal is to estimate the preferences of humans and their underlying decision mechanisms must learn the ordering of any given sample set. We consider the solution of this ordinal regression problem using a Support Vector Machine algorithm. Specifically, the goal is to learn a set of classifiers with common direction vectors and different biases correctly separating the ordered classes. Current algorithms are either required to solve a quadratic optimization problem, which is computationally expensive, or are based on maximizing the minimum margin (i.e., a fixed margin strategy) between a set of hyperplanes, which biases the solution to the closest margin. Another drawback of these strategies is that they are limited to ordering the classes using a single ranking variable (e.g., perceived length). In this paper, we define a multiple ordinal regression algorithm based on maximizing the sum of the margins between every consecutive class with respect to one or more rankings (e.g., perceived length and weight). We provide derivations of an efficient, easy-to-implement iterative solution using a Sequential Minimal Optimization procedure. We demonstrate the accuracy of our solutions on several datasets. In addition, we provide a key application of our algorithms in estimating human subjects’ ordinal classification of attribute associations to object categories. We show that these ordinal associations perform better than the binary ones typically employed in the literature. PMID:26529784
Algorithms for detecting antibodies to HIV-1: results from a rural Ugandan cohort.
Nunn, A J; Biryahwaho, B; Downing, R G; van der Groen, G; Ojwiya, A; Mulder, D W
1993-08-01
To evaluate an algorithm using two enzyme immunoassays (EIA) for anti-HIV-1 antibodies in a rural African population and to assess alternative simplified algorithms. Sera obtained from 7895 individuals in a rural population survey were tested using an algorithm based on two different EIA systems: Recombigen HIV-1 EIA and Wellcozyme HIV-1 Recombinant. Alternative algorithms were assessed using negative or confirmed positive sera. None of the 227 sera classified as unequivocally negative by the two assays were positive by Western blot. Of 192 sera unequivocally positive by both assays, four were seronegative by Western blot. The possibility of technical error cannot be ruled out in three of these. One of the alternative algorithms assessed, which classified all borderline or discordant assay results as negative, had a specificity of 100% and a sensitivity of 98.4%. The cost of this algorithm is one-third that of the conventional algorithm. Our evaluation suggests that high specificity and sensitivity can be obtained without using Western blot and at a considerable reduction in cost.
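The simplified rule described in the abstract, in which any borderline or discordant pair of assay results is called negative, can be sketched together with the sensitivity and specificity calculation. The result strings, function names and the five toy sera are hypothetical; they are not data from the study.

```python
def combined_result(eia_1, eia_2):
    """Positive only when both assays are positive; borderline or discordant is negative."""
    return "positive" if eia_1 == "positive" and eia_2 == "positive" else "negative"


def sensitivity_specificity(predicted, reference):
    """Compare predicted calls with a reference standard (e.g. Western blot).

    Assumes the reference contains at least one positive and one negative.
    """
    pairs = list(zip(predicted, reference))
    tp = sum(p == "positive" and r == "positive" for p, r in pairs)
    fn = sum(p == "negative" and r == "positive" for p, r in pairs)
    tn = sum(p == "negative" and r == "negative" for p, r in pairs)
    fp = sum(p == "positive" and r == "negative" for p, r in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical results for five sera.
eia_1 = ["positive", "positive", "negative", "borderline", "negative"]
eia_2 = ["positive", "negative", "negative", "positive", "negative"]
reference = ["positive", "positive", "negative", "negative", "negative"]

calls = [combined_result(a, b) for a, b in zip(eia_1, eia_2)]
print(sensitivity_specificity(calls, reference))
```

Treating every doubtful pair as negative favors specificity over sensitivity, which matches the direction of the figures reported above.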
NASA Astrophysics Data System (ADS)
Rosete-Aguilar, Martha
2000-06-01
In this paper a lens correction algorithm based on the see-saw diagram developed by Burch is described. The see-saw diagram describes the image correction in rotationally symmetric systems over a finite field of view by means of aspheric surfaces. The algorithm is applied to the design of some basic telescopic configurations such as the classical Cassegrain telescope, the Dall-Kirkham telescope, the Pressman-Camichel telescope and the Ritchey-Chretien telescope in order to show a physically visualizable concept of image correction for optical systems that employ aspheric surfaces. By using the see-saw method the student can visualize the different possible configurations of such telescopes as well as their performances, and will also be able to understand that it is not always possible to correct more primary aberrations by aspherizing more surfaces.
Liang, Kun; Yang, Cailan; Peng, Li; Zhou, Bo
2017-02-01
In uncooled long-wave IR camera systems, the temperature of a focal plane array (FPA) is variable along with the environmental temperature as well as the operating time. The spatial nonuniformity of the FPA, which is partly affected by the FPA temperature, obviously changes as well, resulting in reduced image quality. This study presents a real-time nonuniformity correction algorithm based on FPA temperature to compensate for nonuniformity caused by FPA temperature fluctuation. First, gain coefficients are calculated using a two-point correction technique. Then offset parameters at different FPA temperatures are obtained and stored in tables. When the camera operates, the offset tables are called to update the current offset parameters via a temperature-dependent interpolation. Finally, the gain coefficients and offset parameters are used to correct the output of the IR camera in real time. The proposed algorithm is evaluated and compared with two representative shutterless algorithms [minimizing the sum of the squares of errors algorithm (MSSE), template-based solution algorithm (TBS)] using IR images captured by a 384×288 pixel uncooled IR camera with a 17 μm pitch. Experimental results show that this method can quickly trace the response drift of the detector units when the FPA temperature changes. The quality of the proposed algorithm is as good as MSSE, while the processing time is as short as TBS, which means the proposed algorithm is good for real-time control and at the same time has a high correction effect.
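The three steps described above (two-point gain calibration, offset tables indexed by FPA temperature, and per-frame correction) can be sketched in a few lines. The abstract does not specify the interpolation scheme, so linear interpolation with clamping at the table ends is assumed; all function names and numbers are hypothetical, and a frame is represented as a flat list of pixel values.

```python
from bisect import bisect_right


def two_point_gain(response_low, response_high):
    """Per-pixel gain that maps each pixel's response span onto the array mean span."""
    spans = [high - low for low, high in zip(response_low, response_high)]
    mean_span = sum(spans) / len(spans)
    return [mean_span / span for span in spans]


def interpolate_offsets(table, fpa_temperature):
    """Linearly interpolate per-pixel offsets from a temperature-sorted table.

    `table` is a list of (temperature, offsets) pairs; values outside the
    calibrated range are clamped to the nearest entry (an assumption).
    """
    temperatures = [t for t, _ in table]
    if fpa_temperature <= temperatures[0]:
        return list(table[0][1])
    if fpa_temperature >= temperatures[-1]:
        return list(table[-1][1])
    i = bisect_right(temperatures, fpa_temperature)
    (t0, offsets0), (t1, offsets1) = table[i - 1], table[i]
    weight = (fpa_temperature - t0) / (t1 - t0)
    return [a + weight * (b - a) for a, b in zip(offsets0, offsets1)]


def correct_frame(raw, gains, offsets):
    """Apply corrected = gain * raw + offset to every pixel."""
    return [g * x + o for g, x, o in zip(gains, raw, offsets)]


# Hypothetical two-pixel detector.
gains = two_point_gain([10.0, 20.0], [110.0, 220.0])
offset_table = [(20.0, [0.0, 10.0]), (30.0, [4.0, 20.0])]
offsets = interpolate_offsets(offset_table, 25.0)
print(correct_frame([10.0, 20.0], gains, offsets))
```

Storing offsets in tables moves the expensive calibration offline; at run time only a table lookup, an interpolation and one multiply-add per pixel remain, which is consistent with the real-time claim.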
Foo, Brian; van der Schaar, Mihaela
2010-11-01
In this paper, we discuss distributed optimization techniques for configuring classifiers in a real-time, informationally-distributed stream mining system. Due to the large volume of streaming data, stream mining systems must often cope with overload, which can lead to poor performance and intolerable processing delay for real-time applications. Furthermore, optimizing over an entire system of classifiers is a difficult task since changing the filtering process at one classifier can impact both the feature values of data arriving at classifiers further downstream and thus, the classification performance achieved by an ensemble of classifiers, as well as the end-to-end processing delay. To address this problem, this paper makes three main contributions: 1) Based on classification and queuing theoretic models, we propose a utility metric that captures both the performance and the delay of a binary filtering classifier system. 2) We introduce a low-complexity framework for estimating the system utility by observing, estimating, and/or exchanging parameters between the inter-related classifiers deployed across the system. 3) We provide distributed algorithms to reconfigure the system, and analyze the algorithms based on their convergence properties, optimality, information exchange overhead, and rate of adaptation to non-stationary data sources. We provide results using different video classifier systems.
Luo, Qiang; Yan, Zhuangzhi; Gu, Dongxing; Cao, Lei
This paper proposes an image interpolation algorithm based on bilinear interpolation and a color correction algorithm based on polynomial regression, both implemented on an FPGA, to address the limited number of imaging pixels and the color distortion of the ultra-thin electronic endoscope. Simulation results showed that the proposed algorithm achieved real-time display of 1280 x 720@60Hz HD video and that, using the X-rite color checker as standard colors, the average color difference was reduced by about 30% compared with that before color correction.
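Bilinear interpolation itself is simple enough to show in software. The sketch below is a plain Python illustration of the technique, not the FPGA design from the paper; images are lists of rows of single-channel values, sample coordinates are assumed to lie inside the image, and the function names are hypothetical.

```python
def bilinear_sample(image, x, y):
    """Interpolate the image value at fractional coordinates (x, y)."""
    height, width = len(image), len(image[0])
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0
    # Blend horizontally on the two neighboring rows, then vertically.
    top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def resize(image, new_width, new_height):
    """Resize so that the corner pixels of input and output coincide."""
    height, width = len(image), len(image[0])
    step_x = (width - 1) / (new_width - 1) if new_width > 1 else 0.0
    step_y = (height - 1) / (new_height - 1) if new_height > 1 else 0.0
    return [
        [bilinear_sample(image, col * step_x, row * step_y) for col in range(new_width)]
        for row in range(new_height)
    ]


# Upscale a 2 x 2 image to 3 x 3.
for row in resize([[0, 10], [20, 30]], 3, 3):
    print(row)
```

Each output pixel needs only four input pixels and a handful of multiply-adds, which is one reason the method suits a hardware pipeline.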
Energy shadowing correction of ultrasonic pulse-echo records by digital signal processing
NASA Technical Reports Server (NTRS)
Kishoni, D.; Heyman, J. S.
1986-01-01
Attention is given to a numerical algorithm that, via signal processing, enables the dynamic correction of the shadowing effect of reflections on ultrasonic displays. The algorithm was applied to experimental data from graphite-epoxy composite material immersed in a water bath. It is concluded that images of material defects with the shadowing corrections allow for a more quantitative interpretation of the material state. It is noted that the proposed algorithm is fast and simple enough to be adopted for real time applications in industry.
Automated detection of tuberculosis on sputum smeared slides using stepwise classification
NASA Astrophysics Data System (ADS)
Divekar, Ajay; Pangilinan, Corina; Coetzee, Gerrit; Sondh, Tarlochan; Lure, Fleming Y. M.; Kennedy, Sean
2012-03-01
Routine visual screening of stained sputum slides under a microscope for tuberculosis (TB) bacilli is a tedious, labor-intensive task and can miss up to 50% of TB. Based on the Shannon cofactor expansion of Boolean functions for classification, a stepwise classification (SWC) algorithm is developed to remove different types of false positives, one type at a time, and to increase the detection of TB bacilli at different concentrations. Both bacilli and non-bacilli objects are first analyzed and classified into several categories, including scanty positive, high-concentration positive, and several non-bacilli categories: small bright objects, beaded, dim elongated objects, etc. The morphological and contrast features are extracted based on a priori clinical knowledge. The SWC is composed of several individual classifiers. The individual classifier that increases the bacilli counts uses an adaptive algorithm based on a microbiologist's statistical heuristic decision process. The individual classifier that reduces false positives is developed through minimization from a binary decision tree to classify different types of true and false positives based on feature vectors. Finally, the detection algorithm was tested on 102 independent confirmed negative and 74 positive cases. A multi-class task analysis shows high accordance rates for negative, scanty, and high-concentration cases of 88.24%, 56.00%, and 97.96%, respectively. A binary-class task analysis using a receiver operating characteristic method with the area under the curve (Az) is also used to analyze the performance of this detection algorithm, showing superior detection performance on the high-concentration cases (Az=0.913) and on cases mixing high-concentration and scanty cases (Az=0.878).
A low computation cost method for seizure prediction.
Zhang, Yanli; Zhou, Weidong; Yuan, Qi; Wu, Qi
2014-10-01
The dynamic changes of electroencephalograph (EEG) signals in the period prior to epileptic seizures play a major role in the seizure prediction. This paper proposes a low computation seizure prediction algorithm that combines a fractal dimension with a machine learning algorithm. The presented seizure prediction algorithm extracts the Higuchi fractal dimension (HFD) of EEG signals as features to classify the patient's preictal or interictal state with Bayesian linear discriminant analysis (BLDA) as a classifier. The outputs of BLDA are smoothed by a Kalman filter for reducing possible sporadic and isolated false alarms and then the final prediction results are produced using a thresholding procedure. The algorithm was evaluated on the intracranial EEG recordings of 21 patients in the Freiburg EEG database. For seizure occurrence period of 30 min and 50 min, our algorithm obtained an average sensitivity of 86.95% and 89.33%, an average false prediction rate of 0.20/h, and an average prediction time of 24.47 min and 39.39 min, respectively. The results confirm that the changes of HFD can serve as a precursor of ictal activities and be used for distinguishing between interictal and preictal epochs. Both HFD and BLDA classifier have a low computational complexity. All of these make the proposed algorithm suitable for real-time seizure prediction. Copyright © 2014 Elsevier B.V. All rights reserved.
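The feature used above, the Higuchi fractal dimension, can be computed with the standard Higuchi procedure: build curve lengths L(k) from subsampled versions of the signal and take the slope of log L(k) against log(1/k). The sketch below covers only this feature extraction step (not the BLDA classifier or the Kalman smoothing), and the choice of `k_max` is an assumption since the abstract does not give it.

```python
import math


def higuchi_fd(signal, k_max):
    """Estimate the Higuchi fractal dimension of a 1-D sequence (k_max >= 2).

    Assumes the signal is not constant, so every curve length is positive.
    """
    n = len(signal)
    log_inv_k, log_length = [], []
    for k in range(1, k_max + 1):
        lengths = []
        for start in range(k):  # 0-based starting offset of the subsampled curve
            steps = (n - 1 - start) // k
            if steps < 1:
                continue
            total = sum(
                abs(signal[start + i * k] - signal[start + (i - 1) * k])
                for i in range(1, steps + 1)
            )
            # Higuchi's normalization for the number of points actually used.
            lengths.append(total * (n - 1) / (steps * k) / k)
        log_inv_k.append(math.log(1.0 / k))
        log_length.append(math.log(sum(lengths) / len(lengths)))
    # Least-squares slope of log L(k) versus log(1/k).
    mean_x = sum(log_inv_k) / len(log_inv_k)
    mean_y = sum(log_length) / len(log_length)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_inv_k, log_length))
    denominator = sum((x - mean_x) ** 2 for x in log_inv_k)
    return numerator / denominator


# A straight line is a smooth curve, so its estimated dimension is close to 1.
print(higuchi_fd([float(i) for i in range(100)], 8))
```

The computation is a few nested sums over the samples of one window, which is consistent with the low computational cost the abstract emphasizes.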
NASA Astrophysics Data System (ADS)
Lazaro, Clara; Fernandes, Joanna M.
2015-12-01
The GNSS-derived Path Delay (GPD) and the Data Combination (DComb) algorithms were developed by University of Porto (U.Porto), in the scope of different projects funded by ESA, to compute a continuous and improved wet tropospheric correction (WTC) for use in satellite altimetry. Both algorithms are mission independent and are based on a linear space-time objective analysis procedure that combines various wet path delay data sources. A new algorithm that gets the best of each aforementioned algorithm (GNSS-derived Path Delay Plus, GPD+) has been developed at U.Porto in the scope of SL_cci project, where the use of consistent and stable in time datasets is of major importance. The algorithm has been applied to the main eight altimetric missions (TOPEX/Poseidon, Jason-1, Jason-2, ERS-1, ERS-2, Envisat and CryoSat-2 and SARAL). Upcoming Sentinel-3 possesses a two-channel on-board radiometer similar to those that were deployed in ERS-1/2 and Envisat. Consequently, the fine-tuning of the GPD+ algorithm to these missions datasets shall enrich it, by increasing its capability to quickly deal with Sentinel-3 data. Foreseeing that the computation of an improved MWR-based WTC for use with Sentinel-3 data will be required, this study focuses on the results obtained for ERS-1/2 and Envisat missions, which are expected to give insight into the computation of this correction for the upcoming ESA altimetric mission. The various WTC corrections available for each mission (in general, the original correction derived from the on-board MWR, the model correction and the one derived from GPD+) are inter-compared either directly or using various sea level anomaly variance statistical analyses. Results show that the GPD+ algorithm is efficient in generating global and continuous datasets, corrected for land and ice contamination and spurious measurements of instrumental origin, with significant impacts on all ESA missions.
Duchrow, Timo; Shtatland, Timur; Guettler, Daniel; Pivovarov, Misha; Kramer, Stefan; Weissleder, Ralph
2009-01-01
Background The breadth of biological databases and their information content continues to increase exponentially. Unfortunately, our ability to query such sources is still often suboptimal. Here, we introduce and apply community voting, database-driven text classification, and visual aids as a means to incorporate distributed expert knowledge, to automatically classify database entries and to efficiently retrieve them. Results Using a previously developed peptide database as an example, we compared several machine learning algorithms in their ability to classify abstracts of published literature results into categories relevant to peptide research, such as related or not related to cancer, angiogenesis, molecular imaging, etc. Ensembles of bagged decision trees met the requirements of our application best. No other algorithm consistently performed better in comparative testing. Moreover, we show that the algorithm produces meaningful class probability estimates, which can be used to visualize the confidence of automatic classification during the retrieval process. To allow viewing long lists of search results enriched by automatic classifications, we added a dynamic heat map to the web interface. We take advantage of community knowledge by enabling users to cast votes in Web 2.0 style in order to correct automated classification errors, which triggers reclassification of all entries. We used a novel framework in which the database "drives" the entire vote aggregation and reclassification process to increase speed while conserving computational resources and keeping the method scalable. In our experiments, we simulate community voting by adding various levels of noise to nearly perfectly labelled instances, and show that, under such conditions, classification can be improved significantly. 
Conclusion Using PepBank as a model database, we show how to build a classification-aided retrieval system that gathers training data from the community, is completely controlled by the database, scales well with concurrent change events, and can be adapted to add text classification capability to other biomedical databases. The system can be accessed at . PMID:19799796
NASA Astrophysics Data System (ADS)
Kodali, Anuradha
In this thesis, we develop dynamic multiple fault diagnosis (DMFD) algorithms to diagnose faults that are sporadic and coupled. Firstly, we formulate a coupled factorial hidden Markov model-based (CFHMM) framework to diagnose dependent faults occurring over time (dynamic case). Here, we implement a mixed memory Markov coupling model to determine the most likely sequence of (dependent) fault states, the one that best explains the observed test outcomes over time. An iterative Gauss-Seidel coordinate ascent optimization method is proposed for solving the problem. A soft Viterbi algorithm is also implemented within the framework for decoding dependent fault states over time. We demonstrate the algorithm on simulated and real-world systems with coupled faults; the results show that this approach improves the correct isolation rate as compared to the formulation where independent fault states are assumed. Secondly, we formulate a generalization of set-covering, termed dynamic set-covering (DSC), which involves a series of coupled set-covering problems over time. The objective of the DSC problem is to infer the most probable time sequence of a parsimonious set of failure sources that explains the observed test outcomes over time. The DSC problem is NP-hard and intractable due to the fault-test dependency matrix that couples the failed tests and faults via the constraint matrix, and the temporal dependence of failure sources over time. Here, the DSC problem is motivated from the viewpoint of a dynamic multiple fault diagnosis problem, but it has wide applications in operations research, e.g., the facility location problem. Thus, we also formulate the DSC problem in the context of a dynamically evolving facility location problem. Here, a facility can be opened, closed, or can be temporarily unavailable at any time for a given requirement of demand points. 
These activities are associated with costs or penalties, viz., phase-in or phase-out for the opening or closing of a facility, respectively. The set-covering matrix encapsulates the relationship among the rows (tests or demand points) and columns (faults or locations) of the system at each time. By relaxing the coupling constraints using Lagrange multipliers, the DSC problem can be decoupled into independent subproblems, one for each column. Each subproblem is solved using the Viterbi decoding algorithm, and a primal feasible solution is constructed by modifying the Viterbi solutions via a heuristic. The proposed Viterbi-Lagrangian relaxation algorithm (VLRA) provides a measure of suboptimality via an approximate duality gap. As a major practical extension of the above problem, we also consider the problem of diagnosing faults with delayed test outcomes, termed delay-dynamic set-covering (DDSC), and experiment with real-world problems that exhibit masking faults. Also, we present simulation results on OR-library datasets (set-covering formulations are predominantly validated on these matrices in the literature), posed as facility location problems. Finally, we implement these algorithms to solve problems in aerospace and automotive applications. Firstly, we address the diagnostic ambiguity problem in aerospace and automotive applications by developing a dynamic fusion framework that includes dynamic multiple fault diagnosis algorithms. This improves the correct fault isolation rate, while minimizing the false alarm rates, by considering multiple faults instead of the traditional data-driven techniques based on single fault (class)-single epoch (static) assumption. The dynamic fusion problem is formulated as a maximum a posteriori decision problem of inferring the fault sequence based on uncertain outcomes of multiple binary classifiers over time. 
The fusion process involves three steps: the first step transforms the multi-class problem into dichotomies using error correcting output codes (ECOC), thereby solving the concomitant binary classification problems; the second step fuses the outcomes of multiple binary classifiers over time using a sliding window or block dynamic fusion method that exploits temporal data correlations over time. We solve this NP-hard optimization problem via a Lagrangian relaxation (variational) technique. The third step optimizes the classifier parameters, viz., probabilities of detection and false alarm, using a genetic algorithm. The proposed algorithm is demonstrated by computing the diagnostic performance metrics on a twin-spool commercial jet engine, an automotive engine, and UCI datasets (problems with high classification error are specifically chosen for experimentation). We show that the primal-dual optimization framework performed consistently better than any traditional fusion technique, even when it is forced to give a single fault decision across a range of classification problems. Secondly, we implement the inference algorithms to diagnose faults in vehicle systems that are controlled by a network of electronic control units (ECUs). The faults, originating from various interactions and especially between hardware and software, are particularly challenging to address. Our basic strategy is to divide the fault universe of such cyber-physical systems in a hierarchical manner, and monitor the critical variables/signals that have impact at different levels of interactions. The proposed diagnostic strategy is validated on an electrical power generation and storage system (EPGS) controlled by two ECUs in an environment with CANoe/MATLAB co-simulation. Eleven faults are injected with the failures originating in actuator hardware, sensor, controller hardware and software components. 
A diagnostic matrix is established via simulations to represent the relationship between the faults and the test outcomes (also known as fault signatures). The results show that the proposed diagnostic strategy is effective in addressing the interaction-caused faults.
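The first fusion step described in this abstract turns a multi-class problem into binary ones through error correcting output codes (ECOC). The following is a minimal single-epoch sketch of ECOC decoding by Hamming distance. The code matrix and the fault names are hypothetical and are not taken from the thesis, which additionally fuses classifier outcomes over time.

```python
# Hypothetical code matrix: one 7-bit codeword per fault class,
# one bit per binary classifier. The minimum pairwise Hamming
# distance is 3, so a single wrong binary decision can be corrected.
CODEWORDS = {
    "fault_A": (0, 0, 0, 0, 0, 0, 0),
    "fault_B": (1, 1, 1, 0, 0, 0, 0),
    "fault_C": (0, 0, 1, 1, 1, 1, 0),
    "fault_D": (1, 0, 0, 1, 1, 0, 1),
}


def hamming(a, b):
    """Number of positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def ecoc_decode(outputs, codewords=CODEWORDS):
    """Return the class whose codeword is closest to the classifier outputs."""
    return min(codewords, key=lambda name: hamming(outputs, codewords[name]))
```

With this matrix, an output vector that differs from the `fault_B` codeword in one bit is still decoded as `fault_B`.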
Single image non-uniformity correction using compressive sensing
NASA Astrophysics Data System (ADS)
Jian, Xian-zhong; Lu, Rui-zhi; Guo, Qiang; Wang, Gui-pu
2016-05-01
A non-uniformity correction (NUC) method for an infrared focal plane array imaging system was proposed. The algorithm, based on compressive sensing (CS) of single image, overcame the disadvantages of "ghost artifacts" and bulk calculating costs in traditional NUC algorithms. A point-sampling matrix was designed to validate the measurements of CS on the time domain. The measurements were corrected using the midway infrared equalization algorithm, and the missing pixels were solved with the regularized orthogonal matching pursuit algorithm. Experimental results showed that the proposed method can reconstruct the entire image with only 25% pixels. A small difference was found between the correction results using 100% pixels and the reconstruction results using 40% pixels. Evaluation of the proposed method on the basis of the root-mean-square error, peak signal-to-noise ratio, and roughness index (ρ) proved the method to be robust and highly applicable.
NASA Astrophysics Data System (ADS)
Mlynarczuk, Mariusz; Skiba, Marta
2017-06-01
The correct and consistent identification of the petrographic properties of coal is an important issue for researchers in the fields of mining and geology. As part of the study described in this paper, investigations concerning the application of artificial intelligence methods for the identification of the aforementioned characteristics were carried out. The methods in question were used to identify the maceral groups of coal, i.e. vitrinite, inertinite, and liptinite. Additionally, an attempt was made to identify some non-organic minerals. The analyses were performed using pattern recognition techniques (NN, kNN), as well as artificial neural network techniques (a multilayer perceptron - MLP). The classification process was carried out using microscopy images of polished sections of coals. A multidimensional feature space was defined, which made it possible to classify the discussed structures automatically, based on the methods of pattern recognition and algorithms of the artificial neural networks. We also assessed how the parameters for which the applied methods proved effective affect the final outcome of the classification procedure. The result of the analyses was a high percentage (over 97%) of correct classifications of maceral groups and mineral components. The paper also discusses an attempt to analyze particular macerals of the inertinite group. It was demonstrated that using artificial neural networks to this end makes it possible to classify the macerals properly in over 91% of cases. Thus, it was proved that artificial intelligence methods can be successfully applied for the identification of selected petrographic features of coal.
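As an illustration of the kNN technique named in this abstract, here is a minimal sketch of a k-nearest-neighbour classifier with majority voting. The two-dimensional feature vectors are invented for the example; the abstract does not list the features of its multidimensional feature space.

```python
import math
from collections import Counter


def knn_predict(training, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `training` is a list of (feature_vector, label) pairs.
    """
    by_distance = sorted(training, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical image features for three maceral groups.
TRAINING = [
    ((0.9, 0.1), "vitrinite"),
    ((0.8, 0.2), "vitrinite"),
    ((0.2, 0.9), "inertinite"),
    ((0.1, 0.8), "inertinite"),
    ((0.5, 0.5), "liptinite"),
]
```

Setting `k=1` gives the plain nearest-neighbour (NN) rule.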
Surkis, Alisa; Hogle, Janice A; DiazGranados, Deborah; Hunt, Joe D; Mazmanian, Paul E; Connors, Emily; Westaby, Kate; Whipple, Elizabeth C; Adamus, Trisha; Mueller, Meridith; Aphinyanaphongs, Yindalon
2016-08-05
Translational research is a key area of focus of the National Institutes of Health (NIH), as demonstrated by the substantial investment in the Clinical and Translational Science Award (CTSA) program. The goal of the CTSA program is to accelerate the translation of discoveries from the bench to the bedside and into communities. Different classification systems have been used to capture the spectrum of basic to clinical to population health research, with substantial differences in the number of categories and their definitions. Evaluation of the effectiveness of the CTSA program and of translational research in general is hampered by the lack of rigor in these definitions and their application. This study adds rigor to the classification process by creating a checklist to evaluate publications across the translational spectrum and operationalizes these classifications by building machine learning-based text classifiers to categorize these publications. Based on collaboratively developed definitions, we created a detailed checklist for categories along the translational spectrum from T0 to T4. We applied the checklist to CTSA-linked publications to construct a set of coded publications for use in training machine learning-based text classifiers to classify publications within these categories. The training sets combined T1/T2 and T3/T4 categories due to low frequency of these publication types compared to the frequency of T0 publications. We then compared classifier performance across different algorithms and feature sets and applied the classifiers to all publications in PubMed indexed to CTSA grants. To validate the algorithm, we manually classified the articles with the top 100 scores from each classifier. The definitions and checklist facilitated classification and resulted in good inter-rater reliability for coding publications for the training set. 
Very good performance was achieved for the classifiers as represented by the area under the receiver operating curves (AUC), with an AUC of 0.94 for the T0 classifier, 0.84 for T1/T2, and 0.92 for T3/T4. The combination of definitions agreed upon by five CTSA hubs, a checklist that facilitates more uniform definition interpretation, and algorithms that perform well in classifying publications along the translational spectrum provide a basis for establishing and applying uniform definitions of translational research categories. The classification algorithms allow publication analyses that would not be feasible with manual classification, such as assessing the distribution and trends of publications across the CTSA network and comparing the categories of publications and their citations to assess knowledge transfer across the translational research spectrum.
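The AUC values reported here can be read as the probability that a randomly chosen positive publication receives a higher score than a randomly chosen negative one. A minimal sketch of that pairwise computation follows; the scores and labels in the usage note are made up for illustration.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the
    positive item has the higher score; ties count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative item")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

For scores `[0.9, 0.8, 0.7, 0.6, 0.55, 0.4]` with labels `[1, 1, 0, 1, 0, 0]`, eight of the nine positive-negative pairs are ordered correctly, so the function returns 8/9.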
Minimalist ensemble algorithms for genome-wide protein localization prediction.
Lin, Jhih-Rong; Mondal, Ananda Mohan; Liu, Rong; Hu, Jianjun
2012-07-03
Computational prediction of protein subcellular localization can greatly help to elucidate its functions. Despite the existence of dozens of protein localization prediction algorithms, the prediction accuracy and coverage are still low. Several ensemble algorithms have been proposed to improve the prediction performance, which usually include as many as 10 or more individual localization algorithms. However, their performance is still limited by the running complexity and redundancy among individual prediction algorithms. This paper proposed a novel method for rational design of minimalist ensemble algorithms for practical genome-wide protein subcellular localization prediction. The algorithm is based on combining a feature selection based filter and a logistic regression classifier. Using a novel concept of contribution scores, we analyzed issues of algorithm redundancy, consensus mistakes, and algorithm complementarity in designing ensemble algorithms. We applied the proposed minimalist logistic regression (LR) ensemble algorithm to two genome-wide datasets of Yeast and Human and compared its performance with current ensemble algorithms. Experimental results showed that the minimalist ensemble algorithm can achieve high prediction accuracy with only 1/3 to 1/2 of individual predictors of current ensemble algorithms, which greatly reduces computational complexity and running time. It was found that the high performance ensemble algorithms are usually composed of the predictors that together cover most of available features. Compared to the best individual predictor, our ensemble algorithm improved the prediction accuracy from AUC score of 0.558 to 0.707 for the Yeast dataset and from 0.628 to 0.646 for the Human dataset. Compared with popular weighted voting based ensemble algorithms, our classifier-based ensemble algorithms achieved much better performance without suffering from inclusion of too many individual predictors. 
We proposed a method for rational design of minimalist ensemble algorithms using feature selection and classifiers. The proposed minimalist ensemble algorithm based on logistic regression can achieve equal or better prediction performance while using only half or one-third of individual predictors compared to other ensemble algorithms. The results also suggested that meta-predictors that take advantage of a variety of features by combining individual predictors tend to achieve the best performance. The LR ensemble server and related benchmark datasets are available at http://mleg.cse.sc.edu/LRensemble/cgi-bin/predict.cgi. PMID:22759391
Geometric and shading correction for images of printed materials using boundary.
Brown, Michael S; Tsoi, Yau-Chat
2006-06-01
A novel technique that uses boundary interpolation to correct geometric distortion and shading artifacts present in images of printed materials is presented. Unlike existing techniques, our algorithm can simultaneously correct a variety of geometric distortions, including skew, fold distortion, binder curl, and combinations of these. In addition, the same interpolation framework can be used to estimate the intrinsic illumination component of the distorted image to correct shading artifacts. We detail our algorithm for geometric and shading correction and demonstrate its usefulness on real-world and synthetic data.
Age determination of bottled Chinese rice wine by VIS-NIR spectroscopy
NASA Astrophysics Data System (ADS)
Yu, Haiyan; Lin, Tao; Ying, Yibin; Pan, Xingxiang
2006-10-01
The feasibility of non-invasive visible and near infrared (VIS-NIR) spectroscopy for determining wine age (1, 2, 3, 4, and 5 years) of Chinese rice wine was investigated. Samples of Chinese rice wine were analyzed in 600 mL square brown glass bottles with side length of approximately 64 mm at room temperature. VIS-NIR spectra of 100 bottled Chinese rice wine samples were collected in transmission mode in the wavelength range of 350-1200 nm by a fiber spectrometer system. Discriminant models were developed based on discriminant analysis (DA) together with raw, first and second derivative spectra. The alcoholic degree, total acid, and °Brix were determined to validate the NIR results. The calibration result for raw spectra was better than that for first and second derivative spectra. The percentage of samples correctly classified for raw spectra was 98%. For the 1-, 2-, and 3-year-old sample groups, the samples were all correctly classified, and for the 4- and 5-year-old sample groups, the percentage of samples correctly classified was 92.9% for each. In validation analysis, the percentage of samples correctly classified was 100%. The results demonstrated that the VIS-NIR spectroscopic technique could be used as a non-invasive, rapid and reliable method for predicting wine age of bottled Chinese rice wine.
Localization Algorithms of Underwater Wireless Sensor Networks: A Survey
Han, Guangjie; Jiang, Jinfang; Shu, Lei; Xu, Yongjun; Wang, Feng
2012-01-01
In Underwater Wireless Sensor Networks (UWSNs), localization is one of the most important technologies since it plays a critical role in many applications. Motivated by the widespread adoption of localization, in this paper, we present a comprehensive survey of localization algorithms. First, we classify localization algorithms into three categories based on sensor nodes’ mobility: stationary localization algorithms, mobile localization algorithms and hybrid localization algorithms. Moreover, we compare the localization algorithms in detail and analyze future research directions of localization algorithms in UWSNs. PMID:22438752
NASA Astrophysics Data System (ADS)
Muslim, M. A.; Herowati, A. J.; Sugiharti, E.; Prasetiyo, B.
2018-03-01
Data mining is a technique for extracting valuable, previously unknown patterns hidden in data collections that are too large to inspect manually. Data mining has been applied in the healthcare industry. One data mining technique is classification. Decision trees are a classification method, and C4.5 is a decision tree algorithm. A classifier for diagnosing chronic kidney disease is designed by applying pessimistic pruning in the C4.5 algorithm. Pessimistic pruning is used to identify and remove branches that are not needed, in order to avoid overfitting of the decision tree generated by the C4.5 algorithm. In this paper, the results obtained using these classifiers are presented and discussed. Pessimistic pruning increases the accuracy of the C4.5 algorithm by 1.5 percentage points, from 95% to 96.5%, in diagnosing chronic kidney disease.
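The pruning decision mentioned in this abstract can be sketched with the normal-approximation form of the pessimistic error estimate found in textbook descriptions of C4.5. This is an assumption about the general technique, not the authors' exact implementation; the value z = 0.69 corresponds to the commonly quoted default confidence level of 25%, and the leaf counts in the usage note are illustrative.

```python
import math

Z = 0.69  # assumed normal deviate for a 25% confidence level


def pessimistic_error(errors, n, z=Z):
    """Upper confidence bound on the error rate of a node that
    misclassifies `errors` of its `n` training instances."""
    f = errors / n
    spread = z * math.sqrt(f / n - f * f / n + z * z / (4 * n * n))
    return (f + z * z / (2 * n) + spread) / (1 + z * z / n)


def should_prune(parent, children):
    """Decide whether to replace a subtree by a single leaf.

    `parent` is the (errors, n) pair the node would have as a leaf;
    `children` is a list of (errors, n) pairs for its current leaves.
    Prune when the leaf estimate is no worse than the weighted
    estimate of the subtree.
    """
    total = sum(n for _, n in children)
    subtree = sum(n * pessimistic_error(e, n) for e, n in children) / total
    return pessimistic_error(*parent) <= subtree
```

For a node with 5 errors in 14 instances whose three leaves have (2, 6), (1, 2) and (2, 6) errors and instances, the leaf estimate is lower than the subtree estimate, so `should_prune((5, 14), [(2, 6), (1, 2), (2, 6)])` returns `True`.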
Yu, Hualong; Hong, Shufang; Yang, Xibei; Ni, Jun; Dan, Yuanyuan; Qin, Bin
2013-01-01
DNA microarray technology can measure the activities of tens of thousands of genes simultaneously, which provides an efficient way to diagnose cancer at the molecular level. Although this strategy has attracted significant research attention, most studies neglect an important problem, namely, that most DNA microarray datasets are skewed, which causes traditional learning algorithms to produce inaccurate results. Some studies have considered this problem, yet they focus only on binary-class problems. In this paper, we address the multiclass imbalanced classification problem, as encountered in cancer DNA microarray data, by using ensemble learning. We utilized a one-against-all coding strategy to transform the multiclass problem into multiple binary-class problems, each of them carrying out feature subspace, which is an evolving version of random subspace that generates multiple diverse training subsets. Next, we introduced one of two different correction technologies, namely, decision threshold adjustment or random undersampling, into each training subset to alleviate the damage of class imbalance. Specifically, a support vector machine was used as the base classifier, and a novel voting rule called counter voting was presented for making a final decision. Experimental results on eight skewed multiclass cancer microarray datasets indicate that unlike many traditional classification approaches, our methods are insensitive to class imbalance.
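The combination of one-against-all coding with random undersampling described in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions: the function name and the toy labels are hypothetical, and the feature subspace step, the SVM base classifier and the counter voting rule are not modelled.

```python
import random


def one_vs_all_undersample(labels, positive_class, rng):
    """Build a balanced binary training subset for one class.

    Samples of `positive_class` get binary label 1 and all others get 0;
    the larger side is randomly undersampled to the size of the smaller.
    Returns a list of (sample_index, binary_label) pairs.
    """
    positives = [i for i, y in enumerate(labels) if y == positive_class]
    negatives = [i for i, y in enumerate(labels) if y != positive_class]
    size = min(len(positives), len(negatives))
    kept_pos = rng.sample(positives, size)
    kept_neg = rng.sample(negatives, size)
    return [(i, 1) for i in kept_pos] + [(i, 0) for i in kept_neg]


# Hypothetical skewed label set: 2 samples of A, 10 of B, 6 of C.
LABELS = ["A"] * 2 + ["B"] * 10 + ["C"] * 6
```

Calling `one_vs_all_undersample(LABELS, "A", random.Random(0))` keeps both samples of class A and draws two of the sixteen remaining samples as negatives. Passing a seeded `random.Random` makes the draw reproducible.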
NASA Astrophysics Data System (ADS)
Raziff, Abdul Rafiez Abdul; Sulaiman, Md Nasir; Mustapha, Norwati; Perumal, Thinagaran
2017-10-01
Gait recognition is widely used in many applications. In gait-based identification of people, the number of classes (people) is large and may exceed 20. Because of the large number of classes, a single classification mapping (direct classification) may not be suitable, as most existing algorithms are designed for binary classification. Furthermore, having many classes in a dataset may result in a high degree of overlap between class boundaries. This paper discusses the application of multiclass classifier mappings such as one-vs-all (OvA), one-vs-one (OvO) and random correction code (RCC) to handheld smartphone gait signals for person identification. The results are then compared with a single J48 decision tree as a benchmark. The results indicate that multiclass classification mapping partially improves the overall accuracy, especially for OvO and for RCC with a width factor greater than 4. For OvA, the accuracy is worse than that of a single J48 due to the high number of classes.
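To make the OvO mapping concrete: with K classes it trains one binary classifier per pair of classes, K(K-1)/2 in total (190 for 20 people), whereas OvA trains K. The sketch below shows the pairing and the majority vote. The `decide` callback stands in for a trained binary classifier and the toy decider is purely hypothetical.

```python
from collections import Counter
from itertools import combinations


def ovo_pairs(classes):
    """All unordered class pairs; one binary classifier per pair."""
    return list(combinations(classes, 2))


def ovo_predict(classes, decide):
    """Majority vote over all pairwise duels.

    `decide(a, b)` must return the winner, either a or b.
    """
    votes = Counter(decide(a, b) for a, b in ovo_pairs(classes))
    return votes.most_common(1)[0][0]


def make_toy_decider(x):
    """Hypothetical stand-in for trained classifiers: each class is an
    integer id and the class id closer to the value x wins the duel."""
    def decide(a, b):
        return a if abs(a - x) <= abs(b - x) else b
    return decide
```

With 20 class ids `0..19` and the toy decider built for `x = 7.2`, class 7 wins all 19 of its duels and is returned by `ovo_predict`.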
Abbot, Ted A; Premus, Vincent E; Abbot, Philip A; Mayer, Owen A
2012-09-01
This paper presents recent experimental results and a discussion of system enhancements made to the real-time autonomous humpback whale detector-classifier algorithm first presented by Abbot et al. [J. Acoust. Soc. Am. 127, 2894-2903 (2010)]. In February 2010, a second-generation system was deployed in an experiment conducted off leeward Kauai during which 26 h of humpback vocalizations were recorded via sonobuoy and processed in real time. These data have been analyzed along with 40 h of humpbacks-absent data collected from the same location during July-August 2009. The extensive whales-absent data set in particular has enabled the quantification of system false alarm rates and the measurement of receiver operating characteristic curves. The performance impact of three enhancements incorporated into the second-generation system is discussed, including (1) a method to eliminate redundancy in the kernel library, (2) increased use of contextual analysis, and (3) the augmentation of the training data with more recent humpback vocalizations. It will be shown that the performance of the real-time system was improved to yield a probability of correct classification of 0.93 and a probability of false alarm of 0.004 over the 66 h of independent test data.
Automatic tissue characterization from ultrasound imagery
NASA Astrophysics Data System (ADS)
Kadah, Yasser M.; Farag, Aly A.; Youssef, Abou-Bakr M.; Badawi, Ahmed M.
1993-08-01
In this work, feature extraction algorithms are proposed to extract the tissue characterization parameters from liver images. Then the resulting parameter set is further processed to obtain the minimum number of parameters representing the most discriminating pattern space for classification. This preprocessing step was applied to over 120 pathology-investigated cases to obtain the learning data for designing the classifier. The extracted features are divided into independent training and test sets and are used to construct both statistical and neural classifiers. The optimal criteria for these classifiers are set to have minimum error, ease of implementation and learning, and the flexibility for future modifications. Various algorithms for implementing various classification techniques are presented and tested on the data. The best performance was obtained using a single layer tensor model functional link network. Also, the voting k-nearest neighbor classifier provided comparably good diagnostic rates.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Xi; Mou, Xuanqin; Nishikawa, Robert M.
Purpose: Small calcifications are often the earliest and the main indicator of breast cancer. Dual-energy digital mammography (DEDM) has been considered as a promising technique to improve the detectability of calcifications since it can be used to suppress the contrast between adipose and glandular tissues of the breast. X-ray scatter leads to erroneous calculations of the DEDM image. Although the pinhole-array interpolation method can estimate scattered radiations, it requires extra exposures to measure the scatter and apply the correction. The purpose of this work is to design an algorithmic method for scatter correction in DEDM without extra exposures. Methods: In this paper, a scatter correction method for DEDM was developed based on the knowledge that scattered radiation has small spatial variation and that the majority of pixels in a mammogram are noncalcification pixels. The scatter fraction was estimated in the DEDM calculation and the measured scatter fraction was used to remove scatter from the image. The scatter correction method was implemented on a commercial full-field digital mammography system with breast tissue equivalent phantom and calcification phantom. The authors also implemented the pinhole-array interpolation scatter correction method on the system. Phantom results for both methods are presented and discussed. The authors compared the background DE calcification signals and the contrast-to-noise ratio (CNR) of calcifications in the three DE calcification images: image without scatter correction, image with scatter correction using pinhole-array interpolation method, and image with scatter correction using the authors' algorithmic method. Results: The authors' results show that the resultant background DE calcification signal can be reduced. The root-mean-square of background DE calcification signal of 1962 μm with scatter-uncorrected data was reduced to 194 μm after scatter correction using the authors' algorithmic method. 
The range of background DE calcification signals using scatter-uncorrected data was reduced by 58% with scatter-corrected data by algorithmic method. With the scatter-correction algorithm and denoising, the minimum visible calcification size can be reduced from 380 to 280 μm. Conclusions: When applying the proposed algorithmic scatter correction to images, the resultant background DE calcification signals can be reduced and the CNR of calcifications can be improved. This method has similar or even better performance than pinhole-array interpolation method in scatter correction for DEDM; moreover, this method is convenient and requires no extra exposure to the patient. Although the proposed scatter correction method is effective, it is validated by a 5-cm-thick phantom with calcifications and homogeneous background. The method should be tested on structured backgrounds to more accurately gauge effectiveness.
A novel approach for dimension reduction of microarray.
Aziz, Rabia; Verma, C K; Srivastava, Namita
2017-12-01
This paper proposes a new hybrid search technique for feature (gene) selection (FS) using Independent Component Analysis (ICA) and Artificial Bee Colony (ABC), called ICA+ABC, to select informative genes based on a Naïve Bayes (NB) algorithm. An important trait of this technique is the optimization of the ICA feature vector using ABC. ICA+ABC is a hybrid search algorithm that combines the benefits of the extraction approach, to reduce the size of the data, and the wrapper approach, to optimize the reduced feature vectors. The technique is evaluated on six standard gene expression classification datasets. Extensive experiments were conducted to compare the performance of ICA+ABC with the results obtained from the recently published Minimum Redundancy Maximum Relevance (mRMR)+ABC algorithm for the NB classifier. In addition, to assess how ICA+ABC performs as a feature selection method with the NB classifier, it was compared with combinations of ICA with popular filter techniques and with other similar bio-inspired algorithms such as Genetic Algorithm (GA) and Particle Swarm Optimization (PSO). The results show that ICA+ABC has a significant ability to generate small subsets of genes from the ICA feature vector that significantly improve the classification accuracy of the NB classifier compared to other previously suggested methods.
NASA Astrophysics Data System (ADS)
Bassa, Zaakirah; Bob, Urmilla; Szantoi, Zoltan; Ismail, Riyad
2016-01-01
In recent years, the popularity of tree-based ensemble methods for land cover classification has increased significantly. Using WorldView-2 image data, we evaluate the potential of the oblique random forest algorithm (oRF) to classify a highly heterogeneous protected area. In contrast to the random forest (RF) algorithm, the oRF algorithm builds multivariate trees by learning the optimal split using a supervised model. The oRF binary algorithm is adapted to a multiclass land cover and land use application using both the "one-against-one" and "one-against-all" combination approaches. Results show that the oRF algorithms are capable of achieving high classification accuracies (>80%). However, there was no statistical difference in classification accuracies obtained by the oRF algorithms and the more popular RF algorithm. For all the algorithms, user accuracies (UAs) and producer accuracies (PAs) >80% were recorded for most of the classes. Both the RF and oRF algorithms poorly classified the indigenous forest class as indicated by the low UAs and PAs. Finally, the results from this study advocate and support the utility of the oRF algorithm for land cover and land use mapping of protected areas using WorldView-2 image data.
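User's and producer's accuracies, as reported in this abstract, are per-class ratios read off a confusion matrix. The sketch below assumes rows are reference (ground truth) classes and columns are predicted classes; the three class names and all counts are invented for illustration.

```python
def accuracy_report(matrix, class_names):
    """Per-class producer's and user's accuracy plus overall accuracy.

    `matrix[i][j]` counts reference class i samples predicted as class j.
    Producer's accuracy = correct / reference total (row sum).
    User's accuracy     = correct / predicted total (column sum).
    """
    size = len(class_names)
    row_sums = [sum(matrix[i]) for i in range(size)]
    col_sums = [sum(matrix[i][j] for i in range(size)) for j in range(size)]
    correct = sum(matrix[i][i] for i in range(size))
    report = {
        name: {
            "producer": matrix[i][i] / row_sums[i],
            "user": matrix[i][i] / col_sums[i],
        }
        for i, name in enumerate(class_names)
    }
    report["overall"] = correct / sum(row_sums)
    return report


# Hypothetical confusion matrix for three land cover classes.
CLASSES = ["forest", "grassland", "water"]
MATRIX = [
    [45, 3, 2],
    [5, 35, 10],
    [0, 2, 48],
]
```

In this example the grassland class has a producer's accuracy of 35/50 = 0.70 but a user's accuracy of 35/40 = 0.875, which shows why the two measures are reported separately.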
Using Gaussian mixture models to detect and classify dolphin whistles and pulses.
Peso Parada, Pablo; Cardenal-López, Antonio
2014-06-01
In recent years, a number of automatic detection systems for free-ranging cetaceans have been proposed that aim to detect not just surfaced, but also submerged, individuals. These systems are typically based on pattern-recognition techniques applied to underwater acoustic recordings. Using a Gaussian mixture model, a classification system was developed that detects sounds in recordings and classifies them as one of four types: background noise, whistles, pulses, and combined whistles and pulses. The classifier was tested using a database of underwater recordings made off the Spanish coast during 2011. Using cepstral-coefficient-based parameterization, a sound detection rate of 87.5% was achieved for a 23.6% classification error rate. To improve these results, two parameters computed using the multiple signal classification algorithm and an unpredictability measure were included in the classifier. These parameters, which helped to classify the segments containing whistles, increased the detection rate to 90.3% and reduced the classification error rate to 18.1%. Finally, the potential of the multiple signal classification algorithm and unpredictability measure for estimating whistle contours and classifying cetacean species was also explored, with promising results.
Accurate determination of imaging modality using an ensemble of text- and image-based classifiers.
Kahn, Charles E; Kalpathy-Cramer, Jayashree; Lam, Cesar A; Eldredge, Christina E
2012-02-01
Imaging modality can aid retrieval of medical images for clinical practice, research, and education. We evaluated whether an ensemble classifier could outperform its constituent individual classifiers in determining the modality of figures from radiology journals. Seventeen automated classifiers analyzed 77,495 images from two radiology journals. Each classifier assigned one of eight imaging modalities (computed tomography, graphic, magnetic resonance imaging, nuclear medicine, positron emission tomography, photograph, ultrasound, or radiograph) to each image based on visual and/or textual information. Three physicians determined the modality of 5,000 randomly selected images as a reference standard. A "Simple Vote" ensemble classifier assigned each image to the modality that received the greatest number of individual classifiers' votes. A "Weighted Vote" classifier weighted each individual classifier's vote based on performance over a training set. For each image, this classifier's output was the imaging modality that received the greatest weighted vote score. We measured precision, recall, and F score (the harmonic mean of precision and recall) for each classifier. Individual classifiers' F scores ranged from 0.184 to 0.892. The simple vote and weighted vote classifiers correctly assigned 4,565 images (F score, 0.913; 95% confidence interval, 0.905-0.921) and 4,672 images (F score, 0.934; 95% confidence interval, 0.927-0.941), respectively. The weighted vote classifier performed significantly better than all individual classifiers. An ensemble classifier correctly determined the imaging modality of 93% of figures in our sample. The imaging modality of figures published in radiology journals can be determined with high accuracy, which will improve systems for image retrieval.
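The "Simple Vote" and "Weighted Vote" rules and the F score described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the example votes, and the weights are assumptions, and the abstract does not say how the weights were derived beyond "performance over a training set".

```python
from collections import Counter, defaultdict


def simple_vote(votes):
    """Return the label chosen by the largest number of classifiers."""
    return Counter(votes).most_common(1)[0][0]


def weighted_vote(votes, weights):
    """Return the label with the largest summed classifier weight."""
    scores = defaultdict(float)
    for label, weight in zip(votes, weights):
        scores[label] += weight
    return max(scores, key=scores.get)


def f_score(precision, recall):
    """Harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: three classifiers label one figure.
votes = ["CT", "MRI", "MRI"]
weights = [0.9, 0.3, 0.4]  # assumed per-classifier training-set performance

print(simple_vote(votes))             # majority label
print(weighted_vote(votes, weights))  # label with the highest total weight
```

In this toy case the two rules disagree: two weak classifiers outvote one strong classifier under the simple rule, but not under the weighted rule, which is the situation weighting is meant to handle.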
WND-CHARM: Multi-purpose image classification using compound image transforms
Orlov, Nikita; Shamir, Lior; Macura, Tomasz; Johnston, Josiah; Eckley, D. Mark; Goldberg, Ilya G.
2008-01-01
We describe a multi-purpose image classifier that can be applied to a wide variety of image classification tasks without modifications or fine-tuning, and yet provide classification accuracy comparable to state-of-the-art task-specific image classifiers. The proposed image classifier first extracts a large set of 1025 image features including polynomial decompositions, high contrast features, pixel statistics, and textures. These features are computed on the raw image, transforms of the image, and transforms of transforms of the image. The feature values are then used to classify test images into a set of pre-defined image classes. This classifier was tested on several different problems including biological image classification and face recognition. Although we cannot make a claim of universality, our experimental results show that this classifier performs as well or better than classifiers developed specifically for these image classification tasks. Our classifier’s high performance on a variety of classification problems is attributed to (i) a large set of features extracted from images; and (ii) an effective feature selection and weighting algorithm sensitive to specific image classification problems. The algorithms are available for free download from openmicroscopy.org. PMID:18958301
NASA Astrophysics Data System (ADS)
Meng, Qingxin; Hu, Xiangyun; Pan, Heping; Xi, Yufei
2018-04-01
We propose an algorithm for calculating all-time apparent resistivity from transient electromagnetic induction logging. The algorithm is based on the whole-space transient electric field expression of the uniform model and Halley's optimisation. In trial calculations for uniform models, the all-time algorithm is shown to have high accuracy. We use the finite-difference time-domain method to simulate the transient electromagnetic field in radial two-layer models without wall rock and convert the simulation results to apparent resistivity using the all-time algorithm. The time-varying apparent resistivity reflects the radially layered geoelectrical structure of the models and the apparent resistivity of the earliest time channel follows the true resistivity of the inner layer; however, the apparent resistivity at larger times reflects the comprehensive electrical characteristics of the inner and outer layers. To accurately identify the outer layer resistivity based on the series relationship model of the layered resistance, the apparent resistivity and diffusion depth of the different time channels are approximately replaced by related model parameters; that is, we propose an apparent resistivity correction algorithm. By correcting the time-varying apparent resistivity of radial two-layer models, we show that the correction results reflect the radially layered electrical structure and the corrected resistivities of the larger time channels follow the outer layer resistivity. The transient electromagnetic fields of radially layered models with wall rock are simulated to obtain the 2D time-varying profiles of the apparent resistivity and corrections. The results suggest that the time-varying apparent resistivity and correction results reflect the vertical and radial geoelectrical structures. For models with small wall-rock effect, the correction removes the effect of the low-resistance inner layer on the apparent resistivity of the larger time channels.
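The abstract above relies on Halley's iteration to invert a field expression for resistivity. As a rough sketch of the iteration itself, the update is x_new = x - 2 f f' / (2 f'^2 - f f''). The example below applies it to a toy cubic, which is an assumption made purely for illustration; the whole-space transient field expression from the paper is not reproduced here, and the function name halley is hypothetical.

```python
def halley(f, df, d2f, x0, tol=1e-12, max_iter=50):
    """Find a root of f by Halley's method, given f' (df) and f'' (d2f)."""
    x = x0
    for _ in range(max_iter):
        fx, dfx, d2fx = f(x), df(x), d2f(x)
        denom = 2 * dfx * dfx - fx * d2fx
        if denom == 0:
            raise ZeroDivisionError("Halley update undefined at x = %r" % x)
        step = 2 * fx * dfx / denom
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Halley iteration did not converge")


# Toy stand-in for "measured value minus model(resistivity) = 0":
# solve x**3 - 2 = 0, whose root is the cube root of 2.
root = halley(lambda x: x ** 3 - 2,
              lambda x: 3 * x ** 2,
              lambda x: 6 * x,
              x0=1.0)
print(root)
```

Halley's method converges cubically near a simple root, at the cost of needing the second derivative, which is practical when the model has a closed-form expression.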
NASA Technical Reports Server (NTRS)
Wang, Menghua
2003-01-01
The primary focus of this proposed research is the evaluation and development of atmospheric correction algorithms and satellite sensor calibration and characterization. It is well known that atmospheric correction, which removes the more than 90% of the sensor-measured signal contributed by the atmosphere in the visible, is the key procedure in ocean color remote sensing (Gordon and Wang, 1994). The accuracy and effectiveness of the atmospheric correction directly affect the remotely retrieved ocean bio-optical products. On the other hand, for ocean color remote sensing, in order to obtain the required accuracy in the derived water-leaving signals from satellite measurements, an on-orbit vicarious calibration of the whole system, i.e., sensor and algorithms, is necessary. In addition, it is important to address issues of (i) cross-calibration of two or more sensors and (ii) in-orbit vicarious calibration of the sensor-atmosphere system. The goal of this research is to develop methods for meaningful comparison and possible merging of data products from multiple ocean color missions. In the past year, much effort has been devoted to (a) understanding and correcting the artifacts that appeared in the SeaWiFS-derived ocean and atmospheric products; (b) developing an efficient method for generating the SeaWiFS aerosol lookup tables; (c) evaluating the effects of calibration error in the near-infrared (NIR) band on the atmospheric correction of ocean color remote sensors; (d) comparing the aerosol correction algorithm using the single-scattering epsilon (the current SeaWiFS algorithm) vs. the multiple-scattering epsilon method; and (e) continuing activities for the International Ocean-Color Coordinating Group (IOCCG) atmospheric correction working group. In this report, I will briefly present and discuss these and some other research activities.
Online boosting for vehicle detection.
Chang, Wen-Chung; Cho, Chih-Wei
2010-06-01
This paper presents a real-time vision-based vehicle detection system employing an online boosting algorithm. It is an online AdaBoost approach for a cascade of strong classifiers instead of a single strong classifier. Most existing cascades of classifiers must be trained offline and cannot effectively be updated when online tuning is required. The idea is to develop a cascade of strong classifiers for vehicle detection that is capable of being online trained in response to changing traffic environments. To make the online algorithm tractable, the proposed system must efficiently tune parameters based on incoming images and up-to-date performance of each weak classifier. The proposed online boosting method can improve system adaptability and accuracy to deal with novel types of vehicles and unfamiliar environments, whereas existing offline methods rely much more on extensive training processes to reach comparable results and cannot further be updated online. Our approach has been successfully validated in real traffic environments by performing experiments with an onboard charge-coupled-device camera in a roadway vehicle.
Classification of ligand molecules in PDB with fast heuristic graph match algorithm COMPLIG.
Saito, Mihoko; Takemura, Naomi; Shirai, Tsuyoshi
2012-12-14
A fast heuristic graph-matching algorithm, COMPLIG, was devised to classify the small-molecule ligands in the Protein Data Bank (PDB), which are currently not properly classified on a structural basis. By concurrently classifying proteins and ligands, we determined the most appropriate parameter for categorizing ligands to be more than 60% identity of atoms and bonds between molecules, and we classified 11,585 types of ligands into 1946 clusters. Although the large clusters were composed of nucleotides or amino acids, a significant presence of drug compounds was also observed. Application of the system to classify the natural ligand status of human proteins in the current database suggested that, at most, 37% of the experimental structures of human proteins were in complex with natural ligands. However, protein homology- and/or ligand similarity-based modeling was implied to provide models of natural interactions for an additional 28% of the total, which might be used to increase the knowledge of intrinsic protein-metabolite interactions. Copyright © 2012 Elsevier Ltd. All rights reserved.
Classifier fusion for VoIP attacks classification
NASA Astrophysics Data System (ADS)
Safarik, Jakub; Rezac, Filip
2017-05-01
SIP is one of the most successful protocols in the field of IP telephony communication. It establishes and manages VoIP calls. As the number of SIP implementations rises, we can expect a higher number of attacks on the communication system in the near future. This work aims at classifying malicious SIP traffic. A number of machine learning algorithms have been developed for attack classification. The paper presents a comparison of current research and the use of a classifier fusion method, leading to a potential decrease in classification error rate. Using a combination of classifiers yields a more robust solution without the difficulties that may affect single algorithms. Different voting schemes, combination rules, and classifiers are discussed to improve the overall performance. All classifiers have been trained on real malicious traffic. The concept of traffic monitoring depends on the network of honeypot nodes. These honeypots run in several networks spread across different locations. Separation of honeypots allows us to gain independent and trustworthy attack information.
NASA Astrophysics Data System (ADS)
da Silva, Flávio Altinier Maximiano; Pedrini, Helio
2015-03-01
Facial expressions are an important demonstration of humanity's humors and emotions. Algorithms capable of recognizing facial expressions and associating them with emotions were developed and employed to compare the expressions that different cultural groups use to show their emotions. Static pictures of predominantly occidental and oriental subjects from public datasets were used to train machine learning algorithms, whereas local binary patterns, histogram of oriented gradients (HOGs), and Gabor filters were employed to describe the facial expressions for six different basic emotions. The most consistent combination, formed by the association of HOG filter and support vector machines, was then used to classify the other cultural group: there was a strong drop in accuracy, meaning that the subtle differences of facial expressions of each culture affected the classifier performance. Finally, a classifier was trained with images from both occidental and oriental subjects and its accuracy was higher on multicultural data, evidencing the need of a multicultural training set to build an efficient classifier.
Cao, Qi; Leung, K M
2014-09-22
Reliable computer models for the prediction of chemical biodegradability from molecular descriptors and fingerprints are very important for making health and environmental decisions. Coupling of the differential evolution (DE) algorithm with the support vector classifier (SVC) in order to optimize the main parameters of the classifier resulted in an improved classifier called the DE-SVC, which is introduced in this paper for use in chemical biodegradability studies. The DE-SVC was applied to predict the biodegradation of chemicals on the basis of extensive sample data sets and known structural features of molecules. Our optimization experiments showed that DE can efficiently find the proper parameters of the SVC. The resulting classifier possesses strong robustness and reliability compared with grid search, genetic algorithm, and particle swarm optimization methods. The classification experiments conducted here showed that the DE-SVC exhibits better classification performance than models previously used for such studies. It is a more effective and efficient prediction model for chemical biodegradability.
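To make the DE-SVC idea above concrete, here is a minimal sketch of differential evolution (the common rand/1/bin variant) searching two classifier parameters. Everything specific is an assumption for illustration: the abstract does not state which DE variant or settings were used, and toy_error is a made-up smooth bowl standing in for the cross-validated error of an SVC over (log C, log gamma), so no real classifier is trained here.

```python
import random


def differential_evolution(objective, bounds, pop_size=20, f=0.6, cr=0.9,
                           generations=200, seed=0):
    """Minimise objective over box bounds with DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct individuals, all different from the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one gene comes from the mutant
            trial = []
            for k, (lo, hi) in enumerate(bounds):
                if k == forced or rng.random() < cr:
                    value = pop[a][k] + f * (pop[b][k] - pop[c][k])
                    value = min(max(value, lo), hi)  # clamp to the search box
                else:
                    value = pop[i][k]
                trial.append(value)
            trial_score = objective(trial)
            if trial_score <= scores[i]:  # greedy one-to-one selection
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def toy_error(params):
    """Hypothetical stand-in for SVC validation error; minimum at (1, -2)."""
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2


best_params, best_error = differential_evolution(toy_error, [(-3, 3), (-5, 1)])
print(best_params, best_error)
```

In a real setting the objective would train and cross-validate the classifier for each candidate parameter pair, which makes every evaluation expensive; that cost is why the abstract's comparison against grid search and other optimizers matters.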
An improved non-uniformity correction algorithm and its GPU parallel implementation
NASA Astrophysics Data System (ADS)
Cheng, Kuanhong; Zhou, Huixin; Qin, Hanlin; Zhao, Dong; Qian, Kun; Rong, Shenghui
2018-05-01
The performance of the SLP-THP based non-uniformity correction algorithm is seriously affected by the result of the SLP filter, which always leads to image blurring and ghosting artifacts. To address this problem, an improved SLP-THP based non-uniformity correction method with a curvature constraint was proposed. Here we put forward a new way to estimate the spatial low-frequency component. First, the details and contours of the input image were obtained by minimizing, respectively, the local Gaussian curvature and the mean curvature of the image surface. Then, the guided filter was utilized to combine these two parts to get the estimate of the spatial low-frequency component. Finally, we brought this SLP component into the SLP-THP method to achieve non-uniformity correction. The performance of the proposed algorithm was verified on several real and simulated infrared image sequences. The experimental results indicated that the proposed algorithm can reduce the non-uniformity without losing detail. After that, a GPU-based parallel implementation that runs 150 times faster than the CPU version was presented, which showed that the proposed algorithm has great potential for real-time application.
Carreer, William J.; Flight, Robert M.; Moseley, Hunter N. B.
2013-01-01
New metabolomics applications of ultra-high resolution and accuracy mass spectrometry can provide thousands of detectable isotopologues, with the number of potentially detectable isotopologues increasing exponentially with the number of stable isotopes used in newer isotope tracing methods like stable isotope-resolved metabolomics (SIRM) experiments. This huge increase in usable data requires software capable of correcting the large number of isotopologue peaks resulting from SIRM experiments in a timely manner. We describe the design of a new algorithm and software system capable of handling these high volumes of data, while including quality control methods for maintaining data quality. We validate this new algorithm against a previous single isotope correction algorithm in a two-step cross-validation. Next, we demonstrate the algorithm and correct for the effects of natural abundance for both 13C and 15N isotopes on a set of raw isotopologue intensities of UDP-N-acetyl-D-glucosamine derived from a 13C/15N-tracing experiment. Finally, we demonstrate the algorithm on a full omics-level dataset. PMID:24404440
Luo, Junhai; Fu, Liang
2017-06-09
With the development of communication technology, the demand for location-based services is growing rapidly. This paper presents an algorithm for indoor localization based on Received Signal Strength (RSS), which is collected from Access Points (APs). The proposed localization algorithm contains an offline information acquisition phase and an online positioning phase. In the offline phase, firstly, the AP selection algorithm is reviewed and improved based on the stability of signals to remove useless APs; secondly, Kernel Principal Component Analysis (KPCA) is analyzed and used to remove data redundancy and maintain useful characteristics for nonlinear feature extraction; thirdly, the Affinity Propagation Clustering (APC) algorithm utilizes RSS values to classify data samples and narrow the positioning range. In the online positioning phase, the classified data are matched with the testing data to determine the position area, and the Maximum Likelihood (ML) estimate is employed for precise positioning. Finally, the proposed algorithm is implemented in a real-world environment for performance evaluation. Experimental results demonstrate that the proposed algorithm improves the accuracy and computational complexity.
Paudel, M R; Mackenzie, M; Fallone, B G; Rathee, S
2013-08-01
To evaluate the metal artifacts in kilovoltage computed tomography (kVCT) images that are corrected using a normalized metal artifact reduction (NMAR) method with megavoltage CT (MVCT) prior images. Tissue characterization phantoms containing bilateral steel inserts are used in all experiments. Two MVCT images, one without any metal artifact corrections and the other corrected using a modified iterative maximum likelihood polychromatic algorithm for CT (IMPACT) are translated to pseudo-kVCT images. These are then used as prior images without tissue classification in an NMAR technique for correcting the experimental kVCT image. The IMPACT method in MVCT included an additional model for the pair∕triplet production process and the energy dependent response of the MVCT detectors. An experimental kVCT image, without the metal inserts and reconstructed using the filtered back projection (FBP) method, is artificially patched with the known steel inserts to get a reference image. The regular NMAR image containing the steel inserts that uses tissue classified kVCT prior and the NMAR images reconstructed using MVCT priors are compared with the reference image for metal artifact reduction. The Eclipse treatment planning system is used to calculate radiotherapy dose distributions on the corrected images and on the reference image using the Anisotropic Analytical Algorithm with 6 MV parallel opposed 5×10 cm2 fields passing through the bilateral steel inserts, and the results are compared. Gafchromic film is used to measure the actual dose delivered in a plane perpendicular to the beams at the isocenter. The streaking and shading in the NMAR image using tissue classifications are significantly reduced. However, the structures, including metal, are deformed. Some uniform regions appear to have eroded from one side. There is a large variation of attenuation values inside the metal inserts. Similar results are seen in commercially corrected image. 
Use of MVCT prior images without tissue classification in NMAR significantly reduces these problems. The radiation dose calculated on the reference image is close to the dose measured using the film. Compared to the reference image, the calculated dose difference in the conventional NMAR image, the corrected images using uncorrected MVCT image, and IMPACT corrected MVCT image as priors is ∼15.5%, ∼5%, and ∼2.7%, respectively, at the isocenter. The deformation and erosion of the structures present in regular NMAR corrected images can be largely reduced by using MVCT priors without tissue segmentation. The attenuation value of metal being incorrect, large dose differences relative to the true value can result when using the conventional NMAR image. This difference can be significantly reduced if MVCT images are used as priors. Reduced tissue deformation, better tissue visualization, and correct information about the electron density of the tissues and metals in the artifact corrected images could help delineate the structures better, as well as calculate radiation dose more correctly, thus enhancing the quality of the radiotherapy treatment planning.
Álvarez, Aitor; Sierra, Basilio; Arruti, Andoni; López-Gil, Juan-Miguel; Garay-Vitoria, Nestor
2015-01-01
In this paper, a new supervised classification paradigm, called classifier subset selection for stacked generalization (CSS stacking), is presented to deal with speech emotion recognition. The new approach consists of an improvement of a bi-level multi-classifier system known as stacking generalization by means of an integration of an estimation of distribution algorithm (EDA) in the first layer to select the optimal subset from the standard base classifiers. The good performance of the proposed new paradigm was demonstrated over different configurations and datasets. First, several CSS stacking classifiers were constructed on the RekEmozio dataset, using some specific standard base classifiers and a total of 123 spectral, quality and prosodic features computed using in-house feature extraction algorithms. These initial CSS stacking classifiers were compared to other multi-classifier systems and the employed standard classifiers built on the same set of speech features. Then, new CSS stacking classifiers were built on RekEmozio using a different set of both acoustic parameters (extended version of the Geneva Minimalistic Acoustic Parameter Set (eGeMAPS)) and standard classifiers and employing the best meta-classifier of the initial experiments. The performance of these two CSS stacking classifiers was evaluated and compared. Finally, the new paradigm was tested on the well-known Berlin Emotional Speech database. We compared the performance of single, standard stacking and CSS stacking systems using the same parametrization of the second phase. All of the classifications were performed at the categorical level, including the six primary emotions plus the neutral one. PMID:26712757
Narayanan, Balaji; Hardie, Russell C; Muse, Robert A
2005-06-10
Spatial fixed-pattern noise is a common and major problem in modern infrared imagers owing to the nonuniform response of the photodiodes in the focal plane array of the imaging system. In addition, the nonuniform response of the readout and digitization electronics, which are involved in multiplexing the signals from the photodiodes, causes further nonuniformity. We describe a novel scene-based nonuniformity correction algorithm that treats the aggregate nonuniformity in separate stages. First, the nonuniformity from the readout amplifiers is corrected by use of knowledge of the readout architecture of the imaging system. Second, the nonuniformity resulting from the individual detectors is corrected with a nonlinear filter-based method. We demonstrate the performance of the proposed algorithm by applying it to simulated imagery and real infrared data. Quantitative results in terms of the mean absolute error and the signal-to-noise ratio are also presented to demonstrate the efficacy of the proposed algorithm. One advantage of the proposed algorithm is that it requires only a few frames to obtain high-quality corrections.
Application and assessment of a robust elastic motion correction algorithm to dynamic MRI.
Herrmann, K-H; Wurdinger, S; Fischer, D R; Krumbein, I; Schmitt, M; Hermosillo, G; Chaudhuri, K; Krishnan, A; Salganicoff, M; Kaiser, W A; Reichenbach, J R
2007-01-01
The purpose of this study was to assess the performance of a new motion correction algorithm. Twenty-five dynamic MR mammography (MRM) data sets and 25 contrast-enhanced three-dimensional peripheral MR angiographic (MRA) data sets which were affected by patient motion of varying severity were selected retrospectively from routine examinations. Anonymized data were registered by a new experimental elastic motion correction algorithm. The algorithm works by computing a similarity measure for the two volumes that takes into account expected signal changes due to the presence of a contrast agent while penalizing other signal changes caused by patient motion. A conjugate gradient method is used to find the best possible set of motion parameters that maximizes the similarity measure across the entire volume. Images before and after correction were visually evaluated and scored by experienced radiologists with respect to reduction of motion, improvement of image quality, disappearance of existing lesions or creation of artifactual lesions. It was found that the correction improves image quality (76% for MRM and 96% for MRA) and diagnosability (60% for MRM and 96% for MRA).
PCA method for automated detection of mispronounced words
NASA Astrophysics Data System (ADS)
Ge, Zhenhao; Sharma, Sudhendu R.; Smith, Mark J. T.
2011-06-01
This paper presents a method for detecting mispronunciations with the aim of improving Computer Assisted Language Learning (CALL) tools used by foreign language learners. The algorithm is based on Principal Component Analysis (PCA). It is hierarchical, with each successive step refining the estimate to classify the test word as being either mispronounced or correct. Preprocessing before detection, such as normalization and time-scale modification, is implemented to guarantee uniformity of the feature vectors input to the detection system. The performance obtained using various features, including spectrograms and Mel-Frequency Cepstral Coefficients (MFCCs), is compared and evaluated. Best results were obtained using MFCCs, achieving up to 99% accuracy in word verification and 93% in native/non-native classification. Compared with Hidden Markov Models (HMMs), which are used pervasively in recognition applications, this particular approach is computationally efficient and effective when training data is limited.
Developing processing techniques for Skylab data
NASA Technical Reports Server (NTRS)
Nalepka, R. F. (Principal Investigator); Malila, W. A.; Morgenstern, J. P.
1975-01-01
The author has identified the following significant results. The effects of misregistration and the scan-line-straightening algorithm on multispectral data were found to be: (1) there is greatly increased misregistration in scan-line-straightened data over conic data; (2) scanner-caused misregistration between any pairs of channels may not be corrected for in scan-line-straightened data; and (3) these data will have fewer pure field center pixels than will conic data. A program, SIMSIG, was developed implementing the signature simulation model. Data processing stages of the experiment were carried out, and an analysis was made of the effects of spatial misregistration on field center classification accuracy. Fifteen signatures originally used for classifying the data were analyzed, showing the following breakdown: corn (4 signatures), trees (2), brush (1), grasses, weeds, etc. (5), bare soil (1), soybeans (1), and alfalfa (1).
Computer Vision and Machine Learning for Autonomous Characterization of AM Powder Feedstocks
NASA Astrophysics Data System (ADS)
DeCost, Brian L.; Jain, Harshvardhan; Rollett, Anthony D.; Holm, Elizabeth A.
2017-03-01
By applying computer vision and machine learning methods, we develop a system to characterize powder feedstock materials for metal additive manufacturing (AM). Feature detection and description algorithms are applied to create a microstructural scale image representation that can be used to cluster, compare, and analyze powder micrographs. When applied to eight commercial feedstock powders, the system classifies powder images into the correct material systems with greater than 95% accuracy. The system also identifies both representative and atypical powder images. These results suggest the possibility of measuring variations in powders as a function of processing history, relating microstructural features of powders to properties relevant to their performance in AM processes, and defining objective material standards based on visual images. A significant advantage of the computer vision approach is that it is autonomous, objective, and repeatable.
Support Vector Machines for Hyperspectral Remote Sensing Classification
NASA Technical Reports Server (NTRS)
Gualtieri, J. Anthony; Cromp, R. F.
1998-01-01
The Support Vector Machine provides a new way to design classification algorithms which learn from examples (supervised learning) and generalize when applied to new data. We demonstrate its success on a difficult classification problem from hyperspectral remote sensing, where we obtain performances of 96% and 87% correct for a 4-class problem and a 16-class problem, respectively. These results are somewhat better than other recent results on the same data. A key feature of this classifier is its ability to use high-dimensional data without the usual recourse to a feature selection step to reduce the dimensionality of the data. For this application, this is important, as hyperspectral data consists of several hundred contiguous spectral channels for each exemplar. We provide an introduction to this new approach, and demonstrate its application to classification of an agricultural scene.
Validation of the Thematic Mapper radiometric and geometric correction algorithms
NASA Technical Reports Server (NTRS)
Fischel, D.
1984-01-01
The radiometric and geometric correction algorithms for Thematic Mapper are critical to subsequent successful information extraction. Earlier Landsat scanners, known as Multispectral Scanners, produce imagery which exhibits striping due to mismatching of detector gains and biases. Thematic Mapper exhibits the same phenomenon at three levels: detector-to-detector, scan-to-scan, and multiscan striping. The cause of these variations has been traced to variations in the dark current of the detectors. An alternative formulation has been tested and shown to be very satisfactory. Unfortunately, the Thematic Mapper detectors exhibit saturation effects while viewing extensive cloud areas, and these effects are not easily correctable. The geometric correction algorithm has been shown to be remarkably reliable. Only minor and modest improvements are indicated and shown to be effective.
Teuho, Jarmo; Saunavaara, Virva; Tolvanen, Tuula; Tuokkola, Terhi; Karlsson, Antti; Tuisku, Jouni; Teräs, Mika
2017-10-01
In PET, corrections for photon scatter and attenuation are essential for visual and quantitative consistency. MR attenuation correction (MRAC) is generally conducted by image segmentation and assignment of discrete attenuation coefficients, which offer limited accuracy compared with CT attenuation correction. Potential inaccuracies in MRAC may affect scatter correction, because the attenuation image (μ-map) is used in single scatter simulation (SSS) to calculate the scatter estimate. We assessed the impact of MRAC to scatter correction using 2 scatter-correction techniques and 3 μ-maps for MRAC. Methods: The tail-fitted SSS (TF-SSS) and a Monte Carlo-based single scatter simulation (MC-SSS) algorithm implementations on the Philips Ingenuity TF PET/MR were used with 1 CT-based and 2 MR-based μ-maps. Data from 7 subjects were used in the clinical evaluation, and a phantom study using an anatomic brain phantom was conducted. Scatter-correction sinograms were evaluated for each scatter correction method and μ-map. Absolute image quantification was investigated with the phantom data. Quantitative assessment of PET images was performed by volume-of-interest and ratio image analysis. Results: MRAC did not result in large differences in scatter algorithm performance, especially with TF-SSS. Scatter sinograms and scatter fractions did not reveal large differences regardless of the μ-map used. TF-SSS showed slightly higher absolute quantification. The differences in volume-of-interest analysis between TF-SSS and MC-SSS were 3% at maximum in the phantom and 4% in the patient study. Both algorithms showed excellent correlation with each other with no visual differences between PET images. MC-SSS showed a slight dependency on the μ-map used, with a difference of 2% on average and 4% at maximum when a μ-map without bone was used. 
Conclusion: The effect of different MR-based μ-maps on the performance of scatter correction was minimal in non-time-of-flight 18F-FDG PET/MR brain imaging. The SSS algorithm was not affected significantly by MRAC. The performance of the MC-SSS algorithm is comparable but not superior to TF-SSS, warranting further investigations of algorithm optimization and performance with different radiotracers and time-of-flight imaging. © 2017 by the Society of Nuclear Medicine and Molecular Imaging.
NASA Astrophysics Data System (ADS)
Ko, Jonathan; Wu, Chensheng; Davis, Christopher C.
2015-09-01
Adaptive optics has been widely used in the field of astronomy to correct for atmospheric turbulence while viewing images of celestial bodies. The slightly distorted incoming wavefronts are typically sensed with a Shack-Hartmann sensor and then corrected with a deformable mirror. Although this approach has proven to be effective for astronomical purposes, a new approach must be developed when correcting for the deep turbulence experienced in ground to ground based optical systems. We propose the use of a modified plenoptic camera as a wavefront sensor capable of accurately representing an incoming wavefront that has been significantly distorted by strong turbulence conditions (C_n^2 < 10^-13 m^-2/3). An intelligent correction algorithm can then be developed to reconstruct the perturbed wavefront and use this information to drive a deformable mirror capable of correcting the major distortions. After the large distortions have been corrected, a secondary mode utilizing more traditional adaptive optics algorithms can take over to fine tune the wavefront correction. This two-stage algorithm can find use in free space optical communication systems, in directed energy applications, as well as for image correction purposes.
Algorithm for Atmospheric Corrections of Aircraft and Satellite Imagery
NASA Technical Reports Server (NTRS)
Fraser, Robert S.; Kaufman, Yoram J.; Ferrare, Richard A.; Mattoo, Shana
1989-01-01
A simple and fast atmospheric correction algorithm is described which is used to correct radiances of scattered sunlight measured by aircraft and/or satellite above a uniform surface. The atmospheric effect, the basic equations, a description of the computational procedure, and a sensitivity study are discussed. The program is designed to take the measured radiances, view and illumination directions, and the aerosol and gaseous absorption optical thickness to compute the radiance just above the surface, the irradiance on the surface, and surface reflectance. Alternatively, the program will compute the upward radiance at a specific altitude for a given surface reflectance, view and illumination directions, and aerosol and gaseous absorption optical thickness. The algorithm can be applied for any view and illumination directions and any wavelength in the range 0.48 micron to 2.2 micron. The relation between the measured radiance and surface reflectance, which is expressed as a function of atmospheric properties and measurement geometry, is computed using a radiative transfer routine. The results of the computations are presented in a table which forms the basis of the correction algorithm. The algorithm can be used for atmospheric corrections in the presence of a rural aerosol. The sensitivity of the derived surface reflectance to uncertainties in the model and input data is discussed.
Algorithm for atmospheric corrections of aircraft and satellite imagery
NASA Technical Reports Server (NTRS)
Fraser, R. S.; Ferrare, R. A.; Kaufman, Y. J.; Markham, B. L.; Mattoo, S.
1992-01-01
A simple and fast atmospheric correction algorithm is described which is used to correct radiances of scattered sunlight measured by aircraft and/or satellite above a uniform surface. The atmospheric effect, the basic equations, a description of the computational procedure, and a sensitivity study are discussed. The program is designed to take the measured radiances, view and illumination directions, and the aerosol and gaseous absorption optical thickness to compute the radiance just above the surface, the irradiance on the surface, and surface reflectance. Alternatively, the program will compute the upward radiance at a specific altitude for a given surface reflectance, view and illumination directions, and aerosol and gaseous absorption optical thickness. The algorithm can be applied for any view and illumination directions and any wavelength in the range 0.48 micron to 2.2 microns. The relation between the measured radiance and surface reflectance, which is expressed as a function of atmospheric properties and measurement geometry, is computed using a radiative transfer routine. The results of the computations are presented in a table which forms the basis of the correction algorithm. The algorithm can be used for atmospheric corrections in the presence of a rural aerosol. The sensitivity of the derived surface reflectance to uncertainties in the model and input data is discussed.
Approximate string matching algorithms for limited-vocabulary OCR output correction
NASA Astrophysics Data System (ADS)
Lasko, Thomas A.; Hauser, Susan E.
2000-12-01
Five methods for matching words mistranslated by optical character recognition to their most likely match in a reference dictionary were tested on data from the archives of the National Library of Medicine. The methods, including an adaptation of the cross correlation algorithm, the generic edit distance algorithm, the edit distance algorithm with a probabilistic substitution matrix, Bayesian analysis, and Bayesian analysis on an actively thinned reference dictionary, were implemented and their accuracy rates compared. Of the five, the Bayesian algorithm produced the most correct matches (87%), and had the advantage of producing scores that have a useful and practical interpretation.
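As an illustration of the generic edit distance method named in this abstract, the following is a minimal sketch, not the authors' implementation. It assumes unit costs for insertion, deletion, and substitution; the function names and the tiny dictionary are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance with unit costs (an assumption of this sketch)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def best_match(word, dictionary):
    """Return the dictionary entry with the smallest edit distance to word."""
    return min(dictionary, key=lambda entry: edit_distance(word, entry))


# Hypothetical limited vocabulary and an OCR-style error ("m" read as "rn").
vocabulary = ["medicine", "medical", "library"]
corrected = best_match("rnedicine", vocabulary)
```

A probabilistic substitution matrix, as mentioned in the abstract, would replace the fixed cost of 1 with a character-pair-specific cost.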
K-mean clustering algorithm for processing signals from compound semiconductor detectors
NASA Astrophysics Data System (ADS)
Tada, Tsutomu; Hitomi, Keitaro; Wu, Yan; Kim, Seong-Yun; Yamazaki, Hiromichi; Ishii, Keizo
2011-12-01
The K-mean clustering algorithm was employed for processing signal waveforms from TlBr detectors. The signal waveforms were classified based on their shape, which reflects the charge collection process in the detector. The classified signal waveforms were processed individually to suppress the pulse height variation of signals due to the charge collection loss. The obtained energy resolution of a 137Cs spectrum measured with a 0.5 mm thick TlBr detector was 1.3% FWHM by employing 500 clusters.
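The clustering step described in this abstract can be sketched with a plain Lloyd's-algorithm k-means. This is only an illustrative sketch: the sampled waveforms are made up, the centroids are seeded with the first k waveforms (an assumption chosen for determinism), and the abstract's 500 clusters are reduced to 2.

```python
def kmeans(points, k, iterations=20):
    """Lloyd's algorithm; the first k points seed the centroids (assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical waveforms, each sampled at three time points.
waveforms = [
    [0.0, 0.1, 0.2],  # slow rise
    [1.0, 1.1, 0.9],  # fast rise
    [0.1, 0.0, 0.2],  # slow rise
    [0.9, 1.0, 1.1],  # fast rise
]
labels, centroids = kmeans(waveforms, k=2)
```

Each resulting cluster could then be processed with its own pulse-height correction, which is the role the abstract assigns to the classified waveforms.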
Fuzzy Nonlinear Proximal Support Vector Machine for Land Extraction Based on Remote Sensing Image
Zhong, Xiaomei; Li, Jianping; Dou, Huacheng; Deng, Shijun; Wang, Guofei; Jiang, Yu; Wang, Yongjie; Zhou, Zebing; Wang, Li; Yan, Fei
2013-01-01
Remote sensing technologies are currently widely employed in the dynamic monitoring of land. This paper presents an algorithm named fuzzy nonlinear proximal support vector machine (FNPSVM) based on ETM+ remote sensing images. The algorithm is applied to extract various types of land in the city of Da’an in northern China. Two multi-category strategies for this algorithm, namely “one-against-one” and “one-against-rest”, are described in detail and then compared. A fuzzy membership function is presented to reduce the effects of noise and outliers on the data samples. The approaches to feature extraction and feature selection and several key parameter settings are also given. Numerous experiments were carried out to evaluate its performance, including accuracy (overall accuracy and kappa coefficient), stability, training speed, and classification speed. The FNPSVM classifier was compared to three other classifiers, namely the maximum likelihood classifier (MLC), the back propagation neural network (BPN), and the proximal support vector machine (PSVM), under different training conditions. The impacts of the selection of training samples, testing samples and features on the four classifiers were also evaluated in these experiments. PMID:23936016
An improved non-uniformity correction algorithm and its hardware implementation on FPGA
NASA Astrophysics Data System (ADS)
Rong, Shenghui; Zhou, Huixin; Wen, Zhigang; Qin, Hanlin; Qian, Kun; Cheng, Kuanhong
2017-09-01
The non-uniformity of Infrared Focal Plane Arrays (IRFPA) severely degrades infrared image quality. An effective non-uniformity correction (NUC) algorithm is necessary for an IRFPA imaging and application system. However, traditional scene-based NUC algorithms suffer from image blurring and artificial ghosting. In addition, few effective hardware platforms have been proposed to implement the corresponding NUC algorithms. Thus, this paper proposes an improved neural-network-based NUC algorithm using the guided image filter and a projection-based motion detection algorithm. First, the guided image filter is utilized to obtain an accurate desired image and so decrease the artificial ghosting. Then a projection-based motion detection algorithm is utilized to determine whether the correction coefficients should be updated or not. In this way the problem of image blurring can be overcome. Finally, an FPGA-based hardware design is introduced to realize the proposed NUC algorithm. A real and a simulated infrared image sequence are utilized to verify the performance of the proposed algorithm. Experimental results indicate that the proposed NUC algorithm can effectively eliminate the fixed pattern noise with less image blurring and artificial ghosting. The proposed hardware design takes fewer logic elements in the FPGA and spends fewer clock cycles to process one frame of image.
Evaluation of two Vaisala RS92 radiosonde solar radiative dry bias correction algorithms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dzambo, Andrew M.; Turner, David D.; Mlawer, Eli J.
2016-04-12
Solar heating of the relative humidity (RH) probe on Vaisala RS92 radiosondes results in a large dry bias in the upper troposphere. Two different algorithms (Miloshevich et al., 2009, MILO hereafter; and Wang et al., 2013, WANG hereafter) have been designed to account for this solar radiative dry bias (SRDB). These corrections are markedly different with MILO adding up to 40 % more moisture to the original radiosonde profile than WANG; however, the impact of the two algorithms varies with height. The accuracy of these two algorithms is evaluated using three different approaches: a comparison of precipitable water vapor (PWV), downwelling radiative closure with a surface-based microwave radiometer at a high-altitude site (5.3 km m.s.l.), and upwelling radiative closure with the space-based Atmospheric Infrared Sounder (AIRS). The PWV computed from the uncorrected and corrected RH data is compared against PWV retrieved from ground-based microwave radiometers at tropical, midlatitude, and arctic sites. Although MILO generally adds more moisture to the original radiosonde profile in the upper troposphere compared to WANG, both corrections yield similar changes to the PWV, and the corrected data agree well with the ground-based retrievals. The two closure activities – done for clear-sky scenes – use the radiative transfer models MonoRTM and LBLRTM to compute radiance from the radiosonde profiles to compare against spectral observations. Both WANG- and MILO-corrected RHs are statistically better than original RH in all cases except for the driest 30 % of cases in the downwelling experiment, where both algorithms add too much water vapor to the original profile. In the upwelling experiment, the RH correction applied by the WANG vs. MILO algorithm is statistically different above 10 km for the driest 30 % of cases and above 8 km for the moistest 30 % of cases, suggesting that the MILO correction performs better than the WANG in clear-sky scenes. Lastly, the cause of this statistical significance is likely explained by the fact the WANG correction also accounts for cloud cover – a condition not accounted for in the radiance closure experiments.
Using support vector machine to predict beta- and gamma-turns in proteins.
Hu, Xiuzhen; Li, Qianzhong
2008-09-01
By using the composite vector with increment of diversity, position conservation scoring function, and predictive secondary structures to express the information of sequence, a support vector machine (SVM) algorithm for predicting beta- and gamma-turns in the proteins is proposed. The 426 and 320 nonhomologous protein chains described by Guruprasad and Rajkumar (Guruprasad and Rajkumar J. Biosci 2000, 25,143) are used for training and testing the predictive model of the beta- and gamma-turns, respectively. The overall prediction accuracy and the Matthews correlation coefficient in 7-fold cross-validation are 79.8% and 0.47, respectively, for the beta-turns. The overall prediction accuracy in 5-fold cross-validation is 61.0% for the gamma-turns. These results are significantly higher than the other algorithms in the prediction of beta- and gamma-turns using the same datasets. In addition, the 547 and 823 nonhomologous protein chains described by Fuchs and Alix (Fuchs and Alix Proteins: Struct Funct Bioinform 2005, 59, 828) are used for training and testing the predictive model of the beta- and gamma-turns, and better results are obtained. This algorithm may be helpful to improve the performance of protein turns' prediction. To ensure the ability of the SVM method to correctly classify beta-turn and non-beta-turn (gamma-turn and non-gamma-turn), the receiver operating characteristic threshold independent measure curves are provided. (c) 2008 Wiley Periodicals, Inc.
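The two figures of merit quoted in this abstract, overall accuracy and the Matthews correlation coefficient (MCC), can both be computed from a binary confusion matrix. The sketch below uses the standard definitions; the counts are illustrative only and are not taken from the paper.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC in [-1, 1]; defined as 0 here when any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Illustrative counts (turn = positive class, non-turn = negative class).
acc = accuracy(tp=40, tn=40, fp=10, fn=10)
mcc = matthews_corrcoef(tp=40, tn=40, fp=10, fn=10)
```

MCC is often reported alongside accuracy for turn prediction because turns and non-turns are unevenly represented, and accuracy alone can look favorable for a classifier that mostly predicts the majority class.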
Improved analytical methods for microarray-based genome-composition analysis
Kim, Charles C; Joyce, Elizabeth A; Chan, Kaman; Falkow, Stanley
2002-01-01
Background: Whereas genome sequencing has given us high-resolution pictures of many different species of bacteria, microarrays provide a means of obtaining information on genome composition for many strains of a given species. Genome-composition analysis using microarrays, or 'genomotyping', can be used to categorize genes into 'present' and 'divergent' categories based on the level of hybridization signal. This typically involves selecting a signal value that is used as a cutoff to discriminate present (high signal) and divergent (low signal) genes. Current methodology uses empirical determination of cutoffs for classification into these categories, but this methodology is subject to several problems that can result in the misclassification of many genes. Results: We describe a method that depends on the shape of the signal-ratio distribution and does not require empirical determination of a cutoff. Moreover, the cutoff is determined on an array-to-array basis, accounting for variation in strain composition and hybridization quality. The algorithm also provides an estimate of the probability that any given gene is present, which provides a measure of confidence in the categorical assignments. Conclusions: Many genes previously classified as present using static methods are in fact divergent on the basis of microarray signal; this is corrected by our algorithm. We have reassigned hundreds of genes from previous genomotyping studies of Helicobacter pylori and Campylobacter jejuni strains, and expect that the algorithm should be widely applicable to genomotyping data. PMID:12429064
Classification of fMRI resting-state maps using machine learning techniques: A comparative study
NASA Astrophysics Data System (ADS)
Gallos, Ioannis; Siettos, Constantinos
2017-11-01
We compare the efficiency of Principal Component Analysis (PCA) and nonlinear manifold learning algorithms (ISOMAP and Diffusion Maps) for classifying brain maps between groups of schizophrenia patients and healthy controls from fMRI scans during a resting-state experiment. After a standard pre-processing pipeline, we applied spatial Independent Component Analysis (ICA) to reduce (a) noise and (b) spatial-temporal dimensionality of fMRI maps. On the cross-correlation matrix of the ICA components, we applied PCA, ISOMAP and Diffusion Maps to find an embedded low-dimensional space. Finally, support vector machines (SVM) and k-NN algorithms were used to evaluate the performance of the algorithms in classifying between the two groups.
False alarm reduction by the And-ing of multiple multivariate Gaussian classifiers
NASA Astrophysics Data System (ADS)
Dobeck, Gerald J.; Cobb, J. Tory
2003-09-01
The high-resolution sonar is one of the principal sensors used by the Navy to detect and classify sea mines in minehunting operations. For such sonar systems, substantial effort has been devoted to the development of automated detection and classification (D/C) algorithms. These have been spurred by several factors including (1) aids for operators to reduce work overload, (2) more optimal use of all available data, and (3) the introduction of unmanned minehunting systems. The environments where sea mines are typically laid (harbor areas, shipping lanes, and the littorals) give rise to many false alarms caused by natural, biologic, and man-made clutter. The objective of the automated D/C algorithms is to eliminate most of these false alarms while still maintaining a very high probability of mine detection and classification (PdPc). In recent years, the benefits of fusing the outputs of multiple D/C algorithms have been studied. We refer to this as Algorithm Fusion. The results have been remarkable, including reliable robustness to new environments. This paper describes a method for training several multivariate Gaussian classifiers such that their And-ing dramatically reduces false alarms while maintaining a high probability of classification. This training approach is referred to as the Focused-Training method. This work extends our 2001-2002 work where the Focused-Training method was used with three other types of classifiers: the Attractor-based K-Nearest Neighbor Neural Network (a type of radial-basis, probabilistic neural network), the Optimal Discrimination Filter Classifier (based on linear discrimination theory), and the Quadratic Penalty Function Support Vector Machine (QPFSVM). Although our experience has been gained in the area of sea mine detection and classification, the principles described herein are general and can be applied to a wide range of pattern recognition and automatic target recognition (ATR) problems.
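The And-ing idea in this abstract, declaring a target only when every classifier agrees, can be sketched as below. This shows the fusion rule only: the two threshold rules are hypothetical stand-ins for the trained multivariate Gaussian classifiers, and the feature names and values are invented for illustration.

```python
def and_fusion(classifiers, sample):
    """Declare a target only when every classifier declares a target."""
    return all(clf(sample) for clf in classifiers)


# Hypothetical stand-in classifiers: simple thresholds on made-up features.
def bright_enough(sample):
    return sample["highlight"] > 0.6


def shadow_long_enough(sample):
    return sample["shadow_length"] > 1.0


classifiers = [bright_enough, shadow_long_enough]

mine_like = {"highlight": 0.9, "shadow_length": 1.5}
clutter = {"highlight": 0.8, "shadow_length": 0.3}

mine_declared = and_fusion(classifiers, mine_like)
clutter_declared = and_fusion(classifiers, clutter)
```

In this toy case the clutter object is a false alarm for the first rule alone but is rejected by the fused decision, which is the effect the And-ing is meant to achieve. The cost is that a true target missed by any single classifier is also rejected, so each classifier must keep its own detection probability high.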
Ozçift, Akin
2011-05-01
Supervised classification algorithms are commonly used in the design of computer-aided diagnosis systems. In this study, we present a resampling strategy based Random Forests (RF) ensemble classifier to improve diagnosis of cardiac arrhythmia. Random forests is an ensemble classifier that consists of many decision trees and outputs the class that is the mode of the classes output by the individual trees. In this way, an RF ensemble classifier performs better than a single tree from a classification performance point of view. In general, multiclass datasets having an unbalanced distribution of sample sizes are difficult to analyze in terms of class discrimination. Cardiac arrhythmia is such a dataset, with multiple classes with small sample sizes, and it is therefore suitable for testing our resampling based training strategy. The dataset contains 452 samples in fourteen types of arrhythmias, and eleven of these classes have sample sizes less than 15. Our diagnosis strategy consists of two parts: (i) a correlation based feature selection algorithm is used to select relevant features from the cardiac arrhythmia dataset; (ii) the RF machine learning algorithm is used to evaluate the performance of the selected features with and without simple random sampling to evaluate the efficiency of the proposed training strategy. The resultant accuracy of the classifier is found to be 90.0%, which is quite a high diagnosis performance for cardiac arrhythmia. Furthermore, three case studies, i.e., thyroid, cardiotocography and audiology, are used to benchmark the effectiveness of the proposed method. The results of experiments demonstrated the efficiency of the random sampling strategy in training the RF ensemble classification algorithm. Copyright © 2011 Elsevier Ltd. All rights reserved.
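The voting rule described in this abstract, where the forest outputs the mode of the classes predicted by the individual trees, can be sketched as follows. The three one-rule "trees", their thresholds, and the feature names are hypothetical placeholders, not trees trained on the arrhythmia dataset.

```python
from collections import Counter


def forest_predict(trees, sample):
    """Return the most common class among the individual tree predictions."""
    votes = [tree(sample) for tree in trees]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical one-rule trees on made-up ECG features.
def tree_rate(sample):
    return "arrhythmia" if sample["heart_rate"] > 100 else "normal"


def tree_qrs(sample):
    return "arrhythmia" if sample["qrs_ms"] > 120 else "normal"


def tree_pr(sample):
    return "arrhythmia" if sample["pr_ms"] > 200 else "normal"


trees = [tree_rate, tree_qrs, tree_pr]
patient = {"heart_rate": 110, "qrs_ms": 130, "pr_ms": 160}
prediction = forest_predict(trees, patient)
```

Here two of the three trees vote "arrhythmia", so the ensemble prediction follows the majority even though one tree disagrees.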
DOT National Transportation Integrated Search
1986-12-01
The algorithms described in this report determine the differential corrections to be broadcast to users of the Global Positioning System (GPS) who require higher accuracy navigation or position information than the 30 to 100 meters that GPS normally ...
Hard decoding algorithm for optimizing thresholds under general Markovian noise
NASA Astrophysics Data System (ADS)
Chamberland, Christopher; Wallman, Joel; Beale, Stefanie; Laflamme, Raymond
2017-04-01
Quantum error correction is instrumental in protecting quantum systems from noise in quantum computing and communication settings. Pauli channels can be efficiently simulated and threshold values for Pauli error rates under a variety of error-correcting codes have been obtained. However, realistic quantum systems can undergo noise processes that differ significantly from Pauli noise. In this paper, we present an efficient hard decoding algorithm for optimizing thresholds and lowering failure rates of an error-correcting code under general completely positive and trace-preserving (i.e., Markovian) noise. We use our hard decoding algorithm to study the performance of several error-correcting codes under various non-Pauli noise models by computing threshold values and failure rates for these codes. We compare the performance of our hard decoding algorithm to decoders optimized for depolarizing noise and show improvements in thresholds and reductions in failure rates by several orders of magnitude. Our hard decoding algorithm can also be adapted to take advantage of a code's non-Pauli transversal gates to further suppress noise. For example, we show that using the transversal gates of the 5-qubit code allows arbitrary rotations around certain axes to be perfectly corrected. Furthermore, we show that Pauli twirling can increase or decrease the threshold depending upon the code properties. Lastly, we show that even if the physical noise model differs slightly from the hypothesized noise model used to determine an optimized decoder, failure rates can still be reduced by applying our hard decoding algorithm.
A multiscale curvature algorithm for classifying discrete return LiDAR in forested environments
Jeffrey S. Evans; Andrew T. Hudak
2007-01-01
One prerequisite to the use of light detection and ranging (LiDAR) across disciplines is differentiating ground from nonground returns. The objective was to automatically and objectively classify points within unclassified LiDAR point clouds, with few model parameters and minimal postprocessing. Presented is an automated method for classifying LiDAR returns as ground...
NASA Technical Reports Server (NTRS)
Garay, Michael J.; Mazzoni, Dominic; Davies, Roger; Wagstaff, Kiri
2004-01-01
Support Vector Machines (SVMs) are a type of supervised learning algorithm, other examples of which are Artificial Neural Networks (ANNs), Decision Trees, and Naive Bayesian Classifiers. Supervised learning algorithms are used to classify objects labeled by a 'supervisor' - typically a human 'expert.'
Brunner, Stephen; Nett, Brian E; Tolakanahalli, Ranjini; Chen, Guang-Hong
2011-02-21
X-ray scatter is a significant problem in cone-beam computed tomography when thicker objects and larger cone angles are used, as scattered radiation can lead to reduced contrast and CT number inaccuracy. Advances have been made in x-ray computed tomography (CT) by incorporating a high quality prior image into the image reconstruction process. In this paper, we extend this idea to correct scatter-induced shading artifacts in cone-beam CT image-guided radiation therapy. Specifically, this paper presents a new scatter correction algorithm which uses a prior image with low scatter artifacts to reduce shading artifacts in cone-beam CT images acquired under conditions of high scatter. The proposed correction algorithm begins with an empirical hypothesis that the target image can be written as a weighted summation of a series of basis images that are generated by raising the raw cone-beam projection data to different powers, and then, reconstructing using the standard filtered backprojection algorithm. The weight for each basis image is calculated by minimizing the difference between the target image and the prior image. The performance of the scatter correction algorithm is qualitatively and quantitatively evaluated through phantom studies using a Varian 2100 EX System with an on-board imager. Results show that the proposed scatter correction algorithm using a prior image with low scatter artifacts can substantially mitigate scatter-induced shading artifacts in both full-fan and half-fan modes.
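The weight-fitting step described in this abstract can be sketched as an ordinary least-squares problem. This is a simplified sketch under stated assumptions: images are flattened to short lists, the basis images are given directly rather than reconstructed from powers of the projection data, and a squared-error difference measure is assumed because the abstract does not name one. All function names and values are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_weights(basis_images, prior_image):
    """Weights w minimizing the squared error of sum_k w_k * basis_k vs. prior."""
    gram = [[sum(u * v for u, v in zip(bi, bj)) for bj in basis_images]
            for bi in basis_images]
    rhs = [sum(u * p for u, p in zip(bi, prior_image)) for bi in basis_images]
    return solve_linear(gram, rhs)  # normal equations


def combine(basis_images, weights):
    """Weighted pixel-wise sum of the basis images."""
    return [sum(w * img[i] for w, img in zip(weights, basis_images))
            for i in range(len(basis_images[0]))]


# Hypothetical 4-pixel basis images and a low-scatter prior image.
basis = [[1.0, 0.0, 1.0, 2.0], [0.0, 1.0, 1.0, 1.0]]
prior = [2.0, 3.0, 5.0, 7.0]
weights = fit_weights(basis, prior)
corrected = combine(basis, weights)
```

Because the prior in this toy case is an exact combination of the two basis images, the fit recovers it exactly; with real data the weighted sum would only approximate the prior image.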
Ramsperger, Robert; Meckler, Stefan; Heger, Tanja; van Uem, Janet; Hucker, Svenja; Braatz, Ulrike; Graessner, Holm; Berg, Daniela; Manoli, Yiannos; Serrano, J Artur; Ferreira, Joaquim J; Hobert, Markus A; Maetzler, Walter
2016-05-01
Dyskinesias in Parkinson's disease (PD) patients are a common side effect of long-term dopaminergic therapy and are associated with motor dysfunctions, including gait and balance deficits. Although promising compounds have been developed to treat these symptoms, clinical trials have failed. This failure may, at least partly, be explained by the lack of objective and continuous assessment strategies. This study tested the clinical validity and ecological effect of an algorithm that detects and quantifies dyskinesias of the legs using a single ankle-worn sensor. Twenty-three PD patients (seven with leg dyskinesias) and 13 control subjects were investigated in the lab. Participants performed purposeful daily activity-like tasks while being video-taped. Clinical evaluation was performed using the leg dyskinesia item of the Unified Dyskinesia Rating Scale. The ecological effect of the developed algorithm was investigated in a multi-center, 12-week, home-based sub-study that included three patients with and seven without dyskinesias. In the lab-based sub-study, the sensor-based algorithm exhibited a specificity of 98%, a sensitivity of 85%, and an accuracy of 0.96 for the detection of dyskinesias and a correlation level of 0.61 (p < 0.001) with the clinical severity score. In the home-based sub-study, all patients could be correctly classified regarding the presence or absence of leg dyskinesias, supporting the ecological relevance of the algorithm. This study provides evidence of clinical validity and ecological effect of an algorithm derived from a single sensor on the ankle for detecting leg dyskinesias in PD patients. These results should motivate the investigation of leg dyskinesias in larger studies using wearable sensors. Copyright © 2016 Elsevier Ltd. All rights reserved.
Incorporating spatial context into statistical classification of multidimensional image data
NASA Technical Reports Server (NTRS)
Bauer, M. E. (Principal Investigator); Tilton, J. C.; Swain, P. H.
1981-01-01
Compound decision theory is employed to develop a general statistical model for classifying image data using spatial context. The classification algorithm developed from this model exploits the tendency of certain ground-cover classes to occur more frequently in some spatial contexts than in others. A key input to this contextual classifier is a quantitative characterization of this tendency: the context function. Several methods for estimating the context function are explored, and two complementary methods are recommended. The contextual classifier is shown to produce substantial improvements in classification accuracy compared to the accuracy produced by a non-contextual uniform-priors maximum likelihood classifier when these methods of estimating the context function are used. An approximate algorithm, which cuts computational requirements by over one-half, is presented. The search for an optimal implementation is furthered by an exploration of the relative merits of using spectral classes or information classes for classification and/or context function estimation.
Comparative Analysis of Document level Text Classification Algorithms using R
NASA Astrophysics Data System (ADS)
Syamala, Maganti; Nalini, N. J., Dr; Maguluri, Lakshamanaphaneendra; Ragupathy, R., Dr.
2017-08-01
Over the past few decades, tremendous volumes of data have become available on the Internet in structured or unstructured form. Information on the Internet is also growing exponentially, so there is an emerging need for text classifiers. Text mining is an interdisciplinary field that draws on information retrieval, data mining, machine learning, statistics and computational linguistics. To handle this situation, a wide range of supervised learning algorithms has been introduced. Among these, K-Nearest Neighbor (KNN) is an efficient and simple classifier in the text classification family. However, KNN suffers from imbalanced class distributions and noisy term features. To cope with this challenge, we use document-based centroid dimensionality reduction (CentroidDR) implemented in R. By combining these two text classification techniques, KNN and centroid classifiers, we propose a scalable and effective flat classifier, called MCenKNN, which performs substantially better than CenKNN.
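The combination of centroid-based dimensionality reduction and KNN described in this abstract can be illustrated with a minimal sketch. The authors worked in R and their MCenKNN details are not given here, so the Python code below is only an assumption about the general idea: each document is re-expressed as its cosine similarity to every class centroid, and KNN then votes in that reduced space. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def class_centroids(X, y):
    """Mean term vector per class label."""
    sums, counts = {}, Counter(y)
    for vec, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, val in enumerate(vec):
            acc[i] += val
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}

def project(vec, centroids, order):
    """Reduce a term vector to one similarity score per class centroid."""
    return [cosine(vec, centroids[c]) for c in order]

def knn_predict(train_Z, train_y, z, k=3):
    """Majority vote among the k nearest training points in the reduced space."""
    nearest = sorted((math.dist(t, z), label) for t, label in zip(train_Z, train_y))
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]

# Toy term-count vectors (hypothetical data, three terms, two classes).
X = [[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 0]]
y = ["sports", "sports", "politics", "politics"]
centroids = class_centroids(X, y)
order = sorted(centroids)
Z = [project(v, centroids, order) for v in X]
prediction = knn_predict(Z, y, project([1, 0, 0], centroids, order), k=3)
```

The reduced space has one dimension per class rather than one per term, which is what makes the KNN step cheaper and less sensitive to noisy terms in this sketch.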
A Theoretical Analysis of Why Hybrid Ensembles Work.
Hsu, Kuo-Wei
2017-01-01
Inspired by the group decision making process, ensembles or combinations of classifiers have been found favorable in a wide variety of application domains. Some researchers propose to use the mixture of two different types of classification algorithms to create a hybrid ensemble. Why does such an ensemble work? The question remains. Following the concept of diversity, which is one of the fundamental elements of the success of ensembles, we conduct a theoretical analysis of why hybrid ensembles work, connecting using different algorithms to accuracy gain. We also conduct experiments on classification performance of hybrid ensembles of classifiers created by decision tree and naïve Bayes classification algorithms, each of which is a top data mining algorithm and often used to create non-hybrid ensembles. Therefore, through this paper, we provide a complement to the theoretical foundation of creating and using hybrid ensembles.
Ant-cuckoo colony optimization for feature selection in digital mammogram.
Jona, J B; Nagaveni, N
2014-01-15
Digital mammography is the only effective screening method for detecting breast cancer. Gray Level Co-occurrence Matrix (GLCM) textural features are extracted from the mammogram. Not all of the features are essential for classifying the mammogram, so identifying the relevant features is the aim of this work. Feature selection improves the classification rate and accuracy of any classifier. In this study, a new hybrid metaheuristic named Ant-Cuckoo Colony Optimization, a hybrid of Ant Colony Optimization (ACO) and Cuckoo Search (CS), is proposed for feature selection in digital mammograms. ACO is a good metaheuristic optimization technique, but its drawback is that the ants follow the path where the pheromone density is high, which slows the whole process; hence CS is employed to carry out the local search of ACO. A Support Vector Machine (SVM) classifier with a Radial Basis Function (RBF) kernel is used along with ACO to distinguish normal from abnormal mammograms. Experiments are conducted on the miniMIAS database. The performance of the new hybrid algorithm is compared with the ACO and PSO algorithms. The results show that the hybrid Ant-Cuckoo Colony Optimization algorithm is more accurate than the other techniques.
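The GLCM textural features mentioned in this abstract can be sketched as follows. The abstract does not state which offsets, gray-level quantization, or features the authors used, so the single horizontal offset, the non-symmetric matrix, and the three features below (contrast, energy, homogeneity) are assumptions chosen only to illustrate the technique; the function names are hypothetical.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalized, non-symmetric gray-level co-occurrence matrix.

    image  : 2D list of integer gray levels in range(levels)
    dx, dy : pixel offset defining which neighbor is paired with each pixel
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    if total == 0:
        raise ValueError("offset produces no pixel pairs for this image size")
    return [[n / total for n in row] for row in counts]

def contrast(p):
    """Sum of p[i][j] weighted by squared gray-level difference."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))

def energy(p):
    """Sum of squared co-occurrence probabilities."""
    return sum(v * v for row in p for v in row)

def homogeneity(p):
    """Sum of p[i][j] weighted toward the matrix diagonal."""
    n = len(p)
    return sum(p[i][j] / (1 + abs(i - j)) for i in range(n) for j in range(n))

# Tiny two-level image (hypothetical data).
patch = [[0, 0, 1],
         [0, 1, 1]]
p = glcm(patch, levels=2)
features = [contrast(p), energy(p), homogeneity(p)]
```

A feature selection method such as the one proposed would then search over subsets of a larger feature vector of this kind.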
Spatial Statistics for Tumor Cell Counting and Classification
NASA Astrophysics Data System (ADS)
Wirjadi, Oliver; Kim, Yoo-Jin; Breuel, Thomas
To count and classify cells in histological sections is a standard task in histology. One example is the grading of meningiomas, benign tumors of the meninges, which requires assessing the fraction of proliferating cells in an image. As this process is very time consuming when performed manually, automation is required. To address such problems, we propose a novel application of Markov point process methods in computer vision, leading to algorithms for computing the locations of circular objects in images. In contrast to previous algorithms using such spatial statistics methods in image analysis, the present one is fully trainable. This is achieved by combining point process methods with statistical classifiers. Using simulated data, the method proposed in this paper will be shown to be more accurate and more robust to noise than standard image processing methods. On the publicly available SIMCEP benchmark for cell image analysis algorithms, the cell count performance of the present paper is significantly more accurate than results published elsewhere, especially when cells form dense clusters. Furthermore, the proposed system performs as well as a state-of-the-art algorithm for the computer-aided histological grading of meningiomas when combined with a simple k-nearest neighbor classifier for identifying proliferating cells.
Design of the OMPS limb sensor correction algorithm
NASA Astrophysics Data System (ADS)
Jaross, Glen; McPeters, Richard; Seftor, Colin; Kowitt, Mark
The Sensor Data Records (SDR) for the Ozone Mapping and Profiler Suite (OMPS) on NPOESS (National Polar-orbiting Operational Environmental Satellite System) contain geolocated and calibrated radiances, and are similar to the Level 1 data of NASA Earth Observing System and other programs. The SDR algorithms (one for each of the 3 OMPS focal planes) are the processes by which the Raw Data Records (RDR) from the OMPS sensors are converted into the records that contain all data necessary for ozone retrievals. Consequently, the algorithms must correct and calibrate Earth signals, geolocate the data, and identify and ingest collocated ancillary data. As with other limb sensors, ozone profile retrievals are relatively insensitive to calibration errors due to the use of altitude normalization and wavelength pairing. But the profile retrievals as they pertain to OMPS are not immune from sensor changes. In particular, the OMPS Limb sensor images an altitude range of > 100 km and a spectral range of 290-1000 nm on its detector. Uncorrected sensor degradation and spectral registration drifts can lead to changes in the measured radiance profile, which in turn affects the ozone trend measurement. Since OMPS is intended for long-term monitoring, sensor calibration is a specific concern. The calibration is maintained via the ground data processing. This means that all sensor calibration data, including direct solar measurements, are brought down in the raw data and processed separately by the SDR algorithms. One of the sensor corrections performed by the algorithm is the correction for stray light. The imaging spectrometer and the unique focal plane design of OMPS make these corrections particularly challenging and important. Following an overview of the algorithm flow, we will briefly describe the sensor stray light characterization and the correction approach used in the code.
Filli, Lukas; Marcon, Magda; Scholz, Bernhard; Calcagni, Maurizio; Finkenstädt, Tim; Andreisek, Gustav; Guggenberger, Roman
2014-12-01
The aim of this study was to evaluate a prototype correction algorithm to reduce metal artefacts in flat detector computed tomography (FDCT) of scaphoid fixation screws. FDCT has gained interest in imaging small anatomic structures of the appendicular skeleton. Angiographic C-arm systems with flat detectors allow fluoroscopy and FDCT imaging in a one-stop procedure emphasizing their role as an ideal intraoperative imaging tool. However, FDCT imaging can be significantly impaired by artefacts induced by fixation screws. Following ethical board approval, commercially available scaphoid fixation screws were inserted into six cadaveric specimens in order to fix artificially induced scaphoid fractures. FDCT images corrected with the algorithm were compared to uncorrected images both quantitatively and qualitatively by two independent radiologists in terms of artefacts, screw contour, fracture line visibility, bone visibility, and soft tissue definition. Normal distribution of variables was evaluated using the Kolmogorov-Smirnov test. In case of normal distribution, quantitative variables were compared using paired Student's t tests. The Wilcoxon signed-rank test was used for quantitative variables without normal distribution and all qualitative variables. A p value of < 0.05 was considered to indicate statistically significant differences. Metal artefacts were significantly reduced by the correction algorithm (p < 0.001), and the fracture line was more clearly defined (p < 0.01). The inter-observer reliability was "almost perfect" (intra-class correlation coefficient 0.85, p < 0.001). The prototype correction algorithm in FDCT for metal artefacts induced by scaphoid fixation screws may facilitate intra- and postoperative follow-up imaging. Flat detector computed tomography (FDCT) is a helpful imaging tool for scaphoid fixation. The correction algorithm significantly reduces artefacts in FDCT induced by scaphoid fixation screws. This may facilitate intra- and postoperative follow-up imaging.
NASA Technical Reports Server (NTRS)
Hlaing, Soe; Gilerson, Alexander; Harmal, Tristan; Tonizzo, Alberto; Weidemann, Alan; Arnone, Robert; Ahmed, Samir
2012-01-01
Water-leaving radiances, retrieved from in situ or satellite measurements, need to be corrected for the bidirectional properties of the measured light in order to standardize the data and make them comparable with each other. The current operational algorithm for the correction of bidirectional effects from the satellite ocean color data is optimized for typical oceanic waters. However, versions of bidirectional reflectance correction algorithms specifically tuned for typical coastal waters and other case 2 conditions are particularly needed to improve the overall quality of those data. In order to analyze the bidirectional reflectance distribution function (BRDF) of case 2 waters, a dataset of typical remote sensing reflectances was generated through radiative transfer simulations for a large range of viewing and illumination geometries. Based on this simulated dataset, a case 2 water focused remote sensing reflectance model is proposed to correct above-water and satellite water-leaving radiance data for bidirectional effects. The proposed model is first validated with a one year time series of in situ above-water measurements acquired by collocated multispectral and hyperspectral radiometers, which have different viewing geometries installed at the Long Island Sound Coastal Observatory (LISCO). Match-ups and intercomparisons performed on these concurrent measurements show that the proposed algorithm outperforms the algorithm currently in use at all wavelengths, with average improvement of 2.4% over the spectral range. LISCO's time series data have also been used to evaluate improvements in match-up comparisons of Moderate Resolution Imaging Spectroradiometer satellite data when the proposed BRDF correction is used in lieu of the current algorithm. It is shown that the discrepancies between coincident in-situ sea-based and satellite data decreased by 3.15% with the use of the proposed algorithm.
NASA Astrophysics Data System (ADS)
Santospirito, S. P.; Słyk, Kamil; Luo, Bin; Łopatka, Rafał; Gilmour, Oliver; Rudlin, John
2013-05-01
Detection of defects in Laser Powder Deposition (LPD) produced components has been achieved by laser thermography. An automatic in-process NDT defect detection software system has been developed for the analysis of laser thermography to automatically detect, reliably measure and then sentence defects in individual beads of LPD components. A deposition path profile definition has been introduced so all laser powder deposition beads can be modeled, and the inspection system has been developed to automatically generate an optimized inspection plan in which sampling images follow the deposition track, and automatically control and communicate with robot-arms, the source laser and cameras to implement image acquisition. Algorithms were developed so that the defect sizes can be correctly evaluated and these have been confirmed using test samples. Individual inspection images can also be stitched together for a single bead, a layer of beads or multiple layers of beads so that defects can be mapped through the additive process. A mathematical model was built up to analyze and evaluate the movement of heat throughout the inspection bead. Inspection processes were developed and positional and temporal gradient algorithms have been used to measure the flaw sizes. Defect analysis is then performed to determine if the defect(s) can be further classified (crack, lack of fusion, porosity) and the sentencing engine then compares the most significant defect or group of defects against the acceptance criteria - independent of human decisions. Testing on manufactured defects from the EC funded INTRAPID project has successfully detected and correctly sentenced all samples.
NASA Technical Reports Server (NTRS)
Truong, T. K.; Hsu, I. S.; Eastman, W. L.; Reed, I. S.
1987-01-01
It is well known that the Euclidean algorithm or its equivalent, continued fractions, can be used to find the error locator polynomial and the error evaluator polynomial in Berlekamp's key equation needed to decode a Reed-Solomon (RS) code. A simplified procedure is developed and proved to correct erasures as well as errors by replacing the initial condition of the Euclidean algorithm by the erasure locator polynomial and the Forney syndrome polynomial. By this means, the errata locator polynomial and the errata evaluator polynomial can be obtained, simultaneously and simply, by the Euclidean algorithm only. With this improved technique the complexity of time domain RS decoders for correcting both errors and erasures is reduced substantially from previous approaches. As a consequence, decoders for correcting both errors and erasures of RS codes can be made more modular, regular, simple, and naturally suitable for both VLSI and software implementation. An example illustrating this modified decoding procedure is given for a (15, 9) RS code.
Atmospheric correction of SeaWiFS imagery for turbid coastal and inland waters.
Ruddick, K G; Ovidio, F; Rijkeboer, M
2000-02-20
The standard SeaWiFS atmospheric correction algorithm, designed for open ocean water, has been extended for use over turbid coastal and inland waters. Failure of the standard algorithm over turbid waters can be attributed to invalid assumptions of zero water-leaving radiance for the near-infrared bands at 765 and 865 nm. In the present study these assumptions are replaced by the assumptions of spatial homogeneity of the 765:865-nm ratios for aerosol reflectance and for water-leaving reflectance. These two ratios are imposed as calibration parameters after inspection of the Rayleigh-corrected reflectance scatterplot. The performance of the new algorithm is demonstrated for imagery of Belgian coastal waters and yields physically realistic water-leaving radiance spectra. A preliminary comparison with in situ radiance spectra for the Dutch Lake Markermeer shows significant improvement over the standard atmospheric correction algorithm. An analysis is made of the sensitivity of results to the choice of calibration parameters, and perspectives for application of the method to other sensors are briefly discussed.
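The role of the two imposed 765:865-nm ratios can be illustrated with a simplified worked example. The sketch below assumes that the Rayleigh-corrected reflectance in each near-infrared band is simply the sum of an aerosol term and a water-leaving term, ignoring atmospheric transmittance and other factors the published algorithm may include; under that assumption the two ratios give two linear equations in two unknowns. The function name, the symbols `epsilon` and `alpha`, and the numerical values are hypothetical.

```python
def split_nir_reflectance(rho_rc_765, rho_rc_865, epsilon, alpha):
    """Split Rayleigh-corrected NIR reflectance into aerosol and water parts at 865 nm.

    Simplified model (an assumption, not the published formulation):
        rho_rc(765) = epsilon * rho_a(865) + alpha * rho_w(865)
        rho_rc(865) =           rho_a(865) +         rho_w(865)
    where epsilon = rho_a(765) / rho_a(865) and alpha = rho_w(765) / rho_w(865)
    are the two spatially homogeneous calibration ratios.
    """
    if alpha == epsilon:
        raise ValueError("the two ratios must differ for the system to be solvable")
    rho_a_865 = (alpha * rho_rc_865 - rho_rc_765) / (alpha - epsilon)
    rho_w_865 = (rho_rc_765 - epsilon * rho_rc_865) / (alpha - epsilon)
    return rho_a_865, rho_w_865

# Hypothetical pixel: values chosen only to exercise the arithmetic.
aerosol_865, water_865 = split_nir_reflectance(0.0382, 0.03, epsilon=1.05, alpha=1.72)
```

If the water-leaving term comes out as zero, the sketch reduces to the open-ocean assumption that the abstract says fails over turbid water.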
Li, Jinyan; Fong, Simon; Wong, Raymond K; Millham, Richard; Wong, Kelvin K L
2017-06-28
To address the high-dimensional characteristics of datasets, we propose a new method based on the Wolf Search Algorithm (WSA) for optimising the feature selection problem. The proposed approach uses the natural strategy established by Charles Darwin; that is, 'It is not the strongest of the species that survives, but the most adaptable'. This means that in the evolution of a swarm, the elitists are motivated to quickly obtain more and better resources. The memory function helps the proposed method to avoid repeat searches for the worst position in order to enhance the effectiveness of the search, while the binary strategy simplifies the feature selection problem into a similar problem of function optimisation. Furthermore, the wrapper strategy gathers these strengthened wolves with the classifier of extreme learning machine to find a sub-dataset with a reasonable number of features that offers the maximum correctness of global classification models. The experimental results from the six public high-dimensional bioinformatics datasets tested demonstrate that the proposed method can best some of the conventional feature selection methods by up to 29% in classification accuracy, and outperform previous WSAs by up to 99.81% in computational time.
Kelly, Jane M.; Osamba, Benta; Garg, Renu M.; Hamel, Mary J.; Lewis, Jennifer J.; Rowe, Samantha Y.; Rowe, Alexander K.; Deming, Michael S.
2001-01-01
Objectives. To characterize community health worker (CHW) performance using an algorithm for managing common childhood illnesses in Siaya District, Kenya, we conducted CHW evaluations in 1998, 1999, and 2001. Methods. Randomly selected CHWs were observed managing sick outpatient and inpatient children at a hospital, and their management was compared with that of an expert clinician who used the algorithm. Results. One hundred, 108, and 114 CHWs participated in the evaluations in 1998, 1999, and 2001, respectively. The proportions of children treated “adequately” (with an antibiotic, antimalarial, oral rehydration solution, or referral, depending on the child's disease classifications) were 57.8%, 35.5%, and 38.9%, respectively, for children with a severe classification and 27.7%, 77.3%, and 74.3%, respectively, for children with a moderate (but not severe) classification. CHWs adequately treated 90.5% of malaria cases (the most commonly encountered classification). CHWs often made mistakes assessing symptoms, classifying illnesses, and prescribing correct doses of medications. Conclusions. Deficiencies were found in the management of sick children by CHWs, although care was not consistently poor. Key reasons for the deficiencies appear to be guideline complexity and inadequate clinical supervision; other possible causes are discussed. PMID:11574324
Semi-supervised anomaly detection - towards model-independent searches of new physics
NASA Astrophysics Data System (ADS)
Kuusela, Mikael; Vatanen, Tommi; Malmi, Eric; Raiko, Tapani; Aaltonen, Timo; Nagai, Yoshikazu
2012-06-01
Most classification algorithms used in high energy physics fall under the category of supervised machine learning. Such methods require a training set containing both signal and background events and are prone to classification errors should this training data be systematically inaccurate for example due to the assumed MC model. To complement such model-dependent searches, we propose an algorithm based on semi-supervised anomaly detection techniques, which does not require a MC training sample for the signal data. We first model the background using a multivariate Gaussian mixture model. We then search for deviations from this model by fitting to the observations a mixture of the background model and a number of additional Gaussians. This allows us to perform pattern recognition of any anomalous excess over the background. We show by a comparison to neural network classifiers that such an approach is a lot more robust against misspecification of the signal MC than supervised classification. In cases where there is an unexpected signal, a neural network might fail to correctly identify it, while anomaly detection does not suffer from such a limitation. On the other hand, when there are no systematic errors in the training data, both methods perform comparably.
A real-time method for autonomous passive acoustic detection-classification of humpback whales.
Abbot, Ted A; Premus, Vincent E; Abbot, Philip A
2010-05-01
This paper describes a method for real-time, autonomous, joint detection-classification of humpback whale vocalizations. The approach adapts the spectrogram correlation method used by Mellinger and Clark [J. Acoust. Soc. Am. 107, 3518-3529 (2000)] for bowhead whale endnote detection to the humpback whale problem. The objective is the implementation of a system to determine the presence or absence of humpback whales with passive acoustic methods and to perform this classification with low false alarm rate in real time. Multiple correlation kernels are used due to the diversity of humpback song. The approach also takes advantage of the fact that humpbacks tend to vocalize repeatedly for extended periods of time, and identification is declared only when multiple song units are detected within a fixed time interval. Humpback whale vocalizations from Alaska, Hawaii, and Stellwagen Bank were used to train the algorithm. It was then tested on independent data obtained off Kaena Point, Hawaii in February and March of 2009. Results show that the algorithm successfully classified humpback whales autonomously in real time, with a measured probability of correct classification in excess of 74% and a measured probability of false alarm below 1%.
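The rule that identification is declared only when multiple song units are detected within a fixed time interval can be sketched as a sliding-window count. The abstract does not give the required number of units or the interval length, so `min_units` and `window_s` below are placeholder assumptions, and the function name is hypothetical. The input is taken to be the times at which the spectrogram correlation stage reported a unit.

```python
from collections import deque

def declare_identifications(detection_times, min_units=3, window_s=60.0):
    """Return the times at which enough recent unit detections have accumulated.

    detection_times : times (seconds) of individual song-unit detections
    min_units       : detections required inside the window (assumed value)
    window_s        : window length in seconds (assumed value)
    """
    recent = deque()
    declared = []
    for t in sorted(detection_times):
        recent.append(t)
        # Drop detections that have fallen out of the window ending at t.
        while recent and t - recent[0] > window_s:
            recent.popleft()
        if len(recent) >= min_units:
            declared.append(t)
    return declared

# Hypothetical detection times: one isolated hit, a cluster, another isolated hit.
declared = declare_identifications([0, 100, 110, 125, 400])
```

Isolated detections never trigger a declaration in this sketch, which is one way a rule of this kind can keep the false alarm rate low.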
Closing the Loop in ICU Decision Support: Physiologic Event Detection, Alerts, and Documentation
Norris, Patrick R.; Dawant, Benoit M.
2002-01-01
Automated physiologic event detection and alerting is a challenging task in the ICU. Ideally care providers should be alerted only when events are clinically significant and there is opportunity for corrective action. However, the concepts of clinical significance and opportunity are difficult to define in automated systems, and effectiveness of alerting algorithms is difficult to measure. This paper describes recent efforts on the Simon project to capture information from ICU care providers about patient state and therapy in response to alerts, in order to assess the value of event definitions and progressively refine alerting algorithms. Event definitions for intracranial pressure and cerebral perfusion pressure were studied by implementing a reliable system to automatically deliver alerts to clinical users’ alphanumeric pagers, and to capture associated documentation about patient state and therapy when the alerts occurred. During a 6-month test period in the trauma ICU at Vanderbilt University Medical Center, 530 alerts were detected in 2280 hours of data spanning 14 patients. Clinical users electronically documented 81% of these alerts as they occurred. Retrospectively classifying documentation based on therapeutic actions taken, or reasons why actions were not taken, provided useful information about ways to potentially improve event definitions and enhance system utility.
Artes, Paul H; McLeod, David; Henson, David B
2002-01-01
To report on differences between the latency distributions of responses to stimuli and to false-positive catch trials in suprathreshold perimetry. To describe an algorithm for defining response time windows and to report on its performance in discriminating between true- and false-positive responses on the basis of response time (RT). A sample of 435 largely inexperienced patients underwent suprathreshold visual field examination on a perimeter that was modified to record RTs. Data were analyzed from 60,500 responses to suprathreshold stimuli and from 523 false-positive responses to catch trials. False-positive responses had much more variable latencies than responses to suprathreshold stimuli. An algorithm defining RT windows on the basis of z-transformed individual latency samples correctly identified more than 70% of false-positive responses to catch trials, whereas fewer than 3% of responses to suprathreshold stimuli were classified as false-positive responses. Latency analysis can be used to detect a substantial proportion of false-positive responses in suprathreshold perimetry. Rejection of such responses may increase the reliability of visual field screening by reducing variability and bias in a small but clinically important proportion of patients.
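A response time window based on z-transformed latencies can be sketched as follows. The abstract does not give the exact window definition, so this example assumes a symmetric window of plus or minus `z_limit` standard deviations around an individual patient's mean latency; the threshold of 2.0, the function names, and the sample latencies are hypothetical.

```python
from statistics import mean, stdev

def rt_window(latencies_ms, z_limit=2.0):
    """Lower and upper response-time bounds from one patient's latency sample.

    Needs at least two latencies, because the sample standard deviation is used.
    """
    mu = mean(latencies_ms)
    sigma = stdev(latencies_ms)
    return mu - z_limit * sigma, mu + z_limit * sigma

def outside_window(rt_ms, window):
    """True if a response falls outside the window and is a candidate false positive."""
    low, high = window
    return rt_ms < low or rt_ms > high

# Hypothetical latencies (ms) of one patient's responses to seen stimuli.
window = rt_window([400, 420, 380, 410, 390])
flags = [outside_window(rt, window) for rt in (150, 405, 900)]
```

Because the window is derived per patient, a slow but consistent responder is not penalized in this sketch; only responses that are unusual for that individual are flagged.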