Sample records for reduced feature set

  1. Fast detection of vascular plaque in optical coherence tomography images using a reduced feature set

    NASA Astrophysics Data System (ADS)

    Prakash, Ammu; Ocana Macias, Mariano; Hewko, Mark; Sowa, Michael; Sherif, Sherif

    2018-03-01

    Vascular plaque can be detected in optical coherence tomography (OCT) images using the full set of 26 Haralick textural features and a standard K-means clustering algorithm. However, using all 26 textural features is computationally expensive and may not be feasible for real-time implementation. In this work, we identified a reduced set of 3 textural features that characterize vascular plaque and used a generalized Fuzzy C-means clustering algorithm. Our work involves three steps: 1) reducing the full set of 26 textural features to a reduced set of 3 textural features using a genetic algorithm (GA) optimization method, 2) applying an unsupervised generalized clustering algorithm (Fuzzy C-means) to the reduced feature space, and 3) validating our results against histology and actual photographic images of vascular plaque. Our results show an excellent match with histology and actual photographic images of vascular tissue. Therefore, our results could provide an efficient pre-clinical tool for the detection of vascular plaque in real-time OCT imaging.
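
    The pipeline described above (texture features per image region, then unsupervised clustering on a reduced subset) can be sketched as follows. This is a minimal stand-in, not the authors' method: scikit-image's GLCM properties replace the GA-selected subset of 26 Haralick features, K-means replaces the generalized Fuzzy C-means step, and a stock test image replaces OCT data (skimage >= 0.19 naming assumed).

    ```python
    # Sketch: cluster image patches on a reduced set of GLCM texture features.
    import numpy as np
    from skimage import data
    from skimage.feature import graycomatrix, graycoprops  # skimage >= 0.19 naming
    from sklearn.cluster import KMeans

    image = data.camera()                      # toy 8-bit grayscale image
    patch, props = 32, ("contrast", "homogeneity", "energy")  # reduced feature subset

    features, coords = [], []
    for r in range(0, image.shape[0] - patch + 1, patch):
        for c in range(0, image.shape[1] - patch + 1, patch):
            glcm = graycomatrix(image[r:r + patch, c:c + patch],
                                distances=[1], angles=[0],
                                levels=256, symmetric=True, normed=True)
            features.append([graycoprops(glcm, p)[0, 0] for p in props])
            coords.append((r, c))

    # Two clusters as a stand-in for plaque vs. non-plaque tissue.
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(np.array(features))
    print(dict(zip(coords[:4], labels[:4])))   # cluster label per patch (first few shown)
    ```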

  2. A Reduced Set of Features for Chronic Kidney Disease Prediction

    PubMed Central

    Misir, Rajesh; Mitra, Malay; Samanta, Ranjit Kumar

    2017-01-01

    Chronic kidney disease (CKD) is one of the life-threatening diseases. Early detection and proper management are needed to improve survivability. In the UCI data set, there are 24 attributes for predicting CKD or non-CKD. At least 16 of these attributes require pathological investigations involving more resources, money, time, and uncertainty. The objective of this work is to explore whether we can predict CKD or non-CKD with reasonable accuracy using a smaller number of features. An intelligent system development approach has been used in this study. We applied one important feature selection technique to discover a reduced set of features that explains the data set much better. Two intelligent binary classification techniques have been adopted to validate the reduced feature set. Performances were evaluated in terms of four important classification evaluation parameters. Our results suggest that we may concentrate on the reduced feature set for identifying CKD, thereby reducing uncertainty, saving time, and cutting costs. PMID:28706750
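
    A minimal sketch of this workflow appears below: a filter-based selector reduces the attribute set, and two binary classifiers are scored on four common metrics. The selector, classifiers, and synthetic data are assumptions standing in for the paper's unnamed techniques and the UCI CKD records.

    ```python
    # Sketch: reduced feature set + two binary classifiers, four evaluation metrics.
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SelectKBest, mutual_info_classif
    from sklearn.linear_model import LogisticRegression
    from sklearn.neural_network import MLPClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import confusion_matrix, accuracy_score, f1_score

    # Synthetic stand-in for the 24-attribute UCI CKD data.
    X, y = make_classification(n_samples=400, n_features=24, n_informative=8, random_state=0)
    X_red = SelectKBest(mutual_info_classif, k=8).fit_transform(X, y)   # reduced feature set

    Xtr, Xte, ytr, yte = train_test_split(X_red, y, test_size=0.3, random_state=0)
    for clf in (LogisticRegression(max_iter=1000), MLPClassifier(max_iter=1000, random_state=0)):
        pred = clf.fit(Xtr, ytr).predict(Xte)
        tn, fp, fn, tp = confusion_matrix(yte, pred).ravel()
        print(type(clf).__name__,
              dict(accuracy=round(accuracy_score(yte, pred), 3),
                   sensitivity=round(tp / (tp + fn), 3),
                   specificity=round(tn / (tn + fp), 3),
                   f1=round(f1_score(yte, pred), 3)))
    ```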

  3. Effective traffic features selection algorithm for cyber-attacks samples

    NASA Astrophysics Data System (ADS)

    Li, Yihong; Liu, Fangzheng; Du, Zhenyu

    2018-05-01

    By studying defense schemes against network attacks, this paper proposes an effective traffic feature selection algorithm based on k-means++ clustering to deal with the high dimensionality of traffic features extracted from cyber-attack samples. First, the algorithm divides the original feature set into an attack traffic feature set and a background traffic feature set by clustering. Then, we calculate the variation in clustering performance after removing a certain feature. Finally, the degree of distinctiveness of each feature vector is evaluated according to this result; a feature vector is considered effective if its degree of distinctiveness exceeds a set threshold. The purpose of this paper is to select the effective features from the extracted original feature set. In this way, the dimensionality of the features can be reduced so as to reduce the space-time overhead of subsequent detection. The experimental results show that the proposed algorithm is feasible and has some advantages over other selection algorithms.
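
    The leave-one-feature-out idea can be sketched as below: score each feature by how much k-means++ clustering quality changes when that feature is removed, then keep the features whose removal hurts most. The silhouette score and the threshold value are illustrative assumptions, not the paper's exact distinctiveness measure.

    ```python
    # Sketch: per-feature distinctiveness via change in k-means++ clustering quality.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.cluster import KMeans
    from sklearn.metrics import silhouette_score

    X, _ = make_classification(n_samples=300, n_features=10, n_informative=4, random_state=1)

    def clustering_quality(data):
        labels = KMeans(n_clusters=2, init="k-means++", n_init=10, random_state=1).fit_predict(data)
        return silhouette_score(data, labels)

    baseline = clustering_quality(X)
    # Distinctiveness of feature j = drop in quality when feature j is removed.
    distinctiveness = {j: baseline - clustering_quality(np.delete(X, j, axis=1))
                       for j in range(X.shape[1])}

    threshold = 0.01                                    # assumed threshold
    effective = [j for j, d in distinctiveness.items() if d > threshold]
    print("effective feature indices:", effective)
    ```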

  4. Perceptual quality estimation of H.264/AVC videos using reduced-reference and no-reference models

    NASA Astrophysics Data System (ADS)

    Shahid, Muhammad; Pandremmenou, Katerina; Kondi, Lisimachos P.; Rossholm, Andreas; Lövström, Benny

    2016-09-01

    Reduced-reference (RR) and no-reference (NR) models for video quality estimation, using features that account for the impact of coding artifacts, spatio-temporal complexity, and packet losses, are proposed. The purpose of this study is to analyze a number of potentially quality-relevant features in order to select the most suitable set of features for building the desired models. The proposed sets of features have not been used in the literature and some of the features are used for the first time in this study. The features are employed by the least absolute shrinkage and selection operator (LASSO), which selects only the most influential of them toward perceptual quality. For comparison, we apply feature selection in the complete feature sets and ridge regression on the reduced sets. The models are validated using a database of H.264/AVC encoded videos that were subjectively assessed for quality in an ITU-T compliant laboratory. We infer that just two features selected by RR LASSO and two bitstream-based features selected by NR LASSO are able to estimate perceptual quality with high accuracy, higher than that of ridge, which uses more features. The comparisons with competing works and two full-reference metrics also verify the superiority of our models.
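
    The core selection step, LASSO keeping only the most influential features versus ridge regression on a fuller set, can be sketched as follows. Synthetic features and quality scores stand in for the paper's bitstream and pixel-based features and subjective ratings.

    ```python
    # Sketch: LASSO as a sparse feature selector for quality prediction, vs. ridge.
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import LassoCV, RidgeCV
    from sklearn.model_selection import cross_val_score

    X, mos = make_regression(n_samples=200, n_features=30, n_informative=4,
                             noise=5.0, random_state=0)

    lasso = LassoCV(cv=5, random_state=0).fit(X, mos)
    selected = np.flatnonzero(lasso.coef_)              # features with nonzero weight
    print("LASSO kept", selected.size, "of", X.shape[1], "features:", selected)

    ridge = RidgeCV(alphas=np.logspace(-3, 3, 13))
    print("ridge R^2 (all features):  ", cross_val_score(ridge, X, mos, cv=5).mean())
    print("ridge R^2 (LASSO subset):  ", cross_val_score(ridge, X[:, selected], mos, cv=5).mean())
    ```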

  5. Performance comparison of phenomenology-based features to generic features for false alarm reduction in UWB SAR imagery

    NASA Astrophysics Data System (ADS)

    Marble, Jay A.; Gorman, John D.

    1999-08-01

    A feature-based approach is taken to reduce the occurrence of false alarms in foliage penetrating, ultra-wideband, synthetic aperture radar data. A set of 'generic' features is defined based on target size, shape, and pixel intensity. A second set of features is defined that contains generic features combined with features based on scattering phenomenology. Each set is combined using a quadratic polynomial discriminant (QPD), and performance is characterized by generating a receiver operating characteristic (ROC) curve. Results show that the feature set containing phenomenological features improves performance against both broadside and end-on targets. The improvement against end-on targets, however, is especially pronounced.

  6. A novel feature extraction approach for microarray data based on multi-algorithm fusion

    PubMed Central

    Jiang, Zhu; Xu, Rong

    2015-01-01

    Feature extraction is one of the most important and effective methods for reducing dimensionality in data mining, especially with the emergence of high-dimensional data such as microarray gene expression data. Feature extraction for gene selection mainly serves two purposes. One is to identify certain disease-related genes. The other is to find a compact set of discriminative genes to build a pattern classifier with reduced complexity and improved generalization capabilities. Depending on the purpose of gene selection, two types of feature extraction algorithms are employed in microarray gene expression data analysis: ranking-based feature extraction and set-based feature extraction. In ranking-based feature extraction, features are evaluated on an individual basis, generally without considering inter-relationships between features, while set-based feature extraction evaluates features based on their role in a feature set by taking into account dependencies between features. Like learning methods, feature extraction faces a generalization problem, namely robustness. However, the issue of robustness is often overlooked in feature extraction. In order to improve the accuracy and robustness of feature extraction for microarray data, a novel approach based on multi-algorithm fusion is proposed. By fusing different types of feature extraction algorithms to select features from the sample set, the proposed approach is able to improve feature extraction performance. The new approach is tested on gene expression datasets including Colon cancer data, CNS data, DLBCL data, and Leukemia data. The testing results show that the performance of this algorithm is better than existing solutions. PMID:25780277
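
    One simple way to fuse a ranking-based and a set-based algorithm is rank-level fusion, sketched below. The three scorers and the fusion rule (mean rank) are illustrative assumptions; the paper's specific fusion scheme is not reproduced.

    ```python
    # Sketch of rank-level fusion: average the rankings from several feature scorers.
    import numpy as np
    from scipy.stats import rankdata
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import f_classif, mutual_info_classif

    X, y = make_classification(n_samples=100, n_features=200, n_informative=10, random_state=0)

    scores = [
        f_classif(X, y)[0],                                          # univariate F-score (ranking-based)
        mutual_info_classif(X, y, random_state=0),                   # mutual information (ranking-based)
        RandomForestClassifier(random_state=0).fit(X, y).feature_importances_,  # set-based importance
    ]
    # Higher score -> better, so rank in descending order and average the ranks.
    mean_rank = np.mean([rankdata(-s) for s in scores], axis=0)
    fused_top20 = np.argsort(mean_rank)[:20]
    print("fused top-20 gene indices:", fused_top20)
    ```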

  7. A novel feature extraction approach for microarray data based on multi-algorithm fusion.

    PubMed

    Jiang, Zhu; Xu, Rong

    2015-01-01

    Feature extraction is one of the most important and effective methods for reducing dimensionality in data mining, especially with the emergence of high-dimensional data such as microarray gene expression data. Feature extraction for gene selection mainly serves two purposes. One is to identify certain disease-related genes. The other is to find a compact set of discriminative genes to build a pattern classifier with reduced complexity and improved generalization capabilities. Depending on the purpose of gene selection, two types of feature extraction algorithms are employed in microarray gene expression data analysis: ranking-based feature extraction and set-based feature extraction. In ranking-based feature extraction, features are evaluated on an individual basis, generally without considering inter-relationships between features, while set-based feature extraction evaluates features based on their role in a feature set by taking into account dependencies between features. Like learning methods, feature extraction faces a generalization problem, namely robustness. However, the issue of robustness is often overlooked in feature extraction. In order to improve the accuracy and robustness of feature extraction for microarray data, a novel approach based on multi-algorithm fusion is proposed. By fusing different types of feature extraction algorithms to select features from the sample set, the proposed approach is able to improve feature extraction performance. The new approach is tested on gene expression datasets including Colon cancer data, CNS data, DLBCL data, and Leukemia data. The testing results show that the performance of this algorithm is better than existing solutions.

  8. Breast Cancer Detection with Reduced Feature Set.

    PubMed

    Mert, Ahmet; Kılıç, Niyazi; Bilgili, Erdem; Akan, Aydin

    2015-01-01

    This paper explores the feature reduction properties of independent component analysis (ICA) for a breast cancer decision support system. The Wisconsin diagnostic breast cancer (WDBC) dataset is reduced to a one-dimensional feature vector by computing an independent component (IC). The original data with 30 features and the reduced single feature (IC) are used to evaluate the diagnostic accuracy of classifiers such as k-nearest neighbor (k-NN), artificial neural network (ANN), radial basis function neural network (RBFNN), and support vector machine (SVM). The comparison of the proposed classification using the IC with the original feature set is also tested under different validation (5/10-fold cross-validation) and partitioning (20%-40%) methods. These classifiers are evaluated on how effectively they categorize tumors as benign or malignant in terms of specificity, sensitivity, accuracy, F-score, Youden's index, discriminant power, and the receiver operating characteristic (ROC) curve with its criterion values, including the area under the curve (AUC) and 95% confidence interval (CI). This represents an improvement in the diagnostic decision support system, while reducing computational complexity.
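
    A minimal sketch of the comparison follows: the WDBC data are reduced to a single independent component and classifier accuracy is compared against the full 30-feature representation. Only two of the paper's four classifiers are shown, and standardization before ICA is an added assumption.

    ```python
    # Sketch: reduce WDBC to one independent component, compare classifier accuracy.
    from sklearn.datasets import load_breast_cancer
    from sklearn.decomposition import FastICA
    from sklearn.preprocessing import StandardScaler
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    X, y = load_breast_cancer(return_X_y=True)            # WDBC, 30 features
    X_std = StandardScaler().fit_transform(X)
    X_ic = FastICA(n_components=1, random_state=0).fit_transform(X_std)   # one IC

    for name, data in (("30 features", X_std), ("1 IC", X_ic)):
        for clf in (KNeighborsClassifier(), SVC()):
            acc = cross_val_score(clf, data, y, cv=10).mean()
            print(f"{name:11s} {type(clf).__name__:22s} accuracy={acc:.3f}")
    ```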

  9. A Review of Feature Extraction Software for Microarray Gene Expression Data

    PubMed Central

    Tan, Ching Siang; Ting, Wai Soon; Mohamad, Mohd Saberi; Chan, Weng Howe; Deris, Safaai; Ali Shah, Zuraini

    2014-01-01

    When gene expression data are too large to be processed, they are transformed into a reduced representation set of genes. Transforming large-scale gene expression data into a set of genes is called feature extraction. If the genes extracted are carefully chosen, this gene set can extract the relevant information from the large-scale gene expression data, allowing further analysis by using this reduced representation instead of the full size data. In this paper, we review numerous software applications that can be used for feature extraction. The software reviewed is mainly for Principal Component Analysis (PCA), Independent Component Analysis (ICA), Partial Least Squares (PLS), and Local Linear Embedding (LLE). A summary and sources of the software are provided in the last section for each feature extraction method. PMID:25250315
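
    For orientation, the four feature-extraction families reviewed can each be applied to the same toy expression matrix with scikit-learn, one of many software options the review covers; the component counts below are arbitrary.

    ```python
    # Sketch: PCA, ICA, PLS, and LLE applied to a toy gene-expression-like matrix.
    from sklearn.datasets import make_classification
    from sklearn.decomposition import PCA, FastICA
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.manifold import LocallyLinearEmbedding

    X, y = make_classification(n_samples=80, n_features=500, n_informative=10, random_state=0)

    X_pca = PCA(n_components=5).fit_transform(X)
    X_ica = FastICA(n_components=5, random_state=0).fit_transform(X)
    X_pls, _ = PLSRegression(n_components=5).fit_transform(X, y)    # supervised projection
    X_lle = LocallyLinearEmbedding(n_components=5, n_neighbors=10).fit_transform(X)

    for name, Z in (("PCA", X_pca), ("ICA", X_ica), ("PLS", X_pls), ("LLE", X_lle)):
        print(name, Z.shape)    # each: (80, 5) reduced representation
    ```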

  10. A dimension reduction strategy for improving the efficiency of computer-aided detection for CT colonography

    NASA Astrophysics Data System (ADS)

    Song, Bowen; Zhang, Guopeng; Wang, Huafeng; Zhu, Wei; Liang, Zhengrong

    2013-02-01

    Various types of features, e.g., geometric features, texture features, projection features, etc., have been introduced for polyp detection and differentiation tasks via computer-aided detection and diagnosis (CAD) for computed tomography colonography (CTC). Although these features together cover more information in the data, some of them are statistically highly related to others, which makes the feature set redundant and burdens the computation task of CAD. In this paper, we propose a new dimension reduction method which combines hierarchical clustering and principal component analysis (PCA) for the false positive (FP) reduction task. First, we group all the features based on their similarity using hierarchical clustering, and then PCA is employed within each group. Different numbers of principal components are selected from each group to form the final feature set. A support vector machine is used to perform the classification. The results show that when three principal components were chosen from each group we can achieve an area under the receiver operating characteristic curve of 0.905, which is as high as that of the original dataset. Meanwhile, the computation time is reduced by 70% and the feature set size is reduced by 77%. It can be concluded that the proposed method captures the most important information of the feature set and the classification accuracy is not affected after the dimension reduction. The result is promising, and further investigation, such as automatic threshold setting, is worthwhile and in progress.
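
    The grouping idea can be sketched as below: cluster correlated features hierarchically, run PCA within each group, concatenate a few components per group, and classify with an SVM. The group count, the components-per-group limit, and the correlation-based distance are assumptions standing in for the paper's specific settings.

    ```python
    # Sketch: hierarchical feature grouping + per-group PCA + SVM classification.
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from sklearn.datasets import make_classification
    from sklearn.decomposition import PCA
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=300, n_features=60, n_informative=12, random_state=0)

    # Distance between features: 1 - |correlation|, then hierarchical clustering of features.
    corr = np.corrcoef(X, rowvar=False)
    condensed = 1 - np.abs(corr[np.triu_indices_from(corr, k=1)])
    groups = fcluster(linkage(condensed, method="average"), t=8, criterion="maxclust")

    reduced = []
    for g in np.unique(groups):
        cols = np.flatnonzero(groups == g)
        n_comp = min(3, cols.size)                         # up to 3 PCs per group (assumed)
        reduced.append(PCA(n_components=n_comp).fit_transform(X[:, cols]))
    X_red = np.hstack(reduced)

    print("reduced shape:", X_red.shape)
    print("SVM accuracy :", cross_val_score(SVC(), X_red, y, cv=5).mean())
    ```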

  11. Efficient feature selection using a hybrid algorithm for the task of epileptic seizure detection

    NASA Astrophysics Data System (ADS)

    Lai, Kee Huong; Zainuddin, Zarita; Ong, Pauline

    2014-07-01

    Feature selection is a very important aspect in the field of machine learning. It entails the search for an optimal subset from a very large data set with a high-dimensional feature space. Apart from eliminating redundant features and reducing computational cost, a good selection of features also leads to higher prediction and classification accuracy. In this paper, an efficient feature selection technique is introduced for the task of epileptic seizure detection. The raw data are electroencephalography (EEG) signals. Using the discrete wavelet transform, the biomedical signals were decomposed into several sets of wavelet coefficients. To reduce the dimension of these wavelet coefficients, a feature selection method that combines the strength of both filter and wrapper methods is proposed. Principal component analysis (PCA) is used as part of the filter method. As for the wrapper method, the evolutionary harmony search (HS) algorithm is employed. This metaheuristic method aims at finding the best discriminating set of features from the original data. The obtained features were then used as input for an automated classifier, namely wavelet neural networks (WNNs). The WNNs model was trained to perform a binary classification task, that is, to determine whether a given EEG signal was normal or epileptic. For comparison purposes, different sets of features were also used as input. Simulation results showed that the WNNs that used the features chosen by the hybrid algorithm achieved the highest overall classification accuracy.
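
    The filter stage alone can be sketched as below: each EEG segment is decomposed with the discrete wavelet transform and PCA then reduces the concatenated coefficients. The wavelet name, decomposition level, component count, and random data are assumptions; the harmony-search wrapper and the wavelet neural network are not reproduced here.

    ```python
    # Sketch of the filter stage: DWT coefficients per segment, then PCA reduction.
    import numpy as np
    import pywt
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    segments = rng.standard_normal((50, 1024))            # 50 toy EEG segments

    def dwt_features(signal, wavelet="db4", level=4):
        coeffs = pywt.wavedec(signal, wavelet, level=level)
        return np.concatenate(coeffs)                      # all sub-band coefficients

    X = np.vstack([dwt_features(s) for s in segments])
    X_red = PCA(n_components=10).fit_transform(X)          # filter-stage reduction
    print("wavelet feature dim:", X.shape[1], "-> reduced dim:", X_red.shape[1])
    ```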

  12. FSR: feature set reduction for scalable and accurate multi-class cancer subtype classification based on copy number.

    PubMed

    Wong, Gerard; Leckie, Christopher; Kowalczyk, Adam

    2012-01-15

    Feature selection is a key concept in machine learning for microarray datasets, where features represented by probesets are typically several orders of magnitude larger than the available sample size. Computational tractability is a key challenge for feature selection algorithms in handling very high-dimensional datasets beyond a hundred thousand features, such as in datasets produced on single nucleotide polymorphism microarrays. In this article, we present a novel feature set reduction approach that enables scalable feature selection on datasets with hundreds of thousands of features and beyond. Our approach enables more efficient handling of higher resolution datasets to achieve better disease subtype classification of samples for potentially more accurate diagnosis and prognosis, which allows clinicians to make more informed decisions in regards to patient treatment options. We applied our feature set reduction approach to several publicly available cancer single nucleotide polymorphism (SNP) array datasets and evaluated its performance in terms of its multiclass predictive classification accuracy over different cancer subtypes, its speedup in execution as well as its scalability with respect to sample size and array resolution. Feature Set Reduction (FSR) was able to reduce the dimensions of an SNP array dataset by more than two orders of magnitude while achieving at least equal, and in most cases superior predictive classification performance over that achieved on features selected by existing feature selection methods alone. An examination of the biological relevance of frequently selected features from FSR-reduced feature sets revealed strong enrichment in association with cancer. FSR was implemented in MATLAB R2010b and is available at http://ww2.cs.mu.oz.au/~gwong/FSR.

  13. Differential diagnosis of CT focal liver lesions using texture features, feature selection and ensemble driven classifiers.

    PubMed

    Mougiakakou, Stavroula G; Valavanis, Ioannis K; Nikita, Alexandra; Nikita, Konstantina S

    2007-09-01

    The aim of the present study is to define an optimally performing computer-aided diagnosis (CAD) architecture for the classification of liver tissue from non-enhanced computed tomography (CT) images into normal liver (C1), hepatic cyst (C2), hemangioma (C3), and hepatocellular carcinoma (C4). To this end, various CAD architectures, based on texture features and ensembles of classifiers (ECs), are comparatively assessed. A number of regions of interest (ROIs) corresponding to C1-C4 were defined by experienced radiologists in non-enhanced liver CT images. For each ROI, five distinct sets of texture features were extracted using first order statistics, the spatial gray level dependence matrix, the gray level difference method, Laws' texture energy measures, and fractal dimension measurements. Two different ECs were constructed and compared. The first one consists of five multilayer perceptron neural networks (NNs), each using as input one of the computed texture feature sets or its reduced version after genetic algorithm-based feature selection. The second EC comprised five different primary classifiers, namely one multilayer perceptron NN, one probabilistic NN, and three k-nearest neighbor classifiers, each fed with the combination of the five texture feature sets or their reduced versions. The final decision of each EC was extracted by using appropriate voting schemes, while bootstrap re-sampling was utilized in order to estimate the generalization ability of the CAD architectures based on the available, relatively small-sized data set. The best mean classification accuracy (84.96%) is achieved by the second EC using a fused feature set and the weighted voting scheme. The fused feature set was obtained after appropriate feature selection applied to specific subsets of the original feature set. The comparative assessment of the various CAD architectures shows that combining three types of classifiers with a voting scheme, fed with identical feature sets obtained after appropriate feature selection and fusion, may result in an accurate system able to assist differential diagnosis of focal liver lesions from non-enhanced CT images.
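
    The structure of the second ensemble (one neural network plus several k-NN classifiers combined by weighted voting over a fused feature set) can be sketched as follows. GaussianNB stands in for the probabilistic neural network, the voting weights and synthetic data are assumptions, and soft voting replaces the paper's specific scheme.

    ```python
    # Sketch: ensemble of a neural network and several k-NN classifiers with weighted voting.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import VotingClassifier
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.neural_network import MLPClassifier
    from sklearn.model_selection import cross_val_score

    # Four classes as a stand-in for C1-C4 liver tissue types.
    X, y = make_classification(n_samples=400, n_features=40, n_informative=10,
                               n_classes=4, n_clusters_per_class=1, random_state=0)

    ensemble = VotingClassifier(
        estimators=[("mlp", MLPClassifier(max_iter=1000, random_state=0)),
                    ("pnn_standin", GaussianNB()),
                    ("knn3", KNeighborsClassifier(3)),
                    ("knn5", KNeighborsClassifier(5)),
                    ("knn7", KNeighborsClassifier(7))],
        voting="soft", weights=[2, 1, 1, 1, 1])            # assumed weights

    print("ensemble accuracy:", cross_val_score(ensemble, X, y, cv=5).mean())
    ```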

  14. Sensor-oriented feature usability evaluation in fingerprint segmentation

    NASA Astrophysics Data System (ADS)

    Li, Ying; Yin, Yilong; Yang, Gongping

    2013-06-01

    Existing fingerprint segmentation methods usually process fingerprint images captured by different sensors with the same feature or feature set. We propose to improve fingerprint segmentation results in view of the important fact that images from different sensors have different characteristics for segmentation. Feature usability evaluation means evaluating the usability of features in order to find a personalized feature or feature set for each sensor and thereby improve segmentation performance. The need for feature usability evaluation in fingerprint segmentation is raised and analyzed as a new issue. To address this issue, we present a decision-tree-based feature-usability evaluation method, which utilizes the C4.5 decision tree algorithm to evaluate and pick the most suitable feature or feature set for fingerprint segmentation from a typical candidate feature set. We apply the novel method to the FVC2002 database of fingerprint images, which were acquired by four different sensors and technologies. Experimental results show that the accuracy of segmentation is improved, and the time consumption for feature extraction is dramatically reduced with the selected feature(s).
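
    The per-sensor evaluation idea can be sketched with a decision tree ranking candidate segmentation features for each sensor; scikit-learn's CART tree stands in for C4.5, and the feature names, sensor names, and data are hypothetical.

    ```python
    # Sketch: rank candidate segmentation features per sensor with a decision tree.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.tree import DecisionTreeClassifier

    feature_names = ["mean", "variance", "coherence", "gradient", "gabor"]  # hypothetical candidates

    for idx, sensor in enumerate(("sensor_A", "sensor_B")):    # hypothetical sensors
        # Synthetic block-vs-background labels stand in for per-sensor training data.
        X, y = make_classification(n_samples=500, n_features=len(feature_names),
                                   n_informative=3, random_state=idx)
        tree = DecisionTreeClassifier(random_state=0).fit(X, y)
        ranked = np.argsort(tree.feature_importances_)[::-1]
        print(sensor, "best feature(s):", [feature_names[i] for i in ranked[:2]])
    ```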

  15. Assessing postural stability via the correlation patterns of vertical ground reaction force components.

    PubMed

    Hong, Chih-Yuan; Guo, Lan-Yuen; Song, Rong; Nagurka, Mark L; Sung, Jia-Li; Yen, Chen-Wen

    2016-08-02

    Many methods have been proposed to assess the stability of human postural balance by using a force plate. While most of these approaches characterize postural stability by extracting features from the trajectory of the center of pressure (COP), this work develops stability measures derived from components of the ground reaction force (GRF). In comparison with previous GRF-based approaches that extract stability features from the GRF resultant force, this study proposes three feature sets derived from the correlation patterns among the vertical GRF (VGRF) components. The first and second feature sets quantitatively assess the strength and changing speed of the correlation patterns, respectively. The third feature set is used to quantify the stabilizing effect of the GRF coordination patterns on the COP. In addition to experimentally demonstrating the reliability of the proposed features, the efficacy of the proposed features has also been tested by using them to classify two age groups (18-24 and 65-73 years) in quiet standing. The experimental results show that the proposed features are considerably more sensitive to aging than one of the most effective conventional COP features and two recently proposed COM features. By extracting information from the correlation patterns of the VGRF components, this study proposes three sets of features to assess human postural stability during quiet standing. As demonstrated by the experimental results, the proposed features are not only robust to inter-trial variability but also more accurate than the tested COP and COM features in classifying the older and younger age groups. An additional advantage of the proposed approach is that it reduces the force sensing requirement from 3D to 1D, substantially reducing the cost of the force plate measurement system.

  16. System Complexity Reduction via Feature Selection

    ERIC Educational Resources Information Center

    Deng, Houtao

    2011-01-01

    This dissertation transforms a set of system complexity reduction problems to feature selection problems. Three systems are considered: classification based on association rules, network structure learning, and time series classification. Furthermore, two variable importance measures are proposed to reduce the feature selection bias in tree…

  17. Identification of DNA-Binding Proteins Using Mixed Feature Representation Methods.

    PubMed

    Qu, Kaiyang; Han, Ke; Wu, Song; Wang, Guohua; Wei, Leyi

    2017-09-22

    DNA-binding proteins play vital roles in cellular processes, such as DNA packaging, replication, transcription, regulation, and other DNA-associated activities. The current main prediction method is based on machine learning, and its accuracy mainly depends on the feature extraction method. Therefore, using an efficient feature representation method is important to enhance classification accuracy. However, existing feature representation methods cannot efficiently distinguish DNA-binding proteins from non-DNA-binding proteins. In this paper, a multi-feature representation method, which combines three feature representation methods, namely K-Skip-N-Grams, information theory, and sequential and structural features (SSF), is used to represent the protein sequences and improve feature representation ability. In addition, the classifier is a support vector machine. The mixed-feature representation method is evaluated using 10-fold cross-validation and a test set. Feature vectors obtained from the combination of the three feature extractions show the best performance in 10-fold cross-validation, both without dimensionality reduction and with dimensionality reduction by max-relevance-max-distance. Moreover, the reduced mixed feature method performs better than the non-reduced mixed feature technique. The feature vectors that combine SSF and K-Skip-N-Grams show the best performance on the test set. Among these methods, mixed features exhibit superiority over the single features.

  18. Optimizing data collection for public health decisions: a data mining approach

    PubMed Central

    2014-01-01

    Background Collecting data can be cumbersome and expensive. Lack of relevant, accurate and timely data for research to inform policy may negatively impact public health. The aim of this study was to test if the careful removal of items from two community nutrition surveys guided by a data mining technique called feature selection, can (a) identify a reduced dataset, while (b) not damaging the signal inside that data. Methods The Nutrition Environment Measures Surveys for stores (NEMS-S) and restaurants (NEMS-R) were completed on 885 retail food outlets in two counties in West Virginia between May and November of 2011. A reduced dataset was identified for each outlet type using feature selection. Coefficients from linear regression modeling were used to weight items in the reduced datasets. Weighted item values were summed with the error term to compute reduced item survey scores. Scores produced by the full survey were compared to the reduced item scores using a Wilcoxon rank-sum test. Results Feature selection identified 9 store and 16 restaurant survey items as significant predictors of the score produced from the full survey. The linear regression models built from the reduced feature sets had R2 values of 92% and 94% for restaurant and grocery store data, respectively. Conclusions While there are many potentially important variables in any domain, the most useful set may only be a small subset. The use of feature selection in the initial phase of data collection to identify the most influential variables may be a useful tool to greatly reduce the amount of data needed thereby reducing cost. PMID:24919484

  19. Optimizing data collection for public health decisions: a data mining approach.

    PubMed

    Partington, Susan N; Papakroni, Vasil; Menzies, Tim

    2014-06-12

    Collecting data can be cumbersome and expensive. Lack of relevant, accurate and timely data for research to inform policy may negatively impact public health. The aim of this study was to test if the careful removal of items from two community nutrition surveys guided by a data mining technique called feature selection, can (a) identify a reduced dataset, while (b) not damaging the signal inside that data. The Nutrition Environment Measures Surveys for stores (NEMS-S) and restaurants (NEMS-R) were completed on 885 retail food outlets in two counties in West Virginia between May and November of 2011. A reduced dataset was identified for each outlet type using feature selection. Coefficients from linear regression modeling were used to weight items in the reduced datasets. Weighted item values were summed with the error term to compute reduced item survey scores. Scores produced by the full survey were compared to the reduced item scores using a Wilcoxon rank-sum test. Feature selection identified 9 store and 16 restaurant survey items as significant predictors of the score produced from the full survey. The linear regression models built from the reduced feature sets had R2 values of 92% and 94% for restaurant and grocery store data, respectively. While there are many potentially important variables in any domain, the most useful set may only be a small subset. The use of feature selection in the initial phase of data collection to identify the most influential variables may be a useful tool to greatly reduce the amount of data needed thereby reducing cost.
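
    The scoring comparison can be sketched as below: weight the reduced item set with linear-regression coefficients, rebuild a reduced-item score, and compare it to the full-survey score with a Wilcoxon rank-sum test, as the study describes. The data and the choice of which 9 items to keep are synthetic stand-ins for the NEMS-S/NEMS-R item responses and the feature-selection output.

    ```python
    # Sketch: reduced-item survey score from regression weights + Wilcoxon rank-sum check.
    import numpy as np
    from scipy.stats import ranksums
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    items = rng.integers(0, 4, size=(200, 30)).astype(float)    # 200 outlets, 30 survey items
    full_score = items.sum(axis=1)                               # full-survey score

    keep = np.arange(9)                                          # 9 retained items (illustrative)
    model = LinearRegression().fit(items[:, keep], full_score)
    reduced_score = model.predict(items[:, keep])                # weighted items + intercept

    stat, p = ranksums(full_score, reduced_score)
    print(f"R^2={model.score(items[:, keep], full_score):.2f}, rank-sum p={p:.3f}")
    ```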

  20. Development and Validation of a Deep Neural Network Model for Prediction of Postoperative In-hospital Mortality.

    PubMed

    Lee, Christine K; Hofer, Ira; Gabel, Eilon; Baldi, Pierre; Cannesson, Maxime

    2018-04-17

    The authors tested the hypothesis that deep neural networks trained on intraoperative features can predict postoperative in-hospital mortality. The data used to train and validate the algorithm consists of 59,985 patients with 87 features extracted at the end of surgery. Feed-forward networks with a logistic output were trained using stochastic gradient descent with momentum. The deep neural networks were trained on 80% of the data, with 20% reserved for testing. The authors assessed improvement of the deep neural network by adding American Society of Anesthesiologists (ASA) Physical Status Classification and robustness of the deep neural network to a reduced feature set. The networks were then compared to ASA Physical Status, logistic regression, and other published clinical scores including the Surgical Apgar, Preoperative Score to Predict Postoperative Mortality, Risk Quantification Index, and the Risk Stratification Index. In-hospital mortality in the training and test sets were 0.81% and 0.73%. The deep neural network with a reduced feature set and ASA Physical Status classification had the highest area under the receiver operating characteristics curve, 0.91 (95% CI, 0.88 to 0.93). The highest logistic regression area under the curve was found with a reduced feature set and ASA Physical Status (0.90, 95% CI, 0.87 to 0.93). The Risk Stratification Index had the highest area under the receiver operating characteristics curve, at 0.97 (95% CI, 0.94 to 0.99). Deep neural networks can predict in-hospital mortality based on automatically extractable intraoperative data, but are not (yet) superior to existing methods.
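
    A minimal sketch of such a model follows: a feed-forward network with a logistic output trained by stochastic gradient descent with momentum on a reduced intraoperative feature set plus an ASA-status column, scored by AUC. The architecture, hyperparameters, class imbalance, and data are assumptions; this is not the authors' implementation.

    ```python
    # Sketch: feed-forward net (SGD + momentum, logistic output) on a reduced feature set + ASA.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier
    from sklearn.metrics import roc_auc_score

    X, y = make_classification(n_samples=5000, n_features=45, n_informative=15,
                               weights=[0.99], random_state=0)        # ~1% positive (mortality)
    asa = np.random.default_rng(0).integers(1, 5, size=(len(y), 1))   # ASA status column (toy)
    X = np.hstack([X, asa])

    Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
    net = MLPClassifier(hidden_layer_sizes=(64, 32), activation="relu",
                        solver="sgd", momentum=0.9, learning_rate_init=0.01,
                        max_iter=300, random_state=0).fit(Xtr, ytr)
    print("test AUC:", roc_auc_score(yte, net.predict_proba(Xte)[:, 1]))
    ```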

  1. Emotional recognition from the speech signal for a virtual education agent

    NASA Astrophysics Data System (ADS)

    Tickle, A.; Raghu, S.; Elshaw, M.

    2013-06-01

    This paper explores the extraction of features from the speech wave to perform intelligent emotion recognition. A feature extraction tool (openSMILE) was used to obtain a baseline set of 998 acoustic features from a set of emotional speech recordings captured with a microphone. The initial features were reduced to the most important ones so that recognition of emotions using a supervised neural network could be performed. Given that the future use of virtual education agents lies with making the agents more interactive, developing agents with the capability to recognise and adapt to the emotional state of humans is an important step.

  2. Feature selection gait-based gender classification under different circumstances

    NASA Astrophysics Data System (ADS)

    Sabir, Azhin; Al-Jawad, Naseer; Jassim, Sabah

    2014-05-01

    This paper proposes gender classification based on human gait features and investigates the problem of two variations: a clothing condition (wearing coats) and a carrying condition (carrying a bag), in addition to the normal gait sequence. The feature vectors in the proposed system are constructed after applying the wavelet transform. Three different sets of features are proposed in this method. The first is the spatio-temporal distance, which deals with the distance between different parts of the human body (such as the feet, knees, hands, height, and shoulders) during one gait cycle. The second and third feature sets are constructed from the approximation and non-approximation coefficients of the human body, respectively. To extract these two sets of features we divided the human body into two parts, the upper and lower body, based on the golden ratio proportion. In this paper, we adopt a statistical method for constructing the feature vector from the above sets. The dimension of the constructed feature vector is reduced based on the Fisher score as a feature selection method to optimize its discriminating significance. Finally, k-Nearest Neighbor is applied as the classification method. Experimental results demonstrate that our approach provides a more realistic scenario and relatively better performance compared with existing approaches.

  3. Fuzzy feature selection based on interval type-2 fuzzy sets

    NASA Astrophysics Data System (ADS)

    Cherif, Sahar; Baklouti, Nesrine; Alimi, Adel; Snasel, Vaclav

    2017-03-01

    When dealing with real-world data, noise, complexity, dimensionality, uncertainty, and irrelevance can lead to low performance and insignificant judgment. Fuzzy logic is a powerful tool for controlling conflicting attributes which can have similar effects and close meanings. In this paper, an interval type-2 fuzzy feature selection is presented as a new approach for removing irrelevant features and reducing complexity. We demonstrate how feature selection can be combined with interval type-2 fuzzy logic to keep significant features and hence reduce time complexity. The proposed method is compared with some other approaches. The results show that the number of retained attributes is proportionally small.

  4. Systems-Level Annotation of a Metabolomics Data Set Reduces 25 000 Features to Fewer than 1000 Unique Metabolites.

    PubMed

    Mahieu, Nathaniel G; Patti, Gary J

    2017-10-03

    When using liquid chromatography/mass spectrometry (LC/MS) to perform untargeted metabolomics, it is now routine to detect tens of thousands of features from biological samples. Poor understanding of the data, however, has complicated interpretation and masked the number of unique metabolites actually being measured in an experiment. Here we place an upper bound on the number of unique metabolites detected in Escherichia coli samples analyzed with one untargeted metabolomics method. We first group multiple features arising from the same analyte, which we call "degenerate features", using a context-driven annotation approach. Surprisingly, this analysis revealed thousands of previously unreported degeneracies that reduced the number of unique analytes to ∼2961. We then applied an orthogonal approach to remove nonbiological features from the data using the 13C-based credentialing technology. This further reduced the number of unique analytes to less than 1000. Our 90% reduction in data is 5-fold greater than previously published studies. On the basis of the results, we propose an alternative approach to untargeted metabolomics that relies on thoroughly annotated reference data sets. To this end, we introduce the creDBle database (http://creDBle.wustl.edu), which contains accurate mass, retention time, and MS/MS fragmentation data as well as annotations of all credentialed features.

  5. Improved classification accuracy by feature extraction using genetic algorithms

    NASA Astrophysics Data System (ADS)

    Patriarche, Julia; Manduca, Armando; Erickson, Bradley J.

    2003-05-01

    A feature extraction algorithm has been developed for the purposes of improving classification accuracy. The algorithm uses a genetic algorithm / hill-climber hybrid to generate a set of linearly recombined features, which may be of reduced dimensionality compared with the original set. The genetic algorithm performs the global exploration, and a hill climber explores local neighborhoods. Hybridizing the genetic algorithm with a hill climber improves both the rate of convergence, and the final overall cost function value; it also reduces the sensitivity of the genetic algorithm to parameter selection. The genetic algorithm includes the operators: crossover, mutation, and deletion / reactivation - the last of these effects dimensionality reduction. The feature extractor is supervised, and is capable of deriving a separate feature space for each tissue (which are reintegrated during classification). A non-anatomical digital phantom was developed as a gold standard for testing purposes. In tests with the phantom, and with images of multiple sclerosis patients, classification with feature extractor derived features yielded lower error rates than using standard pulse sequences, and with features derived using principal components analysis. Using the multiple sclerosis patient data, the algorithm resulted in a mean 31% reduction in classification error of pure tissues.

  6. Integrating dimension reduction and out-of-sample extension in automated classification of ex vivo human patellar cartilage on phase contrast X-ray computed tomography.

    PubMed

    Nagarajan, Mahesh B; Coan, Paola; Huber, Markus B; Diemoz, Paul C; Wismüller, Axel

    2015-01-01

    Phase contrast X-ray computed tomography (PCI-CT) has been demonstrated as a novel imaging technique that can visualize human cartilage with high spatial resolution and soft tissue contrast. Different textural approaches have been previously investigated for characterizing chondrocyte organization on PCI-CT to enable classification of healthy and osteoarthritic cartilage. However, the large size of feature sets extracted in such studies motivates an investigation into algorithmic feature reduction for computing efficient feature representations without compromising their discriminatory power. For this purpose, geometrical feature sets derived from the scaling index method (SIM) were extracted from 1392 volumes of interest (VOI) annotated on PCI-CT images of ex vivo human patellar cartilage specimens. The extracted feature sets were subject to linear and non-linear dimension reduction techniques as well as feature selection based on evaluation of mutual information criteria. The reduced feature set was subsequently used in a machine learning task with support vector regression to classify VOIs as healthy or osteoarthritic; classification performance was evaluated using the area under the receiver-operating characteristic (ROC) curve (AUC). Our results show that the classification performance achieved by 9-D SIM-derived geometric feature sets (AUC: 0.96 ± 0.02) can be maintained with 2-D representations computed from both dimension reduction and feature selection (AUC values as high as 0.97 ± 0.02). Thus, such feature reduction techniques can offer a high degree of compaction to large feature sets extracted from PCI-CT images while maintaining their ability to characterize the underlying chondrocyte patterns.

  7. Manipulations of the features of standard video lottery terminal (VLT) games: effects in pathological and non-pathological gamblers.

    PubMed

    Loba, P; Stewart, S H; Klein, R M; Blackburn, J R

    2001-01-01

    The present study was conducted to identify game parameters that would reduce the risk of abuse of video lottery terminals (VLTs) by pathological gamblers, while exerting minimal effects on the behavior of non-pathological gamblers. Three manipulations of standard VLT game features were explored. Participants were exposed to: a counter which displayed a running total of money spent; a VLT spinning reels game where participants could no longer "stop" the reels by touching the screen; and sensory feature manipulations. In control conditions, participants were exposed to standard settings for either a spinning reels or a video poker game. Dependent variables were self-ratings of reactions to each set of parameters. A set of 2(3) x 2 x 2 (game manipulation [experimental condition(s) vs. control condition] x game [spinning reels vs. video poker] x gambler status [pathological vs. non-pathological]) repeated measures ANOVAs were conducted on all dependent variables. The findings suggest that the sensory manipulations (i.e., fast speed/sound or slow speed/no sound manipulations) produced the most robust reaction differences. Before advocating harm reduction policies such as lowering sensory features of VLT games to reduce potential harm to pathological gamblers, it is important to replicate findings in a more naturalistic setting, such as a real bar.

  8. Anticipating and controlling mask costs within EDA physical design

    NASA Astrophysics Data System (ADS)

    Rieger, Michael L.; Mayhew, Jeffrey P.; Melvin, Lawrence S.; Lugg, Robert M.; Beale, Daniel F.

    2003-08-01

    For low k1 lithography, more aggressive OPC is being applied to critical layers, and the number of mask layers with OPC treatments is growing rapidly. The 130 nm process node required, on average, 8 layers containing rules- or model-based OPC. The 90 nm node will have 16 OPC layers, of which 14 layers contain aggressive model-based OPC. This escalation of mask pattern complexity, coupled with the predominant use of vector-scan e-beam (VSB) mask writers, contributes to the rising costs of advanced mask sets. Writing times for OPC layouts are several times longer than for traditional layouts, making mask exposure the single largest cost component for OPC masks. Lower mask yield, another key factor in higher mask costs, is also aggravated by OPC. Historical mask set costs are plotted below. The initial cost of a 90 nm-node mask set will exceed one million dollars. The relative impact of mask cost on chip cost depends on how many total wafers are printed with each mask set. For many foundry chips, where unit production is often well below 1000 wafers, mask costs are larger than wafer processing costs. Further increases in NRE may begin to discourage these suppliers' adoption of 90 nm and smaller nodes. In this paper we will outline several alternatives for reducing mask costs by strategically leveraging dimensional margins. Dimensional specifications for a particular masking layer usually are applied uniformly to all features on that layer. As a practical matter, accuracy requirements on different features in the design may vary widely. Take a polysilicon layer, for example: global tolerance specifications for that layer are driven by the transistor-gate requirements, but these parameters over-specify interconnect feature requirements. By identifying features where dimensional accuracy requirements can be reduced, additional margin can be leveraged to reduce OPC complexity. Mask writing time on VSB tools will drop in nearly direct proportion to the reduced shot count. By inspecting masks with reference to feature-dependent margins, instead of uniform specifications, mask yield can be effectively increased, further reducing delivered mask expense.

  9. Classification of large-scale fundus image data sets: a cloud-computing framework.

    PubMed

    Roychowdhury, Sohini

    2016-08-01

    Large medical image data sets with high dimensionality require substantial amount of computation time for data creation and data processing. This paper presents a novel generalized method that finds optimal image-based feature sets that reduce computational time complexity while maximizing overall classification accuracy for detection of diabetic retinopathy (DR). First, region-based and pixel-based features are extracted from fundus images for classification of DR lesions and vessel-like structures. Next, feature ranking strategies are used to distinguish the optimal classification feature sets. DR lesion and vessel classification accuracies are computed using the boosted decision tree and decision forest classifiers in the Microsoft Azure Machine Learning Studio platform, respectively. For images from the DIARETDB1 data set, 40 of its highest-ranked features are used to classify four DR lesion types with an average classification accuracy of 90.1% in 792 seconds. Also, for classification of red lesion regions and hemorrhages from microaneurysms, accuracies of 85% and 72% are observed, respectively. For images from STARE data set, 40 high-ranked features can classify minor blood vessels with an accuracy of 83.5% in 326 seconds. Such cloud-based fundus image analysis systems can significantly enhance the borderline classification performances in automated screening systems.

  10. Hypotheses generation as supervised link discovery with automated class labeling on large-scale biomedical concept networks

    PubMed Central

    2012-01-01

    Computational approaches to generate hypotheses from biomedical literature have been studied intensively in recent years. Nevertheless, it still remains a challenge to automatically discover novel, cross-silo biomedical hypotheses from large-scale literature repositories. In order to address this challenge, we first model a biomedical literature repository as a comprehensive network of biomedical concepts and formulate hypotheses generation as a process of link discovery on the concept network. We extract the relevant information from the biomedical literature corpus and generate a concept network and concept-author map on a cluster using the MapReduce framework. We extract a set of heterogeneous features such as random walk based features, neighborhood features, and common author features. The potential number of links to consider for the possibility of link discovery is large in our concept network, and to address the scalability problem, the features from the concept network are extracted using a cluster with the MapReduce framework. We further model link discovery as a classification problem carried out on a training data set automatically extracted from two network snapshots taken in two consecutive time periods. A set of heterogeneous features, which cover both topological and semantic features derived from the concept network, have been studied with respect to their impact on the accuracy of the proposed supervised link discovery process. A case study of hypotheses generation based on the proposed method is presented in the paper. PMID:22759614

  11. Dimensionality Reduction Through Classifier Ensembles

    NASA Technical Reports Server (NTRS)

    Oza, Nikunj C.; Tumer, Kagan; Norwig, Peter (Technical Monitor)

    1999-01-01

    In data mining, one often needs to analyze datasets with a very large number of attributes. Performing machine learning directly on such data sets is often impractical because of extensive run times, excessive complexity of the fitted model (often leading to overfitting), and the well-known "curse of dimensionality." In practice, to avoid such problems, feature selection and/or extraction are often used to reduce data dimensionality prior to the learning step. However, existing feature selection/extraction algorithms either evaluate features by their effectiveness across the entire data set or simply disregard class information altogether (e.g., principal component analysis). Furthermore, feature extraction algorithms such as principal components analysis create new features that are often meaningless to human users. In this article, we present input decimation, a method that provides "feature subsets" that are selected for their ability to discriminate among the classes. These features are subsequently used in ensembles of classifiers, yielding results superior to single classifiers, ensembles that use the full set of features, and ensembles based on principal component analysis on both real and synthetic datasets.

  12. Integrating Dimension Reduction and Out-of-Sample Extension in Automated Classification of Ex Vivo Human Patellar Cartilage on Phase Contrast X-Ray Computed Tomography

    PubMed Central

    Nagarajan, Mahesh B.; Coan, Paola; Huber, Markus B.; Diemoz, Paul C.; Wismüller, Axel

    2015-01-01

    Phase contrast X-ray computed tomography (PCI-CT) has been demonstrated as a novel imaging technique that can visualize human cartilage with high spatial resolution and soft tissue contrast. Different textural approaches have been previously investigated for characterizing chondrocyte organization on PCI-CT to enable classification of healthy and osteoarthritic cartilage. However, the large size of feature sets extracted in such studies motivates an investigation into algorithmic feature reduction for computing efficient feature representations without compromising their discriminatory power. For this purpose, geometrical feature sets derived from the scaling index method (SIM) were extracted from 1392 volumes of interest (VOI) annotated on PCI-CT images of ex vivo human patellar cartilage specimens. The extracted feature sets were subject to linear and non-linear dimension reduction techniques as well as feature selection based on evaluation of mutual information criteria. The reduced feature set was subsequently used in a machine learning task with support vector regression to classify VOIs as healthy or osteoarthritic; classification performance was evaluated using the area under the receiver-operating characteristic (ROC) curve (AUC). Our results show that the classification performance achieved by 9-D SIM-derived geometric feature sets (AUC: 0.96 ± 0.02) can be maintained with 2-D representations computed from both dimension reduction and feature selection (AUC values as high as 0.97 ± 0.02). Thus, such feature reduction techniques can offer a high degree of compaction to large feature sets extracted from PCI-CT images while maintaining their ability to characterize the underlying chondrocyte patterns. PMID:25710875

  13. Information Theory for Gabor Feature Selection for Face Recognition

    NASA Astrophysics Data System (ADS)

    Shen, Linlin; Bai, Li

    2006-12-01

    A discriminative and robust feature—kernel enhanced informative Gabor feature—is proposed in this paper for face recognition. Mutual information is applied to select a set of informative and nonredundant Gabor features, which are then further enhanced by kernel methods for recognition. Compared with one of the top performing methods in the 2004 Face Verification Competition (FVC2004), our methods demonstrate a clear advantage over existing methods in accuracy, computation efficiency, and memory cost. The proposed method has been fully tested on the FERET database using the FERET evaluation protocol. Significant improvements on three of the test data sets are observed. Compared with the classical Gabor wavelet-based approaches using a huge number of features, our method requires less than 4 milliseconds to retrieve a few hundreds of features. Due to the substantially reduced feature dimension, only 4 seconds are required to recognize 200 face images. The paper also unified different Gabor filter definitions and proposed a training sample generation algorithm to reduce the effects caused by unbalanced number of samples available in different classes.

  14. How well does multiple OCR error correction generalize?

    NASA Astrophysics Data System (ADS)

    Lund, William B.; Ringger, Eric K.; Walker, Daniel D.

    2013-12-01

    As the digitization of historical documents, such as newspapers, becomes more common, the need of the archive patron for accurate digital text from those documents increases. Building on our earlier work, the contributions of this paper are: 1. in demonstrating the applicability of novel methods for correcting optical character recognition (OCR) on disparate data sets, including a new synthetic training set, 2. enhancing the correction algorithm with novel features, and 3. assessing the data requirements of the correction learning method. First, we correct errors using conditional random fields (CRF) trained on synthetic training data sets in order to demonstrate the applicability of the methodology to unrelated test sets. Second, we show the strength of lexical features from the training sets on two unrelated test sets, yielding a relative reduction in word error rate on the test sets of 6.52%. New features capture the recurrence of hypothesis tokens and yield an additional relative reduction in WER of 2.30%. Further, we show that only 2.0% of the full training corpus of over 500,000 feature cases is needed to achieve correction results comparable to those using the entire training corpus, effectively reducing both the complexity of the training process and the learned correction model.

  15. Feature Selection for Classification of Polar Regions Using a Fuzzy Expert System

    NASA Technical Reports Server (NTRS)

    Penaloza, Mauel A.; Welch, Ronald M.

    1996-01-01

    Labeling, feature selection, and the choice of classifier are critical elements for classification of scenes and for image understanding. This study examines several methods for feature selection in polar regions, including the use of a fuzzy logic-based expert system for further refinement of a set of selected features. Six Advanced Very High Resolution Radiometer (AVHRR) Local Area Coverage (LAC) arctic scenes are classified into nine classes: water, snow/ice, ice cloud, land, thin stratus, stratus over water, cumulus over water, textured snow over water, and snow-covered mountains. Sixty-seven spectral and textural features are computed and analyzed by the feature selection algorithms. The divergence, histogram analysis, and discriminant analysis approaches are intercompared for their effectiveness in feature selection. The fuzzy expert system method is used not only to determine the effectiveness of each approach in classifying polar scenes, but also to further reduce the features to a more optimal set. For each selection method, features are ranked from best to worst, and the best half of the features are selected. Then, rules using these selected features are defined. The results of running the fuzzy expert system with these rules show that the divergence method produces the best set of features: not only does it produce the highest classification accuracy, it also has the lowest computation requirements. A reduction of the set of features produced by the divergence method using the fuzzy expert system results in an overall classification accuracy of over 95%. However, this increase in accuracy has a high computation cost.

  16. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ogden, K; O’Dwyer, R; Bradford, T

    Purpose: To reduce differences in features calculated from MRI brain scans acquired at different field strengths with or without Gadolinium contrast. Methods: Brain scans were processed for 111 epilepsy patients to extract hippocampus and thalamus features. Scans were acquired on 1.5 T scanners with Gadolinium contrast (group A), 1.5 T scanners without Gd (group B), and 3.0 T scanners without Gd (group C). A total of 72 features were extracted. Features were extracted from original scans and from scans where the image pixel values were rescaled to the mean of the hippocampi and thalami values. For each data set, cluster analysis was performed on the raw feature set and for feature sets with normalization (conversion to Z scores). Two methods of normalization were used: The first was over all values of a given feature, and the second by normalizing within the patient group membership. The clustering software was configured to produce 3 clusters. Group fractions in each cluster were calculated. Results: For features calculated from both the non-rescaled and rescaled data, cluster membership was identical for both the non-normalized and normalized data sets. Cluster 1 was comprised entirely of Group A data, Cluster 2 contained data from all three groups, and Cluster 3 contained data from only groups 1 and 2. For the categorically normalized data sets there was a more uniform distribution of group data in the three Clusters. A less pronounced effect was seen in the rescaled image data features. Conclusion: Image Rescaling and feature renormalization can have a significant effect on the results of clustering analysis. These effects are also likely to influence the results of supervised machine learning algorithms. It may be possible to partly remove the influence of scanner field strength and the presence of Gadolinium based contrast in feature extraction for radiomics applications.
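
    The clustering step described above can be sketched as follows: z-score the features, cluster the patients into three groups with k-means, and tabulate the scanner-group fraction in each cluster. Synthetic features stand in for the hippocampus/thalamus measurements, and k-means is an assumed stand-in for the unnamed clustering software.

    ```python
    # Sketch: z-score normalization + 3-cluster k-means + group fractions per cluster.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    features = rng.standard_normal((111, 72))               # 111 patients x 72 features (toy)
    group = rng.choice(["A_1.5T_Gd", "B_1.5T", "C_3.0T"], size=111)

    Z = StandardScaler().fit_transform(features)            # normalization (Z scores)
    cluster = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(Z)

    for c in range(3):
        members = group[cluster == c]
        fractions = {g: round(float(np.mean(members == g)), 2) for g in np.unique(group)}
        print(f"cluster {c}: n={members.size}, group fractions={fractions}")
    ```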

  17. Feature selection for wearable smartphone-based human activity recognition with able bodied, elderly, and stroke patients.

    PubMed

    Capela, Nicole A; Lemaire, Edward D; Baddour, Natalie

    2015-01-01

    Human activity recognition (HAR), using wearable sensors, is a growing area with the potential to provide valuable information on patient mobility to rehabilitation specialists. Smartphones with accelerometer and gyroscope sensors are a convenient, minimally invasive, and low cost approach for mobility monitoring. HAR systems typically pre-process raw signals, segment the signals, and then extract features to be used in a classifier. Feature selection is a crucial step in the process to reduce potentially large data dimensionality and provide viable parameters to enable activity classification. Most HAR systems are customized to an individual research group, including a unique data set, classes, algorithms, and signal features. These data sets are obtained predominantly from able-bodied participants. In this paper, smartphone accelerometer and gyroscope sensor data were collected from populations that can benefit from human activity recognition: able-bodied, elderly, and stroke patients. Data from a consecutive sequence of 41 mobility tasks (18 different tasks) were collected for a total of 44 participants. Seventy-six signal features were calculated and subsets of these features were selected using three filter-based, classifier-independent, feature selection methods (Relief-F, Correlation-based Feature Selection, Fast Correlation Based Filter). The feature subsets were then evaluated using three generic classifiers (Naïve Bayes, Support Vector Machine, j48 Decision Tree). Common features were identified for all three populations, although the stroke population subset had some differences from both able-bodied and elderly sets. Evaluation with the three classifiers showed that the feature subsets produced similar or better accuracies than classification with the entire feature set. Therefore, since these feature subsets are classifier-independent, they should be useful for developing and improving HAR systems across and within populations.
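
    The sketch below mirrors the "classifier-independent filter selection, then evaluation with generic classifiers" pipeline described above. Relief-F, CFS, and FCBF are not part of scikit-learn, so a mutual-information filter stands in for them here; the data are synthetic placeholders for the 76 signal features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Placeholder data standing in for the 76 accelerometer/gyroscope signal features
X, y = make_classification(n_samples=600, n_features=76, n_informative=15, random_state=0)

# Classifier-independent filter selection (mutual information as a stand-in
# for Relief-F / CFS / FCBF)
selector = SelectKBest(mutual_info_classif, k=20).fit(X, y)
X_sel = selector.transform(X)

# Evaluate the selected subset with the three generic classifiers mentioned above
for clf in (GaussianNB(), SVC(), DecisionTreeClassifier(random_state=0)):
    full = cross_val_score(clf, X, y, cv=5).mean()
    sub = cross_val_score(clf, X_sel, y, cv=5).mean()
    print(type(clf).__name__, f"all features: {full:.3f}  subset: {sub:.3f}")
```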

  18. Feature Selection for Wearable Smartphone-Based Human Activity Recognition with Able bodied, Elderly, and Stroke Patients

    PubMed Central

    2015-01-01

    Human activity recognition (HAR), using wearable sensors, is a growing area with the potential to provide valuable information on patient mobility to rehabilitation specialists. Smartphones with accelerometer and gyroscope sensors are a convenient, minimally invasive, and low cost approach for mobility monitoring. HAR systems typically pre-process raw signals, segment the signals, and then extract features to be used in a classifier. Feature selection is a crucial step in the process to reduce potentially large data dimensionality and provide viable parameters to enable activity classification. Most HAR systems are customized to an individual research group, including a unique data set, classes, algorithms, and signal features. These data sets are obtained predominantly from able-bodied participants. In this paper, smartphone accelerometer and gyroscope sensor data were collected from populations that can benefit from human activity recognition: able-bodied, elderly, and stroke patients. Data from a consecutive sequence of 41 mobility tasks (18 different tasks) were collected for a total of 44 participants. Seventy-six signal features were calculated and subsets of these features were selected using three filter-based, classifier-independent, feature selection methods (Relief-F, Correlation-based Feature Selection, Fast Correlation Based Filter). The feature subsets were then evaluated using three generic classifiers (Naïve Bayes, Support Vector Machine, j48 Decision Tree). Common features were identified for all three populations, although the stroke population subset had some differences from both able-bodied and elderly sets. Evaluation with the three classifiers showed that the feature subsets produced similar or better accuracies than classification with the entire feature set. Therefore, since these feature subsets are classifier-independent, they should be useful for developing and improving HAR systems across and within populations. PMID:25885272

  19. SoFoCles: feature filtering for microarray classification based on gene ontology.

    PubMed

    Papachristoudis, Georgios; Diplaris, Sotiris; Mitkas, Pericles A

    2010-02-01

    Marker gene selection has been an important research topic in the classification analysis of gene expression data. Current methods try to reduce the "curse of dimensionality" by using statistical intra-feature set calculations, or classifiers that are based on the given dataset. In this paper, we present SoFoCles, an interactive tool that enables semantic feature filtering in microarray classification problems with the use of external, well-defined knowledge retrieved from the Gene Ontology. The notion of semantic similarity is used to derive genes that are involved in the same biological path during the microarray experiment, by enriching a feature set that has been initially produced with legacy methods. Among its other functionalities, SoFoCles offers a large repository of semantic similarity methods that are used in order to derive feature sets and marker genes. The structure and functionality of the tool are discussed in detail, as well as its ability to improve classification accuracy. Through experimental evaluation, SoFoCles is shown to outperform other classification schemes in terms of classification accuracy in two real datasets using different semantic similarity computation approaches.

  20. A PCA aided cross-covariance scheme for discriminative feature extraction from EEG signals.

    PubMed

    Zarei, Roozbeh; He, Jing; Siuly, Siuly; Zhang, Yanchun

    2017-07-01

    Feature extraction of EEG signals plays a significant role in Brain-computer interface (BCI) as it can significantly affect the performance and the computational time of the system. The main aim of the current work is to introduce an innovative algorithm for acquiring reliable discriminating features from EEG signals to improve classification performances and to reduce the time complexity. This study develops a robust feature extraction method combining the principal component analysis (PCA) and the cross-covariance technique (CCOV) for the extraction of discriminatory information from the mental states based on EEG signals in BCI applications. We apply the correlation based variable selection method with the best first search on the extracted features to identify the best feature set for characterizing the distribution of mental state signals. To verify the robustness of the proposed feature extraction method, three machine learning techniques: multilayer perceptron neural networks (MLP), least square support vector machine (LS-SVM), and logistic regression (LR) are employed on the obtained features. The proposed methods are evaluated on two publicly available datasets. Furthermore, we evaluate the performance of the proposed methods by comparing it with some recently reported algorithms. The experimental results show that all three classifiers achieve high performance (above 99% overall classification accuracy) for the proposed feature set. Among these classifiers, the MLP and LS-SVM methods yield the best performance for the obtained features. The average sensitivity, specificity, and classification accuracy are the same for these two classifiers: 99.32%, 100%, and 99.66%, respectively, for BCI competition dataset IVa, and 100%, 100%, and 100% for BCI competition dataset IVb. The results also indicate that the proposed methods outperform the most recently reported methods by at least a 0.25% average accuracy improvement on dataset IVa. The execution time results show that the proposed method has lower time complexity after feature selection. The proposed feature extraction method is very effective for extracting representative information from mental-state EEG signals in BCI applications and for reducing the computational complexity of classifiers by reducing the number of extracted features. Copyright © 2017 Elsevier B.V. All rights reserved.
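
    A rough sketch of the pipeline shape described above: PCA-based feature extraction followed by a filter-style selection step and an MLP classifier. The cross-covariance step and LS-SVM are not reproduced; the data, component counts, and network size are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder data standing in for windowed EEG channel features
X, y = make_classification(n_samples=400, n_features=64, n_informative=10, random_state=0)

# PCA extraction -> filter-style selection -> MLP classification
pipe = make_pipeline(StandardScaler(),
                     PCA(n_components=20),
                     SelectKBest(f_classif, k=10),
                     MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000, random_state=0))
print("5-fold accuracy:", cross_val_score(pipe, X, y, cv=5).mean())
```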

  1. Hyperspectral data discrimination methods

    NASA Astrophysics Data System (ADS)

    Casasent, David P.; Chen, Xuewen

    2000-12-01

    Hyperspectral data provides spectral response information that provides detailed chemical, moisture, and other description of constituent parts of an item. These new sensor data are useful in USDA product inspection. However, such data introduce problems such as the curse of dimensionality, the need to reduce the number of features used to accommodate realistic small training set sizes, and the need to employ discriminatory features and still achieve good generalization (comparable training and test set performance). Several two-step methods are compared to a new and preferable single-step spectral decomposition algorithm. Initial results on hyperspectral data for good/bad almonds and for good/bad (aflatoxin infested) corn kernels are presented. The hyperspectral application addressed differs greatly from prior USDA work (PLS) in which the level of a specific channel constituent in food was estimated. A validation set (separate from the test set) is used in selecting algorithm parameters. Threshold parameters are varied to select the best Pc operating point. Initial results show that nonlinear features yield improved performance.

  2. Evolutionary optimization of radial basis function classifiers for data mining applications.

    PubMed

    Buchtala, Oliver; Klimek, Manuel; Sick, Bernhard

    2005-10-01

    In many data mining applications that address classification problems, feature and model selection are considered as key tasks. That is, appropriate input features of the classifier must be selected from a given (and often large) set of possible features and structure parameters of the classifier must be adapted with respect to these features and a given data set. This paper describes an evolutionary algorithm (EA) that performs feature and model selection simultaneously for radial basis function (RBF) classifiers. In order to reduce the optimization effort, various techniques are integrated that accelerate and improve the EA significantly: hybrid training of RBF networks, lazy evaluation, consideration of soft constraints by means of penalty terms, and temperature-based adaptive control of the EA. The feasibility and the benefits of the approach are demonstrated by means of four data mining problems: intrusion detection in computer networks, biometric signature verification, customer acquisition with direct marketing methods, and optimization of chemical production processes. It is shown that, compared to earlier EA-based RBF optimization techniques, the runtime is reduced by up to 99% while error rates are lowered by up to 86%, depending on the application. The algorithm is independent of specific applications so that many ideas and solutions can be transferred to other classifier paradigms.

  3. Real estate value prediction using multivariate regression models

    NASA Astrophysics Data System (ADS)

    Manjula, R.; Jain, Shubham; Srivastava, Sharad; Rajiv Kher, Pranav

    2017-11-01

    The real estate market is one of the most competitive in terms of pricing, and prices tend to vary significantly based on many factors; hence it is one of the prime fields in which to apply machine learning to optimize and predict prices with high accuracy. In this paper, we therefore present various important features to use while predicting housing prices with good accuracy. We describe regression models that use various features to achieve a lower residual sum of squares error. When using features in a regression model, some feature engineering is required for better prediction. Often a set of features (multiple regression) or polynomial regression (applying various powers to the features) is used to obtain a better model fit. Because these models are expected to be susceptible to overfitting, ridge regression is used to reduce it. This paper thus points toward the best application of regression models, in addition to other techniques, to optimize the result.
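
    A minimal sketch of the overfitting argument above: a polynomial regression compared against the same polynomial features fit with ridge regularization. The synthetic data, polynomial degree, and ridge penalty are assumptions for illustration, not the paper's configuration.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in for housing features (size, rooms, age, ...)
X, y = make_regression(n_samples=300, n_features=6, noise=15.0, random_state=0)

# Plain polynomial regression vs. ridge-regularized polynomial regression
plain = make_pipeline(PolynomialFeatures(degree=3), StandardScaler(), LinearRegression())
ridge = make_pipeline(PolynomialFeatures(degree=3), StandardScaler(), Ridge(alpha=10.0))

for name, model in [("polynomial", plain), ("polynomial + ridge", ridge)]:
    score = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{name}: mean CV R^2 = {score:.3f}")
```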

  4. Human Movement Detection and Identification Using Pyroelectric Infrared Sensors

    PubMed Central

    Yun, Jaeseok; Lee, Sang-Shin

    2014-01-01

    Pyroelectric infrared (PIR) sensors are widely used as a presence trigger, but the analog output of PIR sensors depends on several other aspects, including the distance of the body from the PIR sensor, the direction and speed of movement, the body shape and gait. In this paper, we present an empirical study of human movement detection and identification using a set of PIR sensors. We have developed a data collection module having two pairs of orthogonally aligned PIR sensors and modified Fresnel lenses. We have placed three PIR-based modules in a hallway for monitoring people: one module on the ceiling and two modules on opposite walls facing each other. We have collected a data set from eight subjects walking under three varying conditions: two directions (back and forth), three distance intervals (close to one wall sensor, in the middle, close to the other wall sensor), and three speed levels (slow, moderate, fast). We have used two types of feature sets: a raw data set, and a reduced feature set composed of the amplitude and time to peak and the passage duration extracted from each PIR sensor. We have performed classification analysis with well-known machine learning algorithms, including instance-based learning and support vector machines. Our findings show that with the raw data set captured from a single PIR sensor of each of the three modules, we could achieve more than 92% accuracy in classifying the direction and speed of movement and the distance interval, and in identifying subjects. We could also achieve more than 94% accuracy in classifying the direction, speed, and distance, and in identifying subjects, using the reduced feature set extracted from two pairs of PIR sensors of each of the three modules. PMID:24803195
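
    The sketch below shows how a reduced feature set of the kind named above (peak amplitude, time to peak, passage duration) might be extracted from one PIR waveform and fed to an SVM. The threshold, sampling rate, and synthetic passage signals are all assumptions for illustration, not the paper's processing chain.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def reduced_pir_features(signal, fs=100.0, threshold=0.1):
    """Reduced feature set for one PIR channel: peak amplitude, time to peak,
    and passage duration (time the rectified signal stays above a threshold)."""
    t = np.arange(len(signal)) / fs
    rect = np.abs(signal)
    peak_idx = int(np.argmax(rect))
    duration = float((rect > threshold).sum()) / fs
    return np.array([rect[peak_idx], t[peak_idx], duration])

# Synthetic passages standing in for recorded PIR waveforms at three speeds
rng = np.random.default_rng(2)
X, y = [], []
for label, speed in enumerate([0.5, 1.0, 2.0]):        # slow / moderate / fast
    for _ in range(40):
        t = np.linspace(0, 4, 400)
        sig = np.exp(-((t - 2) ** 2) * speed) * rng.uniform(0.5, 1.5)
        sig += rng.normal(scale=0.05, size=t.size)
        X.append(reduced_pir_features(sig))
        y.append(label)
X, y = np.array(X), np.array(y)

print("5-fold accuracy:", cross_val_score(SVC(), X, y, cv=5).mean())
```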

  5. On-line object feature extraction for multispectral scene representation

    NASA Technical Reports Server (NTRS)

    Ghassemian, Hassan; Landgrebe, David

    1988-01-01

    A new on-line unsupervised object-feature extraction method is presented that reduces the complexity and costs associated with the analysis of multispectral image data and with data transmission, storage, archival, and distribution. The ambiguity in the object detection process can be reduced if the spatial dependencies, which exist among the adjacent pixels, are intelligently incorporated into the decision making process. A unity relation that must exist among the pixels of an object is defined. The Automatic Multispectral Image Compaction Algorithm (AMICA) uses the within-object pixel-feature gradient vector as valuable contextual information to construct the object's features, which preserve the class separability information within the data. For on-line object extraction the path-hypothesis and the basic mathematical tools for its realization are introduced in terms of a specific similarity measure and adjacency relation. AMICA is applied to several sets of real image data, and the performance and reliability of the features are evaluated.

  6. Managing rumor and gossip in operating room settings.

    PubMed

    Blakeley, J A; Ribeiro, V; Hughes, A

    1996-07-01

    The unique features of the operating room (OR) make it an ideal setting for the proliferation of gossip and rumor. Although not always negative, these "grapevine" communications can reduce productivity and work satisfaction. Hence, OR managers need to understand these forms of communication and prevent or control their negative consequences. The authors offer suggestions for undertaking this challenge.

  7. Learning feature representations with a cost-relevant sparse autoencoder.

    PubMed

    Längkvist, Martin; Loutfi, Amy

    2015-02-01

    There is an increasing interest in the machine learning community to automatically learn feature representations directly from the (unlabeled) data instead of using hand-designed features. The autoencoder is one method that can be used for this purpose. However, for data sets with a high degree of noise, a large amount of the representational capacity in the autoencoder is used to minimize the reconstruction error for these noisy inputs. This paper proposes a method that improves the feature learning process by focusing on the task relevant information in the data. This selective attention is achieved by weighting the reconstruction error and reducing the influence of noisy inputs during the learning process. The proposed model is trained on a number of publicly available image data sets and the test error rate is compared to a standard sparse autoencoder and other methods, such as the denoising autoencoder and contractive autoencoder.

  8. Input Decimated Ensembles

    NASA Technical Reports Server (NTRS)

    Tumer, Kagan; Oza, Nikunj C.; Clancy, Daniel (Technical Monitor)

    2001-01-01

    Using an ensemble of classifiers instead of a single classifier has been shown to improve generalization performance in many pattern recognition problems. However, the extent of such improvement depends greatly on the amount of correlation among the errors of the base classifiers. Therefore, reducing those correlations while keeping the classifiers' performance levels high is an important area of research. In this article, we explore input decimation (ID), a method which selects feature subsets for their ability to discriminate among the classes and uses them to decouple the base classifiers. We provide a summary of the theoretical benefits of correlation reduction, along with results of our method on two underwater sonar data sets, three benchmarks from the Proben1/UCI repositories, and two synthetic data sets. The results indicate that input decimated ensembles (IDEs) outperform ensembles whose base classifiers use all the input features; randomly selected subsets of features; and features created using principal components analysis, on a wide range of domains.
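
    A simplified stand-in for the input decimation idea described above: one base classifier per class, each trained only on the features that best separate that class from the rest, with predictions combined by averaging class probabilities. The selection score, subset size, and data are assumptions; the original method's exact selection criterion is not reproduced.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=800, n_features=40, n_informative=12,
                           n_classes=3, n_clusters_per_class=1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# One base classifier per class, each trained on the features that best
# discriminate that class from the rest (decoupling the base classifiers)
probas = []
for c in np.unique(y_tr):
    sel = SelectKBest(f_classif, k=10).fit(X_tr, (y_tr == c).astype(int))
    clf = LogisticRegression(max_iter=1000).fit(sel.transform(X_tr), y_tr)
    probas.append(clf.predict_proba(sel.transform(X_te)))

# Combine the decimated base classifiers by averaging their class probabilities
y_pred = np.mean(probas, axis=0).argmax(axis=1)
print("ensemble accuracy:", accuracy_score(y_te, y_pred))
```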

  9. A Dimensionally Aligned Signal Projection for Classification of Unintended Radiated Emissions

    DOE PAGES

    Vann, Jason Michael; Karnowski, Thomas P.; Kerekes, Ryan; ...

    2017-04-24

    Characterization of unintended radiated emissions (URE) from electronic devices plays an important role in many research areas from electromagnetic interference to nonintrusive load monitoring to information system security. URE can provide insights for applications ranging from load disaggregation and energy efficiency to condition-based maintenance of equipment based upon detected fault conditions. URE characterization often requires subject matter expertise to tailor transforms and feature extractors for the specific electrical devices of interest. We present a novel approach, named dimensionally aligned signal projection (DASP), for projecting aligned signal characteristics that are inherent to the physical implementation of many commercial electronic devices. These projections minimize the need for an intimate understanding of the underlying physical circuitry and significantly reduce the number of features required for signal classification. We present three possible DASP algorithms that leverage frequency harmonics, modulation alignments, and frequency peak spacings, along with a two-dimensional image manipulation method for statistical feature extraction. To demonstrate the ability of DASP to generate relevant features from URE, we measured the conducted URE from 14 residential electronic devices using a 2 MS/s collection system. Furthermore, a linear discriminant analysis classifier was trained using DASP generated features and was blind tested resulting in a greater than 90% classification accuracy for each of the DASP algorithms and an accuracy of 99.1% when DASP features are used in combination. Furthermore, we show that a rank reduced feature set of the combined DASP algorithms provides a 98.9% classification accuracy with only three features and outperforms a set of spectral features in terms of general classification as well as applicability across a broad number of devices.

  10. A Dimensionally Aligned Signal Projection for Classification of Unintended Radiated Emissions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vann, Jason Michael; Karnowski, Thomas P.; Kerekes, Ryan

    Characterization of unintended radiated emissions (URE) from electronic devices plays an important role in many research areas from electromagnetic interference to nonintrusive load monitoring to information system security. URE can provide insights for applications ranging from load disaggregation and energy efficiency to condition-based maintenance of equipment based upon detected fault conditions. URE characterization often requires subject matter expertise to tailor transforms and feature extractors for the specific electrical devices of interest. We present a novel approach, named dimensionally aligned signal projection (DASP), for projecting aligned signal characteristics that are inherent to the physical implementation of many commercial electronic devices. These projections minimize the need for an intimate understanding of the underlying physical circuitry and significantly reduce the number of features required for signal classification. We present three possible DASP algorithms that leverage frequency harmonics, modulation alignments, and frequency peak spacings, along with a two-dimensional image manipulation method for statistical feature extraction. To demonstrate the ability of DASP to generate relevant features from URE, we measured the conducted URE from 14 residential electronic devices using a 2 MS/s collection system. Furthermore, a linear discriminant analysis classifier was trained using DASP generated features and was blind tested resulting in a greater than 90% classification accuracy for each of the DASP algorithms and an accuracy of 99.1% when DASP features are used in combination. Furthermore, we show that a rank reduced feature set of the combined DASP algorithms provides a 98.9% classification accuracy with only three features and outperforms a set of spectral features in terms of general classification as well as applicability across a broad number of devices.

  11. A feature-weighting account of priming in conjunction search.

    PubMed

    Becker, Stefanie I; Horstmann, Gernot

    2009-02-01

    Previous research on the priming effect in conjunction search has shown that repeating the target and distractor features across displays speeds mean response times but does not improve search efficiency: Repetitions do not reduce the set size effect (that is, the effect of the number of distractor items) but only modulate the intercept of the search function. In the present study, we investigated whether priming modulates search efficiency when a conjunctively defined target randomly changes between red and green. The results from an eyetracking experiment show that repeating the target across trials reduced the set size effect and, thus, did enhance search efficiency. Moreover, the probability of selecting the target as the first item in the display was higher when the target-distractor displays were repeated across trials than when they changed. Finally, red distractors were selected more frequently than green distractors when the previous target had been red (and vice versa). Taken together, these results indicate that priming in conjunction search modulates processes concerned with guiding attention to the target, by assigning more attentional weight to features sharing the previous target's color.

  12. Multiscale intensity homogeneity transformation method and its application to computer-aided detection of pulmonary embolism in computed tomographic pulmonary angiography (CTPA)

    NASA Astrophysics Data System (ADS)

    Guo, Yanhui; Zhou, Chuan; Chan, Heang-Ping; Wei, Jun; Chughtai, Aamer; Sundaram, Baskaran; Hadjiiski, Lubomir M.; Patel, Smita; Kazerooni, Ella A.

    2013-04-01

    A 3D multiscale intensity homogeneity transformation (MIHT) method was developed to reduce false positives (FPs) in our previously developed CAD system for pulmonary embolism (PE) detection. In MIHT, the voxel intensity of a PE candidate region was transformed to an intensity homogeneity value (IHV) with respect to the local median intensity. The IHVs were calculated in multiscales (MIHVs) to measure the intensity homogeneity, taking into account vessels of different sizes and different degrees of occlusion. Seven new features including the entropy, gradient, and moments that characterized the intensity distributions of the candidate regions were derived from the MIHVs and combined with the previously designed features that described the shape and intensity of PE candidates for the training of a linear classifier to reduce the FPs. 59 CTPA PE cases were collected from our patient files (UM set) with IRB approval and 69 cases from the PIOPED II data set with access permission. 595 and 800 PEs were identified as reference standard by experienced thoracic radiologists in the UM and PIOPED set, respectively. FROC analysis was used for performance evaluation. Compared with our previous CAD system, at a test sensitivity of 80%, the new method reduced the FP rate from 18.9 to 14.1/scan for the PIOPED set when the classifier was trained with the UM set and from 22.6 to 16.0/scan vice versa. The improvement was statistically significant (p<0.05) by JAFROC analysis. This study demonstrated that the MIHT method is effective in reducing FPs and improving the performance of the CAD system.
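
    The sketch below illustrates the general shape of a multiscale intensity homogeneity transformation: each voxel is expressed relative to the local median intensity at several scales, and simple statistics are then taken per scale. The difference-from-median formula, the scales, and the toy volume are assumptions; the CAD system's exact IHV definition and feature set are not reproduced.

```python
import numpy as np
from scipy.ndimage import median_filter

def multiscale_ihv(volume, scales=(3, 5, 9)):
    """For each scale, express every voxel relative to the local median
    intensity, yielding one intensity-homogeneity map per scale."""
    features = []
    for s in scales:
        local_med = median_filter(volume, size=s)
        features.append(volume - local_med)   # homogeneity value w.r.t. local median
    return np.stack(features, axis=0)

# Toy 3D "candidate region" standing in for a CTPA sub-volume
rng = np.random.default_rng(3)
vol = rng.normal(loc=100.0, scale=5.0, size=(32, 32, 32))
mihv = multiscale_ihv(vol)
print(mihv.shape)                              # (n_scales, 32, 32, 32)

# Simple summary statistics per scale, in the spirit of the derived features
for k, s in enumerate((3, 5, 9)):
    print(f"scale {s}: mean={mihv[k].mean():.3f}, std={mihv[k].std():.3f}")
```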

  13. A Semisupervised Support Vector Machines Algorithm for BCI Systems

    PubMed Central

    Qin, Jianzhao; Li, Yuanqing; Sun, Wei

    2007-01-01

    As an emerging technology, brain-computer interfaces (BCIs) bring us new communication interfaces which translate brain activities into control signals for devices like computers, robots, and so forth. In this study, we propose a semisupervised support vector machine (SVM) algorithm for brain-computer interface (BCI) systems, aiming at reducing the time-consuming training process. In this algorithm, we apply a semisupervised SVM for translating the features extracted from the electrical recordings of brain into control signals. This SVM classifier is built from a small labeled data set and a large unlabeled data set. Meanwhile, to reduce the time for training semisupervised SVM, we propose a batch-mode incremental learning method, which can also be easily applied to the online BCI systems. Additionally, it is suggested in many studies that common spatial pattern (CSP) is very effective in discriminating two different brain states. However, CSP needs a sufficient labeled data set. In order to overcome the drawback of CSP, we suggest a two-stage feature extraction method for the semisupervised learning algorithm. We apply our algorithm to two BCI experimental data sets. The offline data analysis results demonstrate the effectiveness of our algorithm. PMID:18368141

  14. A novel algorithm for reducing false arrhythmia alarms in intensive care units.

    PubMed

    Srivastava, Chandan; Sharma, Sonal; Jalali, Ali

    2016-08-01

    Alarm fatigue in intensive care units (ICUs) is one of the top healthcare issues in the US. False alarms in the ICU decrease the quality of care and staff response time to alarms. False alarms typically desensitize clinical staff, so that warnings may be missed or disregarded when a triggered alarm is true. In this study, we propose a multi-model ensemble approach to reduce the false alarm rate in monitoring systems. We used 750 patient records from the PhysioNet database. First, arrhythmia-based features and features from the electrocardiogram (ECG), arterial blood pressure (ABP), and photoplethysmogram (PPG) signals were extracted from the records. Next, the dataset was separated into two subsets on the basis of the available feature information. The first dataset (DS1) is the combination of ECG physiological, ABP, and PPG features; correlation coefficient and p-value criteria were applied for alarm-wise selection of relevant features, and a random forest classifier was used for model development and validation. A threshold-based approach was used on the second dataset (DS2), which is the combination of arrhythmia, ABP, and PPG features. The developed ensemble model achieves a sensitivity of 83.33-100% (average 95.56%) for true alarms and suppresses 66.67-89% (average 77.25%) of false alarms. The classifier's predictive behaviour shows its advantage in dealing with an unbalanced set of information; the overall model performance reaches 83.96% accuracy.
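
    A minimal sketch of the DS1 branch described above: keep features whose correlation with the true/false alarm label is statistically significant, then train a random forest. The significance threshold, the synthetic data, and the class balance are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Placeholder data standing in for ECG/ABP/PPG-derived alarm features
X, y = make_classification(n_samples=750, n_features=30, n_informative=8,
                           weights=[0.7, 0.3], random_state=0)

# Keep features whose correlation with the alarm label is significant
# (p < 0.01 is an illustrative threshold)
keep = [j for j in range(X.shape[1]) if pearsonr(X[:, j], y)[1] < 0.01]
X_sel = X[:, keep]

clf = RandomForestClassifier(n_estimators=200, random_state=0)
print("selected features:", len(keep))
print("5-fold accuracy:", cross_val_score(clf, X_sel, y, cv=5).mean())
```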

  15. Subset selective search on the basis of color and preview.

    PubMed

    Donk, Mieke

    2017-01-01

    In the preview paradigm observers are presented with one set of elements (the irrelevant set) followed by the addition of a second set among which the target is presented (the relevant set). Search efficiency in such a preview condition has been demonstrated to be higher than that in a full-baseline condition in which both sets are simultaneously presented, suggesting that a preview of the irrelevant set reduces its influence on the search process. However, numbers of irrelevant and relevant elements are typically not independently manipulated. Moreover, subset selective search also occurs when both sets are presented simultaneously but differ in color. The aim of the present study was to investigate how numbers of irrelevant and relevant elements contribute to preview search in the absence and presence of a color difference between subsets. In two experiments it was demonstrated that a preview reduced the influence of the number of irrelevant elements in the absence but not in the presence of a color difference between subsets. In the presence of a color difference, a preview lowered the effect of the number of relevant elements but only when the target was defined by a unique feature within the relevant set (Experiment 1); when the target was defined by a conjunction of features (Experiment 2), search efficiency as a function of the number of relevant elements was not modulated by a preview. Together the results are in line with the idea that subset selective search is based on different simultaneously operating mechanisms.

  16. Late summer sea ice segmentation with multi-polarisation SAR features in C- and X-band

    NASA Astrophysics Data System (ADS)

    Fors, A. S.; Brekke, C.; Doulgeris, A. P.; Eltoft, T.; Renner, A. H. H.; Gerland, S.

    2015-09-01

    In this study we investigate the potential of sea ice segmentation by C- and X-band multi-polarisation synthetic aperture radar (SAR) features during late summer. Five high-resolution satellite SAR scenes were recorded in the Fram Strait covering iceberg-fast first-year and old sea ice during a week with air temperatures varying around zero degrees Celsius. In situ data consisting of sea ice thickness, surface roughness and aerial photographs were collected during a helicopter flight at the site. Six polarimetric SAR features were extracted for each of the scenes. The ability of the individual SAR features to discriminate between sea ice types and their temporal consistency were examined. All SAR features were found to add value to sea ice type discrimination. Relative kurtosis, geometric brightness, cross-polarisation ratio and co-polarisation correlation angle were found to be temporally consistent in the investigated period, while co-polarisation ratio and co-polarisation correlation magnitude were found to be temporally inconsistent. An automatic feature-based segmentation algorithm was tested both for a full SAR feature set, and for a reduced SAR feature set limited to temporally consistent features. In general, the algorithm produces a good late summer sea ice segmentation. Excluding temporally inconsistent SAR features improved the segmentation at air temperatures above zero degrees Celsius.

  17. Teens, Crime, and Rural Communities. How Youth in Rural America Can Help Reduce Violent and Property Crimes.

    ERIC Educational Resources Information Center

    Donovan, Erin; O'Neil, Jean F., Ed.

    Featuring the Teens, Crime, and Community (TCC) program, this monograph focuses on youth crime and crime prevention in rural settings. TCC actively involves teens and adults in a partnership designed to reduce teen victimization and to encourage teens to be catalysts of change for community safety. The guide provides teachers, administrators, and…

  18. What top-down task sets do for us: an ERP study on the benefits of advance preparation in visual search.

    PubMed

    Eimer, Martin; Kiss, Monika; Nicholas, Susan

    2011-12-01

    When target-defining features are specified in advance, attentional target selection in visual search is controlled by preparatory top-down task sets. We used ERP measures to study voluntary target selection in the absence of such feature-specific task sets, and to compare it to selection that is guided by advance knowledge about target features. Visual search arrays contained two different color singleton digits, and participants had to select one of these as target and report its parity. Target color was either known in advance (fixed color task) or had to be selected anew on each trial (free color-choice task). ERP correlates of spatially selective attentional target selection (N2pc) and working memory processing (SPCN) demonstrated rapid target selection and efficient exclusion of color singleton distractors from focal attention and working memory in the fixed color task. In the free color-choice task, spatially selective processing also emerged rapidly, but selection efficiency was reduced, with nontarget singleton digits capturing attention and gaining access to working memory. Results demonstrate the benefits of top-down task sets: Feature-specific advance preparation accelerates target selection, rapidly resolves attentional competition, and prevents irrelevant events from attracting attention and entering working memory.

  19. Setting and changing feature priorities in visual short-term memory.

    PubMed

    Kalogeropoulou, Zampeta; Jagadeesh, Akshay V; Ohl, Sven; Rolfs, Martin

    2017-04-01

    Many everyday tasks require prioritizing some visual features over competing ones, both during the selection from the rich sensory input and while maintaining information in visual short-term memory (VSTM). Here, we show that observers can change priorities in VSTM when, initially, they attended to a different feature. Observers reported from memory the orientation of one of two spatially interspersed groups of black and white gratings. Using colored pre-cues (presented before stimulus onset) and retro-cues (presented after stimulus offset) predicting the to-be-reported group, we manipulated observers' feature priorities independently during stimulus encoding and maintenance, respectively. Valid pre-cues reliably increased observers' performance (reduced guessing, increased report precision) as compared to neutral ones; invalid pre-cues had the opposite effect. Valid retro-cues also consistently improved performance (by reducing random guesses), even if the unexpected group suddenly became relevant (invalid-valid condition). Thus, feature-based attention can reshape priorities in VSTM protecting information that would otherwise be forgotten.

  20. Hadoop neural network for parallel and distributed feature selection.

    PubMed

    Hodge, Victoria J; O'Keefe, Simon; Austin, Jim

    2016-06-01

    In this paper, we introduce a theoretical basis for a Hadoop-based neural network for parallel and distributed feature selection in Big Data sets. It is underpinned by an associative memory (binary) neural network which is highly amenable to parallel and distributed processing and fits with the Hadoop paradigm. There are many feature selectors described in the literature which all have various strengths and weaknesses. We present the implementation details of five feature selection algorithms constructed using our artificial neural network framework embedded in Hadoop YARN. Hadoop allows parallel and distributed processing. Each feature selector can be divided into subtasks and the subtasks can then be processed in parallel. Multiple feature selectors can also be processed simultaneously (in parallel) allowing multiple feature selectors to be compared. We identify commonalities among the five feature selectors. All can be processed in the framework using a single representation and the overall processing can also be greatly reduced by only processing the common aspects of the feature selectors once and propagating these aspects across all five feature selectors as necessary. This allows the best feature selector, and the actual features to select, to be identified for large and high dimensional data sets through exploiting the efficiency and flexibility of embedding the binary associative-memory neural network in Hadoop. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.

  1. Design of an Adaptive Human-Machine System Based on Dynamical Pattern Recognition of Cognitive Task-Load.

    PubMed

    Zhang, Jianhua; Yin, Zhong; Wang, Rubin

    2017-01-01

    This paper developed a cognitive task-load (CTL) classification algorithm and allocation strategy to sustain the optimal operator CTL levels over time in safety-critical human-machine integrated systems. An adaptive human-machine system is designed based on a non-linear dynamic CTL classifier, which maps a set of electroencephalogram (EEG) and electrocardiogram (ECG) related features to a few CTL classes. The least-squares support vector machine (LSSVM) is used as dynamic pattern classifier. A series of electrophysiological and performance data acquisition experiments were performed on seven volunteer participants under a simulated process control task environment. The participant-specific dynamic LSSVM model is constructed to classify the instantaneous CTL into five classes at each time instant. The initial feature set, comprising 56 EEG and ECG related features, is reduced to a set of 12 salient features (including 11 EEG-related features) by using the locality preserving projection (LPP) technique. An overall correct classification rate of about 80% is achieved for the 5-class CTL classification problem. Then the predicted CTL is used to adaptively allocate the number of process control tasks between operator and computer-based controller. Simulation results showed that the overall performance of the human-machine system can be improved by using the adaptive automation strategy proposed.

  2. Cluster compression algorithm: A joint clustering/data compression concept

    NASA Technical Reports Server (NTRS)

    Hilbert, E. E.

    1977-01-01

    The Cluster Compression Algorithm (CCA), which was developed to reduce costs associated with transmitting, storing, distributing, and interpreting LANDSAT multispectral image data is described. The CCA is a preprocessing algorithm that uses feature extraction and data compression to more efficiently represent the information in the image data. The format of the preprocessed data enables simple look-up table decoding and direct use of the extracted features to reduce user computation for either image reconstruction or computer interpretation of the image data. Basically, the CCA uses spatially local clustering to extract features from the image data to describe spectral characteristics of the data set. In addition, the features may be used to form a sequence of scalar numbers that define each picture element in terms of the cluster features. This sequence, called the feature map, is then efficiently represented by using source encoding concepts. Various forms of the CCA are defined and experimental results are presented to show trade-offs and characteristics of the various implementations. Examples are provided that demonstrate the application of the cluster compression concept to multispectral images from LANDSAT and other sources.
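
    The sketch below shows the core idea of the clustering/feature-map/look-up-table decode described above, using scikit-learn's KMeans on a synthetic multispectral image. Real CCA clusters within spatially local blocks and source-encodes the feature map; both of those refinements are omitted here, and all sizes are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic multispectral image: H x W pixels, 4 spectral bands
rng = np.random.default_rng(4)
H, W, B = 64, 64, 4
image = rng.normal(size=(H, W, B))
pixels = image.reshape(-1, B)

# Cluster pixel spectra (global clustering stands in for spatially local clustering)
k = 8
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pixels)

cluster_features = km.cluster_centers_     # transmitted cluster "features" (k x B)
feature_map = km.labels_.reshape(H, W)     # per-pixel scalar cluster index

# Reconstruction is a simple look-up table decode of the feature map
reconstructed = cluster_features[feature_map]
mse = np.mean((reconstructed - image) ** 2)
print(f"compression: {B} bands -> 1 index per pixel, reconstruction MSE = {mse:.3f}")
```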

  3. Learner, Patient, and Supervisor Features Are Associated With Different Types of Cognitive Load During Procedural Skills Training: Implications for Teaching and Instructional Design.

    PubMed

    Sewell, Justin L; Boscardin, Christy K; Young, John Q; Ten Cate, Olle; O'Sullivan, Patricia S

    2017-11-01

    Cognitive load theory, focusing on limits of the working memory, is relevant to medical education; however, factors associated with cognitive load during procedural skills training are not well characterized. The authors sought to determine how features of learners, patients/tasks, settings, and supervisors were associated with three types of cognitive load among learners performing a specific procedure, colonoscopy, to identify implications for procedural teaching. Data were collected through an electronically administered survey sent to 1,061 U.S. gastroenterology fellows during the 2014-2015 academic year; 477 (45.0%) participated. Participants completed the survey immediately following a colonoscopy. Using multivariable linear regression analyses, the authors identified sets of features associated with intrinsic, extraneous, and germane loads. Features associated with intrinsic load included learners (prior experience and year in training negatively associated, fatigue positively associated) and patient/tasks (procedural complexity positively associated, better patient tolerance negatively associated). Features associated with extraneous load included learners (fatigue positively associated), setting (queue order positively associated), and supervisors (supervisor engagement and confidence negatively associated). Only one feature, supervisor engagement, was (positively) associated with germane load. These data support practical recommendations for teaching procedural skills through the lens of cognitive load theory. To optimize intrinsic load, level of experience and competence of learners should be balanced with procedural complexity; part-task approaches and scaffolding may be beneficial. To reduce extraneous load, teachers should remain engaged, and factors within the procedural setting that may interfere with learning should be minimized. To optimize germane load, teachers should remain engaged.

  4. Unsupervised texture image segmentation by improved neural network ART2

    NASA Technical Reports Server (NTRS)

    Wang, Zhiling; Labini, G. Sylos; Mugnuolo, R.; Desario, Marco

    1994-01-01

    Here we propose a texture image segmentation algorithm for a computer vision system on a space robot. An improved adaptive resonance theory network (ART2) for analog input patterns is adapted to classify the image based on a set of texture image features extracted by a fast spatial gray level dependence method (SGLDM). The nonlinear thresholding functions in the input layer of the neural network are constructed in two parts: first, to reduce the effects of image noise on the features, a set of sigmoid functions is chosen depending on the type of feature; second, to enhance the contrast of the features, we adopt fuzzy mapping functions. The number of clusters in the output layer can be increased continually by an auto-growing mechanism whenever a new pattern appears. Experimental results and original and segmented pictures are shown, including a comparison between this approach and the K-means algorithm. The system, written in C, runs on a SUN-4/330 SPARCstation with an IT-150 image board and a CCD camera.

  5. Computer aided diagnosis based on medical image processing and artificial intelligence methods

    NASA Astrophysics Data System (ADS)

    Stoitsis, John; Valavanis, Ioannis; Mougiakakou, Stavroula G.; Golemati, Spyretta; Nikita, Alexandra; Nikita, Konstantina S.

    2006-12-01

    Advances in imaging technology and computer science have greatly enhanced interpretation of medical images, and contributed to early diagnosis. The typical architecture of a Computer Aided Diagnosis (CAD) system includes image pre-processing, definition of region(s) of interest, features extraction and selection, and classification. In this paper, the principles of CAD systems design and development are demonstrated by means of two examples. The first one focuses on the differentiation between symptomatic and asymptomatic carotid atheromatous plaques. For each plaque, a vector of texture and motion features was estimated, which was then reduced to the most robust ones by means of ANalysis of VAriance (ANOVA). Using fuzzy c-means, the features were then clustered into two classes. Clustering performances of 74%, 79%, and 84% were achieved for texture only, motion only, and combinations of texture and motion features, respectively. The second CAD system presented in this paper supports the diagnosis of focal liver lesions and is able to characterize liver tissue from Computed Tomography (CT) images as normal, hepatic cyst, hemangioma, and hepatocellular carcinoma. Five texture feature sets were extracted for each lesion, while a genetic algorithm based feature selection method was applied to identify the most robust features. The selected feature set was fed into an ensemble of neural network classifiers. The achieved classification performance was 100%, 93.75% and 90.63% in the training, validation and testing set, respectively. It is concluded that computerized analysis of medical images in combination with artificial intelligence can be used in clinical practice and may contribute to more efficient diagnosis.

  6. Adversarial Feature Selection Against Evasion Attacks.

    PubMed

    Zhang, Fei; Chan, Patrick P K; Biggio, Battista; Yeung, Daniel S; Roli, Fabio

    2016-03-01

    Pattern recognition and machine learning techniques have been increasingly adopted in adversarial settings such as spam, intrusion, and malware detection, although their security against well-crafted attacks that aim to evade detection by manipulating data at test time has not yet been thoroughly assessed. While previous work has been mainly focused on devising adversary-aware classification algorithms to counter evasion attempts, only few authors have considered the impact of using reduced feature sets on classifier security against the same attacks. An interesting, preliminary result is that classifier security to evasion may be even worsened by the application of feature selection. In this paper, we provide a more detailed investigation of this aspect, shedding some light on the security properties of feature selection against evasion attacks. Inspired by previous work on adversary-aware classifiers, we propose a novel adversary-aware feature selection model that can improve classifier security against evasion attacks, by incorporating specific assumptions on the adversary's data manipulation strategy. We focus on an efficient, wrapper-based implementation of our approach, and experimentally validate its soundness on different application examples, including spam and malware detection.

  7. Design of 240,000 orthogonal 25mer DNA barcode probes.

    PubMed

    Xu, Qikai; Schlabach, Michael R; Hannon, Gregory J; Elledge, Stephen J

    2009-02-17

    DNA barcodes linked to genetic features greatly facilitate screening these features in pooled formats using microarray hybridization, and new tools are needed to design large sets of barcodes to allow construction of large barcoded mammalian libraries such as shRNA libraries. Here we report a framework for designing large sets of orthogonal barcode probes. We demonstrate the utility of this framework by designing 240,000 barcode probes and testing their performance by hybridization. From the test hybridizations, we also discovered new probe design rules that significantly reduce cross-hybridization after their introduction into the framework of the algorithm. These rules should improve the performance of DNA microarray probe designs for many applications.

  8. Design of 240,000 orthogonal 25mer DNA barcode probes

    PubMed Central

    Xu, Qikai; Schlabach, Michael R.; Hannon, Gregory J.; Elledge, Stephen J.

    2009-01-01

    DNA barcodes linked to genetic features greatly facilitate screening these features in pooled formats using microarray hybridization, and new tools are needed to design large sets of barcodes to allow construction of large barcoded mammalian libraries such as shRNA libraries. Here we report a framework for designing large sets of orthogonal barcode probes. We demonstrate the utility of this framework by designing 240,000 barcode probes and testing their performance by hybridization. From the test hybridizations, we also discovered new probe design rules that significantly reduce cross-hybridization after their introduction into the framework of the algorithm. These rules should improve the performance of DNA microarray probe designs for many applications. PMID:19171886

  9. Design of a pulse oximeter for price sensitive emerging markets.

    PubMed

    Jones, Z; Woods, E; Nielson, D; Mahadevan, S V

    2010-01-01

    While the global market for medical devices is located primarily in developed countries, price sensitive emerging markets comprise an attractive, underserved segment in which products need a unique set of value propositions to be competitive. A pulse oximeter was designed expressly for emerging markets, and a novel feature set was implemented to reduce the cost of ownership and improve the usability of the device. Innovations included the ability of the device to generate its own electricity, a built-in sensor which cuts down on operating costs, and a graphical, symbolic user interface. These features yield an average reduction of over 75% in the device cost of ownership versus comparable pulse oximeters already on the market.

  10. Carotid artery B-mode ultrasound image segmentation based on morphology, geometry and gradient direction

    NASA Astrophysics Data System (ADS)

    Sunarya, I. Made Gede; Yuniarno, Eko Mulyanto; Purnomo, Mauridhi Hery; Sardjono, Tri Arief; Sunu, Ismoyo; Purnama, I. Ketut Eddy

    2017-06-01

    The carotid artery (CA) is one of the vital organs in the human body. CA features that can be used are position, size, and volume. The position feature can be used to determine the preliminary initialization of tracking. Examination of CA features can use ultrasound. Ultrasound imaging is operator-dependent, so there can be differences in the images obtained by two or more operators, which can affect the process of determining the CA. To reduce the level of subjectivity among operators, the position of the CA can be determined automatically. In this study, the proposed method segments the CA in B-mode ultrasound images based on morphology, geometry, and gradient direction. The study consists of three steps: data collection, preprocessing, and artery segmentation. The data used in this study were taken directly by the researchers and from the Brno University signal processing lab database. Each data set contains 100 carotid artery B-mode ultrasound images. The artery is modeled as an ellipse with center c, major axis a, and minor axis b. The proposed method achieves a high score on each data set: 97% (data set 1), 73% (data set 2), and 87% (data set 3). These segmentation results will then be used in the process of tracking the CA.

  11. Oversampling the Minority Class in the Feature Space.

    PubMed

    Perez-Ortiz, Maria; Gutierrez, Pedro Antonio; Tino, Peter; Hervas-Martinez, Cesar

    2016-09-01

    The imbalanced nature of some real-world data is one of the current challenges for machine learning researchers. One common approach oversamples the minority class through convex combination of its patterns. We explore the general idea of synthetic oversampling in the feature space induced by a kernel function (as opposed to input space). If the kernel function matches the underlying problem, the classes will be linearly separable and synthetically generated patterns will lie on the minority class region. Since the feature space is not directly accessible, we use the empirical feature space (EFS) (a Euclidean space isomorphic to the feature space) for oversampling purposes. The proposed method is framed in the context of support vector machines, where the imbalanced data sets can pose a serious hindrance. The idea is investigated in three scenarios: 1) oversampling in the full and reduced-rank EFSs; 2) a kernel learning technique maximizing the data class separation to study the influence of the feature space structure (implicitly defined by the kernel function); and 3) a unified framework for preferential oversampling that spans some of the previous approaches in the literature. We support our investigation with extensive experiments over 50 imbalanced data sets.
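
    As a simplified, input-space stand-in for the convex-combination oversampling described above (the paper performs it in the empirical feature space induced by a kernel), the sketch below generates synthetic minority samples as convex combinations of each point and one of its nearest minority-class neighbours. The neighbourhood size and data are assumptions for illustration.

```python
import numpy as np

def convex_oversample(X_min, n_new, k=5, rng=None):
    """Generate synthetic minority samples as convex combinations of each
    point and one of its k nearest minority-class neighbours (input-space
    sketch of synthetic oversampling)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    # pairwise distances within the minority class
    d = np.linalg.norm(X_min[:, None] - X_min[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    neighbours = np.argsort(d, axis=1)[:, :k]
    new = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        j = neighbours[i, rng.integers(k)]
        lam = rng.uniform()                 # convex combination weight
        new.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(new)

rng = np.random.default_rng(5)
X_min = rng.normal(loc=2.0, size=(20, 3))   # small minority class
synthetic = convex_oversample(X_min, n_new=80, rng=rng)
print(synthetic.shape)                       # (80, 3)
```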

  12. Paving the COWpath: data-driven design of pediatric order sets

    PubMed Central

    Zhang, Yiye; Padman, Rema; Levin, James E

    2014-01-01

    Objective Evidence indicates that users incur significant physical and cognitive costs in the use of order sets, a core feature of computerized provider order entry systems. This paper develops data-driven approaches for automating the construction of order sets that match closely with user preferences and workflow while minimizing physical and cognitive workload. Materials and methods We developed and tested optimization-based models embedded with clustering techniques using physical and cognitive click cost criteria. By judiciously learning from users’ actual actions, our methods identify items for constituting order sets that are relevant according to historical ordering data and grouped on the basis of order similarity and ordering time. We evaluated performance of the methods using 47 099 orders from the year 2011 for asthma, appendectomy and pneumonia management in a pediatric inpatient setting. Results In comparison with existing order sets, those developed using the new approach significantly reduce the physical and cognitive workload associated with usage by 14–52%. This approach is also capable of accommodating variations in clinical conditions that affect order set usage and development. Discussion There is a critical need to investigate the cognitive complexity imposed on users by complex clinical information systems, and to design their features according to ‘human factors’ best practices. Optimizing order set generation using cognitive cost criteria introduces a new approach that can potentially improve ordering efficiency, reduce unintended variations in order placement, and enhance patient safety. Conclusions We demonstrate that data-driven methods offer a promising approach for designing order sets that are generalizable, data-driven, condition-based, and up to date with current best practices. PMID:24674844

  13. Implementation of a smartphone as a wireless gyroscope platform for quantifying reduced arm swing in hemiplegic gait with machine learning classification by multilayer perceptron neural network.

    PubMed

    LeMoyne, Robert; Mastroianni, Timothy

    2016-08-01

    Natural gait consists of synchronous and rhythmic patterns for both the lower and upper limb. People with hemiplegia can experience reduced arm swing, which can negatively impact the quality of gait. Wearable and wireless sensors, such as a smartphone, have demonstrated the ability to quantify various features of gait. With a software application, the smartphone (iPhone) can function as a wireless gyroscope platform capable of conveying a gyroscope signal recording as an email attachment by wireless connectivity to the Internet. The gyroscope signal recordings of the affected hemiplegic arm with reduced arm swing and of the unaffected arm are post-processed into a feature set for machine learning. Using a multilayer perceptron neural network, a considerable degree of classification accuracy is attained in distinguishing between the affected hemiplegic arm with reduced arm swing and the unaffected arm.
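
    A minimal sketch of the final classification step described above: a multilayer perceptron distinguishing affected from unaffected arms using a small gyroscope-derived feature set. The three features and their synthetic values are placeholders, not the paper's actual post-processed feature set.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder feature vectors standing in for post-processed gyroscope
# recordings (e.g., signal range, mean, standard deviation per recording)
rng = np.random.default_rng(6)
affected = rng.normal(loc=[0.4, 0.0, 0.1], scale=0.1, size=(30, 3))    # reduced arm swing
unaffected = rng.normal(loc=[1.0, 0.0, 0.3], scale=0.1, size=(30, 3))
X = np.vstack([affected, unaffected])
y = np.array([0] * 30 + [1] * 30)

clf = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0))
print("5-fold accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```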

  14. Discontinuous Galerkin algorithms for fully kinetic plasmas

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Juno, J.; Hakim, A.; TenBarge, J.

    Here, we present a new algorithm for the discretization of the non-relativistic Vlasov–Maxwell system of equations for the study of plasmas in the kinetic regime. Using the discontinuous Galerkin finite element method for the spatial discretization, we obtain a high order accurate solution for the plasma's distribution function. Time stepping for the distribution function is done explicitly with a third order strong-stability preserving Runge–Kutta method. Since the Vlasov equation in the Vlasov–Maxwell system is a high dimensional transport equation, up to six dimensions plus time, we take special care to note various features we have implemented to reduce the cost while maintaining the integrity of the solution, including the use of a reduced high-order basis set. A series of benchmarks, from simple wave and shock calculations, to a five dimensional turbulence simulation, are presented to verify the efficacy of our set of numerical methods, as well as demonstrate the power of the implemented features.

  15. Discontinuous Galerkin algorithms for fully kinetic plasmas

    DOE PAGES

    Juno, J.; Hakim, A.; TenBarge, J.; ...

    2017-10-10

    Here, we present a new algorithm for the discretization of the non-relativistic Vlasov–Maxwell system of equations for the study of plasmas in the kinetic regime. Using the discontinuous Galerkin finite element method for the spatial discretization, we obtain a high order accurate solution for the plasma's distribution function. Time stepping for the distribution function is done explicitly with a third order strong-stability preserving Runge–Kutta method. Since the Vlasov equation in the Vlasov–Maxwell system is a high dimensional transport equation, up to six dimensions plus time, we take special care to note various features we have implemented to reduce the cost while maintaining the integrity of the solution, including the use of a reduced high-order basis set. A series of benchmarks, from simple wave and shock calculations, to a five dimensional turbulence simulation, are presented to verify the efficacy of our set of numerical methods, as well as demonstrate the power of the implemented features.

  16. On Quantitative Biomarkers of VNS Therapy Using EEG and ECG Signals.

    PubMed

    Ravan, Maryam; Sabesan, Shivkumar; D'Cruz, O'Neill

    2017-02-01

    The goal of this work is to objectively evaluate the effectiveness of neuromodulation therapies, specifically vagus nerve stimulation (VNS), in reducing the severity of seizures in patients with medically refractory epilepsy. Using novel quantitative features obtained from a combination of electroencephalographic (EEG) and electrocardiographic (ECG) signals around seizure events in 16 patients who underwent implantation of a closed-loop VNS therapy system (AspireSR), we evaluated whether automated delivery of VNS at the time of seizure onset reduces the severity of seizures by reducing EEG spatial synchronization as well as the duration and magnitude of heart rate increase. Unsupervised classification was subsequently applied to test the discriminative ability and validity of these features as a measure of responsiveness to VNS therapy. Results of applying this methodology to compare 105 pre-VNS-treatment and 107 post-VNS-treatment seizures revealed that seizures that were acutely stimulated using VNS had a reduced ictal spread as well as a reduced impact on cardiovascular function compared with those that occurred prior to any treatment. Furthermore, application of an unsupervised fuzzy c-means classifier to evaluate the ability of the combined EEG-ECG-based features to classify pre- and post-treatment seizures achieved a classification accuracy of 85.85%. These results indicate the importance of timely delivery of VNS to reduce seizure severity and thus help achieve better seizure control for patients with epilepsy. The proposed set of quantitative features could be used as potential biomarkers for predicting long-term response to VNS therapy.
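
    As an illustration of the unsupervised step described above, the sketch below implements a basic fuzzy c-means clustering routine in NumPy and applies it to a per-seizure feature matrix. It is a generic textbook formulation, not the authors' exact classifier; the fuzzifier m, the tolerance, and the variable names are illustrative assumptions.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters=2, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Minimal fuzzy c-means: returns cluster centers and the membership matrix U."""
    rng = np.random.default_rng(seed)
    n = len(X)
    U = rng.dirichlet(np.ones(n_clusters), size=n)       # memberships sum to 1 per sample
    for _ in range(n_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]   # fuzzy-weighted means
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U_new = 1.0 / (d ** (2 / (m - 1)))               # standard FCM membership update
        U_new /= U_new.sum(axis=1, keepdims=True)
        if np.max(np.abs(U_new - U)) < tol:
            U = U_new
            break
        U = U_new
    return centers, U

# usage (illustrative) on a matrix of per-seizure EEG-ECG features (rows = seizures):
# centers, U = fuzzy_c_means(features, n_clusters=2)
# labels = U.argmax(axis=1)   # pre- vs post-treatment cluster assignment
```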

  17. An Optimization-Based Method for Feature Ranking in Nonlinear Regression Problems.

    PubMed

    Bravi, Luca; Piccialli, Veronica; Sciandrone, Marco

    2017-04-01

    In this paper, we consider the feature ranking problem, where, given a set of training instances, the task is to associate a score with the features in order to assess their relevance. Feature ranking is a very important tool for decision support systems, and may be used as an auxiliary step of feature selection to reduce the high dimensionality of real-world data. We focus on regression problems by assuming that the process underlying the generated data can be approximated by a continuous function (for instance, a feedforward neural network). We formally state the notion of relevance of a feature by introducing a minimum zero-norm inversion problem of a neural network, which is a nonsmooth, constrained optimization problem. We employ a concave approximation of the zero-norm function, and we define a smooth, global optimization problem to be solved in order to assess the relevance of the features. We present the new feature ranking method based on the solution of instances of the global optimization problem depending on the available training data. Computational experiments on both artificial and real data sets are performed, and point out that the proposed feature ranking method is a valid alternative to existing methods in terms of effectiveness. The obtained results also show that the method is costly in terms of CPU time, and this may be a limitation in the solution of large-dimensional problems.

  18. Margin-maximizing feature elimination methods for linear and nonlinear kernel-based discriminant functions.

    PubMed

    Aksu, Yaman; Miller, David J; Kesidis, George; Yang, Qing X

    2010-05-01

    Feature selection for classification in high-dimensional spaces can improve generalization, reduce classifier complexity, and identify important, discriminating feature "markers." For support vector machine (SVM) classification, a widely used technique is recursive feature elimination (RFE). We demonstrate that RFE is not consistent with margin maximization, central to the SVM learning approach. We thus propose explicit margin-based feature elimination (MFE) for SVMs and demonstrate both improved margin and improved generalization, compared with RFE. Moreover, for the case of a nonlinear kernel, we show that RFE assumes that the squared weight vector 2-norm is strictly decreasing as features are eliminated. We demonstrate this is not true for the Gaussian kernel and, consequently, RFE may give poor results in this case. MFE for nonlinear kernels gives better margin and generalization. We also present an extension which achieves further margin gains, by optimizing only two degrees of freedom--the hyperplane's intercept and its squared 2-norm--with the weight vector orientation fixed. We finally introduce an extension that allows margin slackness. We compare against several alternatives, including RFE and a linear programming method that embeds feature selection within the classifier design. On high-dimensional gene microarray data sets, University of California at Irvine (UCI) repository data sets, and Alzheimer's disease brain image data, MFE methods give promising results.

  19. Convolutional neural network approach for enhanced capture of breast parenchymal complexity patterns associated with breast cancer risk

    NASA Astrophysics Data System (ADS)

    Oustimov, Andrew; Gastounioti, Aimilia; Hsieh, Meng-Kang; Pantalone, Lauren; Conant, Emily F.; Kontos, Despina

    2017-03-01

    We assess the feasibility of a parenchymal texture feature fusion approach, utilizing a convolutional neural network (ConvNet) architecture, to benefit breast cancer risk assessment. We hypothesize that, by capturing sparse, subtle interactions between localized motifs present in two-dimensional texture feature maps derived from mammographic images, a multitude of texture feature descriptors can be optimally reduced to five meta-features capable of serving as a basis on which a linear classifier, such as logistic regression, can efficiently assess breast cancer risk. We combine this methodology with our previously validated lattice-based strategy for parenchymal texture analysis and evaluate the feasibility of this approach in a case-control study with 424 digital mammograms. In a randomized split-sample setting, we optimize our framework in training/validation sets (N=300) and evaluate its discriminatory performance in an independent test set (N=124). The discriminatory capacity is assessed in terms of the area under the curve (AUC) of the receiver operating characteristic (ROC). The resulting meta-features exhibited strong classification capability in the test dataset (AUC = 0.90), outperforming conventional, non-fused texture analysis, which previously resulted in an AUC of 0.85 on the same case-control dataset. Our results suggest that informative interactions between localized motifs exist and can be extracted and summarized via a fairly simple ConvNet architecture.

  20. Improving the Accuracy and Training Speed of Motor Imagery Brain-Computer Interfaces Using Wavelet-Based Combined Feature Vectors and Gaussian Mixture Model-Supervectors.

    PubMed

    Lee, David; Park, Sang-Hoon; Lee, Sang-Goog

    2017-10-07

    In this paper, we propose a set of wavelet-based combined feature vectors and a Gaussian mixture model (GMM)-supervector to enhance training speed and classification accuracy in motor imagery brain-computer interfaces. The proposed method is configured as follows: first, wavelet transforms are applied to extract the feature vectors for identification of motor imagery electroencephalography (EEG) and principal component analyses are used to reduce the dimensionality of the feature vectors and linearly combine them. Subsequently, the GMM universal background model is trained by the expectation-maximization (EM) algorithm to purify the training data and reduce its size. Finally, a purified and reduced GMM-supervector is used to train the support vector machine classifier. The performance of the proposed method was evaluated for three different motor imagery datasets in terms of accuracy, kappa, mutual information, and computation time, and compared with the state-of-the-art algorithms. The results from the study indicate that the proposed method achieves high accuracy with a small amount of training data compared with the state-of-the-art algorithms in motor imagery EEG classification.

  1. Proportional plus integral MIMO controller for regulation and tracking with anti-wind-up features

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Puleston, P.F.; Mantz, R.J.

    1993-11-01

    A proportional plus integral matrix control structure for MIMO systems is proposed. Based on a standard optimal control structure with integral action, it permits a greater degree of independence of the design and tuning of the regulating and tracking features, without considerably increasing the controller complexity. Fast recovery from load disturbances is achieved, while large overshoots associated with set-point changes and reset wind-up problems can be reduced. A simple effective procedure for practical tuning is introduced.
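
    A scalar (single-loop) sketch of the back-calculation anti-wind-up idea is shown below; the paper addresses the MIMO case, so this is only meant to illustrate how the integrator is kept from winding up when the actuator saturates. The plant model, gains, and limits are arbitrary assumptions.

```python
import numpy as np

def simulate_pi_antiwindup(kp=2.0, ki=1.0, kb=1.0, u_min=-1.0, u_max=1.0,
                           setpoint=1.0, dt=0.01, steps=2000):
    """Discrete PI loop on a first-order plant with back-calculation anti-wind-up.
    When the actuator saturates, the integrator is driven back by kb*(u_sat - u)."""
    y, integ = 0.0, 0.0
    out = []
    for _ in range(steps):
        e = setpoint - y
        u = kp * e + ki * integ
        u_sat = np.clip(u, u_min, u_max)              # actuator limits
        integ += dt * (e + kb * (u_sat - u))          # back-calculation anti-wind-up term
        y += dt * (-y + u_sat)                        # assumed plant: dy/dt = -y + u
        out.append(y)
    return np.array(out)

# usage: response = simulate_pi_antiwindup(); large set-point changes no longer
# produce the overshoot caused by integrator wind-up during saturation.
```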

  2. Feature generation using genetic programming with application to fault classification.

    PubMed

    Guo, Hong; Jack, Lindsay B; Nandi, Asoke K

    2005-02-01

    One of the major challenges in pattern recognition problems is the feature extraction process, which derives new features from existing features, or directly from raw data, in order to reduce the cost of computation during classification while improving classifier efficiency. Most current feature extraction techniques transform the original pattern vector into a new vector with increased discrimination capability but lower dimensionality. This is conducted within a predefined feature space and thus has limited searching power. Genetic programming (GP) can generate new features from the original dataset without prior knowledge of the probabilistic distribution. In this paper, a GP-based approach is developed for feature extraction from raw vibration data recorded from a rotating machine with six different conditions. The created features are then used as the inputs to a neural classifier for the identification of six bearing conditions. Experimental results demonstrate the ability of GP to discover automatically the different bearing conditions using features expressed in the form of nonlinear functions. Furthermore, four sets of results--using GP-extracted features with artificial neural networks (ANN) and support vector machines (SVM), as well as traditional features with ANN and SVM--have been obtained. This GP-based approach is used for bearing fault classification for the first time and exhibits superior searching power over other techniques. Additionally, it significantly reduces the computation time compared with a genetic algorithm (GA), and therefore makes for a more practical realization of the solution.

  3. Self-organizing neural networks--an alternative way of cluster analysis in clinical chemistry.

    PubMed

    Reibnegger, G; Wachter, H

    1996-04-15

    Supervised learning schemes have been employed by several workers for training neural networks designed to solve clinical problems. We demonstrate that unsupervised techniques can also produce interesting and meaningful results. Using a data set on the chemical composition of milk from 22 different mammals, we demonstrate that self-organizing feature maps (Kohonen networks) as well as a modified version of error backpropagation technique yield results mimicking conventional cluster analysis. Both techniques are able to project a potentially multi-dimensional input vector onto a two-dimensional space whereby neighborhood relationships remain conserved. Thus, these techniques can be used for reducing dimensionality of complicated data sets and for enhancing comprehensibility of features hidden in the data matrix.
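
    The sketch below is a minimal Kohonen self-organizing feature map in NumPy, illustrating how a multi-dimensional input vector can be projected onto a two-dimensional grid while conserving neighborhood relationships. The grid size and the learning-rate and neighborhood schedules are illustrative assumptions, not the settings used in the paper.

```python
import numpy as np

def train_som(X, grid=(6, 6), n_iter=2000, lr0=0.5, sigma0=2.0, seed=0):
    """Minimal Kohonen self-organizing map: projects feature vectors onto a 2-D
    grid while preserving neighborhood relationships."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    W = rng.normal(size=(rows, cols, X.shape[1]))        # codebook vectors
    coords = np.dstack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"))
    for t in range(n_iter):
        x = X[rng.integers(len(X))]                      # random training sample
        lr = lr0 * np.exp(-t / n_iter)                   # decaying learning rate
        sigma = sigma0 * np.exp(-t / n_iter)             # shrinking neighborhood
        d = np.linalg.norm(W - x, axis=2)
        bmu = np.unravel_index(d.argmin(), d.shape)      # best-matching unit
        h = np.exp(-np.sum((coords - np.array(bmu)) ** 2, axis=2) / (2 * sigma ** 2))
        W += lr * h[..., None] * (x - W)                 # pull neighborhood toward x
    return W

# usage (illustrative): W = train_som(standardized_milk_composition)
# each sample then maps to the grid cell of its best-matching unit.
```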

  4. Late-summer sea ice segmentation with multi-polarisation SAR features in C and X band

    NASA Astrophysics Data System (ADS)

    Fors, Ane S.; Brekke, Camilla; Doulgeris, Anthony P.; Eltoft, Torbjørn; Renner, Angelika H. H.; Gerland, Sebastian

    2016-02-01

    In this study, we investigate the potential of sea ice segmentation by C- and X-band multi-polarisation synthetic aperture radar (SAR) features during late summer. Five high-resolution satellite SAR scenes were recorded in the Fram Strait covering iceberg-fast first-year and old sea ice during a week with air temperatures varying around 0 °C. Sea ice thickness, surface roughness and aerial photographs were collected during a helicopter flight at the site. Six polarimetric SAR features were extracted for each of the scenes. The ability of the individual SAR features to discriminate between sea ice types and their temporal consistency were examined. All SAR features were found to add value to sea ice type discrimination. Relative kurtosis, geometric brightness, cross-polarisation ratio and co-polarisation correlation angle were found to be temporally consistent in the investigated period, while co-polarisation ratio and co-polarisation correlation magnitude were found to be temporally inconsistent. An automatic feature-based segmentation algorithm was tested both for a full SAR feature set and for a reduced SAR feature set limited to temporally consistent features. In C band, the algorithm produced a good late-summer sea ice segmentation, separating the scenes into segments that could be associated with different sea ice types in the next step. The X-band performance was slightly poorer. Excluding temporally inconsistent SAR features improved the segmentation in one of the X-band scenes.

  5. Portable automatic text classification for adverse drug reaction detection via multi-corpus training.

    PubMed

    Sarker, Abeed; Gonzalez, Graciela

    2015-02-01

    Automatic detection of adverse drug reaction (ADR) mentions from text has recently received significant interest in pharmacovigilance research. Current research focuses on various sources of text-based information, including social media-where enormous amounts of user posted data is available, which have the potential for use in pharmacovigilance if collected and filtered accurately. The aims of this study are: (i) to explore natural language processing (NLP) approaches for generating useful features from text, and utilizing them in optimized machine learning algorithms for automatic classification of ADR assertive text segments; (ii) to present two data sets that we prepared for the task of ADR detection from user posted internet data; and (iii) to investigate if combining training data from distinct corpora can improve automatic classification accuracies. One of our three data sets contains annotated sentences from clinical reports, and the two other data sets, built in-house, consist of annotated posts from social media. Our text classification approach relies on generating a large set of features, representing semantic properties (e.g., sentiment, polarity, and topic), from short text nuggets. Importantly, using our expanded feature sets, we combine training data from different corpora in attempts to boost classification accuracies. Our feature-rich classification approach performs significantly better than previously published approaches with ADR class F-scores of 0.812 (previously reported best: 0.770), 0.538 and 0.678 for the three data sets. Combining training data from multiple compatible corpora further improves the ADR F-scores for the in-house data sets to 0.597 (improvement of 5.9 units) and 0.704 (improvement of 2.6 units) respectively. Our research results indicate that using advanced NLP techniques for generating information rich features from text can significantly improve classification accuracies over existing benchmarks. Our experiments illustrate the benefits of incorporating various semantic features such as topics, concepts, sentiments, and polarities. Finally, we show that integration of information from compatible corpora can significantly improve classification performance. This form of multi-corpus training may be particularly useful in cases where data sets are heavily imbalanced (e.g., social media data), and may reduce the time and costs associated with the annotation of data in the future. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.

  6. Portable Automatic Text Classification for Adverse Drug Reaction Detection via Multi-corpus Training

    PubMed Central

    Gonzalez, Graciela

    2014-01-01

    Objective Automatic detection of Adverse Drug Reaction (ADR) mentions from text has recently received significant interest in pharmacovigilance research. Current research focuses on various sources of text-based information, including social media — where enormous amounts of user posted data is available, which have the potential for use in pharmacovigilance if collected and filtered accurately. The aims of this study are: (i) to explore natural language processing approaches for generating useful features from text, and utilizing them in optimized machine learning algorithms for automatic classification of ADR assertive text segments; (ii) to present two data sets that we prepared for the task of ADR detection from user posted internet data; and (iii) to investigate if combining training data from distinct corpora can improve automatic classification accuracies. Methods One of our three data sets contains annotated sentences from clinical reports, and the two other data sets, built in-house, consist of annotated posts from social media. Our text classification approach relies on generating a large set of features, representing semantic properties (e.g., sentiment, polarity, and topic), from short text nuggets. Importantly, using our expanded feature sets, we combine training data from different corpora in attempts to boost classification accuracies. Results Our feature-rich classification approach performs significantly better than previously published approaches with ADR class F-scores of 0.812 (previously reported best: 0.770), 0.538 and 0.678 for the three data sets. Combining training data from multiple compatible corpora further improves the ADR F-scores for the in-house data sets to 0.597 (improvement of 5.9 units) and 0.704 (improvement of 2.6 units) respectively. Conclusions Our research results indicate that using advanced NLP techniques for generating information rich features from text can significantly improve classification accuracies over existing benchmarks. Our experiments illustrate the benefits of incorporating various semantic features such as topics, concepts, sentiments, and polarities. Finally, we show that integration of information from compatible corpora can significantly improve classification performance. This form of multi-corpus training may be particularly useful in cases where data sets are heavily imbalanced (e.g., social media data), and may reduce the time and costs associated with the annotation of data in the future. PMID:25451103

  7. An accelerated training method for back propagation networks

    NASA Technical Reports Server (NTRS)

    Shelton, Robert O. (Inventor)

    1993-01-01

    The principal objective is to provide a training procedure for a feed-forward, back-propagation neural network which greatly accelerates the training process. A set of orthogonal singular vectors is determined from the input matrix such that the standard deviations of the projections of the input vectors along these singular vectors, as a set, are substantially maximized, thus providing an optimal means of presenting the input data. The novelty lies in the method of extracting, from the set of input data, a set of features which can represent the input data in a simplified manner, thus greatly reducing the time and expense of training the system.
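
    A possible reading of the projection step is sketched below: compute orthogonal singular vectors of the centered input matrix and project the inputs onto the leading ones, so that the projections have maximal standard deviation. This is essentially PCA via SVD used as a preprocessing step for the network; the function name and the choice of k are assumptions.

```python
import numpy as np

def svd_projection(X, k=None):
    """Project inputs onto the orthogonal singular vectors along which the
    projections have maximal standard deviation (a PCA-style preprocessing step
    intended to simplify the inputs before back-propagation training)."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    k = k or X.shape[1]
    return Xc @ Vt[:k].T, Vt[:k]          # reduced inputs and the projection basis

# usage (illustrative): Z, basis = svd_projection(X_train, k=10); feed Z to the
# network, and transform new data with (X_new - X_train.mean(0)) @ basis.T
```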

  8. CAFÉ-Map: Context Aware Feature Mapping for mining high dimensional biomedical data.

    PubMed

    Minhas, Fayyaz Ul Amir Afsar; Asif, Amina; Arif, Muhammad

    2016-12-01

    Feature selection and ranking is of great importance in the analysis of biomedical data. In addition to reducing the number of features used in classification or other machine learning tasks, it allows us to extract meaningful biological and medical information from a machine learning model. Most existing approaches in this domain do not directly model the fact that the relative importance of features can be different in different regions of the feature space. In this work, we present a context aware feature ranking algorithm called CAFÉ-Map. CAFÉ-Map is a locally linear feature ranking framework that allows recognition of important features in any given region of the feature space or for any individual example. This allows for simultaneous classification and feature ranking in an interpretable manner. We have benchmarked CAFÉ-Map on a number of toy and real world biomedical data sets. Our comparative study with a number of published methods shows that CAFÉ-Map achieves better accuracies on these data sets. The top ranking features obtained through CAFÉ-Map in a gene profiling study correlate very well with the importance of different genes reported in the literature. Furthermore, CAFÉ-Map provides a more in-depth analysis of feature ranking at the level of individual examples. CAFÉ-Map Python code is available at: http://faculty.pieas.edu.pk/fayyaz/software.html#cafemap . The CAFÉ-Map package supports parallelization and sparse data and provides example scripts for classification. This code can be used to reconstruct the results given in this paper. Copyright © 2016 Elsevier Ltd. All rights reserved.

  9. Variable Selection for Road Segmentation in Aerial Images

    NASA Astrophysics Data System (ADS)

    Warnke, S.; Bulatov, D.

    2017-05-01

    For extraction of road pixels from combined image and elevation data, Wegner et al. (2015) proposed classification of superpixels into road and non-road, after which a refinement of the classification results using minimum cost paths and non-local optimization methods took place. We believed that the variable set used for classification was to a certain extent suboptimal, because many variables were redundant while several features known to be useful in photogrammetry and remote sensing were missing. This motivated us to implement a variable selection approach which builds a model for classification using portions of training data and subsets of features, evaluates this model, updates the feature set, and terminates when a stopping criterion is satisfied. The choice of classifier is flexible; however, we tested the approach with logistic regression and random forests, and tailored the evaluation module to the chosen classifier. To guarantee a fair comparison, we kept the segment-based approach and most of the variables from the related work, but extended them with additional, mostly higher-level features. Applying these superior features, removing the redundant ones, and using more accurately acquired 3D data allowed us to keep the misclassification error stable, or even to reduce it, on a challenging dataset.
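
    The sketch below shows a generic wrapper-style variable selection loop of the kind described above: build a model on a feature subset, evaluate it by cross-validation, update the subset, and stop when no candidate improves the score. It uses a random forest as the flexible classifier; it is not the authors' exact procedure, and the stopping rule and parameters are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def greedy_forward_selection(X, y, max_features=10, cv=5, seed=0):
    """Wrapper-style variable selection: repeatedly add the feature that most
    improves cross-validated accuracy; stop when no feature helps."""
    selected, remaining = [], list(range(X.shape[1]))
    best_score = -np.inf
    while remaining and len(selected) < max_features:
        scores = []
        for f in remaining:
            clf = RandomForestClassifier(n_estimators=100, random_state=seed)
            s = cross_val_score(clf, X[:, selected + [f]], y, cv=cv).mean()
            scores.append((s, f))
        s_best, f_best = max(scores)
        if s_best <= best_score:           # stopping criterion: no improvement
            break
        best_score, selected = s_best, selected + [f_best]
        remaining.remove(f_best)
    return selected, best_score

# usage (illustrative): idx, score = greedy_forward_selection(segment_features, labels)
```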

  10. iFER: facial expression recognition using automatically selected geometric eye and eyebrow features

    NASA Astrophysics Data System (ADS)

    Oztel, Ismail; Yolcu, Gozde; Oz, Cemil; Kazan, Serap; Bunyak, Filiz

    2018-03-01

    Facial expressions play an important role in interpersonal communication and in the estimation of emotional states or intentions. Automatic recognition of facial expressions has led to many practical applications and has become one of the important topics in computer vision. We present a facial expression recognition system that relies on geometry-based features extracted from the eye and eyebrow regions of the face. The proposed system detects keypoints on frontal face images and forms a feature set using geometric relationships among groups of detected keypoints. The obtained feature set is refined and reduced using the sequential forward selection (SFS) algorithm and fed to a support vector machine classifier to recognize five facial expression classes. The proposed system, iFER (eye-eyebrow only facial expression recognition), is robust to lower-face occlusions that may be caused by beards, mustaches, scarves, etc., and to lower-face motion during speech production. Preliminary experiments on benchmark datasets produced promising results, outperforming previous facial expression recognition studies that use partial face features, and results comparable to studies using whole-face information, only slightly lower (by ~2.5%) than the best whole-face system while using only ~1/3 of the facial region.

  11. Derivation and validation of different machine-learning models in mortality prediction of trauma in motorcycle riders: a cross-sectional retrospective study in southern Taiwan

    PubMed Central

    Kuo, Pao-Jen; Wu, Shao-Chun; Chien, Peng-Chen; Rau, Cheng-Shyuan; Chen, Yi-Chun; Hsieh, Hsiao-Yun; Hsieh, Ching-Hua

    2018-01-01

    Objectives This study aimed to build and test the models of machine learning (ML) to predict the mortality of hospitalised motorcycle riders. Setting The study was conducted in a level-1 trauma centre in southern Taiwan. Participants Motorcycle riders who were hospitalised between January 2009 and December 2015 were classified into a training set (n=6306) and test set (n=946). Using the demographic information, injury characteristics and laboratory data of patients, logistic regression (LR), support vector machine (SVM) and decision tree (DT) analyses were performed to determine the mortality of individual motorcycle riders, under different conditions, using all samples or reduced samples, as well as all variables or selected features in the algorithm. Primary and secondary outcome measures The predictive performance of the model was evaluated based on accuracy, sensitivity, specificity and geometric mean, and an analysis of the area under the receiver operating characteristic curves of the two different models was carried out. Results In the training set, both LR and SVM had a significantly higher area under the receiver operating characteristic curve (AUC) than DT. No significant difference was observed in the AUC of LR and SVM, regardless of whether all samples or reduced samples and whether all variables or selected features were used. In the test set, the performance of the SVM model for all samples with selected features was better than that of all other models, with an accuracy of 98.73%, sensitivity of 86.96%, specificity of 99.02%, geometric mean of 92.79% and AUC of 0.9517, in mortality prediction. Conclusion ML can provide a feasible level of accuracy in predicting the mortality of motorcycle riders. Integration of the ML model, particularly the SVM algorithm in the trauma system, may help identify high-risk patients and, therefore, guide appropriate interventions by the clinical staff. PMID:29306885

  12. A data-driven feature extraction framework for predicting the severity of condition of congestive heart failure patients.

    PubMed

    Sideris, Costas; Alshurafa, Nabil; Pourhomayoun, Mohammad; Shahmohammadi, Farhad; Samy, Lauren; Sarrafzadeh, Majid

    2015-01-01

    In this paper, we propose a novel methodology for utilizing disease diagnostic information to predict severity of condition for Congestive Heart Failure (CHF) patients. Our methodology relies on a novel, clustering-based feature extraction framework using disease diagnostic information. To reduce the dimensionality, we identify disease clusters using co-occurrence frequencies. We then utilize these clusters as features to predict patient severity of condition. We build our clustering and feature extraction algorithm using the 2012 National Inpatient Sample (NIS), Healthcare Cost and Utilization Project (HCUP), which contains 7 million discharge records and ICD-9-CM codes. The proposed framework is tested on Ronald Reagan UCLA Medical Center Electronic Health Records (EHR) from 3041 patients. We compare our cluster-based feature set with another that incorporates the Charlson comorbidity score as a feature and demonstrate an accuracy improvement of up to 14% in the predictability of the severity of condition.

  13. A Meta-Analytic Review of School-Based Prevention for Cannabis Use

    ERIC Educational Resources Information Center

    Porath-Waller, Amy J.; Beasley, Erin; Beirness, Douglas J.

    2010-01-01

    This investigation used meta-analytic techniques to evaluate the effectiveness of school-based prevention programming in reducing cannabis use among youth aged 12 to 19. It summarized the results from 15 studies published in peer-reviewed journals since 1999 and identified features that influenced program effectiveness. The results from the set of…

  14. Detection of epileptiform activity in EEG signals based on time-frequency and non-linear analysis

    PubMed Central

    Gajic, Dragoljub; Djurovic, Zeljko; Gligorijevic, Jovan; Di Gennaro, Stefano; Savic-Gajic, Ivana

    2015-01-01

    We present a new technique for detection of epileptiform activity in EEG signals. After preprocessing of EEG signals we extract representative features in time, frequency and time-frequency domain as well as using non-linear analysis. The features are extracted in a few frequency sub-bands of clinical interest since these sub-bands showed much better discriminatory characteristics compared with the whole frequency band. Then we optimally reduce the dimension of feature space to two using scatter matrices. A decision about the presence of epileptiform activity in EEG signals is made by quadratic classifiers designed in the reduced two-dimensional feature space. The accuracy of the technique was tested on three sets of electroencephalographic (EEG) signals recorded at the University Hospital Bonn: surface EEG signals from healthy volunteers, intracranial EEG signals from the epilepsy patients during the seizure free interval from within the seizure focus and intracranial EEG signals of epileptic seizures also from within the seizure focus. An overall detection accuracy of 98.7% was achieved. PMID:25852534
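
    The reduction to a two-dimensional feature space using scatter matrices can be illustrated with a Fisher-criterion projection, as sketched below, after which a quadratic classifier is trained in the reduced space. This is a generic formulation with assumed variable names, not the authors' code.

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

def scatter_matrix_projection(X, y, dim=2):
    """Project features to `dim` dimensions by maximizing between-class versus
    within-class scatter (Fisher criterion); a quadratic classifier can then be
    trained in the reduced space."""
    classes = np.unique(y)
    mean = X.mean(axis=0)
    Sw = np.zeros((X.shape[1], X.shape[1]))   # within-class scatter
    Sb = np.zeros_like(Sw)                    # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        Sb += len(Xc) * np.outer(mc - mean, mc - mean)
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(eigvals.real)[::-1]
    W = eigvecs[:, order[:dim]].real
    return X @ W, W

# usage (illustrative):
# Z, W = scatter_matrix_projection(eeg_features, labels, dim=2)
# qda = QuadraticDiscriminantAnalysis().fit(Z, labels)
```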

  15. Recognising discourse causality triggers in the biomedical domain.

    PubMed

    Mihăilă, Claudiu; Ananiadou, Sophia

    2013-12-01

    Current domain-specific information extraction systems represent an important resource for biomedical researchers, who need to process vast amounts of knowledge in a short time. Automatic discourse causality recognition can further reduce their workload by suggesting possible causal connections and aiding in the curation of pathway models. We describe here an approach to the automatic identification of discourse causality triggers in the biomedical domain using machine learning. We create several baselines and experiment with and compare various parameter settings for three algorithms, i.e. Conditional Random Fields (CRF), Support Vector Machines (SVM) and Random Forests (RF). We also evaluate the impact of lexical, syntactic, and semantic features on each of the algorithms, showing that semantics improves the performance in all cases. We test our comprehensive feature set on two corpora containing gold standard annotations of causal relations, and demonstrate the need for more gold standard data. The best performance of 79.35% F-score is achieved by CRFs when using all three feature types.

  16. Palm vein recognition based on directional empirical mode decomposition

    NASA Astrophysics Data System (ADS)

    Lee, Jen-Chun; Chang, Chien-Ping; Chen, Wei-Kuei

    2014-04-01

    Directional empirical mode decomposition (DEMD) has recently been proposed to make empirical mode decomposition suitable for texture analysis. Using DEMD, samples are decomposed into a series of images, referred to as two-dimensional intrinsic mode functions (2-D IMFs), from fine to large scale. A DEMD-based two-directional linear discriminant analysis (2DLDA) method for palm vein recognition is proposed. The proposed method progresses through three steps: (i) a set of 2-D IMF features of various scales and orientations is extracted using DEMD, (ii) the 2DLDA method is then applied to reduce the dimensionality of the feature space in both the row and column directions, and (iii) a nearest neighbor classifier is used for classification. We also propose two strategies for using the set of 2-D IMF features: ensemble DEMD vein representation (EDVR) and multichannel DEMD vein representation (MDVR). In experiments using palm vein databases, the proposed MDVR-based 2DLDA method achieved a recognition accuracy of 99.73%, thereby demonstrating its feasibility for palm vein recognition.

  17. Separable spectro-temporal Gabor filter bank features: Reducing the complexity of robust features for automatic speech recognition.

    PubMed

    Schädler, Marc René; Kollmeier, Birger

    2015-04-01

    To test if simultaneous spectral and temporal processing is required to extract robust features for automatic speech recognition (ASR), the robust spectro-temporal two-dimensional Gabor filter bank (GBFB) front-end from Schädler, Meyer, and Kollmeier [J. Acoust. Soc. Am. 131, 4134-4151 (2012)] was decomposed into a spectral one-dimensional Gabor filter bank and a temporal one-dimensional Gabor filter bank. A feature set extracted with these separate spectral and temporal modulation filter banks, the separate Gabor filter bank (SGBFB) features, was introduced and evaluated on the CHiME (Computational Hearing in Multisource Environments) keywords-in-noise recognition task. From the perspective of robust ASR, the results showed that spectral and temporal processing can be performed independently and are not required to interact with each other. Using SGBFB features permitted the signal-to-noise ratio (SNR) to be lowered by 1.2 dB while still performing as well as the GBFB-based reference system, which corresponds to a relative improvement of the word error rate by 12.8%. Additionally, the real-time factor of the spectro-temporal processing could be reduced by more than an order of magnitude. Compared to human listeners, the SNR needed to be 13 dB higher when using Mel-frequency cepstral coefficient features, 11 dB higher when using GBFB features, and 9 dB higher when using SGBFB features to achieve the same recognition performance.
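
    A rough sketch of separate spectral and temporal 1-D Gabor filtering of a log-mel spectrogram is given below. The filter shapes, modulation frequencies, and window widths are illustrative assumptions and do not reproduce the published GBFB/SGBFB parameterization.

```python
import numpy as np
from scipy.ndimage import convolve1d

def gabor_1d(length, omega, sigma):
    """Real part of a 1-D Gabor (Gaussian-windowed cosine) filter."""
    t = np.arange(length) - length // 2
    g = np.exp(-0.5 * (t / sigma) ** 2) * np.cos(omega * t)
    return g / (np.abs(g).sum() + 1e-12)

def separable_gabor_features(log_mel, spec_omegas=(0.25, 0.5), temp_omegas=(0.1, 0.2),
                             length=15, sigma=3.0):
    """Filter a (time, mel-band) log-mel spectrogram with separate spectral and
    temporal 1-D Gabor filters and stack the outputs as frame-wise features."""
    maps = []
    for w in spec_omegas:       # spectral modulation filters (along the frequency axis)
        maps.append(convolve1d(log_mel, gabor_1d(length, w, sigma), axis=1, mode="nearest"))
    for w in temp_omegas:       # temporal modulation filters (along the time axis)
        maps.append(convolve1d(log_mel, gabor_1d(length, w, sigma), axis=0, mode="nearest"))
    return np.concatenate(maps, axis=1)

# usage (illustrative): log-mel input with shape (n_frames, n_mel_bands)
# feats = separable_gabor_features(np.random.randn(200, 23))
```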

  18. Deep Multimodal Distance Metric Learning Using Click Constraints for Image Ranking.

    PubMed

    Yu, Jun; Yang, Xiaokang; Gao, Fei; Tao, Dacheng

    2017-12-01

    How do we retrieve images accurately? And how do we rank a group of images precisely and efficiently for specific queries? These problems are critical for researchers and engineers aiming to build a novel image search engine. First, it is important to obtain an appropriate description that effectively represents the images. In this paper, multimodal features are considered for describing images. The images' unique properties are reflected by visual features, which are correlated with each other. However, semantic gaps always exist between images' visual features and their semantics. Therefore, we utilize click features to reduce the semantic gap. The second key issue is learning an appropriate distance metric to combine these multimodal features. This paper develops a novel deep multimodal distance metric learning (Deep-MDML) method. A structured ranking model is adopted to utilize both visual and click features in distance metric learning (DML). Specifically, images and their related ranking results are first collected to form the training set. Multimodal features, including click and visual features, are collected with these images. Next, a group of autoencoders is applied to obtain an initial distance metric in different visual spaces, and an MDML method is used to assign optimal weights to the different modalities. We then conduct alternating optimization to train the ranking model, which is used to rank new queries with click features. Compared with existing image ranking methods, the proposed method adopts a new ranking model that uses multimodal features, including click and visual features, in DML. We conducted experiments to analyze the proposed Deep-MDML on two benchmark data sets, and the results validate the effectiveness of the method.

  19. Multi-source feature extraction and target recognition in wireless sensor networks based on adaptive distributed wavelet compression algorithms

    NASA Astrophysics Data System (ADS)

    Hortos, William S.

    2008-04-01

    Proposed distributed wavelet-based algorithms are a means to compress sensor data received at the nodes forming a wireless sensor network (WSN) by exchanging information between neighboring sensor nodes. Local collaboration among nodes compacts the measurements, yielding a reduced fused set with equivalent information at far fewer nodes. Nodes may be equipped with multiple sensor types, each capable of sensing distinct phenomena: thermal, humidity, chemical, voltage, or image signals with low or no frequency content as well as audio, seismic or video signals within defined frequency ranges. Compression of the multi-source data through wavelet-based methods, distributed at active nodes, reduces downstream processing and storage requirements along the paths to sink nodes; it also enables noise suppression and more energy-efficient query routing within the WSN. Targets are first detected by the multiple sensors; then wavelet compression and data fusion are applied to the target returns, followed by feature extraction from the reduced data; feature data are input to target recognition/classification routines; targets are tracked during their sojourns through the area monitored by the WSN. Algorithms to perform these tasks are implemented in a distributed manner, based on a partition of the WSN into clusters of nodes. In this work, a scheme of collaborative processing is applied for hierarchical data aggregation and decorrelation, based on the sensor data itself and any redundant information, enabled by a distributed, in-cluster wavelet transform with lifting that allows multiple levels of resolution. The wavelet-based compression algorithm significantly decreases RF bandwidth and other resource use in target processing tasks. Following wavelet compression, features are extracted. The objective of feature extraction is to maximize the probabilities of correct target classification based on multi-source sensor measurements, while minimizing the resource expenditures at participating nodes. Therefore, the feature-extraction method based on the Haar DWT is presented that employs a maximum-entropy measure to determine significant wavelet coefficients. Features are formed by calculating the energy of coefficients grouped around the competing clusters. A DWT-based feature extraction algorithm used for vehicle classification in WSNs can be enhanced by an added rule for selecting the optimal number of resolution levels to improve the correct classification rate and reduce energy consumption expended in local algorithm computations. Published field trial data for vehicular ground targets, measured with multiple sensor types, are used to evaluate the wavelet-assisted algorithms. Extracted features are used in established target recognition routines, e.g., the Bayesian minimum-error-rate classifier, to compare the effects on the classification performance of the wavelet compression. Simulations of feature sets and recognition routines at different resolution levels in target scenarios indicate the impact on classification rates, while formulas are provided to estimate reduction in resource use due to distributed compression.
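
    A loose sketch of the Haar-DWT feature extraction idea is given below: decompose a sensor window with the Haar wavelet, weight the sub-bands by an entropy measure, and keep the energies of the most informative sub-bands. The grouping of coefficients around competing clusters described in the text is not reproduced; sub-band selection is used here as a simplified stand-in, and all parameters are assumptions.

```python
import numpy as np
import pywt

def haar_energy_features(signal, level=4, top_k=3):
    """Haar-DWT feature extraction: keep the sub-bands whose normalized
    coefficient energies contribute the most entropy, and return their energies."""
    coeffs = pywt.wavedec(signal, "haar", level=level)   # approximation + detail sub-bands
    energies = np.array([np.sum(c ** 2) for c in coeffs])
    p = energies / (energies.sum() + 1e-12)
    entropy = -(p * np.log(p + 1e-12))                   # per-sub-band entropy contribution
    keep = np.argsort(entropy)[::-1][:top_k]             # most informative sub-bands
    return energies[keep]

# usage (illustrative): one feature vector per sensor window / target return
# feats = np.vstack([haar_energy_features(x) for x in seismic_windows])
```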

  20. Geminal-spanning orbitals make explicitly correlated reduced-scaling coupled-cluster methods robust, yet simple

    NASA Astrophysics Data System (ADS)

    Pavošević, Fabijan; Neese, Frank; Valeev, Edward F.

    2014-08-01

    We present a production implementation of reduced-scaling explicitly correlated (F12) coupled-cluster singles and doubles (CCSD) method based on pair-natural orbitals (PNOs). A key feature is the reformulation of the explicitly correlated terms using geminal-spanning orbitals that greatly reduce the truncation errors of the F12 contribution. For the standard S66 benchmark of weak intermolecular interactions, the cc-pVDZ-F12 PNO CCSD F12 interaction energies reproduce the complete basis set CCSD limit with mean absolute error <0.1 kcal/mol, and at a greatly reduced cost compared to the conventional CCSD F12.

  1. Unsupervised Spatio-Temporal Data Mining Framework for Burned Area Mapping

    NASA Technical Reports Server (NTRS)

    Kumar, Vipin (Inventor); Boriah, Shyam (Inventor); Mithal, Varun (Inventor); Khandelwal, Ankush (Inventor)

    2016-01-01

    A method reduces processing time required to identify locations burned by fire by receiving a feature value for each pixel in an image, each pixel representing a sub-area of a location. Pixels are then grouped based on similarities of the feature values to form candidate burn events. For each candidate burn event, a probability that the candidate burn event is a true burn event is determined based on at least one further feature value for each pixel in the candidate burn event. Candidate burn events that have a probability below a threshold are removed from further consideration as burn events to produce a set of remaining candidate burn events.
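
    A simplified stand-in for the described pipeline is sketched below: threshold a burn-sensitive feature map, group spatially connected pixels into candidate burn events, and keep only candidates whose event-level probability passes a second threshold. The thresholds, the connected-component grouping, and the mean-probability rule are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import label

def candidate_burn_events(score_map, prob_map, score_thresh=0.6, prob_thresh=0.5):
    """Group connected pixels whose burn-sensitive feature exceeds a threshold
    into candidate events, then keep only candidates whose mean event-level
    probability passes a second threshold."""
    mask = score_map > score_thresh
    labels, n = label(mask)                       # connected-component grouping
    kept = np.zeros_like(labels)
    for k in range(1, n + 1):
        pixels = labels == k
        if prob_map[pixels].mean() >= prob_thresh:
            kept[pixels] = k                      # retained candidate burn event
    return kept

# usage (illustrative): events = candidate_burn_events(feature_image, per_pixel_prob)
```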

  2. An Interval Type-2 Neural Fuzzy System for Online System Identification and Feature Elimination.

    PubMed

    Lin, Chin-Teng; Pal, Nikhil R; Wu, Shang-Lin; Liu, Yu-Ting; Lin, Yang-Yin

    2015-07-01

    We propose an integrated mechanism for discarding derogatory features and extracting fuzzy rules based on an interval type-2 neural fuzzy system (NFS); in fact, it is a more general scheme that can discard bad features, irrelevant antecedent clauses, and even irrelevant rules. High-dimensional input variables and a large number of rules not only increase the computational complexity of NFSs but also reduce their interpretability. Therefore, a mechanism for simultaneously extracting fuzzy rules and reducing the impact of (or eliminating) inferior features is necessary. The proposed approach, namely an interval type-2 Neural Fuzzy System for online System Identification and Feature Elimination (IT2NFS-SIFE), uses type-2 fuzzy sets to model uncertainties associated with information and data in designing the knowledge base. The consequent part of the IT2NFS-SIFE is of Takagi-Sugeno-Kang type with interval weights. The IT2NFS-SIFE possesses a self-evolving property that can automatically generate fuzzy rules. Poor features can be discarded through the concept of a membership modulator. The antecedent and modulator weights are learned using a gradient descent algorithm. The consequent-part weights are tuned via the rule-ordered Kalman filter algorithm to enhance learning effectiveness. Simulation results show that IT2NFS-SIFE not only simplifies the system architecture by eliminating derogatory/irrelevant antecedent clauses, rules, and features but also maintains excellent performance.

  3. An AIS-Based E-mail Classification Method

    NASA Astrophysics Data System (ADS)

    Qing, Jinjian; Mao, Ruilong; Bie, Rongfang; Gao, Xiao-Zhi

    This paper proposes a new e-mail classification method based on the Artificial Immune System (AIS), which is endowed with good diversity and self-adaptive ability by using the immune learning, immune memory, and immune recognition. In our method, the features of spam and non-spam extracted from the training sets are combined together, and the number of false positives (non-spam messages that are incorrectly classified as spam) can be reduced. The experimental results demonstrate that this method is effective in reducing the false rate.

  4. A method for feature selection of APT samples based on entropy

    NASA Astrophysics Data System (ADS)

    Du, Zhenyu; Li, Yihong; Hu, Jinsong

    2018-05-01

    By studying known APT attack events in depth, this paper proposes a feature selection method for APT samples and a logic expression generation algorithm, IOCG (Indicator of Compromise Generate). The algorithm can automatically generate machine-readable IOCs (Indicators of Compromise), addressing the limitations of existing IOCs, whose logical relationships are fixed, whose number of logical items does not change, which are large in scale, and which cannot be generated automatically from samples. At the same time, it can reduce the time wasted on redundant and useless APT sample processing, improve the sharing rate of analysis information, and support an active response to a complex and volatile APT attack situation. The samples were divided into an experimental set and a training set, and the algorithm was then used to generate logical expressions for the training set with the IOC_Aware plug-in; the generated expressions were compared both in terms of the expressions themselves and of their detection results. The experimental results show that the algorithm is effective and can improve detection performance.

  5. PDC-SGB: Prediction of effective drug combinations using a stochastic gradient boosting algorithm.

    PubMed

    Xu, Qian; Xiong, Yi; Dai, Hao; Kumari, Kotni Meena; Xu, Qin; Ou, Hong-Yu; Wei, Dong-Qing

    2017-03-21

    Combinatorial therapy is a promising strategy for combating complex diseases by improving efficacy and reducing side effects. To facilitate the identification of drug combinations in pharmacology, we propose a new computational model, termed PDC-SGB, to predict effective drug combinations by integrating biological, chemical and pharmacological information based on a stochastic gradient boosting algorithm. To begin with, a set of 352 golden positive samples was collected from the public drug combination database. Then, a 732-dimensional feature vector incorporating biological, chemical and pharmaceutical information was constructed for each drug combination to describe its properties. To avoid overfitting, the maximum relevance & minimum redundancy (mRMR) method was applied to extract useful features by removing redundant ones. Based on the selected features, three different types of classification algorithms were employed to build drug combination prediction models. Our results demonstrate that the model based on the stochastic gradient boosting algorithm yields the best performance. Furthermore, the feature patterns of therapy have a powerful ability to discriminate effective drug combinations from non-effective ones. By analyzing various features, we show that enriched features occurring frequently in the golden positive samples can help predict novel drug combinations. Copyright © 2017 Elsevier Ltd. All rights reserved.
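
    The sketch below illustrates a pipeline in the spirit of PDC-SGB: a mutual-information filter (used here as a simple stand-in for mRMR) followed by a stochastic gradient boosting classifier. Feature counts, hyperparameters, and variable names are assumptions, not the published settings.

```python
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Reduce the 732-dimensional drug-combination descriptors with a mutual-information
# filter (a simple stand-in for mRMR), then train a stochastic gradient boosting
# classifier (subsample < 1.0 is what makes the boosting stochastic).
model = make_pipeline(
    SelectKBest(mutual_info_classif, k=100),
    GradientBoostingClassifier(n_estimators=300, learning_rate=0.05, subsample=0.8),
)

# usage (illustrative): X is the (n_combinations, 732) feature matrix and
# y labels effective (1) vs non-effective (0) combinations
# print(cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean())
```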

  6. A flexible data-driven comorbidity feature extraction framework.

    PubMed

    Sideris, Costas; Pourhomayoun, Mohammad; Kalantarian, Haik; Sarrafzadeh, Majid

    2016-06-01

    Disease and symptom diagnostic codes are a valuable resource for classifying and predicting patient outcomes. In this paper, we propose a novel methodology for utilizing disease diagnostic information in a predictive machine learning framework. Our methodology relies on a novel, clustering-based feature extraction framework using disease diagnostic information. To reduce the data dimensionality, we identify disease clusters using co-occurrence statistics. We optimize the number of generated clusters in the training set and then utilize these clusters as features to predict patient severity of condition and patient readmission risk. We build our clustering and feature extraction algorithm using the 2012 National Inpatient Sample (NIS), Healthcare Cost and Utilization Project (HCUP) which contains 7 million hospital discharge records and ICD-9-CM codes. The proposed framework is tested on Ronald Reagan UCLA Medical Center Electronic Health Records (EHR) from 3041 Congestive Heart Failure (CHF) patients and the UCI 130-US diabetes dataset that includes admissions from 69,980 diabetic patients. We compare our cluster-based feature set with the commonly used comorbidity frameworks including Charlson's index, Elixhauser's comorbidities and their variations. The proposed approach was shown to have significant gains between 10.7-22.1% in predictive accuracy for CHF severity of condition prediction and 4.65-5.75% in diabetes readmission prediction. Copyright © 2016 Elsevier Ltd. All rights reserved.

  7. A fast algorithm for identifying friends-of-friends halos

    NASA Astrophysics Data System (ADS)

    Feng, Y.; Modi, C.

    2017-07-01

    We describe a simple and fast algorithm for identifying friends-of-friends features and prove its correctness. The algorithm avoids unnecessary expensive neighbor queries, uses minimal memory overhead, and rejects slowdown in high over-density regions. We define our algorithm formally based on pair enumeration, a problem that has been heavily studied in fast 2-point correlation codes, and our reference implementation employs a dual KD-tree correlation function code. We construct features in a hierarchical tree structure, and use a splay operation to reduce the average cost of identifying the root of a feature from O[log L] to O[1] (L is the size of a feature) without additional memory costs. This reduces the overall time complexity of merging trees from O[L log L] to O[L], reducing the number of operations per splay by orders of magnitude. We next introduce a pruning operation that skips merge operations between two fully self-connected KD-tree nodes. This improves the robustness of the algorithm, reducing the number of merge operations in high density peaks from O[δ²] to O[δ]. We show that for a cosmological data set the algorithm eliminates more than half of the merge operations for typically used linking lengths b ∼ 0.2 (relative to the mean separation). Furthermore, our algorithm is extremely simple and easy to implement on top of an existing pair enumeration code, reusing the optimization effort that has been invested in fast correlation function codes.
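
    The sketch below implements the basic friends-of-friends grouping on top of a KD-tree pair enumeration, with union-find path compression standing in for the splay-based root finding described above (both reduce the cost of locating a feature's root). It is illustrative only and omits the pruning of fully self-connected nodes.

```python
import numpy as np
from scipy.spatial import cKDTree

def friends_of_friends(positions, linking_length):
    """Group points into friends-of-friends features: enumerate pairs closer
    than the linking length with a KD-tree, then merge them with a union-find
    structure (path compression plays the role of the splay step here)."""
    parent = np.arange(len(positions))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]      # path compression
            i = parent[i]
        return i

    tree = cKDTree(positions)
    for i, j in tree.query_pairs(linking_length):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj                    # merge the two features
    return np.array([find(i) for i in range(len(positions))])   # group labels

# usage (illustrative): labels = friends_of_friends(particle_xyz, 0.2 * mean_separation)
```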

  8. Large Margin Multi-Modal Multi-Task Feature Extraction for Image Classification.

    PubMed

    Yong Luo; Yonggang Wen; Dacheng Tao; Jie Gui; Chao Xu

    2016-01-01

    The features used in many image analysis-based applications are frequently of very high dimension. Feature extraction offers several advantages in high-dimensional cases, and many recent studies have used multi-task feature extraction approaches, which often outperform single-task feature extraction approaches. However, most of these methods are limited in that they only consider data represented by a single type of feature, even though features usually represent images from multiple modalities. We, therefore, propose a novel large margin multi-modal multi-task feature extraction (LM3FE) framework for handling multi-modal features for image classification. In particular, LM3FE simultaneously learns the feature extraction matrix for each modality and the modality combination coefficients. In this way, LM3FE not only handles correlated and noisy features, but also utilizes the complementarity of different modalities to further help reduce feature redundancy in each modality. The large margin principle employed also helps to extract strongly predictive features, so that they are more suitable for prediction (e.g., classification). An alternating algorithm is developed for problem optimization, and each subproblem can be efficiently solved. Experiments on two challenging real-world image data sets demonstrate the effectiveness and superiority of the proposed method.

  9. Operator for object recognition and scene analysis by estimation of set occupancy with noisy and incomplete data sets

    NASA Astrophysics Data System (ADS)

    Rees, S. J.; Jones, Bryan F.

    1992-11-01

    Once feature extraction has occurred in a processed image, the recognition problem becomes one of defining a set of features which maps sufficiently well onto one of the defined shape/object models to permit a claimed recognition. This process is usually handled by aggregating features until a large enough weighting is obtained to claim membership, or an adequate number of located features are matched to the reference set. A requirement has existed for an operator or measure capable of a more direct assessment of membership/occupancy between feature sets, particularly where the feature sets may be defective representations. Such feature set errors may be caused by noise, by overlapping of objects, and by partial obscuration of features. These problems occur at the point of acquisition: repairing the data would then assume a priori knowledge of the solution. The technique described in this paper offers a set theoretical measure for partial occupancy defined in terms of the set of minimum additions to permit full occupancy and the set of locations of occupancy if such additions are made. As is shown, this technique permits recognition of partial feature sets with quantifiable degrees of uncertainty. A solution to the problems of obscuration and overlapping is therefore available.

  10. Sequential structural damage diagnosis algorithm using a change point detection method

    NASA Astrophysics Data System (ADS)

    Noh, H.; Rajagopal, R.; Kiremidjian, A. S.

    2013-11-01

    This paper introduces a damage diagnosis algorithm for civil structures that uses a sequential change point detection method. The general change point detection method uses the known pre- and post-damage feature distributions to perform a sequential hypothesis test. In practice, however, the post-damage distribution is unlikely to be known a priori, unless we are looking for a known specific type of damage. Therefore, we introduce an additional algorithm that estimates and updates this distribution as data are collected using the maximum likelihood and the Bayesian methods. We also applied an approximate method to reduce the computation load and memory requirement associated with the estimation. The algorithm is validated using a set of experimental data collected from a four-story steel special moment-resisting frame and multiple sets of simulated data. Various features of different dimensions have been explored, and the algorithm was able to identify damage, particularly when it uses multidimensional damage sensitive features and lower false alarm rates, with a known post-damage feature distribution. For unknown feature distribution cases, the post-damage distribution was consistently estimated and the detection delays were only a few time steps longer than the delays from the general method that assumes we know the post-damage feature distribution. We confirmed that the Bayesian method is particularly efficient in declaring damage with minimal memory requirement, but the maximum likelihood method provides an insightful heuristic approach.
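
    The sequential test can be illustrated with a one-sided CUSUM on a scalar damage-sensitive feature, as sketched below, assuming Gaussian pre- and post-damage feature distributions. The threshold and the Gaussian assumption are illustrative; the paper additionally estimates the unknown post-damage distribution online, which is only noted in a comment here.

```python
import numpy as np
from scipy.stats import norm

def sequential_detector(stream, pre_params, post_params, threshold=10.0):
    """CUSUM-style sequential test on a scalar damage-sensitive feature.
    pre_params / post_params are (mean, std) of the pre- and post-damage
    feature distributions; damage is declared when the cumulative
    log-likelihood ratio exceeds the threshold."""
    mu0, s0 = pre_params
    mu1, s1 = post_params
    S = 0.0
    for t, x in enumerate(stream):
        llr = norm.logpdf(x, mu1, s1) - norm.logpdf(x, mu0, s0)
        S = max(0.0, S + llr)                 # one-sided CUSUM: reset at zero
        if S > threshold:
            return t                          # detection time step
    return None

# In practice the post-damage parameters are unknown a priori; they can be
# re-estimated from incoming data (e.g., by maximum likelihood or a Bayesian
# update) and the test repeated, which is the approach taken in the paper.
```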

  11. Face Alignment via Regressing Local Binary Features.

    PubMed

    Ren, Shaoqing; Cao, Xudong; Wei, Yichen; Sun, Jian

    2016-03-01

This paper presents a highly efficient and accurate regression approach for face alignment. Our approach has two novel components: 1) a set of local binary features and 2) a locality principle for learning those features. The locality principle guides us to learn a set of highly discriminative local binary features for each facial landmark independently. The obtained local binary features are used to jointly learn a linear regression for the final output. This approach achieves state-of-the-art results when tested on the most challenging benchmarks to date. Furthermore, because extracting and regressing local binary features is computationally very cheap, our system is much faster than previous methods. It achieves over 3000 frames per second (FPS) on a desktop or 300 FPS on a mobile phone for locating a few dozen landmarks. We also study a key issue that has received little attention in previous research: the face detector used to initialize alignment. We investigate several face detectors and quantitatively evaluate how they affect alignment accuracy. We find that an alignment-friendly detector can greatly boost the accuracy of our alignment method, reducing the error by up to 16% relative. To facilitate practical use of face detection/alignment methods, we also propose a convenient metric to measure how well suited a detector is for alignment initialization.

  12. Classification of health webpages as expert and non expert with a reduced set of cross-language features.

    PubMed

    Grabar, Natalia; Krivine, Sonia; Jaulent, Marie-Christine

    2007-10-11

Making the distinction between expert and non-expert health documents can help users select the information that is most suitable for them, according to whether or not they are familiar with medical terminology. This issue is particularly important for information retrieval. In our work we address it through stylistic corpus analysis and the application of machine learning algorithms. Our hypothesis is that this distinction can be made on the basis of a small number of features and that such features can be language and domain independent. The features were acquired from a source corpus (Russian language, diabetes topic) and then tested on both the source corpus and a target corpus (French language, pneumology topic). These cross-language features show 90% precision and 93% recall for non-expert documents in the source language, and 85% precision and 74% recall for expert documents in the target language.

  13. Modelling assistive technology adoption for people with dementia.

    PubMed

    Chaurasia, Priyanka; McClean, Sally I; Nugent, Chris D; Cleland, Ian; Zhang, Shuai; Donnelly, Mark P; Scotney, Bryan W; Sanders, Chelsea; Smith, Ken; Norton, Maria C; Tschanz, JoAnn

    2016-10-01

Assistive technologies have been identified as a potential solution for the provision of elderly care. Such technologies have, in general, the capacity to enhance the quality of life and increase the level of independence of their users. Nevertheless, the acceptance of these technologies is crucial to their success. Generally speaking, the elderly are not well disposed towards technology and have limited experience with it; these factors limit its widespread acceptance. It is therefore important to evaluate the potential success of technologies prior to their deployment. The research described in this paper builds upon our previous work on modelling adoption of assistive technology, in the form of cognitive prosthetics such as reminder apps, and aims to identify a refined subset of features that offers improved accuracy in predicting technology adoption. Consequently, in this paper, an adoption model is built using a set of features extracted from a user's background to minimise the likelihood of non-adoption. The work is based on analysis of data from the Cache County Study on Memory and Aging (CCSMA), with 31 features covering age, gender, education and details of health condition. In the process of modelling adoption, feature selection and feature reduction are carried out, followed by identification of the best classification models. With the reduced set of labelled features, the technology adoption model achieved an average prediction accuracy of 92.48% when tested on 173 participants. We conclude that modelling user adoption from a range of parameters, such as physical, environmental and social perspectives, is beneficial in recommending a technology to a particular user based on their profile. Copyright © 2016 Elsevier Inc. All rights reserved.

  14. Analysis Of The IJCNN 2011 UTL Challenge

    DTIC Science & Technology

    2012-01-13

    large datasets from various application domains: handwriting recognition, image recognition, video processing, text processing, and ecology. The goal...validation and final evaluation sets consist of 4096 examples each. Dataset Domain Features Sparsity Devel. Transf. AVICENNA Handwriting 120 0% 150205...documents [3]. Transfer learning methods could accelerate the application of handwriting recognizers to historical manuscript by reducing the need for

  15. Resilience: An Entry Point for African Health Promoting Schools?

    ERIC Educational Resources Information Center

    Stewart, Donald

    2014-01-01

    Purpose: The purpose of this paper is to provide a review of an Australian health promoting schools (HPS) project to identify key features of the concept of resilience and how it can be used in a school setting to develop and strengthen protective factors in young people, as a mechanism for improving social functioning and reducing involvement in…

  16. Analysis of geometric moments as features for firearm identification.

    PubMed

    Md Ghani, Nor Azura; Liong, Choong-Yeun; Jemain, Abdul Aziz

    2010-05-20

The task of identifying firearms from forensic ballistics specimens has been a demanding part of crime investigation for the last two decades. Every firearm, regardless of its size, make and model, has its own unique 'fingerprint'. These fingerprints are transferred to the bullet and cartridge case when the firearm is fired. The components involved in producing these unique characteristics are the firing chamber, breech face, firing pin, ejector, extractor and the rifling of the barrel. These unique characteristics are the critical features in identifying firearms; they allow investigators to determine which particular firearm fired the bullet. Traditionally, the comparison of ballistic evidence has been a tedious and time-consuming process requiring highly skilled examiners. Therefore, the main objective of this study is the extraction and identification of suitable features from firing pin impressions in cartridge case images for firearm recognition. Previous studies have shown that the firing pin impression of the cartridge case is one of the most important characteristics used for identifying an individual firearm. In this study, data were gathered from 747 cartridge case images captured from five different pistols of type 9mm Parabellum Vektor SP1, made in South Africa. All the cartridge case images were then segmented into three regions, forming three different sets of images: the firing pin impression image, the centre of the firing pin impression image and the ring of the firing pin impression image. Geometric moments up to the sixth order were then generated from each part of the images to form a set of numerical features. These 48 features were found to be significantly different using the MANOVA test. This high-dimensional feature set was then reduced to only 11 significant features using correlation analysis. Classification results using cross-validation under discriminant analysis show that 96.7% of the images were classified correctly. These results demonstrate the value of the geometric moments technique for producing a set of numerical features on which the identification of firearms is based.
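
    As an illustration of the moment-based features described above, the sketch below computes raw geometric moments up to the sixth order for a grey-level image patch. The exact moment definition (raw, central or scaled) and the three-region segmentation used in the study are not reproduced here, so treat this as a generic example.

```python
import numpy as np

def geometric_moments(image, max_order=6):
    """Raw geometric moments m_pq = sum_x sum_y x^p * y^q * I(x, y)
    for all p + q <= max_order."""
    h, w = image.shape
    y, x = np.mgrid[0:h, 0:w].astype(float)
    moments = {}
    for p in range(max_order + 1):
        for q in range(max_order + 1 - p):
            moments[(p, q)] = float(np.sum((x ** p) * (y ** q) * image))
    return moments

# usage on a synthetic stand-in for a firing pin impression patch
patch = np.zeros((64, 64))
patch[20:44, 20:44] = 1.0
feats = geometric_moments(patch)
print(len(feats), feats[(0, 0)])   # 28 moments up to order 6; m_00 is the total mass
```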

  17. On the use of feature selection to improve the detection of sea oil spills in SAR images

    NASA Astrophysics Data System (ADS)

    Mera, David; Bolon-Canedo, Veronica; Cotos, J. M.; Alonso-Betanzos, Amparo

    2017-03-01

Fast and effective oil spill detection systems are crucial to ensure a proper response to environmental emergencies caused by hydrocarbon pollution on the ocean's surface. Typically, these systems uncover not only oil spills but also a high number of look-alikes. Feature extraction is a critical and computationally intensive phase in which each detected dark spot is examined independently. Traditionally, detection systems use an arbitrary set of features to discriminate between oil spills and look-alike phenomena. However, Feature Selection (FS) methods based on Machine Learning (ML) have proved very useful in real domains for enhancing the generalization capabilities of classifiers while discarding irrelevant features. In this work, we present a generic and systematic approach, based on FS methods, for choosing a concise and relevant set of features to improve oil spill detection systems. We compared five FS methods: Correlation-based Feature Selection (CFS), the Consistency-based filter, Information Gain, ReliefF and Recursive Feature Elimination for Support Vector Machines (SVM-RFE). They were applied to a 141-input vector composed of features from a collection of outstanding studies. Selected features were validated via a Support Vector Machine (SVM) classifier and the results were compared with previous works. Test experiments revealed that the classifier trained with the 6-input feature vector proposed by SVM-RFE achieved the best accuracy and Cohen's kappa coefficient (87.1% and 74.06%, respectively). This is a smaller feature combination with similar or even better classification accuracy than previous works. This finding allows the feature extraction phase to be sped up without reducing classifier accuracy. Experiments also confirmed the significance of the geometrical features, since 75.0% of the features selected by the applied FS methods, as well as 66.67% of the proposed 6-input feature vector, belong to this category.
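
    The SVM-RFE step can be sketched with scikit-learn as follows. The synthetic 141-feature data, the linear SVM used for the elimination ranking, the RBF-SVM used for validation and all hyperparameters are illustrative assumptions rather than the study's exact configuration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC

# synthetic stand-in for the 141-feature dark-spot vectors (hypothetical data)
X, y = make_classification(n_samples=500, n_features=141, n_informative=10,
                           random_state=0)

# SVM-RFE: recursively drop the features with the smallest |w| of a linear SVM
rfe = RFE(estimator=LinearSVC(dual=False, max_iter=5000),
          n_features_to_select=6, step=5)

# validate the reduced 6-feature vector with an SVM classifier
model = make_pipeline(StandardScaler(), rfe, SVC(kernel="rbf"))
print(cross_val_score(model, X, y, cv=5).mean())
```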

  18. Reduced multiple empirical kernel learning machine.

    PubMed

    Wang, Zhe; Lu, MingZhe; Gao, Daqi

    2015-02-01

Multiple kernel learning (MKL) has been demonstrated to be flexible and effective for depicting heterogeneous data sources, since MKL can introduce multiple kernels rather than a single fixed kernel into applications. However, MKL incurs high time and space complexity in contrast to single kernel learning, which is undesirable in real-world applications. Meanwhile, the kernel mappings of MKL generally take two forms, implicit kernel mapping and empirical kernel mapping (EKM), of which the latter has attracted less attention. In this paper, we focus on MKL with EKM and propose a reduced multiple empirical kernel learning machine, named RMEKLM for short. To the best of our knowledge, it is the first method to reduce both the time and space complexity of MKL with EKM. Unlike existing MKL, the proposed RMEKLM adopts the Gauss elimination technique to extract a set of feature vectors, and it is validated that doing so loses little information from the original feature space. RMEKLM then uses the extracted feature vectors to span a reduced orthonormal subspace of the feature space, which is visualized in terms of its geometric structure. It can be demonstrated that the spanned subspace is isomorphic to the original feature space, meaning that the dot product of two vectors in the original feature space equals that of the two corresponding vectors in the generated orthonormal subspace. More importantly, the proposed RMEKLM requires simpler computation and less storage space, especially during testing. Finally, the experimental results show that RMEKLM is efficient and effective in terms of both complexity and classification. The contributions of this paper are as follows: (1) by mapping the input space into an orthonormal subspace, the geometry of the generated subspace is visualized; (2) this paper is the first to reduce both the time and space complexity of EKM-based MKL; (3) this paper adopts Gauss elimination, an off-the-shelf technique, to generate a basis of the original feature space, which is stable and efficient.
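
    To make the empirical kernel mapping concrete, the sketch below embeds samples through the eigendecomposition of a single RBF kernel matrix and checks that dot products are preserved, which mirrors the isomorphism property mentioned above. The rank-r truncation stands in for the paper's Gauss-elimination-based basis extraction, and the kernel choice and parameters are assumptions.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def empirical_kernel_map(X_train, X, r=None, gamma=0.5):
    """Empirical kernel mapping sketch: embed samples X via
    phi(x) = Lambda^{-1/2} U^T [k(x, x_1), ..., k(x, x_n)]^T.
    A reduced basis is obtained by truncating the eigendecomposition of the
    kernel matrix to rank r; the paper instead extracts a basis of the
    feature space by Gauss elimination."""
    K = rbf_kernel(X_train, X_train, gamma=gamma)
    w, U = np.linalg.eigh(K)
    order = np.argsort(w)[::-1]
    keep = order[: (r if r is not None else int(np.sum(w > 1e-10)))]
    return rbf_kernel(X, X_train, gamma=gamma) @ U[:, keep] @ np.diag(1.0 / np.sqrt(w[keep]))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Z = empirical_kernel_map(X, X)                            # full-rank empirical feature space
print(np.allclose(Z @ Z.T, rbf_kernel(X, X, gamma=0.5)))  # dot products preserved -> True
Z20 = empirical_kernel_map(X, X, r=20)                    # reduced 20-dimensional subspace
```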

  19. Automatic Identification of Messages Related to Adverse Drug Reactions from Online User Reviews using Feature-based Classification.

    PubMed

    Liu, Jingfang; Zhang, Pengzhu; Lu, Yingjie

    2014-11-01

User-generated medical messages on the Internet contain extensive information related to adverse drug reactions (ADRs) and are known to be valuable resources for post-marketing drug surveillance. The aim of this study was to find an effective method to automatically identify ADR-related messages in online user reviews. We conducted experiments on online user reviews using different feature sets and different classification techniques. First, messages from three communities (an allergy community, a schizophrenia community and a pain management community) were collected, and 3000 messages were annotated. Second, an n-gram-based feature set and a medical domain-specific feature set were generated. Third, three classification techniques, SVM, C4.5 and Naïve Bayes, were used to perform the classification tasks separately. Finally, we evaluated the performance of each combination of feature set and classification technique using accuracy and F-measure. In terms of accuracy, the SVM classifier exceeded 0.8, whereas the C4.5 and Naïve Bayes classifiers stayed below 0.8; meanwhile, the combined feature set (n-gram-based plus domain-specific features) consistently outperformed either single feature set. In terms of F-measure, the highest value, 0.895, was achieved using the combined feature sets and an SVM classifier. Overall, combining both feature sets with an SVM classifier gives the best classification performance and provides an effective method for automatically identifying ADR-related messages in online user reviews.
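
    A minimal text-classification sketch in the spirit of the n-gram/SVM configuration is shown below. The tiny corpus, the TF-IDF weighting and the omission of the medical domain-specific feature set are simplifications for brevity, not details from the study.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# hypothetical mini-corpus standing in for the annotated community messages
messages = ["the new drug gave me a terrible rash and nausea",
            "great community, thanks for all the support",
            "stopped the medication after severe dizziness",
            "meeting scheduled for next week"]
labels = [1, 0, 1, 0]   # 1 = ADR-related, 0 = not ADR-related

# unigram + bigram features feeding a linear SVM
clf = Pipeline([("ngrams", TfidfVectorizer(ngram_range=(1, 2))),
                ("svm", LinearSVC())])
clf.fit(messages, labels)
print(clf.predict(["felt dizzy and nauseous after the first dose"]))
```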

  20. Higher criticism thresholding: Optimal feature selection when useful features are rare and weak.

    PubMed

    Donoho, David; Jin, Jiashun

    2008-09-30

In important application fields today, such as genomics and proteomics, selecting a small subset of useful features is crucial for the success of Linear Classification Analysis. We study feature selection by thresholding of feature Z-scores and introduce a principle of threshold selection based on the notion of higher criticism (HC). For i = 1, 2, …, p, let $\pi_i$ denote the two-sided P-value associated with the i-th feature Z-score and $\pi_{(i)}$ denote the i-th order statistic of the collection of P-values. The HC threshold is the absolute Z-score corresponding to the P-value maximizing the HC objective $(i/p - \pi_{(i)})/\sqrt{(i/p)(1 - i/p)}$. We consider a rare/weak (RW) feature model, where the fraction of useful features is small and the useful features are each too weak to be of much use on their own. HC thresholding (HCT) has interesting behavior in this setting, with an intimate link between maximizing the HC objective and minimizing the error rate of the designed classifier, and very different behavior from popular threshold selection procedures such as false discovery rate thresholding (FDRT). In the most challenging RW settings, HCT uses an unconventionally low threshold; this keeps the missed-feature detection rate under better control than FDRT and yields a classifier with improved misclassification performance. Replacing cross-validated threshold selection in the popular Shrunken Centroid classifier with the computationally less expensive and simpler HCT reduces the variance of the selected threshold and the error rate of the constructed classifier. Results on standard real datasets and in asymptotic theory confirm the advantages of HCT.
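
    The thresholding rule itself is short enough to sketch. The snippet below evaluates the HC objective quoted above over the sorted P-values and returns the corresponding absolute Z-score as the threshold; restricting the search to the smallest 10% of P-values and the simulated rare/weak mixture are conventional choices assumed here, not details taken from the abstract.

```python
import numpy as np
from scipy.stats import norm

def hc_threshold(z_scores, alpha0=0.10):
    """Higher-criticism thresholding sketch: sort the two-sided P-values,
    maximize the HC objective over the smallest alpha0 fraction of them,
    and return the corresponding absolute Z-score as the threshold."""
    p = np.sort(2.0 * norm.sf(np.abs(z_scores)))   # sorted two-sided P-values
    n = len(p)
    k = max(1, int(alpha0 * n))                    # assumed search range
    i = np.arange(1, k + 1)
    hc = (i / n - p[:k]) / np.sqrt((i / n) * (1.0 - i / n))
    i_star = int(np.argmax(hc))
    return float(norm.isf(p[i_star] / 2.0))        # threshold on |Z|

rng = np.random.default_rng(1)
z = np.concatenate([rng.normal(0, 1, 950), rng.normal(2.5, 1, 50)])  # rare/weak useful features
t = hc_threshold(z)
print(t, int(np.sum(np.abs(z) > t)), "features selected")
```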

  1. Higher criticism thresholding: Optimal feature selection when useful features are rare and weak

    PubMed Central

    Donoho, David; Jin, Jiashun

    2008-01-01

In important application fields today, such as genomics and proteomics, selecting a small subset of useful features is crucial for the success of Linear Classification Analysis. We study feature selection by thresholding of feature Z-scores and introduce a principle of threshold selection based on the notion of higher criticism (HC). For i = 1, 2, …, p, let $\pi_i$ denote the two-sided P-value associated with the i-th feature Z-score and $\pi_{(i)}$ denote the i-th order statistic of the collection of P-values. The HC threshold is the absolute Z-score corresponding to the P-value maximizing the HC objective $(i/p - \pi_{(i)})/\sqrt{(i/p)(1 - i/p)}$. We consider a rare/weak (RW) feature model, where the fraction of useful features is small and the useful features are each too weak to be of much use on their own. HC thresholding (HCT) has interesting behavior in this setting, with an intimate link between maximizing the HC objective and minimizing the error rate of the designed classifier, and very different behavior from popular threshold selection procedures such as false discovery rate thresholding (FDRT). In the most challenging RW settings, HCT uses an unconventionally low threshold; this keeps the missed-feature detection rate under better control than FDRT and yields a classifier with improved misclassification performance. Replacing cross-validated threshold selection in the popular Shrunken Centroid classifier with the computationally less expensive and simpler HCT reduces the variance of the selected threshold and the error rate of the constructed classifier. Results on standard real datasets and in asymptotic theory confirm the advantages of HCT. PMID:18815365

  2. TU-G-204-05: The Effects of CT Acquisition and Reconstruction Conditions On Computed Texture Feature Values of Lung Lesions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lo, P; Young, S; Kim, G

    2015-06-15

Purpose: Texture features have been investigated as a biomarker of response and malignancy. Because these features reflect local differences in density, they may be influenced by acquisition and reconstruction parameters. The purpose of this study was to investigate the effects of radiation dose level and reconstruction method on features derived from lung lesions. Methods: With IRB approval, 33 lung tumor cases were identified from clinically indicated thoracic CT scans in which the raw projection (sinogram) data were available. Based on a previously published technique, noise was added to the raw data to simulate reduced-dose versions of each case at 25%, 10% and 3% of the original dose. Original and simulated reduced-dose projection data were reconstructed with conventional and two iterative-reconstruction settings, yielding 12 combinations of dose/reconstruction conditions. One lesion from each case was contoured. At the reference condition (full dose, conventional reconstruction), 17 lesions were randomly selected for repeat contouring (repeatability). For each lesion at each dose/reconstruction condition, 151 texture measures were calculated. A paired-differences approach was employed to compare feature variation from repeat contours at the reference condition to the variation observed in other dose/reconstruction conditions (reproducibility). The ratio of the standard deviation of the reproducibility to that of the repeatability was used as the variation measure for each feature. Results: The mean variation (standard deviation) across dose levels and kernels was significantly different, with a ratio of 2.24 (±5.85) across texture features (p=0.01). The mean variation (standard deviation) across dose levels with conventional reconstruction was also significantly different, at 2.30 (7.11) (p=0.025). The mean variation across reconstruction settings at the original dose showed a trend towards a difference, at 1.35 (2.60) among all features (p=0.09). Conclusion: Texture features varied considerably with variations in dose and reconstruction condition. Care should be taken to standardize these conditions when using texture as a quantitative feature. This effort was supported in part by a grant from the National Cancer Institute's Quantitative Imaging Network (QIN): U01 CA181156. The UCLA Department of Radiology has a Master Research Agreement with Siemens Healthcare; Dr. McNitt-Gray has previously received research support from Siemens Healthcare.

  3. Extracting features from protein sequences to improve deep extreme learning machine for protein fold recognition.

    PubMed

    Ibrahim, Wisam; Abadeh, Mohammad Saniee

    2017-05-21

Protein fold recognition is an important problem in bioinformatics for predicting the three-dimensional structure of a protein. One of the most challenging tasks in the protein fold recognition problem is the extraction of efficient features from amino-acid sequences to obtain better classifiers. In this paper, we have proposed six descriptors to extract features from protein sequences. These descriptors are applied in the first stage of a three-stage framework, PCA-DELM-LDA, to extract feature vectors from the amino-acid sequences. Principal Component Analysis (PCA) has been implemented to reduce the number of extracted features. The extracted feature vectors have been used with the original features to improve the performance of the Deep Extreme Learning Machine (DELM) in the second stage. Four new features have been extracted from the second stage and used in the third stage by Linear Discriminant Analysis (LDA) to classify the instances into 27 folds. The proposed framework is implemented on the independent and combined feature sets of the SCOP datasets. The experimental results show that the feature vectors extracted in the first stage could improve the performance of DELM in extracting new useful features in the second stage. Copyright © 2017 Elsevier Ltd. All rights reserved.

  4. A Transform-Based Feature Extraction Approach for Motor Imagery Tasks Classification

    PubMed Central

    Khorshidtalab, Aida; Mesbah, Mostefa; Salami, Momoh J. E.

    2015-01-01

In this paper, we present a new motor imagery classification method in the context of electroencephalography (EEG)-based brain–computer interfaces (BCI). This method uses a signal-dependent orthogonal transform, referred to as linear prediction singular value decomposition (LP-SVD), for feature extraction. The transform defines the mapping as the left singular vectors of the LP coefficient filter impulse response matrix. Using a logistic tree-based model classifier, the extracted features are classified into one of four motor imagery movements. The proposed approach was first benchmarked against two related state-of-the-art feature extraction approaches, namely discrete cosine transform (DCT) and adaptive autoregressive (AAR)-based methods. By achieving an accuracy of 67.35%, the LP-SVD approach outperformed the other approaches by large margins (25% compared with DCT and 6% compared with AAR-based methods). To further improve the discriminatory capability of the extracted features and reduce the computational complexity, we enlarged the extracted feature subset by incorporating two extra features, namely the Q- and Hotelling's $T^2$ statistics of the transformed EEG, and introduced a new EEG channel selection method. The performance of EEG classification based on the expanded feature set and channel selection method was compared with that of a number of state-of-the-art classification methods previously reported with the BCI IIIa competition data set. Our method came second with an average accuracy of 81.38%. PMID:27170898

  5. Reduced isothermal feature set for long wave infrared (LWIR) face recognition

    NASA Astrophysics Data System (ADS)

    Donoso, Ramiro; San Martín, Cesar; Hermosilla, Gabriel

    2017-06-01

In this paper, we introduce a new concept in the thermal face recognition area: isothermal features. These consist of a feature vector built from a thermal signature that depends on the emission of the person's skin and its temperature. A thermal signature is the appearance of the face to infrared sensors and is unique to each person. The infrared face is decomposed into isothermal regions that present the thermal features of the face. Each isothermal region is modeled as circles whose centers are pixels of the image, and the feature vector is composed of the maximum radius of the circles in each isothermal region. This feature vector corresponds to the thermal signature of a person. The face recognition process is built using a modification of the Expectation Maximization (EM) algorithm in conjunction with a proposed probabilistic index for the classification process. Results obtained using an infrared database are compared with typical state-of-the-art techniques and show better performance, especially in uncontrolled acquisition scenarios.

  6. Computer-Aided Breast Cancer Diagnosis with Optimal Feature Sets: Reduction Rules and Optimization Techniques.

    PubMed

    Mathieson, Luke; Mendes, Alexandre; Marsden, John; Pond, Jeffrey; Moscato, Pablo

    2017-01-01

This chapter introduces a new method for knowledge extraction from databases for the purpose of finding a discriminative set of features that is also a robust set for within-class classification. Our method is generic, and we introduce it here in the field of breast cancer diagnosis from digital mammography data. The mathematical formalism is based on a generalization of the k-Feature Set problem called the (α, β)-k-Feature Set problem, introduced by Cotta and Moscato (J Comput Syst Sci 67(4):686-690, 2003). The method proceeds in two steps: first, an optimal (α, β)-k-feature set of minimum cardinality is identified, and then a set of classification rules using these features is obtained. We obtain the (α, β)-k-feature set in two phases: first, a series of extremely powerful reduction techniques, which do not lose the optimal solution, is employed; second, a metaheuristic search is used to identify the remaining features to be considered or disregarded. Two algorithms were tested with a public-domain digital mammography dataset composed of 71 malignant and 75 benign cases. Based on the results provided by the algorithms, we obtain classification rules that employ only a subset of these features.

  7. Probabilistic hazard assessment for skin sensitization potency by dose–response modeling using feature elimination instead of quantitative structure–activity relationships

    PubMed Central

    McKim, James M.; Hartung, Thomas; Kleensang, Andre; Sá-Rocha, Vanessa

    2016-01-01

Supervised learning methods promise to improve integrated testing strategies (ITS), but must be adjusted to handle high dimensionality and dose–response data. ITS approaches are currently fueled by the increasing mechanistic understanding of adverse outcome pathways (AOP) and the development of tests reflecting these mechanisms. Simple approaches to combine skin sensitization data sets, such as weight of evidence, fail due to problems in information redundancy and high dimensionality. The problem is further amplified when potency information (dose/response) of hazards is to be estimated. Skin sensitization currently serves as the foster child for AOP and ITS development, as legislative pressures combined with a very good mechanistic understanding of contact dermatitis have led to test development and relatively large high-quality data sets. We curated such a data set and combined it with a recursive variable selection algorithm to evaluate the information available through in silico, in chemico and in vitro assays. Chemical similarity alone could not cluster chemicals' potency, and in vitro models consistently ranked high in recursive feature elimination. This allows reducing the number of tests included in an ITS. Next, we analyzed the data with a hidden Markov model that takes advantage of an intrinsic inter-relationship among the local lymph node assay classes, i.e. the monotonous connection between local lymph node assay and dose. The dose-informed random forest/hidden Markov model was superior to the dose-naive random forest model on all data sets. Although the balanced accuracy improvement may seem small, this obscures the actual improvement in misclassifications, as the dose-informed hidden Markov model strongly reduced "false-negatives" (i.e. extreme sensitizers classified as non-sensitizers) on all data sets. PMID:26046447

  8. Probabilistic hazard assessment for skin sensitization potency by dose-response modeling using feature elimination instead of quantitative structure-activity relationships.

    PubMed

    Luechtefeld, Thomas; Maertens, Alexandra; McKim, James M; Hartung, Thomas; Kleensang, Andre; Sá-Rocha, Vanessa

    2015-11-01

Supervised learning methods promise to improve integrated testing strategies (ITS), but must be adjusted to handle high dimensionality and dose-response data. ITS approaches are currently fueled by the increasing mechanistic understanding of adverse outcome pathways (AOP) and the development of tests reflecting these mechanisms. Simple approaches to combine skin sensitization data sets, such as weight of evidence, fail due to problems in information redundancy and high dimensionality. The problem is further amplified when potency information (dose/response) of hazards is to be estimated. Skin sensitization currently serves as the foster child for AOP and ITS development, as legislative pressures combined with a very good mechanistic understanding of contact dermatitis have led to test development and relatively large high-quality data sets. We curated such a data set and combined it with a recursive variable selection algorithm to evaluate the information available through in silico, in chemico and in vitro assays. Chemical similarity alone could not cluster chemicals' potency, and in vitro models consistently ranked high in recursive feature elimination. This allows reducing the number of tests included in an ITS. Next, we analyzed the data with a hidden Markov model that takes advantage of an intrinsic inter-relationship among the local lymph node assay classes, i.e. the monotonous connection between local lymph node assay and dose. The dose-informed random forest/hidden Markov model was superior to the dose-naive random forest model on all data sets. Although the balanced accuracy improvement may seem small, this obscures the actual improvement in misclassifications, as the dose-informed hidden Markov model strongly reduced "false-negatives" (i.e. extreme sensitizers classified as non-sensitizers) on all data sets. Copyright © 2015 John Wiley & Sons, Ltd.

  9. A mixture model-based approach to the clustering of microarray expression data.

    PubMed

    McLachlan, G J; Bean, R W; Peel, D

    2002-03-01

This paper introduces the software EMMIX-GENE, which has been developed for the specific purpose of a model-based approach to the clustering of microarray expression data, in particular of tissue samples on a very large number of genes. The latter is a nonstandard problem in parametric cluster analysis because the dimension of the feature space (the number of genes) is typically much greater than the number of tissues. A feasible approach is to first select a subset of the genes relevant for the clustering of the tissue samples by fitting mixtures of t distributions, ranking the genes in order of increasing size of the likelihood ratio statistic for the test of one versus two components in the mixture model. The imposition of a threshold on the likelihood ratio statistic, used in conjunction with a threshold on the size of a cluster, allows the selection of a relevant set of genes. However, even this reduced set of genes will usually be too large for a normal mixture model to be fitted directly to the tissues, and so mixtures of factor analyzers are used to effectively reduce the dimension of the feature space of genes. The usefulness of the EMMIX-GENE approach for the clustering of tissue samples is demonstrated on two well-known data sets on colon and leukaemia tissues. For both data sets, relevant subsets of the genes can be selected that reveal interesting clusterings of the tissues, either consistent with the external classification of the tissues or with background biological knowledge of these sets. EMMIX-GENE is available at http://www.maths.uq.edu.au/~gjm/emmix-gene/
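
    The gene-screening step (a one- versus two-component likelihood-ratio test per gene, followed by a threshold) can be sketched as follows. Gaussian mixtures from scikit-learn stand in for the t-mixtures fitted by EMMIX-GENE, and the simulated data and threshold value are purely illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def gene_lrt_statistics(expr):
    """For each gene, -2 log likelihood-ratio of a one- vs. two-component
    mixture fitted to its expression across tissues. Gaussian mixtures are
    used here as a simple stand-in for the t-mixtures of EMMIX-GENE."""
    n_tissues, n_genes = expr.shape
    lrt = np.empty(n_genes)
    for g in range(n_genes):
        x = expr[:, g].reshape(-1, 1)
        ll1 = GaussianMixture(n_components=1, random_state=0).fit(x).score(x)
        ll2 = GaussianMixture(n_components=2, n_init=3, random_state=0).fit(x).score(x)
        lrt[g] = -2.0 * n_tissues * (ll1 - ll2)
    return lrt

rng = np.random.default_rng(0)
expr = rng.normal(size=(40, 200))      # 40 tissues x 200 genes
expr[:20, :5] += 3.0                   # 5 genes separate two tissue groups
lrt = gene_lrt_statistics(expr)
print(np.where(lrt > 20.0)[0])         # threshold value is illustrative only
```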

  10. A Realistic Seizure Prediction Study Based on Multiclass SVM.

    PubMed

    Direito, Bruno; Teixeira, César A; Sales, Francisco; Castelo-Branco, Miguel; Dourado, António

    2017-05-01

A patient-specific algorithm for epileptic seizure prediction, based on multiclass support vector machines (SVM) and using multi-channel high-dimensional feature sets, is presented. The feature sets, combined with multiclass classification and post-processing schemes, aim at the generation of alarms and reduced influence of false positives. This study considers 216 patients from the European Epilepsy Database, and includes 185 patients with scalp EEG recordings and 31 with intracranial data. The strategy was tested over a total of 16,729.80 h of inter-ictal data, including 1206 seizures. We found an overall sensitivity of 38.47% and a false positive rate per hour of 0.20. The performance of the method achieved statistical significance in 24 patients (11% of the patients). Despite the encouraging results previously reported on specific datasets, prospective demonstration on long-term EEG recordings has been limited. Our study presents a prospective analysis of a large, heterogeneous, multicentric dataset. The statistical framework, based on conservative assumptions, reflects a realistic approach compared with constrained datasets and/or in-sample evaluations. Improving these results by defining an appropriate set of features able to better distinguish the pre-ictal and non-pre-ictal states, and hence minimize the effect of confounding variables, remains a key aspect.

  11. Comparing supervised learning techniques on the task of physical activity recognition.

    PubMed

    Dalton, A; OLaighin, G

    2013-01-01

The objective of this study was to compare the performance of base-level and meta-level classifiers on the task of physical activity recognition. Five wireless kinematic sensors were attached to each subject (n = 25) while they completed a range of basic physical activities in a controlled laboratory setting. Subjects were then asked to carry out similar self-annotated physical activities in a random order and in an unsupervised environment. A combination of time-domain and frequency-domain features was extracted from the sensor data, including the first four central moments, zero-crossing rate, average magnitude, sensor cross-correlation, sensor auto-correlation, spectral entropy and dominant frequency components. A reduced feature set was generated using a wrapper subset evaluation technique with a linear forward search, and this feature set was employed for classifier comparison. The meta-level classifier AdaBoostM1 with C4.5 Graft as its base-level classifier achieved an overall accuracy of 95%. Equal-sized datasets of subject-independent data and subject-dependent data were used to train this classifier, and high recognition rates could be achieved without the need for user-specific training. Furthermore, it was found that an accuracy of 88% could be achieved using data from the ankle and wrist sensors only.
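
    The kinds of time- and frequency-domain descriptors listed above can be computed per window as in the sketch below; the exact feature definitions, window length and sensor fusion used in the study are not specified here, so the feature set shown is an approximation.

```python
import numpy as np
from scipy.stats import skew, kurtosis

def window_features(sig, fs=50.0):
    """Per-window features of one accelerometer axis: first four moments,
    zero-crossing rate, average magnitude, spectral entropy and dominant
    frequency. Illustrative sketch, not the study's exact definitions."""
    centered = sig - np.mean(sig)
    zc = np.mean(np.abs(np.diff(np.sign(centered))) > 0)        # zero-crossing rate
    psd = np.abs(np.fft.rfft(centered)) ** 2
    p = psd / (np.sum(psd) + 1e-12)
    spectral_entropy = -np.sum(p * np.log2(p + 1e-12))
    dominant_freq = np.fft.rfftfreq(len(sig), 1.0 / fs)[np.argmax(psd)]
    return np.array([np.mean(sig), np.var(sig), skew(sig), kurtosis(sig),
                     zc, np.mean(np.abs(sig)), spectral_entropy, dominant_freq])

# toy 2 s window sampled at 50 Hz: a 2 Hz oscillation standing in for walking
window = np.sin(2 * np.pi * 2.0 * np.arange(0, 2, 1 / 50.0))
print(window_features(window))
```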

  12. Quantitative Real-Time PCR using the Thermo Scientific Solaris qPCR Assay

    PubMed Central

    Ogrean, Christy; Jackson, Ben; Covino, James

    2010-01-01

The Solaris qPCR Gene Expression Assay is a novel type of primer/probe set, designed to simplify the qPCR process while maintaining the sensitivity and accuracy of the assay. These primer/probe sets are pre-designed to >98% of the human and mouse genomes and feature significant improvements over previously available technologies. These improvements were made possible by a novel design algorithm developed by Thermo Scientific bioinformatics experts. Several convenient features have been incorporated into the Solaris qPCR Assay to streamline the process of performing quantitative real-time PCR. First, the protocol is similar to commonly employed alternatives, so the methods used during qPCR are likely to be familiar. Second, the master mix is blue, which makes setting up the qPCR reactions easier to track. Third, the thermal cycling conditions are the same for all assays (genes), making it possible to run many samples at a time and reducing the potential for error. Finally, the probe and primer sequence information is provided, simplifying the publication process. Here, we demonstrate how to obtain the appropriate Solaris reagents using the GENEius product search feature found on the ordering web site (www.thermo.com/solaris) and how to use the Solaris reagents for performing qPCR using the standard curve method. PMID:20567213

  13. An assessment of the Height Above Nearest Drainage terrain descriptor for the thematic enhancement of automatic SAR-based flood monitoring services

    NASA Astrophysics Data System (ADS)

    Chow, Candace; Twele, André; Martinis, Sandro

    2016-10-01

Flood extent maps derived from Synthetic Aperture Radar (SAR) data can communicate spatially explicit information in a timely and cost-effective manner to support disaster management. Automated processing chains for SAR-based flood mapping have the potential to substantially reduce the critical time delay between the delivery of post-event satellite data and the subsequent provision of satellite-derived crisis information to emergency management authorities. However, the accuracy of SAR-based flood mapping can vary drastically depending on the prevalent land cover and topography of a given scene. While expert-based image interpretation with the consideration of contextual information can effectively isolate flood surface features, a fully automated feature differentiation algorithm based mainly on the grey levels of a given pixel is comparatively more limited for features with similar SAR-backscattering characteristics. The inclusion of ancillary data in the automatic classification procedure can effectively reduce instances of misclassification. In this work, a near-global `Height Above Nearest Drainage' (HAND) index [10] was calculated with digital elevation data and drainage directions from the HydroSHEDS mapping project [2]. The index can be used to separate flood-prone regions from areas with a low probability of flood occurrence. Based on the HAND index, an exclusion mask was computed to reduce water look-alikes with respect to the hydrologic-topographic setting. The applicability of this near-global ancillary data set for the thematic improvement of Sentinel-1 and TerraSAR-X based services for flood and surface water monitoring has been validated both qualitatively and quantitatively. Application of a HAND-based exclusion mask resulted in improvements to the classification accuracy of SAR scenes with high amounts of water look-alikes and considerable elevation differences.
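
    The HAND-based exclusion step reduces to a simple raster operation, sketched below; the threshold value and the toy arrays are illustrative assumptions, not values from the paper.

```python
import numpy as np

def apply_hand_mask(water_mask, hand, hand_threshold=15.0):
    """Remove water look-alikes from a SAR-derived water mask: pixels whose
    Height Above Nearest Drainage exceeds a threshold are treated as
    implausible flood locations. The threshold value is illustrative only."""
    return water_mask & (hand <= hand_threshold)

# toy example: a dark hill-shadow pixel (high HAND) gets excluded
water_mask = np.array([[True, True], [True, False]])
hand = np.array([[2.0, 40.0], [5.0, 1.0]])
print(apply_hand_mask(water_mask, hand))
```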

  14. Indonesian name matching using machine learning supervised approach

    NASA Astrophysics Data System (ADS)

    Alifikri, Mohamad; Arif Bijaksana, Moch.

    2018-03-01

Most existing name matching methods are developed for the English language and so cover the characteristics of that language. To date, no method has been designed and implemented specifically for Indonesian names. The purpose of this thesis is to develop an Indonesian name matching dataset as a contribution to academic research and to propose a suitable feature set by utilizing a combination of the context of name strings and their permute-Winkler scores. Machine learning classification algorithms are used as the method for performing name matching. Based on the experiments, using a tuned Random Forest algorithm and the proposed features improves matching performance by approximately 1.7% and reduces up to 70% of the misclassifications made by state-of-the-art methods. This improved performance makes the matching system more effective and reduces the risk of misclassified matches.

  15. Local binary pattern texture-based classification of solid masses in ultrasound breast images

    NASA Astrophysics Data System (ADS)

    Matsumoto, Monica M. S.; Sehgal, Chandra M.; Udupa, Jayaram K.

    2012-03-01

Breast cancer is one of the leading causes of cancer mortality among women. Ultrasound examination can be used to assess breast masses, complementing mammography. Ultrasound images reveal tissue information in their echoic patterns; therefore, pattern recognition techniques can facilitate the classification of lesions and thereby reduce the number of unnecessary biopsies. Our hypothesis was that image texture features on the boundary of a lesion and its vicinity can be used to classify masses. We used intensity-independent and rotation-invariant texture features known as Local Binary Patterns (LBP). The classifier selected was K-nearest neighbors. Our breast ultrasound image database consisted of 100 patient images (50 benign and 50 malignant cases). The determination of whether a mass was benign or malignant was made through biopsy and pathology assessment. The training set consisted of sixty images, randomly chosen from the database of 100 patients, and the testing set consisted of the remaining forty images to be classified. A multi-fold cross-validation of 100 iterations provided a robust evaluation. The highest performance was observed for the LBP feature with 24 symmetrically distributed neighbors over a circle of radius 3 (LBP24,3), with an accuracy rate of 81.0%. We also investigated an approach in which a malignancy score was assigned to the images in the test set; this approach provided an ROC curve with an Az of 0.803. The analysis of texture features over the boundary of solid masses shows promise for malignancy classification in ultrasound breast images.
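
    A rough version of the LBP(24,3) feature extraction and K-nearest-neighbor classification can be put together with scikit-image and scikit-learn as below. The random images, the uniform-pattern histogram and the omission of the boundary/vicinity masking described above are assumptions made to keep the sketch short.

```python
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.neighbors import KNeighborsClassifier

def lbp_histogram(image, P=24, R=3):
    """Uniform LBP histogram with 24 neighbors on a circle of radius 3
    (the LBP24,3 configuration reported as best above); the boundary-region
    masking used in the study is omitted here."""
    codes = local_binary_pattern(image, P, R, method="uniform")
    hist, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2), density=True)
    return hist

# hypothetical training images and labels (0 = benign, 1 = malignant)
rng = np.random.default_rng(0)
train_imgs = [(rng.random((64, 64)) * 255).astype(np.uint8) for _ in range(10)]
train_y = [0, 1] * 5
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit([lbp_histogram(im) for im in train_imgs], train_y)
test_img = (rng.random((64, 64)) * 255).astype(np.uint8)
print(knn.predict([lbp_histogram(test_img)]))
```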

  16. User-centered design and evaluation of a next generation fixed-split ergonomic keyboard.

    PubMed

    McLoone, Hugh E; Jacobson, Melissa; Hegg, Chau; Johnson, Peter W

    2010-01-01

Research has shown that fixed-split ergonomic keyboards lessen pain and improve functional status in symptomatic individuals, as well as reduce the likelihood of developing musculoskeletal disorders in asymptomatic typists over extended use. The goal of this study was to evaluate design features to determine whether the current fixed-split ergonomic keyboard design could be improved. Thirty-nine adult fixed-split ergonomic keyboard users were recruited to participate in one of three studies. Utilizing first non-functional models and later a functional prototype, the three studies evaluated keyboard design features including: 1) keyboard lateral inclination, 2) wrist rest height, 3) keyboard slope, and 4) curved "gull-wing" key layouts. The findings indicated that keyboard lateral inclination could be increased from 8° to 14°; wrist rest height could be increased by up to 10 mm from the current setting; positive, flat, and negative slope settings were equally preferred and facilitated greater postural variation; and participants preferred a new gull-wing key layout. The design changes reduced forearm pronation and wrist extension while not adversely affecting typing performance. This research demonstrated how iterative-evaluative, user-centered research methods can be utilized to improve a product's design, such as that of a fixed-split ergonomic keyboard.

  17. A Non-destructive Terahertz Spectroscopy-Based Method for Transgenic Rice Seed Discrimination via Sparse Representation

    NASA Astrophysics Data System (ADS)

    Hu, Xiaohua; Lang, Wenhui; Liu, Wei; Xu, Xue; Yang, Jianbo; Zheng, Lei

    2017-08-01

Terahertz (THz) spectroscopy has been researched and developed for rapid and non-destructive assessment of food safety and quality due to its low-energy and non-ionizing characteristics. The objective of this study was to develop a flexible identification model to discriminate transgenic and non-transgenic rice seeds based on THz spectroscopy. To extract THz spectral features and reduce the feature dimension, sparse representation (SR) is employed in this work. A sufficient sparsity level is selected to train the sparse coding of the THz data, and the random forest (RF) method is then applied to obtain a discrimination model. The results show that differences exist between transgenic and non-transgenic rice seeds in the THz spectral band and that, compared with the least squares support vector machine (LS-SVM) method, SR-RF is a better model for discrimination (accuracy of 95% in the prediction set and 100% in the calibration set). The conclusion is that SR may be more useful in the application of THz spectroscopy to reduce dimensionality, and that SR-RF provides a new, effective, and flexible method for the detection and identification of transgenic and non-transgenic rice seeds with a THz spectral system.
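
    A rough SR-RF pipeline can be assembled from scikit-learn's dictionary learning (for the sparse codes) and a random forest, as sketched below. The synthetic spectra, dictionary size, sparsity level and forest settings are assumptions for illustration, not the parameters used in the study.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# hypothetical stand-in for THz absorption spectra of rice seeds
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 300))            # 120 seeds x 300 spectral points
y = rng.integers(0, 2, 120)                # 1 = transgenic, 0 = non-transgenic

# SR-RF sketch: learn a small dictionary, use the sparse codes as the
# reduced feature vector, then classify with a random forest
sr_rf = make_pipeline(
    DictionaryLearning(n_components=20, transform_algorithm="omp",
                       transform_n_nonzero_coefs=5, max_iter=100, random_state=0),
    RandomForestClassifier(n_estimators=200, random_state=0))
print(cross_val_score(sr_rf, X, y, cv=3).mean())   # random data -> chance level
```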

  18. Collaborative knowledge acquisition for the design of context-aware alert systems.

    PubMed

    Joffe, Erel; Havakuk, Ofer; Herskovic, Jorge R; Patel, Vimla L; Bernstam, Elmer Victor

    2012-01-01

    To present a framework for combining implicit knowledge acquisition from multiple experts with machine learning and to evaluate this framework in the context of anemia alerts. Five internal medicine residents reviewed 18 anemia alerts, while 'talking aloud'. They identified features that were reviewed by two or more physicians to determine appropriate alert level, etiology and treatment recommendation. Based on these features, data were extracted from 100 randomly-selected anemia cases for a training set and an additional 82 cases for a test set. Two staff internists assigned an alert level, etiology and treatment recommendation before and after reviewing the entire electronic medical record. The training set of 118 cases (100 plus 18) and the test set of 82 cases were explored using RIDOR and JRip algorithms. The feature set was sufficient to assess 93% of anemia cases (intraclass correlation for alert level before and after review of the records by internists 1 and 2 were 0.92 and 0.95, respectively). High-precision classifiers were constructed to identify low-level alerts (precision p=0.87, recall R=0.4), iron deficiency (p=1.0, R=0.73), and anemia associated with kidney disease (p=0.87, R=0.77). It was possible to identify low-level alerts and several conditions commonly associated with chronic anemia. This approach may reduce the number of clinically unimportant alerts. The study was limited to anemia alerts. Furthermore, clinicians were aware of the study hypotheses potentially biasing their evaluation. Implicit knowledge acquisition, collaborative filtering and machine learning were combined automatically to induce clinically meaningful and precise decision rules.

  19. Detection of explosive cough events in audio recordings by internal sound analysis.

    PubMed

    Rocha, B M; Mendes, L; Couceiro, R; Henriques, J; Carvalho, P; Paiva, R P

    2017-07-01

    We present a new method for the discrimination of explosive cough events, which is based on a combination of spectral content descriptors and pitch-related features. After the removal of near-silent segments, a vector of event boundaries is obtained and a proposed set of 9 features is extracted for each event. Two data sets, recorded using electronic stethoscopes and comprising a total of 46 healthy subjects and 13 patients, were employed to evaluate the method. The proposed feature set is compared to three other sets of descriptors: a baseline, a combination of both sets, and an automatic selection of the best 10 features from both sets. The combined feature set yields good results on the cross-validated database, attaining a sensitivity of 92.3±2.3% and a specificity of 84.7±3.3%. Besides, this feature set seems to generalize well when it is trained on a small data set of patients, with a variety of respiratory and cardiovascular diseases, and tested on a bigger data set of mostly healthy subjects: a sensitivity of 93.4% and a specificity of 83.4% are achieved in those conditions. These results demonstrate that complementing the proposed feature set with a baseline set is a promising approach.

  20. Effective Feature Selection for Classification of Promoter Sequences.

    PubMed

    K, Kouser; P G, Lavanya; Rangarajan, Lalitha; K, Acharya Kshitish

    2016-01-01

Exploring novel computational methods for making sense of biological data has been not only a necessity but also productive. Part of this trend is the search for more efficient in silico methods/tools for the analysis of promoters, which are parts of DNA sequences involved in regulating the expression of genes into other functional molecules. Promoter regions vary greatly in their function based on the sequence of nucleotides and the arrangement of protein-binding short regions called motifs. In fact, the regulatory nature of promoters seems to be largely driven by the selective presence and/or the arrangement of these motifs. Here, we explore computational classification of promoter sequences based on the pattern of motif distributions, as such classification can pave a new way for the functional analysis of promoters and for discovering the functionally crucial motifs. We make use of Position Specific Motif Matrix (PSMM) features to explore the possibility of accurately classifying promoter sequences using some of the popular classification techniques. The classification results on the complete feature set are low, perhaps due to the huge number of features. We propose two ways of reducing features. Our test results show improvement in the classification output after the reduction of features. The results also show that decision trees outperform SVM (Support Vector Machine), KNN (K Nearest Neighbor) and the ensemble classifier LibD3C, particularly with reduced features. The proposed feature selection methods outperform some popular feature transformation methods such as PCA and SVD. They are also as accurate as MRMR (a feature selection method) but much faster than MRMR. Such methods could be useful for categorizing new promoters and exploring the regulatory mechanisms of gene expression in complex eukaryotic species.

  1. Which features of primary care affect unscheduled secondary care use? A systematic review

    PubMed Central

    Huntley, Alyson; Lasserson, Daniel; Wye, Lesley; Morris, Richard; Checkland, Kath; England, Helen; Salisbury, Chris; Purdy, Sarah

    2014-01-01

    Objectives To conduct a systematic review to identify studies that describe factors and interventions at primary care practice level that impact on levels of utilisation of unscheduled secondary care. Setting Observational studies at primary care practice level. Participants Studies included people of any age of either sex living in Organisation for Economic Co-operation and Development (OECD) countries with any health condition. Primary and secondary outcome measures The primary outcome measure was unscheduled secondary care as measured by emergency department attendance and emergency hospital admissions. Results 48 papers were identified describing potential influencing features on emergency department visits (n=24 studies) and emergency admissions (n=22 studies). Patient factors associated with both outcomes were increased age, reduced socioeconomic status, lower educational attainment, chronic disease and multimorbidity. Features of primary care affecting unscheduled secondary care were more complex. Being able to see the same healthcare professional reduced unscheduled secondary care. Generally, better access was associated with reduced unscheduled care in the USA. Proximity to healthcare provision influenced patterns of use. Evidence relating to quality of care was limited and mixed. Conclusions The majority of research was from different healthcare systems and limited in the extent to which it can inform policy. However, there is evidence that continuity of care is associated with reduced emergency department attendance and emergency hospital admissions. PMID:24860000

  2. Large-scale urban point cloud labeling and reconstruction

    NASA Astrophysics Data System (ADS)

    Zhang, Liqiang; Li, Zhuqiang; Li, Anjian; Liu, Fangyu

    2018-04-01

The large number of object categories and the many overlapping or closely neighboring objects in large-scale urban scenes pose great challenges for point cloud classification. In this paper, a novel framework is proposed for the classification and reconstruction of airborne laser scanning point cloud data. To label point clouds, we present a rectified linear unit neural network, named ReLu-NN, in which rectified linear units (ReLU) are used as the activation function instead of the traditional sigmoid in order to speed up convergence. Since the features of the point cloud are sparse, we reduce the number of active neurons through dropout to avoid over-fitting during training. The set of feature descriptors for each 3D point is encoded through self-taught learning and forms a discriminative feature representation, which is taken as the input of the ReLu-NN. The segmented building points are consolidated through an edge-aware point set resampling algorithm, and then reconstructed into lightweight 3D models using the 2.5D contouring method (Zhou and Neumann, 2010). Compared with deep learning approaches, the ReLu-NN can easily classify unorganized point clouds without rasterizing the data, and it does not need a large number of training samples. Most of the parameters in the network are learned, and thus the intensive cost of parameter tuning is significantly reduced. Experimental results on various datasets demonstrate that the proposed framework achieves better performance than other related algorithms in terms of classification accuracy and reconstruction quality.
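
    A tiny PyTorch stand-in for a ReLU network with dropout over per-point feature descriptors is sketched below; the layer sizes, dropout rate, feature dimension and class count are hypothetical, and the self-taught-learning feature encoding is not reproduced.

```python
import torch
import torch.nn as nn

class ReLuNN(nn.Module):
    """Minimal stand-in for the ReLu-NN point classifier described above:
    fully connected layers with ReLU activations and dropout applied to
    sparse per-point feature descriptors. Layer sizes are illustrative."""
    def __init__(self, n_features=64, n_classes=8, p_drop=0.5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 128), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(128, 64), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(64, n_classes))

    def forward(self, x):
        return self.net(x)                 # raw logits; pair with CrossEntropyLoss

model = ReLuNN()
logits = model(torch.randn(32, 64))        # a batch of 32 point descriptors
print(logits.shape)                        # torch.Size([32, 8])
```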

  3. An Evolving Worldview: Making Open Source Easy

    NASA Astrophysics Data System (ADS)

    Rice, Z.

    2017-12-01

    NASA Worldview is an interactive interface for browsing full-resolution, global satellite imagery. Worldview supports an open data policy so that academia, private industries and the general public can use NASA's satellite data to address Earth science related issues. Worldview was open sourced in 2014. By shifting to an open source approach, the Worldview application has evolved to better serve end-users. Project developers are able to have discussions with end-users and community developers to understand issues and develop new features. Community developers are able to track upcoming features, collaborate on them and make their own contributions. Developers who discover issues are able to address those issues and submit a fix. This reduces the time it takes for a project developer to reproduce an issue or develop a new feature. Getting new developers to contribute to the project has been one of the most important and difficult aspects of open sourcing Worldview. After witnessing potential outside contributors struggle, a focus has been made on making the installation of Worldview simple to reduce the initial learning curve and make contributing code easy. One way we have addressed this is through a simplified setup process. Our setup documentation includes a set of prerequisites and a set of straightforward commands to clone, configure, install and run. This presentation will emphasize our focus to simplify and standardize Worldview's open source code so that more people are able to contribute. The more people who contribute, the better the application will become over time.

  4. A new computational strategy for predicting essential genes.

    PubMed

    Cheng, Jian; Wu, Wenwu; Zhang, Yinwen; Li, Xiangchen; Jiang, Xiaoqian; Wei, Gehong; Tao, Shiheng

    2013-12-21

    Determination of the minimum gene set for cellular life is one of the central goals in biology. Genome-wide essential gene identification has progressed rapidly in certain bacterial species; however, it remains difficult to achieve in most eukaryotic species. Several computational models have recently been developed to integrate gene features and used as alternatives to transfer gene essentiality annotations between organisms. We first collected features that were widely used by previous predictive models and assessed the relationships between gene features and gene essentiality using a stepwise regression model. We found two issues that could significantly reduce model accuracy: (i) the effect of multicollinearity among gene features and (ii) the diverse and even contrasting correlations between gene features and gene essentiality existing within and among different species. To address these issues, we developed a novel model called feature-based weighted Naïve Bayes model (FWM), which is based on Naïve Bayes classifiers, logistic regression, and genetic algorithm. The proposed model assesses features and filters out the effects of multicollinearity and diversity. The performance of FWM was compared with other popular models, such as support vector machine, Naïve Bayes model, and logistic regression model, by applying FWM to reciprocally predict essential genes among and within 21 species. Our results showed that FWM significantly improves the accuracy and robustness of essential gene prediction. FWM can remarkably improve the accuracy of essential gene prediction and may be used as an alternative method for other classification work. This method can contribute substantially to the knowledge of the minimum gene sets required for living organisms and the discovery of new drug targets.
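
    As a rough illustration of the baseline comparison mentioned above (not of FWM itself, whose feature weighting relies on logistic regression and a genetic algorithm), the sketch below cross-validates Naive Bayes, logistic regression and an SVM on a synthetic stand-in for a gene-feature matrix.

```python
# Cross-validated comparison of the baseline classifiers named in the
# abstract; the gene-feature matrix here is synthetic, not real genomic data.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=20, n_informative=8,
                           random_state=0)

for name, model in [("naive_bayes", GaussianNB()),
                    ("logistic", LogisticRegression(max_iter=1000)),
                    ("svm", SVC())]:
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(name, round(scores.mean(), 3))
```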

  5. Computational Intelligence Modeling of the Macromolecules Release from PLGA Microspheres-Focus on Feature Selection.

    PubMed

    Zawbaa, Hossam M; Szlȩk, Jakub; Grosan, Crina; Jachowicz, Renata; Mendyk, Aleksander

    2016-01-01

    Poly-lactide-co-glycolide (PLGA) is a copolymer of lactic and glycolic acid. Drug release from PLGA microspheres depends not only on polymer properties but also on drug type, particle size, morphology of microspheres, release conditions, etc. Selecting a subset of relevant properties for PLGA is a challenging machine learning task as there are over three hundred features to consider. In this work, we formulate the selection of critical attributes for PLGA as a multiobjective optimization problem with the aim of minimizing the error of predicting the dissolution profile while reducing the number of attributes selected. Four bio-inspired optimization algorithms: antlion optimization, binary version of antlion optimization, grey wolf optimization, and social spider optimization are used to select the optimal feature set for predicting the dissolution profile of PLGA. Besides these, LASSO algorithm is also used for comparisons. Selection of crucial variables is performed under the assumption that both predictability and model simplicity are of equal importance to the final result. During the feature selection process, a set of input variables is employed to find minimum generalization error across different predictive models and their settings/architectures. The methodology is evaluated using predictive modeling for which various tools are chosen, such as Cubist, random forests, artificial neural networks (monotonic MLP, deep learning MLP), multivariate adaptive regression splines, classification and regression tree, and hybrid systems of fuzzy logic and evolutionary computations (fugeR). The experimental results are compared with the results reported by Szlȩk. We obtain a normalized root mean square error (NRMSE) of 15.97% versus 15.4%, and the number of selected input features is smaller, nine versus eleven.

  6. Computational Intelligence Modeling of the Macromolecules Release from PLGA Microspheres—Focus on Feature Selection

    PubMed Central

    Zawbaa, Hossam M.; Szlȩk, Jakub; Grosan, Crina; Jachowicz, Renata; Mendyk, Aleksander

    2016-01-01

    Poly-lactide-co-glycolide (PLGA) is a copolymer of lactic and glycolic acid. Drug release from PLGA microspheres depends not only on polymer properties but also on drug type, particle size, morphology of microspheres, release conditions, etc. Selecting a subset of relevant properties for PLGA is a challenging machine learning task as there are over three hundred features to consider. In this work, we formulate the selection of critical attributes for PLGA as a multiobjective optimization problem with the aim of minimizing the error of predicting the dissolution profile while reducing the number of attributes selected. Four bio-inspired optimization algorithms: antlion optimization, binary version of antlion optimization, grey wolf optimization, and social spider optimization are used to select the optimal feature set for predicting the dissolution profile of PLGA. Besides these, LASSO algorithm is also used for comparisons. Selection of crucial variables is performed under the assumption that both predictability and model simplicity are of equal importance to the final result. During the feature selection process, a set of input variables is employed to find minimum generalization error across different predictive models and their settings/architectures. The methodology is evaluated using predictive modeling for which various tools are chosen, such as Cubist, random forests, artificial neural networks (monotonic MLP, deep learning MLP), multivariate adaptive regression splines, classification and regression tree, and hybrid systems of fuzzy logic and evolutionary computations (fugeR). The experimental results are compared with the results reported by Szlȩk. We obtain a normalized root mean square error (NRMSE) of 15.97% versus 15.4%, and the number of selected input features is smaller, nine versus eleven. PMID:27315205
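
    The following sketch illustrates only the LASSO baseline mentioned in the abstract, not the bio-inspired optimizers: an L1-penalised linear model whose zeroed coefficients discard irrelevant formulation attributes before a downstream dissolution model would be fitted. The data dimensions are placeholders, not the actual PLGA data set.

```python
# LASSO-style attribute screening: coefficients driven to zero mark
# attributes that can be dropped from the input space.
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 300))          # ~300 candidate attributes (assumed)
y = X[:, :5] @ rng.normal(size=5) + rng.normal(scale=0.1, size=120)

lasso = LassoCV(cv=5).fit(StandardScaler().fit_transform(X), y)
selected = np.flatnonzero(lasso.coef_)   # indices of retained attributes
print(len(selected), "attributes retained of", X.shape[1])
```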

  7. Unbiased feature selection in learning random forests for high-dimensional data.

    PubMed

    Nguyen, Thanh-Tung; Huang, Joshua Zhexue; Nguyen, Thuy Thi

    2015-01-01

    Random forests (RFs) have been widely used as a powerful classification method. However, with the randomization in both bagging samples and feature selection, the trees in the forest tend to select uninformative features for node splitting. This makes RFs have poor accuracy when working with high-dimensional data. Besides that, RFs have bias in the feature selection process where multivalued features are favored. Aiming at debiasing feature selection in RFs, we propose a new RF algorithm, called xRF, to select good features in learning RFs for high-dimensional data. We first remove the uninformative features using p-value assessment, and the subset of unbiased features is then selected based on some statistical measures. This feature subset is then partitioned into two subsets. A feature weighting sampling technique is used to sample features from these two subsets for building trees. This approach enables one to generate more accurate trees, while allowing one to reduce dimensionality and the amount of data needed for learning RFs. An extensive set of experiments has been conducted on 47 high-dimensional real-world datasets including image datasets. The experimental results have shown that RFs with the proposed approach outperformed the existing random forests in increasing the accuracy and the AUC measures.
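
    A minimal sketch, under assumptions, of the two-stage idea described above: uninformative features are screened out by p-value and a random forest is trained on the retained subset. The weighted two-subset sampling that defines xRF is not reproduced here.

```python
# P-value screening followed by a random forest on the retained features.
from sklearn.datasets import make_classification
from sklearn.feature_selection import f_classif
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=2000, n_informative=20,
                           random_state=0)
_, pvals = f_classif(X, y)
keep = pvals < 0.01                      # crude p-value screen
rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X[:, keep], y)
print(keep.sum(), "features kept of", X.shape[1])
```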

  8. Integrated feature extraction and selection for neuroimage classification

    NASA Astrophysics Data System (ADS)

    Fan, Yong; Shen, Dinggang

    2009-02-01

    Feature extraction and selection are of great importance in neuroimage classification for identifying informative features and reducing feature dimensionality, which are generally implemented as two separate steps. This paper presents an integrated feature extraction and selection algorithm with two iterative steps: constrained subspace learning based feature extraction and support vector machine (SVM) based feature selection. The subspace learning based feature extraction focuses on the brain regions with higher possibility of being affected by the disease under study, while the possibility of brain regions being affected by disease is estimated by the SVM based feature selection, in conjunction with SVM classification. This algorithm can not only take into account the inter-correlation among different brain regions, but also overcome the limitation of traditional subspace learning based feature extraction methods. To achieve robust performance and optimal selection of parameters involved in feature extraction, selection, and classification, a bootstrapping strategy is used to generate multiple versions of training and testing sets for parameter optimization, according to the classification performance measured by the area under the ROC (receiver operating characteristic) curve. The integrated feature extraction and selection method is applied to a structural MR image based Alzheimer's disease (AD) study with 98 non-demented and 100 demented subjects. Cross-validation results indicate that the proposed algorithm can improve performance of the traditional subspace learning based classification.

  9. Diagnostic support for glaucoma using retinal images: a hybrid image analysis and data mining approach.

    PubMed

    Yu, Jin; Abidi, Syed Sibte Raza; Artes, Paul; McIntyre, Andy; Heywood, Malcolm

    2005-01-01

    The availability of modern imaging techniques such as Confocal Scanning Laser Tomography (CSLT) for capturing high-quality optic nerve images offers the potential for developing automatic and objective methods for diagnosing glaucoma. We present a hybrid approach that features the analysis of CSLT images using moment methods to derive abstract image-defining features. The features are then used to train classifiers for automatically distinguishing CSLT images of normal and glaucoma patients. As a first step, in this paper we present investigations into feature subset selection methods for reducing the relatively large input space produced by the moment methods. We use neural networks and support vector machines to determine a subset of moments that offer high classification accuracy. We demonstrate the efficacy of our methods in discriminating between healthy and glaucomatous optic disks based on shape information automatically derived from optic disk topography and reflectance images.

  10. Two-dimensional wavelet analysis based classification of gas chromatogram differential mobility spectrometry signals.

    PubMed

    Zhao, Weixiang; Sankaran, Shankar; Ibáñez, Ana M; Dandekar, Abhaya M; Davis, Cristina E

    2009-08-04

    This study introduces two-dimensional (2-D) wavelet analysis to the classification of gas chromatogram differential mobility spectrometry (GC/DMS) data, which are composed of retention time, compensation voltage, and corresponding intensities. One reported method to process such large data sets is to convert 2-D signals to 1-D signals by summing intensities either across retention time or compensation voltage, but it can lose important signal information in one data dimension. A 2-D wavelet analysis approach keeps the 2-D structure of the original signals while significantly reducing data size. We applied this feature extraction method to 2-D GC/DMS signals measured from control and disordered fruit and then employed two typical classification algorithms to test the effects of the resultant features on chemical pattern recognition. Yielding 93.3% accuracy in separating data from control and disordered fruit samples, 2-D wavelet analysis not only proves its feasibility for extracting features from original 2-D signals but also shows its superiority over conventional feature extraction methods, including converting 2-D to 1-D and selecting distinguishable pixels from the training set. Furthermore, this process does not require coupling with specific pattern recognition methods, which may help ensure wide application of this method to 2-D spectrometry data.
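
    A minimal sketch, assuming the PyWavelets package, of how a 2-D discrete wavelet transform can shrink a retention-time by compensation-voltage intensity matrix into a compact feature block while preserving its 2-D structure; the matrix below is random stand-in data, not GC/DMS measurements.

```python
# 2-D wavelet decomposition of an intensity matrix; the low-frequency
# approximation block serves as a much smaller feature representation.
import numpy as np
import pywt

spectrum = np.random.rand(512, 128)           # rows: retention time, cols: CV
coeffs = pywt.wavedec2(spectrum, wavelet="db2", level=3)
approx = coeffs[0]                            # low-frequency approximation
features = approx.ravel()                     # reduced feature vector
print(spectrum.size, "->", features.size, "values")
```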

  11. Multiscale moment-based technique for object matching and recognition

    NASA Astrophysics Data System (ADS)

    Thio, HweeLi; Chen, Liya; Teoh, Eam-Khwang

    2000-03-01

    A new method is proposed to extract features from an object for matching and recognition. The features proposed are a combination of local and global characteristics -- local characteristics from the 1-D signature function that is defined at each pixel on the object boundary, and global characteristics from the moments that are generated from the signature function. The boundary of the object is first extracted, then the signature function is generated by computing the angle between two lines from every point on the boundary as a function of position along the boundary. This signature function is position, scale and rotation invariant (PSRI). The shape of the signature function is then described quantitatively by using moments. The moments of the signature function are the global characteristics of a local feature set. Using moments as the eventual features instead of the signature function reduces the time and complexity of an object matching application. Multiscale moments are implemented to produce several sets of moments that will generate more accurate matching. Basically, the multiscale technique is a coarse-to-fine procedure that makes the proposed method more robust to noise. This method is proposed to match and recognize objects under simple transformations, such as translation, scale changes, rotation and skewing. A simple logo indexing system is implemented to illustrate the performance of the proposed method.
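
    A minimal sketch of the final step described above, assuming the 1-D signature function has already been computed along the boundary: central moments of the signature serve as a compact, global descriptor of a local feature set. The signal below is synthetic.

```python
# Central moments of a boundary signature as a short global feature vector.
import numpy as np

signature = np.sin(np.linspace(0, 4 * np.pi, 256))       # stand-in signature
mean = signature.mean()
moments = [np.mean((signature - mean) ** p) for p in range(2, 6)]
print(moments)                                            # global shape features
```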

  12. Comparison of the application of B-mode and strain elastography ultrasound in the estimation of lymph node metastasis of papillary thyroid carcinoma based on a radiomics approach.

    PubMed

    Liu, Tongtong; Ge, Xifeng; Yu, Jinhua; Guo, Yi; Wang, Yuanyuan; Wang, Wenping; Cui, Ligang

    2018-06-21

    B-mode ultrasound (B-US) and strain elastography ultrasound (SE-US) images have the potential to distinguish thyroid tumors with different lymph node (LN) status. The purpose of our study is to investigate whether the application of multi-modality images including B-US and SE-US can improve the discriminability of thyroid tumor with LN metastasis based on a radiomics approach. Ultrasound (US) images including B-US and SE-US images of 75 papillary thyroid carcinoma (PTC) cases were retrospectively collected. A radiomics approach was developed in this study to estimate the LN status of PTC patients. The approach included image segmentation, quantitative feature extraction, feature selection and classification. Three feature sets were extracted from B-US, SE-US, and the multi-modality combination of B-US and SE-US. They were used to evaluate the contribution of the different modalities. A total of 684 radiomics features were extracted in our study. We used a sparse representation coefficient-based feature selection method with 10 bootstraps to reduce the dimension of the feature sets. A support vector machine with leave-one-out cross-validation was used to build the model for estimating LN status. Using features extracted from both B-US and SE-US, the radiomics-based model produced an area under the receiver operating characteristic curve (AUC) = 0.90, accuracy (ACC) = 0.85, sensitivity (SENS) = 0.77 and specificity (SPEC) = 0.88, which was better than using features extracted from B-US or SE-US separately. Multi-modality images provided more information in the radiomics study. The combined use of B-US and SE-US could improve the LN metastasis estimation accuracy for PTC patients.
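
    A minimal sketch of the evaluation protocol described above: a support vector machine scored with leave-one-out cross-validation on a combined radiomics feature matrix. The synthetic data merely mirror the reported sizes (75 cases, 684 features); the actual segmentation and feature extraction steps are not reproduced.

```python
# SVM with leave-one-out cross-validation on a stand-in radiomics matrix.
from sklearn.datasets import make_classification
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=75, n_features=684, n_informative=10,
                           random_state=0)
acc = cross_val_score(SVC(kernel="linear"), X, y, cv=LeaveOneOut())
print("LOOCV accuracy:", round(acc.mean(), 3))
```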

  13. A practical approach for writer-dependent symbol recognition using a writer-independent symbol recognizer.

    PubMed

    LaViola, Joseph J; Zeleznik, Robert C

    2007-11-01

    We present a practical technique for using a writer-independent recognition engine to improve the accuracy and speed while reducing the training requirements of a writer-dependent symbol recognizer. Our writer-dependent recognizer uses a set of binary classifiers based on the AdaBoost learning algorithm, one for each possible pairwise symbol comparison. Each classifier consists of a set of weak learners, one of which is based on a writer-independent handwriting recognizer. During online recognition, we also use the n-best list of the writer-independent recognizer to prune the set of possible symbols and thus reduce the number of required binary classifications. In this paper, we describe the geometric and statistical features used in our recognizer and our all-pairs classification algorithm. We also present the results of experiments that quantify the effect incorporating a writer-independent recognition engine into a writer-dependent recognizer has on accuracy, speed, and user training time.

  14. Australia’s Submarine Design Capabilities and Capacities: Challenges and Options for the Future Submarine

    DTIC Science & Technology

    2011-01-01

    stealth features requiring specialised noise and vibration skills and propulsion plants requiring other unique skill sets. Personnel with these... analysis; Acoustic, wake, thermal, electromagnetic, and other signature analysis; Combat systems and ship control; Combat system integration, combat system... to-diagnose flow-induced radiated noise; Own-sensor performance degradation. Note: Risks can be reduced for given designs using scale models

  15. Multidimensional Analysis of Nuclear Detonations

    DTIC Science & Technology

    2015-09-17

    Features on the nuclear weapons testing films because of the expanding and emissive nature of the nuclear fireball. The use of these techniques to produce... Treaty (New START Treaty) have reduced the acceptable margins of error. Multidimensional analysis provides the modern approach to nuclear weapon... scientific community access to the information necessary to expand upon the knowledge of nuclear weapon effects. This data set has the potential to provide

  16. Delta Clipper vehicle design for supportability

    NASA Astrophysics Data System (ADS)

    Smiljanic, Ray R.; Klevatt, Paul L.; Steinmeyer, Donald A.

    1993-02-01

    The paper describes the Single Stage Rocket Technology (SSRT) Delta Clipper vehicle design. As a means of reducing vehicle processing and turnaround times, the SSRT Delta Clipper design, contrary to past practices, incorporates supportability engineering features into its initial set of design requirements. The engineering process used to 'design-in' supportability into the Delta Clipper vehicle is described in detail and is illustrated using diagrams.

  17. Classification of Medical Datasets Using SVMs with Hybrid Evolutionary Algorithms Based on Endocrine-Based Particle Swarm Optimization and Artificial Bee Colony Algorithms.

    PubMed

    Lin, Kuan-Cheng; Hsieh, Yi-Hsiu

    2015-10-01

    The classification and analysis of data is an important issue in today's research. Selecting a suitable set of features makes it possible to classify an enormous quantity of data quickly and efficiently. Feature selection is generally viewed as a feature subset selection problem, a form of combinatorial optimization. Evolutionary algorithms using random search methods have proven highly effective in obtaining solutions to optimization problems in a diversity of applications. In this study, we developed a hybrid evolutionary algorithm based on endocrine-based particle swarm optimization (EPSO) and artificial bee colony (ABC) algorithms in conjunction with a support vector machine (SVM) for the selection of optimal feature subsets for the classification of datasets. The results of experiments using specific UCI medical datasets demonstrate that the accuracy of the proposed hybrid evolutionary algorithm is superior to that of basic PSO, EPSO and ABC algorithms, with regard to classification accuracy using subsets with a reduced number of features.
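
    The sketch below shows only the generic wrapper-style building block that such searches optimise: a candidate feature subset is scored by the cross-validated accuracy of an SVM. A plain random search stands in for the EPSO/ABC evolutionary part, and the UCI breast cancer data set is used as an example medical data set.

```python
# Wrapper-style subset evaluation: each candidate mask is scored by SVM
# cross-validation accuracy; an evolutionary search would replace the
# random proposals used here.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(0)
best_mask, best_score = None, 0.0
for _ in range(50):                               # random candidate subsets
    mask = rng.random(X.shape[1]) < 0.5
    if not mask.any():
        continue
    score = cross_val_score(SVC(), X[:, mask], y, cv=3).mean()
    if score > best_score:
        best_mask, best_score = mask, score
print(best_mask.sum(), "features, CV accuracy", round(best_score, 3))
```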

  18. Combining Model-Based and Feature-Driven Diagnosis Approaches - A Case Study on Electromechanical Actuators

    NASA Technical Reports Server (NTRS)

    Narasimhan, Sriram; Roychoudhury, Indranil; Balaban, Edward; Saxena, Abhinav

    2010-01-01

    Model-based diagnosis typically uses analytical redundancy to compare predictions from a model against observations from the system being diagnosed. However this approach does not work very well when it is not feasible to create analytic relations describing all the observed data, e.g., for vibration data which is usually sampled at very high rates and requires very detailed finite element models to describe its behavior. In such cases, features (in time and frequency domains) that contain diagnostic information are extracted from the data. Since this is a computationally intensive process, it is not efficient to extract all the features all the time. In this paper we present an approach that combines the analytic model-based and feature-driven diagnosis approaches. The analytic approach is used to reduce the set of possible faults and then features are chosen to best distinguish among the remaining faults. We describe an implementation of this approach on the Flyable Electro-mechanical Actuator (FLEA) test bed.

  19. Modified kernel-based nonlinear feature extraction.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ma, J.; Perkins, S. J.; Theiler, J. P.

    2002-01-01

    Feature Extraction (FE) techniques are widely used in many applications to pre-process data in order to reduce the complexity of subsequent processes. A group of kernel-based nonlinear FE (KFE) algorithms has attracted much attention due to their high performance. However, a serious limitation that is inherent in these algorithms -- the maximal number of features extracted by them is limited by the number of classes involved -- dramatically degrades their flexibility. Here we propose a modified version of those KFE algorithms (MKFE). This algorithm is developed from a special form of scatter matrix, whose rank is not determined by the number of classes involved, and thus breaks the inherent limitation in those KFE algorithms. Experimental results suggest that the MKFE algorithm is especially useful when the training set is small.

  20. Pneumococcal pneumonia: clinical features, diagnosis and management in HIV-infected and HIV noninfected patients.

    PubMed

    Madeddu, Giordano; Fois, Alessandro Giuseppe; Pirina, Pietro; Mura, Maria Stella

    2009-05-01

    In this review, we focus on the clinical features, diagnosis and management of pneumococcal pneumonia in HIV-infected and noninfected patients, with particular attention to the most recent advances in this area. Classical clinical features are found in young adults, whereas atypical forms occur in immunocompromised patients including HIV-infected individuals. Bacteremic pneumococcal pneumonia is more frequently observed in HIV-infected and also in low-risk patients, according to the Pneumonia Severity Index (PSI). Pneumococcal pneumonia diagnostic process includes physical examination, radiologic findings and microbiologic diagnosis. However, etiologic diagnosis using traditional culture methods is difficult to obtain. In this setting, urinary antigen test, which recognizes Streptococcus pneumoniae cell wall C-polysaccharide, increases the probability of etiologic diagnosis. A correct management approach is crucial in reducing pneumococcal pneumonia mortality. The use of the PSI helps clinicians in deciding between inpatient and outpatient management in immunocompetent individuals, according to Infectious Diseases Society of America (IDSA)-American Thoracic Society (ATS) guidelines. Recent findings support PSI utility also in HIV-infected patients. Recently, efficacy of pneumococcal vaccine in reducing pneumococcal disease incidence has been evidenced in both HIV-infected and noninfected individuals. Rapid diagnosis and correct management together with implementation of preventive measures are crucial in order to reduce pneumococcal pneumonia related incidence and mortality in HIV-infected and noninfected patients.

  1. Lossless Compression of JPEG Coded Photo Collections.

    PubMed

    Wu, Hao; Sun, Xiaoyan; Yang, Jingyu; Zeng, Wenjun; Wu, Feng

    2016-04-06

    The explosion of digital photos has posed a significant challenge to photo storage and transmission for both personal devices and cloud platforms. In this paper, we propose a novel lossless compression method to further reduce the size of a set of JPEG coded correlated images without any loss of information. The proposed method jointly removes inter/intra image redundancy in the feature, spatial, and frequency domains. For each collection, we first organize the images into a pseudo video by minimizing the global prediction cost in the feature domain. We then present a hybrid disparity compensation method to better exploit both the global and local correlations among the images in the spatial domain. Furthermore, the redundancy between each compensated signal and the corresponding target image is adaptively reduced in the frequency domain. Experimental results demonstrate the effectiveness of the proposed lossless compression method. Compared to the JPEG coded image collections, our method achieves average bit savings of more than 31%.

  2. Brain properties predict proximity to symptom onset in sporadic Alzheimer's disease.

    PubMed

    Vogel, Jacob W; Vachon-Presseau, Etienne; Pichet Binette, Alexa; Tam, Angela; Orban, Pierre; La Joie, Renaud; Savard, Mélissa; Picard, Cynthia; Poirier, Judes; Bellec, Pierre; Breitner, John C S; Villeneuve, Sylvia

    2018-06-01

    See Tijms and Visser (doi:10.1093/brain/awy113) for a scientific commentary on this article. Alzheimer's disease is preceded by a lengthy 'preclinical' stage spanning many years, during which subtle brain changes occur in the absence of overt cognitive symptoms. Predicting when the onset of disease symptoms will occur is an unsolved challenge in individuals with sporadic Alzheimer's disease. In individuals with autosomal dominant genetic Alzheimer's disease, the age of symptom onset is similar across generations, allowing the prediction of individual onset times with some accuracy. We extend this concept to persons with a parental history of sporadic Alzheimer's disease to test whether an individual's symptom onset age can be informed by the onset age of their affected parent, and whether this estimated onset age can be predicted using only MRI. Structural and functional MRIs were acquired from 255 ageing cognitively healthy subjects with a parental history of sporadic Alzheimer's disease from the PREVENT-AD cohort. Years to estimated symptom onset was calculated as participant age minus age of parental symptom onset. Grey matter volume was extracted from T1-weighted images and whole-brain resting state functional connectivity was evaluated using degree count. Both modalities were summarized using a 444-region cortical-subcortical atlas. The entire sample was divided into training (n = 138) and testing (n = 68) sets. Within the training set, individuals closer to or beyond their parent's symptom onset demonstrated reduced grey matter volume and altered functional connectivity, specifically in regions known to be vulnerable in Alzheimer's disease. Machine learning was used to identify a weighted set of imaging features trained to predict years to estimated symptom onset. This feature set alone significantly predicted years to estimated symptom onset in the unseen testing data. This model, using only neuroimaging features, significantly outperformed a similar model instead trained with cognitive, genetic, imaging and demographic features used in a traditional clinical setting. We next tested if these brain properties could be generalized to predict time to clinical progression in a subgroup of 26 individuals from the Alzheimer's Disease Neuroimaging Initiative, who eventually converted either to mild cognitive impairment or to Alzheimer's dementia. The feature set trained on years to estimated symptom onset in the PREVENT-AD predicted variance in time to clinical conversion in this separate longitudinal dataset. Adjusting for participant age did not impact any of the results. These findings demonstrate that years to estimated symptom onset or similar measures can be predicted from brain features and may help estimate presymptomatic disease progression in at-risk individuals.

  3. Predicting discharge mortality after acute ischemic stroke using balanced data.

    PubMed

    Ho, King Chung; Speier, William; El-Saden, Suzie; Liebeskind, David S; Saver, Jeffery L; Bui, Alex A T; Arnold, Corey W

    2014-01-01

    Several models have been developed to predict stroke outcomes (e.g., stroke mortality, patient dependence, etc.) in recent decades. However, there is little discussion regarding the problem of between-class imbalance in stroke datasets, which leads to prediction bias and decreased performance. In this paper, we demonstrate the use of the Synthetic Minority Over-sampling Technique to overcome such problems. We also compare state of the art machine learning methods and construct a six-variable support vector machine (SVM) model to predict stroke mortality at discharge. Finally, we discuss how the identification of a reduced feature set allowed us to identify additional cases in our research database for validation testing. Our classifier achieved a c-statistic of 0.865 on the cross-validated dataset, demonstrating good classification performance using a reduced set of variables.
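
    A minimal sketch, assuming the imbalanced-learn package, of balancing a between-class-imbalanced data set with SMOTE before training an SVM; the six synthetic columns stand in for the clinical variables and are not the study's feature set.

```python
# SMOTE oversampling of the minority class followed by an SVM fit.
from collections import Counter
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=6, n_informative=4,
                           weights=[0.9, 0.1], random_state=0)
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X, y)
print(Counter(y), "->", Counter(y_bal))           # classes are now balanced
SVC(probability=True).fit(X_bal, y_bal)
```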

  4. Neural Network and Letter Recognition.

    NASA Astrophysics Data System (ADS)

    Lee, Hue Yeon

    Neural net architectures and learning algorithms that recognize 36 handwritten alphanumeric characters are studied. Thin-line input patterns written in a 32 x 32 binary array are used. The system comprises two major components, viz. a preprocessing unit and a recognition unit. The preprocessing unit in turn consists of three layers of neurons: the U-layer, the V-layer, and the C-layer. The function of the U-layer is to extract local features by template matching. The correlations between the detected local features are considered. Through correlating neurons in a plane with their neighboring neurons, the V-layer thickens the on-cells, or lines that are groups of on-cells, of the previous layer. These two correlations yield some deformation tolerance and some of the rotational tolerance of the system. The C-layer then compresses data through the 'Gabor' transform. A pattern-dependent choice of the centers and wavelengths of the 'Gabor' filters is the source of the shift and scale tolerance of the system. Three different learning schemes were investigated in the recognition unit, namely error back-propagation learning with hidden units, simple perceptron learning, and competitive learning. Their performances were analyzed and compared. Since the network sometimes fails to distinguish between two letters that are inherently similar, additional ambiguity-resolving neural nets are introduced on top of the main neural net. The two-dimensional Fourier transform is used as the preprocessing unit and the perceptron is used as the recognition unit of the ambiguity resolver. One hundred different persons' handwriting sets were collected. Some of these are used as training sets and the remainder as test sets. The correct recognition rate of the system increases with the number of training sets and eventually saturates at a certain value. Similar recognition rates are obtained for the three different learning algorithms. The minimum error rate, 4.9%, is achieved for alphanumeric sets when 50 sets are trained. With the ambiguity resolver, it is reduced to 2.5%. When only numeral sets are trained and tested, a 2.0% error rate is achieved. When only alphabet sets are considered, the error rate is reduced to 1.1%.

  5. Differential prioritization between relevance and redundancy in correlation-based feature selection techniques for multiclass gene expression data.

    PubMed

    Ooi, Chia Huey; Chetty, Madhu; Teng, Shyh Wei

    2006-06-23

    Due to the large number of genes in a typical microarray dataset, feature selection looks set to play an important role in reducing noise and computational cost in gene expression-based tissue classification while improving accuracy at the same time. Surprisingly, this does not appear to be the case for all multiclass microarray datasets. The reason is that many feature selection techniques applied on microarray datasets are either rank-based and hence do not take into account correlations between genes, or are wrapper-based, which require high computational cost, and often yield difficult-to-reproduce results. In studies where correlations between genes are considered, attempts to establish the merit of the proposed techniques are hampered by evaluation procedures which are less than meticulous, resulting in overly optimistic estimates of accuracy. We present two realistically evaluated correlation-based feature selection techniques which incorporate, in addition to the two existing criteria involved in forming a predictor set (relevance and redundancy), a third criterion called the degree of differential prioritization (DDP). DDP functions as a parameter to strike the balance between relevance and redundancy, providing our techniques with the novel ability to differentially prioritize the optimization of relevance against redundancy (and vice versa). This ability proves useful in producing optimal classification accuracy while using reasonably small predictor set sizes for nine well-known multiclass microarray datasets. For multiclass microarray datasets, especially the GCM and NCI60 datasets, DDP enables our filter-based techniques to produce accuracies better than those reported in previous studies which employed similarly realistic evaluation procedures.

  6. Prediction of Cognitive States During Flight Simulation Using Multimodal Psychophysiological Sensing

    NASA Technical Reports Server (NTRS)

    Harrivel, Angela R.; Stephens, Chad L.; Milletich, Robert J.; Heinich, Christina M.; Last, Mary Carolyn; Napoli, Nicholas J.; Abraham, Nijo A.; Prinzel, Lawrence J.; Motter, Mark A.; Pope, Alan T.

    2017-01-01

    The Commercial Aviation Safety Team found the majority of recent international commercial aviation accidents attributable to loss of control inflight involved flight crew loss of airplane state awareness (ASA), and distraction was involved in all of them. Research on attention-related human performance limiting states (AHPLS) such as channelized attention, diverted attention, startle/surprise, and confirmation bias, has been recommended in a Safety Enhancement (SE) entitled "Training for Attention Management." To accomplish the detection of such cognitive and psychophysiological states, a broad suite of sensors was implemented to simultaneously measure their physiological markers during a high fidelity flight simulation human subject study. Twenty-four pilot participants were asked to wear the sensors while they performed benchmark tasks and motion-based flight scenarios designed to induce AHPLS. Pattern classification was employed to predict the occurrence of AHPLS during flight simulation also designed to induce those states. Classifier training data were collected during performance of the benchmark tasks. Multimodal classification was performed, using pre-processed electroencephalography, galvanic skin response, electrocardiogram, and respiration signals as input features. A combination of one, some or all modalities were used. Extreme gradient boosting, random forest and two support vector machine classifiers were implemented. The best accuracy for each modality-classifier combination is reported. Results using a select set of features and using the full set of available features are presented. Further, results are presented for training one classifier with the combined features and for training multiple classifiers with features from each modality separately. Using the select set of features and combined training, multistate prediction accuracy averaged 0.64 +/- 0.14 across thirteen participants and was significantly higher than that for the separate training case. These results support the goal of demonstrating simultaneous real-time classification of multiple states using multiple sensing modalities in high fidelity flight simulators. This detection is intended to support and inform training methods under development to mitigate the loss of ASA and thus reduce accidents and incidents.

  7. Collaborative knowledge acquisition for the design of context-aware alert systems

    PubMed Central

    Joffe, Erel; Havakuk, Ofer; Herskovic, Jorge R; Patel, Vimla L

    2012-01-01

    Objective To present a framework for combining implicit knowledge acquisition from multiple experts with machine learning and to evaluate this framework in the context of anemia alerts. Materials and Methods Five internal medicine residents reviewed 18 anemia alerts, while ‘talking aloud’. They identified features that were reviewed by two or more physicians to determine appropriate alert level, etiology and treatment recommendation. Based on these features, data were extracted from 100 randomly-selected anemia cases for a training set and an additional 82 cases for a test set. Two staff internists assigned an alert level, etiology and treatment recommendation before and after reviewing the entire electronic medical record. The training set of 118 cases (100 plus 18) and the test set of 82 cases were explored using RIDOR and JRip algorithms. Results The feature set was sufficient to assess 93% of anemia cases (intraclass correlation for alert level before and after review of the records by internists 1 and 2 were 0.92 and 0.95, respectively). High-precision classifiers were constructed to identify low-level alerts (precision p=0.87, recall R=0.4), iron deficiency (p=1.0, R=0.73), and anemia associated with kidney disease (p=0.87, R=0.77). Discussion It was possible to identify low-level alerts and several conditions commonly associated with chronic anemia. This approach may reduce the number of clinically unimportant alerts. The study was limited to anemia alerts. Furthermore, clinicians were aware of the study hypotheses potentially biasing their evaluation. Conclusion Implicit knowledge acquisition, collaborative filtering and machine learning were combined automatically to induce clinically meaningful and precise decision rules. PMID:22744961

  8. AVP-IC50 Pred: Multiple machine learning techniques-based prediction of peptide antiviral activity in terms of half maximal inhibitory concentration (IC50).

    PubMed

    Qureshi, Abid; Tandon, Himani; Kumar, Manoj

    2015-11-01

    Peptide-based antiviral therapeutics has gradually paved their way into mainstream drug discovery research. Experimental determination of peptides' antiviral activity as expressed by their IC50 values involves a lot of effort. Therefore, we have developed "AVP-IC50 Pred," a regression-based algorithm to predict the antiviral activity in terms of IC50 values (μM). A total of 759 non-redundant peptides from AVPdb and HIPdb were divided into a training/test set having 683 peptides (T(683)) and a validation set with 76 independent peptides (V(76)) for evaluation. We utilized important peptide sequence features like amino-acid compositions, binary profile of N8-C8 residues, physicochemical properties and their hybrids. Four different machine learning techniques (MLTs) namely Support vector machine, Random Forest, Instance-based classifier, and K-Star were employed. During 10-fold cross validation, we achieved maximum Pearson correlation coefficients (PCCs) of 0.66, 0.64, 0.56, 0.55, respectively, for the above MLTs using the best combination of feature sets. All the predictive models also performed well on the independent validation dataset and achieved maximum PCCs of 0.74, 0.68, 0.59, 0.57, respectively, on the best combination of feature sets. The AVP-IC50 Pred web server is anticipated to assist the researchers working on antiviral therapeutics by enabling them to computationally screen many compounds and focus experimental validation on the most promising set of peptides, thus reducing cost and time efforts. The server is available at http://crdd.osdd.net/servers/ic50avp. © 2015 Wiley Periodicals, Inc.

  9. Research on artificial neural network intrusion detection photochemistry based on the improved wavelet analysis and transformation

    NASA Astrophysics Data System (ADS)

    Li, Hong; Ding, Xue

    2017-03-01

    This paper combines wavelet analysis and wavelet transform theory with artificial neural networks by preprocessing point feature attributes before intrusion detection so that they are suitable for an improved wavelet neural network. The resulting intrusion classification model gains better adaptability and self-learning ability, strengthens the wavelet neural network for the intrusion detection problem, reduces storage space, helps improve the performance of the constructed neural network, and reduces training time. Finally, the results of a simulation experiment on the KDDCup99 data set show that this method reduces the complexity of constructing the wavelet neural network while also ensuring the accuracy of the intrusion classification.

  10. Comparing Pattern Recognition Feature Sets for Sorting Triples in the FIRST Database

    NASA Astrophysics Data System (ADS)

    Proctor, D. D.

    2006-07-01

    Pattern recognition techniques have been used with increasing success for coping with the tremendous amounts of data being generated by automated surveys. Usually this process involves construction of training sets, the typical examples of data with known classifications. Given a feature set, along with the training set, statistical methods can be employed to generate a classifier. The classifier is then applied to process the remaining data. Feature set selection, however, is still an issue. This paper presents techniques developed for accommodating data for which a substantive portion of the training set cannot be classified unambiguously, a typical case for low-resolution data. Significance tests on the sort-ordered, sample-size-normalized vote distribution of an ensemble of decision trees are introduced as a method of evaluating the relative quality of feature sets. The technique is applied to comparing feature sets for sorting a particular radio galaxy morphology, bent-doubles, from the Faint Images of the Radio Sky at Twenty Centimeters (FIRST) database. Also examined are alternative functional forms for feature sets. Associated standard deviations provide the means to evaluate the effect of the number of folds, the number of classifiers per fold, and the sample size on the resulting classifications. The technique may also be applied to situations in which, although accurate classifications are available, the feature set is clearly inadequate, but it is nonetheless desired to make the best of the available information.

  11. Feature Selection for Ridge Regression with Provable Guarantees.

    PubMed

    Paul, Saurabh; Drineas, Petros

    2016-04-01

    We introduce single-set spectral sparsification as a deterministic sampling-based feature selection technique for regularized least-squares classification, which is the classification analog to ridge regression. The method is unsupervised and gives worst-case guarantees of the generalization power of the classification function after feature selection with respect to the classification function obtained using all features. We also introduce leverage-score sampling as an unsupervised randomized feature selection method for ridge regression. We provide risk bounds for both single-set spectral sparsification and leverage-score sampling on ridge regression in the fixed design setting and show that the risk in the sampled space is comparable to the risk in the full-feature space. We perform experiments on synthetic and real-world data sets; a subset of TechTC-300 data sets, to support our theory. Experimental results indicate that the proposed methods perform better than the existing feature selection methods.
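
    The sketch below illustrates the general idea of leverage-score sampling for feature selection before ridge regression, under assumptions: columns are sampled with probability proportional to leverage scores from a rank-k SVD, and ridge regression is then fitted in the reduced space. It is an illustration of the concept, not the paper's exact sampling scheme or its guarantees.

```python
# Column leverage scores from a rank-k SVD drive randomized feature
# selection; ridge regression is then fitted on the sampled columns.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1000))
y = X[:, :10] @ rng.normal(size=10) + rng.normal(scale=0.1, size=200)

k, n_keep = 20, 100
_, _, Vt = np.linalg.svd(X, full_matrices=False)
leverage = (Vt[:k] ** 2).sum(axis=0)               # column leverage scores
probs = leverage / leverage.sum()
cols = rng.choice(X.shape[1], size=n_keep, replace=False, p=probs)
model = Ridge(alpha=1.0).fit(X[:, cols], y)
print("in-sample R^2 in reduced space:", round(model.score(X[:, cols], y), 3))
```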

  12. RecceMan: an interactive recognition assistance for image-based reconnaissance: synergistic effects of human perception and computational methods for object recognition, identification, and infrastructure analysis

    NASA Astrophysics Data System (ADS)

    El Bekri, Nadia; Angele, Susanne; Ruckhäberle, Martin; Peinsipp-Byma, Elisabeth; Haelke, Bruno

    2015-10-01

    This paper introduces an interactive recognition assistance system for imaging reconnaissance. This system supports aerial image analysts on missions during two main tasks: object recognition and infrastructure analysis. Object recognition concentrates on the classification of a single object. Infrastructure analysis deals with the description of the components of an infrastructure and the recognition of the infrastructure type (e.g. military airfield). Based on satellite or aerial images, aerial image analysts are able to extract single object features and thereby recognize different object types. It is one of the most challenging tasks in imaging reconnaissance. Currently, there are no high-potential ATR (automatic target recognition) applications available; as a consequence, the human observer cannot be replaced entirely. State-of-the-art ATR applications cannot match human perception and interpretation in equal measure. Why is this still such a critical issue? First, cluttered and noisy images make it difficult to automatically extract, classify and identify object types. Second, due to changed warfare and the rise of asymmetric threats it is nearly impossible to create an underlying data set containing all features, objects or infrastructure types. Many other factors, such as environmental parameters or aspect angles, further complicate the application of ATR. Due to the lack of suitable ATR procedures, the human factor is still important and so far irreplaceable. In order to use the potential benefits of human perception and computational methods in a synergistic way, both are unified in an interactive assistance system. RecceMan® (Reconnaissance Manual) offers two different modes for aerial image analysts on missions: the object recognition mode and the infrastructure analysis mode. The aim of the object recognition mode is to recognize a certain object type based on the object features that originate from the image signatures. The infrastructure analysis mode pursues the goal of analyzing the function of the infrastructure. The image analyst visually extracts certain target object signatures, assigns them to corresponding object features and is finally able to recognize the object type. The system offers the possibility of assigning the image signatures to features given by sample images. The underlying data set contains a wide range of object features and object types for different domains such as ships or land vehicles. Each domain has its own feature tree developed by aerial image analyst experts. By selecting the corresponding features, the possible solution set of objects is automatically reduced and matches only the objects that contain the selected features. Moreover, we give an outlook on current research in the field of ground target analysis in which we deal with partly automated methods to extract image signatures and assign them to the corresponding features. This research includes methods for automatically determining the orientation of an object and geometric features such as the width and length of the object. This step makes it possible to automatically reduce the possible object types offered to the image analyst by the interactive recognition assistance system.

  13. Impaired visual search in rats reveals cholinergic contributions to feature binding in visuospatial attention.

    PubMed

    Botly, Leigh C P; De Rosa, Eve

    2012-10-01

    The visual search task established the feature integration theory of attention in humans and measures visuospatial attentional contributions to feature binding. We recently demonstrated that the neuromodulator acetylcholine (ACh), from the nucleus basalis magnocellularis (NBM), supports the attentional processes required for feature binding using a rat digging-based task. Additional research has demonstrated cholinergic contributions from the NBM to visuospatial attention in rats. Here, we combined these lines of evidence and employed visual search in rats to examine whether cortical cholinergic input supports visuospatial attention specifically for feature binding. We trained 18 male Long-Evans rats to perform visual search using touch screen-equipped operant chambers. Sessions comprised Feature Search (no feature binding required) and Conjunctive Search (feature binding required) trials using multiple stimulus set sizes. Following acquisition of visual search, 8 rats received bilateral NBM lesions using 192 IgG-saporin to selectively reduce cholinergic afferentation of the neocortex, which we hypothesized would selectively disrupt the visuospatial attentional processes needed for efficient conjunctive visual search. As expected, relative to sham-lesioned rats, ACh-NBM-lesioned rats took significantly longer to locate the target stimulus on Conjunctive Search, but not Feature Search trials, thus demonstrating that cholinergic contributions to visuospatial attention are important for feature binding in rats.

  14. Feature Selection and Pedestrian Detection Based on Sparse Representation.

    PubMed

    Yao, Shihong; Wang, Tao; Shen, Weiming; Pan, Shaoming; Chong, Yanwen; Ding, Fei

    2015-01-01

    Pedestrian detection research has largely been devoted to the extraction of effective pedestrian features, which has become one of the obstacles in pedestrian detection applications because of the variety of pedestrian features and their large dimension. Based on a theoretical analysis of six frequently used features, SIFT, SURF, Haar, HOG, LBP and LSS, and their comparison with experimental results, this paper screens out sparse feature subsets via sparse representation to investigate whether the sparse subsets have the same description abilities and the most stable features. When any two of the six features are fused, the fusion feature is sparsely represented to obtain its important components. Sparse subsets of the fusion features can be rapidly generated by avoiding calculation of the corresponding index of dimension numbers of these feature descriptors; thus, the calculation speed of the feature dimension reduction is improved and the pedestrian detection time is reduced. Experimental results show that sparse feature subsets are capable of keeping the important components of these six feature descriptors. The sparse features of HOG and LSS possess the same description ability and consume less time compared with their full features. The ratios of the sparse feature subsets of HOG and LSS to their full sets are the highest among the six, so these two features best describe the characteristics of pedestrians, and the sparse feature subsets of the HOG-LSS combination show better distinguishing ability and parsimony.

  15. Combining various types of classifiers and features extracted from magnetic resonance imaging data in schizophrenia recognition.

    PubMed

    Janousova, Eva; Schwarz, Daniel; Kasparek, Tomas

    2015-06-30

    We investigated a combination of three classification algorithms, namely the modified maximum uncertainty linear discriminant analysis (mMLDA), the centroid method, and the average linkage, with three types of features extracted from three-dimensional T1-weighted magnetic resonance (MR) brain images, specifically MR intensities, grey matter densities, and local deformations for distinguishing 49 first episode schizophrenia male patients from 49 healthy male subjects. The feature sets were reduced using intersubject principal component analysis before classification. By combining the classifiers, we were able to obtain slightly improved results when compared with single classifiers. The best classification performance (81.6% accuracy, 75.5% sensitivity, and 87.8% specificity) was significantly better than classification by chance. We also showed that classifiers based on features calculated using more computation-intensive image preprocessing perform better; mMLDA with classification boundary calculated as weighted mean discriminative scores of the groups had improved sensitivity but similar accuracy compared to the original MLDA; reducing a number of eigenvectors during data reduction did not always lead to higher classification accuracy, since noise as well as the signal important for classification were removed. Our findings provide important information for schizophrenia research and may improve accuracy of computer-aided diagnostics of neuropsychiatric diseases. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  16. From big data to rich data: The key features of athlete wheelchair mobility performance.

    PubMed

    van der Slikke, R M A; Berger, M A M; Bregman, D J J; Veeger, H E J

    2016-10-03

    Quantitative assessment of an athlete's individual wheelchair mobility performance is one prerequisite needed to evaluate game performance, improve wheelchair settings and optimize training routines. Inertial Measurement Unit (IMU) based methods can be used to perform such quantitative assessment, providing a large amount of kinematic data. The goal of this research was to reduce that large amount of data to a set of key features best describing wheelchair mobility performance in match play and to present them in a meaningful way for both scientists and athletes. To test the discriminative power, wheelchair mobility characteristics of athletes with different performance levels were compared. The wheelchair kinematics of 29 (inter-)national level athletes were measured during a match using three inertial sensors mounted on the wheelchair. Principal component analysis was used to reduce 22 kinematic outcomes to a set of six outcomes regarding linear and rotational movement; speed and acceleration; and average and best performance. In addition, univariate general linear models were used to explore whether groups of athletes with known performance differences based on their impairment classification also differed with respect to these key outcomes. For all six key outcomes, classification was shown to be a significant factor (p<0.05). We composed a set of six key kinematic outcomes that accurately describe wheelchair mobility performance in match play. The key kinematic outcomes were displayed in an easy-to-interpret way, usable for athletes, coaches and scientists. This standardized representation enables comparison of different wheelchair sports regarding wheelchair mobility, but also evaluation at the level of an individual athlete. By this means, the tool could enhance further development of wheelchair sports in general. Copyright © 2016 Elsevier Ltd. All rights reserved.
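
    A minimal sketch of the dimensionality reduction described above: principal component analysis collapsing 22 kinematic outcomes per athlete into a handful of component scores. The data are random placeholders with the same nominal shape (29 athletes by 22 outcomes).

```python
# PCA reduction of 22 kinematic outcomes to 6 key component scores per athlete.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

kinematics = np.random.rand(29, 22)          # 29 athletes x 22 outcomes
standardized = StandardScaler().fit_transform(kinematics)
scores = PCA(n_components=6).fit_transform(standardized)
print(scores.shape)                          # (29, 6) key-outcome scores
```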

  17. Parameter Optimization for Feature and Hit Generation in a General Unknown Screening Method-Proof of Concept Study Using a Design of Experiment Approach for a High Resolution Mass Spectrometry Procedure after Data Independent Acquisition.

    PubMed

    Elmiger, Marco P; Poetzsch, Michael; Steuer, Andrea E; Kraemer, Thomas

    2018-03-06

    High resolution mass spectrometry and modern data independent acquisition (DIA) methods enable the creation of general unknown screening (GUS) procedures. However, even when DIA is used, its potential is far from being exploited, because often, the untargeted acquisition is followed by a targeted search. Applying an actual GUS (including untargeted screening) produces an immense amount of data that must be dealt with. An optimization of the parameters regulating the feature detection and hit generation algorithms of the data processing software could significantly reduce the amount of unnecessary data and thereby the workload. Design of experiment (DoE) approaches allow a simultaneous optimization of multiple parameters. In a first step, parameters are evaluated (crucial or noncrucial). Second, crucial parameters are optimized. The aim in this study was to reduce the number of hits, without missing analytes. The obtained parameter settings from the optimization were compared to the standard settings by analyzing a test set of blood samples spiked with 22 relevant analytes as well as 62 authentic forensic cases. The optimization lead to a marked reduction of workload (12.3 to 1.1% and 3.8 to 1.1% hits for the test set and the authentic cases, respectively) while simultaneously increasing the identification rate (68.2 to 86.4% and 68.8 to 88.1%, respectively). This proof of concept study emphasizes the great potential of DoE approaches to master the data overload resulting from modern data independent acquisition methods used for general unknown screening procedures by optimizing software parameters.

  18. Rapid production of optimal-quality reduced-resolution representations of very large databases

    DOEpatents

    Sigeti, David E.; Duchaineau, Mark; Miller, Mark C.; Wolinsky, Murray; Aldrich, Charles; Mineev-Weinstein, Mark B.

    2001-01-01

    View space representation data is produced in real time from a world space database representing terrain features. The world space database is first preprocessed. A database is formed having one element for each spatial region corresponding to a finest selected level of detail. A multiresolution database is then formed by merging elements, and a strict error metric is computed for each element at each level of detail that is independent of the parameters defining the view space. The multiresolution database and associated strict error metrics are then processed in real time to produce real time frame representations. View parameters for a view volume, comprising a view location and field of view, are selected. Using the view parameters, the strict error metric is converted to a view-dependent error metric. Elements with the coarsest resolution are chosen for an initial representation. First elements that are at least partially within the view volume are selected from the initial representation data set. The first elements are placed in a split queue ordered by the value of the view-dependent error metric. A determination is made whether the number of first elements in the queue meets or exceeds a predetermined number of elements or whether the largest error metric is less than or equal to a selected upper error metric bound; if not, the element at the head of the queue is force split and the resulting elements are inserted into the queue. Force splitting is continued until the determination is positive, forming a first multiresolution set of elements. The first multiresolution set of elements is then output as reduced resolution view space data representing the terrain features.
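
    The refinement loop described in this record amounts to a greedy, error-driven split of a priority queue. The sketch below illustrates that loop under stated assumptions: each element object is a hypothetical stand-in that is assumed to provide a view_error(view) method (the view-dependent error metric) and a split() method returning its children. It is not the patented implementation.

        import heapq

        def refine(initial_elements, view, max_elements, error_bound):
            """Greedy split-queue refinement of a multiresolution representation.

            `initial_elements` are coarsest-level elements at least partially
            inside the view volume; element objects are illustrative stand-ins
            for the patent's data structures.
            """
            # heapq is a min-heap, so store negated errors to pop the worst element first.
            queue = [(-e.view_error(view), i, e) for i, e in enumerate(initial_elements)]
            heapq.heapify(queue)
            counter = len(queue)
            while queue:
                neg_err, _, elem = queue[0]
                # Positive determination: element budget reached or worst error acceptable.
                if len(queue) >= max_elements or -neg_err <= error_bound:
                    break
                heapq.heappop(queue)
                for child in elem.split():  # force split the worst element
                    heapq.heappush(queue, (-child.view_error(view), counter, child))
                    counter += 1
            return [e for _, _, e in queue]  # the first multiresolution set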

  19. Spectral feature design in high dimensional multispectral data

    NASA Technical Reports Server (NTRS)

    Chen, Chih-Chien Thomas; Landgrebe, David A.

    1988-01-01

    The High resolution Imaging Spectrometer (HIRIS) is designed to acquire images simultaneously in 192 spectral bands in the 0.4 to 2.5 micrometer wavelength region. It will make possible the collection of essentially continuous reflectance spectra at a spectral resolution sufficient to extract significantly enhanced amounts of information from return signals as compared to existing systems. The advantages of such high dimensional data come at a cost of increased system and data complexity. For example, since the finer the spectral resolution, the higher the data rate, it becomes impractical to design the sensor to be operated continuously. It is essential to find new ways to preprocess the data which reduce the data rate while maintaining the information content of the high dimensional signal produced. Four spectral feature design techniques are developed from the Weighted Karhunen-Loeve Transforms: (1) non-overlapping band feature selection algorithm; (2) overlapping band feature selection algorithm; (3) Walsh function approach; and (4) infinite clipped optimal function approach. The infinite clipped optimal function approach is chosen since its features are the easiest to find and their classification performance is the best. After the preprocessed data has been received at the ground station, canonical analysis is further used to find the best set of features under the criterion that maximal class separability is achieved. Both 100 dimensional vegetation data and 200 dimensional soil data were used to test the spectral feature design system. It was shown that the infinite clipped versions of the first 16 optimal features had excellent classification performance: the overall probability of correct classification exceeded 90 percent while the downlink data rate was reduced by a factor of 10.

  20. Sam2bam: High-Performance Framework for NGS Data Preprocessing Tools

    PubMed Central

    Cheng, Yinhe; Tzeng, Tzy-Hwa Kathy

    2016-01-01

    This paper introduces a high-throughput software tool framework called sam2bam that enables users to significantly speed up pre-processing for next-generation sequencing data. The sam2bam is especially efficient on single-node multi-core large-memory systems. It can reduce the runtime of data pre-processing in marking duplicate reads on a single node system by 156–186x compared with de facto standard tools. The sam2bam consists of parallel software components that can fully utilize multiple processors, available memory, high-bandwidth storage, and hardware compression accelerators, if available. The sam2bam provides file format conversion between well-known genome file formats, from SAM to BAM, as a basic feature. Additional features such as analyzing, filtering, and converting input data are provided by using plug-in tools, e.g., duplicate marking, which can be attached to sam2bam at runtime. We demonstrated that sam2bam could significantly reduce the runtime of next generation sequencing (NGS) data pre-processing from about two hours to about one minute for a whole-exome data set on a 16-core single-node system using up to 130 GB of memory. The sam2bam could reduce the runtime of NGS data pre-processing from about 20 hours to about nine minutes for a whole-genome sequencing data set on the same system using up to 711 GB of memory. PMID:27861637

  1. Computer-aided-diagnosis (CAD) for colposcopy

    NASA Astrophysics Data System (ADS)

    Lange, Holger; Ferris, Daron G.

    2005-04-01

    Uterine cervical cancer is the second most common cancer among women worldwide. Colposcopy is a diagnostic method, whereby a physician (colposcopist) visually inspects the lower genital tract (cervix, vulva and vagina), with special emphasis on the subjective appearance of metaplastic epithelium comprising the transformation zone on the cervix. Cervical cancer precursor lesions and invasive cancer exhibit certain distinctly abnormal morphologic features. Lesion characteristics such as margin; color or opacity; blood vessel caliber, intercapillary spacing and distribution; and contour are considered by colposcopists to derive a clinical diagnosis. Clinicians and academia have suggested and shown proof of concept that automated image analysis of cervical imagery can be used for cervical cancer screening and diagnosis, with the potential to directly improve women's health care and reduce associated costs. STI Medical Systems is developing a Computer-Aided-Diagnosis (CAD) system for colposcopy -- ColpoCAD. At the heart of ColpoCAD is a complex multi-sensor, multi-data and multi-feature image analysis system. A functional description is presented of the envisioned ColpoCAD system, broken down into: Modality Data Management System, Image Enhancement, Feature Extraction, Reference Database, and Diagnosis and directed Biopsies. The system design and development process of the image analysis system is outlined. The system design provides a modular and open architecture built on feature based processing. The core feature set includes the visual features used by colposcopists. This feature set can be extended to include new features introduced by new instrument technologies, like fluorescence and impedance, and any other plausible feature that can be extracted from the cervical data. Preliminary results of our research on detecting the three most important features: blood vessel structures, acetowhite regions and lesion margins are shown. As this is a new and very complex field in medical image processing, the hope is that this paper can provide a framework and basis to encourage and facilitate collaboration and discussion between industry, academia, and medical practitioners.

  2. Analysis of Morphological Features of Benign and Malignant Breast Cell Extracted From FNAC Microscopic Image Using the Pearsonian System of Curves.

    PubMed

    Rajbongshi, Nijara; Bora, Kangkana; Nath, Dilip C; Das, Anup K; Mahanta, Lipi B

    2018-01-01

    Cytological changes in terms of shape and size of nuclei are some of the common morphometric features used to study breast cancer, and they can be observed by careful screening of fine needle aspiration cytology (FNAC) images. This study attempts to categorize a collection of FNAC microscopic images into benign and malignant classes based on a family of probability distributions, using some morphometric features of cell nuclei. For this study, features namely area, perimeter, eccentricity, compactness, and circularity of cell nuclei were extracted from FNAC images of both benign and malignant samples using an image processing technique. All experiments were performed on a generated FNAC image database containing 564 malignant (cancerous) and 693 benign (noncancerous) cell level images. The set of five extracted features was reduced to three (area, perimeter, and circularity) based on the mean statistic. Finally, the data were fitted to the generalized Pearsonian system of frequency curves, so that the resulting distribution can be used as a statistical model. The Pearsonian system is a family of distributions in which the selection criterion kappa (κ) is computed as a function of the first four central moments. For the benign group, kappa (κ) corresponding to area, perimeter, and circularity was -0.00004, 0.0000, and 0.04155, and for the malignant group it was 1016942, 0.01464, and -0.3213, respectively. Thus, the families of distribution related to these features differ between the benign and malignant groups, and therefore the characterization of their probability curves will also differ.
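
    For reference, the Pearson selection criterion named above can be computed directly from sample moments using the standard definition kappa = b1*(b2 + 3)^2 / (4*(4*b2 - 3*b1)*(2*b2 - 3*b1 - 6)), with b1 the squared skewness and b2 the kurtosis. The sketch below is a minimal illustration only; the file name is a placeholder and the thresholds used to pick a Pearson type are not reproduced.

        import numpy as np

        def pearson_kappa(values):
            """Pearson's selection criterion computed from the first four central moments."""
            x = np.asarray(values, dtype=float)
            mu = x.mean()
            mu2 = np.mean((x - mu) ** 2)          # variance
            mu3 = np.mean((x - mu) ** 3)
            mu4 = np.mean((x - mu) ** 4)
            b1 = mu3 ** 2 / mu2 ** 3              # squared skewness
            b2 = mu4 / mu2 ** 2                   # kurtosis
            return b1 * (b2 + 3) ** 2 / (4 * (4 * b2 - 3 * b1) * (2 * b2 - 3 * b1 - 6))

        # e.g. kappa of the nuclear-area feature for one class (file name is illustrative):
        # print(pearson_kappa(np.loadtxt("benign_area.txt")))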

  3. AVC: Selecting discriminative features on basis of AUC by maximizing variable complementarity.

    PubMed

    Sun, Lei; Wang, Jun; Wei, Jinmao

    2017-03-14

    The Receiver Operator Characteristic (ROC) curve is well known for evaluating classification performance in the biomedical field. Owing to its superiority in dealing with imbalanced and cost-sensitive data, the ROC curve has been exploited as a popular metric to evaluate and find disease-related genes (features). The existing ROC-based feature selection approaches are simple and effective in evaluating individual features. However, these approaches may fail to find the real target feature subset because they lack effective means to reduce the redundancy between features, which is essential in machine learning. In this paper, we propose to assess feature complementarity by measuring the distances between misclassified instances and their nearest misses on the dimensions of pairwise features. If a misclassified instance and its nearest miss on one feature dimension are far apart on another feature dimension, the two features are regarded as complementary to each other. Subsequently, we propose a novel filter feature selection approach on the basis of the ROC analysis. The new approach employs an efficient heuristic search strategy to select optimal features with the highest complementarity. The experimental results on a broad range of microarray data sets validate that the classifiers built on the feature subset selected by our approach can reach the minimal balanced error rate with a small number of significant features. Compared with other ROC-based feature selection approaches, our new approach selects fewer features and effectively improves the classification performance.
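
    The complementarity idea sketched above can be read as a nearest-miss distance measured on a second feature dimension. The snippet below is only one plausible, simplified reading of that idea; all names are placeholders, and the paper's exact scoring and its AUC-based search are not reproduced.

        import numpy as np

        def complementarity(X, y, j, k):
            """Rough score of how much feature k complements feature j."""
            X = np.asarray(X, dtype=float)
            y = np.asarray(y)
            xj, xk = X[:, j], X[:, k]
            score, count = 0.0, 0
            for i in range(len(y)):
                d = np.abs(xj - xj[i])
                d[i] = np.inf
                nearest = np.argmin(d)                           # nearest neighbour on feature j
                if y[nearest] == y[i]:
                    continue                                     # instance handled by feature j alone
                miss = np.where(y != y[i])[0]                    # instances of a different class
                nm = miss[np.argmin(np.abs(xj[miss] - xj[i]))]   # nearest miss on feature j
                score += abs(xk[i] - xk[nm])                     # separation achieved on feature k
                count += 1
            return score / count if count else 0.0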

  4. Diagnostic and prognostic histopathology system using morphometric indices

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Parvin, Bahram; Chang, Hang; Han, Ju

    Determining at least one of a prognosis or a therapy for a patient based on a stained tissue section of the patient. An image of a stained tissue section of a patient is processed by a processing device. A set of feature values for a set of cell-based features is extracted from the processed image, and the processed image is associated with a particular cluster of a plurality of clusters based on the set of feature values, where the plurality of clusters is defined with respect to a feature space corresponding to the set of features.

  5. Tumor Burden Analysis on Computed Tomography by Automated Liver and Tumor Segmentation

    PubMed Central

    Linguraru, Marius George; Richbourg, William J.; Liu, Jianfei; Watt, Jeremy M.; Pamulapati, Vivek; Wang, Shijun; Summers, Ronald M.

    2013-01-01

    The paper presents the automated computation of hepatic tumor burden from abdominal CT images of diseased populations with inconsistent enhancement. The automated segmentation of livers is addressed first. A novel three-dimensional (3D) affine invariant shape parameterization is employed to compare local shape across organs. By generating a regular sampling of the organ's surface, this parameterization can be effectively used to compare features of a set of closed 3D surfaces point-to-point, while avoiding common problems with the parameterization of concave surfaces. From an initial segmentation of the livers, the areas of atypical local shape are determined using training sets. A geodesic active contour locally corrects the segmentations of the livers in abnormal images. Graph cuts segment the hepatic tumors using shape and enhancement constraints. Liver segmentation errors are reduced significantly and all tumors are detected. Finally, support vector machines and feature selection are employed to reduce the number of false tumor detections. A tumor detection true positive fraction of 100% is achieved at 2.3 false positives/case, and the tumor burden is estimated with 0.9% error. Results from the test data demonstrate the method's robustness in analyzing livers from difficult clinical cases to allow the temporal monitoring of patients with hepatic cancer. PMID:22893379

  6. An assessment of the positive partnership project in Thailand: key considerations for scaling-up microcredit loans for HIV-positive and negative pairs in other settings.

    PubMed

    Viravaidya, M; Wolf, R C; Guest, P

    2008-01-01

    Stigmatization and discrimination against people living with HIV/AIDS (PLHA), and their families, remains a barrier to participation in prevention and care programmes. This barrier takes on added significance as Thailand expands provision of free antiretroviral therapy (ART). This paper documents an innovative approach to improve quality of life for PLHA, while reducing levels of stigma and discrimination. The Population and Community Development Association (PDA) began implementing the Positive Partnership Project (PPP) in 2002. In this project, an HIV-negative person must team up with an HIV-positive person to become eligible for a loan for income-generating activities. The use of microcredit to explicitly reduce stigma and discrimination is a unique feature of the PPP. While the microcredit component of the project is an important dimension for improving the status of participating PLHA, the impacts of the project extend far beyond the PLHA who receive loans. Both directly and indirectly, it has contributed to improved quality of life and economic conditions for PLHA, while raising their visibility and acceptance in hundreds of communities throughout urban and rural Thailand. This paper identifies key features of the project and considerations for adapting its use in other settings.

  7. An efficient scheme for automatic web pages categorization using the support vector machine

    NASA Astrophysics Data System (ADS)

    Bhalla, Vinod Kumar; Kumar, Neeraj

    2016-07-01

    In the past few years, with the evolution of the Internet and related technologies, the number of Internet users has grown exponentially. These users demand access to relevant web pages within a fraction of a second. To achieve this goal, an efficient categorization of web page contents is required. Manual categorization of these billions of web pages with high accuracy is a challenging task. Most of the existing techniques reported in the literature are semi-automatic, and a high level of accuracy cannot be achieved with them. To achieve these goals, this paper proposes an automatic categorization of web pages into domain categories. The proposed scheme is based on the identification of specific and relevant features of the web pages. In the proposed scheme, extraction and evaluation of features are done first, followed by filtering of the feature set for categorization of domain web pages. A feature extraction tool based on the HTML document object model of the web page is developed in the proposed scheme. Feature extraction and weight assignment are based on a collection of domain-specific keywords developed by considering various domain pages. Moreover, the keyword list is reduced on the basis of keyword ids. Also, stemming of keywords and tag text is done to achieve a higher accuracy. An extensive feature set is generated to develop a robust classification technique. The proposed scheme was evaluated using a machine learning method in combination with feature extraction and statistical analysis, using a support vector machine kernel as the classification tool. The results obtained confirm the effectiveness of the proposed scheme in terms of its accuracy in different categories of web pages.

  8. MO-DE-207A-02: A Feature-Preserving Image Reconstruction Method for Improved Pancreaticlesion Classification in Diagnostic CT Imaging

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Xu, J; Tsui, B; Noo, F

    Purpose: To develop a feature-preserving model based image reconstruction (MBIR) method that improves performance in pancreatic lesion classification at equal or reduced radiation dose. Methods: A set of pancreatic lesion models was created with both benign and premalignant lesion types. These two classes of lesions are distinguished by their fine internal structures; their delineation is therefore crucial to the task of pancreatic lesion classification. To reduce image noise while preserving the features of the lesions, we developed an MBIR method with curvature-based regularization. The novel regularization encourages formation of smooth surfaces that model both the exterior shape and the internal features of pancreatic lesions. Given that the curvature depends on the unknown image, image reconstruction or denoising becomes a non-convex optimization problem; to address this issue, an iterative-reweighting scheme was used to calculate and update the curvature using the image from the previous iteration. Evaluation was carried out with insertion of the lesion models into the pancreas of a patient CT image. Results: Visual inspection was used to compare conventional TV regularization with our curvature-based regularization. Several penalty strengths were considered for TV regularization, all of which resulted in erasing portions of the septation (thin partition) in a premalignant lesion. At matched noise variance (50% noise reduction in the patient stomach region), the connectivity of the septation was well preserved using the proposed curvature-based method. Conclusion: The curvature-based regularization is able to reduce image noise while simultaneously preserving the lesion features. This method could potentially improve task performance for pancreatic lesion classification at equal or reduced radiation dose. The result is of high significance for longitudinal surveillance studies of patients with pancreatic cysts, which may develop into pancreatic cancer. The Senior Author receives financial support from Siemens GmbH Healthcare.

  9. Neurodynamic evaluation of hearing aid features using EEG correlates of listening effort.

    PubMed

    Bernarding, Corinna; Strauss, Daniel J; Hannemann, Ronny; Seidler, Harald; Corona-Strauss, Farah I

    2017-06-01

    In this study, we propose a novel estimate of listening effort using electroencephalographic data. This method is a translation of our past findings, gained from the evoked electroencephalographic activity, to the oscillatory EEG activity. To test this technique, electroencephalographic data were recorded from experienced hearing aid users with moderate hearing loss while they wore their hearing aids. The investigated hearing aid settings were: a directional microphone combined with a noise reduction algorithm in a medium and a strong setting, the noise reduction setting turned off, and a setting using omnidirectional microphones without any noise reduction. The results suggest that the electroencephalographic estimate of listening effort seems to be a useful tool for mapping the effort exerted by the participants. In addition, the results indicate that a directional processing mode can reduce listening effort in multitalker listening situations.

  10. When will Low-Contrast Features be Visible in a STEM X-Ray Spectrum Image?

    PubMed

    Parish, Chad M

    2015-06-01

    When will a small or low-contrast feature, such as an embedded second-phase particle, be visible in a scanning transmission electron microscopy (STEM) X-ray map? This work illustrates a computationally inexpensive method to simulate X-ray maps and spectrum images (SIs), based upon the equations of X-ray generation and detection. To particularize the general procedure, the example of a nanostructured ferritic alloy (NFA) containing nm-sized Y2Ti2O7 precipitates embedded in a ferritic stainless steel matrix is chosen. The proposed model produces physically plausible simulated SI data sets, which can either be reduced to X-ray dot maps or analyzed via multivariate statistical analysis. NFA X-ray maps acquired using three different STEM instruments match the generated simulations quite well, despite the large number of simplifying assumptions used. A figure of merit, the electron dose multiplied by the X-ray collection solid angle, is proposed to compare feature detectability from one data set (simulated or experimental) to another. The proposed method can be used to scope which experiments are feasible under specific analysis conditions on a given microscope. Future applications, such as spallation proton-neutron irradiations, core-shell nanoparticles, or dopants in polycrystalline photovoltaic solar cells, are proposed.

  11. Anomaly Detection Using an Ensemble of Feature Models

    PubMed Central

    Noto, Keith; Brodley, Carla; Slonim, Donna

    2011-01-01

    We present a new approach to semi-supervised anomaly detection. Given a set of training examples believed to come from the same distribution or class, the task is to learn a model that will be able to distinguish examples in the future that do not belong to the same class. Traditional approaches typically compare the position of a new data point to the set of “normal” training data points in a chosen representation of the feature space. For some data sets, the normal data may not have discernible positions in feature space, but do have consistent relationships among some features that fail to appear in the anomalous examples. Our approach learns to predict the values of training set features from the values of other features. After we have formed an ensemble of predictors, we apply this ensemble to new data points. To combine the contribution of each predictor in our ensemble, we have developed a novel, information-theoretic anomaly measure that our experimental results show selects against noisy and irrelevant features. Our results on 47 data sets show that for most data sets, this approach significantly improves performance over current state-of-the-art feature space distance and density-based approaches. PMID:22020249
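
    As a concrete illustration of the idea of predicting each feature from the others, the sketch below fits one regressor per feature on normal data and scores new points by their mean squared prediction error. It is a simplified stand-in: the regressor choice is arbitrary, and the paper's information-theoretic combination of predictors is not reproduced.

        import numpy as np
        from sklearn.ensemble import RandomForestRegressor

        class FeaturePredictionAnomaly:
            """Anomaly scoring by predicting each feature from the remaining features."""

            def fit(self, X):
                X = np.asarray(X, dtype=float)
                self.models_ = []
                for j in range(X.shape[1]):
                    rest = np.delete(X, j, axis=1)               # all features except j
                    model = RandomForestRegressor(n_estimators=50, random_state=0)
                    model.fit(rest, X[:, j])                     # predict feature j from the rest
                    self.models_.append(model)
                return self

            def score(self, X):
                X = np.asarray(X, dtype=float)
                errors = []
                for j, model in enumerate(self.models_):
                    rest = np.delete(X, j, axis=1)
                    errors.append((model.predict(rest) - X[:, j]) ** 2)
                return np.mean(errors, axis=0)                   # higher means more anomalous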

  12. Membership-degree preserving discriminant analysis with applications to face recognition.

    PubMed

    Yang, Zhangjing; Liu, Chuancai; Huang, Pu; Qian, Jianjun

    2013-01-01

    In pattern recognition, feature extraction techniques have been widely employed to reduce the dimensionality of high-dimensional data. In this paper, we propose a novel feature extraction algorithm called membership-degree preserving discriminant analysis (MPDA), based on the Fisher criterion and fuzzy set theory, for face recognition. In the proposed algorithm, the membership degree of each sample to particular classes is first calculated by the fuzzy k-nearest neighbor (FKNN) algorithm to characterize the similarity between each sample and the class centers, and then the membership degree is incorporated into the definitions of the between-class scatter and the within-class scatter. The feature extraction criterion of maximizing the ratio of the between-class scatter to the within-class scatter is then applied. Experimental results on the ORL, Yale, and FERET face databases demonstrate the effectiveness of the proposed algorithm.
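
    The membership-degree step can be pictured with the classical Keller-style fuzzy k-nearest-neighbor assignment shown below; the exact weighting used in MPDA may differ, and the subsequent fuzzy scatter matrices are not shown.

        import numpy as np

        def fknn_membership(X, y, k=5):
            """Keller-style fuzzy kNN membership degrees, shape (n_samples, n_classes)."""
            X = np.asarray(X, dtype=float)
            y = np.asarray(y)
            classes = np.unique(y)
            U = np.zeros((len(y), len(classes)))
            for i in range(len(y)):
                d = np.linalg.norm(X - X[i], axis=1)
                d[i] = np.inf                          # exclude the sample itself
                nn = np.argsort(d)[:k]                 # indices of the k nearest neighbours
                for c_idx, c in enumerate(classes):
                    frac = np.mean(y[nn] == c)         # fraction of neighbours in class c
                    U[i, c_idx] = 0.51 + 0.49 * frac if y[i] == c else 0.49 * frac
            return U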

  13. Deep Constrained Siamese Hash Coding Network and Load-Balanced Locality-Sensitive Hashing for Near Duplicate Image Detection.

    PubMed

    Hu, Weiming; Fan, Yabo; Xing, Junliang; Sun, Liang; Cai, Zhaoquan; Maybank, Stephen

    2018-09-01

    We construct a new efficient near-duplicate image detection method using a hierarchical hash code learning neural network and load-balanced locality-sensitive hashing (LSH) indexing. We propose a deep constrained siamese hash coding neural network combined with deep feature learning. Our neural network is able to extract effective features for near-duplicate image detection. The extracted features are used to construct an LSH-based index. We propose a load-balanced LSH method to produce load-balanced buckets in the hashing process. The load-balanced LSH significantly reduces the query time. Based on the proposed load-balanced LSH, we design an effective and feasible algorithm for near-duplicate image detection. Extensive experiments on three benchmark data sets demonstrate the effectiveness of our deep siamese hash encoding network and load-balanced LSH.

  14. Feature Selection Methods for Zero-Shot Learning of Neural Activity.

    PubMed

    Caceres, Carlos A; Roos, Matthew J; Rupp, Kyle M; Milsap, Griffin; Crone, Nathan E; Wolmetz, Michael E; Ratto, Christopher R

    2017-01-01

    Dimensionality poses a serious challenge when making predictions from human neuroimaging data. Across imaging modalities, large pools of potential neural features (e.g., responses from particular voxels, electrodes, and temporal windows) have to be related to typically limited sets of stimuli and samples. In recent years, zero-shot prediction models have been introduced for mapping between neural signals and semantic attributes, which allows for classification of stimulus classes not explicitly included in the training set. While choices about feature selection can have a substantial impact when closed-set accuracy, open-set robustness, and runtime are competing design objectives, no systematic study of feature selection for these models has been reported. Instead, a relatively straightforward feature stability approach has been adopted and successfully applied across models and imaging modalities. To characterize the tradeoffs in feature selection for zero-shot learning, we compared correlation-based stability to several other feature selection techniques on comparable data sets from two distinct imaging modalities: functional Magnetic Resonance Imaging and Electrocorticography. While most of the feature selection methods resulted in similar zero-shot prediction accuracies and spatial/spectral patterns of selected features, there was one exception: a novel feature/attribute correlation approach was able to achieve those accuracies with far fewer features, suggesting the potential for simpler prediction models that yield high zero-shot classification accuracy.

  15. Off-line data reduction

    NASA Astrophysics Data System (ADS)

    Gutowski, Marek W.

    1992-12-01

    Presented is a novel, heuristic algorithm, based on fuzzy set theory, allowing for significant off-line data reduction. Given equidistant data, the algorithm discards some points while retaining others with their original values. The fraction of original data points retained is typically 1/6 of the initial value. The reduced data set preserves all the essential features of the input curve. It is possible to reconstruct the original information to a high degree of precision by means of natural cubic splines, rational cubic splines or even linear interpolation. The main fields of application should be non-linear data fitting (substantial savings in CPU time) and graphics (storage space savings).

  16. Incremental wind tunnel testing of high lift systems

    NASA Astrophysics Data System (ADS)

    Victor, Pricop Mihai; Mircea, Boscoianu; Daniel-Eugeniu, Crunteanu

    2016-06-01

    The efficiency of trailing edge high lift systems is essential for future long-range transport aircraft evolving in the direction of laminar wings, because they have to compensate for the low performance of the leading edge devices. Modern high lift systems are subject to high performance requirements and constrained to simple actuation, combined with a reduced number of aerodynamic elements. Passive or active flow control is thus required for performance enhancement. An experimental investigation of a reduced-kinematics flap combined with passive flow control took place in a low speed wind tunnel. The most important features of the experimental setup are the relatively large size, corresponding to a Reynolds number of about 2 million, the sweep angle of 30 degrees corresponding to long-range airliners with high sweep angle wings, and the large number of flap settings and mechanical vortex generators. The model description, flap settings, methodology and results are presented.

  17. Acoustic-Seismic Mixed Feature Extraction Based on Wavelet Transform for Vehicle Classification in Wireless Sensor Networks.

    PubMed

    Zhang, Heng; Pan, Zhongming; Zhang, Wenna

    2018-06-07

    An acoustic-seismic mixed feature extraction method based on the wavelet coefficient energy ratio (WCER) of the target signal is proposed in this study for classifying vehicle targets in wireless sensor networks. The signal was decomposed into a set of wavelet coefficients using the à trous algorithm, which is a concise method used to implement the wavelet transform of a discrete signal sequence. After the wavelet coefficients of the target acoustic and seismic signals were obtained, the energy ratio of each layer coefficient was calculated as the feature vector of the target signals. Subsequently, the acoustic and seismic features were merged into an acoustic-seismic mixed feature to improve the target classification accuracy after the acoustic and seismic WCER features of the target signal were simplified using the hierarchical clustering method. We selected the support vector machine method for classification and utilized the data acquired from a real-world experiment to validate the proposed method. The calculated results show that the WCER feature extraction method can effectively extract the target features from target signals. Feature simplification can reduce the time consumption of feature extraction and classification, with no effect on the target classification accuracy. The use of acoustic-seismic mixed features effectively improved target classification accuracy by approximately 12% compared with either acoustic signal or seismic signal alone.
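
    A minimal sketch of the WCER computation is given below, assuming PyWavelets' stationary wavelet transform (pywt.swt) as a stand-in for the à trous decomposition; the wavelet type, decomposition level and zero-padding are illustrative choices, not taken from the paper.

        import numpy as np
        import pywt  # PyWavelets; swt() implements the stationary (a trous) transform

        def wcer_features(signal, wavelet="db4", level=4):
            """Wavelet coefficient energy ratios of the detail layers, as a feature vector."""
            signal = np.asarray(signal, dtype=float)
            # swt() requires the length to be a multiple of 2**level; pad with zeros if needed.
            pad = (-len(signal)) % (2 ** level)
            if pad:
                signal = np.concatenate([signal, np.zeros(pad)])
            coeffs = pywt.swt(signal, wavelet, level=level)      # list of (cA, cD) pairs
            energies = np.array([np.sum(cD ** 2) for _, cD in coeffs])
            return energies / energies.sum()

        # The acoustic and seismic WCER vectors could then be concatenated into the
        # mixed feature before classification with an SVM.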

  18. Seizure Forecasting and the Preictal State in Canine Epilepsy.

    PubMed

    Varatharajah, Yogatheesan; Iyer, Ravishankar K; Berry, Brent M; Worrell, Gregory A; Brinkmann, Benjamin H

    2017-02-01

    The ability to predict seizures may enable patients with epilepsy to better manage their medications and activities, potentially reducing side effects and improving quality of life. Forecasting epileptic seizures remains a challenging problem, but machine learning methods using intracranial electroencephalographic (iEEG) measures have shown promise. A machine-learning-based pipeline was developed to process iEEG recordings and generate seizure warnings. Results support the ability to forecast seizures at rates greater than a Poisson random predictor for all feature sets and machine learning algorithms tested. In addition, subject-specific neurophysiological changes in multiple features are reported preceding lead seizures, providing evidence supporting the existence of a distinct and identifiable preictal state.

  19. SEIZURE FORECASTING AND THE PREICTAL STATE IN CANINE EPILEPSY

    PubMed Central

    Varatharajah, Yogatheesan; Iyer, Ravishankar K.; Berry, Brent M.; Worrell, Gregory A.; Brinkmann, Benjamin H.

    2017-01-01

    The ability to predict seizures may enable patients with epilepsy to better manage their medications and activities, potentially reducing side effects and improving quality of life. Forecasting epileptic seizures remains a challenging problem, but machine learning methods using intracranial electroencephalographic (iEEG) measures have shown promise. A machine-learning-based pipeline was developed to process iEEG recordings and generate seizure warnings. Results support the ability to forecast seizures at rates greater than a Poisson random predictor for all feature sets and machine learning algorithms tested. In addition, subject-specific neurophysiological changes in multiple features are reported preceding lead seizures, providing evidence supporting the existence of a distinct and identifiable preictal state. PMID:27464854

  20. Artificial intelligence tools for pattern recognition

    NASA Astrophysics Data System (ADS)

    Acevedo, Elena; Acevedo, Antonio; Felipe, Federico; Avilés, Pedro

    2017-06-01

    In this work, we present a system for pattern recognition that combines the problem-solving power of genetic algorithms with the efficiency of morphological associative memories. We use a set of 48 tire prints divided into 8 brands of tires. The images have dimensions of 200 x 200 pixels. We applied the Hough transform to obtain lines as the main features. The number of lines obtained is 449. The genetic algorithm reduces the number of features to ten suitable lines, which yield 100% recognition. Morphological associative memories were used as the evaluation function. The selection algorithms were tournament and roulette-wheel selection. For reproduction, we applied one-point, two-point and uniform crossover.
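
    A toy sketch of this kind of genetic feature selection is shown below. The fitness function is left abstract (in the paper, the morphological associative memory plays this role), tournament selection with one-point crossover is used, and all parameter values are illustrative.

        import random

        def ga_select(n_features, n_keep, fitness, pop_size=40, generations=60):
            """Toy genetic algorithm selecting n_keep feature indices out of n_features.

            `fitness(subset)` should return the recognition rate obtained with that subset.
            """
            def random_individual():
                return sorted(random.sample(range(n_features), n_keep))

            def crossover(a, b):
                cut = random.randrange(1, n_keep)            # one-point crossover
                child = list(dict.fromkeys(a[:cut] + b[cut:]))
                while len(child) < n_keep:                   # repair duplicated genes
                    g = random.randrange(n_features)
                    if g not in child:
                        child.append(g)
                return sorted(child[:n_keep])

            def mutate(ind, rate=0.1):
                if random.random() < rate:                   # swap one gene at random
                    ind = ind.copy()
                    ind[random.randrange(n_keep)] = random.choice(
                        [g for g in range(n_features) if g not in ind])
                return sorted(ind)

            pop = [random_individual() for _ in range(pop_size)]
            for _ in range(generations):
                ranked = sorted(pop, key=fitness, reverse=True)
                nxt = ranked[:2]                             # elitism
                while len(nxt) < pop_size:
                    a, b = (max(random.sample(ranked, 3), key=fitness) for _ in range(2))
                    nxt.append(mutate(crossover(a, b)))
                pop = nxt
            return max(pop, key=fitness)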

  1. Damage diagnosis algorithm using a sequential change point detection method with an unknown distribution for damage

    NASA Astrophysics Data System (ADS)

    Noh, Hae Young; Rajagopal, Ram; Kiremidjian, Anne S.

    2012-04-01

    This paper introduces a damage diagnosis algorithm for civil structures that uses a sequential change point detection method for the cases where the post-damage feature distribution is unknown a priori. This algorithm extracts features from structural vibration data using time-series analysis and then declares damage using the change point detection method. The change point detection method asymptotically minimizes detection delay for a given false alarm rate. The conventional method uses the known pre- and post-damage feature distributions to perform a sequential hypothesis test. In practice, however, the post-damage distribution is unlikely to be known a priori. Therefore, our algorithm estimates and updates this distribution as data are collected using the maximum likelihood and the Bayesian methods. We also applied an approximate method to reduce the computation load and memory requirement associated with the estimation. The algorithm is validated using multiple sets of simulated data and a set of experimental data collected from a four-story steel special moment-resisting frame. Our algorithm was able to estimate the post-damage distribution consistently and resulted in detection delays only a few seconds longer than the delays from the conventional method that assumes we know the post-damage feature distribution. We confirmed that the Bayesian method is particularly efficient in declaring damage with minimal memory requirement, but the maximum likelihood method provides an insightful heuristic approach.
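
    The detection step can be pictured with a generalized-likelihood-ratio sketch like the one below, in which the pre-damage feature distribution is assumed Gaussian and the unknown post-damage mean is replaced by its maximum likelihood estimate over each candidate change time; the threshold, the Gaussian assumption and all names are illustrative, and the paper's Bayesian variant is not shown.

        import numpy as np

        def glr_change_point(features, pre_mean, pre_std, threshold=10.0):
            """Generalized likelihood ratio test for a change in the mean of a feature sequence."""
            x = np.asarray(features, dtype=float)
            best_llr, best_k = -np.inf, None
            for k in range(1, len(x)):                      # candidate change times
                post = x[k:]
                post_mean = post.mean()                     # ML estimate of the post-damage mean
                # Log-likelihood ratio of "change at k" against "no change"
                llr = np.sum((post - pre_mean) ** 2 - (post - post_mean) ** 2) / (2 * pre_std ** 2)
                if llr > best_llr:
                    best_llr, best_k = llr, k
            return best_k if best_llr > threshold else None  # index at which damage is declared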

  2. Improved multi-stage neonatal seizure detection using a heuristic classifier and a data-driven post-processor.

    PubMed

    Ansari, A H; Cherian, P J; Dereymaeker, A; Matic, V; Jansen, K; De Wispelaere, L; Dielman, C; Vervisch, J; Swarte, R M; Govaert, P; Naulaers, G; De Vos, M; Van Huffel, S

    2016-09-01

    After identifying the most seizure-relevant characteristics using a previously developed heuristic classifier, a data-driven post-processor using a novel set of features is applied to improve the performance. The main characteristics of the outputs of the heuristic algorithm are extracted by five sets of features, including synchronization, evolution, retention, segment, and signal features. Then, a support vector machine and a decision making layer remove the falsely detected segments. Four datasets including 71 neonates (1023 h, 3493 seizures), recorded in two different university hospitals, are used to train and test the algorithm without removing the dubious seizures. The heuristic method resulted in a false alarm rate of 3.81 per hour and a good detection rate of 88% on the entire test databases. The post-processor effectively reduces the false alarm rate by 34%, while the good detection rate decreases by 2%. This post-processing technique improves the performance of the heuristic algorithm. The structure of this post-processor is generic, improves our understanding of the core visually determined EEG features of neonatal seizures and is applicable to other neonatal seizure detectors. The post-processor significantly decreases the false alarm rate at the expense of a small reduction of the good detection rate. Copyright © 2016 International Federation of Clinical Neurophysiology. Published by Elsevier Ireland Ltd. All rights reserved.

  3. Multispectra CWT-based algorithm (MCWT) in mass spectra for peak extraction.

    PubMed

    Hsueh, Huey-Miin; Kuo, Hsun-Chih; Tsai, Chen-An

    2008-01-01

    An important objective in mass spectrometry (MS) is to identify a set of biomarkers that can be used to potentially distinguish patients between distinct treatments (or conditions) from tens or hundreds of spectra. A common two-step approach involving peak extraction and quantification is employed to identify the features of scientific interest. The selected features are then used for further investigation to understand the underlying biological mechanism of individual proteins or to develop genomic biomarkers for early diagnosis. However, the use of inadequate or ineffective peak detection and peak alignment algorithms in the peak extraction step may lead to a high rate of false positives. It is also crucial to reduce the false positive rate when detecting biomarkers from tens or hundreds of spectra. Here a new procedure is introduced for feature extraction in mass spectrometry data that extends the continuous wavelet transform-based (CWT-based) algorithm to multiple spectra. The proposed multispectra CWT-based algorithm (MCWT) not only performs peak detection for multiple spectra but also carries out peak alignment at the same time. The authors' MCWT algorithm constructs a reference, which integrates information from multiple raw spectra, for feature extraction. The algorithm is applied to a SELDI-TOF mass spectra data set provided by CAMDA 2006 with known polypeptide m/z positions. This new approach is easy to implement and outperforms the existing peak extraction method from the Bioconductor PROcess package.
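
    The reference-based idea can be pictured with the short sketch below, which builds a mean reference spectrum (assuming the spectra already share a common m/z grid) and applies SciPy's continuous-wavelet-transform peak finder to it; this reproduces only the spirit of MCWT, not its alignment details, and the width range is an arbitrary choice.

        import numpy as np
        from scipy.signal import find_peaks_cwt

        def reference_cwt_peaks(spectra, widths=np.arange(2, 30)):
            """CWT peak detection on a reference spectrum integrating multiple raw spectra."""
            spectra = np.asarray(spectra, dtype=float)   # shape: (n_spectra, n_mz_points)
            reference = spectra.mean(axis=0)             # simple reference combining all spectra
            peak_idx = find_peaks_cwt(reference, widths)  # indices of detected peaks
            return peak_idx, reference[peak_idx]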

  4. Approach for Text Classification Based on the Similarity Measurement between Normal Cloud Models

    PubMed Central

    Dai, Jin; Liu, Xin

    2014-01-01

    The similarity between objects is a core research area of data mining. In order to reduce the interference caused by the uncertainty of natural language, a similarity measurement between normal cloud models is adopted for text classification research. On this basis, a novel text classifier based on cloud concept jumping up (CCJU-TC) is proposed. It can efficiently accomplish conversion between qualitative concepts and quantitative data. Through the conversion from a text set to a text information table based on the VSM model, the qualitative text concept extracted from each category is jumped up into a whole-category concept. According to the cloud similarity between the test text and each category concept, the test text is assigned to the most similar category. Comparison with different text classifiers on different feature selection sets fully demonstrates that not only does CCJU-TC have a strong ability to adapt to different text features, but its classification performance is also better than that of traditional classifiers. PMID:24711737

  5. Spectral-Spatial Shared Linear Regression for Hyperspectral Image Classification.

    PubMed

    Haoliang Yuan; Yuan Yan Tang

    2017-04-01

    Classification of the pixels in a hyperspectral image (HSI) is an important task and has been popularly applied in many practical applications. Its major challenge is the high-dimension, small-sample-size problem. To deal with this problem, many subspace learning (SL) methods have been developed to reduce the dimension of the pixels while preserving the important discriminant information. Motivated by the ridge linear regression (RLR) framework for SL, we propose a spectral-spatial shared linear regression method (SSSLR) for extracting the feature representation. Compared with RLR, our proposed SSSLR has the following two advantages. First, we utilize a convex set to explore the spatial structure for computing the linear projection matrix. Second, we utilize a shared structure learning model, which is formed by the original data space and a hidden feature space, to learn a more discriminant linear projection matrix for classification. To optimize our proposed method, an efficient iterative algorithm is proposed. Experimental results on two popular HSI data sets, i.e., Indian Pines and Salinas, demonstrate that our proposed method outperforms many SL methods.
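
    For orientation, the RLR baseline that motivates SSSLR has a closed-form ridge solution; the sketch below shows that baseline only (the spectral-spatial convex set and the shared hidden feature space of SSSLR are not reproduced), with X a pixels-by-bands matrix and Y a one-hot label matrix.

        import numpy as np

        def rlr_projection(X, Y, lam=1.0):
            """Ridge linear regression projection W = (X'X + lam*I)^-1 X'Y for subspace learning."""
            n_bands = X.shape[1]
            W = np.linalg.solve(X.T @ X + lam * np.eye(n_bands), X.T @ Y)
            return W   # low-dimensional features of new pixels: X_new @ W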

  6. Feature selection through validation and un-censoring of endovascular repair survival data for predicting the risk of re-intervention.

    PubMed

    Attallah, Omneya; Karthikesalingam, Alan; Holt, Peter J E; Thompson, Matthew M; Sayers, Rob; Bown, Matthew J; Choke, Eddie C; Ma, Xianghong

    2017-08-03

    Feature selection (FS) is essential in the medical area, as it reduces the effort and time needed for physicians to measure unnecessary features. Choosing useful variables is a difficult task in the presence of censoring, which is the unique characteristic of survival analysis. Most survival FS methods depend on Cox's proportional hazards model; however, machine learning techniques (MLT) are preferred but not commonly used due to censoring. Techniques that have been proposed to adapt MLT to perform FS with survival data cannot be used with a high level of censoring. The researchers' previous publications proposed a technique to deal with the high level of censoring and used existing FS techniques to reduce the dataset dimension. In this paper, however, a new FS technique is proposed and combined with feature transformation and the proposed uncensoring approaches to select a reduced set of features and produce a stable predictive model. The FS technique, based on an artificial neural network (ANN) MLT, is proposed to deal with highly censored Endovascular Aortic Repair (EVAR) survival data. EVAR datasets were collected during 2004 to 2010 from two vascular centers in order to produce a final stable model; they contain almost 91% censored patients. The proposed approach used a wrapper FS method with an ANN to select a reduced subset of features that predict the risk of EVAR re-intervention after 5 years for patients from two different centers located in the United Kingdom, to allow it to be potentially applied to cross-center predictions. The proposed model is compared with two popular FS techniques, the Akaike and Bayesian information criteria (AIC, BIC), that are used with Cox's model. The final model outperforms the other methods in distinguishing the high and low risk groups, as both its concordance index and estimated AUC are better than those of the Cox models based on the AIC, BIC, Lasso, and SCAD approaches. These models have p-values lower than 0.05, meaning that patients in different risk groups can be separated significantly and those who would need re-intervention can be correctly predicted. The proposed approach will save the time and effort that physicians spend collecting unnecessary variables. The final reduced model was able to predict the long-term risk of aortic complications after EVAR. This predictive model can help clinicians decide patients' future observation plans.

  7. Adaptive Local Spatiotemporal Features from RGB-D Data for One-Shot Learning Gesture Recognition

    PubMed Central

    Lin, Jia; Ruan, Xiaogang; Yu, Naigong; Yang, Yee-Hong

    2016-01-01

    Noise and constant empirical motion constraints affect the extraction of distinctive spatiotemporal features from one or a few samples per gesture class. To tackle these problems, an adaptive local spatiotemporal feature (ALSTF) using fused RGB-D data is proposed. First, motion regions of interest (MRoIs) are adaptively extracted using grayscale and depth velocity variance information to greatly reduce the impact of noise. Then, corners are used as keypoints if their depth and their grayscale and depth velocities meet several adaptive local constraints in each MRoI. With further filtering of noise, an accurate and sufficient number of keypoints is obtained within the desired moving body parts (MBPs). Finally, four kinds of multiple descriptors are calculated and combined in extended gradient and motion spaces to represent the appearance and motion features of gestures. The experimental results on the ChaLearn gesture, CAD-60 and MSRDailyActivity3D datasets demonstrate that the proposed feature achieves higher performance compared with published state-of-the-art approaches under the one-shot learning setting and comparable accuracy under leave-one-out cross validation. PMID:27999337

  8. Adaptive Local Spatiotemporal Features from RGB-D Data for One-Shot Learning Gesture Recognition.

    PubMed

    Lin, Jia; Ruan, Xiaogang; Yu, Naigong; Yang, Yee-Hong

    2016-12-17

    Noise and constant empirical motion constraints affect the extraction of distinctive spatiotemporal features from one or a few samples per gesture class. To tackle these problems, an adaptive local spatiotemporal feature (ALSTF) using fused RGB-D data is proposed. First, motion regions of interest (MRoIs) are adaptively extracted using grayscale and depth velocity variance information to greatly reduce the impact of noise. Then, corners are used as keypoints if their depth and their grayscale and depth velocities meet several adaptive local constraints in each MRoI. With further filtering of noise, an accurate and sufficient number of keypoints is obtained within the desired moving body parts (MBPs). Finally, four kinds of multiple descriptors are calculated and combined in extended gradient and motion spaces to represent the appearance and motion features of gestures. The experimental results on the ChaLearn gesture, CAD-60 and MSRDailyActivity3D datasets demonstrate that the proposed feature achieves higher performance compared with published state-of-the-art approaches under the one-shot learning setting and comparable accuracy under leave-one-out cross validation.

  9. Unsupervised feature relevance analysis applied to improve ECG heartbeat clustering.

    PubMed

    Rodríguez-Sotelo, J L; Peluffo-Ordoñez, D; Cuesta-Frau, D; Castellanos-Domínguez, G

    2012-10-01

    The computer-assisted analysis of biomedical records has become an essential tool in clinical settings. However, current devices provide a growing amount of data that often exceeds the processing capacity of normal computers. As this amount of information rises, new demands for more efficient data extraction methods appear. This paper addresses the task of data mining in physiological records using a feature selection scheme. An unsupervised method based on relevance analysis is described. This scheme uses a least-squares optimization of the input feature matrix in a single iteration. The output of the algorithm is a feature weighting vector. The performance of the method was assessed using a heartbeat clustering test on real ECG records. The quantitative cluster validity measures yielded a correctly classified heartbeat rate of 98.69% (specificity), 85.88% (sensitivity) and 95.04% (general clustering performance), which is even higher than the performance achieved by other similar ECG clustering studies. The number of features was reduced on average from 100 to 18, and the temporal cost was 43% lower than in previous ECG clustering schemes. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.

  10. Automated pixel-wise brain tissue segmentation of diffusion-weighted images via machine learning.

    PubMed

    Ciritsis, Alexander; Boss, Andreas; Rossi, Cristina

    2018-04-26

    The diffusion-weighted (DW) MR signal sampled over a wide range of b-values potentially allows for tissue differentiation in terms of cellularity, microstructure, perfusion, and T2 relaxivity. This study aimed to implement a machine learning algorithm for automatic brain tissue segmentation from DW-MRI datasets, and to determine the optimal subset of features for accurate segmentation. DWI was performed at 3 T in eight healthy volunteers using 15 b-values and 20 diffusion-encoding directions. The pixel-wise signal attenuation, as well as the trace and fractional anisotropy (FA) of the diffusion tensor, were used as features to train a support vector machine classifier for gray matter, white matter, and cerebrospinal fluid classes. The datasets of two volunteers were used for validation. For each subject, tissue classification was also performed on 3D T1-weighted data sets with a probabilistic framework. Confusion matrices were generated for quantitative assessment of image classification accuracy in comparison with the reference method. DWI-based tissue segmentation resulted in an accuracy of 82.1% on the validation dataset and of 82.2% on the training dataset, excluding relevant model over-fitting. A mean Dice coefficient (DSC) of 0.79 ± 0.08 was found. About 50% of the classification performance was attributable to five features (i.e. the signal measured at b-values of 5/10/500/1200 s/mm2 and the FA). This reduced set of features led to almost identical performances for the validation (82.2%) and the training (81.4%) datasets (DSC = 0.79 ± 0.08). Machine learning techniques applied to DWI data allow for accurate brain tissue segmentation based on both morphological and functional information. Copyright © 2018 John Wiley & Sons, Ltd.
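
    A pixel-wise classifier along these lines could be set up as in the hedged sketch below; the file names, array shapes and SVM hyperparameters are assumptions for illustration, not values from the study.

        import numpy as np
        from sklearn.svm import SVC
        from sklearn.preprocessing import StandardScaler
        from sklearn.pipeline import make_pipeline
        from sklearn.metrics import confusion_matrix

        # Hypothetical per-pixel feature arrays: columns are the reduced feature set
        # named above (signal at four b-values plus FA); labels follow a T1 reference.
        X_train = np.load("train_features.npy")   # shape (n_pixels, 5)
        y_train = np.load("train_labels.npy")     # 0 = CSF, 1 = gray matter, 2 = white matter
        X_val = np.load("val_features.npy")
        y_val = np.load("val_labels.npy")

        clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
        clf.fit(X_train, y_train)
        print(confusion_matrix(y_val, clf.predict(X_val)))   # compare against the reference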

  11. Feature relevance assessment for the semantic interpretation of 3D point cloud data

    NASA Astrophysics Data System (ADS)

    Weinmann, M.; Jutzi, B.; Mallet, C.

    2013-10-01

    The automatic analysis of large 3D point clouds represents a crucial task in photogrammetry, remote sensing and computer vision. In this paper, we propose a new methodology for the semantic interpretation of such point clouds which involves feature relevance assessment in order to reduce both processing time and memory consumption. Given a standard benchmark dataset with 1.3 million 3D points, we first extract a set of 21 geometric 3D and 2D features. Subsequently, we apply a classifier-independent ranking procedure which involves a general relevance metric in order to derive compact and robust subsets of versatile features which are generally applicable for a large variety of subsequent tasks. This metric is based on 7 different feature selection strategies and thus addresses different intrinsic properties of the given data. For the example of semantically interpreting 3D point cloud data, we demonstrate the great potential of smaller subsets consisting of only the most relevant features with 4 different state-of-the-art classifiers. The results reveal that, instead of including as many features as possible in order to compensate for a lack of knowledge, a crucial task such as scene interpretation can be carried out with only a few versatile features and even improved accuracy.

  12. Feature Selection Methods for Zero-Shot Learning of Neural Activity

    PubMed Central

    Caceres, Carlos A.; Roos, Matthew J.; Rupp, Kyle M.; Milsap, Griffin; Crone, Nathan E.; Wolmetz, Michael E.; Ratto, Christopher R.

    2017-01-01

    Dimensionality poses a serious challenge when making predictions from human neuroimaging data. Across imaging modalities, large pools of potential neural features (e.g., responses from particular voxels, electrodes, and temporal windows) have to be related to typically limited sets of stimuli and samples. In recent years, zero-shot prediction models have been introduced for mapping between neural signals and semantic attributes, which allows for classification of stimulus classes not explicitly included in the training set. While choices about feature selection can have a substantial impact when closed-set accuracy, open-set robustness, and runtime are competing design objectives, no systematic study of feature selection for these models has been reported. Instead, a relatively straightforward feature stability approach has been adopted and successfully applied across models and imaging modalities. To characterize the tradeoffs in feature selection for zero-shot learning, we compared correlation-based stability to several other feature selection techniques on comparable data sets from two distinct imaging modalities: functional Magnetic Resonance Imaging and Electrocorticography. While most of the feature selection methods resulted in similar zero-shot prediction accuracies and spatial/spectral patterns of selected features, there was one exception: a novel feature/attribute correlation approach was able to achieve those accuracies with far fewer features, suggesting the potential for simpler prediction models that yield high zero-shot classification accuracy. PMID:28690513

  13. Brain properties predict proximity to symptom onset in sporadic Alzheimer’s disease

    PubMed Central

    Vogel, Jacob W; Vachon-Presseau, Etienne; Pichet Binette, Alexa; Tam, Angela; Orban, Pierre; La Joie, Renaud; Savard, Mélissa; Picard, Cynthia; Poirier, Judes; Bellec, Pierre; Breitner, John C S; Villeneuve, Sylvia

    2018-01-01

    Alzheimer’s disease is preceded by a lengthy ‘preclinical’ stage spanning many years, during which subtle brain changes occur in the absence of overt cognitive symptoms. Predicting when the onset of disease symptoms will occur is an unsolved challenge in individuals with sporadic Alzheimer’s disease. In individuals with autosomal dominant genetic Alzheimer’s disease, the age of symptom onset is similar across generations, allowing the prediction of individual onset times with some accuracy. We extend this concept to persons with a parental history of sporadic Alzheimer’s disease to test whether an individual’s symptom onset age can be informed by the onset age of their affected parent, and whether this estimated onset age can be predicted using only MRI. Structural and functional MRIs were acquired from 255 ageing cognitively healthy subjects with a parental history of sporadic Alzheimer’s disease from the PREVENT-AD cohort. Years to estimated symptom onset was calculated as participant age minus age of parental symptom onset. Grey matter volume was extracted from T1-weighted images and whole-brain resting state functional connectivity was evaluated using degree count. Both modalities were summarized using a 444-region cortical-subcortical atlas. The entire sample was divided into training (n = 138) and testing (n = 68) sets. Within the training set, individuals closer to or beyond their parent’s symptom onset demonstrated reduced grey matter volume and altered functional connectivity, specifically in regions known to be vulnerable in Alzheimer’s disease. Machine learning was used to identify a weighted set of imaging features trained to predict years to estimated symptom onset. This feature set alone significantly predicted years to estimated symptom onset in the unseen testing data. This model, using only neuroimaging features, significantly outperformed a similar model instead trained with cognitive, genetic, imaging and demographic features used in a traditional clinical setting. We next tested if these brain properties could be generalized to predict time to clinical progression in a subgroup of 26 individuals from the Alzheimer’s Disease Neuroimaging Initiative, who eventually converted either to mild cognitive impairment or to Alzheimer’s dementia. The feature set trained on years to estimated symptom onset in the PREVENT-AD predicted variance in time to clinical conversion in this separate longitudinal dataset. Adjusting for participant age did not impact any of the results. These findings demonstrate that years to estimated symptom onset or similar measures can be predicted from brain features and may help estimate presymptomatic disease progression in at-risk individuals. PMID:29688388

  14. Impact of experimental design on PET radiomics in predicting somatic mutation status.

    PubMed

    Yip, Stephen S F; Parmar, Chintan; Kim, John; Huynh, Elizabeth; Mak, Raymond H; Aerts, Hugo J W L

    2017-12-01

    PET-based radiomic features have demonstrated great promise in predicting genetic data. However, various experimental parameters can influence the feature extraction pipeline and hence the resulting features. Here, we investigated how experimental settings affect the performance of radiomic features in predicting somatic mutation status in non-small cell lung cancer (NSCLC) patients. 348 NSCLC patients with somatic mutation testing and diagnostic PET images were included in our analysis. Radiomic feature extractions were analyzed for varying voxel sizes, filters and bin widths. 66 radiomic features were evaluated. The performance of features in predicting mutation status was assessed using the area under the receiver-operating-characteristic curve (AUC). The influence of experimental parameters on feature predictability was quantified as the relative difference between the minimum and maximum AUC (δ). The large majority of features (n=56, 85%) were significantly predictive for EGFR mutation status (AUC≥0.61). 29 radiomic features significantly predicted EGFR mutations and were robust to experimental settings with δOverall <5%. The overall influence (δOverall) of the voxel size, filter and bin width for all features ranged from 5% to 15%. For all features, none of the experimental settings allowed discrimination of KRAS+ from KRAS- (AUC≤0.56). The predictability of 29 radiomic features was robust to the choice of experimental settings; however, these settings need to be carefully chosen for all other features. The combined effect of the investigated processing methods could be substantial and must be considered. Optimized settings that will maximize the predictive performance of individual radiomic features should be investigated in the future. Copyright © 2017 Elsevier B.V. All rights reserved.
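
    The robustness measure δ described in this abstract (the relative difference between a feature's minimum and maximum AUC across settings) can be computed in a few lines. The sketch below assumes one plausible normalization, dividing by the maximum AUC; the paper's exact convention may differ, and the AUC values shown are hypothetical.

```python
import numpy as np

def relative_auc_difference(aucs_by_setting):
    """aucs_by_setting: mapping from experimental setting to the AUC a
    single radiomic feature achieves under that setting."""
    aucs = np.array(list(aucs_by_setting.values()), dtype=float)
    return 100.0 * (aucs.max() - aucs.min()) / aucs.max()

# Hypothetical AUCs of one feature under three extraction settings
delta = relative_auc_difference({"voxel_2mm": 0.66, "voxel_4mm": 0.64, "smoothed": 0.63})
is_robust = delta < 5.0   # the <5% criterion used for "robust" features
```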

  15. Reproducibility of radiomics for deciphering tumor phenotype with imaging

    NASA Astrophysics Data System (ADS)

    Zhao, Binsheng; Tan, Yongqiang; Tsai, Wei-Yann; Qi, Jing; Xie, Chuanmiao; Lu, Lin; Schwartz, Lawrence H.

    2016-03-01

    Radiomics (radiogenomics) characterizes tumor phenotypes based on quantitative image features derived from routine radiologic imaging to improve cancer diagnosis, prognosis, prediction and response to therapy. Although radiomic features must be reproducible to qualify as biomarkers for clinical care, little is known about how routine imaging acquisition techniques/parameters affect reproducibility. To begin to fill this knowledge gap, we assessed the reproducibility of a comprehensive, commonly-used set of radiomic features using a unique, same-day repeat computed tomography data set from lung cancer patients. Each scan was reconstructed at 6 imaging settings, varying slice thicknesses (1.25 mm, 2.5 mm and 5 mm) and reconstruction algorithms (sharp, smooth). Reproducibility was assessed using the repeat scans reconstructed at identical imaging setting (6 settings in total). In separate analyses, we explored differences in radiomic features due to different imaging parameters by assessing the agreement of these radiomic features extracted from the repeat scans reconstructed at the same slice thickness but different algorithms (3 settings in total). Our data suggest that radiomic features are reproducible over a wide range of imaging settings. However, smooth and sharp reconstruction algorithms should not be used interchangeably. These findings will raise awareness of the importance of properly setting imaging acquisition parameters in radiomics/radiogenomics research.

  16. Multi person detection and tracking based on hierarchical level-set method

    NASA Astrophysics Data System (ADS)

    Khraief, Chadia; Benzarti, Faouzi; Amiri, Hamid

    2018-04-01

    In this paper, we propose an efficient unsupervised method for multi-person tracking based on a hierarchical level-set approach. The proposed method uses both edge and region information in order to effectively detect objects. The persons are tracked on each frame of the sequence by minimizing an energy functional that combines color, texture and shape information. These features are encoded in a covariance matrix used as a region descriptor. The present method is fully automated, without the need to manually specify the initial level-set contour. It is based on combined person detection and background subtraction methods. The edge-based term is employed to maintain a stable evolution, guide the segmentation towards apparent boundaries and inhibit the fusion of regions. The computational cost of the level-set evolution is reduced by using a narrow-band technique. Experiments performed on challenging video sequences show the effectiveness of the proposed method.

  17. An Integrated Ransac and Graph Based Mismatch Elimination Approach for Wide-Baseline Image Matching

    NASA Astrophysics Data System (ADS)

    Hasheminasab, M.; Ebadi, H.; Sedaghat, A.

    2015-12-01

    In this paper we propose an integrated approach in order to increase the precision of feature point matching. Many different algorithms have been developed to optimize short-baseline image matching, whereas wide-baseline image matching remains difficult to handle because of illumination differences and viewpoint changes. Fortunately, the recent developments in the automatic extraction of local invariant features make wide-baseline image matching possible. Matching algorithms based on the local feature similarity principle use feature descriptors to establish correspondences between feature point sets. To date, the most remarkable descriptor is the scale-invariant feature transform (SIFT) descriptor, which is invariant to image rotation and scale and remains robust across a substantial range of affine distortion, presence of noise, and changes in illumination. The epipolar constraint based on the RANSAC (random sample consensus) method is a conventional model for mismatch elimination, particularly in computer vision. Because only the distance from the epipolar line is considered, a few false matches remain in the matching results selected by epipolar geometry and RANSAC. Aguilar et al. proposed the Graph Transformation Matching (GTM) algorithm to remove outliers, but it has difficulties when mismatched points are surrounded by the same local neighbor structure. In this study, to overcome the limitations mentioned above, a new three-step matching scheme is presented in which the SIFT algorithm is used to obtain initial corresponding point sets. In the second step, the RANSAC algorithm is applied to reduce the outliers. Finally, GTM based on the adjacent K-NN graph is applied to remove the remaining mismatches. Four different close-range image datasets with changes in viewpoint are utilized to evaluate the performance of the proposed method, and the experimental results indicate its robustness and capability.
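
    The first two steps of the scheme described above (SIFT matching followed by RANSAC-based epipolar filtering) map directly onto standard OpenCV calls; the graph-based GTM refinement is a separate algorithm and is not shown. This is an illustrative sketch, not the authors' implementation, and the image paths are placeholders.

```python
import cv2
import numpy as np

def sift_ransac_matches(path1, path2, ratio=0.75, ransac_thresh=3.0):
    img1 = cv2.imread(path1, cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread(path2, cv2.IMREAD_GRAYSCALE)

    # Step 1: SIFT keypoints and descriptors, matched with Lowe's ratio test
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
            if m.distance < ratio * n.distance]

    # Step 2: RANSAC on the fundamental matrix enforces the epipolar constraint
    pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in good])
    _, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, ransac_thresh, 0.99)
    inliers = [m for m, keep in zip(good, mask.ravel()) if keep]

    # Step 3 (not shown): prune remaining mismatches with GTM on a K-NN graph
    return inliers
```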

  18. Effects of band selection on endmember extraction for forestry applications

    NASA Astrophysics Data System (ADS)

    Karathanassi, Vassilia; Andreou, Charoula; Andronis, Vassilis; Kolokoussis, Polychronis

    2014-10-01

    In spectral unmixing theory, data reduction techniques play an important role as hyperspectral imagery contains an immense amount of data, posing many challenging problems such as data storage, computational efficiency, and the so-called "curse of dimensionality". Feature extraction and feature selection are the two main approaches for dimensionality reduction. Feature extraction techniques reduce the dimensionality of the hyperspectral data by applying transforms to the data. Feature selection techniques retain the physical meaning of the data by selecting a set of bands from the input hyperspectral dataset, which mainly contain the information needed for spectral unmixing. Although feature selection techniques are well-known for their dimensionality reduction potential, they are rarely used in the unmixing process. The majority of the existing state-of-the-art dimensionality reduction methods set criteria on the spectral information derived from the whole wavelength range in order to define the optimum spectral subspace. These criteria are not associated with any particular application but with the data statistics, such as correlation and entropy values. However, each application is associated with specific land cover materials, whose spectral characteristics present variations in specific wavelengths. In forestry, for example, many applications focus on tree leaves, in which specific pigments such as chlorophyll, xanthophyll, etc. determine the wavelengths where tree species, diseases, etc., can be detected. For such applications, when the unmixing process is applied, the tree species, diseases, etc., are considered as the endmembers of interest. This paper focuses on investigating the effects of band selection on endmember extraction by exploiting the information of the vegetation absorbance spectral zones. More precisely, it is explored whether endmember extraction can be optimized when specific sets of initial bands related to leaf spectral characteristics are selected. Experiments comprise the application of well-known signal subspace estimation and endmember extraction methods on hyperspectral imagery of a forest area. Evaluation of the extracted endmembers showed that more forest species can be extracted as endmembers using selected bands.

  19. Towards limb position invariant myoelectric pattern recognition using time-dependent spectral features.

    PubMed

    Khushaba, Rami N; Takruri, Maen; Miro, Jaime Valls; Kodagoda, Sarath

    2014-07-01

    Recent studies in Electromyogram (EMG) pattern recognition reveal a gap between research findings and a viable clinical implementation of myoelectric control strategies. One of the important factors contributing to the limited performance of such controllers in practice is the variation in the limb position associated with normal use as it results in different EMG patterns for the same movements when carried out at different positions. However, the end goal of the myoelectric control scheme is to allow amputees to control their prosthetics in an intuitive and accurate manner regardless of the limb position at which the movement is initiated. In an attempt to reduce the impact of limb position on EMG pattern recognition, this paper proposes a new feature extraction method that extracts a set of power spectrum characteristics directly from the time-domain. The end goal is to form a set of features invariant to limb position. Specifically, the proposed method estimates the spectral moments, spectral sparsity, spectral flux, irregularity factor, and signals power spectrum correlation. This is achieved through using Fourier transform properties to form invariants to amplification, translation and signal scaling, providing an efficient and accurate representation of the underlying EMG activity. Additionally, due to the inherent temporal structure of the EMG signal, the proposed method is applied on the global segments of EMG data as well as the sliced segments using multiple overlapped windows. The performance of the proposed features is tested on EMG data collected from eleven subjects, while implementing eight classes of movements, each at five different limb positions. Practical results indicate that the proposed feature set can achieve significant reduction in classification error rates, in comparison to other methods, with ≈8% error on average across all subjects and limb positions. A real-time implementation and demonstration is also provided and made available as a video supplement (see Appendix A). Copyright © 2014 Elsevier Ltd. All rights reserved.
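
    The core idea of extracting power-spectrum characteristics directly from the time domain rests on Parseval's theorem and the derivative property of the Fourier transform: the signal energy equals the zeroth spectral moment, and time-domain differentiation corresponds to multiplication by frequency. The sketch below computes a few such descriptors for one EMG window; it is illustrative and does not reproduce the paper's exact feature set or normalizations.

```python
import numpy as np

def time_domain_spectral_descriptors(x):
    """Spectral moments of one EMG window estimated without an FFT."""
    d1 = np.diff(x)                 # ~ multiplication by frequency
    d2 = np.diff(d1)                # ~ multiplication by frequency squared
    m0 = np.sum(x ** 2)             # zeroth spectral moment (Parseval's theorem)
    m2 = np.sum(d1 ** 2)            # second spectral moment
    m4 = np.sum(d2 ** 2)            # fourth spectral moment
    irregularity = m2 / np.sqrt(m0 * m4)      # irregularity factor
    sparseness = m0 / np.sqrt(m2 * m4)        # one possible sparsity-style measure
    return np.log(np.array([m0, m2, m4, sparseness, irregularity]) + 1e-12)

window = np.random.default_rng(1).standard_normal(256)
features = time_domain_spectral_descriptors(window)
```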

  20. Feature Selection for Chemical Sensor Arrays Using Mutual Information

    PubMed Central

    Wang, X. Rosalind; Lizier, Joseph T.; Nowotny, Thomas; Berna, Amalia Z.; Prokopenko, Mikhail; Trowell, Stephen C.

    2014-01-01

    We address the problem of feature selection for classifying a diverse set of chemicals using an array of metal oxide sensors. Our aim is to evaluate a filter approach to feature selection with reference to previous work, which used a wrapper approach on the same data set, and established best features and upper bounds on classification performance. We selected feature sets that exhibit the maximal mutual information with the identity of the chemicals. The selected features closely match those found to perform well in the previous study using a wrapper approach to conduct an exhaustive search of all permitted feature combinations. By comparing the classification performance of support vector machines (using features selected by mutual information) with the performance observed in the previous study, we found that while our approach does not always give the maximum possible classification performance, it always selects features that achieve classification performance approaching the optimum obtained by exhaustive search. We performed further classification using the selected feature set with some common classifiers and found that, for the selected features, Bayesian Networks gave the best performance. Finally, we compared the observed classification performances with the performance of classifiers using randomly selected features. We found that the selected features consistently outperformed randomly selected features for all tested classifiers. The mutual information filter approach is therefore a computationally efficient method for selecting near optimal features for chemical sensor arrays. PMID:24595058
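
    A filter of this kind, ranking sensor features by their mutual information with the chemical identity, is available off the shelf in scikit-learn. The sketch below uses synthetic stand-in data and a simple Gaussian naive Bayes classifier rather than the full Bayesian network evaluated in the study.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for sensor-array responses (rows: presentations, columns: features)
X, y = make_classification(n_samples=300, n_features=40, n_informative=8,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

# Selection is placed inside the pipeline so each CV fold re-estimates it
model = make_pipeline(SelectKBest(mutual_info_classif, k=8), GaussianNB())
print(cross_val_score(model, X, y, cv=5).mean())
```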

  1. Freezing of Gait Detection in Parkinson's Disease: A Subject-Independent Detector Using Anomaly Scores.

    PubMed

    Pham, Thuy T; Moore, Steven T; Lewis, Simon John Geoffrey; Nguyen, Diep N; Dutkiewicz, Eryk; Fuglevand, Andrew J; McEwan, Alistair L; Leong, Philip H W

    2017-11-01

    Freezing of gait (FoG) is common in Parkinsonian gait and strongly relates to falls. Current clinical FoG assessments are patients' self-report diaries and experts' manual video analysis. Both are subjective and yield moderate reliability. Existing detection algorithms have been predominantly designed in subject-dependent settings. In this paper, we aim to develop an automated FoG detector for subject-independent use. After extracting highly relevant features, we apply anomaly detection techniques to detect FoG events. Specifically, feature selection is performed using correlation and clusterability metrics. From a list of 244 feature candidates, 36 candidates were selected using saliency and robustness criteria. We develop an anomaly score detector with adaptive thresholding to identify FoG events. Then, using accuracy metrics, we reduce the feature list to seven candidates. Our novel multichannel freezing index was the most selective across all window sizes, achieving sensitivity (specificity) of (). On the other hand, freezing index from the vertical axis was the best choice for a single input, achieving sensitivity (specificity) of () for ankle and () for back sensors. Our subject-independent method is not only significantly more accurate than those previously reported, but also uses a much smaller window (e.g., versus ) and/or lower tolerance (e.g., versus ).
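
    A single-axis freezing index of the kind referenced above is commonly computed as the ratio of accelerometer power in a "freeze" band to power in a locomotion band. The band limits below follow values often used in the FoG literature and are assumptions rather than this paper's exact configuration; the multichannel extension and adaptive thresholding are not shown.

```python
import numpy as np
from scipy.signal import welch

def freezing_index(acc, fs, freeze_band=(3.0, 8.0), loco_band=(0.5, 3.0)):
    """Ratio of freeze-band to locomotion-band power for one accelerometer axis."""
    f, pxx = welch(acc, fs=fs, nperseg=min(len(acc), 4 * int(fs)))
    def band_power(lo, hi):
        mask = (f >= lo) & (f <= hi)
        return np.trapz(pxx[mask], f[mask])
    return band_power(*freeze_band) / band_power(*loco_band)

# Example: a 4 s window of vertical-axis acceleration sampled at 100 Hz
fs = 100
t = np.arange(0, 4, 1 / fs)
window = 0.5 * np.sin(2 * np.pi * 1.5 * t) + 0.2 * np.sin(2 * np.pi * 6.0 * t)
print(freezing_index(window, fs))
```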

  2. Web-based newborn screening system for metabolic diseases: machine learning versus clinicians.

    PubMed

    Chen, Wei-Hsin; Hsieh, Sheau-Ling; Hsu, Kai-Ping; Chen, Han-Ping; Su, Xing-Yu; Tseng, Yi-Ju; Chien, Yin-Hsiu; Hwu, Wuh-Liang; Lai, Feipei

    2013-05-23

    A hospital information system (HIS) that integrates screening data and interpretation of the data is routinely requested by hospitals and parents. However, the accuracy of disease classification may be low because of the disease characteristics and the analytes used for classification. The objective of this study is to describe a system that enhanced the neonatal screening system of the Newborn Screening Center at the National Taiwan University Hospital. The system was designed and deployed according to a service-oriented architecture (SOA) framework under the Web services .NET environment. The system consists of sample collection, testing, diagnosis, evaluation, treatment, and follow-up services among collaborating hospitals. To improve the accuracy of newborn screening, machine learning and optimal feature selection mechanisms were investigated for screening newborns for inborn errors of metabolism. The framework of the Newborn Screening Hospital Information System (NSHIS) used the embedded Health Level Seven (HL7) standards for data exchanges among heterogeneous platforms integrated by Web services in the C# language. In this study, machine learning classification was used to predict phenylketonuria (PKU), hypermethioninemia, and 3-methylcrotonyl-CoA-carboxylase (3-MCC) deficiency. The classification methods used 347,312 newborn dried blood samples collected at the Center between 2006 and 2011. Of these, 220 newborns had values over the diagnostic cutoffs (positive cases) and 1557 had values that were over the screening cutoffs but did not meet the diagnostic cutoffs (suspected cases). The original 35 analytes and the manifested features were ranked based on F score, then combinations of the top 20 ranked features were selected as input features to support vector machine (SVM) classifiers to obtain optimal feature sets. These feature sets were tested using 5-fold cross-validation and optimal models were generated. The datasets collected in year 2011 were used as predicting cases. The feature selection strategies were implemented and the optimal markers for PKU, hypermethioninemia, and 3-MCC deficiency were obtained. The results of the machine learning approach were compared with the cutoff scheme. The number of false positive cases was reduced from 21 to 2 for PKU, from 30 to 10 for hypermethioninemia, and from 209 to 46 for 3-MCC deficiency. This SOA Web service-based newborn screening system can accelerate screening procedures effectively and efficiently. An SVM learning methodology for PKU, hypermethioninemia, and 3-MCC deficiency metabolic diseases classification, including optimal feature selection strategies, is presented. By adopting the results of this study, the number of suspected cases could be reduced dramatically.
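
    The pipeline described here (rank features by an F score, keep the top 20, and feed them to an SVM evaluated with 5-fold cross-validation) can be sketched with scikit-learn. The ANOVA F-statistic below is a stand-in for the F score used in the study, and the data are synthetic placeholders for the analyte measurements.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in: 35 analyte-derived features, rare positive class
X, y = make_classification(n_samples=2000, n_features=35, n_informative=6,
                           weights=[0.95, 0.05], random_state=0)

# Keeping selection inside the pipeline avoids leaking test folds into the ranking
model = make_pipeline(StandardScaler(),
                      SelectKBest(f_classif, k=20),
                      SVC(kernel="rbf", class_weight="balanced"))
print(cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean())
```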

  3. Web-Based Newborn Screening System for Metabolic Diseases: Machine Learning Versus Clinicians

    PubMed Central

    Chen, Wei-Hsin; Hsu, Kai-Ping; Chen, Han-Ping; Su, Xing-Yu; Tseng, Yi-Ju; Chien, Yin-Hsiu; Hwu, Wuh-Liang; Lai, Feipei

    2013-01-01

    Background A hospital information system (HIS) that integrates screening data and interpretation of the data is routinely requested by hospitals and parents. However, the accuracy of disease classification may be low because of the disease characteristics and the analytes used for classification. Objective The objective of this study is to describe a system that enhanced the neonatal screening system of the Newborn Screening Center at the National Taiwan University Hospital. The system was designed and deployed according to a service-oriented architecture (SOA) framework under the Web services .NET environment. The system consists of sample collection, testing, diagnosis, evaluation, treatment, and follow-up services among collaborating hospitals. To improve the accuracy of newborn screening, machine learning and optimal feature selection mechanisms were investigated for screening newborns for inborn errors of metabolism. Methods The framework of the Newborn Screening Hospital Information System (NSHIS) used the embedded Health Level Seven (HL7) standards for data exchanges among heterogeneous platforms integrated by Web services in the C# language. In this study, machine learning classification was used to predict phenylketonuria (PKU), hypermethioninemia, and 3-methylcrotonyl-CoA-carboxylase (3-MCC) deficiency. The classification methods used 347,312 newborn dried blood samples collected at the Center between 2006 and 2011. Of these, 220 newborns had values over the diagnostic cutoffs (positive cases) and 1557 had values that were over the screening cutoffs but did not meet the diagnostic cutoffs (suspected cases). The original 35 analytes and the manifested features were ranked based on F score, then combinations of the top 20 ranked features were selected as input features to support vector machine (SVM) classifiers to obtain optimal feature sets. These feature sets were tested using 5-fold cross-validation and optimal models were generated. The datasets collected in year 2011 were used as predicting cases. Results The feature selection strategies were implemented and the optimal markers for PKU, hypermethioninemia, and 3-MCC deficiency were obtained. The results of the machine learning approach were compared with the cutoff scheme. The number of false positive cases was reduced from 21 to 2 for PKU, from 30 to 10 for hypermethioninemia, and from 209 to 46 for 3-MCC deficiency. Conclusions This SOA Web service–based newborn screening system can accelerate screening procedures effectively and efficiently. An SVM learning methodology for PKU, hypermethioninemia, and 3-MCC deficiency metabolic diseases classification, including optimal feature selection strategies, is presented. By adopting the results of this study, the number of suspected cases could be reduced dramatically. PMID:23702487

  4. Cross-indexing of binary SIFT codes for large-scale image search.

    PubMed

    Liu, Zhen; Li, Houqiang; Zhang, Liyan; Zhou, Wengang; Tian, Qi

    2014-05-01

    In recent years, there has been growing interest in mapping visual features into compact binary codes for applications on large-scale image collections. Encoding high-dimensional data as compact binary codes reduces the memory cost for storage. Besides, it benefits the computational efficiency since the similarity can be efficiently measured by Hamming distance. In this paper, we propose a novel flexible scale invariant feature transform (SIFT) binarization (FSB) algorithm for large-scale image search. The FSB algorithm explores the magnitude patterns of the SIFT descriptor. It is unsupervised and the generated binary codes are demonstrated to be distance-preserving. Besides, we propose a new searching strategy to find target features based on cross-indexing in the binary SIFT space and the original SIFT space. We evaluate our approach on two publicly released data sets. The experiments on a large-scale partial duplicate image retrieval system demonstrate the effectiveness and efficiency of the proposed algorithm.

  5. Process service quality evaluation based on Dempster-Shafer theory and support vector machine.

    PubMed

    Pei, Feng-Que; Li, Dong-Bo; Tong, Yi-Fei; He, Fei

    2017-01-01

    Human involvement influences traditional service quality evaluations, leading to low accuracy, poor reliability and limited predictive power. This paper proposes a method, called SVMs-DS, that employs a support vector machine (SVM) and Dempster-Shafer evidence theory to evaluate the service quality of a production process while handling a high number of input features with a small sample data set. Features that can affect production quality are extracted by a large number of sensors. Preprocessing steps such as feature simplification and normalization are reduced. Based on three individual SVM models, the basic probability assignments (BPAs) are constructed, which support the evaluation in both a qualitative and a quantitative way. The process service quality evaluation results are validated by Dempster's rules; the decision threshold to resolve conflicting results is generated from the three SVM models. A case study is presented to demonstrate the effectiveness of the SVMs-DS method.
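
    The fusion step described here rests on Dempster's rule of combination, which multiplies masses of compatible hypotheses and renormalizes by the total conflict. The sketch below is a minimal, generic implementation for mass functions over a two-element frame; the class labels and mass values are hypothetical and not taken from the paper.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two basic probability assignments (dicts mapping frozenset
    hypotheses to masses) using Dempster's rule of combination."""
    combined, conflict = {}, 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {h: m / (1.0 - conflict) for h, m in combined.items()}

# Hypothetical BPAs derived from two SVM outputs over {good, poor} service quality
GOOD, POOR, EITHER = frozenset({"good"}), frozenset({"poor"}), frozenset({"good", "poor"})
m_svm1 = {GOOD: 0.7, POOR: 0.2, EITHER: 0.1}
m_svm2 = {GOOD: 0.6, POOR: 0.3, EITHER: 0.1}
print(dempster_combine(m_svm1, m_svm2))
```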

  6. Location and Modality Effects in Online Dating: Rich Modality Profile and Location-Based Information Cues Increase Social Presence, While Moderating the Impact of Uncertainty Reduction Strategy.

    PubMed

    Jung, Soyoung; Roh, Soojin; Yang, Hyun; Biocca, Frank

    2017-09-01

    This study investigates how different interface modality features of online dating sites, such as location awareness cues and modality of profiles, affect the sense of social presence of a prospective date. We also examined how various user behaviors aimed at reducing uncertainty about online interactions affect social presence perceptions and are affected by the user interface features. Male users felt a greater sense of social presence when exposed to both location and accessibility cues (geographical proximity) and a richer medium (video profiles). Viewing a richer medium significantly increased the sense of social presence among female participants whereas location-based information sharing features did not directly affect their social presence perception. Augmented social presence, as a mediator, contributed to users' greater intention to meet potential dating partners in a face-to-face setting and to buy paid memberships on online dating sites.

  7. Discriminative region extraction and feature selection based on the combination of SURF and saliency

    NASA Astrophysics Data System (ADS)

    Deng, Li; Wang, Chunhong; Rao, Changhui

    2011-08-01

    The objective of this paper is to provide a possible optimization of the salient region algorithm, which is extensively used in recognizing and learning object categories. The salient region algorithm offers intra-class tolerance, a global scoring of features, and automatic selection of prominent scales within a certain range. However, its major limitation is computational performance, and that is what we attempt to improve. By reducing the number of pixels involved in the saliency calculation, the algorithm can be accelerated. We use interest points detected by fast-Hessian, the detector of SURF, as the candidate features for the saliency operation, rather than the whole set of pixels in the image. This implementation is thereby called Saliency based Optimization over SURF (SOSU for short). Experiments show that bringing in such a fast detector significantly speeds up the algorithm, while robustness to intra-class diversity ensures object recognition accuracy.

  8. Speech recognition features for EEG signal description in detection of neonatal seizures.

    PubMed

    Temko, A; Boylan, G; Marnane, W; Lightbody, G

    2010-01-01

    In this work, features which are usually employed in automatic speech recognition (ASR) are used for the detection of neonatal seizures in newborn EEG. Three conventional ASR feature sets are compared to the feature set which has been previously developed for this task. The results indicate that the thoroughly-studied spectral envelope based ASR features perform reasonably well on their own. Additionally, the SVM Recursive Feature Elimination routine is applied to all extracted features pooled together. It is shown that ASR features consistently appear among the top-rank features.

  9. Automatic feature design for optical character recognition using an evolutionary search procedure.

    PubMed

    Stentiford, F W

    1985-03-01

    An automatic evolutionary search is applied to the problem of feature extraction in an OCR application. A performance measure based on feature independence is used to generate features which do not appear to suffer from peaking effects [17]. Features are extracted from a training set of 30 600 machine printed 34 class alphanumeric characters derived from British mail. Classification results on the training set and a test set of 10 200 characters are reported for an increasing number of features. A 1.01 percent forced decision error rate is obtained on the test data using 316 features. The hardware implementation should be cheap and fast to operate. The performance compares favorably with current low cost OCR page readers.

  10. Assessment of Mental, Emotional and Physical Stress through Analysis of Physiological Signals Using Smartphones.

    PubMed

    Mohino-Herranz, Inma; Gil-Pita, Roberto; Ferreira, Javier; Rosa-Zurera, Manuel; Seoane, Fernando

    2015-10-08

    Determining the stress level of a subject in real time could be of special interest in certain professional activities to allow the monitoring of soldiers, pilots, emergency personnel and other professionals responsible for human lives. Assessment of current mental fitness for executing a task at hand might avoid unnecessary risks. To obtain this knowledge, two physiological measurements were recorded in this work using customized non-invasive wearable instrumentation that measures electrocardiogram (ECG) and thoracic electrical bioimpedance (TEB) signals. The relevant information from each measurement is extracted via evaluation of a reduced set of selected features. These features are primarily obtained from filtered and processed versions of the raw time measurements with calculations of certain statistical and descriptive parameters. Selection of the reduced set of features was performed using genetic algorithms, thus constraining the computational cost of the real-time implementation. Different classification approaches have been studied, but neural networks were chosen for this investigation because they represent a good tradeoff between the intelligence of the solution and computational complexity. Three different application scenarios were considered. In the first scenario, the proposed system is capable of distinguishing among different types of activity with a 21.2% probability error, for activities coded as neutral, emotional, mental and physical. In the second scenario, the proposed solution distinguishes among the three different emotional states of neutral, sadness and disgust, with a probability error of 4.8%. In the third scenario, the system is able to distinguish between low mental load and mental overload with a probability error of 32.3%. The computational cost was calculated, and the solution was implemented in commercially available Android-based smartphones. The results indicate that execution of such a monitoring solution is negligible compared to the nominal computational load of current smartphones.

  11. Low-Power Analog Processing for Sensing Applications: Low-Frequency Harmonic Signal Classification

    PubMed Central

    White, Daniel J.; William, Peter E.; Hoffman, Michael W.; Balkir, Sina

    2013-01-01

    A low-power analog sensor front-end is described that reduces the energy required to extract environmental sensing spectral features without using Fast Fourier Transform (FFT) or wavelet transforms. An Analog Harmonic Transform (AHT) allows selection of only the features needed by the back-end, in contrast to the FFT, where all coefficients must be calculated simultaneously. We also show that the FFT coefficients can be easily calculated from the AHT results by a simple back-substitution. The scheme is tailored for low-power, parallel analog implementation in an integrated circuit (IC). Two different applications are tested with an ideal front-end model and compared to existing studies with the same data sets. Results from the military vehicle classification and identification of machine-bearing fault applications show that the front-end suits a wide range of harmonic signal sources. Analog-related errors are modeled to evaluate the feasibility of and to set design parameters for an IC implementation to maintain good system-level performance. Design of a preliminary transistor-level integrator circuit in a 0.13 μm complementary metal-oxide-silicon (CMOS) integrated circuit process showed the ability to use online self-calibration to reduce fabrication errors to a sufficiently low level. Estimated power dissipation is about three orders of magnitude less than similar vehicle classification systems that use commercially available FFT spectral extraction. PMID:23892765

  12. Computer-aided diagnosis of psoriasis skin images with HOS, texture and color features: A first comparative study of its kind.

    PubMed

    Shrivastava, Vimal K; Londhe, Narendra D; Sonawane, Rajendra S; Suri, Jasjit S

    2016-04-01

    Psoriasis is an autoimmune skin disease with red and scaly plaques on the skin, affecting about 125 million people worldwide. Currently, dermatologists use visual and haptic methods to diagnose disease severity. This does not help them in stratification and risk assessment of the lesion stage and grade. Further, current methods add complexity during the monitoring and follow-up phase. The current diagnostic tools lead to subjectivity in decision making and are unreliable and laborious. This paper presents a first comparative performance study of its kind using a principal component analysis (PCA) based CADx system for psoriasis risk stratification and image classification utilizing: (i) 11 higher order spectra (HOS) features, (ii) 60 texture features, and (iii) 86 color features, and their seven combinations. A total of 540 image samples (270 healthy and 270 diseased) from 30 psoriasis patients of Indian ethnic origin are used in our database. Machine learning using PCA is used for dominant feature selection, which is then fed to a support vector machine (SVM) classifier to obtain optimized performance. Three different protocols are implemented using the three kinds of feature sets. A reliability index of the CADx is computed. Among all feature combinations, the CADx system shows optimal performance of 100% accuracy and 100% sensitivity and specificity when all three feature sets are combined. Further, our experimental results with increasing data size show that all feature combinations yield a high reliability index throughout the PCA-cutoffs except the color feature set and the combination of color and texture feature sets. HOS features are powerful in psoriasis disease classification and stratification. Although the HOS, texture, and color feature sets each perform competitively on their own, the machine learning system performs best when they are combined. The system is fully automated, reliable and accurate. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
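
    The PCA-plus-SVM backbone of such a CADx system can be expressed as a short scikit-learn pipeline. The sketch below uses synthetic stand-in data with 157 columns (11 HOS + 60 texture + 86 color features) and a variance-based PCA cutoff chosen for illustration; it is not the authors' tuned system.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in for 540 skin-image samples described by 157 combined features
X, y = make_classification(n_samples=540, n_features=157, n_informative=20,
                           random_state=0)

# PCA keeps the components explaining 95% of the variance (an illustrative cutoff)
cadx = make_pipeline(StandardScaler(), PCA(n_components=0.95), SVC(kernel="rbf"))
print(cross_val_score(cadx, X, y, cv=5, scoring="accuracy").mean())
```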

  13. Hierarchical learning architecture with automatic feature selection for multiclass protein fold classification.

    PubMed

    Huang, Chuen-Der; Lin, Chin-Teng; Pal, Nikhil Ranjan

    2003-12-01

    The structure classification of proteins plays a very important role in bioinformatics, since the relationships and characteristics among those known proteins can be exploited to predict the structure of new proteins. The success of a classification system depends heavily on two things: the tools being used and the features considered. For the bioinformatics applications, the role of appropriate features has not been paid adequate importance. In this investigation we use three novel ideas for multiclass protein fold classification. First, we use the gating neural network, where each input node is associated with a gate. This network can select important features in an online manner when the learning goes on. At the beginning of the training, all gates are almost closed, i.e., no feature is allowed to enter the network. Through the training, gates corresponding to good features are completely opened while gates corresponding to bad features are closed more tightly, and some gates may be partially open. The second novel idea is to use a hierarchical learning architecture (HLA). The classifier in the first level of HLA classifies the protein features into four major classes: all alpha, all beta, alpha + beta, and alpha/beta. And in the next level we have another set of classifiers, which further classifies the protein features into 27 folds. The third novel idea is to induce the indirect coding features from the amino-acid composition sequence of proteins based on the N-gram concept. This provides us with more representative and discriminative new local features of protein sequences for multiclass protein fold classification. The proposed HLA with new indirect coding features increases the protein fold classification accuracy by about 12%. Moreover, the gating neural network is found to reduce the number of features drastically. Using only half of the original features selected by the gating neural network can reach comparable test accuracy as that using all the original features. The gating mechanism also helps us to get a better insight into the folding process of proteins. For example, tracking the evolution of different gates we can find which characteristics (features) of the data are more important for the folding process. And, of course, it also reduces the computation time.
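
    The gating idea described above (a trainable gate on every input node that opens only for useful features) can be sketched as a small neural network in which each input is multiplied by a sigmoid-transformed gate parameter initialized near zero. This is a minimal PyTorch illustration of the concept, not the authors' gating network or hierarchical architecture; the feature and class counts are placeholders.

```python
import torch
import torch.nn as nn

class GatedMLP(nn.Module):
    """Each input feature passes through its own sigmoid gate; gates start
    nearly closed and open during training only for features that help."""
    def __init__(self, n_features, n_hidden, n_classes):
        super().__init__()
        self.gate_logits = nn.Parameter(torch.full((n_features,), -3.0))
        self.net = nn.Sequential(nn.Linear(n_features, n_hidden), nn.ReLU(),
                                 nn.Linear(n_hidden, n_classes))

    def forward(self, x):
        return self.net(x * torch.sigmoid(self.gate_logits))

# Toy forward/backward pass: 125 hypothetical protein features, 27 fold classes
model = GatedMLP(n_features=125, n_hidden=64, n_classes=27)
x = torch.randn(8, 125)
loss = nn.CrossEntropyLoss()(model(x), torch.randint(0, 27, (8,)))
loss.backward()   # gradients reach both the network weights and the gates
```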

  14. A primitive study of voxel feature generation by multiple stacked denoising autoencoders for detecting cerebral aneurysms on MRA

    NASA Astrophysics Data System (ADS)

    Nemoto, Mitsutaka; Hayashi, Naoto; Hanaoka, Shouhei; Nomura, Yukihiro; Miki, Soichiro; Yoshikawa, Takeharu; Ohtomo, Kuni

    2016-03-01

    The purpose of this study is to evaluate the feasibility of a novel feature generation approach, based on multiple deep neural networks (DNNs) with boosting, for computer-assisted detection (CADe). It is hard and time-consuming to optimize the hyperparameters for DNNs such as the stacked denoising autoencoder (SdA). The proposed method allows using SdA-based features without the burden of hyperparameter tuning. The proposed method was evaluated in an application for detecting cerebral aneurysms on magnetic resonance angiograms (MRA). A baseline CADe process included four components: scaling, candidate area limitation, candidate detection, and candidate classification. The proposed feature generation method was applied to extract the optimal features for candidate classification and only required setting the range of the hyperparameters for SdA. The optimal feature set was selected from a large quantity of SdA-based features produced by multiple SdAs, each of which was trained using a different hyperparameter set. The feature selection was performed with the AdaBoost ensemble learning method. Training of the baseline CADe process and the proposed feature generation was performed with 200 MRA cases, and the evaluation was performed with 100 MRA cases. The proposed method successfully provided SdA-based features given only the range of some hyperparameters for SdA. The CADe process using both the previous voxel features and the SdA-based features had the best performance, with an area under the ROC curve of 0.838 and an ANODE score of 0.312. The results showed that the proposed method was effective in the application of detecting cerebral aneurysms on MRA.

  15. Working memory for visual features and conjunctions in schizophrenia.

    PubMed

    Gold, James M; Wilk, Christopher M; McMahon, Robert P; Buchanan, Robert W; Luck, Steven J

    2003-02-01

    The visual working memory (WM) storage capacity of patients with schizophrenia was investigated using a change detection paradigm. Participants were presented with 2, 3, 4, or 6 colored bars with testing of both single feature (color, orientation) and feature conjunction conditions. Patients performed significantly worse than controls at all set sizes but demonstrated normal feature binding. Unlike controls, patient WM capacity declined at set size 6 relative to set size 4. Impairments with subcapacity arrays suggest a deficit in task set maintenance: Greater impairment for supercapacity set sizes suggests a deficit in the ability to selectively encode information for WM storage. Thus, the WM impairment in schizophrenia appears to be a consequence of attentional deficits rather than a reduction in storage capacity.

  16. X-ray EM simulation tool for ptychography dataset construction

    NASA Astrophysics Data System (ADS)

    Stoevelaar, L. Pjotr; Gerini, Giampiero

    2018-03-01

    In this paper, we present an electromagnetic full-wave modeling framework, as a support EM tool providing data sets for X-ray ptychographic imaging. Modeling the entire scattering problem with Finite Element Method (FEM) tools is, in fact, a prohibitive task, because of the large area illuminated by the beam (due to the poor focusing power at these wavelengths) and the very small features to be imaged. To overcome this problem, the spectrum of the illumination beam is decomposed into a discrete set of plane waves. This allows reducing the electromagnetic modeling volume to the one enclosing the area to be imaged. The total scattered field is reconstructed by superimposing the solutions for each plane wave illumination.

  17. Hierarchical Kohonen net for anomaly detection in network security.

    PubMed

    Sarasamma, Suseela T; Zhu, Qiuming A; Huff, Julie

    2005-04-01

    A novel multilevel hierarchical Kohonen Net (K-Map) for an intrusion detection system is presented. Each level of the hierarchical map is modeled as a simple winner-take-all K-Map. One significant advantage of this multilevel hierarchical K-Map is its computational efficiency. Unlike other statistical anomaly detection methods such as nearest neighbor approach, K-means clustering or probabilistic analysis that employ distance computation in the feature space to identify the outliers, our approach does not involve costly point-to-point computation in organizing the data into clusters. Another advantage is the reduced network size. We use the classification capability of the K-Map on selected dimensions of data set in detecting anomalies. Randomly selected subsets that contain both attacks and normal records from the KDD Cup 1999 benchmark data are used to train the hierarchical net. We use a confidence measure to label the clusters. Then we use the test set from the same KDD Cup 1999 benchmark to test the hierarchical net. We show that a hierarchical K-Map in which each layer operates on a small subset of the feature space is superior to a single-layer K-Map operating on the whole feature space in detecting a variety of attacks in terms of detection rate as well as false positive rate.

  18. Dimensionality reduction for the quantitative evaluation of a smartphone-based Timed Up and Go test.

    PubMed

    Palmerini, Luca; Mellone, Sabato; Rocchi, Laura; Chiari, Lorenzo

    2011-01-01

    The Timed Up and Go is a clinical test to assess mobility in the elderly and in Parkinson's disease. Lately instrumented versions of the test are being considered, where inertial sensors assess motion. To improve the pervasiveness, ease of use, and cost, we consider a smartphone's accelerometer as the measurement system. Several parameters (usually highly correlated) can be computed from the signals recorded during the test. To avoid redundancy and obtain the features that are most sensitive to the locomotor performance, a dimensionality reduction was performed through principal component analysis (PCA). Forty-nine healthy subjects of different ages were tested. PCA was performed to extract new features (principal components) which are not redundant combinations of the original parameters and account for most of the data variability. They can be useful for exploratory analysis and outlier detection. Then, a reduced set of the original parameters was selected through correlation analysis with the principal components. This set could be recommended for studies based on healthy adults. The proposed procedure could be used as a first-level feature selection in classification studies (i.e. healthy-Parkinson's disease, fallers-non fallers) and could allow, in the future, a complete system for movement analysis to be incorporated in a smartphone.

  19. When will low-contrast features be visible in a STEM X-ray spectrum image?

    DOE PAGES

    Parish, Chad M.

    2015-04-01

    When will a small or low-contrast feature, such as an embedded second-phase particle, be visible in a scanning transmission electron microscopy (STEM) X-ray map? This work illustrates a computationally inexpensive method to simulate X-ray maps and spectrum images (SIs), based upon the equations of X-ray generation and detection. To particularize the general procedure, an example of nanostructured ferritic alloy (NFA) containing nm-sized Y2Ti2O7 embedded precipitates in ferritic stainless steel matrix is chosen. The proposed model produces physically appearing simulated SI data sets, which can either be reduced to X-ray dot maps or analyzed via multivariate statistical analysis. Comparison to NFA X-ray maps acquired using three different STEM instruments match the generated simulations quite well, despite the large number of simplifying assumptions used. A figure of merit of electron dose multiplied by X-ray collection solid angle is proposed to compare feature detectability from one data set (simulated or experimental) to another. The proposed method can scope experiments that are feasible under specific analysis conditions on a given microscope. As a result, future applications, such as spallation proton–neutron irradiations, core-shell nanoparticles, or dopants in polycrystalline photovoltaic solar cells, are proposed.

  20. Discrimination between Alzheimer's Disease and Late Onset Bipolar Disorder Using Multivariate Analysis.

    PubMed

    Besga, Ariadna; Gonzalez, Itxaso; Echeburua, Enrique; Savio, Alexandre; Ayerdi, Borja; Chyzhyk, Darya; Madrigal, Jose L M; Leza, Juan C; Graña, Manuel; Gonzalez-Pinto, Ana Maria

    2015-01-01

    Late onset bipolar disorder (LOBD) is often difficult to distinguish from degenerative dementias, such as Alzheimer disease (AD), due to comorbidities and common cognitive symptoms. Moreover, LOBD prevalence in the elderly population is not negligible and it is increasing. Both pathologies share pathophysiological neuroinflammation features. Improvements in differential diagnosis of LOBD and AD will help to select the best personalized treatment. The aim of this study is to assess the relative significance of clinical observations, neuropsychological tests, and specific blood plasma biomarkers (inflammatory and neurotrophic), separately and combined, in the differential diagnosis of LOBD versus AD. This was carried out by evaluating the accuracy achieved by classification-based computer-aided diagnosis (CAD) systems based on these variables. A sample of healthy controls (HC) (n = 26), AD patients (n = 37), and LOBD patients (n = 32) was recruited at the Alava University Hospital. Clinical observations, neuropsychological tests, and plasma biomarkers were measured at recruitment time. We applied multivariate machine learning classification methods to discriminate subjects from HC, AD, and LOBD populations in the study. We analyzed, for each classification contrast, feature sets combining clinical observations, neuropsychological measures, and biological markers, including inflammation biomarkers. Furthermore, we analyzed reduced feature sets containing variables with significant differences determined by a Welch's t-test. In addition, a battery of classifier architectures was applied, encompassing linear and non-linear Support Vector Machines (SVM), Random Forests (RF), and Classification and regression trees (CART), and their performance was evaluated in a leave-one-out (LOO) cross-validation scheme. Post hoc analysis of the Gini index in CART classifiers provided a measure of each variable's importance. Welch's t-test found one biomarker (Malondialdehyde) with significant differences (p < 0.001) in the LOBD vs. AD contrast. Classification results with the best features are as follows: discrimination of HC vs. AD patients reaches accuracy 97.21% and AUC 98.17%. Discrimination of LOBD vs. AD patients reaches accuracy 90.26% and AUC 89.57%. Discrimination of HC vs. LOBD patients achieves accuracy 95.76% and AUC 88.46%. It is feasible to build CAD systems for differential diagnosis of LOBD and AD on the basis of a reduced set of clinical variables. Clinical observations provide the greatest discrimination. Neuropsychological tests are improved by the addition of biomarkers, and both contribute significantly to improve the overall predictive performance.
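
    The reduced feature sets mentioned above come from a univariate screen with Welch's t-test, which is straightforward to reproduce with SciPy. The sketch below uses synthetic stand-in data and an illustrative significance threshold; it is not the study's actual variable list.

```python
import numpy as np
from scipy.stats import ttest_ind

def welch_selected_features(X, y, alpha=0.001):
    """Indices of variables whose group means differ under Welch's t-test
    (unequal variances assumed), e.g. for an LOBD vs. AD contrast."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    _, pvals = ttest_ind(X[y == 0], X[y == 1], equal_var=False, axis=0)
    return np.where(pvals < alpha)[0]

rng = np.random.default_rng(0)
X = rng.standard_normal((69, 30))        # 37 + 32 subjects, 30 variables
y = np.array([0] * 37 + [1] * 32)
X[y == 1, 5] += 1.5                      # make one variable genuinely different
print(welch_selected_features(X, y))
```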

  1. Control Circuit For Two Stepping Motors

    NASA Technical Reports Server (NTRS)

    Ratliff, Roger; Rehmann, Kenneth; Backus, Charles

    1990-01-01

    Control circuit operates two independent stepping motors, one at a time. Provides following operating features: After selected motor stepped to chosen position, power turned off to reduce dissipation; Includes two up/down counters that remember at which one of eight steps each motor set. For selected motor, step indicated by illumination of one of eight light-emitting diodes (LED's) in ring; Selected motor advanced one step at time or repeatedly at rate controlled; Motor current - 30 mA at 90 degree positions, 60 mA at 45 degree positions - indicated by high or low intensity of LED that serves as motor-current monitor; Power-on reset feature provides trouble-free starts; To maintain synchronism between control circuit and motors, stepping of counters inhibited when motor power turned off.

  2. Evidence of tampering in watermark identification

    NASA Astrophysics Data System (ADS)

    McLauchlan, Lifford; Mehrübeoglu, Mehrübe

    2009-08-01

    In this work, watermarks are embedded in digital images in the discrete wavelet transform (DWT) domain. Principal component analysis (PCA) is performed on the DWT coefficients. Next, higher order statistics based on the principal components and the eigenvalues are determined for different sets of images. Feature sets are analyzed for different types of attacks in m-dimensional space. The results demonstrate the separability of the features for the tampered digital copies. Different feature sets are studied to determine more effective tamper-evident feature sets. In digital forensics, the probable manipulation(s) or modification(s) performed on the digital information can be identified using the described technique.

  3. Drug-target interaction prediction using ensemble learning and dimensionality reduction.

    PubMed

    Ezzat, Ali; Wu, Min; Li, Xiao-Li; Kwoh, Chee-Keong

    2017-10-01

    Experimental prediction of drug-target interactions is expensive, time-consuming and tedious. Fortunately, computational methods help narrow down the search space for interaction candidates to be further examined via wet-lab techniques. Nowadays, the number of attributes/features for drugs and targets, as well as the amount of their interactions, are increasing, making these computational methods inefficient or occasionally prohibitive. This motivates us to derive a reduced feature set for prediction. In addition, since ensemble learning techniques are widely used to improve the classification performance, it is also worthwhile to design an ensemble learning framework to enhance the performance for drug-target interaction prediction. In this paper, we propose a framework for drug-target interaction prediction leveraging both feature dimensionality reduction and ensemble learning. First, we conducted feature subspacing to inject diversity into the classifier ensemble. Second, we applied three different dimensionality reduction methods to the subspaced features. Third, we trained homogeneous base learners with the reduced features and then aggregated their scores to derive the final predictions. For base learners, we selected two classifiers, namely Decision Tree and Kernel Ridge Regression, resulting in two variants of ensemble models, EnsemDT and EnsemKRR, respectively. In our experiments, we utilized AUC (Area under ROC Curve) as an evaluation metric. We compared our proposed methods with various state-of-the-art methods under 5-fold cross validation. Experimental results showed EnsemKRR achieving the highest AUC (94.3%) for predicting drug-target interactions. In addition, dimensionality reduction helped improve the performance of EnsemDT. In conclusion, our proposed methods produced significant improvements for drug-target interaction prediction. Copyright © 2017 Elsevier Inc. All rights reserved.

  4. Event Recognition for Contactless Activity Monitoring Using Phase-Modulated Continuous Wave Radar.

    PubMed

    Forouzanfar, Mohamad; Mabrouk, Mohamed; Rajan, Sreeraman; Bolic, Miodrag; Dajani, Hilmi R; Groza, Voicu Z

    2017-02-01

    The use of remote sensing technologies such as radar is gaining popularity as a technique for contactless detection of physiological signals and analysis of human motion. This paper presents a methodology for classifying different events in a collection of phase modulated continuous wave radar returns. The primary application of interest is to monitor inmates, where the presence of human vital signs amidst different interferences needs to be identified. A comprehensive set of features is derived through time and frequency domain analyses of the radar returns. The Bhattacharyya distance is used to preselect the features with highest class separability as the possible candidate features for use in the classification process. The uncorrelated linear discriminant analysis is performed to decorrelate, denoise, and reduce the dimension of the candidate feature set. Linear and quadratic Bayesian classifiers are designed to distinguish breathing, different human motions, and nonhuman motions. The performance of these classifiers is evaluated on a pilot dataset of radar returns that contained different events including breathing, stopped breathing, simple human motions, and movement of fan and water. Our proposed pattern classification system achieved accuracies of up to 93% in stationary subject detection, 90% in stop-breathing detection, and 86% in interference detection. Our proposed radar pattern recognition system was able to accurately distinguish the predefined events amidst interferences. Besides inmate monitoring and suicide attempt detection, this work can be extended to other radar applications such as home-based monitoring of elderly people, apnea detection, and home occupancy detection.
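
    Feature preselection by Bhattacharyya distance can be sketched per feature under a univariate Gaussian assumption, ranking features by the distance between the two class-conditional distributions. This is a generic illustration with synthetic data, not the paper's feature set or exact estimator.

```python
import numpy as np

def bhattacharyya_distance_1d(x0, x1):
    """Univariate Gaussian Bhattacharyya distance between the class-conditional
    distributions of one feature; larger values mean better separability."""
    m0, m1 = np.mean(x0), np.mean(x1)
    v0, v1 = np.var(x0), np.var(x1)
    v = 0.5 * (v0 + v1)
    return 0.125 * (m0 - m1) ** 2 / v + 0.5 * np.log(v / np.sqrt(v0 * v1))

def preselect(X, y, k):
    scores = [bhattacharyya_distance_1d(X[y == 0, j], X[y == 1, j])
              for j in range(X.shape[1])]
    return np.argsort(scores)[::-1][:k]      # indices of the k most separable features

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 25))
y = rng.integers(0, 2, 200)
X[y == 1, 3] += 2.0                          # one genuinely discriminative feature
print(preselect(X, y, k=5))
```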

  5. Inefficient conjunction search made efficient by concurrent spoken delivery of target identity.

    PubMed

    Reali, Florencia; Spivey, Michael J; Tyler, Melinda J; Terranova, Joseph

    2006-08-01

    Visual search based on a conjunction of two features typically elicits reaction times that increase linearly as a function of the number of distractors, whereas search based on a single feature is essentially unaffected by set size. These and related findings have often been interpreted as evidence of a serial search stage that follows a parallel search stage. However, a wide range of studies has been showing a form of blending of these two processes. For example, when a spoken instruction identifies the conjunction target concurrently with the visual display, the effect of set size is significantly reduced, suggesting that incremental linguistic processing of the first feature adjective and then the second feature adjective may facilitate something approximating a parallel extraction of objects during search for the target. Here, we extend these results to a variety of experimental designs. First, we replicate the result with a mixed-trials design (ruling out potential strategies associated with the blocked design of the original study). Second, in a mixed-trials experiment, the order of adjective types in the spoken query varies randomly across conditions. In a third experiment, we extend the effect to a triple-conjunction search task. A fourth (control) experiment demonstrates that these effects are not due to an efficient odd-one-out search that ignores the linguistic input. This series of experiments, along with attractor-network simulations of the phenomena, provide further evidence toward understanding linguistically mediated influences in real-time visual search processing.

  6. Rough sets and Laplacian score based cost-sensitive feature selection

    PubMed Central

    Yu, Shenglong

    2018-01-01

    Cost-sensitive feature selection learning is an important preprocessing step in machine learning and data mining. Recently, most existing cost-sensitive feature selection algorithms are heuristic algorithms, which evaluate the importance of each feature individually and select features one by one. Obviously, these algorithms do not consider the relationship among features. In this paper, we propose a new algorithm for minimal cost feature selection called the rough sets and Laplacian score based cost-sensitive feature selection. The importance of each feature is evaluated by both rough sets and Laplacian score. Compared with heuristic algorithms, the proposed algorithm takes into consideration the relationship among features with locality preservation of Laplacian score. We select a feature subset with maximal feature importance and minimal cost when cost is undertaken in parallel, where the cost is given by three different distributions to simulate different applications. Different from existing cost-sensitive feature selection algorithms, our algorithm simultaneously selects out a predetermined number of “good” features. Extensive experimental results show that the approach is efficient and able to effectively obtain the minimum cost subset. In addition, the results of our method are more promising than the results of other cost-sensitive feature selection algorithms. PMID:29912884
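
    The Laplacian-score half of the method can be sketched as follows; the rough-set and cost terms of the paper are omitted, and the neighbourhood size and heat-kernel width are arbitrary illustrative choices. Lower scores indicate better locality preservation.

        import numpy as np
        from sklearn.neighbors import kneighbors_graph

        def laplacian_scores(X, n_neighbors=5, t=1.0):
            """Laplacian score per feature (lower = better locality preservation)."""
            # Heat-kernel affinity on a symmetrised k-NN graph
            W = kneighbors_graph(X, n_neighbors, mode="distance", include_self=False).toarray()
            S = np.where(W > 0, np.exp(-W ** 2 / t), 0.0)
            S = np.maximum(S, S.T)
            D = np.diag(S.sum(axis=1))
            L = D - S
            ones = np.ones(X.shape[0])
            scores = []
            for j in range(X.shape[1]):
                f = X[:, j]
                f_tilde = f - (f @ D @ ones) / (ones @ D @ ones) * ones
                scores.append((f_tilde @ L @ f_tilde) / (f_tilde @ D @ f_tilde + 1e-12))
            return np.array(scores)

        X = np.random.default_rng(0).normal(size=(100, 12))
        print(np.argsort(laplacian_scores(X))[:3])   # three best-ranked features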

  7. Rough sets and Laplacian score based cost-sensitive feature selection.

    PubMed

    Yu, Shenglong; Zhao, Hong

    2018-01-01

    Cost-sensitive feature selection learning is an important preprocessing step in machine learning and data mining. Recently, most existing cost-sensitive feature selection algorithms are heuristic algorithms, which evaluate the importance of each feature individually and select features one by one. Obviously, these algorithms do not consider the relationship among features. In this paper, we propose a new algorithm for minimal cost feature selection called the rough sets and Laplacian score based cost-sensitive feature selection. The importance of each feature is evaluated by both rough sets and Laplacian score. Compared with heuristic algorithms, the proposed algorithm takes into consideration the relationship among features with locality preservation of Laplacian score. We select a feature subset with maximal feature importance and minimal cost when cost is undertaken in parallel, where the cost is given by three different distributions to simulate different applications. Different from existing cost-sensitive feature selection algorithms, our algorithm simultaneously selects out a predetermined number of "good" features. Extensive experimental results show that the approach is efficient and able to effectively obtain the minimum cost subset. In addition, the results of our method are more promising than the results of other cost-sensitive feature selection algorithms.

  8. Reduced Set of Virulence Genes Allows High Accuracy Prediction of Bacterial Pathogenicity in Humans

    PubMed Central

    Iraola, Gregorio; Vazquez, Gustavo; Spangenberg, Lucía; Naya, Hugo

    2012-01-01

    Although there have been great advances in understanding bacterial pathogenesis, there is still a lack of integrative information about what makes a bacterium a human pathogen. The advent of high-throughput sequencing technologies has dramatically increased the amount of completed bacterial genomes, for both known human pathogenic and non-pathogenic strains; this information is now available to investigate genetic features that determine pathogenic phenotypes in bacteria. In this work we determined presence/absence patterns of different virulence-related genes among more than finished bacterial genomes from both human pathogenic and non-pathogenic strains, belonging to different taxonomic groups (i.e: Actinobacteria, Gammaproteobacteria, Firmicutes, etc.). An accuracy of 95% using a cross-fold validation scheme with in-fold feature selection is obtained when classifying human pathogens and non-pathogens. A reduced subset of highly informative genes () is presented and applied to an external validation set. The statistical model was implemented in the BacFier v1.0 software (freely available at ), that displays not only the prediction (pathogen/non-pathogen) and an associated probability for pathogenicity, but also the presence/absence vector for the analyzed genes, so it is possible to decipher the subset of virulence genes responsible for the classification on the analyzed genome. Furthermore, we discuss the biological relevance for bacterial pathogenesis of the core set of genes, corresponding to eight functional categories, all with evident and documented association with the phenotypes of interest. Also, we analyze which functional categories of virulence genes were more distinctive for pathogenicity in each taxonomic group, which seems to be a completely new kind of information and could lead to important evolutionary conclusions. PMID:22916122

  9. Detecting spam comments on Indonesia’s Instagram posts

    NASA Astrophysics Data System (ADS)

    Septiandri, Ali Akbar; Wibisono, Okiriza

    2017-01-01

    In this paper we experimented with several feature sets for detecting spam comments in social media contents authored by Indonesian public figures. We define spam comments as comments which have promotional purposes (e.g. referring other users to products and services) and thus not related to the content to which the comments are posted. Three sets of features are evaluated for detecting spams: (1) hand-engineered features such as comment length, number of capital letters, and number of emojis, (2) keyword features such as whether the comment contains advertising words or product-related words, and (3) text features, namely, bag-of-words, TF-IDF, and fastText embeddings, each combined with latent semantic analysis. With 24,000 manually-annotated comments scraped from Instagram posts authored by more than 100 Indonesian public figures, we compared the performance of these feature sets and their combinations using 3 popular classification algorithms: Naïve Bayes, SVM, and XGBoost. We find that using all three feature sets (with fastText embedding for the text features) gave the best F1-score of 0.9601 on a holdout dataset. More interestingly, fastText embedding combined with hand-engineered features (i.e. without keyword features) yields a similar F1-score of 0.9523, and McNemar's test failed to reject the null hypothesis of no difference between the two results. This result is important as keyword features are largely dependent on the dataset and may not be as generalisable as the other feature sets when applied to new data. For future work, we hope to collect a bigger and more diverse dataset of Indonesian spam comments, improve our model's performance and generalisability, and publish a programming package for others to reliably detect spam comments.
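
    A minimal sketch of the kind of pipeline described above: TF-IDF text features reduced with latent semantic analysis (truncated SVD), concatenated with a few hand-engineered features, and fed to a linear SVM. The toy comments, labels, and feature choices are placeholders, and XGBoost or naïve Bayes could be swapped in for the classifier.

        import numpy as np
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.decomposition import TruncatedSVD
        from sklearn.svm import LinearSVC
        from sklearn.pipeline import make_pipeline

        def hand_features(comments):
            """Simple hand-engineered features: length, capital letters, digit count."""
            return np.array([[len(c),
                              sum(ch.isupper() for ch in c),
                              sum(ch.isdigit() for ch in c)] for c in comments])

        comments = ["Cek produk murah di toko kami!!!",
                    "Keren banget fotonya kak",
                    "Promo besar, klik link di bio sekarang",
                    "Selamat ya atas prestasinya"]
        labels = [1, 0, 1, 0]                                  # 1 = spam, 0 = not spam (toy data)

        # Text branch: TF-IDF followed by latent semantic analysis (truncated SVD)
        text_branch = make_pipeline(TfidfVectorizer(), TruncatedSVD(n_components=2))
        X_text = text_branch.fit_transform(comments)
        X = np.hstack([X_text, hand_features(comments)])       # combine both feature sets

        clf = LinearSVC().fit(X, labels)
        print(clf.predict(X))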

  10. Contingent attentional capture across multiple feature dimensions in a temporal search task.

    PubMed

    Ito, Motohiro; Kawahara, Jun I

    2016-01-01

    The present study examined whether attention can be flexibly controlled to monitor two different feature dimensions (shape and color) in a temporal search task. Specifically, we investigated the occurrence of contingent attentional capture (i.e., interference from task-relevant distractors) and resulting set reconfiguration (i.e., enhancement of a single task-relevant set). If observers can restrict searches to a specific value for each relevant feature dimension independently, the capture and reconfiguration effect should only occur when the single relevant distractor in each dimension appears. Participants identified a target letter surrounded by a non-green square or a non-square green frame. The results revealed contingent attentional capture, as target identification accuracy was lower when the distractor contained a target-defining feature than when it contained a nontarget feature. Resulting set reconfiguration was also obtained in that accuracy was higher when the current target's feature (e.g., shape) corresponded to the defining feature of the present distractor (shape) than when the current target's feature did not match the distractor's feature (color). This enhancement was not due to perceptual priming. The present study demonstrated that the principles of contingent attentional capture and resulting set reconfiguration held even when multiple target feature dimensions were monitored. Copyright © 2015 Elsevier B.V. All rights reserved.

  11. New Features for Neuron Classification.

    PubMed

    Hernández-Pérez, Leonardo A; Delgado-Castillo, Duniel; Martín-Pérez, Rainer; Orozco-Morales, Rubén; Lorenzo-Ginori, Juan V

    2018-04-28

    This paper addresses the problem of obtaining new neuron features capable of improving results of neuron classification. Most studies on neuron classification using morphological features have been based on Euclidean geometry. Here, three one-dimensional (1D) time series are instead derived from the three-dimensional (3D) structure of the neuron, and a spatial time series is then constructed from which the features are calculated. Digitally reconstructed neurons were separated into control and pathological sets, which are related to three categories of alterations caused by epilepsy, Alzheimer's disease (long and local projections), and ischemia. These neuron sets were then subjected to supervised classification and the results were compared considering three sets of features: morphological features, features obtained from the time series, and a combination of both. The best results were obtained using features from the time series, which outperformed the classification using only morphological features, showing higher correct classification rates with differences of 5.15%, 3.75%, and 5.33% for epilepsy and Alzheimer's disease (long and local projections), respectively. The morphological features were better for the ischemia set with a difference of 3.05%. Features like variance, Spearman auto-correlation, partial auto-correlation, mutual information, local minima and maxima, all related to the time series, exhibited the best performance. We also compared different feature evaluators, among which ReliefF was ranked best.
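
    As an illustration of the kind of descriptors listed above, the sketch below computes a few of them (variance, Spearman rank autocorrelation, counts of local minima and maxima) from a single 1D trace; the lag and the function name are arbitrary choices, not the paper's exact definitions.

        import numpy as np
        from scipy.stats import spearmanr
        from scipy.signal import argrelextrema

        def series_features(f, lag=1):
            """A few time-series descriptors for a 1-D trace f."""
            f = np.asarray(f, dtype=float)
            variance = f.var()
            spearman_ac = spearmanr(f[:-lag], f[lag:]).correlation   # rank autocorrelation at `lag`
            n_maxima = len(argrelextrema(f, np.greater)[0])
            n_minima = len(argrelextrema(f, np.less)[0])
            return np.array([variance, spearman_ac, n_maxima, n_minima])

        trace = np.sin(np.linspace(0, 20, 200)) + 0.1 * np.random.default_rng(0).normal(size=200)
        print(series_features(trace))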

  12. The Current State and TRL Assessment of People Tracking Technology for Video Surveillance Applications

    DTIC Science & Technology

    2014-09-01

    the feature-space used to represent the target. Sometimes we trade off keeping information about one domain of the target in exchange for robustness... Kullback-Leibler distance), can be used as a similarity function between a candidate target and a template. This approach is invariant to changes in scale... basis vectors to adapt to appearance change and learns the visual information that the set of targets have in common, which is used to reduce the

  13. COSMOS: Carnegie Observatories System for MultiObject Spectroscopy

    NASA Astrophysics Data System (ADS)

    Oemler, A.; Clardy, K.; Kelson, D.; Walth, G.; Villanueva, E.

    2017-05-01

    COSMOS (Carnegie Observatories System for MultiObject Spectroscopy) reduces multislit spectra obtained with the IMACS and LDSS3 spectrographs on the Magellan Telescopes. It can be used for the quick-look analysis of data at the telescope as well as for pipeline reduction of large data sets. COSMOS is based on a precise optical model of the spectrographs, which allows (after alignment and calibration) an accurate prediction of the location of spectral features. This eliminates the line-search procedure that is fundamental to many spectral reduction programs and allows a robust data pipeline to be run in an almost fully automatic mode, so that large amounts of data can be reduced with minimal intervention.

  14. Selective Convolutional Descriptor Aggregation for Fine-Grained Image Retrieval.

    PubMed

    Wei, Xiu-Shen; Luo, Jian-Hao; Wu, Jianxin; Zhou, Zhi-Hua

    2017-06-01

    Deep convolutional neural network models pre-trained for the ImageNet classification task have been successfully adopted to tasks in other domains, such as texture description and object proposal generation, but these tasks require annotations for images in the new domain. In this paper, we focus on a novel and challenging task in the pure unsupervised setting: fine-grained image retrieval. Even with image labels, fine-grained images are difficult to classify, let alone the unsupervised retrieval task. We propose the selective convolutional descriptor aggregation (SCDA) method. The SCDA first localizes the main object in fine-grained images, a step that discards the noisy background and keeps useful deep descriptors. The selected descriptors are then aggregated and reduced to a short feature vector using the best practices we found. The SCDA is unsupervised, using no image label or bounding box annotation. Experiments on six fine-grained data sets confirm the effectiveness of the SCDA for fine-grained image retrieval. Besides, visualization of the SCDA features shows that they correspond to visual attributes (even subtle ones), which might explain SCDA's high mean average precision in fine-grained retrieval. Moreover, on general image retrieval data sets, the SCDA achieves retrieval results comparable with state-of-the-art general image retrieval approaches.

  15. Jedi Public Health: Co-creating an Identity-Safe Culture to Promote Health Equity.

    PubMed

    Geronimus, Arline T; James, Sherman A; Destin, Mesmin; Graham, Louis A; Hatzenbuehler, Mark; Murphy, Mary; Pearson, Jay A; Omari, Amel; Thompson, James Phillip

    2016-12-01

    The extent to which socially-assigned and culturally mediated social identity affects health depends on contingencies of social identity that vary across and within populations in day-to-day life. These contingencies are structurally rooted and health damaging inasmuch as they activate physiological stress responses. They also have adverse effects on cognition and emotion, undermining self-confidence and diminishing academic performance. This impact reduces opportunities for social mobility, while ensuring those who "beat the odds" pay a physical price for their positive efforts. Recent applications of social identity theory toward closing racial, ethnic, and gender academic achievement gaps through changing features of educational settings, rather than individual students, have proved fruitful. We sought to integrate this evidence with growing social epidemiological evidence that structurally-rooted biopsychosocial processes have population health effects. We explicate an emergent framework, Jedi Public Health (JPH). JPH focuses on changing features of settings in everyday life, rather than individuals, to promote population health equity, a high priority, yet, elusive national public health objective. We call for an expansion and, in some ways, a re-orienting of efforts to eliminate population health inequity. Policies and interventions to remove and replace discrediting cues in everyday settings hold promise for disrupting the repeated physiological stress process activation that fuels population health inequities with potentially wide application.

  16. ANALYSIS OF SAMPLING TECHNIQUES FOR IMBALANCED DATA: AN N=648 ADNI STUDY

    PubMed Central

    Dubey, Rashmi; Zhou, Jiayu; Wang, Yalin; Thompson, Paul M.; Ye, Jieping

    2013-01-01

    Many neuroimaging applications deal with imbalanced imaging data. For example, in Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset, the mild cognitive impairment (MCI) cases eligible for the study are nearly two times the Alzheimer’s disease (AD) patients for structural magnetic resonance imaging (MRI) modality and six times the control cases for proteomics modality. Constructing an accurate classifier from imbalanced data is a challenging task. Traditional classifiers that aim to maximize the overall prediction accuracy tend to classify all data into the majority class. In this paper, we study an ensemble system of feature selection and data sampling for the class imbalance problem. We systematically analyze various sampling techniques by examining the efficacy of different rates and types of undersampling, oversampling, and a combination of over and under sampling approaches. We thoroughly examine six widely used feature selection algorithms to identify significant biomarkers and thereby reduce the complexity of the data. The efficacy of the ensemble techniques is evaluated using two different classifiers including Random Forest and Support Vector Machines based on classification accuracy, area under the receiver operating characteristic curve (AUC), sensitivity, and specificity measures. Our extensive experimental results show that for various problem settings in ADNI, (1). a balanced training set obtained with K-Medoids technique based undersampling gives the best overall performance among different data sampling techniques and no sampling approach; and (2). sparse logistic regression with stability selection achieves competitive performance among various feature selection algorithms. Comprehensive experiments with various settings show that our proposed ensemble model of multiple undersampled datasets yields stable and promising results. PMID:24176869
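
    A minimal sketch of K-Medoids-style undersampling of the majority class: the majority samples are clustered and the sample nearest each cluster centre is kept as a medoid-like representative, giving a balanced training set. This approximates K-Medoids with scikit-learn's KMeans; the function name and parameters are illustrative rather than the study's exact procedure.

        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.metrics import pairwise_distances_argmin_min

        def medoid_undersample(X_major, n_samples, random_state=0):
            """Cluster the majority class and keep the sample closest to each centroid
            (a medoid-like representative). Duplicated picks are dropped, so the result
            may be slightly smaller than n_samples."""
            km = KMeans(n_clusters=n_samples, n_init=10, random_state=random_state).fit(X_major)
            idx, _ = pairwise_distances_argmin_min(km.cluster_centers_, X_major)
            return X_major[np.unique(idx)]

        rng = np.random.default_rng(0)
        X_major, X_minor = rng.normal(size=(600, 20)), rng.normal(size=(200, 20))
        X_major_balanced = medoid_undersample(X_major, n_samples=len(X_minor))
        print(X_major_balanced.shape)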

  17. Analysis of sampling techniques for imbalanced data: An n = 648 ADNI study.

    PubMed

    Dubey, Rashmi; Zhou, Jiayu; Wang, Yalin; Thompson, Paul M; Ye, Jieping

    2014-02-15

    Many neuroimaging applications deal with imbalanced imaging data. For example, in Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, the mild cognitive impairment (MCI) cases eligible for the study are nearly two times the Alzheimer's disease (AD) patients for structural magnetic resonance imaging (MRI) modality and six times the control cases for proteomics modality. Constructing an accurate classifier from imbalanced data is a challenging task. Traditional classifiers that aim to maximize the overall prediction accuracy tend to classify all data into the majority class. In this paper, we study an ensemble system of feature selection and data sampling for the class imbalance problem. We systematically analyze various sampling techniques by examining the efficacy of different rates and types of undersampling, oversampling, and a combination of over and undersampling approaches. We thoroughly examine six widely used feature selection algorithms to identify significant biomarkers and thereby reduce the complexity of the data. The efficacy of the ensemble techniques is evaluated using two different classifiers including Random Forest and Support Vector Machines based on classification accuracy, area under the receiver operating characteristic curve (AUC), sensitivity, and specificity measures. Our extensive experimental results show that for various problem settings in ADNI, (1) a balanced training set obtained with K-Medoids technique based undersampling gives the best overall performance among different data sampling techniques and no sampling approach; and (2) sparse logistic regression with stability selection achieves competitive performance among various feature selection algorithms. Comprehensive experiments with various settings show that our proposed ensemble model of multiple undersampled datasets yields stable and promising results. © 2013 Elsevier Inc. All rights reserved.

  18. Enhancing Critical Infrastructure and Key Resources (CIKR) Level-0 Physical Process Security Using Field Device Distinct Native Attribute Features

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lopez, Juan; Liefer, Nathan C.; Busho, Colin R.

    Here, the need for improved Critical Infrastructure and Key Resource (CIKR) security is unquestioned, yet there has been minimal emphasis on Level-0 (PHY Process) improvements. Wired Signal Distinct Native Attribute (WS-DNA) Fingerprinting is investigated here as a non-intrusive PHY-based security augmentation to support an envisioned layered security strategy. Results are based on experimental response collections from Highway Addressable Remote Transducer (HART) Differential Pressure Transmitter (DPT) devices from three manufacturers (Yokogawa, Honeywell, Endress+Hauser) installed in an automated process control system. Device discrimination is assessed using Time Domain (TD) and Slope-Based FSK (SB-FSK) fingerprints input to Multiple Discriminant Analysis, Maximum Likelihood (MDA/ML) and Random Forest (RndF) classifiers. For 12 different classes (two devices per manufacturer at two distinct set points), both classifiers performed reliably and achieved an arbitrary performance benchmark of average cross-class percent correct of %C > 90%. The least challenging cross-manufacturer results included near-perfect %C ≈ 100%, while the more challenging like-model (serial number) discrimination results included 90% < %C < 100%, with TD Fingerprinting marginally outperforming SB-FSK Fingerprinting; SB-FSK benefits from having less stringent response alignment and registration requirements. The RndF classifier was most beneficial and enabled reliable selection of dimensionally reduced fingerprint subsets that minimize data storage and computational requirements. The RndF-selected feature sets contained 15% of the full-dimensional feature sets and only suffered a worst-case %CΔ = 3% to 4% performance degradation.

  19. Bladder cancer treatment response assessment with radiomic, clinical, and radiologist semantic features

    NASA Astrophysics Data System (ADS)

    Gordon, Marshall N.; Cha, Kenny H.; Hadjiiski, Lubomir M.; Chan, Heang-Ping; Cohan, Richard H.; Caoili, Elaine M.; Paramagul, Chintana; Alva, Ajjai; Weizer, Alon Z.

    2018-02-01

    We are developing a decision support system for assisting clinicians in assessment of response to neoadjuvant chemotherapy for bladder cancer. Accurate treatment response assessment is crucial for identifying responders and improving quality of life for non-responders. An objective machine learning decision support system may help reduce variability and inaccuracy in treatment response assessment. We developed a predictive model to assess the likelihood that a patient will respond based on image and clinical features. With IRB approval, we retrospectively collected a data set of pre- and post-treatment CT scans along with clinical information from surgical pathology from 98 patients. A linear discriminant analysis (LDA) classifier was used to predict the likelihood that a patient would respond to treatment based on radiomic features extracted from CT urography (CTU), a radiologist's semantic feature, and a clinical feature extracted from surgical and pathology reports. The classification accuracy was evaluated using the area under the ROC curve (AUC) with a leave-one-case-out cross validation. The classification accuracy was compared for the systems based on radiomic features, the clinical feature, and the radiologist's semantic feature. For the system based on radiomic features only, the AUC was 0.75. With the addition of clinical information from examination under anesthesia (EUA) the AUC was improved to 0.78. Our study demonstrated the potential of designing a decision support system to assist in treatment response assessment. The combination of clinical features, radiologist semantic features and CTU radiomic features improved the performance of the classifier and the accuracy of treatment response assessment.
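
    The core evaluation loop described above (an LDA classifier scored with leave-one-case-out cross-validation and ROC AUC) might look like the following sketch; the feature matrix here is random placeholder data rather than actual CTU radiomic, semantic, or clinical features.

        import numpy as np
        from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
        from sklearn.model_selection import LeaveOneOut, cross_val_predict
        from sklearn.metrics import roc_auc_score

        # Toy stand-in: rows = patients, columns = candidate features
        rng = np.random.default_rng(0)
        X = rng.normal(size=(98, 10))
        y = rng.integers(0, 2, size=98)          # 1 = responder, 0 = non-responder (toy labels)

        lda = LinearDiscriminantAnalysis()
        scores = cross_val_predict(lda, X, y, cv=LeaveOneOut(), method="decision_function")
        print("leave-one-case-out AUC:", roc_auc_score(y, scores))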

  20. Fast H-DROP: A thirty times accelerated version of H-DROP for interactive SVM-based prediction of helical domain linkers

    NASA Astrophysics Data System (ADS)

    Richa, Tambi; Ide, Soichiro; Suzuki, Ryosuke; Ebina, Teppei; Kuroda, Yutaka

    2017-02-01

    Efficient and rapid prediction of domain regions from amino acid sequence information alone is often required for swift structural and functional characterization of large multi-domain proteins. Here we introduce Fast H-DROP, a thirty times accelerated version of our previously reported H-DROP (Helical Domain linker pRediction using OPtimal features), which is unique in specifically predicting helical domain linkers (boundaries). Fast H-DROP, analogously to H-DROP, uses optimum features selected from a set of 3000 by combining a random forest and a stepwise feature selection protocol. We reduced the computational time from 8.5 min per sequence in H-DROP to 14 s per sequence in Fast H-DROP on a Linux server with eight Xeon processors by using SWISS-PROT instead of the GenBank non-redundant (nr) database for generating the PSSMs. The sensitivity and precision of Fast H-DROP assessed by cross-validation were 33.7% and 36.2%, merely 2% lower than those of H-DROP. The reduced computational time of Fast H-DROP, achieved without affecting prediction performance, makes it more interactive and user-friendly. Fast H-DROP and H-DROP are freely available from http://domserv.lab.tuat.ac.jp/.
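
    The feature-optimisation idea (a random forest combined with stepwise selection) can be approximated with scikit-learn's SequentialFeatureSelector, as sketched below on synthetic data; the dataset, forest size, and number of selected features are placeholders, and the original protocol over PSSM-derived features may differ.

        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.feature_selection import SequentialFeatureSelector

        # Synthetic stand-in for a large pool of candidate features
        X, y = make_classification(n_samples=200, n_features=30, n_informative=8, random_state=0)

        rf = RandomForestClassifier(n_estimators=50, random_state=0)
        sfs = SequentialFeatureSelector(rf, n_features_to_select=8, direction="forward", cv=3)
        sfs.fit(X, y)
        print("selected feature indices:", sfs.get_support(indices=True))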

  1. Novel Mahalanobis-based feature selection improves one-class classification of early hepatocellular carcinoma.

    PubMed

    Thomaz, Ricardo de Lima; Carneiro, Pedro Cunha; Bonin, João Eliton; Macedo, Túlio Augusto Alves; Patrocinio, Ana Claudia; Soares, Alcimar Barbosa

    2018-05-01

    Detection of early hepatocellular carcinoma (HCC) can increase survival rates by up to 40%. One-class classifiers can be used for modeling early HCC in multidetector computed tomography (MDCT), but demand specific knowledge of the set of features that best describes the target class. Although the literature outlines several features for characterizing liver lesions, it is unclear which are most relevant for describing early HCC. In this paper, we introduce an unconstrained GA feature selection algorithm based on a multi-objective Mahalanobis fitness function to improve the classification performance for early HCC. We compared our approach to a constrained Mahalanobis function and two other unconstrained functions using Welch's t-test and Gaussian Data Descriptors. The performance of each fitness function was evaluated by cross-validating a one-class SVM. The results show that the proposed multi-objective Mahalanobis fitness function is capable of significantly reducing data dimensionality (96.4%) and improving one-class classification of early HCC (0.84 AUC). Furthermore, the results provide strong evidence that intensity features extracted at the arterial to portal and arterial to equilibrium phases are important for classifying early HCC.
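
    A rough sketch of the idea of scoring binary feature masks with a Mahalanobis-based, dimensionality-penalised fitness and evolving them: the exact multi-objective formulation of the paper is not reproduced, the mutation-plus-elitism loop is a toy stand-in for the unconstrained GA, and alpha and all data below are hypothetical.

        import numpy as np

        def mahalanobis_fitness(mask, X_target, X_outlier, alpha=0.01):
            """Reward separation of the outlier mean from the target class (Mahalanobis
            distance under the target covariance) and penalise subset size."""
            idx = np.flatnonzero(mask)
            if idx.size == 0:
                return -np.inf
            Xt, Xo = X_target[:, idx], X_outlier[:, idx]
            cov = np.cov(Xt, rowvar=False) + 1e-6 * np.eye(idx.size)
            diff = Xo.mean(axis=0) - Xt.mean(axis=0)
            return np.sqrt(diff @ np.linalg.solve(cov, diff)) - alpha * idx.size

        rng = np.random.default_rng(0)
        X_target, X_outlier = rng.normal(0, 1, (60, 30)), rng.normal(0.5, 1, (40, 30))

        # Toy evolutionary loop: mutate a population of masks and keep the fittest
        pop = rng.integers(0, 2, (20, 30)).astype(bool)
        for _ in range(50):
            children = pop ^ (rng.random(pop.shape) < 0.05)          # bit-flip mutation
            both = np.vstack([pop, children])
            fit = np.array([mahalanobis_fitness(m, X_target, X_outlier) for m in both])
            pop = both[np.argsort(fit)[::-1][:20]]                    # elitist selection
        print("best subset size:", pop[0].sum())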

  2. Cellular neural network-based hybrid approach toward automatic image registration

    NASA Astrophysics Data System (ADS)

    Arun, Pattathal VijayaKumar; Katiyar, Sunil Kumar

    2013-01-01

    Image registration is a key component of various image processing operations that involve the analysis of different image data sets. Automatic image registration domains have witnessed the application of many intelligent methodologies over the past decade; however, the inability to properly model object shape as well as contextual information has limited the attainable accuracy. A framework for accurate feature shape modeling and adaptive resampling using advanced techniques such as vector machines, cellular neural network (CNN), scale invariant feature transform (SIFT), coreset, and cellular automata is proposed. CNN has been found to be effective in improving the feature matching and resampling stages of registration, and the complexity of the approach has been considerably reduced using coreset optimization. The salient features of this work are cellular neural network approach-based SIFT feature point optimization, adaptive resampling, and intelligent object modelling. The developed methodology has been compared with contemporary methods using different statistical measures. Investigations over various satellite images revealed that considerable success was achieved with the approach. This system dynamically uses spectral and spatial information for representing contextual knowledge using a CNN-Prolog approach. This methodology is also shown to be effective in providing intelligent interpretation and adaptive resampling.

  3. Tongue Images Classification Based on Constrained High Dispersal Network.

    PubMed

    Meng, Dan; Cao, Guitao; Duan, Ye; Zhu, Minghua; Tu, Liping; Xu, Dong; Xu, Jiatuo

    2017-01-01

    Computer aided tongue diagnosis has a great potential to play important roles in traditional Chinese medicine (TCM). However, the majority of the existing tongue image analyses and classification methods are based on the low-level features, which may not provide a holistic view of the tongue. Inspired by deep convolutional neural network (CNN), we propose a novel feature extraction framework called constrained high dispersal neural networks (CHDNet) to extract unbiased features and reduce human labor for tongue diagnosis in TCM. Previous CNN models have mostly focused on learning convolutional filters and adapting weights between them, but these models have two major issues: redundancy and insufficient capability in handling unbalanced sample distribution. We introduce high dispersal and local response normalization operation to address the issue of redundancy. We also add multiscale feature analysis to avoid the problem of sensitivity to deformation. Our proposed CHDNet learns high-level features and provides more classification information during training time, which may result in higher accuracy when predicting testing samples. We tested the proposed method on a set of 267 gastritis patients and a control group of 48 healthy volunteers. Test results show that CHDNet is a promising method in tongue image classification for the TCM study.

  4. A novel design of dual-channel optical system of star-tracker based on non-blind area PAL system

    NASA Astrophysics Data System (ADS)

    Luo, Yujie; Bai, Jian

    2016-07-01

    The star-tracker plays an important role in satellite navigation. For satellites in near-Earth orbit, the system usually comprises two optical systems: one for observing the profile of the Earth and the other for capturing the positions of stars. In this paper, we demonstrate a novel dual-channel optical observation system for a star-tracker with a non-blind-area PAL imaging system based on a dichroic filter, which combines the two observation channels into an integrated structure and enables miniaturization. According to the practical usage of the star-tracker and the properties of the dichroic filter, we assign the ultraviolet band to the PAL channel to observe the Earth with an FOV of 40°-60°, and the visible band to the front imaging channel to capture distant stars with an FOV of 0°-20°. Consequently, the rays of both channels converge on the same image plane, improving the efficiency of the detector pixels and reducing the weight and size of the whole star-tracker system.

  5. Recent progress and development of a speedster-EXD: a new event-triggered hybrid CMOS x-ray detector

    NASA Astrophysics Data System (ADS)

    Griffith, Christopher V.; Falcone, Abraham D.; Prieskorn, Zachary R.; Burrows, David N.

    2015-08-01

    We present the characterization of a new event-driven X-ray hybrid CMOS detector developed by Penn State University in collaboration with Teledyne Imaging Sensors. Along with its low susceptibility to radiation damage, low power consumption, and fast readout time to avoid pile-up, the Speedster-EXD has been designed with the capability to limit its readout to only those pixels containing charge, thus enabling even faster effective frame rates. The threshold for the comparator in each pixel can be set by the user so that only pixels with signal above the set threshold are read out. The Speedster-EXD hybrid CMOS detector also has two new in-pixel features that reduce noise from known noise sources: (1) a low-noise, high-gain CTIA amplifier to eliminate crosstalk from interpixel capacitance (IPC) and (2) in-pixel CDS subtraction to reduce kTC noise. We present the read noise, dark current, IPC, energy resolution, and gain variation measurements of one Speedster-EXD detector.

  6. Simulation of lake ice and its effect on the late-Pleistocene evaporation rate of Lake Lahontan

    USGS Publications Warehouse

    Hostetler, S.W.

    1991-01-01

    A model of lake ice was coupled with a model of lake temperature and evaporation to assess the possible effect of ice cover on the late-Pleistocene evaporation rate of Lake Lahontan. The simulations were done using a data set based on proxy temperature indicators and features of the simulated late-Pleistocene atmospheric circulation over western North America. When a data set based on a mean-annual air temperature of 3 °C (7 °C colder than present) and reduced solar radiation from jet-stream induced cloud cover was used as input to the model, ice cover lasting ≈4 months was simulated. Simulated evaporation rates (490-527 mm a-1) were ≈60% lower than the present-day evaporation rate (1300 mm a-1) of Pyramid Lake. With this reduced rate of evaporation, water inputs similar to the 1983 historical maxima that occurred in the Lahontan basin would have been sufficient to maintain the 13.5 ka BP high stand of Lake Lahontan. © 1991 Springer-Verlag.

  7. Bayesian learning for spatial filtering in an EEG-based brain-computer interface.

    PubMed

    Zhang, Haihong; Yang, Huijuan; Guan, Cuntai

    2013-07-01

    Spatial filtering for EEG feature extraction and classification is an important tool in brain-computer interface. However, there is generally no established theory that links spatial filtering directly to Bayes classification error. To address this issue, this paper proposes and studies a Bayesian analysis theory for spatial filtering in relation to Bayes error. Following the maximum entropy principle, we introduce a gamma probability model for describing single-trial EEG power features. We then formulate and analyze the theoretical relationship between Bayes classification error and the so-called Rayleigh quotient, which is a function of spatial filters and basically measures the ratio in power features between two classes. This paper also reports our extensive study that examines the theory and its use in classification, using three publicly available EEG data sets and state-of-the-art spatial filtering techniques and various classifiers. Specifically, we validate the positive relationship between Bayes error and Rayleigh quotient in real EEG power features. Finally, we demonstrate that the Bayes error can be practically reduced by applying a new spatial filter with lower Rayleigh quotient.
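
    The Rayleigh quotient at the centre of the analysis is the ratio of band-power (covariance) energies between the two classes after spatial filtering. A minimal sketch of computing it, and of finding the filter that minimises it via a generalised eigendecomposition (as in CSP-style filtering), is given below with toy data; the shapes and names are assumptions, not the paper's code.

        import numpy as np
        from scipy.linalg import eigh

        def spatial_filter_min_rayleigh(X1, X2):
            """Return the spatial filter w minimising (w' C1 w) / (w' C2 w), where Ck is
            the average trial covariance of class k. Xk: (trials, channels, samples)."""
            def avg_cov(X):
                return np.mean([np.cov(trial) for trial in X], axis=0)
            C1, C2 = avg_cov(X1), avg_cov(X2)
            # Generalised eigenproblem C1 w = lambda C2 w; smallest lambda = smallest quotient
            vals, vecs = eigh(C1, C2)
            w = vecs[:, 0]
            rq = (w @ C1 @ w) / (w @ C2 @ w)
            return w, rq

        rng = np.random.default_rng(0)
        X1, X2 = rng.normal(size=(30, 8, 256)), rng.normal(size=(30, 8, 256))
        w, rq = spatial_filter_min_rayleigh(X1, X2)
        print("Rayleigh quotient of the learned filter:", rq)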

  8. CNNdel: Calling Structural Variations on Low Coverage Data Based on Convolutional Neural Networks

    PubMed Central

    2017-01-01

    Many structural variation (SV) detection methods have been proposed due to the popularization of next-generation sequencing (NGS). These SV calling methods use different SV-property-dependent features; however, they all suffer from poor accuracy when running on low coverage sequences. The union of results from these tools achieves fairly high sensitivity but still produces low accuracy on low coverage sequence data. That is, these methods produce many false positives. In this paper, we present CNNdel, an approach for calling deletions from paired-end reads. CNNdel gathers SV candidates reported by multiple tools and then extracts features from aligned BAM files at the positions of candidates. With labeled feature-expressed candidates as a training set, CNNdel trains convolutional neural networks (CNNs) to distinguish true unlabeled candidates from false ones. Results show that CNNdel works well with NGS reads from 26 low coverage genomes of the 1000 Genomes Project. The paper demonstrates that convolutional neural networks can automatically assign the priority of SV features and reduce the false positives effectively. PMID:28630866

  9. Learning better deep features for the prediction of occult invasive disease in ductal carcinoma in situ through transfer learning

    NASA Astrophysics Data System (ADS)

    Shi, Bibo; Hou, Rui; Mazurowski, Maciej A.; Grimm, Lars J.; Ren, Yinhao; Marks, Jeffrey R.; King, Lorraine M.; Maley, Carlo C.; Hwang, E. Shelley; Lo, Joseph Y.

    2018-02-01

    Purpose: To determine whether domain transfer learning can improve the performance of deep features extracted from digital mammograms using a pre-trained deep convolutional neural network (CNN) in the prediction of occult invasive disease for patients with ductal carcinoma in situ (DCIS) on core needle biopsy. Method: In this study, we collected digital mammography magnification views for 140 patients with DCIS at biopsy, 35 of which were subsequently upstaged to invasive cancer. We utilized a deep CNN model that was pre-trained on two natural image data sets (ImageNet and DTD) and one mammographic data set (INbreast) as the feature extractor, hypothesizing that these data sets are increasingly more similar to our target task and will lead to better representations of deep features to describe DCIS lesions. Through a statistical pooling strategy, three sets of deep features were extracted using the CNNs at different levels of convolutional layers from the lesion areas. A logistic regression classifier was then trained to predict which tumors contain occult invasive disease. The generalization performance was assessed and compared using repeated random sub-sampling validation and receiver operating characteristic (ROC) curve analysis. Result: The best performance of deep features was from the CNN model pre-trained on INbreast, and the proposed classifier using this set of deep features was able to achieve a median classification performance of ROC-AUC equal to 0.75, which is significantly better (p ≤ 0.05) than the performance of deep features extracted using the ImageNet data set (ROC-AUC = 0.68). Conclusion: Transfer learning is helpful for learning a better representation of deep features, and improves the prediction of occult invasive disease in DCIS.
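
    A minimal sketch of the generic transfer-learning step: an ImageNet-pretrained CNN used as a frozen feature extractor, with a logistic regression classifier trained on the pooled deep features. It uses torchvision's ResNet-18 (torchvision >= 0.13) and random tensors in place of mammogram patches; the paper's INbreast/DTD pre-training and magnification-view preprocessing are not reproduced here.

        import torch
        import torchvision
        import numpy as np
        from sklearn.linear_model import LogisticRegression

        # Pre-trained CNN used purely as a fixed feature extractor
        cnn = torchvision.models.resnet18(weights="IMAGENET1K_V1")   # downloads ImageNet weights
        extractor = torch.nn.Sequential(*list(cnn.children())[:-1]).eval()  # drop the final FC layer

        def deep_features(images):
            """images: float tensor of shape (N, 3, 224, 224), already normalised."""
            with torch.no_grad():
                f = extractor(images)                  # (N, 512, 1, 1) pooled features
            return f.flatten(1).numpy()

        # Toy stand-in for lesion patches and upstaging labels
        images = torch.randn(8, 3, 224, 224)
        labels = np.array([0, 1, 0, 0, 1, 0, 1, 0])

        X = deep_features(images)
        clf = LogisticRegression(max_iter=1000).fit(X, labels)       # occult-invasion classifier
        print(clf.predict_proba(X)[:, 1])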

  10. Real-Time Feature Tracking Using Homography

    NASA Technical Reports Server (NTRS)

    Clouse, Daniel S.; Cheng, Yang; Ansar, Adnan I.; Trotz, David C.; Padgett, Curtis W.

    2010-01-01

    This software finds feature point correspondences in sequences of images. It is designed for feature matching in aerial imagery. Feature matching is a fundamental step in a number of important image processing operations: calibrating the cameras in a camera array, stabilizing images in aerial movies, geo-registration of images, and generating high-fidelity surface maps from aerial movies. The method uses a Shi-Tomasi corner detector and normalized cross-correlation. This process is likely to produce some mismatches. The feature set is cleaned up using the assumption that there is a large planar patch visible in both images. At high altitude, this assumption is often reasonable. A mathematical transformation, called a homography, is developed that allows us to predict the position in image 2 of any point on the plane in image 1. Any feature pair that is inconsistent with the homography is thrown out. The output of the process is a set of feature pairs, and the homography. The algorithms in this innovation are well known, but the new implementation improves the process in several ways. It runs in real time at 2 Hz on 64-megapixel imagery. The new Shi-Tomasi corner detector tries to produce the requested number of features by automatically adjusting the minimum distance between found features. The homography-finding code now uses an implementation of the RANSAC algorithm that adjusts the number of iterations automatically to achieve a pre-set probability of missing a set of inliers. The new interface allows the caller to pass in a set of predetermined points in one of the images. This makes it possible to track the same set of points through multiple frames.
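
    The homography-based outlier rejection can be sketched with OpenCV's RANSAC-based estimator: given matched feature locations in two frames, inconsistent pairs are flagged by the inlier mask, and the homography predicts where a plane point in image 1 lands in image 2. The threshold value and function name are illustrative, not taken from the NASA implementation.

        import numpy as np
        import cv2

        def filter_matches_with_homography(pts1, pts2, ransac_thresh=3.0):
            """pts1, pts2: (N, 2) arrays of corresponding feature locations in two frames.
            Returns the estimated homography and a boolean mask of inlier pairs."""
            H, mask = cv2.findHomography(pts1.astype(np.float32),
                                         pts2.astype(np.float32),
                                         cv2.RANSAC, ransac_thresh)
            return H, mask.ravel().astype(bool)

        # A feature in image 1 can then be predicted in image 2 with cv2.perspectiveTransform:
        # pred = cv2.perspectiveTransform(pts1.reshape(-1, 1, 2).astype(np.float32), H)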

  11. What Top-Down Task Sets Do for Us: An ERP Study on the Benefits of Advance Preparation in Visual Search

    ERIC Educational Resources Information Center

    Eimer, Martin; Kiss, Monika; Nicholas, Susan

    2011-01-01

    When target-defining features are specified in advance, attentional target selection in visual search is controlled by preparatory top-down task sets. We used ERP measures to study voluntary target selection in the absence of such feature-specific task sets, and to compare it to selection that is guided by advance knowledge about target features.…

  12. The effect of feature selection methods on computer-aided detection of masses in mammograms

    NASA Astrophysics Data System (ADS)

    Hupse, Rianne; Karssemeijer, Nico

    2010-05-01

    In computer-aided diagnosis (CAD) research, feature selection methods are often used to improve generalization performance of classifiers and shorten computation times. In an application that detects malignant masses in mammograms, we investigated the effect of using a selection criterion that is similar to the final performance measure we are optimizing, namely the mean sensitivity of the system in a predefined range of the free-response receiver operating characteristics (FROC). To obtain the generalization performance of the selected feature subsets, a cross validation procedure was performed on a dataset containing 351 abnormal and 7879 normal regions, each region providing a set of 71 mass features. The same number of noise features, not containing any information, were added to investigate the ability of the feature selection algorithms to distinguish between useful and non-useful features. It was found that significantly higher performances were obtained using feature sets selected by the general test statistic Wilks' lambda than using feature sets selected by the more specific FROC measure. Feature selection leads to better performance when compared to a system in which all features were used.

  13. Military personnel recognition system using texture, colour, and SURF features

    NASA Astrophysics Data System (ADS)

    Irhebhude, Martins E.; Edirisinghe, Eran A.

    2014-06-01

    This paper presents an automatic, machine vision based, military personnel identification and classification system. Classification is done using a Support Vector Machine (SVM) on sets of Army, Air Force and Navy camouflage uniform personnel datasets. In the proposed system, the arm of service of personnel is recognised by the camouflage of a person's uniform, the type of cap, and the type of badge/logo. The detailed analyses include: camouflage-cap and plain-cap differentiation using gray-level co-occurrence matrix (GLCM) texture features; classification of Army, Air Force, and Navy camouflaged uniforms using GLCM texture and colour histogram bin features; and plain-cap badge classification into Army, Air Force, and Navy using Speeded Up Robust Features (SURF). The proposed method recognised the arm of service of camouflaged personnel on sets of data retrieved from Google Images and selected military websites. Correlation-based Feature Selection (CFS) was used to improve recognition and reduce dimensionality, thereby speeding the classification process. Success rates recorded during the analysis include 93.8% for the camouflage appearance category and 100%, 90%, and 100% on the plain-cap and camouflage-cap categories for Army, Air Force, and Navy, respectively. Accurate recognition was recorded using SURF for the plain-cap badge category. Substantial analysis has been carried out and the results show that the proposed method can correctly classify military personnel into various arms of service. We show that the proposed method can be integrated into a face recognition system, which will recognise personnel in addition to determining the arm of service to which they belong. Such a system can be used to enhance the security of a military base or facility.
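
    The GLCM texture step can be sketched with scikit-image (graycomatrix/graycoprops, scikit-image >= 0.19); the chosen distances, angles, and properties below are illustrative rather than the exact settings used in the paper.

        import numpy as np
        from skimage.feature import graycomatrix, graycoprops   # scikit-image >= 0.19

        def glcm_features(gray_patch, distances=(1,), angles=(0, np.pi / 2)):
            """GLCM texture descriptors for an 8-bit grayscale patch (e.g. a cap/uniform region)."""
            glcm = graycomatrix(gray_patch, distances=distances, angles=angles,
                                levels=256, symmetric=True, normed=True)
            props = ["contrast", "homogeneity", "energy", "correlation"]
            return np.hstack([graycoprops(glcm, p).ravel() for p in props])

        patch = (np.random.default_rng(0).random((64, 64)) * 255).astype(np.uint8)
        print(glcm_features(patch))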

  14. EEG-based recognition of video-induced emotions: selecting subject-independent feature set.

    PubMed

    Kortelainen, Jukka; Seppänen, Tapio

    2013-01-01

    Emotions are fundamental for everyday life affecting our communication, learning, perception, and decision making. Including emotions in human-computer interaction (HCI) could be seen as a significant step forward, offering great potential for developing advanced future technologies. Because the electrical activity of the brain is affected by emotions, the electroencephalogram (EEG) offers an interesting channel for improving HCI. In this paper, the selection of a subject-independent feature set for EEG-based emotion recognition is studied. We investigate the effect of different feature sets in classifying person's arousal and valence while watching videos with emotional content. The classification performance is optimized by applying a sequential forward floating search algorithm for feature selection. The best classification rate (65.1% for arousal and 63.0% for valence) is obtained with a feature set containing power spectral features from the frequency band of 1-32 Hz. The proposed approach substantially improves the classification rate reported in the literature. In the future, further analysis of the video-induced EEG changes, including topographical differences in the spectral features, is needed.
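
    A sketch of extracting power spectral features in the 1-32 Hz range from multichannel EEG using Welch's method; the band edges, sampling rate, and window length are placeholder choices, and the subsequent floating feature selection step is not shown.

        import numpy as np
        from scipy.signal import welch

        def band_power_features(eeg, fs=128.0, bands=((1, 4), (4, 8), (8, 13), (13, 32))):
            """Band-power features per channel over 1-32 Hz. eeg: (channels, samples)."""
            freqs, psd = welch(eeg, fs=fs, nperseg=int(2 * fs), axis=-1)
            feats = []
            for lo, hi in bands:
                band = (freqs >= lo) & (freqs < hi)
                feats.append(np.trapz(psd[:, band], freqs[band], axis=-1))  # integrate PSD in band
            return np.concatenate(feats)

        eeg = np.random.default_rng(0).normal(size=(14, 1280))   # 14 channels, 10 s at 128 Hz (toy)
        print(band_power_features(eeg).shape)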

  15. BSIFT: toward data-independent codebook for large scale image search.

    PubMed

    Zhou, Wengang; Li, Houqiang; Hong, Richang; Lu, Yijuan; Tian, Qi

    2015-03-01

    Bag-of-Words (BoWs) model based on Scale Invariant Feature Transform (SIFT) has been widely used in large-scale image retrieval applications. Feature quantization by vector quantization plays a crucial role in the BoW model, which generates visual words from the high-dimensional SIFT features so as to adapt to the inverted file structure for scalable retrieval. Traditional feature quantization approaches suffer from several issues, such as the necessity of visual codebook training, limited reliability, and update inefficiency. To avoid the above problems, in this paper, a novel feature quantization scheme is proposed to efficiently quantize each SIFT descriptor to a descriptive and discriminative bit-vector, which is called binary SIFT (BSIFT). Our quantizer is independent of image collections. In addition, by taking the first 32 bits of BSIFT as the code word, the generated BSIFT naturally lends itself to the classic inverted file structure for image indexing. Moreover, the quantization error is reduced by feature filtering, code word expansion, and query sensitive mask shielding. Without any explicit codebook for quantization, our approach can be readily applied in image search in some resource-limited scenarios. We evaluate the proposed algorithm for large-scale image search on two public image data sets. Experimental results demonstrate the index efficiency and retrieval accuracy of our approach.
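
    One simple data-independent binarisation in the spirit of BSIFT is sketched below: each SIFT dimension is thresholded against the descriptor's own median, and the first 32 bits are packed into an integer code word for the inverted file. The thresholding rule here is an illustrative choice and may differ from the paper's exact scheme.

        import numpy as np

        def binarize_sift(descriptors):
            """Data-independent binarisation: threshold each of the 128 SIFT dimensions
            against the descriptor's own median value, giving one bit per dimension."""
            med = np.median(descriptors, axis=1, keepdims=True)
            return (descriptors > med).astype(np.uint8)          # (N, 128) bit-vectors

        def code_words(bits, n_bits=32):
            """Pack the first n_bits of each bit-vector into an integer 'visual word';
            the remaining bits can be kept for verification."""
            weights = 1 << np.arange(n_bits, dtype=np.uint64)
            return bits[:, :n_bits].astype(np.uint64) @ weights

        sift = np.random.default_rng(0).random((5, 128)) * 255   # toy SIFT descriptors
        print(code_words(binarize_sift(sift)))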

  16. A Novel Feature Selection Technique for Text Classification Using Naïve Bayes.

    PubMed

    Dey Sarkar, Subhajit; Goswami, Saptarsi; Agarwal, Aman; Aktar, Javed

    2014-01-01

    With the proliferation of unstructured data, text classification or text categorization has found many applications in topic classification, sentiment analysis, authorship identification, spam detection, and so on. There are many classification algorithms available. Naïve Bayes remains one of the oldest and most popular classifiers. On the one hand, implementation of naïve Bayes is simple; on the other hand, it requires relatively little training data. From the literature review, it is found that naïve Bayes performs poorly compared to other classifiers in text classification. This makes the naïve Bayes classifier unusable in spite of the simplicity and intuitiveness of the model. In this paper, we propose a two-step feature selection method: first a univariate feature selection and then feature clustering, where the univariate step reduces the search space and clustering then selects relatively independent feature sets. We demonstrate the effectiveness of our method by a thorough evaluation and comparison over 13 datasets. The performance improvement thus achieved makes naïve Bayes comparable or superior to other classifiers. The proposed algorithm is shown to outperform other traditional methods such as greedy-search-based wrappers or CFS.
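
    A minimal sketch of the two-step idea: a univariate chi-squared filter first shrinks the vocabulary, the surviving features are then clustered and one representative is kept per cluster, and a naïve Bayes model is trained on the result. The toy count matrix, cluster count, and representative-picking rule are placeholders, not the paper's exact criteria.

        import numpy as np
        from sklearn.feature_selection import SelectKBest, chi2
        from sklearn.cluster import AgglomerativeClustering
        from sklearn.naive_bayes import MultinomialNB

        rng = np.random.default_rng(0)
        X = rng.poisson(1.0, size=(200, 500)).astype(float)   # toy term-count matrix
        y = rng.integers(0, 2, size=200)

        # Step 1: univariate filter to reduce the search space
        skb = SelectKBest(chi2, k=100).fit(X, y)
        X_uni = skb.transform(X)

        # Step 2: cluster the surviving features and keep one representative per cluster,
        # so the final set is relatively independent
        clusters = AgglomerativeClustering(n_clusters=20).fit_predict(X_uni.T)
        reps = [np.flatnonzero(clusters == c)[0] for c in range(20)]
        X_final = X_uni[:, reps]

        print(MultinomialNB().fit(X_final, y).score(X_final, y))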

  17. Using listener-based perceptual features as intermediate representations in music information retrieval.

    PubMed

    Friberg, Anders; Schoonderwaldt, Erwin; Hedblad, Anton; Fabiani, Marco; Elowsson, Anders

    2014-10-01

    The notion of perceptual features is introduced for describing general music properties based on human perception. This is an attempt at rethinking the concept of features, aiming to approach the underlying human perception mechanisms. Instead of using concepts from music theory such as tones, pitches, and chords, a set of nine features describing overall properties of the music was selected. They were chosen from qualitative measures used in psychology studies and motivated from an ecological approach. The perceptual features were rated in two listening experiments using two different data sets. They were modeled both from symbolic and audio data using different sets of computational features. Ratings of emotional expression were predicted using the perceptual features. The results indicate that (1) at least some of the perceptual features are reliable estimates; (2) emotion ratings could be predicted by a small combination of perceptual features with an explained variance from 75% to 93% for the emotional dimensions activity and valence; (3) the perceptual features could only to a limited extent be modeled using existing audio features. Results clearly indicated that a small number of dedicated features were superior to a "brute force" model using a large number of general audio features.

  18. Using statistical text classification to identify health information technology incidents

    PubMed Central

    Chai, Kevin E K; Anthony, Stephen; Coiera, Enrico; Magrabi, Farah

    2013-01-01

    Objective: To examine the feasibility of using statistical text classification to automatically identify health information technology (HIT) incidents in the USA Food and Drug Administration (FDA) Manufacturer and User Facility Device Experience (MAUDE) database. Design: We used a subset of 570 272 incidents including 1534 HIT incidents reported to MAUDE between 1 January 2008 and 1 July 2010. Text classifiers using regularized logistic regression were evaluated with both ‘balanced’ (50% HIT) and ‘stratified’ (0.297% HIT) datasets for training, validation, and testing. Dataset preparation, feature extraction, feature selection, cross-validation, classification, performance evaluation, and error analysis were performed iteratively to further improve the classifiers. Feature-selection techniques such as removing short words and stop words, stemming, lemmatization, and principal component analysis were examined. Measurements: κ statistic, F1 score, precision and recall. Results: Classification performance was similar on both the stratified (0.954 F1 score) and balanced (0.995 F1 score) datasets. Stemming was the most effective technique, reducing the feature set size to 79% while maintaining comparable performance. Training with balanced datasets improved recall (0.989) but reduced precision (0.165). Conclusions: Statistical text classification appears to be a feasible method for identifying HIT reports within large databases of incidents. Automated identification should enable more HIT problems to be detected, analyzed, and addressed in a timely manner. Semi-supervised learning may be necessary when applying machine learning to big data analysis of patient safety incidents and requires further investigation. PMID:23666777
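
    A sketch of the text-classification setup with stemming-based feature reduction: a Porter stemmer inside the vectoriser's analyzer, very short words dropped, and a regularised logistic regression classifier. The toy reports and labels are invented, and NLTK's Porter stemmer stands in for whatever stemmer the study used.

        import re
        from nltk.stem import PorterStemmer
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline

        stemmer = PorterStemmer()

        def stemmed_tokens(text):
            """Tokenise, drop very short words, and stem, mirroring the kind of
            feature-set reduction described above."""
            return [stemmer.stem(t) for t in re.findall(r"[a-z]+", text.lower()) if len(t) > 2]

        reports = ["Interface froze and the order entry screen displayed the wrong patient",
                   "Battery depleted earlier than the labelled service life"]
        labels = [1, 0]                                    # 1 = HIT incident, 0 = other (toy)

        clf = make_pipeline(TfidfVectorizer(analyzer=stemmed_tokens),
                            LogisticRegression(C=1.0, max_iter=1000))   # regularised LR
        clf.fit(reports, labels)
        print(clf.predict(reports))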

  19. Systems and Methods for Correcting Optical Reflectance Measurements

    NASA Technical Reports Server (NTRS)

    Yang, Ye (Inventor); Shear, Michael A. (Inventor); Soller, Babs R. (Inventor); Soyemi, Olusola O. (Inventor)

    2014-01-01

    We disclose measurement systems and methods for measuring analytes in target regions of samples that also include features overlying the target regions. The systems include: (a) a light source; (b) a detection system; (c) a set of at least first, second, and third light ports which transmit light from the light source to a sample and receive and direct light reflected from the sample to the detection system, generating a first set of data including information corresponding to both an internal target within the sample and features overlying the internal target, and a second set of data including information corresponding to features overlying the internal target; and (d) a processor configured to remove information characteristic of the overlying features from the first set of data using the first and second sets of data to produce corrected information representing the internal target.

  20. Systems and methods for correcting optical reflectance measurements

    NASA Technical Reports Server (NTRS)

    Yang, Ye (Inventor); Soller, Babs R. (Inventor); Soyemi, Olusola O. (Inventor); Shear, Michael A. (Inventor)

    2009-01-01

    We disclose measurement systems and methods for measuring analytes in target regions of samples that also include features overlying the target regions. The systems include: (a) a light source; (b) a detection system; (c) a set of at least first, second, and third light ports which transmit light from the light source to a sample and receive and direct light reflected from the sample to the detection system, generating a first set of data including information corresponding to both an internal target within the sample and features overlying the internal target, and a second set of data including information corresponding to features overlying the internal target; and (d) a processor configured to remove information characteristic of the overlying features from the first set of data using the first and second sets of data to produce corrected information representing the internal target.

  1. A framework for feature extraction from hospital medical data with applications in risk prediction.

    PubMed

    Tran, Truyen; Luo, Wei; Phung, Dinh; Gupta, Sunil; Rana, Santu; Kennedy, Richard Lee; Larkins, Ann; Venkatesh, Svetha

    2014-12-30

    Feature engineering is a time consuming component of predictive modeling. We propose a versatile platform to automatically extract features for risk prediction, based on a pre-defined and extensible entity schema. The extraction is independent of disease type or risk prediction task. We contrast auto-extracted features to baselines generated from the Elixhauser comorbidities. Hospital medical records were transformed into event sequences, to which filters were applied to extract feature sets capturing diversity in temporal scales and data types. The features were evaluated on a readmission prediction task, comparing with baseline feature sets generated from the Elixhauser comorbidities. The prediction model was logistic regression with elastic net regularization. Prediction horizons of 1, 2, 3, 6, and 12 months were considered for four diverse diseases: diabetes, COPD, mental disorders and pneumonia, with derivation and validation cohorts defined on non-overlapping data-collection periods. For unplanned readmissions, the auto-extracted feature set using socio-demographic information and medical records outperformed baselines derived from the socio-demographic information and Elixhauser comorbidities over 20 settings (5 prediction horizons across 4 diseases). In particular over 30-day prediction, the AUCs are: COPD-baseline: 0.60 (95% CI: 0.57, 0.63), auto-extracted: 0.67 (0.64, 0.70); diabetes-baseline: 0.60 (0.58, 0.63), auto-extracted: 0.67 (0.64, 0.69); mental disorders-baseline: 0.57 (0.54, 0.60), auto-extracted: 0.69 (0.64,0.70); pneumonia-baseline: 0.61 (0.59, 0.63), auto-extracted: 0.70 (0.67, 0.72). The advantages of automatically extracting standard features from complex medical records in a disease- and task-agnostic manner were demonstrated. Auto-extracted features have good predictive power over multiple time horizons. Such feature sets have potential to form the foundation of complex automated analytic tasks.
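
    The prediction model itself (logistic regression with elastic-net regularisation) can be sketched with scikit-learn as below; the event-count matrix, labels, and hyperparameters are placeholders for the auto-extracted features and readmission outcomes described above.

        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import roc_auc_score
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(0)
        X = rng.poisson(0.3, size=(1000, 200)).astype(float)   # toy event-count features
        y = rng.integers(0, 2, size=1000)                      # toy 30-day readmission labels

        X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=0)
        clf = LogisticRegression(penalty="elasticnet", solver="saga",
                                 l1_ratio=0.5, C=1.0, max_iter=5000)   # elastic-net LR
        clf.fit(X_tr, y_tr)
        print("validation AUC:", roc_auc_score(y_va, clf.predict_proba(X_va)[:, 1]))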

  2. Ensemble methods with simple features for document zone classification

    NASA Astrophysics Data System (ADS)

    Obafemi-Ajayi, Tayo; Agam, Gady; Xie, Bingqing

    2012-01-01

    Document layout analysis is of fundamental importance for document image understanding and information retrieval. It requires the identification of blocks extracted from a document image via feature extraction and block classification. In this paper, we focus on the classification of the extracted blocks into five classes: text (machine printed), handwriting, graphics, images, and noise. We propose a new set of features for efficient classification of these blocks. We present a comparative evaluation of three ensemble-based classification algorithms (boosting, bagging, and combined model trees) in addition to other known learning algorithms. Experimental results are demonstrated for a set of 36,503 zones extracted from 416 document images that were randomly selected from the tobacco legacy document collection. The results obtained verify the robustness and effectiveness of the proposed set of features in comparison to the commonly used Ocropus recognition features. When used in conjunction with the Ocropus feature set, we further improve the performance of the block classification system to obtain a classification accuracy of 99.21%.
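
    The following hedged sketch illustrates the kind of ensemble comparison described above, pitting bagging and two boosting variants against each other on synthetic stand-in features; it is not the authors' zone-classification system or data.

```python
# Hedged sketch: comparing bagging and boosting ensembles on a
# five-class problem that loosely mirrors the zone categories.
from sklearn.datasets import make_classification
from sklearn.ensemble import (BaggingClassifier, AdaBoostClassifier,
                              GradientBoostingClassifier)
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=3000, n_features=25, n_informative=15,
                           n_classes=5, n_clusters_per_class=1, random_state=0)

models = {
    "bagging": BaggingClassifier(n_estimators=100, random_state=0),
    "adaboost": AdaBoostClassifier(n_estimators=200, random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name:18s} accuracy: {acc:.3f}")
```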

  3. Aeromagnetic data in the UK: a study of the information content of baseline and modern surveys across Anglesey, North Wales

    NASA Astrophysics Data System (ADS)

    Beamish, David; White, James C.

    2011-01-01

    A number of modern, multiparameter, high resolution airborne geophysical surveys (termed HiRES) have been conducted over the past decade across onshore UK. These were undertaken, in part, as a response to the limited resolution of the existing UK national baseline magnetic survey data set acquired in the late 1950s and early 1960s. Modern magnetic survey data, obtained with higher precision and reduced line spacing and elevation, provide an improved data set; however, the distinctions between the two available resources, existing and new, are rarely quantified. In this contribution we demonstrate and quantify the improvements that can be anticipated using the new data. The information content of the data sets is examined using a series of modern processing and modelling procedures that provide a full assessment of their resolution capabilities. The framework for the study involves two components. The first relates to the definition of the shallow magnetic structure in relation to an ongoing 1:10 k and 1:50 k geological map revision. The second component relates to the performance of the datasets in defining maps of magnetic basement and assisting with larger scale geological and structural interpretation. One of the smaller HiRES survey areas, the island of Anglesey (Ynys Môn), off the coast of NW Wales, is used to provide a series of comparative studies. The geological setting here is both complex and debated, and cultural interference is prevalent in the low altitude modern survey data. It is demonstrated that successful processing and interpretation can be carried out on data that have not been systematically corrected (decultured) for non-geological perturbations. Across the survey area a large number of near-surface magnetic features are evident and are dominated by a reversely magnetized Palaeogene dyke swarm that extends offshore. The average depth to the upper surfaces of the dykes is found to be 44 m. The existing baseline data are necessarily limited in resolving features <1 km in scale; however, a detailed comparison of the existing and new data reveals the extent to which these quasi-linear features can be resolved and mapped. The precise limitations of the baseline data in terms of detection, location and estimated depth are quantified. The spectral content of both data sets is examined and the longest wavelength information is extracted to estimate the resolution of magnetic basement features in the two data sets. A significant finding is the lack of information in the baseline data set across wavelengths between 1 and ~10 km. Here the HiRES data provide a detailed mapping of shallow magnetic basement features (1-3 km) that display a relevance to current understanding of the fault-bounded terranes that cross the survey area. Equally, the compact scale of the modern survey does not provide deeper (>3 km to upper surface) assessments of magnetic basement. This further assessment is successfully provided by the larger scale baseline data which locates and defines a mid-crustal magnetic basement feature, centred beneath the Snowdon Massif, and illustrates that basement of similar characteristic extends beneath much of Anglesey.

  4. Learning to Detect Vandalism in Social Content Systems: A Study on Wikipedia

    NASA Astrophysics Data System (ADS)

    Javanmardi, Sara; McDonald, David W.; Caruana, Rich; Forouzan, Sholeh; Lopes, Cristina V.

    A challenge facing user generated content systems is vandalism, i.e. edits that damage content quality. The high visibility and easy access to social networks make them popular targets for vandals. Detecting and removing vandalism is critical for these user generated content systems. Because vandalism can take many forms, there are many different kinds of features that are potentially useful for detecting it. The complex nature of vandalism, and the large number of potential features, make vandalism detection difficult and time consuming for human editors. Machine learning techniques hold promise for developing accurate, tunable, and maintainable models that can be incorporated into vandalism detection tools. We describe a method for training classifiers for vandalism detection that yields classifiers that are more accurate on the PAN 2010 corpus than others previously developed. Because of the high turnaround in social network systems, it is important for vandalism detection tools to run in real-time. To this end, we use feature selection to find the minimal set of features consistent with high accuracy. In addition, because some features are more costly to compute than others, we use cost-sensitive feature selection to reduce the total computational cost of executing our models. In addition to the features previously used for spam detection, we introduce new features based on user action histories. The user history features contribute significantly to classifier performance. The approach we use is general and can easily be applied to other user generated content systems.
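
    A minimal sketch of the cost-sensitive selection idea, under the assumption of made-up per-feature costs and a plain logistic-regression learner: each step adds the feature with the best accuracy gain per unit cost. It illustrates the principle only, not the classifiers or Wikipedia features used in the paper.

```python
# Hedged sketch: greedy cost-sensitive forward feature selection.
# Feature costs are fabricated for illustration.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1500, n_features=30, n_informative=10, random_state=0)
costs = np.random.RandomState(0).uniform(0.1, 5.0, size=X.shape[1])  # hypothetical per-feature cost

selected, remaining, best_score = [], list(range(X.shape[1])), 0.0
while remaining:
    gains = []
    for f in remaining:
        score = cross_val_score(LogisticRegression(max_iter=2000),
                                X[:, selected + [f]], y, cv=3).mean()
        gains.append(((score - best_score) / costs[f], score, f))
    gain, score, f = max(gains)
    if gain <= 0:          # stop when no cost-effective improvement remains
        break
    selected.append(f)
    remaining.remove(f)
    best_score = score

print("selected features:", selected, "score:", round(best_score, 3))
```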

  5. Feature Selection for Speech Emotion Recognition in Spanish and Basque: On the Use of Machine Learning to Improve Human-Computer Interaction

    PubMed Central

    Arruti, Andoni; Cearreta, Idoia; Álvarez, Aitor; Lazkano, Elena; Sierra, Basilio

    2014-01-01

    The study of emotions in human–computer interaction is a growing research area. This paper presents an attempt to select the most significant features for emotion recognition in spoken Basque and Spanish using different methods for feature selection. The RekEmozio database was used as the experimental data set. Several Machine Learning paradigms were used for the emotion classification task. Experiments were executed in three phases, using different sets of features as classification variables in each phase. Moreover, feature subset selection was applied at each phase in order to seek the most relevant feature subset. The three-phase approach was selected to check the validity of the proposed method. Achieved results show that an instance-based learning algorithm using feature subset selection techniques based on evolutionary algorithms is the best Machine Learning paradigm in automatic emotion recognition, with all different feature sets, obtaining a mean emotion recognition rate of 80.05% in Basque and 74.82% in Spanish. In order to check the goodness of the proposed process, a greedy search approach (FSS-Forward) was also applied and a comparison between the two is provided. Based on the achieved results, a set of the most relevant non-speaker-dependent features is proposed for both languages and new perspectives are suggested. PMID:25279686
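
    The sketch below illustrates evolutionary feature-subset selection wrapped around an instance-based (k-NN) learner, in the spirit of the approach above; the population size, mutation rate, and synthetic data are illustrative assumptions, not the RekEmozio setup.

```python
# Hedged sketch: a simple genetic algorithm searching feature subsets,
# scored by cross-validated k-NN accuracy (instance-based learning).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)
X, y = make_classification(n_samples=600, n_features=40, n_informative=8, random_state=0)

def fitness(mask):
    if mask.sum() == 0:
        return 0.0
    return cross_val_score(KNeighborsClassifier(n_neighbors=5),
                           X[:, mask.astype(bool)], y, cv=3).mean()

pop = rng.rand(20, X.shape[1]) < 0.3            # random initial subsets
for generation in range(15):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[::-1][:10]]  # keep the best half
    children = []
    for _ in range(10):
        a, b = parents[rng.randint(10)], parents[rng.randint(10)]
        cut = rng.randint(1, X.shape[1])          # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        flip = rng.rand(X.shape[1]) < 0.02        # mutation
        children.append(np.logical_xor(child, flip))
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("selected features:", np.flatnonzero(best), "fitness:", round(fitness(best), 3))
```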

  6. Microwave assisted preparation of magnesium phosphate cement (MPC) for orthopedic applications: a novel solution to the exothermicity problem.

    PubMed

    Zhou, Huan; Agarwal, Anand K; Goel, Vijay K; Bhaduri, Sarit B

    2013-10-01

    There are two interesting features of this paper. First, we report herein a novel microwave-assisted technique to prepare phosphate-based orthopedic cements, which do not generate any exothermicity during setting. The exothermic reactions during the setting of phosphate cements can cause tissue damage during the administration of injectable compositions and hence a solution to the problem is sought via microwave processing. This solution through microwave exposure is based on the phenomenon that microwave irradiation can remove all water molecules from the alkaline earth phosphate cement paste to temporarily stop the setting reaction while preserving the active precursor phase in the formulation. The setting reaction can be initiated a second time by adding aqueous medium, but without any exothermicity. Second, a special emphasis is placed on using this technique to synthesize magnesium phosphate cements for orthopedic applications with their enhanced mechanical properties and possible uses as drug and protein delivery vehicles. The as-synthesized cements were evaluated for the occurrences of exothermic reactions, setting times, presence of Mg-phosphate phases, compressive strength levels, microstructural features before and after soaking in simulated body fluid (SBF), and in vitro cytocompatibility responses. The major results show that exposure to microwaves solves the exothermicity problem, while simultaneously improving the mechanical performance of hardened cements and reducing the setting times. As expected, the cements are also found to be cytocompatible. Finally, it is observed that this process can be applied to calcium phosphate cement systems (CPCs) as well. Based on the results, this microwave exposure provides a novel technique for the processing of injectable phosphate bone cement compositions.

  7. The interaction of feature and space based orienting within the attention set.

    PubMed

    Lim, Ahnate; Sinnett, Scott

    2014-01-01

    The processing of sensory information relies on interacting mechanisms of sustained attention and attentional capture, both of which operate in space and on object features. While evidence indicates that exogenous attentional capture, a mechanism previously understood to be automatic, can be eliminated while concurrently performing a demanding task, we reframe this phenomenon within the theoretical framework of the "attention set" (Most et al., 2005). Consequently, the specific prediction that cuing effects should reappear when feature dimensions of the cue overlap with those in the attention set (i.e., elements of the demanding task) was empirically tested and confirmed using a dual-task paradigm involving both sustained attention and attentional capture, adapted from Santangelo et al. (2007). Participants were required to either detect a centrally presented target presented in a stream of distractors (the primary task), or respond to a spatially cued target (the secondary task). Importantly, the spatial cue could either share features with the target in the centrally presented primary task, or not share any features. Overall, the findings supported the attention set hypothesis showing that a spatial cuing effect was only observed when the peripheral cue shared a feature with objects that were already in the attention set (i.e., the primary task). However, this finding was accompanied by differential attentional orienting dependent on the different types of objects within the attention set, with feature-based orienting occurring for target-related objects, and additional spatial-based orienting for distractor-related objects.

  8. Reconstruction and feature selection for desorption electrospray ionization mass spectroscopy imagery

    NASA Astrophysics Data System (ADS)

    Gao, Yi; Zhu, Liangjia; Norton, Isaiah; Agar, Nathalie Y. R.; Tannenbaum, Allen

    2014-03-01

    Desorption electrospray ionization mass spectrometry (DESI-MS) provides a highly sensitive imaging technique for differentiating normal and cancerous tissue at the molecular level. This can be very useful, especially under intra-operative conditions where the surgeon has to make crucial decisions about the tumor boundary. In such situations, the time it takes for imaging and data analysis becomes a critical factor. Therefore, in this work we utilize compressive sensing to perform the sparse sampling of the tissue, which halves the scanning time. Furthermore, sparse feature selection is performed, which not only reduces the dimensionality of the data from about 10⁴ to fewer than 50, but also significantly shortens the analysis time. This procedure also identifies biochemically important molecules for further pathological analysis. The methods are validated on brain and breast tumor data sets.
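
    As a hedged illustration of the sparse feature-selection step, the snippet below uses an L1-penalized linear model to prune a roughly 10,000-dimensional feature vector down to at most a few dozen channels; the data are synthetic and the method is a generic stand-in, not the authors' DESI-MS pipeline.

```python
# Hedged sketch: L1-based sparse feature selection from ~10^4 channels
# down to at most 50, on synthetic spectrum-like data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import SelectFromModel

X, y = make_classification(n_samples=400, n_features=10000, n_informative=40,
                           random_state=0)

l1_model = LogisticRegression(penalty="l1", solver="liblinear", C=0.05)
selector = SelectFromModel(l1_model, max_features=50).fit(X, y)
kept = selector.get_support(indices=True)
print(f"kept {len(kept)} of {X.shape[1]} channels:", kept[:10], "...")
```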

  9. A fully redundant power hinge for LANDSAT-D appendages

    NASA Technical Reports Server (NTRS)

    Mamrol, F. E.; Matteo, D. N.

    1981-01-01

    The configuration and testing of a power driven hinge for deployment of the solar array and antenna boom for the LANDSAT-D spacecraft is discussed. The hinge is fully mechanically and electrically redundant and, thereby, can sustain a single point failure of any one motor (or its power supply), speed reducer, or bearing set without loss of its ability to function. This design utilizes the capability of the stepper motor drive to remove the flexibility of the drive train from the joint stiffness equation when the hinge is loaded against its stop. This feature precludes gapping of the joint under spacecraft maneuver loads even in the absence of a latching feature. Thus, retraction is easily accomplished by motor reversal without the need for a solenoid function to remove the latch.

  10. Ambulatory REACT: real-time seizure detection with a DSP microprocessor.

    PubMed

    McEvoy, Robert P; Faul, Stephen; Marnane, William P

    2010-01-01

    REACT (Real-Time EEG Analysis for event deteCTion) is a Support Vector Machine-based technology which, in recent years, has been successfully applied to the problem of automated seizure detection in both adults and neonates. This paper describes the implementation of REACT on a commercial DSP microprocessor, the Analog Devices Blackfin®. The primary aim of this work is to develop a prototype system for use in ambulatory or in-ward automated EEG analysis. Furthermore, the complexity of the various stages of the REACT algorithm on the Blackfin processor is analysed, in particular the EEG feature extraction stages. This hardware profile is used to select a reduced, platform-aware feature set, in order to evaluate the seizure classification accuracy of a lower-complexity, lower-power REACT system.

  11. mHealth: Using Mobile Technology to Support Healthcare

    PubMed Central

    Okuboyejo, Senanu; Eyesan, Omatseyin

    2014-01-01

    Adherence to long-term therapy in the outpatient setting is required to reduce the prevalence of chronic diseases such as HIV/AIDS, diabetes, tuberculosis and malaria. This paper presents a mobile technology-based medical alert system for outpatient adherence in Nigeria. The system makes use of the SMS and voice features of mobile phones. The system has the potential to improve medication adherence in the outpatient setting by reminding patients of dosing schedules and attendance at scheduled appointments through SMS and voice calls. It will also inform patients of benefits and risks associated with adherence. Interventions aimed at improving adherence would provide significant positive return on investment through primary prevention (of risk factors) and secondary prevention of adverse health outcomes. PMID:24678384

  12. Modeling Tools for Propulsion Analysis and Computational Fluid Dynamics on the Internet

    NASA Technical Reports Server (NTRS)

    Muss, J. A.; Johnson, C. W.; Gotchy, M. B.

    2000-01-01

    The existing RocketWeb™ Internet Analysis System (http://www.johnsonrockets.com/rocketweb) provides an integrated set of advanced analysis tools that can be securely accessed over the Internet. Since these tools consist of both batch and interactive analysis codes, the system includes convenient methods for creating input files and evaluating the resulting data. The RocketWeb™ system also contains many features that permit data sharing which, when further developed, will facilitate real-time, geographically diverse, collaborative engineering within a designated work group. Adding work group management functionality while simultaneously extending and integrating the system's set of design and analysis tools will create a system providing rigorous, controlled design development, reducing design cycle time and cost.

  13. A keyword spotting model using perceptually significant energy features

    NASA Astrophysics Data System (ADS)

    Umakanthan, Padmalochini

    The task of a keyword recognition system is to detect the presence of certain words in a conversation based on the linguistic information present in human speech. Such keyword spotting systems have applications in homeland security, telephone surveillance and human-computer interfacing. The general procedure of a keyword spotting system involves feature generation and matching. In this work, a new set of features based on the psycho-acoustic masking nature of human speech is proposed. After developing these features, a time-aligned pattern matching process was implemented to locate the target words within a set of unknown words. A word boundary detection technique based on frame classification using the nonlinear characteristics of speech is also addressed in this work. Validation of this keyword spotting model was performed using the widely used cepstral features. The experimental results indicate the viability of using these perceptually significant features as an augmented feature set in keyword spotting.

  14. A random forest model based classification scheme for neonatal amplitude-integrated EEG.

    PubMed

    Chen, Weiting; Wang, Yu; Cao, Guitao; Chen, Guoqiang; Gu, Qiufang

    2014-01-01

    Modern medical advances have greatly increased the survival rate of infants, although these infants remain at higher risk for neurological problems later in life. For infants with encephalopathy or seizures, identification of the extent of brain injury is clinically challenging. Continuous amplitude-integrated electroencephalography (aEEG) monitoring offers the possibility of directly monitoring the functional brain state of newborns over hours, and has seen increasing application in neonatal intensive care units (NICUs). This paper presents a novel combined feature set of aEEG and applies the random forest (RF) method to classify aEEG tracings. To that end, a series of experiments were conducted on 282 aEEG tracing cases (209 normal and 73 abnormal ones). Basic features, statistical features and segmentation features were extracted from both the tracing as a whole and the segmented recordings to form a combined feature set. All the features were sent to a classifier afterwards. The significance of features, the data segmentation, the optimization of RF parameters, and the problem of imbalanced datasets were examined through experiments. Experiments were also done to evaluate the performance of RF on aEEG signal classification, compared with several other widely used classifiers including SVM-Linear, SVM-RBF, ANN, Decision Tree (DT), Logistic Regression (LR), ML, and LDA. The combined feature set characterizes aEEG signals better than the basic, statistical, or segmentation features alone. With the combined feature set, the proposed RF-based aEEG classification system achieved a correct rate of 92.52% and a high F1-score of 95.26%. Among all of the seven classifiers examined in our work, the RF method achieved the highest correct rate, sensitivity, specificity, and F1-score, which means that RF outperforms all of the other classifiers considered here. The results show that the proposed RF-based aEEG classification system with the combined feature set is efficient and helpful for better detecting brain disorders in newborns.
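
    A minimal sketch of the classification setup described above: a random forest with class weighting on an imbalanced two-class problem of the same size (282 cases), reporting per-class precision, recall, and F1. The features are synthetic stand-ins for the combined aEEG feature set.

```python
# Hedged sketch: random forest on an imbalanced two-class problem
# roughly matching the 209-normal / 73-abnormal ratio reported above.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import classification_report

X, y = make_classification(n_samples=282, n_features=30, n_informative=12,
                           weights=[0.74, 0.26], random_state=0)

rf = RandomForestClassifier(n_estimators=300, class_weight="balanced", random_state=0)
y_pred = cross_val_predict(rf, X, y, cv=5)
print(classification_report(y, y_pred, target_names=["normal", "abnormal"]))
```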

  15. Learning Spatio-Temporal Representations for Action Recognition: A Genetic Programming Approach.

    PubMed

    Liu, Li; Shao, Ling; Li, Xuelong; Lu, Ke

    2016-01-01

    Extracting discriminative and robust features from video sequences is the first and most critical step in human action recognition. In this paper, instead of using handcrafted features, we automatically learn spatio-temporal motion features for action recognition. This is achieved via an evolutionary method, i.e., genetic programming (GP), which evolves the motion feature descriptor on a population of primitive 3D operators (e.g., 3D-Gabor and wavelet). In this way, scale- and shift-invariant features can be effectively extracted from both color and optical flow sequences. We intend to learn data adaptive descriptors for different datasets with multiple layers, which makes full use of this knowledge to mimic the physical structure of the human visual cortex for action recognition and simultaneously reduce the GP search space to effectively accelerate the convergence of optimal solutions. In our evolutionary architecture, the average cross-validation classification error, which is calculated by a support-vector-machine classifier on the training set, is adopted as the evaluation criterion for the GP fitness function. After the entire evolution procedure finishes, the best-so-far solution selected by GP is regarded as the (near-)optimal action descriptor obtained. The GP-evolving feature extraction method is evaluated on four popular action datasets, namely KTH, HMDB51, UCF YouTube, and Hollywood2. Experimental results show that our method significantly outperforms other types of features, either hand-designed or machine-learned.
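
    The snippet below sketches only the fitness evaluation named above: a candidate descriptor (here a fixed feature subset standing in for a GP-evolved program) is scored by the cross-validated error of an SVM on training data. The GP search itself is omitted, and all data are synthetic.

```python
# Hedged sketch: SVM cross-validation error as a GP-style fitness criterion.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=60, n_informative=15, random_state=0)

def fitness(feature_mask):
    """Lower is better: mean cross-validation error of an SVM classifier."""
    acc = cross_val_score(SVC(kernel="rbf", gamma="scale"),
                          X[:, feature_mask], y, cv=5).mean()
    return 1.0 - acc

candidate = np.zeros(X.shape[1], dtype=bool)
candidate[:20] = True                      # a hypothetical evolved descriptor
print("fitness (CV error):", round(fitness(candidate), 3))
```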

  16. Two-speed phacoemulsification for soft cataracts using optimized parameters and procedure step toolbar with the CENTURION Vision System and Balanced Tip.

    PubMed

    Davison, James A

    2015-01-01

    To present a cause of posterior capsule aspiration and a technique using optimized parameters to prevent it from happening when operating soft cataracts. A prospective list of posterior capsule aspiration cases was kept over 4,062 consecutive cases operated with the Alcon CENTURION machine and Balanced Tip. Video analysis of one case of posterior capsule aspiration was accomplished. A surgical technique was developed using empirically derived machine parameters and customized setting-selection procedure step toolbar to reduce the pace of aspiration of soft nuclear quadrants in order to prevent capsule aspiration. Two cases out of 3,238 experienced posterior capsule aspiration before use of the soft quadrant technique. Video analysis showed an attractive vortex effect with capsule aspiration occurring in 1/5 of a second. A soft quadrant removal setting was empirically derived which had a slower pace and seemed more controlled with no capsule aspiration occurring in the subsequent 824 cases. The setting featured simultaneous linear control from zero to preset maximums for: aspiration flow, 20 mL/min; and vacuum, 400 mmHg, with the addition of torsional tip amplitude up to 20% after the fluidic maximums were achieved. A new setting selection procedure step toolbar was created to increase intraoperative flexibility by providing instantaneous shifting between the soft and normal settings. A technique incorporating a reduced pace for soft quadrant acquisition and aspiration can be accomplished through the use of a dedicated setting of integrated machine parameters. Toolbar placement of the procedure button next to the normal setting procedure button provides the opportunity to instantaneously alternate between the two settings. Simultaneous surgeon control over vacuum, aspiration flow, and torsional tip motion may make removal of soft nuclear quadrants more efficient and safer.

  17. A three-way approach for protein function classification

    PubMed Central

    2017-01-01

    The knowledge of protein functions plays an essential role in understanding biological cells and has a significant impact on human life in areas such as personalized medicine, better crops and improved therapeutic interventions. Due to the expense and inherent difficulty of biological experiments, intelligent methods are generally relied upon for automatic assignment of functions to proteins. The technological advancements in the field of biology are improving our understanding of biological processes and are regularly resulting in new features and characteristics that better describe the role of proteins. These anticipated features should not be neglected or overlooked when designing more effective classification techniques. A key issue in this context, which is not being sufficiently addressed, is how to build effective classification models and approaches for protein function prediction by incorporating and taking advantage of the ever-evolving biological information. In this article, we propose a three-way decision-making approach which provides provisions for seeking and incorporating future information. We considered probabilistic rough set based models such as Game-Theoretic Rough Sets (GTRS) and Information-Theoretic Rough Sets (ITRS) for inducing three-way decisions. An architecture of protein function classification with probabilistic rough set based three-way decisions is proposed and explained. Experiments are carried out on a Saccharomyces cerevisiae species dataset obtained from the Uniprot database with the corresponding functional classes extracted from the Gene Ontology (GO) database. The results indicate that as the level of biological information increases, the number of deferred cases is reduced while maintaining a similar level of accuracy. PMID:28234929

  18. A three-way approach for protein function classification.

    PubMed

    Ur Rehman, Hafeez; Azam, Nouman; Yao, JingTao; Benso, Alfredo

    2017-01-01

    The knowledge of protein functions plays an essential role in understanding biological cells and has a significant impact on human life in areas such as personalized medicine, better crops and improved therapeutic interventions. Due to the expense and inherent difficulty of biological experiments, intelligent methods are generally relied upon for automatic assignment of functions to proteins. The technological advancements in the field of biology are improving our understanding of biological processes and are regularly resulting in new features and characteristics that better describe the role of proteins. These anticipated features should not be neglected or overlooked when designing more effective classification techniques. A key issue in this context, which is not being sufficiently addressed, is how to build effective classification models and approaches for protein function prediction by incorporating and taking advantage of the ever-evolving biological information. In this article, we propose a three-way decision-making approach which provides provisions for seeking and incorporating future information. We considered probabilistic rough set based models such as Game-Theoretic Rough Sets (GTRS) and Information-Theoretic Rough Sets (ITRS) for inducing three-way decisions. An architecture of protein function classification with probabilistic rough set based three-way decisions is proposed and explained. Experiments are carried out on a Saccharomyces cerevisiae species dataset obtained from the Uniprot database with the corresponding functional classes extracted from the Gene Ontology (GO) database. The results indicate that as the level of biological information increases, the number of deferred cases is reduced while maintaining a similar level of accuracy.
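
    A minimal sketch of the three-way decision rule underlying both records above: a class-membership probability is compared with two thresholds to accept, reject, or defer a prediction. The threshold values are illustrative; GTRS/ITRS derive them from game- or information-theoretic criteria, which this sketch does not reproduce.

```python
# Hedged sketch: a three-way decision rule with illustrative thresholds.
def three_way_decision(p, alpha=0.75, beta=0.35):
    """Return 'accept', 'reject', or 'defer' for probability p of a function class."""
    if p >= alpha:
        return "accept"
    if p <= beta:
        return "reject"
    return "defer"      # boundary region: wait for more biological evidence

for p in (0.9, 0.5, 0.2):
    print(p, "->", three_way_decision(p))
```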

  19. A Hot Downflowing Model Atmosphere for Umbral Flashes and the Physical Properties of Their Dark Fibrils

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Henriques, V. M. J.; Mathioudakis, M.; Socas-Navarro, H.

    We perform non-LTE inversions in a large set of umbral flashes, including the dark fibrils visible within them, and in the quiescent umbra by using the inversion code NICOLE on a set of full Stokes high-resolution Ca ii λ 8542 observations of a sunspot at disk center. We find that the dark structures have Stokes profiles that are distinct from those of the quiescent and flashed regions. They are best reproduced by atmospheres that are more similar to the flashed atmosphere in terms of velocities, albeit with reduced amplitudes. We also find two sets of solutions that finely fit the flashed profiles: a set that is upflowing, featuring a transition region that is deeper than in the quiescent case and preceded by a slight dip in temperature, and a second solution with a hotter atmosphere in the chromosphere but featuring downflows close to the speed of sound at such heights. Such downflows may be related to, or even dependent on, the presence of coronal loops, rooted in the umbra of sunspots, as is the case in the region analyzed. Similar loops have been recently observed to have supersonic downflows in the transition region and are consistent with the earlier "sunspot plumes," which were invariably found to display strong downflows in sunspots. Finally, we find, on average, a magnetic field reduction in the flashed areas, suggesting that the shock pressure is moving field lines in the upper layers.

  20. Effects of alternative label formats on choice of high- and low-sodium products in a New Zealand population sample.

    PubMed

    McLean, Rachael; Hoek, Janet; Hedderley, Duncan

    2012-05-01

    Dietary sodium reduction is a cost-effective public health intervention to reduce chronic disease. In response to calls for further research into front-of-pack labelling systems, we examined how alternative sodium nutrition label formats and nutrition claims influenced consumers' choice behaviour and whether consumers with or without a diagnosis of hypertension differed in their choice patterns. The study was an anonymous online experiment in which participants viewed ten choice sets featuring three fictitious brands of baked beans with varied label formats and nutritional profiles (high and low sodium) and indicated which brand in each set they would purchase if shopping for this product. Participants were recruited from New Zealand's largest online nationwide research panel and comprised five hundred people with self-reported hypertension and 191 people without hypertension, aged 18 to 79 years. The addition of a front-of-pack label increased both groups' ability to discriminate between products with high and low sodium, while the Traffic Light label enabled better identification of the high-sodium product. Both front-of-pack formats enhanced discrimination in the presence of a reduced salt claim, but the Traffic Light label also performed better than the Percentage Daily Intake label in moderating the effect of the claim for the high-sodium product. Front-of-pack labels, particularly those with simple visual cues, enhance consumers' ability to discriminate between high- and low-sodium products, even when those products feature nutrition claims.

  1. Testing Product Generation in Software Product Lines Using Pairwise for Features Coverage

    NASA Astrophysics Data System (ADS)

    Pérez Lamancha, Beatriz; Polo Usaola, Macario

    A Software Product Line (SPL) is "a set of software-intensive systems sharing a common, managed set of features that satisfy the specific needs of a particular market segment or mission and that are developed from a common set of core assets in a prescribed way". Variability is a central concept that permits the generation of different products of the family by reusing core assets. It is captured through features which, for an SPL, define its scope. Features are represented in a feature model, which is later used to generate the products from the line. From the testing point of view, testing all the possible combinations in feature models is not practical because: (1) the number of possible combinations (i.e., combinations of features for composing products) may be intractable, and (2) some combinations may contain incompatible features. Thus, this paper addresses the problem by implementing combinatorial testing techniques adapted to the SPL context.
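
    As a hedged illustration of pairwise (2-wise) coverage, the sketch below greedily picks product configurations from a toy feature model until every pair of feature values is covered; the feature model is an assumption and constraint handling between features is omitted.

```python
# Hedged sketch: greedy pairwise-coverage suite for a toy feature model.
from itertools import combinations, product

features = {"os": ["linux", "windows"], "db": ["mysql", "sqlite"], "gui": ["yes", "no"]}
names = list(features)

def pairs(config):
    """All (feature, value) pairs exercised by one product configuration."""
    return {((a, config[a]), (b, config[b])) for a, b in combinations(names, 2)}

all_products = [dict(zip(names, values)) for values in product(*features.values())]
uncovered = set().union(*(pairs(c) for c in all_products))

suite = []
while uncovered:
    best = max(all_products, key=lambda c: len(pairs(c) & uncovered))
    suite.append(best)
    uncovered -= pairs(best)

print(f"{len(suite)} products cover all pairs (out of {len(all_products)} total)")
for cfg in suite:
    print(cfg)
```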

  2. High-level intuitive features (HLIFs) for intuitive skin lesion description.

    PubMed

    Amelard, Robert; Glaister, Jeffrey; Wong, Alexander; Clausi, David A

    2015-03-01

    A set of high-level intuitive features (HLIFs) is proposed to quantitatively describe melanoma in standard camera images. Melanoma is the deadliest form of skin cancer. With rising incidence rates and subjectivity in current clinical detection methods, there is a need for melanoma decision support systems. Feature extraction is a critical step in melanoma decision support systems. Existing feature sets for analyzing standard camera images are comprised of low-level features, which exist in high-dimensional feature spaces and limit the system's ability to convey intuitive diagnostic rationale. The proposed HLIFs were designed to model the ABCD criteria commonly used by dermatologists such that each HLIF represents a human-observable characteristic. As such, intuitive diagnostic rationale can be conveyed to the user. Experimental results show that concatenating the proposed HLIFs with a full low-level feature set increased classification accuracy, and that HLIFs were able to separate the data better than low-level features with statistical significance. An example of a graphical interface for providing intuitive rationale is given.

  3. Unique sudden onsets capture attention even when observers are in feature-search mode.

    PubMed

    Spalek, Thomas M; Yanko, Matthew R; Poiese, Paola; Lagroix, Hayley E P

    2012-01-01

    Two sources of attentional capture have been proposed: stimulus-driven (exogenous) and goal-oriented (endogenous). A resolution between these modes of capture has not been straightforward. Even such a clearly exogenous event as the sudden onset of a stimulus can be said to capture attention endogenously if observers operate in singleton-detection mode rather than feature-search mode. In four experiments we show that a unique sudden onset captures attention even when observers are in feature-search mode. The displays were rapid serial visual presentation (RSVP) streams of differently coloured letters with the target letter defined by a specific colour. Distractors were four #s, one of the target colour, surrounding one of the non-target letters. Capture was substantially reduced when the onset of the distractor array was not unique because it was preceded by other sets of four grey # arrays in the RSVP stream. This provides unambiguous evidence that attention can be captured both exogenously and endogenously within a single task.

  4. Multimodal Deep Autoencoder for Human Pose Recovery.

    PubMed

    Hong, Chaoqun; Yu, Jun; Wan, Jian; Tao, Dacheng; Wang, Meng

    2015-12-01

    Video-based human pose recovery is usually conducted by retrieving relevant poses using image features. In the retrieving process, the mapping between 2D images and 3D poses is assumed to be linear in most of the traditional methods. However, their relationships are inherently non-linear, which limits recovery performance of these methods. In this paper, we propose a novel pose recovery method using non-linear mapping with multi-layered deep neural network. It is based on feature extraction with multimodal fusion and back-propagation deep learning. In multimodal fusion, we construct hypergraph Laplacian with low-rank representation. In this way, we obtain a unified feature description by standard eigen-decomposition of the hypergraph Laplacian matrix. In back-propagation deep learning, we learn a non-linear mapping from 2D images to 3D poses with parameter fine-tuning. The experimental results on three data sets show that the recovery error has been reduced by 20%-25%, which demonstrates the effectiveness of the proposed method.

  5. A hybrid gene selection approach for microarray data classification using cellular learning automata and ant colony optimization.

    PubMed

    Vafaee Sharbaf, Fatemeh; Mosafer, Sara; Moattar, Mohammad Hossein

    2016-06-01

    This paper proposes an approach for gene selection in microarray data. The proposed approach consists of a primary filter approach using the Fisher criterion, which reduces the initial genes and hence the search space and time complexity. Then, a wrapper approach based on cellular learning automata (CLA) optimized with ant colony optimization (ACO) is used to find the set of features which improve the classification accuracy. CLA is applied due to its capability to learn and model complicated relationships. The selected features from the last phase are evaluated using the ROC curve, and the smallest, most effective feature subset is determined. The classifiers which are evaluated in the proposed framework are K-nearest neighbor, support vector machine and naïve Bayes. The proposed approach is evaluated on 4 microarray datasets. The evaluations confirm that the proposed approach can find the smallest subset of genes while approaching the maximum accuracy.
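
    The sketch below illustrates only the first, filter stage described above: genes are ranked by the Fisher criterion and the top k are kept before any wrapper search. The CLA/ACO wrapper is not reproduced, and the data are synthetic stand-ins for microarray profiles.

```python
# Hedged sketch: Fisher-criterion filter for two-class microarray-like data.
import numpy as np
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=100, n_features=2000, n_informative=20, random_state=0)

def fisher_score(X, y):
    """(mean difference)^2 / (sum of class variances), per feature, two classes."""
    X0, X1 = X[y == 0], X[y == 1]
    num = (X0.mean(axis=0) - X1.mean(axis=0)) ** 2
    den = X0.var(axis=0) + X1.var(axis=0) + 1e-12
    return num / den

k = 50
top_genes = np.argsort(fisher_score(X, y))[::-1][:k]
X_reduced = X[:, top_genes]
print("reduced shape:", X_reduced.shape)
```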

  6. A reduced amino acid alphabet for understanding and designing protein adaptation to mutation.

    PubMed

    Etchebest, C; Benros, C; Bornot, A; Camproux, A-C; de Brevern, A G

    2007-11-01

    The protein sequence world is considerably larger than the structure world. As a consequence, numerous unrelated sequences may adopt similar 3D folds, and different kinds of amino acids may thus be found in similar 3D structures. By grouping the 20 amino acids into a smaller number of representative residues with similar features, the sequence world may be simplified. This clustering hence defines a reduced amino acid alphabet (reduced AAA). Numerous works have shown that protein 3D structures are composed of a limited number of building blocks, defining a structural alphabet. We previously identified such an alphabet composed of 16 representative structural motifs (5 residues in length) called Protein Blocks (PBs). This alphabet permits translation of the structure (3D) into a sequence of PBs (1D). Based on these two concepts, reduced AAAs and PBs, we analyzed the distributions of the different kinds of amino acids and their equivalences in the structural context. Different reduced sets were considered. Recurrent amino acid associations were found in all the local structures, while others were specific to certain local structures (PBs) (e.g., cysteine, histidine, threonine and serine for the alpha-helix N-cap). Some similar associations are found in other reduced AAAs, e.g., Ile with Val, or the hydrophobic aromatic residues Trp, Phe and Tyr. We also put into evidence interesting alternative associations. This highlights the dependence on the information considered (sequence or structure). This approach, equivalent to a substitution matrix, could be useful for designing protein sequences with different features (for instance, adaptation to an environment) while mainly preserving the 3D fold.
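
    As a purely illustrative sketch, the snippet below translates a protein sequence into a reduced amino acid alphabet by grouping physico-chemically similar residues; the six-group mapping is a generic textbook-style example, not one of the specific reduced sets derived in the paper.

```python
# Hedged sketch: map the 20 amino acids onto a generic six-letter reduced alphabet.
GROUPS = {
    "H": "AVLIMC",   # hydrophobic
    "A": "FWY",      # aromatic
    "P": "STNQ",     # polar, uncharged
    "B": "KRH",      # basic (positively charged)
    "D": "DE",       # acidic (negatively charged)
    "S": "GP",       # special (glycine, proline)
}
AA_TO_GROUP = {aa: code for code, members in GROUPS.items() for aa in members}

def reduce_sequence(seq):
    """Map each residue to its group letter ('X' if unknown)."""
    return "".join(AA_TO_GROUP.get(aa, "X") for aa in seq.upper())

print(reduce_sequence("MKTAYIAKQR"))   # prints the grouped representation
```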

  7. Computerized detection of noncalcified plaques in coronary CT angiography: Evaluation of topological soft gradient prescreening method and luminal analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wei, Jun, E-mail: jvwei@umich.edu; Zhou, Chuan; Chan, Heang-Ping

    2014-08-15

    Purpose: The buildup of noncalcified plaques (NCPs) that are vulnerable to rupture in coronary arteries is a risk for myocardial infarction. Interpretation of coronary CT angiography (cCTA) to search for NCP is a challenging task for radiologists due to the low CT number of NCP, the large number of coronary arteries, and multiple phase CT acquisition. The authors conducted a preliminary study to develop a machine learning method for automated detection of NCPs in cCTA. Methods: With IRB approval, a data set of 83 ECG-gated contrast enhanced cCTA scans with 120 NCPs was collected retrospectively from patient files. A multiscale coronary artery response and rolling balloon region growing (MSCAR-RBG) method was applied to each cCTA volume to extract the coronary arterial trees. Each extracted vessel was reformatted to a straightened volume composed of cCTA slices perpendicular to the vessel centerline. A topological soft-gradient (TSG) detection method was developed to prescreen for NCP candidates by analyzing the 2D topological features of the radial gradient field surface along the vessel wall. The NCP candidates were then characterized by a luminal analysis that used 3D geometric features to quantify the shape information and gray-level features to evaluate the density of the NCP candidates. With machine learning techniques, useful features were identified and combined into an NCP score to differentiate true NCPs from false positives (FPs). To evaluate the effectiveness of the image analysis methods, the authors performed tenfold cross-validation with the available data set. Receiver operating characteristic (ROC) analysis was used to assess the classification performance of individual features and the NCP score. The overall detection performance was estimated by free response ROC (FROC) analysis. Results: With our TSG prescreening method, a prescreening sensitivity of 92.5% (111/120) was achieved with a total of 1181 FPs (14.2 FPs/scan). On average, six features were selected during the tenfold cross-validation training. The average area under the ROC curve (AUC) value for training was 0.87 ± 0.01 and the AUC value for validation was 0.85 ± 0.01. Using the NCP score, FROC analysis of the validation set showed that the FP rates were reduced to 3.16, 1.90, and 1.39 FPs/scan at sensitivities of 90%, 80%, and 70%, respectively. Conclusions: The topological soft-gradient prescreening method in combination with the luminal analysis for FP reduction was effective for detection of NCPs in cCTA, including NCPs causing positive or negative vessel remodeling. The accuracy of vessel segmentation, tracking, and centerline identification has a strong impact on NCP detection. Studies are underway to further improve these techniques and reduce the FPs of the CADe system.

  8. ROC analysis of lesion descriptors in breast ultrasound images

    NASA Astrophysics Data System (ADS)

    Andre, Michael P.; Galperin, Michael; Phan, Peter; Chiu, Peter

    2003-05-01

    Breast biopsy serves as the key diagnostic tool in the evaluation of breast masses for malignancy, yet the procedure affects patients physically and emotionally and may obscure results of future mammograms. Studies show that high quality ultrasound can distinguish benign from malignant lesions accurately; however, it has proven difficult to teach and clinical results are highly variable. The purpose of this study is to develop a means to optimize an automated Computer Aided Imaging System (CAIS) to assess the Level of Suspicion (LOS) of a breast mass. We examine the contribution of 15 object features to lesion classification by calculating the Wilcoxon area under the ROC curve, Aw, for all combinations in a set of 146 masses with known findings. For each interval of Aw, the frequency of appearance of each feature and its combinations with others was computed as a means to find an "optimum" feature vector. The original set of 15 was reduced to 6 (area, perimeter, Feret diameter Y, relief, homogeneity, average energy) with an improvement from Aw = 0.82 ± 0.04 for the original 15 to Aw = 0.93 ± 0.02 for the subset of 6, p = 0.03. For comparison, two sub-specialty mammography radiologists also scored the images for LOS, resulting in Az of 0.90 and 0.87. The CAIS performed significantly better (p = 0.02).
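
    A hedged sketch of the subset search described above: the area under the ROC curve is computed for every small combination drawn from a 15-feature pool and the best subset is reported. Subset size is capped at three to keep the sketch quick; the data are synthetic, not the 146-mass study set.

```python
# Hedged sketch: exhaustive AUC evaluation of small feature subsets.
from itertools import combinations
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=146, n_features=15, n_informative=6, random_state=0)

best_auc, best_subset = 0.0, None
for k in range(1, 4):                                   # subsets up to size 3
    for subset in combinations(range(X.shape[1]), k):
        scores = cross_val_predict(LogisticRegression(max_iter=1000),
                                   X[:, list(subset)], y, cv=5,
                                   method="predict_proba")[:, 1]
        auc = roc_auc_score(y, scores)
        if auc > best_auc:
            best_auc, best_subset = auc, subset

print("best subset:", best_subset, "AUC:", round(best_auc, 3))
```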

  9. Enabling Quantitative Optical Imaging for In-die-capable Critical Dimension Targets

    PubMed Central

    Barnes, B.M.; Henn, M.-A.; Sohn, M. Y.; Zhou, H.; Silver, R. M.

    2017-01-01

    Dimensional scaling trends will eventually bring semiconductor critical dimensions (CDs) down to only a few atoms in width. New optical techniques are required to address the measurement and variability for these CDs using sufficiently small in-die metrology targets. Recently, Qin et al. [Light Sci Appl, 5, e16038 (2016)] demonstrated quantitative model-based measurements of finite sets of lines with features as small as 16 nm using 450 nm wavelength light. This paper uses simulation studies, augmented with experiments at 193 nm wavelength, to adapt and optimize the finite sets of features that work as in-die-capable metrology targets with minimal increases in parametric uncertainty. A finite element based solver for time-harmonic Maxwell's equations yields two- and three-dimensional simulations of the electromagnetic scattering for optimizing the design of such targets as functions of reduced line lengths, fewer number of lines, fewer focal positions, smaller critical dimensions, and shorter illumination wavelength. Metrology targets that exceeded performance requirements are as short as 3 μm for 193 nm light, feature as few as eight lines, and are extensible to sub-10 nm CDs. Target areas measured at 193 nm can be fifteen times smaller in area than current state-of-the-art scatterometry targets described in the literature. This new methodology is demonstrated to be a promising alternative for optical model-based in-die CD metrology. PMID:28757674

  10. SVM Based Descriptor Selection and Classification of Neurodegenerative Disease Drugs for Pharmacological Modeling.

    PubMed

    Shahid, Mohammad; Shahzad Cheema, Muhammad; Klenner, Alexander; Younesi, Erfan; Hofmann-Apitius, Martin

    2013-03-01

    Systems pharmacological modeling of drug mode of action for the next generation of multitarget drugs may open new routes for drug design and discovery. Computational methods are widely used in this context, amongst which support vector machines (SVMs) have proven successful in addressing the challenge of classifying drugs with similar features. Among such SVM-based approaches, we have applied SVM-based recursive feature elimination (SVM-RFE). We use the approach to predict the pharmacological properties of drugs widely used against complex neurodegenerative disorders (NDD) and to build an in-silico computational model for the binary classification of NDD drugs from other drugs. Application of an SVM-RFE model to a set of drugs successfully classified NDD drugs from non-NDD drugs and resulted in an overall accuracy of ~80% with 10-fold cross-validation using the 40 top-ranked molecular descriptors selected out of a total of 314 descriptors. Moreover, the SVM-RFE method outperformed linear discriminant analysis (LDA) based feature selection and classification. The model reduced the multidimensional descriptor space of drugs dramatically and predicted NDD drugs with high accuracy, while avoiding overfitting. Based on these results, NDD-specific focused libraries of drug-like compounds can be designed and existing NDD-specific drugs can be characterized by a well-defined set of molecular descriptors.
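
    The following sketch shows a generic SVM-RFE run with scikit-learn: a linear SVM is refitted while the lowest-weighted descriptors are eliminated until 40 remain, mirroring the descriptor count reported above. The data are random stand-ins, not the NDD drug descriptors.

```python
# Hedged sketch: SVM-based recursive feature elimination down to 40 descriptors.
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=314, n_informative=30, random_state=0)

rfe = RFE(estimator=SVC(kernel="linear"), n_features_to_select=40, step=10)
rfe.fit(X, y)
X_top = X[:, rfe.support_]
acc = cross_val_score(SVC(kernel="linear"), X_top, y, cv=10).mean()
print("10-fold accuracy with 40 descriptors:", round(acc, 3))
```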

  11. Influence of Texture and Colour in Breast TMA Classification

    PubMed Central

    Fernández-Carrobles, M. Milagro; Bueno, Gloria; Déniz, Oscar; Salido, Jesús; García-Rojo, Marcial; González-López, Lucía

    2015-01-01

    Breast cancer diagnosis is still done by observation of biopsies under the microscope. The development of automated methods for breast TMA classification would reduce diagnostic time. This paper is a step towards solving this problem and presents a complete study of breast TMA classification based on colour models and texture descriptors. The TMA images were divided into four classes: i) benign stromal tissue with cellularity, ii) adipose tissue, iii) benign and benign anomalous structures, and iv) ductal and lobular carcinomas. A relevant set of features was obtained on eight different colour models from first and second order Haralick statistical descriptors obtained from the intensity image, Fourier, Wavelets, Multiresolution Gabor, M-LBP and textons descriptors. Furthermore, four types of classification experiments were performed using six different classifiers: (1) classification per colour model individually, (2) classification by combination of colour models, (3) classification by combination of colour models and descriptors, and (4) classification by combination of colour models and descriptors with a previous feature set reduction. The best result shows an average of 99.05% accuracy and 98.34% positive predictive value. These results have been obtained by means of a bagging tree classifier with combination of six colour models and the use of 1719 non-correlated (correlation threshold of 97%) textural features based on Statistical, M-LBP, Gabor and Spatial textons descriptors. PMID:26513238
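
    As an illustration of the correlation-based reduction step mentioned above (correlation threshold of 97%), the sketch below drops any feature highly correlated with one already kept and then trains a bagged-tree classifier; the data and everything apart from the 0.97 threshold are assumptions, not the TMA descriptors.

```python
# Hedged sketch: correlation-threshold feature pruning followed by bagged trees.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=800, n_features=120, n_informative=20,
                           n_redundant=60, random_state=0)

corr = np.abs(np.corrcoef(X, rowvar=False))   # feature-by-feature correlation
keep = []
for j in range(X.shape[1]):
    if all(corr[j, k] <= 0.97 for k in keep):
        keep.append(j)

X_reduced = X[:, keep]
acc = cross_val_score(BaggingClassifier(n_estimators=100, random_state=0),
                      X_reduced, y, cv=5).mean()
print(f"kept {len(keep)} of {X.shape[1]} features; accuracy: {acc:.3f}")
```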

  12. Automated identification of diagnosis and co-morbidity in clinical records.

    PubMed

    Cano, C; Blanco, A; Peshkin, L

    2009-01-01

    Automated understanding of clinical records is a challenging task involving various legal and technical difficulties. Clinical free text is inherently redundant, unstructured, and full of acronyms, abbreviations and domain-specific language which make it challenging to mine automatically. There is much effort in the field focused on creating specialized ontologies, lexicons and heuristics based on expert knowledge of the domain. However, ad-hoc solutions poorly generalize across diseases or diagnoses. This paper presents a successful approach for rapid prototyping of a diagnosis classifier based on a popular computational linguistics platform. The corpus consists of several hundred full-length discharge summaries provided by Partners Healthcare. The goal is to identify a diagnosis and assign co-morbidity. Our approach is based on the rapid implementation of a logistic regression classifier using an existing toolkit: LingPipe (http://alias-i.com/lingpipe). We implement and compare three different classifiers. The baseline approach uses character 5-grams as features. The second approach uses a bag-of-words representation enriched with a small additional set of features. The third approach reduces the feature set to the most informative features according to the information content. The proposed systems achieve high performance (average F-micro 0.92) for the task. We discuss the relative merit of the three classifiers. Supplementary material with detailed results is available at: http://decsai.ugr.es/~ccano/LR/supplementary_material/ We show that our methodology for rapid prototyping of a domain-unaware system is effective for building an accurate classifier for clinical records.
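
    A minimal sketch of the baseline approach described above: character 5-grams feeding a logistic-regression classifier. The toy corpus and labels are placeholders, since the Partners Healthcare discharge summaries are not public, and scikit-learn stands in for the LingPipe toolkit used by the authors.

```python
# Hedged sketch: character 5-gram features + logistic regression on a toy corpus.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

docs = [
    "patient admitted with chest pain and elevated troponin",
    "discharged on metformin for type 2 diabetes mellitus",
    "acute myocardial infarction treated with stent placement",
    "poorly controlled diabetes with neuropathy noted",
]
labels = ["cardiac", "diabetes", "cardiac", "diabetes"]

clf = make_pipeline(
    CountVectorizer(analyzer="char_wb", ngram_range=(5, 5)),  # character 5-grams
    LogisticRegression(max_iter=1000),
)
clf.fit(docs, labels)
print(clf.predict(["follow up for diabetes management and insulin titration"]))
```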

  13. Influence of nuclei segmentation on breast cancer malignancy classification

    NASA Astrophysics Data System (ADS)

    Jelen, Lukasz; Fevens, Thomas; Krzyzak, Adam

    2009-02-01

    Breast cancer is one of the deadliest cancers affecting middle-aged women. Accurate diagnosis and prognosis are crucial to reduce the high death rate. Nowadays there are numerous diagnostic tools for breast cancer diagnosis. In this paper we discuss the role of nuclei segmentation in fine needle aspiration biopsy (FNA) slides and its influence on malignancy classification. Classification of malignancy plays a very important role during the diagnosis process of breast cancer. Out of all cancer diagnostic tools, FNA slides provide the most valuable information about the cancer malignancy grade, which helps to choose an appropriate treatment. This process involves assessing numerous nuclear features and therefore precise segmentation of nuclei is very important. In this work we compare three powerful segmentation approaches and test their impact on the classification of breast cancer malignancy. The studied approaches involve level set segmentation, fuzzy c-means segmentation and textural segmentation based on the co-occurrence matrix. Segmented nuclei were used to extract nuclear features for malignancy classification. For classification purposes four different classifiers were trained and tested with previously extracted features. The compared classifiers are Multilayer Perceptron (MLP), Self-Organizing Maps (SOM), Principal Component-based Neural Network (PCA) and Support Vector Machines (SVM). The presented results show that level set segmentation yields the best results over the three compared approaches and leads to good feature extraction, with the lowest average error rate of 6.51% over four different classifiers. The best performance was recorded for the multilayer perceptron, with an error rate of 3.07%, using fuzzy c-means segmentation.

  14. The relationship study between image features and detection probability based on psychology experiments

    NASA Astrophysics Data System (ADS)

    Lin, Wei; Chen, Yu-hua; Wang, Ji-yuan; Gao, Hong-sheng; Wang, Ji-jun; Su, Rong-hua; Mao, Wei

    2011-04-01

    Detection probability is an important index for representing and estimating target viability, and it provides a basis for target recognition and decision-making. However, obtaining detection probability in practice requires a great deal of time and manpower, and because interpreters differ in practical knowledge and experience, the data obtained often vary widely. By studying the relationship between image features and perception quantity through psychology experiments, a probability model has been established as follows. First, four image features that directly affect detection were extracted and quantified, and four feature similarity degrees between target and background were defined. Second, the relationship between each single image feature similarity degree and perception quantity was established based on psychological principles, and psychological experiments on target interpretation were designed, involving about five hundred interpreters and two hundred images. In order to reduce the correlation between image features, a large number of artificially synthesized images were produced, including images differing only in brightness, only in chromaticity, only in texture, and only in shape. By analyzing and fitting a large amount of experimental data, the model quantities were determined. Finally, by applying statistical decision theory and the experimental results, the relationship between perception quantity and target detection probability was found. Verified against a great deal of target interpretation in practice, the model can provide target detection probability quickly and objectively.

  15. Using Gaussian windows to explore a multivariate data set

    NASA Technical Reports Server (NTRS)

    Jaeckel, Louis A.

    1991-01-01

    In an earlier paper, I recounted an exploratory analysis, using Gaussian windows, of a data set derived from the Infrared Astronomical Satellite. Here, my goals are to develop strategies for finding structural features in a data set in a many-dimensional space, and to find ways to describe the shape of such a data set. After a brief review of Gaussian windows, I describe the current implementation of the method. I give some ways of describing features that we might find in the data, such as clusters and saddle points, and also extended structures such as a 'bar', which is an essentially one-dimensional concentration of data points. I then define a distance function, which I use to determine which data points are 'associated' with a feature. Data points not associated with any feature are called 'outliers'. I then explore the data set, giving the strategies that I used and quantitative descriptions of the features that I found, including clusters, bars, and a saddle point. I tried to use strategies and procedures that could, in principle, be used in any number of dimensions.

  16. The Model-Based Study of the Effectiveness of Reporting Lists of Small Feature Sets Using RNA-Seq Data.

    PubMed

    Kim, Eunji; Ivanov, Ivan; Hua, Jianping; Lampe, Johanna W; Hullar, Meredith Aj; Chapkin, Robert S; Dougherty, Edward R

    2017-01-01

    Ranking feature sets for phenotype classification based on gene expression is a challenging issue in cancer bioinformatics. When the number of samples is small, all feature selection algorithms are known to be unreliable, producing significant error, and error estimators suffer from different degrees of imprecision. The problem is compounded by the fact that the accuracy of classification depends on the manner in which the phenomena are transformed into data by the measurement technology. Because next-generation sequencing technologies amount to a nonlinear transformation of the actual gene or RNA concentrations, they can potentially produce less discriminative data relative to the actual gene expression levels. In this study, we compare the performance of ranking feature sets derived from a model of RNA-Seq data with that of a multivariate normal model of gene concentrations using 3 measures: (1) ranking power, (2) length of extensions, and (3) Bayes features. This model-based study examines the effectiveness of reporting lists of small feature sets using RNA-Seq data, together with the effects of different model parameters and error estimators. The results demonstrate that the general trends of the parameter effects on the ranking power of the underlying gene concentrations are preserved in the RNA-Seq data, whereas the power of finding a good feature set becomes weaker when gene concentrations are transformed by the sequencing machine.

  17. Geospatial Analytics in Retail Site Selection and Sales Prediction.

    PubMed

    Ting, Choo-Yee; Ho, Chiung Ching; Yee, Hui Jia; Matsah, Wan Razali

    2018-03-01

    Studies have shown that certain features from geography, demography, trade area, and environment can play a vital role in retail site selection, largely due to the impact they exert on retail performance. Although the relevant features could be elicited by domain experts, determining the optimal feature set can be an intractable and labor-intensive exercise. The challenges center around (1) how to determine the features that are important to a particular retail business and (2) how to estimate retail sales performance for a new location. The challenges become apparent when the features vary across time. In this light, this study proposed a nonintervening approach that employs feature selection algorithms followed by sales prediction through similarity-based methods. The results of prediction were validated by domain experts. In this study, data sets from different sources were transformed and aggregated to obtain an analytics data set ready for analysis. The data sets included data about feature location, population count, property type, education status, and monthly sales from 96 branches of a telecommunication company in Malaysia. The findings suggested that (1) optimal retail performance can only be achieved through fulfillment of specific location features together with the surrounding trade area characteristics and (2) similarity-based methods can provide a solution to retail sales prediction.
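
    The two-step idea described above (feature selection, then similarity-based sales prediction) might look roughly like the hedged sketch below; the feature names, data, and the specific selection and neighbour settings are placeholders, not the study's actual pipeline.

      import numpy as np
      from sklearn.feature_selection import SelectKBest, f_regression
      from sklearn.neighbors import KNeighborsRegressor
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler

      rng = np.random.default_rng(2)
      X = rng.normal(size=(96, 10))                  # e.g. population, property, education features per branch
      sales = X[:, 0] * 3.0 + rng.normal(size=96)    # synthetic monthly sales

      model = make_pipeline(
          StandardScaler(),
          SelectKBest(f_regression, k=4),            # keep the most informative site features
          KNeighborsRegressor(n_neighbors=5),        # predict from the most similar existing branches
      )
      model.fit(X, sales)
      new_site = rng.normal(size=(1, 10))
      print("predicted sales:", model.predict(new_site))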

  18. Unsupervised Feature Selection Based on the Morisita Index for Hyperspectral Images

    NASA Astrophysics Data System (ADS)

    Golay, Jean; Kanevski, Mikhail

    2017-04-01

    Hyperspectral sensors are capable of acquiring images with hundreds of narrow and contiguous spectral bands. Compared with traditional multispectral imagery, the use of hyperspectral images allows better performance in discriminating between land-cover classes, but it also results in large redundancy and a heavy computational load in data processing. To alleviate such issues, unsupervised feature selection techniques for redundancy minimization can be implemented. Their goal is to select the smallest subset of features (or bands) in such a way that the information content of the data set is preserved as much as possible. The present research deals with the application to hyperspectral images of a recently introduced technique of unsupervised feature selection: the Morisita-Based filter for Redundancy Minimization (MBRM). MBRM is based on the (multipoint) Morisita index of clustering and on the Morisita estimator of Intrinsic Dimension (ID). The fundamental idea of the technique is to retain only the bands which contribute to increasing the ID of an image. In this way, redundant bands are disregarded, since they have no impact on the ID. Besides, MBRM has several advantages over benchmark techniques: in addition to its ability to deal with large data sets, it can capture highly nonlinear dependences and its implementation is straightforward in any programming environment. Experimental results on freely available hyperspectral images show the effectiveness of MBRM in remote sensing data processing. Comparisons with benchmark techniques are carried out and random forests are used to assess the performance of MBRM in reducing the data dimensionality without loss of relevant information.
    References:
    [1] C. Traina Jr., A.J.M. Traina, L. Wu, C. Faloutsos, Fast feature selection using fractal dimension, in: Proceedings of the XV Brazilian Symposium on Databases, SBBD, pp. 158-171, 2000.
    [2] J. Golay, M. Kanevski, A new estimator of intrinsic dimension based on the multipoint Morisita index, Pattern Recognition 48(12), pp. 4070-4081, 2015.
    [3] J. Golay, M. Kanevski, Unsupervised feature selection based on the Morisita estimator of intrinsic dimension, arXiv:1608.05581, 2016.
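
    To make the selection rule concrete, the sketch below illustrates the general idea (keep a band only if adding it increases the estimated intrinsic dimension of the retained subset). The estimate_id function is a crude PCA-based stand-in for the Morisita ID estimator used in the paper, introduced here purely for illustration.

      import numpy as np
      from sklearn.decomposition import PCA

      def estimate_id(X, var_threshold=0.95):
          """Placeholder ID estimate: number of principal components explaining 95% of variance."""
          pca = PCA().fit(X)
          return int(np.searchsorted(np.cumsum(pca.explained_variance_ratio_), var_threshold) + 1)

      def id_based_band_selection(X, tol=0):
          selected = [0]                                  # start from the first band
          for band in range(1, X.shape[1]):
              if estimate_id(X[:, selected + [band]]) > estimate_id(X[:, selected]) + tol:
                  selected.append(band)                   # band adds non-redundant information
          return selected

      rng = np.random.default_rng(3)
      bands = rng.normal(size=(1000, 3))
      X = np.hstack([bands, bands + 0.01 * rng.normal(size=(1000, 3))])  # redundant copies appended
      print("retained bands:", id_based_band_selection(X))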

  19. Multiview Locally Linear Embedding for Effective Medical Image Retrieval

    PubMed Central

    Shen, Hualei; Tao, Dacheng; Ma, Dianfu

    2013-01-01

    Content-based medical image retrieval continues to gain attention for its potential to assist radiological image interpretation and decision making. Many approaches have been proposed to improve the performance of medical image retrieval systems, among which visual features such as SIFT, LBP, and intensity histogram play a critical role. Typically, these features are concatenated into a long vector to represent medical images, and thus traditional dimension reduction techniques such as locally linear embedding (LLE), principal component analysis (PCA), or Laplacian eigenmaps (LE) can be employed to reduce the “curse of dimensionality”. Though these approaches show promising performance for medical image retrieval, the feature-concatenating method ignores the fact that different features have distinct physical meanings. In this paper, we propose a new method called multiview locally linear embedding (MLLE) for medical image retrieval. Following the patch alignment framework, MLLE preserves the geometric structure of the local patch in each feature space according to the LLE criterion. To explore complementary properties among a range of features, MLLE assigns different weights to local patches from different feature spaces. Finally, MLLE employs global coordinate alignment and alternating optimization techniques to learn a smooth low-dimensional embedding from different features. To justify the effectiveness of MLLE for medical image retrieval, we compare it with conventional spectral embedding methods. We conduct experiments on a subset of the IRMA medical image data set. Evaluation results show that MLLE outperforms state-of-the-art dimension reduction methods. PMID:24349277

  20. Parallel computation of GA search for the artery shape determinants with CFD

    NASA Astrophysics Data System (ADS)

    Himeno, M.; Noda, S.; Fukasaku, K.; Himeno, R.

    2010-06-01

    We studied which factors play an important role in determining the shape of arteries at the carotid artery bifurcation by performing multi-objective optimization with computational fluid dynamics (CFD) and a genetic algorithm (GA). The most difficult problem in doing so is how to reduce the turn-around time of GA optimization with 3D unsteady computation of blood flow. We devised a two-level parallel computation method with the following features: level 1, parallel CFD computation with an appropriate number of cores; level 2, parallel jobs generated by a "master" process, which quickly finds an available job queue and dispatches jobs to reduce turn-around time. As a result, the turn-around time of one GA trial, which would have taken 462 days with one core, was reduced to less than two days on the RIKEN supercomputer system, RICC, with 8192 cores. We performed a multi-objective optimization to minimize the maximum mean wall shear stress (WSS) and to minimize the sum of circumferences for four different shapes, and obtained a set of trade-off solutions for each shape. In addition, we found that the carotid bulb exhibits both the minimum local mean WSS and the minimum local radius. We confirmed that our method is effective for examining the determinants of artery shapes.

  1. A Computational, Tissue-Realistic Model of Pressure Ulcer Formation in Individuals with Spinal Cord Injury.

    PubMed

    Ziraldo, Cordelia; Solovyev, Alexey; Allegretti, Ana; Krishnan, Shilpa; Henzel, M Kristi; Sowa, Gwendolyn A; Brienza, David; An, Gary; Mi, Qi; Vodovotz, Yoram

    2015-06-01

    People with spinal cord injury (SCI) are predisposed to pressure ulcers (PU). PU remain a significant burden in cost of care and quality of life despite improved mechanistic understanding and advanced interventions. An agent-based model (ABM) of ischemia/reperfusion-induced inflammation and PU (the PUABM) was created, calibrated to serial images of post-SCI PU, and used to investigate potential treatments in silico. Tissue-level features of the PUABM recapitulated visual patterns of ulcer formation in individuals with SCI. These morphological features, along with simulated cell counts and mediator concentrations, suggested that the influence of inflammatory dynamics caused simulations to be committed to "better" vs. "worse" outcomes by 4 days of simulated time and prior to ulcer formation. Sensitivity analysis of model parameters suggested that increasing oxygen availability would reduce PU incidence. Using the PUABM, in silico trials of anti-inflammatory treatments such as corticosteroids and a neutralizing antibody targeted at Damage-Associated Molecular Pattern molecules (DAMPs) suggested that, at best, early application at a sufficiently high dose could attenuate local inflammation and reduce pressure-associated tissue damage, but could not reduce PU incidence. The PUABM thus shows promise as an adjunct for mechanistic understanding, diagnosis, and design of therapies in the setting of PU.

  2. A Computational, Tissue-Realistic Model of Pressure Ulcer Formation in Individuals with Spinal Cord Injury

    PubMed Central

    Ziraldo, Cordelia; Solovyev, Alexey; Allegretti, Ana; Krishnan, Shilpa; Henzel, M. Kristi; Sowa, Gwendolyn A.; Brienza, David; An, Gary; Mi, Qi; Vodovotz, Yoram

    2015-01-01

    People with spinal cord injury (SCI) are predisposed to pressure ulcers (PU). PU remain a significant burden in cost of care and quality of life despite improved mechanistic understanding and advanced interventions. An agent-based model (ABM) of ischemia/reperfusion-induced inflammation and PU (the PUABM) was created, calibrated to serial images of post-SCI PU, and used to investigate potential treatments in silico. Tissue-level features of the PUABM recapitulated visual patterns of ulcer formation in individuals with SCI. These morphological features, along with simulated cell counts and mediator concentrations, suggested that the influence of inflammatory dynamics caused simulations to be committed to “better” vs. “worse” outcomes by 4 days of simulated time and prior to ulcer formation. Sensitivity analysis of model parameters suggested that increasing oxygen availability would reduce PU incidence. Using the PUABM, in silico trials of anti-inflammatory treatments such as corticosteroids and a neutralizing antibody targeted at Damage-Associated Molecular Pattern molecules (DAMPs) suggested that, at best, early application at a sufficiently high dose could attenuate local inflammation and reduce pressure-associated tissue damage, but could not reduce PU incidence. The PUABM thus shows promise as an adjunct for mechanistic understanding, diagnosis, and design of therapies in the setting of PU. PMID:26111346

  3. Clinical features of symptomatic patellofemoral joint osteoarthritis

    PubMed Central

    2012-01-01

    Introduction Patellofemoral joint osteoarthritis (OA) is common and leads to pain and disability. However, current classification criteria do not distinguish between patellofemoral and tibiofemoral joint OA. The objective of this study was to provide empirical evidence of the clinical features of patellofemoral joint OA (PFJOA) and to explore the potential for making a confident clinical diagnosis in the community setting. Methods This was a population-based cross-sectional study of 745 adults aged ≥50 years with knee pain. Information on risk factors and clinical signs and symptoms was gathered by a self-complete questionnaire, and standardised clinical interview and examination. Three radiographic views of the knee were obtained (weight-bearing semi-flexed posteroanterior, supine skyline and lateral) and individuals were classified into four subsets (no radiographic OA, isolated PFJOA, isolated tibiofemoral joint OA, combined patellofemoral/tibiofemoral joint OA) according to two different cut-offs: 'any OA' and 'moderate to severe OA'. A series of binary logistic and multinomial regression functions were performed to compare the clinical features of each subset and their ability in combination to discriminate PFJOA from other subsets. Results Distinctive clinical features of moderate to severe isolated PFJOA included a history of dramatic swelling, valgus deformity, markedly reduced quadriceps strength, and pain on patellofemoral joint compression. Mild isolated PFJOA was barely distinguished from no radiographic OA (AUC 0.71, 95% CI 0.66, 0.76) with only difficulty descending stairs and coarse crepitus marginally informative over age, sex and body mass index. Other cardinal signs of knee OA - the presence of effusion, bony enlargement, reduced flexion range of movement, mediolateral instability and varus deformity - were indicators of tibiofemoral joint OA. Conclusions Early isolated PFJOA is clinically manifest in symptoms and self-reported functional limitation but has fewer clear clinical signs. More advanced disease is indicated by a small number of simple-to-assess signs and the relative absence of classic signs of knee OA, which are predominantly manifestations of tibiofemoral joint OA. Confident diagnosis of even more advanced PFJOA may be limited in the community setting. PMID:22417687

  4. Computer aided detection of clusters of microcalcifications on full field digital mammograms

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ge Jun; Sahiner, Berkman; Hadjiiski, Lubomir M.

    2006-08-15

    We are developing a computer-aided detection (CAD) system to identify microcalcification clusters (MCCs) automatically on full field digital mammograms (FFDMs). The CAD system includes six stages: preprocessing; image enhancement; segmentation of microcalcification candidates; false positive (FP) reduction for individual microcalcifications; regional clustering; and FP reduction for clustered microcalcifications. At the stage of FP reduction for individual microcalcifications, a truncated sum-of-squares error function was used to improve the efficiency and robustness of the training of an artificial neural network in our CAD system for FFDMs. At the stage of FP reduction for clustered microcalcifications, morphological features and features derived from the artificial neural network outputs were extracted from each cluster. Stepwise linear discriminant analysis (LDA) was used to select the features. An LDA classifier was then used to differentiate clustered microcalcifications from FPs. A data set of 96 cases with 192 images was collected at the University of Michigan. This data set contained 96 MCCs, of which 28 clusters were proven by biopsy to be malignant and 68 were proven to be benign. The data set was separated into two independent data sets for training and testing of the CAD system in a cross-validation scheme. When one data set was used to train and validate the convolution neural network (CNN) in our CAD system, the other data set was used to evaluate the detection performance. With the use of a truncated error metric, the training of CNN could be accelerated and the classification performance was improved. The CNN in combination with an LDA classifier could substantially reduce FPs with a small tradeoff in sensitivity. By using the free-response receiver operating characteristic methodology, it was found that our CAD system can achieve a cluster-based sensitivity of 70, 80, and 90% at 0.21, 0.61, and 1.49 FPs/image, respectively. For case-based performance evaluation, a sensitivity of 70, 80, and 90% can be achieved at 0.07, 0.17, and 0.65 FPs/image, respectively. We also used a data set of 216 mammograms negative for clustered microcalcifications to further estimate the FP rate of our CAD system. The corresponding FP rates were 0.15, 0.31, and 0.86 FPs/image for cluster-based detection when negative mammograms were used for estimation of FP rates.

  5. A fuzzy-based data transformation for feature extraction to increase classification performance with small medical data sets.

    PubMed

    Li, Der-Chiang; Liu, Chiao-Wen; Hu, Susan C

    2011-05-01

    Medical data sets are usually small and have very high dimensionality. Too many attributes will make the analysis less efficient and will not necessarily increase accuracy, while too few data will decrease the modeling stability. Consequently, the main objective of this study is to extract the optimal subset of features to increase analytical performance when the data set is small. This paper proposes a fuzzy-based non-linear transformation method to extend classification-related information from the original data attribute values for a small data set. Based on the new transformed data set, this study applies principal component analysis (PCA) to extract the optimal subset of features. Finally, we use the transformed data with these optimal features as the input data for a learning tool, a support vector machine (SVM). Six medical data sets: Pima Indians' diabetes, Wisconsin diagnostic breast cancer, Parkinson disease, echocardiogram, BUPA liver disorders dataset, and bladder cancer cases in Taiwan, are employed to illustrate the approach presented in this paper. This research uses the t-test to evaluate the classification accuracy for a single data set, and the Friedman test to show that the proposed method is better than other methods over multiple data sets. The experimental results indicate that the proposed method has better classification performance than either PCA or kernel principal component analysis (KPCA) when the data set is small, and suggest creating new purpose-related information to improve the analysis performance. This paper has shown that feature extraction is important as a function of feature selection for efficient data analysis. When the data set is small, using the fuzzy-based transformation method presented in this work to increase the information available produces better results than the PCA and KPCA approaches. Copyright © 2011 Elsevier B.V. All rights reserved.
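
    The shape of the pipeline described above (fuzzy expansion of attributes, PCA, then SVM) could be sketched as below. The triangular low/medium/high memberships are a generic stand-in for the paper's specific fuzzy transformation, and the data are synthetic.

      import numpy as np
      from sklearn.decomposition import PCA
      from sklearn.model_selection import cross_val_score
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import FunctionTransformer, MinMaxScaler
      from sklearn.svm import SVC

      def fuzzy_expand(X):
          """Map each column in [0, 1] to low/medium/high triangular memberships."""
          low = np.clip(1.0 - 2.0 * X, 0.0, 1.0)
          med = np.clip(1.0 - 2.0 * np.abs(X - 0.5), 0.0, 1.0)
          high = np.clip(2.0 * X - 1.0, 0.0, 1.0)
          return np.hstack([low, med, high])

      rng = np.random.default_rng(4)
      X = rng.normal(size=(60, 8))                     # small medical-style data set (placeholder)
      y = (X[:, 0] + X[:, 1] > 0).astype(int)

      pipe = make_pipeline(MinMaxScaler(),
                           FunctionTransformer(fuzzy_expand),   # expand attributes into fuzzy memberships
                           PCA(n_components=5),                 # extract a compact feature subset
                           SVC(kernel="rbf"))
      print("CV accuracy:", cross_val_score(pipe, X, y, cv=5).mean())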

  6. Comparison of Feature Selection Techniques in Machine Learning for Anatomical Brain MRI in Dementia.

    PubMed

    Tohka, Jussi; Moradi, Elaheh; Huttunen, Heikki

    2016-07-01

    We present a comparative split-half resampling analysis of various data driven feature selection and classification methods for the whole brain voxel-based classification analysis of anatomical magnetic resonance images. We compared support vector machines (SVMs), with or without filter based feature selection, several embedded feature selection methods and stability selection. While comparisons of the accuracy of various classification methods have been reported previously, the variability of the out-of-training sample classification accuracy and the set of selected features due to independent training and test sets have not been previously addressed in a brain imaging context. We studied two classification problems: 1) Alzheimer's disease (AD) vs. normal control (NC) and 2) mild cognitive impairment (MCI) vs. NC classification. In AD vs. NC classification, the variability in the test accuracy due to the subject sample did not vary between different methods and exceeded the variability due to different classifiers. In MCI vs. NC classification, particularly with a large training set, embedded feature selection methods outperformed SVM-based ones with the difference in the test accuracy exceeding the test accuracy variability due to the subject sample. The filter and embedded methods produced divergent feature patterns for MCI vs. NC classification that suggests the utility of the embedded feature selection for this problem when linked with the good generalization performance. The stability of the feature sets was strongly correlated with the number of features selected, weakly correlated with the stability of classification accuracy, and uncorrelated with the average classification accuracy.

  7. Rethinking the REAL ID Act and National Identification Cards as a Counterterrorism Tool

    DTIC Science & Technology

    2009-12-01

    federal government imposing national identification standards on states are also actively engaged in the debate. Michael Boldin, a 36-year-old Web...on the RIA.94 Boldin states, “Maine resisted, the government backed off, and soon all of these other states were doing the same thing.”95 Since...that acquires biometric data from an individual, extracts a feature set from the data, compares this feature set against the feature set stored in a

  8. Inattentional blindness: A combination of a relational set and a feature inhibition set?

    PubMed

    Goldstein, Rebecca R; Beck, Melissa R

    2016-07-01

    Two experiments were conducted to directly test the feature set hypothesis and the relational set hypothesis in an inattentional blindness task. The feature set hypothesis predicts that unexpected objects that match the to-be-attended stimuli will be reported the most. The relational set hypothesis predicts that unexpected objects that match the relationship between the to-be-attended and the to-be-ignored stimuli will be reported the most. Experiment 1 manipulated the luminance of the stimuli. Participants were instructed to monitor the gray letter shapes and to ignore either black or white letter shapes. The unexpected objects that exhibited the luminance relation of the to-be-attended to the to-be-ignored stimuli were reported by participants the most. Experiment 2 manipulated the color of the stimuli. Participants were instructed to monitor the yellower orange or the redder orange letter shapes and to ignore the redder orange or yellower orange letter shapes, respectively. The unexpected objects that exhibited the color relation of the to-be-attended to the to-be-ignored stimuli were reported the most. The results do not support the use of a feature set to accomplish the task and instead support the use of a relational set. In addition, the results point to the concurrent use of multiple attentional sets that are both excitatory and inhibitory.

  9. Two-view information fusion for improvement of computer-aided detection (CAD) of breast masses on mammograms

    NASA Astrophysics Data System (ADS)

    Wei, Jun; Sahiner, Berkman; Hadjiiski, Lubomir M.; Chan, Heang-Ping; Helvie, Mark A.; Roubidoux, Marilyn A.; Zhou, Chuan; Ge, Jun; Zhang, Yiheng

    2006-03-01

    We are developing a two-view information fusion method to improve the performance of our CAD system for mass detection. Mass candidates on each mammogram were first detected with our single-view CAD system. Potential object pairs on the two-view mammograms were then identified by using the distance between the object and the nipple. Morphological features, a Hessian feature, correlation coefficients between the two paired objects, and texture features were used as input to train a similarity classifier that estimated a similarity score for each pair. Finally, a linear discriminant analysis (LDA) classifier was used to fuse the score from the single-view CAD system and the similarity score. A data set of 475 patients containing 972 mammograms with 475 biopsy-proven masses was used to train and test the CAD system. All cases contained the CC view and the MLO or LM view. We randomly divided the data set into two independent sets of 243 cases and 232 cases. The training and testing were performed using the 2-fold cross validation method. The detection performance of the CAD system was assessed by free response receiver operating characteristic (FROC) analysis. The average test FROC curve was obtained by averaging the FP rates at the same sensitivity along the two corresponding test FROC curves from the 2-fold cross validation. At the case-based sensitivities of 90%, 85% and 80% on the test set, the single-view CAD system achieved an FP rate of 2.0, 1.5, and 1.2 FPs/image, respectively. With the two-view fusion system, the FP rates were reduced to 1.7, 1.3, and 1.0 FPs/image, respectively, at the corresponding sensitivities. The improvement was found to be statistically significant (p<0.05) by the AFROC method. Our results indicate that the two-view fusion scheme can improve the performance of mass detection on mammograms.
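
    The fusion step described above could be illustrated as follows (a simplified assumption-laden sketch, not the authors' implementation): the single-view CAD score and the two-view similarity score form a 2-D feature vector per detected object, which a linear discriminant maps to a fused score.

      import numpy as np
      from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

      rng = np.random.default_rng(5)
      n = 400
      single_view_score = np.r_[rng.normal(1.0, 1.0, n // 2), rng.normal(-1.0, 1.0, n // 2)]
      similarity_score  = np.r_[rng.normal(0.8, 1.0, n // 2), rng.normal(-0.8, 1.0, n // 2)]
      labels = np.r_[np.ones(n // 2), np.zeros(n // 2)]   # 1 = true mass, 0 = false positive (synthetic)

      fusion = LinearDiscriminantAnalysis().fit(np.c_[single_view_score, similarity_score], labels)
      fused_score = fusion.decision_function(np.c_[single_view_score, similarity_score])
      print("fused score range:", fused_score.min(), fused_score.max())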

  10. Experimental demonstration of single electron transistors featuring SiO{sub 2} plasma-enhanced atomic layer deposition in Ni-SiO{sub 2}-Ni tunnel junctions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Karbasian, Golnaz, E-mail: Golnaz.Karbasian.1@nd.edu; McConnell, Michael S.; Orlov, Alexei O.

    The authors report the use of plasma-enhanced atomic layer deposition (PEALD) to fabricate single-electron transistors (SETs) featuring ultrathin (≈1 nm) tunnel-transparent SiO{sub 2} in Ni-SiO{sub 2}-Ni tunnel junctions. They show that, as a result of the O{sub 2} plasma steps in PEALD of SiO{sub 2}, the top surface of the underlying Ni electrode is oxidized. Additionally, the bottom surface of the upper Ni layer is also oxidized where it is in contact with the deposited SiO{sub 2}, most likely as a result of oxygen-containing species on the surface of the SiO{sub 2}. Due to the presence of these surface parasitic layers of NiO, which exhibit features typical of thermally activated transport, the resistance of Ni-SiO{sub 2}-Ni tunnel junctions is drastically increased. Moreover, the transport mechanism is changed from quantum tunneling through the dielectric barrier to one consistent with thermally activated resistors in series with tunnel junctions. The reduction of NiO to Ni is therefore required to restore the metal-insulator-metal (MIM) structure of the junctions. Rapid thermal annealing in a forming gas ambient at elevated temperatures is presented as a technique to reduce both parasitic oxide layers. This method is of great interest for devices that rely on MIM tunnel junctions with ultrathin barriers. Using this technique, the authors successfully fabricated MIM SETs with minimal trace of parasitic NiO component. They demonstrate that the properties of the tunnel barrier in nanoscale tunnel junctions (with <10{sup −15} m{sup 2} in area) can be evaluated by electrical characterization of SETs.

  11. Computer-aided diagnosis of melanoma using border and wavelet-based texture analysis.

    PubMed

    Garnavi, Rahil; Aldeen, Mohammad; Bailey, James

    2012-11-01

    This paper presents a novel computer-aided diagnosis system for melanoma. The novelty lies in the optimised selection and integration of features derived from textural, border-based and geometrical properties of the melanoma lesion. The texture features are derived using wavelet decomposition, the border features are derived from constructing a boundary-series model of the lesion border and analysing it in spatial and frequency domains, and the geometry features are derived from shape indexes. The optimised selection of features is achieved by using the Gain-Ratio method, which is shown to be computationally efficient for the melanoma diagnosis application. Classification is done through the use of four classifiers, namely Support Vector Machine, Random Forest, Logistic Model Tree and Hidden Naive Bayes. The proposed diagnostic system is applied on a set of 289 dermoscopy images (114 malignant, 175 benign) partitioned into train, validation and test image sets. The system achieves an accuracy of 91.26% and an AUC value of 0.937 when 23 features are used. Other important findings include (i) the clear advantage gained in complementing texture with border and geometry features, compared to using texture information only, and (ii) the higher contribution of texture features than border-based features in the optimised feature set.

  12. Methods for the Precise Locating and Forming of Arrays of Curved Features into a Workpiece

    DOEpatents

    Gill, David Dennis; Keeler, Gordon A.; Serkland, Darwin K.; Mukherjee, Sayan D.

    2008-10-14

    Methods for manufacturing high precision arrays of curved features (e.g. lenses) in the surface of a workpiece are described utilizing orthogonal sets of inter-fitting locating grooves to mate a workpiece to a workpiece holder mounted to the spindle face of a rotating machine tool. The matching inter-fitting groove sets in the workpiece and the chuck allow precisely and non-kinematically indexing the workpiece to locations defined in two orthogonal directions perpendicular to the turning axis of the machine tool. At each location on the workpiece a curved feature can then be on-center machined to create arrays of curved features on the workpiece. The averaging effect of the corresponding sets of inter-fitting grooves provides for precise repeatability in determining the relative locations of the centers of each of the curved features in an array of curved features.

  13. An enhanced digital line graph design

    USGS Publications Warehouse

    Guptill, Stephen C.

    1990-01-01

    In response to increasing information demands on its digital cartographic data, the U.S. Geological Survey has designed an enhanced version of the Digital Line Graph, termed Digital Line Graph - Enhanced (DLG-E). In the DLG-E model, the phenomena represented by geographic and cartographic data are termed entities. Entities represent individual phenomena in the real world. A feature is an abstraction of a set of entities, with the feature description encompassing only selected properties of the entities (typically the properties that have been portrayed cartographically on a map). Buildings, bridges, roads, streams, grasslands, and counties are examples of features. A feature instance, that is, one occurrence of a feature, is described in the digital environment by feature objects and spatial objects. A feature object identifies a feature instance and its nonlocational attributes. Nontopological relationships are associated with feature objects. The locational aspects of the feature instance are represented by spatial objects. Four spatial objects (points, nodes, chains, and polygons) and their topological relationships are defined. To link the locational and nonlocational aspects of the feature instance, a given feature object is associated with (or is composed of) a set of spatial objects. These objects, attributes, and relationships are the components of the DLG-E data model. To establish a domain of features for DLG-E, an approach using a set of classes, or views, of spatial entities was adopted. The five views that were developed are cover, division, ecosystem, geoposition, and morphology. The views are exclusive; each view is a self-contained analytical approach to the entire range of world features. Because each view is independent of the others, a single point on the surface of the Earth can be represented under multiple views. Under the five views, over 200 features were identified and defined. This set constitutes an initial domain of DLG-E features.

  14. An Accurate Fire-Spread Algorithm in the Weather Research and Forecasting Model Using the Level-Set Method

    NASA Astrophysics Data System (ADS)

    Muñoz-Esparza, Domingo; Kosović, Branko; Jiménez, Pedro A.; Coen, Janice L.

    2018-04-01

    The level-set method is typically used to track and propagate the fire perimeter in wildland fire models. Herein, a high-order level-set method using a fifth-order WENO scheme for the discretization of spatial derivatives and third-order explicit Runge-Kutta temporal integration is implemented within the Weather Research and Forecasting model wildland fire physics package, WRF-Fire. The algorithm includes solution of an additional partial differential equation for level-set reinitialization. The accuracy of the fire-front shape and rate of spread in uncoupled simulations is systematically analyzed. It is demonstrated that the common implementation used by level-set-based wildfire models yields rate-of-spread errors in the range 10-35% for typical grid sizes (Δ = 12.5-100 m) and considerably underestimates fire area. Moreover, the amplitude of fire-front gradients in the presence of explicitly resolved turbulence features is systematically underestimated. In contrast, the new WRF-Fire algorithm results in rate-of-spread errors that are lower than 1% and that become nearly grid independent. Also, the underestimation of fire area at the sharp transition between the fire front and the lateral flanks is found to be reduced by a factor of ≈7. A hybrid-order level-set method with locally reduced artificial viscosity is proposed, which substantially alleviates the computational cost associated with high-order discretizations while preserving accuracy. Simulations of the Last Chance wildfire demonstrate additional benefits of high-order accurate level-set algorithms when dealing with complex fuel heterogeneities, enabling propagation across narrow fuel gaps and more accurate fire backing over the lee side of no-fuel clusters.
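
    For orientation, the evolution equation that such level-set fire-spread algorithms discretize can be written in the generic form below (our paraphrase of the standard level-set formulation, not an equation quoted from the paper), where φ is the level-set function, R the local rate of spread, and the fire perimeter Γ(t) is the zero level set:

      \frac{\partial \phi}{\partial t} + R\,\lvert \nabla \phi \rvert = 0,
      \qquad \Gamma(t) = \{\, \mathbf{x} : \phi(\mathbf{x}, t) = 0 \,\}.

    The fifth-order WENO scheme mentioned above concerns the spatial discretization of |∇φ| and the third-order Runge-Kutta scheme the time integration, while the reinitialization step keeps φ close to a signed-distance function so that |∇φ| ≈ 1 near the front.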

  15. Positive-Negative Asymmetry in the Evaluations of Political Candidates. The Role of Features of Similarity and Affect in Voter Behavior.

    PubMed

    Falkowski, Andrzej; Jabłońska, Magdalena

    2018-01-01

    In this study we followed an extension of Tversky's features-of-similarity research, applying it to open sets. Unlike the original closed-set model, in which a feature was shifted between a common and a distinctive set, we investigated how the addition of new features and the deletion of existing features affected similarity judgments. The model was tested empirically in a political context and we analyzed how positive and negative changes in a candidate's profile affect the similarity of the politician to his or her ideal and opposite counterpart. The results showed a positive-negative asymmetry in comparison judgments where enhancing negative features (distinctive for an ideal political candidate) had a greater effect on judgments than operations on positive (common) features. However, the effect was not observed for comparisons to a bad politician. Further analyses showed that in the case of a negative reference point, the relationship between similarity judgments and voting intention was mediated by the affective evaluation of the candidate.
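
    For readers unfamiliar with the underlying framework, Tversky's contrast model (stated here in its standard textbook form, not quoted from the article) expresses the similarity between objects a and b with feature sets A and B as

      s(a, b) = \theta\, f(A \cap B) \;-\; \alpha\, f(A \setminus B) \;-\; \beta\, f(B \setminus A),

    where f is a salience measure and θ, α, β are nonnegative weights. The manipulations above (adding or deleting candidate features) shift mass between the common set A ∩ B and the distinctive sets, which is what drives the reported asymmetry.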

  16. Applying for Noyce

    NASA Astrophysics Data System (ADS)

    Stewart, Gay; Prival, Joan

    2012-02-01

    The NSF Robert Noyce Teacher Scholarship Program seeks to encourage talented STEM majors and STEM professionals to become mathematics and science teachers. The program also supports the development of Master Teachers in science and mathematics. There are key features in managing a Noyce program that often present difficulty and are vital to successful, sustainable teacher preparation programs: mentoring, advising and recruiting, and working with school partners. In this workshop, we will help participants consider ways to alleviate existing difficulties or to set up a program that reduces them. A sample proposal will be available for a mock review.

  17. A survey method for characterizing daily life experience: the day reconstruction method.

    PubMed

    Kahneman, Daniel; Krueger, Alan B; Schkade, David A; Schwarz, Norbert; Stone, Arthur A

    2004-12-03

    The Day Reconstruction Method (DRM) assesses how people spend their time and how they experience the various activities and settings of their lives, combining features of time-budget measurement and experience sampling. Participants systematically reconstruct their activities and experiences of the preceding day with procedures designed to reduce recall biases. The DRM's utility is shown by documenting close correspondences between the DRM reports of 909 employed women and established results from experience sampling. An analysis of the hedonic treadmill shows the DRM's potential for well-being research.

  18. Optimized extreme learning machine for urban land cover classification using hyperspectral imagery

    NASA Astrophysics Data System (ADS)

    Su, Hongjun; Tian, Shufang; Cai, Yue; Sheng, Yehua; Chen, Chen; Najafian, Maryam

    2017-12-01

    This work presents a new urban land cover classification framework using a firefly algorithm (FA)-optimized extreme learning machine (ELM). FA is adopted to optimize the regularization coefficient C and the Gaussian kernel parameter σ for the kernel ELM. Additionally, the effectiveness of spectral features derived from an FA-based band selection algorithm is studied for the proposed classification task. Three hyperspectral data sets were recorded using different sensors, namely HYDICE, HyMap, and AVIRIS. Our study shows that the proposed method outperforms traditional classification algorithms such as SVM and reduces computational cost significantly.
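
    The search component only could be sketched as below: candidate (C, σ) pairs are scored by validation accuracy of a kernel ELM-style classifier written in its common closed form, which coincides with kernel ridge regression on one-hot targets. The firefly algorithm itself is replaced by plain random search purely for brevity, and all data are synthetic.

      import numpy as np
      from sklearn.metrics.pairwise import rbf_kernel
      from sklearn.model_selection import train_test_split

      rng = np.random.default_rng(6)
      X = rng.normal(size=(300, 20))                       # placeholder spectral features
      y = (X[:, :3].sum(axis=1) > 0).astype(int)
      T = np.eye(2)[y]                                     # one-hot targets
      Xtr, Xva, Ttr, Tva = train_test_split(X, T, test_size=0.3, random_state=0)

      def kernel_elm_accuracy(C, sigma):
          gamma = 1.0 / (2 * sigma**2)
          K = rbf_kernel(Xtr, Xtr, gamma=gamma)
          beta = np.linalg.solve(K + np.eye(len(Xtr)) / C, Ttr)   # output weights (closed form)
          Kva = rbf_kernel(Xva, Xtr, gamma=gamma)
          return np.mean((Kva @ beta).argmax(axis=1) == Tva.argmax(axis=1))

      # random search over C and sigma as a stand-in for the firefly algorithm
      best = max(((10 ** rng.uniform(-2, 3), 10 ** rng.uniform(-1, 1)) for _ in range(50)),
                 key=lambda p: kernel_elm_accuracy(*p))
      print("best (C, sigma):", best)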

  19. About decomposition approach for solving the classification problem

    NASA Astrophysics Data System (ADS)

    Andrianova, A. A.

    2016-11-01

    This article describes the application of decomposition methods to the binary classification problem of constructing a linear classifier based on the Support Vector Machine method. Decomposition reduces the volume of calculations, in particular by opening up possibilities to build parallel versions of the algorithm, which is an important advantage for problems with big data. We analyze the results of computational experiments conducted using the decomposition approach. The experiments use a well-known data set for the binary classification problem.

  20. Spiral Galaxy Lensing: A Model with Twist

    NASA Astrophysics Data System (ADS)

    Bell, Steven R.; Ernst, Brett; Fancher, Sean; Keeton, Charles R.; Komanduru, Abi; Lundberg, Erik

    2014-12-01

    We propose a single galaxy gravitational lensing model with a mass density that has a spiral structure. Namely, we extend the arcsine gravitational lens (a truncated singular isothermal elliptical model), adding an additional parameter that controls the amount of spiraling in the structure of the mass density. An important feature of our model is that, even though the mass density is sophisticated, we succeed in integrating the deflection term in closed form using a Gauss hypergeometric function. When the spiraling parameter is set to zero, this reduces to the arcsine lens.

  1. Fault-Tree Compiler

    NASA Technical Reports Server (NTRS)

    Butler, Ricky W.; Boerschlein, David P.

    1993-01-01

    Fault-Tree Compiler (FTC) program is software tool used to calculate probability of top event in fault tree. Gates of five different types allowed in fault tree: AND, OR, EXCLUSIVE OR, INVERT, and M OF N. High-level input language easy to understand and use. In addition, program supports hierarchical fault-tree definition feature, which simplifies tree-description process and reduces execution time. Set of programs created forming basis for reliability-analysis workstation: SURE, ASSIST, PAWS/STEM, and FTC fault-tree tool (LAR-14586). Written in PASCAL, ANSI-compliant C language, and FORTRAN 77. Other versions available upon request.

  2. Hash Bit Selection for Nearest Neighbor Search.

    PubMed

    Xianglong Liu; Junfeng He; Shih-Fu Chang

    2017-11-01

    To overcome the barrier of storage and computation when dealing with gigantic-scale data sets, compact hashing has been studied extensively to approximate the nearest neighbor search. Despite the recent advances, critical design issues remain open in how to select the right features, hashing algorithms, and/or parameter settings. In this paper, we address these by posing an optimal hash bit selection problem, in which an optimal subset of hash bits are selected from a pool of candidate bits generated by different features, algorithms, or parameters. Inspired by the optimization criteria used in existing hashing algorithms, we adopt bit reliability and complementarity as the selection criteria, which can be carefully tailored for hashing performance in different tasks. Then, the bit selection solution is discovered by finding the best tradeoff between search accuracy and time using a modified dynamic programming method. To further reduce the computational complexity, we employ the pairwise relationship among hash bits to approximate the high-order independence property, and formulate it as an efficient quadratic programming method that is theoretically equivalent to the normalized dominant set problem in a vertex- and edge-weighted graph. Extensive large-scale experiments have been conducted under several important application scenarios of hash techniques, where our bit selection framework can achieve superior performance over both the naive selection methods and the state-of-the-art hashing algorithms, with significant relative accuracy gains ranging from 10% to 50%.

  3. Combining heterogeneous features for colonic polyp detection in CTC based on semi-definite programming

    NASA Astrophysics Data System (ADS)

    Wang, Shijun; Yao, Jianhua; Petrick, Nicholas A.; Summers, Ronald M.

    2009-02-01

    Colon cancer is the second leading cause of cancer-related deaths in the United States. Computed tomographic colonography (CTC) combined with a computer aided detection system provides a feasible combination for improving colonic polyp detection and increasing the use of CTC for colon cancer screening. To distinguish true polyps from false positives, various features extracted from polyp candidates have been proposed. Most of these features try to capture the shape information of polyp candidates or neighborhood knowledge about the surrounding structures (fold, colon wall, etc.). In this paper, we propose a new set of shape descriptors for polyp candidates based on statistical curvature information. These features, called histogram of curvature features, are rotation, translation and scale invariant and can be treated as complementing our existing feature set. Then, in order to make full use of the traditional features (defined as group A) and the new features (group B), which are highly heterogeneous, we employed a multiple kernel learning method based on semi-definite programming to identify an optimized classification kernel based on the combined set of features. We performed a leave-one-patient-out test on a CTC dataset that contained scans from 50 patients (with 90 polyp detections of 6-9 mm). Experimental results show that a support vector machine (SVM) based on the combined feature set and the semi-definite optimization kernel achieved higher FROC performance compared to SVMs using the two groups of features separately. At a rate of 7 false positives per patient, the sensitivity on 6-9 mm polyps using the combined features improved from 0.78 (Group A) and 0.73 (Group B) to 0.82 (p<=0.01).

  4. Change descriptors for determining nodule malignancy in national lung screening trial CT screening images

    NASA Astrophysics Data System (ADS)

    Geiger, Benjamin; Hawkins, Samuel; Hall, Lawrence O.; Goldgof, Dmitry B.; Balagurunathan, Yoganand; Gatenby, Robert A.; Gillies, Robert J.

    2016-03-01

    Pulmonary nodules are effectively diagnosed in CT scans, but determining their malignancy has been a challenge. The rate of change of the volume of a pulmonary nodule is known to be a prognostic factor for cancer development. In this study, we propose that other changes in imaging characteristics are similarly informative. We examined the combination of image features across multiple CT scans, taken from the National Lung Screening Trial, with individual scans of the same patient separated by approximately one year. By subtracting the values of existing features in multiple scans for the same patient, we were able to improve the ability of existing classification algorithms to determine whether a nodule will become malignant. We trained each classifier on 83 nodules determined to be malignant by biopsy and 172 nodules determined to be benign by their clinical stability through two years of no change; classifiers were tested on 77 malignant and 144 benign nodules, using a set of features that in a test-retest experiment were shown to be stable. An accuracy of 83.71% and AUC of 0.814 were achieved with the Random Forests classifier on a subset of features determined to be stable via test-retest reproducibility analysis, further reduced with the Correlation-based Feature Selection algorithm.
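
    The "change descriptor" idea described above could be sketched as follows (a hedged illustration with synthetic features and labels, not the study's data): subtract each image feature between the two yearly scans of the same nodule and train a classifier on the differences.

      import numpy as np
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(7)
      features_year1 = rng.normal(size=(255, 30))                         # baseline scan features
      features_year2 = features_year1 + rng.normal(scale=0.2, size=(255, 30))  # follow-up scan features
      malignant = rng.integers(0, 2, size=255)                            # placeholder outcome labels

      delta = features_year2 - features_year1                             # change descriptors
      clf = RandomForestClassifier(n_estimators=200, random_state=0)
      print("CV accuracy:", cross_val_score(clf, delta, malignant, cv=5).mean())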

  5. Extreme Sparse Multinomial Logistic Regression: A Fast and Robust Framework for Hyperspectral Image Classification

    NASA Astrophysics Data System (ADS)

    Cao, Faxian; Yang, Zhijing; Ren, Jinchang; Ling, Wing-Kuen; Zhao, Huimin; Marshall, Stephen

    2017-12-01

    Although the sparse multinomial logistic regression (SMLR) has provided a useful tool for sparse classification, it suffers from inefficacy in dealing with high-dimensional features and manually set initial regressor values. This has significantly constrained its applications for hyperspectral image (HSI) classification. In order to tackle these two drawbacks, an extreme sparse multinomial logistic regression (ESMLR) is proposed for effective classification of HSI. First, the HSI dataset is projected to a new feature space with randomly generated weight and bias. Second, an optimization model is established by the Lagrange multiplier method and the dual principle to automatically determine a good initial regressor for SMLR via minimizing the training error and the regressor value. Furthermore, the extended multi-attribute profiles (EMAPs) are utilized for extracting both the spectral and spatial features. A combinational linear multiple features learning (MFL) method is proposed to further enhance the features extracted by ESMLR and EMAPs. Finally, the logistic regression via the variable splitting and the augmented Lagrangian (LORSAL) is adopted in the proposed framework for reducing the computational time. Experiments are conducted on two well-known HSI datasets, namely the Indian Pines dataset and the Pavia University dataset, which demonstrate the fast and robust performance of the proposed ESMLR framework.
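
    A simplified sketch of the ingredients that are easy to reproduce is given below: project the input features through a random weight matrix and bias (as in ELM), then fit a sparse multinomial logistic regression on the projected features. The Lagrange-based initialization, EMAP features, and LORSAL solver are omitted, and the data are synthetic placeholders.

      import numpy as np
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(8)
      X = rng.normal(size=(500, 100))                  # placeholder HSI pixel spectra
      y = rng.integers(0, 5, size=500)                 # placeholder class labels

      W = rng.normal(size=(X.shape[1], 300))           # random hidden weights
      b = rng.normal(size=300)                         # random hidden bias
      H = np.tanh(X @ W + b)                           # random-feature mapping of the inputs

      smlr = LogisticRegression(penalty="l1", solver="saga", C=0.5, max_iter=5000)
      print("CV accuracy:", cross_val_score(smlr, H, y, cv=3).mean())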

  6. Machine-Learning Classifier for Patients with Major Depressive Disorder: Multifeature Approach Based on a High-Order Minimum Spanning Tree Functional Brain Network.

    PubMed

    Guo, Hao; Qin, Mengna; Chen, Junjie; Xu, Yong; Xiang, Jie

    2017-01-01

    High-order functional connectivity networks are rich in time information that can reflect dynamic changes in functional connectivity between brain regions. Accordingly, such networks are widely used to classify brain diseases. However, traditional methods for processing high-order functional connectivity networks generally include the clustering method, which reduces data dimensionality. As a result, such networks cannot be effectively interpreted in the context of neurology. Additionally, due to the large scale of high-order functional connectivity networks, it can be computationally very expensive to use complex network or graph theory to calculate certain topological properties. Here, we propose a novel method of generating a high-order minimum spanning tree functional connectivity network. This method increases the neurological significance of the high-order functional connectivity network, reduces network computing consumption, and produces a network scale that is conducive to subsequent network analysis. To ensure the quality of the topological information in the network structure, we used frequent subgraph mining technology to capture the discriminative subnetworks as features and combined this with quantifiable local network features. Then we applied a multikernel learning technique to the corresponding selected features to obtain the final classification results. We evaluated our proposed method using a data set containing 38 patients with major depressive disorder and 28 healthy controls. The experimental results showed a classification accuracy of up to 97.54%.
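
    As a small illustration of the minimum-spanning-tree step only (our sketch, with synthetic time series), a functional connectivity matrix can be reduced to an MST backbone by treating 1 - |correlation| as the edge distance, so that strong connections become short edges.

      import numpy as np
      from scipy.sparse.csgraph import minimum_spanning_tree

      rng = np.random.default_rng(9)
      ts = rng.normal(size=(90, 200))          # 90 brain regions x 200 time points (placeholder)
      corr = np.corrcoef(ts)
      dist = 1.0 - np.abs(corr)                # strong connections = short edges; diagonal is zero

      mst = minimum_spanning_tree(dist).toarray()
      edges = np.transpose(np.nonzero(mst))
      print("MST edges:", len(edges))          # expect n_regions - 1 edges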

  7. Machine-Learning Classifier for Patients with Major Depressive Disorder: Multifeature Approach Based on a High-Order Minimum Spanning Tree Functional Brain Network

    PubMed Central

    Qin, Mengna; Chen, Junjie; Xu, Yong; Xiang, Jie

    2017-01-01

    High-order functional connectivity networks are rich in time information that can reflect dynamic changes in functional connectivity between brain regions. Accordingly, such networks are widely used to classify brain diseases. However, traditional methods for processing high-order functional connectivity networks generally include the clustering method, which reduces data dimensionality. As a result, such networks cannot be effectively interpreted in the context of neurology. Additionally, due to the large scale of high-order functional connectivity networks, it can be computationally very expensive to use complex network or graph theory to calculate certain topological properties. Here, we propose a novel method of generating a high-order minimum spanning tree functional connectivity network. This method increases the neurological significance of the high-order functional connectivity network, reduces network computing consumption, and produces a network scale that is conducive to subsequent network analysis. To ensure the quality of the topological information in the network structure, we used frequent subgraph mining technology to capture the discriminative subnetworks as features and combined this with quantifiable local network features. Then we applied a multikernel learning technique to the corresponding selected features to obtain the final classification results. We evaluated our proposed method using a data set containing 38 patients with major depressive disorder and 28 healthy controls. The experimental results showed a classification accuracy of up to 97.54%. PMID:29387141

  8. Organizational contextual features that influence the implementation of evidence-based practices across healthcare settings: a systematic integrative review.

    PubMed

    Li, Shelly-Anne; Jeffs, Lianne; Barwick, Melanie; Stevens, Bonnie

    2018-05-05

    Organizational contextual features have been recognized as important determinants for implementing evidence-based practices across healthcare settings for over a decade. However, implementation scientists have not reached consensus on which features are most important for implementing evidence-based practices. The aims of this review were to identify the most commonly reported organizational contextual features that influence the implementation of evidence-based practices across healthcare settings, and to describe how these features affect implementation. An integrative review was undertaken following literature searches in CINAHL, MEDLINE, PsycINFO, EMBASE, Web of Science, and Cochrane databases from January 2005 to June 2017. English language, peer-reviewed empirical studies exploring organizational context in at least one implementation initiative within a healthcare setting were included. Quality appraisal of the included studies was performed using the Mixed Methods Appraisal Tool. Inductive content analysis informed data extraction and reduction. The search generated 5152 citations. After removing duplicates and applying eligibility criteria, 36 journal articles were included. The majority (n = 20) of the study designs were qualitative, 11 were quantitative, and 5 used a mixed methods approach. Six main organizational contextual features (organizational culture; leadership; networks and communication; resources; evaluation, monitoring and feedback; and champions) were most commonly reported to influence implementation outcomes in the selected studies across a wide range of healthcare settings. We identified six organizational contextual features that appear to be interrelated and work synergistically to influence the implementation of evidence-based practices within an organization. Organizational contextual features did not influence implementation efforts independently from other features. Rather, features were interrelated and often influenced each other in complex, dynamic ways to effect change. These features corresponded to the constructs in the Consolidated Framework for Implementation Research (CFIR), which supports the use of CFIR as a guiding framework for studies that explore the relationship between organizational context and implementation. Organizational culture was most commonly reported to affect implementation. Leadership exerted influence on the five other features, indicating it may be a moderator or mediator that enhances or impedes the implementation of evidence-based practices. Future research should focus on how organizational features interact to influence implementation effectiveness.

  9. Local Feature Selection for Data Classification.

    PubMed

    Armanfard, Narges; Reilly, James P; Komeili, Majid

    2016-06-01

    Typical feature selection methods choose an optimal global feature subset that is applied over all regions of the sample space. In contrast, in this paper we propose a novel localized feature selection (LFS) approach whereby each region of the sample space is associated with its own distinct optimized feature set, which may vary both in membership and size across the sample space. This allows the feature set to optimally adapt to local variations in the sample space. An associated method for measuring the similarities of a query datum to each of the respective classes is also proposed. The proposed method makes no assumptions about the underlying structure of the samples; hence the method is insensitive to the distribution of the data over the sample space. The method is efficiently formulated as a linear programming optimization problem. Furthermore, we demonstrate the method is robust against the over-fitting problem. Experimental results on eleven synthetic and real-world data sets demonstrate the viability of the formulation and the effectiveness of the proposed algorithm. In addition we show several examples where localized feature selection produces better results than a global feature selection method.

  10. Classification of independent components of EEG into multiple artifact classes.

    PubMed

    Frølich, Laura; Andersen, Tobias S; Mørup, Morten

    2015-01-01

    In this study, we aim to automatically identify multiple artifact types in EEG. We used multinomial regression to classify independent components of EEG data, selecting from 65 spatial, spectral, and temporal features of independent components using forward selection. The classifier identified neural and five nonneural types of components. Between subjects within studies, high classification performances were obtained. Between studies, however, classification was more difficult. For neural versus nonneural classifications, performance was on par with previous results obtained by others. We found that automatic separation of multiple artifact classes is possible with a small feature set. Our method can reduce manual workload and allow for the selective removal of artifact classes. Identifying artifacts during EEG recording may be used to instruct subjects to refrain from activity causing them. Copyright © 2014 Society for Psychophysiological Research.
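
    In the spirit of the forward-selection procedure described above, a compact greedy sketch could look like the following; the 65 component features and artifact labels are simulated here, and the cross-validation settings are assumptions rather than the study's configuration.

      import numpy as np
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(10)
      X = rng.normal(size=(300, 65))                    # placeholder independent-component features
      y = rng.integers(0, 6, size=300)                  # neural class plus five artifact classes

      def forward_select(X, y, max_features=5):
          chosen = []
          for _ in range(max_features):
              scores = {}
              for j in set(range(X.shape[1])) - set(chosen):
                  clf = LogisticRegression(max_iter=2000)         # multinomial regression
                  scores[j] = cross_val_score(clf, X[:, chosen + [j]], y, cv=3).mean()
              chosen.append(max(scores, key=scores.get))          # keep the best additional feature
          return chosen

      print("selected features:", forward_select(X, y))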

  11. Estimating Fallout Building Attributes from Architectural Features and Global Earthquake Model (GEM) Building Descriptions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dillon, Michael B.; Kane, Staci R.

    A nuclear explosion has the potential to injure or kill tens to hundreds of thousands (or more) of people through exposure to fallout (external gamma) radiation. Existing buildings can protect their occupants (reducing fallout radiation exposures) by placing material and distance between fallout particles and individuals indoors. Prior efforts have determined an initial set of building attributes suitable to reasonably assess a given building’s protection against fallout radiation. The current work provides methods to determine the quantitative values for these attributes from (a) common architectural features and data and (b) buildings described using the Global Earthquake Model (GEM) taxonomy. These methods will be used to improve estimates of fallout protection for operational US Department of Defense (DoD) and US Department of Energy (DOE) consequence assessment models.

  12. Evaluation of a New Ensemble Learning Framework for Mass Classification in Mammograms.

    PubMed

    Rahmani Seryasat, Omid; Haddadnia, Javad

    2018-06-01

    Mammography is the most common screening method for diagnosis of breast cancer. In this study, a computer-aided system for diagnosing benign and malignant masses in mammogram images was implemented. In the computer-aided diagnosis system, we first reduce the noise in the mammograms using an effective noise removal technique. After noise removal, the mass in the region of interest is segmented using a deformable model. After the mass segmentation, a number of features are extracted from it. These features include: features of the mass shape and border, tissue properties, and the fractal dimension. After extracting a large number of features, a proper subset must be chosen from among them. In this study, we use a new genetic-algorithm-based method to select a proper set of features. After determining the proper features, a classifier is trained. To classify the samples, a new architecture for combining classifiers is proposed. In this architecture, easy and difficult samples are identified and handled by different classifiers. Finally, the proposed mass diagnosis system was also tested on the mini-MIAS (Mammographic Image Analysis Society) and DDSM (Digital Database for Screening Mammography) databases. The obtained results indicate that the proposed system can compete with the state-of-the-art methods in terms of accuracy. Copyright © 2017 Elsevier Inc. All rights reserved.

  13. Accurate prediction of personalized olfactory perception from large-scale chemoinformatic features.

    PubMed

    Li, Hongyang; Panwar, Bharat; Omenn, Gilbert S; Guan, Yuanfang

    2018-02-01

    The olfactory stimulus-percept problem has been studied for more than a century, yet it is still hard to precisely predict the odor given the large-scale chemoinformatic features of an odorant molecule. A major challenge is that the perceived qualities vary greatly among individuals due to different genetic and cultural backgrounds. Moreover, the combinatorial interactions between multiple odorant receptors and diverse molecules significantly complicate the olfaction prediction. Many attempts have been made to establish structure-odor relationships for intensity and pleasantness, but no models are available to predict the personalized multi-odor attributes of molecules. In this study, we describe our winning algorithm for predicting individual and population perceptual responses to various odorants in the DREAM Olfaction Prediction Challenge. We find that a random forest model consisting of multiple decision trees is well suited to this prediction problem, given the large feature spaces and high variability of perceptual ratings among individuals. Integrating both population and individual perceptions into our model effectively reduces the influence of noise and outliers. By analyzing the importance of each chemical feature, we find that a small set of low- and nondegenerative features is sufficient for accurate prediction. Our random forest model successfully predicts personalized odor attributes of structurally diverse molecules. This model, together with the top discriminative features, has the potential to extend our understanding of olfactory perception mechanisms and provide an alternative for rational odorant design.
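
    A minimal sketch of the general recipe described above: fit a random forest, rank features by impurity-based importance, and refit on the top-ranked subset. The arrays here are random stand-ins, not the DREAM challenge ratings.

```python
# Random forest regression plus impurity-based feature ranking.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = np.random.rand(500, 500), np.random.rand(500)   # stand-in data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rf = RandomForestRegressor(n_estimators=200, n_jobs=-1, random_state=0).fit(X_tr, y_tr)

top = np.argsort(rf.feature_importances_)[::-1][:50]   # small informative subset
rf_small = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr[:, top], y_tr)
print(rf.score(X_te, y_te), rf_small.score(X_te, y_te))
```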

  14. Real-time hypoglycemia detection from continuous glucose monitoring data of subjects with type 1 diabetes.

    PubMed

    Jensen, Morten Hasselstrøm; Christensen, Toke Folke; Tarnow, Lise; Seto, Edmund; Dencker Johansen, Mette; Hejlesen, Ole Kristian

    2013-07-01

    Hypoglycemia is a potentially fatal condition. Continuous glucose monitoring (CGM) has the potential to detect hypoglycemia in real time and thereby reduce time in hypoglycemia and avoid any further decline in blood glucose level. However, CGM is inaccurate and shows a substantial number of cases in which the hypoglycemic event is not detected by the CGM. The aim of this study was to develop a pattern classification model to optimize real-time hypoglycemia detection. Features such as time since last insulin injection and linear regression, kurtosis, and skewness of the CGM signal in different time intervals were extracted from data of 10 male subjects experiencing 17 insulin-induced hypoglycemic events in an experimental setting. Nondiscriminative features were eliminated with SEPCOR and forward selection. The feature combinations were used in a Support Vector Machine model and the performance assessed by sample-based sensitivity and specificity and event-based sensitivity and number of false-positives. The best model used seven features and was able to detect 17 of 17 hypoglycemic events with one false-positive, compared with 12 of 17 hypoglycemic events with zero false-positives for the CGM alone. Lead-time was 14 min and 0 min for the model and the CGM alone, respectively. This optimized real-time hypoglycemia detection provides a unique approach for the diabetes patient to reduce time in hypoglycemia and learn about patterns in glucose excursions. Although these results are promising, the model needs to be validated on CGM data from patients with spontaneous hypoglycemic events.
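
    The feature types named in the abstract (regression slope, kurtosis, and skewness over trailing CGM windows) can be sketched as follows; the window lengths, the synthetic glucose traces, and the labels are illustrative assumptions, not the study's protocol.

```python
# Window features from a CGM trace feeding an SVM classifier.
import numpy as np
from scipy.stats import kurtosis, skew
from sklearn.svm import SVC

def window_features(glucose, windows=(6, 12, 24)):
    """Features from the most recent `w` CGM samples (e.g. 5-min sampling)."""
    feats = []
    for w in windows:
        seg = glucose[-w:]
        slope = np.polyfit(np.arange(w), seg, 1)[0]   # linear-regression slope
        feats.extend([slope, kurtosis(seg), skew(seg)])
    return np.array(feats)

# Each row of X is a feature vector from one window; y = 1 if the reference
# blood glucose was hypoglycemic shortly afterwards, else 0 (random stand-ins).
X = np.vstack([window_features(np.random.normal(6, 1, 48)) for _ in range(200)])
y = np.random.randint(0, 2, 200)
clf = SVC(kernel="rbf", class_weight="balanced").fit(X, y)
```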

  15. Application of complex discrete wavelet transform in classification of Doppler signals using complex-valued artificial neural network.

    PubMed

    Ceylan, Murat; Ceylan, Rahime; Ozbay, Yüksel; Kara, Sadik

    2008-09-01

    In biomedical signal classification, compressing the biomedical waveform data is vital due to the huge amount of data. This paper presents two different structures formed using feature extraction algorithms to decrease the size of the feature set in training and test data. The proposed structures, named wavelet transform-complex-valued artificial neural network (WT-CVANN) and complex wavelet transform-complex-valued artificial neural network (CWT-CVANN), use real and complex discrete wavelet transforms for feature extraction. The aim of using the wavelet transform is to compress data and to reduce the training time of the network without decreasing the accuracy rate. In this study, the presented structures were applied to the problem of classification of carotid arterial Doppler ultrasound signals. Carotid arterial Doppler ultrasound signals were acquired from the left carotid arteries of 38 patients and 40 healthy volunteers. The patient group included 22 males and 16 females with an established diagnosis of the early phase of atherosclerosis through coronary or aortofemoropopliteal (lower extremity) angiographies (mean age, 59 years; range, 48-72 years). The healthy volunteers were young non-smokers who appeared to bear no risk of atherosclerosis, including 28 males and 12 females (mean age, 23 years; range, 19-27 years). Sensitivity, specificity, and average detection rate were calculated for comparison after the training and test phases of all structures were completed. These parameters demonstrated that the training times of the CVANN and the real-valued artificial neural network (RVANN) were reduced using the feature extraction algorithms without decreasing the accuracy rate, in accordance with our aim.
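
    Only the compression step is sketched here, using a real discrete wavelet transform from PyWavelets; the complex-valued network itself and the CWT variant are not reproduced, and the signal is a random stand-in for a Doppler segment.

```python
# DWT-based feature compression: keep only the coarse approximation
# coefficients of each segment as a reduced feature vector.
import numpy as np
import pywt

def dwt_features(signal, wavelet="db4", level=4):
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    return coeffs[0]   # approximation coefficients at the coarsest level

segment = np.random.randn(1024)            # stand-in for a Doppler segment
features = dwt_features(segment)
print(len(segment), "->", len(features))   # e.g. 1024 -> roughly 70 coefficients
```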

  16. Data Reduction of Laser Ablation Split-Stream (LASS) Analyses Using Newly Developed Features Within Iolite: With Applications to Lu-Hf + U-Pb in Detrital Zircon and Sm-Nd +U-Pb in Igneous Monazite

    NASA Astrophysics Data System (ADS)

    Fisher, Christopher M.; Paton, Chad; Pearson, D. Graham; Sarkar, Chiranjeeb; Luo, Yan; Tersmette, Daniel B.; Chacko, Thomas

    2017-12-01

    A robust platform to view and integrate multiple data sets collected simultaneously is required to realize the utility and potential of the Laser Ablation Split-Stream (LASS) method. This capability, until now, has been unavailable, and practitioners have had to laboriously process each data set separately, making it challenging to take full advantage of the benefits of LASS. We describe a new program for handling multiple mass spectrometric data sets collected simultaneously, designed specifically for the LASS technique, by which a laser aerosol is split into two or more separate "streams" to be measured on separate mass spectrometers. New features within Iolite (https://iolite-software.com) enable the capability of loading, synchronizing, viewing, and reducing two or more data sets acquired simultaneously, as multiple DRSs (data reduction schemes) can be run concurrently. While this version of Iolite accommodates any combination of simultaneously collected mass spectrometer data, we demonstrate the utility using case studies where the U-Pb and Lu-Hf isotope compositions of zircon, and the U-Pb and Sm-Nd isotope compositions of monazite, were analyzed simultaneously in crystals showing complex isotopic zonation. These studies demonstrate the importance of being able to view and integrate simultaneously acquired data sets, especially for samples with complicated zoning and decoupled isotope systematics, in order to extract accurate and geologically meaningful isotopic and compositional data. This contribution provides instructions and examples for handling simultaneously collected laser ablation data. An instructional video is also provided. The updated Iolite software will help to fully develop the applications of both LASS and multi-instrument mass spectrometric measurement capabilities.

  17. Reducing and Analyzing the PHAT Survey with the Cloud

    NASA Astrophysics Data System (ADS)

    Williams, Benjamin F.; Olsen, Knut; Khan, Rubab; Pirone, Daniel; Rosema, Keith

    2018-05-01

    We discuss the technical challenges we faced and the techniques we used to overcome them when reducing the Panchromatic Hubble Andromeda Treasury (PHAT) photometric data set on the Amazon Elastic Compute Cloud (EC2). We first describe the architecture of our photometry pipeline, which we found particularly efficient for reducing the data in multiple ways for different purposes. We then describe the features of EC2 that make this architecture both efficient to use and challenging to implement. We describe the techniques we adopted to process our data, and suggest ways these techniques may be improved for those interested in trying such reductions in the future. Finally, we summarize the output photometry data products, which are now hosted publicly in two places in two formats. They are in simple fits tables in the high-level science products on MAST, and on a queryable database available through the NOAO Data Lab.

  18. The interaction of feature and space based orienting within the attention set

    PubMed Central

    Lim, Ahnate; Sinnett, Scott

    2014-01-01

    The processing of sensory information relies on interacting mechanisms of sustained attention and attentional capture, both of which operate in space and on object features. While evidence indicates that exogenous attentional capture, a mechanism previously understood to be automatic, can be eliminated while concurrently performing a demanding task, we reframe this phenomenon within the theoretical framework of the “attention set” (Most et al., 2005). Consequently, the specific prediction that cuing effects should reappear when feature dimensions of the cue overlap with those in the attention set (i.e., elements of the demanding task) was empirically tested and confirmed using a dual-task paradigm involving both sustained attention and attentional capture, adapted from Santangelo et al. (2007). Participants were required to either detect a centrally presented target presented in a stream of distractors (the primary task), or respond to a spatially cued target (the secondary task). Importantly, the spatial cue could either share features with the target in the centrally presented primary task, or not share any features. Overall, the findings supported the attention set hypothesis showing that a spatial cuing effect was only observed when the peripheral cue shared a feature with objects that were already in the attention set (i.e., the primary task). However, this finding was accompanied by differential attentional orienting dependent on the different types of objects within the attention set, with feature-based orienting occurring for target-related objects, and additional spatial-based orienting for distractor-related objects. PMID:24523682

  19. Challenges in the Management of a Patient with Myxoedema Coma in Ghana: A Case Report.

    PubMed

    Akpalu, Josephine; Atiase, Yacoba; Yorke, Ernest; Fiscian, Henrietta; Kootin-Sanwu, Cecilia; Akpalu, Albert

    2017-03-01

    Myxoedema coma is a rare life-threatening disease, and it is essential that it is managed appropriately to reduce the associated high mortality. However, in the setting where efficient healthcare delivery is hampered by inadequacies, the management of such cases may pose a significant challenge. We present the case of a middle-aged woman diagnosed with myxoedema coma and severe hyponatremia. The case report highlights some of the challenges that may be encountered during the management of myxoedema coma in similar settings and outlines the management strategies undertaken to overcome them in the absence of national guidelines. It also brings to the fore the need for clinicians to look out for clinical features suggestive of hypothyroidism particularly among high risk individuals for early diagnosis and treatment. None declared.

  20. Robust Point Set Matching for Partial Face Recognition.

    PubMed

    Weng, Renliang; Lu, Jiwen; Tan, Yap-Peng

    2016-03-01

    Over the past three decades, a number of face recognition methods have been proposed in computer vision, and most of them use holistic face images for person identification. In many real-world scenarios especially some unconstrained environments, human faces might be occluded by other objects, and it is difficult to obtain fully holistic face images for recognition. To address this, we propose a new partial face recognition approach to recognize persons of interest from their partial faces. Given a pair of gallery image and probe face patch, we first detect keypoints and extract their local textural features. Then, we propose a robust point set matching method to discriminatively match these two extracted local feature sets, where both the textural information and geometrical information of local features are explicitly used for matching simultaneously. Finally, the similarity of two faces is converted as the distance between these two aligned feature sets. Experimental results on four public face data sets show the effectiveness of the proposed approach.
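
    A bare-bones keypoint-matching sketch in the same spirit, using OpenCV ORB descriptors and brute-force matching; the paper's method additionally enforces geometric consistency between the matched point sets, which is omitted here, and the file names are hypothetical.

```python
# Keypoint detection, local descriptor extraction, and brute-force matching.
import cv2

gallery = cv2.imread("gallery_face.png", cv2.IMREAD_GRAYSCALE)   # hypothetical files
probe = cv2.imread("probe_patch.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=500)
kp_g, des_g = orb.detectAndCompute(gallery, None)
kp_p, des_p = orb.detectAndCompute(probe, None)

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des_p, des_g), key=lambda m: m.distance)

# A simple similarity proxy: mean descriptor distance of the best matches.
best = matches[:30]
score = sum(m.distance for m in best) / max(len(best), 1)
print(len(matches), score)
```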

  1. Error modeling for surrogates of dynamical systems using machine learning: Machine-learning-based error model for surrogates of dynamical systems

    DOE PAGES

    Trehan, Sumeet; Carlberg, Kevin T.; Durlofsky, Louis J.

    2017-07-14

    A machine learning–based framework for modeling the error introduced by surrogate models of parameterized dynamical systems is proposed. The framework entails the use of high-dimensional regression techniques (e.g., random forests and LASSO) to map a large set of inexpensively computed “error indicators” (i.e., features) produced by the surrogate model at a given time instance to a prediction of the surrogate-model error in a quantity of interest (QoI). This eliminates the need for the user to hand-select a small number of informative features. The methodology requires a training set of parameter instances at which the time-dependent surrogate-model error is computed by simulating both the high-fidelity and surrogate models. Using these training data, the method first determines regression-model locality (via classification or clustering) and subsequently constructs a “local” regression model to predict the time-instantaneous error within each identified region of feature space. We consider 2 uses for the resulting error model: (1) as a correction to the surrogate-model QoI prediction at each time instance and (2) as a way to statistically model arbitrary functions of the time-dependent surrogate-model error (e.g., time-integrated errors). We then apply the proposed framework to model errors in reduced-order models of nonlinear oil-water subsurface flow simulations, with time-varying well-control (bottom-hole pressure) parameters. The reduced-order models used in this work entail application of trajectory piecewise linearization in conjunction with proper orthogonal decomposition. Moreover, when the first use of the method is considered, numerical experiments demonstrate consistent improvement in accuracy in the time-instantaneous QoI prediction relative to the original surrogate model, across a large number of test cases. When the second use is considered, results show that the proposed method provides accurate statistical predictions of the time- and well-averaged errors.
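
    The "determine locality, then fit local regressors" structure can be sketched as follows, with k-means standing in for the locality step and random forests as the local regression; the error-indicator and QoI-error arrays are placeholders, not the subsurface-flow data.

```python
# Cluster the error-indicator (feature) space, then fit one regression model
# per cluster to predict the time-instantaneous surrogate-model error.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
indicators = rng.normal(size=(5000, 40))      # stand-in error indicators
qoi_error = rng.normal(size=5000)             # stand-in surrogate-model error

clusters = KMeans(n_clusters=5, n_init=10, random_state=0).fit(indicators)
local_models = {}
for c in range(clusters.n_clusters):
    mask = clusters.labels_ == c
    local_models[c] = RandomForestRegressor(n_estimators=200, random_state=0)
    local_models[c].fit(indicators[mask], qoi_error[mask])

def predict_error(x):
    c = clusters.predict(x.reshape(1, -1))[0]
    return local_models[c].predict(x.reshape(1, -1))[0]
```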

  2. Error modeling for surrogates of dynamical systems using machine learning: Machine-learning-based error model for surrogates of dynamical systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Trehan, Sumeet; Carlberg, Kevin T.; Durlofsky, Louis J.

    A machine learning–based framework for modeling the error introduced by surrogate models of parameterized dynamical systems is proposed. The framework entails the use of high-dimensional regression techniques (e.g., random forests and LASSO) to map a large set of inexpensively computed “error indicators” (i.e., features) produced by the surrogate model at a given time instance to a prediction of the surrogate-model error in a quantity of interest (QoI). This eliminates the need for the user to hand-select a small number of informative features. The methodology requires a training set of parameter instances at which the time-dependent surrogate-model error is computed by simulating both the high-fidelity and surrogate models. Using these training data, the method first determines regression-model locality (via classification or clustering) and subsequently constructs a “local” regression model to predict the time-instantaneous error within each identified region of feature space. We consider 2 uses for the resulting error model: (1) as a correction to the surrogate-model QoI prediction at each time instance and (2) as a way to statistically model arbitrary functions of the time-dependent surrogate-model error (e.g., time-integrated errors). We then apply the proposed framework to model errors in reduced-order models of nonlinear oil-water subsurface flow simulations, with time-varying well-control (bottom-hole pressure) parameters. The reduced-order models used in this work entail application of trajectory piecewise linearization in conjunction with proper orthogonal decomposition. Moreover, when the first use of the method is considered, numerical experiments demonstrate consistent improvement in accuracy in the time-instantaneous QoI prediction relative to the original surrogate model, across a large number of test cases. When the second use is considered, results show that the proposed method provides accurate statistical predictions of the time- and well-averaged errors.

  3. SU-F-R-18: Updates to the Computational Environment for Radiological Research for Image Analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Apte, Aditya P.; Deasy, Joseph O.

    2016-06-15

    Purpose: To present new tools in CERR for texture analysis and visualization. Method: (1) Quantitative image analysis: We added the ability to compute Haralick texture features based on a local neighbourhood. The texture features depend on many parameters used in their derivation, for example: (a) directionality, (b) quantization of the image, (c) patch size for the neighborhood, (d) handling of the edge voxels within the region of interest, and (e) averaging the co-occurrence matrix vs. averaging texture features over different directions. A graphical user interface was built to set these parameters and then visualize their impact on the resulting texture maps. The entire functionality was written in MATLAB. Array indexing was used to speed up the texture calculation. The computation speed is very competitive with the ITK library. Moreover, our implementation works with multiple CPUs, and the computation time can be further reduced by using multiple processor threads. In order to reduce the Haralick texture maps into scalar features, we propose the use of texture volume histograms. This lets users make use of the entire distribution of texture values within the region of interest rather than using just the mean and the standard deviation. (2) Qualitative/visualization tools: The derived texture maps are stored as a new (derived) scan within CERR’s planC data structure. A display that compares various scans was built to show the raw image and the derived texture maps side by side. These images are positionally linked and can be navigated together. CERR’s graphics handling was updated and sped up to be compatible with newer MATLAB versions. As a result, users can (a) use different window levels and colormaps for different viewports and (b) click-and-drag or use the mouse scroll-wheel to navigate slices. Results: The new features and updates are available via https://www.github.com/adityaapte/cerr . Conclusion: Features added to CERR increase its utility in radiomics and outcomes modeling.
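
    A conceptual Python counterpart of the patch-wise Haralick texture maps described above (CERR itself is MATLAB); scikit-image's GLCM utilities are assumed, and the patch size, gray-level count, and texture property are arbitrary choices.

```python
# Patch-wise gray-level co-occurrence texture map, averaged over directions.
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def texture_map(image, patch=9, levels=32, prop="contrast"):
    """Slide a patch over a quantized image and compute one GLCM property."""
    quant = (image / image.max() * (levels - 1)).astype(np.uint8)
    half = patch // 2
    out = np.zeros_like(image, dtype=float)
    for i in range(half, image.shape[0] - half):
        for j in range(half, image.shape[1] - half):
            window = quant[i - half:i + half + 1, j - half:j + half + 1]
            glcm = graycomatrix(window, distances=[1],
                                angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                                levels=levels, symmetric=True, normed=True)
            out[i, j] = graycoprops(glcm, prop).mean()  # average over directions
    return out
```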

  4. An Evaluation of optional timing/synchronization features to support selection of an optimum design for the DCS digital communication network

    NASA Technical Reports Server (NTRS)

    Bradley, D. B.; Cain, J. B., III; Williard, M. W.

    1978-01-01

    The task was to evaluate the ability of a set of timing/synchronization subsystem features to provide a set of desirable characteristics for the evolving Defense Communications System digital communications network. The set of features related to the approaches by which timing/synchronization information could be disseminated throughout the network and the manner in which this information could be utilized to provide a synchronized network. These features, which could be utilized in a large number of different combinations, included mutual control, directed control, double ended reference links, independence of clock error measurement and correction, phase reference combining, and self organizing.

  5. Detecting Parkinson's disease from sustained phonation and speech signals.

    PubMed

    Vaiciukynas, Evaldas; Verikas, Antanas; Gelzinis, Adas; Bacauskiene, Marija

    2017-01-01

    This study investigates signals from sustained phonation and text-dependent speech modalities for Parkinson's disease screening. Phonation corresponds to the vowel /a/ voicing task and speech to the pronunciation of a short sentence in the Lithuanian language. Signals were recorded through two channels simultaneously, namely, acoustic cardioid (AC) and smart phone (SP) microphones. Additional modalities were obtained by splitting the speech recording into voiced and unvoiced parts. Information in each modality is summarized by 18 well-known audio feature sets. Random forest (RF) is used as the machine learning algorithm, both for individual feature sets and for decision-level fusion. Detection performance is measured by the out-of-bag equal error rate (EER) and the cost of log-likelihood-ratio. The Essentia audio feature set was the best for the AC speech modality and the YAAFE audio feature set was the best for the SP unvoiced modality, achieving EER of 20.30% and 25.57%, respectively. Fusion of all feature sets and modalities resulted in EER of 19.27% for the AC and 23.00% for the SP channel. Non-linear projection of an RF-based proximity matrix into the 2D space enriched medical decision support by visualization.
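
    Decision-level fusion of per-feature-set classifiers reduces, in its simplest form, to averaging class posteriors; a minimal sketch under that assumption is shown below (the feature-set matrices and labels are assumed to be given and row-aligned).

```python
# One random forest per audio feature set, fused by averaging class
# probabilities at decision level.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def fuse(feature_sets, y, X_test_sets):
    probs = []
    for X_train, X_test in zip(feature_sets, X_test_sets):
        rf = RandomForestClassifier(n_estimators=300, oob_score=True,
                                    random_state=0).fit(X_train, y)
        probs.append(rf.predict_proba(X_test))
    return np.mean(probs, axis=0)   # averaged posterior over all feature sets
```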

  6. Kernel-based discriminant feature extraction using a representative dataset

    NASA Astrophysics Data System (ADS)

    Li, Honglin; Sancho Gomez, Jose-Luis; Ahalt, Stanley C.

    2002-07-01

    Discriminant Feature Extraction (DFE) is widely recognized as an important pre-processing step in classification applications. Most DFE algorithms are linear and thus can only explore the linear discriminant information among the different classes. Recently, there have been several promising attempts to develop nonlinear DFE algorithms, among which is Kernel-based Feature Extraction (KFE). The efficacy of KFE has been experimentally verified by both synthetic data and real problems. However, KFE has some known limitations. First, KFE does not work well for strongly overlapped data. Second, KFE employs all of the training set samples during the feature extraction phase, which can result in significant computation when applied to very large datasets. Finally, KFE can result in overfitting. In this paper, we propose a substantial improvement to KFE that overcomes the above limitations by using a representative dataset, which consists of critical points that are generated from data-editing techniques and centroid points that are determined by using the Frequency Sensitive Competitive Learning (FSCL) algorithm. Experiments show that this new KFE algorithm performs well on significantly overlapped datasets, and it also reduces computational complexity. Further, by controlling the number of centroids, the overfitting problem can be effectively alleviated.

  7. Bearing diagnostics: A method based on differential geometry

    NASA Astrophysics Data System (ADS)

    Tian, Ye; Wang, Zili; Lu, Chen; Wang, Zhipeng

    2016-12-01

    The structures around bearings are complex, and the working environment is variable. These conditions cause the collected vibration signals to exhibit nonlinear, non-stationary, and chaotic characteristics, which make noise reduction, feature extraction, fault diagnosis, and health assessment significantly challenging. Thus, a set of differential geometry-based methods with advantages in nonlinear analysis is presented in this study. For noise reduction, the Local Projection method is modified by both selecting the neighborhood radius based on empirical mode decomposition and determining the noise subspace constrained by neighborhood distribution information. For feature extraction, Hessian locally linear embedding is introduced to acquire manifold features from the manifold topological structures, and singular values of eigenmatrices as well as several specific frequency amplitudes in spectrograms are extracted subsequently to reduce the complexity of the manifold features. For fault diagnosis, an information geometry-based support vector machine is applied to classify the fault states. For health assessment, the manifold distance is employed to represent the health information; the Gaussian mixture model is utilized to calculate confidence values, which directly reflect the health status. Case studies on Lorenz signals and vibration datasets of bearings demonstrate the effectiveness of the proposed methods.
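
    Two of the ingredients named above can be sketched with standard tooling: Hessian locally linear embedding for manifold features and a Gaussian mixture whose log-likelihood serves as a confidence-like health score. The vibration feature matrix is a random stand-in, and this is not the authors' full pipeline.

```python
# Hessian LLE manifold features followed by a Gaussian-mixture health score.
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.mixture import GaussianMixture

X = np.random.randn(1000, 20)            # stand-in vibration features

n_components = 3
# Hessian LLE requires n_neighbors > n_components * (n_components + 3) / 2.
hlle = LocallyLinearEmbedding(n_neighbors=12, n_components=n_components,
                              method="hessian", eigen_solver="dense")
manifold_feats = hlle.fit_transform(X)

gmm = GaussianMixture(n_components=2, random_state=0).fit(manifold_feats)
health_score = gmm.score_samples(manifold_feats)   # log-likelihood as a confidence proxy
```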

  8. Evaluation of Apache Hadoop for parallel data analysis with ROOT

    NASA Astrophysics Data System (ADS)

    Lehrack, S.; Duckeck, G.; Ebke, J.

    2014-06-01

    The Apache Hadoop software is a Java-based framework for distributed processing of large data sets across clusters of computers, using the Hadoop file system (HDFS) for data storage and backup and MapReduce as a processing platform. Hadoop is primarily designed for processing large textual data sets which can be processed in arbitrary chunks, and must be adapted to the use case of processing binary data files which cannot be split automatically. However, Hadoop offers attractive features in terms of fault tolerance, task supervision and control, multi-user functionality, and job management. For this reason, we evaluated Apache Hadoop as an alternative approach to PROOF for ROOT data analysis. Two alternatives for distributing analysis data were discussed: either the data was stored in HDFS and processed with MapReduce, or the data was accessed via a standard Grid storage system (dCache Tier-2) and MapReduce was used only as the execution back-end. The focus of the measurements was, on the one hand, to store analysis data safely on HDFS at reasonable data rates and, on the other hand, to process data quickly and reliably with MapReduce. In the evaluation of HDFS, read/write data rates from the local Hadoop cluster were measured and compared to standard data rates from the local NFS installation. In the evaluation of MapReduce, realistic ROOT analyses were used and event rates were compared to PROOF.

  9. Model-Based Learning of Local Image Features for Unsupervised Texture Segmentation

    NASA Astrophysics Data System (ADS)

    Kiechle, Martin; Storath, Martin; Weinmann, Andreas; Kleinsteuber, Martin

    2018-04-01

    Features that capture well the textural patterns of a certain class of images are crucial for the performance of texture segmentation methods. The manual selection of features or designing new ones can be a tedious task. Therefore, it is desirable to automatically adapt the features to a certain image or class of images. Typically, this requires a large set of training images with similar textures and ground truth segmentation. In this work, we propose a framework to learn features for texture segmentation when no such training data is available. The cost function for our learning process is constructed to match a commonly used segmentation model, the piecewise constant Mumford-Shah model. This means that the features are learned such that they provide an approximately piecewise constant feature image with a small jump set. Based on this idea, we develop a two-stage algorithm which first learns suitable convolutional features and then performs a segmentation. We note that the features can be learned from a small set of images, from a single image, or even from image patches. The proposed method achieves a competitive rank in the Prague texture segmentation benchmark, and it is effective for segmenting histological images.

  10. The TimeStudio Project: An open source scientific workflow system for the behavioral and brain sciences.

    PubMed

    Nyström, Pär; Falck-Ytter, Terje; Gredebäck, Gustaf

    2016-06-01

    This article describes a new open source scientific workflow system, the TimeStudio Project, dedicated to the behavioral and brain sciences. The program is written in MATLAB and features a graphical user interface for the dynamic pipelining of computer algorithms developed as TimeStudio plugins. TimeStudio includes both a set of general plugins (for reading data files, modifying data structures, visualizing data structures, etc.) and a set of plugins specifically developed for the analysis of event-related eyetracking data as a proof of concept. It is possible to create custom plugins to integrate new or existing MATLAB code anywhere in a workflow, making TimeStudio a flexible workbench for organizing and performing a wide range of analyses. The system also features an integrated sharing and archiving tool for TimeStudio workflows, which can be used to share workflows both during the data analysis phase and after scientific publication. TimeStudio thus facilitates the reproduction and replication of scientific studies, increases the transparency of analyses, and reduces individual researchers' analysis workload. The project website ( http://timestudioproject.com ) contains the latest releases of TimeStudio, together with documentation and user forums.

  11. Statistical classification of road pavements using near field vehicle rolling noise measurements.

    PubMed

    Paulo, Joel Preto; Coelho, J L Bento; Figueiredo, Mário A T

    2010-10-01

    Low noise surfaces have been increasingly considered as a viable and cost-effective alternative to acoustical barriers. However, road planners and administrators frequently lack information on the correlation between the type of road surface and the resulting noise emission profile. To address this problem, a method to identify and classify different types of road pavements was developed, whereby near field road noise is analyzed using statistical learning methods. The vehicle rolling sound signal near the tires and close to the road surface was acquired by two microphones in a special arrangement which implements the Close-Proximity method. A set of features, characterizing the properties of the road pavement, was extracted from the corresponding sound profiles. A feature selection method was used to automatically select those that are most relevant in predicting the type of pavement, while reducing the computational cost. A set of different types of road pavement segments were tested and the performance of the classifier was evaluated. Results of pavement classification performed during a road journey are presented on a map, together with geographical data. This procedure leads to a considerable improvement in the quality of road pavement noise data, thereby increasing the accuracy of road traffic noise prediction models.

  12. Automated method for measuring the extent of selective logging damage with airborne LiDAR data

    NASA Astrophysics Data System (ADS)

    Melendy, L.; Hagen, S. C.; Sullivan, F. B.; Pearson, T. R. H.; Walker, S. M.; Ellis, P.; Kustiyo; Sambodo, Ari Katmoko; Roswintiarti, O.; Hanson, M. A.; Klassen, A. W.; Palace, M. W.; Braswell, B. H.; Delgado, G. M.

    2018-05-01

    Selective logging has an impact on the global carbon cycle, as well as on the forest micro-climate and on longer-term changes in erosion, soil and nutrient cycling, and fire susceptibility. Our ability to quantify these impacts depends on methods and tools that accurately identify the extent and features of logging activity. LiDAR-based measurements of these features offer significant promise. Here, we present a set of algorithms for automated detection and mapping of critical features associated with logging - roads/decks, skid trails, and gaps - using commercial airborne LiDAR data as input. The automated algorithm was applied to commercial LiDAR data collected over two logging concessions in Kalimantan, Indonesia in 2014. The algorithm results were compared to measurements of the logging features collected in the field soon after logging was complete. The automated algorithm-mapped road/deck and skid trail features match closely with features measured in the field, with agreement levels ranging from 69% to 99% when adjusting for GPS location error. The algorithm performed most poorly with gaps, which, by their nature, are variable due to the unpredictable impact of tree fall versus the linear and regular features directly created by mechanical means. Overall, the automated algorithm performs well and offers significant promise as a generalizable tool useful for efficiently and accurately capturing the effects of selective logging, including the potential to distinguish reduced-impact logging from conventional logging.

  13. Feature Selection for Motor Imagery EEG Classification Based on Firefly Algorithm and Learning Automata

    PubMed Central

    Liu, Aiming; Liu, Quan; Ai, Qingsong; Xie, Yi; Chen, Anqi

    2017-01-01

    Motor Imagery (MI) electroencephalography (EEG) is widely studied for its non-invasiveness, easy availability, portability, and high temporal resolution. As for MI EEG signal processing, the high dimensions of features represent a research challenge. It is necessary to eliminate redundant features, which not only create an additional overhead of managing the space complexity, but also might include outliers, thereby reducing classification accuracy. The firefly algorithm (FA) can adaptively select the best subset of features, and improve classification accuracy. However, the FA is easily entrapped in a local optimum. To solve this problem, this paper proposes a method of combining the firefly algorithm and learning automata (LA) to optimize feature selection for motor imagery EEG. We employed a method of combining common spatial pattern (CSP) and local characteristic-scale decomposition (LCD) algorithms to obtain a high dimensional feature set, and classified it by using the spectral regression discriminant analysis (SRDA) classifier. Both the fourth brain–computer interface competition data and real-time data acquired in our designed experiments were used to verify the validation of the proposed method. Compared with genetic and adaptive weight particle swarm optimization algorithms, the experimental results show that our proposed method effectively eliminates redundant features, and improves the classification accuracy of MI EEG signals. In addition, a real-time brain–computer interface system was implemented to verify the feasibility of our proposed methods being applied in practical brain–computer interface systems. PMID:29117100

  14. Feature Selection for Motor Imagery EEG Classification Based on Firefly Algorithm and Learning Automata.

    PubMed

    Liu, Aiming; Chen, Kun; Liu, Quan; Ai, Qingsong; Xie, Yi; Chen, Anqi

    2017-11-08

    Motor Imagery (MI) electroencephalography (EEG) is widely studied for its non-invasiveness, easy availability, portability, and high temporal resolution. As for MI EEG signal processing, the high dimensions of features represent a research challenge. It is necessary to eliminate redundant features, which not only create an additional overhead of managing the space complexity, but also might include outliers, thereby reducing classification accuracy. The firefly algorithm (FA) can adaptively select the best subset of features, and improve classification accuracy. However, the FA is easily entrapped in a local optimum. To solve this problem, this paper proposes a method of combining the firefly algorithm and learning automata (LA) to optimize feature selection for motor imagery EEG. We employed a method of combining common spatial pattern (CSP) and local characteristic-scale decomposition (LCD) algorithms to obtain a high dimensional feature set, and classified it by using the spectral regression discriminant analysis (SRDA) classifier. Both the fourth brain-computer interface competition data and real-time data acquired in our designed experiments were used to verify the validation of the proposed method. Compared with genetic and adaptive weight particle swarm optimization algorithms, the experimental results show that our proposed method effectively eliminates redundant features, and improves the classification accuracy of MI EEG signals. In addition, a real-time brain-computer interface system was implemented to verify the feasibility of our proposed methods being applied in practical brain-computer interface systems.

  15. A Novel Hybrid Dimension Reduction Technique for Undersized High Dimensional Gene Expression Data Sets Using Information Complexity Criterion for Cancer Classification

    PubMed Central

    Pamukçu, Esra; Bozdogan, Hamparsum; Çalık, Sinan

    2015-01-01

    Gene expression data typically are large, complex, and highly noisy. Their dimension is high with several thousand genes (i.e., features) but with only a limited number of observations (i.e., samples). Although the classical principal component analysis (PCA) method is widely used as a first standard step in dimension reduction and in supervised and unsupervised classification, it suffers from several shortcomings in the case of data sets involving undersized samples, since the sample covariance matrix degenerates and becomes singular. In this paper we address these limitations within the context of probabilistic PCA (PPCA) by introducing and developing a new and novel approach using maximum entropy covariance matrix and its hybridized smoothed covariance estimators. To reduce the dimensionality of the data and to choose the number of probabilistic PCs (PPCs) to be retained, we further introduce and develop celebrated Akaike's information criterion (AIC), consistent Akaike's information criterion (CAIC), and the information theoretic measure of complexity (ICOMP) criterion of Bozdogan. Six publicly available undersized benchmark data sets were analyzed to show the utility, flexibility, and versatility of our approach with hybridized smoothed covariance matrix estimators, which do not degenerate to perform the PPCA to reduce the dimension and to carry out supervised classification of cancer groups in high dimensions. PMID:25838836
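
    The generic step this work addresses, choosing the number of probabilistic PCs for an undersized data set, can be sketched with scikit-learn's PCA, whose score() method returns the probabilistic-PCA average log-likelihood; cross-validated likelihood stands in here for the maximum-entropy covariance and ICOMP machinery, which this sketch does not reproduce.

```python
# Pick the number of probabilistic PCs by cross-validated PPCA log-likelihood.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score

X = np.random.randn(60, 2000)            # ~60 samples, thousands of "genes" (stand-in)

candidates = list(range(2, 21))
ll = [cross_val_score(PCA(n_components=k, svd_solver="randomized",
                          random_state=0), X, cv=5).mean()
      for k in candidates]
best_k = candidates[int(np.argmax(ll))]
print(best_k)
```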

  16. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Levakhina, Y. M.; Mueller, J.; Buzug, T. M.

    Purpose: This paper introduces a nonlinear weighting scheme into the backprojection operation within the simultaneous algebraic reconstruction technique (SART). It is designed for tomosynthesis imaging of objects with high-attenuation features in order to reduce limited angle artifacts. Methods: The algorithm estimates which projections potentially produce artifacts in a voxel. The contribution of those projections into the updating term is reduced. In order to identify those projections automatically, a four-dimensional backprojected space representation is used. Weighting coefficients are calculated based on a dissimilarity measure, evaluated in this space. For each combination of an angular view direction and a voxel position an individual weighting coefficient for the updating term is calculated. Results: The feasibility of the proposed approach is shown based on reconstructions of the following real three-dimensional tomosynthesis datasets: a mammography quality phantom, an apple with metal needles, a dried finger bone in water, and a human hand. Datasets have been acquired with a Siemens Mammomat Inspiration tomosynthesis device and reconstructed using SART with and without suggested weighting. Out-of-focus artifacts are described using line profiles and measured using standard deviation (STD) in the plane and below the plane which contains artifact-causing features. Artifacts distribution in axial direction is measured using an artifact spread function (ASF). The volumes reconstructed with the weighting scheme demonstrate the reduction of out-of-focus artifacts, lower STD (meaning reduction of artifacts), and narrower ASF compared to nonweighted SART reconstruction. It is achieved successfully for different kinds of structures: point-like structures such as phantom features, long structures such as metal needles, and fine structures such as trabecular bone structures. Conclusions: Results indicate the feasibility of the proposed algorithm to reduce typical tomosynthesis artifacts produced by high-attenuation features. The proposed algorithm assigns weighting coefficients automatically and no segmentation or tissue-classification steps are required. The algorithm can be included into various iterative reconstruction algorithms with an additive updating strategy. It can also be extended to computed tomography case with the complete set of angular data.
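
    One generic way to write a weighted SART update of the kind sketched above (the notation below is ours, not the paper's, and the weights in the paper are derived from a dissimilarity measure in backprojected space):

$$
x_j^{(k+1)} = x_j^{(k)} + \lambda \,
\frac{\displaystyle\sum_{\theta} w_{j\theta} \sum_{i \in P_\theta} a_{ij}\,
\frac{p_i - \sum_{l} a_{il}\, x_l^{(k)}}{\sum_{l} a_{il}}}
{\displaystyle\sum_{\theta} w_{j\theta} \sum_{i \in P_\theta} a_{ij}}
$$

    Here $x_j$ is the value of voxel $j$, $p_i$ the measured projection value for ray $i$, $a_{ij}$ the system matrix, $P_\theta$ the set of rays in angular view $\theta$, $\lambda$ a relaxation factor, and $w_{j\theta} \in [0,1]$ the per-voxel, per-view weight that down-weights views judged likely to introduce out-of-focus artifacts; setting all $w_{j\theta} = 1$ recovers standard SART.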

  17. A practical approach to Sasang constitutional diagnosis using vocal features

    PubMed Central

    2013-01-01

    Background Sasang constitutional medicine (SCM) is a type of tailored medicine that divides human beings into four Sasang constitutional (SC) types. Diagnosis of SC types is crucial to proper treatment in SCM. Voice characteristics have been used as an essential clue for diagnosing SC types. In the past, many studies tried to extract quantitative vocal features to make diagnosis models; however, these studies were flawed by limited data collected from one or a few sites, long recording time, and low accuracy. We propose a practical diagnosis model having only a few variables, which decreases model complexity. This in turn, makes our model appropriate for clinical applications. Methods A total of 2,341 participants’ voice recordings were used in making a SC classification model and to test the generalization ability of the model. Although the voice data consisted of five vowels and two repeated sentences per participant, we used only the sentence part for our study. A total of 21 features were extracted, and an advanced feature selection method—the least absolute shrinkage and selection operator (LASSO)—was applied to reduce the number of variables for classifier learning. A SC classification model was developed using multinomial logistic regression via LASSO. Results We compared the proposed classification model to the previous study, which used both sentences and five vowels from the same patient’s group. The classification accuracies for the test set were 47.9% and 40.4% for male and female, respectively. Our result showed that the proposed method was superior to the previous study in that it required shorter voice recordings, is more applicable to practical use, and had better generalization performance. Conclusions We proposed a practical SC classification method and showed that our model having fewer variables outperformed the model having many variables in the generalization test. We attempted to reduce the number of variables in two ways: 1) the initial number of candidate features was decreased by considering shorter voice recording, and 2) LASSO was introduced for reducing model complexity. The proposed method is suitable for an actual clinical environment. Moreover, we expect it to yield more stable results because of the model’s simplicity. PMID:24200041
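
    The final modeling step can be sketched as an L1-penalized (LASSO) multinomial logistic regression; the 21 vocal features and four SC-type labels below are random stand-ins for the study data.

```python
# LASSO-penalized multinomial logistic regression with standardized features.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X = np.random.randn(2341, 21)            # stand-in vocal features
y = np.random.randint(0, 4, 2341)        # stand-in SC-type labels

model = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l1", solver="saga", C=0.1, max_iter=5000),
)
model.fit(X, y)
n_used = np.count_nonzero(model[-1].coef_)   # L1 drives many coefficients to zero
print(n_used)
```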

  18. Deficits in Cross-Race Face Learning: Insights From Eye Movements and Pupillometry

    PubMed Central

    Goldinger, Stephen D.; He, Yi; Papesh, Megan H.

    2010-01-01

    The own-race bias (ORB) is a well-known finding wherein people are better able to recognize and discriminate own-race faces, relative to cross-race faces. In 2 experiments, participants viewed Asian and Caucasian faces, in preparation for recognition memory tests, while their eye movements and pupil diameters were continuously monitored. In Experiment 1 (with Caucasian participants), systematic differences emerged in both measures as a function of depicted race: While encoding cross-race faces, participants made fewer (and longer) fixations, they preferentially attended to different sets of features, and their pupils were more dilated, all relative to own-race faces. Also, in both measures, a pattern emerged wherein some participants reduced their apparent encoding effort to cross-race faces over trials. In Experiment 2 (with Asian participants), the authors observed the same patterns, although the ORB favored the opposite set of faces. Taken together, the results suggest that the ORB appears during initial perceptual encoding. Relative to own-race face encoding, cross-race encoding requires greater effort, which may reduce vigilance in some participants. PMID:19686008

  19. Thiol-Ene functionalized siloxanes for use as elastomeric dental impression materials

    PubMed Central

    Cole, Megan A.; Jankousky, Katherine C.; Bowman, Christopher N.

    2014-01-01

    Objectives Thiol- and allyl-functionalized siloxane oligomers are synthesized and evaluated for use as a radical-mediated, rapid set elastomeric dental impression material. Thiol-ene siloxane formulations are crosslinked using a redox-initiated polymerization scheme, and the mechanical properties of the thiol-ene network are manipulated through the incorporation of varying degrees of plasticizer and kaolin filler. Formulations with medium and light body consistencies are further evaluated for their ability to accurately replicate features on both the gross and microscopic levels. We hypothesize that thiol-ene functionalized siloxane systems will exhibit faster setting times and greater detail reproduction than commercially available polyvinylsiloxane (PVS) materials of comparable consistencies. Methods Thiol-ene functionalized siloxane mixtures formulated with varying levels of redox initiators, plasticizer, and kaolin filler are made and evaluated for their polymerization speed (FTIR), consistency (ISO4823.9.2), and surface energy (goniometer). Feature replication is evaluated quantitatively by SEM. The Tg, storage modulus, and creep behavior are determined by DMA. Results Increasing redox initiation rate increases the polymerization rate but at high levels also limits working time. Combining 0.86 wt% oxidizing agent with up to 5 wt% plasticizer gave a working time of 3 min and a setting time of 2 min. The selected medium and light body thiol-ene formulations also achieved greater qualitative detail reproduction than the commercial material and reproduced micrometer patterns with 98% accuracy. Significance Improving detail reproduction and setting speed is a primary focus of dental impression material design and synthesis. Radical-mediated polymerizations, particularly thiol-ene reactions, are recognized for their speed, reduced shrinkage, and ‘click’ nature. PMID:24553250

  20. Content validity of the DSM-IV borderline and narcissistic personality disorder criteria sets.

    PubMed

    Blais, M A; Hilsenroth, M J; Castlebury, F D

    1997-01-01

    This study sought to empirically evaluate the content validity of the newly revised DSM-IV narcissistic personality disorder (NPD) and borderline personality disorder (BPD) criteria sets. Using the essential features of each disorder as construct definitions, factor analysis was used to determine how adequately the criteria sets covered the constructs. In addition, this empirical investigation sought to: 1) help define the dimensions underlying these polythetic disorders; 2) identify core features of each diagnosis; and 3) highlight the characteristics that may be most useful in diagnosing these two disorders. Ninety-one outpatients meeting DSM-IV criteria for a personality disorder (PD) were identified through a retrospective analysis of chart information. Records of these 91 patients were independently rated on all of the BPD and NPD symptom criteria for the DSM-IV. Acceptable interrater reliability (kappa estimates) was obtained for both presence or absence of a PD and symptom criteria for BPD and NPD. The factor analysis, performed separately for each disorder, identified a three-factor solution for both the DSM-IV BPD and NPD criteria sets. The results of this study provide strong support for the content validity of the NPD criteria set and moderate support for the content validity of the BPD criteria set. Three domains were found to comprise the BPD criteria set, with the essential features of interpersonal and identity instability forming one domain, and impulsivity and affective instability each identified as separate domains. Factor analysis of the NPD criteria set found three factors basically corresponding to the essential features of grandiosity, lack of empathy, and need for admiration. Therefore, the NPD criteria set adequately covers the essential or defining features of the disorder.

  1. Predicting a small molecule-kinase interaction map: A machine learning approach

    PubMed Central

    2011-01-01

    Background We present a machine learning approach to the problem of protein ligand interaction prediction. We focus on a set of binding data obtained from 113 different protein kinases and 20 inhibitors. It was attained through ATP site-dependent binding competition assays and constitutes the first available dataset of this kind. We extract information about the investigated molecules from various data sources to obtain an informative set of features. Results A Support Vector Machine (SVM) as well as a decision tree algorithm (C5/See5) is used to learn models based on the available features which in turn can be used for the classification of new kinase-inhibitor pair test instances. We evaluate our approach using different feature sets and parameter settings for the employed classifiers. Moreover, the paper introduces a new way of evaluating predictions in such a setting, where different amounts of information about the binding partners can be assumed to be available for training. Results on an external test set are also provided. Conclusions In most of the cases, the presented approach clearly outperforms the baseline methods used for comparison. Experimental results indicate that the applied machine learning methods are able to detect a signal in the data and predict binding affinity to some extent. For SVMs, the binding prediction can be improved significantly by using features that describe the active site of a kinase. For C5, besides diversity in the feature set, alignment scores of conserved regions turned out to be very useful. PMID:21708012

  2. Radio-nuclide mixture identification using medium energy resolution detectors

    DOEpatents

    Nelson, Karl Einar

    2013-09-17

    According to one embodiment, a method for identifying radio-nuclides includes receiving spectral data, extracting a feature set from the spectral data comparable to a plurality of templates in a template library, and using a branch and bound method to determine a probable template match based on the feature set and templates in the template library. In another embodiment, a device for identifying unknown radio-nuclides includes a processor, a multi-channel analyzer, and a memory operatively coupled to the processor, the memory having computer readable code stored thereon. The computer readable code is configured, when executed by the processor, to receive spectral data, to extract a feature set from the spectral data comparable to a plurality of templates in a template library, and to use a branch and bound method to determine a probable template match based on the feature set and templates in the template library.
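
    A toy illustration of how branch and bound can prune a search over template combinations, using a non-negative least-squares fit whose residual can only decrease as more templates are allowed (a valid lower bound); this is a generic sketch, not the patented feature extraction or scoring.

```python
# Branch and bound over k-template combinations fit to a measured spectrum.
import numpy as np
from scipy.optimize import nnls

def nnls_residual(templates, spectrum, cols):
    _, resid = nnls(templates[:, cols], spectrum)
    return resid

def best_k_templates(templates, spectrum, k):
    n = templates.shape[1]
    best = {"cols": None, "resid": np.inf}

    def recurse(chosen, start):
        if len(chosen) == k:
            r = nnls_residual(templates, spectrum, chosen)
            if r < best["resid"]:
                best.update(cols=list(chosen), resid=r)
            return
        # Lower bound for every completion of `chosen` from the remaining pool:
        # allowing all remaining templates can only reduce the NNLS residual.
        bound = nnls_residual(templates, spectrum, chosen + list(range(start, n)))
        if bound >= best["resid"]:
            return                      # prune: no completion can beat the incumbent
        for j in range(start, n - (k - len(chosen)) + 1):
            recurse(chosen + [j], j + 1)

    recurse([], 0)
    return best
```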

  3. Human red blood cell recognition enhancement with three-dimensional morphological features obtained by digital holographic imaging

    NASA Astrophysics Data System (ADS)

    Jaferzadeh, Keyvan; Moon, Inkyu

    2016-12-01

    The classification of erythrocytes plays an important role in the field of hematological diagnosis, specifically blood disorders. Since the biconcave shape of the red blood cell (RBC) is altered during the different stages of hematological disorders, we believe that three-dimensional (3-D) morphological features of the erythrocyte provide better classification results than conventional two-dimensional (2-D) features. Therefore, we introduce a set of 3-D features related to the morphological and chemical properties of the RBC profile and evaluate the discrimination power of these features against 2-D features with a neural network classifier. The 3-D features include erythrocyte surface area, volume, average cell thickness, sphericity index, sphericity coefficient and functionality factor, MCH and MCHSD, and two newly introduced features extracted from the ring section of the RBC at the single-cell level. In contrast, the 2-D features are RBC projected surface area, perimeter, radius, elongation, and projected surface area to perimeter ratio. All features are obtained from images visualized by off-axis digital holographic microscopy with a numerical reconstruction algorithm, and four categories of RBCs are considered: biconcave (doughnut shape), flat-disc, stomatocyte, and echinospherocyte. Our experimental results demonstrate that the 3-D features can be more useful in RBC classification than the 2-D features. Finally, we choose the best feature set from the 2-D and 3-D features by a sequential forward feature selection technique, which yields better discrimination results. We believe that the final feature set, evaluated with a neural network classification strategy, can improve RBC classification accuracy.
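
    The "best feature set" step can be sketched with sequential forward selection wrapped around a small neural-network classifier; the feature matrix and the four RBC class labels below are placeholder data.

```python
# Sequential forward feature selection in front of a small MLP classifier.
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X = np.random.randn(800, 15)             # stand-in: 15 candidate 2-D/3-D features
y = np.random.randint(0, 4, 800)         # biconcave / flat-disc / stomatocyte / echinospherocyte

clf = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                                  random_state=0))
sfs = SequentialFeatureSelector(clf, n_features_to_select=6,
                                direction="forward", cv=3).fit(X, y)
print(np.flatnonzero(sfs.get_support()))  # indices of the retained features
```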

  4. System and method for the detection of anomalies in an image

    DOEpatents

    Prasad, Lakshman; Swaminarayan, Sriram

    2013-09-03

    Preferred aspects of the present invention can include receiving a digital image at a processor; segmenting the digital image into a hierarchy of feature layers comprising one or more fine-scale features defining a foreground object embedded in one or more coarser-scale features defining a background to the one or more fine-scale features in the segmentation hierarchy; detecting a first fine-scale foreground feature as an anomaly with respect to a first background feature within which it is embedded; and constructing an anomalous feature layer by synthesizing spatially contiguous anomalous fine-scale features. Additional preferred aspects of the present invention can include detecting non-pervasive changes between sets of images in response at least in part to one or more difference images between the sets of images.

  5. Potential benefits of farm scale measures versus landscape measures for reducing nitrate loads in a Danish catchment.

    PubMed

    Hashemi, Fatemeh; Olesen, Jørgen E; Børgesen, Christen D; Tornbjerg, Henrik; Thodsen, Hans; Dalgaard, Tommy

    2018-05-08

    To comply with the EU Water Framework Directive, Denmark must further reduce the nitrate (N)-load to marine ecosystems from agricultural areas. Under the anticipated future spatially targeted regulation, the required N-load reductions will differ between catchments, and these are expected to be mitigated by a combination of land and water management measures. Here, we explored how the expected N-load reduction target of 38% for a Danish catchment (River Odense) could be achieved through a combination of farm and landscape measures. These include: (a) N-leaching reduction through changing the crop rotation and applying cover crops, (b) enhancing N-reduction through (re)establishment of wetlands, and (c) reducing N-leaching through spatial targeting of set-aside to high N-load areas. Changes in crop rotations were effective in reducing N-leaching by growing crops with a longer growing season and by allowing a higher use of cover crops. A combination of wetlands and changes in crop rotations was needed to reach the N-load reduction target without use of set-aside. However, not all combinations of wetlands and crop rotation changes achieved the required N-load reduction, resulting in a need for targeted set-aside and implying a need to balance measures at farm and landscape scale to maximize N-load reduction while minimizing loss of productive land. The effectiveness of farm scale measures is affected by farm and soil types as well as by N-reduction in groundwater, while the possibilities for using wetlands to decrease the N-load depend on landscape features allowing the establishment of wetlands connected to streams and rivers. Copyright © 2018 Elsevier B.V. All rights reserved.

  6. Reliable pre-eclampsia pathways based on multiple independent microarray data sets.

    PubMed

    Kawasaki, Kaoru; Kondoh, Eiji; Chigusa, Yoshitsugu; Ujita, Mari; Murakami, Ryusuke; Mogami, Haruta; Brown, J B; Okuno, Yasushi; Konishi, Ikuo

    2015-02-01

    Pre-eclampsia is a multifactorial disorder characterized by heterogeneous clinical manifestations. Gene expression profiling of preeclamptic placenta has provided different and even opposite results, partly due to data compromised by various experimental artefacts. Here we aimed to identify reliable pre-eclampsia-specific pathways using multiple independent microarray data sets. Gene expression data of control and preeclamptic placentas were obtained from Gene Expression Omnibus. Single-sample gene-set enrichment analysis was performed to generate gene-set activation scores of 9707 pathways obtained from the Molecular Signatures Database. Candidate pathways were identified by t-test-based screening using the data sets GSE10588, GSE14722 and GSE25906. Additionally, recursive feature elimination was applied to arrive at a further reduced set of pathways. To assess the validity of the pre-eclampsia pathways, a statistically validated protocol was executed using five data sets, including two additional independent validation data sets, GSE30186 and GSE44711. Quantitative real-time PCR was performed for genes in a panel of potential pre-eclampsia pathways using placentas of 20 women with normal or severe preeclamptic singleton pregnancies (n = 10, respectively). A panel of ten pathways was found to discriminate women with pre-eclampsia from controls with high accuracy. Among these were pathways not previously associated with pre-eclampsia, such as the GABA receptor pathway, as well as pathways that have already been linked to pre-eclampsia, such as the glutathione and CDKN1C pathways. mRNA expression of GABRA3 (GABA receptor pathway), GCLC and GCLM (glutathione metabolic pathway), and CDKN1C was significantly reduced in the preeclamptic placentas. In conclusion, ten accurate and reliable pre-eclampsia pathways were identified based on multiple independent microarray data sets. A pathway-based classification may be a worthwhile approach to elucidate the pathogenesis of pre-eclampsia. © The Author 2014. Published by Oxford University Press on behalf of the European Society of Human Reproduction and Embryology. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  7. General Approach for Rock Classification Based on Digital Image Analysis of Electrical Borehole Wall Images

    NASA Astrophysics Data System (ADS)

    Linek, M.; Jungmann, M.; Berlage, T.; Clauser, C.

    2005-12-01

    Within the Ocean Drilling Program (ODP), image logging tools such as the Formation MicroScanner (FMS) or the Resistivity-At-Bit (RAB) tool have been routinely deployed. Both logging methods are based on resistivity measurements at the borehole wall and therefore are sensitive to conductivity contrasts, which are mapped in color scale images. These images are commonly used to study the structure of the sedimentary rocks and the oceanic crust (petrologic fabric, fractures, veins, etc.). So far, mapping of lithology from electrical images has been based purely on visual inspection and subjective interpretation. We apply digital image analysis to electrical borehole wall images in order to develop a method that supports objective rock identification. We focus on supervised textural pattern recognition, which studies the spatial gray level distribution with respect to certain rock types. FMS image intervals of rock classes known from core data are taken in order to train textural characteristics for each class. A so-called gray level co-occurrence matrix is computed by counting the occurrence of pairs of gray levels that are a certain distance apart. Once the matrix for an image interval is computed, we calculate the image contrast, homogeneity, energy, and entropy. We assign characteristic textural features to different rock types by reducing the image information into a small set of descriptive features. Once a discriminating set of texture features for each rock type is found, we are able to classify entire FMS images according to the trained rock types. A rock classification based on texture features enables quantitative lithology mapping and is characterized by high repeatability, in contrast to a purely visual, subjective image interpretation. We show examples of the rock classification between breccias, pillows, massive units, and horizontally bedded tuffs based on ODP image data.
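
The gray level co-occurrence matrix features named above (contrast, homogeneity, energy, entropy) can be computed with scikit-image; the following is a minimal sketch under that assumption, with `patch` standing in for an FMS image interval. It is not the authors' implementation, and entropy is derived manually since graycoprops does not provide it.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

# placeholder for an FMS image interval, quantized to 8-bit gray levels
patch = (np.random.default_rng(0).random((64, 64)) * 255).astype(np.uint8)

glcm = graycomatrix(patch, distances=[1, 2], angles=[0, np.pi / 2],
                    levels=256, symmetric=True, normed=True)
contrast = graycoprops(glcm, "contrast").mean()
homogeneity = graycoprops(glcm, "homogeneity").mean()
energy = graycoprops(glcm, "energy").mean()
# entropy per (distance, angle) matrix, averaged
entropy = -np.mean(np.sum(glcm * np.log2(glcm + 1e-12), axis=(0, 1)))

features = np.array([contrast, homogeneity, energy, entropy])
print(features)
```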

  8. CP-CHARM: segmentation-free image classification made accessible.

    PubMed

    Uhlmann, Virginie; Singh, Shantanu; Carpenter, Anne E

    2016-01-27

    Automated classification using machine learning often relies on features derived from segmenting individual objects, which can be difficult to automate. WND-CHARM is a previously developed classification algorithm in which features are computed on the whole image, thereby avoiding the need for segmentation. The algorithm obtained encouraging results but requires considerable computational expertise to execute. Furthermore, some benchmark sets have been shown to be subject to confounding artifacts that overestimate classification accuracy. We developed CP-CHARM, a user-friendly image-based classification algorithm inspired by WND-CHARM in (i) its ability to capture a wide variety of morphological aspects of the image, and (ii) the absence of requirement for segmentation. In order to make such an image-based classification method easily accessible to the biological research community, CP-CHARM relies on the widely-used open-source image analysis software CellProfiler for feature extraction. To validate our method, we reproduced WND-CHARM's results and ensured that CP-CHARM obtained comparable performance. We then successfully applied our approach on cell-based assay data and on tissue images. We designed these new training and test sets to reduce the effect of batch-related artifacts. The proposed method preserves the strengths of WND-CHARM - it extracts a wide variety of morphological features directly on whole images thereby avoiding the need for cell segmentation, but additionally, it makes the methods easily accessible for researchers without computational expertise by implementing them as a CellProfiler pipeline. It has been demonstrated to perform well on a wide range of bioimage classification problems, including on new datasets that have been carefully selected and annotated to minimize batch effects. This provides for the first time a realistic and reliable assessment of the whole image classification strategy.

  9. GATOR: Requirements capturing of telephony features

    NASA Technical Reports Server (NTRS)

    Dankel, Douglas D., II; Walker, Wayne; Schmalz, Mark

    1992-01-01

    We are developing a natural language-based, requirements gathering system called GATOR (for the GATherer Of Requirements). GATOR assists in the development of more accurate and complete specifications of new telephony features. GATOR interacts with a feature designer who describes a new feature, set of features, or capability to be implemented. The system aids this individual in the specification process by asking for clarifications when potential ambiguities are present, by identifying potential conflicts with other existing features, and by presenting its understanding of the feature to the designer. Through user interaction with a model of the existing telephony feature set, GATOR constructs a formal representation of the new, 'to be implemented' feature. Ultimately GATOR will produce a requirements document and will maintain an internal representation of this feature to aid in future design and specification. This paper consists of three sections that describe (1) the structure of GATOR, (2) POND, GATOR's internal knowledge representation language, and (3) current research issues.

  10. Assessment and improvement of statistical tools for comparative proteomics analysis of sparse data sets with few experimental replicates.

    PubMed

    Schwämmle, Veit; León, Ileana Rodríguez; Jensen, Ole Nørregaard

    2013-09-06

    Large-scale quantitative analyses of biological systems are often performed with few replicate experiments, leading to multiple nonidentical data sets due to missing values. For example, mass spectrometry driven proteomics experiments are frequently performed with few biological or technical replicates due to sample scarcity, duty-cycle or sensitivity constraints, or limited capacity of the available instrumentation, leading to incomplete results where detection of significant feature changes becomes a challenge. This problem is further exacerbated for the detection of significant changes on the peptide level, for example, in phospho-proteomics experiments. In order to assess the extent of this problem and the implications for large-scale proteome analysis, we investigated and optimized the performance of three statistical approaches by using simulated and experimental data sets with varying numbers of missing values. We applied three tools, the standard t test, the moderated t test (also known as limma), and rank products, for the detection of significantly changing features in simulated and experimental proteomics data sets with missing values. The rank product method was improved to work with data sets containing missing values. Extensive analysis of simulated and experimental data sets revealed that the performance of the statistical analysis tools depended on simple properties of the data sets. High-confidence results were obtained by using the limma and rank products methods for analyses of triplicate data sets that exhibited more than 1000 features and more than 50% missing values. The maximum number of differentially represented features was identified by using the limma and rank products methods in a complementary manner. We therefore recommend combined usage of these methods as a novel and optimal way to detect significantly changing features in these data sets. This approach is suitable for large quantitative data sets from stable isotope labeling and mass spectrometry experiments and should be applicable to large data sets of any type. An R script that implements the improved rank products algorithm and the combined analysis is available.
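
As a hedged illustration of the rank product statistic adapted to missing values, here is a minimal NumPy sketch (an assumption, not the published R script); `log_ratios` is a hypothetical features-by-replicates matrix with NaN marking missing observations.

```python
import numpy as np

def rank_product(log_ratios):
    """log_ratios: (n_features, n_replicates) array of log fold changes; NaN = missing."""
    lr = np.asarray(log_ratios, dtype=float)
    ranks = np.full(lr.shape, np.nan)
    for j in range(lr.shape[1]):
        col = lr[:, j]
        observed = ~np.isnan(col)
        order = np.argsort(-col[observed])         # rank 1 = strongest up-regulation
        r = np.empty(order.size)
        r[order] = np.arange(1, order.size + 1)
        ranks[observed, j] = r / observed.sum()    # normalize by the number of observed features
    # geometric mean of the available (normalized) ranks per feature
    return np.exp(np.nanmean(np.log(ranks), axis=1))

rp = rank_product(np.random.default_rng(1).normal(size=(1000, 3)))
print(rp[:5])
```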

  11. Feature engineering for MEDLINE citation categorization with MeSH.

    PubMed

    Jimeno Yepes, Antonio Jose; Plaza, Laura; Carrillo-de-Albornoz, Jorge; Mork, James G; Aronson, Alan R

    2015-04-08

    Research in biomedical text categorization has mostly used the bag-of-words representation. Other more sophisticated representations of text based on syntactic, semantic and argumentative properties have been less studied. In this paper, we evaluate the impact of different text representations of biomedical texts as features for reproducing the MeSH annotations of some of the most frequent MeSH headings. In addition to unigrams and bigrams, these features include noun phrases, citation meta-data, citation structure, and semantic annotation of the citations. Traditional features like unigrams and bigrams exhibit strong performance compared to other feature sets. Little or no improvement is obtained when using meta-data or citation structure. Noun phrases are too sparse and thus have lower performance compared to more traditional features. Conceptual annotation of the texts by MetaMap shows similar performance compared to unigrams, but adding concepts from the UMLS taxonomy does not improve the performance of using only mapped concepts. The combination of all the features performs largely better than any individual feature set considered. In addition, this combination improves the performance of a state-of-the-art MeSH indexer. Concerning the machine learning algorithms, we find that those that are more resilient to class imbalance largely obtain better performance. We conclude that even though traditional features such as unigrams and bigrams have strong performance compared to other features, it is possible to combine them to effectively improve the performance of the bag-of-words representation. We have also found that the combination of the learning algorithm and feature set has an influence on the overall performance of the system. Moreover, using learning algorithms resilient to class imbalance largely improves performance. However, when using a large set of features, care needs to be taken in the choice of learning algorithm due to the risk of over-fitting. Specific combinations of learning algorithms and features for individual MeSH headings could further increase the performance of an indexing system.

  12. An automated procedure to identify biomedical articles that contain cancer-associated gene variants.

    PubMed

    McDonald, Ryan; Scott Winters, R; Ankuda, Claire K; Murphy, Joan A; Rogers, Amy E; Pereira, Fernando; Greenblatt, Marc S; White, Peter S

    2006-09-01

    The proliferation of biomedical literature makes it increasingly difficult for researchers to find and manage relevant information. However, identifying research articles containing mutation data, a requisite first step in integrating large and complex mutation data sets, is currently tedious, time-consuming and imprecise. More effective mechanisms for identifying articles containing mutation information would be beneficial both for the curation of mutation databases and for individual researchers. We developed an automated method that uses information extraction, classifier, and relevance ranking techniques to determine the likelihood of MEDLINE abstracts containing information regarding genomic variation data suitable for inclusion in mutation databases. We targeted the CDKN2A (p16) gene and the procedure for document identification currently used by CDKN2A Database curators as a measure of feasibility. A set of abstracts was manually identified from a MEDLINE search as potentially containing specific CDKN2A mutation events. A subset of these abstracts was used as a training set for a maximum entropy classifier to identify text features distinguishing "relevant" from "not relevant" abstracts. Each document was represented as a set of indicative word, word pair, and entity tagger-derived genomic variation features. When applied to a test set of 200 candidate abstracts, the classifier predicted 88 articles as being relevant; of these, 29 of 32 manuscripts in which manual curation found CDKN2A sequence variants were positively predicted. Thus, the set of potentially useful articles that a manual curator would have to review was reduced by 56%, maintaining 91% recall (sensitivity) and more than doubling precision (positive predictive value). Subsequent expansion of the training set to 494 articles yielded similar precision and recall rates, and comparison of the original and expanded trials demonstrated that the average precision improved with the larger data set. Our results show that automated systems can effectively identify article subsets relevant to a given task and may prove to be powerful tools for the broader research community. This procedure can be readily adapted to any or all genes, organisms, or sets of documents. Published 2006 Wiley-Liss, Inc.
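
A maximum entropy classifier over indicative word and word-pair features can be approximated with multinomial logistic regression; the sketch below is an assumption, not the authors' system (it omits the entity-tagger-derived features), and uses hypothetical `texts` and `labels`.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["germline CDKN2A mutation identified in a melanoma kindred",
         "review of unrelated signalling pathways without variant data"]
labels = ["relevant", "not relevant"]

model = make_pipeline(
    CountVectorizer(ngram_range=(1, 2), binary=True),  # indicative words and word pairs
    LogisticRegression(max_iter=1000),                 # logistic regression as a maximum entropy model
)
model.fit(texts, labels)
print(model.predict(["novel CDKN2A sequence variant reported in familial melanoma"]))
```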

  13. Single and Multiple Object Tracking Using a Multi-Feature Joint Sparse Representation.

    PubMed

    Hu, Weiming; Li, Wei; Zhang, Xiaoqin; Maybank, Stephen

    2015-04-01

    In this paper, we propose a tracking algorithm based on a multi-feature joint sparse representation. The templates for the sparse representation can include pixel values, textures, and edges. In the multi-feature joint optimization, noise or occlusion is dealt with using a set of trivial templates. A sparse weight constraint is introduced to dynamically select the relevant templates from the full set of templates. A variance ratio measure is adopted to adaptively adjust the weights of different features. The multi-feature template set is updated adaptively. We further propose an algorithm for tracking multi-objects with occlusion handling based on the multi-feature joint sparse reconstruction. The observation model based on sparse reconstruction automatically focuses on the visible parts of an occluded object by using the information in the trivial templates. The multi-object tracking is simplified into a joint Bayesian inference. The experimental results show the superiority of our algorithm over several state-of-the-art tracking algorithms.

  14. A wavelet-based approach for a continuous analysis of phonovibrograms.

    PubMed

    Unger, Jakob; Meyer, Tobias; Doellinger, Michael; Hecker, Dietmar J; Schick, Bernhard; Lohscheller, Joerg

    2012-01-01

    Recently, endoscopic high-speed laryngoscopy has been established for commercial use and constitutes a state-of-the-art technique to examine vocal fold dynamics. Although it overcomes many limitations of commonly applied stroboscopy, it has not yet gained widespread clinical application. A major drawback is the lack of a methodology for extracting valuable features to support visual assessment or computer-aided diagnosis. In this paper a compact and descriptive feature set is presented. The feature extraction routines are based on two-dimensional color graphs called phonovibrograms (PVG). These graphs contain the full spatio-temporal pattern of vocal fold dynamics and are therefore suited to derive features that comprehensively describe the vibration pattern of vocal folds. Within our approach, clinically relevant features such as glottal closure type, symmetry and periodicity are quantified in a set of 10 descriptive features. The suitability for classification tasks is shown using a clinical data set comprising 50 healthy and 50 paralytic subjects. A classification accuracy of 93.2% has been achieved.

  15. Positive–Negative Asymmetry in the Evaluations of Political Candidates. The Role of Features of Similarity and Affect in Voter Behavior

    PubMed Central

    Falkowski, Andrzej; Jabłońska, Magdalena

    2018-01-01

    In this study we followed Tversky’s research on features of similarity and extended its application to open sets. Unlike the original closed-set model, in which a feature was shifted between a common and a distinctive set, we investigated how the addition of new features and the deletion of existing features affected similarity judgments. The model was tested empirically in a political context, and we analyzed how positive and negative changes in a candidate’s profile affect the similarity of the politician to his or her ideal and opposite counterpart. The results showed a positive–negative asymmetry in comparison judgments, where enhancing negative features (distinctive for an ideal political candidate) had a greater effect on judgments than operations on positive (common) features. However, the effect was not observed for comparisons to a bad politician. Further analyses showed that in the case of a negative reference point, the relationship between similarity judgments and voting intention was mediated by the affective evaluation of the candidate. PMID:29535663
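
For reference, Tversky's contrast model, which underlies the feature manipulations described above, scores similarity from common and distinctive feature sets; the following is a minimal sketch with hypothetical candidate features, not materials from the study.

```python
def tversky_similarity(a, b, theta=1.0, alpha=0.5, beta=0.5):
    """Contrast-model similarity of object a to reference b from their feature sets."""
    a, b = set(a), set(b)
    common = len(a & b)   # features shared by both objects
    a_only = len(a - b)   # distinctive features of a
    b_only = len(b - a)   # distinctive features of b
    return theta * common - alpha * a_only - beta * b_only

ideal = {"honest", "competent", "charismatic"}      # hypothetical feature sets
candidate = {"honest", "charismatic", "evasive"}
print(tversky_similarity(candidate, ideal))
```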

  16. The research on multi-projection correction based on color coding grid array

    NASA Astrophysics Data System (ADS)

    Yang, Fan; Han, Cheng; Bai, Baoxing; Zhang, Chao; Zhao, Yunxiu

    2017-10-01

    Multi-channel projection systems suffer from drawbacks such as poor timeliness and a large amount of manual intervention. To address these problems, this paper proposes a multi-projector correction technique based on a color-coded grid array. Firstly, a color structured-light stripe pattern is generated using De Bruijn sequences, and the feature information of the stripe image is meshed into a colored grid. Taking each grid intersection as a circle center, a white solid circle is constructed to form the feature sample set of the projected images, which gives the feature sample set both good perceptual localization and good noise immunity. Secondly, a subpixel geometric mapping between the projection screen and the individual projectors is established through structured-light encoding and decoding based on the color array, and this geometric mapping is used to solve the homography matrix of each projector. Lastly, because the brightness inconsistency in the overlap regions of the multi-channel projection strongly degrades the corrected image and prevents it from meeting the observer's visual needs, a luminance fusion correction algorithm is applied to obtain a visually consistent projection display. The experimental results show that this method not only effectively solves the distortion of the multi-projection screen and the luminance interference in overlapping regions, but also improves the calibration efficiency of the multi-channel projection system and reduces the maintenance cost of intelligent multi-projection systems.
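
Once the coded grid points have been decoded into correspondences, the projector-to-screen homography can be estimated with OpenCV; the sketch below is an assumption about that step (the grid points are synthetic stand-ins, not data from the paper).

```python
import numpy as np
import cv2

# synthetic stand-ins for decoded correspondences: grid centres seen by the
# projector (proj_pts) and their known positions on the projection screen
gx, gy = np.meshgrid(np.arange(5), np.arange(4))
screen_pts = np.stack([gx.ravel() * 480.0, gy.ravel() * 360.0], axis=1).astype(np.float32)
proj_pts = (screen_pts * 0.45 + np.float32([32.0, 18.0])).astype(np.float32)

H, inliers = cv2.findHomography(proj_pts, screen_pts, cv2.RANSAC, 3.0)
mapped = cv2.perspectiveTransform(proj_pts.reshape(-1, 1, 2), H)  # check the fit
print(np.abs(mapped.reshape(-1, 2) - screen_pts).max())
```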

  17. Wide-bandwidth high-resolution search for extraterrestrial intelligence

    NASA Technical Reports Server (NTRS)

    Horowitz, Paul

    1993-01-01

    A third antenna was added to the system. It is a terrestrial low-gain feed, to act as a veto for local interference. The 3-chip design for a 4 megapoint complex FFT was reduced to finished working hardware. The 4-Megachannel circuit board contains 36 MByte of DRAM, 5 CPLDs, the three large FFT ASICs, and 74 ICs in all. The Austek FDP-based Spectrometer/Power Accumulator (SPA) has now been implemented as a 4-layer printed circuit. A PC interface board has been designed and together with its associated user interface and control software allows an IBM compatible computer to control the SPA board, and facilitates the transfer of spectra to the PC for display, processing, and storage. The Feature Recognizer Array cards receive the stream of modulus words from the 4M FFT cards, and forward a greatly thinned set of reports to the PC's in whose backplane they reside. In particular, a powerful ROM-based state-machine architecture has been adopted, and DRAM has been added to permit integration modes when tracking or reobserving source candidates. The general purpose (GP) array consists of twenty '486 PC class computers, each of which receives and processes the data from a feature extractor/correlator board set. The array performs a first analysis on the provided 'features' and then passes this information on to the workstation. The core workstation software is now written. That is, the communication channels between the user interface, the backend monitor program and the PC's have working software.

  18. Intrusion detection using rough set classification.

    PubMed

    Zhang, Lian-hua; Zhang, Guan-hua; Zhang, Jie; Bai, Ying-cai

    2004-09-01

    Recently, machine learning-based intrusion detection approaches have been the subject of extensive research because they can detect both misuse and anomalies. In this paper, rough set classification (RSC), a modern learning algorithm, is used to rank the features extracted for detecting intrusions and to generate intrusion detection models. Feature ranking is a very critical step when building the model. RSC performs feature ranking before generating rules and converts the feature ranking into a minimal hitting set problem, which is addressed using a genetic algorithm (GA). In classical approaches using a Support Vector Machine (SVM), this is done by executing many iterations, each of which removes one useless feature. Compared with those methods, our method avoids many iterations. In addition, a hybrid genetic algorithm is proposed to increase the convergence speed and decrease the training time of RSC. The models generated by RSC take the form of "IF-THEN" rules, which have the advantage of interpretability. Tests and comparison of RSC with SVM on DARPA benchmark data showed that for Probe and DoS attacks both RSC and SVM yielded highly accurate results (greater than 99% accuracy on the testing set).

  19. A multiresolution prostate representation for automatic segmentation in magnetic resonance images.

    PubMed

    Alvarez, Charlens; Martínez, Fabio; Romero, Eduardo

    2017-04-01

    Accurate prostate delineation is necessary in radiotherapy processes for concentrating the dose onto the prostate and reducing side effects in neighboring organs. Currently, manual delineation is performed over magnetic resonance imaging (MRI), taking advantage of its high soft tissue contrast. Nevertheless, as manual intervention is a time-consuming task with high intra- and interobserver variability rates, (semi)-automatic organ delineation tools have emerged to cope with these challenges, reducing the time spent on these tasks. This work presents a multiresolution representation that defines a novel metric and allows a new prostate to be segmented by combining a set of the most similar prostates in a dataset. The proposed method starts by selecting the set of most similar prostates with respect to a new one using the proposed multiresolution representation. This representation characterizes the prostate through a set of salient points, extracted from a region of interest (ROI) that encloses the organ and refined using structural information, allowing the main relevant features of the organ boundary to be captured. Afterward, the new prostate is automatically segmented by combining the nonrigidly registered expert delineations associated with the previously selected similar prostates using a weighted patch-based strategy. Finally, the prostate contour is smoothed based on morphological operations. The proposed approach was evaluated with respect to the expert manual segmentation under a leave-one-out scheme using two public datasets, obtaining averaged Dice coefficients of 82% ± 0.07 and 83% ± 0.06, and demonstrating a competitive performance with respect to atlas-based state-of-the-art methods. The proposed multiresolution representation provides a feature space that follows a local salient point criterion and a global rule of the spatial configuration among these points to find the most similar prostates. This strategy suggests an easy adaptation to the clinical routine as a supporting tool for annotation. © 2017 American Association of Physicists in Medicine.

  20. Clinical and pathological features of kidney transplant patients with concurrent polyomavirus nephropathy and rejection-associated endarteritis

    PubMed Central

    McGregor, Stephanie M; Chon, W James; Kim, Lisa; Chang, Anthony; Meehan, Shane M

    2015-01-01

    AIM: To describe the clinicopathologic features of concurrent polyomavirus nephropathy (PVN) and endarteritis due to rejection in renal allografts. METHODS: We searched our electronic records database for cases with transplant kidney biopsies demonstrating features of both PVN and acute rejection (AR). PVN was defined by the presence of typical viral cytopathic effect on routine sections and positive polyomavirus SV40 large-T antigen immunohistochemistry. AR was identified by endarteritis (v1 by Banff criteria). All cases were subjected to chart review in order to determine clinical presentation, treatment course and outcomes. Outcomes were recorded with a length of follow-up of at least one year or time to nephrectomy. RESULTS: Of 94 renal allograft recipients who developed PVN over an 11-year period at our institution, we identified 7 (7.4%) with viral cytopathic changes, SV40 large T antigen staining, and endarteritis in the same biopsy specimen, indicative of concurrent PVN and AR. Four arose after reduction of immunosuppression (IS) (for treatment of PVN in 3 and tuberculosis in 1), and 3 patients had no decrease of IS before developing simultaneous concurrent disease. Treatment consisted of reduced oral IS and leflunomide for PVN, and anti-rejection therapy. Three of 4 patients who developed endarteritis in the setting of reduced IS lost their grafts to rejection. All 3 patients with simultaneous PVN and endarteritis cleared viremia and were stable at 1 year of follow up. Patients with endarteritis and PVN arising in a background of reduced IS had more severe rejection and poorer outcome. CONCLUSION: Concurrent PVN and endarteritis may be more frequent than is currently appreciated and may occur with or without prior reduction of IS. PMID:26722657

  1. Quality assessment of data discrimination using self-organizing maps.

    PubMed

    Mekler, Alexey; Schwarz, Dmitri

    2014-10-01

    One of the important aspects of the data classification problem lies in making the most appropriate selection of features. The set of variables should be small and, at the same time, should provide reliable discrimination of the classes. A method for evaluating discriminating power that enables a comparison between different sets of variables is therefore useful in the search for such a set. A new approach to feature selection is presented. Two methods for evaluating the data discriminating power of a feature set are suggested. Both methods implement self-organizing maps (SOMs) and newly introduced indices of the degree of data clusterization on the SOM. The first method is based on the comparison of intraclass and interclass distances on the map. The second method evaluates the relative number of a best matching unit's (BMU's) nearest neighbors that belong to the same class. Both methods make it possible to evaluate the discriminating power of a feature set in cases when this set provides nonlinear discrimination of the classes. The current algorithms in program code can be downloaded for free at http://mekler.narod.ru/Science/Articles_support.html, as well as the supporting data files. Copyright © 2014 Elsevier Inc. All rights reserved.
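
A minimal sketch of the second evaluation idea, assuming the MiniSom package and synthetic data in place of the authors' released code: train a SOM on a candidate feature set and score how often the samples mapped near a sample's best matching unit share its class.

```python
import numpy as np
from minisom import MiniSom

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))       # candidate feature set (placeholder data)
y = rng.integers(0, 2, size=300)    # class labels

som = MiniSom(10, 10, X.shape[1], sigma=1.0, learning_rate=0.5, random_seed=0)
som.train_random(X, 5000)
bmus = np.array([som.winner(x) for x in X])    # best matching unit per sample

def neighbourhood_purity(i):
    # samples whose BMU lies within one map cell of sample i's BMU
    close = np.max(np.abs(bmus - bmus[i]), axis=1) <= 1
    close[i] = False
    return np.mean(y[close] == y[i]) if close.any() else np.nan

score = np.nanmean([neighbourhood_purity(i) for i in range(len(X))])
print("discriminating-power score:", score)
```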

  2. Learning representations for the early detection of sepsis with deep neural networks.

    PubMed

    Kam, Hye Jin; Kim, Ha Young

    2017-10-01

    Sepsis is one of the leading causes of death in intensive care unit patients. Early detection of sepsis is vital because mortality increases as the sepsis stage worsens. This study aimed to develop detection models for the early stage of sepsis using deep learning methodologies, and to compare the feasibility and performance of the new deep learning methodology with those of the regression method with conventional temporal feature extraction. Study group selection adhered to the InSight model. The results of the deep learning-based models and the InSight model were compared. With deep feedforward networks, the areas under the ROC curve (AUC) of the models were 0.887 and 0.915 for the InSight and the new feature sets, respectively. For the model with the combined feature set, the AUC was the same as that of the basic feature set (0.915). For the long short-term memory model, only the basic feature set was applied and the AUC improved to 0.929 compared with the existing 0.887 of the InSight model. The contributions of this paper can be summarized in three ways: (i) improved performance without feature extraction using domain knowledge, (ii) verification of the feature extraction capability of deep neural networks through comparison with reference features, and (iii) improved performance over feedforward neural networks by using long short-term memory, a neural network architecture that can learn sequential patterns. Copyright © 2017 Elsevier Ltd. All rights reserved.
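
The long short-term memory model described above maps a sequence of per-patient feature vectors to a sepsis probability; the following Keras sketch is only an assumption about that shape (hypothetical 24 time steps and 9 basic features), not the study's model.

```python
import numpy as np
import tensorflow as tf

T, F = 24, 9                                    # hypothetical: 24 hourly steps, 9 basic features
rng = np.random.default_rng(0)
X = rng.normal(size=(128, T, F)).astype("float32")
y = rng.integers(0, 2, size=(128, 1)).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(T, F)),
    tf.keras.layers.LSTM(32),                   # learns sequential patterns over the time steps
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC()])
model.fit(X, y, epochs=2, batch_size=32, verbose=0)
```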

  3. Feature selection for elderly faller classification based on wearable sensors.

    PubMed

    Howcroft, Jennifer; Kofman, Jonathan; Lemaire, Edward D

    2017-05-30

    Wearable sensors can be used to derive numerous gait pattern features for elderly fall risk and faller classification; however, an appropriate feature set is required to avoid high computational costs and the inclusion of irrelevant features. The objectives of this study were to identify and evaluate smaller feature sets for faller classification from large feature sets derived from wearable accelerometer and pressure-sensing insole gait data. A convenience sample of 100 older adults (75.5 ± 6.7 years; 76 non-fallers, 24 fallers based on 6 month retrospective fall occurrence) walked 7.62 m while wearing pressure-sensing insoles and tri-axial accelerometers at the head, pelvis, left and right shanks. Feature selection was performed using correlation-based feature selection (CFS), fast correlation based filter (FCBF), and Relief-F algorithms. Faller classification was performed using multi-layer perceptron neural network, naïve Bayesian, and support vector machine classifiers, with 75:25 single stratified holdout and repeated random sampling. The best performing model was a support vector machine with 78% accuracy, 26% sensitivity, 95% specificity, 0.36 F1 score, and 0.31 MCC and one posterior pelvis accelerometer input feature (left acceleration standard deviation). The second best model achieved better sensitivity (44%) and used a support vector machine with 74% accuracy, 83% specificity, 0.44 F1 score, and 0.29 MCC. This model had ten input features: maximum, mean and standard deviation posterior acceleration; maximum, mean and standard deviation anterior acceleration; mean superior acceleration; and three impulse features. The best multi-sensor model sensitivity (56%) was achieved using posterior pelvis and both shank accelerometers and a naïve Bayesian classifier. The best single-sensor model sensitivity (41%) was achieved using the posterior pelvis accelerometer and a naïve Bayesian classifier. Feature selection provided models with smaller feature sets and improved faller classification compared to faller classification without feature selection. CFS and FCBF provided the best feature subset (one posterior pelvis accelerometer feature) for faller classification. However, better sensitivity was achieved by the second best model based on a Relief-F feature subset with three pressure-sensing insole features and seven head accelerometer features. Feature selection should be considered as an important step in faller classification using wearable sensors.

  4. A hybrid feature selection approach for the early diagnosis of Alzheimer’s disease

    NASA Astrophysics Data System (ADS)

    Gallego-Jutglà, Esteve; Solé-Casals, Jordi; Vialatte, François-Benoît; Elgendi, Mohamed; Cichocki, Andrzej; Dauwels, Justin

    2015-02-01

    Objective. Recently, significant advances have been made in the early diagnosis of Alzheimer’s disease (AD) from electroencephalography (EEG). However, choosing suitable measures is a challenging task. Among other measures, frequency relative power (RP) and loss of complexity have been used with promising results. In the present study we investigate the early diagnosis of AD using synchrony measures and frequency RP on EEG signals, examining the changes found in different frequency ranges. Approach. We first explore the use of a single feature for computing the classification rate (CR), looking for the best frequency range. Then, we present a multiple feature classification system that outperforms all previous results using a feature selection strategy. These two approaches are tested in two different databases, one containing mild cognitive impairment (MCI) and healthy subjects (patients age: 71.9 ± 10.2, healthy subjects age: 71.7 ± 8.3), and the other containing Mild AD and healthy subjects (patients age: 77.6 ± 10.0, healthy subjects age: 69.4 ± 11.5). Main results. Using a single feature to compute CRs we achieve a performance of 78.33% for the MCI data set and of 97.56% for Mild AD. Results are clearly improved using the multiple feature classification, where a CR of 95% is found for the MCI data set using 11 features, and 100% for the Mild AD data set using four features. Significance. The new feature selection method described in this work may be a reliable tool that could help to design a realistic system that does not require prior knowledge of a patient's status. With that aim, we explore the standardization of features for MCI and Mild AD data sets with promising results.
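
The frequency relative power (RP) feature can be sketched as band power divided by broadband power per channel; the snippet below is an assumption (the sampling rate, band edges, and `eeg` array are hypothetical placeholders), not the study's pipeline.

```python
import numpy as np
from scipy.signal import welch

fs = 256                                                    # hypothetical sampling rate (Hz)
eeg = np.random.default_rng(0).normal(size=(21, 20 * fs))   # 21 channels, 20-second epoch

f, psd = welch(eeg, fs=fs, nperseg=2 * fs)                  # power spectral density per channel

def relative_power(low, high, broadband=(0.5, 30.0)):
    band = (f >= low) & (f < high)
    total = (f >= broadband[0]) & (f < broadband[1])
    return psd[:, band].sum(axis=1) / psd[:, total].sum(axis=1)

theta_rp = relative_power(4.0, 8.0)                         # one RP value per channel
print(theta_rp.mean())
```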

  5. All set, indeed! N2pc components reveal simultaneous attentional control settings for multiple target colors.

    PubMed

    Grubert, Anna; Eimer, Martin

    2016-08-01

    To study whether top-down attentional control processes can be set simultaneously for different visual features, we employed a spatial cueing procedure to measure behavioral and electrophysiological markers of task-set contingent attentional capture during search for targets defined by 1 or 2 possible colors (one-color and two-color tasks). Search arrays were preceded by spatially nonpredictive color singleton cues. Behavioral spatial cueing effects indicative of attentional capture were elicited only by target-matching but not by distractor-color cues. However, when search displays contained 1 target-color and 1 distractor-color object among gray nontargets, N2pc components were triggered not only by target-color but also by distractor-color cues both in the one-color and two-color task, demonstrating that task-set nonmatching items attracted attention. When search displays contained 6 items in 6 different colors, so that participants had to adopt a fully feature-specific task set, the N2pc to distractor-color cues was eliminated in both tasks, indicating that nonmatching items were now successfully excluded from attentional processing. These results demonstrate that when observers adopt a feature-specific search mode, attentional task sets can be configured flexibly for multiple features within the same dimension, resulting in the rapid allocation of attention to task-set matching objects only. (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  6. High-performance computer aided detection system for polyp detection in CT colonography with fluid and fecal tagging

    NASA Astrophysics Data System (ADS)

    Liu, Jiamin; Wang, Shijun; Kabadi, Suraj; Summers, Ronald M.

    2009-02-01

    CT colonography (CTC) is a feasible and minimally invasive method for the detection of colorectal polyps and cancer screening. Computer-aided detection (CAD) of polyps has improved consistency and sensitivity of virtual colonoscopy interpretation and reduced interpretation burden. A CAD system typically consists of four stages: (1) image preprocessing including colon segmentation; (2) initial detection generation; (3) feature selection; and (4) detection classification. In our experience, three existing problems limit the performance of our current CAD system. First, high-density orally administered contrast agents in fecal-tagging CTC have scatter effects on neighboring tissues. The scattering manifests itself as an artificial elevation in the observed CT attenuation values of the neighboring tissues. This pseudo-enhancement phenomenon presents a problem for the application of computer-aided polyp detection, especially when polyps are submerged in the contrast agents. Second, the general kernel approach for surface curvature computation in the second stage of our CAD system could yield erroneous results for thin structures such as small (6-9 mm) polyps and for touching structures such as polyps that lie on haustral folds. Those erroneous curvatures will reduce the sensitivity of polyp detection. The third problem is that more than 150 features are selected from each polyp candidate in the third stage of our CAD system. These high-dimensional features make it difficult to learn a good decision boundary for detection classification and reduce the accuracy of predictions. Therefore, an improved CAD system for polyp detection in CTC data is proposed by introducing three new techniques. First, a scale-based scatter correction algorithm is applied to reduce pseudo-enhancement effects in the image pre-processing stage. Second, a cubic spline interpolation method is utilized to accurately estimate curvatures for initial detection generation. Third, a new dimensionality reduction classifier, diffusion map and local linear embedding (DMLLE), is developed for classification and false positives (FP) reduction. Performance of the improved CAD system is evaluated and compared with our existing CAD system (without applying those techniques) using CT scans of 1186 patients. These scans are divided into a training set and a test set. The sensitivity of the improved CAD system increased 18% on training data at a rate of 5 FPs per patient and 15% on test data at a rate of 5 FPs per patient. Our results indicated that the improved CAD system achieved significantly better performance on medium-sized colonic adenomas with higher sensitivity and lower FP rate in CTC.

  7. Ensemble Pruning for Glaucoma Detection in an Unbalanced Data Set.

    PubMed

    Adler, Werner; Gefeller, Olaf; Gul, Asma; Horn, Folkert K; Khan, Zardad; Lausen, Berthold

    2016-12-07

    Random forests are successful classifier ensemble methods consisting of typically 100 to 1000 classification trees. Ensemble pruning techniques reduce the computational cost, especially the memory demand, of random forests by reducing the number of trees without relevant loss of performance or even with increased performance of the sub-ensemble. The application to the problem of an early detection of glaucoma, a severe eye disease with low prevalence, based on topographical measurements of the eye background faces specific challenges. We examine the performance of ensemble pruning strategies for glaucoma detection in an unbalanced data situation. The data set consists of 102 topographical features of the eye background of 254 healthy controls and 55 glaucoma patients. We compare the area under the receiver operating characteristic curve (AUC), and the Brier score on the total data set, in the majority class, and in the minority class of pruned random forest ensembles obtained with strategies based on the prediction accuracy of greedily grown sub-ensembles, the uncertainty weighted accuracy, and the similarity between single trees. To validate the findings and to examine the influence of the prevalence of glaucoma in the data set, we additionally perform a simulation study with lower prevalences of glaucoma. In glaucoma classification all three pruning strategies lead to improved AUC and smaller Brier scores on the total data set with sub-ensembles as small as 30 to 80 trees compared to the classification results obtained with the full ensemble consisting of 1000 trees. In the simulation study, we were able to show that the prevalence of glaucoma is a critical factor and lower prevalence decreases the performance of our pruning strategies. The memory demand for glaucoma classification in an unbalanced data situation based on random forests could effectively be reduced by the application of pruning strategies without loss of performance in a population with increased risk of glaucoma.
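
A minimal sketch of accuracy-based greedy ensemble pruning, assuming scikit-learn and synthetic unbalanced data rather than the paper's glaucoma set: trees of a fitted random forest are added one at a time according to which tree most improves the sub-ensemble AUC on a validation split.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# synthetic unbalanced data standing in for the 102-feature glaucoma set
X, y = make_classification(n_samples=400, n_features=30, weights=[0.85], random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, stratify=y, random_state=0)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
probs = np.array([t.predict_proba(X_val)[:, 1] for t in forest.estimators_])

selected, remaining = [], list(range(len(probs)))
while remaining and len(selected) < 50:            # grow a pruned sub-ensemble of at most 50 trees
    scores = [roc_auc_score(y_val, probs[selected + [i]].mean(axis=0)) for i in remaining]
    best = remaining[int(np.argmax(scores))]
    selected.append(best)
    remaining.remove(best)

print("pruned ensemble AUC:", roc_auc_score(y_val, probs[selected].mean(axis=0)))
```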

  8. Slycat™ User Manual

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Crossno, Patricia J.; Gittinger, Jaxon; Hunt, Warren L.

    Slycat™ is a web-based system for performing data analysis and visualization of potentially large quantities of remote, high-dimensional data. Slycat™ specializes in working with ensemble data. An ensemble is a group of related data sets, which typically consists of a set of simulation runs exploring the same problem space. An ensemble can be thought of as a set of samples within a multi-variate domain, where each sample is a vector whose value defines a point in high-dimensional space. To understand and describe the underlying problem being modeled in the simulations, ensemble analysis looks for shared behaviors and common features across the group of runs. Additionally, ensemble analysis tries to quantify differences found in any members that deviate from the rest of the group. The Slycat™ system integrates data management, scalable analysis, and visualization. Results are viewed remotely on a user’s desktop via commodity web clients using a multi-tiered hierarchy of computation and data storage, as shown in Figure 1. Our goal is to operate on data as close to the source as possible, thereby reducing time and storage costs associated with data movement. Consequently, we are working to develop parallel analysis capabilities that operate on High Performance Computing (HPC) platforms, to explore approaches for reducing data size, and to implement strategies for staging computation across the Slycat™ hierarchy. Within Slycat™, data and visual analysis are organized around projects, which are shared by a project team. Project members are explicitly added, each with a designated set of permissions. Although users sign-in to access Slycat™, individual accounts are not maintained. Instead, authentication is used to determine project access. Within projects, Slycat™ models capture analysis results and enable data exploration through various visual representations. Although for scientists each simulation run is a model of real-world phenomena given certain conditions, we use the term model to refer to our modeling of the ensemble data, not the physics. Different model types often provide complementary perspectives on data features when analyzing the same data set. Each model visualizes data at several levels of abstraction, allowing the user to range from viewing the ensemble holistically to accessing numeric parameter values for a single run. Bookmarks provide a mechanism for sharing results, enabling interesting model states to be labeled and saved.

  9. Automated aural classification used for inter-species discrimination of cetaceans.

    PubMed

    Binder, Carolyn M; Hines, Paul C

    2014-04-01

    Passive acoustic methods are in widespread use to detect and classify cetacean species; however, passive acoustic systems often suffer from large false detection rates resulting from numerous transient sources. To reduce the acoustic analyst workload, automatic recognition methods may be implemented in a two-stage process. First, a general automatic detector is implemented that produces many detections to ensure cetacean presence is noted. Then an automatic classifier is used to significantly reduce the number of false detections and classify the cetacean species. This process requires development of a robust classifier capable of performing inter-species classification. Because human analysts can aurally discriminate species, an automated aural classifier that uses perceptual signal features was tested on a cetacean data set. The classifier successfully discriminated between four species of cetaceans (bowhead, humpback, North Atlantic right, and sperm whales) with 85% accuracy. It also performed well (100% accuracy) for discriminating sperm whale clicks from right whale gunshots. An accuracy of 92% and area under the receiver operating characteristic curve of 0.97 were obtained for the relatively challenging bowhead and humpback recognition case. These results demonstrated that the perceptual features employed by the aural classifier provided powerful discrimination cues for inter-species classification of cetaceans.

  10. Rural-to-urban migration and sexual debut in Thailand.

    PubMed

    Anglewicz, Philip; VanLandingham, Mark; Phuengsamran, Dusita

    2014-10-01

    Migration from one's parents' home and sexual debut are common features of the transition to adulthood. Although many studies have described both of these features independently, few have examined the relationship between migration and sexual debut in a systematic manner. In this study, we explore this link for young adults in Thailand. With relatively high rates of internal migration, rapid modernization, a moderate HIV epidemic, and a declining average age of sexual debut, Thailand presents an instructive environment in which to examine migration and sexual debut. We use two waves of a longitudinal data set (2005 and 2007) that includes a subsample of young adults who migrated to urban areas during that period. We identify characteristics and behaviors associated with sexual debut and examine the role of migration on debut. Our approach reduces several common sources of bias that hamper existing work on both migration and sexual debut: (1) the longitudinal nature of the data enables us to examine the effects of characteristics that predate both behaviors of interest; (2) the survey on sexual behavior employed a technique that reduces response bias; and (3) we examine differences in debut by marital status. We find that migrants have a higher likelihood of sexual debut than nonmigrants.

  11. A fast, automated, polynomial-based cosmic ray spike-removal method for the high-throughput processing of Raman spectra.

    PubMed

    Schulze, H Georg; Turner, Robin F B

    2013-04-01

    Raman spectra often contain undesirable, randomly positioned, intense, narrow-bandwidth, positive, unidirectional spectral features generated when cosmic rays strike charge-coupled device cameras. These must be removed prior to analysis, but doing so manually is not feasible for large data sets. We developed a quick, simple, effective, semi-automated procedure to remove cosmic ray spikes from spectral data sets that contain large numbers of relatively homogeneous spectra. Some inhomogeneous spectral data sets can also be accommodated by replacing excessively modified spectra with the originals and removing their spikes with a median filter instead, but caution is advised when processing such data sets. In addition, the technique is suitable for interpolating missing spectra or replacing aberrant spectra with good spectral estimates. The method is applied to baseline-flattened spectra and relies on fitting a third-order (or higher) polynomial through all the spectra at every wavenumber. Pixel intensities in excess of a threshold of 3× the noise standard deviation above the fit are reduced to the threshold level. Because only two parameters (with readily specified default values) might require further adjustment, the method is easily implemented for semi-automated processing of large spectral sets.
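
The polynomial spike-removal idea lends itself to a short sketch: at every wavenumber, fit a third-order polynomial through the stack of spectra and clip pixels that exceed the fit by more than three noise standard deviations. The function below is an assumed implementation of that description (it presumes a reasonably large set of baseline-flattened, relatively homogeneous spectra), not the authors' code.

```python
import numpy as np

def despike(spectra, order=3, k=3.0):
    """spectra: (n_spectra, n_wavenumbers) array of baseline-flattened spectra."""
    cleaned = np.asarray(spectra, dtype=float).copy()
    idx = np.arange(cleaned.shape[0])
    for w in range(cleaned.shape[1]):
        coeffs = np.polyfit(idx, cleaned[:, w], order)   # polynomial across the spectra
        fit = np.polyval(coeffs, idx)
        resid = cleaned[:, w] - fit
        thresh = fit + k * resid.std()                   # 3x noise standard deviation above the fit
        spikes = cleaned[:, w] > thresh
        cleaned[spikes, w] = thresh[spikes]              # reduce spike pixels to the threshold level
    return cleaned

cleaned = despike(np.random.default_rng(2).normal(size=(50, 1024)))
```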

  12. Decorrelation of the true and estimated classifier errors in high-dimensional settings.

    PubMed

    Hanczar, Blaise; Hua, Jianping; Dougherty, Edward R

    2007-01-01

    The aim of many microarray experiments is to build discriminatory diagnosis and prognosis models. Given the huge number of features and the small number of examples, model validity, which refers to the precision of error estimation, is a critical issue. Previous studies have addressed this issue via the deviation distribution (estimated error minus true error), in particular, the deterioration of cross-validation precision in high-dimensional settings where feature selection is used to mitigate the peaking phenomenon (overfitting). Because classifier design is based upon random samples, both the true and estimated errors are sample-dependent random variables, and one would expect a loss of precision if the estimated and true errors are not well correlated, so that natural questions arise as to the degree of correlation and the manner in which lack of correlation impacts error estimation. We demonstrate the effect of correlation on error precision via a decomposition of the variance of the deviation distribution, observe that the correlation is often severely decreased in high-dimensional settings, and show that the effect of high dimensionality on error estimation tends to result more from its decorrelating effects than from its impact on the variance of the estimated error. We consider the correlation between the true and estimated errors under different experimental conditions using both synthetic and real data, several feature-selection methods, different classification rules, and three commonly used error estimators (leave-one-out cross-validation, k-fold cross-validation, and .632 bootstrap). Moreover, three scenarios are considered: (1) feature selection, (2) known feature set, and (3) all features. Only the first is of practical interest; however, the other two are needed for comparison purposes. We observe that the true and estimated errors tend to be much more correlated in the case of a known feature set than with either feature selection or using all features, with the better correlation between the latter two showing no general trend, but differing for different models.
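
The decorrelation effect can be illustrated with a small simulation, sketched below under assumptions that differ from the paper's experiments (synthetic data, linear discriminant analysis, 5-fold cross-validation, and a large held-out sample as a proxy for the true error).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

est_err, true_err = [], []
for seed in range(50):                                 # 50 independent small training samples
    X, y = make_classification(n_samples=5050, n_features=200, n_informative=10,
                               random_state=seed)
    X_tr, y_tr, X_big, y_big = X[:50], y[:50], X[50:], y[50:]   # small sample + large test proxy
    clf = LinearDiscriminantAnalysis()
    est_err.append(1 - cross_val_score(clf, X_tr, y_tr, cv=5).mean())   # estimated error
    true_err.append(1 - clf.fit(X_tr, y_tr).score(X_big, y_big))        # proxy for the true error

print("corr(true, estimated):", np.corrcoef(true_err, est_err)[0, 1])
```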

  13. Acetylated Histone H3K9 is associated with meiotic recombination hotspots, and plays a role in recombination redundantly with other factors including the H3K4 methylase Set1 in fission yeast

    PubMed Central

    Yamada, Shintaro; Ohta, Kunihiro; Yamada, Takatomi

    2013-01-01

    Histone modifications are associated with meiotic recombination hotspots, discrete sites with augmented recombination frequency. For example, trimethylation of histone H3 lysine4 (H3K4me3) marks most hotspots in budding yeast and mouse. Modified histones are known to regulate meiotic recombination partly by promoting DNA double-strand break (DSB) formation at hotspots, but the role and precise landscape of involved modifications remain unclear. Here, we studied hotspot-associated modifications in fission yeast and found general features: acetylation of H3 lysine9 (H3K9ac) is elevated, and H3K4me3 is not significantly enriched. Mutating H3K9 to non-acetylatable alanine mildly reduced levels of the DSB-inducing protein Rec12 (the fission yeast homologue of Spo11) and DSB at hotspots, indicating that H3K9ac may be involved in DSB formation by enhancing the interaction between Rec12 and hotspots. In addition, we found that the lack of the H3K4 methyltransferase Set1 generally increased Rec12 binding to chromatin but partially reduced DSB formation at some loci, suggesting that Set1 is also involved in DSB formation. These results suggest that meiotic DSB formation is redundantly regulated by multiple chromatin-related factors including H3K9ac and Set1 in fission yeast. PMID:23382177

  14. A Landmark Extraction Method Associated with Geometric Features and Location Distribution

    NASA Astrophysics Data System (ADS)

    Zhang, W.; Li, J.; Wang, Y.; Xiao, Y.; Liu, P.; Zhang, S.

    2018-04-01

    Landmarks play an important role in spatial cognition and spatial knowledge organization. Significance-measuring models are the main method of landmark extraction, but because the significance of a landmark is computed in one-dimensional space, such models have difficulty accounting for the spatial distribution pattern of landmarks. In this paper, starting from the geometric features of ground objects, we propose an extraction method based on target height, target gap, and field of view. Using the influence regions of a Voronoi diagram, the target gap is described as a geometric representation of the distribution of adjacent targets. The visual domain of the Voronoi k-order neighbours is then segmented to establish the target view under multiple viewpoints; finally, landmarks are identified from the three weighted geometric features. Comparative experiments show that the method largely agrees with the results of a traditional significance-measuring model, which verifies its effectiveness and reliability, and that it reduces the complexity of the landmark extraction process without losing the reference value of the landmarks.
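
    A minimal sketch of the Voronoi-based adjacency step described above, assuming 2D target centroids: it computes each candidate landmark's Voronoi neighbours and a simple "target gap" as the mean distance to them. All names and values are illustrative, not the authors' implementation.

```python
# Sketch (not the authors' code): approximate the "target gap" of each candidate
# landmark as the mean distance to its Voronoi-adjacent neighbours, assuming 2D
# target centroids. Uses scipy.spatial.Voronoi; data are illustrative.
import numpy as np
from scipy.spatial import Voronoi

rng = np.random.default_rng(0)
targets = rng.uniform(0, 1000, size=(50, 2))   # hypothetical target positions (m)

vor = Voronoi(targets)

# vor.ridge_points lists pairs of input points whose Voronoi cells share an edge,
# i.e. adjacent targets in the sense of the influence regions described above.
neighbours = {i: set() for i in range(len(targets))}
for a, b in vor.ridge_points:
    neighbours[a].add(b)
    neighbours[b].add(a)

gap = np.array([
    np.mean([np.linalg.norm(targets[i] - targets[j]) for j in neighbours[i]])
    if neighbours[i] else np.nan
    for i in range(len(targets))
])
print("mean target gap:", np.nanmean(gap))
```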

  15. Deep Convolutional Extreme Learning Machine and Its Application in Handwritten Digit Classification

    PubMed Central

    Yang, Xinyi

    2016-01-01

    In recent years, some deep learning methods have been developed and applied to image classification, such as the convolutional neural network (CNN) and the deep belief network (DBN). However, they suffer from problems such as local minima, slow convergence rates, and intensive human intervention. In this paper, we propose a rapid learning method, namely, the deep convolutional extreme learning machine (DC-ELM), which combines the power of CNN and the fast training of ELM. It uses multiple alternating convolution and pooling layers to effectively abstract high-level features from input images. The abstracted features are then fed to an ELM classifier, which leads to better generalization performance with faster learning speed. DC-ELM also introduces stochastic pooling in the last hidden layer to greatly reduce the dimensionality of the features, thus saving much training time and computational resources. We systematically evaluated the performance of DC-ELM on two handwritten digit data sets: MNIST and USPS. Experimental results show that our method achieved better testing accuracy with significantly shorter training time in comparison with deep learning methods and other ELM methods. PMID:27610128
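
    For readers unfamiliar with the ELM stage, the following is a minimal numpy sketch of a generic extreme learning machine classifier (random fixed hidden layer plus a closed-form least-squares readout). It illustrates the general idea only and is not the authors' DC-ELM, which places convolution, pooling, and stochastic pooling in front of this readout; all sizes and data below are synthetic.

```python
# Generic ELM classifier sketch: the hidden layer is random and never trained;
# only the output weights are solved for in closed form (ridge least squares).
import numpy as np

def elm_fit(X, y, n_hidden=512, n_classes=10, reg=1e-3, seed=0):
    """Fit the ELM readout on features X (n_samples x n_features) and labels y."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))   # random input weights (fixed)
    b = rng.normal(size=n_hidden)                 # random biases (fixed)
    H = np.tanh(X @ W + b)                        # hidden-layer activations
    T = np.eye(n_classes)[y]                      # one-hot targets
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ T)
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.argmax(np.tanh(X @ W + b) @ beta, axis=1)

# Tiny synthetic demo standing in for the abstracted convolutional features.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 64))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
W, b, beta = elm_fit(X[:800], y[:800], n_hidden=256, n_classes=2)
print("test accuracy:", np.mean(elm_predict(X[800:], W, b, beta) == y[800:]))
```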

  16. Illumination-invariant and deformation-tolerant inner knuckle print recognition using portable devices.

    PubMed

    Xu, Xuemiao; Jin, Qiang; Zhou, Le; Qin, Jing; Wong, Tien-Tsin; Han, Guoqiang

    2015-02-12

    We propose a novel biometric recognition method that identifies the inner knuckle print (IKP). It is robust enough to confront uncontrolled lighting conditions, pose variations and low imaging quality. Such robustness is crucial for its application on portable devices equipped with consumer-level cameras. We achieve this robustness by two means. First, we propose a novel feature extraction scheme that highlights the salient structure and suppresses incorrect and/or unwanted features. The extracted IKP features retain simple geometry and morphology and reduce the interference of illumination. Second, to counteract the deformation induced by different hand orientations, we propose a novel structure-context descriptor based on local statistics. To the best of our knowledge, we are the first to simultaneously consider illumination invariance and deformation tolerance for appearance-based low-resolution hand biometrics. Settings in previous works are more restrictive: they make strong assumptions about either the illumination conditions or the hand orientation. Extensive experiments demonstrate that our method outperforms state-of-the-art methods in terms of recognition accuracy, especially under uncontrolled lighting conditions and flexible hand orientations.

  17. Illumination-Invariant and Deformation-Tolerant Inner Knuckle Print Recognition Using Portable Devices

    PubMed Central

    Xu, Xuemiao; Jin, Qiang; Zhou, Le; Qin, Jing; Wong, Tien-Tsin; Han, Guoqiang

    2015-01-01

    We propose a novel biometric recognition method that identifies the inner knuckle print (IKP). It is robust enough to confront uncontrolled lighting conditions, pose variations and low imaging quality. Such robustness is crucial for its application on portable devices equipped with consumer-level cameras. We achieve this robustness by two means. First, we propose a novel feature extraction scheme that highlights the salient structure and suppresses incorrect and/or unwanted features. The extracted IKP features retain simple geometry and morphology and reduce the interference of illumination. Second, to counteract the deformation induced by different hand orientations, we propose a novel structure-context descriptor based on local statistics. To the best of our knowledge, we are the first to simultaneously consider illumination invariance and deformation tolerance for appearance-based low-resolution hand biometrics. Settings in previous works are more restrictive: they make strong assumptions about either the illumination conditions or the hand orientation. Extensive experiments demonstrate that our method outperforms state-of-the-art methods in terms of recognition accuracy, especially under uncontrolled lighting conditions and flexible hand orientations. PMID:25686317

  18. OntoPop: An Ontology Population System for the Semantic Web

    NASA Astrophysics Data System (ADS)

    Thongkrau, Theerayut; Lalitrojwong, Pattarachai

    The development of an ontology at the instance level requires the extraction of the terms defining the instances from various data sources. These instances are then linked to the concepts of the ontology, and relationships are created between these instances in the next step. However, before establishing links among data, ontology engineers must classify terms or instances from a web document into an ontology concept. The tool that helps ontology engineers with this task is called ontology population. Existing approaches are not well suited to ontology development applications because of long processing times and difficulty in analyzing large or noisy data sets. The OntoPop system introduces a methodology to solve these problems, which comprises two parts. First, we select meaningful features from syntactic relations, which can produce more significant features than other methods. Second, we differentiate feature meanings and reduce noise based on latent semantic analysis. Experimental evaluation demonstrates that OntoPop works well, achieving an accuracy of 49.64%, a learning accuracy of 76.93%, and an execution time of 5.46 seconds per instance.
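
    A toy illustration (not OntoPop itself) of the latent-semantic-analysis step: sparse syntactic-context vectors of candidate instances are projected into a low-rank space with truncated SVD to reduce noise before instance classification. The example contexts and parameters below are hypothetical.

```python
# Illustrative LSA sketch: project sparse context vectors of candidate instances
# into a low-rank space to reduce noise before comparing them with ontology concepts.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer

contexts = [                                  # hypothetical instance contexts
    "jaguar fast cat jungle predator",
    "jaguar car engine speed road",
    "tiger striped cat jungle predator",
]
lsa = make_pipeline(CountVectorizer(),
                    TruncatedSVD(n_components=2, random_state=0),
                    Normalizer(copy=False))
X = lsa.fit_transform(contexts)
print(X.shape)   # (3, 2): noise-reduced feature vectors for instance classification
```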

  19. Retrosplenial Cortex Codes for Permanent Landmarks

    PubMed Central

    Auger, Stephen D.; Mullally, Sinéad L.; Maguire, Eleanor A.

    2012-01-01

    Landmarks are critical components of our internal representation of the environment, yet their specific properties are rarely studied, and little is known about how they are processed in the brain. Here we characterised a large set of landmarks along a range of features that included size, visual salience, navigational utility, and permanence. When human participants viewed images of these single landmarks during functional magnetic resonance imaging (fMRI), parahippocampal cortex (PHC) and retrosplenial cortex (RSC) were both engaged by landmark features, but in different ways. PHC responded to a range of landmark attributes, while RSC was engaged by only the most permanent landmarks. Furthermore, when participants were divided into good and poor navigators, the latter were significantly less reliable at identifying the most permanent landmarks, and had reduced responses in RSC and anterodorsal thalamus when viewing such landmarks. The RSC has been widely implicated in navigation but its precise role remains uncertain. Our findings suggest that a primary function of the RSC may be to process the most stable features in an environment, and this could be a prerequisite for successful navigation. PMID:22912894

  20. Deep Convolutional Extreme Learning Machine and Its Application in Handwritten Digit Classification.

    PubMed

    Pang, Shan; Yang, Xinyi

    2016-01-01

    In recent years, some deep learning methods have been developed and applied to image classification, such as the convolutional neural network (CNN) and the deep belief network (DBN). However, they suffer from problems such as local minima, slow convergence rates, and intensive human intervention. In this paper, we propose a rapid learning method, namely, the deep convolutional extreme learning machine (DC-ELM), which combines the power of CNN and the fast training of ELM. It uses multiple alternating convolution and pooling layers to effectively abstract high-level features from input images. The abstracted features are then fed to an ELM classifier, which leads to better generalization performance with faster learning speed. DC-ELM also introduces stochastic pooling in the last hidden layer to greatly reduce the dimensionality of the features, thus saving much training time and computational resources. We systematically evaluated the performance of DC-ELM on two handwritten digit data sets: MNIST and USPS. Experimental results show that our method achieved better testing accuracy with significantly shorter training time in comparison with deep learning methods and other ELM methods.

  1. Efficient Spatio-Temporal Local Binary Patterns for Spontaneous Facial Micro-Expression Recognition

    PubMed Central

    Wang, Yandan; See, John; Phan, Raphael C.-W.; Oh, Yee-Hui

    2015-01-01

    Micro-expression recognition is still at a preliminary stage, owing much to the numerous difficulties faced in the development of datasets. Since micro-expression is an important affective clue for clinical diagnosis and deceit analysis, much effort has gone into the creation of these datasets for research purposes. There are currently two publicly available spontaneous micro-expression datasets, SMIC and CASME II, both with baseline results released using the widely used dynamic texture descriptor LBP-TOP for feature extraction. Although LBP-TOP is popular and widely used, it is still not compact enough. In this paper, we draw further inspiration from the concept of LBP-TOP that considers three orthogonal planes by proposing two efficient approaches for feature extraction. The compact robust form described by the proposed LBP-Six Intersection Points (SIP) and a super-compact LBP-Three Mean Orthogonal Planes (MOP) not only preserves the essential patterns, but also reduces the redundancy that affects the discriminability of the encoded features. Through a comprehensive set of experiments, we demonstrate the strengths of our approaches in terms of recognition accuracy and efficiency. PMID:25993498

  2. Annotation of phenotypic diversity: decoupling data curation and ontology curation using Phenex.

    PubMed

    Balhoff, James P; Dahdul, Wasila M; Dececchi, T Alexander; Lapp, Hilmar; Mabee, Paula M; Vision, Todd J

    2014-01-01

    Phenex (http://phenex.phenoscape.org/) is a desktop application for semantically annotating the phenotypic character matrix datasets common in evolutionary biology. Since its initial publication, we have added new features that address several major bottlenecks in the efficiency of the phenotype curation process: allowing curators during the data curation phase to provisionally request terms that are not yet available from a relevant ontology; supporting quality control against annotation guidelines to reduce later manual review and revision; and enabling the sharing of files for collaboration among curators. We decoupled data annotation from ontology development by creating an Ontology Request Broker (ORB) within Phenex. Curators can use the ORB to request a provisional term for use in data annotation; the provisional term can be automatically replaced with a permanent identifier once the term is added to an ontology. We added a set of annotation consistency checks to prevent common curation errors, reducing the need for later correction. We facilitated collaborative editing by improving the reliability of Phenex when used with online folder sharing services, via file change monitoring and continual autosave. With the addition of these new features, and in particular the Ontology Request Broker, Phenex users have been able to focus more effectively on data annotation. Phenoscape curators using Phenex have reported a smoother annotation workflow, with much reduced interruptions from ontology maintenance and file management issues.

  3. Association of mammographic image feature change and an increasing risk trend of developing breast cancer: an assessment

    NASA Astrophysics Data System (ADS)

    Tan, Maxine; Leader, Joseph K.; Liu, Hong; Zheng, Bin

    2015-03-01

    We recently investigated a new mammographic image feature based risk factor to predict near-term breast cancer risk after a woman has a negative mammographic screening. We hypothesized that, unlike conventional epidemiology-based long-term (or lifetime) risk factors, the mammographic image feature based risk factor value will increase as the time lag between the negative and positive mammography screenings decreases. The purpose of this study is to test this hypothesis. From a large and diverse full-field digital mammography (FFDM) image database with 1278 cases, we collected all available sequential FFDM examinations for each case, including the "current" and the 1 to 3 most recent "prior" examinations. All "prior" examinations were interpreted as negative, and "current" ones were either malignant or recalled negative/benign. We computed 92 global mammographic texture and density based features, and included three clinical risk factors (woman's age, family history, and subjective breast density BI-RADS ratings). On this initial feature set, we applied a fast and accurate Sequential Forward Floating Selection (SFFS) feature selection algorithm to reduce feature dimensionality. The features computed on the two mammographic views were trained separately using two artificial neural network (ANN) classifiers. The classification scores of the two ANNs were then merged with a sequential ANN. The results show that the maximum adjusted odds ratios were 5.59, 7.98, and 15.77 when using the 3rd, 2nd, and 1st "prior" FFDM examinations, respectively, which demonstrates a higher association between mammographic image feature change and an increasing risk trend of developing breast cancer in the near term after a negative screening.
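
    A sketch of how the SFFS step could be applied, using the mlxtend SequentialFeatureSelector with floating selection enabled; the classifier, feature matrix, and target dimensionality shown are placeholders, not the study's actual pipeline.

```python
# One way to apply Sequential Forward Floating Selection (SFFS) to a set of
# texture/density plus clinical features; a sketch using mlxtend, not the
# authors' implementation. X and y below are synthetic placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from mlxtend.feature_selection import SequentialFeatureSelector as SFS

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 95))          # placeholder for the 92 + 3 features
y = rng.integers(0, 2, size=200)        # placeholder labels

sffs = SFS(LogisticRegression(max_iter=1000),
           k_features=10,               # target dimensionality (illustrative)
           forward=True, floating=True, # "floating" allows conditional exclusion
           scoring="roc_auc", cv=5)
sffs = sffs.fit(X, y)
print(sffs.k_feature_idx_)              # indices of the selected features
```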

  4. Setting conservation targets for sandy beach ecosystems

    NASA Astrophysics Data System (ADS)

    Harris, Linda; Nel, Ronel; Holness, Stephen; Sink, Kerry; Schoeman, David

    2014-10-01

    Representative and adequate reserve networks are key to conserving biodiversity. This raises the question of how much of which features needs to be placed in protected areas. Setting specifically-derived conservation targets for most ecosystems is common practice; however, this has never been done for sandy beaches. The aims of this paper, therefore, are to propose a methodology for setting conservation targets for sandy beach ecosystems, and to pilot the proposed method using data describing biodiversity patterns and processes from microtidal beaches in South Africa. First, a classification scheme of valued features of beaches is constructed, including: biodiversity features; unique features; and important processes. Second, methodologies for setting targets for each feature under different data-availability scenarios are described. From this framework, targets are set for features characteristic of microtidal beaches in South Africa, as follows. 1) Targets for dune vegetation types were adopted from a previous assessment, and ranged 19-100%. 2) Targets for beach morphodynamic types (habitats) were set using species-area relationships (SARs). These SARs were derived from species richness data from 142 sampling events around the South African coast (extrapolated to total theoretical species richness estimates using previously-established species-accumulation curve relationships), plotted against the area of the beach (calculated from Google Earth imagery). The species-accumulation factor (z) was 0.22, suggesting a baseline habitat target of 27% is required to protect 75% of the species. This baseline target was modified by heuristic principles, based on habitat rarity and threat status, with final values ranging 27-40%. 3) Species targets were fixed at 20%, modified using heuristic principles based on endemism, threat status, and whether or not beaches play an important role in the species' life history, with targets ranging 20-100%. 4) Targets for processes and 5) important assemblages were set at 50%, following other studies. 6) Finally, a target for an outstanding feature (the Alexandria dunefield) was set at 80% because of its national, international and ecological importance. The greatest shortfall in the current target-setting process is the lack of empirical models describing the key beach processes, from which robust ecological thresholds could be derived. As in many other studies, our results illustrate that the conservation target of 10% for coastal and marine systems proposed by the Convention on Biological Diversity is too low to conserve sandy beaches and their biota.
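
    As a quick check of the species-area arithmetic quoted above (our notation, assuming the standard power-law SAR): with z = 0.22, representing 75% of the species implies protecting roughly 27% of the habitat area.

```latex
\[
S = cA^{z}
\;\Rightarrow\;
\frac{S_{\text{prot}}}{S_{\text{tot}}}
  = \left(\frac{A_{\text{prot}}}{A_{\text{tot}}}\right)^{z}
\;\Rightarrow\;
\frac{A_{\text{prot}}}{A_{\text{tot}}}
  = 0.75^{1/0.22} \approx 0.27 .
\]
```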

  5. Reproducibility and Prognosis of Quantitative Features Extracted from CT Images

    PubMed Central

    Balagurunathan, Yoganand; Gu, Yuhua; Wang, Hua; Kumar, Virendra; Grove, Olya; Hawkins, Sam; Kim, Jongphil; Goldgof, Dmitry B; Hall, Lawrence O; Gatenby, Robert A; Gillies, Robert J

    2014-01-01

    We study the reproducibility of quantitative imaging features that are used to describe tumor shape, size, and texture from computed tomography (CT) scans of non-small cell lung cancer (NSCLC). CT images are dependent on various scanning factors. We focus on characterizing image features that are reproducible in the presence of variations due to patient factors and segmentation methods. Thirty-two NSCLC nonenhanced lung CT scans were obtained from the Reference Image Database to Evaluate Response data set. The tumors were segmented using both manual (radiologist expert) and ensemble (software-automated) methods. A set of features (219 three-dimensional and 110 two-dimensional) was computed, and quantitative image features were statistically filtered to identify a subset of reproducible and nonredundant features. The variability in the repeated experiment was measured by the test-retest concordance correlation coefficient (CCC_TreT). The natural range in the features, normalized to variance, was measured by the dynamic range (DR). In this study, there were 29 features across segmentation methods found with CCC_TreT and DR ≥ 0.9 and R²_Bet ≥ 0.95. These reproducible features were tested for predicting the radiologist prognostic score; some texture features (run-length and Laws kernels) had an area under the curve of 0.9. The representative features were tested for their prognostic capabilities using an independent NSCLC data set (59 lung adenocarcinomas), where one of the texture features, run-length gray-level nonuniformity, was statistically significant in separating the samples into survival groups (P ≤ .046). PMID:24772210

  6. Resolving Transition Metal Chemical Space: Feature Selection for Machine Learning and Structure-Property Relationships.

    PubMed

    Janet, Jon Paul; Kulik, Heather J

    2017-11-22

    Machine learning (ML) of quantum mechanical properties shows promise for accelerating chemical discovery. For transition metal chemistry where accurate calculations are computationally costly and available training data sets are small, the molecular representation becomes a critical ingredient in ML model predictive accuracy. We introduce a series of revised autocorrelation functions (RACs) that encode relationships of the heuristic atomic properties (e.g., size, connectivity, and electronegativity) on a molecular graph. We alter the starting point, scope, and nature of the quantities evaluated in standard ACs to make these RACs amenable to inorganic chemistry. On an organic molecule set, we first demonstrate superior standard AC performance to other presently available topological descriptors for ML model training, with mean unsigned errors (MUEs) for atomization energies on set-aside test molecules as low as 6 kcal/mol. For inorganic chemistry, our RACs yield 1 kcal/mol ML MUEs on set-aside test molecules in spin-state splitting in comparison to 15-20× higher errors for feature sets that encode whole-molecule structural information. Systematic feature selection methods including univariate filtering, recursive feature elimination, and direct optimization (e.g., random forest and LASSO) are compared. Random-forest- or LASSO-selected subsets 4-5× smaller than the full RAC set produce sub- to 1 kcal/mol spin-splitting MUEs, with good transferability to metal-ligand bond length prediction (0.004-5 Å MUE) and redox potential on a smaller data set (0.2-0.3 eV MUE). Evaluation of feature selection results across property sets reveals the relative importance of local, electronic descriptors (e.g., electronegativity, atomic number) in spin-splitting and distal, steric effects in redox potential and bond lengths.
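
    A hedged sketch of two of the direct-optimization selectors mentioned above (random-forest importances and LASSO), using scikit-learn on a placeholder descriptor matrix; it shows the mechanics only, not the paper's actual RAC features or targets.

```python
# Sketch of random-forest and LASSO feature selection on a placeholder descriptor
# matrix X (molecules x features) and target y (e.g. a spin-splitting energy).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 150))                 # placeholder descriptors
y = X[:, :5] @ rng.normal(size=5) + 0.1 * rng.normal(size=300)

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
rf_subset = np.argsort(rf.feature_importances_)[::-1][:30]   # top 30 by importance

lasso = LassoCV(cv=5, random_state=0).fit(X, y)
lasso_subset = np.flatnonzero(lasso.coef_)                   # nonzero-coefficient features

print(len(rf_subset), len(lasso_subset))
```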

  7. Set of Frequent Word Item sets as Feature Representation for Text with Indonesian Slang

    NASA Astrophysics Data System (ADS)

    Sa'adillah Maylawati, Dian; Putri Saptawati, G. A.

    2017-01-01

    Indonesian slang is commonly used in social media. Due to its unstructured syntax, it is difficult to extract features based on Indonesian grammar for text mining. To address this, we propose the Set of Frequent Word Item sets (SFWI) as a text representation that is considered a good match for Indonesian slang. In addition, SFWI is able to keep the meaning of Indonesian slang by respecting the order in which words appear in a sentence. We use the FP-Growth algorithm, with a sentence-separation function added to it, to extract the SFWI features. The experiments were done with text data from social media such as Facebook, Twitter, and personal websites. The results show that Indonesian slang was interpreted more correctly based on SFWI.
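
    An illustrative sketch of the frequent-item-set step, treating each sentence as one transaction and mining it with FP-Growth via mlxtend; the sentence-separation function described above is only approximated here, and the example transactions and support threshold are hypothetical.

```python
# Sketch of extracting frequent word item sets with FP-Growth (the SFWI building
# block); illustrative data, not the paper's corpus or exact algorithm variant.
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth
import pandas as pd

sentences = [                       # each sentence = one transaction of word items
    ["gue", "males", "banget", "kuliah"],
    ["gue", "males", "kerja"],
    ["dia", "males", "banget"],
]
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(sentences).transform(sentences), columns=te.columns_)
frequent = fpgrowth(onehot, min_support=0.6, use_colnames=True)
print(frequent)                     # frequent word item sets with their support
```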

  8. Achieving reutilization of scheduling software through abstraction and generalization

    NASA Technical Reports Server (NTRS)

    Wilkinson, George J.; Monteleone, Richard A.; Weinstein, Stuart M.; Mohler, Michael G.; Zoch, David R.; Tong, G. Michael

    1995-01-01

    Reutilization of software is a difficult goal to achieve, particularly in complex environments that require advanced software systems. The Request-Oriented Scheduling Engine (ROSE) was developed to create a reusable scheduling system for the diverse scheduling needs of the National Aeronautics and Space Administration (NASA). ROSE is a data-driven scheduler that accepts inputs such as user activities, available resources, timing constraints, and user-defined events, and then produces a conflict-free schedule. To support reutilization, ROSE is designed to be flexible, extensible, and portable. With these design features, applying ROSE to a new scheduling application does not require changing the core scheduling engine, even if the new application requires significantly larger or smaller data sets, customized scheduling algorithms, or software portability. This paper includes a ROSE scheduling system description emphasizing its general-purpose features, reutilization techniques, and tasks for which ROSE reuse provided a low-risk solution with significant cost savings and reduced software development time.

  9. Preliminary Planck constant measurements via UME oscillating magnet Kibble balance

    NASA Astrophysics Data System (ADS)

    Ahmedov, H.; Babayiğit Aşkın, N.; Korutlu, B.; Orhan, R.

    2018-06-01

    The UME Kibble balance project was initiated in the second half of 2014. During this period we have studied the theoretical aspects of Kibble balances, in which an oscillating magnet generates AC Faraday’s voltage in a stationary coil, and constructed a trial version to implement this idea. The remarkable feature of this approach is that it can establish the link between the Planck constant and a macroscopic mass by one single experiment in the most natural way. Weak dependences on variations of environmental and experimental conditions, small size, and other useful features offered by this novel approach reduce the complexity of the experimental set-up. This paper describes the principles of the oscillating magnet Kibble balance and gives details of the preliminary Planck constant measurements. The value of the Planck constant determined with our apparatus is h/h_90 = 1.000 004, with a relative standard uncertainty of 6 ppm.

  10. Use of volumetric features for temporal comparison of mass lesions in full field digital mammograms

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bozek, Jelena, E-mail: jelena.bozek@fer.hr; Grgic, Mislav; Kallenberg, Michiel

    2014-02-15

    Purpose: Temporal comparison of lesions might improve classification between benign and malignant lesions in full-field digital mammograms (FFDM). The authors compare the use of volumetric features for lesion classification, which are computed from dense tissue thickness maps, to the use of mammographic lesion area. Use of dense tissue thickness maps for lesion characterization is advantageous, since it results in lesion features that are invariant to acquisition parameters. Methods: The dataset used in the analysis consisted of 60 temporal mammogram pairs comprising 120 mediolateral oblique or craniocaudal views with a total of 65 lesions, of which 41 were benign and 24 malignant. The authors analyzed the performance of four volumetric features, area, and four other commonly used features obtained from temporal mammogram pairs, current mammograms, and prior mammograms. The authors evaluated the individual performance of all features and of different feature sets. The authors used linear discriminant analysis with leave-one-out cross validation to classify different feature sets. Results: Volumetric features from temporal mammogram pairs achieved the best individual performance, as measured by the area under the receiver operating characteristic curve (A_z value). Volume change (A_z = 0.88) achieved a higher A_z value than projected lesion area change (A_z = 0.78) in the temporal comparison of lesions. Best performance was achieved with a set that consisted of features extracted from the current exam combined with four volumetric features representing changes with respect to the prior mammogram (A_z = 0.90). This was significantly better (p = 0.005) than the performance obtained using features from the current exam only (A_z = 0.77). Conclusions: Volumetric features from temporal mammogram pairs combined with features from the single exam significantly improve discrimination of benign and malignant lesions in FFDM mammograms compared to using only single exam features. In the comparison with prior mammograms, use of volumetric change may lead to better performance than use of lesion area change.
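
    A minimal sketch of the evaluation protocol described above (linear discriminant analysis scored with leave-one-out cross-validation and an A_z/AUC value), using scikit-learn on placeholder features; it is not the authors' code or data.

```python
# LDA with leave-one-out cross-validation, scored as A_z (area under ROC curve).
# Feature matrix X (lesions x features) and labels y (0 = benign, 1 = malignant)
# are synthetic placeholders sized like the data set described above.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(65, 8))            # e.g. current-exam + volumetric-change features
y = np.r_[np.zeros(41, int), np.ones(24, int)]

scores = cross_val_predict(LinearDiscriminantAnalysis(), X, y,
                           cv=LeaveOneOut(), method="decision_function")
print("A_z (AUC):", roc_auc_score(y, scores))
```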

  11. Mass detection in digital breast tomosynthesis: Deep convolutional neural network with transfer learning from mammography.

    PubMed

    Samala, Ravi K; Chan, Heang-Ping; Hadjiiski, Lubomir; Helvie, Mark A; Wei, Jun; Cha, Kenny

    2016-12-01

    The authors developed a computer-aided detection (CAD) system for masses in digital breast tomosynthesis (DBT) volumes using a deep convolutional neural network (DCNN) with transfer learning from mammograms. A data set containing 2282 digitized film and digital mammograms and 324 DBT volumes was collected with IRB approval. The mass of interest on the images was marked by an experienced breast radiologist as the reference standard. The data set was partitioned into a training set (2282 mammograms with 2461 masses and 230 DBT views with 228 masses) and an independent test set (94 DBT views with 89 masses). For DCNN training, the region of interest (ROI) containing the mass (true positive) was extracted from each image. False positive (FP) ROIs were identified at prescreening by their previously developed CAD systems. After data augmentation, a total of 45,072 mammographic ROIs and 37,450 DBT ROIs were obtained. Data normalization and reduction of non-uniformity in the ROIs across the heterogeneous data were achieved using a background correction method applied to each ROI. A DCNN with four convolutional layers and three fully connected (FC) layers was first trained on the mammography data. Jittering and dropout techniques were used to reduce overfitting. After training with the mammographic ROIs, all weights in the first three convolutional layers were frozen, and only the last convolution layer and the FC layers were randomly initialized again and trained using the DBT training ROIs. The authors compared the performances of two CAD systems for mass detection in DBT: one used the DCNN-based approach and the other used their previously developed feature-based approach for FP reduction. The prescreening stage was identical in both systems, passing the same set of mass candidates to the FP reduction stage. For the feature-based CAD system, a 3D clustering and active contour method was used for segmentation; morphological, gray level, and texture features were extracted and merged with a linear discriminant classifier to score the detected masses. For the DCNN-based CAD system, ROIs from five consecutive slices centered at each candidate were passed through the trained DCNN and a mass likelihood score was generated. The performances of the CAD systems were evaluated using free-response ROC curves and the performance difference was analyzed using a non-parametric method. Before transfer learning, the DCNN trained only on mammograms with an AUC of 0.99 classified DBT masses with an AUC of 0.81 in the DBT training set. After transfer learning with DBT, the AUC improved to 0.90. For breast-based CAD detection in the test set, the sensitivity of the feature-based and the DCNN-based CAD systems was 83% and 91%, respectively, at 1 FP/DBT volume. The difference between the performances of the two systems was statistically significant (p-value < 0.05). The image patterns learned from the mammograms were transferred to mass detection on DBT slices through the DCNN. This study demonstrated that large data sets collected from mammography are useful for developing new CAD systems for DBT, alleviating the problem and effort of collecting entirely new large data sets for the new modality.
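
    A sketch of the layer-freezing transfer-learning step described above, written in PyTorch with a stand-in network of four convolutional and three fully connected layers; the architecture, checkpoint name, and hyperparameters are illustrative assumptions, not the authors' DCNN.

```python
# Freeze the first three conv blocks pretrained on mammograms, then re-initialize
# and retrain the last conv layer and the FC layers on DBT ROIs. Stand-in network.
import torch
import torch.nn as nn

class SmallDCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
        )
        self.fc = nn.Sequential(
            nn.Flatten(), nn.Linear(64 * 4 * 4, 256), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 1),
        )

    def forward(self, x):
        return self.fc(self.conv(x))

model = SmallDCNN()
# model.load_state_dict(torch.load("mammo_pretrained.pt"))  # hypothetical checkpoint

# Freeze the first three conv blocks (modules 0-8 of the Sequential container).
for p in model.conv[:9].parameters():
    p.requires_grad = False

# Re-initialize the last conv layer and all FC layers before DBT fine-tuning.
def reinit(m):
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        m.reset_parameters()
model.conv[9].apply(reinit)
model.fc.apply(reinit)

optimizer = torch.optim.SGD((p for p in model.parameters() if p.requires_grad),
                            lr=1e-3, momentum=0.9)
```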

  12. Complex Topographic Feature Ontology Patterns

    USGS Publications Warehouse

    Varanka, Dalia E.; Jerris, Thomas J.

    2015-01-01

    Semantic ontologies are examined as effective data models for the representation of complex topographic feature types. Complex feature types are viewed as integrated relations between basic features for a basic purpose. In the context of topographic science, such component assemblages are supported by resource systems and found on the local landscape. Ontologies are organized within six thematic modules of a domain ontology called Topography that includes within its sphere basic feature types, resource systems, and landscape types. Context is constructed not only as a spatial and temporal setting, but a setting also based on environmental processes. Types of spatial relations that exist between components include location, generative processes, and description. An example is offered in a complex feature type ‘mine.’ The identification and extraction of complex feature types are an area for future research.

  13. Clinical Decision Support in Electronic Prescribing: Recommendations and an Action Plan

    PubMed Central

    Teich, Jonathan M.; Osheroff, Jerome A.; Pifer, Eric A.; Sittig, Dean F.; Jenders, Robert A.

    2005-01-01

    Clinical decision support (CDS) in electronic prescribing (eRx) systems can improve the safety, quality, efficiency, and cost-effectiveness of care. However, at present, these potential benefits have not been fully realized. In this consensus white paper, we set forth recommendations and action plans in three critical domains: (1) advances in system capabilities, including basic and advanced sets of CDS interventions and knowledge, supporting database elements, operational features to improve usability and measure performance, and management and governance structures; (2) uniform standards, vocabularies, and centralized knowledge structures and services that could reduce rework by vendors and care providers, improve dissemination of well-constructed CDS interventions, promote generally applicable research in CDS methods, and accelerate the movement of new medical knowledge from research to practice; and (3) appropriate financial and legal incentives to promote adoption. PMID:15802474

  14. Robust tumor morphometry in multispectral fluorescence microscopy

    NASA Astrophysics Data System (ADS)

    Tabesh, Ali; Vengrenyuk, Yevgen; Teverovskiy, Mikhail; Khan, Faisal M.; Sapir, Marina; Powell, Douglas; Mesa-Tejada, Ricardo; Donovan, Michael J.; Fernandez, Gerardo

    2009-02-01

    Morphological and architectural characteristics of primary tissue compartments, such as epithelial nuclei (EN) and cytoplasm, provide important cues for cancer diagnosis, prognosis, and therapeutic response prediction. We propose two feature sets for the robust quantification of these characteristics in multiplex immunofluorescence (IF) microscopy images of prostate biopsy specimens. To enable feature extraction, EN and cytoplasm regions were first segmented from the IF images. Then, feature sets consisting of the characteristics of the minimum spanning tree (MST) connecting the EN and the fractal dimension (FD) of gland boundaries were obtained from the segmented compartments. We demonstrated the utility of the proposed features in prostate cancer recurrence prediction on a multi-institution cohort of 1027 patients. Univariate analysis revealed that both FD and one of the MST features were highly effective for predicting cancer recurrence (p <= 0.0001). In multivariate analysis, an MST feature was selected for a model incorporating clinical and image features. The model achieved a concordance index (CI) of 0.73 on the validation set, which was significantly higher than the CI of 0.69 for the standard multivariate model based solely on clinical features currently used in clinical practice (p < 0.0001). The contributions of this work are twofold. First, it is the first demonstration of the utility of the proposed features in morphometric analysis of IF images. Second, this is the largest scale study of the efficacy and robustness of the proposed features in prostate cancer prognosis.
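
    A sketch of how MST-based architectural features of the epithelial nuclei could be computed, using scipy's minimum spanning tree over placeholder centroid coordinates; the specific summary statistics shown are illustrative, not necessarily those used in the study.

```python
# Build the minimum spanning tree over epithelial-nucleus centroids and summarize
# its edge lengths. Centroids are placeholders for the segmented EN compartment.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

rng = np.random.default_rng(0)
centroids = rng.uniform(0, 500, size=(80, 2))       # hypothetical EN centroids (pixels)

dist = squareform(pdist(centroids))                 # dense pairwise distance matrix
mst = minimum_spanning_tree(dist)                   # sparse matrix of MST edges
edges = mst.data                                    # lengths of the N-1 tree edges

features = {
    "mst_mean_edge": edges.mean(),
    "mst_std_edge": edges.std(),
    "mst_total_length": edges.sum(),
}
print(features)
```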

  15. Computer-aided diagnosis of prostate cancer using multi-parametric MRI: comparison between PUN and Tofts models

    NASA Astrophysics Data System (ADS)

    Mazzetti, S.; Giannini, V.; Russo, F.; Regge, D.

    2018-05-01

    Computer-aided diagnosis (CAD) systems are increasingly being used in clinical settings to report multi-parametric magnetic resonance imaging (mp-MRI) of the prostate. Usually, CAD systems automatically highlight cancer-suspicious regions to the radiologist, reducing reader variability and interpretation errors. Nevertheless, implementing this software requires the selection of which mp-MRI parameters can best discriminate between malignant and non-malignant regions. To exploit functional information, some parameters are derived from dynamic contrast-enhanced (DCE) acquisitions. In particular, much CAD software employs pharmacokinetic features, such as K^trans and k_ep, derived from the Tofts model, to estimate a likelihood map of malignancy. However, non-pharmacokinetic models can also be used to describe DCE-MRI curves, without any requirement for prior knowledge or measurement of the arterial input function, which could potentially lead to large errors in parameter estimation. In this work, we implemented an empirical function derived from the phenomenological universalities (PUN) class to fit DCE-MRI. The parameters of the PUN model are used in combination with T2-weighted and diffusion-weighted acquisitions to feed a support vector machine classifier to produce a voxel-wise malignancy likelihood map of the prostate. The results were all compared to those for a CAD system based on Tofts pharmacokinetic features to describe DCE-MRI curves, using different quality aspects of image segmentation, while also evaluating the number and size of false positive (FP) candidate regions. This study included 61 patients with 70 biopsy-proven prostate cancers (PCa). The metrics used to evaluate segmentation quality between the two CAD systems were not statistically different, although the PUN-based CAD reported a lower number of FPs, with reduced size compared to the Tofts-based CAD. In conclusion, the CAD software based on PUN parameters is a feasible means with which to detect PCa without affecting segmentation quality, and hence it could be successfully applied in clinical settings, improving the automated diagnosis process and reducing computational complexity.

  16. Two-speed phacoemulsification for soft cataracts using optimized parameters and procedure step toolbar with the CENTURION Vision System and Balanced Tip

    PubMed Central

    Davison, James A

    2015-01-01

    Purpose: To present a cause of posterior capsule aspiration and a technique using optimized parameters to prevent it from happening when operating on soft cataracts. Patients and methods: A prospective list of posterior capsule aspiration cases was kept over 4,062 consecutive cases operated with the Alcon CENTURION machine and Balanced Tip. Video analysis of one case of posterior capsule aspiration was performed. A surgical technique was developed using empirically derived machine parameters and a customized setting-selection procedure step toolbar to reduce the pace of aspiration of soft nuclear quadrants in order to prevent capsule aspiration. Results: Two cases out of 3,238 experienced posterior capsule aspiration before use of the soft quadrant technique. Video analysis showed an attractive vortex effect with capsule aspiration occurring in 1/5 of a second. A soft quadrant removal setting was empirically derived which had a slower pace and seemed more controlled, with no capsule aspiration occurring in the subsequent 824 cases. The setting featured simultaneous linear control from zero to preset maximums for: aspiration flow, 20 mL/min; and vacuum, 400 mmHg, with the addition of torsional tip amplitude up to 20% after the fluidic maximums were achieved. A new setting-selection procedure step toolbar was created to increase intraoperative flexibility by providing instantaneous shifting between the soft and normal settings. Conclusion: A technique incorporating a reduced pace for soft quadrant acquisition and aspiration can be accomplished through the use of a dedicated setting of integrated machine parameters. Toolbar placement of the procedure button next to the normal setting procedure button provides the opportunity to instantaneously alternate between the two settings. Simultaneous surgeon control over vacuum, aspiration flow, and torsional tip motion may make removal of soft nuclear quadrants more efficient and safer. PMID:26355695

  17. Cross-Layer Protocol Combining Tree Routing and TDMA Slotting in Wireless Sensor Networks

    NASA Astrophysics Data System (ADS)

    Bai, Ronggang; Ji, Yusheng; Lin, Zhiting; Wang, Qinghua; Zhou, Xiaofang; Qu, Yugui; Zhao, Baohua

    Unlike in other networks, the load and direction of data traffic in wireless sensor networks are rather predictable, and the relationships between nodes are cooperative rather than competitive. These features allow the protocol stack to be designed in a cross-layer interactive way instead of with a hierarchical structure. The proposed cross-layer protocol CLWSN optimizes channel allocation in the MAC layer using information from the routing tables, reduces the conflicting set, and improves the throughput. Simulations revealed that it outperforms SMAC and MINA in terms of delay and energy consumption.

  18. Results of acoustic testing of the JT8D-109 refan engines

    NASA Technical Reports Server (NTRS)

    Burdsall, E. A.; Brochu, F. P.; Scaramella, V. M.

    1975-01-01

    A JT8D engine was modified to reduce jet noise levels by 6-8 PNdB at takeoff power without increasing fan generated noise levels. Designated the JT8D-109, the modified engines featured a larger single stage fan, and acoustic treatment in the fan discharge ducts. Noise levels were measured on an outdoor test facility for eight engine/acoustic treatment configurations. Compared to the baseline JT8D, the fully treated JT8D-109 showed reductions of 6 PNdB at takeoff, and 11 PNdB at a typical approach power setting.

  19. Living in cities, naturally.

    PubMed

    Hartig, Terry; Kahn, Peter H

    2016-05-20

    Natural features, settings, and processes in urban areas can help to reduce the stress associated with urban life. In this and other ways, public health benefits from street trees, green roofs, community gardens, parks and open spaces, and extensive connective pathways for walking and biking. Such urban design provisions can also yield ecological benefits, not only directly but also through the role they play in shaping attitudes toward the environment and environmental protection. Knowledge of the psychological benefits of nature experience supports efforts to better integrate nature into the architecture, infrastructure, and public spaces of urban areas.

  20. Perfection and Entry: An Example,

    DTIC Science & Technology

    1983-07-01

    H([p, a]) = (1 − δ) Σ_{t=1}^{∞} δ^{t−1} h(p(t), a(t))   (2). The stationary equilibrium outcomes of such a game are those strategy ... does not reduce the set of equilibrium outcomes in the discounted game. The essential features of the market situation required to produce the phe...

  1. Corresponding-states behavior of SPC/E-based modified (bent and hybrid) water models

    NASA Astrophysics Data System (ADS)

    Weiss, Volker C.

    2017-02-01

    The remarkable and sometimes anomalous properties of water can be traced back at the molecular level to the tetrahedral coordination of molecules due to the ability of a water molecule to form four hydrogen bonds to its neighbors; this feature allows for the formation of a network that greatly influences the thermodynamic behavior. Computer simulations are becoming increasingly important for our understanding of water. Molecular models of water, such as SPC/E, are needed for this purpose, and they have proved to capture many important features of real water. Modifications of the SPC/E model have been proposed, some changing the H-O-H angle (bent models) and others increasing the importance of dispersion interactions (hybrid models), to study the structural features that set water apart from other polar fluids and from simple fluids such as argon. Here, we focus on the properties at liquid-vapor equilibrium and study the coexistence curve, the interfacial tension, and the vapor pressure in a corresponding-states approach. In particular, we calculate Guggenheim's ratio for the reduced apparent enthalpy of vaporization and Guldberg's ratio for the reduced normal boiling point. This analysis offers additional insight from a more macroscopic, thermodynamic perspective and augments that which has already been learned at the molecular level from simulations. In the hybrid models, the relative importance of dispersion interactions is increased, which turns the modified water into a Lennard-Jones-like fluid. Consequently, in a corresponding-states framework, the typical behavior of simple fluids, such as argon, is seen to be approached asymptotically. For the bent models, decreasing the bond angle turns the model essentially into a polar diatomic fluid in which the particles form linear molecular arrangements; as a consequence, characteristic features of the corresponding-states behavior of hydrogen halides emerge.

  2. Features extraction of EMG signal using time domain analysis for arm rehabilitation device

    NASA Astrophysics Data System (ADS)

    Jali, Mohd Hafiz; Ibrahim, Iffah Masturah; Sulaima, Mohamad Fani; Bukhari, W. M.; Izzuddin, Tarmizi Ahmad; Nasir, Mohamad Na'im

    2015-05-01

    A rehabilitation device is used as an exoskeleton for people who have lost the function of a limb. An arm rehabilitation device may help in rehabilitation programs for people who suffer from arm disability. The device used to facilitate the tasks of the program should improve the electrical activity in the motor unit and minimize the mental effort of the user. Electromyography (EMG) is a technique for analyzing the presence of electrical activity in musculoskeletal systems. In a disabled person, the electrical activity in the muscles fails to contract the muscle for movement. In order to prevent paralyzed muscles from becoming spastic, the movements should be achieved with minimal mental effort. Therefore, the rehabilitation device should be based on an analysis of the surface EMG signals of able-bodied people, which can then be implemented in the device. The signals were collected according to the procedure of surface electromyography for non-invasive assessment of muscles (SENIAM). The EMG signal is used to set the movement patterns of the arm rehabilitation device. The filtered EMG signal was processed to extract the time-domain features Standard Deviation (STD), Mean Absolute Value (MAV), and Root Mean Square (RMS). Extracting the EMG features is important to obtain a reduced feature vector with less error. In order to determine the best features for each movement, several extraction trials are performed and the features with the smallest errors are selected. The most accurate features can then be used in future work on real-time rehabilitation control.
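
    A minimal numpy sketch of the three time-domain features named above (STD, MAV, RMS), computed over sliding windows of a filtered EMG signal; the window length, step, and synthetic signal are illustrative choices.

```python
# Compute per-window STD, MAV, and RMS of a (pre-filtered) surface-EMG signal.
import numpy as np

def emg_time_domain_features(x, win=200, step=100):
    """Return an (n_windows, 3) array of [STD, MAV, RMS] per analysis window."""
    feats = []
    for start in range(0, len(x) - win + 1, step):
        w = x[start:start + win]
        feats.append([
            np.std(w),                 # Standard Deviation (STD)
            np.mean(np.abs(w)),        # Mean Absolute Value (MAV)
            np.sqrt(np.mean(w ** 2)),  # Root Mean Square (RMS)
        ])
    return np.asarray(feats)

# Example on a synthetic burst-like signal standing in for a recorded channel.
rng = np.random.default_rng(0)
emg = rng.normal(scale=0.1, size=2000)
emg[800:1200] += rng.normal(scale=0.8, size=400)   # simulated contraction burst
print(emg_time_domain_features(emg).shape)
```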

  3. Coding visual features extracted from video sequences.

    PubMed

    Baroffio, Luca; Cesana, Matteo; Redondi, Alessandro; Tagliasacchi, Marco; Tubaro, Stefano

    2014-05-01

    Visual features are successfully exploited in several applications (e.g., visual search, object recognition and tracking, etc.) due to their ability to efficiently represent image content. Several visual analysis tasks require features to be transmitted over a bandwidth-limited network, thus calling for coding techniques to reduce the required bit budget, while attaining a target level of efficiency. In this paper, we propose, for the first time, a coding architecture designed for local features (e.g., SIFT, SURF) extracted from video sequences. To achieve high coding efficiency, we exploit both spatial and temporal redundancy by means of intraframe and interframe coding modes. In addition, we propose a coding mode decision based on rate-distortion optimization. The proposed coding scheme can be conveniently adopted to implement the analyze-then-compress (ATC) paradigm in the context of visual sensor networks. That is, sets of visual features are extracted from video frames, encoded at remote nodes, and finally transmitted to a central controller that performs visual analysis. This is in contrast to the traditional compress-then-analyze (CTA) paradigm, in which video sequences acquired at a node are compressed and then sent to a central unit for further processing. In this paper, we compare these coding paradigms using metrics that are routinely adopted to evaluate the suitability of visual features in the context of content-based retrieval, object recognition, and tracking. Experimental results demonstrate that, thanks to the significant coding gains achieved by the proposed coding scheme, ATC outperforms CTA with respect to all evaluation metrics.

  4. Diagnosis of multiple sclerosis from EEG signals using nonlinear methods.

    PubMed

    Torabi, Ali; Daliri, Mohammad Reza; Sabzposhan, Seyyed Hojjat

    2017-12-01

    EEG signals contain essential and important information about the brain and neural diseases. The main purpose of this study is to classify healthy volunteers and Multiple Sclerosis (MS) patients using nonlinear features of EEG signals recorded while performing cognitive tasks. EEG signals were recorded while users performed two different attentional tasks. One task was based on detecting a desired change in color luminance, and the other on detecting a desired change in direction of motion. EEG signals were analyzed in two ways: analysis of the EEG signals without rhythm decomposition and analysis of the EEG sub-bands. After recording and preprocessing, the time-delay embedding method was used for state-space reconstruction; embedding parameters were determined for the original signals and their sub-bands. Nonlinear methods were then used in the feature extraction phase. To reduce the feature dimension, scalar feature selection was done using the T-test and Bhattacharyya criteria. The data were then classified using linear support vector machines (SVM) and the k-nearest neighbor (KNN) method. The best combination of criterion and classifier was determined for each task by comparing performances. For both tasks, the best results were achieved using the T-test criterion and the SVM classifier. For the direction-based and color-luminance-based tasks, the maximum classification performances were 93.08% and 79.79%, respectively, reached by using the optimal set of features. Our results show that the nonlinear dynamic features of EEG signals seem to be useful and effective in MS diagnosis.
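
    A minimal sketch of the time-delay embedding used for state-space reconstruction, applied here to a synthetic signal; the embedding dimension and delay are illustrative, not the values determined in the study.

```python
# Takens-style time-delay embedding of a 1-D signal into delay vectors.
import numpy as np

def delay_embed(x, m=5, tau=4):
    """Return an (N - (m-1)*tau, m) matrix of delay vectors from a 1-D signal."""
    n = len(x) - (m - 1) * tau
    return np.column_stack([x[i * tau: i * tau + n] for i in range(m)])

rng = np.random.default_rng(0)
eeg = np.sin(np.linspace(0, 40 * np.pi, 4000)) + 0.1 * rng.normal(size=4000)
X = delay_embed(eeg, m=5, tau=4)
print(X.shape)   # reconstructed state-space trajectory, one row per delay vector
```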

  5. Determining the saliency of feature measurements obtained from images of sedimentary organic matter for use in its classification

    NASA Astrophysics Data System (ADS)

    Weller, Andrew F.; Harris, Anthony J.; Ware, J. Andrew; Jarvis, Paul S.

    2006-11-01

    The classification of sedimentary organic matter (OM) images can be improved by determining the saliency of the image analysis (IA) features measured from them. Knowing the saliency of IA feature measurements means that only the most significant discriminating features need be used in the classification process. This is an important consideration for classification techniques such as artificial neural networks (ANNs), where too many features can lead to the 'curse of dimensionality'. The classification scheme adopted in this work is a hybrid of morphologically and texturally descriptive features from previous manual classification schemes. Some of these descriptive features are assigned to IA features, along with several others built into the IA software (Halcon), to ensure that a valid cross-section is available. After an image is captured and segmented, a total of 194 features are measured for each particle. To reduce this number to a more manageable magnitude, the SPSS AnswerTree Exhaustive CHAID (χ² automatic interaction detector) classification tree algorithm is used to establish each measurement's saliency as a classification discriminator. For the continuous data used here, the F-test is used in place of the test in the published algorithm. The F-test checks various statistical hypotheses about the variance of groups of IA feature measurements obtained from the particles to be classified. The aim is to reduce the number of features required to perform the classification without reducing its accuracy. In the best-case scenario, 194 inputs are reduced to 8, with a subsequent multi-layer back-propagation ANN recognition rate of 98.65%.
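
    A sketch of the F-test saliency screening described above, using scipy on placeholder particle measurements: each of the 194 features is scored by a one-way ANOVA F statistic across the OM classes and the top-scoring features are retained.

```python
# Score each image-analysis feature by a one-way ANOVA F statistic across classes
# and keep the most salient ones. Data and class labels are synthetic placeholders.
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)
n_features = 194
X = rng.normal(size=(600, n_features))            # measurements per particle
y = rng.integers(0, 5, size=600)                  # hypothetical OM class labels

F = np.array([f_oneway(*(X[y == c, j] for c in np.unique(y))).statistic
              for j in range(n_features)])
salient = np.argsort(F)[::-1][:8]                 # e.g. reduce 194 inputs to 8
print(salient)
```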

  6. Learning Compact Binary Face Descriptor for Face Recognition.

    PubMed

    Lu, Jiwen; Liong, Venice Erin; Zhou, Xiuzhuang; Zhou, Jie

    2015-10-01

    Binary feature descriptors such as local binary patterns (LBP) and its variations have been widely used in many face recognition systems due to their excellent robustness and strong discriminative power. However, most existing binary face descriptors are hand-crafted, which require strong prior knowledge to engineer them by hand. In this paper, we propose a compact binary face descriptor (CBFD) feature learning method for face representation and recognition. Given each face image, we first extract pixel difference vectors (PDVs) in local patches by computing the difference between each pixel and its neighboring pixels. Then, we learn a feature mapping to project these pixel difference vectors into low-dimensional binary vectors in an unsupervised manner, where 1) the variance of all binary codes in the training set is maximized, 2) the loss between the original real-valued codes and the learned binary codes is minimized, and 3) binary codes evenly distribute at each learned bin, so that the redundancy information in PDVs is removed and compact binary codes are obtained. Lastly, we cluster and pool these binary codes into a histogram feature as the final representation for each face image. Moreover, we propose a coupled CBFD (C-CBFD) method by reducing the modality gap of heterogeneous faces at the feature level to make our method applicable to heterogeneous face recognition. Extensive experimental results on five widely used face datasets show that our methods outperform state-of-the-art face descriptors.

  7. A Feature and Algorithm Selection Method for Improving the Prediction of Protein Structural Class.

    PubMed

    Ni, Qianwu; Chen, Lei

    2017-01-01

    Correct prediction of protein structural class is beneficial to the investigation of protein functions, regulations and interactions. In recent years, several computational methods have been proposed in this regard. However, given the variety of available features, it is still a great challenge to select a proper classification algorithm and to extract the essential features to participate in classification. In this study, a feature and algorithm selection method is presented for improving the accuracy of protein structural class prediction. Amino acid compositions and physiochemical features were adopted to represent the features, and thirty-eight machine learning algorithms collected in Weka were employed. All features were first analyzed by a feature selection method, minimum redundancy maximum relevance (mRMR), producing a feature list. Then, several feature sets were constructed by adding features from the list one by one. For each feature set, the thirty-eight algorithms were executed on a dataset in which proteins were represented by the features in the set. The classes predicted by these algorithms and the true class of each protein were collected to construct a dataset, which was analyzed by the mRMR method, yielding an algorithm list. Algorithms were then taken from this list one by one to build an ensemble prediction model. Finally, we selected the ensemble prediction model with the best performance as the optimal ensemble prediction model. Experimental results indicate that the constructed model is much superior to models using a single algorithm and to models that adopt only the feature selection procedure or only the algorithm selection procedure. The feature selection and algorithm selection procedures are genuinely helpful for building an ensemble prediction model that yields better performance.
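
    A simplified greedy mRMR-style selector, for illustration only: relevance is measured by mutual information with the class and redundancy by the mean absolute correlation with already-selected features, which is a common approximation rather than necessarily the exact mRMR variant used in the study.

```python
# Greedy mRMR-style feature selection (relevance minus redundancy) on toy data.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def mrmr(X, y, k=10):
    relevance = mutual_info_classif(X, y, random_state=0)   # MI with the class
    selected = [int(np.argmax(relevance))]
    corr = np.abs(np.corrcoef(X, rowvar=False))              # feature-feature redundancy proxy
    while len(selected) < k:
        remaining = [j for j in range(X.shape[1]) if j not in selected]
        scores = [relevance[j] - corr[j, selected].mean() for j in remaining]
        selected.append(remaining[int(np.argmax(scores))])
    return selected

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 40))
y = (X[:, 0] + X[:, 3] > 0).astype(int)
print(mrmr(X, y, k=5))          # ordered list of selected feature indices
```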

  8. Local or global? How to choose the training set for principal component compression of hyperspectral satellite measurements: a hybrid approach

    NASA Astrophysics Data System (ADS)

    Hultberg, Tim; August, Thomas; Lenti, Flavia

    2017-09-01

    Principal Component (PC) compression is the method of choice to achieve bandwidth reduction for dissemination of hyperspectral (HS) satellite measurements and will become increasingly important with the advent of future HS missions (such as IASI-NG and MTG-IRS) with ever higher data rates. It is a linear transformation defined by a truncated set of the leading eigenvectors of the covariance of the measurements as well as the mean of the measurements. We discuss the strategy for generation of the eigenvectors, based on the operational experience gained with IASI. To compute the covariance and mean, a so-called training set of measurements is needed, which ideally should include all relevant spectral features. For the dissemination of IASI PC scores a global static training set consisting of a large sample of measured spectra covering all seasons and all regions is used. This training set was updated once after the start of the dissemination of IASI PC scores in April 2010 by adding spectra from the 2010 Russian wildfires, in which spectral features not captured by the previous training set were identified. An alternative approach, which has sometimes been proposed, is to compute the eigenvectors on the fly from a local training set, for example consisting of all measurements in the current processing granule. It might naively be thought that this local approach would improve the compression rate by reducing the number of PC scores needed to represent the measurements within each granule. This false belief is apparently confirmed if the reconstruction score (root mean square of the reconstruction residuals) is used as the sole criterion for choosing the number of PC scores to retain, which overlooks the fact that the decrease in reconstruction score (for the same number of PCs) is achieved only by the retention of an increased amount of random noise. We demonstrate that local eigenvectors retain a higher amount of noise and a lower amount of atmospheric signal than global eigenvectors. Local eigenvectors do not increase the compression rate, but increase the amount of atmospheric loss and should be avoided. Only extremely rare situations, resulting in spectra with features which have not been observed previously, can lead to problems for the global approach. To cope with such situations we investigate a hybrid approach, which first applies the global eigenvectors and then applies local compression to the residuals in order to identify and disseminate, in addition, any directions in the local signal which are orthogonal to the subspace spanned by the global eigenvectors.
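
    A minimal numerical sketch of the global-training-set idea follows: compute the mean and leading eigenvectors from a (synthetic, stand-in) training set, project new measurements to PC scores, and evaluate the reconstruction score used above as a retention criterion.

        # PC compression and reconstruction-score sketch on synthetic "spectra".
        import numpy as np

        rng = np.random.default_rng(0)
        training = rng.normal(size=(5000, 300))      # stand-in for the global training spectra
        granule = rng.normal(size=(100, 300))        # stand-in for measurements to compress

        mean = training.mean(axis=0)
        # leading eigenvectors of the training covariance via SVD of the centered data
        _, _, vt = np.linalg.svd(training - mean, full_matrices=False)
        k = 50                                       # number of retained PCs
        eigvecs = vt[:k]

        scores = (granule - mean) @ eigvecs.T        # disseminated PC scores
        reconstructed = scores @ eigvecs + mean
        rms_residual = np.sqrt(np.mean((granule - reconstructed) ** 2, axis=1))
        print("mean reconstruction score:", rms_residual.mean())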

  9. Action recognition using mined hierarchical compound features.

    PubMed

    Gilbert, Andrew; Illingworth, John; Bowden, Richard

    2011-05-01

    The field of Action Recognition has seen a large increase in activity in recent years. Much of the progress has been made by incorporating ideas from single-frame object recognition and adapting them for temporal-based action recognition. Inspired by the success of interest points in the 2D spatial domain, their 3D (space-time) counterparts typically form the basic components used to describe actions, and in action recognition the features used are often engineered to fire sparsely. This is to ensure that the problem is tractable; however, it can sacrifice recognition accuracy, as it cannot be assumed that the optimum features in terms of class discrimination are obtained from this approach. In contrast, we propose to initially use an overcomplete set of simple 2D corners in both space and time. These are grouped spatially and temporally using a hierarchical process, with an increasing search area. At each stage of the hierarchy, the most distinctive and descriptive features are learned efficiently through data mining. This allows large amounts of data to be searched for frequently reoccurring patterns of features. At each level of the hierarchy, the mined compound features become more complex, discriminative, and sparse. This results in fast, accurate recognition with real-time performance on high-resolution video. As the compound features are constructed and selected based upon their ability to discriminate, their speed and accuracy increase at each level of the hierarchy. The approach is tested on four state-of-the-art data sets: the popular KTH data set, to provide a comparison with other state-of-the-art approaches; the Multi-KTH data set, to illustrate performance at simultaneous multi-action classification despite no explicit localization information being provided during training; and the recent Hollywood and Hollywood2 data sets, which provide challenging complex actions taken from commercial movie sequences. For all four data sets, the proposed hierarchical approach outperforms all other methods reported thus far in the literature and can achieve real-time operation.

  10. Assessing future vent opening locations at the Somma-Vesuvio volcanic complex: 1. A new information geodatabase with uncertainty characterizations

    NASA Astrophysics Data System (ADS)

    Tadini, A.; Bisson, M.; Neri, A.; Cioni, R.; Bevilacqua, A.; Aspinall, W. P.

    2017-06-01

    This study presents new and revised data sets about the spatial distribution of past volcanic vents, eruptive fissures, and regional/local structures of the Somma-Vesuvio volcanic system (Italy). The innovative features of the study are the identification and quantification of important sources of uncertainty affecting interpretations of the data sets. In this regard, the spatial uncertainty of each feature is modeled by an uncertainty area, i.e., a geometric element typically represented by a polygon drawn around points or lines. The new data sets have been assembled as an updatable geodatabase that integrates and complements existing databases for Somma-Vesuvio. The data are organized into 4 data sets and stored as 11 feature classes (points and lines for feature locations and polygons for the associated uncertainty areas), totaling more than 1700 elements. More specifically, volcanic vent and eruptive fissure elements are subdivided into feature classes according to their associated eruptive styles: (i) Plinian and sub-Plinian eruptions (i.e., large- or medium-scale explosive activity); (ii) violent Strombolian and continuous ash emission eruptions (i.e., small-scale explosive activity); and (iii) effusive eruptions (including eruptions from both parasitic vents and eruptive fissures). Regional and local structures (i.e., deep faults) are represented as linear feature classes. To support interpretation of the eruption data, additional data sets are provided for Somma-Vesuvio geological units and caldera morphological features. In the companion paper, the data presented here, and the associated uncertainties, are used to develop a first vent opening probability map for the Somma-Vesuvio caldera, with specific attention focused on large or medium explosive events.

  11. Efficient sensor network vehicle classification using peak harmonics of acoustic emissions

    NASA Astrophysics Data System (ADS)

    William, Peter E.; Hoffman, Michael W.

    2008-04-01

    An application is proposed for detection and classification of battlefield ground vehicles using the emitted acoustic signal captured at individual sensor nodes of an ad hoc Wireless Sensor Network (WSN). We make use of the harmonic characteristics of the acoustic emissions of battlefield vehicles to reduce both the computations carried out on the sensor node and the data transmitted to the fusion center for reliable and efficient classification of targets. Previous approaches focus on the lower frequency band of the acoustic emissions up to 500 Hz; however, we show in the proposed application how efficient discrimination between battlefield vehicles is performed using features extracted from higher frequency bands (50-1500 Hz). The application shows that selective time domain acoustic features surpass equivalent spectral features. Collaborative signal processing is utilized, such that estimation of certain signal model parameters is carried out by the sensor node, in order to reduce the communication between the sensor node and the fusion center, while the remaining model parameters are estimated at the fusion center. The data transmitted from the sensor node to the fusion center amounts to 1-5% of the sampled acoustic signal at the node. A variety of classification schemes were examined, such as maximum likelihood, vector quantization and artificial neural networks. Evaluation of the proposed application, through processing of an acoustic data set with comparison to previous results, shows that the improvement is not only in the number of computations but also in the detection and false-alarm rates.
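
    As a rough illustration of the kind of harmonic feature extraction discussed above, the sketch below locates the strongest spectral peaks of a synthetic engine-like signal inside the quoted band; the sampling rate, signal model and thresholds are placeholder assumptions, not the paper's signal model.

        # Peak-harmonic feature sketch on a synthetic signal (assumed fs and thresholds).
        import numpy as np
        from scipy.signal import find_peaks

        fs = 4096                                    # assumed sampling rate (Hz)
        t = np.arange(fs) / fs
        signal = sum(np.sin(2 * np.pi * f * t) / i for i, f in enumerate([120, 240, 360, 480], 1))
        signal += 0.3 * np.random.default_rng(0).normal(size=fs)

        spectrum = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
        freqs = np.fft.rfftfreq(len(signal), 1 / fs)
        band = (freqs >= 50) & (freqs <= 1500)       # band quoted in the abstract

        peaks, props = find_peaks(spectrum[band], height=0.1 * spectrum[band].max())
        features = sorted(zip(props["peak_heights"], freqs[band][peaks]), reverse=True)[:5]
        print("top harmonic peaks (magnitude, Hz):",
              [(round(m, 1), round(f, 1)) for m, f in features])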

  12. Voxel classification based airway tree segmentation

    NASA Astrophysics Data System (ADS)

    Lo, Pechin; de Bruijne, Marleen

    2008-03-01

    This paper presents a voxel classification based method for segmenting the human airway tree in volumetric computed tomography (CT) images. In contrast to standard methods that use only voxel intensities, our method uses a more complex appearance model based on a set of local image appearance features and Kth nearest neighbor (KNN) classification. The optimal set of features for classification is selected automatically from a large set of features describing the local image structure at several scales. The use of multiple features enables the appearance model to differentiate between airway tree voxels and other voxels of similar intensities in the lung, thus making the segmentation robust to pathologies such as emphysema. The classifier is trained on imperfect segmentations that can easily be obtained using region growing with a manual threshold selection. Experiments show that the proposed method results in a more robust segmentation that can grow into the smaller airway branches without leaking into emphysematous areas, and is able to segment many branches that are not present in the training set.

  13. Fusion of shallow and deep features for classification of high-resolution remote sensing images

    NASA Astrophysics Data System (ADS)

    Gao, Lang; Tian, Tian; Sun, Xiao; Li, Hang

    2018-02-01

    Effective spectral and spatial pixel description plays a significant role in the classification of high-resolution remote sensing images. Current approaches to pixel-based feature extraction are of two main kinds: one includes the widely used principal component analysis (PCA) and gray level co-occurrence matrix (GLCM) as representatives of shallow spectral and shape features, and the other refers to deep learning-based methods which employ deep neural networks and have greatly improved classification accuracy. However, the former traditional features are insufficient to depict the complex distribution of high-resolution images, while the deep features demand plenty of samples to train the network, otherwise overfitting easily occurs if only limited samples are involved in the training. In view of the above, we propose a GLCM-based convolutional neural network (CNN) approach to extract features and implement classification for high-resolution remote sensing images. The employment of GLCM is able to represent the original images while eliminating redundant information and undesired noise. Meanwhile, taking shallow features as the input of the deep network contributes to better guidance and interpretability. In consideration of the limited number of samples, strategies such as L2 regularization and dropout are used to prevent overfitting. A fine-tuning strategy is also used in our study to reduce training time and further enhance the generalization performance of the network. Experiments with popular data sets such as the PaviaU data set validate that our proposed method leads to a performance improvement compared to the individual approaches involved.
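
    As one concrete reading of the shallow-feature stage, the sketch below computes GLCM texture descriptors for an image patch of the kind that could be fed to a CNN; it assumes scikit-image >= 0.19 (where the functions are named graycomatrix/graycoprops) and uses a random patch in place of real imagery.

        # GLCM texture descriptors per patch (scikit-image >= 0.19 naming assumed).
        import numpy as np
        from skimage.feature import graycomatrix, graycoprops

        rng = np.random.default_rng(0)
        patch = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)   # stand-in image patch

        glcm = graycomatrix(patch, distances=[1, 2], angles=[0, np.pi / 2],
                            levels=256, symmetric=True, normed=True)
        features = np.hstack([graycoprops(glcm, prop).ravel()
                              for prop in ("contrast", "homogeneity", "energy", "correlation")])
        print("GLCM feature vector length:", features.shape[0])       # 2 distances x 2 angles x 4 props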

  14. Incorporating clinical metadata with digital image features for automated identification of cutaneous melanoma.

    PubMed

    Liu, Z; Sun, J; Smith, M; Smith, L; Warr, R

    2013-11-01

    Computer-assisted diagnosis (CAD) of malignant melanoma (MM) has been advocated to help clinicians achieve a more objective and reliable assessment. However, conventional CAD systems examine only the features extracted from digital photographs of lesions. Failure to incorporate patients' personal information constrains their applicability in clinical settings. To develop a new CAD system to improve the performance of automatic diagnosis of melanoma which, for the first time, incorporates digital features of lesions with important patient metadata into a learning process. Thirty-two features were extracted from digital photographs to characterize skin lesions. Patients' personal information, such as age, gender, and lesion site, and their combinations, was quantified as metadata. The integration of digital features and metadata was realized through an extended Laplacian eigenmap, a dimensionality-reduction method grouping lesions with similar digital features and metadata into the same classes. The diagnosis reached 82.1% sensitivity and 86.1% specificity when only multidimensional digital features were used, but improved to 95.2% sensitivity and 91.0% specificity after metadata were incorporated appropriately. The proposed system achieves a level of sensitivity comparable with experienced dermatologists aided by conventional dermoscopes. This demonstrates the potential of our method for assisting clinicians in diagnosing melanoma, and the benefit it could provide to patients and hospitals by greatly reducing unnecessary excisions of benign naevi. This paper proposes an enhanced CAD system incorporating clinical metadata into the learning process for automatic classification of melanoma. Results demonstrate that the additional metadata and the mechanism to incorporate them are useful for improving CAD of melanoma. © 2013 British Association of Dermatologists.
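
    The coupling of image features with quantified metadata before a Laplacian-eigenmap-style embedding can be approximated as below; this is a simplified analogue using scikit-learn's SpectralEmbedding (the one-hot encoder call assumes scikit-learn >= 1.2), not the authors' extended formulation, and all arrays are synthetic placeholders.

        # Simplified stand-in for combining lesion features with patient metadata
        # before a spectral (Laplacian-eigenmap-style) embedding.
        import numpy as np
        from sklearn.manifold import SpectralEmbedding
        from sklearn.preprocessing import OneHotEncoder, StandardScaler

        rng = np.random.default_rng(0)
        image_features = rng.normal(size=(120, 32))     # 32 digital lesion features
        age = rng.integers(20, 90, size=(120, 1)).astype(float)
        sex = rng.integers(0, 2, size=(120, 1))         # coded gender
        site = rng.integers(0, 4, size=(120, 1))        # coded lesion site

        meta = np.hstack([StandardScaler().fit_transform(age),
                          OneHotEncoder(sparse_output=False).fit_transform(np.hstack([sex, site]))])
        X = np.hstack([StandardScaler().fit_transform(image_features), meta])

        embedding = SpectralEmbedding(n_components=2, affinity="nearest_neighbors",
                                      random_state=0).fit_transform(X)
        print("embedded shape:", embedding.shape)       # low-dimensional input for a classifier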

  15. Robust Feature Selection Technique using Rank Aggregation.

    PubMed

    Sarkar, Chandrima; Cooley, Sarah; Srivastava, Jaideep

    2014-01-01

    Although feature selection is a well-developed research area, there is an ongoing need to develop methods that make classifiers more efficient. One important challenge is the lack of a universal feature selection technique that produces similar outcomes with all types of classifiers. This is because all feature selection techniques have individual statistical biases, while classifiers exploit different statistical properties of data for evaluation. In numerous situations this can put researchers into a dilemma as to which feature selection method and which classifier to choose from a vast range of options. In this paper, we propose a technique that aggregates the consensus properties of various feature selection methods to develop a better solution. The ensemble nature of our technique makes it more robust across various classifiers. In other words, it is stable towards achieving similar, and ideally higher, classification accuracy across a wide variety of classifiers. We quantify this concept of robustness with a measure known as the Robustness Index (RI). We perform an extensive empirical evaluation of our technique on eight data sets with different dimensions, including Arrythmia, Lung Cancer, Madelon, mfeat-fourier, internet-ads, Leukemia-3c and Embryonal Tumor, and a real-world data set, namely Acute Myeloid Leukemia (AML). We demonstrate not only that our algorithm is more robust, but also that, compared to other techniques, it improves the classification accuracy by approximately 3-4% (in data sets with fewer than 500 features) and by more than 5% (in data sets with more than 500 features), across a wide range of classifiers.
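
    A minimal sketch of the rank-aggregation idea follows, using a simple Borda count over three common scoring functions from scikit-learn as a stand-in for the paper's aggregation scheme; the data are synthetic.

        # Consensus feature ranking via Borda-count rank aggregation (simplified stand-in).
        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.feature_selection import chi2, f_classif, mutual_info_classif
        from sklearn.preprocessing import MinMaxScaler

        X, y = make_classification(n_samples=300, n_features=25, random_state=0)
        X_pos = MinMaxScaler().fit_transform(X)            # chi2 requires non-negative features

        scores = [chi2(X_pos, y)[0], f_classif(X, y)[0], mutual_info_classif(X, y, random_state=0)]
        # rank features under each scorer (0 = best), then sum the ranks (Borda count)
        ranks = np.vstack([np.argsort(np.argsort(-s)) for s in scores])
        consensus = np.argsort(ranks.sum(axis=0))
        print("consensus top-10 features:", consensus[:10])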

  16. Vision-Based UAV Flight Control and Obstacle Avoidance

    DTIC Science & Technology

    2006-01-01

    denoted it by Vb = (Vb1, Vb2, Vb3). Fig. 2 shows the block diagram of the proposed vision-based motion analysis and obstacle avoidance system. We denote...structure analysis often involve computation-intensive computer vision tasks, such as feature extraction and geometric modeling. Computation-intensive...1) First, we extract a set of features from each block. 2) Second, we compute the distance between these two sets of features. In conventional motion

  17. Non-specific filtering of beta-distributed data.

    PubMed

    Wang, Xinhui; Laird, Peter W; Hinoue, Toshinori; Groshen, Susan; Siegmund, Kimberly D

    2014-06-19

    Non-specific feature selection is a dimension reduction procedure performed prior to cluster analysis of high dimensional molecular data. Not all measured features are expected to show biological variation, so only the most varying are selected for analysis. In DNA methylation studies, DNA methylation is measured as a proportion, bounded between 0 and 1, with variance a function of the mean. Filtering on standard deviation biases the selection of probes to those with mean values near 0.5. We explore the effect this has on clustering, and develop alternate filter methods that utilize a variance stabilizing transformation for Beta distributed data and do not share this bias. We compared results for 11 different non-specific filters on eight Infinium HumanMethylation data sets, selected to span a variety of biological conditions. We found that for data sets having a small fraction of samples showing abnormal methylation of a subset of normally unmethylated CpGs, a characteristic of the CpG island methylator phenotype in cancer, a novel filter statistic that utilized a variance-stabilizing transformation for Beta distributed data outperformed the common filter of using standard deviation of the DNA methylation proportion, or its log-transformed M-value, in its ability to detect the cancer subtype in a cluster analysis. However, the standard deviation filter always performed among the best for distinguishing subgroups of normal tissue. The novel filter and standard deviation filter tended to favour features in different genome contexts; for the same data set, the novel filter always selected more features from CpG island promoters and the standard deviation filter always selected more features from non-CpG island intergenic regions. Interestingly, despite selecting largely non-overlapping sets of features, the two filters did find sample subsets that overlapped for some real data sets. We found two different filter statistics that tended to prioritize features with different characteristics, each performed well for identifying clusters of cancer and non-cancer tissue, and identifying a cancer CpG island hypermethylation phenotype. Since cluster analysis is for discovery, we would suggest trying both filters on any new data sets, evaluating the overlap of features selected and clusters discovered.
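
    To make the contrast between the two filter families concrete, the sketch below ranks features by the standard deviation of the raw beta values and by the standard deviation after a variance-stabilizing transform; the arcsine square-root transform used here is the classical VST for proportions and is an illustrative stand-in for the paper's statistic.

        # Non-specific filtering of beta values with and without a variance-stabilizing transform.
        import numpy as np

        rng = np.random.default_rng(0)
        betas = rng.beta(a=0.5, b=0.5, size=(10000, 60))   # CpGs x samples, values in [0, 1]

        sd_filter = betas.std(axis=1)                      # biased toward means near 0.5
        vst_filter = np.arcsin(np.sqrt(betas)).std(axis=1) # after arcsine square-root VST

        k = 1000                                           # keep the 1000 most variable CpGs
        top_sd = np.argsort(sd_filter)[-k:]
        top_vst = np.argsort(vst_filter)[-k:]
        print("features selected by both filters:", len(np.intersect1d(top_sd, top_vst)))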

  18. Optimizing Nanoscale Quantitative Optical Imaging of Subfield Scattering Targets

    PubMed Central

    Henn, Mark-Alexander; Barnes, Bryan M.; Zhou, Hui; Sohn, Martin; Silver, Richard M.

    2016-01-01

    The full 3-D scattered field above finite sets of features has been shown to contain a continuum of spatial frequency information, and with novel optical microscopy techniques and electromagnetic modeling, deep-subwavelength geometrical parameters can be determined. Similarly, by using simulations, scattering geometries and experimental conditions can be established to tailor scattered fields that yield lower parametric uncertainties while decreasing the number of measurements and the area of such finite sets of features. Such optimized conditions are reported through quantitative optical imaging in 193 nm scatterfield microscopy using feature sets up to four times smaller in area than state-of-the-art critical dimension targets. PMID:27805660

  19. Visual Saliency Detection Based on Multiscale Deep CNN Features.

    PubMed

    Guanbin Li; Yizhou Yu

    2016-11-01

    Visual saliency is a fundamental problem in both cognitive and computational sciences, including computer vision. In this paper, we discover that a high-quality visual saliency model can be learned from multiscale features extracted using deep convolutional neural networks (CNNs), which have had many successes in visual recognition tasks. For learning such saliency models, we introduce a neural network architecture, which has fully connected layers on top of CNNs responsible for feature extraction at three different scales. The penultimate layer of our neural network has been confirmed to be a discriminative high-level feature vector for saliency detection, which we call deep contrast feature. To generate a more robust feature, we integrate handcrafted low-level features with our deep contrast feature. To promote further research and evaluation of visual saliency models, we also construct a new large database of 4447 challenging images and their pixelwise saliency annotations. Experimental results demonstrate that our proposed method is capable of achieving the state-of-the-art performance on all public benchmarks, improving the F-measure by 6.12% and 10%, respectively, on the DUT-OMRON data set and our new data set (HKU-IS), and lowering the mean absolute error by 9% and 35.3%, respectively, on these two data sets.

  20. Roles and Responsibilities in Feature Teams

    NASA Astrophysics Data System (ADS)

    Eckstein, Jutta

    Agile development requires self-organizing teams. The set-up of a (feature) team has to enable self-organization. Special care has to be taken if the project is not only distributed, but also large and more than one feature team is involved. In such a setting, every feature team needs a product owner who ensures the continuous focus on business delivery. The product owners collaborate by working together in a virtual team. Each feature team is supported by a coach who safeguards the agile process not only within the individual feature team but also across all feature teams. An architect (or, if necessary, a team of architects) takes care that the system is technically sound. In contrast to small co-located projects, large global projects require a project manager who deals with—among other things—internal and especially external politics.

  1. iPHLoc-ES: Identification of bacteriophage protein locations using evolutionary and structural features.

    PubMed

    Shatabda, Swakkhar; Saha, Sanjay; Sharma, Alok; Dehzangi, Abdollah

    2017-12-21

    Bacteriophages are viruses that can significantly impact the functioning of bacteria and can be used in phage-based therapy. The functioning of a bacteriophage in the host bacterium depends on the location of its proteins in the host cell. It is therefore very important to know the subcellular location of the phage proteins in a host cell in order to understand their working mechanism. In this paper, we propose iPHLoc-ES, a prediction method for subcellular localization of bacteriophage proteins. We aim to solve two problems: discriminating between host-located and non-host-located phage proteins, and discriminating between the locations of a host-located protein within a host cell (membrane or cytoplasm). To do this, we extract sets of evolutionary and structural features of phage proteins and employ a Support Vector Machine (SVM) as our classifier. We also use recursive feature elimination (RFE) to reduce the number of features for effective prediction. On a standard dataset, using standard evaluation criteria, our method significantly outperforms the state-of-the-art predictor. iPHLoc-ES is readily available to use as a standalone tool from: https://github.com/swakkhar/iPHLoc-ES/ and as a web application from: http://brl.uiu.ac.bd/iPHLoc-ES/. Copyright © 2017 Elsevier Ltd. All rights reserved.
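
    The SVM-with-RFE step can be sketched as follows with scikit-learn; the data are a synthetic placeholder rather than the evolutionary/structural feature sets used by iPHLoc-ES, and the retained-feature count is an arbitrary illustration.

        # SVM + recursive feature elimination sketch (placeholder data, not the iPHLoc-ES features).
        from sklearn.datasets import make_classification
        from sklearn.feature_selection import RFE
        from sklearn.model_selection import cross_val_score
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import SVC

        X, y = make_classification(n_samples=400, n_features=60, n_informative=12, random_state=0)

        model = make_pipeline(
            StandardScaler(),
            RFE(SVC(kernel="linear", C=1.0), n_features_to_select=15, step=5),
            SVC(kernel="linear", C=1.0),
        )
        print("CV accuracy with 15 retained features:",
              cross_val_score(model, X, y, cv=5).mean().round(3))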

  2. Holistic approach for automated background EEG assessment in asphyxiated full-term infants

    NASA Astrophysics Data System (ADS)

    Matic, Vladimir; Cherian, Perumpillichira J.; Koolen, Ninah; Naulaers, Gunnar; Swarte, Renate M.; Govaert, Paul; Van Huffel, Sabine; De Vos, Maarten

    2014-12-01

    Objective. To develop an automated algorithm to quantify background EEG abnormalities in full-term neonates with hypoxic ischemic encephalopathy. Approach. The algorithm classifies 1 h of continuous neonatal EEG (cEEG) into a mild, moderate or severe background abnormality grade. These classes are well established in the literature and a clinical neurophysiologist labeled 272 1 h cEEG epochs selected from 34 neonates. The algorithm is based on adaptive EEG segmentation and mapping of the segments into the so-called segments’ feature space. Three features are suggested and further processing is obtained using a discretized three-dimensional distribution of the segments’ features represented as a 3-way data tensor. Further classification has been achieved using recently developed tensor decomposition/classification methods that reduce the size of the model and extract a significant and discriminative set of features. Main results. Effective parameterization of cEEG data has been achieved resulting in high classification accuracy (89%) to grade background EEG abnormalities. Significance. For the first time, the algorithm for the background EEG assessment has been validated on an extensive dataset which contained major artifacts and epileptic seizures. The demonstrated high robustness, while processing real-case EEGs, suggests that the algorithm can be used as an assistive tool to monitor the severity of hypoxic insults in newborns.

  3. Classification of Error Related Brain Activity in an Auditory Identification Task with Conditions of Varying Complexity

    NASA Astrophysics Data System (ADS)

    Kakkos, I.; Gkiatis, K.; Bromis, K.; Asvestas, P. A.; Karanasiou, I. S.; Ventouras, E. M.; Matsopoulos, G. K.

    2017-11-01

    The detection of an error is the cognitive evaluation of an action outcome that is considered undesired or that mismatches an expected response. Brain activity during monitoring of correct and incorrect responses elicits Event Related Potentials (ERPs), revealing complex cerebral responses to deviant sensory stimuli. Development of accurate error detection systems is of great importance, both for practical applications and for investigating the complex neural mechanisms of decision making. In this study, data are used from an audio identification experiment that was implemented with two levels of complexity in order to investigate neurophysiological error processing mechanisms in actors and observers. To examine and analyse the variations in the processing of erroneous sensory information at each level of complexity, we employ Support Vector Machine (SVM) classifiers with various learning methods and kernels, using characteristic time-windowed ERP features. For dimensionality reduction and to remove redundant features, we implement a feature selection framework based on Sequential Forward Selection (SFS). The proposed method provided high accuracy in identifying correct and incorrect responses both for actors and for observers, with mean accuracies of 93% and 91%, respectively. Additionally, computational time was reduced and the effects of the nesting problem usually occurring in SFS of large feature sets were alleviated.
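
    A compact analogue of the SFS-plus-SVM pipeline is sketched below using scikit-learn's SequentialFeatureSelector (available from version 0.24); synthetic features stand in for the time-windowed ERP measures.

        # Sequential forward selection feeding an SVM (synthetic stand-in for ERP features).
        from sklearn.datasets import make_classification
        from sklearn.feature_selection import SequentialFeatureSelector
        from sklearn.model_selection import cross_val_score
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import SVC

        X, y = make_classification(n_samples=300, n_features=40, n_informative=8, random_state=1)

        svm = SVC(kernel="rbf", C=10, gamma="scale")
        sfs = SequentialFeatureSelector(svm, n_features_to_select=10, direction="forward", cv=3)
        model = make_pipeline(StandardScaler(), sfs, svm)
        print("CV accuracy with 10 forward-selected features:",
              cross_val_score(model, X, y, cv=5).mean().round(3))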

  4. Fall detection algorithms for real-world falls harvested from lumbar sensors in the elderly population: a machine learning approach.

    PubMed

    Bourke, Alan K; Klenk, Jochen; Schwickert, Lars; Aminian, Kamiar; Ihlen, Espen A F; Mellone, Sabato; Helbostad, Jorunn L; Chiari, Lorenzo; Becker, Clemens

    2016-08-01

    Automatic fall detection will promote independent living and reduce the consequences of falls in the elderly by ensuring people can confidently live safely at home for longer. In laboratory studies, inertial sensor technology has been shown to be capable of distinguishing falls from normal activities. However, fewer than 7% of fall-detection algorithm studies have used fall data recorded from elderly people in real life. The FARSEEING project has compiled a database of real-life falls from elderly people, to gain new knowledge about fall events and to develop fall detection algorithms to combat the problems associated with falls. We extracted 12 different kinematic, temporal and kinetic related features from a data set of 89 real-world falls and 368 activities of daily living. Using the extracted features, we applied machine learning techniques and produced a selection of algorithms based on different feature combinations. The best algorithm employs 10 different features and produced a sensitivity of 0.88 and a specificity of 0.87 in classifying falls correctly. This algorithm can be used to distinguish real-world falls from normal activities of daily living using a sensor consisting of a tri-axial accelerometer and tri-axial gyroscope located at L5.

  5. An expert system based on principal component analysis, artificial immune system and fuzzy k-NN for diagnosis of valvular heart diseases.

    PubMed

    Sengur, Abdulkadir

    2008-03-01

    In the last two decades, the use of artificial intelligence methods in medical analysis has been increasing. This is mainly because the effectiveness of classification and detection systems has improved a great deal to help medical experts in diagnosing. In this work, we investigate the use of principal component analysis (PCA), an artificial immune system (AIS) and fuzzy k-NN to determine normal and abnormal heart valves from Doppler heart sounds. The proposed heart valve disorder detection system is composed of three stages. The first stage is the pre-processing stage: filtering, normalization and white de-noising are the processes used in this stage. Feature extraction is the second stage. During the feature extraction stage, wavelet packet decomposition was used, and wavelet entropy values were then used as features. To reduce the complexity of the system, PCA was used for feature reduction. In the classification stage, AIS and fuzzy k-NN were used. To evaluate the performance of the proposed methodology, a comparative study was carried out using a data set containing 215 samples. The validation of the proposed method is measured by the sensitivity and specificity parameters; a 95.9% sensitivity and 96% specificity rate was obtained.

  6. Improving the Computational Effort of Set-Inversion-Based Prandial Insulin Delivery for Its Integration in Insulin Pumps

    PubMed Central

    León-Vargas, Fabian; Calm, Remei; Bondia, Jorge; Vehí, Josep

    2012-01-01

    Objective Set-inversion-based prandial insulin delivery is a new model-based bolus advisor for postprandial glucose control in type 1 diabetes mellitus (T1DM). It automatically coordinates the values of basal–bolus insulin to be infused during the postprandial period so as to achieve some predefined control objectives. However, the method requires an excessive computation time to compute the solution set of feasible insulin profiles, which impedes its integration into an insulin pump. In this work, a new algorithm is presented, which reduces computation time significantly and enables the integration of this new bolus advisor into current processing features of smart insulin pumps. Methods A new strategy was implemented that focused on finding the combined basal–bolus solution of interest rather than an extensive search of the feasible set of solutions. Analysis of interval simulations, inclusion of physiological assumptions, and search domain contractions were used. Data from six real patients with T1DM were used to compare the performance between the optimized and the conventional computations. Results In all cases, the optimized version yielded the basal–bolus combination recommended by the conventional method and in only 0.032% of the computation time. Simulations show that the mean number of iterations for the optimized computation requires approximately 3.59 s at 20 MHz processing power, in line with current features of smart pumps. Conclusions A computationally efficient method for basal–bolus coordination in postprandial glucose control has been presented and tested. The results indicate that an embedded algorithm within smart insulin pumps is now feasible. Nonetheless, we acknowledge that a clinical trial will be needed in order to justify this claim. PMID:23294789

  7. Non-rigid registration of 3D ultrasound for neurosurgery using automatic feature detection and matching.

    PubMed

    Machado, Inês; Toews, Matthew; Luo, Jie; Unadkat, Prashin; Essayed, Walid; George, Elizabeth; Teodoro, Pedro; Carvalho, Herculano; Martins, Jorge; Golland, Polina; Pieper, Steve; Frisken, Sarah; Golby, Alexandra; Wells, William

    2018-06-04

    The brain undergoes significant structural change over the course of neurosurgery, including highly nonlinear deformation and resection. It can be informative to recover the spatial mapping between structures identified in preoperative surgical planning and the intraoperative state of the brain. We present a novel feature-based method for achieving robust, fully automatic deformable registration of intraoperative neurosurgical ultrasound images. A sparse set of local image feature correspondences is first estimated between ultrasound image pairs, after which rigid, affine and thin-plate spline models are used to estimate dense mappings throughout the image. Correspondences are derived from 3D features, distinctive generic image patterns that are automatically extracted from 3D ultrasound images and characterized in terms of their geometry (i.e., location, scale, and orientation) and a descriptor of local image appearance. Feature correspondences between ultrasound images are achieved based on a nearest-neighbor descriptor matching and probabilistic voting model similar to the Hough transform. Experiments demonstrate our method on intraoperative ultrasound images acquired before and after opening of the dura mater, during resection and after resection in nine clinical cases. A total of 1620 automatically extracted 3D feature correspondences were manually validated by eleven experts and used to guide the registration. Then, using manually labeled corresponding landmarks in the pre- and post-resection ultrasound images, we show that our feature-based registration reduces the mean target registration error from an initial value of 3.3 to 1.5 mm. This result demonstrates that the 3D features promise to offer a robust and accurate solution for 3D ultrasound registration and to correct for brain shift in image-guided neurosurgery.

  8. Optimizing Radiometric Processing and Feature Extraction of Drone Based Hyperspectral Frame Format Imagery for Estimation of Yield Quantity and Quality of a Grass Sward

    NASA Astrophysics Data System (ADS)

    Näsi, R.; Viljanen, N.; Oliveira, R.; Kaivosoja, J.; Niemeläinen, O.; Hakala, T.; Markelin, L.; Nezami, S.; Suomalainen, J.; Honkavaara, E.

    2018-04-01

    Light-weight 2D-format hyperspectral imagers operable from unmanned aerial vehicles (UAV) have become common in various remote sensing tasks in recent years. Using these technologies, the area of interest is covered by multiple overlapping hypercubes, in other words multiview hyperspectral photogrammetric imagery, and each object point appears in many, even tens of, individual hypercubes. The common practice is to calculate hyperspectral orthomosaics utilizing only the most nadir areas of the images. However, the redundancy of the data gives potential for much more versatile and thorough feature extraction. We investigated various options for extracting spectral features in the grass sward quantity evaluation task. In addition to the various sets of spectral features, we used photogrammetry-based ultra-high-density point clouds to extract features describing the canopy 3D structure. A machine learning technique based on the Random Forest algorithm was used to estimate the fresh biomass. Results showed high accuracies for all investigated feature sets. The estimates using multiview data were approximately 10 % better than those based on the most nadir orthophotos. The utilization of the photogrammetric 3D features improved estimation accuracy by approximately 40 % compared to approaches where only spectral features were applied. The best estimation RMSE of 239 kg/ha (6.0 %) was obtained with the multiview, anisotropy-corrected data set and the 3D features.
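
    The regression step can be illustrated as below: a Random Forest maps concatenated spectral and canopy-structure features to biomass and is scored by cross-validated RMSE; all arrays are synthetic stand-ins for the sward data.

        # Random Forest biomass regression from spectral + 3D features (synthetic stand-in data).
        import numpy as np
        from sklearn.ensemble import RandomForestRegressor
        from sklearn.model_selection import cross_val_predict

        rng = np.random.default_rng(0)
        spectral = rng.normal(size=(150, 30))          # e.g. band reflectances / indices
        structure = rng.normal(size=(150, 5))          # e.g. canopy height percentiles
        X = np.hstack([spectral, structure])
        y = 2000 + 300 * structure[:, 0] + 100 * spectral[:, 0] + rng.normal(0, 150, size=150)

        pred = cross_val_predict(RandomForestRegressor(n_estimators=300, random_state=0), X, y, cv=5)
        rmse = np.sqrt(np.mean((y - pred) ** 2))
        print(f"cross-validated RMSE: {rmse:.0f} kg/ha ({100 * rmse / y.mean():.1f} %)")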

  9. Landslides Identification Using Airborne Laser Scanning Data Derived Topographic Terrain Attributes and Support Vector Machine Classification

    NASA Astrophysics Data System (ADS)

    Pawłuszek, Kamila; Borkowski, Andrzej

    2016-06-01

    Since the availability of high-resolution Airborne Laser Scanning (ALS) data, substantial progress in geomorphological research, especially in landslide analysis, has been made. First- and second-order derivatives of the Digital Terrain Model (DTM) have become a popular and powerful tool in landslide inventory mapping. Nevertheless, automatic landslide mapping based on sophisticated classifiers, including the Support Vector Machine (SVM), Artificial Neural Networks or Random Forests, is often computationally time-consuming. The objective of this research is to deeply explore the topographic information provided by ALS data and to overcome this computational limitation. For this reason, an extended set of topographic features and Principal Component Analysis (PCA) were used to reduce redundant information. The proposed novel approach was tested on a susceptible area affected by more than 50 landslides located on Rożnów Lake in the Carpathian Mountains, Poland. The initial seven PCA components, with 90% of the total variability of the original topographic attributes, were used for SVM classification. Comparing the results with the landslide inventory map, the average user's accuracy (UA), producer's accuracy (PA), and overall accuracy (OA) were calculated for two models according to the classification results. For the PCA-feature-reduced model, UA, PA, and OA were found to be 72%, 76%, and 72%, respectively. Similarly, UA, PA, and OA for the non-reduced original topographic model were 74%, 77% and 74%, respectively. Using the initial seven PCA components instead of the twenty original topographic attributes does not significantly change identification accuracy but reduces computational time.
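
    The PCA-then-SVM workflow above reduces to a short pipeline; the sketch below keeps the leading components explaining 90 % of the variance and classifies with an RBF SVM, using synthetic data in place of the topographic attributes.

        # PCA (90 % variance) followed by SVM classification (synthetic stand-in data).
        from sklearn.datasets import make_classification
        from sklearn.decomposition import PCA
        from sklearn.model_selection import cross_val_score
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import SVC

        X, y = make_classification(n_samples=500, n_features=20, n_informative=7, random_state=0)

        model = make_pipeline(StandardScaler(),
                              PCA(n_components=0.90, svd_solver="full"),  # keep 90 % of variance
                              SVC(kernel="rbf", C=10, gamma="scale"))
        print("CV overall accuracy:", cross_val_score(model, X, y, cv=5).mean().round(3))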

  10. TU-CD-BRB-04: Automated Radiomic Features Complement the Prognostic Value of VASARI in the TCGA-GBM Dataset

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Velazquez, E Rios; Narayan, V; Grossmann, P

    2015-06-15

    Purpose: To compare the complementary prognostic value of automated Radiomic features to that of radiologist-annotated VASARI features in the TCGA-GBM MRI dataset. Methods: For 96 GBM patients, pre-operative MRI images were obtained from The Cancer Imaging Archive. The abnormal tumor bulks were manually defined on post-contrast T1w images. The contrast-enhancing and necrotic regions were segmented using FAST. From these sub-volumes and the total abnormal tumor bulk, a set of Radiomic features quantifying phenotypic differences based on tumor intensity, shape and texture were extracted from the post-contrast T1w images. Minimum-redundancy-maximum-relevance (MRMR) was used to identify the most informative Radiomic, VASARI and combined Radiomic-VASARI features in 70% of the dataset (training set). Multivariate Cox-proportional hazards models were evaluated in 30% of the dataset (validation set) using the C-index for OS. A bootstrap procedure was used to assess significance while comparing the C-indices of the different models. Results: Overall, the Radiomic features showed a moderate correlation with the radiologist-annotated VASARI features (r = −0.37 – 0.49); however, that correlation was stronger for the Tumor Diameter and Proportion of Necrosis VASARI features (r = −0.71 – 0.69). After MRMR feature selection, the best-performing Radiomic, VASARI, and Radiomic-VASARI Cox-PH models showed a validation C-index of 0.56 (p = NS), 0.58 (p = NS) and 0.65 (p = 0.01), respectively. The combined Radiomic-VASARI model C-index was significantly higher than that obtained from either the Radiomic or VASARI model alone (p < 0.001). Conclusion: Quantitative volumetric and textural Radiomic features complement the qualitative and semi-quantitative annotated VASARI feature set. The prognostic value of informative qualitative VASARI features such as Eloquent Brain and Multifocality is increased with the addition of quantitative volumetric and textural features from the contrast-enhancing and necrotic tumor regions. These results should be further evaluated in larger validation cohorts.

  11. Targeted Feature Detection for Data-Dependent Shotgun Proteomics

    PubMed Central

    2017-01-01

    Label-free quantification of shotgun LC–MS/MS data is the prevailing approach in quantitative proteomics but remains computationally nontrivial. The central data analysis step is the detection of peptide-specific signal patterns, called features. Peptide quantification is facilitated by associating signal intensities in features with peptide sequences derived from MS2 spectra; however, missing values due to imperfect feature detection are a common problem. A feature detection approach that directly targets identified peptides (minimizing missing values) but also offers robustness against false-positive features (by assigning meaningful confidence scores) would thus be highly desirable. We developed a new feature detection algorithm within the OpenMS software framework, leveraging ideas and algorithms from the OpenSWATH toolset for DIA/SRM data analysis. Our software, FeatureFinderIdentification (“FFId”), implements a targeted approach to feature detection based on information from identified peptides. This information is encoded in an MS1 assay library, based on which ion chromatogram extraction and detection of feature candidates are carried out. Significantly, when analyzing data from experiments comprising multiple samples, our approach distinguishes between “internal” and “external” (inferred) peptide identifications (IDs) for each sample. On the basis of internal IDs, two sets of positive (true) and negative (decoy) feature candidates are defined. A support vector machine (SVM) classifier is then trained to discriminate between the sets and is subsequently applied to the “uncertain” feature candidates from external IDs, facilitating selection and confidence scoring of the best feature candidate for each peptide. This approach also enables our algorithm to estimate the false discovery rate (FDR) of the feature selection step. We validated FFId based on a public benchmark data set, comprising a yeast cell lysate spiked with protein standards that provide a known ground-truth. The algorithm reached almost complete (>99%) quantification coverage for the full set of peptides identified at 1% FDR (PSM level). Compared with other software solutions for label-free quantification, this is an outstanding result, which was achieved at competitive quantification accuracy and reproducibility across replicates. The FDR for the feature selection was estimated at a low 1.5% on average per sample (3% for features inferred from external peptide IDs). The FFId software is open-source and freely available as part of OpenMS (www.openms.org). PMID:28673088

  12. Targeted Feature Detection for Data-Dependent Shotgun Proteomics.

    PubMed

    Weisser, Hendrik; Choudhary, Jyoti S

    2017-08-04

    Label-free quantification of shotgun LC-MS/MS data is the prevailing approach in quantitative proteomics but remains computationally nontrivial. The central data analysis step is the detection of peptide-specific signal patterns, called features. Peptide quantification is facilitated by associating signal intensities in features with peptide sequences derived from MS2 spectra; however, missing values due to imperfect feature detection are a common problem. A feature detection approach that directly targets identified peptides (minimizing missing values) but also offers robustness against false-positive features (by assigning meaningful confidence scores) would thus be highly desirable. We developed a new feature detection algorithm within the OpenMS software framework, leveraging ideas and algorithms from the OpenSWATH toolset for DIA/SRM data analysis. Our software, FeatureFinderIdentification ("FFId"), implements a targeted approach to feature detection based on information from identified peptides. This information is encoded in an MS1 assay library, based on which ion chromatogram extraction and detection of feature candidates are carried out. Significantly, when analyzing data from experiments comprising multiple samples, our approach distinguishes between "internal" and "external" (inferred) peptide identifications (IDs) for each sample. On the basis of internal IDs, two sets of positive (true) and negative (decoy) feature candidates are defined. A support vector machine (SVM) classifier is then trained to discriminate between the sets and is subsequently applied to the "uncertain" feature candidates from external IDs, facilitating selection and confidence scoring of the best feature candidate for each peptide. This approach also enables our algorithm to estimate the false discovery rate (FDR) of the feature selection step. We validated FFId based on a public benchmark data set, comprising a yeast cell lysate spiked with protein standards that provide a known ground-truth. The algorithm reached almost complete (>99%) quantification coverage for the full set of peptides identified at 1% FDR (PSM level). Compared with other software solutions for label-free quantification, this is an outstanding result, which was achieved at competitive quantification accuracy and reproducibility across replicates. The FDR for the feature selection was estimated at a low 1.5% on average per sample (3% for features inferred from external peptide IDs). The FFId software is open-source and freely available as part of OpenMS ( www.openms.org ).

  13. Task representation in individual and joint settings

    PubMed Central

    Prinz, Wolfgang

    2015-01-01

    This paper outlines a framework for task representation and discusses applications to interference tasks in individual and joint settings. The framework is derived from the Theory of Event Coding (TEC). This theory regards task sets as transient assemblies of event codes in which stimulus and response codes interact and shape each other in particular ways. On the one hand, stimulus and response codes compete with each other within their respective subsets (horizontal interactions). On the other hand, stimulus and response codes cooperate with each other (vertical interactions). Code interactions instantiating competition and cooperation apply on two time scales: on-line performance (i.e., doing the task) and off-line implementation (i.e., setting the task). Interference arises when stimulus and response codes overlap in features that are irrelevant for stimulus identification, but relevant for response selection. To resolve this dilemma, the feature profiles of event codes may become restructured in various ways. The framework is applied to three kinds of interference paradigms. Special emphasis is given to joint settings where tasks are shared between two participants. Major conclusions derived from these applications include: (1) Response competition is the chief driver of interference. Likewise, different modes of response competition give rise to different patterns of interference; (2) The type of features in which stimulus and response codes overlap is also a crucial factor. Different types of such features likewise give rise to different patterns of interference; and (3) Task sets for joint settings conflate intraindividual conflicts between responses (what) with interindividual conflicts between responding agents (whom). Features of response codes may, therefore, not only address responses, but also responding agents (both physically and socially). PMID:26029085

  14. Fixed versus mixed RSA: Explaining visual representations by fixed and mixed feature sets from shallow and deep computational models.

    PubMed

    Khaligh-Razavi, Seyed-Mahdi; Henriksson, Linda; Kay, Kendrick; Kriegeskorte, Nikolaus

    2017-02-01

    Studies of the primate visual system have begun to test a wide range of complex computational object-vision models. Realistic models have many parameters, which in practice cannot be fitted using the limited amounts of brain-activity data typically available. Task performance optimization (e.g. using backpropagation to train neural networks) provides major constraints for fitting parameters and discovering nonlinear representational features appropriate for the task (e.g. object classification). Model representations can be compared to brain representations in terms of the representational dissimilarities they predict for an image set. This method, called representational similarity analysis (RSA), enables us to test the representational feature space as is (fixed RSA) or to fit a linear transformation that mixes the nonlinear model features so as to best explain a cortical area's representational space (mixed RSA). Like voxel/population-receptive-field modelling, mixed RSA uses a training set (different stimuli) to fit one weight per model feature and response channel (voxels here), so as to best predict the response profile across images for each response channel. We analysed response patterns elicited by natural images, which were measured with functional magnetic resonance imaging (fMRI). We found that early visual areas were best accounted for by shallow models, such as a Gabor wavelet pyramid (GWP). The GWP model performed similarly with and without mixing, suggesting that the original features already approximated the representational space, obviating the need for mixing. However, a higher ventral-stream visual representation (lateral occipital region) was best explained by the higher layers of a deep convolutional network and mixing of its feature set was essential for this model to explain the representation. We suspect that mixing was essential because the convolutional network had been trained to discriminate a set of 1000 categories, whose frequencies in the training set did not match their frequencies in natural experience or their behavioural importance. The latter factors might determine the representational prominence of semantic dimensions in higher-level ventral-stream areas. Our results demonstrate the benefits of testing both the specific representational hypothesis expressed by a model's original feature space and the hypothesis space generated by linear transformations of that feature space.
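
    The fixed/mixed distinction can be illustrated numerically: fixed RSA correlates the model RDM with the brain RDM as-is, while mixed RSA first fits one (ridge-regularized) weight per model feature and voxel on training stimuli and builds the RDM from the predicted responses. The sketch below uses fully synthetic data and scipy/scikit-learn as stand-ins for the authors' analysis code.

        # Fixed vs. mixed RSA sketch on synthetic model features and voxel responses.
        import numpy as np
        from scipy.spatial.distance import pdist
        from scipy.stats import spearmanr
        from sklearn.linear_model import Ridge

        rng = np.random.default_rng(0)
        n_train, n_test, n_feat, n_vox = 60, 40, 50, 200
        feats = rng.normal(size=(n_train + n_test, n_feat))       # model features per image
        brain = feats @ rng.normal(size=(n_feat, n_vox)) + 2.0 * rng.normal(size=(n_train + n_test, n_vox))

        tr, te = slice(0, n_train), slice(n_train, None)
        rdm = lambda resp: pdist(resp, metric="correlation")      # representational dissimilarities

        fixed_r = spearmanr(rdm(feats[te]), rdm(brain[te]))[0]    # fixed RSA: raw feature space
        ridge = Ridge(alpha=10.0).fit(feats[tr], brain[tr])       # mixed RSA: fit linear mixing
        mixed_r = spearmanr(rdm(ridge.predict(feats[te])), rdm(brain[te]))[0]

        print(f"fixed RSA rho = {fixed_r:.2f}, mixed RSA rho = {mixed_r:.2f}")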

  15. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Honorio, J.; Goldstein, R.; Honorio, J.

    We propose a simple, well-grounded classification technique which is suited for group classification on brain fMRI data sets that have high dimensionality, a small number of subjects, a high noise level, high subject variability and imperfect registration, and that capture subtle cognitive effects. We propose threshold-split region as a new feature selection method and majority vote as the classification technique. Our method does not require a predefined set of regions of interest. We use averages across sessions, only one feature per experimental condition, a feature independence assumption, and simple classifiers. The seemingly counter-intuitive approach of using a simple design is supported by signal processing and statistical theory. Experimental results on two block-design data sets that capture brain function under distinct monetary rewards for cocaine-addicted and control subjects show that our method exhibits increased generalization accuracy compared to commonly used feature selection and classification techniques.

  16. Exploring the boundary between a siphon and barometer in a hypobaric chamber

    PubMed Central

    Hughes, Stephen; Gurung, Som

    2014-01-01

    Siphons have been used since ancient times, but exactly how they work is still a matter of debate. In order to elucidate the modus operandi of a siphon, a 1.5 m high siphon was set up in a hypobaric chamber to explore siphon behaviour in a low-pressure environment. When the pressure in the chamber was reduced to about 0.18 atmospheres, a curious waterfall-like feature appeared downstream from the apex of the siphon. A hypothesis is presented to explain the waterfall phenomenon. When the pressure was reduced further the siphon broke into two columns - in effect becoming two back-to-back barometers. This experiment demonstrates the role of atmospheric pressure in explaining the hydrostatic characteristics of a siphon and the role of molecular cohesion in explaining the hydrodynamic aspects. PMID:24751967

  17. Barrier island morphodynamic classification based on lidar metrics for north Assateague Island, Maryland

    USGS Publications Warehouse

    Brock, John C.; Krabill, William; Sallenger, Asbury H.

    2004-01-01

    In order to reap the potential of airborne lidar surveys to provide geological information useful in understanding coastal sedimentary processes acting on various time scales, a new set of analysis methods is needed. This paper presents a multi-temporal lidar analysis of north Assateague Island, Maryland, and demonstrates the calculation of lidar metrics that condense barrier island morphology and morphological change into attributed linear features that may be used to analyze trends in coastal evolution. The new methods proposed in this paper are also of significant practical value, because lidar metric analysis reduces large volumes of point elevations into linear features attributed with essential morphological variables that are ideally suited for inclusion in Geographic Information Systems. A morphodynamic classification of north Assateague Island for a recent 10-month time period, based on the recognition of simple patterns described by lidar change metrics, is presented. Such morphodynamic classification reveals the relative magnitude and the fine-scale alongshore variation in the importance of coastal changes over the study area during a defined time period. More generally, through the presentation of this morphodynamic classification of north Assateague Island, the value of lidar metrics both in examining large lidar data sets for coherent trends and in building hypotheses regarding processes driving barrier evolution is demonstrated.

  18. Geochemical features of the ore-bearing medium in uranium deposits in the Khiagda ore field

    NASA Astrophysics Data System (ADS)

    Kochkin, B. T.; Solodov, I. N.; Ganina, N. I.; Rekun, M. L.; Tarasov, N. N.; Shugina, G. A.; Shulik, L. S.

    2017-09-01

    The Neogene uranium deposits of the Khiagda ore field (KOF) belong to the paleovalley variety of the hydrogenic type and differ from other deposits of this genetic type in their geological and geochemical localization conditions. The contemporary hydrogeochemical setting and microbiological composition of the ore-bearing medium are discussed. The redox potential of the medium (Eh as low as −400 mV) is much lower than those established at other hydrogenic deposits, both ancient Late Mesozoic and young Late Alpine, studied with the same methods in Russia, Uzbekistan, and southern Kazakhstan. The pH of subsurface water (6.86-8.13) shows significant fluctuations both between neighboring deposits and within individual ore lodes. Hydrogen-forming and denitrifying bacteria are predominant in the microbiological populations, whereas sulfate-reducing bacteria show low activity. Consideration of these factors allowed us to describe the mechanism of uranium ore conservation as resulting from the development of the cryolithic zone, which isolates ore lodes from the effect of the external medium. Carbonated water supplied from the basement along fault zones also participates in the formation of the present-day hydrogeochemical setting. Based on the features of the ore-bearing medium, we propose a method of borehole in situ acid leaching to increase the efficiency of mining in the Khiagda ore field.

  19. A neural network for noise correlation classification

    NASA Astrophysics Data System (ADS)

    Paitz, Patrick; Gokhberg, Alexey; Fichtner, Andreas

    2018-02-01

    We present an artificial neural network (ANN) for the classification of ambient seismic noise correlations into two categories, suitable and unsuitable for noise tomography. By using only a small manually classified data subset for network training, the ANN allows us to classify large data volumes with low human effort and to encode the valuable subjective experience of data analysts that cannot be captured by a deterministic algorithm. Based on a new feature extraction procedure that exploits the wavelet-like nature of seismic time series, we efficiently reduce the dimensionality of noise correlation data while keeping the relevant features needed for automated classification. Using global- and regional-scale data sets, we show that classification errors of 20 per cent or less can be achieved when the network training is performed with as little as 3.5 per cent and 16 per cent of the data sets, respectively. Furthermore, the ANN trained on the regional data can be applied to the global data, and vice versa, without a significant increase of the classification error. An experiment in which four students manually classified the data revealed that the classification error they would assign to each other is substantially larger than the classification error of the ANN (>35 per cent). This indicates that reproducibility would be hampered more by human subjectivity than by imperfections of the ANN.
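
    A minimal sketch of the workflow described above, assuming a discrete wavelet transform for the dimensionality reduction and a small scikit-learn multilayer perceptron as the classifier; the wavelet family, decomposition level, network size, and synthetic data are assumptions, not the authors' settings.

    # Hedged sketch: wavelet coefficients as reduced features for a small classifier.
    import numpy as np
    import pywt
    from sklearn.neural_network import MLPClassifier

    def wavelet_features(trace, wavelet="db4", level=5):
        """Keep only the coarse approximation coefficients as a reduced feature vector."""
        return pywt.wavedec(trace, wavelet, level=level)[0]

    # Synthetic stand-in for manually labelled correlations (1 = usable, 0 = not);
    # the labels here are random placeholders, so the score is only a smoke test.
    rng = np.random.default_rng(1)
    traces = rng.standard_normal((200, 2048))
    labels = rng.integers(0, 2, 200)

    X = np.array([wavelet_features(t) for t in traces])
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
    clf.fit(X[:30], labels[:30])            # train on a small labelled subset
    print(clf.score(X[30:], labels[30:]))   # classify the rest automatically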

  20. Reduced-order model for underwater target identification using proper orthogonal decomposition

    NASA Astrophysics Data System (ADS)

    Ramesh, Sai Sudha; Lim, Kian Meng

    2017-03-01

    Research on underwater acoustics has seen major development over the past decade due to its widespread applications in domains such as underwater communication and navigation (SONAR), seismic exploration, and oceanography. In particular, acoustic signatures from partially or fully buried targets can be used in the identification of buried mines for mine countermeasures (MCM). Although there exist several techniques to identify target properties based on SONAR images and acoustic signatures, these methods first employ a feature extraction method to represent the dominant characteristics of a data set, followed by the use of an appropriate classifier based on neural networks or the relevance vector machine. The aim of the present study is to demonstrate the application of the proper orthogonal decomposition (POD) technique in capturing dominant features of a set of scattered pressure signals, and the subsequent use of the POD modes and coefficients in the identification of partially buried underwater target parameters such as location, size, and material density. Several numerical examples are presented to demonstrate the performance of the system identification method based on POD. Although the present study is based on a 2D acoustic model, the method can be easily extended to 3D models and thereby enables cost-effective representations of large-scale data.
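
    The POD step itself reduces to a singular value decomposition of a mean-subtracted snapshot matrix. The sketch below is a generic construction of this kind, not the paper's specific parameterization; the placeholder pressure data are random.

    # Minimal POD sketch via SVD of a snapshot matrix.
    import numpy as np

    def pod_basis(snapshots, n_modes):
        """snapshots: (n_samples, n_sensors) matrix of scattered-pressure signals."""
        mean = snapshots.mean(axis=0)
        X = snapshots - mean
        # Rows are snapshots, so the right singular vectors give the spatial POD modes
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        modes = Vt[:n_modes]                       # dominant spatial modes
        coeffs = X @ modes.T                       # coefficients of each snapshot
        energy = s[:n_modes] ** 2 / np.sum(s ** 2) # variance captured per mode
        return mean, modes, coeffs, energy

    rng = np.random.default_rng(2)
    snapshots = rng.standard_normal((300, 128))    # placeholder pressure data
    mean, modes, coeffs, energy = pod_basis(snapshots, n_modes=10)
    print(energy.sum())   # fraction of variance captured by the retained modes
    # 'coeffs' would then feed the parameter-identification step (location, size, density).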

  1. Machine learning-based diagnosis of melanoma using macro images.

    PubMed

    Gautam, Diwakar; Ahmed, Mushtaq; Meena, Yogesh Kumar; Ul Haq, Ahtesham

    2018-05-01

    Cancer poses a grave threat to human society. Melanoma, the skin cancer, originates in the skin layers and penetrates deep into the subcutaneous layers. There is extensive research on melanoma diagnosis using dermatoscopic images captured through a dermatoscope. While designing a diagnostic model for general handheld imaging systems is an emerging trend, this article proposes a computer-aided decision support system for macro images captured by a general-purpose camera. General imaging conditions are adversely affected by nonuniform illumination, which further affects the extraction of relevant information. To mitigate this, we process an image to define a smooth illumination surface using a multistage illumination compensation approach, and the infected region is extracted using the proposed multimode segmentation method. The lesion information is enumerated as a feature set comprising geometry, photometry, border series, and texture measures. The redundancy in the feature set is reduced using information-theoretic methods, and a classification boundary is modeled to distinguish benign and malignant samples using support vector machine, random forest, neural network, and fast discriminative mixed-membership-based naive Bayesian classifiers. Moreover, the experimental outcome is supported by hypothesis testing and boxplot representation of the classification losses. The simulation results prove the significance of the proposed model, which shows improved performance compared with competing approaches. Copyright © 2017 John Wiley & Sons, Ltd.
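
    A hedged sketch of the feature-reduction and classification stage only (segmentation and the exact feature list are outside its scope), using mutual information as the information-theoretic criterion and two of the classifiers named above; the placeholder data and parameter choices are assumptions.

    import numpy as np
    from sklearn.feature_selection import SelectKBest, mutual_info_classif
    from sklearn.svm import SVC
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(3)
    X = rng.standard_normal((200, 60))   # placeholder lesion features (geometry, texture, ...)
    y = rng.integers(0, 2, 200)          # 0 = benign, 1 = malignant (placeholder labels)

    # Information-theoretic redundancy reduction followed by two of the classifiers named above
    for clf in (SVC(kernel="rbf"), RandomForestClassifier(n_estimators=200, random_state=0)):
        pipe = make_pipeline(StandardScaler(),
                             SelectKBest(mutual_info_classif, k=20),
                             clf)
        print(type(clf).__name__, cross_val_score(pipe, X, y, cv=5).mean())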

  2. Dose assessment of digital tomosynthesis in pediatric imaging

    NASA Astrophysics Data System (ADS)

    Gislason, Amber; Elbakri, Idris A.; Reed, Martin

    2009-02-01

    We investigated the potential for digital tomosynthesis (DT) to reduce pediatric x-ray dose while maintaining image quality. We utilized the DT feature (VolumeRadTM) on the GE DefiniumTM 8000 flat panel system installed in the Winnipeg Children's Hospital. Facial bones, cervical spine, thoracic spine, and knee of children aged 5, 10, and 15 years were represented by acrylic phantoms for DT dose measurements. Effective dose was estimated for DT and for corresponding digital radiography (DR) and computed tomography (CT) patient image sets. Anthropomorphic phantoms of selected body parts were imaged by DR, DT, and CT. Pediatric radiologists rated visualization of selected anatomic features in these images. Dose and image quality comparisons between DR, DT, and CT determined the usefulness of tomosynthesis for pediatric imaging. CT effective dose was highest; total DR effective dose was not always lowest, depending on how many projections were in the DR image set. For the cervical spine, DT dose was close to and occasionally lower than DR dose. Expert radiologists rated visibility of the central facial complex in a skull phantom as better with DT than with DR and comparable to CT. Digital tomosynthesis has a significantly lower dose than CT. This study has demonstrated that DT shows promise to replace CT for some facial bone and spinal diagnoses. Other clinical applications will be evaluated in the future.

  3. An Evolving Worldview: Making Open Source Easy

    NASA Technical Reports Server (NTRS)

    Rice, Zachary

    2017-01-01

    NASA Worldview is an interactive interface for browsing full-resolution, global satellite imagery. Worldview supports an open data policy so that academia, private industry, and the general public can use NASA's satellite data to address Earth science related issues. Worldview was open sourced in 2014. By shifting to an open source approach, the Worldview application has evolved to better serve end-users. Project developers are able to have discussions with end-users and community developers to understand issues and develop new features. New developers are able to track upcoming features, collaborate on them, and make their own contributions. Getting new developers to contribute to the project has been one of the most important and difficult aspects of open sourcing Worldview. We have focused on making the installation of Worldview simple to reduce the initial learning curve and make contributing code easy. One way we have addressed this is through a simplified setup process. Our setup documentation includes a set of prerequisites and a set of straightforward commands to clone, configure, install, and run. This presentation will emphasize our focus on simplifying and standardizing Worldview's open source code so more people are able to contribute. The more people who contribute, the better the application will become over time.

  4. A Non-Parametric Approach for the Activation Detection of Block Design fMRI Simulated Data Using Self-Organizing Maps and Support Vector Machine.

    PubMed

    Bahrami, Sheyda; Shamsi, Mousa

    2017-01-01

    Functional magnetic resonance imaging (fMRI) is a popular method to probe the functional organization of the brain using hemodynamic responses. In this method, volume images of the entire brain are obtained with very good spatial resolution and low temporal resolution. However, they always suffer from high dimensionality when faced with classification algorithms. In this work, we combine a support vector machine (SVM) with a self-organizing map (SOM) to obtain a feature-based classification: the SOM is used for feature extraction and for labeling the data sets, and a linear-kernel SVM is then used for detecting the active areas. SOM has two major advantages: (i) it reduces the dimensionality of the data sets, lowering computational complexity, and (ii) it is useful for identifying brain regions with small onset differences in hemodynamic responses. Our non-parametric model is compared with parametric and non-parametric methods. We use simulated fMRI data sets with block design inputs in this paper and consider a contrast-to-noise ratio (CNR) of 0.6 for the simulated data sets, which have 1-4% contrast in active areas. The accuracy of our proposed method is 93.63% and the error rate is 6.37%.
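
    A minimal sketch of the two-stage idea: a small self-organizing map maps each voxel time course to its best-matching unit, and a linear SVM then separates active from non-active voxels. The grid size, learning schedule, and synthetic data below are assumptions for illustration, not the authors' configuration.

    import numpy as np
    from sklearn.svm import LinearSVC

    def train_som(data, grid=(6, 6), iters=2000, lr0=0.5, sigma0=2.0, seed=0):
        """Very small numpy SOM: returns a (grid_x, grid_y, n_features) weight array."""
        rng = np.random.default_rng(seed)
        w = rng.standard_normal((grid[0], grid[1], data.shape[1]))
        gx, gy = np.meshgrid(np.arange(grid[0]), np.arange(grid[1]), indexing="ij")
        for t in range(iters):
            x = data[rng.integers(len(data))]
            d = np.linalg.norm(w - x, axis=2)
            bi, bj = np.unravel_index(np.argmin(d), d.shape)     # best-matching unit
            lr = lr0 * np.exp(-t / iters)
            sigma = sigma0 * np.exp(-t / iters)
            h = np.exp(-((gx - bi) ** 2 + (gy - bj) ** 2) / (2 * sigma ** 2))
            w += lr * h[..., None] * (x - w)
        return w

    def bmu_coords(w, data):
        """Map each sample to the grid coordinates of its best-matching unit."""
        d = np.linalg.norm(w[None] - data[:, None, None, :], axis=3)
        flat = d.reshape(len(data), -1).argmin(axis=1)
        return np.column_stack(np.unravel_index(flat, w.shape[:2]))

    rng = np.random.default_rng(4)
    ts = rng.standard_normal((500, 120))     # placeholder voxel time courses
    labels = rng.integers(0, 2, 500)         # placeholder active / non-active labels

    som = train_som(ts)
    features = bmu_coords(som, ts)           # reduced SOM features per voxel
    clf = LinearSVC(max_iter=10000).fit(features, labels)
    print(clf.score(features, labels))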

  5. AAPM/RSNA physics tutorial for residents: physics of flat-panel fluoroscopy systems: Survey of modern fluoroscopy imaging: flat-panel detectors versus image intensifiers and more.

    PubMed

    Nickoloff, Edward Lee

    2011-01-01

    This article reviews the design and operation of both flat-panel detector (FPD) and image intensifier fluoroscopy systems. The different components of each imaging chain and their functions are explained and compared. FPD systems have multiple advantages such as a smaller size, extended dynamic range, no spatial distortion, and greater stability. However, FPD systems typically have the same spatial resolution for all fields of view (FOVs) and are prone to ghosting. Image intensifier systems have better spatial resolution with the use of smaller FOVs (magnification modes) and tend to be less expensive. However, the spatial resolution of image intensifier systems is limited by the television system to which they are coupled. Moreover, image intensifier systems are degraded by glare, vignetting, spatial distortions, and defocusing effects. FPD systems do not have these problems. Some recent innovations to fluoroscopy systems include automated filtration, pulsed fluoroscopy, automatic positioning, dose-area product meters, and improved automatic dose rate control programs. Operator-selectable features may affect both the patient radiation dose and image quality; these selectable features include dose level setting, the FOV employed, fluoroscopic pulse rates, geometric factors, display software settings, and methods to reduce the imaging time. © RSNA, 2011.

  6. Methodological Issues in Predicting Pediatric Epilepsy Surgery Candidates Through Natural Language Processing and Machine Learning

    PubMed Central

    Cohen, Kevin Bretonnel; Glass, Benjamin; Greiner, Hansel M.; Holland-Bouley, Katherine; Standridge, Shannon; Arya, Ravindra; Faist, Robert; Morita, Diego; Mangano, Francesco; Connolly, Brian; Glauser, Tracy; Pestian, John

    2016-01-01

    Objective: We describe the development and evaluation of a system that uses machine learning and natural language processing techniques to identify potential candidates for surgical intervention for drug-resistant pediatric epilepsy. The data comprise free-text clinical notes extracted from the electronic health record (EHR). Both known clinical outcomes from the EHR and manual chart annotations provide gold standards for the patient’s status. The following hypotheses are then tested: 1) machine learning methods can identify epilepsy surgery candidates as well as physicians do and 2) machine learning methods can identify candidates earlier than physicians do. These hypotheses are tested by systematically evaluating the effects of the data source, amount of training data, class balance, classification algorithm, and feature set on classifier performance. The results support both hypotheses, with F-measures ranging from 0.71 to 0.82. The feature set, classification algorithm, amount of training data, class balance, and gold standard all significantly affected classification performance. It was further observed that classification performance was better than the highest agreement between two annotators, even at one year before documented surgery referral. The results demonstrate that such machine learning methods can contribute to predicting pediatric epilepsy surgery candidates and reducing lag time to surgery referral. PMID:27257386
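
    A hedged sketch of the general pipeline (free-text notes, feature extraction, classification, F-measure evaluation); the placeholder notes, TF-IDF features, and logistic regression classifier are stand-ins rather than the richer EHR data and feature sets evaluated in the study.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.model_selection import cross_val_score

    notes = [
        "frequent seizures despite two antiepileptic medications",   # placeholder notes
        "seizure free for twelve months on current regimen",
        "considering referral for presurgical evaluation",
        "no events reported since last visit",
    ] * 25
    labels = [1, 0, 1, 0] * 25    # 1 = potential surgery candidate (placeholder labels)

    pipe = make_pipeline(TfidfVectorizer(ngram_range=(1, 2), min_df=2),
                         LogisticRegression(max_iter=1000, class_weight="balanced"))
    scores = cross_val_score(pipe, notes, labels, cv=5, scoring="f1")
    print("F-measure: %.2f" % scores.mean())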

  7. Segmentation and texture analysis of structural biomarkers using neighborhood-clustering-based level set in MRI of the schizophrenic brain.

    PubMed

    Latha, Manohar; Kavitha, Ganesan

    2018-02-03

    Schizophrenia (SZ) is a psychiatric disorder that especially affects individuals during their adolescence. There is a need to study the subanatomical regions of the SZ brain on magnetic resonance images (MRI) based on morphometry. In this work, an attempt was made to analyze alterations in structure and texture patterns in images of the SZ brain using the level-set method and Laws texture features. T1-weighted MRI of the brain from the Center of Biomedical Research Excellence (COBRE) database were considered for analysis. Segmentation was carried out using the level-set method. Geometrical and Laws texture features were extracted from the segmented brain stem, corpus callosum, cerebellum, and ventricle regions to analyze pattern changes in SZ. The level-set method segmented multiple brain regions, with higher similarity and correlation values compared with an optimized method. The geometric features obtained from the corpus callosum and ventricle regions showed significant variation (p < 0.00001) between normal and SZ brains. Laws texture features identified a heterogeneous appearance in the brain stem, corpus callosum, and ventricular regions, and features from the brain stem were correlated with Positive and Negative Syndrome Scale (PANSS) score (p < 0.005). A framework of geometric and Laws texture features obtained from brain subregions can be used as a supplement for the diagnosis of psychiatric disorders.
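
    The Laws texture step can be sketched with the standard 1-D Laws kernels; the window size and the particular kernel combinations below are illustrative rather than the paper's settings, and the region of interest is a random placeholder.

    import numpy as np
    from scipy.ndimage import convolve, uniform_filter

    L5 = np.array([1, 4, 6, 4, 1], float)       # level
    E5 = np.array([-1, -2, 0, 2, 1], float)     # edge
    S5 = np.array([-1, 0, 2, 0, -1], float)     # spot
    R5 = np.array([1, -4, 6, -4, 1], float)     # ripple

    def laws_energy(image, window=15):
        """Return a dict of Laws texture energy maps for a 2-D image."""
        image = image - uniform_filter(image, size=window)   # remove local mean
        kernels = {"L5E5": np.outer(L5, E5), "E5S5": np.outer(E5, S5),
                   "S5S5": np.outer(S5, S5), "R5R5": np.outer(R5, R5)}
        return {name: uniform_filter(np.abs(convolve(image, k)), size=window)
                for name, k in kernels.items()}

    rng = np.random.default_rng(5)
    roi = rng.random((128, 128))                 # placeholder segmented brain region
    maps = laws_energy(roi)
    features = {name: m.mean() for name, m in maps.items()}   # one scalar per energy map
    print(features)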

  8. Multidimensional team-based intervention using musical cues to reduce odds of facility-acquired pressure ulcers in long-term care: a paired randomized intervention study.

    PubMed

    Yap, Tracey L; Kennerly, Susan M; Simmons, Mark R; Buncher, Charles R; Miller, Elaine; Kim, Jay; Yap, Winston Y

    2013-09-01

    To test the effectiveness of a pressure ulcer (PU) prevention intervention featuring musical cues to remind all long-term care (LTC) staff (nursing and ancillary) to help every resident move or reposition every 2 hours. Twelve-month paired-facility two-arm (with one-arm crossover) randomized intervention trial. Ten midwestern U.S. LTC facilities. Four treatment facilities received intervention during Months 1 to 12, four comparison facilities received intervention during Months 7 to 12, and two pseudo-control facilities received no intervention. LTC facility residents (N = 1,928). All facility staff received in-person education, video, and handouts, and visiting family members received informational pamphlets on PU prevention and an intervention featuring musical cues. Nurse-led multidisciplinary staff teams presented the cues as prompts for staff and family to reposition residents or remind them to move. Musical selections (with and without lyrics) customized to facility preferences were played daily over the facility intercom or public address system every 2 hours for the 12-hour daytime period. Primary outcome measure was the frequency of new facility-acquired PUs divided by the total number of facility Minimum Data Set (MDS) resident assessments conducted during the study period. Odds of a new PU were lower in intervention facilities (P = .08) for MDS 2.0 assessments and were significantly lower (P = .05) for MDS 3.0. Mean odds ratios suggested intervention facility residents were 45% less likely than comparison facility residents to develop a new PU. Customized musical cues that prompt multidisciplinary staff teams to encourage or enable movement of all residents hold promise for reducing facility-acquired PUs in LTC settings. © 2013, Copyright the Authors Journal compilation © 2013, The American Geriatrics Society.

  9. Does semantic preactivation reduce inattentional blindness?

    PubMed

    Kreitz, Carina; Schnuerch, Robert; Furley, Philip A; Gibbons, Henning; Memmert, Daniel

    2015-04-01

    We are susceptible to failures of awareness if a stimulus occurs unexpectedly and our attention is focused elsewhere. Such inattentional blindness is modulated by various parameters, including stimulus attributes, the observer's cognitive resources, and the observer's attentional set regarding the primary task. In three behavioral experiments with a total of 360 participants, we investigated whether mere semantic preactivation of the color of an unexpected object can reduce inattentional blindness. Neither explicitly mentioning the color several times before the occurrence of the unexpected stimulus nor priming the color more implicitly via color-related concepts could significantly reduce the susceptibility to inattentional blindness. Even putting the specific color concept in the main focus of the primary task did not lead to reduced inattentional blindness. Thus, we have shown that the failure to consciously perceive unexpected objects was not moderated by semantic preactivation of the objects' most prominent feature: its color. We suggest that this finding reflects the rather general principle that preactivations that are not motivationally relevant for one's current selection goals do not suffice to make an unexpected object overcome the threshold of awareness.

  10. Novel Data Reduction Based on Statistical Similarity

    DOE PAGES

    Lee, Dongeun; Sim, Alex; Choi, Jaesik; ...

    2016-07-18

    Applications such as scientific simulations and power grid monitoring are generating so much data, so quickly, that compression is essential to reduce storage requirements or transmission capacity. To achieve better compression, one is often willing to discard some repeated information. These lossy compression methods are primarily designed to minimize the Euclidean distance between the original data and the compressed data. But this measure of distance severely limits either reconstruction quality or compression performance. In this paper, we propose a new class of compression method by redefining the distance measure with a statistical concept known as exchangeability. This approach reduces the storage requirement while capturing the essential features of the data. We report our design and implementation of such a compression method named IDEALEM. To demonstrate its effectiveness, we apply it to a set of power grid monitoring data, and show that it can reduce the volume of data much more than the best known compression method while maintaining the quality of the compressed data. Finally, in these tests, IDEALEM captures extraordinary events in the data, while its compression ratios can far exceed 100.
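
    A toy illustration of the exchangeability idea (not the IDEALEM implementation): a block is stored only if no previously kept block is statistically similar to it, judged here with a two-sample Kolmogorov-Smirnov test as a stand-in similarity measure; block size and threshold are assumptions.

    import numpy as np
    from scipy.stats import ks_2samp

    def compress(signal, block=64, alpha=0.05):
        """Replace statistically similar blocks with references to stored blocks."""
        kept, encoded = [], []
        for start in range(0, len(signal) - block + 1, block):
            b = signal[start:start + block]
            match = next((i for i, k in enumerate(kept)
                          if ks_2samp(b, k).pvalue > alpha), None)
            if match is None:
                kept.append(b)
                encoded.append(("raw", len(kept) - 1))   # store the block itself
            else:
                encoded.append(("ref", match))           # store only an index
        return kept, encoded

    rng = np.random.default_rng(6)
    signal = np.concatenate([rng.normal(0, 1, 4096), rng.normal(5, 1, 1024)])
    kept, encoded = compress(signal)
    print("blocks:", len(encoded), "stored:", len(kept),
          "compression ratio ~", len(encoded) / len(kept))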

  11. Emotions at work: what is the link to patient and staff safety? Implications for nurse managers in the NHS.

    PubMed

    Smith, Pam; Pearson, Pauline H; Ross, Fiona

    2009-03-01

    This paper sets the discussion of emotions at work within the modern NHS and the current prioritisation of creating a safety culture within the service. The paper focuses on the work of students, frontline nurses and their managers drawing on recent studies of patient safety in the curriculum, and governance and incentives in the care of patients with complex long term conditions. The primary research featured in the paper combined a case study design with focus groups, interviews and observation. In the patient safety research the importance of physical and emotional safety emerged as a key finding both for users and professionals. In the governance and incentives research, risk emerged as a key concern for managers, frontline workers and users. The recognition of emotions and the importance of emotional labour at an individual and organizational level managed by emotionally intelligent leaders played an important role in promoting worker and patient safety and reducing workplace risk. Nurse managers need to be aware of the emotional complexities of their organizations in order to set up systems to support the emotional wellbeing of professionals and users which in turn ensures safety and reduces risk.

  12. Visual search, visual streams, and visual architectures.

    PubMed

    Green, M

    1991-10-01

    Most psychological, physiological, and computational models of early vision suggest that retinal information is divided into a parallel set of feature modules. The dominant theories of visual search assume that these modules form a "blackboard" architecture: a set of independent representations that communicate only through a central processor. A review of research shows that blackboard-based theories, such as feature-integration theory, cannot easily explain the existing data. The experimental evidence is more consistent with a "network" architecture, which stresses that: (1) feature modules are directly connected to one another, (2) features and their locations are represented together, (3) feature detection and integration are not distinct processing stages, and (4) no executive control process, such as focal attention, is needed to integrate features. Attention is not a spotlight that synthesizes objects from raw features. Instead, it is better to conceptualize attention as an aperture which masks irrelevant visual information.

  13. An ensemble method for extracting adverse drug events from social media.

    PubMed

    Liu, Jing; Zhao, Songzheng; Zhang, Xiaodi

    2016-06-01

    Because adverse drug events (ADEs) are a serious health problem and a leading cause of death, it is of vital importance to identify them correctly and in a timely manner. With the development of Web 2.0, social media has become a large data source for information on ADEs. The objective of this study is to develop a relation extraction system that uses natural language processing techniques to effectively distinguish between ADEs and non-ADEs in informal text on social media. We develop a feature-based approach that utilizes various lexical, syntactic, and semantic features. Information-gain-based feature selection is performed to address high-dimensional features. Then, we evaluate the effectiveness of four well-known kernel-based approaches (i.e., subset tree kernel, tree kernel, shortest dependency path kernel, and all-paths graph kernel) and several ensembles that are generated by adopting different combination methods (i.e., majority voting, weighted averaging, and stacked generalization). All of the approaches are tested using three data sets: two health-related discussion forums and one general social networking site (i.e., Twitter). When investigating the contribution of each feature subset, the feature-based approach attains the best area under the receiver operating characteristics curve (AUC) values, which are 78.6%, 72.2%, and 79.2% on the three data sets. When individual methods are used, we attain the best AUC values of 82.1%, 73.2%, and 77.0% using the subset tree kernel, shortest dependency path kernel, and feature-based approach on the three data sets, respectively. When using classifier ensembles, we achieve the best AUC values of 84.5%, 77.3%, and 84.5% on the three data sets, outperforming the baselines. Our experimental results indicate that ADE extraction from social media can benefit from feature selection. With respect to the effectiveness of different feature subsets, lexical features and semantic features can enhance the ADE extraction capability. Kernel-based approaches, which can avoid the feature sparsity issue, are well suited to address the ADE extraction problem. Combining different individual classifiers using suitable combination methods can further enhance the ADE extraction effectiveness. Copyright © 2016 Elsevier B.V. All rights reserved.
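
    A hedged sketch of one configuration discussed above: lexical (TF-IDF) features, information-gain-style selection, and a majority-voting ensemble. The kernel methods over parse trees are not reproduced, and the posts, labels, and parameters are placeholders.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.feature_selection import SelectKBest, mutual_info_classif
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import LinearSVC
    from sklearn.ensemble import RandomForestClassifier, VotingClassifier
    from sklearn.pipeline import make_pipeline

    posts = ["this drug gave me terrible headaches",        # placeholder posts
             "no side effects at all so far",
             "started vomiting after the second dose",
             "works great and i feel fine"] * 25
    labels = [1, 0, 1, 0] * 25                               # 1 = ADE mentioned (placeholder)

    ensemble = VotingClassifier([
        ("lr", LogisticRegression(max_iter=1000)),
        ("svm", LinearSVC()),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ], voting="hard")                                        # majority voting

    pipe = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                         SelectKBest(mutual_info_classif, k=20),
                         ensemble)
    pipe.fit(posts, labels)
    print(pipe.score(posts, labels))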

  14. Digital health technology and trauma: development of an app to standardize care.

    PubMed

    Hsu, Jeremy M

    2015-04-01

    Standardized practice results in less variation, thereby reducing errors and improving outcomes. Optimal trauma care is achieved through standardization, as is evidenced by the widespread adoption of the Advanced Trauma Life Support approach. The challenge for an individual institution is how to educate and promulgate these standardized processes widely and efficiently. In today's world, digital health technology must be considered in the process. The aim of this study was to describe the process of developing an app that includes standardized trauma algorithms. The objective of the app was to allow easy, real-time access to trauma algorithms and thereby reduce omissions and errors. A set of trauma algorithms, relevant to the local setting, was derived from the best available evidence. After obtaining grant funding, a collaborative endeavour was undertaken with an external specialist app development company. The process required 6 months to translate the existing trauma algorithms into an app. The app contains 32 separate trauma algorithms, each formatted as a single-page flow diagram. It utilizes specific smartphone features such as 'pinch to zoom', jump-words and pop-ups to allow rapid access to the desired information. Improvements in trauma care outcomes result from reducing variation. By incorporating digital health technology, a trauma app has been developed, allowing easy and intuitive access to evidence-based algorithms. © 2015 Royal Australasian College of Surgeons.

  15. Method to assess the temporal persistence of potential biometric features: Application to oculomotor, gait, face and brain structure databases

    PubMed Central

    Nixon, Mark S.; Komogortsev, Oleg V.

    2017-01-01

    We introduce the intraclass correlation coefficient (ICC) to the biometric community as an index of the temporal persistence, or stability, of a single biometric feature. It requires, as input, a feature on an interval or ratio scale that is reasonably normally distributed, and it can only be calculated if each subject is tested on 2 or more occasions. For a biometric system with multiple features available for selection, the ICC can be used to measure the relative stability of each feature. We show, for 14 distinct data sets (1 synthetic, 8 eye-movement-related, 2 gait-related, 2 face-recognition-related, and 1 brain-structure-related), that selecting the most stable features, based on the ICC, generally resulted in the best biometric performance. Analyses based on using only the most stable features produced superior Rank-1-Identification Rate (Rank-1-IR) performance in 12 of 14 databases (p = 0.0065, one-tailed), when compared to other sets of features, including the set of all features. For Equal Error Rate (EER), using a subset of only high-ICC features also produced superior performance in 12 of 14 databases (p = 0.0065, one-tailed). In general, then, for our databases, prescreening potential biometric features and choosing only highly reliable features yields better performance than choosing lower-ICC features or choosing all features combined. We also determined that, as the ICC of a group of features increases, the median of the genuine similarity score distribution increases and the spread of this distribution decreases. There were no statistically significant corresponding relationships for the impostor distributions. We believe that the ICC will find many uses in biometric research. In the case of eye-movement-driven biometrics, the use of reliable features, as measured by the ICC, allowed us to achieve authentication performance with EER = 2.01%, which was not possible before. PMID:28575030
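
    A minimal one-way ICC computation for a feature measured on two occasions per subject, using the standard ICC(1,1) formula; the paper's exact ICC variant and data are not reproduced here, and the two synthetic features below simply contrast a stable and an unstable case.

    import numpy as np

    def icc_oneway(x):
        """x: (n_subjects, k_sessions) array of one feature; returns ICC(1,1)."""
        n, k = x.shape
        grand = x.mean()
        ms_between = k * np.sum((x.mean(axis=1) - grand) ** 2) / (n - 1)
        ms_within = np.sum((x - x.mean(axis=1, keepdims=True)) ** 2) / (n * (k - 1))
        return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

    rng = np.random.default_rng(7)
    subject_effect = rng.normal(0, 1.0, (50, 1))              # stable, subject-specific part
    stable = subject_effect + rng.normal(0, 0.3, (50, 2))     # small session-to-session noise
    unstable = rng.normal(0, 1.0, (50, 2))                    # no subject-specific part
    print("stable feature ICC:   %.2f" % icc_oneway(stable))    # close to 1
    print("unstable feature ICC: %.2f" % icc_oneway(unstable))  # near 0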

  16. Method to assess the temporal persistence of potential biometric features: Application to oculomotor, gait, face and brain structure databases.

    PubMed

    Friedman, Lee; Nixon, Mark S; Komogortsev, Oleg V

    2017-01-01

    We introduce the intraclass correlation coefficient (ICC) to the biometric community as an index of the temporal persistence, or stability, of a single biometric feature. It requires, as input, a feature on an interval or ratio scale that is reasonably normally distributed, and it can only be calculated if each subject is tested on 2 or more occasions. For a biometric system with multiple features available for selection, the ICC can be used to measure the relative stability of each feature. We show, for 14 distinct data sets (1 synthetic, 8 eye-movement-related, 2 gait-related, 2 face-recognition-related, and 1 brain-structure-related), that selecting the most stable features, based on the ICC, generally resulted in the best biometric performance. Analyses based on using only the most stable features produced superior Rank-1-Identification Rate (Rank-1-IR) performance in 12 of 14 databases (p = 0.0065, one-tailed), when compared to other sets of features, including the set of all features. For Equal Error Rate (EER), using a subset of only high-ICC features also produced superior performance in 12 of 14 databases (p = 0.0065, one-tailed). In general, then, for our databases, prescreening potential biometric features and choosing only highly reliable features yields better performance than choosing lower-ICC features or choosing all features combined. We also determined that, as the ICC of a group of features increases, the median of the genuine similarity score distribution increases and the spread of this distribution decreases. There were no statistically significant corresponding relationships for the impostor distributions. We believe that the ICC will find many uses in biometric research. In the case of eye-movement-driven biometrics, the use of reliable features, as measured by the ICC, allowed us to achieve authentication performance with EER = 2.01%, which was not possible before.

  17. Stereo Image Dense Matching by Integrating Sift and Sgm Algorithm

    NASA Astrophysics Data System (ADS)

    Zhou, Y.; Song, Y.; Lu, J.

    2018-05-01

    Semi-global matching (SGM) performs dynamic programming by treating the different path directions equally. It does not consider the impact of different path directions on cost aggregation, and with the expansion of the disparity search range, the accuracy and efficiency of the algorithm decrease drastically. This paper presents a dense matching algorithm that integrates SIFT and SGM. It takes the successful match pairs found by SIFT as control points to direct the path in dynamic programming and to truncate error propagation. In addition, matching accuracy can be improved by using the gradient direction of the detected feature points to modify the weights of the paths in different directions. The experimental results based on the Middlebury stereo data sets and CE-3 lunar data sets demonstrate that the proposed algorithm can effectively cut off error propagation, reduce the disparity search range, and improve matching accuracy.
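
    A hedged OpenCV sketch of the general idea: sparse SIFT matches bound the disparity search range before dense semi-global matching. The paper additionally uses the matches as control points inside the dynamic programming and modifies path weights, which the stock StereoSGBM interface does not expose; the file names and parameters below are assumptions.

    import cv2
    import numpy as np

    left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)     # assumed rectified stereo pair
    right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

    sift = cv2.SIFT_create()
    kp_l, des_l = sift.detectAndCompute(left, None)
    kp_r, des_r = sift.detectAndCompute(right, None)

    matches = cv2.BFMatcher(cv2.NORM_L2).knnMatch(des_l, des_r, k=2)
    good = [m for m, n in matches if m.distance < 0.7 * n.distance]   # Lowe ratio test

    # Disparities of the sparse matches bound the dense search range
    disps = [kp_l[m.queryIdx].pt[0] - kp_r[m.trainIdx].pt[0] for m in good]
    d_min = int(np.floor(np.percentile(disps, 2)))
    d_max = int(np.ceil(np.percentile(disps, 98)))
    num_disp = max(16, int(np.ceil((d_max - d_min) / 16.0)) * 16)     # multiple of 16

    sgbm = cv2.StereoSGBM_create(minDisparity=d_min, numDisparities=num_disp,
                                 blockSize=5, P1=8 * 5 * 5, P2=32 * 5 * 5,
                                 uniquenessRatio=10, speckleWindowSize=100)
    disparity = sgbm.compute(left, right).astype(np.float32) / 16.0   # fixed-point to float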

  18. In-memory interconnect protocol configuration registers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cheng, Kevin Y.; Roberts, David A.

    Systems, apparatuses, and methods for moving the interconnect protocol configuration registers into the main memory space of a node. The region of memory used for storing the interconnect protocol configuration registers may also be made cacheable to reduce the latency of accesses to the interconnect protocol configuration registers. Interconnect protocol configuration registers which are used during a startup routine may be prefetched into the host's cache to make the startup routine more efficient. The interconnect protocol configuration registers for various interconnect protocols may include one or more of device capability tables, memory-side statistics (e.g., to support two-level memory data mapping decisions), advanced memory and interconnect features such as repair resources and routing tables, prefetching hints, error correcting code (ECC) bits, lists of device capabilities, set and store base address, capability, device ID, status, configuration, capabilities, and other settings.

  19. A Comparative Study of Workplace Bullying Among Public and Private Employees in Europe.

    PubMed

    Ariza-Montes, Antonio; Leal-Rodríguez, Antonio L; Leal-Millán, Antonio G

    2015-06-01

    Workplace bullying emerges from a set of individual, organizational, and contextual factors. The purpose of this article is hence to identify the influence of these factors among public and private employees. The study is carried out as a statistical-empirical cross-sectional study. The database used was obtained from the 5th European Working Conditions Survey 2010. The results reveal a common core with respect to the factors that determine workplace bullying. Despite this common base that integrates both models, the distinctive features of the harassed employee within the public sector relate to age, full-time work, the greater amount of night work associated with certain public service professions, and a lower level of motivation. The present work summarizes a set of implications and proposes that, under normal conditions, workplace bullying could be reduced if job demands are limited and job resources are increased.

  20. Non-rigid precession of magnetic stars

    NASA Astrophysics Data System (ADS)

    Lander, S. K.; Jones, D. I.

    2017-06-01

    Stars are, generically, rotating and magnetized objects with a misalignment between their magnetic and rotation axes. Since a magnetic field induces a permanent distortion to its host, it provides effective rigidity even to a fluid star, leading to bulk stellar motion that resembles free precession. This bulk motion is, however, accompanied by induced interior velocity and magnetic field perturbations, which are oscillatory on the precession time-scale. Extending previous work, we show that these quantities are described by a set of second-order perturbation equations featuring cross-terms scaling with the product of the magnetic and centrifugal distortions to the star. For the case of a background toroidal field, we reduce these to a set of differential equations in radial functions, and find a method for their solution. The resulting magnetic field and velocity perturbations show complex multipolar structure and are strongest towards the centre of the star.
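
    For orientation, a standard order-of-magnitude relation (not quoted in the abstract) connects the free-precession period of a magnetically distorted star to its spin period:

    P_\mathrm{prec} \approx \frac{P_\mathrm{spin}}{\epsilon_B \cos\chi},
    \qquad
    \epsilon_B \sim \frac{E_\mathrm{mag}}{|E_\mathrm{grav}|} \ll 1,

    where \chi is the angle between the magnetic and rotation axes and \epsilon_B is the magnetically induced ellipticity. The smallness of \epsilon_B is what makes the induced velocity and magnetic field perturbations slow, oscillating on the precession time-scale rather than the spin period.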
