Fast detection of vascular plaque in optical coherence tomography images using a reduced feature set
NASA Astrophysics Data System (ADS)
Prakash, Ammu; Ocana Macias, Mariano; Hewko, Mark; Sowa, Michael; Sherif, Sherif
2018-03-01
Vascular plaque can be detected in optical coherence tomography (OCT) images by using the full set of 26 Haralick textural features and a standard K-means clustering algorithm. However, computing the full set of 26 textural features is expensive and may not be feasible for real-time implementation. In this work, we identified a reduced set of 3 textural features that characterizes vascular plaque and used a generalized Fuzzy C-means clustering algorithm. Our work involves three steps: 1) the reduction of the full set of 26 textural features to a reduced set of 3 using a genetic algorithm (GA) optimization method, 2) the implementation of an unsupervised generalized clustering algorithm (Fuzzy C-means) on the reduced feature space, and 3) the validation of our results using histology and actual photographic images of vascular plaque. Our results show an excellent match with histology and actual photographic images of vascular tissue. Therefore, our results could provide an efficient pre-clinical tool for the detection of vascular plaque in real-time OCT imaging.
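To make the pipeline concrete, here is a minimal Python sketch of GA-driven subset selection followed by fuzzy C-means clustering. The random data standing in for the 26 Haralick features, the fitness measure (fuzzy partition coefficient), and all GA settings are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def fuzzy_cmeans(X, c=2, m=2.0, iters=100, seed=0):
    """Minimal fuzzy c-means; returns memberships U (n x c) and centers."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(iters):
        W = U ** m                                      # fuzzified memberships
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U = d ** (-2.0 / (m - 1))                       # standard FCM update
        U /= U.sum(axis=1, keepdims=True)
    return U, centers

def partition_coefficient(U):
    """Fuzzy partition coefficient; higher means a crisper clustering."""
    return (U ** 2).sum() / U.shape[0]

def ga_select(X, k=3, pop=20, gens=30, seed=0):
    """Toy GA over k-feature subsets, scored by clustering quality."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    def fitness(idx):
        U, _ = fuzzy_cmeans(X[:, idx])
        return partition_coefficient(U)
    population = [rng.choice(n_feat, k, replace=False) for _ in range(pop)]
    for _ in range(gens):
        order = np.argsort([fitness(ind) for ind in population])[::-1]
        parents = [population[i] for i in order[: pop // 2]]
        children = []
        for p in parents:
            child = p.copy()
            child[rng.integers(k)] = rng.integers(n_feat)   # point mutation
            # keep the child only if its k indices stayed distinct
            children.append(child if len(np.unique(child)) == k else p)
        population = parents + children
    return np.sort(max(population, key=fitness))

X = np.random.default_rng(1).random((200, 26))  # stand-in for 26 Haralick features
print("selected features:", ga_select(X))
```

The partition coefficient is only one possible GA fitness; any cluster-validity index computed on the candidate feature subset would slot into the same loop.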
Full-Text Searching on Major Supermarket Systems: Dialog, Data-Star, and Nexis.
ERIC Educational Resources Information Center
Tenopir, Carol; Berglund, Sharon
1993-01-01
Examines the similarities, differences, and full-text features of the three most-used online systems for full-text searching in general libraries: DIALOG, Data-Star, and NEXIS. Overlapping databases, unique sources, search features, proximity operators, set building, language enhancement and word equivalencies, and display features are discussed.…
Feature Selection for Ridge Regression with Provable Guarantees.
Paul, Saurabh; Drineas, Petros
2016-04-01
We introduce single-set spectral sparsification as a deterministic sampling-based feature selection technique for regularized least-squares classification, which is the classification analog to ridge regression. The method is unsupervised and gives worst-case guarantees of the generalization power of the classification function after feature selection with respect to the classification function obtained using all features. We also introduce leverage-score sampling as an unsupervised randomized feature selection method for ridge regression. We provide risk bounds for both single-set spectral sparsification and leverage-score sampling on ridge regression in the fixed design setting and show that the risk in the sampled space is comparable to the risk in the full-feature space. We perform experiments on synthetic and real-world data sets (a subset of the TechTC-300 data sets) to support our theory. Experimental results indicate that the proposed methods perform better than the existing feature selection methods.
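A brief sketch of leverage-score feature sampling for ridge regression, assuming scikit-learn and synthetic stand-in data; the paper's exact sampling distribution and column rescaling are not reproduced here:

```python
import numpy as np
from sklearn.linear_model import Ridge

def leverage_score_feature_sample(X, n_keep, k=None, seed=0):
    """Sample feature columns with probability proportional to their
    leverage scores, computed from the top-k right singular vectors."""
    rng = np.random.default_rng(seed)
    k = k or min(X.shape) // 2
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    scores = (Vt[:k] ** 2).sum(axis=0)      # column leverage scores
    probs = scores / scores.sum()
    return rng.choice(X.shape[1], size=n_keep, replace=False, p=probs)

rng = np.random.default_rng(0)
X, y = rng.random((100, 50)), rng.random(100)
cols = leverage_score_feature_sample(X, n_keep=10)
model = Ridge(alpha=1.0).fit(X[:, cols], y)
print("ridge fit on sampled features, R^2 =", model.score(X[:, cols], y))
```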
NASA Astrophysics Data System (ADS)
Rees, S. J.; Jones, Bryan F.
1992-11-01
Once feature extraction has occurred in a processed image, the recognition problem becomes one of defining a set of features which maps sufficiently well onto one of the defined shape/object models to permit a claimed recognition. This process is usually handled by aggregating features until a large enough weighting is obtained to claim membership, or until an adequate number of located features is matched to the reference set. A requirement has existed for an operator or measure capable of a more direct assessment of membership/occupancy between feature sets, particularly where the feature sets may be defective representations. Such feature set errors may be caused by noise, by overlapping of objects, and by partial obscuration of features. These problems occur at the point of acquisition: repairing the data would then assume a priori knowledge of the solution. The technique described in this paper offers a set-theoretic measure for partial occupancy, defined in terms of the set of minimum additions needed to permit full occupancy and the set of locations of occupancy if such additions are made. As is shown, this technique permits recognition of partial feature sets with quantifiable degrees of uncertainty. A solution to the problems of obscuration and overlapping is therefore available.
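The core set-theoretic idea can be illustrated in a few lines of Python, with hypothetical feature labels standing in for the geometric features (the real method works on located image features with spatial tolerances, not string labels):

```python
# Occupancy of a defective observed feature set against a reference model,
# with the "minimum additions" needed for full occupancy. Labels are made up.
reference = {"corner_A", "corner_B", "edge_AB", "edge_BC", "hole_1"}
observed = {"corner_A", "edge_AB", "hole_1", "noise_blob"}   # partial, noisy

additions = reference - observed        # minimum additions for full occupancy
occupancy = len(reference & observed) / len(reference)
print(f"occupancy = {occupancy:.2f}, missing = {sorted(additions)}")
```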
Optimizing Nanoscale Quantitative Optical Imaging of Subfield Scattering Targets
Henn, Mark-Alexander; Barnes, Bryan M.; Zhou, Hui; Sohn, Martin; Silver, Richard M.
2016-01-01
The full 3-D scattered field above finite sets of features has been shown to contain a continuum of spatial frequency information, and with novel optical microscopy techniques and electromagnetic modeling, deep-subwavelength geometrical parameters can be determined. Similarly, by using simulations, scattering geometries and experimental conditions can be established to tailor scattered fields that yield lower parametric uncertainties while decreasing the number of measurements and the area of such finite sets of features. Such optimized conditions are reported through quantitative optical imaging in 193 nm scatterfield microscopy using feature sets up to four times smaller in area than state-of-the-art critical dimension targets. PMID:27805660
Learning and Recognition of Clothing Genres From Full-Body Images.
Hidayati, Shintami C; You, Chuang-Wen; Cheng, Wen-Huang; Hua, Kai-Lung
2018-05-01
According to the theory of clothing design, the genres of clothes can be recognized based on a set of visually differentiable style elements, which exhibit salient features of visual appearance and reflect high-level fashion styles for better describing clothing genres. Instead of using less-discriminative low-level features or ambiguous keywords to identify clothing genres, we proposed a novel approach for automatically classifying clothing genres based on the visually differentiable style elements. A set of style elements that are crucial for recognizing specific visual styles of clothing genres was identified based on clothing design theory. In addition, the corresponding salient visual features of each style element were identified and formulated with variables that can be computationally derived with various computer vision algorithms. To evaluate the performance of our algorithm, a dataset containing 3250 full-body shots crawled from popular online stores was built. Recognition results show that our proposed algorithms achieved promising overall precision, recall, and F-score of 88.76%, 88.53%, and 88.64% for recognizing upperwear genres, and 88.21%, 88.17%, and 88.19% for recognizing lowerwear genres, respectively. The effectiveness of each style element and its visual features on recognizing clothing genres was demonstrated through a set of experiments involving different sets of style elements or features. In summary, our experimental results demonstrate the effectiveness of the proposed method in clothing genre recognition.
Waveform fitting and geometry analysis for full-waveform lidar feature extraction
NASA Astrophysics Data System (ADS)
Tsai, Fuan; Lai, Jhe-Syuan; Cheng, Yi-Hsiu
2016-10-01
This paper presents a systematic approach that integrates spline curve fitting and geometry analysis to extract full-waveform LiDAR features for land-cover classification. The cubic smoothing spline algorithm is used to fit the waveform curve of the received LiDAR signals. After that, the local peak locations of the waveform curve are detected using a second derivative method. According to the detected local peak locations, commonly used full-waveform features such as full width at half maximum (FWHM) and amplitude can then be obtained. In addition, the number of peaks, time difference between the first and last peaks, and the average amplitude are also considered as features of LiDAR waveforms with multiple returns. Based on the waveform geometry, dynamic time-warping (DTW) is applied to measure the waveform similarity. The sum of the absolute amplitude differences that remain after time-warping can be used as a similarity feature in a classification procedure. An airborne full-waveform LiDAR data set was used to test the performance of the developed feature extraction method for land-cover classification. Experimental results indicate that the developed spline curve-fitting algorithm and geometry analysis can extract helpful full-waveform LiDAR features to produce better land-cover classification than conventional LiDAR data and feature extraction methods. In particular, the multiple-return features and the dynamic time-warping index can improve the classification results significantly.
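A condensed sketch of the waveform-processing steps using SciPy on a synthetic two-return pulse. Here find_peaks and peak_widths stand in for the paper's second-derivative peak detection, and the smoothing factor is an assumption:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline
from scipy.signal import find_peaks, peak_widths

# Synthetic two-return waveform standing in for a received LiDAR signal.
t = np.linspace(0, 100, 512)
wave = (np.exp(-0.5 * ((t - 30) / 4) ** 2) +
        0.6 * np.exp(-0.5 * ((t - 62) / 6) ** 2) +
        0.02 * np.random.default_rng(0).standard_normal(t.size))

spline = UnivariateSpline(t, wave, s=0.5)   # cubic smoothing spline fit
smooth = spline(t)
peaks, _ = find_peaks(smooth, height=0.1)
fwhm = peak_widths(smooth, peaks, rel_height=0.5)[0] * (t[1] - t[0])

print("number of peaks:", len(peaks))
print("amplitudes:", smooth[peaks].round(3))
print("FWHM (same units as t):", fwhm.round(2))
print("first-to-last peak gap:", (t[peaks[-1]] - t[peaks[0]]).round(2))
```

The resulting peak count, amplitudes, FWHM values, and first-to-last peak gap correspond to the per-waveform features described above; a DTW distance between two such smoothed curves would supply the similarity feature.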
High-level intuitive features (HLIFs) for intuitive skin lesion description.
Amelard, Robert; Glaister, Jeffrey; Wong, Alexander; Clausi, David A
2015-03-01
A set of high-level intuitive features (HLIFs) is proposed to quantitatively describe melanoma in standard camera images. Melanoma is the deadliest form of skin cancer. With rising incidence rates and subjectivity in current clinical detection methods, there is a need for melanoma decision support systems. Feature extraction is a critical step in melanoma decision support systems. Existing feature sets for analyzing standard camera images are comprised of low-level features, which exist in high-dimensional feature spaces and limit the system's ability to convey intuitive diagnostic rationale. The proposed HLIFs were designed to model the ABCD criteria commonly used by dermatologists such that each HLIF represents a human-observable characteristic. As such, intuitive diagnostic rationale can be conveyed to the user. Experimental results show that concatenating the proposed HLIFs with a full low-level feature set increased classification accuracy, and that HLIFs were able to separate the data better than low-level features with statistical significance. An example of a graphical interface for providing intuitive rationale is given.
Perceptual quality estimation of H.264/AVC videos using reduced-reference and no-reference models
NASA Astrophysics Data System (ADS)
Shahid, Muhammad; Pandremmenou, Katerina; Kondi, Lisimachos P.; Rossholm, Andreas; Lövström, Benny
2016-09-01
Reduced-reference (RR) and no-reference (NR) models for video quality estimation, using features that account for the impact of coding artifacts, spatio-temporal complexity, and packet losses, are proposed. The purpose of this study is to analyze a number of potentially quality-relevant features in order to select the most suitable set of features for building the desired models. The proposed sets of features have not been used in the literature and some of the features are used for the first time in this study. The features are employed by the least absolute shrinkage and selection operator (LASSO), which selects only the most influential of them toward perceptual quality. For comparison, we apply feature selection in the complete feature sets and ridge regression on the reduced sets. The models are validated using a database of H.264/AVC encoded videos that were subjectively assessed for quality in an ITU-T compliant laboratory. We infer that just two features selected by RR LASSO and two bitstream-based features selected by NR LASSO are able to estimate perceptual quality with high accuracy, higher than that of ridge, which uses more features. The comparisons with competing works and two full-reference metrics also verify the superiority of our models.
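A minimal illustration of LASSO-driven feature selection versus ridge regression using scikit-learn; the synthetic features and target are stand-ins for the paper's quality-related features and subjective scores:

```python
import numpy as np
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.standard_normal((120, 30))            # stand-in quality-related features
y = X[:, 3] - 0.5 * X[:, 17] + 0.1 * rng.standard_normal(120)  # synthetic MOS

Xs = StandardScaler().fit_transform(X)
lasso = LassoCV(cv=5).fit(Xs, y)              # sparse: most coefficients -> 0
selected = np.flatnonzero(lasso.coef_)
ridge = RidgeCV().fit(Xs, y)                  # dense: keeps all features

print("LASSO kept features:", selected)
print("LASSO R^2:", round(lasso.score(Xs, y), 3))
print("ridge R^2:", round(ridge.score(Xs, y), 3))
```

The contrast mirrors the study's finding: the L1 penalty drives most coefficients exactly to zero, so LASSO delivers a compact feature set, whereas ridge shrinks but retains all features.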
Use of volumetric features for temporal comparison of mass lesions in full field digital mammograms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bozek, Jelena, E-mail: jelena.bozek@fer.hr; Grgic, Mislav; Kallenberg, Michiel
2014-02-15
Purpose: Temporal comparison of lesions might improve classification between benign and malignant lesions in full-field digital mammograms (FFDM). The authors compare the use of volumetric features for lesion classification, which are computed from dense tissue thickness maps, to the use of mammographic lesion area. Use of dense tissue thickness maps for lesion characterization is advantageous, since it results in lesion features that are invariant to acquisition parameters. Methods: The dataset used in the analysis consisted of 60 temporal mammogram pairs comprising 120 mediolateral oblique or craniocaudal views with a total of 65 lesions, of which 41 were benign and 24 malignant. The authors analyzed the performance of four volumetric features, area, and four other commonly used features obtained from temporal mammogram pairs, current mammograms, and prior mammograms. The authors evaluated the individual performance of all features and of different feature sets. The authors used linear discriminant analysis with leave-one-out cross validation to classify different feature sets. Results: Volumetric features from temporal mammogram pairs achieved the best individual performance, as measured by the area under the receiver operating characteristic curve (A_z value). Volume change (A_z = 0.88) achieved a higher A_z value than projected lesion area change (A_z = 0.78) in the temporal comparison of lesions. Best performance was achieved with a set consisting of features extracted from the current exam combined with four volumetric features representing changes with respect to the prior mammogram (A_z = 0.90). This was significantly better (p = 0.005) than the performance obtained using features from the current exam only (A_z = 0.77). Conclusions: Volumetric features from temporal mammogram pairs combined with features from the single exam significantly improve discrimination of benign and malignant lesions in FFDM mammograms compared to using only single-exam features. In the comparison with prior mammograms, use of volumetric change may lead to better performance than use of lesion area change.
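The evaluation protocol (linear discriminant analysis with leave-one-out cross validation, scored by A_z) can be sketched as follows; the simulated lesion features and labels are placeholders for the study's data:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 65                                    # stand-in for 65 lesions
X = rng.standard_normal((n, 9))           # e.g., 4 volumetric + area + 4 other
y = (X[:, 0] + 0.8 * X[:, 1] + rng.standard_normal(n)) > 0  # benign/malignant

scores = np.empty(n)
for train, test in LeaveOneOut().split(X):
    lda = LinearDiscriminantAnalysis().fit(X[train], y[train])
    scores[test] = lda.decision_function(X[test])

print("A_z (ROC AUC):", round(roc_auc_score(y, scores), 3))
```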
A Review of Feature Extraction Software for Microarray Gene Expression Data
Tan, Ching Siang; Ting, Wai Soon; Mohamad, Mohd Saberi; Chan, Weng Howe; Deris, Safaai; Ali Shah, Zuraini
2014-01-01
When gene expression data are too large to be processed, they are transformed into a reduced representation set of genes. Transforming large-scale gene expression data into a set of genes is called feature extraction. If the genes extracted are carefully chosen, this gene set can extract the relevant information from the large-scale gene expression data, allowing further analysis by using this reduced representation instead of the full size data. In this paper, we review numerous software applications that can be used for feature extraction. The software reviewed is mainly for Principal Component Analysis (PCA), Independent Component Analysis (ICA), Partial Least Squares (PLS), and Local Linear Embedding (LLE). A summary and sources of the software are provided in the last section for each feature extraction method. PMID:25250315
Dimensionality Reduction Through Classifier Ensembles
NASA Technical Reports Server (NTRS)
Oza, Nikunj C.; Tumer, Kagan; Norwig, Peter (Technical Monitor)
1999-01-01
In data mining, one often needs to analyze datasets with a very large number of attributes. Performing machine learning directly on such data sets is often impractical because of extensive run times, excessive complexity of the fitted model (often leading to overfitting), and the well-known "curse of dimensionality." In practice, to avoid such problems, feature selection and/or extraction are often used to reduce data dimensionality prior to the learning step. However, existing feature selection/extraction algorithms either evaluate features by their effectiveness across the entire data set or simply disregard class information altogether (e.g., principal component analysis). Furthermore, feature extraction algorithms such as principal components analysis create new features that are often meaningless to human users. In this article, we present input decimation, a method that provides "feature subsets" that are selected for their ability to discriminate among the classes. These features are subsequently used in ensembles of classifiers, yielding results superior to single classifiers, ensembles that use the full set of features, and ensembles based on principal component analysis on both real and synthetic datasets.
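A toy version of input decimation: per-class feature subsets are chosen by correlation with a one-vs-rest class indicator, and each subset feeds one ensemble member. The subset size and base classifier are arbitrary choices for illustration, not the article's configuration:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=400, n_features=40, n_informative=8,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

ensemble = []
for c in np.unique(y):
    ind = (y == c).astype(float)                 # one-vs-rest class indicator
    corr = np.abs([np.corrcoef(X[:, j], ind)[0, 1] for j in range(X.shape[1])])
    subset = np.argsort(corr)[::-1][:10]         # most discriminative features
    clf = LogisticRegression(max_iter=1000).fit(X[:, subset], y)
    ensemble.append((subset, clf))

# Combine members by averaging their class-probability outputs
proba = np.mean([clf.predict_proba(X[:, s]) for s, clf in ensemble], axis=0)
print("ensemble training accuracy:", (proba.argmax(axis=1) == y).mean())
```

Unlike principal components, each retained input is an original attribute, so the subsets stay interpretable to human users, which is the point of the method.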
Single and Multiple Object Tracking Using a Multi-Feature Joint Sparse Representation.
Hu, Weiming; Li, Wei; Zhang, Xiaoqin; Maybank, Stephen
2015-04-01
In this paper, we propose a tracking algorithm based on a multi-feature joint sparse representation. The templates for the sparse representation can include pixel values, textures, and edges. In the multi-feature joint optimization, noise or occlusion is dealt with using a set of trivial templates. A sparse weight constraint is introduced to dynamically select the relevant templates from the full set of templates. A variance ratio measure is adopted to adaptively adjust the weights of different features. The multi-feature template set is updated adaptively. We further propose an algorithm for tracking multi-objects with occlusion handling based on the multi-feature joint sparse reconstruction. The observation model based on sparse reconstruction automatically focuses on the visible parts of an occluded object by using the information in the trivial templates. The multi-object tracking is simplified into a joint Bayesian inference. The experimental results show the superiority of our algorithm over several state-of-the-art tracking algorithms.
A wavelet-based approach for a continuous analysis of phonovibrograms.
Unger, Jakob; Meyer, Tobias; Doellinger, Michael; Hecker, Dietmar J; Schick, Bernhard; Lohscheller, Joerg
2012-01-01
Recently, endoscopic high-speed laryngoscopy has been established for commercial use and constitutes a state-of-the-art technique to examine vocal fold dynamics. Despite overcoming many limitations of commonly applied stroboscopy, it has not yet gained widespread clinical application. A major drawback is a missing methodology for extracting valuable features to support visual assessment or computer-aided diagnosis. In this paper a compact and descriptive feature set is presented. The feature extraction routines are based on two-dimensional color graphs called phonovibrograms (PVG). These graphs contain the full spatio-temporal pattern of vocal fold dynamics and are therefore suited to derive features that comprehensively describe the vibration pattern of vocal folds. Within our approach, clinically relevant features such as glottal closure type, symmetry and periodicity are quantified in a set of 10 descriptive features. The suitability for classification tasks is shown using a clinical data set comprising 50 healthy and 50 paralytic subjects. A classification accuracy of 93.2% has been achieved.
How well does multiple OCR error correction generalize?
NASA Astrophysics Data System (ADS)
Lund, William B.; Ringger, Eric K.; Walker, Daniel D.
2013-12-01
As the digitization of historical documents, such as newspapers, becomes more common, the need of the archive patron for accurate digital text from those documents increases. Building on our earlier work, the contributions of this paper are: 1. demonstrating the applicability of novel methods for correcting optical character recognition (OCR) on disparate data sets, including a new synthetic training set; 2. enhancing the correction algorithm with novel features; and 3. assessing the data requirements of the correction learning method. First, we correct errors using conditional random fields (CRF) trained on synthetic training data sets in order to demonstrate the applicability of the methodology to unrelated test sets. Second, we show the strength of lexical features from the training sets on two unrelated test sets, yielding a relative reduction in word error rate (WER) on the test sets of 6.52%. New features capture the recurrence of hypothesis tokens and yield an additional relative reduction in WER of 2.30%. Further, we show that only 2.0% of the full training corpus of over 500,000 feature cases is needed to achieve correction results comparable to those using the entire training corpus, effectively reducing both the complexity of the training process and the learned correction model.
NASA Astrophysics Data System (ADS)
Wang, Shijun; Yao, Jianhua; Petrick, Nicholas A.; Summers, Ronald M.
2009-02-01
Colon cancer is the second leading cause of cancer-related deaths in the United States. Computed tomographic colonography (CTC) combined with a computer aided detection system provides a feasible approach for improving colonic polyp detection and increasing the use of CTC for colon cancer screening. To distinguish true polyps from false positives, various features extracted from polyp candidates have been proposed. Most of these features try to capture the shape information of polyp candidates or neighborhood knowledge about the surrounding structures (fold, colon wall, etc.). In this paper, we propose a new set of shape descriptors for polyp candidates based on statistical curvature information. These features, called histogram of curvature features, are rotation, translation and scale invariant and can be treated as complementing our existing feature set. Then, in order to make full use of the traditional features (defined as group A) and the new features (group B), which are highly heterogeneous, we employed a multiple kernel learning method based on semi-definite programming to identify an optimized classification kernel based on the combined set of features. We did a leave-one-patient-out test on a CTC dataset which contained scans from 50 patients (with 90 detections of 6-9 mm polyps). Experimental results show that a support vector machine (SVM) based on the combined feature set and the semi-definite optimization kernel achieved higher FROC performance compared to SVMs using the two groups of features separately. At a false positive per patient rate of 7, the sensitivity on 6-9 mm polyps using the combined features improved from 0.78 (Group A) and 0.73 (Group B) to 0.82 (p ≤ 0.01).
Janet, Jon Paul; Kulik, Heather J
2017-11-22
Machine learning (ML) of quantum mechanical properties shows promise for accelerating chemical discovery. For transition metal chemistry where accurate calculations are computationally costly and available training data sets are small, the molecular representation becomes a critical ingredient in ML model predictive accuracy. We introduce a series of revised autocorrelation functions (RACs) that encode relationships of the heuristic atomic properties (e.g., size, connectivity, and electronegativity) on a molecular graph. We alter the starting point, scope, and nature of the quantities evaluated in standard ACs to make these RACs amenable to inorganic chemistry. On an organic molecule set, we first demonstrate superior standard AC performance to other presently available topological descriptors for ML model training, with mean unsigned errors (MUEs) for atomization energies on set-aside test molecules as low as 6 kcal/mol. For inorganic chemistry, our RACs yield 1 kcal/mol ML MUEs on set-aside test molecules in spin-state splitting in comparison to 15-20× higher errors for feature sets that encode whole-molecule structural information. Systematic feature selection methods including univariate filtering, recursive feature elimination, and direct optimization (e.g., random forest and LASSO) are compared. Random-forest- or LASSO-selected subsets 4-5× smaller than the full RAC set produce sub- to 1 kcal/mol spin-splitting MUEs, with good transferability to metal-ligand bond length prediction (0.004-0.005 Å MUE) and redox potential on a smaller data set (0.2-0.3 eV MUE). Evaluation of feature selection results across property sets reveals the relative importance of local, electronic descriptors (e.g., electronegativity, atomic number) in spin-splitting and distal, steric effects in redox potential and bond lengths.
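A sketch of a standard autocorrelation descriptor on a molecular graph, which RACs generalize by altering the starting point, scope, and evaluated quantities. The toy 4-atom chain and the property values assigned to it are purely illustrative:

```python
import numpy as np
from collections import deque

def graph_distances(adj):
    """All-pairs shortest-path (hop) distances via BFS on an adjacency matrix."""
    n = len(adj)
    dist = np.full((n, n), -1, dtype=int)
    for s in range(n):
        dist[s, s] = 0
        q = deque([s])
        while q:
            u = q.popleft()
            for v in np.flatnonzero(adj[u]):
                if dist[s, v] < 0:
                    dist[s, v] = dist[s, u] + 1
                    q.append(v)
    return dist

def autocorrelation(adj, prop, d):
    """Standard AC: sum of P_i * P_j over atom pairs at graph distance d."""
    dist = graph_distances(adj)
    n = len(adj)
    return sum(prop[i] * prop[j]
               for i in range(n) for j in range(n) if dist[i, j] == d)

# Toy 4-atom chain with example Pauling electronegativities (C, N, C, O)
adj = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
chi = np.array([2.55, 3.04, 2.55, 3.44])
print([round(autocorrelation(adj, chi, d), 2) for d in range(3)])
```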
ERIC Educational Resources Information Center
Striefel, Sebastian; And Others
The review papers are a product of the 3-year project, "Functional Mainstreaming for Success," designed to develop a model for instructional mainstreaming of 162 handicapped children (3-6 years old) in community settings. The major feature of the project was development of a full reverse mainstreamed preschool program, which included…
TU-CD-BRB-01: Normal Lung CT Texture Features Improve Predictive Models for Radiation Pneumonitis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Krafft, S; The University of Texas Graduate School of Biomedical Sciences, Houston, TX; Briere, T
2015-06-15
Purpose: Existing normal tissue complication probability (NTCP) models for radiation pneumonitis (RP) traditionally rely on dosimetric and clinical data but are limited in terms of performance and generalizability. Extraction of pre-treatment image features provides a potential new category of data that can improve NTCP models for RP. We consider quantitative measures of total lung CT intensity and texture in a framework for prediction of RP. Methods: Available clinical and dosimetric data was collected for 198 NSCLC patients treated with definitive radiotherapy. Intensity- and texture-based image features were extracted from the T50 phase of the 4D-CT acquired for treatment planning. A total of 3888 features (15 clinical, 175 dosimetric, and 3698 image features) were gathered and considered candidate predictors for modeling of RP grade ≥3. A baseline logistic regression model with mean lung dose (MLD) was first considered. Additionally, a least absolute shrinkage and selection operator (LASSO) logistic regression was applied to the set of clinical and dosimetric features, and subsequently to the full set of clinical, dosimetric, and image features. Model performance was assessed by comparing area under the curve (AUC). Results: A simple logistic fit of MLD was an inadequate model of the data (AUC ~0.5). Including clinical and dosimetric parameters within the framework of the LASSO resulted in improved performance (AUC = 0.648). Analysis of the full cohort of clinical, dosimetric, and image features provided further and significant improvement in model performance (AUC = 0.727). Conclusions: To achieve significant gains in predictive modeling of RP, new categories of data should be considered in addition to clinical and dosimetric features. We have successfully incorporated CT image features into a framework for modeling RP and have demonstrated improved predictive performance. Validation and further investigation of CT image features in the context of RP NTCP modeling is warranted. This work was supported by the Rosalie B. Hite Fellowship in Cancer Research awarded to SPK.
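A hedged sketch of the LASSO logistic-regression step with scikit-learn; the simulated cohort, regularization strength, and cross-validation scheme are assumptions, not the study's configuration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n, p = 198, 300                        # stand-in: patients x (clin+dose+image)
X = rng.standard_normal((n, p))
y = (X[:, 0] + 0.7 * X[:, 5] + 1.5 * rng.standard_normal(n)) > 1.0  # RP >= 3

Xs = StandardScaler().fit_transform(X)
lasso_lr = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
prob = cross_val_predict(lasso_lr, Xs, y, cv=5, method="predict_proba")[:, 1]
print("cross-validated AUC:", round(roc_auc_score(y, prob), 3))
print("features kept:", np.count_nonzero(lasso_lr.fit(Xs, y).coef_))
```

The L1 penalty makes the coefficient vector sparse, so the same fit performs feature selection and prediction in one step, which is what allows a 3888-feature candidate pool to yield a compact model.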
Late summer sea ice segmentation with multi-polarisation SAR features in C- and X-band
NASA Astrophysics Data System (ADS)
Fors, A. S.; Brekke, C.; Doulgeris, A. P.; Eltoft, T.; Renner, A. H. H.; Gerland, S.
2015-09-01
In this study we investigate the potential of sea ice segmentation by C- and X-band multi-polarisation synthetic aperture radar (SAR) features during late summer. Five high-resolution satellite SAR scenes were recorded in the Fram Strait covering iceberg-fast first-year and old sea ice during a week with air temperatures varying around zero degrees Celsius. In situ data consisting of sea ice thickness, surface roughness and aerial photographs were collected during a helicopter flight at the site. Six polarimetric SAR features were extracted for each of the scenes. The ability of the individual SAR features to discriminate between sea ice types and their temporal consistency were examined. All SAR features were found to add value to sea ice type discrimination. Relative kurtosis, geometric brightness, cross-polarisation ratio and co-polarisation correlation angle were found to be temporally consistent in the investigated period, while co-polarisation ratio and co-polarisation correlation magnitude were found to be temporally inconsistent. An automatic feature-based segmentation algorithm was tested both for a full SAR feature set and for a reduced SAR feature set limited to temporally consistent features. In general, the algorithm produces a good late summer sea ice segmentation. Excluding temporally inconsistent SAR features improved the segmentation at air temperatures above zero degrees Celsius.
Orlandi, Silvia; Reyes Garcia, Carlos Alberto; Bandini, Andrea; Donzelli, Gianpaolo; Manfredi, Claudia
2016-11-01
Scientific and clinical advances in perinatology and neonatology have enhanced the chances of survival of preterm and very low weight neonates. Infant cry analysis is a suitable noninvasive complementary tool to assess the neurologic state of infants particularly important in the case of preterm neonates. This article aims at exploiting differences between full-term and preterm infant cry with robust automatic acoustical analysis and data mining techniques. Twenty-two acoustical parameters are estimated in more than 3000 cry units from cry recordings of 28 full-term and 10 preterm newborns. Feature extraction is performed through the BioVoice dedicated software tool, developed at the Biomedical Engineering Lab, University of Firenze, Italy. Classification and pattern recognition is based on genetic algorithms for the selection of the best attributes. Training is performed comparing four classifiers: Logistic Curve, Multilayer Perceptron, Support Vector Machine, and Random Forest and three different testing options: full training set, 10-fold cross-validation, and 66% split. Results show that the best feature set is made up by 10 parameters capable to assess differences between preterm and full-term newborns with about 87% of accuracy. Best results are obtained with the Random Forest method (receiver operating characteristic area, 0.94). These 10 cry features might convey important additional information to assist the clinical specialist in the diagnosis and follow-up of possible delays or disorders in the neurologic development due to premature birth in this extremely vulnerable population of patients. The proposed approach is a first step toward an automatic infant cry recognition system for fast and proper identification of risk in preterm babies. Copyright © 2016 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
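A simplified stand-in for the described pipeline: here random-forest importances replace the genetic-algorithm attribute search, and synthetic acoustic parameters replace BioVoice output; the numbers of cry units and parameters echo the abstract, everything else is assumed:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n, p = 3000, 22                      # stand-in: cry units x acoustic parameters
X = rng.standard_normal((n, p))
y = (X[:, 2] - 0.6 * X[:, 7] + rng.standard_normal(n)) > 0.5  # preterm vs term

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
top10 = np.argsort(rf.feature_importances_)[::-1][:10]  # stand-in for GA pick
acc = cross_val_score(RandomForestClassifier(n_estimators=200, random_state=0),
                      X[:, top10], y, cv=10).mean()
print("10-feature subset:", np.sort(top10))
print("10-fold CV accuracy:", round(acc, 3))
```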
Feature Selection and Pedestrian Detection Based on Sparse Representation.
Yao, Shihong; Wang, Tao; Shen, Weiming; Pan, Shaoming; Chong, Yanwen; Ding, Fei
2015-01-01
Pedestrian detection research is currently devoted to the extraction of effective pedestrian features, which has become one of the obstacles to practical application because of the variety of pedestrian features and their large dimensionality. Based on a theoretical analysis of six frequently used features (SIFT, SURF, Haar, HOG, LBP and LSS) and a comparison of experimental results, this paper screens out sparse feature subsets via sparse representation to investigate whether the sparse subsets have the same descriptive ability and which features are most stable. When any two of the six features are fused, the fused feature is sparsely represented to obtain its important components. Sparse subsets of the fused features can be generated rapidly by avoiding calculation of the corresponding index of dimension numbers of these feature descriptors; thus, the speed of feature dimension reduction is improved and the pedestrian detection time is reduced. Experimental results show that the sparse feature subsets are capable of keeping the important components of these six feature descriptors. The sparse features of HOG and LSS possess the same descriptive ability as, and consume less time than, their full features. The ratios of the sparse feature subsets of HOG and LSS to their full sets are the highest among the six; thus, these two features can be used to best describe the characteristics of the pedestrian, and the sparse feature subsets of the HOG-LSS combination show better distinguishing ability and parsimony.
Wang, Shijun; Yao, Jianhua; Petrick, Nicholas; Summers, Ronald M.
2010-01-01
Colon cancer is the second leading cause of cancer-related deaths in the United States. Computed tomographic colonography (CTC) combined with a computer aided detection system provides a feasible approach for improving colonic polyps detection and increasing the use of CTC for colon cancer screening. To distinguish true polyps from false positives, various features extracted from polyp candidates have been proposed. Most of these traditional features try to capture the shape information of polyp candidates or neighborhood knowledge about the surrounding structures (fold, colon wall, etc.). In this paper, we propose a new set of shape descriptors for polyp candidates based on statistical curvature information. These features called histograms of curvature features are rotation, translation and scale invariant and can be treated as complementing existing feature set. Then in order to make full use of the traditional geometric features (defined as group A) and the new statistical features (group B) which are highly heterogeneous, we employed a multiple kernel learning method based on semi-definite programming to learn an optimized classification kernel from the two groups of features. We conducted leave-one-patient-out test on a CTC dataset which contained scans from 66 patients. Experimental results show that a support vector machine (SVM) based on the combined feature set and the semi-definite optimization kernel achieved higher FROC performance compared to SVMs using the two groups of features separately. At a false positive per scan rate of 5, the sensitivity of the SVM using the combined features improved from 0.77 (Group A) and 0.73 (Group B) to 0.83 (p ≤ 0.01). PMID:20953299
Keller, Brad M; Oustimov, Andrew; Wang, Yan; Chen, Jinbo; Acciavatti, Raymond J; Zheng, Yuanjie; Ray, Shonket; Gee, James C; Maidment, Andrew D A; Kontos, Despina
2015-04-01
An analytical framework is presented for evaluating the equivalence of parenchymal texture features across different full-field digital mammography (FFDM) systems using a physical breast phantom. Phantom images (FOR PROCESSING) are acquired from three FFDM systems using their automated exposure control setting. A panel of texture features, including gray-level histogram, co-occurrence, run length, and structural descriptors, are extracted. To identify features that are robust across imaging systems, a series of equivalence tests are performed on the feature distributions, in which the extent of their intersystem variation is compared to their intrasystem variation via the Hodges-Lehmann test statistic. Overall, histogram and structural features tend to be most robust across all systems, and certain features, such as edge enhancement, tend to be more robust to intergenerational differences between detectors of a single vendor than to intervendor differences. Texture features extracted from larger regions of interest (i.e., [Formula: see text]) and with a larger offset length (i.e., [Formula: see text]), when applicable, also appear to be more robust across imaging systems. This framework and observations from our experiments may benefit applications utilizing mammographic texture analysis on images acquired in multivendor settings, such as in multicenter studies of computer-aided detection and breast cancer risk assessment.
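The Hodges-Lehmann statistic at the center of these equivalence tests is simply the median of all pairwise differences between two samples. A toy comparison of inter- versus intrasystem variation, with simulated feature distributions and a robustness criterion chosen only for illustration, might look like:

```python
import numpy as np

def hodges_lehmann(a, b):
    """Two-sample Hodges-Lehmann estimator: median of all pairwise differences."""
    return np.median(np.subtract.outer(a, b))

rng = np.random.default_rng(0)
# Stand-in texture-feature distributions from two FFDM systems
sys1 = rng.normal(1.00, 0.10, 200)
sys2 = rng.normal(1.03, 0.10, 200)      # small intersystem shift

half1, half2 = sys1[:100], sys1[100:]   # repeat acquisitions on one system
intra = abs(hodges_lehmann(half1, half2))
inter = abs(hodges_lehmann(sys2, sys1))
print(f"intrasystem HL shift: {intra:.4f}, intersystem HL shift: {inter:.4f}")
print("robust" if inter <= 2 * intra else "not robust", "(illustrative rule)")
```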
VARS-TOOL: A Comprehensive, Efficient, and Robust Sensitivity Analysis Toolbox
NASA Astrophysics Data System (ADS)
Razavi, S.; Sheikholeslami, R.; Haghnegahdar, A.; Esfahbod, B.
2016-12-01
VARS-TOOL is an advanced sensitivity and uncertainty analysis toolbox, applicable to the full range of computer simulation models, including Earth and Environmental Systems Models (EESMs). The toolbox was developed originally around VARS (Variogram Analysis of Response Surfaces), which is a general framework for Global Sensitivity Analysis (GSA) that utilizes the variogram/covariogram concept to characterize the full spectrum of sensitivity-related information, thereby providing a comprehensive set of "global" sensitivity metrics with minimal computational cost. VARS-TOOL is unique in that, with a single sample set (set of simulation model runs), it generates simultaneously three philosophically different families of global sensitivity metrics: (1) variogram-based metrics called IVARS (Integrated Variogram Across a Range of Scales - VARS approach), (2) variance-based total-order effects (Sobol approach), and (3) derivative-based elementary effects (Morris approach). VARS-TOOL also offers two novel features: the first is a sequential sampling algorithm, called Progressive Latin Hypercube Sampling (PLHS), which allows progressively increasing the sample size for GSA while maintaining the required sample distributional properties; the second is a "grouping strategy" that adaptively groups the model parameters based on their sensitivity or functioning to maximize the reliability of GSA results. These features in conjunction with bootstrapping enable the user to monitor the stability, robustness, and convergence of GSA with the increase in sample size for any given case study. VARS-TOOL has been shown to achieve robust and stable results within 1-2 orders of magnitude smaller sample sizes (fewer model runs) than alternative tools. VARS-TOOL, available in MATLAB and Python, is under continuous development and new capabilities and features are forthcoming.
Oversampling the Minority Class in the Feature Space.
Perez-Ortiz, Maria; Gutierrez, Pedro Antonio; Tino, Peter; Hervas-Martinez, Cesar
2016-09-01
The imbalanced nature of some real-world data is one of the current challenges for machine learning researchers. One common approach oversamples the minority class through convex combination of its patterns. We explore the general idea of synthetic oversampling in the feature space induced by a kernel function (as opposed to input space). If the kernel function matches the underlying problem, the classes will be linearly separable and synthetically generated patterns will lie on the minority class region. Since the feature space is not directly accessible, we use the empirical feature space (EFS) (a Euclidean space isomorphic to the feature space) for oversampling purposes. The proposed method is framed in the context of support vector machines, where the imbalanced data sets can pose a serious hindrance. The idea is investigated in three scenarios: 1) oversampling in the full and reduced-rank EFSs; 2) a kernel learning technique maximizing the data class separation to study the influence of the feature space structure (implicitly defined by the kernel function); and 3) a unified framework for preferential oversampling that spans some of the previous approaches in the literature. We support our investigation with extensive experiments over 50 imbalanced data sets.
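A minimal sketch of oversampling in the empirical feature space: eigendecompose the kernel matrix to obtain an explicit Euclidean embedding, then form convex combinations of minority patterns there. The RBF kernel, its gamma, and the random pairing are illustrative choices, not the paper's exact procedure:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X_min = rng.normal(0, 1, (20, 5))           # minority-class patterns

# Empirical feature space: eigendecompose K = U L U^T, then
# Phi = U * sqrt(L) reproduces K as an ordinary dot product.
K = rbf_kernel(X_min, X_min, gamma=0.5)
vals, vecs = np.linalg.eigh(K)
keep = vals > 1e-10
Phi = vecs[:, keep] * np.sqrt(vals[keep])   # n x r embedding, Phi @ Phi.T ~ K

# SMOTE-style synthesis: convex combination of two minority points in the EFS
i, j = rng.integers(len(Phi)), rng.integers(len(Phi))
lam = rng.random()
synthetic = lam * Phi[i] + (1 - lam) * Phi[j]
print("EFS dimension:", Phi.shape[1])
print("synthetic sample norm:", round(float(np.linalg.norm(synthetic)), 3))
```

Because the embedding is isomorphic to the kernel-induced feature space, interpolating there (rather than in input space) keeps synthetic patterns inside the minority region whenever the kernel linearly separates the classes.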
Online gesture spotting from visual hull data.
Peng, Bo; Qian, Gang
2011-06-01
This paper presents a robust framework for online full-body gesture spotting from visual hull data. Using view-invariant pose features as observations, hidden Markov models (HMMs) are trained for gesture spotting from continuous movement data streams. Two major contributions of this paper are 1) view-invariant pose feature extraction from visual hulls, and 2) a systematic approach to automatically detecting and modeling specific nongesture movement patterns and using their HMMs for outlier rejection in gesture spotting. The experimental results have shown the view-invariance property of the proposed pose features for both training poses and new poses unseen in training, as well as the efficacy of using specific nongesture models for outlier rejection. Using the IXMAS gesture data set, the proposed framework has been extensively tested and the gesture spotting results are superior to those reported on the same data set obtained using existing state-of-the-art gesture spotting methods.
2014-08-12
ISS040-E-092581 (12 Aug. 2014) --- A portion of the International Space Station's Zvezda Service Module with the newly attached "Georges Lemaitre" Automated Transfer Vehicle-5 (ATV-5) is featured in this image photographed by an Expedition 40 crew member onboard the station. A waning full moon is visible in the background.
Optimizing data collection for public health decisions: a data mining approach
2014-01-01
Background: Collecting data can be cumbersome and expensive. Lack of relevant, accurate and timely data for research to inform policy may negatively impact public health. The aim of this study was to test if the careful removal of items from two community nutrition surveys guided by a data mining technique called feature selection, can (a) identify a reduced dataset, while (b) not damaging the signal inside that data. Methods: The Nutrition Environment Measures Surveys for stores (NEMS-S) and restaurants (NEMS-R) were completed on 885 retail food outlets in two counties in West Virginia between May and November of 2011. A reduced dataset was identified for each outlet type using feature selection. Coefficients from linear regression modeling were used to weight items in the reduced datasets. Weighted item values were summed with the error term to compute reduced item survey scores. Scores produced by the full survey were compared to the reduced item scores using a Wilcoxon rank-sum test. Results: Feature selection identified 9 store and 16 restaurant survey items as significant predictors of the score produced from the full survey. The linear regression models built from the reduced feature sets had R2 values of 92% and 94% for restaurant and grocery store data, respectively. Conclusions: While there are many potentially important variables in any domain, the most useful set may only be a small subset. The use of feature selection in the initial phase of data collection to identify the most influential variables may be a useful tool to greatly reduce the amount of data needed thereby reducing cost. PMID:24919484
Optimizing data collection for public health decisions: a data mining approach.
Partington, Susan N; Papakroni, Vasil; Menzies, Tim
2014-06-12
Collecting data can be cumbersome and expensive. Lack of relevant, accurate and timely data for research to inform policy may negatively impact public health. The aim of this study was to test if the careful removal of items from two community nutrition surveys guided by a data mining technique called feature selection, can (a) identify a reduced dataset, while (b) not damaging the signal inside that data. The Nutrition Environment Measures Surveys for stores (NEMS-S) and restaurants (NEMS-R) were completed on 885 retail food outlets in two counties in West Virginia between May and November of 2011. A reduced dataset was identified for each outlet type using feature selection. Coefficients from linear regression modeling were used to weight items in the reduced datasets. Weighted item values were summed with the error term to compute reduced item survey scores. Scores produced by the full survey were compared to the reduced item scores using a Wilcoxon rank-sum test. Feature selection identified 9 store and 16 restaurant survey items as significant predictors of the score produced from the full survey. The linear regression models built from the reduced feature sets had R2 values of 92% and 94% for restaurant and grocery store data, respectively. While there are many potentially important variables in any domain, the most useful set may only be a small subset. The use of feature selection in the initial phase of data collection to identify the most influential variables may be a useful tool to greatly reduce the amount of data needed thereby reducing cost.
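A compact sketch of this workflow with scikit-learn and SciPy: select a small item subset, weight it by linear regression, and compare reduced scores against full scores with a Wilcoxon rank-sum test. SelectKBest stands in for the study's feature selector, and the survey matrix is simulated:

```python
import numpy as np
from scipy.stats import ranksums
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_outlets, n_items = 885, 40                 # stand-in survey matrix
items = rng.integers(0, 4, (n_outlets, n_items)).astype(float)
full_score = items @ rng.random(n_items)     # score from the full instrument

selector = SelectKBest(f_regression, k=9).fit(items, full_score)
reduced = selector.transform(items)          # the 9 retained items
model = LinearRegression().fit(reduced, full_score)
reduced_score = model.predict(reduced)       # weighted reduced-item score

print("R^2 of reduced model:", round(model.score(reduced, full_score), 3))
stat, p = ranksums(full_score, reduced_score)
print(f"Wilcoxon rank-sum p = {p:.3f} (high p: distributions indistinguishable)")
```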
Targeted Feature Detection for Data-Dependent Shotgun Proteomics
2017-01-01
Label-free quantification of shotgun LC–MS/MS data is the prevailing approach in quantitative proteomics but remains computationally nontrivial. The central data analysis step is the detection of peptide-specific signal patterns, called features. Peptide quantification is facilitated by associating signal intensities in features with peptide sequences derived from MS2 spectra; however, missing values due to imperfect feature detection are a common problem. A feature detection approach that directly targets identified peptides (minimizing missing values) but also offers robustness against false-positive features (by assigning meaningful confidence scores) would thus be highly desirable. We developed a new feature detection algorithm within the OpenMS software framework, leveraging ideas and algorithms from the OpenSWATH toolset for DIA/SRM data analysis. Our software, FeatureFinderIdentification (“FFId”), implements a targeted approach to feature detection based on information from identified peptides. This information is encoded in an MS1 assay library, based on which ion chromatogram extraction and detection of feature candidates are carried out. Significantly, when analyzing data from experiments comprising multiple samples, our approach distinguishes between “internal” and “external” (inferred) peptide identifications (IDs) for each sample. On the basis of internal IDs, two sets of positive (true) and negative (decoy) feature candidates are defined. A support vector machine (SVM) classifier is then trained to discriminate between the sets and is subsequently applied to the “uncertain” feature candidates from external IDs, facilitating selection and confidence scoring of the best feature candidate for each peptide. This approach also enables our algorithm to estimate the false discovery rate (FDR) of the feature selection step. We validated FFId based on a public benchmark data set, comprising a yeast cell lysate spiked with protein standards that provide a known ground-truth. The algorithm reached almost complete (>99%) quantification coverage for the full set of peptides identified at 1% FDR (PSM level). Compared with other software solutions for label-free quantification, this is an outstanding result, which was achieved at competitive quantification accuracy and reproducibility across replicates. The FDR for the feature selection was estimated at a low 1.5% on average per sample (3% for features inferred from external peptide IDs). The FFId software is open-source and freely available as part of OpenMS (www.openms.org). PMID:28673088
Targeted Feature Detection for Data-Dependent Shotgun Proteomics.
Weisser, Hendrik; Choudhary, Jyoti S
2017-08-04
Label-free quantification of shotgun LC-MS/MS data is the prevailing approach in quantitative proteomics but remains computationally nontrivial. The central data analysis step is the detection of peptide-specific signal patterns, called features. Peptide quantification is facilitated by associating signal intensities in features with peptide sequences derived from MS2 spectra; however, missing values due to imperfect feature detection are a common problem. A feature detection approach that directly targets identified peptides (minimizing missing values) but also offers robustness against false-positive features (by assigning meaningful confidence scores) would thus be highly desirable. We developed a new feature detection algorithm within the OpenMS software framework, leveraging ideas and algorithms from the OpenSWATH toolset for DIA/SRM data analysis. Our software, FeatureFinderIdentification ("FFId"), implements a targeted approach to feature detection based on information from identified peptides. This information is encoded in an MS1 assay library, based on which ion chromatogram extraction and detection of feature candidates are carried out. Significantly, when analyzing data from experiments comprising multiple samples, our approach distinguishes between "internal" and "external" (inferred) peptide identifications (IDs) for each sample. On the basis of internal IDs, two sets of positive (true) and negative (decoy) feature candidates are defined. A support vector machine (SVM) classifier is then trained to discriminate between the sets and is subsequently applied to the "uncertain" feature candidates from external IDs, facilitating selection and confidence scoring of the best feature candidate for each peptide. This approach also enables our algorithm to estimate the false discovery rate (FDR) of the feature selection step. We validated FFId based on a public benchmark data set, comprising a yeast cell lysate spiked with protein standards that provide a known ground-truth. The algorithm reached almost complete (>99%) quantification coverage for the full set of peptides identified at 1% FDR (PSM level). Compared with other software solutions for label-free quantification, this is an outstanding result, which was achieved at competitive quantification accuracy and reproducibility across replicates. The FDR for the feature selection was estimated at a low 1.5% on average per sample (3% for features inferred from external peptide IDs). The FFId software is open-source and freely available as part of OpenMS ( www.openms.org ).
A Statistical Analysis of Corona Topography: New Insights into Corona Formation and Evolution
NASA Technical Reports Server (NTRS)
Stofan, E. R.; Glaze, L. S.; Smrekar, S. E.; Baloga, S. M.
2003-01-01
Extensive mapping of the surface of Venus and continued analysis of Magellan data have allowed a more comprehensive survey of coronae to be conducted. Our updated corona database contains 514 features, an increase from the 326 coronae of the previous survey. We include a new set of 106 Type 2 or stealth coronae, which have a topographic rather than a fracture annulus. The large increase in the number of coronae over the 1992 survey results from several factors, including the use of the full Magellan data set and the addition of features identified as part of the systematic geologic mapping of Venus. Parameters of the population that we have analyzed to date include size and topography.
Online feature selection with streaming features.
Wu, Xindong; Yu, Kui; Ding, Wei; Wang, Hao; Zhu, Xingquan
2013-05-01
We propose a new online feature selection framework for applications with streaming features where the knowledge of the full feature space is unknown in advance. We define streaming features as features that flow in one by one over time whereas the number of training examples remains fixed. This is in contrast with traditional online learning methods that only deal with sequentially added observations, with little attention being paid to streaming features. The critical challenges for Online Streaming Feature Selection (OSFS) include 1) the continuous growth of feature volumes over time, 2) a large feature space, possibly of unknown or infinite size, and 3) the unavailability of the entire feature set before learning starts. In this paper, we present a novel Online Streaming Feature Selection method to select strongly relevant and nonredundant features on the fly. An efficient Fast-OSFS algorithm is proposed to improve feature selection performance. The proposed algorithms are evaluated extensively on high-dimensional datasets and also with a real-world case study on impact crater detection. Experimental results demonstrate that the algorithms achieve better compactness and higher prediction accuracy than existing streaming feature selection algorithms.
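A toy illustration of the streaming setting: features arrive one at a time and are kept only if relevant and nonredundant. Plain correlation thresholds stand in for the conditional-independence tests that OSFS actually uses, and all thresholds are made up:

```python
import numpy as np

def stream_select(features, y, rel_thresh=0.1, red_thresh=0.95):
    """Toy online selection: accept each newly arriving feature only if it
    is relevant to y and not redundant with already-selected features."""
    selected = []
    for t, f in enumerate(features):              # features flow in one by one
        if abs(np.corrcoef(f, y)[0, 1]) < rel_thresh:
            continue                              # not relevant
        if any(abs(np.corrcoef(f, s)[0, 1]) > red_thresh for _, s in selected):
            continue                              # redundant with selected set
        selected.append((t, f))
    return [t for t, _ in selected]

rng = np.random.default_rng(0)
y = rng.standard_normal(300)
stream = [y + 0.5 * rng.standard_normal(300),     # relevant
          rng.standard_normal(300)]               # pure noise
stream.append(stream[0] * 1.01)                   # redundant near-copy
print("kept feature indices:", stream_select(stream, y))
```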
Late-summer sea ice segmentation with multi-polarisation SAR features in C and X band
NASA Astrophysics Data System (ADS)
Fors, Ane S.; Brekke, Camilla; Doulgeris, Anthony P.; Eltoft, Torbjørn; Renner, Angelika H. H.; Gerland, Sebastian
2016-02-01
In this study, we investigate the potential of sea ice segmentation by C- and X-band multi-polarisation synthetic aperture radar (SAR) features during late summer. Five high-resolution satellite SAR scenes were recorded in the Fram Strait covering iceberg-fast first-year and old sea ice during a week with air temperatures varying around 0 °C. Sea ice thickness, surface roughness and aerial photographs were collected during a helicopter flight at the site. Six polarimetric SAR features were extracted for each of the scenes. The ability of the individual SAR features to discriminate between sea ice types and their temporal consistency were examined. All SAR features were found to add value to sea ice type discrimination. Relative kurtosis, geometric brightness, cross-polarisation ratio and co-polarisation correlation angle were found to be temporally consistent in the investigated period, while co-polarisation ratio and co-polarisation correlation magnitude were found to be temporally inconsistent. An automatic feature-based segmentation algorithm was tested both for a full SAR feature set and for a reduced SAR feature set limited to temporally consistent features. In C band, the algorithm produced a good late-summer sea ice segmentation, separating the scenes into segments that could be associated with different sea ice types in the next step. The X-band performance was slightly poorer. Excluding temporally inconsistent SAR features improved the segmentation in one of the X-band scenes.
ERIC Educational Resources Information Center
Adams, Kate; Bull, Rebecca; Maynes, Mary-Louise
2016-01-01
Early years education is a holistic endeavour, with some education policies including spiritual development as part of that approach. However, studies exploring the spirituality of young children are scarce, which limits understanding of the phenomenon and its full application in educational settings. Furthermore, nurturing children's spiritual…
"Shakespeare with Heart": An Inclusive Drama Project
ERIC Educational Resources Information Center
Wilkins, Ilene E.
2008-01-01
This article features Shakespeare with Heart, a two week inclusive summer program for middle and high school students with and without disabilities. The program runs each morning until noon, culminating with a workshop performance of a Shakespeare play with full costume and set with a live audience of parents, friends, and community members. The…
2014-08-12
ISS040-E-092583 (12 Aug. 2014) --- A portion of the International Space Station's Russian segment with the newly attached "Georges Lemaitre" Automated Transfer Vehicle-5 (ATV-5) to the Zvezda Service Module is featured in this image photographed by an Expedition 40 crew member onboard the station. A waning full moon is visible in the background.
Thermo-mechanical evaluation of carbon-carbon primary structure for SSTO vehicles
NASA Astrophysics Data System (ADS)
Croop, Harold C.; Lowndes, Holland B.; Hahn, Steven E.; Barthel, Chris A.
1998-01-01
An advanced development program to demonstrate carbon-carbon composite structure for use as primary load carrying structure has entered the experimental validation phase. The component being evaluated is a wing torque box section for a single-stage-to-orbit (SSTO) vehicle. The validation or demonstration component features an advanced carbon-carbon design incorporating 3D woven graphite preforms, integral spars, oxidation inhibited matrix, chemical vapor deposited (CVD) oxidation protection coating, and ceramic matrix composite fasteners. The validation component represents the culmination of a four phase design and fabrication development effort. Extensive developmental testing was performed to verify material properties and integrity of basic design features before committing to fabrication of the full scale box. The wing box component is now being set up for testing in the Air Force Research Laboratory Structural Test Facility at Wright-Patterson Air Force Base, Ohio. One of the important developmental tests performed in support of the design and planned testing of the full scale box was the fabrication and test of a skin/spar trial subcomponent. The trial subcomponent incorporated critical features of the full scale wing box design. This paper discusses the results of the trial subcomponent test which served as a pathfinder for the upcoming full scale box test.
NASA Astrophysics Data System (ADS)
Castillo, Richard; Castillo, Edward; Fuentes, David; Ahmad, Moiz; Wood, Abbie M.; Ludwig, Michelle S.; Guerrero, Thomas
2013-05-01
Landmark point-pairs provide a strategy to assess deformable image registration (DIR) accuracy in terms of the spatial registration of the underlying anatomy depicted in medical images. In this study, we propose to augment a publicly available database (www.dir-lab.com) of medical images with large sets of manually identified anatomic feature pairs between breath-hold computed tomography (BH-CT) images for DIR spatial accuracy evaluation. Ten BH-CT image pairs were randomly selected from the COPDgene study cases. Each patient had received CT imaging of the entire thorax in the supine position at one-fourth dose normal expiration and maximum effort full dose inspiration. Using dedicated in-house software, an imaging expert manually identified large sets of anatomic feature pairs between images. Estimates of inter- and intra-observer spatial variation in feature localization were determined by repeat measurements of multiple observers over subsets of randomly selected features. 7298 anatomic landmark features were manually paired between the 10 sets of images. Quantity of feature pairs per case ranged from 447 to 1172. Average 3D Euclidean landmark displacements varied substantially among cases, ranging from 12.29 (SD: 6.39) to 30.90 (SD: 14.05) mm. Repeat registration of uniformly sampled subsets of 150 landmarks for each case yielded estimates of observer localization error, which ranged in average from 0.58 (SD: 0.87) to 1.06 (SD: 2.38) mm for each case. The additions to the online web database (www.dir-lab.com) described in this work will broaden the applicability of the reference data, providing a freely available common dataset for targeted critical evaluation of DIR spatial accuracy performance in multiple clinical settings. Estimates of observer variance in feature localization suggest consistent spatial accuracy for all observers across both four-dimensional CT and COPDgene patient cohorts.
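Computing the reported displacement and observer-error statistics from landmark pairs is straightforward; a sketch on simulated coordinates (all magnitudes illustrative, in mm):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in landmark pairs: expiration vs inspiration coordinates (mm)
p_exp = rng.random((447, 3)) * 300
p_insp = p_exp + rng.normal(15, 6, (447, 3)) / np.sqrt(3)

disp = np.linalg.norm(p_insp - p_exp, axis=1)   # 3D Euclidean displacement
print(f"mean displacement: {disp.mean():.2f} mm (SD: {disp.std():.2f})")

# Repeat measurements of the same landmarks estimate observer error
repeat = p_insp + rng.normal(0, 0.6, p_insp.shape)
err = np.linalg.norm(repeat - p_insp, axis=1)
print(f"observer localization error: {err.mean():.2f} mm (SD: {err.std():.2f})")
```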
Polypeptide having or assisting in carbohydrate material degrading activity and uses thereof
Schooneveld-Bergmans, Margot Elisabeth Francoise; Heijne, Wilbert Herman Marie; Los, Alrik Pieter
2016-02-16
The invention relates to a polypeptide which comprises the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 76% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 76% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Polypeptide having beta-glucosidase activity and uses thereof
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schoonneveld-Bergmans, Margot Elisabeth Francoise; Heijne, Wilbert Herman Marie; De Jong, Rene Marcel
The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 96% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 96% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Polypeptide having swollenin activity and uses thereof
Schoonneveld-Bergmans, Margot Elizabeth Francoise; Heijne, Wilbert Herman Marie; Vlasie, Monica D; Damveld, Robbertus Antonius
2015-11-04
The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 73% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 73% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Polypeptide having beta-glucosidase activity and uses thereof
Schooneveld-Bergmans, Margot Elisabeth Francoise; Heijne, Wilbert Herman Marie; De Jong, Rene Marcel; Damveld, Robbertus Antonius
2015-09-01
The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 70% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 70% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Polypeptide having cellobiohydrolase activity and uses thereof
Sagt, Cornelis Maria Jacobus; Schooneveld-Bergmans, Margot Elisabeth Francoise; Roubos, Johannes Andries; Los, Alrik Pieter
2015-09-15
The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 93% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 93% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Polypeptide having acetyl xylan esterase activity and uses thereof
Schoonneveld-Bergmans, Margot Elisabeth Francoise; Heijne, Wilbert Herman Marie; Los, Alrik Pieter
2015-10-20
The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 82% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 82% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Polypeptide having carbohydrate degrading activity and uses thereof
Schooneveld-Bergmans, Margot Elisabeth Francoise; Heijne, Wilbert Herman Marie; Vlasie, Monica Diana; Damveld, Robbertus Antonius
2015-08-18
The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 73% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 73% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Impacting Readiness: Nature and Nurture
ERIC Educational Resources Information Center
Healy, Jane M.
2011-01-01
Whereas some four-year-olds could draw a person with five fingers on each hand and a full set of facial features, others could barely hold a pencil. Some sat quietly in a small group, intently listening to and understanding a story, while others wiggled, fidgeted, and couldn't focus their attention. In those days, before the explosion of…
Automated identification of diagnosis and co-morbidity in clinical records.
Cano, C; Blanco, A; Peshkin, L
2009-01-01
Automated understanding of clinical records is a challenging task involving various legal and technical difficulties. Clinical free text is inherently redundant, unstructured, and full of acronyms, abbreviations and domain-specific language, which make it challenging to mine automatically. Much effort in the field is focused on creating specialized ontologies, lexicons and heuristics based on expert knowledge of the domain. However, ad-hoc solutions generalize poorly across diseases or diagnoses. This paper presents a successful approach for rapid prototyping of a diagnosis classifier based on a popular computational linguistics platform. The corpus consists of several hundred full-length discharge summaries provided by Partners Healthcare. The goal is to identify a diagnosis and assign co-morbidity. Our approach is based on the rapid implementation of a logistic regression classifier using an existing toolkit: LingPipe (http://alias-i.com/lingpipe). We implement and compare three different classifiers. The baseline approach uses character 5-grams as features. The second approach uses a bag-of-words representation enriched with a small additional set of features. The third approach reduces the feature set to the most informative features according to their information content. The proposed systems achieve high performance (average F-micro 0.92) for the task. We discuss the relative merits of the three classifiers. Supplementary material with detailed results is available at http://decsai.ugr.es/~ccano/LR/supplementary_material/. We show that our methodology for rapid prototyping of a domain-unaware system is effective for building an accurate classifier for clinical records.
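The baseline character-5-gram classifier is straightforward to prototype. The authors used LingPipe; the sketch below substitutes scikit-learn, and the toy discharge-summary snippets and labels are hypothetical:

```python
# Sketch of a character-5-gram logistic regression classifier, the kind
# used as the baseline above (illustrated with scikit-learn, not LingPipe).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

docs = ["pt admitted with CHF exacerbation, worsening lower extremity edema",
        "h/o DM2, presents with DKA, anion gap acidosis on admission"]
labels = ["CHF", "DM"]

clf = make_pipeline(
    CountVectorizer(analyzer="char", ngram_range=(5, 5)),  # character 5-grams
    LogisticRegression(max_iter=1000),
)
clf.fit(docs, labels)
print(clf.predict(["elderly pt, known CHF, worsening edema"]))
```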
Automated placement of interfaces in conformational kinetics calculations using machine learning
NASA Astrophysics Data System (ADS)
Grazioli, Gianmarc; Butts, Carter T.; Andricioaei, Ioan
2017-10-01
Several recent implementations of algorithms for sampling reaction pathways employ a strategy for placing interfaces or milestones across the reaction coordinate manifold. Interfaces can be introduced such that the full feature space describing the dynamics of a macromolecule is divided into Voronoi (or other) cells, and the global kinetics of the molecular motions can be calculated from the set of fluxes through the interfaces between the cells. Although some methods of this type are exact for an arbitrary set of cells, in practice, the calculations will converge fastest when the interfaces are placed in regions where they can best capture transitions between configurations corresponding to local minima. The aim of this paper is to introduce a fully automated machine-learning algorithm for defining a set of cells for use in kinetic sampling methodologies based on subdividing the dynamical feature space; the algorithm requires no intuition about the system or input from the user and scales to high-dimensional systems.
Automated placement of interfaces in conformational kinetics calculations using machine learning.
Grazioli, Gianmarc; Butts, Carter T; Andricioaei, Ioan
2017-10-21
Several recent implementations of algorithms for sampling reaction pathways employ a strategy for placing interfaces or milestones across the reaction coordinate manifold. Interfaces can be introduced such that the full feature space describing the dynamics of a macromolecule is divided into Voronoi (or other) cells, and the global kinetics of the molecular motions can be calculated from the set of fluxes through the interfaces between the cells. Although some methods of this type are exact for an arbitrary set of cells, in practice, the calculations will converge fastest when the interfaces are placed in regions where they can best capture transitions between configurations corresponding to local minima. The aim of this paper is to introduce a fully automated machine-learning algorithm for defining a set of cells for use in kinetic sampling methodologies based on subdividing the dynamical feature space; the algorithm requires no intuition about the system or input from the user and scales to high-dimensional systems.
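A minimal sketch of the underlying construction, under simplifying assumptions: a feature space is divided into Voronoi cells (here defined by k-means centers rather than the paper's learned placement), and inter-cell transitions along a toy trajectory are counted as a flux-like statistic:

```python
# Sketch: divide a dynamical feature space into Voronoi cells and count
# transitions between cells along a trajectory. The trajectory is synthetic;
# the paper's algorithm places the cells far more carefully.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
traj = np.cumsum(rng.normal(size=(5000, 3)), axis=0)  # toy 3D trajectory

km = KMeans(n_clusters=10, n_init=10, random_state=0).fit(traj)
cells = km.predict(traj)  # Voronoi cell index of every frame

# Flux-like count of transitions between cells i -> j.
counts = np.zeros((10, 10), dtype=int)
np.add.at(counts, (cells[:-1], cells[1:]), 1)
np.fill_diagonal(counts, 0)
print(counts.sum(), "inter-cell transitions")
```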
Online tracking of outdoor lighting variations for augmented reality with moving cameras.
Liu, Yanli; Granier, Xavier
2012-04-01
In augmented reality, one of the key tasks in achieving a convincing visual appearance consistency between virtual objects and video scenes is maintaining coherent illumination along the whole sequence. As outdoor illumination is largely dependent on the weather, the lighting condition may change from frame to frame. In this paper, we propose a fully image-based approach for online tracking of outdoor illumination variations from videos captured with moving cameras. Our key idea is to estimate the relative intensities of sunlight and skylight via a sparse set of planar feature points extracted from each frame. To address the inevitable feature misalignments, a set of constraints is introduced to select the most reliable ones. Exploiting the spatial and temporal coherence of illumination, the relative intensities of sunlight and skylight are finally estimated by an optimization process. We validate our technique on a set of real-life videos and show that the results with our estimations are visually coherent along the video sequences.
Computer aided detection of clusters of microcalcifications on full field digital mammograms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ge Jun; Sahiner, Berkman; Hadjiiski, Lubomir M.
2006-08-15
We are developing a computer-aided detection (CAD) system to identify microcalcification clusters (MCCs) automatically on full field digital mammograms (FFDMs). The CAD system includes six stages: preprocessing; image enhancement; segmentation of microcalcification candidates; false positive (FP) reduction for individual microcalcifications; regional clustering; and FP reduction for clustered microcalcifications. At the stage of FP reduction for individual microcalcifications, a truncated sum-of-squares error function was used to improve the efficiency and robustness of the training of an artificial neural network in our CAD system for FFDMs. At the stage of FP reduction for clustered microcalcifications, morphological features and features derived from the artificial neural network outputs were extracted from each cluster. Stepwise linear discriminant analysis (LDA) was used to select the features. An LDA classifier was then used to differentiate clustered microcalcifications from FPs. A data set of 96 cases with 192 images was collected at the University of Michigan. This data set contained 96 MCCs, of which 28 clusters were proven by biopsy to be malignant and 68 were proven to be benign. The data set was separated into two independent data sets for training and testing of the CAD system in a cross-validation scheme. When one data set was used to train and validate the convolution neural network (CNN) in our CAD system, the other data set was used to evaluate the detection performance. With the use of a truncated error metric, the training of CNN could be accelerated and the classification performance was improved. The CNN in combination with an LDA classifier could substantially reduce FPs with a small tradeoff in sensitivity. By using the free-response receiver operating characteristic methodology, it was found that our CAD system can achieve a cluster-based sensitivity of 70, 80, and 90% at 0.21, 0.61, and 1.49 FPs/image, respectively. For case-based performance evaluation, a sensitivity of 70, 80, and 90% can be achieved at 0.07, 0.17, and 0.65 FPs/image, respectively. We also used a data set of 216 mammograms negative for clustered microcalcifications to further estimate the FP rate of our CAD system. The corresponding FP rates were 0.15, 0.31, and 0.86 FPs/image for cluster-based detection when negative mammograms were used for estimation of FP rates.
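A truncated sum-of-squares error of the kind described caps each sample's contribution so that outliers cannot dominate training. A hedged sketch, with a hypothetical truncation threshold:

```python
# Sketch of a truncated sum-of-squares error for robust network training;
# the threshold t is an illustrative choice, not the paper's value.
import numpy as np

def truncated_sse(y_true, y_pred, t=1.0):
    """Squared error, capped at t**2 per sample, summed over the batch."""
    err2 = (y_true - y_pred) ** 2
    return np.minimum(err2, t ** 2).sum()

y = np.array([0.0, 1.0, 1.0, 0.0])
p = np.array([0.1, 0.9, 0.2, 3.0])   # last sample is an outlier
print(truncated_sse(y, p))            # the outlier contributes at most t**2
```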
High-performance execution of psychophysical tasks with complex visual stimuli in MATLAB
Asaad, Wael F.; Santhanam, Navaneethan; McClellan, Steven
2013-01-01
Behavioral, psychological, and physiological experiments often require the ability to present sensory stimuli, monitor and record subjects' responses, interface with a wide range of devices, and precisely control the timing of events within a behavioral task. Here, we describe our recent progress developing an accessible and full-featured software system for controlling such studies using the MATLAB environment. Compared with earlier reports on this software, key new features have been implemented to allow the presentation of more complex visual stimuli, increase temporal precision, and enhance user interaction. These features greatly improve the performance of the system and broaden its applicability to a wider range of possible experiments. This report describes these new features and improvements, current limitations, and quantifies the performance of the system in a real-world experimental setting. PMID:23034363
Carbohydrate degrading polypeptide and uses thereof
Sagt, Cornelis Maria Jacobus; Schooneveld-Bergmans, Margot Elisabeth Francoise; Roubos, Johannes Andries; Los, Alrik Pieter
2015-10-20
The invention relates to a polypeptide having carbohydrate material degrading activity which comprises the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1 or SEQ ID NO: 4, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 96% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 96% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional protein and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Intestinal tuberculosis masquerading as difficult to treat Crohn disease: a case report.
Niriella, Madunil A; Kodisinghe, S Kuleesha; De Silva, Arjuna P; Hewavisenthi, Janaki; de Silva, Hithanadura J
2016-08-24
Crohn disease has a low prevalence in Sri Lanka compared to the West, while intestinal tuberculosis is common in the region. Since the clinical, endoscopic and investigative features of Crohn disease overlap with those of intestinal tuberculosis, differentiating the two conditions becomes a dilemma for the clinician in an intestinal tuberculosis endemic setting. An 18-year-old Sri Lankan Muslim female presented with chronic abdominal pain and weight loss. Colonoscopy revealed an ulcerated ileocaecal valve and a terminal ileal stricture. Biopsy confirmed Crohn disease with no supportive features to suggest intestinal tuberculosis. Despite treatment with adequate immunosuppression she failed to improve and underwent a limited right hemicolectomy and terminal ileal resection. Histology confirmed intestinal tuberculosis and she made a full recovery with 6 months of anti-tuberculosis treatment. This case illustrates the importance of reviewing the diagnosis to include intestinal tuberculosis in an endemic setting when already-diagnosed Crohn disease is refractory to treatment.
Two-photon calcium imaging during fictive navigation in virtual environments.
Ahrens, Misha B; Huang, Kuo Hua; Narayan, Sujatha; Mensh, Brett D; Engert, Florian
2013-01-01
A full understanding of nervous system function requires recording from large populations of neurons during naturalistic behaviors. Here we enable paralyzed larval zebrafish to fictively navigate two-dimensional virtual environments while we record optically from many neurons with two-photon imaging. Electrical recordings from motor nerves in the tail are decoded into intended forward swims and turns, which are used to update a virtual environment displayed underneath the fish. Several behavioral features, such as turning responses to whole-field motion and dark avoidance, are well replicated in this virtual setting. We readily observed neuronal populations in the hindbrain with laterally selective responses that correlated with right or left optomotor behavior. We also observed neurons in the habenula, pallium, and midbrain with response properties specific to environmental features. Beyond single-cell correlations, the classification of network activity in such virtual settings promises to reveal principles of brainwide neural dynamics during behavior.
Estimates of the atmospheric parameters of M-type stars: a machine-learning perspective
NASA Astrophysics Data System (ADS)
Sarro, L. M.; Ordieres-Meré, J.; Bello-García, A.; González-Marcos, A.; Solano, E.
2018-05-01
Estimating the atmospheric parameters of M-type stars has been a difficult task due to the lack of simple diagnostics in the stellar spectra. We aim at uncovering good sets of predictive features of stellar atmospheric parameters (Teff, log (g), [M/H]) in spectra of M-type stars. We define two types of potential features (equivalent widths and integrated flux ratios) able to explain the atmospheric physical parameters. We search the space of feature sets using a genetic algorithm that evaluates solutions by their prediction performance in the framework of the BT-Settl library of stellar spectra. Thereafter, we construct eight regression models using different machine-learning techniques and compare their performances with those obtained using the classical χ2 approach and independent component analysis (ICA) coefficients. Finally, we validate the various alternatives using two sets of real spectra from the NASA Infrared Telescope Facility (IRTF) and Dwarf Archives collections. We find that the cross-validation errors are poor measures of the performance of regression models in the context of physical parameter prediction in M-type stars. For R ˜ 2000 spectra with signal-to-noise ratios typical of the IRTF and Dwarf Archives, feature selection with genetic algorithms or alternative techniques produces only marginal advantages with respect to representation spaces that are unconstrained in wavelength (full spectrum or ICA). We make available the atmospheric parameters for the two collections of observed spectra as online material.
Optimal number of features as a function of sample size for various classification rules.
Hua, Jianping; Xiong, Zixiang; Lowey, James; Suh, Edward; Dougherty, Edward R
2005-04-15
Given the joint feature-label distribution, increasing the number of features always results in decreased classification error; however, this is not the case when a classifier is designed via a classification rule from sample data. Typically (but not always), for fixed sample size, the error of a designed classifier decreases and then increases as the number of features grows. The potential downside of using too many features is most critical for small samples, which are commonplace for gene-expression-based classifiers for phenotype discrimination. For fixed sample size and feature-label distribution, the issue is to find an optimal number of features. Since only in rare cases is there a known distribution of the error as a function of the number of features and sample size, this study employs simulation for various feature-label distributions and classification rules, and across a wide range of sample and feature-set sizes. To achieve the desired end, finding the optimal number of features as a function of sample size, it employs massively parallel computation. Seven classifiers are treated: 3-nearest-neighbor, Gaussian kernel, linear support vector machine, polynomial support vector machine, perceptron, regular histogram and linear discriminant analysis. Three Gaussian-based models are considered: linear, nonlinear and bimodal. In addition, real patient data from a large breast-cancer study is considered. To mitigate the combinatorial search for finding optimal feature sets, and to model the situation in which subsets of genes are co-regulated and correlation is internal to these subsets, we assume that the covariance matrix of the features is blocked, with each block corresponding to a group of correlated features. Altogether there are a large number of error surfaces for the many cases. These are provided in full on a companion website, which is meant to serve as resource for those working with small-sample classification. For the companion website, please visit http://public.tgen.org/tamu/ofs/ e-dougherty@ee.tamu.edu.
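The peaking phenomenon the study maps by massive simulation can be illustrated in miniature. The sketch below uses an LDA rule on a synthetic Gaussian model with diminishing feature quality; sizes and parameters are illustrative, not the paper's settings:

```python
# Sketch of peaking: for a fixed small sample, designed-classifier error
# first falls and then rises as the number of features grows.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
D, n_train, n_test = 30, 30, 2000
mu = 0.5 / np.sqrt(np.arange(1, D + 1))      # diminishing feature quality

def sample(n):
    y = rng.integers(0, 2, n)
    X = rng.normal(size=(n, D)) + np.outer(2 * y - 1, mu)
    return X, y

Xtr, ytr = sample(n_train)
Xte, yte = sample(n_test)
for d in (2, 5, 10, 20, 30):                 # growing feature set
    clf = LinearDiscriminantAnalysis().fit(Xtr[:, :d], ytr)
    print(d, round(1 - clf.score(Xte[:, :d], yte), 3))
```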
DOE Office of Scientific and Technical Information (OSTI.GOV)
Woodruff, David; Hackebeil, Gabe; Laird, Carl Damon
Pyomo supports the formulation and analysis of mathematical models for complex optimization applications. This capability is commonly associated with algebraic modeling languages (AMLs), which support the description and analysis of mathematical models with a high-level language. Although most AMLs are implemented in custom modeling languages, Pyomo's modeling objects are embedded within Python, a full-featured high-level programming language that contains a rich set of supporting libraries.
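A minimal sketch of what such an embedded model looks like; the objective and constraint are hypothetical, and solving assumes a separately installed solver such as glpk:

```python
# Sketch of a Pyomo model embedded in Python: variables, an objective,
# and a constraint built as ordinary Python objects.
from pyomo.environ import (ConcreteModel, Var, Objective, Constraint,
                           NonNegativeReals, maximize, SolverFactory, value)

m = ConcreteModel()
m.x = Var(domain=NonNegativeReals)
m.y = Var(domain=NonNegativeReals)
m.profit = Objective(expr=3 * m.x + 2 * m.y, sense=maximize)
m.cap = Constraint(expr=m.x + m.y <= 4)   # shared capacity limit

SolverFactory("glpk").solve(m)            # assumes glpk is installed
print(value(m.x), value(m.y))
```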
Ultraviolet divergences in non-renormalizable supersymmetric theories
NASA Astrophysics Data System (ADS)
Smilga, A.
2017-03-01
We present a pedagogical review of our current understanding of the ultraviolet structure of N = (1,1) 6D supersymmetric Yang-Mills theory and of N = 8 4D supergravity. These theories are not renormalizable; they involve power ultraviolet divergences and, in all probability, an infinite set of higher-dimensional counterterms that contribute to on-mass-shell scattering amplitudes. A specific feature of supersymmetric theories (especially of extended supersymmetric theories) is that these counterterms may not be invariant off shell under the full set of supersymmetry transformations. The lowest-dimensional nontrivial counterterm is supersymmetric on shell. Still higher counterterms may lose even the on-shell invariance. On the other hand, the full effective Lagrangian, generating the amplitudes and representing an infinite sum of counterterms, still enjoys the complete symmetry of the original theory. We also discuss simple supersymmetric quantum-mechanical models that exhibit the same behaviour.
Quantifying and visualizing variations in sets of images using continuous linear optimal transport
NASA Astrophysics Data System (ADS)
Kolouri, Soheil; Rohde, Gustavo K.
2014-03-01
Modern advancements in imaging devices have enabled us to explore the subcellular structure of living organisms and extract vast amounts of information. However, interpreting the biological information mined in the captured images is not a trivial task. Utilizing predetermined numerical features is usually the only hope for quantifying this information. Nonetheless, direct visual or biological interpretation of results obtained from these selected features is non-intuitive and difficult. In this paper, we describe an automatic method for modeling visual variations in a set of images, which allows for direct visual interpretation of the most significant differences, without the need for predefined features. The method is based on a linearized version of the continuous optimal transport (OT) metric, which provides a natural linear embedding for the image data set, in which a linear combination of images leads to a visually meaningful image. This enables us to apply linear geometric data analysis techniques such as principal component analysis and linear discriminant analysis in the linearly embedded space and visualize the most prominent modes, as well as the most discriminant modes of variations, in the dataset. Using the continuous OT framework, we are able to analyze variations in shape and texture in a set of images utilizing each image at full resolution, which cannot otherwise be done with existing methods. The proposed method is applied to a set of nuclei images segmented from Feulgen stained liver tissues in order to investigate the major visual differences in chromatin distribution of Fetal-Type Hepatoblastoma (FHB) cells compared to normal cells.
Development and Assessment of a New Empirical Model for Predicting Full Creep Curves
Gray, Veronica; Whittaker, Mark
2015-01-01
This paper details the development and assessment of a new empirical creep model that belongs to the limited ranks of models reproducing full creep curves. The important features of the model are that it is fully standardised and universally applicable. By standardising, the user no longer chooses functions but rather fits one set of constants only. Testing it on 7 contrasting materials and reproducing 181 creep curves, we demonstrate its universality. The new model and Theta Projection curves are compared to one another using an assessment tool developed within this paper. PMID:28793458
Subset selective search on the basis of color and preview.
Donk, Mieke
2017-01-01
In the preview paradigm observers are presented with one set of elements (the irrelevant set) followed by the addition of a second set among which the target is presented (the relevant set). Search efficiency in such a preview condition has been demonstrated to be higher than that in a full-baseline condition in which both sets are simultaneously presented, suggesting that a preview of the irrelevant set reduces its influence on the search process. However, numbers of irrelevant and relevant elements are typically not independently manipulated. Moreover, subset selective search also occurs when both sets are presented simultaneously but differ in color. The aim of the present study was to investigate how numbers of irrelevant and relevant elements contribute to preview search in the absence and presence of a color difference between subsets. In two experiments it was demonstrated that a preview reduced the influence of the number of irrelevant elements in the absence but not in the presence of a color difference between subsets. In the presence of a color difference, a preview lowered the effect of the number of relevant elements but only when the target was defined by a unique feature within the relevant set (Experiment 1); when the target was defined by a conjunction of features (Experiment 2), search efficiency as a function of the number of relevant elements was not modulated by a preview. Together the results are in line with the idea that subset selective search is based on different simultaneously operating mechanisms.
Malek, Salim; Melgani, Farid; Mekhalfi, Mohamed Lamine; Bazi, Yakoub
2017-11-16
This paper describes three coarse image description strategies, which are meant to promote a rough perception of surrounding objects for visually impaired individuals, with application to indoor spaces. The described algorithms operate on images (grabbed by the user by means of a chest-mounted camera) and output a list of objects that likely exist in the user's surroundings across the indoor scene. In this regard, first, different colour, texture, and shape-based feature extractors are generated, followed by a feature learning step by means of AutoEncoder (AE) models. Second, the produced features are fused and fed into a multilabel classifier in order to list the potential objects. The conducted experiments point out that fusing a set of AE-learned features yields higher classification rates than using the features individually. Furthermore, with respect to reference works, our method: (i) yields higher classification accuracies, and (ii) runs (at least four times) faster, which enables a potential full real-time application.
Suh, Jong Hwan
2016-01-01
In recent years, the anonymous nature of the Internet has made it difficult to detect manipulated user reputations in social media, as well as to ensure the quality of users and their posts. To deal with this, this study designs and examines an automatic approach that adopts writing-style features to estimate user reputations in social media. Under varying ways of defining Good and Bad classes of user reputations based on the collected data, it evaluates the classification performance of state-of-the-art methods: four writing-style feature sets (lexical, syntactic, structural, and content-specific) and eight classification techniques, namely four base learners (C4.5, Neural Network (NN), Support Vector Machine (SVM), and Naïve Bayes (NB)) and four Random Subspace (RS) ensemble methods based on those base learners. When South Korea's Web forum, Daum Agora, was selected as a test bed, the experimental results show that the configuration of the full feature set containing content-specific features and RS-SVM, combining RS and SVM, gives the best classification accuracy when the test bed poster reputations are segmented strictly into Good and Bad classes by the portfolio approach. Pairwise t tests on accuracy confirm two expectations from the literature: first, feature sets that add content-specific features outperform the others; second, ensemble learning methods are more viable than base learners. Moreover, among the four ways of defining the classes of user reputations (like, dislike, sum, and portfolio), the results show that the portfolio approach gives the highest accuracy.
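The RS-SVM configuration can be approximated with standard tooling. The sketch below stands in scikit-learn's BaggingClassifier (bootstrap disabled, random feature subsets) for the Random Subspace method; the feature matrix is a hypothetical stand-in for the writing-style features, and the `estimator` keyword assumes scikit-learn 1.2 or later:

```python
# Sketch of a Random Subspace ensemble of SVM base learners (RS-SVM-like),
# approximated with BaggingClassifier on random feature subsets.
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 60))   # stand-in lexical/syntactic/... features
y = rng.integers(0, 2, 200)      # Good (1) vs Bad (0) reputation labels

rs_svm = BaggingClassifier(
    estimator=SVC(kernel="rbf"),
    n_estimators=20,
    max_features=0.5,            # each learner sees a random feature subspace
    bootstrap=False,             # subspace method: no sample bootstrapping
    random_state=0,
).fit(X, y)
print(rs_svm.score(X, y))
```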
Holistic approach for automated background EEG assessment in asphyxiated full-term infants
NASA Astrophysics Data System (ADS)
Matic, Vladimir; Cherian, Perumpillichira J.; Koolen, Ninah; Naulaers, Gunnar; Swarte, Renate M.; Govaert, Paul; Van Huffel, Sabine; De Vos, Maarten
2014-12-01
Objective. To develop an automated algorithm to quantify background EEG abnormalities in full-term neonates with hypoxic ischemic encephalopathy. Approach. The algorithm classifies 1 h of continuous neonatal EEG (cEEG) into a mild, moderate or severe background abnormality grade. These classes are well established in the literature and a clinical neurophysiologist labeled 272 1 h cEEG epochs selected from 34 neonates. The algorithm is based on adaptive EEG segmentation and mapping of the segments into the so-called segments’ feature space. Three features are suggested and further processing is obtained using a discretized three-dimensional distribution of the segments’ features represented as a 3-way data tensor. Further classification has been achieved using recently developed tensor decomposition/classification methods that reduce the size of the model and extract a significant and discriminative set of features. Main results. Effective parameterization of cEEG data has been achieved resulting in high classification accuracy (89%) to grade background EEG abnormalities. Significance. For the first time, the algorithm for the background EEG assessment has been validated on an extensive dataset which contained major artifacts and epileptic seizures. The demonstrated high robustness, while processing real-case EEGs, suggests that the algorithm can be used as an assistive tool to monitor the severity of hypoxic insults in newborns.
A comparison study of image features between FFDM and film mammogram images
Jing, Hao; Yang, Yongyi; Wernick, Miles N.; Yarusso, Laura M.; Nishikawa, Robert M.
2012-01-01
Purpose: This work is to provide a direct, quantitative comparison of image features measured by film and full-field digital mammography (FFDM). The purpose is to investigate whether there is any systematic difference between film and FFDM in terms of quantitative image features and their influence on the performance of a computer-aided diagnosis (CAD) system. Methods: The authors make use of a set of matched film-FFDM image pairs acquired from cadaver breast specimens with simulated microcalcifications consisting of bone and teeth fragments using both a GE digital mammography system and a screen-film system. To quantify the image features, the authors consider a set of 12 textural features of lesion regions and six image features of individual microcalcifications (MCs). The authors first conduct a direct comparison on these quantitative features extracted from film and FFDM images. The authors then study the performance of a CAD classifier for discriminating between MCs and false positives (FPs) when the classifier is trained on images of different types (film, FFDM, or both). Results: For all the features considered, the quantitative results show a high degree of correlation between features extracted from film and FFDM, with the correlation coefficients ranging from 0.7326 to 0.9602 for the different features. Based on a Fisher sign rank test, there was no significant difference observed between the features extracted from film and those from FFDM. For both MC detection and discrimination of FPs from MCs, FFDM had a slight but statistically significant advantage in performance; however, when the classifiers were trained on different types of images (acquired with FFDM or SFM) for discriminating MCs from FPs, there was little difference. Conclusions: The results indicate good agreement between film and FFDM in quantitative image features. While FFDM images provide better detection performance in MCs, FFDM and film images may be interchangeable for the purposes of training CAD algorithms, and a single CAD algorithm may be applied to either type of images. PMID:22830771
Training Effectiveness Evaluation of Device A/F37A-T59
1982-07-01
selected airplane by manually setting track, crosstrack, and altitude on the control panel. Position is maintained by flying the attitude director... The simulator's other design capabilities include full SKE airdrop simulation, radar simulation, manual or pre-programmed malfunctions, a library of... during IFS testing, this feature was not available for this study. Thus, the instructors had to manually program all mission profiles prior to each
Network Sampling and Classification:An Investigation of Network Model Representations
Airoldi, Edoardo M.; Bai, Xue; Carley, Kathleen M.
2011-01-01
Methods for generating a random sample of networks with desired properties are important tools for the analysis of social, biological, and information networks. Algorithm-based approaches to sampling networks have received a great deal of attention in recent literature. Most of these algorithms are based on simple intuitions that associate the full features of connectivity patterns with specific values of only one or two network metrics. Substantive conclusions are crucially dependent on this association holding true. However, the extent to which this simple intuition holds true is not yet known. In this paper, we examine the association between the connectivity patterns that a network sampling algorithm aims to generate and the connectivity patterns of the generated networks, measured by an existing set of popular network metrics. We find that different network sampling algorithms can yield networks with similar connectivity patterns. We also find that the alternative algorithms for the same connectivity pattern can yield networks with different connectivity patterns. We argue that conclusions based on simulated network studies must focus on the full features of the connectivity patterns of a network instead of on the limited set of network metrics for a specific network type. This fact has important implications for network data analysis: for instance, implications related to the way significance is currently assessed. PMID:21666773
Paroxysmal atrial fibrillation prediction method with shorter HRV sequences.
Boon, K H; Khalil-Hani, M; Malarvili, M B; Sia, C W
2016-10-01
This paper proposes a method that predicts the onset of paroxysmal atrial fibrillation (PAF) using heart rate variability (HRV) segments that are shorter than those applied in existing methods, while maintaining good prediction accuracy. PAF is a common cardiac arrhythmia that increases the health risk of a patient, and the development of an accurate predictor of the onset of PAF is clinically important because it increases the possibility of stabilizing (electrically) and preventing the onset of atrial arrhythmias with different pacing techniques. We investigate the effect of HRV features extracted from different lengths of HRV segments prior to PAF onset with the proposed PAF prediction method. The pre-processing stage of the predictor includes QRS detection, HRV quantification and ectopic beat correction. Time-domain, frequency-domain, non-linear and bispectrum features are then extracted from the quantified HRV. In the feature selection, the HRV feature set and classifier parameters are optimized simultaneously using an optimization procedure based on a genetic algorithm (GA). Both the full feature set and a statistically significant feature subset are optimized by the GA. For the statistically significant feature subset, the Mann-Whitney U test is used to filter out features that cannot pass the statistical test at the 20% significance level. The final stage of our predictor is a classifier based on a support vector machine (SVM). A 10-fold cross-validation is applied in performance evaluation, and the proposed method achieves 79.3% prediction accuracy using 15-minute HRV segments. This accuracy is comparable to that achieved by existing methods that use 30-minute HRV segments, most of which achieve accuracy of around 80%. More importantly, our method significantly outperforms those that apply segments shorter than 30 minutes.
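A sketch of the statistical filtering step, assuming synthetic placeholder feature matrices: features are kept only if they pass a Mann-Whitney U test at the 20% significance level:

```python
# Sketch: filter HRV features by a Mann-Whitney U test between PAF and
# non-PAF segments at the 20% level; the feature matrices are synthetic.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
paf = rng.normal(0.3, 1.0, size=(40, 12))      # features before PAF onset
non_paf = rng.normal(0.0, 1.0, size=(40, 12))  # control segments

keep = [j for j in range(paf.shape[1])
        if mannwhitneyu(paf[:, j], non_paf[:, j]).pvalue < 0.20]
print("features passing the test:", keep)
```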
Deep Learning Methods for Underwater Target Feature Extraction and Recognition
Peng, Yuan; Qiu, Mengran; Shi, Jianfei; Liu, Liangliang
2018-01-01
The classification and recognition of underwater acoustic signals have long been important topics in underwater acoustic signal processing. Currently, the wavelet transform, the Hilbert-Huang transform, and Mel frequency cepstral coefficients are used as methods of underwater acoustic signal feature extraction. In this paper, a method for feature extraction and identification of underwater noise data based on a CNN and an ELM is proposed: an automatic feature extraction method for underwater acoustic signals using a deep convolutional network, together with an underwater target recognition classifier based on an extreme learning machine. Although convolutional neural networks can perform both feature extraction and classification, their classification function mainly relies on a fully connected layer trained by gradient descent, whose generalization ability is limited and suboptimal, so an extreme learning machine (ELM) is used in the classification stage. Firstly, the CNN learns deep and robust features, after which the fully connected layers are removed. Then an ELM fed with the CNN features is used as the classifier to conduct the classification. Experiments on an actual data set of civil ships achieved a 93.04% recognition rate, a large improvement over the traditional Mel frequency cepstral coefficient and Hilbert-Huang features. PMID:29780407
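A minimal ELM classifier of the kind used in the final stage: a fixed random hidden layer with a closed-form least-squares readout. The inputs below are random stand-ins for CNN-extracted features:

```python
# Sketch of an extreme learning machine: random hidden projection, then a
# pseudoinverse (least-squares) solve for the output weights.
import numpy as np

class ELM:
    def __init__(self, n_hidden=200, seed=0):
        self.n_hidden, self.rng = n_hidden, np.random.default_rng(seed)

    def fit(self, X, y):
        d = X.shape[1]
        self.W = self.rng.normal(size=(d, self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)            # random hidden layer
        Y = np.eye(y.max() + 1)[y]                  # one-hot targets
        self.beta = np.linalg.pinv(H) @ Y           # closed-form readout
        return self

    def predict(self, X):
        return (np.tanh(X @ self.W + self.b) @ self.beta).argmax(axis=1)

rng = np.random.default_rng(1)
X, y = rng.normal(size=(300, 64)), rng.integers(0, 3, 300)
print((ELM().fit(X, y).predict(X) == y).mean())
```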
Liu, Jingfang; Zhang, Pengzhu; Lu, Yingjie
2014-11-01
User-generated medical messages on the Internet contain extensive information related to adverse drug reactions (ADRs) and are known as valuable resources for post-marketing drug surveillance. The aim of this study was to find an effective method to identify messages related to ADRs automatically from online user reviews. We conducted experiments on online user reviews using different feature sets and different classification techniques. Firstly, messages from three communities (an allergy community, a schizophrenia community and a pain management community) were collected, and 3000 messages were annotated. Secondly, an N-gram-based feature set and a medical domain-specific feature set were generated. Thirdly, three classification techniques, SVM, C4.5 and Naïve Bayes, were used to perform the classification tasks separately. Finally, we evaluated the performance of each combination of feature set and classification technique by comparing metrics including accuracy and F-measure. In terms of accuracy, the SVM classifier achieved accuracy higher than 0.8, while the C4.5 and Naïve Bayes classifiers stayed below 0.8; meanwhile, the combined feature set, including both the n-gram-based and domain-specific features, consistently outperformed either single feature set. In terms of F-measure, the highest F-measure was 0.895, achieved by using the combined feature sets with an SVM classifier. Overall, combining both feature sets with an SVM classifier gives the best classification performance and yields an effective method to identify ADR-related messages automatically from online user reviews.
NASA Astrophysics Data System (ADS)
Ferreira, P.; Avilés, A.; Dafflon, J.; Mönnich, A.; Trichopoulos, I.
2015-12-01
Indico has come a long way since it was first used to organize CHEP 2004. More than ten years of development have brought new features and projects, widening the application's feature set and enabling event organizers to work even more efficiently. While that has boosted the tool's usage and facilitated its adoption by a remarkable 300,000 events (at CERN only), it has also generated a whole new range of challenges, which have been the target of the team's attention for the last 2 years. One of them was that of scalability and the maintainability of the current database solution (ZODB). After careful consideration, the decision was taken to move away from ZODB to PostgreSQL, a relational and widely-adopted solution that will permit the development of a more ambitious feature set as well as improved performance and scalability. A change of this type is by no means trivial in nature and requires the refactoring of most backend code as well as the full rewrite of significant portions of it. We are taking this opportunity to modernize Indico, by employing standard web modules, technologies and concepts that not only make development and maintenance easier but also constitute an upgrade to Indico's stack. The first results are already visible since August 2014, with the full migration of the Room Booking module to the new paradigm. In this paper we explain what has been done so far in the context of this ambitious migration, what have been the main findings and challenges, as well as the main technologies and concepts that will constitute the foundation of the resultant Indico 2.0.
Markov random field based automatic image alignment for electron tomography.
Amat, Fernando; Moussavi, Farshid; Comolli, Luis R; Elidan, Gal; Downing, Kenneth H; Horowitz, Mark
2008-03-01
We present a method for automatic full-precision alignment of the images in a tomographic tilt series. Full-precision automatic alignment of cryo electron microscopy images has remained a difficult challenge to date, due to the limited electron dose and low image contrast. These facts lead to poor signal to noise ratio (SNR) in the images, which causes automatic feature trackers to generate errors, even with high contrast gold particles as fiducial features. To enable fully automatic alignment for full-precision reconstructions, we frame the problem probabilistically as finding the most likely particle tracks given a set of noisy images, using contextual information to make the solution more robust to the noise in each image. To solve this maximum likelihood problem, we use Markov Random Fields (MRF) to establish the correspondence of features in alignment and robust optimization for projection model estimation. The resulting algorithm, called Robust Alignment and Projection Estimation for Tomographic Reconstruction, or RAPTOR, has not needed any manual intervention for the difficult datasets we have tried, and has provided sub-pixel alignment that is as good as the manual approach by an expert user. We are able to automatically map complete and partial marker trajectories and thus obtain highly accurate image alignment. Our method has been applied to challenging cryo electron tomographic datasets with low SNR from intact bacterial cells, as well as several plastic section and X-ray datasets.
Incipient fault feature extraction of rolling bearings based on the MVMD and Teager energy operator.
Ma, Jun; Wu, Jiande; Wang, Xiaodong
2018-06-04
The incipient faults of rolling bearings are difficult to recognize, and the number of intrinsic mode functions (IMFs) obtained by variational mode decomposition (VMD) must be set in advance and cannot be selected adaptively. Taking full advantage of the adaptive segmentation of the scale spectrum and Teager energy operator (TEO) demodulation, a new method for early fault feature extraction of rolling bearings based on a modified VMD and the Teager energy operator (MVMD-TEO) is proposed. Firstly, the vibration signal of the rolling bearings is analyzed by adaptive scale-space spectrum segmentation to obtain the spectrum segmentation support boundary, from which the number K of IMFs decomposed by VMD is adaptively determined. Secondly, the original vibration signal is adaptively decomposed into K IMFs, and the effective IMF components are extracted based on a correlation coefficient criterion. Finally, the Teager energy spectrum of the reconstructed signal of the effective IMF components is calculated by the TEO, and the early fault features of the rolling bearings are extracted to realize fault identification and location. Comparative experiments between the proposed method and the existing fault feature extraction method based on Local Mean Decomposition and the Teager energy operator (LMD-TEO) have been implemented using experimental data-sets and a measured data-set. The results of the comparative experiments in three application cases show that the presented method achieves fairly or slightly better performance than the LMD-TEO method, demonstrating the validity and feasibility of the proposed method.
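The discrete Teager energy operator at the heart of the demodulation step is psi[n] = x[n]^2 - x[n-1]*x[n+1]. A short sketch on a synthetic amplitude-modulated signal:

```python
# Sketch of the discrete Teager energy operator (TEO) applied to a
# synthetic AM test signal; real use would feed the reconstructed IMFs.
import numpy as np

def teager(x):
    """Discrete TEO; output is two samples shorter than the input."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

t = np.arange(0, 1, 1 / 2000.0)   # 1 s at 2 kHz
x = np.sin(2 * np.pi * 50 * t) * (1 + 0.5 * np.sin(2 * np.pi * 5 * t))
psi = teager(x)                    # tracks the instantaneous signal energy
print(psi[:5])
```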
Recovery of sparse translation-invariant signals with continuous basis pursuit
Ekanadham, Chaitanya; Tranchina, Daniel; Simoncelli, Eero
2013-01-01
We consider the problem of decomposing a signal into a linear combination of features, each a continuously translated version of one of a small set of elementary features. Although these constituents are drawn from a continuous family, most current signal decomposition methods rely on a finite dictionary of discrete examples selected from this family (e.g., shifted copies of a set of basic waveforms), and apply sparse optimization methods to select and solve for the relevant coefficients. Here, we generate a dictionary that includes auxiliary interpolation functions that approximate translates of features via adjustment of their coefficients. We formulate a constrained convex optimization problem, in which the full set of dictionary coefficients represents a linear approximation of the signal, the auxiliary coefficients are constrained so as to only represent translated features, and sparsity is imposed on the primary coefficients using an L1 penalty. The basis pursuit denoising (BP) method may be seen as a special case, in which the auxiliary interpolation functions are omitted, and we thus refer to our methodology as continuous basis pursuit (CBP). We develop two implementations of CBP for a one-dimensional translation-invariant source, one using a first-order Taylor approximation, and another using a form of trigonometric spline. We examine the tradeoff between sparsity and signal reconstruction accuracy in these methods, demonstrating empirically that trigonometric CBP substantially outperforms Taylor CBP, which in turn offers substantial gains over ordinary BP. In addition, the CBP bases can generally achieve equally good or better approximations with much coarser sampling than BP, leading to a reduction in dictionary dimensionality. PMID:24352562
Effective traffic features selection algorithm for cyber-attacks samples
NASA Astrophysics Data System (ADS)
Li, Yihong; Liu, Fangzheng; Du, Zhenyu
2018-05-01
By studying defense schemes against network attacks, this paper proposes an effective traffic feature selection algorithm based on k-means++ clustering to deal with the high dimensionality of the traffic features extracted from cyber-attack samples. Firstly, the algorithm divides the original feature set into an attack traffic feature set and a background traffic feature set by clustering. Then, it calculates the variation in clustering performance after removing a certain feature. Finally, it evaluates the degree of distinctiveness of each feature according to the result; an effective feature is one whose degree of distinctiveness exceeds a set threshold. The purpose of this paper is to select the effective features from the extracted original feature set. In this way, the dimensionality of the features can be reduced so as to reduce the space-time overhead of subsequent detection. The experimental results show that the proposed algorithm is feasible and has some advantages over other selection algorithms.
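A hedged sketch of the selection idea, with illustrative data and threshold: cluster with k-means++ and score each feature by how much the clustering quality (here, silhouette) degrades when that feature is removed:

```python
# Sketch: score feature distinctiveness by the drop in clustering quality
# when a feature is deleted; the data, metric, and threshold are illustrative.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 8)),     # background-like traffic
               rng.normal(3, 1, (100, 8))])    # attack-like traffic

def quality(Z):
    labels = KMeans(n_clusters=2, init="k-means++", n_init=10,
                    random_state=0).fit_predict(Z)
    return silhouette_score(Z, labels)

base = quality(X)
scores = [base - quality(np.delete(X, j, axis=1)) for j in range(X.shape[1])]
threshold = 0.0                                # hypothetical threshold
print([j for j, s in enumerate(scores) if s > threshold])
```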
Mathieson, Luke; Mendes, Alexandre; Marsden, John; Pond, Jeffrey; Moscato, Pablo
2017-01-01
This chapter introduces a new method for knowledge extraction from databases for the purpose of finding a discriminative set of features that is also robust for within-class classification. Our method is generic, and we introduce it here in the field of breast cancer diagnosis from digital mammography data. The mathematical formalism is based on a generalization of the k-Feature Set problem called the (α, β)-k-Feature Set problem, introduced by Cotta and Moscato (J Comput Syst Sci 67(4):686-690, 2003). The method proceeds in two steps: first, an optimal (α, β)-k-feature set of minimum cardinality is identified, and then a set of classification rules using these features is obtained. We obtain the (α, β)-k-feature set in two phases: in the first, a series of extremely powerful reduction techniques, which do not lose the optimal solution, is employed; in the second, a metaheuristic search identifies the remaining features to be considered or disregarded. Two algorithms were tested with a public-domain digital mammography dataset composed of 71 malignant and 75 benign cases. Based on the results provided by the algorithms, we obtain classification rules that employ only a subset of these features.
Prediction of Cognitive States During Flight Simulation Using Multimodal Psychophysiological Sensing
NASA Technical Reports Server (NTRS)
Harrivel, Angela R.; Stephens, Chad L.; Milletich, Robert J.; Heinich, Christina M.; Last, Mary Carolyn; Napoli, Nicholas J.; Abraham, Nijo A.; Prinzel, Lawrence J.; Motter, Mark A.; Pope, Alan T.
2017-01-01
The Commercial Aviation Safety Team found the majority of recent international commercial aviation accidents attributable to loss of control inflight involved flight crew loss of airplane state awareness (ASA), and distraction was involved in all of them. Research on attention-related human performance limiting states (AHPLS) such as channelized attention, diverted attention, startle/surprise, and confirmation bias, has been recommended in a Safety Enhancement (SE) entitled "Training for Attention Management." To accomplish the detection of such cognitive and psychophysiological states, a broad suite of sensors was implemented to simultaneously measure their physiological markers during a high fidelity flight simulation human subject study. Twenty-four pilot participants were asked to wear the sensors while they performed benchmark tasks and motion-based flight scenarios designed to induce AHPLS. Pattern classification was employed to predict the occurrence of AHPLS during flight simulation also designed to induce those states. Classifier training data were collected during performance of the benchmark tasks. Multimodal classification was performed, using pre-processed electroencephalography, galvanic skin response, electrocardiogram, and respiration signals as input features. A combination of one, some or all modalities were used. Extreme gradient boosting, random forest and two support vector machine classifiers were implemented. The best accuracy for each modality-classifier combination is reported. Results using a select set of features and using the full set of available features are presented. Further, results are presented for training one classifier with the combined features and for training multiple classifiers with features from each modality separately. Using the select set of features and combined training, multistate prediction accuracy averaged 0.64 +/- 0.14 across thirteen participants and was significantly higher than that for the separate training case. These results support the goal of demonstrating simultaneous real-time classification of multiple states using multiple sensing modalities in high fidelity flight simulators. This detection is intended to support and inform training methods under development to mitigate the loss of ASA and thus reduce accidents and incidents.
Automatic morphological classification of galaxy images
Shamir, Lior
2009-01-01
We describe an image analysis supervised learning algorithm that can automatically classify galaxy images. The algorithm is first trained using manually classified images of elliptical, spiral, and edge-on galaxies. A large set of image features is extracted from each image, and the most informative features are selected using Fisher scores. Test images can then be classified using a simple Weighted Nearest Neighbor rule such that the Fisher scores are used as the feature weights. Experimental results show that galaxy images from Galaxy Zoo can be classified automatically into spiral, elliptical, and edge-on galaxies with an accuracy of ~90% compared to classifications carried out by the author. Full compilable source code of the algorithm is available for free download, and its general-purpose nature makes it suitable for other uses that involve automatic image analysis of celestial objects. PMID:20161594
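The classification rule lends itself to a short sketch: compute a Fisher score per feature and reuse the scores as weights in a nearest-neighbour distance. Synthetic data replaces the galaxy image features; the exact score formula and data are illustrative assumptions.

```python
# Hedged sketch of Fisher-score-weighted nearest neighbour classification.
import numpy as np

def fisher_scores(X, y):
    """Ratio of between-class to within-class scatter, per feature."""
    classes = np.unique(y)
    overall = X.mean(axis=0)
    num = sum((y == c).sum() * (X[y == c].mean(axis=0) - overall) ** 2 for c in classes)
    den = sum((y == c).sum() * X[y == c].var(axis=0) for c in classes)
    return num / (den + 1e-12)

def wnn_predict(X_train, y_train, x, w):
    d = np.sqrt((((X_train - x) ** 2) * w).sum(axis=1))  # Fisher-weighted distance
    return y_train[np.argmin(d)]

rng = np.random.default_rng(2)
X = rng.normal(size=(90, 20))
y = rng.integers(0, 3, size=90)          # spiral / elliptical / edge-on placeholder
X[y == 1, :5] += 2.0                     # make a few features informative
w = fisher_scores(X, y)
print(wnn_predict(X[:80], y[:80], X[85], w))
```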
NASA Astrophysics Data System (ADS)
Marble, Jay A.; Gorman, John D.
1999-08-01
A feature based approach is taken to reduce the occurrence of false alarms in foliage penetrating, ultra-wideband, synthetic aperture radar data. A set of 'generic' features is defined based on target size, shape, and pixel intensity. A second set of features is defined that contains generic features combined with features based on scattering phenomenology. Each set is combined using a quadratic polynomial discriminant (QPD), and performance is characterized by generating a receiver operating characteristic (ROC) curve. Results show that the feature set containing phenomenological features improves performance against both broadside and end-on targets. The improvement against end-on targets, however, is especially pronounced.
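A quadratic polynomial discriminant can be sketched by expanding the feature vector to its quadratic terms and fitting a linear discriminant on the expansion, then tracing a ROC curve. The two-class data below are synthetic placeholders for the target/clutter features, and the resubstitution ROC is for brevity only.

```python
# Hedged sketch of a quadratic polynomial discriminant (QPD) with a ROC curve.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, auc

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 1.0, (200, 3)), rng.normal(1, 1.5, (200, 3))])
y = np.r_[np.zeros(200), np.ones(200)]        # 0 = clutter, 1 = target

# Quadratic expansion turns a linear discriminant into a QPD.
Xq = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
scores = LogisticRegression(max_iter=1000).fit(Xq, y).decision_function(Xq)
fpr, tpr, _ = roc_curve(y, scores)            # evaluated on training data here
print("AUC:", auc(fpr, tpr))
```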
Elucidating high-dimensional cancer hallmark annotation via enriched ontology.
Yan, Shankai; Wong, Ka-Chun
2017-09-01
Cancer hallmark annotation is a promising technique that could discover novel knowledge about cancer from the biomedical literature. The automated annotation of cancer hallmarks could reveal relevant cancer transformation processes in the literature or extract the articles that correspond to the cancer hallmark of interest. It acts as a complementary approach that can retrieve knowledge from massive text information, advancing numerous focused studies in cancer research. Nonetheless, the high-dimensional nature of cancer hallmark annotation imposes a unique challenge. To address the curse of dimensionality, we compared multiple cancer hallmark annotation methods on 1580 PubMed abstracts. Based on the insights, a novel approach, UDT-RF, which makes use of ontological features is proposed. It expands the feature space via the Medical Subject Headings (MeSH) ontology graph and utilizes novel feature selections for elucidating the high-dimensional cancer hallmark annotation space. To demonstrate its effectiveness, state-of-the-art methods are compared and evaluated by a multitude of performance metrics, revealing the full performance spectrum on the full set of cancer hallmarks. Several case studies are conducted, demonstrating how the proposed approach could reveal novel insights into cancers. https://github.com/cskyan/chmannot. Copyright © 2017 Elsevier Inc. All rights reserved.
Toward natural selection in virtual reality.
Sherstyuk, Andrei; Vincent, Dale; Treskunov, Anton
2010-01-01
Here we describe a vision of VR games that combine the best features of gaming and VR: large, persistent worlds experienced in photorealistic settings with full immersion. For example, Figure 1 illustrates a hypothetical immersive VR game that could be developed using current technologies, including real-time, cinematic-quality graphics; a panoramic head-mounted display (HMD); and wide-area tracking. We also examine the gap between available VR and gaming technologies, and offer solutions for bridging it.
Role of ultrafast dissociation in the fragmentation of chlorinated methanes
NASA Astrophysics Data System (ADS)
Kokkonen, E.; Jänkälä, K.; Patanen, M.; Cao, W.; Hrast, M.; Bučar, K.; Žitnik, M.; Huttula, M.
2018-05-01
Photon-induced fragmentation of a full set of chlorinated methanes (CH3Cl, CH2Cl2, CHCl3, CCl4) has been investigated both experimentally and computationally. Using synchrotron radiation and electron-ion coincidence measurements, the dissociation processes were studied after chlorine 2p electron excitation. Experimental evidence for CH3Cl and CH2Cl2 contains unique features suggesting that fast dissociation processes take place. By contrast, CHCl3 and CCl4 molecules do not contain the same features, hinting that they experience alternative mechanisms for dissociation and charge migration. Computational work indicates differing rates of charge movement after the core-excitation, which can be used to explain the differences observed experimentally.
Dissimilarity representations in lung parenchyma classification
NASA Astrophysics Data System (ADS)
Sørensen, Lauge; de Bruijne, Marleen
2009-02-01
A good problem representation is important for a pattern recognition system to be successful. The traditional approach to statistical pattern recognition is feature representation. More specifically, objects are represented by a number of features in a feature vector space, and classifiers are built in this representation. This is also the general trend in lung parenchyma classification in computed tomography (CT) images, where the features are often measures on feature histograms. Instead, we propose to build normal density based classifiers in dissimilarity representations for lung parenchyma classification. This allows for the classifiers to work on dissimilarities between objects, which might be a more natural way of representing lung parenchyma. In this context, dissimilarity is defined between CT regions of interest (ROIs). ROIs are represented by their CT attenuation histogram and ROI dissimilarity is defined as a histogram dissimilarity measure between the attenuation histograms. In this setting, the full histograms are utilized according to the chosen histogram dissimilarity measure. We apply this idea to classification of different emphysema patterns as well as normal, healthy tissue. Two dissimilarity representation approaches as well as different histogram dissimilarity measures are considered. The approaches are evaluated on a set of 168 CT ROIs using normal density based classifiers, all showing good performance. Compared to using histogram dissimilarity directly as distance in a k-nearest neighbor classifier, which achieves a classification accuracy of 92.9%, the best dissimilarity representation based classifier is significantly better with a classification accuracy of 97.0% (p = 0.046).
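A minimal sketch of the dissimilarity-representation idea follows: each histogram is re-represented by its dissimilarities to a set of prototypes, and a normal-density (Gaussian) classifier is built in that space. The chi-square measure, Dirichlet-sampled histograms, and prototype choice are illustrative assumptions, not the paper's exact setup.

```python
# Hedged sketch: classification in a dissimilarity representation of histograms.
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

def chi2_dist(h1, h2):
    """Chi-square dissimilarity between two normalized histograms."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + 1e-12))

rng = np.random.default_rng(4)
y = rng.integers(0, 2, size=120)              # healthy vs. emphysema placeholder
alpha0, alpha1 = np.ones(32), np.linspace(0.5, 2.0, 32)
H = np.array([rng.dirichlet(alpha1 if c else alpha0) for c in y])  # 32-bin histograms

prototypes = H[:10]                           # representation set
# Dissimilarity representation: one column per prototype histogram.
D = np.array([[chi2_dist(h, p) for p in prototypes] for h in H])

clf = QuadraticDiscriminantAnalysis().fit(D[:100], y[:100])  # normal-density classifier
print("accuracy:", (clf.predict(D[100:]) == y[100:]).mean())
```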
The Distribution and Behaviour of Photospheric Magnetic Features
NASA Astrophysics Data System (ADS)
Parnell, C. E.; Lamb, D. A.; DeForest, C. E.
2014-12-01
Over the past two decades enormous amounts of data on the magnetic fields of the solar photosphere have been produced by both ground-based (Kitt Peak & SOLIS) and space-based instruments (MDI, Hinode & HMI). In order to study the behaviour and distribution of photospheric magnetic features, efficient automated detection routines need to be utilised to identify and track magnetic features. In this talk, I will discuss the pros and cons of different automated magnetic feature identification and tracking routines with a special focus on the requirements of these codes to deal with the large data sets produced by HMI. By patching together results from Hinode and MDI (high-res & full-disk), the fluxes of magnetic features were found to follow a power-law over 5 orders of magnitude. At the strong flux tail of this distribution, the power law was found to fall off at solar minimum, but was maintained over all fluxes during solar maximum. However, the point of deflection in the power-law distribution occurs at a patching point between instruments and so questions remain over the reasons for the deflection. The feature fluxes determined from the superb high-resolution HMI data cover almost all of the 5 orders of magnitude. Considering both solar minimum and solar maximum HMI data sets, we investigate whether the power-law over 5 orders of magnitude in flux still holds. Furthermore, we investigate the behaviour of magnetic features in order to probe the nature of their origin. In particular, we analyse small-scale flux emergence events using HMI data to investigate the existence of a small-scale dynamo just below the solar photosphere.
Detection of explosive cough events in audio recordings by internal sound analysis.
Rocha, B M; Mendes, L; Couceiro, R; Henriques, J; Carvalho, P; Paiva, R P
2017-07-01
We present a new method for the discrimination of explosive cough events, which is based on a combination of spectral content descriptors and pitch-related features. After the removal of near-silent segments, a vector of event boundaries is obtained and a proposed set of 9 features is extracted for each event. Two data sets, recorded using electronic stethoscopes and comprising a total of 46 healthy subjects and 13 patients, were employed to evaluate the method. The proposed feature set is compared to three other sets of descriptors: a baseline, a combination of both sets, and an automatic selection of the best 10 features from both sets. The combined feature set yields good results on the cross-validated database, attaining a sensitivity of 92.3±2.3% and a specificity of 84.7±3.3%. Moreover, this feature set seems to generalize well when trained on a small data set of patients with a variety of respiratory and cardiovascular diseases and tested on a larger data set of mostly healthy subjects: a sensitivity of 93.4% and a specificity of 83.4% are achieved in those conditions. These results demonstrate that complementing the proposed feature set with a baseline set is a promising approach.
Minutia Tensor Matrix: A New Strategy for Fingerprint Matching
Fu, Xiang; Feng, Jufu
2015-01-01
Establishing correspondences between two minutia sets is a fundamental issue in fingerprint recognition. This paper proposes a new tensor matching strategy. First, the concept of the minutia tensor matrix (simplified as MTM) is proposed. It describes the first-order features and second-order features of a matching pair. In the MTM, the diagonal elements indicate similarities of minutia pairs and non-diagonal elements indicate pairwise compatibilities between minutia pairs. Correct minutia pairs are likely to establish both large similarities and large compatibilities, so they form a dense sub-block. Minutia matching is then formulated as recovering the dense sub-block in the MTM. This is a new tensor matching strategy for fingerprint recognition. Second, as fingerprint images show both local rigidity and global nonlinearity, we design two different kinds of MTMs: a local MTM and a global MTM. Meanwhile, a two-level matching algorithm is proposed. At the local matching level, the local MTM is constructed and a novel local similarity calculation strategy is proposed. It makes full use of local rigidity in fingerprints. At the global matching level, the global MTM is constructed to calculate similarities of entire minutia sets. It makes full use of global compatibility in fingerprints. The proposed method has stronger descriptive ability and better robustness to noise and nonlinearity. Experiments conducted on the Fingerprint Verification Competition databases (FVC2002 and FVC2004) demonstrate its effectiveness and efficiency. PMID:25822489
Compact and Hybrid Feature Description for Building Extraction
NASA Astrophysics Data System (ADS)
Li, Z.; Liu, Y.; Hu, Y.; Li, P.; Ding, Y.
2017-05-01
Building extraction in aerial orthophotos is crucial for various applications. Currently, deep learning has been shown to be successful in addressing building extraction with high accuracy and high robustness. However, a large number of samples is required to train a classifier when using a deep learning model. In order to realize accurate and semi-interactive labelling, the performance of feature description is crucial, as it has a significant effect on the accuracy of classification. In this paper, we bring forward a compact and hybrid feature description method in order to guarantee desirable classification accuracy for the corners on building roof contours. The proposed descriptor is a hybrid description of an image patch constructed from 4 sets of binary intensity tests. Experiments show that, benefiting from binary description and making full use of color channels, this descriptor is not only computationally frugal but also more accurate than SURF for building extraction.
VAS: A Vision Advisor System combining agents and object-oriented databases
NASA Technical Reports Server (NTRS)
Eilbert, James L.; Lim, William; Mendelsohn, Jay; Braun, Ron; Yearwood, Michael
1994-01-01
A model-based approach to identifying and finding the orientation of non-overlapping parts on a tray has been developed. The part models contain both exact and fuzzy descriptions of part features, and are stored in an object-oriented database. Full identification of the parts involves several interacting tasks, each of which is handled by a distinct agent. Using fuzzy information stored in the model allowed part features that were essentially at the noise level to be extracted and used for identification. This was done by focusing attention on the portion of the part where the feature must be found if the current hypothesis of the part ID is correct. In going from one set of parts to another, the only thing that needs to be changed is the database of part models. This work is part of an effort in developing a Vision Advisor System (VAS) that combines agents and object-oriented databases.
Robust Learning of High-dimensional Biological Networks with Bayesian Networks
NASA Astrophysics Data System (ADS)
Nägele, Andreas; Dejori, Mathäus; Stetter, Martin
Structure learning of Bayesian networks applied to gene expression data has become a potentially useful method to estimate interactions between genes. However, the NP-hardness of Bayesian network structure learning renders the reconstruction of the full genetic network with thousands of genes infeasible. Consequently, the maximal network size is usually restricted dramatically to a small set of genes (corresponding to variables in the Bayesian network). Although this feature reduction step makes structure learning computationally tractable, on the downside, the learned structure might be adversely affected by the omission of genes. Additionally, gene expression data are usually very sparse with respect to the number of samples, i.e., the number of genes is much greater than the number of different observations. Given these problems, learning robust network features from microarray data is a challenging task. This chapter presents several approaches tackling the robustness issue in order to obtain a more reliable estimation of learned network features.
Ding, Huijun; He, Qing; Zhou, Yongjin; Dan, Guo; Cui, Song
2017-01-01
Motion-intent-based finger gesture recognition systems are crucial for many applications such as prosthesis control, sign language recognition, wearable rehabilitation systems, and human–computer interaction. In this article, a motion-intent-based finger gesture recognition system is designed to correctly identify the tapping of every finger for the first time. Two auto-event annotation algorithms are first applied and evaluated for detecting the finger tapping frame. Based on the truncated signals, the Wavelet packet transform (WPT) coefficients are calculated and compressed as the features, followed by a feature selection method that is able to improve the performance by optimizing the feature set. Finally, three popular classifiers including naive Bayes (NBC), K-nearest neighbor (KNN), and support vector machine (SVM) are applied and evaluated. A recognition accuracy of up to 94% is achieved. The design and the architecture of the system are presented with full system characterization results. PMID:29167655
Deep neural networks for automatic detection of osteoporotic vertebral fractures on CT scans.
Tomita, Naofumi; Cheung, Yvonne Y; Hassanpour, Saeed
2018-07-01
Osteoporotic vertebral fractures (OVFs) are prevalent in older adults and are associated with substantial personal suffering and socio-economic burden. Early diagnosis and treatment of OVFs are critical to prevent further fractures and morbidity. However, OVFs are often under-diagnosed and under-reported in computed tomography (CT) exams as they can be asymptomatic at an early stage. In this paper, we present and evaluate an automatic system that can detect incidental OVFs in chest, abdomen, and pelvis CT examinations at the level of practicing radiologists. Our OVF detection system leverages a deep convolutional neural network (CNN) to extract radiological features from each slice in a CT scan. These extracted features are processed through a feature aggregation module to make the final diagnosis for the full CT scan. In this work, we explored different methods for this feature aggregation, including the use of a long short-term memory (LSTM) network. We trained and evaluated our system on 1432 CT scans, comprised of 10,546 two-dimensional (2D) images in sagittal view. Our system achieved an accuracy of 89.2% and an F1 score of 90.8% based on our evaluation on a held-out test set of 129 CT scans, which were established as reference standards through standard semiquantitative and quantitative methods. The results of our system matched the performance of practicing radiologists on this test set in real-world clinical circumstances. We expect the proposed system will assist and improve OVF diagnosis in clinical settings by pre-screening routine CT examinations and flagging suspicious cases prior to review by radiologists. Copyright © 2018 Elsevier Ltd. All rights reserved.
Sensor-oriented feature usability evaluation in fingerprint segmentation
NASA Astrophysics Data System (ADS)
Li, Ying; Yin, Yilong; Yang, Gongping
2013-06-01
Existing fingerprint segmentation methods usually process fingerprint images captured by different sensors with the same feature or feature set. We propose to improve fingerprint segmentation results in view of the important fact that images from different sensors have different characteristics for segmentation. Feature usability evaluation means evaluating the usability of features in order to find a personalized feature or feature set for each sensor and thus improve segmentation performance. The need for feature usability evaluation for fingerprint segmentation is raised and analyzed as a new issue. To address this issue, we present a decision-tree-based feature-usability evaluation method, which utilizes a C4.5 decision tree algorithm to evaluate and pick the most suitable feature or feature set for fingerprint segmentation from a typical candidate feature set. We apply the novel method on the FVC2002 database of fingerprint images, which were acquired by four different sensors and technologies. Experimental results show that the accuracy of segmentation is improved, and time consumption for feature extraction is dramatically reduced with the selected feature(s).
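The tree-based selection step can be sketched as below, with sklearn's CART implementation standing in for C4.5 and synthetic block-wise features standing in for the candidate set; feature names and the importance threshold are illustrative assumptions.

```python
# Hedged sketch: per-sensor feature picking via decision-tree importances.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(5)
feature_names = ["mean", "variance", "coherence", "gradient", "contrast"]
X = rng.normal(size=(500, len(feature_names)))   # block-wise candidate features
y = (X[:, 1] + 0.5 * X[:, 3] > 0).astype(int)    # foreground/background labels

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
ranked = sorted(zip(tree.feature_importances_, feature_names), reverse=True)
print("suggested feature(s) for this sensor:",
      [name for imp, name in ranked if imp > 0.1])
```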
Fingerprint imaging from the inside of a finger with full-field optical coherence tomography
Auksorius, Egidijus; Boccara, A. Claude
2015-01-01
Imaging below the fingertip surface might be a useful alternative to traditional fingerprint sensing, since the internal finger features are more reliable than the external ones. One of the most promising subsurface imaging techniques is optical coherence tomography (OCT), which, however, has to acquire 3-D data even when a single en face image is required. This makes OCT inherently slow for en face imaging and produces unnecessarily large data sets. Here we demonstrate that full-field optical coherence tomography (FF-OCT) can be used to produce en face images of sweat pores and internal fingerprints, which can be used for identification purposes. PMID:26601009
NASA Astrophysics Data System (ADS)
Ahmed, Shamim; Miorelli, Roberto; Calmon, Pierre; Anselmi, Nicola; Salucci, Marco
2018-04-01
This paper describes a Learning-By-Examples (LBE) technique for performing quasi-real-time flaw localization and characterization within a conductive tube based on Eddy Current Testing (ECT) signals. Within the framework of LBE, the combination of full-factorial (i.e., GRID) sampling and Partial Least Squares (PLS) feature extraction (i.e., GRID-PLS) techniques is applied for generating a suitable training set in the offline phase. Support Vector Regression (SVR) is utilized for model development and inversion during the offline and online phases, respectively. The performance and robustness of the proposed GRID-PLS/SVR strategy on a noisy test set is evaluated and compared with the standard GRID/SVR approach.
Ab initio Eliashberg Theory: Making Genuine Predictions of Superconducting Features
NASA Astrophysics Data System (ADS)
Sanna, Antonio; Flores-Livas, José A.; Davydov, Arkadiy; Profeta, Gianni; Dewhurst, Kay; Sharma, Sangeeta; Gross, E. K. U.
2018-04-01
We present an application of Eliashberg theory of superconductivity to study a set of novel superconducting systems with a wide range of structural and chemical properties. The set includes three intercalated group-IV honeycomb layered structures, SH3 at 200 GPa (the superconductor with the highest measured critical temperature), the similar system SeH3 at 150 GPa, and a lithium doped mono-layer of black phosphorus. The theoretical approach we adopt is a recently developed, fully ab initio Eliashberg approach that takes into account the Coulomb interaction in a full energy-resolved fashion avoiding any free parameters like μ*. This method provides reasonable estimations of superconducting properties, including TC and the excitation spectra of superconductors.
X-ray EM simulation tool for ptychography dataset construction
NASA Astrophysics Data System (ADS)
Stoevelaar, L. Pjotr; Gerini, Giampiero
2018-03-01
In this paper, we present an electromagnetic full-wave modeling framework as a supporting EM tool that provides data sets for X-ray ptychographic imaging. Modeling the entire scattering problem with Finite Element Method (FEM) tools is, in fact, a prohibitive task, because of the large area illuminated by the beam (due to the poor focusing power at these wavelengths) and the very small features to be imaged. To overcome this problem, the spectrum of the illumination beam is decomposed into a discrete set of plane waves. This reduces the electromagnetic modeling volume to one enclosing the area to be imaged. The total scattered field is reconstructed by superimposing the solutions for each plane wave illumination.
NASA Technical Reports Server (NTRS)
Mineck, Raymond E.; Thomas, James L.; Biedron, Robert T.; Diskin, Boris
2005-01-01
FMG3D (full multigrid 3 dimensions) is a pilot computer program that solves equations of fluid flow using a finite difference representation on a structured grid. Infrastructure exists for three dimensions but the current implementation treats only two dimensions. Written in Fortran 90, FMG3D takes advantage of the recursive subroutine feature, dynamic memory allocation, and structured-programming constructs of that language. FMG3D supports multi-block grids with three types of block-to-block interfaces: periodic, C-zero, and C-infinity. For all three types, grid points must match at interfaces. For periodic and C-infinity types, derivatives of grid metrics must be continuous at interfaces. The available equation sets are as follows: scalar elliptic equations, scalar convection equations, and the pressure-Poisson formulation of the Navier-Stokes equations for an incompressible fluid. All the equation sets are implemented with nonzero forcing functions to enable the use of user-specified solutions to assist in verification and validation. The equations are solved with a full multigrid scheme using a full approximation scheme to converge the solution on each succeeding grid level. Restriction to the next coarser mesh uses direct injection for variables and full weighting for residual quantities; prolongation of the coarse grid correction from the coarse mesh to the fine mesh uses bilinear interpolation; and prolongation of the coarse grid solution uses bicubic interpolation.
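The two grid-transfer operators named above admit a compact 1-D sketch: full-weighting restriction (which FMG3D applies to residuals, alongside direct injection for variables) and linear-interpolation prolongation (FMG3D uses bilinear/bicubic in 2-D). The stencils below are the standard ones; the 1-D setting is an illustrative simplification.

```python
# Hedged 1-D sketch of multigrid restriction and prolongation stencils.
import numpy as np

def restrict_full_weighting(fine):
    """Full weighting: coarse interior point = 1/4, 1/2, 1/4 stencil on fine grid."""
    coarse = fine[::2].copy()                 # injection at coinciding end points
    coarse[1:-1] = 0.25 * fine[1:-2:2] + 0.5 * fine[2:-1:2] + 0.25 * fine[3::2]
    return coarse

def prolong_linear(coarse):
    """Linear interpolation back to the fine grid."""
    fine = np.zeros(2 * len(coarse) - 1)
    fine[::2] = coarse                        # copy coinciding points
    fine[1::2] = 0.5 * (coarse[:-1] + coarse[1:])  # average neighbours in between
    return fine

fine = np.sin(np.linspace(0.0, np.pi, 9))     # 9-point fine grid
coarse = restrict_full_weighting(fine)        # 5-point coarse grid
print(coarse)
print(prolong_linear(coarse))
```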
Exploring point-cloud features from partial body views for gender classification
NASA Astrophysics Data System (ADS)
Fouts, Aaron; McCoppin, Ryan; Rizki, Mateen; Tamburino, Louis; Mendoza-Schrock, Olga
2012-06-01
In this paper we extend a previous exploration of histogram features extracted from 3D point cloud images of human subjects for gender discrimination. Feature extraction used a collection of concentric cylinders to define volumes for counting 3D points. The histogram features are characterized by a rotational axis and a selected set of volumes derived from the concentric cylinders. The point cloud images are drawn from the CAESAR anthropometric database provided by the Air Force Research Laboratory (AFRL) Human Effectiveness Directorate and SAE International. This database contains approximately 4400 high resolution LIDAR whole body scans of carefully posed human subjects. Success in our previous investigation was based on extracting features from full body coverage, which required integration of multiple camera images. With full body coverage, the central vertical body axis and orientation are readily obtainable; however, this is not the case with a one-camera view providing less than one half body coverage. Assuming that the subjects are upright, we need to determine or estimate the position of the vertical axis and the orientation of the body about this axis relative to the camera. In past experiments the vertical axis was located through the center of mass of torso points projected on the ground plane, and the body orientation was derived using principal component analysis. In a natural extension of our previous work to partial body views, the absence of rotational invariance about the cylindrical axis greatly increases the difficulty of gender classification. Even the problem of estimating the axis is no longer simple. We describe some simple feasibility experiments that use partial image histograms. Here, the cylindrical axis is assumed to be known. We also discuss experiments with full body images that explore the sensitivity of classification accuracy relative to displacements of the cylindrical axis. Our initial results provide the basis for further investigation of more complex partial body viewing problems and new methods for estimating the two position coordinates for the axis location and the unknown body orientation angle.
Comparing Pattern Recognition Feature Sets for Sorting Triples in the FIRST Database
NASA Astrophysics Data System (ADS)
Proctor, D. D.
2006-07-01
Pattern recognition techniques have been used with increasing success for coping with the tremendous amounts of data being generated by automated surveys. Usually this process involves construction of training sets, the typical examples of data with known classifications. Given a feature set, along with the training set, statistical methods can be employed to generate a classifier. The classifier is then applied to process the remaining data. Feature set selection, however, is still an issue. This paper presents techniques developed for accommodating data for which a substantive portion of the training set cannot be classified unambiguously, a typical case for low-resolution data. Significance tests on the sort-ordered, sample-size-normalized vote distribution of an ensemble of decision trees are introduced as a method of evaluating relative quality of feature sets. The technique is applied to comparing feature sets for sorting a particular radio galaxy morphology, bent-doubles, from the Faint Images of the Radio Sky at Twenty Centimeters (FIRST) database. Also examined are alternative functional forms for feature sets. Associated standard deviations provide the means to evaluate the effect of the number of folds, the number of classifiers per fold, and the sample size on the resulting classifications. The technique may also be applied to situations in which, although accurate classifications are available, the feature set is clearly inadequate, yet one nonetheless wishes to make the best of the available information.
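The core comparison can be sketched roughly as follows: build an ensemble per candidate feature set, collect the sort-ordered per-sample vote fractions, and apply a significance test to the two distributions. A two-sample KS test and random-forest probabilities (which approximate tree-vote fractions) are stand-ins for the paper's actual tests and ensembles; the data are synthetic.

```python
# Hedged sketch: comparing feature sets via sort-ordered ensemble vote distributions.
import numpy as np
from scipy.stats import ks_2samp
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(6)
X = rng.normal(size=(300, 12))
y = (X[:, 0] + X[:, 1] > 0).astype(int)       # bent-double / not, as a placeholder

def sorted_votes(cols):
    """Sort-ordered distribution of (approximate) tree-vote fractions."""
    rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X[:, cols], y)
    return np.sort(rf.predict_proba(X[:, cols])[:, 1])

d, p = ks_2samp(sorted_votes([0, 1, 2]), sorted_votes([3, 4, 5]))
print(f"KS statistic = {d:.3f}, p = {p:.3g}")
```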
Relational Network for Knowledge Discovery through Heterogeneous Biomedical and Clinical Features
Chen, Huaidong; Chen, Wei; Liu, Chenglin; Zhang, Le; Su, Jing; Zhou, Xiaobo
2016-01-01
Biomedical big data, as a whole, covers numerous features, while each dataset specifically delineates part of them. “Full feature spectrum” knowledge discovery across heterogeneous data sources remains a major challenge. We developed a method called bootstrapping for unified feature association measurement (BUFAM) for pairwise association analysis, and relational dependency network (RDN) modeling for global module detection on features across breast cancer cohorts. Discovered knowledge was cross-validated using data from Wake Forest Baptist Medical Center’s electronic medical records and annotated with BioCarta signaling signatures. The clinical potential of the discovered modules was exhibited by stratifying patients for drug responses. A series of discovered associations provided new insights into breast cancer, such as the effects of patient’s cultural background on preferences for surgical procedure. We also discovered two groups of highly associated features, the HER2 and the ER modules, each of which described how phenotypes were associated with molecular signatures, diagnostic features, and clinical decisions. The discovered “ER module”, which was dominated by cancer immunity, was used as an example for patient stratification and prediction of drug responses to tamoxifen and chemotherapy. BUFAM-derived RDN modeling demonstrated unique ability to discover clinically meaningful and actionable knowledge across highly heterogeneous biomedical big data sets. PMID:27427091
A novel feature extraction approach for microarray data based on multi-algorithm fusion
Jiang, Zhu; Xu, Rong
2015-01-01
Feature extraction is one of the most important and effective methods for reducing dimensionality in data mining, especially with the emergence of high-dimensional data such as microarray gene expression data. Feature extraction for gene selection mainly serves two purposes. One is to identify certain disease-related genes. The other is to find a compact set of discriminative genes to build a pattern classifier with reduced complexity and improved generalization capabilities. Depending on the purpose of gene selection, two types of feature extraction algorithms including ranking-based feature extraction and set-based feature extraction are employed in microarray gene expression data analysis. In ranking-based feature extraction, features are evaluated on an individual basis, generally without considering inter-relationships between features, while set-based feature extraction evaluates features based on their role in a feature set by taking into account dependency between features. Like learning methods, feature extraction faces a problem of generalization ability, namely robustness. However, the issue of robustness is often overlooked in feature extraction. In order to improve the accuracy and robustness of feature extraction for microarray data, a novel approach based on multi-algorithm fusion is proposed. By fusing different types of feature extraction algorithms to select features from the sample set, the proposed approach is able to improve feature extraction performance. The new approach is tested against gene expression datasets including Colon cancer data, CNS data, DLBCL data, and Leukemia data. The testing results show that the performance of this algorithm is better than existing solutions. PMID:25780277
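A compact sketch of the fusion idea: rank features with two different selectors and combine the rankings by mean rank. The two selectors (mutual information and an F-test) and the synthetic data are generic placeholders, not the specific algorithms fused in the paper.

```python
# Hedged sketch: multi-algorithm fusion of feature rankings by mean rank.
import numpy as np
from sklearn.feature_selection import mutual_info_classif, f_classif
from scipy.stats import rankdata

rng = np.random.default_rng(7)
X = rng.normal(size=(100, 50))                # 100 samples, 50 "genes"
y = rng.integers(0, 2, size=100)
X[y == 1, :3] += 1.0                          # three informative genes

mi = mutual_info_classif(X, y, random_state=0)
f, _ = f_classif(X, y)
fused_rank = (rankdata(-mi) + rankdata(-f)) / 2   # lower fused rank = better
print("top genes:", np.argsort(fused_rank)[:5])
```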
NASA Technical Reports Server (NTRS)
Romanofsky, Robert R.; Shalkhauser, Kurt A.
1989-01-01
The design and evaluation of a novel fixturing technique for characterizing millimeter wave solid state devices is presented. The technique utilizes a cosine-tapered ridge guide fixture and a one-tier de-embedding procedure to produce accurate and repeatable device level data. Advanced features of this technique include nondestructive testing, full waveguide bandwidth operation, universality of application, and rapid, yet repeatable, chip-level characterization. In addition, only one set of calibration standards is required regardless of the device geometry.
A serial digital data communications device. [for real time flight simulation
NASA Technical Reports Server (NTRS)
Fetter, J. L.
1977-01-01
A general-purpose computer peripheral device is reported that provides a full-duplex, serial, digital data transmission link between a Xerox Sigma computer and a wide variety of external equipment, including computers, terminals, and special-purpose devices. The interface has an extensive set of user-defined options to assist the user in establishing the necessary data links. This report describes those options and other features of the serial communications interface and its performance by discussing its application to a particular problem.
Stone, B N; Griesinger, G L; Modelevsky, J L
1984-01-01
We describe an interactive computational tool, PLASMAP, which allows the user to electronically store, retrieve, and display circular restriction maps. PLASMAP permits users to construct libraries of plasmid restriction maps as a set of files which may be edited in the laboratory at any time. The display feature of PLASMAP quickly generates device-independent, artist-quality, full-color or monochrome, hard copies or CRT screens of complex, conventional circular restriction maps. PMID:6320096
Diagnostic and prognostic histopathology system using morphometric indices
DOE Office of Scientific and Technical Information (OSTI.GOV)
Parvin, Bahram; Chang, Hang; Han, Ju
Determining at least one of a prognosis or a therapy for a patient based on a stained tissue section of the patient. An image of a stained tissue section of a patient is processed by a processing device. A set of features values for a set of cell-based features is extracted from the processed image, and the processed image is associated with a particular cluster of a plurality of clusters based on the set of feature values, where the plurality of clusters is defined with respect to a feature space corresponding to the set of features.
NASA Astrophysics Data System (ADS)
Fox, Neil I.; Micheas, Athanasios C.; Peng, Yuqiang
2016-07-01
This paper introduces the use of Bayesian full Procrustes shape analysis in object-oriented meteorological applications. In particular, the Procrustes methodology is used to generate mean forecast precipitation fields from a set of ensemble forecasts. This approach has advantages over other ensemble averaging techniques in that it can produce a forecast that retains the morphological features of the precipitation structures and present the range of forecast outcomes represented by the ensemble. The production of the ensemble mean avoids the problems of smoothing that result from simple pixel or cell averaging, while producing credible sets that retain information on ensemble spread. Also in this paper, the full Bayesian Procrustes scheme is used as an object verification tool for precipitation forecasts. This is an extension of a previously presented Procrustes shape analysis based verification approach into a full Bayesian format designed to handle the verification of precipitation forecasts that match objects from an ensemble of forecast fields to a single truth image. The methodology is tested on radar reflectivity nowcasts produced in the Warning Decision Support System - Integrated Information (WDSS-II) by varying parameters in the K-means cluster tracking scheme.
Anomaly Detection Using an Ensemble of Feature Models
Noto, Keith; Brodley, Carla; Slonim, Donna
2011-01-01
We present a new approach to semi-supervised anomaly detection. Given a set of training examples believed to come from the same distribution or class, the task is to learn a model that will be able to distinguish examples in the future that do not belong to the same class. Traditional approaches typically compare the position of a new data point to the set of “normal” training data points in a chosen representation of the feature space. For some data sets, the normal data may not have discernible positions in feature space, but do have consistent relationships among some features that fail to appear in the anomalous examples. Our approach learns to predict the values of training set features from the values of other features. After we have formed an ensemble of predictors, we apply this ensemble to new data points. To combine the contribution of each predictor in our ensemble, we have developed a novel, information-theoretic anomaly measure that our experimental results show selects against noisy and irrelevant features. Our results on 47 data sets show that for most data sets, this approach significantly improves performance over current state-of-the-art feature space distance and density-based approaches. PMID:22020249
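The approach described above reduces to a short sketch: train one regressor per feature to predict it from the others, then score a new point by aggregating its prediction errors. The simple mean-squared aggregation below replaces the paper's information-theoretic measure, and the ridge regressors and data are illustrative assumptions.

```python
# Hedged sketch: anomaly detection via an ensemble of per-feature predictors.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(8)
X = rng.normal(size=(200, 6))
X[:, 3] = X[:, 0] + 0.1 * rng.normal(size=200)   # a consistent feature relation

# One regressor per feature, predicting it from all other features.
models = []
for j in range(X.shape[1]):
    others = np.delete(X, j, axis=1)
    models.append(Ridge().fit(others, X[:, j]))

def anomaly_score(x):
    """Mean squared prediction error across the predictor ensemble."""
    errs = [(m.predict(np.delete(x, j).reshape(1, -1))[0] - x[j]) ** 2
            for j, m in enumerate(models)]
    return float(np.mean(errs))

normal = rng.normal(size=6); normal[3] = normal[0]      # respects the relation
weird = rng.normal(size=6);  weird[3] = weird[0] + 5.0  # breaks the relation
print(anomaly_score(normal), anomaly_score(weird))
```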
Jain, Anil K; Feng, Jianjiang
2009-06-01
The evidential value of palmprints in forensic applications is clear as about 30 percent of the latents recovered from crime scenes are from palms. While biometric systems for palmprint-based personal authentication in access control type of applications have been developed, they mostly deal with low-resolution (about 100 ppi) palmprints and only perform full-to-full palmprint matching. We propose a latent-to-full palmprint matching system that is needed in forensic applications. Our system deals with palmprints captured at 500 ppi (the current standard in forensic applications) or higher resolution and uses minutiae as features to be compatible with the methodology used by latent experts. Latent palmprint matching is a challenging problem because latent prints lifted at crime scenes are of poor image quality, cover only a small area of the palm, and have a complex background. Other difficulties include a large number of minutiae in full prints (about 10 times as many as fingerprints), and the presence of many creases in latents and full prints. A robust algorithm to reliably estimate the local ridge direction and frequency in palmprints is developed. This facilitates the extraction of ridge and minutiae features even in poor quality palmprints. A fixed-length minutia descriptor, MinutiaCode, is utilized to capture distinctive information around each minutia and an alignment-based minutiae matching algorithm is used to match two palmprints. Two sets of partial palmprints (150 live-scan partial palmprints and 100 latent palmprints) are matched to a background database of 10,200 full palmprints to test the proposed system. Despite the inherent difficulty of latent-to-full palmprint matching, rank-1 recognition rates of 78.7 and 69 percent, respectively, were achieved in searching live-scan partial palmprints and latent palmprints against the background database.
Feature Selection Methods for Zero-Shot Learning of Neural Activity.
Caceres, Carlos A; Roos, Matthew J; Rupp, Kyle M; Milsap, Griffin; Crone, Nathan E; Wolmetz, Michael E; Ratto, Christopher R
2017-01-01
Dimensionality poses a serious challenge when making predictions from human neuroimaging data. Across imaging modalities, large pools of potential neural features (e.g., responses from particular voxels, electrodes, and temporal windows) have to be related to typically limited sets of stimuli and samples. In recent years, zero-shot prediction models have been introduced for mapping between neural signals and semantic attributes, which allows for classification of stimulus classes not explicitly included in the training set. While choices about feature selection can have a substantial impact when closed-set accuracy, open-set robustness, and runtime are competing design objectives, no systematic study of feature selection for these models has been reported. Instead, a relatively straightforward feature stability approach has been adopted and successfully applied across models and imaging modalities. To characterize the tradeoffs in feature selection for zero-shot learning, we compared correlation-based stability to several other feature selection techniques on comparable data sets from two distinct imaging modalities: functional Magnetic Resonance Imaging and Electrocorticography. While most of the feature selection methods resulted in similar zero-shot prediction accuracies and spatial/spectral patterns of selected features, there was one exception: a novel feature/attribute correlation approach achieved those accuracies with far fewer features, suggesting the potential for simpler prediction models that yield high zero-shot classification accuracy.
Wang, Juan; Nishikawa, Robert M; Yang, Yongyi
2017-07-01
Mammograms acquired with full-field digital mammography (FFDM) systems are provided in both "for-processing" and "for-presentation" image formats. For-presentation images are traditionally intended for visual assessment by the radiologists. In this study, we investigate the feasibility of using for-presentation images in computerized analysis and diagnosis of microcalcification (MC) lesions. We make use of a set of 188 matched mammogram image pairs of MC lesions from 95 cases (biopsy proven), in which both for-presentation and for-processing images are provided for each lesion. We then analyze and characterize the MC lesions from for-presentation images and compare them with their counterparts in for-processing images. Specifically, we consider three important aspects in computer-aided diagnosis (CAD) of MC lesions. First, we quantify each MC lesion with a set of 10 image features of clustered MCs and 12 textural features of the lesion area. Second, we assess the detectability of individual MCs in each lesion from the for-presentation images by a commonly used difference-of-Gaussians (DoG) detector. Finally, we study the diagnostic accuracy in discriminating between benign and malignant MC lesions from the for-presentation images by a pretrained support vector machine (SVM) classifier. To accommodate the underlying background suppression and image enhancement in for-presentation images, a normalization procedure is applied. The quantitative image features of MC lesions from for-presentation images are highly consistent with those from for-processing images. The values of Pearson's correlation coefficient between features from the two formats range from 0.824 to 0.961 for the 10 MC image features, and from 0.871 to 0.963 for the 12 textural features. In detection of individual MCs, the FROC curve from for-presentation is similar to that from for-processing. In particular, at a sensitivity level of 80%, the average number of false-positives (FPs) per image region is 9.55 for both for-presentation and for-processing images. Finally, for classifying MC lesions as malignant or benign, the area under the ROC curve is 0.769 in for-presentation, compared to 0.761 in for-processing (P = 0.436). The quantitative results demonstrate that MC lesions in for-presentation images are highly consistent with those in for-processing images in terms of image features, detectability of individual MCs, and classification accuracy between malignant and benign lesions. These results indicate that for-presentation images can be compatible with for-processing images for use in CAD algorithms for MC lesions. © 2017 American Association of Physicists in Medicine.
Virtual screening of cathepsin k inhibitors using docking and pharmacophore models.
Ravikumar, Muttineni; Pavan, S; Bairy, Santhosh; Pramod, A B; Sumakanth, M; Kishore, Madala; Sumithra, Tirunagaram
2008-07-01
Cathepsin K is a lysosomal cysteine protease that is highly and selectively expressed in osteoclasts, the cells which degrade bone during the continuous cycle of bone degradation and formation. Inhibition of cathepsin K represents a potential therapeutic approach for diseases characterized by excessive bone resorption, such as osteoporosis. In order to elucidate the essential structural features for cathepsin K inhibition, three-dimensional pharmacophore hypotheses were built on the basis of a set of known cathepsin K inhibitors selected from the literature using the Catalyst program. Several methods were used to validate the pharmacophore hypotheses, and the fourth hypothesis (Hypo4) was considered the best; it has a correlation coefficient of 0.944 with the training set and predicts the activity of a set of 30 test molecules with a correlation of 0.909. The model (Hypo4) was then employed as a 3D search query to screen the Maybridge database, containing 59,000 compounds, to discover novel and highly potent ligands. For analyzing intermolecular interactions between protein and ligand, all the molecules were docked using the Glide software. The result showed that the type and spatial location of chemical features encoded in the pharmacophore are in full agreement with the enzyme-inhibitor interaction pattern identified from molecular docking.
Sturm, Irene; Blankertz, Benjamin; Potes, Cristhian; Schalk, Gerwin; Curio, Gabriel
2014-01-01
Listening to music moves our minds and moods, stirring interest in its neural underpinnings. A multitude of compositional features drives the appeal of natural music. How such original music, where a composer's opus is not manipulated for experimental purposes, engages a listener's brain has not been studied until recently. Here, we report an in-depth analysis of two electrocorticographic (ECoG) data sets obtained over the left hemisphere in ten patients during presentation of either a rock song or a read-out narrative. First, the time courses of five acoustic features (intensity, presence/absence of vocals with lyrics, spectral centroid, harmonic change, and pulse clarity) were extracted from the audio tracks and found to be correlated with each other to varying degrees. In a second step, we uncovered the specific impact of each musical feature on ECoG high-gamma power (70-170 Hz) by calculating partial correlations to remove the influence of the other four features. In the music condition, the onset and offset of vocal lyrics in ongoing instrumental music was consistently identified within the group as the dominant driver for ECoG high-gamma power changes over temporal auditory areas, while concurrently subject-individual activation spots were identified for sound intensity, timbral, and harmonic features. The distinct cortical activations to vocal speech-related content embedded in instrumental music directly demonstrate that song integrated in instrumental music represents a distinct dimension in complex music. In contrast, in the speech condition, the full sound envelope was reflected in the high gamma response rather than the onset or offset of the vocal lyrics. This demonstrates how the contributions of stimulus features that modulate the brain response differ across the two examples of a full-length natural stimulus, which suggests a context-dependent feature selection in the processing of complex auditory stimuli.
Mougiakakou, Stavroula G; Valavanis, Ioannis K; Nikita, Alexandra; Nikita, Konstantina S
2007-09-01
The aim of the present study is to define an optimally performing computer-aided diagnosis (CAD) architecture for the classification of liver tissue from non-enhanced computed tomography (CT) images into normal liver (C1), hepatic cyst (C2), hemangioma (C3), and hepatocellular carcinoma (C4). To this end, various CAD architectures, based on texture features and ensembles of classifiers (ECs), are comparatively assessed. A number of regions of interest (ROIs) corresponding to C1-C4 were defined by experienced radiologists in non-enhanced liver CT images. For each ROI, five distinct sets of texture features were extracted using first order statistics, spatial gray level dependence matrix, gray level difference method, Laws' texture energy measures, and fractal dimension measurements. Two different ECs were constructed and compared. The first one consists of five multilayer perceptron neural networks (NNs), each using as input one of the computed texture feature sets or its reduced version after genetic algorithm-based feature selection. The second EC comprised five different primary classifiers, namely one multilayer perceptron NN, one probabilistic NN, and three k-nearest neighbor classifiers, each fed with the combination of the five texture feature sets or their reduced versions. The final decision of each EC was extracted by using appropriate voting schemes, while bootstrap re-sampling was utilized in order to estimate the generalization ability of the CAD architectures based on the available relatively small-sized data set. The best mean classification accuracy (84.96%) is achieved by the second EC using a fused feature set and the weighted voting scheme. The fused feature set was obtained after appropriate feature selection applied to specific subsets of the original feature set. The comparative assessment of the various CAD architectures shows that combining three types of classifiers with a voting scheme, fed with identical feature sets obtained after appropriate feature selection and fusion, may result in an accurate system able to assist differential diagnosis of focal liver lesions from non-enhanced CT images.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lopez, Juan; Liefer, Nathan C.; Busho, Colin R.
Here, the need for improved Critical Infrastructure and Key Resource (CIKR) security is unquestioned, yet there has been minimal emphasis on Level-0 (PHY Process) improvements. Wired Signal Distinct Native Attribute (WS-DNA) Fingerprinting is investigated here as a non-intrusive PHY-based security augmentation to support an envisioned layered security strategy. Results are based on experimental response collections from Highway Addressable Remote Transducer (HART) Differential Pressure Transmitter (DPT) devices from three manufacturers (Yokogawa, Honeywell, Endress+Hauer) installed in an automated process control system. Device discrimination is assessed using Time Domain (TD) and Slope-Based FSK (SB-FSK) fingerprints input to Multiple Discriminant Analysis, Maximum Likelihood (MDA/ML) and Random Forest (RndF) classifiers. For 12 different classes (two devices per manufacturer at two distinct set points), both classifiers performed reliably and achieved an arbitrary performance benchmark of average cross-class percent correct of %C > 90%. The least challenging cross-manufacturer results included near-perfect %C ≈ 100%, while the more challenging like-model (serial number) discrimination results included 90% < %C < 100%, with TD Fingerprinting marginally outperforming SB-FSK Fingerprinting; SB-FSK benefits from having less stringent response alignment and registration requirements. The RndF classifier was most beneficial and enabled reliable selection of dimensionally reduced fingerprint subsets that minimize data storage and computational requirements. The RndF-selected feature sets contained 15% of the full-dimensional feature sets and only suffered a worst case %CΔ = 3% to 4% performance degradation.
Kumar Myakalwar, Ashwin; Spegazzini, Nicolas; Zhang, Chi; Kumar Anubham, Siva; Dasari, Ramachandra R; Barman, Ishan; Kumar Gundawar, Manoj
2015-08-19
Despite its intrinsic advantages, translation of laser induced breakdown spectroscopy for material identification has often been impeded by the lack of robustness of developed classification models, often due to the presence of spurious correlations. While a number of classifiers exhibiting high discriminatory power have been reported, efforts in establishing the subset of relevant spectral features that enable a fundamental interpretation of the segmentation capability and avoid the 'curse of dimensionality' have been lacking. Using LIBS data acquired from a set of secondary explosives, we investigate judicious feature selection approaches and architect two different chemometric classifiers, based on feature selection through prerequisite knowledge of the sample composition and a genetic algorithm, respectively. While the full spectral input results in a classification rate of ca. 92%, selection of only the carbon-to-hydrogen spectral window results in near identical performance. Importantly, the genetic algorithm-derived classifier shows a statistically significant improvement to ca. 94% accuracy for prospective classification, even though the number of features used is an order of magnitude smaller. Our findings demonstrate the impact of rigorous feature selection in LIBS and also hint at the feasibility of using a discrete filter based detector, thereby enabling a cheaper, more compact system more amenable to field operations.
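A toy genetic-algorithm feature selector in the spirit described above is sketched below: bit-mask chromosomes, cross-validated accuracy as fitness, single-point crossover, and bit-flip mutation. The data, GA settings, and classifier are illustrative assumptions, not the paper's actual chemometric pipeline.

```python
# Hedged sketch: GA-based feature (spectral channel) selection.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(9)
X = rng.normal(size=(150, 40))            # 40 placeholder spectral channels
y = rng.integers(0, 2, size=150)
X[y == 1, :4] += 1.5                      # a few informative channels

def fitness(mask):
    """Cross-validated accuracy using only the masked-in features."""
    if mask.sum() == 0:
        return 0.0
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X[:, mask.astype(bool)], y, cv=3).mean()

pop = rng.integers(0, 2, size=(20, X.shape[1]))   # random bit-mask population
for gen in range(15):
    fits = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(fits)[::-1][:10]]    # truncation selection
    children = []
    for _ in range(10):
        a, b = parents[rng.integers(10)], parents[rng.integers(10)]
        cut = rng.integers(1, X.shape[1])
        child = np.r_[a[:cut], b[cut:]]           # single-point crossover
        flip = rng.random(X.shape[1]) < 0.02      # bit-flip mutation
        child[flip] ^= 1
        children.append(child)
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(m) for m in pop])]
print("selected features:", np.flatnonzero(best))
```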
Nagarajan, Mahesh B; Coan, Paola; Huber, Markus B; Diemoz, Paul C; Wismüller, Axel
2015-01-01
Phase contrast X-ray computed tomography (PCI-CT) has been demonstrated as a novel imaging technique that can visualize human cartilage with high spatial resolution and soft tissue contrast. Different textural approaches have been previously investigated for characterizing chondrocyte organization on PCI-CT to enable classification of healthy and osteoarthritic cartilage. However, the large size of the feature sets extracted in such studies motivates an investigation into algorithmic feature reduction for computing efficient feature representations without compromising their discriminatory power. For this purpose, geometrical feature sets derived from the scaling index method (SIM) were extracted from 1392 volumes of interest (VOI) annotated on PCI-CT images of ex vivo human patellar cartilage specimens. The extracted feature sets were subjected to linear and non-linear dimension reduction techniques as well as feature selection based on evaluation of mutual information criteria. The reduced feature set was subsequently used in a machine learning task with support vector regression to classify VOIs as healthy or osteoarthritic; classification performance was evaluated using the area under the receiver-operating characteristic (ROC) curve (AUC). Our results show that the classification performance achieved by 9-D SIM-derived geometric feature sets (AUC: 0.96 ± 0.02) can be maintained with 2-D representations computed from both dimension reduction and feature selection (AUC values as high as 0.97 ± 0.02). Thus, such feature reduction techniques can offer a high degree of compaction to large feature sets extracted from PCI-CT images while maintaining their ability to characterize the underlying chondrocyte patterns.
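A minimal sketch of the evaluation pattern described above: compare cross-validated AUC of a 9-D feature set against its 2-D reduction. PCA stands in for one of the linear reducers and an SVM classifier stands in for the paper's support-vector setup; the data and settings are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=9, random_state=0)

def auc_of(features):
    # out-of-fold decision scores give an unbiased AUC estimate
    scores = cross_val_predict(SVC(), features, y, cv=5,
                               method="decision_function")
    return roc_auc_score(y, scores)

print("9-D AUC:", auc_of(X))
print("2-D AUC:", auc_of(PCA(n_components=2).fit_transform(X)))
```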
Feature Selection Methods for Zero-Shot Learning of Neural Activity
Caceres, Carlos A.; Roos, Matthew J.; Rupp, Kyle M.; Milsap, Griffin; Crone, Nathan E.; Wolmetz, Michael E.; Ratto, Christopher R.
2017-01-01
Dimensionality poses a serious challenge when making predictions from human neuroimaging data. Across imaging modalities, large pools of potential neural features (e.g., responses from particular voxels, electrodes, and temporal windows) have to be related to typically limited sets of stimuli and samples. In recent years, zero-shot prediction models have been introduced for mapping between neural signals and semantic attributes, which allows for classification of stimulus classes not explicitly included in the training set. While choices about feature selection can have a substantial impact when closed-set accuracy, open-set robustness, and runtime are competing design objectives, no systematic study of feature selection for these models has been reported. Instead, a relatively straightforward feature stability approach has been adopted and successfully applied across models and imaging modalities. To characterize the tradeoffs in feature selection for zero-shot learning, we compared correlation-based stability to several other feature selection techniques on comparable data sets from two distinct imaging modalities: functional Magnetic Resonance Imaging and Electrocorticography. While most of the feature selection methods resulted in similar zero-shot prediction accuracies and spatial/spectral patterns of selected features, there was one exception: a novel feature/attribute correlation approach was able to achieve those accuracies with far fewer features, suggesting the potential for simpler prediction models that yield high zero-shot classification accuracy. PMID:28690513
Impact of experimental design on PET radiomics in predicting somatic mutation status.
Yip, Stephen S F; Parmar, Chintan; Kim, John; Huynh, Elizabeth; Mak, Raymond H; Aerts, Hugo J W L
2017-12-01
PET-based radiomic features have demonstrated great promise in predicting genetic data. However, various experimental parameters can influence the feature extraction pipeline and hence the resulting feature values. Here, we investigated how experimental settings affect the performance of radiomic features in predicting somatic mutation status in non-small cell lung cancer (NSCLC) patients. 348 NSCLC patients with somatic mutation testing and diagnostic PET images were included in our analysis. Radiomic feature extraction was analyzed for varying voxel sizes, filters and bin widths. 66 radiomic features were evaluated. The performance of features in predicting mutation status was assessed using the area under the receiver-operating-characteristic curve (AUC). The influence of experimental parameters on feature predictability was quantified as the relative difference between the minimum and maximum AUC (δ). The large majority of features (n=56, 85%) were significantly predictive for EGFR mutation status (AUC≥0.61). 29 radiomic features significantly predicted EGFR mutations and were robust to experimental settings with δOverall < 5%. The overall influence (δOverall) of the voxel size, filter and bin width for all features ranged from 5% to 15%. For all features, no experimental setting allowed discrimination of KRAS+ from KRAS- (AUC≤0.56). The predictability of 29 radiomic features was robust to the choice of experimental settings; however, these settings need to be carefully chosen for all other features. The combined effect of the investigated processing methods could be substantial and must be considered. Optimized settings that will maximize the predictive performance of individual radiomic features should be investigated in the future. Copyright © 2017 Elsevier B.V. All rights reserved.
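The robustness measure described above reduces to a one-liner: for each feature, the relative difference between its minimum and maximum AUC across experimental settings. The array values below are hypothetical, and the normalization by the maximum AUC is one plausible reading of "relative difference".

```python
import numpy as np

# aucs[i, j] = AUC of feature i under experimental setting j (hypothetical values)
aucs = np.array([[0.66, 0.64, 0.65],
                 [0.71, 0.58, 0.69]])

delta = (aucs.max(axis=1) - aucs.min(axis=1)) / aucs.max(axis=1) * 100
robust = delta < 5.0   # e.g., the <5% criterion used to call a feature robust
print(delta, robust)
```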
Reproducibility of radiomics for deciphering tumor phenotype with imaging
NASA Astrophysics Data System (ADS)
Zhao, Binsheng; Tan, Yongqiang; Tsai, Wei-Yann; Qi, Jing; Xie, Chuanmiao; Lu, Lin; Schwartz, Lawrence H.
2016-03-01
Radiomics (radiogenomics) characterizes tumor phenotypes based on quantitative image features derived from routine radiologic imaging to improve cancer diagnosis, prognosis, prediction and response to therapy. Although radiomic features must be reproducible to qualify as biomarkers for clinical care, little is known about how routine imaging acquisition techniques/parameters affect reproducibility. To begin to fill this knowledge gap, we assessed the reproducibility of a comprehensive, commonly-used set of radiomic features using a unique, same-day repeat computed tomography data set from lung cancer patients. Each scan was reconstructed at 6 imaging settings, varying slice thicknesses (1.25 mm, 2.5 mm and 5 mm) and reconstruction algorithms (sharp, smooth). Reproducibility was assessed using the repeat scans reconstructed at identical imaging setting (6 settings in total). In separate analyses, we explored differences in radiomic features due to different imaging parameters by assessing the agreement of these radiomic features extracted from the repeat scans reconstructed at the same slice thickness but different algorithms (3 settings in total). Our data suggest that radiomic features are reproducible over a wide range of imaging settings. However, smooth and sharp reconstruction algorithms should not be used interchangeably. These findings will raise awareness of the importance of properly setting imaging acquisition parameters in radiomics/radiogenomics research.
A local structure model for network analysis
Casleton, Emily; Nordman, Daniel; Kaiser, Mark
2017-04-01
The statistical analysis of networks is a popular research topic with ever-widening applications. Exponential random graph models (ERGMs), which specify a model through interpretable, global network features, are common for this purpose. In this study we introduce a new class of models for network analysis, called local structure graph models (LSGMs). In contrast to an ERGM, an LSGM specifies a network model through local features and allows for an interpretable and controllable local dependence structure. In particular, LSGMs are formulated by a set of full conditional distributions for each network edge, e.g., the probability of edge presence/absence, depending on neighborhoods of other edges. Additional model features are introduced to aid in specification and to help alleviate a common issue (occurring also with ERGMs) of model degeneracy. Finally, the proposed models are demonstrated on a network of tornadoes in Arkansas, where an LSGM is shown to perform significantly better than a model without local dependence.
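A loose illustration (not the authors' specification) of the core LSGM idea, that each edge variable has a full conditional distribution depending on a neighborhood of other edges. The logistic form, the incident-edge neighborhood, and the parameters below are assumptions chosen for brevity.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
A = (rng.random((n, n)) < 0.2).astype(int)
A = np.triu(A, 1)
A = A + A.T                                 # symmetric adjacency, no self-loops

def cond_prob(A, i, j, alpha=-1.0, beta=0.5):
    # neighborhood statistic: edges incident to i or j, excluding (i, j) itself
    s = A[i].sum() + A[j].sum() - 2 * A[i, j]
    return 1.0 / (1.0 + np.exp(-(alpha + beta * s)))

# one Gibbs sweep over the edge variables using their full conditionals
for i in range(n):
    for j in range(i + 1, n):
        A[i, j] = A[j, i] = int(rng.random() < cond_prob(A, i, j))
```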
Feature Selection for Chemical Sensor Arrays Using Mutual Information
Wang, X. Rosalind; Lizier, Joseph T.; Nowotny, Thomas; Berna, Amalia Z.; Prokopenko, Mikhail; Trowell, Stephen C.
2014-01-01
We address the problem of feature selection for classifying a diverse set of chemicals using an array of metal oxide sensors. Our aim is to evaluate a filter approach to feature selection with reference to previous work, which used a wrapper approach on the same data set, and established best features and upper bounds on classification performance. We selected feature sets that exhibit the maximal mutual information with the identity of the chemicals. The selected features closely match those found to perform well in the previous study using a wrapper approach to conduct an exhaustive search of all permitted feature combinations. By comparing the classification performance of support vector machines (using features selected by mutual information) with the performance observed in the previous study, we found that while our approach does not always give the maximum possible classification performance, it always selects features that achieve classification performance approaching the optimum obtained by exhaustive search. We performed further classification using the selected feature set with some common classifiers and found that, for the selected features, Bayesian Networks gave the best performance. Finally, we compared the observed classification performances with the performance of classifiers using randomly selected features. We found that the selected features consistently outperformed randomly selected features for all tested classifiers. The mutual information filter approach is therefore a computationally efficient method for selecting near optimal features for chemical sensor arrays. PMID:24595058
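A filter-style selection sketch in the spirit of the study above: rank features by estimated mutual information with the class label and keep the top k. The MI estimator and the value of k are assumptions; the study computed mutual information between feature sets and chemical identity.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# stand-in for sensor-array responses and chemical identities
X, y = make_classification(n_samples=400, n_features=32, n_informative=6,
                           random_state=0)

selector = SelectKBest(mutual_info_classif, k=6).fit(X, y)
X_sel = selector.transform(X)
print(selector.get_support(indices=True))   # indices of the selected features
```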
DOE Office of Scientific and Technical Information (OSTI.GOV)
Henriques, V. M. J.; Mathioudakis, M.; Socas-Navarro, H.
We perform non-LTE inversions in a large set of umbral flashes, including the dark fibrils visible within them, and in the quiescent umbra by using the inversion code NICOLE on a set of full Stokes high-resolution Ca ii λ 8542 observations of a sunspot at disk center. We find that the dark structures have Stokes profiles that are distinct from those of the quiescent and flashed regions. They are best reproduced by atmospheres that are more similar to the flashed atmosphere in terms of velocities, even if with reduced amplitudes. We also find two sets of solutions that finely fit the flashed profiles: a set that is upflowing, featuring a transition region that is deeper than in the quiescent case and preceded by a slight dip in temperature, and a second solution with a hotter atmosphere in the chromosphere but featuring downflows close to the speed of sound at such heights. Such downflows may be related, or even dependent, on the presence of coronal loops, rooted in the umbra of sunspots, as is the case in the region analyzed. Similar loops have been recently observed to have supersonic downflows in the transition region and are consistent with the earlier “sunspot plumes,” which were invariably found to display strong downflows in sunspots. Finally, we find, on average, a magnetic field reduction in the flashed areas, suggesting that the shock pressure is moving field lines in the upper layers.
Speech recognition features for EEG signal description in detection of neonatal seizures.
Temko, A; Boylan, G; Marnane, W; Lightbody, G
2010-01-01
In this work, features which are usually employed in automatic speech recognition (ASR) are used for the detection of neonatal seizures in newborn EEG. Three conventional ASR feature sets are compared to the feature set which has been previously developed for this task. The results indicate that the thoroughly-studied spectral envelope based ASR features perform reasonably well on their own. Additionally, the SVM Recursive Feature Elimination routine is applied to all extracted features pooled together. It is shown that ASR features consistently appear among the top-ranked features.
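SVM Recursive Feature Elimination, as mentioned above, is available directly in scikit-learn; this sketch wraps a linear SVM in the RFE routine. The pooled feature matrix and the number of features to retain are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import LinearSVC

# stand-in for ASR and EEG features pooled together
X, y = make_classification(n_samples=500, n_features=40, random_state=0)

rfe = RFE(LinearSVC(dual=False, max_iter=5000),
          n_features_to_select=10).fit(X, y)
print(rfe.ranking_)   # rank 1 = retained; larger ranks were eliminated earlier
```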
Music Structure Analysis from Acoustic Signals
NASA Astrophysics Data System (ADS)
Dannenberg, Roger B.; Goto, Masataka
Music is full of structure, including sections, sequences of distinct musical textures, and the repetition of phrases or entire sections. The analysis of music audio relies upon feature vectors that convey information about music texture or pitch content. Texture generally refers to the average spectral shape and statistical fluctuation, often reflecting the set of sounding instruments, e.g., strings, vocal, or drums. Pitch content reflects melody and harmony, which is often independent of texture. Structure is found in several ways. Segment boundaries can be detected by observing marked changes in locally averaged texture.
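A minimal sketch of the boundary-detection idea just described: compare locally averaged texture feature vectors on either side of each frame and treat peaks in the distance as candidate segment boundaries. The feature matrix, window size, and Euclidean distance are all illustrative choices.

```python
import numpy as np

def boundary_strength(F, w=20):
    """F: (n_frames, n_features) texture features; w: half-window in frames."""
    n = F.shape[0]
    d = np.zeros(n)
    for t in range(w, n - w):
        left = F[t - w:t].mean(axis=0)    # locally averaged texture before t
        right = F[t:t + w].mean(axis=0)   # locally averaged texture after t
        d[t] = np.linalg.norm(left - right)
    return d                              # local maxima suggest texture changes

F = np.random.default_rng(0).random((500, 13))   # placeholder feature frames
strength = boundary_strength(F)
```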
Kiranyaz, Serkan; Mäkinen, Toni; Gabbouj, Moncef
2012-10-01
In this paper, we propose a novel framework based on a collective network of evolutionary binary classifiers (CNBC) to address the problems of feature and class scalability. The main goal of the proposed framework is to achieve a high classification performance over dynamic audio and video repositories. The proposed framework adopts a "Divide and Conquer" approach in which an individual network of binary classifiers (NBC) is allocated to discriminate each audio class. An evolutionary search is applied to find the best binary classifier in each NBC with respect to a given criterion. Through the incremental evolution sessions, the CNBC framework can dynamically adapt to each new incoming class or feature set without resorting to a full-scale re-training or re-configuration. Therefore, the CNBC framework is particularly designed for dynamically varying databases where no conventional static classifiers can adapt to such changes. In short, it is entirely a novel topology, an unprecedented approach for dynamic, content/data adaptive and scalable audio classification. A large set of audio features can be effectively used in the framework, where the CNBCs make appropriate selections and combinations so as to achieve the highest discrimination among individual audio classes. Experiments demonstrate a high classification accuracy (above 90%) and efficiency of the proposed framework over large and dynamic audio databases. Copyright © 2012 Elsevier Ltd. All rights reserved.
Automatic feature design for optical character recognition using an evolutionary search procedure.
Stentiford, F W
1985-03-01
An automatic evolutionary search is applied to the problem of feature extraction in an OCR application. A performance measure based on feature independence is used to generate features which do not appear to suffer from peaking effects [17]. Features are extracted from a training set of 30,600 machine-printed alphanumeric characters (34 classes) derived from British mail. Classification results on the training set and a test set of 10,200 characters are reported for an increasing number of features. A 1.01 percent forced-decision error rate is obtained on the test data using 316 features. The hardware implementation should be cheap and fast to operate. The performance compares favorably with current low-cost OCR page readers.
Wong, Gerard; Leckie, Christopher; Kowalczyk, Adam
2012-01-15
Feature selection is a key concept in machine learning for microarray datasets, where features represented by probesets are typically several orders of magnitude larger than the available sample size. Computational tractability is a key challenge for feature selection algorithms in handling very high-dimensional datasets beyond a hundred thousand features, such as in datasets produced on single nucleotide polymorphism microarrays. In this article, we present a novel feature set reduction approach that enables scalable feature selection on datasets with hundreds of thousands of features and beyond. Our approach enables more efficient handling of higher-resolution datasets to achieve better disease subtype classification of samples for potentially more accurate diagnosis and prognosis, which allows clinicians to make more informed decisions with regard to patient treatment options. We applied our feature set reduction approach to several publicly available cancer single nucleotide polymorphism (SNP) array datasets and evaluated its performance in terms of its multiclass predictive classification accuracy over different cancer subtypes, its speedup in execution as well as its scalability with respect to sample size and array resolution. Feature Set Reduction (FSR) was able to reduce the dimensions of an SNP array dataset by more than two orders of magnitude while achieving at least equal, and in most cases superior, predictive classification performance compared with that achieved on features selected by existing feature selection methods alone. An examination of the biological relevance of frequently selected features from FSR-reduced feature sets revealed strong enrichment in association with cancer. FSR was implemented in MATLAB R2010b and is available at http://ww2.cs.mu.oz.au/~gwong/FSR.
NASA Astrophysics Data System (ADS)
Beamish, David; White, James C.
2011-01-01
A number of modern, multiparameter, high resolution airborne geophysical surveys (termed HiRES) have been conducted over the past decade across onshore UK. These were undertaken, in part, as a response to the limited resolution of the existing UK national baseline magnetic survey data set acquired in the late 1950s and early 1960s. Modern magnetic survey data, obtained with higher precision and reduced line spacing and elevation, provide an improved data set; however the distinctions between the two available resources, existing and new, are rarely quantified. In this contribution we demonstrate and quantify the improvements that can be anticipated using the new data. The information content of the data sets is examined using a series of modern processing and modelling procedures that provide a full assessment of their resolution capabilities. The framework for the study involves two components. The first relates to the definition of the shallow magnetic structure in relation to an ongoing 1:10 k and 1:50 k geological map revision. The second component relates to the performance of the datasets in defining maps of magnetic basement and assisting with larger scale geological and structural interpretation. One of the smaller HiRES survey areas, the island of Anglesey (Ynys Môn), off the coast of NW Wales is used to provide a series of comparative studies. The geological setting here is both complex and debated and cultural interference is prevalent in the low altitude modern survey data. It is demonstrated that successful processing and interpretation can be carried out on data that have not been systematically corrected (decultured) for non-geological perturbations. Across the survey area a wide number of near-surface magnetic features are evident and are dominated by a reversely magnetized Palaeogene dyke swarm that extends offshore. The average depth to the upper surfaces of the dykes is found to be 44 m. The existing baseline data are necessarily limited in resolving features <1 km in scale; however a detailed comparison of the existing and new data reveals the extent to which these quasi-linear features can be resolved and mapped. The precise limitations of the baseline data in terms of detection, location and estimated depth are quantified. The spectral content of both data sets is examined and the longest wavelength information is extracted to estimate the resolution of magnetic basement features in the two data sets. A significant finding is the lack of information in the baseline data set across wavelengths of between 1 and ˜10 km. Here the HiRES data provide a detailed mapping of shallow magnetic basement features (1-3 km) that display a relevance to current understanding of the fault-bounded terranes that cross the survey area. Equally, the compact scale of the modern survey does not provide deeper (>3 km to upper surface) assessments of magnetic basement. This further assessment is successfully provided by the larger scale baseline data which locates and defines a mid-crustal magnetic basement feature, centred beneath the Snowdon Massif, and illustrates that basement of similar characteristic extends beneath much of Anglesey.
Park, Jong-Uk; Erdenebayar, Urtnasan; Joo, Eun-Yeon; Lee, Kyoung-Joung
2017-06-27
This paper proposes a method for classifying sleep-wakefulness and estimating sleep parameters using nasal pressure signals applicable to a continuous positive airway pressure (CPAP) device. In order to classify the sleep-wakefulness states of patients with sleep-disordered breathing (SDB), apnea-hypopnea and snoring events are first detected. Epochs detected as SDB are classified as sleep, and time-domain- and frequency-domain-based features are extracted from the epochs that are detected as normal breathing. Subsequently, sleep-wakefulness is classified using a support vector machine (SVM) classifier in the normal breathing epoch. Finally, four sleep parameters (sleep onset, wake after sleep onset, total sleep time, and sleep efficiency) are estimated based on the classified sleep-wakefulness. In order to develop and test the algorithm, 110 patients diagnosed with SDB participated in this study. Ninety of the subjects underwent full-night polysomnography (PSG) and twenty underwent split-night PSG. The subjects were divided into 50 patients of a training set (full/split: 42/8), 30 of a validation set (full/split: 24/6) and 30 of a test set (full/split: 24/6). In the experiments conducted, sleep-wakefulness classification accuracy was found to be 83.2% in the test set, compared with the PSG scoring results of clinical experts. Furthermore, all four sleep parameters showed higher correlations than the results obtained via PSG (r ⩾ 0.84, p < 0.05). In order to determine whether the proposed method is applicable to CPAP, sleep-wakefulness classification performances were evaluated for each CPAP in the split-night PSG data. The results indicate that the accuracy and sensitivity of sleep-wakefulness classification by CPAP variation shows no statistically significant difference (p < 0.05). The contributions made in this study are applicable to the automatic classification of sleep-wakefulness states in CPAP devices and evaluation of the quality of sleep.
Shrivastava, Vimal K; Londhe, Narendra D; Sonawane, Rajendra S; Suri, Jasjit S
2016-04-01
Psoriasis is an autoimmune skin disease with red and scaly plaques on the skin, affecting about 125 million people worldwide. Currently, dermatologists use visual and haptic methods to diagnose disease severity. This does not help them in stratification and risk assessment of the lesion stage and grade. Further, current methods add complexity during the monitoring and follow-up phase. The current diagnostic tools lead to subjectivity in decision making and are unreliable and laborious. This paper presents a first comparative performance study of its kind using a principal component analysis (PCA) based CADx system for psoriasis risk stratification and image classification utilizing: (i) 11 higher order spectra (HOS) features, (ii) 60 texture features, and (iii) 86 color features, and their seven combinations. In aggregate, 540 image samples (270 healthy and 270 diseased) from 30 psoriasis patients of Indian ethnic origin are used in our database. Machine learning using PCA is used for dominant feature selection, and the selected features are then fed to a support vector machine (SVM) classifier to obtain optimized performance. Three different protocols are implemented using the three kinds of feature sets. A reliability index of the CADx is computed. Among all feature combinations, the CADx system shows optimal performance of 100% accuracy, 100% sensitivity and specificity, when all three sets of features are combined. Further, our experimental results with increasing data size show that all feature combinations yield a high reliability index throughout the PCA cutoffs, except the color feature set and the combination of color and texture feature sets. HOS features are powerful in psoriasis disease classification and stratification. Although all three feature sets (HOS, texture, and color) perform competitively on their own, the machine learning system performs best when they are combined. The system is fully automated, reliable and accurate. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
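A sketch of a PCA-then-SVM CADx pipeline of the kind compared above. The number of retained components (the "PCA cutoff") and the SVM settings are illustrative; the 157 input features mirror the 11 HOS + 60 texture + 86 color features, but the data here are synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# stand-in for 540 image samples with the combined HOS+texture+color features
X, y = make_classification(n_samples=540, n_features=157, n_informative=20,
                           random_state=0)

pipe = make_pipeline(StandardScaler(), PCA(n_components=10), SVC())
print(cross_val_score(pipe, X, y, cv=5).mean())
```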
Assessing the precision of gaze following using a stereoscopic 3D virtual reality setting.
Atabaki, Artin; Marciniak, Karolina; Dicke, Peter W; Thier, Peter
2015-07-01
Despite the ecological importance of gaze following, little is known about the underlying neuronal processes which allow us to extract gaze direction from the geometric features of the eye and head of a conspecific. In order to understand the neuronal mechanisms underlying this ability, a careful description of the capacity and the limitations of gaze following at the behavioral level is needed. Previous studies of gaze following that relied on naturalistic settings have the disadvantage of allowing only very limited control of potentially relevant visual features guiding gaze following, such as the contrast of iris and sclera or the shape of the eyelids; in the case of photographs, they also lack depth. Hence, in order to gain full control of potentially relevant features, we decided to study gaze following of human observers guided by the gaze of a human avatar seen stereoscopically. To this end we established a stereoscopic 3D virtual reality setup in which we tested human subjects' ability to detect which target a human avatar was looking at. Following the gaze of the avatar showed all the features of the gaze following of a natural person, namely a substantial degree of precision associated with a consistent pattern of systematic deviations from the target. Poor stereo vision affected performance surprisingly little (only in certain experimental conditions). Only gaze following guided by targets at larger downward eccentricities exhibited a differential effect of the presence or absence of accompanying movements of the avatar's eyelids and eyebrows. Copyright © 2015 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Nemoto, Mitsutaka; Hayashi, Naoto; Hanaoka, Shouhei; Nomura, Yukihiro; Miki, Soichiro; Yoshikawa, Takeharu; Ohtomo, Kuni
2016-03-01
The purpose of this study is to evaluate the feasibility of a novel feature generation approach, based on multiple deep neural networks (DNNs) with boosting, for computer-assisted detection (CADe). It is hard and time-consuming to optimize the hyperparameters of DNNs such as the stacked denoising autoencoder (SdA). The proposed method allows using SdA-based features without the burden of hyperparameter tuning, and was evaluated in an application for detecting cerebral aneurysms on magnetic resonance angiograms (MRA). A baseline CADe process included four components: scaling, candidate area limitation, candidate detection, and candidate classification. The proposed feature generation method was applied to extract the optimal features for candidate classification and only required setting ranges for the SdA hyperparameters. The optimal feature set was selected from a large quantity of SdA-based features produced by multiple SdAs, each of which was trained using a different hyperparameter set. The feature selection was operated through the AdaBoost ensemble learning method. Training of the baseline CADe process and the proposed feature generation was performed with 200 MRA cases, and the evaluation was performed with 100 MRA cases. The proposed method successfully provided SdA-based features by merely setting ranges for some SdA hyperparameters. The CADe process using both previous voxel features and SdA-based features had the best performance, with an area under the ROC curve of 0.838 and an ANODE score of 0.312. The results show that the proposed method was effective in the application for detecting cerebral aneurysms on MRA.
Working memory for visual features and conjunctions in schizophrenia.
Gold, James M; Wilk, Christopher M; McMahon, Robert P; Buchanan, Robert W; Luck, Steven J
2003-02-01
The visual working memory (WM) storage capacity of patients with schizophrenia was investigated using a change detection paradigm. Participants were presented with 2, 3, 4, or 6 colored bars with testing of both single-feature (color, orientation) and feature-conjunction conditions. Patients performed significantly worse than controls at all set sizes but demonstrated normal feature binding. Unlike controls, patients' WM capacity declined at set size 6 relative to set size 4. Impairments with subcapacity arrays suggest a deficit in task set maintenance; greater impairment for supercapacity set sizes suggests a deficit in the ability to selectively encode information for WM storage. Thus, the WM impairment in schizophrenia appears to be a consequence of attentional deficits rather than a reduction in storage capacity.
Can two dots form a Gestalt? Measuring emergent features with the capacity coefficient.
Hawkins, Robert X D; Houpt, Joseph W; Eidels, Ami; Townsend, James T
2016-09-01
While there is widespread agreement among vision researchers on the importance of some local aspects of visual stimuli, such as hue and intensity, there is no general consensus on a full set of basic sources of information used in perceptual tasks or how they are processed. Gestalt theories place particular value on emergent features, which are based on the higher-order relationships among elements of a stimulus rather than local properties. Thus, arbitrating between different accounts of features is an important step in arbitrating between local and Gestalt theories of perception in general. In this paper, we present the capacity coefficient from Systems Factorial Technology (SFT) as a quantitative approach for formalizing and rigorously testing predictions made by local and Gestalt theories of features. As a simple, easily controlled domain for testing this approach, we focus on the local feature of location and the emergent features of Orientation and Proximity in a pair of dots. We introduce a redundant-target change detection task to compare our capacity measure on (1) trials where the configuration of the dots changed along with their location against (2) trials where the amount of local location change was exactly the same, but there was no change in the configuration. Our results, in conjunction with our modeling tools, favor the Gestalt account of emergent features. We conclude by suggesting several candidate information-processing models that incorporate emergent features, which follow from our approach. Copyright © 2015 Elsevier Ltd. All rights reserved.
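For reference, the capacity coefficient invoked above has a standard closed form in Systems Factorial Technology (a well-known result from the SFT literature, stated here rather than in the abstract itself). For a redundant-target (OR) design with channels A and B:

```latex
C_{\mathrm{OR}}(t) \;=\; \frac{H_{AB}(t)}{H_{A}(t) + H_{B}(t)},
\qquad
H(t) \;=\; \int_{0}^{t} h(s)\,ds \;=\; -\log S(t),
```

where S(t) is the survivor function of the response times and H(t) the integrated hazard. C(t) > 1 indicates super capacity, the signature that would favor coactive processing of an emergent Gestalt feature; C(t) = 1 is consistent with unlimited-capacity independent parallel processing; and C(t) < 1 indicates limited capacity.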
Evidence of tampering in watermark identification
NASA Astrophysics Data System (ADS)
McLauchlan, Lifford; Mehrübeoglu, Mehrübe
2009-08-01
In this work, watermarks are embedded in digital images in the discrete wavelet transform (DWT) domain. Principal component analysis (PCA) is performed on the DWT coefficients. Next, higher-order statistics based on the principal components and the eigenvalues are determined for different sets of images. Feature sets are analyzed for different types of attacks in m-dimensional space. The results demonstrate the separability of the features for the tampered digital copies. Different feature sets are studied to determine more effective tamper-evident feature sets. In digital forensics, the probable manipulation(s) or modification(s) performed on the digital information can be identified using the described technique.
NASA Astrophysics Data System (ADS)
Demro, James C.; Hartshorne, Richard; Woody, Loren M.; Levine, Peter A.; Tower, John R.
1995-06-01
The next generation Wedge Imaging Spectrometer (WIS) instruments currently in integration at Hughes SBRD incorporate advanced features to increase operation flexibility for remotely sensed hyperspectral imagery collection and use. These features include: a) multiple linear wedge filters to tailor the spectral bands to the scene phenomenology; b) simple, replaceable fore-optics to allow different spatial resolutions and coverages; c) data acquisition system (DAS) that collects the full data stream simultaneously from both WIS instruments (VNIR and SWIR/MWIR), stores the data in a RAID storage, and provides for down-loading of the data to MO disks; the WIS DAS also allows selection of the spectral band sets to be stored; d) high-performance VNIR camera subsystem based upon a 512 X 512 CCD area array and associated electronics.
NASA Astrophysics Data System (ADS)
Song, Bowen; Zhang, Guopeng; Wang, Huafeng; Zhu, Wei; Liang, Zhengrong
2013-02-01
Various types of features, e.g., geometric features, texture features, projection features, etc., have been introduced for polyp detection and differentiation tasks via computer-aided detection and diagnosis (CAD) for computed tomography colonography (CTC). Although these features together cover more of the information in the data, some of them are statistically highly related to others, which makes the feature set redundant and burdens the computation task of CAD. In this paper, we propose a new dimension reduction method, which combines hierarchical clustering and principal component analysis (PCA), for the false-positive (FP) reduction task. First, we group all the features based on their similarity using hierarchical clustering, and then PCA is employed within each group. Different numbers of principal components are selected from each group to form the final feature set. A support vector machine is used to perform the classification. The results show that when three principal components are chosen from each group, we can achieve an area under the receiver operating characteristic curve of 0.905, as high as that of the original feature set. Meanwhile, the computation time is reduced by 70% and the feature set size is reduced by 77%. It can be concluded that the proposed method captures the most important information in the feature set and that classification accuracy is not affected by the dimension reduction. The result is promising, and further investigation, such as automatic threshold setting, is worthwhile and in progress.
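A sketch of the described two-stage reduction: cluster correlated features hierarchically, then run PCA within each cluster and keep a few components per group. The correlation-distance metric, linkage method, group count, and component count are assumptions; the paper used three components per group.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.decomposition import PCA

def cluster_pca(X, n_groups=5, n_comp=3):
    # cluster the features (columns) by correlation distance
    corr = np.corrcoef(X.T)
    dist = squareform(1 - np.abs(corr), checks=False)   # condensed distance
    labels = fcluster(linkage(dist, method="average"),
                      t=n_groups, criterion="maxclust")
    parts = []
    for g in np.unique(labels):
        Xg = X[:, labels == g]
        k = min(n_comp, Xg.shape[1])      # small groups keep all their columns
        parts.append(PCA(n_components=k).fit_transform(Xg))
    return np.hstack(parts)

X = np.random.default_rng(0).random((200, 50))   # placeholder feature matrix
X_red = cluster_pca(X)
print(X_red.shape)
```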
Feature selection gait-based gender classification under different circumstances
NASA Astrophysics Data System (ADS)
Sabir, Azhin; Al-Jawad, Naseer; Jassim, Sabah
2014-05-01
This paper proposes a gender classification method based on human gait features and investigates two variations in addition to the normal gait sequence: clothing (wearing coats) and carrying a bag. The feature vectors in the proposed system are constructed after applying the wavelet transform. Three different feature sets are proposed in this method. The first is a spatio-temporal set capturing the distances between different parts of the human body (such as the feet, knees, hands, height, and shoulders) during one gait cycle. The second and third feature sets are constructed from the approximation and non-approximation coefficients of the human body, respectively. To extract these two feature sets, we divided the human body into upper and lower parts based on the golden ratio proportion. In this paper, we adopt a statistical method for constructing the feature vector from the above sets. The dimension of the constructed feature vector is reduced based on the Fisher score as a feature selection method to optimize its discriminating significance. Finally, k-nearest neighbor is applied as the classification method. Experimental results demonstrate that our approach provides a more realistic scenario and relatively better performance compared with existing approaches.
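The Fisher score used above for feature selection has a standard formulation: the ratio of between-class to within-class variance, computed per feature. This sketch uses that textbook form on placeholder data; the paper's exact normalization may differ.

```python
import numpy as np

def fisher_score(X, y):
    """X: (n_samples, n_features), y: class labels. Larger score = more discriminative."""
    scores = np.zeros(X.shape[1])
    mu = X.mean(axis=0)
    for j in range(X.shape[1]):
        num = den = 0.0
        for c in np.unique(y):
            xc = X[y == c, j]
            num += len(xc) * (xc.mean() - mu[j]) ** 2   # between-class scatter
            den += len(xc) * xc.var()                    # within-class scatter
        scores[j] = num / den if den > 0 else 0.0
    return scores

rng = np.random.default_rng(0)
X, y = rng.random((100, 12)), rng.integers(0, 2, 100)
top = np.argsort(fisher_score(X, y))[::-1][:5]   # keep the 5 best-scoring features
```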
NASA Astrophysics Data System (ADS)
Idehara, H.; Carbon, D. F.
2004-12-01
We present two new, publicly available tools to support the examination and interpretation of spectra. SCAMP is a specialized graphical user interface for MATLAB. It allows researchers to rapidly intercompare sets of observational, theoretical, and/or laboratory spectra. Users have extensive control over the colors and placement of individual spectra, and over spectrum normalization from one spectral region to another. Spectra can be interactively assigned to user-defined groups and the groupings recalled at a later time. The user can measure/record positions and intensities of spectral features, interactively spline-fit spectra, and normalize spectra by fitted splines. User-defined wavelengths can be automatically highlighted in SCAMP plots. The user can save/print annotated graphical output suitable for a scientific notebook depicting the work at any point. The ASP is a WWW portal that provides interactive access to two spectrum data sets: a library of synthetic stellar spectra and a library of laboratory PAH spectra. The synthetic stellar spectra in the ASP are appropriate to the giant branch with an assortment of compositions. Each spectrum spans the full range from 2 to 600 microns at a variety of resolutions. The ASP is designed to allow users to quickly identify individual features at any resolution that arise from any of the included isotopic species. The user may also retrieve the depth of formation of individual features at any resolution. PAH spectra accessible through the ASP are drawn from the extensive library of spectra measured by the NASA Ames Astrochemistry Laboratory. The user may interactively choose any subset of PAHs in the data set, combine them with user-defined weights and temperatures, and view/download the resultant spectrum at any user-defined resolution. This work was funded by the NASA Advanced Supercomputing Division, NASA Ames Research Center.
Free-Form Region Description with Second-Order Pooling.
Carreira, João; Caseiro, Rui; Batista, Jorge; Sminchisescu, Cristian
2015-06-01
Semantic segmentation and object detection are nowadays dominated by methods operating on regions obtained as a result of a bottom-up grouping process (segmentation) but use feature extractors developed for recognition on fixed-form (e.g. rectangular) patches, with full images as a special case. This is most likely suboptimal. In this paper we focus on feature extraction and description over free-form regions and study the relationship with their fixed-form counterparts. Our main contributions are novel pooling techniques that capture the second-order statistics of local descriptors inside such free-form regions. We introduce second-order generalizations of average and max-pooling that together with appropriate non-linearities, derived from the mathematical structure of their embedding space, lead to state-of-the-art recognition performance in semantic segmentation experiments without any type of local feature coding. In contrast, we show that codebook-based local feature coding is more important when feature extraction is constrained to operate over regions that include both foreground and large portions of the background, as typical in image classification settings, whereas for high-accuracy localization setups, second-order pooling over free-form regions produces results superior to those of the winning systems in the contemporary semantic segmentation challenges, with models that are much faster in both training and testing.
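A minimal version of the second-order average pooling just described: average the outer products of the local descriptors falling inside a region, then map through the matrix logarithm (a log-Euclidean non-linearity of the kind the paper derives from the embedding space). The descriptors and regularizer are placeholders.

```python
import numpy as np
from scipy.linalg import logm

def second_order_avg_pool(D, eps=1e-6):
    """D: (n_descriptors, d) local descriptors sampled inside a free-form region."""
    G = (D.T @ D) / len(D)            # average of descriptor outer products
    G += eps * np.eye(G.shape[0])     # regularize to keep G positive definite
    return logm(G).real               # log-Euclidean embedding

D = np.random.default_rng(0).random((500, 16))
F = second_order_avg_pool(D)          # symmetric d x d region descriptor
```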
Rough sets and Laplacian score based cost-sensitive feature selection.
Yu, Shenglong; Zhao, Hong
2018-01-01
Cost-sensitive feature selection learning is an important preprocessing step in machine learning and data mining. Recently, most existing cost-sensitive feature selection algorithms are heuristic algorithms, which evaluate the importance of each feature individually and select features one by one. Obviously, these algorithms do not consider the relationship among features. In this paper, we propose a new algorithm for minimal cost feature selection called the rough sets and Laplacian score based cost-sensitive feature selection. The importance of each feature is evaluated by both rough sets and Laplacian score. Compared with heuristic algorithms, the proposed algorithm takes into consideration the relationship among features with locality preservation of Laplacian score. We select a feature subset with maximal feature importance and minimal cost when cost is undertaken in parallel, where the cost is given by three different distributions to simulate different applications. Different from existing cost-sensitive feature selection algorithms, our algorithm simultaneously selects out a predetermined number of "good" features. Extensive experimental results show that the approach is efficient and able to effectively obtain the minimum cost subset. In addition, the results of our method are more promising than the results of other cost-sensitive feature selection algorithms.
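The Laplacian score is the locality-preserving ingredient of the method above; this sketch follows the standard formulation (He et al., 2005) on a kNN graph. The connectivity weighting and neighborhood size are conventional choices, not necessarily the paper's exact settings, and smaller scores are better.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

def laplacian_score(X, k=5):
    W = kneighbors_graph(X, k, mode="connectivity", include_self=False)
    W = 0.5 * (W + W.T).toarray()         # symmetrize the kNN graph
    D = W.sum(axis=1)                     # vertex degrees
    L = np.diag(D) - W                    # graph Laplacian
    scores = np.zeros(X.shape[1])
    for r in range(X.shape[1]):
        f = X[:, r]
        f = f - (f @ D) / D.sum()         # remove the degree-weighted mean
        den = f @ (D * f)
        scores[r] = (f @ (L @ f)) / den if den > 0 else np.inf
    return scores                         # smaller = better locality preservation

X = np.random.default_rng(0).random((150, 10))
print(np.argsort(laplacian_score(X)))     # features ranked best-first
```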
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lo, P; Young, S; Kim, G
2015-06-15
Purpose: Texture features have been investigated as a biomarker of response and malignancy. Because these features reflect local differences in density, they may be influenced by acquisition and reconstruction parameters. The purpose of this study was to investigate the effects of radiation dose level and reconstruction method on features derived from lung lesions. Methods: With IRB approval, 33 lung tumor cases were identified from clinically indicated thoracic CT scans in which the raw projection (sinogram) data were available. Based on a previously-published technique, noise was added to the raw data to simulate reduced-dose versions of each case at 25%, 10% and 3% of the original dose. Original and simulated reduced-dose projection data were reconstructed with conventional and two iterative-reconstruction settings, yielding 12 combinations of dose/recon conditions. One lesion from each case was contoured. At the reference condition (full dose, conventional recon), 17 lesions were randomly selected for repeat contouring (repeatability). For each lesion at each dose/recon condition, 151 texture measures were calculated. A paired-differences approach was employed to compare feature variation from repeat contours at the reference condition to the variation observed in other dose/recon conditions (reproducibility). The ratio of the standard deviation of the reproducibility to that of the repeatability was used as the variation measure for each feature. Results: The mean variation (standard deviation) across dose levels and kernel was significantly different, with a ratio of 2.24 (±5.85) across texture features (p=0.01). The mean variation (standard deviation) across dose levels with conventional recon was also significantly different, at 2.30 (7.11) (p=0.025). The mean variation across reconstruction settings at the original dose showed a trend toward a difference, at 1.35 (2.60) among all features (p=0.09). Conclusion: Texture features varied considerably with variations in dose and reconstruction condition. Care should be taken to standardize these conditions when using texture as a quantitative feature. This effort was supported in part by a grant from the National Cancer Institute's Quantitative Imaging Network (QIN): U01 CA181156; The UCLA Department of Radiology has a Master Research Agreement with Siemens Healthcare; Dr. McNitt-Gray has previously received research support from Siemens Healthcare.
Detecting spam comments on Indonesia’s Instagram posts
NASA Astrophysics Data System (ADS)
Septiandri, Ali Akbar; Wibisono, Okiriza
2017-01-01
In this paper we experimented with several feature sets for detecting spam comments in social media contents authored by Indonesian public figures. We define spam comments as comments which have promotional purposes (e.g. referring other users to products and services) and are thus not related to the content to which the comments are posted. Three sets of features are evaluated for detecting spam: (1) hand-engineered features such as comment length, number of capital letters, and number of emojis, (2) keyword features such as whether the comment contains advertising words or product-related words, and (3) text features, namely bag-of-words, TF-IDF, and fastText embeddings, each combined with latent semantic analysis. With 24,000 manually-annotated comments scraped from Instagram posts authored by more than 100 Indonesian public figures, we compared the performance of these feature sets and their combinations using 3 popular classification algorithms: Naïve Bayes, SVM, and XGBoost. We find that using all three feature sets (with fastText embeddings for the text features) gave the best F1-score of 0.9601 on a holdout dataset. More interestingly, fastText embeddings combined with hand-engineered features (i.e. without keyword features) yield a similar F1-score of 0.9523, and McNemar's test found no significant difference between the two results. This result is important, as keyword features are largely dependent on the dataset and may not be as generalisable as the other feature sets when applied to new data. For future work, we hope to collect a bigger and more diverse dataset of Indonesian spam comments, improve our model's performance and generalisability, and publish a programming package for others to reliably detect spam comments.
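One common way to combine the text and hand-engineered feature sets discussed above is to stack sparse TF-IDF vectors with dense numeric features before classification; this sketch does exactly that with a linear SVM. The toy comments, the two hand features, and the classifier choice are illustrative stand-ins.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

comments = ["cek produk kita yuk", "keren banget fotonya", "promo murah klik link"]
labels = np.array([1, 0, 1])                    # 1 = spam, 0 = not spam (toy labels)

tfidf = TfidfVectorizer().fit_transform(comments)
hand = csr_matrix(np.array([[len(c), c.count("!")] for c in comments],
                           dtype=float))        # e.g., comment length, punctuation count
X = hstack([tfidf, hand])                       # combined feature matrix

clf = LinearSVC().fit(X, labels)
print(clf.predict(X))
```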
Contingent attentional capture across multiple feature dimensions in a temporal search task.
Ito, Motohiro; Kawahara, Jun I
2016-01-01
The present study examined whether attention can be flexibly controlled to monitor two different feature dimensions (shape and color) in a temporal search task. Specifically, we investigated the occurrence of contingent attentional capture (i.e., interference from task-relevant distractors) and the resulting set reconfiguration (i.e., enhancement of a single task-relevant set). If observers can restrict searches to a specific value for each relevant feature dimension independently, the capture and reconfiguration effects should only occur when the single relevant distractor in each dimension appears. Participants identified a target letter surrounded by a non-green square or a non-square green frame. The results revealed contingent attentional capture, as target identification accuracy was lower when the distractor contained a target-defining feature than when it contained a nontarget feature. Set reconfiguration was also obtained, in that accuracy was higher when the current target's feature (e.g., shape) corresponded to the defining feature of the present distractor (shape) than when the current target's feature did not match the distractor's feature (color). This enhancement was not due to perceptual priming. The present study demonstrated that the principles of contingent attentional capture and set reconfiguration held even when multiple target feature dimensions were monitored. Copyright © 2015 Elsevier B.V. All rights reserved.
New Features for Neuron Classification.
Hernández-Pérez, Leonardo A; Delgado-Castillo, Duniel; Martín-Pérez, Rainer; Orozco-Morales, Rubén; Lorenzo-Ginori, Juan V
2018-04-28
This paper addresses the problem of obtaining new neuron features capable of improving the results of neuron classification. Most studies on neuron classification using morphological features have been based on Euclidean geometry. Here, three one-dimensional (1D) time series are instead derived from the three-dimensional (3D) structure of the neuron, and a spatial time series is finally constructed from which the features are calculated. Digitally reconstructed neurons were separated into control and pathological sets, related to three categories of alterations caused by epilepsy, Alzheimer's disease (long and local projections), and ischemia. These neuron sets were then subjected to supervised classification and the results were compared considering three sets of features: morphological features, features obtained from the time series, and a combination of both. The best results were obtained using features from the time series, which outperformed the classification using only morphological features, showing higher correct classification rates, with differences of 5.15%, 3.75%, and 5.33% for epilepsy and Alzheimer's disease (long and local projections), respectively. The morphological features were better for the ischemia set, with a difference of 3.05%. Features like variance, Spearman auto-correlation, partial auto-correlation, mutual information, and local minima and maxima, all related to the time series, exhibited the best performance. We also compared different evaluators, among which ReliefF was the best ranked.
Low-Rank Discriminant Embedding for Multiview Learning.
Li, Jingjing; Wu, Yue; Zhao, Jidong; Lu, Ke
2017-11-01
This paper focuses on the specific problem of multiview learning where samples have the same feature set but different probability distributions, e.g., different viewpoints or different modalities. Since samples lying in different distributions cannot be compared directly, this paper aims to learn a latent subspace shared by multiple views, assuming that the input views are generated from this latent subspace. Previous approaches usually learn the common subspace by either maximizing the empirical likelihood or preserving the geometric structure. However, considering the complementarity between the two objectives, this paper proposes a novel approach, named low-rank discriminant embedding (LRDE), for multiview learning by taking full advantage of both sides. By further considering the duality between data points and features of a multiview scene, i.e., data points can be grouped based on their distribution on features, while features can be grouped based on their distribution on the data points, LRDE not only deploys low-rank constraints on both the sample level and the feature level to dig out the shared factors across different views, but also preserves geometric information in both the ambient sample space and the embedding feature space by designing a novel graph structure under the framework of graph embedding. Finally, LRDE jointly optimizes low-rank representation and graph embedding in a unified framework. Comprehensive experiments in both multiview and pairwise manners demonstrate that LRDE performs much better than previous approaches proposed in the recent literature.
The effect of sequential information on consumers' willingness to pay for credence food attributes.
Botelho, A; Dinis, I; Lourenço-Gomes, L; Moreira, J; Costa Pinto, L; Simões, O
2017-11-01
The use of experimental methods to determine consumers' willingness to pay for "quality" food has been gaining importance in scientific research. In most of the empirical literature on this issue, the experimental design starts with blind tasting, after which information is introduced. It is assumed that this approach allows consumers to elicit the real value that they attach to each of the features added through specific information. In this paper, the starting hypothesis is that this technique overestimates the weight of the features introduced by information in consumers' willingness to pay, when compared to a real market situation in which consumers are confronted with all the information at once. Data obtained through contingent valuation in an in-store setting were used to estimate a hedonic model aimed at assessing consumers' willingness to pay (WTP) for the feature "geographical origin of the variety" of pears and apples in different information scenarios: i) blind tasting followed by extrinsic information and ii) full information provided at once. The results show that, in fact, features are valued more highly when gradually added to background information than when consumers receive all the information from the beginning. Copyright © 2017 Elsevier Ltd. All rights reserved.
Compartmentalization of the Coso East Flank geothermal field imaged by 3-D full-tensor MT inversion
NASA Astrophysics Data System (ADS)
Lindsey, Nathaniel J.; Kaven, Joern Ole; Davatzes, Nicholas; Newman, Gregory A.
2017-02-01
Previous magnetotelluric (MT) studies of the high-temperature Coso geothermal system in California identified a subvertical feature of low resistivity (2-5 Ohm m) and appreciable lateral extent (>1 km) in the producing zone of the East Flank field. However, these models could not reproduce gross 3-D effects in the recorded data. We perform 3-D full-tensor inversion and retrieve a resistivity model that outperforms previous 2-D and 3-D off-diagonal models in terms of its fit to the complete 3-D MT data set as well as the degree of modelling bias. Inclusion of secondary Zxx and Zyy data components leads to a robust east-dip (60°) to the previously identified conductive East Flank reservoir feature, which correlates strongly with recently mapped surface faults, downhole well temperatures, 3-D seismic reflection data, and local microseismicity. We perform synthetic forward modelling to test the best-fit dip of this conductor using the response at a nearby MT station. We interpret the dipping conductor as a fractured and fluidized compartment, which is structurally controlled by an unmapped blind East Flank fault zone.
Precedence of the eye region in neural processing of faces
Issa, Elias; DiCarlo, James
2012-01-01
Functional magnetic resonance imaging (fMRI) has revealed multiple subregions in monkey inferior temporal cortex (IT) that are selective for images of faces over other objects. The earliest of these subregions, the posterior lateral face patch (PL), has not been studied previously at the neurophysiological level. Perhaps not surprisingly, we found that PL contains a high concentration of ‘face selective’ cells when tested with standard image sets comparable to those used previously to define the region at the level of fMRI. However, we here report that several different image sets and analytical approaches converge to show that nearly all face selective PL cells are driven by the presence of a single eye in the context of a face outline. Most strikingly, images containing only an eye, even when incorrectly positioned in an outline, drove neurons nearly as well as full face images, and face images lacking only this feature led to longer latency responses. Thus, bottom-up face processing is relatively local and linearly integrates features -- consistent with parts-based models -- grounding investigation of how the presence of a face is first inferred in the IT face processing hierarchy. PMID:23175821
probeBase—an online resource for rRNA-targeted oligonucleotide probes and primers: new features 2016
Greuter, Daniel; Loy, Alexander; Horn, Matthias; Rattei, Thomas
2016-01-01
probeBase http://www.probebase.net is a manually maintained and curated database of rRNA-targeted oligonucleotide probes and primers. Contextual information and multiple options for evaluating in silico hybridization performance against the most recent rRNA sequence databases are provided for each oligonucleotide entry, which makes probeBase an important and frequently used resource for microbiology research and diagnostics. Here we present a major update of probeBase, which was last featured in the NAR Database Issue 2007. This update describes a complete remodeling of the database architecture and environment to accommodate computationally efficient access. Improved search functions, sequence match tools and data output now extend the opportunities for finding suitable hierarchical probe sets that target an organism or taxon at different taxonomic levels. To facilitate the identification of complementary probe sets for organisms represented by short rRNA sequence reads generated by amplicon sequencing or metagenomic analysis with next generation sequencing technologies such as Illumina and IonTorrent, we introduce a novel tool that recovers surrogate near full-length rRNA sequences for short query sequences and finds matching oligonucleotides in probeBase. PMID:26586809
NASA Astrophysics Data System (ADS)
Fisher, Christopher M.; Paton, Chad; Pearson, D. Graham; Sarkar, Chiranjeeb; Luo, Yan; Tersmette, Daniel B.; Chacko, Thomas
2017-12-01
A robust platform to view and integrate multiple data sets collected simultaneously is required to realize the utility and potential of the Laser Ablation Split-Stream (LASS) method. This capability, until now, has been unavailable, and practitioners have had to laboriously process each data set separately, making it challenging to take full advantage of the benefits of LASS. We describe a new program for handling multiple mass spectrometric data sets collected simultaneously, designed specifically for the LASS technique, in which a laser aerosol is split into two or more separate "streams" to be measured on separate mass spectrometers. New features within Iolite (https://iolite-software.com) enable loading, synchronizing, viewing, and reducing two or more data sets acquired simultaneously, as multiple DRSs (data reduction schemes) can be run concurrently. While this version of Iolite accommodates any combination of simultaneously collected mass spectrometer data, we demonstrate its utility using case studies in which the U-Pb and Lu-Hf isotope compositions of zircon, and the U-Pb and Sm-Nd isotope compositions of monazite, were analyzed simultaneously in crystals showing complex isotopic zonation. These studies demonstrate the importance of being able to view and integrate simultaneously acquired data sets, especially for samples with complicated zoning and decoupled isotope systematics, in order to extract accurate and geologically meaningful isotopic and compositional data. This contribution provides instructions and examples for handling simultaneously collected laser ablation data. An instructional video is also provided. The updated Iolite software will help to fully develop the applications of both LASS and multi-instrument mass spectrometric measurement capabilities.
Efficient feature selection using a hybrid algorithm for the task of epileptic seizure detection
NASA Astrophysics Data System (ADS)
Lai, Kee Huong; Zainuddin, Zarita; Ong, Pauline
2014-07-01
Feature selection is a very important aspect of machine learning. It entails the search for an optimal subset of a very large data set with a high-dimensional feature space. Apart from eliminating redundant features and reducing computational cost, a good selection of features also leads to higher prediction and classification accuracy. In this paper, an efficient feature selection technique is introduced for the task of epileptic seizure detection. The raw data are electroencephalography (EEG) signals. Using the discrete wavelet transform, the biomedical signals were decomposed into several sets of wavelet coefficients. To reduce the dimension of these wavelet coefficients, a feature selection method that combines the strengths of both filter and wrapper methods is proposed. Principal component analysis (PCA) is used as part of the filter method. As for the wrapper method, the evolutionary harmony search (HS) algorithm is employed. This metaheuristic method aims at finding the best discriminating set of features from the original data. The obtained features were then used as input for an automated classifier, namely wavelet neural networks (WNNs). The WNNs model was trained to perform a binary classification task, that is, to determine whether a given EEG signal was normal or epileptic. For comparison purposes, different sets of features were also used as input. Simulation results showed that the WNNs that used the features chosen by the hybrid algorithm achieved the highest overall classification accuracy.
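As a rough illustration of the decomposition and filter stages described above, the sketch below summarizes DWT coefficient bands and reduces them with PCA. The wavelet choice, band statistics, and synthetic EEG segments are assumptions, and the harmony-search wrapper and wavelet neural network are omitted.

```python
import numpy as np
import pywt
from sklearn.decomposition import PCA

def wavelet_features(signal, wavelet="db4", level=4):
    """Decompose one EEG segment and summarize each coefficient band."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    feats = []
    for c in coeffs:  # one approximation band + `level` detail bands
        feats += [np.mean(np.abs(c)), np.std(c), np.max(np.abs(c))]
    return np.array(feats)

# Hypothetical stand-in for a set of EEG segments (rows = recordings).
rng = np.random.default_rng(1)
X = np.vstack([wavelet_features(s) for s in rng.standard_normal((50, 4096))])

# Filter stage: PCA reduces the wavelet feature space before the wrapper search.
X_reduced = PCA(n_components=5).fit_transform(X)
print(X_reduced.shape)  # (50, 5)
```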
NASA Astrophysics Data System (ADS)
Shi, Bibo; Hou, Rui; Mazurowski, Maciej A.; Grimm, Lars J.; Ren, Yinhao; Marks, Jeffrey R.; King, Lorraine M.; Maley, Carlo C.; Hwang, E. Shelley; Lo, Joseph Y.
2018-02-01
Purpose: To determine whether domain transfer learning can improve the performance of deep features extracted from digital mammograms using a pre-trained deep convolutional neural network (CNN) in the prediction of occult invasive disease for patients with ductal carcinoma in situ (DCIS) on core needle biopsy. Method: In this study, we collected digital mammography magnification views for 140 patients with DCIS at biopsy, 35 of whom were subsequently upstaged to invasive cancer. We utilized a deep CNN model that was pre-trained on two natural image data sets (ImageNet and DTD) and one mammographic data set (INbreast) as the feature extractor, hypothesizing that these data sets are increasingly similar to our target task and will lead to better representations of deep features to describe DCIS lesions. Through a statistical pooling strategy, three sets of deep features were extracted using the CNNs at different levels of convolutional layers from the lesion areas. A logistic regression classifier was then trained to predict which tumors contain occult invasive disease. The generalization performance was assessed and compared using repeated random sub-sampling validation and receiver operating characteristic (ROC) curve analysis. Result: The best performance of deep features was from the CNN model pre-trained on INbreast, and the proposed classifier using this set of deep features was able to achieve a median classification performance of ROC-AUC equal to 0.75, which is significantly better (p <= 0.05) than the performance of deep features extracted using the ImageNet data set (ROC-AUC = 0.68). Conclusion: Transfer learning is helpful for learning a better representation of deep features, and improves the prediction of occult invasive disease in DCIS.
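A hedged sketch of the downstream steps: statistical pooling of convolutional activations into a fixed-length feature vector, then logistic regression with ROC-AUC scoring. The activation shapes and random arrays are placeholders for features from a pre-trained network, not the authors' pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def pool_activations(act):
    """Statistical pooling: collapse an (H, W, C) activation map to per-channel stats."""
    return np.concatenate([act.mean(axis=(0, 1)),
                           act.std(axis=(0, 1)),
                           act.max(axis=(0, 1))])

# Placeholder activations for 140 lesions (35 upstaged), e.g., from one conv layer.
rng = np.random.default_rng(2)
acts = rng.standard_normal((140, 7, 7, 64))
y = np.array([1] * 35 + [0] * 105)

X = np.vstack([pool_activations(a) for a in acts])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("ROC-AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```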
Real-Time Feature Tracking Using Homography
NASA Technical Reports Server (NTRS)
Clouse, Daniel S.; Cheng, Yang; Ansar, Adnan I.; Trotz, David C.; Padgett, Curtis W.
2010-01-01
This software finds feature point correspondences in sequences of images. It is designed for feature matching in aerial imagery. Feature matching is a fundamental step in a number of important image processing operations: calibrating the cameras in a camera array, stabilizing images in aerial movies, geo-registration of images, and generating high-fidelity surface maps from aerial movies. The method uses a Shi-Tomasi corner detector and normalized cross-correlation. This process is likely to produce some mismatches. The feature set is cleaned up using the assumption that there is a large planar patch visible in both images; at high altitude, this assumption is often reasonable. A mathematical transformation, called a homography, is developed that allows the position in image 2 of any point on the plane in image 1 to be predicted. Any feature pair that is inconsistent with the homography is thrown out. The output of the process is a set of feature pairs and the homography. The algorithms in this innovation are well known, but the new implementation improves the process in several ways. It runs in real time at 2 Hz on 64-megapixel imagery. The new Shi-Tomasi corner detector tries to produce the requested number of features by automatically adjusting the minimum distance between found features. The homography-finding code now uses an implementation of the RANSAC algorithm that adjusts the number of iterations automatically to achieve a pre-set probability of missing a set of inliers. The new interface allows the caller to pass in a set of predetermined points in one of the images, making it possible to track the same set of points through multiple frames.
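A compact approximation of this pipeline is possible with OpenCV: Shi-Tomasi corners, a matcher, and RANSAC homography fitting. The abstract describes normalized cross-correlation matching; the pyramidal Lucas-Kanade tracker below is a stand-in for that step, and all parameter values are illustrative.

```python
import cv2
import numpy as np

def planar_feature_pairs(img1, img2, n_features=500):
    """Shi-Tomasi corners matched between two grayscale (8-bit) frames,
    then cleaned up with a RANSAC-fitted homography."""
    p1 = cv2.goodFeaturesToTrack(img1, maxCorners=n_features,
                                 qualityLevel=0.01, minDistance=10)
    # Stand-in for the NCC matcher described above: pyramidal Lucas-Kanade.
    p2, status, _err = cv2.calcOpticalFlowPyrLK(img1, img2, p1, None)
    good = status.ravel() == 1
    p1, p2 = p1[good], p2[good]
    # Reject pairs inconsistent with a single planar motion model.
    H, inliers = cv2.findHomography(p1, p2, cv2.RANSAC, 3.0)
    keep = inliers.ravel() == 1
    return p1[keep], p2[keep], H
```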
ERIC Educational Resources Information Center
Eimer, Martin; Kiss, Monika; Nicholas, Susan
2011-01-01
When target-defining features are specified in advance, attentional target selection in visual search is controlled by preparatory top-down task sets. We used ERP measures to study voluntary target selection in the absence of such feature-specific task sets, and to compare it to selection that is guided by advance knowledge about target features.…
Classification of large-scale fundus image data sets: a cloud-computing framework.
Roychowdhury, Sohini
2016-08-01
Large medical image data sets with high dimensionality require a substantial amount of computation time for data creation and data processing. This paper presents a novel generalized method that finds optimal image-based feature sets that reduce computational time complexity while maximizing overall classification accuracy for detection of diabetic retinopathy (DR). First, region-based and pixel-based features are extracted from fundus images for classification of DR lesions and vessel-like structures. Next, feature ranking strategies are used to distinguish the optimal classification feature sets. DR lesion and vessel classification accuracies are computed using the boosted decision tree and decision forest classifiers in the Microsoft Azure Machine Learning Studio platform, respectively. For images from the DIARETDB1 data set, 40 of its highest-ranked features are used to classify four DR lesion types with an average classification accuracy of 90.1% in 792 seconds. Also, for classification of red lesion regions and hemorrhages from microaneurysms, accuracies of 85% and 72% are observed, respectively. For images from the STARE data set, 40 high-ranked features can classify minor blood vessels with an accuracy of 83.5% in 326 seconds. Such cloud-based fundus image analysis systems can significantly enhance the borderline classification performance of automated screening systems.
The effect of feature selection methods on computer-aided detection of masses in mammograms
NASA Astrophysics Data System (ADS)
Hupse, Rianne; Karssemeijer, Nico
2010-05-01
In computer-aided diagnosis (CAD) research, feature selection methods are often used to improve generalization performance of classifiers and shorten computation times. In an application that detects malignant masses in mammograms, we investigated the effect of using a selection criterion that is similar to the final performance measure we are optimizing, namely the mean sensitivity of the system in a predefined range of the free-response receiver operating characteristics (FROC). To obtain the generalization performance of the selected feature subsets, a cross validation procedure was performed on a dataset containing 351 abnormal and 7879 normal regions, each region providing a set of 71 mass features. The same number of noise features, not containing any information, were added to investigate the ability of the feature selection algorithms to distinguish between useful and non-useful features. It was found that significantly higher performances were obtained using feature sets selected by the general test statistic Wilks' lambda than using feature sets selected by the more specific FROC measure. Feature selection leads to better performance when compared to a system in which all features were used.
EEG-based recognition of video-induced emotions: selecting subject-independent feature set.
Kortelainen, Jukka; Seppänen, Tapio
2013-01-01
Emotions are fundamental to everyday life, affecting our communication, learning, perception, and decision making. Incorporating emotions into human-computer interaction (HCI) could be seen as a significant step forward, offering great potential for developing advanced future technologies. Because the electrical activity of the brain is affected by emotions, the electroencephalogram (EEG) offers an interesting channel for improving HCI. In this paper, the selection of a subject-independent feature set for EEG-based emotion recognition is studied. We investigate the effect of different feature sets in classifying a person's arousal and valence while watching videos with emotional content. The classification performance is optimized by applying a sequential forward floating search algorithm for feature selection. The best classification rate (65.1% for arousal and 63.0% for valence) is obtained with a feature set containing power spectral features from the frequency band of 1-32 Hz. The proposed approach substantially improves the classification rate reported in the literature. In future work, further analysis of the video-induced EEG changes, including topographical differences in the spectral features, is needed.
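A plain greedy forward selection conveys the flavor of the search; the floating variant used here additionally tries removing previously chosen features after each addition. Everything below, including the synthetic spectral-feature matrix, is illustrative rather than the authors' setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def forward_select(X, y, max_features=10):
    """Greedy forward selection scored by cross-validated accuracy."""
    chosen, remaining, best_score = [], list(range(X.shape[1])), -np.inf
    while remaining and len(chosen) < max_features:
        score, j = max((np.mean(cross_val_score(LogisticRegression(max_iter=1000),
                                                X[:, chosen + [k]], y, cv=5)), k)
                       for k in remaining)
        if score <= best_score:
            break  # no candidate improves the subset
        best_score = score
        chosen.append(j)
        remaining.remove(j)
    return chosen, best_score

# Hypothetical trials x band-power features (e.g., 40 spectral features).
rng = np.random.default_rng(3)
X = rng.standard_normal((60, 40))
y = (X[:, 0] + 0.5 * X[:, 3] + 0.3 * rng.standard_normal(60) > 0).astype(int)
print(forward_select(X, y))
```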
Development of automatic body condition scoring using a low-cost 3-dimensional Kinect camera.
Spoliansky, Roii; Edan, Yael; Parmet, Yisrael; Halachmi, Ilan
2016-09-01
Body condition scoring (BCS) is a farm-management tool for estimating dairy cows' energy reserves. Today, BCS is performed manually by experts. This paper presents a 3-dimensional algorithm that provides a topographical understanding of the cow's body to estimate BCS. An automatic BCS system consisting of a Kinect camera (Microsoft Corp., Redmond, WA) triggered by a passive infrared motion detector was designed and implemented. Image processing and regression algorithms were developed and included the following steps: (1) image restoration, the removal of noise; (2) object recognition and separation, identification and separation of the cows; (3) movie and image selection, selection of movies and frames that include the relevant data; (4) image rotation, alignment of the cow parallel to the x-axis; and (5) image cropping and normalization, removal of irrelevant data, setting the image size to 150×200 pixels, and normalizing image values. All steps were performed automatically, including image selection and classification. Fourteen individual features per cow, derived from the cows' topography, were automatically extracted from the movies and from the farm's herd-management records. These features appear to be measurable in a commercial farm. Manual BCS was performed by a trained expert and compared with the output of the training set. A regression model was developed, correlating the features with the manual BCS references. Data were acquired for 4 d, resulting in a database of 422 movies of 101 cows. Movies containing cows' back ends were automatically selected (389 movies). The data were divided into a training set of 81 cows and a test set of 20 cows; both sets included the identical full range of BCS classes. Accuracy tests gave a mean absolute error of 0.26, median absolute error of 0.19, and coefficient of determination of 0.75, with 100% correct classification within 1 step and 91% correct classification within a half step for BCS classes. Results indicated good repeatability, with all standard deviations under 0.33. The algorithm is independent of the background and requires 10 cows for training with approximately 30 movies of 4 s each. Copyright © 2016 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Friberg, Anders; Schoonderwaldt, Erwin; Hedblad, Anton; Fabiani, Marco; Elowsson, Anders
2014-10-01
The notion of perceptual features is introduced for describing general music properties based on human perception. This is an attempt at rethinking the concept of features, aiming to approach the underlying human perception mechanisms. Instead of using concepts from music theory such as tones, pitches, and chords, a set of nine features describing overall properties of the music was selected. They were chosen from qualitative measures used in psychology studies and motivated from an ecological approach. The perceptual features were rated in two listening experiments using two different data sets. They were modeled both from symbolic and audio data using different sets of computational features. Ratings of emotional expression were predicted using the perceptual features. The results indicate that (1) at least some of the perceptual features are reliable estimates; (2) emotion ratings could be predicted by a small combination of perceptual features with an explained variance from 75% to 93% for the emotional dimensions activity and valence; (3) the perceptual features could only to a limited extent be modeled using existing audio features. Results clearly indicated that a small number of dedicated features were superior to a "brute force" model using a large number of general audio features.
Systems and Methods for Correcting Optical Reflectance Measurements
NASA Technical Reports Server (NTRS)
Yang, Ye (Inventor); Shear, Michael A. (Inventor); Soller, Babs R. (Inventor); Soyemi, Olusola O. (Inventor)
2014-01-01
We disclose measurement systems and methods for measuring analytes in target regions of samples that also include features overlying the target regions. The systems include: (a) a light source; (b) a detection system; (c) a set of at least first, second, and third light ports which transmit light from the light source to a sample and receive and direct light reflected from the sample to the detection system, generating a first set of data including information corresponding to both an internal target within the sample and features overlying the internal target, and a second set of data including information corresponding to features overlying the internal target; and (d) a processor configured to remove information characteristic of the overlying features from the first set of data using the first and second sets of data to produce corrected information representing the internal target.
Systems and methods for correcting optical reflectance measurements
NASA Technical Reports Server (NTRS)
Yang, Ye (Inventor); Soller, Babs R. (Inventor); Soyemi, Olusola O. (Inventor); Shear, Michael A. (Inventor)
2009-01-01
We disclose measurement systems and methods for measuring analytes in target regions of samples that also include features overlying the target regions. The systems include: (a) a light source; (b) a detection system; (c) a set of at least first, second, and third light ports which transmit light from the light source to a sample and receive and direct light reflected from the sample to the detection system, generating a first set of data including information corresponding to both an internal target within the sample and features overlying the internal target, and a second set of data including information corresponding to features overlying the internal target; and (d) a processor configured to remove information characteristic of the overlying features from the first set of data using the first and second sets of data to produce corrected information representing the internal target.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ogden, K; O’Dwyer, R; Bradford, T
Purpose: To reduce differences in features calculated from MRI brain scans acquired at different field strengths, with or without gadolinium contrast. Methods: Brain scans were processed for 111 epilepsy patients to extract hippocampus and thalamus features. Scans were acquired on 1.5 T scanners with gadolinium contrast (Group A), 1.5 T scanners without Gd (Group B), and 3.0 T scanners without Gd (Group C). A total of 72 features were extracted. Features were extracted from the original scans and from scans whose image pixel values were rescaled to the mean of the hippocampi and thalami values. For each data set, cluster analysis was performed on the raw feature set and on feature sets with normalization (conversion to Z scores). Two methods of normalization were used: the first normalized over all values of a given feature, and the second normalized within the patient group membership. The clustering software was configured to produce 3 clusters. Group fractions in each cluster were calculated. Results: For features calculated from both the non-rescaled and rescaled data, cluster membership was identical for the non-normalized and normalized data sets. Cluster 1 was comprised entirely of Group A data, Cluster 2 contained data from all three groups, and Cluster 3 contained data from only Groups A and B. For the categorically normalized data sets there was a more uniform distribution of group data across the three clusters. A less pronounced effect was seen in the rescaled image data features. Conclusion: Image rescaling and feature renormalization can have a significant effect on the results of clustering analysis. These effects are also likely to influence the results of supervised machine learning algorithms. It may be possible to partly remove the influence of scanner field strength and the presence of gadolinium-based contrast in feature extraction for radiomics applications.
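The normalization at the heart of this comparison amounts to z-scoring features before clustering. A toy sketch of the effect (the data and the deliberate scale imbalance are fabricated for illustration, not taken from the study):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(4)
# Hypothetical feature table: 111 patients x 72 features, with one block of
# features on a much larger scale (mimicking a field-strength/contrast effect).
X = rng.standard_normal((111, 72))
X[:, :10] *= 50.0

labels_raw = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
labels_z = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(
    StandardScaler().fit_transform(X))

# The large-scale block dominates the raw clustering; z-scoring rebalances it.
print(np.bincount(labels_raw), np.bincount(labels_z))
```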
A framework for feature extraction from hospital medical data with applications in risk prediction.
Tran, Truyen; Luo, Wei; Phung, Dinh; Gupta, Sunil; Rana, Santu; Kennedy, Richard Lee; Larkins, Ann; Venkatesh, Svetha
2014-12-30
Feature engineering is a time-consuming component of predictive modeling. We propose a versatile platform to automatically extract features for risk prediction, based on a pre-defined and extensible entity schema. The extraction is independent of disease type or risk prediction task. We contrast auto-extracted features with baselines generated from the Elixhauser comorbidities. Hospital medical records were transformed into event sequences, to which filters were applied to extract feature sets capturing diversity in temporal scales and data types. The features were evaluated on a readmission prediction task against baseline feature sets generated from the Elixhauser comorbidities. The prediction model was logistic regression with elastic net regularization. Prediction horizons of 1, 2, 3, 6, and 12 months were considered for four diverse diseases: diabetes, COPD, mental disorders, and pneumonia, with derivation and validation cohorts defined on non-overlapping data-collection periods. For unplanned readmissions, the auto-extracted feature set, using socio-demographic information and medical records, outperformed baselines derived from socio-demographic information and the Elixhauser comorbidities over 20 settings (5 prediction horizons over 4 diseases). In particular, for 30-day prediction, the AUCs were: COPD, baseline 0.60 (95% CI: 0.57, 0.63) vs. auto-extracted 0.67 (0.64, 0.70); diabetes, baseline 0.60 (0.58, 0.63) vs. auto-extracted 0.67 (0.64, 0.69); mental disorders, baseline 0.57 (0.54, 0.60) vs. auto-extracted 0.69 (0.64, 0.70); pneumonia, baseline 0.61 (0.59, 0.63) vs. auto-extracted 0.70 (0.67, 0.72). The advantages of auto-extracting standard features from complex medical records, in a disease- and task-agnostic manner, were demonstrated. Auto-extracted features have good predictive power over multiple time horizons. Such feature sets have the potential to form the foundation of complex automated analytic tasks.
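The classifier described is standard elastic net-regularized logistic regression; a minimal scikit-learn sketch, with random placeholder arrays standing in for the auto-extracted hospital features:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(5)
# Placeholder derivation/validation matrices for a 30-day readmission task.
X_train, y_train = rng.standard_normal((800, 200)), rng.integers(0, 2, 800)
X_valid, y_valid = rng.standard_normal((200, 200)), rng.integers(0, 2, 200)

# l1_ratio mixes the lasso (1.0) and ridge (0.0) penalties.
clf = LogisticRegression(penalty="elasticnet", solver="saga",
                         l1_ratio=0.5, C=1.0, max_iter=5000)
clf.fit(X_train, y_train)
print("AUC:", roc_auc_score(y_valid, clf.predict_proba(X_valid)[:, 1]))
```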
A Reduced Set of Features for Chronic Kidney Disease Prediction
Misir, Rajesh; Mitra, Malay; Samanta, Ranjit Kumar
2017-01-01
Chronic kidney disease (CKD) is one of the life-threatening diseases. Early detection and proper management are essential for improving survivability. As per the UCI data set, there are 24 attributes for predicting CKD or non-CKD. At least 16 of these attributes require pathological investigations, involving more resources, money, time, and uncertainty. The objective of this work is to explore whether we can predict CKD or non-CKD with reasonable accuracy using fewer features. An intelligent-system development approach has been used in this study. We applied one important feature selection technique to discover a reduced feature set that explains the data set much better. Two intelligent binary classification techniques have been adopted to validate the reduced feature set. Performance was evaluated in terms of four important classification evaluation parameters. As our results suggest, we may concentrate on those reduced features for identifying CKD, thereby reducing uncertainty, saving time, and reducing costs. PMID:28706750
Ensemble methods with simple features for document zone classification
NASA Astrophysics Data System (ADS)
Obafemi-Ajayi, Tayo; Agam, Gady; Xie, Bingqing
2012-01-01
Document layout analysis is of fundamental importance for document image understanding and information retrieval. It requires the identification of blocks extracted from a document image via feature extraction and block classification. In this paper, we focus on the classification of the extracted blocks into five classes: text (machine printed), handwriting, graphics, images, and noise. We propose a new set of features for efficient classification of these blocks. We present a comparative evaluation of three ensemble-based classification algorithms (boosting, bagging, and combined model trees) in addition to other known learning algorithms. Experimental results are demonstrated for a set of 36503 zones extracted from 416 document images which were randomly selected from the tobacco legacy document collection. The results obtained verify the robustness and effectiveness of the proposed set of features in comparison to the commonly used Ocropus recognition features. When used in conjunction with the Ocropus feature set, we further improve the performance of the block classification system to obtain a classification accuracy of 99.21%.
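For a concrete starting point, here is a sketch comparing two of the three ensemble methods with scikit-learn; the synthetic five-class data stands in for the block-feature table, and this is not the authors' implementation:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Stand-in for the zone-feature table (5 block classes).
X, y = make_classification(n_samples=2000, n_features=30, n_informative=15,
                           n_classes=5, random_state=0)

for name, clf in [("boosting", AdaBoostClassifier(n_estimators=100)),
                  ("bagging", BaggingClassifier(DecisionTreeClassifier(),
                                                n_estimators=100))]:
    print(name, cross_val_score(clf, X, y, cv=5).mean())
```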
Arruti, Andoni; Cearreta, Idoia; Álvarez, Aitor; Lazkano, Elena; Sierra, Basilio
2014-01-01
Study of emotions in human–computer interaction is a growing research area. This paper shows an attempt to select the most significant features for emotion recognition in spoken Basque and Spanish languages using different methods for feature selection. The RekEmozio database was used as the experimental data set. Several Machine Learning paradigms were used for the emotion classification task. Experiments were executed in three phases, using different sets of features as classification variables in each phase. Moreover, feature subset selection was applied at each phase in order to seek the most relevant feature subset. The three-phase approach was selected to check the validity of the proposed approach. The achieved results show that an instance-based learning algorithm using feature subset selection techniques based on evolutionary algorithms is the best Machine Learning paradigm for automatic emotion recognition, with all the different feature sets, obtaining a mean emotion recognition rate of 80.05% in Basque and 74.82% in Spanish. In order to check the goodness of the proposed process, a greedy search approach (FSS-Forward) has been applied, and a comparison between them is provided. Based on the achieved results, a set of the most relevant non-speaker-dependent features is proposed for both languages and new perspectives are suggested. PMID:25279686
Snoring classified: The Munich-Passau Snore Sound Corpus.
Janott, Christoph; Schmitt, Maximilian; Zhang, Yue; Qian, Kun; Pandit, Vedhas; Zhang, Zixing; Heiser, Clemens; Hohenhorst, Winfried; Herzog, Michael; Hemmert, Werner; Schuller, Björn
2018-03-01
Snoring can be excited in different locations within the upper airways during sleep. It was hypothesised that the excitation locations are correlated with distinct acoustic characteristics of the snoring noise. To verify this hypothesis, a database of snore sounds was developed, labelled with the location of sound excitation. Video and audio recordings taken during drug induced sleep endoscopy (DISE) examinations from three medical centres have been semi-automatically screened for snore events, which subsequently have been classified by ENT experts into four classes based on the VOTE classification. The resulting dataset containing 828 snore events from 219 subjects has been split into Train, Development, and Test sets. An SVM classifier has been trained using low level descriptors (LLDs) related to energy, spectral features, mel frequency cepstral coefficients (MFCC), formants, voicing, harmonic-to-noise ratio (HNR), spectral harmonicity, pitch, and microprosodic features. An unweighted average recall (UAR) of 55.8% could be achieved using the full set of LLDs including formants. The best-performing subset was the MFCC-related set of LLDs. A strong difference in performance could be observed between the permutations of the train, development, and test partitions, which may be caused by the relatively low number of subjects included in the smaller classes of the strongly unbalanced data set. A database of snoring sounds is presented in which events are classified according to their sound excitation location based on objective criteria and verifiable video material. With the database, it could be demonstrated that machine classifiers can distinguish different excitation locations of snoring sounds in the upper airway based on acoustic parameters. Copyright © 2018 Elsevier Ltd. All rights reserved.
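A rough sketch of an MFCC-plus-SVM pipeline with librosa and scikit-learn; the synthetic "snore events", the train/test split, and the SVM settings are all assumptions rather than the paper's configuration:

```python
import numpy as np
import librosa
from sklearn.metrics import recall_score
from sklearn.svm import SVC

def mfcc_descriptor(y, sr=16000):
    """Collapse one event's MFCC track into a fixed-length descriptor."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Toy stand-ins for labelled snore events (4 VOTE classes, 1 s at 16 kHz).
rng = np.random.default_rng(6)
events = [rng.standard_normal(16000).astype(np.float32) for _ in range(40)]
labels = rng.integers(0, 4, 40)

X = np.vstack([mfcc_descriptor(e) for e in events])
clf = SVC(kernel="linear").fit(X[:30], labels[:30])
# Unweighted average recall (UAR) is macro-averaged recall.
print("UAR:", recall_score(labels[30:], clf.predict(X[30:]), average="macro"))
```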
The interaction of feature and space based orienting within the attention set.
Lim, Ahnate; Sinnett, Scott
2014-01-01
The processing of sensory information relies on interacting mechanisms of sustained attention and attentional capture, both of which operate in space and on object features. While evidence indicates that exogenous attentional capture, a mechanism previously understood to be automatic, can be eliminated while concurrently performing a demanding task, we reframe this phenomenon within the theoretical framework of the "attention set" (Most et al., 2005). Consequently, the specific prediction that cuing effects should reappear when feature dimensions of the cue overlap with those in the attention set (i.e., elements of the demanding task) was empirically tested and confirmed using a dual-task paradigm involving both sustained attention and attentional capture, adapted from Santangelo et al. (2007). Participants were required to either detect a centrally presented target presented in a stream of distractors (the primary task), or respond to a spatially cued target (the secondary task). Importantly, the spatial cue could either share features with the target in the centrally presented primary task, or not share any features. Overall, the findings supported the attention set hypothesis showing that a spatial cuing effect was only observed when the peripheral cue shared a feature with objects that were already in the attention set (i.e., the primary task). However, this finding was accompanied by differential attentional orienting dependent on the different types of objects within the attention set, with feature-based orienting occurring for target-related objects, and additional spatial-based orienting for distractor-related objects.
Contributions of individual face features to face discrimination.
Logan, Andrew J; Gordon, Gael E; Loffler, Gunter
2017-08-01
Faces are highly complex stimuli that contain a host of information. Such complexity poses the following questions: (a) do observers exhibit preferences for specific information? (b) how does sensitivity to individual face parts compare? These questions were addressed by quantifying sensitivity to different face features. Discrimination thresholds were determined for synthetic faces under the following conditions: (i) 'full face': all face features visible; (ii) 'isolated feature': single feature presented in isolation; (iii) 'embedded feature': all features visible, but only one feature modified. Mean threshold elevations for isolated features, relative to full faces, were 0.84×, 1.08×, 2.12×, 3.34×, 4.07× and 4.47× for head shape, hairline, nose, mouth, eyes and eyebrows respectively. Hence, when two full faces can be discriminated at threshold, the difference between the eyes is about four times less than what is required when discriminating between isolated eyes. In all cases, sensitivity was higher when features were presented in isolation than when they were embedded within a face context (threshold elevations of 0.94×, 1.74×, 2.67×, 2.90×, 5.94× and 9.94×). This reveals a specific pattern of sensitivity to face information. Observers are between two and four times more sensitive to external than internal features. The pattern for internal features (higher sensitivity for the nose, compared to mouth, eyes and eyebrows) is consistent with lower sensitivity for those parts affected by facial dynamics (e.g. facial expressions). That isolated features are easier to discriminate than embedded features supports a holistic face processing mechanism which impedes extraction of information about individual features from full faces. Copyright © 2017 Elsevier Ltd. All rights reserved.
A keyword spotting model using perceptually significant energy features
NASA Astrophysics Data System (ADS)
Umakanthan, Padmalochini
The task of a keyword recognition system is to detect the presence of certain words in a conversation based on the linguistic information present in human speech. Such keyword spotting systems have applications in homeland security, telephone surveillance, and human-computer interfacing. The general procedure of a keyword spotting system involves feature generation and matching. In this work, a new set of features based on the psycho-acoustic masking nature of human speech is proposed. After developing these features, a time-aligned pattern-matching process was implemented to locate the words in a set of unknown words. A word boundary detection technique based on frame classification using the nonlinear characteristics of speech is also addressed in this work. Validation of this keyword spotting model was done using widely acclaimed cepstral features. The experimental results indicate the viability of using these perceptually significant features as an augmented feature set in keyword spotting.
A random forest model based classification scheme for neonatal amplitude-integrated EEG.
Chen, Weiting; Wang, Yu; Cao, Guitao; Chen, Guoqiang; Gu, Qiufang
2014-01-01
Modern medical advances have greatly increased the survival rate of infants, while they remain in the higher risk group for neurological problems later in life. For infants with encephalopathy or seizures, identification of the extent of brain injury is clinically challenging. Continuous amplitude-integrated electroencephalography (aEEG) monitoring offers the possibility to directly monitor the brain functional state of newborns over hours, and has seen increasing application in neonatal intensive care units (NICUs). This paper presents a novel combined feature set for aEEG and applies the random forest (RF) method to classify aEEG tracings. To that end, a series of experiments was conducted on 282 aEEG tracing cases (209 normal and 73 abnormal ones). Basic features, statistical features, and segmentation features were extracted from both the tracing as a whole and the segmented recordings, and then combined to form the feature set. All the features were sent to a classifier afterwards. The significance of each feature, the data segmentation, the optimization of RF parameters, and the problem of imbalanced datasets were examined through experiments. Experiments were also done to evaluate the performance of RF on aEEG signal classification, compared with several other widely used classifiers including SVM-Linear, SVM-RBF, ANN, Decision Tree (DT), Logistic Regression (LR), ML, and LDA. The combined feature set better characterizes aEEG signals, compared with basic features, statistical features, and segmentation features alone. With the combined feature set, the proposed RF-based aEEG classification system achieved a correct rate of 92.52% and a high F1-score of 95.26%. Among all of the seven classifiers examined in our work, the RF method achieved the highest correct rate, sensitivity, specificity, and F1-score, which means that RF outperforms all of the other classifiers considered here. The results show that the proposed RF-based aEEG classification system with the combined feature set is efficient and helpful for better detecting brain disorders in newborns.
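A minimal RF baseline for this kind of imbalanced two-class problem, using placeholder features; class weighting is one common imbalance remedy here, though not necessarily the one adopted in the paper:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
# Placeholder combined feature set: 282 tracings (209 normal, 73 abnormal).
X = rng.standard_normal((282, 24))
y = np.array([0] * 209 + [1] * 73)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
rf = RandomForestClassifier(n_estimators=200, class_weight="balanced",
                            random_state=0).fit(X_tr, y_tr)
print("F1:", f1_score(y_te, rf.predict(X_te)))
```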
Discretionary power, lies, and broken trust: justification and discomfort.
Potter, N
1996-12-01
This paper explores the relationship between the bonds of practitioner/patient trust and the notion of a justified lie. The intersection of moral theories on lying which prioritize right action with institutional discretionary power allows practitioners to dismiss, or at least not take seriously enough, the harm done when a patient's trust is betrayed. Even when a lie can be shown to be justified, the trustworthiness of the practitioner may be called into question in ways that neither theories of right action nor contemporary discourse in health care attends to adequately. I set out features of full trustworthiness along Aristotelian lines.
NASA Astrophysics Data System (ADS)
Goupil, Ph.; Puyou, G.
2013-12-01
This paper presents a high-fidelity generic twin engine civil aircraft model developed by Airbus for advanced flight control system research. The main features of this benchmark are described to make the reader aware of the model complexity and representativeness. It is a complete representation including the nonlinear rigid-body aircraft model with a full set of control surfaces, actuator models, sensor models, flight control laws (FCL), and pilot inputs. Two applications of this benchmark in the framework of European projects are presented: FCL clearance using optimization and advanced fault detection and diagnosis (FDD).
A probabilistic model of cross-categorization.
Shafto, Patrick; Kemp, Charles; Mansinghka, Vikash; Tenenbaum, Joshua B
2011-07-01
Most natural domains can be represented in multiple ways: we can categorize foods in terms of their nutritional content or social role, animals in terms of their taxonomic groupings or their ecological niches, and musical instruments in terms of their taxonomic categories or social uses. Previous approaches to modeling human categorization have largely ignored the problem of cross-categorization, focusing on learning just a single system of categories that explains all of the features. Cross-categorization presents a difficult problem: how can we infer categories without first knowing which features the categories are meant to explain? We present a novel model that suggests that human cross-categorization is a result of joint inference about multiple systems of categories and the features that they explain. We also formalize two commonly proposed alternative explanations for cross-categorization behavior: a features-first and an objects-first approach. The features-first approach suggests that cross-categorization is a consequence of attentional processes, where features are selected by an attentional mechanism first and categories are derived second. The objects-first approach suggests that cross-categorization is a consequence of repeated, sequential attempts to explain features, where categories are derived first, then features that are poorly explained are recategorized. We present two sets of simulations and experiments testing the models' predictions about human categorization. We find that an approach based on joint inference provides the best fit to human categorization behavior, and we suggest that a full account of human category learning will need to incorporate something akin to these capabilities. Copyright © 2011 Elsevier B.V. All rights reserved.
Feature Selection for Classification of Polar Regions Using a Fuzzy Expert System
NASA Technical Reports Server (NTRS)
Penaloza, Mauel A.; Welch, Ronald M.
1996-01-01
Labeling, feature selection, and the choice of classifier are critical elements for classification of scenes and for image understanding. This study examines several methods for feature selection in polar regions, including the use of a fuzzy logic-based expert system for further refinement of a set of selected features. Six Advanced Very High Resolution Radiometer (AVHRR) Local Area Coverage (LAC) arctic scenes are classified into nine classes: water, snow/ice, ice cloud, land, thin stratus, stratus over water, cumulus over water, textured snow over water, and snow-covered mountains. Sixty-seven spectral and textural features are computed and analyzed by the feature selection algorithms. The divergence, histogram analysis, and discriminant analysis approaches are intercompared for their effectiveness in feature selection. The fuzzy expert system method is used not only to determine the effectiveness of each approach in classifying polar scenes, but also to further reduce the features to a more optimal set. For each selection method, features are ranked from best to worst, and the best half of the features are selected. Then, rules using these selected features are defined. The results of running the fuzzy expert system with these rules show that the divergence method produces the best set of features: not only does it produce the highest classification accuracy, but it also has the lowest computation requirements. A reduction of the set of features produced by the divergence method using the fuzzy expert system results in an overall classification accuracy of over 95%. However, this increase in accuracy has a high computation cost.
Using spectrotemporal indices to improve the fruit-tree crop classification accuracy
NASA Astrophysics Data System (ADS)
Peña, M. A.; Liao, R.; Brenning, A.
2017-06-01
This study assesses the potential of spectrotemporal indices derived from satellite image time series (SITS) to improve the classification accuracy of fruit-tree crops. Six major fruit-tree crop types in the Aconcagua Valley, Chile, were classified by applying various linear discriminant analysis (LDA) techniques on a Landsat-8 time series of nine images corresponding to the 2014-15 growing season. As features we not only used the complete spectral resolution of the SITS, but also all possible normalized difference indices (NDIs) that can be constructed from any two bands of the time series, a novel approach to derive features from SITS. Due to the high dimensionality of this "enhanced" feature set we used the lasso and ridge penalized variants of LDA (PLDA). Although classification accuracies yielded by the standard LDA applied on the full-band SITS were good (misclassification error rate, MER = 0.13), they were further improved by 23% (MER = 0.10) with ridge PLDA using the enhanced feature set. The most important bands to discriminate the crops of interest were mainly concentrated on the first two image dates of the time series, corresponding to the crops' greenup stage. Despite the high predictor weights provided by the red and near infrared bands, typically used to construct greenness spectral indices, other spectral regions were also found important for the discrimination, such as the shortwave infrared band at 2.11-2.19 μm, sensitive to foliar water changes. These findings support the usefulness of spectrotemporal indices in the context of SITS-based crop type classifications, which until now have been mainly constructed by the arithmetic combination of two bands of the same image date in order to derive greenness temporal profiles like those from the normalized difference vegetation index.
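Building every pairwise NDI from a stacked band set is mechanical; a sketch, assuming six reflective bands per date purely for illustration (the study's exact band count may differ):

```python
import numpy as np
from itertools import combinations

def all_ndis(cube, eps=1e-9):
    """Every normalized difference index (b_i - b_j) / (b_i + b_j).

    `cube` stacks all bands of the image time series: shape (n_bands, n_pixels).
    """
    feats = [(cube[i] - cube[j]) / (cube[i] + cube[j] + eps)
             for i, j in combinations(range(cube.shape[0]), 2)]
    return np.stack(feats)

# E.g., nine Landsat-8 dates x six bands = 54 stacked bands -> 1431 NDIs.
rng = np.random.default_rng(8)
cube = rng.uniform(0.01, 0.6, size=(54, 1000))  # fake reflectances
print(all_ndis(cube).shape)  # (1431, 1000)
```

Note that this enumeration includes same-date and cross-date band pairs alike, which is what makes the resulting indices spectrotemporal rather than purely spectral.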
An Expert Opinion on Advanced Insulin Pump Use in Youth with Type 1 Diabetes.
Bode, Bruce W; Kaufman, Francine R; Vint, Nan
2017-03-01
Among children and adolescents with type 1 diabetes mellitus, the use of insulin pump therapy has increased since its introduction in the early 1980s. Optimal management of type 1 diabetes mellitus depends on sufficient understanding by patients, their families, and healthcare providers on how to use pump technology. The goal for the use of insulin pump therapy should be to advance proficiency over time from the basics taught at the initiation of pump therapy to utilizing advanced settings to obtain optimal glycemic control. However, this goal is often not met, and appropriate understanding of the full features of pump technology can be lacking. The objective of this review is to provide an expert perspective on the advanced features and use of insulin pump therapy, including practical guidelines for the successful use of insulin pump technology, and other considerations specific to patients and healthcare providers.
Integrated system for automated financial document processing
NASA Astrophysics Data System (ADS)
Hassanein, Khaled S.; Wesolkowski, Slawo; Higgins, Ray; Crabtree, Ralph; Peng, Antai
1997-02-01
A system was developed that integrates intelligent document analysis with multiple character/numeral recognition engines in order to achieve high accuracy automated financial document processing. In this system, images are accepted in both their grayscale and binary formats. A document analysis module starts by extracting essential features from the document to help identify its type (e.g. personal check, business check, etc.). These features are also utilized to conduct a full analysis of the image to determine the location of interesting zones such as the courtesy amount and the legal amount. These fields are then made available to several recognition knowledge sources such as courtesy amount recognition engines and legal amount recognition engines through a blackboard architecture. This architecture allows all the available knowledge sources to contribute incrementally and opportunistically to the solution of the given recognition query. Performance results on a test set of machine printed business checks using the integrated system are also reported.
Lima, C S; Barbosa, D; Ramos, J; Tavares, A; Monteiro, L; Carvalho, L
2008-01-01
This paper presents a system to support medical diagnosis and the detection of abnormal lesions by processing capsule endoscopy images. Endoscopic images possess rich information expressed by texture. Texture information can be efficiently extracted from the medium scales of the wavelet transform. The set of features proposed in this paper to encode textural information is named color wavelet covariance (CWC). CWC coefficients are based on the covariances of second-order textural measures, and an optimum subset of them is proposed. Third- and fourth-order moments are added to cope with distributions that tend to become non-Gaussian, especially in some pathological cases. The proposed approach is supported by a radial basis function classifier for the characterization of the image regions along the video frames. The whole methodology has been applied to real data comprising 6 full endoscopic exams and reached 95% specificity and 93% sensitivity.
Individualized grid-enabled mammographic training system
NASA Astrophysics Data System (ADS)
Yap, M. H.; Gale, A. G.
2009-02-01
The PERFORMS self-assessment scheme measures individuals' skills in identifying key mammographic features on sets of known cases. One aspect of this is that it allows radiologists' skills to be trained, based on their data from this scheme. Consequently, a new strategy is introduced to provide revision training based on mammographic features that the radiologist has had difficulty with in these sets. Doing this requires a large pool of random cases to provide dynamic, unique, and up-to-date training modules for each individual. We propose GIMI (Generic Infrastructure in Medical Informatics) middleware as the solution to harvest cases from distributed grid servers. The GIMI middleware enables existing and legacy data to support healthcare delivery, research, and training. It is technology-agnostic, data-agnostic, and has a security policy. The trainee examines each case, indicating the location of regions of interest, and completes an evaluation form covering mammographic feature labelling, diagnosis, and decisions. For feedback, the trainee can choose to have immediate feedback after examining each case or batch feedback after examining a number of cases. All the trainees' results are recorded in a database which also contains their trainee profiles. A full report can be prepared for the trainee after they have completed their training. This project demonstrates the practicality of a grid-based individualised training strategy and its efficacy in generating dynamic training modules within the coverage/outreach of the GIMI middleware. The advantages and limitations of the approach are discussed together with future plans.
An Evolving Worldview: Making Open Source Easy
NASA Astrophysics Data System (ADS)
Rice, Z.
2017-12-01
NASA Worldview is an interactive interface for browsing full-resolution, global satellite imagery. Worldview supports an open data policy so that academia, private industries and the general public can use NASA's satellite data to address Earth science related issues. Worldview was open sourced in 2014. By shifting to an open source approach, the Worldview application has evolved to better serve end-users. Project developers are able to have discussions with end-users and community developers to understand issues and develop new features. Community developers are able to track upcoming features, collaborate on them and make their own contributions. Developers who discover issues are able to address those issues and submit a fix. This reduces the time it takes for a project developer to reproduce an issue or develop a new feature. Getting new developers to contribute to the project has been one of the most important and difficult aspects of open sourcing Worldview. After witnessing potential outside contributors struggle, a focus has been made on making the installation of Worldview simple to reduce the initial learning curve and make contributing code easy. One way we have addressed this is through a simplified setup process. Our setup documentation includes a set of prerequisites and a set of straightforward commands to clone, configure, install and run. This presentation will emphasize our focus to simplify and standardize Worldview's open source code so that more people are able to contribute. The more people who contribute, the better the application will become over time.
SOLAR FLARE PREDICTION USING SDO/HMI VECTOR MAGNETIC FIELD DATA WITH A MACHINE-LEARNING ALGORITHM
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bobra, M. G.; Couvidat, S., E-mail: couvidat@stanford.edu
2015-01-10
We attempt to forecast M- and X-class solar flares using a machine-learning algorithm, called support vector machine (SVM), and four years of data from the Solar Dynamics Observatory's Helioseismic and Magnetic Imager, the first instrument to continuously map the full-disk photospheric vector magnetic field from space. Most flare forecasting efforts described in the literature use either line-of-sight magnetograms or a relatively small number of ground-based vector magnetograms. This is the first time a large data set of vector magnetograms has been used to forecast solar flares. We build a catalog of flaring and non-flaring active regions sampled from a database of 2071 active regions, comprising 1.5 million active region patches of vector magnetic field data, and characterize each active region by 25 parameters. We then train and test the machine-learning algorithm and estimate its performance using forecast verification metrics with an emphasis on the true skill statistic (TSS). We obtain relatively high TSS scores and overall predictive abilities. We surmise that this is partly due to fine-tuning the SVM for this purpose and also to an advantageous set of features that can only be calculated from vector magnetic field data. We also apply a feature selection algorithm to determine which of our 25 features are useful for discriminating between flaring and non-flaring active regions and conclude that only a handful are needed for good predictive abilities.
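A minimal sketch of the true skill statistic referenced above, computed from a binary flaring/non-flaring confusion matrix; the variable names are illustrative, not taken from the paper:

```python
import numpy as np

def true_skill_statistic(y_true, y_pred):
    """TSS = hit rate - false alarm rate, from a binary confusion matrix."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    fp = np.sum(~y_true & y_pred)
    tn = np.sum(~y_true & ~y_pred)
    return tp / (tp + fn) - fp / (fp + tn)

# Example: all flares caught, one false alarm on three quiet regions.
print(true_skill_statistic([1, 1, 0, 0, 0], [1, 1, 1, 0, 0]))  # 1.0 - 1/3 ≈ 0.67
```

Unlike plain accuracy, the TSS is insensitive to the strong class imbalance between flaring and non-flaring regions, which is why it is emphasized here.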
The implementation and use of Ada on distributed systems with high reliability requirements
NASA Technical Reports Server (NTRS)
Knight, J. C.
1986-01-01
The general inadequacy of Ada for programming systems that must survive processor loss was shown. A solution to the problem was proposed in which there are no syntactic changes to Ada. The approach was evaluated using a full-scale, realistic application. The application used was the Advanced Transport Operating System (ATOPS), an experimental computer control system developed for a modified Boeing 737 aircraft. The ATOPS system is a full authority, real-time avionics system providing a large variety of advanced features. Methods of building fault tolerance into concurrent systems were explored. A set of criteria by which the proposed method will be judged was examined. Extensive interaction with personnel from Computer Sciences Corporation and NASA Langley occurred to determine the requirements of the ATOPS software. Backward error recovery in concurrent systems was assessed.
Manual for a workstation-based generic flight simulation program (LaRCsim), version 1.4
NASA Technical Reports Server (NTRS)
Jackson, E. Bruce
1995-01-01
LaRCsim is a set of ANSI C routines that implement a full set of equations of motion for a rigid-body aircraft in atmospheric and low-earth orbital flight, suitable for pilot-in-the-loop simulations on a workstation-class computer. All six rigid-body degrees of freedom are modeled. The modules provided include calculations of the typical aircraft rigid-body simulation variables, earth geodesy, gravity and atmospheric models, and support for several data recording options. Features/limitations of the current version include English units of measure; a 1962 atmosphere model in cubic spline function lookup form, ranging from sea level to 75,000 feet; and a rotating oblate spheroidal earth model, with aircraft C.G. coordinates in both geocentric and geodetic axes. Angular integrations are done using quaternion state variables. Vehicle X-Z symmetry is assumed.
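The quaternion-based angular integration mentioned above can be sketched as a first-order integration of the quaternion kinematics q̇ = ½ q ⊗ (0, ω); this is a generic illustration of the technique, not LaRCsim's actual code:

```python
import numpy as np

def quat_mult(p, q):
    """Hamilton product of two quaternions stored as (w, x, y, z)."""
    w1, x1, y1, z1 = p
    w2, x2, y2, z2 = q
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def integrate_attitude(q, omega_body, dt):
    """Advance the attitude quaternion by body rates (rad/s) over one step."""
    q_dot = 0.5 * quat_mult(q, np.array([0.0, *omega_body]))
    q = q + q_dot * dt
    return q / np.linalg.norm(q)  # re-normalize to suppress integration drift

q = np.array([1.0, 0.0, 0.0, 0.0])                # level attitude
q = integrate_attitude(q, [0.0, 0.0, 0.1], 0.01)  # slow yaw rate, 10 ms step
```

Quaternion state variables avoid the gimbal-lock singularity of Euler-angle integration, which is why simulators of this kind prefer them.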
Testing Product Generation in Software Product Lines Using Pairwise for Features Coverage
NASA Astrophysics Data System (ADS)
Pérez Lamancha, Beatriz; Polo Usaola, Macario
A Software Product Line (SPL) is "a set of software-intensive systems sharing a common, managed set of features that satisfy the specific needs of a particular market segment or mission and that are developed from a common set of core assets in a prescribed way". Variability is a central concept that permits the generation of different products of the family by reusing core assets. It is captured through features which, for an SPL, define its scope. Features are represented in a feature model, which is later used to generate the products from the line. From the testing point of view, testing all the possible combinations in feature models is not practical because: (1) the number of possible combinations (i.e., combinations of features for composing products) may be intractable, and (2) some combinations may contain incompatible features. Thus, this paper addresses the problem by implementing combinatorial testing techniques adapted to the SPL context.
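The pairwise idea behind such combinatorial testing can be illustrated with a small greedy sketch over boolean features (a generic illustration, not the authors' SPL tooling; it ignores incompatible-feature constraints):

```python
from itertools import combinations, product

features = ["gps", "camera", "bluetooth", "nfc"]

# Full combinatorial space: 2^4 = 16 candidate products.
all_products = list(product([0, 1], repeat=len(features)))

def covered_pairs(products):
    """Every (feature i, feature j, value a, value b) pair seen in the set."""
    pairs = set()
    for cfg in products:
        for (i, a), (j, b) in combinations(enumerate(cfg), 2):
            pairs.add((i, j, a, b))
    return pairs

# Greedily pick a small product set covering all value pairs at least once.
target = covered_pairs(all_products)
chosen, done = [], set()
while done != target:
    best = max(all_products, key=lambda c: len(covered_pairs([c]) - done))
    chosen.append(best)
    done |= covered_pairs([best])

print(len(all_products), "products in full space;", len(chosen), "suffice for pairwise")
```

Pairwise coverage only requires that every value pair for every feature pair appear in at least one tested product, which is why the selected set is so much smaller than the full space.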
Using Gaussian windows to explore a multivariate data set
NASA Technical Reports Server (NTRS)
Jaeckel, Louis A.
1991-01-01
In an earlier paper, I recounted an exploratory analysis, using Gaussian windows, of a data set derived from the Infrared Astronomical Satellite. Here, my goals are to develop strategies for finding structural features in a data set in a many-dimensional space, and to find ways to describe the shape of such a data set. After a brief review of Gaussian windows, I describe the current implementation of the method. I give some ways of describing features that we might find in the data, such as clusters and saddle points, and also extended structures such as a 'bar', which is an essentially one-dimensional concentration of data points. I then define a distance function, which I use to determine which data points are 'associated' with a feature. Data points not associated with any feature are called 'outliers'. I then explore the data set, giving the strategies that I used and quantitative descriptions of the features that I found, including clusters, bars, and a saddle point. I tried to use strategies and procedures that could, in principle, be used in any number of dimensions.
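The core of the Gaussian-window idea can be sketched as follows: each data point is weighted by a Gaussian centered at the current window location, and the weighted mean and covariance summarize the local shape of the data. This is an illustrative reconstruction, not Jaeckel's implementation:

```python
import numpy as np

def gaussian_window_summary(X, center, cov):
    """Weighted local mean/covariance of data X under a Gaussian window."""
    diff = X - center
    cov_inv = np.linalg.inv(cov)
    # Per-point quadratic form (x - c)^T Sigma^{-1} (x - c), then Gaussian weight.
    w = np.exp(-0.5 * np.einsum("ij,jk,ik->i", diff, cov_inv, diff))
    w /= w.sum()
    mean = w @ X
    centered = X - mean
    local_cov = (w[:, None] * centered).T @ centered
    return mean, local_cov

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))  # toy 3-D point cloud
mean, local_cov = gaussian_window_summary(X, np.zeros(3), np.eye(3))
```

Eigenvalues of the local covariance reveal the kinds of structure discussed above: one dominant eigenvalue suggests a "bar", while several comparable ones suggest a cluster.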
Kim, Eunji; Ivanov, Ivan; Hua, Jianping; Lampe, Johanna W; Hullar, Meredith Aj; Chapkin, Robert S; Dougherty, Edward R
2017-01-01
Ranking feature sets for phenotype classification based on gene expression is a challenging issue in cancer bioinformatics. When the number of samples is small, all feature selection algorithms are known to be unreliable, producing significant error, and error estimators suffer from different degrees of imprecision. The problem is compounded by the fact that the accuracy of classification depends on the manner in which the phenomena are transformed into data by the measurement technology. Because next-generation sequencing technologies amount to a nonlinear transformation of the actual gene or RNA concentrations, they can potentially produce less discriminative data relative to the actual gene expression levels. In this study, we compare the performance of ranking feature sets derived from a model of RNA-Seq data with that of a multivariate normal model of gene concentrations using 3 measures: (1) ranking power, (2) length of extensions, and (3) Bayes features. This model-based study examines the effectiveness of reporting lists of small feature sets using RNA-Seq data and the effects of different model parameters and error estimators. The results demonstrate that the general trends of the parameter effects on the ranking power of the underlying gene concentrations are preserved in the RNA-Seq data, whereas the power of finding a good feature set becomes weaker when gene concentrations are transformed by the sequencing machine.
Geospatial Analytics in Retail Site Selection and Sales Prediction.
Ting, Choo-Yee; Ho, Chiung Ching; Yee, Hui Jia; Matsah, Wan Razali
2018-03-01
Studies have shown that certain features from geography, demography, trade area, and environment can play a vital role in retail site selection, largely due to the impact they exert on retail performance. Although the relevant features could be elicited by domain experts, determining the optimal feature set can be an intractable and labor-intensive exercise. The challenges center around (1) how to determine the features that are important to a particular retail business and (2) how to estimate retail sales performance for a new location. The challenges become apparent when the features vary across time. In this light, this study proposed a nonintervening approach, employing feature selection algorithms followed by sales prediction through similarity-based methods. The results of prediction were validated by domain experts. In this study, data sets from different sources were transformed and aggregated before an analysis-ready data set could be obtained. The data sets included data about feature location, population count, property type, education status, and monthly sales from 96 branches of a telecommunication company in Malaysia. The findings suggested that (1) optimal retail performance can only be achieved through fulfillment of specific location features together with the surrounding trade area characteristics and (2) similarity-based methods can provide a solution to retail sales prediction.
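A minimal sketch of similarity-based sales estimation for a candidate site, assuming rows of location features and known monthly sales for existing branches (the feature names and numbers are illustrative, not from the study):

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import StandardScaler

# Illustrative branch features: [population, competitor count, income index]
X_branches = np.array([[52000, 3, 1.2], [18000, 1, 0.8], [91000, 6, 1.5],
                       [33000, 2, 1.0], [47000, 4, 1.1]])
sales = np.array([120.0, 45.0, 210.0, 80.0, 105.0])  # monthly sales (toy units)

# Scale features so no single one dominates the similarity metric.
scaler = StandardScaler().fit(X_branches)
model = KNeighborsRegressor(n_neighbors=3, weights="distance")
model.fit(scaler.transform(X_branches), sales)

new_site = scaler.transform([[60000, 3, 1.3]])
print(model.predict(new_site))  # estimate from the most similar branches
```

Distance-weighted neighbors make the prediction lean more heavily on the branches whose locations most resemble the candidate site.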
GARNET--gene set analysis with exploration of annotation relations.
Rho, Kyoohyoung; Kim, Bumjin; Jang, Youngjun; Lee, Sanghyun; Bae, Taejeong; Seo, Jihae; Seo, Chaehwa; Lee, Jihyun; Kang, Hyunjung; Yu, Ungsik; Kim, Sunghoon; Lee, Sanghyuk; Kim, Wan Kyu
2011-02-15
Gene set analysis is a powerful method of deducing biological meaning for an a priori defined set of genes. Numerous tools have been developed to test statistical enrichment or depletion in specific pathways or gene ontology (GO) terms. Major difficulties towards biological interpretation are integrating diverse types of annotation categories and exploring the relationships between annotation terms of similar information. GARNET (Gene Annotation Relationship NEtwork Tools) is an integrative platform for gene set analysis with many novel features. It includes tools for retrieval of genes from annotation databases, statistical analysis & visualization of annotation relationships, and managing gene sets. In an effort to allow access to a full spectrum of amassed biological knowledge, we have integrated a variety of annotation data that include the GO, domain, disease, drug, chromosomal location, and custom-defined annotations. Diverse types of molecular networks (pathways, transcription and microRNA regulations, protein-protein interaction) are also included. The pair-wise relationship between annotation gene sets was calculated using kappa statistics. GARNET consists of three modules--gene set manager, gene set analysis and gene set retrieval--which are tightly integrated to provide virtually automatic analysis for gene sets. A dedicated viewer for annotation networks has been developed to facilitate exploration of the related annotations. GARNET is an integrative platform for diverse types of gene set analysis, where complex relationships among gene annotations can be easily explored with an intuitive network visualization tool (http://garnet.isysbio.org/ or http://ercsb.ewha.ac.kr/garnet/).
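The kappa-based relationship measure can be illustrated by treating two annotation terms as binary membership vectors over a common gene universe; this is a sketch of the statistic itself, not GARNET's code:

```python
import numpy as np

def kappa(set_a, set_b, universe):
    """Cohen's kappa between two gene sets over a shared gene universe."""
    a = np.array([g in set_a for g in universe])
    b = np.array([g in set_b for g in universe])
    observed = np.mean(a == b)                 # observed agreement
    p_a, p_b = a.mean(), b.mean()
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)  # chance agreement
    return (observed - expected) / (1 - expected)

universe = [f"gene{i}" for i in range(100)]
go_term = set(universe[:30])
pathway = set(universe[10:40])
print(kappa(go_term, pathway, universe))  # substantial overlap -> positive kappa
```

Kappa corrects raw overlap for chance agreement, so two large sets that overlap only as much as random sets would score near zero.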
NASA Astrophysics Data System (ADS)
Tan, Maxine; Leader, Joseph K.; Liu, Hong; Zheng, Bin
2015-03-01
We recently investigated a new mammographic image feature based risk factor to predict near-term breast cancer risk after a woman has a negative mammographic screening. We hypothesized that unlike the conventional epidemiology-based long-term (or lifetime) risk factors, the mammographic image feature based risk factor value will increase as the time lag between the negative and positive mammography screening decreases. The purpose of this study is to test this hypothesis. From a large and diverse full-field digital mammography (FFDM) image database with 1278 cases, we collected all available sequential FFDM examinations for each case, including the "current" and one to three most recent "prior" examinations. All "prior" examinations were interpreted negative, and "current" ones were either malignant or recalled negative/benign. We computed 92 global mammographic texture and density based features, and included three clinical risk factors (woman's age, family history, and subjective breast density BIRADS ratings). On this initial feature set, we applied a fast and accurate Sequential Forward Floating Selection (SFFS) feature selection algorithm to reduce feature dimensionality. The features computed on the two mammographic views were used to train two separate artificial neural network (ANN) classifiers, and the classification scores of the two ANNs were then merged with a sequential ANN. The results show that the maximum adjusted odds ratios were 5.59, 7.98, and 15.77 for using the 3rd, 2nd, and 1st "prior" FFDM examinations, respectively, which demonstrates a higher association of mammographic image feature change and an increasing risk trend of developing breast cancer in the near-term after a negative screening.
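Sequential floating forward selection of the kind described above is available off the shelf; a sketch using the mlxtend library, with an MLP standing in for the paper's ANN and all parameters illustrative:

```python
from mlxtend.feature_selection import SequentialFeatureSelector
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import make_classification

# Toy stand-in for the 92 texture/density features plus 3 clinical factors.
X, y = make_classification(n_samples=300, n_features=95, n_informative=10,
                           random_state=0)

sffs = SequentialFeatureSelector(
    MLPClassifier(hidden_layer_sizes=(20,), max_iter=500, random_state=0),
    k_features=10,     # target dimensionality after reduction
    forward=True,
    floating=True,     # the "floating" backtracking step that defines SFFS
    scoring="roc_auc",
    cv=3,
)
sffs = sffs.fit(X, y)
print(sffs.k_feature_idx_)  # indices of the selected features
```

The floating step conditionally removes previously added features whenever doing so improves the score, which is what distinguishes SFFS from plain forward selection.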
Capela, Nicole A; Lemaire, Edward D; Baddour, Natalie
2015-01-01
Human activity recognition (HAR), using wearable sensors, is a growing area with the potential to provide valuable information on patient mobility to rehabilitation specialists. Smartphones with accelerometer and gyroscope sensors are a convenient, minimally invasive, and low cost approach for mobility monitoring. HAR systems typically pre-process raw signals, segment the signals, and then extract features to be used in a classifier. Feature selection is a crucial step in the process to reduce potentially large data dimensionality and provide viable parameters to enable activity classification. Most HAR systems are customized to an individual research group, including a unique data set, classes, algorithms, and signal features. These data sets are obtained predominantly from able-bodied participants. In this paper, smartphone accelerometer and gyroscope sensor data were collected from populations that can benefit from human activity recognition: able-bodied, elderly, and stroke patients. Data from a consecutive sequence of 41 mobility tasks (18 different tasks) were collected for a total of 44 participants. Seventy-six signal features were calculated and subsets of these features were selected using three filter-based, classifier-independent, feature selection methods (Relief-F, Correlation-based Feature Selection, Fast Correlation Based Filter). The feature subsets were then evaluated using three generic classifiers (Naïve Bayes, Support Vector Machine, j48 Decision Tree). Common features were identified for all three populations, although the stroke population subset had some differences from both able-bodied and elderly sets. Evaluation with the three classifiers showed that the feature subsets produced similar or better accuracies than classification with the entire feature set. Therefore, since these feature subsets are classifier-independent, they should be useful for developing and improving HAR systems across and within populations.
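Filter-based, classifier-independent selection of this kind can be sketched with a mutual-information ranking; this is a generic stand-in for Relief-F, CFS, and FCBF (which live in packages such as skrebate rather than scikit-learn):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

# Toy stand-in for 76 accelerometer/gyroscope signal features.
X, y = make_classification(n_samples=400, n_features=76, n_informative=12,
                           random_state=0)

# Rank features against the labels only -- no classifier involved.
selector = SelectKBest(mutual_info_classif, k=15).fit(X, y)
X_subset = selector.transform(X)

# Because the subset is classifier-independent, it can be handed to any
# generic classifier for evaluation, as in the study.
clf = GaussianNB()
full = cross_val_score(clf, X, y, cv=5).mean()
sub = cross_val_score(clf, X_subset, y, cv=5).mean()
print(f"full set accuracy={full:.3f}  subset accuracy={sub:.3f}")
```

Decoupling selection from the classifier is what allows the same feature subset to be reused across Naïve Bayes, SVM, and decision-tree evaluations.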
Hong, Chih-Yuan; Guo, Lan-Yuen; Song, Rong; Nagurka, Mark L; Sung, Jia-Li; Yen, Chen-Wen
2016-08-02
Many methods have been proposed to assess the stability of human postural balance by using a force plate. While most of these approaches characterize postural stability by extracting features from the trajectory of the center of pressure (COP), this work develops stability measures derived from components of the ground reaction force (GRF). In comparison with previous GRF-based approaches that extract stability features from the GRF resultant force, this study proposes three feature sets derived from the correlation patterns among the vertical GRF (VGRF) components. The first and second feature sets quantitatively assess the strength and changing speed of the correlation patterns, respectively. The third feature set is used to quantify the stabilizing effect of the GRF coordination patterns on the COP. In addition to experimentally demonstrating the reliability of the proposed features, the efficacy of the proposed features has also been tested by using them to classify two age groups (18-24 and 65-73 years) in quiet standing. The experimental results show that the proposed features are considerably more sensitive to aging than one of the most effective conventional COP features and two recently proposed center-of-mass (COM) features. By extracting information from the correlation patterns of the VGRF components, this study proposes three sets of features to assess human postural stability during quiet standing. As demonstrated by the experimental results, the proposed features are not only robust to inter-trial variability but also more accurate than the tested COP and COM features in classifying the older and younger age groups. An additional advantage of the proposed approach is that it reduces the force sensing requirement from 3D to 1D, substantially reducing the cost of the force plate measurement system.
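The correlation-pattern idea can be sketched as follows: compute, over sliding windows, the pairwise correlations among the vertical force channels, then use their strength and rate of change as features. This is an illustrative reconstruction with assumed signal layout, not the authors' feature definitions:

```python
import numpy as np

def vgrf_correlation_features(vgrf, win=100, step=50):
    """vgrf: (n_samples, n_channels) vertical force components.
    Returns mean |correlation| and mean frame-to-frame correlation change."""
    corrs = []
    for start in range(0, len(vgrf) - win, step):
        c = np.corrcoef(vgrf[start:start + win].T)
        corrs.append(c[np.triu_indices_from(c, k=1)])  # unique pairs only
    corrs = np.array(corrs)
    strength = np.abs(corrs).mean()                # cf. feature set 1
    speed = np.abs(np.diff(corrs, axis=0)).mean()  # cf. feature set 2
    return strength, speed

rng = np.random.default_rng(1)
vgrf = rng.normal(size=(2000, 4))  # toy 4-channel force plate record
print(vgrf_correlation_features(vgrf))
```

Because only vertical components enter the computation, the sensing requirement drops from 3D to 1D force measurement, as the abstract notes.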
Li, Der-Chiang; Liu, Chiao-Wen; Hu, Susan C
2011-05-01
Medical data sets are usually small and have very high dimensionality. Too many attributes will make the analysis less efficient and will not necessarily increase accuracy, while too few data will decrease the modeling stability. Consequently, the main objective of this study is to extract the optimal subset of features to increase analytical performance when the data set is small. This paper proposes a fuzzy-based non-linear transformation method to extend classification related information from the original data attribute values for a small data set. Based on the new transformed data set, this study applies principal component analysis (PCA) to extract the optimal subset of features. Finally, we use the transformed data with these optimal features as the input data for a learning tool, a support vector machine (SVM). Six medical data sets: Pima Indians' diabetes, Wisconsin diagnostic breast cancer, Parkinson disease, echocardiogram, the BUPA liver disorders data set, and bladder cancer cases in Taiwan, are employed to illustrate the approach presented in this paper. This research uses the t-test to evaluate the classification accuracy for a single data set, and uses the Friedman test to show that the proposed method is better than other methods over the multiple data sets. The experimental results indicate that the proposed method has better classification performance than either PCA or kernel principal component analysis (KPCA) when the data set is small, and suggest that creating new purpose-related information improves analysis performance. This paper has shown that feature extraction, used alongside feature selection, is important for efficient data analysis. When the data set is small, using the fuzzy-based transformation method presented in this work to increase the information available produces better results than the PCA and KPCA approaches.
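The PCA-then-SVM stage of the approach (without the fuzzy transformation, which is specific to the paper) can be sketched as a scikit-learn pipeline on one of the same benchmark data sets:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA, KernelPCA
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

# Wisconsin diagnostic breast cancer: one of the six data sets used above.
X, y = load_breast_cancer(return_X_y=True)

for reducer in (PCA(n_components=5), KernelPCA(n_components=5, kernel="rbf")):
    pipe = make_pipeline(StandardScaler(), reducer, SVC())
    score = cross_val_score(pipe, X, y, cv=5).mean()
    print(type(reducer).__name__, f"accuracy={score:.3f}")
```

The pipeline keeps the dimensionality reduction inside each cross-validation fold, which avoids leaking test information into the extracted components.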
Comparison of Feature Selection Techniques in Machine Learning for Anatomical Brain MRI in Dementia.
Tohka, Jussi; Moradi, Elaheh; Huttunen, Heikki
2016-07-01
We present a comparative split-half resampling analysis of various data driven feature selection and classification methods for the whole brain voxel-based classification analysis of anatomical magnetic resonance images. We compared support vector machines (SVMs), with or without filter based feature selection, several embedded feature selection methods and stability selection. While comparisons of the accuracy of various classification methods have been reported previously, the variability of the out-of-training sample classification accuracy and the set of selected features due to independent training and test sets have not been previously addressed in a brain imaging context. We studied two classification problems: 1) Alzheimer's disease (AD) vs. normal control (NC) and 2) mild cognitive impairment (MCI) vs. NC classification. In AD vs. NC classification, the variability in the test accuracy due to the subject sample did not vary between different methods and exceeded the variability due to different classifiers. In MCI vs. NC classification, particularly with a large training set, embedded feature selection methods outperformed SVM-based ones with the difference in the test accuracy exceeding the test accuracy variability due to the subject sample. The filter and embedded methods produced divergent feature patterns for MCI vs. NC classification that suggests the utility of the embedded feature selection for this problem when linked with the good generalization performance. The stability of the feature sets was strongly correlated with the number of features selected, weakly correlated with the stability of classification accuracy, and uncorrelated with the average classification accuracy.
Rethinking the REAL ID Act and National Identification Cards as a Counterterrorism Tool
2009-12-01
federal government imposing national identification standards on states are also actively engaged in the debate. Michael Boldin, a 36-year-old Web... on the RIA. Boldin states, "Maine resisted, the government backed off, and soon all of these other states were doing the same thing." Since... that acquires biometric data from an individual, extracts a feature set from the data, compares this feature set against the feature set stored in a
Inattentional blindness: A combination of a relational set and a feature inhibition set?
Goldstein, Rebecca R; Beck, Melissa R
2016-07-01
Two experiments were conducted to directly test the feature set hypothesis and the relational set hypothesis in an inattentional blindness task. The feature set hypothesis predicts that unexpected objects that match the to-be-attended stimuli will be reported most. The relational set hypothesis predicts that unexpected objects that match the relationship between the to-be-attended and the to-be-ignored stimuli will be reported the most. Experiment 1 manipulated the luminance of the stimuli. Participants were instructed to monitor the gray letter shapes and to ignore either black or white letter shapes. The unexpected objects that exhibited the luminance relation of the to-be-attended to the to-be-ignored stimuli were reported by participants the most. Experiment 2 manipulated the color of the stimuli. Participants were instructed to monitor the yellower orange or the redder orange letter shapes and to ignore the redder orange or yellower orange letter shapes. The unexpected objects that exhibited the color relation of the to-be-attended to the to-be-ignored stimuli were reported the most. The results do not support the use of a feature set to accomplish the task and instead support the use of a relational set. In addition, the results point to the concurrent use of multiple attentional sets that are both excitatory and inhibitory.
Color image definition evaluation method based on deep learning method
NASA Astrophysics Data System (ADS)
Liu, Di; Li, YingChun
2018-01-01
In order to evaluate different blurring levels of color images and improve image definition evaluation, this paper proposes a no-reference color image clarity evaluation method based on a deep learning framework and a BP neural network classification model. First, VGG16 is used as the feature extractor to obtain 4,096-dimensional features from the images; the extracted features and image labels are then used to train a BP neural network, which performs the final color image definition evaluation. The method is evaluated on images from the CSIQ database, blurred at different levels, giving 4,000 images after processing. The 4,000 images are divided into three categories, each representing a blur level. Of every 400 samples, the high-dimensional features of 300 are used to train the VGG16 and BP neural network stages, and the remaining 100 samples are used for testing. The experimental results show that the method takes full advantage of the learning and characterization capability of deep learning. Unlike most existing image clarity evaluation methods, which rely on manually designed and extracted features, the proposed method extracts image features automatically and achieves excellent image quality classification accuracy on the test data set, with an accuracy rate of 96%. Moreover, the predicted quality levels of original color images are similar to the perception of the human visual system.
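Extracting the 4,096-dimensional features described above is a standard use of VGG16's penultimate fully connected layer; a sketch with Keras ('fc2' is the stock Keras layer name for that layer, and the random image is a placeholder for a CSIQ image):

```python
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.models import Model

base = VGG16(weights="imagenet", include_top=True)
# 'fc2' is the second 4096-unit fully connected layer of the stock model.
extractor = Model(inputs=base.input, outputs=base.get_layer("fc2").output)

# One toy 224x224 RGB image; real use would load and resize the CSIQ images.
img = np.random.rand(1, 224, 224, 3).astype("float32") * 255.0
features = extractor.predict(preprocess_input(img))
print(features.shape)  # (1, 4096) -- the input to the BP (MLP) classifier
```

Using a pretrained network as a frozen feature extractor sidesteps training a deep model from scratch on only 4,000 images.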
Computer-aided diagnosis of melanoma using border and wavelet-based texture analysis.
Garnavi, Rahil; Aldeen, Mohammad; Bailey, James
2012-11-01
This paper presents a novel computer-aided diagnosis system for melanoma. The novelty lies in the optimised selection and integration of features derived from textural, border-based and geometrical properties of the melanoma lesion. The texture features are derived using wavelet decomposition, the border features are derived from constructing a boundary-series model of the lesion border and analysing it in spatial and frequency domains, and the geometry features are derived from shape indexes. The optimised selection of features is achieved by using the Gain-Ratio method, which is shown to be computationally efficient for the melanoma diagnosis application. Classification is done with four classifiers, namely Support Vector Machine, Random Forest, Logistic Model Tree and Hidden Naive Bayes. The proposed diagnostic system is applied to a set of 289 dermoscopy images (114 malignant, 175 benign) partitioned into train, validation and test image sets. The system achieves an accuracy of 91.26% and an AUC value of 0.937 when 23 features are used. Other important findings include (i) the clear advantage gained in complementing texture with border and geometry features, compared to using texture information only, and (ii) the higher contribution of texture features than border-based features in the optimised feature set.
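The wavelet texture features can be sketched with PyWavelets: decompose the lesion image and summarize each detail subband by its energy. This illustrates the general technique, not the paper's exact feature set:

```python
import numpy as np
import pywt

def wavelet_texture_features(gray_img, wavelet="db4", level=3):
    """Energy of each detail subband of a 2-D wavelet decomposition."""
    coeffs = pywt.wavedec2(gray_img, wavelet, level=level)
    feats = []
    for detail_level in coeffs[1:]:           # skip the approximation band
        for band in detail_level:             # horizontal, vertical, diagonal
            feats.append(np.mean(band ** 2))  # subband energy
    return np.array(feats)

img = np.random.rand(128, 128)        # toy stand-in for a dermoscopy lesion
print(wavelet_texture_features(img))  # 9 energies: 3 bands x 3 levels
```

Subband energies capture texture at multiple scales and orientations, which is why wavelet features complement border and geometry descriptors well.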
Methods for the Precise Locating and Forming of Arrays of Curved Features into a Workpiece
Gill, David Dennis; Keeler, Gordon A.; Serkland, Darwin K.; Mukherjee, Sayan D.
2008-10-14
Methods for manufacturing high precision arrays of curved features (e.g. lenses) in the surface of a workpiece are described, utilizing orthogonal sets of inter-fitting locating grooves to mate a workpiece to a workpiece holder mounted to the spindle face of a rotating machine tool. The matching inter-fitting groove sets in the workpiece and the chuck allow the workpiece to be precisely and non-kinematically indexed to locations defined in two orthogonal directions perpendicular to the turning axis of the machine tool. At each location on the workpiece a curved feature can then be machined on-center to create arrays of curved features on the workpiece. The averaging effect of the corresponding sets of inter-fitting grooves provides for precise repeatability in determining the relative locations of the centers of each of the curved features in an array of curved features.
An enhanced digital line graph design
Guptill, Stephen C.
1990-01-01
In response to increasing information demands on its digital cartographic data, the U.S. Geological Survey has designed an enhanced version of the Digital Line Graph, termed Digital Line Graph - Enhanced (DLG-E). In the DLG-E model, the phenomena represented by geographic and cartographic data are termed entities. Entities represent individual phenomena in the real world. A feature is an abstraction of a set of entities, with the feature description encompassing only selected properties of the entities (typically the properties that have been portrayed cartographically on a map). Buildings, bridges, roads, streams, grasslands, and counties are examples of features. A feature instance, that is, one occurrence of a feature, is described in the digital environment by feature objects and spatial objects. A feature object identifies a feature instance and its nonlocational attributes. Nontopological relationships are associated with feature objects. The locational aspects of the feature instance are represented by spatial objects. Four spatial objects (points, nodes, chains, and polygons) and their topological relationships are defined. To link the locational and nonlocational aspects of the feature instance, a given feature object is associated with (or is composed of) a set of spatial objects. These objects, attributes, and relationships are the components of the DLG-E data model. To establish a domain of features for DLG-E, an approach using a set of classes, or views, of spatial entities was adopted. The five views that were developed are cover, division, ecosystem, geoposition, and morphology. The views are exclusive; each view is a self-contained analytical approach to the entire range of world features. Because each view is independent of the others, a single point on the surface of the Earth can be represented under multiple views. Under the five views, over 200 features were identified and defined. This set constitutes an initial domain of DLG-E features.
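The feature-object/spatial-object split described above maps naturally onto simple record types; a sketch with Python dataclasses, where field names are illustrative rather than the USGS schema:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class SpatialObject:
    """Locational component: one of the four DLG-E spatial object types."""
    kind: str                               # "point" | "node" | "chain" | "polygon"
    coordinates: List[Tuple[float, float]]

@dataclass
class FeatureObject:
    """Nonlocational component: feature identity plus attributes."""
    feature: str                            # e.g. "bridge", "stream", "county"
    view: str                               # cover, division, ecosystem, geoposition, morphology
    attributes: dict = field(default_factory=dict)
    geometry: List[SpatialObject] = field(default_factory=list)

# One feature instance: a feature object composed of spatial objects.
bridge = FeatureObject(
    feature="bridge", view="morphology",
    attributes={"name": "Main St Bridge"},
    geometry=[SpatialObject("chain", [(0.0, 0.0), (0.0, 1.0)])],
)
```

Keeping the locational and nonlocational aspects in separate records is exactly what lets a single point on the Earth's surface be represented under multiple views.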
Falkowski, Andrzej; Jabłońska, Magdalena
2018-01-01
In this study we extended Tversky's research on features of similarity to open sets. Unlike the original closed-set model, in which a feature was shifted between a common and a distinctive set, we investigated how the addition of new features and the deletion of existing features affected similarity judgments. The model was tested empirically in a political context, and we analyzed how positive and negative changes in a candidate's profile affect the similarity of the politician to his or her ideal and opposite counterpart. The results showed a positive-negative asymmetry in comparison judgments, where enhancing negative features (distinctive for an ideal political candidate) had a greater effect on judgments than operations on positive (common) features. However, the effect was not observed for comparisons to a bad politician. Further analyses showed that in the case of a negative reference point, the relationship between similarity judgments and voting intention was mediated by the affective evaluation of the candidate.
NASA Astrophysics Data System (ADS)
Asiedu, Mercy Nyamewaa; Simhal, Anish; Lam, Christopher T.; Mueller, Jenna; Chaudhary, Usamah; Schmitt, John W.; Sapiro, Guillermo; Ramanujam, Nimmi
2018-02-01
The World Health Organization recommends visual inspection with acetic acid (VIA) and/or Lugol's iodine (VILI) for cervical cancer screening in low-resource settings. Human interpretation of diagnostic indicators for visual inspection is qualitative, subjective, and has high inter-observer discordance, which could lead both to adverse outcomes for the patient and to unnecessary follow-ups. In this work, we present a simple method for automatic feature extraction and classification for Lugol's iodine cervigrams acquired with a low-cost, miniature, digital colposcope. Algorithms to preprocess expert-physician-labelled cervigrams and to extract simple but powerful color-based features are introduced. The features are used to train a support vector machine model to classify cervigrams based on expert physician labels. The selected framework achieved a sensitivity, specificity, and accuracy of 89.2%, 66.7% and 80.6% against the majority diagnosis of the expert physicians in discriminating cervical intraepithelial neoplasia (CIN+) relative to normal tissues. The proposed classifier also achieved an area under the curve of 0.84 when trained with the majority diagnosis of the expert physicians. The results suggest that utilizing simple color-based features may enable unbiased automation of VILI cervigrams, opening the door to a full system of low-cost data acquisition complemented with automatic interpretation.
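The color-feature-plus-SVM step can be sketched as follows, assuming per-image summary statistics over HSV channels feed the classifier (the feature choice and toy data are illustrative, not the paper's exact features):

```python
import numpy as np
from skimage.color import rgb2hsv
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def color_features(rgb_img):
    """Mean and standard deviation of each HSV channel: 6 simple features."""
    hsv = rgb2hsv(rgb_img)
    return np.concatenate([hsv.mean(axis=(0, 1)), hsv.std(axis=(0, 1))])

rng = np.random.default_rng(0)
images = rng.random((40, 64, 64, 3))  # toy stand-ins for cervigrams
labels = rng.integers(0, 2, size=40)  # 1 = CIN+, 0 = normal (toy labels)

X = np.array([color_features(im) for im in images])
clf = make_pipeline(StandardScaler(), SVC()).fit(X, labels)
print(clf.predict(X[:5]))
```

HSV separates hue from intensity, which suits VILI imagery where the diagnostic signal is largely a color (iodine staining) change.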
Li, Shelly-Anne; Jeffs, Lianne; Barwick, Melanie; Stevens, Bonnie
2018-05-05
Organizational contextual features have been recognized as important determinants for implementing evidence-based practices across healthcare settings for over a decade. However, implementation scientists have not reached consensus on which features are most important for implementing evidence-based practices. The aims of this review were to identify the most commonly reported organizational contextual features that influence the implementation of evidence-based practices across healthcare settings, and to describe how these features affect implementation. An integrative review was undertaken following literature searches in CINAHL, MEDLINE, PsycINFO, EMBASE, Web of Science, and Cochrane databases from January 2005 to June 2017. English language, peer-reviewed empirical studies exploring organizational context in at least one implementation initiative within a healthcare setting were included. Quality appraisal of the included studies was performed using the Mixed Methods Appraisal Tool. Inductive content analysis informed data extraction and reduction. The search generated 5152 citations. After removing duplicates and applying eligibility criteria, 36 journal articles were included. The majority (n = 20) of the study designs were qualitative, 11 were quantitative, and 5 used a mixed methods approach. Six main organizational contextual features (organizational culture; leadership; networks and communication; resources; evaluation, monitoring and feedback; and champions) were most commonly reported to influence implementation outcomes in the selected studies across a wide range of healthcare settings. We identified six organizational contextual features that appear to be interrelated and work synergistically to influence the implementation of evidence-based practices within an organization. Organizational contextual features did not influence implementation efforts independently from other features. Rather, features were interrelated and often influenced each other in complex, dynamic ways to effect change. These features corresponded to the constructs in the Consolidated Framework for Implementation Research (CFIR), which supports the use of CFIR as a guiding framework for studies that explore the relationship between organizational context and implementation. Organizational culture was most commonly reported to affect implementation. Leadership exerted influence on the five other features, indicating it may be a moderator or mediator that enhances or impedes the implementation of evidence-based practices. Future research should focus on how organizational features interact to influence implementation effectiveness.
Local Feature Selection for Data Classification.
Armanfard, Narges; Reilly, James P; Komeili, Majid
2016-06-01
Typical feature selection methods choose an optimal global feature subset that is applied over all regions of the sample space. In contrast, in this paper we propose a novel localized feature selection (LFS) approach whereby each region of the sample space is associated with its own distinct optimized feature set, which may vary both in membership and size across the sample space. This allows the feature set to optimally adapt to local variations in the sample space. An associated method for measuring the similarities of a query datum to each of the respective classes is also proposed. The proposed method makes no assumptions about the underlying structure of the samples; hence the method is insensitive to the distribution of the data over the sample space. The method is efficiently formulated as a linear programming optimization problem. Furthermore, we demonstrate the method is robust against the over-fitting problem. Experimental results on eleven synthetic and real-world data sets demonstrate the viability of the formulation and the effectiveness of the proposed algorithm. In addition we show several examples where localized feature selection produces better results than a global feature selection method.
Ferentinos, Panagiotis; Rivera, Margarita; Ising, Marcus; Spain, Sarah L; Cohen-Woods, Sarah; Butler, Amy W; Craddock, Nicholas; Owen, Michael J; Korszun, Ania; Jones, Lisa; Jones, Ian; Gill, Michael; Rice, John P; Maier, Wolfgang; Mors, Ole; Rietschel, Marcella; Lucae, Susanne; Binder, Elisabeth B; Preisig, Martin; Tozzi, Federica; Muglia, Pierandrea; Breen, Gerome; Craig, Ian W; Farmer, Anne E; Müller-Myhsok, Bertram; McGuffin, Peter; Lewis, Cathryn M
2014-02-01
Highly recurrent major depressive disorder (MDD) has reportedly increased risk of shifting to bipolar disorder; high recurrence frequency has, therefore, featured as evidence of 'soft bipolarity'. We aimed to investigate the genetic underpinnings of total depressive episode count in recurrent MDD. Our primary sample included 1966 MDD cases with negative family history of bipolar disorder from the RADIANT studies. Total episode count was adjusted for gender, age, MDD duration, study and center before being tested for association with genotype in two separate genome-wide analyses (GWAS), in the full set and in a subset of 1364 cases with positive family history of MDD (FH+). We also calculated polygenic scores from the Psychiatric Genomics Consortium MDD and bipolar disorder studies. Episodicity (especially intermediate episode counts) was an independent index of MDD familial aggregation, replicating previous reports. The GWAS produced no genome-wide significant findings. The strongest signals were detected in the full set at MAGI1 (p = 5.1 × 10^-7), previously associated with bipolar disorder, and in the FH+ subset at STIM1 (p = 3.9 × 10^-6 after imputation), a calcium channel signaling gene. However, these findings failed to replicate in an independent Munich cohort. In the full set polygenic profile analyses, MDD polygenes predicted episodicity better than bipolar polygenes; however, in the FH+ subset, both polygenic scores performed similarly. Episode count was self-reported and, therefore, subject to recall bias. Our findings lend preliminary support to the hypothesis that highly recurrent MDD with FH+ is part of a 'soft bipolar spectrum' but await replication in larger cohorts. © 2013 Published by Elsevier B.V.
Hazes, Bart
2014-02-28
Protein-coding DNA sequences and their corresponding amino acid sequences are routinely used to study relationships between sequence, structure, function, and evolution. The rapidly growing size of sequence databases increases the power of such comparative analyses but it makes it more challenging to prepare high quality sequence data sets with control over redundancy, quality, completeness, formatting, and labeling. Software tools for some individual steps in this process exist but manual intervention remains a common and time consuming necessity. CDSbank is a database that stores both the protein-coding DNA sequence (CDS) and amino acid sequence for each protein annotated in GenBank. CDSbank also stores GenBank feature annotation, a flag to indicate incomplete 5' and 3' ends, full taxonomic data, and a heuristic to rank the scientific interest of each species. This rich information allows fully automated data set preparation with a level of sophistication that aims to meet or exceed manual processing. Defaults ensure ease of use for typical scenarios while allowing great flexibility when needed. Access is via a free web server at http://hazeslab.med.ualberta.ca/CDSbank/. CDSbank presents a user-friendly web server to download, filter, format, and name large sequence data sets. Common usage scenarios can be accessed via pre-programmed default choices, while optional sections give full control over the processing pipeline. Particular strengths are: extract protein-coding DNA sequences just as easily as amino acid sequences, full access to taxonomy for labeling and filtering, awareness of incomplete sequences, and the ability to take one protein sequence and extract all synonymous CDS or identical protein sequences in other species. Finally, CDSbank can also create labeled property files to, for instance, annotate or re-label phylogenetic trees.
ASDF: An Adaptable Seismic Data Format with Full Provenance
NASA Astrophysics Data System (ADS)
Smith, J. A.; Krischer, L.; Tromp, J.; Lefebvre, M. P.
2015-12-01
In order for seismologists to maximize their knowledge of how the Earth works, they must extract the maximum amount of useful information from all recorded seismic data available for their research. This requires assimilating large sets of waveform data, keeping track of vast amounts of metadata, using validated standards for quality control, and automating the workflow in a careful and efficient manner. In addition, there is a growing gap between CPU/GPU speeds and disk access speeds that leads to an I/O bottleneck in seismic workflows. This is made even worse by existing seismic data formats that were not designed for performance and are limited to a few fixed headers for storing metadata. The Adaptable Seismic Data Format (ASDF) is a new data format for seismology that solves the problems with existing seismic data formats and integrates full provenance into the definition. ASDF is a self-describing format that features parallel I/O using the parallel HDF5 library. This makes it a great choice for use on HPC clusters. The format integrates the standards QuakeML for seismic sources and StationXML for receivers. ASDF is suitable for storing earthquake data sets, where all waveforms for a single earthquake are stored in one file, ambient noise cross-correlations, and adjoint sources. The format comes with a user-friendly Python reader and writer that gives seismologists access to a full set of Python tools for seismology. There is also a faster C/Fortran library for integrating ASDF into performance-focused numerical wave solvers, such as SPECFEM3D_GLOBE. Finally, there is a GUI tool for visually exploring the format that provides a flexible interface for both research and educational applications. ASDF is a new seismic data format that offers seismologists high-performance parallel processing, organized and validated contents, and full provenance tracking for automated seismological workflows.
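The Python interface mentioned above is distributed as the pyasdf package; a usage sketch, assuming ObsPy-readable files are at hand (file names are placeholders, and the calls follow the pyasdf documentation as best recalled, so treat the details as illustrative):

```python
import obspy
import pyasdf

ds = pyasdf.ASDFDataSet("example_event.h5")  # create or open an ASDF file

cat = obspy.read_events("quake.xml")         # QuakeML source description
ds.add_quakeml(cat)

stream = obspy.read("recording.mseed")       # waveforms for this event
ds.add_waveforms(stream, tag="raw_recording", event_id=cat[0])

inv = obspy.read_inventory("stations.xml")   # StationXML receiver metadata
ds.add_stationxml(inv)

del ds  # flush and close the underlying HDF5 file
```

Bundling QuakeML, StationXML, and tagged waveforms into one HDF5 container is what gives the format its self-describing, provenance-aware character.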
The interaction of feature and space based orienting within the attention set
Lim, Ahnate; Sinnett, Scott
2014-01-01
The processing of sensory information relies on interacting mechanisms of sustained attention and attentional capture, both of which operate in space and on object features. While evidence indicates that exogenous attentional capture, a mechanism previously understood to be automatic, can be eliminated while concurrently performing a demanding task, we reframe this phenomenon within the theoretical framework of the “attention set” (Most et al., 2005). Consequently, the specific prediction that cuing effects should reappear when feature dimensions of the cue overlap with those in the attention set (i.e., elements of the demanding task) was empirically tested and confirmed using a dual-task paradigm involving both sustained attention and attentional capture, adapted from Santangelo et al. (2007). Participants were required to either detect a centrally presented target presented in a stream of distractors (the primary task), or respond to a spatially cued target (the secondary task). Importantly, the spatial cue could either share features with the target in the centrally presented primary task, or not share any features. Overall, the findings supported the attention set hypothesis showing that a spatial cuing effect was only observed when the peripheral cue shared a feature with objects that were already in the attention set (i.e., the primary task). However, this finding was accompanied by differential attentional orienting dependent on the different types of objects within the attention set, with feature-based orienting occurring for target-related objects, and additional spatial-based orienting for distractor-related objects. PMID:24523682
NASA Astrophysics Data System (ADS)
Tamkin, G.; Schnase, J. L.; Duffy, D.; Li, J.; Strong, S.; Thompson, J. H.
2017-12-01
NASA's efforts to advance climate analytics-as-a-service are making new capabilities available to the research community: (1) a full-featured Reanalysis Ensemble Service (RES) comprising monthly means data from multiple reanalysis data sets, accessible through an enhanced set of extraction, analytic, arithmetic, and intercomparison operations, which are made available through NASA's climate data analytics Web services and our client-side Climate Data Services Python library, CDSlib; (2) a cloud-based, high-performance Virtual Real-Time Analytics Testbed supporting a select set of climate variables, a near real-time capability that enables advanced technologies like Spark and Hadoop-based MapReduce analytics over native NetCDF files; and (3) a WPS-compliant Web service interface to our climate data analytics service that will enable greater interoperability with next-generation systems such as ESGF. The Reanalysis Ensemble Service includes the following:
- A new API that supports full temporal, spatial, and grid-based resolution services with sample queries
- A Docker-ready RES application to deploy across platforms
- Extended capabilities that enable single- and multiple-reanalysis area average, vertical average, re-gridding, standard deviation, and ensemble averages
- Convenient, one-stop shopping for commonly used data products from multiple reanalyses, including basic sub-setting and arithmetic operations (e.g., avg, sum, max, min, var, count, anomaly)
- Full support for the MERRA-2 reanalysis dataset in addition to ECMWF ERA-Interim, NCEP CFSR, JMA JRA-55 and NOAA/ESRL 20CR…
- A Jupyter notebook-based distribution mechanism designed for client use cases that combines CDSlib documentation with interactive scenarios and personalized project management
- Supporting analytic services for NASA GMAO Forward Processing datasets
- Basic uncertainty quantification services that combine heterogeneous ensemble products with comparative observational products (e.g., reanalysis, observational, visualization)
- The ability to compute and visualize multiple reanalyses for ease of inter-comparison
- Automated tools to retrieve and prepare data collections for analytic processing
Robust Point Set Matching for Partial Face Recognition.
Weng, Renliang; Lu, Jiwen; Tan, Yap-Peng
2016-03-01
Over the past three decades, a number of face recognition methods have been proposed in computer vision, and most of them use holistic face images for person identification. In many real-world scenarios, especially some unconstrained environments, human faces might be occluded by other objects, and it is difficult to obtain fully holistic face images for recognition. To address this, we propose a new partial face recognition approach to recognize persons of interest from their partial faces. Given a gallery image and a probe face patch, we first detect keypoints and extract their local textural features. Then, we propose a robust point set matching method to discriminatively match these two extracted local feature sets, where both the textural information and geometrical information of local features are explicitly used for matching simultaneously. Finally, the similarity of two faces is computed as the distance between these two aligned feature sets. Experimental results on four public face data sets show the effectiveness of the proposed approach.
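The keypoint-plus-local-descriptor front end is standard; a sketch with OpenCV's ORB, as a generic illustration of local feature matching rather than the authors' discriminative point-set matcher (image paths are placeholders):

```python
import cv2

gallery = cv2.imread("gallery_face.png", cv2.IMREAD_GRAYSCALE)
probe = cv2.imread("probe_patch.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=500)
kp1, des1 = orb.detectAndCompute(gallery, None)
kp2, des2 = orb.detectAndCompute(probe, None)

# Cross-checked Hamming matching of the binary descriptors.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

# A crude similarity score: mean descriptor distance of the best matches.
n = min(30, len(matches))
score = sum(m.distance for m in matches[:n]) / n
print(f"{len(matches)} matches, mean distance of best {n}: {score:.1f}")
```

Descriptor matching alone uses only textural information; the paper's contribution is to constrain the match with the keypoints' geometric layout as well.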
Interactive visualization to advance earthquake simulation
Kellogg, L.H.; Bawden, G.W.; Bernardin, T.; Billen, M.; Cowgill, E.; Hamann, B.; Jadamec, M.; Kreylos, O.; Staadt, O.; Sumner, D.
2008-01-01
The geological sciences are challenged to manage and interpret increasing volumes of data as observations and simulations increase in size and complexity. For example, simulations of earthquake-related processes typically generate complex, time-varying data sets in two or more dimensions. To facilitate interpretation and analysis of these data sets, evaluate the underlying models, and drive future calculations, we have developed methods of interactive visualization with a special focus on using immersive virtual reality (VR) environments to interact with models of Earth's surface and interior. Virtual mapping tools allow virtual "field studies" in inaccessible regions. Interactive tools allow us to manipulate shapes in order to construct models of geological features for geodynamic models, while feature extraction tools support quantitative measurement of structures that emerge from numerical simulation or field observations, thereby enabling us to improve our interpretation of the dynamical processes that drive earthquakes. VR has traditionally been used primarily as a presentation tool, albeit with active navigation through data. Reaping the full intellectual benefits of immersive VR as a tool for scientific analysis requires building on the method's strengths, that is, using both 3D perception and interaction with observed or simulated data. This approach also takes advantage of the specialized skills of geological scientists who are trained to interpret the often limited geological and geophysical data available from field observations. © Birkhäuser 2008.
An Evolving Worldview: Making Open Source Easy
NASA Technical Reports Server (NTRS)
Rice, Zachary
2017-01-01
NASA Worldview is an interactive interface for browsing full-resolution, global satellite imagery. Worldview supports an open data policy so that academia, private industry and the general public can use NASA's satellite data to address Earth science related issues. Worldview was open sourced in 2014. By shifting to an open source approach, the Worldview application has evolved to better serve end-users. Project developers are able to have discussions with end-users and community developers to understand issues and develop new features. New developers are able to track upcoming features, collaborate on them and make their own contributions. Getting new developers to contribute to the project has been one of the most important and difficult aspects of open sourcing Worldview. The team has focused on making the installation of Worldview simple, to reduce the initial learning curve and make contributing code easy. One way we have addressed this is through a simplified setup process. Our setup documentation includes a set of prerequisites and a set of straightforward commands to clone, configure, install and run. This presentation will emphasize our focus on simplifying and standardizing Worldview's open source code so more people are able to contribute. The more people who contribute, the better the application will become over time.
NASA Technical Reports Server (NTRS)
Bradley, D. B.; Cain, J. B., III; Williard, M. W.
1978-01-01
The task was to evaluate the ability of a set of timing/synchronization subsystem features to provide a set of desirable characteristics for the evolving Defense Communications System digital communications network. The set of features related to the approaches by which timing/synchronization information could be disseminated throughout the network and the manner in which this information could be utilized to provide a synchronized network. These features, which could be utilized in a large number of different combinations, included mutual control, directed control, double-ended reference links, independence of clock error measurement and correction, phase reference combining, and self-organization.
Detecting Parkinson's disease from sustained phonation and speech signals.
Vaiciukynas, Evaldas; Verikas, Antanas; Gelzinis, Adas; Bacauskiene, Marija
2017-01-01
This study investigates signals from sustained phonation and text-dependent speech modalities for Parkinson's disease screening. Phonation corresponds to the vowel /a/ voicing task and speech to the pronunciation of a short sentence in the Lithuanian language. Signals were recorded through two channels simultaneously, namely, acoustic cardioid (AC) and smartphone (SP) microphones. Additional modalities were obtained by splitting the speech recording into voiced and unvoiced parts. Information in each modality is summarized by 18 well-known audio feature sets. Random forest (RF) is used as a machine learning algorithm, both for individual feature sets and for decision-level fusion. Detection performance is measured by the out-of-bag equal error rate (EER) and the cost of the log-likelihood ratio. The Essentia audio feature set was the best using the AC speech modality and the YAAFE audio feature set was the best using the SP unvoiced modality, achieving EERs of 20.30% and 25.57%, respectively. Fusion of all feature sets and modalities resulted in an EER of 19.27% for the AC and 23.00% for the SP channel. Non-linear projection of an RF-based proximity matrix into the 2D space enriched medical decision support by visualization.
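The equal error rate used above is the operating point where false acceptance equals false rejection; a sketch of computing it from detector scores with scikit-learn's ROC utilities (the toy scores are illustrative):

```python
import numpy as np
from sklearn.metrics import roc_curve

def equal_error_rate(y_true, scores):
    """EER: the point on the ROC curve where FPR equals FNR (1 - TPR)."""
    fpr, tpr, _ = roc_curve(y_true, scores)
    fnr = 1.0 - tpr
    idx = np.nanargmin(np.abs(fpr - fnr))
    return (fpr[idx] + fnr[idx]) / 2.0

rng = np.random.default_rng(0)
healthy = rng.normal(0.0, 1.0, 200)   # toy detector scores, controls
patients = rng.normal(1.5, 1.0, 200)  # toy detector scores, patients
y = np.r_[np.zeros(200), np.ones(200)]
print(equal_error_rate(y, np.r_[healthy, patients]))
```

Because the EER is threshold-free, it allows fair comparison of the 18 feature sets and the fused systems without choosing an operating point in advance.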
A Comparative Study of Workplace Bullying Among Public and Private Employees in Europe.
Ariza-Montes, Antonio; Leal-Rodríguez, Antonio L; Leal-Millán, Antonio G
2015-06-01
Workplace bullying emerges from a set of individual, organizational, and contextual factors. The purpose of this article is hence to identify the influence of these factors among public and private employees. The study is carried out as a statistical-empirical cross-sectional study. The database used was obtained from the 5th European Working Conditions Survey 2010. The results reveal a common core with respect to the factors that determine workplace bullying. Despite this common base that integrates both models, the distinctive features of the harassed employee within the public sector relate to age, full-time work, the greater amount of night-time work associated with certain public service professions, and a lower level of motivation. The present work summarizes a set of implications and proposes that, under normal conditions, workplace bullying could be reduced if job demands are limited and job resources are increased.
Breast Reference Set Application: Karen Anderson-ASU (2014) — EDRN Public Portal
In order to increase the predictive value of tumor-specific antibodies for use as immunodiagnostics, our EDRN BDL has developed a novel protein microarray technology, termed Nucleic Acid Protein Programmable Array (NAPPA), which circumvents many of the limitations of traditional protein microarrays. NAPPA arrays are generated by printing full-length cDNAs encoding the target proteins at each feature of the array. The proteins are then transcribed and translated by a cell-free system and immobilized in situ using epitope tags fused to the proteins. Sera are added, and bound IgG is detected by standard secondary reagents. Using a sequential screening strategy to select autoantibodies (AAb) from 4,988 candidate tumor antigens, we have identified 28 potential AAb biomarkers for the early detection of breast cancer, and here we propose to evaluate these biomarkers using the EDRN Breast Cancer Reference Set.
Two-photon calcium imaging during fictive navigation in virtual environments
Ahrens, Misha B.; Huang, Kuo Hua; Narayan, Sujatha; Mensh, Brett D.; Engert, Florian
2013-01-01
A full understanding of nervous system function requires recording from large populations of neurons during naturalistic behaviors. Here we enable paralyzed larval zebrafish to fictively navigate two-dimensional virtual environments while we record optically from many neurons with two-photon imaging. Electrical recordings from motor nerves in the tail are decoded into intended forward swims and turns, which are used to update a virtual environment displayed underneath the fish. Several behavioral features—such as turning responses to whole-field motion and dark avoidance—are well-replicated in this virtual setting. We readily observed neuronal populations in the hindbrain with laterally selective responses that correlated with right or left optomotor behavior. We also observed neurons in the habenula, pallium, and midbrain with response properties specific to environmental features. Beyond single-cell correlations, the classification of network activity in such virtual settings promises to reveal principles of brainwide neural dynamics during behavior. PMID:23761738
Flowering time and seed dormancy control use external coincidence to generate life history strategy.
Springthorpe, Vicki; Penfield, Steven
2015-03-31
Climate change is accelerating plant developmental transitions coordinated with the seasons in temperate environments. To understand the importance of these timing advances for a stable life history strategy, we constructed a full life cycle model of Arabidopsis thaliana. Modelling and field data reveal that a cryptic function of flowering time control is to limit seed set of winter annuals to an ambient temperature window which coincides with a temperature-sensitive switch in seed dormancy state. This coincidence is predicted to be conserved independent of climate at the expense of flowering date, suggesting that temperature control of flowering time has evolved to constrain seed set environment and therefore frequency of dormant and non-dormant seed states. We show that late flowering can disrupt this bet-hedging germination strategy. Our analysis shows that life history modelling can reveal hidden fitness constraints and identify non-obvious selection pressures as emergent features.
A target recognition method for maritime surveillance radars based on hybrid ensemble selection
NASA Astrophysics Data System (ADS)
Fan, Xueman; Hu, Shengliang; He, Jingbo
2017-11-01
In order to improve the generalisation ability of the maritime surveillance radar, a novel ensemble selection technique, termed Optimisation and Dynamic Selection (ODS), is proposed. During the optimisation phase, the non-dominated sorting genetic algorithm II for multi-objective optimisation is used to find the Pareto front, i.e. a set of ensembles of classifiers representing different tradeoffs between the classification error and diversity. During the dynamic selection phase, the meta-learning method is used to predict whether a candidate ensemble is competent enough to classify a query instance based on three different aspects, namely, feature space, decision space and the extent of consensus. The classification performance and time complexity of ODS are compared against nine other ensemble methods using a self-built full polarimetric high resolution range profile data-set. The experimental results clearly show the effectiveness of ODS. In addition, the influence of the selection of diversity measures is studied concurrently.
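The optimisation phase can be illustrated with a much simpler stand-in: enumerate small candidate ensembles, score each by majority-vote error and mean pairwise disagreement (a common diversity measure), and keep the non-dominated set. This hedged sketch uses brute-force enumeration where the paper uses NSGA-II, and the classifier predictions are synthetic.

```python
# Brute-force stand-in for the optimisation phase: keep the Pareto front over
# (classification error, -diversity), both to be minimised.
import itertools
import numpy as np

rng = np.random.default_rng(1)
n_clf, n_samples = 8, 300
labels = rng.integers(0, 2, n_samples)
flip = rng.random((n_clf, n_samples)) < 0.3        # each classifier ~70% correct
preds = np.where(flip, 1 - labels, labels)

def score(idx):
    vote = (preds[list(idx)].mean(axis=0) > 0.5).astype(int)  # majority vote
    error = np.mean(vote != labels)
    diversity = np.mean([np.mean(preds[a] != preds[b])
                         for a, b in itertools.combinations(idx, 2)])
    return error, -diversity

# Odd ensemble sizes avoid voting ties.
candidates = [c for r in (3, 5) for c in itertools.combinations(range(n_clf), r)]
scores = [score(c) for c in candidates]
front = [c for c, s in zip(candidates, scores)
         if not any(o[0] <= s[0] and o[1] <= s[1] and o != s for o in scores)]
print(len(front), "Pareto-optimal ensembles")
```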
Bayesian learning of visual chunks by human observers
Orbán, Gergő; Fiser, József; Aslin, Richard N.; Lengyel, Máté
2008-01-01
Efficient and versatile processing of any hierarchically structured information requires a learning mechanism that combines lower-level features into higher-level chunks. We investigated this chunking mechanism in humans with a visual pattern-learning paradigm. We developed an ideal learner based on Bayesian model comparison that extracts and stores only those chunks of information that are minimally sufficient to encode a set of visual scenes. Our ideal Bayesian chunk learner not only reproduced the results of a large set of previous empirical findings in the domain of human pattern learning but also made a key prediction that we confirmed experimentally. In accordance with Bayesian learning but contrary to associative learning, human performance was well above chance when pair-wise statistics in the exemplars contained no relevant information. Thus, humans extract chunks from complex visual patterns by generating accurate yet economical representations and not by encoding the full correlational structure of the input. PMID:18268353
Model-Based Learning of Local Image Features for Unsupervised Texture Segmentation
NASA Astrophysics Data System (ADS)
Kiechle, Martin; Storath, Martin; Weinmann, Andreas; Kleinsteuber, Martin
2018-04-01
Features that capture well the textural patterns of a certain class of images are crucial for the performance of texture segmentation methods. The manual selection of features or designing new ones can be a tedious task. Therefore, it is desirable to automatically adapt the features to a certain image or class of images. Typically, this requires a large set of training images with similar textures and ground truth segmentation. In this work, we propose a framework to learn features for texture segmentation when no such training data is available. The cost function for our learning process is constructed to match a commonly used segmentation model, the piecewise constant Mumford-Shah model. This means that the features are learned such that they provide an approximately piecewise constant feature image with a small jump set. Based on this idea, we develop a two-stage algorithm which first learns suitable convolutional features and then performs a segmentation. We note that the features can be learned from a small set of images, from a single image, or even from image patches. The proposed method achieves a competitive rank in the Prague texture segmentation benchmark, and it is effective for segmenting histological images.
Peer-Based Social Media Features in Behavior Change Interventions: Systematic Review
Weal, Mark; Morrison, Leanne; Yardley, Lucy
2018-01-01
Background Incorporating social media features into digital behavior change interventions (DBCIs) has the potential to contribute positively to their success. However, the lack of clear design principles to describe and guide the use of these features in behavioral interventions limits cross-study comparisons of their uses and effects. Objective The aim of this study was to provide a systematic review of DBCIs targeting modifiable behavioral risk factors that have included social media features as part of their intervention infrastructure. A taxonomy of social media features is presented to inform the development, description, and evaluation of behavioral interventions. Methods Search terms were used in 8 databases to identify DBCIs that incorporated social media features and targeted tobacco smoking, diet and nutrition, physical activities, or alcohol consumption. The screening and review process was performed by 2 independent researchers. Results A total of 5264 articles were screened, and 143 articles describing a total of 134 studies were retained for full review. The majority of studies (70%) reported positive outcomes, followed by 28% finding no effects with regard to their respective objectives and hypothesis, and 2% of the studies found that their interventions had negative outcomes. Few studies reported on the association between the inclusion of social media features and intervention effect. A taxonomy of social media features used in behavioral interventions has been presented with 36 social media features organized under 7 high-level categories. The taxonomy has been used to guide the analysis of this review. Conclusions Although social media features are commonly included in DBCIs, there is an acute lack of information with respect to their effect on outcomes and a lack of clear guidance to inform the selection process based on the features’ suitability for the different behaviors. The proposed taxonomy along with the set of recommendations included in this review will support future research aimed at isolating and reporting the effects of social media features on DBCIs, cross-study comparisons, and evaluations. PMID:29472174
Content validity of the DSM-IV borderline and narcissistic personality disorder criteria sets.
Blais, M A; Hilsenroth, M J; Castlebury, F D
1997-01-01
This study sought to empirically evaluate the content validity of the newly revised DSM-IV narcissistic personality disorder (NPD) and borderline personality disorder (BPD) criteria sets. Using the essential features of each disorder as construct definitions, factor analysis was used to determine how adequately the criteria sets covered the constructs. In addition, this empirical investigation sought to: 1) help define the dimensions underlying these polythetic disorders; 2) identify core features of each diagnosis; and 3) highlight the characteristics that may be most useful in diagnosing these two disorders. Ninety-one outpatients meeting DSM-IV criteria for a personality disorder (PD) were identified through a retrospective analysis of chart information. Records of these 91 patients were independently rated on all of the BPD and NPD symptom criteria for the DSM-IV. Acceptable interrater reliability (kappa estimates) was obtained both for the presence or absence of a PD and for the BPD and NPD symptom criteria. The factor analysis, performed separately for each disorder, identified a three-factor solution for both the DSM-IV BPD and NPD criteria sets. The results of this study provide strong support for the content validity of the NPD criteria set and moderate support for the content validity of the BPD criteria set. Three domains were found to comprise the BPD criteria set, with the essential features of interpersonal and identity instability forming one domain, and impulsivity and affective instability each identified as separate domains. Factor analysis of the NPD criteria set found three factors basically corresponding to the essential features of grandiosity, lack of empathy, and need for admiration. Therefore, the NPD criteria set adequately covers the essential or defining features of the disorder.
Predicting a small molecule-kinase interaction map: A machine learning approach
2011-01-01
Background We present a machine learning approach to the problem of protein ligand interaction prediction. We focus on a set of binding data obtained from 113 different protein kinases and 20 inhibitors. It was attained through ATP site-dependent binding competition assays and constitutes the first available dataset of this kind. We extract information about the investigated molecules from various data sources to obtain an informative set of features. Results A Support Vector Machine (SVM) as well as a decision tree algorithm (C5/See5) is used to learn models based on the available features which in turn can be used for the classification of new kinase-inhibitor pair test instances. We evaluate our approach using different feature sets and parameter settings for the employed classifiers. Moreover, the paper introduces a new way of evaluating predictions in such a setting, where different amounts of information about the binding partners can be assumed to be available for training. Results on an external test set are also provided. Conclusions In most of the cases, the presented approach clearly outperforms the baseline methods used for comparison. Experimental results indicate that the applied machine learning methods are able to detect a signal in the data and predict binding affinity to some extent. For SVMs, the binding prediction can be improved significantly by using features that describe the active site of a kinase. For C5, besides diversity in the feature set, alignment scores of conserved regions turned out to be very useful. PMID:21708012
Radio-nuclide mixture identification using medium energy resolution detectors
Nelson, Karl Einar
2013-09-17
According to one embodiment, a method for identifying radio-nuclides includes receiving spectral data, extracting a feature set from the spectral data comparable to a plurality of templates in a template library, and using a branch and bound method to determine a probable template match based on the feature set and templates in the template library. In another embodiment, a device for identifying unknown radio-nuclides includes a processor, a multi-channel analyzer, and a memory operatively coupled to the processor, the memory having computer readable code stored thereon. The computer readable code is configured, when executed by the processor, to receive spectral data, to extract a feature set from the spectral data comparable to a plurality of templates in a template library, and to use a branch and bound method to determine a probable template match based on the feature set and templates in the template library.
Local Multi-Grouped Binary Descriptor With Ring-Based Pooling Configuration and Optimization.
Gao, Yongqiang; Huang, Weilin; Qiao, Yu
2015-12-01
Local binary descriptors are attracting increasing attention due to their great advantages in computational speed, which enable real-time performance in numerous image/vision applications. Various methods have been proposed to learn data-dependent binary descriptors. However, most existing binary descriptors aim overly at computational simplicity at the expense of significant information loss, which causes ambiguity in similarity measurement using the Hamming distance. In this paper, considering that multiple features might share complementary information, we present a novel local binary descriptor, referred to as the ring-based multi-grouped descriptor (RMGD), to successfully bridge the performance gap between current binary and floating-point descriptors. Our contributions are twofold. First, we introduce a new pooling configuration based on spatial ring-region sampling, allowing for binary tests on the full set of pairwise regions with different shapes, scales, and distances. This leads to a more meaningful description than the existing methods, which normally apply a limited set of pooling configurations. Then, an extended AdaBoost is proposed for efficient bit selection by emphasizing high variance and low correlation, achieving a highly compact representation. Second, the RMGD is computed from multiple image properties from which binary strings are extracted. We cast multi-grouped feature integration as a rankSVM or sparse support vector machine learning problem, so that different features can compensate strongly for each other, which is the key to discriminativeness and robustness. The performance of the RMGD was evaluated on a number of publicly available benchmarks, where the RMGD outperforms the state-of-the-art binary descriptors significantly.
NASA Astrophysics Data System (ADS)
Jaferzadeh, Keyvan; Moon, Inkyu
2016-12-01
The classification of erythrocytes plays an important role in the field of hematological diagnosis, specifically blood disorders. Since the biconcave shape of the red blood cell (RBC) is altered during the different stages of hematological disorders, we believe that three-dimensional (3-D) morphological features of erythrocytes provide better classification results than conventional two-dimensional (2-D) features. Therefore, we introduce a set of 3-D features related to the morphological and chemical properties of the RBC profile and evaluate the discrimination power of these features against 2-D features with a neural network classifier. The 3-D features include erythrocyte surface area, volume, average cell thickness, sphericity index, sphericity coefficient and functionality factor, MCH and MCHSD, and two newly introduced features extracted from the ring section of the RBC at the single-cell level. In contrast, the 2-D features are RBC projected surface area, perimeter, radius, elongation, and the ratio of projected surface area to perimeter. All features are obtained from images visualized by off-axis digital holographic microscopy with a numerical reconstruction algorithm, and four categories of RBC are considered: biconcave (doughnut shape), flat-disc, stomatocyte, and echinospherocyte. Our experimental results demonstrate that the 3-D features can be more useful in RBC classification than the 2-D features. Finally, we choose the best feature set from the 2-D and 3-D features by a sequential forward feature selection technique, which yields better discrimination results. We believe that the final feature set, evaluated with a neural network classification strategy, can improve RBC classification accuracy.
System and method for the detection of anomalies in an image
Prasad, Lakshman; Swaminarayan, Sriram
2013-09-03
Preferred aspects of the present invention can include receiving a digital image at a processor; segmenting the digital image into a hierarchy of feature layers comprising one or more fine-scale features defining a foreground object embedded in one or more coarser-scale features defining a background to the one or more fine-scale features in the segmentation hierarchy; detecting a first fine-scale foreground feature as an anomaly with respect to a first background feature within which it is embedded; and constructing an anomalous feature layer by synthesizing spatially contiguous anomalous fine-scale features. Additional preferred aspects of the present invention can include detecting non-pervasive changes between sets of images in response at least in part to one or more difference images between the sets of images.
GATOR: Requirements capturing of telephony features
NASA Technical Reports Server (NTRS)
Dankel, Douglas D., II; Walker, Wayne; Schmalz, Mark
1992-01-01
We are developing a natural language-based, requirements gathering system called GATOR (for the GATherer Of Requirements). GATOR assists in the development of more accurate and complete specifications of new telephony features. GATOR interacts with a feature designer who describes a new feature, set of features, or capability to be implemented. The system aids this individual in the specification process by asking for clarifications when potential ambiguities are present, by identifying potential conflicts with other existing features, and by presenting its understanding of the feature to the designer. Through user interaction with a model of the existing telephony feature set, GATOR constructs a formal representation of the new, 'to be implemented' feature. Ultimately GATOR will produce a requirements document and will maintain an internal representation of this feature to aid in future design and specification. This paper consists of three sections that describe (1) the structure of GATOR, (2) POND, GATOR's internal knowledge representation language, and (3) current research issues.
The value of teaching about geomorphology in non-traditional settings
NASA Astrophysics Data System (ADS)
Davis, R. Laurence
2002-10-01
Academics usually teach about geomorphology in the classroom, where the audience is enthusiastic, but generally small. Less traditional settings offer opportunities to reach a wider audience, one that is equally enthusiastic, given its love of geomorphic features in the National Parks, but one which has little knowledge of the science behind what they are seeing. I have "taught" geomorphology in four non-traditional settings: at a summer camp, a state wildlife refuge, on community field trips, and at meetings for clubs and government boards. This paper discusses my experiences and offers suggestions to others who may wish to follow this less-traveled educational path. As Head of Nature Programs at Camp Pemigewassett in New Hampshire, I have worked, over the last 33 years, with thousands of campers ranging from 8 to 15 years old. Our setting, in a glaciated valley on a small lake, exhibits a wide range of geomorphic features and offers many opportunities for direct learning through field investigations. I have found that even 8-year olds can do real science, if we avoid the jargon. Once "taught" they carry their knowledge about landforms and processes with them and eagerly share it with their friends and family on outings and trips, thus reaching an even wider public. Parks, wildlife refuges, nature preserves, and other similar areas generally have nature trails, often with educational information about the environment. Generally, interpretive signs are prepared by biologists and the content ignores the site's physical features, as well as the connections between ecological communities and the underlying geology and geomorphology. My students and I have addressed this situation at two places in Connecticut, one a state wildlife management area, also used for training teachers to teach Environmental Education, and the other, a town recreation area. We catalogued the geomorphic features, looked at relationships of the community level ecology to those features, and prepared interpretive signs that added this perspective to the trails. The public response has been extremely favorable. Geomorphology can also be taught by leading field trips for community organizations. I have done this twice, once for the Manchester (NH) Historical Society and once for a small watershed association. The attendance and interest surprised me. We finally had to limit the Manchester trip to one full busload (˜45) and the watershed trip, which was part of a "trails day," drew over 90 people. Finally, I have found that organizations such as Sierra Club chapters and town conservation boards are frequently looking for speakers for their periodic meetings. Why not a geomorphologist? After all, much of what conservationists do is related to what geomorphologists do. I have given several of these presentations and the receptions have always been enthusiastic. While the work involved in preparing to teach in one of these non-traditional settings is frequently substantial, the rewards are equally large. It is a way to reach masses of people who know little about the science of geomorphology and to demonstrate its importance to them. Taking our message directly to the public in these settings is an effective way to put geomorphology in the public eye.
Schwämmle, Veit; León, Ileana Rodríguez; Jensen, Ole Nørregaard
2013-09-06
Large-scale quantitative analyses of biological systems are often performed with few replicate experiments, leading to multiple nonidentical data sets due to missing values. For example, mass spectrometry driven proteomics experiments are frequently performed with few biological or technical replicates due to sample scarcity, duty-cycle or sensitivity constraints, or limited capacity of the available instrumentation, leading to incomplete results where detection of significant feature changes becomes a challenge. This problem is further exacerbated for the detection of significant changes at the peptide level, for example, in phospho-proteomics experiments. In order to assess the extent of this problem and the implications for large-scale proteome analysis, we investigated and optimized the performance of three statistical approaches using simulated and experimental data sets with varying numbers of missing values. We applied three tools for the detection of significantly changing features in simulated and experimental proteomics data sets with missing values: the standard t test, the moderated t test (also known as limma), and rank products. The rank product method was improved to work with data sets containing missing values. Extensive analysis of simulated and experimental data sets revealed that the performance of the statistical analysis tools depended on simple properties of the data sets. High-confidence results were obtained by using the limma and rank products methods for analyses of triplicate data sets that exhibited more than 1000 features and more than 50% missing values. The maximum number of differentially represented features was identified by using the limma and rank products methods in a complementary manner. We therefore recommend combined usage of these methods as a novel and optimal way to detect significantly changing features in these data sets. This approach is suitable for large quantitative data sets from stable isotope labeling and mass spectrometry experiments and should be applicable to large data sets of any type. An R script that implements the improved rank products algorithm and the combined analysis is available.
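The rank product statistic at the core of the improved method can be sketched as follows. The paper provides an R implementation, so this Python version is only an illustration of the idea: rank features within each replicate, normalize by the number of observed features, and take the geometric mean over the replicates in which a feature was actually measured.

```python
# A hedged sketch of a rank product statistic adapted to missing values.
import numpy as np

rng = np.random.default_rng(4)
fc = rng.normal(size=(1000, 3))              # log fold changes, 3 replicates
fc[rng.random(fc.shape) < 0.3] = np.nan      # 30% missing values

ranks = np.full(fc.shape, np.nan)
for j in range(fc.shape[1]):
    obs = ~np.isnan(fc[:, j])
    # Rank 1 = strongest up-regulation within this replicate.
    ranks[obs, j] = (-fc[obs, j]).argsort().argsort() + 1
    ranks[obs, j] /= obs.sum()               # normalize by observed count

# Geometric mean of relative ranks over the replicates where a feature appears.
rp = np.exp(np.nanmean(np.log(ranks), axis=1))
print("top candidate features:", np.argsort(rp)[:5])
```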
Feature engineering for MEDLINE citation categorization with MeSH.
Jimeno Yepes, Antonio Jose; Plaza, Laura; Carrillo-de-Albornoz, Jorge; Mork, James G; Aronson, Alan R
2015-04-08
Research in biomedical text categorization has mostly used the bag-of-words representation. Other more sophisticated representations of text based on syntactic, semantic and argumentative properties have been less studied. In this paper, we evaluate the impact of different text representations of biomedical texts as features for reproducing the MeSH annotations of some of the most frequent MeSH headings. In addition to unigrams and bigrams, these features include noun phrases, citation meta-data, citation structure, and semantic annotation of the citations. Traditional features like unigrams and bigrams exhibit strong performance compared to other feature sets. Little or no improvement is obtained when using meta-data or citation structure. Noun phrases are too sparse and thus have lower performance compared to more traditional features. Conceptual annotation of the texts by MetaMap shows similar performance compared to unigrams, but adding concepts from the UMLS taxonomy does not improve the performance of using only mapped concepts. The combination of all the features performs largely better than any individual feature set considered. In addition, this combination improves the performance of a state-of-the-art MeSH indexer. Concerning the machine learning algorithms, we find that those that are more resilient to class imbalance largely obtain better performance. We conclude that even though traditional features such as unigrams and bigrams have strong performance compared to other features, it is possible to combine them to effectively improve the performance of the bag-of-words representation. We have also found that the combination of the learning algorithm and feature sets has an influence in the overall performance of the system. Moreover, using learning algorithms resilient to class imbalance largely improves performance. However, when using a large set of features, consideration needs to be taken with algorithms due to the risk of over-fitting. Specific combinations of learning algorithms and features for individual MeSH headings could further increase the performance of an indexing system.
Falkowski, Andrzej; Jabłońska, Magdalena
2018-01-01
In this study we extended Tversky's research on features of similarity by applying it to open sets. Unlike the original closed-set model, in which a feature was shifted between a common and a distinctive set, we investigated how the addition of new features and the deletion of existing features affected similarity judgments. The model was tested empirically in a political context, and we analyzed how positive and negative changes in a candidate's profile affect the similarity of the politician to his or her ideal and opposite counterpart. The results showed a positive-negative asymmetry in comparison judgments, where enhancing negative features (distinctive for an ideal political candidate) had a greater effect on judgments than operations on positive (common) features. However, the effect was not observed for comparisons to a bad politician. Further analyses showed that in the case of a negative reference point, the relationship between similarity judgments and voting intention was mediated by the affective evaluation of the candidate. PMID:29535663
Kramer, Robin S S; Manesi, Zoi; Towler, Alice; Reynolds, Michael G; Burton, A Mike
2018-01-01
As faces become familiar, we come to rely more on their internal features for recognition and matching tasks. Here, we assess whether this same pattern is also observed for a card sorting task. Participants sorted photos showing either the full face, only the internal features, or only the external features into multiple piles, one pile per identity. In Experiments 1 and 2, we showed the standard advantage for familiar faces: sorting was more accurate and showed very few errors in comparison with unfamiliar faces. However, for both familiar and unfamiliar faces, sorting was less accurate for external features and equivalent for internal features and full faces. In Experiment 3, we asked whether external features can ever be used to make an accurate sort. Using familiar faces and instructions on the number of identities present, we nevertheless found worse performance for the external in comparison with the internal features, suggesting that less identity information was available in the former. Taken together, we show that full faces and internal features are similarly informative with regard to identity. In comparison, external features contain less identity information and produce worse card sorting performance. This research extends current thinking on the shift in focus, both in attention and importance, toward the internal features and away from the external features as familiarity with a face increases.
Intrusion detection using rough set classification.
Zhang, Lian-hua; Zhang, Guan-hua; Zhang, Jie; Bai, Ying-cai
2004-09-01
Recently, machine learning-based intrusion detection approaches have been the subject of extensive research because they can detect both misuse and anomaly. In this paper, rough set classification (RSC), a modern learning algorithm, is used to rank the features extracted for detecting intrusions and to generate intrusion detection models. Feature ranking is a very critical step when building the model. RSC performs feature ranking before generating rules, converting the feature ranking task into a minimal hitting set problem that is addressed using a genetic algorithm (GA). Classical approaches using a Support Vector Machine (SVM) do this by executing many iterations, each of which removes one useless feature. Compared with those methods, our method avoids many iterations. In addition, a hybrid genetic algorithm is proposed to increase the convergence speed and decrease the training time of RSC. The models generated by RSC take the form of "IF-THEN" rules, which have the advantage of being explainable. Tests and comparison of RSC with SVM on DARPA benchmark data showed that for Probe and DoS attacks both RSC and SVM yielded highly accurate results (greater than 99% accuracy on the testing set).
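The hitting-set reformulation is easy to illustrate. The paper solves it with a GA; as a hedged stand-in, the classical greedy approximation below repeatedly picks the feature that hits the most remaining discernibility sets. The toy sets are invented for illustration only.

```python
# Greedy approximation to minimal hitting set (the paper uses a GA instead).
def greedy_hitting_set(sets_to_hit):
    remaining = [set(s) for s in sets_to_hit]
    chosen = set()
    while remaining:
        # Count how many unhit sets each candidate feature would cover.
        counts = {}
        for s in remaining:
            for f in s:
                counts[f] = counts.get(f, 0) + 1
        best = max(counts, key=counts.get)
        chosen.add(best)
        remaining = [s for s in remaining if best not in s]
    return chosen

# Each inner set lists the features that discern one pair of training examples.
discernibility = [{"dur", "flags"}, {"flags", "srv_count"}, {"dur", "bytes"}]
print(greedy_hitting_set(discernibility))   # {'dur', 'flags'}
```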
NASA Astrophysics Data System (ADS)
Planck Collaboration; Aghanim, N.; Akrami, Y.; Ashdown, M.; Aumont, J.; Baccigalupi, C.; Ballardini, M.; Banday, A. J.; Barreiro, R. B.; Bartolo, N.; Basak, S.; Benabed, K.; Bersanelli, M.; Bielewicz, P.; Bonaldi, A.; Bonavera, L.; Bond, J. R.; Borrill, J.; Bouchet, F. R.; Burigana, C.; Calabrese, E.; Cardoso, J.-F.; Challinor, A.; Chiang, H. C.; Colombo, L. P. L.; Combet, C.; Crill, B. P.; Curto, A.; Cuttaia, F.; de Bernardis, P.; de Rosa, A.; de Zotti, G.; Delabrouille, J.; Di Valentino, E.; Dickinson, C.; Diego, J. M.; Doré, O.; Ducout, A.; Dupac, X.; Dusini, S.; Efstathiou, G.; Elsner, F.; Enßlin, T. A.; Eriksen, H. K.; Fantaye, Y.; Finelli, F.; Forastieri, F.; Frailis, M.; Franceschi, E.; Frolov, A.; Galeotta, S.; Galli, S.; Ganga, K.; Génova-Santos, R. T.; Gerbino, M.; González-Nuevo, J.; Górski, K. M.; Gratton, S.; Gruppuso, A.; Gudmundsson, J. E.; Herranz, D.; Hivon, E.; Huang, Z.; Jaffe, A. H.; Jones, W. C.; Keihänen, E.; Keskitalo, R.; Kiiveri, K.; Kim, J.; Kisner, T. S.; Knox, L.; Krachmalnicoff, N.; Kunz, M.; Kurki-Suonio, H.; Lagache, G.; Lamarre, J.-M.; Lasenby, A.; Lattanzi, M.; Lawrence, C. R.; Le Jeune, M.; Levrier, F.; Lewis, A.; Liguori, M.; Lilje, P. B.; Lilley, M.; Lindholm, V.; López-Caniego, M.; Lubin, P. M.; Ma, Y.-Z.; Macías-Pérez, J. F.; Maggio, G.; Maino, D.; Mandolesi, N.; Mangilli, A.; Maris, M.; Martin, P. G.; Martínez-González, E.; Matarrese, S.; Mauri, N.; McEwen, J. D.; Meinhold, P. R.; Mennella, A.; Migliaccio, M.; Millea, M.; Miville-Deschênes, M.-A.; Molinari, D.; Moneti, A.; Montier, L.; Morgante, G.; Moss, A.; Narimani, A.; Natoli, P.; Oxborrow, C. A.; Pagano, L.; Paoletti, D.; Partridge, B.; Patanchon, G.; Patrizii, L.; Pettorino, V.; Piacentini, F.; Polastri, L.; Polenta, G.; Puget, J.-L.; Rachen, J. P.; Racine, B.; Reinecke, M.; Remazeilles, M.; Renzi, A.; Rocha, G.; Rossetti, M.; Roudier, G.; Rubiño-Martín, J. A.; Ruiz-Granados, B.; Salvati, L.; Sandri, M.; Savelainen, M.; Scott, D.; Sirignano, C.; Sirri, G.; Stanco, L.; Suur-Uski, A.-S.; Tauber, J. A.; Tavagnacco, D.; Tenti, M.; Toffolatti, L.; Tomasi, M.; Tristram, M.; Trombetti, T.; Valiviita, J.; Van Tent, F.; Vielva, P.; Villa, F.; Vittorio, N.; Wandelt, B. D.; Wehus, I. K.; White, M.; Zacchei, A.; Zonca, A.
2017-11-01
The six parameters of the standard ΛCDM model have best-fit values derived from the Planck temperature power spectrum that are shifted somewhat from the best-fit values derived from WMAP data. These shifts are driven by features in the Planck temperature power spectrum at angular scales that had never before been measured to cosmic-variance level precision. We have investigated these shifts to determine whether they are within the range of expectation and to understand their origin in the data. Taking our parameter set to be the optical depth of the reionized intergalactic medium τ, the baryon density ω_b, the matter density ω_m, the angular size of the sound horizon θ∗, the spectral index of the primordial power spectrum n_s, and A_s e^(-2τ) (where A_s is the amplitude of the primordial power spectrum), we have examined the change in best-fit values between a WMAP-like large angular-scale data set (with multipole moment ℓ < 800 in the Planck temperature power spectrum) and an all angular-scale data set (ℓ < 2500 in the Planck temperature power spectrum), each with a prior on τ of 0.07 ± 0.02. We find that the shifts, in units of the 1σ expected dispersion for each parameter, are {Δτ, ΔA_s e^(-2τ), Δn_s, Δω_m, Δω_b, Δθ∗} = {-1.7, -2.2, 1.2, -2.0, 1.1, 0.9}, with a χ² value of 8.0. We find that this χ² value is exceeded in 15% of our simulated data sets, and that a parameter deviates by more than 2.2σ in 9% of simulated data sets, meaning that the shifts are not unusually large. Comparing ℓ < 800 instead to ℓ > 800, or splitting at a different multipole, yields similar results. We examined the ℓ < 800 model residuals in the ℓ > 800 power spectrum data and find that the features there that drive these shifts are a set of oscillations across a broad range of angular scales. Although they partly appear similar to the effects of enhanced gravitational lensing, the shifts in ΛCDM parameters that arise in response to these features correspond to model spectrum changes that are predominantly due to non-lensing effects; the only exception is τ, which, at fixed A_s e^(-2τ), affects the ℓ > 800 temperature power spectrum solely through the associated change in A_s and the impact of that on the lensing potential power spectrum. We also ask, "what is it about the power spectrum at ℓ < 800 that leads to somewhat different best-fit parameters than come from the full ℓ range?" We find that if we discard the data at ℓ < 30, where there is a roughly 2σ downward fluctuation in power relative to the model that best fits the full ℓ range, the ℓ < 800 best-fit parameters shift significantly towards the ℓ < 2500 best-fit parameters. In contrast, including ℓ < 30, this previously noted "low-ℓ deficit" drives n_s up and impacts parameters correlated with n_s, such as ω_m and H_0. As expected, the ℓ < 30 data have a much greater impact on the ℓ < 800 best fit than on the ℓ < 2500 best fit. So although the shifts are not very significant, we find that they can be understood through the combined effects of an oscillatory-like set of high-ℓ residuals and the deficit in low-ℓ power, excursions consistent with sample variance that happen to map onto changes in cosmological parameters. Finally, we examine agreement between Planck TT data and two other CMB data sets, namely the Planck lensing reconstruction and the TT power spectrum measured by the South Pole Telescope, again finding a lack of convincing evidence of any significant deviations in parameters, suggesting that current CMB data sets give an internally consistent picture of the ΛCDM model.
Quality assessment of data discrimination using self-organizing maps.
Mekler, Alexey; Schwarz, Dmitri
2014-10-01
One of the important aspects of the data classification problem lies in making the most appropriate selection of features. The set of variables should be small and, at the same time, should provide reliable discrimination of the classes. A method for evaluating discriminating power that enables comparison between different sets of variables is therefore useful in the search for such a set. A new approach to feature selection is presented. Two methods for evaluating the data-discriminating power of a feature set are suggested. Both methods employ self-organizing maps (SOMs) and newly introduced exponents of the degree of data clusterization on the SOM. The first method is based on the comparison of intraclass and interclass distances on the map. The second method evaluates the relative number of a best matching unit's (BMU's) nearest neighbors belonging to the same class. Both methods make it possible to evaluate the discriminating power of a feature set in cases where this set provides nonlinear discrimination of the classes. The algorithms in program code can be downloaded for free at http://mekler.narod.ru/Science/Articles_support.html, as well as the supporting data files.
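The second method, BMU-neighborhood class purity, can be sketched as follows, assuming the third-party minisom package and synthetic two-class data; higher purity suggests stronger discriminating power of the feature set.

```python
# A rough sketch: train a SOM, then score the feature set by how often samples
# mapped to the same or adjacent grid nodes share a class label.
import numpy as np
from minisom import MiniSom

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (100, 5)), rng.normal(2, 1, (100, 5))])
y = np.array([0] * 100 + [1] * 100)

som = MiniSom(8, 8, X.shape[1], sigma=1.0, learning_rate=0.5, random_seed=0)
som.train_random(X, 2000)

wins = [som.winner(x) for x in X]            # BMU grid coordinates per sample
purity = []
for i, (r, c) in enumerate(wins):
    # Other samples whose BMU lies on the same or an adjacent grid node.
    near = [j for j, (r2, c2) in enumerate(wins)
            if j != i and abs(r - r2) <= 1 and abs(c - c2) <= 1]
    if near:
        purity.append(np.mean([y[j] == y[i] for j in near]))
print(f"mean neighbourhood class purity: {np.mean(purity):.2f}")
```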
Learning representations for the early detection of sepsis with deep neural networks.
Kam, Hye Jin; Kim, Ha Young
2017-10-01
Sepsis is one of the leading causes of death in intensive care unit patients. Early detection of sepsis is vital because mortality increases as the sepsis stage worsens. This study aimed to develop detection models for the early stage of sepsis using deep learning methodologies, and to compare the feasibility and performance of the new deep learning methodology with those of a regression method with conventional temporal feature extraction. Study group selection adhered to the InSight model, and the results of the deep learning-based models were compared with those of the InSight model. With deep feedforward networks, the areas under the ROC curve (AUC) of the models were 0.887 and 0.915 for the InSight and the new feature sets, respectively. For the model with the combined feature set, the AUC was the same as that of the basic feature set (0.915). For the long short-term memory model, only the basic feature set was applied, and the AUC improved to 0.929 compared with the 0.887 of the InSight model. The contributions of this paper can be summarized in three ways: (i) improved performance without feature extraction using domain knowledge, (ii) verification of the feature extraction capability of deep neural networks through comparison with reference features, and (iii) improved performance over feedforward neural networks by using long short-term memory, a neural network architecture that can learn sequential patterns.
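The LSTM variant can be sketched in a few lines; the window length, feature count, and layer sizes below are illustrative assumptions, not the authors' architecture.

```python
# A minimal sketch: an LSTM over hourly windows of vital signs emitting a
# sepsis-risk probability. All shapes and sizes are placeholders.
import numpy as np
from tensorflow import keras

T, F = 24, 8                                   # 24 hourly steps, 8 vital signs
model = keras.Sequential([
    keras.Input(shape=(T, F)),
    keras.layers.LSTM(64),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["AUC"])

X = np.random.rand(256, T, F).astype("float32")    # placeholder sequences
y = np.random.randint(0, 2, 256)                   # placeholder sepsis labels
model.fit(X, y, epochs=2, batch_size=32, verbose=0)
print(model.predict(X[:1], verbose=0))             # risk for one window
```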
Feature selection for elderly faller classification based on wearable sensors.
Howcroft, Jennifer; Kofman, Jonathan; Lemaire, Edward D
2017-05-30
Wearable sensors can be used to derive numerous gait pattern features for elderly fall risk and faller classification; however, an appropriate feature set is required to avoid high computational costs and the inclusion of irrelevant features. The objectives of this study were to identify and evaluate smaller feature sets for faller classification from large feature sets derived from wearable accelerometer and pressure-sensing insole gait data. A convenience sample of 100 older adults (75.5 ± 6.7 years; 76 non-fallers, 24 fallers based on 6 month retrospective fall occurrence) walked 7.62 m while wearing pressure-sensing insoles and tri-axial accelerometers at the head, pelvis, left and right shanks. Feature selection was performed using correlation-based feature selection (CFS), fast correlation based filter (FCBF), and Relief-F algorithms. Faller classification was performed using multi-layer perceptron neural network, naïve Bayesian, and support vector machine classifiers, with 75:25 single stratified holdout and repeated random sampling. The best performing model was a support vector machine with 78% accuracy, 26% sensitivity, 95% specificity, 0.36 F1 score, and 0.31 MCC and one posterior pelvis accelerometer input feature (left acceleration standard deviation). The second best model achieved better sensitivity (44%) and used a support vector machine with 74% accuracy, 83% specificity, 0.44 F1 score, and 0.29 MCC. This model had ten input features: maximum, mean and standard deviation posterior acceleration; maximum, mean and standard deviation anterior acceleration; mean superior acceleration; and three impulse features. The best multi-sensor model sensitivity (56%) was achieved using posterior pelvis and both shank accelerometers and a naïve Bayesian classifier. The best single-sensor model sensitivity (41%) was achieved using the posterior pelvis accelerometer and a naïve Bayesian classifier. Feature selection provided models with smaller feature sets and improved faller classification compared to faller classification without feature selection. CFS and FCBF provided the best feature subset (one posterior pelvis accelerometer feature) for faller classification. However, better sensitivity was achieved by the second best model based on a Relief-F feature subset with three pressure-sensing insole features and seven head accelerometer features. Feature selection should be considered as an important step in faller classification using wearable sensors.
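One of the evaluated selection steps, Relief-F, can be sketched as follows, assuming the third-party skrebate package; the gait feature matrix and faller labels are synthetic placeholders.

```python
# A hedged sketch of Relief-F feature selection on gait-style features.
import numpy as np
from skrebate import ReliefF

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 50))          # 100 participants, 50 gait features
y = rng.integers(0, 2, 100)             # 1 = faller, 0 = non-faller

fs = ReliefF(n_features_to_select=10, n_neighbors=20)
fs.fit(X, y)
top10 = np.argsort(fs.feature_importances_)[::-1][:10]
print("selected feature indices:", top10)
```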
A hybrid feature selection approach for the early diagnosis of Alzheimer’s disease
NASA Astrophysics Data System (ADS)
Gallego-Jutglà, Esteve; Solé-Casals, Jordi; Vialatte, François-Benoît; Elgendi, Mohamed; Cichocki, Andrzej; Dauwels, Justin
2015-02-01
Objective. Recently, significant advances have been made in the early diagnosis of Alzheimer's disease (AD) from electroencephalography (EEG). However, choosing suitable measures is a challenging task. Among other measures, frequency relative power (RP) and loss of complexity have been used with promising results. In the present study we investigate the early diagnosis of AD using synchrony measures and frequency RP on EEG signals, examining the changes found in different frequency ranges. Approach. We first explore the use of a single feature for computing the classification rate (CR), looking for the best frequency range. Then, we present a multiple-feature classification system that outperforms all previous results using a feature selection strategy. These two approaches are tested on two different databases, one containing mild cognitive impairment (MCI) and healthy subjects (patients age: 71.9 ± 10.2, healthy subjects age: 71.7 ± 8.3), and the other containing Mild AD and healthy subjects (patients age: 77.6 ± 10.0, healthy subjects age: 69.4 ± 11.5). Main results. Using a single feature to compute CRs we achieve a performance of 78.33% for the MCI data set and of 97.56% for Mild AD. Results are clearly improved using the multiple-feature classification, where a CR of 95% is found for the MCI data set using 11 features, and 100% for the Mild AD data set using four features. Significance. The new feature selection method described in this work may be a reliable tool that could help to design a realistic system that does not require prior knowledge of a patient's status. With that aim, we explore the standardization of features for the MCI and Mild AD data sets with promising results.
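The frequency relative power (RP) feature can be sketched directly: band power from a Welch periodogram divided by total power. The sampling rate and band edges below are conventional assumptions, not necessarily the paper's settings.

```python
# A sketch of relative power (RP) per EEG band from a Welch PSD.
import numpy as np
from scipy.signal import welch

fs = 256                                       # assumed sampling rate (Hz)
eeg = np.random.randn(30 * fs)                 # placeholder 30 s EEG channel
freqs, psd = welch(eeg, fs=fs, nperseg=4 * fs)

bands = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}
broad = (freqs >= 1) & (freqs < 30)
total = np.trapz(psd[broad], freqs[broad])     # total power over 1-30 Hz
for name, (lo, hi) in bands.items():
    m = (freqs >= lo) & (freqs < hi)
    print(f"{name} RP: {np.trapz(psd[m], freqs[m]) / total:.3f}")
```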
On the BRST Quantization of the Massless Bosonic Particle in Twistor-Like Formulation
NASA Astrophysics Data System (ADS)
Bandos, Igor; Maznytsia, Alexey; Rudychev, Igor; Sorokin, Dmitri
We study some features of bosonic-particle path-integral quantization in a twistor-like approach by the use of the BRST-BFV-quantization prescription. In the course of the Hamiltonian analysis we observe links between various formulations of the twistor-like particle by performing a conversion of the Hamiltonian constraints of one formulation to another. A particular feature of the conversion procedure applied to turn the second-class constraints into first-class constraints is that the simplest Lorentz-covariant way to do this is to convert a full mixed set of the initial first- and second-class constraints rather than explicitly extracting and converting only the second-class constraints. Another novel feature of the conversion procedure applied below is that in the case of the D = 4 and D = 6 twistor-like particle the number of new auxiliary Lorentz-covariant coordinates, which one introduces to get a system of first-class constraints in an extended phase space, exceeds the number of independent second-class constraints of the original dynamical system. We calculate the twistor-like particle propagator in D = 3,4,6 space-time dimensions and show that it coincides with that of a conventional massless bosonic particle.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rodgers, A J; Petersson, N A; Morency, C E
The California Academy of Sciences (CAS) Morrison Planetarium is producing a 'full-dome' planetarium show on earthquakes and asked LLNL to produce content for the show. Specifically, the show features numerical ground motion simulations of the M 7.9 1906 San Francisco and a possible future M 7.05 Hayward fault scenario earthquake. The show also features concepts of plate tectonics and mantle convection using images from LLNL's G3D global seismic tomography. This document describes the data that was provided to the CAS in support of production of the 'Earthquake' show. The CAS is located in Golden Gate Park, San Francisco and hosts over 1.6 million visitors. The Morrison Planetarium, within the CAS, is the largest all-digital planetarium in the world. It features a 75-foot diameter spherical section projection screen tilted at a 30-degree angle. Six projectors cover the entire field of view and give a three-dimensional immersive experience. CAS shows strive to use scientifically accurate digital data in their productions. The show, entitled simply 'Earthquake', will debut on 26 May 2012. They are working on graphics and animations based on the same data sets for display on LLNL powerwalls and flat-screens as well as for public release.
Grubert, Anna; Eimer, Martin
2016-08-01
To study whether top-down attentional control processes can be set simultaneously for different visual features, we employed a spatial cueing procedure to measure behavioral and electrophysiological markers of task-set contingent attentional capture during search for targets defined by 1 or 2 possible colors (one-color and two-color tasks). Search arrays were preceded by spatially nonpredictive color singleton cues. Behavioral spatial cueing effects indicative of attentional capture were elicited only by target-matching but not by distractor-color cues. However, when search displays contained 1 target-color and 1 distractor-color object among gray nontargets, N2pc components were triggered not only by target-color but also by distractor-color cues both in the one-color and two-color task, demonstrating that task-set nonmatching items attracted attention. When search displays contained 6 items in 6 different colors, so that participants had to adopt a fully feature-specific task set, the N2pc to distractor-color cues was eliminated in both tasks, indicating that nonmatching items were now successfully excluded from attentional processing. These results demonstrate that when observers adopt a feature-specific search mode, attentional task sets can be configured flexibly for multiple features within the same dimension, resulting in the rapid allocation of attention to task-set matching objects only.
Decorrelation of the true and estimated classifier errors in high-dimensional settings.
Hanczar, Blaise; Hua, Jianping; Dougherty, Edward R
2007-01-01
The aim of many microarray experiments is to build discriminatory diagnosis and prognosis models. Given the huge number of features and the small number of examples, model validity which refers to the precision of error estimation is a critical issue. Previous studies have addressed this issue via the deviation distribution (estimated error minus true error), in particular, the deterioration of cross-validation precision in high-dimensional settings where feature selection is used to mitigate the peaking phenomenon (overfitting). Because classifier design is based upon random samples, both the true and estimated errors are sample-dependent random variables, and one would expect a loss of precision if the estimated and true errors are not well correlated, so that natural questions arise as to the degree of correlation and the manner in which lack of correlation impacts error estimation. We demonstrate the effect of correlation on error precision via a decomposition of the variance of the deviation distribution, observe that the correlation is often severely decreased in high-dimensional settings, and show that the effect of high dimensionality on error estimation tends to result more from its decorrelating effects than from its impact on the variance of the estimated error. We consider the correlation between the true and estimated errors under different experimental conditions using both synthetic and real data, several feature-selection methods, different classification rules, and three error estimators commonly used (leave-one-out cross-validation, k-fold cross-validation, and .632 bootstrap). Moreover, three scenarios are considered: (1) feature selection, (2) known-feature set, and (3) all features. Only the first is of practical interest; however, the other two are needed for comparison purposes. We will observe that the true and estimated errors tend to be much more correlated in the case of a known feature set than with either feature selection or using all features, with the better correlation between the latter two showing no general trend, but differing for different models.
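The central observation can be reproduced with a small simulation: draw many small training samples from a high-dimensional model, record each sample's cross-validated error and its "true" error on a large independent pool, and correlate the two. The classifier, dimensionality, and effect size below are illustrative choices, not the paper's exact setup.

```python
# Simulating the correlation between true and cross-validation-estimated error.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)

def draw(n, p=50):
    y = rng.integers(0, 2, n)
    X = rng.normal(size=(n, p)) + 0.25 * y[:, None]   # weak signal, many features
    return X, y

X_pool, y_pool = draw(20000)                          # stands in for the true error
cv_err, true_err = [], []
for _ in range(100):
    X, y = draw(40)                                   # small sample, as in microarrays
    clf = LinearDiscriminantAnalysis()
    cv_err.append(1 - cross_val_score(clf, X, y, cv=5).mean())
    true_err.append(1 - clf.fit(X, y).score(X_pool, y_pool))
print("corr(true, estimated):", np.corrcoef(true_err, cv_err)[0, 1])
```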
Setting conservation targets for sandy beach ecosystems
NASA Astrophysics Data System (ADS)
Harris, Linda; Nel, Ronel; Holness, Stephen; Sink, Kerry; Schoeman, David
2014-10-01
Representative and adequate reserve networks are key to conserving biodiversity. This begs the question, how much of which features need to be placed in protected areas? Setting specifically-derived conservation targets for most ecosystems is common practice; however, this has never been done for sandy beaches. The aims of this paper, therefore, are to propose a methodology for setting conservation targets for sandy beach ecosystems; and to pilot the proposed method using data describing biodiversity patterns and processes from microtidal beaches in South Africa. First, a classification scheme of valued features of beaches is constructed, including: biodiversity features; unique features; and important processes. Second, methodologies for setting targets for each feature under different data-availability scenarios are described. From this framework, targets are set for features characteristic of microtidal beaches in South Africa, as follows. 1) Targets for dune vegetation types were adopted from a previous assessment, and ranged 19-100%. 2) Targets for beach morphodynamic types (habitats) were set using species-area relationships (SARs). These SARs were derived from species richness data from 142 sampling events around the South African coast (extrapolated to total theoretical species richness estimates using previously-established species-accumulation curve relationships), plotted against the area of the beach (calculated from Google Earth imagery). The species-accumulation factor (z) was 0.22, suggesting a baseline habitat target of 27% is required to protect 75% of the species. This baseline target was modified by heuristic principles, based on habitat rarity and threat status, with final values ranging 27-40%. 3) Species targets were fixed at 20%, modified using heuristic principles based on endemism, threat status, and whether or not beaches play an important role in the species' life history, with targets ranging 20-100%. 4) Targets for processes and 5) important assemblages were set at 50%, following other studies. 6) Finally, a target for an outstanding feature (the Alexandria dunefield) was set at 80% because of its national, international and ecological importance. The greatest shortfall in the current target-setting process is in the lack of empirical models describing the key beach processes, from which robust ecological thresholds can be derived. As for many other studies, our results illustrate that the conservation target of 10% for coastal and marine systems proposed by the Convention on Biological Diversity is too low to conserve sandy beaches and their biota.
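The baseline habitat target follows directly from the species-area relationship: conserving a fraction a of habitat retains a^z of the species, so the fraction needed to protect 75% of species with z = 0.22 can be checked in two lines.

```python
# Species-area relationship S = c * A**z: solve 0.75 = a**z for a.
z = 0.22
a = 0.75 ** (1 / z)
print(f"habitat fraction required: {a:.2f}")   # ~0.27, the 27% baseline target
```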
Reproducibility and Prognosis of Quantitative Features Extracted from CT Images
Balagurunathan, Yoganand; Gu, Yuhua; Wang, Hua; Kumar, Virendra; Grove, Olya; Hawkins, Sam; Kim, Jongphil; Goldgof, Dmitry B; Hall, Lawrence O; Gatenby, Robert A; Gillies, Robert J
2014-01-01
We study the reproducibility of quantitative imaging features that are used to describe tumor shape, size, and texture from computed tomography (CT) scans of non-small cell lung cancer (NSCLC). CT images are dependent on various scanning factors. We focus on characterizing image features that are reproducible in the presence of variations due to patient factors and segmentation methods. Thirty-two NSCLC nonenhanced lung CT scans were obtained from the Reference Image Database to Evaluate Response data set. The tumors were segmented using both manual (radiologist expert) and ensemble (software-automated) methods. A set of features (219 three-dimensional and 110 two-dimensional) was computed, and quantitative image features were statistically filtered to identify a subset of reproducible and nonredundant features. The variability in the repeated experiment was measured by the test-retest concordance correlation coefficient (CCC_TreT). The natural range in the features, normalized to variance, was measured by the dynamic range (DR). In this study, there were 29 features across segmentation methods found with CCC_TreT and DR ≥ 0.9 and R²_Bet ≥ 0.95. These reproducible features were tested for predicting radiologist prognostic score; some texture features (run-length and Laws kernels) had an area under the curve of 0.9. The representative features were tested for their prognostic capabilities using an independent NSCLC data set (59 lung adenocarcinomas), where one of the texture features, run-length gray-level nonuniformity, was statistically significant in separating the samples into survival groups (P ≤ .046). PMID:24772210
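The test-retest filter rests on Lin's concordance correlation coefficient, which is straightforward to compute; the repeat-scan values below are synthetic, and the paper additionally applies the dynamic range criterion.

```python
# Lin's concordance correlation coefficient for a test-retest feature pair.
import numpy as np

def ccc(x, y):
    """Lin's concordance correlation coefficient (population estimators)."""
    sxy = np.cov(x, y, bias=True)[0, 1]
    return 2 * sxy / (np.var(x) + np.var(y) + (np.mean(x) - np.mean(y)) ** 2)

test = np.random.rand(32)                        # feature values, scan 1
retest = test + np.random.normal(0, 0.05, 32)    # same feature, repeat scan
print("reproducible:", ccc(test, retest) >= 0.9)
```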
Set of Frequent Word Item sets as Feature Representation for Text with Indonesian Slang
NASA Astrophysics Data System (ADS)
Sa'adillah Maylawati, Dian; Putri Saptawati, G. A.
2017-01-01
Indonesian slang is commonly used in social media. Due to its unstructured syntax, it is difficult to extract features based on Indonesian grammar for text mining. We therefore propose the Set of Frequent Word Itemsets (SFWI) as a text representation that we consider a good match for Indonesian slang. Moreover, SFWI preserves the meaning of Indonesian slang by retaining the order in which words appear within a sentence. We use the FP-Growth algorithm, extended with a sentence-separation function, to extract SFWI features. Experiments were conducted with text data from social media such as Facebook, Twitter, and personal websites. The results show that Indonesian slang was interpreted more correctly when represented with SFWI.
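The frequent-itemset extraction can be sketched with an off-the-shelf FP-Growth implementation, assuming the third-party mlxtend package; treating each sentence as one transaction stands in for the paper's sentence-separation function, and the example posts use common Indonesian slang words.

```python
# A hedged sketch of SFWI extraction via FP-Growth (mlxtend implementation).
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth

posts = ["gue suka bgt film ini", "film ini keren bgt", "gue suka film keren"]
transactions = [p.split() for p in posts]      # one sentence = one transaction

te = TransactionEncoder()
onehot = pd.DataFrame(te.fit_transform(transactions), columns=te.columns_)
sfwi = fpgrowth(onehot, min_support=0.6, use_colnames=True)
print(sfwi.sort_values("support", ascending=False))
```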
Pérez-Hernández, Guillermo; Noé, Frank
2016-12-13
Analysis of molecular dynamics, for example using Markov models, often requires the identification of order parameters that are good indicators of the rare events, i.e. good reaction coordinates. Recently, it has been shown that time-lagged independent component analysis (TICA) finds the linear combinations of input coordinates that optimally represent the slow kinetic modes and may serve to define reaction coordinates between the metastable states of the molecular system. A limitation of the method is that both computing time and memory requirements scale with the square of the number of input features. For large protein systems, this makes extensive feature sets, such as the distances between all pairs of residues or even heavy atoms, impractical. Here we derive a hierarchical TICA (hTICA) method that approximates the full TICA solution by a hierarchical, divide-and-conquer calculation. By using hTICA on distances between heavy atoms we identify previously unknown relaxation processes in the bovine pancreatic trypsin inhibitor.
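A minimal sketch of plain (non-hierarchical) TICA makes the scaling issue concrete: the method solves the generalized eigenproblem C(τ)v = λC(0)v, where both covariance matrices are quadratic in the number of input features. The toy data and function name below are ours:

```python
import numpy as np
from scipy.linalg import eigh

# Minimal sketch of plain TICA: find linear combinations of input features
# whose lag-tau autocorrelation is maximal, by solving the generalized
# eigenproblem C(tau) v = lambda C(0) v.

def tica(X: np.ndarray, tau: int, n_components: int = 2):
    X = X - X.mean(axis=0)
    C0 = X[:-tau].T @ X[:-tau] / (len(X) - tau)   # instantaneous covariance
    Ct = X[:-tau].T @ X[tau:] / (len(X) - tau)    # time-lagged covariance
    Ct = 0.5 * (Ct + Ct.T)                        # enforce symmetry
    eigvals, eigvecs = eigh(Ct, C0)               # generalized eigenproblem
    order = np.argsort(eigvals)[::-1]             # slowest modes first
    return eigvals[order][:n_components], eigvecs[:, order][:, :n_components]

# Toy trajectory: one slow oscillation plus fast noise in 5 features.
t = np.arange(5000)
slow = np.sin(2 * np.pi * t / 2000)
X = np.outer(slow, np.ones(5)) + np.random.default_rng(1).normal(size=(5000, 5))
lams, modes = tica(X, tau=50)
print(lams)  # leading eigenvalue reflects the slow sine mode
```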
Piloted Simulator Evaluation of Maneuvering Envelope Information for Flight Crew Awareness
NASA Technical Reports Server (NTRS)
Lombaerts, Thomas; Schuet, Stefan; Acosta, Diana; Kaneshige, John; Shish, Kimberlee; Martin, Lynne
2015-01-01
The implementation and evaluation of an efficient method for estimating safe aircraft maneuvering envelopes are discussed. A Bayesian approach is used to produce a deterministic algorithm for estimating aerodynamic system parameters from existing noisy sensor measurements, which are then used to estimate the trim envelope through efficient high-fidelity model-based computations of attainable equilibrium sets. The safe maneuverability limitations are extended beyond the trim envelope through a robust reachability analysis derived from an optimal control formulation. The trim and maneuvering envelope limits are then conveyed to pilots through three axes on the primary flight display. To evaluate the new display features, commercial airline crews flew multiple challenging approach and landing scenarios in the full motion Advanced Concepts Flight Simulator at NASA Ames Research Center, as part of a larger research initiative to investigate the impact on the energy state awareness of the crew. Results show that the additional display features have the potential to significantly improve situational awareness of the flight crew.
Balaur, Eugeniu; Sadatnajafi, Catherine; Kou, Shan Shan; Lin, Jiao; Abbey, Brian
2016-06-17
Colour filters based on nano-apertures in thin metallic films have been widely studied due to their extraordinary optical transmission and small size. These properties make them prime candidates for use in high-resolution colour displays and high-accuracy bio-sensors. The inclusion of polarization-sensitive plasmonic features in such devices allows additional control over the electromagnetic field distribution, which is critical for investigations of polarization-induced phenomena. Here we demonstrate that cross-shaped nano-apertures can be used for polarization-controlled colour tuning in the visible range and apply fundamental theoretical models to interpret key features of the transmitted spectrum. Full colour transmission was achieved by fine-tuning the periodicity of the apertures, whilst keeping the geometry of individual apertures constant. We demonstrate this effect for both transverse electric and magnetic fields. Furthermore, we have been able to demonstrate the same polarization sensitivity even for nano-sized, sub-wavelength sets of arrays, which is paramount for ultra-high-resolution compact colour displays. PMID:27312072
Complex Topographic Feature Ontology Patterns
Varanka, Dalia E.; Jerris, Thomas J.
2015-01-01
Semantic ontologies are examined as effective data models for the representation of complex topographic feature types. Complex feature types are viewed as integrated relations between basic features for a basic purpose. In the context of topographic science, such component assemblages are supported by resource systems and found on the local landscape. Ontologies are organized within six thematic modules of a domain ontology called Topography that includes within its sphere basic feature types, resource systems, and landscape types. Context is constructed not only as a spatial and temporal setting, but also as a setting based on environmental processes. Types of spatial relations that exist between components include location, generative processes, and description. An example is offered in the complex feature type 'mine.' The identification and extraction of complex feature types is an area for future research.
Robust tumor morphometry in multispectral fluorescence microscopy
NASA Astrophysics Data System (ADS)
Tabesh, Ali; Vengrenyuk, Yevgen; Teverovskiy, Mikhail; Khan, Faisal M.; Sapir, Marina; Powell, Douglas; Mesa-Tejada, Ricardo; Donovan, Michael J.; Fernandez, Gerardo
2009-02-01
Morphological and architectural characteristics of primary tissue compartments, such as epithelial nuclei (EN) and cytoplasm, provide important cues for cancer diagnosis, prognosis, and therapeutic response prediction. We propose two feature sets for the robust quantification of these characteristics in multiplex immunofluorescence (IF) microscopy images of prostate biopsy specimens. To enable feature extraction, EN and cytoplasm regions were first segmented from the IF images. Then, feature sets consisting of the characteristics of the minimum spanning tree (MST) connecting the EN and the fractal dimension (FD) of gland boundaries were obtained from the segmented compartments. We demonstrated the utility of the proposed features in prostate cancer recurrence prediction on a multi-institution cohort of 1027 patients. Univariate analysis revealed that both FD and one of the MST features were highly effective for predicting cancer recurrence (p ≤ 0.0001). In multivariate analysis, an MST feature was selected for a model incorporating clinical and image features. The model achieved a concordance index (CI) of 0.73 on the validation set, which was significantly higher than the CI of 0.69 for the standard multivariate model based solely on clinical features currently used in clinical practice (p < 0.0001). The contributions of this work are twofold. First, it is the first demonstration of the utility of the proposed features in morphometric analysis of IF images. Second, this is the largest scale study of the efficacy and robustness of the proposed features in prostate cancer prognosis.
A Feature and Algorithm Selection Method for Improving the Prediction of Protein Structural Class.
Ni, Qianwu; Chen, Lei
2017-01-01
Correct prediction of protein structural class is beneficial to investigation of protein functions, regulations and interactions. In recent years, several computational methods have been proposed in this regard. However, given the variety of available features, it remains a great challenge to select a proper classification algorithm and to extract the essential features to participate in classification. In this study, a feature and algorithm selection method is presented for improving the accuracy of protein structural class prediction. Amino acid compositions and physiochemical features were adopted to represent features, and thirty-eight machine learning algorithms collected in Weka were employed. All features were first analyzed by a feature selection method, minimum redundancy maximum relevance (mRMR), producing a feature list. Then, several feature sets were constructed by adding features from the list one by one. For each feature set, the thirty-eight algorithms were executed on a dataset in which proteins were represented by the features in the set. The classes predicted by these algorithms and the true class of each protein were collected to construct a dataset, which was analyzed by the mRMR method, yielding an algorithm list. From the algorithm list, algorithms were taken one by one to build an ensemble prediction model. Finally, we selected the ensemble prediction model with the best performance as the optimal ensemble prediction model. Experimental results indicate that the constructed model is much superior to models using a single algorithm and to models that adopt only the feature selection procedure or only the algorithm selection procedure. Both procedures are genuinely helpful for building an ensemble prediction model with better performance.
Action recognition using mined hierarchical compound features.
Gilbert, Andrew; Illingworth, John; Bowden, Richard
2011-05-01
The field of Action Recognition has seen a large increase in activity in recent years. Much of the progress has been through incorporating ideas from single-frame object recognition and adapting them for temporal-based action recognition. Inspired by the success of interest points in the 2D spatial domain, their 3D (space-time) counterparts typically form the basic components used to describe actions, and in action recognition the features used are often engineered to fire sparsely. This is to ensure that the problem is tractable; however, it can sacrifice recognition accuracy, as it cannot be assumed that the optimum features in terms of class discrimination are obtained from this approach. In contrast, we propose to initially use an overcomplete set of simple 2D corners in both space and time. These are grouped spatially and temporally using a hierarchical process, with an increasing search area. At each stage of the hierarchy, the most distinctive and descriptive features are learned efficiently through data mining. This allows large amounts of data to be searched for frequently reoccurring patterns of features. At each level of the hierarchy, the mined compound features become more complex, discriminative, and sparse. This results in fast, accurate recognition with real-time performance on high-resolution video. As the compound features are constructed and selected based upon their ability to discriminate, their speed and accuracy increase at each level of the hierarchy. The approach is tested on four state-of-the-art data sets: the popular KTH data set, to provide a comparison with other state-of-the-art approaches; the Multi-KTH data set, to illustrate performance at simultaneous multiaction classification, despite no explicit localization information being provided during training; and the recent Hollywood and Hollywood2 data sets, which provide challenging complex actions taken from commercial movie sequences. For all four data sets, the proposed hierarchical approach outperforms all other methods reported thus far in the literature and can achieve real-time operation.
NASA Astrophysics Data System (ADS)
Tadini, A.; Bisson, M.; Neri, A.; Cioni, R.; Bevilacqua, A.; Aspinall, W. P.
2017-06-01
This study presents new and revised data sets about the spatial distribution of past volcanic vents, eruptive fissures, and regional/local structures of the Somma-Vesuvio volcanic system (Italy). The innovative features of the study are the identification and quantification of important sources of uncertainty affecting interpretations of the data sets. In this regard, the spatial uncertainty of each feature is modeled by an uncertainty area, i.e., a geometric element typically represented by a polygon drawn around points or lines. The new data sets have been assembled as an updatable geodatabase that integrates and complements existing databases for Somma-Vesuvio. The data are organized into 4 data sets and stored as 11 feature classes (points and lines for feature locations and polygons for the associated uncertainty areas), totaling more than 1700 elements. More specifically, volcanic vent and eruptive fissure elements are subdivided into feature classes according to their associated eruptive styles: (i) Plinian and sub-Plinian eruptions (i.e., large- or medium-scale explosive activity); (ii) violent Strombolian and continuous ash emission eruptions (i.e., small-scale explosive activity); and (iii) effusive eruptions (including eruptions from both parasitic vents and eruptive fissures). Regional and local structures (i.e., deep faults) are represented as linear feature classes. To support interpretation of the eruption data, additional data sets are provided for Somma-Vesuvio geological units and caldera morphological features. In the companion paper, the data presented here, and the associated uncertainties, are used to develop a first vent opening probability map for the Somma-Vesuvio caldera, with specific attention focused on large or medium explosive events.
Voxel classification based airway tree segmentation
NASA Astrophysics Data System (ADS)
Lo, Pechin; de Bruijne, Marleen
2008-03-01
This paper presents a voxel classification based method for segmenting the human airway tree in volumetric computed tomography (CT) images. In contrast to standard methods that use only voxel intensities, our method uses a more complex appearance model based on a set of local image appearance features and K-nearest-neighbor (KNN) classification. The optimal set of features for classification is selected automatically from a large set of features describing the local image structure at several scales. The use of multiple features enables the appearance model to differentiate between airway tree voxels and other voxels of similar intensities in the lung, thus making the segmentation robust to pathologies such as emphysema. The classifier is trained on imperfect segmentations that can easily be obtained using region growing with a manual threshold selection. Experiments show that the proposed method results in a more robust segmentation that can grow into the smaller airway branches without leaking into emphysematous areas, and is able to segment many branches that are not present in the training set.
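To make the classification step concrete, here is a minimal sketch of KNN voxel labeling on synthetic stand-in features; the feature channels, toy labeling rule, and all parameter values are invented for illustration:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Minimal sketch: each voxel is described by local appearance features (here,
# stand-ins for intensity and two multi-scale filter responses) and labeled
# airway vs. background with a KNN classifier.

rng = np.random.default_rng(0)
n = 2000
features = rng.normal(size=(n, 3))            # e.g. intensity, Gaussian blur, gradient magnitude
labels = (features[:, 0] + 0.5 * features[:, 1] < -1).astype(int)  # toy "airway" rule

knn = KNeighborsClassifier(n_neighbors=15)
knn.fit(features[:1500], labels[:1500])       # the paper trains on coarse region-growing labels
print("held-out accuracy:", knn.score(features[1500:], labels[1500:]))
```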
Zhu, Lei; Gonder, Jeffrey; Lin, Lei
2017-08-16
With the development of and advances in smartphones and global positioning system (GPS) devices, travelers' long-term travel behaviors can now be obtained. This study investigates the pattern of individual travel behavior and its correlation with socio-demographic features. For different socio-demographic groups (e.g., full-time employees and students), individual travel behavior may have specific temporal-spatial-mobility constraints. The study first extracts the home-based tours, including Home-to-Home and Home-to-Non-Home, from long-term raw GPS data. The travel behavior pattern is then delineated by home-based tour features, such as departure time, destination location entropy, travel time, and driving time ratio. The travel behavior variability describes the variances of travelers' activity behavior features over an extended period. After that, the variability pattern of an individual's travel behavior is used to estimate the individual's socio-demographic information, such as socio-demographic role, by a supervised learning approach, the support vector machine. In this study, a long-term (18-month) GPS data set recorded by the Puget Sound Regional Council is used. The experimental results are promising. The sensitivity analysis shows that as the number-of-tours threshold increases, the variability of most travel behavior features converges, while the prediction performance may not change for the fixed test data.
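A minimal sketch of the final supervised step, assuming synthetic per-traveler variability features and an invented binary role label:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Minimal sketch: predict a traveler's socio-demographic role (e.g. 0 =
# full-time employee, 1 = student) from the variability of home-based-tour
# features. Feature names follow the abstract; the data are synthetic.

rng = np.random.default_rng(0)
n = 200
# per-traveler standard deviations of departure time, travel time,
# destination-location entropy, and driving-time ratio
X = rng.gamma(shape=2.0, scale=1.0, size=(n, 4))
y = (X[:, 0] + X[:, 2] > 4.0).astype(int)      # toy labeling rule

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X[:150], y[:150])
print("held-out accuracy:", clf.score(X[150:], y[150:]))
```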
Robust Feature Selection Technique using Rank Aggregation.
Sarkar, Chandrima; Cooley, Sarah; Srivastava, Jaideep
2014-01-01
Although feature selection is a well-developed research area, there is an ongoing need to develop methods to make classifiers more efficient. One important challenge is the lack of a universal feature selection technique which produces similar outcomes with all types of classifiers. This is because all feature selection techniques have individual statistical biases, while classifiers exploit different statistical properties of data for evaluation. In numerous situations this can put researchers into a dilemma as to which feature selection method and which classifier to choose from a vast range of options. In this paper, we propose a technique that aggregates the consensus properties of various feature selection methods to develop a more optimal solution. The ensemble nature of our technique makes it more robust across various classifiers. In other words, it is stable towards achieving similar and ideally higher classification accuracy across a wide variety of classifiers. We quantify this concept of robustness with a measure known as the Robustness Index (RI). We perform an extensive empirical evaluation of our technique on eight data sets with different dimensions, including Arrhythmia, Lung Cancer, Madelon, mfeat-fourier, internet-ads, Leukemia-3c and Embryonal Tumor, and a real-world data set, Acute Myeloid Leukemia (AML). We demonstrate not only that our algorithm is more robust, but also that, compared to other techniques, it improves the classification accuracy by approximately 3-4% (in data sets with fewer than 500 features) and by more than 5% (in data sets with more than 500 features), across a wide range of classifiers.
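The core aggregation idea can be sketched in a few lines: rank features under several selection criteria and combine the ranks. A Borda-style mean rank is used below as a stand-in; the paper's exact aggregation scheme may differ, and the data are synthetic:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import f_classif, mutual_info_classif

# Minimal sketch of rank aggregation for feature selection: score features
# under several criteria, convert scores to ranks, and aggregate by mean rank.

X, y = make_classification(n_samples=200, n_features=20, n_informative=5, random_state=0)

scores = [
    f_classif(X, y)[0],                         # ANOVA F statistic
    mutual_info_classif(X, y, random_state=0),  # mutual information
    np.abs(np.corrcoef(X.T, y)[-1, :-1]),       # |Pearson correlation| with the label
]
ranks = np.array([np.argsort(np.argsort(-s)) for s in scores])  # 0 = best
mean_rank = ranks.mean(axis=0)
top5 = np.argsort(mean_rank)[:5]
print("selected features:", top5)
```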
Improving reticle defect disposition via fully automated lithography simulation
NASA Astrophysics Data System (ADS)
Mann, Raunak; Goodman, Eliot; Lao, Keith; Ha, Steven; Vacca, Anthony; Fiekowsky, Peter; Fiekowsky, Dan
2016-03-01
Most advanced wafer fabs have embraced complex pattern decoration, which creates numerous challenges during in-fab reticle qualification. These optical proximity correction (OPC) techniques create assist features that tend to be very close in size and shape to the main patterns, as seen in Figure 1. A small defect on an assist feature will most likely have little or no impact on the fidelity of the wafer image, whereas the same defect on a main feature could significantly decrease device functionality. In order to properly disposition these defects, reticle inspection technicians need an efficient method that automatically separates main from assist features and predicts the resulting defect impact on the wafer image; this is the role of the Automated Defect Analysis System (ADAS) defect simulation system [1]. Up until now, using ADAS simulation was limited to engineers due to the complexity of the settings that need to be manually entered in order to create an accurate result. A single error in entering one of these values can cause erroneous results; therefore full automation is necessary. In this study, we propose a new method where all needed simulation parameters are automatically loaded into ADAS. This is accomplished in two parts. First, we have created a scanner parameter database that is automatically identified from mask product and level names. Second, we automatically determine the appropriate simulation printability threshold by using a new reference image (provided by the inspection tool) that contains a known measured value of the reticle critical dimension (CD). This new method automatically loads the correct scanner conditions, sets the appropriate simulation threshold, and automatically measures the percentage of CD change caused by the defect. This streamlines qualification and reduces the number of reticles being put on hold waiting for engineer review. We also present data showing the consistency and reliability of the new method, along with the impact on the efficiency of in-fab reticle qualification.
Vision-Based UAV Flight Control and Obstacle Avoidance
2006-01-01
The body-frame velocity is denoted Vb = (Vb1, Vb2, Vb3). Fig. 2 shows the block diagram of the proposed vision-based motion analysis and obstacle avoidance system. Structure analysis often involves computation-intensive computer vision tasks, such as feature extraction and geometric modeling. In conventional motion analysis, a set of features is first extracted from each block, and the distance between these two sets of features is then computed.
Non-specific filtering of beta-distributed data.
Wang, Xinhui; Laird, Peter W; Hinoue, Toshinori; Groshen, Susan; Siegmund, Kimberly D
2014-06-19
Non-specific feature selection is a dimension reduction procedure performed prior to cluster analysis of high dimensional molecular data. Not all measured features are expected to show biological variation, so only the most varying are selected for analysis. In DNA methylation studies, DNA methylation is measured as a proportion, bounded between 0 and 1, with variance a function of the mean. Filtering on standard deviation biases the selection of probes to those with mean values near 0.5. We explore the effect this has on clustering, and develop alternate filter methods that utilize a variance stabilizing transformation for Beta distributed data and do not share this bias. We compared results for 11 different non-specific filters on eight Infinium HumanMethylation data sets, selected to span a variety of biological conditions. We found that for data sets having a small fraction of samples showing abnormal methylation of a subset of normally unmethylated CpGs, a characteristic of the CpG island methylator phenotype in cancer, a novel filter statistic that utilized a variance-stabilizing transformation for Beta distributed data outperformed the common filter of using standard deviation of the DNA methylation proportion, or its log-transformed M-value, in its ability to detect the cancer subtype in a cluster analysis. However, the standard deviation filter always performed among the best for distinguishing subgroups of normal tissue. The novel filter and standard deviation filter tended to favour features in different genome contexts; for the same data set, the novel filter always selected more features from CpG island promoters and the standard deviation filter always selected more features from non-CpG island intergenic regions. Interestingly, despite selecting largely non-overlapping sets of features, the two filters did find sample subsets that overlapped for some real data sets. We found two different filter statistics that tended to prioritize features with different characteristics, each performed well for identifying clusters of cancer and non-cancer tissue, and identifying a cancer CpG island hypermethylation phenotype. Since cluster analysis is for discovery, we would suggest trying both filters on any new data sets, evaluating the overlap of features selected and clusters discovered.
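The bias and its fix can be sketched directly: filtering proportions on raw standard deviation favours means near 0.5, while ranking after a variance-stabilizing transform does not. Below, the classic arcsine-square-root transform for proportions stands in for the paper's Beta-distribution-based statistic; the data and cutoffs are synthetic:

```python
import numpy as np

# Minimal sketch of non-specific filtering for Beta-distributed proportions.
# The arcsine-square-root transform is a generic variance-stabilizing choice,
# used here in place of the paper's own filter statistic.

def filter_features(beta_values: np.ndarray, n_keep: int, stabilize: bool = True):
    """beta_values: (n_features, n_samples) methylation proportions in [0, 1]."""
    data = np.arcsin(np.sqrt(beta_values)) if stabilize else beta_values
    sd = data.std(axis=1)
    return np.argsort(sd)[::-1][:n_keep]       # indices of the most variable features

rng = np.random.default_rng(0)
betas = rng.beta(a=0.5, b=0.5, size=(1000, 40))
print(filter_features(betas, n_keep=10))
```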
Visual Saliency Detection Based on Multiscale Deep CNN Features.
Guanbin Li; Yizhou Yu
2016-11-01
Visual saliency is a fundamental problem in both cognitive and computational sciences, including computer vision. In this paper, we discover that a high-quality visual saliency model can be learned from multiscale features extracted using deep convolutional neural networks (CNNs), which have had many successes in visual recognition tasks. For learning such saliency models, we introduce a neural network architecture, which has fully connected layers on top of CNNs responsible for feature extraction at three different scales. The penultimate layer of our neural network has been confirmed to be a discriminative high-level feature vector for saliency detection, which we call deep contrast feature. To generate a more robust feature, we integrate handcrafted low-level features with our deep contrast feature. To promote further research and evaluation of visual saliency models, we also construct a new large database of 4447 challenging images and their pixelwise saliency annotations. Experimental results demonstrate that our proposed method is capable of achieving the state-of-the-art performance on all public benchmarks, improving the F-measure by 6.12% and 10%, respectively, on the DUT-OMRON data set and our new data set (HKU-IS), and lowering the mean absolute error by 9% and 35.3%, respectively, on these two data sets.
Roles and Responsibilities in Feature Teams
NASA Astrophysics Data System (ADS)
Eckstein, Jutta
Agile development requires self-organizing teams. The set-up of a (feature) team has to enable self-organization. Special care has to be taken if the project is not only distributed, but also large, with more than one feature team involved. In such a setting, every feature team needs a product owner who ensures a continuous focus on business delivery. The product owners collaborate by working together in a virtual team. Each feature team is supported by a coach who safeguards the agile process, not only within the individual feature team but also across all feature teams. An architect (or, if necessary, a team of architects) takes care that the system is technically sound. In contrast to small co-located projects, large global projects require a project manager who deals with, among other things, internal and especially external politics.
NASA Astrophysics Data System (ADS)
Jaenisch, Holger M.; Handley, James W.
2010-04-01
Malware are analogs of viruses. Viruses are comprised of large numbers of polypeptide proteins. The shape and function of the protein strands determine the functionality of the segment, much as a subroutine does in malware. The full combination of subroutines is the malware organism, in the same way that a collection of polypeptides forms information-bearing protein structures. We propose to apply the methods of bioinformatics to analyze malware, providing a rich feature set for a unique and novel detection and classification scheme originally applied to amino acid sequencing in bioinformatics. Our proposed methods enable real-time in situ (in contrast to in vivo) detection applications.
Workshop discusses community models for coastal sediment transport
NASA Astrophysics Data System (ADS)
Sherwood, Christopher R.; Signell, Richard P.; Harris, Courtney K.; Butman, Bradford
Numerical models of coastal sediment transport are increasingly used to address problems ranging from remediation of contaminated sediments, to siting of sewage outfalls and disposal sites, to evaluating impacts of coastal development. They are also used as a test bed for sediment-transport algorithms, to provide realistic settings for biological and geochemical models, and for a variety of other research, both fundamental and applied. However, there are few full-featured, publicly available coastal sediment-transport models, and fewer still that are well tested and have been widely applied. This was the motivation for a workshop in Woods Hole, Massachusetts, on June 22-23, 2000, that explored the establishment of community models for coastal sediment-transport processes.
JBrowse: a dynamic web platform for genome visualization and analysis.
Buels, Robert; Yao, Eric; Diesh, Colin M; Hayes, Richard D; Munoz-Torres, Monica; Helt, Gregg; Goodstein, David M; Elsik, Christine G; Lewis, Suzanna E; Stein, Lincoln; Holmes, Ian H
2016-04-12
JBrowse is a fast and full-featured genome browser built with JavaScript and HTML5. It is easily embedded into websites or apps but can also be served as a standalone web page. Overall improvements to speed and scalability are accompanied by specific enhancements that support complex interactive queries on large track sets. Analysis functions can readily be added using the plugin framework; most visual aspects of tracks can also be customized, along with clicks, mouseovers, menus, and popup boxes. JBrowse can also be used to browse local annotation files offline and to generate high-resolution figures for publication. JBrowse is a mature web application suitable for genome visualization and analysis.
Sneutrino driven GUT inflation in supergravity
NASA Astrophysics Data System (ADS)
Gonzalo, Tomás E.; Heurtier, Lucien; Moursy, Ahmad
2017-06-01
In this paper, we embed the model of flipped GUT sneutrino inflation, in a flipped SU(5) or SO(10) setup, developed by Ellis et al. in a supergravity framework. The GUT symmetry is broken by a waterfall transition that can occur at an early or late stage of the inflationary period. The full field dynamics is studied in detail, and the two main inflationary configurations are presented, whose cosmological predictions are both in agreement with recent astrophysical measurements. The model has the interesting feature that the inflaton has natural decay channels to the MSSM particles allowed by the GUT gauge symmetry. Hence it can account for the reheating after the inflationary epoch.
Plasmon confinement in fractal quantum systems
NASA Astrophysics Data System (ADS)
Westerhout, Tom; van Veen, Edo; Katsnelson, Mikhail I.; Yuan, Shengjun
2018-05-01
Recent progress in the fabrication of materials has made it possible to create arbitrary nonperiodic two-dimensional structures in the quantum plasmon regime. This paves the way for exploring the quantum plasmonic properties of electron gases in complex geometries. In this work we study systems with a fractal dimension. We calculate the full dielectric functions of two prototypical fractals with different ramification numbers, namely the Sierpinski carpet and gasket. We show that the Sierpinski carpet has a dispersion comparable to a square lattice, but the Sierpinski gasket features highly localized plasmon modes with a flat dispersion. This strong plasmon confinement in finitely ramified fractals can provide a novel setting for manipulating light at the quantum level.
NASA Astrophysics Data System (ADS)
Näsi, R.; Viljanen, N.; Oliveira, R.; Kaivosoja, J.; Niemeläinen, O.; Hakala, T.; Markelin, L.; Nezami, S.; Suomalainen, J.; Honkavaara, E.
2018-04-01
Light-weight 2D format hyperspectral imagers operable from unmanned aerial vehicles (UAV) have become common in various remote sensing tasks in recent years. Using these technologies, the area of interest is covered by multiple overlapping hypercubes, in other words multiview hyperspectral photogrammetric imagery, and each object point appears in many, even tens of, individual hypercubes. The common practice is to calculate hyperspectral orthomosaics utilizing only the most nadir areas of the images. However, the redundancy of the data gives potential for much more versatile and thorough feature extraction. We investigated various options for extracting spectral features in the grass sward quantity evaluation task. In addition to the various sets of spectral features, we used photogrammetry-based ultra-high-density point clouds to extract features describing the canopy 3D structure. A machine learning technique based on the Random Forest algorithm was used to estimate the fresh biomass. Results showed high accuracies for all investigated feature sets. The estimates using multiview data were approximately 10 % better than those based on the most nadir orthophotos. The utilization of the photogrammetric 3D features improved estimation accuracy by approximately 40 % compared to approaches where only spectral features were applied. The best estimation RMSE of 239 kg/ha (6.0 %) was obtained with the multiview anisotropy-corrected data set and the 3D features.
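A minimal sketch of this estimation step, assuming synthetic spectral and canopy-structure features and an invented biomass model, evaluated by cross-validated RMSE:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

# Minimal sketch: fresh biomass (kg/ha) regressed on concatenated spectral
# and photogrammetric 3D canopy features with a Random Forest. All data
# below are synthetic stand-ins.

rng = np.random.default_rng(0)
n = 120
spectral = rng.normal(size=(n, 20))            # e.g. band reflectances / indices
structure = rng.normal(size=(n, 5))            # e.g. canopy height percentiles
X = np.hstack([spectral, structure])
biomass = 3000 + 400 * structure[:, 0] + 200 * spectral[:, 3] + rng.normal(scale=150, size=n)

model = RandomForestRegressor(n_estimators=300, random_state=0)
pred = cross_val_predict(model, X, biomass, cv=5)
rmse = np.sqrt(np.mean((pred - biomass) ** 2))
print(f"RMSE: {rmse:.0f} kg/ha ({100 * rmse / biomass.mean():.1f} %)")
```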
DOE Office of Scientific and Technical Information (OSTI.GOV)
Velazquez, E Rios; Narayan, V; Grossmann, P
2015-06-15
Purpose: To compare the complementary prognostic value of automated Radiomic features to that of radiologist-annotated VASARI features in the TCGA-GBM MRI dataset. Methods: For 96 GBM patients, pre-operative MRI images were obtained from The Cancer Imaging Archive. The abnormal tumor bulks were manually defined on post-contrast T1w images. The contrast-enhancing and necrotic regions were segmented using FAST. From these sub-volumes and the total abnormal tumor bulk, a set of Radiomic features quantifying phenotypic differences based on tumor intensity, shape and texture was extracted from the post-contrast T1w images. Minimum-redundancy-maximum-relevance (MRMR) was used to identify the most informative Radiomic, VASARI and combined Radiomic-VASARI features in 70% of the dataset (training set). Multivariate Cox proportional hazards models were evaluated in 30% of the dataset (validation set) using the C-index for OS. A bootstrap procedure was used to assess significance while comparing the C-indices of the different models. Results: Overall, the Radiomic features showed a moderate correlation with the radiologist-annotated VASARI features (r = −0.37 – 0.49); however, that correlation was stronger for the Tumor Diameter and Proportion of Necrosis VASARI features (r = −0.71 – 0.69). After MRMR feature selection, the best-performing Radiomic, VASARI, and Radiomic-VASARI Cox-PH models showed a validation C-index of 0.56 (p = NS), 0.58 (p = NS) and 0.65 (p = 0.01), respectively. The combined Radiomic-VASARI model C-index was significantly higher than that obtained from either the Radiomic or VASARI model alone (p < 0.001). Conclusion: Quantitative volumetric and textural Radiomic features complement the qualitative and semi-quantitative annotated VASARI feature set. The prognostic value of informative qualitative VASARI features such as Eloquent Brain and Multifocality is increased with the addition of quantitative volumetric and textural features from the contrast-enhancing and necrotic tumor regions. These results should be further evaluated in larger validation cohorts.
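The validation metric above, Harrell's C-index, is easy to state in code. Below is a minimal, simplified sketch: censoring handling is reduced to the basic comparable-pair rule, the data are synthetic, and libraries such as lifelines provide production implementations:

```python
import numpy as np

# Minimal sketch of Harrell's concordance index (C-index) for survival data:
# among comparable pairs (i experienced the event before j's observed time),
# count how often the higher-risk subject failed first.

def c_index(risk: np.ndarray, time: np.ndarray, event: np.ndarray) -> float:
    concordant, comparable = 0.0, 0
    n = len(risk)
    for i in range(n):
        for j in range(n):
            if event[i] == 1 and time[i] < time[j]:   # comparable pair
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable

rng = np.random.default_rng(0)
risk = rng.normal(size=50)
time = rng.exponential(scale=np.exp(-risk))   # higher risk -> shorter survival
event = rng.integers(0, 2, size=50)
print(round(c_index(risk, time, event), 2))   # should exceed 0.5
```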
Exact partition functions for the Ω-deformed N = 2* SU(2) gauge theory
NASA Astrophysics Data System (ADS)
Beccaria, Matteo; Macorini, Guido
2016-07-01
We study the low energy effective action of the Ω-deformed N = 2* SU(2) gauge theory. It depends on the deformation parameters ε1, ε2, the scalar field expectation value a, and the hypermultiplet mass m. We explore the plane (m/ε1, ε2/ε1) looking for special features in the multi-instanton contributions to the prepotential, motivated by what happens in the Nekrasov-Shatashvili limit ε2 → 0. We propose a simple condition on the structure of poles of the k-instanton prepotential and show that it is admissible at a finite set of points in the above plane. At these special points, the prepotential has poles at fixed positions independent of the instanton number. Remarkably, both the instanton partition function and the full prepotential, including the perturbative contribution, may be given in closed form as functions of the scalar expectation value a and the modular parameter q, appearing in special combinations of Eisenstein series and the Dedekind η function. As a byproduct, the modular anomaly equation can be tested at all orders at these points. We discuss these special features from the point of view of the AGT correspondence and provide explicit toroidal 1-blocks in non-trivial closed form. The full list of solutions with 1, 2, 3, and 4 poles is determined and described in detail.
Interactive Visualization to Advance Earthquake Simulation
NASA Astrophysics Data System (ADS)
Kellogg, Louise H.; Bawden, Gerald W.; Bernardin, Tony; Billen, Magali; Cowgill, Eric; Hamann, Bernd; Jadamec, Margarete; Kreylos, Oliver; Staadt, Oliver; Sumner, Dawn
2008-04-01
The geological sciences are challenged to manage and interpret increasing volumes of data as observations and simulations increase in size and complexity. For example, simulations of earthquake-related processes typically generate complex, time-varying data sets in two or more dimensions. To facilitate interpretation and analysis of these data sets, evaluate the underlying models, and drive future calculations, we have developed methods of interactive visualization with a special focus on using immersive virtual reality (VR) environments to interact with models of Earth's surface and interior. Virtual mapping tools allow virtual "field studies" in inaccessible regions. Interactive tools allow us to manipulate shapes in order to construct models of geological features for geodynamic models, while feature extraction tools support quantitative measurement of structures that emerge from numerical simulation or field observations, thereby enabling us to improve our interpretation of the dynamical processes that drive earthquakes. VR has traditionally been used primarily as a presentation tool, albeit with active navigation through data. Reaping the full intellectual benefits of immersive VR as a tool for scientific analysis requires building on the method's strengths, that is, using both 3D perception and interaction with observed or simulated data. This approach also takes advantage of the specialized skills of geological scientists, who are trained to interpret the often limited geological and geophysical data available from field observations.
Korean standard nuclear plant ex-vessel neutron dosimetry program Ulchin 4
DOE Office of Scientific and Technical Information (OSTI.GOV)
Duo, J.I.; Chen, J.; Kulesza, J.A.
2011-07-01
A comprehensive ex-vessel neutron dosimetry (EVND) surveillance program has been deployed in 16 pressurized water reactors (PWR) in South Korea, and EVND dosimetry sets have already been installed and analyzed in Westinghouse reactor designs. In this paper, the unique features of the design, training, and installation in the Korean standard nuclear plant (KSNP) Ulchin Unit 4 are presented. Ulchin Unit 4 Cycle 9 represents the first dosimetry analyzed from the EVND design deployed in KSNP plants: Yonggwang Units 3 through 6 and Ulchin Units 3 through 6. KSNP's cavity configuration precludes a conventional installation from the cavity floor. The solution, requiring the installation crew to access the cavity at an elevation of the active core, places a premium on rapid installation due to high area dose rates. Numerous geometrical features warranted the use of a detailed design in true 3D mechanical design software to control interferences. A full-size training mockup maximized the crew's ability to correctly install the instrument in minimum time. The analysis of the first dosimetry set shows good agreement between measurement and calculation within the associated uncertainties. A complete EVND system has been successfully designed, installed, and analyzed for a KSNP plant. Current and future EVND analyses will continue supporting the successful operation of PWR units in South Korea.
Tbahriti, Imad; Chichester, Christine; Lisacek, Frédérique; Ruch, Patrick
2006-06-01
The aim of this study is to investigate the relationships between citations and the scientific argumentation found in abstracts. We design a related-article search task and observe how the argumentation can affect the search results. We extracted citation lists from a set of 3200 full-text papers originating from a narrow domain. In parallel, we recovered the corresponding MEDLINE records for analysis of the argumentative moves. Our argumentative model is founded on four classes: PURPOSE, METHODS, RESULTS and CONCLUSION. A Bayesian classifier trained on explicitly structured MEDLINE abstracts generates these argumentative categories. The categories are used to generate four different argumentative indexes. A fifth index contains the complete abstract, together with the title and the list of Medical Subject Headings (MeSH) terms. To appraise the relationship of the moves to the citations, the citation lists were used as the criteria for determining relatedness of articles, establishing a benchmark: two articles are considered related if they share a significant set of co-citations. Our results show that the average precision of queries with the PURPOSE and CONCLUSION features is the highest, while the precision of the RESULTS and METHODS features is relatively low. A linear weighting combination of the moves is proposed, which significantly improves retrieval of related articles.
GDSCalc: A Web-Based Application for Evaluating Discrete Graph Dynamical Systems
Elmeligy Abdelhamid, Sherif H.; Kuhlman, Chris J.; Marathe, Madhav V.; Mortveit, Henning S.; Ravi, S. S.
2015-01-01
Discrete dynamical systems are used to model various realistic systems in network science, from social unrest in human populations to regulation in biological networks. A common approach is to model the agents of a system as vertices of a graph, and the pairwise interactions between agents as edges. Agents are in one of a finite set of states at each discrete time step and are assigned functions that describe how their states change based on neighborhood relations. Full characterization of state transitions of one system can give insights into fundamental behaviors of other dynamical systems. In this paper, we describe a discrete graph dynamical systems (GDSs) application called GDSCalc for computing and characterizing system dynamics. It is an open access system that is used through a web interface. We provide an overview of GDS theory. This theory is the basis of the web application; i.e., an understanding of GDS provides an understanding of the software features, while abstracting away implementation details. We present a set of illustrative examples to demonstrate its use in education and research. Finally, we compare GDSCalc with other discrete dynamical system software tools. Our perspective is that no single software tool will perform all computations that may be required by all users; tools typically have particular features that are more suitable for some tasks. We situate GDSCalc within this space of software tools. PMID:26263006
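A minimal sketch of the kind of system GDSCalc evaluates: a synchronous graph dynamical system with binary vertex states and a local threshold update rule (the graph, seed, and threshold below are invented for illustration, and networkx is used only for brevity):

```python
import networkx as nx

# Minimal sketch of a synchronous discrete graph dynamical system: each
# vertex holds a state in {0, 1} and updates from its neighborhood.

def gds_step(graph: nx.Graph, state: dict, threshold: int = 2) -> dict:
    """A vertex switches on when at least `threshold` neighbors are on."""
    return {
        v: 1 if sum(state[u] for u in graph[v]) >= threshold else state[v]
        for v in graph
    }

g = nx.cycle_graph(6)
state = {v: int(v in (0, 1)) for v in g}       # seed two adjacent vertices
for _ in range(4):
    state = gds_step(g, state, threshold=1)
    print(state)                               # activity spreads around the cycle
```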
Task representation in individual and joint settings
Prinz, Wolfgang
2015-01-01
This paper outlines a framework for task representation and discusses applications to interference tasks in individual and joint settings. The framework is derived from the Theory of Event Coding (TEC). This theory regards task sets as transient assemblies of event codes in which stimulus and response codes interact and shape each other in particular ways. On the one hand, stimulus and response codes compete with each other within their respective subsets (horizontal interactions). On the other hand, stimulus and response codes cooperate with each other (vertical interactions). Code interactions instantiating competition and cooperation apply to two time scales: on-line performance (i.e., doing the task) and off-line implementation (i.e., setting the task). Interference arises when stimulus and response codes overlap in features that are irrelevant for stimulus identification but relevant for response selection. To resolve this dilemma, the feature profiles of event codes may become restructured in various ways. The framework is applied to three kinds of interference paradigms. Special emphasis is given to joint settings where tasks are shared between two participants. Major conclusions derived from these applications include: (1) response competition is the chief driver of interference, and different modes of response competition give rise to different patterns of interference; (2) the type of features in which stimulus and response codes overlap is also a crucial factor, with different types of such features likewise giving rise to different patterns of interference; and (3) task sets for joint settings conflate intraindividual conflicts between responses (what) with interindividual conflicts between responding agents (whom). Features of response codes may, therefore, not only address responses, but also responding agents (both physically and socially). PMID:26029085
Peer-Based Social Media Features in Behavior Change Interventions: Systematic Review.
Elaheebocus, Sheik Mohammad Roushdat Ally; Weal, Mark; Morrison, Leanne; Yardley, Lucy
2018-02-22
Incorporating social media features into digital behavior change interventions (DBCIs) has the potential to contribute positively to their success. However, the lack of clear design principles to describe and guide the use of these features in behavioral interventions limits cross-study comparisons of their uses and effects. The aim of this study was to provide a systematic review of DBCIs targeting modifiable behavioral risk factors that have included social media features as part of their intervention infrastructure. A taxonomy of social media features is presented to inform the development, description, and evaluation of behavioral interventions. Search terms were used in 8 databases to identify DBCIs that incorporated social media features and targeted tobacco smoking, diet and nutrition, physical activity, or alcohol consumption. The screening and review process was performed by 2 independent researchers. A total of 5264 articles were screened, and 143 articles describing a total of 134 studies were retained for full review. The majority of studies (70%) reported positive outcomes, followed by 28% finding no effects with regard to their respective objectives and hypotheses, and 2% of the studies found that their interventions had negative outcomes. Few studies reported on the association between the inclusion of social media features and intervention effect. A taxonomy of social media features used in behavioral interventions has been presented, with 36 social media features organized under 7 high-level categories. The taxonomy has been used to guide the analysis of this review. Although social media features are commonly included in DBCIs, there is an acute lack of information with respect to their effect on outcomes and a lack of clear guidance to inform the selection process based on the features' suitability for the different behaviors. The proposed taxonomy, along with the set of recommendations included in this review, will support future research aimed at isolating and reporting the effects of social media features on DBCIs, cross-study comparisons, and evaluations.
Automatic detection of solar features in HSOS full-disk solar images using guided filter
NASA Astrophysics Data System (ADS)
Yuan, Fei; Lin, Jiaben; Guo, Jingjing; Wang, Gang; Tong, Liyue; Zhang, Xinwei; Wang, Bingxiang
2018-02-01
A procedure is introduced for the automatic detection of solar features in full-disk solar images from Huairou Solar Observing Station (HSOS), National Astronomical Observatories of China. In image preprocessing, a median filter is applied to remove the noise. A guided filter is adopted to enhance the edges of solar features and restrain the solar limb darkening; this is the first introduction of the guided filter into astronomical target detection. Specific features are then detected by the Otsu algorithm and a further threshold-processing technique. Compared with other automatic detection procedures, our procedure has advantages such as real-time operation and reliability, as well as no need for a local threshold. It also reduces the amount of computation considerably, benefiting from the efficient guided filter algorithm. The procedure has been tested on one month of sequences (December 2013) of HSOS full-disk solar images, and the results show that the number of features detected by our procedure is consistent with manual detection.
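A minimal sketch of such a pipeline using OpenCV (the guided filter lives in the opencv-contrib ximgproc module); the file name, filter parameters, and the unsharp-style enhancement step are our illustrative assumptions, not the paper's exact settings:

```python
import cv2

# Minimal sketch: median filtering for noise, guided filtering for edge
# enhancement, then Otsu thresholding. "solar_disk.png" is a hypothetical
# full-disk image; radius/eps and the 4x edge boost are illustrative only.

img = cv2.imread("solar_disk.png", cv2.IMREAD_GRAYSCALE)
den = cv2.medianBlur(img, 5)                       # remove impulsive noise
smooth = cv2.ximgproc.guidedFilter(guide=den, src=den, radius=16, eps=100.0)
enhanced = cv2.addWeighted(den, 5.0, smooth, -4.0, 0)  # unsharp-style edge boost
_, mask = cv2.threshold(enhanced, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
cv2.imwrite("features_mask.png", mask)
```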
Khaligh-Razavi, Seyed-Mahdi; Henriksson, Linda; Kay, Kendrick; Kriegeskorte, Nikolaus
2017-02-01
Studies of the primate visual system have begun to test a wide range of complex computational object-vision models. Realistic models have many parameters, which in practice cannot be fitted using the limited amounts of brain-activity data typically available. Task performance optimization (e.g. using backpropagation to train neural networks) provides major constraints for fitting parameters and discovering nonlinear representational features appropriate for the task (e.g. object classification). Model representations can be compared to brain representations in terms of the representational dissimilarities they predict for an image set. This method, called representational similarity analysis (RSA), enables us to test the representational feature space as is (fixed RSA) or to fit a linear transformation that mixes the nonlinear model features so as to best explain a cortical area's representational space (mixed RSA). Like voxel/population-receptive-field modelling, mixed RSA uses a training set (different stimuli) to fit one weight per model feature and response channel (voxels here), so as to best predict the response profile across images for each response channel. We analysed response patterns elicited by natural images, which were measured with functional magnetic resonance imaging (fMRI). We found that early visual areas were best accounted for by shallow models, such as a Gabor wavelet pyramid (GWP). The GWP model performed similarly with and without mixing, suggesting that the original features already approximated the representational space, obviating the need for mixing. However, a higher ventral-stream visual representation (lateral occipital region) was best explained by the higher layers of a deep convolutional network and mixing of its feature set was essential for this model to explain the representation. We suspect that mixing was essential because the convolutional network had been trained to discriminate a set of 1000 categories, whose frequencies in the training set did not match their frequencies in natural experience or their behavioural importance. The latter factors might determine the representational prominence of semantic dimensions in higher-level ventral-stream areas. Our results demonstrate the benefits of testing both the specific representational hypothesis expressed by a model's original feature space and the hypothesis space generated by linear transformations of that feature space.
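A minimal sketch of the fixed-RSA comparison described above, assuming synthetic model features and voxel patterns: build the two representational dissimilarity matrices and correlate their condensed forms:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

# Minimal sketch of fixed RSA: compute representational dissimilarity
# matrices (RDMs) for a model feature space and a brain region, then compare
# them with Spearman correlation. All data here are synthetic.

rng = np.random.default_rng(0)
n_images = 40
model_features = rng.normal(size=(n_images, 100))    # e.g. CNN layer activations
voxel_patterns = model_features @ rng.normal(size=(100, 60)) \
    + rng.normal(scale=5.0, size=(n_images, 60))     # noisy linear readout

model_rdm = pdist(model_features, metric="correlation")  # condensed upper triangle
brain_rdm = pdist(voxel_patterns, metric="correlation")
rho, _ = spearmanr(model_rdm, brain_rdm)
print(f"fixed-RSA correlation: {rho:.2f}")
```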
Emotional recognition from the speech signal for a virtual education agent
NASA Astrophysics Data System (ADS)
Tickle, A.; Raghu, S.; Elshaw, M.
2013-06-01
This paper explores the extraction of features from the speech wave to perform intelligent emotion recognition. A feature extraction tool (openSMILE) was used to obtain a baseline set of 998 acoustic features from a set of emotional speech recordings made with a microphone. The initial features were reduced to the most important ones, so that recognition of emotions using a supervised neural network could be performed. Given that the future use of virtual education agents lies in making the agents more interactive, developing agents with the capability to recognise and adapt to the emotional state of humans is an important step.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Honorio, J.; Goldstein, R.
We propose a simple, well-grounded classification technique which is suited for group classification on brain fMRI data sets that have high dimensionality, a small number of subjects, high noise level, high subject variability, imperfect registration, and subtle cognitive effects. We propose threshold-split region as a new feature selection method and majority vote as the classification technique. Our method does not require a predefined set of regions of interest. We use the average across sessions, only one feature per experimental condition, a feature independence assumption, and simple classifiers. The seemingly counter-intuitive approach of using a simple design is supported by signal processing and statistical theory. Experimental results on two block-design data sets that capture brain function under distinct monetary rewards for cocaine-addicted and control subjects show that our method exhibits increased generalization accuracy compared to commonly used feature selection and classification techniques.
49 CFR 236.746 - Feature, restoring.
Code of Federal Regulations, 2010 CFR
2010-10-01
... Feature, restoring. An arrangement on an electro-pneumatic switch by means of which power is applied to restore the switch movement to full normal or to full reverse position, before the driving bar creeps sufficiently to unlock the switch, with control lever in normal or reverse position. [49 FR 3388, Jan. 26, 1984] ...
Latha, Manohar; Kavitha, Ganesan
2018-02-03
Schizophrenia (SZ) is a psychiatric disorder that especially affects individuals during their adolescence. There is a need to study the subanatomical regions of the SZ brain on magnetic resonance images (MRI) based on morphometry. In this work, an attempt was made to analyze alterations in structure and texture patterns in images of the SZ brain using the level-set method and Laws texture features. T1-weighted brain MRI scans from the Center of Biomedical Research Excellence (COBRE) database were considered for analysis. Segmentation was carried out using the level-set method. Geometrical and Laws texture features were extracted from the segmented brain stem, corpus callosum, cerebellum, and ventricle regions to analyze pattern changes in SZ. The level-set method segmented multiple brain regions, with higher similarity and correlation values compared with an optimized method. The geometric features obtained from the corpus callosum and ventricle regions showed significant variation (p < 0.00001) between normal and SZ brains. Laws texture features identified a heterogeneous appearance in the brain stem, corpus callosum, and ventricular regions, and features from the brain stem were correlated with the Positive and Negative Syndrome Scale (PANSS) score (p < 0.005). A framework of geometric and Laws texture features obtained from brain subregions can be used as a supplement for diagnosis of psychiatric disorders.
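Laws texture features are built by convolving the image with outer products of small 1D kernels (level, edge, spot, ripple) and then measuring local energy. A minimal sketch on a synthetic 2D slice; the 15-pixel energy window is a common default and an assumption here, not necessarily the setting used in this study.

```python
import numpy as np
from scipy.ndimage import convolve, uniform_filter

L5 = np.array([1, 4, 6, 4, 1], float)     # level
E5 = np.array([-1, -2, 0, 2, 1], float)   # edge
S5 = np.array([-1, 0, 2, 0, -1], float)   # spot
R5 = np.array([1, -4, 6, -4, 1], float)   # ripple

img = np.random.default_rng(0).random((128, 128))  # stand-in for an MRI slice
features = {}
for name_a, a in [('L5', L5), ('E5', E5), ('S5', S5), ('R5', R5)]:
    for name_b, b in [('L5', L5), ('E5', E5), ('S5', S5), ('R5', R5)]:
        mask = np.outer(a, b)                          # 5x5 Laws mask
        filtered = convolve(img, mask)
        energy = uniform_filter(np.abs(filtered), 15)  # local texture energy
        features[name_a + name_b] = energy.mean()

for name in ('L5E5', 'E5S5', 'S5R5', 'R5L5'):
    print(name, round(features[name], 3))
```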
Optimizing methods for linking cinematic features to fMRI data.
Kauttonen, Janne; Hlushchuk, Yevhen; Tikka, Pia
2015-04-15
One of the challenges of naturalistic neurosciences using movie-viewing experiments is how to interpret observed brain activations in relation to the multiplicity of time-locked stimulus features. As previous studies have shown less inter-subject synchronization across viewers of random video footage than of story-driven films, new methods need to be developed for the analysis of less story-driven contents. To optimize the linkage between our fMRI data, collected during viewing of the deliberately non-narrative silent film 'At Land' by Maya Deren (1944), and its annotated content, we combined elastic-net regularization with model-driven linear regression and the well-established data-driven independent component analysis (ICA) and inter-subject correlation (ISC) methods. In the linear regression analysis, both IC and region-of-interest (ROI) time-series were fitted with time-series of a total of 36 binary-valued and one real-valued tactile annotation of film features. Elastic-net regularization and cross-validation were applied in the ordinary least-squares linear regression in order to avoid over-fitting due to the multicollinearity of regressors; the results were compared against both partial least-squares (PLS) regression and un-regularized full-model regression. A non-parametric permutation testing scheme was applied to evaluate the statistical significance of the regression. We found statistically significant correlation between the annotation model and 9 of 40 ICs. The regression analysis was also repeated for a large set of cubic ROIs covering the grey matter. Both IC- and ROI-based regression analyses revealed activations in parietal and occipital regions, with additional smaller clusters in the frontal lobe. Furthermore, we found elastic-net-based regression more sensitive than PLS and un-regularized regression, since it detected a larger number of significant ICs and ROIs. Along with the ISC ranking methods, our regression analysis proved to be a feasible method for ordering the ICs based on their functional relevance to the annotated cinematic features. In comparison to hypothesis-driven manual pre-selection and observation of individual regressors, which is biased by choice, the novelty of our method lies in applying a data-driven approach to all content features simultaneously. We found the combination of regularized regression and ICA especially useful when analyzing fMRI data obtained using a non-narrative movie stimulus with a large set of complex and correlated features. Copyright © 2015. Published by Elsevier Inc.
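The core regression step can be sketched as follows: one ROI (or IC) time-series is regressed on the annotation time-series under elastic-net regularization, with cross-validation choosing the penalties. The shapes and synthetic data are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(0)
n_vols = 400
annot = rng.integers(0, 2, (n_vols, 37)).astype(float)  # 36 binary regressors...
annot[:, -1] = rng.random(n_vols)                        # ...plus 1 real-valued one
roi_ts = annot @ rng.standard_normal(37) + rng.standard_normal(n_vols)

# Cross-validated elastic net guards against over-fitting from collinear regressors.
model = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], cv=5).fit(annot, roi_ts)
print(f"chosen l1_ratio={model.l1_ratio_}, "
      f"surviving regressors={np.sum(model.coef_ != 0)}")
```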
Visual search, visual streams, and visual architectures.
Green, M
1991-10-01
Most psychological, physiological, and computational models of early vision suggest that retinal information is divided into a parallel set of feature modules. The dominant theories of visual search assume that these modules form a "blackboard" architecture: a set of independent representations that communicate only through a central processor. A review of research shows that blackboard-based theories, such as feature-integration theory, cannot easily explain the existing data. The experimental evidence is more consistent with a "network" architecture, which stresses that: (1) feature modules are directly connected to one another, (2) features and their locations are represented together, (3) feature detection and integration are not distinct processing stages, and (4) no executive control process, such as focal attention, is needed to integrate features. Attention is not a spotlight that synthesizes objects from raw features. Instead, it is better to conceptualize attention as an aperture which masks irrelevant visual information.
Identification of DNA-Binding Proteins Using Mixed Feature Representation Methods.
Qu, Kaiyang; Han, Ke; Wu, Song; Wang, Guohua; Wei, Leyi
2017-09-22
DNA-binding proteins play vital roles in cellular processes, such as DNA packaging, replication, transcription, regulation, and other DNA-associated activities. The current main prediction method is based on machine learning, and its accuracy mainly depends on the feature extraction method. Therefore, using an efficient feature representation method is important for enhancing classification accuracy. However, existing feature representation methods cannot efficiently distinguish DNA-binding proteins from non-DNA-binding proteins. In this paper, a multi-feature representation method, which combines three feature representation methods, namely K-Skip-N-Grams, information theory, and sequential and structural features (SSF), is used to represent the protein sequences and improve feature representation ability. In addition, the classifier is a support vector machine. The mixed-feature representation method is evaluated using 10-fold cross-validation and a test set. Feature vectors obtained from a combination of the three feature extractions show the best performance in 10-fold cross-validation, both without dimensionality reduction and with dimensionality reduction by max-relevance-max-distance. Moreover, the reduced mixed-feature method performs better than the non-reduced mixed-feature technique. The feature vectors that combine SSF and K-Skip-N-Grams show the best performance on the test set. Among these methods, mixed features exhibit superiority over the single features.
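A minimal sketch of the fuse-and-classify scheme: several feature blocks are concatenated into one mixed representation and an SVM is scored by 10-fold cross-validation. The block widths and synthetic data are assumptions.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 300
kskip = rng.random((n, 100))        # K-Skip-N-Grams block
info = rng.random((n, 20))          # information-theory block
ssf = rng.random((n, 60))           # sequential/structural features block
X = np.hstack([kskip, info, ssf])   # mixed-feature representation
y = rng.integers(0, 2, n)           # binding vs non-binding (stand-in labels)

scores = cross_val_score(SVC(kernel='rbf'), X, y, cv=10)
print(f"10-fold CV accuracy: {scores.mean():.2f}")
```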
An ensemble method for extracting adverse drug events from social media.
Liu, Jing; Zhao, Songzheng; Zhang, Xiaodi
2016-06-01
Because adverse drug events (ADEs) are a serious health problem and a leading cause of death, it is of vital importance to identify them correctly and in a timely manner. With the development of Web 2.0, social media has become a large data source for information on ADEs. The objective of this study is to develop a relation extraction system that uses natural language processing techniques to effectively distinguish between ADEs and non-ADEs in informal text on social media. We develop a feature-based approach that utilizes various lexical, syntactic, and semantic features. Information-gain-based feature selection is performed to address high-dimensional features. Then, we evaluate the effectiveness of four well-known kernel-based approaches (i.e., subset tree kernel, tree kernel, shortest dependency path kernel, and all-paths graph kernel) and several ensembles that are generated by adopting different combination methods (i.e., majority voting, weighted averaging, and stacked generalization). All of the approaches are tested using three data sets: two health-related discussion forums and one general social networking site (i.e., Twitter). When investigating the contribution of each feature subset, the feature-based approach attains the best area under the receiver operating characteristic curve (AUC) values, which are 78.6%, 72.2%, and 79.2% on the three data sets. When individual methods are used, we attain the best AUC values of 82.1%, 73.2%, and 77.0% using the subset tree kernel, shortest dependency path kernel, and feature-based approach on the three data sets, respectively. When using classifier ensembles, we achieve the best AUC values of 84.5%, 77.3%, and 84.5% on the three data sets, outperforming the baselines. Our experimental results indicate that ADE extraction from social media can benefit from feature selection. With respect to the effectiveness of different feature subsets, lexical features and semantic features can enhance the ADE extraction capability. Kernel-based approaches, which avoid the feature sparsity issue, are well suited to the ADE extraction problem. Combining different individual classifiers using suitable combination methods can further enhance ADE extraction effectiveness. Copyright © 2016 Elsevier B.V. All rights reserved.
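A sketch of combining individual classifiers by majority (soft) voting and by stacked generalization, scored with AUC; the base learners below are generic stand-ins for the kernel methods named above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=30, random_state=0)
base = [('svm', SVC(probability=True, random_state=0)),
        ('tree', DecisionTreeClassifier(random_state=0)),
        ('lr', LogisticRegression(max_iter=1000))]

vote = VotingClassifier(base, voting='soft')                    # majority voting
stack = StackingClassifier(base, final_estimator=LogisticRegression())
for name, clf in [('voting', vote), ('stacking', stack)]:
    auc = cross_val_score(clf, X, y, cv=5, scoring='roc_auc').mean()
    print(name, round(auc, 3))
```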
Friedman, Lee; Nixon, Mark S; Komogortsev, Oleg V
2017-01-01
We introduce the intraclass correlation coefficient (ICC) to the biometric community as an index of the temporal persistence, or stability, of a single biometric feature. It requires, as input, a feature measured on an interval or ratio scale that is reasonably normally distributed, and it can only be calculated if each subject is tested on 2 or more occasions. For a biometric system with multiple features available for selection, the ICC can be used to measure the relative stability of each feature. We show, for 14 distinct data sets (1 synthetic, 8 eye-movement-related, 2 gait-related, 2 face-recognition-related, and 1 brain-structure-related), that selecting the most stable features, based on the ICC, generally resulted in the best biometric performance. Analyses based on using only the most stable features produced superior Rank-1-Identification Rate (Rank-1-IR) performance in 12 of 14 databases (p = 0.0065, one-tailed), when compared to other sets of features, including the set of all features. For Equal Error Rate (EER), using a subset of only high-ICC features also produced superior performance in 12 of 14 databases (p = 0.0065, one-tailed). In general, then, for our databases, prescreening potential biometric features and choosing only highly reliable features yields better performance than choosing lower-ICC features or all features combined. We also determined that, as the ICC of a group of features increases, the median of the genuine similarity score distribution increases and the spread of this distribution decreases. There were no statistically significant similar relationships for the impostor distributions. We believe that the ICC will find many uses in biometric research. In the case of eye-movement-driven biometrics, the use of reliable features, as measured by the ICC, allowed us to achieve authentication performance with EER = 2.01%, which was not possible before.
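The ICC can be computed from a two-way ANOVA decomposition of an n-subjects by k-sessions matrix, one value per feature. A compact sketch of ICC(3,1) in the Shrout-Fleiss convention; the abstract does not state which ICC variant the authors used, so the choice here is an assumption.

```python
import numpy as np

def icc_3_1(Y):
    """ICC(3,1) for an (n subjects x k sessions) matrix of one feature."""
    n, k = Y.shape
    grand = Y.mean()
    ss_rows = k * np.sum((Y.mean(axis=1) - grand) ** 2)    # between subjects
    ss_cols = n * np.sum((Y.mean(axis=0) - grand) ** 2)    # between sessions
    ss_err = np.sum((Y - grand) ** 2) - ss_rows - ss_cols  # residual
    ms_rows = ss_rows / (n - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)

rng = np.random.default_rng(0)
subject_effect = rng.standard_normal((50, 1))                  # stable trait
feature = subject_effect + 0.3 * rng.standard_normal((50, 2))  # two sessions
print(round(icc_3_1(feature), 3))  # near 1 = temporally persistent feature
```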
Shao, Feng; Li, Kemeng; Lin, Weisi; Jiang, Gangyi; Yu, Mei; Dai, Qionghai
2015-10-01
Quality assessment of 3D images encounters more challenges than its 2D counterpart. Directly applying 2D image quality metrics is not the solution. In this paper, we propose a new full-reference quality assessment for stereoscopic images by learning binocular receptive field properties to be more in line with human visual perception. To be more specific, in the training phase, we learn a multiscale dictionary from the training database, so that the latent structure of images can be represented as a set of basis vectors. In the quality estimation phase, we compute a sparse feature similarity index based on the estimated sparse coefficient vectors by considering their phase difference and amplitude difference, and compute a global luminance similarity index by considering luminance changes. The final quality score is obtained by incorporating binocular combination based on sparse energy and sparse complexity. Experimental results on five public 3D image quality assessment databases demonstrate that in comparison with the most related existing methods, the devised algorithm achieves high consistency with subjective assessment.
Aad, G.; Abbott, B.; Abdallah, J.; ...
2015-03-09
Dijet events produced in LHC proton-proton collisions at a center-of-mass energy \(\sqrt{s}=8\) TeV are studied with the ATLAS detector using the full 2012 data set, with an integrated luminosity of 20.3 fb⁻¹. Dijet masses up to about 4.5 TeV are probed. Resonance-like features are not observed in the dijet mass spectrum. Limits on the cross section times acceptance are set at the 95% credibility level for various hypotheses of new phenomena in terms of mass or energy scale, as appropriate. This analysis excludes excited quarks with a mass below 4.06 TeV, color-octet scalars with a mass below 2.70 TeV, heavy W' bosons with a mass below 2.45 TeV, chiral W* bosons with a mass below 1.75 TeV, and quantum black holes with six extra space-time dimensions with threshold mass below 5.66 TeV.
An improved nuclear mass model: FRDM (2012)
NASA Astrophysics Data System (ADS)
Moller, Peter
2011-10-01
We have developed an improved nuclear mass model which we plan to finalize in 2012, so we designate it FRDM(2012). Relative to our previous mass table from 1995, we perform a full four-dimensional variation of the shape coordinates EPS2, EPS3, EPS4, and EPS6, we consider axially asymmetric shape degrees of freedom, and we vary the density symmetry parameter L. Other additional features are also implemented. With respect to the Audi 2003 database, we now have an accuracy of 0.57 MeV. We have carefully tested the extrapolation properties of the new mass table by adjusting model parameters to limited data sets and testing on extended data sets, and find it is highly reliable in new regions of nuclei. We discuss what the remaining differences between model calculations and experiment tell us about the limitations of the currently used effective single-particle potential and possible extensions. DOE No. DE-AC52-06NA25396.
NASA Astrophysics Data System (ADS)
Alloul, Adam; Christensen, Neil D.; Degrande, Céline; Duhr, Claude; Fuks, Benjamin
2014-06-01
The program FEYNRULES is a MATHEMATICA package developed to facilitate the implementation of new physics theories into high-energy physics tools. Starting from a minimal set of information, such as the model gauge symmetries, its particle content, parameters and Lagrangian, FEYNRULES provides all necessary routines to extract automatically from the Lagrangian (which can also be computed semi-automatically for supersymmetric theories) the associated Feynman rules. These can be further exported to several Monte Carlo event generators through dedicated interfaces, as well as translated into a PYTHON library, under the so-called UFO model format, agnostic of the model complexity, especially in terms of the Lorentz and/or color structures appearing in the vertices or the number of external legs. In this work, we briefly report on the most recent new features that have been added to FEYNRULES, including full support for spin-3/2 fermions, a new module allowing for the automated diagonalization of the particle spectrum, and a new set of routines dedicated to decay width calculations.
Libertarian paternalism and health care policy: a deliberative proposal.
Schiavone, Giuseppe; De Anna, Gabriele; Mameli, Matteo; Rebba, Vincenzo; Boniolo, Giovanni
2014-02-01
Cass Sunstein and Richard Thaler have been arguing for what they named libertarian paternalism (henceforth LP). Their proposal generated extensive debate as to how and whether LP might lead down a full-blown paternalistic slippery slope. LP has the indubitable merit of having hardwired the best of the empirical psychological and sociological evidence into public and private policy making. It is unclear, though, to what extent the implementation of policies so constructed could enhance the capability for the exercise of an autonomous citizenship. Sunstein and Thaler submit that in most cases in which one is confronted with a set of choices, some default option must be picked out. In those cases, whoever devises the features of the set of options ought to rank them according to the moral principle of non-maleficence and possibly to that of beneficence. In this paper we argue that LP can be better implemented if there is a preliminary deliberative debate among the stakeholders that elicits their preferences and makes it possible to rationally defend them.
Flowering time and seed dormancy control use external coincidence to generate life history strategy
Springthorpe, Vicki; Penfield, Steven
2015-01-01
Climate change is accelerating plant developmental transitions coordinated with the seasons in temperate environments. To understand the importance of these timing advances for a stable life history strategy, we constructed a full life cycle model of Arabidopsis thaliana. Modelling and field data reveal that a cryptic function of flowering time control is to limit seed set of winter annuals to an ambient temperature window which coincides with a temperature-sensitive switch in seed dormancy state. This coincidence is predicted to be conserved independent of climate at the expense of flowering date, suggesting that temperature control of flowering time has evolved to constrain seed set environment and therefore frequency of dormant and non-dormant seed states. We show that late flowering can disrupt this bet-hedging germination strategy. Our analysis shows that life history modelling can reveal hidden fitness constraints and identify non-obvious selection pressures as emergent features. DOI: http://dx.doi.org/10.7554/eLife.05557.001
Accurate double many-body expansion potential energy surface for the 2(1)A' state of N2O.
Li, Jing; Varandas, António J C
2014-08-28
An accurate double many-body expansion potential energy surface is reported for the 2(1)A' state of N2O. The new double many-body expansion (DMBE) form has been fitted to a wealth of ab initio points calculated at the multi-reference configuration interaction level, using the full-valence-complete-active-space wave function as reference and the cc-pVQZ basis set, and subsequently corrected semiempirically via the double many-body expansion-scaled external correlation method to extrapolate the calculated energies to the limit of a complete basis set and, most importantly, the limit of an infinite configuration interaction expansion. The topographical features of the novel potential energy surface are then examined in detail and compared with corresponding attributes of other potential functions available in the literature. Exploratory trajectories have also been run on this DMBE form with the quasiclassical trajectory method, with the thermal rate constant so determined at room temperature significantly enhancing agreement with experimental data.
Nie, Zhi; Vairavan, Srinivasan; Narayan, Vaibhav A; Ye, Jieping; Li, Qingqin S
2018-01-01
Identification of risk factors of treatment resistance may be useful to guide treatment selection, avoid inefficient trial-and-error, and improve major depressive disorder (MDD) care. We extended the work in predictive modeling of treatment resistant depression (TRD) via partition of the data from the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) cohort into a training and a testing dataset. We also included data from a small yet completely independent cohort RIS-INT-93 as an external test dataset. We used features from enrollment and level 1 treatment (up to week 2 response only) of STAR*D to explore the feature space comprehensively and applied machine learning methods to model TRD outcome at level 2. For TRD defined using QIDS-C16 remission criteria, multiple machine learning models were internally cross-validated in the STAR*D training dataset and externally validated in both the STAR*D testing dataset and RIS-INT-93 independent dataset with an area under the receiver operating characteristic curve (AUC) of 0.70-0.78 and 0.72-0.77, respectively. The upper bound for the AUC achievable with the full set of features could be as high as 0.78 in the STAR*D testing dataset. Model developed using top 30 features identified using feature selection technique (k-means clustering followed by χ2 test) achieved an AUC of 0.77 in the STAR*D testing dataset. In addition, the model developed using overlapping features between STAR*D and RIS-INT-93, achieved an AUC of > 0.70 in both the STAR*D testing and RIS-INT-93 datasets. Among all the features explored in STAR*D and RIS-INT-93 datasets, the most important feature was early or initial treatment response or symptom severity at week 2. These results indicate that prediction of TRD prior to undergoing a second round of antidepressant treatment could be feasible even in the absence of biomarker data.
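A minimal sketch of the select-then-validate protocol (top-30 features by a χ2 test, a logistic model, held-out AUC) on synthetic data; the k-means step preceding the χ2 test in the paper is omitted here for brevity.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = make_classification(n_samples=600, n_features=150, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

model = make_pipeline(
    MinMaxScaler(),                  # chi2 requires non-negative inputs
    SelectKBest(chi2, k=30),         # keep the top 30 features
    LogisticRegression(max_iter=1000),
)
model.fit(X_tr, y_tr)
print(f"held-out AUC: {roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]):.2f}")
```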
Machine Learning Feature Selection for Tuning Memory Page Swapping
2013-09-01
[Only extraction residue remains of this report: a figure caption, "Figure 4.1: Updated Feature Vector List. Features we added to the kernel are annotated with '(MLVM)'", and bibliography fragments citing P. J. Denning's working-set model (Communications of the ACM, 1968) and the WSClock virtual-memory management algorithm.]
Neuron’s eye view: Inferring features of complex stimuli from neural responses
Chen, Xin; Beck, Jeffrey M.
2017-01-01
Experiments that study neural encoding of stimuli at the level of individual neurons typically choose a small set of features present in the world—contrast and luminance for vision, pitch and intensity for sound—and assemble a stimulus set that systematically varies along these dimensions. Subsequent analysis of neural responses to these stimuli typically focuses on regression models, with experimenter-controlled features as predictors and spike counts or firing rates as responses. Unfortunately, this approach requires knowledge in advance about the relevant features coded by a given population of neurons. For domains as complex as social interaction or natural movement, however, the relevant feature space is poorly understood, and an arbitrary a priori choice of features may give rise to confirmation bias. Here, we present a Bayesian model for exploratory data analysis that is capable of automatically identifying the features present in unstructured stimuli based solely on neuronal responses. Our approach is unique within the class of latent state space models of neural activity in that it assumes that firing rates of neurons are sensitive to multiple discrete time-varying features tied to the stimulus, each of which has Markov (or semi-Markov) dynamics. That is, we are modeling neural activity as driven by multiple simultaneous stimulus features rather than intrinsic neural dynamics. We derive a fast variational Bayesian inference algorithm and show that it correctly recovers hidden features in synthetic data, as well as ground-truth stimulus features in a prototypical neural dataset. To demonstrate the utility of the algorithm, we also apply it to cluster neural responses and demonstrate successful recovery of features corresponding to monkeys and faces in the image set.
Pinto, Anabela; Almeida, José Pedro; Pinto, Susana; Pereira, João; Oliveira, António Gouveia; de Carvalho, Mamede
2010-11-01
Non-invasive ventilation (NIV) is an efficient method for treating respiratory failure in patients with amyotrophic lateral sclerosis (ALS). However, it requires a process of adaptation that is not always achieved due to poor compliance. The role of telemonitoring of NIV is not yet established. The aim was to test the advantage of using modem communication in NIV of ALS patients, in a prospective, single-blinded controlled trial. According to their residence, 40 consecutive ventilated ALS patients were assigned to one of two groups: a control group (G1, n=20), in which compliance and ventilator parameter settings were assessed during office visits; or an intervention group (G2, n=20), in which patients received a modem device connected to the ventilator. The number of office and emergency room visits and hospital admissions during the entire span of NIV use, and the number of parameter setting changes needed to achieve full compliance, were the primary outcome measurements. Demographic and clinical features were similar between the two groups at admission. No difference in compliance was found between the groups. The incidence of changes in parameter settings throughout the survival period with NIV was lower in G2 (p<0.0001), but it was increased during the initial period needed to achieve full compliance. The number of office or emergency room visits and in-hospital admissions was significantly lower in G2 (p<0.0001). Survival showed a trend favouring G2 (p=0.13). This study shows that telemonitoring reduces health care utilisation, with probable favourable implications for costs, survival, and functional status.
Novel chromatin texture features for the classification of pap smears
NASA Astrophysics Data System (ADS)
Bejnordi, Babak E.; Moshavegh, Ramin; Sujathan, K.; Malm, Patrik; Bengtsson, Ewert; Mehnert, Andrew
2013-03-01
This paper presents a set of novel structural texture features for quantifying nuclear chromatin patterns in cells on a conventional Pap smear. The features are derived from an initial segmentation of the chromatin into blob-like texture primitives. The results of a comprehensive feature selection experiment, including the set of proposed structural texture features and a range of different cytology features drawn from the literature, show that two of the four top-ranking features are structural texture features. They also show that a combination of structural and conventional features yields a classification performance of 0.954±0.019 (AUC±SE) for the discrimination of normal (NILM) and abnormal (LSIL and HSIL) slides. The results of a second classification experiment, using only normal-appearing cells from both normal and abnormal slides, demonstrate that a single structural texture feature measuring chromatin margination yields a classification performance of 0.815±0.019. Overall, the results demonstrate the efficacy of the proposed structural approach and show that it is possible to detect malignancy-associated changes (MACs) in Papanicolaou stain.
Contingent Attentional Capture
NASA Technical Reports Server (NTRS)
Remington, Roger; Folk, Charles L.
1994-01-01
Four experiments address the degree of top-down selectivity in attention capture by feature singletons through manipulations of the spatial relationship and featural similarity of target and distractor singletons in a modified spatial cuing paradigm. Contrary to previous studies, all four experiments show that when searching for a singleton target, an irrelevant featural singleton captures attention only when it is defined by the same feature value as the target. Experiments 2, 3, and 4 provide a potential explanation for this empirical discrepancy by showing that irrelevant singletons can produce distraction effects that are independent of shifts of spatial attention. The results further support the notion that attentional capture is contingent on top-down attentional control settings, but indicate that such settings can be instantiated at the level of feature values.
Generalizations of the subject-independent feature set for music-induced emotion recognition.
Lin, Yuan-Pin; Chen, Jyh-Horng; Duann, Jeng-Ren; Lin, Chin-Teng; Jung, Tzyy-Ping
2011-01-01
Electroencephalogram (EEG)-based emotion recognition has been an intensely growing field. Yet how to achieve acceptable accuracy in a practical system with as few electrodes as possible has received less attention. This study evaluates a set of subject-independent features, based on differential power asymmetry of symmetric electrode pairs [1], with emphasis on its applicability to subject variability in the music-induced emotion classification problem. The results of this study validate the feasibility of using subject-independent EEG features to classify four emotional states with acceptable accuracy at second-scale temporal resolution. These features could be generalized across subjects to detect emotion induced by music excerpts not limited to the music database that was used to derive the emotion-specific features.
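A sketch of the differential power asymmetry idea: the feature for each symmetric electrode pair is the difference in log band power between the left and right channels. The channel names, the alpha band, and the synthetic signals are assumptions.

```python
import numpy as np
from scipy.signal import welch

rng = np.random.default_rng(0)
fs = 128                                     # sampling rate in Hz
eeg = {ch: rng.standard_normal(fs * 10) for ch in ('F3', 'F4', 'T7', 'T8')}

def log_band_power(x, lo, hi):
    f, pxx = welch(x, fs=fs, nperseg=fs * 2)
    return np.log(pxx[(f >= lo) & (f < hi)].sum())

pairs = [('F3', 'F4'), ('T7', 'T8')]         # symmetric electrode pairs
features = [log_band_power(eeg[left], 8, 13) - log_band_power(eeg[right], 8, 13)
            for left, right in pairs]        # alpha-band asymmetry per pair
print(np.round(features, 3))
```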
Constant size descriptors for accurate machine learning models of molecular properties
NASA Astrophysics Data System (ADS)
Collins, Christopher R.; Gordon, Geoffrey J.; von Lilienfeld, O. Anatole; Yaron, David J.
2018-06-01
Two different classes of molecular representations for use in machine learning of thermodynamic and electronic properties are studied. The representations are evaluated by monitoring the performance of linear and kernel ridge regression models on well-studied data sets of small organic molecules. One class of representations studied here counts the occurrence of bonding patterns in the molecule. These require only the connectivity of atoms in the molecule, as may be obtained from a line diagram or a SMILES string. The second class utilizes the three-dimensional structure of the molecule. These include the Coulomb matrix and Bag of Bonds, which list the inter-atomic distances present in the molecule, and Encoded Bonds, which encode such lists into a feature vector whose length is independent of molecular size. The Encoded Bonds features introduced here have the advantage of leading to models that may be trained on smaller molecules and then used successfully on larger molecules. A wide range of feature sets are constructed by selecting, at each rank, either a graph- or geometry-based feature. Here, rank refers to the number of atoms involved in the feature, e.g., atom counts are rank 1, while Encoded Bonds are rank 2. For atomization energies in the QM7 data set, the best graph-based feature set gives a mean absolute error of 3.4 kcal/mol. Inclusion of 3D geometry substantially enhances the performance, with Encoded Bonds giving 2.4 kcal/mol, when used alone, and 1.19 kcal/mol, when combined with graph features.
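Among the geometry-based representations mentioned, the Coulomb matrix is the simplest to write down; it uses inter-atomic distances and nuclear charges, with the standard 0.5·Z^2.4 diagonal convention. A sketch for a single molecule (the water geometry is shown purely for illustration).

```python
import numpy as np

Z = np.array([8.0, 1.0, 1.0])              # nuclear charges: O, H, H
R = np.array([[0.000,  0.000,  0.117],
              [0.000,  0.757, -0.469],
              [0.000, -0.757, -0.469]])    # coordinates in Angstrom

n = len(Z)
C = np.empty((n, n))
for i in range(n):
    for j in range(n):
        if i == j:
            C[i, j] = 0.5 * Z[i] ** 2.4    # diagonal: self-interaction term
        else:
            C[i, j] = Z[i] * Z[j] / np.linalg.norm(R[i] - R[j])
print(np.round(C, 2))
```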
NASA Astrophysics Data System (ADS)
Bangs, Corey F.; Kruse, Fred A.; Olsen, Chris R.
2013-05-01
Hyperspectral data were assessed to determine the effect of integrating spectral data and extracted texture feature data on classification accuracy. Four separate spectral ranges (hundreds of spectral bands total) were used from the Visible and Near Infrared (VNIR) and Shortwave Infrared (SWIR) portions of the electromagnetic spectrum. Haralick texture features (contrast, entropy, and correlation) were extracted from the average gray-level image for each of the four spectral ranges studied. A maximum likelihood classifier was trained using a set of ground truth regions of interest (ROIs) and applied separately to the spectral data, texture data, and a fused dataset containing both. Classification accuracy was measured by comparison of results to a separate verification set of test ROIs. Analysis indicates that the spectral range (source of the gray-level image) used to extract the texture feature data has a significant effect on the classification accuracy. This result applies to texture-only classifications as well as the classification of integrated spectral data and texture feature data sets. Overall classification improvement for the integrated data sets was near 1%. Individual improvement for integrated spectral and texture classification of the "Urban" class showed approximately 9% accuracy increase over spectral-only classification. Texture-only classification accuracy was highest for the "Dirt Path" class at approximately 92% for the spectral range from 947 to 1343 nm. This research demonstrates the effectiveness of texture feature data for more accurate analysis of hyperspectral data and the importance of selecting the correct spectral range to be used for the gray-level image source to extract these features.
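A sketch of the texture-extraction and fusion step: gray-level co-occurrence statistics are computed from the average gray-level image and appended to the per-pixel spectra; scikit-image's GLCM utilities stand in for the exact Haralick implementation used above, and the sizes are illustrative.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

rng = np.random.default_rng(0)
gray = (rng.random((64, 64)) * 32).astype(np.uint8)  # average gray-level image

glcm = graycomatrix(gray, distances=[1], angles=[0], levels=32, symmetric=True)
texture = [graycoprops(glcm, p)[0, 0] for p in ('contrast', 'correlation')]

spectral = rng.random((64 * 64, 100))                # per-pixel spectra
fused = np.hstack([spectral, np.tile(texture, (64 * 64, 1))])
print(fused.shape)                                    # spectral + texture features
```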
Machine learning approach to automatic exudate detection in retinal images from diabetic patients
NASA Astrophysics Data System (ADS)
Sopharak, Akara; Dailey, Matthew N.; Uyyanonvara, Bunyarit; Barman, Sarah; Williamson, Tom; Thet Nwe, Khine; Aye Moe, Yin
2010-01-01
Exudates are among the preliminary signs of diabetic retinopathy, a major cause of vision loss in diabetic patients. Early detection of exudates could improve patients' chances to avoid blindness. In this paper, we present a series of experiments on feature selection and exudates classification using naive Bayes and support vector machine (SVM) classifiers. We first fit the naive Bayes model to a training set consisting of 15 features extracted from each of 115,867 positive examples of exudate pixels and an equal number of negative examples. We then perform feature selection on the naive Bayes model, repeatedly removing features from the classifier, one by one, until classification performance stops improving. To find the best SVM, we begin with the best feature set from the naive Bayes classifier, and repeatedly add the previously-removed features to the classifier. For each combination of features, we perform a grid search to determine the best combination of hyperparameters ν (tolerance for training errors) and γ (radial basis function width). We compare the best naive Bayes and SVM classifiers to a baseline nearest neighbour (NN) classifier using the best feature sets from both classifiers. We find that the naive Bayes and SVM classifiers perform better than the NN classifier. The overall best sensitivity, specificity, precision, and accuracy are 92.28%, 98.52%, 53.05%, and 98.41%, respectively.
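The greedy backward elimination loop described above can be sketched directly: drop one feature at a time as long as cross-validated accuracy keeps improving. The synthetic data and fold count are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=500, n_features=15, random_state=0)
kept = list(range(X.shape[1]))
best = cross_val_score(GaussianNB(), X[:, kept], y, cv=5).mean()

improved = True
while improved and len(kept) > 1:
    improved = False
    for f in list(kept):
        trial = [k for k in kept if k != f]
        score = cross_val_score(GaussianNB(), X[:, trial], y, cv=5).mean()
        if score > best:            # removing feature f helps, so drop it
            best, kept, improved = score, trial, True
            break
print(len(kept), round(best, 3))    # surviving features and their CV accuracy
```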
Bayesian network interface for assisting radiology interpretation and education
NASA Astrophysics Data System (ADS)
Duda, Jeffrey; Botzolakis, Emmanuel; Chen, Po-Hao; Mohan, Suyash; Nasrallah, Ilya; Rauschecker, Andreas; Rudie, Jeffrey; Bryan, R. Nick; Gee, James; Cook, Tessa
2018-03-01
In this work, we present the use of Bayesian networks for radiologist decision support during clinical interpretation. This computational approach has the advantage of avoiding incorrect diagnoses that result from known human cognitive biases such as anchoring bias, framing effect, availability bias, and premature closure. To integrate Bayesian networks into clinical practice, we developed an open-source web application that provides diagnostic support for a variety of radiology disease entities (e.g., basal ganglia diseases, bone lesions). The Clinical tool presents the user with a set of buttons representing clinical and imaging features of interest. These buttons are used to set the value for each observed feature. As features are identified, the conditional probabilities for each possible diagnosis are updated in real time. Additionally, using sensitivity analysis, the interface may be set to inform the user which remaining imaging features provide maximum discriminatory information to choose the most likely diagnosis. The Case Submission tools allow the user to submit a validated case and the associated imaging features to a database, which can then be used for future tuning/testing of the Bayesian networks. These submitted cases are then reviewed by an assigned expert using the provided QC tool. The Research tool presents users with cases with previously labeled features and a chosen diagnosis, for the purpose of performance evaluation. Similarly, the Education page presents cases with known features, but provides real time feedback on feature selection.
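The real-time probability update behind such an interface reduces, under a feature-independence assumption, to repeated Bayes updates over the candidate diagnoses. A toy sketch; the diagnoses, features, and probability tables are invented for illustration and are not the tool's networks.

```python
import numpy as np

diagnoses = ['disease_A', 'disease_B', 'disease_C']
prior = np.array([0.5, 0.3, 0.2])
# P(feature present | diagnosis), one entry per diagnosis
likelihood = {'calcification':   np.array([0.8, 0.2, 0.5]),
              'T2_hyperintense': np.array([0.1, 0.7, 0.4])}

posterior = prior.copy()
for feature, observed in [('calcification', True), ('T2_hyperintense', False)]:
    p = likelihood[feature] if observed else 1 - likelihood[feature]
    posterior = posterior * p
    posterior /= posterior.sum()            # renormalize after each observation
    print(feature, observed, dict(zip(diagnoses, posterior.round(3))))
```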
Detection of Tampering Inconsistencies on Mobile Photos
NASA Astrophysics Data System (ADS)
Cao, Hong; Kot, Alex C.
Fast proliferation of mobile cameras and the deteriorating trust in digital images have created needs in determining the integrity of photos captured by mobile devices. As tampering often creates some inconsistencies, we propose in this paper a novel framework to statistically detect image tampering inconsistency using accurately detected demosaicing weights features. By first cropping four non-overlapping blocks, each from one of the four quadrants in the mobile photo, we extract a set of demosaicing weights features from each block based on a partial derivative correlation model. Through regularizing the eigenspectrum of the within-photo covariance matrix and performing eigenfeature transformation, we further derive a compact set of eigen demosaicing weights features, which are sensitive to image signal mixing from different photo sources. A metric is then proposed to quantify the inconsistency based on the eigen weights features among the blocks cropped from different regions of the mobile photo. Through comparison, we show that our eigen weights features perform better than the eigen features extracted from several other conventional sets of statistical forensics features in detecting the presence of tampering. Experimentally, our method shows a good confidence in tampering detection, especially when one of the four cropped blocks is from a different camera model or brand with a different demosaicing process.
Morphological and wavelet features towards sonographic thyroid nodules evaluation.
Tsantis, Stavros; Dimitropoulos, Nikos; Cavouras, Dionisis; Nikiforidis, George
2009-03-01
This paper presents a computer-based classification scheme that utilizes various morphological and novel wavelet-based features for malignancy risk evaluation of thyroid nodules in ultrasonography. The study comprised 85 ultrasound images from patients that were cytologically confirmed (54 low-risk and 31 high-risk). A set of 20 features (12 based on nodule boundary shape and 8 based on wavelet local maxima located within each nodule) was generated. Two powerful pattern recognition algorithms (support vector machines and probabilistic neural networks) were designed and developed in order to quantify the discriminative power of the introduced features. A comparative study was also conducted to estimate the impact speckle had on the classification procedure. The diagnostic sensitivity and specificity of both classifiers were assessed by means of receiver operating characteristic (ROC) analysis. In the speckle-free feature set, the area under the ROC curve was 0.96 for the support vector machines classifier, whereas for the probabilistic neural networks it was 0.91. In the feature set with speckle, the corresponding areas under the ROC curves were 0.88 and 0.86, respectively, for the two classifiers. The proposed features can increase classification accuracy and decrease the rate of missed diagnosis and misdiagnosis in thyroid cancer control.
Astronomical Software Directory Service
NASA Astrophysics Data System (ADS)
Hanisch, Robert J.; Payne, Harry; Hayes, Jeffrey
1997-01-01
With the support of NASA's Astrophysics Data Program (NRA 92-OSSA-15), we have developed the Astronomical Software Directory Service (ASDS): a distributed, searchable, WWW-based database of software packages and their related documentation. ASDS provides integrated access to 56 astronomical software packages, with more than 16,000 URLs indexed for full-text searching. Users are performing about 400 searches per month. A new aspect of our service is the inclusion of telescope and instrumentation manuals, which prompted us to change the name to the Astronomical Software and Documentation Service. ASDS was originally conceived to serve two purposes: to provide a useful Internet service in an area of expertise of the investigators (astronomical software), and as a research project to investigate various architectures for searching through a set of documents distributed across the Internet. Two of the co-investigators were then installing and maintaining astronomical software as their primary job responsibility. We felt that a service which incorporated our experience in this area would be more useful than a straightforward listing of software packages. The original concept was for a service based on the client/server model, which would function as a directory/referral service rather than as an archive. For performing the searches, we began our investigation with a decision to evaluate the Isite software from the Center for Networked Information Discovery and Retrieval (CNIDR). This software was intended as a replacement for Wide-Area Information Service (WAIS), a client/server technology for performing full-text searches through a set of documents. Isite had some additional features that we considered attractive, and we enjoyed the cooperation of the Isite developers, who were happy to have ASDS as a demonstration project. We ended up staying with the software throughout the project, making modifications to take advantage of new features as they came along, as well as influencing the software development. The Web interface to the search engine is provided by a gateway program written in C++ by a consultant to the project (A. Warnock).
Feature instructions improve face-matching accuracy
Bindemann, Markus
2018-01-01
Identity comparisons of photographs of unfamiliar faces are prone to error but important for applied settings, such as person identification at passport control. Finding techniques to improve face-matching accuracy is therefore an important contemporary research topic. This study investigated whether matching accuracy can be improved by instruction to attend to specific facial features. Experiment 1 showed that instruction to attend to the eyebrows enhanced matching accuracy for optimized same-day same-race face pairs but not for other-race faces. By contrast, accuracy was unaffected by instruction to attend to the eyes, and declined with instruction to attend to ears. Experiment 2 replicated the eyebrow-instruction improvement with a different set of same-race faces, comprising both optimized same-day and more challenging different-day face pairs. These findings suggest that instruction to attend to specific features can enhance face-matching accuracy, but feature selection is crucial and generalization across face sets may be limited.
Automated detection of pulmonary nodules in CT images with support vector machines
NASA Astrophysics Data System (ADS)
Liu, Lu; Liu, Wanyu; Sun, Xiaoming
2008-10-01
Many methods have been proposed to prevent radiologists from failing to diagnose small pulmonary nodules. Recently, support vector machines (SVMs) have received increasing attention for pattern recognition. In this paper, we present a computerized system aimed at pulmonary nodule detection; it identifies the lung field, extracts a set of candidate regions with a high sensitivity ratio, and then classifies candidates by the use of SVMs. The computer-aided diagnosis (CAD) system presented in this paper supports the diagnosis of pulmonary nodules from computed tomography (CT) images as inflammation, tuberculoma, granuloma, sclerosing hemangioma, or malignant tumor. Five texture feature sets were extracted for each lesion, while a genetic algorithm based feature selection method was applied to identify the most robust features. The selected feature set was fed into an ensemble of SVM classifiers. The achieved classification performance was 100%, 92.75% and 90.23% in the training, validation and testing sets, respectively. It is concluded that computerized analysis of medical images in combination with artificial intelligence can be used in clinical practice and may contribute to more efficient diagnosis.
Solvent-accessible surface area: How well can be applied to hot-spot detection?
Martins, João M; Ramos, Rui M; Pimenta, António C; Moreira, Irina S
2014-03-01
A detailed comprehension of protein-based interfaces is essential for rational drug development. One of the key features of these interfaces is their solvent-accessible surface area (SASA) profile. With that in mind, we tested a group of 12 SASA-based features for their ability to correlate with and differentiate hot- and null-spots. These were tested on three different data sets: explicit-water MD, implicit-water MD, and static PDB structures. We found no discernible improvement with the use of the more comprehensive data sets obtained from molecular dynamics. The features tested were shown to be capable of discerning between hot- and null-spots, while presenting low correlations. Residue standardization, such as relSASAi or rel/resSASAi, improved the features as tools to predict ΔΔGbinding values. A new method using support vector machine learning algorithms was developed: SBHD (SASA-Based Hot-spot Detection). This method presents a precision, recall, and F1 score of 0.72, 0.81, and 0.76 for the training set and 0.91, 0.73, and 0.81 for an independent test set. Copyright © 2013 Wiley Periodicals, Inc.
Natural image statistics and low-complexity feature selection.
Vasconcelos, Manuela; Vasconcelos, Nuno
2009-02-01
Low-complexity feature selection is analyzed in the context of visual recognition. It is hypothesized that high-order dependences of bandpass features contain little information for discrimination of natural images. This hypothesis is characterized formally by the introduction of the concepts of conjunctive interference and decomposability order of a feature set. Necessary and sufficient conditions for the feasibility of low-complexity feature selection are then derived in terms of these concepts. It is shown that the intrinsic complexity of feature selection is determined by the decomposability order of the feature set and not its dimension. Feature selection algorithms are then derived for all levels of complexity and are shown to be approximated by existing information-theoretic methods, which they consistently outperform. The new algorithms are also used to objectively test the hypothesis of low decomposability order through comparison of classification performance. It is shown that, for image classification, the gain of modeling feature dependencies has strongly diminishing returns: best results are obtained under the assumption of decomposability order 1. This suggests a generic law for bandpass features extracted from natural images: that the effect, on the dependence of any two features, of observing any other feature is constant across image classes.
A Feature-based Approach to Big Data Analysis of Medical Images
Toews, Matthew; Wachinger, Christian; Estepar, Raul San Jose; Wells, William M.
2015-01-01
This paper proposes an inference method well-suited to large sets of medical images. The method is based upon a framework where distinctive 3D scale-invariant features are indexed efficiently to identify approximate nearest-neighbor (NN) feature matches in O(log N) computational complexity in the number of images N. It thus scales well to large data sets, in contrast to methods based on pair-wise image registration or feature matching requiring O(N) complexity. Our theoretical contribution is a density estimator based on a generative model that generalizes kernel density estimation and K-nearest neighbor (KNN) methods. The estimator can be used for on-the-fly queries, without requiring explicit parametric models or an off-line training phase. The method is validated on a large multi-site data set of 95,000,000 features extracted from 19,000 lung CT scans. Subject-level classification identifies all images of the same subjects across the entire data set despite deformation due to breathing state, including unintentional duplicate scans. State-of-the-art performance is achieved in predicting chronic pulmonary obstructive disorder (COPD) severity across the 5-category GOLD clinical rating, with an accuracy of 89% if both exact and one-off predictions are considered correct.
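The sub-linear matching this method depends on can be illustrated with a tree-based index: build once offline, then answer per-feature nearest-neighbor queries without scanning the whole database. The k-d tree below is a stand-in for the paper's own indexing structure, and the sizes are illustrative.

```python
import numpy as np
from sklearn.neighbors import KDTree

rng = np.random.default_rng(0)
db_features = rng.standard_normal((100_000, 64))  # descriptors from many scans
tree = KDTree(db_features)                         # built once, offline

query = rng.standard_normal((500, 64))             # descriptors from a new scan
dist, idx = tree.query(query, k=5)                 # fast NN lookup per feature
print(idx.shape)                                   # 5 candidate matches per query
```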
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pon, R K; Cardenas, A F; Buttler, D J
The definition of what makes an article interesting varies from user to user and continually evolves even for a single user. As a result, for news recommendation systems, useless document features cannot be determined a priori and all features are usually considered for interestingness classification. Consequently, the presence of currently useless features degrades classification performance [1], particularly over the initial set of news articles being classified. The initial set of documents is critical for a user when considering which particular news recommendation system to adopt. To address these problems, we introduce an improved version of the naive Bayes classifier with online feature selection. We use correlation to determine the utility of each feature and take advantage of the conditional independence assumption used by naive Bayes for online feature selection and classification. The augmented naive Bayes classifier performs 28% better than the traditional naive Bayes classifier in recommending news articles from the Yahoo! RSS feeds.
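A sketch of the correlation-scored online selection idea: each feature's utility is its correlation with the interestingness label, and only currently useful features feed the naive Bayes classifier. The threshold and synthetic data are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import BernoulliNB

X, y = make_classification(n_samples=300, n_features=40, random_state=0)
Xb = (X > 0).astype(int)          # binary word-presence features

corr = np.array([abs(np.corrcoef(Xb[:, j], y)[0, 1]) for j in range(Xb.shape[1])])
useful = corr > 0.1               # drop currently useless features
print(useful.sum(), 'features kept')
print(cross_val_score(BernoulliNB(), Xb[:, useful], y, cv=5).mean())
```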
Attention-based image similarity measure with application to content-based information retrieval
NASA Astrophysics Data System (ADS)
Stentiford, Fred W. M.
2003-01-01
Whilst storage and capture technologies are able to cope with huge numbers of images, image retrieval is in danger of rendering many repositories valueless because of the difficulty of access. This paper proposes a similarity measure that imposes only very weak assumptions on the nature of the features used in the recognition process. This approach does not make use of a pre-defined set of feature measurements which are extracted from a query image and used to match those from database images, but instead generates features on a trial and error basis during the calculation of the similarity measure. This has the significant advantage that features that determine similarity can match whatever image property is important in a particular region whether it be a shape, a texture, a colour or a combination of all three. It means that effort is expended searching for the best feature for the region rather than expecting that a fixed feature set will perform optimally over the whole area of an image and over every image in a database. The similarity measure is evaluated on a problem of distinguishing similar shapes in sets of black and white symbols.
WND-CHARM: Multi-purpose image classification using compound image transforms
Orlov, Nikita; Shamir, Lior; Macura, Tomasz; Johnston, Josiah; Eckley, D. Mark; Goldberg, Ilya G.
2008-01-01
We describe a multi-purpose image classifier that can be applied to a wide variety of image classification tasks without modifications or fine-tuning, and yet provide classification accuracy comparable to state-of-the-art task-specific image classifiers. The proposed image classifier first extracts a large set of 1025 image features including polynomial decompositions, high contrast features, pixel statistics, and textures. These features are computed on the raw image, transforms of the image, and transforms of transforms of the image. The feature values are then used to classify test images into a set of pre-defined image classes. This classifier was tested on several different problems including biological image classification and face recognition. Although we cannot make a claim of universality, our experimental results show that this classifier performs as well or better than classifiers developed specifically for these image classification tasks. Our classifier’s high performance on a variety of classification problems is attributed to (i) a large set of features extracted from images; and (ii) an effective feature selection and weighting algorithm sensitive to specific image classification problems. The algorithms are available for free download from openmicroscopy.org.
Feature Selection based on Machine Learning in MRIs for Hippocampal Segmentation
NASA Astrophysics Data System (ADS)
Tangaro, Sabina; Amoroso, Nicola; Brescia, Massimo; Cavuoti, Stefano; Chincarini, Andrea; Errico, Rosangela; Inglese, Paolo; Longo, Giuseppe; Maglietta, Rosalia; Tateo, Andrea; Riccio, Giuseppe; Bellotti, Roberto
2015-01-01
Neurodegenerative diseases are frequently associated with structural changes in the brain. Magnetic resonance imaging (MRI) scans can show these variations and therefore can be used as a supportive feature for a number of neurodegenerative diseases. The hippocampus has been known to be a biomarker for Alzheimer disease and other neurological and psychiatric diseases. However, this requires accurate, robust, and reproducible delineation of hippocampal structures. Fully automatic methods usually take a voxel-based approach, in which a number of local features are calculated for each voxel. In this paper, we compared four different techniques for feature selection from a set of 315 features extracted for each voxel: (i) a filter method based on the Kolmogorov-Smirnov test; two wrapper methods, namely (ii) sequential forward selection and (iii) sequential backward elimination; and (iv) an embedded method based on the random forest classifier. These were trained on a set of 10 T1-weighted brain MRIs and tested on an independent set of 25 subjects. The resulting segmentations were compared with manual reference labelling. By using only 23 features for each voxel (sequential backward elimination), we obtained performance comparable to the state of the art with respect to the standard tool FreeSurfer.
NASA Astrophysics Data System (ADS)
Saha, Ashirbani; Harowicz, Michael R.; Grimm, Lars J.; Kim, Connie E.; Ghate, Sujata V.; Walsh, Ruth; Mazurowski, Maciej A.
2018-02-01
One of the methods widely used to measure the proliferative activity of cells in breast cancer patients is the immunohistochemical (IHC) measurement of the percentage of cells stained for nuclear antigen Ki-67. Use of Ki-67 expression as a prognostic marker is still under investigation. However, numerous clinical studies have reported an association between a high Ki-67 and overall survival (OS) and disease-free survival (DFS). On the other hand, to offer a non-invasive alternative for determining Ki-67 expression, researchers have made recent attempts to study the association of Ki-67 expression with magnetic resonance (MR) imaging features of breast cancer in small cohorts (<30). Here, we present a large-scale evaluation of the relationship between imaging features and Ki-67 score: (a) we used a set of 450 invasive breast cancer patients; (b) we extracted a set of 529 imaging features of shape and enhancement from the breast, tumor, and fibroglandular tissue of the patients; (c) we used a subset of patients as the training set to select features and trained a multivariate logistic regression model to predict high versus low Ki-67 values; and (d) we validated the performance of the trained model in an independent test set using the area under the receiver operating characteristic (ROC) curve (AUC) of the predicted values. Our model was able to predict high versus low Ki-67 in the test set with an AUC of 0.67 (95% CI: 0.58-0.75, p<1.1e-04). Thus, our experiments demonstrated a moderate association between Ki-67 values and MR-extracted imaging features.
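Steps (c) and (d) amount to a standard train/validate workflow; a minimal sketch assuming scikit-learn, with placeholder data in the stated dimensions (450 patients, 529 features):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(450, 529))    # 450 patients x 529 imaging features
y = rng.integers(0, 2, size=450)   # placeholder high/low Ki-67 labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)  # multivariate LR
print("test AUC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
```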
Assessment of chronic post-surgical pain after knee replacement: development of a core outcome set.
Wylde, V; MacKichan, F; Bruce, J; Gooberman-Hill, R
2015-05-01
Approximately 20% of patients experience chronic post-surgical pain (CPSP) after total knee replacement (TKR). There is scope to improve assessment of CPSP after TKR, and this study aimed to develop a core outcome set. Eighty patients and 43 clinicians were recruited into a three-round modified Delphi study. In Round 1, participants were presented with 56 pain features identified from a systematic review, structured interviews with patients, and focus groups with clinicians. Participants assigned importance ratings, using a 1-9 scale, to individual pain features; those features rated as most important were retained in subsequent rounds. Consensus that a pain feature should be included in the core outcome set was defined as the feature being rated 7-9 by ≥70% of both panels (patients and clinicians) and 1-3 by ≤15% of both panels, or being rated 7-9 by ≥90% of one panel. Round 1 was completed by 71 patients and 39 clinicians, and Round 3 by 62 patients and 33 clinicians. The final consensus was that 33 pain features were important. These were grouped into an 8-item core outcome set comprising: pain intensity, pain interference with daily living, pain and physical functioning, temporal aspects of pain, pain description, emotional aspects of pain, use of pain medication, and improvement and satisfaction with pain relief. This core outcome set serves to guide assessment of CPSP after TKR. Consistency in assessment can promote standardized reporting and facilitate comparability between studies that address a common but understudied type of CPSP. © 2014 The Authors. European Journal of Pain published by John Wiley & Sons Ltd on behalf of European Pain Federation - EFIC®.
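The inclusion criterion can be stated compactly as a rule; the function below encodes it directly from the definitions above, taking the fraction of each panel giving ratings 7-9 and 1-3 as inputs:

```python
def reaches_consensus(p7_9_patients, p1_3_patients, p7_9_clinicians, p1_3_clinicians):
    # True if a pain feature meets the study's inclusion criterion.
    both_panels = (p7_9_patients >= 0.70 and p7_9_clinicians >= 0.70
                   and p1_3_patients <= 0.15 and p1_3_clinicians <= 0.15)
    one_panel = p7_9_patients >= 0.90 or p7_9_clinicians >= 0.90
    return both_panels or one_panel

# Example: 75% of both panels rate a feature 7-9 and few rate it 1-3.
print(reaches_consensus(0.75, 0.10, 0.72, 0.05))  # True
```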
NASA Astrophysics Data System (ADS)
Accomazzi, Alberto; Henneken, E.; Grant, C. S.; Kurtz, M. J.; Di Milia, G.; Luker, J.; Thompson, D. M.; Bohlen, E.; Murray, S. S.
2011-05-01
ADS Labs is a platform that ADS is introducing in order to test and receive feedback from the community on new technologies and prototype services. Currently, ADS Labs features a new interface for abstract searches, faceted filtering of results, visualization of co-authorship networks, article-level recommendations, and a full-text search service. The streamlined abstract search interface provides a simple, one-box search with options for ranking results based on a paper's relevancy, freshness, number of citations, and downloads. In addition, it provides advanced rankings based on collaborative filtering techniques. The faceted filtering interface allows users to narrow search results based on a particular property or set of properties ("facets"), allowing users to manage large lists and explore the relationships among them. For any set or sub-set of records, the co-authorship network can be visualized in an interactive way, offering a view of the distribution of contributors and their inter-relationships. This provides an immediate way to detect groups and collaborations involved in a particular research field. For a majority of papers in Astronomy, our new interface will provide a list of related articles of potential interest. The recommendations are based on a number of factors, including text similarity, citations, and co-readership information. The new full-text search interface allows users to find all instances of particular words or phrases in the body of the articles in our full-text archive. This includes all of the scanned literature in ADS as well as a select portion of the current astronomical literature, including ApJ, ApJS, AJ, MNRAS, PASP, A&A, and soon additional content from Springer journals. Full-text search results include a list of the matching papers as well as a list of "snippets" of text highlighting the context in which the search terms were found. ADS Labs is available at http://adslabs.org
Joint inversion of NMR and SIP data to estimate pore size distribution of geomaterials
NASA Astrophysics Data System (ADS)
Niu, Qifei; Zhang, Chi
2018-03-01
There is growing interest in using geophysical tools to characterize the microstructure of geomaterials because of their non-invasive nature and their applicability in the field. In these applications, multiple types of geophysical data sets are usually processed separately, which may be inadequate to constrain the key features of the target variables. Therefore, simultaneous processing of multiple data sets could potentially improve the resolution. In this study, we propose a method to estimate pore size distribution by joint inversion of nuclear magnetic resonance (NMR) T2 relaxation and spectral induced polarization (SIP) spectra. The petrophysical relation between NMR T2 relaxation time and SIP relaxation time is incorporated in a nonlinear least-squares problem formulation, which is solved using the Gauss-Newton method. The joint inversion scheme is applied to a synthetic sample and a Berea sandstone sample. The jointly estimated pore size distributions are very close to the true model and to results from other experimental methods. Even when knowledge of the petrophysical models of the sample is incomplete, the joint inversion can still capture the main features of the pore size distribution of the samples, including the general shape and relative peak positions of the distribution curves. It is also found from the numerical example that the surface relaxivity of the sample can be extracted with the joint inversion of NMR and SIP data if the diffusion coefficient of the ions in the electrical double layer is known. Compared to individual inversions, the joint inversion improves the resolution of the estimated pore size distribution because of the additional data sets. The proposed approach might constitute a first step towards a comprehensive joint inversion that can extract the full pore geometry information of a geomaterial from NMR and SIP data.
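A minimal, generic Gauss-Newton iteration for a nonlinear least-squares problem of this kind, assuming NumPy and a finite-difference Jacobian; the actual NMR/SIP forward models and petrophysical coupling are not reproduced:

```python
import numpy as np

def gauss_newton(residual, m0, n_iter=20, eps=1e-6):
    # Minimize ||residual(m)||^2 via repeated linearization of the residual.
    m = m0.astype(float)
    for _ in range(n_iter):
        r = residual(m)
        J = np.empty((r.size, m.size))
        for j in range(m.size):           # finite-difference Jacobian columns
            dm = np.zeros_like(m)
            dm[j] = eps
            J[:, j] = (residual(m + dm) - r) / eps
        step, *_ = np.linalg.lstsq(J, -r, rcond=None)  # Gauss-Newton step
        m = m + step
    return m

# Toy example: fit y = a * exp(-b x), loosely analogous to a relaxation decay.
x = np.linspace(0, 1, 50)
y = 2.0 * np.exp(-3.0 * x)
m_hat = gauss_newton(lambda m: m[0] * np.exp(-m[1] * x) - y, np.array([1.0, 1.0]))
print(m_hat)  # approximately [2.0, 3.0]
```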
Bogale, Bezawork Afework; Aoyama, Masato; Sugita, Shoei
2011-01-01
We trained jungle crows to discriminate among photographs of human faces according to sex in a simultaneous two-alternative task to study their categorical learning ability. Once the crows reached a discrimination criterion (greater than or equal to 80% correct choices in two consecutive sessions; binomial probability test, p<.05), they next received generalization and transfer tests (i.e., greyscale, contour, and 'full' occlusion) in Experiment 1, followed by a 'partial' occlusion test in Experiment 2 and a random stimulus-pair test in Experiment 3. Jungle crows learned the discrimination task in a few trials and successfully generalized to novel stimulus sets. However, all crows failed the greyscale test and half of them the contour test. Neither occlusion of internal facial features nor random pairing of exemplars affected the discrimination performance of most, if not all, crows. We suggest that jungle crows categorize human face photographs based on perceptual similarities, as other non-human animals do, and colour appears to be the most salient feature controlling discriminative behaviour. However, the variability in the use of facial contours among individuals suggests the exploitation of multiple features and individual differences in visual information processing among jungle crows. Copyright © 2010 Elsevier B.V. All rights reserved.
OligoIS: Scalable Instance Selection for Class-Imbalanced Data Sets.
García-Pedrajas, Nicolás; Perez-Rodríguez, Javier; de Haro-García, Aida
2013-02-01
In current research, an enormous amount of information is constantly being produced, which poses a challenge for data mining algorithms. Many of the problems in extremely active research areas, such as bioinformatics, security and intrusion detection, or text mining, share the following two features: large data sets and class-imbalanced distribution of samples. Although many methods have been proposed for dealing with class-imbalanced data sets, most of these methods are not scalable to the very large data sets common to those research fields. In this paper, we propose a new approach to dealing with the class-imbalance problem that is scalable to data sets with many millions of instances and hundreds of features. This proposal is based on the divide-and-conquer principle combined with application of the selection process to balanced subsets of the whole data set. This divide-and-conquer principle allows the execution of the algorithm in linear time. Furthermore, the proposed method is easy to implement using a parallel environment and can work without loading the whole data set into memory. Using 40 class-imbalanced medium-sized data sets, we will demonstrate our method's ability to improve the results of state-of-the-art instance selection methods for class-imbalanced data sets. Using three very large data sets, we will show the scalability of our proposal to millions of instances and hundreds of features.
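A sketch of the divide-and-conquer partitioning on which the proposal rests: the majority class is split into minority-sized chunks, and any standard instance-selection method can then run independently (hence in parallel, and without loading the whole data set) on each balanced subset. The chunking below illustrates the principle only; it is not the paper's exact round-and-voting scheme:

```python
import numpy as np

def balanced_subsets(y, minority_label, rng):
    # Pair the full minority class with minority-sized chunks of the shuffled
    # majority class, yielding balanced subsets for per-chunk selection.
    minority = np.flatnonzero(y == minority_label)
    majority = rng.permutation(np.flatnonzero(y != minority_label))
    n_chunks = max(1, majority.size // minority.size)
    for chunk in np.array_split(majority, n_chunks):
        yield np.concatenate([minority, chunk])

rng = np.random.default_rng(0)
y = np.array([0] * 95 + [1] * 5)   # imbalanced labels; class 1 is the minority
subsets = list(balanced_subsets(y, minority_label=1, rng=rng))
print(len(subsets), len(subsets[0]))  # 19 balanced subsets of 10 instances each
# A standard instance-selection method would now run on each subset in parallel.
```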
Facial soft biometric features for forensic face recognition.
Tome, Pedro; Vera-Rodriguez, Ruben; Fierrez, Julian; Ortega-Garcia, Javier
2015-12-01
This paper proposes a functional feature-based approach useful for real forensic caseworks, based on the shape, orientation and size of facial traits, which can be considered as a soft biometric approach. The motivation of this work is to provide a set of facial features, which can be understood by non-experts such as judges and support the work of forensic examiners who, in practice, carry out a thorough manual comparison of face images paying special attention to the similarities and differences in shape and size of various facial traits. This new approach constitutes a tool that automatically converts a set of facial landmarks to a set of features (shape and size) corresponding to facial regions of forensic value. These features are furthermore evaluated in a population to generate statistics to support forensic examiners. The proposed features can also be used as additional information that can improve the performance of traditional face recognition systems. These features follow the forensic methodology and are obtained in a continuous and discrete manner from raw images. A statistical analysis is also carried out to study the stability, discrimination power and correlation of the proposed facial features on two realistic databases: MORPH and ATVS Forensic DB. Finally, the performance of both continuous and discrete features is analyzed using different similarity measures. Experimental results show high discrimination power and good recognition performance, especially for continuous features. A final fusion of the best systems configurations achieves rank 10 match results of 100% for ATVS database and 75% for MORPH database demonstrating the benefits of using this information in practice. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Garcia-Seisdedos, Hector; Ibarra-Molero, Beatriz; Sanchez-Ruiz, Jose M
2012-01-01
Protein promiscuity is of considerable interest due to its role in adaptive metabolic plasticity, its fundamental connection with molecular evolution, and its biotechnological applications. Current views on the relation between primary and promiscuous protein activities stem largely from laboratory evolution experiments aimed at increasing promiscuous activity levels. Here, on the other hand, we attempt to assess the main features of the simultaneous modulation of the primary and promiscuous functions during the course of natural evolution. The computational/experimental approach we propose for this task involves the following steps: a function-targeted, statistical coupling analysis of evolutionary data is used to determine a set of positions likely linked to the recruitment of a promiscuous activity for a new function; a combinatorial library of mutations on this set of positions is prepared and screened for both the primary and the promiscuous activities; a partial-least-squares reconstruction of the full combinatorial space is carried out; finally, an approximation to the Pareto set of variants with optimal primary/promiscuous activities is derived. Application of the approach to the emergence of folding catalysis in thioredoxin scaffolds reveals an unanticipated scenario: diverse patterns of primary/promiscuous activity modulation are possible, including a moderate (but likely significant in a biological context) simultaneous enhancement of both activities. We show that this scenario can be most simply explained on the basis of the conformational diversity hypothesis, although alternative interpretations cannot be ruled out. Overall, the results reported may help clarify the mechanisms of the evolution of new functions. From a different viewpoint, the partial-least-squares-reconstruction/Pareto-set-prediction approach we have introduced provides the computational basis for an efficient directed-evolution protocol aimed at the simultaneous enhancement of several protein features and should therefore open new possibilities in the engineering of multi-functional enzymes. PMID:22719242
NASA Astrophysics Data System (ADS)
Monti, Alessio; Toscano, Alessandro; Bilotti, Filiberto
2017-06-01
The introduction of nanoparticle-based screens [C. W. Hsu, Nat. Commun. 5, 3152 (2014)] has paved the way to the realization of low-cost transparent displays with a wide viewing angle and scalability to large sizes. Despite the huge potential of this approach, the design of a nanoparticle array exhibiting a sharp scattering response in the optical spectrum is still a challenging task. In this manuscript, we investigate the suitability of ellipsoidal plasmonic nanoparticles for this purpose. First, we show that some trade-offs between the sharpness of the scattering response of the array and its absorption level apply. Starting from these considerations, we prove that prolate nanoparticles may be a plausible candidate for achieving the peculiar features required in transparent screen applications. An example of a full-color and almost-isotropic transparent screen is finally proposed, and its robustness towards the geometrical inaccuracies that may arise during the fabrication process is assessed. All the analytical considerations, carried out through a model that takes into account the surface dispersion effect affecting the nanoparticles, are supported by a proper set of full-wave simulations.
Diversity and Divergence of Dinoflagellate Histone Proteins
Marinov, Georgi K.; Lynch, Michael
2015-01-01
Histone proteins and the nucleosomal organization of chromatin are near-universal eukaryotic features, with the exception of dinoflagellates. Previous studies have suggested that histones do not play a major role in the packaging of dinoflagellate genomes, although several genomic and transcriptomic surveys have detected a full set of core histone genes. Here, transcriptomic and genomic sequence data from multiple dinoflagellate lineages are analyzed, and the diversity of histone proteins and their variants characterized, with particular focus on their potential post-translational modifications and the conservation of the histone code. In addition, the set of putative epigenetic mark readers and writers, chromatin remodelers, and histone chaperones is examined. Dinoflagellates clearly express the most derived set of histones among all autonomous eukaryote nuclei, consistent with a combination of relaxation of sequence constraints imposed by the histone code and the presence of numerous specialized histone variants. The histone code itself appears to have diverged significantly in some of its components, yet others are conserved, implying conservation of the associated biochemical processes. Specifically, and with major implications for the function of histones in dinoflagellates, the results presented here strongly suggest that transcription through nucleosomal arrays happens in dinoflagellates. Finally, the plausible roles of histones in dinoflagellate nuclei are discussed. PMID:26646152
Towards Automated Three-Dimensional Tracking of Nephrons through Stacked Histological Image Sets
Bhikha, Charita; Andreasen, Arne; Christensen, Erik I.; Letts, Robyn F. R.; Pantanowitz, Adam; Rubin, David M.; Thomsen, Jesper S.; Zhai, Xiao-Yue
2015-01-01
An automated approach for tracking individual nephrons through three-dimensional histological image sets of mouse and rat kidneys is presented. In a previous study, the available images were tracked manually through the image sets in order to explore renal microarchitecture. The purpose of the current research is to reduce the time and effort required to manually trace nephrons by creating an automated, intelligent system as a standard tool for such datasets. The algorithm is robust enough to isolate closely packed nephrons and track their convoluted paths despite a number of nonideal, interfering conditions such as local image distortions, artefacts, and interstitial tissue interference. The system comprises image preprocessing, feature extraction, and a custom graph-based tracking algorithm, which is validated by a rule base and a machine learning algorithm. A study of a selection of automatically tracked nephrons, when compared with manual tracking, yields a 95% tracking accuracy for structures in the cortex, while those in the medulla have lower accuracy due to narrower diameter and higher density. Limited manual intervention is introduced to improve tracking, enabling full nephron paths to be obtained with an average of 17 manual corrections per mouse nephron and 58 manual corrections per rat nephron. PMID:26170896
Satheesha, T. Y.; Prasad, M. N. Giri; Dhruve, Kashyap D.
2017-01-01
Melanoma mortality rates are the highest amongst skin cancer patients. Melanoma is life-threatening when it grows beyond the dermis of the skin. Hence, depth is an important factor in diagnosing melanoma. This paper introduces a non-invasive computerized dermoscopy system that considers the estimated depth of skin lesions for diagnosis. A 3-D skin lesion reconstruction technique using the estimated depth obtained from regular dermoscopic images is presented. On the basis of the 3-D reconstruction, depth and 3-D shape features are extracted. In addition to 3-D features, regular color, texture, and 2-D shape features are also extracted. Feature extraction is critical to achieving accurate results. Apart from melanoma and in-situ melanoma, the proposed system is designed to diagnose basal cell carcinoma, blue nevus, dermatofibroma, haemangioma, seborrhoeic keratosis, and normal mole lesions. For experimental evaluation, the PH2, ISIC: Melanoma Project, and ATLAS dermoscopy data sets are considered. Different feature set combinations are considered and their performance is evaluated. Significant performance improvement is reported after the inclusion of estimated depth and 3-D features. Good classification scores of sensitivity = 96% and specificity = 97% on the PH2 data set, and sensitivity = 98% and specificity = 99% on the ATLAS data set, are achieved. Experiments conducted to estimate tumor depth from the 3-D lesion reconstruction are presented. The experimental results demonstrate that the proposed computerized dermoscopy system is efficient and can be used to diagnose varied skin lesion dermoscopy images. PMID:28512610
NASA Astrophysics Data System (ADS)
Thomaz, Ricardo L.; Carneiro, Pedro C.; Patrocinio, Ana C.
2017-03-01
Breast cancer is the leading cause of death for women in most countries. The high levels of mortality relate mostly to late diagnosis and to the directly proportional relationship between breast density and breast cancer development. Therefore, the correct assessment of breast density is important to provide better screening for higher-risk patients. However, in modern digital mammography the discrimination among breast densities is highly complex due to increased contrast and visual information for all densities. Thus, a computational system for classifying breast density might be a useful tool for aiding medical staff. Several machine-learning algorithms are already capable of classifying a small number of classes with good accuracy. However, the main constraint of machine-learning algorithms relates to the set of features extracted and used for classification. Although well-known feature extraction techniques might provide a good set of features, it is a complex task to select an initial set during the design of a classifier. Thus, we propose feature extraction using a Convolutional Neural Network (CNN) for classifying breast density with a conventional machine-learning classifier. We used 307 mammographic images downsampled to 260x200 pixels to train a CNN and extract features from a deep layer. After training, the activations of 8 neurons from a deep fully connected layer are extracted and used as features. These features are then fed to a single-hidden-layer neural network that is cross-validated using 10 folds to classify among four classes of breast density. The global accuracy of this method is 98.4%, with only 1.6% misclassification. However, the small set of samples and memory constraints required the reuse of data in both the CNN and the MLP-NN; therefore, overfitting might have influenced the results even though we cross-validated the network. Thus, although we present a promising method for extracting features and classifying breast density, a larger database is still required for evaluating the results.
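A hedged sketch of the two-stage pipeline, assuming PyTorch for the CNN and scikit-learn for the second-stage classifier; the 8-neuron feature layer and the 260x200 input follow the abstract, while the remaining layer sizes are assumptions, and the CNN training loop is omitted:

```python
import torch
import torch.nn as nn
from sklearn.neural_network import MLPClassifier

class DensityCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 8, 3, stride=2), nn.ReLU(),
            nn.Conv2d(8, 16, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        self.feat = nn.Linear(16 * 4 * 4, 8)   # the 8-neuron deep layer
        self.out = nn.Linear(8, 4)             # 4 breast-density classes

    def forward(self, x):
        h = self.conv(x).flatten(1)
        f = torch.relu(self.feat(h))           # 8-dimensional deep features
        return self.out(f), f

cnn = DensityCNN()                              # assume trained elsewhere
images = torch.rand(32, 1, 260, 200)            # placeholder mammogram batch
with torch.no_grad():
    _, feats = cnn(images)                      # extract deep-layer activations

labels = torch.randint(0, 4, (32,))             # placeholder density labels
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500)
clf.fit(feats.numpy(), labels.numpy())          # stage 2: MLP on CNN features
```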
Textural features for image classification
NASA Technical Reports Server (NTRS)
Haralick, R. M.; Dinstein, I.; Shanmugam, K.
1973-01-01
Description of some easily computable textural features based on gray-tone spatial dependences, and illustration of their application in category-identification tasks on three different kinds of image data - namely, photomicrographs of five kinds of sandstones, 1:20,000 panchromatic aerial photographs of eight land-use categories, and ERTS multispectral imagery containing several land-use categories. Two kinds of decision rules are used - one for which the decision regions are convex polyhedra (a piecewise-linear decision rule), and one for which the decision regions are rectangular parallelepipeds (a min-max decision rule). In each experiment the data set was divided into two parts, a training set and a test set. Test set identification accuracy is 89% for the photomicrographs, 82% for the aerial photographic imagery, and 83% for the satellite imagery. These results indicate that the easily computable textural features probably have a general applicability for a wide variety of image-classification applications.
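These gray-tone spatial-dependence features are now commonly computed from co-occurrence matrices; a short sketch using recent scikit-image versions (graycomatrix/graycoprops), which cover a subset of the Haralick features:

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

image = (np.random.rand(64, 64) * 16).astype(np.uint8)  # placeholder 16-level image
glcm = graycomatrix(image, distances=[1], angles=[0, np.pi / 2],
                    levels=16, symmetric=True, normed=True)
for prop in ("contrast", "correlation", "energy", "homogeneity"):
    print(prop, graycoprops(glcm, prop).ravel())  # one value per distance/angle
```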
Systems and methods for predicting materials properties
Ceder, Gerbrand; Fischer, Chris; Tibbetts, Kevin; Morgan, Dane; Curtarolo, Stefano
2007-11-06
Systems and methods for predicting features of materials of interest. Reference data are analyzed to deduce relationships between the input data sets and output data sets. Reference data include measured values and/or computed values. The deduced relationships can be specified as equations, correspondences, and/or algorithmic processes that produce appropriate output data when suitable input data are used. In some instances, the output data set is a subset of the input data set, and computational results may be refined by optionally iterating the computational procedure. To deduce features of a new material of interest, a computed or measured input property of the material is provided to an equation, correspondence, or algorithmic procedure previously deduced, and an output is obtained. In some instances, the output is iteratively refined. In some instances, new features deduced for the material of interest are added to a database of input and output data for known materials.
Online Feature Transformation Learning for Cross-Domain Object Category Recognition.
Zhang, Xuesong; Zhuang, Yan; Wang, Wei; Pedrycz, Witold
2017-06-09
In this paper, we introduce a new research problem termed online feature transformation learning in the context of multiclass object category recognition. The learning of a feature transformation is viewed as learning a global similarity metric function in an online manner. We first consider the problem of online learning of a feature transformation matrix expressed in the original feature space and propose an online passive-aggressive feature transformation algorithm. These original features are then mapped to a kernel space, and an online single kernel feature transformation (OSKFT) algorithm is developed to learn a nonlinear feature transformation. Based on the OSKFT and the existing Hedge algorithm, a novel online multiple kernel feature transformation algorithm is also proposed, which can further improve the performance of online feature transformation learning in large-scale applications. The classifier is trained with the k-nearest-neighbor algorithm together with the learned similarity metric function. Finally, we experimentally examine the effect of setting different parameter values in the proposed algorithms and evaluate the model performance on several multiclass object recognition data sets. The experimental results demonstrate the validity and good performance of our methods in cross-domain and multiclass object recognition applications.
Acoustic features of objects matched by an echolocating bottlenose dolphin.
Delong, Caroline M; Au, Whitlow W L; Lemonds, David W; Harley, Heidi E; Roitblat, Herbert L
2006-03-01
The focus of this study was to investigate how dolphins use acoustic features in returning echolocation signals to discriminate among objects. An echolocating dolphin performed a match-to-sample task with objects that varied in size, shape, material, and texture. After the task was completed, the features of the object echoes were measured (e.g., target strength, peak frequency). The dolphin's error patterns were examined in conjunction with the between-object variation in acoustic features to identify the acoustic features that the dolphin used to discriminate among the objects. The present study explored two hypotheses regarding the way dolphins use acoustic information in echoes: (1) use of a single feature, or (2) use of a linear combination of multiple features. The results suggested that dolphins do not use a single feature across all object sets or a linear combination of six echo features. Five features appeared to be important to the dolphin on four or more sets: the echo spectrum shape, the pattern of changes in target strength and number of highlights as a function of object orientation, and peak and center frequency. These data suggest that dolphins use multiple features and integrate information across echoes from a range of object orientations.
SoFoCles: feature filtering for microarray classification based on gene ontology.
Papachristoudis, Georgios; Diplaris, Sotiris; Mitkas, Pericles A
2010-02-01
Marker gene selection has been an important research topic in the classification analysis of gene expression data. Current methods try to reduce the "curse of dimensionality" by using statistical intra-feature set calculations, or classifiers that are based on the given dataset. In this paper, we present SoFoCles, an interactive tool that enables semantic feature filtering in microarray classification problems with the use of external, well-defined knowledge retrieved from the Gene Ontology. The notion of semantic similarity is used to derive genes that are involved in the same biological path during the microarray experiment, by enriching a feature set that has been initially produced with legacy methods. Among its other functionalities, SoFoCles offers a large repository of semantic similarity methods that are used in order to derive feature sets and marker genes. The structure and functionality of the tool are discussed in detail, as well as its ability to improve classification accuracy. Through experimental evaluation, SoFoCles is shown to outperform other classification schemes in terms of classification accuracy in two real datasets using different semantic similarity computation approaches.
Developing a radiomics framework for classifying non-small cell lung carcinoma subtypes
NASA Astrophysics Data System (ADS)
Yu, Dongdong; Zang, Yali; Dong, Di; Zhou, Mu; Gevaert, Olivier; Fang, Mengjie; Shi, Jingyun; Tian, Jie
2017-03-01
Patient-targeted treatment of non-small cell lung carcinoma (NSCLC) has been well documented according to histologic subtype over the past decade. In parallel, the recent development of quantitative image biomarkers has been highlighted as an important diagnostic tool to facilitate histological subtype classification. In this study, we present a radiomics analysis that classifies adenocarcinoma (ADC) and squamous cell carcinoma (SqCC). We extract 52-dimensional, CT-based features (7 statistical features and 45 image texture features) to represent each nodule. We evaluate our approach on a clinical dataset including 324 ADC and 110 SqCC patients with CT image scans. Classification of these features is performed with four different machine-learning classifiers: Support Vector Machines with a Radial Basis Function kernel (RBF-SVM), Random Forest (RF), K-nearest neighbor (KNN), and RUSBoost. To improve the classifiers' performance, an optimal feature subset is selected from the original feature set using an iterative forward inclusion and backward elimination algorithm. Extensive experimental results demonstrate that radiomics features achieve encouraging classification results on both the complete feature set (AUC=0.89) and the optimal feature subset (AUC=0.91).
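A sketch of the forward-inclusion half of such a feature subset search, driven by cross-validated AUC with an RBF-SVM, assuming scikit-learn and placeholder data; the backward-elimination pass and the actual radiomics features are omitted:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def forward_select(X, y, n_keep):
    # Greedily add the feature that most improves cross-validated AUC.
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < n_keep:
        scores = {
            j: cross_val_score(SVC(kernel="rbf"), X[:, selected + [j]], y,
                               cv=5, scoring="roc_auc").mean()
            for j in remaining
        }
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return selected

rng = np.random.default_rng(2)
X = rng.normal(size=(120, 12))            # stand-in for the 52 radiomics features
y = (X[:, 3] - X[:, 7] > 0).astype(int)   # placeholder ADC/SqCC labels
print(forward_select(X, y, n_keep=4))     # indices of the selected subset
```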
Use of a New Set of Linguistic Features to Improve Automatic Assessment of Text Readability
ERIC Educational Resources Information Center
Yoshimi, Takehiko; Kotani, Katsunori; Isahara, Hitoshi
2012-01-01
The present paper proposes and evaluates a readability assessment method designed for Japanese learners of EFL (English as a foreign language). The proposed readability assessment method is constructed by a regression algorithm using a new set of linguistic features that were employed separately in previous studies. The results showed that the…
ERIC Educational Resources Information Center
Adamo, Maha; Pun, Carson; Pratt, Jay; Ferber, Susanne
2008-01-01
When non-informative peripheral cues precede a target defined by a specific feature, cues that share the critical feature will capture attention while cues that do not will be effectively ignored. We tested whether different attentional control sets can be simultaneously maintained over distinct regions of space. Participants were instructed to…
Inflationary features and shifts in cosmological parameters from Planck 2015 data
NASA Astrophysics Data System (ADS)
Obied, Georges; Dvorkin, Cora; Heinrich, Chen; Hu, Wayne; Miranda, Vinicius
2017-10-01
We explore the relationship between features in the Planck 2015 temperature and polarization data, shifts in the cosmological parameters, and features from inflation. Residuals in the temperature data from the best-fit power-law ΛCDM model at low multipole ℓ≲40 are mainly responsible for the high H0 and low σ8Ωm^1/2 values when comparing the ℓ<1000 portion to the full data set. These same residuals are better fit to inflationary features with a 1.9σ preference for running of the running of the tilt, or a stronger 99% C.L. local significance preference for a sharp drop in power around k = 0.004 Mpc^-1, relieving the internal tension with H0. At ℓ>1000, the same in-phase acoustic residuals that drive the global H0 constraints and appear as a lensing anomaly also favor running parameters which allow even lower H0, but not once lensing reconstruction is considered. Polarization spectra are intrinsically highly sensitive to these parameter shifts, and even more so in the Planck 2015 TE data due to an anomalous suppression in power at ℓ≈165, which disfavors the best-fit ΛCDM H0 solution by more than 2σ, and a high H0 value at almost 3σ. Current polarization data also slightly enhance the significance of a sharp suppression of large-scale power but leave room for large improvements in the future with cosmic-variance-limited E-mode measurements.
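For reference, "running of the running" refers to the β_s term in the conventional parameterization of the primordial power spectrum; this standard form is an assumption supplied here for clarity, not quoted from the paper:

```latex
P_{\mathcal{R}}(k) = A_s \left(\frac{k}{k_0}\right)^{(n_s - 1)
  + \frac{1}{2}\alpha_s \ln(k/k_0)
  + \frac{1}{6}\beta_s \ln^2(k/k_0)}
```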
Park, Hyunjin; Yang, Jin-ju; Seo, Jongbum; Choi, Yu-yong; Lee, Kun-ho; Lee, Jong-min
2014-04-01
Cortical features derived from magnetic resonance imaging (MRI) provide important information to account for human intelligence. Cortical thickness, surface area, sulcal depth, and mean curvature were considered to explain human intelligence. A single region of interest (ROI) of a cortical structure, consisting of thousands of vertices, contains thousands of measurements, yet typically one mean value (first-order moment) was used to represent a chosen ROI, which led to a potentially significant loss of information. We proposed a technical improvement to account for human intelligence in which a second moment (variance) is adopted in addition to the mean value to represent a chosen ROI, so that the loss of information is less severe. The two computed moments for the chosen ROIs were analyzed with partial least squares regression (PLSR). Cortical features for 78 adults were measured and analyzed in conjunction with the full-scale intelligence quotient (FSIQ). Our results showed that 45% of the variance of the FSIQ could be explained using the combination of four cortical features with two moments per chosen ROI. This is an improvement over using one mean value per ROI, which explained 37% of the variance of the FSIQ using the same set of cortical measurements. Our results suggest that using additional second-order moments is potentially better than using only the mean values of chosen ROIs in regression analyses to account for human intelligence. Copyright © 2014 Elsevier Ltd. All rights reserved.
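A minimal sketch of the two-moment representation followed by PLSR, assuming scikit-learn and synthetic data in the stated dimensions (78 subjects; FSIQ as the response); ROI and vertex counts are placeholders:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(3)
n_subjects, n_rois, n_vertices = 78, 10, 1000
thickness = rng.normal(size=(n_subjects, n_rois, n_vertices))  # per-vertex measures

# Two moments per ROI (mean and variance) instead of the mean alone.
feats = np.concatenate([thickness.mean(axis=2), thickness.var(axis=2)], axis=1)
fsiq = rng.normal(100, 15, size=n_subjects)                    # placeholder FSIQ

pls = PLSRegression(n_components=5).fit(feats, fsiq)
print(pls.score(feats, fsiq))  # in-sample R^2, i.e. explained variance of FSIQ
```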
Searching for helium in the exosphere of HD 209458b
NASA Astrophysics Data System (ADS)
Moutou, C.; Coustenis, A.; Schneider, J.; Queloz, D.; Mayor, M.
2003-07-01
Atmospheric models of the extrasolar, close-in giant planet HD 209458b predict strong absorption features from alkali metals (Seager & Sasselov 2000; Brown 2001). This was confirmed by the discovery of NaI in HST observations (Charbonneau et al. 2002). In this study we focus on the search for the helium absorption feature at 10 830 Å, also predicted to be among the strongest ones. Helium is a major component of the planet's exosphere, for which models are not yet as robust as atmosphere models. One full transit was observed with the VLT/ISAAC instrument. We do not report a detection of the HeI feature. The data set is strongly affected by instrumental fringing, at a level of up to 5% in the extracted spectra. After filtering, a residual noise of the order of 0.2% remains. An upper limit on the HeI line was derived, which further constrains future models of the HD 209458b exosphere. This upper limit, in terms of the feature depth, is 0.5% at 3sigma for a 3 Å bandwidth. Prospects are proposed to lower the detectability limit; the ultimate detectability limit with ISAAC, in the absence of electronic fringing and in ideal atmospheric conditions, could be as low as a line depth of 0.1% (3 Å width, 3sigma). Based on data acquired with the Very Large Telescope at Paranal Observatory, ESO Chile.
Frank, Laurence E; Heiser, Willem J
2008-05-01
A set of features is the basis for the network representation of proximity data achieved by feature network models (FNMs). Features are binary variables that characterize the objects in an experiment, with some measure of proximity as the response variable. Sometimes features are provided by theory and play an important role in the construction of the experimental conditions. In some research settings, the features are not known a priori. This paper shows how to generate features in this situation and how to select an adequate subset of features that represents a good compromise between model fit and model complexity, using a new version of least angle regression that restricts coefficients to be non-negative, called the Positive Lasso. It is shown that features can be generated efficiently with Gray codes, which are naturally linked to the FNMs. The model selection strategy makes use of the fact that the FNM can be considered a univariate multiple regression model. A simulation study shows that the proposed strategy leads to satisfactory results if the number of objects is less than or equal to 22. If the number of objects is larger than 22, the number of features selected by our method exceeds the true number of features in some conditions.
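The link to Gray codes is that successive binary-reflected Gray codes differ in exactly one bit, so candidate binary features over n objects can be enumerated with single-element changes; a small illustration (the mapping to FNM feature generation is the paper's, the snippet itself is generic):

```python
def gray_code(i):
    # The i-th binary-reflected Gray code: adjacent codes differ in one bit.
    return i ^ (i >> 1)

n_objects = 4
for i in range(2 ** n_objects):
    print(format(gray_code(i), f"0{n_objects}b"))  # enumerate all binary features
```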
NASA Astrophysics Data System (ADS)
Liu, Xiwu; Guo, Zhiqi; Han, Xu
2018-06-01
A set of parallel vertical fractures embedded in a vertically transverse isotropic (VTI) background leads to orthorhombic anisotropy and corresponding azimuthal seismic responses. We conducted seismic modeling of full-waveform amplitude variation versus azimuth (AVAZ) responses of anisotropic shale by integrating a rock physics model and a reflectivity method. The results indicate that the azimuthal variation of P-wave velocity tends to be more complicated for an orthorhombic medium than for the horizontally transverse isotropic (HTI) case, especially at high polar angles. Correspondingly, for the HTI layer in the theoretical model, the short axis of the azimuthal PP amplitudes at the top interface is parallel to the fracture strike, while the long axis at the bottom reflection points along the fracture strike. In contrast, the orthorhombic layer in the theoretical model shows distinct AVAZ responses in terms of PP reflections. Nevertheless, the azimuthal signatures of the R- and T-components of the mode-converted PS reflections show similar AVAZ features for the HTI and orthorhombic layers, which may imply that the PS responses are dominated by fractures. For the application to real data, a seismic-well tie based on upscaled data and a reflectivity method illustrates good agreement between the reference layers and the corresponding reflected events. Finally, the full-waveform seismic AVAZ responses of the Longmaxi shale formation are computed for the cases of HTI and orthorhombic anisotropy for comparison. For the two cases, the azimuthal features differ mainly in the amplitudes, and only slightly in the phases, of the reflected waveforms. Azimuthal variations in the PP reflections from the reference layers show distinct behaviors for the HTI and orthorhombic cases, while the mode-converted PS reflections in terms of the R- and T-components show little difference in azimuthal features. This may suggest that the behavior of the PS waves is dominated by vertically aligned fractures. This work provides further insight into the azimuthal seismic response of orthorhombic shales. The proposed method may help to improve seismic-well ties, seismic interpretation, and inversion results using azimuthal anisotropy datasets.
Prediction of Incident Diabetes in the Jackson Heart Study Using High-Dimensional Machine Learning
Casanova, Ramon; Saldana, Santiago; Simpson, Sean L.; Lacy, Mary E.; Subauste, Angela R.; Blackshear, Chad; Wagenknecht, Lynne; Bertoni, Alain G.
2016-01-01
Statistical models to predict incident diabetes are often based on limited variables. Here we pursued two main goals: 1) to investigate the relative performance of a machine learning method such as Random Forests (RF) for detecting incident diabetes in a high-dimensional setting defined by a large set of observational data, and 2) to uncover potential predictors of diabetes. The Jackson Heart Study collected data at baseline and in two follow-up visits from 5,301 African Americans. We excluded those with baseline diabetes and no follow-up, leaving 3,633 individuals for analysis. Over a mean 8-year follow-up, 584 participants developed diabetes. The full RF model evaluated 93 variables including demographic, anthropometric, blood biomarker, medical history, and echocardiogram data. We also used RF metrics of variable importance to rank variables according to their contribution to diabetes prediction. We implemented other models based on logistic regression and on RF with preselected features. The full RF model performance (AUC = 0.82) was similar to that of the more parsimonious models. The top-ranked variables according to RF included hemoglobin A1C, fasting plasma glucose, waist circumference, adiponectin, C-reactive protein, triglycerides, leptin, left ventricular mass, high-density lipoprotein cholesterol, and aldosterone. This work shows the potential of RF for incident diabetes prediction while dealing with high-dimensional data. PMID:27727289
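A hedged sketch of this workflow, assuming scikit-learn; the 93-variable table and labels below are placeholders in the stated dimensions, not Jackson Heart Study data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
X = rng.normal(size=(3633, 93))          # 3,633 participants x 93 variables
y = rng.binomial(1, 584 / 3633, 3633)    # placeholder incident-diabetes labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=4)
rf = RandomForestClassifier(n_estimators=500, random_state=4).fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1]))

# Rank variables by RF importance, as in the paper's predictor ranking.
ranking = np.argsort(rf.feature_importances_)[::-1]
print("top variables by importance:", ranking[:10])
```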
Generic decoding of seen and imagined objects using hierarchical visual features.
Horikawa, Tomoyasu; Kamitani, Yukiyasu
2017-05-22
Object recognition is a key function in both human and machine vision. While brain decoding of seen and imagined objects has been achieved, the prediction is limited to training examples. We present a decoding approach for arbitrary objects using the machine vision principle that an object category is represented by a set of features rendered invariant through hierarchical processing. We show that visual features, including those derived from a deep convolutional neural network, can be predicted from fMRI patterns, and that greater accuracy is achieved for low-/high-level features with lower-/higher-level visual areas, respectively. Predicted features are used to identify seen/imagined object categories (extending beyond decoder training) from a set of computed features for numerous object images. Furthermore, decoding of imagined objects reveals progressive recruitment of higher-to-lower visual representations. Our results demonstrate a homology between human and machine vision and its utility for brain-based information retrieval.
Iris recognition using possibilistic fuzzy matching on local features.
Tsai, Chung-Chih; Lin, Heng-Yi; Taur, Jinshiuh; Tao, Chin-Wang
2012-02-01
In this paper, we propose a novel possibilistic fuzzy matching strategy with invariant properties, which can provide a robust and effective matching scheme for two sets of iris feature points. In addition, a nonlinear normalization model is adopted to provide more accurate positions before matching. Moreover, an effective iris segmentation method is proposed to refine the detected inner and outer boundaries into smooth curves. For feature extraction, Gabor filters are adopted to detect the local feature points from the segmented iris image in the Cartesian coordinate system and to generate a rotation-invariant descriptor for each detected point. After that, the proposed matching algorithm is used to compute a similarity score for two sets of feature points from a pair of iris images. The experimental results show that the performance of our system is better than that of systems based on local features and is comparable to that of typical systems.
HIV-1 protease cleavage site prediction based on two-stage feature selection method.
Niu, Bing; Yuan, Xiao-Cheng; Roeper, Preston; Su, Qiang; Peng, Chun-Rong; Yin, Jing-Yuan; Ding, Juan; Li, HaiPeng; Lu, Wen-Cong
2013-03-01
Knowledge of the mechanism of HIV protease cleavage specificity is critical to the design of specific and effective HIV inhibitors. Searching for an accurate, robust, and rapid method to correctly predict the cleavage sites in proteins is crucial when searching for possible HIV inhibitors. In this article, HIV-1 protease specificity was studied using the correlation-based feature subset (CfsSubset) selection method combined with a genetic algorithm. Thirty important biochemical features were found, based on a jackknife test, from the original data set containing 4,248 features. By using the AdaBoost method with the thirty selected features, the prediction model yields an accuracy of 96.7% for the jackknife test and 92.1% for an independent set test, an increase in accuracy over the original data set of 6.7% and 77.4%, respectively. Our feature selection scheme could be a useful technique for finding effective competitive inhibitors of HIV protease.
Kindler syndrome: a case report and proposal for clinical diagnostic criteria.
Fischer, Irena Angelova; Kazandjieva, Jana; Vassileva, Snejina; Dourmishev, Assen
2005-06-01
Kindler syndrome is a rare hereditary disorder characterized by acral blister formation in infancy and childhood, progressive poikiloderma, cutaneous atrophy, and increased photosensitivity. Since it was first described in 1954, fewer than 100 cases have been reported worldwide. Recently it has been reported that Kindler syndrome is the first genodermatosis caused by a defect in the actin-extracellular matrix linkage, and the gene was mapped to chromosome 20p12.3. The clinical features of the syndrome have been annotated by different authors, but a definitive set of criteria to confirm the diagnosis has not yet been generally accepted. We report a case of Kindler syndrome that presents the full spectrum of clinical manifestations, and we propose a set of clinical criteria for diagnosis.
XLWrap - Querying and Integrating Arbitrary Spreadsheets with SPARQL
NASA Astrophysics Data System (ADS)
Langegger, Andreas; Wöß, Wolfram
In this paper a novel approach is presented for generating RDF graphs of arbitrary complexity from various spreadsheet layouts. Currently, none of the available spreadsheet-to-RDF wrappers supports cross tables and tables where data is not aligned in rows. Similar to RDF123, XLWrap is based on template graphs where fragments of triples can be mapped to specific cells of a spreadsheet. Additionally, it features a full expression algebra based on the syntax of OpenOffice Calc and various shift operations, which can be used to repeat similar mappings in order to wrap cross tables including multiple sheets and spreadsheet files. The set of available expression functions includes most of the native functions of OpenOffice Calc and can be easily extended by users of XLWrap.
JBrowse: A dynamic web platform for genome visualization and analysis
Buels, Robert; Yao, Eric; Diesh, Colin M.; ...
2016-04-12
Background: JBrowse is a fast and full-featured genome browser built with JavaScript and HTML5. It is easily embedded into websites or apps but can also be served as a standalone web page. Results: Overall improvements to speed and scalability are accompanied by specific enhancements that support complex interactive queries on large track sets. Analysis functions can readily be added using the plugin framework; most visual aspects of tracks can also be customized, along with clicks, mouseovers, menus, and popup boxes. JBrowse can also be used to browse local annotation files offline and to generate high-resolution figures for publication. Conclusions: JBrowse is a mature web application suitable for genome visualization and analysis.
Cancer survival classification using integrated data sets and intermediate information.
Kim, Shinuk; Park, Taesung; Kon, Mark
2014-09-01
Although numerous studies related to cancer survival have been published, increasing the prediction accuracy of survival classes still remains a challenge. Integration of different data sets, such as microRNA (miRNA) and mRNA, might increase the accuracy of survival class prediction. Therefore, we suggest a machine learning (ML) approach to integrate different data sets, and we developed a novel method based on feature selection with the Cox proportional hazards regression model (FSCOX) to improve the prediction of cancer survival time. FSCOX provides us with intermediate survival information, which is usually discarded when separating survival into 2 groups (short- and long-term), and allows us to perform survival analysis. We used an ML-based protocol for feature selection, integrating information from miRNA and mRNA expression profiles at the feature level. To predict survival phenotypes, we used the following classifiers: first, the existing ML methods support vector machine (SVM) and random forest (RF); second, a new median-based classifier using FSCOX (FSCOX_median); and third, an SVM classifier using FSCOX (FSCOX_SVM). We compared these methods using 3 types of cancer tissue data sets: (i) miRNA expression, (ii) mRNA expression, and (iii) combined miRNA and mRNA expression. The latter data set included features selected either from the combined miRNA/mRNA profile or independently from the miRNA and mRNA profiles (IFS). In the ovarian data set, the accuracy of survival classification using the combined miRNA/mRNA profiles with IFS was 75% using RF, 86.36% using SVM, 84.09% using FSCOX_median, and 88.64% using FSCOX_SVM with a balanced data set of 22 short-term and 22 long-term survivors. These accuracies are higher than those using miRNA alone (70.45%, RF; 75%, SVM; 75%, FSCOX_median; and 75%, FSCOX_SVM) or mRNA alone (65.91%, RF; 63.64%, SVM; 72.73%, FSCOX_median; and 70.45%, FSCOX_SVM). Similarly, in the glioblastoma multiforme data, the accuracy of miRNA/mRNA with IFS was 75.51% (RF), 87.76% (SVM), 85.71% (FSCOX_median), and 85.71% (FSCOX_SVM). These results are higher than those obtained using miRNA expression or mRNA expression alone. In addition, we predicted 16 hsa-miR-23b and hsa-miR-27b target genes in the ovarian cancer data sets, obtained by SVM-based feature selection through the integration of sequence information and gene expression profiles. Among the approaches used, the integrated miRNA and mRNA data set yielded better results than the individual data sets. The best performance was achieved using the FSCOX_SVM method with independent feature selection, which uses intermediate survival information between short-term and long-term survival times and the combination of the 2 different data sets. The results obtained using the combined data set suggest that there are strong interactions between miRNA and mRNA features that are not detectable in the individual analyses. Copyright © 2014 Elsevier B.V. All rights reserved.
Portable Automatic Text Classification for Adverse Drug Reaction Detection via Multi-corpus Training
Sarker, Abeed; Gonzalez, Graciela
2015-02-01
Automatic detection of adverse drug reaction (ADR) mentions from text has recently received significant interest in pharmacovigilance research. Current research focuses on various sources of text-based information, including social media, where enormous amounts of user-posted data are available and have the potential for use in pharmacovigilance if collected and filtered accurately. The aims of this study are: (i) to explore natural language processing (NLP) approaches for generating useful features from text and utilizing them in optimized machine learning algorithms for automatic classification of ADR-assertive text segments; (ii) to present two data sets that we prepared for the task of ADR detection from user-posted internet data; and (iii) to investigate whether combining training data from distinct corpora can improve automatic classification accuracies. One of our three data sets contains annotated sentences from clinical reports; the two other data sets, built in-house, consist of annotated posts from social media. Our text classification approach relies on generating a large set of features, representing semantic properties (e.g., sentiment, polarity, and topic), from short text nuggets. Importantly, using our expanded feature sets, we combine training data from different corpora in attempts to boost classification accuracies. Our feature-rich classification approach performs significantly better than previously published approaches, with ADR class F-scores of 0.812 (previously reported best: 0.770), 0.538, and 0.678 for the three data sets. Combining training data from multiple compatible corpora further improves the ADR F-scores for the in-house data sets to 0.597 (an improvement of 5.9 units) and 0.704 (an improvement of 2.6 units), respectively. Our research results indicate that using advanced NLP techniques for generating information-rich features from text can significantly improve classification accuracies over existing benchmarks. Our experiments illustrate the benefits of incorporating various semantic features such as topics, concepts, sentiments, and polarities. Finally, we show that integration of information from compatible corpora can significantly improve classification performance. This form of multi-corpus training may be particularly useful in cases where data sets are heavily imbalanced (e.g., social media data), and may reduce the time and costs associated with the annotation of data in the future. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved. PMID:25451103
Portable Automatic Text Classification for Adverse Drug Reaction Detection via Multi-corpus Training
Gonzalez, Graciela
2014-01-01
Objective Automatic detection of Adverse Drug Reaction (ADR) mentions from text has recently received significant interest in pharmacovigilance research. Current research focuses on various sources of text-based information, including social media, where enormous amounts of user-posted data are available and have the potential for use in pharmacovigilance if collected and filtered accurately. The aims of this study are: (i) to explore natural language processing approaches for generating useful features from text, and utilizing them in optimized machine learning algorithms for automatic classification of ADR assertive text segments; (ii) to present two data sets that we prepared for the task of ADR detection from user-posted internet data; and (iii) to investigate if combining training data from distinct corpora can improve automatic classification accuracies. Methods One of our three data sets contains annotated sentences from clinical reports, and the two other data sets, built in-house, consist of annotated posts from social media. Our text classification approach relies on generating a large set of features, representing semantic properties (e.g., sentiment, polarity, and topic), from short text nuggets. Importantly, using our expanded feature sets, we combine training data from different corpora in attempts to boost classification accuracies. Results Our feature-rich classification approach performs significantly better than previously published approaches, with ADR class F-scores of 0.812 (previously reported best: 0.770), 0.538 and 0.678 for the three data sets. Combining training data from multiple compatible corpora further improves the ADR F-scores for the in-house data sets to 0.597 (improvement of 5.9 units) and 0.704 (improvement of 2.6 units), respectively. Conclusions Our research results indicate that using advanced NLP techniques for generating information-rich features from text can significantly improve classification accuracies over existing benchmarks. Our experiments illustrate the benefits of incorporating various semantic features such as topics, concepts, sentiments, and polarities. Finally, we show that integration of information from compatible corpora can significantly improve classification performance. This form of multi-corpus training may be particularly useful in cases where data sets are heavily imbalanced (e.g., social media data), and may reduce the time and costs associated with the annotation of data in the future. PMID:25451103
Validation of the SimSET simulation package for modeling the Siemens Biograph mCT PET scanner
NASA Astrophysics Data System (ADS)
Poon, Jonathan K.; Dahlbom, Magnus L.; Casey, Michael E.; Qi, Jinyi; Cherry, Simon R.; Badawi, Ramsey D.
2015-02-01
Monte Carlo simulation provides a valuable tool in performance assessment and optimization of system design parameters for PET scanners. SimSET is a popular Monte Carlo simulation toolkit that features fast simulation time, as well as variance reduction tools to further enhance computational efficiency. However, SimSET has lacked the ability to simulate block detectors until its most recent release. Our goal is to validate new features of SimSET by developing a simulation model of the Siemens Biograph mCT PET scanner and comparing the results to a simulation model developed in the GATE simulation suite and to experimental results. We used the NEMA NU-2 2007 scatter fraction, count rates, and spatial resolution protocols to validate the SimSET simulation model and its new features. The SimSET model overestimated the experimental results of the count rate tests by 11-23% and the spatial resolution test by 13-28%, which is comparable to previous validation studies of other PET scanners in the literature. The difference between the SimSET and GATE simulation was approximately 4-8% for the count rate test and approximately 3-11% for the spatial resolution test. In terms of computational time, SimSET performed simulations approximately 11 times faster than GATE simulations. The new block detector model in SimSET offers a fast and reasonably accurate simulation toolkit for PET imaging applications.
EEG analysis of seizure patterns using visibility graphs for detection of generalized seizures.
Wang, Lei; Long, Xi; Arends, Johan B A M; Aarts, Ronald M
2017-10-01
The traditional EEG features in the time and frequency domain show limited seizure detection performance in the epileptic population with intellectual disability (ID). In addition, the influence of EEG seizure patterns on detection performance has been less studied. A single-channel EEG signal can be mapped into visibility graphs (VGS), including the basic visibility graph (VG), horizontal VG (HVG), and difference VG (DVG). These graphs were used to characterize different EEG seizure patterns. To demonstrate the effectiveness of VGS in identifying EEG seizure patterns and detecting generalized seizures, 615 h of single-channel EEG recordings from 29 epileptic patients with ID were analyzed. A novel feature set with discriminative power for seizure detection was obtained by using the VGS method. The degree distributions (DDs) of DVG can clearly distinguish the EEG of each seizure pattern. The degree entropy and power-law degree power in DVG were proposed here for the first time, and they show a significant difference between seizure and non-seizure EEG. The connecting structure measured by HVG can better distinguish seizure EEG from background than those measured by VG and DVG. A traditional EEG feature set based on frequency analysis was used here as a benchmark feature set. With a support vector machine (SVM) classifier, the seizure detection performance of the benchmark feature set (sensitivity of 24%, false detection rate FDt/h of 1.8) can be improved by combining our proposed VGS features extracted from one EEG channel (sensitivity of 38%, FDt/h of 1.4). The proposed VGS-based features can help improve seizure detection for ID patients. Copyright © 2017 Elsevier B.V. All rights reserved.
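For concreteness, here is a small sketch of the horizontal-visibility construction and the degree-entropy feature in plain numpy; the epoch length and the random stand-in signal are illustrative.

```python
# Sketch of mapping an EEG epoch to a horizontal visibility graph (HVG)
# and computing a degree-entropy feature.
import numpy as np

def hvg_degrees(x):
    """Degree of each sample in the HVG: samples i < j are linked iff every
    sample strictly between them is lower than min(x[i], x[j])."""
    n = len(x)
    deg = np.zeros(n, dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            if all(x[k] < min(x[i], x[j]) for k in range(i + 1, j)):
                deg[i] += 1
                deg[j] += 1
            # once a sample at least as high as x[i] blocks the view,
            # no sample farther right can see i
            if x[j] >= x[i]:
                break
    return deg

def degree_entropy(deg):
    """Shannon entropy of the degree distribution."""
    _, counts = np.unique(deg, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

epoch = np.random.randn(512)  # stand-in for a short single-channel EEG epoch
print(degree_entropy(hvg_degrees(epoch)))
```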
[Research Progress of Multi-Modal Medical Image Fusion at the Feature Level].
Zhang, Junjie; Zhou, Tao; Lu, Huiling; Wang, Huiqun
2016-04-01
Medical image fusion realizes the integration of the advantages of functional images and anatomical images. This article discusses the research progress of multi-modal medical image fusion at the feature level. We first describe the principle of medical image fusion at the feature level. We then analyze and summarize the applications of fuzzy sets, rough sets, D-S evidence theory, artificial neural networks, principal component analysis and other fusion methods in medical image fusion. Lastly, we indicate present problems and future research directions for multi-modal medical image fusion.
Mosaic Trisomy 9p in a Patient with Mild Dysmorphic Features and Normal Intelligence.
Brar, Randeep; Basel, Donald G; Bick, David P; Weik, LuAnn; vanTuinen, Peter; Peterson, Jess F
2017-01-01
To the Editor: Partial and whole duplications of the short arm of chromosome 9 have been commonly reported in the literature with characteristic phenotypic features and intellectual disabilities. The clinical features of 9p duplications are broad and can include growth retardation, developmental delay, intellectual disability, microbrachycephaly, deep-set eyes, hypertelorism, downslanting palpebral fissures, prominent nasal root, bulbous nasal tip, low-set ears, short fingers and toes with hypoplastic nails, and delayed bone age (Bonaglia et al., 2002; Zou et al., 2009; Guilherme et al., 2014).
Gene/protein name recognition based on support vector machine using dictionary as features.
Mitsumori, Tomohiro; Fation, Sevrani; Murata, Masaki; Doi, Kouichi; Doi, Hirohumi
2005-01-01
Automated information extraction from biomedical literature is important because a vast amount of biomedical literature has been published. Recognition of the biomedical named entities is the first step in information extraction. We developed an automated recognition system based on the SVM algorithm and evaluated it in Task 1.A of BioCreAtIvE, a competition for automated gene/protein name recognition. In the work presented here, our recognition system uses the feature set of the word, the part-of-speech (POS), the orthography, the prefix, the suffix, and the preceding class. We call these features "internal resource features", i.e., features that can be found in the training data. Additionally, we consider the features of matching against dictionaries to be external resource features. We investigated and evaluated the effect of these features as well as the effect of tuning the parameters of the SVM algorithm. We found that the dictionary matching features contributed slightly to the improvement in the performance of the f-score. We attribute this to the possibility that the dictionary matching features might overlap with other features in the current multiple feature setting. During SVM learning, each feature alone had a marginally positive effect on system performance. This supports the fact that the SVM algorithm is robust on the high dimensionality of the feature vector space and means that feature selection is not required.
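A simplified token-classification sketch of this feature scheme, assuming scikit-learn; the toy dictionary, feature names, and the use of LinearSVC in place of the authors' original SVM chunking setup are assumptions.

```python
# Sketch of token-level gene/protein name tagging with internal features
# plus a dictionary-match (external resource) flag.
from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

GENE_DICT = {"p53", "bcl-2", "interleukin-2"}   # toy external resource

def token_features(tokens, pos_tags, i, prev_class):
    w = tokens[i]
    return {
        "word": w.lower(),                       # the word itself
        "pos": pos_tags[i],                      # part-of-speech
        "is_caps": w[0].isupper(),               # orthography
        "prefix3": w[:3], "suffix3": w[-3:],     # prefix / suffix
        "prev_class": prev_class,                # preceding class
        "in_dict": w.lower() in GENE_DICT,       # dictionary matching feature
    }

# X = [token_features(toks, tags, i, prev) for each token in the training data]
# y = corresponding BIO labels
# clf = make_pipeline(DictVectorizer(), LinearSVC()).fit(X, y)
```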
1989-08-21
Photo by Voyager 2 (JPL) During August 16 and 17, 1989, the Voyager 2 narrow-angle camera was used to photograph Neptune almost continuously, recording approximately two and one-half rotations of the planet. These images represent the most complete set of full disk Neptune images that the spacecraft will acquire. This picture from the sequence shows two of the four cloud features which have been tracked by the Voyager cameras during the past two months. The large dark oval near the western limb (the left edge) is at a latitude of 22 degrees south and circuits Neptune every 18.3 hours. The bright clouds immediately to the south and east of this oval are seen to substantially change their appearances in periods as short as four hours. The second dark spot, at 54 degrees south latitude near the terminator (lower right edge), circuits Neptune every 16.1 hours. This image has been processed to enhance the visibility of small features, at some sacrifice of color fidelity. The Voyager Mission is conducted by JPL for NASA's Office of Space Science and Applications. (JPL Ref: A-34611 Voyager 2-N29)
AIRID: an application of the KAS/Prospector expert system builder to airplane identification
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aldridge, J.P.
1984-01-01
The Knowledge Acquisition System/Prospector expert system building tool developed by SRI, International, has been used to construct an expert system to identify aircraft on the basis of observables such as wing shape, engine number/location, fuselage shape, and tail assembly shape. Additional detailed features are allowed to influence the identification as other favorable features. Constraints on the observations imposed by bad weather and distant observations have been included as contexts to the models. Models for Soviet and US fighter aircraft have been included. Inclusion of other types of aircraft such as bombers, transports, and reconnaissance craft is straightforward. Two models permit exploration of the interaction of semantic and taxonomic networks with the models. A full set of text data for fluid communication with the user has been included. The use of demons as triggered output responses to enhance utility to the user has been explored. This paper presents discussion of the ease of building the expert system using this powerful tool and problems encountered in the construction process.
Visual Odometry Based on Structural Matching of Local Invariant Features Using Stereo Camera Sensor
Núñez, Pedro; Vázquez-Martín, Ricardo; Bandera, Antonio
2011-01-01
This paper describes a novel sensor system to estimate the motion of a stereo camera. Local invariant image features are matched between pairs of frames and linked into image trajectories at video rate, providing the so-called visual odometry, i.e., motion estimates from visual input alone. Our proposal conducts two matching sessions: the first one between sets of features associated to the images of the stereo pairs and the second one between sets of features associated to consecutive frames. With respect to previously proposed approaches, the main novelty of this proposal is that both matching sessions are conducted by means of a fast matching algorithm which combines absolute and relative feature constraints. Finding the largest-valued set of mutually consistent matches is equivalent to finding the maximum-weighted clique on a graph. The stereo matching allows the scene view to be represented as a graph which emerges from the features of the accepted clique. On the other hand, the frame-to-frame matching defines a graph whose vertices are features in 3D space. The efficiency of the approach is increased by minimizing the geometric and algebraic errors to estimate the final displacement of the stereo camera between consecutive acquired frames. The proposed approach has been tested for mobile robotics navigation purposes in real environments and using different features. Experimental results demonstrate the performance of the proposal, which could be applied in both industrial and service robot fields. PMID:22164016
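The clique formulation can be prototyped directly. The following sketch assumes networkx (2.4 or later) and uses the unweighted clique search; the rigid-motion consistency test and tolerance are illustrative simplifications of the paper's absolute and relative constraints.

```python
# Sketch of consistency-graph match filtering: candidate feature matches are
# nodes; two matches are joined when they (approximately) preserve the 3D
# distance between their features; the surviving set is the maximum clique.
import networkx as nx
import numpy as np

def filter_matches(p_prev, p_curr, tol=0.05):
    """p_prev, p_curr: (N, 3) arrays of matched 3D feature positions."""
    n = len(p_prev)
    g = nx.Graph()
    g.add_nodes_from(range(n))
    for i in range(n):
        for j in range(i + 1, n):
            d_prev = np.linalg.norm(p_prev[i] - p_prev[j])
            d_curr = np.linalg.norm(p_curr[i] - p_curr[j])
            # mutual consistency: a rigid motion preserves pairwise distances
            if abs(d_prev - d_curr) < tol * max(d_prev, 1e-9):
                g.add_edge(i, j)
    clique, _ = nx.max_weight_clique(g, weight=None)  # NP-hard; fine for small N
    return clique   # indices of the mutually consistent matches
```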
Exploring KM Features of High-Performance Companies
NASA Astrophysics Data System (ADS)
Wu, Wei-Wen
2007-12-01
To respond to an increasingly competitive business environment, many companies emphasize the importance of knowledge management (KM). Exploring the KM features of high-performance companies is a favorable way to learn from them. However, finding the critical KM features of high-performance companies is a qualitative analysis problem. To handle this kind of problem, the rough set approach is suitable because it is based on data-mining techniques that discover knowledge without rigorous statistical assumptions. Thus, this paper explored the KM features of high-performance companies by using the rough set approach. The results show that high-performance companies stress the importance of both tacit and explicit knowledge, and consider incentives and evaluations to be essential to implementing KM.
Histological Image Feature Mining Reveals Emergent Diagnostic Properties for Renal Cancer
Kothari, Sonal; Phan, John H.; Young, Andrew N.; Wang, May D.
2016-01-01
Computer-aided histological image classification systems are important for making objective and timely cancer diagnostic decisions. These systems use combinations of image features that quantify a variety of image properties. Because researchers tend to validate their diagnostic systems on specific cancer endpoints, it is difficult to predict which image features will perform well given a new cancer endpoint. In this paper, we define a comprehensive set of common image features (consisting of 12 distinct feature subsets) that quantify a variety of image properties. We use a data-mining approach to determine which feature subsets and image properties emerge as part of an “optimal” diagnostic model when applied to specific cancer endpoints. Our goal is to assess the performance of such comprehensive image feature sets for application to a wide variety of diagnostic problems. We perform this study on 12 endpoints including 6 renal tumor subtype endpoints and 6 renal cancer grade endpoints. Keywords: histology, image mining, computer-aided diagnosis. PMID:28163980
Information based universal feature extraction
NASA Astrophysics Data System (ADS)
Amiri, Mohammad; Brause, Rüdiger
2015-02-01
In many real-world image-based pattern recognition tasks, the extraction and usage of task-relevant features are the most crucial part of the diagnosis. In the standard approach, they mostly remain task-specific, although humans who perform such a task always use the same image features, trained in early childhood. It seems that universal feature sets exist, but they have not yet been systematically found. In our contribution, we tried to find those universal image feature sets that are valuable for most image-related tasks. In our approach, we trained a neural network on natural and non-natural images of objects and background, using a Shannon information-based algorithm and learning constraints. The goal was to extract those features that give the most valuable information for the classification of visual objects such as hand-written digits. This will give a good start and a performance increase for all other image learning tasks, implementing a transfer learning approach. As a result, we found that we could indeed extract features which are valid in all three kinds of tasks.
Tsatsishvili, Valeri; Burunat, Iballa; Cong, Fengyu; Toiviainen, Petri; Alluri, Vinoo; Ristaniemi, Tapani
2018-06-01
There has been growing interest towards naturalistic neuroimaging experiments, which deepen our understanding of how the human brain processes and integrates incoming streams of multifaceted sensory information, as commonly occurs in the real world. Music is a good example of such a complex continuous phenomenon. In a few recent fMRI studies examining neural correlates of music in continuous listening settings, multiple perceptual attributes of the music stimulus were represented by a set of high-level features, produced as the linear combination of the acoustic descriptors computationally extracted from the stimulus audio. Here, fMRI data from a naturalistic music listening experiment were employed. Kernel principal component analysis (KPCA) was applied to acoustic descriptors extracted from the stimulus audio to generate a set of nonlinear stimulus features. Subsequently, perceptual and neural correlates of the generated high-level features were examined. The generated features captured musical percepts that were hidden from the linear PCA features, namely Rhythmic Complexity and Event Synchronicity. Neural correlates of the new features revealed activations associated with the processing of complex rhythms, including auditory, motor, and frontal areas. Results were compared with the findings of a previously published study, which analyzed the same fMRI data but applied linear PCA for generating stimulus features. To enable comparison of the results, the methodology for finding stimulus-driven functional maps was adopted from the previous study. Exploiting nonlinear relationships among acoustic descriptors can lead to novel high-level stimulus features, which can in turn reveal new brain structures involved in music processing. Copyright © 2018 Elsevier B.V. All rights reserved.
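A minimal sketch of the KPCA step, assuming scikit-learn; descriptor extraction from the audio (the paper's earlier stage) is outside this snippet, and the matrix shapes and kernel parameters are illustrative.

```python
# Sketch of deriving nonlinear high-level stimulus features from acoustic
# descriptors with kernel PCA.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import KernelPCA

# descriptors: (n_timeframes, n_acoustic_descriptors) matrix for the stimulus
descriptors = np.random.randn(1000, 25)            # stand-in data

X = StandardScaler().fit_transform(descriptors)
kpca = KernelPCA(n_components=5, kernel="rbf", gamma=0.1)
stimulus_features = kpca.fit_transform(X)          # candidate percepts, e.g.
                                                   # "Rhythmic Complexity"
```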
Efficient feature subset selection with probabilistic distance criteria. [pattern recognition
NASA Technical Reports Server (NTRS)
Chittineni, C. B.
1979-01-01
Recursive expressions are derived for efficiently computing the commonly used probabilistic distance measures as a change in the criterion both when a feature is added to and when a feature is deleted from the current feature subset. A combinatorial algorithm for generating all possible r-feature combinations from a given set of s features in (s choose r) steps, with a change of a single feature at each step, is presented. These expressions can also be used for both forward and backward sequential feature selection.
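As context, here is a toy greedy version of forward sequential selection; the paper's contribution, evaluating J(S + {f}) as a cheap recursive update of J(S) rather than from scratch, is deliberately omitted in this sketch.

```python
# Sketch of forward sequential selection driven by a probabilistic distance
# criterion J(subset); J could be, e.g., a Bhattacharyya or divergence
# distance between class densities estimated on the candidate subset.
def forward_select(features, J, r):
    """Greedily grow a subset of size r maximizing the criterion J."""
    selected = []
    remaining = list(features)
    while len(selected) < r:
        # naive version: J is recomputed in full for every candidate;
        # the paper replaces this with an incremental update of J(selected)
        best = max(remaining, key=lambda f: J(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected
```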
MO-AB-BRA-10: Cancer Therapy Outcome Prediction Based On Dempster-Shafer Theory and PET Imaging
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lian, C; University of Rouen, QuantIF - EA 4108 LITIS, 76000 Rouen; Li, H
2015-06-15
Purpose: In cancer therapy, utilizing FDG-18 PET image-based features for accurate outcome prediction is challenging because of 1) limited discriminative information within a small number of PET image sets, and 2) fluctuant feature characteristics caused by the inferior spatial resolution and system noise of PET imaging. In this study, we proposed a new Dempster-Shafer theory (DST) based approach, evidential low-dimensional transformation with feature selection (ELT-FS), to accurately predict cancer therapy outcome with both PET imaging features and clinical characteristics. Methods: First, a specific loss function with a sparse penalty was developed to learn an adaptive low-rank distance metric for representing the dissimilarity between different patients’ feature vectors. By minimizing this loss function, a linear low-dimensional transformation of input features was achieved. Also, imprecise features were excluded simultaneously by applying an l2,1-norm regularization of the learnt dissimilarity metric in the loss function. Finally, the learnt dissimilarity metric was applied in an evidential K-nearest-neighbor (EK-NN) classifier to predict treatment outcome. Results: Twenty-five patients with stage II–III non-small-cell lung cancer and thirty-six patients with esophageal squamous cell carcinomas treated with chemo-radiotherapy were collected. For the two groups of patients, 52 and 29 features, respectively, were utilized. The leave-one-out cross-validation (LOOCV) protocol was used for evaluation. Compared to three existing linear transformation methods (PCA, LDA, NCA), the proposed ELT-FS leads to higher prediction accuracy for the training and testing sets both for lung-cancer patients (100±0.0, 88.0±33.17) and for esophageal-cancer patients (97.46±1.64, 83.33±37.8). The ELT-FS also provides superior class separation in both test data sets. Conclusion: A novel DST-based approach has been proposed to predict cancer treatment outcome using PET image features and clinical characteristics. A specific loss function has been designed for robust accommodation of feature set incertitude and imprecision, facilitating adaptive learning of the dissimilarity metric for the EK-NN classifier.
Subject-specific and pose-oriented facial features for face recognition across poses.
Lee, Ping-Han; Hsu, Gee-Sern; Wang, Yun-Wen; Hung, Yi-Ping
2012-10-01
Most face recognition scenarios assume that frontal faces or mug shots are available for enrollment in the database, while faces of other poses are collected in the probe set. Given a face from the probe set, one needs to determine whether a match in the database exists. This is under the assumption that in forensic applications, most suspects have their mug shots available in the database, and face recognition aims at recognizing the suspects when their faces of various poses are captured by a surveillance camera. This paper considers a different scenario: given a face with multiple poses available, which may or may not include a mug shot, develop a method to recognize the face with poses different from those captured. That is, given two disjoint sets of poses of a face, one for enrollment and the other for recognition, this paper reports a method best for handling such cases. The proposed method includes feature extraction and classification. For feature extraction, we first cluster the poses of each subject's face in the enrollment set into a few pose classes and then decompose the appearance of the face in each pose class using an Embedded Hidden Markov Model, which allows us to define a set of subject-specific and pose-oriented (SSPO) facial components for each subject. For classification, an Adaboost weighting scheme is used to fuse the component classifiers with SSPO component features. The proposed method is proven to outperform other approaches, including a component-based classifier with local facial features cropped manually, in an extensive performance evaluation study.
Search asymmetry and eye movements in infants and adults.
Adler, Scott A; Gallego, Pamela
2014-08-01
Search asymmetry is characterized by the detection of a feature-present target amidst feature-absent distractors being efficient and unaffected by the number of distractors, whereas detection of a feature-absent target amidst feature-present distractors is typically inefficient and affected by the number of distractors. Although studies have attempted to investigate this phenomenon with infants (e.g., Adler, Inslicht, Rovee-Collier, & Gerhardstein in Infant Behavioral Development, 21, 253-272, 1998; Colombo, Mitchell, Coldren, & Atwater in Journal of Experimental Psychology: Learning, Memory and Cognition, 19, 98-109, 1990), due to methodological limitations, their findings have been unable to definitively establish the development of visual search mechanisms in infants. The present study assessed eye movements as a means to examine an asymmetry in responding to feature-present versus feature-absent targets in 3-month-olds, relative to adults. Saccade latencies to localize a target (or a distractor, as in the homogeneous conditions) were measured as infants and adults randomly viewed feature-present (R among Ps), feature-absent (P among Rs), and homogeneous (either all Rs or all Ps) arrays at set sizes of 1, 3, 5, and 8. Results indicated that neither infants' nor adults' saccade latencies to localize the target in the feature-present arrays were affected by increasing set sizes, suggesting that localization of the target was efficient. In contrast, saccade latencies to localize the target in the feature-absent arrays increased with increasing set sizes for both infants and adults, suggesting an inefficient localization. These findings indicate that infants exhibit an asymmetry consistent with that found with adults, providing support for functional bottom-up selective attention mechanisms in early infancy.
Newell, Nicholas E
2011-12-15
The extraction of the set of features most relevant to function from classified biological sequence sets is still a challenging problem. A central issue is the determination of expected counts for higher order features so that artifact features may be screened. Cascade detection (CD), a new algorithm for the extraction of localized features from sequence sets, is introduced. CD is a natural extension of the proportional modeling techniques used in contingency table analysis into the domain of feature detection. The algorithm is successfully tested on synthetic data and then applied to feature detection problems from two different domains to demonstrate its broad utility. An analysis of HIV-1 protease specificity reveals patterns of strong first-order features that group hydrophobic residues by side chain geometry and exhibit substantial symmetry about the cleavage site. Higher order results suggest that favorable cooperativity is weak by comparison and broadly distributed, but indicate possible synergies between negative charge and hydrophobicity in the substrate. Structure-function results for the Schellman loop, a helix-capping motif in proteins, contain strong first-order features and also show statistically significant cooperativities that provide new insights into the design of the motif. These include a new 'hydrophobic staple' and multiple amphipathic and electrostatic pair features. CD should prove useful not only for sequence analysis, but also for the detection of multifactor synergies in cross-classified data from clinical studies or other sources. Windows XP/7 application and data files available at: https://sites.google.com/site/cascadedetect/home. nacnewell@comcast.net Supplementary information is available at Bioinformatics online.
The Roles of Feature-Specific Task Set and Bottom-Up Salience in Attentional Capture: An ERP Study
ERIC Educational Resources Information Center
Eimer, Martin; Kiss, Monika; Press, Clare; Sauter, Disa
2009-01-01
We investigated the roles of top-down task set and bottom-up stimulus salience for feature-specific attentional capture. Spatially nonpredictive cues preceded search arrays that included a color-defined target. For target-color singleton cues, behavioral spatial cueing effects were accompanied by cue-induced N2pc components, indicative of…
The Use of Discourse Markers as an Interactive Feature in Science Lecture Discourse in L2 Setting
ERIC Educational Resources Information Center
Rido, Akhyar
2010-01-01
The objective of this research is to investigate the function of discourse markers as an interpersonal-interactive feature in a science lecture in second language (L2) setting in Malaysia. This research employs qualitative method while the data are gathered through non-participant observation and video recording. From the findings, there are…
Longitudinal Variations in Jupiter's Winds
NASA Astrophysics Data System (ADS)
Simon-Miller, Amy A.; Gierasch, P. J.; Tierney, G.
2010-10-01
Long-term studies of Jupiter's zonal wind field revealed temporal variations on the order of 20 to 40 m/s at many latitudes, greater than the typical data uncertainties of 1 to 10 m/s. No definitive periodicities were evident, however, though some latitudinally-confined signals did appear at periods relevant to the Quasi-Quadrennial Oscillation (Simon-Miller & Gierasch, Icarus, in press). As the QQO appears, from vertical temperature profiles, to propagate downward, it is unclear why a signal is not more obvious, unless other processes dominate over possibly weaker forcing from the QQO. An additional complication is that zonal wind profiles represent an average over some particular set of longitudes for an image pair and most data sets do not offer global wind coverage. Even avoiding known features, such as the large anticyclonic vortices especially prevalent in the south, there can be distinct variations in longitude. We present results on the full wind field from Voyager and Cassini data, showing apparent longitudinal variations of up to 60 m/s or more. These are particularly obvious near disruptions such as the South Equatorial Disturbance, even when the feature itself is not clearly visible. These two dates represent very different states of the planet for comparison: Voyagers 1 & 2 flew by Jupiter shortly after a global upheaval, while many regions were in a disturbed state, while the Cassini view is typical of a more quiescent period present during much of the 1990s and early 2000s.
Exploration of the relationship between topology and designability of conformations
NASA Astrophysics Data System (ADS)
Leelananda, Sumudu P.; Towfic, Fadi; Jernigan, Robert L.; Kloczkowski, Andrzej
2011-06-01
Protein structures are evolutionarily more conserved than sequences, and sequences with very low sequence identity frequently share the same fold. This leads to the concept of protein designability. Some folds are more designable and lots of sequences can assume that fold. Elucidating the relationship between protein sequence and the three-dimensional (3D) structure that the sequence folds into is an important problem in computational structural biology. Lattice models have been utilized in numerous studies to model protein folds and predict the designability of certain folds. In this study, all possible compact conformations within a set of two-dimensional and 3D lattice spaces are explored. Complementary interaction graphs are then generated for each conformation and are described using a set of graph features. The full HP sequence space for each lattice model is generated and contact energies are calculated by threading each sequence onto all the possible conformations. Unique conformation giving minimum energy is identified for each sequence and the number of sequences folding to each conformation (designability) is obtained. Machine learning algorithms are used to predict the designability of each conformation. We find that the highly designable structures can be distinguished from other non-designable conformations based on certain graphical geometric features of the interactions. This finding confirms the fact that the topology of a conformation is an important determinant of the extent of its designability and suggests that the interactions themselves are important for determining the designability.
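To make the threading step concrete, here is a toy Python sketch of computing HP contact energies and designability counts over a supplied conformation list; enumeration of the compact conformations and the later graph-feature machine learning are omitted, and all names are illustrative.

```python
# Toy designability count in the 2D HP lattice model: thread every binary
# H/P sequence onto a set of compact conformations and record, for each
# conformation, how many sequences select it as their unique ground state.
from itertools import product
from collections import Counter

def hp_energy(seq, conf):
    """conf: list of (x, y) lattice sites; energy = -1 per non-bonded H-H contact."""
    e = 0
    for i in range(len(seq)):
        for j in range(i + 2, len(seq)):            # skip chain neighbors
            dx = abs(conf[i][0] - conf[j][0])
            dy = abs(conf[i][1] - conf[j][1])
            if dx + dy == 1 and seq[i] == seq[j] == "H":
                e -= 1
    return e

def designability(conformations, n):
    """Map conformation index -> number of sequences with it as unique minimum.
    Exhaustive over 2**n sequences, so suitable for toy chain lengths only."""
    counts = Counter()
    for seq in product("HP", repeat=n):
        energies = [hp_energy(seq, c) for c in conformations]
        lowest = min(energies)
        if energies.count(lowest) == 1:             # unique ground state only
            counts[energies.index(lowest)] += 1
    return counts
```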
Longitudinal Variations in Jupiter's Winds
NASA Technical Reports Server (NTRS)
Simon-Miller, Amy A.; Gierasch, P. J.; Tierney, G.
2010-01-01
Long-term studies of Jupiter's zonal wind field revealed temporal variations on the order of 20 to 40 m/s at many latitudes, greater than the typical data uncertainties of 1 to 10 m/s. No definitive periodicities were evident, however, though some latitudinally-confined signals did appear at periods relevant to the Quasi- Quadrennial Oscillation (Simon-Miller & Gierasch, Icarus, in press). As the QQO appears, from vertical temperature profiles, to propagate downward, it is unclear why a signal is not more obvious, unless other processes dominate over possibly weaker forcing from the QQO. An additional complication is that zonal wind profiles represent an average over some particular set of longitudes for an image pair and most data sets do not offer global wind coverage. Lien avoiding known features, such as the large anticyclonic vortices especially prevalent in the south, there can be distinct variations in longitude. We present results on the full wind field from Voyager and Cassini data, showing apparent longitudinal variations of up to 60 m/s or more. These are particularly obvious near disruptions such as the South Equatorial Disturbance, even when the feature itself is not clearly visible. These two dates represent very different states of the planet for comparison: Voyagers 1 & 2 flew by Jupiter shortly after a global upheaval, while many regions were in a disturbed state, while the Cassini view is typical of a more quiescent period present during much of the 1990s and early 2000s.
Xu, Xinxing; Li, Wen; Xu, Dong
2015-12-01
In this paper, we propose a new approach to improve face verification and person re-identification in the RGB images by leveraging a set of RGB-D data, in which we have additional depth images in the training data captured using depth cameras such as Kinect. In particular, we extract visual features and depth features from the RGB images and depth images, respectively. As the depth features are available only in the training data, we treat the depth features as privileged information, and we formulate this task as a distance metric learning with privileged information problem. Unlike the traditional face verification and person re-identification tasks that only use visual features, we further employ the extra depth features in the training data to improve the learning of distance metric in the training process. Based on the information-theoretic metric learning (ITML) method, we propose a new formulation called ITML with privileged information (ITML+) for this task. We also present an efficient algorithm based on the cyclic projection method for solving the proposed ITML+ formulation. Extensive experiments on the challenging faces data sets EUROCOM and CurtinFaces for face verification as well as the BIWI RGBD-ID data set for person re-identification demonstrate the effectiveness of our proposed approach.
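A baseline for this setting can be sketched with the metric-learn package's standard ITML implementation; note this is plain ITML on visual features only, not the paper's ITML+ (which additionally consumes depth features during training), and the data shapes are illustrative.

```python
# Baseline sketch: learn a Mahalanobis metric with standard ITML and verify
# face pairs by thresholding the learned distance (plain ITML, not ITML+).
import numpy as np
from metric_learn import ITML_Supervised

X_train = np.random.randn(200, 64)          # stand-in visual features
y_train = np.random.randint(0, 20, 200)     # subject identities

itml = ITML_Supervised()                    # builds similar/dissimilar pairs
itml.fit(X_train, y_train)

def verify(f1, f2, thr=1.0):
    """Same-person decision: Euclidean distance in the learned space."""
    e1, e2 = itml.transform([f1])[0], itml.transform([f2])[0]
    return np.linalg.norm(e1 - e2) < thr
```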
Improved Diagnostic Multimodal Biomarkers for Alzheimer's Disease and Mild Cognitive Impairment
Martínez-Torteya, Antonio; Treviño, Víctor; Tamez-Peña, José G.
2015-01-01
The early diagnosis of Alzheimer's disease (AD) and mild cognitive impairment (MCI) is very important for treatment research and patient care purposes. Few biomarkers are currently considered in clinical settings, and their use is still optional. The objective of this work was to determine whether multimodal features not previously associated with AD could improve the classification accuracy between AD, MCI, and healthy controls, which may impact future AD biomarkers. For this, the Alzheimer's Disease Neuroimaging Initiative database was mined for case-control candidates. At least 652 baseline features extracted from MRI and PET analyses, biological samples, and clinical data up to February 2014 were used. A feature selection methodology that includes a genetic algorithm search coupled to a logistic regression classifier and forward and backward selection strategies was used to explore combinations of features. This generated diagnostic models with sizes ranging from 3 to 8 features, including well-documented AD biomarkers as well as unexplored imaging, biochemical, and clinical features. Accuracies of 0.85, 0.79, and 0.80 were achieved for HC-AD, HC-MCI, and MCI-AD classifications, respectively, when evaluated using a blind test set. In conclusion, a set of features provided additional and independent information to well-established AD biomarkers, aiding in the classification of MCI and AD. PMID:26106620
Freezing effect on bread appearance evaluated by digital imaging
NASA Astrophysics Data System (ADS)
Zayas, Inna Y.
1999-01-01
In marketing channels, bread is sometimes delivered in a frozen state for distribution. Changes occur in the physical dimensions, crumb grain, and appearance of slices. Ten loaves, twelve bread slices per loaf, were scanned for digital image analysis and then frozen in a commercial refrigerator. The bread slices were stored for four weeks, scanned again, permitted to thaw, and scanned a third time. Image features were extracted to determine the shape, size, and image texture of the slices. Different gray-level thresholds were set to detect changes that occurred in the crumb, and images were binarized at these settings. The number of pixels falling into these gray-level settings was determined for each slice. Image texture features of subimages of each slice were calculated to quantify slice crumb grain. The image features of slice size showed shrinking of bread slices as a result of freezing and storage, although the shape of the slices did not change markedly. Visible crumb texture changes occurred, and these changes were depicted by changes in image texture features. Image texture features showed that the slice crumb changed differently at the center of a slice compared to a peripheral area close to the crust. Image texture and slice features were sufficient for discrimination of slices before and after freezing and after thawing.
Hwang, Wonjun; Wang, Haitao; Kim, Hyunwoo; Kee, Seok-Cheol; Kim, Junmo
2011-04-01
The authors present a robust face recognition system for large-scale data sets taken under uncontrolled illumination variations. The proposed face recognition system consists of a novel illumination-insensitive preprocessing method, a hybrid Fourier-based facial feature extraction, and a score fusion scheme. First, in the preprocessing stage, a face image is transformed into an illumination-insensitive image, called an "integral normalized gradient image," by normalizing and integrating the smoothed gradients of a facial image. Then, for feature extraction of complementary classifiers, multiple face models based upon hybrid Fourier features are applied. The hybrid Fourier features are extracted from different Fourier domains in different frequency bandwidths, and each feature is then individually classified by linear discriminant analysis. In addition, multiple face models are generated from multiple normalized face images that have different eye distances. Finally, to combine scores from multiple complementary classifiers, a log likelihood ratio-based score fusion scheme is applied. The proposed system is evaluated using the face recognition grand challenge (FRGC) experimental protocols; FRGC is a large available data set. Experimental results on the FRGC version 2.0 data sets show that the proposed method achieves an average verification rate of 81.49% on 2-D face images under various environmental variations such as illumination changes, expression changes, and time elapses.
Assessment of features for automatic CTG analysis based on expert annotation.
Chudácek, Vacláv; Spilka, Jirí; Lhotská, Lenka; Janku, Petr; Koucký, Michal; Huptych, Michal; Bursa, Miroslav
2011-01-01
Cardiotocography (CTG), the monitoring of fetal heart rate (FHR) and uterine contractions (TOCO), has been used routinely by obstetricians since the 1960s to detect fetal hypoxia. The evaluation of the FHR in clinical settings is based on an evaluation of macroscopic morphological features and so far has managed to avoid adopting any achievements from the HRV research field. In this work, most of the features ever used for FHR characterization, including FIGO, HRV, nonlinear, wavelet, and time- and frequency-domain features, are investigated, and the features are assessed based on their statistical significance in the task of distinguishing the FHR into three FIGO classes. Annotation derived from a panel of experts, instead of the commonly utilized pH values, was used for evaluation of the features on a large data set (552 records). We conclude the paper by presenting the best uncorrelated features and their individual rank of importance according to a meta-analysis of three different ranking methods. The number of accelerations and decelerations, interval index, as well as Lempel-Ziv complexity and Higuchi's fractal dimension are among the top five features.
Hybrid feature selection for supporting lightweight intrusion detection systems
NASA Astrophysics Data System (ADS)
Song, Jianglong; Zhao, Wentao; Liu, Qiang; Wang, Xin
2017-08-01
Redundant and irrelevant features not only cause high resource consumption but also degrade the performance of Intrusion Detection Systems (IDS), especially when coping with big data. These features slow down the process of training and testing in network traffic classification. Therefore, a hybrid feature selection approach combining wrapper and filter selection is designed in this paper to build a lightweight intrusion detection system. Two main phases are involved in this method. The first phase conducts a preliminary search for an optimal subset of features, in which chi-square feature selection is utilized. The selected set of features from the previous phase is further refined in the second phase in a wrapper manner, in which a Random Forest (RF) is used to guide the selection process and retain an optimized set of features. After that, we build an RF-based detection model and make a fair comparison with other approaches. The experimental results on NSL-KDD datasets show that our approach results in higher detection accuracy as well as faster training and testing processes.
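A compact sketch of the two-phase idea, assuming scikit-learn; the subset sizes and the use of forest feature importances as the wrapper criterion are illustrative simplifications.

```python
# Sketch of hybrid selection: a chi-square filter shrinks the feature pool,
# then random-forest importances refine it.
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.ensemble import RandomForestClassifier

def hybrid_select(X, y, k_filter=40, k_final=20):
    # Phase 1: chi-square filter (features must be non-negative, e.g. counts)
    filt = SelectKBest(chi2, k=k_filter).fit(X, y)
    idx = filt.get_support(indices=True)
    # Phase 2: wrapper-style refinement guided by random-forest importance
    rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X[:, idx], y)
    order = np.argsort(rf.feature_importances_)[::-1][:k_final]
    return idx[order]

# selected = hybrid_select(X_train, y_train)
# model = RandomForestClassifier().fit(X_train[:, selected], y_train)
```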
NASA Astrophysics Data System (ADS)
Ma, L.; Zhou, M.; Li, C.
2017-09-01
In this study, a Random Forest (RF) based land cover classification method is presented to predict the types of land cover in the Miyun area. The full waveforms returned by a LiteMapper 5600 airborne LiDAR system were processed, including waveform filtering, waveform decomposition and feature extraction. The commonly used features of distance, intensity, Full Width at Half Maximum (FWHM), skewness and kurtosis were extracted. These waveform features were used as attributes of the training data for generating the RF prediction model. The RF prediction model was applied to predict the types of land cover in the Miyun area as trees, buildings, farmland and ground. The classification results for these four types of land cover were evaluated against ground truth information acquired from CCD image data of the same region. The RF classification results were compared with those of an SVM method and showed better results. The RF classification accuracy reached 89.73% and the classification Kappa was 0.8631.
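Waveform decomposition of this kind is commonly done by fitting a sum of Gaussians; below is a minimal scipy sketch, with a synthetic two-echo waveform standing in for real LiteMapper returns.

```python
# Sketch of decomposing a returned LiDAR full waveform into Gaussian echoes;
# the paper's features can then be read off each echo: distance from the
# echo position, intensity from the amplitude, FWHM from the width.
import numpy as np
from scipy.optimize import curve_fit

def two_gaussians(t, a1, m1, s1, a2, m2, s2):
    return (a1 * np.exp(-(t - m1) ** 2 / (2 * s1 ** 2)) +
            a2 * np.exp(-(t - m2) ** 2 / (2 * s2 ** 2)))

t = np.arange(100.0)
waveform = two_gaussians(t, 80, 30, 3, 40, 62, 5) + np.random.normal(0, 1, t.size)

p0 = [waveform.max(), 30, 3, waveform.max() / 2, 60, 5]   # rough initial guess
params, _ = curve_fit(two_gaussians, t, waveform, p0=p0)
a1, m1, s1, a2, m2, s2 = params
fwhm1 = 2 * np.sqrt(2 * np.log(2)) * s1                    # FWHM from sigma
```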
Sewell, Justin L; Boscardin, Christy K; Young, John Q; Ten Cate, Olle; O'Sullivan, Patricia S
2017-11-01
Cognitive load theory, focusing on limits of the working memory, is relevant to medical education; however, factors associated with cognitive load during procedural skills training are not well characterized. The authors sought to determine how features of learners, patients/tasks, settings, and supervisors were associated with three types of cognitive load among learners performing a specific procedure, colonoscopy, to identify implications for procedural teaching. Data were collected through an electronically administered survey sent to 1,061 U.S. gastroenterology fellows during the 2014-2015 academic year; 477 (45.0%) participated. Participants completed the survey immediately following a colonoscopy. Using multivariable linear regression analyses, the authors identified sets of features associated with intrinsic, extraneous, and germane loads. Features associated with intrinsic load included learners (prior experience and year in training negatively associated, fatigue positively associated) and patient/tasks (procedural complexity positively associated, better patient tolerance negatively associated). Features associated with extraneous load included learners (fatigue positively associated), setting (queue order positively associated), and supervisors (supervisor engagement and confidence negatively associated). Only one feature, supervisor engagement, was (positively) associated with germane load. These data support practical recommendations for teaching procedural skills through the lens of cognitive load theory. To optimize intrinsic load, level of experience and competence of learners should be balanced with procedural complexity; part-task approaches and scaffolding may be beneficial. To reduce extraneous load, teachers should remain engaged, and factors within the procedural setting that may interfere with learning should be minimized. To optimize germane load, teachers should remain engaged.
Automatic machine learning based prediction of cardiovascular events in lung cancer screening data
NASA Astrophysics Data System (ADS)
de Vos, Bob D.; de Jong, Pim A.; Wolterink, Jelmer M.; Vliegenthart, Rozemarijn; Wielingen, Geoffrey V. F.; Viergever, Max A.; Išgum, Ivana
2015-03-01
Calcium burden determined in CT images acquired in lung cancer screening is a strong predictor of cardiovascular events (CVEs). This study investigated whether subjects undergoing such screening who are at risk of a CVE can be identified using automatic image analysis and subject characteristics. Moreover, the study examined whether these individuals can be identified using solely image information, or if a combination of image and subject data is needed. A set of 3559 male subjects participating in the Dutch-Belgian lung cancer screening trial was included. Low-dose non-ECG-synchronized chest CT images acquired at baseline were analyzed (1834 scanned in the University Medical Center Groningen, 1725 in the University Medical Center Utrecht). Aortic and coronary calcifications were identified using previously developed automatic algorithms. A set of features describing the number, volume and size distribution of the detected calcifications was computed. The age of the participants was extracted from image headers. Features describing participants' smoking status, smoking history and past CVEs were obtained. CVEs that occurred within three years after the imaging were used as outcome. Support vector machine classification was performed employing different feature sets: either image features only, or a combination of image and subject-related characteristics. Classification based solely on the image features resulted in an area under the ROC curve (Az) of 0.69. A combination of image and subject features resulted in an Az of 0.71. The results demonstrate that subjects undergoing lung cancer screening who are at risk of CVE can be identified using automatic image analysis. Adding subject information slightly improved the performance.
The perceptual processing capacity of summary statistics between and within feature dimensions
Attarha, Mouna; Moore, Cathleen M.
2015-01-01
The simultaneous–sequential method was used to test the processing capacity of statistical summary representations both within and between feature dimensions. Sixteen gratings varied with respect to their size and orientation. In Experiment 1, the gratings were equally divided into four separate smaller sets, one of which had a mean size that was larger or smaller than that of the other three sets, and one of which had a mean orientation that was tilted more leftward or rightward. The task was to report the mean size and orientation of the oddball sets. This therefore required four summary representations for size and another four for orientation. The sets were presented at the same time in the simultaneous condition or across two temporal frames in the sequential condition. Experiment 1 showed evidence of a sequential advantage, suggesting that the system may be limited with respect to establishing multiple within-feature summaries. Experiment 2 eliminated the possibility that some aspect of the task, other than averaging, was contributing to this observed limitation. In Experiment 3, the same 16 gratings appeared as one large superset, and therefore the task only required one summary representation for size and another for orientation. Equal simultaneous–sequential performance indicated that between-feature summaries are capacity-free. These findings challenge the view that within-feature summaries drive a global sense of visual continuity across areas of the peripheral visual field, and suggest a shift in focus to seeking an understanding of how between-feature summaries in one area of the environment control behavior. PMID:26360153
Full-Text Databases in Medicine.
ERIC Educational Resources Information Center
Sievert, MaryEllen C.; And Others
1995-01-01
Describes types of full-text databases in medicine; discusses features for searching full-text journal databases available through online vendors; reviews research on full-text databases in medicine; and describes the MEDLINE/Full-Text Research Project at the University of Missouri (Columbia) which investigated precision, recall, and relevancy.…
DEVELOPMENT OF RIPARIAN ZONE INDICATORS (INT. GRANT)
Landscape features (e.g., land use) influence water quality characteristics on a variety of spatial scales. For example, while land use is controlled by anthropogenic features at a local scale, geologic features are set at larger spatial, and longer temporal scales. Individual ...
Texture-based approach to palmprint retrieval for personal identification
NASA Astrophysics Data System (ADS)
Li, Wenxin; Zhang, David; Xu, Z.; You, J.
2000-12-01
This paper presents a new approach to palmprint retrieval for personal identification. Three key issues in image retrieval are considered: feature selection, similarity measures and dynamic search for the best matching of the sample in the image database. We propose a texture-based method for palmprint feature representation. The concept of texture energy is introduced to define a palmprint's global and local features, which are characterized by high convergence of inner-palm similarities and good dispersion of inter-palm discrimination. The search is carried out in a layered fashion: first, global features are used to guide the fast selection of a small set of similar candidates from the database, and then local features are used to decide the final output within the candidate set. The experimental results demonstrate the effectiveness and accuracy of the proposed method.
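The texture-energy idea can be sketched with Laws-style masks standing in for the authors' exact filters, which the abstract does not specify; the block size and kernels below are illustrative.

```python
# Sketch of a texture-energy feature map: filter the palm image with a
# Laws-style mask and average the absolute response over blocks.
import numpy as np
from scipy.signal import convolve2d

L5 = np.array([1, 4, 6, 4, 1], dtype=float)      # level kernel
E5 = np.array([-1, -2, 0, 2, 1], dtype=float)    # edge kernel
mask = np.outer(L5, E5)                          # one 2D Laws mask

def texture_energy(img, block=16):
    resp = np.abs(convolve2d(img, mask, mode="same", boundary="symm"))
    h, w = resp.shape
    # mean energy over non-overlapping blocks -> grid of local features
    return resp[:h - h % block, :w - w % block] \
        .reshape(h // block, block, w // block, block).mean(axis=(1, 3))

# Global feature: overall mean energy; local features: the block grid,
# used to refine the candidate set returned by the global search.
```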
Texture-based approach to palmprint retrieval for personal identification
NASA Astrophysics Data System (ADS)
Li, Wenxin; Zhang, David; Xu, Z.; You, J.
2001-01-01
This paper presents a new approach to palmprint retrieval for personal identification. Three key issues in image retrieval are considered: feature selection, similarity measures and dynamic search for the best matching of the sample in the image database. We propose a texture-based method for palmprint feature representation. The concept of texture energy is introduced to define a palmprint's global and local features, which are characterized by high convergence of inner-palm similarities and good dispersion of inter-palm discrimination. The search is carried out in a layered fashion: first, global features are used to guide the fast selection of a small set of similar candidates from the database, and then local features are used to decide the final output within the candidate set. The experimental results demonstrate the effectiveness and accuracy of the proposed method.
A bootstrap based Neyman-Pearson test for identifying variable importance.
Ditzler, Gregory; Polikar, Robi; Rosen, Gail
2015-04-01
Selection of the most informative features, leading to a small loss on future data, is arguably one of the most important steps in classification, data analysis and model selection. Several feature selection (FS) algorithms are available; however, due to the noise present in any data set, FS algorithms are typically accompanied by an appropriate cross-validation scheme. In this brief, we propose a statistical hypothesis test derived from the Neyman-Pearson lemma for determining if a feature is statistically relevant. The proposed approach can be applied as a wrapper to any FS algorithm, regardless of the FS criteria used by that algorithm, to determine whether a feature belongs in the relevant set. Perhaps more importantly, this procedure efficiently determines the number of relevant features given an initial starting point. We provide freely available software implementations of the proposed methodology.
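One hedged way to picture such a wrapper: re-run the FS algorithm on bootstrap replicates and flag features whose selection frequency exceeds a critical count under the null. The binomial threshold below is a simplification for illustration, not the paper's exact Neyman-Pearson test.

```python
# Sketch of a bootstrap relevance wrapper around an arbitrary FS algorithm.
import numpy as np
from scipy.stats import binom

def relevant_features(X, y, select_k, B=100, p0=None, alpha=0.01, rng=None):
    """select_k(X, y) -> indices of selected features (any FS algorithm)."""
    rng = rng or np.random.default_rng(0)
    n, d = X.shape
    hits = np.zeros(d)
    for _ in range(B):
        idx = rng.integers(0, n, n)                 # bootstrap resample
        hits[select_k(X[idx], y[idx])] += 1
    k = len(select_k(X, y))
    p0 = p0 if p0 is not None else k / d            # null: selection by chance
    crit = binom.ppf(1 - alpha, B, p0)              # critical selection count
    return np.where(hits > crit)[0]                 # statistically relevant set
```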
Prominent feature extraction for review analysis: an empirical study
NASA Astrophysics Data System (ADS)
Agarwal, Basant; Mittal, Namita
2016-05-01
Sentiment analysis (SA) research has increased tremendously in recent times. SA aims to determine the sentiment orientation of a given text as positive or negative polarity. The motivation for SA research is the need for industry to know users' opinions about their products from online portals, blogs, discussion boards, reviews and so on. Efficient features need to be extracted for machine-learning algorithms to achieve better sentiment classification. In this paper, various features are initially extracted from the text, such as unigrams, bi-grams and dependency features. In addition, new bi-tagged features are also extracted that conform to predefined part-of-speech patterns. Furthermore, various composite features are created using these features. Information gain (IG) and minimum redundancy maximum relevancy (mRMR) feature selection methods are used to eliminate noisy and irrelevant features from the feature vector. Finally, machine-learning algorithms are used to classify the review document into the positive or negative class. The effects of different categories of features are investigated on four standard data sets, namely, movie review and product (book, DVD and electronics) review data sets. Experimental results show that composite features created from prominent unigram and bi-tagged features perform better than other features for sentiment classification. mRMR is a better feature selection method than IG for sentiment classification. The Boolean Multinomial Naïve Bayes algorithm performs better than the support vector machine classifier for SA in terms of accuracy and execution time.
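A minimal sketch of the selection-plus-classification stage, assuming scikit-learn; mutual information stands in for IG (the two are closely related), mRMR would require an external package, and the k value is illustrative.

```python
# Sketch: rank review features by mutual information (IG-style filter) and
# classify with multinomial naive Bayes.
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

pipe = make_pipeline(
    SelectKBest(mutual_info_classif, k=2000),   # keep the top-ranked features
    MultinomialNB(),                            # fast Boolean/multinomial NB
)
# X: document-feature matrix (unigram, bi-gram, bi-tagged and dependency
# features as counts or booleans); y: positive/negative review labels.
# pipe.fit(X_train, y_train); pipe.score(X_test, y_test)
```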
Breaking the polar-nonpolar division in solvation free energy prediction.
Wang, Bao; Wang, Chengzhang; Wu, Kedi; Wei, Guo-Wei
2018-02-05
Implicit solvent models divide solvation free energies into polar and nonpolar additive contributions, whereas polar and nonpolar interactions are inseparable and nonadditive. We present a feature functional theory (FFT) framework to break this ad hoc division. The essential ideas of FFT are as follows: (i) representability assumption: there exists a microscopic feature vector that can uniquely characterize and distinguish one molecule from another; (ii) feature-function relationship assumption: the macroscopic features, including solvation free energy, of a molecule are functionals of microscopic feature vectors; and (iii) similarity assumption: molecules with similar microscopic features have similar macroscopic properties, such as solvation free energies. Based on these assumptions, solvation free energy prediction is carried out in the following protocol. First, we construct a molecular microscopic feature vector that is efficient in characterizing the solvation process using quantum mechanics and Poisson-Boltzmann theory. Microscopic feature vectors are combined with macroscopic features, that is, physical observables, to form extended feature vectors. Additionally, we partition a solvation dataset into queries according to molecular compositions. Moreover, for each target molecule, we adopt a machine learning algorithm for its nearest neighbor search, based on the selected microscopic feature vectors. Finally, from the extended feature vectors of the obtained nearest neighbors, we construct a functional of solvation free energy, which is employed to predict the solvation free energy of the target molecule. The proposed FFT model has been extensively validated via a large dataset of 668 molecules. The leave-one-out test gives an optimal root-mean-square error (RMSE) of 1.05 kcal/mol. FFT predictions of SAMPL0, SAMPL1, SAMPL2, SAMPL3, and SAMPL4 challenge sets deliver RMSEs of 0.61, 1.86, 1.64, 0.86, and 1.14 kcal/mol, respectively. Using a test set of 94 molecules and its associated training set, the present approach was carefully compared with a classic solvation model based on weighted solvent accessible surface area. © 2017 Wiley Periodicals, Inc.
Smith, J. LaRue; Damar, Nancy A.; Charlet, David A.; Westenburg, Craig L.
2014-01-01
DigitalGlobe’s QuickBird satellite high-resolution multispectral imagery was classified by using Visual Learning Systems’ Feature Analyst feature extraction software to produce land-cover data sets for the Red Rock Canyon National Conservation Area and the Coyote Springs, Piute-Eldorado Valley, and Mormon Mesa Areas of Critical Environmental Concern in Clark County, Nevada. Over 1,000 vegetation field samples were collected at the stand level. The field samples were classified to the National Vegetation Classification Standard, Version 2 hierarchy at the alliance level and above. Feature extraction models were developed for vegetation on the basis of the spectral and spatial characteristics of selected field samples by using the Feature Analyst hierarchical learning process. Individual model results were merged to create one data set for the Red Rock Canyon National Conservation Area and one for each of the Areas of Critical Environmental Concern. Field sample points and photographs were used to validate and update the data set after model results were merged. Non-vegetation data layers, such as roads and disturbed areas, were delineated from the imagery and added to the final data sets. The resulting land-cover data sets are significantly more detailed than previously were available, both in resolution and in vegetation classes.
[Research on spectra recognition method for cabbages and weeds based on PCA and SIMCA].
Zu, Qin; Deng, Wei; Wang, Xiu; Zhao, Chun-Jiang
2013-10-01
In order to improve the accuracy and efficiency of weed identification, the difference in spectral reflectance was employed to distinguish between crops and weeds. Firstly, different combinations of the Savitzky-Golay (SG) convolutional derivative and the multiplicative scattering correction (MSC) method were applied to preprocess the raw spectral data. Then the clustering analysis of various types of plants was completed using principal component analysis (PCA), and the feature wavelengths which were sensitive for classifying the various types of plants were extracted according to the corresponding loading plots of the optimal principal components in the PCA results. Finally, setting the feature wavelengths as the input variables, the soft independent modeling of class analogy (SIMCA) classification method was used to identify the various types of plants. The experimental results of classifying cabbages and weeds showed that, on the basis of the optimal pretreatment by a combined application of MSC and the SG convolutional derivative with the SG parameters set to a 1st-order derivative, a 3rd-degree polynomial and 51 smoothing points, 23 feature wavelengths were extracted in accordance with the top three principal components in the PCA results. When the SIMCA method was used for classification with the previously selected 23 feature wavelengths as input variables, the classification rates of the modeling set and the prediction set reached 98.6% and 100%, respectively.
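The preprocessing and wavelength-selection chain can be sketched as follows on synthetic spectra. MSC is implemented as a per-spectrum linear regression against the mean spectrum, the SG derivative uses the reported parameters, and the loading-based selection of feature wavelengths is a simplified reading of the paper's procedure.

```python
# Sketch of MSC + Savitzky-Golay 1st derivative + PCA-loading wavelength
# selection; parameters mirror those reported (1st-order derivative,
# 3rd-degree polynomial, 51 points), but the spectra are synthetic.
import numpy as np
from scipy.signal import savgol_filter
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
spectra = rng.normal(size=(120, 600))            # samples x wavelengths

mean_spec = spectra.mean(axis=0)
msc = np.empty_like(spectra)
for i, s in enumerate(spectra):                  # multiplicative scatter correction
    b, a = np.polyfit(mean_spec, s, 1)           # s ~ b * mean + a
    msc[i] = (s - a) / b

deriv = savgol_filter(msc, window_length=51, polyorder=3, deriv=1, axis=1)

pca = PCA(n_components=3).fit(deriv)
loadings = np.abs(pca.components_)               # 3 x wavelengths
feature_wavelengths = np.unique(loadings.argsort(axis=1)[:, -8:])
print(feature_wavelengths)                       # candidate feature wavelength indices
```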
Hailstone classifier based on Rough Set Theory
NASA Astrophysics Data System (ADS)
Wan, Huisong; Jiang, Shuming; Wei, Zhiqiang; Li, Jian; Li, Fengjiao
2017-09-01
The Rough Set Theory was used for the construction of a hailstone classifier. Firstly, a database of radar image features was constructed. This involved transforming the base data returned by the Doppler radar into a viewable bitmap format. Then, through image processing, color, texture, shape and other dimensional features were extracted and saved as the characteristic database to provide data support for the follow-up work. Secondly, using the Rough Set Theory, a hailstone classifier was built to achieve automatic classification of hailstone samples.
Baijal, Shruti; Nakatani, Chie; van Leeuwen, Cees; Srinivasan, Narayanan
2013-06-07
Human observers show remarkable efficiency in statistical estimation; they are able, for instance, to estimate the mean size of visual objects, even if their number exceeds the capacity limits of focused attention. This ability has been understood as the result of a distinct mode of attention, i.e. distributed attention. Compared to the focused attention mode, working memory representations under distributed attention are proposed to be more compressed, leading to reduced working memory loads. An alternate proposal is that distributed attention uses less structured, feature-level representations. These would fill up working memory (WM) more, even when target set size is low. Using event-related potentials, we compared WM loading in a typical distributed attention task (mean size estimation) to that in a corresponding focused attention task (object recognition), using a measure called contralateral delay activity (CDA). Participants performed both tasks on 2, 4, or 8 different-sized target disks. In the recognition task, CDA amplitude increased with set size; notably, however, in the mean estimation task the CDA amplitude was high regardless of set size. In particular for set-size 2, the amplitude was higher in the mean estimation task than in the recognition task. The result showed that the task involves full WM loading even with a low target set size. This suggests that in the distributed attention mode, representations are not compressed, but rather less structured than under focused attention conditions. Copyright © 2012 Elsevier Ltd. All rights reserved.
Application of preprocessing filtering on Decision Tree C4.5 and rough set theory
NASA Astrophysics Data System (ADS)
Chan, Joseph C. C.; Lin, Tsau Y.
2001-03-01
This paper compares two artificial intelligence methods, the Decision Tree C4.5 and Rough Set Theory, on stock market data. The Decision Tree C4.5 is reviewed alongside Rough Set Theory. An enhanced window application is developed to facilitate pre-processing filtering by introducing feature (attribute) transformations, which allow users to input formulas and create new attributes. The application also produces three varieties of data set via delaying, averaging, and summation. The results demonstrate that pre-processing with feature (attribute) transformations improves Decision Tree C4.5. Moreover, the comparison between Decision Tree C4.5 and Rough Set Theory is based on clarity, automation, accuracy, dimensionality, raw data, and speed, and is supported by the rule sets generated by both algorithms on three different sets of data.
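A minimal sketch of the three attribute transformations (delaying, averaging, summation) on a toy price series, using pandas; the column names and window sizes are hypothetical.

```python
# Sketch of the pre-processing filter's attribute transformations on a
# hypothetical stock time series: delayed, averaged, and summed variants
# of a raw attribute are added as new columns.
import pandas as pd

prices = pd.DataFrame({"close": [10.0, 10.5, 10.2, 10.8, 11.1, 10.9]})

prices["close_delay3"] = prices["close"].shift(3)           # delaying
prices["close_avg3"]   = prices["close"].rolling(3).mean()  # averaging
prices["close_sum3"]   = prices["close"].rolling(3).sum()   # summation
print(prices)
```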
Holmes, Tyson H; He, Xiao-Song
2016-10-01
Small, wide data sets are commonplace in human immunophenotyping research. As defined here, a small, wide data set is constructed by sampling a small to modest quantity n,1
Automated Detection of Driver Fatigue Based on AdaBoost Classifier with EEG Signals.
Hu, Jianfeng
2017-01-01
Purpose: Driving fatigue has become one of the important causes of road accidents, and many studies have analyzed driver fatigue. EEG is becoming increasingly useful in measuring fatigue state. Manual interpretation of EEG signals is infeasible, so an effective method for automatic detection from EEG signals is crucially needed. Method: In order to evaluate the complex, unstable, and non-linear characteristics of EEG signals, four feature sets were computed from EEG signals, comprising fuzzy entropy (FE), sample entropy (SE), approximate entropy (AE), spectral entropy (PE), and combined entropies (FE + SE + AE + PE). All these feature sets were used as input vectors for an AdaBoost classifier, a boosting method which is fast and highly accurate. To assess our method, several experiments including parameter setting and classifier comparison were conducted on 28 subjects. For comparison, Decision Trees (DT), Support Vector Machine (SVM) and Naive Bayes (NB) classifiers were used. Results: The proposed method (combination of FE and AdaBoost) yields performance superior to the other schemes. Using the FE feature extractor, AdaBoost achieves an area under the receiver operating curve (AUC) of 0.994, error rate (ERR) of 0.024, Precision of 0.969, Recall of 0.984, F1 score of 0.976, and Matthews correlation coefficient (MCC) of 0.952, compared to SVM (ERR of 0.035, Precision of 0.957, Recall of 0.974, F1 score of 0.966, and MCC of 0.930 with AUC of 0.990), DT (ERR of 0.142, Precision of 0.857, Recall of 0.859, F1 score of 0.966, and MCC of 0.716 with AUC of 0.916) and NB (ERR of 0.405, Precision of 0.646, Recall of 0.434, F1 score of 0.519, and MCC of 0.203 with AUC of 0.606). The FE feature set and the combined feature set outperform the other feature sets. AdaBoost also appears more robust to changes in the ratio of test samples to all samples and in the number of subjects, and might therefore aid the real-time detection of driver fatigue through the classification of EEG signals. Conclusion: The combination of FE features and the AdaBoost classifier for detecting EEG-based driver fatigue provides a sound basis for exploring the underlying physiological mechanisms and for wearable applications. PMID:28824409
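As a sketch of the pipeline's shape, the following computes one entropy feature (spectral entropy from a Welch PSD) per synthetic EEG epoch and trains scikit-learn's AdaBoost on it. The fuzzy, sample, and approximate entropies used in the paper would be computed per epoch in the same way and concatenated for the combined feature set.

```python
# Sketch: one entropy feature per synthetic EEG epoch, fed to AdaBoost.
import numpy as np
from scipy.signal import welch
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(3)

def spectral_entropy(sig, fs=128):
    f, psd = welch(sig, fs=fs, nperseg=256)
    p = psd / psd.sum()                       # normalized power distribution
    return -np.sum(p * np.log2(p + 1e-12))

epochs = rng.normal(size=(100, 512))          # 100 single-channel EEG epochs
labels = rng.integers(0, 2, size=100)         # 0 = alert, 1 = fatigued
X = np.array([[spectral_entropy(e)] for e in epochs])

clf = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X, labels)
print(clf.score(X, labels))
```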
Holmes, Tyson H.; He, Xiao-Song
2016-01-01
Small, wide data sets are commonplace in human immunophenotyping research. As defined here, a small, wide data set is constructed by sampling a small to modest quantity n, 1 < n < 50, of human participants for the purpose of estimating many parameters p, such that n < p < 1,000. We offer a set of prescriptions that are designed to facilitate low-variance (i.e. stable), low-bias, interpretive regression modeling of small, wide data sets. These prescriptions are distinctive in their especially heavy emphasis on minimizing use of out-of-sample information for conducting statistical inference. That allows the working immunologist to proceed without being encumbered by imposed and often untestable statistical assumptions. Problems of unmeasured confounders, confidence-interval coverage, feature selection, and shrinkage/denoising are defined clearly and treated in detail. We propose an extension of an existing nonparametric technique for improved small-sample confidence-interval tail coverage from the univariate case (single immune feature) to the multivariate (many, possibly correlated immune features). An important role for derived features in the immunological interpretation of regression analyses is stressed. Areas of further research are discussed. Presented principles and methods are illustrated through application to a small, wide data set of adults spanning a wide range in ages and multiple immunophenotypes that were assayed before and after immunization with inactivated influenza vaccine (IIV). Our regression modeling prescriptions identify some potentially important topics for future immunological research. 1) Immunologists may wish to distinguish age-related differences in immune features from changes in immune features caused by aging. 2) A form of the bootstrap that employs linear extrapolation may prove to be an invaluable analytic tool because it allows the working immunologist to obtain accurate estimates of the stability of immune parameter estimates with a bare minimum of imposed assumptions. 3) Liberal inclusion of immune features in phenotyping panels can facilitate accurate separation of biological signal of interest from noise. In addition, through a combination of denoising and potentially improved confidence interval coverage, we identify some candidate immune correlates (frequency of cell subset and concentration of cytokine) with B cell response as measured by quantity of IIV-specific IgA antibody-secreting cells and quantity of IIV-specific IgG antibody-secreting cells. PMID:27196789
Fang, Chunying; Li, Haifeng; Ma, Lin; Zhang, Mancai
2017-01-01
Pathological speech usually refers to speech distortion resulting from illness or other biological insults. The assessment of pathological speech plays an important role in assisting experts, while automatic evaluation of speech intelligibility is difficult because the speech is usually nonstationary and mutational. In this paper, we develop novel feature extraction and reduction methods and describe a multigranularity combined feature scheme which is optimized by a hierarchical visual method. A novel method of generating the feature set based on the S-transform and chaotic analysis is proposed. The set comprises 430 basic acoustic features (BAFS), 84 Mel S-transform cepstrum coefficients (MSCC) capturing local spectral characteristics, and 12 chaotic features. Finally, radar charts and the F-score are used to optimize the features by hierarchical visual fusion. The feature set could be optimized from 526 to 96 dimensions on the NKI-CCRT corpus and to 104 dimensions on the SVD corpus. The experimental results show that the new features, classified with a support vector machine (SVM), achieve the best performance, with a recognition rate of 84.4% on the NKI-CCRT corpus and 78.7% on the SVD corpus. The proposed method thus proves effective and reliable for pathological speech intelligibility evaluation.
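A minimal sketch of the F-score ranking stage, with synthetic two-class data standing in for the speech corpora: for each feature, between-class separation is contrasted with within-class spread, and the top-ranked dimensions are retained (mirroring the 526-to-96 reduction).

```python
# Fisher-style F-score feature ranking on synthetic two-class data.
import numpy as np

def f_score(X, y):
    pos, neg = X[y == 1], X[y == 0]
    num = (pos.mean(0) - X.mean(0)) ** 2 + (neg.mean(0) - X.mean(0)) ** 2
    den = pos.var(0, ddof=1) + neg.var(0, ddof=1)
    return num / (den + 1e-12)

rng = np.random.default_rng(4)
X = rng.normal(size=(80, 526))          # 526-dimensional combined feature set
y = rng.integers(0, 2, size=80)         # intelligible vs. not (synthetic)
ranking = np.argsort(f_score(X, y))[::-1]
X_reduced = X[:, ranking[:96]]          # keep the 96 top-ranked dimensions
```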
NASA Astrophysics Data System (ADS)
Wan, Xiaoqing; Zhao, Chunhui; Wang, Yanchun; Liu, Wu
2017-11-01
This paper proposes a novel classification paradigm for hyperspectral images (HSI) using feature-level fusion and deep learning-based methodologies. Operation is carried out in three main steps. First, during a pre-processing stage, wave atoms are introduced into a bilateral filter to smooth the HSI; this strategy can effectively attenuate noise and restore texture information. Meanwhile, high quality spectral-spatial features can be extracted from the HSI by taking geometric closeness and photometric similarity among pixels into consideration simultaneously. Second, higher order statistics techniques are introduced into hyperspectral data classification for the first time to characterize the phase correlations of spectral curves. Third, multifractal spectrum features are extracted to characterize the singularities and self-similarities of spectra shapes. To this end, feature-level fusion is applied to the extracted spectral-spatial features along with the higher order statistics and multifractal spectrum features. Finally, a stacked sparse autoencoder is utilized to learn more abstract and invariant high-level features from the multiple feature sets, and then a random forest classifier is employed to perform supervised fine-tuning and classification. Experimental results on two real hyperspectral data sets demonstrate that the proposed method outperforms some traditional alternatives.
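The feature-level fusion step reduces to a per-pixel concatenation of the three feature sets. The sketch below, on synthetic data, fuses them and trains a random forest directly, omitting the stacked sparse autoencoder stage.

```python
# Sketch of feature-level fusion: spectral-spatial, higher-order
# statistics, and multifractal feature sets concatenated per pixel,
# then classified with a random forest. All data are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(5)
n = 1000
spectral_spatial = rng.normal(size=(n, 40))
higher_order     = rng.normal(size=(n, 12))
multifractal     = rng.normal(size=(n, 8))
labels = rng.integers(0, 9, size=n)              # hypothetical land-cover classes

fused = np.hstack([spectral_spatial, higher_order, multifractal])
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(fused, labels)
```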
Obermeier, S.F.
1996-01-01
Liquefaction features can be used in many field settings to estimate the recurrence interval and magnitude of strong earthquakes through much of the Holocene. These features include dikes, craters, vented sand, sills, and laterally spreading landslides. The relatively high seismic shaking level required for their formation makes them particularly valuable as records of strong paleo-earthquakes. This state-of-the-art summary for using liquefaction-induced features for paleoseismic interpretation and analysis takes into account both geological and geotechnical engineering perspectives. The driving mechanism for formation of the features is primarily the increased pore-water pressure associated with liquefaction of sand-rich sediment. The role of this mechanism is often supplemented greatly by the direct action of seismic shaking at the ground surface, which strains and breaks the clay-rich cap that lies immediately above the sediment that liquefied. Discussed in the text are the processes involved in formation of the features, as well as their morphology and characteristics in field settings. Whether liquefaction occurs is controlled mainly by sediment grain size, sediment packing, depth to the water table, and strength and duration of seismic shaking. Formation of recognizable features in the field generally requires a low-permeability cap above the sediment that liquefied. Field manifestations are controlled largely by the severity of liquefaction and the thickness and properties of the low-permeability cap. Criteria are presented for determining whether observed sediment deformation in the field originated by seismically induced liquefaction. These criteria have been developed mainly by observing historic effects of liquefaction in varied field settings. The most important criterion is that a seismic liquefaction origin requires widespread, regional development of features around a core area where the effects are most severe. In addition, the features must have a morphology that is consistent with a very sudden application of a large hydraulic force. This article discusses case studies in widely separated and different geological settings: coastal South Carolina, the New Madrid seismic zone, the Wabash Valley seismic zone, and coastal Washington State. These studies encompass most of the range of settings and the types of liquefaction-induced features likely to be encountered anywhere. The case studies describe the observed features and the logic for assigning a seismic liquefaction origin to them. Also discussed are some types of sediment deformations that can be misinterpreted as having a seismic origin. Two independent methods for estimating prehistoric magnitude are discussed briefly. One method is based on determination of the maximum distance from the epicenter over which liquefaction-induced effects have formed. The other method is based on use of geotechnical engineering techniques at sites of marginal liquefaction, in order to bracket the peak accelerations as a function of epicentral distance; these accelerations can then be compared with predictions from seismological models.
2012-01-01
Background Through next-generation sequencing, the amount of sequence data potentially available for phylogenetic analyses has increased exponentially in recent years. Simultaneously, the risk of incorporating ‘noisy’ data with misleading phylogenetic signal has also increased, and may disproportionately influence the topology of weakly supported nodes and lineages featuring rapid radiations and/or elevated rates of evolution. Results We investigated the influence of phylogenetic noise in large data sets by applying two fundamental strategies, variable site removal and long-branch exclusion, to the phylogenetic analysis of a full plastome alignment of 107 species of Pinus and six Pinaceae outgroups. While high overall phylogenetic resolution resulted from inclusion of all data, three historically recalcitrant nodes remained conflicted with previous analyses. Close investigation of these nodes revealed dramatically different responses to data removal. Whereas topological resolution and bootstrap support for two clades peaked with removal of highly variable sites, the third clade resolved most strongly when all sites were included. Similar trends were observed using long-branch exclusion, but patterns were neither as strong nor as clear. When compared to previous phylogenetic analyses of nuclear loci and morphological data, the most highly supported topologies seen in Pinus plastome analysis are congruent for the two clades gaining support from variable site removal and long-branch exclusion, but in conflict for the clade with highest support from the full data set. Conclusions These results suggest that removal of misleading signal in phylogenomic datasets can result not only in increased resolution for poorly supported nodes, but may serve as a tool for identifying erroneous yet highly supported topologies. For Pinus chloroplast genomes, removal of variable sites appears to be more effective than long-branch exclusion for clarifying phylogenetic hypotheses. PMID:22731878
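A minimal sketch of variable-site removal on a toy character matrix: sites are ranked by variability and the most variable fraction is dropped before tree inference. The 10% cutoff is an arbitrary illustration, not the paper's threshold.

```python
# Rank alignment sites by variability and drop the most variable fraction.
# The toy matrix stands in for the 113-taxon plastome alignment.
import numpy as np

rng = np.random.default_rng(6)
alignment = rng.choice(list("ACGT"), size=(113, 5000))   # taxa x sites

def site_variability(col):
    _, counts = np.unique(col, return_counts=True)
    return 1.0 - counts.max() / counts.sum()             # 0 = invariant site

var = np.apply_along_axis(site_variability, 0, alignment)
keep = var <= np.quantile(var, 0.90)                     # drop top 10% most variable
filtered = alignment[:, keep]
print(alignment.shape, "->", filtered.shape)
```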
Scanning tunneling microscopy studies of diamond films and optoelectronic materials
NASA Technical Reports Server (NTRS)
Perez, Jose M.
1993-01-01
This report describes progress achieved from 12/1/92 to 10/1/93 under the grant entitled 'Scanning Tunneling Microscopy Studies of Diamond Films and Optoelectronic Materials'. We have set up a chemical vapor deposition (CVD) diamond film growth system and a Raman spectroscopy system to study the nucleation and growth of diamond films with atomic resolution using scanning tunneling microscopy (STM). A unique feature of the diamond film growth system is that diamond films can be transferred directly to the ultrahigh vacuum (UHV) chamber of a scanning tunneling microscope without contaminating the films by exposure to air. The University of North Texas (UNT) provided $20,000 this year as matching funds for the NASA grant to purchase the diamond growth system. In addition, UNT provided a Coherent Innova 90S Argon ion laser, a Spex 1404 double spectrometer, and a Newport optical table costing $90,000 to set up the Raman spectroscopy system. The CVD diamond growth system and Raman spectroscopy system will be used to grow and characterize diamond films with atomic resolution using STM as described in our proposal. One full-time graduate student and one full-time undergraduate student are supported under this grant. In addition, several graduate and undergraduate students were supported during the summer to assist in setting up the diamond growth and Raman spectroscopy systems. We have obtained research results concerning STM of the structural and electronic properties of CVD grown diamond films, and STM and scanning tunneling spectroscopy of carbon nanotubes. In collaboration with the transmission electron microscopy (TEM) group at UNT, we have also obtained results concerning the optoelectronic material siloxene. These results were published in refereed scientific journals, submitted for publication, and presented as invited and contributed talks at scientific conferences.
NASA Astrophysics Data System (ADS)
Pålsson, Björn A.; Nielsen, Jens C. O.
2015-06-01
A model for simulation of dynamic interaction between a railway vehicle and a turnout (switch and crossing, S&C) is validated versus field measurements. In particular, the implementation and accuracy of viscously damped track models with different complexities are assessed. The validation data come from full-scale field measurements of dynamic track stiffness and wheel-rail contact forces in a demonstrator turnout that was installed as part of the INNOTRACK project with funding from the European Union Sixth Framework Programme. Vertical track stiffness at nominal wheel loads, in the frequency range up to 20 Hz, was measured using a rolling stiffness measurement vehicle (RSMV). Vertical and lateral wheel-rail contact forces were measured by an instrumented wheel set mounted in a freight car featuring Y25 bogies. The measurements were performed for traffic in both the through and diverging routes, and in the facing and trailing moves. The full set of test runs was repeated with different types of rail pad to investigate the influence of rail pad stiffness on track stiffness and contact forces. It is concluded that impact loads on the crossing can be reduced by using more resilient rail pads. To allow for vehicle dynamics simulations at low computational cost, the track models are discretised space-variant mass-spring-damper models that are moving with each wheel set of the vehicle model. Acceptable agreement between simulated and measured vertical contact forces at the crossing can be obtained when the standard GENSYS track model is extended with one ballast/subgrade mass under each rail. This model can be tuned to capture the large phase delay in dynamic track stiffness at low frequencies, as measured by the RSMV, while remaining sufficiently resilient at higher frequencies.
a Method for the Registration of Hemispherical Photographs and Tls Intensity Images
NASA Astrophysics Data System (ADS)
Schmidt, A.; Schilling, A.; Maas, H.-G.
2012-07-01
Terrestrial laser scanners generate dense and accurate 3D point clouds with minimal effort, which represent the geometry of real objects, while image data contains texture information of object surfaces. Based on the complementary characteristics of both data sets, a combination is very appealing for many applications, including forest-related tasks. In the scope of our research project, independent data sets of a plain birch stand have been taken by a full-spherical laser scanner and a hemispherical digital camera. Previously, both kinds of data sets have been considered separately: Individual trees were successfully extracted from large 3D point clouds, and so-called forest inventory parameters could be determined. Additionally, a simplified tree topology representation was retrieved. From hemispherical images, leaf area index (LAI) values, as a very relevant parameter for describing a stand, have been computed. The objective of our approach is to merge a 3D point cloud with image data in a way that RGB values are assigned to each 3D point. So far, segmentation and classification of TLS point clouds in forestry applications was mainly based on geometrical aspects of the data set. However, a 3D point cloud with colour information provides valuable cues exceeding simple statistical evaluation of geometrical object features and thus may facilitate the analysis of the scan data significantly.
Assessing the accuracy and stability of variable selection ...
Random forest (RF) modeling has emerged as an important statistical learning method in ecology due to its exceptional predictive performance. However, for large and complex ecological datasets there is limited guidance on variable selection methods for RF modeling. Typically, either a preselected set of predictor variables is used, or stepwise procedures are employed which iteratively add/remove variables according to their importance measures. This paper investigates the application of variable selection methods to RF models for predicting probable biological stream condition. Our motivating dataset consists of the good/poor condition of n=1365 stream survey sites from the 2008/2009 National Rivers and Stream Assessment, and a large set (p=212) of landscape features from the StreamCat dataset. Two types of RF models are compared: a full variable set model with all 212 predictors, and a reduced variable set model selected using a backwards elimination approach. We assess model accuracy using RF's internal out-of-bag estimate, and a cross-validation procedure with validation folds external to the variable selection process. We also assess the stability of the spatial predictions generated by the RF models to changes in the number of predictors, and argue that model selection needs to consider both accuracy and stability. The results suggest that RF modeling is robust to the inclusion of many variables of moderate to low importance. We found no substantial…
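A sketch of backwards elimination guided by the out-of-bag score, with synthetic data sized like the motivating dataset (n=1365, p=212); the 10% drop fraction and the stopping tolerance are hypothetical choices.

```python
# Backwards elimination for a random forest using the out-of-bag (OOB)
# score: repeatedly drop the least important ~10% of predictors while
# OOB accuracy does not degrade. Data are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(7)
X = rng.normal(size=(1365, 212))           # sites x StreamCat-like predictors
y = rng.integers(0, 2, size=1365)          # good/poor stream condition
cols = np.arange(X.shape[1])

while len(cols) > 10:
    rf = RandomForestClassifier(n_estimators=100, oob_score=True,
                                random_state=0).fit(X[:, cols], y)
    order = np.argsort(rf.feature_importances_)        # ascending importance
    candidate = cols[order[max(1, len(cols) // 10):]]  # drop the weakest ~10%
    rf2 = RandomForestClassifier(n_estimators=100, oob_score=True,
                                 random_state=0).fit(X[:, candidate], y)
    if rf2.oob_score_ < rf.oob_score_ - 0.005:         # stop if OOB accuracy degrades
        break
    cols = candidate
print(len(cols), "predictors retained")
```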
DOE Office of Scientific and Technical Information (OSTI.GOV)
Winkler, Mirko S., E-mail: mirko.winkler@unibas.ch; University of Basel, P.O. Box, CH-4003 Basel; Divall, Mark J., E-mail: mdivall@shapeconsulting.org
2012-02-15
The quantitative assessment of health impacts has been identified as a crucial feature for realising the full potential of health impact assessment (HIA). In settings where demographic and health data are notoriously scarce, but there is a broad range of ascertainable ecological, environmental, epidemiological and socioeconomic information, a diverse toolkit of data collection strategies becomes relevant for the mainly small-area impacts of interest. We present a modular, cross-sectional baseline health survey study design, which has been developed for HIA of industrial development projects in the humid tropics. The modular nature of our toolkit allows our methodology to be readily adapted to the prevailing eco-epidemiological characteristics of a given project setting. Central to our design is a broad set of key performance indicators, covering a multiplicity of health outcomes and determinants at different levels and scales. We present experience and key findings from our modular baseline health survey methodology employed in 14 selected sentinel sites within an iron ore mining project in the Republic of Guinea. We argue that our methodology is a generic example of rapid evidence assembly in difficult-to-reach localities, where improvement of the predictive validity of the assessment and establishment of a benchmark for longitudinal monitoring of project impacts and mitigation efforts is needed.
Unsupervised universal steganalyzer for high-dimensional steganalytic features
NASA Astrophysics Data System (ADS)
Hou, Xiaodan; Zhang, Tao
2016-11-01
The research in developing steganalytic features has been highly successful. These features are extremely powerful when applied to supervised binary classification problems. However, they are incompatible with unsupervised universal steganalysis because the unsupervised method cannot distinguish embedding distortion from varying levels of noises caused by cover variation. This study attempts to alleviate the problem by introducing similarity retrieval of image statistical properties (SRISP), with the specific aim of mitigating the effect of cover variation on the existing steganalytic features. First, cover images with some statistical properties similar to those of a given test image are searched from a retrieval cover database to establish an aided sample set. Then, unsupervised outlier detection is performed on a test set composed of the given test image and its aided sample set to determine the type (cover or stego) of the given test image. Our proposed framework, called SRISP-aided unsupervised outlier detection, requires no training. Thus, it does not suffer from model mismatch. Compared with prior unsupervised outlier detectors that do not consider SRISP, the proposed framework not only retains the universality but also exhibits superior performance when applied to high-dimensional steganalytic features.
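A minimal sketch of the SRISP-aided scheme on synthetic features: nearest covers by statistical-property distance form the aided sample set, and a local outlier factor flags the test image. The real system's retrieval statistics and steganalytic features are far richer.

```python
# Sketch of SRISP-aided unsupervised outlier detection: retrieve similar
# covers, then flag the test image as stego if it is an outlier within
# the aided sample set. Feature vectors are synthetic stand-ins.
import numpy as np
from sklearn.neighbors import NearestNeighbors, LocalOutlierFactor

rng = np.random.default_rng(8)
cover_db = rng.normal(size=(5000, 300))          # retrieval cover database
test_img = rng.normal(loc=0.4, size=300)         # suspected stego image

# 1) similarity retrieval: nearest covers by statistical-property distance
idx = NearestNeighbors(n_neighbors=200).fit(cover_db) \
        .kneighbors(test_img.reshape(1, -1), return_distance=False)[0]
aided_set = cover_db[idx]

# 2) unsupervised outlier detection on aided set + test image
X = np.vstack([aided_set, test_img])
flags = LocalOutlierFactor(n_neighbors=20).fit_predict(X)   # -1 = outlier
print("stego" if flags[-1] == -1 else "cover")
```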
Electrophysiological evidence for parallel and serial processing during visual search.
Luck, S J; Hillyard, S A
1990-12-01
Event-related potentials were recorded from young adults during a visual search task in order to evaluate parallel and serial models of visual processing in the context of Treisman's feature integration theory. Parallel and serial search strategies were produced by the use of feature-present and feature-absent targets, respectively. In the feature-absent condition, the slopes of the functions relating reaction time and latency of the P3 component to set size were essentially identical, indicating that the longer reaction times observed for larger set sizes can be accounted for solely by changes in stimulus identification and classification time, rather than changes in post-perceptual processing stages. In addition, the amplitude of the P3 wave on target-present trials in this condition increased with set size and was greater when the preceding trial contained a target, whereas P3 activity was minimal on target-absent trials. These effects are consistent with the serial self-terminating search model and appear to contradict parallel processing accounts of attention-demanding visual search performance, at least for a subset of search paradigms. Differences in ERP scalp distributions further suggested that different physiological processes are utilized for the detection of feature presence and absence.
A unified framework for image retrieval using keyword and visual features.
Jing, Feng; Li, Mingling; Zhang, Hong-Jiang; Zhang, Bo
2005-07-01
In this paper, a unified image retrieval framework based on both keyword annotations and visual features is proposed. In this framework, a set of statistical models are built based on visual features of a small set of manually labeled images to represent semantic concepts and used to propagate keywords to other unlabeled images. These models are updated periodically when more images implicitly labeled by users become available through relevance feedback. In this sense, the keyword models serve the function of accumulation and memorization of knowledge learned from user-provided relevance feedback. Furthermore, two sets of effective and efficient similarity measures and relevance feedback schemes are proposed for query by keyword scenario and query by image example scenario, respectively. Keyword models are combined with visual features in these schemes. In particular, a new, entropy-based active learning strategy is introduced to improve the efficiency of relevance feedback for query by keyword. Furthermore, a new algorithm is proposed to estimate the keyword features of the search concept for query by image example. It is shown to be more appropriate than two existing relevance feedback algorithms. Experimental results demonstrate the effectiveness of the proposed framework.
Zhou, Jingyu; Tian, Shulin; Yang, Chenglin
2014-01-01
Few studies have paid attention to prognostics for analog circuits. The few existing methods lack correlation with circuit analysis when extracting and calculating features, so the fault indicator (FI) calculation often lacks a rational basis, which affects prognostic performance. To solve this problem, this paper proposes a novel prediction method for single components of analog circuits based on complex field modeling. Given that faults of single components are the most numerous in analog circuits, the method starts with circuit structure, analyzes the transfer function of the circuit, and implements complex field modeling. Then, through an established parameter scanning model related to the complex field, it analyzes the relationship between parameter variation and degeneration of single components in the model in order to obtain a more reasonable FI feature set via calculation. From the obtained FI feature set, it establishes a novel model of the degeneration trend of analog circuits' single components. Finally, it uses a particle filter (PF) to update the model parameters and predicts the remaining useful performance (RUP) of analog circuits' single components. Since the calculation of the FI feature set is more reasonable, prediction accuracy is improved to some extent. The foregoing conclusions are verified by experiments.
System Complexity Reduction via Feature Selection
ERIC Educational Resources Information Center
Deng, Houtao
2011-01-01
This dissertation transforms a set of system complexity reduction problems to feature selection problems. Three systems are considered: classification based on association rules, network structure learning, and time series classification. Furthermore, two variable importance measures are proposed to reduce the feature selection bias in tree…
Extraction of Features from High-resolution 3D LiDaR Point-cloud Data
NASA Astrophysics Data System (ADS)
Keller, P.; Kreylos, O.; Hamann, B.; Kellogg, L. H.; Cowgill, E. S.; Yikilmaz, M. B.; Hering-Bertram, M.; Hagen, H.
2008-12-01
Airborne and tripod-based LiDaR scans are capable of producing new insight into geologic features by providing high-quality 3D measurements of the landscape. High-resolution LiDaR is a promising method for studying slip on faults, erosion, and other landscape-altering processes. LiDaR scans can produce up to several billion individual point returns associated with the reflection of a laser from natural and engineered surfaces; these point clouds are typically used to derive a high-resolution digital elevation model (DEM). Currently, only a few methods exist that can support the analysis of the data at full resolution and in the natural 3D perspective in which it was collected by working directly with the points. We are developing new algorithms for extracting features from LiDaR scans, and present a method for determining the local curvature of a LiDaR data set, working directly with the individual point returns of a scan. Computing the curvature enables us to rapidly and automatically identify key features such as ridge-lines, stream beds, and edges of terraces. We fit polynomial surface patches via a moving least squares (MLS) approach to local point neighborhoods, determining curvature values for each point. The size of the local point neighborhood is defined by a user. Since both terrestrial and airborne LiDaR scans suffer from high noise, we apply additional pre- and post-processing smoothing steps to eliminate unwanted features. LiDaR data also captures objects like buildings and trees, greatly complicating the task of extracting reliable curvature values. Hence, we use a stochastic approach to determine whether a point can be reliably used to estimate curvature. Additionally, we have developed a graph-based approach to establish connectivities among points that correspond to regions of high curvature. The result is an explicit description of ridge-lines, for example. We have applied our method to the raw point cloud data collected as part of the GeoEarthScope B-4 project on a section of the San Andreas Fault (Segment SA09). This section provides an excellent test site for our method as it exposes the fault clearly, contains few extraneous structures, and exhibits multiple dry stream-beds that have been off-set by motion on the fault.
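The curvature computation can be sketched as a local quadratic fit: for each point's neighborhood, fit z = ax² + bxy + cy² + dx + ey + f by least squares and evaluate the mean curvature of the fitted graph at the point. This is a simplified stand-in for the MLS polynomial-patch fitting described above; the point cloud, neighborhood size, and the xy k-d tree are illustrative assumptions.

```python
# Per-point curvature from a local least-squares quadratic fit.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(9)
pts = rng.normal(size=(2000, 3))                 # synthetic point cloud
tree = cKDTree(pts[:, :2])                       # neighborhoods in the xy plane

def curvature(i, k=30):
    _, idx = tree.query(pts[i, :2], k=k)
    nb = pts[idx] - pts[i]                       # center on the query point
    x, y, z = nb[:, 0], nb[:, 1], nb[:, 2]
    A = np.column_stack([x*x, x*y, y*y, x, y, np.ones_like(x)])
    a, b, c, d, e, f = np.linalg.lstsq(A, z, rcond=None)[0]
    # mean curvature of the graph z(x, y) at the origin:
    # H = ((1+z_y^2) z_xx - 2 z_x z_y z_xy + (1+z_x^2) z_yy)
    #     / (2 (1 + z_x^2 + z_y^2)^(3/2))
    return ((1 + e*e) * 2*a - 2*d*e*b + (1 + d*d) * 2*c) / \
           (2 * (1 + d*d + e*e) ** 1.5)

print(curvature(0))
```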
Classifying transcription factor targets and discovering relevant biological features
Holloway, Dustin T; Kon, Mark; DeLisi, Charles
2008-01-01
Background An important goal in post-genomic research is discovering the network of interactions between transcription factors (TFs) and the genes they regulate. We have previously reported the development of a supervised-learning approach to TF target identification, and used it to predict targets of 104 transcription factors in yeast. We now include a new sequence conservation measure, expand our predictions to include 59 new TFs, introduce a web-server, and implement an improved ranking method to reveal the biological features contributing to regulation. The classifiers combine 8 genomic datasets covering a broad range of measurements including sequence conservation, sequence overrepresentation, gene expression, and DNA structural properties. Principal Findings (1) Application of the method yields an amplification of information about yeast regulators. The ratio of total targets to previously known targets is greater than 2 for 11 TFs, with several having larger gains: Ash1(4), Ino2(2.6), Yaf1(2.4), and Yap6(2.4). (2) Many predicted targets for TFs match well with the known biology of their regulators. As a case study we discuss the regulator Swi6, presenting evidence that it may be important in the DNA damage response, and that the previously uncharacterized gene YMR279C plays a role in DNA damage response and perhaps in cell-cycle progression. (3) A procedure based on recursive-feature-elimination is able to uncover from the large initial data sets those features that best distinguish targets for any TF, providing clues relevant to its biology. An analysis of Swi6 suggests a possible role in lipid metabolism, and more specifically in metabolism of ceramide, a bioactive lipid currently being investigated for anti-cancer properties. (4) An analysis of global network properties highlights the transcriptional network hubs; the factors which control the most genes and the genes which are bound by the largest set of regulators. Cell-cycle and growth related regulators dominate the former; genes involved in carbon metabolism and energy generation dominate the latter. Conclusion Postprocessing of regulatory-classifier results can provide high quality predictions, and feature ranking strategies can deliver insight into the regulatory functions of TFs. Predictions are available at an online web-server, including the full transcriptional network, which can be analyzed using VisAnt network analysis suite. Reviewers This article was reviewed by Igor Jouline, Todd Mockler(nominated by Valerian Dolja), and Sandor Pongor. PMID:18513408
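A minimal sketch of the recursive-feature-elimination idea using scikit-learn's RFE with a linear SVM, on synthetic gene-by-feature data; the paper's actual classifier, datasets, and ranking procedure differ in detail.

```python
# Recursive feature elimination over heterogeneous genomic features for
# one hypothetical transcription factor: a linear SVM is retrained while
# the weakest features are pruned; survivors hint at regulatory biology.
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.svm import LinearSVC

rng = np.random.default_rng(10)
X = rng.normal(size=(400, 60))        # genes x features (conservation, expression, ...)
y = rng.integers(0, 2, size=400)      # 1 = known target of the TF

rfe = RFE(LinearSVC(dual=False), n_features_to_select=10, step=5).fit(X, y)
print(np.where(rfe.support_)[0])      # indices of the most discriminative features
```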
Hadrava, Jiří; Albrecht, Tomáš; Tryjanowski, Piotr
2018-01-01
Birds sitting or feeding on live large African herbivorous mammals are a visible, yet quite neglected, type of commensalistic–mutualistic association. Here, we investigate general patterns in such relationships at large spatial and taxonomic scales. To obtain large-scale data, an extensive internet-based search for photos was carried out on Google Images. To characterize patterns of the structural organization of commensalistic–mutualistic associations between African birds and herbivorous mammals, we used a network analysis approach. We then employed phylogenetically-informed comparative analysis to explore whether features of bird visitation of mammals, i.e., their mean number, mass and species richness per mammal species, are shaped by a combination of host mammal (body mass and herd size) and environmental (habitat openness) characteristics. We found that the association web structure was only weakly nested for commensalistic as well as for mutualistic birds (oxpeckers Buphagus spp.) and African mammals. Moreover, except for oxpeckers, nestedness did not differ significantly from a null model indicating that birds do not prefer mammal species which are visited by a large number of bird species. In oxpeckers, however, a nested structure suggests a non-random assignment of birds to their mammal hosts. We also identified some new or rare associations between birds and mammals, but we failed to find several previously described associations. Furthermore, we found that mammal body mass positively influenced the number and mass of birds observed sitting on them in the full set of species (i.e., taking oxpeckers together with other bird species). We also found a positive correlation between mammal body mass and mass of non-oxpecker species as well as oxpeckers. Mammal herd size was associated with a higher mass of birds in the full set of species as well as in non-oxpecker species, and mammal species living in larger herds also attracted more bird species in the full set of species. Habitat openness influenced the mass of birds sitting on mammals as well as the number of species recorded sitting on mammals in the full set of species. In non-oxpecker species habitat openness was correlated with the bird number, mass and species richness. Our results provide evidence that patterns of bird–mammal associations can be linked to mammal and environmental characteristics and highlight the potential role of information technologies and new media in further studies of ecology and evolution. However, further study is needed to get a proper insight into the biological and methodological processes underlying the observed patterns. PMID:29576981
ERIC Educational Resources Information Center
Soares, S. N.; Wagner, F. R.
2011-01-01
Teaching and Design Workbench (T&D-Bench) is a framework aimed at education and research in the areas of computer architecture and embedded systems. It includes a set of features not found in other educational environments. This set of features is the result of an original combination of design requirements for T&D-Bench: that the…
Toward real-time performance benchmarks for Ada
NASA Technical Reports Server (NTRS)
Clapp, Russell M.; Duchesneau, Louis; Volz, Richard A.; Mudge, Trevor N.; Schultze, Timothy
1986-01-01
The issue of real-time performance measurements for the Ada programming language through the use of benchmarks is addressed. First, the Ada notion of time is examined and a set of basic measurement techniques is developed. Then a set of Ada language features believed to be important for real-time performance is presented and specific measurement methods discussed. In addition, other important time-related features which are not explicitly part of the language but are part of the run-time system are also identified and measurement techniques developed. The measurement techniques are applied to the language and run-time system features and the results are presented.
Carney, Patricia A; Waller, Elaine; Dexter, Eve; Marino, Miguel; Rosener, Stephanie E; Green, Larry A; Jones, Geoffrey; M Keister, J Drew; Dostal, Julie A; Jones, Samuel M; Eiff, M Patrice
2016-11-01
Primary care residencies are undergoing dramatic changes because of changing health care systems and evolving demands for updated training models. We examined the relationships between residents' exposures to patient-centered medical home (PCMH) features in their assigned continuity clinics and their satisfaction with training. Longitudinal surveys evaluating satisfaction with training on a 5-point Likert-type scale (1=very unsatisfied to 5=very satisfied) were collected annually from residents from 2007 through 2011, and the presence or absence of PCMH features was recorded for 24 continuity clinics during the same time period. Odds ratios for residents' overall satisfaction were compared according to whether they had no exposure to PCMH features, some exposure (1-2 years), or full exposure (all 3 or more years). Fourteen programs and 690 unique residents provided data to this study. Resident satisfaction with training was highest with full exposure to integrated case management compared to no exposure, which occurred in 2010 (OR=2.85, 95% CI=1.40, 5.80). Resident satisfaction was consistently statistically lower with any or full exposure (versus none) to expanded clinic hours in 2007 and 2009 (e.g., OR for some exposure in 2009 was 0.31, 95% CI=0.19, 0.51, and OR for full exposure was 0.28, 95% CI=0.16, 0.49). Resident satisfaction with many electronic health record (EHR)-based features tended to be significantly lower with any exposure (some or full) versus no exposure over the study period. For example, the odds ratio for resident satisfaction was significantly lower with any exposure to electronic health records in continuity practice in 2008, 2009, and 2010 (OR for some exposure in 2008 was 0.36, 95% CI=0.19, 0.70, with comparable results in 2009 and 2010). Resident satisfaction with training was inconsistently correlated with exposure to features of PCMH. No correlation between PCMH exposure and resident satisfaction was sustained over time.
Hyperspectral data discrimination methods
NASA Astrophysics Data System (ADS)
Casasent, David P.; Chen, Xuewen
2000-12-01
Hyperspectral data provides spectral response information that provides detailed chemical, moisture, and other description of constituent parts of an item. These new sensor data are useful in USDA product inspection. However, such data introduce problems such as the curse of dimensionality, the need to reduce the number of features used to accommodate realistic small training set sizes, and the need to employ discriminatory features and still achieve good generalization (comparable training and test set performance). Several two-step methods are compared to a new and preferable single-step spectral decomposition algorithm. Initial results on hyperspectral data for good/bad almonds and for good/bad (aflatoxin infested) corn kernels are presented. The hyperspectral application addressed differs greatly from prior USDA work (PLS) in which the level of a specific channel constituent in food was estimated. A validation set (separate from the test set) is used in selecting algorithm parameters. Threshold parameters are varied to select the best Pc operating point. Initial results show that nonlinear features yield improved performance.
Guiding Students through Expository Text with Text Feature Walks
ERIC Educational Resources Information Center
Kelley, Michelle J.; Clausen-Grace, Nicki
2010-01-01
The Text Feature Walk is a structure created and employed by the authors that guides students in the reading of text features in order to access prior knowledge, make connections, and set a purpose for reading expository text. Results from a pilot study are described in order to illustrate the benefits of using the Text Feature Walk over…
THE IMPACT OF POINT-SOURCE SUBTRACTION RESIDUALS ON 21 cm EPOCH OF REIONIZATION ESTIMATION
DOE Office of Scientific and Technical Information (OSTI.GOV)
Trott, Cathryn M.; Wayth, Randall B.; Tingay, Steven J., E-mail: cathryn.trott@curtin.edu.au
Precise subtraction of foreground sources is crucial for detecting and estimating 21 cm H I signals from the Epoch of Reionization (EoR). We quantify how imperfect point-source subtraction due to limitations of the measurement data set yields structured residual signal in the data set. We use the Cramer-Rao lower bound, as a metric for quantifying the precision with which a parameter may be measured, to estimate the residual signal in a visibility data set due to imperfect point-source subtraction. We then propagate these residuals into two metrics of interest for 21 cm EoR experiments-the angular power spectrum and two-dimensional power spectrum-using a combination of full analytic covariant derivation, analytic variant derivation, and covariant Monte Carlo simulations. This methodology differs from previous work in two ways: (1) it uses information theory to set the point-source position error, rather than assuming a global rms error, and (2) it describes a method for propagating the errors analytically, thereby obtaining the full correlation structure of the power spectra. The methods are applied to two upcoming low-frequency instruments that are proposing to perform statistical EoR experiments: the Murchison Widefield Array and the Precision Array for Probing the Epoch of Reionization. In addition to the actual antenna configurations, we apply the methods to minimally redundant and maximally redundant configurations. We find that for peeling sources above 1 Jy, the amplitude of the residual signal, and its variance, will be smaller than the contribution from thermal noise for the observing parameters proposed for upcoming EoR experiments, and that optimal subtraction of bright point sources will not be a limiting factor for EoR parameter estimation. We then use the formalism to provide an ab initio analytic derivation motivating the 'wedge' feature in the two-dimensional power spectrum, complementing previous discussion in the literature.
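For reference, the bound used as the precision metric takes the standard form below, with V the visibility data, L the likelihood, and F the Fisher information matrix; this is the textbook statement, not the paper's instrument-specific derivation.

```latex
% Cramer-Rao lower bound for an unbiased estimator of source parameters
% \theta, in its standard form:
\operatorname{var}(\hat{\theta}_i) \;\ge\; \left[ F(\theta)^{-1} \right]_{ii},
\qquad
F_{jk}(\theta) \;=\; \mathbb{E}\!\left[
  \frac{\partial \ln \mathcal{L}(\mathbf{V};\theta)}{\partial \theta_j}\,
  \frac{\partial \ln \mathcal{L}(\mathbf{V};\theta)}{\partial \theta_k}
\right].
```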
Obermeier, S.F.; Jacobson, R.B.; Smoot, J.P.; Weems, R.E.; Gohn, G.S.; Monroe, J.E.; Powars, D.S.
1990-01-01
Many types of liquefaction-related features (sand blows, fissures, lateral spreads, dikes, and sills) have been induced by earthquakes in coastal South Carolina and in the New Madrid seismic zone in the Central United States. In addition, abundant features of unknown and nonseismic origin are present. Geologic criteria for interpreting an earthquake origin in these areas are illustrated in practical applications; these criteria can be used to determine the origin of liquefaction features in many other geographic and geologic settings. In both coastal South Carolina and the New Madrid seismic zone, the earthquake-induced liquefaction features generally originated in clean sand deposits that contain no or few intercalated silt or clay-rich strata. The local geologic setting is a major influence on both development and surface expression of sand blows. Major factors controlling sand-blow formation include the thickness and physical properties of the deposits above the source sands, and these relationships are illustrated by comparing sand blows found in coastal South Carolina (in marine deposits) with sand blows found in the New Madrid seismic zone (in fluvial deposits). In coastal South Carolina, the surface stratum is typically a thin (about 1 m) soil that is weakly cemented with humate, and the sand blows are expressed as craters surrounded by a thin sheet of sand; in the New Madrid seismic zone the surface stratum generally is a clay-rich deposit ranging in thickness from 2 to 10 m, in which case sand blows characteristically are expressed as sand mounded above the original ground surface. Recognition of the various features described in this paper, and identification of the most probable origin for each, provides a set of important tools for understanding paleoseismicity in areas such as the Central and Eastern United States where faults are not exposed for study and strong seismic activity is infrequent.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rios Velazquez, E; Parmar, C; Narayan, V
Purpose: To compare the complementary value of quantitative radiomic features to that of radiologist-annotated semantic features in predicting EGFR mutations in lung adenocarcinomas. Methods: Pre-operative CT images of 258 lung adenocarcinoma patients were available. Tumors were segmented using the single-click ensemble segmentation algorithm. A set of radiomic features was extracted using 3D-Slicer. Test-retest reproducibility and unsupervised dimensionality reduction were applied to select a subset of reproducible and independent radiomic features. Twenty semantic annotations were scored by an expert radiologist, describing the tumor, surrounding tissue and associated findings. Minimum-redundancy-maximum-relevance (MRMR) was used to identify the most informative radiomic and semantic features in 172 patients (training set, temporal split). Radiomic, semantic and combined radiomic-semantic logistic regression models to predict EGFR mutations were evaluated in an independent validation dataset of 86 patients using the area under the receiver operating curve (AUC). Results: EGFR mutations were found in 77/172 (45%) and 39/86 (45%) of the training and validation sets, respectively. Univariate AUCs showed a similar range for both feature types: radiomics median AUC = 0.57 (range: 0.50 – 0.62); semantic median AUC = 0.53 (range: 0.50 – 0.64, Wilcoxon p = 0.55). After MRMR feature selection, the best-performing radiomic, semantic, and radiomic-semantic logistic regression models for EGFR mutations showed a validation AUC of 0.56 (p = 0.29), 0.63 (p = 0.063) and 0.67 (p = 0.004), respectively. Conclusion: Quantitative volumetric and textural radiomic features complement the qualitative and semi-quantitative radiologist annotations. The prognostic value of informative qualitative semantic features such as cavitation and lobulation is increased with the addition of quantitative textural features from the tumor region.
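A sketch of the modeling chain with mutual-information ranking standing in for MRMR (which additionally penalizes redundancy), logistic regression, and validation AUC; the 172/86 split mirrors the abstract, but the data and the choice of 10 features are synthetic and hypothetical.

```python
# MI-ranked feature selection + logistic regression + validation AUC,
# as a simplified stand-in for the MRMR-based radiomic modeling chain.
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(11)
X_train, X_val = rng.normal(size=(172, 100)), rng.normal(size=(86, 100))
y_train = rng.integers(0, 2, size=172)    # 1 = EGFR mutant (synthetic)
y_val = rng.integers(0, 2, size=86)

mi = mutual_info_classif(X_train, y_train, random_state=0)
sel = np.argsort(mi)[::-1][:10]           # keep 10 top-ranked features

model = LogisticRegression(max_iter=1000).fit(X_train[:, sel], y_train)
auc = roc_auc_score(y_val, model.predict_proba(X_val[:, sel])[:, 1])
print(f"validation AUC = {auc:.2f}")
```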
Computer aided diagnosis based on medical image processing and artificial intelligence methods
NASA Astrophysics Data System (ADS)
Stoitsis, John; Valavanis, Ioannis; Mougiakakou, Stavroula G.; Golemati, Spyretta; Nikita, Alexandra; Nikita, Konstantina S.
2006-12-01
Advances in imaging technology and computer science have greatly enhanced interpretation of medical images, and contributed to early diagnosis. The typical architecture of a Computer Aided Diagnosis (CAD) system includes image pre-processing, definition of region(s) of interest, feature extraction and selection, and classification. In this paper, the principles of CAD systems design and development are demonstrated by means of two examples. The first one focuses on the differentiation between symptomatic and asymptomatic carotid atheromatous plaques. For each plaque, a vector of texture and motion features was estimated, which was then reduced to the most robust ones by means of ANalysis of VAriance (ANOVA). Using fuzzy c-means, the features were then clustered into two classes. Clustering performances of 74%, 79%, and 84% were achieved for texture only, motion only, and combinations of texture and motion features, respectively. The second CAD system presented in this paper supports the diagnosis of focal liver lesions and is able to characterize liver tissue from Computed Tomography (CT) images as normal, hepatic cyst, hemangioma, and hepatocellular carcinoma. Five texture feature sets were extracted for each lesion, while a genetic algorithm based feature selection method was applied to identify the most robust features. The selected feature set was fed into an ensemble of neural network classifiers. The achieved classification performance was 100%, 93.75% and 90.63% in the training, validation and testing set, respectively. It is concluded that computerized analysis of medical images in combination with artificial intelligence can be used in clinical practice and may contribute to more efficient diagnosis.
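The fuzzy c-means step can be sketched in a few lines of numpy. This is the generic textbook formulation with fuzzifier m, not the authors' code, and X is assumed to be the (plaques x features) matrix described above:

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, n_iter=100, seed=0):
    """Minimal fuzzy c-means: returns cluster centers and soft memberships."""
    rng = np.random.default_rng(seed)
    u = rng.random((len(X), c))
    u /= u.sum(axis=1, keepdims=True)          # random initial memberships
    for _ in range(n_iter):
        um = u ** m
        centers = um.T @ X / um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        u = 1.0 / (d ** (2.0 / (m - 1.0)))     # inverse-distance membership update
        u /= u.sum(axis=1, keepdims=True)      # normalize over clusters
    return centers, u
```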
NASA Astrophysics Data System (ADS)
Hussnain, Zille; Oude Elberink, Sander; Vosselman, George
2016-06-01
In mobile laser scanning systems, the platform's position is measured by GNSS and IMU, which is often not reliable in urban areas. Consequently, the derived Mobile Laser Scanning Point Cloud (MLSPC) lacks the expected positioning reliability and accuracy. Many of the current solutions are either semi-automatic or unable to achieve pixel-level accuracy. We propose an automatic feature extraction method which utilizes corresponding aerial images as a reference data set. The proposed method comprises three steps: image feature detection, description, and matching between corresponding patches of nadir aerial and MLSPC ortho images. In the data pre-processing step the MLSPC is patch-wise cropped and converted to ortho images. Furthermore, each aerial image patch covering the area of the corresponding MLSPC patch is also cropped from the aerial image. For feature detection, we implemented an adaptive variant of the Harris operator to automatically detect corner feature points on the vertices of road markings. In the feature description phase, we used the LATCH binary descriptor, which is robust to data from different sensors. For descriptor matching, we developed an outlier filtering technique, which exploits the arrangements of relative Euclidean distances and angles between corresponding sets of feature points. We found that the computed correspondences achieve pixel-level positioning accuracy, where the image resolution is 12 cm. Furthermore, the developed approach is reliable when enough road markings are available in the data sets. We conclude that, in urban areas, the developed approach can reliably extract the features necessary to improve the MLSPC accuracy to pixel level.
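As a rough illustration of the corner-detection step, a Harris response thresholding sketch with OpenCV; the adaptive variant described in the paper is not reproduced here, and the image names are hypothetical:

```python
import cv2
import numpy as np

def harris_corners(gray, thresh_ratio=0.01):
    """Return (row, col) coordinates of Harris responses above a threshold."""
    response = cv2.cornerHarris(np.float32(gray), blockSize=3, ksize=3, k=0.04)
    ys, xs = np.where(response > thresh_ratio * response.max())
    return np.stack([ys, xs], axis=1)

# img_mls and img_aerial are assumed 8-bit grayscale ortho/aerial patches.
# pts_mls = harris_corners(img_mls)
# pts_aerial = harris_corners(img_aerial)
```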
Haque, M Muksitul; Holder, Lawrence B; Skinner, Michael K
2015-01-01
Environmentally induced epigenetic transgenerational inheritance of disease and phenotypic variation involves germline transmitted epimutations. The primary epimutations identified involve altered differential DNA methylation regions (DMRs). Different environmental toxicants have been shown to promote exposure (i.e., toxicant) specific signatures of germline epimutations. Analysis of genomic features associated with these epimutations identified low-density CpG regions (<3 CpG / 100 bp), termed CpG deserts, and a number of unique DNA sequence motifs. The rat genome was annotated for these and additional relevant features. The objective of the current study was to use a machine learning computational approach to predict all potential epimutations in the genome. A number of previously identified sperm epimutations were used as training sets. A novel machine learning approach using a sequential combination of Active Learning and Imbalance Class Learner analysis was developed. The transgenerational sperm epimutation analysis identified approximately 50K individual sites with a 1 kb mean size and 3,233 regions that had a minimum of three adjacent sites with a mean size of 3.5 kb. A select number of the most relevant genomic features were identified, with low-density CpG deserts being a critical genomic feature among those selected. A similar independent analysis with transgenerational somatic cell epimutation training sets identified a smaller number of 1,503 regions of genome-wide predicted sites and differences in genomic feature contributions. The predicted genome-wide germline (sperm) epimutations were found to be distinct from the predicted somatic cell epimutations. Validation of the genome-wide germline predicted sites used two recently identified transgenerational sperm epimutation signature sets from the pesticides dichlorodiphenyltrichloroethane (DDT) and methoxychlor (MXC) exposure lineage F3 generation. Analysis of this positive validation data set showed a 100% prediction accuracy for all the DDT-MXC sperm epimutations. These observations further elucidate the genomic features associated with transgenerational germline epimutations and identify a genome-wide set of potential epimutations that can be used to facilitate identification of epigenetic diagnostics for ancestral environmental exposures and disease susceptibility.
Li, Yanpeng; Hu, Xiaohua; Lin, Hongfei; Yang, Zhihao
2011-01-01
Feature representation is essential to machine learning and text mining. In this paper, we present a feature coupling generalization (FCG) framework for generating new features from unlabeled data. It selects two special types of features, i.e., example-distinguishing features (EDFs) and class-distinguishing features (CDFs), from the original feature set, and then generalizes EDFs into higher-level features based on their coupling degrees with CDFs in unlabeled data. The advantage is that EDFs with extreme sparsity in labeled data can be enriched by their co-occurrences with CDFs in unlabeled data, so that the performance of these low-frequency features can be greatly boosted and new information from unlabeled data can be incorporated. We apply this approach to three tasks in biomedical literature mining: gene named entity recognition (NER), protein-protein interaction extraction (PPIE), and text classification (TC) for gene ontology (GO) annotation. New features are generated from over 20 GB of unlabeled PubMed abstracts. The experimental results on BioCreative 2, the AIMED corpus, and the TREC 2005 Genomics Track show that 1) FCG can make good use of the sparse features ignored by supervised learning; 2) it improves the performance of supervised baselines by 7.8 percent, 5.0 percent, and 5.8 percent, respectively, in the three tasks; and 3) our methods achieve F-scores of 89.1 and 64.5, and a normalized utility of 60.1, on the three benchmark data sets.
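A minimal sketch of the coupling idea, assuming a sparse bag-of-words matrix over the unlabeled corpus: each EDF is generalized into a profile of its co-occurrence (coupling) degrees with the CDFs. The index arrays and the row normalization are illustrative choices, not the paper's exact coupling measure:

```python
import numpy as np
from scipy import sparse

def coupling_features(X_unlabeled, edf_idx, cdf_idx):
    """Generalize sparse EDFs into co-occurrence profiles with CDFs,
    estimated from a large unlabeled corpus."""
    X = sparse.csr_matrix(X_unlabeled)
    cooc = (X[:, edf_idx].T @ X[:, cdf_idx]).toarray()       # EDF-CDF co-occurrence counts
    cooc /= np.maximum(cooc.sum(axis=1, keepdims=True), 1)   # row-normalized coupling degrees
    return cooc  # one |CDF|-dimensional generalized feature vector per EDF
```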
Hadoop neural network for parallel and distributed feature selection.
Hodge, Victoria J; O'Keefe, Simon; Austin, Jim
2016-06-01
In this paper, we introduce a theoretical basis for a Hadoop-based neural network for parallel and distributed feature selection in Big Data sets. It is underpinned by an associative memory (binary) neural network which is highly amenable to parallel and distributed processing and fits with the Hadoop paradigm. There are many feature selectors described in the literature which all have various strengths and weaknesses. We present the implementation details of five feature selection algorithms constructed using our artificial neural network framework embedded in Hadoop YARN. Hadoop allows parallel and distributed processing. Each feature selector can be divided into subtasks and the subtasks can then be processed in parallel. Multiple feature selectors can also be processed simultaneously (in parallel) allowing multiple feature selectors to be compared. We identify commonalities among the five feature selectors. All can be processed in the framework using a single representation, and the overall processing can also be greatly reduced by only processing the common aspects of the feature selectors once and propagating these aspects across all five feature selectors as necessary. This allows the best feature selector, and the actual features to select, to be identified for large and high-dimensional data sets through exploiting the efficiency and flexibility of embedding the binary associative-memory neural network in Hadoop. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.
2012-01-01
Computational approaches to generate hypotheses from biomedical literature have been studied intensively in recent years. Nevertheless, it still remains a challenge to automatically discover novel, cross-silo biomedical hypotheses from large-scale literature repositories. In order to address this challenge, we first model a biomedical literature repository as a comprehensive network of biomedical concepts and formulate hypothesis generation as a process of link discovery on the concept network. We extract the relevant information from the biomedical literature corpus and generate a concept network and concept-author map on a cluster using the MapReduce framework. We extract a set of heterogeneous features such as random walk based features, neighborhood features and common author features. The potential number of links to consider for the possibility of link discovery is large in our concept network, and to address this scalability problem the features from the concept network are extracted using a cluster with the MapReduce framework. We further model link discovery as a classification problem carried out on a training data set automatically extracted from two network snapshots taken in two consecutive time periods. A set of heterogeneous features, which cover both topological and semantic features derived from the concept network, have been studied with respect to their impacts on the accuracy of the proposed supervised link discovery process. A case study of hypothesis generation based on the proposed method is presented in the paper. PMID:22759614
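The topological link-discovery features can be illustrated with networkx; the graph G and the candidate concept pair are hypothetical, and the paper's random-walk and author features are omitted for brevity:

```python
import networkx as nx

def link_features(G, u, v):
    """Topological features commonly used for supervised link discovery."""
    common = len(list(nx.common_neighbors(G, u, v)))          # shared neighbors
    jaccard = next(nx.jaccard_coefficient(G, [(u, v)]))[2]    # neighborhood overlap
    adamic = next(nx.adamic_adar_index(G, [(u, v)]))[2]       # weighted common neighbors
    return [common, jaccard, adamic]
```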
Al-Shaikhli, Saif Dawood Salman; Yang, Michael Ying; Rosenhahn, Bodo
2016-12-01
This paper presents a novel method for Alzheimer's disease classification via automatic 3D caudate nucleus segmentation. The proposed method consists of segmentation and classification steps. In the segmentation step, we propose a novel level set cost function. The proposed cost function is constrained by a sparse representation of local image features using a dictionary learning method. We present coupled dictionaries: a feature dictionary of a grayscale brain image and a label dictionary of a caudate nucleus label image. Using online dictionary learning, the coupled dictionaries are learned from the training data. The learned coupled dictionaries are embedded into a level set function. In the classification step, a region-based feature dictionary is built. The region-based feature dictionary is learned from shape features of the caudate nucleus in the training data. The classification is based on the measure of similarity between the sparse representation of region-based shape features of the segmented caudate in the test image and the region-based feature dictionary. The experimental results demonstrate the superiority of our method over state-of-the-art methods by achieving a high segmentation (91.5%) and classification (92.5%) accuracy. In this paper, we find that studying caudate nucleus atrophy gives an advantage over studying whole-brain atrophy for detecting Alzheimer's disease. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
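A sketch of the sparse-representation building block with scikit-learn; the paper uses online dictionary learning, for which MiniBatchDictionaryLearning would be the closer analogue, and patches/test_patches are assumed (n_samples, n_pixels) feature matrices:

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning, sparse_encode

# Learn a feature dictionary from local image features, then score test
# features by how well the dictionary reconstructs their sparse codes.
dico = DictionaryLearning(n_components=64, transform_algorithm='omp', random_state=0)
D = dico.fit(patches).components_                                  # learned dictionary atoms
codes = sparse_encode(test_patches, D, algorithm='omp', n_nonzero_coefs=5)
similarity = -np.linalg.norm(test_patches - codes @ D, axis=1)     # reconstruction-based similarity
```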
2011-01-01
Background Cardiotocography (CTG) is the most widely used tool for fetal surveillance. The visual analysis of fetal heart rate (FHR) traces largely depends on the expertise and experience of the clinician involved. Several approaches have been proposed for the effective interpretation of FHR. In this paper, a new approach for FHR feature extraction based on empirical mode decomposition (EMD) is proposed, which was used along with a support vector machine (SVM) for the classification of FHR recordings as 'normal' or 'at risk'. Methods FHR signals were recorded from 15 subjects at a sampling rate of 4 Hz, and a dataset consisting of 90 randomly selected records of 20 minutes duration was formed from these. All records were labelled as 'normal' or 'at risk' by two experienced obstetricians. A training set was formed from 60 records, with the remaining 30 left as the testing set. The standard deviations of the EMD components are input as features to a support vector machine (SVM) to classify FHR samples. Results For the training set, a five-fold cross validation test resulted in an accuracy of 86%, whereas the overall geometric mean of sensitivity and specificity was 94.8%. The Kappa value for the training set was 0.923. Application of the proposed method to the testing set (30 records) resulted in a geometric mean of 81.5%. The Kappa value for the testing set was 0.684. Conclusions Based on the overall performance of the system it can be stated that the proposed methodology is a promising new approach for the feature extraction and classification of FHR signals. PMID:21244712
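A compact sketch of the feature pipeline, assuming the PyEMD package for empirical mode decomposition and scikit-learn for the SVM; note that the number of IMFs can vary per record, so in practice the feature vectors may need padding or truncation:

```python
import numpy as np
from PyEMD import EMD                      # assumes the PyEMD package is installed
from sklearn.svm import SVC

def emd_std_features(fhr_signal, max_imfs=5):
    """Standard deviations of the first EMD components, as in the paper."""
    imfs = EMD().emd(np.asarray(fhr_signal, dtype=float), max_imf=max_imfs)
    return imfs.std(axis=1)                # one std per decomposed component

# records and labels are hypothetical: 90 FHR traces sampled at 4 Hz.
# X = np.array([emd_std_features(rec) for rec in records])
# clf = SVC(kernel='rbf').fit(X[:60], labels[:60])   # 60 training records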
Cross-Modal Retrieval With CNN Visual Features: A New Baseline.
Wei, Yunchao; Zhao, Yao; Lu, Canyi; Wei, Shikui; Liu, Luoqi; Zhu, Zhenfeng; Yan, Shuicheng
2017-02-01
Recently, convolutional neural network (CNN) visual features have demonstrated their powerful ability as a universal representation for various recognition tasks. In this paper, cross-modal retrieval with CNN visual features is implemented with several classic methods. Specifically, off-the-shelf CNN visual features are extracted from the CNN model, which is pretrained on ImageNet with more than one million images from 1000 object categories, as a generic image representation to tackle cross-modal retrieval. To further enhance the representational ability of CNN visual features, based on the pretrained CNN model on ImageNet, a fine-tuning step is performed by using the open source Caffe CNN library for each target data set. Besides, we propose a deep semantic matching method to address the cross-modal retrieval problem with respect to samples which are annotated with one or multiple labels. Extensive experiments on five popular publicly available data sets well demonstrate the superiority of CNN visual features for cross-modal retrieval.
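The off-the-shelf feature extraction step translates naturally to a few lines of recent PyTorch/torchvision (the paper used Caffe; ResNet-50 here is an illustrative stand-in for the pretrained model, and img is an assumed PIL image):

```python
import torch
import torchvision.models as models
import torchvision.transforms as T

# Off-the-shelf CNN visual features: penultimate-layer activations of an
# ImageNet-pretrained network, usable as a generic image representation.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.fc = torch.nn.Identity()            # drop the 1000-way classifier head
model.eval()

preprocess = T.Compose([T.Resize(256), T.CenterCrop(224), T.ToTensor(),
                        T.Normalize(mean=[0.485, 0.456, 0.406],
                                    std=[0.229, 0.224, 0.225])])

with torch.no_grad():
    feat = model(preprocess(img).unsqueeze(0))   # a 2048-dim feature vector
```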
When false recognition is out of control: the case of facial conjunctions.
Jones, Todd C; Bartlett, James C
2009-03-01
In three experiments, a dual-process approach to face recognition memory is examined, with a specific focus on the idea that a recollection process can be used to retrieve configural information of a studied face. Subjects could avoid, with confidence, a recognition error to conjunction lure faces (each a reconfiguration of features from separate studied faces) or feature lure faces (each based on a set of old features and a set of new features) by recalling a studied configuration. In Experiment 1, study repetition (one vs. eight presentations) was manipulated, and in Experiments 2 and 3, retention interval over a short number of trials (0-20) was manipulated. Different measures converged on the conclusion that subjects were unable to use a recollection process to retrieve configural information in an effort to temper recognition errors for conjunction or feature lure faces. A single process, familiarity, appears to be the sole process underlying recognition of conjunction and feature faces, and familiarity contributes, perhaps in whole, to discrimination of old from conjunction faces.
Bermeitinger, Christina; Wentura, Dirk; Frings, Christian
2011-06-01
"Semantic priming" refers to the phenomenon that people react faster to target words preceded by semantically related rather than semantically unrelated words. We wondered whether momentary mind sets modulate semantic priming for natural versus artifactual categories. We interspersed a category priming task with a second task that required participants to react to either the perceptual or action features of simple geometric shapes. Focusing on perceptual features enhanced semantic priming effects for natural categories, whereas focusing on action features enhanced semantic priming effects for artifactual categories. In fact, significant priming effects emerged only for those categories thought to rely on the features activated by the second task. This result suggests that (a) priming effects depend on momentary mind set and (b) features can be weighted flexibly in concept representations; it is also further evidence for sensory-functional accounts of concept and category representation.
NASA Astrophysics Data System (ADS)
de Araujo, Zandra; Orrill, Chandra Hawley; Jacobson, Erik
2018-04-01
While there is considerable scholarship describing principles for effective professional development, there have been few attempts to examine these principles in practice. In this paper, we identify and examine the particular design features of a mathematics professional development experience provided for middle grades teachers over 14 weeks. The professional development was grounded in a set of mathematical tasks that each had one right answer, but multiple solution paths. The facilitator engaged participants in problem solving and encouraged participants to work collaboratively to explore different solution paths. Through analysis of this collaborative learning environment, we identified five design features for supporting teacher learning of important mathematics and pedagogy in a problem-solving setting. We discuss these design features in depth and illustrate them by presenting an elaborated example from the professional development. This study extends the existing guidance for the design of professional development by examining and operationalizing the relationships among research-based features of effective professional development and the enacted features of a particular design.
Real estate value prediction using multivariate regression models
NASA Astrophysics Data System (ADS)
Manjula, R.; Jain, Shubham; Srivastava, Sharad; Rajiv Kher, Pranav
2017-11-01
The real estate market is one of the most competitive in terms of pricing, and prices vary significantly based on many factors; it is therefore a prime field for applying machine learning concepts to optimize and predict prices with high accuracy. In this paper, we present various important features for predicting housing prices with good accuracy. We describe regression models that use various features to achieve a lower residual sum of squares error. When using features in a regression model, some feature engineering is required for better prediction. Often a set of features (multiple regression) or polynomial regression (applying various powers to the features) is used to achieve a better model fit. Because these models are susceptible to overfitting, ridge regression is used to reduce it. This paper thus aims at the best application of regression models, in addition to other techniques, to optimize the result.
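A hedged sketch of the modeling recipe described above, combining polynomial feature expansion with ridge regularization in scikit-learn; X_train, y_train, X_test, y_test are assumed housing feature/price arrays:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import Ridge

# Polynomial expansion gives the model flexibility; the ridge penalty
# (alpha) curbs the overfitting that the expansion invites.
model = make_pipeline(PolynomialFeatures(degree=2),
                      StandardScaler(),
                      Ridge(alpha=1.0))
model.fit(X_train, y_train)
rss = ((model.predict(X_test) - y_test) ** 2).sum()   # residual sum of squares
```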
Breast Cancer Detection with Reduced Feature Set.
Mert, Ahmet; Kılıç, Niyazi; Bilgili, Erdem; Akan, Aydin
2015-01-01
This paper explores the feature reduction properties of independent component analysis (ICA) for a breast cancer decision support system. The Wisconsin diagnostic breast cancer (WDBC) dataset is reduced to a one-dimensional feature vector by computing an independent component (IC). The original data with 30 features and the reduced single feature (IC) are used to evaluate the diagnostic accuracy of classifiers such as k-nearest neighbor (k-NN), artificial neural network (ANN), radial basis function neural network (RBFNN), and support vector machine (SVM). The proposed classification using the IC is also compared with the original feature set under different validation (5/10-fold cross-validation) and partitioning (20%-40%) methods. The classifiers are evaluated on how effectively they categorize tumors as benign or malignant in terms of specificity, sensitivity, accuracy, F-score, Youden's index, discriminant power, and the receiver operating characteristic (ROC) curve with its criterion values, including area under the curve (AUC) and 95% confidence interval (CI). This represents an improvement in diagnostic decision support, while reducing computational complexity.
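The reduction-plus-classification pipeline can be sketched with scikit-learn's FastICA (one of several ICA implementations; the paper does not necessarily use this one), with k-NN standing in for the full set of classifiers evaluated:

```python
from sklearn.decomposition import FastICA
from sklearn.neighbors import KNeighborsClassifier

# Reduce the 30 WDBC features to a single independent component, then
# classify; X_train, y_train, X_test, y_test are assumed WDBC splits.
ica = FastICA(n_components=1, random_state=0)
X_train_ic = ica.fit_transform(X_train)     # (n_samples, 1) independent component
X_test_ic = ica.transform(X_test)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train_ic, y_train)
accuracy = knn.score(X_test_ic, y_test)
```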
Separate class true discovery rate degree of association sets for biomarker identification.
Crager, Michael R; Ahmed, Murat
2014-01-01
In 2008, Efron showed that biological features in a high-dimensional study can be divided into classes and a separate false discovery rate (FDR) analysis can be conducted in each class using information from the entire set of features to assess the FDR within each class. We apply this separate class approach to true discovery rate degree of association (TDRDA) set analysis, which is used in clinical-genomic studies to identify sets of biomarkers having strong association with clinical outcome or state while controlling the FDR. Careful choice of classes based on prior information can increase the identification power of the separate class analysis relative to the overall analysis.
Hiltz, Mary-Ann; Mitton, Craig; Smith, Neale; Dowling, Laura; Campbell, Matthew; Magee, J Fergall; Gibson, Jennifer L; Gujar, Shashi Ashok; Levy, Adrian
2015-01-01
There are powerful arguments for increased investment in child and youth health. But the extent to which these benefits can be realized is shaped by health institutions' priority setting processes. We asked, "What are the unique features of a pediatric care setting that should influence choice and implementation of a formal priority setting and resource allocation process?" Based on multiple sources of data, we created a "made-for-child-health" lens containing three foci reflective of the distinct features of pediatric care settings: the diversity of child and youth populations, the challenges in measuring outcomes and the complexity of patient and public engagement.
NASA Astrophysics Data System (ADS)
Jagodziński, Dariusz; Matysiewicz, Mateusz; Neumann, Łukasz; Nowak, Robert M.; Okuniewski, Rafał; Oleszkiewicz, Witold; Cichosz, Paweł
2016-09-01
This contribution introduces a method for detecting cancer pathologies in breast skin temperature distribution images. Thermosensitive foils applied to the breast skin create thermograms, which display the amount of infrared energy emitted by all breast cells. Significant foci of hyperthermia or inflammation are typical of cancer cells. These foci can be recognized on thermograms as contours, which are areas of higher temperature. Every contour can be converted to a describing feature set using the raw, central, Hu, outline, Fourier and colour moments of image pixels. This paper also defines a new way of describing a set of contours through their neighbourhood relations. The contribution moreover introduces a way of ranking and selecting the most relevant features. The authors used a neural network with Gevrey's concept and recursive feature elimination to estimate feature importance.
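As an illustration of the moment-based contour description and the feature-importance step, a sketch using OpenCV's Hu moments and scikit-learn's recursive feature elimination; a logistic regression is substituted for the paper's neural network ranker, and the matrix names are assumptions:

```python
import cv2
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

def contour_hu_features(contour):
    """Seven Hu invariant moments of a thermogram contour, log-scaled
    to compress their large dynamic range."""
    hu = cv2.HuMoments(cv2.moments(contour)).flatten()
    return -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)

# X: rows of pooled moment features per contour; y: pathology labels.
# selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=10).fit(X, y)
```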
The limb movement analysis of rehabilitation exercises using wearable inertial sensors.
Bingquan Huang; Giggins, Oonagh; Kechadi, Tahar; Caulfield, Brian
2016-08-01
Because home-based exercise programs lack the supervision of a therapist, inertial sensor based feedback systems that can accurately assess movement repetitions are urgently required. Owing to synchronicity and the many degrees of freedom involved, one movement signal may resemble another or be mixed with other, not precisely defined movements. Therefore, data and feature selection are important for movement analysis. This paper explores data and feature selection for the limb movement analysis of rehabilitation exercises. The results highlight that the classification accuracy is very sensitive to the mounting location of the sensors. The results show that the use of 2 or 3 sensor units, the combination of acceleration and gyroscope data, and feature sets that combine the statistical feature set with another feature type can significantly improve the classification accuracy rates. The results illustrate that acceleration data is more effective than gyroscope data for most of the movement analysis.
Rotation, scale, and translation invariant pattern recognition using feature extraction
NASA Astrophysics Data System (ADS)
Prevost, Donald; Doucet, Michel; Bergeron, Alain; Veilleux, Luc; Chevrette, Paul C.; Gingras, Denis J.
1997-03-01
A rotation, scale and translation invariant pattern recognition technique is proposed. It is based on Fourier-Mellin Descriptors (FMDs). Each FMD is taken as an independent feature of the object, and a set of those features forms a signature. FMDs are naturally rotation invariant. Translation invariance is achieved through pre-processing. A proper normalization of the FMDs gives the scale invariance property. This approach offers the double advantage of providing invariant signatures of the objects and a dramatic reduction of the amount of data to process. The compressed invariant feature signature is next presented to a multi-layered perceptron neural network. This final step provides some robustness to the classification of the signatures, enabling good recognition behavior under anamorphically scaled distortion. We also present an original feature extraction technique, adapted to optical calculation of the FMDs. A prototype optical set-up was built, and experimental results are presented.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kuo, H; Tome, W; FOX, J
2014-06-15
Purpose: To study the feasibility of applying a cancer risk model established from treated patients to predict the risk of recurrence on follow-up mammography after radiation therapy for both the ipsilateral and contralateral breast. Methods: An extensive set of textural feature functions was applied to a set of 196 mammograms from 50 patients. 56 mammograms from 28 patients were used as the training set, 44 mammograms from 22 patients were used as the test set, and the rest were used for prediction. Feature functions include Histogram, Gradient, Co-Occurrence Matrix, Run-Length Matrix and Wavelet Energy. An optimum subset of the feature functions was selected by Fisher Coefficient (FO) or Mutual Information (MI) (up to top 10 features) or a method combining FO, MI and Principal Component (FMP) (up to top 30 features). One-Nearest Neighbor (1-NN), Linear Discriminant Analysis (LDA) and Nonlinear Discriminant Analysis (NDA) were utilized to build a risk model of breast cancer from the training set of mammograms at the time of diagnosis. The risk model was then used to predict the risk of recurrence from mammograms taken one year and three years after RT. Results: FMP with NDA has the best classification power in classifying the training set of mammograms with lesions versus those without lesions. The model of FMP with NDA achieved a true positive (TP) rate of 82% compared to 45.5% using FO with 1-NN. The best false positive (FP) rates were 0% and 3.6% in the contralateral breast 1 year and 3 years after RT, and 10.9% in the ipsilateral breast 3 years after RT. Conclusion: Texture analysis offers high dimensionality to differentiate breast tissue in mammograms. Using NDA to classify mammograms with lesions from mammograms without lesions can achieve a rather high TP rate and low FP rate in the surveillance of mammograms for patients treated with conservative surgery combined with RT.
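The two selection criteria can be sketched directly; the Fisher coefficient below is the standard two-class form, and mutual information comes from scikit-learn (the combined FMP scheme with principal components is omitted). X and y are assumed texture-feature and label arrays:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def fisher_scores(X, y):
    """Fisher coefficient per feature for a two-class problem."""
    X0, X1 = X[y == 0], X[y == 1]
    return (X0.mean(0) - X1.mean(0)) ** 2 / (X0.var(0) + X1.var(0) + 1e-12)

# Rank texture features by either criterion and keep the top k (k = 10 assumed):
# top_fo = np.argsort(fisher_scores(X, y))[::-1][:10]
# top_mi = np.argsort(mutual_info_classif(X, y))[::-1][:10]
```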
Classification Influence of Features on Given Emotions and Its Application in Feature Selection
NASA Astrophysics Data System (ADS)
Xing, Yin; Chen, Chuang; Liu, Li-Long
2018-04-01
In order to solve the problem that high-dimensional speech emotion features contain a large amount of redundant data, we deeply analyze the extracted speech emotion features and select the better ones. Firstly, a given emotion is classified by each feature. Secondly, the recognition rates are ranked in descending order. Then, the optimal feature threshold is determined by a recognition-rate criterion. Finally, the better features are obtained. When applied to the Berlin and Chinese emotional data sets, the experimental results show that the feature selection method outperforms the other traditional methods.
NASA Astrophysics Data System (ADS)
Moldovanu, Simona; Bibicu, Dorin; Moraru, Luminita; Nicolae, Mariana Carmen
2011-12-01
The co-occurrence matrix has been applied successfully for echographic image characterization because it contains information about the spatial distribution of grey-scale levels in an image. The paper deals with the analysis of pixels in selected regions of interest of US images of the liver. The useful information obtained refers to texture features such as entropy, contrast, dissimilarity and correlation extracted with the co-occurrence matrix. The analyzed US images were grouped in two distinct sets: healthy liver and steatosis (or fatty) liver. These two sets of echographic images of the liver build a database that includes only histologically confirmed cases: 10 images of healthy liver and 10 images of steatosis liver. The healthy subjects were used to compute the four textural indices and served as the control dataset. We chose to study this disease because steatosis is the abnormal retention of lipids in cells. The texture features are statistical measures and can be used to characterize the irregularity of tissues. The goal is to extract the information using the nearest neighbor classification algorithm. The k-NN algorithm is a powerful tool for classifying texture features, grouping those of the healthy liver in a training set and those of the steatosis liver in a holdout set. The results could be used to quantify the texture information and will allow a clear distinction between healthy and steatosis liver.
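A sketch of the co-occurrence features and the k-NN step with scikit-image and scikit-learn (graycomatrix/graycoprops are spelled greycomatrix/greycoprops in older scikit-image releases); entropy is computed manually since it is not a built-in property, and the ROI is assumed to be an 8-bit array:

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.neighbors import KNeighborsClassifier

def glcm_features(roi):
    """Contrast, dissimilarity, correlation and entropy from a grey-level
    co-occurrence matrix of an 8-bit region of interest."""
    glcm = graycomatrix(roi, distances=[1], angles=[0], levels=256, normed=True)
    feats = [graycoprops(glcm, p)[0, 0]
             for p in ('contrast', 'dissimilarity', 'correlation')]
    p = glcm[:, :, 0, 0]
    feats.append(-np.sum(p[p > 0] * np.log2(p[p > 0])))   # entropy
    return feats

# knn = KNeighborsClassifier(n_neighbors=3).fit(train_feats, train_labels)
```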
Chudáček, V; Spilka, J; Janků, P; Koucký, M; Lhotská, L; Huptych, M
2011-08-01
Cardiotocography is the monitoring of fetal heart rate (FHR) and uterine contractions (TOCO), used routinely since the 1960s by obstetricians to detect fetal hypoxia. The evaluation of the FHR in clinical settings is based on an evaluation of macroscopic morphological features and so far has managed to avoid adopting any achievements from the HRV research field. In this work, most of the features utilized for FHR characterization, including FIGO, HRV, nonlinear, wavelet, and time and frequency domain features, are investigated and assessed based on their statistical significance in the task of distinguishing the FHR into three FIGO classes. We assess the features on a large data set (552 records) and unlike in other published papers we use three-class expert evaluation of the records instead of the pH values. We conclude the paper by presenting the best uncorrelated features and their individual rank of importance according to the meta-analysis of three different ranking methods. The number of accelerations and decelerations, interval index, as well as Lempel-Ziv complexity and Higuchi's fractal dimension are among the top five features.
NASA Astrophysics Data System (ADS)
Kushnir, A. F.; Troitsky, E. V.; Haikin, L. M.; Dainty, A.
1999-06-01
A semi-automatic procedure has been developed to achieve statistically optimum discrimination between earthquakes and explosions at local or regional distances based on a learning set specific to a given region. The method is used for step-by-step testing of candidate discrimination features to find the optimum (combination) subset of features, with the decision taken on a rigorous statistical basis. Linear (LDF) and Quadratic (QDF) Discriminant Functions based on Gaussian distributions of the discrimination features are implemented and statistically grounded; the features may be transformed by the Box-Cox transformation z = (1/α)(y^α - 1) to make them more Gaussian. Tests of the method were successfully conducted on seismograms from the Israel Seismic Network using features consisting of spectral ratios between and within phases. Results showed that the QDF was more effective than the LDF and required five features out of 18 candidates for the optimum set. It was found that discrimination improved with increasing distance within the local range, and that eliminating transformation of the features and failing to correct for noise led to degradation of discrimination.
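The transform-then-discriminate recipe maps onto standard tools; a sketch with scipy's Box-Cox (which requires strictly positive feature values and fits α per feature) and scikit-learn's quadratic discriminant, with X and y as assumed feature/label arrays:

```python
import numpy as np
from scipy.stats import boxcox
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

# Box-Cox transform each spectral-ratio feature toward Gaussianity,
# then fit the quadratic discriminant (the QDF of the paper).
X_t = np.column_stack([boxcox(X[:, j])[0] for j in range(X.shape[1])])
qdf = QuadraticDiscriminantAnalysis().fit(X_t, y)   # y: earthquake vs. explosion
```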
Neonatal Seizure Detection Using Deep Convolutional Neural Networks.
Ansari, Amir H; Cherian, Perumpillichira J; Caicedo, Alexander; Naulaers, Gunnar; De Vos, Maarten; Van Huffel, Sabine
2018-04-02
Identifying a core set of features is one of the most important steps in the development of an automated seizure detector. In most of the published studies describing features and seizure classifiers, the features were hand-engineered, which may not be optimal. The main goal of the present paper is using deep convolutional neural networks (CNNs) and random forest to automatically optimize feature selection and classification. The input of the proposed classifier is raw multi-channel EEG and the output is the class label: seizure/nonseizure. By training this network, the required features are optimized, while fitting a nonlinear classifier on the features. After training the network with EEG recordings of 26 neonates, five end layers performing the classification were replaced with a random forest classifier in order to improve the performance. This resulted in a false alarm rate of 0.9 per hour and seizure detection rate of 77% using a test set of EEG recordings of 22 neonates that also included dubious seizures. The newly proposed CNN classifier outperformed three data-driven feature-based approaches and performed similar to a previously developed heuristic method.
NASA Astrophysics Data System (ADS)
Vijverberg, Koen; Ghafoorian, Mohsen; van Uden, Inge W. M.; de Leeuw, Frank-Erik; Platel, Bram; Heskes, Tom
2016-03-01
Cerebral small vessel disease (SVD) is a disorder frequently found among elderly people and is associated with deterioration in cognitive performance, parkinsonism, and motor and mood impairments. White matter hyperintensities (WMH), as well as lacunes, microbleeds and subcortical brain atrophy, are part of the spectrum of image findings related to SVD. Accurate segmentation of WMHs is important for the prognosis and diagnosis of multiple neurological disorders such as MS and SVD. Almost all published (semi-)automated WMH detection models employ multiple complex hand-crafted features, which require in-depth domain knowledge. In this paper we propose to apply a single-layer-network unsupervised feature learning (USFL) method to avoid hand-crafted features and instead automatically learn a more efficient set of features. Experimental results show that a computer aided detection system with a USFL system outperforms a hand-crafted approach. Moreover, since the two feature sets have complementary properties, a hybrid system that makes use of both hand-crafted and unsupervised learned features shows a significant performance boost compared to each system separately, getting close to the performance of an independent human expert.
Winokur, T S; McClellan, S; Siegal, G P; Reddy, V; Listinsky, C M; Conner, D; Goldman, J; Grimes, G; Vaughn, G; McDonald, J M
1998-07-01
Routine diagnosis of pathology images transmitted over telecommunications lines remains an elusive goal. Part of the resistance stems from the difficulty of enabling image selection by the remote pathologist. To address this problem, a telepathology microscope system (TelePath, TeleMedicine Solutions, Birmingham, Ala) that has features associated with static and dynamic imaging systems was constructed. Features of the system include near real time image transmission, provision of a tiled overview image, free choice of any fields at any desired optical magnification, and automated tracking of the pathologist's image selection. All commands and images are discrete, avoiding many inherent problems of full motion video and continuous remote control. A set of 64 slides was reviewed by 3 pathologists in a simulated frozen section environment. Each pathologist provided diagnoses for all 64 slides, as well as qualitative information about the system. Thirty-one of 192 diagnoses disagreed with the reference diagnosis that had been reached before the trial began. Of the 31, 13 were deferrals and 12 were diagnoses of cases that had a deferral as the reference diagnosis. In 6 cases, the diagnosis disagreed with the reference diagnosis, yielding an overall accuracy of 96.9%. Confidence levels in the diagnoses were high. This trial suggests that this system provides high-quality anatomic pathology services, including intraoperative diagnoses, over telecommunications lines.
A Cloud Microphysics Model for the Gas Giant Planets
NASA Astrophysics Data System (ADS)
Palotai, Csaba J.; Le Beau, Raymond P.; Shankar, Ramanakumar; Flom, Abigail; Lashley, Jacob; McCabe, Tyler
2016-10-01
Recent studies have significantly increased the quality and the number of observed meteorological features on the jovian planets, revealing banded cloud structures and discrete features. Our current understanding of the formation and decay of those clouds also defines the conceptual models of the underlying atmospheric dynamics. The full interpretation of the new observational data set and the related theories requires modeling these features in a general circulation model (GCM). Here, we present details of our bulk cloud microphysics model that was designed to simulate clouds in the Explicit Planetary Hybrid-Isentropic Coordinate (EPIC) GCM for the jovian planets. The cloud module includes hydrological cycles for each condensable species that consist of interactive vapor, cloud and precipitation phases, and it also accounts for latent heating and cooling throughout the transfer processes (Palotai and Dowling, 2008. Icarus, 194, 303-326). Previously, the self-organizing clouds in our simulations successfully reproduced the vertical and horizontal ammonia cloud structure in the vicinity of Jupiter's Great Red Spot and Oval BA (Palotai et al. 2014, Icarus, 232, 141-156). In our recent work, we extended this model to include water clouds on Jupiter and Saturn, ammonia clouds on Saturn, and methane clouds on Uranus and Neptune. Details of our cloud parameterization scheme, our initial results and their comparison with observations will be shown. The latest version of the EPIC model is available as open source software from NASA's PDS Atmospheres Node.
Department of Defense Gateway Information System (DGIS) Users’ Guide
1993-10-01
…all citations about any or all of the search terms of interest. Usually, all of the search terms combined using the OR operator are different ways to… [Garbled excerpt; recoverable terminal settings from the guide: set parity to none; set duplex (echo) to full; set baud rate to 300, 1200, 2400, or 9600.]
Bag-of-features based medical image retrieval via multiple assignment and visual words weighting.
Wang, Jingyan; Li, Yongping; Zhang, Ying; Wang, Chao; Xie, Honglan; Chen, Guoling; Gao, Xin
2011-11-01
Bag-of-features based approaches have become prominent for image retrieval and image classification tasks in the past decade. Such methods represent an image as a collection of local features, such as image patches and key points with scale invariant feature transform (SIFT) descriptors. To improve the bag-of-features methods, we first model the assignments of local descriptors as contribution functions, and then propose a novel multiple assignment strategy. Assuming the local features can be reconstructed by their neighboring visual words in a vocabulary, reconstruction weights can be solved by quadratic programming. The weights are then used to build contribution functions, resulting in a novel assignment method, called quadratic programming (QP) assignment. We further propose a novel visual word weighting method. The discriminative power of each visual word is analyzed by the sub-similarity function in the bin that corresponds to the visual word. Each sub-similarity function is then treated as a weak classifier. A strong classifier is learned by boosting methods that combine those weak classifiers. The weighting factors of the visual words are learned accordingly. We evaluate the proposed methods on medical image retrieval tasks. The methods are tested on three well-known data sets, i.e., the ImageCLEFmed data set, the 304 CT Set, and the basal-cell carcinoma image set. Experimental results demonstrate that the proposed QP assignment outperforms the traditional nearest neighbor assignment, the multiple assignment, and the soft assignment, whereas the proposed boosting based weighting strategy outperforms the state-of-the-art weighting methods, such as the term frequency weights and the term frequency-inverse document frequency weights.
Linking metabolic network features to phenotypes using sparse group lasso.
Samal, Satya Swarup; Radulescu, Ovidiu; Weber, Andreas; Fröhlich, Holger
2017-11-01
Integration of metabolic networks with '-omics' data has been a subject of recent research in order to better understand the behaviour of such networks with respect to differences between biological and clinical phenotypes. Under the conditions of steady state of the reaction network and the non-negativity of fluxes, metabolic networks can be algebraically decomposed into a set of sub-pathways often referred to as extreme currents (ECs). Our objective is to find the statistical association of such sub-pathways with given clinical outcomes, resulting in a particular instance of a self-contained gene set analysis method. In this direction, we propose a method based on sparse group lasso (SGL) to identify phenotype associated ECs based on gene expression data. SGL selects a sparse set of feature groups and also introduces sparsity within each group. Features in our model are clusters of ECs, and feature groups are defined based on correlations among these features. We apply our method to metabolic networks from KEGG database and study the association of network features to prostate cancer (where the outcome is tumor and normal, respectively) as well as glioblastoma multiforme (where the outcome is survival time). In addition, simulations show the superior performance of our method compared to global test, which is an existing self-contained gene set analysis method. R code (compatible with version 3.2.5) is available from http://www.abi.bit.uni-bonn.de/index.php?id=17. samal@combine.rwth-aachen.de or frohlich@bit.uni-bonn.de. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
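The sparse group lasso penalty, lam1*||beta||_1 + lam2*sum_g ||beta_g||_2, has a closed-form proximal operator, sketched below in numpy: entrywise soft-thresholding followed by groupwise shrinkage. This illustrates the penalty's structure only; the paper's R implementation should be consulted for the full solver:

```python
import numpy as np

def sgl_prox(beta, groups, lam1, lam2):
    """Proximal step of the sparse group lasso penalty: soft-threshold
    individual entries, then shrink each feature group toward zero."""
    z = np.sign(beta) * np.maximum(np.abs(beta) - lam1, 0.0)   # lasso part
    out = np.zeros_like(z)
    for g in np.unique(groups):
        idx = groups == g
        norm = np.linalg.norm(z[idx])
        if norm > 0:
            out[idx] = max(0.0, 1.0 - lam2 / norm) * z[idx]    # group lasso part
    return out
```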
Arabic OCR: toward a complete system
NASA Astrophysics Data System (ADS)
El-Bialy, Ahmed M.; Kandil, Ahmed H.; Hashish, Mohamed; Yamany, Sameh M.
1999-12-01
Latin and Chinese OCR systems have been studied extensively in the literature, yet little work has been done on Arabic character recognition, owing to the technical challenges posed by Arabic text. Because of its cursive nature, powerful and stable text segmentation is needed. Also, features capturing the characteristics of the rich Arabic character representation are needed to build an Arabic OCR. In this paper a novel segmentation technique which is font and size independent is introduced. This technique can segment a cursively written text line even if the line suffers from small skewness. The technique is not sensitive to the location of the centerline of the text line and can segment different font sizes and types (for different character sets) occurring on the same line. Feature extraction is considered one of the most important phases of a text reading system. Ideally, the features extracted from a character image should capture the essential characteristics of the character independently of font type and size. In such an ideal case, the classifier stores a single prototype per character. However, it is practically challenging to find such an ideal set of features. In this paper, a set of features that reflects the topological aspects of Arabic characters is proposed. These proposed features, integrated with a topological matching technique, yield an Arabic text reading system that is semi-omnifont.
Eimer, Martin; Kiss, Monika; Nicholas, Susan
2011-12-01
When target-defining features are specified in advance, attentional target selection in visual search is controlled by preparatory top-down task sets. We used ERP measures to study voluntary target selection in the absence of such feature-specific task sets, and to compare it to selection that is guided by advance knowledge about target features. Visual search arrays contained two different color singleton digits, and participants had to select one of these as target and report its parity. Target color was either known in advance (fixed color task) or had to be selected anew on each trial (free color-choice task). ERP correlates of spatially selective attentional target selection (N2pc) and working memory processing (SPCN) demonstrated rapid target selection and efficient exclusion of color singleton distractors from focal attention and working memory in the fixed color task. In the free color-choice task, spatially selective processing also emerged rapidly, but selection efficiency was reduced, with nontarget singleton digits capturing attention and gaining access to working memory. Results demonstrate the benefits of top-down task sets: Feature-specific advance preparation accelerates target selection, rapidly resolves attentional competition, and prevents irrelevant events from attracting attention and entering working memory.
Loba, P; Stewart, S H; Klein, R M; Blackburn, J R
2001-01-01
The present study was conducted to identify game parameters that would reduce the risk of abuse of video lottery terminals (VLTs) by pathological gamblers, while exerting minimal effects on the behavior of non-pathological gamblers. Three manipulations of standard VLT game features were explored. Participants were exposed to: a counter which displayed a running total of money spent; a VLT spinning reels game where participants could no longer "stop" the reels by touching the screen; and sensory feature manipulations. In control conditions, participants were exposed to standard settings for either a spinning reels or a video poker game. Dependent variables were self-ratings of reactions to each set of parameters. A set of 2(3) x 2 x 2 (game manipulation [experimental condition(s) vs. control condition] x game [spinning reels vs. video poker] x gambler status [pathological vs. non-pathological]) repeated measures ANOVAs were conducted on all dependent variables. The findings suggest that the sensory manipulations (i.e., fast speed/sound or slow speed/no sound manipulations) produced the most robust reaction differences. Before advocating harm reduction policies such as lowering sensory features of VLT games to reduce potential harm to pathological gamblers, it is important to replicate findings in a more naturalistic setting, such as a real bar.
Asynchronous transfer mode link performance over ground networks
NASA Technical Reports Server (NTRS)
Chow, E. T.; Markley, R. W.
1993-01-01
The results of an experiment to determine the feasibility of using asynchronous transfer mode (ATM) technology to support advanced spacecraft missions that require high-rate ground communications and, in particular, full-motion video are reported. Potential nodes in such a ground network include Deep Space Network (DSN) antenna stations, the Jet Propulsion Laboratory, and a set of national and international end users. The experiment simulated a lunar microrover, lunar lander, the DSN ground communications system, and distributed science users. The users were equipped with video-capable workstations. A key feature was an optical fiber link between two high-performance workstations equipped with ATM interfaces. Video was also transmitted through JPL's institutional network to a user 8 km from the experiment. Variations in video depending on the networks and computers were observed, the results are reported.
Schlieren photography on freely flying hawkmoth.
Liu, Yun; Roll, Jesse; Van Kooten, Stephen; Deng, Xinyan
2018-05-01
The aerodynamic force on flying insects results from the vortical flow structures that vary both spatially and temporally throughout flight. Due to these complexities and the inherent difficulties in studying flying insects in a natural setting, a complete picture of the vortical flow has been difficult to obtain experimentally. In this paper, Schlieren photography, a widely used technique for high-speed flow visualization, was adapted to capture the vortex structures around a freely flying hawkmoth (Manduca). Flow features such as the leading-edge vortex and trailing-edge vortex, as well as the full vortex system in the wake, were visualized directly. Quantification of the flow from the Schlieren images was then obtained by applying a physics-based optical flow method, extending the potential applications of the method to further studies of flying insects. © 2018 The Author(s).
NASA Astrophysics Data System (ADS)
Back, B. B.; Baker, M. D.; Barton, D. S.; Basilev, S.; Baum, R.; Betts, R. R.; Białas, A.; Bindel, R.; Bogucki, W.; Budzanowski, A.; Busza, W.; Carroll, A.; Ceglia, M.; Chang, Y.-H.; Chen, A. E.; Coghen, T.; Connor, C.; Czyż, W.; Dabrowski, B.; Decowski, M. P.; Despet, M.; Fita, P.; Fitch, J.; Friedl, M.; Gałuszka, K.; Ganz, R.; Garcia, E.; George, N.; Godlewski, J.; Gomes, C.; Griesmayer, E.; Gulbrandsen, K.; Gushue, S.; Halik, J.; Halliwell, C.; Haridas, P.; Hayes, A.; Heintzelman, G. A.; Henderson, C.; Hollis, R.; Hołyński, R.; Hofman, D.; Holzman, B.; Johnson, E.; Kane, J.; Katzy, J.; Kita, W.; Kotuła, J.; Kraner, H.; Kucewicz, W.; Kulinich, P.; Law, C.; Lemler, M.; Ligocki, J.; Lin, W. T.; Manly, S.; McLeod, D.; Michałowski, J.; Mignerey, A.; Mülmenstädt, J.; Neal, M.; Nouicer, R.; Olszewski, A.; Pak, R.; Park, I. C.; Patel, M.; Pernegger, H.; Plesko, M.; Reed, C.; Remsberg, L. P.; Reuter, M.; Roland, C.; Roland, G.; Ross, D.; Rosenberg, L.; Ryan, J.; Sanzgiri, A.; Sarin, P.; Sawicki, P.; Scaduto, J.; Shea, J.; Sinacore, J.; Skulski, W.; Steadman, S. G.; Stephans, G. S. F.; Steinberg, P.; Straczek, A.; Stodulski, M.; Strek, M.; Stopa, Z.; Sukhanov, A.; Surowiecka, K.; Tang, J.-L.; Teng, R.; Trzupek, A.; Vale, C.; van Nieuwenhuizen, G. J.; Verdier, R.; Wadsworth, B.; Wolfs, F. L. H.; Wosiek, B.; Woźniak, K.; Wuosmaa, A. H.; Wysłouch, B.; Zalewski, K.; Żychowski, P.; Phobos Collaboration
2003-03-01
This manuscript contains a detailed description of the PHOBOS experiment as it is configured for the Year 2001 running period. It is capable of detecting charged particles over the full solid angle using a multiplicity detector and measuring identified charged particles near mid-rapidity in two spectrometer arms with opposite magnetic fields. Both of these components utilize silicon pad detectors for charged particle detection. The minimization of material between the collision vertex and the first layers of silicon detectors allows for the detection of charged particles with very low transverse momenta, which is a unique feature of the PHOBOS experiment. Additional detectors include a time-of-flight wall which extends the particle identification range for one spectrometer arm, as well as sets of scintillator paddle and Cherenkov detector arrays for event triggering and centrality selection.
Mn@Si14+: a singlet fullerene-like endohedrally doped silicon cluster.
Ngan, Vu Thi; Pierloot, Kristine; Nguyen, Minh Tho
2013-04-21
The electronic structure of Mn@Si14(+) is determined using DFT and CASPT2/CASSCF(14,15) computations with large basis sets. The endohedrally Mn-doped Si cationic cluster has a D3h fullerene-like structure featuring a closed-shell singlet ground state with a singlet-triplet gap of ~1 eV. A strong stabilizing interaction occurs between the 3d(Mn) and the 2D-shell(Si14) orbitals, and a large amount of charge is transferred from the Si14 cage to the Mn dopant. The 3d(Mn) orbitals are filled by encapsulation, and the magnetic moment of Mn is completely quenched. Full occupation of [2S, 2P, 2D] shell orbitals by 18 delocalized electrons confers the doped Mn@Si14(+) cluster a spherically aromatic character.
NASA Astrophysics Data System (ADS)
Zhang, Jie; Nixon, Andrew; Barber, Tom; Budyn, Nicolas; Bevan, Rhodri; Croxford, Anthony; Wilcox, Paul
2018-04-01
In this paper, a methodology for using a finite element (FE) model to validate a ray-based model in the simulation of full matrix capture (FMC) ultrasonic array data sets is proposed. The overall aim is to separate the signal contributions from different interactions in the FE results, so that each individual component can be compared more easily with its counterpart in the ray-based model results. This is achieved by combining the results from multiple FE models of the system of interest that include progressively more geometrical features while preserving the same mesh structure. It is shown that the proposed techniques allow the interactions from a large number of different ray paths to be isolated in the FE results and compared directly to the results from a ray-based forward model.
[Rochus, patron saint of physicians and hospitals--a teledermatologic quiz].
Aberer, Werner
2006-07-01
The painting "St. Rochus with an angel" by Quinten Massys in the Alte Pinakothek in Munich was utilized for a teledermatological quiz. First, only a detail of the plague bubo on the thigh was sent electronically to all physicians in our department. The answers were correct descriptions, but the interpretations quite heterogeneous. In a second set, the full painting together with the hint- Pinakothek - was given. Now the number of descriptively correct diagnoses was high; one resident knew the name of the featured individual and his diagnosis. This example demonstrates one problem with teledermatology - when viewing a clinical picture, relevant additional information is frequently essential in order to make a correct diagnosis. In addition, this presentation of saint physicians and hospitals, the holy Rochus, better known to those who are under his protection.
NASA Astrophysics Data System (ADS)
Nishizuka, N.; Sugiura, K.; Kubo, Y.; Den, M.; Watari, S.; Ishii, M.
2017-02-01
We developed a flare prediction model using machine learning, which is optimized to predict the maximum class of flares occurring in the following 24 hr. Machine learning is used to devise algorithms that can learn from and make decisions on a huge amount of data. We used solar observation data during the period 2010-2015, such as vector magnetograms, ultraviolet (UV) emission, and soft X-ray emission taken by the Solar Dynamics Observatory and the Geostationary Operational Environmental Satellite. We detected active regions (ARs) from the full-disk magnetogram, from which ˜60 features were extracted with their time differentials, including magnetic neutral lines, the current helicity, the UV brightening, and the flare history. After standardizing the feature database, we fully shuffled and randomly separated it into two for training and testing. To investigate which algorithm is best for flare prediction, we compared three machine-learning algorithms: the support vector machine, k-nearest neighbors (k-NN), and extremely randomized trees. The prediction score, the true skill statistic, was higher than 0.9 with a fully shuffled data set, which is higher than that for human forecasts. It was found that k-NN has the highest performance among the three algorithms. The ranking of the feature importance showed that previous flare activity is most effective, followed by the length of magnetic neutral lines, the unsigned magnetic flux, the area of UV brightening, and the time differentials of features over 24 hr, all of which are strongly correlated with the flux emergence dynamics in an AR.
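The algorithm comparison and the skill score are straightforward to sketch with scikit-learn (ExtraTreesClassifier corresponds to extremely randomized trees); the standardized feature matrices and labels are assumed names:

```python
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import confusion_matrix

def true_skill_statistic(y_true, y_pred):
    """TSS = TP/(TP+FN) - FP/(FP+TN), the prediction score used above."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return tp / (tp + fn) - fp / (fp + tn)

# Compare the three algorithms on the shuffled, standardized AR features.
for clf in (SVC(), KNeighborsClassifier(), ExtraTreesClassifier(random_state=0)):
    clf.fit(X_train, y_train)
    print(type(clf).__name__, true_skill_statistic(y_test, clf.predict(X_test)))
```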
Self-organizing neural integration of pose-motion features for human action recognition
Parisi, German I.; Weber, Cornelius; Wermter, Stefan
2015-01-01
The visual recognition of complex, articulated human movements is fundamental for a wide range of artificial systems oriented toward human-robot communication, action classification, and action-driven perception. These challenging tasks may generally involve the processing of a huge amount of visual information and learning-based mechanisms for generalizing a set of training actions and classifying new samples. To operate in natural environments, a crucial property is the efficient and robust recognition of actions, also under noisy conditions caused by, for instance, systematic sensor errors and temporarily occluded persons. Studies of the mammalian visual system and its outstanding ability to process biological motion suggest separate neural pathways for the distinct processing of pose and motion features at multiple levels, with the subsequent integration of these visual cues for action perception. We present a neurobiologically motivated approach to achieve noise-tolerant action recognition in real time. Our model consists of self-organizing Growing When Required (GWR) networks that obtain progressively generalized representations of sensory inputs and learn inherent spatio-temporal dependencies. During training, the GWR networks dynamically change their topological structure to better match the input space. We first extract pose and motion features from video sequences and then cluster actions in terms of prototypical pose-motion trajectories. Multi-cue trajectories from matching action frames are subsequently combined to provide action dynamics in the joint feature space. Reported experiments show that our approach outperforms previous results on a dataset of full-body actions captured with a depth sensor, and ranks among the best results for a public benchmark of domestic daily actions. PMID:26106323
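For context, the core growth rule of a Growing When Required network can be sketched in a few lines. This is a generic GWR step with illustrative thresholds and learning rates, not the authors' full pose-motion architecture:

```python
import numpy as np

def gwr_step(x, W, edges, a_T=0.85, eps_b=0.1, eps_n=0.01):
    """One GWR update: insert a node when the best match is too poor.

    x: input vector; W: (n_nodes, dim) weight matrix; edges: set of index pairs.
    """
    d = np.linalg.norm(W - x, axis=1)
    b, s = np.argsort(d)[:2]                  # best and second-best units
    activity = np.exp(-d[b])                  # match quality in (0, 1]
    if activity < a_T:                        # network "grows when required"
        W = np.vstack([W, (W[b] + x) / 2.0])  # new node between input and winner
        edges.add((int(b), len(W) - 1))
        edges.add((int(s), len(W) - 1))
    else:                                     # otherwise adapt winner + neighbor
        W[b] += eps_b * (x - W[b])
        W[s] += eps_n * (x - W[s])
    return W, edges
```

A full implementation additionally tracks per-node habituation (firing) counters and prunes stale edges, which is what lets the topology keep matching the input space over time.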
Creating global comparative analyses of tectonic rifts, monogenetic volcanism and inverted relief
NASA Astrophysics Data System (ADS)
van Wyk de Vries, Benjamin
2016-04-01
I have been all around the world, and to other planets, and have travelled from the present to the Archaean and back to seek out the most significant tectonic rifts, monogenetic volcanoes and examples of inverted relief. I have done this to provide a broad foundation for the comparative analysis supporting the Chaîne des Puys - Limagne fault nomination to UNESCO World Heritage. This would have been an impossible task, if not for the cooperation of the scientific community and for Google Earth, Google Maps and academic search engines. In preparing global comparisons of geological features, these quite recently developed tools provide a powerful way to find and describe geological features. The ability to do scientific crowd sourcing, rapidly discussing features with colleagues, allows large numbers of areas to be checked, and open GIS tools (such as Google Earth) allow a standardised description. Search engines also allow the literature on areas to be checked and compared. I will present a comparative study of the world's rifts, monogenetic volcanic fields and inverted relief, integrated to analyse the full geological system represented by the Chaîne des Puys - Limagne fault. The analysis confirms that the site is an exceptional example of the first steps of continental drift in a mountain rift setting, and that this is necessarily seen through the combined landscape of tectonic, volcanic and geomorphic features. The analysis goes further to deepen the understanding of geological systems and stresses the need for more study of geological heritage using such a global and broad systems approach.
Eyben, Florian; Weninger, Felix; Lehment, Nicolas; Schuller, Björn; Rigoll, Gerhard
2013-01-01
Without doubt general video and sound, as found in large multimedia archives, carry emotional information. Thus, audio and video retrieval by certain emotional categories or dimensions could play a central role for tomorrow's intelligent systems, enabling search for movies with a particular mood, computer aided scene and sound design in order to elicit certain emotions in the audience, etc. Yet, the lion's share of research in affective computing is exclusively focusing on signals conveyed by humans, such as affective speech. Uniting the fields of multimedia retrieval and affective computing is believed to lend to a multiplicity of interesting retrieval applications, and at the same time to benefit affective computing research, by moving its methodology “out of the lab” to real-world, diverse data. In this contribution, we address the problem of finding “disturbing” scenes in movies, a scenario that is highly relevant for computer-aided parental guidance. We apply large-scale segmental feature extraction combined with audio-visual classification to the particular task of detecting violence. Our system performs fully data-driven analysis including automatic segmentation. We evaluate the system in terms of mean average precision (MAP) on the official data set of the MediaEval 2012 evaluation campaign's Affect Task, which consists of 18 original Hollywood movies, achieving up to .398 MAP on unseen test data in full realism. An in-depth analysis of the worth of individual features with respect to the target class and the system errors is carried out and reveals the importance of peak-related audio feature extraction and low-level histogram-based video analysis. PMID:24391704
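Mean average precision over ranked retrieval results can be computed as sketched below; this is a generic illustration using scikit-learn, with random scores standing in for the audio-visual classifier outputs:

```python
import numpy as np
from sklearn.metrics import average_precision_score

# Placeholder: binary "violent" labels and classifier confidence scores
# for the segments of several movies (one array pair per movie).
rng = np.random.default_rng(0)
movies = [(rng.integers(0, 2, 200), rng.random(200)) for _ in range(18)]

# MAP = mean over queries (here, movies) of the average precision
# of each movie's ranked segment list.
mean_ap = np.mean([average_precision_score(y, s) for y, s in movies])
print(f"MAP: {mean_ap:.3f}")
```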
Weighted score-level feature fusion based on Dempster-Shafer evidence theory for action recognition
NASA Astrophysics Data System (ADS)
Zhang, Guoliang; Jia, Songmin; Li, Xiuzhi; Zhang, Xiangyin
2018-01-01
The majority of human action recognition methods use a multi-feature fusion strategy to improve classification performance, but the contribution of different features to specific actions has not received enough attention. We present an extendible and universal weighted score-level feature fusion method using the Dempster-Shafer (DS) evidence theory, based on the bag-of-visual-words pipeline. First, the partially distinctive samples in the training set are selected to construct the validation set. Then, local spatiotemporal features and pose features are extracted from these samples to obtain evidence information. The DS evidence theory and the proposed survival-of-the-fittest rule are employed to combine the evidence and calculate optimal weight vectors for every feature type belonging to each action class. Finally, the recognition results are obtained via a weighted summation strategy. The performance of the established recognition framework is evaluated on the Penn Action dataset and a subset of the joint-annotated human motion database (sub-JHMDB). The experimental results demonstrate that the proposed feature fusion method adequately exploits the complementarity among multiple features and improves upon most state-of-the-art algorithms on the Penn Action and sub-JHMDB datasets.
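Dempster's rule of combination, the core of the evidence-fusion step, can be sketched for discrete mass functions as follows. This is a generic implementation over frozenset-keyed masses, not the paper's weighted variant:

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions keyed by frozensets of class labels."""
    combined, conflict = {}, 0.0
    for (A, wa), (B, wb) in product(m1.items(), m2.items()):
        inter = A & B
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb          # mass falling on the empty set
    # Normalize by 1 - K, where K is the total conflicting mass.
    return {A: w / (1.0 - conflict) for A, w in combined.items()}

# Two feature channels expressing beliefs over actions {"run", "jump"}:
m_pose = {frozenset({"run"}): 0.6, frozenset({"run", "jump"}): 0.4}
m_stip = {frozenset({"jump"}): 0.3, frozenset({"run", "jump"}): 0.7}
print(dempster_combine(m_pose, m_stip))
```

Here the combined mass for {"run"} is 0.42/0.82 and for {"jump"} is 0.12/0.82, with conflict K = 0.18 normalized away.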
Cost-Sensitive Local Binary Feature Learning for Facial Age Estimation.
Lu, Jiwen; Liong, Venice Erin; Zhou, Jie
2015-12-01
In this paper, we propose a cost-sensitive local binary feature learning (CS-LBFL) method for facial age estimation. Unlike the conventional facial age estimation methods that employ hand-crafted descriptors or holistically learned descriptors for feature representation, our CS-LBFL method learns discriminative local features directly from raw pixels for face representation. Motivated by the fact that facial age estimation is a cost-sensitive computer vision problem and local binary features are more robust to illumination and expression variations than holistic features, we learn a series of hashing functions to project raw pixel values extracted from face patches into low-dimensional binary codes, where binary codes with similar chronological ages are projected as close as possible, and those with dissimilar chronological ages are projected as far as possible. Then, we pool and encode these local binary codes within each face image as a real-valued histogram feature for face representation. Moreover, we propose a cost-sensitive local binary multi-feature learning method to jointly learn multiple sets of hashing functions using face patches extracted from different scales to exploit complementary information. Our methods achieve competitive performance on four widely used face aging data sets.
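The projection-and-pooling idea can be illustrated with a generic sketch in which random projections stand in for the learned, cost-sensitive hashing functions; all array shapes and names here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 8                                 # bits per binary code
patches = rng.random((100, 81))       # 100 face patches of 9x9 raw pixels

# Stand-in for the learned hashing functions: one linear projection per
# bit (CS-LBFL learns these so that codes of similar ages stay close).
W = rng.standard_normal((81, K))
codes = (patches @ W > 0).astype(int)           # (100, K) binary codes

# Pool the codes over the image into a 2^K-bin histogram descriptor.
indices = codes @ (1 << np.arange(K))           # each code -> integer in [0, 255]
histogram = np.bincount(indices, minlength=2**K).astype(float)
histogram /= histogram.sum()                    # real-valued face representation
```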
Chang, Kaowen Grace; Chien, Hungju
2017-07-05
Studies have suggested that visiting and viewing landscaping at hospitals accelerates patients' recovery from surgery and helps staff recover from mental fatigue. To plan and construct such landscapes, we need to identify the landscape features desirable to different groups so that the space can benefit a wide range of hospital users. Using discrete choice modeling, we developed experimental choice sets to investigate how landscape features influence the visits of different users in a large regional hospital in Taiwan. The empirical survey provides quantitative estimates of the influence of each landscape feature on four user groups: patients, caregivers, staff, and neighborhood residents. Our findings suggest that different types of features promote visits from specific user groups. Landscape features facilitating physical activities effectively encourage visits across user groups, especially caregivers and staff. Patients in this study express a strong need for contact with nature. The nearby community favors features designed for children's play and family activities. People across user groups value features that provide a comfortable, moderated microclimate, such as shelter. Study implications and limitations are also discussed. Our study provides information essential for creating a better healing environment in a hospital setting.
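For reference, discrete choice experiments of this kind are typically estimated with a conditional logit, in which the probability that respondent n chooses alternative i from choice set C depends on the utilities of the feature attributes. This is the standard textbook formulation, not reproduced from the paper:

```latex
% Utility of alternative i for respondent n, and the resulting choice probability
U_{ni} = \beta^{\top} x_{ni} + \varepsilon_{ni}, \qquad
P_n(i \mid C) = \frac{\exp\!\left(\beta^{\top} x_{ni}\right)}
                     {\sum_{j \in C} \exp\!\left(\beta^{\top} x_{nj}\right)}
```

Here x_{ni} encodes the landscape-feature attributes of alternative i and the estimated coefficients beta quantify each feature's influence on visitation for a given user group.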
Improved classification accuracy by feature extraction using genetic algorithms
NASA Astrophysics Data System (ADS)
Patriarche, Julia; Manduca, Armando; Erickson, Bradley J.
2003-05-01
A feature extraction algorithm has been developed for the purpose of improving classification accuracy. The algorithm uses a genetic algorithm / hill-climber hybrid to generate a set of linearly recombined features, which may be of reduced dimensionality compared with the original set. The genetic algorithm performs the global exploration, and a hill climber explores local neighborhoods. Hybridizing the genetic algorithm with a hill climber improves both the rate of convergence and the final overall cost function value; it also reduces the sensitivity of the genetic algorithm to parameter selection. The genetic algorithm includes the operators crossover, mutation, and deletion/reactivation, the last of which effects dimensionality reduction. The feature extractor is supervised and is capable of deriving a separate feature space for each tissue (these are reintegrated during classification). A non-anatomical digital phantom was developed as a gold standard for testing purposes. In tests with the phantom, and with images of multiple sclerosis patients, classification with features derived by the feature extractor yielded lower error rates than classification using the standard pulse sequences or features derived by principal components analysis. Using the multiple sclerosis patient data, the algorithm resulted in a mean 31% reduction in classification error for pure tissues.
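A compact sketch of the hybrid search over linear feature transforms follows; the fitness function, rates, dimensions, and data are illustrative placeholders, and the paper's cost function and operator set (including deletion/reactivation) are richer:

```python
import numpy as np

rng = np.random.default_rng(1)
D_IN, D_OUT, POP = 26, 3, 40          # original dims, reduced dims, population

def fitness(W, X, y):
    """Placeholder cost: nearest-class-mean error in the projected space."""
    Z = X @ W.T
    means = np.array([Z[y == c].mean(axis=0) for c in np.unique(y)])
    pred = np.argmin(((Z[:, None, :] - means) ** 2).sum(-1), axis=1)
    return (pred != y).mean()

def hill_climb(W, X, y, steps=20, sigma=0.05):
    """Local refinement: keep a random perturbation only if it helps."""
    best = fitness(W, X, y)
    for _ in range(steps):
        W2 = W + sigma * rng.standard_normal(W.shape)
        f2 = fitness(W2, X, y)
        if f2 < best:
            W, best = W2, f2
    return W

X, y = rng.random((300, D_IN)), rng.integers(0, 3, 300)  # placeholder data
pop = [rng.standard_normal((D_OUT, D_IN)) for _ in range(POP)]
for gen in range(30):                                    # global GA exploration
    pop.sort(key=lambda W: fitness(W, X, y))
    parents = pop[:POP // 2]
    children = []
    for _ in range(POP - len(parents)):
        a, b = rng.choice(len(parents), 2, replace=False)
        mask = rng.random((D_OUT, D_IN)) < 0.5           # uniform crossover
        child = np.where(mask, parents[a], parents[b])
        child += 0.1 * rng.standard_normal(child.shape)  # mutation
        children.append(child)
    pop = parents + children
W_best = hill_climb(pop[0], X, y)                        # local hill-climb polish
```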
Asymmetric bagging and feature selection for activities prediction of drug molecules.
Li, Guo-Zheng; Meng, Hao-Hua; Lu, Wen-Cong; Yang, Jack Y; Yang, Mary Qu
2008-05-28
Activities of drug molecules can be predicted by QSAR (quantitative structure-activity relationship) models, which avoid the high cost and long cycle of the traditional experimental method. Because the number of drug molecules with positive activity is much smaller than the number of negatives, it is important to predict molecular activities with this unbalanced situation in mind. Here, asymmetric bagging and feature selection are introduced into the problem, and asymmetric bagging of support vector machines (asBagging) is proposed for predicting drug activities under class imbalance. At the same time, the features extracted from the structures of drug molecules affect the prediction accuracy of QSAR models. Therefore, a novel algorithm named PRIFEAB is proposed, which applies an embedded feature selection method to remove redundant and irrelevant features for asBagging. Numerical experiments on a data set of molecular activities show that asBagging improves the AUC and sensitivity values of molecular activity prediction, and that PRIFEAB, with its feature selection, further improves the prediction ability. Asymmetric bagging can thus improve the prediction accuracy for activities of drug molecules, and performing feature selection to retain only relevant features from the drug molecule data sets improves it further.
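The essence of asymmetric bagging is that every bag keeps all positive (minority) samples and bootstrap-samples only the negatives. A minimal scikit-learn sketch, with placeholder data and bag count:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_pos = rng.random((30, 50))    # few active molecules (minority class)
X_neg = rng.random((570, 50))   # many inactive molecules (majority class)

models = []
for _ in range(25):             # one SVM per asymmetric bag
    idx = rng.choice(len(X_neg), size=len(X_pos), replace=True)
    X_bag = np.vstack([X_pos, X_neg[idx]])   # all positives + sampled negatives
    y_bag = np.r_[np.ones(len(X_pos)), np.zeros(len(X_pos))]
    models.append(SVC().fit(X_bag, y_bag))

def predict(X):
    """Aggregate the ensemble by averaging the SVMs' decision values."""
    scores = np.mean([m.decision_function(X) for m in models], axis=0)
    return (scores > 0).astype(int)
```

Each bag is balanced, so no single SVM is swamped by the negatives, while the ensemble as a whole still sees most of the negative data.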
Li, Jun; Xie, Changjian; Guo, Hua
2017-08-30
A full-dimensional accurate potential energy surface (PES) for the C(³P) + H₂O reaction is developed based on ∼34 000 data points calculated at the level of the explicitly correlated unrestricted coupled cluster method with single, double, and perturbative triple excitations with the augmented correlation-consistent polarized triple zeta basis set (CCSD(T)-F12a/AVTZ). The PES is invariant with respect to the permutation of the two hydrogen atoms, and the total root mean square error (RMSE) of the fit is only 0.31 kcal mol⁻¹. The PES features two barriers in the entrance channel and several potential minima, as well as multiple product channels. The rate coefficients of this reaction, calculated using transition-state theory and the quasi-classical trajectory (QCT) method, are small near room temperature, consistent with experiments. The reaction dynamics is also investigated with QCT on the new PES, which shows that the reactivity is constrained by the entrance barriers and that the final product branching is not statistical.
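For reference, the conventional transition-state-theory rate coefficient used for such estimates takes the standard Eyring form; this is a textbook expression, not the paper's specific parametrization:

```latex
% Conventional TST rate coefficient at temperature T:
% Q^{\ddagger}, Q_R = partition functions of transition state and reactants,
% E_0 = barrier height, \kappa(T) = tunneling correction.
k(T) = \kappa(T)\, \frac{k_{\mathrm{B}} T}{h}\,
       \frac{Q^{\ddagger}(T)}{Q_{\mathrm{R}}(T)}\,
       \exp\!\left(-\frac{E_0}{k_{\mathrm{B}} T}\right)
```

With sizable entrance-channel barriers, the exponential factor keeps k(T) small near room temperature, which is consistent with the small rate coefficients reported above.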