Efficient Fingercode Classification
NASA Astrophysics Data System (ADS)
Sun, Hong-Wei; Law, Kwok-Yan; Gollmann, Dieter; Chung, Siu-Leung; Li, Jian-Bin; Sun, Jia-Guang
In this paper, we present an efficient fingerprint classification algorithm which is an essential component in many critical security application systems e. g. systems in the e-government and e-finance domains. Fingerprint identification is one of the most important security requirements in homeland security systems such as personnel screening and anti-money laundering. The problem of fingerprint identification involves searching (matching) the fingerprint of a person against each of the fingerprints of all registered persons. To enhance performance and reliability, a common approach is to reduce the search space by firstly classifying the fingerprints and then performing the search in the respective class. Jain et al. proposed a fingerprint classification algorithm based on a two-stage classifier, which uses a K-nearest neighbor classifier in its first stage. The fingerprint classification algorithm is based on the fingercode representation which is an encoding of fingerprints that has been demonstrated to be an effective fingerprint biometric scheme because of its ability to capture both local and global details in a fingerprint image. We enhance this approach by improving the efficiency of the K-nearest neighbor classifier for fingercode-based fingerprint classification. Our research firstly investigates the various fast search algorithms in vector quantization (VQ) and the potential application in fingerprint classification, and then proposes two efficient algorithms based on the pyramid-based search algorithms in VQ. Experimental results on DB1 of FVC 2004 demonstrate that our algorithms can outperform the full search algorithm and the original pyramid-based search algorithms in terms of computational efficiency without sacrificing accuracy.
A fingerprint classification algorithm based on combination of local and global information
NASA Astrophysics Data System (ADS)
Liu, Chongjin; Fu, Xiang; Bian, Junjie; Feng, Jufu
2011-12-01
Fingerprint recognition is one of the most important technologies in biometric identification and has been wildly applied in commercial and forensic areas. Fingerprint classification, as the fundamental procedure in fingerprint recognition, can sharply decrease the quantity for fingerprint matching and improve the efficiency of fingerprint recognition. Most fingerprint classification algorithms are based on the number and position of singular points. Because the singular points detecting method only considers the local information commonly, the classification algorithms are sensitive to noise. In this paper, we propose a novel fingerprint classification algorithm combining the local and global information of fingerprint. Firstly we use local information to detect singular points and measure their quality considering orientation structure and image texture in adjacent areas. Furthermore the global orientation model is adopted to measure the reliability of singular points group. Finally the local quality and global reliability is weighted to classify fingerprint. Experiments demonstrate the accuracy and effectivity of our algorithm especially for the poor quality fingerprint images.
Fingerprint recognition of wavelet-based compressed images by neuro-fuzzy clustering
NASA Astrophysics Data System (ADS)
Liu, Ti C.; Mitra, Sunanda
1996-06-01
Image compression plays a crucial role in many important and diverse applications requiring efficient storage and transmission. This work mainly focuses on a wavelet transform (WT) based compression of fingerprint images and the subsequent classification of the reconstructed images. The algorithm developed involves multiresolution wavelet decomposition, uniform scalar quantization, entropy and run- length encoder/decoder and K-means clustering of the invariant moments as fingerprint features. The performance of the WT-based compression algorithm has been compared with JPEG current image compression standard. Simulation results show that WT outperforms JPEG in high compression ratio region and the reconstructed fingerprint image yields proper classification.
Hierarchical minutiae matching for fingerprint and palmprint identification.
Chen, Fanglin; Huang, Xiaolin; Zhou, Jie
2013-12-01
Fingerprints and palmprints are the most common authentic biometrics for personal identification, especially for forensic security. Previous research have been proposed to speed up the searching process in fingerprint and palmprint identification systems, such as those based on classification or indexing, in which the deterioration of identification accuracy is hard to avert. In this paper, a novel hierarchical minutiae matching algorithm for fingerprint and palmprint identification systems is proposed. This method decomposes the matching step into several stages and rejects many false fingerprints or palmprints on different stages, thus it can save much time while preserving a high identification rate. Experimental results show that the proposed algorithm can save almost 50% searching time compared with traditional methods and illustrate its effectiveness.
Detection and Rectification of Distorted Fingerprints.
Si, Xuanbin; Feng, Jianjiang; Zhou, Jie; Luo, Yuxuan
2015-03-01
Elastic distortion of fingerprints is one of the major causes for false non-match. While this problem affects all fingerprint recognition applications, it is especially dangerous in negative recognition applications, such as watchlist and deduplication applications. In such applications, malicious users may purposely distort their fingerprints to evade identification. In this paper, we proposed novel algorithms to detect and rectify skin distortion based on a single fingerprint image. Distortion detection is viewed as a two-class classification problem, for which the registered ridge orientation map and period map of a fingerprint are used as the feature vector and a SVM classifier is trained to perform the classification task. Distortion rectification (or equivalently distortion field estimation) is viewed as a regression problem, where the input is a distorted fingerprint and the output is the distortion field. To solve this problem, a database (called reference database) of various distorted reference fingerprints and corresponding distortion fields is built in the offline stage, and then in the online stage, the nearest neighbor of the input fingerprint is found in the reference database and the corresponding distortion field is used to transform the input fingerprint into a normal one. Promising results have been obtained on three databases containing many distorted fingerprints, namely FVC2004 DB1, Tsinghua Distorted Fingerprint database, and the NIST SD27 latent fingerprint database.
De Martino, Federico; Gentile, Francesco; Esposito, Fabrizio; Balsi, Marco; Di Salle, Francesco; Goebel, Rainer; Formisano, Elia
2007-01-01
We present a general method for the classification of independent components (ICs) extracted from functional MRI (fMRI) data sets. The method consists of two steps. In the first step, each fMRI-IC is associated with an IC-fingerprint, i.e., a representation of the component in a multidimensional space of parameters. These parameters are post hoc estimates of global properties of the ICs and are largely independent of a specific experimental design and stimulus timing. In the second step a machine learning algorithm automatically separates the IC-fingerprints into six general classes after preliminary training performed on a small subset of expert-labeled components. We illustrate this approach in a multisubject fMRI study employing visual structure-from-motion stimuli encoding faces and control random shapes. We show that: (1) IC-fingerprints are a valuable tool for the inspection, characterization and selection of fMRI-ICs and (2) automatic classifications of fMRI-ICs in new subjects present a high correspondence with those obtained by expert visual inspection of the components. Importantly, our classification procedure highlights several neurophysiologically interesting processes. The most intriguing of which is reflected, with high intra- and inter-subject reproducibility, in one IC exhibiting a transiently task-related activation in the 'face' region of the primary sensorimotor cortex. This suggests that in addition to or as part of the mirror system, somatotopic regions of the sensorimotor cortex are involved in disambiguating the perception of a moving body part. Finally, we show that the same classification algorithm can be successfully applied, without re-training, to fMRI collected using acquisition parameters, stimulation modality and timing considerably different from those used for training.
Three-dimensional fingerprint recognition by using convolution neural network
NASA Astrophysics Data System (ADS)
Tian, Qianyu; Gao, Nan; Zhang, Zonghua
2018-01-01
With the development of science and technology and the improvement of social information, fingerprint recognition technology has become a hot research direction and been widely applied in many actual fields because of its feasibility and reliability. The traditional two-dimensional (2D) fingerprint recognition method relies on matching feature points. This method is not only time-consuming, but also lost three-dimensional (3D) information of fingerprint, with the fingerprint rotation, scaling, damage and other issues, a serious decline in robustness. To solve these problems, 3D fingerprint has been used to recognize human being. Because it is a new research field, there are still lots of challenging problems in 3D fingerprint recognition. This paper presents a new 3D fingerprint recognition method by using a convolution neural network (CNN). By combining 2D fingerprint and fingerprint depth map into CNN, and then through another CNN feature fusion, the characteristics of the fusion complete 3D fingerprint recognition after classification. This method not only can preserve 3D information of fingerprints, but also solves the problem of CNN input. Moreover, the recognition process is simpler than traditional feature point matching algorithm. 3D fingerprint recognition rate by using CNN is compared with other fingerprint recognition algorithms. The experimental results show that the proposed 3D fingerprint recognition method has good recognition rate and robustness.
Improvement of an algorithm for recognition of liveness using perspiration in fingerprint devices
NASA Astrophysics Data System (ADS)
Parthasaradhi, Sujan T.; Derakhshani, Reza; Hornak, Lawrence A.; Schuckers, Stephanie C.
2004-08-01
Previous work in our laboratory and others have demonstrated that spoof fingers made of a variety of materials including silicon, Play-Doh, clay, and gelatin (gummy finger) can be scanned and verified when compared to a live enrolled finger. Liveness, i.e. to determine whether the introduced biometric is coming from a live source, has been suggested as a means to circumvent attacks using spoof fingers. We developed a new liveness method based on perspiration changes in the fingerprint image. Recent results showed approximately 90% classification rate using different classification methods for various technologies including optical, electro-optical, and capacitive DC, a shorter time window and a diverse dataset. This paper focuses on improvement of the live classification rate by using a weight decay method during the training phase in order to improve the generalization and reduce the variance of the neural network based classifier. The dataset included fingerprint images from 33 live subjects, 33 spoofs created with dental impression material and Play-Doh, and fourteen cadaver fingers. 100% live classification was achieved with 81.8 to 100% spoof classification, depending on the device technology. The weight-decay method improves upon past reports by increasing the live and spoof classification rate.
AllergenFP: allergenicity prediction by descriptor fingerprints.
Dimitrov, Ivan; Naneva, Lyudmila; Doytchinova, Irini; Bangov, Ivan
2014-03-15
Allergenicity, like antigenicity and immunogenicity, is a property encoded linearly and non-linearly, and therefore the alignment-based approaches are not able to identify this property unambiguously. A novel alignment-free descriptor-based fingerprint approach is presented here and applied to identify allergens and non-allergens. The approach was implemented into a four step algorithm. Initially, the protein sequences are described by amino acid principal properties as hydrophobicity, size, relative abundance, helix and β-strand forming propensities. Then, the generated strings of different length are converted into vectors with equal length by auto- and cross-covariance (ACC). The vectors were transformed into binary fingerprints and compared in terms of Tanimoto coefficient. The approach was applied to a set of 2427 known allergens and 2427 non-allergens and identified correctly 88% of them with Matthews correlation coefficient of 0.759. The descriptor fingerprint approach presented here is universal. It could be applied for any classification problem in computational biology. The set of E-descriptors is able to capture the main structural and physicochemical properties of amino acids building the proteins. The ACC transformation overcomes the main problem in the alignment-based comparative studies arising from the different length of the aligned protein sequences. The conversion of protein ACC values into binary descriptor fingerprints allows similarity search and classification. The algorithm described in the present study was implemented in a specially designed Web site, named AllergenFP (FP stands for FingerPrint). AllergenFP is written in Python, with GIU in HTML. It is freely accessible at http://ddg-pharmfac.net/Allergen FP. idoytchinova@pharmfac.net or ivanbangov@shu-bg.net.
Sparse modeling applied to patient identification for safety in medical physics applications
NASA Astrophysics Data System (ADS)
Lewkowitz, Stephanie
Every scheduled treatment at a radiation therapy clinic involves a series of safety protocol to ensure the utmost patient care. Despite safety protocol, on a rare occasion an entirely preventable medical event, an accident, may occur. Delivering a treatment plan to the wrong patient is preventable, yet still is a clinically documented error. This research describes a computational method to identify patients with a novel machine learning technique to combat misadministration. The patient identification program stores face and fingerprint data for each patient. New, unlabeled data from those patients are categorized according to the library. The categorization of data by this face-fingerprint detector is accomplished with new machine learning algorithms based on Sparse Modeling that have already begun transforming the foundation of Computer Vision. Previous patient recognition software required special subroutines for faces and different tailored subroutines for fingerprints. In this research, the same exact model is used for both fingerprints and faces, without any additional subroutines and even without adjusting the two hyperparameters. Sparse modeling is a powerful tool, already shown utility in the areas of super-resolution, denoising, inpainting, demosaicing, and sub-nyquist sampling, i.e. compressed sensing. Sparse Modeling is possible because natural images are inherently sparse in some bases, due to their inherent structure. This research chooses datasets of face and fingerprint images to test the patient identification model. The model stores the images of each dataset as a basis (library). One image at a time is removed from the library, and is classified by a sparse code in terms of the remaining library. The Locally Competitive Algorithm, a truly neural inspired Artificial Neural Network, solves the computationally difficult task of finding the sparse code for the test image. The components of the sparse representation vector are summed by ℓ1 pooling, and correct patient identification is consistently achieved 100% over 1000 trials, when either the face data or fingerprint data are implemented as a classification basis. The algorithm gets 100% classification when faces and fingerprints are concatenated into multimodal datasets. This suggests that 100% patient identification will be achievable in the clinal setting.
Optical Methods in Fingerprint Imaging for Medical and Personality Applications
Wang, Jing-Wein; Lin, Ming-Hsun; Chang, Yao-Lang; Kuo, Chia-Ming
2017-01-01
Over the years, analysis and induction of personality traits has been a topic for individual subjective conjecture or speculation, rather than a focus of inductive scientific analysis. This study proposes a novel framework for analysis and induction of personality traits. First, 14 personality constructs based on the “Big Five” personality factors were developed. Next, a new fingerprint image algorithm was used for classification, and the fingerprints were classified into eight types. The relationship between personality traits and fingerprint type was derived from the results of the questionnaire survey. After comparison of pre-test and post-test results, this study determined the induction ability of personality traits from fingerprint type. Experimental results showed that the left/right thumbprint type of a majority of subjects was left loop/right loop and that the personalities of individuals with this fingerprint type were moderate with no significant differences in the 14 personality constructs. PMID:29065556
Optical Methods in Fingerprint Imaging for Medical and Personality Applications.
Wang, Chia-Nan; Wang, Jing-Wein; Lin, Ming-Hsun; Chang, Yao-Lang; Kuo, Chia-Ming
2017-10-23
Over the years, analysis and induction of personality traits has been a topic for individual subjective conjecture or speculation, rather than a focus of inductive scientific analysis. This study proposes a novel framework for analysis and induction of personality traits. First, 14 personality constructs based on the "Big Five" personality factors were developed. Next, a new fingerprint image algorithm was used for classification, and the fingerprints were classified into eight types. The relationship between personality traits and fingerprint type was derived from the results of the questionnaire survey. After comparison of pre-test and post-test results, this study determined the induction ability of personality traits from fingerprint type. Experimental results showed that the left/right thumbprint type of a majority of subjects was left loop/right loop and that the personalities of individuals with this fingerprint type were moderate with no significant differences in the 14 personality constructs.
An Improved WiFi Indoor Positioning Algorithm by Weighted Fusion
Ma, Rui; Guo, Qiang; Hu, Changzhen; Xue, Jingfeng
2015-01-01
The rapid development of mobile Internet has offered the opportunity for WiFi indoor positioning to come under the spotlight due to its low cost. However, nowadays the accuracy of WiFi indoor positioning cannot meet the demands of practical applications. To solve this problem, this paper proposes an improved WiFi indoor positioning algorithm by weighted fusion. The proposed algorithm is based on traditional location fingerprinting algorithms and consists of two stages: the offline acquisition and the online positioning. The offline acquisition process selects optimal parameters to complete the signal acquisition, and it forms a database of fingerprints by error classification and handling. To further improve the accuracy of positioning, the online positioning process first uses a pre-match method to select the candidate fingerprints to shorten the positioning time. After that, it uses the improved Euclidean distance and the improved joint probability to calculate two intermediate results, and further calculates the final result from these two intermediate results by weighted fusion. The improved Euclidean distance introduces the standard deviation of WiFi signal strength to smooth the WiFi signal fluctuation and the improved joint probability introduces the logarithmic calculation to reduce the difference between probability values. Comparing the proposed algorithm, the Euclidean distance based WKNN algorithm and the joint probability algorithm, the experimental results indicate that the proposed algorithm has higher positioning accuracy. PMID:26334278
An Improved WiFi Indoor Positioning Algorithm by Weighted Fusion.
Ma, Rui; Guo, Qiang; Hu, Changzhen; Xue, Jingfeng
2015-08-31
The rapid development of mobile Internet has offered the opportunity for WiFi indoor positioning to come under the spotlight due to its low cost. However, nowadays the accuracy of WiFi indoor positioning cannot meet the demands of practical applications. To solve this problem, this paper proposes an improved WiFi indoor positioning algorithm by weighted fusion. The proposed algorithm is based on traditional location fingerprinting algorithms and consists of two stages: the offline acquisition and the online positioning. The offline acquisition process selects optimal parameters to complete the signal acquisition, and it forms a database of fingerprints by error classification and handling. To further improve the accuracy of positioning, the online positioning process first uses a pre-match method to select the candidate fingerprints to shorten the positioning time. After that, it uses the improved Euclidean distance and the improved joint probability to calculate two intermediate results, and further calculates the final result from these two intermediate results by weighted fusion. The improved Euclidean distance introduces the standard deviation of WiFi signal strength to smooth the WiFi signal fluctuation and the improved joint probability introduces the logarithmic calculation to reduce the difference between probability values. Comparing the proposed algorithm, the Euclidean distance based WKNN algorithm and the joint probability algorithm, the experimental results indicate that the proposed algorithm has higher positioning accuracy.
Adaptive fuzzy leader clustering of complex data sets in pattern recognition
NASA Technical Reports Server (NTRS)
Newton, Scott C.; Pemmaraju, Surya; Mitra, Sunanda
1992-01-01
A modular, unsupervised neural network architecture for clustering and classification of complex data sets is presented. The adaptive fuzzy leader clustering (AFLC) architecture is a hybrid neural-fuzzy system that learns on-line in a stable and efficient manner. The initial classification is performed in two stages: a simple competitive stage and a distance metric comparison stage. The cluster prototypes are then incrementally updated by relocating the centroid positions from fuzzy C-means system equations for the centroids and the membership values. The AFLC algorithm is applied to the Anderson Iris data and laser-luminescent fingerprint image data. It is concluded that the AFLC algorithm successfully classifies features extracted from real data, discrete or continuous.
An effective one-dimensional anisotropic fingerprint enhancement algorithm
NASA Astrophysics Data System (ADS)
Ye, Zhendong; Xie, Mei
2012-01-01
Fingerprint identification is one of the most important biometric technologies. The performance of the minutiae extraction and the speed of the fingerprint verification system rely heavily on the quality of the input fingerprint images, so the enhancement of the low fingerprint is a critical and difficult step in a fingerprint verification system. In this paper we proposed an effective algorithm for fingerprint enhancement. Firstly we use normalization algorithm to reduce the variations in gray level values along ridges and valleys. Then we utilize the structure tensor approach to estimate each pixel of the fingerprint orientations. At last we propose a novel algorithm which combines the advantages of onedimensional Gabor filtering method and anisotropic method to enhance the fingerprint in recoverable region. The proposed algorithm has been evaluated on the database of Fingerprint Verification Competition 2004, and the results show that our algorithm performs within less time.
An effective one-dimensional anisotropic fingerprint enhancement algorithm
NASA Astrophysics Data System (ADS)
Ye, Zhendong; Xie, Mei
2011-12-01
Fingerprint identification is one of the most important biometric technologies. The performance of the minutiae extraction and the speed of the fingerprint verification system rely heavily on the quality of the input fingerprint images, so the enhancement of the low fingerprint is a critical and difficult step in a fingerprint verification system. In this paper we proposed an effective algorithm for fingerprint enhancement. Firstly we use normalization algorithm to reduce the variations in gray level values along ridges and valleys. Then we utilize the structure tensor approach to estimate each pixel of the fingerprint orientations. At last we propose a novel algorithm which combines the advantages of onedimensional Gabor filtering method and anisotropic method to enhance the fingerprint in recoverable region. The proposed algorithm has been evaluated on the database of Fingerprint Verification Competition 2004, and the results show that our algorithm performs within less time.
Ristivojević, Petar; Trifković, Jelena; Vovk, Irena; Milojković-Opsenica, Dušanka
2017-01-01
Considering the introduction of phytochemical fingerprint analysis, as a method of screening the complex natural products for the presence of most bioactive compounds, use of chemometric classification methods, application of powerful scanning and image capturing and processing devices and algorithms, advancement in development of novel stationary phases as well as various separation modalities, high-performance thin-layer chromatography (HPTLC) fingerprinting is becoming attractive and fruitful field of separation science. Multivariate image analysis is crucial in the light of proper data acquisition. In a current study, different image processing procedures were studied and compared in detail on the example of HPTLC chromatograms of plant resins. In that sense, obtained variables such as gray intensities of pixels along the solvent front, peak area and mean values of peak were used as input data and compared to obtained best classification models. Important steps in image analysis, baseline removal, denoising, target peak alignment and normalization were pointed out. Numerical data set based on mean value of selected bands and intensities of pixels along the solvent front proved to be the most convenient for planar-chromatographic profiling, although required at least the basic knowledge on image processing methodology, and could be proposed for further investigation in HPLTC fingerprinting. Copyright © 2016 Elsevier B.V. All rights reserved.
A new algorithm for distorted fingerprints matching based on normalized fuzzy similarity measure.
Chen, Xinjian; Tian, Jie; Yang, Xin
2006-03-01
Coping with nonlinear distortions in fingerprint matching is a challenging task. This paper proposes a novel algorithm, normalized fuzzy similarity measure (NFSM), to deal with the nonlinear distortions. The proposed algorithm has two main steps. First, the template and input fingerprints were aligned. In this process, the local topological structure matching was introduced to improve the robustness of global alignment. Second, the method NFSM was introduced to compute the similarity between the template and input fingerprints. The proposed algorithm was evaluated on fingerprints databases of FVC2004. Experimental results confirm that NFSM is a reliable and effective algorithm for fingerprint matching with nonliner distortions. The algorithm gives considerably higher matching scores compared to conventional matching algorithms for the deformed fingerprints.
Johnson, LeeAnn K; Brown, Mary B; Carruthers, Ethan A; Ferguson, John A; Dombek, Priscilla E; Sadowsky, Michael J
2004-08-01
A horizontal, fluorophore-enhanced, repetitive extragenic palindromic-PCR (rep-PCR) DNA fingerprinting technique (HFERP) was developed and evaluated as a means to differentiate human from animal sources of Escherichia coli. Box A1R primers and PCR were used to generate 2,466 rep-PCR and 1,531 HFERP DNA fingerprints from E. coli strains isolated from fecal material from known human and 12 animal sources: dogs, cats, horses, deer, geese, ducks, chickens, turkeys, cows, pigs, goats, and sheep. HFERP DNA fingerprinting reduced within-gel grouping of DNA fingerprints and improved alignment of DNA fingerprints between gels, relative to that achieved using rep-PCR DNA fingerprinting. Jackknife analysis of the complete rep-PCR DNA fingerprint library, done using Pearson's product-moment correlation coefficient, indicated that animal and human isolates were assigned to the correct source groups with an 82.2% average rate of correct classification. However, when only unique isolates were examined, isolates from a single animal having a unique DNA fingerprint, Jackknife analysis showed that isolates were assigned to the correct source groups with a 60.5% average rate of correct classification. The percentages of correctly classified isolates were about 15 and 17% greater for rep-PCR and HFERP, respectively, when analyses were done using the curve-based Pearson's product-moment correlation coefficient, rather than the band-based Jaccard algorithm. Rarefaction analysis indicated that, despite the relatively large size of the known-source database, genetic diversity in E. coli was very great and is most likely accounting for our inability to correctly classify many environmental E. coli isolates. Our data indicate that removal of duplicate genotypes within DNA fingerprint libraries, increased database size, proper methods of statistical analysis, and correct alignment of band data within and between gels improve the accuracy of microbial source tracking methods.
Score-Level Fusion of Phase-Based and Feature-Based Fingerprint Matching Algorithms
NASA Astrophysics Data System (ADS)
Ito, Koichi; Morita, Ayumi; Aoki, Takafumi; Nakajima, Hiroshi; Kobayashi, Koji; Higuchi, Tatsuo
This paper proposes an efficient fingerprint recognition algorithm combining phase-based image matching and feature-based matching. In our previous work, we have already proposed an efficient fingerprint recognition algorithm using Phase-Only Correlation (POC), and developed commercial fingerprint verification units for access control applications. The use of Fourier phase information of fingerprint images makes it possible to achieve robust recognition for weakly impressed, low-quality fingerprint images. This paper presents an idea of improving the performance of POC-based fingerprint matching by combining it with feature-based matching, where feature-based matching is introduced in order to improve recognition efficiency for images with nonlinear distortion. Experimental evaluation using two different types of fingerprint image databases demonstrates efficient recognition performance of the combination of the POC-based algorithm and the feature-based algorithm.
Yang, Jun-Ho; Yoh, Jack J
2018-01-01
A novel technique is reported for separating overlapping latent fingerprints using chemometric approaches that combine laser-induced breakdown spectroscopy (LIBS) and multivariate analysis. The LIBS technique provides the capability of real time analysis and high frequency scanning as well as the data regarding the chemical composition of overlapping latent fingerprints. These spectra offer valuable information for the classification and reconstruction of overlapping latent fingerprints by implementing appropriate statistical multivariate analysis. The current study employs principal component analysis and partial least square methods for the classification of latent fingerprints from the LIBS spectra. This technique was successfully demonstrated through a classification study of four distinct latent fingerprints using classification methods such as soft independent modeling of class analogy (SIMCA) and partial least squares discriminant analysis (PLS-DA). The novel method yielded an accuracy of more than 85% and was proven to be sufficiently robust. Furthermore, through laser scanning analysis at a spatial interval of 125 µm, the overlapping fingerprints were reconstructed as separate two-dimensional forms.
Fingerprint separation: an application of ICA
NASA Astrophysics Data System (ADS)
Singh, Meenakshi; Singh, Deepak Kumar; Kalra, Prem Kumar
2008-04-01
Among all existing biometric techniques, fingerprint-based identification is the oldest method, which has been successfully used in numerous applications. Fingerprint-based identification is the most recognized tool in biometrics because of its reliability and accuracy. Fingerprint identification is done by matching questioned and known friction skin ridge impressions from fingers, palms, and toes to determine if the impressions are from the same finger (or palm, toe, etc.). There are many fingerprint matching algorithms which automate and facilitate the job of fingerprint matching, but for any of these algorithms matching can be difficult if the fingerprints are overlapped or mixed. In this paper, we have proposed a new algorithm for separating overlapped or mixed fingerprints so that the performance of the matching algorithms will improve when they are fed with these inputs. Independent Component Analysis (ICA) has been used as a tool to separate the overlapped or mixed fingerprints.
NASA Astrophysics Data System (ADS)
Zhao, Tieyu; Ran, Qiwen; Yuan, Lin; Chi, Yingying; Ma, Jing
2015-09-01
In this paper, a novel image encryption system with fingerprint used as a secret key is proposed based on the phase retrieval algorithm and RSA public key algorithm. In the system, the encryption keys include the fingerprint and the public key of RSA algorithm, while the decryption keys are the fingerprint and the private key of RSA algorithm. If the users share the fingerprint, then the system will meet the basic agreement of asymmetric cryptography. The system is also applicable for the information authentication. The fingerprint as secret key is used in both the encryption and decryption processes so that the receiver can identify the authenticity of the ciphertext by using the fingerprint in decryption process. Finally, the simulation results show the validity of the encryption scheme and the high robustness against attacks based on the phase retrieval technique.
Martins, Lucia Regina Rocha; Pereira-Filho, Edenir Rodrigues; Cass, Quezia Bezerra
2011-04-01
Taking in consideration the global analysis of complex samples, proposed by the metabolomic approach, the chromatographic fingerprint encompasses an attractive chemical characterization of herbal medicines. Thus, it can be used as a tool in quality control analysis of phytomedicines. The generated multivariate data are better evaluated by chemometric analyses, and they can be modeled by classification methods. "Stone breaker" is a popular Brazilian plant of Phyllanthus genus, used worldwide to treat renal calculus, hepatitis, and many other diseases. In this study, gradient elution at reversed-phase conditions with detection at ultraviolet region were used to obtain chemical profiles (fingerprints) of botanically identified samples of six Phyllanthus species. The obtained chromatograms, at 275 nm, were organized in data matrices, and the time shifts of peaks were adjusted using the Correlation Optimized Warping algorithm. Principal Component Analyses were performed to evaluate similarities among cultivated and uncultivated samples and the discrimination among the species and, after that, the samples were used to compose three classification models using Soft Independent Modeling of Class analogy, K-Nearest Neighbor, and Partial Least Squares for Discriminant Analysis. The ability of classification models were discussed after their successful application for authenticity evaluation of 25 commercial samples of "stone breaker."
Gabor filter based fingerprint image enhancement
NASA Astrophysics Data System (ADS)
Wang, Jin-Xiang
2013-03-01
Fingerprint recognition technology has become the most reliable biometric technology due to its uniqueness and invariance, which has been most convenient and most reliable technique for personal authentication. The development of Automated Fingerprint Identification System is an urgent need for modern information security. Meanwhile, fingerprint preprocessing algorithm of fingerprint recognition technology has played an important part in Automatic Fingerprint Identification System. This article introduces the general steps in the fingerprint recognition technology, namely the image input, preprocessing, feature recognition, and fingerprint image enhancement. As the key to fingerprint identification technology, fingerprint image enhancement affects the accuracy of the system. It focuses on the characteristics of the fingerprint image, Gabor filters algorithm for fingerprint image enhancement, the theoretical basis of Gabor filters, and demonstration of the filter. The enhancement algorithm for fingerprint image is in the windows XP platform with matlab.65 as a development tool for the demonstration. The result shows that the Gabor filter is effective in fingerprint image enhancement technology.
Nawaz, Tabassam; Mehmood, Zahid; Rashid, Muhammad; Habib, Hafiz Adnan
2018-01-01
Recent research on speech segregation and music fingerprinting has led to improvements in speech segregation and music identification algorithms. Speech and music segregation generally involves the identification of music followed by speech segregation. However, music segregation becomes a challenging task in the presence of noise. This paper proposes a novel method of speech segregation for unlabelled stationary noisy audio signals using the deep belief network (DBN) model. The proposed method successfully segregates a music signal from noisy audio streams. A recurrent neural network (RNN)-based hidden layer segregation model is applied to remove stationary noise. Dictionary-based fisher algorithms are employed for speech classification. The proposed method is tested on three datasets (TIMIT, MIR-1K, and MusicBrainz), and the results indicate the robustness of proposed method for speech segregation. The qualitative and quantitative analysis carried out on three datasets demonstrate the efficiency of the proposed method compared to the state-of-the-art speech segregation and classification-based methods. PMID:29558485
Jan Evangelista Purkynje (1787-1869): first to describe fingerprints.
Grzybowski, Andrzej; Pietrzak, Krzysztof
2015-01-01
Fingerprints have been used for years as the accepted tool in criminology and for identification. The first system of classification of fingerprints was introduced by Jan Evangelista Purkynje (1787-1869), a Czech physiologist, in 1823. He divided the papillary lines into nine types, based on their geometric arrangement. This work, however, was not recognized internationally for many years. In 1858, Sir William Herschel (1833-1917) registered fingerprints for those signing documents at the Indian magistrate's office in Jungipoor. Henry Faulds (1843-1930) in 1880 proposed using ink for fingerprint determination and people identification, and Francis Galton (1822-1911) collected 8000 fingerprints and developed their classification based on the spirals, loops, and arches. In 1892, Juan Vucetich (1858-1925) created his own fingerprint identification system and proved that a woman was responsible for killing two of her sons. In 1896, a London police officer Edward Henry (1850-1931) expanded on earlier systems of classification and used papillary lines to identify criminals; it was his system that was adopted by the forensic world. The work of Jan Evangelista Purkynje (1787-1869) (Figure 1), who in 1823 was the first to describe in detail fingerprints, is almost forgotten. He also established their classification. The year 2013 marked the 190th anniversary of the publication of his work on this topic. Our contribution is an attempt to introduce the reader to this scientist and his discoveries in the field of fingerprint identification. Copyright © 2015.
NASA Astrophysics Data System (ADS)
Wang, Jianfeng; Lin, Kan; Zheng, Wei; Yu Ho, Khek; Teh, Ming; Guan Yeoh, Khay; Huang, Zhiwei
2015-08-01
This work aims to evaluate clinical value of a fiber-optic Raman spectroscopy technique developed for in vivo diagnosis of esophageal squamous cell carcinoma (ESCC) during clinical endoscopy. We have developed a rapid fiber-optic Raman endoscopic system capable of simultaneously acquiring both fingerprint (FP)(800-1800 cm-1) and high-wavenumber (HW)(2800-3600 cm-1) Raman spectra from esophageal tissue in vivo. A total of 1172 in vivo FP/HW Raman spectra were acquired from 48 esophageal patients undergoing endoscopic examination. The total Raman dataset was split into two parts: 80% for training; while 20% for testing. Partial least squares-discriminant analysis (PLS-DA) and leave-one patient-out, cross validation (LOPCV) were implemented on training dataset to develop diagnostic algorithms for tissue classification. PLS-DA-LOPCV shows that simultaneous FP/HW Raman spectroscopy on training dataset provides a diagnostic sensitivity of 97.0% and specificity of 97.4% for ESCC classification. Further, the diagnostic algorithm applied to the independent testing dataset based on simultaneous FP/HW Raman technique gives a predictive diagnostic sensitivity of 92.7% and specificity of 93.6% for ESCC identification, which is superior to either FP or HW Raman technique alone. This work demonstrates that the simultaneous FP/HW fiber-optic Raman spectroscopy technique improves real-time in vivo diagnosis of esophageal neoplasia at endoscopy.
A Radio-Map Automatic Construction Algorithm Based on Crowdsourcing
Yu, Ning; Xiao, Chenxian; Wu, Yinfeng; Feng, Renjian
2016-01-01
Traditional radio-map-based localization methods need to sample a large number of location fingerprints offline, which requires huge amount of human and material resources. To solve the high sampling cost problem, an automatic radio-map construction algorithm based on crowdsourcing is proposed. The algorithm employs the crowd-sourced information provided by a large number of users when they are walking in the buildings as the source of location fingerprint data. Through the variation characteristics of users’ smartphone sensors, the indoor anchors (doors) are identified and their locations are regarded as reference positions of the whole radio-map. The AP-Cluster method is used to cluster the crowdsourced fingerprints to acquire the representative fingerprints. According to the reference positions and the similarity between fingerprints, the representative fingerprints are linked to their corresponding physical locations and the radio-map is generated. Experimental results demonstrate that the proposed algorithm reduces the cost of fingerprint sampling and radio-map construction and guarantees the localization accuracy. The proposed method does not require users’ explicit participation, which effectively solves the resource-consumption problem when a location fingerprint database is established. PMID:27070623
Comparative Analysis of RF Emission Based Fingerprinting Techniques for ZigBee Device Classification
quantify the differences invarious RF fingerprinting techniques via comparative analysis of MDA/ML classification results. The findings herein demonstrate...correct classification rates followed by COR-DNA and then RF-DNA in most test cases and especially in low Eb/N0 ranges, where ZigBee is designed to operate.
NASA Astrophysics Data System (ADS)
Yang, Gongping; Zhou, Guang-Tong; Yin, Yilong; Yang, Xiukun
2010-12-01
A critical step in an automatic fingerprint recognition system is the segmentation of fingerprint images. Existing methods are usually designed to segment fingerprint images originated from a certain sensor. Thus their performances are significantly affected when dealing with fingerprints collected by different sensors. This work studies the sensor interoperability of fingerprint segmentation algorithms, which refers to the algorithm's ability to adapt to the raw fingerprints obtained from different sensors. We empirically analyze the sensor interoperability problem, and effectively address the issue by proposing a [InlineEquation not available: see fulltext.]-means based segmentation method called SKI. SKI clusters foreground and background blocks of a fingerprint image based on the [InlineEquation not available: see fulltext.]-means algorithm, where a fingerprint block is represented by a 3-dimensional feature vector consisting of block-wise coherence, mean, and variance (abbreviated as CMV). SKI also employs morphological postprocessing to achieve favorable segmentation results. We perform SKI on each fingerprint to ensure sensor interoperability. The interoperability and robustness of our method are validated by experiments performed on a number of fingerprint databases which are obtained from various sensors.
A fingerprint key binding algorithm based on vector quantization and error correction
NASA Astrophysics Data System (ADS)
Li, Liang; Wang, Qian; Lv, Ke; He, Ning
2012-04-01
In recent years, researches on seamless combination cryptosystem with biometric technologies, e.g. fingerprint recognition, are conducted by many researchers. In this paper, we propose a binding algorithm of fingerprint template and cryptographic key to protect and access the key by fingerprint verification. In order to avoid the intrinsic fuzziness of variant fingerprints, vector quantization and error correction technique are introduced to transform fingerprint template and then bind with key, after a process of fingerprint registration and extracting global ridge pattern of fingerprint. The key itself is secure because only hash value is stored and it is released only when fingerprint verification succeeds. Experimental results demonstrate the effectiveness of our ideas.
Sun, Yongliang; Xu, Yubin; Li, Cheng; Ma, Lin
2013-11-13
A Kalman/map filtering (KMF)-aided fast normalized cross correlation (FNCC)-based Wi-Fi fingerprinting location sensing system is proposed in this paper. Compared with conventional neighbor selection algorithms that calculate localization results with received signal strength (RSS) mean samples, the proposed FNCC algorithm makes use of all the on-line RSS samples and reference point RSS variations to achieve higher fingerprinting accuracy. The FNCC computes efficiently while maintaining the same accuracy as the basic normalized cross correlation. Additionally, a KMF is also proposed to process fingerprinting localization results. It employs a new map matching algorithm to nonlinearize the linear location prediction process of Kalman filtering (KF) that takes advantage of spatial proximities of consecutive localization results. With a calibration model integrated into an indoor map, the map matching algorithm corrects unreasonable prediction locations of the KF according to the building interior structure. Thus, more accurate prediction locations are obtained. Using these locations, the KMF considerably improves fingerprinting algorithm performance. Experimental results demonstrate that the FNCC algorithm with reduced computational complexity outperforms other neighbor selection algorithms and the KMF effectively improves location sensing accuracy by using indoor map information and spatial proximities of consecutive localization results.
Sun, Yongliang; Xu, Yubin; Li, Cheng; Ma, Lin
2013-01-01
A Kalman/map filtering (KMF)-aided fast normalized cross correlation (FNCC)-based Wi-Fi fingerprinting location sensing system is proposed in this paper. Compared with conventional neighbor selection algorithms that calculate localization results with received signal strength (RSS) mean samples, the proposed FNCC algorithm makes use of all the on-line RSS samples and reference point RSS variations to achieve higher fingerprinting accuracy. The FNCC computes efficiently while maintaining the same accuracy as the basic normalized cross correlation. Additionally, a KMF is also proposed to process fingerprinting localization results. It employs a new map matching algorithm to nonlinearize the linear location prediction process of Kalman filtering (KF) that takes advantage of spatial proximities of consecutive localization results. With a calibration model integrated into an indoor map, the map matching algorithm corrects unreasonable prediction locations of the KF according to the building interior structure. Thus, more accurate prediction locations are obtained. Using these locations, the KMF considerably improves fingerprinting algorithm performance. Experimental results demonstrate that the FNCC algorithm with reduced computational complexity outperforms other neighbor selection algorithms and the KMF effectively improves location sensing accuracy by using indoor map information and spatial proximities of consecutive localization results. PMID:24233027
Zhou, Ru; Zhong, Dexing; Han, Jiuqiang
2013-01-01
The performance of conventional minutiae-based fingerprint authentication algorithms degrades significantly when dealing with low quality fingerprints with lots of cuts or scratches. A similar degradation of the minutiae-based algorithms is observed when small overlapping areas appear because of the quite narrow width of the sensors. Based on the detection of minutiae, Scale Invariant Feature Transformation (SIFT) descriptors are employed to fulfill verification tasks in the above difficult scenarios. However, the original SIFT algorithm is not suitable for fingerprint because of: (1) the similar patterns of parallel ridges; and (2) high computational resource consumption. To enhance the efficiency and effectiveness of the algorithm for fingerprint verification, we propose a SIFT-based Minutia Descriptor (SMD) to improve the SIFT algorithm through image processing, descriptor extraction and matcher. A two-step fast matcher, named improved All Descriptor-Pair Matching (iADM), is also proposed to implement the 1:N verifications in real-time. Fingerprint Identification using SMD and iADM (FISiA) achieved a significant improvement with respect to accuracy in representative databases compared with the conventional minutiae-based method. The speed of FISiA also can meet real-time requirements. PMID:23467056
Cao, Qi; Leung, K M
2014-09-22
Reliable computer models for the prediction of chemical biodegradability from molecular descriptors and fingerprints are very important for making health and environmental decisions. Coupling of the differential evolution (DE) algorithm with the support vector classifier (SVC) in order to optimize the main parameters of the classifier resulted in an improved classifier called the DE-SVC, which is introduced in this paper for use in chemical biodegradability studies. The DE-SVC was applied to predict the biodegradation of chemicals on the basis of extensive sample data sets and known structural features of molecules. Our optimization experiments showed that DE can efficiently find the proper parameters of the SVC. The resulting classifier possesses strong robustness and reliability compared with grid search, genetic algorithm, and particle swarm optimization methods. The classification experiments conducted here showed that the DE-SVC exhibits better classification performance than models previously used for such studies. It is a more effective and efficient prediction model for chemical biodegradability.
ERIC Educational Resources Information Center
Young, Sharon L.
1991-01-01
Presented are activities that focus on gathering, using, and interpreting data about fingerprints as a basis for integrating mathematics and science. Patterns, classification, logical reasoning, and mathematical relationships are explored by making graphs, classifying fingerprints, and matching identical fingerprints. A parent-involvement activity…
Mining disease fingerprints from within genetic pathways.
Nabhan, Ahmed Ragab; Sarkar, Indra Neil
2012-01-01
Mining biological networks can be an effective means to uncover system level knowledge out of micro level associations, such as encapsulated in genetic pathways. Analysis of human disease genetic pathways can lead to the identification of major mechanisms that may underlie disorders at an abstract functional level. The focus of this study was to develop an approach for structural pattern analysis and classification of genetic pathways of diseases. A probabilistic model was developed to capture characteristic components ('fingerprints') of functionally annotated pathways. A probability estimation procedure of this model searched for fingerprints in each disease pathway while improving probability estimates of model parameters. The approach was evaluated on data from the Kyoto Encyclopedia of Genes and Genomes (consisting of 56 pathways across seven disease categories). Based on the achieved average classification accuracy of up to ~77%, the findings suggest that these fingerprints may be used for classification and discovery of genetic pathways.
Localized Dictionaries Based Orientation Field Estimation for Latent Fingerprints.
Xiao Yang; Jianjiang Feng; Jie Zhou
2014-05-01
Dictionary based orientation field estimation approach has shown promising performance for latent fingerprints. In this paper, we seek to exploit stronger prior knowledge of fingerprints in order to further improve the performance. Realizing that ridge orientations at different locations of fingerprints have different characteristics, we propose a localized dictionaries-based orientation field estimation algorithm, in which noisy orientation patch at a location output by a local estimation approach is replaced by real orientation patch in the local dictionary at the same location. The precondition of applying localized dictionaries is that the pose of the latent fingerprint needs to be estimated. We propose a Hough transform-based fingerprint pose estimation algorithm, in which the predictions about fingerprint pose made by all orientation patches in the latent fingerprint are accumulated. Experimental results on challenging latent fingerprint datasets show the proposed method outperforms previous ones markedly.
Algorithm comparison for schedule optimization in MR fingerprinting.
Cohen, Ouri; Rosen, Matthew S
2017-09-01
In MR Fingerprinting, the flip angles and repetition times are chosen according to a pseudorandom schedule. In previous work, we have shown that maximizing the discrimination between different tissue types by optimizing the acquisition schedule allows reductions in the number of measurements required. The ideal optimization algorithm for this application remains unknown, however. In this work we examine several different optimization algorithms to determine the one best suited for optimizing MR Fingerprinting acquisition schedules. Copyright © 2017 Elsevier Inc. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
I. W. Ginsberg
Multiresolutional decompositions known as spectral fingerprints are often used to extract spectral features from multispectral/hyperspectral data. In this study, the authors investigate the use of wavelet-based algorithms for generating spectral fingerprints. The wavelet-based algorithms are compared to the currently used method, traditional convolution with first-derivative Gaussian filters. The comparison analyses consists of two parts: (a) the computational expense of the new method is compared with the computational costs of the current method and (b) the outputs of the wavelet-based methods are compared with those of the current method to determine any practical differences in the resulting spectral fingerprints. The resultsmore » show that the wavelet-based algorithms can greatly reduce the computational expense of generating spectral fingerprints, while practically no differences exist in the resulting fingerprints. The analysis is conducted on a database of hyperspectral signatures, namely, Hyperspectral Digital Image Collection Experiment (HYDICE) signatures. The reduction in computational expense is by a factor of about 30, and the average Euclidean distance between resulting fingerprints is on the order of 0.02.« less
Video fingerprinting for copy identification: from research to industry applications
NASA Astrophysics Data System (ADS)
Lu, Jian
2009-02-01
Research that began a decade ago in video copy detection has developed into a technology known as "video fingerprinting". Today, video fingerprinting is an essential and enabling tool adopted by the industry for video content identification and management in online video distribution. This paper provides a comprehensive review of video fingerprinting technology and its applications in identifying, tracking, and managing copyrighted content on the Internet. The review includes a survey on video fingerprinting algorithms and some fundamental design considerations, such as robustness, discriminability, and compactness. It also discusses fingerprint matching algorithms, including complexity analysis, and approximation and optimization for fast fingerprint matching. On the application side, it provides an overview of a number of industry-driven applications that rely on video fingerprinting. Examples are given based on real-world systems and workflows to demonstrate applications in detecting and managing copyrighted content, and in monitoring and tracking video distribution on the Internet.
NASA Astrophysics Data System (ADS)
Qin, Jin; Tang, Siqi; Han, Congying; Guo, Tiande
2018-04-01
Partial fingerprint identification technology which is mainly used in device with small sensor area like cellphone, U disk and computer, has taken more attention in recent years with its unique advantages. However, owing to the lack of sufficient minutiae points, the conventional method do not perform well in the above situation. We propose a new fingerprint matching technique which utilizes ridges as features to deal with partial fingerprint images and combines the modified generalized Hough transform and scoring strategy based on machine learning. The algorithm can effectively meet the real-time and space-saving requirements of the resource constrained devices. Experiments on in-house database indicate that the proposed algorithm have an excellent performance.
Secure Indoor Localization Based on Extracting Trusted Fingerprint
Yin, Xixi; Zheng, Yanliu; Wang, Chun
2018-01-01
Indoor localization based on WiFi has attracted a lot of research effort because of the widespread application of WiFi. Fingerprinting techniques have received much attention due to their simplicity and compatibility with existing hardware. However, existing fingerprinting localization algorithms may not resist abnormal received signal strength indication (RSSI), such as unexpected environmental changes, impaired access points (APs) or the introduction of new APs. Traditional fingerprinting algorithms do not consider the problem of new APs and impaired APs in the environment when using RSSI. In this paper, we propose a secure fingerprinting localization (SFL) method that is robust to variable environments, impaired APs and the introduction of new APs. In the offline phase, a voting mechanism and a fingerprint database update method are proposed. We use the mutual cooperation between reference anchor nodes to update the fingerprint database, which can reduce the interference caused by the user measurement data. We analyze the standard deviation of RSSI, mobilize the reference points in the database to vote on APs and then calculate the trust factors of APs based on the voting results. In the online phase, we first make a judgment about the new APs and the broken APs, then extract the secure fingerprints according to the trusted factors of APs and obtain the localization results by using the trusted fingerprints. In the experiment section, we demonstrate the proposed method and find that the proposed strategy can resist abnormal RSSI and can improve the localization accuracy effectively compared with the existing fingerprinting localization algorithms. PMID:29401755
Secure Indoor Localization Based on Extracting Trusted Fingerprint.
Luo, Juan; Yin, Xixi; Zheng, Yanliu; Wang, Chun
2018-02-05
[-5]Indoor localization based on WiFi has attracted a lot of research effort because of the widespread application of WiFi. Fingerprinting techniques have received much attention due to their simplicity and compatibility with existing hardware. However, existing fingerprinting localization algorithms may not resist abnormal received signal strength indication (RSSI), such as unexpected environmental changes, impaired access points (APs) or the introduction of new APs. Traditional fingerprinting algorithms do not consider the problem of new APs and impaired APs in the environment when using RSSI. In this paper, we propose a secure fingerprinting localization (SFL) method that is robust to variable environments, impaired APs and the introduction of new APs. In the offline phase, a voting mechanism and a fingerprint database update method are proposed. We use the mutual cooperation between reference anchor nodes to update the fingerprint database, which can reduce the interference caused by the user measurement data. We analyze the standard deviation of RSSI, mobilize the reference points in the database to vote on APs and then calculate the trust factors of APs based on the voting results. In the online phase, we first make a judgment about the new APs and the broken APs, then extract the secure fingerprints according to the trusted factors of APs and obtain the localization results by using the trusted fingerprints. In the experiment section, we demonstrate the proposed method and find that the proposed strategy can resist abnormal RSSI and can improve the localization accuracy effectively compared with the existing fingerprinting localization algorithms.
Resonance Raman of BCC and normal skin
NASA Astrophysics Data System (ADS)
Liu, Cheng-hui; Sriramoju, Vidyasagar; Boydston-White, Susie; Wu, Binlin; Zhang, Chunyuan; Pei, Zhe; Sordillo, Laura; Beckman, Hugh; Alfano, Robert R.
2017-02-01
The Resonance Raman (RR) spectra of basal cell carcinoma (BCC) and normal human skin tissues were analyzed using 532nm laser excitation. RR spectral differences in vibrational fingerprints revealed skin normal and cancerous states tissues. The standard diagnosis criterion for BCC tissues are created by native RR biomarkers and its changes at peak intensity. The diagnostic algorithms for the classification of BCC and normal were generated based on SVM classifier and PCA statistical method. These statistical methods were used to analyze the RR spectral data collected from skin tissues, yielding a diagnostic sensitivity of 98.7% and specificity of 79% compared with pathological reports.
SVM-Based Synthetic Fingerprint Discrimination Algorithm and Quantitative Optimization Strategy
Chen, Suhang; Chang, Sheng; Huang, Qijun; He, Jin; Wang, Hao; Huang, Qiangui
2014-01-01
Synthetic fingerprints are a potential threat to automatic fingerprint identification systems (AFISs). In this paper, we propose an algorithm to discriminate synthetic fingerprints from real ones. First, four typical characteristic factors—the ridge distance features, global gray features, frequency feature and Harris Corner feature—are extracted. Then, a support vector machine (SVM) is used to distinguish synthetic fingerprints from real fingerprints. The experiments demonstrate that this method can achieve a recognition accuracy rate of over 98% for two discrete synthetic fingerprint databases as well as a mixed database. Furthermore, a performance factor that can evaluate the SVM's accuracy and efficiency is presented, and a quantitative optimization strategy is established for the first time. After the optimization of our synthetic fingerprint discrimination task, the polynomial kernel with a training sample proportion of 5% is the optimized value when the minimum accuracy requirement is 95%. The radial basis function (RBF) kernel with a training sample proportion of 15% is a more suitable choice when the minimum accuracy requirement is 98%. PMID:25347063
A Raman spectroscopy bio-sensor for tissue discrimination in surgical robotics.
Ashok, Praveen C; Giardini, Mario E; Dholakia, Kishan; Sibbett, Wilson
2014-01-01
We report the development of a fiber-based Raman sensor to be used in tumour margin identification during endoluminal robotic surgery. Although this is a generic platform, the sensor we describe was adapted for the ARAKNES (Array of Robots Augmenting the KiNematics of Endoluminal Surgery) robotic platform. On such a platform, the Raman sensor is intended to identify ambiguous tissue margins during robot-assisted surgeries. To maintain sterility of the probe during surgical intervention, a disposable sleeve was specially designed. A straightforward user-compatible interface was implemented where a supervised multivariate classification algorithm was used to classify different tissue types based on specific Raman fingerprints so that it could be used without prior knowledge of spectroscopic data analysis. The protocol avoids inter-patient variability in data and the sensor system is not restricted for use in the classification of a particular tissue type. Representative tissue classification assessments were performed using this system on excised tissue. Copyright © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Unsupervised Indoor Localization Based on Smartphone Sensors, iBeacon and Wi-Fi.
Chen, Jing; Zhang, Yi; Xue, Wei
2018-04-28
In this paper, we propose UILoc, an unsupervised indoor localization scheme that uses a combination of smartphone sensors, iBeacons and Wi-Fi fingerprints for reliable and accurate indoor localization with zero labor cost. Firstly, compared with the fingerprint-based method, the UILoc system can build a fingerprint database automatically without any site survey and the database will be applied in the fingerprint localization algorithm. Secondly, since the initial position is vital to the system, UILoc will provide the basic location estimation through the pedestrian dead reckoning (PDR) method. To provide accurate initial localization, this paper proposes an initial localization module, a weighted fusion algorithm combined with a k-nearest neighbors (KNN) algorithm and a least squares algorithm. In UILoc, we have also designed a reliable model to reduce the landmark correction error. Experimental results show that the UILoc can provide accurate positioning, the average localization error is about 1.1 m in the steady state, and the maximum error is 2.77 m.
Castillo-Cara, Manuel; Lovón-Melgarejo, Jesús; Bravo-Rocca, Gusseppe; Orozco-Barbosa, Luis; García-Varea, Ismael
2017-01-01
Nowadays, there is a great interest in developing accurate wireless indoor localization mechanisms enabling the implementation of many consumer-oriented services. Among the many proposals, wireless indoor localization mechanisms based on the Received Signal Strength Indication (RSSI) are being widely explored. Most studies have focused on the evaluation of the capabilities of different mobile device brands and wireless network technologies. Furthermore, different parameters and algorithms have been proposed as a means of improving the accuracy of wireless-based localization mechanisms. In this paper, we focus on the tuning of the RSSI fingerprint to be used in the implementation of a Bluetooth Low Energy 4.0 (BLE4.0) Bluetooth localization mechanism. Following a holistic approach, we start by assessing the capabilities of two Bluetooth sensor/receiver devices. We then evaluate the relevance of the RSSI fingerprint reported by each BLE4.0 beacon operating at various transmission power levels using feature selection techniques. Based on our findings, we use two classification algorithms in order to improve the setting of the transmission power levels of each of the BLE4.0 beacons. Our main findings show that our proposal can greatly improve the localization accuracy by setting a custom transmission power level for each BLE4.0 beacon. PMID:28590413
Castillo-Cara, Manuel; Lovón-Melgarejo, Jesús; Bravo-Rocca, Gusseppe; Orozco-Barbosa, Luis; García-Varea, Ismael
2017-06-07
Nowadays, there is a great interest in developing accurate wireless indoor localization mechanisms enabling the implementation of many consumer-oriented services. Among the many proposals, wireless indoor localization mechanisms based on the Received Signal Strength Indication (RSSI) are being widely explored. Most studies have focused on the evaluation of the capabilities of different mobile device brands and wireless network technologies. Furthermore, different parameters and algorithms have been proposed as a means of improving the accuracy of wireless-based localization mechanisms. In this paper, we focus on the tuning of the RSSI fingerprint to be used in the implementation of a Bluetooth Low Energy 4.0 (BLE4.0) Bluetooth localization mechanism. Following a holistic approach, we start by assessing the capabilities of two Bluetooth sensor/receiver devices. We then evaluate the relevance of the RSSI fingerprint reported by each BLE4.0 beacon operating at various transmission power levels using feature selection techniques. Based on our findings, we use two classification algorithms in order to improve the setting of the transmission power levels of each of the BLE4.0 beacons. Our main findings show that our proposal can greatly improve the localization accuracy by setting a custom transmission power level for each BLE4.0 beacon.
Highly efficient classification and identification of human pathogenic bacteria by MALDI-TOF MS.
Hsieh, Sen-Yung; Tseng, Chiao-Li; Lee, Yun-Shien; Kuo, An-Jing; Sun, Chien-Feng; Lin, Yen-Hsiu; Chen, Jen-Kun
2008-02-01
Accurate and rapid identification of pathogenic microorganisms is of critical importance in disease treatment and public health. Conventional work flows are time-consuming, and procedures are multifaceted. MS can be an alternative but is limited by low efficiency for amino acid sequencing as well as low reproducibility for spectrum fingerprinting. We systematically analyzed the feasibility of applying MS for rapid and accurate bacterial identification. Directly applying bacterial colonies without further protein extraction to MALDI-TOF MS analysis revealed rich peak contents and high reproducibility. The MS spectra derived from 57 isolates comprising six human pathogenic bacterial species were analyzed using both unsupervised hierarchical clustering and supervised model construction via the Genetic Algorithm. Hierarchical clustering analysis categorized the spectra into six groups precisely corresponding to the six bacterial species. Precise classification was also maintained in an independently prepared set of bacteria even when the numbers of m/z values were reduced to six. In parallel, classification models were constructed via Genetic Algorithm analysis. A model containing 18 m/z values accurately classified independently prepared bacteria and identified those species originally not used for model construction. Moreover bacteria fewer than 10(4) cells and different species in bacterial mixtures were identified using the classification model approach. In conclusion, the application of MALDI-TOF MS in combination with a suitable model construction provides a highly accurate method for bacterial classification and identification. The approach can identify bacteria with low abundance even in mixed flora, suggesting that a rapid and accurate bacterial identification using MS techniques even before culture can be attained in the near future.
Mining Disease Fingerprints From Within Genetic Pathways
Nabhan, Ahmed Ragab; Sarkar, Indra Neil
2012-01-01
Mining biological networks can be an effective means to uncover system level knowledge out of micro level associations, such as encapsulated in genetic pathways. Analysis of human disease genetic pathways can lead to the identification of major mechanisms that may underlie disorders at an abstract functional level. The focus of this study was to develop an approach for structural pattern analysis and classification of genetic pathways of diseases. A probabilistic model was developed to capture characteristic components (‘fingerprints’) of functionally annotated pathways. A probability estimation procedure of this model searched for fingerprints in each disease pathway while improving probability estimates of model parameters. The approach was evaluated on data from the Kyoto Encyclopedia of Genes and Genomes (consisting of 56 pathways across seven disease categories). Based on the achieved average classification accuracy of up to ∼77%, the findings suggest that these fingerprints may be used for classification and discovery of genetic pathways. PMID:23304411
Jiménez-Carvelo, Ana M; González-Casado, Antonio; Pérez-Castaño, Estefanía; Cuadros-Rodríguez, Luis
2017-03-01
A new analytical method for the differentiation of olive oil from other vegetable oils using reversed-phase LC and applying chemometric techniques was developed. A 3 cm short column was used to obtain the chromatographic fingerprint of the methyl-transesterified fraction of each vegetable oil. The chromatographic analysis took only 4 min. The multivariate classification methods used were k-nearest neighbors, partial least-squares (PLS) discriminant analysis, one-class PLS, support vector machine classification, and soft independent modeling of class analogies. The discrimination of olive oil from other vegetable edible oils was evaluated by several classification quality metrics. Several strategies for the classification of the olive oil were used: one input-class, two input-class, and pseudo two input-class.
Schneider, Nadine; Lowe, Daniel M; Sayle, Roger A; Landrum, Gregory A
2015-01-26
Fingerprint methods applied to molecules have proven to be useful for similarity determination and as inputs to machine-learning models. Here, we present the development of a new fingerprint for chemical reactions and validate its usefulness in building machine-learning models and in similarity assessment. Our final fingerprint is constructed as the difference of the atom-pair fingerprints of products and reactants and includes agents via calculated physicochemical properties. We validated the fingerprints on a large data set of reactions text-mined from granted United States patents from the last 40 years that have been classified using a substructure-based expert system. We applied machine learning to build a 50-class predictive model for reaction-type classification that correctly predicts 97% of the reactions in an external test set. Impressive accuracies were also observed when applying the classifier to reactions from an in-house electronic laboratory notebook. The performance of the novel fingerprint for assessing reaction similarity was evaluated by a cluster analysis that recovered 48 out of 50 of the reaction classes with a median F-score of 0.63 for the clusters. The data sets used for training and primary validation as well as all python scripts required to reproduce the analysis are provided in the Supporting Information.
Chen, Wei; Wang, Weiping; Li, Qun; Chang, Qiang; Hou, Hongtao
2016-01-01
Indoor positioning based on existing Wi-Fi fingerprints is becoming more and more common. Unfortunately, the Wi-Fi fingerprint is susceptible to multiple path interferences, signal attenuation, and environmental changes, which leads to low accuracy. Meanwhile, with the recent advances in charge-coupled device (CCD) technologies and the processing speed of smartphones, indoor positioning using the optical camera on a smartphone has become an attractive research topic; however, the major challenge is its high computational complexity; as a result, real-time positioning cannot be achieved. In this paper we introduce a crowd-sourcing indoor localization algorithm via an optical camera and orientation sensor on a smartphone to address these issues. First, we use Wi-Fi fingerprint based on the K Weighted Nearest Neighbor (KWNN) algorithm to make a coarse estimation. Second, we adopt a mean-weighted exponent algorithm to fuse optical image features and orientation sensor data as well as KWNN in the smartphone to refine the result. Furthermore, a crowd-sourcing approach is utilized to update and supplement the positioning database. We perform several experiments comparing our approach with other positioning algorithms on a common smartphone to evaluate the performance of the proposed sensor-calibrated algorithm, and the results demonstrate that the proposed algorithm could significantly improve accuracy, stability, and applicability of positioning. PMID:27007379
Chen, Wei; Wang, Weiping; Li, Qun; Chang, Qiang; Hou, Hongtao
2016-03-19
Indoor positioning based on existing Wi-Fi fingerprints is becoming more and more common. Unfortunately, the Wi-Fi fingerprint is susceptible to multiple path interferences, signal attenuation, and environmental changes, which leads to low accuracy. Meanwhile, with the recent advances in charge-coupled device (CCD) technologies and the processing speed of smartphones, indoor positioning using the optical camera on a smartphone has become an attractive research topic; however, the major challenge is its high computational complexity; as a result, real-time positioning cannot be achieved. In this paper we introduce a crowd-sourcing indoor localization algorithm via an optical camera and orientation sensor on a smartphone to address these issues. First, we use Wi-Fi fingerprint based on the K Weighted Nearest Neighbor (KWNN) algorithm to make a coarse estimation. Second, we adopt a mean-weighted exponent algorithm to fuse optical image features and orientation sensor data as well as KWNN in the smartphone to refine the result. Furthermore, a crowd-sourcing approach is utilized to update and supplement the positioning database. We perform several experiments comparing our approach with other positioning algorithms on a common smartphone to evaluate the performance of the proposed sensor-calibrated algorithm, and the results demonstrate that the proposed algorithm could significantly improve accuracy, stability, and applicability of positioning.
Authentication of Trappist beers by LC-MS fingerprints and multivariate data analysis.
Mattarucchi, Elia; Stocchero, Matteo; Moreno-Rojas, José Manuel; Giordano, Giuseppe; Reniero, Fabiano; Guillou, Claude
2010-12-08
The aim of this study was to asses the applicability of LC-MS profiling to authenticate a selected Trappist beer as part of a program on traceability funded by the European Commission. A total of 232 beers were fingerprinted and classified through multivariate data analysis. The selected beer was clearly distinguished from beers of different brands, while only 3 samples (3.5% of the test set) were wrongly classified when compared with other types of beer of the same Trappist brewery. The fingerprints were further analyzed to extract the most discriminating variables, which proved to be sufficient for classification, even using a simplified unsupervised model. This reduced fingerprint allowed us to study the influence of batch-to-batch variability on the classification model. Our results can easily be applied to different matrices and they confirmed the effectiveness of LC-MS profiling in combination with multivariate data analysis for the characterization of food products.
Detection of illicit substances in fingerprints by infrared spectral imaging.
Ng, Ping Hei Ronnie; Walker, Sarah; Tahtouh, Mark; Reedy, Brian
2009-08-01
FTIR and Raman spectral imaging can be used to simultaneously image a latent fingerprint and detect exogenous substances deposited within it. These substances might include drugs of abuse or traces of explosives or gunshot residue. In this work, spectral searching algorithms were tested for their efficacy in finding targeted substances deposited within fingerprints. "Reverse" library searching, where a large number of possibly poor-quality spectra from a spectral image are searched against a small number of high-quality reference spectra, poses problems for common search algorithms as they are usually implemented. Out of a range of algorithms which included conventional Euclidean distance searching, the spectral angle mapper (SAM) and correlation algorithms gave the best results when used with second-derivative image and reference spectra. All methods tested gave poorer performances with first derivative and undifferentiated spectra. In a search against a caffeine reference, the SAM and correlation methods were able to correctly rank a set of 40 confirmed but poor-quality caffeine spectra at the top of a dataset which also contained 4,096 spectra from an image of an uncontaminated latent fingerprint. These methods also successfully and individually detected aspirin, diazepam and caffeine that had been deposited together in another fingerprint, and they did not indicate any of these substances as a match in a search for another substance which was known not to be present. The SAM was used to successfully locate explosive components in fingerprints deposited on silicon windows. The potential of other spectral searching algorithms used in the field of remote sensing is considered, and the applicability of the methods tested in this work to other modes of spectral imaging is discussed.
Converting Panax ginseng DNA and chemical fingerprints into two-dimensional barcode.
Cai, Yong; Li, Peng; Li, Xi-Wen; Zhao, Jing; Chen, Hai; Yang, Qing; Hu, Hao
2017-07-01
In this study, we investigated how to convert the Panax ginseng DNA sequence code and chemical fingerprints into a two-dimensional code. In order to improve the compression efficiency, GATC2Bytes and digital merger compression algorithms are proposed. HPLC chemical fingerprint data of 10 groups of P. ginseng from Northeast China and the internal transcribed spacer 2 (ITS2) sequence code as the DNA sequence code were ready for conversion. In order to convert such data into a two-dimensional code, the following six steps were performed: First, the chemical fingerprint characteristic data sets were obtained through the inflection filtering algorithm. Second, precompression processing of such data sets is undertaken. Third, precompression processing was undertaken with the P. ginseng DNA (ITS2) sequence codes. Fourth, the precompressed chemical fingerprint data and the DNA (ITS2) sequence code were combined in accordance with the set data format. Such combined data can be compressed by Zlib, an open source data compression algorithm. Finally, the compressed data generated a two-dimensional code called a quick response code (QR code). Through the abovementioned converting process, it can be found that the number of bytes needed for storing P. ginseng chemical fingerprints and its DNA (ITS2) sequence code can be greatly reduced. After GTCA2Bytes algorithm processing, the ITS2 compression rate reaches 75% and the chemical fingerprint compression rate exceeds 99.65% via filtration and digital merger compression algorithm processing. Therefore, the overall compression ratio even exceeds 99.36%. The capacity of the formed QR code is around 0.5k, which can easily and successfully be read and identified by any smartphone. P. ginseng chemical fingerprints and its DNA (ITS2) sequence code can form a QR code after data processing, and therefore the QR code can be a perfect carrier of the authenticity and quality of P. ginseng information. This study provides a theoretical basis for the development of a quality traceability system of traditional Chinese medicine based on a two-dimensional code.
Mated Fingerprint Card Pairs 2 (MFCP2)
National Institute of Standards and Technology Data Gateway
NIST Mated Fingerprint Card Pairs 2 (MFCP2) (Web, free access) NIST Special Database 14 is being distributed for use in development and testing of automated fingerprint classification and matching systems on a set of images which approximate a natural horizontal distribution of the National Crime Information Center (NCIC) fingerprint classes. A newer version of the compression/decompression software on the CDROM can be found at the website http://www.nist.gov/itl/iad/ig/nigos.cfm as part of the NBIS package.
Detection and classification of concealed weapons using a magnetometer-based portal
NASA Astrophysics Data System (ADS)
Kotter, Dale K.; Roybal, Lyle G.; Polk, Robert E.
2002-08-01
A concealed weapons detection technology was developed through the support of the National Institute of Justice (NIJ) to provide a non intrusive means for rapid detection, location, and archiving of data (including visual) of potential suspects and weapon threats. This technology, developed by the Idaho National Engineering and Environmental Laboratory (INEEL), has been applied in a portal style weapons detection system using passive magnetic sensors as its basis. This paper will report on enhancements to the weapon detection system to enable weapon classification and to discriminate threats from non-threats. Advanced signal processing algorithms were used to analyze the magnetic spectrum generated when a person passes through a portal. These algorithms analyzed multiple variables including variance in the magnetic signature from random weapon placement and/or orientation. They perform pattern recognition and calculate the probability that the collected magnetic signature correlates to a known database of weapon versus non-weapon responses. Neural networks were used to further discriminate weapon type and identify controlled electronic items such as cell phones and pagers. False alarms were further reduced by analyzing the magnetic detector response by using a Joint Time Frequency Analysis digital signal processing technique. The frequency components and power spectrum for a given sensor response were derived. This unique fingerprint provided additional information to aid in signal analysis. This technology has the potential to produce major improvements in weapon detection and classification.
Integrating Fingerprint Verification into the Smart Card-Based Healthcare Information System
NASA Astrophysics Data System (ADS)
Moon, Daesung; Chung, Yongwha; Pan, Sung Bum; Park, Jin-Won
2009-12-01
As VLSI technology has been improved, a smart card employing 32-bit processors has been released, and more personal information such as medical, financial data can be stored in the card. Thus, it becomes important to protect personal information stored in the card. Verification of the card holder's identity using a fingerprint has advantages over the present practices of Personal Identification Numbers (PINs) and passwords. However, the computational workload of fingerprint verification is much heavier than that of the typical PIN-based solution. In this paper, we consider three strategies to implement fingerprint verification in a smart card environment and how to distribute the modules of fingerprint verification between the smart card and the card reader. We first evaluate the number of instructions of each step of a typical fingerprint verification algorithm, and estimate the execution time of several cryptographic algorithms to guarantee the security/privacy of the fingerprint data transmitted in the smart card with the client-server environment. Based on the evaluation results, we analyze each scenario with respect to the security level and the real-time execution requirements in order to implement fingerprint verification in the smart card with the client-server environment.
LiCABEDS II. Modeling of ligand selectivity for G-protein-coupled cannabinoid receptors.
Ma, Chao; Wang, Lirong; Yang, Peng; Myint, Kyaw Z; Xie, Xiang-Qun
2013-01-28
The cannabinoid receptor subtype 2 (CB2) is a promising therapeutic target for blood cancer, pain relief, osteoporosis, and immune system disease. The recent withdrawal of Rimonabant, which targets another closely related cannabinoid receptor (CB1), accentuates the importance of selectivity for the development of CB2 ligands in order to minimize their effects on the CB1 receptor. In our previous study, LiCABEDS (Ligand Classifier of Adaptively Boosting Ensemble Decision Stumps) was reported as a generic ligand classification algorithm for the prediction of categorical molecular properties. Here, we report extension of the application of LiCABEDS to the modeling of cannabinoid ligand selectivity with molecular fingerprints as descriptors. The performance of LiCABEDS was systematically compared with another popular classification algorithm, support vector machine (SVM), according to prediction precision and recall rate. In addition, the examination of LiCABEDS models revealed the difference in structure diversity of CB1 and CB2 selective ligands. The structure determination from data mining could be useful for the design of novel cannabinoid lead compounds. More importantly, the potential of LiCABEDS was demonstrated through successful identification of newly synthesized CB2 selective compounds.
Bajoub, Aadil; Medina-Rodríguez, Santiago; Gómez-Romero, María; Ajal, El Amine; Bagur-González, María Gracia; Fernández-Gutiérrez, Alberto; Carrasco-Pancorbo, Alegría
2017-01-15
High Performance Liquid Chromatography (HPLC) with diode array (DAD) and fluorescence (FLD) detection was used to acquire the fingerprints of the phenolic fraction of monovarietal extra-virgin olive oils (extra-VOOs) collected over three consecutive crop seasons (2011/2012-2013/2014). The chromatographic fingerprints of 140 extra-VOO samples processed from olive fruits of seven olive varieties, were recorded and statistically treated for varietal authentication purposes. First, DAD and FLD chromatographic-fingerprint datasets were separately processed and, subsequently, were joined using "Low-level" and "Mid-Level" data fusion methods. After the preliminary examination by principal component analysis (PCA), three supervised pattern recognition techniques, Partial Least Squares Discriminant Analysis (PLS-DA), Soft Independent Modeling of Class Analogies (SIMCA) and K-Nearest Neighbors (k-NN) were applied to the four chromatographic-fingerprinting matrices. The classification models built were very sensitive and selective, showing considerably good recognition and prediction abilities. The combination "chromatographic dataset+chemometric technique" allowing the most accurate classification for each monovarietal extra-VOO was highlighted. Copyright © 2016 Elsevier Ltd. All rights reserved.
Lee, Sang Min; Kim, Hye-Jin; Jang, Young Pyo
2012-01-01
It needs many years of special training to gain expertise on the organoleptic classification of botanical raw materials and, even for those experts, discrimination among Umbelliferae medicinal herbs remains an intricate challenge due to their morphological similarity. To develop a new chemometric classification method using a direct analysis in real time-time of flight-mass spectrometry (DART-TOF-MS) fingerprinting for Umbelliferae medicinal herbs and to provide a platform for its application to the discrimination of other herbal medicines. Angelica tenuissima, Angelica gigas, Angelica dahurica and Cnidium officinale were chosen for this study and ten samples of each species were purchased from various Korean markets. DART-TOF-MS was employed on powdered raw materials to obtain a chemical fingerprint of each sample and the orthogonal partial-least squares method in discriminant analysis (OPLS-DA) was used for multivariate analysis. All samples of collected species were successfully discriminated from each other according to their characteristic DART-TOF-MS fingerprint. Decursin (or decursinol angelate) and byakangelicol were identified as marker molecules for Angelica gigas and A. dahurica, respectively. Using the OPLS method for discriminant analysis, Angelica tenuissima and Cnidium officinale were clearly separated into two groups. Angelica tenuissima was characterised by the presence of ligustilide and unidentified molecular ions of m/z 239 and 283, while senkyunolide A together with signals with m/z 387 and 389 were the marker compounds for Cnidium officinale. Elaborating with chemoinformatics, DART-TOF-MS fingerprinting with chemoinformatic tools results in a powerful method for the classification of morphologically similar Umbelliferae medicinal herbs and quality control of medicinal herbal products, including the extracts of these crude drugs. Copyright © 2012 John Wiley & Sons, Ltd.
Influence of Skin Diseases on Fingerprint Recognition
Drahansky, Martin; Dolezel, Michal; Urbanek, Jaroslav; Brezinova, Eva; Kim, Tai-hoon
2012-01-01
There are many people who suffer from some of the skin diseases. These diseases have a strong influence on the process of fingerprint recognition. People with fingerprint diseases are unable to use fingerprint scanners, which is discriminating for them, since they are not allowed to use their fingerprints for the authentication purposes. First in this paper the various diseases, which might influence functionality of the fingerprint-based systems, are introduced, mainly from the medical point of view. This overview is followed by some examples of diseased finger fingerprints, acquired both from dactyloscopic card and electronic sensors. At the end of this paper the proposed fingerprint image enhancement algorithm is described. PMID:22654483
Influence of skin diseases on fingerprint recognition.
Drahansky, Martin; Dolezel, Michal; Urbanek, Jaroslav; Brezinova, Eva; Kim, Tai-hoon
2012-01-01
There are many people who suffer from some of the skin diseases. These diseases have a strong influence on the process of fingerprint recognition. People with fingerprint diseases are unable to use fingerprint scanners, which is discriminating for them, since they are not allowed to use their fingerprints for the authentication purposes. First in this paper the various diseases, which might influence functionality of the fingerprint-based systems, are introduced, mainly from the medical point of view. This overview is followed by some examples of diseased finger fingerprints, acquired both from dactyloscopic card and electronic sensors. At the end of this paper the proposed fingerprint image enhancement algorithm is described.
A Comparison of RF-DNA Fingerprinting Using High/Low Value Receivers with ZigBee Devices
2014-03-27
99) Device Classification using Hybrid Cross-Receiver model and USRP only testing . 54 V. Conclusion This chapter provides a summary of reseach ...Finally, when developing a Hybrid Cross-Receiver model using fingerprints from both receivers, testing with PXIe-only fingerprints proved to be the most...22 3.2 Post -Collection Processing . . . . . . . . . . . . . . . . . . . . . . . . . . 25 3.2.1 Burst Detection
Gender determination: Role of lip prints, finger prints and mandibular canine index
KRISHNAN, RESHMA POOTHAKULATH; THANGAVELU, RADHIKA; RATHNAVELU, VIDHYA; NARASIMHAN, MALATHI
2016-01-01
Personal identification has a pivotal role in forensic investigations. Gender determination is an essential step in personal identification. Despite the advent of advanced techniques such as DNA fingerprinting, methods such as lip print and fingerprint analysis and mandibular canine index calculations are routinely used in gender determination, as they are simple and cost-effective. The present study investigated the hypothesis that lip print analysis is an effective tool in gender determination compared with fingerprint analysis and the mandibular canine index. The predominant patterns of lip prints and fingerprints were analyzed in males and females, and the efficacy of the mandibular canine index in gender determination was evaluated. The study group comprised 50 students, 25 males and 25 females who were 18–25 years of age. Lip prints and fingerprints were obtained and classified according to Tsuchihashi's classification and Kücken and Newell's classification, respectively. Mandibular impressions were made and the mandibular canine index was calculated. Type I and Type I' lip prints were predominant in females, and Type IV lip prints were predominant in males. The analysis of fingerprints revealed that the loop fingerprint pattern was predominant in both males and females. The mandibular canine index was not found to be significant in gender identification. The predominant patterns of lip prints were distinct for males and females; conversely, fingerprints were demonstrated to be similar in both genders. Therefore, lip prints hold an increased potential for gender determination, as compared with fingerprints, and the mandibular canine index is not a reliable indicator of gender. PMID:27284316
de Jongh, Arent; Lubach, Anko R; Lie Kwie, Sheryl L; Alberink, Ivo
2018-06-11
Latent print examiners often use their experience and knowledge to reach a conclusion on the identity of the source. Their conclusion is primarily based on their personal opinion on the rarity of the matching fingerprint features. Fingerprint patterns, if present, can play a significant role in the final assessment of a match. The authors believe that statistical data on the rarity of fingerprint patterns strengthens the subjective evaluation of the corresponding information. In order to provide fingerprint examiners with additional numerical support, fingerprint patterns were manually classified in a set of 24,104 fingerprints. In this study the frequencies of occurrence of 35 different fingerprint patterns have been obtained. The frequency data presented in this study can be used in the ACE-V process applied in forensic casework, allowing for the assessment of the evidential strength related to a specific fingerprint pattern type. © 2018 American Academy of Forensic Sciences.
Ultrafast fingerprint indexing for embedded systems
NASA Astrophysics Data System (ADS)
Zhou, Ru; Sin, Sang Woo; Li, Dongju; Isshiki, Tsuyoshi; Kunieda, Hiroaki
2011-10-01
A novel core-based fingerprint indexing scheme for embedded systems is presented in this paper. Our approach is enabled by our new precise and fast core-detection algorithm with the direction map. It introduces the feature of CMP (core minutiae pair), which describes the coordinates of minutiae and the direction of ridges associated with the minutiae based on the uniquely defined core coordinates. Since each CMP is identical against the shift and rotation of the fingerprint image, the CMP comparison between a template and an input image can be performed without any alignment. The proposed indexing algorithm based on CMP is suitable for embedded systems because the tremendous speed up and the memory reduction are achieved. In fact, the experiments with the fingerprint database FVC2002 show that its speed for the identifications becomes about 40 times faster than conventional approaches, even though the database includes fingerprints with no core.
Practical aspects of chemometrics for oil spill fingerprinting.
Christensen, Jan H; Tomasi, Giorgio
2007-10-26
Tiered approaches for oil spill fingerprinting have evolved rapidly since the 1990s. Chemometrics provides a large number of tools for pattern recognition, calibration and classification that can increase the speed and the objectivity of the analysis and allow for more extensive use of the available data in this field. However, although the chemometric literature is extensive, it does not focus on practical issues that are relevant to oil spill fingerprinting. The aim of this review is to provide a framework for the use of chemometric approaches in tiered oil spill fingerprinting and to provide clear-cut practical details and experiences that can be used by the forensic chemist. The framework is based on methods for initial screening, which include classification of samples into oil type, detection of non matches and of weathering state, and detailed oil spill fingerprinting, in which a more rigorous matching of an oil spill sample to suspected source oils is obtained. This review is intended as a tutorial, and is based on two examples of initial screening using respectively gas chromatography with flame ionization detection and fluorescence spectroscopy; and two of detailed oil spill fingerprinting where gas chromatography-mass spectrometry data are analyzed according to two approaches: The first relying on sections of processed chromatograms and the second on diagnostic ratios.
DWI-based neural fingerprinting technology: a preliminary study on stroke analysis.
Ye, Chenfei; Ma, Heather Ting; Wu, Jun; Yang, Pengfei; Chen, Xuhui; Yang, Zhengyi; Ma, Jingbo
2014-01-01
Stroke is a common neural disorder in neurology clinics. Magnetic resonance imaging (MRI) has become an important tool to assess the neural physiological changes under stroke, such as diffusion weighted imaging (DWI) and diffusion tensor imaging (DTI). Quantitative analysis of MRI images would help medical doctors to localize the stroke area in the diagnosis in terms of structural information and physiological characterization. However, current quantitative approaches can only provide localization of the disorder rather than measure physiological variation of subtypes of ischemic stroke. In the current study, we hypothesize that each kind of neural disorder would have its unique physiological characteristics, which could be reflected by DWI images on different gradients. Based on this hypothesis, a DWI-based neural fingerprinting technology was proposed to classify subtypes of ischemic stroke. The neural fingerprint was constructed by the signal intensity of the region of interest (ROI) on the DWI images under different gradients. The fingerprint derived from the manually drawn ROI could classify the subtypes with accuracy 100%. However, the classification accuracy was worse when using semiautomatic and automatic method in ROI segmentation. The preliminary results showed promising potential of DWI-based neural fingerprinting technology in stroke subtype classification. Further studies will be carried out for enhancing the fingerprinting accuracy and its application in other clinical practices.
Viswanathan, P; Krishna, P Venkata
2014-05-01
Teleradiology allows transmission of medical images for clinical data interpretation to provide improved e-health care access, delivery, and standards. The remote transmission raises various ethical and legal issues like image retention, fraud, privacy, malpractice liability, etc. A joint FED watermarking system means a joint fingerprint/encryption/dual watermarking system is proposed for addressing these issues. The system combines a region based substitution dual watermarking algorithm using spatial fusion, stream cipher algorithm using symmetric key, and fingerprint verification algorithm using invariants. This paper aims to give access to the outcomes of medical images with confidentiality, availability, integrity, and its origin. The watermarking, encryption, and fingerprint enrollment are conducted jointly in protection stage such that the extraction, decryption, and verification can be applied independently. The dual watermarking system, introducing two different embedding schemes, one used for patient data and other for fingerprint features, reduces the difficulty in maintenance of multiple documents like authentication data, personnel and diagnosis data, and medical images. The spatial fusion algorithm, which determines the region of embedding using threshold from the image to embed the encrypted patient data, follows the exact rules of fusion resulting in better quality than other fusion techniques. The four step stream cipher algorithm using symmetric key for encrypting the patient data with fingerprint verification system using algebraic invariants improves the robustness of the medical information. The experiment result of proposed scheme is evaluated for security and quality analysis in DICOM medical images resulted well in terms of attacks, quality index, and imperceptibility.
Custers, Deborah; Krakowska, Barbara; De Beer, Jacques O; Courselle, Patricia; Daszykowski, Michal; Apers, Sandra; Deconinck, Eric
2016-02-01
Counterfeit medicines are a global threat to public health. High amounts enter the European market, which is why characterization of these products is a very important issue. In this study, a high-performance liquid chromatography-photodiode array (HPLC-PDA) and high-performance liquid chromatography-mass spectrometry (HPLC-MS) method were developed for the analysis of genuine Viagra®, generic products of Viagra®, and counterfeit samples in order to obtain different types of fingerprints. These data were included in the chemometric data analysis, aiming to test whether PDA and MS are complementary detection techniques. The MS data comprise both MS1 and MS2 fingerprints; the PDA data consist of fingerprints measured at three different wavelengths, i.e., 254, 270, and 290 nm, and all possible combinations of these wavelengths. First, it was verified if both groups of fingerprints can discriminate between genuine, generic, and counterfeit medicines separately; next, it was studied if the obtained results could be ameliorated by combining both fingerprint types. This data analysis showed that MS1 does not provide suitable classification models since several genuines and generics are classified as counterfeits and vice versa. However, when analyzing the MS1_MS2 data in combination with partial least squares-discriminant analysis (PLS-DA), a perfect discrimination was obtained. When only using data measured at 254 nm, good classification models can be obtained by k nearest neighbors (kNN) and soft independent modelling of class analogy (SIMCA), which might be interesting for the characterization of counterfeit drugs in developing countries. However, in general, the combination of PDA and MS data (254 nm_MS1) is preferred due to less classification errors between the genuines/generics and counterfeits compared to PDA and MS data separately.
NASA Astrophysics Data System (ADS)
Anne, Marie-Laure; Le Lan, Caroline; Monbet, Valérie; Boussard-Plédel, Catherine; Ropert, Martine; Sire, Olivier; Pouchard, Michel; Jard, Christine; Lucas, Jacques; Adam, Jean Luc; Brissot, Pierre; Bureau, Bruno; Loréal, Olivier
2009-09-01
Fiber evanescent wave spectroscopy (FEWS) explores the mid-infrared domain, providing information on functional chemical groups represented in the sample. Our goal is to evaluate whether spectral fingerprints obtained by FEWS might orientate clinical diagnosis. Serum samples from normal volunteers and from four groups of patients with metabolic abnormalities are analyzed by FEWS. These groups consist of iron overloaded genetic hemochromatosis (GH), iron depleted GH, cirrhosis, and dysmetabolic hepatosiderosis (DYSH). A partial least squares (PLS) logistic method is used in a training group to create a classification algorithm, thereafter applied to a test group. Patients with cirrhosis or DYSH, two groups exhibiting important metabolic disturbances, are clearly discriminated from control groups with AUROC values of 0.94+/-0.05 and 0.90+/-0.06, and sensibility/specificity of 86/84% and 87/87%, respectively. When pooling all groups, the PLS method contributes to discriminate controls, cirrhotic, and dysmetabolic patients. Our data demonstrate that metabolic profiling using infrared FEWS is a possible way to investigate metabolic alterations in patients.
An atomistic fingerprint algorithm for learning ab initio molecular force fields
NASA Astrophysics Data System (ADS)
Tang, Yu-Hang; Zhang, Dongkun; Karniadakis, George Em
2018-01-01
Molecular fingerprints, i.e., feature vectors describing atomistic neighborhood configurations, is an important abstraction and a key ingredient for data-driven modeling of potential energy surface and interatomic force. In this paper, we present the density-encoded canonically aligned fingerprint algorithm, which is robust and efficient, for fitting per-atom scalar and vector quantities. The fingerprint is essentially a continuous density field formed through the superimposition of smoothing kernels centered on the atoms. Rotational invariance of the fingerprint is achieved by aligning, for each fingerprint instance, the neighboring atoms onto a local canonical coordinate frame computed from a kernel minisum optimization procedure. We show that this approach is superior over principal components analysis-based methods especially when the atomistic neighborhood is sparse and/or contains symmetry. We propose that the "distance" between the density fields be measured using a volume integral of their pointwise difference. This can be efficiently computed using optimal quadrature rules, which only require discrete sampling at a small number of grid points. We also experiment on the choice of weight functions for constructing the density fields and characterize their performance for fitting interatomic potentials. The applicability of the fingerprint is demonstrated through a set of benchmark problems.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Musah, Rabi A.; Espinoza, Edgard O.; Cody, Robert B.
A high throughput method for species identification and classification through chemometric processing of direct analysis in real time (DART) mass spectrometry-derived fingerprint signatures has been developed. The method entails introduction of samples to the open air space between the DART ion source and the mass spectrometer inlet, with the entire observed mass spectral fingerprint subjected to unsupervised hierarchical clustering processing. Moreover, a range of both polar and non-polar chemotypes are instantaneously detected. The result is identification and species level classification based on the entire DART-MS spectrum. In this paper, we illustrate how the method can be used to: (1) distinguishmore » between endangered woods regulated by the Convention for the International Trade of Endangered Flora and Fauna (CITES) treaty; (2) assess the origin and by extension the properties of biodiesel feedstocks; (3) determine insect species from analysis of puparial casings; (4) distinguish between psychoactive plants products; and (5) differentiate between Eucalyptus species. An advantage of the hierarchical clustering approach to processing of the DART-MS derived fingerprint is that it shows both similarities and differences between species based on their chemotypes. Furthermore, full knowledge of the identities of the constituents contained within the small molecule profile of analyzed samples is not required.« less
Musah, Rabi A.; Espinoza, Edgard O.; Cody, Robert B.; ...
2015-07-09
A high throughput method for species identification and classification through chemometric processing of direct analysis in real time (DART) mass spectrometry-derived fingerprint signatures has been developed. The method entails introduction of samples to the open air space between the DART ion source and the mass spectrometer inlet, with the entire observed mass spectral fingerprint subjected to unsupervised hierarchical clustering processing. Moreover, a range of both polar and non-polar chemotypes are instantaneously detected. The result is identification and species level classification based on the entire DART-MS spectrum. In this paper, we illustrate how the method can be used to: (1) distinguishmore » between endangered woods regulated by the Convention for the International Trade of Endangered Flora and Fauna (CITES) treaty; (2) assess the origin and by extension the properties of biodiesel feedstocks; (3) determine insect species from analysis of puparial casings; (4) distinguish between psychoactive plants products; and (5) differentiate between Eucalyptus species. An advantage of the hierarchical clustering approach to processing of the DART-MS derived fingerprint is that it shows both similarities and differences between species based on their chemotypes. Furthermore, full knowledge of the identities of the constituents contained within the small molecule profile of analyzed samples is not required.« less
Musah, Rabi A.; Espinoza, Edgard O.; Cody, Robert B.; Lesiak, Ashton D.; Christensen, Earl D.; Moore, Hannah E.; Maleknia, Simin; Drijfhout, Falko P.
2015-01-01
A high throughput method for species identification and classification through chemometric processing of direct analysis in real time (DART) mass spectrometry-derived fingerprint signatures has been developed. The method entails introduction of samples to the open air space between the DART ion source and the mass spectrometer inlet, with the entire observed mass spectral fingerprint subjected to unsupervised hierarchical clustering processing. A range of both polar and non-polar chemotypes are instantaneously detected. The result is identification and species level classification based on the entire DART-MS spectrum. Here, we illustrate how the method can be used to: (1) distinguish between endangered woods regulated by the Convention for the International Trade of Endangered Flora and Fauna (CITES) treaty; (2) assess the origin and by extension the properties of biodiesel feedstocks; (3) determine insect species from analysis of puparial casings; (4) distinguish between psychoactive plants products; and (5) differentiate between Eucalyptus species. An advantage of the hierarchical clustering approach to processing of the DART-MS derived fingerprint is that it shows both similarities and differences between species based on their chemotypes. Furthermore, full knowledge of the identities of the constituents contained within the small molecule profile of analyzed samples is not required. PMID:26156000
NASA Astrophysics Data System (ADS)
Musah, Rabi A.; Espinoza, Edgard O.; Cody, Robert B.; Lesiak, Ashton D.; Christensen, Earl D.; Moore, Hannah E.; Maleknia, Simin; Drijfhout, Falko P.
2015-07-01
A high throughput method for species identification and classification through chemometric processing of direct analysis in real time (DART) mass spectrometry-derived fingerprint signatures has been developed. The method entails introduction of samples to the open air space between the DART ion source and the mass spectrometer inlet, with the entire observed mass spectral fingerprint subjected to unsupervised hierarchical clustering processing. A range of both polar and non-polar chemotypes are instantaneously detected. The result is identification and species level classification based on the entire DART-MS spectrum. Here, we illustrate how the method can be used to: (1) distinguish between endangered woods regulated by the Convention for the International Trade of Endangered Flora and Fauna (CITES) treaty; (2) assess the origin and by extension the properties of biodiesel feedstocks; (3) determine insect species from analysis of puparial casings; (4) distinguish between psychoactive plants products; and (5) differentiate between Eucalyptus species. An advantage of the hierarchical clustering approach to processing of the DART-MS derived fingerprint is that it shows both similarities and differences between species based on their chemotypes. Furthermore, full knowledge of the identities of the constituents contained within the small molecule profile of analyzed samples is not required.
NASA Astrophysics Data System (ADS)
Darlow, Luke Nicholas; Connan, James
2015-11-01
Surface fingerprint scanners are limited to a two-dimensional representation of the fingerprint topography, and thus, are vulnerable to fingerprint damage, distortion, and counterfeiting. Optical coherence tomography (OCT) scanners are able to image (in three dimensions) the internal structure of the fingertip skin. Techniques for obtaining the internal fingerprint from OCT scans have since been developed. This research presents an internal fingerprint extraction algorithm designed to extract high-quality internal fingerprints from touchless OCT fingertip scans. Furthermore, it serves as a correlation study between surface and internal fingerprints. Provided the scanned region contains sufficient fingerprint information, correlation to the surface topography is shown to be good (74% have true matches). The cross-correlation of internal fingerprints (96% have true matches) is substantial that internal fingerprints can constitute a fingerprint database. The internal fingerprints' performance was also compared to the performance of cropped surface counterparts, to eliminate bias owing to information level present, showing that the internal fingerprints' performance is superior 63.6% of the time.
Gottschlich, Carsten
2016-01-01
We present a new type of local image descriptor which yields binary patterns from small image patches. For the application to fingerprint liveness detection, we achieve rotation invariant image patches by taking the fingerprint segmentation and orientation field into account. We compute the discrete cosine transform (DCT) for these rotation invariant patches and attain binary patterns by comparing pairs of two DCT coefficients. These patterns are summarized into one or more histograms per image. Each histogram comprises the relative frequencies of pattern occurrences. Multiple histograms are concatenated and the resulting feature vector is used for image classification. We name this novel type of descriptor convolution comparison pattern (CCP). Experimental results show the usefulness of the proposed CCP descriptor for fingerprint liveness detection. CCP outperforms other local image descriptors such as LBP, LPQ and WLD on the LivDet 2013 benchmark. The CCP descriptor is a general type of local image descriptor which we expect to prove useful in areas beyond fingerprint liveness detection such as biological and medical image processing, texture recognition, face recognition and iris recognition, liveness detection for face and iris images, and machine vision for surface inspection and material classification. PMID:26844544
Singh, Aarti; Poczos, Barnabas; Erickson, Kirk I.; Tseng, Wen-Yih I.; Verstynen, Timothy D.
2016-01-01
Quantifying differences or similarities in connectomes has been a challenge due to the immense complexity of global brain networks. Here we introduce a noninvasive method that uses diffusion MRI to characterize whole-brain white matter architecture as a single local connectome fingerprint that allows for a direct comparison between structural connectomes. In four independently acquired data sets with repeated scans (total N = 213), we show that the local connectome fingerprint is highly specific to an individual, allowing for an accurate self-versus-others classification that achieved 100% accuracy across 17,398 identification tests. The estimated classification error was approximately one thousand times smaller than fingerprints derived from diffusivity-based measures or region-to-region connectivity patterns for repeat scans acquired within 3 months. The local connectome fingerprint also revealed neuroplasticity within an individual reflected as a decreasing trend in self-similarity across time, whereas this change was not observed in the diffusivity measures. Moreover, the local connectome fingerprint can be used as a phenotypic marker, revealing 12.51% similarity between monozygotic twins, 5.14% between dizygotic twins, and 4.51% between none-twin siblings, relative to differences between unrelated subjects. This novel approach opens a new door for probing the influence of pathological, genetic, social, or environmental factors on the unique configuration of the human connectome. PMID:27846212
A Fingerprint Encryption Scheme Based on Irreversible Function and Secure Authentication
Yu, Jianping; Zhang, Peng; Wang, Shulan
2015-01-01
A fingerprint encryption scheme based on irreversible function has been designed in this paper. Since the fingerprint template includes almost the entire information of users' fingerprints, the personal authentication can be determined only by the fingerprint features. This paper proposes an irreversible transforming function (using the improved SHA1 algorithm) to transform the original minutiae which are extracted from the thinned fingerprint image. Then, Chinese remainder theorem is used to obtain the biokey from the integration of the transformed minutiae and the private key. The result shows that the scheme has better performance on security and efficiency comparing with other irreversible function schemes. PMID:25873989
A fingerprint encryption scheme based on irreversible function and secure authentication.
Yang, Yijun; Yu, Jianping; Zhang, Peng; Wang, Shulan
2015-01-01
A fingerprint encryption scheme based on irreversible function has been designed in this paper. Since the fingerprint template includes almost the entire information of users' fingerprints, the personal authentication can be determined only by the fingerprint features. This paper proposes an irreversible transforming function (using the improved SHA1 algorithm) to transform the original minutiae which are extracted from the thinned fingerprint image. Then, Chinese remainder theorem is used to obtain the biokey from the integration of the transformed minutiae and the private key. The result shows that the scheme has better performance on security and efficiency comparing with other irreversible function schemes.
Estimation of gender using cheiloscopy and dermatoglyphics
Tandon, Aanchal; Srivastava, Abhinav; Jaiswal, Rohit; Patidar, Madhvika; Khare, Aashish
2017-01-01
Background and Objective: Forensic dentistry plays a vital role in detection and resolution of crime, civil proceedings and personal identification. With ever-increasing demands placed upon law enforcement to provide sufficient physical evidence linking a perpetrator to a crime, it makes sense to utilize any type of physical characteristic to identify a suspect of an offense. The least invasive and cost-effective procedure among all methods of human identification is the study of lip prints and fingerprints. This study is done to determine the predominant pattern of fingerprint and lip print in males and females and to correlate it for gender identification. Materials and Methods: The study sample comprised 100 individuals (50 males and 50 females) aged between 20 and 50 years; dark-colored lipstick was applied uniformly on the lips. The glued portion of cellophane tape was dabbed first in the center and then pressed uniformly over the corner of lips. Cellophane tape was then stuck to a white chart sheet for the purpose of permanent record. Lip print patterns were analyzed following the classification of Suzuki and Tsuchihashi. The imprint of left thumb was taken on a white chart sheet using a blue ink stamp pad and visualized using magnifying lens. Fingerprints were analyzed by following the classification given by Kücken. Correlation of lip print and fingerprint was analyzed using Chi-square test. Results: The overall correlation of lip prints with fingerprints in males revealed branched lip pattern associated with whorl fingerprint and in females as vertical lip print pattern associated with loop fingerprint. Conclusion: We conclude that the study between lip print and fingerprint can aid in gender determination. PMID:29386811
Adaptive fuzzy system for 3-D vision
NASA Technical Reports Server (NTRS)
Mitra, Sunanda
1993-01-01
An adaptive fuzzy system using the concept of the Adaptive Resonance Theory (ART) type neural network architecture and incorporating fuzzy c-means (FCM) system equations for reclassification of cluster centers was developed. The Adaptive Fuzzy Leader Clustering (AFLC) architecture is a hybrid neural-fuzzy system which learns on-line in a stable and efficient manner. The system uses a control structure similar to that found in the Adaptive Resonance Theory (ART-1) network to identify the cluster centers initially. The initial classification of an input takes place in a two stage process; a simple competitive stage and a distance metric comparison stage. The cluster prototypes are then incrementally updated by relocating the centroid positions from Fuzzy c-Means (FCM) system equations for the centroids and the membership values. The operational characteristics of AFLC and the critical parameters involved in its operation are discussed. The performance of the AFLC algorithm is presented through application of the algorithm to the Anderson Iris data, and laser-luminescent fingerprint image data. The AFLC algorithm successfully classifies features extracted from real data, discrete or continuous, indicating the potential strength of this new clustering algorithm in analyzing complex data sets. The hybrid neuro-fuzzy AFLC algorithm will enhance analysis of a number of difficult recognition and control problems involved with Tethered Satellite Systems and on-orbit space shuttle attitude controller.
Maraschin, Marcelo; Somensi-Zeggio, Amélia; Oliveira, Simone K; Kuhnen, Shirley; Tomazzoli, Maíra M; Raguzzoni, Josiane C; Zeri, Ana C M; Carreira, Rafael; Correia, Sara; Costa, Christopher; Rocha, Miguel
2016-01-22
The chemical composition of propolis is affected by environmental factors and harvest season, making it difficult to standardize its extracts for medicinal usage. By detecting a typical chemical profile associated with propolis from a specific production region or season, certain types of propolis may be used to obtain a specific pharmacological activity. In this study, propolis from three agroecological regions (plain, plateau, and highlands) from southern Brazil, collected over the four seasons of 2010, were investigated through a novel NMR-based metabolomics data analysis workflow. Chemometrics and machine learning algorithms (PLS-DA and RF), including methods to estimate variable importance in classification, were used in this study. The machine learning and feature selection methods permitted construction of models for propolis sample classification with high accuracy (>75%, reaching ∼90% in the best case), better discriminating samples regarding their collection seasons comparatively to the harvest regions. PLS-DA and RF allowed the identification of biomarkers for sample discrimination, expanding the set of discriminating features and adding relevant information for the identification of the class-determining metabolites. The NMR-based metabolomics analytical platform, coupled to bioinformatic tools, allowed characterization and classification of Brazilian propolis samples regarding the metabolite signature of important compounds, i.e., chemical fingerprint, harvest seasons, and production regions.
Jiménez-Carvelo, Ana M; Pérez-Castaño, Estefanía; González-Casado, Antonio; Cuadros-Rodríguez, Luis
2017-04-15
A new method for differentiation of olive oil (independently of the quality category) from other vegetable oils (canola, safflower, corn, peanut, seeds, grapeseed, palm, linseed, sesame and soybean) has been developed. The analytical procedure for chromatographic fingerprinting of the methyl-transesterified fraction of each vegetable oil, using normal-phase liquid chromatography, is described and the chemometric strategies applied and discussed. Some chemometric methods, such as k-nearest neighbours (kNN), partial least squared-discriminant analysis (PLS-DA), support vector machine classification analysis (SVM-C), and soft independent modelling of class analogies (SIMCA), were applied to build classification models. Performance of the classification was evaluated and ranked using several classification quality metrics. The discriminant analysis, based on the use of one input-class, (plus a dummy class) was applied for the first time in this study. Copyright © 2016 Elsevier Ltd. All rights reserved.
Comparative study of minutiae selection algorithms for ISO fingerprint templates
NASA Astrophysics Data System (ADS)
Vibert, B.; Charrier, C.; Le Bars, J.-M.; Rosenberger, C.
2015-03-01
We address the selection of fingerprint minutiae given a fingerprint ISO template. Minutiae selection plays a very important role when a secure element (i.e. a smart-card) is used. Because of the limited capability of computation and memory, the number of minutiae of a stored reference in the secure element is limited. We propose in this paper a comparative study of 6 minutiae selection methods including 2 methods from the literature and 1 like reference (No Selection). Experimental results on 3 fingerprint databases from the Fingerprint Verification Competition show their relative efficiency in terms of performance and computation time.
identification. URE from ten MSP430F5529 16-bit microcontrollers were analyzed using: 1) RF distinct native attributes (RF-DNA) fingerprints paired with multiple...discriminant analysis/maximum likelihood (MDA/ML) classification, 2) RF-DNA fingerprints paired with generalized relevance learning vector quantized
Accurate Identification of Cancerlectins through Hybrid Machine Learning Technology.
Zhang, Jieru; Ju, Ying; Lu, Huijuan; Xuan, Ping; Zou, Quan
2016-01-01
Cancerlectins are cancer-related proteins that function as lectins. They have been identified through computational identification techniques, but these techniques have sometimes failed to identify proteins because of sequence diversity among the cancerlectins. Advanced machine learning identification methods, such as support vector machine and basic sequence features (n-gram), have also been used to identify cancerlectins. In this study, various protein fingerprint features and advanced classifiers, including ensemble learning techniques, were utilized to identify this group of proteins. We improved the prediction accuracy of the original feature extraction methods and classification algorithms by more than 10% on average. Our work provides a basis for the computational identification of cancerlectins and reveals the power of hybrid machine learning techniques in computational proteomics.
NASA Astrophysics Data System (ADS)
Reis-Santos, Patrick; Gillanders, Bronwyn M.; Tanner, Susanne E.; Vasconcelos, Rita P.; Elsdon, Travis S.; Cabral, Henrique N.
2012-10-01
The chemical composition of fish otoliths can provide valuable information for determining the nursery value of estuaries to adult populations of coastal fishes. However, understanding temporal variation in elemental fingerprints at different scales is important as it can potentially confound spatial discrimination among estuaries. Otolith elemental ratios (Li:Ca, Mg:Ca, Mn:Ca, Cu:Ca, Sr:Ca, Ba:Ca and Pb:Ca) of Platichthys flesus and Dicentrarchus labrax, from several estuaries along the Portuguese coast in two years and three seasons (spring, summer and autumn) within a year, were determined via Laser Ablation Inductively Coupled Plasma Mass Spectrometry. Elemental fingerprints varied significantly among years and seasons within a year but we achieved accurate classifications of juvenile fish to estuarine nursery of origin (77-96% overall cross-validated accuracy). Although elemental fingerprints were year-specific, variation among seasons did not hinder spatial discrimination. Estuarine fingerprints of pooled seasonal data were representative of the entire juvenile year class and attained high discrimination (77% and 80% overall cross-validated accuracy for flounder and sea bass, respectively). Incorporating seasonal variation resulted in up to an 11% increase in correct classification of individual estuaries, in comparison to seasons where accuracies were lowest. Overall, understanding the implications of temporal variations in otolith chemistry for spatial discrimination is key to establish baseline data for connectivity studies.
NASA Astrophysics Data System (ADS)
Yu, Eugene; Craver, Scott
2006-02-01
Wow, or time warping caused by speed fluctuations in analog audio equipment, provides a wealth of applications in watermarking. Very subtle temporal distortion has been used to defeat watermarks, and as components in watermarking systems. In the image domain, the analogous warping of an image's canvas has been used both to defeat watermarks and also proposed to prevent collusion attacks on fingerprinting systems. In this paper, we explore how subliminal levels of wow can be used for steganography and fingerprinting. We present both a low-bitrate robust solution and a higher-bitrate solution intended for steganographic communication. As already observed, such a fingerprinting algorithm naturally discourages collusion by averaging, owing to flanging effects when misaligned audio is averaged. Another advantage of warping is that even when imperceptible, it can be beyond the reach of compression algorithms. We use this opportunity to debunk the common misconception that steganography is impossible under "perfect compression."
Adding localization information in a fingerprint binary feature vector representation
NASA Astrophysics Data System (ADS)
Bringer, Julien; Despiegel, Vincent; Favre, Mélanie
2011-06-01
At BTAS'10, a new framework to transform a fingerprint minutiae template into a binary feature vector of fixed length is described. A fingerprint is characterized by its similarity with a fixed number set of representative local minutiae vicinities. This approach by representative leads to a fixed length binary representation, and, as the approach is local, it enables to deal with local distortions that may occur between two acquisitions. We extend this construction to incorporate additional information in the binary vector, in particular on localization of the vicinities. We explore the use of position and orientation information. The performance improvement is promising for utilization into fast identification algorithms or into privacy protection algorithms.
Bayesian estimation of multicomponent relaxation parameters in magnetic resonance fingerprinting.
McGivney, Debra; Deshmane, Anagha; Jiang, Yun; Ma, Dan; Badve, Chaitra; Sloan, Andrew; Gulani, Vikas; Griswold, Mark
2018-07-01
To estimate multiple components within a single voxel in magnetic resonance fingerprinting when the number and types of tissues comprising the voxel are not known a priori. Multiple tissue components within a single voxel are potentially separable with magnetic resonance fingerprinting as a result of differences in signal evolutions of each component. The Bayesian framework for inverse problems provides a natural and flexible setting for solving this problem when the tissue composition per voxel is unknown. Assuming that only a few entries from the dictionary contribute to a mixed signal, sparsity-promoting priors can be placed upon the solution. An iterative algorithm is applied to compute the maximum a posteriori estimator of the posterior probability density to determine the magnetic resonance fingerprinting dictionary entries that contribute most significantly to mixed or pure voxels. Simulation results show that the algorithm is robust in finding the component tissues of mixed voxels. Preliminary in vivo data confirm this result, and show good agreement in voxels containing pure tissue. The Bayesian framework and algorithm shown provide accurate solutions for the partial-volume problem in magnetic resonance fingerprinting. The flexibility of the method will allow further study into different priors and hyperpriors that can be applied in the model. Magn Reson Med 80:159-170, 2018. © 2017 International Society for Magnetic Resonance in Medicine. © 2017 International Society for Magnetic Resonance in Medicine.
Magagna, Federico; Guglielmetti, Alessandro; Liberto, Erica; Reichenbach, Stephen E; Allegrucci, Elena; Gobino, Guido; Bicchi, Carlo; Cordero, Chiara
2017-08-02
This study investigates chemical information of volatile fractions of high-quality cocoa (Theobroma cacao L. Malvaceae) from different origins (Mexico, Ecuador, Venezuela, Columbia, Java, Trinidad, and Sao Tomè) produced for fine chocolate. This study explores the evolution of the entire pattern of volatiles in relation to cocoa processing (raw, roasted, steamed, and ground beans). Advanced chemical fingerprinting (e.g., combined untargeted and targeted fingerprinting) with comprehensive two-dimensional gas chromatography coupled with mass spectrometry allows advanced pattern recognition for classification, discrimination, and sensory-quality characterization. The entire data set is analyzed for 595 reliable two-dimensional peak regions, including 130 known analytes and 13 potent odorants. Multivariate analysis with unsupervised exploration (principal component analysis) and simple supervised discrimination methods (Fisher ratios and linear regression trees) reveal informative patterns of similarities and differences and identify characteristic compounds related to sample origin and manufacturing step.
3D matching techniques using OCT fingerprint point clouds
NASA Astrophysics Data System (ADS)
Gutierrez da Costa, Henrique S.; Silva, Luciano; Bellon, Olga R. P.; Bowden, Audrey K.; Czovny, Raphael K.
2017-02-01
Optical Coherence Tomography (OCT) makes viable acquisition of 3D fingerprints from both dermis and epidermis skin layers and their interfaces, exposing features that can be explored to improve biometric identification such as the curvatures and distinctive 3D regions. Scanned images from eleven volunteers allowed the construction of the first OCT 3D fingerprint database, to our knowledge, containing epidermal and dermal fingerprints. 3D dermal fingerprints can be used to overcome cases of Failure to Enroll (FTE) due to poor ridge image quality and skin alterations, cases that affect 2D matching performance. We evaluate three matching techniques, including the well-established Iterative Closest Points algorithm (ICP), Surface Interpenetration Measure (SIM) and the well-known KH Curvature Maps, all assessed using a 3D OCT fingerprint database, the first one for this purpose. Two of these techniques are based on registration techniques and one on curvatures. These were evaluated, compared and the fusion of matching scores assessed. We applied a sequence of steps to extract regions of interest named (ROI) minutiae clouds, representing small regions around distinctive minutia, usually located at ridges/valleys endings or bifurcations. The obtained ROI is acquired from the epidermis and dermis-epidermis interface by OCT imaging. A comparative analysis of identification accuracy was explored using different scenarios and the obtained results shows improvements for biometric identification. A comparison against 2D fingerprint matching algorithms is also presented to assess the improvements.
Fingerprint recognition system by use of graph matching
NASA Astrophysics Data System (ADS)
Shen, Wei; Shen, Jun; Zheng, Huicheng
2001-09-01
Fingerprint recognition is an important subject in biometrics to identify or verify persons by physiological characteristics, and has found wide applications in different domains. In the present paper, we present a finger recognition system that combines singular points and structures. The principal steps of processing in our system are: preprocessing and ridge segmentation, singular point extraction and selection, graph representation, and finger recognition by graphs matching. Our fingerprint recognition system is implemented and tested for many fingerprint images and the experimental result are satisfactory. Different techniques are used in our system, such as fast calculation of orientation field, local fuzzy dynamical thresholding, algebraic analysis of connections and fingerprints representation and matching by graphs. Wed find that for fingerprint database that is not very large, the recognition rate is very high even without using a prior coarse category classification. This system works well for both one-to-few and one-to-many problems.
The FBI wavelet/scalar quantization standard for gray-scale fingerprint image compression
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bradley, J.N.; Brislawn, C.M.; Hopper, T.
1993-05-01
The FBI has recently adopted a standard for the compression of digitized 8-bit gray-scale fingerprint images. The standard is based on scalar quantization of a 64-subband discrete wavelet transform decomposition of the images, followed by Huffman coding. Novel features of the algorithm include the use of symmetric boundary conditions for transforming finite-length signals and a subband decomposition tailored for fingerprint images scanned at 500 dpi. The standard is intended for use in conjunction with ANSI/NBS-CLS 1-1993, American National Standard Data Format for the Interchange of Fingerprint Information, and the FBI`s Integrated Automated Fingerprint Identification System.
The FBI wavelet/scalar quantization standard for gray-scale fingerprint image compression
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bradley, J.N.; Brislawn, C.M.; Hopper, T.
1993-01-01
The FBI has recently adopted a standard for the compression of digitized 8-bit gray-scale fingerprint images. The standard is based on scalar quantization of a 64-subband discrete wavelet transform decomposition of the images, followed by Huffman coding. Novel features of the algorithm include the use of symmetric boundary conditions for transforming finite-length signals and a subband decomposition tailored for fingerprint images scanned at 500 dpi. The standard is intended for use in conjunction with ANSI/NBS-CLS 1-1993, American National Standard Data Format for the Interchange of Fingerprint Information, and the FBI's Integrated Automated Fingerprint Identification System.
Advanced fingerprint verification software
NASA Astrophysics Data System (ADS)
Baradarani, A.; Taylor, J. R. B.; Severin, F.; Maev, R. Gr.
2016-05-01
We have developed a fingerprint software package that can be used in a wide range of applications from law enforcement to public and private security systems, and to personal devices such as laptops, vehicles, and door- locks. The software and processing units are a unique implementation of new and sophisticated algorithms that compete with the current best systems in the world. Development of the software package has been in line with the third generation of our ultrasonic fingerprinting machine1. Solid and robust performance is achieved in the presence of misplaced and low quality fingerprints.
The FBI compression standard for digitized fingerprint images
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brislawn, C.M.; Bradley, J.N.; Onyshczak, R.J.
1996-10-01
The FBI has formulated national standards for digitization and compression of gray-scale fingerprint images. The compression algorithm for the digitized images is based on adaptive uniform scalar quantization of a discrete wavelet transform subband decomposition, a technique referred to as the wavelet/scalar quantization method. The algorithm produces archival-quality images at compression ratios of around 15 to 1 and will allow the current database of paper fingerprint cards to be replaced by digital imagery. A compliance testing program is also being implemented to ensure high standards of image quality and interchangeability of data between different implementations. We will review the currentmore » status of the FBI standard, including the compliance testing process and the details of the first-generation encoder.« less
FBI compression standard for digitized fingerprint images
NASA Astrophysics Data System (ADS)
Brislawn, Christopher M.; Bradley, Jonathan N.; Onyshczak, Remigius J.; Hopper, Thomas
1996-11-01
The FBI has formulated national standards for digitization and compression of gray-scale fingerprint images. The compression algorithm for the digitized images is based on adaptive uniform scalar quantization of a discrete wavelet transform subband decomposition, a technique referred to as the wavelet/scalar quantization method. The algorithm produces archival-quality images at compression ratios of around 15 to 1 and will allow the current database of paper fingerprint cards to be replaced by digital imagery. A compliance testing program is also being implemented to ensure high standards of image quality and interchangeability of data between different implementations. We will review the current status of the FBI standard, including the compliance testing process and the details of the first-generation encoder.
Braem, G; De Vliegher, S; Supré, K; Haesebrouck, F; Leroy, F; De Vuyst, L
2011-01-10
Due to significant financial losses in the dairy cattle farming industry caused by mastitis and the possible influence of coagulase-negative staphylococci (CNS) in the development of this disease, accurate identification methods are needed that untangle the different species of the diverse CNS group. In this study, 39 Staphylococcus type strains and 253 field isolates were subjected to (GTG)(5)-PCR fingerprinting to construct a reference framework for the classification and identification of different CNS from (sub)clinical milk samples and teat apices swabs. Validation of the reference framework was performed by dividing the field isolates in two separate groups and testing whether one group of field isolates, in combination with type strains, could be used for a correct classification and identification of a second group of field isolates. (GTG)(5)-PCR fingerprinting achieved a typeability of 94.7% and an accuracy of 94.3% compared to identifications based on gene sequencing. The study shows the usefulness of the method to determine the identity of bovine Staphylococcus species, provided an identification framework updated with field isolates is available. Copyright © 2010 Elsevier B.V. All rights reserved.
A network identity authentication system based on Fingerprint identification technology
NASA Astrophysics Data System (ADS)
Xia, Hong-Bin; Xu, Wen-Bo; Liu, Yuan
2005-10-01
Fingerprint verification is one of the most reliable personal identification methods. However, most of the automatic fingerprint identification system (AFIS) is not run via Internet/Intranet environment to meet today's increasing Electric commerce requirements. This paper describes the design and implementation of the archetype system of identity authentication based on fingerprint biometrics technology, and the system can run via Internet environment. And in our system the COM and ASP technology are used to integrate Fingerprint technology with Web database technology, The Fingerprint image preprocessing algorithms are programmed into COM, which deployed on the internet information server. The system's design and structure are proposed, and the key points are discussed. The prototype system of identity authentication based on Fingerprint have been successfully tested and evaluated on our university's distant education applications in an internet environment.
A robust and non-invertible fingerprint template for fingerprint matching system.
Trivedi, Amit Kumar; Thounaojam, Dalton Meitei; Pal, Shyamosree
2018-05-20
Fingerprint Recognition System is widely deployed in variety of application domain, ranging from forensic to mobile phones. Its widespread deployment in various applications were person authentication are required, has caused concern that a leaked fingerprint template may be used to reconstruct the original fingerprint and the reconstructed fingerprint can be used to circumvent all the applications the person is enrolled. In this paper, a non-invertible fingerprint template that stores only the relative geometric information about the minutiae points is proposed. The spatial location of the minutiae points in original fingerprint and its orientations are not available in the proposed template which makes it impossible to estimate the orientation of fingerprint from the template. The proposed template is invariant to rotation, translation and distortion and immune to reconstruction algorithm. The proposed system is experimented using standard FVC2000 database and yields better results in terms of EER and FMR as compared with latest techniques. Copyright © 2018 Elsevier B.V. All rights reserved.
Shawky, Eman; Abou El Kheir, Rasha M
2018-02-11
Species of Apiaceae are used in folk medicine as spices and in officinal medicinal preparations of drugs. They are an excellent source of phenolics exhibiting antioxidant activity, which are of great benefit to human health. Discrimination among Apiaceae medicinal herbs remains an intricate challenge due to their morphological similarity. In this study, a combined "untargeted" and "targeted" approach to investigate different Apiaceae plants species was proposed by using the merging of high-performance thin layer chromatography (HPTLC)-image analysis and pattern recognition methods which were used for fingerprinting and classification of 42 different Apiaceae samples collected from Egypt. Software for image processing was applied for fingerprinting and data acquisition. HPTLC fingerprint assisted by principal component analysis (PCA) and hierarchical cluster analysis (HCA)-heat maps resulted in a reliable untargeted approach for discrimination and classification of different samples. The "targeted" approach was performed by developing and validating an HPTLC method allowing the quantification of eight flavonoids. The combination of quantitative data with PCA and HCA-heat-maps allowed the different samples to be discriminated from each other. The use of chemometrics tools for evaluation of fingerprints reduced expense and analysis time. The proposed method can be adopted for routine discrimination and evaluation of the phytochemical variability in different Apiaceae species extracts. Copyright © 2018 John Wiley & Sons, Ltd.
Teo, Chin Chye; Tan, Swee Ngin; Yong, Jean Wan Hong; Hew, Choy Sin; Ong, Eng Shi
2009-02-01
An approach that combined green-solvent methods of extraction with chromatographic chemical fingerprint and pattern recognition tools such as principal component analysis (PCA) was used to evaluate the quality of medicinal plants. Pressurized hot water extraction (PHWE) and microwave-assisted extraction (MAE) were used and their extraction efficiencies to extract two bioactive compounds, namely stevioside (SV) and rebaudioside A (RA), from Stevia rebaudiana Bertoni (SB) under different cultivation conditions were compared. The proposed methods showed that SV and RA could be extracted from SB using pure water under optimized conditions. The extraction efficiency of the methods was observed to be higher or comparable to heating under reflux with water. The method precision (RSD, n = 6) was found to vary from 1.91 to 2.86% for the two different methods on different days. Compared to PHWE, MAE has higher extraction efficiency with shorter extraction time. MAE was also found to extract more chemical constituents and provide distinctive chemical fingerprints for quality control purposes. Thus, a combination of MAE with chromatographic chemical fingerprints and PCA provided a simple and rapid approach for the comparison and classification of medicinal plants from different growth conditions. Hence, the current work highlighted the importance of extraction method in chemical fingerprinting for the classification of medicinal plants from different cultivation conditions with the aid of pattern recognition tools used.
Sun, Guoxiang; Zhang, Jingxian
2009-05-01
The three wavelength fusion high performance liquid chromatographic fingerprin (TWFFP) of Longdanxiegan pill (LDXGP) was established to identify the quality of LDXGP by the systematic quantified fingerprint method. The chromatographic fingerprints (CFPs) of the 12 batches of LDXGP were determined by reversed-phase high performance liquid chromatography. The technique of multi-wavelength fusion fingerprint was applied during processing the fingerprints. The TWFFPs containing 63 co-possessing peaks were obtained when choosing baicalin peak as the referential peak. The 12 batches of LDXGP were identified with hierarchical clustering analysis by using macro qualitative similarity (S(m)) as the variable. According to the results of classification, the referential fingerprint (RFP) was synthesized from 10 batches of LDXGP. Taking the RFP for the qualified model, all the 12 batches of LDXGP were evaluated by the systematic quantified fingerprint method. Among the 12 batches of LDXGP, 9 batches were completely qualified, the contents of 1 batch were obviously higher while the chemical constituents quantity and distributed proportion in 2 batches were not qualified. The systematic quantified fingerprint method based on the technique of multi-wavelength fusion fingerprint ca effectively identify the authentic quality of traditional Chinese medicine.
Webb, Samuel J; Hanser, Thierry; Howlin, Brendan; Krause, Paul; Vessey, Jonathan D
2014-03-25
A new algorithm has been developed to enable the interpretation of black box models. The developed algorithm is agnostic to learning algorithm and open to all structural based descriptors such as fragments, keys and hashed fingerprints. The algorithm has provided meaningful interpretation of Ames mutagenicity predictions from both random forest and support vector machine models built on a variety of structural fingerprints.A fragmentation algorithm is utilised to investigate the model's behaviour on specific substructures present in the query. An output is formulated summarising causes of activation and deactivation. The algorithm is able to identify multiple causes of activation or deactivation in addition to identifying localised deactivations where the prediction for the query is active overall. No loss in performance is seen as there is no change in the prediction; the interpretation is produced directly on the model's behaviour for the specific query. Models have been built using multiple learning algorithms including support vector machine and random forest. The models were built on public Ames mutagenicity data and a variety of fingerprint descriptors were used. These models produced a good performance in both internal and external validation with accuracies around 82%. The models were used to evaluate the interpretation algorithm. Interpretation was revealed that links closely with understood mechanisms for Ames mutagenicity. This methodology allows for a greater utilisation of the predictions made by black box models and can expedite further study based on the output for a (quantitative) structure activity model. Additionally the algorithm could be utilised for chemical dataset investigation and knowledge extraction/human SAR development.
An Improved Algorithm to Generate a Wi-Fi Fingerprint Database for Indoor Positioning
Chen, Lina; Li, Binghao; Zhao, Kai; Rizos, Chris; Zheng, Zhengqi
2013-01-01
The major problem of Wi-Fi fingerprint-based positioning technology is the signal strength fingerprint database creation and maintenance. The significant temporal variation of received signal strength (RSS) is the main factor responsible for the positioning error. A probabilistic approach can be used, but the RSS distribution is required. The Gaussian distribution or an empirically-derived distribution (histogram) is typically used. However, these distributions are either not always correct or require a large amount of data for each reference point. Double peaks of the RSS distribution have been observed in experiments at some reference points. In this paper a new algorithm based on an improved double-peak Gaussian distribution is proposed. Kurtosis testing is used to decide if this new distribution, or the normal Gaussian distribution, should be applied. Test results show that the proposed algorithm can significantly improve the positioning accuracy, as well as reduce the workload of the off-line data training phase. PMID:23966197
An improved algorithm to generate a Wi-Fi fingerprint database for indoor positioning.
Chen, Lina; Li, Binghao; Zhao, Kai; Rizos, Chris; Zheng, Zhengqi
2013-08-21
The major problem of Wi-Fi fingerprint-based positioning technology is the signal strength fingerprint database creation and maintenance. The significant temporal variation of received signal strength (RSS) is the main factor responsible for the positioning error. A probabilistic approach can be used, but the RSS distribution is required. The Gaussian distribution or an empirically-derived distribution (histogram) is typically used. However, these distributions are either not always correct or require a large amount of data for each reference point. Double peaks of the RSS distribution have been observed in experiments at some reference points. In this paper a new algorithm based on an improved double-peak Gaussian distribution is proposed. Kurtosis testing is used to decide if this new distribution, or the normal Gaussian distribution, should be applied. Test results show that the proposed algorithm can significantly improve the positioning accuracy, as well as reduce the workload of the off-line data training phase.
Tashima, Hideaki; Takeda, Masafumi; Suzuki, Hiroyuki; Obi, Takashi; Yamaguchi, Masahiro; Ohyama, Nagaaki
2010-06-21
We have shown that the application of double random phase encoding (DRPE) to biometrics enables the use of biometrics as cipher keys for binary data encryption. However, DRPE is reported to be vulnerable to known-plaintext attacks (KPAs) using a phase recovery algorithm. In this study, we investigated the vulnerability of DRPE using fingerprints as cipher keys to the KPAs. By means of computational experiments, we estimated the encryption key and restored the fingerprint image using the estimated key. Further, we propose a method for avoiding the KPA on the DRPE that employs the phase retrieval algorithm. The proposed method makes the amplitude component of the encrypted image constant in order to prevent the amplitude component of the encrypted image from being used as a clue for phase retrieval. Computational experiments showed that the proposed method not only avoids revealing the cipher key and the fingerprint but also serves as a sufficiently accurate verification system.
Dual Resolution Images from Paired Fingerprint Cards
National Institute of Standards and Technology Data Gateway
NIST Dual Resolution Images from Paired Fingerprint Cards (Web, free access) NIST Special Database 30 is being distributed for use in development and testing of fingerprint compression and fingerprint matching systems. The database allows the user to develop and evaluate data compression algorithms for fingerprint images scanned at both 19.7 ppmm (500 dpi) and 39.4 ppmm (1000 dpi). The data consist of 36 ten-print paired cards with both the rolled and plain images scanned at 19.7 and 39.4 pixels per mm. A newer version of the compression/decompression software on the CDROM can be found at the website http://www.nist.gov/itl/iad/ig/nigos.cfm as part of the NBIS package.
MR fingerprinting reconstruction with Kalman filter.
Zhang, Xiaodi; Zhou, Zechen; Chen, Shiyang; Chen, Shuo; Li, Rui; Hu, Xiaoping
2017-09-01
Magnetic resonance fingerprinting (MR fingerprinting or MRF) is a newly introduced quantitative magnetic resonance imaging technique, which enables simultaneous multi-parameter mapping in a single acquisition with improved time efficiency. The current MRF reconstruction method is based on dictionary matching, which may be limited by the discrete and finite nature of the dictionary and the computational cost associated with dictionary construction, storage and matching. In this paper, we describe a reconstruction method based on Kalman filter for MRF, which avoids the use of dictionary to obtain continuous MR parameter measurements. With this Kalman filter framework, the Bloch equation of inversion-recovery balanced steady state free-precession (IR-bSSFP) MRF sequence was derived to predict signal evolution, and acquired signal was entered to update the prediction. The algorithm can gradually estimate the accurate MR parameters during the recursive calculation. Single pixel and numeric brain phantom simulation were implemented with Kalman filter and the results were compared with those from dictionary matching reconstruction algorithm to demonstrate the feasibility and assess the performance of Kalman filter algorithm. The results demonstrated that Kalman filter algorithm is applicable for MRF reconstruction, eliminating the need for a pre-define dictionary and obtaining continuous MR parameter in contrast to the dictionary matching algorithm. Copyright © 2017 Elsevier Inc. All rights reserved.
Filter Design and Performance Evaluation for Fingerprint Image Segmentation
Thai, Duy Hoang; Huckemann, Stephan; Gottschlich, Carsten
2016-01-01
Fingerprint recognition plays an important role in many commercial applications and is used by millions of people every day, e.g. for unlocking mobile phones. Fingerprint image segmentation is typically the first processing step of most fingerprint algorithms and it divides an image into foreground, the region of interest, and background. Two types of error can occur during this step which both have a negative impact on the recognition performance: ‘true’ foreground can be labeled as background and features like minutiae can be lost, or conversely ‘true’ background can be misclassified as foreground and spurious features can be introduced. The contribution of this paper is threefold: firstly, we propose a novel factorized directional bandpass (FDB) segmentation method for texture extraction based on the directional Hilbert transform of a Butterworth bandpass (DHBB) filter interwoven with soft-thresholding. Secondly, we provide a manually marked ground truth segmentation for 10560 images as an evaluation benchmark. Thirdly, we conduct a systematic performance comparison between the FDB method and four of the most often cited fingerprint segmentation algorithms showing that the FDB segmentation method clearly outperforms these four widely used methods. The benchmark and the implementation of the FDB method are made publicly available. PMID:27171150
Improving the recognition of fingerprint biometric system using enhanced image fusion
NASA Astrophysics Data System (ADS)
Alsharif, Salim; El-Saba, Aed; Stripathi, Reshma
2010-04-01
Fingerprints recognition systems have been widely used by financial institutions, law enforcement, border control, visa issuing, just to mention few. Biometric identifiers can be counterfeited, but considered more reliable and secure compared to traditional ID cards or personal passwords methods. Fingerprint pattern fusion improves the performance of a fingerprint recognition system in terms of accuracy and security. This paper presents digital enhancement and fusion approaches that improve the biometric of the fingerprint recognition system. It is a two-step approach. In the first step raw fingerprint images are enhanced using high-frequency-emphasis filtering (HFEF). The second step is a simple linear fusion process between the raw images and the HFEF ones. It is shown that the proposed approach increases the verification and identification of the fingerprint biometric recognition system, where any improvement is justified using the correlation performance metrics of the matching algorithm.
Altered fingerprints: analysis and detection.
Yoon, Soweon; Feng, Jianjiang; Jain, Anil K
2012-03-01
The widespread deployment of Automated Fingerprint Identification Systems (AFIS) in law enforcement and border control applications has heightened the need for ensuring that these systems are not compromised. While several issues related to fingerprint system security have been investigated, including the use of fake fingerprints for masquerading identity, the problem of fingerprint alteration or obfuscation has received very little attention. Fingerprint obfuscation refers to the deliberate alteration of the fingerprint pattern by an individual for the purpose of masking his identity. Several cases of fingerprint obfuscation have been reported in the press. Fingerprint image quality assessment software (e.g., NFIQ) cannot always detect altered fingerprints since the implicit image quality due to alteration may not change significantly. The main contributions of this paper are: 1) compiling case studies of incidents where individuals were found to have altered their fingerprints for circumventing AFIS, 2) investigating the impact of fingerprint alteration on the accuracy of a commercial fingerprint matcher, 3) classifying the alterations into three major categories and suggesting possible countermeasures, 4) developing a technique to automatically detect altered fingerprints based on analyzing orientation field and minutiae distribution, and 5) evaluating the proposed technique and the NFIQ algorithm on a large database of altered fingerprints provided by a law enforcement agency. Experimental results show the feasibility of the proposed approach in detecting altered fingerprints and highlight the need to further pursue this problem.
Implementation of Wi-Fi Signal Sampling on an Android Smartphone for Indoor Positioning Systems.
Liu, Hung-Huan; Liu, Chun
2017-12-21
Collecting and maintaining radio fingerprint for wireless indoor positioning systems involves considerable time and labor. We have proposed the quick radio fingerprint collection (QRFC) algorithm which employed the built-in accelerometer of Android smartphones to implement step detection in order to assist in collecting radio fingerprints. In the present study, we divided the algorithm into moving sampling (MS) and stepped MS (SMS), and describe the implementation of both algorithms and their comparison. Technical details and common errors concerning the use of Android smartphones to collect Wi-Fi radio beacons were surveyed and discussed. The results of signal sampling experiments performed in a hallway measuring 54 m in length showed that in terms of the amount of time required to complete collection of access point (AP) signals, static sampling (SS; a traditional procedure for collecting Wi-Fi signals) took at least 2 h, whereas MS and SMS took approximately 150 and 300 s, respectively. Notably, AP signals obtained through MS and SMS were comparable to those obtained through SS in terms of the distribution of received signal strength indicator (RSSI) and positioning accuracy. Therefore, MS and SMS are recommended instead of SS as signal sampling procedures for indoor positioning algorithms.
Implementation of Wi-Fi Signal Sampling on an Android Smartphone for Indoor Positioning Systems
Liu, Chun
2017-01-01
Collecting and maintaining radio fingerprint for wireless indoor positioning systems involves considerable time and labor. We have proposed the quick radio fingerprint collection (QRFC) algorithm which employed the built-in accelerometer of Android smartphones to implement step detection in order to assist in collecting radio fingerprints. In the present study, we divided the algorithm into moving sampling (MS) and stepped MS (SMS), and describe the implementation of both algorithms and their comparison. Technical details and common errors concerning the use of Android smartphones to collect Wi-Fi radio beacons were surveyed and discussed. The results of signal sampling experiments performed in a hallway measuring 54 m in length showed that in terms of the amount of time required to complete collection of access point (AP) signals, static sampling (SS; a traditional procedure for collecting Wi-Fi signals) took at least 2 h, whereas MS and SMS took approximately 150 and 300 s, respectively. Notably, AP signals obtained through MS and SMS were comparable to those obtained through SS in terms of the distribution of received signal strength indicator (RSSI) and positioning accuracy. Therefore, MS and SMS are recommended instead of SS as signal sampling procedures for indoor positioning algorithms. PMID:29267234
Cai, Yong; Li, Xiwen; Li, Mei; Chen, Xiaojia; Hu, Hao; Ni, Jingyun; Wang, Yitao
2015-01-01
Chemical fingerprinting is currently a widely used tool that enables rapid and accurate quality evaluation of Traditional Chinese Medicine (TCM). However, chemical fingerprints are not amenable to information storage, recognition, and retrieval, which limit their use in Chinese medicine traceability. In this study, samples of three kinds of Chinese medicines were randomly selected and chemical fingerprints were then constructed by using high performance liquid chromatography. Based on chemical data, the process of converting the TCM chemical fingerprint into two-dimensional code is presented; preprocess and filtering algorithm are also proposed aiming at standardizing the large amount of original raw data. In order to know which type of two-dimensional code (2D) is suitable for storing data of chemical fingerprints, current popular types of 2D codes are analyzed and compared. Results show that QR Code is suitable for recording the TCM chemical fingerprint. The fingerprint information of TCM can be converted into data format that can be stored as 2D code for traceability and quality control.
Ni, Yongnian; Lai, Yanhua; Brandes, Sarina; Kokot, Serge
2009-08-11
Multi-wavelength fingerprints of Cassia seed, a traditional Chinese medicine (TCM), were collected by high-performance liquid chromatography (HPLC) at two wavelengths with the use of diode array detection. The two data sets of chromatograms were combined by the data fusion-based method. This data set of fingerprints was compared separately with the two data sets collected at each of the two wavelengths. It was demonstrated with the use of principal component analysis (PCA), that multi-wavelength fingerprints provided a much improved representation of the differences in the samples. Thereafter, the multi-wavelength fingerprint data set was submitted for classification to a suite of chemometrics methods viz. fuzzy clustering (FC), SIMCA and the rank ordering MCDM PROMETHEE and GAIA. Each method highlighted different properties of the data matrix according to the fingerprints from different types of Cassia seeds. In general, the PROMETHEE and GAIA MCDM methods provided the most comprehensive information for matching and discrimination of the fingerprints, and appeared to be best suited for quality assurance purposes for these and similar types of sample.
Dermatoglyphics--a marker for malocclusion?
Tikare, S; Rajesh, G; Prasad, K W; Thippeswamy, V; Javali, S B
2010-08-01
Dermatoglyphics is the study of dermal ridge configurations on palmar and plantar surfaces of hands and feet. Dermal ridges and craniofacial structures are both formed during 6-7th week of intra-uterine life. It is believed that hereditary and environmental factors leading to malocclusion may also cause peculiarities in fingerprint patterns. To study and assess the relationship between fingerprints and malocclusion among a group of high school children aged 12-16 years in Dharwad, Karnataka, India. A total of 696 high school children aged 12-16 years were randomly selected. Their fingerprints were recorded using duplicating ink and malocclusion status was clinically assessed using Angle's classification. Chi-square analysis revealed statistical association between whorl patterns and classes 1 and 2 malocclusion (p < 0.05). However, no overall statistical association was observed between fingerprint patterns and malocclusion (p > 0.05). Dermatoglyphics might be an appropriate marker for malocclusion and further studies are required to elucidate an association between fingerprint patterns and malocclusion.
Ma, Yue; Tuskan, Gerald A.
2018-01-01
The existence of complete genome sequences makes it important to develop different approaches for classification of large-scale data sets and to make extraction of biological insights easier. Here, we propose an approach for classification of complete proteomes/protein sets based on protein distributions on some basic attributes. We demonstrate the usefulness of this approach by determining protein distributions in terms of two attributes: protein lengths and protein intrinsic disorder contents (ID). The protein distributions based on L and ID are surveyed for representative proteome organisms and protein sets from the three domains of life. The two-dimensional maps (designated as fingerprints here) from the protein distribution densities in the LD space defined by ln(L) and ID are then constructed. The fingerprints for different organisms and protein sets are found to be distinct with each other, and they can therefore be used for comparative studies. As a test case, phylogenetic trees have been constructed based on the protein distribution densities in the fingerprints of proteomes of organisms without performing any protein sequence comparison and alignments. The phylogenetic trees generated are biologically meaningful, demonstrating that the protein distributions in the LD space may serve as unique phylogenetic signals of the organisms at the proteome level. PMID:29686995
NASA Astrophysics Data System (ADS)
Dieckman, Eric Allen
In order to perform useful tasks for us, robots must have the ability to notice, recognize, and respond to objects and events in their environment. This requires the acquisition and synthesis of information from a variety of sensors. Here we investigate the performance of a number of sensor modalities in an unstructured outdoor environment, including the Microsoft Kinect, thermal infrared camera, and coffee can radar. Special attention is given to acoustic echolocation measurements of approaching vehicles, where an acoustic parametric array propagates an audible signal to the oncoming target and the Kinect microphone array records the reflected backscattered signal. Although useful information about the target is hidden inside the noisy time domain measurements, the Dynamic Wavelet Fingerprint process (DWFP) is used to create a time-frequency representation of the data. A small-dimensional feature vector is created for each measurement using an intelligent feature selection process for use in statistical pattern classification routines. Using our experimentally measured data from real vehicles at 50 m, this process is able to correctly classify vehicles into one of five classes with 94% accuracy. Fully three-dimensional simulations allow us to study the nonlinear beam propagation and interaction with real-world targets to improve classification results.
Liu, Zechang; Wang, Liping; Liu, Yumei
2018-01-18
Hops impart flavor to beer, with the volatile components characterizing the various hop varieties and qualities. Fingerprinting, especially flavor fingerprinting, is often used to identify 'flavor products' because inconsistencies in the description of flavor may lead to an incorrect definition of beer quality. Compared to flavor fingerprinting, volatile fingerprinting is simpler and easier. We performed volatile fingerprinting using head space-solid phase micro-extraction gas chromatography-mass spectrometry combined with similarity analysis and principal component analysis (PCA) for evaluating and distinguishing between three major Chinese hops. Eighty-four volatiles were identified, which were classified into seven categories. Volatile fingerprinting based on similarity analysis did not yield any obvious result. By contrast, hop varieties and qualities were identified using volatile fingerprinting based on PCA. The potential variables explained the variance in the three hop varieties. In addition, the dendrogram and principal component score plot described the differences and classifications of hops. Volatile fingerprinting plus multivariate statistical analysis can rapidly differentiate between the different varieties and qualities of the three major Chinese hops. Furthermore, this method can be used as a reference in other fields. © 2018 Society of Chemical Industry. © 2018 Society of Chemical Industry.
Enhancement of plant metabolite fingerprinting by machine learning.
Scott, Ian M; Vermeer, Cornelia P; Liakata, Maria; Corol, Delia I; Ward, Jane L; Lin, Wanchang; Johnson, Helen E; Whitehead, Lynne; Kular, Baldeep; Baker, John M; Walsh, Sean; Dave, Anuja; Larson, Tony R; Graham, Ian A; Wang, Trevor L; King, Ross D; Draper, John; Beale, Michael H
2010-08-01
Metabolite fingerprinting of Arabidopsis (Arabidopsis thaliana) mutants with known or predicted metabolic lesions was performed by (1)H-nuclear magnetic resonance, Fourier transform infrared, and flow injection electrospray-mass spectrometry. Fingerprinting enabled processing of five times more plants than conventional chromatographic profiling and was competitive for discriminating mutants, other than those affected in only low-abundance metabolites. Despite their rapidity and complexity, fingerprints yielded metabolomic insights (e.g. that effects of single lesions were usually not confined to individual pathways). Among fingerprint techniques, (1)H-nuclear magnetic resonance discriminated the most mutant phenotypes from the wild type and Fourier transform infrared discriminated the fewest. To maximize information from fingerprints, data analysis was crucial. One-third of distinctive phenotypes might have been overlooked had data models been confined to principal component analysis score plots. Among several methods tested, machine learning (ML) algorithms, namely support vector machine or random forest (RF) classifiers, were unsurpassed for phenotype discrimination. Support vector machines were often the best performing classifiers, but RFs yielded some particularly informative measures. First, RFs estimated margins between mutant phenotypes, whose relations could then be visualized by Sammon mapping or hierarchical clustering. Second, RFs provided importance scores for the features within fingerprints that discriminated mutants. These scores correlated with analysis of variance F values (as did Kruskal-Wallis tests, true- and false-positive measures, mutual information, and the Relief feature selection algorithm). ML classifiers, as models trained on one data set to predict another, were ideal for focused metabolomic queries, such as the distinctiveness and consistency of mutant phenotypes. Accessible software for use of ML in plant physiology is highlighted.
The wavelet/scalar quantization compression standard for digital fingerprint images
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bradley, J.N.; Brislawn, C.M.
1994-04-01
A new digital image compression standard has been adopted by the US Federal Bureau of Investigation for use on digitized gray-scale fingerprint images. The algorithm is based on adaptive uniform scalar quantization of a discrete wavelet transform image decomposition and is referred to as the wavelet/scalar quantization standard. The standard produces archival quality images at compression ratios of around 20:1 and will allow the FBI to replace their current database of paper fingerprint cards with digital imagery.
The Design and Implementation of Indoor Localization System Using Magnetic Field Based on Smartphone
NASA Astrophysics Data System (ADS)
Liu, J.; Jiang, C.; Shi, Z.
2017-09-01
Sufficient signal nodes are mostly required to implement indoor localization in mainstream research. Magnetic field take advantage of high precision, stable and reliability, and the reception of magnetic field signals is reliable and uncomplicated, it could be realized by geomagnetic sensor on smartphone, without external device. After the study of indoor positioning technologies, choose the geomagnetic field data as fingerprints to design an indoor localization system based on smartphone. A localization algorithm that appropriate geomagnetic matching is designed, and present filtering algorithm and algorithm for coordinate conversion. With the implement of plot geomagnetic fingerprints, the indoor positioning of smartphone without depending on external devices can be achieved. Finally, an indoor positioning system which is based on Android platform is successfully designed, through the experiments, proved the capability and effectiveness of indoor localization algorithm.
Molecular classification of liver cirrhosis in a rat model by proteomics and bioinformatics.
Xu, Xiu-Qin; Leow, Chon K; Lu, Xin; Zhang, Xuegong; Liu, Jun S; Wong, Wing-Hung; Asperger, Arndt; Deininger, Sören; Eastwood Leung, Hon-Chiu
2004-10-01
Liver cirrhosis is a worldwide health problem. Reliable, noninvasive methods for early detection of liver cirrhosis are not available. Using a three-step approach, we classified sera from rats with liver cirrhosis following different treatment insults. The approach consisted of: (i) protein profiling using surface-enhanced laser desorption/ionization (SELDI) technology; (ii) selection of a statistically significant serum biomarker set using machine learning algorithms; and (iii) identification of selected serum biomarkers by peptide sequencing. We generated serum protein profiles from three groups of rats: (i) normal (n=8), (ii) thioacetamide-induced liver cirrhosis (n=22), and (iii) bile duct ligation-induced liver fibrosis (n=5) using a weak cation exchanger surface. Profiling data were further analyzed by a recursive support vector machine algorithm to select a panel of statistically significant biomarkers for class prediction. Sensitivity and specificity of classification using the selected protein marker set were higher than 92%. A consistently down-regulated 3495 Da protein in cirrhosis samples was one of the selected significant biomarkers. This 3495 Da protein was purified on-chip and trypsin digested. Further structural characterization of this biomarkers candidate was done by using cross-platform matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS) peptide mass fingerprinting (PMF) and matrix-assisted laser desorption/ionization time of flight/time of flight (MALDI-TOF/TOF) tandem mass spectrometry (MS/MS). Combined data from PMF and MS/MS spectra of two tryptic peptides suggested that this 3495 Da protein shared homology to a histidine-rich glycoprotein. These results demonstrated a novel approach to discovery of new biomarkers for early detection of liver cirrhosis and classification of liver diseases.
Adachi, Tetsuya; Pezzotti, Giuseppe; Yamamoto, Toshiro; Ichioka, Hiroaki; Boffelli, Marco; Zhu, Wenliang; Kanamura, Narisato
2015-05-01
A systematic investigation, based on highly spectrally resolved Raman spectroscopy, was undertaken to research the efficacy of vibrational assessments in locating chemical and crystallographic fingerprints for the characterization of dental caries and the early detection of non-cavitated carious lesions. Raman results published by other authors have indicated possible approaches for this method. However, they conspicuously lacked physical insight at the molecular scale and, thus, the rigor necessary to prove the efficacy of this spectroscopy method. After solving basic physical challenges in a companion paper, we apply them here in the form of newly developed Raman algorithms for practical dental research. Relevant differences in mineral crystallite (average) orientation and texture distribution were revealed for diseased enamel at different stages compared with healthy mineralized enamel. Clear spectroscopy features could be directly translated in terms of a rigorous and quantitative classification of crystallography and chemical characteristics of diseased enamel structures. The Raman procedure enabled us to trace back otherwise invisible characteristics in early caries, in the translucent zone (i.e., the advancing front of the disease) and in the body of lesion of cavitated caries.
A Foot-Mounted Inertial Measurement Unit (IMU) Positioning Algorithm Based on Magnetic Constraint
Zou, Jiaheng
2018-01-01
With the development of related applications, indoor positioning techniques have been more and more widely developed. Based on Wi-Fi, Bluetooth low energy (BLE) and geomagnetism, indoor positioning techniques often rely on the physical location of fingerprint information. The focus and difficulty of establishing the fingerprint database are in obtaining a relatively accurate physical location with as little given information as possible. This paper presents a foot-mounted inertial measurement unit (IMU) positioning algorithm under the loop closure constraint based on magnetic information. It can provide relatively reliable position information without maps and geomagnetic information and provides a relatively accurate coordinate for the collection of a fingerprint database. In the experiment, the features extracted by the multi-level Fourier transform method proposed in this paper are validated and the validity of loop closure matching is tested with a RANSAC-based method. Moreover, the loop closure detection results show that the cumulative error of the trajectory processed by the graph optimization algorithm is significantly suppressed, presenting a good accuracy. The average error of the trajectory under loop closure constraint is controlled below 2.15 m. PMID:29494542
A Foot-Mounted Inertial Measurement Unit (IMU) Positioning Algorithm Based on Magnetic Constraint.
Wang, Yan; Li, Xin; Zou, Jiaheng
2018-03-01
With the development of related applications, indoor positioning techniques have been more and more widely developed. Based on Wi-Fi, Bluetooth low energy (BLE) and geomagnetism, indoor positioning techniques often rely on the physical location of fingerprint information. The focus and difficulty of establishing the fingerprint database are in obtaining a relatively accurate physical location with as little given information as possible. This paper presents a foot-mounted inertial measurement unit (IMU) positioning algorithm under the loop closure constraint based on magnetic information. It can provide relatively reliable position information without maps and geomagnetic information and provides a relatively accurate coordinate for the collection of a fingerprint database. In the experiment, the features extracted by the multi-level Fourier transform method proposed in this paper are validated and the validity of loop closure matching is tested with a RANSAC-based method. Moreover, the loop closure detection results show that the cumulative error of the trajectory processed by the graph optimization algorithm is significantly suppressed, presenting a good accuracy. The average error of the trajectory under loop closure constraint is controlled below 2.15 m.
Kharbach, Mourad; Kamal, Rabie; Mansouri, Mohammed Alaoui; Marmouzi, Ilias; Viaene, Johan; Cherrah, Yahia; Alaoui, Katim; Vercammen, Joeri; Bouklouze, Abdelaziz; Vander Heyden, Yvan
2018-10-15
This study investigated the effectiveness of SIFT-MS versus chemical profiling, both coupled to multivariate data analysis, to classify 95 Extra Virgin Argan Oils (EVAO), originating from five Moroccan Argan forest locations. The full scan option of SIFT-MS, is suitable to indicate the geographic origin of EVAO based on the fingerprints obtained using the three chemical ionization precursors (H 3 O + , NO + and O 2 + ). The chemical profiling (including acidity, peroxide value, spectrophotometric indices, fatty acids, tocopherols- and sterols composition) was also used for classification. Partial least squares discriminant analysis (PLS-DA), soft independent modeling of class analogy (SIMCA), K-nearest neighbors (KNN), and support vector machines (SVM), were compared. The SIFT-MS data were therefore fed to variable-selection methods to find potential biomarkers for classification. The classification models based either on chemical profiling or SIFT-MS data were able to classify the samples with high accuracy. SIFT-MS was found to be advantageous for rapid geographic classification. Copyright © 2018 Elsevier Ltd. All rights reserved.
Fingerprint image enhancement by differential hysteresis processing.
Blotta, Eduardo; Moler, Emilce
2004-05-10
A new method to enhance defective fingerprints images through image digital processing tools is presented in this work. When the fingerprints have been taken without any care, blurred and in some cases mostly illegible, as in the case presented here, their classification and comparison becomes nearly impossible. A combination of spatial domain filters, including a technique called differential hysteresis processing (DHP), is applied to improve these kind of images. This set of filtering methods proved to be satisfactory in a wide range of cases by uncovering hidden details that helped to identify persons. Dactyloscopy experts from Policia Federal Argentina and the EAAF have validated these results.
Cai, Yong; Li, Xiwen; Li, Mei; Chen, Xiaojia; Ni, Jingyun; Wang, Yitao
2015-01-01
Chemical fingerprinting is currently a widely used tool that enables rapid and accurate quality evaluation of Traditional Chinese Medicine (TCM). However, chemical fingerprints are not amenable to information storage, recognition, and retrieval, which limit their use in Chinese medicine traceability. In this study, samples of three kinds of Chinese medicines were randomly selected and chemical fingerprints were then constructed by using high performance liquid chromatography. Based on chemical data, the process of converting the TCM chemical fingerprint into two-dimensional code is presented; preprocess and filtering algorithm are also proposed aiming at standardizing the large amount of original raw data. In order to know which type of two-dimensional code (2D) is suitable for storing data of chemical fingerprints, current popular types of 2D codes are analyzed and compared. Results show that QR Code is suitable for recording the TCM chemical fingerprint. The fingerprint information of TCM can be converted into data format that can be stored as 2D code for traceability and quality control. PMID:26089936
Maximum Likelihood Reconstruction for Magnetic Resonance Fingerprinting
Zhao, Bo; Setsompop, Kawin; Ye, Huihui; Cauley, Stephen; Wald, Lawrence L.
2017-01-01
This paper introduces a statistical estimation framework for magnetic resonance (MR) fingerprinting, a recently proposed quantitative imaging paradigm. Within this framework, we present a maximum likelihood (ML) formalism to estimate multiple parameter maps directly from highly undersampled, noisy k-space data. A novel algorithm, based on variable splitting, the alternating direction method of multipliers, and the variable projection method, is developed to solve the resulting optimization problem. Representative results from both simulations and in vivo experiments demonstrate that the proposed approach yields significantly improved accuracy in parameter estimation, compared to the conventional MR fingerprinting reconstruction. Moreover, the proposed framework provides new theoretical insights into the conventional approach. We show analytically that the conventional approach is an approximation to the ML reconstruction; more precisely, it is exactly equivalent to the first iteration of the proposed algorithm for the ML reconstruction, provided that a gridding reconstruction is used as an initialization. PMID:26915119
Maximum Likelihood Reconstruction for Magnetic Resonance Fingerprinting.
Zhao, Bo; Setsompop, Kawin; Ye, Huihui; Cauley, Stephen F; Wald, Lawrence L
2016-08-01
This paper introduces a statistical estimation framework for magnetic resonance (MR) fingerprinting, a recently proposed quantitative imaging paradigm. Within this framework, we present a maximum likelihood (ML) formalism to estimate multiple MR tissue parameter maps directly from highly undersampled, noisy k-space data. A novel algorithm, based on variable splitting, the alternating direction method of multipliers, and the variable projection method, is developed to solve the resulting optimization problem. Representative results from both simulations and in vivo experiments demonstrate that the proposed approach yields significantly improved accuracy in parameter estimation, compared to the conventional MR fingerprinting reconstruction. Moreover, the proposed framework provides new theoretical insights into the conventional approach. We show analytically that the conventional approach is an approximation to the ML reconstruction; more precisely, it is exactly equivalent to the first iteration of the proposed algorithm for the ML reconstruction, provided that a gridding reconstruction is used as an initialization.
From template to image: reconstructing fingerprints from minutiae points.
Ross, Arun; Shah, Jidnya; Jain, Anil K
2007-04-01
Most fingerprint-based biometric systems store the minutiae template of a user in the database. It has been traditionally assumed that the minutiae template of a user does not reveal any information about the original fingerprint. In this paper, we challenge this notion and show that three levels of information about the parent fingerprint can be elicited from the minutiae template alone, viz., 1) the orientation field information, 2) the class or type information, and 3) the friction ridge structure. The orientation estimation algorithm determines the direction of local ridges using the evidence of minutiae triplets. The estimated orientation field, along with the given minutiae distribution, is then used to predict the class of the fingerprint. Finally, the ridge structure of the parent fingerprint is generated using streamlines that are based on the estimated orientation field. Line Integral Convolution is used to impart texture to the ensuing ridges, resulting in a ridge map resembling the parent fingerprint. The salient feature of this noniterative method to generate ridges is its ability to preserve the minutiae at specified locations in the reconstructed ridge map. Experiments using a commercial fingerprint matcher suggest that the reconstructed ridge structure bears close resemblance to the parent fingerprint.
jCompoundMapper: An open source Java library and command-line tool for chemical fingerprints
2011-01-01
Background The decomposition of a chemical graph is a convenient approach to encode information of the corresponding organic compound. While several commercial toolkits exist to encode molecules as so-called fingerprints, only a few open source implementations are available. The aim of this work is to introduce a library for exactly defined molecular decompositions, with a strong focus on the application of these features in machine learning and data mining. It provides several options such as search depth, distance cut-offs, atom- and pharmacophore typing. Furthermore, it provides the functionality to combine, to compare, or to export the fingerprints into several formats. Results We provide a Java 1.6 library for the decomposition of chemical graphs based on the open source Chemistry Development Kit toolkit. We reimplemented popular fingerprinting algorithms such as depth-first search fingerprints, extended connectivity fingerprints, autocorrelation fingerprints (e.g. CATS2D), radial fingerprints (e.g. Molprint2D), geometrical Molprint, atom pairs, and pharmacophore fingerprints. We also implemented custom fingerprints such as the all-shortest path fingerprint that only includes the subset of shortest paths from the full set of paths of the depth-first search fingerprint. As an application of jCompoundMapper, we provide a command-line executable binary. We measured the conversion speed and number of features for each encoding and described the composition of the features in detail. The quality of the encodings was tested using the default parametrizations in combination with a support vector machine on the Sutherland QSAR data sets. Additionally, we benchmarked the fingerprint encodings on the large-scale Ames toxicity benchmark using a large-scale linear support vector machine. The results were promising and could often compete with literature results. On the large Ames benchmark, for example, we obtained an AUC ROC performance of 0.87 with a reimplementation of the extended connectivity fingerprint. This result is comparable to the performance achieved by a non-linear support vector machine using state-of-the-art descriptors. On the Sutherland QSAR data set, the best fingerprint encodings showed a comparable or better performance on 5 of the 8 benchmarks when compared against the results of the best descriptors published in the paper of Sutherland et al. Conclusions jCompoundMapper is a library for chemical graph fingerprints with several tweaking possibilities and exporting options for open source data mining toolkits. The quality of the data mining results, the conversion speed, the LPGL software license, the command-line interface, and the exporters should be useful for many applications in cheminformatics like benchmarks against literature methods, comparison of data mining algorithms, similarity searching, and similarity-based data mining. PMID:21219648
Enhancement of Plant Metabolite Fingerprinting by Machine Learning1[W
Scott, Ian M.; Vermeer, Cornelia P.; Liakata, Maria; Corol, Delia I.; Ward, Jane L.; Lin, Wanchang; Johnson, Helen E.; Whitehead, Lynne; Kular, Baldeep; Baker, John M.; Walsh, Sean; Dave, Anuja; Larson, Tony R.; Graham, Ian A.; Wang, Trevor L.; King, Ross D.; Draper, John; Beale, Michael H.
2010-01-01
Metabolite fingerprinting of Arabidopsis (Arabidopsis thaliana) mutants with known or predicted metabolic lesions was performed by 1H-nuclear magnetic resonance, Fourier transform infrared, and flow injection electrospray-mass spectrometry. Fingerprinting enabled processing of five times more plants than conventional chromatographic profiling and was competitive for discriminating mutants, other than those affected in only low-abundance metabolites. Despite their rapidity and complexity, fingerprints yielded metabolomic insights (e.g. that effects of single lesions were usually not confined to individual pathways). Among fingerprint techniques, 1H-nuclear magnetic resonance discriminated the most mutant phenotypes from the wild type and Fourier transform infrared discriminated the fewest. To maximize information from fingerprints, data analysis was crucial. One-third of distinctive phenotypes might have been overlooked had data models been confined to principal component analysis score plots. Among several methods tested, machine learning (ML) algorithms, namely support vector machine or random forest (RF) classifiers, were unsurpassed for phenotype discrimination. Support vector machines were often the best performing classifiers, but RFs yielded some particularly informative measures. First, RFs estimated margins between mutant phenotypes, whose relations could then be visualized by Sammon mapping or hierarchical clustering. Second, RFs provided importance scores for the features within fingerprints that discriminated mutants. These scores correlated with analysis of variance F values (as did Kruskal-Wallis tests, true- and false-positive measures, mutual information, and the Relief feature selection algorithm). ML classifiers, as models trained on one data set to predict another, were ideal for focused metabolomic queries, such as the distinctiveness and consistency of mutant phenotypes. Accessible software for use of ML in plant physiology is highlighted. PMID:20566707
Guo, Hao-Bo; Ma, Yue; Tuskan, Gerald A.; ...
2018-01-01
The existence of complete genome sequences makes it important to develop different approaches for classification of large-scale data sets and to make extraction of biological insights easier. Here, we propose an approach for classification of complete proteomes/protein sets based on protein distributions on some basic attributes. We demonstrate the usefulness of this approach by determining protein distributions in terms of two attributes: protein lengths and protein intrinsic disorder contents (ID). The protein distributions based on L and ID are surveyed for representative proteome organisms and protein sets from the three domains of life. The two-dimensional maps (designated as fingerprints here)more » from the protein distribution densities in the LD space defined by ln( L ) and ID are then constructed. The fingerprints for different organisms and protein sets are found to be distinct with each other, and they can therefore be used for comparative studies. As a test case, phylogenetic trees have been constructed based on the protein distribution densities in the fingerprints of proteomes of organisms without performing any protein sequence comparison and alignments. The phylogenetic trees generated are biologically meaningful, demonstrating that the protein distributions in the LD space may serve as unique phylogenetic signals of the organisms at the proteome level.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Guo, Hao-Bo; Ma, Yue; Tuskan, Gerald A.
The existence of complete genome sequences makes it important to develop different approaches for classification of large-scale data sets and to make extraction of biological insights easier. Here, we propose an approach for classification of complete proteomes/protein sets based on protein distributions on some basic attributes. We demonstrate the usefulness of this approach by determining protein distributions in terms of two attributes: protein lengths and protein intrinsic disorder contents (ID). The protein distributions based on L and ID are surveyed for representative proteome organisms and protein sets from the three domains of life. The two-dimensional maps (designated as fingerprints here)more » from the protein distribution densities in the LD space defined by ln( L ) and ID are then constructed. The fingerprints for different organisms and protein sets are found to be distinct with each other, and they can therefore be used for comparative studies. As a test case, phylogenetic trees have been constructed based on the protein distribution densities in the fingerprints of proteomes of organisms without performing any protein sequence comparison and alignments. The phylogenetic trees generated are biologically meaningful, demonstrating that the protein distributions in the LD space may serve as unique phylogenetic signals of the organisms at the proteome level.« less
NASA Astrophysics Data System (ADS)
Hildebrandt, Mario; Kiltz, Stefan; Krapyvskyy, Dmytro; Dittmann, Jana; Vielhauer, Claus; Leich, Marcus
2011-11-01
A machine-assisted analysis of traces from crime scenes might be possible with the advent of new high-resolution non-destructive contact-less acquisition techniques for latent fingerprints. This requires reliable techniques for the automatic extraction of fingerprint features from latent and exemplar fingerprints for matching purposes using pattern recognition approaches. Therefore, we evaluate the NIST Biometric Image Software for the feature extraction and verification of contact-lessly acquired latent fingerprints to determine potential error rates. Our exemplary test setup includes 30 latent fingerprints from 5 people in two test sets that are acquired from different surfaces using a chromatic white light sensor. The first test set includes 20 fingerprints on two different surfaces. It is used to determine the feature extraction performance. The second test set includes one latent fingerprint on 10 different surfaces and an exemplar fingerprint to determine the verification performance. This utilized sensing technique does not require a physical or chemical visibility enhancement of the fingerprint residue, thus the original trace remains unaltered for further investigations. No particular feature extraction and verification techniques have been applied to such data, yet. Hence, we see the need for appropriate algorithms that are suitable to support forensic investigations.
Separation and sequence detection of overlapped fingerprints: experiments and first results
NASA Astrophysics Data System (ADS)
Kärgel, Rainer; Giebel, Sascha; Leich, Marcus; Dittmann, Jana
2011-11-01
Latent fingerprints provide vital information in modern crime scene investigation. On frequently touched surfaces the fingerprints may overlap which poses a major problem for forensic analysis. In order to make such overlapping fingerprints available for analysis, they have to be separated. An additional evaluation of the sequence in which the fingerprints were brought onto the surface can help to reconstruct the progression of events. Advances in both tasks can considerably aid crime investigation agencies and are the subject of this work. Here, a statistical approach, initially devised for the separation of overlapping text patterns by Tonazzini et al.,1 is employed to separate overlapping fingerprints. The method involves a maximum a posteriori estimation of the single fingerprints and the mixing coefficients, computed by an expectation-maximization algorithm. A fingerprint age determination feature based on corrosion is evaluated for sequence estimation. The approaches are evaluated using 30 samples of overlapping latent fingerprints on two different substrates. The fingerprint images are acquired with a non-destructive chromatic white light surface measurement device, each sample containing exactly two fingerprints that overlap in the center of the image. Since forensic investigations rely on manual assessment of acquired fingerprints by forensics experts, a subjective scale ranging from 0 to 8 is used to rate the separation results. Our results indicate that the chosen method can separate overlapped fingerprints which exhibit strong differences in contrast, since results gradually improve with the growing contrast difference of the overlapping fingerprints. Investigating the effects of corrosion leads to a reliable determination of the fingerprints' sequence as the timespan between their leaving increases.
Earthquake Fingerprints: Representing Earthquake Waveforms for Similarity-Based Detection
NASA Astrophysics Data System (ADS)
Bergen, K.; Beroza, G. C.
2016-12-01
New earthquake detection methods, such as Fingerprint and Similarity Thresholding (FAST), use fast approximate similarity search to identify similar waveforms in long-duration data without templates (Yoon et al. 2015). These methods have two key components: fingerprint extraction and an efficient search algorithm. Fingerprint extraction converts waveforms into fingerprints, compact signatures that represent short-duration waveforms for identification and search. Earthquakes are detected using an efficient indexing and search scheme, such as locality-sensitive hashing, that identifies similar waveforms in a fingerprint database. The quality of the search results, and thus the earthquake detection results, is strongly dependent on the fingerprinting scheme. Fingerprint extraction should map similar earthquake waveforms to similar waveform fingerprints to ensure a high detection rate, even under additive noise and small distortions. Additionally, fingerprints corresponding to noise intervals should have mutually dissimilar fingerprints to minimize false detections. In this work, we compare the performance of multiple fingerprint extraction approaches for the earthquake waveform similarity search problem. We apply existing audio fingerprinting (used in content-based audio identification systems) and time series indexing techniques and present modified versions that are specifically adapted for seismic data. We also explore data-driven fingerprinting approaches that can take advantage of labeled or unlabeled waveform data. For each fingerprinting approach we measure its ability to identify similar waveforms in a low signal-to-noise setting, and quantify the trade-off between true and false detection rates in the presence of persistent noise sources. We compare the performance using known event waveforms from eight independent stations in the Northern California Seismic Network.
Collins, A.L; Pulley, S.; Foster, I.D.L; Gellis, Allen; Porto, P.; Horowitz, A.J.
2017-01-01
The growing awareness of the environmental significance of fine-grained sediment fluxes through catchment systems continues to underscore the need for reliable information on the principal sources of this material. Source estimates are difficult to obtain using traditional monitoring techniques, but sediment source fingerprinting or tracing procedures, have emerged as a potentially valuable alternative. Despite the rapidly increasing numbers of studies reporting the use of sediment source fingerprinting, several key challenges and uncertainties continue to hamper consensus among the international scientific community on key components of the existing methodological procedures. Accordingly, this contribution reviews and presents recent developments for several key aspects of fingerprinting, namely: sediment source classification, catchment source and target sediment sampling, tracer selection, grain size issues, tracer conservatism, source apportionment modelling, and assessment of source predictions using artificial mixtures. Finally, a decision-tree representing the current state of knowledge is presented, to guide end-users in applying the fingerprinting approach.
A preliminary study of DTI Fingerprinting on stroke analysis.
Ma, Heather T; Ye, Chenfei; Wu, Jun; Yang, Pengfei; Chen, Xuhui; Yang, Zhengyi; Ma, Jingbo
2014-01-01
DTI (Diffusion Tensor Imaging) is a well-known MRI (Magnetic Resonance Imaging) technique which provides useful structural information about human brain. However, the quantitative measurement to physiological variation of subtypes of ischemic stroke is not available. An automatically quantitative method for DTI analysis will enhance the DTI application in clinics. In this study, we proposed a DTI Fingerprinting technology to quantitatively analyze white matter tissue, which was applied in stroke classification. The TBSS (Tract Based Spatial Statistics) method was employed to generate mask automatically. To evaluate the clustering performance of the automatic method, lesion ROI (Region of Interest) is manually drawn on the DWI images as a reference. The results from the DTI Fingerprinting were compared with those obtained from the reference ROIs. It indicates that the DTI Fingerprinting could identify different states of ischemic stroke and has promising potential to provide a more comprehensive measure of the DTI data. Further development should be carried out to improve DTI Fingerprinting technology in clinics.
Identification of goat milk powder by manufacturer using multiple chemical parameters.
McLeod, Rebecca J; Prosser, Colin G; Wakefield, Joshua W
2016-02-01
Concentrations of multiple elements and ratios of stable isotopes of carbon and nitrogen were measured and combined to create a chemical fingerprint of production batches of goat whole milk powder (WMP) produced by different manufacturers. Our objectives were to determine whether or not differences exist in the chemical fingerprint among samples of goat WMP produced at different sites, and assess temporal changes in the chemical fingerprint in product manufactured at one site. In total, 58 samples of goat WMP were analyzed by inductively coupled plasma-mass spectrometry as well as isotope ratio mass spectrometry and a suite of 13 elements (Li, Na, Mg, K, Ca, Mn, Cu, Zn, Rb, Sr, Mo, Cs, and Ba), δ(13)C, and δ(15)N selected to create the chemical fingerprint. Differences in the chemical fingerprint of samples between sites and over time were assessed using principal components analysis and canonical analysis of principal coordinates. Differences in the chemical fingerprints of samples between production sites provided a classification success rate (leave-one-out classification) of 98.1%, providing a basis for using the approach to test the authenticity of product manufactured at a site. Within one site, the chemical fingerprint of samples produced at the beginning of the production season differed from those produced in the middle and late season, driven predominantly by lower concentrations of Na, Mg, K, Mn, and Rb, and higher concentrations of Ba and Cu. This observed temporal variability highlights the importance of obtaining samples from throughout the season to ensure a representative chemical fingerprint is obtained for goat WMP from a single manufacturing site. The reconstitution and spray drying of samples from one manufacturer by the other manufacturer enabled the relative influence of the manufacturing process on the chemical fingerprint to be examined. It was found that such reprocessing altered the chemical fingerprint, although the degree of alteration varied among samples and individual elements. The findings of this study support the use of trace elements and stable isotope ratios to test the authenticity of goat WMP, which can likely be applied to other dairy goat products. This approach could be used test to the factory of origin (and potentially batch of origin) of products in the supply chain, thus providing the ability to audit the supply chain and monitor for fraudulent activity. Copyright © 2016 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Sherlock Holmes in the Classroom.
ERIC Educational Resources Information Center
Faia, Jean E.
1988-01-01
Describes a three-day classroom activity combining criminal investigations and scientific skills, especially observation skills. Provides detailed classroom procedures with an illustration of eight basic fingerprint patterns and a classification chart. (YP)
Fingerprints selection for topological localization
NASA Astrophysics Data System (ADS)
Popov, Vladimir
2017-07-01
Problems of visual navigation are extensively studied in contemporary robotics. In particular, we can mention different problems of visual landmarks selection, the problem of selection of a minimal set of visual landmarks, selection of partially distinguishable guards, the problem of placement of visual landmarks. In this paper, we consider one-dimensional color panoramas. Such panoramas can be used for creating fingerprints. Fingerprints give us unique identifiers for visually distinct locations by recovering statistically significant features. Fingerprints can be used as visual landmarks for the solution of various problems of mobile robot navigation. In this paper, we consider a method for automatic generation of fingerprints. In particular, we consider the bounded Post correspondence problem and applications of the problem to consensus fingerprints and topological localization. We propose an efficient approach to solve the bounded Post correspondence problem. In particular, we use an explicit reduction from the decision version of the problem to the satisfiability problem. We present the results of computational experiments for different satisfiability algorithms. In robotic experiments, we consider the average accuracy of reaching of the target point for different lengths of routes and types of fingerprints.
Sensor-oriented feature usability evaluation in fingerprint segmentation
NASA Astrophysics Data System (ADS)
Li, Ying; Yin, Yilong; Yang, Gongping
2013-06-01
Existing fingerprint segmentation methods usually process fingerprint images captured by different sensors with the same feature or feature set. We propose to improve the fingerprint segmentation result in view of an important fact that images from different sensors have different characteristics for segmentation. Feature usability evaluation, which means to evaluate the usability of features to find the personalized feature or feature set for different sensors to improve the performance of segmentation. The need for feature usability evaluation for fingerprint segmentation is raised and analyzed as a new issue. To address this issue, we present a decision-tree-based feature-usability evaluation method, which utilizes a C4.5 decision tree algorithm to evaluate and pick the best suitable feature or feature set for fingerprint segmentation from a typical candidate feature set. We apply the novel method on the FVC2002 database of fingerprint images, which are acquired by four different respective sensors and technologies. Experimental results show that the accuracy of segmentation is improved, and time consumption for feature extraction is dramatically reduced with selected feature(s).
NASA Astrophysics Data System (ADS)
Brunner, Siegfried; Kargel, Christian
2011-06-01
The conservation and efficient use of natural and especially strategic resources like oil and water have become global issues, which increasingly initiate environmental and political activities for comprehensive recycling programs. To effectively reutilize oil-based materials necessary in many industrial fields (e.g. chemical and pharmaceutical industry, automotive, packaging), appropriate methods for a fast and highly reliable automated material identification are required. One non-contacting, color- and shape-independent new technique that eliminates the shortcomings of existing methods is to label materials like plastics with certain combinations of fluorescent markers ("optical codes", "optical fingerprints") incorporated during manufacture. Since time-resolved measurements are complex (and expensive), fluorescent markers must be designed that possess unique spectral signatures. The number of identifiable materials increases with the number of fluorescent markers that can be reliably distinguished within the limited wavelength band available. In this article we shall investigate the reliable detection and classification of fluorescent markers with specific fluorescence emission spectra. These simulated spectra are modeled based on realistic fluorescence spectra acquired from material samples using a modern VNIR spectral imaging system. In order to maximize the number of materials that can be reliably identified, we evaluate the performance of 8 classification algorithms based on different spectral similarity measures. The results help guide the design of appropriate fluorescent markers, optical sensors and the overall measurement system.
NASA Astrophysics Data System (ADS)
Niazmardi, S.; Safari, A.; Homayouni, S.
2017-09-01
Crop mapping through classification of Satellite Image Time-Series (SITS) data can provide very valuable information for several agricultural applications, such as crop monitoring, yield estimation, and crop inventory. However, the SITS data classification is not straightforward. Because different images of a SITS data have different levels of information regarding the classification problems. Moreover, the SITS data is a four-dimensional data that cannot be classified using the conventional classification algorithms. To address these issues in this paper, we presented a classification strategy based on Multiple Kernel Learning (MKL) algorithms for SITS data classification. In this strategy, initially different kernels are constructed from different images of the SITS data and then they are combined into a composite kernel using the MKL algorithms. The composite kernel, once constructed, can be used for the classification of the data using the kernel-based classification algorithms. We compared the computational time and the classification performances of the proposed classification strategy using different MKL algorithms for the purpose of crop mapping. The considered MKL algorithms are: MKL-Sum, SimpleMKL, LPMKL and Group-Lasso MKL algorithms. The experimental tests of the proposed strategy on two SITS data sets, acquired by SPOT satellite sensors, showed that this strategy was able to provide better performances when compared to the standard classification algorithm. The results also showed that the optimization method of the used MKL algorithms affects both the computational time and classification accuracy of this strategy.
A Design of Irregular Grid Map for Large-Scale Wi-Fi LAN Fingerprint Positioning Systems
Kim, Jae-Hoon; Min, Kyoung Sik; Yeo, Woon-Young
2014-01-01
The rapid growth of mobile communication and the proliferation of smartphones have drawn significant attention to location-based services (LBSs). One of the most important factors in the vitalization of LBSs is the accurate position estimation of a mobile device. The Wi-Fi positioning system (WPS) is a new positioning method that measures received signal strength indication (RSSI) data from all Wi-Fi access points (APs) and stores them in a large database as a form of radio fingerprint map. Because of the millions of APs in urban areas, radio fingerprints are seriously contaminated and confused. Moreover, the algorithmic advances for positioning face computational limitation. Therefore, we present a novel irregular grid structure and data analytics for efficient fingerprint map management. The usefulness of the proposed methodology is presented using the actual radio fingerprint measurements taken throughout Seoul, Korea. PMID:25302315
A design of irregular grid map for large-scale Wi-Fi LAN fingerprint positioning systems.
Kim, Jae-Hoon; Min, Kyoung Sik; Yeo, Woon-Young
2014-01-01
The rapid growth of mobile communication and the proliferation of smartphones have drawn significant attention to location-based services (LBSs). One of the most important factors in the vitalization of LBSs is the accurate position estimation of a mobile device. The Wi-Fi positioning system (WPS) is a new positioning method that measures received signal strength indication (RSSI) data from all Wi-Fi access points (APs) and stores them in a large database as a form of radio fingerprint map. Because of the millions of APs in urban areas, radio fingerprints are seriously contaminated and confused. Moreover, the algorithmic advances for positioning face computational limitation. Therefore, we present a novel irregular grid structure and data analytics for efficient fingerprint map management. The usefulness of the proposed methodology is presented using the actual radio fingerprint measurements taken throughout Seoul, Korea.
Pulley, S; Collins, A L
2018-09-01
The mitigation of diffuse sediment pollution requires reliable provenance information so that measures can be targeted. Sediment source fingerprinting represents one approach for supporting these needs, but recent methodological developments have resulted in an increasing complexity of data processing methods rendering the approach less accessible to non-specialists. A comprehensive new software programme (SIFT; SedIment Fingerprinting Tool) has therefore been developed which guides the user through critical data analysis decisions and automates all calculations. Multiple source group configurations and composite fingerprints are identified and tested using multiple methods of uncertainty analysis. This aims to explore the sediment provenance information provided by the tracers more comprehensively than a single model, and allows for model configurations with high uncertainties to be rejected. This paper provides an overview of its application to an agricultural catchment in the UK to determine if the approach used can provide a reduction in uncertainty and increase in precision. Five source group classifications were used; three formed using a k-means cluster analysis containing 2, 3 and 4 clusters, and two a-priori groups based upon catchment geology. Three different composite fingerprints were used for each classification and bi-plots, range tests, tracer variability ratios and virtual mixtures tested the reliability of each model configuration. Some model configurations performed poorly when apportioning the composition of virtual mixtures, and different model configurations could produce different sediment provenance results despite using composite fingerprints able to discriminate robustly between the source groups. Despite this uncertainty, dominant sediment sources were identified, and those in close proximity to each sediment sampling location were found to be of greatest importance. This new software, by integrating recent methodological developments in tracer data processing, guides users through key steps. Critically, by applying multiple model configurations and uncertainty assessment, it delivers more robust solutions for informing catchment management of the sediment problem than many previously used approaches. Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.
Merkel, Ronny; Gruhn, Stefan; Dittmann, Jana; Vielhauer, Claus; Bräutigam, Anja
2012-10-10
The feasibility of 2D-intensity and 3D-topography images from a non-invasive Chromatic White Light (CWL) sensor for the age determination of latent fingerprints is investigated. The proposed method might provide the means to solve the so far unresolved issue of determining a fingerprints age in forensics. Conducting numerous experiments for an indoor crime scene using selected surfaces, different influences on the aging of fingerprints are investigated and the resulting aging variability is determined in terms of inter-person, intra-person, inter-finger and intra-finger variation. Main influence factors are shown to be the sweat composition, temperature, humidity, wind, UV-radiation, surface type, contamination of the finger with water-containing substances, resolution and measured area size, whereas contact time, contact pressure and smearing of the print seem to be of minor importance. Such influences lead to a certain experimental variability in inter-person and intra-person variation, which is higher than the inter-finger and intra-finger variation. Comparing the aging behavior of 17 different features using 1490 time series with a total of 41,520 fingerprint images, the great potential of the CWL technique in combination with the binary pixel feature from prior work is shown. Performing three different experiments for the classification of fingerprints into the two time classes [0, 5 h] and [5, 24 h], a maximum classification performance of 79.29% (kappa=0.46) is achieved for a general case, which is further improved for special cases. The statistical significance of the two best-performing features (both binary pixel versions based on 2D-intensity images) is manually shown and a feature fusion is performed, highlighting the strong dependency of the features on each other. It is concluded that such method might be combined with additional capturing devices, such as microscopes or spectroscopes, to a very promising age estimation scheme. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
Zhang, He-Hua; Yang, Liuyang; Liu, Yuchuan; Wang, Pin; Yin, Jun; Li, Yongming; Qiu, Mingguo; Zhu, Xueru; Yan, Fang
2016-11-16
The use of speech based data in the classification of Parkinson disease (PD) has been shown to provide an effect, non-invasive mode of classification in recent years. Thus, there has been an increased interest in speech pattern analysis methods applicable to Parkinsonism for building predictive tele-diagnosis and tele-monitoring models. One of the obstacles in optimizing classifications is to reduce noise within the collected speech samples, thus ensuring better classification accuracy and stability. While the currently used methods are effect, the ability to invoke instance selection has been seldomly examined. In this study, a PD classification algorithm was proposed and examined that combines a multi-edit-nearest-neighbor (MENN) algorithm and an ensemble learning algorithm. First, the MENN algorithm is applied for selecting optimal training speech samples iteratively, thereby obtaining samples with high separability. Next, an ensemble learning algorithm, random forest (RF) or decorrelated neural network ensembles (DNNE), is used to generate trained samples from the collected training samples. Lastly, the trained ensemble learning algorithms are applied to the test samples for PD classification. This proposed method was examined using a more recently deposited public datasets and compared against other currently used algorithms for validation. Experimental results showed that the proposed algorithm obtained the highest degree of improved classification accuracy (29.44%) compared with the other algorithm that was examined. Furthermore, the MENN algorithm alone was found to improve classification accuracy by as much as 45.72%. Moreover, the proposed algorithm was found to exhibit a higher stability, particularly when combining the MENN and RF algorithms. This study showed that the proposed method could improve PD classification when using speech data and can be applied to future studies seeking to improve PD classification methods.
Multidimensional Optimization of Signal Space Distance Parameters in WLAN Positioning
Brković, Milenko; Simić, Mirjana
2014-01-01
Accurate indoor localization of mobile users is one of the challenging problems of the last decade. Besides delivering high speed Internet, Wireless Local Area Network (WLAN) can be used as an effective indoor positioning system, being competitive both in terms of accuracy and cost. Among the localization algorithms, nearest neighbor fingerprinting algorithms based on Received Signal Strength (RSS) parameter have been extensively studied as an inexpensive solution for delivering indoor Location Based Services (LBS). In this paper, we propose the optimization of the signal space distance parameters in order to improve precision of WLAN indoor positioning, based on nearest neighbor fingerprinting algorithms. Experiments in a real WLAN environment indicate that proposed optimization leads to substantial improvements of the localization accuracy. Our approach is conceptually simple, is easy to implement, and does not require any additional hardware. PMID:24757443
Genomic Classification of Tumors | Center for Cancer Research
CCR investigators were among the first to classify tumors based on genetics, laying the groundwork for today’s common practice to molecularly characterize tumors based on their genetic fingerprints for personalized treatments.
Hyperspectral feature mapping classification based on mathematical morphology
NASA Astrophysics Data System (ADS)
Liu, Chang; Li, Junwei; Wang, Guangping; Wu, Jingli
2016-03-01
This paper proposed a hyperspectral feature mapping classification algorithm based on mathematical morphology. Without the priori information such as spectral library etc., the spectral and spatial information can be used to realize the hyperspectral feature mapping classification. The mathematical morphological erosion and dilation operations are performed respectively to extract endmembers. The spectral feature mapping algorithm is used to carry on hyperspectral image classification. The hyperspectral image collected by AVIRIS is applied to evaluate the proposed algorithm. The proposed algorithm is compared with minimum Euclidean distance mapping algorithm, minimum Mahalanobis distance mapping algorithm, SAM algorithm and binary encoding mapping algorithm. From the results of the experiments, it is illuminated that the proposed algorithm's performance is better than that of the other algorithms under the same condition and has higher classification accuracy.
Tamburro, Gabriella; Fiedler, Patrique; Stone, David; Haueisen, Jens; Comani, Silvia
2018-01-01
EEG may be affected by artefacts hindering the analysis of brain signals. Data-driven methods like independent component analysis (ICA) are successful approaches to remove artefacts from the EEG. However, the ICA-based methods developed so far are often affected by limitations, such as: the need for visual inspection of the separated independent components (subjectivity problem) and, in some cases, for the independent and simultaneous recording of the inspected artefacts to identify the artefactual independent components; a potentially heavy manipulation of the EEG signals; the use of linear classification methods; the use of simulated artefacts to validate the methods; no testing in dry electrode or high-density EEG datasets; applications limited to specific conditions and electrode layouts. Our fingerprint method automatically identifies EEG ICs containing eyeblinks, eye movements, myogenic artefacts and cardiac interference by evaluating 14 temporal, spatial, spectral, and statistical features composing the IC fingerprint. Sixty-two real EEG datasets containing cued artefacts are recorded with wet and dry electrodes (128 wet and 97 dry channels). For each artefact, 10 nonlinear SVM classifiers are trained on fingerprints of expert-classified ICs. Training groups include randomly chosen wet and dry datasets decomposed in 80 ICs. The classifiers are tested on the IC-fingerprints of different datasets decomposed into 20, 50, or 80 ICs. The SVM performance is assessed in terms of accuracy, False Omission Rate (FOR), Hit Rate (HR), False Alarm Rate (FAR), and sensitivity ( p ). For each artefact, the quality of the artefact-free EEG reconstructed using the classification of the best SVM is assessed by visual inspection and SNR. The best SVM classifier for each artefact type achieved average accuracy of 1 (eyeblink), 0.98 (cardiac interference), and 0.97 (eye movement and myogenic artefact). Average classification sensitivity (p) was 1 (eyeblink), 0.997 (myogenic artefact), 0.98 (eye movement), and 0.48 (cardiac interference). Average artefact reduction ranged from a maximum of 82% for eyeblinks to a minimum of 33% for cardiac interference, depending on the effectiveness of the proposed method and the amplitude of the removed artefact. The performance of the SVM classifiers did not depend on the electrode type, whereas it was better for lower decomposition levels (50 and 20 ICs). Apart from cardiac interference, SVM performance and average artefact reduction indicate that the fingerprint method has an excellent overall performance in the automatic detection of eyeblinks, eye movements and myogenic artefacts, which is comparable to that of existing methods. Being also independent from simultaneous artefact recording, electrode number, type and layout, and decomposition level, the proposed fingerprint method can have useful applications in clinical and experimental EEG settings.
Tamburro, Gabriella; Fiedler, Patrique; Stone, David; Haueisen, Jens
2018-01-01
Background EEG may be affected by artefacts hindering the analysis of brain signals. Data-driven methods like independent component analysis (ICA) are successful approaches to remove artefacts from the EEG. However, the ICA-based methods developed so far are often affected by limitations, such as: the need for visual inspection of the separated independent components (subjectivity problem) and, in some cases, for the independent and simultaneous recording of the inspected artefacts to identify the artefactual independent components; a potentially heavy manipulation of the EEG signals; the use of linear classification methods; the use of simulated artefacts to validate the methods; no testing in dry electrode or high-density EEG datasets; applications limited to specific conditions and electrode layouts. Methods Our fingerprint method automatically identifies EEG ICs containing eyeblinks, eye movements, myogenic artefacts and cardiac interference by evaluating 14 temporal, spatial, spectral, and statistical features composing the IC fingerprint. Sixty-two real EEG datasets containing cued artefacts are recorded with wet and dry electrodes (128 wet and 97 dry channels). For each artefact, 10 nonlinear SVM classifiers are trained on fingerprints of expert-classified ICs. Training groups include randomly chosen wet and dry datasets decomposed in 80 ICs. The classifiers are tested on the IC-fingerprints of different datasets decomposed into 20, 50, or 80 ICs. The SVM performance is assessed in terms of accuracy, False Omission Rate (FOR), Hit Rate (HR), False Alarm Rate (FAR), and sensitivity (p). For each artefact, the quality of the artefact-free EEG reconstructed using the classification of the best SVM is assessed by visual inspection and SNR. Results The best SVM classifier for each artefact type achieved average accuracy of 1 (eyeblink), 0.98 (cardiac interference), and 0.97 (eye movement and myogenic artefact). Average classification sensitivity (p) was 1 (eyeblink), 0.997 (myogenic artefact), 0.98 (eye movement), and 0.48 (cardiac interference). Average artefact reduction ranged from a maximum of 82% for eyeblinks to a minimum of 33% for cardiac interference, depending on the effectiveness of the proposed method and the amplitude of the removed artefact. The performance of the SVM classifiers did not depend on the electrode type, whereas it was better for lower decomposition levels (50 and 20 ICs). Discussion Apart from cardiac interference, SVM performance and average artefact reduction indicate that the fingerprint method has an excellent overall performance in the automatic detection of eyeblinks, eye movements and myogenic artefacts, which is comparable to that of existing methods. Being also independent from simultaneous artefact recording, electrode number, type and layout, and decomposition level, the proposed fingerprint method can have useful applications in clinical and experimental EEG settings. PMID:29492336
Pan, Yu; Zhang, Ji; Li, Hong; Wang, Yuan-Zhong; Li, Wan-Yi
2016-10-01
Macamides with a benzylalkylamide nucleus are characteristic and major bioactive compounds in the functional food maca (Lepidium meyenii Walp). The aim of this study was to explore variations in macamide content among maca from China and Peru. Twenty-seven batches of maca hypocotyls with different phenotypes, sampled from different geographical origins, were extracted and profiled by liquid chromatography with ultraviolet detection/tandem mass spectrometry (LC-UV/MS/MS). Twelve macamides were identified by MS operated in multiple scanning modes. Similarity analysis showed that maca samples differed significantly in their macamide fingerprinting. Partial least squares discriminant analysis (PLS-DA) was used to differentiate samples according to their geographical origin and to identify the most relevant variables in the classification model. The prediction accuracy for raw maca was 91% and five macamides were selected and considered as chemical markers for sample classification. When combined with a PLS-DA model, characteristic fingerprinting based on macamides could be recommended for labelling for the authentication of maca from different geographical origins. The results provided potential evidence for the relationships between environmental or other factors and distribution of macamides. © 2016 Society of Chemical Industry. © 2016 Society of Chemical Industry.
Innovation and stasis: technology and race in Mark Twain's Pudd'nhead Wilson.
Current, Cynthia A
2009-01-01
Mark Twain's Pudd'nhead Wilson demonstrates how technologies of identification attempt to counter how bodies evolve beyond previous constraints—in particular, the constraints of racial classification. Twain develops accounts of subjectivity and racial classification that cover an extraordinary breadth of genealogy, biology, and law, while still invoking elements of randomness and chance. The key to such combinations of fixity and emergence in human identity is the technology of fingerprinting. Twain's speculative engagement with fingerprinting creates a system and medium to classify and secure particular forms of identity, leading to the reassertion of racial values already inherent in science and technology, law, and commerce. Fingerprinting represents the direction that technologies of identity would seek to employ: a movement away from direct visual observation of bodies, whose emergence and change over time make them difficult to categorize, to reliance on archives of information that are increasingly removed from the contexts of meaning and emergence those bodies inhabit; this reflects "one drop" politics, as race becomes increasingly difficult to define visually. The archive itself, then, becomes infected with the spectacular vitality of, and the speculation and risk within, nineteenth-century biological and cultural determinism.
[Application of genetic algorithm in blending technology for extractions of Cortex Fraxini].
Yang, Ming; Zhou, Yinmin; Chen, Jialei; Yu, Minying; Shi, Xiufeng; Gu, Xijun
2009-10-01
To explore the feasibility of genetic algorithm (GA) on multiple objective blending technology for extractions of Cortex Fraxini. According to that the optimization objective was the combination of fingerprint similarity and the root-mean-square error of multiple key constituents, a new multiple objective optimization model of 10 batches extractions of Cortex Fraxini was built. The blending coefficient was obtained by genetic algorithm. The quality of 10 batches extractions of Cortex Fraxini that after blending was evaluated with the finger print similarity and root-mean-square error as indexes. The quality of 10 batches extractions of Cortex Fraxini that after blending was well improved. Comparing with the fingerprint of the control sample, the similarity was up, but the degree of variation is down. The relative deviation of the key constituents was less than 10%. It is proved that genetic algorithm works well on multiple objective blending technology for extractions of Cortex Fraxini. This method can be a reference to control the quality of extractions of Cortex Fraxini. Genetic algorithm in blending technology for extractions of Chinese medicines is advisable.
Jia, Yi; Huan, Jun; Buhr, Vincent; Zhang, Jintao; Carayannopoulos, Leonidas N
2009-01-01
Background Automatic identification of structure fingerprints from a group of diverse protein structures is challenging, especially for proteins whose divergent amino acid sequences may fall into the "twilight-" or "midnight-" zones where pair-wise sequence identities to known sequences fall below 25% and sequence-based functional annotations often fail. Results Here we report a novel graph database mining method and demonstrate its application to protein structure pattern identification and structure classification. The biologic motivation of our study is to recognize common structure patterns in "immunoevasins", proteins mediating virus evasion of host immune defense. Our experimental study, using both viral and non-viral proteins, demonstrates the efficiency and efficacy of the proposed method. Conclusion We present a theoretic framework, offer a practical software implementation for incorporating prior domain knowledge, such as substitution matrices as studied here, and devise an efficient algorithm to identify approximate matched frequent subgraphs. By doing so, we significantly expanded the analytical power of sophisticated data mining algorithms in dealing with large volume of complicated and noisy protein structure data. And without loss of generality, choice of appropriate compatibility matrices allows our method to be easily employed in domains where subgraph labels have some uncertainty. PMID:19208148
Collins, A L; Pulley, S; Foster, I D L; Gellis, A; Porto, P; Horowitz, A J
2017-06-01
The growing awareness of the environmental significance of fine-grained sediment fluxes through catchment systems continues to underscore the need for reliable information on the principal sources of this material. Source estimates are difficult to obtain using traditional monitoring techniques, but sediment source fingerprinting or tracing procedures, have emerged as a potentially valuable alternative. Despite the rapidly increasing numbers of studies reporting the use of sediment source fingerprinting, several key challenges and uncertainties continue to hamper consensus among the international scientific community on key components of the existing methodological procedures. Accordingly, this contribution reviews and presents recent developments for several key aspects of fingerprinting, namely: sediment source classification, catchment source and target sediment sampling, tracer selection, grain size issues, tracer conservatism, source apportionment modelling, and assessment of source predictions using artificial mixtures. Finally, a decision-tree representing the current state of knowledge is presented, to guide end-users in applying the fingerprinting approach. Copyright © 2016 The Authors. Published by Elsevier Ltd.. All rights reserved.
Recognizable-image selection for fingerprint recognition with a mobile-device camera.
Lee, Dongjae; Choi, Kyoungtaek; Choi, Heeseung; Kim, Jaihie
2008-02-01
This paper proposes a recognizable-image selection algorithm for fingerprint-verification systems that use a camera embedded in a mobile device. A recognizable image is defined as the fingerprint image which includes the characteristics that are sufficiently discriminating an individual from other people. While general camera systems obtain focused images by using various gradient measures to estimate high-frequency components, mobile cameras cannot acquire recognizable images in the same way because the obtained images may not be adequate for fingerprint recognition, even if they are properly focused. A recognizable image has to meet the following two conditions: First, valid region in the recognizable image should be large enough compared with other nonrecognizable images. Here, a valid region is a well-focused part, and ridges in the region are clearly distinguishable from valleys. In order to select valid regions, this paper proposes a new focus-measurement algorithm using the secondary partial derivatives and a quality estimation utilizing the coherence and symmetry of gradient distribution. Second, rolling and pitching degrees of a finger measured from the camera plane should be within some limit for a recognizable image. The position of a core point and the contour of a finger are used to estimate the degrees of rolling and pitching. Experimental results show that our proposed method selects valid regions and estimates the degrees of rolling and pitching properly. In addition, fingerprint-verification performance is improved by detecting the recognizable images.
Fast probabilistic file fingerprinting for big data
2013-01-01
Background Biological data acquisition is raising new challenges, both in data analysis and handling. Not only is it proving hard to analyze the data at the rate it is generated today, but simply reading and transferring data files can be prohibitively slow due to their size. This primarily concerns logistics within and between data centers, but is also important for workstation users in the analysis phase. Common usage patterns, such as comparing and transferring files, are proving computationally expensive and are tying down shared resources. Results We present an efficient method for calculating file uniqueness for large scientific data files, that takes less computational effort than existing techniques. This method, called Probabilistic Fast File Fingerprinting (PFFF), exploits the variation present in biological data and computes file fingerprints by sampling randomly from the file instead of reading it in full. Consequently, it has a flat performance characteristic, correlated with data variation rather than file size. We demonstrate that probabilistic fingerprinting can be as reliable as existing hashing techniques, with provably negligible risk of collisions. We measure the performance of the algorithm on a number of data storage and access technologies, identifying its strengths as well as limitations. Conclusions Probabilistic fingerprinting may significantly reduce the use of computational resources when comparing very large files. Utilisation of probabilistic fingerprinting techniques can increase the speed of common file-related workflows, both in the data center and for workbench analysis. The implementation of the algorithm is available as an open-source tool named pfff, as a command-line tool as well as a C library. The tool can be downloaded from http://biit.cs.ut.ee/pfff. PMID:23445565
Liu, Wei; Wang, Dongmei; Liu, Jianjun; Li, Dengwu; Yin, Dongxue
2016-01-01
The present study was performed to assess the quality of Potentilla fruticosa L. sampled from distinct regions of China using high performance liquid chromatography (HPLC) fingerprinting coupled with a suite of chemometric methods. For this quantitative analysis, the main active phytochemical compositions and the antioxidant activity in P. fruticosa were also investigated. Considering the high percentages and antioxidant activities of phytochemicals, P. fruticosa samples from Kangding, Sichuan were selected as the most valuable raw materials. Similarity analysis (SA) of HPLC fingerprints, hierarchical cluster analysis (HCA), principle component analysis (PCA), and discriminant analysis (DA) were further employed to provide accurate classification and quality estimates of P. fruticosa. Two principal components (PCs) were collected by PCA. PC1 separated samples from Kangding, Sichuan, capturing 57.64% of the variance, whereas PC2 contributed to further separation, capturing 18.97% of the variance. Two kinds of discriminant functions with a 100% discrimination ratio were constructed. The results strongly supported the conclusion that the eight samples from different regions were clustered into three major groups, corresponding with their morphological classification, for which HPLC analysis confirmed the considerable variation in phytochemical compositions and that P. fruticosa samples from Kangding, Sichuan were of high quality. The results of SA, HCA, PCA, and DA were in agreement and performed well for the quality assessment of P. fruticosa. Consequently, HPLC fingerprinting coupled with chemometric techniques provides a highly flexible and reliable method for the quality evaluation of traditional Chinese medicines.
Liu, Wei; Wang, Dongmei; Liu, Jianjun; Li, Dengwu; Yin, Dongxue
2016-01-01
The present study was performed to assess the quality of Potentilla fruticosa L. sampled from distinct regions of China using high performance liquid chromatography (HPLC) fingerprinting coupled with a suite of chemometric methods. For this quantitative analysis, the main active phytochemical compositions and the antioxidant activity in P. fruticosa were also investigated. Considering the high percentages and antioxidant activities of phytochemicals, P. fruticosa samples from Kangding, Sichuan were selected as the most valuable raw materials. Similarity analysis (SA) of HPLC fingerprints, hierarchical cluster analysis (HCA), principle component analysis (PCA), and discriminant analysis (DA) were further employed to provide accurate classification and quality estimates of P. fruticosa. Two principal components (PCs) were collected by PCA. PC1 separated samples from Kangding, Sichuan, capturing 57.64% of the variance, whereas PC2 contributed to further separation, capturing 18.97% of the variance. Two kinds of discriminant functions with a 100% discrimination ratio were constructed. The results strongly supported the conclusion that the eight samples from different regions were clustered into three major groups, corresponding with their morphological classification, for which HPLC analysis confirmed the considerable variation in phytochemical compositions and that P. fruticosa samples from Kangding, Sichuan were of high quality. The results of SA, HCA, PCA, and DA were in agreement and performed well for the quality assessment of P. fruticosa. Consequently, HPLC fingerprinting coupled with chemometric techniques provides a highly flexible and reliable method for the quality evaluation of traditional Chinese medicines. PMID:26890416
Thin-layer chromatographic identification of Chinese propolis using chemometric fingerprinting.
Tang, Tie-xin; Guo, Wei-yan; Xu, Ye; Zhang, Si-ming; Xu, Xin-jun; Wang, Dong-mei; Zhao, Zhi-min; Zhu, Long-ping; Yang, De-po
2014-01-01
Poplar tree gum has a similar chemical composition and appearance to Chinese propolis (bee glue) and has been widely used as a counterfeit propolis because Chinese propolis is typically the poplar-type propolis, the chemical composition of which is determined mainly by the resin of poplar trees. The discrimination of Chinese propolis from poplar tree gum is a challenging task. To develop a rapid thin-layer chromatographic (TLC) identification method using chemometric fingerprinting to discriminate Chinese propolis from poplar tree gum. A new TLC method using a combination of ammonia and hydrogen peroxide vapours as the visualisation reagent was developed to characterise the chemical profile of Chinese propolis. Three separate people performed TLC on eight Chinese propolis samples and three poplar tree gum samples of varying origins. Five chemometric methods, including similarity analysis, hierarchical clustering, k-means clustering, neural network and support vector machine, were compared for use in classifying the samples based on their densitograms obtained from the TLC chromatograms via image analysis. Hierarchical clustering, neural network and support vector machine analyses achieved a correct classification rate of 100% in classifying the samples. A strategy for TLC identification of Chinese propolis using chemometric fingerprinting was proposed and it provided accurate sample classification. The study has shown that the TLC identification method using chemometric fingerprinting is a rapid, low-cost method for the discrimination of Chinese propolis from poplar tree gum and may be used for the quality control of Chinese propolis. Copyright © 2014 John Wiley & Sons, Ltd.
Deconinck, E; Sokeng Djiogo, C A; Courselle, P
2017-09-05
Plant food supplements are gaining popularity, resulting in a broader spectrum of available products and an increased consumption. Next to the problem of adulteration of these products with synthetic drugs the presence of regulated or toxic plants is an important issue, especially when the products are purchased from irregular sources. This paper focusses on this problem by using specific chromatographic fingerprints for five targeted plants and chemometric classification techniques in order to extract the important information from the fingerprints and determine the presence of the targeted plants in plant food supplements in an objective way. Two approaches were followed: (1) a multiclass model, (2) 2-class model for each of the targeted plants separately. For both approaches good classification models were obtained, especially when using SIMCA and PLS-DA. For each model, misclassification rates for the external test set of maximum one sample could be obtained. The models were applied to five real samples resulting in the identification of the correct plants, confirmed by mass spectrometry. Therefore chromatographic fingerprinting combined with chemometric modelling can be considered interesting to make a more objective decision on whether a regulated plant is present in a plant food supplement or not, especially when no mass spectrometry equipment is available. The results suggest also that the use of a battery of 2-class models to screen for several plants is the approach to be preferred. Copyright © 2017 Elsevier B.V. All rights reserved.
Using Ontology Fingerprints to disambiguate gene name entities in the biomedical literature
Chen, Guocai; Zhao, Jieyi; Cohen, Trevor; Tao, Cui; Sun, Jingchun; Xu, Hua; Bernstam, Elmer V.; Lawson, Andrew; Zeng, Jia; Johnson, Amber M.; Holla, Vijaykumar; Bailey, Ann M.; Lara-Guerra, Humberto; Litzenburger, Beate; Meric-Bernstam, Funda; Jim Zheng, W.
2015-01-01
Ambiguous gene names in the biomedical literature are a barrier to accurate information extraction. To overcome this hurdle, we generated Ontology Fingerprints for selected genes that are relevant for personalized cancer therapy. These Ontology Fingerprints were used to evaluate the association between genes and biomedical literature to disambiguate gene names. We obtained 93.6% precision for the test gene set and 80.4% for the area under a receiver-operating characteristics curve for gene and article association. The core algorithm was implemented using a graphics processing unit-based MapReduce framework to handle big data and to improve performance. We conclude that Ontology Fingerprints can help disambiguate gene names mentioned in text and analyse the association between genes and articles. Database URL: http://www.ontologyfingerprint.org PMID:25858285
Low-Speed Fingerprint Image Capture System User`s Guide, June 1, 1993
DOE Office of Scientific and Technical Information (OSTI.GOV)
Whitus, B.R.; Goddard, J.S.; Jatko, W.B.
1993-06-01
The Low-Speed Fingerprint Image Capture System (LS-FICS) uses a Sun workstation controlling a Lenzar ElectroOptics Opacity 1000 imaging system to digitize fingerprint card images to support the Federal Bureau of Investigation`s (FBI`s) Automated Fingerprint Identification System (AFIS) program. The system also supports the operations performed by the Oak Ridge National Laboratory- (ORNL-) developed Image Transmission Network (ITN) prototype card scanning system. The input to the system is a single FBI fingerprint card of the agreed-upon standard format and a user-specified identification number. The output is a file formatted to be compatible with the National Institute of Standards and Technology (NIST)more » draft standard for fingerprint data exchange dated June 10, 1992. These NIST compatible files contain the required print and text images. The LS-FICS is designed to provide the FBI with the capability of scanning fingerprint cards into a digital format. The FBI will replicate the system to generate a data base of test images. The Host Workstation contains the image data paths and the compression algorithm. A local area network interface, disk storage, and tape drive are used for the image storage and retrieval, and the Lenzar Opacity 1000 scanner is used to acquire the image. The scanner is capable of resolving 500 pixels/in. in both x and y directions. The print images are maintained in full 8-bit gray scale and compressed with an FBI-approved wavelet-based compression algorithm. The text fields are downsampled to 250 pixels/in. and 2-bit gray scale. The text images are then compressed using a lossless Huffman coding scheme. The text fields retrieved from the output files are easily interpreted when displayed on the screen. Detailed procedures are provided for system calibration and operation. Software tools are provided to verify proper system operation.« less
Siew, Ging Yang; Ng, Wei Lun; Tan, Sheau Wei; Alitheen, Noorjahan Banu; Tan, Soon Guan; Yeap, Swee Keong
2018-01-01
Durian ( Durio zibethinus ) is one of the most popular tropical fruits in Asia. To date, 126 durian types have been registered with the Department of Agriculture in Malaysia based on phenotypic characteristics. Classification based on morphology is convenient, easy, and fast but it suffers from phenotypic plasticity as a direct result of environmental factors and age. To overcome the limitation of morphological classification, there is a need to carry out genetic characterization of the various durian types. Such data is important for the evaluation and management of durian genetic resources in producing countries. In this study, simple sequence repeat (SSR) markers were used to study the genetic variation in 27 durian types from the germplasm collection of Universiti Putra Malaysia. Based on DNA sequences deposited in Genbank, seven pairs of primers were successfully designed to amplify SSR regions in the durian DNA samples. High levels of variation among the 27 durian types were observed (expected heterozygosity, H E = 0.35). The DNA fingerprinting power of SSR markers revealed by the combined probability of identity (PI) of all loci was 2.3×10 -3 . Unique DNA fingerprints were generated for 21 out of 27 durian types using five polymorphic SSR markers (the other two SSR markers were monomorphic). We further tested the utility of these markers by evaluating the clonal status of shared durian types from different germplasm collection sites, and found that some were not clones. The findings in this preliminary study not only shows the feasibility of using SSR markers for DNA fingerprinting of durian types, but also challenges the current classification of durian types, e.g., on whether the different types should be called "clones", "varieties", or "cultivars". Such matters have a direct impact on the regulation and management of durian genetic resources in the region.
Siew, Ging Yang; Tan, Sheau Wei; Tan, Soon Guan; Yeap, Swee Keong
2018-01-01
Durian (Durio zibethinus) is one of the most popular tropical fruits in Asia. To date, 126 durian types have been registered with the Department of Agriculture in Malaysia based on phenotypic characteristics. Classification based on morphology is convenient, easy, and fast but it suffers from phenotypic plasticity as a direct result of environmental factors and age. To overcome the limitation of morphological classification, there is a need to carry out genetic characterization of the various durian types. Such data is important for the evaluation and management of durian genetic resources in producing countries. In this study, simple sequence repeat (SSR) markers were used to study the genetic variation in 27 durian types from the germplasm collection of Universiti Putra Malaysia. Based on DNA sequences deposited in Genbank, seven pairs of primers were successfully designed to amplify SSR regions in the durian DNA samples. High levels of variation among the 27 durian types were observed (expected heterozygosity, HE = 0.35). The DNA fingerprinting power of SSR markers revealed by the combined probability of identity (PI) of all loci was 2.3×10−3. Unique DNA fingerprints were generated for 21 out of 27 durian types using five polymorphic SSR markers (the other two SSR markers were monomorphic). We further tested the utility of these markers by evaluating the clonal status of shared durian types from different germplasm collection sites, and found that some were not clones. The findings in this preliminary study not only shows the feasibility of using SSR markers for DNA fingerprinting of durian types, but also challenges the current classification of durian types, e.g., on whether the different types should be called “clones”, “varieties”, or “cultivars”. Such matters have a direct impact on the regulation and management of durian genetic resources in the region. PMID:29511604
Piccinonna, Sara; Ragone, Rosa; Stocchero, Matteo; Del Coco, Laura; De Pascali, Sandra Angelica; Schena, Francesco Paolo; Fanizzi, Francesco Paolo
2016-05-15
Nuclear Magnetic Resonance (NMR) spectroscopy is emerging as a powerful technique in olive oil fingerprinting, but its analytical robustness has to be proved. Here, we report a comparative study between two laboratories on olive oil (1)H NMR fingerprinting, aiming to demonstrate the robustness of NMR-based metabolomics in generating comparable data sets for cultivar classification. Sample preparation and data acquisition were performed independently in two laboratories, equipped with different resolution spectrometers (400 and 500 MHz), using two identical sets of mono-varietal olive oils. Partial Least Squares (PLS)-based techniques were applied to compare the data sets produced by the two laboratories. Despite differences in spectrum baseline, and in intensity and shape of peaks, the amount of shared information was significant (almost 70%) and related to cultivar (same metabolites discriminated between cultivars). In conclusion, regardless of the variability due to operator and machine, the data sets from the two participating units were comparable for the purpose of classification. Copyright © 2015 Elsevier Ltd. All rights reserved.
Estuarial fingerprinting through multidimensional fluorescence and multivariate analysis.
Hall, Gregory J; Clow, Kerin E; Kenny, Jonathan E
2005-10-01
As part of a strategy for preventing the introduction of aquatic nuisance species (ANS) to U.S. estuaries, ballast water exchange (BWE) regulations have been imposed. Enforcing these regulations requires a reliable method for determining the port of origin of water in the ballast tanks of ships entering U.S. waters. This study shows that a three-dimensional fluorescence fingerprinting technique, excitation emission matrix (EEM) spectroscopy, holds great promise as a ballast water analysis tool. In our technique, EEMs are analyzed by multivariate classification and curve resolution methods, such as N-way partial least squares Regression-discriminant analysis (NPLS-DA) and parallel factor analysis (PARAFAC). We demonstrate that classification techniques can be used to discriminate among sampling sites less than 10 miles apart, encompassing Boston Harbor and two tributaries in the Mystic River Watershed. To our knowledge, this work is the first to use multivariate analysis to classify water as to location of origin. Furthermore, it is shown that curve resolution can show seasonal features within the multidimensional fluorescence data sets, which correlate with difficulty in classification.
Cai, Yong; Li, Xiwen; Wang, Runmiao; Yang, Qing; Li, Peng; Hu, Hao
2016-01-01
Currently, the chemical fingerprint comparison and analysis is mainly based on professional equipment and software, it's expensive and inconvenient. This study aims to integrate QR (Quick Response) code with quality data and mobile intelligent technology to develop a convenient query terminal for tracing quality in the whole industrial chain of TCM (traditional Chinese medicine). Three herbal medicines were randomly selected and their chemical two-dimensional barcode (2D) barcodes fingerprints were constructed. Smartphone application (APP) based on Android system was developed to read initial data of 2D chemical barcodes, and compared multiple fingerprints from different batches of same species or different species. It was demonstrated that there were no significant differences between original and scanned TCM chemical fingerprints. Meanwhile, different TCM chemical fingerprint QR codes could be rendered in the same coordinate and showed the differences very intuitively. To be able to distinguish the variations of chemical fingerprint more directly, linear interpolation angle cosine similarity algorithm (LIACSA) was proposed to get similarity ratio. This study showed that QR codes can be used as an effective information carrier to transfer quality data. Smartphone application can rapidly read quality information in QR codes and convert data into TCM chemical fingerprints.
Evaluation of fingerprint deformation using optical coherence tomography
NASA Astrophysics Data System (ADS)
Gutierrez da Costa, Henrique S.; Maxey, Jessica R.; Silva, Luciano; Ellerbee, Audrey K.
2014-02-01
Biometric identification systems have important applications to privacy and security. The most widely used of these, print identification, is based on imaging patterns present in the fingers, hands and feet that are formed by the ridges, valleys and pores of the skin. Most modern print sensors acquire images of the finger when pressed against a sensor surface. Unfortunately, this pressure may result in deformations, characterized by changes in the sizes and relative distances of the print patterns, and such changes have been shown to negatively affect the performance of fingerprint identification algorithms. Optical coherence tomography (OCT) is a novel imaging technique that is capable of imaging the subsurface of biological tissue. Hence, OCT may be used to obtain images of subdermal skin structures from which one can extract an internal fingerprint. The internal fingerprint is very similar in structure to the commonly used external fingerprint and is of increasing interest in investigations of identify fraud. We proposed and tested metrics based on measurements calculated from external and internal fingerprints to evaluate the amount of deformation of the skin. Such metrics were used to test hypotheses about the differences of deformation between the internal and external images, variations with the type of finger and location inside the fingerprint.
Verifying Sediment Fingerprinting Results with Known Mixtures
NASA Astrophysics Data System (ADS)
Gellis, A.; Gorman-Sanisaca, L.; Cashman, M. J.
2017-12-01
Sediment fingerprinting is a widely used approach to determine the specific sources of fluvial sediment within a watershed. It relies on the principle that potential sediment sources can be identified using a set of chemical tracers (or fingerprints), and comparison of these source fingerprints with fluvial (target) sediment allows for source apportionment of the fluvial sediment. There are numerous source classifications, fingerprints, and statistical approaches used in the literature to apportion sources of sediment. However, few of these studies have sought to test the method by creating controls on the ratio of sources in the target sediment. Without a controlled environment for inputs and outputs, such verification of results is ambiguous. Here, we generated artificial mixtures of source sediment from an agricultural/forested watershed in Virginia, USA (Smith Creek, 246 km2) to verify the apportionment results. Target samples were established from known mixtures of the four major sediment sources in the watershed (forest, pasture, cropland, and streambanks). The target samples were sieved to less than 63 microns and analyzed for elemental and isotopic chemistry. The target samples and source samples were run through the Sediment Source Assessment Tool (Sed_SAT) to verify if the statistical operations provided the correct apportionment. Sed_SAT uses a multivariate parametric approach to identify the minimum suite of fingerprints that discriminate the source areas and applies these fingerprints through an unmixng model to apportion sediment. The results of this sediment fingerprinting verification experiment will be presented in this session.
Linguistically informed digital fingerprints for text
NASA Astrophysics Data System (ADS)
Uzuner, Özlem
2006-02-01
Digital fingerprinting, watermarking, and tracking technologies have gained importance in the recent years in response to growing problems such as digital copyright infringement. While fingerprints and watermarks can be generated in many different ways, use of natural language processing for these purposes has so far been limited. Measuring similarity of literary works for automatic copyright infringement detection requires identifying and comparing creative expression of content in documents. In this paper, we present a linguistic approach to automatically fingerprinting novels based on their expression of content. We use natural language processing techniques to generate "expression fingerprints". These fingerprints consist of both syntactic and semantic elements of language, i.e., syntactic and semantic elements of expression. Our experiments indicate that syntactic and semantic elements of expression enable accurate identification of novels and their paraphrases, providing a significant improvement over techniques used in text classification literature for automatic copy recognition. We show that these elements of expression can be used to fingerprint, label, or watermark works; they represent features that are essential to the character of works and that remain fairly consistent in the works even when works are paraphrased. These features can be directly extracted from the contents of the works on demand and can be used to recognize works that would not be correctly identified either in the absence of pre-existing labels or by verbatim-copy detectors.
Ugliano, Maurizio
2016-12-01
This work describes the application of disposable screen printed carbon paste sensors for the analysis of the main white wine oxidizable compounds as well as for the rapid fingerprinting and classification of white wines from different grape varieties. The response of individual white wine antioxidants such as flavanols, flavanol derivatives, phenolic acids, SO2 and ascorbic acid was first assessed in model wine. Analysis of commercial white wines gave voltammograms featuring two unresolved anodic waves corresponding to the oxidation of different compounds, mostly phenolic antioxidants. Calculation of the first order derivative of measured current vs. applied potential allowed resolving these two waves, highlighting the occurrence of several electrode processes corresponding to the oxidation of individual wine components. Through the application of Principal Component Analysis (PCA), derivative voltammograms were used to discriminate among wines of different varieties. Copyright © 2016 Elsevier Ltd. All rights reserved.
Keitel, Anne; Gross, Joachim
2016-06-01
The human brain can be parcellated into diverse anatomical areas. We investigated whether rhythmic brain activity in these areas is characteristic and can be used for automatic classification. To this end, resting-state MEG data of 22 healthy adults was analysed. Power spectra of 1-s long data segments for atlas-defined brain areas were clustered into spectral profiles ("fingerprints"), using k-means and Gaussian mixture (GM) modelling. We demonstrate that individual areas can be identified from these spectral profiles with high accuracy. Our results suggest that each brain area engages in different spectral modes that are characteristic for individual areas. Clustering of brain areas according to similarity of spectral profiles reveals well-known brain networks. Furthermore, we demonstrate task-specific modulations of auditory spectral profiles during auditory processing. These findings have important implications for the classification of regional spectral activity and allow for novel approaches in neuroimaging and neurostimulation in health and disease.
DOE Office of Scientific and Technical Information (OSTI.GOV)
SEDLACEK,III, A.J.FINFROCK,C.
As a member of the science-support part of the ITT-lead LISA development program, BNL is tasked with the acquisition of UV Raman spectral fingerprints and associated scattering cross-sections for those chemicals-of-interest to the program's sponsor. In support of this role, the present report contains the first installment of UV Raman spectral fingerprint data on the initial subset of chemicals. Because of the unique nature associated with the acquisition of spectral fingerprints for use in spectral pattern matching algorithms (i.e., CLS, PLS, ANN) great care has been undertaken to maximize the signal-to-noise and to minimize unnecessary spectral subtractions, in an effortmore » to provide the highest quality spectral fingerprints. This report is divided into 4 sections. The first is an Experimental section that outlines how the Raman spectra are performed. This is then followed by a section on Sample Handling. Following this, the spectral fingerprints are presented in the Results section where the data reduction process is outlined. Finally, a Photographs section is included.« less
Using Ontology Fingerprints to disambiguate gene name entities in the biomedical literature.
Chen, Guocai; Zhao, Jieyi; Cohen, Trevor; Tao, Cui; Sun, Jingchun; Xu, Hua; Bernstam, Elmer V; Lawson, Andrew; Zeng, Jia; Johnson, Amber M; Holla, Vijaykumar; Bailey, Ann M; Lara-Guerra, Humberto; Litzenburger, Beate; Meric-Bernstam, Funda; Jim Zheng, W
2015-01-01
Ambiguous gene names in the biomedical literature are a barrier to accurate information extraction. To overcome this hurdle, we generated Ontology Fingerprints for selected genes that are relevant for personalized cancer therapy. These Ontology Fingerprints were used to evaluate the association between genes and biomedical literature to disambiguate gene names. We obtained 93.6% precision for the test gene set and 80.4% for the area under a receiver-operating characteristics curve for gene and article association. The core algorithm was implemented using a graphics processing unit-based MapReduce framework to handle big data and to improve performance. We conclude that Ontology Fingerprints can help disambiguate gene names mentioned in text and analyse the association between genes and articles. Database URL: http://www.ontologyfingerprint.org © The Author(s) 2015. Published by Oxford University Press.
A review of classification algorithms for EEG-based brain-computer interfaces.
Lotte, F; Congedo, M; Lécuyer, A; Lamarche, F; Arnaldi, B
2007-06-01
In this paper we review classification algorithms used to design brain-computer interface (BCI) systems based on electroencephalography (EEG). We briefly present the commonly employed algorithms and describe their critical properties. Based on the literature, we compare them in terms of performance and provide guidelines to choose the suitable classification algorithm(s) for a specific BCI.
1998-06-26
METHOD OF FREQUENCY DETERMINATION 4 IN SOFTWARE METRIC DATA THROUGH THE USE OF THE 5 MULTIPLE SIGNAL CLASSIFICATION ( MUSIC ) ALGORITHM 6 7 STATEMENT OF...graph showing the estimated power spectral 12 density (PSD) generated by the multiple signal classification 13 ( MUSIC ) algorithm from the data set used...implemented in this module; however, it is preferred to use 1 the Multiple Signal Classification ( MUSIC ) algorithm. The MUSIC 2 algorithm is
NASA Astrophysics Data System (ADS)
Wang, Bingjie; Pi, Shaohua; Sun, Qi; Jia, Bo
2015-05-01
An improved classification algorithm that considers multiscale wavelet packet Shannon entropy is proposed. Decomposition coefficients at all levels are obtained to build the initial Shannon entropy feature vector. After subtracting the Shannon entropy map of the background signal, components of the strongest discriminating power in the initial feature vector are picked out to rebuild the Shannon entropy feature vector, which is transferred to radial basis function (RBF) neural network for classification. Four types of man-made vibrational intrusion signals are recorded based on a modified Sagnac interferometer. The performance of the improved classification algorithm has been evaluated by the classification experiments via RBF neural network under different diffusion coefficients. An 85% classification accuracy rate is achieved, which is higher than the other common algorithms. The classification results show that this improved classification algorithm can be used to classify vibrational intrusion signals in an automatic real-time monitoring system.
Moros, J; Serrano, J; Gallego, F J; Macías, J; Laserna, J J
2013-06-15
During recent years laser-induced breakdown spectroscopy (LIBS) has been considered one of the techniques with larger ability for trace detection of explosives. However, despite of the high sensitivity exhibited for this application, LIBS suffers from a limited selectivity due to difficulties in assigning the molecular origin of the spectral emissions observed. This circumstance makes the recognition of fingerprints a latent challenging problem. In the present manuscript the sorting of six explosives (chloratite, ammonal, DNT, TNT, RDX and PETN) against a broad list of potential harmless interferents (butter, fuel oil, hand cream, olive oil, …), all of them in the form of fingerprints deposited on the surfaces of objects for courier services, has been carried out. When LIBS information is processed through a multi-stage architecture algorithm built from a suitable combination of 3 learning classifiers, an unknown fingerprint may be labeled into a particular class. Neural network classifiers trained by the Levenberg-Marquardt rule were decided within 3D scatter plots projected onto the subspace of the most useful features extracted from the LIBS spectra. Experimental results demonstrate that the presented algorithm sorts fingerprints according to their hazardous character, although its spectral information is virtually identical in appearance, with rates of false negatives and false positives not beyond of 10%. These reported achievements mean a step forward in the technology readiness level of LIBS for this complex application related to defense, homeland security and force protection. Copyright © 2013 Elsevier B.V. All rights reserved.
2015-01-01
Two factors contribute to the inefficiency associated with screening pharmaceutical library collections as a means of identifying new drugs: [1] the limited success of virtual screening (VS) methods in identifying new scaffolds; [2] the limited accuracy of computational methods in predicting off-target effects. We recently introduced a 3D shape-based similarity algorithm of the SABRE program, which encodes a consensus molecular shape pattern of a set of active ligands into a 4D fingerprint descriptor. Here, we report a mathematical model for shape similarity comparisons and ligand database filtering using this 4D fingerprint method and benchmarked the scoring function HWK (Hamza–Wei–Korotkov), using the 81 targets of the DEKOIS database. Subsequently, we applied our combined 4D fingerprint and HWK scoring function VS approach in scaffold-hopping and drug repurposing using the National Cancer Institute (NCI) and Food and Drug Administration (FDA) databases, and we identified new inhibitors with different scaffolds of MycP1 protease from the mycobacterial ESX-1 secretion system. Experimental evaluation of nine compounds from the NCI database and three from the FDA database displayed IC50 values ranging from 70 to 100 μM against MycP1 and possessed high structural diversity, which provides departure points for further structure–activity relationship (SAR) optimization. In addition, this study demonstrates that the combination of our 4D fingerprint algorithm and the HWK scoring function may provide a means for identifying repurposed drugs for the treatment of infectious diseases and may be used in the drug-target profile strategy. PMID:25229183
Hamza, Adel; Wagner, Jonathan M; Wei, Ning-Ning; Kwiatkowski, Stefan; Zhan, Chang-Guo; Watt, David S; Korotkov, Konstantin V
2014-10-27
Two factors contribute to the inefficiency associated with screening pharmaceutical library collections as a means of identifying new drugs: [1] the limited success of virtual screening (VS) methods in identifying new scaffolds; [2] the limited accuracy of computational methods in predicting off-target effects. We recently introduced a 3D shape-based similarity algorithm of the SABRE program, which encodes a consensus molecular shape pattern of a set of active ligands into a 4D fingerprint descriptor. Here, we report a mathematical model for shape similarity comparisons and ligand database filtering using this 4D fingerprint method and benchmarked the scoring function HWK (Hamza-Wei-Korotkov), using the 81 targets of the DEKOIS database. Subsequently, we applied our combined 4D fingerprint and HWK scoring function VS approach in scaffold-hopping and drug repurposing using the National Cancer Institute (NCI) and Food and Drug Administration (FDA) databases, and we identified new inhibitors with different scaffolds of MycP1 protease from the mycobacterial ESX-1 secretion system. Experimental evaluation of nine compounds from the NCI database and three from the FDA database displayed IC50 values ranging from 70 to 100 μM against MycP1 and possessed high structural diversity, which provides departure points for further structure-activity relationship (SAR) optimization. In addition, this study demonstrates that the combination of our 4D fingerprint algorithm and the HWK scoring function may provide a means for identifying repurposed drugs for the treatment of infectious diseases and may be used in the drug-target profile strategy.
Tai, David; Fang, Jianwen
2012-08-27
The large sizes of today's chemical databases require efficient algorithms to perform similarity searches. It can be very time consuming to compare two large chemical databases. This paper seeks to build upon existing research efforts by describing a novel strategy for accelerating existing search algorithms for comparing large chemical collections. The quest for efficiency has focused on developing better indexing algorithms by creating heuristics for searching individual chemical against a chemical library by detecting and eliminating needless similarity calculations. For comparing two chemical collections, these algorithms simply execute searches for each chemical in the query set sequentially. The strategy presented in this paper achieves a speedup upon these algorithms by indexing the set of all query chemicals so redundant calculations that arise in the case of sequential searches are eliminated. We implement this novel algorithm by developing a similarity search program called Symmetric inDexing or SymDex. SymDex shows over a 232% maximum speedup compared to the state-of-the-art single query search algorithm over real data for various fingerprint lengths. Considerable speedup is even seen for batch searches where query set sizes are relatively small compared to typical database sizes. To the best of our knowledge, SymDex is the first search algorithm designed specifically for comparing chemical libraries. It can be adapted to most, if not all, existing indexing algorithms and shows potential for accelerating future similarity search algorithms for comparing chemical databases.
Photogrammetric fingerprint unwrapping
NASA Astrophysics Data System (ADS)
Paar, Gerhard; del Pilar Caballo Perucha, Maria; Bauer, Arnold; Nauschnegg, Bernhard
2008-04-01
Fingerprints are important biometric cues. Compared to conventional fingerprint sensors the use of contact-free stereoscopic image acquisition of the front-most finger segment has a set of advantages: Finger deformation is avoided, the entire relevant area for biometric use is covered, some technical aspects like sensor maintenance and cleaning are facilitated, and access to a three-dimensional reconstruction of the covered area is possible. We describe a photogrammetric workflow for nail-to-nail fingerprint reconstruction: A calibrated sensor setup with typically 5 cameras and dedicated illumination acquires adjacent stereo pairs. Using the silhouettes of the segmented finger a raw cylindrical model is generated. After preprocessing (shading correction, dust removal, lens distortion correction), each individual camera texture is projected onto the model. Image-to-image matching on these pseudo ortho images and dense 3D reconstruction obtains a textured cylindrical digital surface model with radial distances around the major axis and a grid size in the range of 25-50 µm. The model allows for objective fingerprint unwrapping and novel fingerprint matching algorithms since 3D relations between fingerprint features are available as additional cues. Moreover, covering the entire region with relevant fingerprint texture is particularly important for establishing a comprehensive forensic database. The workflow has been implemented in portable C and is ready for industrial exploitation. Further improvement issues are code optimization, unwrapping method, illumination strategy to avoid highlights and to improve the initial segmentation, and the comparison of the unwrapping result to conventional fingerprint acquisition technology.
Accurate Classification of RNA Structures Using Topological Fingerprints
Li, Kejie; Gribskov, Michael
2016-01-01
While RNAs are well known to possess complex structures, functionally similar RNAs often have little sequence similarity. While the exact size and spacing of base-paired regions vary, functionally similar RNAs have pronounced similarity in the arrangement, or topology, of base-paired stems. Furthermore, predicted RNA structures often lack pseudoknots (a crucial aspect of biological activity), and are only partially correct, or incomplete. A topological approach addresses all of these difficulties. In this work we describe each RNA structure as a graph that can be converted to a topological spectrum (RNA fingerprint). The set of subgraphs in an RNA structure, its RNA fingerprint, can be compared with the fingerprints of other RNA structures to identify and correctly classify functionally related RNAs. Topologically similar RNAs can be identified even when a large fraction, up to 30%, of the stems are omitted, indicating that highly accurate structures are not necessary. We investigate the performance of the RNA fingerprint approach on a set of eight highly curated RNA families, with diverse sizes and functions, containing pseudoknots, and with little sequence similarity–an especially difficult test set. In spite of the difficult test set, the RNA fingerprint approach is very successful (ROC AUC > 0.95). Due to the inclusion of pseudoknots, the RNA fingerprint approach both covers a wider range of possible structures than methods based only on secondary structure, and its tolerance for incomplete structures suggests that it can be applied even to predicted structures. Source code is freely available at https://github.rcac.purdue.edu/mgribsko/XIOS_RNA_fingerprint. PMID:27755571
Real-Time Food Authentication Using a Miniature Mass Spectrometer.
Gerbig, Stefanie; Neese, Stephan; Penner, Alexander; Spengler, Bernhard; Schulz, Sabine
2017-10-17
Food adulteration is a threat to public health and the economy. In order to determine food adulteration efficiently, rapid and easy-to-use on-site analytical methods are needed. In this study, a miniaturized mass spectrometer in combination with three ambient ionization methods was used for food authentication. The chemical fingerprints of three milk types, five fish species, and two coffee types were measured using electrospray ionization, desorption electrospray ionization, and low temperature plasma ionization. Minimum sample preparation was needed for the analysis of liquid and solid food samples. Mass spectrometric data was processed using the laboratory-built software MS food classifier, which allows for the definition of specific food profiles from reference data sets using multivariate statistical methods and the subsequent classification of unknown data. Applicability of the obtained mass spectrometric fingerprints for food authentication was evaluated using different data processing methods, leave-10%-out cross-validation, and real-time classification of new data. Classification accuracy of 100% was achieved for the differentiation of milk types and fish species, and a classification accuracy of 96.4% was achieved for coffee types in cross-validation experiments. Measurement of two milk mixtures yielded correct classification of >94%. For real-time classification, the accuracies were comparable. Functionality of the software program and its performance is described. Processing time for a reference data set and a newly acquired spectrum was found to be 12 s and 2 s, respectively. These proof-of-principle experiments show that the combination of a miniaturized mass spectrometer, ambient ionization, and statistical analysis is suitable for on-site real-time food authentication.
NASA Astrophysics Data System (ADS)
Khan, Asif; Ryoo, Chang-Kyung; Kim, Heung Soo
2017-04-01
This paper presents a comparative study of different classification algorithms for the classification of various types of inter-ply delaminations in smart composite laminates. Improved layerwise theory is used to model delamination at different interfaces along the thickness and longitudinal directions of the smart composite laminate. The input-output data obtained through surface bonded piezoelectric sensor and actuator is analyzed by the system identification algorithm to get the system parameters. The identified parameters for the healthy and delaminated structure are supplied as input data to the classification algorithms. The classification algorithms considered in this study are ZeroR, Classification via regression, Naïve Bayes, Multilayer Perceptron, Sequential Minimal Optimization, Multiclass-Classifier, and Decision tree (J48). The open source software of Waikato Environment for Knowledge Analysis (WEKA) is used to evaluate the classification performance of the classifiers mentioned above via 75-25 holdout and leave-one-sample-out cross-validation regarding classification accuracy, precision, recall, kappa statistic and ROC Area.
Cai, Yong; Li, Xiwen; Wang, Runmiao; Yang, Qing; Li, Peng; Hu, Hao
2016-01-01
Currently, the chemical fingerprint comparison and analysis is mainly based on professional equipment and software, it’s expensive and inconvenient. This study aims to integrate QR (Quick Response) code with quality data and mobile intelligent technology to develop a convenient query terminal for tracing quality in the whole industrial chain of TCM (traditional Chinese medicine). Three herbal medicines were randomly selected and their chemical two-dimensional barcode (2D) barcodes fingerprints were constructed. Smartphone application (APP) based on Android system was developed to read initial data of 2D chemical barcodes, and compared multiple fingerprints from different batches of same species or different species. It was demonstrated that there were no significant differences between original and scanned TCM chemical fingerprints. Meanwhile, different TCM chemical fingerprint QR codes could be rendered in the same coordinate and showed the differences very intuitively. To be able to distinguish the variations of chemical fingerprint more directly, linear interpolation angle cosine similarity algorithm (LIACSA) was proposed to get similarity ratio. This study showed that QR codes can be used as an effective information carrier to transfer quality data. Smartphone application can rapidly read quality information in QR codes and convert data into TCM chemical fingerprints. PMID:27780256
Node fingerprinting: an efficient heuristic for aligning biological networks.
Radu, Alex; Charleston, Michael
2014-10-01
With the continuing increase in availability of biological data and improvements to biological models, biological network analysis has become a promising area of research. An emerging technique for the analysis of biological networks is through network alignment. Network alignment has been used to calculate genetic distance, similarities between regulatory structures, and the effect of external forces on gene expression, and to depict conditional activity of expression modules in cancer. Network alignment is algorithmically complex, and therefore we must rely on heuristics, ideally as efficient and accurate as possible. The majority of current techniques for network alignment rely on precomputed information, such as with protein sequence alignment, or on tunable network alignment parameters, which may introduce an increased computational overhead. Our presented algorithm, which we call Node Fingerprinting (NF), is appropriate for performing global pairwise network alignment without precomputation or tuning, can be fully parallelized, and is able to quickly compute an accurate alignment between two biological networks. It has performed as well as or better than existing algorithms on biological and simulated data, and with fewer computational resources. The algorithmic validation performed demonstrates the low computational resource requirements of NF.
Blocked inverted indices for exact clustering of large chemical spaces.
Thiel, Philipp; Sach-Peltason, Lisa; Ottmann, Christian; Kohlbacher, Oliver
2014-09-22
The calculation of pairwise compound similarities based on fingerprints is one of the fundamental tasks in chemoinformatics. Methods for efficient calculation of compound similarities are of the utmost importance for various applications like similarity searching or library clustering. With the increasing size of public compound databases, exact clustering of these databases is desirable, but often computationally prohibitively expensive. We present an optimized inverted index algorithm for the calculation of all pairwise similarities on 2D fingerprints of a given data set. In contrast to other algorithms, it neither requires GPU computing nor yields a stochastic approximation of the clustering. The algorithm has been designed to work well with multicore architectures and shows excellent parallel speedup. As an application example of this algorithm, we implemented a deterministic clustering application, which has been designed to decompose virtual libraries comprising tens of millions of compounds in a short time on current hardware. Our results show that our implementation achieves more than 400 million Tanimoto similarity calculations per second on a common desktop CPU. Deterministic clustering of the available chemical space thus can be done on modern multicore machines within a few days.
Machine learning in materials informatics: recent applications and prospects
NASA Astrophysics Data System (ADS)
Ramprasad, Rampi; Batra, Rohit; Pilania, Ghanshyam; Mannodi-Kanakkithodi, Arun; Kim, Chiho
2017-12-01
Propelled partly by the Materials Genome Initiative, and partly by the algorithmic developments and the resounding successes of data-driven efforts in other domains, informatics strategies are beginning to take shape within materials science. These approaches lead to surrogate machine learning models that enable rapid predictions based purely on past data rather than by direct experimentation or by computations/simulations in which fundamental equations are explicitly solved. Data-centric informatics methods are becoming useful to determine material properties that are hard to measure or compute using traditional methods—due to the cost, time or effort involved—but for which reliable data either already exists or can be generated for at least a subset of the critical cases. Predictions are typically interpolative, involving fingerprinting a material numerically first, and then following a mapping (established via a learning algorithm) between the fingerprint and the property of interest. Fingerprints, also referred to as "descriptors", may be of many types and scales, as dictated by the application domain and needs. Predictions may also be extrapolative—extending into new materials spaces—provided prediction uncertainties are properly taken into account. This article attempts to provide an overview of some of the recent successful data-driven "materials informatics" strategies undertaken in the last decade, with particular emphasis on the fingerprint or descriptor choices. The review also identifies some challenges the community is facing and those that should be overcome in the near future.
Liu, Wen; Fu, Xiao; Deng, Zhongliang
2016-12-02
Indoor positioning technologies has boomed recently because of the growing commercial interest in indoor location-based service (ILBS). Due to the absence of satellite signal in Global Navigation Satellite System (GNSS), various technologies have been proposed for indoor applications. Among them, Wi-Fi fingerprinting has been attracting much interest from researchers because of its pervasive deployment, flexibility and robustness to dense cluttered indoor environments. One challenge, however, is the deployment of Access Points (AP), which would bring a significant influence on the system positioning accuracy. This paper concentrates on WLAN based fingerprinting indoor location by analyzing the AP deployment influence, and studying the advantages of coordinate-based clustering compared to traditional RSS-based clustering. A coordinate-based clustering method for indoor fingerprinting location, named Smallest-Enclosing-Circle-based (SEC), is then proposed aiming at reducing the positioning error lying in the AP deployment and improving robustness to dense cluttered environments. All measurements are conducted in indoor public areas, such as the National Center For the Performing Arts (as Test-bed 1) and the XiDan Joy City (Floors 1 and 2, as Test-bed 2), and results show that SEC clustering algorithm can improve system positioning accuracy by about 32.7% for Test-bed 1, 71.7% for Test-bed 2 Floor 1 and 73.7% for Test-bed 2 Floor 2 compared with traditional RSS-based clustering algorithms such as K-means.
Liu, Wen; Fu, Xiao; Deng, Zhongliang
2016-01-01
Indoor positioning technologies has boomed recently because of the growing commercial interest in indoor location-based service (ILBS). Due to the absence of satellite signal in Global Navigation Satellite System (GNSS), various technologies have been proposed for indoor applications. Among them, Wi-Fi fingerprinting has been attracting much interest from researchers because of its pervasive deployment, flexibility and robustness to dense cluttered indoor environments. One challenge, however, is the deployment of Access Points (AP), which would bring a significant influence on the system positioning accuracy. This paper concentrates on WLAN based fingerprinting indoor location by analyzing the AP deployment influence, and studying the advantages of coordinate-based clustering compared to traditional RSS-based clustering. A coordinate-based clustering method for indoor fingerprinting location, named Smallest-Enclosing-Circle-based (SEC), is then proposed aiming at reducing the positioning error lying in the AP deployment and improving robustness to dense cluttered environments. All measurements are conducted in indoor public areas, such as the National Center For the Performing Arts (as Test-bed 1) and the XiDan Joy City (Floors 1 and 2, as Test-bed 2), and results show that SEC clustering algorithm can improve system positioning accuracy by about 32.7% for Test-bed 1, 71.7% for Test-bed 2 Floor 1 and 73.7% for Test-bed 2 Floor 2 compared with traditional RSS-based clustering algorithms such as K-means. PMID:27918454
Indoor localization using FM radio and DTMB signals
NASA Astrophysics Data System (ADS)
Wu, H.; Wang, Q.; Zhao, Y.; Ma, X.; Yang, M.; Liu, B.; Tang, R.; Xu, X.
2016-07-01
Indoor localization systems based on Wi-Fi signal strength fingerprinting techniques are widely used in office buildings. However, a general problem of these systems pertains to Wi-Fi signal degradation due to the environmental factors. And also, these systems cannot be used in the environments not covered with Wi-Fi signals or the environments with only a single Wi-Fi access point. In this paper, a new indoor location fingerprinting system using both FM radio and Digital Television Terrestrial Multimedia Broadcasting (DTMB) signals is proposed. First, the indoor location fingerprinting using FM radio and DTMB signals is theoretically analyzed to confirm its feasibility. Then, a specially designed combined strength fingerprinting location algorithm is proposed for the location system, which is achieved on the USRP2 platform. Finally, the system is tested in a typical indoor environment. The theoretical analysis and the tests show that the indoor location fingerprinting system using FM radio and DTMB signals has a similar localization accuracy to the Wi-Fi signal strength fingerprinting location system, while it has a wider coverage area, a lower maintenance cost, and more stable signal strength, which makes it a practical indoor positioning method.
Method for combined biometric and chemical analysis of human fingerprints.
Staymates, Jessica L; Orandi, Shahram; Staymates, Matthew E; Gillen, Greg
This paper describes a method for combining direct chemical analysis of latent fingerprints with subsequent biometric analysis within a single sample. The method described here uses ion mobility spectrometry (IMS) as a chemical detection method for explosives and narcotics trace contamination. A collection swab coated with a high-temperature adhesive has been developed to lift latent fingerprints from various surfaces. The swab is then directly inserted into an IMS instrument for a quick chemical analysis. After the IMS analysis, the lifted print remains intact for subsequent biometric scanning and analysis using matching algorithms. Several samples of explosive-laden fingerprints were successfully lifted and the explosives detected with IMS. Following explosive detection, the lifted fingerprints remained of sufficient quality for positive match scores using a prepared gallery consisting of 60 fingerprints. Based on our results ( n = 1200), there was no significant decrease in the quality of the lifted print post IMS analysis. In fact, for a small subset of lifted prints, the quality was improved after IMS analysis. The described method can be readily applied to domestic criminal investigations, transportation security, terrorist and bombing threats, and military in-theatre settings.
Hierarchical trie packet classification algorithm based on expectation-maximization clustering.
Bi, Xia-An; Zhao, Junxia
2017-01-01
With the development of computer network bandwidth, packet classification algorithms which are able to deal with large-scale rule sets are in urgent need. Among the existing algorithms, researches on packet classification algorithms based on hierarchical trie have become an important packet classification research branch because of their widely practical use. Although hierarchical trie is beneficial to save large storage space, it has several shortcomings such as the existence of backtracking and empty nodes. This paper proposes a new packet classification algorithm, Hierarchical Trie Algorithm Based on Expectation-Maximization Clustering (HTEMC). Firstly, this paper uses the formalization method to deal with the packet classification problem by means of mapping the rules and data packets into a two-dimensional space. Secondly, this paper uses expectation-maximization algorithm to cluster the rules based on their aggregate characteristics, and thereby diversified clusters are formed. Thirdly, this paper proposes a hierarchical trie based on the results of expectation-maximization clustering. Finally, this paper respectively conducts simulation experiments and real-environment experiments to compare the performances of our algorithm with other typical algorithms, and analyzes the results of the experiments. The hierarchical trie structure in our algorithm not only adopts trie path compression to eliminate backtracking, but also solves the problem of low efficiency of trie updates, which greatly improves the performance of the algorithm.
Wavelet/scalar quantization compression standard for fingerprint images
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brislawn, C.M.
1996-06-12
US Federal Bureau of Investigation (FBI) has recently formulated a national standard for digitization and compression of gray-scale fingerprint images. Fingerprints are scanned at a spatial resolution of 500 dots per inch, with 8 bits of gray-scale resolution. The compression algorithm for the resulting digital images is based on adaptive uniform scalar quantization of a discrete wavelet transform subband decomposition (wavelet/scalar quantization method). The FBI standard produces archival-quality images at compression ratios of around 15 to 1 and will allow the current database of paper fingerprint cards to be replaced by digital imagery. The compression standard specifies a class ofmore » potential encoders and a universal decoder with sufficient generality to reconstruct compressed images produced by any compliant encoder, allowing flexibility for future improvements in encoder technology. A compliance testing program is also being implemented to ensure high standards of image quality and interchangeability of data between different implementations.« less
NASA Astrophysics Data System (ADS)
Sindt, Nathan M.; Robison, Faith; Brick, Mark A.; Schwartz, Howard F.; Heuberger, Adam L.; Prenni, Jessica E.
2018-02-01
Matrix-assisted desorption/ionization time of flight mass spectrometry (MALDI-TOF-MS) is a fast and effective tool for microbial species identification. However, current approaches are limited to species-level identification even when genetic differences are known. Here, we present a novel workflow that applies the statistical method of partial least squares discriminant analysis (PLS-DA) to MALDI-TOF-MS protein fingerprint data of Xanthomonas axonopodis, an important bacterial plant pathogen of fruit and vegetable crops. Mass spectra of 32 X. axonopodis strains were used to create a mass spectral library and PLS-DA was employed to model the closely related strains. A robust workflow was designed to optimize the PLS-DA model by assessing the model performance over a range of signal-to-noise ratios (s/n) and mass filter (MF) thresholds. The optimized parameters were observed to be s/n = 3 and MF = 0.7. The model correctly classified 83% of spectra withheld from the model as a test set. A new decision rule was developed, termed the rolled-up Maximum Decision Rule (ruMDR), and this method improved identification rates to 92%. These results demonstrate that MALDI-TOF-MS protein fingerprints of bacterial isolates can be utilized to enable identification at the strain level. Furthermore, the open-source framework of this workflow allows for broad implementation across various instrument platforms as well as integration with alternative modeling and classification algorithms.
Mothers Consistently Alter Their Unique Vocal Fingerprints When Communicating with Infants.
Piazza, Elise A; Iordan, Marius Cătălin; Lew-Williams, Casey
2017-10-23
The voice is the most direct link we have to others' minds, allowing us to communicate using a rich variety of speech cues [1, 2]. This link is particularly critical early in life as parents draw infants into the structure of their environment using infant-directed speech (IDS), a communicative code with unique pitch and rhythmic characteristics relative to adult-directed speech (ADS) [3, 4]. To begin breaking into language, infants must discern subtle statistical differences about people and voices in order to direct their attention toward the most relevant signals. Here, we uncover a new defining feature of IDS: mothers significantly alter statistical properties of vocal timbre when speaking to their infants. Timbre, the tone color or unique quality of a sound, is a spectral fingerprint that helps us instantly identify and classify sound sources, such as individual people and musical instruments [5-7]. We recorded 24 mothers' naturalistic speech while they interacted with their infants and with adult experimenters in their native language. Half of the participants were English speakers, and half were not. Using a support vector machine classifier, we found that mothers consistently shifted their timbre between ADS and IDS. Importantly, this shift was similar across languages, suggesting that such alterations of timbre may be universal. These findings have theoretical implications for understanding how infants tune in to their local communicative environments. Moreover, our classification algorithm for identifying infant-directed timbre has direct translational implications for speech recognition technology. Copyright © 2017 Elsevier Ltd. All rights reserved.
Xu, Ning; Zhou, Guofu; Li, Xiaojuan; Lu, Heng; Meng, Fanyun; Zhai, Huaqiang
2017-05-01
A reliable and comprehensive method for identifying the origin and assessing the quality of Epimedium has been developed. The method is based on analysis of HPLC fingerprints, combined with similarity analysis, hierarchical cluster analysis (HCA), principal component analysis (PCA) and multi-ingredient quantitative analysis. Nineteen batches of Epimedium, collected from different areas in the western regions of China, were used to establish the fingerprints and 18 peaks were selected for the analysis. Similarity analysis, HCA and PCA all classified the 19 areas into three groups. Simultaneous quantification of the five major bioactive ingredients in the Epimedium samples was also carried out to confirm the consistency of the quality tests. These methods were successfully used to identify the geographical origin of the Epimedium samples and to evaluate their quality. Copyright © 2016 John Wiley & Sons, Ltd.
Georgouli, Konstantia; Martinez Del Rincon, Jesus; Koidis, Anastasios
2017-02-15
The main objective of this work was to develop a novel dimensionality reduction technique as a part of an integrated pattern recognition solution capable of identifying adulterants such as hazelnut oil in extra virgin olive oil at low percentages based on spectroscopic chemical fingerprints. A novel Continuous Locality Preserving Projections (CLPP) technique is proposed which allows the modelling of the continuous nature of the produced in-house admixtures as data series instead of discrete points. The maintenance of the continuous structure of the data manifold enables the better visualisation of this examined classification problem and facilitates the more accurate utilisation of the manifold for detecting the adulterants. The performance of the proposed technique is validated with two different spectroscopic techniques (Raman and Fourier transform infrared, FT-IR). In all cases studied, CLPP accompanied by k-Nearest Neighbors (kNN) algorithm was found to outperform any other state-of-the-art pattern recognition techniques. Copyright © 2016 Elsevier Ltd. All rights reserved.
Machine Learning Algorithms for Automatic Classification of Marmoset Vocalizations
Ribeiro, Sidarta; Pereira, Danillo R.; Papa, João P.; de Albuquerque, Victor Hugo C.
2016-01-01
Automatic classification of vocalization type could potentially become a useful tool for acoustic the monitoring of captive colonies of highly vocal primates. However, for classification to be useful in practice, a reliable algorithm that can be successfully trained on small datasets is necessary. In this work, we consider seven different classification algorithms with the goal of finding a robust classifier that can be successfully trained on small datasets. We found good classification performance (accuracy > 0.83 and F1-score > 0.84) using the Optimum Path Forest classifier. Dataset and algorithms are made publicly available. PMID:27654941
Ihmaid, Saleh K; Ahmed, Hany E A; Zayed, Mohamed F; Abadleh, Mohammed M
2016-01-30
The main step in a successful drug discovery pipeline is the identification of small potent compounds that selectively bind to the target of interest with high affinity. However, there is still a shortage of efficient and accurate computational methods with powerful capability to study and hence predict compound selectivity properties. In this work, we propose an affordable machine learning method to perform compound selectivity classification and prediction. For this purpose, we have collected compounds with reported activity and built a selectivity database formed of 153 cathepsin K and S inhibitors that are considered of medicinal interest. This database has three compound sets, two K/S and S/K selective ones and one non-selective KS one. We have subjected this database to the selectivity classification tool 'Emergent Self-Organizing Maps' for exploring its capability to differentiate selective cathepsin inhibitors for one target over the other. The method exhibited good clustering performance for selective ligands with high accuracy (up to 100 %). Among the possibilites, BAPs and MACCS molecular structural fingerprints were used for such a classification. The results exhibited the ability of the method for structure-selectivity relationship interpretation and selectivity markers were identified for the design of further novel inhibitors with high activity and target selectivity.
Improving oil classification quality from oil spill fingerprint beyond six sigma approach.
Juahir, Hafizan; Ismail, Azimah; Mohamed, Saiful Bahri; Toriman, Mohd Ekhwan; Kassim, Azlina Md; Zain, Sharifuddin Md; Ahmad, Wan Kamaruzaman Wan; Wah, Wong Kok; Zali, Munirah Abdul; Retnam, Ananthy; Taib, Mohd Zaki Mohd; Mokhtar, Mazlin
2017-07-15
This study involves the use of quality engineering in oil spill classification based on oil spill fingerprinting from GC-FID and GC-MS employing the six-sigma approach. The oil spills are recovered from various water areas of Peninsular Malaysia and Sabah (East Malaysia). The study approach used six sigma methodologies that effectively serve as the problem solving in oil classification extracted from the complex mixtures of oil spilled dataset. The analysis of six sigma link with the quality engineering improved the organizational performance to achieve its objectivity of the environmental forensics. The study reveals that oil spills are discriminated into four groups' viz. diesel, hydrocarbon fuel oil (HFO), mixture oil lubricant and fuel oil (MOLFO) and waste oil (WO) according to the similarity of the intrinsic chemical properties. Through the validation, it confirmed that four discriminant component, diesel, hydrocarbon fuel oil (HFO), mixture oil lubricant and fuel oil (MOLFO) and waste oil (WO) dominate the oil types with a total variance of 99.51% with ANOVA giving F stat >F critical at 95% confidence level and a Chi Square goodness test of 74.87. Results obtained from this study reveals that by employing six-sigma approach in a data-driven problem such as in the case of oil spill classification, good decision making can be expedited. Copyright © 2017. Published by Elsevier Ltd.
SVD compression for magnetic resonance fingerprinting in the time domain.
McGivney, Debra F; Pierre, Eric; Ma, Dan; Jiang, Yun; Saybasili, Haris; Gulani, Vikas; Griswold, Mark A
2014-12-01
Magnetic resonance (MR) fingerprinting is a technique for acquiring and processing MR data that simultaneously provides quantitative maps of different tissue parameters through a pattern recognition algorithm. A predefined dictionary models the possible signal evolutions simulated using the Bloch equations with different combinations of various MR parameters and pattern recognition is completed by computing the inner product between the observed signal and each of the predicted signals within the dictionary. Though this matching algorithm has been shown to accurately predict the MR parameters of interest, one desires a more efficient method to obtain the quantitative images. We propose to compress the dictionary using the singular value decomposition, which will provide a low-rank approximation. By compressing the size of the dictionary in the time domain, we are able to speed up the pattern recognition algorithm, by a factor of between 3.4-4.8, without sacrificing the high signal-to-noise ratio of the original scheme presented previously.
SVD Compression for Magnetic Resonance Fingerprinting in the Time Domain
McGivney, Debra F.; Pierre, Eric; Ma, Dan; Jiang, Yun; Saybasili, Haris; Gulani, Vikas; Griswold, Mark A.
2016-01-01
Magnetic resonance fingerprinting is a technique for acquiring and processing MR data that simultaneously provides quantitative maps of different tissue parameters through a pattern recognition algorithm. A predefined dictionary models the possible signal evolutions simulated using the Bloch equations with different combinations of various MR parameters and pattern recognition is completed by computing the inner product between the observed signal and each of the predicted signals within the dictionary. Though this matching algorithm has been shown to accurately predict the MR parameters of interest, one desires a more efficient method to obtain the quantitative images. We propose to compress the dictionary using the singular value decomposition (SVD), which will provide a low-rank approximation. By compressing the size of the dictionary in the time domain, we are able to speed up the pattern recognition algorithm, by a factor of between 3.4-4.8, without sacrificing the high signal-to-noise ratio of the original scheme presented previously. PMID:25029380
An Indoor Positioning System Based on Wearables for Ambient-Assisted Living.
Belmonte-Fernández, Óscar; Puertas-Cabedo, Adrian; Torres-Sospedra, Joaquín; Montoliu-Colás, Raúl; Trilles-Oliver, Sergi
2016-12-25
The urban population is growing at such a rate that by 2050 it is estimated that 84% of the world's population will live in cities, with flats being the most common living place. Moreover, WiFi technology is present in most developed country urban areas, with a quick growth in developing countries. New Ambient-Assisted Living applications will be developed in the near future having user positioning as ground technology: elderly tele-care, energy consumption, security and the like are strongly based on indoor positioning information. We present an Indoor Positioning System for wearable devices based on WiFi fingerprinting. Smart-watch wearable devices are used to acquire the WiFi strength signals of the surrounding Wireless Access Points used to build an ensemble of Machine Learning classification algorithms. Once built, the ensemble algorithm is used to locate a user based on the WiFi strength signals provided by the wearable device. Experimental results for five different urban flats are reported, showing that the system is robust and reliable enough for locating a user at room level into his/her home. Another interesting characteristic of the presented system is that it does not require deployment of any infrastructure, and it is unobtrusive, the only device required for it to work is a smart-watch.
An Indoor Positioning System Based on Wearables for Ambient-Assisted Living
Belmonte-Fernández, Óscar; Puertas-Cabedo, Adrian; Torres-Sospedra, Joaquín; Montoliu-Colás, Raúl; Trilles-Oliver, Sergi
2016-01-01
The urban population is growing at such a rate that by 2050 it is estimated that 84% of the world’s population will live in cities, with flats being the most common living place. Moreover, WiFi technology is present in most developed country urban areas, with a quick growth in developing countries. New Ambient-Assisted Living applications will be developed in the near future having user positioning as ground technology: elderly tele-care, energy consumption, security and the like are strongly based on indoor positioning information. We present an indoor positioning system for wearable devices based on WiFi fingerprinting. Smart-watch wearable devices are used to acquire the WiFi strength signals of the surrounding Wireless Access Points used to build an ensemble of Machine Learning classification algorithms. Once built, the ensemble algorithm is used to locate a user based on the WiFi strength signals provided by the wearable device. Experimental results for five different urban flats are reported, showing that the system is robust and reliable enough for locating a user at room level into his/her home. Another interesting characteristic of the presented system is that it does not require deployment of any infrastructure, and it is unobtrusive, the only device required for it to work is a smart-watch. PMID:28029142
Contactless optical scanning of fingerprints with 180 degrees view.
Palma, J; Liessner, C; Mil'shtein, S
2006-01-01
Fingerprint recognition technology is an integral part of criminal investigations. It is the basis for the design of numerous security systems in both the private and public sectors. In a recent study emulating the fingerprinting procedure with widely used optical scanners, it was found that, on average, the distance between ridges decreases about 20% when a finger is positioned on a scanner. Using calibrated silicon pressure sensors, the authors scanned the distribution of pressure across a finger, pixel by pixel, and also generated maps of the average pressure distribution during fingerprinting. Controlled loading of a finger demonstrated that it is impossible to reproduce the same distribution of pressure across a given finger during repeated fingerprinting procedures. Based on this study, a novel method of scanning the fingerprint with more than a 180 degrees view was developed. Using a camera rotated around the finger, small slices of the entire image of the finger were acquired. Equal sized slices of the image were processed with a special program assembling a more than 180 degrees view of the finger. Comparison of two images of the same fingerprint, namely the registered and actual images, could be performed by a new algorithm based on the symmetry of the correlation function. The novel method is the first contactless optical scanning technique to view 180 degrees of a fingerprint without moving the finger. In a machine which is under design, it is expected that the full view of one finger would be acquired in about a second.
Hierarchical trie packet classification algorithm based on expectation-maximization clustering
Bi, Xia-an; Zhao, Junxia
2017-01-01
With the development of computer network bandwidth, packet classification algorithms which are able to deal with large-scale rule sets are in urgent need. Among the existing algorithms, researches on packet classification algorithms based on hierarchical trie have become an important packet classification research branch because of their widely practical use. Although hierarchical trie is beneficial to save large storage space, it has several shortcomings such as the existence of backtracking and empty nodes. This paper proposes a new packet classification algorithm, Hierarchical Trie Algorithm Based on Expectation-Maximization Clustering (HTEMC). Firstly, this paper uses the formalization method to deal with the packet classification problem by means of mapping the rules and data packets into a two-dimensional space. Secondly, this paper uses expectation-maximization algorithm to cluster the rules based on their aggregate characteristics, and thereby diversified clusters are formed. Thirdly, this paper proposes a hierarchical trie based on the results of expectation-maximization clustering. Finally, this paper respectively conducts simulation experiments and real-environment experiments to compare the performances of our algorithm with other typical algorithms, and analyzes the results of the experiments. The hierarchical trie structure in our algorithm not only adopts trie path compression to eliminate backtracking, but also solves the problem of low efficiency of trie updates, which greatly improves the performance of the algorithm. PMID:28704476
The generalization ability of online SVM classification based on Markov sampling.
Xu, Jie; Yan Tang, Yuan; Zou, Bin; Xu, Zongben; Li, Luoqing; Lu, Yang
2015-03-01
In this paper, we consider online support vector machine (SVM) classification learning algorithms with uniformly ergodic Markov chain (u.e.M.c.) samples. We establish the bound on the misclassification error of an online SVM classification algorithm with u.e.M.c. samples based on reproducing kernel Hilbert spaces and obtain a satisfactory convergence rate. We also introduce a novel online SVM classification algorithm based on Markov sampling, and present the numerical studies on the learning ability of online SVM classification based on Markov sampling for benchmark repository. The numerical studies show that the learning performance of the online SVM classification algorithm based on Markov sampling is better than that of classical online SVM classification based on random sampling as the size of training samples is larger.
Robust spike classification based on frequency domain neural waveform features.
Yang, Chenhui; Yuan, Yuan; Si, Jennie
2013-12-01
We introduce a new spike classification algorithm based on frequency domain features of the spike snippets. The goal for the algorithm is to provide high classification accuracy, low false misclassification, ease of implementation, robustness to signal degradation, and objectivity in classification outcomes. In this paper, we propose a spike classification algorithm based on frequency domain features (CFDF). It makes use of frequency domain contents of the recorded neural waveforms for spike classification. The self-organizing map (SOM) is used as a tool to determine the cluster number intuitively and directly by viewing the SOM output map. After that, spike classification can be easily performed using clustering algorithms such as the k-Means. In conjunction with our previously developed multiscale correlation of wavelet coefficient (MCWC) spike detection algorithm, we show that the MCWC and CFDF detection and classification system is robust when tested on several sets of artificial and real neural waveforms. The CFDF is comparable to or outperforms some popular automatic spike classification algorithms with artificial and real neural data. The detection and classification of neural action potentials or neural spikes is an important step in single-unit-based neuroscientific studies and applications. After the detection of neural snippets potentially containing neural spikes, a robust classification algorithm is applied for the analysis of the snippets to (1) extract similar waveforms into one class for them to be considered coming from one unit, and to (2) remove noise snippets if they do not contain any features of an action potential. Usually, a snippet is a small 2 or 3 ms segment of the recorded waveform, and differences in neural action potentials can be subtle from one unit to another. Therefore, a robust, high performance classification system like the CFDF is necessary. In addition, the proposed algorithm does not require any assumptions on statistical properties of the noise and proves to be robust under noise contamination.
Voice based gender classification using machine learning
NASA Astrophysics Data System (ADS)
Raahul, A.; Sapthagiri, R.; Pankaj, K.; Vijayarajan, V.
2017-11-01
Gender identification is one of the major problem speech analysis today. Tracing the gender from acoustic data i.e., pitch, median, frequency etc. Machine learning gives promising results for classification problem in all the research domains. There are several performance metrics to evaluate algorithms of an area. Our Comparative model algorithm for evaluating 5 different machine learning algorithms based on eight different metrics in gender classification from acoustic data. Agenda is to identify gender, with five different algorithms: Linear Discriminant Analysis (LDA), K-Nearest Neighbour (KNN), Classification and Regression Trees (CART), Random Forest (RF), and Support Vector Machine (SVM) on basis of eight different metrics. The main parameter in evaluating any algorithms is its performance. Misclassification rate must be less in classification problems, which says that the accuracy rate must be high. Location and gender of the person have become very crucial in economic markets in the form of AdSense. Here with this comparative model algorithm, we are trying to assess the different ML algorithms and find the best fit for gender classification of acoustic data.
A semi-supervised classification algorithm using the TAD-derived background as training data
NASA Astrophysics Data System (ADS)
Fan, Lei; Ambeau, Brittany; Messinger, David W.
2013-05-01
In general, spectral image classification algorithms fall into one of two categories: supervised and unsupervised. In unsupervised approaches, the algorithm automatically identifies clusters in the data without a priori information about those clusters (except perhaps the expected number of them). Supervised approaches require an analyst to identify training data to learn the characteristics of the clusters such that they can then classify all other pixels into one of the pre-defined groups. The classification algorithm presented here is a semi-supervised approach based on the Topological Anomaly Detection (TAD) algorithm. The TAD algorithm defines background components based on a mutual k-Nearest Neighbor graph model of the data, along with a spectral connected components analysis. Here, the largest components produced by TAD are used as regions of interest (ROI's),or training data for a supervised classification scheme. By combining those ROI's with a Gaussian Maximum Likelihood (GML) or a Minimum Distance to the Mean (MDM) algorithm, we are able to achieve a semi supervised classification method. We test this classification algorithm against data collected by the HyMAP sensor over the Cooke City, MT area and University of Pavia scene.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bolme, David S; Tokola, Ryan A; Boehnen, Chris Bensing
Automatic recognition systems are a valuable tool for identifying unknown deceased individuals. Immediately af- ter death fingerprint and face biometric samples are easy to collect using standard sensors and cameras and can be easily matched to anti-mortem biometric samples. Even though post-mortem fingerprints and faces have been used for decades, there are no studies that track these biomet- rics through the later stages of decomposition to determine the length of time the biometrics remain viable. This paper discusses a multimodal dataset of fingerprints, faces, and irises from 14 human cadavers that decomposed outdoors under natural conditions. Results include predictive modelsmore » relating time and temperature, measured as Accumulated Degree Days (ADD), and season (winter, spring, summer) to the predicted probably of automatic verification using a commercial algorithm.« less
Minutia Tensor Matrix: A New Strategy for Fingerprint Matching
Fu, Xiang; Feng, Jufu
2015-01-01
Establishing correspondences between two minutia sets is a fundamental issue in fingerprint recognition. This paper proposes a new tensor matching strategy. First, the concept of minutia tensor matrix (simplified as MTM) is proposed. It describes the first-order features and second-order features of a matching pair. In the MTM, the diagonal elements indicate similarities of minutia pairs and non-diagonal elements indicate pairwise compatibilities between minutia pairs. Correct minutia pairs are likely to establish both large similarities and large compatibilities, so they form a dense sub-block. Minutia matching is then formulated as recovering the dense sub-block in the MTM. This is a new tensor matching strategy for fingerprint recognition. Second, as fingerprint images show both local rigidity and global nonlinearity, we design two different kinds of MTMs: local MTM and global MTM. Meanwhile, a two-level matching algorithm is proposed. For local matching level, the local MTM is constructed and a novel local similarity calculation strategy is proposed. It makes full use of local rigidity in fingerprints. For global matching level, the global MTM is constructed to calculate similarities of entire minutia sets. It makes full use of global compatibility in fingerprints. Proposed method has stronger description ability and better robustness to noise and nonlinearity. Experiments conducted on Fingerprint Verification Competition databases (FVC2002 and FVC2004) demonstrate the effectiveness and the efficiency. PMID:25822489
DOE Office of Scientific and Technical Information (OSTI.GOV)
Buckner, Mark A; Bobrek, Miljko; Farquhar, Ethan
Wireless Access Points (WAP) remain one of the top 10 network security threats. This research is part of an effort to develop a physical (PHY) layer aware Radio Frequency (RF) air monitoring system with multi-factor authentication to provide a first-line of defense for network security--stopping attackers before they can gain access to critical infrastructure networks through vulnerable WAPs. This paper presents early results on the identification of OFDM-based 802.11a WiFi devices using RF Distinct Native Attribute (RF-DNA) fingerprints produced by the Fractional Fourier Transform (FRFT). These fingerprints are input to a "Learning from Signals" (LFS) classifier which uses hybrid Differentialmore » Evolution/Conjugate Gradient (DECG) optimization to determine the optimal features for a low-rank model to be used for future predictions. Results are presented for devices under the most challenging conditions of intra-manufacturer classification, i.e., same-manufacturer, same-model, differing only in serial number. The results of Fractional Fourier Domain (FRFD) RF-DNA fingerprints demonstrate significant improvement over results based on Time Domain (TD), Spectral Domain (SD) and even Wavelet Domain (WD) fingerprints.« less
Lu, Huijuan; Wei, Shasha; Zhou, Zili; Miao, Yanzi; Lu, Yi
2015-01-01
The main purpose of traditional classification algorithms on bioinformatics application is to acquire better classification accuracy. However, these algorithms cannot meet the requirement that minimises the average misclassification cost. In this paper, a new algorithm of cost-sensitive regularised extreme learning machine (CS-RELM) was proposed by using probability estimation and misclassification cost to reconstruct the classification results. By improving the classification accuracy of a group of small sample which higher misclassification cost, the new CS-RELM can minimise the classification cost. The 'rejection cost' was integrated into CS-RELM algorithm to further reduce the average misclassification cost. By using Colon Tumour dataset and SRBCT (Small Round Blue Cells Tumour) dataset, CS-RELM was compared with other cost-sensitive algorithms such as extreme learning machine (ELM), cost-sensitive extreme learning machine, regularised extreme learning machine, cost-sensitive support vector machine (SVM). The results of experiments show that CS-RELM with embedded rejection cost could reduce the average cost of misclassification and made more credible classification decision than others.
Law, Wai Siang; Chen, Huan Wen; Balabin, Roman; Berchtold, Christian; Meier, Lukas; Zenobi, Renato
2010-04-01
Microjet sampling in combination with extractive electrospray ionization (EESI) mass spectrometry (MS) was applied to the rapid characterization and classification of extra virgin olive oil (EVOO) without any sample pretreatment. When modifying the composition of the primary ESI spray solvent, mass spectra of an identical EVOO sample showed differences. This demonstrates the capability of this technique to extract molecules with varying polarities, hence generating rich molecular information of the EVOO. Moreover, with the aid of microjet sampling, compounds of different volatilities (e.g.E-2-hexenal, trans-trans-2,4-heptadienal, tyrosol and caffeic acid) could be sampled simultaneously. EVOO data was also compared with that of other edible oils. Principal Component Analysis (PCA) was performed to discriminate EVOO and EVOO adulterated with edible oils. Microjet sampling EESI-MS was found to be a simple, rapid (less than 2 min analysis time per sample) and powerful method to obtain MS fingerprints of EVOO without requiring any complicated sample pretreatment steps.
Sexual dimorphism in finger ridge breadth measurements: a tool for sex estimation from fingerprints.
Mundorff, Amy Z; Bartelink, Eric J; Murad, Turhon A
2014-07-01
Previous research has demonstrated significant sexual dimorphism in friction ridge skin characteristics. This study uses a novel method for measuring sexual dimorphism in finger ridge breadths to evaluate its utility as a sex estimation method from an unknown fingerprint. Beginning and ending in a valley, the width of ten parallel ridges with no obstructions or minutia was measured in a sample of 250 males and females (N = 500). The results demonstrate statistically significant differences in ridge breadth between males and females (p < 0.001), with classification accuracy for each digit varying from 83.2% to 89.3%. Classification accuracy for the pooled finger samples was 83.9% for the right hand and 86.2% for the left hand, which is applicable for cases where the digit number cannot be determined. Weight, stature, and to a lesser degree body mass index also significantly correlate with ridge breadth and account for the degree of overlap between males and females. © 2014 American Academy of Forensic Sciences.
Gao, Xin; Yang, Xiu-Wei; Marriott, Philip J
2013-11-01
Coptidis Rhizoma-Euodiae Fructus couple (CEC) is a classic traditional Chinese medicine preparation consisting of Coptidis Rhizoma and Euodiae Fructus at the ratio of 6:1, and used to treat gastro-intestinal disorders. Alkaloids are the main bioactive component. This research provides comprehensive analysis information for the quality control of CEC. To develop a high-performance liquid chromatography-diode array detection fingerprint for chemical composition characteristics of CEC and its products. The samples were separated with a Gemini C18 column by using gradient elution with water-formic acid (100:0.03) and acetonitrile as mobile phase. Flow rate was 1.0 mL/min and detection wavelength was 250 nm. Similarity analysis and principal component analysis (PCA) were employed to evaluate quality consistencies of analytes. Mean chromatograms and correlation coefficients of analytes were calculated by the software "Similarity Evaluation System for Chromatographic Fingerprint of Traditional Chinese Medicine". Fingerprint chromatogram comparison determined 20 representative general fingerprint peaks, and the fingerprint chromatogram resemblances are all better than 0.988. Consistent results were obtained to show that CEC and its related samples could be successfully divided into three groups. Contribution plots generated by PCA were performed to interpret differences among the sample groups while peaks which significantly contributed to classification were identified. Seven bioactive constituents in the samples were verified by quantitative analysis. The chromatographic fingerprint with similarity evaluation and PCA assay combined with quantification of seven compounds could be utilized as a quality control method for the herbal couple.
A spectrum fractal feature classification algorithm for agriculture crops with hyper spectrum image
NASA Astrophysics Data System (ADS)
Su, Junying
2011-11-01
A fractal dimension feature analysis method in spectrum domain for hyper spectrum image is proposed for agriculture crops classification. Firstly, a fractal dimension calculation algorithm in spectrum domain is presented together with the fast fractal dimension value calculation algorithm using the step measurement method. Secondly, the hyper spectrum image classification algorithm and flowchart is presented based on fractal dimension feature analysis in spectrum domain. Finally, the experiment result of the agricultural crops classification with FCL1 hyper spectrum image set with the proposed method and SAM (spectral angle mapper). The experiment results show it can obtain better classification result than the traditional SAM feature analysis which can fulfill use the spectrum information of hyper spectrum image to realize precision agricultural crops classification.
Image-classification-based global dimming algorithm for LED backlights in LCDs
NASA Astrophysics Data System (ADS)
Qibin, Feng; Huijie, He; Dong, Han; Lei, Zhang; Guoqiang, Lv
2015-07-01
Backlight dimming can help LCDs reduce power consumption and improve CR. With fixed parameters, dimming algorithm cannot achieve satisfied effects for all kinds of images. The paper introduces an image-classification-based global dimming algorithm. The proposed classification method especially for backlight dimming is based on luminance and CR of input images. The parameters for backlight dimming level and pixel compensation are adaptive with image classifications. The simulation results show that the classification based dimming algorithm presents 86.13% power reduction improvement compared with dimming without classification, with almost same display quality. The prototype is developed. There are no perceived distortions when playing videos. The practical average power reduction of the prototype TV is 18.72%, compared with common TV without dimming.
An Efficient Optimization Method for Solving Unsupervised Data Classification Problems.
Shabanzadeh, Parvaneh; Yusof, Rubiyah
2015-01-01
Unsupervised data classification (or clustering) analysis is one of the most useful tools and a descriptive task in data mining that seeks to classify homogeneous groups of objects based on similarity and is used in many medical disciplines and various applications. In general, there is no single algorithm that is suitable for all types of data, conditions, and applications. Each algorithm has its own advantages, limitations, and deficiencies. Hence, research for novel and effective approaches for unsupervised data classification is still active. In this paper a heuristic algorithm, Biogeography-Based Optimization (BBO) algorithm, was adapted for data clustering problems by modifying the main operators of BBO algorithm, which is inspired from the natural biogeography distribution of different species. Similar to other population-based algorithms, BBO algorithm starts with an initial population of candidate solutions to an optimization problem and an objective function that is calculated for them. To evaluate the performance of the proposed algorithm assessment was carried on six medical and real life datasets and was compared with eight well known and recent unsupervised data classification algorithms. Numerical results demonstrate that the proposed evolutionary optimization algorithm is efficient for unsupervised data classification.
Gradient Evolution-based Support Vector Machine Algorithm for Classification
NASA Astrophysics Data System (ADS)
Zulvia, Ferani E.; Kuo, R. J.
2018-03-01
This paper proposes a classification algorithm based on a support vector machine (SVM) and gradient evolution (GE) algorithms. SVM algorithm has been widely used in classification. However, its result is significantly influenced by the parameters. Therefore, this paper aims to propose an improvement of SVM algorithm which can find the best SVMs’ parameters automatically. The proposed algorithm employs a GE algorithm to automatically determine the SVMs’ parameters. The GE algorithm takes a role as a global optimizer in finding the best parameter which will be used by SVM algorithm. The proposed GE-SVM algorithm is verified using some benchmark datasets and compared with other metaheuristic-based SVM algorithms. The experimental results show that the proposed GE-SVM algorithm obtains better results than other algorithms tested in this paper.
Comparison analysis for classification algorithm in data mining and the study of model use
NASA Astrophysics Data System (ADS)
Chen, Junde; Zhang, Defu
2018-04-01
As a key technique in data mining, classification algorithm was received extensive attention. Through an experiment of classification algorithm in UCI data set, we gave a comparison analysis method for the different algorithms and the statistical test was used here. Than that, an adaptive diagnosis model for preventive electricity stealing and leakage was given as a specific case in the paper.
Chemometric techniques in oil classification from oil spill fingerprinting.
Ismail, Azimah; Toriman, Mohd Ekhwan; Juahir, Hafizan; Kassim, Azlina Md; Zain, Sharifuddin Md; Ahmad, Wan Kamaruzaman Wan; Wong, Kok Fah; Retnam, Ananthy; Zali, Munirah Abdul; Mokhtar, Mazlin; Yusri, Mohd Ayub
2016-10-15
Extended use of GC-FID and GC-MS in oil spill fingerprinting and matching is significantly important for oil classification from the oil spill sources collected from various areas of Peninsular Malaysia and Sabah (East Malaysia). Oil spill fingerprinting from GC-FID and GC-MS coupled with chemometric techniques (discriminant analysis and principal component analysis) is used as a diagnostic tool to classify the types of oil polluting the water. Clustering and discrimination of oil spill compounds in the water from the actual site of oil spill events are divided into four groups viz. diesel, Heavy Fuel Oil (HFO), Mixture Oil containing Light Fuel Oil (MOLFO) and Waste Oil (WO) according to the similarity of their intrinsic chemical properties. Principal component analysis (PCA) demonstrates that diesel, HFO, MOLFO and WO are types of oil or oil products from complex oil mixtures with a total variance of 85.34% and are identified with various anthropogenic activities related to either intentional releasing of oil or accidental discharge of oil into the environment. Our results show that the use of chemometric techniques is significant in providing independent validation for classifying the types of spilled oil in the investigation of oil spill pollution in Malaysia. This, in consequence would result in cost and time saving in identification of the oil spill sources. Copyright © 2016. Published by Elsevier Ltd.
NASA Astrophysics Data System (ADS)
Ciany, Charles M.; Zurawski, William; Kerfoot, Ian
2001-10-01
The performance of Computer Aided Detection/Computer Aided Classification (CAD/CAC) Fusion algorithms on side-scan sonar images was evaluated using data taken at the Navy's's Fleet Battle Exercise-Hotel held in Panama City, Florida, in August 2000. A 2-of-3 binary fusion algorithm is shown to provide robust performance. The algorithm accepts the classification decisions and associated contact locations form three different CAD/CAC algorithms, clusters the contacts based on Euclidian distance, and then declares a valid target when a clustered contact is declared by at least 2 of the 3 individual algorithms. This simple binary fusion provided a 96 percent probability of correct classification at a false alarm rate of 0.14 false alarms per image per side. The performance represented a 3.8:1 reduction in false alarms over the best performing single CAD/CAC algorithm, with no loss in probability of correct classification.
Contributions to "k"-Means Clustering and Regression via Classification Algorithms
ERIC Educational Resources Information Center
Salman, Raied
2012-01-01
The dissertation deals with clustering algorithms and transforming regression problems into classification problems. The main contributions of the dissertation are twofold; first, to improve (speed up) the clustering algorithms and second, to develop a strict learning environment for solving regression problems as classification tasks by using…
GPU based cloud system for high-performance arrhythmia detection with parallel k-NN algorithm.
Tae Joon Jun; Hyun Ji Park; Hyuk Yoo; Young-Hak Kim; Daeyoung Kim
2016-08-01
In this paper, we propose an GPU based Cloud system for high-performance arrhythmia detection. Pan-Tompkins algorithm is used for QRS detection and we optimized beat classification algorithm with K-Nearest Neighbor (K-NN). To support high performance beat classification on the system, we parallelized beat classification algorithm with CUDA to execute the algorithm on virtualized GPU devices on the Cloud system. MIT-BIH Arrhythmia database is used for validation of the algorithm. The system achieved about 93.5% of detection rate which is comparable to previous researches while our algorithm shows 2.5 times faster execution time compared to CPU only detection algorithm.
The biometric recognition on contactless multi-spectrum finger images
NASA Astrophysics Data System (ADS)
Kang, Wenxiong; Chen, Xiaopeng; Wu, Qiuxia
2015-01-01
This paper presents a novel multimodal biometric system based on contactless multi-spectrum finger images, which aims to deal with the limitations of unimodal biometrics. The chief merits of the system are the richness of the permissible texture and the ease of data access. We constructed a multi-spectrum instrument to simultaneously acquire three different types of biometrics from a finger: contactless fingerprint, finger vein, and knuckleprint. On the basis of the samples with these characteristics, a moderate database was built for the evaluation of our system. Considering the real-time requirements and the respective characteristics of the three biometrics, the block local binary patterns algorithm was used to extract features and match for the fingerprints and finger veins, while the Oriented FAST and Rotated BRIEF algorithm was applied for knuckleprints. Finally, score-level fusion was performed on the matching results from the aforementioned three types of biometrics. The experiments showed that our proposed multimodal biometric recognition system achieves an equal error rate of 0.109%, which is 88.9%, 94.6%, and 89.7% lower than the individual fingerprint, knuckleprint, and finger vein recognitions, respectively. Nevertheless, our proposed system also satisfies the real-time requirements of the applications.
Machine learning in materials informatics: recent applications and prospects
Ramprasad, Rampi; Batra, Rohit; Pilania, Ghanshyam; ...
2017-12-13
Propelled partly by the Materials Genome Initiative, and partly by the algorithmic developments and the resounding successes of data-driven efforts in other domains, informatics strategies are beginning to take shape within materials science. These approaches lead to surrogate machine learning models that enable rapid predictions based purely on past data rather than by direct experimentation or by computations/simulations in which fundamental equations are explicitly solved. Data-centric informatics methods are becoming useful to determine material properties that are hard to measure or compute using traditional methods—due to the cost, time or effort involved—but for which reliable data either already exists ormore » can be generated for at least a subset of the critical cases. Predictions are typically interpolative, involving fingerprinting a material numerically first, and then following a mapping (established via a learning algorithm) between the fingerprint and the property of interest. Fingerprints, also referred to as “descriptors”, may be of many types and scales, as dictated by the application domain and needs. Predictions may also be extrapolative—extending into new materials spaces—provided prediction uncertainties are properly taken into account. This article attempts to provide an overview of some of the recent successful data-driven “materials informatics” strategies undertaken in the last decade, with particular emphasis on the fingerprint or descriptor choices. The review also identifies some challenges the community is facing and those that should be overcome in the near future.« less
Machine learning in materials informatics: recent applications and prospects
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ramprasad, Rampi; Batra, Rohit; Pilania, Ghanshyam
Propelled partly by the Materials Genome Initiative, and partly by the algorithmic developments and the resounding successes of data-driven efforts in other domains, informatics strategies are beginning to take shape within materials science. These approaches lead to surrogate machine learning models that enable rapid predictions based purely on past data rather than by direct experimentation or by computations/simulations in which fundamental equations are explicitly solved. Data-centric informatics methods are becoming useful to determine material properties that are hard to measure or compute using traditional methods—due to the cost, time or effort involved—but for which reliable data either already exists ormore » can be generated for at least a subset of the critical cases. Predictions are typically interpolative, involving fingerprinting a material numerically first, and then following a mapping (established via a learning algorithm) between the fingerprint and the property of interest. Fingerprints, also referred to as “descriptors”, may be of many types and scales, as dictated by the application domain and needs. Predictions may also be extrapolative—extending into new materials spaces—provided prediction uncertainties are properly taken into account. This article attempts to provide an overview of some of the recent successful data-driven “materials informatics” strategies undertaken in the last decade, with particular emphasis on the fingerprint or descriptor choices. The review also identifies some challenges the community is facing and those that should be overcome in the near future.« less
A Modified Decision Tree Algorithm Based on Genetic Algorithm for Mobile User Classification Problem
Liu, Dong-sheng; Fan, Shu-jiang
2014-01-01
In order to offer mobile customers better service, we should classify the mobile user firstly. Aimed at the limitations of previous classification methods, this paper puts forward a modified decision tree algorithm for mobile user classification, which introduced genetic algorithm to optimize the results of the decision tree algorithm. We also take the context information as a classification attributes for the mobile user and we classify the context into public context and private context classes. Then we analyze the processes and operators of the algorithm. At last, we make an experiment on the mobile user with the algorithm, we can classify the mobile user into Basic service user, E-service user, Plus service user, and Total service user classes and we can also get some rules about the mobile user. Compared to C4.5 decision tree algorithm and SVM algorithm, the algorithm we proposed in this paper has higher accuracy and more simplicity. PMID:24688389
The use of long-chain alkylbenzenes and alkyltoluenes for fingerprinting marine oil wastes.
Albaigés, Joan; Jimenez, Núria; Arcos, Altamira; Dominguez, Carmen; Bayona, Josep M
2013-04-01
Petroleum long-chain alkylbenzenes and alkyltoluenes are characterized and used for chemical fingerprinting of marine oil spills. Their distributions, extending from C10 to C35 can be used for a general oil type classification. Moreover, the relative distributions of specific components, namely the 3-methyl and 2-methyl-1-alkylbenzenes (m- and o-isomers), and the aryl isoprenoid 1-methyl-3-phytanylbenzene, are proposed as diagnostic markers for source identification. This approach has been exemplified in two case studies involving the spill of bilge oils, where a preliminary screening of the potential source was obtained. Copyright © 2012 Elsevier Ltd. All rights reserved.
Terahertz scattering by two phased media with optically soft scatterers
NASA Astrophysics Data System (ADS)
Kaushik, Mayank; Ng, Brian W.-H.; Fischer, Bernd M.; Abbott, Derek
2012-12-01
Frequency dependent absorption by materials at distinct frequencies in the THz range is commonly used as spectral-fingerprints for identification and classification. For transmission measurements, the substance under study is often mixed with a transparent host material. Refractive index variations arising from the presence of impurities and inconsistencies in the sample's internal structure often cause the incident radiation to scatter. This can significantly distort the measured spectral-fingerprints. In this letter, we present a numerical approach to allay the scattering contribution in THz-TDS measurements, provided the sample's refractive index is known, and reveal the true absorption spectra for a given sample.
Improved classification accuracy by feature extraction using genetic algorithms
NASA Astrophysics Data System (ADS)
Patriarche, Julia; Manduca, Armando; Erickson, Bradley J.
2003-05-01
A feature extraction algorithm has been developed for the purposes of improving classification accuracy. The algorithm uses a genetic algorithm / hill-climber hybrid to generate a set of linearly recombined features, which may be of reduced dimensionality compared with the original set. The genetic algorithm performs the global exploration, and a hill climber explores local neighborhoods. Hybridizing the genetic algorithm with a hill climber improves both the rate of convergence, and the final overall cost function value; it also reduces the sensitivity of the genetic algorithm to parameter selection. The genetic algorithm includes the operators: crossover, mutation, and deletion / reactivation - the last of these effects dimensionality reduction. The feature extractor is supervised, and is capable of deriving a separate feature space for each tissue (which are reintegrated during classification). A non-anatomical digital phantom was developed as a gold standard for testing purposes. In tests with the phantom, and with images of multiple sclerosis patients, classification with feature extractor derived features yielded lower error rates than using standard pulse sequences, and with features derived using principal components analysis. Using the multiple sclerosis patient data, the algorithm resulted in a mean 31% reduction in classification error of pure tissues.
Contextual classification of multispectral image data: Approximate algorithm
NASA Technical Reports Server (NTRS)
Tilton, J. C. (Principal Investigator)
1980-01-01
An approximation to a classification algorithm incorporating spatial context information in a general, statistical manner is presented which is computationally less intensive. Classifications that are nearly as accurate are produced.
Towards a robust framework for catchment classification
NASA Astrophysics Data System (ADS)
Deshmukh, A.; Samal, A.; Singh, R.
2017-12-01
Classification of catchments based on various measures of similarity has emerged as an important technique to understand regional scale hydrologic behavior. Classification of catchment characteristics and/or streamflow response has been used reveal which characteristics are more likely to explain the observed variability of hydrologic response. However, numerous algorithms for supervised or unsupervised classification are available, making it hard to identify the algorithm most suitable for the dataset at hand. Consequently, existing catchment classification studies vary significantly in the classification algorithms employed with no previous attempt at understanding the degree of uncertainty in classification due to this algorithmic choice. This hinders the generalizability of interpretations related to hydrologic behavior. Our goal is to develop a protocol that can be followed while classifying hydrologic datasets. We focus on a classification framework for unsupervised classification and provide a step-by-step classification procedure. The steps include testing the clusterabiltiy of original dataset prior to classification, feature selection, validation of clustered data, and quantification of similarity of two clusterings. We test several commonly available methods within this framework to understand the level of similarity of classification results across algorithms. We apply the proposed framework on recently developed datasets for India to analyze to what extent catchment properties can explain observed catchment response. Our testing dataset includes watershed characteristics for over 200 watersheds which comprise of both natural (physio-climatic) characteristics and socio-economic characteristics. This framework allows us to understand the controls on observed hydrologic variability across India.
Longobardi, F; Ventrella, A; Bianco, A; Catucci, L; Cafagna, I; Gallo, V; Mastrorilli, P; Agostiano, A
2013-12-01
In this study, non-targeted (1)H NMR fingerprinting was used in combination with multivariate statistical techniques for the classification of Italian sweet cherries based on their different geographical origins (Emilia Romagna and Puglia). As classification techniques, Soft Independent Modelling of Class Analogy (SIMCA), Partial Least Squares Discriminant Analysis (PLS-DA), and Linear Discriminant Analysis (LDA) were carried out and the results were compared. For LDA, before performing a refined selection of the number/combination of variables, two different strategies for a preliminary reduction of the variable number were tested. The best average recognition and CV prediction abilities (both 100.0%) were obtained for all the LDA models, although PLS-DA also showed remarkable performances (94.6%). All the statistical models were validated by observing the prediction abilities with respect to an external set of cherry samples. The best result (94.9%) was obtained with LDA by performing a best subset selection procedure on a set of 30 principal components previously selected by a stepwise decorrelation. The metabolites that mostly contributed to the classification performances of such LDA model, were found to be malate, glucose, fructose, glutamine and succinate. Copyright © 2013 Elsevier Ltd. All rights reserved.
Flumignan, Danilo Luiz; Boralle, Nivaldo; Oliveira, José Eduardo de
2010-06-30
In this work, the combination of carbon nuclear magnetic resonance ((13)C NMR) fingerprinting with pattern-recognition analyses provides an original and alternative approach to screening commercial gasoline quality. Soft Independent Modelling of Class Analogy (SIMCA) was performed on spectroscopic fingerprints to classify representative commercial gasoline samples, which were selected by Hierarchical Cluster Analyses (HCA) over several months in retails services of gas stations, into previously quality-defined classes. Following optimized (13)C NMR-SIMCA algorithm, sensitivity values were obtained in the training set (99.0%), with leave-one-out cross-validation, and external prediction set (92.0%). Governmental laboratories could employ this method as a rapid screening analysis to discourage adulteration practices. Copyright 2010 Elsevier B.V. All rights reserved.
Cutrale, Francesco; Salih, Anya; Gratton, Enrico
2013-01-01
The phasor global analysis algorithm is common for fluorescence lifetime applications, but has only been recently proposed for spectral analysis. Here the phasor representation and fingerprinting is exploited in its second harmonic to determine the number and spectra of photo-activated states as well as their conversion dynamics. We follow the sequence of photo-activation of proteins over time by rapidly collecting multiple spectral images. The phasor representation of the cumulative images provides easy identification of the spectral signatures of each photo-activatable protein. PMID:24040513
Alshamlan, Hala M; Badr, Ghada H; Alohali, Yousef A
2015-06-01
Naturally inspired evolutionary algorithms prove effectiveness when used for solving feature selection and classification problems. Artificial Bee Colony (ABC) is a relatively new swarm intelligence method. In this paper, we propose a new hybrid gene selection method, namely Genetic Bee Colony (GBC) algorithm. The proposed algorithm combines the used of a Genetic Algorithm (GA) along with Artificial Bee Colony (ABC) algorithm. The goal is to integrate the advantages of both algorithms. The proposed algorithm is applied to a microarray gene expression profile in order to select the most predictive and informative genes for cancer classification. In order to test the accuracy performance of the proposed algorithm, extensive experiments were conducted. Three binary microarray datasets are use, which include: colon, leukemia, and lung. In addition, another three multi-class microarray datasets are used, which are: SRBCT, lymphoma, and leukemia. Results of the GBC algorithm are compared with our recently proposed technique: mRMR when combined with the Artificial Bee Colony algorithm (mRMR-ABC). We also compared the combination of mRMR with GA (mRMR-GA) and Particle Swarm Optimization (mRMR-PSO) algorithms. In addition, we compared the GBC algorithm with other related algorithms that have been recently published in the literature, using all benchmark datasets. The GBC algorithm shows superior performance as it achieved the highest classification accuracy along with the lowest average number of selected genes. This proves that the GBC algorithm is a promising approach for solving the gene selection problem in both binary and multi-class cancer classification. Copyright © 2015 Elsevier Ltd. All rights reserved.
A novel artificial immune clonal selection classification and rule mining with swarm learning model
NASA Astrophysics Data System (ADS)
Al-Sheshtawi, Khaled A.; Abdul-Kader, Hatem M.; Elsisi, Ashraf B.
2013-06-01
Metaheuristic optimisation algorithms have become popular choice for solving complex problems. By integrating Artificial Immune clonal selection algorithm (CSA) and particle swarm optimisation (PSO) algorithm, a novel hybrid Clonal Selection Classification and Rule Mining with Swarm Learning Algorithm (CS2) is proposed. The main goal of the approach is to exploit and explore the parallel computation merit of Clonal Selection and the speed and self-organisation merits of Particle Swarm by sharing information between clonal selection population and particle swarm. Hence, we employed the advantages of PSO to improve the mutation mechanism of the artificial immune CSA and to mine classification rules within datasets. Consequently, our proposed algorithm required less training time and memory cells in comparison to other AIS algorithms. In this paper, classification rule mining has been modelled as a miltiobjective optimisation problem with predictive accuracy. The multiobjective approach is intended to allow the PSO algorithm to return an approximation to the accuracy and comprehensibility border, containing solutions that are spread across the border. We compared our proposed algorithm classification accuracy CS2 with five commonly used CSAs, namely: AIRS1, AIRS2, AIRS-Parallel, CLONALG, and CSCA using eight benchmark datasets. We also compared our proposed algorithm classification accuracy CS2 with other five methods, namely: Naïve Bayes, SVM, MLP, CART, and RFB. The results show that the proposed algorithm is comparable to the 10 studied algorithms. As a result, the hybridisation, built of CSA and PSO, can develop respective merit, compensate opponent defect, and make search-optimal effect and speed better.
In silico prediction of ROCK II inhibitors by different classification approaches.
Cai, Chuipu; Wu, Qihui; Luo, Yunxia; Ma, Huili; Shen, Jiangang; Zhang, Yongbin; Yang, Lei; Chen, Yunbo; Wen, Zehuai; Wang, Qi
2017-11-01
ROCK II is an important pharmacological target linked to central nervous system disorders such as Alzheimer's disease. The purpose of this research is to generate ROCK II inhibitor prediction models by machine learning approaches. Firstly, four sets of descriptors were calculated with MOE 2010 and PaDEL-Descriptor, and optimized by F-score and linear forward selection methods. In addition, four classification algorithms were used to initially build 16 classifiers with k-nearest neighbors [Formula: see text], naïve Bayes, Random forest, and support vector machine. Furthermore, three sets of structural fingerprint descriptors were introduced to enhance the predictive capacity of classifiers, which were assessed with fivefold cross-validation, test set validation and external test set validation. The best two models, MFK + MACCS and MLR + SubFP, have both MCC values of 0.925 for external test set. After that, a privileged substructure analysis was performed to reveal common chemical features of ROCK II inhibitors. Finally, binding modes were analyzed to identify relationships between molecular descriptors and activity, while main interactions were revealed by comparing the docking interaction of the most potent and the weakest ROCK II inhibitors. To the best of our knowledge, this is the first report on ROCK II inhibitors utilizing machine learning approaches that provides a new method for discovering novel ROCK II inhibitors.
A Locality-Constrained and Label Embedding Dictionary Learning Algorithm for Image Classification.
Zhengming Li; Zhihui Lai; Yong Xu; Jian Yang; Zhang, David
2017-02-01
Locality and label information of training samples play an important role in image classification. However, previous dictionary learning algorithms do not take the locality and label information of atoms into account together in the learning process, and thus their performance is limited. In this paper, a discriminative dictionary learning algorithm, called the locality-constrained and label embedding dictionary learning (LCLE-DL) algorithm, was proposed for image classification. First, the locality information was preserved using the graph Laplacian matrix of the learned dictionary instead of the conventional one derived from the training samples. Then, the label embedding term was constructed using the label information of atoms instead of the classification error term, which contained discriminating information of the learned dictionary. The optimal coding coefficients derived by the locality-based and label-based reconstruction were effective for image classification. Experimental results demonstrated that the LCLE-DL algorithm can achieve better performance than some state-of-the-art algorithms.
Zhang, Xiaoheng; Wang, Lirui; Cao, Yao; Wang, Pin; Zhang, Cheng; Yang, Liuyang; Li, Yongming; Zhang, Yanling; Cheng, Oumei
2018-02-01
Diagnosis of Parkinson's disease (PD) based on speech data has been proved to be an effective way in recent years. However, current researches just care about the feature extraction and classifier design, and do not consider the instance selection. Former research by authors showed that the instance selection can lead to improvement on classification accuracy. However, no attention is paid on the relationship between speech sample and feature until now. Therefore, a new diagnosis algorithm of PD is proposed in this paper by simultaneously selecting speech sample and feature based on relevant feature weighting algorithm and multiple kernel method, so as to find their synergy effects, thereby improving classification accuracy. Experimental results showed that this proposed algorithm obtained apparent improvement on classification accuracy. It can obtain mean classification accuracy of 82.5%, which was 30.5% higher than the relevant algorithm. Besides, the proposed algorithm detected the synergy effects of speech sample and feature, which is valuable for speech marker extraction.
Chandra, Sharat; Pandey, Jyotsana; Tamrakar, Akhilesh Kumar; Siddiqi, Mohammad Imran
2017-01-01
In insulin and leptin signaling pathway, Protein-Tyrosine Phosphatase 1B (PTP1B) plays a crucial controlling role as a negative regulator, which makes it an attractive therapeutic target for both Type-2 Diabetes (T2D) and obesity. In this work, we have generated classification models by using the inhibition data set of known PTP1B inhibitors to identify new inhibitors of PTP1B utilizing multiple machine learning techniques like naïve Bayesian, random forest, support vector machine and k-nearest neighbors, along with structural fingerprints and selected molecular descriptors. Several models from each algorithm have been constructed and optimized, with the different combination of molecular descriptors and structural fingerprints. For the training and test sets, most of the predictive models showed more than 90% of overall prediction accuracies. The best model was obtained with support vector machine approach and has Matthews Correlation Coefficient of 0.82 for the external test set, which was further employed for the virtual screening of Maybridge small compound database. Five compounds were subsequently selected for experimental assay. Out of these two compounds were found to inhibit PTP1B with significant inhibitory activity in in-vitro inhibition assay. The structural fragments which are important for PTP1B inhibition were identified by naïve Bayesian method and can be further exploited to design new molecules around the identified scaffolds. The descriptive and predictive modeling strategy applied in this study is capable of identifying PTP1B inhibitors from the large compound libraries. Copyright © 2016 Elsevier Inc. All rights reserved.
Caso, Giuseppe; de Nardis, Luca; di Benedetto, Maria-Gabriella
2015-10-30
The weighted k-nearest neighbors (WkNN) algorithm is by far the most popular choice in the design of fingerprinting indoor positioning systems based on WiFi received signal strength (RSS). WkNN estimates the position of a target device by selecting k reference points (RPs) based on the similarity of their fingerprints with the measured RSS values. The position of the target device is then obtained as a weighted sum of the positions of the k RPs. Two-step WkNN positioning algorithms were recently proposed, in which RPs are divided into clusters using the affinity propagation clustering algorithm, and one representative for each cluster is selected. Only cluster representatives are then considered during the position estimation, leading to a significant computational complexity reduction compared to traditional, flat WkNN. Flat and two-step WkNN share the issue of properly selecting the similarity metric so as to guarantee good positioning accuracy: in two-step WkNN, in particular, the metric impacts three different steps in the position estimation, that is cluster formation, cluster selection and RP selection and weighting. So far, however, the only similarity metric considered in the literature was the one proposed in the original formulation of the affinity propagation algorithm. This paper fills this gap by comparing different metrics and, based on this comparison, proposes a novel mixed approach in which different metrics are adopted in the different steps of the position estimation procedure. The analysis is supported by an extensive experimental campaign carried out in a multi-floor 3D indoor positioning testbed. The impact of similarity metrics and their combinations on the structure and size of the resulting clusters, 3D positioning accuracy and computational complexity are investigated. Results show that the adoption of metrics different from the one proposed in the original affinity propagation algorithm and, in particular, the combination of different metrics can significantly improve the positioning accuracy while preserving the efficiency in computational complexity typical of two-step algorithms.
Caso, Giuseppe; de Nardis, Luca; di Benedetto, Maria-Gabriella
2015-01-01
The weighted k-nearest neighbors (WkNN) algorithm is by far the most popular choice in the design of fingerprinting indoor positioning systems based on WiFi received signal strength (RSS). WkNN estimates the position of a target device by selecting k reference points (RPs) based on the similarity of their fingerprints with the measured RSS values. The position of the target device is then obtained as a weighted sum of the positions of the k RPs. Two-step WkNN positioning algorithms were recently proposed, in which RPs are divided into clusters using the affinity propagation clustering algorithm, and one representative for each cluster is selected. Only cluster representatives are then considered during the position estimation, leading to a significant computational complexity reduction compared to traditional, flat WkNN. Flat and two-step WkNN share the issue of properly selecting the similarity metric so as to guarantee good positioning accuracy: in two-step WkNN, in particular, the metric impacts three different steps in the position estimation, that is cluster formation, cluster selection and RP selection and weighting. So far, however, the only similarity metric considered in the literature was the one proposed in the original formulation of the affinity propagation algorithm. This paper fills this gap by comparing different metrics and, based on this comparison, proposes a novel mixed approach in which different metrics are adopted in the different steps of the position estimation procedure. The analysis is supported by an extensive experimental campaign carried out in a multi-floor 3D indoor positioning testbed. The impact of similarity metrics and their combinations on the structure and size of the resulting clusters, 3D positioning accuracy and computational complexity are investigated. Results show that the adoption of metrics different from the one proposed in the original affinity propagation algorithm and, in particular, the combination of different metrics can significantly improve the positioning accuracy while preserving the efficiency in computational complexity typical of two-step algorithms. PMID:26528984
NASA Astrophysics Data System (ADS)
García-Flores, Agustín.; Paz-Gallardo, Abel; Plaza, Antonio; Li, Jun
2016-10-01
This paper describes a new web platform dedicated to the classification of satellite images called Hypergim. The current implementation of this platform enables users to perform classification of satellite images from any part of the world thanks to the worldwide maps provided by Google Maps. To perform this classification, Hypergim uses unsupervised algorithms like Isodata and K-means. Here, we present an extension of the original platform in which we adapt Hypergim in order to use supervised algorithms to improve the classification results. This involves a significant modification of the user interface, providing the user with a way to obtain samples of classes present in the images to use in the training phase of the classification process. Another main goal of this development is to improve the runtime of the image classification process. To achieve this goal, we use a parallel implementation of the Random Forest classification algorithm. This implementation is a modification of the well-known CURFIL software package. The use of this type of algorithms to perform image classification is widespread today thanks to its precision and ease of training. The actual implementation of Random Forest was developed using CUDA platform, which enables us to exploit the potential of several models of NVIDIA graphics processing units using them to execute general purpose computing tasks as image classification algorithms. As well as CUDA, we use other parallel libraries as Intel Boost, taking advantage of the multithreading capabilities of modern CPUs. To ensure the best possible results, the platform is deployed in a cluster of commodity graphics processing units (GPUs), so that multiple users can use the tool in a concurrent way. The experimental results indicate that this new algorithm widely outperform the previous unsupervised algorithms implemented in Hypergim, both in runtime as well as precision of the actual classification of the images.
Disease state fingerprint for fall risk assessment.
Similä, Heidi; Immonen, Milla
2014-01-01
Fall prevention is an important and complex multifactorial challenge, since one third of people over 65 years old fall at least once every year. A novel application of Disease State Fingerprint (DSF) algorithm is presented for holistic visualization of fall risk factors and identifying persons with falls history or decreased level of physical functioning based on fall risk assessment data. The algorithm is tested with data from 42 older adults, that went through a comprehensive fall risk assessment. Within the study population the Activities-specific Balance Confidence (ABC) scale score, Berg Balance Scale (BBS) score and the number of drugs in use were the three most relevant variables, that differed between the fallers and non-fallers. This study showed that the DSF visualization is beneficial in inspection of an individual's significant fall risk factors, since people have problems in different areas and one single assessment scale is not enough to expose all the people at risk.
Multilayer Statistical Intrusion Detection in Wireless Networks
NASA Astrophysics Data System (ADS)
Hamdi, Mohamed; Meddeb-Makhlouf, Amel; Boudriga, Noureddine
2008-12-01
The rapid proliferation of mobile applications and services has introduced new vulnerabilities that do not exist in fixed wired networks. Traditional security mechanisms, such as access control and encryption, turn out to be inefficient in modern wireless networks. Given the shortcomings of the protection mechanisms, an important research focuses in intrusion detection systems (IDSs). This paper proposes a multilayer statistical intrusion detection framework for wireless networks. The architecture is adequate to wireless networks because the underlying detection models rely on radio parameters and traffic models. Accurate correlation between radio and traffic anomalies allows enhancing the efficiency of the IDS. A radio signal fingerprinting technique based on the maximal overlap discrete wavelet transform (MODWT) is developed. Moreover, a geometric clustering algorithm is presented. Depending on the characteristics of the fingerprinting technique, the clustering algorithm permits to control the false positive and false negative rates. Finally, simulation experiments have been carried out to validate the proposed IDS.
Fast dictionary generation and searching for magnetic resonance fingerprinting.
Jun Xie; Mengye Lyu; Jian Zhang; Hui, Edward S; Wu, Ed X; Ze Wang
2017-07-01
A super-fast dictionary generation and searching (DGS) algorithm was developed for MR parameter quantification using magnetic resonance fingerprinting (MRF). MRF is a new technique for simultaneously quantifying multiple MR parameters using one temporally resolved MR scan. But it has a multiplicative computation complexity, resulting in a big burden of dictionary generating, saving, and retrieving, which can easily be intractable for any state-of-art computers. Based on retrospective analysis of the dictionary matching object function, a multi-scale ZOOM like DGS algorithm, dubbed as MRF-ZOOM, was proposed. MRF ZOOM is quasi-parameter-separable so the multiplicative computation complexity is broken into additive one. Evaluations showed that MRF ZOOM was hundreds or thousands of times faster than the original MRF parameter quantification method even without counting the dictionary generation time in. Using real data, it yielded nearly the same results as produced by the original method. MRF ZOOM provides a super-fast solution for MR parameter quantification.
Comparative Performance Analysis of Different Fingerprint Biometric Scanners for Patient Matching.
Kasiiti, Noah; Wawira, Judy; Purkayastha, Saptarshi; Were, Martin C
2017-01-01
Unique patient identification within health services is an operational challenge in healthcare settings. Use of key identifiers, such as patient names, hospital identification numbers, national ID, and birth date are often inadequate for ensuring unique patient identification. In addition approximate string comparator algorithms, such as distance-based algorithms, have proven suboptimal for improving patient matching, especially in low-resource settings. Biometric approaches may improve unique patient identification. However, before implementing the technology in a given setting, such as health care, the right scanners should be rigorously tested to identify an optimal package for the implementation. This study aimed to investigate the effects of factors such as resolution, template size, and scan capture area on the matching performance of different fingerprint scanners for use within health care settings. Performance analysis of eight different scanners was tested using the demo application distributed as part of the Neurotech Verifinger SDK 6.0.
Online clustering algorithms for radar emitter classification.
Liu, Jun; Lee, Jim P Y; Senior; Li, Lingjie; Luo, Zhi-Quan; Wong, K Max
2005-08-01
Radar emitter classification is a special application of data clustering for classifying unknown radar emitters from received radar pulse samples. The main challenges of this task are the high dimensionality of radar pulse samples, small sample group size, and closely located radar pulse clusters. In this paper, two new online clustering algorithms are developed for radar emitter classification: One is model-based using the Minimum Description Length (MDL) criterion and the other is based on competitive learning. Computational complexity is analyzed for each algorithm and then compared. Simulation results show the superior performance of the model-based algorithm over competitive learning in terms of better classification accuracy, flexibility, and stability.
Spectral band selection for classification of soil organic matter content
NASA Technical Reports Server (NTRS)
Henderson, Tracey L.; Szilagyi, Andrea; Baumgardner, Marion F.; Chen, Chih-Chien Thomas; Landgrebe, David A.
1989-01-01
This paper describes the spectral-band-selection (SBS) algorithm of Chen and Landgrebe (1987, 1988, and 1989) and uses the algorithm to classify the organic matter content in the earth's surface soil. The effectiveness of the algorithm was evaluated comparing the results of classification of the soil organic matter using SBS bands with those obtained using Landsat MSS bands and TM bands, showing that the algorithm was successful in finding important spectral bands for classification of organic matter content. Using the calculated bands, the probabilities of correct classification for climate-stratified data were found to range from 0.910 to 0.980.
Geographical provenance of palm oil by fatty acid and volatile compound fingerprinting techniques.
Tres, A; Ruiz-Samblas, C; van der Veer, G; van Ruth, S M
2013-04-15
Analytical methods are required in addition to administrative controls to verify the geographical origin of vegetable oils such as palm oil in an objective manner. In this study the application of fatty acid and volatile organic compound fingerprinting in combination with chemometrics have been applied to verify the geographical origin of crude palm oil (continental scale). For this purpose 94 crude palm oil samples were collected from South East Asia (55), South America (11) and Africa (28). Partial least squares discriminant analysis (PLS-DA) was used to develop a hierarchical classification model by combining two consecutive binary PLS-DA models. First, a PLS-DA model was built to distinguish South East Asian from non-South East Asian palm oil samples. Then a second model was developed, only for the non-Asian samples, to discriminate African from South American crude palm oil. Models were externally validated by using them to predict the identity of new authentic samples. The fatty acid fingerprinting model revealed three misclassified samples. The volatile compound fingerprinting models showed an 88%, 100% and 100% accuracy for the South East Asian, African and American class, respectively. The verification of the geographical origin of crude palm oil is feasible by fatty acid and volatile compound fingerprinting. Further research is required to further validate the approach and to increase its spatial specificity to country/province scale. Copyright © 2012 Elsevier Ltd. All rights reserved.
Nasiri, Jaber; Naghavi, Mohammad Reza; Kayvanjoo, Amir Hossein; Nasiri, Mojtaba; Ebrahimi, Mansour
2015-03-07
For the first time, prediction accuracies of some supervised and unsupervised algorithms were evaluated in an SSR-based DNA fingerprinting study of a pea collection containing 20 cultivars and 57 wild samples. In general, according to the 10 attribute weighting models, the SSR alleles of PEAPHTAP-2 and PSBLOX13.2-1 were the two most important attributes to generate discrimination among eight different species and subspecies of genus Pisum. In addition, K-Medoids unsupervised clustering run on Chi squared dataset exhibited the best prediction accuracy (83.12%), while the lowest accuracy (25.97%) gained as K-Means model ran on FCdb database. Irrespective of some fluctuations, the overall accuracies of tree induction models were significantly high for many algorithms, and the attributes PSBLOX13.2-3 and PEAPHTAP could successfully detach Pisum fulvum accessions and cultivars from the others when two selected decision trees were taken into account. Meanwhile, the other used supervised algorithms exhibited overall reliable accuracies, even though in some rare cases, they gave us low amounts of accuracies. Our results, altogether, demonstrate promising applications of both supervised and unsupervised algorithms to provide suitable data mining tools regarding accurate fingerprinting of different species and subspecies of genus Pisum, as a fundamental priority task in breeding programs of the crop. Copyright © 2015 Elsevier Ltd. All rights reserved.
Bromuri, Stefano; Zufferey, Damien; Hennebert, Jean; Schumacher, Michael
2014-10-01
This research is motivated by the issue of classifying illnesses of chronically ill patients for decision support in clinical settings. Our main objective is to propose multi-label classification of multivariate time series contained in medical records of chronically ill patients, by means of quantization methods, such as bag of words (BoW), and multi-label classification algorithms. Our second objective is to compare supervised dimensionality reduction techniques to state-of-the-art multi-label classification algorithms. The hypothesis is that kernel methods and locality preserving projections make such algorithms good candidates to study multi-label medical time series. We combine BoW and supervised dimensionality reduction algorithms to perform multi-label classification on health records of chronically ill patients. The considered algorithms are compared with state-of-the-art multi-label classifiers in two real world datasets. Portavita dataset contains 525 diabetes type 2 (DT2) patients, with co-morbidities of DT2 such as hypertension, dyslipidemia, and microvascular or macrovascular issues. MIMIC II dataset contains 2635 patients affected by thyroid disease, diabetes mellitus, lipoid metabolism disease, fluid electrolyte disease, hypertensive disease, thrombosis, hypotension, chronic obstructive pulmonary disease (COPD), liver disease and kidney disease. The algorithms are evaluated using multi-label evaluation metrics such as hamming loss, one error, coverage, ranking loss, and average precision. Non-linear dimensionality reduction approaches behave well on medical time series quantized using the BoW algorithm, with results comparable to state-of-the-art multi-label classification algorithms. Chaining the projected features has a positive impact on the performance of the algorithm with respect to pure binary relevance approaches. The evaluation highlights the feasibility of representing medical health records using the BoW for multi-label classification tasks. The study also highlights that dimensionality reduction algorithms based on kernel methods, locality preserving projections or both are good candidates to deal with multi-label classification tasks in medical time series with many missing values and high label density. Copyright © 2014 Elsevier Inc. All rights reserved.
Weakly supervised classification in high energy physics
Dery, Lucio Mwinmaarong; Nachman, Benjamin; Rubbo, Francesco; ...
2017-05-01
As machine learning algorithms become increasingly sophisticated to exploit subtle features of the data, they often become more dependent on simulations. Here, this paper presents a new approach called weakly supervised classification in which class proportions are the only input into the machine learning algorithm. Using one of the most challenging binary classification tasks in high energy physics $-$ quark versus gluon tagging $-$ we show that weakly supervised classification can match the performance of fully supervised algorithms. Furthermore, by design, the new algorithm is insensitive to any mis-modeling of discriminating features in the data by the simulation. Weakly supervisedmore » classification is a general procedure that can be applied to a wide variety of learning problems to boost performance and robustness when detailed simulations are not reliable or not available.« less
Weakly supervised classification in high energy physics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dery, Lucio Mwinmaarong; Nachman, Benjamin; Rubbo, Francesco
As machine learning algorithms become increasingly sophisticated to exploit subtle features of the data, they often become more dependent on simulations. Here, this paper presents a new approach called weakly supervised classification in which class proportions are the only input into the machine learning algorithm. Using one of the most challenging binary classification tasks in high energy physics $-$ quark versus gluon tagging $-$ we show that weakly supervised classification can match the performance of fully supervised algorithms. Furthermore, by design, the new algorithm is insensitive to any mis-modeling of discriminating features in the data by the simulation. Weakly supervisedmore » classification is a general procedure that can be applied to a wide variety of learning problems to boost performance and robustness when detailed simulations are not reliable or not available.« less
Novotná, H; Kmiecik, O; Gałązka, M; Krtková, V; Hurajová, A; Schulzová, V; Hallmann, E; Rembiałkowska, E; Hajšlová, J
2012-01-01
The rapidly growing demand for organic food requires the availability of analytical tools enabling their authentication. Recently, metabolomic fingerprinting/profiling has been demonstrated as a challenging option for a comprehensive characterisation of small molecules occurring in plants, since their pattern may reflect the impact of various external factors. In a two-year pilot study, concerned with the classification of organic versus conventional crops, ambient mass spectrometry consisting of a direct analysis in real time (DART) ion source and a time-of-flight mass spectrometer (TOFMS) was employed. This novel methodology was tested on 40 tomato and 24 pepper samples grown under specified conditions. To calculate statistical models, the obtained data (mass spectra) were processed by the principal component analysis (PCA) followed by linear discriminant analysis (LDA). The results from the positive ionisation mode enabled better differentiation between organic and conventional samples than the results from the negative mode. In this case, the recognition ability obtained by LDA was 97.5% for tomato and 100% for pepper samples and the prediction abilities were above 80% for both sample sets. The results suggest that the year of production had stronger influence on the metabolomic fingerprints compared with the type of farming (organic versus conventional). In any case, DART-TOFMS is a promising tool for rapid screening of samples. Establishing comprehensive (multi-sample) long-term databases may further help to improve the quality of statistical classification models.
Chasset, Thibaut; Häbe, Tim T; Ristivojevic, Petar; Morlock, Gertrud E
2016-09-23
Quality control of propolis is challenging, as it is a complex natural mixture of compounds, and thus, very difficult to analyze and standardize. Shown on the example of 30 French propolis samples, a strategy for an improved quality control was demonstrated in which high-performance thin-layer chromatography (HPTLC) fingerprints were evaluated in combination with selected mass signals obtained by desorption-based scanning mass spectrometry (MS). The French propolis sample extracts were separated by a newly developed reversed phase (RP)-HPTLC method. The fingerprints obtained by two different detection modes, i.e. after (1) derivatization and fluorescence detection (FLD) at UV 366nm and (2) scanning direct analysis in real time (DART)-MS, were analyzed by multivariate data analysis. Thus, RP-HPTLC-FLD and RP-HPTLC-DART-MS fingerprints were explored and the best classification was obtained using both methods in combination with pattern recognition techniques, such as principal component analysis. All investigated French propolis samples were divided in two types and characteristic patterns were observed. Phenolic compounds such as caffeic acid, p-coumaric acid, chrysin, pinobanksin, pinobanksin-3-acetate, galangin, kaempferol, tectochrysin and pinocembrin were identified as characteristic marker compounds of French propolis samples. This study expanded the research on the European poplar type of propolis and confirmed the presence of two botanically different types of propolis, known as the blue and orange types. Copyright © 2016 Elsevier B.V. All rights reserved.
Fongaro, Lorenzo; Ho, Doris Mer Lin; Kvaal, Knut; Mayer, Klaus; Rondinella, Vincenzo V
2016-05-15
The identification of interdicted nuclear or radioactive materials requires the application of dedicated techniques. In this work, a new approach for characterizing powder of uranium ore concentrates (UOCs) is presented. It is based on image texture analysis and multivariate data modelling. 26 different UOCs samples were evaluated applying the Angle Measure Technique (AMT) algorithm to extract textural features on samples images acquired at 250× and 1000× magnification by Scanning Electron Microscope (SEM). At both magnifications, this method proved effective to classify the different types of UOC powder based on the surface characteristics that depend on particle size, homogeneity, and graininess and are related to the composition and processes used in the production facilities. Using the outcome data from the application of the AMT algorithm, the total explained variance was higher than 90% with Principal Component Analysis (PCA), while partial least square discriminant analysis (PLS-DA) applied only on the 14 black colour UOCs powder samples, allowed their classification only on the basis of their surface texture features (sensitivity>0.6; specificity>0.6). This preliminary study shows that this method was able to distinguish samples with similar composition, but obtained from different facilities. The mean angle spectral data obtained by the image texture analysis using the AMT algorithm can be considered as a specific fingerprint or signature of UOCs and could be used for nuclear forensic investigation. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
Mo, Yun; Zhang, Zhongzhao; Meng, Weixiao; Ma, Lin; Wang, Yao
2014-01-01
Indoor positioning systems based on the fingerprint method are widely used due to the large number of existing devices with a wide range of coverage. However, extensive positioning regions with a massive fingerprint database may cause high computational complexity and error margins, therefore clustering methods are widely applied as a solution. However, traditional clustering methods in positioning systems can only measure the similarity of the Received Signal Strength without being concerned with the continuity of physical coordinates. Besides, outage of access points could result in asymmetric matching problems which severely affect the fine positioning procedure. To solve these issues, in this paper we propose a positioning system based on the Spatial Division Clustering (SDC) method for clustering the fingerprint dataset subject to physical distance constraints. With the Genetic Algorithm and Support Vector Machine techniques, SDC can achieve higher coarse positioning accuracy than traditional clustering algorithms. In terms of fine localization, based on the Kernel Principal Component Analysis method, the proposed positioning system outperforms its counterparts based on other feature extraction methods in low dimensionality. Apart from balancing online matching computational burden, the new positioning system exhibits advantageous performance on radio map clustering, and also shows better robustness and adaptability in the asymmetric matching problem aspect. PMID:24451470
Fusing Bluetooth Beacon Data with Wi-Fi Radiomaps for Improved Indoor Localization
Kanaris, Loizos; Kokkinis, Akis; Liotta, Antonio; Stavrou, Stavros
2017-01-01
Indoor user localization and tracking are instrumental to a broad range of services and applications in the Internet of Things (IoT) and particularly in Body Sensor Networks (BSN) and Ambient Assisted Living (AAL) scenarios. Due to the widespread availability of IEEE 802.11, many localization platforms have been proposed, based on the Wi-Fi Received Signal Strength (RSS) indicator, using algorithms such as K-Nearest Neighbour (KNN), Maximum A Posteriori (MAP) and Minimum Mean Square Error (MMSE). In this paper, we introduce a hybrid method that combines the simplicity (and low cost) of Bluetooth Low Energy (BLE) and the popular 802.11 infrastructure, to improve the accuracy of indoor localization platforms. Building on KNN, we propose a new positioning algorithm (dubbed i-KNN) which is able to filter the initial fingerprint dataset (i.e., the radiomap), after considering the proximity of RSS fingerprints with respect to the BLE devices. In this way, i-KNN provides an optimised small subset of possible user locations, based on which it finally estimates the user position. The proposed methodology achieves fast positioning estimation due to the utilization of a fragment of the initial fingerprint dataset, while at the same time improves positioning accuracy by minimizing any calculation errors. PMID:28394268
Fusing Bluetooth Beacon Data with Wi-Fi Radiomaps for Improved Indoor Localization.
Kanaris, Loizos; Kokkinis, Akis; Liotta, Antonio; Stavrou, Stavros
2017-04-10
Indoor user localization and tracking are instrumental to a broad range of services and applications in the Internet of Things (IoT) and particularly in Body Sensor Networks (BSN) and Ambient Assisted Living (AAL) scenarios. Due to the widespread availability of IEEE 802.11, many localization platforms have been proposed, based on the Wi-Fi Received Signal Strength (RSS) indicator, using algorithms such as K -Nearest Neighbour (KNN), Maximum A Posteriori (MAP) and Minimum Mean Square Error (MMSE). In this paper, we introduce a hybrid method that combines the simplicity (and low cost) of Bluetooth Low Energy (BLE) and the popular 802.11 infrastructure, to improve the accuracy of indoor localization platforms. Building on KNN, we propose a new positioning algorithm (dubbed i-KNN) which is able to filter the initial fingerprint dataset (i.e., the radiomap), after considering the proximity of RSS fingerprints with respect to the BLE devices. In this way, i-KNN provides an optimised small subset of possible user locations, based on which it finally estimates the user position. The proposed methodology achieves fast positioning estimation due to the utilization of a fragment of the initial fingerprint dataset, while at the same time improves positioning accuracy by minimizing any calculation errors.
Performance Assessment Method for a Forged Fingerprint Detection Algorithm
NASA Astrophysics Data System (ADS)
Shin, Yong Nyuo; Jun, In-Kyung; Kim, Hyun; Shin, Woochang
The threat of invasion of privacy and of the illegal appropriation of information both increase with the expansion of the biometrics service environment to open systems. However, while certificates or smart cards can easily be cancelled and reissued if found to be missing, there is no way to recover the unique biometric information of an individual following a security breach. With the recognition that this threat factor may disrupt the large-scale civil service operations approaching implementation, such as electronic ID cards and e-Government systems, many agencies and vendors around the world continue to develop forged fingerprint detection technology, but no objective performance assessment method has, to date, been reported. Therefore, in this paper, we propose a methodology designed to evaluate the objective performance of the forged fingerprint detection technology that is currently attracting a great deal of attention.
Liu, Yanqiu; Lu, Huijuan; Yan, Ke; Xia, Haixia; An, Chunlin
2016-01-01
Embedding cost-sensitive factors into the classifiers increases the classification stability and reduces the classification costs for classifying high-scale, redundant, and imbalanced datasets, such as the gene expression data. In this study, we extend our previous work, that is, Dissimilar ELM (D-ELM), by introducing misclassification costs into the classifier. We name the proposed algorithm as the cost-sensitive D-ELM (CS-D-ELM). Furthermore, we embed rejection cost into the CS-D-ELM to increase the classification stability of the proposed algorithm. Experimental results show that the rejection cost embedded CS-D-ELM algorithm effectively reduces the average and overall cost of the classification process, while the classification accuracy still remains competitive. The proposed method can be extended to classification problems of other redundant and imbalanced data.
Fang, Guihua; Goh, Jing Yeen; Tay, Manjun; Lau, Hiu Fung; Li, Sam Fong Yau
2013-06-01
The correct identification of oils and fats is important to consumers from both commercial and health perspectives. Proton nuclear magnetic resonance ((1)H NMR) spectroscopy, gas chromatography-mass spectrometry (GC/MS) fingerprinting and chemometrics were employed successfully for the quality control of oils and fats. Principal component analysis (PCA) of both techniques showed group clustering of 14 types of oils and fats. Partial least squares discriminant analysis (PLS-DA) and orthogonal projections to latent structures discriminant analysis (OPLS-DA) using GC/MS data had excellent classification sensitivity and specificity compared to models using NMR data. Depending on the availability of the instruments, data from either technique can effectively be applied for the establishment of an oils and fats database to identify unknown samples. Partial least squares (PLS) models were successfully established for the detection of as low as 5% of lard and beef tallow spiked into canola oil, thus illustrating possible applications in Islamic and Jewish countries. Copyright © 2012 Elsevier Ltd. All rights reserved.
Reduction from cost-sensitive ordinal ranking to weighted binary classification.
Lin, Hsuan-Tien; Li, Ling
2012-05-01
We present a reduction framework from ordinal ranking to binary classification. The framework consists of three steps: extracting extended examples from the original examples, learning a binary classifier on the extended examples with any binary classification algorithm, and constructing a ranker from the binary classifier. Based on the framework, we show that a weighted 0/1 loss of the binary classifier upper-bounds the mislabeling cost of the ranker, both error-wise and regret-wise. Our framework allows not only the design of good ordinal ranking algorithms based on well-tuned binary classification approaches, but also the derivation of new generalization bounds for ordinal ranking from known bounds for binary classification. In addition, our framework unifies many existing ordinal ranking algorithms, such as perceptron ranking and support vector ordinal regression. When compared empirically on benchmark data sets, some of our newly designed algorithms enjoy advantages in terms of both training speed and generalization performance over existing algorithms. In addition, the newly designed algorithms lead to better cost-sensitive ordinal ranking performance, as well as improved listwise ranking performance.
Lin, Kuan-Cheng; Hsieh, Yi-Hsiu
2015-10-01
The classification and analysis of data is an important issue in today's research. Selecting a suitable set of features makes it possible to classify an enormous quantity of data quickly and efficiently. Feature selection is generally viewed as a problem of feature subset selection, such as combination optimization problems. Evolutionary algorithms using random search methods have proven highly effective in obtaining solutions to problems of optimization in a diversity of applications. In this study, we developed a hybrid evolutionary algorithm based on endocrine-based particle swarm optimization (EPSO) and artificial bee colony (ABC) algorithms in conjunction with a support vector machine (SVM) for the selection of optimal feature subsets for the classification of datasets. The results of experiments using specific UCI medical datasets demonstrate that the accuracy of the proposed hybrid evolutionary algorithm is superior to that of basic PSO, EPSO and ABC algorithms, with regard to classification accuracy using subsets with a reduced number of features.
Simple-random-sampling-based multiclass text classification algorithm.
Liu, Wuying; Wang, Lin; Yi, Mianzhu
2014-01-01
Multiclass text classification (MTC) is a challenging issue and the corresponding MTC algorithms can be used in many applications. The space-time overhead of the algorithms must be concerned about the era of big data. Through the investigation of the token frequency distribution in a Chinese web document collection, this paper reexamines the power law and proposes a simple-random-sampling-based MTC (SRSMTC) algorithm. Supported by a token level memory to store labeled documents, the SRSMTC algorithm uses a text retrieval approach to solve text classification problems. The experimental results on the TanCorp data set show that SRSMTC algorithm can achieve the state-of-the-art performance at greatly reduced space-time requirements.
Cloud classification from satellite data using a fuzzy sets algorithm: A polar example
NASA Technical Reports Server (NTRS)
Key, J. R.; Maslanik, J. A.; Barry, R. G.
1988-01-01
Where spatial boundaries between phenomena are diffuse, classification methods which construct mutually exclusive clusters seem inappropriate. The Fuzzy c-means (FCM) algorithm assigns each observation to all clusters, with membership values as a function of distance to the cluster center. The FCM algorithm is applied to AVHRR data for the purpose of classifying polar clouds and surfaces. Careful analysis of the fuzzy sets can provide information on which spectral channels are best suited to the classification of particular features, and can help determine likely areas of misclassification. General agreement in the resulting classes and cloud fraction was found between the FCM algorithm, a manual classification, and an unsupervised maximum likelihood classifier.
Ion mobility spectrometry fingerprints: A rapid detection technology for adulteration of sesame oil.
Zhang, Liangxiao; Shuai, Qian; Li, Peiwu; Zhang, Qi; Ma, Fei; Zhang, Wen; Ding, Xiaoxia
2016-02-01
A simple and rapid detection technology was proposed based on ion mobility spectrometry (IMS) fingerprints to determine potential adulteration of sesame oil. Oil samples were diluted by n-hexane and analyzed by IMS for 20s. Then, chemometric methods were employed to establish discriminant models for sesame oils and four other edible oils, pure and adulterated sesame oils, and pure and counterfeit sesame oils, respectively. Finally, Random Forests (RF) classification model could correctly classify all five types of edible oils. The detection results indicated that the discriminant models built by recursive support vector machine (R-SVM) method could identify adulterated sesame oil samples (⩾ 10%) with an accuracy value of 94.2%. Therefore, IMS was shown to be an effective method to detect the adulterated sesame oils. Meanwhile, IMS fingerprints work well to detect the counterfeit sesame oils produced by adding sesame oil essence into cheaper edible oils. Copyright © 2015 Elsevier Ltd. All rights reserved.
Cao, Jianfang; Cui, Hongyan; Shi, Hao; Jiao, Lijuan
2016-01-01
A back-propagation (BP) neural network can solve complicated random nonlinear mapping problems; therefore, it can be applied to a wide range of problems. However, as the sample size increases, the time required to train BP neural networks becomes lengthy. Moreover, the classification accuracy decreases as well. To improve the classification accuracy and runtime efficiency of the BP neural network algorithm, we proposed a parallel design and realization method for a particle swarm optimization (PSO)-optimized BP neural network based on MapReduce on the Hadoop platform using both the PSO algorithm and a parallel design. The PSO algorithm was used to optimize the BP neural network's initial weights and thresholds and improve the accuracy of the classification algorithm. The MapReduce parallel programming model was utilized to achieve parallel processing of the BP algorithm, thereby solving the problems of hardware and communication overhead when the BP neural network addresses big data. Datasets on 5 different scales were constructed using the scene image library from the SUN Database. The classification accuracy of the parallel PSO-BP neural network algorithm is approximately 92%, and the system efficiency is approximately 0.85, which presents obvious advantages when processing big data. The algorithm proposed in this study demonstrated both higher classification accuracy and improved time efficiency, which represents a significant improvement obtained from applying parallel processing to an intelligent algorithm on big data.
Optimized extreme learning machine for urban land cover classification using hyperspectral imagery
NASA Astrophysics Data System (ADS)
Su, Hongjun; Tian, Shufang; Cai, Yue; Sheng, Yehua; Chen, Chen; Najafian, Maryam
2017-12-01
This work presents a new urban land cover classification framework using the firefly algorithm (FA) optimized extreme learning machine (ELM). FA is adopted to optimize the regularization coefficient C and Gaussian kernel σ for kernel ELM. Additionally, effectiveness of spectral features derived from an FA-based band selection algorithm is studied for the proposed classification task. Three sets of hyperspectral databases were recorded using different sensors, namely HYDICE, HyMap, and AVIRIS. Our study shows that the proposed method outperforms traditional classification algorithms such as SVM and reduces computational cost significantly.
Auto-SEIA: simultaneous optimization of image processing and machine learning algorithms
NASA Astrophysics Data System (ADS)
Negro Maggio, Valentina; Iocchi, Luca
2015-02-01
Object classification from images is an important task for machine vision and it is a crucial ingredient for many computer vision applications, ranging from security and surveillance to marketing. Image based object classification techniques properly integrate image processing and machine learning (i.e., classification) procedures. In this paper we present a system for automatic simultaneous optimization of algorithms and parameters for object classification from images. More specifically, the proposed system is able to process a dataset of labelled images and to return a best configuration of image processing and classification algorithms and of their parameters with respect to the accuracy of classification. Experiments with real public datasets are used to demonstrate the effectiveness of the developed system.
NASA Astrophysics Data System (ADS)
Hildebrandt, Mario; Kiltz, Stefan; Dittmann, Jana; Vielhauer, Claus
2014-02-01
In crime scene forensics latent fingerprints are found on various substrates. Nowadays primarily physical or chemical preprocessing techniques are applied for enhancing the visibility of the fingerprint trace. In order to avoid altering the trace it has been shown that contact-less sensors offer a non-destructive acquisition approach. Here, the exploitation of fingerprint or substrate properties and the utilization of signal processing techniques are an essential requirement to enhance the fingerprint visibility. However, especially the optimal sensory is often substrate-dependent. An enhanced generic pattern recognition based contrast enhancement approach for scans of a chromatic white light sensor is introduced in Hildebrandt et al.1 using statistical, structural and Benford's law2 features for blocks of 50 micron. This approach achieves very good results for latent fingerprints on cooperative, non-textured, smooth substrates. However, on textured and structured substrates the error rates are very high and the approach thus unsuitable for forensic use cases. We propose the extension of the feature set with semantic features derived from known Gabor filter based exemplar fingerprint enhancement techniques by suggesting an Epsilon-neighborhood of each block in order to achieve an improved accuracy (called fingerprint ridge orientation semantics). Furthermore, we use rotation invariant Hu moments as an extension of the structural features and two additional preprocessing methods (separate X- and Y Sobel operators). This results in a 408-dimensional feature space. In our experiments we investigate and report the recognition accuracy for eight substrates, each with ten latent fingerprints: white furniture surface, veneered plywood, brushed stainless steel, aluminum foil, "Golden-Oak" veneer, non-metallic matte car body finish, metallic car body finish and blued metal. In comparison to Hildebrandt et al.,1 our evaluation shows a significant reduction of the error rates by 15.8 percent points on brushed stainless steel using the same classifier. This also allows for a successful biometric matching of 3 of the 8 latent fingerprint samples with the corresponding exemplar fingerprint on this particular substrate. For contrast enhancement analysis of classification results we suggest to use known Visual Quality Indexes (VQI)3 as a contrast enhancement quality indicator and discuss our first preliminary results using the exemplary chosen VQI Edge Similarity Score (ESS),4 showing a tendency that higher image differences between a substrate containing a fingerprint and a substrate with a blank surface correlate with a higher recognition accuracy between a latent fingerprint and an exemplar fingerprint. Those first preliminary results support further research into VQIs as contrast enhancement quality indicator for a given feature space.
Identification of Mobile Phones Using the Built-In Magnetometers Stimulated by Motion Patterns.
Baldini, Gianmarco; Dimc, Franc; Kamnik, Roman; Steri, Gary; Giuliani, Raimondo; Gentile, Claudio
2017-04-06
We investigate the identification of mobile phones through their built-in magnetometers. These electronic components have started to be widely deployed in mass market phones in recent years, and they can be exploited to uniquely identify mobile phones due their physical differences, which appear in the digital output generated by them. This is similar to approaches reported in the literature for other components of the mobile phone, including the digital camera, the microphones or their RF transmission components. In this paper, the identification is performed through an inexpensive device made up of a platform that rotates the mobile phone under test and a fixed magnet positioned on the edge of the rotating platform. When the mobile phone passes in front of the fixed magnet, the built-in magnetometer is stimulated, and its digital output is recorded and analyzed. For each mobile phone, the experiment is repeated over six different days to ensure consistency in the results. A total of 10 phones of different brands and models or of the same model were used in our experiment. The digital output from the magnetometers is synchronized and correlated, and statistical features are extracted to generate a fingerprint of the built-in magnetometer and, consequently, of the mobile phone. A SVM machine learning algorithm is used to classify the mobile phones on the basis of the extracted statistical features. Our results show that inter-model classification (i.e., different models and brands classification) is possible with great accuracy, but intra-model (i.e., phones with different serial numbers and same model) classification is more challenging, the resulting accuracy being just slightly above random choice.
Identification of Mobile Phones Using the Built-In Magnetometers Stimulated by Motion Patterns
Baldini, Gianmarco; Dimc, Franc; Kamnik, Roman; Steri, Gary; Giuliani, Raimondo; Gentile, Claudio
2017-01-01
We investigate the identification of mobile phones through their built-in magnetometers. These electronic components have started to be widely deployed in mass market phones in recent years, and they can be exploited to uniquely identify mobile phones due their physical differences, which appear in the digital output generated by them. This is similar to approaches reported in the literature for other components of the mobile phone, including the digital camera, the microphones or their RF transmission components. In this paper, the identification is performed through an inexpensive device made up of a platform that rotates the mobile phone under test and a fixed magnet positioned on the edge of the rotating platform. When the mobile phone passes in front of the fixed magnet, the built-in magnetometer is stimulated, and its digital output is recorded and analyzed. For each mobile phone, the experiment is repeated over six different days to ensure consistency in the results. A total of 10 phones of different brands and models or of the same model were used in our experiment. The digital output from the magnetometers is synchronized and correlated, and statistical features are extracted to generate a fingerprint of the built-in magnetometer and, consequently, of the mobile phone. A SVM machine learning algorithm is used to classify the mobile phones on the basis of the extracted statistical features. Our results show that inter-model classification (i.e., different models and brands classification) is possible with great accuracy, but intra-model (i.e., phones with different serial numbers and same model) classification is more challenging, the resulting accuracy being just slightly above random choice. PMID:28383482
Performance-scalable volumetric data classification for online industrial inspection
NASA Astrophysics Data System (ADS)
Abraham, Aby J.; Sadki, Mustapha; Lea, R. M.
2002-03-01
Non-intrusive inspection and non-destructive testing of manufactured objects with complex internal structures typically requires the enhancement, analysis and visualization of high-resolution volumetric data. Given the increasing availability of fast 3D scanning technology (e.g. cone-beam CT), enabling on-line detection and accurate discrimination of components or sub-structures, the inherent complexity of classification algorithms inevitably leads to throughput bottlenecks. Indeed, whereas typical inspection throughput requirements range from 1 to 1000 volumes per hour, depending on density and resolution, current computational capability is one to two orders-of-magnitude less. Accordingly, speeding up classification algorithms requires both reduction of algorithm complexity and acceleration of computer performance. A shape-based classification algorithm, offering algorithm complexity reduction, by using ellipses as generic descriptors of solids-of-revolution, and supporting performance-scalability, by exploiting the inherent parallelism of volumetric data, is presented. A two-stage variant of the classical Hough transform is used for ellipse detection and correlation of the detected ellipses facilitates position-, scale- and orientation-invariant component classification. Performance-scalability is achieved cost-effectively by accelerating a PC host with one or more COTS (Commercial-Off-The-Shelf) PCI multiprocessor cards. Experimental results are reported to demonstrate the feasibility and cost-effectiveness of the data-parallel classification algorithm for on-line industrial inspection applications.
Al-Rajab, Murad; Lu, Joan; Xu, Qiang
2017-07-01
This paper examines the accuracy and efficiency (time complexity) of high performance genetic data feature selection and classification algorithms for colon cancer diagnosis. The need for this research derives from the urgent and increasing need for accurate and efficient algorithms. Colon cancer is a leading cause of death worldwide, hence it is vitally important for the cancer tissues to be expertly identified and classified in a rapid and timely manner, to assure both a fast detection of the disease and to expedite the drug discovery process. In this research, a three-phase approach was proposed and implemented: Phases One and Two examined the feature selection algorithms and classification algorithms employed separately, and Phase Three examined the performance of the combination of these. It was found from Phase One that the Particle Swarm Optimization (PSO) algorithm performed best with the colon dataset as a feature selection (29 genes selected) and from Phase Two that the Support Vector Machine (SVM) algorithm outperformed other classifications, with an accuracy of almost 86%. It was also found from Phase Three that the combined use of PSO and SVM surpassed other algorithms in accuracy and performance, and was faster in terms of time analysis (94%). It is concluded that applying feature selection algorithms prior to classification algorithms results in better accuracy than when the latter are applied alone. This conclusion is important and significant to industry and society. Copyright © 2017 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Darlow, Luke N.; Akhoury, Sharat S.; Connan, James
2015-02-01
Standard surface fingerprint scanners are vulnerable to counterfeiting attacks and also failure due to skin damage and distortion. Thus a high security and damage resistant means of fingerprint acquisition is needed, providing scope for new approaches and technologies. Optical Coherence Tomography (OCT) is a high resolution imaging technology that can be used to image the human fingertip and allow for the extraction of a subsurface fingerprint. Being robust toward spoofing and damage, the subsurface fingerprint is an attractive solution. However, the nature of the OCT scanning process induces speckle: a correlative and multiplicative noise. Six speckle reducing filters for the digital enhancement of OCT fingertip scans have been evaluated. The optimized Bayesian non-local means algorithm improved the structural similarity between processed and reference images by 34%, increased the signal-to-noise ratio, and yielded the most promising visual results. An adaptive wavelet approach, originally designed for ultrasound imaging, and a speckle reducing anisotropic diffusion approach also yielded promising results. A reformulation of these in future work, with an OCT-speckle specific model, may improve their performance.
Salari, Nader; Shohaimi, Shamarina; Najafi, Farid; Nallappan, Meenakshii; Karishnarajah, Isthrinayagy
2014-01-01
Among numerous artificial intelligence approaches, k-Nearest Neighbor algorithms, genetic algorithms, and artificial neural networks are considered as the most common and effective methods in classification problems in numerous studies. In the present study, the results of the implementation of a novel hybrid feature selection-classification model using the above mentioned methods are presented. The purpose is benefitting from the synergies obtained from combining these technologies for the development of classification models. Such a combination creates an opportunity to invest in the strength of each algorithm, and is an approach to make up for their deficiencies. To develop proposed model, with the aim of obtaining the best array of features, first, feature ranking techniques such as the Fisher's discriminant ratio and class separability criteria were used to prioritize features. Second, the obtained results that included arrays of the top-ranked features were used as the initial population of a genetic algorithm to produce optimum arrays of features. Third, using a modified k-Nearest Neighbor method as well as an improved method of backpropagation neural networks, the classification process was advanced based on optimum arrays of the features selected by genetic algorithms. The performance of the proposed model was compared with thirteen well-known classification models based on seven datasets. Furthermore, the statistical analysis was performed using the Friedman test followed by post-hoc tests. The experimental findings indicated that the novel proposed hybrid model resulted in significantly better classification performance compared with all 13 classification methods. Finally, the performance results of the proposed model was benchmarked against the best ones reported as the state-of-the-art classifiers in terms of classification accuracy for the same data sets. The substantial findings of the comprehensive comparative study revealed that performance of the proposed model in terms of classification accuracy is desirable, promising, and competitive to the existing state-of-the-art classification models. PMID:25419659
Research on Remote Sensing Image Classification Based on Feature Level Fusion
NASA Astrophysics Data System (ADS)
Yuan, L.; Zhu, G.
2018-04-01
Remote sensing image classification, as an important direction of remote sensing image processing and application, has been widely studied. However, in the process of existing classification algorithms, there still exists the phenomenon of misclassification and missing points, which leads to the final classification accuracy is not high. In this paper, we selected Sentinel-1A and Landsat8 OLI images as data sources, and propose a classification method based on feature level fusion. Compare three kind of feature level fusion algorithms (i.e., Gram-Schmidt spectral sharpening, Principal Component Analysis transform and Brovey transform), and then select the best fused image for the classification experimental. In the classification process, we choose four kinds of image classification algorithms (i.e. Minimum distance, Mahalanobis distance, Support Vector Machine and ISODATA) to do contrast experiment. We use overall classification precision and Kappa coefficient as the classification accuracy evaluation criteria, and the four classification results of fused image are analysed. The experimental results show that the fusion effect of Gram-Schmidt spectral sharpening is better than other methods. In four kinds of classification algorithms, the fused image has the best applicability to Support Vector Machine classification, the overall classification precision is 94.01 % and the Kappa coefficients is 0.91. The fused image with Sentinel-1A and Landsat8 OLI is not only have more spatial information and spectral texture characteristics, but also enhances the distinguishing features of the images. The proposed method is beneficial to improve the accuracy and stability of remote sensing image classification.
Classification algorithm of lung lobe for lung disease cases based on multislice CT images
NASA Astrophysics Data System (ADS)
Matsuhiro, M.; Kawata, Y.; Niki, N.; Nakano, Y.; Mishima, M.; Ohmatsu, H.; Tsuchida, T.; Eguchi, K.; Kaneko, M.; Moriyama, N.
2011-03-01
With the development of multi-slice CT technology, to obtain an accurate 3D image of lung field in a short time is possible. To support that, a lot of image processing methods need to be developed. In clinical setting for diagnosis of lung cancer, it is important to study and analyse lung structure. Therefore, classification of lung lobe provides useful information for lung cancer analysis. In this report, we describe algorithm which classify lungs into lung lobes for lung disease cases from multi-slice CT images. The classification algorithm of lung lobes is efficiently carried out using information of lung blood vessel, bronchus, and interlobar fissure. Applying the classification algorithms to multi-slice CT images of 20 normal cases and 5 lung disease cases, we demonstrate the usefulness of the proposed algorithms.
Research on aviation unsafe incidents classification with improved TF-IDF algorithm
NASA Astrophysics Data System (ADS)
Wang, Yanhua; Zhang, Zhiyuan; Huo, Weigang
2016-05-01
The text content of Aviation Safety Confidential Reports contains a large number of valuable information. Term frequency-inverse document frequency algorithm is commonly used in text analysis, but it does not take into account the sequential relationship of the words in the text and its role in semantic expression. According to the seven category labels of civil aviation unsafe incidents, aiming at solving the problems of TF-IDF algorithm, this paper improved TF-IDF algorithm based on co-occurrence network; established feature words extraction and words sequential relations for classified incidents. Aviation domain lexicon was used to improve the accuracy rate of classification. Feature words network model was designed for multi-documents unsafe incidents classification, and it was used in the experiment. Finally, the classification accuracy of improved algorithm was verified by the experiments.
Quantum Algorithm for K-Nearest Neighbors Classification Based on the Metric of Hamming Distance
NASA Astrophysics Data System (ADS)
Ruan, Yue; Xue, Xiling; Liu, Heng; Tan, Jianing; Li, Xi
2017-11-01
K-nearest neighbors (KNN) algorithm is a common algorithm used for classification, and also a sub-routine in various complicated machine learning tasks. In this paper, we presented a quantum algorithm (QKNN) for implementing this algorithm based on the metric of Hamming distance. We put forward a quantum circuit for computing Hamming distance between testing sample and each feature vector in the training set. Taking advantage of this method, we realized a good analog for classical KNN algorithm by setting a distance threshold value t to select k - n e a r e s t neighbors. As a result, QKNN achieves O( n 3) performance which is only relevant to the dimension of feature vectors and high classification accuracy, outperforms Llyod's algorithm (Lloyd et al. 2013) and Wiebe's algorithm (Wiebe et al. 2014).
Identification and classification of hubs in brain networks.
Sporns, Olaf; Honey, Christopher J; Kötter, Rolf
2007-10-17
Brain regions in the mammalian cerebral cortex are linked by a complex network of fiber bundles. These inter-regional networks have previously been analyzed in terms of their node degree, structural motif, path length and clustering coefficient distributions. In this paper we focus on the identification and classification of hub regions, which are thought to play pivotal roles in the coordination of information flow. We identify hubs and characterize their network contributions by examining motif fingerprints and centrality indices for all regions within the cerebral cortices of both the cat and the macaque. Motif fingerprints capture the statistics of local connection patterns, while measures of centrality identify regions that lie on many of the shortest paths between parts of the network. Within both cat and macaque networks, we find that a combination of degree, motif participation, betweenness centrality and closeness centrality allows for reliable identification of hub regions, many of which have previously been functionally classified as polysensory or multimodal. We then classify hubs as either provincial (intra-cluster) hubs or connector (inter-cluster) hubs, and proceed to show that lesioning hubs of each type from the network produces opposite effects on the small-world index. Our study presents an approach to the identification and classification of putative hub regions in brain networks on the basis of multiple network attributes and charts potential links between the structural embedding of such regions and their functional roles.
CNN universal machine as classificaton platform: an art-like clustering algorithm.
Bálya, David
2003-12-01
Fast and robust classification of feature vectors is a crucial task in a number of real-time systems. A cellular neural/nonlinear network universal machine (CNN-UM) can be very efficient as a feature detector. The next step is to post-process the results for object recognition. This paper shows how a robust classification scheme based on adaptive resonance theory (ART) can be mapped to the CNN-UM. Moreover, this mapping is general enough to include different types of feed-forward neural networks. The designed analogic CNN algorithm is capable of classifying the extracted feature vectors keeping the advantages of the ART networks, such as robust, plastic and fault-tolerant behaviors. An analogic algorithm is presented for unsupervised classification with tunable sensitivity and automatic new class creation. The algorithm is extended for supervised classification. The presented binary feature vector classification is implemented on the existing standard CNN-UM chips for fast classification. The experimental evaluation shows promising performance after 100% accuracy on the training set.
Spatial Classification of Orchards and Vineyards with High Spatial Resolution Panchromatic Imagery
DOE Office of Scientific and Technical Information (OSTI.GOV)
Warner, Timothy; Steinmaus, Karen L.
2005-02-01
New high resolution single spectral band imagery offers the capability to conduct image classifications based on spatial patterns in imagery. A classification algorithm based on autocorrelation patterns was developed to automatically extract orchards and vineyards from satellite imagery. The algorithm was tested on IKONOS imagery over Granger, WA, which resulted in a classification accuracy of 95%.
Multi-label literature classification based on the Gene Ontology graph.
Jin, Bo; Muller, Brian; Zhai, Chengxiang; Lu, Xinghua
2008-12-08
The Gene Ontology is a controlled vocabulary for representing knowledge related to genes and proteins in a computable form. The current effort of manually annotating proteins with the Gene Ontology is outpaced by the rate of accumulation of biomedical knowledge in literature, which urges the development of text mining approaches to facilitate the process by automatically extracting the Gene Ontology annotation from literature. The task is usually cast as a text classification problem, and contemporary methods are confronted with unbalanced training data and the difficulties associated with multi-label classification. In this research, we investigated the methods of enhancing automatic multi-label classification of biomedical literature by utilizing the structure of the Gene Ontology graph. We have studied three graph-based multi-label classification algorithms, including a novel stochastic algorithm and two top-down hierarchical classification methods for multi-label literature classification. We systematically evaluated and compared these graph-based classification algorithms to a conventional flat multi-label algorithm. The results indicate that, through utilizing the information from the structure of the Gene Ontology graph, the graph-based multi-label classification methods can significantly improve predictions of the Gene Ontology terms implied by the analyzed text. Furthermore, the graph-based multi-label classifiers are capable of suggesting Gene Ontology annotations (to curators) that are closely related to the true annotations even if they fail to predict the true ones directly. A software package implementing the studied algorithms is available for the research community. Through utilizing the information from the structure of the Gene Ontology graph, the graph-based multi-label classification methods have better potential than the conventional flat multi-label classification approach to facilitate protein annotation based on the literature.
A Survey on Sentiment Classification in Face Recognition
NASA Astrophysics Data System (ADS)
Qian, Jingyu
2018-01-01
Face recognition has been an important topic for both industry and academia for a long time. K-means clustering, autoencoder, and convolutional neural network, each representing a design idea for face recognition method, are three popular algorithms to deal with face recognition problems. It is worthwhile to summarize and compare these three different algorithms. This paper will focus on one specific face recognition problem-sentiment classification from images. Three different algorithms for sentiment classification problems will be summarized, including k-means clustering, autoencoder, and convolutional neural network. An experiment with the application of these algorithms on a specific dataset of human faces will be conducted to illustrate how these algorithms are applied and their accuracy. Finally, the three algorithms are compared based on the accuracy result.
2015-04-01
Current routine MRI examinations rely on the acquisition of qualitative images that have a contrast "weighted" for a mixture of (magnetic) tissue properties. Recently, a novel approach was introduced, namely MR Fingerprinting (MRF) with a completely different approach to data acquisition, post-processing and visualization. Instead of using a repeated, serial acquisition of data for the characterization of individual parameters of interest, MRF uses a pseudo randomized acquisition that causes the signals from different tissues to have a unique signal evolution or 'fingerprint' that is simultaneously a function of the multiple material properties under investigation. The processing after acquisition involves a pattern recognition algorithm to match the fingerprints to a predefined dictionary of predicted signal evolutions. These can then be translated into quantitative maps of the magnetic parameters of interest. MR Fingerprinting (MRF) is a technique that could theoretically be applied to most traditional qualitative MRI methods and replaces them with acquisition of truly quantitative tissue measures. MRF is, thereby, expected to be much more accurate and reproducible than traditional MRI and should improve multi-center studies and significantly reduce reader bias when diagnostic imaging is performed. Key Points • MR fingerprinting (MRF) is a new approach to data acquisition, post-processing and visualization.• MRF provides highly accurate quantitative maps of T1, T2, proton density, diffusion.• MRF may offer multiparametric imaging with high reproducibility, and high potential for multicenter/ multivendor studies.
Land-cover classification in a moist tropical region of Brazil with Landsat TM imagery.
Li, Guiying; Lu, Dengsheng; Moran, Emilio; Hetrick, Scott
2011-01-01
This research aims to improve land-cover classification accuracy in a moist tropical region in Brazil by examining the use of different remote sensing-derived variables and classification algorithms. Different scenarios based on Landsat Thematic Mapper (TM) spectral data and derived vegetation indices and textural images, and different classification algorithms - maximum likelihood classification (MLC), artificial neural network (ANN), classification tree analysis (CTA), and object-based classification (OBC), were explored. The results indicated that a combination of vegetation indices as extra bands into Landsat TM multispectral bands did not improve the overall classification performance, but the combination of textural images was valuable for improving vegetation classification accuracy. In particular, the combination of both vegetation indices and textural images into TM multispectral bands improved overall classification accuracy by 5.6% and kappa coefficient by 6.25%. Comparison of the different classification algorithms indicated that CTA and ANN have poor classification performance in this research, but OBC improved primary forest and pasture classification accuracies. This research indicates that use of textural images or use of OBC are especially valuable for improving the vegetation classes such as upland and liana forest classes having complex stand structures and having relatively large patch sizes.
Land-cover classification in a moist tropical region of Brazil with Landsat TM imagery
LI, GUIYING; LU, DENGSHENG; MORAN, EMILIO; HETRICK, SCOTT
2011-01-01
This research aims to improve land-cover classification accuracy in a moist tropical region in Brazil by examining the use of different remote sensing-derived variables and classification algorithms. Different scenarios based on Landsat Thematic Mapper (TM) spectral data and derived vegetation indices and textural images, and different classification algorithms – maximum likelihood classification (MLC), artificial neural network (ANN), classification tree analysis (CTA), and object-based classification (OBC), were explored. The results indicated that a combination of vegetation indices as extra bands into Landsat TM multispectral bands did not improve the overall classification performance, but the combination of textural images was valuable for improving vegetation classification accuracy. In particular, the combination of both vegetation indices and textural images into TM multispectral bands improved overall classification accuracy by 5.6% and kappa coefficient by 6.25%. Comparison of the different classification algorithms indicated that CTA and ANN have poor classification performance in this research, but OBC improved primary forest and pasture classification accuracies. This research indicates that use of textural images or use of OBC are especially valuable for improving the vegetation classes such as upland and liana forest classes having complex stand structures and having relatively large patch sizes. PMID:22368311
Agent Collaborative Target Localization and Classification in Wireless Sensor Networks
Wang, Xue; Bi, Dao-wei; Ding, Liang; Wang, Sheng
2007-01-01
Wireless sensor networks (WSNs) are autonomous networks that have been frequently deployed to collaboratively perform target localization and classification tasks. Their autonomous and collaborative features resemble the characteristics of agents. Such similarities inspire the development of heterogeneous agent architecture for WSN in this paper. The proposed agent architecture views WSN as multi-agent systems and mobile agents are employed to reduce in-network communication. According to the architecture, an energy based acoustic localization algorithm is proposed. In localization, estimate of target location is obtained by steepest descent search. The search algorithm adapts to measurement environments by dynamically adjusting its termination condition. With the agent architecture, target classification is accomplished by distributed support vector machine (SVM). Mobile agents are employed for feature extraction and distributed SVM learning to reduce communication load. Desirable learning performance is guaranteed by combining support vectors and convex hull vectors. Fusion algorithms are designed to merge SVM classification decisions made from various modalities. Real world experiments with MICAz sensor nodes are conducted for vehicle localization and classification. Experimental results show the proposed agent architecture remarkably facilitates WSN designs and algorithm implementation. The localization and classification algorithms also prove to be accurate and energy efficient.
Iselin, Greg; Le Brocque, Robyne; Kenardy, Justin; Anderson, Vicki; McKinlay, Lynne
2010-10-01
Controversy surrounds the classification of posttraumatic stress disorder (PTSD), particularly in children and adolescents with traumatic brain injury (TBI). In these populations, it is difficult to differentiate TBI-related organic memory loss from dissociative amnesia. Several alternative PTSD classification algorithms have been proposed for use with children. This paper investigates DSM-IV-TR and alternative PTSD classification algorithms, including and excluding the dissociative amnesia item, in terms of their ability to predict psychosocial function following pediatric TBI. A sample of 184 children aged 6-14 years were recruited following emergency department presentation and/or hospital admission for TBI. PTSD was assessed via semi-structured clinical interview (CAPS-CA) with the child at 3 months post-injury. Psychosocial function was assessed using the parent report CHQ-PF50. Two alternative classification algorithms, the PTSD-AA and 2 of 3 algorithms, reached statistical significance. While the inclusion of the dissociative amnesia item increased prevalence rates across algorithms, it generally resulted in weaker associations with psychosocial function. The PTSD-AA algorithm appears to have the strongest association with psychosocial function following TBI in children and adolescents. Removing the dissociative amnesia item from the diagnostic algorithm generally results in improved validity. Copyright 2010 Elsevier Ltd. All rights reserved.
Spectral Domain RF Fingerprinting for 802.11 Wireless Devices
2010-03-01
induce unintentional modulation effects . If these effects (features) are sufficiently unique, it becomes possible to identify a device us- ing its...Previous AFIT research has demonstrated the effectiveness of RF Fin- gerprinting using 802.11A signals with 1) spectral correlation on Power Spectral...32 4.5. SD Intra-manufacturer Classification: Effects of Burst Location Error
Maximum Margin Clustering of Hyperspectral Data
NASA Astrophysics Data System (ADS)
Niazmardi, S.; Safari, A.; Homayouni, S.
2013-09-01
In recent decades, large margin methods such as Support Vector Machines (SVMs) are supposed to be the state-of-the-art of supervised learning methods for classification of hyperspectral data. However, the results of these algorithms mainly depend on the quality and quantity of available training data. To tackle down the problems associated with the training data, the researcher put effort into extending the capability of large margin algorithms for unsupervised learning. One of the recent proposed algorithms is Maximum Margin Clustering (MMC). The MMC is an unsupervised SVMs algorithm that simultaneously estimates both the labels and the hyperplane parameters. Nevertheless, the optimization of the MMC algorithm is a non-convex problem. Most of the existing MMC methods rely on the reformulating and the relaxing of the non-convex optimization problem as semi-definite programs (SDP), which are computationally very expensive and only can handle small data sets. Moreover, most of these algorithms are two-class classification, which cannot be used for classification of remotely sensed data. In this paper, a new MMC algorithm is used that solve the original non-convex problem using Alternative Optimization method. This algorithm is also extended for multi-class classification and its performance is evaluated. The results of the proposed algorithm show that the algorithm has acceptable results for hyperspectral data clustering.
An Extended Spectral-Spatial Classification Approach for Hyperspectral Data
NASA Astrophysics Data System (ADS)
Akbari, D.
2017-11-01
In this paper an extended classification approach for hyperspectral imagery based on both spectral and spatial information is proposed. The spatial information is obtained by an enhanced marker-based minimum spanning forest (MSF) algorithm. Three different methods of dimension reduction are first used to obtain the subspace of hyperspectral data: (1) unsupervised feature extraction methods including principal component analysis (PCA), independent component analysis (ICA), and minimum noise fraction (MNF); (2) supervised feature extraction including decision boundary feature extraction (DBFE), discriminate analysis feature extraction (DAFE), and nonparametric weighted feature extraction (NWFE); (3) genetic algorithm (GA). The spectral features obtained are then fed into the enhanced marker-based MSF classification algorithm. In the enhanced MSF algorithm, the markers are extracted from the classification maps obtained by both SVM and watershed segmentation algorithm. To evaluate the proposed approach, the Pavia University hyperspectral data is tested. Experimental results show that the proposed approach using GA achieves an approximately 8 % overall accuracy higher than the original MSF-based algorithm.
NASA Astrophysics Data System (ADS)
Yan, Dan; Bai, Lianfa; Zhang, Yi; Han, Jing
2018-02-01
For the problems of missing details and performance of the colorization based on sparse representation, we propose a conceptual model framework for colorizing gray-scale images, and then a multi-sparse dictionary colorization algorithm based on the feature classification and detail enhancement (CEMDC) is proposed based on this framework. The algorithm can achieve a natural colorized effect for a gray-scale image, and it is consistent with the human vision. First, the algorithm establishes a multi-sparse dictionary classification colorization model. Then, to improve the accuracy rate of the classification, the corresponding local constraint algorithm is proposed. Finally, we propose a detail enhancement based on Laplacian Pyramid, which is effective in solving the problem of missing details and improving the speed of image colorization. In addition, the algorithm not only realizes the colorization of the visual gray-scale image, but also can be applied to the other areas, such as color transfer between color images, colorizing gray fusion images, and infrared images.
Cao, Jianfang; Cui, Hongyan; Shi, Hao; Jiao, Lijuan
2016-01-01
A back-propagation (BP) neural network can solve complicated random nonlinear mapping problems; therefore, it can be applied to a wide range of problems. However, as the sample size increases, the time required to train BP neural networks becomes lengthy. Moreover, the classification accuracy decreases as well. To improve the classification accuracy and runtime efficiency of the BP neural network algorithm, we proposed a parallel design and realization method for a particle swarm optimization (PSO)-optimized BP neural network based on MapReduce on the Hadoop platform using both the PSO algorithm and a parallel design. The PSO algorithm was used to optimize the BP neural network’s initial weights and thresholds and improve the accuracy of the classification algorithm. The MapReduce parallel programming model was utilized to achieve parallel processing of the BP algorithm, thereby solving the problems of hardware and communication overhead when the BP neural network addresses big data. Datasets on 5 different scales were constructed using the scene image library from the SUN Database. The classification accuracy of the parallel PSO-BP neural network algorithm is approximately 92%, and the system efficiency is approximately 0.85, which presents obvious advantages when processing big data. The algorithm proposed in this study demonstrated both higher classification accuracy and improved time efficiency, which represents a significant improvement obtained from applying parallel processing to an intelligent algorithm on big data. PMID:27304987
Vehicle Classification Using an Imbalanced Dataset Based on a Single Magnetic Sensor.
Xu, Chang; Wang, Yingguan; Bao, Xinghe; Li, Fengrong
2018-05-24
This paper aims to improve the accuracy of automatic vehicle classifiers for imbalanced datasets. Classification is made through utilizing a single anisotropic magnetoresistive sensor, with the models of vehicles involved being classified into hatchbacks, sedans, buses, and multi-purpose vehicles (MPVs). Using time domain and frequency domain features in combination with three common classification algorithms in pattern recognition, we develop a novel feature extraction method for vehicle classification. These three common classification algorithms are the k-nearest neighbor, the support vector machine, and the back-propagation neural network. Nevertheless, a problem remains with the original vehicle magnetic dataset collected being imbalanced, and may lead to inaccurate classification results. With this in mind, we propose an approach called SMOTE, which can further boost the performance of classifiers. Experimental results show that the k-nearest neighbor (KNN) classifier with the SMOTE algorithm can reach a classification accuracy of 95.46%, thus minimizing the effect of the imbalance.
Carotenoids Database: structures, chemical fingerprints and distribution among organisms.
Yabuzaki, Junko
2017-01-01
To promote understanding of how organisms are related via carotenoids, either evolutionarily or symbiotically, or in food chains through natural histories, we built the Carotenoids Database. This provides chemical information on 1117 natural carotenoids with 683 source organisms. For extracting organisms closely related through the biosynthesis of carotenoids, we offer a new similarity search system 'Search similar carotenoids' using our original chemical fingerprint 'Carotenoid DB Chemical Fingerprints'. These Carotenoid DB Chemical Fingerprints describe the chemical substructure and the modification details based upon International Union of Pure and Applied Chemistry (IUPAC) semi-systematic names of the carotenoids. The fingerprints also allow (i) easier prediction of six biological functions of carotenoids: provitamin A, membrane stabilizers, odorous substances, allelochemicals, antiproliferative activity and reverse MDR activity against cancer cells, (ii) easier classification of carotenoid structures, (iii) partial and exact structure searching and (iv) easier extraction of structural isomers and stereoisomers. We believe this to be the first attempt to establish fingerprints using the IUPAC semi-systematic names. For extracting close profiled organisms, we provide a new tool 'Search similar profiled organisms'. Our current statistics show some insights into natural history: carotenoids seem to have been spread largely by bacteria, as they produce C30, C40, C45 and C50 carotenoids, with the widest range of end groups, and they share a small portion of C40 carotenoids with eukaryotes. Archaea share an even smaller portion with eukaryotes. Eukaryotes then have evolved a considerable variety of C40 carotenoids. Considering carotenoids, eukaryotes seem more closely related to bacteria than to archaea aside from 16S rRNA lineage analysis. : http://carotenoiddb.jp. © The Author(s) 2017. Published by Oxford University Press.
Performance of resonant radar target identification algorithms using intra-class weighting functions
NASA Astrophysics Data System (ADS)
Mustafa, A.
The use of calibrated resonant-region radar cross section (RCS) measurements of targets for the classification of large aircraft is discussed. Errors in the RCS estimate of full scale aircraft flying over an ocean, introduced by the ionospheric variability and the sea conditions were studied. The Weighted Target Representative (WTR) classification algorithm was developed, implemented, tested and compared with the nearest neighbor (NN) algorithm. The WTR-algorithm has a low sensitivity to the uncertainty in the aspect angle of the unknown target returns. In addition, this algorithm was based on the development of a new catalog of representative data which reduces the storage requirements and increases the computational efficiency of the classification system compared to the NN-algorithm. Experiments were designed to study and evaluate the characteristics of the WTR- and the NN-algorithms, investigate the classifiability of targets and study the relative behavior of the number of misclassifications as a function of the target backscatter features. The classification results and statistics were shown in the form of performance curves, performance tables and confusion tables.
A comparison of PCA/ICA for data preprocessing in remote sensing imagery classification
NASA Astrophysics Data System (ADS)
He, Hui; Yu, Xianchuan
2005-10-01
In this paper a performance comparison of a variety of data preprocessing algorithms in remote sensing image classification is presented. These selected algorithms are principal component analysis (PCA) and three different independent component analyses, ICA (Fast-ICA (Aapo Hyvarinen, 1999), Kernel-ICA (KCCA and KGV (Bach & Jordan, 2002), EFFICA (Aiyou Chen & Peter Bickel, 2003). These algorithms were applied to a remote sensing imagery (1600×1197), obtained from Shunyi, Beijing. For classification, a MLC method is used for the raw and preprocessed data. The results show that classification with the preprocessed data have more confident results than that with raw data and among the preprocessing algorithms, ICA algorithms improve on PCA and EFFICA performs better than the others. The convergence of these ICA algorithms (for data points more than a million) are also studied, the result shows EFFICA converges much faster than the others. Furthermore, because EFFICA is a one-step maximum likelihood estimate (MLE) which reaches asymptotic Fisher efficiency (EFFICA), it computers quite small so that its demand of memory come down greatly, which settled the "out of memory" problem occurred in the other algorithms.
Indoor Pedestrian Localization Using iBeacon and Improved Kalman Filter.
Sung, Kwangjae; Lee, Dong Kyu 'Roy'; Kim, Hwangnam
2018-05-26
The reliable and accurate indoor pedestrian positioning is one of the biggest challenges for location-based systems and applications. Most pedestrian positioning systems have drift error and large bias due to low-cost inertial sensors and random motions of human being, as well as unpredictable and time-varying radio-frequency (RF) signals used for position determination. To solve this problem, many indoor positioning approaches that integrate the user's motion estimated by dead reckoning (DR) method and the location data obtained by RSS fingerprinting through Bayesian filter, such as the Kalman filter (KF), unscented Kalman filter (UKF), and particle filter (PF), have recently been proposed to achieve higher positioning accuracy in indoor environments. Among Bayesian filtering methods, PF is the most popular integrating approach and can provide the best localization performance. However, since PF uses a large number of particles for the high performance, it can lead to considerable computational cost. This paper presents an indoor positioning system implemented on a smartphone, which uses simple dead reckoning (DR), RSS fingerprinting using iBeacon and machine learning scheme, and improved KF. The core of the system is the enhanced KF called a sigma-point Kalman particle filter (SKPF), which localize the user leveraging both the unscented transform of UKF and the weighting method of PF. The SKPF algorithm proposed in this study is used to provide the enhanced positioning accuracy by fusing positional data obtained from both DR and fingerprinting with uncertainty. The SKPF algorithm can achieve better positioning accuracy than KF and UKF and comparable performance compared to PF, and it can provide higher computational efficiency compared with PF. iBeacon in our positioning system is used for energy-efficient localization and RSS fingerprinting. We aim to design the localization scheme that can realize the high positioning accuracy, computational efficiency, and energy efficiency through the SKPF and iBeacon indoors. Empirical experiments in real environments show that the use of the SKPF algorithm and iBeacon in our indoor localization scheme can achieve very satisfactory performance in terms of localization accuracy, computational cost, and energy efficiency.
Toward a hyperspectral optical signature of extra virgin olive oil
NASA Astrophysics Data System (ADS)
Mignani, A. G.; Ciaccheri, L.; Thienpont, H.; Ottevaere, H.; Attilio, C.; Cimato, A.
2007-05-01
Italian extra virgin olive oils bearing labels of certified area of origin were considered. Their multispectral digital signature was measured by means of absorption spectroscopy in the 200-1700 nm spectral range. The instrumentation was a fiber optic-based, cheap, and compact device. The spectral data were processed by means of multivariate analysis and plotted on a 2D classification map. The map showed sharp clusters according to the geographical origin of the oils, thus demonstrating the potentials of UV-VIS-NIR spectroscopy for optical fingerprinting. Then, the spectral data were correlated to the content of the most important fatty acids. The good fitting achieved demonstrated that the optical fingerprinting can be used also for predicting nutritional and chemical parameters.
Automatic Classification Using Supervised Learning in a Medical Document Filtering Application.
ERIC Educational Resources Information Center
Mostafa, J.; Lam, W.
2000-01-01
Presents a multilevel model of the information filtering process that permits document classification. Evaluates a document classification approach based on a supervised learning algorithm, measures the accuracy of the algorithm in a neural network that was trained to classify medical documents on cell biology, and discusses filtering…
Principles of qualitative analysis in the chromatographic context.
Valcárcel, M; Cárdenas, S; Simonet, B M; Carrillo-Carrión, C
2007-07-27
This article presents the state of the art of qualitative analysis in the framework of the chromatographic analysis. After establishing the differences between two main classes of qualitative analysis (analyte identification and sample classification/qualification) the particularities of instrumental qualitative analysis are commented on. Qualitative chromatographic analysis for sample classification/qualification through the so-called chromatographic fingerprint (for complex samples) or the volatiles profile (through the direct coupling headspace-mass spectrometry using the chromatograph as interface) is discussed. Next, more technical exposition of the qualitative chromatographic information is presented supported by a variety of representative examples.
Benchmarking protein classification algorithms via supervised cross-validation.
Kertész-Farkas, Attila; Dhir, Somdutta; Sonego, Paolo; Pacurar, Mircea; Netoteia, Sergiu; Nijveen, Harm; Kuzniar, Arnold; Leunissen, Jack A M; Kocsor, András; Pongor, Sándor
2008-04-24
Development and testing of protein classification algorithms are hampered by the fact that the protein universe is characterized by groups vastly different in the number of members, in average protein size, similarity within group, etc. Datasets based on traditional cross-validation (k-fold, leave-one-out, etc.) may not give reliable estimates on how an algorithm will generalize to novel, distantly related subtypes of the known protein classes. Supervised cross-validation, i.e., selection of test and train sets according to the known subtypes within a database has been successfully used earlier in conjunction with the SCOP database. Our goal was to extend this principle to other databases and to design standardized benchmark datasets for protein classification. Hierarchical classification trees of protein categories provide a simple and general framework for designing supervised cross-validation strategies for protein classification. Benchmark datasets can be designed at various levels of the concept hierarchy using a simple graph-theoretic distance. A combination of supervised and random sampling was selected to construct reduced size model datasets, suitable for algorithm comparison. Over 3000 new classification tasks were added to our recently established protein classification benchmark collection that currently includes protein sequence (including protein domains and entire proteins), protein structure and reading frame DNA sequence data. We carried out an extensive evaluation based on various machine-learning algorithms such as nearest neighbor, support vector machines, artificial neural networks, random forests and logistic regression, used in conjunction with comparison algorithms, BLAST, Smith-Waterman, Needleman-Wunsch, as well as 3D comparison methods DALI and PRIDE. The resulting datasets provide lower, and in our opinion more realistic estimates of the classifier performance than do random cross-validation schemes. A combination of supervised and random sampling was used to construct model datasets, suitable for algorithm comparison.
NASA Astrophysics Data System (ADS)
Yuldashev, M. N.; Vlasov, A. I.; Novikov, A. N.
2018-05-01
This paper focuses on the development of an energy-efficient algorithm for classification of states of a wireless sensor network using machine learning methods. The proposed algorithm reduces energy consumption by: 1) elimination of monitoring of parameters that do not affect the state of the sensor network, 2) reduction of communication sessions over the network (the data are transmitted only if their values can affect the state of the sensor network). The studies of the proposed algorithm have shown that at classification accuracy close to 100%, the number of communication sessions can be reduced by 80%.
LDA boost classification: boosting by topics
NASA Astrophysics Data System (ADS)
Lei, La; Qiao, Guo; Qimin, Cao; Qitao, Li
2012-12-01
AdaBoost is an efficacious classification algorithm especially in text categorization (TC) tasks. The methodology of setting up a classifier committee and voting on the documents for classification can achieve high categorization precision. However, traditional Vector Space Model can easily lead to the curse of dimensionality and feature sparsity problems; so it affects classification performance seriously. This article proposed a novel classification algorithm called LDABoost based on boosting ideology which uses Latent Dirichlet Allocation (LDA) to modeling the feature space. Instead of using words or phrase, LDABoost use latent topics as the features. In this way, the feature dimension is significantly reduced. Improved Naïve Bayes (NB) is designed as the weaker classifier which keeps the efficiency advantage of classic NB algorithm and has higher precision. Moreover, a two-stage iterative weighted method called Cute Integration in this article is proposed for improving the accuracy by integrating weak classifiers into strong classifier in a more rational way. Mutual Information is used as metrics of weights allocation. The voting information and the categorization decision made by basis classifiers are fully utilized for generating the strong classifier. Experimental results reveals LDABoost making categorization in a low-dimensional space, it has higher accuracy than traditional AdaBoost algorithms and many other classic classification algorithms. Moreover, its runtime consumption is lower than different versions of AdaBoost, TC algorithms based on support vector machine and Neural Networks.
Waveform Fingerprinting for Efficient Seismic Signal Detection
NASA Astrophysics Data System (ADS)
Yoon, C. E.; OReilly, O. J.; Beroza, G. C.
2013-12-01
Cross-correlating an earthquake waveform template with continuous waveform data has proven a powerful approach for detecting events missing from earthquake catalogs. If templates do not exist, it is possible to divide the waveform data into short overlapping time windows, then identify window pairs with similar waveforms. Applying these approaches to earthquake monitoring in seismic networks has tremendous potential to improve the completeness of earthquake catalogs, but because effort scales quadratically with time, it rapidly becomes computationally infeasible. We develop a fingerprinting technique to identify similar waveforms, using only a few compact features of the original data. The concept is similar to human fingerprints, which utilize key diagnostic features to identify people uniquely. Analogous audio-fingerprinting approaches have accurately and efficiently found similar audio clips within large databases; example applications include identifying songs and finding copyrighted content within YouTube videos. In order to fingerprint waveforms, we compute a spectrogram of the time series, and segment it into multiple overlapping windows (spectral images). For each spectral image, we apply a wavelet transform, and retain only the sign of the maximum magnitude wavelet coefficients. This procedure retains just the large-scale structure of the data, providing both robustness to noise and significant dimensionality reduction. Each fingerprint is a high-dimensional, sparse, binary data object that can be stored in a database without significant storage costs. Similar fingerprints within the database are efficiently searched using locality-sensitive hashing. We test this technique on waveform data from the Northern California Seismic Network that contains events not detected in the catalog. We show that this algorithm successfully identifies similar waveforms and detects uncataloged low magnitude events in addition to cataloged events, while running to completion faster than a comparison waveform autocorrelation code.
Identification and analysis of multigene families by comparison of exon fingerprints.
Brown, N P; Whittaker, A J; Newell, W R; Rawlings, C J; Beck, S
1995-06-02
Gene families are often recognised by sequence homology using similarity searching to find relationships, however, genomic sequence data provides gene architectural information not used by conventional search methods. In particular, intron positions and phases are expected to be relatively conserved features, because mis-splicing and reading frame shifts should be selected against. A fast search technique capable of detecting possible weak sequence homologies apparent at the intron/exon level of gene organization is presented for comparing spliceosomal genes and gene fragments. FINEX compares strings of exons delimited by intron/exon boundary positions and intron phases (exon fingerprint) using a global dynamic programming algorithm with a combined intron phase identity and exon size dissimilarity score. Exon fingerprints are typically two orders of magnitude smaller than their nucleic acid sequence counterparts giving rise to fast search times: a ranked search against a library of 6755 fingerprints for a typical three exon fingerprint completes in under 30 seconds on an ordinary workstation, while a worst case largest fingerprint of 52 exons completes in just over one minute. The short "sequence" length of exon fingerprints in comparisons is compensated for by the large exon alphabet compounded of intron phase types and a wide range of exon sizes, the latter contributing the most information to alignments. FINEX performs better in some searches than conventional methods, finding matches with similar exon organization, but low sequence homology. A search using a human serum albumin finds all members of the multigene family in the FINEX database at the top of the search ranking, despite very low amino acid percentage identities between family members. The method should complement conventional sequence searching and alignment techniques, offering a means of identifying otherwise hard to detect homologies where genomic data are available.
Assessment of statistical methods used in library-based approaches to microbial source tracking.
Ritter, Kerry J; Carruthers, Ethan; Carson, C Andrew; Ellender, R D; Harwood, Valerie J; Kingsley, Kyle; Nakatsu, Cindy; Sadowsky, Michael; Shear, Brian; West, Brian; Whitlock, John E; Wiggins, Bruce A; Wilbur, Jayson D
2003-12-01
Several commonly used statistical methods for fingerprint identification in microbial source tracking (MST) were examined to assess the effectiveness of pattern-matching algorithms to correctly identify sources. Although numerous statistical methods have been employed for source identification, no widespread consensus exists as to which is most appropriate. A large-scale comparison of several MST methods, using identical fecal sources, presented a unique opportunity to assess the utility of several popular statistical methods. These included discriminant analysis, nearest neighbour analysis, maximum similarity and average similarity, along with several measures of distance or similarity. Threshold criteria for excluding uncertain or poorly matched isolates from final analysis were also examined for their ability to reduce false positives and increase prediction success. Six independent libraries used in the study were constructed from indicator bacteria isolated from fecal materials of humans, seagulls, cows and dogs. Three of these libraries were constructed using the rep-PCR technique and three relied on antibiotic resistance analysis (ARA). Five of the libraries were constructed using Escherichia coli and one using Enterococcus spp. (ARA). Overall, the outcome of this study suggests a high degree of variability across statistical methods. Despite large differences in correct classification rates among the statistical methods, no single statistical approach emerged as superior. Thresholds failed to consistently increase rates of correct classification and improvement was often associated with substantial effective sample size reduction. Recommendations are provided to aid in selecting appropriate analyses for these types of data.
Photometric Supernova Classification with Machine Learning
NASA Astrophysics Data System (ADS)
Lochner, Michelle; McEwen, Jason D.; Peiris, Hiranya V.; Lahav, Ofer; Winter, Max K.
2016-08-01
Automated photometric supernova classification has become an active area of research in recent years in light of current and upcoming imaging surveys such as the Dark Energy Survey (DES) and the Large Synoptic Survey Telescope, given that spectroscopic confirmation of type for all supernovae discovered will be impossible. Here, we develop a multi-faceted classification pipeline, combining existing and new approaches. Our pipeline consists of two stages: extracting descriptive features from the light curves and classification using a machine learning algorithm. Our feature extraction methods vary from model-dependent techniques, namely SALT2 fits, to more independent techniques that fit parametric models to curves, to a completely model-independent wavelet approach. We cover a range of representative machine learning algorithms, including naive Bayes, k-nearest neighbors, support vector machines, artificial neural networks, and boosted decision trees (BDTs). We test the pipeline on simulated multi-band DES light curves from the Supernova Photometric Classification Challenge. Using the commonly used area under the curve (AUC) of the Receiver Operating Characteristic as a metric, we find that the SALT2 fits and the wavelet approach, with the BDTs algorithm, each achieve an AUC of 0.98, where 1 represents perfect classification. We find that a representative training set is essential for good classification, whatever the feature set or algorithm, with implications for spectroscopic follow-up. Importantly, we find that by using either the SALT2 or the wavelet feature sets with a BDT algorithm, accurate classification is possible purely from light curve data, without the need for any redshift information.
NASA Astrophysics Data System (ADS)
Srinivasan, Yeshwanth; Hernes, Dana; Tulpule, Bhakti; Yang, Shuyu; Guo, Jiangling; Mitra, Sunanda; Yagneswaran, Sriraja; Nutter, Brian; Jeronimo, Jose; Phillips, Benny; Long, Rodney; Ferris, Daron
2005-04-01
Automated segmentation and classification of diagnostic markers in medical imagery are challenging tasks. Numerous algorithms for segmentation and classification based on statistical approaches of varying complexity are found in the literature. However, the design of an efficient and automated algorithm for precise classification of desired diagnostic markers is extremely image-specific. The National Library of Medicine (NLM), in collaboration with the National Cancer Institute (NCI), is creating an archive of 60,000 digitized color images of the uterine cervix. NLM is developing tools for the analysis and dissemination of these images over the Web for the study of visual features correlated with precancerous neoplasia and cancer. To enable indexing of images of the cervix, it is essential to develop algorithms for the segmentation of regions of interest, such as acetowhitened regions, and automatic identification and classification of regions exhibiting mosaicism and punctation. Success of such algorithms depends, primarily, on the selection of relevant features representing the region of interest. We present color and geometric features based statistical classification and segmentation algorithms yielding excellent identification of the regions of interest. The distinct classification of the mosaic regions from the non-mosaic ones has been obtained by clustering multiple geometric and color features of the segmented sections using various morphological and statistical approaches. Such automated classification methodologies will facilitate content-based image retrieval from the digital archive of uterine cervix and have the potential of developing an image based screening tool for cervical cancer.
Comparison Between Supervised and Unsupervised Classifications of Neuronal Cell Types: A Case Study
Guerra, Luis; McGarry, Laura M; Robles, Víctor; Bielza, Concha; Larrañaga, Pedro; Yuste, Rafael
2011-01-01
In the study of neural circuits, it becomes essential to discern the different neuronal cell types that build the circuit. Traditionally, neuronal cell types have been classified using qualitative descriptors. More recently, several attempts have been made to classify neurons quantitatively, using unsupervised clustering methods. While useful, these algorithms do not take advantage of previous information known to the investigator, which could improve the classification task. For neocortical GABAergic interneurons, the problem to discern among different cell types is particularly difficult and better methods are needed to perform objective classifications. Here we explore the use of supervised classification algorithms to classify neurons based on their morphological features, using a database of 128 pyramidal cells and 199 interneurons from mouse neocortex. To evaluate the performance of different algorithms we used, as a “benchmark,” the test to automatically distinguish between pyramidal cells and interneurons, defining “ground truth” by the presence or absence of an apical dendrite. We compared hierarchical clustering with a battery of different supervised classification algorithms, finding that supervised classifications outperformed hierarchical clustering. In addition, the selection of subsets of distinguishing features enhanced the classification accuracy for both sets of algorithms. The analysis of selected variables indicates that dendritic features were most useful to distinguish pyramidal cells from interneurons when compared with somatic and axonal morphological variables. We conclude that supervised classification algorithms are better matched to the general problem of distinguishing neuronal cell types when some information on these cell groups, in our case being pyramidal or interneuron, is known a priori. As a spin-off of this methodological study, we provide several methods to automatically distinguish neocortical pyramidal cells from interneurons, based on their morphologies. © 2010 Wiley Periodicals, Inc. Develop Neurobiol 71: 71–82, 2011 PMID:21154911
Na, X D; Zang, S Y; Wu, C S; Li, W L
2015-11-01
Knowledge of the spatial extent of forested wetlands is essential to many studies including wetland functioning assessment, greenhouse gas flux estimation, and wildlife suitable habitat identification. For discriminating forested wetlands from their adjacent land cover types, researchers have resorted to image analysis techniques applied to numerous remotely sensed data. While with some success, there is still no consensus on the optimal approaches for mapping forested wetlands. To address this problem, we examined two machine learning approaches, random forest (RF) and K-nearest neighbor (KNN) algorithms, and applied these two approaches to the framework of pixel-based and object-based classifications. The RF and KNN algorithms were constructed using predictors derived from Landsat 8 imagery, Radarsat-2 advanced synthetic aperture radar (SAR), and topographical indices. The results show that the objected-based classifications performed better than per-pixel classifications using the same algorithm (RF) in terms of overall accuracy and the difference of their kappa coefficients are statistically significant (p<0.01). There were noticeably omissions for forested and herbaceous wetlands based on the per-pixel classifications using the RF algorithm. As for the object-based image analysis, there were also statistically significant differences (p<0.01) of Kappa coefficient between results performed based on RF and KNN algorithms. The object-based classification using RF provided a more visually adequate distribution of interested land cover types, while the object classifications based on the KNN algorithm showed noticeably commissions for forested wetlands and omissions for agriculture land. This research proves that the object-based classification with RF using optical, radar, and topographical data improved the mapping accuracy of land covers and provided a feasible approach to discriminate the forested wetlands from the other land cover types in forestry area.
Performance of Activity Classification Algorithms in Free-living Older Adults
Sasaki, Jeffer Eidi; Hickey, Amanda; Staudenmayer, John; John, Dinesh; Kent, Jane A.; Freedson, Patty S.
2015-01-01
Purpose To compare activity type classification rates of machine learning algorithms trained on laboratory versus free-living accelerometer data in older adults. Methods Thirty-five older adults (21F and 14M ; 70.8 ± 4.9 y) performed selected activities in the laboratory while wearing three ActiGraph GT3X+ activity monitors (dominant hip, wrist, and ankle). Monitors were initialized to collect raw acceleration data at a sampling rate of 80 Hz. Fifteen of the participants also wore the GT3X+ in free-living settings and were directly observed for 2-3 hours. Time- and frequency- domain features from acceleration signals of each monitor were used to train Random Forest (RF) and Support Vector Machine (SVM) models to classify five activity types: sedentary, standing, household, locomotion, and recreational activities. All algorithms were trained on lab data (RFLab and SVMLab) and free-living data (RFFL and SVMFL) using 20 s signal sampling windows. Classification accuracy rates of both types of algorithms were tested on free-living data using a leave-one-out technique. Results Overall classification accuracy rates for the algorithms developed from lab data were between 49% (wrist) to 55% (ankle) for the SVMLab algorithms, and 49% (wrist) to 54% (ankle) for RFLab algorithms. The classification accuracy rates for SVMFL and RFFL algorithms ranged from 58% (wrist) to 69% (ankle) and from 61% (wrist) to 67% (ankle), respectively. Conclusion Our algorithms developed on free-living accelerometer data were more accurate in classifying activity type in free-living older adults than our algorithms developed on laboratory accelerometer data. Future studies should consider using free-living accelerometer data to train machine-learning algorithms in older adults. PMID:26673129
Performance of Activity Classification Algorithms in Free-Living Older Adults.
Sasaki, Jeffer Eidi; Hickey, Amanda M; Staudenmayer, John W; John, Dinesh; Kent, Jane A; Freedson, Patty S
2016-05-01
The objective of this study is to compare activity type classification rates of machine learning algorithms trained on laboratory versus free-living accelerometer data in older adults. Thirty-five older adults (21 females and 14 males, 70.8 ± 4.9 yr) performed selected activities in the laboratory while wearing three ActiGraph GT3X+ activity monitors (in the dominant hip, wrist, and ankle; ActiGraph, LLC, Pensacola, FL). Monitors were initialized to collect raw acceleration data at a sampling rate of 80 Hz. Fifteen of the participants also wore GT3X+ in free-living settings and were directly observed for 2-3 h. Time- and frequency-domain features from acceleration signals of each monitor were used to train random forest (RF) and support vector machine (SVM) models to classify five activity types: sedentary, standing, household, locomotion, and recreational activities. All algorithms were trained on laboratory data (RFLab and SVMLab) and free-living data (RFFL and SVMFL) using 20-s signal sampling windows. Classification accuracy rates of both types of algorithms were tested on free-living data using a leave-one-out technique. Overall classification accuracy rates for the algorithms developed from laboratory data were between 49% (wrist) and 55% (ankle) for the SVMLab algorithms and 49% (wrist) to 54% (ankle) for the RFLab algorithms. The classification accuracy rates for SVMFL and RFFL algorithms ranged from 58% (wrist) to 69% (ankle) and from 61% (wrist) to 67% (ankle), respectively. Our algorithms developed on free-living accelerometer data were more accurate in classifying the activity type in free-living older adults than those on our algorithms developed on laboratory accelerometer data. Future studies should consider using free-living accelerometer data to train machine learning algorithms in older adults.
Earthquake detection through computationally efficient similarity search
Yoon, Clara E.; O’Reilly, Ossian; Bergen, Karianne J.; Beroza, Gregory C.
2015-01-01
Seismology is experiencing rapid growth in the quantity of data, which has outpaced the development of processing algorithms. Earthquake detection—identification of seismic events in continuous data—is a fundamental operation for observational seismology. We developed an efficient method to detect earthquakes using waveform similarity that overcomes the disadvantages of existing detection methods. Our method, called Fingerprint And Similarity Thresholding (FAST), can analyze a week of continuous seismic waveform data in less than 2 hours, or 140 times faster than autocorrelation. FAST adapts a data mining algorithm, originally designed to identify similar audio clips within large databases; it first creates compact “fingerprints” of waveforms by extracting key discriminative features, then groups similar fingerprints together within a database to facilitate fast, scalable search for similar fingerprint pairs, and finally generates a list of earthquake detections. FAST detected most (21 of 24) cataloged earthquakes and 68 uncataloged earthquakes in 1 week of continuous data from a station located near the Calaveras Fault in central California, achieving detection performance comparable to that of autocorrelation, with some additional false detections. FAST is expected to realize its full potential when applied to extremely long duration data sets over a distributed network of seismic stations. The widespread application of FAST has the potential to aid in the discovery of unexpected seismic signals, improve seismic monitoring, and promote a greater understanding of a variety of earthquake processes. PMID:26665176
Using information from historical high-throughput screens to predict active compounds.
Riniker, Sereina; Wang, Yuan; Jenkins, Jeremy L; Landrum, Gregory A
2014-07-28
Modern high-throughput screening (HTS) is a well-established approach for hit finding in drug discovery that is routinely employed in the pharmaceutical industry to screen more than a million compounds within a few weeks. However, as the industry shifts to more disease-relevant but more complex phenotypic screens, the focus has moved to piloting smaller but smarter chemically/biologically diverse subsets followed by an expansion around hit compounds. One standard method for doing this is to train a machine-learning (ML) model with the chemical fingerprints of the tested subset of molecules and then select the next compounds based on the predictions of this model. An alternative approach would be to take advantage of the wealth of bioactivity information contained in older (full-deck) screens using so-called HTS fingerprints, where each element of the fingerprint corresponds to the outcome of a particular assay, as input to machine-learning algorithms. We constructed HTS fingerprints using two collections of data: 93 in-house assays and 95 publicly available assays from PubChem. For each source, an additional set of 51 and 46 assays, respectively, was collected for testing. Three different ML methods, random forest (RF), logistic regression (LR), and naïve Bayes (NB), were investigated for both the HTS fingerprint and a chemical fingerprint, Morgan2. RF was found to be best suited for learning from HTS fingerprints yielding area under the receiver operating characteristic curve (AUC) values >0.8 for 78% of the internal assays and enrichment factors at 5% (EF(5%)) >10 for 55% of the assays. The RF(HTS-fp) generally outperformed the LR trained with Morgan2, which was the best ML method for the chemical fingerprint, for the majority of assays. In addition, HTS fingerprints were found to retrieve more diverse chemotypes. Combining the two models through heterogeneous classifier fusion led to a similar or better performance than the best individual model for all assays. Further validation using a pair of in-house assays and data from a confirmatory screen--including a prospective set of around 2000 compounds selected based on our approach--confirmed the good performance. Thus, the combination of machine-learning with HTS fingerprints and chemical fingerprints utilizes information from both domains and presents a very promising approach for hit expansion, leading to more hits. The source code used with the public data is provided.
Alshamlan, Hala; Badr, Ghada; Alohali, Yousef
2015-01-01
An artificial bee colony (ABC) is a relatively recent swarm intelligence optimization approach. In this paper, we propose the first attempt at applying ABC algorithm in analyzing a microarray gene expression profile. In addition, we propose an innovative feature selection algorithm, minimum redundancy maximum relevance (mRMR), and combine it with an ABC algorithm, mRMR-ABC, to select informative genes from microarray profile. The new approach is based on a support vector machine (SVM) algorithm to measure the classification accuracy for selected genes. We evaluate the performance of the proposed mRMR-ABC algorithm by conducting extensive experiments on six binary and multiclass gene expression microarray datasets. Furthermore, we compare our proposed mRMR-ABC algorithm with previously known techniques. We reimplemented two of these techniques for the sake of a fair comparison using the same parameters. These two techniques are mRMR when combined with a genetic algorithm (mRMR-GA) and mRMR when combined with a particle swarm optimization algorithm (mRMR-PSO). The experimental results prove that the proposed mRMR-ABC algorithm achieves accurate classification performance using small number of predictive genes when tested using both datasets and compared to previously suggested methods. This shows that mRMR-ABC is a promising approach for solving gene selection and cancer classification problems. PMID:25961028
Alshamlan, Hala; Badr, Ghada; Alohali, Yousef
2015-01-01
An artificial bee colony (ABC) is a relatively recent swarm intelligence optimization approach. In this paper, we propose the first attempt at applying ABC algorithm in analyzing a microarray gene expression profile. In addition, we propose an innovative feature selection algorithm, minimum redundancy maximum relevance (mRMR), and combine it with an ABC algorithm, mRMR-ABC, to select informative genes from microarray profile. The new approach is based on a support vector machine (SVM) algorithm to measure the classification accuracy for selected genes. We evaluate the performance of the proposed mRMR-ABC algorithm by conducting extensive experiments on six binary and multiclass gene expression microarray datasets. Furthermore, we compare our proposed mRMR-ABC algorithm with previously known techniques. We reimplemented two of these techniques for the sake of a fair comparison using the same parameters. These two techniques are mRMR when combined with a genetic algorithm (mRMR-GA) and mRMR when combined with a particle swarm optimization algorithm (mRMR-PSO). The experimental results prove that the proposed mRMR-ABC algorithm achieves accurate classification performance using small number of predictive genes when tested using both datasets and compared to previously suggested methods. This shows that mRMR-ABC is a promising approach for solving gene selection and cancer classification problems.
Using Differential Evolution to Optimize Learning from Signals and Enhance Network Security
DOE Office of Scientific and Technical Information (OSTI.GOV)
Harmer, Paul K; Temple, Michael A; Buckner, Mark A
2011-01-01
Computer and communication network attacks are commonly orchestrated through Wireless Access Points (WAPs). This paper summarizes proof-of-concept research activity aimed at developing a physical layer Radio Frequency (RF) air monitoring capability to limit unauthorizedWAP access and mprove network security. This is done using Differential Evolution (DE) to optimize the performance of a Learning from Signals (LFS) classifier implemented with RF Distinct Native Attribute (RF-DNA) fingerprints. Performance of the resultant DE-optimized LFS classifier is demonstrated using 802.11a WiFi devices under the most challenging conditions of intra-manufacturer classification, i.e., using emissions of like-model devices that only differ in serial number. Using identicalmore » classifier input features, performance of the DE-optimized LFS classifier is assessed relative to a Multiple Discriminant Analysis / Maximum Likelihood (MDA/ML) classifier that has been used for previous demonstrations. The comparative assessment is made using both Time Domain (TD) and Spectral Domain (SD) fingerprint features. For all combinations of classifier type, feature type, and signal-to-noise ratio considered, results show that the DEoptimized LFS classifier with TD features is uperior and provides up to 20% improvement in classification accuracy with proper selection of DE parameters.« less
Hettick, Justin M; Green, Brett J; Buskirk, Amanda D; Kashon, Michael L; Slaven, James E; Janotka, Erika; Blachere, Francoise M; Schmechel, Detlef; Beezhold, Donald H
2008-09-15
Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) was used to generate highly reproducible mass spectral fingerprints for 12 species of fungi of the genus Aspergillus and 5 different strains of Aspergillus flavus. Prior to MALDI-TOF MS analysis, the fungi were subjected to three 1-min bead beating cycles in an acetonitrile/trifluoroacetic acid solvent. The mass spectra contain abundant peaks in the range of 5 to 20kDa and may be used to discriminate between species unambiguously. A discriminant analysis using all peaks from the MALDI-TOF MS data yielded error rates for classification of 0 and 18.75% for resubstitution and cross-validation methods, respectively. If a subset of 28 significant peaks is chosen, resubstitution and cross-validation error rates are 0%. Discriminant analysis of the MALDI-TOF MS data for 5 strains of A. flavus using all peaks yielded error rates for classification of 0 and 5% for resubstitution and cross-validation methods, respectively. These data indicate that MALDI-TOF MS data may be used for unambiguous identification of members of the genus Aspergillus at both the species and strain levels.
Aircraft target detection algorithm based on high resolution spaceborne SAR imagery
NASA Astrophysics Data System (ADS)
Zhang, Hui; Hao, Mengxi; Zhang, Cong; Su, Xiaojing
2018-03-01
In this paper, an image classification algorithm for airport area is proposed, which based on the statistical features of synthetic aperture radar (SAR) images and the spatial information of pixels. The algorithm combines Gamma mixture model and MRF. The algorithm using Gamma mixture model to obtain the initial classification result. Pixel space correlation based on the classification results are optimized by the MRF technique. Additionally, morphology methods are employed to extract airport (ROI) region where the suspected aircraft target samples are clarified to reduce the false alarm and increase the detection performance. Finally, this paper presents the plane target detection, which have been verified by simulation test.
The software application and classification algorithms for welds radiograms analysis
NASA Astrophysics Data System (ADS)
Sikora, R.; Chady, T.; Baniukiewicz, P.; Grzywacz, B.; Lopato, P.; Misztal, L.; Napierała, L.; Piekarczyk, B.; Pietrusewicz, T.; Psuj, G.
2013-01-01
The paper presents a software implementation of an Intelligent System for Radiogram Analysis (ISAR). The system has to support radiologists in welds quality inspection. The image processing part of software with a graphical user interface and a welds classification part are described with selected classification results. Classification was based on a few algorithms: an artificial neural network, a k-means clustering, a simplified k-means and a rough sets theory.
Android Malware Classification Using K-Means Clustering Algorithm
NASA Astrophysics Data System (ADS)
Hamid, Isredza Rahmi A.; Syafiqah Khalid, Nur; Azma Abdullah, Nurul; Rahman, Nurul Hidayah Ab; Chai Wen, Chuah
2017-08-01
Malware was designed to gain access or damage a computer system without user notice. Besides, attacker exploits malware to commit crime or fraud. This paper proposed Android malware classification approach based on K-Means clustering algorithm. We evaluate the proposed model in terms of accuracy using machine learning algorithms. Two datasets were selected to demonstrate the practicing of K-Means clustering algorithms that are Virus Total and Malgenome dataset. We classify the Android malware into three clusters which are ransomware, scareware and goodware. Nine features were considered for each types of dataset such as Lock Detected, Text Detected, Text Score, Encryption Detected, Threat, Porn, Law, Copyright and Moneypak. We used IBM SPSS Statistic software for data classification and WEKA tools to evaluate the built cluster. The proposed K-Means clustering algorithm shows promising result with high accuracy when tested using Random Forest algorithm.
Pattern Classifications Using Grover's and Ventura's Algorithms in a Two-qubits System
NASA Astrophysics Data System (ADS)
Singh, Manu Pratap; Radhey, Kishori; Rajput, B. S.
2018-03-01
Carrying out the classification of patterns in a two-qubit system by separately using Grover's and Ventura's algorithms on different possible superposition, it has been shown that the exclusion superposition and the phase-invariance superposition are the most suitable search states obtained from two-pattern start-states and one-pattern start-states, respectively, for the simultaneous classifications of patterns. The higher effectiveness of Grover's algorithm for large search states has been verified but the higher effectiveness of Ventura's algorithm for smaller data base has been contradicted in two-qubit systems and it has been demonstrated that the unknown patterns (not present in the concerned data-base) are classified more efficiently than the known ones (present in the data-base) in both the algorithms. It has also been demonstrated that different states of Singh-Rajput MES obtained from the corresponding self-single- pattern start states are the most suitable search states for the classification of patterns |00>,|01 >, |10> and |11> respectively on the second iteration of Grover's method or the first operation of Ventura's algorithm.
NASA Astrophysics Data System (ADS)
Huang, Ding-jiang; Ivanova, Nataliya M.
2016-02-01
In this paper, we explain in more details the modern treatment of the problem of group classification of (systems of) partial differential equations (PDEs) from the algorithmic point of view. More precisely, we revise the classical Lie algorithm of construction of symmetries of differential equations, describe the group classification algorithm and discuss the process of reduction of (systems of) PDEs to (systems of) equations with smaller number of independent variables in order to construct invariant solutions. The group classification algorithm and reduction process are illustrated by the example of the generalized Zakharov-Kuznetsov (GZK) equations of form ut +(F (u)) xxx +(G (u)) xyy +(H (u)) x = 0. As a result, a complete group classification of the GZK equations is performed and a number of new interesting nonlinear invariant models which have non-trivial invariance algebras are obtained. Lie symmetry reductions and exact solutions for two important invariant models, i.e., the classical and modified Zakharov-Kuznetsov equations, are constructed. The algorithmic framework for group analysis of differential equations presented in this paper can also be applied to other nonlinear PDEs.
NASA Astrophysics Data System (ADS)
Copeland, Patricia L.; Shugars, James
1997-02-01
The Federal Bureau of Investigation (FBI) is currently developing a new system to provide timely criminal and civil identities and criminal history information to the nation's local, state, and federal users. The Integrated Automated Fingerprint Identification System (IAFIS), an upgrade to the existing Identification Division Automated Services (IDAS) System, is scheduled for implementation in 1999 at the new FBI facility in Clarksburg, West Virginia. IAFIS will offer new capabilities for electronic transmittal of fingerprint cards to the FBI, an improved fingerprint matching algorithm, and electronic maintenance of fingerprints and photo images. The Interstate Identification Index (III/FBI) System is one of three segments comprising the umbrella IAFIS System. III/FBI provides repository, maintenance, and dissemination capabilities for the 40 million subject national criminal history database. III/FBI will perform over 1 million name searches each day. Demanding performance, reliability/maintainability/availability, and flexibility/expandability requirements make III/FBI an architectural challenge to the system developers. This paper will discuss these driving requirements and present the technical solutions in terms of leading edge hardware and software.
Analysis of Sources of Large Positioning Errors in Deterministic Fingerprinting
2017-01-01
Wi-Fi fingerprinting is widely used for indoor positioning and indoor navigation due to the ubiquity of wireless networks, high proliferation of Wi-Fi-enabled mobile devices, and its reasonable positioning accuracy. The assumption is that the position can be estimated based on the received signal strength intensity from multiple wireless access points at a given point. The positioning accuracy, within a few meters, enables the use of Wi-Fi fingerprinting in many different applications. However, it has been detected that the positioning error might be very large in a few cases, which might prevent its use in applications with high accuracy positioning requirements. Hybrid methods are the new trend in indoor positioning since they benefit from multiple diverse technologies (Wi-Fi, Bluetooth, and Inertial Sensors, among many others) and, therefore, they can provide a more robust positioning accuracy. In order to have an optimal combination of technologies, it is crucial to identify when large errors occur and prevent the use of extremely bad positioning estimations in hybrid algorithms. This paper investigates why large positioning errors occur in Wi-Fi fingerprinting and how to detect them by using the received signal strength intensities. PMID:29186921
NASA Technical Reports Server (NTRS)
Maslanik, J. A.; Key, J.
1992-01-01
An expert system framework has been developed to classify sea ice types using satellite passive microwave data, an operational classification algorithm, spatial and temporal information, ice types estimated from a dynamic-thermodynamic model, output from a neural network that detects the onset of melt, and knowledge about season and region. The rule base imposes boundary conditions upon the ice classification, modifies parameters in the ice algorithm, determines a `confidence' measure for the classified data, and under certain conditions, replaces the algorithm output with model output. Results demonstrate the potential power of such a system for minimizing overall error in the classification and for providing non-expert data users with a means of assessing the usefulness of the classification results for their applications.
NASA Astrophysics Data System (ADS)
Merkel, Ronny; Gruhn, Stefan; Dittmann, Jana; Vielhauer, Claus; Bräutigam, Anja
2012-03-01
Determining the age of latent fingerprint traces found at crime scenes is an unresolved research issue since decades. Solving this issue could provide criminal investigators with the specific time a fingerprint trace was left on a surface, and therefore would enable them to link potential suspects to the time a crime took place as well as to reconstruct the sequence of events or eliminate irrelevant fingerprints to ensure privacy constraints. Transferring imaging techniques from different application areas, such as 3D image acquisition, surface measurement and chemical analysis to the domain of lifting latent biometric fingerprint traces is an upcoming trend in forensics. Such non-destructive sensor devices might help to solve the challenge of determining the age of a latent fingerprint trace, since it provides the opportunity to create time series and process them using pattern recognition techniques and statistical methods on digitized 2D, 3D and chemical data, rather than classical, contact-based capturing techniques, which alter the fingerprint trace and therefore make continuous scans impossible. In prior work, we have suggested to use a feature called binary pixel, which is a novel approach in the working field of fingerprint age determination. The feature uses a Chromatic White Light (CWL) image sensor to continuously scan a fingerprint trace over time and retrieves a characteristic logarithmic aging tendency for 2D-intensity as well as 3D-topographic images from the sensor. In this paper, we propose to combine such two characteristic aging features with other 2D and 3D features from the domains of surface measurement, microscopy, photography and spectroscopy, to achieve an increase in accuracy and reliability of a potential future age determination scheme. Discussing the feasibility of such variety of sensor devices and possible aging features, we propose a general fusion approach, which might combine promising features to a joint age determination scheme in future. We furthermore demonstrate the feasibility of the introduced approach by exemplary fusing the binary pixel features based on 2D-intensity and 3D-topographic images of the mentioned CWL sensor. We conclude that a formula based age determination approach requires very precise image data, which cannot be achieved at the moment, whereas a machine learning based classification approach seems to be feasible, if an adequate amount of features can be provided.
Optimizing Support Vector Machine Parameters with Genetic Algorithm for Credit Risk Assessment
NASA Astrophysics Data System (ADS)
Manurung, Jonson; Mawengkang, Herman; Zamzami, Elviawaty
2017-12-01
Support vector machine (SVM) is a popular classification method known to have strong generalization capabilities. SVM can solve the problem of classification and linear regression or nonlinear kernel which can be a learning algorithm for the ability of classification and regression. However, SVM also has a weakness that is difficult to determine the optimal parameter value. SVM calculates the best linear separator on the input feature space according to the training data. To classify data which are non-linearly separable, SVM uses kernel tricks to transform the data into a linearly separable data on a higher dimension feature space. The kernel trick using various kinds of kernel functions, such as : linear kernel, polynomial, radial base function (RBF) and sigmoid. Each function has parameters which affect the accuracy of SVM classification. To solve the problem genetic algorithms are proposed to be applied as the optimal parameter value search algorithm thus increasing the best classification accuracy on SVM. Data taken from UCI repository of machine learning database: Australian Credit Approval. The results show that the combination of SVM and genetic algorithms is effective in improving classification accuracy. Genetic algorithms has been shown to be effective in systematically finding optimal kernel parameters for SVM, instead of randomly selected kernel parameters. The best accuracy for data has been upgraded from kernel Linear: 85.12%, polynomial: 81.76%, RBF: 77.22% Sigmoid: 78.70%. However, for bigger data sizes, this method is not practical because it takes a lot of time.
PHOTOMETRIC SUPERNOVA CLASSIFICATION WITH MACHINE LEARNING
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lochner, Michelle; Peiris, Hiranya V.; Lahav, Ofer
Automated photometric supernova classification has become an active area of research in recent years in light of current and upcoming imaging surveys such as the Dark Energy Survey (DES) and the Large Synoptic Survey Telescope, given that spectroscopic confirmation of type for all supernovae discovered will be impossible. Here, we develop a multi-faceted classification pipeline, combining existing and new approaches. Our pipeline consists of two stages: extracting descriptive features from the light curves and classification using a machine learning algorithm. Our feature extraction methods vary from model-dependent techniques, namely SALT2 fits, to more independent techniques that fit parametric models tomore » curves, to a completely model-independent wavelet approach. We cover a range of representative machine learning algorithms, including naive Bayes, k -nearest neighbors, support vector machines, artificial neural networks, and boosted decision trees (BDTs). We test the pipeline on simulated multi-band DES light curves from the Supernova Photometric Classification Challenge. Using the commonly used area under the curve (AUC) of the Receiver Operating Characteristic as a metric, we find that the SALT2 fits and the wavelet approach, with the BDTs algorithm, each achieve an AUC of 0.98, where 1 represents perfect classification. We find that a representative training set is essential for good classification, whatever the feature set or algorithm, with implications for spectroscopic follow-up. Importantly, we find that by using either the SALT2 or the wavelet feature sets with a BDT algorithm, accurate classification is possible purely from light curve data, without the need for any redshift information.« less
Hakimzadeh, Neda; Parastar, Hadi; Fattahi, Mohammad
2014-01-24
In this study, multivariate curve resolution (MCR) and multivariate classification methods are proposed to develop a new chemometric strategy for comprehensive analysis of high-performance liquid chromatography-diode array absorbance detection (HPLC-DAD) fingerprints of sixty Salvia reuterana samples from five different geographical regions. Different chromatographic problems occurred during HPLC-DAD analysis of S. reuterana samples, such as baseline/background contribution and noise, low signal-to-noise ratio (S/N), asymmetric peaks, elution time shifts, and peak overlap are handled using the proposed strategy. In this way, chromatographic fingerprints of sixty samples are properly segmented to ten common chromatographic regions using local rank analysis and then, the corresponding segments are column-wise augmented for subsequent MCR analysis. Extended multivariate curve resolution-alternating least squares (MCR-ALS) is used to obtain pure component profiles in each segment. In general, thirty-one chemical components were resolved using MCR-ALS in sixty S. reuterana samples and the lack of fit (LOF) values of MCR-ALS models were below 10.0% in all cases. Pure spectral profiles are considered for identification of chemical components by comparing their resolved spectra with the standard ones and twenty-four components out of thirty-one components were identified. Additionally, pure elution profiles are used to obtain relative concentrations of chemical components in different samples for multivariate classification analysis by principal component analysis (PCA) and k-nearest neighbors (kNN). Inspection of the PCA score plot (explaining 76.1% of variance accounted for three PCs) showed that S. reuterana samples belong to four clusters. The degree of class separation (DCS) which quantifies the distance separating clusters in relation to the scatter within each cluster is calculated for four clusters and it was in the range of 1.6-5.8. These results are then confirmed by kNN. In addition, according to the PCA loading plot and kNN dendrogram of thirty-one variables, five chemical constituents of luteolin-7-o-glucoside, salvianolic acid D, rosmarinic acid, lithospermic acid and trijuganone A are identified as the most important variables (i.e., chemical markers) for clusters discrimination. Finally, the effect of different chemical markers on samples differentiation is investigated using counter-propagation artificial neural network (CP-ANN) method. It is concluded that the proposed strategy can be successfully applied for comprehensive analysis of chromatographic fingerprints of complex natural samples. Copyright © 2013 Elsevier B.V. All rights reserved.
Classification of Odours for Mobile Robots Using an Ensemble of Linear Classifiers
NASA Astrophysics Data System (ADS)
Trincavelli, Marco; Coradeschi, Silvia; Loutfi, Amy
2009-05-01
This paper investigates the classification of odours using an electronic nose mounted on a mobile robot. The samples are collected as the robot explores the environment. Under such conditions, the sensor response differs from typical three phase sampling processes. In this paper, we focus particularly on the classification problem and how it is influenced by the movement of the robot. To cope with these influences, an algorithm consisting of an ensemble of classifiers is presented. Experimental results show that this algorithm increases classification performance compared to other traditional classification methods.
Gonzalez Viejo, Claudia; Fuentes, Sigfredo; Torrico, Damir; Howell, Kate; Dunshea, Frank R
2018-01-01
Beer quality is mainly defined by its colour, foamability and foam stability, which are influenced by the chemical composition of the product such as proteins, carbohydrates, pH and alcohol. Traditional methods to assess specific chemical compounds are usually time-consuming and costly. This study used rapid methods to evaluate 15 foam and colour-related parameters using a robotic pourer (RoboBEER) and chemical fingerprinting using near infrared spectroscopy (NIR) from six replicates of 21 beers from three types of fermentation. Results from NIR were used to create partial least squares regression (PLS) and artificial neural networks (ANN) models to predict four chemometrics such as pH, alcohol, Brix and maximum volume of foam. The ANN method was able to create more accurate models (R 2 = 0.95) compared to PLS. Principal components analysis using RoboBEER parameters and NIR overtones related to protein explained 67% of total data variability. Additionally, a sub-space discriminant model using the absorbance values from NIR wavelengths resulted in the successful classification of 85% of beers according to fermentation type. The method proposed showed to be a rapid system based on NIR spectroscopy and RoboBEER outputs of foamability that can be used to infer the quality, production method and chemical parameters of beer with minimal laboratory equipment. © 2017 Society of Chemical Industry. © 2017 Society of Chemical Industry.
Based on the CSI regional segmentation indoor localization algorithm
NASA Astrophysics Data System (ADS)
Zeng, Xi; Lin, Wei; Lan, Jingwei
2017-08-01
To solve the problem of high cost and low accuracy, the method of Channel State Information (CSI) regional segmentation are proposed in the indoor positioning. Because Channel State Information (CSI) stability, and effective against multipath effect, we used the Channel State Information (CSI) to segment location area. The method Acquisition CSI the influence of different link to pinpoint the location of the area. Then the method can improve the accuracy of positioning, and reduce the cost of the fingerprint localization algorithm.
Data fusion for target tracking and classification with wireless sensor network
NASA Astrophysics Data System (ADS)
Pannetier, Benjamin; Doumerc, Robin; Moras, Julien; Dezert, Jean; Canevet, Loic
2016-10-01
In this paper, we address the problem of multiple ground target tracking and classification with information obtained from a unattended wireless sensor network. A multiple target tracking (MTT) algorithm, taking into account road and vegetation information, is proposed based on a centralized architecture. One of the key issue is how to adapt classical MTT approach to satisfy embedded processing. Based on track statistics, the classification algorithm uses estimated location, velocity and acceleration to help to classify targets. The algorithms enables tracking human and vehicles driving both on and off road. We integrate road or trail width and vegetation cover, as constraints in target motion models to improve performance of tracking under constraint with classification fusion. Our algorithm also presents different dynamic models, to palliate the maneuvers of targets. The tracking and classification algorithms are integrated into an operational platform (the fusion node). In order to handle realistic ground target tracking scenarios, we use an autonomous smart computer deposited in the surveillance area. After the calibration step of the heterogeneous sensor network, our system is able to handle real data from a wireless ground sensor network. The performance of system is evaluated in a real exercise for intelligence operation ("hunter hunt" scenario).
Lossless Compression of Classification-Map Data
NASA Technical Reports Server (NTRS)
Hua, Xie; Klimesh, Matthew
2009-01-01
A lossless image-data-compression algorithm intended specifically for application to classification-map data is based on prediction, context modeling, and entropy coding. The algorithm was formulated, in consideration of the differences between classification maps and ordinary images of natural scenes, so as to be capable of compressing classification- map data more effectively than do general-purpose image-data-compression algorithms. Classification maps are typically generated from remote-sensing images acquired by instruments aboard aircraft (see figure) and spacecraft. A classification map is a synthetic image that summarizes information derived from one or more original remote-sensing image(s) of a scene. The value assigned to each pixel in such a map is the index of a class that represents some type of content deduced from the original image data for example, a type of vegetation, a mineral, or a body of water at the corresponding location in the scene. When classification maps are generated onboard the aircraft or spacecraft, it is desirable to compress the classification-map data in order to reduce the volume of data that must be transmitted to a ground station.
Balouchestani, Mohammadreza; Krishnan, Sridhar
2014-01-01
Long-term recording of Electrocardiogram (ECG) signals plays an important role in health care systems for diagnostic and treatment purposes of heart diseases. Clustering and classification of collecting data are essential parts for detecting concealed information of P-QRS-T waves in the long-term ECG recording. Currently used algorithms do have their share of drawbacks: 1) clustering and classification cannot be done in real time; 2) they suffer from huge energy consumption and load of sampling. These drawbacks motivated us in developing novel optimized clustering algorithm which could easily scan large ECG datasets for establishing low power long-term ECG recording. In this paper, we present an advanced K-means clustering algorithm based on Compressed Sensing (CS) theory as a random sampling procedure. Then, two dimensionality reduction methods: Principal Component Analysis (PCA) and Linear Correlation Coefficient (LCC) followed by sorting the data using the K-Nearest Neighbours (K-NN) and Probabilistic Neural Network (PNN) classifiers are applied to the proposed algorithm. We show our algorithm based on PCA features in combination with K-NN classifier shows better performance than other methods. The proposed algorithm outperforms existing algorithms by increasing 11% classification accuracy. In addition, the proposed algorithm illustrates classification accuracy for K-NN and PNN classifiers, and a Receiver Operating Characteristics (ROC) area of 99.98%, 99.83%, and 99.75% respectively.
Ozcift, Akin; Gulten, Arif
2011-12-01
Improving accuracies of machine learning algorithms is vital in designing high performance computer-aided diagnosis (CADx) systems. Researches have shown that a base classifier performance might be enhanced by ensemble classification strategies. In this study, we construct rotation forest (RF) ensemble classifiers of 30 machine learning algorithms to evaluate their classification performances using Parkinson's, diabetes and heart diseases from literature. While making experiments, first the feature dimension of three datasets is reduced using correlation based feature selection (CFS) algorithm. Second, classification performances of 30 machine learning algorithms are calculated for three datasets. Third, 30 classifier ensembles are constructed based on RF algorithm to assess performances of respective classifiers with the same disease data. All the experiments are carried out with leave-one-out validation strategy and the performances of the 60 algorithms are evaluated using three metrics; classification accuracy (ACC), kappa error (KE) and area under the receiver operating characteristic (ROC) curve (AUC). Base classifiers succeeded 72.15%, 77.52% and 84.43% average accuracies for diabetes, heart and Parkinson's datasets, respectively. As for RF classifier ensembles, they produced average accuracies of 74.47%, 80.49% and 87.13% for respective diseases. RF, a newly proposed classifier ensemble algorithm, might be used to improve accuracy of miscellaneous machine learning algorithms to design advanced CADx systems. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
Gan, Heng-Hui; Soukoulis, Christos; Fisk, Ian
2014-03-01
In the present work, we have evaluated for first time the feasibility of APCI-MS volatile compound fingerprinting in conjunction with chemometrics (PLS-DA) as a new strategy for rapid and non-destructive food classification. For this purpose 202 clarified monovarietal juices extracted from apples differing in their botanical and geographical origin were used for evaluation of the performance of APCI-MS as a classification tool. For an independent test set PLS-DA analyses of pre-treated spectral data gave 100% and 94.2% correct classification rate for the classification by cultivar and geographical origin, respectively. Moreover, PLS-DA analysis of APCI-MS in conjunction with GC-MS data revealed that masses within the spectral ACPI-MS data set were related with parent ions or fragments of alkyesters, carbonyl compounds (hexanal, trans-2-hexenal) and alcohols (1-hexanol, 1-butanol, cis-3-hexenol) and had significant discriminating power both in terms of cultivar and geographical origin. Copyright © 2013 The Authors. Published by Elsevier Ltd.. All rights reserved.
Pet fur color and texture classification
NASA Astrophysics Data System (ADS)
Yen, Jonathan; Mukherjee, Debarghar; Lim, SukHwan; Tretter, Daniel
2007-01-01
Object segmentation is important in image analysis for imaging tasks such as image rendering and image retrieval. Pet owners have been known to be quite vocal about how important it is to render their pets perfectly. We present here an algorithm for pet (mammal) fur color classification and an algorithm for pet (animal) fur texture classification. Per fur color classification can be applied as a necessary condition for identifying the regions in an image that may contain pets much like the skin tone classification for human flesh detection. As a result of the evolution, fur coloration of all mammals is caused by a natural organic pigment called Melanin and Melanin has only very limited color ranges. We have conducted a statistical analysis and concluded that mammal fur colors can be only in levels of gray or in two colors after the proper color quantization. This pet fur color classification algorithm has been applied for peteye detection. We also present here an algorithm for animal fur texture classification using the recently developed multi-resolution directional sub-band Contourlet transform. The experimental results are very promising as these transforms can identify regions of an image that may contain fur of mammals, scale of reptiles and feather of birds, etc. Combining the color and texture classification, one can have a set of strong classifiers for identifying possible animals in an image.
Fernandez-Lozano, C.; Canto, C.; Gestal, M.; Andrade-Garda, J. M.; Rabuñal, J. R.; Dorado, J.; Pazos, A.
2013-01-01
Given the background of the use of Neural Networks in problems of apple juice classification, this paper aim at implementing a newly developed method in the field of machine learning: the Support Vector Machines (SVM). Therefore, a hybrid model that combines genetic algorithms and support vector machines is suggested in such a way that, when using SVM as a fitness function of the Genetic Algorithm (GA), the most representative variables for a specific classification problem can be selected. PMID:24453933
A coarse to fine minutiae-based latent palmprint matching.
Liu, Eryun; Jain, Anil K; Tian, Jie
2013-10-01
With the availability of live-scan palmprint technology, high resolution palmprint recognition has started to receive significant attention in forensics and law enforcement. In forensic applications, latent palmprints provide critical evidence as it is estimated that about 30 percent of the latents recovered at crime scenes are those of palms. Most of the available high-resolution palmprint matching algorithms essentially follow the minutiae-based fingerprint matching strategy. Considering the large number of minutiae (about 1,000 minutiae in a full palmprint compared to about 100 minutiae in a rolled fingerprint) and large area of foreground region in full palmprints, novel strategies need to be developed for efficient and robust latent palmprint matching. In this paper, a coarse to fine matching strategy based on minutiae clustering and minutiae match propagation is designed specifically for palmprint matching. To deal with the large number of minutiae, a local feature-based minutiae clustering algorithm is designed to cluster minutiae into several groups such that minutiae belonging to the same group have similar local characteristics. The coarse matching is then performed within each cluster to establish initial minutiae correspondences between two palmprints. Starting with each initial correspondence, a minutiae match propagation algorithm searches for mated minutiae in the full palmprint. The proposed palmprint matching algorithm has been evaluated on a latent-to-full palmprint database consisting of 446 latents and 12,489 background full prints. The matching results show a rank-1 identification accuracy of 79.4 percent, which is significantly higher than the 60.8 percent identification accuracy of a state-of-the-art latent palmprint matching algorithm on the same latent database. The average computation time of our algorithm for a single latent-to-full match is about 141 ms for genuine match and 50 ms for impostor match, on a Windows XP desktop system with 2.2-GHz CPU and 1.00-GB RAM. The computation time of our algorithm is an order of magnitude faster than a previously published state-of-the-art-algorithm.
NASA Astrophysics Data System (ADS)
Kotelnikov, E. V.; Milov, V. R.
2018-05-01
Rule-based learning algorithms have higher transparency and easiness to interpret in comparison with neural networks and deep learning algorithms. These properties make it possible to effectively use such algorithms to solve descriptive tasks of data mining. The choice of an algorithm depends also on its ability to solve predictive tasks. The article compares the quality of the solution of the problems with binary and multiclass classification based on the experiments with six datasets from the UCI Machine Learning Repository. The authors investigate three algorithms: Ripper (rule induction), C4.5 (decision trees), In-Close (formal concept analysis). The results of the experiments show that In-Close demonstrates the best quality of classification in comparison with Ripper and C4.5, however the latter two generate more compact rule sets.
NASA Astrophysics Data System (ADS)
Dobeck, Gerald J.; Cobb, J. Tory
2002-08-01
The high-resolution sonar is one of the principal sensors used by the Navy to detect and classify sea mines in minehunting operations. For such sonar systems, substantial effort has been devoted to the development of automated detection and classification (D/C) algorithms. These have been spurred by several factors including (1) aids for operators to reduce work overload, (2) more optimal use of all available data, and (3) the introduction of unmanned minehunting systems. The environments where sea mines are typically laid (harbor areas, shipping lanes, and the littorals) give rise to many false alarms caused by natural, biologic, and man-made clutter. The objective of the automated D/C algorithms is to eliminate most of these false alarms while still maintaining a very high probability of mine detection and classification (PdPc). In recent years, the benefits of fusing the outputs of multiple D/C algorithms have been studied. We refer to this as Algorithm Fusion. The results have been remarkable, including reliable robustness to new environments. The Quadratic Penalty Function Support Vector Machine (QPFSVM) algorithm to aid in the automated detection and classification of sea mines is introduced in this paper. The QPFSVM algorithm is easy to train, simple to implement, and robust to feature space dimension. Outputs of successive SVM algorithms are cascaded in stages (fused) to improve the Probability of Classification (Pc) and reduce the number of false alarms. Even though our experience has been gained in the area of sea mine detection and classification, the principles described herein are general and can be applied to fusion of any D/C problem (e.g., automated medical diagnosis or automatic target recognition for ballistic missile defense).
NASA Technical Reports Server (NTRS)
Kocurek, Michael J.
2005-01-01
The HARVIST project seeks to automatically provide an accurate, interactive interface to predict crop yield over the entire United States. In order to accomplish this goal, large images must be quickly and automatically classified by crop type. Current trained and untrained classification algorithms, while accurate, are highly inefficient when operating on large datasets. This project sought to develop new variants of two standard trained and untrained classification algorithms that are optimized to take advantage of the spatial nature of image data. The first algorithm, harvist-cluster, utilizes divide-and-conquer techniques to precluster an image in the hopes of increasing overall clustering speed. The second algorithm, harvistSVM, utilizes support vector machines (SVMs), a type of trained classifier. It seeks to increase classification speed by applying a "meta-SVM" to a quick (but inaccurate) SVM to approximate a slower, yet more accurate, SVM. Speedups were achieved by tuning the algorithm to quickly identify when the quick SVM was incorrect, and then reclassifying low-confidence pixels as necessary. Comparing the classification speeds of both algorithms to known baselines showed a slight speedup for large values of k (the number of clusters) for harvist-cluster, and a significant speedup for harvistSVM. Future work aims to automate the parameter tuning process required for harvistSVM, and further improve classification accuracy and speed. Additionally, this research will move documents created in Canvas into ArcGIS. The launch of the Mars Reconnaissance Orbiter (MRO) will provide a wealth of image data such as global maps of Martian weather and high resolution global images of Mars. The ability to store this new data in a georeferenced format will support future Mars missions by providing data for landing site selection and the search for water on Mars.
Automatic classification of protein structures using physicochemical parameters.
Mohan, Abhilash; Rao, M Divya; Sunderrajan, Shruthi; Pennathur, Gautam
2014-09-01
Protein classification is the first step to functional annotation; SCOP and Pfam databases are currently the most relevant protein classification schemes. However, the disproportion in the number of three dimensional (3D) protein structures generated versus their classification into relevant superfamilies/families emphasizes the need for automated classification schemes. Predicting function of novel proteins based on sequence information alone has proven to be a major challenge. The present study focuses on the use of physicochemical parameters in conjunction with machine learning algorithms (Naive Bayes, Decision Trees, Random Forest and Support Vector Machines) to classify proteins into their respective SCOP superfamily/Pfam family, using sequence derived information. Spectrophores™, a 1D descriptor of the 3D molecular field surrounding a structure was used as a benchmark to compare the performance of the physicochemical parameters. The machine learning algorithms were modified to select features based on information gain for each SCOP superfamily/Pfam family. The effect of combining physicochemical parameters and spectrophores on classification accuracy (CA) was studied. Machine learning algorithms trained with the physicochemical parameters consistently classified SCOP superfamilies and Pfam families with a classification accuracy above 90%, while spectrophores performed with a CA of around 85%. Feature selection improved classification accuracy for both physicochemical parameters and spectrophores based machine learning algorithms. Combining both attributes resulted in a marginal loss of performance. Physicochemical parameters were able to classify proteins from both schemes with classification accuracy ranging from 90-96%. These results suggest the usefulness of this method in classifying proteins from amino acid sequences.
Parallel Implementation of the Wideband DOA Algorithm on the IBM Cell BE Processor
2010-05-01
Abstract—The Multiple Signal Classification ( MUSIC ) algorithm is a powerful technique for determining the Direction of Arrival (DOA) of signals...Broadband Engine Processor (Cell BE). The process of adapting the serial based MUSIC algorithm to the Cell BE will be analyzed in terms of parallelism and...using Multiple Signal Classification MUSIC algorithm [4] • Computation of Focus matrix • Computation of number of sources • Separation of Signal
A robust data scaling algorithm to improve classification accuracies in biomedical data.
Cao, Xi Hang; Stojkovic, Ivan; Obradovic, Zoran
2016-09-09
Machine learning models have been adapted in biomedical research and practice for knowledge discovery and decision support. While mainstream biomedical informatics research focuses on developing more accurate models, the importance of data preprocessing draws less attention. We propose the Generalized Logistic (GL) algorithm that scales data uniformly to an appropriate interval by learning a generalized logistic function to fit the empirical cumulative distribution function of the data. The GL algorithm is simple yet effective; it is intrinsically robust to outliers, so it is particularly suitable for diagnostic/classification models in clinical/medical applications where the number of samples is usually small; it scales the data in a nonlinear fashion, which leads to potential improvement in accuracy. To evaluate the effectiveness of the proposed algorithm, we conducted experiments on 16 binary classification tasks with different variable types and cover a wide range of applications. The resultant performance in terms of area under the receiver operation characteristic curve (AUROC) and percentage of correct classification showed that models learned using data scaled by the GL algorithm outperform the ones using data scaled by the Min-max and the Z-score algorithm, which are the most commonly used data scaling algorithms. The proposed GL algorithm is simple and effective. It is robust to outliers, so no additional denoising or outlier detection step is needed in data preprocessing. Empirical results also show models learned from data scaled by the GL algorithm have higher accuracy compared to the commonly used data scaling algorithms.
A hybrid clustering and classification approach for predicting crash injury severity on rural roads.
Hasheminejad, Seyed Hessam-Allah; Zahedi, Mohsen; Hasheminejad, Seyed Mohammad Hossein
2018-03-01
As a threat for transportation system, traffic crashes have a wide range of social consequences for governments. Traffic crashes are increasing in developing countries and Iran as a developing country is not immune from this risk. There are several researches in the literature to predict traffic crash severity based on artificial neural networks (ANNs), support vector machines and decision trees. This paper attempts to investigate the crash injury severity of rural roads by using a hybrid clustering and classification approach to compare the performance of classification algorithms before and after applying the clustering. In this paper, a novel rule-based genetic algorithm (GA) is proposed to predict crash injury severity, which is evaluated by performance criteria in comparison with classification algorithms like ANN. The results obtained from analysis of 13,673 crashes (5600 property damage, 778 fatal crashes, 4690 slight injuries and 2605 severe injuries) on rural roads in Tehran Province of Iran during 2011-2013 revealed that the proposed GA method outperforms other classification algorithms based on classification metrics like precision (86%), recall (88%) and accuracy (87%). Moreover, the proposed GA method has the highest level of interpretation, is easy to understand and provides feedback to analysts.
Evaluating data mining algorithms using molecular dynamics trajectories.
Tatsis, Vasileios A; Tjortjis, Christos; Tzirakis, Panagiotis
2013-01-01
Molecular dynamics simulations provide a sample of a molecule's conformational space. Experiments on the mus time scale, resulting in large amounts of data, are nowadays routine. Data mining techniques such as classification provide a way to analyse such data. In this work, we evaluate and compare several classification algorithms using three data sets which resulted from computer simulations, of a potential enzyme mimetic biomolecule. We evaluated 65 classifiers available in the well-known data mining toolkit Weka, using 'classification' errors to assess algorithmic performance. Results suggest that: (i) 'meta' classifiers perform better than the other groups, when applied to molecular dynamics data sets; (ii) Random Forest and Rotation Forest are the best classifiers for all three data sets; and (iii) classification via clustering yields the highest classification error. Our findings are consistent with bibliographic evidence, suggesting a 'roadmap' for dealing with such data.
Modified Mahalanobis Taguchi System for Imbalance Data Classification
2017-01-01
The Mahalanobis Taguchi System (MTS) is considered one of the most promising binary classification algorithms to handle imbalance data. Unfortunately, MTS lacks a method for determining an efficient threshold for the binary classification. In this paper, a nonlinear optimization model is formulated based on minimizing the distance between MTS Receiver Operating Characteristics (ROC) curve and the theoretical optimal point named Modified Mahalanobis Taguchi System (MMTS). To validate the MMTS classification efficacy, it has been benchmarked with Support Vector Machines (SVMs), Naive Bayes (NB), Probabilistic Mahalanobis Taguchi Systems (PTM), Synthetic Minority Oversampling Technique (SMOTE), Adaptive Conformal Transformation (ACT), Kernel Boundary Alignment (KBA), Hidden Naive Bayes (HNB), and other improved Naive Bayes algorithms. MMTS outperforms the benchmarked algorithms especially when the imbalance ratio is greater than 400. A real life case study on manufacturing sector is used to demonstrate the applicability of the proposed model and to compare its performance with Mahalanobis Genetic Algorithm (MGA). PMID:28811820
Virtual screening by a new Clustering-based Weighted Similarity Extreme Learning Machine approach
Kudisthalert, Wasu
2018-01-01
Machine learning techniques are becoming popular in virtual screening tasks. One of the powerful machine learning algorithms is Extreme Learning Machine (ELM) which has been applied to many applications and has recently been applied to virtual screening. We propose the Weighted Similarity ELM (WS-ELM) which is based on a single layer feed-forward neural network in a conjunction of 16 different similarity coefficients as activation function in the hidden layer. It is known that the performance of conventional ELM is not robust due to random weight selection in the hidden layer. Thus, we propose a Clustering-based WS-ELM (CWS-ELM) that deterministically assigns weights by utilising clustering algorithms i.e. k-means clustering and support vector clustering. The experiments were conducted on one of the most challenging datasets–Maximum Unbiased Validation Dataset–which contains 17 activity classes carefully selected from PubChem. The proposed algorithms were then compared with other machine learning techniques such as support vector machine, random forest, and similarity searching. The results show that CWS-ELM in conjunction with support vector clustering yields the best performance when utilised together with Sokal/Sneath(1) coefficient. Furthermore, ECFP_6 fingerprint presents the best results in our framework compared to the other types of fingerprints, namely ECFP_4, FCFP_4, and FCFP_6. PMID:29652912
Exploitation of RF-DNA for Device Classification and Verification Using GRLVQI Processing
2012-12-01
5 FLD Fisher’s Linear Discriminant . . . . . . . . . . . . . . . . . . . 6 kNN K-Nearest Neighbor...Neighbor ( kNN ), Support Vector Machine (SVM), and simple cross-correlation techniques [40, 57, 82, 88, 94, 95]. The RF-DNA fingerprinting research in...Expansion and the Dis- crete Gabor Transform on a Non-Separable Lattice”. 2000 IEEE Int’l Conf on Acoustics, Speech , and Signal Processing (ICASSP00
Fingerprinting Green Curry: An Electrochemical Approach to Food Quality Control.
Chaibun, Thanyarat; La-O-Vorakiat, Chan; O'Mullane, Anthony P; Lertanantawong, Benchaporn; Surareungchai, Werasak
2018-06-07
The detection and identification of multiple components in a complex sample such as food in a cost-effective way is an ongoing challenge. The development of on-site and rapid detection methods to ensure food quality and composition is of significant interest to the food industry. Here we report that an electrochemical method can be used with an unmodified glassy carbon electrode for the identification of the key ingredients found within Thai green curries. It was found that green curry presents a fingerprint electrochemical response that contains four distinct peaks when differential pulse voltammetry is performed. The reproducibility of the sensor is excellent as no surface modification is required and therefore storage is not an issue. By employing particle swarm optimization algorithms the identification of ingredients within a green curry could be obtained. In addition, the quality and freshness of the sample could be monitored by detecting a change in the intensity of the peaks in the fingerprint response.
Ostenson, Jason; Robison, Ryan K; Zwart, Nicholas R; Welch, E Brian
2017-09-01
Magnetic resonance fingerprinting (MRF) pulse sequences often employ spiral trajectories for data readout. Spiral k-space acquisitions are vulnerable to blurring in the spatial domain in the presence of static field off-resonance. This work describes a blurring correction algorithm for use in spiral MRF and demonstrates its effectiveness in phantom and in vivo experiments. Results show that image quality of T1 and T2 parametric maps is improved by application of this correction. This MRF correction has negligible effect on the concordance correlation coefficient and improves coefficient of variation in regions of off-resonance relative to uncorrected measurements. Copyright © 2017 Elsevier Inc. All rights reserved.
Robust video copy detection approach based on local tangent space alignment
NASA Astrophysics Data System (ADS)
Nie, Xiushan; Qiao, Qianping
2012-04-01
We propose a robust content-based video copy detection approach based on local tangent space alignment (LTSA), which is an efficient dimensionality reduction algorithm. The idea is motivated by the fact that the content of video becomes richer and the dimension of content becomes higher. It does not give natural tools for video analysis and understanding because of the high dimensionality. The proposed approach reduces the dimensionality of video content using LTSA, and then generates video fingerprints in low dimensional space for video copy detection. Furthermore, a dynamic sliding window is applied to fingerprint matching. Experimental results show that the video copy detection approach has good robustness and discrimination.
Aksungur, N; Korkut, E
2018-05-24
We read Atamanalp classification, treatment algorithm and prognosis-estimating systems for sigmoid volvulus (SV) and ileosigmoid knotting (ISK) in Colorectal Disease [1,2]. Our comments relate to necessity and utility of these new classification systems. Classification or staging systems are generally used in malignant or premalignant pathologies such as colorectal cancers [3] or polyps [4]. This article is protected by copyright. All rights reserved. This article is protected by copyright. All rights reserved.
Friedman, Lee; Rigas, Ioannis; Abdulin, Evgeny; Komogortsev, Oleg V
2018-05-15
Nystrӧm and Holmqvist have published a method for the classification of eye movements during reading (ONH) (Nyström & Holmqvist, 2010). When we applied this algorithm to our data, the results were not satisfactory, so we modified the algorithm (now the MNH) to better classify our data. The changes included: (1) reducing the amount of signal filtering, (2) excluding a new type of noise, (3) removing several adaptive thresholds and replacing them with fixed thresholds, (4) changing the way that the start and end of each saccade was determined, (5) employing a new algorithm for detecting PSOs, and (6) allowing a fixation period to either begin or end with noise. A new method for the evaluation of classification algorithms is presented. It was designed to provide comprehensive feedback to an algorithm developer, in a time-efficient manner, about the types and numbers of classification errors that an algorithm produces. This evaluation was conducted by three expert raters independently, across 20 randomly chosen recordings, each classified by both algorithms. The MNH made many fewer errors in determining when saccades start and end, and it also detected some fixations and saccades that the ONH did not. The MNH fails to detect very small saccades. We also evaluated two additional algorithms: the EyeLink Parser and a more current, machine-learning-based algorithm. The EyeLink Parser tended to find more saccades that ended too early than did the other methods, and we found numerous problems with the output of the machine-learning-based algorithm.
Comparison of GOES Cloud Classification Algorithms Employing Explicit and Implicit Physics
NASA Technical Reports Server (NTRS)
Bankert, Richard L.; Mitrescu, Cristian; Miller, Steven D.; Wade, Robert H.
2009-01-01
Cloud-type classification based on multispectral satellite imagery data has been widely researched and demonstrated to be useful for distinguishing a variety of classes using a wide range of methods. The research described here is a comparison of the classifier output from two very different algorithms applied to Geostationary Operational Environmental Satellite (GOES) data over the course of one year. The first algorithm employs spectral channel thresholding and additional physically based tests. The second algorithm was developed through a supervised learning method with characteristic features of expertly labeled image samples used as training data for a 1-nearest-neighbor classification. The latter's ability to identify classes is also based in physics, but those relationships are embedded implicitly within the algorithm. A pixel-to-pixel comparison analysis was done for hourly daytime scenes within a region in the northeastern Pacific Ocean. Considerable agreement was found in this analysis, with many of the mismatches or disagreements providing insight to the strengths and limitations of each classifier. Depending upon user needs, a rule-based or other postprocessing system that combines the output from the two algorithms could provide the most reliable cloud-type classification.
NASA Astrophysics Data System (ADS)
Wang, Qingjie; Xin, Jingmin; Wu, Jiayi; Zheng, Nanning
2017-03-01
Microaneurysms are the earliest clinic signs of diabetic retinopathy, and many algorithms were developed for the automatic classification of these specific pathology. However, the imbalanced class distribution of dataset usually causes the classification accuracy of true microaneurysms be low. Therefore, by combining the borderline synthetic minority over-sampling technique (BSMOTE) with the data cleaning techniques such as Tomek links and Wilson's edited nearest neighbor rule (ENN) to resample the imbalanced dataset, we propose two new support vector machine (SVM) classification algorithms for the microaneurysms. The proposed BSMOTE-Tomek and BSMOTE-ENN algorithms consist of: 1) the adaptive synthesis of the minority samples in the neighborhood of the borderline, and 2) the remove of redundant training samples for improving the efficiency of data utilization. Moreover, the modified SVM classifier with probabilistic outputs is used to divide the microaneurysm candidates into two groups: true microaneurysms and false microaneurysms. The experiments with a public microaneurysms database shows that the proposed algorithms have better classification performance including the receiver operating characteristic (ROC) curve and the free-response receiver operating characteristic (FROC) curve.
Best Merge Region Growing with Integrated Probabilistic Classification for Hyperspectral Imagery
NASA Technical Reports Server (NTRS)
Tarabalka, Yuliya; Tilton, James C.
2011-01-01
A new method for spectral-spatial classification of hyperspectral images is proposed. The method is based on the integration of probabilistic classification within the hierarchical best merge region growing algorithm. For this purpose, preliminary probabilistic support vector machines classification is performed. Then, hierarchical step-wise optimization algorithm is applied, by iteratively merging regions with the smallest Dissimilarity Criterion (DC). The main novelty of this method consists in defining a DC between regions as a function of region statistical and geometrical features along with classification probabilities. Experimental results are presented on a 200-band AVIRIS image of the Northwestern Indiana s vegetation area and compared with those obtained by recently proposed spectral-spatial classification techniques. The proposed method improves classification accuracies when compared to other classification approaches.
Clustering performance comparison using K-means and expectation maximization algorithms.
Jung, Yong Gyu; Kang, Min Soo; Heo, Jun
2014-11-14
Clustering is an important means of data mining based on separating data categories by similar features. Unlike the classification algorithm, clustering belongs to the unsupervised type of algorithms. Two representatives of the clustering algorithms are the K -means and the expectation maximization (EM) algorithm. Linear regression analysis was extended to the category-type dependent variable, while logistic regression was achieved using a linear combination of independent variables. To predict the possibility of occurrence of an event, a statistical approach is used. However, the classification of all data by means of logistic regression analysis cannot guarantee the accuracy of the results. In this paper, the logistic regression analysis is applied to EM clusters and the K -means clustering method for quality assessment of red wine, and a method is proposed for ensuring the accuracy of the classification results.
NASA Astrophysics Data System (ADS)
Chatterjee, Amit; Bhatia, Vimal; Prakash, Shashi
2017-08-01
Fingerprint is a unique, un-alterable and easily collected biometric of a human being. Although it is a 3D biological characteristic, traditional methods are designed to provide only a 2D image. This touch based mapping of 3D shape to 2D image losses information and leads to nonlinear distortions. Moreover, as only topographic details are captured, conventional systems are potentially vulnerable to spoofing materials (e.g. artificial fingers, dead fingers, false prints, etc.). In this work, we demonstrate an anti-spoof touchless 3D fingerprint detection system using a combination of single shot fringe projection and biospeckle analysis. For fingerprint detection using fringe projection, light from a low power LED source illuminates a finger through a sinusoidal grating. The fringe pattern modulated because of features on the fingertip is captured using a CCD camera. Fourier transform method based frequency filtering is used for the reconstruction of 3D fingerprint from the captured fringe pattern. In the next step, for spoof detection using biospeckle analysis a visuo-numeric algorithm based on modified structural function and non-normalized histogram is proposed. High activity biospeckle patterns are generated because of interaction of collimated laser light with internal fluid flow of the real finger sample. This activity reduces abruptly in case of layered fake prints, and is almost absent in dead or fake fingers. Furthermore, the proposed setup is fast, low-cost, involves non-mechanical scanning and is highly stable.
Enhancing Breast Cancer Recurrence Algorithms Through Selective Use of Medical Record Data.
Kroenke, Candyce H; Chubak, Jessica; Johnson, Lisa; Castillo, Adrienne; Weltzien, Erin; Caan, Bette J
2016-03-01
The utility of data-based algorithms in research has been questioned because of errors in identification of cancer recurrences. We adapted previously published breast cancer recurrence algorithms, selectively using medical record (MR) data to improve classification. We evaluated second breast cancer event (SBCE) and recurrence-specific algorithms previously published by Chubak and colleagues in 1535 women from the Life After Cancer Epidemiology (LACE) and 225 women from the Women's Health Initiative cohorts and compared classification statistics to published values. We also sought to improve classification with minimal MR examination. We selected pairs of algorithms-one with high sensitivity/high positive predictive value (PPV) and another with high specificity/high PPV-using MR information to resolve discrepancies between algorithms, properly classifying events based on review; we called this "triangulation." Finally, in LACE, we compared associations between breast cancer survival risk factors and recurrence using MR data, single Chubak algorithms, and triangulation. The SBCE algorithms performed well in identifying SBCE and recurrences. Recurrence-specific algorithms performed more poorly than published except for the high-specificity/high-PPV algorithm, which performed well. The triangulation method (sensitivity = 81.3%, specificity = 99.7%, PPV = 98.1%, NPV = 96.5%) improved recurrence classification over two single algorithms (sensitivity = 57.1%, specificity = 95.5%, PPV = 71.3%, NPV = 91.9%; and sensitivity = 74.6%, specificity = 97.3%, PPV = 84.7%, NPV = 95.1%), with 10.6% MR review. Triangulation performed well in survival risk factor analyses vs analyses using MR-identified recurrences. Use of multiple recurrence algorithms in administrative data, in combination with selective examination of MR data, may improve recurrence data quality and reduce research costs. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Prediction of customer behaviour analysis using classification algorithms
NASA Astrophysics Data System (ADS)
Raju, Siva Subramanian; Dhandayudam, Prabha
2018-04-01
Customer Relationship management plays a crucial role in analyzing of customer behavior patterns and their values with an enterprise. Analyzing of customer data can be efficient performed using various data mining techniques, with the goal of developing business strategies and to enhance the business. In this paper, three classification models (NB, J48, and MLPNN) are studied and evaluated for our experimental purpose. The performance measures of the three classifications are compared using three different parameters (accuracy, sensitivity, specificity) and experimental results expose J48 algorithm has better accuracy with compare to NB and MLPNN algorithm.
Multi-label spacecraft electrical signal classification method based on DBN and random forest
Li, Ke; Yu, Nan; Li, Pengfei; Song, Shimin; Wu, Yalei; Li, Yang; Liu, Meng
2017-01-01
In spacecraft electrical signal characteristic data, there exists a large amount of data with high-dimensional features, a high computational complexity degree, and a low rate of identification problems, which causes great difficulty in fault diagnosis of spacecraft electronic load systems. This paper proposes a feature extraction method that is based on deep belief networks (DBN) and a classification method that is based on the random forest (RF) algorithm; The proposed algorithm mainly employs a multi-layer neural network to reduce the dimension of the original data, and then, classification is applied. Firstly, we use the method of wavelet denoising, which was used to pre-process the data. Secondly, the deep belief network is used to reduce the feature dimension and improve the rate of classification for the electrical characteristics data. Finally, we used the random forest algorithm to classify the data and comparing it with other algorithms. The experimental results show that compared with other algorithms, the proposed method shows excellent performance in terms of accuracy, computational efficiency, and stability in addressing spacecraft electrical signal data. PMID:28486479
Multi-label spacecraft electrical signal classification method based on DBN and random forest.
Li, Ke; Yu, Nan; Li, Pengfei; Song, Shimin; Wu, Yalei; Li, Yang; Liu, Meng
2017-01-01
In spacecraft electrical signal characteristic data, there exists a large amount of data with high-dimensional features, a high computational complexity degree, and a low rate of identification problems, which causes great difficulty in fault diagnosis of spacecraft electronic load systems. This paper proposes a feature extraction method that is based on deep belief networks (DBN) and a classification method that is based on the random forest (RF) algorithm; The proposed algorithm mainly employs a multi-layer neural network to reduce the dimension of the original data, and then, classification is applied. Firstly, we use the method of wavelet denoising, which was used to pre-process the data. Secondly, the deep belief network is used to reduce the feature dimension and improve the rate of classification for the electrical characteristics data. Finally, we used the random forest algorithm to classify the data and comparing it with other algorithms. The experimental results show that compared with other algorithms, the proposed method shows excellent performance in terms of accuracy, computational efficiency, and stability in addressing spacecraft electrical signal data.
NASA Astrophysics Data System (ADS)
Sokolov, Anton; Gengembre, Cyril; Dmitriev, Egor; Delbarre, Hervé
2017-04-01
The problem is considered of classification of local atmospheric meteorological events in the coastal area such as sea breezes, fogs and storms. The in-situ meteorological data as wind speed and direction, temperature, humidity and turbulence are used as predictors. Local atmospheric events of 2013-2014 were analysed manually to train classification algorithms in the coastal area of English Channel in Dunkirk (France). Then, ultrasonic anemometer data and LIDAR wind profiler data were used as predictors. A few algorithms were applied to determine meteorological events by local data such as a decision tree, the nearest neighbour classifier, a support vector machine. The comparison of classification algorithms was carried out, the most important predictors for each event type were determined. It was shown that in more than 80 percent of the cases machine learning algorithms detect the meteorological class correctly. We expect that this methodology could be applied also to classify events by climatological in-situ data or by modelling data. It allows estimating frequencies of each event in perspective of climate change.
Attention Recognition in EEG-Based Affective Learning Research Using CFS+KNN Algorithm.
Hu, Bin; Li, Xiaowei; Sun, Shuting; Ratcliffe, Martyn
2018-01-01
The research detailed in this paper focuses on the processing of Electroencephalography (EEG) data to identify attention during the learning process. The identification of affect using our procedures is integrated into a simulated distance learning system that provides feedback to the user with respect to attention and concentration. The authors propose a classification procedure that combines correlation-based feature selection (CFS) and a k-nearest-neighbor (KNN) data mining algorithm. To evaluate the CFS+KNN algorithm, it was test against CFS+C4.5 algorithm and other classification algorithms. The classification performance was measured 10 times with different 3-fold cross validation data. The data was derived from 10 subjects while they were attempting to learn material in a simulated distance learning environment. A self-assessment model of self-report was used with a single valence to evaluate attention on 3 levels (high, neutral, low). It was found that CFS+KNN had a much better performance, giving the highest correct classification rate (CCR) of % for the valence dimension divided into three classes.
NASA Astrophysics Data System (ADS)
Brodic, D.
2011-01-01
Text line segmentation represents the key element in the optical character recognition process. Hence, testing of text line segmentation algorithms has substantial relevance. All previously proposed testing methods deal mainly with text database as a template. They are used for testing as well as for the evaluation of the text segmentation algorithm. In this manuscript, methodology for the evaluation of the algorithm for text segmentation based on extended binary classification is proposed. It is established on the various multiline text samples linked with text segmentation. Their results are distributed according to binary classification. Final result is obtained by comparative analysis of cross linked data. At the end, its suitability for different types of scripts represents its main advantage.
Sandhu, Harpreet; Verma, Pradhuman; Padda, Sarfaraz; Raj, Seetharamaiha Sunder
2017-11-01
To investigate the frequency and uniqueness of different lip print patterns, fingerprint patterns in relation to gender and ABO Rh blood groups among a semi-urban population of Sriganganagar, Rajasthan. The study was conducted on 1200 healthy volunteers aged 18-30 years. The cheiloscopic and dermatographic data of each subject were obtained and were analysed according to the Suzuki and Tsuchihashi and Henry systems of classification, respectively. Two forensic experts analyzed the patterns independently. The ABO Rh blood group was also recorded for each subject. The Chi square statistical analysis was done and tests were considered significant when p value <0.001 and Cohen kappa test was applied to analyze inter-observer reliability. The B+ blood group was noted as most common in both genders while least common were A- among males and AB- in females. Type II lip pattern was most predominant while the least common was Type I' in males and Type I' and Type V in females. The UL fingerprint pattern was the most common, while RL was least noted in both genders. All the fingerprint patterns showed correlation with different lip print patterns. A correlation was found between different blood groups and lip print patterns except Type I (vertical) lip pattern. A positive correlation was observed between all the blood groups and fingerprint patterns, except for RL pattern. There is an association between lip print patterns, fingerprint patterns and ABO blood groups in both the genders. Thus, correlating the uniqueness of these physical evidences sometimes helps the forensic team members in accurate personal identification or it can at least narrow the search for an individual where there are no possible data referring to the identity of the subject. Copyright © 2017 by Academy of Sciences and Arts of Bosnia and Herzegovina.
Evolving land cover classification algorithms for multispectral and multitemporal imagery
NASA Astrophysics Data System (ADS)
Brumby, Steven P.; Theiler, James P.; Bloch, Jeffrey J.; Harvey, Neal R.; Perkins, Simon J.; Szymanski, John J.; Young, Aaron C.
2002-01-01
The Cerro Grande/Los Alamos forest fire devastated over 43,000 acres (17,500 ha) of forested land, and destroyed over 200 structures in the town of Los Alamos and the adjoining Los Alamos National Laboratory. The need to measure the continuing impact of the fire on the local environment has led to the application of a number of remote sensing technologies. During and after the fire, remote-sensing data was acquired from a variety of aircraft- and satellite-based sensors, including Landsat 7 Enhanced Thematic Mapper (ETM+). We now report on the application of a machine learning technique to the automated classification of land cover using multi-spectral and multi-temporal imagery. We apply a hybrid genetic programming/supervised classification technique to evolve automatic feature extraction algorithms. We use a software package we have developed at Los Alamos National Laboratory, called GENIE, to carry out this evolution. We use multispectral imagery from the Landsat 7 ETM+ instrument from before, during, and after the wildfire. Using an existing land cover classification based on a 1992 Landsat 5 TM scene for our training data, we evolve algorithms that distinguish a range of land cover categories, and an algorithm to mask out clouds and cloud shadows. We report preliminary results of combining individual classification results using a K-means clustering approach. The details of our evolved classification are compared to the manually produced land-cover classification.
Gao, Wen; Yang, Hua; Qi, Lian-Wen; Liu, E-Hu; Ren, Mei-Ting; Yan, Yu-Ting; Chen, Jun; Li, Ping
2012-07-06
Plant-based medicines become increasingly popular over the world. Authentication of herbal raw materials is important to ensure their safety and efficacy. Some herbs belonging to closely related species but differing in medicinal properties are difficult to be identified because of similar morphological and microscopic characteristics. Chromatographic fingerprinting is an alternative method to distinguish them. Existing approaches do not allow a comprehensive analysis for herbal authentication. We have now developed a strategy consisting of (1) full metabolic profiling of herbal medicines by rapid resolution liquid chromatography (RRLC) combined with quadrupole time-of-flight mass spectrometry (QTOF MS), (2) global analysis of non-targeted compounds by molecular feature extraction algorithm, (3) multivariate statistical analysis for classification and prediction, and (4) marker compounds characterization. This approach has provided a fast and unbiased comparative multivariate analysis of the metabolite composition of 33-batch samples covering seven Lonicera species. Individual metabolic profiles are performed at the level of molecular fragments without prior structural assignment. In the entire set, the obtained classifier for seven Lonicera species flower buds showed good prediction performance and a total of 82 statistically different components were rapidly obtained by the strategy. The elemental compositions of discriminative metabolites were characterized by the accurate mass measurement of the pseudomolecular ions and their chemical types were assigned by the MS/MS spectra. The high-resolution, comprehensive and unbiased strategy for metabolite data analysis presented here is powerful and opens the new direction of authentication in herbal analysis. Copyright © 2012 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Duraipandian, Shiyamala; Zheng, Wei; Ng, Joseph; Low, Jeffrey J. H.; Ilancheran, A.; Huang, Zhiwei
2012-03-01
Raman spectroscopy is a unique analytical probe for molecular vibration and is capable of providing specific spectroscopic fingerprints of molecular compositions and structures of biological tissues. The aim of this study is to improve the classification accuracy of cervical precancer by characterizing the variations in the normal high wavenumber (HW - 2800-3700cm-1) Raman spectra arising from the menopausal status of the cervix. A rapidacquisition near-infrared (NIR) Raman spectroscopic system was used for in vivo tissue Raman measurements at 785 nm excitation. Individual HW Raman spectrum was measured with a 5s exposure time from both normal and precancer tissue sites of 15 patients recruited. The acquired Raman spectra were stratified based on the menopausal status of the cervix before the data analysis. Significant differences were noticed in Raman intensities of prominent band at 2924 cm-1 (CH3 stretching of proteins) and the broad water Raman band (in the 3100-3700 cm-1 range) with a peak at 3390 cm-1 in normal and dysplasia cervical tissue sites. Multivariate diagnostic decision algorithm based on principal component analysis (PCA) and linear discriminant analysis (LDA) was utilized to successfully differentiate the normal and precancer cervical tissue sites. By considering the variations in the Raman spectra of normal cervix due to the hormonal or menopausal status of women, the diagnostic accuracy was improved from 71 to 91%. By incorporating these variations prior to tissue classification, we can significantly improve the accuracy of cervical precancer detection using HW Raman spectroscopy.
Chen, Guoliang; Meng, Xiaolin; Wang, Yunjia; Zhang, Yanzhe; Tian, Peng; Yang, Huachao
2015-09-23
Because of the high calculation cost and poor performance of a traditional planar map when dealing with complicated indoor geographic information, a WiFi fingerprint indoor positioning system cannot be widely employed on a smartphone platform. By making full use of the hardware sensors embedded in the smartphone, this study proposes an integrated approach to a three-dimensional (3D) indoor positioning system. First, an improved K-means clustering method is adopted to reduce the fingerprint database retrieval time and enhance positioning efficiency. Next, with the mobile phone's acceleration sensor, a new step counting method based on auto-correlation analysis is proposed to achieve cell phone inertial navigation positioning. Furthermore, the integration of WiFi positioning with Pedestrian Dead Reckoning (PDR) obtains higher positional accuracy with the help of the Unscented Kalman Filter algorithm. Finally, a hybrid 3D positioning system based on Unity 3D, which can carry out real-time positioning for targets in 3D scenes, is designed for the fluent operation of mobile terminals.
Integrated WiFi/PDR/Smartphone Using an Unscented Kalman Filter Algorithm for 3D Indoor Localization
Chen, Guoliang; Meng, Xiaolin; Wang, Yunjia; Zhang, Yanzhe; Tian, Peng; Yang, Huachao
2015-01-01
Because of the high calculation cost and poor performance of a traditional planar map when dealing with complicated indoor geographic information, a WiFi fingerprint indoor positioning system cannot be widely employed on a smartphone platform. By making full use of the hardware sensors embedded in the smartphone, this study proposes an integrated approach to a three-dimensional (3D) indoor positioning system. First, an improved K-means clustering method is adopted to reduce the fingerprint database retrieval time and enhance positioning efficiency. Next, with the mobile phone’s acceleration sensor, a new step counting method based on auto-correlation analysis is proposed to achieve cell phone inertial navigation positioning. Furthermore, the integration of WiFi positioning with Pedestrian Dead Reckoning (PDR) obtains higher positional accuracy with the help of the Unscented Kalman Filter algorithm. Finally, a hybrid 3D positioning system based on Unity 3D, which can carry out real-time positioning for targets in 3D scenes, is designed for the fluent operation of mobile terminals. PMID:26404314
Robust efficient video fingerprinting
NASA Astrophysics Data System (ADS)
Puri, Manika; Lubin, Jeffrey
2009-02-01
We have developed a video fingerprinting system with robustness and efficiency as the primary and secondary design criteria. In extensive testing, the system has shown robustness to cropping, letter-boxing, sub-titling, blur, drastic compression, frame rate changes, size changes and color changes, as well as to the geometric distortions often associated with camcorder capture in cinema settings. Efficiency is afforded by a novel two-stage detection process in which a fast matching process first computes a number of likely candidates, which are then passed to a second slower process that computes the overall best match with minimal false alarm probability. One key component of the algorithm is a maximally stable volume computation - a three-dimensional generalization of maximally stable extremal regions - that provides a content-centric coordinate system for subsequent hash function computation, independent of any affine transformation or extensive cropping. Other key features include an efficient bin-based polling strategy for initial candidate selection, and a final SIFT feature-based computation for final verification. We describe the algorithm and its performance, and then discuss additional modifications that can provide further improvement to efficiency and accuracy.
Machine Learning for Biological Trajectory Classification Applications
NASA Technical Reports Server (NTRS)
Sbalzarini, Ivo F.; Theriot, Julie; Koumoutsakos, Petros
2002-01-01
Machine-learning techniques, including clustering algorithms, support vector machines and hidden Markov models, are applied to the task of classifying trajectories of moving keratocyte cells. The different algorithms axe compared to each other as well as to expert and non-expert test persons, using concepts from signal-detection theory. The algorithms performed very well as compared to humans, suggesting a robust tool for trajectory classification in biological applications.
Prahm, Cosima; Eckstein, Korbinian; Ortiz-Catalan, Max; Dorffner, Georg; Kaniusas, Eugenijus; Aszmann, Oskar C
2016-08-31
Controlling a myoelectric prosthesis for upper limbs is increasingly challenging for the user as more electrodes and joints become available. Motion classification based on pattern recognition with a multi-electrode array allows multiple joints to be controlled simultaneously. Previous pattern recognition studies are difficult to compare, because individual research groups use their own data sets. To resolve this shortcoming and to facilitate comparisons, open access data sets were analysed using components of BioPatRec and Netlab pattern recognition models. Performances of the artificial neural networks, linear models, and training program components were compared. Evaluation took place within the BioPatRec environment, a Matlab-based open source platform that provides feature extraction, processing and motion classification algorithms for prosthetic control. The algorithms were applied to myoelectric signals for individual and simultaneous classification of movements, with the aim of finding the best performing algorithm and network model. Evaluation criteria included classification accuracy and training time. Results in both the linear and the artificial neural network models demonstrated that Netlab's implementation using scaled conjugate training algorithm reached significantly higher accuracies than BioPatRec. It is concluded that the best movement classification performance would be achieved through integrating Netlab training algorithms in the BioPatRec environment so that future prosthesis training can be shortened and control made more reliable. Netlab was therefore included into the newest release of BioPatRec (v4.0).
Data-driven advice for applying machine learning to bioinformatics problems
Olson, Randal S.; La Cava, William; Mustahsan, Zairah; Varik, Akshay; Moore, Jason H.
2017-01-01
As the bioinformatics field grows, it must keep pace not only with new data but with new algorithms. Here we contribute a thorough analysis of 13 state-of-the-art, commonly used machine learning algorithms on a set of 165 publicly available classification problems in order to provide data-driven algorithm recommendations to current researchers. We present a number of statistical and visual comparisons of algorithm performance and quantify the effect of model selection and algorithm tuning for each algorithm and dataset. The analysis culminates in the recommendation of five algorithms with hyperparameters that maximize classifier performance across the tested problems, as well as general guidelines for applying machine learning to supervised classification problems. PMID:29218881
NASA Astrophysics Data System (ADS)
Bassa, Zaakirah; Bob, Urmilla; Szantoi, Zoltan; Ismail, Riyad
2016-01-01
In recent years, the popularity of tree-based ensemble methods for land cover classification has increased significantly. Using WorldView-2 image data, we evaluate the potential of the oblique random forest algorithm (oRF) to classify a highly heterogeneous protected area. In contrast to the random forest (RF) algorithm, the oRF algorithm builds multivariate trees by learning the optimal split using a supervised model. The oRF binary algorithm is adapted to a multiclass land cover and land use application using both the "one-against-one" and "one-against-all" combination approaches. Results show that the oRF algorithms are capable of achieving high classification accuracies (>80%). However, there was no statistical difference in classification accuracies obtained by the oRF algorithms and the more popular RF algorithm. For all the algorithms, user accuracies (UAs) and producer accuracies (PAs) >80% were recorded for most of the classes. Both the RF and oRF algorithms poorly classified the indigenous forest class as indicated by the low UAs and PAs. Finally, the results from this study advocate and support the utility of the oRF algorithm for land cover and land use mapping of protected areas using WorldView-2 image data.
Carotenoids Database: structures, chemical fingerprints and distribution among organisms
2017-01-01
Abstract To promote understanding of how organisms are related via carotenoids, either evolutionarily or symbiotically, or in food chains through natural histories, we built the Carotenoids Database. This provides chemical information on 1117 natural carotenoids with 683 source organisms. For extracting organisms closely related through the biosynthesis of carotenoids, we offer a new similarity search system ‘Search similar carotenoids’ using our original chemical fingerprint ‘Carotenoid DB Chemical Fingerprints’. These Carotenoid DB Chemical Fingerprints describe the chemical substructure and the modification details based upon International Union of Pure and Applied Chemistry (IUPAC) semi-systematic names of the carotenoids. The fingerprints also allow (i) easier prediction of six biological functions of carotenoids: provitamin A, membrane stabilizers, odorous substances, allelochemicals, antiproliferative activity and reverse MDR activity against cancer cells, (ii) easier classification of carotenoid structures, (iii) partial and exact structure searching and (iv) easier extraction of structural isomers and stereoisomers. We believe this to be the first attempt to establish fingerprints using the IUPAC semi-systematic names. For extracting close profiled organisms, we provide a new tool ‘Search similar profiled organisms’. Our current statistics show some insights into natural history: carotenoids seem to have been spread largely by bacteria, as they produce C30, C40, C45 and C50 carotenoids, with the widest range of end groups, and they share a small portion of C40 carotenoids with eukaryotes. Archaea share an even smaller portion with eukaryotes. Eukaryotes then have evolved a considerable variety of C40 carotenoids. Considering carotenoids, eukaryotes seem more closely related to bacteria than to archaea aside from 16S rRNA lineage analysis. Database URL: http://carotenoiddb.jp PMID:28365725
Fu, Haiyan; Fan, Yao; Zhang, Xu; Lan, Hanyue; Yang, Tianming; Shao, Mei; Li, Sihan
2015-01-01
As an effective method, the fingerprint technique, which emphasized the whole compositions of samples, has already been used in various fields, especially in identifying and assessing the quality of herbal medicines. High-performance liquid chromatography (HPLC) and near-infrared (NIR), with their unique characteristics of reliability, versatility, precision, and simple measurement, played an important role among all the fingerprint techniques. In this paper, a supervised pattern recognition method based on PLSDA algorithm by HPLC and NIR has been established to identify the information of Hibiscus mutabilis L. and Berberidis radix, two common kinds of herbal medicines. By comparing component analysis (PCA), linear discriminant analysis (LDA), and particularly partial least squares discriminant analysis (PLSDA) with different fingerprint preprocessing of NIR spectra variables, PLSDA model showed perfect functions on the analysis of samples as well as chromatograms. Most important, this pattern recognition method by HPLC and NIR can be used to identify different collection parts, collection time, and different origins or various species belonging to the same genera of herbal medicines which proved to be a promising approach for the identification of complex information of herbal medicines. PMID:26345990
Preprocessing and meta-classification for brain-computer interfaces.
Hammon, Paul S; de Sa, Virginia R
2007-03-01
A brain-computer interface (BCI) is a system which allows direct translation of brain states into actions, bypassing the usual muscular pathways. A BCI system works by extracting user brain signals, applying machine learning algorithms to classify the user's brain state, and performing a computer-controlled action. Our goal is to improve brain state classification. Perhaps the most obvious way to improve classification performance is the selection of an advanced learning algorithm. However, it is now well known in the BCI community that careful selection of preprocessing steps is crucial to the success of any classification scheme. Furthermore, recent work indicates that combining the output of multiple classifiers (meta-classification) leads to improved classification rates relative to single classifiers (Dornhege et al., 2004). In this paper, we develop an automated approach which systematically analyzes the relative contributions of different preprocessing and meta-classification approaches. We apply this procedure to three data sets drawn from BCI Competition 2003 (Blankertz et al., 2004) and BCI Competition III (Blankertz et al., 2006), each of which exhibit very different characteristics. Our final classification results compare favorably with those from past BCI competitions. Additionally, we analyze the relative contributions of individual preprocessing and meta-classification choices and discuss which types of BCI data benefit most from specific algorithms.
Comparison of different classification algorithms for underwater target discrimination.
Li, Donghui; Azimi-Sadjadi, Mahmood R; Robinson, Marc
2004-01-01
Classification of underwater targets from the acoustic backscattered signals is considered here. Several different classification algorithms are tested and benchmarked not only for their performance but also to gain insight to the properties of the feature space. Results on a wideband 80-kHz acoustic backscattered data set collected for six different objects are presented in terms of the receiver operating characteristic (ROC) and robustness of the classifiers wrt reverberation.
Analysis of miRNA expression profile based on SVM algorithm
NASA Astrophysics Data System (ADS)
Ting-ting, Dai; Chang-ji, Shan; Yan-shou, Dong; Yi-duo, Bian
2018-05-01
Based on mirna expression spectrum data set, a new data mining algorithm - tSVM - KNN (t statistic with support vector machine - k nearest neighbor) is proposed. the idea of the algorithm is: firstly, the feature selection of the data set is carried out by the unified measurement method; Secondly, SVM - KNN algorithm, which combines support vector machine (SVM) and k - nearest neighbor (k - nearest neighbor) is used as classifier. Simulation results show that SVM - KNN algorithm has better classification ability than SVM and KNN alone. Tsvm - KNN algorithm only needs 5 mirnas to obtain 96.08 % classification accuracy in terms of the number of mirna " tags" and recognition accuracy. compared with similar algorithms, tsvm - KNN algorithm has obvious advantages.
Enhancing Breast Cancer Recurrence Algorithms Through Selective Use of Medical Record Data
Chubak, Jessica; Johnson, Lisa; Castillo, Adrienne; Weltzien, Erin; Caan, Bette J.
2016-01-01
Abstract Background: The utility of data-based algorithms in research has been questioned because of errors in identification of cancer recurrences. We adapted previously published breast cancer recurrence algorithms, selectively using medical record (MR) data to improve classification. Methods: We evaluated second breast cancer event (SBCE) and recurrence-specific algorithms previously published by Chubak and colleagues in 1535 women from the Life After Cancer Epidemiology (LACE) and 225 women from the Women’s Health Initiative cohorts and compared classification statistics to published values. We also sought to improve classification with minimal MR examination. We selected pairs of algorithms—one with high sensitivity/high positive predictive value (PPV) and another with high specificity/high PPV—using MR information to resolve discrepancies between algorithms, properly classifying events based on review; we called this “triangulation.” Finally, in LACE, we compared associations between breast cancer survival risk factors and recurrence using MR data, single Chubak algorithms, and triangulation. Results: The SBCE algorithms performed well in identifying SBCE and recurrences. Recurrence-specific algorithms performed more poorly than published except for the high-specificity/high-PPV algorithm, which performed well. The triangulation method (sensitivity = 81.3%, specificity = 99.7%, PPV = 98.1%, NPV = 96.5%) improved recurrence classification over two single algorithms (sensitivity = 57.1%, specificity = 95.5%, PPV = 71.3%, NPV = 91.9%; and sensitivity = 74.6%, specificity = 97.3%, PPV = 84.7%, NPV = 95.1%), with 10.6% MR review. Triangulation performed well in survival risk factor analyses vs analyses using MR-identified recurrences. Conclusions: Use of multiple recurrence algorithms in administrative data, in combination with selective examination of MR data, may improve recurrence data quality and reduce research costs. PMID:26582243
Carroll, Kristen L; Murray, Kathleen A; MacLeod, Lynne M; Hennessey, Theresa A; Woiczik, Marcella R; Roach, James W
2011-06-01
Numerous studies underscore the poor intraobserver and interobserver reliability of both the center edge angle (CEA) and the Severin classification using plain film measurements. In this study, experienced observers applied a computer-assisted measurement program to determine the CEA in digital pelvic radiographs of adults who had been previously treated for dysplasia of the hip (DDH). Using a teaching aid/algorithm of the Severin classification, the observers then assigned a Severin rating to these hips. Intraobserver and interobserver errors were then calculated on both the CEA measurements and the Severin classifications. Four pediatric orthopaedic surgeons and 1 pediatric radiologist calculated the CEAs using the OrthoView TM planning system and then determined the Severin classification on 41 blinded digital pelvic radiographs. The radiographs were evaluated by each examiner twice, with evaluations separated by 2 months. All examiners reviewed a Severin classification algorithm before making their Severin assignments. The intraobserver and interobserver reliability for both the CEA and the Severin classification were calculated using the interclass correlation coefficients and Cohen and Fleiss κ scores, respectively. The intraobserver and interobserver reliability for CEA measurement was moderate to almost perfect. When we separated the Severin classification into 3 clinically relevant groups of good (Severin I and II), dysplastic (Severin III), and poor (Severin IV and above), our interobserver reliability neared almost perfect. The Severin classification is an extremely useful and oft-used radiographic measure for the success of DDH treatment. Our research found digital radiography, computer-aided measurement tools, the use of a Severin algorithm, and separating the Severin classification into 3 clinically relevant groups significantly increased the intraobserver and interobserver reliability of both the CEA and Severin classification. This finding will assist future studies using the CEA and Severin classification in the radiographic assessment of DDH treatment outcomes.
Research on Optimization of GLCM Parameter in Cell Classification
NASA Astrophysics Data System (ADS)
Zhang, Xi-Kun; Hou, Jie; Hu, Xin-Hua
2016-05-01
Real-time classification of biological cells according to their 3D morphology is highly desired in a flow cytometer setting. Gray level co-occurrence matrix (GLCM) algorithm has been developed to extract feature parameters from measured diffraction images ,which are too complicated to coordinate with the real-time system for a large amount of calculation. An optimization of GLCM algorithm is provided based on correlation analysis of GLCM parameters. The results of GLCM analysis and subsequent classification demonstrate optimized method can lower the time complexity significantly without loss of classification accuracy.
a Gsa-Svm Hybrid System for Classification of Binary Problems
NASA Astrophysics Data System (ADS)
Sarafrazi, Soroor; Nezamabadi-pour, Hossein; Barahman, Mojgan
2011-06-01
This paperhybridizesgravitational search algorithm (GSA) with support vector machine (SVM) and made a novel GSA-SVM hybrid system to improve the classification accuracy in binary problems. GSA is an optimization heuristic toolused to optimize the value of SVM kernel parameter (in this paper, radial basis function (RBF) is chosen as the kernel function). The experimental results show that this newapproach can achieve high classification accuracy and is comparable to or better than the particle swarm optimization (PSO)-SVM and genetic algorithm (GA)-SVM, which are two hybrid systems for classification.
Classification of Clinically Relevant Microorganisms in Non-Medical Environments
2004-05-06
settings largely due to the rapidity of its evolutionary response to treatment. The first antibiotic-resistant strains of S . aureus were isolated only...studies have assigned isolates of the bacteria to known strains. The objectives of this study were to collect, isolate and characterize samples of S ...internal fragments of seven genes were obtained for 36 S . aureus isolates and assigned a unique allelic profile. These profiles, like fingerprints
Mai, Xiaofeng; Liu, Jie; Wu, Xiong; Zhang, Qun; Guo, Changjian; Yang, Yanfu; Li, Zhaohui
2017-02-06
A Stokes-space modulation format classification (MFC) technique is proposed for coherent optical receivers by using a non-iterative clustering algorithm. In the clustering algorithm, two simple parameters are calculated to help find the density peaks of the data points in Stokes space and no iteration is required. Correct MFC can be realized in numerical simulations among PM-QPSK, PM-8QAM, PM-16QAM, PM-32QAM and PM-64QAM signals within practical optical signal-to-noise ratio (OSNR) ranges. The performance of the proposed MFC algorithm is also compared with those of other schemes based on clustering algorithms. The simulation results show that good classification performance can be achieved using the proposed MFC scheme with moderate time complexity. Proof-of-concept experiments are finally implemented to demonstrate MFC among PM-QPSK/16QAM/64QAM signals, which confirm the feasibility of our proposed MFC scheme.
A Novel Segment-Based Approach for Improving Classification Performance of Transport Mode Detection.
Guvensan, M Amac; Dusun, Burak; Can, Baris; Turkmen, H Irem
2017-12-30
Transportation planning and solutions have an enormous impact on city life. To minimize the transport duration, urban planners should understand and elaborate the mobility of a city. Thus, researchers look toward monitoring people's daily activities including transportation types and duration by taking advantage of individual's smartphones. This paper introduces a novel segment-based transport mode detection architecture in order to improve the results of traditional classification algorithms in the literature. The proposed post-processing algorithm, namely the Healing algorithm, aims to correct the misclassification results of machine learning-based solutions. Our real-life test results show that the Healing algorithm could achieve up to 40% improvement of the classification results. As a result, the implemented mobile application could predict eight classes including stationary, walking, car, bus, tram, train, metro and ferry with a success rate of 95% thanks to the proposed multi-tier architecture and Healing algorithm.
Applying FastSLAM to Articulated Rovers
NASA Astrophysics Data System (ADS)
Hewitt, Robert Alexander
This thesis presents the navigation algorithms designed for use on Kapvik, a 30 kg planetary micro-rover built for the Canadian Space Agency; the simulations used to test the algorithm; and novel techniques for terrain classification using Kapvik's LIDAR (Light Detection And Ranging) sensor. Kapvik implements a six-wheeled, skid-steered, rocker-bogie mobility system. This warrants a more complicated kinematic model for navigation than a typical 4-wheel differential drive system. The design of a 3D navigation algorithm is presented that includes nonlinear Kalman filtering and Simultaneous Localization and Mapping (SLAM). A neural network for terrain classification is used to improve navigation performance. Simulation is used to train the neural network and validate the navigation algorithms. Real world tests of the terrain classification algorithm validate the use of simulation for training and the improvement to SLAM through the reduction of extraneous LIDAR measurements in each scan.
Handwritten digits recognition based on immune network
NASA Astrophysics Data System (ADS)
Li, Yangyang; Wu, Yunhui; Jiao, Lc; Wu, Jianshe
2011-11-01
With the development of society, handwritten digits recognition technique has been widely applied to production and daily life. It is a very difficult task to solve these problems in the field of pattern recognition. In this paper, a new method is presented for handwritten digit recognition. The digit samples firstly are processed and features extraction. Based on these features, a novel immune network classification algorithm is designed and implemented to the handwritten digits recognition. The proposed algorithm is developed by Jerne's immune network model for feature selection and KNN method for classification. Its characteristic is the novel network with parallel commutating and learning. The performance of the proposed method is experimented to the handwritten number datasets MNIST and compared with some other recognition algorithms-KNN, ANN and SVM algorithm. The result shows that the novel classification algorithm based on immune network gives promising performance and stable behavior for handwritten digits recognition.
Liang, Wenyi; Chen, Wenjing; Wu, Lingfang; Li, Shi; Qi, Qi; Cui, Yaping; Liang, Linjin; Ye, Ting; Zhang, Lanzhen
2017-03-17
Danshen, the dried root of Salvia miltiorrhiza Bge., is a widely used commercially available herbal drug, and unstable quality of different samples is a current issue. This study focused on a comprehensive and systematic method combining fingerprints and chemical identification with chemometrics for discrimination and quality assessment of Danshen samples. Twenty-five samples were analyzed by HPLC-PAD and HPLC-MS n . Forty-nine components were identified and characteristic fragmentation regularities were summarized for further interpretation of bioactive components. Chemometric analysis was employed to differentiate samples and clarify the quality differences of Danshen including hierarchical cluster analysis, principal component analysis, and partial least squares discriminant analysis. Consistent results were that the samples were divided into three categories which reflected the difference in quality of Danshen samples. By analyzing the reasons for sample classification, it was revealed that the processing method had a more obvious impact on sample classification than the geographical origin, it induced the different content of bioactive compounds and finally lead to different qualities. Cryptotanshinone, trijuganone B, and 15,16-dihydrotanshinone I were screened out as markers to distinguish samples by different processing methods. The developed strategy could provide a reference for evaluation and discrimination of other traditional herbal medicines.
Controlling protected designation of origin of wine by Raman spectroscopy.
Mandrile, Luisa; Zeppa, Giuseppe; Giovannozzi, Andrea Mario; Rossi, Andrea Mario
2016-11-15
In this paper, a Fourier Transform Raman spectroscopy method, to authenticate the provenience of wine, for food traceability applications was developed. In particular, due to the specific chemical fingerprint of the Raman spectrum, it was possible to discriminate different wines produced in the Piedmont area (North West Italy) in accordance with i) grape varieties, ii) production area and iii) ageing time. In order to create a consistent training set, more than 300 samples from tens of different producers were analyzed, and a chemometric treatment of raw spectra was applied. A discriminant analysis method was employed in the classification procedures, providing a classification capability (percentage of correct answers) of 90% for validation of grape analysis and geographical area provenance, and a classification capability of 84% for ageing time classification. The present methodology was applied successfully to raw materials without any preliminary treatment of the sample, providing a response in a very short time. Copyright © 2016 Elsevier Ltd. All rights reserved.
Drivelos, Spiros A; Danezis, Georgios P; Haroutounian, Serkos A; Georgiou, Constantinos A
2016-12-15
This study examines the trace and rare earth elemental (REE) fingerprint variations of PDO (Protected Designation of Origin) "Fava Santorinis" over three consecutive harvesting years (2011-2013). Classification of samples in harvesting years was studied by performing discriminant analysis (DA), k nearest neighbours (κ-NN), partial least squares (PLS) analysis and probabilistic neural networks (PNN) using rare earth elements and trace metals determined using ICP-MS. DA performed better than κ-NN, producing 100% discrimination using trace elements and 79% using REEs. PLS was found to be superior to PNN, achieving 99% and 90% classification for trace and REEs, respectively, while PNN achieved 96% and 71% classification for trace and REEs, respectively. The information obtained using REEs did not enhance classification, indicating that REEs vary minimally per harvesting year, providing robust geographical origin discrimination. The results show that seasonal patterns can occur in the elemental composition of "Fava Santorinis", probably reflecting seasonality of climate. Copyright © 2016 Elsevier Ltd. All rights reserved.
Evaluation of registration, compression and classification algorithms. Volume 1: Results
NASA Technical Reports Server (NTRS)
Jayroe, R.; Atkinson, R.; Callas, L.; Hodges, J.; Gaggini, B.; Peterson, J.
1979-01-01
The registration, compression, and classification algorithms were selected on the basis that such a group would include most of the different and commonly used approaches. The results of the investigation indicate clearcut, cost effective choices for registering, compressing, and classifying multispectral imagery.
Sinha, S K; Karray, F
2002-01-01
Pipeline surface defects such as holes and cracks cause major problems for utility managers, particularly when the pipeline is buried under the ground. Manual inspection for surface defects in the pipeline has a number of drawbacks, including subjectivity, varying standards, and high costs. Automatic inspection system using image processing and artificial intelligence techniques can overcome many of these disadvantages and offer utility managers an opportunity to significantly improve quality and reduce costs. A recognition and classification of pipe cracks using images analysis and neuro-fuzzy algorithm is proposed. In the preprocessing step the scanned images of pipe are analyzed and crack features are extracted. In the classification step the neuro-fuzzy algorithm is developed that employs a fuzzy membership function and error backpropagation algorithm. The idea behind the proposed approach is that the fuzzy membership function will absorb variation of feature values and the backpropagation network, with its learning ability, will show good classification efficiency.
A combined reconstruction-classification method for diffuse optical tomography.
Hiltunen, P; Prince, S J D; Arridge, S
2009-11-07
We present a combined classification and reconstruction algorithm for diffuse optical tomography (DOT). DOT is a nonlinear ill-posed inverse problem. Therefore, some regularization is needed. We present a mixture of Gaussians prior, which regularizes the DOT reconstruction step. During each iteration, the parameters of a mixture model are estimated. These associate each reconstructed pixel with one of several classes based on the current estimate of the optical parameters. This classification is exploited to form a new prior distribution to regularize the reconstruction step and update the optical parameters. The algorithm can be described as an iteration between an optimization scheme with zeroth-order variable mean and variance Tikhonov regularization and an expectation-maximization scheme for estimation of the model parameters. We describe the algorithm in a general Bayesian framework. Results from simulated test cases and phantom measurements show that the algorithm enhances the contrast of the reconstructed images with good spatial accuracy. The probabilistic classifications of each image contain only a few misclassified pixels.
NASA Astrophysics Data System (ADS)
Sanhouse-García, Antonio J.; Rangel-Peraza, Jesús Gabriel; Bustos-Terrones, Yaneth; García-Ferrer, Alfonso; Mesas-Carrascosa, Francisco J.
2016-02-01
Land cover classification is often based on different characteristics between their classes, but with great homogeneity within each one of them. This cover is obtained through field work or by mean of processing satellite images. Field work involves high costs; therefore, digital image processing techniques have become an important alternative to perform this task. However, in some developing countries and particularly in Casacoima municipality in Venezuela, there is a lack of geographic information systems due to the lack of updated information and high costs in software license acquisition. This research proposes a low cost methodology to develop thematic mapping of local land use and types of coverage in areas with scarce resources. Thematic mapping was developed from CBERS-2 images and spatial information available on the network using open source tools. The supervised classification method per pixel and per region was applied using different classification algorithms and comparing them among themselves. Classification method per pixel was based on Maxver algorithms (maximum likelihood) and Euclidean distance (minimum distance), while per region classification was based on the Bhattacharya algorithm. Satisfactory results were obtained from per region classification, where overall reliability of 83.93% and kappa index of 0.81% were observed. Maxver algorithm showed a reliability value of 73.36% and kappa index 0.69%, while Euclidean distance obtained values of 67.17% and 0.61% for reliability and kappa index, respectively. It was demonstrated that the proposed methodology was very useful in cartographic processing and updating, which in turn serve as a support to develop management plans and land management. Hence, open source tools showed to be an economically viable alternative not only for forestry organizations, but for the general public, allowing them to develop projects in economically depressed and/or environmentally threatened areas.
A typology of street patterns.
Louf, Rémi; Barthelemy, Marc
2014-12-06
We propose a quantitative method to classify cities according to their street pattern. We use the conditional probability distribution of shape factor of blocks with a given area and define what could constitute the 'fingerprint' of a city. Using a simple hierarchical clustering method, these fingerprints can then serve as a basis for a typology of cities. We apply this method to a set of 131 cities in the world, and at an intermediate level of the dendrogram, we observe four large families of cities characterized by different abundances of blocks of a certain area and shape. At a lower level of the classification, we find that most European cities and American cities in our sample fall in their own sub-category, highlighting quantitatively the differences between the typical layouts of cities in both regions. We also show with the example of New York and its different boroughs, that the fingerprint of a city can be seen as the sum of the ones characterizing the different neighbourhoods inside a city. This method provides a quantitative comparison of urban street patterns, which could be helpful for a better understanding of the causes and mechanisms behind their distinct shapes. © 2014 The Author(s) Published by the Royal Society. All rights reserved.
Comparative study of classification algorithms for immunosignaturing data
2012-01-01
Background High-throughput technologies such as DNA, RNA, protein, antibody and peptide microarrays are often used to examine differences across drug treatments, diseases, transgenic animals, and others. Typically one trains a classification system by gathering large amounts of probe-level data, selecting informative features, and classifies test samples using a small number of features. As new microarrays are invented, classification systems that worked well for other array types may not be ideal. Expression microarrays, arguably one of the most prevalent array types, have been used for years to help develop classification algorithms. Many biological assumptions are built into classifiers that were designed for these types of data. One of the more problematic is the assumption of independence, both at the probe level and again at the biological level. Probes for RNA transcripts are designed to bind single transcripts. At the biological level, many genes have dependencies across transcriptional pathways where co-regulation of transcriptional units may make many genes appear as being completely dependent. Thus, algorithms that perform well for gene expression data may not be suitable when other technologies with different binding characteristics exist. The immunosignaturing microarray is based on complex mixtures of antibodies binding to arrays of random sequence peptides. It relies on many-to-many binding of antibodies to the random sequence peptides. Each peptide can bind multiple antibodies and each antibody can bind multiple peptides. This technology has been shown to be highly reproducible and appears promising for diagnosing a variety of disease states. However, it is not clear what is the optimal classification algorithm for analyzing this new type of data. Results We characterized several classification algorithms to analyze immunosignaturing data. We selected several datasets that range from easy to difficult to classify, from simple monoclonal binding to complex binding patterns in asthma patients. We then classified the biological samples using 17 different classification algorithms. Using a wide variety of assessment criteria, we found ‘Naïve Bayes’ far more useful than other widely used methods due to its simplicity, robustness, speed and accuracy. Conclusions ‘Naïve Bayes’ algorithm appears to accommodate the complex patterns hidden within multilayered immunosignaturing microarray data due to its fundamental mathematical properties. PMID:22720696
Progressive Classification Using Support Vector Machines
NASA Technical Reports Server (NTRS)
Wagstaff, Kiri; Kocurek, Michael
2009-01-01
An algorithm for progressive classification of data, analogous to progressive rendering of images, makes it possible to compromise between speed and accuracy. This algorithm uses support vector machines (SVMs) to classify data. An SVM is a machine learning algorithm that builds a mathematical model of the desired classification concept by identifying the critical data points, called support vectors. Coarse approximations to the concept require only a few support vectors, while precise, highly accurate models require far more support vectors. Once the model has been constructed, the SVM can be applied to new observations. The cost of classifying a new observation is proportional to the number of support vectors in the model. When computational resources are limited, an SVM of the appropriate complexity can be produced. However, if the constraints are not known when the model is constructed, or if they can change over time, a method for adaptively responding to the current resource constraints is required. This capability is particularly relevant for spacecraft (or any other real-time systems) that perform onboard data analysis. The new algorithm enables the fast, interactive application of an SVM classifier to a new set of data. The classification process achieved by this algorithm is characterized as progressive because a coarse approximation to the true classification is generated rapidly and thereafter iteratively refined. The algorithm uses two SVMs: (1) a fast, approximate one and (2) slow, highly accurate one. New data are initially classified by the fast SVM, producing a baseline approximate classification. For each classified data point, the algorithm calculates a confidence index that indicates the likelihood that it was classified correctly in the first pass. Next, the data points are sorted by their confidence indices and progressively reclassified by the slower, more accurate SVM, starting with the items most likely to be incorrectly classified. The user can halt this reclassification process at any point, thereby obtaining the best possible result for a given amount of computation time. Alternatively, the results can be displayed as they are generated, providing the user with real-time feedback about the current accuracy of classification.
Classification of voting algorithms for N-version software
NASA Astrophysics Data System (ADS)
Tsarev, R. Yu; Durmuş, M. S.; Üstoglu, I.; Morozov, V. A.
2018-05-01
A voting algorithm in N-version software is a crucial component that evaluates the execution of each of the N versions and determines the correct result. Obviously, the result of the voting algorithm determines the outcome of the N-version software in general. Thus, the choice of the voting algorithm is a vital issue. A lot of voting algorithms were already developed and they may be selected for implementation based on the specifics of the analysis of input data. However, the voting algorithms applied in N-version software are not classified. This article presents an overview of classic and recent voting algorithms used in N-version software and the authors' classification of the voting algorithms. Moreover, the steps of the voting algorithms are presented and the distinctive features of the voting algorithms in Nversion software are defined.
A review of classification algorithms for EEG-based brain–computer interfaces: a 10 year update
NASA Astrophysics Data System (ADS)
Lotte, F.; Bougrain, L.; Cichocki, A.; Clerc, M.; Congedo, M.; Rakotomamonjy, A.; Yger, F.
2018-06-01
Objective. Most current electroencephalography (EEG)-based brain–computer interfaces (BCIs) are based on machine learning algorithms. There is a large diversity of classifier types that are used in this field, as described in our 2007 review paper. Now, approximately ten years after this review publication, many new algorithms have been developed and tested to classify EEG signals in BCIs. The time is therefore ripe for an updated review of EEG classification algorithms for BCIs. Approach. We surveyed the BCI and machine learning literature from 2007 to 2017 to identify the new classification approaches that have been investigated to design BCIs. We synthesize these studies in order to present such algorithms, to report how they were used for BCIs, what were the outcomes, and to identify their pros and cons. Main results. We found that the recently designed classification algorithms for EEG-based BCIs can be divided into four main categories: adaptive classifiers, matrix and tensor classifiers, transfer learning and deep learning, plus a few other miscellaneous classifiers. Among these, adaptive classifiers were demonstrated to be generally superior to static ones, even with unsupervised adaptation. Transfer learning can also prove useful although the benefits of transfer learning remain unpredictable. Riemannian geometry-based methods have reached state-of-the-art performances on multiple BCI problems and deserve to be explored more thoroughly, along with tensor-based methods. Shrinkage linear discriminant analysis and random forests also appear particularly useful for small training samples settings. On the other hand, deep learning methods have not yet shown convincing improvement over state-of-the-art BCI methods. Significance. This paper provides a comprehensive overview of the modern classification algorithms used in EEG-based BCIs, presents the principles of these methods and guidelines on when and how to use them. It also identifies a number of challenges to further advance EEG classification in BCI.
Contextual classification on a CDC Flexible Processor system. [for photomapped remote sensing data
NASA Technical Reports Server (NTRS)
Smith, B. W.; Siegel, H. J.; Swain, P. H.
1981-01-01
A potential hardware organization for the Flexible Processor Array is presented. An algorithm that implements a contextual classifier for remote sensing data analysis is given, along with uniprocessor classification algorithms. The Flexible Processor algorithm is provided, as are simulated timings for contextual classifiers run on the Flexible Processor Array and another system. The timings are analyzed for context neighborhoods of sizes three and nine.
NASA Astrophysics Data System (ADS)
Abramovich, N. S.; Kovalev, A. A.; Plyuta, V. Y.
1986-02-01
A computer algorithm has been developed to classify the spectral bands of natural scenes on Earth according to their optical characteristics. The algorithm is written in FORTRAN-IV and can be used in spectral data processing programs requiring small data loads. The spectral classifications of some different types of green vegetable canopies are given in order to illustrate the effectiveness of the algorithm.
"Who" is saying "what"? Brain-based decoding of human voice and speech.
Formisano, Elia; De Martino, Federico; Bonte, Milene; Goebel, Rainer
2008-11-07
Can we decipher speech content ("what" is being said) and speaker identity ("who" is saying it) from observations of brain activity of a listener? Here, we combine functional magnetic resonance imaging with a data-mining algorithm and retrieve what and whom a person is listening to from the neural fingerprints that speech and voice signals elicit in the listener's auditory cortex. These cortical fingerprints are spatially distributed and insensitive to acoustic variations of the input so as to permit the brain-based recognition of learned speech from unknown speakers and of learned voices from previously unheard utterances. Our findings unravel the detailed cortical layout and computational properties of the neural populations at the basis of human speech recognition and speaker identification.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhu, Li; Fuhrer, Tobias; Schaefer, Bastian
Measuring similarities/dissimilarities between atomic structures is important for the exploration of potential energy landscapes. However, the cell vectors together with the coordinates of the atoms, which are generally used to describe periodic systems, are quantities not directly suitable as fingerprints to distinguish structures. Based on a characterization of the local environment of all atoms in a cell, we introduce crystal fingerprints that can be calculated easily and define configurational distances between crystalline structures that satisfy the mathematical properties of a metric. This distance between two configurations is a measure of their similarity/dissimilarity and it allows in particular to distinguish structures.more » The new method can be a useful tool within various energy landscape exploration schemes, such as minima hopping, random search, swarm intelligence algorithms, and high-throughput screenings.« less
Measuring coherence of computer-assisted likelihood ratio methods.
Haraksim, Rudolf; Ramos, Daniel; Meuwly, Didier; Berger, Charles E H
2015-04-01
Measuring the performance of forensic evaluation methods that compute likelihood ratios (LRs) is relevant for both the development and the validation of such methods. A framework of performance characteristics categorized as primary and secondary is introduced in this study to help achieve such development and validation. Ground-truth labelled fingerprint data is used to assess the performance of an example likelihood ratio method in terms of those performance characteristics. Discrimination, calibration, and especially the coherence of this LR method are assessed as a function of the quantity and quality of the trace fingerprint specimen. Assessment of the coherence revealed a weakness of the comparison algorithm in the computer-assisted likelihood ratio method used. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Baltzer, Pascal A T; Dietzel, Matthias; Kaiser, Werner A
2013-08-01
In the face of multiple available diagnostic criteria in MR-mammography (MRM), a practical algorithm for lesion classification is needed. Such an algorithm should be as simple as possible and include only important independent lesion features to differentiate benign from malignant lesions. This investigation aimed to develop a simple classification tree for differential diagnosis in MRM. A total of 1,084 lesions in standardised MRM with subsequent histological verification (648 malignant, 436 benign) were investigated. Seventeen lesion criteria were assessed by 2 readers in consensus. Classification analysis was performed using the chi-squared automatic interaction detection (CHAID) method. Results include the probability for malignancy for every descriptor combination in the classification tree. A classification tree incorporating 5 lesion descriptors with a depth of 3 ramifications (1, root sign; 2, delayed enhancement pattern; 3, border, internal enhancement and oedema) was calculated. Of all 1,084 lesions, 262 (40.4 %) and 106 (24.3 %) could be classified as malignant and benign with an accuracy above 95 %, respectively. Overall diagnostic accuracy was 88.4 %. The classification algorithm reduced the number of categorical descriptors from 17 to 5 (29.4 %), resulting in a high classification accuracy. More than one third of all lesions could be classified with accuracy above 95 %. • A practical algorithm has been developed to classify lesions found in MR-mammography. • A simple decision tree consisting of five criteria reaches high accuracy of 88.4 %. • Unique to this approach, each classification is associated with a diagnostic certainty. • Diagnostic certainty of greater than 95 % is achieved in 34 % of all cases.
A Semi-supervised Heat Kernel Pagerank MBO Algorithm for Data Classification
2016-07-01
financial predictions, etc. and is finding growing use in text mining studies. In this paper, we present an efficient algorithm for classification of high...video data, set of images, hyperspectral data, medical data, text data, etc. Moreover, the framework provides a way to analyze data whose different...also be incorporated. For text classification, one can use tfidf (term frequency inverse document frequency) to form feature vectors for each document
An algorithm for the arithmetic classification of multilattices.
Indelicato, Giuliana
2013-01-01
A procedure for the construction and the classification of monoatomic multilattices in arbitrary dimension is developed. The algorithm allows one to determine the location of the points of all monoatomic multilattices with a given symmetry, or to determine whether two assigned multilattices are arithmetically equivalent. This approach is based on ideas from integral matrix theory, in particular the reduction to the Smith normal form, and can be coded to provide a classification software package.
Algorithmic Classification of Five Characteristic Types of Paraphasias.
Fergadiotis, Gerasimos; Gorman, Kyle; Bedrick, Steven
2016-12-01
This study was intended to evaluate a series of algorithms developed to perform automatic classification of paraphasic errors (formal, semantic, mixed, neologistic, and unrelated errors). We analyzed 7,111 paraphasias from the Moss Aphasia Psycholinguistics Project Database (Mirman et al., 2010) and evaluated the classification accuracy of 3 automated tools. First, we used frequency norms from the SUBTLEXus database (Brysbaert & New, 2009) to differentiate nonword errors and real-word productions. Then we implemented a phonological-similarity algorithm to identify phonologically related real-word errors. Last, we assessed the performance of a semantic-similarity criterion that was based on word2vec (Mikolov, Yih, & Zweig, 2013). Overall, the algorithmic classification replicated human scoring for the major categories of paraphasias studied with high accuracy. The tool that was based on the SUBTLEXus frequency norms was more than 97% accurate in making lexicality judgments. The phonological-similarity criterion was approximately 91% accurate, and the overall classification accuracy of the semantic classifier ranged from 86% to 90%. Overall, the results highlight the potential of tools from the field of natural language processing for the development of highly reliable, cost-effective diagnostic tools suitable for collecting high-quality measurement data for research and clinical purposes.
Pérez-Castillo, Yunierkis; Lazar, Cosmin; Taminau, Jonatan; Froeyen, Mathy; Cabrera-Pérez, Miguel Ángel; Nowé, Ann
2012-09-24
Computer-aided drug design has become an important component of the drug discovery process. Despite the advances in this field, there is not a unique modeling approach that can be successfully applied to solve the whole range of problems faced during QSAR modeling. Feature selection and ensemble modeling are active areas of research in ligand-based drug design. Here we introduce the GA(M)E-QSAR algorithm that combines the search and optimization capabilities of Genetic Algorithms with the simplicity of the Adaboost ensemble-based classification algorithm to solve binary classification problems. We also explore the usefulness of Meta-Ensembles trained with Adaboost and Voting schemes to further improve the accuracy, generalization, and robustness of the optimal Adaboost Single Ensemble derived from the Genetic Algorithm optimization. We evaluated the performance of our algorithm using five data sets from the literature and found that it is capable of yielding similar or better classification results to what has been reported for these data sets with a higher enrichment of active compounds relative to the whole actives subset when only the most active chemicals are considered. More important, we compared our methodology with state of the art feature selection and classification approaches and found that it can provide highly accurate, robust, and generalizable models. In the case of the Adaboost Ensembles derived from the Genetic Algorithm search, the final models are quite simple since they consist of a weighted sum of the output of single feature classifiers. Furthermore, the Adaboost scores can be used as ranking criterion to prioritize chemicals for synthesis and biological evaluation after virtual screening experiments.
Benchmarking contactless acquisition sensor reproducibility for latent fingerprint trace evidence
NASA Astrophysics Data System (ADS)
Hildebrandt, Mario; Dittmann, Jana
2015-03-01
Optical, nano-meter range, contactless, non-destructive sensor devices are promising acquisition techniques in crime scene trace forensics, e.g. for digitizing latent fingerprint traces. Before new approaches are introduced in crime investigations, innovations need to be positively tested and quality ensured. In this paper we investigate sensor reproducibility by studying different scans from four sensors: two chromatic white light sensors (CWL600/CWL1mm), one confocal laser scanning microscope, and one NIR/VIS/UV reflection spectrometer. Firstly, we perform an intra-sensor reproducibility testing for CWL600 with a privacy conform test set of artificial-sweat printed, computer generated fingerprints. We use 24 different fingerprint patterns as original samples (printing samples/templates) for printing with artificial sweat (physical trace samples) and their acquisition with contactless sensory resulting in 96 sensor images, called scan or acquired samples. The second test set for inter-sensor reproducibility assessment consists of the first three patterns from the first test set, acquired in two consecutive scans using each device. We suggest using a simple feature space set in spatial and frequency domain known from signal processing and test its suitability for six different classifiers classifying scan data into small differences (reproducible) and large differences (non-reproducible). Furthermore, we suggest comparing the classification results with biometric verification scores (calculated with NBIS, with threshold of 40) as biometric reproducibility score. The Bagging classifier is nearly for all cases the most reliable classifier in our experiments and the results are also confirmed with the biometric matching rates.
Adaptive sleep-wake discrimination for wearable devices.
Karlen, Walter; Floreano, Dario
2011-04-01
Sleep/wake classification systems that rely on physiological signals suffer from intersubject differences that make accurate classification with a single, subject-independent model difficult. To overcome the limitations of intersubject variability, we suggest a novel online adaptation technique that updates the sleep/wake classifier in real time. The objective of the present study was to evaluate the performance of a newly developed adaptive classification algorithm that was embedded on a wearable sleep/wake classification system called SleePic. The algorithm processed ECG and respiratory effort signals for the classification task and applied behavioral measurements (obtained from accelerometer and press-button data) for the automatic adaptation task. When trained as a subject-independent classifier algorithm, the SleePic device was only able to correctly classify 74.94 ± 6.76% of the human-rated sleep/wake data. By using the suggested automatic adaptation method, the mean classification accuracy could be significantly improved to 92.98 ± 3.19%. A subject-independent classifier based on activity data only showed a comparable accuracy of 90.44 ± 3.57%. We demonstrated that subject-independent models used for online sleep-wake classification can successfully be adapted to previously unseen subjects without the intervention of human experts or off-line calibration.
Spectral-spatial classification of hyperspectral imagery with cooperative game
NASA Astrophysics Data System (ADS)
Zhao, Ji; Zhong, Yanfei; Jia, Tianyi; Wang, Xinyu; Xu, Yao; Shu, Hong; Zhang, Liangpei
2018-01-01
Spectral-spatial classification is known to be an effective way to improve classification performance by integrating spectral information and spatial cues for hyperspectral imagery. In this paper, a game-theoretic spectral-spatial classification algorithm (GTA) using a conditional random field (CRF) model is presented, in which CRF is used to model the image considering the spatial contextual information, and a cooperative game is designed to obtain the labels. The algorithm establishes a one-to-one correspondence between image classification and game theory. The pixels of the image are considered as the players, and the labels are considered as the strategies in a game. Similar to the idea of soft classification, the uncertainty is considered to build the expected energy model in the first step. The local expected energy can be quickly calculated, based on a mixed strategy for the pixels, to establish the foundation for a cooperative game. Coalitions can then be formed by the designed merge rule based on the local expected energy, so that a majority game can be performed to make a coalition decision to obtain the label of each pixel. The experimental results on three hyperspectral data sets demonstrate the effectiveness of the proposed classification algorithm.
Medical image classification based on multi-scale non-negative sparse coding.
Zhang, Ruijie; Shen, Jian; Wei, Fushan; Li, Xiong; Sangaiah, Arun Kumar
2017-11-01
With the rapid development of modern medical imaging technology, medical image classification has become more and more important in medical diagnosis and clinical practice. Conventional medical image classification algorithms usually neglect the semantic gap problem between low-level features and high-level image semantic, which will largely degrade the classification performance. To solve this problem, we propose a multi-scale non-negative sparse coding based medical image classification algorithm. Firstly, Medical images are decomposed into multiple scale layers, thus diverse visual details can be extracted from different scale layers. Secondly, for each scale layer, the non-negative sparse coding model with fisher discriminative analysis is constructed to obtain the discriminative sparse representation of medical images. Then, the obtained multi-scale non-negative sparse coding features are combined to form a multi-scale feature histogram as the final representation for a medical image. Finally, SVM classifier is combined to conduct medical image classification. The experimental results demonstrate that our proposed algorithm can effectively utilize multi-scale and contextual spatial information of medical images, reduce the semantic gap in a large degree and improve medical image classification performance. Copyright © 2017 Elsevier B.V. All rights reserved.
Wang, Shuaiqun; Aorigele; Kong, Wei; Zeng, Weiming; Hong, Xiaomin
2016-01-01
Gene expression data composed of thousands of genes play an important role in classification platforms and disease diagnosis. Hence, it is vital to select a small subset of salient features over a large number of gene expression data. Lately, many researchers devote themselves to feature selection using diverse computational intelligence methods. However, in the progress of selecting informative genes, many computational methods face difficulties in selecting small subsets for cancer classification due to the huge number of genes (high dimension) compared to the small number of samples, noisy genes, and irrelevant genes. In this paper, we propose a new hybrid algorithm HICATS incorporating imperialist competition algorithm (ICA) which performs global search and tabu search (TS) that conducts fine-tuned search. In order to verify the performance of the proposed algorithm HICATS, we have tested it on 10 well-known benchmark gene expression classification datasets with dimensions varying from 2308 to 12600. The performance of our proposed method proved to be superior to other related works including the conventional version of binary optimization algorithm in terms of classification accuracy and the number of selected genes.
Aorigele; Zeng, Weiming; Hong, Xiaomin
2016-01-01
Gene expression data composed of thousands of genes play an important role in classification platforms and disease diagnosis. Hence, it is vital to select a small subset of salient features over a large number of gene expression data. Lately, many researchers devote themselves to feature selection using diverse computational intelligence methods. However, in the progress of selecting informative genes, many computational methods face difficulties in selecting small subsets for cancer classification due to the huge number of genes (high dimension) compared to the small number of samples, noisy genes, and irrelevant genes. In this paper, we propose a new hybrid algorithm HICATS incorporating imperialist competition algorithm (ICA) which performs global search and tabu search (TS) that conducts fine-tuned search. In order to verify the performance of the proposed algorithm HICATS, we have tested it on 10 well-known benchmark gene expression classification datasets with dimensions varying from 2308 to 12600. The performance of our proposed method proved to be superior to other related works including the conventional version of binary optimization algorithm in terms of classification accuracy and the number of selected genes. PMID:27579323
QUEST: Eliminating Online Supervised Learning for Efficient Classification Algorithms.
Zwartjes, Ardjan; Havinga, Paul J M; Smit, Gerard J M; Hurink, Johann L
2016-10-01
In this work, we introduce QUEST (QUantile Estimation after Supervised Training), an adaptive classification algorithm for Wireless Sensor Networks (WSNs) that eliminates the necessity for online supervised learning. Online processing is important for many sensor network applications. Transmitting raw sensor data puts high demands on the battery, reducing network life time. By merely transmitting partial results or classifications based on the sampled data, the amount of traffic on the network can be significantly reduced. Such classifications can be made by learning based algorithms using sampled data. An important issue, however, is the training phase of these learning based algorithms. Training a deployed sensor network requires a lot of communication and an impractical amount of human involvement. QUEST is a hybrid algorithm that combines supervised learning in a controlled environment with unsupervised learning on the location of deployment. Using the SITEX02 dataset, we demonstrate that the presented solution works with a performance penalty of less than 10% in 90% of the tests. Under some circumstances, it even outperforms a network of classifiers completely trained with supervised learning. As a result, the need for on-site supervised learning and communication for training is completely eliminated by our solution.
Constructing an Indoor Floor Plan Using Crowdsourcing Based on Magnetic Fingerprinting
Zhao, Fang; Jiang, Mengling; Ma, Hao; Zhang, Yuexia
2017-01-01
A large number of indoor positioning systems have recently been developed to cater for various location-based services. Indoor maps are a prerequisite of such indoor positioning systems; however, indoor maps are currently non-existent for most indoor environments. Construction of an indoor map by external experts excludes quick deployment and prevents widespread utilization of indoor localization systems. Here, we propose an algorithm for the automatic construction of an indoor floor plan, together with a magnetic fingerprint map of unmapped buildings using crowdsourced smartphone data. For floor plan construction, our system combines the use of dead reckoning technology, an observation model with geomagnetic signals, and trajectory fusion based on an affinity propagation algorithm. To obtain the indoor paths, the magnetic trajectory data obtained through crowdsourcing were first clustered using dynamic time warping similarity criteria. The trajectories were inferred from odometry tracing, and those belonging to the same cluster in the magnetic trajectory domain were then fused. Fusing these data effectively eliminates the inherent tracking errors originating from noisy sensors; as a result, we obtained highly accurate indoor paths. One advantage of our system is that no additional hardware such as a laser rangefinder or wheel encoder is required. Experimental results demonstrate that our proposed algorithm successfully constructs indoor floor plans with 0.48 m accuracy, which could benefit location-based services which lack indoor maps. PMID:29156639
A robust fingerprint matching algorithm based on compatibility of star structures
NASA Astrophysics Data System (ADS)
Cao, Jia; Feng, Jufu
2009-10-01
In fingerprint verification or identification systems, most minutiae-based matching algorithms suffered from the problems of non-linear distortion and missing or faking minutiae. Local structures such as triangle or k-nearest structure are widely used to reduce the impact of non-linear distortion, but are suffered from missing and faking minutiae. In our proposed method, star structure is used to present local structure. A star structure contains various number of minutiae, thus, it is more robust with missing and faking minutiae. Our method consists of four steps: 1) Constructing star structures at minutia level; 2) Computing similarity score for each structure pair, and eliminating impostor matched pairs which have the low scores. As it is generally assumed that there is only linear distortion in local area, the similarity is defined by rotation and shifting. 3) Voting for remained matched pairs according to the compatibility between them, and eliminating impostor matched pairs which gain few votes. The concept of compatibility is first introduced by Yansong Feng [4], the original definition is only based on triangles. We define the compatibility for star structures to adjust to our proposed algorithm. 4) Computing the matching score, based on the number of matched structures and their voting scores. The score also reflects the fact that, it should get higher score if minutiae match in more intensive areas. Experiments evaluated on FVC 2004 show both effectiveness and efficiency of our methods.
NASA Astrophysics Data System (ADS)
Lazariev, A.; Allouche, A.-R.; Aubert-Frécon, M.; Fauvelle, F.; Piotto, M.; Elbayed, K.; Namer, I.-J.; van Ormondt, D.; Graveron-Demilly, D.
2011-11-01
High-resolution magic angle spinning (HRMAS) nuclear magnetic resonance (NMR) is playing an increasingly important role for diagnosis. This technique enables setting up metabolite profiles of ex vivo pathological and healthy tissue. The need to monitor diseases and pharmaceutical follow-up requires an automatic quantitation of HRMAS 1H signals. However, for several metabolites, the values of chemical shifts of proton groups may slightly differ according to the micro-environment in the tissue or cells, in particular to its pH. This hampers the accurate estimation of the metabolite concentrations mainly when using quantitation algorithms based on a metabolite basis set: the metabolite fingerprints are not correct anymore. In this work, we propose an accurate method coupling quantum mechanical simulations and quantitation algorithms to handle basis-set changes. The proposed algorithm automatically corrects mismatches between the signals of the simulated basis set and the signal under analysis by maximizing the normalized cross-correlation between the mentioned signals. Optimized chemical shift values of the metabolites are obtained. This method, QM-QUEST, provides more robust fitting while limiting user involvement and respects the correct fingerprints of metabolites. Its efficiency is demonstrated by accurately quantitating 33 signals from tissue samples of human brains with oligodendroglioma, obtained at 11.7 tesla. The corresponding chemical shift changes of several metabolites within the series are also analyzed.
Evaluation of an Algorithm to Predict Menstrual-Cycle Phase at the Time of Injury.
Tourville, Timothy W; Shultz, Sandra J; Vacek, Pamela M; Knudsen, Emily J; Bernstein, Ira M; Tourville, Kelly J; Hardy, Daniel M; Johnson, Robert J; Slauterbeck, James R; Beynnon, Bruce D
2016-01-01
Women are 2 to 8 times more likely to sustain an anterior cruciate ligament (ACL) injury than men, and previous studies indicated an increased risk for injury during the preovulatory phase of the menstrual cycle (MC). However, investigations of risk rely on retrospective classification of MC phase, and no tools for this have been validated. To evaluate the accuracy of an algorithm for retrospectively classifying MC phase at the time of a mock injury based on MC history and salivary progesterone (P4) concentration. Descriptive laboratory study. Research laboratory. Thirty-one healthy female collegiate athletes (age range, 18-24 years) provided serum or saliva (or both) samples at 8 visits over 1 complete MC. Self-reported MC information was obtained on a randomized date (1-45 days) after mock injury, which is the typical timeframe in which researchers have access to ACL-injured study participants. The MC phase was classified using the algorithm as applied in a stand-alone computational fashion and also by 4 clinical experts using the algorithm and additional subjective hormonal history information to help inform their decision. To assess algorithm accuracy, phase classifications were compared with the actual MC phase at the time of mock injury (ascertained using urinary luteinizing hormone tests and serial serum P4 samples). Clinical expert and computed classifications were compared using κ statistics. Fourteen participants (45%) experienced anovulatory cycles. The algorithm correctly classified MC phase for 23 participants (74%): 22 (76%) of 29 who were preovulatory/anovulatory and 1 (50%) of 2 who were postovulatory. Agreement between expert and algorithm classifications ranged from 80.6% (κ = 0.50) to 93% (κ = 0.83). Classifications based on same-day saliva sample and optimal P4 threshold were the same as those based on MC history alone (87.1% correct). Algorithm accuracy varied during the MC but at no time were both sensitivity and specificity levels acceptable. These findings raise concerns about the accuracy of previous retrospective MC-phase classification systems, particularly in a population with a high occurrence of anovulatory cycles.
The influence of negative training set size on machine learning-based virtual screening.
Kurczab, Rafał; Smusz, Sabina; Bojarski, Andrzej J
2014-01-01
The paper presents a thorough analysis of the influence of the number of negative training examples on the performance of machine learning methods. The impact of this rather neglected aspect of machine learning methods application was examined for sets containing a fixed number of positive and a varying number of negative examples randomly selected from the ZINC database. An increase in the ratio of positive to negative training instances was found to greatly influence most of the investigated evaluating parameters of ML methods in simulated virtual screening experiments. In a majority of cases, substantial increases in precision and MCC were observed in conjunction with some decreases in hit recall. The analysis of dynamics of those variations let us recommend an optimal composition of training data. The study was performed on several protein targets, 5 machine learning algorithms (SMO, Naïve Bayes, Ibk, J48 and Random Forest) and 2 types of molecular fingerprints (MACCS and CDK FP). The most effective classification was provided by the combination of CDK FP with SMO or Random Forest algorithms. The Naïve Bayes models appeared to be hardly sensitive to changes in the number of negative instances in the training set. In conclusion, the ratio of positive to negative training instances should be taken into account during the preparation of machine learning experiments, as it might significantly influence the performance of particular classifier. What is more, the optimization of negative training set size can be applied as a boosting-like approach in machine learning-based virtual screening.
The influence of negative training set size on machine learning-based virtual screening
2014-01-01
Background The paper presents a thorough analysis of the influence of the number of negative training examples on the performance of machine learning methods. Results The impact of this rather neglected aspect of machine learning methods application was examined for sets containing a fixed number of positive and a varying number of negative examples randomly selected from the ZINC database. An increase in the ratio of positive to negative training instances was found to greatly influence most of the investigated evaluating parameters of ML methods in simulated virtual screening experiments. In a majority of cases, substantial increases in precision and MCC were observed in conjunction with some decreases in hit recall. The analysis of dynamics of those variations let us recommend an optimal composition of training data. The study was performed on several protein targets, 5 machine learning algorithms (SMO, Naïve Bayes, Ibk, J48 and Random Forest) and 2 types of molecular fingerprints (MACCS and CDK FP). The most effective classification was provided by the combination of CDK FP with SMO or Random Forest algorithms. The Naïve Bayes models appeared to be hardly sensitive to changes in the number of negative instances in the training set. Conclusions In conclusion, the ratio of positive to negative training instances should be taken into account during the preparation of machine learning experiments, as it might significantly influence the performance of particular classifier. What is more, the optimization of negative training set size can be applied as a boosting-like approach in machine learning-based virtual screening. PMID:24976867
Wolski, Witold E; Lalowski, Maciej; Jungblut, Peter; Reinert, Knut
2005-01-01
Background Peptide Mass Fingerprinting (PMF) is a widely used mass spectrometry (MS) method of analysis of proteins and peptides. It relies on the comparison between experimentally determined and theoretical mass spectra. The PMF process requires calibration, usually performed with external or internal calibrants of known molecular masses. Results We have introduced two novel MS calibration methods. The first method utilises the local similarity of peptide maps generated after separation of complex protein samples by two-dimensional gel electrophoresis. It computes a multiple peak-list alignment of the data set using a modified Minimum Spanning Tree (MST) algorithm. The second method exploits the idea that hundreds of MS samples are measured in parallel on one sample support. It improves the calibration coefficients by applying a two-dimensional Thin Plate Splines (TPS) smoothing algorithm. We studied the novel calibration methods utilising data generated by three different MALDI-TOF-MS instruments. We demonstrate that a PMF data set can be calibrated without resorting to external or relying on widely occurring internal calibrants. The methods developed here were implemented in R and are part of the BioConductor package mscalib available from . Conclusion The MST calibration algorithm is well suited to calibrate MS spectra of protein samples resulting from two-dimensional gel electrophoretic separation. The TPS based calibration algorithm might be used to correct systematic mass measurement errors observed for large MS sample supports. As compared to other methods, our combined MS spectra calibration strategy increases the peptide/protein identification rate by an additional 5 – 15%. PMID:16102175
Likelihood ratio data to report the validation of a forensic fingerprint evaluation method.
Ramos, Daniel; Haraksim, Rudolf; Meuwly, Didier
2017-02-01
Data to which the authors refer to throughout this article are likelihood ratios (LR) computed from the comparison of 5-12 minutiae fingermarks with fingerprints. These LRs data are used for the validation of a likelihood ratio (LR) method in forensic evidence evaluation. These data present a necessary asset for conducting validation experiments when validating LR methods used in forensic evidence evaluation and set up validation reports. These data can be also used as a baseline for comparing the fingermark evidence in the same minutiae configuration as presented in (D. Meuwly, D. Ramos, R. Haraksim,) [1], although the reader should keep in mind that different feature extraction algorithms and different AFIS systems used may produce different LRs values. Moreover, these data may serve as a reproducibility exercise, in order to train the generation of validation reports of forensic methods, according to [1]. Alongside the data, a justification and motivation for the use of methods is given. These methods calculate LRs from the fingerprint/mark data and are subject to a validation procedure. The choice of using real forensic fingerprint in the validation and simulated data in the development is described and justified. Validation criteria are set for the purpose of validation of the LR methods, which are used to calculate the LR values from the data and the validation report. For privacy and data protection reasons, the original fingerprint/mark images cannot be shared. But these images do not constitute the core data for the validation, contrarily to the LRs that are shared.
2013-05-28
those of the support vector machine and relevance vector machine, and the model runs more quickly than the other algorithms . When one class occurs...incremental support vector machine algorithm for online learning when fewer than 50 data points are available. (a) Papers published in peer-reviewed journals...learning environments, where data processing occurs one observation at a time and the classification algorithm improves over time with new
Adaptive phase k-means algorithm for waveform classification
NASA Astrophysics Data System (ADS)
Song, Chengyun; Liu, Zhining; Wang, Yaojun; Xu, Feng; Li, Xingming; Hu, Guangmin
2018-01-01
Waveform classification is a powerful technique for seismic facies analysis that describes the heterogeneity and compartments within a reservoir. Horizon interpretation is a critical step in waveform classification. However, the horizon often produces inconsistent waveform phase, and thus results in an unsatisfied classification. To alleviate this problem, an adaptive phase waveform classification method called the adaptive phase k-means is introduced in this paper. Our method improves the traditional k-means algorithm using an adaptive phase distance for waveform similarity measure. The proposed distance is a measure with variable phases as it moves from sample to sample along the traces. Model traces are also updated with the best phase interference in the iterative process. Therefore, our method is robust to phase variations caused by the interpretation horizon. We tested the effectiveness of our algorithm by applying it to synthetic and real data. The satisfactory results reveal that the proposed method tolerates certain waveform phase variation and is a good tool for seismic facies analysis.
Fuzzy support vector machine: an efficient rule-based classification technique for microarrays.
Hajiloo, Mohsen; Rabiee, Hamid R; Anooshahpour, Mahdi
2013-01-01
The abundance of gene expression microarray data has led to the development of machine learning algorithms applicable for tackling disease diagnosis, disease prognosis, and treatment selection problems. However, these algorithms often produce classifiers with weaknesses in terms of accuracy, robustness, and interpretability. This paper introduces fuzzy support vector machine which is a learning algorithm based on combination of fuzzy classifiers and kernel machines for microarray classification. Experimental results on public leukemia, prostate, and colon cancer datasets show that fuzzy support vector machine applied in combination with filter or wrapper feature selection methods develops a robust model with higher accuracy than the conventional microarray classification models such as support vector machine, artificial neural network, decision trees, k nearest neighbors, and diagonal linear discriminant analysis. Furthermore, the interpretable rule-base inferred from fuzzy support vector machine helps extracting biological knowledge from microarray data. Fuzzy support vector machine as a new classification model with high generalization power, robustness, and good interpretability seems to be a promising tool for gene expression microarray classification.
Computational approaches for the classification of seed storage proteins.
Radhika, V; Rao, V Sree Hari
2015-07-01
Seed storage proteins comprise a major part of the protein content of the seed and have an important role on the quality of the seed. These storage proteins are important because they determine the total protein content and have an effect on the nutritional quality and functional properties for food processing. Transgenic plants are being used to develop improved lines for incorporation into plant breeding programs and the nutrient composition of seeds is a major target of molecular breeding programs. Hence, classification of these proteins is crucial for the development of superior varieties with improved nutritional quality. In this study we have applied machine learning algorithms for classification of seed storage proteins. We have presented an algorithm based on nearest neighbor approach for classification of seed storage proteins and compared its performance with decision tree J48, multilayer perceptron neural (MLP) network and support vector machine (SVM) libSVM. The model based on our algorithm has been able to give higher classification accuracy in comparison to the other methods.
Vlsi implementation of flexible architecture for decision tree classification in data mining
NASA Astrophysics Data System (ADS)
Sharma, K. Venkatesh; Shewandagn, Behailu; Bhukya, Shankar Nayak
2017-07-01
The Data mining algorithms have become vital to researchers in science, engineering, medicine, business, search and security domains. In recent years, there has been a terrific raise in the size of the data being collected and analyzed. Classification is the main difficulty faced in data mining. In a number of the solutions developed for this problem, most accepted one is Decision Tree Classification (DTC) that gives high precision while handling very large amount of data. This paper presents VLSI implementation of flexible architecture for Decision Tree classification in data mining using c4.5 algorithm.
On the use of harmony search algorithm in the training of wavelet neural networks
NASA Astrophysics Data System (ADS)
Lai, Kee Huong; Zainuddin, Zarita; Ong, Pauline
2015-10-01
Wavelet neural networks (WNNs) are a class of feedforward neural networks that have been used in a wide range of industrial and engineering applications to model the complex relationships between the given inputs and outputs. The training of WNNs involves the configuration of the weight values between neurons. The backpropagation training algorithm, which is a gradient-descent method, can be used for this training purpose. Nonetheless, the solutions found by this algorithm often get trapped at local minima. In this paper, a harmony search-based algorithm is proposed for the training of WNNs. The training of WNNs, thus can be formulated as a continuous optimization problem, where the objective is to maximize the overall classification accuracy. Each candidate solution proposed by the harmony search algorithm represents a specific WNN architecture. In order to speed up the training process, the solution space is divided into disjoint partitions during the random initialization step of harmony search algorithm. The proposed training algorithm is tested onthree benchmark problems from the UCI machine learning repository, as well as one real life application, namely, the classification of electroencephalography signals in the task of epileptic seizure detection. The results obtained show that the proposed algorithm outperforms the traditional harmony search algorithm in terms of overall classification accuracy.
New Dandelion Algorithm Optimizes Extreme Learning Machine for Biomedical Classification Problems
Li, Xiguang; Zhao, Liang; Gong, Changqing; Liu, Xiaojing
2017-01-01
Inspired by the behavior of dandelion sowing, a new novel swarm intelligence algorithm, namely, dandelion algorithm (DA), is proposed for global optimization of complex functions in this paper. In DA, the dandelion population will be divided into two subpopulations, and different subpopulations will undergo different sowing behaviors. Moreover, another sowing method is designed to jump out of local optimum. In order to demonstrate the validation of DA, we compare the proposed algorithm with other existing algorithms, including bat algorithm, particle swarm optimization, and enhanced fireworks algorithm. Simulations show that the proposed algorithm seems much superior to other algorithms. At the same time, the proposed algorithm can be applied to optimize extreme learning machine (ELM) for biomedical classification problems, and the effect is considerable. At last, we use different fusion methods to form different fusion classifiers, and the fusion classifiers can achieve higher accuracy and better stability to some extent. PMID:29085425
Strength in Numbers: Using Big Data to Simplify Sentiment Classification.
Filippas, Apostolos; Lappas, Theodoros
2017-09-01
Sentiment classification, the task of assigning a positive or negative label to a text segment, is a key component of mainstream applications such as reputation monitoring, sentiment summarization, and item recommendation. Even though the performance of sentiment classification methods has steadily improved over time, their ever-increasing complexity renders them comprehensible by only a shrinking minority of expert practitioners. For all others, such highly complex methods are black-box predictors that are hard to tune and even harder to justify to decision makers. Motivated by these shortcomings, we introduce BigCounter: a new algorithm for sentiment classification that substitutes algorithmic complexity with Big Data. Our algorithm combines standard data structures with statistical testing to deliver accurate and interpretable predictions. It is also parameter free and suitable for use virtually "out of the box," which makes it appealing for organizations wanting to leverage their troves of unstructured data without incurring the significant expense of creating in-house teams of data scientists. Finally, BigCounter's efficient and parallelizable design makes it applicable to very large data sets. We apply our method on such data sets toward a study on the limits of Big Data for sentiment classification. Our study finds that, after a certain point, predictive performance tends to converge and additional data have little benefit. Our algorithmic design and findings provide the foundations for future research on the data-over-computation paradigm for classification problems.
System and method for resolving gamma-ray spectra
Gentile, Charles A.; Perry, Jason; Langish, Stephen W.; Silber, Kenneth; Davis, William M.; Mastrovito, Dana
2010-05-04
A system for identifying radionuclide emissions is described. The system includes at least one processor for processing output signals from a radionuclide detecting device, at least one training algorithm run by the at least one processor for analyzing data derived from at least one set of known sample data from the output signals, at least one classification algorithm derived from the training algorithm for classifying unknown sample data, wherein the at least one training algorithm analyzes the at least one sample data set to derive at least one rule used by said classification algorithm for identifying at least one radionuclide emission detected by the detecting device.
NASA Astrophysics Data System (ADS)
Kirst, Stefan; Vielhauer, Claus
2015-03-01
In digitized forensics the support of investigators in any manner is one of the main goals. Using conservative lifting methods, the detection of traces is done manually. For non-destructive contactless methods, the necessity for detecting traces is obvious for further biometric analysis. High resolutional 3D confocal laser scanning microscopy (CLSM) grants the possibility for a detection by segmentation approach with improved detection results. Optimal scan results with CLSM are achieved on surfaces orthogonal to the sensor, which is not always possible due to environmental circumstances or the surface's shape. This introduces additional noise, outliers and a lack of contrast, making a detection of traces even harder. Prior work showed the possibility of determining angle-independent classification models for the detection of latent fingerprints (LFP). Enhancing this approach, we introduce a larger feature space containing a variety of statistical-, roughness-, color-, edge-directivity-, histogram-, Gabor-, gradient- and Tamura features based on raw data and gray-level co-occurrence matrices (GLCM) using high resolutional data. Our test set consists of eight different surfaces for the detection of LFP in four different acquisition angles with a total of 1920 single scans. For each surface and angles in steps of 10, we capture samples from five donors to introduce variance by a variety of sweat compositions and application influences such as pressure or differences in ridge thickness. By analyzing the present test set with our approach, we intend to determine angle- and substrate-dependent classification models to determine optimal surface specific acquisition setups and also classification models for a general detection purpose for both, angles and substrates. The results on overall models with classification rates up to 75.15% (kappa 0.50) already show a positive tendency regarding the usability of the proposed methods for LFP detection on varying surfaces in non-planar scenarios.
Sharma, Ashok K; Srivastava, Gopal N; Roy, Ankita; Sharma, Vineet K
2017-01-01
The experimental methods for the prediction of molecular toxicity are tedious and time-consuming tasks. Thus, the computational approaches could be used to develop alternative methods for toxicity prediction. We have developed a tool for the prediction of molecular toxicity along with the aqueous solubility and permeability of any molecule/metabolite. Using a comprehensive and curated set of toxin molecules as a training set, the different chemical and structural based features such as descriptors and fingerprints were exploited for feature selection, optimization and development of machine learning based classification and regression models. The compositional differences in the distribution of atoms were apparent between toxins and non-toxins, and hence, the molecular features were used for the classification and regression. On 10-fold cross-validation, the descriptor-based, fingerprint-based and hybrid-based classification models showed similar accuracy (93%) and Matthews's correlation coefficient (0.84). The performances of all the three models were comparable (Matthews's correlation coefficient = 0.84-0.87) on the blind dataset. In addition, the regression-based models using descriptors as input features were also compared and evaluated on the blind dataset. Random forest based regression model for the prediction of solubility performed better ( R 2 = 0.84) than the multi-linear regression (MLR) and partial least square regression (PLSR) models, whereas, the partial least squares based regression model for the prediction of permeability (caco-2) performed better ( R 2 = 0.68) in comparison to the random forest and MLR based regression models. The performance of final classification and regression models was evaluated using the two validation datasets including the known toxins and commonly used constituents of health products, which attests to its accuracy. The ToxiM web server would be a highly useful and reliable tool for the prediction of toxicity, solubility, and permeability of small molecules.
Sharma, Ashok K.; Srivastava, Gopal N.; Roy, Ankita; Sharma, Vineet K.
2017-01-01
The experimental methods for the prediction of molecular toxicity are tedious and time-consuming tasks. Thus, the computational approaches could be used to develop alternative methods for toxicity prediction. We have developed a tool for the prediction of molecular toxicity along with the aqueous solubility and permeability of any molecule/metabolite. Using a comprehensive and curated set of toxin molecules as a training set, the different chemical and structural based features such as descriptors and fingerprints were exploited for feature selection, optimization and development of machine learning based classification and regression models. The compositional differences in the distribution of atoms were apparent between toxins and non-toxins, and hence, the molecular features were used for the classification and regression. On 10-fold cross-validation, the descriptor-based, fingerprint-based and hybrid-based classification models showed similar accuracy (93%) and Matthews's correlation coefficient (0.84). The performances of all the three models were comparable (Matthews's correlation coefficient = 0.84–0.87) on the blind dataset. In addition, the regression-based models using descriptors as input features were also compared and evaluated on the blind dataset. Random forest based regression model for the prediction of solubility performed better (R2 = 0.84) than the multi-linear regression (MLR) and partial least square regression (PLSR) models, whereas, the partial least squares based regression model for the prediction of permeability (caco-2) performed better (R2 = 0.68) in comparison to the random forest and MLR based regression models. The performance of final classification and regression models was evaluated using the two validation datasets including the known toxins and commonly used constituents of health products, which attests to its accuracy. The ToxiM web server would be a highly useful and reliable tool for the prediction of toxicity, solubility, and permeability of small molecules. PMID:29249969
1945-03-07
Picture (285) ....... •"■*’ Cameraman, Motion Picture (043) 115 Canvas Cover Renairuan (OhU) ■ * Car Carpenter, Railway (046) i<" Car Mechanic...Film Editor, Motion Picture (l3l) .,,,.,♦ * 15 Filter Operator, ^ tor Supply (O83). # 10 Fingerprinter (307) ’. . * 30 Fire Fighter (383) ,. 128...Mechanic (322) .... Registered Nurse (225) ....... Repairman, Camera (042) Repairman, Canvas Cover (044) . . . Repairman, Central. Of fice (095
(GTG)5-PCR reference framework for acetic acid bacteria.
Papalexandratou, Zoi; Cleenwerck, Ilse; De Vos, Paul; De Vuyst, Luc
2009-11-01
One hundred and fifty-eight strains of acetic acid bacteria (AAB) were subjected to (GTG)(5)-PCR fingerprinting to construct a reference framework for their rapid classification and identification. Most of them clustered according to their respective taxonomic designation; others had to be reclassified based on polyphasic data. This study shows the usefulness of the method to determine the taxonomic and phylogenetic relationships among AAB and to study the AAB diversity of complex ecosystems.
Mohapatra, Bidyut R; Broersma, Klaas; Mazumder, Asit
2008-04-01
Determination of the non-point sources of fecal pollution is essential for the assessment of potential public health risk and development of appropriate management practices for prevention of further contamination. Repetitive extragenic palindromic-PCR coupled with (GTG)(5) primer [(GTG)(5)-PCR] was performed on 573 Escherichia coli isolates obtained from the feces of poultry (chicken, duck and turkey) and free-living (Canada goose, hawk, magpie, seagull and songbird) birds to evaluate the efficacy of (GTG)(5)-PCR genomic fingerprinting in the prediction of the correct source of fecal pollution. A discriminant analysis with the jack-knife algorithm of (GTG)(5)-PCR DNA fingerprints revealed that 95%, 94.1%, 93.2%, 84.6%, 79.7%, 76.7%, 75.3% and 70.7% of magpie, hawk, turkey, seagull, Canada goose, chicken, duck and songbird fecal E. coli isolates classified into the correct host source, respectively. The results of this study indicate that (GTG)(5)-PCR can be considered to be a complementary molecular tool for the rapid determination of E. coli isolates identity and tracking the non-point sources of fecal pollution.
Particle analysis using laser ablation mass spectroscopy
Parker, Eric P.; Rosenthal, Stephen E.; Trahan, Michael W.; Wagner, John S.
2003-09-09
The present invention provides a method of quickly identifying bioaerosols by class, even if the subject bioaerosol has not been previously encountered. The method begins by collecting laser ablation mass spectra from known particles. The spectra are correlated with the known particles, including the species of particle and the classification (e.g., bacteria). The spectra can then be used to train a neural network, for example using genetic algorithm-based training, to recognize each spectra and to recognize characteristics of the classifications. The spectra can also be used in a multivariate patch algorithm. Laser ablation mass specta from unknown particles can be presented as inputs to the trained neural net for identification as to classification. The description below first describes suitable intelligent algorithms and multivariate patch algorithms, then presents an example of the present invention including results.
A Theoretical Analysis of Why Hybrid Ensembles Work.
Hsu, Kuo-Wei
2017-01-01
Inspired by the group decision making process, ensembles or combinations of classifiers have been found favorable in a wide variety of application domains. Some researchers propose to use the mixture of two different types of classification algorithms to create a hybrid ensemble. Why does such an ensemble work? The question remains. Following the concept of diversity, which is one of the fundamental elements of the success of ensembles, we conduct a theoretical analysis of why hybrid ensembles work, connecting using different algorithms to accuracy gain. We also conduct experiments on classification performance of hybrid ensembles of classifiers created by decision tree and naïve Bayes classification algorithms, each of which is a top data mining algorithm and often used to create non-hybrid ensembles. Therefore, through this paper, we provide a complement to the theoretical foundation of creating and using hybrid ensembles.
Carlson, Jay D; Mittek, Mateusz; Parkison, Steven A; Sathler, Pedro; Bayne, David; Psota, Eric T; Perez, Lance C; Bonasera, Stephen J
2014-01-01
As a first step toward building a smart home behavioral monitoring system capable of classifying a wide variety of human behavior, a wireless sensor network (WSN) system is presented for RSSI localization. The low-cost, non-intrusive system uses a smart watch worn by the user to broadcast data to the WSN, where the strength of the radio signal is evaluated at each WSN node to localize the user. A method is presented that uses simultaneous localization and mapping (SLAM) for system calibration, providing automated fingerprinting associating the radio signal strength patterns to the user's location within the living space. To improve the accuracy of localization, a novel refinement technique is introduced that takes into account typical movement patterns of people within their homes. Experimental results demonstrate that the system is capable of providing accurate localization results in a typical living space.
Sun, Zong-Liang; Zhang, Yu-Zhen; Zhang, Feng; Zhang, Jia-Wei; Zheng, Guo-Can; Tan, Ling; Wang, Chong-Zhi; Zhou, Lian-Di; Zhang, Qi-Hui; Yuan, Chun-Su
2018-06-22
An efficient method combined with fingerprint and chemometric analyses was developed to evaluate the quality of the traditional Chinese medicine plant Penthorum chinense Pursh. Nine samples were collected from different regions during different harvest periods, and 17 components in the form of extracts were simultaneously examined to assess quality by using high-performance liquid chromatography. The hepatoprotective effects of components were investigated by assessing the inhibition of SMMC-7721 cell growth. The results indicated that the quality control method was accurate, stable, and reliable, and the hierarchical heat-map cluster and the principle component analyses confirmed that the classification of all nine samples was consistent. Quercetin and ellagitannins including pinocembrin-7-O-[3''-O-galloyl-4'',6''-hexahydroxydiphenoyl]-β-glucose (PGHG), thonningianin A, thonningianin B, and other flavonoids were abundant in the extracts, and significantly contributed to the hepatoprotective effects.
Active learning for clinical text classification: is it better than random sampling?
Figueroa, Rosa L; Zeng-Treitler, Qing; Ngo, Long H; Goryachev, Sergey; Wiechmann, Eduardo P
2012-01-01
This study explores active learning algorithms as a way to reduce the requirements for large training sets in medical text classification tasks. Three existing active learning algorithms (distance-based (DIST), diversity-based (DIV), and a combination of both (CMB)) were used to classify text from five datasets. The performance of these algorithms was compared to that of passive learning on the five datasets. We then conducted a novel investigation of the interaction between dataset characteristics and the performance results. Classification accuracy and area under receiver operating characteristics (ROC) curves for each algorithm at different sample sizes were generated. The performance of active learning algorithms was compared with that of passive learning using a weighted mean of paired differences. To determine why the performance varies on different datasets, we measured the diversity and uncertainty of each dataset using relative entropy and correlated the results with the performance differences. The DIST and CMB algorithms performed better than passive learning. With a statistical significance level set at 0.05, DIST outperformed passive learning in all five datasets, while CMB was found to be better than passive learning in four datasets. We found strong correlations between the dataset diversity and the DIV performance, as well as the dataset uncertainty and the performance of the DIST algorithm. For medical text classification, appropriate active learning algorithms can yield performance comparable to that of passive learning with considerably smaller training sets. In particular, our results suggest that DIV performs better on data with higher diversity and DIST on data with lower uncertainty.
Active learning for clinical text classification: is it better than random sampling?
Figueroa, Rosa L; Ngo, Long H; Goryachev, Sergey; Wiechmann, Eduardo P
2012-01-01
Objective This study explores active learning algorithms as a way to reduce the requirements for large training sets in medical text classification tasks. Design Three existing active learning algorithms (distance-based (DIST), diversity-based (DIV), and a combination of both (CMB)) were used to classify text from five datasets. The performance of these algorithms was compared to that of passive learning on the five datasets. We then conducted a novel investigation of the interaction between dataset characteristics and the performance results. Measurements Classification accuracy and area under receiver operating characteristics (ROC) curves for each algorithm at different sample sizes were generated. The performance of active learning algorithms was compared with that of passive learning using a weighted mean of paired differences. To determine why the performance varies on different datasets, we measured the diversity and uncertainty of each dataset using relative entropy and correlated the results with the performance differences. Results The DIST and CMB algorithms performed better than passive learning. With a statistical significance level set at 0.05, DIST outperformed passive learning in all five datasets, while CMB was found to be better than passive learning in four datasets. We found strong correlations between the dataset diversity and the DIV performance, as well as the dataset uncertainty and the performance of the DIST algorithm. Conclusion For medical text classification, appropriate active learning algorithms can yield performance comparable to that of passive learning with considerably smaller training sets. In particular, our results suggest that DIV performs better on data with higher diversity and DIST on data with lower uncertainty. PMID:22707743
Ma, Li; Fan, Suohai
2017-03-14
The random forests algorithm is a type of classifier with prominent universality, a wide application range, and robustness for avoiding overfitting. But there are still some drawbacks to random forests. Therefore, to improve the performance of random forests, this paper seeks to improve imbalanced data processing, feature selection and parameter optimization. We propose the CURE-SMOTE algorithm for the imbalanced data classification problem. Experiments on imbalanced UCI data reveal that the combination of Clustering Using Representatives (CURE) enhances the original synthetic minority oversampling technique (SMOTE) algorithms effectively compared with the classification results on the original data using random sampling, Borderline-SMOTE1, safe-level SMOTE, C-SMOTE, and k-means-SMOTE. Additionally, the hybrid RF (random forests) algorithm has been proposed for feature selection and parameter optimization, which uses the minimum out of bag (OOB) data error as its objective function. Simulation results on binary and higher-dimensional data indicate that the proposed hybrid RF algorithms, hybrid genetic-random forests algorithm, hybrid particle swarm-random forests algorithm and hybrid fish swarm-random forests algorithm can achieve the minimum OOB error and show the best generalization ability. The training set produced from the proposed CURE-SMOTE algorithm is closer to the original data distribution because it contains minimal noise. Thus, better classification results are produced from this feasible and effective algorithm. Moreover, the hybrid algorithm's F-value, G-mean, AUC and OOB scores demonstrate that they surpass the performance of the original RF algorithm. Hence, this hybrid algorithm provides a new way to perform feature selection and parameter optimization.
NASA Astrophysics Data System (ADS)
Squiers, John J.; Li, Weizhi; King, Darlene R.; Mo, Weirong; Zhang, Xu; Lu, Yang; Sellke, Eric W.; Fan, Wensheng; DiMaio, J. Michael; Thatcher, Jeffrey E.
2016-03-01
The clinical judgment of expert burn surgeons is currently the standard on which diagnostic and therapeutic decisionmaking regarding burn injuries is based. Multispectral imaging (MSI) has the potential to increase the accuracy of burn depth assessment and the intraoperative identification of viable wound bed during surgical debridement of burn injuries. A highly accurate classification model must be developed using machine-learning techniques in order to translate MSI data into clinically-relevant information. An animal burn model was developed to build an MSI training database and to study the burn tissue classification ability of several models trained via common machine-learning algorithms. The algorithms tested, from least to most complex, were: K-nearest neighbors (KNN), decision tree (DT), linear discriminant analysis (LDA), weighted linear discriminant analysis (W-LDA), quadratic discriminant analysis (QDA), ensemble linear discriminant analysis (EN-LDA), ensemble K-nearest neighbors (EN-KNN), and ensemble decision tree (EN-DT). After the ground-truth database of six tissue types (healthy skin, wound bed, blood, hyperemia, partial injury, full injury) was generated by histopathological analysis, we used 10-fold cross validation to compare the algorithms' performances based on their accuracies in classifying data against the ground truth, and each algorithm was tested 100 times. The mean test accuracy of the algorithms were KNN 68.3%, DT 61.5%, LDA 70.5%, W-LDA 68.1%, QDA 68.9%, EN-LDA 56.8%, EN-KNN 49.7%, and EN-DT 36.5%. LDA had the highest test accuracy, reflecting the bias-variance tradeoff over the range of complexities inherent to the algorithms tested. Several algorithms were able to match the current standard in burn tissue classification, the clinical judgment of expert burn surgeons. These results will guide further development of an MSI burn tissue classification system. Given that there are few surgeons and facilities specializing in burn care, this technology may improve the standard of burn care for patients without access to specialized facilities.
NASA Astrophysics Data System (ADS)
Ma, Weiwei; Gong, Cailan; Hu, Yong; Li, Long; Meng, Peng
2015-10-01
Remote sensing technology has been broadly recognized for its convenience and efficiency in mapping vegetation, particularly in high-altitude and inaccessible areas where there are lack of in-situ observations. In this study, Landsat Thematic Mapper (TM) images and Chinese environmental mitigation satellite CCD sensor (HJ-1 CCD) images, both of which are at 30m spatial resolution were employed for identifying and monitoring of vegetation types in a area of Western China——Qinghai Lake Watershed(QHLW). A decision classification tree (DCT) algorithm using multi-characteristic including seasonal TM/HJ-1 CCD time series data combined with digital elevation models (DEMs) dataset, and a supervised maximum likelihood classification (MLC) algorithm with single-data TM image were applied vegetation classification. Accuracy of the two algorithms was assessed using field observation data. Based on produced vegetation classification maps, it was found that the DCT using multi-season data and geomorphologic parameters was superior to the MLC algorithm using single-data image, improving the overall accuracy by 11.86% at second class level and significantly reducing the "salt and pepper" noise. The DCT algorithm applied to TM /HJ-1 CCD time series data geomorphologic parameters appeared as a valuable and reliable tool for monitoring vegetation at first class level (5 vegetation classes) and second class level(8 vegetation subclasses). The DCT algorithm using multi-characteristic might provide a theoretical basis and general approach to automatic extraction of vegetation types from remote sensing imagery over plateau areas.
NASA Astrophysics Data System (ADS)
Kaddoura, Tarek; Vadlamudi, Karunakar; Kumar, Shine; Bobhate, Prashant; Guo, Long; Jain, Shreepal; Elgendi, Mohamed; Coe, James Y.; Kim, Daniel; Taylor, Dylan; Tymchak, Wayne; Schuurmans, Dale; Zemp, Roger J.; Adatia, Ian
2016-09-01
We hypothesized that an automated speech- recognition-inspired classification algorithm could differentiate between the heart sounds in subjects with and without pulmonary hypertension (PH) and outperform physicians. Heart sounds, electrocardiograms, and mean pulmonary artery pressures (mPAp) were recorded simultaneously. Heart sound recordings were digitized to train and test speech-recognition-inspired classification algorithms. We used mel-frequency cepstral coefficients to extract features from the heart sounds. Gaussian-mixture models classified the features as PH (mPAp ≥ 25 mmHg) or normal (mPAp < 25 mmHg). Physicians blinded to patient data listened to the same heart sound recordings and attempted a diagnosis. We studied 164 subjects: 86 with mPAp ≥ 25 mmHg (mPAp 41 ± 12 mmHg) and 78 with mPAp < 25 mmHg (mPAp 17 ± 5 mmHg) (p < 0.005). The correct diagnostic rate of the automated speech-recognition-inspired algorithm was 74% compared to 56% by physicians (p = 0.005). The false positive rate for the algorithm was 34% versus 50% (p = 0.04) for clinicians. The false negative rate for the algorithm was 23% and 68% (p = 0.0002) for physicians. We developed an automated speech-recognition-inspired classification algorithm for the acoustic diagnosis of PH that outperforms physicians that could be used to screen for PH and encourage earlier specialist referral.
Low complexity feature extraction for classification of harmonic signals
NASA Astrophysics Data System (ADS)
William, Peter E.
In this dissertation, feature extraction algorithms have been developed for extraction of characteristic features from harmonic signals. The common theme for all developed algorithms is the simplicity in generating a significant set of features directly from the time domain harmonic signal. The features are a time domain representation of the composite, yet sparse, harmonic signature in the spectral domain. The algorithms are adequate for low-power unattended sensors which perform sensing, feature extraction, and classification in a standalone scenario. The first algorithm generates the characteristic features using only the duration between successive zero-crossing intervals. The second algorithm estimates the harmonics' amplitudes of the harmonic structure employing a simplified least squares method without the need to estimate the true harmonic parameters of the source signal. The third algorithm, resulting from a collaborative effort with Daniel White at the DSP Lab, University of Nebraska-Lincoln, presents an analog front end approach that utilizes a multichannel analog projection and integration to extract the sparse spectral features from the analog time domain signal. Classification is performed using a multilayer feedforward neural network. Evaluation of the proposed feature extraction algorithms for classification through the processing of several acoustic and vibration data sets (including military vehicles and rotating electric machines) with comparison to spectral features shows that, for harmonic signals, time domain features are simpler to extract and provide equivalent or improved reliability over the spectral features in both the detection probabilities and false alarm rate.
Blob-level active-passive data fusion for Benthic classification
NASA Astrophysics Data System (ADS)
Park, Joong Yong; Kalluri, Hemanth; Mathur, Abhinav; Ramnath, Vinod; Kim, Minsu; Aitken, Jennifer; Tuell, Grady
2012-06-01
We extend the data fusion pixel level to the more semantically meaningful blob level, using the mean-shift algorithm to form labeled blobs having high similarity in the feature domain, and connectivity in the spatial domain. We have also developed Bhattacharyya Distance (BD) and rule-based classifiers, and have implemented these higher-level data fusion algorithms into the CZMIL Data Processing System. Applying these new algorithms to recent SHOALS and CASI data at Plymouth Harbor, Massachusetts, we achieved improved benthic classification accuracies over those produced with either single sensor, or pixel-level fusion strategies. These results appear to validate the hypothesis that classification accuracy may be generally improved by adopting higher spatial and semantic levels of fusion.
Text Extraction from Scene Images by Character Appearance and Structure Modeling
Yi, Chucai; Tian, Yingli
2012-01-01
In this paper, we propose a novel algorithm to detect text information from natural scene images. Scene text classification and detection are still open research topics. Our proposed algorithm is able to model both character appearance and structure to generate representative and discriminative text descriptors. The contributions of this paper include three aspects: 1) a new character appearance model by a structure correlation algorithm which extracts discriminative appearance features from detected interest points of character samples; 2) a new text descriptor based on structons and correlatons, which model character structure by structure differences among character samples and structure component co-occurrence; and 3) a new text region localization method by combining color decomposition, character contour refinement, and string line alignment to localize character candidates and refine detected text regions. We perform three groups of experiments to evaluate the effectiveness of our proposed algorithm, including text classification, text detection, and character identification. The evaluation results on benchmark datasets demonstrate that our algorithm achieves the state-of-the-art performance on scene text classification and detection, and significantly outperforms the existing algorithms for character identification. PMID:23316111
Introcaso, Camille E; Gruber, DeAnn; Bradley, Heather; Peterman, Thomas A; Ewell, Joy; Wendell, Debbie; Foxhood, Joseph; Su, John R; Weinstock, Hillard S; Markowitz, Lauri E
2013-09-01
Congenital syphilis is a serious, preventable, and nationally notifiable disease. Despite the existence of a surveillance case definition, congenital syphilis is sometimes classified differently using an algorithm on the Centers for Disease Control and Prevention's case reporting form. We reviewed Louisiana's congenital syphilis electronic reporting system for investigations of infants born from January 2010 to October 2011, abstracted data required for classification, and applied the surveillance definition and the algorithm. We calculated the sensitivities and specificities of the algorithm and Louisiana's classification using the surveillance definition as the surveillance gold standard. Among 349 congenital syphilis investigations, the surveillance definition identified 62 cases. The algorithm had a sensitivity of 91.9% and a specificity of 64.1%. Louisiana's classification had a sensitivity of 50% and a specificity of 91.3% compared with the surveillance definition. The differences between the algorithm and the surveillance definition led to misclassification of congenital syphilis cases. The algorithm should match the surveillance definition. Other state and local health departments should assure that their reported cases meet the surveillance definition.
Diagnostic Accuracy Comparison of Artificial Immune Algorithms for Primary Headaches.
Çelik, Ufuk; Yurtay, Nilüfer; Koç, Emine Rabia; Tepe, Nermin; Güllüoğlu, Halil; Ertaş, Mustafa
2015-01-01
The present study evaluated the diagnostic accuracy of immune system algorithms with the aim of classifying the primary types of headache that are not related to any organic etiology. They are divided into four types: migraine, tension, cluster, and other primary headaches. After we took this main objective into consideration, three different neurologists were required to fill in the medical records of 850 patients into our web-based expert system hosted on our project web site. In the evaluation process, Artificial Immune Systems (AIS) were used as the classification algorithms. The AIS are classification algorithms that are inspired by the biological immune system mechanism that involves significant and distinct capabilities. These algorithms simulate the specialties of the immune system such as discrimination, learning, and the memorizing process in order to be used for classification, optimization, or pattern recognition. According to the results, the accuracy level of the classifier used in this study reached a success continuum ranging from 95% to 99%, except for the inconvenient one that yielded 71% accuracy.
Chen, Zhiru; Hong, Wenxue
2016-02-01
Considering the low accuracy of prediction in the positive samples and poor overall classification effects caused by unbalanced sample data of MicroRNA (miRNA) target, we proposes a support vector machine (SVM)-integration of under-sampling and weight (IUSM) algorithm in this paper, an under-sampling based on the ensemble learning algorithm. The algorithm adopts SVM as learning algorithm and AdaBoost as integration framework, and embeds clustering-based under-sampling into the iterative process, aiming at reducing the degree of unbalanced distribution of positive and negative samples. Meanwhile, in the process of adaptive weight adjustment of the samples, the SVM-IUSM algorithm eliminates the abnormal ones in negative samples with robust sample weights smoothing mechanism so as to avoid over-learning. Finally, the prediction of miRNA target integrated classifier is achieved with the combination of multiple weak classifiers through the voting mechanism. The experiment revealed that the SVM-IUSW, compared with other algorithms on unbalanced dataset collection, could not only improve the accuracy of positive targets and the overall effect of classification, but also enhance the generalization ability of miRNA target classifier.
Constrained Metric Learning by Permutation Inducing Isometries.
Bosveld, Joel; Mahmood, Arif; Huynh, Du Q; Noakes, Lyle
2016-01-01
The choice of metric critically affects the performance of classification and clustering algorithms. Metric learning algorithms attempt to improve performance, by learning a more appropriate metric. Unfortunately, most of the current algorithms learn a distance function which is not invariant to rigid transformations of images. Therefore, the distances between two images and their rigidly transformed pair may differ, leading to inconsistent classification or clustering results. We propose to constrain the learned metric to be invariant to the geometry preserving transformations of images that induce permutations in the feature space. The constraint that these transformations are isometries of the metric ensures consistent results and improves accuracy. Our second contribution is a dimension reduction technique that is consistent with the isometry constraints. Our third contribution is the formulation of the isometry constrained logistic discriminant metric learning (IC-LDML) algorithm, by incorporating the isometry constraints within the objective function of the LDML algorithm. The proposed algorithm is compared with the existing techniques on the publicly available labeled faces in the wild, viewpoint-invariant pedestrian recognition, and Toy Cars data sets. The IC-LDML algorithm has outperformed existing techniques for the tasks of face recognition, person identification, and object classification by a significant margin.
Automatic detection and classification of artifacts in single-channel EEG.
Olund, Thomas; Duun-Henriksen, Jonas; Kjaer, Troels W; Sorensen, Helge B D
2014-01-01
Ambulatory EEG monitoring can provide medical doctors important diagnostic information, without hospitalizing the patient. These recordings are however more exposed to noise and artifacts compared to clinically recorded EEG. An automatic artifact detection and classification algorithm for single-channel EEG is proposed to help identifying these artifacts. Features are extracted from the EEG signal and wavelet subbands. Subsequently a selection algorithm is applied in order to identify the best discriminating features. A non-linear support vector machine is used to discriminate among different artifact classes using the selected features. Single-channel (Fp1-F7) EEG recordings are obtained from experiments with 12 healthy subjects performing artifact inducing movements. The dataset was used to construct and validate the model. Both subject-specific and generic implementation, are investigated. The detection algorithm yield an average sensitivity and specificity above 95% for both the subject-specific and generic models. The classification algorithm show a mean accuracy of 78 and 64% for the subject-specific and generic model, respectively. The classification model was additionally validated on a reference dataset with similar results.
A Survey on the Feasibility of Sound Classification on Wireless Sensor Nodes
Salomons, Etto L.; Havinga, Paul J. M.
2015-01-01
Wireless sensor networks are suitable to gain context awareness for indoor environments. As sound waves form a rich source of context information, equipping the nodes with microphones can be of great benefit. The algorithms to extract features from sound waves are often highly computationally intensive. This can be problematic as wireless nodes are usually restricted in resources. In order to be able to make a proper decision about which features to use, we survey how sound is used in the literature for global sound classification, age and gender classification, emotion recognition, person verification and identification and indoor and outdoor environmental sound classification. The results of the surveyed algorithms are compared with respect to accuracy and computational load. The accuracies are taken from the surveyed papers; the computational loads are determined by benchmarking the algorithms on an actual sensor node. We conclude that for indoor context awareness, the low-cost algorithms for feature extraction perform equally well as the more computationally-intensive variants. As the feature extraction still requires a large amount of processing time, we present four possible strategies to deal with this problem. PMID:25822142
K-Nearest Neighbor Algorithm Optimization in Text Categorization
NASA Astrophysics Data System (ADS)
Chen, Shufeng
2018-01-01
K-Nearest Neighbor (KNN) classification algorithm is one of the simplest methods of data mining. It has been widely used in classification, regression and pattern recognition. The traditional KNN method has some shortcomings such as large amount of sample computation and strong dependence on the sample library capacity. In this paper, a method of representative sample optimization based on CURE algorithm is proposed. On the basis of this, presenting a quick algorithm QKNN (Quick k-nearest neighbor) to find the nearest k neighbor samples, which greatly reduces the similarity calculation. The experimental results show that this algorithm can effectively reduce the number of samples and speed up the search for the k nearest neighbor samples to improve the performance of the algorithm.
Contour classification in thermographic images for detection of breast cancer
NASA Astrophysics Data System (ADS)
Okuniewski, Rafał; Nowak, Robert M.; Cichosz, Paweł; Jagodziński, Dariusz; Matysiewicz, Mateusz; Neumann, Łukasz; Oleszkiewicz, Witold
2016-09-01
Thermographic images of breast taken by the Braster device are uploaded into web application which uses different classification algorithms to automatically decide whether a patient should be more thoroughly examined. This article presents the approach to the task of classifying contours visible on thermographic images of breast taken by the Braster device in order to make the decision about the existence of cancerous tumors in breast. It presents the results of the researches conducted on the different classification algorithms.
Space Object Classification Using Fused Features of Time Series Data
NASA Astrophysics Data System (ADS)
Jia, B.; Pham, K. D.; Blasch, E.; Shen, D.; Wang, Z.; Chen, G.
In this paper, a fused feature vector consisting of raw time series and texture feature information is proposed for space object classification. The time series data includes historical orbit trajectories and asteroid light curves. The texture feature is derived from recurrence plots using Gabor filters for both unsupervised learning and supervised learning algorithms. The simulation results show that the classification algorithms using the fused feature vector achieve better performance than those using raw time series or texture features only.
A Novel Modulation Classification Approach Using Gabor Filter Network
Ghauri, Sajjad Ahmed; Qureshi, Ijaz Mansoor; Cheema, Tanveer Ahmed; Malik, Aqdas Naveed
2014-01-01
A Gabor filter network based approach is used for feature extraction and classification of digital modulated signals by adaptively tuning the parameters of Gabor filter network. Modulation classification of digitally modulated signals is done under the influence of additive white Gaussian noise (AWGN). The modulations considered for the classification purpose are PSK 2 to 64, FSK 2 to 64, and QAM 4 to 64. The Gabor filter network uses the network structure of two layers; the first layer which is input layer constitutes the adaptive feature extraction part and the second layer constitutes the signal classification part. The Gabor atom parameters are tuned using Delta rule and updating of weights of Gabor filter using least mean square (LMS) algorithm. The simulation results show that proposed novel modulation classification algorithm has high classification accuracy at low signal to noise ratio (SNR) on AWGN channel. PMID:25126603
Yang, Jian; Ma, Shexia; Gao, Bo; Li, Xiaoying; Zhang, Yanjun; Cai, Jing; Li, Mei; Yao, Ling'ai; Huang, Bo; Zheng, Mei
2017-09-01
In order to accurately apportion the many distinct types of individual particles observed, it is necessary to characterize fingerprints of individual particles emitted directly from known sources. In this study, single particle mass spectral signatures from vehicle exhaust particles in a tunnel were performed. These data were used to evaluate particle signatures in a real-world PM 2.5 apportionment study. The dominant chemical type originating from average positive and negative mass spectra for vehicle exhaust particles are EC species. Four distinct particle types describe the majority of particles emitted by vehicle exhaust particles in this tunnel. Each particle class is labeled according to the most significant chemical features in both average positive and negative mass spectral signatures, including ECOC, NaK, Metal and PAHs species. A single particle aerosol mass spectrometry (SPAMS) was also employed during the winter of 2013 in Guangzhou to determine both the size and chemical composition of individual atmospheric particles, with vacuum aerodynamic diameter (d va ) in the size range of 0.2-2μm. A total of 487,570 particles were chemically analyzed with positive and negative ion mass spectra and a large set of single particle mass spectra was collected and analyzed in order to identify the speciation. According to the typical tracer ions from different source types and classification by the ART-2a algorithm which uses source fingerprints for apportioning ambient particles, the major sources of single particles were simulated. Coal combustion, vehicle exhaust, and secondary ion were the most abundant particle sources, contributing 28.5%, 17.8%, and 18.2%, respectively. The fraction with vehicle exhaust species particles decreased slightly with particle size in the condensation mode particles. Copyright © 2017 Elsevier B.V. All rights reserved.