Neural system for heartbeats recognition using genetically integrated ensemble of classifiers.
Osowski, Stanislaw; Siwek, Krzysztof; Siroic, Robert
2011-03-01
This paper presents the application of a genetic algorithm to the integration of neural classifiers combined in an ensemble for the accurate recognition of heartbeat types from ECG recordings. The idea is that using many classifiers arranged in an ensemble increases recognition accuracy. In such an ensemble, the key problem is the integration of all classifiers into one effective classification system, and this paper proposes a genetic algorithm for that task. It is shown that the genetic algorithm is very efficient and significantly reduces the total error of heartbeat recognition, as confirmed by numerical experiments on the MIT-BIH Arrhythmia Database. Copyright © 2011 Elsevier Ltd. All rights reserved.
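The abstract does not specify how the genetic algorithm integrates the base classifiers. A minimal sketch of one common scheme, weighted fusion of the classifiers' class-probability outputs with GA-evolved weights, is given below; the population size, operators, and weighted-sum fusion rule are assumptions rather than the authors' implementation.

    import numpy as np

    def ga_fusion_weights(probs, y, pop_size=40, gens=60, seed=0):
        """Evolve per-classifier fusion weights that minimise the error of a
        weighted-sum combination of the base classifiers' probability outputs.

        probs : array (n_classifiers, n_samples, n_classes)
        y     : array (n_samples,) of true class labels
        """
        rng = np.random.default_rng(seed)
        n_clf = probs.shape[0]

        def error(w):
            fused = np.tensordot(w, probs, axes=1)                # weighted sum over classifiers
            return np.mean(np.argmax(fused, axis=1) != y)         # misclassification rate

        pop = rng.random((pop_size, n_clf))
        for _ in range(gens):
            fitness = np.array([error(w) for w in pop])
            parents = pop[np.argsort(fitness)[: pop_size // 2]]   # truncation selection
            children = []
            while len(parents) + len(children) < pop_size:
                a, b = parents[rng.integers(len(parents), size=2)]
                alpha = rng.random(n_clf)
                child = alpha * a + (1.0 - alpha) * b             # arithmetic crossover
                child += rng.normal(0.0, 0.1, n_clf) * (rng.random(n_clf) < 0.2)  # mutation
                children.append(np.clip(child, 0.0, None))
            pop = np.vstack([parents, children])
        best = pop[np.argmin([error(w) for w in pop])]
        return best / max(best.sum(), 1e-12)                      # normalised fusion weights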
A Modified Decision Tree Algorithm Based on Genetic Algorithm for Mobile User Classification Problem
Liu, Dong-sheng; Fan, Shu-jiang
2014-01-01
In order to offer mobile customers better service, we should first classify the mobile users. To address the limitations of previous classification methods, this paper puts forward a modified decision tree algorithm for mobile user classification, which introduces a genetic algorithm to optimize the results of the decision tree algorithm. We also take context information as classification attributes for the mobile user, dividing context into public and private context classes. We then analyze the processes and operators of the algorithm. Finally, we apply the algorithm to mobile user data, classifying users into Basic service, E-service, Plus service, and Total service classes and deriving rules about mobile users. Compared with the C4.5 decision tree algorithm and the SVM algorithm, the proposed algorithm has higher accuracy and is simpler. PMID:24688389
Haque, Mohammad Nazmul; Noman, Nasimul; Berretta, Regina; Moscato, Pablo
2016-01-01
Classification of datasets with imbalanced sample distributions has always been a challenge. In general, a popular approach for enhancing classification performance is the construction of an ensemble of classifiers. However, the performance of an ensemble is dependent on the choice of constituent base classifiers. Therefore, we propose a genetic algorithm-based search method for finding the optimum combination from a pool of base classifiers to form a heterogeneous ensemble. The algorithm, called GA-EoC, utilises 10-fold cross-validation on training data for evaluating the quality of each candidate ensemble. In order to combine the base classifiers' decisions into the ensemble's output, we used the simple and widely used majority voting approach. The proposed algorithm, along with a random sub-sampling approach to balance the class distribution, has been used for classifying class-imbalanced datasets. Additionally, if a feature set was not available, we used the (α, β)-k Feature Set method to select a better subset of features for classification. We have tested GA-EoC with three benchmark datasets from the UCI Machine Learning repository, one Alzheimer's disease dataset and a subset of the PubFig database of Columbia University. In general, the performance of the proposed method on the chosen datasets is robust and better than that of the constituent base classifiers and many other well-known ensembles. Based on our empirical study we claim that a genetic algorithm is a superior and reliable approach to heterogeneous ensemble construction and we expect that the proposed GA-EoC would perform consistently in other cases. PMID:26764911
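A minimal sketch of the search loop described above, written against scikit-learn: a binary chromosome selects base classifiers from a pool, and fitness is the 10-fold cross-validated accuracy of the resulting majority-voting ensemble. The pool, the dataset, and the GA settings are illustrative assumptions; the paper's random sub-sampling and (α, β)-k feature-selection steps are omitted.

    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import VotingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)                    # stand-in dataset
    pool = [("lr", LogisticRegression(max_iter=5000)), ("nb", GaussianNB()),
            ("knn", KNeighborsClassifier()), ("svc", SVC()),
            ("dt", DecisionTreeClassifier(random_state=0))]

    def fitness(mask):
        chosen = [pool[i] for i in range(len(pool)) if mask[i]]
        if not chosen:
            return 0.0
        ensemble = VotingClassifier(chosen, voting="hard")        # majority voting
        return cross_val_score(ensemble, X, y, cv=10).mean()      # 10-fold CV fitness

    rng = np.random.default_rng(0)
    pop = rng.integers(0, 2, size=(10, len(pool)))                # binary chromosomes
    for generation in range(10):
        fit = np.array([fitness(m) for m in pop])
        elite = pop[np.argsort(fit)[-5:]]                         # keep the better half
        children = []
        for _ in range(5):
            a, b = elite[rng.integers(5, size=2)]
            cut = rng.integers(1, len(pool))
            child = np.concatenate([a[:cut], b[cut:]])            # one-point crossover
            flip = rng.random(len(pool)) < 0.1                    # bit-flip mutation
            child[flip] ^= 1
            children.append(child)
        pop = np.vstack([elite, children])
    best = pop[np.argmax([fitness(m) for m in pop])]
    print("selected base classifiers:",
          [name for (name, _), keep in zip(pool, best) if keep])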
Multiobjective GAs, quantitative indices, and pattern classification.
Bandyopadhyay, Sanghamitra; Pal, Sankar K; Aruna, B
2004-10-01
The concept of multiobjective optimization (MOO) has been integrated with variable-length chromosomes for the development of a nonparametric genetic classifier which can overcome problems, such as overfitting/overlearning and the neglect of smaller classes, faced by single-objective classifiers. The classifier can efficiently approximate any kind of linear and/or nonlinear class boundaries of a data set using an appropriate number of hyperplanes. While designing the classifier, the aim is to simultaneously minimize the number of misclassified training points and the number of hyperplanes, and to maximize the product of class-wise recognition scores. The concepts of a validation set (in addition to training and test sets) and a validation functional are introduced in the multiobjective classifier for selecting a solution from the set of nondominated solutions provided by the MOO algorithm. This genetic classifier incorporates elitism and some domain-specific constraints in the search process, and is called the CEMOGA-Classifier (constrained elitist multiobjective genetic algorithm based classifier). Two new quantitative indices, namely purity and minimal spacing, are developed for evaluating the performance of different MOO techniques. These are used, along with classification accuracy, the required number of hyperplanes and the computation time, to compare the CEMOGA-Classifier with other related ones.
Chaotic particle swarm optimization with mutation for classification.
Assarzadeh, Zahra; Naghsh-Nilchi, Ahmad Reza
2015-01-01
In this paper, a chaotic particle swarm optimization with mutation-based classifier particle swarm optimization is proposed to classify patterns of different classes in the feature space. The introduced mutation operators and chaotic sequences allow us to overcome the problem of premature convergence to local minima associated with particle swarm optimization algorithms; that is, the mutation operator sharpens the convergence and tunes the best possible solution. Furthermore, to remove irrelevant data and reduce the dimensionality of medical datasets, a feature selection approach using a binary version of the proposed particle swarm optimization is introduced. To demonstrate the effectiveness of the proposed classifier, mutation-based classifier particle swarm optimization, it is evaluated on three classification datasets, namely Wisconsin Diagnostic Breast Cancer, Wisconsin Breast Cancer, and Heart-Statlog, with different feature vector dimensions. The proposed algorithm is compared with different classifier algorithms including the k-nearest neighbour, as a conventional classifier, and the particle swarm classifier, genetic algorithm, and imperialist competitive algorithm classifier, as more sophisticated ones. The performance of each classifier was evaluated by calculating the accuracy, sensitivity, specificity and Matthews correlation coefficient. The experimental results show that the mutation-based classifier particle swarm optimization unequivocally performs better than all the compared algorithms. PMID:25709937
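The paper's particles encode classifier parameters; as a hedged illustration of the chaotic-PSO-with-mutation mechanics only, the sketch below replaces the random velocity coefficients with a logistic chaotic map and mutates the global best, and it minimises a generic objective that stands in for a classification error surface. The coefficients, the map, and the mutation scale are assumptions.

    import numpy as np

    def chaotic_pso(objective, dim, n_particles=30, iters=200, bounds=(-5.0, 5.0), seed=1):
        """PSO variant using a logistic chaotic map for the stochastic coefficients
        and a Gaussian mutation of the global best to escape local minima."""
        rng = np.random.default_rng(seed)
        lo, hi = bounds
        x = rng.uniform(lo, hi, (n_particles, dim))
        v = np.zeros_like(x)
        pbest, pbest_f = x.copy(), np.array([objective(p) for p in x])
        g = pbest[np.argmin(pbest_f)].copy()
        g_f = pbest_f.min()
        chaos = rng.uniform(0.1, 0.9, (n_particles, dim))        # logistic-map state
        w, c1, c2 = 0.7, 1.5, 1.5
        for _ in range(iters):
            chaos = 4.0 * chaos * (1.0 - chaos)                  # x_{t+1} = 4 x_t (1 - x_t)
            v = w * v + c1 * chaos * (pbest - x) + c2 * (1 - chaos) * (g - x)
            x = np.clip(x + v, lo, hi)
            f = np.array([objective(p) for p in x])
            better = f < pbest_f
            pbest[better], pbest_f[better] = x[better], f[better]
            if pbest_f.min() < g_f:
                g, g_f = pbest[np.argmin(pbest_f)].copy(), pbest_f.min()
            mutant = np.clip(g + rng.normal(0.0, 0.1, dim), lo, hi)  # mutation of the best
            if objective(mutant) < g_f:
                g, g_f = mutant, objective(mutant)
        return g, g_f

    # e.g. minimise a sphere function as a stand-in for a classification error surface
    best, err = chaotic_pso(lambda p: float(np.sum(p ** 2)), dim=4)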
Texture segmentation by genetic programming.
Song, Andy; Ciesielski, Vic
2008-01-01
This paper describes a texture segmentation method using genetic programming (GP), which is one of the most powerful evolutionary computation algorithms. By choosing an appropriate representation, texture classifiers can be evolved without computing texture features. Because time-consuming feature extraction is absent, the evolved classifiers enable the development of the proposed texture segmentation algorithm. This GP-based method can achieve a segmentation speed that is significantly higher than that of conventional methods, and it does not require a human expert to manually construct models for texture feature extraction. An analysis of the evolved classifiers shows that they are not arbitrary: certain textural regularities are captured by these classifiers to discriminate different textures. This study shows GP to be a feasible and powerful approach for texture classification and segmentation, which are generally considered complex vision tasks.
2015-01-01
Color is one of the most prominent features of an image and is used in many skin and face detection applications. Color space transformation is widely used by researchers to improve face and skin detection performance. Despite the substantial research efforts in this area, choosing a proper color space for skin and face classification that can address issues such as illumination variations, varying camera characteristics and diversity in skin color tones has remained an open issue. This research proposes a new three-dimensional hybrid color space termed SKN by employing a Genetic Algorithm heuristic and Principal Component Analysis to find the optimal representation of human skin color across over seventeen existing color spaces. The Genetic Algorithm heuristic is used to find the optimal color component combination in terms of skin detection accuracy, while Principal Component Analysis projects the optimal Genetic Algorithm solution onto a lower-dimensional space. Pixel-wise skin detection was used to evaluate the performance of the proposed color space. We employed four classifiers, Random Forest, Naïve Bayes, Support Vector Machine and Multilayer Perceptron, to generate the human skin color predictive model. The proposed color space was compared with several existing color spaces and shows superior results in terms of pixel-wise skin detection accuracy. Experimental results show that with the Random Forest classifier the proposed SKN color space obtained an average F-score and true positive rate of 0.953 and a false positive rate of 0.0482, outperforming the existing color spaces in pixel-wise skin detection accuracy. The results also indicate that, among the classifiers used in this study, Random Forest is the most suitable classifier for pixel-wise skin detection applications. PMID:26267377
Evolution, learning, and cognition
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, Y.C.
1988-01-01
The book comprises more than fifteen articles in the areas of neural networks and connectionist systems, classifier systems, adaptive network systems, genetic algorithm, cellular automata, artificial immune systems, evolutionary genetics, cognitive science, optical computing, combinatorial optimization, and cybernetics.
A Genetic Algorithm for Learning Significant Phrase Patterns in Radiology Reports
DOE Office of Scientific and Technical Information (OSTI.GOV)
Patton, Robert M; Potok, Thomas E; Beckerman, Barbara G
2009-01-01
Radiologists disagree with each other over the characteristics and features of what constitutes a normal mammogram and the terminology to use in the associated radiology report. Recently, the focus has been on classifying abnormal or suspicious reports, but even this process needs further layers of clustering and gradation, so that individual lesions can be more effectively classified. Using a genetic algorithm, the approach described here successfully learns phrase patterns for two distinct classes of radiology reports (normal and abnormal). These patterns can then be used as a basis for automatically analyzing, categorizing, clustering, or retrieving relevant radiology reports for the user.
NASA Astrophysics Data System (ADS)
Khehra, Baljit Singh; Pharwaha, Amar Partap Singh
2017-04-01
Ductal carcinoma in situ (DCIS) is one type of breast cancer. Clusters of microcalcifications (MCCs) are symptoms of DCIS that are recognized by mammography. Robust feature selection is the process of choosing an optimal subset of features from a large number of available features in a given problem domain, after feature extraction and before any classification scheme. Feature selection reduces the feature space, which improves classifier performance and decreases the computational burden that many features impose on the classifier. Selecting an optimal subset from a large number of available features is a difficult search problem: for n features, the total number of possible subsets is 2^n, so optimal feature subset selection belongs to the category of NP-hard problems. In this paper, an attempt is made to find the optimal subset of MCC features from all possible subsets using a genetic algorithm (GA), particle swarm optimization (PSO) and biogeography-based optimization (BBO). For simulation, a total of 380 benign and malignant MCC samples have been selected from mammogram images of the DDSM database. A total of 50 features extracted from the benign and malignant MCC samples are used in this study. In these algorithms, the fitness function is the correct classification rate of the classifier; a support vector machine is used as the classifier. From the experimental results, it is observed that the PSO-based and BBO-based algorithms select an optimal subset of features for classifying MCCs as benign or malignant better than the GA-based algorithm.
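For illustration, a minimal binary-PSO feature-selection loop of the kind the paper compares is sketched below: each particle is a 0/1 mask over feature columns and fitness is the cross-validated accuracy of an SVM trained on the selected features. The inertia and acceleration constants, the 5-fold CV, and the sigmoid transfer function are assumptions, not the paper's exact settings.

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    def bpso_feature_select(X, y, n_particles=20, iters=30, seed=0):
        """Binary PSO: each particle is a 0/1 mask over features; fitness is the
        cross-validated accuracy of an SVM trained on the selected columns."""
        rng = np.random.default_rng(seed)
        d = X.shape[1]

        def fitness(mask):
            if mask.sum() == 0:
                return 0.0
            return cross_val_score(SVC(), X[:, mask.astype(bool)], y, cv=5).mean()

        x = rng.integers(0, 2, (n_particles, d))
        v = np.zeros((n_particles, d))
        pbest, pbest_f = x.copy(), np.array([fitness(m) for m in x])
        g = pbest[np.argmax(pbest_f)].copy()
        for _ in range(iters):
            r1, r2 = rng.random((2, n_particles, d))
            v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (g - x)
            prob = 1.0 / (1.0 + np.exp(-v))                      # sigmoid transfer to bit probabilities
            x = (rng.random((n_particles, d)) < prob).astype(int)
            f = np.array([fitness(m) for m in x])
            improved = f > pbest_f
            pbest[improved], pbest_f[improved] = x[improved], f[improved]
            g = pbest[np.argmax(pbest_f)].copy()
        return g                                                 # best feature mask found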
Automatic detection of osteoporosis based on hybrid genetic swarm fuzzy classifier approaches
Kavitha, Muthu Subash; Ganesh Kumar, Pugalendhi; Park, Soon-Yong; Huh, Kyung-Hoe; Heo, Min-Suk; Kurita, Takio; Asano, Akira; An, Seo-Yong
2016-01-01
Objectives: This study proposed a new automated screening system based on a hybrid genetic swarm fuzzy (GSF) classifier using digital dental panoramic radiographs to diagnose females with a low bone mineral density (BMD) or osteoporosis. Methods: The geometrical attributes of both the mandibular cortical bone and trabecular bone were acquired using previously developed software. Designing an automated system for osteoporosis screening involved partitioning of the input attributes to generate an initial membership function (MF) and a rule set (RS), classification using a fuzzy inference system and optimization of the generated MF and RS using the genetic swarm algorithm. Fivefold cross-validation (5-FCV) was used to estimate the classification accuracy of the hybrid GSF classifier. The performance of the hybrid GSF classifier was further compared with that of individual genetic algorithm and particle swarm optimization fuzzy classifiers. Results: The proposed hybrid GSF classifier was evaluated for identifying low BMD or osteoporosis at the lumbar spine and femoral neck. The sensitivity, specificity and accuracy of the hybrid GSF with optimized MF and RS in identifying females with a low BMD were 95.3%, 94.7% and 96.01%, respectively, at the lumbar spine and 99.1%, 98.4% and 98.9%, respectively, at the femoral neck BMD. The diagnostic performance of the proposed system with femoral neck BMD was 0.986 with a confidence interval of 0.942–0.998. The highest mean accuracy using 5-FCV was 97.9% with femoral neck BMD. Conclusions: The combination of high accuracy and interpretability makes the proposed automatic system using the hybrid GSF classifier capable of identifying a large proportion of undetected low BMD or osteoporosis at an early stage. PMID:27186991
The Performance of Short-Term Heart Rate Variability in the Detection of Congestive Heart Failure
Barros, Allan Kardec; Ohnishi, Noboru
2016-01-01
Congestive heart failure (CHF) is a cardiac disease associated with a decreasing capacity of the cardiac output. It has been shown that CHF is a main cause of cardiac death around the world. Some works have proposed discriminating CHF subjects from healthy subjects using either the electrocardiogram (ECG) or heart rate variability (HRV) from long-term recordings. In this work, we propose an alternative framework to discriminate CHF from healthy subjects by using short-term HRV intervals based on 256 continuous RR samples. Our framework uses a matching pursuit algorithm based on Gabor functions. From the selected Gabor functions, we derive a set of features that are input to a hybrid framework which uses a genetic algorithm and a k-nearest neighbour classifier to select the subset of features with the best classification performance. The performance of the framework is analyzed using both the Fantasia and the CHF databases from the Physionet archives, which are composed of 40 healthy volunteers and 29 CHF subjects, respectively. From a set of 16 nonstandard features, the proposed framework reaches an overall accuracy of 100% with five features. Our results suggest that hybrid frameworks whose classifier algorithms are based on genetic algorithms outperform well-known classifier methods. PMID:27891509
Akbar, Shahid; Hayat, Maqsood; Iqbal, Muhammad; Jan, Mian Ahmad
2017-06-01
Cancer is a fatal disease, responsible for one-quarter of all deaths in developed countries. Traditional anticancer therapies such as chemotherapy and radiation are highly expensive, susceptible to errors and often ineffective, and they induce severe side-effects on human cells. Due to the perilous impact of cancer, the development of an accurate and highly efficient intelligent computational model is desirable for the identification of anticancer peptides. In this paper, an evolutionary intelligent genetic algorithm-based ensemble model, 'iACP-GAEnsC', is proposed for the identification of anticancer peptides. In this model, the protein sequences are formulated using three different discrete feature representation methods, i.e., amphiphilic pseudo amino acid composition, g-gap dipeptide composition, and reduced amino acid alphabet composition. The performance of the extracted feature spaces is investigated separately, and the spaces are then merged to exhibit the significance of hybridization. In addition, the predicted results of the individual classifiers are combined using a genetic-algorithm-optimized ensemble and a simple majority voting technique in order to enhance the true classification rate. It is observed that the genetic algorithm-based ensemble classification outperforms both the individual classifiers and the simple majority voting ensemble. The best performance of the genetic algorithm-based ensemble classification is obtained on the hybrid feature space, with an accuracy of 96.45%. In comparison to existing techniques, the 'iACP-GAEnsC' model achieves remarkable improvement in terms of various performance metrics. Based on the simulation results, the 'iACP-GAEnsC' model might be a leading tool in the field of drug design and proteomics for researchers. Copyright © 2017 Elsevier B.V. All rights reserved.
Genetic Programming Based Ensemble System for Microarray Data Classification
Liu, Kun-Hong; Tong, Muchenxuan; Xie, Shu-Tong; Yee Ng, Vincent To
2015-01-01
Recently, more and more machine learning techniques have been applied to microarray data analysis. The aim of this study is to propose a genetic programming (GP) based new ensemble system (named GPES), which can be used to effectively classify different types of cancers. Decision trees are deployed as base classifiers in this ensemble framework with three operators: Min, Max, and Average. Each individual of the GP is an ensemble system, and they become more and more accurate in the evolutionary process. The feature selection technique and balanced subsampling technique are applied to increase the diversity in each ensemble system. The final ensemble committee is selected by a forward search algorithm, which is shown to be capable of fitting data automatically. The performance of GPES is evaluated using five binary class and six multiclass microarray datasets, and results show that the algorithm can achieve better results in most cases compared with some other ensemble systems. By using elaborate base classifiers or applying other sampling techniques, the performance of GPES may be further improved. PMID:25810748
2012-01-01
Background: Development and application of transcriptomics-based gene classifiers for ecotoxicological applications lag far behind those of the biomedical sciences. Many such classifiers discovered thus far lack rigorous statistical and experimental validation. A combination of genetic algorithm/support vector machines and genetic algorithm/K nearest neighbors was used in this study to search for classifiers of endocrine-disrupting chemicals (EDCs) in zebrafish. Searches were conducted on both tissue-specific and tissue-combined datasets, either across the entire transcriptome or within individual transcription factor (TF) networks previously linked to EDC effects. Candidate classifiers were evaluated by gene set enrichment analysis (GSEA) on both the original training data and a dedicated validation dataset. Results: The multi-tissue dataset yielded no classifiers. Among the 19 chemical-tissue conditions evaluated, the transcriptome-wide searches yielded classifiers for six of them, each having approximately 20 to 30 gene features unique to a condition. Searches within individual TF networks produced classifiers for 15 chemical-tissue conditions, each containing 100 or fewer top-ranked gene features pooled from those of multiple TF networks and also unique to each condition. For the training dataset, 10 out of 11 classifiers successfully identified the gene expression profiles (GEPs) of their targeted chemical-tissue conditions by GSEA. For the validation dataset, classifiers for prochloraz-ovary and flutamide-ovary also correctly identified the GEPs of the corresponding conditions, while no classifier could predict the GEP for prochloraz-brain. Conclusions: The discrepancies in the performance of these classifiers were attributed in part to varying data complexity among the conditions, as measured to some degree by Fisher's discriminant ratio statistic. This variation in data complexity could likely be compensated for by adjusting sample size for individual chemical-tissue conditions, thus suggesting a need for a preliminary survey of transcriptomic responses before launching a full-scale classifier discovery effort. Classifier discovery based on individual TF networks could yield more mechanistically oriented biomarkers. GSEA proved to be a flexible and effective tool for the application of gene classifiers, but a similar and more refined algorithm, connectivity mapping, should also be explored. The distribution characteristics of classifiers across tissues, chemicals, and TF networks suggested a differential biological impact among the EDCs on the zebrafish transcriptome involving some basic cellular functions. PMID:22849515
Nanthini, B. Suguna; Santhi, B.
2017-01-01
Background: Epilepsy arises when repeated seizures occur in the brain. The electroencephalogram (EEG) test provides valuable information about brain function and can be useful for detecting brain disorders, especially epilepsy. In this study, an automated seizure detection model is introduced. Materials and Methods: The EEG signals are decomposed into sub-bands by the discrete wavelet transform using the db2 (Daubechies) wavelet. Eight statistical features, four gray-level co-occurrence matrix features and Renyi entropy estimates with four different orders are extracted from the raw EEG and its sub-bands. A genetic algorithm (GA) is used to select eight relevant features from the 16-dimensional feature set. The model has been trained and tested on EEG signals using a support vector machine (SVM) classifier. The performance of the SVM classifier is evaluated for two different databases. Results: The study was carried out through two different analyses and achieved satisfactory performance for automated seizure detection using the relevant features as input to the SVM classifier. Conclusion: Relevant features selected by the GA give better accuracy for seizure detection. PMID:28781480
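As a hedged illustration of the feature-extraction step (not the authors' code), the sketch below decomposes a signal with a db2 DWT via PyWavelets and collects simple statistics plus Renyi entropies of assumed orders from the raw signal and each sub-band; the GLCM features, the GA selection step, and the SVM training are omitted, and the specific statistics are assumptions.

    import numpy as np
    import pywt

    def eeg_band_features(signal, wavelet="db2", level=4, alphas=(2, 3)):
        """Decompose an EEG segment with a db2 DWT and collect simple statistics
        and Renyi entropies from the raw signal and every sub-band."""
        bands = [np.asarray(signal, dtype=float)] + pywt.wavedec(signal, wavelet, level=level)
        feats = []
        for b in bands:
            b = np.asarray(b, dtype=float)
            feats += [b.mean(), b.std(), np.abs(b).max(), np.mean(b ** 2)]   # basic statistics
            p = np.abs(b) / (np.abs(b).sum() + 1e-12)                        # pseudo-probabilities
            for a in alphas:
                feats.append(np.log((p ** a).sum() + 1e-12) / (1.0 - a))     # Renyi entropy of order a
        return np.array(feats)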
Moteghaed, Niloofar Yousefi; Maghooli, Keivan; Garshasbi, Masoud
2018-01-01
Background: Gene expression data are characteristically high dimensional, with a small sample size relative to the number of features, and the variability inherent in biological processes contributes to difficulties in analysis. Selection of highly discriminative features decreases the computational cost and complexity of the classifier and improves its reliability for predicting the class of new samples. Methods: The present study used hybrid particle swarm optimization and genetic algorithms for gene selection and a fuzzy support vector machine (SVM) as the classifier. Fuzzy logic is used to infer the importance of each sample in the training phase and decrease the outlier sensitivity of the system, improving the generalization ability of the classifier. A decision-tree algorithm was applied to the most frequent genes to develop a set of rules for each type of cancer. This improves the algorithm by finding the best parameters for the classifier during the training phase without the need for trial-and-error by the user. The proposed approach was tested on four benchmark gene expression profiles. Results: Good results have been demonstrated for the proposed algorithm. The classification accuracy for the leukemia data is 100%, for colon cancer 96.67% and for breast cancer 98%. The results show that the best kernel for training the SVM classifier is the radial basis function. Conclusions: The experimental results show that the proposed algorithm can decrease the dimensionality of the dataset, determine the most informative gene subset, and improve classification accuracy using the optimal parameters of the classifier with no user interface. PMID:29535919
Applications of genetic programming in cancer research.
Worzel, William P; Yu, Jianjun; Almal, Arpit A; Chinnaiyan, Arul M
2009-02-01
The theory of Darwinian evolution is a fundamental keystone of modern biology. Late in the last century, computer scientists began adapting its principles, in particular natural selection, to complex computational challenges, leading to the emergence of evolutionary algorithms. The conceptual model of selective pressure and recombination in evolutionary algorithms allows scientists to efficiently search high-dimensional spaces for solutions to complex problems. In the last decade, genetic programming has been developed and extensively applied to the analysis of molecular data to classify cancer subtypes and characterize the mechanisms of cancer pathogenesis and development. This article reviews current successes using genetic programming and discusses its potential impact on cancer research and treatment in the near future.
Genetic algorithm enhanced by machine learning in dynamic aperture optimization
NASA Astrophysics Data System (ADS)
Li, Yongjun; Cheng, Weixing; Yu, Li Hua; Rainer, Robert
2018-05-01
With the aid of machine learning techniques, the genetic algorithm has been enhanced and applied to the multi-objective optimization problem presented by the dynamic aperture of the National Synchrotron Light Source II (NSLS-II) Storage Ring. During the evolution processes employed by the genetic algorithm, the population is classified into different clusters in the search space. The clusters with top average fitness are given "elite" status. Intervention on the population is implemented by repopulating some potentially competitive candidates based on the experience learned from the accumulated data. These candidates replace randomly selected candidates among the original data pool. The average fitness of the population is therefore improved while diversity is not lost. Maintaining diversity ensures that the optimization is global rather than local. The quality of the population increases and produces more competitive descendants accelerating the evolution process significantly. When identifying the distribution of optimal candidates, they appear to be located in isolated islands within the search space. Some of these optimal candidates have been experimentally confirmed at the NSLS-II storage ring. The machine learning techniques that exploit the genetic algorithm can also be used in other population-based optimization problems such as particle swarm algorithm.
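A hedged sketch of the clustering-based intervention described above, assuming a real-valued GA population and a fitness vector where higher is better; KMeans stands in for whatever clustering the authors used, and the number of clusters, the elite-cluster count, the replacement fraction and the perturbation scale are illustrative assumptions.

    import numpy as np
    from sklearn.cluster import KMeans

    def elite_cluster_intervention(pop, fitness, n_clusters=5, frac_replace=0.2, sigma=0.05, seed=0):
        """Cluster the GA population in the search space, rank clusters by mean
        fitness, and repopulate randomly chosen individuals with perturbed copies
        drawn from the 'elite' clusters (higher fitness assumed better)."""
        rng = np.random.default_rng(seed)
        labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(pop)
        cluster_mean = np.array([fitness[labels == c].mean() if np.any(labels == c) else -np.inf
                                 for c in range(n_clusters)])
        elite = pop[np.isin(labels, np.argsort(cluster_mean)[-2:])]   # members of the top-2 clusters
        n_replace = int(frac_replace * len(pop))
        replace_idx = rng.choice(len(pop), size=n_replace, replace=False)
        donors = elite[rng.integers(len(elite), size=n_replace)]
        new_pop = pop.copy()
        new_pop[replace_idx] = donors + rng.normal(0.0, sigma, donors.shape)  # perturbed elite copies
        return new_pop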
Cao, Qi; Leung, K M
2014-09-22
Reliable computer models for the prediction of chemical biodegradability from molecular descriptors and fingerprints are very important for making health and environmental decisions. Coupling of the differential evolution (DE) algorithm with the support vector classifier (SVC) in order to optimize the main parameters of the classifier resulted in an improved classifier called the DE-SVC, which is introduced in this paper for use in chemical biodegradability studies. The DE-SVC was applied to predict the biodegradation of chemicals on the basis of extensive sample data sets and known structural features of molecules. Our optimization experiments showed that DE can efficiently find the proper parameters of the SVC. The resulting classifier possesses strong robustness and reliability compared with grid search, genetic algorithm, and particle swarm optimization methods. The classification experiments conducted here showed that the DE-SVC exhibits better classification performance than models previously used for such studies. It is a more effective and efficient prediction model for chemical biodegradability.
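The DE-SVC couples differential evolution with SVC parameter tuning; a minimal sketch of that coupling using SciPy's differential evolution and scikit-learn's SVC is shown below. The breast-cancer dataset, the log-scale bounds on C and gamma, and the 5-fold CV fitness are stand-in assumptions rather than the authors' setup.

    import numpy as np
    from scipy.optimize import differential_evolution
    from sklearn.datasets import load_breast_cancer        # stand-in dataset
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    X, y = load_breast_cancer(return_X_y=True)

    def neg_cv_accuracy(params):
        log_c, log_gamma = params
        clf = SVC(C=10.0 ** log_c, gamma=10.0 ** log_gamma)
        return -cross_val_score(clf, X, y, cv=5).mean()     # DE minimises, so negate accuracy

    result = differential_evolution(neg_cv_accuracy, bounds=[(-2, 3), (-5, 1)],
                                    maxiter=20, popsize=10, seed=1)
    print("best C = %.3g, gamma = %.3g, CV accuracy = %.3f"
          % (10.0 ** result.x[0], 10.0 ** result.x[1], -result.fun))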
Evolving neural networks with genetic algorithms to study the string landscape
NASA Astrophysics Data System (ADS)
Ruehle, Fabian
2017-08-01
We study possible applications of artificial neural networks to examining the string landscape. Since the field of application is rather versatile, we propose to dynamically evolve these networks via genetic algorithms. This means that we start from basic building blocks and combine them such that the neural network performs best for the application we are interested in. We study three areas in which neural networks can be applied: to classify models according to a fixed set of (physically) appealing features, to find a concrete realization of a computation whose precise algorithm is known in principle but very tedious to actually implement, and to predict or approximate the outcome of some involved mathematical computation that is too inefficient to apply directly, e.g. in model scans within the string landscape. We present simple examples that arise in string phenomenology for all three types of problems and discuss how they can be addressed by neural networks evolved with genetic algorithms.
Classifier-Guided Sampling for Complex Energy System Optimization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Backlund, Peter B.; Eddy, John P.
2015-09-01
This report documents the results of a Laboratory Directed Research and Development (LDRD) effort entitled "Classifier-Guided Sampling for Complex Energy System Optimization" that was conducted during FY 2014 and FY 2015. The goal of this project was to develop, implement, and test major improvements to the classifier-guided sampling (CGS) algorithm. CGS is a type of evolutionary algorithm for performing search and optimization over a set of discrete design variables in the face of one or more objective functions. Existing evolutionary algorithms, such as genetic algorithms, may require a large number of objective function evaluations to identify optimal or near-optimal solutions. Reducing the number of evaluations can result in significant time savings, especially if the objective function is computationally expensive. CGS reduces the evaluation count by using a Bayesian network classifier to filter out non-promising candidate designs, prior to evaluation, based on their posterior probabilities. In this project, both the single-objective and multi-objective versions of CGS are developed and tested on a set of benchmark problems. As a domain-specific case study, CGS is used to design a microgrid for use in islanded mode during an extended bulk power grid outage.
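To make the filtering idea concrete, the sketch below shows one hedged CGS-style step: previously evaluated designs are labelled promising or not, a classifier is fit to them, and only candidates whose posterior probability of being promising exceeds a threshold receive an expensive evaluation. A Gaussian Naive Bayes model stands in for the report's Bayesian network classifier, continuous variables stand in for the report's discrete design variables, and the labelling quantile and 0.5 threshold are assumptions.

    import numpy as np
    from sklearn.naive_bayes import GaussianNB

    def cgs_step(evaluated_x, evaluated_f, candidates, expensive_eval, keep_frac=0.3, seed=0):
        """One classifier-guided sampling step: label previously evaluated designs
        as promising (top fraction by objective, minimisation assumed), fit a
        classifier, and spend expensive evaluations only on promising candidates."""
        rng = np.random.default_rng(seed)
        threshold = np.quantile(evaluated_f, keep_frac)          # lower objective is better
        labels = (evaluated_f <= threshold).astype(int)
        clf = GaussianNB().fit(evaluated_x, labels)              # stand-in for the BN classifier
        posterior = clf.predict_proba(candidates)[:, 1]
        chosen = candidates[posterior > 0.5]
        if len(chosen) == 0:                                     # fall back to a random candidate
            chosen = candidates[rng.integers(len(candidates))][None, :]
        new_f = np.array([expensive_eval(c) for c in chosen])    # only these get evaluated
        return chosen, new_f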
NASA Astrophysics Data System (ADS)
Sheikhan, Mansour; Abbasnezhad Arabi, Mahdi; Gharavian, Davood
2015-10-01
Artificial neural networks are efficient models in pattern recognition applications, but their performance depends on employing a suitable structure and connection weights. This study used a hybrid method for obtaining the optimal weight set and architecture of a recurrent neural emotion classifier based on the gravitational search algorithm (GSA) and its binary version (BGSA), respectively. By considering features of the speech signal related to prosody, voice quality, and spectrum, a rich feature set was constructed. To select more efficient features, a fast feature selection method was employed. The performance of the proposed hybrid GSA-BGSA method was compared with similar hybrid methods based on the particle swarm optimisation (PSO) algorithm and its binary version, on PSO and the discrete firefly algorithm, and on a hybrid of error back-propagation and a genetic algorithm used for optimisation. Experimental tests on the Berlin emotional database demonstrated the superior performance of the proposed method using a lighter network structure.
An ensemble of SVM classifiers based on gene pairs.
Tong, Muchenxuan; Liu, Kun-Hong; Xu, Chungui; Ju, Wenbin
2013-07-01
In this paper, a genetic algorithm (GA) based ensemble support vector machine (SVM) classifier built on gene pairs (GA-ESP) is proposed. The SVMs (base classifiers of the ensemble system) are trained on different informative gene pairs. These gene pairs are selected by the top scoring pair (TSP) criterion. Each of these pairs projects the original microarray expression onto a 2-D space. Extensive permutation of gene pairs may reveal more useful information and potentially lead to an ensemble classifier with satisfactory accuracy and interpretability. GA is further applied to select an optimized combination of base classifiers. The effectiveness of the GA-ESP classifier is evaluated on both binary-class and multi-class datasets. Copyright © 2013 Elsevier Ltd. All rights reserved.
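The top scoring pair criterion mentioned above ranks a gene pair (i, j) by how differently the ordering X_i < X_j behaves in the two classes. A minimal sketch of that scoring follows (without the per-pair SVMs or the GA selection of base classifiers); binary labels of 0/1 are assumed, and scanning all pairs is quadratic in the number of genes.

    import numpy as np
    from itertools import combinations

    def top_scoring_pairs(X, y, n_pairs=10):
        """Rank gene pairs (i, j) by the TSP score
        |P(X_i < X_j | class 0) - P(X_i < X_j | class 1)| estimated on training data."""
        c0, c1 = X[y == 0], X[y == 1]
        scores = []
        for i, j in combinations(range(X.shape[1]), 2):
            p0 = np.mean(c0[:, i] < c0[:, j])
            p1 = np.mean(c1[:, i] < c1[:, j])
            scores.append((abs(p0 - p1), i, j))
        scores.sort(reverse=True)
        return scores[:n_pairs]          # each entry: (score, gene_i, gene_j)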
Holland, Katherine D; Bouley, Thomas M; Horn, Paul S
2017-07-01
Variants in the neuronal voltage-gated sodium channel α-subunit genes SCN1A, SCN2A, and SCN8A are common in early onset epileptic encephalopathies and other autosomal dominant childhood epilepsy syndromes. However, in clinical practice, missense variants are often classified as variants of uncertain significance when heritability cannot be determined. Genetic testing reports often include results of computational tests to estimate pathogenicity and the frequency of that variant in population-based databases. The objective of this work was to enhance clinicians' understanding of results by (1) determining how effectively computational algorithms predict the epileptogenicity of sodium channel (SCN) missense variants; (2) optimizing their predictive capabilities; and (3) determining whether epilepsy-associated SCN variants are present in population-based databases. This will help clinicians better understand indeterminate SCN test results in people with epilepsy. Pathogenic, likely pathogenic, and benign variants in SCNs were identified using databases of sodium channel variants; benign variants were also identified from population-based databases. Eight algorithms commonly used to predict pathogenicity were compared. In addition, logistic regression was used to determine whether a combination of algorithms could better predict pathogenicity. Based on American College of Medical Genetics criteria, 440 variants were classified as pathogenic or likely pathogenic and 84 were classified as benign or likely benign. Twenty-eight variants previously associated with epilepsy were present in population-based gene databases. The output provided by most computational algorithms had a high sensitivity but low specificity, with an accuracy of 0.52-0.77. Accuracy could be improved by adjusting the threshold for pathogenicity. Using this adjustment, the Mendelian Clinically Applicable Pathogenicity (M-CAP) algorithm had an accuracy of 0.90, and a combination of algorithms increased the accuracy to 0.92. Potentially pathogenic variants are present in population-based sources. Most computational algorithms overestimate pathogenicity; however, a weighted combination of several algorithms increased classification accuracy to >0.90. Wiley Periodicals, Inc. © 2017 International League Against Epilepsy.
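A hedged sketch of the combination step described above: a logistic regression is fit over the scores of several in-silico predictors and the probability threshold is then tuned for accuracy. The score matrix, the threshold grid, and tuning on the same variants used for fitting are illustrative assumptions, not the study's exact protocol.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score

    def combine_predictors(scores, labels, threshold_grid=np.linspace(0.1, 0.9, 33)):
        """Fit a logistic regression over the scores of several pathogenicity
        predictors (one column per algorithm) and pick the probability threshold
        that maximises accuracy on the labelled training variants."""
        model = LogisticRegression(max_iter=1000).fit(scores, labels)
        prob = model.predict_proba(scores)[:, 1]
        best_t = max(threshold_grid,
                     key=lambda t: accuracy_score(labels, (prob >= t).astype(int)))
        return model, best_t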
NASA Astrophysics Data System (ADS)
Li, Shaoxin; Li, Linfang; Zeng, Qiuyao; Zhang, Yanjiao; Guo, Zhouyi; Liu, Zhiming; Jin, Mei; Su, Chengkang; Lin, Lin; Xu, Junfa; Liu, Songhao
2015-05-01
This study aims to characterize and classify serum surface-enhanced Raman spectroscopy (SERS) spectra from bladder cancer patients and normal volunteers by genetic algorithms (GAs) combined with linear discriminant analysis (LDA). Two groups of serum SERS spectra excited with nanoparticles are collected from healthy volunteers (n = 36) and bladder cancer patients (n = 55). Six diagnostic Raman bands in the regions of 481-486, 682-687, 1018-1034, 1313-1323, 1450-1459 and 1582-1587 cm⁻¹, related to proteins, nucleic acids and lipids, are picked out with the GAs and LDA. With the diagnostic models built from the identified six Raman bands, an improved diagnostic sensitivity of 90.9% and specificity of 100% were acquired for distinguishing bladder cancer patients from normal subjects on the basis of serum SERS spectra. The results are superior to the sensitivity of 74.6% and specificity of 97.2% obtained with principal component analysis on the same serum SERS spectra dataset. Receiver operating characteristic (ROC) curves further confirmed the efficiency of the diagnostic algorithm based on the GA-LDA technique. This exploratory work demonstrates that serum SERS combined with the GA-LDA technique has enormous potential to characterize and non-invasively detect bladder cancer through peripheral blood.
Optimization of the ANFIS using a genetic algorithm for physical work rate classification.
Habibi, Ehsanollah; Salehi, Mina; Yadegarfar, Ghasem; Taheri, Ali
2018-03-13
Recently, a new method was proposed for physical work rate classification based on an adaptive neuro-fuzzy inference system (ANFIS). This study aims to present a genetic algorithm (GA)-optimized ANFIS model for a highly accurate classification of physical work rate. Thirty healthy men participated in this study. Directly measured heart rate and oxygen consumption of the participants in the laboratory were used for training the ANFIS classifier model in MATLAB version 8.0.0 using a hybrid algorithm. A similar process was done using the GA as an optimization technique. The accuracy, sensitivity and specificity of the ANFIS classifier model were increased successfully. The mean accuracy of the model was increased from 92.95 to 97.92%. Also, the calculated root mean square error of the model was reduced from 5.4186 to 3.1882. The maximum estimation error of the optimized ANFIS during the network testing process was ± 5%. The GA can be effectively used for ANFIS optimization and leads to an accurate classification of physical work rate. In addition to high accuracy, simple implementation and inter-individual variability consideration are two other advantages of the presented model.
A novel approach for dimension reduction of microarray.
Aziz, Rabia; Verma, C K; Srivastava, Namita
2017-12-01
This paper proposes a new hybrid search technique for feature (gene) selection (FS) using independent component analysis (ICA) and the Artificial Bee Colony (ABC) algorithm, called ICA+ABC, to select informative genes based on a Naïve Bayes (NB) classifier. An important trait of this technique is the optimization of the ICA feature vector using ABC. ICA+ABC is a hybrid search algorithm that combines the benefits of an extraction approach, to reduce the size of the data, and a wrapper approach, to optimize the reduced feature vectors. The hybrid search technique is evaluated on six standard gene expression classification datasets. Extensive experiments were conducted to compare the performance of ICA+ABC with the results obtained from the recently published minimum Redundancy Maximum Relevance (mRMR)+ABC algorithm for the NB classifier. Also, to assess how ICA+ABC performs as a feature selection method with the NB classifier, the combination of ICA with popular filter techniques and with other similar bio-inspired algorithms, such as the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO), was compared. The results show that ICA+ABC has a significant ability to generate small subsets of genes from the ICA feature vector that significantly improve the classification accuracy of the NB classifier compared with other previously suggested methods. Copyright © 2017 Elsevier Ltd. All rights reserved.
Multi-objective evolutionary algorithms for fuzzy classification in survival prediction.
Jiménez, Fernando; Sánchez, Gracia; Juárez, José M
2014-03-01
This paper presents a novel rule-based fuzzy classification methodology for survival/mortality prediction in severe burnt patients. Due to the ethical aspects involved in this medical scenario, physicians tend not to accept a computer-based evaluation unless they understand why and how such a recommendation is given. Therefore, any fuzzy classifier model must be both accurate and interpretable. The proposed methodology is a three-step process: (1) multi-objective constrained optimization of a patient's data set, using Pareto-based elitist multi-objective evolutionary algorithms to maximize accuracy and minimize the complexity (number of rules) of classifiers, subject to interpretability constraints; this step produces a set of alternative (Pareto) classifiers; (2) linguistic labeling, which assigns a linguistic label to each fuzzy set of the classifiers; this step is essential to the interpretability of the classifiers; (3) decision making, whereby a classifier is chosen, if it is satisfactory, according to the preferences of the decision maker. If no classifier is satisfactory for the decision maker, the process starts again in step (1) with a different input parameter set. The performance of three multi-objective evolutionary algorithms, the niched pre-selection multi-objective algorithm, the elitist Pareto-based multi-objective evolutionary algorithm for diversity reinforcement (ENORA) and the non-dominated sorting genetic algorithm (NSGA-II), was tested using a patient data set from an intensive care burn unit and a standard machine learning data set from a standard machine learning repository. The results are compared using the hypervolume multi-objective metric. In addition, the results have been compared with other non-evolutionary techniques and validated with a multi-objective cross-validation technique. Our proposal improves the classification rate obtained by other non-evolutionary techniques (decision trees, artificial neural networks, Naive Bayes, and case-based reasoning), obtaining with ENORA a classification rate of 0.9298, specificity of 0.9385, and sensitivity of 0.9364, with 14.2 interpretable fuzzy rules on average. Our proposal improves the accuracy and interpretability of the classifiers compared with other non-evolutionary techniques. We also conclude that ENORA outperforms the niched pre-selection and NSGA-II algorithms. Moreover, given that our multi-objective evolutionary methodology is non-combinational and based on real parameter optimization, the time cost is significantly reduced compared with other evolutionary approaches in the literature based on combinational optimization. Copyright © 2014 Elsevier B.V. All rights reserved.
Cost-sensitive case-based reasoning using a genetic algorithm: application to medical diagnosis.
Park, Yoon-Joo; Chun, Se-Hak; Kim, Byung-Chun
2011-02-01
The paper studies a new learning technique called cost-sensitive case-based reasoning (CSCBR) that incorporates unequal misclassification costs into a CBR model. Conventional CBR is now considered a suitable technique for diagnosis, prognosis and prescription in medicine. However, it lacks the ability to reflect asymmetric misclassification and often assumes that the cost of misclassifying a positive case (an illness) as negative (no illness) is the same as that of the opposite error. Thus, the objective of this research is to overcome this limitation of conventional CBR and encourage applying CBR to the many real-world medical cases associated with asymmetric misclassification costs. The main idea involves adjusting the optimal cut-off classification point for classifying the absence or presence of disease and the cut-off distance point for selecting optimal neighbors within search spaces based on the similarity distribution. These steps are dynamically adapted to new target cases using a genetic algorithm. We apply the proposed method to five real medical datasets and compare the results with two other cost-sensitive learning methods, C5.0 and CART. Our findings show that the total misclassification cost of CSCBR is lower than that of the other cost-sensitive methods in many cases. Even though the genetic algorithm has limitations in terms of unstable results and over-fitting of training data, CSCBR results with the GA are better overall than those of the other methods. Paired t-test results also indicate that the total misclassification cost of CSCBR is significantly less than that of C5.0 and CART for several datasets. We have proposed a new CBR method, cost-sensitive case-based reasoning (CSCBR), that can incorporate unequal misclassification costs into CBR and optimize the number of neighbors dynamically using a genetic algorithm. It is meaningful not only for introducing the concept of cost-sensitive learning to CBR, but also for encouraging the use of CBR in the medical area. The results show that the total misclassification cost of CSCBR does not increase in arithmetic progression as the cost of a false absence increases arithmetically; thus it is cost-sensitive. We also show that the total misclassification cost of CSCBR is the lowest among all methods on four datasets out of five and that the result is statistically significant in many cases. A limitation of the proposed CSCBR is that it is confined to binary classification for minimizing misclassification cost, because it was originally designed for two-class problems; future work will extend the method to problems with more than two classes. Copyright © 2010 Elsevier B.V. All rights reserved.
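A hedged sketch of the cost-sensitive cut-off idea only: the paper tunes this cut-off, together with a neighbour-selection distance, with a genetic algorithm, whereas here an exhaustive search over observed score values stands in and the cost values are illustrative.

    import numpy as np

    def cost_sensitive_cutoff(scores, labels, cost_fn=5.0, cost_fp=1.0):
        """Choose the classification cut-off that minimises total misclassification
        cost when missing a positive (false negative) costs more than a false alarm.

        scores : numpy array, higher means more likely positive (e.g. a CBR similarity vote)
        labels : numpy array of 0/1 ground-truth classes
        """
        candidates = np.unique(scores)

        def total_cost(t):
            pred = scores >= t
            fn = np.sum((labels == 1) & ~pred)   # missed positives
            fp = np.sum((labels == 0) & pred)    # false alarms
            return cost_fn * fn + cost_fp * fp

        best = min(candidates, key=total_cost)
        return best, total_cost(best)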
ExSTraCS 2.0: Description and Evaluation of a Scalable Learning Classifier System.
Urbanowicz, Ryan J; Moore, Jason H
2015-09-01
Algorithmic scalability is a major concern for any machine learning strategy in this age of 'big data'. A large number of potentially predictive attributes is emblematic of problems in bioinformatics, genetic epidemiology, and many other fields. Previously, ExSTraCS was introduced as an extended Michigan-style supervised learning classifier system that combined a set of powerful heuristics to successfully tackle the challenges of classification, prediction, and knowledge discovery in complex, noisy, and heterogeneous problem domains. While Michigan-style learning classifier systems are powerful and flexible learners, they are not considered to be particularly scalable. For the first time, this paper presents a complete description of the ExSTraCS algorithm and introduces an effective strategy to dramatically improve learning classifier system scalability. ExSTraCS 2.0 addresses scalability with (1) a rule specificity limit, (2) new approaches to expert-knowledge-guided covering and mutation mechanisms, and (3) the implementation and utilization of the TuRF algorithm for improving the quality of expert knowledge discovery in larger datasets. Performance over a complex spectrum of simulated genetic datasets demonstrated that these new mechanisms dramatically improve nearly every performance metric on datasets with 20 attributes and made it possible for ExSTraCS to reliably scale up to perform on related 200- and 2000-attribute datasets. ExSTraCS 2.0 was also able to reliably solve the 6-, 11-, 20-, 37-, 70-, and 135-bit multiplexer problems, and did so in similar or fewer learning iterations than previously reported, with smaller finite training sets, and without using building blocks discovered from simpler multiplexer problems. Furthermore, ExSTraCS usability was made simpler through the elimination of previously critical run parameters.
Xu, Libin; Li, Yang; Xu, Ning; Hu, Yong; Wang, Chao; He, Jianjun; Cao, Yueze; Chen, Shigui; Li, Dongsheng
2014-12-24
This work demonstrated the possibility of using artificial neural networks to classify soy sauce from China. The aroma profiles of different soy sauce samples were differentiated using headspace solid-phase microextraction. The soy sauce samples were analyzed by gas chromatography-mass spectrometry, and 22 and 15 volatile aroma compounds were selected for sensitivity analysis to classify the samples by fermentation and geographic region, respectively. The 15 selected samples can be classified by fermentation and geographic region with a prediction success rate of 100%. Furans and phenols represented the variables with the greatest contribution in classifying soy sauce samples by fermentation and geographic region, respectively.
NASA Astrophysics Data System (ADS)
Paino, A.; Keller, J.; Popescu, M.; Stone, K.
2014-06-01
In this paper we present an approach that uses Genetic Programming (GP) to evolve novel feature extraction algorithms for greyscale images. Our motivation is to create an automated method of building new feature extraction algorithms for images that are competitive with commonly used human-engineered features, such as Local Binary Pattern (LBP) and Histogram of Oriented Gradients (HOG). The evolved feature extraction algorithms are functions defined over the image space, and each produces a real-valued feature vector of variable length. Each evolved feature extractor breaks up the given image into a set of cells centered on every pixel, performs evolved operations on each cell, and then combines the results of those operations for every cell using an evolved operator. Using this method, the algorithm is flexible enough to reproduce both LBP and HOG features. The dataset we use to train and test our approach consists of a large number of pre-segmented image "chips" taken from a Forward Looking Infrared Imagery (FLIR) camera mounted on the hood of a moving vehicle. The goal is to classify each image chip as either containing or not containing a buried object. To this end, we define the fitness of a candidate solution as the cross-fold validation accuracy of the features generated by said candidate solution when used in conjunction with a Support Vector Machine (SVM) classifier. In order to validate our approach, we compare the classification accuracy of an SVM trained using our evolved features with the accuracy of an SVM trained using mainstream feature extraction algorithms, including LBP and HOG.
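The fitness evaluation described in this abstract can be sketched as follows: a candidate feature extractor is scored by the cross-validated accuracy of an SVM trained on the features it produces. The toy image chips, labels, and fixed stand-in extractor below are assumptions; the evolved GP programs themselves are not reproduced here.

```python
# Sketch of the fitness function: score a candidate extractor by the
# cross-validated accuracy of an SVM on the features it produces.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
chips = rng.random((120, 32, 32))          # placeholder image "chips"
labels = rng.integers(0, 2, 120)           # buried object present / absent (toy labels)

def candidate_extractor(img):
    # Stand-in for an evolved program: per-cell means over a 4x4 grid of cells.
    cells = img.reshape(4, 8, 4, 8).mean(axis=(1, 3))
    return cells.ravel()

def fitness(extractor, images, y, folds=5):
    X = np.array([extractor(img) for img in images])
    # Cross-fold validation accuracy of an SVM on the extracted features.
    return cross_val_score(SVC(kernel="rbf"), X, y, cv=folds).mean()

print(f"fitness of candidate: {fitness(candidate_extractor, chips, labels):.3f}")
```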
USDA-ARS?s Scientific Manuscript database
Support Vector Machine (SVM) was used in the Genetic Algorithms (GA) process to select and classify a subset of hyperspectral image bands. The method was applied to fluorescence hyperspectral data for the detection of aflatoxin contamination in Aspergillus flavus infected single corn kernels. In the...
Comparing ensemble learning methods based on decision tree classifiers for protein fold recognition.
Bardsiri, Mahshid Khatibi; Eftekhari, Mahdi
2014-01-01
In this paper, several decision tree (DT) based ensemble learning methods for protein fold recognition are compared over three datasets taken from the literature. Following previously reported studies, the features of the datasets are divided into groups. Then, for each of these groups, three ensemble classifiers, namely random forest, rotation forest and AdaBoost.M1, are employed. Several fusion methods are then introduced for combining the ensemble classifiers obtained in the previous step, producing three combined classifiers of the random forest, rotation forest and AdaBoost.M1 types. Finally, the three combined classifiers are fused to form an overall classifier. Experimental results show that the overall classifier obtained by the genetic algorithm (GA) weighting fusion method is the best in terms of classification accuracy compared with previously applied methods.
An improved K-means clustering method for cDNA microarray image segmentation.
Wang, T N; Li, T J; Shao, G F; Wu, S X
2015-07-14
Microarray technology is a powerful tool for human genetic research and other biomedical applications. Numerous improvements to the standard K-means algorithm have been proposed for the image segmentation step. However, most previous studies classify the image into only two clusters. In this paper, we propose a novel K-means algorithm that first classifies the image into three clusters; one of the three clusters is then designated as the background region and the other two are merged as the foreground region. The proposed method was evaluated on six different data sets. Analyses of accuracy, efficiency, expression values, special gene spots, and noisy images demonstrate the effectiveness of our method in improving segmentation quality.
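A minimal sketch of the three-cluster idea, assuming scikit-learn is available: pixels are grouped into three intensity clusters, the darkest cluster is taken as background, and the other two are merged into the foreground mask. The synthetic spot image and the darkest-cluster rule are illustrative assumptions.

```python
# Three-cluster K-means segmentation sketch: darkest cluster = background,
# remaining two clusters merged into the foreground (spot) mask.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
img = rng.normal(20, 5, (64, 64))                   # background intensity
yy, xx = np.mgrid[0:64, 0:64]
img[(yy - 32) ** 2 + (xx - 32) ** 2 < 100] += 120   # a bright "spot"

pixels = img.reshape(-1, 1)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(pixels)

# The cluster whose centre has the lowest intensity is treated as background.
background_cluster = np.argmin(km.cluster_centers_.ravel())
foreground_mask = (km.labels_ != background_cluster).reshape(img.shape)
print("foreground pixels:", int(foreground_mask.sum()))
```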
Chen, Peng; Li, Jinyan
2010-05-17
Prediction of long-range inter-residue contacts is an important topic in bioinformatics research. It is helpful for determining protein structures, understanding protein folding, and therefore advancing the annotation of protein functions. In this paper, we propose a novel ensemble of genetic algorithm classifiers (GaCs) to address the long-range contact prediction problem. Our method is based on a key idea called sequence profile centers (SPCs). Each SPC is the average sequence profile of the residue pairs belonging to the same contact or non-contact class. The GaCs train on multiple, different pairings of long-range contact data (positive data) and long-range non-contact data (negative data). The negative data sets, having roughly the same sizes as the positive ones, are constructed by random sampling over the original imbalanced negative data. As a result, about 21.5% of long-range contacts are correctly predicted. We also found that the ensemble of GaCs improves accuracy by around 5.6% over a single GaC. Classifiers that use sequence profile centers may advance long-range contact prediction. In line with this approach, key structural features in proteins could be determined with high efficiency and accuracy.
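The class-balancing step described here can be sketched as simple random under-sampling of the negative pairs; the placeholder arrays below stand in for the sequence-profile features, and each ensemble member receives a different negative sample.

```python
# Sketch of balanced training-set construction: all positives plus a fresh
# random sample of negatives of the same size for each ensemble member.
import numpy as np

rng = np.random.default_rng(0)
positives = rng.random((500, 40))        # long-range contact pairs (toy features)
negatives = rng.random((20000, 40))      # long-range non-contact pairs (toy features)

def balanced_training_set(pos, neg, rng):
    idx = rng.choice(len(neg), size=len(pos), replace=False)
    X = np.vstack([pos, neg[idx]])
    y = np.concatenate([np.ones(len(pos)), np.zeros(len(pos))])
    return X, y

# Each ensemble member gets a different negative sample.
ensemble_data = [balanced_training_set(positives, negatives, rng) for _ in range(5)]
print([Xy[0].shape for Xy in ensemble_data])
```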
Kumar Myakalwar, Ashwin; Spegazzini, Nicolas; Zhang, Chi; Kumar Anubham, Siva; Dasari, Ramachandra R; Barman, Ishan; Kumar Gundawar, Manoj
2015-08-19
Despite its intrinsic advantages, the translation of laser-induced breakdown spectroscopy to material identification has often been impeded by the lack of robustness of the developed classification models, frequently due to the presence of spurious correlations. While a number of classifiers exhibiting high discriminatory power have been reported, efforts to establish the subset of relevant spectral features that enable a fundamental interpretation of the segmentation capability and avoid the 'curse of dimensionality' have been lacking. Using LIBS data acquired from a set of secondary explosives, we investigate judicious feature selection approaches and architect two different chemometric classifiers, based on feature selection through prerequisite knowledge of the sample composition and a genetic algorithm, respectively. While the full spectral input results in a classification rate of ca. 92%, selecting only the carbon-to-hydrogen spectral window gives nearly identical performance. Importantly, the genetic algorithm-derived classifier shows a statistically significant improvement to ca. 94% accuracy for prospective classification, even though the number of features used is an order of magnitude smaller. Our findings demonstrate the impact of rigorous feature selection in LIBS and also hint at the feasibility of using a discrete filter-based detector, thereby enabling a cheaper and more compact system more amenable to field operations.
Genetic algorithm for the optimization of features and neural networks in ECG signals classification
NASA Astrophysics Data System (ADS)
Li, Hongqiang; Yuan, Danyang; Ma, Xiangdong; Cui, Dianyin; Cao, Lu
2017-01-01
Feature extraction and classification of electrocardiogram (ECG) signals are necessary for the automatic diagnosis of cardiac diseases. In this study, a novel method based on genetic algorithm-back propagation neural network (GA-BPNN) for classifying ECG signals with feature extraction using wavelet packet decomposition (WPD) is proposed. WPD combined with the statistical method is utilized to extract the effective features of ECG signals. The statistical features of the wavelet packet coefficients are calculated as the feature sets. GA is employed to decrease the dimensions of the feature sets and to optimize the weights and biases of the back propagation neural network (BPNN). Thereafter, the optimized BPNN classifier is applied to classify six types of ECG signals. In addition, an experimental platform is constructed for ECG signal acquisition to supply the ECG data for verifying the effectiveness of the proposed method. The GA-BPNN method with the MIT-BIH arrhythmia database achieved a dimension reduction of nearly 50% and produced good classification results with an accuracy of 97.78%. The experimental results based on the established acquisition platform indicated that the GA-BPNN method achieved a high classification accuracy of 99.33% and could be efficiently applied in the automatic identification of cardiac arrhythmias.
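A hedged sketch of the feature-extraction stage, assuming the PyWavelets package is available: a heartbeat segment is decomposed with wavelet packets and simple statistics of the packet coefficients form the feature vector. The wavelet family, decomposition level, and chosen statistics are assumptions, and the GA and BPNN stages are not reproduced.

```python
# Wavelet packet decomposition + statistical features for one (toy) ECG beat.
import numpy as np
import pywt

rng = np.random.default_rng(0)
beat = rng.standard_normal(256)           # placeholder for one ECG beat segment

def wpd_statistical_features(signal, wavelet="db4", level=3):
    wp = pywt.WaveletPacket(data=signal, wavelet=wavelet, mode="symmetric", maxlevel=level)
    feats = []
    for node in wp.get_level(level, order="natural"):
        c = np.asarray(node.data)
        # simple statistics of each packet's coefficients (assumed choice)
        feats.extend([c.mean(), c.std(), np.abs(c).max(), np.sum(c ** 2)])
    return np.array(feats)

features = wpd_statistical_features(beat)
print("feature vector length:", features.shape[0])   # 2**level packets x 4 statistics
```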
Optimizing Support Vector Machine Parameters with Genetic Algorithm for Credit Risk Assessment
NASA Astrophysics Data System (ADS)
Manurung, Jonson; Mawengkang, Herman; Zamzami, Elviawaty
2017-12-01
Support vector machine (SVM) is a popular classification method known for its strong generalization capability. SVM can address both classification and regression problems, with linear or nonlinear kernels. However, SVM also has a weakness: it is difficult to determine the optimal parameter values. SVM computes the best linear separator in the input feature space according to the training data; to classify data that are not linearly separable, it uses the kernel trick to map the data into a higher-dimensional feature space in which they become linearly separable. The kernel trick can use various kernel functions, such as the linear, polynomial, radial basis function (RBF) and sigmoid kernels, and each function has parameters that affect the accuracy of SVM classification. To solve this problem, a genetic algorithm is proposed as the search algorithm for the optimal parameter values, thereby increasing the best classification accuracy of the SVM. The data are taken from the UCI repository of machine learning databases: Australian Credit Approval. The results show that the combination of SVM and genetic algorithms is effective in improving classification accuracy. Genetic algorithms have been shown to find optimal kernel parameters for SVM systematically, instead of relying on randomly selected kernel parameters. The best accuracy was improved relative to the baseline kernels (linear: 85.12%, polynomial: 81.76%, RBF: 77.22%, sigmoid: 78.70%). However, for larger data sizes this method is not practical because it takes a lot of time.
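The GA-driven parameter search can be sketched roughly as below: each chromosome encodes (log C, log gamma) for an RBF kernel and fitness is cross-validated accuracy. The population size, parameter ranges, and operators are assumptions, and a bundled toy dataset stands in for the Australian Credit Approval data.

```python
# Sketch of a GA searching SVM hyper-parameters (log10 C, log10 gamma).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(0)

def fitness(chrom):
    C, gamma = 10.0 ** chrom[0], 10.0 ** chrom[1]
    return cross_val_score(SVC(kernel="rbf", C=C, gamma=gamma), X, y, cv=3).mean()

# Chromosomes: log10(C) in [-2, 3], log10(gamma) in [-5, 0].
low, high = np.array([-2.0, -5.0]), np.array([3.0, 0.0])
pop = rng.uniform(low, high, (12, 2))

for _ in range(10):
    scores = np.array([fitness(c) for c in pop])
    elite = pop[np.argsort(scores)[-6:]]                      # keep the best half
    # offspring: arithmetic crossover of random elite pairs plus Gaussian mutation
    pairs = elite[rng.integers(0, len(elite), (6, 2))]
    children = pairs.mean(axis=1) + rng.normal(0, 0.2, (6, 2))
    pop = np.vstack([elite, np.clip(children, low, high)])

best = max(pop, key=fitness)
print("best C=%.3g gamma=%.3g acc=%.3f" % (10 ** best[0], 10 ** best[1], fitness(best)))
```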
Khoje, Suchitra
2018-02-01
Images of four quality grades of mangoes and guavas are evaluated for color and textural features to characterize and classify them, and to model fruit appearance grading. The paper discusses three approaches to identifying the most discriminating texture features of both fruits. In the first approach, the fruit's color and texture features are selected using the Mahalanobis distance. A total of 20 color features and 40 textural features are extracted for analysis. Using Mahalanobis distance and feature intercorrelation analyses, one best color feature (mean of a* [L*a*b* color space]) and two textural features (energy of a*, contrast of H*) are selected for guava, while two best color features (R std, H std) and one textural feature (energy of b*) are selected for mango, giving the highest discriminative power. The second approach studies several common wavelet families in search of the best classification model for fruit quality grading. The wavelet features extracted from five basic mother wavelets (db, bior, rbior, Coif, Sym) are explored to characterize the fruits' texture appearance. In the third approach, a genetic algorithm is used to select, from a large universe of features, only those color and wavelet texture features that are relevant to class separation. The study shows that the image color and texture features identified using a genetic algorithm can distinguish between the various quality classes of fruits. The experimental results showed that a support vector machine classifier is selected for guava grading with an accuracy of 97.61%, and an artificial neural network is selected for mango grading with an accuracy of 95.65%. The proposed method is a nondestructive fruit quality assessment method. The experimental results have shown that a genetic algorithm combined with wavelet texture features has the potential to discriminate fruit quality. Finally, it can be concluded that the discussed method is an accurate, reliable, and objective tool to determine the quality of mango and guava fruit, and might be applicable to in-line sorting systems. © 2017 Wiley Periodicals, Inc.
GENIE: a hybrid genetic algorithm for feature classification in multispectral images
NASA Astrophysics Data System (ADS)
Perkins, Simon J.; Theiler, James P.; Brumby, Steven P.; Harvey, Neal R.; Porter, Reid B.; Szymanski, John J.; Bloch, Jeffrey J.
2000-10-01
We consider the problem of pixel-by-pixel classification of a multispectral image using supervised learning. Conventional supervised classification techniques such as maximum likelihood classification, and less conventional ones such as neural networks, typically base such classifications solely on the spectral components of each pixel. It is easy to see why: the color of a pixel provides a nice, bounded, fixed-dimensional space in which these classifiers work well. It is often the case, however, that spectral information alone is not sufficient to correctly classify a pixel. Maybe spatial neighborhood information is required as well. Or maybe the raw spectral components do not themselves make for easy classification, but some arithmetic combination of them would. In either of these cases we have the problem of selecting suitable spatial, spectral or spatio-spectral features that allow the classifier to do its job well. The number of all possible such features is extremely large. How can we select a suitable subset? We have developed GENIE, a hybrid learning system that combines a genetic algorithm, which searches a space of image processing operations for a set that can produce suitable feature planes, with a more conventional classifier that uses those feature planes to output a final classification. In this paper we show that the use of a hybrid GA provides significant advantages over using either a GA alone or more conventional classification methods alone. We present results using high-resolution IKONOS data, looking for regions of burned forest and for roads.
Dashtban, M; Balafar, Mohammadali
2017-03-01
Gene selection is a demanding task for microarray data analysis. The diverse complexity of different cancers makes this issue still challenging. In this study, a novel evolutionary method based on genetic algorithms and artificial intelligence is proposed to identify predictive genes for cancer classification. A filter method was first applied to reduce the dimensionality of feature space followed by employing an integer-coded genetic algorithm with dynamic-length genotype, intelligent parameter settings, and modified operators. The algorithmic behaviors including convergence trends, mutation and crossover rate changes, and running time were studied, conceptually discussed, and shown to be coherent with literature findings. Two well-known filter methods, Laplacian and Fisher score, were examined considering similarities, the quality of selected genes, and their influences on the evolutionary approach. Several statistical tests concerning choice of classifier, choice of dataset, and choice of filter method were performed, and they revealed some significant differences between the performance of different classifiers and filter methods over datasets. The proposed method was benchmarked upon five popular high-dimensional cancer datasets; for each, top explored genes were reported. Comparing the experimental results with several state-of-the-art methods revealed that the proposed method outperforms previous methods in DLBCL dataset. Copyright © 2017 Elsevier Inc. All rights reserved.
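As one concrete example of the filter stage mentioned above, the Fisher score ranks genes by between-class separation relative to within-class variance, and only the top-ranked genes are passed to the evolutionary search; the random expression matrix below is a placeholder for a real microarray dataset.

```python
# Fisher-score filter sketch: rank genes before the evolutionary search.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((60, 2000))      # 60 samples x 2000 genes (toy data)
y = rng.integers(0, 2, 60)               # two tumour classes (toy labels)

def fisher_scores(X, y):
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    num = np.zeros(X.shape[1])
    den = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        num += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2   # between-class term
        den += len(Xc) * Xc.var(axis=0)                          # within-class term
    return num / (den + 1e-12)

scores = fisher_scores(X, y)
top_genes = np.argsort(scores)[::-1][:200]    # reduced search space for the GA
print("top gene indices:", top_genes[:10])
```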
Mora, Juan David Sandino; Hurtado, Darío Amaya; Sandoval, Olga Lucía Ramos
2016-01-01
Reported cases of uncontrolled pesticide use, and the effects produced by direct or indirect exposure, represent a high risk to human health. This paper therefore presents the results of the development and execution of an algorithm that predicts the possible effects on the endocrine system of Fisher 344 (F344) rats caused by the ingestion of malathion. The ToxRefDB database, which collects different case studies of F344 rats exposed to malathion, was consulted. The experimental data were processed using a Naïve Bayes (NB) machine learning classifier, which was subsequently optimized using genetic algorithms (GAs). The model was executed in an application with a graphical user interface programmed in C#. There was a tendency toward larger alterations, with increasing levels in the parathyroid gland at dosages between 4 and 5 mg/kg/day, in contrast to the thyroid gland at doses between 739 and 868 mg/kg/day. Females showed greater resistance to effects on the endocrine system from the ingestion of malathion, but were more susceptible to alterations in the pituitary gland at exposure times between 3 and 6 months. The prediction model based on NB classifiers made it possible to analyze all possible combinations of the studied variables, and its accuracy was improved using GAs. Except for the pituitary gland, females demonstrated better resistance to effects, in terms of increasing levels, in the remaining glands of the endocrine system.
A fuzzy classifier system for process control
NASA Technical Reports Server (NTRS)
Karr, C. L.; Phillips, J. C.
1994-01-01
A fuzzy classifier system that discovers rules for controlling a mathematical model of a pH titration system was developed by researchers at the U.S. Bureau of Mines (USBM). Fuzzy classifier systems successfully combine the strengths of learning classifier systems and fuzzy logic controllers. Learning classifier systems resemble familiar production rule-based systems, but they represent their IF-THEN rules by strings of characters rather than in the traditional linguistic terms. Fuzzy logic is a tool that allows for the incorporation of abstract concepts into rule-based systems, thereby allowing the rules to resemble the familiar 'rules-of-thumb' commonly used by humans when solving difficult process control and reasoning problems. Like learning classifier systems, fuzzy classifier systems employ a genetic algorithm to explore and sample new rules for manipulating the problem environment. Like fuzzy logic controllers, fuzzy classifier systems encapsulate knowledge in the form of production rules. The results presented in this paper demonstrate the ability of fuzzy classifier systems to generate a fuzzy logic-based process control system.
NASA Astrophysics Data System (ADS)
Kim, Byung Soo; Lee, Woon-Seek; Koh, Shiegheun
2012-07-01
This article considers an inbound ordering and outbound dispatching problem for a single product in a third-party warehouse, where the demands are dynamic over a discrete and finite time horizon, and moreover, each demand has a time window in which it must be satisfied. Replenishing orders are shipped in containers and the freight cost is proportional to the number of containers used. The problem is classified into two cases, i.e. non-split demand case and split demand case, and a mathematical model for each case is presented. An in-depth analysis of the models shows that they are very complicated and difficult to find optimal solutions as the problem size becomes large. Therefore, genetic algorithm (GA) based heuristic approaches are designed to solve the problems in a reasonable time. To validate and evaluate the algorithms, finally, some computational experiments are conducted.
An incremental approach to genetic-algorithms-based classification.
Guan, Sheng-Uei; Zhu, Fangming
2005-04-01
Incremental learning has been widely addressed in the machine learning literature to cope with learning tasks where the learning environment is ever changing or training samples become available over time. However, most research work explores incremental learning with statistical algorithms or neural networks, rather than evolutionary algorithms. The work in this paper employs genetic algorithms (GAs) as basic learning algorithms for incremental learning within one or more classifier agents in a multiagent environment. Four new approaches with different initialization schemes are proposed. They keep the old solutions and use an "integration" operation to integrate them with new elements to accommodate new attributes, while biased mutation and crossover operations are adopted to further evolve a reinforced solution. The simulation results on benchmark classification data sets show that the proposed approaches can deal with the arrival of new input attributes and integrate them with the original input space. It is also shown that the proposed approaches can be successfully used for incremental learning and improve classification rates as compared to the retraining GA. Possible applications for continuous incremental training and feature selection are also discussed.
Yaacoub, Charles; Mhanna, Georges; Rihana, Sandy
2017-01-23
Electroencephalography is a non-invasive measure of the brain electrical activity generated by millions of neurons. Feature extraction in electroencephalography analysis is a core issue that may lead to accurate brain mental state classification. This paper presents a new feature selection method that improves left/right hand movement identification of a motor imagery brain-computer interface, based on genetic algorithms and artificial neural networks used as classifiers. Raw electroencephalography signals are first preprocessed using appropriate filtering. Feature extraction is carried out afterwards, based on spectral and temporal signal components, and thus a feature vector is constructed. As various features might be inaccurate and mislead the classifier, thus degrading the overall system performance, the proposed approach identifies a subset of features from a large feature space, such that the classifier error rate is reduced. Experimental results show that the proposed method is able to reduce the number of features to as low as 0.5% (i.e., the number of ignored features can reach 99.5%) while improving the accuracy, sensitivity, specificity, and precision of the classifier.
Research on classified real-time flood forecasting framework based on K-means cluster and rough set.
Xu, Wei; Peng, Yong
2015-01-01
This research presents a new classified real-time flood forecasting framework. In this framework, historical floods are classified by a K-means cluster according to the spatial and temporal distribution of precipitation, the time variance of precipitation intensity and other hydrological factors. Based on the classified results, a rough set is used to extract the identification rules for real-time flood forecasting. Then, the parameters of different categories within the conceptual hydrological model are calibrated using a genetic algorithm. In real-time forecasting, the corresponding category of parameters is selected for flood forecasting according to the obtained flood information. This research tests the new classified framework on Guanyinge Reservoir and compares the framework with the traditional flood forecasting method. It finds that the performance of the new classified framework is significantly better in terms of accuracy. Furthermore, the framework can be considered in a catchment with fewer historical floods.
Constructing better classifier ensemble based on weighted accuracy and diversity measure.
Zeng, Xiaodong; Wong, Derek F; Chao, Lidia S
2014-01-01
A weighted accuracy and diversity (WAD) method is presented, a novel measure used to evaluate the quality of the classifier ensemble, assisting in the ensemble selection task. The proposed measure is motivated by a commonly accepted hypothesis; that is, a robust classifier ensemble should not only be accurate but also different from every other member. In fact, accuracy and diversity are mutual restraint factors; that is, an ensemble with high accuracy may have low diversity, and an overly diverse ensemble may negatively affect accuracy. This study proposes a method to find the balance between accuracy and diversity that enhances the predictive ability of an ensemble for unknown data. The quality assessment for an ensemble is performed such that the final score is achieved by computing the harmonic mean of accuracy and diversity, where two weight parameters are used to balance them. The measure is compared to two representative measures, Kappa-Error and GenDiv, and two threshold measures that consider only accuracy or diversity, with two heuristic search algorithms, genetic algorithm, and forward hill-climbing algorithm, in ensemble selection tasks performed on 15 UCI benchmark datasets. The empirical results demonstrate that the WAD measure is superior to others in most cases.
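A minimal sketch of the scoring idea, under the assumption that the measure takes the generic weighted-harmonic-mean form and that diversity is measured by pairwise disagreement; the paper's exact accuracy and diversity terms may differ.

```python
# Weighted harmonic mean of ensemble accuracy and diversity (illustrative form).
import numpy as np

def pairwise_disagreement(predictions):
    """Mean fraction of samples on which two ensemble members disagree."""
    k = len(predictions)
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    return np.mean([np.mean(predictions[i] != predictions[j]) for i, j in pairs])

def wad_score(predictions, y_true, w_acc=0.5, w_div=0.5):
    ensemble_vote = np.round(np.mean(predictions, axis=0))      # majority vote (binary labels)
    accuracy = np.mean(ensemble_vote == y_true)
    diversity = pairwise_disagreement(predictions)
    # weighted harmonic mean of accuracy and diversity
    return (w_acc + w_div) / (w_acc / (accuracy + 1e-12) + w_div / (diversity + 1e-12))

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 100)
preds = np.array([(y + (rng.random(100) < 0.2)) % 2 for _ in range(5)])  # 5 noisy members
print("WAD score:", round(wad_score(preds, y), 3))
```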
Pérez-Castillo, Yunierkis; Lazar, Cosmin; Taminau, Jonatan; Froeyen, Mathy; Cabrera-Pérez, Miguel Ángel; Nowé, Ann
2012-09-24
Computer-aided drug design has become an important component of the drug discovery process. Despite the advances in this field, there is not a unique modeling approach that can be successfully applied to solve the whole range of problems faced during QSAR modeling. Feature selection and ensemble modeling are active areas of research in ligand-based drug design. Here we introduce the GA(M)E-QSAR algorithm that combines the search and optimization capabilities of Genetic Algorithms with the simplicity of the Adaboost ensemble-based classification algorithm to solve binary classification problems. We also explore the usefulness of Meta-Ensembles trained with Adaboost and Voting schemes to further improve the accuracy, generalization, and robustness of the optimal Adaboost Single Ensemble derived from the Genetic Algorithm optimization. We evaluated the performance of our algorithm using five data sets from the literature and found that it is capable of yielding similar or better classification results to what has been reported for these data sets with a higher enrichment of active compounds relative to the whole actives subset when only the most active chemicals are considered. More important, we compared our methodology with state of the art feature selection and classification approaches and found that it can provide highly accurate, robust, and generalizable models. In the case of the Adaboost Ensembles derived from the Genetic Algorithm search, the final models are quite simple since they consist of a weighted sum of the output of single feature classifiers. Furthermore, the Adaboost scores can be used as ranking criterion to prioritize chemicals for synthesis and biological evaluation after virtual screening experiments.
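The 'weighted sum of single-feature classifiers' structure noted above can be approximated with AdaBoost over depth-one decision stumps applied to a restricted descriptor subset; the toy data and the randomly chosen subset stand in for the GA-selected descriptors.

```python
# AdaBoost over depth-1 stumps on a restricted descriptor subset (sketch).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=50, n_informative=8, random_state=0)

rng = np.random.default_rng(0)
subset = rng.choice(X.shape[1], size=12, replace=False)   # stand-in for GA-chosen descriptors

# The default AdaBoost base learner is a depth-1 decision tree (a stump), so the
# boosted model is a weighted sum of single-feature classifiers.
stump_boost = AdaBoostClassifier(n_estimators=50, random_state=0)
acc = cross_val_score(stump_boost, X[:, subset], y, cv=5).mean()
print(f"cross-validated accuracy on the selected subset: {acc:.3f}")
```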
Duan, Li; Guo, Long; Liu, Ke; Liu, E-Hu; Li, Ping
2014-04-25
Citrus herbs have been widely used in traditional medicine and cuisine in China and other countries since ancient times. However, the authentication and quality control of Citrus herbs have always been a challenging task because of their similar morphological characteristics and the diversity of the components present in their complicated matrices. In the present investigation, we developed a novel strategy to characterize and classify seven Citrus herbs based on chromatographic analysis and chemometric methods. First, the chemical constituents of the seven Citrus herbs were globally characterized by liquid chromatography combined with quadrupole time-of-flight mass spectrometry (LC-QTOF-MS). Based on their retention times, UV spectra and MS fragmentation behavior, a total of 75 compounds were identified or tentatively characterized in these herbal medicines. Second, a segmental monitoring method based on LC with variable-wavelength detection was developed for the simultaneous quantification of ten marker compounds in these Citrus herbs. Third, based on the contents of the ten analytes, genetic algorithm-optimized support vector machines (GA-SVM) were employed to differentiate and classify the 64 samples covering these seven herbs. The obtained classifier showed good prediction performance, and the overall prediction accuracy reached 96.88%. The proposed strategy is expected to provide new insight for the authentication and quality control of traditional herbs. Copyright © 2014 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Karkra, Rashmi; Kumar, Prashant; Bansod, Baban K. S.; Bagchi, Sudeshna; Sharma, Pooja; Krishna, C. Rama
2017-11-01
Access to potable water for the common people is one of the most challenging tasks of the present era. Contamination of drinking water has become a serious problem due to various anthropogenic and geogenic events. The paper demonstrates the application of evolutionary algorithms, viz., particle swarm optimization and a genetic algorithm, to 24 water samples containing eight different heavy metal ions (Cd, Cu, Co, Pb, Zn, Ar, Cr and Ni) for the optimal selection of electrodes and frequencies to classify the heavy metal ions. The work has been carried out on multi-variate data, viz., single-electrode multi-frequency, single-frequency multi-electrode and multi-frequency multi-electrode water samples. The electrodes used are platinum, gold, silver nanoparticle and glassy carbon electrodes. The various hazardous metal ions present in the water samples have been optimally classified, and the results were validated by applying the Davies-Bouldin index. Such studies are useful in the segregation of hazardous heavy metal ions found in water resources, thereby quantifying the degree of water quality.
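A small sketch of the cluster-validation step, assuming scikit-learn: candidate groupings of the sensor responses are compared with the Davies-Bouldin index (lower is better). The random response matrix and the K-means groupings are placeholders for the PSO/GA-optimized classifications in the paper.

```python
# Comparing candidate groupings of sensor responses with the Davies-Bouldin index.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score

rng = np.random.default_rng(0)
responses = rng.random((24, 16))      # 24 water samples x 16 electrode/frequency readings (toy)

for k in (2, 4, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(responses)
    print(f"k={k}: Davies-Bouldin index = {davies_bouldin_score(responses, labels):.3f}")
```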
Wu, Jianfa; Peng, Dahao; Li, Zhuping; Zhao, Li; Ling, Huanzhang
2015-01-01
To effectively and accurately detect and classify network intrusion data, this paper introduces a general regression neural network (GRNN) based on the artificial immune algorithm with elitist strategies (AIAE). The elitist archive and elitist crossover were combined with the artificial immune algorithm (AIA) to produce the AIAE-GRNN algorithm, with the aim of improving its adaptivity and accuracy. In this paper, the mean square errors (MSEs) were considered the affinity function. The AIAE was used to optimize the smooth factors of the GRNN; then, the optimal smooth factor was solved and substituted into the trained GRNN. Thus, the intrusive data were classified. The paper selected a GRNN that was separately optimized using a genetic algorithm (GA), particle swarm optimization (PSO), and fuzzy C-means clustering (FCM) to enable a comparison of these approaches. As the results show, the AIAE-GRNN achieves higher classification accuracy than the PSO-GRNN, although its running time is long. FCM and GA-GRNN were eliminated because of their deficiencies in terms of accuracy and convergence. To improve the running speed, the paper adopted principal component analysis (PCA) to reduce the dimensions of the intrusive data. With the reduction in dimensionality, the PCA-AIAE-GRNN loses less accuracy and converges better than the PCA-PSO-GRNN, and its running speed is also improved. The experimental results show that the AIAE-GRNN has higher robustness and accuracy than the other algorithms considered and can thus be used to classify the intrusive data.
Comparison of algorithms for the detection of cancer-drivers at sub-gene resolution
Porta-Pardo, Eduard; Kamburov, Atanas; Tamborero, David; Pons, Tirso; Grases, Daniela; Valencia, Alfonso; Lopez-Bigas, Nuria; Getz, Gad; Godzik, Adam
2018-01-01
Understanding genetic events that lead to cancer initiation and progression remains one of the biggest challenges in cancer biology. Traditionally most algorithms for cancer driver identification look for genes that have more mutations than expected from the average background mutation rate. However, there is now a wide variety of methods that look for non-random distribution of mutations within proteins as a signal they have a driving role in cancer. Here we classify and review the progress of such sub-gene resolution algorithms, compare their findings on four distinct cancer datasets from The Cancer Genome Atlas and discuss how predictions from these algorithms can be interpreted in the emerging paradigms that challenge the simple dichotomy between driver and passenger genes. PMID:28714987
Automated global structure extraction for effective local building block processing in XCS.
Butz, Martin V; Pelikan, Martin; Llorà, Xavier; Goldberg, David E
2006-01-01
Learning Classifier Systems (LCSs), such as the accuracy-based XCS, evolve distributed problem solutions represented by a population of rules. During evolution, features are specialized, propagated, and recombined to provide increasingly accurate subsolutions. Recently, it was shown that, as in conventional genetic algorithms (GAs), some problems require efficient processing of subsets of features to find problem solutions efficiently. In such problems, standard variation operators of genetic and evolutionary algorithms used in LCSs suffer from potential disruption of groups of interacting features, resulting in poor performance. This paper introduces efficient crossover operators to XCS by incorporating techniques derived from competent GAs: the extended compact GA (ECGA) and the Bayesian optimization algorithm (BOA). Instead of simple crossover operators such as uniform crossover or one-point crossover, ECGA or BOA-derived mechanisms are used to build a probabilistic model of the global population and to generate offspring classifiers locally using the model. Several offspring generation variations are introduced and evaluated. The results show that it is possible to achieve performance similar to runs with an informed crossover operator that is specifically designed to yield ideal problem-dependent exploration, exploiting provided problem structure information. Thus, we create the first competent LCSs, XCS/ECGA and XCS/BOA, that detect dependency structures online and propagate corresponding lower-level dependency structures effectively without any information about these structures given in advance.
Comparison between genetic algorithm and self organizing map to detect botnet network traffic
NASA Astrophysics Data System (ADS)
Yugandhara Prabhakar, Shinde; Parganiha, Pratishtha; Madhu Viswanatham, V.; Nirmala, M.
2017-11-01
In the cyber security world, botnet attacks are increasing, and detecting botnets is a challenging task. A botnet is a group of computers coordinated to perform malicious activities. Many techniques have been developed and used to detect and prevent botnet traffic and attacks. In this paper, a comparative study is done on the Genetic Algorithm (GA) and the Self Organizing Map (SOM) for detecting botnet network traffic. Both are soft computing techniques and are used in this paper as data analytics systems. GA is based on the process of natural evolution, while SOM is a type of artificial neural network that uses unsupervised learning and classifies the data according to its neurons. A sample of the KDD99 dataset is used as input to GA and SOM.
Bartsch, Georg; Mitra, Anirban P; Mitra, Sheetal A; Almal, Arpit A; Steven, Kenneth E; Skinner, Donald G; Fry, David W; Lenehan, Peter F; Worzel, William P; Cote, Richard J
2016-02-01
Due to the high recurrence risk of nonmuscle invasive urothelial carcinoma it is crucial to distinguish patients at high risk from those with indolent disease. In this study we used a machine learning algorithm to identify the genes in patients with nonmuscle invasive urothelial carcinoma at initial presentation that were most predictive of recurrence. We used the genes in a molecular signature to predict recurrence risk within 5 years after transurethral resection of bladder tumor. Whole genome profiling was performed on 112 frozen nonmuscle invasive urothelial carcinoma specimens obtained at first presentation on Human WG-6 BeadChips (Illumina®). A genetic programming algorithm was applied to evolve classifier mathematical models for outcome prediction. Cross-validation based resampling and gene use frequencies were used to identify the most prognostic genes, which were combined into rules used in a voting algorithm to predict the sample target class. Key genes were validated by quantitative polymerase chain reaction. The classifier set included 21 genes that predicted recurrence. Quantitative polymerase chain reaction was done for these genes in a subset of 100 patients. A 5-gene combined rule incorporating a voting algorithm yielded 77% sensitivity and 85% specificity to predict recurrence in the training set, and 69% and 62%, respectively, in the test set. A singular 3-gene rule was constructed that predicted recurrence with 80% sensitivity and 90% specificity in the training set, and 71% and 67%, respectively, in the test set. Using primary nonmuscle invasive urothelial carcinoma from initial occurrences genetic programming identified transcripts in reproducible fashion, which were predictive of recurrence. These findings could potentially impact nonmuscle invasive urothelial carcinoma management. Copyright © 2016 American Urological Association Education and Research, Inc. Published by Elsevier Inc. All rights reserved.
Jiang, Xiaoying; Wei, Rong; Zhao, Yanjun; Zhang, Tongliang
2008-05-01
Knowledge of subnuclear localization in eukaryotic cells is essential for understanding the function of the nucleus. Developing prediction methods and tools for protein subnuclear localization has become an important research topic in protein science because of the special characteristics of the cell nucleus. In this study, a novel approach is proposed to predict protein subnuclear localization. Each protein sample is represented by a pseudo amino acid (PseAA) composition based on the approximate entropy (ApEn) concept, which reflects the complexity of a time series. A novel ensemble classifier is designed that incorporates three AdaBoost classifiers. The base learners of the three AdaBoost classifiers are decision stumps, a fuzzy K-nearest-neighbors classifier, and radial basis function support vector machines, respectively. Different PseAA compositions are used as the input data of the different AdaBoost classifiers in the ensemble. A genetic algorithm is used to optimize the dimension and weight factors of the PseAA composition. Two datasets often used in published work are used to validate the performance of the proposed approach. The results of the jackknife cross-validation test are higher and better balanced than those of other methods on the same datasets. These promising results indicate that the proposed approach is effective and practical, and it might become a useful tool for protein subnuclear localization. The software in Matlab and supplementary materials are available freely by contacting the corresponding author.
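A sketch of the approximate-entropy quantity underlying the PseAA representation: ApEn(m, r) compares the regularity of length-m and length-(m+1) patterns in a numeric series. Using a random numeric encoding of a protein sequence here is an illustrative assumption.

```python
# Approximate entropy: ApEn(m, r) = Phi^m(r) - Phi^{m+1}(r).
import numpy as np

def approximate_entropy(series, m=2, r=None):
    x = np.asarray(series, dtype=float)
    n = len(x)
    if r is None:
        r = 0.2 * x.std()                            # common tolerance choice

    def phi(m):
        # all overlapping templates of length m
        templates = np.array([x[i:i + m] for i in range(n - m + 1)])
        # Chebyshev distance between every pair of templates
        dist = np.max(np.abs(templates[:, None, :] - templates[None, :, :]), axis=2)
        counts = np.mean(dist <= r, axis=1)          # fraction of templates within r
        return np.mean(np.log(counts))

    return phi(m) - phi(m + 1)

rng = np.random.default_rng(0)
sequence_encoding = rng.random(120)                  # stand-in for an encoded protein sequence
print("ApEn:", round(approximate_entropy(sequence_encoding), 4))
```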
Galván-Tejada, Carlos E; Zanella-Calzada, Laura A; Galván-Tejada, Jorge I; Celaya-Padilla, José M; Gamboa-Rosales, Hamurabi; Garza-Veloz, Idalia; Martinez-Fierro, Margarita L
2017-02-14
Breast cancer is an important global health problem, and the most common type of cancer among women. Late diagnosis significantly decreases the survival rate of the patient; however, using mammography for early detection has been demonstrated to be a very important tool increasing the survival rate. The purpose of this paper is to obtain a multivariate model to classify benign and malignant tumor lesions using a computer-assisted diagnosis with a genetic algorithm in training and test datasets from mammography image features. A multivariate search was conducted to obtain predictive models with different approaches, in order to compare and validate results. The multivariate models were constructed using: Random Forest, Nearest centroid, and K-Nearest Neighbor (K-NN) strategies as cost function in a genetic algorithm applied to the features in the BCDR public databases. Results suggest that the two texture descriptor features obtained in the multivariate model have a similar or better prediction capability to classify the data outcome compared with the multivariate model composed of all the features, according to their fitness value. This model can help to reduce the workload of radiologists and present a second opinion in the classification of tumor lesions.
Instruction-matrix-based genetic programming.
Li, Gang; Wang, Jin Feng; Lee, Kin Hong; Leung, Kwong-Sak
2008-08-01
In genetic programming (GP), evolving tree nodes separately would reduce the huge solution space. However, tree nodes are highly interdependent with respect to their fitness. In this paper, we propose a new GP framework, namely, instruction-matrix (IM)-based GP (IMGP), to handle their interactions. IMGP maintains an IM to evolve tree nodes and subtrees separately. IMGP extracts program trees from an IM and updates the IM with the information of the extracted program trees. As the IM actually keeps most of the information of the schemata of GP and evolves the schemata directly, IMGP is effective and efficient. Our experimental results on benchmark problems have verified that IMGP is not only better than those of canonical GP in terms of the qualities of the solutions and the number of program evaluations, but they are also better than some of the related GP algorithms. IMGP can also be used to evolve programs for classification problems. The classifiers obtained have higher classification accuracies than four other GP classification algorithms on four benchmark classification problems. The testing errors are also comparable to or better than those obtained with well-known classifiers. Furthermore, an extended version, called condition matrix for rule learning, has been used successfully to handle multiclass classification problems.
NASA Astrophysics Data System (ADS)
Zhang, Aizhu; Sun, Genyun; Wang, Zhenjie
2015-12-01
The serious information redundancy in hyperspectral images (HIs) does not contribute to data analysis accuracy; instead, it requires expensive computational resources. Consequently, to identify the most useful and valuable information in HIs and thereby improve the accuracy of data analysis, this paper proposes a novel hyperspectral band selection method using a hybrid genetic algorithm and gravitational search algorithm (GA-GSA). In the proposed method, the GA-GSA is first mapped to the binary space. Then, the accuracy of a support vector machine (SVM) classifier and the number of selected spectral bands are used to measure the discriminative capability of the band subset. Finally, the band subset with the smallest number of spectral bands that still covers the most useful and valuable information is obtained. To verify the effectiveness of the proposed method, studies conducted on an AVIRIS image against two recently proposed state-of-the-art GSA variants are presented. The experimental results revealed the superiority of the proposed method and indicated that it can considerably reduce data storage costs and efficiently identify a band subset with stable and high classification precision.
[The application of gene expression programming in the diagnosis of heart disease].
Dai, Wenbin; Zhang, Yuntao; Gao, Xingyu
2009-02-01
GEP (gene expression programming) is a new genetic algorithm that has proved excellent at function finding. In this paper, GEP is applied to heart disease data in order to set up a diagnostic model. Eight variables, Sex, Chest pain, Blood pressure, Angina, Peak, Slope, Colored vessels and Thal, are selected out of thirteen variables to form a classification function. This function is used to predict a forecasting set of 100 samples, and the accuracy is 87%. Other algorithms such as SVM (support vector machine) are applied to the same data, and the forecasting results show that GEP performs better than the other algorithms.
Design of Clinical Support Systems Using Integrated Genetic Algorithm and Support Vector Machine
NASA Astrophysics Data System (ADS)
Chen, Yung-Fu; Huang, Yung-Fa; Jiang, Xiaoyi; Hsu, Yuan-Nian; Lin, Hsuan-Hung
Clinical decision support systems (CDSS) provide knowledge and specific information to clinicians to enhance diagnostic efficiency and improve healthcare quality. An appropriate CDSS can greatly improve patient safety and healthcare quality and increase cost-effectiveness. Support vector machine (SVM) is believed to be superior to traditional statistical and neural network classifiers; however, determining a suitable combination of SVM parameters is critical to classification performance. A genetic algorithm (GA) can find an optimal solution within an acceptable time and is faster than a greedy algorithm with an exhaustive search strategy. By taking advantage of the GA's ability to quickly select salient features and adjust SVM parameters, a method using an integrated GA and SVM (IGS), which is different from the traditional method in which the GA is used for feature selection and the SVM for classification, was used to design CDSSs for the prediction of successful ventilation weaning, the diagnosis of patients with severe obstructive sleep apnea, and the discrimination of different cell types from Pap smears. The results show that IGS is better than methods using the SVM alone or a linear discriminator.
Khotanlou, Hassan; Afrasiabi, Mahlagha
2012-10-01
This paper presents a new feature selection approach for automatically extracting multiple sclerosis (MS) lesions from three-dimensional (3D) magnetic resonance (MR) images. The presented method is applicable to different types of MS lesions. In this method, T1, T2, and fluid-attenuated inversion recovery (FLAIR) images are first preprocessed. In the next phase, the features that are effective for extracting MS lesions are selected using a genetic algorithm (GA). The fitness function of the GA is the Similarity Index (SI) of a support vector machine (SVM) classifier. The results obtained on different types of lesions have been evaluated by comparison with manual segmentations. The algorithm is evaluated on 15 real 3D MR images using several measures. As a result, the SI between the MS regions determined by the proposed method and by radiologists was 87% on average. Experiments and comparisons with other methods show the effectiveness and efficiency of the proposed approach.
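The Similarity Index used as the GA fitness is commonly defined as the Dice overlap between an automatic lesion mask and the manual reference; that definition is assumed in the sketch below, with random masks standing in for real segmentations.

```python
# Similarity Index (Dice overlap) between an automatic and a reference lesion mask.
import numpy as np

def similarity_index(seg, ref):
    seg, ref = seg.astype(bool), ref.astype(bool)
    overlap = np.logical_and(seg, ref).sum()
    return 2.0 * overlap / (seg.sum() + ref.sum() + 1e-12)

rng = np.random.default_rng(0)
reference = rng.random((64, 64, 32)) > 0.95            # toy "manual" lesion mask
automatic = np.logical_and(reference, rng.random(reference.shape) > 0.1)
print(f"SI = {similarity_index(automatic, reference):.3f}")
```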
Khalid, Shehzad; Arshad, Sannia; Jabbar, Sohail; Rho, Seungmin
2014-01-01
We present a classification framework that combines multiple heterogeneous classifiers in the presence of class label noise. An extension of m-Mediods based modeling is presented that generates models of the various classes while identifying and filtering noisy training data. This noise-free data is then used to learn models for other classifiers such as GMM and SVM. A weight learning method is then introduced to learn, for each classifier, a weight on each class in order to construct an ensemble. For this purpose, we apply a genetic algorithm to search for the optimal weight vector on which the classifier ensemble is expected to give the best accuracy. The proposed approach is evaluated on a variety of real-life datasets. It is also compared with existing standard ensemble techniques such as AdaBoost, Bagging, and Random Subspace Methods. Experimental results show the superiority of the proposed ensemble method over its competitors, especially in the presence of class label noise and imbalanced classes.
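The weighted combination described here can be sketched as per-class weighted voting: each classifier carries one weight per class and the ensemble outputs the class with the largest weighted support. The random predictions and weights below stand in for the GA-learned weight vector and the actual ensemble members.

```python
# Per-class weighted voting over heterogeneous classifier outputs (sketch).
import numpy as np

rng = np.random.default_rng(0)
n_classifiers, n_classes, n_samples = 3, 4, 10
member_preds = rng.integers(0, n_classes, (n_classifiers, n_samples))   # hard labels per member
weights = rng.random((n_classifiers, n_classes))                        # per-class weights (GA output)

def weighted_class_vote(preds, weights, n_classes):
    support = np.zeros((preds.shape[1], n_classes))
    for k, row in enumerate(preds):                  # accumulate each member's weighted vote
        support[np.arange(len(row)), row] += weights[k, row]
    return support.argmax(axis=1)

print("ensemble labels:", weighted_class_vote(member_preds, weights, n_classes))
```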
Kesharaju, Manasa; Nagarajah, Romesh
2015-09-01
The motivation for this research stems from a need for providing a non-destructive testing method capable of detecting and locating any defects and microstructural variations within armour ceramic components before issuing them to the soldiers who rely on them for their survival. The development of an automated ultrasonic inspection based classification system would make possible the checking of each ceramic component and immediately alert the operator about the presence of defects. Generally, in many classification problems a choice of features or dimensionality reduction is significant and simultaneously very difficult, as a substantial computational effort is required to evaluate possible feature subsets. In this research, a combination of artificial neural networks and genetic algorithms is used to optimize the feature subset used in classification of various defects in reaction-sintered silicon carbide ceramic components. Initially, wavelet based feature extraction is performed on the region of interest. An Artificial Neural Network classifier is employed to evaluate the performance of these features. Genetic Algorithm based feature selection is performed. Principal Component Analysis is a popular technique used for feature selection and is compared with the genetic algorithm based technique in terms of classification accuracy and selection of the optimal number of features. The experimental results confirm that the features identified by Principal Component Analysis lead to better performance, with a classification accuracy of 96% compared with 94% for the genetic algorithm based selection. Copyright © 2015 Elsevier B.V. All rights reserved.
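The comparison summarised above — a wrapper-style genetic algorithm selecting a feature subset that an artificial neural network then evaluates, set against PCA with the same number of components — can be sketched as follows. This is a generic illustration on a stock scikit-learn dataset rather than the ultrasonic signal data, and the small MLP, GA settings and population sizes are assumptions made only for the example (expect the toy run to take a minute or two).

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)   # illustrative stand-in dataset
rng = np.random.default_rng(1)

def ann_score(X_subset):
    """Wrapper fitness: cross-validated accuracy of a small ANN on the feature subset."""
    clf = make_pipeline(StandardScaler(), MLPClassifier((16,), max_iter=500, random_state=0))
    return cross_val_score(clf, X_subset, y, cv=3).mean()

def ga_select(n_feat, pop=12, gens=8, keep=4):
    """Binary-mask GA: each chromosome is a 0/1 vector choosing a feature subset."""
    P = (rng.random((pop, n_feat)) < 0.3).astype(int)
    for _ in range(gens):
        fit = np.array([ann_score(X[:, m.astype(bool)]) if m.any() else 0.0 for m in P])
        elites = P[np.argsort(fit)[::-1][:keep]]
        pa = elites[rng.integers(0, keep, pop - keep)]
        pb = elites[rng.integers(0, keep, pop - keep)]
        cut = rng.integers(1, n_feat, pop - keep)        # one-point crossover
        kids = np.array([np.r_[a[:c], b[c:]] for a, b, c in zip(pa, pb, cut)])
        flip = rng.random(kids.shape) < 0.02             # bit-flip mutation
        P = np.vstack([elites, np.where(flip, 1 - kids, kids)])
    fit = np.array([ann_score(X[:, m.astype(bool)]) if m.any() else 0.0 for m in P])
    return P[fit.argmax()].astype(bool)

mask = ga_select(X.shape[1])
pca_score = ann_score(PCA(n_components=int(mask.sum())).fit_transform(X))
print(f"GA subset ({mask.sum()} features): {ann_score(X[:, mask]):.3f}  "
      f"PCA ({mask.sum()} components): {pca_score:.3f}")
```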
Al-Rajab, Murad; Lu, Joan; Xu, Qiang
2017-07-01
This paper examines the accuracy and efficiency (time complexity) of high performance genetic data feature selection and classification algorithms for colon cancer diagnosis. The need for this research derives from the urgent and increasing need for accurate and efficient algorithms. Colon cancer is a leading cause of death worldwide, hence it is vitally important for the cancer tissues to be expertly identified and classified in a rapid and timely manner, to assure both a fast detection of the disease and to expedite the drug discovery process. In this research, a three-phase approach was proposed and implemented: Phases One and Two examined the feature selection algorithms and classification algorithms employed separately, and Phase Three examined the performance of the combination of these. It was found from Phase One that the Particle Swarm Optimization (PSO) algorithm performed best with the colon dataset as a feature selection (29 genes selected) and from Phase Two that the Support Vector Machine (SVM) algorithm outperformed other classifications, with an accuracy of almost 86%. It was also found from Phase Three that the combined use of PSO and SVM surpassed other algorithms in accuracy and performance, and was faster in terms of time analysis (94%). It is concluded that applying feature selection algorithms prior to classification algorithms results in better accuracy than when the latter are applied alone. This conclusion is important and significant to industry and society. Copyright © 2017 Elsevier B.V. All rights reserved.
Concurrent approach for evolving compact decision rule sets
NASA Astrophysics Data System (ADS)
Marmelstein, Robert E.; Hammack, Lonnie P.; Lamont, Gary B.
1999-02-01
The induction of decision rules from data is important to many disciplines, including artificial intelligence and pattern recognition. To improve the state of the art in this area, we introduced the genetic rule and classifier construction environment (GRaCCE). It was previously shown that GRaCCE consistently evolved decision rule sets from data, which were significantly more compact than those produced by other methods (such as decision tree algorithms). The primary disadvantage of GRaCCE, however, is its relatively poor run-time execution performance. In this paper, a concurrent version of the GRaCCE architecture is introduced, which improves the efficiency of the original algorithm. A prototype of the algorithm is tested on an in-house parallel processor configuration and the results are discussed.
A genetic algorithm method for direct estimation of paleostress states from heterogeneous fault-slip observations
NASA Astrophysics Data System (ADS)
Srivastava, Deepak C.; Thakur, Prithvi; Gupta, Pravin K.
2016-12-01
Paleostress estimation from a group of heterogeneous fault-slip observations entails first the classification of the observations into homogeneous fault sets and then a separate inversion of each homogeneous set. This study combines these two issues into a single nonlinear inverse problem and proposes a heuristic search method that inverts heterogeneous fault-slip observations. The method estimates the different paleostress states in a group of heterogeneous fault-slip observations and classifies the group into homogeneous sets as a byproduct. It uses the genetic algorithm operators elitism, selection, encoding, crossover and mutation. These processes translate into a guided search that finds successively fitter solutions and operates iteratively until the termination criteria are met and the globally fittest stress tensors are obtained. We explain the basic steps of the algorithm on a working example and demonstrate the validity of the method on several synthetic groups and a natural group of heterogeneous fault-slip observations. The method is independent of any user-defined bias or any entrapment of the solution in a local optimum. It succeeds even in difficult situations where other classification methods fail.
Intelligent fuzzy approach for fast fractal image compression
NASA Astrophysics Data System (ADS)
Nodehi, Ali; Sulong, Ghazali; Al-Rodhaan, Mznah; Al-Dhelaan, Abdullah; Rehman, Amjad; Saba, Tanzila
2014-12-01
Fractal image compression (FIC) is recognized as an NP-hard problem, and it suffers from a high number of mean square error (MSE) computations. In this paper, a two-phase algorithm is proposed to reduce the MSE computation of FIC. In the first phase, range and domain blocks are arranged based on an edge property. In the second, an imperialist competitive algorithm (ICA) is applied according to the classified blocks. For maintaining the quality of the retrieved image and accelerating algorithm operation, we divided the solutions into two groups: developed countries and undeveloped countries. Simulations were carried out to evaluate the performance of the developed approach. The promising results thus achieved exhibit better performance than genetic algorithm (GA)-based and full-search algorithms in terms of decreasing the number of MSE computations. The proposed algorithm reduced the number of MSE computations so as to run 463 times faster than the full-search algorithm, while the quality of the retrieved image did not change considerably.
Wen, Tingxi; Zhang, Zhongnan
2017-01-01
In this paper, a genetic algorithm-based frequency-domain feature search (GAFDS) method is proposed for the electroencephalogram (EEG) analysis of epilepsy. In this method, frequency-domain features are first searched and then combined with nonlinear features. Subsequently, these features are selected and optimized to classify EEG signals. The extracted features are analyzed experimentally. The features extracted by GAFDS show remarkable independence, and they are superior to the nonlinear features in terms of the ratio of interclass distance and intraclass distance. Moreover, the proposed feature search method can search for features of instantaneous frequency in a signal after Hilbert transformation. The classification results achieved using these features are reasonable; thus, GAFDS exhibits good extensibility. Multiple classical classifiers (i.e., k-nearest neighbor, linear discriminant analysis, decision tree, AdaBoost, multilayer perceptron, and Naïve Bayes) achieve satisfactory classification accuracies by using the features generated by the GAFDS method and the optimized feature selection. The accuracies for 2-classification and 3-classification problems may reach up to 99% and 97%, respectively. Results of several cross-validation experiments illustrate that GAFDS is effective in the extraction of effective features for EEG classification. Therefore, the proposed feature selection and optimization model can improve classification accuracy. PMID:28489789
Information Gain Based Dimensionality Selection for Classifying Text Documents
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dumidu Wijayasekara; Milos Manic; Miles McQueen
2013-06-01
Selecting the optimal dimensions for various knowledge extraction applications is an essential component of data mining. Dimensionality selection techniques are utilized in classification applications to increase the classification accuracy and reduce the computational complexity. In text classification, where the dimensionality of the dataset is extremely high, dimensionality selection is even more important. This paper presents a novel, genetic algorithm based methodology for dimensionality selection in text mining applications that utilizes information gain. The presented methodology uses the information gain of each dimension to change the mutation probability of chromosomes dynamically. Since the information gain is calculated a priori, the computational complexity is not affected. The presented method was tested on a specific text classification problem and compared with conventional genetic algorithm based dimensionality selection. The results show an improvement of 3% in the true positives and 1.6% in the true negatives over conventional dimensionality selection methods.
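A toy rendering of the key idea — computing the information gain of each dimension once, a priori, and using it to bias the genetic algorithm's mutation toward switching informative dimensions on — is given below. The gain-to-probability mapping, the naive Bayes wrapper and the synthetic data are assumptions made for the illustration; the original work targets high-dimensional text collections.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import BernoulliNB

# Synthetic stand-in for a high-dimensional text-classification matrix.
X, y = make_classification(n_samples=400, n_features=60, n_informative=10, random_state=0)

# Information gain of each dimension, computed once a priori.
gain = mutual_info_classif(X, y, random_state=0)
# Per-dimension mutation probability: informative dimensions are switched on more often.
p_on = 0.01 + 0.1 * gain / gain.max()

rng = np.random.default_rng(0)

def fitness(mask):
    """Cross-validated accuracy of a simple classifier on the selected dimensions."""
    if not mask.any():
        return 0.0
    return cross_val_score(BernoulliNB(), X[:, mask], y, cv=3).mean()

def mutate(mask):
    """Turn dimensions on with gain-weighted probability; turn them off at a small fixed rate."""
    turn_on = (~mask) & (rng.random(mask.size) < p_on)
    turn_off = mask & (rng.random(mask.size) < 0.01)
    return mask ^ turn_on ^ turn_off

def ga(pop=14, gens=12, keep=4):
    P = rng.random((pop, X.shape[1])) < 0.2             # initial boolean chromosomes
    for _ in range(gens):
        fit = np.array([fitness(m) for m in P])
        elites = P[np.argsort(fit)[::-1][:keep]]
        pa = elites[rng.integers(0, keep, pop - keep)]
        pb = elites[rng.integers(0, keep, pop - keep)]
        kids = np.where(rng.random(pa.shape) < 0.5, pa, pb)    # uniform crossover
        P = np.vstack([elites, np.array([mutate(k) for k in kids])])
    fit = np.array([fitness(m) for m in P])
    best = P[fit.argmax()]
    print(f"{best.sum()} dimensions selected, CV accuracy {fit.max():.3f}")

ga()
```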
Accurate crop classification using hierarchical genetic fuzzy rule-based systems
NASA Astrophysics Data System (ADS)
Topaloglou, Charalampos A.; Mylonas, Stelios K.; Stavrakoudis, Dimitris G.; Mastorocostas, Paris A.; Theocharis, John B.
2014-10-01
This paper investigates the effectiveness of an advanced classification system for accurate crop classification using very high resolution (VHR) satellite imagery. Specifically, a recently proposed genetic fuzzy rule-based classification system (GFRBCS) is employed, namely, the Hierarchical Rule-based Linguistic Classifier (HiRLiC). HiRLiC's model comprises a small set of simple IF-THEN fuzzy rules, easily interpretable by humans. One of its most important attributes is that its learning algorithm requires minimum user interaction, since the most important learning parameters affecting the classification accuracy are determined by the learning algorithm automatically. HiRLiC is applied in a challenging crop classification task, using a SPOT5 satellite image over an intensively cultivated area in a lake-wetland ecosystem in northern Greece. A rich set of higher-order spectral and textural features is derived from the initial bands of the (pan-sharpened) image, resulting in an input space comprising 119 features. The experimental analysis proves that HiRLiC compares favorably to other interpretable classifiers of the literature, both in terms of structural complexity and classification accuracy. Its testing accuracy was very close to that obtained by complex state-of-the-art classification systems, such as the support vector machines (SVM) and random forest (RF) classifiers. Nevertheless, visual inspection of the derived classification maps shows that HiRLiC is characterized by higher generalization properties, providing more homogeneous classifications than the competitors. Moreover, the runtime requirements for producing the thematic map were orders of magnitude lower than those of the competitors.
Interactive searching of facial image databases
NASA Astrophysics Data System (ADS)
Nicholls, Robert A.; Shepherd, John W.; Shepherd, Jean
1995-09-01
A set of psychological facial descriptors has been devised to enable computerized searching of criminal photograph albums. The descriptors have been used to encode image databases of up to twelve thousand images. Using a system called FACES, the databases are searched by translating a witness' verbal description into corresponding facial descriptors. Trials of FACES have shown that this coding scheme is more productive and efficient than searching traditional photograph albums. An alternative method of searching the encoded database using a genetic algorithm is currently being tested. The genetic search method does not require the witness to verbalize a description of the target but merely to indicate a degree of similarity between the target and a limited selection of images from the database. The major drawback of FACES is that it requires a manual encoding of images. Research is being undertaken to automate the process; however, it will require an algorithm which can predict human descriptive values. Alternatives to human-derived coding schemes exist using statistical classifications of images. Since databases encoded using statistical classifiers do not have an obvious direct mapping to human-derived descriptors, a search method which does not require the entry of human descriptors is required. A genetic search algorithm is being tested for such a purpose.
A Functional-Genetic Scheme for Seizure Forecasting in Canine Epilepsy.
Bou Assi, Elie; Nguyen, Dang K; Rihana, Sandy; Sawan, Mohamad
2018-06-01
The objective of this work is the development of an accurate seizure forecasting algorithm that considers brain's functional connectivity for electrode selection. We start by proposing Kmeans-directed transfer function, an adaptive functional connectivity method intended for seizure onset zone localization in bilateral intracranial EEG recordings. Electrodes identified as seizure activity sources and sinks are then used to implement a seizure-forecasting algorithm on long-term continuous recordings in dogs with naturally-occurring epilepsy. A precision-recall genetic algorithm is proposed for feature selection in line with a probabilistic support vector machine classifier. Epileptic activity generators were focal in all dogs confirming the diagnosis of focal epilepsy in these animals while sinks spanned both hemispheres in 2 of 3 dogs. Seizure forecasting results show performance improvement compared to previous studies, achieving average sensitivity of 84.82% and time in warning of 0.1. Achieved performances highlight the feasibility of seizure forecasting in canine epilepsy. The ability to improve seizure forecasting provides promise for the development of EEG-triggered closed-loop seizure intervention systems for ambulatory implantation in patients with refractory epilepsy.
Salari, Nader; Shohaimi, Shamarina; Najafi, Farid; Nallappan, Meenakshii; Karishnarajah, Isthrinayagy
2014-01-01
Among numerous artificial intelligence approaches, k-Nearest Neighbor algorithms, genetic algorithms, and artificial neural networks are considered the most common and effective methods in classification problems in numerous studies. In the present study, the results of the implementation of a novel hybrid feature selection-classification model using the above mentioned methods are presented. The purpose is to benefit from the synergies obtained from combining these technologies for the development of classification models. Such a combination creates an opportunity to invest in the strength of each algorithm, and is an approach to make up for their deficiencies. To develop the proposed model, with the aim of obtaining the best array of features, first, feature ranking techniques such as the Fisher's discriminant ratio and class separability criteria were used to prioritize features. Second, the obtained results that included arrays of the top-ranked features were used as the initial population of a genetic algorithm to produce optimum arrays of features. Third, using a modified k-Nearest Neighbor method as well as an improved method of backpropagation neural networks, the classification process was advanced based on optimum arrays of the features selected by genetic algorithms. The performance of the proposed model was compared with thirteen well-known classification models based on seven datasets. Furthermore, statistical analysis was performed using the Friedman test followed by post-hoc tests. The experimental findings indicated that the novel proposed hybrid model resulted in significantly better classification performance compared with all 13 classification methods. Finally, the performance results of the proposed model were benchmarked against the best ones reported as the state-of-the-art classifiers in terms of classification accuracy for the same data sets. The substantial findings of the comprehensive comparative study revealed that the performance of the proposed model in terms of classification accuracy is desirable, promising, and competitive with the existing state-of-the-art classification models. PMID:25419659
Amaral, Jorge L M; Lopes, Agnaldo J; Jansen, José M; Faria, Alvaro C D; Melo, Pedro L
2013-12-01
The purpose of this study was to develop an automatic classifier to increase the accuracy of the forced oscillation technique (FOT) for diagnosing early respiratory abnormalities in smoking patients. The data consisted of FOT parameters obtained from 56 volunteers, 28 healthy and 28 smokers with low tobacco consumption. Many supervised learning techniques were investigated, including logistic linear classifiers, k nearest neighbor (KNN), neural networks and support vector machines (SVM). To evaluate performance, the ROC curve of the most accurate parameter was established as baseline. To determine the best input features and classifier parameters, we used genetic algorithms and a 10-fold cross-validation using the average area under the ROC curve (AUC). In the first experiment, the original FOT parameters were used as input. We observed a significant improvement in accuracy (KNN=0.89 and SVM=0.87) compared with the baseline (0.77). The second experiment performed a feature selection on the original FOT parameters. This selection did not cause any significant improvement in accuracy, but it was useful in identifying more adequate FOT parameters. In the third experiment, we performed a feature selection on the cross products of the FOT parameters. This selection resulted in a further increase in AUC (KNN=SVM=0.91), which allows for high diagnostic accuracy. In conclusion, machine learning classifiers can help identify early smoking-induced respiratory alterations. The use of FOT cross products and the search for the best features and classifier parameters can markedly improve the performance of machine learning classifiers. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Sudha, M
2017-09-27
As a recent trend, various computational intelligence and machine learning approaches have been used for mining inferences hidden in large clinical databases to assist the clinician in strategic decision making. In any target data, irrelevant information may be detrimental, causing confusion for the mining algorithm and degrading the prediction outcome. To address this issue, this study attempts to identify an intelligent approach to assist the disease diagnostic procedure using an optimal set of attributes instead of all attributes present in the clinical data set. In this proposed Application Specific Intelligent Computing (ASIC) decision support system, a rough set based genetic algorithm is employed in the pre-processing phase and a back propagation neural network is applied in the training and testing phase. ASIC has two phases: the first phase handles outliers, noisy data, and missing values to obtain qualitative target data and to generate appropriate attribute reduct sets from the input data using a rough computing based genetic algorithm centred on a relative fitness function measure. The succeeding phase of this system involves both training and testing of a back propagation neural network classifier on the selected reducts. The model performance is evaluated against widely adopted existing classifiers. The proposed ASIC system for clinical decision support has been tested with breast cancer, fertility diagnosis and heart disease data sets from the University of California at Irvine (UCI) machine learning repository. The proposed system outperformed the existing approaches, attaining accuracy rates of 95.33%, 97.61%, and 93.04% for breast cancer, fertility issue and heart disease diagnosis, respectively.
Ozcift, Akin; Gulten, Arif
2011-12-01
Improving the accuracy of machine learning algorithms is vital in designing high performance computer-aided diagnosis (CADx) systems. Research has shown that a base classifier's performance might be enhanced by ensemble classification strategies. In this study, we construct rotation forest (RF) ensemble classifiers of 30 machine learning algorithms to evaluate their classification performances using Parkinson's, diabetes and heart disease datasets from the literature. In the experiments, first the feature dimension of the three datasets is reduced using the correlation based feature selection (CFS) algorithm. Second, the classification performances of the 30 machine learning algorithms are calculated for the three datasets. Third, 30 classifier ensembles are constructed based on the RF algorithm to assess the performances of the respective classifiers on the same disease data. All the experiments are carried out with a leave-one-out validation strategy and the performances of the 60 algorithms are evaluated using three metrics: classification accuracy (ACC), kappa error (KE) and area under the receiver operating characteristic (ROC) curve (AUC). Base classifiers achieved 72.15%, 77.52% and 84.43% average accuracies for the diabetes, heart and Parkinson's datasets, respectively. As for the RF classifier ensembles, they produced average accuracies of 74.47%, 80.49% and 87.13% for the respective diseases. RF, a newly proposed classifier ensemble algorithm, might be used to improve the accuracy of miscellaneous machine learning algorithms to design advanced CADx systems. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
Dogan, Meeshanthini V; Grumbach, Isabella M; Michaelson, Jacob J; Philibert, Robert A
2018-01-01
An improved method for detecting coronary heart disease (CHD) could have substantial clinical impact. Building on the idea that systemic effects of CHD risk factors are a conglomeration of genetic and environmental factors, we use machine learning techniques and integrate genetic, epigenetic and phenotype data from the Framingham Heart Study to build and test a Random Forest classification model for symptomatic CHD. Our classifier was trained on n = 1,545 individuals and consisted of four DNA methylation sites, two SNPs, age and gender. The methylation sites and SNPs were selected during the training phase. The final trained model was then tested on n = 142 individuals. The test data comprised individuals removed based on relatedness to those in the training dataset. This integrated classifier was capable of classifying symptomatic CHD status of those in the test set with an accuracy, sensitivity and specificity of 78%, 0.75 and 0.80, respectively. In contrast, a model using only conventional CHD risk factors as predictors had an accuracy and sensitivity of only 65% and 0.42, respectively, but with a specificity of 0.89 in the test set. Regression analyses of the methylation signatures illustrate our ability to map these signatures to known risk factors in CHD pathogenesis. These results demonstrate the capability of an integrated approach to effectively model symptomatic CHD status. These results also suggest that future studies of biomaterial collected from longitudinally informative cohorts that are specifically characterized for cardiac disease at follow-up could lead to the introduction of sensitive, readily employable integrated genetic-epigenetic algorithms for predicting onset of future symptomatic CHD.
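The modeling setup described above can be approximated in a few lines: a Random Forest trained on a small mixed feature matrix of methylation-like values, SNP genotype counts, age and gender, then evaluated on a held-out split. The data below are entirely synthetic and the feature construction is an assumption for illustration; this is not the Framingham data or the authors' selected markers.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
# Synthetic stand-ins: 4 methylation beta values, 2 SNP genotype counts (0/1/2), age, gender.
meth = rng.beta(2, 5, size=(n, 4))
snps = rng.integers(0, 3, size=(n, 2))
age = rng.normal(60, 10, size=(n, 1))
sex = rng.integers(0, 2, size=(n, 1))
X = np.hstack([meth, snps, age, sex])
# Outcome loosely driven by a few of the features, just to give the model signal.
logit = 3 * meth[:, 0] - 2 * meth[:, 1] + 0.5 * snps[:, 0] + 0.03 * (age[:, 0] - 60)
y = (logit + rng.normal(0, 1, n) > 0.5).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.15, random_state=0)
clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
pred = clf.predict(X_te)
print("accuracy:", round(accuracy_score(y_te, pred), 2),
      "sensitivity:", round(recall_score(y_te, pred), 2))
```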
Mahrooghy, Majid; Ashraf, Ahmed B; Daye, Dania; Mies, Carolyn; Feldman, Michael; Rosen, Mark; Kontos, Despina
2013-01-01
Breast tumors are heterogeneous lesions. Intra-tumor heterogeneity presents a major challenge for cancer diagnosis and treatment. Few studies have worked on capturing tumor heterogeneity from imaging. Most studies to date consider aggregate measures for tumor characterization. In this work we capture tumor heterogeneity by partitioning tumor pixels into subregions and extracting heterogeneity wavelet kinetic (HetWave) features from breast dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) to obtain the spatiotemporal patterns of the wavelet coefficients and contrast agent uptake from each partition. Using a genetic algorithm for feature selection, and a logistic regression classifier with leave one-out cross validation, we tested our proposed HetWave features for the task of classifying breast cancer recurrence risk. The classifier based on our features gave an ROC AUC of 0.78, outperforming previously proposed kinetic, texture, and spatial enhancement variance features which give AUCs of 0.69, 0.64, and 0.65, respectively.
Liu, Aiming; Chen, Kun; Liu, Quan; Ai, Qingsong; Xie, Yi; Chen, Anqi
2017-11-08
Motor Imagery (MI) electroencephalography (EEG) is widely studied for its non-invasiveness, easy availability, portability, and high temporal resolution. As for MI EEG signal processing, the high dimensions of features represent a research challenge. It is necessary to eliminate redundant features, which not only create an additional overhead of managing the space complexity, but also might include outliers, thereby reducing classification accuracy. The firefly algorithm (FA) can adaptively select the best subset of features, and improve classification accuracy. However, the FA is easily entrapped in a local optimum. To solve this problem, this paper proposes a method of combining the firefly algorithm and learning automata (LA) to optimize feature selection for motor imagery EEG. We employed a method of combining common spatial pattern (CSP) and local characteristic-scale decomposition (LCD) algorithms to obtain a high dimensional feature set, and classified it by using the spectral regression discriminant analysis (SRDA) classifier. Both the fourth brain-computer interface competition data and real-time data acquired in our designed experiments were used to verify the validity of the proposed method. Compared with genetic and adaptive weight particle swarm optimization algorithms, the experimental results show that our proposed method effectively eliminates redundant features, and improves the classification accuracy of MI EEG signals. In addition, a real-time brain-computer interface system was implemented to verify the feasibility of our proposed methods being applied in practical brain-computer interface systems. PMID:29117100
Discrete Biogeography Based Optimization for Feature Selection in Molecular Signatures.
Liu, Bo; Tian, Meihong; Zhang, Chunhua; Li, Xiangtao
2015-04-01
Biomarker discovery from high-dimensional data is a complex task in the development of efficient cancer diagnosis and classification. However, these data are usually redundant and noisy, and only a subset of them present distinct profiles for different classes of samples. Thus, selecting highly discriminative genes from gene expression data has become increasingly interesting in the field of bioinformatics. In this paper, a discrete biogeography based optimization is proposed to select a good subset of informative genes relevant to the classification. In the proposed algorithm, firstly, the Fisher-Markov selector is used to choose a fixed number of genes. Secondly, to make biogeography based optimization suitable for the feature selection problem, a discrete migration model and a discrete mutation model are proposed to balance the exploration and exploitation ability. Then, discrete biogeography based optimization, which we call DBBO, is formed by integrating the discrete migration model and the discrete mutation model. Finally, the DBBO method is used for feature selection, with three classifiers used for classification under the 10-fold cross-validation method. In order to show the effectiveness and efficiency of the algorithm, the proposed algorithm is tested on four breast cancer dataset benchmarks. Compared with the genetic algorithm, particle swarm optimization, the differential evolution algorithm and hybrid biogeography based optimization, experimental results demonstrate that the proposed method is better or at least comparable with previous methods from the literature when considering the quality of the solutions obtained. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xi, T; Jones, I M; Mohrenweiser, H W
2003-11-03
Over 520 different amino acid substitution variants have been previously identified in the systematic screening of 91 human DNA repair genes for sequence variation. Two algorithms were employed to predict the impact of these amino acid substitutions on protein activity. Sorting Intolerant From Tolerant (SIFT) classified 226 of 508 variants (44%) as "Intolerant". Polymorphism Phenotyping (PolyPhen) classed 165 of 489 amino acid substitutions (34%) as "Probably or Possibly Damaging". Another 9-15% of the variants were classed as "Potentially Intolerant or Damaging". The results from the two algorithms are highly associated, with concordance in predicted impact observed for approximately 62% of the variants. Twenty-one to thirty-one percent of the variant proteins are predicted to exhibit reduced activity by both algorithms. These variants occur at slightly lower individual allele frequency than do the variants classified as "Tolerant" or "Benign". Both algorithms correctly predicted the impact of 26 functionally characterized amino acid substitutions in the APE1 protein on biochemical activity, with one exception. It is concluded that a substantial fraction of the missense variants observed in the general human population are functionally relevant. These variants are expected to be the molecular genetic and biochemical basis for the associations of reduced DNA repair capacity phenotypes with elevated cancer risk.
Patients classification on weaning trials using neural networks and wavelet transform.
Arizmendi, Carlos; Viviescas, Juan; González, Hernando; Giraldo, Beatriz
2014-01-01
Determining the optimal time for weaning patients from mechanical ventilation, distinguishing patients capable of maintaining spontaneous breathing from patients who fail to maintain spontaneous breathing, is a very important task in the intensive care unit. Wavelet Transform (WT) and Neural Network (NN) techniques were applied in order to develop a classifier for the study of patients in the weaning trial process. The respiratory pattern of each patient was characterized through different time series. Genetic Algorithms (GA) and Forward Selection were used as feature selection techniques. A classification performance of 77.00±0.06% of well-classified patients was obtained using a NN and GA combination, with only 6 of the 14 initial variables.
Applications of colored petri net and genetic algorithms to cluster tool scheduling
NASA Astrophysics Data System (ADS)
Liu, Tung-Kuan; Kuo, Chih-Jen; Hsiao, Yung-Chin; Tsai, Jinn-Tsong; Chou, Jyh-Horng
2005-12-01
In this paper, we propose a method that uses Coloured Petri Nets (CPN) and a genetic algorithm (GA) to obtain an optimal deadlock-free schedule and to solve the re-entrant problem for the flexible process of the cluster tool. The process of the cluster tool for producing a wafer can usually be classified into three types: 1) sequential process, 2) parallel process, and 3) sequential-parallel process. However, these processes are not economical enough to produce a variety of wafers in small volume. Therefore, this paper proposes the flexible process, in which the operations for fabricating wafers are randomly arranged to achieve the best utilization of the cluster tool. The flexible process, however, may have deadlock and re-entrant problems, which can be detected by CPN. On the other hand, GAs have been applied to find optimal schedules for many types of manufacturing processes. Therefore, we successfully integrate CPN and GAs to obtain an optimal schedule that handles the deadlock and re-entrant problems for the flexible process of the cluster tool.
Semantic segmentation of mFISH images using convolutional networks.
Pardo, Esteban; Morgado, José Mário T; Malpica, Norberto
2018-04-30
Multicolor in situ hybridization (mFISH) is a karyotyping technique used to detect major chromosomal alterations using fluorescent probes and imaging techniques. Manual interpretation of mFISH images is a time consuming step that can be automated using machine learning; in previous works, pixel or patch wise classification was employed, overlooking spatial information which can help identify chromosomes. In this work, we propose a fully convolutional semantic segmentation network for the interpretation of mFISH images, which uses both spatial and spectral information to classify each pixel in an end-to-end fashion. The semantic segmentation network developed was tested on samples extracted from a public dataset using cross validation. Despite having no labeling information of the image it was tested on, our algorithm yielded an average correct classification ratio (CCR) of 87.41%. Previously, this level of accuracy was only achieved with state of the art algorithms when classifying pixels from the same image in which the classifier has been trained. These results provide evidence that fully convolutional semantic segmentation networks may be employed in the computer aided diagnosis of genetic diseases with improved performance over the current image analysis methods. © 2018 International Society for Advancement of Cytometry.
Prediction of microRNA target genes using an efficient genetic algorithm-based decision tree
Rabiee-Ghahfarrokhi, Behzad; Rafiei, Fariba; Niknafs, Ali Akbar; Zamani, Behzad
2015-01-01
MicroRNAs (miRNAs) are small, non-coding RNA molecules that regulate gene expression in almost all plants and animals. They play an important role in key processes, such as proliferation, apoptosis, and pathogen–host interactions. Nevertheless, the mechanisms by which miRNAs act are not fully understood. The first step toward unraveling the function of a particular miRNA is the identification of its direct targets. This step has shown to be quite challenging in animals primarily because of incomplete complementarities between miRNA and target mRNAs. In recent years, the use of machine-learning techniques has greatly increased the prediction of miRNA targets, avoiding the need for costly and time-consuming experiments to determine miRNA targets experimentally. Among the most important machine-learning algorithms are decision trees, which classify data based on extracted rules. In the present work, we used a genetic algorithm in combination with a C4.5 decision tree for the prediction of miRNA targets. We applied our proposed method to a validated human dataset and achieved nearly 93.9% classification accuracy, which could be related to the selection of the best rules. PMID:26649272
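The abstract does not specify exactly how the genetic algorithm and the C4.5 tree interact, so the sketch below shows one common pairing as an assumption: a GA searching decision-tree hyperparameters by cross-validated accuracy, with scikit-learn's entropy-criterion tree standing in for C4.5 and synthetic data standing in for the miRNA-target feature table.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a labeled miRNA-target feature table.
X, y = make_classification(n_samples=500, n_features=15, n_informative=6, random_state=0)
rng = np.random.default_rng(0)

def decode(chrom):
    """Chromosome in [0,1]^2 -> a C4.5-like tree (entropy splits, leaf-size pruning)."""
    depth = 2 + int(chrom[0] * 10)            # max_depth in [2, 12]
    leaf = 1 + int(chrom[1] * 20)             # min_samples_leaf in [1, 21]
    return DecisionTreeClassifier(criterion="entropy", max_depth=depth,
                                  min_samples_leaf=leaf, random_state=0)

def fitness(chrom):
    return cross_val_score(decode(chrom), X, y, cv=5).mean()

pop = rng.random((20, 2))
for _ in range(15):
    fit = np.array([fitness(c) for c in pop])
    elites = pop[np.argsort(fit)[::-1][:4]]
    pa = elites[rng.integers(0, 4, 16)]
    pb = elites[rng.integers(0, 4, 16)]
    kids = (pa + pb) / 2 + rng.normal(0, 0.05, pa.shape)   # blend crossover + mutation
    pop = np.vstack([elites, np.clip(kids, 0, 1)])

best = pop[np.argmax([fitness(c) for c in pop])]
print("best CV accuracy:", round(fitness(best), 3),
      "| max_depth:", 2 + int(best[0] * 10), "| min_samples_leaf:", 1 + int(best[1] * 20))
```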
Guidugli, Lucia; Shimelis, Hermela; Masica, David L; Pankratz, Vernon S; Lipton, Gary B; Singh, Namit; Hu, Chunling; Monteiro, Alvaro N A; Lindor, Noralane M; Goldgar, David E; Karchin, Rachel; Iversen, Edwin S; Couch, Fergus J
2018-01-17
Many variants of uncertain significance (VUS) have been identified in BRCA2 through clinical genetic testing. VUS pose a significant clinical challenge because the contribution of these variants to cancer risk has not been determined. We conducted a comprehensive assessment of VUS in the BRCA2 C-terminal DNA binding domain (DBD) by using a validated functional assay of BRCA2 homologous recombination (HR) DNA-repair activity and defined a classifier of variant pathogenicity. Among 139 variants evaluated, 54 had ≥99% probability of pathogenicity, and 73 had ≥95% probability of neutrality. Functional assay results were compared with predictions of variant pathogenicity from the Align-GVGD protein-sequence-based prediction algorithm, which has been used for variant classification. Relative to the HR assay, Align-GVGD significantly (p < 0.05) over-predicted pathogenic variants. We subsequently combined functional and Align-GVGD prediction results in a Bayesian hierarchical model (VarCall) to estimate the overall probability of pathogenicity for each VUS. In addition, to predict the effects of all other BRCA2 DBD variants and to prioritize variants for functional studies, we used the endoPhenotype-Optimized Sequence Ensemble (ePOSE) algorithm to train classifiers for BRCA2 variants by using data from the HR functional assay. Together, the results show that systematic functional assays in combination with in silico predictors of pathogenicity provide robust tools for clinical annotation of BRCA2 VUS. Copyright © 2017 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Janaki Sathya, D.; Geetha, K.
2017-12-01
Automatic mass or lesion classification systems are developed to aid in distinguishing between malignant and benign lesions present in breast DCE-MR images; such systems need to improve both the sensitivity and specificity of DCE-MR image interpretation in order to be successful for clinical use. A new classifier (a set of features together with a classification method) based on artificial neural networks trained using the artificial fish swarm optimization (AFSO) algorithm is proposed in this paper. The basic idea behind the proposed classifier is to use the AFSO algorithm to search for the best combination of synaptic weights for the neural network. An optimal set of features based on statistical textural features is presented. The experimental outcomes of the proposed suspicious-lesion classifier confirm that the resulting classifier performs better than other such classifiers reported in the literature. This classifier therefore demonstrates that improvements in both sensitivity and specificity are possible through automated image analysis.
Nankali, Saber; Torshabi, Ahmad Esmaili; Miandoab, Payam Samadi; Baghizadeh, Amin
2016-01-08
In external-beam radiotherapy, using external markers is one of the most reliable tools to predict tumor position in clinical applications. The main challenge in this approach is tracking tumor motion with the highest accuracy, which depends heavily on the external markers' locations; this issue is the objective of this study. Four commercially available feature selection algorithms, entitled 1) Correlation-based Feature Selection, 2) Classifier, 3) Principal Components, and 4) Relief, were proposed to find the optimum locations of external markers in combination with two searching procedures, "Genetic" and "Ranker". The performance of these algorithms has been evaluated using the four-dimensional extended cardiac-torso anthropomorphic phantom. Six tumors in the lung, three tumors in the liver, and 49 points on the thorax surface were taken into account to simulate internal and external motions, respectively. The root mean square error of an adaptive neuro-fuzzy inference system (ANFIS) as the prediction model was considered the metric for quantitatively evaluating the performance of the proposed feature selection algorithms. To do this, the thorax surface region was divided into nine smaller segments, and the predefined tumor motions were predicted by ANFIS using the external motion data of the markers in each small segment, separately. Our comparative results showed that all feature selection algorithms can reasonably select specific external markers from those segments where the root mean square error of the ANFIS model is minimum. Moreover, the performance accuracy of the proposed feature selection algorithms was compared separately. For this, each tumor motion was predicted using the motion data of those external markers selected by each feature selection algorithm. A Duncan statistical test, followed by an F-test, on the final results showed that all proposed feature selection algorithms have the same performance accuracy for lung tumors. For liver tumors, however, the correlation-based feature selection algorithm, in combination with a genetic search algorithm, proved to yield the best performance accuracy for selecting optimum markers.
NASA Technical Reports Server (NTRS)
Wang, Lui; Bayer, Steven E.
1991-01-01
Genetic algorithms are mathematical, highly parallel, adaptive search procedures (i.e., problem solving methods) based loosely on the processes of natural genetics and Darwinian survival of the fittest. Basic genetic algorithm concepts and applications are introduced, and results are presented from a project to develop a software tool that will enable the widespread use of genetic algorithm technology.
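For readers new to the topic, the canonical loop the abstract refers to — fitness evaluation, fitness-proportional selection, one-point crossover and bit-flip mutation — can be written in a few lines. The sketch below solves the classic one-max toy problem and is purely illustrative; it is not tied to the software tool described.

```python
import random

def one_max(bits):
    """Toy fitness: number of 1s in the chromosome."""
    return sum(bits)

def genetic_algorithm(n_bits=30, pop_size=40, gens=60, p_cross=0.9, p_mut=0.02):
    random.seed(0)
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(gens):
        fits = [one_max(ind) for ind in pop]
        # Fitness-proportional (roulette-wheel) selection of parents.
        parents = random.choices(pop, weights=fits, k=pop_size)
        nxt = []
        for a, b in zip(parents[::2], parents[1::2]):
            if random.random() < p_cross:                  # one-point crossover
                cut = random.randint(1, n_bits - 1)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            # Bit-flip mutation applied gene by gene.
            nxt += [[bit ^ (random.random() < p_mut) for bit in c] for c in (a, b)]
        pop = nxt
    return max(pop, key=one_max)

best = genetic_algorithm()
print("best fitness:", one_max(best), "of", len(best))
```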
Monga, Isha; Qureshi, Abid; Thakur, Nishant; Gupta, Amit Kumar; Kumar, Manoj
2017-01-01
Allele-specific siRNAs (ASP-siRNAs) have emerged as promising therapeutic molecules owing to their selectivity to inhibit the mutant allele or associated single-nucleotide polymorphisms (SNPs) sparing the expression of the wild-type counterpart. Thus, a dedicated bioinformatics platform encompassing updated ASP-siRNAs and an algorithm for the prediction of their inhibitory efficacy will be helpful in tackling currently intractable genetic disorders. In the present study, we have developed the ASPsiRNA resource (http://crdd.osdd.net/servers/aspsirna/) covering three components viz (i) ASPsiDb, (ii) ASPsiPred, and (iii) analysis tools like ASP-siOffTar. ASPsiDb is a manually curated database harboring 4543 (including 422 chemically modified) ASP-siRNAs targeting 78 unique genes involved in 51 different diseases. It furnishes comprehensive information from experimental studies on ASP-siRNAs along with multidimensional genetic and clinical information for numerous mutations. ASPsiPred is a two-layered algorithm to predict efficacy of ASP-siRNAs for fully complementary mutant (Effmut) and wild-type allele (Effwild) with one mismatch by ASPsiPredSVM and ASPsiPredmatrix, respectively. In ASPsiPredSVM, 922 unique ASP-siRNAs with experimentally validated quantitative Effmut were used. During 10-fold cross-validation (10nCV) employing various sequence features on the training/testing dataset (T737), the best predictive model achieved a maximum Pearson’s correlation coefficient (PCC) of 0.71. Further, the accuracy of the classifier to predict Effmut against novel genes was assessed by leave one target out cross-validation approach (LOTOCV). ASPsiPredmatrix was constructed from rule-based studies describing the effect of single siRNA:mRNA mismatches on the efficacy at 19 different locations of siRNA. Thus, ASPsiRNA encompasses the first database, prediction algorithm, and off-target analysis tool that is expected to accelerate research in the field of RNAi-based therapeutics for human genetic diseases. PMID:28696921
Automated phenotype pattern recognition of zebrafish for high-throughput screening.
Schutera, Mark; Dickmeis, Thomas; Mione, Marina; Peravali, Ravindra; Marcato, Daniel; Reischl, Markus; Mikut, Ralf; Pylatiuk, Christian
2016-07-03
Over the last years, the zebrafish (Danio rerio) has become a key model organism in genetic and chemical screenings. A growing number of experiments and an expanding interest in zebrafish research makes it increasingly essential to automatize the distribution of embryos and larvae into standard microtiter plates or other sample holders for screening, often according to phenotypical features. Until now, such sorting processes have been carried out by manually handling the larvae and manual feature detection. Here, a prototype platform for image acquisition together with a classification software is presented. Zebrafish embryos and larvae and their features such as pigmentation are detected automatically from the image. Zebrafish of 4 different phenotypes can be classified through pattern recognition at 72 h post fertilization (hpf), allowing the software to classify an embryo into 2 distinct phenotypic classes: wild-type versus variant. The zebrafish phenotypes are classified with an accuracy of 79-99% without any user interaction. A description of the prototype platform and of the algorithms for image processing and pattern recognition is presented.
Computerized system for recognition of autism on the basis of gene expression microarray data.
Latkowski, Tomasz; Osowski, Stanislaw
2015-01-01
The aim of this paper is to provide a means to recognize a case of autism using gene expression microarrays. The crucial task is to discover the most important genes which are strictly associated with autism. The paper presents an application of different methods of gene selection, to select the most representative input attributes for an ensemble of classifiers. The set of classifiers is responsible for distinguishing autism data from the reference class. Simultaneous application of a few gene selection methods enables analysis of the ill-conditioned gene expression matrix from different points of view. The results of selection combined with a genetic algorithm and SVM classifier have shown increased accuracy of autism recognition. Early recognition of autism is extremely important for treatment of children and increases the probability of their recovery and return to normal social communication. The results of this research can find practical application in early recognition of autism on the basis of gene expression microarray analysis. Copyright © 2014 Elsevier Ltd. All rights reserved.
Evolutionary and biological metaphors for engineering design
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jakiela, M.
1994-12-31
Since computing became generally available, there has been strong interest in using computers to assist and automate engineering design processes. Specifically, for design optimization and automation, nonlinear programming and artificial intelligence techniques have been extensively studied. New computational techniques, based upon the natural processes of evolution, adaptation, and learning, are showing promise because of their generality and robustness. This presentation will describe the use of two such techniques, genetic algorithms and classifier systems, for a variety of engineering design problems. Structural topology optimization, meshing, and general engineering optimization are shown as example applications.
Genetic Algorithms and Local Search
NASA Technical Reports Server (NTRS)
Whitley, Darrell
1996-01-01
The first part of this presentation is a tutorial level introduction to the principles of genetic search and models of simple genetic algorithms. The second half covers the combination of genetic algorithms with local search methods to produce hybrid genetic algorithms. Hybrid algorithms can be modeled within the existing theoretical framework developed for simple genetic algorithms. An application of a hybrid to geometric model matching is given. The hybrid algorithm yields results that improve on the current state-of-the-art for this problem.
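A minimal sketch of the hybrid (memetic) idea follows: after the genetic operators produce offspring, each child is refined by a short hill-climbing local search before entering the next generation. The toy continuous objective merely stands in for the geometric model-matching cost mentioned above; all parameters are illustrative.

```python
import random

def sphere(x):
    """Toy objective to minimize (stands in for a model-matching error)."""
    return sum(v * v for v in x)

def hill_climb(x, steps=20, step=0.1):
    """Local search: random coordinate perturbations, keeping improvements only."""
    best, best_f = list(x), sphere(x)
    for _ in range(steps):
        cand = [v + random.uniform(-step, step) for v in best]
        if sphere(cand) < best_f:
            best, best_f = cand, sphere(cand)
    return best

def hybrid_ga(dim=5, pop_size=20, gens=30):
    random.seed(1)
    pop = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=sphere)                              # lower objective = fitter
        elites = pop[:4]
        children = []
        while len(children) < pop_size - len(elites):
            a, b = random.sample(elites, 2)
            child = [(ai + bi) / 2 + random.gauss(0, 0.2) for ai, bi in zip(a, b)]
            children.append(hill_climb(child))            # Lamarckian local refinement
        pop = elites + children
    return min(pop, key=sphere)

best = hybrid_ga()
print("best objective:", round(sphere(best), 6))
```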
Human emotion detector based on genetic algorithm using lip features
NASA Astrophysics Data System (ADS)
Brown, Terrence; Fetanat, Gholamreza; Homaifar, Abdollah; Tsou, Brian; Mendoza-Schrock, Olga
2010-04-01
We predicted human emotion using a Genetic Algorithm (GA) based lip feature extractor from facial images to classify all seven universal emotions: fear, happiness, dislike, surprise, anger, sadness and neutrality. First, we isolated the mouth from the input images using special methods such as Region of Interest (ROI) acquisition, grayscaling, histogram equalization, filtering, and edge detection. Next, the GA determined the optimal or near-optimal parameters of the ellipses that enclose the mouth and separate it into upper and lower lips. The two ellipses then underwent fitness calculation, followed by training using a database of Japanese women's faces expressing all seven emotions. Finally, our proposed algorithm was tested using a published database consisting of emotions from several persons. The final results were then presented in confusion matrices. Our results showed an accuracy that varies from 20% to 60% for each of the seven emotions. The errors were mainly due to inaccuracies in the classification and to the different expressions in the given emotion database. Detailed analysis of these errors pointed to the limitation of detecting emotion based on lip features alone. Whereas similar work [1] in the literature addressed emotion detection for only one person, we have successfully extended our GA based solution to include several subjects.
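The ellipse-fitting step can be illustrated with a small genetic search over axis-aligned ellipse parameters (center and semi-axes) that minimizes an algebraic fit error to edge points. The points below are synthetic stand-ins for detected lip-boundary pixels, and the single-ellipse, axis-aligned formulation is a simplification of the two-ellipse scheme described above.

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic "lip edge" points: a noisy axis-aligned ellipse with (cx, cy, a, b) = (50, 30, 20, 8).
t = rng.uniform(0, 2 * np.pi, 150)
pts = np.c_[50 + 20 * np.cos(t), 30 + 8 * np.sin(t)] + rng.normal(0, 0.5, (150, 2))

def fitness(p):
    """Mean algebraic ellipse-fit error for parameters p = (cx, cy, a, b); lower is better."""
    cx, cy, a, b = p
    err = ((pts[:, 0] - cx) / a) ** 2 + ((pts[:, 1] - cy) / b) ** 2 - 1.0
    return np.mean(np.abs(err))

# Initial population: random centers and semi-axes within plausible image ranges.
pop = np.column_stack([rng.uniform(30, 70, 40), rng.uniform(10, 50, 40),
                       rng.uniform(5, 40, 40), rng.uniform(3, 20, 40)])
for _ in range(60):
    fit = np.array([fitness(p) for p in pop])
    elites = pop[np.argsort(fit)[:6]]                      # keep the best fits
    pa = elites[rng.integers(0, 6, len(pop) - 6)]
    pb = elites[rng.integers(0, 6, len(pop) - 6)]
    kids = (pa + pb) / 2 + rng.normal(0, 0.5, pa.shape)    # blend crossover + mutation
    pop = np.vstack([elites, np.abs(kids) + 1e-3])         # keep parameters positive

best = pop[np.argmin([fitness(p) for p in pop])]
print("fitted (cx, cy, a, b):", np.round(best, 1))
```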
Novel probabilistic neuroclassifier
NASA Astrophysics Data System (ADS)
Hong, Jiang; Serpen, Gursel
2003-09-01
A novel probabilistic potential function neural network classifier algorithm to deal with classes which are multi-modally distributed and formed from sets of disjoint pattern clusters is proposed in this paper. The proposed classifier has a number of desirable properties which distinguish it from other neural network classifiers. A complete description of the algorithm in terms of its architecture and the pseudocode is presented. Simulation analysis of the newly proposed neuro-classifier algorithm on a set of benchmark problems is presented. Benchmark problems tested include IRIS, Sonar, Vowel Recognition, Two-Spiral, Wisconsin Breast Cancer, Cleveland Heart Disease and Thyroid Gland Disease. Simulation results indicate that the proposed neuro-classifier performs consistently better for a subset of problems for which other neural classifiers perform relatively poorly.
AdaBoost-based algorithm for network intrusion detection.
Hu, Weiming; Hu, Wei; Maybank, Steve
2008-04-01
Network intrusion detection aims at distinguishing the attacks on the Internet from normal use of the Internet. It is an indispensable part of the information security system. Due to the variety of network behaviors and the rapid development of attack fashions, it is necessary to develop fast machine-learning-based intrusion detection algorithms with high detection rates and low false-alarm rates. In this correspondence, we propose an intrusion detection algorithm based on the AdaBoost algorithm. In the algorithm, decision stumps are used as weak classifiers. The decision rules are provided for both categorical and continuous features. By combining the weak classifiers for continuous features and the weak classifiers for categorical features into a strong classifier, the relations between these two different types of features are handled naturally, without any forced conversions between continuous and categorical features. Adaptable initial weights and a simple strategy for avoiding overfitting are adopted to improve the performance of the algorithm. Experimental results show that our algorithm has low computational complexity and error rates, as compared with algorithms of higher computational complexity, as tested on the benchmark sample data.
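The base configuration described above — decision stumps as weak classifiers boosted into a strong classifier — maps directly onto an off-the-shelf AdaBoost implementation. The sketch below uses scikit-learn on a synthetic, class-imbalanced two-class set standing in for connection records; it omits the paper's handling of categorical features, adaptable initial weights and overfitting strategy.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for "normal vs. attack" connection records (roughly 10% positives).
X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Decision stumps (depth-1 trees) as weak classifiers, boosted into a strong classifier.
stump = DecisionTreeClassifier(max_depth=1)
# "estimator=" requires scikit-learn >= 1.2; older releases use "base_estimator=".
model = AdaBoostClassifier(estimator=stump, n_estimators=200, random_state=0)
model.fit(X_tr, y_tr)
print("test accuracy:", round(model.score(X_te, y_te), 3))
```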
Schmitz, Boris; De Maria, Renata; Gatsios, Dimitris; Chrysanthakopoulou, Theodora; Landolina, Maurizio; Gasparini, Maurizio; Campolo, Jonica; Parolini, Marina; Sanzo, Antonio; Galimberti, Paola; Bianchi, Michele; Lenders, Malte; Brand, Eva; Parodi, Oberdan; Lunati, Maurizio; Brand, Stefan-Martin
2014-12-01
Cardiac resynchronization therapy (CRT) can improve ventricular size, shape, and mass and reduce mitral regurgitation by reverse remodeling of the failing ventricle. About 30% of patients do not respond to this therapy for unknown reasons. In this study, we aimed at the identification and classification of CRT responders by the use of genetic variants and clinical parameters. Of 1421 CRT patients, 207 subjects were consecutively selected, and CRT responders and nonresponders were matched for their baseline parameters before CRT. Treatment success of CRT was defined as a decrease in left ventricular end-systolic volume >15% at follow-up echocardiography compared with left ventricular end-systolic volume at baseline. All other changes classified the patient as a CRT nonresponder. A genetic association study was performed, which identified 4 genetic variants to be associated with the CRT responder phenotype at the allelic (P<0.035) and genotypic (P<0.031) levels: rs3766031 (ATPIB1), rs5443 (GNB3), rs5522 (NR3C2), and rs7325635 (TNFSF11). Machine learning algorithms were used for the classification of CRT patients into responder and nonresponder status, including combinations of the identified genetic variants and clinical parameters. We demonstrated that rule induction algorithms can successfully be applied for the classification of heart failure patients into CRT responder and nonresponder status using clinical and genetic parameters. Our analysis included information on alleles and genotypes of 4 genetic loci, rs3766031 (ATPIB1), rs5443 (GNB3), rs5522 (NR3C2), and rs7325635 (TNFSF11), pathophysiologically associated with remodeling of the failing ventricle. © 2014 American Heart Association, Inc.
NASA Astrophysics Data System (ADS)
Tahernezhad-Javazm, Farajollah; Azimirad, Vahid; Shoaran, Maryam
2018-04-01
Objective. Considering the importance and the near-future development of noninvasive brain-machine interface (BMI) systems, this paper presents a comprehensive theoretical-experimental survey of classification and evolutionary methods for BMI systems based on EEG signals. Approach. The paper is divided into two main parts. In the first part, a wide range of base and combinatorial classifiers, including boosting and bagging classifiers, and evolutionary algorithms are reviewed and investigated. In the second part, these classifiers and evolutionary algorithms are assessed and compared on two relatively widely used types of BMI system, sensory motor rhythm-BMI and event-related potentials-BMI. Moreover, in the second part, some improved evolutionary algorithms as well as bi-objective algorithms are experimentally assessed and compared. Main results. Two databases are used, and cross-validation accuracy (CVA) and stability to data volume (SDV) are considered as the evaluation criteria for the classifiers. According to the experimental results on both databases, among the base classifiers, linear discriminant analysis and support vector machines performed best with respect to CVA, and naive Bayes performed best with respect to SDV. Among the combinatorial classifiers, Bagg-DT (bagging decision tree), LogitBoost, and GentleBoost performed best with respect to CVA, while Bagging-LR (bagging logistic regression) and AdaBoost (adaptive boosting) performed best with respect to SDV. Finally, regarding the evolutionary algorithms, single-objective invasive weed optimization (IWO) and bi-objective nondominated sorting IWO algorithms demonstrated the best performances. Significance. We present a general survey of base and combinatorial classification methods for EEG signals (sensory motor rhythm and event-related potentials) as well as their optimization through evolutionary algorithms. In addition, experimental and statistical significance tests are carried out to study the applicability and effectiveness of the reviewed methods.
Feature Selection and Effective Classifiers.
ERIC Educational Resources Information Center
Deogun, Jitender S.; Choubey, Suresh K.; Raghavan, Vijay V.; Sever, Hayri
1998-01-01
Develops and analyzes four algorithms for feature selection in the context of rough set methodology. Experimental results confirm the expected relationship between the time complexity of these algorithms and the classification accuracy of the resulting upper classifiers. When compared, results of upper classifiers perform better than lower…
Non-proliferative diabetic retinopathy symptoms detection and classification using neural network.
Al-Jarrah, Mohammad A; Shatnawi, Hadeel
2017-08-01
Diabetic retinopathy (DR) is a leading cause of blindness among working-age people with diabetes in most countries. The increasing number of people with diabetes worldwide suggests that DR will continue to be a major contributor to vision loss. Early detection of retinopathy progression in individuals with diabetes is critical for preventing visual loss. Non-proliferative DR (NPDR) is an early stage of DR that can be further classified as mild, moderate or severe. This paper proposes a novel morphology-based algorithm for detecting retinal lesions and classifying each case. First, the proposed algorithm detects three DR lesions, namely haemorrhages, microaneurysms and exudates. Second, we define and extract a set of features from the detected lesions; the selected features emulate what physicians look for when classifying NPDR cases. Finally, we design an artificial neural network (ANN) classifier with three layers to classify NPDR into normal, mild, moderate and severe. Bayesian regularisation and resilient backpropagation algorithms are used to train the ANN. The accuracies of the proposed classifiers based on Bayesian regularisation and resilient backpropagation are 96.6% and 89.9%, respectively. The obtained results are compared with those of recently published classifiers, and our proposed classifier outperforms the best of them in terms of sensitivity and specificity.
Saini, Harsh; Lal, Sunil Pranit; Naidu, Vimal Vikash; Pickering, Vincel Wince; Singh, Gurmeet; Tsunoda, Tatsuhiko; Sharma, Alok
2016-12-05
High dimensional feature space generally degrades classification in several applications. In this paper, we propose a strategy called gene masking, in which non-contributing dimensions are heuristically removed from the data to improve classification accuracy. Gene masking is implemented via a binary encoded genetic algorithm that can be integrated seamlessly with classifiers during the training phase of classification to perform feature selection. It can also be used to discriminate between features that contribute most to the classification, thereby, allowing researchers to isolate features that may have special significance. This technique was applied on publicly available datasets whereby it substantially reduced the number of features used for classification while maintaining high accuracies. The proposed technique can be extremely useful in feature selection as it heuristically removes non-contributing features to improve the performance of classifiers.
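A minimal sketch of the gene-masking idea, assuming a binary-encoded GA whose individuals are feature masks scored by cross-validated kNN accuracy; the GA settings, the mild penalty on mask size, and the public breast-cancer dataset used as a stand-in for gene-expression data are all illustrative assumptions rather than the authors' configuration.

```python
# Sketch of binary-encoded "gene masking" for wrapper feature selection.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(42)
n_feat, pop_size, n_gen = X.shape[1], 30, 25

def fitness(mask):
    if mask.sum() == 0:
        return 0.0
    acc = cross_val_score(KNeighborsClassifier(), X[:, mask.astype(bool)], y, cv=3).mean()
    return acc - 0.001 * mask.sum()       # mild pressure toward fewer features

pop = rng.integers(0, 2, size=(pop_size, n_feat))
for gen in range(n_gen):
    scores = np.array([fitness(ind) for ind in pop])
    # tournament selection (size 3)
    parents = pop[[max(rng.choice(pop_size, 3), key=lambda i: scores[i])
                   for _ in range(pop_size)]]
    # uniform crossover between consecutive parents
    children = parents.copy()
    for i in range(0, pop_size - 1, 2):
        swap = rng.random(n_feat) < 0.5
        children[i, swap], children[i + 1, swap] = parents[i + 1, swap], parents[i, swap]
    # bit-flip mutation
    flips = rng.random(children.shape) < 0.02
    children[flips] = 1 - children[flips]
    # elitism: carry the current best mask forward unchanged
    children[0] = pop[scores.argmax()]
    pop = children

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("selected features:", np.flatnonzero(best), "fitness:", round(fitness(best), 3))
```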
Random forests on Hadoop for genome-wide association studies of multivariate neuroimaging phenotypes
Wang, Yue; Goh, Wilson; Wong, Limsoon; Montana, Giovanni
2013-01-01
Motivation Multivariate quantitative traits arise naturally in recent neuroimaging genetics studies, in which both structural and functional variability of the human brain is measured non-invasively through techniques such as magnetic resonance imaging (MRI). There is growing interest in detecting genetic variants associated with such multivariate traits, especially in genome-wide studies. Random forests (RFs) classifiers, which are ensembles of decision trees, are amongst the best performing machine learning algorithms and have been successfully employed for the prioritisation of genetic variants in case-control studies. RFs can also be applied to produce gene rankings in association studies with multivariate quantitative traits, and to estimate genetic similarities measures that are predictive of the trait. However, in studies involving hundreds of thousands of SNPs and high-dimensional traits, a very large ensemble of trees must be inferred from the data in order to obtain reliable rankings, which makes the application of these algorithms computationally prohibitive. Results We have developed a parallel version of the RF algorithm for regression and genetic similarity learning tasks in large-scale population genetic association studies involving multivariate traits, called PaRFR (Parallel Random Forest Regression). Our implementation takes advantage of the MapReduce programming model and is deployed on Hadoop, an open-source software framework that supports data-intensive distributed applications. Notable speed-ups are obtained by introducing a distance-based criterion for node splitting in the tree estimation process. PaRFR has been applied to a genome-wide association study on Alzheimer's disease (AD) in which the quantitative trait consists of a high-dimensional neuroimaging phenotype describing longitudinal changes in the human brain structure. PaRFR provides a ranking of SNPs associated to this trait, and produces pair-wise measures of genetic proximity that can be directly compared to pair-wise measures of phenotypic proximity. Several known AD-related variants have been identified, including APOE4 and TOMM40. We also present experimental evidence supporting the hypothesis of a linear relationship between the number of top-ranked mutated states, or frequent mutation patterns, and an indicator of disease severity. Availability The Java codes are freely available at http://www2.imperial.ac.uk/~gmontana. PMID:24564704
Using SPOT–5 HRG Data in Panchromatic Mode for Operational Detection of Small Ships in Tropical Area
Corbane, Christina; Marre, Fabrice; Petit, Michel
2008-01-01
Nowadays, there is a growing interest in applications of space remote sensing systems for maritime surveillance which includes among others traffic surveillance, maritime security, illegal fisheries survey, oil discharge and sea pollution monitoring. Within the framework of several French and European projects, an algorithm for automatic ship detection from SPOT–5 HRG data was developed to complement existing fishery control measures, in particular the Vessel Monitoring System. The algorithm focused on feature–based analysis of satellite imagery. Genetic algorithms and Neural Networks were used to deal with the feature–borne information. Based on the described approach, a first prototype was designed to classify small targets such as shrimp boats and tested on panchromatic SPOT–5, 5–m resolution product taking into account the environmental and fishing context. The ability to detect shrimp boats with satisfactory detection rates is an indicator of the robustness of the algorithm. Still, the benchmark revealed problems related to increased false alarm rates on particular types of images with a high percentage of cloud cover and a sea cluttered background. PMID:27879859
Automated analysis of brain activity for seizure detection in zebrafish models of epilepsy.
Hunyadi, Borbála; Siekierska, Aleksandra; Sourbron, Jo; Copmans, Daniëlle; de Witte, Peter A M
2017-08-01
Epilepsy is a chronic neurological condition, with over 30% of cases unresponsive to treatment. Zebrafish larvae show great potential to serve as an animal model of epilepsy in drug discovery. Thanks to their high fecundity and relatively low cost, they are amenable to high-throughput screening. However, the assessment of seizure occurrences in zebrafish larvae remains a bottleneck, as visual analysis is subjective and time-consuming. For the first time, we present an automated algorithm to detect epileptic discharges in single-channel local field potential (LFP) recordings in zebrafish. First, candidate seizure segments are selected based on their energy and length. Afterwards, discriminative features are extracted from each segment. Using a labeled dataset, a support vector machine (SVM) classifier is trained to learn an optimal feature mapping. Finally, this SVM classifier is used to detect seizure segments in new signals. We tested the proposed algorithm both in a chemically-induced seizure model and a genetic epilepsy model. In both cases, the algorithm delivered similar results to visual analysis and found a significant difference in number of seizures between the epileptic and control group. Direct comparison with multichannel techniques or methods developed for different animal models is not feasible. Nevertheless, a literature review shows that our algorithm outperforms state-of-the-art techniques in terms of accuracy, precision and specificity, while maintaining a reasonable sensitivity. Our seizure detection system is a generic, time-saving and objective method to analyze zebrafish LFP, which can replace visual analysis and facilitate true high-throughput studies. Copyright © 2017 Elsevier B.V. All rights reserved.
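The two-stage structure described above (energy-gated candidate windows, then feature-based SVM classification) could be sketched roughly as below; the window length, energy threshold, the three features, and the synthetic signals are all assumptions standing in for real LFP recordings and expert-scored training segments.

```python
# Sketch: energy-gated candidate windows from a 1-D signal, simple features,
# and an SVM classifier. All thresholds and features are illustrative choices.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def candidate_windows(signal, fs, win_s=1.0, energy_factor=3.0):
    """Return windows whose energy exceeds energy_factor x the median window energy."""
    win = int(win_s * fs)
    segments = [signal[i:i + win] for i in range(0, len(signal) - win, win)]
    energies = np.array([np.sum(s ** 2) for s in segments])
    keep = energies > energy_factor * np.median(energies)
    return [s for s, k in zip(segments, keep) if k]

def features(seg):
    # line length, variance, peak amplitude: common LFP/EEG descriptors
    return [np.sum(np.abs(np.diff(seg))), np.var(seg), np.max(np.abs(seg))]

rng = np.random.default_rng(1)
fs = 500
# Hypothetical labeled 1-s segments (would come from expert-scored recordings).
normal = [rng.normal(0, 1, fs) for _ in range(100)]
seizure = [rng.normal(0, 1, fs) + 3 * np.sin(2 * np.pi * 8 * np.arange(fs) / fs)
           for _ in range(100)]
X = np.array([features(s) for s in normal + seizure])
y = np.array([0] * 100 + [1] * 100)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X, y)

# One minute of toy test signal with an injected high-energy burst.
test = rng.normal(0, 1, 60 * fs)
test[10 * fs:13 * fs] += 4 * np.sin(2 * np.pi * 8 * np.arange(3 * fs) / fs)
cands = candidate_windows(test, fs)
if cands:
    print(clf.predict(np.array([features(s) for s in cands])))
```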
LDA boost classification: boosting by topics
NASA Astrophysics Data System (ADS)
Lei, La; Qiao, Guo; Qimin, Cao; Qitao, Li
2012-12-01
AdaBoost is an efficacious classification algorithm, especially for text categorization (TC) tasks. The methodology of setting up a classifier committee and voting on the documents for classification can achieve high categorization precision. However, the traditional Vector Space Model easily leads to the curse of dimensionality and feature sparsity problems, which seriously affect classification performance. This article proposes a novel classification algorithm called LDABoost, based on the boosting idea, which uses Latent Dirichlet Allocation (LDA) to model the feature space. Instead of using words or phrases, LDABoost uses latent topics as features, so the feature dimension is significantly reduced. An improved Naïve Bayes (NB) is designed as the weak classifier, which keeps the efficiency advantage of the classic NB algorithm while achieving higher precision. Moreover, a two-stage iterative weighting method, called Cute Integration in this article, is proposed to improve accuracy by integrating weak classifiers into a strong classifier in a more rational way. Mutual Information is used as the metric for weight allocation, and the voting information and the categorization decisions made by the base classifiers are fully utilized when generating the strong classifier. Experimental results reveal that LDABoost performs categorization in a low-dimensional space and has higher accuracy than traditional AdaBoost algorithms and many other classic classification algorithms. Moreover, its runtime is lower than that of different versions of AdaBoost and of TC algorithms based on support vector machines and neural networks.
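A hedged approximation of this pipeline in scikit-learn: LDA topic proportions as low-dimensional document features, with boosted Naive Bayes weak learners as the committee. Standard AdaBoost weighting is used here in place of the paper's Cute Integration scheme, and the corpus, topic count and boosting depth are arbitrary choices (assumes scikit-learn >= 1.2 for the `estimator` argument).

```python
# Sketch: LDA topic proportions as features, boosted Naive Bayes as classifier.
from sklearn.datasets import fetch_20newsgroups
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.ensemble import AdaBoostClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

data = fetch_20newsgroups(subset="train",
                          categories=["sci.space", "rec.autos", "talk.politics.misc"],
                          remove=("headers", "footers", "quotes"))
X_tr, X_te, y_tr, y_te = train_test_split(data.data, data.target, random_state=0)

model = Pipeline([
    ("bow", CountVectorizer(max_features=5000, stop_words="english")),
    ("lda", LatentDirichletAllocation(n_components=30, random_state=0)),  # topic features
    ("boost", AdaBoostClassifier(estimator=MultinomialNB(),
                                 n_estimators=50, random_state=0)),
])
model.fit(X_tr, y_tr)
print("held-out accuracy:", round(model.score(X_te, y_te), 3))
```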
Sivley, R Michael; Sheehan, Jonathan H; Kropski, Jonathan A; Cogan, Joy; Blackwell, Timothy S; Phillips, John A; Bush, William S; Meiler, Jens; Capra, John A
2018-01-23
Next-generation sequencing of individuals with genetic diseases often detects candidate rare variants in numerous genes, but determining which are causal remains challenging. We hypothesized that the spatial distribution of missense variants in protein structures contains information about function and pathogenicity that can help prioritize variants of unknown significance (VUS) and elucidate the structural mechanisms leading to disease. To illustrate this approach in a clinical application, we analyzed 13 candidate missense variants in regulator of telomere elongation helicase 1 (RTEL1) identified in patients with Familial Interstitial Pneumonia (FIP). We curated pathogenic and neutral RTEL1 variants from the literature and public databases. We then used homology modeling to construct a 3D structural model of RTEL1 and mapped known variants into this structure. We next developed a pathogenicity prediction algorithm based on proximity to known disease causing and neutral variants and evaluated its performance with leave-one-out cross-validation. We further validated our predictions with segregation analyses, telomere lengths, and mutagenesis data from the homologous XPD protein. Our algorithm for classifying RTEL1 VUS based on spatial proximity to pathogenic and neutral variation accurately distinguished 7 known pathogenic from 29 neutral variants (ROC AUC = 0.85) in the N-terminal domains of RTEL1. Pathogenic proximity scores were also significantly correlated with effects on ATPase activity (Pearson r = -0.65, p = 0.0004) in XPD, a related helicase. Applying the algorithm to 13 VUS identified from sequencing of RTEL1 from patients predicted five out of six disease-segregating VUS to be pathogenic. We provide structural hypotheses regarding how these mutations may disrupt RTEL1 ATPase and helicase function. Spatial analysis of missense variation accurately classified candidate VUS in RTEL1 and suggests how such variants cause disease. Incorporating spatial proximity analyses into other pathogenicity prediction tools may improve accuracy for other genes and genetic diseases.
Computational intelligence techniques for biological data mining: An overview
NASA Astrophysics Data System (ADS)
Faye, Ibrahima; Iqbal, Muhammad Javed; Said, Abas Md; Samir, Brahim Belhaouari
2014-10-01
Computational techniques have been successfully utilized for highly accurate analysis and modeling of the multifaceted, raw biological data gathered from various genome sequencing projects. These techniques are proving much more effective at overcoming the limitations of traditional in-vitro experiments on the constantly increasing volume of sequence data. The critical problems that have caught the attention of researchers include, but are not limited to: accurate structure and function prediction of unknown proteins, protein subcellular localization prediction, finding protein-protein interactions, protein fold recognition, and analysis of microarray gene expression data. To solve these problems, various machine learning classification and clustering techniques have been used extensively in the published literature, including neural network algorithms, genetic algorithms, fuzzy ARTMAP, K-Means, K-NN, SVM, rough set classifiers, decision trees and HMM-based algorithms. The major difficulties in applying these algorithms lie in the limitations of previous feature encoding and selection methods: extracting the best features, increasing classification accuracy and decreasing the running time overheads of the learning algorithms. This research is potentially useful in drug design and in the diagnosis of some diseases. This paper presents a concise overview of the well-known protein classification techniques.
Combined data mining/NIR spectroscopy for purity assessment of lime juice
NASA Astrophysics Data System (ADS)
Shafiee, Sahameh; Minaei, Saeid
2018-06-01
This paper reports a data mining study on the NIR spectra of lime juice samples to determine their purity (natural or synthetic). NIR spectra of 72 pure and synthetic lime juice samples were recorded in reflectance mode, and sample outliers were removed using PCA. Different data mining techniques for feature selection (a Genetic Algorithm (GA)) and classification (including the radial basis function (RBF) network, Support Vector Machine (SVM), and Random Forest (RF) tree) were employed. Based on the results, SVM proved to be the most accurate classifier, achieving the highest accuracy (97%) on the raw spectrum information. Classifier accuracy dropped to 93% when the feature vector selected by the GA search method was used as classifier input, which suggests that some relevant features that produce good performance with the SVM classifier are removed by feature selection. Spectra reduced using PCA also did not show acceptable performance (total accuracy of 66% with RBFNN), which indicates that dimensionality reduction methods such as PCA do not always lead to more accurate results. These findings demonstrate the potential of combining data mining with near-infrared spectroscopy for monitoring lime juice quality in terms of its natural or synthetic nature.
Predictive classification of self-paced upper-limb analytical movements with EEG.
Ibáñez, Jaime; Serrano, J I; del Castillo, M D; Minguez, J; Pons, J L
2015-11-01
The extent to which the electroencephalographic activity allows the characterization of movements with the upper limb is an open question. This paper describes the design and validation of a classifier of upper-limb analytical movements based on electroencephalographic activity extracted from intervals preceding self-initiated movement tasks. Features selected for the classification are subject specific and associated with the movement tasks. Further tests are performed to reject the hypothesis that other information different from the task-related cortical activity is being used by the classifiers. Six healthy subjects were measured performing self-initiated upper-limb analytical movements. A Bayesian classifier was used to classify among seven different kinds of movements. Features considered covered the alpha and beta bands. A genetic algorithm was used to optimally select a subset of features for the classification. An average accuracy of 62.9 ± 7.5% was reached, which was above the baseline level observed with the proposed methodology (30.2 ± 4.3%). The study shows how the electroencephalography carries information about the type of analytical movement performed with the upper limb and how it can be decoded before the movement begins. In neurorehabilitation environments, this information could be used for monitoring and assisting purposes.
Generalising Ward's Method for Use with Manhattan Distances.
Strauss, Trudie; von Maltitz, Michael Johan
2017-01-01
The claim that Ward's linkage algorithm in hierarchical clustering is limited to use with Euclidean distances is investigated. In this paper, Ward's clustering algorithm is generalised for use with the l1 norm, or Manhattan distance. We argue that the generalisation of Ward's linkage method to incorporate Manhattan distances is theoretically sound, and provide an example where this method outperforms the method using Euclidean distances. As an application, we perform statistical analyses on languages using methods normally applied to biology and genetic classification. We aim to quantify differences in character traits between languages and use a statistical language signature based on relative bi-gram (sequence of two letters) frequencies to calculate a distance matrix between 32 Indo-European languages. We then use Ward's method of hierarchical clustering to classify the languages, using the Euclidean distance and the Manhattan distance. Results obtained from the two distance metrics are compared to show that Ward's characteristic of minimising intra-cluster variation and maximising inter-cluster variation is not violated when using the Manhattan metric.
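In SciPy this generalisation can be exercised directly, since `linkage(..., method='ward')` applies the Lance-Williams/Ward update to whatever condensed dissimilarities it is given (its documentation assumes Euclidean input, so feeding Manhattan distances is a deliberate choice in the spirit of the paper, not the library's intended default). The bi-gram signatures below are random stand-ins for real language data.

```python
# Sketch: Ward-style hierarchical clustering run on Manhattan (l1) distances.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Hypothetical stand-in for the bi-gram frequency signatures: each row is a
# relative-frequency vector describing one "language".
rng = np.random.default_rng(3)
signatures = np.vstack([rng.dirichlet(np.ones(26 * 26) * a, size=10)
                        for a in (0.5, 1.0, 2.0)])

d_manhattan = pdist(signatures, metric="cityblock")   # l1 distances
tree = linkage(d_manhattan, method="ward")            # Ward update on l1 input
labels = fcluster(tree, t=3, criterion="maxclust")
print(labels)
```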
Artificial bee colony algorithm for single-trial electroencephalogram analysis.
Hsu, Wei-Yen; Hu, Ya-Ping
2015-04-01
In this study, we propose an analysis system combined with feature selection to further improve the classification accuracy of single-trial electroencephalogram (EEG) data. Acquiring event-related brain potential data from the sensorimotor cortices, the system comprises artifact and background noise removal, feature extraction, feature selection, and feature classification. First, artifacts and background noise are removed automatically by means of independent component analysis and a surface Laplacian filter, respectively. Several potential features, such as band power, autoregressive model, and coherence and phase-locking value, are then extracted for subsequent classification. Next, the artificial bee colony (ABC) algorithm is used to select features from the aforementioned feature combination. Finally, the selected subfeatures are classified by a support vector machine. Compared with processing without artifact removal and with feature selection based on a genetic algorithm, on single-trial EEG data from six subjects, the results indicate that the proposed system is promising and suitable for brain-computer interface applications. © EEG and Clinical Neuroscience Society (ECNS) 2014.
Ma, Li; Fan, Suohai
2017-03-14
The random forests algorithm is a type of classifier with prominent universality, a wide application range, and robustness for avoiding overfitting. But there are still some drawbacks to random forests. Therefore, to improve the performance of random forests, this paper seeks to improve imbalanced data processing, feature selection and parameter optimization. We propose the CURE-SMOTE algorithm for the imbalanced data classification problem. Experiments on imbalanced UCI data reveal that the combination of Clustering Using Representatives (CURE) enhances the original synthetic minority oversampling technique (SMOTE) algorithms effectively compared with the classification results on the original data using random sampling, Borderline-SMOTE1, safe-level SMOTE, C-SMOTE, and k-means-SMOTE. Additionally, the hybrid RF (random forests) algorithm has been proposed for feature selection and parameter optimization, which uses the minimum out of bag (OOB) data error as its objective function. Simulation results on binary and higher-dimensional data indicate that the proposed hybrid RF algorithms, hybrid genetic-random forests algorithm, hybrid particle swarm-random forests algorithm and hybrid fish swarm-random forests algorithm can achieve the minimum OOB error and show the best generalization ability. The training set produced from the proposed CURE-SMOTE algorithm is closer to the original data distribution because it contains minimal noise. Thus, better classification results are produced from this feasible and effective algorithm. Moreover, the hybrid algorithm's F-value, G-mean, AUC and OOB scores demonstrate that they surpass the performance of the original RF algorithm. Hence, this hybrid algorithm provides a new way to perform feature selection and parameter optimization.
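A sketch of the OOB-driven hyper-parameter search described above, assuming a toy imbalanced dataset and a very small GA over two RF parameters; the search space, GA settings and data are illustrative, and the CURE-SMOTE resampling step is not reproduced here.

```python
# Sketch: a tiny GA that tunes RF hyper-parameters by minimising out-of-bag error.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=600, n_features=25, n_informative=8,
                           weights=[0.9, 0.1], random_state=0)   # imbalanced toy data
rng = np.random.default_rng(0)
n_estimators_grid = [50, 100, 200, 400]
max_features_grid = [2, 4, 6, 8, 12]

def oob_error(ind):            # refits on every call; acceptable for a sketch
    n_idx, f_idx = ind
    rf = RandomForestClassifier(n_estimators=n_estimators_grid[n_idx],
                                max_features=max_features_grid[f_idx],
                                oob_score=True, random_state=0, n_jobs=-1).fit(X, y)
    return 1.0 - rf.oob_score_

pop = [(int(rng.integers(4)), int(rng.integers(5))) for _ in range(8)]
for _ in range(5):                                    # a few GA generations
    parents = sorted(pop, key=oob_error)[:4]          # truncation selection
    children = []
    for _ in range(4):
        a, b = rng.choice(4, 2, replace=False)
        child = [parents[a][0], parents[b][1]]        # one-point crossover
        if rng.random() < 0.3:                        # mutation on one gene
            g = int(rng.integers(2))
            child[g] = int(rng.integers(4 if g == 0 else 5))
        children.append(tuple(child))
    pop = parents + children

best = min(pop, key=oob_error)
print("best (n_estimators, max_features):",
      n_estimators_grid[best[0]], max_features_grid[best[1]],
      "OOB error:", round(oob_error(best), 3))
```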
Problem solving with genetic algorithms and Splicer
NASA Technical Reports Server (NTRS)
Bayer, Steven E.; Wang, Lui
1991-01-01
Genetic algorithms are highly parallel, adaptive search procedures (i.e., problem-solving methods) loosely based on the processes of population genetics and Darwinian survival of the fittest. Genetic algorithms have proven useful in domains where other optimization techniques perform poorly. The main purpose of the paper is to discuss a NASA-sponsored software development project to develop a general-purpose tool for using genetic algorithms. The tool, called Splicer, can be used to solve a wide variety of optimization problems and is currently available from NASA and COSMIC. This discussion is preceded by an introduction to basic genetic algorithm concepts and a discussion of genetic algorithm applications.
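For readers new to the basic concepts mentioned above, the following generic sketch (plain Python, unrelated to the Splicer tool itself) shows the usual GA loop of fitness evaluation, tournament selection, one-point crossover and bit-flip mutation, applied to the toy "one-max" problem of maximising the number of 1s in a bit string. All parameter values are arbitrary illustrative choices.

```python
# Minimal generic genetic algorithm on the one-max toy problem.
import random

GENOME_LEN, POP_SIZE, GENERATIONS = 40, 60, 80
P_CROSSOVER, P_MUTATION = 0.9, 1.0 / GENOME_LEN

def fitness(ind):                 # problem-specific evaluation
    return sum(ind)

def tournament(pop, k=3):         # pick the best of k random individuals
    return max(random.sample(pop, k), key=fitness)

def crossover(a, b):              # one-point crossover
    if random.random() < P_CROSSOVER:
        cut = random.randrange(1, GENOME_LEN)
        return a[:cut] + b[cut:], b[:cut] + a[cut:]
    return a[:], b[:]

def mutate(ind):                  # independent bit flips
    return [1 - g if random.random() < P_MUTATION else g for g in ind]

pop = [[random.randint(0, 1) for _ in range(GENOME_LEN)] for _ in range(POP_SIZE)]
for gen in range(GENERATIONS):
    nxt = []
    while len(nxt) < POP_SIZE:
        c1, c2 = crossover(tournament(pop), tournament(pop))
        nxt += [mutate(c1), mutate(c2)]
    pop = nxt[:POP_SIZE]

best = max(pop, key=fitness)
print("best fitness:", fitness(best), "of", GENOME_LEN)
```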
A unified classifier for robust face recognition based on combining multiple subspace algorithms
NASA Astrophysics Data System (ADS)
Ijaz Bajwa, Usama; Ahmad Taj, Imtiaz; Waqas Anwar, Muhammad
2012-10-01
Face recognition, the fastest growing biometric technology, has expanded manifold in the last few years. Various new algorithms and commercial systems have been proposed and developed, yet none of them is a complete solution: an algorithm may work very well on one set of images, say with illumination changes, but fail on another set of variations such as changes in expression. This study is motivated by the fact that no single classifier can claim generally better performance against all facial image variations. To overcome this shortcoming and achieve generality, combining several classifiers using various strategies has been studied extensively, which also raises the question of the suitability of any given classifier for this task. The study is based on the outcome of a comprehensive comparative analysis of a combination of six subspace extraction algorithms and four distance metrics on three facial databases. The analysis leads to the selection of the most suitable classifiers, each of which performs better on one task or another. These classifiers are then combined into an ensemble classifier by two different strategies, weighted sum and re-ranking. The results of the ensemble classifier show that these strategies can be effectively used to construct a single classifier that successfully handles varying facial image conditions of illumination, aging and facial expression.
Evolution of cellular automata with memory: The Density Classification Task.
Stone, Christopher; Bull, Larry
2009-08-01
The Density Classification Task is a well-known test problem for two-state discrete dynamical systems. For many years researchers have used a variety of evolutionary computation approaches to evolve solutions to this problem. In this paper, we investigate the evolvability of solutions when the underlying cellular automaton is augmented with a type of memory based on the Least Mean Square algorithm. To obtain high-performance solutions using a simple non-hybrid genetic algorithm, we design a novel representation based on the ternary representation used for Learning Classifier Systems. The new representation is found to produce superior performance to the bit string traditionally used for representing cellular automata. Moreover, memory is shown to improve the evolvability of solutions, and appropriate memory settings can be evolved as a component of these solutions.
Genetic algorithms using SISAL parallel programming language
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tejada, S.
1994-05-06
Genetic algorithms are a mathematical optimization technique developed by John Holland at the University of Michigan [1]. The SISAL programming language possesses many of the characteristics desired to implement genetic algorithms. SISAL is a deterministic, functional programming language which is inherently parallel. Because SISAL is functional and based on mathematical concepts, genetic algorithms can be efficiently translated into the language. Several of the steps involved in genetic algorithms, such as mutation, crossover, and fitness evaluation, can be parallelized using SISAL. In this paper I discuss the implementation and performance of parallel genetic algorithms in SISAL.
Biswas, Surama; Dutta, Subarna; Acharyya, Sriyankar
2017-12-01
Identifying a small subset of disease-critical genes from large microarray gene expression data is a challenge in computational life sciences. This paper applies four meta-heuristic algorithms, namely honey bee mating optimization (HBMO), harmony search (HS), differential evolution (DE) and the genetic algorithm (basic version, GA), to find disease-critical genes of preeclampsia, which affects women during gestation. Two hybrid algorithms, HBMO-kNN and HS-kNN, are newly proposed here, where kNN (k-nearest-neighbor classifier) is used for sample classification. The performance of these new approaches has been compared with that of two other hybrid algorithms, DE-kNN and SGA-kNN. Three datasets of different sizes have been used. In each dataset, the set of genes found common to the output of all algorithms is considered the set of disease-critical genes. Across the datasets, the classification accuracy of the meta-heuristic algorithms varied between 92.46% and 100%. HBMO-kNN has the best performance (99.64-100%) in almost all datasets, with DE-kNN in second place (99.42-100%). The disease-critical genes obtained here match clinically revealed preeclampsia genes to a large extent.
An efficient robust sound classification algorithm for hearing aids.
Nordqvist, Peter; Leijon, Arne
2004-06-01
An efficient robust sound classification algorithm based on hidden Markov models is presented. The system would enable a hearing aid to automatically change its behavior for differing listening environments according to the user's preferences. This work attempts to distinguish between three listening environment categories: speech in traffic noise, speech in babble, and clean speech, regardless of the signal-to-noise ratio. The classifier uses only the modulation characteristics of the signal. The classifier ignores the absolute sound pressure level and the absolute spectrum shape, resulting in an algorithm that is robust against irrelevant acoustic variations. The measured classification hit rate was 96.7%-99.5% when the classifier was tested with sounds representing one of the three environment categories included in the classifier. False-alarm rates were 0.2%-1.7% in these tests. The algorithm is robust and efficient and consumes a small amount of instructions and memory. It is fully possible to implement the classifier in a DSP-based hearing instrument.
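One common way to realise such an HMM-based environment classifier is to train one model per listening environment and assign a new feature sequence to the model with the highest log-likelihood; the sketch below assumes the third-party hmmlearn package and uses synthetic stand-ins for the modulation features, so it illustrates the structure rather than the paper's exact system.

```python
# Sketch: one Gaussian HMM per listening environment, classification by
# maximum log-likelihood. Features here are synthetic placeholders for the
# modulation characteristics used in the paper. Requires hmmlearn.
import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(0)

def toy_sequence(mean, n=400, dim=4):
    return rng.normal(mean, 1.0, size=(n, dim))

train = {"clean speech": toy_sequence(0.0),
         "speech in babble": toy_sequence(1.5),
         "speech in traffic": toy_sequence(-1.5)}

models = {}
for env, X in train.items():
    models[env] = GaussianHMM(n_components=3, covariance_type="diag",
                              n_iter=30, random_state=0).fit(X)

def classify(X):
    return max(models, key=lambda env: models[env].score(X))

print(classify(toy_sequence(1.5, n=100)))   # expected: "speech in babble"
```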
2012-01-01
Background Through the wealth of information contained within them, genome-wide association studies (GWAS) have the potential to provide researchers with a systematic means of associating genetic variants with a wide variety of disease phenotypes. Due to the limitations of approaches that have analyzed single variants one at a time, it has been proposed that the genetic basis of these disorders could be determined through detailed analysis of the genetic variants themselves and in conjunction with one another. The construction of models that account for these subsets of variants requires methodologies that generate predictions based on the total risk of a particular group of polymorphisms. However, due to the excessive number of variants, constructing these types of models has so far been computationally infeasible. Results We have implemented an algorithm, known as greedy RLS, that we use to perform the first known wrapper-based feature selection on the genome-wide level. The running time of greedy RLS grows linearly in the number of training examples, the number of features in the original data set, and the number of selected features. This speed is achieved through computational short-cuts based on matrix calculus. Since the memory consumption in present-day computers can form an even tighter bottleneck than running time, we also developed a space efficient variation of greedy RLS which trades running time for memory. These approaches are then compared to traditional wrapper-based feature selection implementations based on support vector machines (SVM) to reveal the relative speed-up and to assess the feasibility of the new algorithm. As a proof of concept, we apply greedy RLS to the Hypertension – UK National Blood Service WTCCC dataset and select the most predictive variants using 3-fold external cross-validation in less than 26 minutes on a high-end desktop. On this dataset, we also show that greedy RLS has a better classification performance on independent test data than a classifier trained using features selected by a statistical p-value-based filter, which is currently the most popular approach for constructing predictive models in GWAS. Conclusions Greedy RLS is the first known implementation of a machine learning based method with the capability to conduct a wrapper-based feature selection on an entire GWAS containing several thousand examples and over 400,000 variants. In our experiments, greedy RLS selected a highly predictive subset of genetic variants in a fraction of the time spent by wrapper-based selection methods used together with SVM classifiers. The proposed algorithms are freely available as part of the RLScore software library at http://users.utu.fi/aatapa/RLScore/. PMID:22551170
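The selection loop itself (though not the matrix-calculus shortcuts that make greedy RLS tractable at GWAS scale) can be sketched as greedy forward selection wrapped around a regularised least-squares (ridge) learner; the toy genotype matrix, regularisation constant and number of picks below are assumptions, and this naive version would not scale to 400,000 variants.

```python
# Sketch of greedy wrapper-style forward selection with a ridge (RLS) learner.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n, p, k = 200, 300, 10                                # samples, candidates, picks
X = rng.integers(0, 3, size=(n, p)).astype(float)     # toy 0/1/2 genotype matrix
y = X[:, :5] @ rng.normal(1.0, 0.2, 5) + rng.normal(0, 1, n)

selected, remaining = [], list(range(p))
for _ in range(k):
    def score(j):
        cols = X[:, selected + [j]]                   # current set plus candidate j
        return cross_val_score(Ridge(alpha=1.0), cols, y, cv=3).mean()
    best = max(remaining, key=score)                  # greedy choice
    selected.append(best)
    remaining.remove(best)

print("selected feature indices:", selected)
```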
Walking Objectively Measured: Classifying Accelerometer Data with GPS and Travel Diaries
Kang, Bumjoon; Moudon, Anne V.; Hurvitz, Philip M.; Reichley, Lucas; Saelens, Brian E.
2013-01-01
Purpose This study developed and tested an algorithm to classify accelerometer data as walking or non-walking using either GPS or travel diary data within a large sample of adults under free-living conditions. Methods Participants wore an accelerometer and a GPS unit, and concurrently completed a travel diary for 7 consecutive days. Physical activity (PA) bouts were identified using accelerometry count sequences. PA bouts were then classified as walking or non-walking based on a decision-tree algorithm consisting of 7 classification scenarios. Algorithm reliability was examined relative to two independent analysts' classification of a 100-bout verification sample. The algorithm was then applied to the entire set of PA bouts. Results The 706 participants (mean age 51 years, 62% female, 80% non-Hispanic white, 70% college graduate or higher) yielded 4,702 person-days of data and had a total of 13,971 PA bouts. The algorithm showed a mean agreement of 95% with the independent analysts. It classified physical activity into 8,170 (58.5%) walking bouts and 5,337 (38.2%) non-walking bouts; 464 (3.3%) bouts were not classified for lack of GPS and diary data. Nearly 70% of the walking bouts and 68% of the non-walking bouts were classified using only the objective accelerometer and GPS data. Travel diary data helped classify 30% of all bouts with no GPS data. The mean duration of PA bouts classified as walking was 15.2 min (SD=12.9). On average, participants had 1.7 walking bouts and 25.4 total walking minutes per day. Conclusions GPS and travel diary information can be helpful in classifying most accelerometer-derived PA bouts into walking or non-walking behavior. PMID:23439414
Model-Independent Phenotyping of C. elegans Locomotion Using Scale-Invariant Feature Transform
Koren, Yelena; Sznitman, Raphael; Arratia, Paulo E.; Carls, Christopher; Krajacic, Predrag; Brown, André E. X.; Sznitman, Josué
2015-01-01
To uncover the genetic basis of behavioral traits in the model organism C. elegans, a common strategy is to study locomotion defects in mutants. Despite efforts to introduce (semi-)automated phenotyping strategies, current methods overwhelmingly depend on worm-specific features that must be hand-crafted and as such are not generalizable for phenotyping motility in other animal models. Hence, there is an ongoing need for robust algorithms that can automatically analyze and classify motility phenotypes quantitatively. To this end, we have developed a fully-automated approach to characterize C. elegans' phenotypes that does not require the definition of nematode-specific features. Rather, we make use of the popular computer vision Scale-Invariant Feature Transform (SIFT), from which we construct histograms of commonly-observed SIFT features to represent nematode motility. We first evaluated our method on a synthetic dataset simulating a range of nematode crawling gaits. Next, we evaluated our algorithm on two distinct datasets of crawling C. elegans with mutants affecting neuromuscular structure and function. Not only is our algorithm able to detect differences between strains, but the results also capture similarities in locomotory phenotypes that lead to clustering consistent with expectations based on genetic relationships. Our proposed approach generalizes directly and should be applicable to other animal models. Such applicability holds promise for computational ethology as more groups collect high-resolution image data of animal behavior. PMID:25816290
Use of genetic algorithm for the selection of EEG features
NASA Astrophysics Data System (ADS)
Asvestas, P.; Korda, A.; Kostopoulos, S.; Karanasiou, I.; Ouzounoglou, A.; Sidiropoulos, K.; Ventouras, E.; Matsopoulos, G.
2015-09-01
Genetic Algorithm (GA) is a popular optimization technique that can detect the global optimum of a multivariable function containing several local optima. GA has been widely used in the field of biomedical informatics, especially in the context of designing decision support systems that classify biomedical signals or images into classes of interest. The aim of this paper is to present a methodology, based on GA, for the selection of the optimal subset of features that can be used for the efficient classification of Event Related Potentials (ERPs), which are recorded during the observation of correct or incorrect actions. In our experiment, ERP recordings were acquired from sixteen (16) healthy volunteers who observed correct or incorrect actions of other subjects. The brain electrical activity was recorded at 47 locations on the scalp. The GA was formulated as a combinatorial optimizer for the selection of the combination of electrodes that maximizes the performance of the Fuzzy C Means (FCM) classification algorithm. In particular, during the evolution of the GA, for each candidate combination of electrodes, the well-known (Σ, Φ, Ω) features were calculated and were evaluated by means of the FCM method. The proposed methodology provided a combination of 8 electrodes, with classification accuracy 93.8%. Thus, GA can be the basis for the selection of features that discriminate ERP recordings of observations of correct or incorrect actions.
Computational approaches for understanding the diagnosis and treatment of Parkinson’s disease
Smith, Stephen L.; Lones, Michael A.; Bedder, Matthew; Alty, Jane E.; Cosgrove, Jeremy; Maguire, Richard J.; Pownall, Mary Elizabeth; Ivanoiu, Diana; Lyle, Camille; Cording, Amy; Elliott, Christopher J.H.
2015-01-01
This study describes how the application of evolutionary algorithms (EAs) can be used to study motor function in humans with Parkinson’s disease (PD) and in animal models of PD. Human data is obtained using commercially available sensors via a range of non-invasive procedures that follow conventional clinical practice. EAs can then be used to classify human data for a range of uses, including diagnosis and disease monitoring. New results are presented that demonstrate how EAs can also be used to classify fruit flies with and without genetic mutations that cause Parkinson’s by using measurements of the proboscis extension reflex. The case is made for a computational approach that can be applied across human and animal studies of PD and lays the way for evaluation of existing and new drug therapies in a truly objective way. PMID:26577157
A target recognition method for maritime surveillance radars based on hybrid ensemble selection
NASA Astrophysics Data System (ADS)
Fan, Xueman; Hu, Shengliang; He, Jingbo
2017-11-01
In order to improve the generalisation ability of the maritime surveillance radar, a novel ensemble selection technique, termed Optimisation and Dynamic Selection (ODS), is proposed. During the optimisation phase, the non-dominated sorting genetic algorithm II for multi-objective optimisation is used to find the Pareto front, i.e. a set of ensembles of classifiers representing different tradeoffs between the classification error and diversity. During the dynamic selection phase, the meta-learning method is used to predict whether a candidate ensemble is competent enough to classify a query instance based on three different aspects, namely, feature space, decision space and the extent of consensus. The classification performance and time complexity of ODS are compared against nine other ensemble methods using a self-built full polarimetric high resolution range profile data-set. The experimental results clearly show the effectiveness of ODS. In addition, the influence of the selection of diversity measures is studied concurrently.
Arrhythmia Classification Based on Multi-Domain Feature Extraction for an ECG Recognition System.
Li, Hongqiang; Yuan, Danyang; Wang, Youxi; Cui, Dianyin; Cao, Lu
2016-10-20
Automatic recognition of arrhythmias is particularly important in the diagnosis of heart diseases. This study presents an electrocardiogram (ECG) recognition system based on multi-domain feature extraction to classify ECG beats. An improved wavelet threshold method for ECG signal pre-processing is applied to remove noise interference. A novel multi-domain feature extraction method is proposed; this method employs kernel-independent component analysis in nonlinear feature extraction and uses discrete wavelet transform to extract frequency domain features. The proposed system utilises a support vector machine classifier optimized with a genetic algorithm to recognize different types of heartbeats. An ECG acquisition experimental platform, in which ECG beats are collected as ECG data for classification, is constructed to demonstrate the effectiveness of the system in ECG beat classification. The presented system, when applied to the MIT-BIH arrhythmia database, achieves a high classification accuracy of 98.8%. Experimental results based on the ECG acquisition experimental platform show that the system obtains a satisfactory classification accuracy of 97.3% and is able to classify ECG beats efficiently for the automatic identification of cardiac arrhythmias.
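The frequency-domain part of such a pipeline might look roughly like the sketch below, which extracts discrete-wavelet-transform sub-band energies with PyWavelets and feeds them to an RBF SVM; the kernel-ICA features and the genetic-algorithm search over the SVM parameters described above are omitted, and the beats are synthetic stand-ins for segmented ECG cycles.

```python
# Sketch: DWT sub-band energies as beat features, classified with an RBF SVM.
import numpy as np
import pywt
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def toy_beat(width):
    t = np.linspace(0, 1, 256)
    return np.exp(-((t - 0.5) ** 2) / width) + 0.05 * rng.normal(size=256)

def dwt_energy_features(beat, wavelet="db4", level=4):
    coeffs = pywt.wavedec(beat, wavelet, level=level)
    return [np.sum(c ** 2) for c in coeffs]          # energy per sub-band

beats = [toy_beat(0.002) for _ in range(150)] + [toy_beat(0.01) for _ in range(150)]
y = np.array([0] * 150 + [1] * 150)                  # two toy beat classes
X = np.array([dwt_energy_features(b) for b in beats])

clf = make_pipeline(StandardScaler(), SVC(C=10.0, gamma="scale"))
print("CV accuracy:", cross_val_score(clf, X, y, cv=5).mean().round(3))
```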
AUTOCLASSIFICATION OF THE VARIABLE 3XMM SOURCES USING THE RANDOM FOREST MACHINE LEARNING ALGORITHM
DOE Office of Scientific and Technical Information (OSTI.GOV)
Farrell, Sean A.; Murphy, Tara; Lo, Kitty K., E-mail: s.farrell@physics.usyd.edu.au
In the current era of large surveys and massive data sets, autoclassification of astrophysical sources using intelligent algorithms is becoming increasingly important. In this paper we present the catalog of variable sources in the Third XMM-Newton Serendipitous Source catalog (3XMM) autoclassified using the Random Forest machine learning algorithm. We used a sample of manually classified variable sources from the second data release of the XMM-Newton catalogs (2XMMi-DR2) to train the classifier, obtaining an accuracy of ∼92%. We also evaluated the effectiveness of identifying spurious detections using a sample of spurious sources, achieving an accuracy of ∼95%. Manual investigation of a random sample of classified sources confirmed these accuracy levels and showed that the Random Forest machine learning algorithm is highly effective at automatically classifying 3XMM sources. Here we present the catalog of classified 3XMM variable sources. We also present three previously unidentified unusual sources that were flagged as outlier sources by the algorithm: a new candidate supergiant fast X-ray transient, a 400 s X-ray pulsar, and an eclipsing 5 hr binary system coincident with a known Cepheid.
Hip and Wrist Accelerometer Algorithms for Free-Living Behavior Classification.
Ellis, Katherine; Kerr, Jacqueline; Godbole, Suneeta; Staudenmayer, John; Lanckriet, Gert
2016-05-01
Accelerometers are a valuable tool for objective measurement of physical activity (PA). Wrist-worn devices may improve compliance over standard hip placement, but more research is needed to evaluate their validity for measuring PA in free-living settings. Traditional cut-point methods for accelerometers can be inaccurate and need testing in free living with wrist-worn devices. In this study, we developed and tested the performance of machine learning (ML) algorithms for classifying PA types from both hip and wrist accelerometer data. Forty overweight or obese women (mean age = 55.2 ± 15.3 yr; BMI = 32.0 ± 3.7) wore two ActiGraph GT3X+ accelerometers (right hip, nondominant wrist; ActiGraph, Pensacola, FL) for seven free-living days. Wearable cameras captured ground truth activity labels. A classifier consisting of a random forest and hidden Markov model classified the accelerometer data into four activities (sitting, standing, walking/running, and riding in a vehicle). Free-living wrist and hip ML classifiers were compared with each other, with traditional accelerometer cut points, and with an algorithm developed in a laboratory setting. The ML classifier obtained average values of 89.4% and 84.6% balanced accuracy over the four activities using the hip and wrist accelerometer, respectively. In our data set with average values of 28.4 min of walking or running per day, the ML classifier predicted average values of 28.5 and 24.5 min of walking or running using the hip and wrist accelerometer, respectively. Intensity-based cut points and the laboratory algorithm significantly underestimated walking minutes. Our results demonstrate the superior performance of our PA-type classification algorithm, particularly in comparison with traditional cut points. Although the hip algorithm performed better, additional compliance achieved with wrist devices might justify using a slightly lower performing algorithm.
Muhlbaier, Michael D; Topalis, Apostolos; Polikar, Robi
2009-01-01
We have previously introduced an incremental learning algorithm, Learn++, which learns novel information from consecutive data sets by generating an ensemble of classifiers with each data set, and combining them by weighted majority voting. However, Learn++ suffers from an inherent "outvoting" problem when asked to learn a new class ω_new introduced by a subsequent data set, as earlier classifiers not trained on this class are guaranteed to misclassify ω_new instances. The collective votes of earlier classifiers, for an inevitably incorrect decision, then outweigh the votes of the new classifiers' correct decision on ω_new instances--until there are enough new classifiers to counteract the unfair outvoting. This forces Learn++ to generate an unnecessarily large number of classifiers. This paper describes Learn++.NC, specifically designed for efficient incremental learning of multiple new classes using significantly fewer classifiers. To do so, Learn++.NC introduces dynamically weighted consult and vote (DW-CAV), a novel voting mechanism for combining classifiers: individual classifiers consult with each other to determine which ones are most qualified to classify a given instance, and decide how much weight, if any, each classifier's decision should carry. Experiments on real-world problems indicate that the new algorithm performs remarkably well with substantially fewer classifiers, not only as compared to its predecessor Learn++, but also as compared to several other algorithms recently proposed for similar problems.
Welikala, R A; Fraz, M M; Dehmeshki, J; Hoppe, A; Tah, V; Mann, S; Williamson, T H; Barman, S A
2015-07-01
Proliferative diabetic retinopathy (PDR) is a condition that carries a high risk of severe visual impairment. The hallmark of PDR is the growth of abnormal new vessels. In this paper, an automated method for the detection of new vessels from retinal images is presented. This method is based on a dual classification approach. Two vessel segmentation approaches are applied to create two separate binary vessel maps, each of which holds vital information. Local morphology features are measured from each binary vessel map to produce two separate 4-D feature vectors. Independent classification is performed for each feature vector using a support vector machine (SVM) classifier. The system then combines these individual outcomes to produce a final decision. This is followed by the creation of additional features to generate 21-D feature vectors, which feed into a genetic algorithm based feature selection approach with the objective of finding feature subsets that improve the performance of the classification. Sensitivity and specificity results using a dataset of 60 images are 0.9138 and 0.9600, respectively, on a per patch basis, and 1.000 and 0.975, respectively, on a per image basis. Copyright © 2015 Elsevier Ltd. All rights reserved.
Monga, Isha; Qureshi, Abid; Thakur, Nishant; Gupta, Amit Kumar; Kumar, Manoj
2017-09-07
Allele-specific siRNAs (ASP-siRNAs) have emerged as promising therapeutic molecules owing to their selectivity to inhibit the mutant allele or associated single-nucleotide polymorphisms (SNPs) while sparing the expression of the wild-type counterpart. Thus, a dedicated bioinformatics platform encompassing updated ASP-siRNAs and an algorithm for the prediction of their inhibitory efficacy will be helpful in tackling currently intractable genetic disorders. In the present study, we have developed the ASPsiRNA resource (http://crdd.osdd.net/servers/aspsirna/) covering three components, viz. (i) ASPsiDb, (ii) ASPsiPred, and (iii) analysis tools such as ASP-siOffTar. ASPsiDb is a manually curated database harboring 4543 (including 422 chemically modified) ASP-siRNAs targeting 78 unique genes involved in 51 different diseases. It furnishes comprehensive information from experimental studies on ASP-siRNAs along with multidimensional genetic and clinical information for numerous mutations. ASPsiPred is a two-layered algorithm to predict the efficacy of ASP-siRNAs for the fully complementary mutant allele (Eff_mut) and for the wild-type allele with one mismatch (Eff_wild), by ASPsiPred-SVM and ASPsiPred-matrix, respectively. In ASPsiPred-SVM, 922 unique ASP-siRNAs with experimentally validated quantitative Eff_mut were used. During 10-fold cross-validation (10nCV) employing various sequence features on the training/testing dataset (T737), the best predictive model achieved a maximum Pearson's correlation coefficient (PCC) of 0.71. Further, the accuracy of the classifier to predict Eff_mut against novel genes was assessed by a leave-one-target-out cross-validation approach (LOTOCV). ASPsiPred-matrix was constructed from rule-based studies describing the effect of single siRNA:mRNA mismatches on efficacy at 19 different locations of the siRNA. Thus, ASPsiRNA encompasses the first database, prediction algorithm, and off-target analysis tool that is expected to accelerate research in the field of RNAi-based therapeutics for human genetic diseases. Copyright © 2017 Monga et al.
NASA Astrophysics Data System (ADS)
Tewari, Jagdish C.; Dixit, Vivechana; Cho, Byoung-Kwan; Malik, Kamal A.
2008-12-01
The capacity to confirm the variety or origin and the estimation of sucrose, glucose, fructose of the citrus fruits are major interests of citrus juice industry. A rapid classification and quantification technique was developed and validated for simultaneous and nondestructive quantifying the sugar constituent's concentrations and the origin of citrus fruits using Fourier Transform Near-Infrared (FT-NIR) spectroscopy in conjunction with Artificial Neural Network (ANN) using genetic algorithm, Chemometrics and Correspondences Analysis (CA). To acquire good classification accuracy and to present a wide range of concentration of sucrose, glucose and fructose, we have collected 22 different varieties of citrus fruits from the market during the entire season of citruses. FT-NIR spectra were recorded in the NIR region from 1100 to 2500 nm using the fiber optic probe and three types of data analysis were performed. Chemometrics analysis using Partial Least Squares (PLS) was performed in order to determine the concentration of individual sugars. Artificial Neural Network analysis was performed for classification, origin or variety identification of citrus fruits using genetic algorithm. Correspondence analysis was performed in order to visualize the relationship between the citrus fruits. To compute a PLS model based upon the reference values and to validate the developed method, high performance liquid chromatography (HPLC) was performed. Spectral range and the number of PLS factors were optimized for the lowest standard error of calibration (SEC), prediction (SEP) and correlation coefficient ( R2). The calibration model developed was able to assess the sucrose, glucose and fructose contents in unknown citrus fruit up to an R2 value of 0.996-0.998. Numbers of factors from F1 to F10 were optimized for correspondence analysis for relationship visualization of citrus fruits based on the output values of genetic algorithm. ANN and CA analysis showed excellent classification of citrus according to the variety to which they belong and well-classified citrus according to their origin. The technique has potential in rapid determination of sugars content and to identify different varieties and origins of citrus in citrus juice industry.
Optimal Design of Passive Power Filters Based on Pseudo-parallel Genetic Algorithm
NASA Astrophysics Data System (ADS)
Li, Pei; Li, Hongbo; Gao, Nannan; Niu, Lin; Guo, Liangfeng; Pei, Ying; Zhang, Yanyan; Xu, Minmin; Chen, Kerui
2017-05-01
The economic costs together with filter efficiency are taken as targets to optimize the parameters of the passive filter. Furthermore, a method combining the pseudo-parallel genetic algorithm with an adaptive genetic algorithm is adopted in this paper. In the early stages, the pseudo-parallel genetic algorithm is introduced to increase population diversity, and the adaptive genetic algorithm is used in the late stages to reduce the workload. At the same time, the migration rate of the pseudo-parallel genetic algorithm is improved so that it changes adaptively with population diversity. Simulation results show that the filter designed by the proposed method has a better filtering effect with lower economic cost, and can be used in engineering.
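A minimal sketch of the island-model ("pseudo-parallel") idea described above, with a migration rate that adapts to population diversity. The cost function, island count, and weighting are illustrative assumptions, not the authors' passive-filter model.

```python
# Hedged sketch: island-model GA with diversity-adaptive migration.
# The objective below is a placeholder, not the paper's filter cost model.
import numpy as np

rng = np.random.default_rng(0)

def cost(x):                                            # placeholder objective (minimize)
    return np.sum((x - 0.3) ** 2) + 0.1 * np.sum(np.sin(8 * x) ** 2)

def evolve(pop, mut=0.1):
    fit = np.array([cost(ind) for ind in pop])
    parents = pop[np.argsort(fit)[: len(pop) // 2]]     # truncation selection
    kids = parents + rng.normal(0, mut, parents.shape)  # Gaussian mutation
    return np.vstack([parents, kids])

islands = [rng.random((20, 4)) for _ in range(4)]       # 4 sub-populations
for gen in range(100):
    islands = [evolve(p) for p in islands]
    diversity = np.mean([p.std() for p in islands])
    migration_rate = float(np.clip(0.3 - diversity, 0.05, 0.3))   # adapt to diversity
    if rng.random() < migration_rate:                   # migrate best individuals ring-wise
        for i, p in enumerate(islands):
            best = p[np.argmin([cost(ind) for ind in p])]
            islands[(i + 1) % len(islands)][-1] = best

best = min((ind for p in islands for ind in p), key=cost)
print("best cost:", round(cost(best), 4))
```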
Yu, Sheng; Liao, Katherine P; Shaw, Stanley Y; Gainer, Vivian S; Churchill, Susanne E; Szolovits, Peter; Murphy, Shawn N; Kohane, Isaac S; Cai, Tianxi
2015-09-01
Analysis of narrative (text) data from electronic health records (EHRs) can improve population-scale phenotyping for clinical and genetic research. Currently, selection of text features for phenotyping algorithms is slow and laborious, requiring extensive and iterative involvement by domain experts. This paper introduces a method to develop phenotyping algorithms in an unbiased manner by automatically extracting and selecting informative features, which can be comparable to expert-curated ones in classification accuracy. Comprehensive medical concepts were collected from publicly available knowledge sources in an automated, unbiased fashion. Natural language processing (NLP) revealed the occurrence patterns of these concepts in EHR narrative notes, which enabled selection of informative features for phenotype classification. When combined with additional codified features, a penalized logistic regression model was trained to classify the target phenotype. The authors applied this method to develop algorithms to identify patients with rheumatoid arthritis, and coronary artery disease cases among those with rheumatoid arthritis, from a large multi-institutional EHR. The area under the receiver operating characteristic curves (AUC) for classifying RA and CAD using models trained with automated features were 0.951 and 0.929, respectively, compared to the AUCs of 0.938 and 0.929 by models trained with expert-curated features. Models trained with NLP text features selected through an unbiased, automated procedure achieved comparable or slightly higher accuracy than those trained with expert-curated features. The majority of the selected model features were interpretable. The proposed automated feature extraction method, generating highly accurate phenotyping algorithms with improved efficiency, is a significant step toward high-throughput phenotyping. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
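A minimal sketch of the final modeling step described above: a penalized (L1) logistic regression over automatically extracted concept counts combined with codified features. The data below are synthetic stand-ins, not EHR data, and the feature counts are assumptions.

```python
# Hedged sketch: penalized logistic regression over NLP + codified features.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n_patients, n_nlp, n_codes = 500, 40, 10
X = np.hstack([rng.poisson(1.0, (n_patients, n_nlp)),        # NLP concept counts (synthetic)
               rng.poisson(2.0, (n_patients, n_codes))])     # codified features, e.g. billing codes
w = rng.normal(0, 1, X.shape[1]) * (rng.random(X.shape[1]) < 0.2)   # sparse true signal
y = ((X - X.mean(axis=0)) @ w + rng.normal(0, 1, n_patients) > 0).astype(int)  # phenotype label

clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)   # L1 penalty selects features
auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
print("cross-validated AUC:", auc.mean().round(3))
```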
Takenouchi, Toshiki; Kuchikata, Tomu; Yoshihashi, Hiroshi; Fujiwara, Mineko; Uehara, Tomoko; Miyama, Sahoko; Yamada, Shiro; Kosaki, Kenjiro
2017-05-01
Among more than 5,000 human monogenic disorders with known causative genes, transposable element insertion of a Long Interspersed Nuclear Element 1 (LINE1, L1) is known as the mechanistic basis in only 13 genetic conditions. Meckel-Gruber syndrome is a rare ciliopathy characterized by occipital encephalocele and cystic kidney disease. Here, we document a boy with occipital encephalocele, post-axial polydactyly, and multicystic renal disease. A medical exome analysis detected a heterozygous frameshift mutation, c.4582_4583delCG p.(Arg1528Serfs*17) in CC2D2A in the maternally derived allele. The further use of a dedicated bioinformatics algorithm for detecting retrotransposon insertions led to the detection of an L1 insertion affecting exon 7 in the paternally derived allele. The complete sequencing and sequence homology analysis of the inserted L1 element showed that the L1 element was classified as L1HS (L1 human specific) and that the element had intact open reading frames in the two L1-encoded proteins. This observation ranks Meckel-Gruber syndrome as only the 14th disorder to be caused by an L1 insertion among more than 5,000 known human genetic disorders. Although a transposable element detection algorithm is not included in the current best-practice next-generation sequencing analysis, the present observation illustrates the utility of such an algorithm, which would require modest computational time and resources. Whether the seemingly infrequent recognition of L1 insertion in the pathogenesis of human genetic diseases might simply reflect a lack of appropriate detection methods remains to be seen. © 2017 Wiley Periodicals, Inc.
Hosseinifard, Behshad; Moradi, Mohammad Hassan; Rostami, Reza
2013-03-01
Diagnosing depression in the early curable stages is very important and may even save the life of a patient. In this paper, we study nonlinear analysis of the EEG signal for discriminating depression patients and normal controls. Forty-five unmedicated depressed patients and 45 normal subjects participated in this study. The power of four EEG bands and four nonlinear features, including detrended fluctuation analysis (DFA), Higuchi fractal dimension, correlation dimension and Lyapunov exponent, were extracted from the EEG signal. For discriminating the two groups, k-nearest neighbor, linear discriminant analysis and logistic regression are then used as classifiers. The highest classification accuracy of 83.3% among the individual nonlinear features is obtained by the correlation dimension with the LR classifier. For further improvement, all nonlinear features are combined and applied to the classifiers. A classification accuracy of 90% is achieved by all nonlinear features and the LR classifier. In all experiments, a genetic algorithm is employed to select the most important features. The proposed technique is compared and contrasted with other reported methods, and it is demonstrated that by combining nonlinear features, the performance is enhanced. This study shows that nonlinear analysis of EEG can be a useful method for discriminating depressed patients and normal subjects. It is suggested that this analysis may be a complementary tool to help psychiatrists diagnose depressed patients. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
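A hedged sketch of the wrapper idea described above: a small genetic algorithm searches binary feature masks, scoring each mask by the cross-validated accuracy of a logistic-regression classifier. The random features stand in for the band-power and nonlinear EEG features; population size and rates are assumptions.

```python
# Hedged sketch: GA feature selection wrapped around a logistic-regression classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.normal(size=(90, 12))                          # 90 subjects x 12 features (synthetic)
y = (X[:, 0] - X[:, 3] + rng.normal(0, 1, 90) > 0).astype(int)

def fitness(mask):
    if not mask.any():
        return 0.0
    clf = LogisticRegression(max_iter=500)
    return cross_val_score(clf, X[:, mask], y, cv=5).mean()

pop = rng.random((20, X.shape[1])) < 0.5               # initial population of feature masks
for gen in range(30):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[-10:]]            # keep the best half
    cut = rng.integers(1, X.shape[1], 10)
    kids = np.array([np.r_[parents[i][:c], parents[(i + 1) % 10][c:]]
                     for i, c in enumerate(cut)])      # one-point crossover
    kids ^= rng.random(kids.shape) < 0.05              # bit-flip mutation
    pop = np.vstack([parents, kids])

best = pop[np.argmax([fitness(m) for m in pop])]
print("selected features:", np.flatnonzero(best), "accuracy:", round(fitness(best), 3))
```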
Mahrooghy, Majid; Ashraf, Ahmed B; Daye, Dania; McDonald, Elizabeth S; Rosen, Mark; Mies, Carolyn; Feldman, Michael; Kontos, Despina
2015-06-01
Heterogeneity in cancer can affect response to therapy and patient prognosis. Histologic measures have classically been used to measure heterogeneity, although a reliable noninvasive measurement is needed both to establish baseline risk of recurrence and monitor response to treatment. Here, we propose using spatiotemporal wavelet kinetic features from dynamic contrast-enhanced magnetic resonance imaging to quantify intratumor heterogeneity in breast cancer. Tumor pixels are first partitioned into homogeneous subregions using pharmacokinetic measures. Heterogeneity wavelet kinetic (HetWave) features are then extracted from these partitions to obtain spatiotemporal patterns of the wavelet coefficients and the contrast agent uptake. The HetWave features are evaluated in terms of their prognostic value using a logistic regression classifier with genetic algorithm wrapper-based feature selection to classify breast cancer recurrence risk as determined by a validated gene expression assay. Receiver operating characteristic analysis and area under the curve (AUC) are computed to assess classifier performance using leave-one-out cross validation. The HetWave features outperform other commonly used features (AUC = 0.88 HetWave versus 0.70 standard features). The combination of HetWave and standard features further increases classifier performance (AUCs 0.94). The rate of the spatial frequency pattern over the pharmacokinetic partitions can provide valuable prognostic information. HetWave could be a powerful feature extraction approach for characterizing tumor heterogeneity, providing valuable prognostic information.
NASA Technical Reports Server (NTRS)
Rogers, David
1991-01-01
G/SPLINES is a hybrid of Friedman's Multivariate Adaptive Regression Splines (MARS) algorithm with Holland's genetic algorithm. In this hybrid, the incremental search is replaced by a genetic search. The G/SPLINE algorithm exhibits performance comparable to that of the MARS algorithm, requires fewer least-squares computations, and allows significantly larger problems to be considered.
Efficient Fingercode Classification
NASA Astrophysics Data System (ADS)
Sun, Hong-Wei; Law, Kwok-Yan; Gollmann, Dieter; Chung, Siu-Leung; Li, Jian-Bin; Sun, Jia-Guang
In this paper, we present an efficient fingerprint classification algorithm which is an essential component in many critical security application systems, e.g., systems in the e-government and e-finance domains. Fingerprint identification is one of the most important security requirements in homeland security systems such as personnel screening and anti-money laundering. The problem of fingerprint identification involves searching (matching) the fingerprint of a person against each of the fingerprints of all registered persons. To enhance performance and reliability, a common approach is to reduce the search space by first classifying the fingerprints and then performing the search in the respective class. Jain et al. proposed a fingerprint classification algorithm based on a two-stage classifier, which uses a K-nearest neighbor classifier in its first stage. The fingerprint classification algorithm is based on the fingercode representation, which is an encoding of fingerprints that has been demonstrated to be an effective fingerprint biometric scheme because of its ability to capture both local and global details in a fingerprint image. We enhance this approach by improving the efficiency of the K-nearest neighbor classifier for fingercode-based fingerprint classification. Our research first investigates the various fast search algorithms in vector quantization (VQ) and their potential application in fingerprint classification, and then proposes two efficient algorithms based on the pyramid-based search algorithms in VQ. Experimental results on DB1 of FVC 2004 demonstrate that our algorithms can outperform the full search algorithm and the original pyramid-based search algorithms in terms of computational efficiency without sacrificing accuracy.
Comparison of genetic algorithms with conjugate gradient methods
NASA Technical Reports Server (NTRS)
Bosworth, J. L.; Foo, N. Y.; Zeigler, B. P.
1972-01-01
Genetic algorithms for mathematical function optimization are modeled on search strategies employed in natural adaptation. Comparisons of genetic algorithms with conjugate gradient methods, which were made on an IBM 1800 digital computer, show that genetic algorithms display superior performance over gradient methods for functions which are poorly behaved mathematically, for multimodal functions, and for functions obscured by additive random noise. Genetic methods offer performance comparable to gradient methods for many of the standard functions.
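An illustrative sketch of the comparison described above: a conjugate-gradient optimizer versus a simple real-coded genetic algorithm on a multimodal test function (Rastrigin). This reproduces only the qualitative point about local versus global search, not the 1972 experiments.

```python
# Hedged sketch: conjugate gradient vs. a simple GA on a multimodal function.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)

def rastrigin(x):
    return 10 * len(x) + np.sum(x**2 - 10 * np.cos(2 * np.pi * x))

# Conjugate gradient from a random start: prone to the nearest local minimum.
x0 = rng.uniform(-5, 5, 5)
cg = minimize(rastrigin, x0, method="CG")

# Simple real-coded GA: truncation selection plus Gaussian mutation.
pop = rng.uniform(-5, 5, (60, 5))
for gen in range(300):
    fit = np.array([rastrigin(ind) for ind in pop])
    parents = pop[np.argsort(fit)[:30]]
    kids = parents + rng.normal(0, 0.3, parents.shape)
    pop = np.vstack([parents, kids])

ga_best = min(pop, key=rastrigin)
print("CG value:", round(cg.fun, 2), "GA value:", round(rastrigin(ga_best), 2))
```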
Feature generation using genetic programming with application to fault classification.
Guo, Hong; Jack, Lindsay B; Nandi, Asoke K
2005-02-01
One of the major challenges in pattern recognition problems is the feature extraction process, which derives new features from existing features, or directly from raw data, in order to reduce the cost of computation during the classification process, while improving classifier efficiency. Most current feature extraction techniques transform the original pattern vector into a new vector with increased discrimination capability but lower dimensionality. This is conducted within a predefined feature space and thus has limited searching power. Genetic programming (GP) can generate new features from the original dataset without prior knowledge of the probabilistic distribution. In this paper, a GP-based approach is developed for feature extraction from raw vibration data recorded from a rotating machine with six different conditions. The created features are then used as the inputs to a neural classifier for the identification of six bearing conditions. Experimental results demonstrate the ability of GP to discover automatically the different bearing conditions using features expressed in the form of nonlinear functions. Furthermore, four sets of results--using GP extracted features with artificial neural networks (ANN) and support vector machines (SVM), as well as traditional features with ANN and SVM--have been obtained. This GP-based approach is used for bearing fault classification for the first time and exhibits superior searching power over other techniques. Additionally, it significantly reduces the computation time compared with the genetic algorithm (GA) and therefore makes the solution more practical to realize.
Software For Genetic Algorithms
NASA Technical Reports Server (NTRS)
Wang, Lui; Bayer, Steve E.
1992-01-01
SPLICER computer program is genetic-algorithm software tool used to solve search and optimization problems. Provides underlying framework and structure for building genetic-algorithm application program. Written in Think C.
Recognition of pornographic web pages by classifying texts and images.
Hu, Weiming; Wu, Ou; Chen, Zhouyao; Fu, Zhouyu; Maybank, Steve
2007-06-01
With the rapid development of the World Wide Web, people benefit more and more from the sharing of information. However, Web pages with obscene, harmful, or illegal content can be easily accessed. It is important to recognize such unsuitable, offensive, or pornographic Web pages. In this paper, a novel framework for recognizing pornographic Web pages is described. A C4.5 decision tree is used to divide Web pages, according to content representations, into continuous text pages, discrete text pages, and image pages. These three categories of Web pages are handled, respectively, by a continuous text classifier, a discrete text classifier, and an algorithm that fuses the results from the image classifier and the discrete text classifier. In the continuous text classifier, statistical and semantic features are used to recognize pornographic texts. In the discrete text classifier, the naive Bayes rule is used to calculate the probability that a discrete text is pornographic. In the image classifier, the object's contour-based features are extracted to recognize pornographic images. In the text and image fusion algorithm, the Bayes theory is used to combine the recognition results from images and texts. Experimental results demonstrate that the continuous text classifier outperforms the traditional keyword-statistics-based classifier, the contour-based image classifier outperforms the traditional skin-region-based image classifier, the results obtained by our fusion algorithm outperform those by either of the individual classifiers, and our framework can be adapted to different categories of Web pages.
A Random Forest-based ensemble method for activity recognition.
Feng, Zengtao; Mo, Lingfei; Li, Meng
2015-01-01
This paper presents a multi-sensor ensemble approach to human physical activity (PA) recognition using random forest. We designed an ensemble learning algorithm which integrates several independent Random Forest classifiers based on different sensor feature sets to build a more stable, more accurate and faster classifier for human activity recognition. To evaluate the algorithm, PA data collected from PAMAP (Physical Activity Monitoring for Aging People), a standard, publicly available database, were utilized for training and testing. The experimental results show that the algorithm is able to correctly recognize 19 PA types with an accuracy of 93.44%, while training is faster than that of other methods. The ensemble classifier system based on the RF (Random Forest) algorithm can achieve high recognition accuracy and fast calculation.
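A hedged sketch of the ensemble idea described above: independent Random Forest classifiers trained on different sensor feature subsets, combined by majority voting. Synthetic data stand in for the PAMAP recordings; the per-sensor feature grouping and forest sizes are assumptions.

```python
# Hedged sketch: one Random Forest per sensor feature set, combined by majority vote.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
X = rng.normal(size=(600, 30))                                    # 600 windows x 30 features
y = np.digitize(X[:, 0] + X[:, 12] + X[:, 25], [-1.5, -0.5, 0.5, 1.5])  # 5 synthetic activity classes
feature_sets = [slice(0, 10), slice(10, 20), slice(20, 30)]       # e.g. per-sensor feature groups

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
forests = [RandomForestClassifier(n_estimators=100, random_state=i).fit(X_tr[:, fs], y_tr)
           for i, fs in enumerate(feature_sets)]

votes = np.stack([rf.predict(X_te[:, fs]) for rf, fs in zip(forests, feature_sets)])
majority = np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)
print("ensemble accuracy:", (majority == y_te).mean().round(3))
```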
A new approach to human microRNA target prediction using ensemble pruning and rotation forest.
Mousavi, Reza; Eftekhari, Mahdi; Haghighi, Mehdi Ghezelbash
2015-12-01
MicroRNAs (miRNAs) are small non-coding RNAs that have important functions in gene regulation. Since finding miRNA targets experimentally is costly and time-consuming, the use of machine learning methods is a growing research area for miRNA target prediction. In this paper, a new approach is proposed by using two popular ensemble strategies, i.e. Ensemble Pruning and Rotation Forest (EP-RTF), to predict human miRNA targets. For EP, the approach utilizes a Genetic Algorithm (GA). In other words, a subset of classifiers from the heterogeneous ensemble is first selected by the GA. Next, the selected classifiers are trained based on the RTF method and then combined using weighted majority voting. In addition to seeking a better subset of classifiers, the parameter of RTF is also optimized by the GA. Findings of the present study confirm that the newly developed EP-RTF outperforms (in terms of classification accuracy, sensitivity, and specificity) the previously applied methods over four datasets in the field of human miRNA targets. Diversity-error diagrams reveal that the proposed ensemble approach constructs individual classifiers which are more accurate, and usually more diverse, than those of the other ensemble approaches. Given these experimental results, we highly recommend EP-RTF for improving the performance of miRNA target prediction.
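A hedged sketch of GA-based ensemble pruning as described above: a genetic algorithm searches for a subset of heterogeneous base classifiers whose combined vote maximizes validation accuracy. The Rotation Forest stage and weighted voting are omitted here (a plain majority vote is used), and the classifier pool and data are illustrative assumptions.

```python
# Hedged sketch: GA searches for a subset of base classifiers (ensemble pruning).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(5)
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.4, random_state=0)

pool = [LogisticRegression(max_iter=500), GaussianNB(),
        DecisionTreeClassifier(max_depth=3), DecisionTreeClassifier(max_depth=None),
        KNeighborsClassifier(3), KNeighborsClassifier(9)]
preds = np.stack([clf.fit(X_tr, y_tr).predict(X_va) for clf in pool])

def fitness(mask):
    if not mask.any():
        return 0.0
    vote = (preds[mask].mean(axis=0) > 0.5).astype(int)   # majority vote of selected members
    return (vote == y_va).mean()

pop = rng.random((16, len(pool))) < 0.5
for gen in range(40):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[-8:]]
    kids = parents[rng.permutation(8)] ^ (rng.random((8, len(pool))) < 0.15)  # mutate shuffled parents
    pop = np.vstack([parents, kids])

best = pop[np.argmax([fitness(m) for m in pop])]
print("selected classifiers:", np.flatnonzero(best), "validation accuracy:", round(fitness(best), 3))
```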
An assessment of support vector machines for land cover classification
Huang, C.; Davis, L.S.; Townshend, J.R.G.
2002-01-01
The support vector machine (SVM) is a group of theoretically superior machine learning algorithms. It was found competitive with the best available machine learning algorithms in classifying high-dimensional data sets. This paper gives an introduction to the theoretical development of the SVM and an experimental evaluation of its accuracy, stability and training speed in deriving land cover classifications from satellite images. The SVM was compared to three other popular classifiers, including the maximum likelihood classifier (MLC), neural network classifiers (NNC) and decision tree classifiers (DTC). The impacts of kernel configuration on the performance of the SVM and of the selection of training data and input variables on the four classifiers were also evaluated in this experiment.
New knowledge-based genetic algorithm for excavator boom structural optimization
NASA Astrophysics Data System (ADS)
Hua, Haiyan; Lin, Shuwen
2014-03-01
Due to the insufficiency of utilizing knowledge to guide the complex optimal searching, existing genetic algorithms fail to effectively solve excavator boom structural optimization problem. To improve the optimization efficiency and quality, a new knowledge-based real-coded genetic algorithm is proposed. A dual evolution mechanism combining knowledge evolution with genetic algorithm is established to extract, handle and utilize the shallow and deep implicit constraint knowledge to guide the optimal searching of genetic algorithm circularly. Based on this dual evolution mechanism, knowledge evolution and population evolution can be connected by knowledge influence operators to improve the configurability of knowledge and genetic operators. Then, the new knowledge-based selection operator, crossover operator and mutation operator are proposed to integrate the optimal process knowledge and domain culture to guide the excavator boom structural optimization. Eight kinds of testing algorithms, which include different genetic operators, are taken as examples to solve the structural optimization of a medium-sized excavator boom. By comparing the results of optimization, it is shown that the algorithm including all the new knowledge-based genetic operators can more remarkably improve the evolutionary rate and searching ability than other testing algorithms, which demonstrates the effectiveness of knowledge for guiding optimal searching. The proposed knowledge-based genetic algorithm by combining multi-level knowledge evolution with numerical optimization provides a new effective method for solving the complex engineering optimization problem.
Data Compression Techniques for Maps
1989-01-01
Lempel-Ziv compression is applied to the classified and unclassified images as well as to the output of the compression algorithms. The algorithms ... resulted in a compression of 7:1. The output of the quadtree coding algorithm was then compressed using Lempel-Ziv coding. The compression ratio achieved ... using Lempel-Ziv coding. The unclassified image gave a compression ratio of only 1.4:1. The K-means classified image
Textual and visual content-based anti-phishing: a Bayesian approach.
Zhang, Haijun; Liu, Gang; Chow, Tommy W S; Liu, Wenyin
2011-10-01
A novel framework using a Bayesian approach for content-based phishing web page detection is presented. Our model takes into account textual and visual contents to measure the similarity between the protected web page and suspicious web pages. A text classifier, an image classifier, and an algorithm fusing the results from classifiers are introduced. An outstanding feature of this paper is the exploration of a Bayesian model to estimate the matching threshold. This is required in the classifier for determining the class of the web page and identifying whether the web page is phishing or not. In the text classifier, the naive Bayes rule is used to calculate the probability that a web page is phishing. In the image classifier, the earth mover's distance is employed to measure the visual similarity, and our Bayesian model is designed to determine the threshold. In the data fusion algorithm, the Bayes theory is used to synthesize the classification results from textual and visual content. The effectiveness of our proposed approach was examined in a large-scale dataset collected from real phishing cases. Experimental results demonstrated that the text classifier and the image classifier we designed deliver promising results, the fusion algorithm outperforms either of the individual classifiers, and our model can be adapted to different phishing cases. © 2011 IEEE
Paradigms for machine learning
NASA Technical Reports Server (NTRS)
Schlimmer, Jeffrey C.; Langley, Pat
1991-01-01
Five paradigms are described for machine learning: connectionist (neural network) methods, genetic algorithms and classifier systems, empirical methods for inducing rules and decision trees, analytic learning methods, and case-based approaches. Some dimensions along which these paradigms vary in their approach to learning are considered, and the basic methods used within each framework are reviewed, together with open research issues. It is argued that the similarities among the paradigms are more important than their differences, and that future work should attempt to bridge the existing boundaries. Finally, some recent developments in the field of machine learning are discussed, and their impact on both research and applications is examined.
Ensemble of hybrid genetic algorithm for two-dimensional phase unwrapping
NASA Astrophysics Data System (ADS)
Balakrishnan, D.; Quan, C.; Tay, C. J.
2013-06-01
Phase unwrapping is the final and trickiest step in any phase retrieval technique. Phase unwrapping by artificial intelligence methods (optimization algorithms) such as the hybrid genetic algorithm, reverse simulated annealing, particle swarm optimization, and minimum cost matching has shown better results than conventional phase unwrapping methods. In this paper, an ensemble of hybrid genetic algorithms with parallel populations is proposed to solve the branch-cut phase unwrapping problem. In a single-population hybrid genetic algorithm, the selection, crossover and mutation operators are applied to obtain a new population in every generation. The parameters and choice of operators will affect the performance of the hybrid genetic algorithm. The ensemble of hybrid genetic algorithms makes it possible to use different parameter sets and different choices of operators simultaneously. Each population uses a different set of parameters, and the offspring of each population compete against the offspring of all other populations, which use different sets of parameters. The effectiveness of the proposed algorithm is demonstrated by phase unwrapping examples, and the advantages of the proposed method are discussed.
Mobile robot dynamic path planning based on improved genetic algorithm
NASA Astrophysics Data System (ADS)
Wang, Yong; Zhou, Heng; Wang, Ying
2017-08-01
In a dynamic unknown environment, dynamic path planning for mobile robots is a difficult problem. In this paper, a dynamic path planning method based on a genetic algorithm is proposed; a reward value model is designed to estimate the probability of dynamic obstacles on the path, and the reward value function is applied within the genetic algorithm. Unique coding techniques reduce the computational complexity of the algorithm. The fitness function of the genetic algorithm fully considers three factors: the security of the path, the shortest distance of the path and the reward value of the path. The simulation results show that the proposed genetic algorithm is efficient in all kinds of complex dynamic environments.
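A hedged sketch of the fitness idea described above: candidate paths are scored by a weighted combination of path length, a safety penalty for passing near obstacles, and a reward-value term standing in for the probability of dynamic obstacles. The waypoint encoding, obstacle positions, and weights are illustrative assumptions, not the paper's coding scheme.

```python
# Hedged sketch: GA over waypoint paths with a length + safety + reward-value fitness.
import numpy as np

rng = np.random.default_rng(6)
start, goal = np.array([0.0, 0.0]), np.array([10.0, 10.0])
obstacles = np.array([[4.0, 4.0], [6.0, 7.0]])
reward_map = lambda p: np.exp(-np.linalg.norm(p - np.array([8.0, 2.0])) / 3)  # dynamic-obstacle risk proxy

def fitness(waypoints):
    path = np.vstack([start, waypoints, goal])
    length = np.sum(np.linalg.norm(np.diff(path, axis=0), axis=1))
    safety = sum(np.exp(-np.linalg.norm(p - o)) for p in path for o in obstacles)
    risk = sum(reward_map(p) for p in path)
    return length + 20 * safety + 5 * risk              # lower is better

pop = rng.uniform(0, 10, (40, 4, 2))                    # 4 intermediate waypoints per candidate path
for gen in range(200):
    scores = np.array([fitness(w) for w in pop])
    parents = pop[np.argsort(scores)[:20]]              # truncation selection
    kids = parents + rng.normal(0, 0.4, parents.shape)  # Gaussian mutation of waypoints
    pop = np.vstack([parents, kids])

best = pop[np.argmin([fitness(w) for w in pop])]
print("best path cost:", round(fitness(best), 2))
```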
Urbanowicz, Ryan J.; Granizo-Mackenzie, Ambrose; Moore, Jason H.
2014-01-01
Michigan-style learning classifier systems (M-LCSs) represent an adaptive and powerful class of evolutionary algorithms which distribute the learned solution over a sizable population of rules. However their application to complex real world data mining problems, such as genetic association studies, has been limited. Traditional knowledge discovery strategies for M-LCS rule populations involve sorting and manual rule inspection. While this approach may be sufficient for simpler problems, the confounding influence of noise and the need to discriminate between predictive and non-predictive attributes calls for additional strategies. Additionally, tests of significance must be adapted to M-LCS analyses in order to make them a viable option within fields that require such analyses to assess confidence. In this work we introduce an M-LCS analysis pipeline that combines uniquely applied visualizations with objective statistical evaluation for the identification of predictive attributes, and reliable rule generalizations in noisy single-step data mining problems. This work considers an alternative paradigm for knowledge discovery in M-LCSs, shifting the focus from individual rules to a global, population-wide perspective. We demonstrate the efficacy of this pipeline applied to the identification of epistasis (i.e., attribute interaction) and heterogeneity in noisy simulated genetic association data. PMID:25431544
Yook, Sunhyun; Nam, Kyoung Won; Kim, Heepyung; Hong, Sung Hwa; Jang, Dong Pyo; Kim, In Young
2015-04-01
In order to provide more consistent sound intelligibility for the hearing-impaired person, regardless of environment, it is necessary to adjust the setting of the hearing-support (HS) device to accommodate various environmental circumstances. In this study, a fully automatic HS device management algorithm that can adapt to various environmental situations is proposed; it is composed of a listening-situation classifier, a noise-type classifier, an adaptive noise-reduction algorithm, and a management algorithm that can selectively turn on/off one or more of the three basic algorithms (beamforming, noise-reduction, and feedback cancellation) and can also adjust internal gains and parameters of the wide-dynamic-range compression (WDRC) and noise-reduction (NR) algorithms in accordance with variations in environmental situations. Experimental results demonstrated that the implemented algorithms can classify both listening situations and ambient noise types with high accuracies (92.8-96.4% and 90.9-99.4%, respectively), and the gains and parameters of the WDRC and NR algorithms were successfully adjusted according to variations in environmental situation. The average values of the signal-to-noise ratio (SNR), frequency-weighted segmental SNR, Perceptual Evaluation of Speech Quality, and mean opinion test scores of 10 normal-hearing volunteers for the adaptive multiband spectral subtraction (MBSS) algorithm were improved by 1.74 dB, 2.11 dB, 0.49, and 0.68, respectively, compared to the conventional fixed-parameter MBSS algorithm. These results indicate that the proposed environment-adaptive management algorithm can be applied to HS devices to improve sound intelligibility for hearing-impaired individuals in various acoustic environments. Copyright © 2014 International Center for Artificial Organs and Transplantation and Wiley Periodicals, Inc.
An Efficient Rank Based Approach for Closest String and Closest Substring
2012-01-01
This paper aims to present a new genetic approach that uses rank distance for solving two known NP-hard problems, and to compare rank distance with other distance measures for strings. The two NP-hard problems we are trying to solve are closest string and closest substring. For each problem we build a genetic algorithm and we describe the genetic operations involved. Both genetic algorithms use a fitness function based on rank distance. We compare our algorithms with other genetic algorithms that use different distance measures, such as Hamming distance or Levenshtein distance, on real DNA sequences. Our experiments show that the genetic algorithms based on rank distance have the best results. PMID:22675483
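An illustrative sketch of a genetic algorithm for the closest-string problem as described above. For simplicity the fitness below uses Hamming distance as a stand-in for the rank distance used in the paper; the GA structure (truncation selection, one-point crossover, point mutation over strings) is the same, and the string set is synthetic.

```python
# Hedged sketch: GA for closest string, with Hamming distance standing in for rank distance.
import numpy as np

rng = np.random.default_rng(7)
ALPHABET = np.array(list("ACGT"))
strings = rng.choice(4, size=(8, 30))                  # 8 DNA-like strings of length 30 (as indices)

def fitness(candidate):                                # maximum Hamming distance to any input string
    return max((candidate != s).sum() for s in strings)

pop = rng.choice(4, size=(50, 30))
for gen in range(300):
    scores = np.array([fitness(c) for c in pop])
    parents = pop[np.argsort(scores)[:25]]             # minimize the maximum distance
    cut = rng.integers(1, 30, 25)
    kids = np.array([np.r_[parents[i][:c], parents[(i + 1) % 25][c:]]
                     for i, c in enumerate(cut)])      # one-point crossover
    mutate = rng.random(kids.shape) < 0.02
    kids[mutate] = rng.choice(4, mutate.sum())         # point mutation
    pop = np.vstack([parents, kids])

best = min(pop, key=fitness)
print("closest string:", "".join(ALPHABET[best]), "max distance:", fitness(best))
```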
Stroke subtyping for genetic association studies? A comparison of the CCS and TOAST classifications.
Lanfranconi, Silvia; Markus, Hugh S
2013-12-01
A reliable and reproducible classification system of stroke subtype is essential for epidemiological and genetic studies. The Causative Classification of Stroke system is an evidence-based computerized algorithm with excellent inter-rater reliability. It has been suggested that, compared to the Trial of ORG 10172 in Acute Stroke Treatment classification, it increases the proportion of cases with a defined subtype, which may increase power in genetic association studies. We compared the Trial of ORG 10172 in Acute Stroke Treatment and Causative Classification of Stroke system classifications in a large cohort of well-phenotyped stroke patients. Six hundred ninety consecutively recruited patients with first-ever ischemic stroke were classified, using review of clinical data and original imaging, according to the Trial of ORG 10172 in Acute Stroke Treatment and Causative Classification of Stroke system classifications. There was excellent agreement between the subtypes assigned by the Trial of ORG 10172 in Acute Stroke Treatment and the Causative Classification of Stroke system (kappa = 0·85). The agreement was excellent for the major individual subtypes: large artery atherosclerosis kappa = 0·888, small-artery occlusion kappa = 0·869, cardiac embolism kappa = 0·89, and undetermined category kappa = 0·884. There was only moderate agreement (kappa = 0·41) for subjects with at least two competing underlying mechanisms. Thirty-five (5·8%) patients classified as undetermined by the Trial of ORG 10172 in Acute Stroke Treatment were assigned to a definite subtype by the Causative Classification of Stroke system. Thirty-two subjects assigned to a definite subtype by the Trial of ORG 10172 in Acute Stroke Treatment were classified as undetermined by the Causative Classification of Stroke system. There is excellent agreement between classification using the Trial of ORG 10172 in Acute Stroke Treatment and Causative Classification of Stroke systems, but no evidence that the Causative Classification of Stroke system reduced the proportion of patients classified to undetermined subtypes. The excellent inter-rater reproducibility and web-based semiautomated nature make the Causative Classification of Stroke system suitable for multicenter studies, but the benefit of reclassifying cases already classified using the Trial of ORG 10172 in Acute Stroke Treatment system on existing databases is likely to be small. © 2012 The Authors. International Journal of Stroke © 2012 World Stroke Organization.
NASA Astrophysics Data System (ADS)
Turan, Muhammed K.; Sehirli, Eftal; Elen, Abdullah; Karas, Ismail R.
2015-07-01
Gel electrophoresis (GE) is one of the most widely used methods to separate DNA, RNA and protein molecules according to size, weight and quantity parameters in many areas such as genetics, molecular biology, biochemistry and microbiology. The main way to separate each molecule is to find the borders of each molecule fragment. This paper presents a software application that shows the column edges of DNA fragments in 3 steps. In the first step, the application obtains lane histograms of agarose gel electrophoresis images by projection onto the x-axis. In the second step, it utilizes the k-means clustering algorithm to classify point values of the lane histogram, such as left-side values, right-side values and undesired values. In the third step, the column edges of DNA fragments are shown by using a mean algorithm and mathematical operations to separate DNA fragments from the background in a fully automated way. In addition, the application presents the locations of DNA fragments and how many DNA fragments exist on images captured by a scientific camera.
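A hedged sketch of that pipeline on a synthetic gel image: an x-axis projection builds the lane histogram, k-means over the projected values separates lane columns from background, and a simple grouping step reports the column edges of each lane. The synthetic image and the grouping heuristic are assumptions; real images would need the smoothing and cleanup the paper applies.

```python
# Hedged sketch: x-axis projection + k-means to find lane column edges in a synthetic gel image.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(8)
img = rng.normal(20, 3, (200, 300))                    # synthetic background
for left in (40, 120, 210):                            # three bright lanes, 25 px wide
    img[:, left:left + 25] += 60

profile = img.sum(axis=0)                              # projection onto the x-axis (lane histogram)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(profile.reshape(-1, 1))
lane_label = labels[np.argmax(profile)]                # the cluster containing bright columns
lane_cols = np.flatnonzero(labels == lane_label)

# Group consecutive lane columns and report the left/right edge of each lane.
edges = []
run_start = lane_cols[0]
for a, b in zip(lane_cols[:-1], lane_cols[1:]):
    if b != a + 1:
        edges.append((run_start, a))
        run_start = b
edges.append((run_start, lane_cols[-1]))
print("detected lane edges (left, right):", edges)
```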
Detection of nicotine content impact in tobacco manufacturing using computational intelligence.
Begic Fazlic, Lejla; Avdagic, Zikrija
2011-01-01
A study is presented for the detection of nicotine impact in different cigarette types, using recorded data and Computational Intelligence techniques. Recorded puffs are processed using the Continuous Wavelet Transform and used to extract time-frequency features for normal and abnormal puff conditions. The wavelet energy distributions are used as inputs to classifiers based on Adaptive Neuro-Fuzzy Inference Systems (ANFIS) and Genetic Algorithms (GAs). The number and the parameters of the Membership Functions used in ANFIS, along with the features from the wavelet energy distribution, are selected using GAs, maximising the diagnosis success. The GA with ANFIS (GANFIS) is trained with a subset of data with known nicotine conditions. The trained GANFIS is tested using the other set of data (testing data). A classical method based on High-Performance Liquid Chromatography is also introduced to solve this problem. The results as well as the performances of these two approaches are compared. A combination of these two algorithms is also suggested to improve the efficiency of the solution procedure. Computational results show that this combined algorithm is promising.
Design of fuzzy classifier for diabetes disease using Modified Artificial Bee Colony algorithm.
Beloufa, Fayssal; Chikh, M A
2013-10-01
In this study, diagnosis of diabetes disease, which is one of the most important diseases, is conducted with artificial intelligence techniques. We have proposed a novel Artificial Bee Colony (ABC) algorithm in which a mutation operator is added to an Artificial Bee Colony for improving its performance. When the current best solution cannot be updated, a blended crossover operator (BLX-α) of genetic algorithm is applied, in order to enhance the diversity of ABC, without compromising with the solution quality. This modified version of ABC is used as a new tool to create and optimize automatically the membership functions and rules base directly from data. We take the diabetes dataset used in our work from the UCI machine learning repository. The performances of the proposed method are evaluated through classification rate, sensitivity and specificity values using 10-fold cross-validation method. The obtained classification rate of our method is 84.21% and it is very promising when compared with the previous research in the literature for the same problem. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
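A minimal sketch of the blended crossover (BLX-α) operator mentioned above, which the authors trigger inside ABC when the best solution stagnates. The function below is a generic, standalone version; embedding it in a full ABC loop and the encoding of membership-function parameters are left as assumptions.

```python
# Hedged sketch: BLX-alpha crossover over real-coded solution vectors.
import numpy as np

rng = np.random.default_rng(9)

def blx_alpha(p1, p2, alpha=0.5, bounds=(0.0, 1.0)):
    """Sample a child uniformly from [c_min - alpha*I, c_max + alpha*I] per gene, I = c_max - c_min."""
    c_min, c_max = np.minimum(p1, p2), np.maximum(p1, p2)
    spread = c_max - c_min
    child = rng.uniform(c_min - alpha * spread, c_max + alpha * spread)
    return np.clip(child, *bounds)                     # keep the child inside the search box

parent1 = rng.random(6)                                # e.g. encoded membership-function parameters (assumed)
parent2 = rng.random(6)
print("child:", blx_alpha(parent1, parent2, alpha=0.3).round(3))
```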
Sun, Xiao-Gang; Tang, Hong; Yuan, Gui-Bin
2008-05-01
For the total light scattering particle sizing technique, an inversion and classification method was proposed based on the dependent model algorithm. The measured particle system was inverted simultaneously with different particle distribution functions whose mathematical models were known in advance, and then classified according to the inversion errors. The simulation experiments illustrated that it is feasible to use the inversion errors to determine the particle size distribution. The particle size distribution function was obtained accurately at only three wavelengths in the visible light range with the genetic algorithm, and the inversion results were steady and reliable, which reduced the required number of wavelengths to the greatest extent and increased the flexibility in selecting the light source. The single-peak distribution inversion error was less than 5% and the bimodal distribution inversion error was less than 10% when 5% stochastic noise was added to the transmission extinction measurement values at two wavelengths. The running time of this method was less than 2 s. The method has the advantages of simplicity, rapidity, and suitability for on-line particle size measurement.
A hybrid genetic algorithm for resolving closely spaced objects
NASA Technical Reports Server (NTRS)
Abbott, R. J.; Lillo, W. E.; Schulenburg, N.
1995-01-01
A hybrid genetic algorithm is described for performing the difficult optimization task of resolving closely spaced objects appearing in space based and ground based surveillance data. This application of genetic algorithms is unusual in that it uses a powerful domain-specific operation as a genetic operator. Results of applying the algorithm to real data from telescopic observations of a star field are presented.
Genetic Algorithm Tuned Fuzzy Logic for Gliding Return Trajectories
NASA Technical Reports Server (NTRS)
Burchett, Bradley T.
2003-01-01
The problem of designing and flying a trajectory for successful recovery of a reusable launch vehicle is tackled using fuzzy logic control with genetic algorithm optimization. The plant is approximated by a simplified three degree of freedom non-linear model. A baseline trajectory design and guidance algorithm consisting of several Mamdani type fuzzy controllers is tuned using a simple genetic algorithm. Preliminary results show that the performance of the overall system is shown to improve with genetic algorithm tuning.
Optimization of Support Vector Machine (SVM) for Object Classification
NASA Technical Reports Server (NTRS)
Scholten, Matthew; Dhingra, Neil; Lu, Thomas T.; Chao, Tien-Hsin
2012-01-01
The Support Vector Machine (SVM) is a powerful algorithm, useful in classifying data into species. The SVMs implemented in this research were used as classifiers for the final stage in a Multistage Automatic Target Recognition (ATR) system. A single kernel SVM known as SVMlight, and a modified version known as a SVM with K-Means Clustering were used. These SVM algorithms were tested as classifiers under varying conditions. Image noise levels varied, and the orientation of the targets changed. The classifiers were then optimized to demonstrate their maximum potential as classifiers. Results demonstrate the reliability of SVM as a method for classification. From trial to trial, SVM produces consistent results.
Fine-scaled human genetic structure revealed by SNP microarrays.
Xing, Jinchuan; Watkins, W Scott; Witherspoon, David J; Zhang, Yuhua; Guthery, Stephen L; Thara, Rangaswamy; Mowry, Bryan J; Bulayeva, Kazima; Weiss, Robert B; Jorde, Lynn B
2009-05-01
We report an analysis of more than 240,000 loci genotyped using the Affymetrix SNP microarray in 554 individuals from 27 worldwide populations in Africa, Asia, and Europe. To provide a more extensive and complete sampling of human genetic variation, we have included caste and tribal samples from two states in South India, Daghestanis from eastern Europe, and the Iban from Malaysia. Consistent with observations made by Charles Darwin, our results highlight shared variation among human populations and demonstrate that much genetic variation is geographically continuous. At the same time, principal components analyses reveal discernible genetic differentiation among almost all identified populations in our sample, and in most cases, individuals can be clearly assigned to defined populations on the basis of SNP genotypes. All individuals are accurately classified into continental groups using a model-based clustering algorithm, but between closely related populations, genetic and self-classifications conflict for some individuals. The 250K data permitted high-level resolution of genetic variation among Indian caste and tribal populations and between highland and lowland Daghestani populations. In particular, upper-caste individuals from Tamil Nadu and Andhra Pradesh form one defined group, lower-caste individuals from these two states form another, and the tribal Irula samples form a third. Our results emphasize the correlation of genetic and geographic distances and highlight other elements, including social factors that have contributed to population structure.
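A hedged sketch of the analysis style described above: principal components of a genotype matrix (0/1/2 allele counts) followed by model-based clustering to assign individuals to groups. The genotypes below are simulated for two drifted populations, not the Affymetrix data, and the simulation parameters are assumptions.

```python
# Hedged sketch: PCA of a simulated genotype matrix + model-based clustering of individuals.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(10)
n_per_pop, n_snps = 100, 2000
freq_a = rng.uniform(0.05, 0.95, n_snps)                              # allele frequencies, population A
freq_b = np.clip(freq_a + rng.normal(0, 0.1, n_snps), 0.01, 0.99)     # slightly drifted population B
geno = np.vstack([rng.binomial(2, freq_a, (n_per_pop, n_snps)),       # 0/1/2 allele counts
                  rng.binomial(2, freq_b, (n_per_pop, n_snps))])

pcs = PCA(n_components=2).fit_transform(geno - geno.mean(axis=0))
clusters = GaussianMixture(n_components=2, random_state=0).fit_predict(pcs)
truth = np.repeat([0, 1], n_per_pop)
agreement = max((clusters == truth).mean(), (clusters != truth).mean())   # account for label switching
print("cluster/population agreement:", round(agreement, 3))
```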
Progressive Classification Using Support Vector Machines
NASA Technical Reports Server (NTRS)
Wagstaff, Kiri; Kocurek, Michael
2009-01-01
An algorithm for progressive classification of data, analogous to progressive rendering of images, makes it possible to compromise between speed and accuracy. This algorithm uses support vector machines (SVMs) to classify data. An SVM is a machine learning algorithm that builds a mathematical model of the desired classification concept by identifying the critical data points, called support vectors. Coarse approximations to the concept require only a few support vectors, while precise, highly accurate models require far more support vectors. Once the model has been constructed, the SVM can be applied to new observations. The cost of classifying a new observation is proportional to the number of support vectors in the model. When computational resources are limited, an SVM of the appropriate complexity can be produced. However, if the constraints are not known when the model is constructed, or if they can change over time, a method for adaptively responding to the current resource constraints is required. This capability is particularly relevant for spacecraft (or any other real-time systems) that perform onboard data analysis. The new algorithm enables the fast, interactive application of an SVM classifier to a new set of data. The classification process achieved by this algorithm is characterized as progressive because a coarse approximation to the true classification is generated rapidly and thereafter iteratively refined. The algorithm uses two SVMs: (1) a fast, approximate one and (2) slow, highly accurate one. New data are initially classified by the fast SVM, producing a baseline approximate classification. For each classified data point, the algorithm calculates a confidence index that indicates the likelihood that it was classified correctly in the first pass. Next, the data points are sorted by their confidence indices and progressively reclassified by the slower, more accurate SVM, starting with the items most likely to be incorrectly classified. The user can halt this reclassification process at any point, thereby obtaining the best possible result for a given amount of computation time. Alternatively, the results can be displayed as they are generated, providing the user with real-time feedback about the current accuracy of classification.
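A hedged sketch of the progressive scheme described above: a fast linear SVM gives a baseline classification plus a confidence score, then items are re-classified by a slower, more accurate SVM in order of increasing confidence, and the loop can be halted at any point. The two models, the confidence measure, and the refinement budget are illustrative assumptions.

```python
# Hedged sketch: coarse fast SVM, then confidence-ordered refinement by an accurate SVM.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC, LinearSVC

X, y = make_classification(n_samples=2000, n_features=20, flip_y=0.05, random_state=0)
X_tr, X_new, y_tr, y_new = train_test_split(X, y, test_size=0.5, random_state=0)

fast = LinearSVC(dual=False).fit(X_tr, y_tr)             # coarse, cheap model
slow = SVC(kernel="rbf", C=10).fit(X_tr, y_tr)           # accurate, expensive model

labels = fast.predict(X_new)                             # baseline approximate classification
confidence = np.abs(fast.decision_function(X_new))       # distance to the separating hyperplane
order = np.argsort(confidence)                           # least confident items first

budget = int(0.3 * len(X_new))                           # e.g. halt after refining 30% of items
labels[order[:budget]] = slow.predict(X_new[order[:budget]])
print("accuracy after partial refinement:", (labels == y_new).mean().round(3))
```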
Gibbons, Chris; Richards, Suzanne; Valderas, Jose Maria; Campbell, John
2017-03-15
Machine learning techniques may be an effective and efficient way to classify open-text reports on doctor's activity for the purposes of quality assurance, safety, and continuing professional development. The objective of the study was to evaluate the accuracy of machine learning algorithms trained to classify open-text reports of doctor performance and to assess the potential for classifications to identify significant differences in doctors' professional performance in the United Kingdom. We used 1636 open-text comments (34,283 words) relating to the performance of 548 doctors collected from a survey of clinicians' colleagues using the General Medical Council Colleague Questionnaire (GMC-CQ). We coded 77.75% (1272/1636) of the comments into 5 global themes (innovation, interpersonal skills, popularity, professionalism, and respect) using a qualitative framework. We trained 8 machine learning algorithms to classify comments and assessed their performance using several training samples. We evaluated doctor performance using the GMC-CQ and compared scores between doctors with different classifications using t tests. Individual algorithm performance was high (range F score=.68 to .83). Interrater agreement between the algorithms and the human coder was highest for codes relating to "popular" (recall=.97), "innovator" (recall=.98), and "respected" (recall=.87) codes and was lower for the "interpersonal" (recall=.80) and "professional" (recall=.82) codes. A 10-fold cross-validation demonstrated similar performance in each analysis. When combined together into an ensemble of multiple algorithms, mean human-computer interrater agreement was .88. Comments that were classified as "respected," "professional," and "interpersonal" related to higher doctor scores on the GMC-CQ compared with comments that were not classified (P<.05). Scores did not vary between doctors who were rated as popular or innovative and those who were not rated at all (P>.05). Machine learning algorithms can classify open-text feedback of doctor performance into multiple themes derived by human raters with high performance. Colleague open-text comments that signal respect, professionalism, and being interpersonal may be key indicators of doctor's performance. ©Chris Gibbons, Suzanne Richards, Jose Maria Valderas, John Campbell. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 15.03.2017.
Learning Intelligent Genetic Algorithms Using Japanese Nonograms
ERIC Educational Resources Information Center
Tsai, Jinn-Tsong; Chou, Ping-Yi; Fang, Jia-Cen
2012-01-01
An intelligent genetic algorithm (IGA) is proposed to solve Japanese nonograms and is used as a method in a university course to learn evolutionary algorithms. The IGA combines the global exploration capabilities of a canonical genetic algorithm (CGA) with effective condensed encoding, improved fitness function, and modified crossover and…
A review of classification algorithms for EEG-based brain–computer interfaces: a 10 year update
NASA Astrophysics Data System (ADS)
Lotte, F.; Bougrain, L.; Cichocki, A.; Clerc, M.; Congedo, M.; Rakotomamonjy, A.; Yger, F.
2018-06-01
Objective. Most current electroencephalography (EEG)-based brain–computer interfaces (BCIs) are based on machine learning algorithms. There is a large diversity of classifier types that are used in this field, as described in our 2007 review paper. Now, approximately ten years after this review publication, many new algorithms have been developed and tested to classify EEG signals in BCIs. The time is therefore ripe for an updated review of EEG classification algorithms for BCIs. Approach. We surveyed the BCI and machine learning literature from 2007 to 2017 to identify the new classification approaches that have been investigated to design BCIs. We synthesize these studies in order to present such algorithms, to report how they were used for BCIs, what were the outcomes, and to identify their pros and cons. Main results. We found that the recently designed classification algorithms for EEG-based BCIs can be divided into four main categories: adaptive classifiers, matrix and tensor classifiers, transfer learning and deep learning, plus a few other miscellaneous classifiers. Among these, adaptive classifiers were demonstrated to be generally superior to static ones, even with unsupervised adaptation. Transfer learning can also prove useful although the benefits of transfer learning remain unpredictable. Riemannian geometry-based methods have reached state-of-the-art performances on multiple BCI problems and deserve to be explored more thoroughly, along with tensor-based methods. Shrinkage linear discriminant analysis and random forests also appear particularly useful for small training samples settings. On the other hand, deep learning methods have not yet shown convincing improvement over state-of-the-art BCI methods. Significance. This paper provides a comprehensive overview of the modern classification algorithms used in EEG-based BCIs, presents the principles of these methods and guidelines on when and how to use them. It also identifies a number of challenges to further advance EEG classification in BCI.
Contextual classification on a CDC Flexible Processor system. [for photomapped remote sensing data
NASA Technical Reports Server (NTRS)
Smith, B. W.; Siegel, H. J.; Swain, P. H.
1981-01-01
A potential hardware organization for the Flexible Processor Array is presented. An algorithm that implements a contextual classifier for remote sensing data analysis is given, along with uniprocessor classification algorithms. The Flexible Processor algorithm is provided, as are simulated timings for contextual classifiers run on the Flexible Processor Array and another system. The timings are analyzed for context neighborhoods of sizes three and nine.
MacRae, J; Darlow, B; McBain, L; Jones, O; Stubbe, M; Turner, N; Dowell, A
2015-08-21
To develop a natural language processing software inference algorithm to classify the content of primary care consultations using electronic health record Big Data and subsequently test the algorithm's ability to estimate the prevalence and burden of childhood respiratory illness in primary care. Algorithm development and validation study. To classify consultations, the algorithm is designed to interrogate clinical narrative entered as free text, diagnostic (Read) codes created and medications prescribed on the day of the consultation. Thirty-six consenting primary care practices from a mixed urban and semirural region of New Zealand. Three independent sets of 1200 child consultation records were randomly extracted from a data set of all general practitioner consultations in participating practices between 1 January 2008-31 December 2013 for children under 18 years of age (n=754,242). Each consultation record within these sets was independently classified by two expert clinicians as respiratory or non-respiratory, and subclassified according to respiratory diagnostic categories to create three 'gold standard' sets of classified records. These three gold standard record sets were used to train, test and validate the algorithm. Sensitivity, specificity, positive predictive value and F-measure were calculated to illustrate the algorithm's ability to replicate judgements of expert clinicians within the 1200 record gold standard validation set. The algorithm was able to identify respiratory consultations in the 1200 record validation set with a sensitivity of 0.72 (95% CI 0.67 to 0.78) and a specificity of 0.95 (95% CI 0.93 to 0.98). The positive predictive value of algorithm respiratory classification was 0.93 (95% CI 0.89 to 0.97). The positive predictive value of the algorithm classifying consultations as being related to specific respiratory diagnostic categories ranged from 0.68 (95% CI 0.40 to 1.00; other respiratory conditions) to 0.91 (95% CI 0.79 to 1.00; throat infections). A software inference algorithm that uses primary care Big Data can accurately classify the content of clinical consultations. This algorithm will enable accurate estimation of the prevalence of childhood respiratory illness in primary care and resultant service utilisation. The methodology can also be applied to other areas of clinical care. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
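A minimal sketch of the validation metrics quoted above (sensitivity, specificity, positive predictive value, F-measure), computed from binary respiratory/non-respiratory judgements; the label arrays are placeholders, not the study's gold-standard data.

```python
# Hedged sketch: sensitivity, specificity, PPV, and F-measure from binary classifications.
import numpy as np

gold = np.array([1, 1, 0, 0, 1, 0, 1, 0, 0, 1])   # clinician judgement (1 = respiratory), placeholder
pred = np.array([1, 0, 0, 0, 1, 0, 1, 1, 0, 1])   # algorithm classification, placeholder

tp = np.sum((pred == 1) & (gold == 1))
tn = np.sum((pred == 0) & (gold == 0))
fp = np.sum((pred == 1) & (gold == 0))
fn = np.sum((pred == 0) & (gold == 1))

sensitivity = tp / (tp + fn)                       # recall
specificity = tn / (tn + fp)
ppv = tp / (tp + fp)                               # positive predictive value
f_measure = 2 * ppv * sensitivity / (ppv + sensitivity)
print("sensitivity:", sensitivity, "specificity:", specificity,
      "PPV:", round(ppv, 2), "F-measure:", round(f_measure, 2))
```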
Genetic algorithms with memory- and elitism-based immigrants in dynamic environments.
Yang, Shengxiang
2008-01-01
In recent years the genetic algorithm community has shown a growing interest in studying dynamic optimization problems. Several approaches have been devised. The random immigrants and memory schemes are two major ones. The random immigrants scheme addresses dynamic environments by maintaining the population diversity while the memory scheme aims to adapt genetic algorithms quickly to new environments by reusing historical information. This paper investigates a hybrid memory and random immigrants scheme, called memory-based immigrants, and a hybrid elitism and random immigrants scheme, called elitism-based immigrants, for genetic algorithms in dynamic environments. In these schemes, the best individual from memory or the elite from the previous generation is retrieved as the base to create immigrants into the population by mutation. In this way, diversity is not only maintained but maintained more efficiently, adapting the genetic algorithm to the current environment. Based on a series of systematically constructed dynamic problems, experiments are carried out to compare genetic algorithms with the memory-based and elitism-based immigrants schemes against genetic algorithms with traditional memory and random immigrants schemes and a hybrid memory and multi-population scheme. A sensitivity analysis regarding some key parameters is also carried out. Experimental results show that the memory-based and elitism-based immigrants schemes efficiently improve the performance of genetic algorithms in dynamic environments.
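To make the elitism-based immigrants idea above concrete, the following is a minimal Python sketch, assuming a binary encoding and a toy bit-matching fitness; the population size, immigrant ratio, and mutation rates are illustrative, not the paper's settings.

```python
import random

def elitism_based_immigrants_ga(fitness, chrom_len=50, pop_size=60,
                                immigrant_ratio=0.2, p_mut=0.01,
                                p_mut_immigrant=0.1, generations=200):
    """Minimal GA with elitism-based immigrants (sketch of the scheme above).

    Each generation, the elite of the previous generation is mutated to create
    immigrants that replace the worst individuals in the population.
    """
    pop = [[random.randint(0, 1) for _ in range(chrom_len)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)          # best individual first
        elite = pop[0]
        n_imm = int(immigrant_ratio * pop_size)
        immigrants = [[1 - g if random.random() < p_mut_immigrant else g for g in elite]
                      for _ in range(n_imm)]
        pop[-n_imm:] = immigrants                    # worst individuals replaced
        new_pop = [elite]                            # elitism
        while len(new_pop) < pop_size:
            # size-2 tournament selection, uniform crossover, bit-flip mutation
            p1 = max(random.sample(pop, 2), key=fitness)
            p2 = max(random.sample(pop, 2), key=fitness)
            child = [g1 if random.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]
            child = [1 - g if random.random() < p_mut else g for g in child]
            new_pop.append(child)
        pop = new_pop
    return max(pop, key=fitness)

# toy fitness: match a target bit string (in a dynamic setting this target would change)
target = [random.randint(0, 1) for _ in range(50)]
best = elitism_based_immigrants_ga(lambda x: sum(g == t for g, t in zip(x, target)))
print(sum(g == t for g, t in zip(best, target)))
```

In a dynamic setting the fitness (here the target string) would change over time, and the immigrants created from the elite keep the population tracking the moving optimum.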
Segmentation and detection of fluorescent 3D spots.
Ram, Sundaresh; Rodríguez, Jeffrey J; Bosco, Giovanni
2012-03-01
The 3D spatial organization of genes and other genetic elements within the nucleus is important for regulating gene expression. Understanding how this spatial organization is established and maintained throughout the life of a cell is key to elucidating the many layers of gene regulation. Quantitative methods for studying nuclear organization will lead to insights into the molecular mechanisms that maintain gene organization as well as serve as diagnostic tools for pathologies caused by loss of nuclear structure. However, biologists currently lack automated and high throughput methods for quantitative and qualitative global analysis of 3D gene organization. In this study, we use confocal microscopy and fluorescence in-situ hybridization (FISH) as a cytogenetic technique to detect and localize the presence of specific DNA sequences in 3D. FISH uses probes that bind to specific targeted locations on the chromosomes, appearing as fluorescent spots in 3D images obtained using fluorescence microscopy. In this article, we propose an automated algorithm for segmentation and detection of 3D FISH spots. The algorithm is divided into two stages: spot segmentation and spot detection. Spot segmentation consists of 3D anisotropic smoothing to reduce the effect of noise, top-hat filtering, and intensity thresholding, followed by 3D region-growing. Spot detection uses a Bayesian classifier with spot features such as volume, average intensity, texture, and contrast to detect and classify the segmented spots as either true or false spots. Quantitative assessment of the proposed algorithm demonstrates improved segmentation and detection accuracy compared to other techniques. Copyright © 2012 International Society for Advancement of Cytometry.
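As a rough illustration of the two-stage pipeline described above (spot segmentation followed by Bayesian true/false-spot classification), here is a hedged Python sketch built on scipy, scikit-image, and scikit-learn; the paper's anisotropic diffusion and 3D region growing are approximated by an anisotropic Gaussian and connected-component labelling, and the feature set and GaussianNB classifier are stand-ins.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.morphology import white_tophat, ball
from skimage.filters import threshold_otsu
from skimage.measure import regionprops
from sklearn.naive_bayes import GaussianNB

def segment_spots(volume):
    """Stage 1 (rough stand-in): smoothing, top-hat filtering, thresholding, labelling."""
    smoothed = ndi.gaussian_filter(volume.astype(float), sigma=(0.5, 1.0, 1.0))
    tophat = white_tophat(smoothed, ball(3))          # enhance small bright spots
    binary = tophat > threshold_otsu(tophat)
    labels, _ = ndi.label(binary)
    return labels, smoothed

def spot_features(labels, intensity):
    """Per-spot features: volume (voxels), mean intensity, and a simple contrast proxy."""
    feats = []
    for region in regionprops(labels, intensity_image=intensity):
        feats.append([region.area, region.mean_intensity,
                      region.max_intensity - region.min_intensity])
    return np.array(feats)

# Stage 2: Bayesian true/false-spot classification, with GaussianNB as a stand-in,
# trained on manually labelled spots (hypothetical arrays train_feats, train_labels):
# clf = GaussianNB().fit(train_feats, train_labels)     # 1 = true spot, 0 = false spot
# labels, smoothed = segment_spots(image_volume)
# predictions = clf.predict(spot_features(labels, smoothed))
```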
Boiler-turbine control system design using a genetic algorithm
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dimeo, R.; Lee, K.Y.
1995-12-01
This paper discusses the application of a genetic algorithm to control system design for a boiler-turbine plant. In particular the authors study the ability of the genetic algorithm to develop a proportional-integral (PI) controller and a state feedback controller for a non-linear multi-input/multi-output (MIMO) plant model. The plant model is presented along with a discussion of the inherent difficulties in such controller development. A sketch of the genetic algorithm (GA) is presented and its strategy as a method of control system design is discussed. Results are presented for two different control systems that have been designed with the genetic algorithm.
Method for hyperspectral imagery exploitation and pixel spectral unmixing
NASA Technical Reports Server (NTRS)
Lin, Ching-Fang (Inventor)
2003-01-01
An efficient hybrid approach to exploit hyperspectral imagery and unmix spectral pixels. This hybrid approach uses a genetic algorithm to solve for the abundance vector of the first pixel of a hyperspectral image cube. This abundance vector is used as the initial state in a robust filter to derive the abundance estimate for the next pixel. By using a Kalman filter, the abundance estimate for a pixel can be obtained in a single iteration, which is much faster than the genetic algorithm. The output of the robust filter is fed to the genetic algorithm again to derive an accurate abundance estimate for the current pixel. Using the robust filter solution as the starting point of the genetic algorithm speeds up its evolution. After obtaining the accurate abundance estimate, the procedure moves to the next pixel, uses the output of the genetic algorithm as the previous state estimate to derive the abundance estimate for this pixel with the robust filter, and again uses the genetic algorithm to refine the abundance estimate efficiently based on the robust filter solution. This iteration continues until all pixels in the hyperspectral image cube have been processed.
Integrative genetic risk prediction using non-parametric empirical Bayes classification.
Zhao, Sihai Dave
2017-06-01
Genetic risk prediction is an important component of individualized medicine, but prediction accuracies remain low for many complex diseases. A fundamental limitation is the sample sizes of the studies on which the prediction algorithms are trained. One way to increase the effective sample size is to integrate information from previously existing studies. However, it can be difficult to find existing data that examine the target disease of interest, especially if that disease is rare or poorly studied. Furthermore, individual-level genotype data from these auxiliary studies are typically difficult to obtain. This article proposes a new approach to integrative genetic risk prediction of complex diseases with binary phenotypes. It accommodates possible heterogeneity in the genetic etiologies of the target and auxiliary diseases using a tuning parameter-free non-parametric empirical Bayes procedure, and can be trained using only auxiliary summary statistics. Simulation studies show that the proposed method can provide superior predictive accuracy relative to non-integrative as well as integrative classifiers. The method is applied to a recent study of pediatric autoimmune diseases, where it substantially reduces prediction error for certain target/auxiliary disease combinations. The proposed method is implemented in the R package ssa. © 2016, The International Biometric Society.
A cDNA microarray gene expression data classifier for clinical diagnostics based on graph theory.
Benso, Alfredo; Di Carlo, Stefano; Politano, Gianfranco
2011-01-01
Despite great advances in discovering cancer molecular profiles, the proper application of microarray technology to routine clinical diagnostics is still a challenge. Current practices in the classification of microarray data show two main limitations: the reliability of the training data sets used to build the classifiers, and the classifiers' performances, especially when the sample to be classified does not belong to any of the available classes. In this case, state-of-the-art algorithms usually produce a high rate of false positives that, in real diagnostic applications, are unacceptable. To address this problem, this paper presents a new cDNA microarray data classification algorithm based on graph theory that is able to overcome most of the limitations of known classification methodologies. The classifier works by analyzing gene expression data organized in an innovative data structure based on graphs, where vertices correspond to genes and edges to gene expression relationships. To demonstrate the novelty of the proposed approach, the authors present an experimental performance comparison between the proposed classifier and several state-of-the-art classification algorithms.
Performance evaluation of various classifiers for color prediction of rice paddy plant leaf
NASA Astrophysics Data System (ADS)
Singh, Amandeep; Singh, Maninder Lal
2016-11-01
The food industry is one of the industries that uses machine vision for nondestructive quality evaluation of produce. These quality-measuring systems and software are built on various image-processing algorithms, which generally use a particular type of classifier. These classifiers play a vital role in making the algorithms intelligent enough to perform the quality evaluations well, translating human perception into machine vision and hence machine learning. The crop of interest is rice, and the color of this crop indicates the health status of the plant. An enormous number of classifiers are available for color prediction, and choosing the best among them is the focus of this paper. The performance of a total of 60 classifiers has been analyzed from the application point of view, and the results are discussed. The motivation is to provide a set of classifiers with excellent performance and to implement them in a single algorithm for the improvement of machine vision learning and, hence, the associated applications.
NASA Astrophysics Data System (ADS)
Roverso, Davide
2003-08-01
Many-class learning is the problem of training a classifier to discriminate among a large number of target classes. Together with the problem of dealing with high-dimensional patterns (i.e. a high-dimensional input space), the many-class problem (i.e. a high-dimensional output space) is a major obstacle when scaling up classifier systems and algorithms from small pilot applications to large full-scale applications. The Autonomous Recursive Task Decomposition (ARTD) algorithm is here proposed as a solution to the problem of many-class learning. Example applications of ARTD to neural classifier training are also presented. In these examples, improvements in training time are shown to range from 4-fold to more than 30-fold in pattern classification tasks of both static and dynamic character.
Automatic morphological classification of galaxy images
Shamir, Lior
2009-01-01
We describe an image analysis supervised learning algorithm that can automatically classify galaxy images. The algorithm is first trained using manually classified images of elliptical, spiral, and edge-on galaxies. A large set of image features is extracted from each image, and the most informative features are selected using Fisher scores. Test images can then be classified using a simple Weighted Nearest Neighbor rule such that the Fisher scores are used as the feature weights. Experimental results show that galaxy images from Galaxy Zoo can be classified automatically into spiral, elliptical, and edge-on galaxies with an accuracy of ~90% compared to classifications carried out by the author. Full compilable source code of the algorithm is available for free download, and its general-purpose nature makes it suitable for other uses that involve automatic image analysis of celestial objects. PMID:20161594
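A small Python sketch of the feature-weighting idea described above: Fisher scores computed per feature and then used as weights in a nearest-neighbour distance. The exact score definition and distance form used by the author may differ; this is an assumed, simplified version.

```python
import numpy as np

def fisher_scores(X, y):
    """Per-feature Fisher score: between-class scatter over within-class scatter."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        within += len(Xc) * Xc.var(axis=0)
    return between / np.maximum(within, 1e-12)

def weighted_nn_predict(X_train, y_train, x, weights):
    """Weighted Nearest Neighbor: the Fisher scores weight each feature's contribution."""
    d = np.sqrt(((weights * (X_train - x)) ** 2).sum(axis=1))
    return y_train[np.argmin(d)]

# toy usage on random data
rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 10)), rng.integers(0, 3, size=100)
w = fisher_scores(X, y)
print(weighted_nn_predict(X, y, X[0], w))   # queries a training point, returns its label
```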
Genetics-based control of a mimo boiler-turbine plant
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dimeo, R.M.; Lee, K.Y.
1994-12-31
A genetic algorithm is used to develop an optimal controller for a non-linear, multi-input/multi-output boiler-turbine plant. The algorithm is used to train a control system for the plant over a wide operating range in an effort to obtain better performance. The results of the genetic algorithm's controller are compared with those of a controller designed from the linearized plant model at a nominal operating point. Because the genetic algorithm is well suited to solving traditionally difficult optimization problems, it is found to be capable of developing the controller based on input/output information only. This controller achieves a performance comparable to the standard linear quadratic regulator.
Improved classification accuracy by feature extraction using genetic algorithms
NASA Astrophysics Data System (ADS)
Patriarche, Julia; Manduca, Armando; Erickson, Bradley J.
2003-05-01
A feature extraction algorithm has been developed for the purposes of improving classification accuracy. The algorithm uses a genetic algorithm / hill-climber hybrid to generate a set of linearly recombined features, which may be of reduced dimensionality compared with the original set. The genetic algorithm performs the global exploration, and a hill climber explores local neighborhoods. Hybridizing the genetic algorithm with a hill climber improves both the rate of convergence, and the final overall cost function value; it also reduces the sensitivity of the genetic algorithm to parameter selection. The genetic algorithm includes the operators: crossover, mutation, and deletion / reactivation - the last of these effects dimensionality reduction. The feature extractor is supervised, and is capable of deriving a separate feature space for each tissue (which are reintegrated during classification). A non-anatomical digital phantom was developed as a gold standard for testing purposes. In tests with the phantom, and with images of multiple sclerosis patients, classification with feature extractor derived features yielded lower error rates than using standard pulse sequences, and with features derived using principal components analysis. Using the multiple sclerosis patient data, the algorithm resulted in a mean 31% reduction in classification error of pure tissues.
Vision-based posture recognition using an ensemble classifier and a vote filter
NASA Astrophysics Data System (ADS)
Ji, Peng; Wu, Changcheng; Xu, Xiaonong; Song, Aiguo; Li, Huijun
2016-10-01
Posture recognition is an important modality for Human-Robot Interaction (HRI). To segment an effective posture from an image, we propose an improved region-grow algorithm combined with the Single Gauss Color Model. Experiments show that the improved region-grow algorithm extracts a more complete and accurate posture than the traditional Single Gauss Model and region-grow algorithm, and it can eliminate similar regions from the background at the same time. In the posture recognition part, in order to improve the recognition rate, we propose a CNN ensemble classifier, and in order to reduce misjudgments during continuous gesture control, a vote filter is proposed and applied to the sequence of recognition results. The proposed CNN ensemble classifier yields a 96.27% recognition rate, better than that of a single CNN classifier, and the proposed vote filter improves the recognition results and reduces misjudgments during consecutive gesture switches.
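The vote filter can be illustrated with a few lines of Python: a sliding-window majority vote over the stream of per-frame predictions, with the window length as an assumed parameter.

```python
from collections import Counter, deque

def vote_filter(predictions, window=5):
    """Smooth a stream of per-frame posture labels with a sliding majority vote.

    A label is emitted only once it wins the vote inside the window, which
    suppresses isolated misclassifications during continuous gesture control.
    """
    buf = deque(maxlen=window)
    smoothed = []
    for p in predictions:
        buf.append(p)
        smoothed.append(Counter(buf).most_common(1)[0][0])
    return smoothed

print(vote_filter(["fist", "fist", "open", "fist", "fist", "open", "open", "open"]))
```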
Testing of the Support Vector Machine for Binary-Class Classification
NASA Technical Reports Server (NTRS)
Scholten, Matthew
2011-01-01
The Support Vector Machine is a powerful algorithm, useful in classifying data into species. The Support Vector Machines implemented in this research were used as classifiers for the final stage in a Multistage Autonomous Target Recognition system. A single-kernel SVM known as SVMlight, and a modified version known as a Support Vector Machine with K-Means Clustering, were used. These SVM algorithms were tested as classifiers under varying conditions. Image noise levels varied, and the orientation of the targets changed. The classifiers were then optimized to demonstrate their maximum potential as classifiers. Results demonstrate the reliability of SVM as a method for classification. From trial to trial, SVM produces consistent results.
Watson, Robert A
2014-08-01
To test the hypothesis that machine learning algorithms increase the predictive power to classify surgical expertise using surgeons' hand motion patterns. In 2012 at the University of North Carolina at Chapel Hill, 14 surgical attendings and 10 first- and second-year surgical residents each performed two bench model venous anastomoses. During the simulated tasks, the participants wore an inertial measurement unit on the dorsum of their dominant (right) hand to capture their hand motion patterns. The pattern from each bench model task performed was preprocessed into a symbolic time series and labeled as expert (attending) or novice (resident). The labeled hand motion patterns were processed and used to train a Support Vector Machine (SVM) classification algorithm. The trained algorithm was then tested for discriminative/predictive power against unlabeled (blinded) hand motion patterns from tasks not used in the training. The Lempel-Ziv (LZ) complexity metric was also measured from each hand motion pattern, with an optimal threshold calculated to separately classify the patterns. The LZ metric classified unlabeled (blinded) hand motion patterns into expert and novice groups with an accuracy of 70% (sensitivity 64%, specificity 80%). The SVM algorithm had an accuracy of 83% (sensitivity 86%, specificity 80%). The results confirmed the hypothesis. The SVM algorithm increased the predictive power to classify blinded surgical hand motion patterns into expert versus novice groups. With further development, the system used in this study could become a viable tool for low-cost, objective assessment of procedural proficiency in a competency-based curriculum.
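For illustration, one common variant of the Lempel-Ziv complexity of a symbolic sequence can be computed as below (Python); the exact LZ formulation, the threshold, and whether experts score lower or higher than novices are study-specific, so the classification direction here is only an assumption.

```python
def lempel_ziv_complexity(sequence):
    """One common variant of Lempel-Ziv complexity: the number of distinct
    phrases produced by an exhaustive left-to-right parsing of the sequence
    (given here as a string of symbols)."""
    phrases, ind, inc = set(), 0, 1
    while ind + inc <= len(sequence):
        phrase = sequence[ind:ind + inc]
        if phrase in phrases:
            inc += 1                    # extend the current phrase
        else:
            phrases.add(phrase)         # new phrase found, restart after it
            ind += inc
            inc = 1
    return len(phrases)

def classify_motion(symbolic_sequence, threshold):
    """Threshold rule on the LZ metric; the direction expert <= threshold is illustrative,
    and the threshold itself would be chosen on training data."""
    return "expert" if lempel_ziv_complexity(symbolic_sequence) <= threshold else "novice"

print(lempel_ziv_complexity("1001111011000010"))   # number of distinct phrases in the parse
```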
NASA Astrophysics Data System (ADS)
Moon, Byung-Young
2005-12-01
A hybrid neural-genetic multi-model parameter estimation algorithm is demonstrated. The method can be applied to structured system identification of electro-hydraulic servo systems. The algorithm consists of a recurrent incremental credit assignment (ICRA) neural network and a genetic algorithm. The ICRA neural network evaluates each member of a generation of models, and the genetic algorithm produces the next generation of models. To evaluate the proposed method, an electro-hydraulic servo system was designed and manufactured, and an experiment was carried out to assess the hybrid neural-genetic multi-model parameter estimation algorithm. As a result, the dynamic characteristics were obtained, namely the parameters (mass, damping coefficient, bulk modulus, spring coefficient) that minimize the total square error. The results of this study can be applied to hydraulic systems in industrial fields.
New Dandelion Algorithm Optimizes Extreme Learning Machine for Biomedical Classification Problems
Li, Xiguang; Zhao, Liang; Gong, Changqing; Liu, Xiaojing
2017-01-01
Inspired by the behavior of dandelion sowing, a novel swarm intelligence algorithm, namely the dandelion algorithm (DA), is proposed for global optimization of complex functions in this paper. In DA, the dandelion population is divided into two subpopulations, and different subpopulations undergo different sowing behaviors. Moreover, another sowing method is designed to jump out of local optima. In order to validate DA, we compare the proposed algorithm with other existing algorithms, including the bat algorithm, particle swarm optimization, and the enhanced fireworks algorithm. Simulations show that the proposed algorithm performs considerably better than the other algorithms. The proposed algorithm is also applied to optimize the extreme learning machine (ELM) for biomedical classification problems, with considerable effect. Finally, we use different fusion methods to form different fusion classifiers, which can achieve higher accuracy and better stability to some extent. PMID:29085425
Comparison of genetic algorithm methods for fuel management optimization
DOE Office of Scientific and Technical Information (OSTI.GOV)
DeChaine, M.D.; Feltus, M.A.
1995-12-31
The CIGARO system was developed for genetic algorithm fuel management optimization. Tests were performed to find the best fuel-location swap mutation operator probability and to compare the genetic algorithm to a truly random search method. Tests showed that the fuel swap probability should be between 0% and 10%, and that a 50% probability definitely hampered the optimization. The genetic algorithm performed significantly better than the random search method, which did not even satisfy the peak normalized power constraint.
Training product unit neural networks with genetic algorithms
NASA Technical Reports Server (NTRS)
Janson, D. J.; Frenzel, J. F.; Thelen, D. C.
1991-01-01
The training of product unit neural networks using genetic algorithms is discussed. Two unusual neural network techniques are combined; product units are employed instead of the traditional summing units, and genetic algorithms train the network rather than backpropagation. As an example, a neural network is trained to calculate the optimum width of transistors in a CMOS switch. It is shown how local minima affect the performance of a genetic algorithm, and one method of overcoming this is presented.
New Results in Astrodynamics Using Genetic Algorithms
NASA Technical Reports Server (NTRS)
Coverstone-Carroll, V.; Hartmann, J. W.; Williams, S. N.; Mason, W. J.
1998-01-01
Genetic algorithms have gained popularity as an effective procedure for obtaining solutions to traditionally difficult space mission optimization problems. In this paper, a brief survey of the use of genetic algorithms to solve astrodynamics problems is presented and is followed by new results obtained from applying a Pareto genetic algorithm to the optimization of low-thrust interplanetary spacecraft missions.
Ali, Safdar; Majid, Abdul
2015-04-01
The diagnosis of human breast cancer is an intricate process and specific indicators may produce negative results. In order to avoid misleading results, an accurate and reliable diagnostic system for breast cancer is indispensable. Recently, several interesting machine-learning (ML) approaches have been proposed for the prediction of breast cancer. To this end, we developed a novel classifier-stacking based evolutionary ensemble system, "Can-Evo-Ens", for predicting amino acid sequences associated with breast cancer. In this paper, first, we selected four diverse types of ML algorithms, Naïve Bayes, K-Nearest Neighbor, Support Vector Machines, and Random Forest, as base-level classifiers. These classifiers are trained individually in different feature spaces using physicochemical properties of amino acids. In order to exploit the decision spaces, the preliminary predictions of the base-level classifiers are stacked. Genetic programming (GP) is then employed to develop a meta-classifier that optimally combines the predictions of the base classifiers. The most suitable threshold value of the best-evolved predictor is computed using the Particle Swarm Optimization technique. Our experiments have demonstrated the robustness of the Can-Evo-Ens system on an independent validation dataset. The proposed system achieved the highest area under the ROC curve (AUC) of 99.95% for cancer prediction. The comparative results revealed that the proposed approach is better than individual ML approaches and the conventional ensemble approaches of AdaBoostM1, Bagging, GentleBoost, and Random Subspace. It is expected that the proposed novel system would have a major impact on the fields of Biomedical, Genomics, Proteomics, Bioinformatics, and Drug Development. Copyright © 2015 Elsevier Inc. All rights reserved.
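A hedged sketch of the stacking idea with the same four base learners is shown below using scikit-learn; a logistic regression stands in for the genetic-programming meta-classifier, and the synthetic data are purely illustrative.

```python
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Base-level classifiers of the same four types; in the paper their stacked
# predictions are combined by an evolved GP meta-classifier, here by logistic regression.
base_learners = [
    ("nb", GaussianNB()),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
    ("svm", SVC(probability=True)),
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
]
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression(max_iter=1000),
                           stack_method="predict_proba", cv=5)

# synthetic stand-in for the physicochemical feature spaces
X, y = make_classification(n_samples=600, n_features=30, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
print(stack.fit(X_tr, y_tr).score(X_te, y_te))
```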
2017-01-01
In this paper, we propose a new automatic hyperparameter selection approach for determining the optimal network configuration (network structure and hyperparameters) for deep neural networks using particle swarm optimization (PSO) in combination with a steepest gradient descent algorithm. In the proposed approach, network configurations were coded as a set of real-number m-dimensional vectors as the individuals of the PSO algorithm in the search procedure. During the search procedure, the PSO algorithm is employed to search for optimal network configurations via the particles moving in a finite search space, and the steepest gradient descent algorithm is used to train the DNN classifier with a few training epochs (to find a local optimal solution) during the population evaluation of PSO. After the optimization scheme, the steepest gradient descent algorithm is performed with more epochs and the final solutions (pbest and gbest) of the PSO algorithm to train a final ensemble model and individual DNN classifiers, respectively. The local search ability of the steepest gradient descent algorithm and the global search capabilities of the PSO algorithm are exploited to determine an optimal solution that is close to the global optimum. We conducted several experiments on hand-written characters and biological activity prediction datasets to show that the DNN classifiers trained by the network configurations expressed by the final solutions of the PSO algorithm, employed to construct an ensemble model and individual classifier, outperform the random approach in terms of the generalization performance. Therefore, the proposed approach can be regarded as an alternative tool for automatic network structure and parameter selection for deep neural networks. PMID:29236718
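The following is a minimal, assumed sketch of the PSO-plus-short-training idea using scikit-learn's MLPClassifier as a stand-in DNN; the particle count, iteration count, and the two searched hyperparameters (hidden units and log learning rate) are illustrative choices, not the paper's configuration encoding.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score
from sklearn.datasets import load_digits

X, y = load_digits(return_X_y=True)

def evaluate(pos):
    """Short training run (few epochs) to score one candidate configuration."""
    hidden, lr = int(round(pos[0])), 10 ** pos[1]
    clf = MLPClassifier(hidden_layer_sizes=(hidden,), learning_rate_init=lr,
                        max_iter=30, random_state=0)
    return cross_val_score(clf, X, y, cv=3).mean()

rng = np.random.default_rng(0)
low, high = np.array([16.0, -4.0]), np.array([256.0, -1.0])   # hidden units, log10(lr)
n_particles, iters = 6, 5
pos = rng.uniform(low, high, size=(n_particles, 2))
vel = np.zeros_like(pos)
pbest, pbest_val = pos.copy(), np.array([evaluate(p) for p in pos])
gbest = pbest[pbest_val.argmax()].copy()

for _ in range(iters):
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, low, high)
    vals = np.array([evaluate(p) for p in pos])
    improved = vals > pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[pbest_val.argmax()].copy()

# final, longer training with the best configuration found by PSO
final = MLPClassifier(hidden_layer_sizes=(int(round(gbest[0])),),
                      learning_rate_init=10 ** gbest[1], max_iter=300, random_state=0)
print(gbest, cross_val_score(final, X, y, cv=3).mean())
```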
Jackowski, Konrad; Krawczyk, Bartosz; Woźniak, Michał
2014-05-01
Currently, methods of combined classification are the focus of intense research. A properly designed group of combined classifiers exploiting knowledge gathered in a pool of elementary classifiers can successfully outperform a single classifier. There are two essential issues to consider when creating combined classifiers: how to establish the most comprehensive pool and how to design a fusion model that allows for taking full advantage of the collected knowledge. In this work, we address these issues and propose AdaSS+, a training algorithm dedicated to compound classifier systems that effectively exploits the local specialization of the elementary classifiers. The training procedure consists of two phases. The first phase detects the classifier competencies and adjusts the respective fusion parameters. The second phase boosts classification accuracy by elevating the degree of local specialization. The quality of the proposed algorithm is evaluated on the basis of a wide range of computer experiments, which show that AdaSS+ can outperform the original method and several reference classifiers.
Nonlinear inversion of potential-field data using a hybrid-encoding genetic algorithm
Chen, C.; Xia, J.; Liu, J.; Feng, G.
2006-01-01
Using a genetic algorithm to solve an inverse problem of complex nonlinear geophysical equations is advantageous because it does not require computing gradients of models or "good" initial models. The multi-point search of a genetic algorithm makes it easier to find the globally optimal solution while avoiding falling into a local extremum. As is the case in other optimization approaches, the search efficiency for a genetic algorithm is vital in finding desired solutions successfully in a multi-dimensional model space. A binary-encoding genetic algorithm is hardly ever used to resolve an optimization problem such as a simple geophysical inversion with only three unknowns. The encoding mechanism, genetic operators, and population size of the genetic algorithm greatly affect search processes in the evolution. It is clear that improved operators and proper population size promote the convergence. Nevertheless, not all genetic operations perform perfectly while searching under either a uniform binary or a decimal encoding system. With the binary encoding mechanism, the crossover scheme may produce more new individuals than with the decimal encoding. On the other hand, the mutation scheme in a decimal encoding system will create new genes larger in scope than those in the binary encoding. This paper discusses approaches of exploiting the search potential of genetic operations in the two encoding systems and presents an approach with a hybrid-encoding mechanism, multi-point crossover, and dynamic population size for geophysical inversion. We present a method based on the routine in which the mutation operation is conducted in the decimal code and the multi-point crossover operation in the binary code. The mixed-encoding algorithm is called the hybrid-encoding genetic algorithm (HEGA). HEGA provides better genes with a higher probability through the mutation operator and improves genetic algorithms in resolving complicated geophysical inverse problems. Another significant aspect is that the final solution is determined by the average model derived from multiple trials instead of one computation, due to the randomness in a genetic algorithm procedure. These advantages were demonstrated by synthetic and real-world examples of inversion of potential-field data. © 2005 Elsevier Ltd. All rights reserved.
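A hedged sketch of the hybrid-encoding idea: crossover applied to a binary code of a real parameter and mutation applied directly to the decimal value. The bit length, number of crossover points, and mutation scale are assumptions.

```python
import random

def real_to_binary(x, lo, hi, bits=16):
    """Map a real parameter in [lo, hi] onto a fixed-length bit string."""
    code = int(round((x - lo) / (hi - lo) * (2 ** bits - 1)))
    return [int(b) for b in format(code, f"0{bits}b")]

def binary_to_real(bits_list, lo, hi):
    code = int("".join(map(str, bits_list)), 2)
    return lo + code / (2 ** len(bits_list) - 1) * (hi - lo)

def multipoint_crossover(b1, b2, n_points=3):
    """Multi-point crossover performed in the binary code."""
    points = sorted(random.sample(range(1, len(b1)), n_points))
    child1, child2, swap, prev = [], [], False, 0
    for p in points + [len(b1)]:
        seg1, seg2 = b1[prev:p], b2[prev:p]
        child1 += seg2 if swap else seg1
        child2 += seg1 if swap else seg2
        swap, prev = not swap, p
    return child1, child2

def decimal_mutation(x, lo, hi, sigma=0.05):
    """Mutation performed directly on the decimal (real) value, clipped to bounds."""
    return min(hi, max(lo, x + random.gauss(0.0, sigma * (hi - lo))))

parent = real_to_binary(3.7, 0.0, 10.0)
mate = real_to_binary(8.1, 0.0, 10.0)
child, _ = multipoint_crossover(parent, mate)
print(decimal_mutation(binary_to_real(child, 0.0, 10.0), 0.0, 10.0))
```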
Reduction from cost-sensitive ordinal ranking to weighted binary classification.
Lin, Hsuan-Tien; Li, Ling
2012-05-01
We present a reduction framework from ordinal ranking to binary classification. The framework consists of three steps: extracting extended examples from the original examples, learning a binary classifier on the extended examples with any binary classification algorithm, and constructing a ranker from the binary classifier. Based on the framework, we show that a weighted 0/1 loss of the binary classifier upper-bounds the mislabeling cost of the ranker, both error-wise and regret-wise. Our framework allows not only the design of good ordinal ranking algorithms based on well-tuned binary classification approaches, but also the derivation of new generalization bounds for ordinal ranking from known bounds for binary classification. In addition, our framework unifies many existing ordinal ranking algorithms, such as perceptron ranking and support vector ordinal regression. When compared empirically on benchmark data sets, some of our newly designed algorithms enjoy advantages in terms of both training speed and generalization performance over existing algorithms. In addition, the newly designed algorithms lead to better cost-sensitive ordinal ranking performance, as well as improved listwise ranking performance.
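The reduction can be sketched as follows (Python, scikit-learn), assuming unit weights on the extended examples and a threshold index appended as an extra feature; the full framework derives the weights from the cost matrix.

```python
import numpy as np
from sklearn.svm import LinearSVC

def extend_examples(X, y, n_ranks):
    """Extended binary examples: for every threshold k = 1..K-1, example (x, k)
    is labelled +1 if the true rank exceeds k and -1 otherwise (unit weights here)."""
    X_ext, y_ext = [], []
    for x, label in zip(X, y):
        for k in range(1, n_ranks):
            X_ext.append(np.append(x, k))        # threshold index as an extra feature
            y_ext.append(1 if label > k else -1)
    return np.array(X_ext), np.array(y_ext)

def rank_from_binary(clf, x, n_ranks):
    """Ranker construction: predicted rank = 1 + number of thresholds passed."""
    passed = sum(clf.predict(np.append(x, k).reshape(1, -1))[0] == 1
                 for k in range(1, n_ranks))
    return 1 + passed

# toy ordinal data with ranks 1..5
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = np.clip(np.round(2 * X[:, 0] + 3 + rng.normal(0, 0.3, 300)).astype(int), 1, 5)

X_ext, y_ext = extend_examples(X, y, n_ranks=5)
binary_clf = LinearSVC().fit(X_ext, y_ext)       # any binary learner can be plugged in
print(rank_from_binary(binary_clf, X[0], n_ranks=5), y[0])
```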
Chen, Zhiru; Hong, Wenxue
2016-02-01
Considering the low accuracy of prediction for positive samples and the poor overall classification effect caused by the unbalanced sample data of MicroRNA (miRNA) targets, we propose a support vector machine (SVM)-integration of under-sampling and weight (IUSM) algorithm in this paper, an under-sampling method based on ensemble learning. The algorithm adopts SVM as the learning algorithm and AdaBoost as the integration framework, and embeds clustering-based under-sampling into the iterative process, aiming at reducing the degree of unbalanced distribution of positive and negative samples. Meanwhile, in the process of adaptive weight adjustment of the samples, the SVM-IUSM algorithm eliminates the abnormal ones in negative samples with a robust sample-weight smoothing mechanism so as to avoid over-learning. Finally, the prediction of the miRNA target integrated classifier is achieved with the combination of multiple weak classifiers through a voting mechanism. The experiments revealed that the SVM-IUSW, compared with other algorithms on unbalanced dataset collections, could not only improve the accuracy of positive targets and the overall effect of classification, but also enhance the generalization ability of the miRNA target classifier.
2016-12-01
...Supersonic Bending Body Projectile by a Vector-Evaluated Genetic Algorithm, prepared by Justin L Paul, Academy of Applied Science, 24 Warren Street, Concord, NH 03301, under contract W911SR...
Quantum ensembles of quantum classifiers.
Schuld, Maria; Petruccione, Francesco
2018-02-09
Quantum machine learning witnesses an increasing amount of quantum algorithms for data-driven decision making, a problem with potential applications ranging from automated image recognition to medical diagnosis. Many of those algorithms are implementations of quantum classifiers, or models for the classification of data inputs with a quantum computer. Following the success of collective decision making with ensembles in classical machine learning, this paper introduces the concept of quantum ensembles of quantum classifiers. Creating the ensemble corresponds to a state preparation routine, after which the quantum classifiers are evaluated in parallel and their combined decision is accessed by a single-qubit measurement. This framework naturally allows for exponentially large ensembles in which - similar to Bayesian learning - the individual classifiers do not have to be trained. As an example, we analyse an exponentially large quantum ensemble in which each classifier is weighed according to its performance in classifying the training data, leading to new results for quantum as well as classical machine learning.
Comparison of Different EHG Feature Selection Methods for the Detection of Preterm Labor
Alamedine, D.; Khalil, M.; Marque, C.
2013-01-01
Numerous types of linear and nonlinear features have been extracted from the electrohysterogram (EHG) in order to classify labor and pregnancy contractions. As a result, the number of available features is now very large. The goal of this study is to reduce the number of features by selecting only the relevant ones which are useful for solving the classification problem. This paper presents three methods for feature subset selection that can be applied to choose the best subsets for classifying labor and pregnancy contractions: an algorithm using the Jeffrey divergence (JD) distance, a sequential forward selection (SFS) algorithm, and a binary particle swarm optimization (BPSO) algorithm. The two last methods are based on a classifier and were tested with three types of classifiers. These methods have allowed us to identify common features which are relevant for contraction classification. PMID:24454536
Vehicle Classification Using an Imbalanced Dataset Based on a Single Magnetic Sensor.
Xu, Chang; Wang, Yingguan; Bao, Xinghe; Li, Fengrong
2018-05-24
This paper aims to improve the accuracy of automatic vehicle classifiers for imbalanced datasets. Classification is performed using a single anisotropic magnetoresistive sensor, with the models of vehicles involved being classified into hatchbacks, sedans, buses, and multi-purpose vehicles (MPVs). Using time domain and frequency domain features in combination with three common classification algorithms in pattern recognition, we develop a novel feature extraction method for vehicle classification. These three common classification algorithms are the k-nearest neighbor, the support vector machine, and the back-propagation neural network. Nevertheless, a problem remains: the original vehicle magnetic dataset collected is imbalanced, which may lead to inaccurate classification results. With this in mind, we propose an approach called SMOTE, which can further boost the performance of classifiers. Experimental results show that the k-nearest neighbor (KNN) classifier with the SMOTE algorithm can reach a classification accuracy of 95.46%, thus minimizing the effect of the imbalance.
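A minimal sketch of the SMOTE-plus-KNN pipeline using the imbalanced-learn and scikit-learn libraries; the synthetic four-class data stand in for the magnetic-signature features, and oversampling is applied to the training split only.

```python
from imblearn.over_sampling import SMOTE
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.datasets import make_classification

# imbalanced toy data standing in for the magnetic-signature features
X, y = make_classification(n_samples=1000, n_classes=4, n_informative=8,
                           weights=[0.6, 0.25, 0.1, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# oversample only the training data, then fit the KNN classifier
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_tr, y_tr)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_bal, y_bal)
print(classification_report(y_te, knn.predict(X_te)))
```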
DOE Office of Scientific and Technical Information (OSTI.GOV)
Huang, Xiaobiao; Safranek, James
2014-09-01
Nonlinear dynamics optimization is carried out for a low emittance upgrade lattice of SPEAR3 in order to improve its dynamic aperture and Touschek lifetime. Two multi-objective optimization algorithms, a genetic algorithm and a particle swarm algorithm, are used for this study. The performance of the two algorithms is compared. The result shows that the particle swarm algorithm converges significantly faster to similar or better solutions than the genetic algorithm and it does not require seeding of good solutions in the initial population. These advantages of the particle swarm algorithm may make it more suitable for many accelerator optimization applications.
Genetic Algorithm for Initial Orbit Determination with Too Short Arc (Continued)
NASA Astrophysics Data System (ADS)
Li, X. R.; Wang, X.
2016-03-01
When using the genetic algorithm to solve the problem of too-short-arc (TSA) orbit determination, the methods for outlier editing used in the classical method are no longer applicable, owing to the difference in computing processes between the genetic algorithm and the classical method. In the genetic algorithm, robust estimation is achieved by using different loss functions in the fitness function, which solves the outlier problem of TSAs. Compared with the classical method, the application of loss functions in the genetic algorithm is greatly simplified. Through the comparison of results for different loss functions, it is clear that the least median square and least trimmed square methods can greatly improve the robustness of TSA determination, and have a high breakdown point.
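The robust loss functions mentioned above can be dropped into a GA fitness as in this small sketch; the trimming fraction and the sign convention (fitness = negative loss) are assumptions.

```python
import numpy as np

def lms_fitness(residuals):
    """Least-median-of-squares loss: robust to a large fraction of outlying observations."""
    return np.median(residuals ** 2)

def lts_fitness(residuals, trim_fraction=0.25):
    """Least-trimmed-squares loss: sum the smallest (1 - trim_fraction) squared residuals."""
    r2 = np.sort(residuals ** 2)
    keep = int(np.ceil((1.0 - trim_fraction) * len(r2)))
    return r2[:keep].sum()

# inside the GA, a candidate orbit's fitness could be -lms_fitness(obs - model(candidate))
# (or -lts_fitness), so that outlying observations in the short arc do not dominate
r = np.array([0.1, -0.2, 0.05, 5.0])       # one gross outlier
print(lms_fitness(r), lts_fitness(r))
```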
Bio-Inspired Genetic Algorithms with Formalized Crossover Operators for Robotic Applications.
Zhang, Jie; Kang, Man; Li, Xiaojuan; Liu, Geng-Yang
2017-01-01
Genetic algorithms are widely adopted to solve optimization problems in robotic applications. In such safety-critical systems, it is vitally important to formally prove correctness when genetic algorithms are applied. This paper focuses on formal modeling of crossover operations, which are among the most important operations in genetic algorithms. Specifically, we formalize crossover operations for the first time with higher-order logic based on HOL4, which is easy to deploy thanks to its user-friendly programming environment. With correctness-guaranteed formalized crossover operations, we can safely apply them in robotic applications. We implement our technique to solve a path planning problem using a genetic algorithm with our formalized crossover operations, and the results show the effectiveness of our technique.
Evaluation of a New Ensemble Learning Framework for Mass Classification in Mammograms.
Rahmani Seryasat, Omid; Haddadnia, Javad
2018-06-01
Mammography is the most common screening method for diagnosis of breast cancer. In this study, a computer-aided system for diagnosing the benignity or malignancy of masses in mammogram images was implemented. In the computer-aided diagnosis system, we first reduce the noise in the mammograms using an effective noise removal technique. After the noise removal, the mass in the region of interest must be segmented, and this segmentation is done using a deformable model. After the mass segmentation, a number of features are extracted from it. These features include: features of the mass shape and border, tissue properties, and the fractal dimension. After extracting a large number of features, a proper subset must be chosen from among them. In this study, we use a new genetic algorithm-based method to select a proper set of features. After determining the proper features, a classifier is trained. To classify the samples, a new architecture for combining the classifiers is proposed. In this architecture, easy and difficult samples are identified and handled by different classifiers. Finally, the proposed mass diagnosis system was tested on the mini-Mammographic Image Analysis Society (mini-MIAS) and Digital Database for Screening Mammography (DDSM) databases. The obtained results indicate that the proposed system can compete with the state-of-the-art methods in terms of accuracy. Copyright © 2017 Elsevier Inc. All rights reserved.
Machine Learning Algorithms for Automatic Classification of Marmoset Vocalizations
Ribeiro, Sidarta; Pereira, Danillo R.; Papa, João P.; de Albuquerque, Victor Hugo C.
2016-01-01
Automatic classification of vocalization type could potentially become a useful tool for the acoustic monitoring of captive colonies of highly vocal primates. However, for classification to be useful in practice, a reliable algorithm that can be successfully trained on small datasets is necessary. In this work, we consider seven different classification algorithms with the goal of finding a robust classifier that can be successfully trained on small datasets. We found good classification performance (accuracy > 0.83 and F1-score > 0.84) using the Optimum Path Forest classifier. Dataset and algorithms are made publicly available. PMID:27654941
NASA Astrophysics Data System (ADS)
Lee, Dong Hyuk; Kim, JongHyo; Kim, Hee C.; Lee, Yong W.; Min, Byong Goo
1997-04-01
There have been a number of studies on the quantitative evaluation of diffuse liver disease using texture analysis techniques. However, previous studies have focused on the classification between only normal and abnormal patterns based on textural properties, resulting in a lack of clinically useful information about the progressive status of liver disease. Based on our collaborative research experience with clinical experts, we judged that not only texture information but also several shape properties are necessary in order to successfully classify the various states of disease from liver ultrasonograms. Nine image parameters were selected experimentally. One of these was a texture parameter and the others were shape parameters measured as length, area, and curvature. We have developed a neural-net algorithm that classifies liver ultrasonograms into 9 categories of liver disease: 3 main categories with 3 sub-steps each. The nine parameters were collected semi-automatically from the user by using a graphical user interface tool, and then processed to give a grade for each parameter. The classifying algorithm consists of two steps. In the first step, each parameter is graded into pre-defined levels using a neural network. In the next step, a neural network classifier determines the disease status using the nine graded parameters. We implemented a PC-based computer-assisted diagnosis workstation and installed it in the radiology department of Seoul National University Hospital. Using this workstation we collected 662 cases during 6 months. Some of these were used for training and others were used for evaluating the accuracy of the developed algorithm. In conclusion, a liver ultrasonogram classifying algorithm was developed using both texture and shape parameters and a neural network classifier. Preliminary results indicate that the proposed algorithm is useful for the evaluation of diffuse liver disease.
Active Learning with Irrelevant Examples
NASA Technical Reports Server (NTRS)
Wagstaff, Kiri; Mazzoni, Dominic
2009-01-01
An improved active learning method has been devised for training data classifiers. One example of a data classifier is the algorithm used by the United States Postal Service since the 1960s to recognize scans of handwritten digits for processing zip codes. Active learning algorithms enable rapid training with minimal investment of time on the part of human experts to provide training examples consisting of correctly classified (labeled) input data. They function by identifying which examples would be most profitable for a human expert to label. The goal is to maximize classifier accuracy while minimizing the number of examples the expert must label. Although there are several well-established methods for active learning, they may not operate well when irrelevant examples are present in the data set. That is, they may select an item for labeling that the expert simply cannot assign to any of the valid classes. In the context of classifying handwritten digits, the irrelevant items may include stray marks, smudges, and mis-scans. Querying the expert about these items results in wasted time or erroneous labels, if the expert is forced to assign the item to one of the valid classes. In contrast, the new algorithm provides a specific mechanism for avoiding querying the irrelevant items. This algorithm has two components: an active learner (which could be a conventional active learning algorithm) and a relevance classifier. The combination of these components yields a method, denoted Relevance Bias, that enables the active learner to avoid querying irrelevant data so as to increase its learning rate and efficiency when irrelevant items are present. The algorithm collects irrelevant data in a set of rejected examples, then trains the relevance classifier to distinguish between labeled (relevant) training examples and the rejected ones. The active learner combines its ranking of the items with the probability that they are relevant to yield a final decision about which item to present to the expert for labeling. Experiments on several data sets have demonstrated that the Relevance Bias approach significantly decreases the number of irrelevant items queried and also accelerates learning speed.
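A hedged sketch of the Relevance Bias combination: the active learner's uncertainty score is multiplied by the relevance classifier's probability, and the highest-scoring pool item is queried. The least-confident uncertainty measure, the product combination, and the logistic-regression relevance model are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def relevance_bias_query(clf, relevance_clf, X_pool):
    """Pick the next item to label: uncertainty weighted by estimated relevance.

    `clf` is the active learner's current classifier; `relevance_clf` is trained to
    separate labelled (relevant) examples from previously rejected (irrelevant) ones.
    """
    uncertainty = 1.0 - clf.predict_proba(X_pool).max(axis=1)   # least-confident sampling
    relevance = relevance_clf.predict_proba(X_pool)[:, 1]
    return int(np.argmax(uncertainty * relevance))              # combined score

# toy usage: after each query, rejected items are added to the rejected set and
# relevance_clf is refit on (labelled -> 1, rejected -> 0) examples
rng = np.random.default_rng(0)
X_lab, y_lab = rng.normal(size=(40, 5)), rng.integers(0, 2, 40)
X_rej = rng.normal(3.0, 1.0, size=(10, 5))            # previously rejected items
X_pool = rng.normal(size=(200, 5))

clf = LogisticRegression().fit(X_lab, y_lab)
relevance_clf = LogisticRegression().fit(
    np.vstack([X_lab, X_rej]), np.r_[np.ones(len(X_lab)), np.zeros(len(X_rej))])
print(relevance_bias_query(clf, relevance_clf, X_pool))
```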
Automatic discrimination of fine roots in minirhizotron images.
Zeng, Guang; Birchfield, Stanley T; Wells, Christina E
2008-01-01
Minirhizotrons provide detailed information on the production, life history and mortality of fine roots. However, manual processing of minirhizotron images is time-consuming, limiting the number and size of experiments that can reasonably be analysed. Previously, an algorithm was developed to automatically detect and measure individual roots in minirhizotron images. Here, species-specific root classifiers were developed to discriminate detected roots from bright background artifacts. Classifiers were developed from training images of peach (Prunus persica), freeman maple (Acer x freemanii) and sweetbay magnolia (Magnolia virginiana) using the Adaboost algorithm. True- and false-positive rates for classifiers were estimated using receiver operating characteristic curves. Classifiers gave true positive rates of 89-94% and false positive rates of 3-7% when applied to nontraining images of the species for which they were developed. The application of a classifier trained on one species to images from another species resulted in little or no reduction in accuracy. These results suggest that a single root classifier can be used to distinguish roots from background objects across multiple minirhizotron experiments. By incorporating root detection and discrimination algorithms into an open-source minirhizotron image analysis application, many analysis tasks that are currently performed by hand can be automated.
NASA Astrophysics Data System (ADS)
Mehdinejadiani, Behrouz
2017-08-01
This study represents the first attempt to estimate the solute transport parameters of the spatial fractional advection-dispersion equation (sFADE) using the Bees Algorithm. Numerical studies as well as experimental studies were performed to certify the integrity of the Bees Algorithm. The experimental ones were conducted in a sandbox for homogeneous and heterogeneous soils. A detailed comparative study was carried out between the results obtained from the Bees Algorithm and those from the Genetic Algorithm and LSQNONLIN routines in the FracFit toolbox. The results indicated that, in general, the Bees Algorithm appraised the sFADE parameters much more accurately than the Genetic Algorithm and LSQNONLIN, especially in the heterogeneous soil and for α values near 1 in the numerical study. Also, the results obtained from the Bees Algorithm were more reliable than those from the Genetic Algorithm. The Bees Algorithm showed relatively similar performance across all cases, while the Genetic Algorithm and LSQNONLIN yielded different performances for the various cases. The performance of LSQNONLIN strongly depends on the initial guess values, so that, compared to the Genetic Algorithm, it can estimate the sFADE parameters more accurately only when suitable initial guess values are chosen. To sum up, the Bees Algorithm was found to be a very simple, robust, and accurate approach to estimating the transport parameters of the spatial fractional advection-dispersion equation.
A Test of Genetic Algorithms in Relevance Feedback.
ERIC Educational Resources Information Center
Lopez-Pujalte, Cristina; Guerrero Bote, Vicente P.; Moya Anegon, Felix de
2002-01-01
Discussion of information retrieval, query optimization techniques, and relevance feedback focuses on genetic algorithms, which are derived from artificial intelligence techniques. Describes an evaluation of different genetic algorithms using a residual collection method and compares results with the Ide dec-hi method (Salton and Buckley, 1990…
NASA Astrophysics Data System (ADS)
Avetisyan, H.; Bruna, O.; Holub, J.
2016-11-01
Numerous techniques and algorithms are dedicated to extracting emotions from input data. In our investigation, emotion-detection approaches were classified into the following three types: keyword-based/lexical-based, learning-based, and hybrid. The most commonly used techniques, such as the keyword-spotting method, Support Vector Machines, the Naïve Bayes Classifier, Hidden Markov Models, and hybrid algorithms, have shown impressive results in this area and can reach more than 90% classification accuracy.
Deferred discrimination algorithm (nibbling) for target filter management
NASA Astrophysics Data System (ADS)
Caulfield, H. John; Johnson, John L.
1999-07-01
A new method of classifying objects is presented. Rather than trying to form the classifier in one step or in one training algorithm, it is done in a series of small steps, or nibbles. This leads to an efficient and versatile system that is trained in series with single one-shot examples but applied in parallel, is implemented with single layer perceptrons, yet maintains its fully sequential hierarchical structure. Based on the nibbling algorithm, a basic new method of target reference filter management is described.
Transonic Wing Shape Optimization Using a Genetic Algorithm
NASA Technical Reports Server (NTRS)
Holst, Terry L.; Pulliam, Thomas H.; Kwak, Dochan (Technical Monitor)
2002-01-01
A method for aerodynamic shape optimization based on a genetic algorithm approach is demonstrated. The algorithm is coupled with a transonic full potential flow solver and is used to optimize the flow about transonic wings including multi-objective solutions that lead to the generation of pareto fronts. The results indicate that the genetic algorithm is easy to implement, flexible in application and extremely reliable.
Portfolio optimization by using linear programing models based on genetic algorithm
NASA Astrophysics Data System (ADS)
Sukono; Hidayat, Y.; Lesmana, E.; Putra, A. S.; Napitupulu, H.; Supian, S.
2018-01-01
In this paper, we discuss investment portfolio optimization using a linear programming model based on genetic algorithms. It is assumed that the portfolio risk is measured by the absolute standard deviation, and that each investor has a risk tolerance for the investment portfolio. To solve the investment portfolio optimization problem, the problem is formulated as a linear programming model. Furthermore, the optimum solution of the linear program is determined using a genetic algorithm. As a numerical illustration, we analyze some of the stocks traded on the capital market in Indonesia. Based on the analysis, it is shown that the portfolio optimization performed by the genetic algorithm approach produces a more efficient portfolio than the portfolio optimization performed by a linear programming algorithm approach. Therefore, genetic algorithms can be considered as an alternative for determining the optimal investment portfolio, particularly with linear programming models.
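As an illustration of a GA applied to this kind of portfolio problem, the sketch below minimises mean absolute deviation under a minimum-return penalty on toy return data; the simplex repair, penalty weight, and GA settings are assumptions and do not reproduce the paper's linear programming formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
returns = rng.normal(0.001, 0.02, size=(250, 5))      # toy daily returns, 5 stocks
target_return = 0.0008

def absolute_deviation(w):
    port = returns @ w
    return np.mean(np.abs(port - port.mean()))

def fitness(w):
    """Penalised fitness: minimise absolute-deviation risk subject to a minimum
    expected return; weights are repaired onto the simplex (non-negative, sum to 1)."""
    w = np.abs(w) / np.abs(w).sum()
    penalty = max(0.0, target_return - (returns @ w).mean()) * 1e3
    return -(absolute_deviation(w) + penalty)

pop = rng.random((40, 5))
for _ in range(200):
    scores = np.array([fitness(w) for w in pop])
    parents = pop[np.argsort(scores)[-20:]]             # truncation selection
    children = []
    for _ in range(20):
        a, b = parents[rng.integers(20)], parents[rng.integers(20)]
        child = np.where(rng.random(5) < 0.5, a, b)     # uniform crossover
        mask = rng.random(5) < 0.2                      # mutation mask
        child = np.clip(child + mask * rng.normal(0, 0.05, 5), 1e-6, None)
        children.append(child)
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(w) for w in pop])]
print(np.abs(best) / np.abs(best).sum())                # optimised portfolio weights
```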
An improved genetic algorithm and its application in the TSP problem
NASA Astrophysics Data System (ADS)
Li, Zheng; Qin, Jinlei
2011-12-01
The concept and current research status of the genetic algorithm are introduced in detail in the paper. The simple genetic algorithm and an improved algorithm are then described and applied to an example of the TSP, where the advantage of the genetic algorithm in solving this NP-hard problem is clearly shown. In addition, based on the partial matching crossover operator, the crossover operator is improved into an extended crossover operator in order to increase efficiency when solving the TSP. In the extended crossover method, crossover can be performed between random positions of two random individuals, without being restricted to the same position on the chromosome. Finally, the nine-city TSP is solved using the improved genetic algorithm with the extended crossover method; the solution process is much more efficient and the optimal solution is found much faster.
Solving TSP problem with improved genetic algorithm
NASA Astrophysics Data System (ADS)
Fu, Chunhua; Zhang, Lijun; Wang, Xiaojing; Qiao, Liying
2018-05-01
The TSP is a classic NP-hard problem. The optimization of the vehicle routing problem (VRP) and of city pipeline layouts can be formulated as a TSP, so efficient methods for solving the TSP are important. The genetic algorithm (GA) is one of the well-suited methods for solving it, but the standard genetic algorithm has some limitations. Improving the selection operator and introducing an elite-retention strategy ensure the quality of the selection operation; in the mutation operation, adaptive operator selection improves the quality and diversity of the search; and a one-way reverse-evolution operation, added after the chromosomes have evolved, gives offspring more opportunities to inherit high-quality parental genes and improves the algorithm's ability to find the optimal solution.
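A compact TSP GA sketch with elite retention, a swap mutation, and a "keep only if shorter" segment reversal in the spirit of the one-way reverse operation described above; order crossover (OX) is used as a stand-in for the paper's improved operators, and all parameter values are illustrative.

```python
import math
import random

def tour_length(tour, dist):
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def order_crossover(p1, p2):
    """Order crossover (OX) keeps each child a valid permutation of cities."""
    a, b = sorted(random.sample(range(len(p1)), 2))
    child = [None] * len(p1)
    child[a:b] = p1[a:b]
    rest = [c for c in p2 if c not in child]
    for i in range(len(p1)):
        if child[i] is None:
            child[i] = rest.pop(0)
    return child

def reverse_if_better(tour, dist):
    """One-way reverse operation: reverse a random segment, keep it only if shorter."""
    a, b = sorted(random.sample(range(len(tour)), 2))
    cand = tour[:a] + tour[a:b][::-1] + tour[b:]
    return cand if tour_length(cand, dist) < tour_length(tour, dist) else tour

def tsp_ga(dist, pop_size=100, generations=500, p_mut=0.2):
    n = len(dist)
    pop = [random.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda t: tour_length(t, dist))
        new_pop = pop[:2]                                 # elite retention
        while len(new_pop) < pop_size:
            p1, p2 = random.sample(pop[:pop_size // 2], 2)
            child = order_crossover(p1, p2)
            if random.random() < p_mut:                   # swap mutation
                i, j = random.sample(range(n), 2)
                child[i], child[j] = child[j], child[i]
            new_pop.append(reverse_if_better(child, dist))
        pop = new_pop
    return min(pop, key=lambda t: tour_length(t, dist))

# nine-city example on random coordinates
rng = random.Random(0)
cities = [(rng.random(), rng.random()) for _ in range(9)]
dist = [[math.dist(a, b) for b in cities] for a in cities]
best = tsp_ga(dist, pop_size=60, generations=200)
print(best, tour_length(best, dist))
```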
Balouchestani, Mohammadreza; Krishnan, Sridhar
2014-01-01
Long-term recording of Electrocardiogram (ECG) signals plays an important role in health care systems for the diagnosis and treatment of heart diseases. Clustering and classification of the collected data are essential for detecting concealed information in the P-QRS-T waves of long-term ECG recordings. Currently used algorithms have their share of drawbacks: 1) clustering and classification cannot be done in real time; 2) they suffer from high energy consumption and a heavy sampling load. These drawbacks motivated us to develop a novel optimized clustering algorithm that can easily scan large ECG datasets to enable low-power long-term ECG recording. In this paper, we present an advanced K-means clustering algorithm based on Compressed Sensing (CS) theory as a random sampling procedure. Then, two dimensionality reduction methods, Principal Component Analysis (PCA) and Linear Correlation Coefficient (LCC), followed by classification using the K-Nearest Neighbours (K-NN) and Probabilistic Neural Network (PNN) classifiers, are applied to the proposed algorithm. We show that our algorithm, based on PCA features in combination with the K-NN classifier, performs better than the other methods. The proposed algorithm outperforms existing algorithms by improving classification accuracy by 11%. In addition, the proposed algorithm achieves classification accuracies for the K-NN and PNN classifiers and a Receiver Operating Characteristic (ROC) area of 99.98%, 99.83%, and 99.75%, respectively.
Genetic algorithm based fuzzy control of spacecraft autonomous rendezvous
NASA Technical Reports Server (NTRS)
Karr, C. L.; Freeman, L. M.; Meredith, D. L.
1990-01-01
The U.S. Bureau of Mines is currently investigating ways to combine the control capabilities of fuzzy logic with the learning capabilities of genetic algorithms. Fuzzy logic allows for the uncertainty inherent in most control problems to be incorporated into conventional expert systems. Although fuzzy logic based expert systems have been used successfully for controlling a number of physical systems, the selection of acceptable fuzzy membership functions has generally been a subjective decision. High performance fuzzy membership functions for a fuzzy logic controller that manipulates a mathematical model simulating the autonomous rendezvous of spacecraft are learned using a genetic algorithm, a search technique based on the mechanics of natural genetics. The membership functions learned by the genetic algorithm provide for a more efficient fuzzy logic controller than membership functions selected by the authors for the rendezvous problem. Thus, genetic algorithms are potentially an effective and structured approach for learning fuzzy membership functions.
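The core idea, encoding fuzzy membership-function breakpoints as a chromosome and letting a genetic algorithm minimise a control cost, can be illustrated with the toy sketch below. The triangular membership function is standard, but the objective and all numeric settings are invented stand-ins, not the Bureau of Mines controller.

    # Toy GA learning the breakpoints (a, b, c) of a triangular membership function.
    import random

    def tri(x, a, b, c):
        # Triangular membership value of x for the fuzzy set (a, b, c).
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x < b else (c - x) / (c - b)

    def controller_cost(chrom):
        a, b, c = sorted(chrom)
        # Hypothetical objective: match a reference membership centred on zero error.
        return sum((tri(e / 10.0, a, b, c) - max(0.0, 1 - abs(e) / 5.0)) ** 2
                   for e in range(-10, 11))

    def evolve(pop_size=40, generations=200):
        pop = [sorted(random.uniform(-1, 1) for _ in range(3)) for _ in range(pop_size)]
        for _ in range(generations):
            pop.sort(key=controller_cost)
            parents = pop[:10]
            children = []
            while len(children) < pop_size - len(parents):
                p1, p2 = random.sample(parents, 2)
                child = [(x + y) / 2 + random.gauss(0, 0.05) for x, y in zip(p1, p2)]
                children.append(sorted(child))
            pop = parents + children
        return pop[0]

    print(evolve())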
Conwell, Darwin L; Lee, Linda S; Yadav, Dhiraj; Longnecker, Daniel S; Miller, Frank H; Mortele, Koenraad J; Levy, Michael J; Kwon, Richard; Lieb, John G; Stevens, Tyler; Toskes, Phillip P; Gardner, Timothy B; Gelrud, Andres; Wu, Bechien U; Forsmark, Christopher E; Vege, Santhi S
2014-11-01
The diagnosis of chronic pancreatitis remains challenging in early stages of the disease. This report defines the diagnostic criteria useful in the assessment of patients with suspected and established chronic pancreatitis. All current diagnostic procedures are reviewed, and evidence-based statements are provided about their utility and limitations. Diagnostic criteria for chronic pancreatitis are classified as definitive, probable, or insufficient evidence. A diagnostic (STEP-wise; survey, tomography, endoscopy, and pancreas function testing) algorithm is proposed that proceeds from a noninvasive to a more invasive approach. This algorithm maximizes specificity (low false-positive rate) in subjects with chronic abdominal pain and equivocal imaging changes. Furthermore, a nomenclature is suggested to further characterize patients with established chronic pancreatitis based on TIGAR-O (toxic, idiopathic, genetic, autoimmune, recurrent, and obstructive) etiology, gland morphology (Cambridge criteria), and physiologic state (exocrine, endocrine function) for uniformity across future multicenter research collaborations. This guideline will serve as a baseline manuscript that will be modified as new evidence becomes available and our knowledge of chronic pancreatitis improves.
Accuracy and efficiency of area classifications based on tree tally
Michael S. Williams; Hans T. Schreuder; Raymond L. Czaplewski
2001-01-01
Inventory data are often used to estimate the area of the land base that is classified as a specific condition class. Examples include areas classified as old-growth forest, private ownership, or suitable habitat for a given species. Many inventory programs rely on classification algorithms of varying complexity to determine condition class. These algorithms can be...
Automatic analysis and classification of surface electromyography.
Abou-Chadi, F E; Nashar, A; Saad, M
2001-01-01
In this paper, parametric modeling of surface electromyography (SEMG) signals, which facilitates automatic SEMG feature extraction, and artificial neural networks (ANN) are combined to provide an integrated system for the automatic analysis and diagnosis of myopathic disorders. Three ANN paradigms were investigated: the multilayer backpropagation network, the self-organizing feature map, and a probabilistic neural network model. The performance of the three classifiers was compared with that of the classical Fisher linear discriminant (FLD) classifier. The results show that the three ANN models give higher performance, with the percentage of correct classification reaching 90%, whereas poorer diagnostic performance was obtained from the FLD classifier. The system presented here indicates that surface EMG, when properly processed, can be used to provide the physician with a diagnostic assist device.
Weng, Wei-Hung; Wagholikar, Kavishwar B; McCray, Alexa T; Szolovits, Peter; Chueh, Henry C
2017-12-01
The medical subdomain of a clinical note, such as cardiology or neurology, is useful content-derived metadata for developing machine learning downstream applications. To classify the medical subdomain of a note accurately, we have constructed a machine learning-based natural language processing (NLP) pipeline and developed medical subdomain classifiers based on the content of the note. We constructed the pipeline using the clinical NLP system, clinical Text Analysis and Knowledge Extraction System (cTAKES), the Unified Medical Language System (UMLS) Metathesaurus, Semantic Network, and learning algorithms to extract features from two datasets - clinical notes from Integrating Data for Analysis, Anonymization, and Sharing (iDASH) data repository (n = 431) and Massachusetts General Hospital (MGH) (n = 91,237), and built medical subdomain classifiers with different combinations of data representation methods and supervised learning algorithms. We evaluated the performance of classifiers and their portability across the two datasets. The convolutional recurrent neural network with neural word embeddings trained-medical subdomain classifier yielded the best performance measurement on iDASH and MGH datasets with area under receiver operating characteristic curve (AUC) of 0.975 and 0.991, and F1 scores of 0.845 and 0.870, respectively. Considering better clinical interpretability, linear support vector machine-trained medical subdomain classifier using hybrid bag-of-words and clinically relevant UMLS concepts as the feature representation, with term frequency-inverse document frequency (tf-idf)-weighting, outperformed other shallow learning classifiers on iDASH and MGH datasets with AUC of 0.957 and 0.964, and F1 scores of 0.932 and 0.934 respectively. We trained classifiers on one dataset, applied to the other dataset and yielded the threshold of F1 score of 0.7 in classifiers for half of the medical subdomains we studied. Our study shows that a supervised learning-based NLP approach is useful to develop medical subdomain classifiers. The deep learning algorithm with distributed word representation yields better performance yet shallow learning algorithms with the word and concept representation achieves comparable performance with better clinical interpretability. Portable classifiers may also be used across datasets from different institutions.
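A compressed sketch of the shallow-learning variant (tf-idf weighted bag-of-words feeding a linear SVM) is shown below; the example notes and subdomain labels are placeholders, and the UMLS concept features and cTAKES processing are omitted.

    # Minimal tf-idf + linear SVM pipeline for subdomain classification;
    # the notes and labels are invented examples, not study data.
    from sklearn.pipeline import make_pipeline
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.svm import LinearSVC

    notes = [
        "patient with atrial fibrillation started on anticoagulation",
        "ejection fraction reduced, follow up echocardiogram ordered",
        "tonic clonic seizure, EEG shows epileptiform discharges",
        "MRI brain demonstrates acute ischemic stroke, started aspirin",
    ]
    subdomains = ["cardiology", "cardiology", "neurology", "neurology"]

    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
    clf.fit(notes, subdomains)
    print(clf.predict(["new onset seizure, neurology consult requested"]))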
A collaborative filtering approach for protein-protein docking scoring functions.
Bourquard, Thomas; Bernauer, Julie; Azé, Jérôme; Poupon, Anne
2011-04-22
A protein-protein docking procedure traditionally consists in two successive tasks: a search algorithm generates a large number of candidate conformations mimicking the complex existing in vivo between two proteins, and a scoring function is used to rank them in order to extract a native-like one. We have already shown that using Voronoi constructions and a well chosen set of parameters, an accurate scoring function could be designed and optimized. However to be able to perform large-scale in silico exploration of the interactome, a near-native solution has to be found in the ten best-ranked solutions. This cannot yet be guaranteed by any of the existing scoring functions. In this work, we introduce a new procedure for conformation ranking. We previously developed a set of scoring functions where learning was performed using a genetic algorithm. These functions were used to assign a rank to each possible conformation. We now have a refined rank using different classifiers (decision trees, rules and support vector machines) in a collaborative filtering scheme. The scoring function newly obtained is evaluated using 10 fold cross-validation, and compared to the functions obtained using either genetic algorithms or collaborative filtering taken separately. This new approach was successfully applied to the CAPRI scoring ensembles. We show that for 10 targets out of 12, we are able to find a near-native conformation in the 10 best ranked solutions. Moreover, for 6 of them, the near-native conformation selected is of high accuracy. Finally, we show that this function dramatically enriches the 100 best-ranking conformations in near-native structures.
A "Hands on" Strategy for Teaching Genetic Algorithms to Undergraduates
ERIC Educational Resources Information Center
Venables, Anne; Tan, Grace
2007-01-01
Genetic algorithms (GAs) are a problem solving strategy that uses stochastic search. Since their introduction (Holland, 1975), GAs have proven to be particularly useful for solving problems that are "intractable" using classical methods. The language of genetic algorithms (GAs) is heavily laced with biological metaphors from evolutionary…
The potential of genetic algorithms for conceptual design of rotor systems
NASA Technical Reports Server (NTRS)
Crossley, William A.; Wells, Valana L.; Laananen, David H.
1993-01-01
The capabilities of genetic algorithms as a non-calculus based, global search method make them potentially useful in the conceptual design of rotor systems. Coupling reasonably simple analysis tools to the genetic algorithm was accomplished, and the resulting program was used to generate designs for rotor systems to match requirements similar to those of both an existing helicopter and a proposed helicopter design. This provides a comparison with the existing design and also provides insight into the potential of genetic algorithms in design of new rotors.
Genetic Algorithm for Initial Orbit Determination with Too Short Arc (Continued)
NASA Astrophysics Data System (ADS)
Li, Xin-ran; Wang, Xin
2017-04-01
When the genetic algorithm is used to solve the problem of too-short-arc (TSA) orbit determination, the original method for outlier deletion is no longer applicable because the computing process of the genetic algorithm differs from that of the classical method. In the genetic algorithm, robust estimation is realized by introducing different loss functions into the fitness function, which solves the outlier problem of the TSA orbit determination. Compared with the classical method, the genetic algorithm is greatly simplified by introducing these different loss functions. Through a comparison of the calculations with multiple loss functions, it is found that the least median square (LMS) estimation and the least trimmed square (LTS) estimation can greatly improve the robustness of the TSA orbit determination, and both have a high breakdown point.
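The robust losses mentioned above can be sketched as drop-in replacements for a sum-of-squares fitness term; the residual vector that would come from comparing predicted and observed angles is replaced by a toy array with one gross outlier, and the trimming fraction is an assumption.

    # Robust loss functions for a GA fitness evaluation: least median of squares
    # and least trimmed squares, compared against the plain sum of squares.
    import numpy as np

    def lms_loss(residuals):
        # Least median of squares: insensitive to a large fraction of outliers.
        return np.median(np.square(residuals))

    def lts_loss(residuals, keep=0.8):
        # Least trimmed squares: sum the smallest `keep` fraction of squared residuals.
        sq = np.sort(np.square(residuals))
        h = max(1, int(keep * sq.size))
        return sq[:h].sum()

    r = np.array([0.1, -0.2, 0.05, 8.0, 0.15])   # one gross outlier
    print(np.sum(r ** 2), lms_loss(r), lts_loss(r))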
NASA Technical Reports Server (NTRS)
Wang, Lui; Valenzuela-Rendon, Manuel
1993-01-01
The Space Station Freedom will require the supply of items in a regular fashion. A schedule for the delivery of these items is not easy to design due to the large span of time involved and the possibility of cancellations and changes in shuttle flights. This paper presents the basic concepts of a genetic algorithm model, and also presents the results of an effort to apply genetic algorithms to the design of propellant resupply schedules. As part of this effort, a simple simulator and an encoding by which a genetic algorithm can find near optimal schedules have been developed. Additionally, this paper proposes ways in which robust schedules, i.e., schedules that can tolerate small changes, can be found using genetic algorithms.
Automatic epileptic seizure detection in EEGs using MF-DFA, SVM based on cloud computing.
Zhang, Zhongnan; Wen, Tingxi; Huang, Wei; Wang, Meihong; Li, Chunfeng
2017-01-01
Epilepsy is a chronic disease with transient brain dysfunction that results from the sudden abnormal discharge of neurons in the brain. Since electroencephalogram (EEG) is a harmless and noninvasive detection method, it plays an important role in the detection of neurological diseases. However, analyzing EEG to detect neurological diseases is often difficult because the brain's electrical signals are random, non-stationary and nonlinear. To overcome this difficulty, this study develops a new computer-aided scheme for automatic epileptic seizure detection in EEGs based on multi-fractal detrended fluctuation analysis (MF-DFA) and the support vector machine (SVM). The new scheme first extracts features from the EEG by MF-DFA. Then, the scheme applies a genetic algorithm (GA) to calculate the parameters used in the SVM and classifies the training data according to the selected features using the SVM. Finally, the trained SVM classifier is used to detect neurological diseases. The algorithm utilizes MLlib from the SPARK library and runs on a cloud platform. Applied to a public dataset, the results show that the new feature extraction method and scheme can detect signals with fewer features, and the classification accuracy reaches up to 99%. MF-DFA is a promising approach for extracting features for EEG analysis because of its simple procedure and small number of parameters. The features obtained by MF-DFA can represent samples as well as traditional wavelet transforms and Lyapunov exponents. The GA can always find useful parameters for the SVM given enough execution time. The results illustrate that the classification model achieves comparable accuracy, which means that it is effective in epileptic seizure detection.
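The GA-tuned SVM stage can be sketched as follows: chromosomes encode (log10 C, log10 gamma), and the fitness is cross-validated accuracy. MF-DFA feature extraction and the Spark/MLlib deployment are out of scope, and the data are synthetic placeholders.

    # GA search over SVM hyper-parameters with cross-validated fitness.
    import random
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score
    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=300, n_features=12, random_state=0)

    def fitness(chrom):
        C, gamma = 10 ** chrom[0], 10 ** chrom[1]
        return cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=3).mean()

    pop = [[random.uniform(-2, 3), random.uniform(-4, 1)] for _ in range(20)]
    for _ in range(15):                               # a few GA generations
        pop.sort(key=fitness, reverse=True)
        parents = pop[:6]
        pop = parents + [[(a + b) / 2 + random.gauss(0, 0.2)
                          for a, b in zip(*random.sample(parents, 2))]
                         for _ in range(14)]
    best = max(pop, key=fitness)
    print("best log10(C), log10(gamma):", best, "cv accuracy:", round(fitness(best), 3))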
What to consider when pseudohypoparathyroidism is ruled out: iPPSD and differential diagnosis.
Pereda, Arrate; Garin, Intza; Perez de Nanclares, Guiomar
2018-03-02
Pseudohypoparathyroidism (PHP) is a rare disease whose phenotypic features are rather difficult to identify in some cases. Thus, although these patients may present with the Albright's hereditary osteodystrophy (AHO) phenotype, which is characterized by small stature, obesity with a rounded face, subcutaneous ossifications, mental retardation and brachydactyly, its manifestations are somewhat variable. Indeed, some of them present with a complete phenotype, whereas others show only subtle manifestations. In addition, the features of the AHO phenotype are not specific to it and a similar phenotype is also commonly observed in other syndromes. Brachydactyly type E (BDE) is the most specific and objective feature of the AHO phenotype, and several genes have been associated with syndromic BDE in the past few years. Moreover, these syndromes have a skeletal and endocrinological phenotype that overlaps with AHO/PHP. In light of the above, we have developed an algorithm to aid in genetic testing of patients with clinical features of AHO but with no causative molecular defect at the GNAS locus. Starting with the feature of brachydactyly, this algorithm allows the differential diagnosis to be broadened and, with the addition of other clinical features, can guide genetic testing. We reviewed our series of patients (n = 23) with a clinical diagnosis of AHO and with brachydactyly type E or similar pattern, who were negative for GNAS anomalies, and classify them according to the diagnosis algorithm to finally propose and analyse the most probable gene(s) in each case. A review of the clinical data for our series of patients, and subsequent analysis of the candidate gene(s), allowed detection of the underlying molecular defect in 12 out of 23 patients: five patients harboured a mutation in PRKAR1A, one in PDE4D, four in TRPS1 and two in PTHLH. This study confirmed that the screening of other genes implicated in syndromes with BDE and AHO or a similar phenotype is very helpful for establishing a correct genetic diagnosis for those patients who have been misdiagnosed with "AHO-like phenotype" with an unknown genetic cause, and also for better describing the characteristic and differential features of these less common syndromes.
Li, Xiaoou; Yan, Yuning; Wei, Wenshi
2013-01-01
The early detection of subjects with probable cognitive deficits is crucial for the effective application of treatment strategies. This paper explored a methodology used to discriminate between event-related potential signals of stroke patients and their matched control subjects in a visual working memory paradigm. The proposed algorithm, which combined independent component analysis and orthogonal empirical mode decomposition, was applied to extract independent sources. Four types of target stimulus features including P300 peak latency, P300 peak amplitude, root mean square, and theta frequency band power were chosen. Evolutionary multiple kernel support vector machine (EMK-SVM) based on genetic programming was investigated to classify stroke patients and healthy controls. Based on 5-fold cross-validation runs, EMK-SVM provided better classification performance compared with other state-of-the-art algorithms. Comparing stroke patients with healthy controls using the proposed algorithm, we achieved maximum classification accuracies of 91.76% and 82.23% for the 0-back and 1-back tasks, respectively. Overall, the experimental results showed that the proposed method was effective. The approach in this study may eventually lead to a reliable tool for identifying suitable brain impairment candidates and assessing cognitive function.
An Improved Heuristic Method for Subgraph Isomorphism Problem
NASA Astrophysics Data System (ADS)
Xiang, Yingzhuo; Han, Jiesi; Xu, Haijiang; Guo, Xin
2017-09-01
This paper focuses on the subgraph isomorphism (SI) problem. We present an improved genetic algorithm, a heuristic method for searching for the optimal solution. The contribution of this paper is a dedicated crossover algorithm and a new fitness function to measure the evolution process. Experiments show that our improved genetic algorithm performs better than other heuristic methods. For a large graph, such as a subgraph of 40 nodes, our algorithm outperforms the traditional tree search algorithms. We find that the performance of our improved genetic algorithm does not decrease as the number of nodes in the prototype graphs increases.
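One plausible fitness for GA-based subgraph isomorphism scores a candidate injective mapping from subgraph nodes to prototype-graph nodes by the fraction of subgraph edges it preserves, as in the sketch below; random candidate mappings stand in for the evolved population, and the paper's dedicated crossover is not reproduced.

    # Fitness for candidate node mappings in subgraph isomorphism search.
    import random

    sub_edges = {(0, 1), (1, 2), (2, 0)}                     # small query subgraph
    big_edges = {(0, 1), (1, 2), (2, 0), (2, 3), (3, 4)}     # prototype graph
    n_sub, n_big = 3, 5

    def fitness(mapping):
        # Fraction of subgraph edges whose images are edges of the prototype graph.
        hits = sum((mapping[a], mapping[b]) in big_edges or
                   (mapping[b], mapping[a]) in big_edges
                   for a, b in sub_edges)
        return hits / len(sub_edges)

    best = max((random.sample(range(n_big), n_sub) for _ in range(2000)), key=fitness)
    print(best, fitness(best))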
Effect of noise in intelligent cellular decision making.
Bates, Russell; Blyuss, Oleg; Alsaedi, Ahmed; Zaikin, Alexey
2015-01-01
Similar to the intelligent multicellular neural networks controlling human brains, even single cells, surprisingly, are able to make intelligent decisions to classify several external stimuli or to associate them. This happens because gene regulatory networks can perform as perceptrons, simple intelligent schemes known from studies on Artificial Intelligence. We study the role of genetic noise in intelligent decision making at the genetic level and show that noise can play a constructive role, helping cells to make a proper decision. We show this using the example of a simple genetic classifier able to classify two external stimuli.
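A toy illustration of such a classifier is a perceptron-like threshold unit with additive expression noise: for a weak, sub-threshold stimulus, a moderate amount of noise raises the chance of a correct response, loosely echoing the constructive role of noise discussed above. All numbers below are invented.

    # Perceptron-like genetic classifier of two stimuli with additive noise.
    import random

    WEIGHTS, THRESHOLD = (1.0, -1.0), 0.5

    def respond(stimulus, noise_sd):
        activation = sum(w * s for w, s in zip(WEIGHTS, stimulus))
        return activation + random.gauss(0, noise_sd) > THRESHOLD

    weak_positive = (0.4, 0.0)    # should trigger a response, but is sub-threshold
    for noise_sd in (0.0, 0.1, 0.3, 1.0):
        hits = sum(respond(weak_positive, noise_sd) for _ in range(10000)) / 10000
        print(f"noise sd {noise_sd}: response rate {hits:.2f}")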
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cao, Zheng; Ouyang, Bing; Principe, Jose
A multi-static serial LiDAR system prototype was developed under DE-EE0006787 to detect, classify, and record interactions of marine life with marine hydrokinetic generation equipment. This software implements a shape-matching based classifier algorithm for the underwater automated detection of marine life for that system. In addition to applying shape descriptors, the algorithm also adopts information theoretical learning based affine shape registration, improving point correspondences found by shape descriptors as well as the final similarity measure.
Automatic identification of high impact articles in PubMed to support clinical decision making.
Bian, Jiantao; Morid, Mohammad Amin; Jonnalagadda, Siddhartha; Luo, Gang; Del Fiol, Guilherme
2017-09-01
The practice of evidence-based medicine involves integrating the latest best available evidence into patient care decisions. Yet, critical barriers exist for clinicians' retrieval of evidence that is relevant for a particular patient from primary sources such as randomized controlled trials and meta-analyses. To help address those barriers, we investigated machine learning algorithms that find clinical studies with high clinical impact from PubMed®. Our machine learning algorithms use a variety of features including bibliometric features (e.g., citation count), social media attention, journal impact factors, and citation metadata. The algorithms were developed and evaluated with a gold standard composed of 502 high impact clinical studies that are referenced in 11 clinical evidence-based guidelines on the treatment of various diseases. We tested the following hypotheses: (1) our high impact classifier outperforms a state-of-the-art classifier based on citation metadata and citation terms, and PubMed's® relevance sort algorithm; and (2) the performance of our high impact classifier does not decrease significantly after removing proprietary features such as citation count. The mean top 20 precision of our high impact classifier was 34% versus 11% for the state-of-the-art classifier and 4% for PubMed's® relevance sort (p=0.009); and the performance of our high impact classifier did not decrease significantly after removing proprietary features (mean top 20 precision=34% vs. 36%; p=0.085). The high impact classifier, using features such as bibliometrics, social media attention and MEDLINE® metadata, outperformed previous approaches and is a promising alternative to identifying high impact studies for clinical decision support. Copyright © 2017 Elsevier Inc. All rights reserved.
An Efficient Statistical Computation Technique for Health Care Big Data using R
NASA Astrophysics Data System (ADS)
Sushma Rani, N.; Srinivasa Rao, P., Dr; Parimala, P.
2017-08-01
Due to changes in living conditions and other factors, many critical health-related problems are arising. Diagnosing a problem at an earlier stage increases the chances of survival and fast recovery, which reduces both the recovery time and the cost of treatment. One such medical issue is cancer, and breast cancer has been identified as the second leading cause of cancer death. If detected at an early stage it can be cured. Once a patient is found to have a breast tumor, it should be classified as cancerous or non-cancerous. The paper therefore uses the k-nearest neighbors (KNN) algorithm, one of the simplest instance-based machine learning algorithms, to classify the data. New records are added day to day, which increases the amount of data to be classified; this tends to become a big data problem. The algorithm is implemented in R, the most popular platform for applying machine learning algorithms to statistical computing. Experiments are conducted using various classification evaluation metrics over various values of k. The results show that the KNN algorithm outperforms the existing models.
Korvigo, Ilia; Afanasyev, Andrey; Romashchenko, Nikolay; Skoblov, Mikhail
2018-01-01
Many automatic classifiers were introduced to aid inference of phenotypical effects of uncategorised nsSNVs (nonsynonymous Single Nucleotide Variations) in theoretical and medical applications. Lately, several meta-estimators have been proposed that combine different predictors, such as PolyPhen and SIFT, to integrate more information in a single score. Although many advances have been made in feature design and machine learning algorithms used, the shortage of high-quality reference data along with the bias towards intensively studied in vitro models call for improved generalisation ability in order to further increase classification accuracy and handle records with insufficient data. Since a meta-estimator basically combines different scoring systems with highly complicated nonlinear relationships, we investigated how deep learning (supervised and unsupervised), which is particularly efficient at discovering hierarchies of features, can improve classification performance. While it is believed that one should only use deep learning for high-dimensional input spaces and other models (logistic regression, support vector machines, Bayesian classifiers, etc) for simpler inputs, we still believe that the ability of neural networks to discover intricate structure in highly heterogenous datasets can aid a meta-estimator. We compare the performance with various popular predictors, many of which are recommended by the American College of Medical Genetics and Genomics (ACMG), as well as available deep learning-based predictors. Thanks to hardware acceleration we were able to use a computationally expensive genetic algorithm to stochastically optimise hyper-parameters over many generations. Overfitting was hindered by noise injection and dropout, limiting coadaptation of hidden units. Although we stress that this work was not conceived as a tool comparison, but rather an exploration of the possibilities of deep learning application in ensemble scores, our results show that even relatively simple modern neural networks can significantly improve both prediction accuracy and coverage. We provide open-access to our finest model via the web-site: http://score.generesearch.ru/services/badmut/.
Optimizing computer-aided colonic polyp detection for CT colonography by evolving the Pareto front1
Li, Jiang; Huang, Adam; Yao, Jack; Liu, Jiamin; Van Uitert, Robert L.; Petrick, Nicholas; Summers, Ronald M.
2009-01-01
A multiobjective genetic algorithm is designed to optimize a computer-aided detection (CAD) system for identifying colonic polyps. Colonic polyps appear as elliptical protrusions on the inner surface of the colon. Curvature-based features for colonic polyp detection have proved to be successful in several CT colonography (CTC) CAD systems. Our CTC CAD program uses a sequential classifier to form initial polyp detections on the colon surface. The classifier utilizes a set of thresholds on curvature-based features to cluster suspicious colon surface regions into polyp candidates. The thresholds were previously chosen experimentally by using feature histograms. The chosen thresholds were effective for detecting polyps sized 10 mm or larger in diameter. However, many medium-sized polyps, 6–9 mm in diameter, were missed in the initial detection procedure. In this paper, the task of finding optimal thresholds was formulated as a multiobjective optimization problem, and a genetic algorithm was utilized to solve it by evolving the Pareto front of the Pareto optimal set. The new CTC CAD system was tested on 792 patients. The sensitivities of the optimized system improved significantly, from 61.68% to 74.71% with an increase of 13.03% (95% CI [6.57%, 19.5%], p=7.78×10−5) for the size category of 6–9 mm polyps, from 65.02% to 77.4% with an increase of 12.38% (95% CI [6.23%, 18.53%], p=7.95×10−5) for polyps 6 mm or larger, and from 82.2% to 90.58% with an increase of 8.38% (95% CI [0.75%, 16%], p=0.03) for polyps 8 mm or larger at comparable false positive rates. The sensitivities of the optimized system are nearly equivalent to those of expert radiologists. PMID:19235388
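The Pareto machinery behind this optimization can be illustrated with a minimal non-dominated filter over candidate threshold settings, each scored by two objectives such as (sensitivity, negated false-positive rate); the CAD thresholds and scores below are placeholders.

    # Minimal Pareto (non-dominated) filter over candidate operating points.
    def dominates(a, b):
        # a dominates b if it is no worse in every objective and better in at least one.
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

    def pareto_front(points):
        return [p for p in points
                if not any(dominates(q, p) for q in points if q is not p)]

    candidates = [(0.62, -8.0), (0.70, -9.5), (0.75, -12.0), (0.68, -11.0), (0.75, -9.0)]
    print(pareto_front(candidates))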
Genetic algorithms for adaptive real-time control in space systems
NASA Technical Reports Server (NTRS)
Vanderzijp, J.; Choudry, A.
1988-01-01
Genetic algorithms used for learning are discussed as one way to control the combinatorial explosion associated with the generation of new rules. The genetic algorithm approach tends to work best when it can be applied to a domain-independent knowledge representation. Applications to real-time control in space systems are discussed.
Fusion of classifiers for REIS-based detection of suspicious breast lesions
NASA Astrophysics Data System (ADS)
Lederman, Dror; Wang, Xingwei; Zheng, Bin; Sumkin, Jules H.; Tublin, Mitchell; Gur, David
2011-03-01
After developing a multi-probe resonance-frequency electrical impedance spectroscopy (REIS) system aimed at detecting women with breast abnormalities that may indicate a developing breast cancer, we have been conducting a prospective clinical study to explore the feasibility of applying this REIS system to classify younger women (< 50 years old) into two groups of "higher-than-average risk" and "average risk" of having or developing breast cancer. The system comprises one central probe placed in contact with the nipple, and six additional probes uniformly distributed along an outside circle to be placed in contact with six points on the outer breast skin surface. In this preliminary study, we selected an initial set of 174 examinations on participants that have completed REIS examinations and have clinical status verification. Among these, 66 examinations were recommended for biopsy due to findings of a highly suspicious breast lesion ("positives"), and 108 were determined as negative during imaging based procedures ("negatives"). A set of REIS-based features, extracted using a mirror-matched approach, was computed and fed into five machine learning classifiers. A genetic algorithm was used to select an optimal subset of features for each of the five classifiers. Three fusion rules, namely sum rule, weighted sum rule and weighted median rule, were used to combine the results of the classifiers. Performance evaluation was performed using a leave-one-case-out cross-validation method. The results indicated that REIS may provide a new technology to identify younger women with higher than average risk of having or developing breast cancer. Furthermore, it was shown that fusion rule, such as a weighted median fusion rule and a weighted sum fusion rule may improve performance as compared with the highest performing single classifier.
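The three fusion rules can be sketched directly on a vector of per-classifier suspicion scores; the scores and the validation-derived weights below are invented placeholders.

    # Sum rule, weighted sum rule and weighted median rule for score fusion.
    import numpy as np

    scores = np.array([0.62, 0.55, 0.81, 0.40, 0.70])   # one score per classifier
    weights = np.array([0.30, 0.10, 0.25, 0.15, 0.20])  # e.g. from validation performance

    def sum_rule(s):
        return float(s.mean())

    def weighted_sum_rule(s, w):
        return float(np.dot(s, w) / w.sum())

    def weighted_median_rule(s, w):
        order = np.argsort(s)
        cumw = np.cumsum(w[order]) / w.sum()
        return float(s[order][np.searchsorted(cumw, 0.5)])

    print(sum_rule(scores), weighted_sum_rule(scores, weights),
          weighted_median_rule(scores, weights))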
2013-01-01
intelligently selecting waveform parameters using adaptive algorithms. The adaptive algorithms optimize the waveform parameters based on (1) the EM... the environment. Subject terms: cognitive radar, adaptive sensing, spectrum sensing, multi-objective optimization, genetic algorithms, machine... [Report figure captions: detection and classification block diagram; genetic algorithm block diagram]
Identification of Anisomerous Motor Imagery EEG Signals Based on Complex Algorithms
Zhang, Zhiwen; Duan, Feng; Zhou, Xin; Meng, Zixuan
2017-01-01
Motor imagery (MI) electroencephalograph (EEG) signals are widely applied in brain-computer interface (BCI). However, classified MI states are limited, and their classification accuracy rates are low because of the characteristics of nonlinearity and nonstationarity. This study proposes a novel MI pattern recognition system that is based on complex algorithms for classifying MI EEG signals. In electrooculogram (EOG) artifact preprocessing, band-pass filtering is performed to obtain the frequency band of MI-related signals, and then, canonical correlation analysis (CCA) combined with wavelet threshold denoising (WTD) is used for EOG artifact preprocessing. We propose a regularized common spatial pattern (R-CSP) algorithm for EEG feature extraction by incorporating the principle of generic learning. A new classifier combining the K-nearest neighbor (KNN) and support vector machine (SVM) approaches is used to classify four anisomerous states, namely, imaginary movements with the left hand, right foot, and right shoulder and the resting state. The highest classification accuracy rate is 92.5%, and the average classification accuracy rate is 87%. The proposed complex algorithm identification method can significantly improve the identification rate of the minority samples and the overall classification performance. PMID:28874909
Extraction and classification of 3D objects from volumetric CT data
NASA Astrophysics Data System (ADS)
Song, Samuel M.; Kwon, Junghyun; Ely, Austin; Enyeart, John; Johnson, Chad; Lee, Jongkyu; Kim, Namho; Boyd, Douglas P.
2016-05-01
We propose an Automatic Threat Detection (ATD) algorithm for Explosive Detection System (EDS) using our multistage Segmentation Carving (SC) followed by Support Vector Machine (SVM) classifier. The multi-stage Segmentation and Carving (SC) step extracts all suspect 3-D objects. The feature vector is then constructed for all extracted objects and the feature vector is classified by the Support Vector Machine (SVM) previously learned using a set of ground truth threat and benign objects. The learned SVM classifier has shown to be effective in classification of different types of threat materials. The proposed ATD algorithm robustly deals with CT data that are prone to artifacts due to scatter, beam hardening as well as other systematic idiosyncrasies of the CT data. Furthermore, the proposed ATD algorithm is amenable for including newly emerging threat materials as well as for accommodating data from newly developing sensor technologies. Efficacy of the proposed ATD algorithm with the SVM classifier is demonstrated by the Receiver Operating Characteristics (ROC) curve that relates Probability of Detection (PD) as a function of Probability of False Alarm (PFA). The tests performed using CT data of passenger bags shows excellent performance characteristics.
Szantoi, Zoltan; Escobedo, Francisco J; Abd-Elrahman, Amr; Pearlstine, Leonard; Dewitt, Bon; Smith, Scot
2015-05-01
Mapping of wetlands (marsh vs. swamp vs. upland) is a common remote sensing application. Yet, discriminating between similar freshwater communities such as graminoid/sedge from remotely sensed imagery is more difficult. Most of this activity has been performed using medium to low resolution imagery. There are only a few studies using high spatial resolution imagery and machine learning image classification algorithms for mapping heterogeneous wetland plant communities. This study addresses this void by analyzing whether machine learning classifiers such as decision trees (DT) and artificial neural networks (ANN) can accurately classify graminoid/sedge communities using high resolution aerial imagery and image texture data in the Everglades National Park, Florida. In addition to spectral bands, the normalized difference vegetation index, and first- and second-order texture features derived from the near-infrared band were analyzed. Classifier accuracies were assessed using confusion tables and the calculated kappa coefficients of the resulting maps. The results indicated that an ANN (multilayer perceptron based on backpropagation) algorithm produced a statistically significantly higher accuracy (82.04%) than the DT (QUEST) algorithm (80.48%) or the maximum likelihood (80.56%) classifier (α < 0.05). Findings show that using multiple window sizes provided the best results. First-order texture features also provided computational advantages and results that were not significantly different from those using second-order texture features.
Rodriguez-Diaz, Eladio; Castanon, David A; Singh, Satish K; Bigio, Irving J
2011-06-01
Optical spectroscopy has shown potential as a real-time, in vivo, diagnostic tool for identifying neoplasia during endoscopy. We present the development of a diagnostic algorithm to classify elastic-scattering spectroscopy (ESS) spectra as either neoplastic or non-neoplastic. The algorithm is based on pattern recognition methods, including ensemble classifiers, in which members of the ensemble are trained on different regions of the ESS spectrum, and misclassification-rejection, where the algorithm identifies and refrains from classifying samples that are at higher risk of being misclassified. These "rejected" samples can be reexamined by simply repositioning the probe to obtain additional optical readings or ultimately by sending the polyp for histopathological assessment, as per standard practice. Prospective validation using separate training and testing sets results in a baseline performance of sensitivity = .83, specificity = .79, using the standard framework of feature extraction (principal component analysis) followed by classification (with linear support vector machines). With the developed algorithm, performance improves to Se ∼ 0.90, Sp ∼ 0.90, at a cost of rejecting 20-33% of the samples. These results are on par with a panel of expert pathologists. For colonoscopic prevention of colorectal cancer, our system could reduce biopsy risk and cost, obviate retrieval of non-neoplastic polyps, decrease procedure time, and improve assessment of cancer risk.
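A simplified sketch of the misclassification-rejection idea is shown below: linear SVMs trained on different slices of a feature vector stand in for the spectral-region ensemble, and samples on which the members do not fully agree are withheld from classification. The data, slice boundaries, and agreement threshold are assumptions.

    # Ensemble of region-wise classifiers with a simple agreement-based reject option.
    import numpy as np
    from sklearn.svm import LinearSVC
    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=400, n_features=90, n_informative=20,
                               random_state=1)
    slices = [slice(0, 30), slice(30, 60), slice(60, 90)]   # stand-in spectral regions
    members = [LinearSVC().fit(X[:300, s], y[:300]) for s in slices]

    def classify_or_reject(x, agreement=1.0):
        votes = [int(m.predict(x[s].reshape(1, -1))[0]) for m, s in zip(members, slices)]
        frac = max(votes.count(v) for v in set(votes)) / len(votes)
        return max(set(votes), key=votes.count) if frac >= agreement else "rejected"

    decisions = [classify_or_reject(x) for x in X[300:]]
    print("rejected:", decisions.count("rejected"), "of", len(decisions))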
Classifying Imbalanced Data Streams via Dynamic Feature Group Weighting with Importance Sampling.
Wu, Ke; Edwards, Andrea; Fan, Wei; Gao, Jing; Zhang, Kun
2014-04-01
Data stream classification and imbalanced data learning are two important areas of data mining research. Each has been well studied to date, with many interesting algorithms developed. However, only a few approaches reported in the literature address the intersection of these two fields due to their complex interplay. In this work, we proposed an importance sampling driven, dynamic feature group weighting framework (DFGW-IS) for classifying data streams of imbalanced distribution. Two components are tightly incorporated into the proposed approach to address the intrinsic characteristics of concept-drifting, imbalanced streaming data. Specifically, the ever-evolving concepts are tackled by a weighted ensemble trained on a set of feature groups, with each sub-classifier (i.e. a single classifier or an ensemble) weighted by its discriminative power and stability. The uneven class distribution, on the other hand, is handled by the sub-classifier built on a specific feature group, with the underlying distribution rebalanced by the importance sampling technique. We derived the theoretical upper bound for the generalization error of the proposed algorithm. We also studied the empirical performance of our method on a set of benchmark synthetic and real world data, and significant improvement has been achieved over the competing algorithms in terms of standard evaluation metrics and parallel running time. Algorithm implementations and datasets are available upon request.
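A much-simplified, non-streaming sketch of the feature-group weighting idea follows: one sub-classifier per feature group, each trained on a class-rebalanced sample and weighted by its held-out balanced accuracy. Concept-drift handling and the actual importance-sampling scheme are omitted; random undersampling stands in for it, and all splits and group boundaries are placeholders.

    # Weighted ensemble over feature groups for imbalanced data (simplified).
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import balanced_accuracy_score
    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=1000, n_features=20, weights=[0.9, 0.1],
                               random_state=0)
    X_tr, y_tr, X_va, y_va = X[:700], y[:700], X[700:], y[700:]
    groups = [list(range(0, 7)), list(range(7, 14)), list(range(14, 20))]

    def rebalance(Xa, ya, rng):
        minority = np.flatnonzero(ya == 1)
        majority = rng.choice(np.flatnonzero(ya == 0), size=minority.size, replace=False)
        idx = np.concatenate([minority, majority])
        return Xa[idx], ya[idx]

    rng = np.random.default_rng(0)
    members, weights = [], []
    for g in groups:
        Xb, yb = rebalance(X_tr[:, g], y_tr, rng)
        clf = DecisionTreeClassifier(max_depth=4, random_state=0).fit(Xb, yb)
        members.append((clf, g))
        weights.append(balanced_accuracy_score(y_va, clf.predict(X_va[:, g])))

    def predict(x):
        votes = np.array([clf.predict(x[g].reshape(1, -1))[0] for clf, g in members])
        return int(np.dot(weights, votes) / sum(weights) >= 0.5)

    print(balanced_accuracy_score(y_va, [predict(x) for x in X_va]))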
Proposed hybrid-classifier ensemble algorithm to map snow cover area
NASA Astrophysics Data System (ADS)
Nijhawan, Rahul; Raman, Balasubramanian; Das, Josodhir
2018-01-01
A metaclassification ensemble approach is known to improve the prediction performance for snow-covered area. The methodology adopted here is based on a neural network along with four state-of-the-art machine learning algorithms, support vector machine, artificial neural networks, spectral angle mapper, and K-means clustering, together with a snow index, the normalized difference snow index. An AdaBoost ensemble algorithm based on decision trees for snow-cover mapping is also proposed. According to the available literature, these methods have rarely been used for snow-cover mapping. Employing the above techniques, a study was conducted for the Raktavarn and Chaturangi Bamak glaciers, Uttarakhand, Himalaya, using a multispectral Landsat 7 ETM+ (enhanced thematic mapper) image. The study also compares the results with those obtained from statistical combination methods (majority rule and belief functions) and with the accuracies of the individual classifiers. Accuracy assessment is performed by computing the quantity and allocation disagreement, analyzing statistical measures (accuracy, precision, specificity, AUC, and sensitivity) and receiver operating characteristic curves. A total of 225 parameter combinations for the individual classifiers were trained and tested on the dataset, and the results were compared with the proposed approach. It was observed that the proposed methodology produced the highest classification accuracy (95.21%), close to the 94.01% produced by the proposed AdaBoost ensemble algorithm. From these observations, it was concluded that the ensemble of classifiers produced better results than the individual classifiers.
Seizures in the elderly: development and validation of a diagnostic algorithm.
Dupont, Sophie; Verny, Marc; Harston, Sandrine; Cartz-Piver, Leslie; Schück, Stéphane; Martin, Jennifer; Puisieux, François; Alecu, Cosmin; Vespignani, Hervé; Marchal, Cécile; Derambure, Philippe
2010-05-01
Seizures are frequent in the elderly, but their diagnosis can be challenging. The objective of this work was to develop and validate an expert-based algorithm for the diagnosis of seizures in elderly people. A multidisciplinary group of neurologists and geriatricians developed a diagnostic algorithm using a combination of selected clinical, electroencephalographical and radiological criteria. The algorithm was validated by multicentre retrospective analysis of data of patients referred for specific symptoms and classified by the experts as epileptic patients or not. The algorithm was applied to all the patients, and the diagnosis provided by the algorithm was compared to the clinical diagnosis of the experts. Twenty-nine clinical, electroencephalographical and radiological criteria were selected for the algorithm. According to the criteria combination, seizures were classified into four levels of diagnosis: certain, highly probable, possible or improbable. To validate the algorithm, the medical records of 269 elderly patients were analyzed (138 with epileptic seizures, 131 with non-epileptic manifestations). Patients were mainly referred for a transient focal deficit (40%), confusion (38%), or unconsciousness (27%). The algorithm best classified certain and probable seizures versus possible and improbable seizures, with 86.2% sensitivity and 67.2% specificity. Using logistic regression, two simplified models were developed, the first with 13 criteria (Se 85.5%, Sp 90.1%), and the second with 7 criteria only (Se 84.8%, Sp 88.6%). In conclusion, the present study validated the use of a revised diagnostic algorithm to help diagnose epileptic seizures in the elderly. A prospective study is planned to further validate this algorithm. Copyright 2010 Elsevier B.V. All rights reserved.
Dementia diagnoses from clinical and neuropsychological data compared: the Cache County study.
Tschanz, J T; Welsh-Bohmer, K A; Skoog, I; West, N; Norton, M C; Wyse, B W; Nickles, R; Breitner, J C
2000-03-28
To validate a neuropsychological algorithm for dementia diagnosis. We developed a neuropsychological algorithm in a sample of 1,023 elderly residents of Cache County, UT. We compared algorithmic and clinical dementia diagnoses both based on DSM-III-R criteria. The algorithm diagnosed dementia when there was impairment in memory and at least one other cognitive domain. We also tested a variant of the algorithm that incorporated functional measures that were based on structured informant reports. Of 1,023 participants, 87% could be classified by the basic algorithm, 94% when functional measures were considered. There was good concordance between basic psychometric and clinical diagnoses (79% agreement, kappa = 0.57). This improved after incorporating functional measures (90% agreement, kappa = 0.76). Neuropsychological algorithms may reasonably classify individuals on dementia status across a range of severity levels and ages and may provide a useful adjunct to clinical diagnoses in population studies.
Warehouse stocking optimization based on dynamic ant colony genetic algorithm
NASA Astrophysics Data System (ADS)
Xiao, Xiaoxu
2018-04-01
In view of the varied orders handled by FAW (First Automotive Works) International Logistics Co., Ltd., the SLP method is used to optimize the layout of the enterprise's warehousing units, thereby optimizing warehouse logistics and improving the external processing speed of orders. In addition, relevant intelligent algorithms for optimizing the stocking route problem are analyzed. The ant colony algorithm and the genetic algorithm, which have good applicability, are studied in detail. The parameters of the ant colony algorithm are optimized by the genetic algorithm, which improves the performance of the ant colony algorithm. A typical path optimization problem model is taken as an example to prove the effectiveness of the parameter optimization.
A controlled genetic algorithm by fuzzy logic and belief functions for job-shop scheduling.
Hajri, S; Liouane, N; Hammadi, S; Borne, P
2000-01-01
Most scheduling problems are highly complex combinatorial problems. However, stochastic methods such as genetic algorithm yield good solutions. In this paper, we present a controlled genetic algorithm (CGA) based on fuzzy logic and belief functions to solve job-shop scheduling problems. For better performance, we propose an efficient representational scheme, heuristic rules for creating the initial population, and a new methodology for mixing and computing genetic operator probabilities.
Experimental Performance of a Genetic Algorithm for Airborne Strategic Conflict Resolution
NASA Technical Reports Server (NTRS)
Karr, David A.; Vivona, Robert A.; Roscoe, David A.; DePascale, Stephen M.; Consiglio, Maria
2009-01-01
The Autonomous Operations Planner, a research prototype flight-deck decision support tool to enable airborne self-separation, uses a pattern-based genetic algorithm to resolve predicted conflicts between the ownship and traffic aircraft. Conflicts are resolved by modifying the active route within the ownship's flight management system according to a predefined set of maneuver pattern templates. The performance of this pattern-based genetic algorithm was evaluated in the context of batch-mode Monte Carlo simulations running over 3600 flight hours of autonomous aircraft in en-route airspace under conditions ranging from typical current traffic densities to several times that level. Encountering over 8900 conflicts during two simulation experiments, the genetic algorithm was able to resolve all but three conflicts, while maintaining a required time of arrival constraint for most aircraft. Actual elapsed running time for the algorithm was consistent with conflict resolution in real time. The paper presents details of the genetic algorithm's design, along with mathematical models of the algorithm's performance and observations regarding the effectiveness of using complementary maneuver patterns when multiple resolutions by the same aircraft were required.
Methods for data classification
Garrity, George [Okemos, MI; Lilburn, Timothy G [Front Royal, VA
2011-10-11
The present invention provides methods for classifying data and uncovering and correcting annotation errors. In particular, the present invention provides a self-organizing, self-correcting algorithm for use in classifying data. Additionally, the present invention provides a method for classifying biological taxa.
Liu, Yanqiu; Lu, Huijuan; Yan, Ke; Xia, Haixia; An, Chunlin
2016-01-01
Embedding cost-sensitive factors into classifiers increases classification stability and reduces classification costs when classifying large-scale, redundant, and imbalanced datasets, such as gene expression data. In this study, we extend our previous work, the Dissimilar ELM (D-ELM), by introducing misclassification costs into the classifier. We name the proposed algorithm the cost-sensitive D-ELM (CS-D-ELM). Furthermore, we embed a rejection cost into the CS-D-ELM to increase the classification stability of the proposed algorithm. Experimental results show that the rejection-cost-embedded CS-D-ELM algorithm effectively reduces the average and overall cost of the classification process, while the classification accuracy remains competitive. The proposed method can be extended to classification problems on other redundant and imbalanced data.
Wang, Jun; Zhou, Bi-hua; Zhou, Shu-dao; Sheng, Zheng
2015-01-01
The paper proposes a novel function expression method to forecast chaotic time series, using an improved genetic-simulated annealing (IGSA) algorithm to establish the optimum function expression that describes the behavior of time series. In order to deal with the weakness associated with the genetic algorithm, the proposed algorithm incorporates the simulated annealing operation which has the strong local search ability into the genetic algorithm to enhance the performance of optimization; besides, the fitness function and genetic operators are also improved. Finally, the method is applied to the chaotic time series of Quadratic and Rossler maps for validation. The effect of noise in the chaotic time series is also studied numerically. The numerical results verify that the method can forecast chaotic time series with high precision and effectiveness, and the forecasting precision with certain noise is also satisfactory. It can be concluded that the IGSA algorithm is energy-efficient and superior. PMID:26000011
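The genetic-simulated-annealing idea can be sketched by accepting GA offspring with a Metropolis criterion at a decreasing temperature, so slightly worse children may survive early in the run; the paper's function-expression search is replaced here by a toy numeric minimisation, and the schedule and operators are assumptions.

    # GA with a simulated-annealing (Metropolis) acceptance step for offspring.
    import math
    import random

    def cost(x):
        return (x - 1.7) ** 2 + 0.3 * math.sin(8 * x)   # toy objective

    def accept(old, new, temperature):
        return new <= old or random.random() < math.exp((old - new) / temperature)

    pop = [random.uniform(-5, 5) for _ in range(30)]
    temperature = 1.0
    for _ in range(200):
        pop.sort(key=cost)
        parents = pop[:10]
        children = []
        for _ in range(20):
            a, b = random.sample(parents, 2)
            child = (a + b) / 2 + random.gauss(0, 0.3)   # crossover + mutation
            keep = child if accept(cost(a), cost(child), temperature) else a
            children.append(keep)
        pop = parents + children
        temperature *= 0.98                              # annealing schedule
    print(round(min(pop, key=cost), 4))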
Scalability problems of simple genetic algorithms.
Thierens, D
1999-01-01
Scalable evolutionary computation has become an intensively studied research topic in recent years. The issue of scalability is predominant in any field of algorithmic design, but it became particularly relevant for the design of competent genetic algorithms once the scalability problems of simple genetic algorithms were understood. Here we present some of the work that has aided in getting a clear insight into the scalability problems of simple genetic algorithms. In particular, we discuss the important issue of building block mixing. We show how the need for mixing places a boundary in the GA parameter space that, together with the boundary from the schema theorem, delimits the region where the GA converges reliably to the optimum in problems of bounded difficulty. This region shrinks rapidly with increasing problem size unless the building blocks are tightly linked in the problem coding structure. In addition, we look at how straightforward extensions of the simple genetic algorithm, namely elitism, niching, and restricted mating, do not significantly improve its scalability.
Comparison of artificial intelligence classifiers for SIP attack data
NASA Astrophysics Data System (ADS)
Safarik, Jakub; Slachta, Jiri
2016-05-01
A honeypot application is a source of valuable data about attacks on the network. We run several SIP honeypots in various computer networks, which are separated geographically and logically. Each honeypot runs on a public IP address and uses standard SIP PBX ports. All information gathered via the honeypots is periodically sent to a centralized server. This server classifies all attack data with a neural network algorithm. The paper describes optimizations of a neural network classifier that lower the classification error. The article contains a comparison of two neural network algorithms used for the classification of validation data. The first is the original implementation of the neural network described in recent work; the second neural network uses further optimizations such as input normalization and a cross-entropy cost function. We also use other implementations of neural networks and machine learning classification algorithms. The comparison tests their capabilities on validation data to find the optimal classifier. The results show promise for further development of an accurate SIP attack classification engine.
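The two optimizations mentioned above, z-score normalization of the inputs and a cross-entropy cost on the classifier output, are shown in miniature below; the attack features and network outputs are placeholders.

    # Input normalization and cross-entropy cost in miniature.
    import numpy as np

    def normalize(X):
        return (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)

    def cross_entropy(y_true, p_pred, eps=1e-12):
        p = np.clip(p_pred, eps, 1 - eps)
        return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

    X = np.array([[120.0, 3.0], [4000.0, 1.0], [250.0, 7.0]])  # e.g. call rate, distinct URIs
    y = np.array([0, 1, 0])
    p = np.array([0.2, 0.7, 0.1])                              # hypothetical network outputs
    print(normalize(X))
    print("cross-entropy:", round(cross_entropy(y, p), 4))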
Statistical process control using optimized neural networks: a case study.
Addeh, Jalil; Ebrahimzadeh, Ata; Azarbad, Milad; Ranaee, Vahid
2014-09-01
The most common statistical process control (SPC) tools employed for monitoring process changes are control charts. A control chart demonstrates that the process has altered by generating an out-of-control signal. This study investigates the design of an accurate system for control chart pattern (CCP) recognition in two respects. First, an efficient system is introduced that includes two main modules: a feature extraction module and a classifier module. In the feature extraction module, a proper set of shape features and statistical features is proposed as an efficient characterization of the patterns. In the classifier module, several neural networks, such as the multilayer perceptron, probabilistic neural network and radial basis function network, are investigated. Based on an experimental study, the best classifier is chosen to recognize the CCPs. Second, a hybrid heuristic recognition system based on the cuckoo optimization algorithm (COA) is introduced to improve the generalization performance of the classifier. The simulation results show that the proposed algorithm has high recognition accuracy. Copyright © 2013 ISA. Published by Elsevier Ltd. All rights reserved.
Fluorescence intensity positivity classification of Hep-2 cells images using fuzzy logic
NASA Astrophysics Data System (ADS)
Sazali, Dayang Farzana Abang; Janier, Josefina Barnachea; May, Zazilah Bt.
2014-10-01
Indirect immunofluorescence (IIF) is the gold standard for the antinuclear autoantibody (ANA) test using Hep-2 cells to determine specific diseases. Different classifier algorithms have been proposed in previous works; however, there is still no validated standard for classifying the fluorescence intensity. This paper presents the use of fuzzy logic to classify the fluorescence intensity and to determine the positivity of Hep-2 cell serum samples. The method involves image pre-processing by filtering noise and smoothing the image, converting the red, green and blue (RGB) color space of the images to the LAB color space (a luminosity layer and chromaticity layers "a" and "b"), extracting the mean values of the lightness and chromaticity layer "a", and classifying them with a fuzzy logic algorithm based on the standard score ranges of ANA fluorescence intensity. Using 100 data sets of positive and intermediate fluorescence intensity to test the performance, the fuzzy logic approach obtained accuracies of 85% and 87% for the intermediate and positive classes, respectively.
Ensemble Methods for Classification of Physical Activities from Wrist Accelerometry.
Chowdhury, Alok Kumar; Tjondronegoro, Dian; Chandran, Vinod; Trost, Stewart G
2017-09-01
To investigate whether the use of ensemble learning algorithms improves physical activity recognition accuracy compared to single classifier algorithms, and to compare the classification accuracy achieved by three conventional ensemble machine learning methods (bagging, boosting, random forest) and a custom ensemble model comprising four algorithms commonly used for activity recognition (binary decision tree, k nearest neighbor, support vector machine, and neural network). The study used three independent data sets that included wrist-worn accelerometer data. For each data set, a four-step classification framework consisting of data preprocessing, feature extraction, normalization and feature selection, and classifier training and testing was implemented. For the custom ensemble, decisions from the single classifiers were aggregated using three decision fusion methods: weighted majority vote, naïve Bayes combination, and behavior knowledge space combination. Classifiers were cross-validated using leave-one-subject-out cross-validation and compared on the basis of average F1 scores. In all three data sets, ensemble learning methods consistently outperformed the individual classifiers. Among the conventional ensemble methods, random forest models provided consistently high activity recognition accuracy; however, the custom ensemble model using weighted majority voting demonstrated the highest classification accuracy in two of the three data sets. Combining multiple individual classifiers using conventional or custom ensemble learning methods can improve activity recognition accuracy from wrist-worn accelerometer data.
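A small illustration of the custom-ensemble idea with weighted majority voting, one of the three fusion methods named above; the four base learners match the abstract, but the synthetic data and the cross-validation-accuracy weights are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for windowed accelerometer features and activity labels.
X, y = make_classification(n_samples=1500, n_features=30, n_informative=10,
                           n_classes=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

base = [("dt", DecisionTreeClassifier(random_state=0)),
        ("knn", KNeighborsClassifier()),
        ("svm", SVC(random_state=0)),
        ("ann", MLPClassifier(max_iter=600, random_state=0))]

# Weight each base learner by its cross-validated training accuracy, then
# fuse the individual decisions with a weighted majority (hard) vote.
weights = [cross_val_score(model, X_tr, y_tr, cv=5).mean() for _, model in base]
ensemble = VotingClassifier(estimators=base, voting="hard", weights=weights)
ensemble.fit(X_tr, y_tr)
print("weighted-majority ensemble accuracy:", ensemble.score(X_te, y_te))
```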
Yousef, Malik; Khalifa, Waleed; AbedAllah, Loai
2016-12-22
The performance of many learning and data mining algorithms depends critically on suitable metrics to assess efficiency over the input space. Learning a suitable metric from examples may, therefore, be the key to successful application of these algorithms. We have demonstrated that k-nearest neighbor (kNN) classification can be significantly improved by learning a distance metric from labeled examples. A clustering ensemble is used to define the distance between points with respect to how they co-cluster. This distance is then used within the framework of the kNN algorithm to define a classifier named the ensemble clustering kNN classifier (EC-kNN). In many instances in our experiments we achieved the highest accuracy while SVM failed to perform as well. In this study, we compare the performance of a two-class classifier using EC-kNN with different one-class and two-class classifiers. The comparison was applied to seven different plant microRNA species considering eight feature selection methods. The averaged results show that EC-kNN outperforms all other methods employed here as well as previously published results for the same data. In conclusion, this study shows that the chosen classifier shows high performance when the distance metric is carefully chosen.
Yousef, Malik; Khalifa, Waleed; AbdAllah, Loai
2016-12-01
The performance of many learning and data mining algorithms depends critically on suitable metrics to assess efficiency over the input space. Learning a suitable metric from examples may, therefore, be the key to successful application of these algorithms. We have demonstrated that k-nearest neighbor (kNN) classification can be significantly improved by learning a distance metric from labeled examples. A clustering ensemble is used to define the distance between points with respect to how they co-cluster. This distance is then used within the framework of the kNN algorithm to define a classifier named the ensemble clustering kNN classifier (EC-kNN). In many instances in our experiments we achieved the highest accuracy while SVM failed to perform as well. In this study, we compare the performance of a two-class classifier using EC-kNN with different one-class and two-class classifiers. The comparison was applied to seven different plant microRNA species considering eight feature selection methods. The averaged results show that EC-kNN outperforms all other methods employed here as well as previously published results for the same data. In conclusion, this study shows that the chosen classifier shows high performance when the distance metric is carefully chosen.
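A sketch of the co-clustering distance behind EC-kNN as described: points that rarely end up in the same cluster across an ensemble of clusterings are treated as far apart, and that distance feeds a kNN classifier. Repeated k-means with different k values, the transductive use of train and test points in the ensemble, and all parameter choices are assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Clustering ensemble over all points (a transductive simplification).
X_all = np.vstack([X_tr, X_te])
labels = np.array([KMeans(n_clusters=k, n_init=5, random_state=i).fit_predict(X_all)
                   for i, k in enumerate([4, 6, 8, 10, 12])])

# Co-clustering distance: fraction of ensemble members in which two points
# are NOT assigned to the same cluster.
co_cluster = np.mean(labels[:, :, None] == labels[:, None, :], axis=0)
dist = 1.0 - co_cluster

n_tr = len(X_tr)
knn = KNeighborsClassifier(n_neighbors=5, metric="precomputed")
knn.fit(dist[:n_tr, :n_tr], y_tr)                    # train-to-train distances
print("EC-kNN-style accuracy:", knn.score(dist[n_tr:, :n_tr], y_te))
```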
An investigation of messy genetic algorithms
NASA Technical Reports Server (NTRS)
Goldberg, David E.; Deb, Kalyanmoy; Korb, Bradley
1990-01-01
Genetic algorithms (GAs) are search procedures based on the mechanics of natural selection and natural genetics. They combine the use of string codings, or artificial chromosomes, and populations with the selective and juxtapositional power of reproduction and recombination to form a surprisingly powerful search heuristic for many problems. Despite their empirical success, there has been a long-standing objection to the use of GAs in arbitrarily difficult problems, which motivated a new approach. Results on a 30-bit, order-three deception problem were obtained using a new type of genetic algorithm called a messy genetic algorithm (mGA). Messy genetic algorithms combine the use of variable-length strings, a two-phase selection scheme, and messy genetic operators to overcome the fixed-coding problem of standard simple GAs. The results of a study of mGAs on problems with nonuniform subfunction scale and size are presented. The mGA approach is summarized, covering both its operation and the theory of its use. Experiments on problems of varying scale, varying building-block size, and combined varying scale and size are presented.
Global Optimization of a Periodic System using a Genetic Algorithm
NASA Astrophysics Data System (ADS)
Stucke, David; Crespi, Vincent
2001-03-01
We use a novel application of a genetic algorithm global optimization technique to find the lowest-energy structures for periodic systems. We apply this technique to colloidal crystals for several different stoichiometries of binary and ternary colloidal crystals. This application of a genetic algorithm is described and results for likely candidate structures are presented.
Research and application of multi-agent genetic algorithm in tower defense game
NASA Astrophysics Data System (ADS)
Jin, Shaohua
2018-04-01
In this paper, a new multi-agent genetic algorithm based on orthogonal experiments is proposed, combining a multi-agent system, a genetic algorithm, and orthogonal experimental design. The algorithm includes the design of a neighborhood competition operator, an orthogonal crossover operator, and a self-learning operator. The new algorithm is applied to a mobile tower defense game: mathematical models are established according to the characteristics of the game, and the algorithm ultimately increases the value of the game's monsters.
Aerodynamic Shape Optimization Using A Real-Number-Encoded Genetic Algorithm
NASA Technical Reports Server (NTRS)
Holst, Terry L.; Pulliam, Thomas H.
2001-01-01
A new method for aerodynamic shape optimization using a genetic algorithm with real number encoding is presented. The algorithm is used to optimize three different problems, a simple hill climbing problem, a quasi-one-dimensional nozzle problem using an Euler equation solver and a three-dimensional transonic wing problem using a nonlinear potential solver. Results indicate that the genetic algorithm is easy to implement and extremely reliable, being relatively insensitive to design space noise.
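A minimal real-number-encoded genetic algorithm, with BLX-alpha blend crossover and Gaussian mutation on a smooth toy objective standing in for the flow-solver-based fitness; the NASA report's specific operators and test problems are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(1)

def fitness(x):
    # Smooth toy objective to maximize; stands in for an aerodynamic metric.
    return -np.sum((x - 0.3) ** 2, axis=-1)

def real_coded_ga(dim=8, pop_size=40, generations=200, alpha=0.5, sigma=0.05):
    pop = rng.uniform(0.0, 1.0, (pop_size, dim))     # real-valued chromosomes
    for _ in range(generations):
        f = fitness(pop)
        # Tournament selection directly on the real-number encoding.
        a, b = rng.integers(0, pop_size, (2, pop_size))
        parents = np.where((f[a] > f[b])[:, None], pop[a], pop[b])
        # BLX-alpha blend crossover between randomly paired parents.
        mates = rng.permutation(parents)
        span = np.abs(parents - mates)
        lo = np.minimum(parents, mates) - alpha * span
        hi = np.maximum(parents, mates) + alpha * span
        children = rng.uniform(lo, hi)
        # Gaussian mutation, then clip to the design-variable bounds.
        children += rng.normal(0.0, sigma, children.shape)
        pop = np.clip(children, 0.0, 1.0)
    best = pop[np.argmax(fitness(pop))]
    return best, fitness(best)

print(real_coded_ga())
```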
Guinness, Robert E
2015-04-28
This paper presents the results of research on the use of smartphone sensors (namely, GPS and accelerometers), geospatial information (points of interest, such as bus stops and train stations) and machine learning (ML) to sense mobility contexts. Our goal is to develop techniques to continuously and automatically detect a smartphone user's mobility activities, including walking, running, driving and using a bus or train, in real-time or near-real-time (<5 s). We investigated a wide range of supervised learning techniques for classification, including decision trees (DT), support vector machines (SVM), naive Bayes classifiers (NB), Bayesian networks (BN), logistic regression (LR), artificial neural networks (ANN) and several instance-based classifiers (KStar, LWL and IBk). Applying ten-fold cross-validation, the best performers in terms of correct classification rate (i.e., recall) were DT (96.5%), BN (90.9%), LWL (95.5%) and KStar (95.6%). In particular, the DT-algorithm RandomForest exhibited the best overall performance. After a feature selection process for a subset of algorithms, the performance was improved slightly. Furthermore, after tuning the parameters of RandomForest, performance improved to above 97.5%. Lastly, we measured the computational complexity of the classifiers, in terms of central processing unit (CPU) time needed for classification, to provide a rough comparison between the algorithms in terms of battery usage requirements. As a result, the classifiers can be ranked from lowest to highest complexity (i.e., computational cost) as follows: SVM, ANN, LR, BN, DT, NB, IBk, LWL and KStar. The instance-based classifiers take considerably more computational time than the non-instance-based classifiers, whereas the slowest non-instance-based classifier (NB) required about five times as much CPU time as the fastest classifier (SVM). The above results suggest that DT algorithms are excellent candidates for detecting mobility contexts in smartphones, both in terms of performance and computational complexity.
Guinness, Robert E.
2015-01-01
This paper presents the results of research on the use of smartphone sensors (namely, GPS and accelerometers), geospatial information (points of interest, such as bus stops and train stations) and machine learning (ML) to sense mobility contexts. Our goal is to develop techniques to continuously and automatically detect a smartphone user's mobility activities, including walking, running, driving and using a bus or train, in real-time or near-real-time (<5 s). We investigated a wide range of supervised learning techniques for classification, including decision trees (DT), support vector machines (SVM), naive Bayes classifiers (NB), Bayesian networks (BN), logistic regression (LR), artificial neural networks (ANN) and several instance-based classifiers (KStar, LWL and IBk). Applying ten-fold cross-validation, the best performers in terms of correct classification rate (i.e., recall) were DT (96.5%), BN (90.9%), LWL (95.5%) and KStar (95.6%). In particular, the DT-algorithm RandomForest exhibited the best overall performance. After a feature selection process for a subset of algorithms, the performance was improved slightly. Furthermore, after tuning the parameters of RandomForest, performance improved to above 97.5%. Lastly, we measured the computational complexity of the classifiers, in terms of central processing unit (CPU) time needed for classification, to provide a rough comparison between the algorithms in terms of battery usage requirements. As a result, the classifiers can be ranked from lowest to highest complexity (i.e., computational cost) as follows: SVM, ANN, LR, BN, DT, NB, IBk, LWL and KStar. The instance-based classifiers take considerably more computational time than the non-instance-based classifiers, whereas the slowest non-instance-based classifier (NB) required about five times as much CPU time as the fastest classifier (SVM). The above results suggest that DT algorithms are excellent candidates for detecting mobility contexts in smartphones, both in terms of performance and computational complexity. PMID:25928060
Genetic algorithms as global random search methods
NASA Technical Reports Server (NTRS)
Peck, Charles C.; Dhawan, Atam P.
1995-01-01
Genetic algorithm behavior is described in terms of the construction and evolution of the sampling distributions over the space of candidate solutions. This novel perspective is motivated by analysis indicating that the schema theory is inadequate for completely and properly explaining genetic algorithm behavior. Based on the proposed theory, it is argued that the similarities of candidate solutions should be exploited directly, rather than encoding candidate solutions and then exploiting their similarities. Proportional selection is characterized as a global search operator, and recombination is characterized as the search process that exploits similarities. Sequential algorithms and many deletion methods are also analyzed. It is shown that by properly constraining the search breadth of recombination operators, convergence of genetic algorithms to a global optimum can be ensured.
Genetic Algorithm Calibration of Probabilistic Cellular Automata for Modeling Mining Permit Activity
Louis, S.J.; Raines, G.L.
2003-01-01
We use a genetic algorithm to calibrate a spatially and temporally resolved cellular automaton to model mining activity on public land in Idaho and western Montana. The genetic algorithm searches through a space of transition rule parameters of a two-dimensional cellular automaton model to find rule parameters that fit observed mining activity data. Previous work by one of the authors in calibrating the cellular automaton took weeks; the genetic algorithm takes a day and produces rules leading to about the same (or better) fit to observed data. These preliminary results indicate that genetic algorithms are a viable tool for calibrating cellular automata in this application. Experience gained during the calibration of this cellular automaton suggests that mineral resource information is a critical factor in the quality of the results. With automated calibration, further refinements of how the mineral-resource information is provided to the cellular automaton will probably improve our model.
Hybrid genetic algorithm in the Hopfield network for maximum 2-satisfiability problem
NASA Astrophysics Data System (ADS)
Kasihmuddin, Mohd Shareduwan Mohd; Sathasivam, Saratha; Mansor, Mohd. Asyraf
2017-08-01
Heuristic methods are designed to find optimal solutions more quickly than classical methods, which are often too complex to apply. In this study, a hybrid approach that utilizes a Hopfield network and a genetic algorithm for the maximum 2-satisfiability problem (MAX-2SAT) is proposed. The Hopfield neural network is used to minimize logical inconsistency in interpretations of logic clauses or programs. The genetic algorithm (GA) pioneered methods that exploit the idea of recombination to reproduce better solutions. Simulations with and without the genetic algorithm were carried out using Microsoft Visual C++ 2013 Express. The performance of both search techniques on MAX-2SAT was evaluated based on the global minima ratio, the ratio of satisfied clauses, and computation time. The results obtained from the computer simulation demonstrate the effectiveness and acceleration features of the genetic algorithm for MAX-2SAT in the Hopfield network.
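A compact illustration of the MAX-2SAT objective driven by a plain bitstring GA, where fitness is simply the number of satisfied clauses; the Hopfield-network component and the exact hybridization of the study are not shown, and the instance below is made up.

```python
import random

random.seed(0)

# A toy MAX-2SAT instance: each clause has two literals; a positive integer i
# means "variable i is true", a negative integer means its negation.
clauses = [(1, -2), (-1, 3), (2, 3), (-3, 4), (1, 4), (-2, -4)]
n_vars = 4

def fitness(assignment):
    # Number of satisfied clauses -- the quantity MAX-2SAT maximizes.
    return sum(any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses)

def ga_max2sat(pop_size=20, generations=50, p_mut=0.1):
    pop = [[random.random() < 0.5 for _ in range(n_vars)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]                 # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            p1, p2 = random.sample(parents, 2)
            cut = random.randrange(1, n_vars)          # one-point crossover
            child = [not bit if random.random() < p_mut else bit
                     for bit in p1[:cut] + p2[cut:]]   # bit-flip mutation
            children.append(child)
        pop = parents + children
    best = max(pop, key=fitness)
    return best, fitness(best), len(clauses)

print(ga_max2sat())   # assignment, satisfied clauses, total clauses
```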
Genetic Algorithm for Traveling Salesman Problem with Modified Cycle Crossover Operator
Mohamd Shoukry, Alaa; Gani, Showkat
2017-01-01
Genetic algorithms are evolutionary techniques used for optimization according to the survival-of-the-fittest idea. These methods do not guarantee optimal solutions; however, they usually give good approximations in reasonable time. Genetic algorithms are useful for NP-hard problems, especially the traveling salesman problem. A genetic algorithm depends on selection criteria, crossover, and mutation operators. To tackle the traveling salesman problem using genetic algorithms, there are various representations such as binary, path, adjacency, ordinal, and matrix representations. In this article, we propose a new crossover operator for the traveling salesman problem to minimize the total distance. This approach is linked with path representation, which is the most natural way to represent a legal tour. Computational results are also reported for some traditional path representation methods, such as partially mapped and order crossovers, along with the new cycle crossover operator on some benchmark TSPLIB instances, and improvements were found. PMID:29209364
Genetic Algorithm for Traveling Salesman Problem with Modified Cycle Crossover Operator.
Hussain, Abid; Muhammad, Yousaf Shad; Nauman Sajid, M; Hussain, Ijaz; Mohamd Shoukry, Alaa; Gani, Showkat
2017-01-01
Genetic algorithms are evolutionary techniques used for optimization according to the survival-of-the-fittest idea. These methods do not guarantee optimal solutions; however, they usually give good approximations in reasonable time. Genetic algorithms are useful for NP-hard problems, especially the traveling salesman problem. A genetic algorithm depends on selection criteria, crossover, and mutation operators. To tackle the traveling salesman problem using genetic algorithms, there are various representations such as binary, path, adjacency, ordinal, and matrix representations. In this article, we propose a new crossover operator for the traveling salesman problem to minimize the total distance. This approach is linked with path representation, which is the most natural way to represent a legal tour. Computational results are also reported for some traditional path representation methods, such as partially mapped and order crossovers, along with the new cycle crossover operator on some benchmark TSPLIB instances, and improvements were found.
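For reference, a sketch of the classical cycle crossover (CX) on the path representation, the baseline the article's modified operator builds on; the modified operator itself is defined in the paper and not reproduced here.

```python
def cycle_crossover(p1, p2):
    """Classical cycle crossover (CX) for permutation-encoded tours."""
    n = len(p1)
    child = [None] * n
    idx = 0
    # Copy the cycle that starts at position 0 from parent 1 ...
    while child[idx] is None:
        child[idx] = p1[idx]
        idx = p1.index(p2[idx])
    # ... and fill every remaining position from parent 2.
    for i in range(n):
        if child[i] is None:
            child[i] = p2[i]
    return child

p1 = [1, 2, 3, 4, 5, 6, 7, 8]
p2 = [8, 5, 2, 1, 3, 6, 4, 7]
print(cycle_crossover(p1, p2))   # a valid tour: one cycle from p1, rest from p2
```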
Tan, Robin; Perkowski, Marek
2017-01-01
Electrocardiogram (ECG) signals sensed from mobile devices have potential for biometric identity recognition in remote access control systems where enhanced data security is in demand. In this study, we propose a new algorithm that consists of a two-stage classifier combining a random forest and a wavelet distance measure through a probabilistic threshold schema, to improve the effectiveness and robustness of a biometric recognition system using ECG data acquired from a biosensor integrated into mobile devices. The proposed algorithm is evaluated using a mixed dataset from 184 subjects under different health conditions. The proposed two-stage classifier achieves a total of 99.52% subject verification accuracy, better than the 98.33% accuracy from the random forest alone and the 96.31% accuracy from the wavelet distance measure algorithm alone. These results demonstrate the superiority of the proposed algorithm for biometric identification, hence supporting its practicality in areas such as cloud data security, cyber-security or remote healthcare systems. PMID:28230745
Tan, Robin; Perkowski, Marek
2017-02-20
Electrocardiogram (ECG) signals sensed from mobile devices have potential for biometric identity recognition in remote access control systems where enhanced data security is in demand. In this study, we propose a new algorithm that consists of a two-stage classifier combining a random forest and a wavelet distance measure through a probabilistic threshold schema, to improve the effectiveness and robustness of a biometric recognition system using ECG data acquired from a biosensor integrated into mobile devices. The proposed algorithm is evaluated using a mixed dataset from 184 subjects under different health conditions. The proposed two-stage classifier achieves a total of 99.52% subject verification accuracy, better than the 98.33% accuracy from the random forest alone and the 96.31% accuracy from the wavelet distance measure algorithm alone. These results demonstrate the superiority of the proposed algorithm for biometric identification, hence supporting its practicality in areas such as cloud data security, cyber-security or remote healthcare systems.
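A sketch of the two-stage pattern described: a random forest handles confident decisions, and samples whose top class probability falls below a threshold are passed to a distance-based second stage. A nearest-centroid classifier stands in for the wavelet distance measure, and the threshold value and synthetic data are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestCentroid

# Synthetic stand-in for per-subject ECG feature vectors.
X, y = make_classification(n_samples=1000, n_features=40, n_informative=12,
                           n_classes=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

stage1 = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
stage2 = NearestCentroid().fit(X_tr, y_tr)   # stand-in for the wavelet distance

def two_stage_predict(samples, threshold=0.6):
    proba = stage1.predict_proba(samples)
    pred = stage1.classes_[np.argmax(proba, axis=1)]
    # Low-confidence decisions fall through to the distance-based stage.
    uncertain = proba.max(axis=1) < threshold
    if uncertain.any():
        pred[uncertain] = stage2.predict(samples[uncertain])
    return pred

print("two-stage accuracy:", np.mean(two_stage_predict(X_te) == y_te))
```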
A modified genetic algorithm with fuzzy roulette wheel selection for job-shop scheduling problems
NASA Astrophysics Data System (ADS)
Thammano, Arit; Teekeng, Wannaporn
2015-05-01
The job-shop scheduling problem is one of the most difficult production planning problems. Since it is in the NP-hard class, a recent trend in solving the job-shop scheduling problem is shifting towards the use of heuristic and metaheuristic algorithms. This paper proposes a novel metaheuristic algorithm, which is a modification of the genetic algorithm. This proposed algorithm introduces two new concepts to the standard genetic algorithm: (1) fuzzy roulette wheel selection and (2) the mutation operation with tabu list. The proposed algorithm has been evaluated and compared with several state-of-the-art algorithms in the literature. The experimental results on 53 JSSPs show that the proposed algorithm is very effective in solving the combinatorial optimization problems. It outperforms all state-of-the-art algorithms on all benchmark problems in terms of the ability to achieve the optimal solution and the computational time.
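For context, a sketch of standard (crisp) roulette-wheel selection, the operator on which the paper's fuzzy variant builds; the fuzzy membership functions and the tabu-list mutation are specific to the paper and not reproduced.

```python
import random

random.seed(0)

def roulette_wheel(population, fitness):
    """Select one individual with probability proportional to its fitness."""
    total = sum(fitness)
    pick = random.uniform(0.0, total)
    running = 0.0
    for individual, f in zip(population, fitness):
        running += f
        if running >= pick:
            return individual
    return population[-1]          # numerical safety net

schedules = ["A", "B", "C", "D"]   # candidate schedules
fitness = [1.0, 3.0, 5.0, 1.0]     # e.g. inverse makespan of each schedule
counts = {s: 0 for s in schedules}
for _ in range(10000):
    counts[roulette_wheel(schedules, fitness)] += 1
print(counts)                      # selection frequency tracks fitness share
```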
Leukocyte Recognition Using EM-Algorithm
NASA Astrophysics Data System (ADS)
Colunga, Mario Chirinos; Siordia, Oscar Sánchez; Maybank, Stephen J.
This document describes a method for classifying images of blood cells. Three different classes of cells are used: band neutrophils, eosinophils and lymphocytes. The image pattern is projected down to a lower-dimensional subspace using PCA; the probability density function for each class is modeled with a Gaussian mixture using the EM algorithm. A new cell image is classified using the maximum a posteriori decision rule.
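A minimal version of the described pipeline, assuming synthetic feature vectors in place of blood-cell images: PCA projection, one EM-fitted Gaussian mixture per class, and a maximum a posteriori decision.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split

# Synthetic stand-in for cell-image feature vectors of three cell classes.
X, y = make_classification(n_samples=900, n_features=50, n_informative=10,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Project the image patterns down to a lower-dimensional subspace with PCA.
pca = PCA(n_components=8).fit(X_tr)
Z_tr, Z_te = pca.transform(X_tr), pca.transform(X_te)

# Model each class density with a Gaussian mixture fitted by EM.
classes = np.unique(y_tr)
priors = np.array([np.mean(y_tr == c) for c in classes])
gmms = [GaussianMixture(n_components=2, random_state=0).fit(Z_tr[y_tr == c])
        for c in classes]

# Maximum a posteriori rule: argmax_c  log p(z | c) + log P(c).
log_post = np.column_stack([g.score_samples(Z_te) + np.log(p)
                            for g, p in zip(gmms, priors)])
pred = classes[np.argmax(log_post, axis=1)]
print("accuracy:", np.mean(pred == y_te))
```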
A New Challenge for Compression Algorithms: Genetic Sequences.
ERIC Educational Resources Information Center
Grumbach, Stephane; Tahi, Fariza
1994-01-01
Analyzes the properties of genetic sequences that cause the failure of classical algorithms used for data compression. A lossless algorithm, which compresses the information contained in DNA and RNA sequences by detecting regularities such as palindromes, is presented. This algorithm combines substitutional and statistical methods and appears to…
NASA Technical Reports Server (NTRS)
Cacio, Emanuela; Cohn, Stephen E.; Spigler, Renato
2011-01-01
A numerical method is devised to solve a class of linear boundary-value problems for one-dimensional parabolic equations degenerate at the boundaries. Feller theory, which classifies the nature of the boundary points, is used to decide whether boundary conditions are needed to ensure uniqueness, and, if so, which ones they are. The algorithm is based on a suitable preconditioned implicit finite-difference scheme, grid, and treatment of the boundary data. Second-order accuracy, unconditional stability, and unconditional convergence of solutions of the finite-difference scheme to a constant as the time-step index tends to infinity are further properties of the method. Several examples, pertaining to financial mathematics, physics, and genetics, are presented for the purpose of illustration.
Identification of an Efficient Gene Expression Panel for Glioblastoma Classification
Zelaya, Ivette; Laks, Dan R.; Zhao, Yining; Kawaguchi, Riki; Gao, Fuying; Kornblum, Harley I.; Coppola, Giovanni
2016-01-01
We present here a novel genetic algorithm-based random forest (GARF) modeling technique that enables a reduction in the complexity of large gene disease signatures to highly accurate, greatly simplified gene panels. When applied to 803 glioblastoma multiforme samples, this method allowed the 840-gene Verhaak et al. gene panel (the standard in the field) to be reduced to a 48-gene classifier, while retaining 90.91% classification accuracy, and outperforming the best available alternative methods. Additionally, using this approach we produced a 32-gene panel which allows for better consistency between RNA-seq and microarray-based classifications, improving cross-platform classification retention from 69.67% to 86.07%. A webpage producing these classifications is available at http://simplegbm.semel.ucla.edu. PMID:27855170
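A hedged sketch of the GARF idea: an evolutionary search over small gene panels whose fitness is cross-validated random-forest accuracy. For brevity this uses a mutation-only population loop on synthetic expression data; the actual GARF operators, the Verhaak panel, and the glioblastoma data are not reproduced.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Synthetic "expression matrix": 300 samples x 200 genes, 4 subtypes.
X, y = make_classification(n_samples=300, n_features=200, n_informative=15,
                           n_classes=4, random_state=0)
panel_size, pop_size, generations = 12, 12, 10

def fitness(panel):
    # Cross-validated random-forest accuracy of the candidate gene panel.
    rf = RandomForestClassifier(n_estimators=60, random_state=0)
    return cross_val_score(rf, X[:, panel], y, cv=3).mean()

# Each chromosome is a candidate panel: a small set of gene (column) indices.
pop = [rng.choice(X.shape[1], panel_size, replace=False) for _ in range(pop_size)]
for _ in range(generations):
    pop = sorted(pop, key=fitness, reverse=True)
    survivors = pop[: pop_size // 2]
    children = []
    for parent in survivors:
        child = parent.copy()
        # Mutation: swap one gene of the panel for a random unused gene.
        new_gene = rng.integers(X.shape[1])
        while new_gene in child:
            new_gene = rng.integers(X.shape[1])
        child[rng.integers(panel_size)] = new_gene
        children.append(child)
    pop = survivors + children

best = max(pop, key=fitness)
print("best panel:", np.sort(best), "cv accuracy:", round(fitness(best), 3))
```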
NASA Astrophysics Data System (ADS)
Attia, Khalid A. M.; Nassar, Mohammed W. I.; El-Zeiny, Mohamed B.; Serag, Ahmed
2017-01-01
For the first time, a new variable selection method based on swarm intelligence, namely the firefly algorithm, is coupled with three different multivariate calibration models, namely concentration residual augmented classical least squares, artificial neural network, and support vector regression, for UV spectral data. A comparative study between the firefly algorithm and the well-known genetic algorithm was carried out. The discussion revealed the superiority of this new powerful algorithm over the well-known genetic algorithm. Moreover, different statistical tests were performed and no significant differences were found between the models regarding their predictive ability. This ensures that simpler and faster models were obtained without any deterioration of the quality of the calibration.
Cloud classification from satellite data using a fuzzy sets algorithm: A polar example
NASA Technical Reports Server (NTRS)
Key, J. R.; Maslanik, J. A.; Barry, R. G.
1988-01-01
Where spatial boundaries between phenomena are diffuse, classification methods which construct mutually exclusive clusters seem inappropriate. The Fuzzy c-means (FCM) algorithm assigns each observation to all clusters, with membership values as a function of distance to the cluster center. The FCM algorithm is applied to AVHRR data for the purpose of classifying polar clouds and surfaces. Careful analysis of the fuzzy sets can provide information on which spectral channels are best suited to the classification of particular features, and can help determine likely areas of misclassification. General agreement in the resulting classes and cloud fraction was found between the FCM algorithm, a manual classification, and an unsupervised maximum likelihood classifier.
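A minimal fuzzy c-means loop using the standard membership and center update formulas, so every sample keeps a membership in every class rather than a hard label; the AVHRR channels and polar-scene specifics are not modeled.

```python
import numpy as np

rng = np.random.default_rng(0)

def fuzzy_c_means(X, c=3, m=2.0, n_iter=100, eps=1e-9):
    """Return cluster centers and the membership matrix U (n_samples x c)."""
    n = X.shape[0]
    U = rng.dirichlet(np.ones(c), size=n)            # random initial memberships
    for _ in range(n_iter):
        # Centers are membership-weighted means of the samples.
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Memberships are updated from distances to every center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + eps
        U = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1)), axis=2)
    return centers, U

# Synthetic two-channel "pixels": every pixel gets a membership in each class
# rather than a single hard label, mirroring the fuzzy cloud classification.
X = np.vstack([rng.normal([0, 0], 1.0, (200, 2)),
               rng.normal([5, 5], 1.0, (200, 2)),
               rng.normal([0, 6], 1.0, (200, 2))])
centers, U = fuzzy_c_means(X)
print(centers.round(2))
print(U[:3].round(2))        # soft memberships of the first three pixels
```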
Raposo, Letícia M; Nobre, Flavio F
2017-08-30
Resistance to antiretrovirals (ARVs) is a major problem faced by HIV-infected individuals. Different rule-based algorithms were developed to infer HIV-1 susceptibility to antiretrovirals from genotypic data. However, there is discordance between them, resulting in difficulties for clinical decisions about which treatment to use. Here, we developed ensemble classifiers integrating three interpretation algorithms: Agence Nationale de Recherche sur le SIDA (ANRS), Rega, and the genotypic resistance interpretation system from Stanford HIV Drug Resistance Database (HIVdb). Three approaches were applied to develop a classifier with a single resistance profile: stacked generalization, a simple plurality vote scheme and the selection of the interpretation system with the best performance. The strategies were compared with the Friedman's test and the performance of the classifiers was evaluated using the F-measure, sensitivity and specificity values. We found that the three strategies had similar performances for the selected antiretrovirals. For some cases, the stacking technique with naïve Bayes as the learning algorithm showed a statistically superior F-measure. This study demonstrates that ensemble classifiers can be an alternative tool for clinical decision-making since they provide a single resistance profile from the most commonly used resistance interpretation systems.
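A small sketch of the plurality-vote strategy over three interpretation outputs, with hypothetical labels standing in for ANRS, Rega, and HIVdb calls; the tie-breaking rule shown is an assumption, and the stacking variant with naive Bayes is not shown.

```python
from collections import Counter

def plurality_vote(calls):
    """Return the most frequent resistance call; ties fall back to the most
    conservative label (here assumed to be 'resistant')."""
    counts = Counter(calls).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return "resistant"
    return counts[0][0]

# Hypothetical per-sample outputs of three interpretation systems
# (stand-ins for ANRS, Rega and HIVdb calls on one antiretroviral).
samples = [
    ("susceptible", "susceptible", "resistant"),
    ("resistant", "intermediate", "resistant"),
    ("susceptible", "intermediate", "resistant"),   # three-way tie
]
for calls in samples:
    print(calls, "->", plurality_vote(calls))
```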
Refined genetic algorithm -- Economic dispatch example
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sheble, G.B.; Brittig, K.
1995-02-01
A genetic-based algorithm is used to solve an economic dispatch (ED) problem. The algorithm utilizes payoff information of prospective solutions to evaluate optimality. Thus, the constraints of classical Lagrangian techniques on unit curves are eliminated. Using an economic dispatch problem as a basis for comparison, several different techniques which enhance program efficiency and accuracy, such as mutation prediction, elitism, interval approximation and penalty factors, are explored. Two unique genetic algorithms are also compared. The results are verified for a sample problem using a classical technique.
Immune allied genetic algorithm for Bayesian network structure learning
NASA Astrophysics Data System (ADS)
Song, Qin; Lin, Feng; Sun, Wei; Chang, KC
2012-06-01
Bayesian network (BN) structure learning is an NP-hard problem. In this paper, we present an improved approach to enhance the efficiency of BN structure learning. To avoid premature convergence in the traditional single-population genetic algorithm (GA), we propose an immune allied genetic algorithm (IAGA) in which multiple populations and an allied strategy are introduced. Moreover, we apply prior knowledge by injecting an immune operator into individuals, which effectively prevents degeneration. To illustrate the effectiveness of the proposed technique, we present some experimental results.
Flexible Space-Filling Designs for Complex System Simulations
2013-06-01
interior of the experimental region and cannot fit higher-order models. We present a genetic algorithm that constructs space-filling designs with...Computer Experiments, Design of Experiments, Genetic Algorithm , Latin Hypercube, Response Surface Methodology, Nearly Orthogonal 15. NUMBER OF PAGES 147...experimental region and cannot fit higher-order models. We present a genetic algorithm that constructs space-filling designs with minimal correlations
Genetic algorithms in conceptual design of a light-weight, low-noise, tilt-rotor aircraft
NASA Technical Reports Server (NTRS)
Wells, Valana L.
1996-01-01
This report outlines research accomplishments in the area of using genetic algorithms (GA) for the design and optimization of rotorcraft. It discusses the genetic algorithm as a search and optimization tool, outlines a procedure for using the GA in the conceptual design of helicopters, and applies the GA method to the acoustic design of rotors.
Self-calibration of a noisy multiple-sensor system with genetic algorithms
NASA Astrophysics Data System (ADS)
Brooks, Richard R.; Iyengar, S. Sitharama; Chen, Jianhua
1996-01-01
This paper explores an image processing application of optimization techniques which entails interpreting noisy sensor data. The application is a generalization of image correlation; we attempt to find the optimal congruence which matches two overlapping gray-scale images corrupted with noise. Both tabu search and genetic algorithms are used to find the parameters which match the two images. A genetic algorithm approach using an elitist reproduction scheme is found to provide significantly superior results. The presentation includes a graphical illustration of the paths taken by tabu search and genetic algorithms when trying to find the best possible match between two corrupted images.
Increasing Prediction the Original Final Year Project of Student Using Genetic Algorithm
NASA Astrophysics Data System (ADS)
Saragih, Rijois Iboy Erwin; Turnip, Mardi; Sitanggang, Delima; Aritonang, Mendarissan; Harianja, Eva
2018-04-01
The final year project is very important for the graduation of a student. Unfortunately, many students do not take their final projects seriously, and many ask someone else to do the work for them. In this paper, an application of genetic algorithms to predict whether a student's final year project is original is proposed. In the simulation, data on final projects from the last 5 years were collected. The genetic algorithm has several operators, namely population, selection, crossover, and mutation. The results suggest that the genetic algorithm predicts better than other comparable models. Experimental results showed a prediction accuracy of 70%, more accurate than previous research.
Enhancement web proxy cache performance using Wrapper Feature Selection methods with NB and J48
NASA Astrophysics Data System (ADS)
Mahmoud Al-Qudah, Dua'a.; Funke Olanrewaju, Rashidah; Wong Azman, Amelia
2017-11-01
The web proxy cache technique reduces response time by storing copies of pages between the client and server sides. If requested pages are cached in the proxy, there is no need to access the server. Due to the limited size and excessive cost of cache compared to other storage, a cache replacement algorithm is used to determine which page to evict when the cache is full. However, conventional replacement algorithms such as Least Recently Used (LRU), First In First Out (FIFO), Least Frequently Used (LFU), Randomized Policy, etc., may discard important pages just before they are used. Furthermore, a conventional algorithm cannot be well optimized, since evicting a page intelligently before replacement requires some decision-making. Hence, most researchers propose integrating intelligent classifiers with the replacement algorithm to improve replacement performance. This research proposes using automated wrapper feature selection methods to choose the best subset of features that are relevant and influence the classifiers' prediction accuracy. The results show that using the wrapper feature selection methods, namely Best First (BFS), Incremental Wrapper Subset Selection (IWSS) embedded NB, and particle swarm optimization (PSO), reduces the number of features and has a good impact on reducing computation time. Using PSO enhances NB classifier accuracy by 1.1%, 0.43% and 0.22% over using NB with all features, using BFS, and using IWSS-embedded NB, respectively. PSO raises J48 accuracy by 0.03%, 1.91% and 0.04% over using the J48 classifier with all features, using IWSS-embedded NB, and using BFS, respectively. IWSS-embedded NB speeds up the NB and J48 classifiers much more than BFS and PSO; it reduces the computation time of NB by 0.1383 and that of J48 by 2.998.
3D Protein structure prediction with genetic tabu search algorithm
2010-01-01
Background Protein structure prediction (PSP) has important applications in different fields, such as drug design, disease prediction, and so on. In protein structure prediction, there are two important issues. The first one is the design of the structure model and the second one is the design of the optimization technology. Because of the complexity of the realistic protein structure, the structure model adopted in this paper is a simplified model, which is called off-lattice AB model. After the structure model is assumed, optimization technology is needed for searching the best conformation of a protein sequence based on the assumed structure model. However, PSP is an NP-hard problem even if the simplest model is assumed. Thus, many algorithms have been developed to solve the global optimization problem. In this paper, a hybrid algorithm, which combines genetic algorithm (GA) and tabu search (TS) algorithm, is developed to complete this task. Results In order to develop an efficient optimization algorithm, several improved strategies are developed for the proposed genetic tabu search algorithm. The combined use of these strategies can improve the efficiency of the algorithm. In these strategies, tabu search introduced into the crossover and mutation operators can improve the local search capability, the adoption of variable population size strategy can maintain the diversity of the population, and the ranking selection strategy can improve the possibility of an individual with low energy value entering into next generation. Experiments are performed with Fibonacci sequences and real protein sequences. Experimental results show that the lowest energy obtained by the proposed GATS algorithm is lower than that obtained by previous methods. Conclusions The hybrid algorithm has the advantages from both genetic algorithm and tabu search algorithm. It makes use of the advantage of multiple search points in genetic algorithm, and can overcome poor hill-climbing capability in the conventional genetic algorithm by using the flexible memory functions of TS. Compared with some previous algorithms, GATS algorithm has better performance in global optimization and can predict 3D protein structure more effectively. PMID:20522256
Genetic Algorithms Applied to Multi-Objective Aerodynamic Shape Optimization
NASA Technical Reports Server (NTRS)
Holst, Terry L.
2004-01-01
A genetic algorithm approach suitable for solving multi-objective optimization problems is described and evaluated using a series of aerodynamic shape optimization problems. Several new features including two variations of a binning selection algorithm and a gene-space transformation procedure are included. The genetic algorithm is suitable for finding pareto optimal solutions in search spaces that are defined by any number of genes and that contain any number of local extrema. A new masking array capability is included allowing any gene or gene subset to be eliminated as decision variables from the design space. This allows determination of the effect of a single gene or gene subset on the pareto optimal solution. Results indicate that the genetic algorithm optimization approach is flexible in application and reliable. The binning selection algorithms generally provide pareto front quality enhancements and moderate convergence efficiency improvements for most of the problems solved.
Genetic Algorithms Applied to Multi-Objective Aerodynamic Shape Optimization
NASA Technical Reports Server (NTRS)
Holst, Terry L.
2005-01-01
A genetic algorithm approach suitable for solving multi-objective problems is described and evaluated using a series of aerodynamic shape optimization problems. Several new features including two variations of a binning selection algorithm and a gene-space transformation procedure are included. The genetic algorithm is suitable for finding Pareto optimal solutions in search spaces that are defined by any number of genes and that contain any number of local extrema. A new masking array capability is included allowing any gene or gene subset to be eliminated as decision variables from the design space. This allows determination of the effect of a single gene or gene subset on the Pareto optimal solution. Results indicate that the genetic algorithm optimization approach is flexible in application and reliable. The binning selection algorithms generally provide Pareto front quality enhancements and moderate convergence efficiency improvements for most of the problems solved.
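A minimal Pareto-dominance filter for a population of multi-objective designs (minimization assumed), illustrating the kind of non-dominated set the genetic algorithm maintains; the binning selection and gene-masking features of the report are not reproduced.

```python
import numpy as np

def dominates(a, b):
    """True if objective vector a dominates b (all objectives minimized)."""
    return np.all(a <= b) and np.any(a < b)

def pareto_front(objectives):
    """Indices of the non-dominated members of the population."""
    n = len(objectives)
    return [i for i in range(n)
            if not any(dominates(objectives[j], objectives[i])
                       for j in range(n) if j != i)]

rng = np.random.default_rng(0)
# Two competing objectives per design, e.g. drag and structural weight.
objs = rng.random((50, 2))
print("non-dominated designs:", pareto_front(objs))
```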
Genetic algorithm dynamics on a rugged landscape
NASA Astrophysics Data System (ADS)
Bornholdt, Stefan
1998-04-01
The genetic algorithm is an optimization procedure motivated by biological evolution and is successfully applied to optimization problems in different areas. A statistical mechanics model for its dynamics is proposed based on the parent-child fitness correlation of the genetic operators, making it applicable to general fitness landscapes. It is compared to a recent model based on a maximum entropy ansatz. Finally it is applied to modeling the dynamics of a genetic algorithm on the rugged fitness landscape of the NK model.
MotieGhader, Habib; Gharaghani, Sajjad; Masoudi-Sobhanzadeh, Yosef; Masoudi-Nejad, Ali
2017-01-01
Feature selection is of great importance in Quantitative Structure-Activity Relationship (QSAR) analysis. This problem has been solved using some meta-heuristic algorithms such as GA, PSO, ACO and so on. In this work two novel hybrid meta-heuristic algorithms, i.e. Sequential GA and LA (SGALA) and Mixed GA and LA (MGALA), which are based on genetic algorithms and learning automata, are proposed for QSAR feature selection. The SGALA algorithm uses the advantages of the genetic algorithm and learning automata sequentially, and the MGALA algorithm uses the advantages of the genetic algorithm and learning automata simultaneously. We applied our proposed algorithms to select the minimum possible number of features from three different datasets and observed that the MGALA and SGALA algorithms had the best outcome both individually and on average compared to other feature selection algorithms. Through comparison of our proposed algorithms, we deduced that the rate of convergence to the optimal result of the MGALA and SGALA algorithms was better than that of the GA, ACO, PSO and LA algorithms. In the end, the results of the GA, ACO, PSO, LA, SGALA, and MGALA algorithms were used as the input of an LS-SVR model, and the results from the LS-SVR models showed that the LS-SVR model had more predictive ability with the input from the SGALA and MGALA algorithms than with the input from all other mentioned algorithms. Therefore, the results corroborate that not only is the predictive efficiency of the proposed algorithms better, but their rate of convergence is also superior to that of all other mentioned algorithms. PMID:28979308
MotieGhader, Habib; Gharaghani, Sajjad; Masoudi-Sobhanzadeh, Yosef; Masoudi-Nejad, Ali
2017-01-01
Feature selection is of great importance in Quantitative Structure-Activity Relationship (QSAR) analysis. This problem has been solved using some meta-heuristic algorithms such as GA, PSO, ACO and so on. In this work two novel hybrid meta-heuristic algorithms, i.e. Sequential GA and LA (SGALA) and Mixed GA and LA (MGALA), which are based on genetic algorithms and learning automata, are proposed for QSAR feature selection. The SGALA algorithm uses the advantages of the genetic algorithm and learning automata sequentially, and the MGALA algorithm uses the advantages of the genetic algorithm and learning automata simultaneously. We applied our proposed algorithms to select the minimum possible number of features from three different datasets and observed that the MGALA and SGALA algorithms had the best outcome both individually and on average compared to other feature selection algorithms. Through comparison of our proposed algorithms, we deduced that the rate of convergence to the optimal result of the MGALA and SGALA algorithms was better than that of the GA, ACO, PSO and LA algorithms. In the end, the results of the GA, ACO, PSO, LA, SGALA, and MGALA algorithms were used as the input of an LS-SVR model, and the results from the LS-SVR models showed that the LS-SVR model had more predictive ability with the input from the SGALA and MGALA algorithms than with the input from all other mentioned algorithms. Therefore, the results corroborate that not only is the predictive efficiency of the proposed algorithms better, but their rate of convergence is also superior to that of all other mentioned algorithms.
Automated detection of pulmonary nodules in CT images with support vector machines
NASA Astrophysics Data System (ADS)
Liu, Lu; Liu, Wanyu; Sun, Xiaoming
2008-10-01
Many methods have been proposed to prevent radiologists from failing to diagnose small pulmonary nodules. Recently, support vector machines (SVMs) have received increasing attention for pattern recognition. In this paper, we present a computerized system aimed at pulmonary nodule detection; it identifies the lung field, extracts a set of candidate regions with a high sensitivity ratio and then classifies candidates by the use of SVMs. The computer aided diagnosis (CAD) system presented in this paper supports the diagnosis of pulmonary nodules from computed tomography (CT) images as inflammation, tuberculoma, granuloma, sclerosing hemangioma, and malignant tumor. Five texture feature sets were extracted for each lesion, while a genetic algorithm based feature selection method was applied to identify the most robust features. The selected feature set was fed into an ensemble of SVM classifiers. The achieved classification performance was 100%, 92.75% and 90.23% in the training, validation and testing sets, respectively. It is concluded that computerized analysis of medical images in combination with artificial intelligence can be used in clinical practice and may contribute to more efficient diagnosis.
Hong, Weizhe; Kennedy, Ann; Burgos-Artizzu, Xavier P; Zelikowsky, Moriel; Navonne, Santiago G; Perona, Pietro; Anderson, David J
2015-09-22
A lack of automated, quantitative, and accurate assessment of social behaviors in mammalian animal models has limited progress toward understanding mechanisms underlying social interactions and their disorders such as autism. Here we present a new integrated hardware and software system that combines video tracking, depth sensing, and machine learning for automatic detection and quantification of social behaviors involving close and dynamic interactions between two mice of different coat colors in their home cage. We designed a hardware setup that integrates traditional video cameras with a depth camera, developed computer vision tools to extract the body "pose" of individual animals in a social context, and used a supervised learning algorithm to classify several well-described social behaviors. We validated the robustness of the automated classifiers in various experimental settings and used them to examine how genetic background, such as that of Black and Tan Brachyury (BTBR) mice (a previously reported autism model), influences social behavior. Our integrated approach allows for rapid, automated measurement of social behaviors across diverse experimental designs and also affords the ability to develop new, objective behavioral metrics.
Hong, Weizhe; Kennedy, Ann; Burgos-Artizzu, Xavier P.; Zelikowsky, Moriel; Navonne, Santiago G.; Perona, Pietro; Anderson, David J.
2015-01-01
A lack of automated, quantitative, and accurate assessment of social behaviors in mammalian animal models has limited progress toward understanding mechanisms underlying social interactions and their disorders such as autism. Here we present a new integrated hardware and software system that combines video tracking, depth sensing, and machine learning for automatic detection and quantification of social behaviors involving close and dynamic interactions between two mice of different coat colors in their home cage. We designed a hardware setup that integrates traditional video cameras with a depth camera, developed computer vision tools to extract the body “pose” of individual animals in a social context, and used a supervised learning algorithm to classify several well-described social behaviors. We validated the robustness of the automated classifiers in various experimental settings and used them to examine how genetic background, such as that of Black and Tan Brachyury (BTBR) mice (a previously reported autism model), influences social behavior. Our integrated approach allows for rapid, automated measurement of social behaviors across diverse experimental designs and also affords the ability to develop new, objective behavioral metrics. PMID:26354123
Evaluating data mining algorithms using molecular dynamics trajectories.
Tatsis, Vasileios A; Tjortjis, Christos; Tzirakis, Panagiotis
2013-01-01
Molecular dynamics simulations provide a sample of a molecule's conformational space. Experiments on the μs time scale, resulting in large amounts of data, are nowadays routine. Data mining techniques such as classification provide a way to analyse such data. In this work, we evaluate and compare several classification algorithms using three data sets which resulted from computer simulations of a potential enzyme mimetic biomolecule. We evaluated 65 classifiers available in the well-known data mining toolkit Weka, using classification errors to assess algorithmic performance. Results suggest that: (i) 'meta' classifiers perform better than the other groups when applied to molecular dynamics data sets; (ii) Random Forest and Rotation Forest are the best classifiers for all three data sets; and (iii) classification via clustering yields the highest classification error. Our findings are consistent with bibliographic evidence, suggesting a 'roadmap' for dealing with such data.
An Improved Hierarchical Genetic Algorithm for Sheet Cutting Scheduling with Process Constraints
Rao, Yunqing; Qi, Dezhong; Li, Jinling
2013-01-01
For the first time, an improved hierarchical genetic algorithm for the sheet cutting problem, which involves n cutting patterns for m non-identical parallel machines with process constraints, has been proposed in the integrated cutting stock model. The objective of the cutting scheduling problem is minimizing the weighted completion time. A mathematical model for this problem is presented, an improved hierarchical genetic algorithm (ant colony-hierarchical genetic algorithm) is developed for a better solution, and a hierarchical coding method is used based on the characteristics of the problem. Furthermore, to speed up convergence rates and resolve local convergence issues, a kind of adaptive crossover probability and mutation probability is used in this algorithm. The computational results and comparison prove that the presented approach is quite effective for the considered problem. PMID:24489491
An improved hierarchical genetic algorithm for sheet cutting scheduling with process constraints.
Rao, Yunqing; Qi, Dezhong; Li, Jinling
2013-01-01
For the first time, an improved hierarchical genetic algorithm for the sheet cutting problem, which involves n cutting patterns for m non-identical parallel machines with process constraints, has been proposed in the integrated cutting stock model. The objective of the cutting scheduling problem is minimizing the weighted completion time. A mathematical model for this problem is presented, an improved hierarchical genetic algorithm (ant colony-hierarchical genetic algorithm) is developed for a better solution, and a hierarchical coding method is used based on the characteristics of the problem. Furthermore, to speed up convergence rates and resolve local convergence issues, a kind of adaptive crossover probability and mutation probability is used in this algorithm. The computational results and comparison prove that the presented approach is quite effective for the considered problem.
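One common way to make the crossover and mutation probabilities adaptive, in the spirit of the Srinivas-Patnaik scheme (rates scaled by how close an individual's fitness is to the current best, with floor values added here); the paper's own adaptation rule is not given in the abstract, so this is only an illustrative assumption.

```python
def adaptive_rates(f, f_avg, f_max,
                   pc_hi=0.9, pc_lo=0.5, pm_hi=0.10, pm_lo=0.01):
    """Srinivas-Patnaik style adaptation for a maximization problem:
    above-average individuals get gentler operators, below-average ones
    get the full crossover and mutation pressure."""
    if f >= f_avg and f_max > f_avg:
        scale = (f_max - f) / (f_max - f_avg)
        return (pc_lo + (pc_hi - pc_lo) * scale,
                pm_lo + (pm_hi - pm_lo) * scale)
    return pc_hi, pm_hi

# The best individual gets the lowest rates, a below-average one the highest.
print(adaptive_rates(f=0.95, f_avg=0.60, f_max=0.95))   # near-best individual
print(adaptive_rates(f=0.55, f_avg=0.60, f_max=0.95))   # below-average one
```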
NASA Astrophysics Data System (ADS)
Bal, A.; Alam, M. S.; Aslan, M. S.
2006-05-01
Often, sensor ego-motion or fast target movement causes the target to temporarily go out of the field of view, leading to the reappearing-target detection problem in target tracking applications. Since the target goes out of the current frame and reenters at a later frame, the reentering location and variations in rotation, scale, and other 3D orientations of the target are not known, which complicates detection. A detection algorithm has been developed using the Fukunaga-Koontz transform (FKT) and a distance classifier correlation filter (DCCF). The detection algorithm uses target and background information, extracted from training samples, to detect possible candidate target images. The detected candidate target images are then introduced into the second algorithm, the DCCF clutter rejection module; once a candidate is confirmed as the target, its coordinates are determined and the tracking algorithm is initiated. The performance of the proposed FKT-DCCF based target detection algorithm has been tested using real-world forward looking infrared (FLIR) video sequences.
Adaboost multi-view face detection based on YCgCr skin color model
NASA Astrophysics Data System (ADS)
Lan, Qi; Xu, Zhiyong
2016-09-01
The traditional Adaboost face detection algorithm uses Haar-like features to train face classifiers, whose detection error rate is low in the face region. Under a complex background, however, the classifiers easily misclassify background regions whose gray-level distribution is similar to that of faces, which leads to a high error detection rate for the traditional Adaboost algorithm. As one of the most important features of a face, skin color has good clustering properties in the YCgCr color space, so non-face areas can be quickly excluded with a skin color model. Therefore, combining the advantages of the Adaboost algorithm and skin color detection, this paper proposes an Adaboost face detection method based on the YCgCr skin color model. Experiments show that, compared with the traditional algorithm, the proposed method improves detection accuracy and reduces error detections significantly.
Enhancement of Fast Face Detection Algorithm Based on a Cascade of Decision Trees
NASA Astrophysics Data System (ADS)
Khryashchev, V. V.; Lebedev, A. A.; Priorov, A. L.
2017-05-01
A face detection algorithm based on a cascade of ensembles of decision trees (CEDT) is presented. The new approach allows detecting faces in positions other than frontal through the use of multiple classifiers, each trained for a specific range of head rotation angles. The results showed high performance for CEDT on images of standard size. The algorithm increases the area under the ROC curve by 13% compared to the standard Viola-Jones face detection algorithm. The final realization of the algorithm consists of 5 different cascades for frontal and non-frontal faces. The simulation results also show the low computational complexity of the CEDT algorithm in comparison with the standard Viola-Jones approach. This could prove important in the embedded system and mobile device industries because it can reduce hardware cost and extend battery life.
Pose estimation for augmented reality applications using genetic algorithm.
Yu, Ying Kin; Wong, Kin Hong; Chang, Michael Ming Yuen
2005-12-01
This paper describes a genetic algorithm that tackles the pose-estimation problem in computer vision. Our genetic algorithm can find the rotation and translation of an object accurately when the three-dimensional structure of the object is given. In our implementation, each chromosome encodes both the pose and the indexes to the selected point features of the object. Instead of only searching for the pose as in the existing work, our algorithm, at the same time, searches for a set containing the most reliable feature points in the process. This mismatch filtering strategy successfully makes the algorithm more robust under the presence of point mismatches and outliers in the images. Our algorithm has been tested with both synthetic and real data with good results. The accuracy of the recovered pose is compared to the existing algorithms. Our approach outperformed Lowe's method and the other two genetic algorithms under the presence of point mismatches and outliers. In addition, it has been used to estimate the pose of a real object. It is shown that the proposed method is applicable to augmented reality applications.
Optimization of laminated stacking sequence for buckling load maximization by genetic algorithm
NASA Technical Reports Server (NTRS)
Le Riche, Rodolphe; Haftka, Raphael T.
1992-01-01
The use of a genetic algorithm to optimize the stacking sequence of a composite laminate for buckling load maximization is studied. Various genetic parameters including the population size, the probability of mutation, and the probability of crossover are optimized by numerical experiments. A new genetic operator - permutation - is proposed and shown to be effective in reducing the cost of the genetic search. Results are obtained for a graphite-epoxy plate, first when only the buckling load is considered, and then when constraints on ply contiguity and strain failure are added. The influence on the genetic search of the penalty parameter enforcing the contiguity constraint is studied. The advantage of the genetic algorithm in producing several near-optimal designs is discussed.
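As a rough illustration of the permutation operator described above, the sketch below swaps two randomly chosen plies in a stacking-sequence chromosome; the ply codes and the application probability are hypothetical choices for the example, not values from the paper.

```python
# Illustrative permutation operator for a stacking-sequence chromosome.
# A chromosome is a list of ply-orientation codes (e.g. 0, 45, 90 degrees);
# the codes and probability below are assumptions for the sketch.
import random

def permutation_operator(chromosome, probability=0.1):
    """With the given probability, swap two randomly chosen plies.

    Swapping changes the stacking order (and hence bending stiffness and
    buckling load) without changing the ply counts, so in-plane properties
    are preserved while the search explores new stacking sequences.
    """
    child = list(chromosome)
    if random.random() < probability and len(child) > 1:
        i, j = random.sample(range(len(child)), 2)
        child[i], child[j] = child[j], child[i]
    return child

# Example: an 8-ply laminate encoded by orientation angles.
laminate = [45, -45, 0, 90, 0, 90, 45, -45]
print(permutation_operator(laminate, probability=1.0))
```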
Development of a Tool for an Efficient Calibration of CORSIM Models
DOT National Transportation Integrated Search
2014-08-01
This project proposes a Memetic Algorithm (MA) for the calibration of microscopic traffic flow simulation models. The proposed MA includes a combination of genetic and simulated annealing algorithms. The genetic algorithm performs the exploration of ...
Locating Encrypted Data Hidden Among Non-Encrypted Data Using Statistical Tools
2007-03-01
length of a compressed sequence). If a bit sequence can be significantly compressed, then it is not random. Lempel-Ziv Compression Test: This test...communication, targeting, and a host of other tasks. This software will most assuredly contain classified data or algorithms requiring protection in...containing the classified data and algorithms. As the program is executed the soldier would have access to the common unclassified tasks, however, to
NASA Astrophysics Data System (ADS)
Ross, Z. E.; Meier, M. A.; Hauksson, E.
2017-12-01
Accurate first-motion polarities are essential for determining earthquake focal mechanisms, but are difficult to measure automatically because of picking errors and signal-to-noise issues. Here we develop an algorithm for reliable automated classification of first-motion polarities using machine learning algorithms. A classifier is designed to identify whether the first-motion polarity is up, down, or undefined by examining the waveform data directly. We first improve the accuracy of automatic P-wave onset picks by maximizing a weighted signal/noise ratio for a suite of candidate picks around the automatic pick. We then use the waveform amplitudes before and after the optimized pick as features for the classification. We demonstrate the method's potential by training and testing the classifier on tens of thousands of hand-made first-motion picks by the Southern California Seismic Network. The classifier assigned the same polarity as chosen by an analyst in more than 94% of the records. We show that the method is generalizable to a variety of learning algorithms, including neural networks and random forest classifiers. The method is suitable for automated processing of large seismic waveform datasets, and can potentially be used in real-time applications, e.g. for improving the source characterizations of earthquake early warning algorithms.
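A minimal sketch of the two-step procedure described in this abstract is given below, under assumed definitions: the weighted signal/noise ratio is taken as the post-pick to pre-pick RMS ratio with a distance penalty from the original automatic pick, and the classifier features are simply the raw amplitudes in short windows around the refined pick. Both choices are illustrative, not the authors' exact recipe.

```python
# Pick refinement by weighted SNR, then amplitude features: a sketch only.
import numpy as np

def refine_pick(waveform, auto_pick, search=20, win=30, penalty=0.01):
    """Return the candidate sample index that maximizes a weighted SNR."""
    best_idx, best_score = auto_pick, -np.inf
    for cand in range(auto_pick - search, auto_pick + search + 1):
        if cand - win < 0 or cand + win >= len(waveform):
            continue
        noise = np.sqrt(np.mean(waveform[cand - win:cand] ** 2)) + 1e-12
        signal = np.sqrt(np.mean(waveform[cand:cand + win] ** 2))
        score = signal / noise - penalty * abs(cand - auto_pick)
        if score > best_score:
            best_idx, best_score = cand, score
    return best_idx

def polarity_features(waveform, pick, n_before=5, n_after=10):
    """Amplitudes around the refined pick, used as classifier inputs."""
    return waveform[pick - n_before:pick + n_after]
```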
Engineered Intrinsic Bioremediation of Ammonium Perchlorate in Groundwater
2010-12-01
German Collection of Microorganisms and Cell Cultures) GA Genetic Algorithms GA-ANN Genetic Algorithm Artificial Neural Network GMO genetically...for in situ treatment of perchlorate in groundwater. This is accomplished without the addition of genetically engineered microorganisms (GMOs) to the...perchlorate, even in the presence of oxygen and without the addition of genetically engineered microorganisms (GMOs) to the environment. This approach
Efficient audio signal processing for embedded systems
NASA Astrophysics Data System (ADS)
Chiu, Leung Kin
As mobile platforms continue to pack on more computational power, electronics manufacturers start to differentiate their products by enhancing the audio features. However, consumers also demand smaller devices that can operate for a longer time, hence imposing design constraints. In this research, we investigate two design strategies that would allow us to efficiently process audio signals on embedded systems such as mobile phones and portable electronics. In the first strategy, we exploit properties of the human auditory system to process audio signals. We designed a sound enhancement algorithm to make piezoelectric loudspeakers sound "richer" and "fuller." Piezoelectric speakers have a small form factor but exhibit poor response in the low-frequency region. In the algorithm, we combine psychoacoustic bass extension and dynamic range compression to improve the perceived bass coming out from the tiny speakers. We also developed an audio energy reduction algorithm for loudspeaker power management. The perceptually transparent algorithm extends the battery life of mobile devices and prevents thermal damage in speakers. This method is similar to audio compression algorithms, which encode audio signals in such a way that the compression artifacts are not easily perceivable. Instead of reducing the storage space, however, we suppress the audio contents that are below the hearing threshold, therefore reducing the signal energy. In the second strategy, we use low-power analog circuits to process the signal before digitizing it. We designed an analog front-end for sound detection and implemented it on a field programmable analog array (FPAA). The system is an example of an analog-to-information converter. The sound classifier front-end can be used in a wide range of applications because programmable floating-gate transistors are employed to store classifier weights. Moreover, we incorporated a feature selection algorithm to simplify the analog front-end. The machine learning algorithm AdaBoost is used to select the most relevant features for a particular sound detection application. In this classifier architecture, we combine simple "base" analog classifiers to form a strong one. We also designed the circuits to implement the AdaBoost-based analog classifier.
[Algorithm of toxigenic genetically altered Vibrio cholerae El Tor biovar strain identification].
Smirnova, N I; Agafonov, D A; Zadnova, S P; Cherkasov, A V; Kutyrev, V V
2014-01-01
The aim was to develop an algorithm for the identification of genetically altered Vibrio cholerae biovar El Tor strains that ensures determination of the serogroup, serovar and biovar of the studied isolate based on pheno- and genotypic properties, detection of genetically altered cholera El Tor causative agents, their differentiation by epidemic potential, as well as evaluation of the variability of key pathogenicity genes. Complex analysis of 28 natural V. cholerae strains was carried out using traditional microbiological methods, PCR and fragmentary sequencing. An algorithm for the identification of toxigenic genetically altered V. cholerae biovar El Tor strains was developed that includes 4 stages: determination of serogroup, serovar and biovar based on phenotypic properties; confirmation of serogroup and biovar based on molecular-genetic properties and determination of strains as genetically altered; differentiation of genetically altered strains by their epidemic potential; and detection of ctxB and tcpA key pathogenicity gene polymorphism. The algorithm is based on the use of traditional microbiological methods, PCR and sequencing of gene fragments. The use of the developed algorithm will increase the effectiveness of detection of genetically altered variants of the cholera El Tor causative agent and their differentiation by epidemic potential, and will ensure establishment of the polymorphism of genes that code for key pathogenicity factors in order to determine the origins of the strains and possible routes of introduction of the infection.
An expert support system for breast cancer diagnosis using color wavelet features.
Issac Niwas, S; Palanisamy, P; Chibbar, Rajni; Zhang, W J
2012-10-01
Breast cancer diagnosis can be done through pathologic assessment of breast tissue samples obtained, for example, by the core needle biopsy technique. The result of the analysis of this sample by a pathologist is crucial for the breast cancer patient. In this paper, nuclei in the tissue samples are investigated after decomposition by means of the Log-Gabor wavelet in the HSV color domain, and an algorithm is developed to compute the color wavelet features. These features are used for breast cancer diagnosis with a Support Vector Machine (SVM) classifier algorithm. A properly trained SVM can correctly classify patterns, which makes it particularly suitable for use in an expert system that aids in the diagnosis of cancer tissue samples. The results are compared with other multivariate classifiers such as the Naïve Bayes classifier and an Artificial Neural Network. The overall accuracy of the proposed method using the SVM classifier indicates that it will be useful for automation in cancer diagnosis.
Attia, Khalid A M; Nassar, Mohammed W I; El-Zeiny, Mohamed B; Serag, Ahmed
2017-01-05
For the first time, a new variable selection method based on swarm intelligence, namely the firefly algorithm, is coupled with three different multivariate calibration models, namely concentration residual augmented classical least squares, artificial neural network and support vector regression, for UV spectral data. A comparative study between the firefly algorithm and the well-known genetic algorithm was developed. The discussion revealed the superiority of this new powerful algorithm over the well-known genetic algorithm. Moreover, different statistical tests were performed and no significant differences were found between all the models regarding their predictabilities. This ensures that simpler and faster models were obtained without any deterioration of the quality of the calibration. Copyright © 2016 Elsevier B.V. All rights reserved.
Implementation and performance evaluation of acoustic denoising algorithms for UAV
NASA Astrophysics Data System (ADS)
Chowdhury, Ahmed Sony Kamal
Unmanned Aerial Vehicles (UAVs) have become a popular alternative for wildlife monitoring and border surveillance applications. Eliminating the UAV's background noise and classifying the target audio signal effectively are still major challenges. The main goal of this thesis is to remove the UAV's background noise by means of acoustic denoising techniques. Existing denoising algorithms, such as Adaptive Least Mean Square (LMS), Wavelet Denoising, Time-Frequency Block Thresholding, and Wiener Filter, were implemented and their performance evaluated. The denoising algorithms were evaluated for average Signal to Noise Ratio (SNR), Segmental SNR (SSNR), Log Likelihood Ratio (LLR), and Log Spectral Distance (LSD) metrics. To evaluate the effectiveness of the denoising algorithms on classification of target audio, we implemented Support Vector Machine (SVM) and Naive Bayes classification algorithms. Simulation results demonstrate that the LMS and Discrete Wavelet Transform (DWT) denoising algorithms offered superior performance compared to the other algorithms. Finally, we implemented the LMS and DWT algorithms on a DSP board for hardware evaluation. Experimental results showed that the LMS algorithm's performance is more robust than DWT across various noise types when classifying target audio signals.
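The sketch below shows an adaptive LMS noise canceller of the kind evaluated in this thesis; it assumes a separate noise-only reference channel is available, and the filter length and step size are illustrative values rather than the thesis settings.

```python
# Adaptive LMS noise canceller: a minimal sketch, not the thesis implementation.
import numpy as np

def lms_denoise(primary, noise_ref, taps=32, mu=0.01):
    """Subtract the filtered noise reference from the primary signal.

    primary   : noisy target signal (target audio + UAV noise)
    noise_ref : noise-only reference correlated with the UAV noise
    Returns the error signal e, which approximates the clean target audio.
    """
    primary = np.asarray(primary, dtype=float)
    noise_ref = np.asarray(noise_ref, dtype=float)
    w = np.zeros(taps)                    # adaptive filter weights
    e = np.zeros(len(primary))            # denoised output
    for n in range(taps, len(primary)):
        x = noise_ref[n - taps:n][::-1]   # most recent reference samples
        y = np.dot(w, x)                  # estimate of the noise in primary
        e[n] = primary[n] - y             # cancel the estimated noise
        w += 2 * mu * e[n] * x            # LMS weight update
    return e
```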
Nankali, Saber; Miandoab, Payam Samadi; Baghizadeh, Amin
2016-01-01
In external-beam radiotherapy, using external markers is one of the most reliable tools for predicting tumor position in clinical applications. The main challenge in this approach is tracking tumor motion with the highest accuracy, which depends heavily on the location of the external markers; this issue is the objective of this study. Four commercially available feature selection algorithms, namely 1) Correlation-based Feature Selection, 2) Classifier, 3) Principal Components, and 4) Relief, were proposed to find the optimum location of external markers, in combination with the two searching procedures "Genetic" and "Ranker". The performance of these algorithms was evaluated using the four-dimensional extended cardiac-torso anthropomorphic phantom. Six tumors in the lung, three tumors in the liver, and 49 points on the thorax surface were taken into account to simulate internal and external motions, respectively. The root mean square error of an adaptive neuro-fuzzy inference system (ANFIS) prediction model was used as the metric for quantitatively evaluating the performance of the proposed feature selection algorithms. To do this, the thorax surface region was divided into nine smaller segments, and the predefined tumor motion was predicted by ANFIS using the external motion data of the markers in each small segment, separately. Our comparative results showed that all feature selection algorithms can reasonably select specific external markers from those segments where the root mean square error of the ANFIS model is minimum. Moreover, the performance accuracy of the proposed feature selection algorithms was compared separately: each tumor motion was predicted using the motion data of the external markers selected by each feature selection algorithm. A Duncan statistical test, followed by an F-test, on the final results showed that all proposed feature selection algorithms have the same performance accuracy for lung tumors. For liver tumors, however, a correlation-based feature selection algorithm in combination with a genetic search algorithm yielded the best accuracy for selecting optimum markers. PACS numbers: 87.55.km, 87.56.Fc PMID:26894358
Incremental learning of concept drift in nonstationary environments.
Elwell, Ryan; Polikar, Robi
2011-10-01
We introduce an ensemble of classifiers-based approach for incremental learning of concept drift, characterized by nonstationary environments (NSEs), where the underlying data distributions change over time. The proposed algorithm, named Learn(++). NSE, learns from consecutive batches of data without making any assumptions on the nature or rate of drift; it can learn from such environments that experience constant or variable rate of drift, addition or deletion of concept classes, as well as cyclical drift. The algorithm learns incrementally, as other members of the Learn(++) family of algorithms, that is, without requiring access to previously seen data. Learn(++). NSE trains one new classifier for each batch of data it receives, and combines these classifiers using a dynamically weighted majority voting. The novelty of the approach is in determining the voting weights, based on each classifier's time-adjusted accuracy on current and past environments. This approach allows the algorithm to recognize, and act accordingly, to the changes in underlying data distributions, as well as to a possible reoccurrence of an earlier distribution. We evaluate the algorithm on several synthetic datasets designed to simulate a variety of nonstationary environments, as well as a real-world weather prediction dataset. Comparisons with several other approaches are also included. Results indicate that Learn(++). NSE can track the changing environments very closely, regardless of the type of concept drift. To allow future use, comparison and benchmarking by interested researchers, we also release our data used in this paper. © 2011 IEEE
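A simplified sketch of the core idea, training one base classifier per batch and combining all classifiers with time-adjusted accuracy weights, is given below. The exponential recency weighting and the scikit-learn decision-tree base learner are simplifying assumptions, not the exact published Learn(++).NSE weighting scheme.

```python
# Batch-incremental ensemble with time-adjusted weighted voting: a sketch only.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class StreamingEnsemble:
    def __init__(self, decay=0.5):
        self.models, self.histories = [], []   # per-model accuracy history
        self.decay = decay

    def partial_fit(self, X, y):
        # Score every existing model on the new batch before learning from it.
        for model, hist in zip(self.models, self.histories):
            hist.append(model.score(X, y))
        # Train a new classifier on the current batch only (no old data needed).
        clf = DecisionTreeClassifier(max_depth=5).fit(X, y)
        self.models.append(clf)
        self.histories.append([clf.score(X, y)])

    def predict(self, X):
        votes = np.zeros(len(X))
        for model, hist in zip(self.models, self.histories):
            # Time-adjusted weight: recent batches count more than old ones.
            recency = np.array([self.decay ** k for k in range(len(hist))][::-1])
            weight = float(np.dot(recency, hist) / recency.sum())
            votes += weight * (2 * model.predict(X) - 1)  # assumes labels in {0, 1}
        return (votes > 0).astype(int)
```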
On Algorithms for Generating Computationally Simple Piecewise Linear Classifiers
1989-05-01
suffers. - Waveform classification, e.g. speech recognition, seismic analysis (i.e. discrimination between earthquakes and nuclear explosions), target...assuming Gaussian distributions (B-G) d) Bayes classifier with probability densities estimated with the k-N-N method (B-kNN) e) The nearest neighbour...range of classifiers are chosen, including a fast, easily computable and often used classifier (B-G), and reliable and complex classifiers (B-kNN and NNR
Distributed genetic algorithms for the floorplan design problem
NASA Technical Reports Server (NTRS)
Cohoon, James P.; Hegde, Shailesh U.; Martin, Worthy N.; Richards, Dana S.
1991-01-01
Designing a VLSI floorplan calls for arranging a given set of modules in the plane to minimize the weighted sum of area and wire-length measures. A method of solving the floorplan design problem using distributed genetic algorithms is presented. Distributed genetic algorithms, based on the paleontological theory of punctuated equilibria, offer a conceptual modification to the traditional genetic algorithms. Experimental results on several problem instances demonstrate the efficacy of this method and indicate the advantages of this method over other methods, such as simulated annealing. The method has performed better than the simulated annealing approach, both in terms of the average cost of the solutions found and the best-found solution, in almost all the problem instances tried.
Classifying epileptic EEG signals with delay permutation entropy and Multi-Scale K-means.
Zhu, Guohun; Li, Yan; Wen, Peng Paul; Wang, Shuaifang
2015-01-01
Most epileptic EEG classification algorithms are supervised and require large training datasets, which hinders their use in real-time applications. This chapter proposes an unsupervised Multi-Scale K-means (MSK-means) algorithm to distinguish epileptic EEG signals and identify epileptic zones. The random initialization of the K-means algorithm can lead to wrong clusters. Based on the characteristics of EEGs, the MSK-means algorithm initializes the coarse-scale centroid of a cluster with a suitable scale factor. In this chapter, the MSK-means algorithm is proved theoretically superior to the K-means algorithm in efficiency. In addition, three classifiers, K-means, MSK-means and the support vector machine (SVM), are used to identify seizures and localize the epileptogenic zone using delay permutation entropy features. The experimental results demonstrate that identifying seizures with the MSK-means algorithm and delay permutation entropy achieves 4.7% higher accuracy than K-means, and 0.7% higher accuracy than the SVM.
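The sketch below is one plausible reading of the multi-scale initialization idea: coarse-scale centroids are placed at quantiles of the feature distribution and scaled by a chosen factor before running K-means. It is an illustration under stated assumptions, not the chapter's exact initialization rule.

```python
# Quantile-based, scale-factor-adjusted centroid initialization for K-means:
# an interpretation for illustration only.
import numpy as np
from sklearn.cluster import KMeans

def multiscale_init(X, n_clusters, scale=1.0):
    """Place initial centroids at interior quantiles of each feature, scaled."""
    qs = np.linspace(0.0, 1.0, n_clusters + 2)[1:-1]    # interior quantiles
    centroids = np.quantile(X, qs, axis=0)               # (k, n_features)
    center = X.mean(axis=0)
    return center + scale * (centroids - center)          # spread by scale factor

def msk_means(X, n_clusters=2, scale=1.0):
    """K-means started from the deterministic multi-scale centroids."""
    init = multiscale_init(X, n_clusters, scale)
    return KMeans(n_clusters=n_clusters, init=init, n_init=1).fit(X)
```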
Evolving aerodynamic airfoils for wind turbines through a genetic algorithm
NASA Astrophysics Data System (ADS)
Hernández, J. J.; Gómez, E.; Grageda, J. I.; Couder, C.; Solís, A.; Hanotel, C. L.; Ledesma, JI
2017-01-01
Nowadays, genetic algorithms stand out for airfoil optimisation, due to the virtues of their mutation and crossover techniques. In this work we propose a genetic algorithm with arithmetic crossover rules. The optimisation criteria are the maximisation of both aerodynamic efficiency and lift coefficient, while minimising the drag coefficient. The algorithm shows a large improvement in computational cost as well as high performance, obtaining, in a few iterations, airfoils optimised for Mexico City's specific wind conditions from generic wind turbine airfoils designed for higher Reynolds numbers.
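For illustration, the sketch below implements a standard arithmetic crossover on a real-valued airfoil encoding; the parameter vector and blend factor are assumptions for the example, not the paper's representation.

```python
# Arithmetic crossover for a real-valued chromosome: a generic sketch.
import numpy as np

def arithmetic_crossover(parent_a, parent_b, rng=np.random.default_rng()):
    """Blend two real-valued parents into two children.

    Each child is a convex combination of the parents, so offspring always
    stay on the segment joining the parents in parameter space.
    """
    alpha = rng.uniform(0.0, 1.0)
    child1 = alpha * parent_a + (1.0 - alpha) * parent_b
    child2 = (1.0 - alpha) * parent_a + alpha * parent_b
    return child1, child2

# Example with a hypothetical 6-parameter airfoil shape encoding.
a = np.array([0.02, 0.05, 0.09, 0.08, 0.04, 0.01])
b = np.array([0.03, 0.07, 0.11, 0.07, 0.03, 0.02])
print(arithmetic_crossover(a, b))
```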
Fang, Hongqing; He, Lei; Si, Hao; Liu, Peng; Xie, Xiaolei
2014-09-01
In this paper, the Back-propagation (BP) algorithm has been used to train a feed-forward neural network for human activity recognition in smart home environments, and an inter-class distance method for feature selection of observed motion sensor events is discussed and tested. The human activity recognition performance of the neural network trained with the BP algorithm has then been evaluated and compared with other probabilistic algorithms: a Naïve Bayes (NB) classifier and a Hidden Markov Model (HMM). The results show that different feature datasets yield different activity recognition accuracy. The selection of unsuitable feature datasets increases the computational complexity and degrades the activity recognition accuracy. Furthermore, the neural network using the BP algorithm has relatively better human activity recognition performance than the NB classifier and the HMM. Copyright © 2014 ISA. Published by Elsevier Ltd. All rights reserved.
An Agent Inspired Reconfigurable Computing Implementation of a Genetic Algorithm
NASA Technical Reports Server (NTRS)
Weir, John M.; Wells, B. Earl
2003-01-01
Many software systems have been successfully implemented using an agent paradigm which employs a number of independent entities that communicate with one another to achieve a common goal. The distributed nature of such a paradigm makes it an excellent candidate for use in high speed reconfigurable computing hardware environments such as those present in modern FPGAs. In this paper, a distributed genetic algorithm that can be applied to the agent based reconfigurable hardware model is introduced. The effectiveness of this new algorithm is evaluated by comparing the quality of the solutions found by the new algorithm with those found by traditional genetic algorithms. The performance of a reconfigurable hardware implementation of the new algorithm on an FPGA is compared to traditional single processor implementations.
Phase Reconstruction from FROG Using Genetic Algorithms [Frequency-Resolved Optical Gating]
DOE Office of Scientific and Technical Information (OSTI.GOV)
Omenetto, F.G.; Nicholson, J.W.; Funk, D.J.
1999-04-12
The authors describe a new technique for obtaining the phase and electric field from FROG measurements using genetic algorithms. Frequency-Resolved Optical Gating (FROG) has gained prominence as a technique for characterizing ultrashort pulses. FROG consists of a spectrally resolved autocorrelation of the pulse to be measured. Typically a combination of iterative algorithms is used, applying constraints from experimental data, and alternating between the time and frequency domain, in order to retrieve an optical pulse. The authors have developed a new approach to retrieving the intensity and phase from FROG data using a genetic algorithm (GA). A GA is a general parallel search technique that operates on a population of potential solutions simultaneously. Operators in a genetic algorithm, such as crossover, selection, and mutation, are based on ideas taken from evolution.
A novel procedure on next generation sequencing data analysis using text mining algorithm.
Zhao, Weizhong; Chen, James J; Perkins, Roger; Wang, Yuping; Liu, Zhichao; Hong, Huixiao; Tong, Weida; Zou, Wen
2016-05-13
Next-generation sequencing (NGS) technologies have provided researchers with vast possibilities in various biological and biomedical research areas. Efficient data mining strategies are in high demand for large scale comparative and evolutional studies to be performed on the large amounts of data derived from NGS projects. Topic modeling is an active research field in machine learning and has been mainly used as an analytical tool to structure large textual corpora for data mining. We report a novel procedure to analyse NGS data using topic modeling. It consists of four major steps: NGS data retrieval, preprocessing, topic modeling, and data mining using Latent Dirichlet Allocation (LDA) topic outputs. The NGS data set of Salmonella enterica strains was used as a case study to show the workflow of this procedure. The perplexity measurement of the topic numbers and the convergence efficiencies of Gibbs sampling were calculated and discussed for achieving the best result from the proposed procedure. The output topics from the LDA algorithm could be treated as features of Salmonella strains to accurately describe the genetic diversity of the fliC gene in various serotypes. The results of a two-way hierarchical clustering and data matrix analysis on LDA-derived matrices successfully classified Salmonella serotypes based on the NGS data. The implementation of topic modeling in the NGS data analysis procedure provides a new way to elucidate genetic information from NGS data and to identify gene-phenotype relationships and biomarkers, especially in the era of biological and medical big data.
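The snippet below sketches the topic-modeling step in this kind of workflow: sequences are converted to k-mer "documents", vectorized, and passed to Latent Dirichlet Allocation so that the topic mixtures can serve as strain features. The k-mer size, topic count, and toy sequences are illustrative assumptions, not the study's settings.

```python
# K-mer "documents" fed to LDA: a minimal sketch of the topic-modeling step.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def to_kmer_document(sequence, k=4):
    """Represent a DNA sequence as a space-separated bag of k-mers."""
    return " ".join(sequence[i:i + k] for i in range(len(sequence) - k + 1))

# Toy sequences standing in for fliC gene reads from different serotypes.
sequences = ["ATGGCACAGGTCATTAAT", "ATGGCACAGGTTATCAAC", "ATGTCTCAGGTAATTAAT"]
docs = [to_kmer_document(s) for s in sequences]

counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topic_mixtures = lda.fit_transform(counts)   # one feature vector per strain
print(topic_mixtures.shape)                  # (3 strains, 2 topics)
```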
Selection of Norway spruce somatic embryos by computer vision
NASA Astrophysics Data System (ADS)
Hamalainen, Jari J.; Jokinen, Kari J.
1993-05-01
A computer vision system was developed for the classification of plant somatic embryos. The embryos are in a Petri dish that is transferred at constant speed, and they are recognized as they pass a line scan camera. A classification algorithm needs to be installed for every plant species. This paper describes an algorithm for the recognition of Norway spruce (Picea abies) embryos. A short review of conifer micropropagation by somatic embryogenesis is also given. The recognition algorithm is based on features calculated from the boundary of the object. Only the parts of the boundary corresponding to the developing cotyledons (2 - 15) and the straight sides of the embryo are used for recognition. An index of the length of the cotyledons describes the developmental stage of the embryo. The testing set for classifier performance consisted of 118 embryos and 478 nonembryos. With the classification tolerances chosen, 69% of the objects classified as embryos by a human classifier were selected and 31% rejected. Less than 1% of the nonembryos were classified as embryos. The basic features developed can probably be easily adapted for the recognition of other conifer somatic embryos.
Hierarchical Learning of Tree Classifiers for Large-Scale Plant Species Identification.
Fan, Jianping; Zhou, Ning; Peng, Jinye; Gao, Ling
2015-11-01
In this paper, a hierarchical multi-task structural learning algorithm is developed to support large-scale plant species identification, where a visual tree is constructed for organizing large numbers of plant species in a coarse-to-fine fashion and determining the inter-related learning tasks automatically. For a given parent node on the visual tree, it contains a set of sibling coarse-grained categories of plant species or sibling fine-grained plant species, and a multi-task structural learning algorithm is developed to train their inter-related classifiers jointly for enhancing their discrimination power. The inter-level relationship constraint, e.g., a plant image must first be assigned to a parent node (high-level non-leaf node) correctly if it can further be assigned to the most relevant child node (low-level non-leaf node or leaf node) on the visual tree, is formally defined and leveraged to learn more discriminative tree classifiers over the visual tree. Our experimental results have demonstrated the effectiveness of our hierarchical multi-task structural learning algorithm on training more discriminative tree classifiers for large-scale plant species identification.
PPCM: Combing multiple classifiers to improve protein-protein interaction prediction
Yao, Jianzhuang; Guo, Hong; Yang, Xiaohan
2015-08-01
Determining protein-protein interaction (PPI) in biological systems is of considerable importance, and prediction of PPI has become a popular research area. Although different classifiers have been developed for PPI prediction, no single classifier seems to be able to predict PPI with high confidence. We postulated that by combining individual classifiers the accuracy of PPI prediction could be improved. We developed a method called protein-protein interaction prediction classifiers merger (PPCM), and this method combines output from two PPI prediction tools, GO2PPI and Phyloprof, using the Random Forests algorithm. The performance of PPCM was tested by area under the curve (AUC) using an assembled Gold Standard database that contains both positive and negative PPI pairs. Our AUC test showed that PPCM significantly improved the PPI prediction accuracy over the corresponding individual classifiers. We found that additional classifiers incorporated into PPCM could lead to further improvement in the PPI prediction accuracy. Furthermore, cross species PPCM could achieve competitive and even better prediction accuracy compared to the single species PPCM. This study established a robust pipeline for PPI prediction by integrating multiple classifiers using the Random Forests algorithm. Ultimately, this pipeline will be useful for predicting PPI in nonmodel species.
Support vector machine as a binary classifier for automated object detection in remotely sensed data
NASA Astrophysics Data System (ADS)
Wardaya, P. D.
2014-02-01
In the present paper, the author proposes the application of the Support Vector Machine (SVM) to the analysis of satellite imagery. One of the advantages of SVM is that, with limited training data, it may generate comparable or even better results than other methods. The SVM algorithm is used for automated object detection and characterization. Specifically, the SVM is applied in its basic form as a binary classifier that separates two classes, namely object and background. The algorithm aims at effectively detecting an object from its background with minimal training data. A synthetic image containing noise is used for algorithm testing. Furthermore, the method is applied to remote sensing image analysis tasks such as the identification of island vegetation, water bodies, and oil spills from satellite imagery. The results indicate that SVM provides fast and accurate analysis with acceptable results.
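A minimal sketch of the binary object/background formulation is shown below: pixels are treated as samples with their band values as features and an SVM is trained on a tiny labelled set. The RBF kernel, the labelled points, and the synthetic image are illustrative choices, not the paper's configuration.

```python
# Pixel-wise binary SVM classification of imagery: a sketch only.
import numpy as np
from sklearn.svm import SVC

def train_pixel_classifier(image, labelled_pixels):
    """image: (H, W, bands) array; labelled_pixels: dict {(row, col): 0 or 1}."""
    X = np.array([image[r, c] for (r, c) in labelled_pixels])
    y = np.array(list(labelled_pixels.values()))
    return SVC(kernel="rbf", gamma="scale").fit(X, y)

def classify_image(model, image):
    h, w, bands = image.shape
    flat = image.reshape(-1, bands)
    return model.predict(flat).reshape(h, w)   # 1 = object, 0 = background

# Toy example: a 2-band synthetic image with a bright square "object".
rng = np.random.default_rng(0)
img = rng.normal(0.2, 0.05, size=(40, 40, 2))
img[10:20, 10:20] += 0.6
labels = {(12, 12): 1, (15, 18): 1, (2, 2): 0, (35, 30): 0, (5, 30): 0}
mask = classify_image(train_pixel_classifier(img, labels), img)
```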
Martin, Bryan D.; Wolfson, Julian; Adomavicius, Gediminas; Fan, Yingling
2017-01-01
We propose and compare combinations of several methods for classifying transportation activity data from smartphone GPS and accelerometer sensors. We have two main objectives. First, we aim to classify our data as accurately as possible. Second, we aim to reduce the dimensionality of the data as much as possible in order to reduce the computational burden of the classification. We combine dimension reduction and classification algorithms and compare them with a metric that balances accuracy and dimensionality. In doing so, we develop a classification algorithm that accurately classifies five different modes of transportation (i.e., walking, biking, car, bus and rail) while being computationally simple enough to run on a typical smartphone. Further, we use data that required no behavioral changes from the smartphone users to collect. Our best classification model uses the random forest algorithm to achieve 96.8% accuracy. PMID:28885550
Martin, Bryan D; Addona, Vittorio; Wolfson, Julian; Adomavicius, Gediminas; Fan, Yingling
2017-09-08
We propose and compare combinations of several methods for classifying transportation activity data from smartphone GPS and accelerometer sensors. We have two main objectives. First, we aim to classify our data as accurately as possible. Second, we aim to reduce the dimensionality of the data as much as possible in order to reduce the computational burden of the classification. We combine dimension reduction and classification algorithms and compare them with a metric that balances accuracy and dimensionality. In doing so, we develop a classification algorithm that accurately classifies five different modes of transportation (i.e., walking, biking, car, bus and rail) while being computationally simple enough to run on a typical smartphone. Further, we use data that required no behavioral changes from the smartphone users to collect. Our best classification model uses the random forest algorithm to achieve 96.8% accuracy.
A novel algorithm for simplification of complex gene classifiers in cancer
Wilson, Raphael A.; Teng, Ling; Bachmeyer, Karen M.; Bissonnette, Mei Lin Z.; Husain, Aliya N.; Parham, David M.; Triche, Timothy J.; Wing, Michele R.; Gastier-Foster, Julie M.; Barr, Frederic G.; Hawkins, Douglas S.; Anderson, James R.; Skapek, Stephen X.; Volchenboum, Samuel L.
2013-01-01
The clinical application of complex molecular classifiers as diagnostic or prognostic tools has been limited by the time and cost needed to apply them to patients. Using an existing fifty-gene expression signature known to separate two molecular subtypes of the pediatric cancer rhabdomyosarcoma, we show that an exhaustive iterative search algorithm can distill this complex classifier down to two or three features with equal discrimination. We validated the two-gene signatures using three separate and distinct data sets, including one that uses degraded RNA extracted from formalin-fixed, paraffin-embedded material. Finally, to demonstrate the generalizability of our algorithm, we applied it to a lung cancer data set to find minimal gene signatures that can distinguish survival. Our approach can easily be generalized and coupled to existing technical platforms to facilitate the discovery of simplified signatures that are ready for routine clinical use. PMID:23913937
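A sketch of the exhaustive pair-search idea is given below: every pair of genes from the signature is scored and the best pairs are kept. The logistic-regression surrogate and cross-validated accuracy criterion are assumptions for illustration, not the authors' exact scoring rule.

```python
# Exhaustive search over gene pairs for a minimal classifier: a sketch only.
from itertools import combinations
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def best_gene_pairs(X, y, gene_names, top_k=5):
    """X: samples x genes expression matrix; y: binary subtype labels."""
    scores = []
    for i, j in combinations(range(X.shape[1]), 2):
        # Score each two-gene classifier by cross-validated accuracy.
        acc = cross_val_score(LogisticRegression(max_iter=1000),
                              X[:, [i, j]], y, cv=5).mean()
        scores.append((acc, gene_names[i], gene_names[j]))
    return sorted(scores, reverse=True)[:top_k]
```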
NASA Technical Reports Server (NTRS)
Mazzoni, Dominic; Wagstaff, Kiri; Bornstein, Benjamin; Tang, Nghia; Roden, Joseph
2006-01-01
PixelLearn is an integrated user-interface computer program for classifying pixels in scientific images. Heretofore, training a machine-learning algorithm to classify pixels in images has been tedious and difficult. PixelLearn provides a graphical user interface that makes it faster and more intuitive, leading to more interactive exploration of image data sets. PixelLearn also provides image-enhancement controls to make it easier to see subtle details in images. PixelLearn opens images or sets of images in a variety of common scientific file formats and enables the user to interact with several supervised or unsupervised machine-learning pixel-classifying algorithms while the user continues to browse through the images. The machine-learning algorithms in PixelLearn use advanced clustering and classification methods that enable accuracy much higher than is achievable by most other software previously available for this purpose. PixelLearn is written in portable C++ and runs natively on computers running Linux, Windows, or Mac OS X.
NASA Astrophysics Data System (ADS)
Adya Zizwan, Putra; Zarlis, Muhammad; Budhiarti Nababan, Erna
2017-12-01
The determination of centroids in the K-Means algorithm directly affects the quality of the clustering results. Determining centroids with random numbers has many weaknesses. The GenClust algorithm, which combines Genetic Algorithms and K-Means, uses a genetic algorithm to determine the centroid of each cluster. GenClust obtains 50% of its chromosomes through deterministic calculation and 50% through the generation of random numbers. This study modifies the GenClust algorithm so that all chromosomes are obtained through deterministic calculation. The result is a performance comparison, expressed as Mean Square Error, of centroid determination for the K-Means method using the original GenClust method, the modified GenClust method, and classic K-Means.
Sherer, Eric A; Sale, Mark E; Pollock, Bruce G; Belani, Chandra P; Egorin, Merrill J; Ivy, Percy S; Lieberman, Jeffrey A; Manuck, Stephen B; Marder, Stephen R; Muldoon, Matthew F; Scher, Howard I; Solit, David B; Bies, Robert R
2012-08-01
A limitation in traditional stepwise population pharmacokinetic model building is the difficulty in handling interactions between model components. To address this issue, a method was previously introduced which couples NONMEM parameter estimation and model fitness evaluation to a single-objective, hybrid genetic algorithm for global optimization of the model structure. In this study, the generalizability of this approach for pharmacokinetic model building is evaluated by comparing (1) correct and spurious covariate relationships in a simulated dataset resulting from automated stepwise covariate modeling, Lasso methods, and single-objective hybrid genetic algorithm approaches to covariate identification and (2) information criteria values, model structures, convergence, and model parameter values resulting from manual stepwise versus single-objective, hybrid genetic algorithm approaches to model building for seven compounds. Both manual stepwise and single-objective, hybrid genetic algorithm approaches to model building were applied, blinded to the results of the other approach, for selection of the compartment structure as well as inclusion and model form of inter-individual and inter-occasion variability, residual error, and covariates from a common set of model options. For the simulated dataset, stepwise covariate modeling identified three of four true covariates and two spurious covariates; Lasso identified two of four true and 0 spurious covariates; and the single-objective, hybrid genetic algorithm identified three of four true covariates and one spurious covariate. For the clinical datasets, the Akaike information criterion was a median of 22.3 points lower (range of 470.5 point decrease to 0.1 point decrease) for the best single-objective hybrid genetic-algorithm candidate model versus the final manual stepwise model: the Akaike information criterion was lower by greater than 10 points for four compounds and differed by less than 10 points for three compounds. The root mean squared error and absolute mean prediction error of the best single-objective hybrid genetic algorithm candidates were a median of 0.2 points higher (range of 38.9 point decrease to 27.3 point increase) and 0.02 points lower (range of 0.98 point decrease to 0.74 point increase), respectively, than that of the final stepwise models. In addition, the best single-objective, hybrid genetic algorithm candidate models had successful convergence and covariance steps for each compound, used the same compartment structure as the manual stepwise approach for 6 of 7 (86 %) compounds, and identified 54 % (7 of 13) of covariates included by the manual stepwise approach and 16 covariate relationships not included by manual stepwise models. The model parameter values between the final manual stepwise and best single-objective, hybrid genetic algorithm models differed by a median of 26.7 % (q₁ = 4.9 % and q₃ = 57.1 %). Finally, the single-objective, hybrid genetic algorithm approach was able to identify models capable of estimating absorption rate parameters for four compounds that the manual stepwise approach did not identify. The single-objective, hybrid genetic algorithm represents a general pharmacokinetic model building methodology whose ability to rapidly search the feasible solution space leads to nearly equivalent or superior model fits to pharmacokinetic data.
Applications of color machine vision in the agricultural and food industries
NASA Astrophysics Data System (ADS)
Zhang, Min; Ludas, Laszlo I.; Morgan, Mark T.; Krutz, Gary W.; Precetti, Cyrille J.
1999-01-01
Color is an important factor in agriculture and the food industry. Agricultural or prepared food products are often graded by producers and consumers using color parameters. Color is used to estimate maturity and sort produce for defects, but also to perform genetic screenings or make aesthetic judgements. The task of sorting produce along a color scale is very complex, requires special illumination and training, and cannot be performed for long durations without fatigue and loss of accuracy. This paper describes a machine vision system designed to perform color classification in real time. Applications for sorting a variety of agricultural products are included, e.g. seeds, meat, baked goods, plants and wood. First, the theory of color classification of agricultural and biological materials is introduced. Then, some tools for classifier development are presented. Finally, the implementation of the algorithm on real-time image processing hardware and example applications for industry are described. This paper also presents an image analysis algorithm and a prototype machine vision system which was developed for industry. This system automatically locates the surface of some plants using a digital camera and predicts information such as the size, potential value and type of the plant. The algorithm developed is feasible for real-time identification in an industrial environment.
Fan, Jianping; Gao, Yuli; Luo, Hangzai
2008-03-01
In this paper, we have developed a new scheme for achieving multilevel annotations of large-scale images automatically. To achieve more sufficient representation of various visual properties of the images, both the global visual features and the local visual features are extracted for image content representation. To tackle the problem of huge intraconcept visual diversity, multiple types of kernels are integrated to characterize the diverse visual similarity relationships between the images more precisely, and a multiple kernel learning algorithm is developed for SVM image classifier training. To address the problem of huge interconcept visual similarity, a novel multitask learning algorithm is developed to learn the correlated classifiers for the sibling image concepts under the same parent concept and enhance their discrimination and adaptation power significantly. To tackle the problem of huge intraconcept visual diversity for the image concepts at the higher levels of the concept ontology, a novel hierarchical boosting algorithm is developed to learn their ensemble classifiers hierarchically. In order to assist users on selecting more effective hypotheses for image classifier training, we have developed a novel hyperbolic framework for large-scale image visualization and interactive hypotheses assessment. Our experiments on large-scale image collections have also obtained very positive results.
Cloud computing-based TagSNP selection algorithm for human genome data.
Hung, Che-Lun; Chen, Wen-Pei; Hua, Guan-Jie; Zheng, Huiru; Tsai, Suh-Jen Jane; Lin, Yaw-Ling
2015-01-05
Single nucleotide polymorphisms (SNPs) play a fundamental role in human genetic variation and are used in medical diagnostics, phylogeny construction, and drug design. They provide the highest-resolution genetic fingerprint for identifying disease associations and human features. Haplotypes are regions of linked genetic variants that are closely spaced on the genome and tend to be inherited together. Genetics research has revealed SNPs within certain haplotype blocks that introduce few distinct common haplotypes into most of the population. Haplotype block structures are used in association-based methods to map disease genes. In this paper, we propose an efficient algorithm for identifying haplotype blocks in the genome. In chromosomal haplotype data retrieved from the HapMap project website, the proposed algorithm identified longer haplotype blocks than an existing algorithm. To enhance its performance, we extended the proposed algorithm into a parallel algorithm that copies data in parallel via the Hadoop MapReduce framework. The proposed MapReduce-paralleled combinatorial algorithm performed well on real-world data obtained from the HapMap dataset; the improvement in computational efficiency was proportional to the number of processors used.
New optimization model for routing and spectrum assignment with nodes insecurity
NASA Astrophysics Data System (ADS)
Xuan, Hejun; Wang, Yuping; Xu, Zhanqi; Hao, Shanshan; Wang, Xiaoli
2017-04-01
By adopting orthogonal frequency division multiplexing technology, elastic optical networks can provide flexible and variable bandwidth allocation to each connection request and achieve higher spectrum utilization. The routing and spectrum assignment problem in elastic optical networks is a well-known NP-hard problem. In addition, information security has received worldwide attention. We combine these two problems to investigate the routing and spectrum assignment problem with guaranteed security in elastic optical networks, and establish a new optimization model that minimizes the maximum index of the used frequency slots and is used to determine an optimal routing and spectrum assignment scheme. To solve the model effectively, a hybrid genetic algorithm framework integrating a heuristic algorithm into a genetic algorithm is proposed. The heuristic algorithm is first used to sort the connection requests, and then the genetic algorithm is designed to look for an optimal routing and spectrum assignment scheme. In the genetic algorithm, tailor-made crossover, mutation and local search operators are designed. Moreover, simulation experiments are conducted with three heuristic strategies, and the experimental results indicate the effectiveness of the proposed model and algorithm framework.
The Applications of Genetic Algorithms in Medicine.
Ghaheri, Ali; Shoar, Saeed; Naderan, Mohammad; Hoseini, Sayed Shahabuddin
2015-11-01
A great wealth of information is hidden amid medical research data that in some cases cannot be easily analyzed, if at all, using classical statistical methods. Inspired by nature, metaheuristic algorithms have been developed to offer optimal or near-optimal solutions to complex data analysis and decision-making tasks in a reasonable time. Due to their powerful features, metaheuristic algorithms have frequently been used in other fields of sciences. In medicine, however, the use of these algorithms is not well known to physicians, who may well benefit by applying them to solve complex medical problems. Therefore, in this paper, we introduce the genetic algorithm and its applications in medicine. The use of the genetic algorithm has promising implications in various medical specialties including radiology, radiotherapy, oncology, pediatrics, cardiology, endocrinology, surgery, obstetrics and gynecology, pulmonology, infectious diseases, orthopedics, rehabilitation medicine, neurology, pharmacotherapy, and health care management. This review introduces the applications of the genetic algorithm in disease screening, diagnosis, treatment planning, pharmacovigilance, prognosis, and health care management, and enables physicians to envision possible applications of this metaheuristic method in their medical career.
The Applications of Genetic Algorithms in Medicine
Ghaheri, Ali; Shoar, Saeed; Naderan, Mohammad; Hoseini, Sayed Shahabuddin
2015-01-01
A great wealth of information is hidden amid medical research data that in some cases cannot be easily analyzed, if at all, using classical statistical methods. Inspired by nature, metaheuristic algorithms have been developed to offer optimal or near-optimal solutions to complex data analysis and decision-making tasks in a reasonable time. Due to their powerful features, metaheuristic algorithms have frequently been used in other fields of sciences. In medicine, however, the use of these algorithms is not well known to physicians, who may well benefit by applying them to solve complex medical problems. Therefore, in this paper, we introduce the genetic algorithm and its applications in medicine. The use of the genetic algorithm has promising implications in various medical specialties including radiology, radiotherapy, oncology, pediatrics, cardiology, endocrinology, surgery, obstetrics and gynecology, pulmonology, infectious diseases, orthopedics, rehabilitation medicine, neurology, pharmacotherapy, and health care management. This review introduces the applications of the genetic algorithm in disease screening, diagnosis, treatment planning, pharmacovigilance, prognosis, and health care management, and enables physicians to envision possible applications of this metaheuristic method in their medical career. PMID:26676060
Cloud Computing-Based TagSNP Selection Algorithm for Human Genome Data
Hung, Che-Lun; Chen, Wen-Pei; Hua, Guan-Jie; Zheng, Huiru; Tsai, Suh-Jen Jane; Lin, Yaw-Ling
2015-01-01
Single nucleotide polymorphisms (SNPs) play a fundamental role in human genetic variation and are used in medical diagnostics, phylogeny construction, and drug design. They provide the highest-resolution genetic fingerprint for identifying disease associations and human features. Haplotypes are regions of linked genetic variants that are closely spaced on the genome and tend to be inherited together. Genetics research has revealed SNPs within certain haplotype blocks that introduce few distinct common haplotypes into most of the population. Haplotype block structures are used in association-based methods to map disease genes. In this paper, we propose an efficient algorithm for identifying haplotype blocks in the genome. In chromosomal haplotype data retrieved from the HapMap project website, the proposed algorithm identified longer haplotype blocks than an existing algorithm. To enhance its performance, we extended the proposed algorithm into a parallel algorithm that copies data in parallel via the Hadoop MapReduce framework. The proposed MapReduce-paralleled combinatorial algorithm performed well on real-world data obtained from the HapMap dataset; the improvement in computational efficiency was proportional to the number of processors used. PMID:25569088
NASA Astrophysics Data System (ADS)
Chandra, Malavika; Scheiman, James; Simeone, Diane; McKenna, Barbara; Purdy, Julianne; Mycek, Mary-Ann
2010-01-01
Pancreatic adenocarcinoma is one of the leading causes of cancer death, in part because of the inability of current diagnostic methods to reliably detect early-stage disease. We present the first assessment of the diagnostic accuracy of algorithms developed for pancreatic tissue classification using data from fiber optic probe-based bimodal optical spectroscopy, a real-time approach that would be compatible with minimally invasive diagnostic procedures for early cancer detection in the pancreas. A total of 96 fluorescence and 96 reflectance spectra are considered from 50 freshly excised tissue sites, including human pancreatic adenocarcinoma, chronic pancreatitis (inflammation), and normal tissues, from nine patients. Classification algorithms using linear discriminant analysis are developed to distinguish among tissues, and leave-one-out cross-validation is employed to assess the classifiers' performance. The spectral areas and ratios classifier (SpARC) algorithm employs a combination of reflectance and fluorescence data and has the best performance, with sensitivity, specificity, negative predictive value, and positive predictive value for correctly identifying adenocarcinoma of 85, 89, 92, and 80%, respectively.
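The classification and validation approach described here can be sketched as linear discriminant analysis with leave-one-out cross-validation, as below; the two-feature input standing in for the SpARC spectral areas and ratios is an assumption, since the actual features are not specified in this abstract.

```python
# LDA classifier assessed with leave-one-out cross-validation: a sketch only.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_predict

def loo_lda_accuracy(features, labels):
    """features: (n_sites, n_features); labels: tissue class per site."""
    preds = cross_val_predict(LinearDiscriminantAnalysis(), features, labels,
                              cv=LeaveOneOut())
    return (preds == labels).mean()

# Toy data: 12 tissue sites, 2 spectral features, 3 tissue classes.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 0.1, size=(4, 2)) for m in (0.2, 0.5, 0.8)])
y = np.repeat(["normal", "pancreatitis", "adenocarcinoma"], 4)
print(loo_lda_accuracy(X, y))
```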
Evaluation of Genetic Algorithm Concepts Using Model Problems. Part 2; Multi-Objective Optimization
NASA Technical Reports Server (NTRS)
Holst, Terry L.; Pulliam, Thomas H.
2003-01-01
A genetic algorithm approach suitable for solving multi-objective optimization problems is described and evaluated using a series of simple model problems. Several new features including a binning selection algorithm and a gene-space transformation procedure are included. The genetic algorithm is suitable for finding pareto optimal solutions in search spaces that are defined by any number of genes and that contain any number of local extrema. Results indicate that the genetic algorithm optimization approach is flexible in application and extremely reliable, providing optimal results for all optimization problems attempted. The binning algorithm generally provides pareto front quality enhancements and moderate convergence efficiency improvements for most of the model problems. The gene-space transformation procedure provides a large convergence efficiency enhancement for problems with non-convoluted pareto fronts and a degradation in efficiency for problems with convoluted pareto fronts. The most difficult problems, multi-mode search spaces with a large number of genes and convoluted pareto fronts, require a large number of function evaluations for GA convergence, but always converge.
A genetic algorithm for replica server placement
NASA Astrophysics Data System (ADS)
Eslami, Ghazaleh; Toroghi Haghighat, Abolfazl
2012-01-01
Modern distribution systems use replication to improve the communication delay experienced by their clients. Several techniques have been developed for web server replica placement. One of the previous studies was the Greedy algorithm proposed by Qiu et al., which needs knowledge about the network topology. In this paper, we first introduce a genetic algorithm for web server replica placement. Second, we compare our algorithm with the Greedy algorithm proposed by Qiu et al. and the Optimum algorithm. We found that our approach can achieve better results than the Greedy algorithm proposed by Qiu et al., but its computational time is greater than that of the Greedy algorithm.
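A rough sketch of a GA for replica placement in this spirit is shown below: a chromosome is a set of k candidate nodes hosting replicas and the fitness is the total client-to-nearest-replica delay. The tournament selection, union-based crossover, random-reset mutation, and delay-matrix representation are illustrative assumptions, not the paper's operators or data.

```python
# GA for replica server placement: a sketch under stated assumptions.
import random

def total_delay(replicas, delay):
    # delay[c][s]: delay from client c to candidate site s.
    return sum(min(row[s] for s in replicas) for row in delay)

def ga_place_replicas(delay, k, pop_size=30, generations=200, seed=0):
    rng = random.Random(seed)
    n_sites = len(delay[0])
    pop = [tuple(sorted(rng.sample(range(n_sites), k))) for _ in range(pop_size)]
    for _ in range(generations):
        # Tournament selection of two parents (lower total delay is better).
        parents = [min(rng.sample(pop, 3), key=lambda ind: total_delay(ind, delay))
                   for _ in range(2)]
        # Crossover: draw k sites from the union of both parents.
        pool = list(set(parents[0]) | set(parents[1]))
        if len(pool) >= k:
            child = rng.sample(pool, k)
        else:
            extra = [s for s in range(n_sites) if s not in pool]
            child = pool + rng.sample(extra, k - len(pool))
        # Mutation: occasionally replace one site with a random unused one.
        if rng.random() < 0.2:
            unused = [s for s in range(n_sites) if s not in child]
            if unused:
                child[rng.randrange(k)] = rng.choice(unused)
        child = tuple(sorted(child))
        # Steady-state replacement: child replaces the worst individual if better.
        worst = max(pop, key=lambda ind: total_delay(ind, delay))
        if total_delay(child, delay) < total_delay(worst, delay):
            pop[pop.index(worst)] = child
    return min(pop, key=lambda ind: total_delay(ind, delay))
```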
A genetic algorithm for replica server placement
NASA Astrophysics Data System (ADS)
Eslami, Ghazaleh; Toroghi Haghighat, Abolfazl
2011-12-01
Modern distribution systems use replication to improve the communication delay experienced by their clients. Several techniques have been developed for web server replica placement. One of the previous studies was the Greedy algorithm proposed by Qiu et al., which needs knowledge about the network topology. In this paper, we first introduce a genetic algorithm for web server replica placement. Second, we compare our algorithm with the Greedy algorithm proposed by Qiu et al. and the Optimum algorithm. We found that our approach can achieve better results than the Greedy algorithm proposed by Qiu et al., but its computational time is greater than that of the Greedy algorithm.
Algorithms for detecting antibodies to HIV-1: results from a rural Ugandan cohort.
Nunn, A J; Biryahwaho, B; Downing, R G; van der Groen, G; Ojwiya, A; Mulder, D W
1993-08-01
To evaluate an algorithm using two enzyme immunoassays (EIA) for anti-HIV-1 antibodies in a rural African population and to assess alternative simplified algorithms. Sera obtained from 7895 individuals in a rural population survey were tested using an algorithm based on two different EIA systems: Recombigen HIV-1 EIA and Wellcozyme HIV-1 Recombinant. Alternative algorithms were assessed using negative or confirmed positive sera. None of the 227 sera classified as unequivocally negative by the two assays were positive by Western blot. Of 192 sera unequivocally positive by both assays, four were seronegative by Western blot; the possibility of technical error cannot be ruled out in three of these. One of the alternative algorithms assessed, which classified all borderline or discordant assay results as negative, had a specificity of 100% and a sensitivity of 98.4%. The cost of this algorithm is one-third that of the conventional algorithm. Our evaluation suggests that high specificity and sensitivity can be obtained without using Western blot and at a considerable reduction in cost.
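The decision logic of such a two-EIA algorithm can be sketched as below. The rules are simplified for illustration: concordant reactive pairs are reported positive, concordant non-reactive pairs negative, and, following the simplified strategy assessed in the abstract, borderline or discordant results are reported negative without Western blot. This is not the exact laboratory protocol.

```python
# Simplified two-EIA decision rule: an illustrative sketch only.
def classify_serum(eia1_reactive, eia2_reactive, borderline=False):
    """Return 'positive' or 'negative' for one serum sample."""
    if borderline:
        return "negative"              # simplified algorithm: no Western blot
    if eia1_reactive and eia2_reactive:
        return "positive"              # concordant reactive
    if not eia1_reactive and not eia2_reactive:
        return "negative"              # concordant non-reactive
    return "negative"                  # discordant results treated as negative

# Example batch of (EIA1, EIA2, borderline) results.
batch = [(True, True, False), (False, False, False), (True, False, False)]
print([classify_serum(*result) for result in batch])
```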
Conwell, Darwin L.; Lee, Linda S.; Yadav, Dhiraj; Longnecker, Daniel S.; Miller, Frank H.; Mortele, Koenraad J.; Levy, Michael J.; Kwon, Richard; Lieb, John G.; Stevens, Tyler; Toskes, Philip P.; Gardner, Timothy B.; Gelrud, Andres; Wu, Bechien U.; Forsmark, Christopher E.; Vege, Santhi S.
2016-01-01
The diagnosis of chronic pancreatitis remains challenging in early stages of the disease. This report defines the diagnostic criteria useful in the assessment of patients with suspected and established chronic pancreatitis. All current diagnostic procedures are reviewed and evidence-based statements are provided about their utility and limitations. Diagnostic criteria for chronic pancreatitis are classified as definitive, probable or insufficient evidence. A diagnostic (STEP-wise; S-survey, T-tomography, E-endoscopy and P-pancreas function testing) algorithm is proposed that proceeds from a non-invasive to a more invasive approach. This algorithm maximizes specificity (low false positive rate) in subjects with chronic abdominal pain and equivocal imaging changes. Furthermore, a nomenclature is suggested to further characterize patients with established chronic pancreatitis based on TIGAR-O (T-toxic, I-idiopathic, G-genetic, A-autoimmune, R-recurrent and O-obstructive) etiology, gland morphology (Cambridge criteria) and physiologic state (exocrine, endocrine function) for uniformity across future multi-center research collaborations. This guideline will serve as a baseline manuscript that will be modified as new evidence becomes available and our knowledge of chronic pancreatitis improves. PMID:25333398
Wang, Shunfang; Liu, Shuhui
2015-12-19
An effective representation of a protein sequence plays a crucial role in protein sub-nuclear localization. The existing representations, such as dipeptide composition (DipC), pseudo-amino acid composition (PseAAC) and position specific scoring matrix (PSSM), are insufficient to represent a protein sequence due to their single perspectives. Thus, this paper proposes two fusion feature representations, DipPSSM and PseAAPSSM, to integrate PSSM with DipC and PseAAC, respectively. When constructing each fusion representation, we introduce balance factors to weigh the importance of its components. The optimal values of the balance factors are sought by a genetic algorithm. Due to the high dimensionality of the proposed representations, linear discriminant analysis (LDA) is used to find their important low dimensional structure, which is essential for classification and location prediction. The numerical experiments on two public datasets with a KNN classifier and cross-validation tests showed that, in terms of the common indexes of sensitivity, specificity, accuracy and MCC, the proposed fusion representations outperform the traditional representations in protein sub-nuclear localization, and the representation treated by LDA outperforms the untreated one.
Energy minimization in medical image analysis: Methodologies and applications.
Zhao, Feng; Xie, Xianghua
2016-02-01
Energy minimization is of particular interest in medical image analysis. In the past two decades, a variety of optimization schemes have been developed. In this paper, we present a comprehensive survey of the state-of-the-art optimization approaches. These algorithms are mainly classified into two categories: continuous methods and discrete methods. The former include the Newton-Raphson method, gradient descent method, conjugate gradient method, proximal gradient method, coordinate descent method, and genetic algorithm-based methods, while the latter cover the graph cuts method, belief propagation method, tree-reweighted message passing method, linear programming method, maximum margin learning method, simulated annealing method, and iterated conditional modes method. We also discuss the minimal surface method, primal-dual method, and the multi-objective optimization method. In addition, we review several comparative studies that evaluate the performance of different minimization techniques in terms of accuracy, efficiency, or complexity. These optimization techniques are widely used in many medical applications, for example, image segmentation, registration, reconstruction, motion tracking, and compressed sensing. We thus give an overview of those applications as well. Copyright © 2015 John Wiley & Sons, Ltd.
A parallel stereo reconstruction algorithm with applications in entomology (APSRA)
NASA Astrophysics Data System (ADS)
Bhasin, Rajesh; Jang, Won Jun; Hart, John C.
2012-03-01
We propose a fast parallel algorithm for the reconstruction of 3-dimensional point clouds of insects from binocular stereo image pairs using a hierarchical approach for disparity estimation. Entomologists study various features of insects to classify them, build their distribution maps, and discover genetic links between specimens, among various other essential tasks. This information is important to the pesticide and pharmaceutical industries, among others. When considering the large collections of insects entomologists analyze, it becomes difficult to physically handle the entire collection and share the data with researchers across the world. With the method presented in our work, entomologists can create an image database for their collections and use the 3D models for studying the shape and structure of the insects, thus making the collections easier to maintain and share. Initial feedback shows that the reconstructed 3D models preserve the shape and size of the specimens. We further optimize our results to incorporate multiview stereo, which produces better overall structure of the insects. Our main contribution is applying stereoscopic vision techniques to entomology to solve the problems faced by entomologists.
Artificial Intelligence Methods Applied to Parameter Detection of Atrial Fibrillation
NASA Astrophysics Data System (ADS)
Arotaritei, D.; Rotariu, C.
2015-09-01
In this paper we present a novel method for detecting atrial fibrillation (AF) based on statistical descriptors and a hybrid neuro-fuzzy and crisp system. The inference system produces if-then-else rules that are extracted to construct a binary decision system: normal or atrial fibrillation. We use TPR (Turning Point Ratio), SE (Shannon Entropy) and RMSSD (Root Mean Square of Successive Differences), along with a new descriptor, Teager-Kaiser energy, in order to improve the accuracy of detection. The descriptors are calculated over a sliding window that produces a very large number of vectors (a massive dataset) used by the classifier. The window length is a crisp descriptor, while the remaining descriptors are interval-valued. The parameters of the hybrid system are adapted using a genetic algorithm (GA) with a single-objective fitness target: the highest values of sensitivity and specificity. The extracted rules form part of the decision system. The proposed method was tested using the Physionet MIT-BIH Atrial Fibrillation Database and the experimental results revealed a good accuracy of AF detection in terms of sensitivity and specificity (above 90%).
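A sketch of three of the statistical descriptors computed over a sliding window of RR intervals (the Teager-Kaiser energy and the fuzzy inference stage are omitted; window length, step and histogram bin count are illustrative assumptions):

```python
import numpy as np

def rmssd(rr):
    """Root mean square of successive RR-interval differences."""
    d = np.diff(rr)
    return np.sqrt(np.mean(d ** 2))

def shannon_entropy(rr, bins=16):
    """Shannon entropy of the RR-interval histogram (bin count is assumed)."""
    counts, _ = np.histogram(rr, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log2(p))

def turning_point_ratio(rr):
    """Fraction of interior samples that are local extrema."""
    x = np.asarray(rr)
    tp = ((x[1:-1] > x[:-2]) & (x[1:-1] > x[2:])) | \
         ((x[1:-1] < x[:-2]) & (x[1:-1] < x[2:]))
    return tp.sum() / (len(x) - 2)

def sliding_descriptors(rr, win=64, step=8):
    """Descriptor vectors over a sliding window, as input to a classifier."""
    return np.array([[rmssd(rr[i:i + win]),
                      shannon_entropy(rr[i:i + win]),
                      turning_point_ratio(rr[i:i + win])]
                     for i in range(0, len(rr) - win + 1, step)])
```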
Eroglu, Duygu Yilmaz; Ozmutlu, H Cenk
2014-01-01
We developed mixed integer programming (MIP) models and hybrid genetic-local search algorithms for the scheduling problem of unrelated parallel machines with job-sequence- and machine-dependent setup times and with the job splitting property. The first contribution of this paper is the introduction of novel algorithms that perform splitting and scheduling simultaneously with a variable number of subjobs. We propose a simple chromosome structure constituted by random key numbers in the hybrid genetic-local search algorithm (GAspLA). Random key numbers are used frequently in genetic algorithms, but they create additional difficulty when local search is hybridized with the genetic algorithm. We developed algorithms that adapt the results of the local search back into the genetic algorithm with a minimum number of relocation operations on the genes' random key numbers; this is the second contribution of the paper. The third contribution is three new MIP models that perform splitting and scheduling simultaneously. The fourth contribution is the implementation of GAspLAMIP, which lets us verify the optimality of GAspLA for the studied combinations. The proposed methods are tested on a set of problems taken from the literature and the results validate the effectiveness of the proposed algorithms.
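The random-key encoding can be illustrated with a minimal decoder (a Bean-style integer/fractional decoding is assumed here; the paper's job-splitting mechanism and local search are omitted):

```python
import numpy as np

def decode_random_keys(keys, n_machines):
    """Classic random-key decoding: for each job, the integer part of the key
    selects the machine and the fractional part orders the jobs assigned to
    that machine.  Job splitting, which GAspLA also handles, is not shown."""
    machines = (np.floor(keys) % n_machines).astype(int)
    schedule = {m: [] for m in range(n_machines)}
    for job in np.argsort(keys - np.floor(keys)):   # sort by fractional part
        schedule[machines[job]].append(int(job))
    return schedule

# Example: 6 jobs on 2 machines
keys = np.array([0.31, 1.84, 0.07, 1.22, 0.55, 0.93])
print(decode_random_keys(keys, n_machines=2))
```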
Foo, Brian; van der Schaar, Mihaela
2010-11-01
In this paper, we discuss distributed optimization techniques for configuring classifiers in a real-time, informationally-distributed stream mining system. Due to the large volume of streaming data, stream mining systems must often cope with overload, which can lead to poor performance and intolerable processing delay for real-time applications. Furthermore, optimizing over an entire system of classifiers is a difficult task since changing the filtering process at one classifier can impact both the feature values of data arriving at classifiers further downstream and thus, the classification performance achieved by an ensemble of classifiers, as well as the end-to-end processing delay. To address this problem, this paper makes three main contributions: 1) Based on classification and queuing theoretic models, we propose a utility metric that captures both the performance and the delay of a binary filtering classifier system. 2) We introduce a low-complexity framework for estimating the system utility by observing, estimating, and/or exchanging parameters between the inter-related classifiers deployed across the system. 3) We provide distributed algorithms to reconfigure the system, and analyze the algorithms based on their convergence properties, optimality, information exchange overhead, and rate of adaptation to non-stationary data sources. We provide results using different video classifier systems.
Rosenthal, E T; Bowles, K R; Pruss, D; van Kan, A; Vail, P J; McElroy, H; Wenstrup, R J
2015-12-01
Based on current consensus guidelines and standard practice, many genetic variants detected in clinical testing are classified as disease causing based on their predicted impact on the normal expression or function of the gene in the absence of additional data. However, our laboratory has identified a subset of such variants in hereditary cancer genes for which compelling contradictory evidence emerged after the initial evaluation following the first observation of the variant. Three representative examples of variants in BRCA1, BRCA2 and MSH2 that are predicted to disrupt splicing, prematurely truncate the protein, or remove the start codon were evaluated for pathogenicity by analyzing clinical data with multiple classification algorithms. Available clinical data for all three variants contradicts the expected pathogenic classification. These variants illustrate potential pitfalls associated with standard approaches to variant classification as well as the challenges associated with monitoring data, updating classifications, and reporting potentially contradictory interpretations to the clinicians responsible for translating test outcomes to appropriate clinical action. It is important to address these challenges now as the model for clinical testing moves toward the use of large multi-gene panels and whole exome/genome analysis, which will dramatically increase the number of genetic variants identified. © 2015 The Authors. Clinical Genetics published by John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Discrete mixture modeling to address genetic heterogeneity in time-to-event regression
Eng, Kevin H.; Hanlon, Bret M.
2014-01-01
Motivation: Time-to-event regression models are a critical tool for associating survival time outcomes with molecular data. Despite mounting evidence that genetic subgroups of the same clinical disease exist, little attention has been given to exploring how this heterogeneity affects time-to-event model building and how to accommodate it. Methods able to diagnose and model heterogeneity should be valuable additions to the biomarker discovery toolset. Results: We propose a mixture of survival functions that classifies subjects with similar relationships to a time-to-event response. This model incorporates multivariate regression and model selection and can be fit with an expectation maximization algorithm, which we call Cox-assisted clustering. We illustrate a likely manifestation of genetic heterogeneity and demonstrate how it may affect survival models with little warning. An application to gene expression in ovarian cancer DNA repair pathways illustrates how the model may be used to learn new genetic subsets for risk stratification. We explore the implications of this model for censored observations and the effect on genomic predictors and diagnostic analysis. Availability and implementation: R implementation of CAC using standard packages is available at https://gist.github.com/programeng/8620b85146b14b6edf8f Data used in the analysis are publicly available. Contact: kevin.eng@roswellpark.org Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24532723
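A compact way to write the latent-subgroup survival mixture that such a model fits (our notation, not taken from the paper; S_{0k} denotes a component-specific baseline survival function, x the covariates, and delta_i the event indicator):

```latex
S(t \mid x) = \sum_{k=1}^{K} \pi_k \, S_k(t \mid x),
\qquad
S_k(t \mid x) = S_{0k}(t)^{\exp\!\left(x^{\top}\beta_k\right)} .
```

In the E-step of an EM fit, the responsibility of component k for subject i with observed time t_i would be

```latex
\gamma_{ik}
= \frac{\pi_k \, f_k(t_i \mid x_i)^{\delta_i} \, S_k(t_i \mid x_i)^{1-\delta_i}}
       {\sum_{j=1}^{K} \pi_j \, f_j(t_i \mid x_i)^{\delta_i} \, S_j(t_i \mid x_i)^{1-\delta_i}} ,
```

followed by weighted Cox-type regressions in the M-step.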
Truss Optimization for a Manned Nuclear Electric Space Vehicle using Genetic Algorithms
NASA Technical Reports Server (NTRS)
Benford, Andrew; Tinker, Michael L.
2004-01-01
The purpose of this paper is to utilize the genetic algorithm (GA) optimization method for structural design of a nuclear propulsion vehicle. Genetic algorithms provide a guided, random search technique that mirrors biological adaptation. To verify the GA capabilities, other traditional optimization methods were used to generate results for comparison to the GA results, first for simple two-dimensional structures, and then for full-scale three-dimensional truss designs.
Superscattering of light optimized by a genetic algorithm
NASA Astrophysics Data System (ADS)
Mirzaei, Ali; Miroshnichenko, Andrey E.; Shadrivov, Ilya V.; Kivshar, Yuri S.
2014-07-01
We analyse scattering of light from multi-layer plasmonic nanowires and employ a genetic algorithm for optimizing the scattering cross section. We apply the mode-expansion method using experimental data for material parameters to demonstrate that our genetic algorithm allows designing realistic core-shell nanostructures with the superscattering effect achieved at any desired wavelength. This approach can be employed for optimizing both superscattering and cloaking at different wavelengths in the visible spectral range.
A High Fuel Consumption Efficiency Management Scheme for PHEVs Using an Adaptive Genetic Algorithm
Lee, Wah Ching; Tsang, Kim Fung; Chi, Hao Ran; Hung, Faan Hei; Wu, Chung Kit; Chui, Kwok Tai; Lau, Wing Hong; Leung, Yat Wah
2015-01-01
A high fuel efficiency management scheme for plug-in hybrid electric vehicles (PHEVs) has been developed. In order to achieve fuel consumption reduction, an adaptive genetic algorithm scheme has been designed to adaptively manage the energy resource usage. The objective function of the genetic algorithm is implemented by designing a fuzzy logic controller which closely monitors and resembles the driving conditions and environment of PHEVs, thus trading off between petrol versus electricity for optimal driving efficiency. Comparison between calculated results and publicized data shows that the achieved efficiency of the fuzzified genetic algorithm is better by 10% than existing schemes. The developed scheme, if fully adopted, would help reduce over 600 tons of CO2 emissions worldwide every day. PMID:25587974
Neural-network-assisted genetic algorithm applied to silicon clusters
NASA Astrophysics Data System (ADS)
Marim, L. R.; Lemes, M. R.; dal Pino, A.
2003-03-01
Recently, a new optimization procedure that combines the power of artificial neural networks with the versatility of the genetic algorithm (GA) was introduced. This method, called the neural-network-assisted genetic algorithm (NAGA), uses a neural network to restrict the search space and is expected to speed up the solution of global optimization problems if some previous information is available. In this paper, we have tested NAGA to determine the ground-state geometry of Si_n (10 ⩽ n ⩽ 15) according to a tight-binding total-energy method. Our results indicate that NAGA was able to find the desired global minimum of the potential energy for all the test cases and it was at least ten times faster than a pure genetic algorithm.
Automated detection of tuberculosis on sputum smeared slides using stepwise classification
NASA Astrophysics Data System (ADS)
Divekar, Ajay; Pangilinan, Corina; Coetzee, Gerrit; Sondh, Tarlochan; Lure, Fleming Y. M.; Kennedy, Sean
2012-03-01
Routine visual slide screening for the identification of tuberculosis (TB) bacilli in stained sputum slides under a microscope is a tedious, labor-intensive task and can miss up to 50% of TB. Based on the Shannon cofactor expansion of a Boolean function for classification, a stepwise classification (SWC) algorithm is developed to remove different types of false positives, one type at a time, and to increase the detection of TB bacilli at different concentrations. Both bacilli and non-bacilli objects are first analyzed and classified into several different categories, including scanty positive, high-concentration positive, and several non-bacilli categories: small bright objects, beaded objects, dim elongated objects, etc. The morphological and contrast features are extracted based on a priori clinical knowledge. The SWC is composed of several individual classifiers. The individual classifier for increasing the bacilli counts utilizes an adaptive algorithm based on a microbiologist's statistical heuristic decision process. The individual classifier for reducing false positives is developed through minimization of a binary decision tree that classifies different types of true and false positives based on feature vectors. Finally, the detection algorithm was tested on 102 independent confirmed negative and 74 positive cases. A multi-class task analysis shows accordance rates for negative, scanty, and high-concentration cases of 88.24%, 56.00%, and 97.96%, respectively. A binary-class task analysis using a receiver operating characteristic method with the area under the curve (Az) is also utilized to analyze the performance of this detection algorithm, showing superior detection performance on the high-concentration cases (Az=0.913) and on cases mixing high-concentration and scanty cases (Az=0.878).
Moghram, Basem Ameen; Nabil, Emad; Badr, Amr
2018-01-01
T-cell epitope structure identification is a significant and challenging immunoinformatic problem within epitope-based vaccine design. Epitopes, or antigenic peptides, are sets of amino acids that bind with Major Histocompatibility Complex (MHC) molecules; the aim of this binding is for the peptides to be presented by Antigen Presenting Cells and inspected by T-cells. MHC-molecule-binding epitopes are responsible for triggering the immune response to antigens. The epitope's three-dimensional (3D) molecular structure (i.e., tertiary structure) reflects its proper function. Therefore, the identification of the structure of MHC class-II epitopes is a significant step towards epitope-based vaccine design and understanding of the immune system. In this paper, we propose a new technique using a Genetic Algorithm for Predicting the Epitope Structure (GAPES) to predict the structure of MHC class-II epitopes based on their sequence. The proposed elitist-based genetic algorithm for predicting the epitope's tertiary structure is based on the Ab-Initio Empirical Conformational Energy Program for Peptides (ECEPP) force field model. The developed secondary structure prediction technique relies on the Ramachandran plot. We used two alignment algorithms: ROSS alignment and TM-Score alignment. We applied four different alignment approaches to calculate the similarity scores of the dataset under test. We utilized the support vector machine (SVM) classifier to evaluate the prediction performance. The prediction accuracy and the Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) were calculated as measures of performance. The calculations were performed on twelve similarity-reduced datasets of the Immune Epitope Database (IEDB) and a large dataset of peptide-binding affinities to HLA-DRB1*0101. The results showed that GAPES was reliable and very accurate. We achieved an average prediction accuracy of 93.50% and an average AUC of 0.974 on the IEDB datasets. Also, we achieved an accuracy of 95.125% and an AUC of 0.987 on the HLA-DRB1*0101 allele of the Wang benchmark dataset. The results indicate that the proposed prediction technique "GAPES" is a promising technique that will help researchers and scientists to predict protein structure and will assist them in the intelligent design of new epitope-based vaccines. Copyright © 2017 Elsevier B.V. All rights reserved.
A low computation cost method for seizure prediction.
Zhang, Yanli; Zhou, Weidong; Yuan, Qi; Wu, Qi
2014-10-01
The dynamic changes of electroencephalograph (EEG) signals in the period prior to epileptic seizures play a major role in seizure prediction. This paper proposes a low-computation seizure prediction algorithm that combines a fractal dimension with a machine learning algorithm. The presented seizure prediction algorithm extracts the Higuchi fractal dimension (HFD) of EEG signals as features to classify the patient's preictal or interictal state with Bayesian linear discriminant analysis (BLDA) as a classifier. The outputs of BLDA are smoothed by a Kalman filter to reduce possible sporadic and isolated false alarms, and then the final prediction results are produced using a thresholding procedure. The algorithm was evaluated on the intracranial EEG recordings of 21 patients in the Freiburg EEG database. For seizure occurrence periods of 30 min and 50 min, our algorithm obtained average sensitivities of 86.95% and 89.33%, an average false prediction rate of 0.20/h, and average prediction times of 24.47 min and 39.39 min, respectively. The results confirm that changes of HFD can serve as a precursor of ictal activities and be used for distinguishing between interictal and preictal epochs. Both the HFD and the BLDA classifier have a low computational complexity. All of this makes the proposed algorithm suitable for real-time seizure prediction. Copyright © 2014 Elsevier B.V. All rights reserved.
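Higuchi's fractal dimension, the feature at the core of the method, can be sketched as follows (k_max is an illustrative choice; this is a standard formulation of the estimator, not necessarily the authors' exact implementation):

```python
import numpy as np

def higuchi_fd(x, k_max=8):
    """Higuchi fractal dimension of a 1-D signal: average curve lengths L(k)
    are computed for delays k = 1..k_max and the FD is the slope of
    log L(k) versus log(1/k)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    lk = []
    for k in range(1, k_max + 1):
        lengths = []
        for m in range(k):
            idx = np.arange(m, N, k)
            if len(idx) < 2:
                continue
            d = np.abs(np.diff(x[idx])).sum()
            # normalisation factor from Higuchi's definition
            lengths.append(d * (N - 1) / ((len(idx) - 1) * k * k))
        lk.append(np.mean(lengths))
    k = np.arange(1, k_max + 1)
    slope, _ = np.polyfit(np.log(1.0 / k), np.log(lk), 1)
    return slope
```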
Multiple Query Evaluation Based on an Enhanced Genetic Algorithm.
ERIC Educational Resources Information Center
Tamine, Lynda; Chrisment, Claude; Boughanem, Mohand
2003-01-01
Explains the use of genetic algorithms to combine results from multiple query evaluations to improve relevance in information retrieval. Discusses niching techniques, relevance feedback techniques, and evolution heuristics, and compares retrieval results obtained by both genetic multiple query evaluation and classical single query evaluation…
Characterization of uncertainty and sensitivity of model parameters is an essential and often overlooked facet of hydrological modeling. This paper introduces an algorithm called MOESHA that combines input parameter sensitivity analyses with a genetic algorithm calibration routin...
Surkis, Alisa; Hogle, Janice A; DiazGranados, Deborah; Hunt, Joe D; Mazmanian, Paul E; Connors, Emily; Westaby, Kate; Whipple, Elizabeth C; Adamus, Trisha; Mueller, Meridith; Aphinyanaphongs, Yindalon
2016-08-05
Translational research is a key area of focus of the National Institutes of Health (NIH), as demonstrated by the substantial investment in the Clinical and Translational Science Award (CTSA) program. The goal of the CTSA program is to accelerate the translation of discoveries from the bench to the bedside and into communities. Different classification systems have been used to capture the spectrum of basic to clinical to population health research, with substantial differences in the number of categories and their definitions. Evaluation of the effectiveness of the CTSA program and of translational research in general is hampered by the lack of rigor in these definitions and their application. This study adds rigor to the classification process by creating a checklist to evaluate publications across the translational spectrum and operationalizes these classifications by building machine learning-based text classifiers to categorize these publications. Based on collaboratively developed definitions, we created a detailed checklist for categories along the translational spectrum from T0 to T4. We applied the checklist to CTSA-linked publications to construct a set of coded publications for use in training machine learning-based text classifiers to classify publications within these categories. The training sets combined T1/T2 and T3/T4 categories due to low frequency of these publication types compared to the frequency of T0 publications. We then compared classifier performance across different algorithms and feature sets and applied the classifiers to all publications in PubMed indexed to CTSA grants. To validate the algorithm, we manually classified the articles with the top 100 scores from each classifier. The definitions and checklist facilitated classification and resulted in good inter-rater reliability for coding publications for the training set. Very good performance was achieved for the classifiers as represented by the area under the receiver operating curves (AUC), with an AUC of 0.94 for the T0 classifier, 0.84 for T1/T2, and 0.92 for T3/T4. The combination of definitions agreed upon by five CTSA hubs, a checklist that facilitates more uniform definition interpretation, and algorithms that perform well in classifying publications along the translational spectrum provide a basis for establishing and applying uniform definitions of translational research categories. The classification algorithms allow publication analyses that would not be feasible with manual classification, such as assessing the distribution and trends of publications across the CTSA network and comparing the categories of publications and their citations to assess knowledge transfer across the translational research spectrum.
Minimalist ensemble algorithms for genome-wide protein localization prediction.
Lin, Jhih-Rong; Mondal, Ananda Mohan; Liu, Rong; Hu, Jianjun
2012-07-03
Computational prediction of protein subcellular localization can greatly help to elucidate its functions. Despite the existence of dozens of protein localization prediction algorithms, the prediction accuracy and coverage are still low. Several ensemble algorithms have been proposed to improve the prediction performance, which usually include as many as 10 or more individual localization algorithms. However, their performance is still limited by the running complexity and redundancy among individual prediction algorithms. This paper proposed a novel method for rational design of minimalist ensemble algorithms for practical genome-wide protein subcellular localization prediction. The algorithm is based on combining a feature selection based filter and a logistic regression classifier. Using a novel concept of contribution scores, we analyzed issues of algorithm redundancy, consensus mistakes, and algorithm complementarity in designing ensemble algorithms. We applied the proposed minimalist logistic regression (LR) ensemble algorithm to two genome-wide datasets of Yeast and Human and compared its performance with current ensemble algorithms. Experimental results showed that the minimalist ensemble algorithm can achieve high prediction accuracy with only 1/3 to 1/2 of individual predictors of current ensemble algorithms, which greatly reduces computational complexity and running time. It was found that the high performance ensemble algorithms are usually composed of the predictors that together cover most of available features. Compared to the best individual predictor, our ensemble algorithm improved the prediction accuracy from AUC score of 0.558 to 0.707 for the Yeast dataset and from 0.628 to 0.646 for the Human dataset. Compared with popular weighted voting based ensemble algorithms, our classifier-based ensemble algorithms achieved much better performance without suffering from inclusion of too many individual predictors. We proposed a method for rational design of minimalist ensemble algorithms using feature selection and classifiers. The proposed minimalist ensemble algorithm based on logistic regression can achieve equal or better prediction performance while using only half or one-third of individual predictors compared to other ensemble algorithms. The results also suggested that meta-predictors that take advantage of a variety of features by combining individual predictors tend to achieve the best performance. The LR ensemble server and related benchmark datasets are available at http://mleg.cse.sc.edu/LRensemble/cgi-bin/predict.cgi.
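A minimal sketch of the classifier-based combination step, assuming out-of-fold probability outputs of a handful of base predictors are stacked into a logistic-regression combiner (the feature-selection filter and contribution-score analysis are omitted; binary labels are assumed):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

def fit_lr_ensemble(base_models, X, y):
    """Stack out-of-fold probability outputs of a few base predictors and
    learn a logistic-regression combiner, in the spirit of a minimalist
    LR ensemble; base_models is a list of sklearn estimators."""
    meta_features = np.column_stack([
        cross_val_predict(m, X, y, cv=5, method="predict_proba")[:, 1]
        for m in base_models
    ])
    combiner = LogisticRegression(max_iter=1000).fit(meta_features, y)
    fitted_bases = [m.fit(X, y) for m in base_models]
    return fitted_bases, combiner

def predict_lr_ensemble(fitted_bases, combiner, X_new):
    meta = np.column_stack([m.predict_proba(X_new)[:, 1] for m in fitted_bases])
    return combiner.predict(meta)
```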
An interactive system for computer-aided diagnosis of breast masses.
Wang, Xingwei; Li, Lihua; Liu, Wei; Xu, Weidong; Lederman, Dror; Zheng, Bin
2012-10-01
Although mammography is the only clinically accepted imaging modality for screening the general population to detect breast cancer, interpreting mammograms is difficult, with relatively low sensitivity and specificity. To provide radiologists with "a visual aid" in interpreting mammograms, we developed and tested an interactive system for computer-aided detection and diagnosis (CAD) of mass-like cancers. Using this system, an observer can view CAD-cued mass regions depicted on one image and then query any suspicious regions (either cued or not cued by CAD). The CAD scheme automatically segments the suspicious region or accepts a manually defined region and computes a set of image features. Using a content-based image retrieval (CBIR) algorithm, CAD searches for a set of reference images depicting "abnormalities" similar to the queried region. Based on the image retrieval results and a decision algorithm, a classification score is assigned to the queried region. In this study, a reference database with 1,800 malignant mass regions and 1,800 benign and CAD-generated false-positive regions was used. A modified CBIR algorithm with a new function for stretching the attributes in the multi-dimensional space and the decision scheme were optimized using a genetic algorithm. Using a leave-one-out testing method to classify suspicious mass regions, we compared the classification performance using two CBIR algorithms with either equally weighted or optimally stretched attributes. Using the modified CBIR algorithm, the area under the receiver operating characteristic curve was significantly increased from 0.865 ± 0.006 to 0.897 ± 0.005 (p < 0.001). This study demonstrated the feasibility of developing an interactive CAD system with a large reference database and achieving improved performance.
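The retrieval-and-scoring step can be sketched as a weighted nearest-neighbour search, where the per-feature weights stand in for the GA-optimised attribute-stretching factors (k, the weighting scheme and label coding are illustrative assumptions):

```python
import numpy as np

def cbir_score(query, ref_features, ref_labels, weights, k=15):
    """Retrieve the k reference regions closest to the query under a weighted
    Euclidean distance (weights play the role of attribute-stretching factors)
    and score the query by the fraction of malignant neighbours."""
    d = np.sqrt(((weights * (ref_features - query)) ** 2).sum(axis=1))
    nearest = np.argsort(d)[:k]
    return ref_labels[nearest].mean()     # labels: 1 = malignant, 0 = benign
```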
A hybrid cost-sensitive ensemble for imbalanced breast thermogram classification.
Krawczyk, Bartosz; Schaefer, Gerald; Woźniak, Michał
2015-11-01
Early recognition of breast cancer, the most commonly diagnosed form of cancer in women, is of crucial importance, given that it leads to significantly improved chances of survival. Medical thermography, which uses an infrared camera for thermal imaging, has been demonstrated as a particularly useful technique for early diagnosis, because it detects smaller tumors than the standard modality of mammography. In this paper, we analyse breast thermograms by extracting features describing bilateral symmetries between the two breast areas, and present a classification system for decision making. Clearly, the costs associated with missing a cancer case are much higher than those for mislabelling a benign case. At the same time, datasets contain significantly fewer malignant cases than benign ones. Standard classification approaches fail to consider either of these aspects. In this paper, we introduce a hybrid cost-sensitive classifier ensemble to address this challenging problem. Our approach entails a pool of cost-sensitive decision trees which assign a higher misclassification cost to the malignant class, thereby boosting its recognition rate. A genetic algorithm is employed for simultaneous feature selection and classifier fusion. As an optimisation criterion, we use a combination of misclassification cost and diversity to achieve both a high sensitivity and a heterogeneous ensemble. Furthermore, we prune our ensemble by discarding classifiers that contribute minimally to the decision making. For a challenging dataset of about 150 thermograms, our approach achieves an excellent sensitivity of 83.10%, while maintaining a high specificity of 89.44%. This not only signifies improved recognition of malignant cases, it also statistically outperforms other state-of-the-art algorithms designed for imbalanced classification, and hence provides an effective approach for analysing breast thermograms. Our proposed hybrid cost-sensitive ensemble can facilitate a highly accurate early diagnostic of breast cancer based on thermogram features. It overcomes the difficulties posed by the imbalanced distribution of patients in the two analysed groups. Copyright © 2015 Elsevier B.V. All rights reserved.
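A minimal sketch of the cost-sensitive pool with majority voting (the GA-driven feature selection, classifier fusion and ensemble pruning are omitted; the 5:1 misclassification-cost ratio and tree depth are illustrative assumptions):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import resample

def build_cost_sensitive_pool(X, y, n_trees=20, malignant_cost=5.0, seed=0):
    """Pool of decision trees that penalise misclassifying the malignant
    class (label 1) more heavily, each trained on a bootstrap sample."""
    rng = np.random.RandomState(seed)
    pool = []
    for _ in range(n_trees):
        Xb, yb = resample(X, y, random_state=rng)
        tree = DecisionTreeClassifier(class_weight={0: 1.0, 1: malignant_cost},
                                      max_depth=5, random_state=rng)
        pool.append(tree.fit(Xb, yb))
    return pool

def vote(pool, X_new):
    """Simple majority vote over the pool's predictions."""
    votes = np.mean([t.predict(X_new) for t in pool], axis=0)
    return (votes >= 0.5).astype(int)
```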
Localization Algorithms of Underwater Wireless Sensor Networks: A Survey
Han, Guangjie; Jiang, Jinfang; Shu, Lei; Xu, Yongjun; Wang, Feng
2012-01-01
In Underwater Wireless Sensor Networks (UWSNs), localization is one of most important technologies since it plays a critical role in many applications. Motivated by widespread adoption of localization, in this paper, we present a comprehensive survey of localization algorithms. First, we classify localization algorithms into three categories based on sensor nodes’ mobility: stationary localization algorithms, mobile localization algorithms and hybrid localization algorithms. Moreover, we compare the localization algorithms in detail and analyze future research directions of localization algorithms in UWSNs. PMID:22438752
A genetic algorithm for solving supply chain network design model
NASA Astrophysics Data System (ADS)
Firoozi, Z.; Ismail, N.; Ariafar, S. H.; Tang, S. H.; Ariffin, M. K. M. A.
2013-09-01
Network design is by nature costly and optimization models play significant role in reducing the unnecessary cost components of a distribution network. This study proposes a genetic algorithm to solve a distribution network design model. The structure of the chromosome in the proposed algorithm is defined in a novel way that in addition to producing feasible solutions, it also reduces the computational complexity of the algorithm. Computational results are presented to show the algorithm performance.
NASA Astrophysics Data System (ADS)
Muslim, M. A.; Herowati, A. J.; Sugiharti, E.; Prasetiyo, B.
2018-03-01
Data mining is a technique for extracting valuable information buried or hidden in large data collections in order to find interesting, previously unknown patterns. Data mining has been applied in the healthcare industry. One technique used in data mining is classification. Decision trees belong to the classification techniques of data mining, and one algorithm for building decision trees is the C4.5 algorithm. A classifier is designed by applying pessimistic pruning to the C4.5 algorithm for diagnosing chronic kidney disease. Pessimistic pruning is used to identify and remove branches that are not needed; this is done to avoid overfitting the decision tree generated by the C4.5 algorithm. In this paper, the results obtained using this classifier are presented and discussed. Using pessimistic pruning increases the accuracy of the C4.5 algorithm by 1.5%, from 95% to 96.5%, in diagnosing chronic kidney disease.
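One standard formulation of the pessimistic (upper-confidence-bound) error estimate used in C4.5-style pruning is sketched below; the z value of 0.69 corresponds to the default 25% confidence level, and this is not necessarily the exact variant used by the authors:

```python
from math import sqrt

def pessimistic_error(errors, n, z=0.69):
    """C4.5-style pessimistic error estimate for a node that misclassifies
    `errors` of its `n` training cases (upper confidence bound on the
    observed error rate f = errors / n)."""
    if n == 0:
        return 0.0
    f = errors / n
    return (f + z * z / (2 * n)
            + z * sqrt(f / n - f * f / n + z * z / (4 * n * n))) / (1 + z * z / n)

def should_prune(leaf_errors, leaf_n, subtree_errors, subtree_ns):
    """Replace a subtree by a leaf when the leaf's pessimistic error is no
    worse than the summed pessimistic error of the subtree's leaves."""
    subtree_est = sum(pessimistic_error(e, n) * n
                      for e, n in zip(subtree_errors, subtree_ns))
    leaf_est = pessimistic_error(leaf_errors, leaf_n) * leaf_n
    return leaf_est <= subtree_est
```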
Automatic tissue characterization from ultrasound imagery
NASA Astrophysics Data System (ADS)
Kadah, Yasser M.; Farag, Aly A.; Youssef, Abou-Bakr M.; Badawi, Ahmed M.
1993-08-01
In this work, feature extraction algorithms are proposed to extract the tissue characterization parameters from liver images. Then the resulting parameter set is further processed to obtain the minimum number of parameters representing the most discriminating pattern space for classification. This preprocessing step was applied to over 120 pathology-investigated cases to obtain the learning data for designing the classifier. The extracted features are divided into independent training and test sets and are used to construct both statistical and neural classifiers. The optimal criteria for these classifiers are set to have minimum error, ease of implementation and learning, and the flexibility for future modifications. Various algorithms for implementing various classification techniques are presented and tested on the data. The best performance was obtained using a single layer tensor model functional link network. Also, the voting k-nearest neighbor classifier provided comparably good diagnostic rates.
NASA Astrophysics Data System (ADS)
Yusupov, L. R.; Klochkova, K. V.; Simonova, L. A.
2017-09-01
The paper presents a methodology for modeling the chemical composition of a composite material via a genetic algorithm in order to optimize the manufacturing process of products. It also presents algorithms based on an intelligent system for vermicular graphite iron design.
A multiobjective optimization algorithm is applied to a groundwater quality management problem involving remediation by pump-and-treat (PAT). The multiobjective optimization framework uses the niched Pareto genetic algorithm (NPGA) and is applied to simultaneously minimize the...
Multi-Objective Constraint Satisfaction for Mobile Robot Area Defense
2010-03-01
...to alert the other agents and ensure trust in the system. This research presents an algorithm that tasks robots to meet the two specific goals of... The problem is defined as a constraint satisfaction problem solved using the Non-dominated Sorting Genetic Algorithm II (NSGA-II). Both goals of...
Application of genetic algorithm in modeling on-wafer inductors for up to 110 GHz
NASA Astrophysics Data System (ADS)
Liu, Nianhong; Fu, Jun; Liu, Hui; Cui, Wenpu; Liu, Zhihong; Liu, Linlin; Zhou, Wei; Wang, Quan; Guo, Ao
2018-05-01
In this work, the genetic algorithm has been introduced into parameter extraction for on-wafer inductors for up to 110 GHz millimeter-wave operation, and nine independent parameters of the equivalent circuit model are optimized together. With the genetic algorithm, the model with the optimized parameters gives a better fitting accuracy than the preliminary parameters without optimization. In particular, the fitting accuracy of the Q value achieves a significant improvement after the optimization.
Combinatorial Multiobjective Optimization Using Genetic Algorithms
NASA Technical Reports Server (NTRS)
Crossley, William A.; Martin, Eric T.
2002-01-01
The research proposed in this document investigated multiobjective optimization approaches based upon the Genetic Algorithm (GA). Several versions of the GA have been adopted for multiobjective design, but, prior to this research, there had not been significant comparisons of the most popular strategies. The research effort first generalized the two-branch tournament genetic algorithm into an N-branch genetic algorithm; the N-branch GA was then compared with a version of the popular Multi-Objective Genetic Algorithm (MOGA). Because the genetic algorithm is well suited to combinatorial (mixed discrete/continuous) optimization problems, the GA can be used in the conceptual phase of design to combine selection (discrete variable) and sizing (continuous variable) tasks. Using a multiobjective formulation for the design of a 50-passenger aircraft to meet the competing objectives of minimizing takeoff gross weight and minimizing trip time, the GA generated a range of tradeoff designs that illustrate which aircraft features change from a low-weight, slow trip-time aircraft design to a heavy-weight, short trip-time aircraft design. Given the objective formulation and analysis methods used, the results of this study identify where turboprop-powered aircraft and turbofan-powered aircraft become more desirable for the 50-seat passenger application. This aircraft design application also begins to suggest how a combinatorial multiobjective optimization technique could be used to assist in the design of morphing aircraft.
Franasiak, Jason M; Olcha, Meir; Shastri, Shefali; Molinaro, Thomas A; Congdon, Haley; Treff, Nathan R; Scott, Richard T
2016-10-01
Is embryonic aneuploidy, as determined by comprehensive chromosome screening (CCS), related to genetic ancestry, as determined by ancestry informative markers (AIMs)? In this study, when determining continental ancestry utilizing AIMs, genetic ancestry does not have an impact on embryonic aneuploidy. Aneuploidy is one of the best-characterized barriers to ART success and little information exists regarding ethnicity and whole chromosome aneuploidy in IVF. Classifying continental ancestry utilizing genetic profiles from a selected group of single nucleotide polymorphisms, termed AIMs, can determine ancestral origin with more accuracy than self-reported data. This is a retrospective cohort study of patients undergoing their first cycle of IVF with CCS at a single center from 2008 to 2014. There were 2328 patients identified whom had undergone IVF/CCS and AIM genotyping. All patients underwent IVF/ICSI and CCS after trophectoderm biopsy. Patients' serum was genotyped using 32 custom AIMs to identify continental origin. Admixture proportions were determined using Bayesian clustering algorithms. Patients were assigned to the population (European, African, East Asian or Central/South Asian) corresponding to their greatest admixture proportion. The mean number of embryos tested was 5.3 (range = 1-40) and the mode was 1. Patients' ethnic classifications revealed European (n = 1698), African (n = 103), East Asian (n = 206) or Central/South Asian (n = 321). When controlling for age and BMI, aneuploidy rate did not differ by genetic ancestry (P = 0.28). The study type (retrospective) and the ability to classify patients by continental rather than sub-continental origin as well as the predominantly European patient mix may impact generalizability. Post hoc power calculation revealed power to detect a 16.8% difference in embryonic aneuploidy between the two smallest sample size groups. These data do not support differences in embryonic aneuploidy among various genetic ancestry groups in patients undergoing IVF/CCS. We used a novel approach of determining continental origin using a validated panel of AIMs as opposed to patient self-reported ethnicities. It does not appear that specific recommendations for aneuploidy screening should be made based upon continental heritage. None. © The Author 2016. Published by Oxford University Press on behalf of the European Society of Human Reproduction and Embryology. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Smart, Otis; Burrell, Lauren
2014-01-01
Pattern classification for intracranial electroencephalogram (iEEG) and functional magnetic resonance imaging (fMRI) signals has furthered epilepsy research toward understanding the origin of epileptic seizures and localizing dysfunctional brain tissue for treatment. Prior research has demonstrated that implicitly selecting features with a genetic programming (GP) algorithm more effectively determined the proper features to discern biomarker and non-biomarker interictal iEEG and fMRI activity than conventional feature selection approaches. However, for each of the iEEG and fMRI modalities, it is still uncertain whether the stochastic properties of indirect feature selection with a GP yield (a) consistent results within a patient data set and (b) features that are specific or universal across multiple patient data sets. We examined the reproducibility of implicitly selecting features to classify interictal activity using a GP algorithm by performing several selection trials and subsequent frequent itemset mining (FIM) for separate iEEG and fMRI epilepsy patient data. We observed within-subject consistency and across-subject variability with some small similarity for selected features, indicating a clear need for patient-specific features and a possible need for patient-specific feature selection and/or classification. For the fMRI, using nearest-neighbor classification and 30 GP generations, we obtained over 60% median sensitivity and over 60% median selectivity. For the iEEG, using nearest-neighbor classification and 30 GP generations, we obtained over 65% median sensitivity and over 65% median selectivity except for one patient. PMID:25580059
Wang, Qianqian; Zhao, Jing; Gong, Yong; Hao, Qun; Peng, Zhong
2017-11-20
A hybrid artificial bee colony (ABC) algorithm inspired by the best-so-far solution and bacterial chemotaxis was introduced to optimize the parameters of the five-parameter bidirectional reflectance distribution function (BRDF) model. To verify the performance of the hybrid ABC algorithm, we measured BRDF of three kinds of samples and simulated the undetermined parameters of the five-parameter BRDF model using the hybrid ABC algorithm and the genetic algorithm, respectively. The experimental results demonstrate that the hybrid ABC algorithm outperforms the genetic algorithm in convergence speed, accuracy, and time efficiency under the same conditions.
NASA Astrophysics Data System (ADS)
Bassa, Zaakirah; Bob, Urmilla; Szantoi, Zoltan; Ismail, Riyad
2016-01-01
In recent years, the popularity of tree-based ensemble methods for land cover classification has increased significantly. Using WorldView-2 image data, we evaluate the potential of the oblique random forest algorithm (oRF) to classify a highly heterogeneous protected area. In contrast to the random forest (RF) algorithm, the oRF algorithm builds multivariate trees by learning the optimal split using a supervised model. The oRF binary algorithm is adapted to a multiclass land cover and land use application using both the "one-against-one" and "one-against-all" combination approaches. Results show that the oRF algorithms are capable of achieving high classification accuracies (>80%). However, there was no statistical difference in classification accuracies obtained by the oRF algorithms and the more popular RF algorithm. For all the algorithms, user accuracies (UAs) and producer accuracies (PAs) >80% were recorded for most of the classes. Both the RF and oRF algorithms poorly classified the indigenous forest class as indicated by the low UAs and PAs. Finally, the results from this study advocate and support the utility of the oRF algorithm for land cover and land use mapping of protected areas using WorldView-2 image data.
Valavanis, Ioannis K; Mougiakakou, Stavroula G; Grimaldi, Keith A; Nikita, Konstantina S
2010-09-08
Obesity is a multifactorial trait, which comprises an independent risk factor for cardiovascular disease (CVD). The aim of the current work is to study the complex etiology underlying obesity and identify genetic variations and/or factors related to nutrition that contribute to its variability. To this end, a set of more than 2300 white subjects who participated in a nutrigenetics study was used. For each subject a total of 63 factors describing genetic variants related to CVD (24 in total), gender, and nutrition (38 in total), e.g. average daily intake in calories and cholesterol, were measured. Each subject was categorized according to body mass index (BMI) as normal (BMI ≤ 25) or overweight (BMI > 25). Two artificial neural network (ANN) based methods were designed and used for the analysis of the available data. These corresponded to i) a multi-layer feed-forward ANN combined with a parameter decreasing method (PDM-ANN), and ii) a multi-layer feed-forward ANN trained by a hybrid method (GA-ANN) which combines genetic algorithms and the popular back-propagation training algorithm. PDM-ANN and GA-ANN were comparatively assessed in terms of their ability to identify the most important factors among the initial 63 variables describing genetic variations, nutrition and gender, able to classify a subject into one of the BMI-related classes: normal and overweight. The methods were designed and evaluated using appropriate training and testing sets provided by 3-fold Cross Validation (3-CV) resampling. Classification accuracy, sensitivity, specificity and area under the receiver operating characteristics curve were utilized to evaluate the resulting predictive ANN models. The most parsimonious set of factors was obtained by the GA-ANN method and included gender, six genetic variations and 18 nutrition-related variables. The corresponding predictive model was characterized by a mean accuracy of 61.46% in the 3-CV testing sets. The ANN-based methods revealed factors that interactively contribute to the obesity trait and provided predictive models with a promising generalization ability. In general, the results showed that ANNs and their hybrids can provide useful tools for the study of complex traits in the context of nutrigenetics.
[Application of genetic algorithm in blending technology for extractions of Cortex Fraxini].
Yang, Ming; Zhou, Yinmin; Chen, Jialei; Yu, Minying; Shi, Xiufeng; Gu, Xijun
2009-10-01
To explore the feasibility of a genetic algorithm (GA) for multiple-objective blending of Cortex Fraxini extractions. Taking as the optimization objective a combination of fingerprint similarity and the root-mean-square error of multiple key constituents, a new multiple-objective optimization model for 10 batches of Cortex Fraxini extractions was built, and the blending coefficients were obtained by the genetic algorithm. The quality of the 10 batches of Cortex Fraxini extractions after blending was evaluated with fingerprint similarity and root-mean-square error as indexes. The quality of the 10 batches after blending was clearly improved: compared with the fingerprint of the control sample, the similarity increased while the degree of variation decreased, and the relative deviation of the key constituents was less than 10%. It is shown that the genetic algorithm works well for multiple-objective blending of Cortex Fraxini extractions. This method can serve as a reference for controlling the quality of Cortex Fraxini extractions, and genetic-algorithm-based blending technology is advisable for extractions of Chinese medicines.
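A sketch of a combined blending objective of the kind described, assuming correlation is used as the fingerprint similarity measure and a weighting factor lambda trades it off against the key-constituent RMSE (both are illustrative assumptions; the GA that searches over the coefficients is not shown):

```python
import numpy as np

def blend_objective(coeffs, fingerprints, key_constituents,
                    ref_fingerprint, ref_key, lam=1.0):
    """Combined objective for blending: correlation similarity of the blended
    fingerprint to the control fingerprint minus lambda times the RMSE of the
    blended key-constituent contents.  Rows of `fingerprints` and
    `key_constituents` correspond to batches."""
    w = np.asarray(coeffs) / np.sum(coeffs)   # blending coefficients sum to 1
    blended_fp = w @ fingerprints             # blended chromatographic fingerprint
    blended_key = w @ key_constituents        # blended key-constituent contents
    similarity = np.corrcoef(blended_fp, ref_fingerprint)[0, 1]
    rmse = np.sqrt(np.mean((blended_key - ref_key) ** 2))
    return similarity - lam * rmse            # to be maximised by the GA
```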
Using Gaussian mixture models to detect and classify dolphin whistles and pulses.
Peso Parada, Pablo; Cardenal-López, Antonio
2014-06-01
In recent years, a number of automatic detection systems for free-ranging cetaceans have been proposed that aim to detect not just surfaced, but also submerged, individuals. These systems are typically based on pattern-recognition techniques applied to underwater acoustic recordings. Using a Gaussian mixture model, a classification system was developed that detects sounds in recordings and classifies them as one of four types: background noise, whistles, pulses, and combined whistles and pulses. The classifier was tested using a database of underwater recordings made off the Spanish coast during 2011. Using cepstral-coefficient-based parameterization, a sound detection rate of 87.5% was achieved for a 23.6% classification error rate. To improve these results, two parameters computed using the multiple signal classification algorithm and an unpredictability measure were included in the classifier. These parameters, which helped to classify the segments containing whistles, increased the detection rate to 90.3% and reduced the classification error rate to 18.1%. Finally, the potential of the multiple signal classification algorithm and unpredictability measure for estimating whistle contours and classifying cetacean species was also explored, with promising results.
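A minimal sketch of the classification stage, assuming one Gaussian mixture model per sound class fitted on cepstral-coefficient frames (the MUSIC-based and unpredictability parameters are omitted; the number of mixture components is an illustrative choice):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

CLASSES = ["noise", "whistle", "pulse", "whistle+pulse"]

def train_gmms(features_by_class, n_components=8):
    """One GMM per sound class, fitted on cepstral-coefficient frames
    (features_by_class maps class name -> 2-D array of frames)."""
    return {c: GaussianMixture(n_components=n_components,
                               covariance_type="diag",
                               random_state=0).fit(features_by_class[c])
            for c in CLASSES}

def classify_segment(gmms, frames):
    """Assign a segment to the class whose GMM gives the highest mean
    frame log-likelihood."""
    scores = {c: gmm.score(frames) for c, gmm in gmms.items()}
    return max(scores, key=scores.get)
```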
A., Javadpour; A., Mohammadi
2016-01-01
Background Given the importance of correct diagnosis in medical applications, various methods have been exploited for processing medical images so far. Segmentation is used to analyze anatomical structures in medical imaging. Objective This study describes a new method for brain Magnetic Resonance Image (MRI) segmentation via a novel algorithm based on genetic algorithms and region growing. Methods Among medical imaging methods, brain MRI segmentation is important due to the high soft-tissue contrast and high spatial resolution of this non-invasive modality. Size variations of brain tissues often accompany various diseases such as Alzheimer's disease. As our knowledge about the relation between various brain diseases and deviations of brain anatomy increases, MRI segmentation is exploited as the first step in early diagnosis. In this paper, the region growing method with automatic selection of initial points by a genetic algorithm is used to introduce a new method for MRI segmentation. Initial pixels and the similarity criterion are selected automatically by the genetic algorithm to maximize the accuracy and validity of the image segmentation. Results By using the genetic algorithm and defining a fitness function for image segmentation, the initial points for the algorithm were found. The proposed algorithm was applied to the images and the results were compared with region growing in which the initial points were selected manually. The results showed that the proposed algorithm could reduce segmentation error effectively. Conclusion The study concluded that the proposed algorithm could reduce segmentation error effectively and help us to diagnose brain diseases. PMID:27672629
Ortuño, Francisco M; Valenzuela, Olga; Rojas, Fernando; Pomares, Hector; Florido, Javier P; Urquiza, Jose M; Rojas, Ignacio
2013-09-01
Multiple sequence alignments (MSAs) are widely used approaches in bioinformatics to carry out other tasks such as structure predictions, biological function analyses or phylogenetic modeling. However, current tools usually provide partially optimal alignments, as each one is focused on specific biological features. Thus, the same set of sequences can produce different alignments, above all when sequences are less similar. Consequently, researchers and biologists do not agree about which is the most suitable way to evaluate MSAs. Recent evaluations tend to use more complex scores including further biological features. Among them, 3D structures are increasingly being used to evaluate alignments. Because structures are more conserved in proteins than sequences, scores with structural information are better suited to evaluate more distant relationships between sequences. The proposed multiobjective algorithm, based on the non-dominated sorting genetic algorithm, aims to jointly optimize three objectives: STRIKE score, non-gaps percentage and totally conserved columns. It was significantly assessed on the BAliBASE benchmark according to the Kruskal-Wallis test (P < 0.01). This algorithm also outperforms other aligners, such as ClustalW, Multiple Sequence Alignment Genetic Algorithm (MSA-GA), PRRP, DIALIGN, Hidden Markov Model Training (HMMT), Pattern-Induced Multi-sequence Alignment (PIMA), MULTIALIGN, Sequence Alignment Genetic Algorithm (SAGA), PILEUP, Rubber Band Technique Genetic Algorithm (RBT-GA) and Vertical Decomposition Genetic Algorithm (VDGA), according to the Wilcoxon signed-rank test (P < 0.05), whereas it shows results not significantly different to 3D-COFFEE (P > 0.05) with the advantage of being able to use less structures. Structural information is included within the objective function to evaluate more accurately the obtained alignments. The source code is available at http://www.ugr.es/~fortuno/MOSAStrE/MO-SAStrE.zip.
WND-CHARM: Multi-purpose image classification using compound image transforms
Orlov, Nikita; Shamir, Lior; Macura, Tomasz; Johnston, Josiah; Eckley, D. Mark; Goldberg, Ilya G.
2008-01-01
We describe a multi-purpose image classifier that can be applied to a wide variety of image classification tasks without modifications or fine-tuning, and yet provide classification accuracy comparable to state-of-the-art task-specific image classifiers. The proposed image classifier first extracts a large set of 1025 image features including polynomial decompositions, high contrast features, pixel statistics, and textures. These features are computed on the raw image, transforms of the image, and transforms of transforms of the image. The feature values are then used to classify test images into a set of pre-defined image classes. This classifier was tested on several different problems including biological image classification and face recognition. Although we cannot make a claim of universality, our experimental results show that this classifier performs as well or better than classifiers developed specifically for these image classification tasks. Our classifier’s high performance on a variety of classification problems is attributed to (i) a large set of features extracted from images; and (ii) an effective feature selection and weighting algorithm sensitive to specific image classification problems. The algorithms are available for free download from openmicroscopy.org. PMID:18958301
A multiobjective hybrid genetic algorithm for the capacitated multipoint network design problem.
Lo, C C; Chang, W H
2000-01-01
The capacitated multipoint network design problem (CMNDP) is NP-complete. In this paper, a hybrid genetic algorithm for CMNDP is proposed. The multiobjective hybrid genetic algorithm (MOHGA) differs from other genetic algorithms (GAs) mainly in its selection procedure. The concept of subpopulation is used in MOHGA. Four subpopulations are generated according to the elitism reservation strategy, the shifting Prufer vector, the stochastic universal sampling, and the complete random method, respectively. Mixing these four subpopulations produces the next generation population. The MOHGA can effectively search the feasible solution space due to population diversity. The MOHGA has been applied to CMNDP. By examining computational and analytical results, we notice that the MOHGA can find most nondominated solutions and is much more effective and efficient than other multiobjective GAs.
A Hybrid Neural Network-Genetic Algorithm Technique for Aircraft Engine Performance Diagnostics
NASA Technical Reports Server (NTRS)
Kobayashi, Takahisa; Simon, Donald L.
2001-01-01
In this paper, a model-based diagnostic method, which utilizes Neural Networks and Genetic Algorithms, is investigated. Neural networks are applied to estimate the engine internal health, and Genetic Algorithms are applied for sensor bias detection and estimation. This hybrid approach takes advantage of the nonlinear estimation capability provided by neural networks while improving the robustness to measurement uncertainty through the application of Genetic Algorithms. The hybrid diagnostic technique also has the ability to rank multiple potential solutions for a given set of anomalous sensor measurements in order to reduce false alarms and missed detections. The performance of the hybrid diagnostic technique is evaluated through some case studies derived from a turbofan engine simulation. The results show this approach is promising for reliable diagnostics of aircraft engines.
Genetic Algorithm Approaches for Actuator Placement
NASA Technical Reports Server (NTRS)
Crossley, William A.
2000-01-01
This research investigated genetic algorithm approaches for smart actuator placement to provide aircraft maneuverability without requiring hinged flaps or other control surfaces. The effort supported goals of the Multidisciplinary Design Optimization focus efforts in NASA's Aircraft Morphing program. This work helped to properly identify various aspects of the genetic algorithm operators and parameters that allow for placement of discrete control actuators/effectors. An improved problem definition, including better definition of the objective function and constraints, resulted from this research effort. The work conducted for this research used a geometrically simple wing model; however, an increasing number of potential actuator placement locations were incorporated to illustrate the ability of the GA to determine promising actuator placement arrangements. This effort's major result is a useful genetic algorithm-based approach to assist in the discrete actuator/effector placement problem.
Online boosting for vehicle detection.
Chang, Wen-Chung; Cho, Chih-Wei
2010-06-01
This paper presents a real-time vision-based vehicle detection system employing an online boosting algorithm. It is an online AdaBoost approach for a cascade of strong classifiers instead of a single strong classifier. Most existing cascades of classifiers must be trained offline and cannot effectively be updated when online tuning is required. The idea is to develop a cascade of strong classifiers for vehicle detection that is capable of being online trained in response to changing traffic environments. To make the online algorithm tractable, the proposed system must efficiently tune parameters based on incoming images and up-to-date performance of each weak classifier. The proposed online boosting method can improve system adaptability and accuracy to deal with novel types of vehicles and unfamiliar environments, whereas existing offline methods rely much more on extensive training processes to reach comparable results and cannot further be updated online. Our approach has been successfully validated in real traffic environments by performing experiments with an onboard charge-coupled-device camera in a roadway vehicle.
Classification of ligand molecules in PDB with fast heuristic graph match algorithm COMPLIG.
Saito, Mihoko; Takemura, Naomi; Shirai, Tsuyoshi
2012-12-14
A fast heuristic graph-matching algorithm, COMPLIG, was devised to classify the small-molecule ligands in the Protein Data Bank (PDB), which are currently not properly classified on structure basis. By concurrently classifying proteins and ligands, we determined the most appropriate parameter for categorizing ligands to be more than 60% identity of atoms and bonds between molecules, and we classified 11,585 types of ligands into 1946 clusters. Although the large clusters were composed of nucleotides or amino acids, a significant presence of drug compounds was also observed. Application of the system to classify the natural ligand status of human proteins in the current database suggested that, at most, 37% of the experimental structures of human proteins were in complex with natural ligands. However, protein homology- and/or ligand similarity-based modeling was implied to provide models of natural interactions for an additional 28% of the total, which might be used to increase the knowledge of intrinsic protein-metabolite interactions. Copyright © 2012 Elsevier Ltd. All rights reserved.
Classifier fusion for VoIP attacks classification
NASA Astrophysics Data System (ADS)
Safarik, Jakub; Rezac, Filip
2017-05-01
SIP is one of the most successful protocols in the field of IP telephony communication. It establishes and manages VoIP calls. As the number of SIP implementations rises, we can expect a higher number of attacks on the communication system in the near future. This work aims at malicious SIP traffic classification. A number of various machine learning algorithms have been developed for attack classification. The paper presents a comparison of current research and the use of a classifier fusion method leading to a potential decrease in the classification error rate. The use of classifier combination makes a more robust solution without the difficulties that may affect single algorithms. Different voting schemes, combination rules, and classifiers are discussed to improve the overall performance. All classifiers have been trained on real malicious traffic. The concept of traffic monitoring depends on a network of honeypot nodes. These honeypots run in several networks spread across different locations. Separation of the honeypots allows us to gain independent and trustworthy attack information.
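As a minimal illustration of the fusion idea, the sketch below combines three heterogeneous classifiers by a hard majority vote using scikit-learn; the feature matrix and attack labels are random placeholders standing in for real SIP/honeypot-derived attributes, not the authors' data.

```python
# Hedged sketch: fusing heterogeneous classifiers by majority vote.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 12))           # placeholder traffic features
y = rng.integers(0, 3, size=500)         # placeholder attack classes

fusion = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("nb", GaussianNB()),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ],
    voting="hard",                        # simple majority rule over the base predictions
)
print(cross_val_score(fusion, X, y, cv=5).mean())
```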
NASA Astrophysics Data System (ADS)
da Silva, Flávio Altinier Maximiano; Pedrini, Helio
2015-03-01
Facial expressions are an important demonstration of human moods and emotions. Algorithms capable of recognizing facial expressions and associating them with emotions were developed and employed to compare the expressions that different cultural groups use to show their emotions. Static pictures of predominantly occidental and oriental subjects from public datasets were used to train machine learning algorithms, while local binary patterns, histograms of oriented gradients (HOGs), and Gabor filters were employed to describe the facial expressions for six different basic emotions. The most consistent combination, formed by the association of the HOG descriptor and support vector machines, was then used to classify the other cultural group: there was a strong drop in accuracy, meaning that the subtle differences in the facial expressions of each culture affected the classifier performance. Finally, a classifier was trained with images from both occidental and oriental subjects and its accuracy was higher on multicultural data, evidencing the need for a multicultural training set to build an efficient classifier.
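A minimal sketch of the HOG-plus-SVM combination reported as the most consistent is given below, assuming scikit-image and scikit-learn; the face crops and emotion labels are synthetic placeholders, and the HOG and SVM parameters are illustrative guesses rather than the authors' settings.

```python
# Hedged sketch of a HOG + SVM expression classifier on placeholder data.
import numpy as np
from skimage.feature import hog
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(1)
images = rng.random((200, 64, 64))            # placeholder grayscale face crops
labels = rng.integers(0, 6, size=200)         # six basic emotions

features = np.array([
    hog(img, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
    for img in images
])

X_tr, X_te, y_tr, y_te = train_test_split(features, labels, test_size=0.3, random_state=0)
clf = SVC(kernel="rbf", C=10.0).fit(X_tr, y_tr)
print("accuracy:", clf.score(X_te, y_te))
```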
A pipelined FPGA implementation of an encryption algorithm based on genetic algorithm
NASA Astrophysics Data System (ADS)
Thirer, Nonel
2013-05-01
With the evolution of digital data storage and exchange, it is essential to protect the confidential information from every unauthorized access. High performance encryption algorithms were developed and implemented by software and hardware. Also many methods to attack the cipher text were developed. In the last years, the genetic algorithm has gained much interest in cryptanalysis of cipher texts and also in encryption ciphers. This paper analyses the possibility to use the genetic algorithm as a multiple key sequence generator for an AES (Advanced Encryption Standard) cryptographic system, and also to use a three stages pipeline (with four main blocks: Input data, AES Core, Key generator, Output data) to provide a fast encryption and storage/transmission of a large amount of data.
Luo, Junhai; Fu, Liang
2017-06-09
With the development of communication technology, the demand for location-based services is growing rapidly. This paper presents an algorithm for indoor localization based on Received Signal Strength (RSS), which is collected from Access Points (APs). The proposed localization algorithm contains an offline information acquisition phase and an online positioning phase. Firstly, the AP selection algorithm is reviewed and improved based on the stability of signals to remove useless APs; secondly, Kernel Principal Component Analysis (KPCA) is analyzed and used to remove data redundancy and maintain useful characteristics for nonlinear feature extraction; thirdly, the Affinity Propagation Clustering (APC) algorithm utilizes RSS values to classify data samples and narrow the positioning range. In the online positioning phase, the classified data are matched with the testing data to determine the position area, and Maximum Likelihood (ML) estimation is employed for precise positioning. Eventually, the proposed algorithm is implemented in a real-world environment for performance evaluation. Experimental results demonstrate that the proposed algorithm improves the accuracy and reduces the computational complexity.
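The offline stage described above can be sketched roughly as follows, assuming scikit-learn's KernelPCA and AffinityPropagation as stand-ins for the paper's implementation; the RSS fingerprint matrix is simulated and the kernel parameters are arbitrary.

```python
# Hedged sketch of the offline stage: KPCA for nonlinear feature extraction of RSS
# fingerprints, then Affinity Propagation clustering to narrow the positioning area.
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(2)
rss = rng.normal(-60, 8, size=(300, 10))      # 300 fingerprints x 10 access points (dBm)

kpca = KernelPCA(n_components=4, kernel="rbf", gamma=0.05)
features = kpca.fit_transform(rss)             # de-redundant nonlinear features

clusters = AffinityPropagation(random_state=0).fit(features)
print("number of clusters:", len(clusters.cluster_centers_indices_))
```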
Álvarez, Aitor; Sierra, Basilio; Arruti, Andoni; López-Gil, Juan-Miguel; Garay-Vitoria, Nestor
2015-01-01
In this paper, a new supervised classification paradigm, called classifier subset selection for stacked generalization (CSS stacking), is presented to deal with speech emotion recognition. The new approach consists of an improvement of a bi-level multi-classifier system known as stacking generalization by means of an integration of an estimation of distribution algorithm (EDA) in the first layer to select the optimal subset from the standard base classifiers. The good performance of the proposed new paradigm was demonstrated over different configurations and datasets. First, several CSS stacking classifiers were constructed on the RekEmozio dataset, using some specific standard base classifiers and a total of 123 spectral, quality and prosodic features computed using in-house feature extraction algorithms. These initial CSS stacking classifiers were compared to other multi-classifier systems and the employed standard classifiers built on the same set of speech features. Then, new CSS stacking classifiers were built on RekEmozio using a different set of both acoustic parameters (extended version of the Geneva Minimalistic Acoustic Parameter Set (eGeMAPS)) and standard classifiers and employing the best meta-classifier of the initial experiments. The performance of these two CSS stacking classifiers was evaluated and compared. Finally, the new paradigm was tested on the well-known Berlin Emotional Speech database. We compared the performance of single, standard stacking and CSS stacking systems using the same parametrization of the second phase. All of the classifications were performed at the categorical level, including the six primary emotions plus the neutral one. PMID:26712757
Image reconstruction through thin scattering media by simulated annealing algorithm
NASA Astrophysics Data System (ADS)
Fang, Longjie; Zuo, Haoyi; Pang, Lin; Yang, Zuogang; Zhang, Xicheng; Zhu, Jianhua
2018-07-01
An idea for reconstructing the image of an object behind thin scattering media is proposed by phase modulation. The optimized phase mask is achieved by modulating the scattered light using simulated annealing algorithm. The correlation coefficient is exploited as a fitness function to evaluate the quality of reconstructed image. The reconstructed images optimized from simulated annealing algorithm and genetic algorithm are compared in detail. The experimental results show that our proposed method has better definition and higher speed than genetic algorithm.
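A generic version of the simulated-annealing search might look like the sketch below; the `measure_image` stub stands in for the real optical feedback through the scattering medium, and the cooling schedule and acceptance rule are common textbook choices rather than the paper's exact settings.

```python
# Hedged sketch: simulated annealing over a segmented phase mask, with a correlation
# coefficient as the fitness, on a stubbed optical system.
import numpy as np

rng = np.random.default_rng(3)
TARGET = rng.random((16, 16))                      # stand-in for the object image


def measure_image(phase_mask):
    """Stub: in the real setup this is the camera image observed behind the medium."""
    weights = np.cos(phase_mask).reshape(16, 16)
    return TARGET * weights


def correlation(a, b):
    """Correlation coefficient used as the fitness of a candidate phase mask."""
    return np.corrcoef(a.ravel(), b.ravel())[0, 1]


def simulated_annealing(n_segments=256, steps=5000, t0=1.0, cooling=0.999):
    mask = rng.uniform(0, 2 * np.pi, n_segments)
    current = correlation(measure_image(mask), TARGET)
    temp = t0
    for _ in range(steps):
        candidate = mask.copy()
        candidate[rng.integers(n_segments)] = rng.uniform(0, 2 * np.pi)  # perturb one segment
        fit = correlation(measure_image(candidate), TARGET)
        # accept improvements always, worse solutions with a temperature-dependent probability
        if fit > current or rng.random() < np.exp((fit - current) / temp):
            mask, current = candidate, fit
        temp *= cooling
    return mask, current


if __name__ == "__main__":
    _, score = simulated_annealing()
    print("final correlation:", round(score, 3))
```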
Predicting relapse risk in childhood acute lymphoblastic leukaemia.
Teachey, David T; Hunger, Stephen P
2013-09-01
Intensive multi-agent chemotherapy regimens and the introduction of risk-stratified therapy have substantially improved cure rates for children with acute lymphoblastic leukaemia (ALL). Current risk allocation schemas are imperfect, as some children who are classified as lower-risk and treated with less intensive therapy nevertheless relapse, while others deemed higher-risk are probably over-treated. Most cooperative groups previously used morphological clearance of blasts in blood and marrow during the initial phases of chemotherapy as a primary factor for risk group allocation; however, this has largely been replaced by the detection of minimal residual disease (MRD). Other than age and white blood cell count (WBC) at presentation, many clinical variables previously used for risk group allocation are no longer prognostic, as MRD and the presence of sentinel genetic lesions are more reliable at predicting outcome. Currently, a number of sentinel genetic lesions are used by most cooperative groups for risk stratification; however, in the near future patients will probably be risk-stratified using genomic signatures and clustering algorithms, rather than individual genetic alterations. This review will describe the clinical, biological, and response-based features known to predict relapse risk in childhood ALL, including those currently used and those likely to be used in the near future to risk-stratify therapy. © 2013 John Wiley & Sons Ltd.
Low-thrust orbit transfer optimization with refined Q-law and multi-objective genetic algorithm
NASA Technical Reports Server (NTRS)
Lee, Seungwon; Petropoulos, Anastassios E.; von Allmen, Paul
2005-01-01
An optimization method for low-thrust orbit transfers around a central body is developed using the Q-law and a multi-objective genetic algorithm. In the hybrid method, the Q-law generates candidate orbit transfers, and the multi-objective genetic algorithm optimizes the Q-law control parameters in order to simultaneously minimize both the consumed propellant mass and the flight time of the orbit transfer. This paper addresses the problem of finding optimal orbit transfers for low-thrust spacecraft.
Genetic algorithm for neural networks optimization
NASA Astrophysics Data System (ADS)
Setyawati, Bina R.; Creese, Robert C.; Sahirman, Sidharta
2004-11-01
This paper examines the forecasting performance of multi-layer feed-forward neural networks in modeling a particular foreign exchange rate, i.e. Japanese Yen/US Dollar. The effects of two learning methods, Back Propagation and Genetic Algorithm, in which the neural network topology and other parameters were fixed, were investigated. The early results indicate that the application of this hybrid system seems to be well suited for the forecasting of foreign exchange rates. The Neural Networks and Genetic Algorithm were programmed using MATLAB.
Hybrid Architectures for Evolutionary Computing Algorithms
2008-01-01
other EC algorithms to FPGA Core (Burns, P1026/MAPLD 2005). Genetic Algorithm Hardware References: S. Scott, A. Samal, and S. Seth, "HGA: A Hardware Based Genetic Algorithm", Proceedings of the 1995 ACM Third ...; [12] Scott, S. D., Samal, A., and S. Seth, ... Parallel and Distributed Processing (IPPS/SPDP), pp. 316-320, Proceedings, IEEE Computer Society, 1998.
Ozmutlu, H. Cenk
2014-01-01
We developed mixed integer programming (MIP) models and hybrid genetic-local search algorithms for the scheduling problem of unrelated parallel machines with job-sequence- and machine-dependent setup times and with the job splitting property. The first contribution of this paper is to introduce novel algorithms that perform splitting and scheduling simultaneously with a variable number of subjobs. We propose a simple chromosome structure constituted by random key numbers in the hybrid genetic-local search algorithm (GAspLA). Random key numbers are used frequently in genetic algorithms, but they create additional difficulty when local search is hybridized into the algorithm. We developed algorithms that adapt the results of the local search back into the genetic algorithm with a minimum of relocation operations on the genes' random key numbers. This is the second contribution of the paper. The third contribution of this paper is three new MIP models that perform splitting and scheduling simultaneously. The fourth contribution of this paper is the implementation of GAspLAMIP. This implementation lets us verify the optimality of GAspLA for the studied combinations. The proposed methods are tested on a set of problems taken from the literature and the results validate the effectiveness of the proposed algorithms. PMID:24977204
K-mean clustering algorithm for processing signals from compound semiconductor detectors
NASA Astrophysics Data System (ADS)
Tada, Tsutomu; Hitomi, Keitaro; Wu, Yan; Kim, Seong-Yun; Yamazaki, Hiromichi; Ishii, Keizo
2011-12-01
The K-mean clustering algorithm was employed for processing signal waveforms from TlBr detectors. The signal waveforms were classified based on their shapes, which reflect the charge collection process in the detector. The classified signal waveforms were then processed individually to suppress the pulse height variation of signals due to charge collection loss. The energy resolution of a 137Cs spectrum measured with a 0.5 mm thick TlBr detector was 1.3% FWHM when 500 clusters were employed.
Adaptive sleep-wake discrimination for wearable devices.
Karlen, Walter; Floreano, Dario
2011-04-01
Sleep/wake classification systems that rely on physiological signals suffer from intersubject differences that make accurate classification with a single, subject-independent model difficult. To overcome the limitations of intersubject variability, we suggest a novel online adaptation technique that updates the sleep/wake classifier in real time. The objective of the present study was to evaluate the performance of a newly developed adaptive classification algorithm that was embedded on a wearable sleep/wake classification system called SleePic. The algorithm processed ECG and respiratory effort signals for the classification task and applied behavioral measurements (obtained from accelerometer and press-button data) for the automatic adaptation task. When trained as a subject-independent classifier algorithm, the SleePic device was only able to correctly classify 74.94 ± 6.76% of the human-rated sleep/wake data. By using the suggested automatic adaptation method, the mean classification accuracy could be significantly improved to 92.98 ± 3.19%. A subject-independent classifier based on activity data only showed a comparable accuracy of 90.44 ± 3.57%. We demonstrated that subject-independent models used for online sleep-wake classification can successfully be adapted to previously unseen subjects without the intervention of human experts or off-line calibration.
Fuzzy Nonlinear Proximal Support Vector Machine for Land Extraction Based on Remote Sensing Image
Zhong, Xiaomei; Li, Jianping; Dou, Huacheng; Deng, Shijun; Wang, Guofei; Jiang, Yu; Wang, Yongjie; Zhou, Zebing; Wang, Li; Yan, Fei
2013-01-01
Currently, remote sensing technologies are widely employed in the dynamic monitoring of land. This paper presents an algorithm named fuzzy nonlinear proximal support vector machine (FNPSVM) based on ETM+ remote sensing imagery. This algorithm is applied to extract various land types of the city of Da’an in northern China. Two multi-category strategies, namely “one-against-one” and “one-against-rest”, for this algorithm are described in detail and then compared. A fuzzy membership function is presented to reduce the effects of noise or outliers on the data samples. The approaches to feature extraction and feature selection, and several key parameter settings, are also given. Numerous experiments were carried out to evaluate its performance, including various accuracies (overall accuracy and kappa coefficient), stability, training speed, and classification speed. The FNPSVM classifier was compared to three other classifiers, including the maximum likelihood classifier (MLC), a back-propagation neural network (BPN), and the proximal support vector machine (PSVM), under different training conditions. The impacts of the selection of training samples, testing samples and features on the four classifiers were also evaluated in these experiments. PMID:23936016
Series Hybrid Electric Vehicle Power System Optimization Based on Genetic Algorithm
NASA Astrophysics Data System (ADS)
Zhu, Tianjun; Li, Bin; Zong, Changfu; Wu, Yang
2017-09-01
Hybrid electric vehicles (HEV), compared with conventional vehicles, have more complex structures and more component parameters. Optimizing all of these parameters as design variables would increase the difficulty of the problem and slow the convergence of the optimization program, so this paper selects only the parameters that have a major influence on vehicle fuel consumption, so that all components work at maximum efficiency. First, models of the HEV powertrain components are built. Second, taking a tandem hybrid structure as an example, a genetic algorithm is used to optimize fuel consumption and emissions. Simulation results in ADVISOR verify the feasibility of the proposed genetic optimization algorithm.
Multiple sequence alignment using multi-objective based bacterial foraging optimization algorithm.
Rani, R Ranjani; Ramyachitra, D
2016-12-01
Multiple sequence alignment (MSA) is a widespread approach in computational biology and bioinformatics. MSA deals with how sequences of nucleotides and amino acids are aligned with the best possible alignment and the minimum number of gaps between them, which points to the functional, evolutionary and structural relationships among the sequences. Still, the computation of MSA is a challenging task when it comes to providing efficient accuracy and statistically significant alignment results. In this work, the Bacterial Foraging Optimization Algorithm was employed to align the biological sequences, which resulted in a non-dominated optimal solution. It employs multiple objectives: maximization of similarity, non-gap percentage and conserved blocks, and minimization of gap penalty. The BAliBASE 3.0 benchmark database was utilized to examine the proposed algorithm against other methods. In this paper, two algorithms have been proposed: a Hybrid Genetic Algorithm with Artificial Bee Colony (GA-ABC) and the Bacterial Foraging Optimization Algorithm. It was found that the Hybrid Genetic Algorithm with Artificial Bee Colony performed better than the existing optimization algorithms. However, conserved blocks were not obtained using GA-ABC. Then BFO was used for the alignment and the conserved blocks were obtained. The proposed Multi-Objective Bacterial Foraging Optimization Algorithm (MO-BFO) was compared with the widely used MSA methods Clustal Omega, Kalign, MUSCLE, MAFFT, Genetic Algorithm (GA), Ant Colony Optimization (ACO), Artificial Bee Colony (ABC), Particle Swarm Optimization (PSO) and Hybrid Genetic Algorithm with Artificial Bee Colony (GA-ABC). The final results show that the proposed MO-BFO algorithm yields better alignments than most widely used methods. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Automatic page layout using genetic algorithms for electronic albuming
NASA Astrophysics Data System (ADS)
Geigel, Joe; Loui, Alexander C. P.
2000-12-01
In this paper, we describe a flexible system for automatic page layout that makes use of genetic algorithms for albuming applications. The system is divided into two modules: a page creator module, which is responsible for distributing images among the various album pages, and an image placement module, which positions images on individual pages. Final page layouts are specified in a textual form using XML for printing or viewing over the Internet. The system makes use of genetic algorithms, a class of search and optimization algorithms based on the concepts of biological evolution, for generating solutions with fitness based on graphic design preferences supplied by the user. The genetic page layout algorithm has been incorporated into a web-based prototype system for interactive page layout over the Internet. The prototype system is built using a client-server architecture and is implemented in Java. The system described in this paper has demonstrated the feasibility of using genetic algorithms for automated page layout in albuming and web-based imaging applications. We believe that the system adequately proves the validity of the concept, providing creative layouts in a reasonable number of iterations. By optimizing the layout parameters of the fitness function, we hope to further improve the quality of the final layout in terms of user preference and computation speed.
NASA Astrophysics Data System (ADS)
Narwadi, Teguh; Subiyanto
2017-03-01
The Travelling Salesman Problem (TSP) is one of the best known NP-hard problems, which means that no exact algorithm is known to solve it in polynomial time. This paper presents a new application of a genetic algorithm combined with a local search technique to solve the TSP. For the local search technique, an iterative hill climbing method has been used. The system is implemented on the Android OS because Android is now widely used around the world and it is a mobile platform. It is also integrated with the Google API to obtain the geographical locations and distances of the cities and to display the route. We performed experiments to test the behaviour of the application. To test its effectiveness, the hybrid genetic algorithm (HGA) is compared with a simple GA on 5 samples of cities in Central Java, Indonesia, with different numbers of cities. The experimental results show that the average HGA solution is better than that of the simple GA in 5 tests out of 5 (100%). The results show that the hybrid genetic algorithm outperforms the genetic algorithm, especially on problems of higher complexity.
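The hybrid idea, a permutation GA whose offspring are refined by hill climbing, can be sketched as below; a 2-opt move is used as the iterative hill-climbing step, and the city coordinates are random placeholders rather than Google-API road distances.

```python
# Hedged sketch: a permutation GA for the TSP whose offspring are refined by 2-opt.
import random

import numpy as np

rng = np.random.default_rng(4)
N = 15
pts = rng.random((N, 2))
DIST = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)


def tour_length(tour):
    return sum(DIST[tour[i], tour[(i + 1) % N]] for i in range(N))


def order_crossover(a, b):
    """Keep a slice of parent a, fill the rest in parent b's order."""
    i, j = sorted(random.sample(range(N), 2))
    child = [-1] * N
    child[i:j] = a[i:j]
    fill = [c for c in b if c not in child]
    for pos in range(N):
        if child[pos] == -1:
            child[pos] = fill.pop(0)
    return child


def two_opt(tour):
    """Hill climbing: keep reversing segments while the tour improves."""
    improved = True
    while improved:
        improved = False
        for i in range(1, N - 1):
            for j in range(i + 1, N):
                cand = tour[:i] + tour[i:j][::-1] + tour[j:]
                if tour_length(cand) < tour_length(tour):
                    tour, improved = cand, True
    return tour


def hybrid_ga(pop_size=30, generations=40):
    pop = [random.sample(range(N), N) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=tour_length)
        elite = pop[: pop_size // 3]
        children = [two_opt(order_crossover(*random.sample(elite, 2)))
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
    return min(pop, key=tour_length)


if __name__ == "__main__":
    best = hybrid_ga()
    print("best tour length:", round(tour_length(best), 3))
```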
A novel method for predicting kidney stone type using ensemble learning.
Kazemi, Yassaman; Mirroshandel, Seyed Abolghasem
2018-01-01
The high morbidity rate associated with kidney stone disease, which is a silent killer, is one of the main concerns in healthcare systems all over the world. Advanced data mining techniques such as classification can help in the early prediction of this disease and reduce its incidence and associated costs. The objective of the present study is to derive a model for the early detection of the type of kidney stone and the most influential parameters with the aim of providing a decision-support system. Information was collected from 936 patients with nephrolithiasis at the kidney center of the Razi Hospital in Rasht from 2012 through 2016. The prepared dataset included 42 features. Data pre-processing was the first step toward extracting the relevant features. The collected data was analyzed with Weka software, and various data mining models were used to prepare a predictive model. Various data mining algorithms such as the Bayesian model, different types of Decision Trees, Artificial Neural Networks, and Rule-based classifiers were used in these models. We also proposed four models based on ensemble learning to improve the accuracy of each learning algorithm. In addition, a novel technique for combining individual classifiers in ensemble learning was proposed. In this technique, for each individual classifier, a weight is assigned based on our proposed genetic algorithm based method. The generated knowledge was evaluated using a 10-fold cross-validation technique based on standard measures. However, the assessment of each feature for building a predictive model was another significant challenge. The predictive strength of each feature for creating a reproducible outcome was also investigated. Regarding the applied models, parameters such as sex, uric acid condition, calcium level, hypertension, diabetes, nausea and vomiting, flank pain, and urinary tract infection (UTI) were the most vital parameters for predicting the chance of nephrolithiasis. The final ensemble-based model (with an accuracy of 97.1%) was a robust one and could be safely applied to future studies to predict the chances of developing nephrolithiasis. This model provides a novel way to study stone disease by deciphering the complex interaction among different biological variables, thus helping in an early identification and reduction in diagnosis time. Copyright © 2017 Elsevier B.V. All rights reserved.
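The weighting idea can be sketched as follows: each base classifier contributes class probabilities, a candidate weight vector is scored on a validation split, and a small evolutionary loop searches for good weights. The data, base classifiers and GA settings below are placeholders, not the study's clinical features or exact method.

```python
# Hedged sketch: per-classifier weights evolved on a validation split, combined by a
# weighted soft vote.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=20, n_classes=3,
                           n_informative=6, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

models = [GaussianNB(), DecisionTreeClassifier(random_state=0),
          MLPClassifier(max_iter=1000, random_state=0)]
probas = [m.fit(X_tr, y_tr).predict_proba(X_val) for m in models]


def ensemble_accuracy(weights):
    combined = sum(w * p for w, p in zip(weights, probas))   # weighted soft vote
    return (combined.argmax(axis=1) == y_val).mean()


rng = np.random.default_rng(5)
pop = rng.random((30, len(models)))                          # initial weight vectors
for _ in range(40):                                          # tiny GA: truncation + mutation
    pop = pop[np.argsort([-ensemble_accuracy(w) for w in pop])]
    parents = pop[:10]
    children = parents[rng.integers(10, size=20)] + rng.normal(0, 0.1, (20, len(models)))
    pop = np.vstack([parents, np.clip(children, 0, None)])

best = max(pop, key=ensemble_accuracy)
print("validation accuracy with evolved weights:", round(ensemble_accuracy(best), 3))
```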
NASA Astrophysics Data System (ADS)
Liu, Yan; Deng, Honggui; Ren, Shuang; Tang, Chengying; Qian, Xuewen
2018-01-01
We propose an efficient partial transmit sequence technique based on a genetic algorithm and a peak-value optimization algorithm (GAPOA) to reduce the high peak-to-average power ratio (PAPR) in visible light communication systems based on orthogonal frequency division multiplexing (VLC-OFDM). After analyzing the pros and cons of the hill-climbing algorithm, we propose the POA, which has excellent local search ability, to further process the signals whose PAPR is still over the threshold after being processed by the genetic algorithm (GA). To verify the effectiveness of the proposed technique and algorithm, we evaluate the PAPR performance and the bit error rate (BER) performance and compare them with the partial transmit sequence (PTS) technique based on GA (GA-PTS), the PTS technique based on genetic and hill-climbing algorithms (GH-PTS), and PTS based on the shuffled frog leaping algorithm and hill-climbing algorithm (SFLAHC-PTS). The results show that our technique and algorithm have not only better PAPR performance but also lower computational complexity and BER than the GA-PTS, GH-PTS, and SFLAHC-PTS techniques.
Goudie, Catherine; Coltin, Hallie; Witkowski, Leora; Mourad, Stephanie; Malkin, David; Foulkes, William D
2017-08-01
Identifying cancer predisposition syndromes in children with tumors is crucial, yet few clinical guidelines exist to identify children at high risk of having germline mutations. The McGill Interactive Pediatric OncoGenetic Guidelines project aims to create a validated pediatric guideline in the form of a smartphone/tablet application using algorithms to process clinical data and help determine whether to refer a child for genetic assessment. This paper discusses the initial stages of the project, focusing on its overall structure, the methodology underpinning the algorithms, and the upcoming algorithm validation process. © 2017 Wiley Periodicals, Inc.
Classification of fMRI resting-state maps using machine learning techniques: A comparative study
NASA Astrophysics Data System (ADS)
Gallos, Ioannis; Siettos, Constantinos
2017-11-01
We compare the efficiency of Principal Component Analysis (PCA) and nonlinear manifold learning algorithms (ISOMAP and Diffusion Maps) for classifying brain maps between groups of schizophrenia patients and healthy controls from fMRI scans acquired during a resting-state experiment. After a standard pre-processing pipeline, we applied spatial Independent Component Analysis (ICA) to reduce (a) noise and (b) the spatial-temporal dimensionality of the fMRI maps. On the cross-correlation matrix of the ICA components, we applied PCA, ISOMAP and Diffusion Maps to find an embedded low-dimensional space. Finally, support vector machine (SVM) and k-NN algorithms were used to evaluate the performance of the algorithms in classifying between the two groups.
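One branch of such a comparison might be sketched as below, assuming scikit-learn's Isomap and SVC; the "connectivity" features are simulated stand-ins for the cross-correlations of ICA components, and the labels are random placeholders.

```python
# Hedged sketch: Isomap embedding of connectivity features followed by an SVM.
import numpy as np
from sklearn.manifold import Isomap
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(6)
X = rng.normal(size=(80, 190))      # 80 subjects x upper triangle of a 20x20 correlation matrix
y = rng.integers(0, 2, size=80)     # patients vs. controls (placeholder labels)

model = make_pipeline(Isomap(n_neighbors=10, n_components=5), SVC(kernel="rbf"))
print(cross_val_score(model, X, y, cv=5).mean())
```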
False alarm reduction by the And-ing of multiple multivariate Gaussian classifiers
NASA Astrophysics Data System (ADS)
Dobeck, Gerald J.; Cobb, J. Tory
2003-09-01
The high-resolution sonar is one of the principal sensors used by the Navy to detect and classify sea mines in minehunting operations. For such sonar systems, substantial effort has been devoted to the development of automated detection and classification (D/C) algorithms. These have been spurred by several factors including (1) aids for operators to reduce work overload, (2) more optimal use of all available data, and (3) the introduction of unmanned minehunting systems. The environments where sea mines are typically laid (harbor areas, shipping lanes, and the littorals) give rise to many false alarms caused by natural, biologic, and man-made clutter. The objective of the automated D/C algorithms is to eliminate most of these false alarms while still maintaining a very high probability of mine detection and classification (PdPc). In recent years, the benefits of fusing the outputs of multiple D/C algorithms have been studied. We refer to this as Algorithm Fusion. The results have been remarkable, including reliable robustness to new environments. This paper describes a method for training several multivariate Gaussian classifiers such that their And-ing dramatically reduces false alarms while maintaining a high probability of classification. This training approach is referred to as the Focused-Training method. This work extends our 2001-2002 work where the Focused-Training method was used with three other types of classifiers: the Attractor-based K-Nearest Neighbor Neural Network (a type of radial-basis, probabilistic neural network), the Optimal Discrimination Filter Classifier (based on linear discrimination theory), and the Quadratic Penalty Function Support Vector Machine (QPFSVM). Although our experience has been gained in the area of sea mine detection and classification, the principles described herein are general and can be applied to a wide range of pattern recognition and automatic target recognition (ATR) problems.
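The And-ing idea can be sketched as follows: several multivariate Gaussian likelihood-ratio detectors, each with its own threshold, and a contact is declared a target only if every detector agrees. The features, thresholds and bootstrap "views" below are assumptions for illustration, not the Focused-Training procedure itself.

```python
# Hedged sketch: And-ing the decisions of several multivariate Gaussian detectors.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(7)
d = 4
target = rng.normal(1.0, 1.0, size=(200, d))       # training features for targets
clutter = rng.normal(0.0, 1.0, size=(200, d))      # training features for clutter


def fit_gaussian(samples):
    return multivariate_normal(mean=samples.mean(axis=0), cov=np.cov(samples.T))


# Three detectors trained on different (here: bootstrapped) views of the data
detectors = []
for _ in range(3):
    idx = rng.integers(0, 200, size=200)
    g_t, g_c = fit_gaussian(target[idx]), fit_gaussian(clutter[idx])
    detectors.append((g_t, g_c, 0.0))               # illustrative log-likelihood-ratio threshold


def and_decision(x):
    """Declare a target only if every detector's log-likelihood ratio clears its threshold."""
    return all(g_t.logpdf(x) - g_c.logpdf(x) > thr for g_t, g_c, thr in detectors)


test = rng.normal(0.0, 1.0, size=(1000, d))        # clutter-only test set
print("false alarms out of 1000 clutter samples:", sum(and_decision(x) for x in test))
```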
Ozçift, Akin
2011-05-01
Supervised classification algorithms are commonly used in the design of computer-aided diagnosis systems. In this study, we present a resampling-strategy-based Random Forest (RF) ensemble classifier to improve the diagnosis of cardiac arrhythmia. Random Forest is an ensemble classifier that consists of many decision trees and outputs the class that is the mode of the classes output by the individual trees. In this way, an RF ensemble classifier performs better than a single tree from the classification performance point of view. In general, multiclass datasets having an unbalanced distribution of sample sizes are difficult to analyze in terms of class discrimination. Cardiac arrhythmia is such a dataset: it has multiple classes with small sample sizes, and it is therefore well suited to test our resampling-based training strategy. The dataset contains 452 samples in fourteen types of arrhythmias, and eleven of these classes have sample sizes of less than 15. Our diagnosis strategy consists of two parts: (i) a correlation-based feature selection algorithm is used to select relevant features from the cardiac arrhythmia dataset; (ii) the RF machine learning algorithm is used to evaluate the performance of the selected features with and without simple random sampling, to evaluate the efficiency of the proposed training strategy. The resultant accuracy of the classifier is found to be 90.0%, which is a quite high diagnosis performance for cardiac arrhythmia. Furthermore, three case studies, i.e., thyroid, cardiotocography and audiology, are used to benchmark the effectiveness of the proposed method. The results of the experiments demonstrate the efficiency of the random sampling strategy in training the RF ensemble classification algorithm. Copyright © 2011 Elsevier Ltd. All rights reserved.
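A rough sketch of the two-part strategy is shown below, with a simple correlation filter standing in for the correlation-based feature selection step and scikit-learn's RandomForestClassifier for the RF ensemble; the dataset is synthetic, not the 452-sample arrhythmia data.

```python
# Hedged sketch: correlation-based feature filtering followed by a Random Forest.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=450, n_features=120, n_informative=15,
                           n_classes=5, n_clusters_per_class=1, random_state=0)

# Rank features by absolute correlation with the class label and keep the top 30.
corr = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
selected = np.argsort(corr)[-30:]

rf = RandomForestClassifier(n_estimators=200, class_weight="balanced", random_state=0)
print(cross_val_score(rf, X[:, selected], y, cv=5).mean())
```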
Optimization of genomic selection training populations with a genetic algorithm
USDA-ARS?s Scientific Manuscript database
In this article, we derive a computationally efficient statistic to measure the reliability of estimates of genetic breeding values for a fixed set of genotypes based on a given training set of genotypes and phenotypes. We adopt a genetic algorithm scheme to find a training set of certain size from ...
A multiscale curvature algorithm for classifying discrete return LiDAR in forested environments
Jeffrey S. Evans; Andrew T. Hudak
2007-01-01
One prerequisite to the use of light detection and ranging (LiDAR) across disciplines is differentiating ground from nonground returns. The objective was to automatically and objectively classify points within unclassified LiDAR point clouds, with few model parameters and minimal postprocessing. Presented is an automated method for classifying LiDAR returns as ground...
NASA Technical Reports Server (NTRS)
Garay, Michael J.; Mazzoni, Dominic; Davies, Roger; Wagstaff, Kiri
2004-01-01
Support Vector Machines (SVMs) are a type of supervised learning algorithm, other examples of which are Artificial Neural Networks (ANNs), Decision Trees, and Naive Bayesian Classifiers. Supervised learning algorithms are used to classify objects labeled by a 'supervisor' - typically a human 'expert'.
Incorporating spatial context into statistical classification of multidimensional image data
NASA Technical Reports Server (NTRS)
Bauer, M. E. (Principal Investigator); Tilton, J. C.; Swain, P. H.
1981-01-01
Compound decision theory is employed to develop a general statistical model for classifying image data using spatial context. The classification algorithm developed from this model exploits the tendency of certain ground-cover classes to occur more frequently in some spatial contexts than in others. A key input to this contextual classifier is a quantitative characterization of this tendency: the context function. Several methods for estimating the context function are explored, and two complementary methods are recommended. The contextual classifier is shown to produce substantial improvements in classification accuracy compared to the accuracy produced by a non-contextual uniform-priors maximum likelihood classifier when these methods of estimating the context function are used. An approximate algorithm, which cuts computational requirements by over one-half, is presented. The search for an optimal implementation is furthered by an exploration of the relative merits of using spectral classes or information classes for classification and/or context function estimation.
Comparative Analysis of Document level Text Classification Algorithms using R
NASA Astrophysics Data System (ADS)
Syamala, Maganti; Nalini, N. J., Dr; Maguluri, Lakshamanaphaneendra; Ragupathy, R., Dr.
2017-08-01
Over the past few decades, tremendous volumes of data have become available on the Internet, in either structured or unstructured form. This exponential growth of information creates an urgent need for text classifiers. Text mining is an interdisciplinary field that draws on information retrieval, data mining, machine learning, statistics and computational linguistics. To handle this situation, a wide range of supervised learning algorithms has been introduced. Among these, K-Nearest Neighbour (KNN) is the simplest and one of the most efficient classifiers in the text classification family. However, KNN suffers from imbalanced class distributions and noisy term features. To cope with this challenge, we use document-based centroid dimensionality reduction (CentroidDR) implemented in R. By combining these two text classification techniques, KNN and centroid classifiers, we propose a scalable and effective flat classifier, called MCenKNN, which works substantially better than CenKNN.
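A rough Python approximation of the centroid-reduction idea is sketched below: documents are re-represented by their cosine similarity to each class centroid in TF-IDF space, and kNN is run in that low-dimensional space. This is only an approximation of the CentroidDR/MCenKNN scheme, not the authors' exact formulation, and it uses a small subset of the 20 Newsgroups corpus (downloaded on first use) as stand-in data.

```python
# Hedged sketch: centroid-based dimensionality reduction before kNN text classification.
import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

data = fetch_20newsgroups(subset="train", categories=["sci.med", "sci.space", "rec.autos"])
X_tr, X_te, y_tr, y_te = train_test_split(data.data, data.target,
                                          test_size=0.3, random_state=0)

tfidf = TfidfVectorizer(stop_words="english", max_features=5000)
T_tr, T_te = tfidf.fit_transform(X_tr), tfidf.transform(X_te)

# One centroid per class; each document becomes a vector of centroid similarities.
centroids = np.asarray(np.vstack([T_tr[y_tr == c].mean(axis=0) for c in np.unique(y_tr)]))
R_tr, R_te = cosine_similarity(T_tr, centroids), cosine_similarity(T_te, centroids)

knn = KNeighborsClassifier(n_neighbors=5).fit(R_tr, y_tr)
print("accuracy in centroid-similarity space:", round(knn.score(R_te, y_te), 3))
```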
A Theoretical Analysis of Why Hybrid Ensembles Work.
Hsu, Kuo-Wei
2017-01-01
Inspired by the group decision making process, ensembles or combinations of classifiers have been found favorable in a wide variety of application domains. Some researchers propose to use a mixture of two different types of classification algorithms to create a hybrid ensemble. Why does such an ensemble work? The question remains. Following the concept of diversity, which is one of the fundamental elements of the success of ensembles, we conduct a theoretical analysis of why hybrid ensembles work, connecting the use of different algorithms to accuracy gain. We also conduct experiments on the classification performance of hybrid ensembles of classifiers created by decision tree and naïve Bayes classification algorithms, each of which is a top data mining algorithm and often used to create non-hybrid ensembles. Therefore, through this paper, we provide a complement to the theoretical foundation of creating and using hybrid ensembles.
NASA Astrophysics Data System (ADS)
Pitarch, Jaime; Ruiz-Verdú, Antonio; Sendra, María. D.; Santoleri, Rosalia
2017-02-01
We studied the performance of the MERIS maximum peak height (MPH) algorithm in the retrieval of chlorophyll-a concentration (CHL), using a matchup data set of Bottom-of-Rayleigh Reflectances (BRR) and CHL from a hypertrophic lake (Albufera de Valencia). The MPH algorithm produced a slight underestimation of CHL in the pixels classified as cyanobacteria (83% of the total) and a strong overestimation in those classified as eukaryotic phytoplankton (17%). In situ biomass data showed that the binary classification of MPH was not appropriate for mixed phytoplankton populations, producing also unrealistic discontinuities in the CHL maps. We recalibrated MPH using our matchup data set and found that a single calibration curve of third degree fitted equally well to all matchups regardless of how they were classified. As a modification to the former approach, we incorporated the Phycocyanin Index (PCI) in the formula, thus taking into account the gradient of phytoplankton composition, which reduced the CHL retrieval errors. By using in situ biomass data, we also proved that PCI was indeed an indicator of cyanobacterial dominance. We applied our recalibration of the MPH algorithm to the whole MERIS data set (2002-2012). Results highlight the usefulness of the MPH algorithm as a tool to monitor eutrophication. The relevance of this fact is higher since MPH does not require a complete atmospheric correction, which often fails over such waters. An adequate flagging or correction of sun glint is advisable though, since the MPH algorithm was sensitive to sun glint.
A Constrained Genetic Algorithm with Adaptively Defined Fitness Function in MRS Quantification
NASA Astrophysics Data System (ADS)
Papakostas, G. A.; Karras, D. A.; Mertzios, B. G.; Graveron-Demilly, D.; van Ormondt, D.
MRS signal quantification is a rather involved procedure and has attracted the interest of the medical engineering community regarding the development of computationally efficient methodologies. Significant contributions based on Computational Intelligence tools, such as Neural Networks (NNs), demonstrated a good performance but not without drawbacks already discussed by the authors. On the other hand, a preliminary application of Genetic Algorithms (GA) has already been reported in the literature by the authors regarding the peak detection problem encountered in MRS quantification using the Voigt line shape model. This paper investigates a novel constrained genetic algorithm involving a generic and adaptively defined fitness function which extends the simple genetic algorithm methodology to the case of noisy signals. The applicability of this new algorithm is scrutinized through experimentation on artificial MRS signals interleaved with noise, regarding its signal fitting capabilities. Although extensive experiments with real-world MRS signals are necessary, the performance shown herein illustrates the method's potential to be established as a generic MRS metabolite quantification procedure.
Fireworks algorithm for mean-VaR/CVaR models
NASA Astrophysics Data System (ADS)
Zhang, Tingting; Liu, Zhifeng
2017-10-01
Intelligent algorithms have been widely applied to portfolio optimization problems. In this paper, we introduce a novel intelligent algorithm, named fireworks algorithm, to solve the mean-VaR/CVaR model for the first time. The results show that, compared with the classical genetic algorithm, fireworks algorithm not only improves the optimization accuracy and the optimization speed, but also makes the optimal solution more stable. We repeat our experiments at different confidence levels and different degrees of risk aversion, and the results are robust. It suggests that fireworks algorithm has more advantages than genetic algorithm in solving the portfolio optimization problem, and it is feasible and promising to apply it into this field.
Ant-cuckoo colony optimization for feature selection in digital mammogram.
Jona, J B; Nagaveni, N
2014-01-15
Digital mammography is the only effective screening method to detect breast cancer. Gray Level Co-occurrence Matrix (GLCM) textural features are extracted from the mammogram. Not all of the features are essential for classifying the mammogram; therefore, identifying the relevant features is the aim of this work. Feature selection improves the classification rate and accuracy of any classifier. In this study, a new hybrid metaheuristic named Ant-Cuckoo Colony Optimization, a hybrid of Ant Colony Optimization (ACO) and Cuckoo Search (CS), is proposed for feature selection in digital mammograms. ACO is a good metaheuristic optimization technique, but its drawback is that the ants walk through the paths where the pheromone density is high, which makes the whole process slow; hence CS is employed to carry out the local search of ACO. A Support Vector Machine (SVM) classifier with a Radial Basis Function (RBF) kernel is used along with the ACO to classify normal mammograms from abnormal mammograms. Experiments are conducted on the miniMIAS database. The performance of the new hybrid algorithm is compared with the ACO and PSO algorithms. The results show that the hybrid Ant-Cuckoo Colony Optimization algorithm is more accurate than the other techniques.
Spatial Statistics for Tumor Cell Counting and Classification
NASA Astrophysics Data System (ADS)
Wirjadi, Oliver; Kim, Yoo-Jin; Breuel, Thomas
To count and classify cells in histological sections is a standard task in histology. One example is the grading of meningiomas, benign tumors of the meninges, which requires to assess the fraction of proliferating cells in an image. As this process is very time consuming when performed manually, automation is required. To address such problems, we propose a novel application of Markov point process methods in computer vision, leading to algorithms for computing the locations of circular objects in images. In contrast to previous algorithms using such spatial statistics methods in image analysis, the present one is fully trainable. This is achieved by combining point process methods with statistical classifiers. Using simulated data, the method proposed in this paper will be shown to be more accurate and more robust to noise than standard image processing methods. On the publicly available SIMCEP benchmark for cell image analysis algorithms, the cell count performance of the present paper is significantly more accurate than results published elsewhere, especially when cells form dense clusters. Furthermore, the proposed system performs as well as a state-of-the-art algorithm for the computer-aided histological grading of meningiomas when combined with a simple k-nearest neighbor classifier for identifying proliferating cells.
Dynamic traffic assignment : genetic algorithms approach
DOT National Transportation Integrated Search
1997-01-01
Real-time route guidance is a promising approach to alleviating congestion on the nation's highways. A dynamic traffic assignment model is central to the development of guidance strategies. The artificial intelligence technique of genetic algorithm...
NASA Technical Reports Server (NTRS)
Peck, Charles C.; Dhawan, Atam P.; Meyer, Claudia M.
1991-01-01
A genetic algorithm is used to select the inputs to a neural network function approximator. In the application considered, modeling critical parameters of the space shuttle main engine (SSME), the functional relationship between measured parameters is unknown and complex. Furthermore, the number of possible input parameters is quite large. Many approaches have been used for input selection, but they are either subjective or do not consider the complex multivariate relationships between parameters. Due to the optimization and space searching capabilities of genetic algorithms they were employed to systematize the input selection process. The results suggest that the genetic algorithm can generate parameter lists of high quality without the explicit use of problem domain knowledge. Suggestions for improving the performance of the input selection process are also provided.
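The input-selection loop can be sketched as below: each chromosome is a binary mask over candidate inputs, and its fitness is the cross-validated score of a small network trained on the masked inputs. The synthetic regression data, network size and GA settings are assumptions, not the SSME parameters.

```python
# Hedged sketch: GA-driven input selection for a neural network function approximator.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(8)
X, y = make_regression(n_samples=300, n_features=25, n_informative=6, noise=5.0,
                       random_state=0)


def fitness(mask):
    """Cross-validated R^2 of a small network trained on the masked inputs."""
    if mask.sum() == 0:
        return -np.inf
    net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
    return cross_val_score(net, X[:, mask.astype(bool)], y, cv=3).mean()


pop = rng.integers(0, 2, size=(12, X.shape[1]))                # binary input masks
for _ in range(8):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[-6:]]                     # keep the best half
    cuts = rng.integers(1, X.shape[1], size=6)
    children = np.array([np.concatenate([parents[i % 6][:c], parents[(i + 1) % 6][c:]])
                         for i, c in enumerate(cuts)])         # one-point crossover
    flips = rng.random(children.shape) < 0.05                  # bit-flip mutation
    children = np.where(flips, 1 - children, children)
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(m) for m in pop])]
print("selected inputs:", np.flatnonzero(best))
```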
NASA Astrophysics Data System (ADS)
Ebrahimi, Mehdi; Jahangirian, Alireza
2017-12-01
An efficient strategy is presented for global shape optimization of wing sections with a parallel genetic algorithm. Several computational techniques are applied to increase the convergence rate and the efficiency of the method. A variable fidelity computational evaluation method is applied in which the expensive Navier-Stokes flow solver is complemented by an inexpensive multi-layer perceptron neural network for the objective function evaluations. A population dispersion method that consists of two phases, of exploration and refinement, is developed to improve the convergence rate and the robustness of the genetic algorithm. Owing to the nature of the optimization problem, a parallel framework based on the master/slave approach is used. The outcomes indicate that the method is able to find the global optimum with significantly lower computational time in comparison to the conventional genetic algorithm.
Sun, J; Wang, T; Li, Z D; Shao, Y; Zhang, Z Y; Feng, H; Zou, D H; Chen, Y J
2017-12-01
To reconstruct a vehicle-bicycle-cyclist crash accident and analyse the injuries using 3D laser scanning technology, multi-rigid-body dynamics and an optimized genetic algorithm, and to provide a biomechanical basis for the forensic identification of the cause of death. The vehicle was measured by 3D laser scanning technology. The multi-rigid-body models of the cyclist, bicycle and vehicle were developed based on the measurements. The value range of the optimization variables was set. A multi-objective genetic algorithm and the non-dominated sorting genetic algorithm were used to find the optimal solutions, which were compared to the recording of the surveillance video around the accident scene. The reconstruction result of the laser scanning of the vehicle was satisfactory. In the optimal solutions found by the genetic algorithm optimization method, the dynamic behaviours of the dummy, bicycle and vehicle corresponded to those recorded by the surveillance video. The injury parameters of the dummy were consistent with the situation and position of the real injuries on the cyclist in the accident. The motion status before the accident, the damage process during the crash and the mechanical analysis of the victim's injuries can be reconstructed using 3D laser scanning technology, multi-rigid-body dynamics and an optimized genetic algorithm, which have application value in the identification of injury manner and analysis of the cause of death in traffic accidents. Copyright© by the Editorial Department of Journal of Forensic Medicine
NASA Astrophysics Data System (ADS)
Wihartiko, F. D.; Wijayanti, H.; Virgantari, F.
2018-03-01
Genetic Algorithm (GA) is a common algorithm used to solve optimization problems with an artificial intelligence approach, as is the Particle Swarm Optimization (PSO) algorithm. Both algorithms have different advantages and disadvantages when applied to the optimization of the Model Integer Programming for the Bus Timetabling Problem (MIPBTP), in which the optimal number of trips is sought subject to various constraints. The comparison results show that the PSO algorithm is superior in terms of complexity, accuracy, iterations and program simplicity in finding the optimal solution.
Research on laser marking speed optimization by using genetic algorithm.
Wang, Dongyun; Yu, Qiwei; Zhang, Yu
2015-01-01
Laser Marking Machine is the most common coding equipment on product packaging lines. However, the speed of laser marking has become a bottleneck of production. In order to remove this bottleneck, a new method based on a genetic algorithm is designed. On the basis of this algorithm, a controller was designed and simulations and experiments were performed. The results show that using this algorithm could effectively improve laser marking efficiency by 25%.
Tag SNP selection via a genetic algorithm.
Mahdevar, Ghasem; Zahiri, Javad; Sadeghi, Mehdi; Nowzari-Dalini, Abbas; Ahrabian, Hayedeh
2010-10-01
Single Nucleotide Polymorphisms (SNPs) provide valuable information on human evolutionary history and may lead us to identify genetic variants responsible for human complex diseases. Unfortunately, molecular haplotyping methods are costly, laborious, and time consuming; therefore, algorithms for constructing full haplotype patterns from small available data through computational methods, Tag SNP selection problem, are convenient and attractive. This problem is proved to be an NP-hard problem, so heuristic methods may be useful. In this paper we present a heuristic method based on genetic algorithm to find reasonable solution within acceptable time. The algorithm was tested on a variety of simulated and experimental data. In comparison with the exact algorithm, based on brute force approach, results show that our method can obtain optimal solutions in almost all cases and runs much faster than exact algorithm when the number of SNP sites is large. Our software is available upon request to the corresponding author.
Research on rolling element bearing fault diagnosis based on genetic algorithm matching pursuit
NASA Astrophysics Data System (ADS)
Rong, R. W.; Ming, T. F.
2017-12-01
In order to solve the problem of slow computation speed, the matching pursuit algorithm is applied to rolling bearing fault diagnosis, and improvements are made in two aspects: the construction of the dictionary and the way atoms are searched. Specifically, the Gabor function, which reflects time-frequency localization characteristics well, is used to construct the dictionary, and a genetic algorithm is used to improve the search speed. A time-frequency analysis method based on the genetic algorithm matching pursuit (GAMP) algorithm is proposed. The setting of the algorithm's parameters to improve the decomposition results is studied. Simulation and experimental results illustrate that weak fault features of rolling bearings can be extracted effectively by the proposed method, while at the same time the computation speed increases obviously.
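The matching-pursuit side of the method can be sketched as below with a Gabor dictionary; for brevity the GA atom search is replaced by an exhaustive scan of a small discretized (u, s, f) grid, which the paper's genetic algorithm would search more efficiently, and the bearing-like test signal is synthetic.

```python
# Hedged sketch: matching pursuit over a discretized Gabor dictionary on a synthetic
# impulse-train signal standing in for a bearing fault vibration record.
import numpy as np

fs, n = 2000, 1024
t = np.arange(n) / fs
signal = 0.2 * np.sin(2 * np.pi * 35 * t)
for u0 in (0.1, 0.25, 0.4):                      # synthetic fault impulses
    signal += np.exp(-((t - u0) / 0.005) ** 2) * np.cos(2 * np.pi * 400 * (t - u0))


def gabor(u, s, f):
    """Unit-norm Gabor atom with translation u, scale s and frequency f."""
    atom = np.exp(-((t - u) / s) ** 2) * np.cos(2 * np.pi * f * (t - u))
    return atom / np.linalg.norm(atom)


params = [(u, s, f)
          for u in np.linspace(0, 0.5, 64)
          for s in (0.002, 0.005, 0.02)
          for f in (100, 200, 400, 800)]
dictionary = np.array([gabor(*p) for p in params])

residual = signal.copy()
for k in range(5):                                # extract the 5 strongest atoms
    coeffs = dictionary @ residual
    best = int(np.argmax(np.abs(coeffs)))
    residual = residual - coeffs[best] * dictionary[best]
    u, s, f = params[best]
    print(f"atom {k}: u={u:.3f}s s={s} f={f}Hz")
```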
Evolutionary fuzzy ARTMAP neural networks for classification of semiconductor defects.
Tan, Shing Chiang; Watada, Junzo; Ibrahim, Zuwairie; Khalid, Marzuki
2015-05-01
Wafer defect detection using an intelligent system is an approach to quality improvement in semiconductor manufacturing that aims to enhance process stability, increase production capacity, and improve yields. Occasionally, only a few records indicating defective units are available, and they form a minority group in a large database. Such a situation leads to an imbalanced data set problem, which poses a great challenge for machine-learning techniques to solve effectively. In addition, the database may comprise overlapping samples of different classes. This paper introduces two models of evolutionary fuzzy ARTMAP (FAM) neural networks to deal with imbalanced data set problems in semiconductor manufacturing operations. In particular, the FAM models and hybrid genetic algorithms are integrated in the proposed evolutionary artificial neural networks (EANNs) to classify an imbalanced data set. In addition, one of the proposed EANNs incorporates a facility to learn overlapping samples of different classes from the imbalanced data environment. The classification results of the proposed evolutionary FAM neural networks are presented, compared, and analyzed using several classification metrics. The outcomes positively indicate the effectiveness of the proposed networks in handling classification problems with imbalanced data sets.
A novel method for in silico identification of regulatory SNPs in human genome.
Li, Rong; Zhong, Dexing; Liu, Ruiling; Lv, Hongqiang; Zhang, Xinman; Liu, Jun; Han, Jiuqiang
2017-02-21
Regulatory single nucleotide polymorphisms (rSNPs), a kind of functional noncoding genetic variant, can affect gene expression in a regulatory way, and they are thought to be associated with increased susceptibility to complex diseases. Here a novel computational approach to identify potential rSNPs is presented. Unlike most other rSNP-finding methods, which are based on the hypothesis that SNPs causing large allele-specific changes in transcription factor binding affinities are more likely to play regulatory roles, we use a set of documented, experimentally verified rSNPs and nonfunctional background SNPs to train classifiers, so that the discriminating features are found. To characterize variants, an extensive range of characteristics, such as sequence context, DNA structure and evolutionary conservation, is analyzed. A support vector machine is adopted to build the classifier model, together with an ensemble method to deal with the unbalanced data. 10-fold cross-validation shows that our method can achieve a sensitivity of ~78% and a specificity of ~82%. Furthermore, our method performs better than some other algorithms based on the aforementioned hypothesis in handling false positives. The original data and the MATLAB source code are available at https://sourceforge.net/projects/rsnppredict/. Copyright © 2016 Elsevier Ltd. All rights reserved.
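One common way to pair an SVM with an ensemble strategy for unbalanced data, as described above, is to train each base SVM on all minority-class samples plus an equally sized random subsample of the majority class and combine them by majority vote. The scikit-learn sketch below is a hypothetical analogue on synthetic data, not the authors' MATLAB pipeline.

```python
# Hypothetical SVM ensemble for imbalanced data: each base SVM sees all
# minority-class samples plus a random, equally sized subsample of the majority
# class; predictions are combined by majority vote. X and y are synthetic
# stand-ins for the rSNP feature matrix and labels.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.9, 0.1],
                           random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=2)

minority = np.flatnonzero(y_tr == 1)
majority = np.flatnonzero(y_tr == 0)

models = []
for _ in range(11):                                   # odd number -> clean vote
    sub = rng.choice(majority, size=minority.size, replace=False)
    idx = np.concatenate([minority, sub])
    clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_tr[idx], y_tr[idx])
    models.append(clf)

votes = np.mean([m.predict(X_te) for m in models], axis=0)
y_hat = (votes >= 0.5).astype(int)

sens = np.mean(y_hat[y_te == 1] == 1)                 # sensitivity on minority
spec = np.mean(y_hat[y_te == 0] == 0)                 # specificity on majority
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```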
Hajimani, Elmira; Ruano, M G; Ruano, A E
2017-07-01
This paper presents a Radial Basis Functions Neural Network (RBFNN) based detection system, for automatic identification of Cerebral Vascular Accidents (CVA) through analysis of Computed Tomographic (CT) images. For the design of a neural network classifier, a Multi Objective Genetic Algorithm (MOGA) framework is used to determine the architecture of the classifier, its corresponding parameters and input features by maximizing the classification precision, while ensuring generalization. This approach considers a large number of input features, comprising first and second order pixel intensity statistics, as well as symmetry/asymmetry information with respect to the ideal mid-sagittal line. Values of specificity of 98% and sensitivity of 98% were obtained, at pixel level, by an ensemble of non-dominated models generated by MOGA, in a set of 150 CT slices (1,867,602 pixels), marked by a neuroradiologist. This approach also compares favorably at a lesion level with three other published solutions, in terms of specificity (86% compared with 84%), degree of coincidence of marked lesions (89% compared with 77%) and classification accuracy rate (96% compared with 88%). Copyright © 2017. Published by Elsevier B.V.
Classification of yeast cells from image features to evaluate pathogen conditions
NASA Astrophysics Data System (ADS)
van der Putten, Peter; Bertens, Laura; Liu, Jinshuo; Hagen, Ferry; Boekhout, Teun; Verbeek, Fons J.
2007-01-01
Morphometrics derived from images through image analysis may reveal differences between classes of objects present in the images. We have performed an image-feature-based classification for the pathogenic yeast Cryptococcus neoformans. Building and analyzing image collections from the yeast under different environmental or genetic conditions may help to diagnose a new, "unseen" situation. Diagnosis here means that retrieval of the relevant information from the image collection is at hand each time a new "sample" is presented. The basidiomycetous yeast Cryptococcus neoformans can cause infections such as meningitis or pneumonia. The presence of an extra-cellular capsule is known to be related to virulence. This paper reports on the approach towards developing classifiers for detecting potentially more or less virulent cells in a sample, i.e. an image, by using a range of features derived from the shape or density distribution. The classifier can then be used for automating screening and annotating existing image collections. In addition, we present our methods for creating samples, collecting images, preprocessing the images, identifying "yeast cells" and extracting features from the images. We compare various expertise-based and fully automated methods of feature selection, benchmark a range of classification algorithms and illustrate successful application to this particular domain.
Evolving binary classifiers through parallel computation of multiple fitness cases.
Cagnoni, Stefano; Bergenti, Federico; Mordonini, Monica; Adorni, Giovanni
2005-06-01
This paper describes two versions of a novel approach to developing binary classifiers, based on two evolutionary computation paradigms: cellular programming and genetic programming. Such an approach achieves high computational efficiency both during evolution and at runtime. Evolution speed is optimized by allowing multiple solutions to be computed in parallel. Runtime performance is optimized explicitly, using parallel computation, in the case of cellular programming, or implicitly, by taking advantage of the intrinsic parallelism of bitwise operators on standard sequential architectures, in the case of genetic programming. The approach was tested on a digit recognition problem and compared with a reference classifier.
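The "intrinsic parallelism of bitwise operators" mentioned above can be illustrated with a toy example: pack one fitness case per bit of an integer so that a single pass of bitwise operations evaluates every case at once. This is an illustrative reconstruction of the general technique, not the authors' implementation.

```python
# Toy illustration of evaluating many fitness cases at once with bitwise ops:
# each input variable is an integer whose i-th bit holds that variable's value
# in fitness case i, so one pass of &, |, ^, ~ evaluates every case in parallel.
from itertools import product

def pack_truth_table(n_vars):
    """Pack all 2**n_vars fitness cases, one case per bit of each input integer."""
    cases = list(product([0, 1], repeat=n_vars))
    packed = [sum(case[v] << i for i, case in enumerate(cases)) for v in range(n_vars)]
    return packed, len(cases)

def fitness(candidate, target, n_cases):
    """Count the fitness cases where the candidate expression matches the target."""
    agree = ~(candidate ^ target) & ((1 << n_cases) - 1)
    return bin(agree).count("1")

inputs, n_cases = pack_truth_table(3)
a, b, c = inputs
target = (a & b) | c                  # desired behaviour over all 8 cases at once
candidate = (a | b) & c               # an evolved expression being scored
print(fitness(candidate, target, n_cases), "of", n_cases, "cases correct")
```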
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tumuluru, Jaya Shankar; McCulloch, Richard Chet James
In this work a new hybrid genetic algorithm was developed which combines a rudimentary adaptive steepest ascent hill climbing algorithm with a sophisticated evolutionary algorithm in order to optimize complex multivariate design problems. By combining a highly stochastic algorithm (evolutionary) with a simple deterministic optimization algorithm (adaptive steepest ascent) computational resources are conserved and the solution converges rapidly when compared to either algorithm alone. In genetic algorithms natural selection is mimicked by random events such as breeding and mutation. In the adaptive steepest ascent algorithm each variable is perturbed by a small amount and the variable that caused the most improvement is incremented by a small step. If the direction of most benefit is exactly opposite of the previous direction with the most benefit then the step size is reduced by a factor of 2, thus the step size adapts to the terrain. A graphical user interface was created in MATLAB to provide an interface between the hybrid genetic algorithm and the user. Additional features such as bounding the solution space and weighting the objective functions individually are also built into the interface. The algorithm developed was tested to optimize the functions developed for a wood pelleting process. Using process variables (such as feedstock moisture content, die speed, and preheating temperature) pellet properties were appropriately optimized. Specifically, variables were found which maximized unit density, bulk density, tapped density, and durability while minimizing pellet moisture content and specific energy consumption. The time and computational resources required for the optimization were dramatically decreased using the hybrid genetic algorithm when compared to MATLAB's native evolutionary optimization tool.
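The adaptive steepest-ascent step described above (perturb each variable, move along the most improving direction, and halve the step when that direction reverses) can be sketched as follows. The objective function, initial step size and iteration count are illustrative, and the evolutionary half of the hybrid is omitted.

```python
# Minimal sketch of the adaptive steepest-ascent idea: perturb each variable,
# step along the variable giving the most improvement, and halve the step size
# whenever the best direction flips sign or no improving move exists.
# The objective here is an arbitrary smooth test function, not the pellet model.
import numpy as np

def objective(x):
    return -np.sum((x - np.array([1.0, -2.0, 0.5])) ** 2)   # maximise

def adaptive_hill_climb(x0, step=0.5, iters=200):
    x = np.array(x0, dtype=float)
    last_dir = np.zeros_like(x)
    for _ in range(iters):
        base = objective(x)
        gains, dirs = np.zeros_like(x), np.zeros_like(x)
        for i in range(x.size):
            for sign in (+1.0, -1.0):
                trial = x.copy()
                trial[i] += sign * step
                gain = objective(trial) - base
                if gain > gains[i]:
                    gains[i], dirs[i] = gain, sign
        if gains.max() <= 0:
            step *= 0.5                       # no improving move: refine the step
            continue
        i = int(np.argmax(gains))
        if last_dir[i] == -dirs[i]:
            step *= 0.5                       # direction reversed: adapt to terrain
        x[i] += dirs[i] * step
        last_dir = np.zeros_like(x)
        last_dir[i] = dirs[i]
    return x

print(adaptive_hill_climb([0.0, 0.0, 0.0]))
```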
Automated Test Assembly for Cognitive Diagnosis Models Using a Genetic Algorithm
ERIC Educational Resources Information Center
Finkelman, Matthew; Kim, Wonsuk; Roussos, Louis A.
2009-01-01
Much recent psychometric literature has focused on cognitive diagnosis models (CDMs), a promising class of instruments used to measure the strengths and weaknesses of examinees. This article introduces a genetic algorithm to perform automated test assembly alongside CDMs. The algorithm is flexible in that it can be applied whether the goal is to…
ERIC Educational Resources Information Center
Tran, Huu-Khoa; Chiou, Juing -Shian; Peng, Shou-Tao
2016-01-01
In this paper, a Genetic Algorithm Optimization (GAO) education-software-based Fuzzy Logic Controller (GAO-FLC) for simulating the flight motion control of Unmanned Aerial Vehicles (UAVs) is designed and its feasibility is examined. The generated flight trajectories integrate the Scaling Factor (SF) fuzzy controller gains optimized by the GAO algorithm. The…
Rausch, Tobias; Thomas, Alun; Camp, Nicola J.; Cannon-Albright, Lisa A.; Facelli, Julio C.
2008-01-01
This paper describes a novel algorithm to analyze genetic linkage data using pattern recognition techniques and genetic algorithms (GA). The method allows a search for regions of the chromosome that may contain genetic variations that jointly predispose individuals to a particular disease. The method uses correlation analysis, filtering theory and genetic algorithms to achieve this goal. Because current genome scans use from hundreds to hundreds of thousands of markers, two versions of the method have been implemented. The first is an exhaustive analysis version that can be used to visualize, explore, and analyze small genetic data sets for two-marker correlations; the second is a GA version, which uses a parallel implementation allowing searches of higher-order correlations in large data sets. Results on simulated data sets indicate that the method can be informative in the identification of major disease loci and gene-gene interactions in genome-wide linkage data and that further exploration of these techniques is justified. The results presented for both variants of the method show that it can help genetic epidemiologists to identify promising combinations of genetic factors that might predispose to complex disorders. In particular, the correlation analysis of IBD expression patterns might hint at possible gene-gene interactions, and the filtering might be a fruitful approach to distinguish true correlation signals from noise. PMID:18547558
A Swarm Optimization Genetic Algorithm Based on Quantum-Behaved Particle Swarm Optimization.
Sun, Tao; Xu, Ming-Hai
2017-01-01
Quantum-behaved particle swarm optimization (QPSO) is a variant of the traditional particle swarm optimization (PSO). The QPSO, which was originally developed for continuous search spaces, outperforms the traditional PSO in search ability. This paper analyzes the main factors that impact the search ability of QPSO and converts the particle movement formula into a mutation condition by introducing a rejection region, thus proposing a new binary algorithm, named the swarm optimization genetic algorithm (SOGA), because in form it is more like a genetic algorithm (GA) than PSO. SOGA has crossover and mutation operators like a GA but does not need the crossover and mutation probabilities to be set, so it has fewer parameters to control. The proposed algorithm was tested with several nonlinear high-dimensional functions in the binary search space, and the results were compared with those from BPSO, BQPSO, and GA. The experimental results show that SOGA is distinctly superior to the other three algorithms in terms of solution accuracy and convergence.
Automatic classification of protein structures using physicochemical parameters.
Mohan, Abhilash; Rao, M Divya; Sunderrajan, Shruthi; Pennathur, Gautam
2014-09-01
Protein classification is the first step to functional annotation; SCOP and Pfam databases are currently the most relevant protein classification schemes. However, the disproportion between the number of three-dimensional (3D) protein structures generated and their classification into relevant superfamilies/families emphasizes the need for automated classification schemes. Predicting the function of novel proteins based on sequence information alone has proven to be a major challenge. The present study focuses on the use of physicochemical parameters in conjunction with machine learning algorithms (Naive Bayes, Decision Trees, Random Forest and Support Vector Machines) to classify proteins into their respective SCOP superfamily/Pfam family, using sequence-derived information. Spectrophores™, a 1D descriptor of the 3D molecular field surrounding a structure, was used as a benchmark to compare the performance of the physicochemical parameters. The machine learning algorithms were modified to select features based on information gain for each SCOP superfamily/Pfam family. The effect of combining physicochemical parameters and spectrophores on classification accuracy (CA) was studied. Machine learning algorithms trained with the physicochemical parameters consistently classified SCOP superfamilies and Pfam families with a classification accuracy above 90%, while spectrophores performed with a CA of around 85%. Feature selection improved classification accuracy for both physicochemical-parameter and spectrophore based machine learning algorithms. Combining both attributes resulted in a marginal loss of performance. Physicochemical parameters were able to classify proteins from both schemes with classification accuracy ranging from 90% to 96%. These results suggest the usefulness of this method in classifying proteins from amino acid sequences.
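An information-gain-style feature selection step followed by a classifier, as described above, could be sketched in scikit-learn as follows. Mutual information serves as the information-gain analogue, a Random Forest stands in for the set of classifiers compared in the paper, and the descriptor matrix is synthetic rather than real SCOP/Pfam data.

```python
# Hypothetical pipeline mirroring the setup described above: mutual-information
# feature selection (an information-gain analogue) followed by a Random Forest,
# evaluated by cross-validation on a synthetic descriptor matrix.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=600, n_features=50, n_informative=12,
                           n_classes=4, n_clusters_per_class=1, random_state=0)

pipe = make_pipeline(
    SelectKBest(mutual_info_classif, k=20),       # keep the 20 most informative
    RandomForestClassifier(n_estimators=200, random_state=0),
)
scores = cross_val_score(pipe, X, y, cv=5)
print(f"mean CV accuracy: {scores.mean():.3f}")
```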
Liu, Chun; Kroll, Andreas
2016-01-01
Multi-robot task allocation determines the task sequence and distribution for a group of robots in multi-robot systems. It is a constrained combinatorial optimization problem that becomes more complex in the case of cooperative tasks, because they introduce additional spatial and temporal constraints. To solve multi-robot task allocation problems with cooperative tasks efficiently, a subpopulation-based genetic algorithm, a crossover-free genetic algorithm employing mutation operators and elitism selection in each subpopulation, is developed in this paper. Moreover, the impact of mutation operators (swap, insertion, inversion, displacement, and their various combinations) is analyzed when solving several industrial plant inspection problems. The experimental results show that: (1) the proposed genetic algorithm can obtain better solutions than the tested binary tournament genetic algorithm with partially mapped crossover; (2) inversion mutation performs better than the other tested mutation operators when solving problems without cooperative tasks, and the swap-inversion combination performs better than the other tested mutation operators/combinations when solving problems with cooperative tasks. As it is difficult to produce all desired effects with a single mutation operator, using multiple mutation operators (including both inversion and swap) is suggested when solving similar combinatorial optimization problems.
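The four mutation operators named above can be written compactly for a permutation-encoded task sequence; the snippet below is an illustration of the operators only, not the subpopulation-based GA itself.

```python
# Minimal versions of the four mutation operators discussed above, acting on a
# permutation-encoded task sequence (illustration only, not the full GA).
import random

def swap(seq):
    s = seq[:]
    i, j = random.sample(range(len(s)), 2)
    s[i], s[j] = s[j], s[i]
    return s

def insertion(seq):
    s = seq[:]
    i, j = random.sample(range(len(s)), 2)
    s.insert(j, s.pop(i))                      # remove one task, reinsert elsewhere
    return s

def inversion(seq):
    s = seq[:]
    i, j = sorted(random.sample(range(len(s)), 2))
    s[i:j + 1] = reversed(s[i:j + 1])          # reverse a contiguous block
    return s

def displacement(seq):
    s = seq[:]
    i, j = sorted(random.sample(range(len(s)), 2))
    block, rest = s[i:j + 1], s[:i] + s[j + 1:]
    k = random.randint(0, len(rest))           # move the block to a new position
    return rest[:k] + block + rest[k:]

random.seed(0)
tasks = list(range(10))
for op in (swap, insertion, inversion, displacement):
    print(op.__name__, op(tasks))
```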
Dong, Yu-Shuang; Xu, Gao-Chao; Fu, Xiao-Dong
2014-01-01
The cloud platform provides various services to users. More and more cloud centers provide infrastructure as the main way of operating. To improve the utilization rate of the cloud center and to decrease the operating cost, the cloud center provides services according to the requirements of users by sharing the resources through virtualization. Considering both QoS for users and cost saving for cloud computing providers, we try to maximize performance and minimize energy cost as well. In this paper, we propose a distributed parallel genetic algorithm (DPGA) as a placement strategy for virtual machine deployment on the cloud platform. In the first stage it executes the genetic algorithm in parallel and in a distributed manner on several selected physical hosts. It then executes the genetic algorithm of the second stage with the solutions obtained from the first stage as the initial population. The solution calculated by the genetic algorithm of the second stage is the optimal one of the proposed approach. The experimental results show that the proposed placement strategy of VM deployment can ensure QoS for users and is more effective and more energy efficient than other placement strategies on the cloud platform. PMID:25097872
Genetic algorithm for nuclear data evaluation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Arthur, Jennifer Ann
These are slides on a genetic algorithm for nuclear data evaluation. The following topics are covered: initial population, fitness (outer loop), calculating fitness, selection (first part of inner loop), reproduction (second part of inner loop), the solution, and examples.
MEMS-based sensing and algorithm development for fall detection and gait analysis
NASA Astrophysics Data System (ADS)
Gupta, Piyush; Ramirez, Gabriel; Lie, Donald Y. C.; Dallas, Tim; Banister, Ron E.; Dentino, Andrew
2010-02-01
Falls by the elderly are highly detrimental to health, frequently resulting in injury, high medical costs, and even death. Using a MEMS-based sensing system, algorithms are being developed for detecting falls and monitoring the gait of elderly and disabled persons. In this study, wireless sensors utilizing Zigbee protocols were incorporated into planar shoe insoles and a waist-mounted device. The insole contains four sensors to measure the pressure applied by the foot. A MEMS-based tri-axial accelerometer is embedded in the insert and a second one is utilized by the waist-mounted device. The primary fall detection algorithm is derived from the waist accelerometer. The differential acceleration is calculated from samples received in 1.5 s time intervals. This differential acceleration provides the quantification via an energy index. From this index one may characterize different gaits and identify fall events. Once a pre-determined index threshold is exceeded, the algorithm classifies an event as a fall or a stumble. The secondary algorithm is derived from frequency analysis techniques. The analysis consists of wavelet transforms conducted on the waist accelerometer data. The insole pressure data are then used to highlight discrepancies in the transforms, providing more accurate data for classifying gait and/or detecting falls. The range of the transform amplitude in the fourth iteration of a Daubechies-6 transform was found sufficient to detect and classify fall events.
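A sketch of the two ideas described above, on a synthetic acceleration trace: a differential-acceleration energy index over 1.5 s windows with a threshold, and a 4-level Daubechies-6 wavelet decomposition (via the PyWavelets package, assumed available). The sampling rate, threshold value and signal are all assumptions, not the study's parameters.

```python
# (1) differential-acceleration "energy index" over 1.5 s windows, thresholded
#     to flag possible falls; (2) Daubechies-6 wavelet decomposition whose
#     level-4 detail amplitude range can help separate falls from normal gait.
# Sampling rate, threshold and the synthetic signal are illustrative only.
import numpy as np
import pywt

fs = 50                                        # Hz (assumed)
t = np.arange(0, 20, 1 / fs)
accel = 1.0 + 0.1 * np.sin(2 * np.pi * 1.8 * t)          # walking, in g
accel[500:505] += np.array([3.0, 4.5, -2.0, 1.5, 0.5])   # simulated fall impact

win = int(1.5 * fs)
for start in range(0, len(accel) - win, win):
    seg = accel[start:start + win]
    energy_index = np.sum(np.diff(seg) ** 2)   # differential-acceleration energy
    if energy_index > 5.0:                     # assumed threshold
        print(f"possible fall between {start/fs:.1f}s and {(start+win)/fs:.1f}s")

coeffs = pywt.wavedec(accel, "db6", level=4)   # [cA4, cD4, cD3, cD2, cD1]
d4 = coeffs[1]                                 # level-4 detail coefficients
print("level-4 detail amplitude range:", d4.max() - d4.min())
```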
Trong Bui, Duong; Nguyen, Nhan Duc; Jeong, Gu-Min
2018-06-25
Human activity recognition and pedestrian dead reckoning are interesting fields because of their important uses in daily-life healthcare. Currently, these fields face many challenges, one of which is the lack of a robust algorithm with high performance. This paper proposes a new method to implement a robust step detection and adaptive distance estimation algorithm based on the classification of five daily wrist activities during walking at various speeds using a smart band. The key idea is that the non-parametric adaptive distance estimator is run after two activity classifiers and a robust step detector. In this study, two classifiers perform two phases of recognizing five wrist activities during walking. Then, a robust step detection algorithm, which is integrated with an adaptive threshold and a peak and valley correction algorithm, is applied to the classified activities to detect the walking steps. In addition, misclassified activities are fed back to the previous layer. Finally, three adaptive distance estimators, which are based on a non-parametric model of the average walking speed, calculate the length of each stride. The experimental results show that the average classification accuracy is about 99%, and the accuracy of the step detection is 98.7%. The error of the estimated distance is 2.2-4.2% depending on the type of wrist activity.
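A much simplified sketch, not the paper's method: adaptive-threshold peak counting on an acceleration-magnitude trace with scipy.signal.find_peaks, followed by a toy distance estimate in which stride length grows linearly with cadence. All constants (sampling rate, window length, stride model) are invented for illustration.

```python
# Simplified adaptive-threshold step detection plus a toy cadence-dependent
# distance estimate; constants and the synthetic signal are illustrative only.
import numpy as np
from scipy.signal import find_peaks

fs = 50                                          # Hz (assumed)
t = np.arange(0, 30, 1 / fs)
acc_mag = 1.0 + 0.3 * np.sin(2 * np.pi * 1.9 * t) + 0.05 * np.random.randn(t.size)

steps = 0
distance = 0.0
win = 2 * fs                                     # 2-second analysis windows
for start in range(0, acc_mag.size - win, win):
    seg = acc_mag[start:start + win]
    thr = seg.mean() + 0.5 * seg.std()           # adaptive per-window threshold
    peaks, _ = find_peaks(seg, height=thr, distance=int(0.3 * fs))
    cadence = peaks.size / 2.0                   # steps per second in this window
    stride = 0.3 + 0.3 * cadence                 # toy speed-adaptive stride (m)
    steps += peaks.size
    distance += peaks.size * stride

print(f"detected {steps} steps, estimated distance {distance:.1f} m")
```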
Fernandez-Lozano, C.; Canto, C.; Gestal, M.; Andrade-Garda, J. M.; Rabuñal, J. R.; Dorado, J.; Pazos, A.
2013-01-01
Given the background of the use of Neural Networks in problems of apple juice classification, this paper aims at implementing a newly developed method in the field of machine learning: Support Vector Machines (SVM). Therefore, a hybrid model that combines genetic algorithms and support vector machines is suggested in such a way that, when using the SVM as the fitness function of the Genetic Algorithm (GA), the most representative variables for a specific classification problem can be selected. PMID:24453933
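A compact sketch of the GA-SVM idea described above: individuals are binary feature masks and the fitness of a mask is the cross-validated accuracy of an SVM trained on the selected columns. The data are synthetic stand-ins, and the GA settings (population size, mutation rate) are assumptions.

```python
# GA feature selection with an SVM-based fitness function (illustrative sketch).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X, y = make_classification(n_samples=300, n_features=30, n_informative=8,
                           random_state=3)

def fitness(mask):
    if not mask.any():
        return 0.0
    return cross_val_score(SVC(kernel="rbf"), X[:, mask], y, cv=3).mean()

pop = rng.random((20, X.shape[1])) < 0.3          # initial random feature masks
for gen in range(15):
    fits = np.array([fitness(ind) for ind in pop])
    order = np.argsort(fits)[::-1]
    parents = pop[order[:10]]                     # truncation selection
    kids = []
    for _ in range(10):
        p1, p2 = parents[rng.integers(10)], parents[rng.integers(10)]
        child = np.where(rng.random(X.shape[1]) < 0.5, p1, p2)   # uniform crossover
        child ^= rng.random(X.shape[1]) < 0.05                   # bit-flip mutation
        kids.append(child)
    pop = np.vstack([parents, kids])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("selected features:", np.flatnonzero(best), "fitness:", round(fitness(best), 3))
```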
Fuel management optimization using genetic algorithms and expert knowledge
DOE Office of Scientific and Technical Information (OSTI.GOV)
DeChaine, M.D.; Feltus, M.A.
1996-09-01
The CIGARO fuel management optimization code based on genetic algorithms is described and tested. The test problem optimized the core lifetime for a pressurized water reactor with a penalty function constraint on the peak normalized power. A bit-string genotype encoded the loading patterns, and genotype bias was reduced with additional bits. Expert knowledge about fuel management was incorporated into the genetic algorithm. Regional crossover exchanged physically adjacent fuel assemblies and improved the optimization slightly. Biasing the initial population toward a known priority table significantly improved the optimization.
Optimal placement of tuning masses on truss structures by genetic algorithms
NASA Technical Reports Server (NTRS)
Ponslet, Eric; Haftka, Raphael T.; Cudney, Harley H.
1993-01-01
Optimal placement of tuning masses, actuators and other peripherals on large space structures is a combinatorial optimization problem. This paper surveys several techniques for solving this problem. The genetic algorithm approach to the solution of the placement problem is described in detail. An example of minimizing the difference between the two lowest frequencies of a laboratory truss by adding tuning masses is used for demonstrating some of the advantages of genetic algorithms. The relative efficiencies of different codings are compared using the results of a large number of optimization runs.
Application of a Genetic Algorithm and Multi Agent System to Explore Emergent Patterns of Social Rationality and a Distress-Based Model for Deceit in the Workplace
2008-06-01
Multi-objective Optimization Design of Gear Reducer Based on Adaptive Genetic Algorithms
NASA Astrophysics Data System (ADS)
Li, Rui; Chang, Tian; Wang, Jianwei; Wei, Xiaopeng; Wang, Jinming
2008-11-01
An adaptive Genetic Algorithm (GA) is introduced to solve the multi-objective optimization design of a gear reducer. Firstly, based on considerations such as the structure and strength of the reducer, a multi-objective optimization model of the helical gear reducer is established. Then an adaptive GA based on a fuzzy controller is introduced to address the multi-objective, multi-parameter, multi-constraint nature of the problem. Finally, a numerical example is presented to show the advantages of this approach and the effectiveness of an adaptive genetic algorithm in the optimization design of a reducer.
NASA Astrophysics Data System (ADS)
Wu, Q. H.; Ma, J. T.
1993-09-01
A preliminary investigation into the application of genetic algorithms to optimal reactive power dispatch and voltage control is presented. The application was carried out on the (United Kingdom) National Grid 48-bus network model, using a novel genetic search approach. Simulation results, compared with those obtained using nonlinear programming methods, are included to show the potential of the genetic search methodology in the economical and secure operation of power systems.
Intelligent optimization algorithm for a large-scale structural design
NASA Astrophysics Data System (ADS)
Dominique, Stephane
The implementation of an automated decision support system in the field of design and structural optimisation can give a significant advantage to any industry working on mechanical designs. Indeed, by providing solution ideas to a designer or by upgrading existing design solutions while the designer is not at work, the system may reduce the project cycle time, or allow more time to produce a better design. This thesis presents a new approach to automate a design process based on Case-Based Reasoning (CBR), in combination with a new genetic algorithm named Genetic Algorithm with Territorial core Evolution (GATE). This approach was developed in order to reduce the operating cost of the process. However, as the system implementation cost is quite high, the approach is better suited to large-scale design problems, and particularly to design problems that the designer plans to solve for many different specification sets. First, the CBR process uses a databank filled with every known solution to similar design problems. Then, the closest solutions to the current problem in terms of specifications are selected. After this, during the adaptation phase, an artificial neural network (ANN) interpolates amongst known solutions to produce an additional solution to the current problem using the current specifications as inputs. Each solution produced and selected by the CBR is then used to initialize the population of an island of the genetic algorithm. The algorithm optimises the solution further during the refinement phase. Using progressive refinement, the algorithm starts with only the most important variables for the problem. Then, as the optimisation progresses, the remaining variables are gradually introduced, layer by layer. The genetic algorithm used is a new algorithm specifically created during this thesis to solve optimisation problems from the field of mechanical device structural design. The algorithm, named GATE, is essentially a real-number genetic algorithm that prevents new individuals from being born too close to previously evaluated solutions. The restricted area becomes smaller or larger during the optimisation to allow global or local search when necessary. Also, a new search operator named the Substitution Operator is incorporated in GATE. This operator allows an ANN surrogate model to guide the algorithm toward the most promising areas of the design space. The suggested CBR approach and GATE were tested on several simple test problems, as well as on the industrial problem of designing a gas turbine engine rotor disc. These results are compared with results obtained for the same problems by many other popular optimisation algorithms, such as (depending on the problem) gradient algorithms, a binary genetic algorithm, a real-number genetic algorithm, a genetic algorithm using multiple-parent crossovers, a differential evolution genetic algorithm, the Hooke & Jeeves generalized pattern search method and POINTER from the software I-SIGHT 3.5. Results show that GATE is quite competitive, giving the best results for 5 of the 6 constrained optimisation problems. GATE also provided the best results of all on problems produced by a Maximum Set Gaussian landscape generator. Finally, GATE provided a disc 4.3% lighter than the best other tested algorithm (POINTER) for the gas turbine engine rotor disc problem. One drawback of GATE is its lower efficiency on highly multimodal unconstrained problems, for which it gave quite poor results relative to its implementation cost.
To conclude, according to the preliminary results obtained during this thesis, the suggested CBR process, combined with GATE, seems to be a very good candidate to automate and accelerate the structural design of mechanical devices, potentially reducing significantly the cost of industrial preliminary design processes.
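A minimal, hypothetical illustration of the territorial idea attributed to GATE above: candidate offspring are rejected if they land within a protection radius of any previously evaluated solution, and the radius adapts over the run. This is not the original GATE implementation; the test function, radius schedule and GA settings are invented.

```python
# Territorial rejection sketch: offspring too close to any previously evaluated
# point are discarded, and the protection radius shrinks as the run progresses.
import numpy as np

rng = np.random.default_rng(4)

def sphere(x):                                   # simple test objective (minimise)
    return float(np.sum(x ** 2))

dim, pop_size, gens = 5, 20, 60
lo, hi = -5.0, 5.0
pop = rng.uniform(lo, hi, (pop_size, dim))
archive = [p.copy() for p in pop]                # every evaluated solution so far

for gen in range(gens):
    radius = 1.0 * (1.0 - gen / gens) + 0.05     # territory shrinks over time
    fits = np.array([sphere(p) for p in pop])
    parents = pop[np.argsort(fits)[: pop_size // 2]]
    children = []
    while len(children) < pop_size - len(parents):
        p1, p2 = parents[rng.integers(len(parents), size=2)]
        child = np.clip(0.5 * (p1 + p2) + rng.normal(0, 0.3, dim), lo, hi)
        dists = np.linalg.norm(np.array(archive) - child, axis=1)
        if dists.min() > radius:                 # respect existing territories
            children.append(child)
            archive.append(child.copy())
        else:
            radius *= 0.95                       # relax if the space is crowded
    pop = np.vstack([parents, children])

print("best found:", pop[np.argmin([sphere(p) for p in pop])])
```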
NASA Astrophysics Data System (ADS)
Sun, Xiuqiao; Wang, Jian
2018-07-01
Freeway service patrol (FSP) is considered to be an effective method for incident management and can help transportation agency decision-makers alter existing route coverage and fleet allocation. This paper investigates the FSP problem of patrol routing design and fleet allocation, with the objective of minimizing the overall average incident response time. While the simulated annealing (SA) algorithm and its improvements have been applied to solve this problem, they often become trapped in locally optimal solutions. Moreover, the issue of search efficiency remains to be further addressed. In this paper, we employ the genetic algorithm (GA) and SA to solve the FSP problem. To maintain population diversity and avoid premature convergence, a niche strategy is incorporated into the traditional genetic algorithm. We also employ an elitist strategy to speed up the convergence. Numerical experiments have been conducted with the help of the Sioux Falls network. Results show that the GA slightly outperforms the dual-based greedy (DBG) algorithm, the very large-scale neighborhood searching (VLNS) algorithm, the SA algorithm and the scenario algorithm.
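One common way to realise the niche strategy and elitism mentioned above is fitness sharing, which de-rates individuals that have many close neighbours while the best raw individual survives unchanged. The snippet below is a generic, hedged illustration, not the paper's FSP-specific GA.

```python
# Fitness sharing (niche strategy) plus elitism, on a toy 2-D population.
import numpy as np

def shared_fitness(pop, raw_fitness, sigma_share=0.5, alpha=1.0):
    """Divide each raw fitness by its niche count (maximisation assumed)."""
    dists = np.linalg.norm(pop[:, None, :] - pop[None, :, :], axis=2)
    sharing = np.where(dists < sigma_share, 1.0 - (dists / sigma_share) ** alpha, 0.0)
    niche_count = sharing.sum(axis=1)          # >= 1 because d(i, i) = 0
    return raw_fitness / niche_count

rng = np.random.default_rng(7)
pop = rng.uniform(-2, 2, (30, 2))
raw = np.exp(-np.sum(pop ** 2, axis=1))        # toy raw fitness, higher is better

shared = shared_fitness(pop, raw)
elite = pop[np.argmax(raw)]                    # elitism: the best survives as-is
parents = pop[np.argsort(shared)[::-1][:15]]   # select parents on shared fitness
print("elite:", elite, "parents selected:", parents.shape[0])
```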
NASA Astrophysics Data System (ADS)
Khan, Asif; Ryoo, Chang-Kyung; Kim, Heung Soo
2017-04-01
This paper presents a comparative study of different classification algorithms for the classification of various types of inter-ply delaminations in smart composite laminates. Improved layerwise theory is used to model delamination at different interfaces along the thickness and longitudinal directions of the smart composite laminate. The input-output data obtained through a surface-bonded piezoelectric sensor and actuator are analyzed by the system identification algorithm to obtain the system parameters. The identified parameters for the healthy and delaminated structures are supplied as input data to the classification algorithms. The classification algorithms considered in this study are ZeroR, Classification via regression, Naïve Bayes, Multilayer Perceptron, Sequential Minimal Optimization, Multiclass-Classifier, and Decision tree (J48). The open-source software Waikato Environment for Knowledge Analysis (WEKA) is used to evaluate the classification performance of the classifiers mentioned above via 75-25 holdout and leave-one-sample-out cross-validation, in terms of classification accuracy, precision, recall, kappa statistic and ROC area.
Automated spike sorting algorithm based on Laplacian eigenmaps and k-means clustering.
Chah, E; Hok, V; Della-Chiesa, A; Miller, J J H; O'Mara, S M; Reilly, R B
2011-02-01
This study presents a new automatic spike sorting method based on feature extraction by Laplacian eigenmaps combined with k-means clustering. The performance of the proposed method was compared against previously reported algorithms such as principal component analysis (PCA) and amplitude-based feature extraction. Two types of classifier (namely k-means and classification expectation-maximization) were incorporated within the spike sorting algorithms, in order to find a suitable classifier for the feature sets. Simulated data sets and in-vivo tetrode multichannel recordings were employed to assess the performance of the spike sorting algorithms. The results show that the proposed algorithm yields significantly improved performance, with a mean sorting accuracy of 73% and a sorting error of 10%, compared to PCA, which combined with k-means had a sorting accuracy of 58% and a sorting error of 10%. A correction was made to this article on 22 February 2011: the spacing of the title was amended on the abstract page. No changes were made to the article PDF and the print version was unaffected.
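A hypothetical scikit-learn analogue of the pipeline described above: SpectralEmbedding (a Laplacian-eigenmaps implementation) extracts low-dimensional features from spike waveforms, and KMeans clusters them. The waveforms below are synthetic templates plus noise, standing in for detected extracellular spikes; the embedding and clustering settings are assumptions.

```python
# Laplacian-eigenmaps feature extraction followed by k-means on synthetic spikes.
from itertools import permutations
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import SpectralEmbedding

rng = np.random.default_rng(5)
t = np.linspace(0, 1, 40)
templates = [np.exp(-((t - 0.3) / 0.05) ** 2) - 0.4 * np.exp(-((t - 0.5) / 0.1) ** 2),
             0.7 * np.exp(-((t - 0.4) / 0.08) ** 2) - 0.8 * np.exp(-((t - 0.6) / 0.07) ** 2),
             -np.exp(-((t - 0.35) / 0.06) ** 2) + 0.3 * np.exp(-((t - 0.55) / 0.12) ** 2)]
labels_true = rng.integers(0, 3, 600)
spikes = np.array([templates[k] + 0.05 * rng.standard_normal(t.size) for k in labels_true])

features = SpectralEmbedding(n_components=3, n_neighbors=15,
                             random_state=0).fit_transform(spikes)
labels_pred = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(features)

# crude accuracy: best agreement over a relabelling of the 3 clusters
acc = max(np.mean(np.array(perm)[labels_pred] == labels_true)
          for perm in permutations(range(3)))
print(f"sorting accuracy on synthetic spikes: {acc:.2f}")
```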
Mixture of learners for cancer stem cell detection using CD13 and H and E stained images
NASA Astrophysics Data System (ADS)
Oğuz, Oğuzhan; Akbaş, Cem Emre; Mallah, Maen; Taşdemir, Kasım; Akhan Güzelcan, Ece; Muenzenmayer, Christian; Wittenberg, Thomas; Üner, Ayşegül; Cetin, A. E.; Çetin Atalay, Rengül
2016-03-01
In this article, algorithms for cancer stem cell (CSC) detection in liver cancer tissue images are developed. Conventionally, a pathologist examines cancer cell morphologies under a microscope. Computer aided diagnosis (CAD) systems aim to help pathologists in this tedious and repetitive work. The first algorithm locates CSCs in CD13-stained liver tissue images. The method also includes an online learning algorithm to improve the accuracy of detection. The second family of algorithms classifies cancer tissues stained with H and E, which is clinically routine and more cost effective than the immunohistochemistry (IHC) procedure. The algorithms utilize 1D-SIFT and Eigen-analysis based feature sets as descriptors. Normal and cancerous tissues can be classified with 92.1% accuracy in H and E stained images. The classification accuracy for low- and high-grade cancerous tissue images is 70.4%. Therefore, this study paves the way for diagnosing cancerous tissue and grading its level using H and E stained microscopic tissue images.
Diagnostic Accuracy Comparison of Artificial Immune Algorithms for Primary Headaches.
Çelik, Ufuk; Yurtay, Nilüfer; Koç, Emine Rabia; Tepe, Nermin; Güllüoğlu, Halil; Ertaş, Mustafa
2015-01-01
The present study evaluated the diagnostic accuracy of immune system algorithms with the aim of classifying the primary types of headache that are not related to any organic etiology. They are divided into four types: migraine, tension, cluster, and other primary headaches. With this main objective in mind, three different neurologists were asked to enter the medical records of 850 patients into our web-based expert system hosted on our project web site. In the evaluation process, Artificial Immune Systems (AIS) were used as the classification algorithms. The AIS are classification algorithms inspired by the biological immune system mechanism, which involves significant and distinct capabilities. These algorithms simulate capabilities of the immune system such as discrimination, learning, and memorization so that they can be used for classification, optimization, or pattern recognition. According to the results, the accuracy of the classifiers used in this study ranged from 95% to 99%, except for one classifier that yielded 71% accuracy.
Liao, Katherine P.; Ananthakrishnan, Ashwin N.; Kumar, Vishesh; Xia, Zongqi; Cagan, Andrew; Gainer, Vivian S.; Goryachev, Sergey; Chen, Pei; Savova, Guergana K.; Agniel, Denis; Churchill, Susanne; Lee, Jaeyoung; Murphy, Shawn N.; Plenge, Robert M.; Szolovits, Peter; Kohane, Isaac; Shaw, Stanley Y.; Karlson, Elizabeth W.; Cai, Tianxi
2015-01-01
Background Typically, algorithms to classify phenotypes using electronic medical record (EMR) data were developed to perform well in a specific patient population. There is increasing interest in analyses which can allow study of a specific outcome across different diseases. Such a study in the EMR would require an algorithm that can be applied across different patient populations. Our objectives were: (1) to develop an algorithm that would enable the study of coronary artery disease (CAD) across diverse patient populations; (2) to study the impact of adding narrative data extracted using natural language processing (NLP) to the algorithm. Additionally, we demonstrate how to implement the CAD algorithm to compare risk across 3 chronic diseases in a preliminary study. Methods and Results We studied 3 established EMR based patient cohorts: diabetes mellitus (DM, n = 65,099), inflammatory bowel disease (IBD, n = 10,974), and rheumatoid arthritis (RA, n = 4,453) from two large academic centers. We developed a CAD algorithm using NLP in addition to structured data (e.g. ICD9 codes) in the RA cohort and validated it in the DM and IBD cohorts. The CAD algorithm using NLP in addition to structured data achieved specificity >95% with a positive predictive value (PPV) of 90% in the training (RA) and validation sets (IBD and DM). The addition of NLP data improved the sensitivity for all cohorts, classifying an additional 17% of CAD subjects in IBD and 10% in DM while maintaining a PPV of 90%. The algorithm classified 16,488 DM (26.1%), 457 IBD (4.2%), and 245 RA (5.0%) subjects with CAD. In a cross-sectional analysis, CAD risk was 63% lower in RA and 68% lower in IBD compared to DM (p<0.0001) after adjusting for traditional cardiovascular risk factors. Conclusions We developed and validated a CAD algorithm that performed well across diverse patient populations. The addition of NLP into the CAD algorithm improved the sensitivity of the algorithm, particularly in cohorts where the prevalence of CAD was low. Preliminary data suggest that CAD risk was significantly lower in RA and IBD compared to DM. PMID:26301417
Research on Laser Marking Speed Optimization by Using Genetic Algorithm
Wang, Dongyun; Yu, Qiwei; Zhang, Yu
2015-01-01
Laser Marking Machine is the most common coding equipment on product packaging lines. However, the speed of laser marking has become a bottleneck of production. In order to remove this bottleneck, a new method based on a genetic algorithm is designed. On the basis of this algorithm, a controller was designed and simulations and experiments were performed. The results show that using this algorithm could effectively improve laser marking efficiency by 25%. PMID:25955831
Yuan, Jun-hui; Cheng, Fang-Yun; Zhou, Shi-Liang
2012-01-01
Background Tree peonies are great ornamental plants associated with a rich ethnobotanical history in Chinese culture and have recently been used as an evolutionary model. The Qinling Mountains represent a significant geographic barrier in Asia, dividing mainland China into northern (temperate) and southern (semi–tropical) regions; however, their flora has not been well analyzed. In this study, the genetic differentiation and genetic structure of Paeonia rockii and the role of the Qinling Mountains as a barrier that has driven intraspecific fragmentation were evaluated using 14 microsatellite markers. Methodology/Principal Findings Twenty wild populations were sampled from the distributional range of P. rockii. Significant population differentiation was suggested (FST value of 0.302). Moderate genetic diversity at the population level (HS of 0.516) and high population diversity at the species level (HT of 0.749) were detected. Significant excess homozygosity (FIS of 0.076) and recent population bottlenecks were detected in three populations. Bayesian clusters, population genetic trees and principal coordinate analysis all classified the P. rockii populations into three genetic groups and one admixed Wenxian population. An isolation-by-distance model for P. rockii was suggested by Mantel tests (r = 0.6074, P<0.001) and supported by AMOVA (P<0.001), revealing a significant molecular variance among the groups (11.32%) and their populations (21.22%). These data support the five geographic boundaries surrounding the Qinling Mountains and adjacent areas that were detected with Monmonier's maximum-difference algorithm. Conclusions/Significance Our data suggest that the current genetic structure of P. rockii has resulted from the fragmentation of a formerly continuously distributed large population following the restriction of gene flow between populations of this species by the Qinling Mountains. This study provides a fundamental genetic profile for the conservation and responsible exploitation of the extant germplasm of this species and for improving the genetic basis for breeding its cultivars. PMID:22523566
NASA Astrophysics Data System (ADS)
An, M.; Assumpcao, M.
2003-12-01
The joint inversion of receiver functions and surface waves is an effective way to diminish the influence of the strong trade-off among parameters and the different sensitivities to model parameters in the respective individual inversions, but the inversion problem becomes more complex. Multi-objective problems can be much more complicated than single-objective inversion in model selection and optimization. If the objectives are conflicting, models can be ordered only partially; in this case, a Pareto-optimal preference should be used to select solutions. On the other hand, an inversion aimed at obtaining only a few optimal solutions cannot deal properly with the strong trade-off between parameters, the uncertainties in the observations, the geophysical complexities, or even the shortcomings of the inversion technique itself. The effective way is to retrieve the geophysical information statistically from many acceptable solutions, which requires more competent global algorithms. Competent genetic algorithms recently proposed are far superior to the conventional genetic algorithm and can solve hard problems quickly, reliably and accurately. In this work we used one of these competent genetic algorithms, the Bayesian Optimization Algorithm, as the main inversion procedure. This algorithm uses Bayesian networks to draw out inherited information and can use the Pareto-optimal preference in the inversion. With this algorithm, the lithospheric structure of the Paraná basin is inverted to fit both the observed inter-station surface wave dispersion and the receiver functions.
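The Pareto-optimal preference mentioned above amounts to keeping the models that are not dominated on the two misfits (receiver-function misfit and dispersion misfit). The small helper below illustrates that selection on synthetic misfit values; it is a generic sketch, not part of the Bayesian Optimization Algorithm used in the study.

```python
# Keep the non-dominated models on two misfits (both minimised); synthetic data.
import numpy as np

rng = np.random.default_rng(6)
misfits = rng.random((200, 2))            # columns: RF misfit, dispersion misfit

def non_dominated(points):
    keep = []
    for i, p in enumerate(points):
        # p is dominated if some other point is <= in both misfits and < in one
        dominated = np.any(np.all(points <= p, axis=1) & np.any(points < p, axis=1))
        if not dominated:
            keep.append(i)
    return np.array(keep)

front = non_dominated(misfits)
print(f"{front.size} non-dominated models out of {misfits.shape[0]}")
```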